How to overcome cloud complexity


The cloud’s original promise of simplicity has not survived contact with reality. Midway through their cloud transformation, organizations find themselves managing increasingly complex environments. In a world that has become largely hybrid and multicloud, they have one foot on premises and the other in the cloud, all while keeping up with new features rolled out by providers on platforms over which they have no control.

According to a study carried out by Forrester Consulting on behalf of Cloudflare, nearly 40% of companies admit to having lost control of their IT and security environments. Among their complaints: the growing number of locations where SaaS applications are deployed, and the difficulty of migrating on-premises assets to the cloud.

The multicloud approach adds to this complexity. According to the Infosys Cloud Radar 2023 study, nearly two-thirds of respondents use three or four cloud providers, a 75% increase over 2021. Julien Giraud, head of cloud for Southern Europe at Eviden (Atos group), notes, however, that this complexity varies from one company to another, depending on the choices it has made and the regulatory or geopolitical constraints to which it is exposed.

“Regulated professions in the fields of health, defense or energy must apply fine-grained segregation of their sensitive data,” he says. “A multinational with operations in the United States and Asia must also comply with the regulatory requirements specific to each geography.”

Set the right performance indicators

Based on this observation, how can you regain control of your multiple cloud environments? Since a company can only control what it measures, the first step is to equip itself properly. To this end, Julien Giraud recommends using the management tools offered natively by cloud providers rather than building an additional abstraction layer, which only adds complexity to complexity.

To gain a global view of its cloud resources, a company can then adopt an observability platform such as New Relic, Dynatrace or Datadog, or their open source alternative Prometheus. Dedicated to monitoring and incident management, this type of solution reports a number of performance indicators.

But which ones? “Black box” monitoring consists of simulating user behavior, while “white box” monitoring observes the inner workings of the servers (disk space, processor (CPU) usage, memory (RAM) usage).
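
As an illustration, a black-box probe can be as simple as the following Python sketch, which measures the latency and status code a user would experience (the target URL is a placeholder, not a real endpoint):

```python
import time
import urllib.request

# Placeholder endpoint: point this at a real health-check URL.
TARGET_URL = "https://example.com/health"

def probe(url: str, timeout: float = 5.0) -> dict:
    """Black-box check: behave like a user and record what they would see."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except Exception:
        status = None  # the request failed outright
    latency = time.monotonic() - start
    return {"url": url, "status": status, "latency_s": round(latency, 3)}

if __name__ == "__main__":
    print(probe(TARGET_URL))
```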

The four indicators used by Google, known as the “Golden Signals” and illustrated in the sketch after this list, are:

  • Latency
  • Traffic
  • Errors
  • Saturation
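
As a hedged illustration of how a service might expose these four signals, here is a minimal sketch using the open source Prometheus client library for Python; the metric names, port and simulated workload are assumptions, not a standard:

```python
import random
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Hypothetical metric names; adapt them to your own conventions.
LATENCY = Histogram("request_latency_seconds", "Request latency")      # latency
TRAFFIC = Counter("requests_total", "Requests served")                 # traffic
ERRORS = Counter("request_errors_total", "Failed requests")            # errors
SATURATION = Gauge("worker_queue_depth", "Jobs waiting in the queue")  # saturation

def handle_request() -> None:
    TRAFFIC.inc()
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.1))  # simulated work
        if random.random() < 0.05:             # simulated 5% failure rate
            ERRORS.inc()

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        SATURATION.set(random.randint(0, 10))  # simulated queue depth
        handle_request()
```

A Prometheus server then scrapes the /metrics endpoint, and any of the platforms mentioned above can graph and alert on the four signals.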

After the tools come the methodologies. Emmanuel Lilette, head of tech for “run” operations at Padok, believes that “agile methods are entirely adaptable to the world of infrastructure. It is possible to reuse industry standards to measure the quality of service delivered to users.”

CI/CD, DevSecOps, NoOps, SRE

Applied to software engineering, continuous integration and continuous delivery (CI/CD) make it possible to automate the build and deployment phases on top of code repository platforms such as GitHub or GitLab. This practice is part of the broader DevSecOps movement.

“More and more organizations are adopting the DevSecOps approach, which makes it possible to sustain a high pace of integration while maintaining the highest standards of quality of service and security,” confirms Julien Giraud. The movement comes with its own metrics, known as DORA (DevOps Research and Assessment) and illustrated in the sketch after this list:

  • Deployment frequency
  • Lead time for changes
  • Time to restore service
  • Change failure rate
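
As a minimal sketch of how these four metrics can be derived, the Python snippet below computes them from a deployment log; the records and field names are invented for illustration, since in practice the data would come from your CI/CD platform:

```python
from datetime import datetime, timedelta

# Invented deployment log; in practice, pull this from your CI/CD platform.
deployments = [
    {"committed": datetime(2024, 3, 1, 9), "deployed": datetime(2024, 3, 1, 15),
     "failed": False, "restored_after": None},
    {"committed": datetime(2024, 3, 2, 10), "deployed": datetime(2024, 3, 3, 11),
     "failed": True, "restored_after": timedelta(hours=2)},
    {"committed": datetime(2024, 3, 5, 8), "deployed": datetime(2024, 3, 5, 17),
     "failed": False, "restored_after": None},
]
period_days = 7

# Deployment frequency: deployments per day over the observed period.
frequency = len(deployments) / period_days

# Lead time for changes: average delay from commit to production.
lead_time = sum((d["deployed"] - d["committed"] for d in deployments),
                timedelta()) / len(deployments)

# Change failure rate: share of deployments that caused an incident.
failures = [d for d in deployments if d["failed"]]
failure_rate = len(failures) / len(deployments)

# Time to restore service: average recovery time after a failed deployment.
time_to_restore = sum((d["restored_after"] for d in failures),
                      timedelta()) / len(failures)

print(f"frequency: {frequency:.2f}/day | lead time: {lead_time} | "
      f"failure rate: {failure_rate:.0%} | restore time: {time_to_restore}")
```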

In a final stage of automation and virtualization, referred to as NoOps (No Operations), “internal teams are freed from administering the underlying infrastructure by purely software-driven management,” continues Julien Giraud. Developers become responsible for deploying applications themselves, without going through operations staff. Serverless computing can be seen as a stepping stone towards this NoOps approach.
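
To make that stepping stone concrete, here is a minimal Python function in the style of an AWS Lambda handler (the event shape is a simplified assumption): the team writes and deploys only this code, while the provider provisions, scales and patches everything underneath.

```python
import json

def handler(event, context):
    """Minimal Lambda-style handler: no server to administer; the
    platform invokes this function on demand and scales it automatically."""
    name = (event or {}).get("name", "world")  # simplified, assumed event shape
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

if __name__ == "__main__":
    # Local invocation for testing; in production the cloud platform calls it.
    print(handler({"name": "cloud"}, None))
```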

Finally, the “two-pizza team” approach promoted by AWS and Site Reliability Engineering (SRE), championed by Google, help break down the last silos between areas of expertise (applications, OS, middleware, infrastructure) and provide a cross-cutting view of a project.
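
SRE, in particular, turns reliability into a shared, quantified objective through service level objectives (SLOs) and error budgets. A toy calculation, with an invented target and invented request counts, looks like this:

```python
# Toy SRE error-budget calculation; the SLO target and request counts
# are invented for illustration.
slo_target = 0.999            # objective: 99.9% of requests succeed
total_requests = 1_000_000    # requests served this month
failed_requests = 700         # requests that failed

allowed_failures = total_requests * (1 - slo_target)   # budget: ~1,000 failures
budget_remaining = allowed_failures - failed_requests  # ~300 failures left

availability = 1 - failed_requests / total_requests
print(f"availability: {availability:.4%} | "
      f"error budget remaining: {budget_remaining:.0f} requests")
```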

This entire methodological framework must aim, according to Julien Giraud, “to approach cloud transformation from the point of view of the application platforms that the cloud supports,” focusing above all on the performance of large software packages such as SAP or Oracle. This implies, in his view, moving beyond the “lift and shift” stage to rebuild these platforms on so-called cloud-native technologies (microservices architecture, containerization, etc.).

Lastly, according to Emmanuel Lilette, it makes sense to “upskill the IT teams previously responsible for supervising on-premises and network infrastructures and reorient them towards the cloud.” This policy of training and certification on cloud environments is also a lever for attracting and retaining talent within an IT department.


