Many tech professionals are now tasked with applying artificial intelligence (AI) and machine learning (ML) to business problems, in hopes of improving their products and services. An emerging approach, AIOps, promises to apply artificial intelligence to vexing computing problems.
A recent report from Constellation Research makes the case for AIOps to improve the state of IT operations and help untangle the spaghetti architectures that have emerged over the years. “IT managers face enormous challenges to be effective because they have added too many tools and become siloed,” says Andy Thurai, the report’s author. “In addition to data fragmentation, many tools produce critical alerts for the same event, creating ‘alert fatigue.’”
“AIOps is about applying AI to improve IT operations,” he says. “Contrary to some beliefs, it is not about improving AI with IT operations, but rather the reverse.”
Why opt for an AIOps approach?
AIOps could increase the productivity of IT teams. Andy Thurai gives seven good reasons to consider an AIOps approach to managing IT complexity:
- Reduce IT noise and alert fatigue. “IT teams today are truly overwhelmed by the noise created by false alarms, as well as by too many alerts for a single incident,” he writes. “The overwhelming amount of noise can create alert fatigue.” AIOps can help reduce that noise by 80 to 90%, he estimates.
- Identify the root cause of an anomaly. In today’s multicloud or hybrid environments, “it is extremely difficult to identify the underlying event that caused the incident,” he explains. “The main problem with root cause analysis is bringing together the logs, metrics, and traces that occur in the same time frame across the entire stack.” AIOps sheds light on the origins of anomalies. An AIOps solution can also “show the timeline of the incident from the time it occurred.”
- Improve capacity planning and resource utilization. “With AI-assisted, data-driven mapping, you can deploy workloads to the right mix of servers, instances, and machines,” the report’s author explains. “If a specific combination didn’t work, you can tweak it in real time and keep making changes in real time until it works as expected, without manual intervention.”
- Correlate events. AIOps can take on a role that “gathers related telemetry information (logs, metrics, and traces).” It allows “looking at related telemetry information from various tools, all together, on the same dashboard and at the same time, which will give you a clear view of what is happening in the system and help identify the root cause quite quickly.”
- Enrich alerts and incidents with context. “Once an incident occurs, the first step the AIOps team must take is to determine the context of the incident (what, when, and why) as quickly as possible,” says Andy Thurai. “A properly implemented AIOps solution will add context to the incident or alert, rather than simply notifying the support personnel involved.”
- Anomaly detection. “AIOps must be able to analyze all the data and identify patterns.”
- Self-repair and automation capabilities. “A good AIOps solution should either have automation in place or integrate with automation providers via APIs to initiate remediation measures. For example, in the event of CPU or memory overuse, restarting or stopping certain processes can solve the problem without the need to create an alert, trigger an incident, and waste IT resources investigating and remediating that incident.”
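To make the noise-reduction and event-correlation points concrete, here is a minimal sketch of one simple way to collapse duplicate alerts: alerts about the same resource that arrive within a few minutes of each other are folded into a single incident. This is an assumption-laden heuristic, not the method described in the Constellation Research report or used by any particular AIOps product, which typically also draws on topology and learned models. The `Alert` fields, the `group_alerts` function, and the five-minute window are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    source: str    # monitoring tool that raised the alert (illustrative field)
    resource: str  # host or service the alert is about (illustrative field)
    kind: str      # alert type, e.g. "cpu_high" (illustrative field)
    at: datetime   # time the alert fired


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Fold alerts about the same resource into one incident when each
    arrives within `window` of the previous alert in that incident."""
    incidents = []  # each incident is a list of alerts
    latest = {}     # resource -> index of its most recent incident
    for alert in sorted(alerts, key=lambda a: a.at):
        idx = latest.get(alert.resource)
        if idx is not None and alert.at - incidents[idx][-1].at <= window:
            incidents[idx].append(alert)  # same burst: attach to open incident
        else:
            latest[alert.resource] = len(incidents)  # start a new incident
            incidents.append([alert])
    return incidents


# Usage: three tools fire about the same database within two minutes.
t0 = datetime(2024, 1, 1, 12, 0)
alerts = [
    Alert("toolA", "db-1", "cpu_high", t0),
    Alert("toolB", "db-1", "latency_high", t0 + timedelta(minutes=1)),
    Alert("toolC", "db-1", "cpu_high", t0 + timedelta(minutes=2)),
    Alert("toolA", "web-1", "disk_full", t0 + timedelta(minutes=3)),
]
incidents = group_alerts(alerts)
print(f"{len(alerts)} alerts -> {len(incidents)} incidents")  # 4 alerts -> 2 incidents
```

Grouping only by resource and time is deliberately crude; it shows why several tools reporting the same event need not produce several pages for the on-call team.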
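For the anomaly-detection point, the idea of “identifying patterns” can be illustrated with a rolling z-score: each new metric sample is compared with the mean and standard deviation of the samples just before it. This is a basic statistical stand-in, assumed here for illustration; commercial AIOps tools generally use richer models. The function name `zscore_anomalies`, the 10-sample window, and the threshold of 3 standard deviations are hypothetical parameters.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=10, threshold=3.0):
    """Return indices whose value lies more than `threshold` sample standard
    deviations from the mean of the preceding `window` points."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Usage: CPU utilization (%) sampled once a minute, with one spike.
cpu = [40, 42, 41, 39, 40, 41, 42, 40, 39, 41, 40, 95, 41]
print(zscore_anomalies(cpu))  # [11]
```

Because the baseline is recomputed from recent history, the same threshold adapts to metrics with very different normal levels, which is the property that makes this kind of check more useful than a single fixed alert threshold.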
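The self-repair point (fix a CPU overuse problem automatically instead of opening an incident) can be sketched as a small policy object: if CPU stays above a limit for several consecutive samples, it first invokes a remediation hook and only pages a human if that fix has already been tried. The `RemediationPolicy` class, its `restart` and `alert` hooks, and the default thresholds are hypothetical; in practice the hooks would call whatever automation provider or API the AIOps platform integrates with.

```python
from collections import deque


class RemediationPolicy:
    """Illustrative rule: when CPU stays above `limit` for `consecutive`
    samples, try the automated fix first and page a human only once
    `max_attempts` fixes have already been tried."""

    def __init__(self, restart, alert, limit=90.0, consecutive=3, max_attempts=1):
        self.restart = restart  # hypothetical hook that performs the fix
        self.alert = alert      # hypothetical hook that notifies on-call staff
        self.limit = limit
        self.recent = deque(maxlen=consecutive)  # sliding window of samples
        self.max_attempts = max_attempts
        self.attempts = 0

    def observe(self, cpu_percent):
        self.recent.append(cpu_percent)
        sustained = (len(self.recent) == self.recent.maxlen
                     and all(v > self.limit for v in self.recent))
        if not sustained:
            return "ok"
        self.recent.clear()  # act once per sustained breach, then start over
        if self.attempts < self.max_attempts:
            self.attempts += 1
            self.restart()
            return "remediated"
        self.alert()
        return "alerted"


# Usage: the first breach is fixed silently; a repeat breach escalates.
actions = []
policy = RemediationPolicy(restart=lambda: actions.append("restart worker"),
                           alert=lambda: actions.append("page on-call"))
results = [policy.observe(v) for v in [95, 97, 96, 50, 96, 98, 99]]
print(results)  # ['ok', 'ok', 'remediated', 'ok', 'ok', 'ok', 'alerted']
print(actions)  # ['restart worker', 'page on-call']
```

For simplicity the attempt counter never resets; a real policy would likely reset it after a healthy period so that an isolated future spike is again handled without waking anyone up.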
Keeping up with all the demands of today’s panoply of systems can be overwhelming for IT teams constrained in size, time, and budget. AIOps provides intelligent digital assistance to help manage day-to-day issues, so IT professionals can stay focused on the business.