Dale Wang – Dell Technologies
Today, customers expect IT vendors to deliver solutions backed by SLAs rather than simply stacks of hardware and software. The everything-as-a-service model exemplifies this shift: it pushes the management and operation of the infrastructure to the provider, regardless of whether the service is hosted on-premises or in the cloud. As a result, the quality of an IT service depends not only on the quality of the hardware and software used, but also on the vendor's ability to detect, mitigate, and resolve issues quickly and at scale while the service is live.
To that end, the service provider must automate the management and operation tasks that are typically performed by highly experienced and knowledgeable IT professionals. This automation requires the use of AI and ML, an approach for which Gartner coined the term AIOps.
Many vendors have started collecting data to build effective ML models; however, most model development focuses on detecting hardware component issues or on detecting anomalies using unsupervised learning. The true goal, which is much more difficult to achieve, is to predict issues, whether hardware or software, before they occur, or to predict the root cause of an issue after it has occurred.
In this presentation, I will discuss a general approach to building ML models for issue and resolution prediction using supervised learning on historical data, such as defect or bug reports, trouble-ticket case data, and telemetry/log data.
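As a rough illustration of supervised resolution prediction from historical case data, the sketch below trains a small multinomial Naive Bayes text classifier on trouble-ticket descriptions labeled with the resolution that closed them. It is not the method covered in the presentation: the choice of Naive Bayes, the ticket texts, the resolution labels, and the class name `TicketResolutionClassifier` are all assumptions made up for this example, and a real system would also draw on telemetry and log features.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do far more."""
    return text.lower().split()


class TicketResolutionClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # tickets per resolution
        self.word_counts = defaultdict(Counter)  # word frequencies per resolution
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_tickets = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_tickets in self.class_counts.items():
            # Log prior: how often this resolution appears in the history.
            score = math.log(n_tickets / total_tickets)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                # Smoothed log likelihood of the word given the resolution.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented historical cases: (ticket description, resolution that fixed it).
history = [
    ("disk latency high raid rebuild failed", "replace_drive"),
    ("drive smart errors raid degraded", "replace_drive"),
    ("firmware mismatch after update controller reset", "upgrade_firmware"),
    ("controller firmware crash after update", "upgrade_firmware"),
    ("network timeout link flapping switch port", "check_cabling"),
    ("link down port errors network unreachable", "check_cabling"),
]

texts, labels = zip(*history)
model = TicketResolutionClassifier().fit(texts, labels)

# Predict a likely resolution for a new, unseen ticket.
print(model.predict("raid degraded drive errors"))
```

The labels here are resolutions rather than fault types, so the model's output is a suggested next action. Under the same assumptions, swapping the labels for root-cause categories would turn this into a root-cause predictor.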
Dale Wang, 2021 Technical Presentation, Presentation