BEST PRACTICES WHITE PAPER
Predictive Intelligence:
Identify Future Problems and Prevent Them from Happening
Table of Contents
Introduction
Business Challenge
A Solution: Predictive Intelligence
> Dynamic Thresholding
> Event Correlation and Analysis
> Predictive Modeling
> Pre-Assigned Business Impact Priority
> Reducing the Noise Level by 60% or More
Predictive Intelligence Is Here Now
> Strategic and Still Evolving
Conclusion
Introduction
Wouldn’t you like to have a crystal ball that allows you to see future IT problems? How would seeing what happens in your infrastructure in the future impact the way you manage it today?
Enter predictive intelligence, a forward-looking IT management technology that goes far beyond what a crystal ball has to offer. Predictive intelligence comprises advanced technologies that learn what is normal and abnormal based on usage patterns within your IT environment, trigger targeted actions to isolate the root cause of future threats, and take corrective action before services are impacted.
This paper reviews how this convergence of technologies gives IT a tangible way to intercept, prioritize, and resolve future incidents in order of business impact.
By leveraging predictive intelligence, IT organizations can benefit from rapid returns on their technology investments, reducing capital and operating expenses.
Business Challenge
Each day, incidents arise in the IT environment, even in the most advanced organizations. They come in many degrees of severity, but no matter how minor, they can still impact the business through costly service disruptions if they are not quickly intercepted and resolved.
In response, many IT organizations are taking proactive approaches to IT management and adopting best-practices frameworks, such as the IT Infrastructure Library© (ITIL©). They are focusing on Business Service Management (BSM), a comprehensive approach and unified platform for running IT, which offers an effective and proven way to manage IT based on business priorities. In fact, IT is making significant progress in anticipating and meeting the needs of the business and is actually becoming more integrated with the business.
However, as the pace of business continues to accelerate, managing the IT infrastructure becomes even more complex. Manual processes, such as “change control meetings” to discuss and enact proposed changes, have been made obsolete by the enormous amounts of data and demand placed on IT personnel each day.
Although labor-intensive processes are increasingly being replaced by automated ones, a proactive approach to IT management only catches threats that cross a predetermined “line” or threshold; it does not make intelligent decisions based on a thorough analysis of historical usage patterns and trends. Newer, proven technologies now exist that provide forward-looking identification, notification, and both reactive and proactive automated resolution of potential threats. Known as “predictive intelligence,” this approach and its corresponding technologies provide the most effective and efficient line of defense for critical business services.
A Solution: Predictive Intelligence
There is a natural progression among what is considered reactive, proactive, and predictive technology, and this progression corresponds to the maturity level of your IT organization and the tools you leverage. No matter where your organization lies in terms of maturity, the end goal is the same: to effectively manage an increasingly intricate infrastructure. To accomplish this objective, IT organizations must be able to predict and solve problems before they affect the customer.
In general, the three approaches to IT management can be defined as follows:
A reactive approach entails addressing an IT problem or incident after it has already occurred. Clearly, this approach poses the greatest risk of prolonged business service disruptions, because the identification and resolution of each incident do not start until a service disruption is reported. This risk alone explains why most IT organizations move, gradually if not immediately, toward a more refined approach.
A proactive approach involves setting pre-defined “lines” or thresholds for key performance measurements, and enacting a specified course of action (such as triggering an alert) each time these thresholds are crossed. For example, if a server’s utilization exceeds a predefined percentage of the total capacity available, an alert is raised before service delivery is impacted.
Static thresholds, as used in a proactive IT management model, are typically set arbitrarily through a time-consuming manual process. The appropriate value for even a single threshold can vary widely over time as the environment changes and your infrastructure evolves, and adjusting thresholds regularly is extremely tedious and inefficient. As a result, this approach is much less effective than the predictive model described in this paper.
Predictive intelligence, as its name implies, does far more. By understanding and determining what is normal or abnormal based on usage patterns, predictive intelligence technology lowers the level of event “noise” and often eliminates duplicate and excessive alerts that account for as much as 80% of total alerts. Similar to “accidental” calls to 911, duplicate alerts consume valuable personnel resources that are better utilized by attending to more critical issues.
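To make the idea of duplicate alerts concrete, here is a minimal sketch of duplicate suppression. It is a deliberately simple rule-based stand-in, not the learned, usage-based behavior described above: it assumes each alert carries a source, a check name, and a timestamp, and treats a repeat of the same (source, check) pair within a fixed window as a duplicate. The `Alert` type, the `deduplicate` function, and the 300-second window are hypothetical illustrations, not any product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # component that raised the alert (hypothetical field)
    check: str        # what was measured, e.g. "cpu" or "disk"
    timestamp: float  # seconds since an arbitrary start time


def deduplicate(alerts, window_s=300.0):
    """Keep the first alert per (source, check) and drop repeats that arrive
    within window_s seconds of the last alert kept for that same key."""
    last_kept = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.check)
        prev = last_kept.get(key)
        if prev is not None and alert.timestamp - prev < window_s:
            continue  # duplicate of an alert already reported
        last_kept[key] = alert.timestamp
        kept.append(alert)
    return kept


# Five raw alerts collapse to three: the two repeats from db01 within
# five minutes are dropped, while the later db01 alert is kept.
raw = [
    Alert("db01", "cpu", 0),
    Alert("db01", "cpu", 60),
    Alert("db01", "cpu", 120),
    Alert("web01", "disk", 90),
    Alert("db01", "cpu", 400),
]
kept = deduplicate(raw)
print(len(kept), "of", len(raw), "alerts kept")
```

A fixed window is the simplest possible rule; the window length is an assumption that would need tuning for a real environment.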
Put simply, predictive intelligence technology collects data, then correlates and analyzes this information to help IT identify, isolate, and resolve threats to mission-critical applications at the earliest possible opportunity, in advance of a service disruption.
While monitoring the performance of every infrastructure component for abnormal behavior is a critical part of successful management, it is equally crucial that the infrastructure be able to automatically adapt and adjust to normal changes as they happen.
Predictive intelligence technology, through advanced measurement and analysis functionality, actually learns what is normal for your organization. This is achieved by automatically establishing and adjusting to a band of normal operation for every attribute within your environment, a practice referred to as dynamic thresholding. As your infrastructure changes, the technology adapts, quickly learning, anticipating, and adjusting to how the system will behave on a Monday morning versus a Sunday evening, for example. This virtually eliminates the need to manually create and maintain static thresholds, or manually set and maintain rules-based alarms.
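One simple way to build such a band of normal operation is sketched below, assuming metric samples are grouped by hour of the week and the band is the mean plus or minus k standard deviations of past samples in that slot. This is a common textbook technique chosen for illustration; commercial products may use different statistics. The function names, the choice k = 3, and the sample values are hypothetical.

```python
import statistics
from collections import defaultdict


def learn_bands(history, k=3.0):
    """history: iterable of (weekday, hour, value) samples, weekday 0 = Monday.
    Returns {(weekday, hour): (low, high)}, where each band is
    mean +/- k * population standard deviation of that slot's samples."""
    slots = defaultdict(list)
    for weekday, hour, value in history:
        slots[(weekday, hour)].append(value)
    bands = {}
    for slot, values in slots.items():
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values)
        bands[slot] = (mean - k * spread, mean + k * spread)
    return bands


def is_abnormal(bands, weekday, hour, value):
    """True if value falls outside the learned band for that hour of the week."""
    low, high = bands[(weekday, hour)]
    return not (low <= value <= high)


# Hypothetical CPU utilization (%) history: busy Monday 09:00, quiet Sunday 20:00.
history = [(0, 9, v) for v in (70, 75, 72, 73)] + [(6, 20, v) for v in (10, 12, 11, 9)]
bands = learn_bands(history)
print(is_abnormal(bands, 0, 9, 74))   # normal for a Monday morning
print(is_abnormal(bands, 6, 20, 74))  # abnormal for a Sunday evening
```

The same 74% reading is normal on Monday morning but abnormal on Sunday evening, which a single static threshold cannot express. Re-running `learn_bands` over a rolling window of recent history is one way to let the bands follow the infrastructure as it changes.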
Predictive intelligence is a natural evolution for maturing IT organizations. According to industry analyst reports, IT still learns of about 70% of service problems only when an end user calls the service desk.
Predictive intelligence is a convergence of several highly advanced technologies, the most important of which are described below.
Dynamic Thresholding
Most IT organizations have already installed some type of monitoring/alarm system that sends an alarm when a static threshold is breached. For example, if a CPU breaches 90% utilization, or if a disk becomes 80% full, an administrator is alerted and can analyze the problem and take corrective action.
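For contrast with dynamic thresholding, a static rule set like the one just described can be sketched in a few lines. The metric names and the `static_alerts` function are hypothetical; the limits are the 90% CPU and 80% disk examples from the text.

```python
# Fixed limits taken from the examples above; metric names are hypothetical.
STATIC_THRESHOLDS = {"cpu_util_pct": 90.0, "disk_used_pct": 80.0}


def static_alerts(sample):
    """Return the names of metrics in `sample` that exceed their fixed limit."""
    return [
        metric
        for metric, limit in STATIC_THRESHOLDS.items()
        if sample.get(metric, 0.0) > limit
    ]


print(static_alerts({"cpu_util_pct": 93.5, "disk_used_pct": 41.0}))  # ['cpu_util_pct']
```

Because the limits never move, a server that routinely runs at 92% during a nightly batch job alerts every night, while one that normally idles at 5% and suddenly jumps to 85% never alerts at all.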
As described earlier in this paper, dynamic thresholding involves setting a threshold based on past behavior and then watching the behavior of the application over time. In this way, the system continually learns what is really normal and abnormal for that application in your particular environment, and keeps adjusting the threshold automatically. The system also compares alarm patterns with other components in the infrastructure, such as configuration items.
Event Correlation and Analysis
IT organizations are often unable to give enough attention to problem management, typically due to resource constraints. In both reactive and even many proactive IT environments, the sheer volume of data generated by alerts and events far exceeds the time available to prioritize their resolution. Real-time root cause analysis, a part of event correlation and analysis, filters through the noise. Consider what usually happens when a server goes down. Layer upon layer of alarms go off simultaneously. The database, operating system, and application may all appear to be down, but this may be due to a single point of failure elsewhere in the network.
Real-time root cause analysis technology considers the behavior of an application or device within the context of activity across the entire infrastructure. It closes sympathetic alerts (which are symptoms of the root cause), clears up duplication, determines if relationships exist between the remaining alerts, and then determines what those relationships are. In essence, the technology intelligently identifies the root cause and how it relates to other alerts in the queue.
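A minimal sketch of how sympathetic alerts might be separated from root-cause candidates, assuming a known dependency map between components (for example, one derived from configuration items) and a simple rule: an alert is sympathetic if anything it depends on, directly or transitively, is also alerting. Real products combine topology with timing and other signals; the rule, the function name, and the component names here are hypothetical. The example models a router failure that makes the operating systems, database, and application that depend on it appear down.

```python
def find_root_causes(alerting, depends_on):
    """alerting: set of components currently raising alerts.
    depends_on: {component: [components it depends on]}, assumed acyclic.
    Returns (root_causes, sympathetic), where sympathetic maps each
    suppressed component to the alerting components upstream of it."""

    def upstream(component, seen=None):
        # Collect every direct and transitive dependency of `component`.
        seen = set() if seen is None else seen
        for dep in depends_on.get(component, []):
            if dep not in seen:
                seen.add(dep)
                upstream(dep, seen)
        return seen

    roots, sympathetic = set(), {}
    for component in alerting:
        failed_upstream = upstream(component) & alerting
        if failed_upstream:
            sympathetic[component] = failed_upstream  # a symptom, not the cause
        else:
            roots.add(component)  # nothing it depends on is alerting
    return roots, sympathetic


# Hypothetical topology: application -> database/OS -> core router.
depends_on = {
    "order-app": ["order-db", "app-server-os"],
    "order-db": ["db-server-os"],
    "app-server-os": ["core-router"],
    "db-server-os": ["core-router"],
}
alerting = {"order-app", "order-db", "app-server-os", "db-server-os", "core-router"}
roots, sympathetic = find_root_causes(alerting, depends_on)
print(roots)  # {'core-router'}
```

Five alerts reduce to a single root-cause candidate; the other four are closed as symptoms.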
To illustrate, consider what happens when a router goes down: numerous alerts are triggered by various network components.
In a predictive model, event correlation automatically analyzes the problem, and upon resolution, the technology archives information about the service disruption and its root cause to prevent future occurrences of the same problem. Furthermore, the applications may even be configured to automatically open change tickets if a change is required to prevent a recurrence of this problem.
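The archive-and-ticket behavior described above can be sketched as follows. The rule used here, opening one change request once the same root cause has been resolved a set number of times, is an assumption for illustration; the `IncidentArchive` class, its fields, and the ticket text are hypothetical and do not correspond to any real service desk API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IncidentArchive:
    """Hypothetical store of resolved incidents, keyed by root cause."""
    recurrence_limit: int = 2  # assumed policy: ask for a change on the 2nd occurrence
    history: Counter = field(default_factory=Counter)
    change_tickets: list = field(default_factory=list)

    def record_resolution(self, root_cause, summary):
        """Archive a resolved incident; request a permanent change exactly once,
        when the same root cause reaches the recurrence limit."""
        self.history[root_cause] += 1
        if self.history[root_cause] == self.recurrence_limit:
            self.change_tickets.append(f"Prevent recurrence of {root_cause}: {summary}")


archive = IncidentArchive()
archive.record_resolution("core-router", "router down, services restored")
archive.record_resolution("core-router", "router down again")
archive.record_resolution("core-router", "router down a third time")
print(archive.change_tickets)  # one ticket, opened on the second occurrence
```

Using `==` rather than `>=` keeps the sketch from opening a new ticket every time the problem recurs after the first request.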
In summary, event correlation and analysis solutions minimize the downtime of IT components by identifying failures before they impact IT service levels. This technology also helps IT managers resolve problems more rapidly by translating events into actionable, business-relevant information, allowing them to take action, often before the problem happens.
Predictive Modeling
Predictive modeling begins by establishing baselines and rules that frame your approach and by understanding your current performance and usage patterns. Once you have created a model, you apply predictive analytics and correlations to your performance data to identify patterns and relationships that are seen on a regular basis. The relationships are then correlated to your overall business and resource needs.
Predictive modeling also applies to capacity management. “What-if” modeling techniques, such as queuing theory, identify areas where IT resources can be better utilized. Predictive modeling can be used to balance workloads across servers or accurately consolidate physical workloads and servers onto a virtual platform. You can also use predictive modeling to accurately forecast future capacity requirements given transaction or user growth. The ability to optimize existing IT resources — while ensuring delivery of service levels and accurately predicting future capacity requirements — can mean millions of dollars saved in capital and operating expenditures.
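A minimal "what-if" example using queuing theory: the sketch below uses the M/M/1 model (one server, random arrivals, exponentially distributed service times), the simplest textbook queue, in which utilization is U = λS and mean response time is R = S / (1 − U). Capacity planning tools use richer models; the service time, load, and response-time target below are hypothetical.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue.

    arrival_rate: requests per second (lambda); service_time: seconds per request (S).
    Utilization U = lambda * S, and mean response time R = S / (1 - U).
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return float("inf")  # demand meets or exceeds capacity; the queue grows without bound
    return service_time / (1.0 - utilization)


def max_arrival_rate(service_time, target_response):
    """Largest arrival rate that keeps mean response time at or below the target.

    Solving S / (1 - lambda * S) <= T for lambda gives lambda <= 1/S - 1/T.
    """
    if target_response < service_time:
        return 0.0  # even an idle server cannot meet this target
    return 1.0 / service_time - 1.0 / target_response


# What-if: 50 ms per request, 10 requests/s today, and a 200 ms response-time target.
service_time, today = 0.05, 10.0
for growth in (0.0, 0.5, 0.8):
    rate = today * (1 + growth)
    print(f"+{growth:.0%} load -> {mm1_response_time(rate, service_time) * 1000:.0f} ms")
print(f"headroom: {max_arrival_rate(service_time, 0.2):.1f} requests/s")
```

Under these assumptions, mean response time doubles from 100 ms to 200 ms at +50% load and reaches 500 ms at +80%, so response time grows far faster than load as utilization approaches 100%. This is why forecasting capacity from user growth needs a model rather than straight-line extrapolation.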
Pre-Assigned Business Impact Priority
In the past, event management prioritization was democratic; that is, if two routers went down, they were assumed to be equally important. Today, predictive intelligence technology can automatically determine the order in which these events should be addressed to best support business services. For example, it can automatically detect that one router supports a small, remote sales office, while the other supports the entire European customer base.
Impact management technology bridges the gap between IT operations and your service desk, helping you identify and prioritize IT events based on their business impact. This technology automatically raises incidents with the service desk that contain event and root-cause information with pre-assigned business impact priority, helping to shorten overall time-to-resolution and meet or exceed your SLAs with the business, without service disruption.
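The two-router example above can be sketched as a simple impact ranking. The sketch assumes a hand-maintained map from components to the business services they support, plus a weight per service; real impact management tools derive such relationships from service models. All names and weights here are hypothetical.

```python
# Hypothetical business weight of each service (higher = more important).
SERVICE_WEIGHT = {"european-customer-portal": 100, "remote-sales-office": 5}

# Hypothetical map from infrastructure components to the services they support.
SUPPORTS = {
    "router-eu-core": ["european-customer-portal"],
    "router-branch-12": ["remote-sales-office"],
}


def business_impact(component):
    """Sum the weights of every business service the component supports."""
    return sum(SERVICE_WEIGHT[s] for s in SUPPORTS.get(component, []))


def prioritize(events):
    """Order failed components so the highest business impact is handled first."""
    return sorted(events, key=business_impact, reverse=True)


print(prioritize(["router-branch-12", "router-eu-core"]))
# ['router-eu-core', 'router-branch-12']
```

Components missing from the map score zero and sink to the bottom of the queue, which is a design choice worth revisiting if the map is incomplete.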
Reducing the Noise Level by 60% or More
The convergence of these and other technologies that support predictive intelligence can reduce the noise level by 60 to 70%, eliminating both sympathetic events and duplicate events, according to industry studies.
Predictive intelligence is an important byproduct of a comprehensive, properly integrated service assurance solution set. Service assurance solutions deliver adaptive, automated, and predictive technology across the enterprise, dramatically reducing the risk of service disruptions and delivering the consistent levels of service required by the business.
The process flow for predictive intelligence includes detection, diagnosis, isolation, and correction.
> IT incidents are first identified in the detect phase. Detection can be as simple as an end-user call to the service desk or may involve the proactive use of advanced IT tools.
> In the diagnose phase, duplicate and sympathetic events are drastically reduced, enabling you to determine what you should really focus on.
> By completion of the isolate phase, the technology determines, in effect, “What’s the business relevance of this? Is there one thing that’s more important than the others? Are there capacity problems I need to know about?”
> The final phase, correct, resolves issues in order of business priority. Automated fixes can even be embedded here to further streamline repetitive tasks, improving overall mean time to repair (MTTR).
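The four phases above can be sketched as a chain of small functions. This is a deliberately simplified, hypothetical pipeline: the diagnose step only removes exact duplicates (not sympathetic events), the isolate step ranks by a pre-assigned impact number, and the table of automated fixes is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    component: str
    check: str
    impact: int  # pre-assigned business impact; higher is more important


def detect(raw_events):
    # Detect: accept events from any source (monitoring tools, service desk calls).
    return list(raw_events)


def diagnose(events):
    # Diagnose: drop exact duplicates (same component and check), keeping the first.
    seen, unique = set(), []
    for e in events:
        key = (e.component, e.check)
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique


def isolate(events):
    # Isolate: rank what remains by business relevance.
    return sorted(events, key=lambda e: e.impact, reverse=True)


# Hypothetical automated fixes for known, repetitive problems.
AUTOMATED_FIXES = {"disk_full": "purge temp files"}


def correct(events):
    # Correct: work the list in priority order; use a scripted fix when one exists.
    return [(e.component, AUTOMATED_FIXES.get(e.check, "assign to operator")) for e in events]


raw = [
    Event("web01", "disk_full", impact=10),
    Event("web01", "disk_full", impact=10),
    Event("db01", "cpu_high", impact=90),
]
plan = correct(isolate(diagnose(detect(raw))))
print(plan)  # [('db01', 'assign to operator'), ('web01', 'purge temp files')]
```

Keeping each phase as a separate function makes it easy to swap in a smarter diagnose or isolate step without changing the rest of the flow.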
In summary, these converging technologies give IT organizations a way to keep advancing in maturity. They unify management across distributed and mainframe environments, both physical and virtual.
Predictive Intelligence Is Here Now
Many organizations are already profiting from the technologies that make up predictive intelligence. Here are some examples of companies that used one or more technological components of predictive intelligence:
> A large bank saved $2.2 million over a two-year period by avoiding the purchase of 400 servers, cut business service downtime by 50%, and achieved an ROI of 268%.
> A pharmaceutical company eliminated server hardware purchases for 24 months, saved $7 million in projected hardware and software purchases, increased virtual machine density by 3x, and realized ROI in three months.
> A large insurance company reduced incident alarms by 90%, and the alarms that remain are now more meaningful to the company.
> An international hospital group reduced time to resolution for critical incidents by 68% and achieved $1.2 million of annualized savings.
Strategic and Still Evolving
These successes provide a clear conclusion: predictive intelligence delivers fast payback in direct savings, increased revenue, faster MTTR, and better service levels to support strategic business goals. It frees IT organizations and their technical staff to work on more strategic projects, allowing them to deliver a more competitive, profitable business.