A framework for risk assessment based on analysis of historical information of workflow execution in IT systems
Article code | Publication year | Length of English article |
---|---|---|
21952 | 2011 | 22 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Computer Networks, Volume 55, Issue 13, 15 September 2011, Pages 2954–2975
English abstract
Services provided by modern organizations are usually designed, deployed, and supported by large-scale IT infrastructures. To obtain the best performance from these services, organizations must enforce rational practices for managing the resources that compose their infrastructures. A common point in most guides and libraries of best practices for IT management, such as ITIL or COBIT, is the explicit concern with the risks related to IT activities. Proactively dealing with adverse and favorable events that may arise during everyday operations might prevent, for example, delays in the deployment of services, cost overruns in activities, predictable failures of handled resources, and, consequently, waste of money. Although important, risk management in practice usually lacks automation and standardization in IT environments. Therefore, in this article, we introduce a framework to support the automation of some key steps of risk management. Our goal is to organize risk information related to IT activities to support decision making, thus making risk response planning simpler, faster, and more accurate. The proposed framework is targeted at workflow-based IT management systems. The fundamental approach is to learn from problems reported in the history of previously conducted workflows in order to estimate risks for future executions. We evaluated the applicability of the framework in two case studies, both in IT-related areas, namely IT change management and IT project management. The results show that the framework is useful not only to speed up the risk assessment process, but also to assist the decision making of project managers and IT operators by organizing detailed risk information in a comprehensive way.
English introduction
To deliver high-quality services to customers, modern organizations often end up employing large-scale Information Technology (IT) infrastructures, typically composed of heterogeneous physical and logical resources such as routers, firewalls, servers, end-user hosts, network protocols, and software packages. As IT services are designed, deployed, maintained, and improved, organizations can run into problems of, for example, scalability and management complexity. To achieve better outcomes from the services provided and avoid wasting substantial resources, rational practices in the management of IT infrastructures must be enforced. To this end, some best practice standards and libraries have been published, aiming to provide guidance for proper IT management. Two of the most widely recognized guides are the Information Technology Infrastructure Library (ITIL) [1], proposed by the Office of Government Commerce (OGC), and the Control Objectives for Information and related Technologies (COBIT) [2], introduced by the Information Systems Audit and Control Association (ISACA). An explicit concern of the IT management guides is the need to manage risks associated with an organization’s IT activities. This is emphasized by the fact that both OGC and ISACA have published specific documents for corporate IT risk management: the Management of Risk (M_o_R) [3] from OGC, and Risk IT [4] from ISACA. According to M_o_R, to achieve their objectives, organizations must necessarily take a certain amount of risk. It is thus the role of the risk management discipline to help organizations deal methodically with the risks associated with their activities. Usually, organizations regard risks as uncertain events or conditions that, if they occur, may affect the accomplishment of business goals.
Those events, along with the conditions that represent risks to the business, should be identified and assessed in terms of probability of occurrence and possible impact to the business objectives. Although the literature recommends tackling both negative (threats) and positive (opportunities) effects of risks, in practice, negative effects are far more considered in real IT environments. This results in current risk management practices being in fact strongly focused on the prevention and mitigation of harm. The risk management discipline is based on four logically sequential and cyclic processes [3]: (i) identification of possible threats and opportunities to the objectives of a given organizational activity, (ii) assessment of identified risks in terms of probability of occurrence and associated impact (i.e., estimation of possible losses or earnings), (iii) response planning for preventive and reactive responses to identified risks, aiming to minimize threats and enhance opportunities, and (iv) implementation and monitoring of the planned responses in order to tackle risks, evaluate the effectiveness of preventive actions, and occasionally dispatch corrective ones. Along all these processes, it is important that organizations adopt a common set of internal policies and strategies for risk management to be shared among their departments and teams. Some of these policies and strategies, for example, may define tolerance thresholds, scales for estimating probabilities and impacts, and tools for documenting, reporting, and communicating risks. Despite all best practices and recommendations, the experience of practitioners shows that there is little evidence that risk management is being efficiently applied in a systematic and repeatable way. In fact, standard guides like ITIL or COBIT only provide high level guidelines for general purpose risk management in a textual descriptive form. 
Very little information is given about how to actually implement these standards in practice, and most of the proposed processes are assumed to be manual. Recently, some authors have investigated the actual benefits and shortcomings of different approaches for risk management in real-life environments [5], [6] and [7]. These investigations expose many issues in current risk management practice, such as inadequate documentation, little knowledge reuse, and lack of tools to automate, report, monitor, and support decision making. In the end, the quality of risk-related decisions often depends too heavily on the experience of IT managers. Current risk management practice usually depends excessively on people, thus becoming a time- and resource-consuming, occasionally counterproductive task. Considering today’s risk management scenario, we emphasize that one of the major problems in risk management is the lack of automation and system-assisted routines. In this research, we pay special attention to problems in the risk assessment process, in which risks are evaluated in terms of probability of occurrence and possible impact. The risk assessment process is usually based on interviews and brainstorming sessions with the stakeholders involved, in a very ad hoc fashion. Since the quality of risk-related decisions and response planning depends directly on the accuracy of risk assessment, the use of automated tools to assist IT managers in achieving more precise estimations becomes a key factor for the success of risk management as a whole. Several authors have investigated ways to support risk management in specific contexts or situations [8], [9], [10], [11], [12] and [13]. In previous investigations by our research group, we have also proposed targeted solutions to enable some degree of automation in estimating probabilities and impacts in risk assessment [14], [15] and [16].
Our main goal in this article is to consolidate the previously proposed approaches into a single unified framework that supports the automation of key risk management processes, aiming to make them simpler, faster, and more accurate. The proposed framework is based mostly on best practices proposed in the aforementioned standards and libraries (e.g., ITIL and M_o_R). In this work, we focus on risk assessment for workflow-based systems designed for the management of IT infrastructures and services. There are many types of IT management processes that can be modeled in the form of workflows, such as change management, project management, portfolio management, and incident management. The advantage of using workflows lies in the fact that they define a sequence of fine-grained activities to be executed in a given order, and the details of the execution of these activities (including reports of adverse and favorable events) can be recorded in logs for further analysis. Our approach encompasses the automated analysis of logs of previously executed workflows in order to learn from events reported in the past, aiming to help in the design of better workflows for future execution. As a proof of concept for our solution, two case studies are taken from two IT-related areas, namely IT change management and IT project management. The former presents general guidelines for consistently conducting changes over IT infrastructures, from early specification, planning, and deployment through to evaluation and review [17]. The latter is focused on the design phase of services, aiming to ensure that a project meets its objectives while avoiding waste of resources [18] and [19]. These areas are relevant in the context of IT infrastructure and service management since they have received much attention from both academia and industry in recent years.
Moreover, both projects and changes can be organized in the form of workflows and therefore may have their risks assessed using the unified framework proposed in this work. The remainder of this article is organized as follows. Section 2 reviews the available literature on risk management, especially as related to IT Change Management and IT Project Management. Section 3 presents background concepts that are fundamental to understanding and motivating the proposal of our framework. Section 4 presents the information models employed to represent workflows and their executions. Section 5 introduces the conceptual framework itself, the algorithms used for impact and probability estimation, the strategies for calculating similarity among workflows, and risk summarization. Section 6 discusses the results from the evaluation of both case studies. Finally, Section 7 concludes the article with final remarks and future work.
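The log-based idea outlined in this introduction, learning from events reported in past workflow executions to estimate probability and impact, can be illustrated with a minimal Python sketch. This is not the algorithm from the article, whose details are given in its Section 5 and are not reproduced on this page. The record schema (`EventRecord`), the function name `assess_risks`, the frequency-based probability, the mean-based impact, and the `exposure` product are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    """One adverse event reported in a past workflow execution (hypothetical schema)."""
    execution_id: str
    category: str   # e.g. "hardware failure", "human error"
    impact: float   # reported loss, in arbitrary cost units


def assess_risks(total_executions, events):
    """Estimate probability and mean impact per event category.

    Assumption: probability is the fraction of past executions in which at
    least one event of the category was reported, and impact is the mean of
    the reported losses. Real estimators may weight records differently.
    """
    if total_executions <= 0:
        raise ValueError("total_executions must be positive")

    affected = defaultdict(set)   # category -> ids of executions with that event
    impacts = defaultdict(list)   # category -> reported impact values
    for event in events:
        affected[event.category].add(event.execution_id)
        impacts[event.category].append(event.impact)

    report = {}
    for category, executions in affected.items():
        probability = len(executions) / total_executions
        mean_impact = sum(impacts[category]) / len(impacts[category])
        report[category] = {
            "probability": probability,
            "impact": mean_impact,
            "exposure": probability * mean_impact,  # simple probability x impact product
        }
    return report


# Illustrative history: 4 past executions, 3 reported events (made-up data).
history = [
    EventRecord("wf-1", "hardware failure", 40.0),
    EventRecord("wf-1", "human error", 10.0),
    EventRecord("wf-3", "hardware failure", 60.0),
]
report = assess_risks(total_executions=4, events=history)
```

With this made-up history, "hardware failure" appears in two of four executions, so its estimated probability is 0.5 and its mean impact is 50.0. Grouping by category mirrors the event classifications that the article says make assessment results more meaningful to operators.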
English conclusion
In this article, the current need of organizations to enforce rational practices for the management of IT infrastructures and services has been discussed. Among the many aspects covered by widely employed standards of best practice for IT management, such as ITIL introduced by OGC and PMBOK presented by PMI, the concern with risk management is remarkable. Guidelines from both the M_o_R framework (also from OGC) and the Project Risk Management knowledge area of PMBOK guide the efforts of many modern organizations that want to deal rationally with their risks. Despite all the guidelines and best practices provided by these standards, this research has shown that, in practice, risk management procedures are adopted in a very ad hoc fashion. Lack of automation, standardization, and knowledge reuse are some of the causes that make risk management inefficient and sometimes counterproductive in actual environments. Therefore, we have introduced a novel framework with the objective of helping in the risk management process, focusing particularly on workflow-based IT management systems. This objective is pursued, first, by gathering risk-related information from the execution records of past workflows and learning from them in order to assess the probability and impact factors of risky events. This kind of data gathering procedure, when performed based only on human experience, tends to be time- and resource-consuming and sometimes too imprecise to guide decision making. Another relevant contribution of the proposed framework is that risk information is organized in interactive and comprehensive reports. This gives operators and managers an overview of the automatically assessed risks at different levels of detail, helping them quickly identify threats and efficiently direct risk mitigation efforts.
The case studies presented in two different scenarios, namely IT change management and IT project management, have shown that the framework is applicable to at least two different environments and that it can be customized to better reflect the specific needs of each situation. Although not exhaustive, the results indicate that the proposed framework is generic and may be applied to a wider range of environments. The main contribution of this research is the proposed framework itself and the way risk-related information flows through its modules, independently of how each module internally performs its calculations. As previously mentioned, the framework has the objective of helping with risk management by automating certain procedures, such as data gathering for estimations of probability and impact. This problem is too complex to tackle with a single monolithic solution. Creating a modular framework makes it possible to break the whole problem down into smaller and less complex parts that can be handled individually. Adopting such an approach also makes it easier to customize some parts of the framework in order to better reflect the needs of a particular environment, as discussed in the first case study (Section 6.1). Moreover, there are some other contributions in the context of this research that are worth mentioning. First, classifications of events that represent risks have been proposed, as presented in our two case studies. These classifications have proven useful for grouping events together in a way that reflects the concerns of operators and managers, thus making the results of risk assessment more meaningful. Additionally, a strategy to calculate similarity among workflows has been introduced, which enabled knowledge reuse in automated risk assessment even when analyzing newly designed workflows. Different algorithms have been presented to calculate probabilities and impacts of events considering the nuances of the analyzed environment.
Finally, strategies to categorize and summarize risk information aiming to present more comprehensive and interactive risk reports have been proposed. Future investigations could extend the framework and apply it to other scenarios, such as incident management or portfolio management, as long as they employ workflow-based management systems. In both case studies presented in this article we have performed risk analysis over randomly generated workflow execution records and emulated IT environments. In order to evaluate the accuracy of predictions performed by our framework, it would be of great value to use data from real life IT management systems and compare the results with the actual measurements of these systems. Moreover, it would be interesting to conduct a survey and receive feedback from experienced managers, operators, and other personnel involved in IT operations to evaluate the usability of the proposed risk reports.
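The conclusion mentions a strategy for calculating similarity among workflows so that past records can inform the assessment of newly designed ones. This page does not say which measure the article uses, so the sketch below substitutes plain Jaccard similarity over sets of activity names as a stand-in. The function names, the representation of a workflow as a list of activity labels, and the sample workflows are all hypothetical.

```python
def jaccard_similarity(activities_a, activities_b):
    """Jaccard similarity between two workflows seen as sets of activity names.

    Assumption: a workflow is reduced to its set of activity labels; ordering
    and affected resources are ignored in this simplified stand-in measure.
    """
    a, b = set(activities_a), set(activities_b)
    if not a and not b:
        return 1.0  # two empty workflows are treated as identical
    return len(a & b) / len(a | b)


def rank_similar(new_workflow, past_workflows):
    """Return (workflow_id, similarity) pairs, most similar first."""
    scored = [
        (workflow_id, jaccard_similarity(new_workflow, activities))
        for workflow_id, activities in past_workflows.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up workflows for illustration only.
new_workflow = ["backup db", "stop service", "install patch", "start service"]
past_workflows = {
    "wf-1": ["stop service", "install patch", "start service"],
    "wf-2": ["install router", "configure vlan"],
}
ranking = rank_similar(new_workflow, past_workflows)
```

Here `wf-1` shares three of the four distinct activities and scores 0.75, while `wf-2` shares none and scores 0.0. Under this simplification, event records from `wf-1` would be the natural candidates to reuse when assessing the new workflow.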