Integrating information retrieval and data mining to discover project team coordination patterns
Article code | Publication year | Length of English article |
---|---|---|
22089 | 2006 | 14 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Decision Support Systems, Volume 42, Issue 2, November 2006, Pages 745–758
English abstract
This study integrates information retrieval and data mining techniques to discover project team coordination patterns from project documents written in Chinese. The coordination pattern of a project team describes the project execution process, including task category, execution sequence and duration, as well as the team member cooperation. The integration comprises two phases. The first phase extracts the most relevant keywords describing tasks executed by projects from unstructured or semi-structured documents using the mutual information estimate and the term weighting system. A concept hierarchy tree generated using the hierarchical clustering technique represents multiple levels of task categories. The second phase discovers project team coordination patterns through sequential pattern analysis. The proposed approach obtains encouraging results by mining coordination patterns from information system development projects. In the present era of the knowledge economy, the application of groupware to facilitate team coordination and collaboration streamlines the collection and analysis of project documents throughout the project life cycle. A project manager can visualize the project execution process of a team, and can anticipate the project outcomes based on discovered team coordination patterns. Accordingly, the proposed approach can be adapted to team projects that share certain characteristics with information system development projects.
English introduction
A teamwork approach for executing projects involves various roles and tasks conducted within a period of time. Numerous business practices, such as new product development and information system development, adopt the teamwork approach. Applying groupware to facilitate team coordination and collaboration streamlines the collection and analysis of project documents throughout the project life cycle. Team coordination patterns can be retrieved from these electronic documents, which may be either semi-structured or unstructured. The coordination pattern of a project team describes the project execution process, including the task category, sequence and duration, and the cooperation degree among team members.
For example, consider an information system (IS) development project completed by seven team members (A1, A2, …, A7) in six weeks. Such a project generally includes tasks such as requirement determination, data and process modeling, database and interface design, implementation, testing, installation, and documentation [11]. During the first two weeks, two system analysts (A1, A2) worked together to identify customer requirements and to design data flow diagrams (DFD) and entity-relationship diagrams (ERD). The next two weeks were used for prototyping the system after designing the database and completing the structure charts; four members (A2, A3, A4, A5) took charge of these tasks. During the final two weeks, the finished system was tested, installed, and cut over to users. The main tasks, namely system testing, document editing, and user training, were performed by team members A1, A6 and A7.
The discovery of coordination patterns may help project managers assign tasks, allocate resources, and evaluate performance. Techniques for identifying team coordination patterns require joint efforts from researchers in information retrieval and data mining.
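The six-week example above can be written down as structured records, one per two-week window. This is only a minimal sketch of what a coordination pattern might look like as data: the `WindowRecord` class, its field names, and the task labels are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class WindowRecord:
    """One time window of a project: which tasks ran and who worked on them.

    Hypothetical structure, introduced here only for illustration.
    """
    weeks: Tuple[int, int]      # inclusive week range of the window
    tasks: FrozenSet[str]       # task labels paraphrased from the example
    members: FrozenSet[str]     # team members active in the window


# The IS development example, split into three two-week windows.
project: List[WindowRecord] = [
    WindowRecord((1, 2),
                 frozenset({"requirement determination", "DFD design", "ERD design"}),
                 frozenset({"A1", "A2"})),
    WindowRecord((3, 4),
                 frozenset({"database design", "structure charts", "prototyping"}),
                 frozenset({"A2", "A3", "A4", "A5"})),
    WindowRecord((5, 6),
                 frozenset({"system testing", "document editing", "user training"}),
                 frozenset({"A1", "A6", "A7"})),
]


def members_involved(records):
    """Return every member who appears in at least one window."""
    return set().union(*(r.members for r in records))


def shared_members(a, b):
    """Return members who worked in both windows (a crude cooperation signal)."""
    return a.members & b.members


print(sorted(members_involved(project)))
print(sorted(shared_members(project[0], project[1])))
```

Keeping tasks and members as sets per window makes it easy to ask who carried work over from one phase to the next, which is one simple way to look at cooperation among team members.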
Efforts to design methods for retrieving information from unstructured documents have been presented at the Text Retrieval Conference (TREC, http://trec.nist.gov/). Data mining involves the exploration and analysis of large quantities of data to discover meaningful patterns and rules using automatic or semiautomatic methods [2]. Knowledge discovery in databases (KDD) is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [7]. The KDD process can be divided into five phases: selection, preprocessing, transformation, data mining, and interpretation and evaluation. Common model functions in current data mining practice include classification, regression, clustering, summarization, dependency modeling, link analysis, and sequential pattern analysis. However, KDD research has focused on structured data; thus existing KDD tools are of limited usefulness in mining knowledge from unstructured or semi-structured documents [8] and [9].
This work attempts to integrate information retrieval and data mining techniques for generalizing, visualizing, and forecasting project team coordination patterns. The integration is conducted in two phases. During the first phase, approaches developed in information retrieval research are employed to label each document with a set of keywords that represent the main concepts extracted from the document. These keywords are converted to a structured format, and represent the tasks, participants, resources, and time involved in the project execution. After the structured project description is obtained, the second phase applies sequential pattern analysis to generalize the coordination patterns, and then displays them in graphical user interfaces. Sequential pattern analysis is a data mining technique that generalizes sequential patterns from transactions occurring in different time periods.
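To make the idea of sequential pattern analysis concrete, the sketch below counts how often an ordered pair of tasks appears across several projects and keeps the pairs that meet a minimum support. It is a brute-force illustration under simplifying assumptions (one task per time period, fixed pattern length), not the algorithm used in the paper; the function name `frequent_sequences` and the sample task sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_sequences(sequences, length, min_support):
    """Return ordered task patterns of the given length with their support.

    Support is the fraction of projects that contain the pattern as a
    subsequence (order preserved, gaps allowed). Each pattern is counted
    at most once per project.
    """
    counts = Counter()
    for seq in sequences:
        # combinations() over positions preserves the original order.
        seen = {tuple(seq[i] for i in idx)
                for idx in combinations(range(len(seq)), length)}
        counts.update(seen)
    n = len(sequences)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


# Hypothetical task sequences, one list per project, ordered by time period.
projects = [
    ["requirements", "design", "implementation", "testing"],
    ["requirements", "design", "testing"],
    ["requirements", "implementation", "testing"],
    ["design", "requirements", "testing"],
]

patterns = frequent_sequences(projects, length=2, min_support=0.75)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

Enumerating every subsequence is only practical for short sequences; established sequential pattern mining algorithms prune candidates level by level instead.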
A project manager can compare the progress of on-going projects with the generalized coordination patterns to predict the project outcomes. The integration of information retrieval and data mining techniques contributes to the discovery of project team coordination patterns by using sequential pattern analysis. The integration of the aforementioned two techniques for mining knowledge from unstructured or semi-structured documents sheds light on project management. Research on the PLANMINE system relates closely to the present study [23]. The PLANMINE sequence mining algorithm predicts plan failures based on extracted event patterns. Each plan is labeled as good or bad depending on whether it succeeds or fails to achieve its goals. PLANMINE attempts to find event sequences that can be used to confidently predict plan failure. First, the planner uses simulation to generate a database of good or bad plans. These plans are fed into the mining engine to identify event patterns of bad plans, which are used to fix the plan and prevent failures. The pattern generation and plan modification loop is executed repeatedly until no further improvement is obtained. Second, the high confidence patterns are used to generate a plan monitor that notifies a plan manager before the failure of a new plan. PLANMINE was demonstrated using two planning applications: TRIPS and IMPROVE. TRIPS is a collaborative planning system for designing an evacuation plan by using simulation and data mining for plan analysis. IMPROVE first simulates a plan repeatedly and calls PLANMINE to extract high confidence rules for predicting plan failure. IMPROVE then applies qualitative reasoning and plan adaptation techniques to suggest actions for reducing the likelihood of failure. In summary, PLANMINE generates and simulates candidate plans to discover high frequency patterns among the bad plans. 
In contrast, the approach proposed in this paper gathers documents from the project execution, and integrates information retrieval and sequential pattern mining techniques to generate project team coordination patterns. This study presents the proposed integration approach by discovering the coordination patterns from documents written in Chinese in information system (IS) development projects. The progress reports and meeting minutes from the project execution provide rich sources for a project manager not only to identify project status but also to generalize team coordination patterns to predict project performance. Since project composition generally includes tasks, resources, time schedule, and performance evaluation, the proposed approach can be applied to other business processes with similar settings to identify their coordination patterns, and in turn, to improve their performance. The rest of this paper is organized as follows. Section 2 describes techniques for information retrieval and data mining used in this study. The framework for discovering coordination patterns from project documents is then presented in Section 3. Next, Section 4 demonstrates the effectiveness of the proposed approach in discovering coordination patterns from IS development projects. Finally, Section 5 concludes this research and identifies directions for future research.
English conclusion
This study demonstrates the integration of information retrieval and data mining techniques for discovering project team coordination patterns. Information retrieval techniques, such as keyword extraction, task similarity analysis, and concept hierarchy tree generation, are used to transform unstructured or semi-structured project documents into a structured data schema with attributes for task category, sequence, duration, and participants. Project coordination patterns represented by the structured data schema are discovered using the sequential pattern analysis method. A project manager can use the discovered patterns to visualize the task execution sequence, duration, and cooperation degree of a project team. A project manager can also forecast the outcomes of an on-going project by matching the project with existing patterns. An example application on discovering coordination patterns from information system development projects was used to illustrate the proposed methods. This study sheds light on the performance improvement of information system development projects by learning from past project execution experiences. Other team projects with similar settings can adopt the proposed approach to enhance project management. This study has some limitations. First, this study may not accurately capture concepts from documents because it ignores syntax and semantics when extracting terms. Second, a task that spans consecutive time windows was encoded as different tasks owing to the limitations of the sequential pattern analysis method, which only deals with discrete events. Third, setting a unit time as a window size depends heavily on the task characteristics, and limits pattern transfer to other tasks which have different time spans.
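The second and third limitations can be illustrated with a small sketch of window-based encoding. The function `encode_by_window`, the task names, and the time spans below are assumptions made for illustration; the paper's actual encoding scheme is not reproduced here.

```python
def encode_by_window(task_spans, window, horizon):
    """Turn continuous task spans into discrete per-window events.

    task_spans maps a task name to (start, end) in time units, end exclusive.
    Returns one frozenset of active tasks per window of the given size.
    """
    windows = []
    for start in range(0, horizon, window):
        end = start + window
        # A task is active in a window if its span overlaps the window.
        active = frozenset(t for t, (s, e) in task_spans.items()
                           if s < end and e > start)
        windows.append(active)
    return windows


# Hypothetical spans in weeks: a three-week task followed by a one-week task.
spans = {"database design": (0, 3), "prototyping": (3, 4)}

# One-week windows: the single long task shows up as three separate events.
for w in encode_by_window(spans, window=1, horizon=4):
    print(sorted(w))

# Two-week windows: the same project now yields a different event sequence.
for w in encode_by_window(spans, window=2, horizon=4):
    print(sorted(w))
```

With one-week windows the three-week task is repeated in three consecutive events, and with two-week windows the two tasks appear to overlap, so patterns mined under one window size do not carry over directly to another.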
Several directions remain for future research. First, the proposed approach can be applied to enterprise information system development projects, where uneven project scope, duration, and interaction may create the need for new structured data encoding methods or for extensions of sequential pattern analysis methods. Second, key term extraction during the information retrieval stage can be improved by developing further information retrieval methods, such as semantic key term extraction and ontology-based extraction. Third, the proposed approach can be applied to team projects other than information system development to evaluate its general applicability.