Existing theory has framed the process of information extraction and agglomeration, also referred to as the knowledge discovery (KD) process, as a series of strategic search decisions, subject to constraints, with the objective of attaining a sufficient level of domain-specific knowledge for use in strategic planning. Supported by the experiences of firms representative of Client, Developer, and Third-party segments of the data mining (DM) community, this work provides an extension to this basic framework. The implications provided suggest a wealth of untapped opportunities in the area of KD research.
The adoption of enterprise resource planning (ERP) systems over the last fifteen years has been accompanied by an explosion of readily available transactional data. Sales figures, human resource activity, stock-outs, and defect occurrences are only some of the issues on which the modern corporation now has data. Often, however, the relevance of this data comes from the information that can be derived from examining multiple issues simultaneously and from the ability to draw inferences critical to strategic planning [2]. These benefits contribute to the firm’s business intelligence and subsequently to its overall competitive advantage [9] and [20]. The challenge faced by firms, and by the analysts charged with manipulating this data, rests in their ability to provide acceptable levels of strategically applicable information within the allowable effort, in time and money.
One of the fundamental problems of information extraction is that the formats of available data sources are often incompatible, requiring extensive conversion efforts. In an attempt to reduce this difficulty, several ERP systems, such as SAP and Baan, have embedded means by which to organize and archive the transactional data in their application databases. The incorporation of such data warehousing schemes has interested both researchers and practitioners. Knowledge discovery (KD) describes both the overall process by which information is then extracted/agglomerated and the domain dedicated to research on it [8] and [27].
There has also recently been a concentrated effort to provide data mining (DM) tools able to assist analysts faced with unstructured KD tasks. The distinction between KD and DM concepts remains ambiguous, however, in spite of efforts by various researchers to provide clear definitions and examples of their differences [3]. Of the distinguishing features discussed, the most common are the iterative and process-oriented nature of KD and its emphasis on the development of strategic knowledge and domain understanding. Data mining, on the other hand, is discussed in terms of specific applications and tools for finding rules and relationships among the data.
From a knowledge management standpoint, DM tools allow for the creation of well-defined transferable information [18]. KD processes, in contrast, are also characterized by data retrieval, data cleansing, criteria specification, and performance analysis. KD processes agglomerate the interim information found by such techniques as data mining to generate understanding and domain knowledge. In an adaptation of the scheme proposed by Haeckel and Nolan in 1993 [10] and inspired by the linkages to decision theory proposed by Kuhlthau [17], the relationships can be depicted hierarchically as in Fig. 1.
A particular KD task may have many different goals, including the derivation of dependent relationships, the development of forecasts, and classification. The product is an agglomeration of such information, organized in a format that can be applied as knowledge ultimately relevant to a downstream planning activity. At any one iteration in the process, alternate levels of task information become applicable, and thus alternate extraction techniques (DM tools) may prove superior. However, while the majority of recent KD literature has focused on the efficiency of the algorithmic tools, little attention has been given to the strategic nature and internal dynamics of the discovery process. Since the overall success of a KD process depends upon both process efficiency and quality of output, and since certain constraints, such as time, may apply to an overall knowledge discovery task, an analysis of the dynamic nature of these strategies is needed to effect improvements.
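To make the framing of KD as a sequence of constrained search decisions more concrete, the following is a minimal illustrative sketch rather than a model proposed in this work. It assumes a greedy analyst who, at each iteration, chooses the DM tool with the best expected knowledge gain per unit of effort, and who stops once a sufficient knowledge level is reached or the effort budget is exhausted. The names `Tool` and `run_kd`, the diminishing-returns factor `decay`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """A hypothetical DM tool available to the analyst."""
    name: str
    cost: float         # assumed effort per use (e.g. analyst hours)
    gain: float         # assumed knowledge gained on first use
    decay: float = 0.5  # assumed diminishing returns on repeated use


def run_kd(tools, budget, target):
    """Greedy sketch of an iterative KD process under an effort budget.

    Each iteration picks the affordable tool with the highest expected
    gain per unit cost. The loop ends when the target knowledge level is
    reached or no tool fits in the remaining budget.
    """
    knowledge, spent, history = 0.0, 0.0, []
    uses = {t.name: 0 for t in tools}

    def expected_gain(t):
        # Gain shrinks geometrically each time the same tool is reused.
        return t.gain * t.decay ** uses[t.name]

    while knowledge < target:
        affordable = [t for t in tools if spent + t.cost <= budget]
        if not affordable:
            break  # constraint binds before sufficiency is reached
        best = max(affordable, key=lambda t: expected_gain(t) / t.cost)
        knowledge += expected_gain(best)
        spent += best.cost
        uses[best.name] += 1
        history.append(best.name)
    return knowledge, spent, history


if __name__ == "__main__":
    # Illustrative values only; they are not drawn from the study.
    tools = [
        Tool("classification", cost=2.0, gain=4.0),
        Tool("association_rules", cost=1.0, gain=1.5),
        Tool("forecasting", cost=3.0, gain=2.7),
    ]
    print(run_kd(tools, budget=6.0, target=8.0))
```

Even in this simplified form, the preferred tool changes from one iteration to the next as interim information accumulates, which is the dynamic behavior that an analysis of KD strategies would need to capture.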