Expert systems are built from knowledge traditionally elicited from the human expert. It is precisely knowledge elicitation from the expert that is the bottleneck in expert system construction. On the other hand, a data mining system, which automatically extracts knowledge, needs expert guidance on the successive decisions to be made in each of the system phases. In this context, expert knowledge and data mining discovered knowledge can cooperate, maximizing their individual capabilities: data mining discovered knowledge can be used as a complementary source of knowledge for the expert system, whereas expert knowledge can be used to guide the data mining process. This article summarizes different examples of systems where there is cooperation between expert knowledge and data mining discovered knowledge and reports our experience of such cooperation gathered from a medical diagnosis project called Intelligent Interpretation of Isokinetics Data, which we developed. From that experience, a series of lessons were learned throughout project development. Some of these lessons are generally applicable and others pertain exclusively to certain project types.
Expert knowledge and discovered knowledge are two powerful tools that can be combined. Used together, each reinforces the strengths of the other.
An expert system operates on a knowledge base that contains the knowledge elicited from the expert (EK). This knowledge base is represented using some formalism (rules, frames, Bayesian networks, etc.); it is built by the knowledge engineer from the elicited expert knowledge and later validated by the expert. The system is therefore limited by the amount of knowledge represented in its knowledge base. The bottleneck in expert system construction is precisely knowledge elicitation, a phase subject to many constraints, ranging from the number of available experts and their level of expertise to the complexity of the elicitation process itself.
Recently, automatic knowledge acquisition techniques have attracted a lot of interest as they are potentially a big help for remedying this bottleneck. The knowledge discovery in databases (KDD) process, especially data mining techniques, is used to automatically discover knowledge from data. The knowledge discovered by data mining (DMK) is implicit in the data and can take the shape of patterns or models that fit the data, trends in temporal data, associations among different data features, rules, etc.
The key point is that these two approaches, knowledge elicitation from experts and knowledge discovery from data, complement each other (da Silva et al., 2002, Daniels and van Dissel, 2002, de la Vega et al., 2010 and Weiss et al., 2003). Applied together, they can be used to build better systems: data mining techniques can be used to support the different tasks involved in expert system (ES) or knowledge-based system (KBS) development (Flior et al., 2010, Mejía-Lavalle and Rodríguez-Ortiz, 1998, Phuong et al., 2001 and Wang et al., 2004), and expert knowledge can be used to facilitate and improve the results of the different stages of the KDD process (Kusiak and Shah, 2006 and Zhang and Figueiredo, 2006).
The aim of this article is to describe the key results of this interaction between EK and DMK and to highlight the lessons learned over the years from our own experience in the medical field. To do so, we present a long-term project called I4 (Intelligent Interpretation of Isokinetics Data), which integrates expert systems and data mining techniques to process isokinetics data. We believe that the results of and the lessons learned from this project are potentially useful for developing systems incorporating EK and DMK.
The remainder of the article is organized as follows. Section 2 describes related work, analyzing other applications that present some facet of this type of cooperation. In Section 3 we outline our I4 project. Sections 4, 5 and 6 describe the three I4 project phases: expert system development, data mining and symbolic data mining. In Section 7 we summarize the lessons learned. Finally, Section 8 outlines some conclusions.
Expert knowledge and data mining discovered knowledge do not have to be two separate problem-solving alternatives. They can be used together, complementarily, to develop, validate and maintain a KBS. In this article, we have highlighted some real examples of how this cooperation can be exploited to build better systems and optimize the resulting system performance. In some cases, the cooperation between these two fields can lead to the construction of systems that would not have been built if it were not for the positive effects of that cooperation.
We detailed a real example of this cooperation in the I4 project, which our research group developed over several years. This project started off as a typical expert system development, but later had to incorporate the development of a data mining system intrinsically linked to the expert system. This second part of the project led to the implementation of a system with numerous functionalities that would not have been possible if only one of the paradigms had been used: EK or DMK.
We were able to draw numerous conclusions from the experience acquired in developing this project, which we set out in Section 7 of this article as lessons learned.
As regards the cooperation between EK and DMK during project development, it is worth mentioning that there were several types of cooperation in the I4 system:
• Expert functions remove incorrect tests, eliminate incorrect extensions and flexions and remove noise before applying the numerical DM for pattern discovery. They also check that the medical protocols for the tests were correctly applied.
• Expert knowledge was used to select and validate the patterns discovered by the numerical DM system. Once the candidate patterns had been discovered, the expert selected and validated the relevant patterns.
• Expert knowledge was used for guidance at the beginning of the reference model generation by semi-automatically selecting the population to be used.
• In the symbolic stage, expert knowledge was used to generate the vocabulary, gather symbolic data from the numerical data and define the weights used in the symbolic distance.
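As an illustration only, the first and last types of cooperation listed above can be sketched in a few lines of Python. This is not the I4 implementation: the rule functions, the vocabulary thresholds, the symbol names and the cost table are all hypothetical placeholders standing in for knowledge that, in a real project, would come from the expert.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Series = Sequence[float]

# Hypothetical expert rules: each returns True when a test is acceptable.
def not_empty(series: Series) -> bool:
    return len(series) > 0

def no_negative_values(series: Series) -> bool:
    return all(v >= 0 for v in series)

EXPERT_RULES: List[Callable[[Series], bool]] = [not_empty, no_negative_values]

def expert_filter(tests: Sequence[Series],
                  rules: Sequence[Callable[[Series], bool]] = EXPERT_RULES) -> List[Series]:
    """Keep only the tests that satisfy every expert rule (pre-mining cleanup)."""
    return [t for t in tests if all(rule(t) for rule in rules)]

# Hypothetical expert vocabulary: (exclusive upper bound, symbol).
VOCABULARY: List[Tuple[float, str]] = [
    (50.0, "low"),
    (100.0, "medium"),
    (float("inf"), "high"),
]

def to_symbols(series: Series,
               vocabulary: Sequence[Tuple[float, str]] = VOCABULARY) -> List[str]:
    """Map each finite numeric value to the first symbol whose bound exceeds it."""
    return [next(sym for bound, sym in vocabulary if v < bound) for v in series]

# Hypothetical expert weights: cost of confusing one symbol with another.
SYMBOL_COST: Dict[Tuple[str, str], float] = {
    ("low", "medium"): 1.0,
    ("medium", "high"): 1.0,
    ("low", "high"): 3.0,
}

def symbolic_distance(a: Sequence[str], b: Sequence[str],
                      cost: Dict[Tuple[str, str], float] = SYMBOL_COST) -> float:
    """Sum the expert-weighted costs of the positions where two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if x != y:
            # The cost table is symmetric, so look the pair up in either order.
            total += cost[(x, y)] if (x, y) in cost else cost[(y, x)]
    return total

if __name__ == "__main__":
    raw_tests = [[10, 60, 120], [], [20, -5, 30], [10, 120, 120]]
    clean = expert_filter(raw_tests)
    symbols = [to_symbols(t) for t in clean]
    print(symbols)
    print(symbolic_distance(symbols[0], symbols[1]))
```

In this sketch the expert contributes three separate artefacts (rules, vocabulary and weights), each of which can be revised without touching the mining code, which is one plausible way of keeping EK and DMK loosely coupled.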
The originality of the I4 system lies in the fact that an ES was built that directly intervenes in the KDD process. Once this process is complete, the discovered knowledge can be fed back to the ES.
From our experience in the I4 project and from the other examples described, we can conclude that, whichever direction it takes, the cooperation between these two disciplines helps to build superior and better validated systems containing more, higher quality knowledge.