This study develops an ontology building process for extracting conceptual tags and hierarchies from a textual corpus. Although humans have been creating ontologies for many years, processes for building ontologies efficiently from a textual corpus remain extremely ad hoc. Several issues have been identified, including how to recognize terminology in textual documents, how to name concept tags from that terminology, and how to derive conceptual hierarchies among concepts. The proposed approach combines extraction techniques to produce an ontology prototype for editors. The empirical feedback indicates that this elicitation synergy is productive during the early stages of ontology building. Additionally, the elicitation synergy is especially useful for ontology editors who lack reference models of a working domain and who rely on a textual corpus as their major knowledge source.
Ontologies are built to establish a classification or conceptualization in knowledge-related disciplines, and they have long been used to express shared human understanding of information. In information technology, an ontology is “a specification of a conceptualization”, as defined by Gruber (1993), where a conceptualization is an abstract, simplified world view used for representational purposes. Noy and McGuinness (2001) summarized the reasons for developing an ontology as follows: sharing a common understanding of information structure among people or agents, enabling reuse of domain knowledge, and clarifying domain assumptions. Various studies present extensive evidence that ontologies are used in information technology to improve existing Web-based applications (García-Sánchez et al., 2005 and Staab et al., 2000), document management (Martin and Eklund, 2000 and Motta et al., 2000), and agent negotiation (Huhns and Singh, 1997 and Khedr and Karmouch, 2005). Ontology techniques also enable knowledge, semantics, and intelligence in application systems. The advantages of using ontologies include permitting more disciplined knowledge base design and facilitating knowledge sharing and reuse (Fernandez-Breis and Martinez-Bejar, 2000 and van Elst and Abecker, 2002).
Wide agreement exists that, when applying an ontology-based system, experts must focus on specific domain problems and provide a common understanding of individual concepts. However, eliciting cognition from the real world and designing the concepts of an ontology from it remain challenging. Human experts need clear and proper ontologies in order to use information systems, yet building ontologies is extremely time-consuming and requires considerable human effort (Sugumaran & Storey, 2002). Ontology building becomes increasingly difficult when neither systematic categories nor a predefined taxonomy is available. For example, building an ontology of daily events is harder than building one in the biological sciences. Additionally, human experts can be important in creating an ontology, but they have difficulty arriving at widely recognized, impersonal perspectives. Restated, ontology building is more of a craft than an engineering task. To extend the reference sources available for building ontologies, various studies have suggested retrieving concepts from documents and from the Web (De Bruijn and Martin, 2002 and Gillam et al., 2005). Abundant textual documents not only convey knowledge but also provide raw material for building ontologies.
This study develops an ontology building process for extracting conceptual tags and hierarchies from a textual corpus. Since a textual corpus is written to express semantics for human understanding, a systematic, granular progression of elicitation stages is required. Thus, the objectives of this study are: (1) recognizing terminologies in the textual corpus of a specific domain; (2) identifying an unambiguous tag name for each concept based on those terminologies; and (3) discovering conceptual hierarchies according to an “is-a” relationship among concepts. To achieve these objectives, this study explored existing extraction approaches, surveyed corresponding tools, and made the revisions necessary to accomplish each elicitation stage. To evaluate the proposed approaches objectively, 15 experts were invited to assess the usefulness of this investigation. The empirical results illustrate that the synergy approaches used to derive such an elicitation may be particularly useful to ontology editors who deal with a textual corpus as their major knowledge source.
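The three elicitation stages can be illustrated with a minimal Python sketch. It is not the extraction technique used in this study: the frequency threshold, the crude plural normalization, and the head-word rule for “is-a” links are simple stand-ins chosen for illustration, and all function names, the stopword list, and the toy corpus are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed, deliberately tiny stopword list (illustrative only).
STOPWORDS = {"a", "an", "the", "of", "in", "is", "are", "and", "to", "for"}


def recognize_terms(docs, max_len=2, min_freq=2):
    """Stage 1: keep stopword-free n-grams that occur at least min_freq times."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if not any(tok in STOPWORDS for tok in gram):
                    counts[" ".join(gram)] += 1
    return {term: c for term, c in counts.items() if c >= min_freq}


def _normalize(term):
    """Crude singularization of the final word (an assumption, not a real stemmer)."""
    if term.endswith("ies"):
        return term[:-3] + "y"
    if term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def name_concept_tags(term_counts):
    """Stage 2: merge surface variants and name each concept by its most frequent form."""
    groups = defaultdict(Counter)
    for term, c in term_counts.items():
        groups[_normalize(term)][term] += c
    tags = {}
    for variants in groups.values():
        # Most frequent variant wins; ties are broken alphabetically.
        tag = min(variants, key=lambda v: (-variants[v], v))
        tags[tag] = sum(variants.values())
    return tags


def derive_is_a(tags):
    """Stage 3: link a multiword tag to the longest suffix that is itself a tag."""
    lookup = {_normalize(tag): tag for tag in tags}
    pairs = []
    for tag in tags:
        words = tag.split()
        for start in range(1, len(words)):
            parent = lookup.get(_normalize(" ".join(words[start:])))
            if parent is not None:
                pairs.append((tag, parent))
                break
    return sorted(pairs)


docs = [
    "A domain ontology is an ontology of a domain.",
    "Domain ontologies and a task ontology are ontologies.",
    "The task ontology and the domain ontology are in the corpus.",
]
tags = name_concept_tags(recognize_terms(docs))
print(derive_is_a(tags))
# [('domain ontology', 'ontology'), ('task ontology', 'ontology')]
```

The output of each stage feeds the next, mirroring the gradual development from terminologies to concept tags to hierarchies. A candidate structure produced this way would still need review by an ontology editor.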
This study describes a combination of elicitation approaches for building ontologies from a textual corpus. Three stages of gradual development are identified, namely recognizing terminologies, naming concept tags, and discovering hierarchical structures, and corresponding techniques are derived for each. This study does not claim that the synergy approach yields fully applicable expertise in ontology building. However, ontology editors can exploit the candidate conceptual structure to further construct their formal ontology. Future studies should aim to increase the usability ratio and to develop an integrated elicitation development environment.