بهینه سازی کلونی مورچه برای نمایش زنجیره ای RDF جهت پشتیبانی تصمیم گیری
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
5817 | 2013 | 9 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Expert Systems with Applications, Volume 40, Issue 5, April 2013, Pages 1555–1563
چکیده انگلیسی
Semantic Web technologies can be utilized in expert systems for decision support, allowing a user to explore in the decision making process numerous interconnected sources of data, commonly represented by means of the Resource Description Framework (RDF). In order to disclose the ever-growing amount of widely distributed RDF data to demanding users in real-time environments, fast RDF query engines are of paramount importance. A crucial task of such engines is to optimize the order in which partial results of a query are joined. Several soft computing techniques have already been proposed to address this problem, i.e., two-phase optimization (2PO) and a genetic algorithm (GA). We propose an alternative approach – an ant colony optimization (ACO) algorithm, which may be more suitable for a Semantic Web environment. Experimental results with respect to the optimization of RDF chain queries on a large RDF data source demonstrate that our approach outperforms both 2PO and a GA in terms of execution time and solution quality for queries consisting of up to 15 joins. For larger queries, both ACO and a GA may be preferable over 2PO, subject to a trade-off between execution time and solution quality. The GA yields relatively good solutions in a comparably short time frame, whereas ACO needs more time to converge to high-quality solutions.
مقدمه انگلیسی
In today’s information-driven society, decision makers need to process a continuous flow of data through various input channels, by extracting information, understanding its meaning, and acquiring knowledge by applying reasoning to the gathered information (Hogenboom, Hogenboom, Frasincar, Schouten, & van der Meer, 2012). Yet, an overwhelming amount of data is available at any given time, whereas decision makers – businesses and consumers alike – need a complete overview of their environment in order to enable effective, well-informed decision making. In order to address this issue, Semantic Web (Berners-Lee, Hendler, & Lassila, 2001) technologies can be utilized in expert systems for decision support, as has been demonstrated in recent research on knowledge management (Joo & Lee, 2009), annotation (Aksac, Ozturk, & Dogdu, 2012) and recommendation of data sources (Hsu, 2009), intelligent search (Batzios et al., 2008, Lupiani-Ruiz et al., 2011 and Vandic et al., 2012), and personalized user experiences in, e.g., tourism (Garcia-Crespo, Lopez-Cuadrado, Colomo-Palacios, Gonzalez-Carrasco, & Ruiz-Mezcua, 2011) or e-commerce (Blanco-Fernandez et al., 2010). The rise of the Semantic Web facilitates an ever-growing amount of data to be stored in many heterogenous, yet interconnected sources. This data is commonly represented by means of the Resource Description Framework (RDF), a framework for describing and interchanging meta-data (Klyne & Carroll, 2004), which describes the context of data and thus enables machine-interpretability. Due to the interconnectivity of data rather than pages, the Semantic Web has the potential of addressing today’s typical users’ complex information needs in a more effective and efficient way than the current Web can. Semantic Web technologies allow a user to explore many different data sources in order to address very specific information needs. A typical scenario here may be a query for features and reviews of a holiday destination and comparable alternatives, as well as prices and details of trips to any of these locations. Such queries can be executed on multiple RDF sources by means of the SPARQL Protocol and RDF Query Language (SPARQL) (Prud’hommeaux & Seaborne, 2008). In order for SPARQL queries to disclose the ever-growing amount of widely distributed RDF data to demanding users in real-time environments, fast RDF query engines are crucial. Recent expert systems for querying distributed environments with multiple heterogenous information sources such as databases or repositories focus on the problem of identifying the information sources that are most relevant with respect to queries (Jung, 2010) or on the problem of combining the query results from multiple sources while optimizing coverage and search effectiveness (Amin and Emrouznejad, 2011 and Batzios and Mitkas, 2012). However, to the best of our knowledge, expert systems dealing with optimizing the execution efficiency of a given query in a distributed environment have been relatively unexplored. One of the problems today’s RDF query engines face is the optimization of the order in which the distinct parts of a query are executed. The total execution times of queries depend on these query paths. A good algorithm for optimizing the execution order in a query path can thus contribute to efficient querying. As the number of possible query paths grows exponentially with the query size, the optimization of RDF query paths is far from trivial. Therefore, several soft computing techniques have been proposed to address this problem. For instance, Stuckenschmidt, Vdovjak, Broekstra, and Houben (2005) present a two-phase optimization (2PO) algorithm, consisting of an iterative improvement (II) method followed by simulated annealing (SA). More recently, a genetic algorithm (GA) has been shown to be a promising alternative when optimizing RDF queries (Hogenboom, Milea, Frasincar, & Kaymak, 2009). As their design has been inspired by methods for optimization of query paths in traditional databases, existing soft computing approaches to RDF query path optimization have essentially been designed for more or less static environments – changes in the environment typically require the optimization to be run all over again. However, the Semantic Web is a complex and dynamic environment. Data changes, sources come and go, and latency between sources may be volatile. In this light, ant colony optimization (ACO) (Dorigo et al., 1996 and Dorigo et al., 2006) is a good alternative to the existing soft computing approaches in an RDF environment. ACO is a soft computing technique inspired by the foraging behavior of ant colonies. Its nature allows the algorithm to be run continuously and to adapt to changes in the environment in real time. Moreover, ACO has been shown to outperform GAs in solving complex problems such as scheduling (Merkle, Middendorf, & Schmeck, 2002) and sequential ordering (Gambardella & Dorigo, 2000). As its characteristics render ACO an attractive alternative to existing soft computing techniques for RDF query optimization, we explore the applicability of ACO to query path optimization in an RDF environment. Furthermore, we compare the performance of existing soft computing techniques for RDF query path optimization with our novel ACO algorithm. We focus on the performance of the considered algorithms when optimizing a special class of SPARQL queries, the so-called RDF chain queries, on a single source. The remainder of this paper is structured as follows. First, Section 2 provides a short introduction to RDF and chain queries. We then discuss the two existing soft computing techniques as well as our novel ACO algorithm for RDF chain query optimization in Sections 3 and 4. In Section 5, we evaluate the performance of our considered methods in terms of execution time and solution quality. Finally, we draw conclusions and propose directions for future work in Section 6.
نتیجه گیری انگلیسی
In this paper, we have demonstrated how ACO can facilitate effective and efficient chain querying in expert systems for decision support in a Semantic Web environment, by having artificial ants iteratively construct RDF chain query paths, partly guided by the perceived quality of previously encountered solutions and partly guided by local heuristics. On a large, single RDF source, our experimental results demonstrate the potential of our novel ACO approach to RDF chain query optimization in comparison to previously proposed soft computing techniques, i.e., 2PO and a GA. When exploring the large solution spaces associated with the join ordering problem arising when optimizing RDF chain queries, our ACO approach significantly outperforms the existing 2PO and GA approaches in terms of solution quality and execution time for RDF chain queries consisting of up to 15 joins. For larger queries, consisting of up to 20 joins, the existing GA delivers relatively good solutions in a comparably short time frame. However, our novel ACO approach delivers by far the best solutions in a shorter time frame than 2PO, albeit while needing more time than the recently proposed GA in order to reach convergence. As its design allows our proposed ACO for RDF chain query optimization to adapt to dynamic environments, our next step will be assessing the performance of our novel algorithm in a dynamic RDF environment with multiple sources. In such an environment, we can explore how our algorithm can best adapt to changes in the environment, caused by, e.g., latency differences or updated data sources. In this light, we also consider real-time updating of our join selectivity estimation, which in our current set-up has a fixed value, as it depends on the cardinalities of a join’s operands. Another direction for future work lies in optimizing the parameters of our ACO algorithm. Finally, we also aim to investigate how to devise a more scalable graph representation of the join ordering problem in RDF chain query optimization, as we envisage a more scalable graph representation to improve the performance of our ACO approach even further.