Predicting web user behavior using learning-based ant colony optimization
|Article code|Publication year|English article|Word count|
|---|---|---|---|
|7789|2012|9-page PDF|8,110 words|
Publisher: Elsevier - ScienceDirect
Journal: Engineering Applications of Artificial Intelligence, Volume 25, Issue 5, August 2012, Pages 889–897
An ant colony optimization (ACO)-based algorithm for predicting web usage patterns is presented. The methodology incorporates multiple data sources: web content and structure as well as web usage. The model follows a continuous learning strategy driven by previous usage, in which artificial ants try to fit their sessions to real usage by modifying a text preference vector. The trained ants are then released onto a new web graph. The resulting artificial sessions are compared with real sessions previously captured through web log processing. The main result of this work is an effective prediction of the aggregated patterns of real usage, reaching approximately 80%. Second, the approach yields a quantitative representation of the keywords that influence navigational sessions.
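The abstract states that artificial ants fit their sessions to real usage by modifying a text preference vector. This excerpt does not give the update rule itself. The Python sketch below shows one plausible form under stated assumptions:

- Each page is a keyword-weight vector.
- A session's profile is the mean of its pages' vectors.
- The preference vector is nudged toward the real session's profile and away from the artificial one.

The names `session_profile` and `update_preference`, and the learning rate `lr`, are hypothetical. They are not the authors' method.

```python
import math

def session_profile(session, page_vectors):
    """Mean keyword-weight vector of the pages in a session (assumed profile)."""
    dim = len(next(iter(page_vectors.values())))
    profile = [0.0] * dim
    for page in session:
        for i, weight in enumerate(page_vectors[page]):
            profile[i] += weight
    n = max(len(session), 1)  # avoid division by zero for an empty session
    return [x / n for x in profile]

def update_preference(pref, artificial, real, page_vectors, lr=0.5):
    """One hypothetical training step for an ant's text preference vector.

    pref <- pref + lr * (real_profile - artificial_profile); negative weights
    are then clipped to zero and the vector is rescaled to unit length.
    """
    p_real = session_profile(real, page_vectors)
    p_art = session_profile(artificial, page_vectors)
    updated = [max(0.0, m + lr * (r - a)) for m, r, a in zip(pref, p_real, p_art)]
    norm = math.sqrt(sum(x * x for x in updated)) or 1.0
    return [x / norm for x in updated]

# Toy data: keyword index 0 = "news", index 1 = "sports".
pages = {"home": [1.0, 0.0], "sports": [0.0, 1.0], "politics": [1.0, 0.0]}
new_pref = update_preference([1.0, 0.0], ["home", "politics"], ["home", "sports"], pages)
print(new_pref)  # the "sports" weight rises from 0.0 to about 0.32
```

The two toy sessions differ only in the "sports" page. After one step, the preference vector moves toward the real session's keyword profile.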
Since the beginning of the World Wide Web, one aspect that has attracted researchers' attention is the way users interact with the structure and content of web sites. Multiple analyses and models have been developed to understand web user behavior, with the aim of displaying relevant content and maximizing traffic. The emergence of e-commerce changed how the web user is valued: not only as a consumer of content, but also as a client who has to be persuaded to make a purchase. This, together with the rise of the Web 2.0 paradigm, promoted the unification of knowledge and techniques into what is now commonly known as Web usage mining (WUM), a field specializing in the study of web user behavior. One of its main objectives is web user personalization. This means the capability to adapt the web site to the user's preferences and navigation paths, both through link structure transformation and through real-time suggestions.
In general, WUM applies traditional behavioral models, operations research and data mining methods to web usage data, although these require modifications specific to each application domain. Two families of techniques have been used to analyze sequential patterns: deterministic and stochastic. The choice between them depends on the approach adopted. Soft computing methodologies have become increasingly relevant to WUM. This is due to their flexible implementation and their results in recommendation systems and adaptive web sites (Lin and Tseng, 2010). Within these fields, special attention has been paid to bio-inspired metaheuristics. These are commonly governed by the concept of swarm intelligence: the ability of a group of agents to perform complex tasks through a collaborative process.
Instead of trying to mimic human intelligence, these methods draw inspiration from the observation of social insects such as ants or bees (Christensen et al., 2007). Ant colony optimization (ACO) is one of the tools used for this purpose. This metaheuristic is inspired by the way ants optimize their foraging trails by releasing chemical substances called pheromones into the environment. The same idea is applied to web users' trails of visited pages, also called sessions (Liu, 2007). Artificial ants are trained through a web session clustering method that modifies an intrinsic text preference vector. This vector represents the importance users assign to each keyword in the set of most important keywords. The trained ants are then used to predict future browsing behavior. This paper is organized as follows. Section 2 provides an overview of related work. Section 3 presents the proposed model. Section 4 describes an application of our work on a real web site. Finally, Section 5 presents conclusions and future work.
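To make the pheromone analogy concrete, here is a minimal sketch of an Ant System-style step on a web graph. It assumes that an ant on a page chooses among the linked pages with probability proportional to τ^α·η^β:

- τ is the pheromone on the link.
- η is the cosine similarity between the ant's text preference vector and the target page's keyword vector.

After each round, pheromone evaporates at rate `rho`, and each ant deposits `q` spread over the links it used. The choice of η, the deposit rule, and all function names are assumptions for illustration. This excerpt does not specify the paper's exact rules.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def transition_probabilities(current, graph, pheromone, pref, page_vectors,
                             alpha=1.0, beta=2.0):
    """Random-proportional rule: p(j) is proportional to tau^alpha * eta^beta."""
    weights = {}
    for nxt in graph.get(current, []):
        tau = pheromone[(current, nxt)]
        # Assumed text heuristic, clamped and kept strictly positive.
        eta = max(cosine(pref, page_vectors[nxt]), 0.0) + 1e-9
        weights[nxt] = (tau ** alpha) * (eta ** beta)
    total = sum(weights.values())
    return {page: w / total for page, w in weights.items()} if total > 0 else {}

def choose_next(probs, rng):
    """Sample the next page; returns None when the page has no out-links."""
    if not probs:
        return None
    pages = list(probs)
    return rng.choices(pages, weights=[probs[p] for p in pages], k=1)[0]

def update_pheromone(pheromone, sessions, rho=0.1, q=1.0):
    """Evaporate every link, then let each ant deposit q spread over its links."""
    for edge in pheromone:
        pheromone[edge] *= 1.0 - rho
    for session in sessions:
        if len(session) < 2:
            continue
        deposit = q / (len(session) - 1)
        for a, b in zip(session, session[1:]):
            pheromone[(a, b)] = pheromone.get((a, b), 0.0) + deposit

# Toy web graph; keyword index 0 = "news", index 1 = "sports".
graph = {"home": ["sports", "news"], "sports": ["home"], "news": ["home"]}
pages = {"home": [1.0, 0.0], "sports": [0.0, 1.0], "news": [1.0, 0.0]}
pheromone = {(a, b): 1.0 for a, links in graph.items() for b in links}
probs = transition_probabilities("home", graph, pheromone, [0.0, 1.0], pages)
nxt = choose_next(probs, random.Random(0))
update_pheromone(pheromone, [["home", nxt]])
```

An ant whose preference vector favors "sports" almost always follows the link to the sports page. That link's pheromone then grows relative to the evaporated alternatives.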
Conclusion
We conclude that the model presented in this paper is a plausible way to integrate the ACO metaheuristic with web mining methodologies. Its multi-variable approach uses content, structure and usage data from web sources. This provides a framework for web user simulation based on text preference vectors, whose fully parameterized structure allows any change in the web environment to be detected and incorporated. It must be noted that the simple problem formulation intrinsic to ACO cannot be applied directly to web-based problems, because of the complexity of the data and its time dependency. Additional functionality is required to handle the interaction of data from different dimensions. For example, standard ACO applications such as vehicle routing have well-defined starting and ending points. Web user sessions, by contrast, can start and end at many different pages, which makes algorithm convergence harder to ensure. Because the learning algorithm depends mainly on an analysis of text preferences, the lack of structure of the Web makes good data pre-processing mandatory. If access to the data is not ensured or its quality is poor, the results shown in this work may not be replicable. The strong dependency on input data can be seen as an advantage, in that the results are specific to each web site. It can also be a drawback, since on some web sites it will be infeasible to gather the minimum amount of data needed for good training. Wider adoption of web standards, together with the rise of new technologies such as HTML5, could help make website content more accessible; HTML5 incorporates a semantic approach and native support for most multimedia formats.
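One way to give a simulator multiple starting and ending points is to draw start pages from the observed entry pages and end each session stochastically. The sketch below assumes two statistics estimated from real sessions:

- empirical entry-page frequencies
- per-page exit rates

The names `entry_distribution`, `exit_rates` and `simulate_session`, and the `next_page` callback, are hypothetical. The paper may treat session boundaries differently.

```python
import random
from collections import Counter

def entry_distribution(real_sessions):
    """Empirical probability that a session starts at each page."""
    counts = Counter(s[0] for s in real_sessions if s)
    total = sum(counts.values())
    return {page: c / total for page, c in counts.items()}

def exit_rates(real_sessions):
    """Per-page probability that a visit is the last one in its session."""
    visits, exits = Counter(), Counter()
    for s in real_sessions:
        visits.update(s)
        if s:
            exits[s[-1]] += 1
    return {page: exits[page] / visits[page] for page in visits}

def simulate_session(start_dist, exits, next_page, rng, max_len=20):
    """Generate one artificial session with a sampled start and a stochastic end."""
    pages = list(start_dist)
    current = rng.choices(pages, weights=[start_dist[p] for p in pages], k=1)[0]
    session = [current]
    # Keep browsing while the exit draw says "stay"; unseen pages end the session.
    while len(session) < max_len and rng.random() >= exits.get(current, 1.0):
        nxt = next_page(current)
        if nxt is None:
            break
        session.append(nxt)
        current = nxt
    return session

real = [["home", "sports"], ["home", "news", "sports"], ["news"]]
starts = entry_distribution(real)
ends = exit_rates(real)
links = {"home": "sports", "news": "sports"}  # toy deterministic navigation
print(simulate_session({"home": 1.0}, ends, links.get, random.Random(1)))  # prints ['home', 'sports']
```

In a full simulator, `next_page` would be the ACO transition step. Here a fixed dictionary stands in so that the start and stop logic is easy to follow.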
It should also be pointed out that this model is based on a collaborative multi-agent construction of solutions. Its results therefore have to be interpreted in aggregate, and this work provides a global estimation of web usage. Specifically, the coherent matching between artificial and real sessions, measured with a similarity measure, reached 81%. Additionally, this work produced initial results on characterizing text preference vectors and on their correlation with the keywords present in the most visited pages. These results could be a starting point for a new methodology that estimates, in aggregate, the relevant topics for a given web user. This could be compared with existing techniques such as Latent Dirichlet Allocation (LDA). Future work includes refining the text preference model to incorporate different utilities when collecting information, which could better match real user behavior. The collaborative process could also be modified to obtain solutions faster, including comparisons with other metaheuristics.
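The excerpt does not specify which similarity measure was used to match artificial and real sessions. As an assumed stand-in, the sketch below works in two steps:

1. It scores two sessions by the length of their longest common subsequence (LCS) divided by the longer session's length.
2. It reports the aggregate share of artificial sessions whose best match among the real sessions reaches a threshold.

The LCS measure, the 0.5 threshold and the function names are illustrative choices, not the paper's.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            curr.append(prev[j] + 1 if x == y else max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]

def session_similarity(a, b):
    """Order-aware similarity in [0, 1]: LCS length over the longer length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0  # two empty sessions match

def matching_rate(artificial, real, threshold=0.5):
    """Share of artificial sessions whose best real match reaches the threshold."""
    if not artificial:
        return 0.0
    matched = sum(
        1 for s in artificial
        if max((session_similarity(s, r) for r in real), default=0.0) >= threshold
    )
    return matched / len(artificial)

real = [["home", "sports", "scores"], ["home", "news"]]
artificial = [["home", "sports"], ["weather", "sports"], ["home", "news"]]
print(matching_rate(artificial, real))  # 2 of 3 artificial sessions match
```

An order-aware measure such as LCS rewards artificial sessions that visit the same pages in the same order. A set-based measure such as Jaccard similarity would ignore ordering.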