Exploiting semantic relations and hierarchical data structures to support a user's annotation and browsing activities in folksonomies
Article code | Year of publication | Length of the English article |
---|---|---|
20344 | 2009 | 25 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Information Systems, Volume 34, Issue 6, September 2009, Pages 511–535
English abstract
In this paper we present a new approach to helping users annotate and browse the resources referred to by a folksonomy. Our approach is characterized by the following novelties: (i) it proposes a probabilistic technique to quickly and accurately determine the similarity and generalization degrees of two tags; (ii) it proposes two hierarchical structures and two related algorithms to arrange groups of semantically related tags in a hierarchy; this allows users to visualize the tags of interest to them at the desired semantic granularity and thus helps them find the tags that best express their information needs. We first illustrate the technical characteristics of our approach; then we describe various experiments that test its performance; finally, we compare it with related approaches already proposed in the literature.
English introduction
The term folksonomy currently denotes a supporting data structure that allows human users to classify and categorize various kinds of resources (e.g., URLs, photos, videos, scientific papers, and so on) by means of plain keywords, also known as tags [6], [20] and [31]. A folksonomy consists of a set of URIs (used to identify the resources it refers to), a set of tags (used to label these resources) and a set of users (who produce and label these resources) [26]. The number of folksonomies on the Web has been increasing rapidly since 2004. In the meantime, the number of resources referred to by each folksonomy, as well as the number of users who exploit folksonomies, has also been growing rapidly [6]. Popular examples of folksonomies are Flickr [3] (which allows users to annotate their photos), del.icio.us [2] (which allows users to store and share their Web bookmarks), and Bibsonomy [1] (which allows users to share bibliographic data on scientific papers). Indeed, folksonomies are becoming increasingly popular not only on the Web but also in large organizations and businesses (see, for instance, [15] and [33]).

Some of the main reasons underlying the pervasive diffusion of folksonomies are the following:

• In traditional knowledge management systems, classification is performed by a human expert or a pool of human experts. As the size and variety of available information increase, both the cost and the time required to carry out classification in this way increase too. Specifically, the rate at which new resources are added to a folksonomy is very high; for instance, the authors of [25] report that, in June 2007, del.icio.us users posted about 120,000 URLs per day; the same study estimated that the number of URLs stored in del.icio.us in 2007 was about 115 million. Clearly, at this rate, the amount of time needed by human experts to view, analyse and catalogue these URLs would be huge.
Moreover, the pool of human experts would have to be huge and, consequently, the classification costs would be prohibitive. In addition, the resources referred to by a folksonomy span a large number of disparate domains: for instance, we observed that, in del.icio.us, 524,109 URLs were labelled with the tag “Environment” whereas 617,927 were labelled with the tag “Database”; as a consequence, in this system, there are very important and frequent topics (which, therefore, cannot be neglected) that are, at the same time, very different and possibly related to each other. A human expert (or a pool of human experts) would need to command a very large vocabulary spanning many specialized domains. Moreover, since new topics emerge rapidly, the experts would have to acquire new knowledge swiftly. An analogous problem arises for large-scale businesses: in this context too, the number and variety of available resources are high; therefore, performing and maintaining a classification of available resources is a very complex task that cannot be delegated to a human expert (or a pool of human experts). In both contexts described above, a totally different form of coordination appears necessary, in which each user classifies his own resources and users’ classifications are made available to other users on the basis of their needs/requirements.

• Many authors [6] and [25] conjecture that the usage of folksonomies (in particular, those regarding Web bookmarks) can enhance the performance of traditional Web search engines. As an example, the authors of [6] observe that the set of tags used to label a Web page can summarize its content more effectively than a classical information retrieval technique (like TF/IDF). As a consequence, tags can play the role of metadata and can be effectively used to compute the degree of match between a query and a Web page.

• Folksonomies are useful to highlight hidden social ties among users.
For instance, the authors of [22] propose an approach to organize tags in a hierarchical structure and to associate a specific topic with each tag. Then, given two users u1 and u2 and a topic T, this approach considers the set I(u1,T) (resp., I(u2,T)), consisting of the resources that u1 (resp., u2) labelled with the tags associated with T, and computes the degree of overlap between I(u1,T) and I(u2,T); if this degree is higher than a specific threshold, it is possible to conclude that there exists a social tie between u1 and u2. Therefore, the exploitation of folksonomies provides a notion of social tie between users that is more effective than the one generally adopted in commercial systems; in fact, these systems assume that a social tie exists between two users only if the users explicitly declare it; as a consequence, in these systems, there would be no social tie between two users who do not know each other.

• In large businesses, folksonomies are useful to identify communities of users with shared (or complementary) interests, as well as experts on a certain topic. For instance, in the social bookmarking system DOGEAR [33], an expert within the firm can insert a tag t and identify all URLs tagged with t; after this, he can retrieve the list of all experts in the firm who used t and contact them to receive help or additional material on a subject related to t. Moreover, he can select one (or more) of these experts and browse his (their) lists of bookmarks to access new resources, to strengthen his skills or to acquire new ones.

However, as clearly pointed out in [20] and [41], despite these advantages, folksonomies suffer from some quite crippling disadvantages, which can be summarized as: (i) ambiguity, (ii) usage of synonymous tags, and (iii) discrepancy in the granularity of tags.
To illustrate these disadvantages concretely, we consider a real-life folksonomy dealing with Databases and Information Systems; it will be the reference folksonomy throughout the paper. It covers a wide spectrum of topics, such as the design/implementation of a Database/Information System, the usage of Information Systems on the Web, the usage of object-oriented programming languages in Databases/Information Systems, and so on.

Ambiguity refers to the fact that some terms may have multiple meanings. The basic example of ambiguity is homonymy: two terms are homonymous if they have the same name but different meanings. For instance, in our reference folksonomy, the tag “Generalization” could be used to label both slides about inheritance among classes in an object-oriented programming language and slides about the generalization relationship in E/R diagrams. As a consequence, the answer to the query “Generalization”, submitted by a user interested in E/R diagrams, would also include the slides about inheritance among classes in an object-oriented language.

Usage of synonymous tags in a folksonomy means that different users could use different (yet synonymous) tags to label/query the same type of resources (or even the same resource). For instance, in our reference folksonomy, some resources about an E/R modelling tool could be labelled with the tag “Data Modelling” by some users, whereas other resources about the same tool could be labelled with the tag “Database Design” by other users. As a consequence, a user who is unaware of the tag “Database Design” and submits a query containing only the tag “Data Modelling” would miss some relevant resources in the answer to his query. Generally speaking, a user could receive a complete set of answers to his query only if his vocabulary is rich enough to cover a tag together with the whole set of its synonyms.
This hypothesis is clearly unrealistic because users’ vocabulary is often quite limited.

Discrepancy in the granularity of tags can arise because a resource can be reasonably described by various tags, ranging from terms with a broad meaning to terms with a narrow meaning. Therefore, some users, depending on their level of expertise and cultural background, may prefer generic tags, whereas other users may be inclined to use specific tags. For instance, consider a tutorial about the JOIN clause in SQL and assume that it is labelled only with the tag “Join”. An expert user could submit a query containing very specific tags like “Outer Join” or “Left Join” and, therefore, would not receive the tutorial about the join operator, even though it may be relevant to his goals. By contrast, a novice user could submit a query containing a generic tag like “Select Clause” and, therefore, would not receive the tutorial either.

To better address the three problems mentioned above, it would be extremely useful to have a tool capable of parsing the set of tags a user is inserting (either to catalogue a resource in a folksonomy or to submit a query over it) so as to interactively suggest new related tags. These new tags could help him label the resource he is registering more appropriately or specify the query he is submitting more precisely. More concretely, suggested tags would be able to:

• Disambiguate the meaning of a tag. For instance, with regard to the previous example about ambiguity, if a user is inserting the tag “Generalization”, the tool could suggest tags like “Class Diagram” and “E/R Diagram”; the user can then examine these tags and enrich his query by selecting those that best specify his needs.

• Extend the vocabulary of a user.
For instance, with reference to the example about the usage of synonymous tags, a user who is inserting the tag “Data Modelling” could be offered the tag “Database Design” to complete his query.

• Enable users to opt for the subjectively “right” level of granularity. For instance, with reference to the example about discrepancy in the granularity of tags, the system can suggest a set of tags having a broader or a narrower meaning than the tags specified by a user. As an example, if a user is inserting the tag “SQL”, the system can suggest a set of more specific tags, like “Join”, and a set of more generic tags, like “RDBMS”. The user can then select, among these tags, those that best express the desired level of granularity.

Clearly, the number of suggested tags should be limited so that the time and effort the user needs to evaluate them remain reasonable.

From the previous discussion it emerges that the vaguer a user’s knowledge of a domain is, the greater the benefit he would gain from such a tool. In fact, if a user has only a vague knowledge of a domain, it might happen that: (i) he is unaware of the multiple meanings a tag may have; (ii) he has a limited vocabulary (and would therefore be unable to recognize terms with a similar meaning); and (iii) he is not aware of the granularity of the terms used to label a resource (and could therefore use tags with the wrong level of granularity).
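As a rough illustration of the kind of suggestion tool discussed above, the following Python sketch proposes broader, narrower and related tags from the way tags co-occur on resources. It is only a sketch under stated assumptions: the toy assignments, the names `build_index`, `cond_prob` and `suggest`, the 0.8 threshold, and the use of conditional co-occurrence frequencies as a proxy for similarity and generalization are all illustrative choices, and they are not the probabilistic technique proposed in the paper.

```python
from collections import defaultdict

# Hypothetical toy folksonomy: (user, tag, resource) assignments.
ASSIGNMENTS = [
    ("u1", "SQL", "r1"), ("u1", "Join", "r1"), ("u1", "RDBMS", "r1"),
    ("u2", "SQL", "r2"), ("u2", "RDBMS", "r2"), ("u2", "Database", "r2"),
    ("u3", "SQL", "r3"), ("u3", "Join", "r3"), ("u3", "RDBMS", "r3"),
    ("u4", "RDBMS", "r4"),
    ("u5", "RDBMS", "r5"),
    ("u6", "Database", "r6"),
]


def build_index(assignments):
    """Map each tag to the set of resources labelled with it."""
    index = defaultdict(set)
    for _user, tag, resource in assignments:
        index[tag].add(resource)
    return dict(index)


def cond_prob(index, a, b):
    """Estimate P(b | a): share of resources tagged `a` that also carry `b`."""
    if not index.get(a):
        return 0.0
    return len(index[a] & index.get(b, set())) / len(index[a])


def suggest(index, tag, high=0.8, limit=5):
    """Split co-occurring tags into broader / narrower / related suggestions.

    Assumed heuristic: `other` is broader than `tag` when almost every
    resource tagged `tag` also carries `other`, but not the other way round.
    """
    broader, narrower, related = [], [], []
    for other in index:
        if other == tag:
            continue
        p_other_given_tag = cond_prob(index, tag, other)
        p_tag_given_other = cond_prob(index, other, tag)
        if p_other_given_tag == 0.0:
            continue  # the two tags never co-occur: nothing to suggest
        if p_other_given_tag >= high and p_tag_given_other < high:
            broader.append(other)
        elif p_tag_given_other >= high and p_other_given_tag < high:
            narrower.append(other)
        else:
            related.append(other)
    # Keep each list short so the user can evaluate it quickly.
    return {
        "broader": sorted(broader)[:limit],
        "narrower": sorted(narrower)[:limit],
        "related": sorted(related)[:limit],
    }


INDEX = build_index(ASSIGNMENTS)
SUGGESTIONS = suggest(INDEX, "SQL")
print(SUGGESTIONS)
```

With the toy data above, a user typing “SQL” is offered “RDBMS” as a broader tag, “Join” as a narrower tag and “Database” as a related tag; the `limit` parameter reflects the requirement that the number of suggested tags stay small.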
English conclusion
In this paper we have presented a new approach to supporting users in performing social annotation and browsing activities in folksonomies. Our approach receives a set TSetInput of tags specified by a user. It first constructs a set NeighTSetInput of tags semantically related to those specified in TSetInput. After this, it organizes the tags of NeighTSetInput in a hierarchy so as to allow a user to visualize the tags of interest to him at the desired semantic granularity, as well as to find the tags that best express his information needs. We have seen that our approach is characterized by the following novelties: (i) it proposes a probabilistic technique to quickly and accurately determine how similar a tag is to another (or how much more general it is than another); (ii) it proposes two suitable data structures and two related algorithms to organize NeighTSetInput in a hierarchy. In this paper we have provided all technical details about our approach; then, we have described various experiments devoted to measuring its performance; finally, we have compared it with related approaches previously presented in the literature.

As for current and future work, we are planning to develop an online demo of our approach so as to make it available to a large audience. Moreover, in our opinion, the ideas proposed in this paper lend themselves to various interesting developments. Specifically, since the annotations of a user can be regarded as a reliable indicator of his preferences, we plan to design a recommender system capable of learning the profile of a user from his annotations. In particular, we plan to design both a content-based recommender system (relying on the analysis of tags specified by a user in the past) and a collaborative-filtering one (based on the analysis of tags jointly adopted by multiple users presumably sharing some common interests).
A further research direction consists of defining complex hierarchies capable of capturing different types of semantic relationship among tags and displaying them in an intuitive fashion. Specifically, we are considering graph visualization techniques that represent tags as charged particles whose interaction is regulated by attractive or repulsive forces. These techniques graphically display tags in a two-dimensional layout by finding a (locally) minimum energy state for the associated physical system. The result is that two tags appear close in this space if they are semantically similar, and distant otherwise.
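The force-based layout idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that do not come from the paper: the force model (pairwise repulsion of magnitude 1/d², spring-like attraction proportional to similarity times distance), the small `base_attraction` that keeps unrelated tags from drifting away indefinitely, the step sizes, the toy similarity values and the names `force_layout` and `distance` are all hypothetical.

```python
import math
import random


def distance(pos, a, b):
    """Euclidean distance between the positions of tags `a` and `b`."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


def force_layout(tags, similarity, iterations=500, step=0.02,
                 base_attraction=0.05, max_move=0.1, seed=0):
    """Place tags in 2-D so that similar tags end up close together.

    Assumed force model (illustrative only):
      * every pair of tags repels with magnitude 1 / d**2;
      * every pair attracts with magnitude (s + base_attraction) * d,
        where s is the similarity of the pair (0 if unspecified).
    """
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    pos = {t: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for t in tags}

    for _ in range(iterations):
        force = {t: [0.0, 0.0] for t in tags}
        for i, a in enumerate(tags):
            for b in tags[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                d = max(math.hypot(dx, dy), 1e-3)  # avoid division by zero
                s = similarity.get(frozenset((a, b)), 0.0) + base_attraction
                # Positive magnitude pulls a towards b; negative pushes apart.
                magnitude = s * d - 1.0 / (d * d)
                fx = magnitude * dx / d
                fy = magnitude * dy / d
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for t in tags:
            fx, fy = force[t]
            norm = math.hypot(fx, fy)
            if norm > 0.0:
                # Cap the displacement so near-collisions cannot fling a tag away.
                move = min(step * norm, max_move)
                pos[t][0] += fx / norm * move
                pos[t][1] += fy / norm * move
    return pos


TAGS = ["SQL", "Join", "Left Join", "Photo"]
SIMILARITY = {
    frozenset(("SQL", "Join")): 1.0,
    frozenset(("Join", "Left Join")): 1.0,
    frozenset(("SQL", "Left Join")): 0.5,
}
LAYOUT = force_layout(TAGS, SIMILARITY)
for tag in TAGS:
    print(tag, [round(c, 2) for c in LAYOUT[tag]])
```

In this toy run the three database-related tags should settle near one another, while “Photo”, which has no declared similarity to any of them, should come to rest farther away. A production system would more likely rely on an established graph-drawing library than on a hand-written loop like this one.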