Applying Logistic Regression to Relevance Feedback in Image Retrieval Systems
Article code | Publication year | Pages (English article) |
---|---|---|
24751 | 2007 | 12 (PDF) |
Publisher: Elsevier - ScienceDirect
Journal: Pattern Recognition, Volume 40, Issue 10, October 2007, Pages 2621–2632
Abstract
This paper deals with the problem of image retrieval from large image databases. A particularly interesting problem is the retrieval of all images that are similar to the one in the user's mind, taking into account his/her feedback, expressed as positive or negative preferences for the images that the system progressively shows during the search. Here we present a novel algorithm for incorporating user preferences into an image retrieval system based exclusively on the visual content of the image, which is stored as a vector of low-level features. The algorithm considers the probability of an image belonging to the set of images sought by the user, and models the logit of this probability as the output of a generalized linear model whose inputs are the low-level image features. The image database is ranked by the output of the model and shown to the user, who selects a few positive and negative samples; the process is repeated iteratively until he/she is satisfied. The problem of the small sample size with respect to the number of features is solved by fitting several partial generalized linear models and combining their relevance probabilities by means of an ordered weighted averaging (OWA) operator. Experiments with 40 users showed good performance in finding a target image (4 iterations on average) in a database of about 4700 images. The mean numbers of positive and negative examples were 4 and 6 per iteration, respectively. Clustering the users into groups also reveals consistent patterns of behavior.
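The model described above can be written compactly as follows. The symbols $x_j(I)$ (the $j$-th low-level feature of image $I$), $p$ (the number of features) and the coefficients $\beta_j$ are our own illustrative notation, not necessarily the paper's:

```latex
\operatorname{logit}\pi(I) = \log\frac{\pi(I)}{1-\pi(I)}
  = \beta_0 + \sum_{j=1}^{p} \beta_j\, x_j(I),
\qquad
\pi(I) = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{j=1}^{p} \beta_j\, x_j(I)\right)}
```

Because the logistic function is strictly increasing, ranking the database by $\pi(I)$ gives the same order as ranking it by the linear predictor $\beta_0 + \sum_j \beta_j x_j(I)$.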
Introduction
The last few years have witnessed an increasing amount of pictorial information in different digital formats, and large image databases raise the need to retrieve relevant data efficiently. In this framework, content-based image retrieval (CBIR) systems are one of the most promising techniques for retrieving multimedia information [1], [2] and [3]. CBIR systems are regarded as an improvement on traditional image retrieval systems based on textual information such as keywords, because they take advantage of the information held by the image itself. Visual features related to color, shape and texture are extracted in order to describe the image content [4]. The main drawback of textual image retrieval systems, namely their dependency on the annotator, is overcome in pure CBIR systems. Several papers have attempted to integrate both approaches, textual and CBIR [5] and [6].
Image features are a key aspect of any CBIR system. They can be broadly classified into low-level features (color, texture and shape) and high-level features (usually obtained by combining low-level features in a reasonably predefined model). High-level features depend strongly on the application domain, so they are not usually suitable for general-purpose systems. This is why the extraction of good low-level image descriptors has been one of the most important and active research topics in this field. However, there is a significant gap between these features and human perception (the semantic gap). For this reason, different methods (mostly iterative procedures) have been proposed to deal with the semantic gap [7]. In most cases, the idea underlying these methods is to integrate the information provided by the user into the decision process, so that the user guides the search by indicating his/her preferences, desires and requirements to the system.
The basic idea is rather simple: the system displays a set of images (resulting from a previous search); the user selects the images that are relevant (desired images) and rejects those that are not (images to avoid) according to his/her particular criterion; the system then learns from these training examples to improve its performance in the next run. The process continues iteratively until the user is satisfied. Iterative algorithms that, in order to improve the set resulting from a query, require the user to enter his/her preferences in each iteration are called relevance feedback algorithms [8]. These algorithms have been shown to improve retrieval performance dramatically. Following this line of research, this paper presents a new relevance feedback algorithm for image databases based on logistic regression models.
A query can be seen as the expression of an information need to be satisfied. Any CBIR system aims at finding images relevant to a query, and thus to the information need it expresses. The relationship between any image in the database and a particular query can be expressed by a relevance value, which depends on how well the user perceives his/her information need to be satisfied. This relevance value can be interpreted as a mathematical probability (a relevance probability). The notion of relevance probability is not unique, since different authors have interpreted it differently. In this paper, the relevance probability $\pi(I)$ is a quantity that reflects the estimated relevance of image $I$ with respect to the user's information needs. Initially, every image in the database is equally likely; as more information on the user's preferences becomes available, the probability measure concentrates on a subset of the database.
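The iterative display, feedback and re-ranking loop described above can be sketched as follows. This is a minimal, hypothetical sketch rather than the authors' system: the callbacks `rank_fn` (the re-ranking model, which in the paper is based on logistic regression) and `user_fn` (the user's judgement of the displayed page), the page size, the iteration cap and the rule that an image keeps the first label it receives are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback signatures (assumptions, not taken from the paper):
#   rank_fn(positives, negatives) -> all database indices, most relevant first
#   user_fn(shown) -> (positives, negatives, satisfied)
RankFn = Callable[[List[int], List[int]], List[int]]
UserFn = Callable[[Sequence[int]], Tuple[List[int], List[int], bool]]


def relevance_feedback(n_images: int, rank_fn: RankFn, user_fn: UserFn,
                       page_size: int = 8,
                       max_iter: int = 20) -> Tuple[int, List[int]]:
    """Show images, collect feedback, re-rank; stop when the user is satisfied.

    Returns (number of iterations used, last page shown)."""
    # Before any feedback every image is equally likely, so any order will do.
    order = list(range(n_images))
    positives: List[int] = []
    negatives: List[int] = []
    shown: List[int] = []
    for iteration in range(1, max_iter + 1):
        shown = order[:page_size]
        pos, neg, satisfied = user_fn(shown)
        if satisfied:
            return iteration, shown
        # Training examples accumulate across iterations; an image keeps the
        # first label it receives (an assumption made for this sketch).
        for i in pos:
            if i not in positives and i not in negatives:
                positives.append(i)
        for i in neg:
            if i not in positives and i not in negatives:
                negatives.append(i)
        order = rank_fn(positives, negatives)
    return max_iter, shown
```

In a simulated target-search task like the one used in the paper's experiment, `user_fn` could report satisfaction as soon as the target image appears on the page; the returned iteration count then corresponds to the number of selection/ordering/presentation cycles.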
The iterative relevance feedback scheme proposed in this paper uses logistic regression analysis to rank a set of images in decreasing order of their estimated relevance probabilities. Logistic regression is based on a linear model whose inputs, in our case, are the characteristics extracted from image $I$ and whose output is a function of $\pi(I)$ (its logit). In logistic regression analysis, one of the key choices is the order of the model to be fitted. The order of the logistic regression model, the number of image characteristics, and the number of feedback (positive/negative) images the user is asked to select are strongly related: the order of the model must be consistent with a reasonable amount of feedback from the user. For example, it is not reasonable to ask the user to select 40 images in each iteration, whereas feedback on 5 to 10 images would be acceptable. This requirement leads us to group the image features into $n$ smaller subsets, each consisting of semantically related characteristics. As a result, $n$ smaller regression models must be fitted, each producing a different relevance probability $\pi_k(I)$, $k = 1, \dots, n$. We then face the question of how to combine the $\pi_k(I)$ in order to rank the database according to the user's preferences. We tackle this problem using the so-called OWA (ordered weighted averaging) operators, introduced by Yager [9], which provide a consistent and versatile way of aggregating multiple inputs into a single output.
Section 2 describes related work on feature relevance computation. Section 3 presents our approach in detail, and Section 3.1 describes the low-level features extracted from the images and used to retrieve them. Section 4 presents experimental results that evaluate the performance of our technique on real-world data.
Finally, in Section 5 we draw conclusions and outline future work.
Conclusion
The new requirements for systems able to retrieve images from very large databases based only on their visual content are motivating considerable research on this topic. This paper addresses the problem with an algorithm based on logistic regression. Since the user looks for images that are similar to his/her query, these images define a set whose indicator function, appropriately transformed by the logit mapping, is the output of the model to be fitted; the model's inputs are the low-level features extracted directly from the image. The main advantage of the method is the ease with which it incorporates the user's feedback. Its main drawback is the lack of sufficient information (too small a sample) to fit the model, since the number of inputs (image features) is usually high. This is addressed by fitting partial models, each of which takes as input a subset of semantically related features. Combining the information from the different models, which is a data fusion problem, is handled with an ordered weighted averaging (OWA) operator. An image retrieval experiment on a large database (about 4700 images) was designed and carried out by 40 users of different ages and backgrounds. Because subjective similarity is difficult to evaluate objectively, the goal of the experiment was to retrieve a requested image. The results show that this could be done in an average of fewer than four (3.79) cycles of selection/ordering/presentation, with the user selecting an average of 3.5 positive and 5.7 negative samples per cycle. We consider this a good preliminary result that shows the usefulness of the proposed algorithm. As a side result, the users were clustered according to their behavior when using the system. The results show four clear groups of users, depending on their personal attitude (patient/impatient) and on their ability to perceive the visual resemblance between images.
As future work, we intend to extend the model to ordinal data, allowing the user to rate the degree of similarity on several levels (probably five) instead of just labelling each image as similar or dissimilar. This can be done with standard statistical techniques for categorical data regression. The preprocessing of the inputs (low-level image features) will also be improved by using principal component or independent component analysis, which is expected to improve the robustness and accuracy of the fitted models.
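For the ordinal extension mentioned above, one standard categorical-regression choice is the cumulative logit (proportional odds) model; the paper does not name a specific technique, so this is an assumption. Writing $Y(I) \in \{1, \dots, C\}$ for the similarity level assigned to image $I$ (for example $C = 5$), and using the illustrative notation $x_j(I)$ for the $p$ low-level features with coefficients $\beta_j$ and ordered thresholds $\theta_c$:

```latex
\log\frac{P\big(Y(I) \le c\big)}{P\big(Y(I) > c\big)}
  = \theta_c - \sum_{j=1}^{p} \beta_j\, x_j(I),
\qquad c = 1, \dots, C-1,
\qquad \theta_1 < \theta_2 < \dots < \theta_{C-1}
```

With $C = 2$ this reduces to the binary logit model, with $\pi(I) = P(Y(I) = 2)$ and intercept $\beta_0 = -\theta_1$.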