Download English ISI article no. 5816
Article title (translated from Persian)

Multi-document summarization based on an evolutionary optimization algorithm

English title
Multiple documents summarization based on evolutionary optimization algorithm
Article code: 5816
Publication year: 2013
Length of the English article: 15 pages (PDF)
Source

Publisher : Elsevier - Science Direct

Journal : Expert Systems with Applications, Volume 40, Issue 5, April 2013, Pages 1675–1689

Keywords (translated from Persian)
Multi-document summarization - Diversity - Content coverage - Optimization model - Differential evolution algorithm - Self-adaptive crossover
English keywords
Multi-document summarization, Diversity, Content coverage, Optimization model, Differential evolution algorithm, Self-adaptive crossover
Article preview

English abstract

This paper proposes an optimization-based model for generic document summarization. The model generates a summary by extracting salient sentences from documents. This approach uses the sentence-to-document collection, the summary-to-document collection and the sentence-to-sentence relations to select salient sentences from a given document collection and to reduce redundancy in the summary. An improved differential evolution algorithm was developed to solve the optimization problem. The algorithm adjusts the crossover rate adaptively according to the fitness of individuals. We implemented the proposed model on the multi-document summarization task. Experiments were performed on the DUC2002 and DUC2004 data sets. The experimental results provide strong evidence that the proposed optimization-based approach is a viable method for document summarization.
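
The abstract names three kinds of relation (sentence-to-collection, summary-to-collection and sentence-to-sentence) but this preview does not give the objective function itself. The Python sketch below is therefore only one plausible reading, not the paper's formula: it assumes bag-of-words vectors, cosine similarity, and a score of the form "coverage minus weighted redundancy". The names `bag_of_words`, `cosine`, `summary_score` and `redundancy_weight` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased term-frequency vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summary_score(sentences, selected, redundancy_weight=1.0):
    """Score a candidate summary given the indices of its sentences.

    Assumed objective (illustrative only, not taken from the paper):
      coverage   = sim(summary, collection)
                   + sum of sim(sentence, collection) over selected sentences
      redundancy = sum of sim(s_i, s_j) over pairs of selected sentences
      score      = coverage - redundancy_weight * redundancy
    """
    if not selected:
        return 0.0
    vectors = [bag_of_words(s) for s in sentences]
    collection = sum(vectors, Counter())                      # whole collection
    summary = sum((vectors[i] for i in selected), Counter())  # candidate summary

    coverage = cosine(summary, collection)
    coverage += sum(cosine(vectors[i], collection) for i in selected)

    redundancy = sum(
        cosine(vectors[i], vectors[j])
        for pos, i in enumerate(selected)
        for j in selected[pos + 1:]
    )
    return coverage - redundancy_weight * redundancy
```

Under these assumptions, picking two identical sentences is penalized by the redundancy term, so a pair of distinct sentences can score higher even when each covers less of the collection on its own.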

English introduction

Interest in text mining started with the advent of on-line publishing, the increased impact of the Internet and the rapid development of electronic government (e-government). With the exponential growth of information and communication technologies, a huge number of electronic documents are available online. This explosion of electronic documents has made it difficult for users to extract useful information from them. While the Internet has increased access to text collections on a variety of topics, consumers now face a considerable amount of redundancy in the texts that they encounter online. As a result, the sheer volume of information means that users leave many relevant and interesting documents unread. Thus, now more than ever, consumers need access to robust text summarization systems, which can effectively condense information found in several documents into a short, readable synopsis, or summary (Harabagiu and Lacatusu, 2010; Yang and Wang, 2008). A text mining approach is feasible and powerful for e-government digital archives. Digital archives have been built up at almost every level of the e-government hierarchy. Digital archives in the domain of e-government involve various media formats, such as video, audio and scanned documents. In fact, governmental documents are the most important product of e-government and contain the majority of the information on government affairs. The text mining approach described in Dong, Yu, and Jiang (2009) targets the text in the scanned documents. The mined knowledge is of considerable help to civil servants in policymaking, emergency decision support, and government routines. The successful application of the system to archives testifies to the correctness and soundness of this approach. Text summarization is a good way to condense a large amount of information into a concise form by selecting the most important information and discarding the redundant information.
According to Mani and Maybury (1999), automatic text summarization takes a partially structured source text from multiple texts written about the same topic, extracts information content from it, and presents the most important content to the user in a manner sensitive to the user's needs. Nowadays, search engines such as Google, Yahoo!, AltaVista, and others spare users from browsing large volumes of documents: they provide clusters of the documents users are interested in and briefly summarize each document, which makes it easier to find the desired documents (Boydell and Smyth, 2010; Shen et al., 2007; Song et al., 2011; Yang and Wang, 2008). Boydell and Smyth (2010) focus on the role of snippets in collaborative web search and describe a technique for summarizing search results that harnesses the collaborative search behavior of communities of like-minded searchers to produce snippets that are more focused on the preferences of the searchers. They go on to show how this so-called social summarization technique can generate summaries that are significantly better adapted to searcher preferences and describe a novel personalized search interface that combines result recommendation with social summarization. Depending on the number of documents, summarization techniques can be classified into two classes: single-document and multi-document (Fattah and Ren, 2009; Zajic et al., 2008). Single-document summarization can only condense one document into a shorter representation, whereas multi-document summarization can condense a set of documents into a summary. Multi-document summarization can be considered an extension of single-document summarization; it is used to describe precisely the information contained in a cluster of documents and to help users understand the document cluster.
Since it combines and integrates the information across documents, it performs knowledge synthesis and knowledge discovery, and can be used for knowledge acquisition (Zajic et al., 2008). In addition to single-document summarization, which was the first to be studied in this field and has been studied for years, researchers have started to work on multi-document summarization, whose goal is to generate a summary from multiple documents. The multi-document summarization task has turned out to be much more complex than summarizing a single document, even a very large one. This difficulty arises from the inevitable thematic diversity within a large set of documents. A multi-document summary can be used to describe concisely the information contained in a cluster of documents and to help users understand the document cluster.

English conclusion

For effective multi-document summarization, it is important to reduce redundant information in the summaries and to extract sentences which are common to the given documents. This paper presents a document summarization model which extracts key sentences from given documents while reducing redundant information in the summaries. The model is represented as a discrete optimization problem. To solve the discrete optimization problem we created a self-adaptive DE algorithm. We implemented our model on the multi-document summarization task. Our experiments have shown that the proposed model is to be preferred over other summarization systems. We also showed that the resulting summarization system based on the proposed optimization approach is competitive on the DUC2002 and DUC2004 data sets.
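
The preview states that the model is a discrete optimization problem solved by a self-adaptive DE algorithm, without giving the update rules. As a rough illustration only, the Python sketch below runs a standard DE/rand/1 scheme on real vectors in [0, 1], decodes them to 0/1 sentence-selection vectors by thresholding at 0.5, and sets each individual's crossover rate from its relative fitness. The adaptation rule, the thresholding, and all names (`adaptive_de`, `scale`, `cr_min`, `cr_max`) are assumptions made for this sketch, not the paper's algorithm, and no summary-length constraint is modelled.

```python
import random


def adaptive_de(fitness, dim, pop_size=20, generations=100,
                scale=0.5, cr_min=0.1, cr_max=0.9, seed=0):
    """Maximize `fitness` over binary vectors of length `dim`.

    Individuals are real vectors in [0, 1]; a component >= 0.5 is decoded
    as 1 ("select this sentence"). The crossover-rate rule is an assumed,
    illustrative one and is not taken from the paper.
    """
    rng = random.Random(seed)

    def decode(x):
        return [1 if v >= 0.5 else 0 for v in x]

    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(decode(x)) for x in pop]

    for _ in range(generations):
        best, worst = max(fit), min(fit)
        spread = best - worst
        for i in range(pop_size):
            # Assumed rule: weaker individuals get a higher crossover rate,
            # so they inherit more components from the mutant vector.
            rel = (best - fit[i]) / spread if spread > 0 else 0.5
            cr = cr_min + (cr_max - cr_min) * rel

            # DE/rand/1 mutation built from three distinct other members.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # always take one mutant component
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    v = pop[a][d] + scale * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][d])

            # Greedy selection: keep the trial if it is at least as fit.
            trial_fit = fitness(decode(trial))
            if trial_fit >= fit[i]:
                pop[i], fit[i] = trial, trial_fit

    k = max(range(pop_size), key=fit.__getitem__)
    return decode(pop[k]), fit[k]
```

Here `fitness` can be any function that scores a 0/1 selection vector, for example a coverage-minus-redundancy score over the sentences marked 1. A practical summarizer would also need to enforce a length limit, either inside `fitness` as a penalty or in the decoding step.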