Download English ISI article no. 156791
Article title (translated from Persian)

Combining similarity measures to identify paraphrases

English title
Combining sentence similarities measures to identify paraphrases
Article code : 156791
Year of publication : 2018
Length of English article : 15 pages (PDF)
Source

Publisher : Elsevier - ScienceDirect

Journal : Computer Speech & Language, Volume 47, January 2018, Pages 59-73

Keywords (translated from Persian)
Sentence similarity; Paraphrase identification; Sentence simplification; Graph-based model
English keywords
Sentence similarity; Paraphrase identification; Sentence simplification; Graph-based model

English abstract

Paraphrase identification is the task of determining whether two sentences are semantically equivalent. It is applied in many natural language tasks, such as text summarization, information retrieval, text categorization, and machine translation. In general, paraphrase identification methods perform three steps. First, they represent sentences as vectors using a bag of words or syntactic information about the words in the sentence. Next, this representation is used to compute different similarity measures between the two sentences. In the third step, these similarities are given as input to a machine learning algorithm that classifies the two sentences as a paraphrase or not. However, two important problems in paraphrase identification are not handled: (i) the meaning problem: two sentences may share the same meaning while being composed of different words; and (ii) the word order problem: the order of the words in a sentence may change its meaning. This paper proposes a paraphrase identification system that represents each pair of sentences as a combination of different similarity measures. These measures capture lexical, syntactic, and semantic components of the sentences, represented in a graph. The proposed method was benchmarked on the Microsoft Paraphrase Corpus, the publicly available standard dataset for the task. Different machine learning algorithms were applied to classify a sentence pair as a paraphrase or not. The results show that the proposed method outperforms state-of-the-art systems.
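
The generic three-step pipeline described in the abstract, and the two problems it leaves open, can be illustrated with a minimal Python sketch. This is not the paper's method: the function names, the choice of cosine and Jaccard similarity, and the fixed threshold standing in for a trained classifier are all assumptions made for illustration only.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Step 1: represent a sentence as a sparse vector of token counts."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the combined vocabulary."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return len(a.keys() & b.keys()) / len(union)


def similarity_features(s1, s2):
    """Step 2: turn a sentence pair into a small vector of similarities."""
    v1, v2 = bag_of_words(s1), bag_of_words(s2)
    return [cosine_similarity(v1, v2), jaccard_similarity(v1, v2)]


def classify(features, threshold=0.5):
    """Step 3 stand-in (hypothetical): threshold the mean similarity.
    A real system would train a machine learning classifier instead."""
    return sum(features) / len(features) >= threshold


# Word order problem: same words, different meaning, yet near-maximal similarity.
reordered = similarity_features("the dog bit the man", "the man bit the dog")

# Meaning problem: same meaning, different words, yet low similarity.
synonyms = similarity_features("he purchased a car", "he bought an automobile")

print("reordered:", reordered, "-> paraphrase?", classify(reordered))
print("synonyms: ", synonyms, "-> paraphrase?", classify(synonyms))
```

With purely lexical features, the reordered pair is labeled a paraphrase even though its meaning differs, while the synonym pair is rejected even though its meaning matches. These are the two failure modes that motivate combining lexical measures with syntactic and semantic ones.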