English ISI article No. 156509
Persian translation of the article title

طبقه بندی سوال در فارسی با استفاده از واژه های بردار و فرکانس

English title
Question classification in Persian using word vectors and frequencies
Article code: 156509
Publication year: 2018
Length of English article: 12 pages (PDF)
Source

Publisher : Elsevier - Science Direct

Journal : Cognitive Systems Research, Volume 47, January 2018, Pages 16-27


English abstract

Question Answering (QA) systems are needed because the enormous amount of unstructured data that people now create leaves search engines unable to return an exact answer to a given question. An outstanding QA system, however, requires an outstanding Question Classification (QC) system. A question classifier is a system that assigns a label to each question. The problem can be approached in several ways, including rule-based, machine-learning, and hybrid methods. This paper proposes an improved solution for QC using machine-learning approaches and introduces three methods of feature extraction. The first method uses clustering algorithms to partition the vocabulary into clusters and builds the feature vector for each question from the clustering information. The second method extracts features in a way that avoids recurrent neural networks in favor of feedforward neural networks, which learn faster and need less data. Each question is converted to a feature vector that is obtained with the Word2vec method and weighted by tf-idf coefficients. Question classification results with Support Vector Machine and Neural Network classifiers indicate the effectiveness of this type of feature vector and, consequently, the high performance of the proposed QC system. Finally, the third approach retains the idea behind the first approach but also takes into account that the data are sequential. The paper concludes that, even with a limited amount of data, it is reasonable to consider Recurrent Neural Networks.
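
The abstract does not say exactly how the Word2vec vectors and tf-idf coefficients are combined, so the following is only a minimal sketch under stated assumptions: each question is represented by the tf-idf weighted average of its word vectors. The `WORD_VECTORS` table is a hypothetical toy stand-in for trained Word2vec embeddings, the tokens are English placeholders rather than Persian, and the smoothed idf formula and the helper names `idf_table` and `question_vector` are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional vectors standing in for trained Word2vec embeddings.
WORD_VECTORS = {
    "who": [1.0, 0.0, 0.0],
    "where": [0.0, 1.0, 0.0],
    "is": [0.1, 0.1, 0.1],
    "the": [0.1, 0.1, 0.1],
    "author": [0.8, 0.0, 0.2],
    "capital": [0.0, 0.9, 0.1],
}


def idf_table(tokenized_questions):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(tokenized_questions)
    # Count each token once per question, however often it repeats inside it.
    doc_freq = Counter(tok for q in tokenized_questions for tok in set(q))
    return {tok: math.log((1 + n) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def question_vector(tokens, idf, vectors, dim=3):
    """tf-idf weighted average of the word vectors of one question."""
    tf = Counter(tokens)
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        if tok not in vectors or tok not in idf:
            continue  # skip tokens without a vector or an idf value
        weight = (count / len(tokens)) * idf[tok]
        weight_sum += weight
        for i in range(dim):
            total[i] += weight * vectors[tok][i]
    if weight_sum == 0.0:
        return total  # all-zero vector when nothing in the question is known
    return [x / weight_sum for x in total]


questions = [
    ["who", "is", "the", "author"],
    ["where", "is", "the", "capital"],
]
idf = idf_table(questions)
features = [question_vector(q, idf, WORD_VECTORS) for q in questions]
```

Each entry of `features` is a fixed-length vector regardless of question length, which is what allows a non-sequential classifier such as a Support Vector Machine or a feedforward network to consume it directly.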