English ISI article No. 79619
Article title
Term-weighting learning via genetic programming for text classification
Article code: 79619
Publication year: 2015
Length of English article: 14 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Knowledge-Based Systems, Volume 83, July 2015, Pages 176–189

Keywords
Term-weighting learning; Genetic programming; Text mining; Representation learning; Bag of words

Abstract

This paper describes a novel approach to learning term-weighting schemes (TWSs) in the context of text classification. In text mining, a TWS determines how documents are represented in a vector space model before a classifier is applied. Although acceptable performance has been obtained with standard TWSs (e.g., Boolean and term-frequency schemes), defining TWSs has traditionally been an art. Further, it is still difficult to determine the best TWS for a particular problem, and it is not yet clear whether schemes better than those currently available can be generated by combining known TWSs. We propose in this article a genetic program that aims at learning effective TWSs that can improve on the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units to give rise to discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification as well as from image classification. Our study shows the validity of the proposed method; in fact, we show that TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent works. Further, we show that TWSs learned from a specific domain can be effectively used for other tasks.
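
To make the idea of combining basic units into a TWS concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes two basic units (raw term frequency `TF` and inverse document frequency `IDF`) and a small, hypothetical operator set (`add`, `mul`, `sqrt`); the abstract does not list the actual units or operators. A TWS is written as an expression tree, which is the kind of object a genetic program could evolve. The evolutionary search and its fitness function are omitted.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, for illustration only).
docs = [
    "genetic programming evolves programs",
    "text classification uses term weighting",
    "term weighting for text mining and text retrieval",
]

def tokenize(doc):
    return doc.lower().split()

vocab = sorted({t for d in docs for t in tokenize(d)})

def tf_matrix(docs, vocab):
    """Raw term-frequency matrix: rows are documents, columns are terms."""
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([float(counts[t]) for t in vocab])
    return rows

def idf_vector(tf):
    """Inverse document frequency per term: log(N / df)."""
    n_docs, n_terms = len(tf), len(tf[0])
    return [
        math.log(n_docs / sum(1 for row in tf if row[j] > 0))
        for j in range(n_terms)
    ]

# Hypothetical operator set; an assumption, not taken from the paper.
OPERATORS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sqrt": lambda a: math.sqrt(abs(a)),
}

def evaluate(expr, units):
    """Evaluate an expression tree element-wise over the document-term matrix.

    A leaf is the name of a basic unit; an inner node is a tuple
    (operator_name, child, ...).
    """
    if isinstance(expr, str):
        return units[expr]
    op, *children = expr
    args = [evaluate(c, units) for c in children]
    fn = OPERATORS[op]
    n_docs, n_terms = len(args[0]), len(args[0][0])
    return [
        [fn(*(a[i][j] for a in args)) for j in range(n_terms)]
        for i in range(n_docs)
    ]

tf = tf_matrix(docs, vocab)
idf = idf_vector(tf)
# Every basic unit is stored as a full document-term matrix,
# so IDF is repeated once per document.
units = {"TF": tf, "IDF": [idf[:] for _ in tf]}

# The classic TF-IDF scheme, written as a tree.
tfidf = evaluate(("mul", "TF", "IDF"), units)
# One alternative candidate a search procedure might propose.
candidate = evaluate(("mul", ("sqrt", "TF"), "IDF"), units)
```

Each resulting matrix is a document representation that could be passed to a classifier. Comparing classifier performance across candidate trees is one plausible way to score them, though the abstract does not specify how the paper does this.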