Download English ISI article No. 157505
Article title

Response selection from unstructured documents for human-computer conversation systems
Article code: 157505
Publication year: 2018
Length of English article: 38 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Knowledge-Based Systems, Volume 142, 15 February 2018, Pages 149-159

Keywords
Human-computer conversation system; Human-robot interaction; Text matching; Natural language processing; Artificial intelligence; 00-01; 99-00
Article preview

Abstract

This paper studies response selection for human-computer conversation systems. Existing retrieval-based human-computer conversation systems reply to user utterances by drawing on existing utterance-response pairs. However, collecting sufficient utterance-response pairs is intractable in practical situations, especially for many specific domains. We introduce DocChat, a novel information retrieval approach for human-computer conversation systems that can use unstructured documents, rather than semi-structured utterance-response pairs, to react to user utterances. The key to DocChat is a learning-to-rank model, with features designed at various levels of granularity, that quantifies the relevance between utterances and responses directly. We conduct comprehensive experiments on both sentence selection and real human-computer conversation scenarios. Empirical studies on sentence selection datasets show reasonable improvements and strong adaptability of our model. We compare DocChat with Xiaoice, a famous open-domain chitchat engine in China. Side-by-side evaluation shows that DocChat is a good complement for human-computer conversation systems that use utterance-response pairs as the primary source of responses. Furthermore, we release a large-scale open-domain dataset for sentence selection, which contains 304,413 query-sentence pairs.
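The abstract describes selecting a response sentence from an unstructured document by scoring each candidate with features at several levels of granularity. The sketch below illustrates that general idea only; it is not DocChat's actual feature set or model. The two features (word-level and character-trigram Jaccard overlap), the function names, and the hand-set `WEIGHTS` are all assumptions made for illustration. A real learning-to-rank model would learn its weights from labeled query-sentence pairs.

```python
import re
from typing import List, Set, Tuple


def _jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_overlap(query: str, sentence: str) -> float:
    """Word-level feature: overlap between lowercase word sets."""
    q = set(re.findall(r"\w+", query.lower()))
    s = set(re.findall(r"\w+", sentence.lower()))
    return _jaccard(q, s)


def char_ngram_overlap(query: str, sentence: str, n: int = 3) -> float:
    """Character-level feature: overlap between character n-gram sets."""
    def grams(text: str) -> Set[str]:
        t = text.lower()
        return {t[i:i + n] for i in range(len(t) - n + 1)}
    return _jaccard(grams(query), grams(sentence))


# Hypothetical hand-set weights (assumption); a trained ranker would learn these.
WEIGHTS = (0.6, 0.4)


def rank_sentences(query: str, document: str) -> List[Tuple[float, str]]:
    """Split a document into sentences and rank them by weighted feature score."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = [
        (WEIGHTS[0] * word_overlap(query, s) + WEIGHTS[1] * char_ngram_overlap(query, s), s)
        for s in sentences
    ]
    # Highest score first; the top sentence is the candidate response.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


document = (
    "Paris is the capital of France. Bananas are yellow. "
    "The Eiffel Tower is in Paris."
)
ranked = rank_sentences("What is the capital of France?", document)
best_sentence = ranked[0][1]
```

Here `best_sentence` is the highest-scoring sentence of the document for the query. Surface-overlap features like these are only a stand-in; the point is the shape of the pipeline: split the document, compute several features per candidate, combine them into one relevance score, and sort.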