Dynamic Bayesian networks and variable length genetic algorithm for designing cue-based model for dialogue act recognition
Article code | Publication year | Pages in English article |
---|---|---|
29008 | 2010 | 29 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Computer Speech & Language, Volume 24, Issue 2, April 2010, Pages 190–218
English Abstract
The automatic recognition of dialogue acts is a task of crucial importance for the processing of natural language dialogue at the discourse level. It is also one of the most challenging problems, as most often the dialogue act is not expressed directly in the speaker’s utterance. In this paper, a new cue-based model for dialogue act recognition is presented. The model is, essentially, a dynamic Bayesian network induced from a manually annotated dialogue corpus via dynamic Bayesian machine learning algorithms. Furthermore, the dynamic Bayesian network’s random variables are constituted from sets of lexical cues selected automatically by means of a variable length genetic algorithm, developed specifically for this purpose. To evaluate the proposed design approaches, three stages of experiments have been conducted. In the initial stage, the dynamic Bayesian network model is constructed using sets of lexical cues selected manually from the dialogue corpus. The model is evaluated against two previously proposed models and the results confirm the potential of dynamic Bayesian networks for dialogue act recognition. In the second stage, the developed variable length genetic algorithm is used to select different sets of lexical cues to constitute the dynamic Bayesian networks’ random variables. The developed approach is evaluated against some of the previously used ranking approaches and the results provide experimental evidence of its ability to avoid the drawbacks of the ranking approaches. In the third stage, the dynamic Bayesian network model is constructed using random variables constituted from the sets of lexical cues generated in the second stage, and the results confirm the effectiveness of the proposed approaches for designing a dialogue act recognition model.
English Introduction
A dialogue act (hereafter DA) is a concise abstraction of a speaker’s intention: what the speaker is trying to achieve by an utterance. It has roots in several language theories of meaning, particularly speech act theory (Austin, 1962), which interprets any utterance as a kind of action, called a speech act, performed by a speaker, and groups these actions into speech act categories (Searle, 1975). The DA, however, extends the speech act by taking into account the context of the utterance (Bunt, 1994). Fig. 1 is a hypothetical dialogue annotated with DAs.
Fig. 1. Hypothetical dialogue annotated with DAs.
Dialogue act recognition (DAR) is a task of crucial importance for the processing of natural language dialogue at the discourse level in various applications such as dialogue systems, machine translation, speech recognition, and meeting summarisation. For example, it conditions a successful interpretation of the user’s utterance, which is the main function of the natural language understanding unit in dialogue systems. Formally, it is defined as follows: given an utterance with its preceding context, determine the DA it realises. On the other hand, the task is challenging because most often the DA is not expressed directly in the speaker’s utterance, and consequently the literal meaning of the utterance is not the intended meaning. For instance, a dialogue system without DAR ability interprets the utterance “Can you reserve three tickets for me?” as a user questioning its ability to reserve tickets, whereas the user’s actual intention is a request to reserve three tickets. Obviously, such a dialogue system is inadequate.
The literature on DAR indicates that the endeavours to model DAR started in the early seventies and resulted in two types of models (Jurafsky, 2004 and Jurafsky and Martin, 2000).
The models of the first type, known as plan-based models (Cohen and Perrault, 1979, Perrault and Allen, 1980 and Allen and Perrault, 1980), are based, essentially, on belief logic to infer the meaning of the utterance, which is then used to infer the DA in a subsequent stage, as depicted in Fig. 2. These models tend to be very costly in both the human labour needed to develop plan inference heuristics and the system time needed to run these heuristics (Jurafsky and Martin, 2000).
Fig. 2. Plan-based DAR model.
The models of the second type, known as cue-based models (Stolcke et al., 2000), are characterised by extensive use of Machine Learning (ML) approaches to automatically discover association rules between surface linguistic cues of utterances and DAs, as shown in Fig. 3. This particular aspect of cue-based models lifts the burden of manually designing the association rules from the human expert and makes these models more attractive from a computational point of view (Jurafsky, 2004).
Fig. 3. Cue-based DAR model.
Among the wide spectrum of ML approaches investigated for constructing cue-based models (Fishel, 2007), the statistical approaches are the most prominent, due to their distinctive properties of modularity and their ability to handle well both rules and exceptions to those rules. N-gram models (Reithinger and Klesen, 1997), classification and regression trees (Shriberg et al., 2000), hidden Markov models (Wright, 1998, Chu-Carroll, 1998 and Stolcke et al., 2000), naïve Bayes (NB) (Grau et al., 2004 and Ivanovic, 2005), static Bayesian networks (SBNs) (Keizer et al., 2002 and Keizer and op den Akker, 2007), and maximum entropy (Lesch, 2005) have been explored as statistical ML approaches.
Besides that, non-statistical ML approaches have also been investigated, such as artificial neural networks (Kipp, 1998), transformation-based learning (Samuel et al., 1998), decision trees (Verbree et al., 2006), and memory-based learning (Lendvai et al., 2003). Although the cue-based models, particularly the statistical ones, have proven remarkably successful in practical implementations of DAR, they still have some drawbacks which limit their recognition accuracy. These drawbacks stem from a failure to capture certain aspects of dialogue discourse and, consequently, show up as simplifying assumptions and tentative approaches to designing cue-based models. In what follows, a number of these drawbacks are highlighted.
• Inadequate representation of dialogue context: In the previous cue-based models, the representation of dialogue context is confined to a short span of previous DAs. However, dialogue theories suggest a detailed representation of dialogue context. For example, dynamic interpretation theory (Bunt, 2000) represents dialogue context in several dimensions, and one of these dimensions is the linguistic context, which involves all linguistic material of the surrounding utterances. For a cue-based DAR model, the linguistic context may involve DAs, lexical cues, or syntactic cues from previous utterances that are useful for recognising the DA of the current utterance.
• Intra-utterance and inter-utterance independence assumptions: In the previous cue-based models, two radical simplifying assumptions, both incorrect, are made. The first, the intra-utterance independence assumption, adopted in all models except SBN, assumes independence between cues extracted from the utterance that is being interpreted (Jurafsky and Martin, 2000). The second, the inter-utterance independence assumption, which can be observed in the hidden Markov model, assumes independence between cues from consecutive utterances. The second assumption was pointed out by Stolcke et al. (2000) and Clark (2003), who recommend further research to relax it.
• Inaccurate estimation of recognition accuracy: Except for the hidden Markov model, all these models assume that the previous utterances’ DAs are computed correctly or are known values, while in an actual scenario the previous utterances’ DAs are the results of previous estimation by either the DAR model or another component in the system.
• Suboptimal lexical cue selection approaches: Three approaches to lexical cue selection can be distinguished (Clark, 2003): language models, manual approaches, and automatic approaches. The language models treat all phrases as useful cues and represent them as a huge probability matrix, and consequently suffer from the effects of irrelevant phrases. The manual approaches produce general cues which cannot be used for all domains. The automatic lexical cue selection approaches, called ranking approaches, depend on statistical metrics to measure the relevancy of phrases to DAs (Samuel et al., 1999, Webb et al., 2005a, Lesch, 2005 and Kats, 2006; Verbree et al., 2006) and select the top k phrases. Despite their efficiency, they are suboptimal, because they do not account for the correlations between the selected cues. Furthermore, the selection strategy of the ranking approaches does not lead to exploiting negative lexical cues effectively. More precisely, they either ignore the negative lexical cues or treat them as positive lexical cues, despite the fact that, in an imbalanced dataset, the distribution of each type is different, and consequently each type can be exploited in a different manner to maximise its usefulness.
Motivated by these drawbacks and inspired by the work of Keizer and op den Akker (2007), we have proposed a new cue-based model for DAR. In their research, Keizer and op den Akker (2007) used ML approaches to build a cue-based DAR model in the form of an SBN, in order to deal with the uncertainty that is inherent in DAR.
In doing so, they made use of both lexical features extracted from the utterance being interpreted and context features in the form of the DAs of previous utterances. However, a review of their model reveals that it still has the common drawbacks mentioned above. More specifically, in the SBN model, the representation of dialogue context is confined to the DAs of the previous utterances, the DAs of the previous utterances are assumed to be known values, successive utterances are assumed independent, and the selection of the lexical cues is performed manually based on initial linguistic intuitions. Based on these observations, the new cue-based model for DAR has been proposed. The proposed model adopts dynamic Bayesian networks (DBNs) for DAR and employs a variable length genetic algorithm (VLGA) for automatic selection of lexical cues. DBNs allow a rich representation of dialogue context, relax the intra-utterance and inter-utterance independence assumptions, and represent the previous utterances’ DAs as a joint probability distribution, instead of assuming them to be known values. Moreover, the VLGA approach handles the drawbacks of the ranking approaches by accounting for the correlation between the selected lexical cues and by providing a framework in which the negative lexical cues can be exploited effectively.
The remainder of this paper is organised as follows. Section 2 describes how the DBN is used as a cue-based DAR model. In Section 3, the ranking approaches used for lexical cue selection are reviewed and their drawbacks are highlighted. How the VLGA is employed to perform lexical cue selection is also explained in Section 3. The results of the different stages of experiments conducted to evaluate the proposed approaches are presented and discussed in Section 4. Section 5 is devoted to the conclusion and future extensions.
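To make the idea of selecting cue sets with a variable length genetic algorithm more concrete, the following is a minimal Python sketch. It is not the authors' VLGA: the toy corpus, the cue-voting fitness function, the operators (`crossover`, `mutate`, `repair`), the `MAX_LEN` bound and all parameter values are assumptions introduced only for illustration. What it does show is the general principle that a whole cue set is scored at once, so interactions between cues influence selection, and that chromosome length is free to change between generations.

```python
import random
from collections import Counter

# Hypothetical toy corpus: (utterance tokens, dialogue act label).
CORPUS = [
    ("can you reserve three tickets".split(), "request"),
    ("could you book a table".split(), "request"),
    ("please reserve a seat".split(), "request"),
    ("what time does it leave".split(), "question"),
    ("where is the station".split(), "question"),
    ("what is the price".split(), "question"),
    ("yes that is fine".split(), "accept"),
    ("okay that works".split(), "accept"),
]
VOCAB = sorted({w for toks, _ in CORPUS for w in toks})
MAX_LEN = 6  # assumed upper bound on the number of cues in a chromosome


def fitness(cues):
    """Accuracy of a simple cue-voting classifier restricted to `cues`."""
    votes = {}
    for cue in cues:
        # Each cue votes for the DA it co-occurs with most often.
        counts = Counter(da for toks, da in CORPUS if cue in toks)
        if counts:
            votes[cue] = counts.most_common(1)[0][0]
    default = Counter(da for _, da in CORPUS).most_common(1)[0][0]
    correct = 0
    for toks, da in CORPUS:
        ballot = Counter(votes[w] for w in toks if w in votes)
        pred = ballot.most_common(1)[0][0] if ballot else default
        correct += pred == da
    return correct / len(CORPUS)


def repair(cues, rng):
    """Drop duplicate cues, cap the length, and never return an empty set."""
    unique = list(dict.fromkeys(cues))[:MAX_LEN]
    return unique or [rng.choice(VOCAB)]


def crossover(a, b, rng):
    """Independent cut points, so the child length may differ from both parents."""
    cut_a = rng.randint(0, len(a))
    cut_b = rng.randint(0, len(b))
    return repair(a[:cut_a] + b[cut_b:], rng)


def mutate(cues, rng):
    """Add, drop, or replace one cue; falls back to replace if the length bound blocks the move."""
    cues = list(cues)
    op = rng.choice(["add", "drop", "replace"])
    if op == "add" and len(cues) < MAX_LEN:
        cues.append(rng.choice(VOCAB))
    elif op == "drop" and len(cues) > 1:
        cues.pop(rng.randrange(len(cues)))
    else:
        cues[rng.randrange(len(cues))] = rng.choice(VOCAB)
    return repair(cues, rng)


def select_cues(generations=30, pop_size=20, seed=0):
    """Evolve variable-length cue sets and return the fittest one found."""
    rng = random.Random(seed)
    population = [
        rng.sample(VOCAB, rng.randint(1, MAX_LEN)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = elite + children
    return max(population, key=fitness)
```

Calling `best = select_cues()` and then `fitness(best)` returns one evolved cue set and its accuracy on the toy corpus. Because fitness is computed on the same eight utterances used to derive the votes, the number only illustrates the search mechanics and says nothing about real recognition accuracy.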
English Conclusion
In this paper, a new cue-based model for DAR has been presented. It is based essentially on DBN and VLGA, which have been evaluated in three stages of experiments that allow us to draw a number of conclusions. In the first place, the results confirm the potential of using DBNs to model DAR. The distinctive features of the DBN model make it promising and applicable in real-world systems. Another conclusion pertains to the maximum number of DBN time slices needed to maximise DAR accuracy. The experiments confirm that two time slices are enough to enable DBNs to give the highest recognition accuracy. While going beyond two time slices in Blf models neither improves the recognition accuracy nor affects it negatively, using more than two time slices in Flf models gradually reduces the recognition accuracy. Finally, the results of the initial DBN model confirm the relevancy of the lexical cues to the recognition of Flf DAs rather than Blf DAs, and the relevancy of the contextual features, particularly previous Flf DAs, to the recognition of Blf DAs.
The results of the lexical cue selection experiments suggest a number of important conclusions. First, the ranking approaches are not optimal for lexical cue selection in DAR and similar high-dimensional domains where the number of features is huge and the features are highly correlated. Second, the results confirm the inability of the ranking approaches to exploit the negative features in imbalanced data domains. Third, the ability of the proposed VLGA approach to account for the correlation between the selected cues and to exploit the negative cues makes it efficient for the selection of useful cues in high-dimensional domains. A general conclusion that can be drawn from the lexical cue selection experiments is the importance of selecting more informative cues for improving the recognition accuracy of the DBN models.
The proposed lexical cue selection approach is better informed and its use for DBN models is more systematic. As a result, all the DBN models in the final stage of experiments have shown a remarkable improvement in recognition accuracy, particularly for Flf DAs.
Several directions can be explored to improve the recognition accuracy of the DBN model or to investigate it in real-world applications. Firstly, for the sake of generalisation, the proposed model should be tested on different dialogue corpora from different domains and annotated with different types of annotation schemes. Secondly, in the presented DBN models, there is one factor that could give rise to improved accuracies, that is, the dialogue structure. A dialogue is structured into subdialogues, such as subdialogues for opening or verification, or subdialogues related to the task at hand. In this research the dialogue structure has not been taken into account. Thirdly, further experiments are needed to show how the proposed model can be used in dialogue systems in a more general sense than just for the specific task of DAR. Finally, the DBN models that were discussed extensively in this study focus only on the recognition of DAs based on some relevant superficial linguistic features and contextual information. Thus other features, such as the beliefs and preferences of the speaker, may also be incorporated into the DBN models.