Human and computer recognition of facial expressions of emotion
Publisher: Elsevier - Science Direct
Journal: Neuropsychologia, Volume 45, Issue 1, 2007, Pages 152–162
Abstract

Neuropsychological and neuroimaging evidence suggests that the human brain contains facial expression recognition detectors specialized for specific discrete emotions. However, some human behavioral data suggest that humans recognize expressions as similar and not discrete entities. This latter observation has been taken to indicate that internal representations of facial expressions may be best characterized as varying along continuous underlying dimensions. To examine the potential compatibility of these two views, the present study compared human and support vector machine (SVM) facial expression recognition performance. Separate SVMs were trained to develop fully automatic optimal recognition of one of six basic emotional expressions in real-time with no explicit training on expression similarity. Performance revealed high recognition accuracy for expression prototypes. Without explicit training of similarity detection, magnitude of activation across each emotion-specific SVM captured human judgments of expression similarity. This evidence suggests that combinations of expert classifiers from separate internal neural representations result in similarity judgments between expressions, supporting the appearance of a continuous underlying dimensionality. Further, these data suggest similarity in express
Introduction

The premise that emotions are discrete entities with distinct physiological signatures dates back to Charles Darwin's observations of continuity in prototypical displays of emotion across animal species (Darwin, 1872). Darwin speculated that displays across species mapped onto such emotion states as pain, anger, astonishment, and terror. In revisiting Darwin's observations, the universality of emotions was examined in cross-cultural human studies in which participants were asked to identify (Ekman & Friesen, 1971) and pose (Ekman, 1972) facial expressions associated with emotion-specific described contexts. A primary set of basic emotions was identified with characteristic facial signatures that had substantial cross-cultural expression and recognition (Ekman & Friesen, 1971). Thus emotional experience and expression have been characterized as a set of discrete dimensions coding activation of specific states, such as fear, anger, sadness, or happiness (Ekman, 1992). More complex emotions, like love, may arise from secondary mixtures of these proposed basic prototypes. Basic emotions would then provide the palette from which more complex emotions are mixed (Plutchik, 1980).

Behavioral evidence from forced choice recognition of morphs between prototypical expressions demonstrates non-linearities consistent with categorical perception, implying the existence of discrete expression categories (Calder, Young, Perrett, Etcoff, & Rowland, 1996; Etcoff & Magee, 1992; Young et al., 1997). Neuropsychological and neuroimaging evidence is likewise consistent with neurally localized discrete representations of facial expressions. Damage to the amygdala differentially impairs fear recognition while leaving recognition of other discrete emotions, such as disgust, largely intact, whereas damage to the anterior insula differentially impairs disgust recognition but leaves fear recognition intact (Adolphs et al., 1999; Phillips et al., 1998).
Convergent evidence from functional neuroimaging demonstrates that fear expressions maximally activate the amygdala while disgust expressions maximally activate the anterior insula (Anderson, Christoff, Panitz, De Rosa, & Gabrieli, 2003; Phillips et al., 1998). Similarly, discrete neural representations have recently been proposed for recognition of anger in the ventral striatum (Calder, Keane, Lawrence, & Manes, 2004). Finding such dissociations in recognition for a variety of basic prototypes would provide further evidence for their status as primaries on which emotional experience and communication depend.

The alternative view suggests that emotion space is characterized by lower order dimensions, such that emotions are fuzzy categories clustered on axes such as valence, arousal, or dominance (Russell, 1980; Russell & Bullock, 1986; Schlosberg, 1952). Thus emotions can be understood according to their relatively continuous ordering around a circumplex characterized by a few underlying dimensions. In these models, recognizing facial expressions relies on an ability to find the nearest cluster to the current exemplar in this continuous, low dimensional space rather than by matching to basic emotion prototypes. Behavioral evidence is consistent with some form of lower order dimensional representation of emotions, whereby some emotion types (e.g., anger and disgust) are closer than others (e.g., sadness and happiness) in emotion space. As such, expression judgments tend to overlap, indicating that emotion categories are not entirely discrete and independent. Proximity of particular expression exemplars (e.g., anger) to other expression exemplars (e.g., disgust) is tightly clustered across individuals, reflecting the possibility that categorization tasks force boundaries to be drawn in the lower dimensional expression space.
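The nearest-cluster idea behind the dimensional account can be illustrated with a minimal sketch. It assumes a two-dimensional (valence, arousal) space, and the cluster centres in `CENTROIDS` are hypothetical values chosen only for illustration; neither the coordinates nor the function names come from the study.

```python
import math

# Hypothetical cluster centres in a 2-D (valence, arousal) space.
# These coordinates are illustrative assumptions, not values from the paper.
CENTROIDS = {
    "happiness": (0.8, 0.4),
    "surprise": (0.1, 0.9),
    "fear": (-0.5, 0.8),
    "anger": (-0.7, 0.5),
    "disgust": (-0.8, 0.2),
    "sadness": (-0.7, -0.5),
}


def ranked_clusters(point, centroids=CENTROIDS):
    """Return emotion labels ordered from nearest to farthest cluster centre."""
    return sorted(centroids, key=lambda label: math.dist(point, centroids[label]))


def nearest_cluster(point, centroids=CENTROIDS):
    """Classify an exemplar by the closest cluster centre (Euclidean distance)."""
    return ranked_clusters(point, centroids)[0]


# A hypothetical exemplar lying between the anger and disgust centres.
exemplar = (-0.75, 0.4)
ranking = ranked_clusters(exemplar)
print(nearest_cluster(exemplar), ranking[:2])
```

In such a scheme the category label is just the closest centre, and the rest of the ranking expresses graded similarity to neighboring categories, which is the property the dimensional account emphasizes.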
In contrast with these lower order dimension theories, basic prototype accounts do not make explicit the similarity relationships between the basic emotions, as they do not explain the tight or distant clustering between expression types.

Although integrating behavioral accounts with neuropsychological and neuroimaging studies provides important data towards explaining emotion space, progress in the field of machine perception and machine learning offers an opportunity to test the computational consequences of different representational theories. Such an approach also affords examining the extent to which recognition of emotional expressions directly reflects the statistical structure of the images to which humans are exposed. Interest in facial expression recognition has been evolving in computer science as researchers focus on building socially interactive systems that attempt to infer the emotional state of users (Fasel et al., 2004). Progress in computer facial expression analysis has only just begun to contribute to understanding the information representations and brain mechanisms involved in facial emotion perception, because approaches from the various disciplines have not been integrated and closely compared with human recognition data.

Machine learning approaches to facial expression recognition provide a unique opportunity to explore the compatibility or incompatibility of different theories of emotion representation. To the degree that human data on facial expression recognition are consistent with basic prototype accounts, it is unclear whether such representations can support the similarity relationships between the basic emotions, as do models that describe emotions in terms of a small number of underlying dimensions. We addressed this issue in the present study by comparing human behavioral data to a computer model that was trained to make a seven-way forced choice between basic expressions plus neutral faces.
The system was developed by machine learning methods with the sole goal of providing strong expression discrimination performance by developing distinct expert classifiers for different basic emotions. No attempt was made to fit human data. In the model, support vector machine (SVM) classifiers were each trained to maximally discriminate a particular emotion. In contrast to traditional back-propagating neural networks that minimize the training error between network output and target for each training example (e.g., Dailey, Cottrell, Padgett, & Adolphs, 2002), SVMs learn an optimal decision boundary between two labeled classes by focusing on difficult training examples (Burges, 1998). This method finds features that maximally separate decision boundaries, resulting in a high level of discrimination performance between expression types and minimizing false alarms to non-target expressions. Each expert is trained independently from all the other experts, and then their opinions are integrated. The extent to which such a computer model of expression recognition correlates with human judgments of expression similarity will be a strong test of whether separate internal representations can support similarity judgments attributed to continuous underlying dimensions. Such a comparison can provide important computational constraints on how emotional expression recognition may take place in the human brain.
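Two ideas in the description above can be made concrete with a small sketch: the standard soft-margin hinge loss, under which only examples inside or on the wrong side of the margin contribute (the "difficult" examples), and the integration of independently defined experts by comparing their activations. This is not the authors' implementation; the linear form of the experts, the two-dimensional features, and all weights below are hypothetical.

```python
def decision_value(w, b, x):
    """Signed activation of a linear expert: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_losses(w, b, X, y):
    """Per-example hinge loss max(0, 1 - y * f(x)), with labels y in {-1, +1}.

    Examples classified correctly with a margin of at least 1 get zero loss,
    so only the hard examples influence where the boundary is placed.
    """
    return [max(0.0, 1.0 - yi * decision_value(w, b, xi)) for xi, yi in zip(X, y)]


def integrate_experts(experts, x):
    """Evaluate every expert on x; return the winning label and the full profile."""
    profile = {label: decision_value(w, b, x) for label, (w, b) in experts.items()}
    return max(profile, key=profile.get), profile


# Hypothetical "fear versus rest" expert and three toy examples.
losses = hinge_losses((1.0, 0.0), 0.0, [(3, 0), (0.5, 0), (-2, 0)], [+1, +1, -1])

# Hypothetical experts, each defined independently of the others.
experts = {
    "fear": ((1.0, 0.2), -0.5),
    "surprise": ((0.8, 0.6), -0.4),
    "happiness": ((-1.0, 0.1), 0.0),
}
winner, profile = integrate_experts(experts, (1.0, 1.0))
print(losses, winner)
```

Keeping the whole activation profile, rather than only the winner, is what allows similarity between expressions to be read from a set of classifiers that were never trained on similarity.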
English conclusion
3. Results

3.1. Discrimination performance

Computer model outputs were measured on generalization to untrained exemplars on which human subjects made their judgments. As illustrated in Fig. 4, standardized ratings for each of the target emotions (ratings for humans and SVM activations) demonstrate that the model performed comparably to human ratings for all expressions (falling within 1 standard deviation). For both human and model judgments, consistent with accurate discrimination, the target expression received the highest average ratings for each expression type (i.e., surprise ratings were highest for surprise, sadness ratings highest for sad, etc.).

Fig. 4. Standardized target emotion ratings (e.g., anger ratings for angry faces) for human subjects and SVM activations for the computer model averaged over exemplars. Means for each subject are plotted as points and the overall human subject mean is represented by a horizontal line. Mean standard ratings for the computer model are indicated by a triangle.

As a different index of discrimination performance, the continuous data were converted into a forced choice format by defining accuracy as the proportion of exemplars on which the maximal response was for the target prototypical label. Humans correctly classified the target expressions with differing degrees of accuracy (mean = 89.2%, S.D. = 0.17). Happiness (mean = 98.4%, S.D. = 0.04), followed by surprise (92.9%, 0.10) and sadness (91.8%, 0.10), were discriminated most accurately, followed by anger (88.0%, 0.16), fear (84.8%, 0.17) and disgust (79.3%, 0.30). An ANOVA demonstrated statistically reliable differences in accuracy across expression types, F(5, 132) = 3.71, p < 0.005, consistent with expression recognition success differing across expression type. The computer model showed good generalization performance on the untrained exemplars (mean = 79.2%, S.D. = 0.292). Similar to human performance, accuracy was highest for happiness (100%), sadness (100%) and surprise (100%), with less accurate performance on anger (75%) and disgust (75%) and relatively poor performance on fear (25%). Despite the model's high average ratings of fear for the fear prototypes, the low forced choice performance for fear expressions reflected an overlap with surprise and sadness ratings. Fear expressions were classified as surprise 62.5% of the time, and as sadness 12.5% of the time. The low accuracy for fear relative to the other expressions is consistent with evidence that fear recognition is particularly difficult (Rapcsak et al., 2000).

Inspection of Fig. 4 revealed that, rather than all human perceivers demonstrating identical expression recognition performance, there was substantial variability across subjects, particularly for anger, disgust and fear. We used principal component analysis (PCA) to explore the patterns of accuracy variability across subjects and the model. The PCA two-factor solution indicated two clear clusters, each containing 7 of the 23 subjects. As shown in Fig. 5a, the cluster that contained the computer model was characterized by subjects who had difficulty with fear. The other cluster, not containing the model, was defined by difficulty in classifying disgust (Fig. 5b). Thus, to the degree that the model differs from idealized mean group performance, it also behaved similarly to a major subgroup of human participants.

Fig. 5. Target emotion forced choice accuracy for two clusters of human subjects identified by MDS. Each human subject is represented by a different filled shape. (a) Subjects that consistently rate fear lower than the other expressions. The computer model (open circles) fits this accuracy pattern. (b) Unlike the model, some subgroups of subjects consistently rate disgust lower than the other expressions.
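The forced choice conversion and the model's summary statistics reported in Section 3.1 can be checked with a short sketch. The per-exemplar ratings below are hypothetical: the reported 25%, 62.5% and 12.5% are consistent with eight fear exemplars (2, 5 and 1 of them respectively), which is an inference rather than a figure stated in the text. The degrees-of-freedom line likewise shows only one reading that reproduces F(5, 132), assuming one accuracy score per subject and expression.

```python
from statistics import mean, stdev

LABELS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def forced_choice(ratings):
    """Return the label with the maximal response for one exemplar."""
    return max(ratings, key=ratings.get)


def choice_proportions(exemplars):
    """Proportion of exemplars assigned to each chosen label."""
    choices = [forced_choice(r) for r in exemplars]
    return {label: choices.count(label) / len(choices) for label in set(choices)}


# Hypothetical ratings for eight fear exemplars: fear is always rated highly,
# but surprise or sadness is sometimes rated slightly higher.
winners = ["fear"] * 2 + ["surprise"] * 5 + ["sadness"]
fear_exemplars = [
    {label: (1.0 if label == w else 0.8 if label == "fear" else 0.0) for label in LABELS}
    for w in winners
]
props = choice_proportions(fear_exemplars)

# Model accuracies per expression, as reported in the text.
model_accuracy = {"happiness": 1.0, "sadness": 1.0, "surprise": 1.0,
                  "anger": 0.75, "disgust": 0.75, "fear": 0.25}
model_mean = round(mean(model_accuracy.values()) * 100, 1)
model_sd = round(stdev(model_accuracy.values()), 3)

# One reading of the ANOVA degrees of freedom (an assumption about the design).
n_expressions, n_subjects = 6, 23
df = (n_expressions - 1, n_expressions * n_subjects - n_expressions)

print(props, model_mean, model_sd, df)
```

The sample standard deviation over the six per-expression accuracies reproduces the reported 0.292, which suggests the S.D. values are on a proportion scale while the means are given as percentages.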
3.2. Similarity performance

We next examined whether the model's appreciation of expression similarity was comparable to that of human observers. Similarity of exemplars in terms of average subject ratings across expression types was computed and visualized using multidimensional scaling (MDS) analyses of the human data (average rating for each exemplar on the six emotion scales). The same analysis was performed for the computer model. Human and computer MDS plots were then compared for similarity of the relative positions of exemplars on a circumplex across the six basic expressions.

3.2.1. Trained exemplars

Human rating norms from the original POFA rating study (Ekman & Friesen, 1976) were compared to SVM ratings using MDS. Focusing on training exemplars allowed examination of similarity on data where discrimination between expression classes was most accurate. The MDS circumplex for the human ratings, shown in Fig. 6a, demonstrates that each expression class is clustered tightly together with no overlap between adjacent classes. In addition, exemplars were clustered in a characteristic order, replicating MDS analyses in previous studies (Adolphs, Damasio, Tranel, Cooper, & Damasio, 2000; Dailey, Cottrell, & Adolphs, 2000; Dailey et al., 2002). Although more diffuse than mean human performance, highly distinct clusters were also formed in the computer model (Fig. 6b). Critically, the ordering of the clusters and their relative positions were identical to those of human observers. For example, anger exemplars were rated between sadness and disgust, surprise was between happiness and fear, and sadness was rated maximally distant from happiness. MDS for human and computer model data resulted in similar levels of stress (Stress-I) in two-dimensional solutions (0.256 versus 0.257).
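The stress values quoted above measure how well a two-dimensional configuration preserves the dissimilarities between exemplars. The text does not give the formula or the distance measure used, so the following sketch rests on assumptions: Euclidean distances between six-scale rating profiles as dissimilarities, and Kruskal's Stress-1 in its metric form. The profiles and configurations are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def stress_1(dissimilarities, config):
    """Kruskal's Stress-1: sqrt(sum((delta - d)^2) / sum(d^2)).

    delta are the target dissimilarities and d the distances in the
    low-dimensional configuration (metric form, an assumption here).
    """
    d = pairwise_distances(config)
    num = sum((delta - dij) ** 2 for delta, dij in zip(dissimilarities, d))
    den = sum(dij ** 2 for dij in d)
    return math.sqrt(num / den)


# Hypothetical rating profiles of three exemplars on six emotion scales.
profiles = [
    (0, 0, 0, 0, 0, 0),
    (3, 0, 0, 0, 0, 0),
    (0, 4, 0, 0, 0, 0),
]
dissimilarities = pairwise_distances(profiles)  # [3.0, 4.0, 5.0]

exact_stress = stress_1(dissimilarities, [(0, 0), (3, 0), (0, 4)])
distorted_stress = stress_1(dissimilarities, [(0, 0), (3, 0), (0, 3)])
print(exact_stress, round(distorted_stress, 3))
```

A stress of zero means the configuration reproduces every dissimilarity exactly; larger values mean more distortion, so comparable stress for the human and model solutions indicates that two dimensions summarize both data sets about equally well.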
Where supervised training achieved maximal discrimination of expression types, a secondary, unsupervised aspect of performance was the model's capture of the similarity between expression types.

Fig. 6. MDS plots of similarity between exemplars of different emotions from the POFA training dataset. (a) Human rating norms. (b) Computer model activations.

3.2.2. Untrained exemplars

We next assessed similarity performance on exemplars not in the model's training set. Human and computer ratings were again converted to standard scores for comparison. MDS on the human ratings verified that the circumplex ordering matched the above reported human norms for POFA (see Fig. 7a). Adjacent clusters were no longer equidistant; angry exemplars fell in close proximity to disgust exemplars while fear exemplars fell close to surprise exemplars, suggesting greater perceived similarity in these expression pairings in comparison to the POFA images.

Fig. 7. MDS plots of similarity between exemplars of different emotions from the JACFEE dataset. (a) Human ratings averaged across all 23 subjects. (b) Human ratings for subjects in two characteristic clusters of subject rating patterns (see Fig. 5). The first column shows ratings for two subjects with low accuracy for fear. The second column shows ratings for two subjects with low accuracy for disgust. (c) Computer model activations.

Our above finding of individual differences in discrimination of fear and disgust expressions may be due in part to the perceived similarity with adjacent clusters on the circumplex. To address this further, we examined MDS solutions on subjects who formed the two major sub-clusters in discrimination performance reported above (see Fig. 5). As illustrated in Fig. 7b, individuals reveal different clustering from idealized mean performance, with much less separation of expression types, such as fear and surprise, or disgust and anger. This demonstrates that ordering and clustering on the circumplex is somewhat variable and that averaging over subjects reveals a stronger tendency towards clustering than may be present in individual subjects.

When MDS was applied to the computer model, despite overall similarity in the circumplex solution, generalization to new exemplars revealed looser clustering of exemplars and more overlap between expression types than the mean human ratings, as depicted in Fig. 7c. However, the model's performance appears more similar to individual subjects' performance, in particular those with less pronounced discrimination of fear (Fig. 7b). Critically, the circumplex for the computer ratings followed the same order as the human circumplex, demonstrated in both group and individual subject data. In particular, where the computer model fails to define distinct clusters, it largely captures the similarity amongst exemplar types in humans. MDS solutions for human and computer model data resulted in similar levels of stress with a two-dimensional projection (0.157 versus 0.221).

Despite the sparser clustering found in the computer model relative to average human data, the correlation coefficient between human and computer judgments across expression types was very high (r = 0.80, p < 0.001), suggesting a great deal of similarity in the rating patterns. Examining how well the activation of distinct expert SVMs (anger, fear, disgust, etc.) corresponded to humans, we found that specific correlations for each expression type were consistently high (anger, r = 0.96; sadness, r = 0.94; happiness, r = 0.89; fear, r = 0.85; surprise, r = 0.83; disgust, r = 0.60). For example, as illustrated in Fig. 8 for fear expressions, humans and SVM experts agreed upon fear as the target expression, and also rated surprise as the most similar relative to the other expression types. The model's capturing of the similarity between fear and surprise underlies its poor discrimination of fear, often providing false alarms to surprise. Similarly, with anger expressions, humans and SVMs agreed upon angry faces as the target expression, and rated disgust as the most similar relative to the other expression types.

Fig. 8. Comparison of human and computer rating profiles. (a) Profile comparison averaged over anger exemplars. (b) Profile comparison averaged over fear exemplars. The x-axis is rank-ordered by human ratings and thus the label order differs between (a) and (b).
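The profile comparison described above, standard scores followed by a correlation between human ratings and SVM activations, can be sketched as follows. The two profiles are hypothetical numbers for a single fear exemplar, chosen only to show the computation; the rating scale and activation range are assumptions, and the sketch does not reproduce the reported correlations.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Convert raw scores to standard (z) scores using the sample S.D."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def most_similar_nontarget(profile, target):
    """Label with the highest rating once the target expression is excluded."""
    others = {label: v for label, v in profile.items() if label != target}
    return max(others, key=others.get)


# Hypothetical profiles for one fear exemplar: human ratings on a rating
# scale, model output as signed SVM activations (different units).
human = {"fear": 6.1, "surprise": 4.0, "sadness": 2.2,
         "anger": 1.8, "disgust": 1.5, "happiness": 1.0}
model = {"fear": 1.4, "surprise": 0.9, "sadness": -0.3,
         "anger": -0.6, "disgust": -0.8, "happiness": -1.2}

labels = list(human)
h = [human[k] for k in labels]
m = [model[k] for k in labels]
r_raw = pearson_r(h, m)
r_std = pearson_r(standardize(h), standardize(m))
print(round(r_raw, 3), most_similar_nontarget(human, "fear"),
      most_similar_nontarget(model, "fear"))
```

Standardizing puts ratings and activations on a common scale for plotting, while the correlation itself is unchanged by it; the runner-up label shows which expression each rater treats as most similar to the target.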