Multivariate analysis of human behavior data using fuzzy windowing: Example with driver–car–environment system
|Article code||Year of publication||English article||Persian translation||Word count|
|28125||2012||8-page PDF||Order||Not calculated|
Publisher: Elsevier - Science Direct
Journal : Engineering Applications of Artificial Intelligence, Volume 25, Issue 5, August 2012, Pages 989–996
In most human component system studies performed in simulators, several factors (or independent variables; at least two, i.e., individual and time) and many variables (or dependent variables) are present, so large and complex databases have to be analyzed. Rather than relying on largely automatic procedures, this article suggests that, at least for a first analysis, a human analyst must be involved and must choose a method adapted to the data, which is different from running a method on the assumption that the data fit some given model. This article suggests starting the analysis while keeping both the multifactorial (MF) and multivariate (MV) aspects. To achieve this aim, and to make it possible to show nonlinear relationships, an MFMV exploration of the experimental database is performed using the pair (fuzzy space windowing, Multiple Correspondence Analysis). An inferential analysis may then follow. This procedure, which is long (because of the many large graphical views) but rich, is illustrated and discussed using a car driving study example.
Most empirical system studies yield large and complex databases: large because data is recorded with high-frequency sampling (compared to the system dynamics) and/or over a long time period, and complex because data is heterogeneous. This is particularly true in the areas of human component system studies (psychology, medicine, ergonomics or sociology), with the possible presence of different scale mathematical models (nominal, ordinal and quantitative) (Stevens, 1974), of objective and subjective origins, and of time and not-time variables. Whatever the strategy employed in conducting studies (observational, experimental or correlational method) (Sheskin, 2007), the disciplinary fields focusing on data exploitation (e.g., Statistics, Pattern Recognition, Signal Analysis, Data Mining, Artificial Intelligence) propose many methods. Even though these fields are not independent, here are some possible Taxonomic Dimensions (TD) that are often cited:
– TD1, descriptive vs. inferential methods (Sheskin, 2007),
– TD2, monovariate vs. multivariate methods (Jobson, 1991),
– TD3, time vs. not-time methods (Fitzmaurice et al., 2004),
– TD4, supervised vs. unsupervised methods (Pal and Pal, 2001),
– TD5, fuzzy vs. ordinary sets based methods (Komen and Schneider, 2005),
– TD6, probability vs. possibility theory methods (Dubois and Prade, 1988),
– TD7, connectionist vs. analytic methods (Silipo, 2007).
Conceiving and executing an empirical study is a long and complex task, so that as soon as all the time data sets have been recorded (or maybe one time dataset only), the system analyst may be in a hurry to get results. Then, for reasons of speed, or for lack of knowledge or curiosity, the analyst might choose a fast and well-known data analysis path.
For instance, in the case of a study performed using an experimental design (e.g., a chemical process in an industrial study), the researcher may want to test quickly the statistical hypotheses that the experiment was primarily designed to test. The same behavior may occur in the case of a study performed using an observational design (e.g., a system including users and designers in a web use analysis over several years). Faced with a large and complex database, rather than relying on largely automatic procedures, we suggest that, at least for a first analysis, a human analyst must be involved and must choose a method adapted to the data, which is different from running a method on the assumption that the data fit some given model. For instance, with many signals, running an arithmetic mean program to summarize time data has little meaning if the signals present sudden changes or a monotone evolution. In the same way, with experimental design data, running the usual analysis of variance program has little meaning if the data is very far from the Laplace–Gauss model or if the variances related to the factor levels are very different. Our point of view is to start the data analysis using knowledge and graphics, as follows.
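The remark about the arithmetic mean can be illustrated with a minimal sketch. The three signals below are synthetic and invented for this illustration (they are not data from the study): a stationary signal, a signal with a sudden change, and a signal with a monotone evolution. All three share the same mean, so the mean alone cannot tell them apart.

```python
from statistics import mean

# Three synthetic signals (hypothetical, not data from the study).
n = 100
steady = [5.0] * n                               # stationary signal
step = [0.0] * (n // 2) + [10.0] * (n // 2)      # sudden change halfway through
ramp = [10.0 * i / (n - 1) for i in range(n)]    # monotone evolution

for name, signal in [("steady", steady), ("step", step), ("ramp", ramp)]:
    # The mean is (almost exactly) the same for all three signals,
    # while the range shows how different their shapes are.
    print(f"{name:>6}: mean={mean(signal):.2f}, "
          f"min={min(signal):.1f}, max={max(signal):.1f}")
```

Plotting the signals, or their magnitude histograms, before choosing a summary would reveal the difference immediately.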
Conclusion
4.1. Generally speaking
The main idea of our five-stage procedure is to start the analysis by relying as much as possible on the actual behavior of the data and on the domain expert's knowledge. Although several more sophisticated methods exist to investigate a time database (Tak-ching, 2011 and Pandit and Wu, 2001), displaying magnitude histograms and cutting the scales (for the variables and/or the factors) can be performed first. The fuzzy space windowing (FSW) must be based on domain knowledge, which is linked with the specificity of the histogram (such as modal areas) and the real meaning of the scale (e.g., distinguishing between acceleration and deceleration). Then, the membership value table can be used as MCA input. Here again, more sophisticated methods (with or without automatic procedures) can be used. This is the case for classification/clustering methods (Jobson, 1991 and Moschopoulos et al., 2009), but most such procedures are able to deal with either the variables or the empirical situations, not both. By contrast, MCA makes it possible to show both the points corresponding to the variables (through their space windows) and those corresponding to the empirical situations. One must acknowledge that considering a space trajectory with S points for a variable v with MCA is much more complex than considering a single point for a variable v, as with the usual Principal Component Analysis (PCA) (Jobson, 1991 and Kuri-Morames and Rodriguez-Erazo, 2009); nevertheless, it becomes possible to bring out more complex relational phenomena.
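The fuzzy space windowing step mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the triangular window shape, the three windows, the window centres and the acceleration samples are all hypothetical choices and are not specified in this text. In practice the centres would come from the histogram and from domain knowledge, for example placing a centre at zero to separate deceleration from acceleration.

```python
def fuzzy_windows(x, centers):
    """Membership values of x in triangular windows centred on `centers`.

    `centers` must be sorted in increasing order. Adjacent windows overlap
    so that the memberships of any value always sum to 1.
    """
    if x <= centers[0]:
        return [1.0] + [0.0] * (len(centers) - 1)
    if x >= centers[-1]:
        return [0.0] * (len(centers) - 1) + [1.0]
    memberships = [0.0] * len(centers)
    for k in range(len(centers) - 1):
        lo, hi = centers[k], centers[k + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)   # position of x between two centres
            memberships[k] = 1.0 - w
            memberships[k + 1] = w
            break
    return memberships


# Hypothetical windows for a longitudinal acceleration in m/s^2.
labels = ["deceleration", "near zero", "acceleration"]
centers = [-3.0, 0.0, 1.5]          # assumed, unequally spaced on purpose
samples = [-3.5, -1.5, 0.0, 0.75, 2.0]

# Membership value table: one row per sample, one column per space window.
table = [fuzzy_windows(x, centers) for x in samples]
for x, row in zip(samples, table):
    print(x, dict(zip(labels, row)))
```

Each row of `table` sums to 1, which is the property that lets such a membership table play the role of a fuzzy indicator table as MCA input.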
For instance, imagine a parabolic relation between two variables v and v′; by considering the 2×S space window points and the way the MCA main axes are positioned (i.e., with the relative contributions of the space window points, which are given by the table that aids interpretation, just as with PCA), one may obtain “when v is M, v′ is VL and when v is either VL or VH, v′ is VH” (it is worth noting that, with a linear relation between two variables v and v′, the two space trajectories may look like a parabola, the first main axis opposing the VL and VH windows, the second main axis opposing the M windows to the extreme ones, VL and VH). Of course, the idea of space windowing with quantitative variables makes it possible to draw relationships between quantitative and qualitative variables. Another advantage of the FSW/MCA pair is the possibility of showing outliers. This approach is less automatic than others (Han and Kamber, 2006 and Garces and Sbarbaro, 2011) but, thanks to the interpretation of the main axes, it becomes possible to know for which space window and/or which space window combination(s) the outliers occur. Finally, the possibility of considering points that make no contribution to the positioning of the main axes (e.g., supplementary points) (Benzecri, 1992) makes it possible to show the factor influences. Using a Data Analysis Path (DAP) based on fuzzy space windowing and MCA has several advantages compared to a more traditional DAP. However, these advantages have a price. The method is rather complex and most of the time requires many iterations, especially when dealing with numerous factors or variables. Understanding the relationships among those requires iteratively considering subsets of the original database or using the supplementary point projection capacity of the method, and this takes time and space to get the results. Nevertheless, with good knowledge and practice of the method, its assets largely outweigh its drawbacks.
4.2. Concerning the driver–car–environment system (DCES)
With time multivariate data, the literature often considers highly summarizing techniques and/or monovariate analyses. Here we have suggested keeping both the time factor and the multivariate aspect. A number of results stand out from the descriptive analysis. One of them is the high correlation between the two variables that were specifically designed to characterize the notion of safety (i.e., Dx, the longitudinal distance prior to collision, and Dv, the visibility distance). Such a result is consistent with the way the two variables were built. Nevertheless, a careful analysis of the space window trajectories in the three factor planes (1,2), (3,4) and (5,6) (not shown) indicates that the relationship is not fully linear. The Dx variation observed along horizontal lines can be explained by modifications in the heading error of the vehicle. A second interesting result obtained with the multivariate descriptive analysis is the possibility of consistently defining behavior classes a posteriori in terms of the a priori driving situation. This indicates that the analysis variables allow the different road environments (e.g., geometry, obstacles) to be differentiated in terms of driver behavior (with the multivariate aspect). Inter-individual differences were highlighted for sharp left-hand curves (Sections 2 and 5), principally related to the lateral vehicle position (Y). For example, subject 1 drove more often in the middle of the right lane, while the other drivers positioned their cars to the left. Despite these differences, it appears, as we had thought, that the driving behavior is consistent over time. A single subject behaves identically in all four laps (4 × 11 = 44 km). Most of the results were confirmed through the tests (Fig. 4). More particularly, there were large differences in the variables Y and right lateral distance prior to collision Dyr for narrow straight sections and curves.
Of course, as suggested in the first part of this discussion, more sophisticated techniques could be used in a second analysis (Wang et al., 2010). In conclusion, rather than starting the analysis with largely automatic procedures, we have shown that using graphical techniques first makes it possible to adapt the models to the data rather than the reverse. Using this technique leads to a DAP that can be generalized (Fig. 5). This DAP is based on fuzzy space windowing, then on summarizing the membership values using the arithmetic mean within each fuzzy time window, and then on MCA. In these steps, outliers, connections between the variables and factor effects on the most informative variables can be pointed out. Furthermore, thanks to membership value averaging, this can be done for qualitative/quantitative data and temporal/non-temporal data in the same analysis. In addition, the possibility of adding supplementary points makes it possible to show the impact of specific factors, such as the “individual” in human–machine studies. In a second phase of the DAP, other techniques can be used, for example to test statistically the significance of the results or, if needed, more sophisticated techniques.
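The first two steps of this DAP (fuzzy space windowing, then averaging the membership values within each time window) can be sketched as below. Several simplifications are assumed and are not taken from the article: the space windows are triangular and equally spaced, the time windows are crisp consecutive blocks rather than fuzzy ones, and the two signals are synthetic, with v2 built as the square of v to mimic a parabolic relation. The MCA step itself is not shown.

```python
from statistics import mean


def memberships(x, centers):
    """Triangular fuzzy windows on equally spaced centres; values sum to 1."""
    step = centers[1] - centers[0]
    x = min(max(x, centers[0]), centers[-1])   # clip to the outer windows
    return [max(0.0, 1.0 - abs(x - c) / step) for c in centers]


def window_means(signal, centers, n_time_windows):
    """Mean membership of each space window within consecutive time windows.

    Crisp time windows are used here as a simplification of fuzzy ones.
    """
    size = len(signal) // n_time_windows
    table = []
    for t in range(n_time_windows):
        chunk = signal[t * size:(t + 1) * size]
        rows = [memberships(x, centers) for x in chunk]
        table.append([mean(col) for col in zip(*rows)])
    return table


# Synthetic signals (hypothetical): v rises from -1 to 1, v2 = v squared.
n = 300
v = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
v2 = [x * x for x in v]

v_table = window_means(v, centers=[-1.0, 0.0, 1.0], n_time_windows=3)
v2_table = window_means(v2, centers=[0.0, 0.5, 1.0], n_time_windows=3)

# One row per time window: windows [L, M, H] of v, then [L, M, H] of v2.
mca_input = [rv + rw for rv, rw in zip(v_table, v2_table)]
for t, row in enumerate(mca_input, start=1):
    print(f"time window {t}:", [round(m, 2) for m in row])
```

In the middle time window v falls mostly in its M window and v2 mostly in its L window, while the H window of v2 receives membership only in the first and last time windows, where v is low or high. A table such as `mca_input` is the kind of averaged membership table that would then be handed to an MCA implementation.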