Citizens as consumers: Profiling e-government services' users in Egypt via data mining techniques
Article code | Publication year | Pages in English article |
---|---|---|
22296 | 2013 | 15-page PDF |
Publisher : Elsevier - Science Direct
Journal : International Journal of Information Management, Volume 33, Issue 4, August 2013, Pages 627–641
English abstract
This study uses data mining techniques to examine the effect of various demographic, cognitive and psychographic factors on Egyptian citizens’ use of e-government services. Data mining uses a broad family of computationally intensive methods that include decision trees, neural networks, rule induction, machine learning and graphic visualization. Three artificial neural network models (multi-layer perceptron neural network [MLP], probabilistic neural network [PNN] and self-organizing maps neural network [SOM]) and three machine learning techniques (classification and regression trees [CART], multivariate adaptive regression splines [MARS], and support vector machines [SVM]) are compared to a standard statistical method (linear discriminant analysis [LDA]). The variable sets considered are sex, age, educational level, e-government services perceived usefulness, ease of use, compatibility, subjective norms, trust, civic mindedness, and attitudes. The study shows how it is possible to identify various dimensions of e-government services usage behavior by uncovering complex patterns in the dataset, and also shows the classification abilities of data mining techniques.
English introduction
One of the most intractable problems for anyone dealing with government is the sheer complexity of its organizational structure. For example, it has been estimated that the average government has between 50 and 70 different departments, agencies and regulatory bodies (Silcock, 2001). Several of a government's agencies may be involved in simple matters such as registering the birth of a child. Fortunately, advances in technology, particularly the advent of the Internet, have made it possible for local governments to deliver their services to citizens via a single portal known as e-government. e-Government has been regarded as a ‘paradigm shift’ or a catalyst for government administrative reform resulting in improved quality of service, cost savings, wider political participation and more effective policies and programs (Helbig, Gil-Garcia, & Ferro, 2009). e-Government has also been proposed as a solution for increasing citizen communication with government agencies and, ultimately, political trust (Chadwick & May, 2003). In several countries there has been growing pressure for governments to move online. In the Arab world, Dubai pioneered e-voting in elections for half the members of the United Arab Emirates’ consultative assembly (The Economist, 2008). In Bahrain, the e-government authority (E-GA) has recently launched the Enterprise Architecture Project (EAP) initiative, which is considered to be the first of its kind in the Arab world. The initiative aims at streamlining government procedures by unifying the standards and procedures among all government entities in all matters related to information and communication technology (Bahrain Tribune, 2009). Finally, in Egypt e-government currently provides 85 services to citizens including government forms, public policy information and tax filing (Hamed, 2008). Two main reasons are behind governments’ decision to move online.
First, a more enlightened view has emerged within the ranks of government: treating the citizen like a consumer, for whom transaction satisfaction is important. Second, pressures for governments to do more with less will force governments to provide services in a more efficient way. In fact, e-government offers substantial performance gains over the traditional model of government. For example, based on the analysis of 49 empirical studies, Danziger and Andersen (2002) concluded that there were positive e-government impacts on data access and on the efficiency and productivity of government performance in both internal operations and external service functions. It has also been argued that a significant portion of the benefits created by e-government services is obtained by the government itself in terms of efficiency gains (Tung & Rieck, 2005). For example, the U.S. government generates around US$ 3 billion on its Web site (Clark, 2003). While several terms are used synonymously with e-government, such as digital government, e-governance and e-democracy, authors generally use a broad conceptualization of e-government that encompasses all government roles and activities shaped by information and communication technologies (Brown, 2005). There are three relationships in the e-government interactive processes: government to government (G2G), government to business (G2B) and government to citizen (G2C). For the purpose of this study, only the G2C relationship will be discussed. e-Government progress has also been divided into three phases. The first phase is to publish, in which e-government has only a limited digital presence through limited published information. The second phase is to interact, where citizens can interact with government via electronic media such as emails and chat rooms. The third phase is to transact, where citizens participate in government services via the government's digital portal (Lau, Aboulhoson, Lin, & Atkin, 2008).
English conclusion
5.1. Discriminant analysis
To compare e-government services users and non-users, traditional LDA was applied using the SPSS 16.0 package. Classification with LDA involves classifying subjects into one of several groups on the basis of a set of measurements. LDA has a long tradition in the marketing literature for providing solutions to problems involving discrete outcomes, such as choice or classification (Heilman, Kaefer, & Ramenofsky, 2003) and for its competency in predicting choice as a function of past behavior (Mela, Gupta, & Lehmann, 1997). LDA assumes certain statistical characteristics of the data, such as multivariate normality and homogeneity of variance/covariance matrices. In a preliminary analysis of the data, a case analysis was conducted to identify possible outliers and violations of the LDA assumptions. No serious violations of the assumptions were detected. Given that the optimal ordering of variables was not known a priori and the purpose was to determine the extent to which each variable contributed to the prediction of user status, the discriminant functions were computed using simultaneous estimation of all independent variables. In LDA, if there are G groups, G-1 discriminant functions can be estimated. Since this study considers two groups, one function was generated to predict group membership. The function was found to significantly differentiate between users and non-users based on usefulness, ease of use, compatibility, trust, subjective norms, civic mindedness, and attitudes as shown in Table 1 (Wilks’ lambda = 0.558, p < 0.001). The canonical correlation was found to be 0.665, indicating that these variables explain about 44% of the variance in e-government services usage behavior (0.665² ≈ 0.44).
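The two reported statistics can be checked against each other. The short Python sketch below assumes only the standard identity for a single discriminant function, where Wilks' lambda equals one minus the squared canonical correlation; the variable names are illustrative and not taken from the study.

```python
# Reported canonical correlation for the single discriminant function.
canonical_corr = 0.665

# Share of variance in user status explained by the function.
variance_explained = canonical_corr ** 2

# With two groups there is only one function, so Wilks' lambda = 1 - Rc^2.
wilks_lambda = 1.0 - variance_explained

print(f"variance explained: {variance_explained:.1%}")  # 44.2%
print(f"Wilks' lambda: {wilks_lambda:.3f}")             # 0.558
```

The recomputed lambda matches the value quoted from Table 1, which supports reading the explained variance as roughly 44 percent.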
The group centroids (−2.780 versus .284) further illustrate the separation between the two groups.
In order to examine the relative importance of each variable in discriminating between citizens who become users and those who remain non-users of e-government services, discriminant loadings were obtained and are presented in Table 2. Variables are ordered by the absolute size of their correlation with the discriminant function. Each independent variable's canonical discriminant function coefficient is also presented in Table 2. Respondents’ attitudes had the greatest influence in determining whether respondents become users or non-users. Respondents’ trust scores were next in importance, followed by compatibility and usefulness.
To assess the overall fit of the discriminant function, classification results were examined. In combination, the discriminant function achieved 94.3% classification accuracy. This result was cross-validated using the jackknife procedure, which repeatedly re-estimates the discriminant function eliminating one observation at a time; 92.8% of cross-validated group cases were correctly classified as shown in Table 3. This validation procedure indicated that the overall model results were robust and were not specific to the sample used in estimation. Classification results in both samples were also higher than the proportional chance criterion and the maximum chance criterion. Press's Q statistic confirmed that the predictions in both samples were significantly better than chance (p < 0.001).
5.2. Multi-layer perceptron neural network
Given its usefulness in data mining (Smith & Gupta, 2000), MLP is a logical choice for the problem studied here. In fact, Zahavi and Levin (1997) highlighted MLP models as a promising and effective alternative to conventional response modeling in database marketing for targeting purposes.
McKechnie (2006) has proposed that NN and data mining techniques should be used to classify customers into distinct segments based on their past behavior, cluster data to establish specific customer types and use this knowledge to tailor marketing efforts. Furthermore, MLP has been employed in the context of competitive market structuring and segmentation analysis (Reutterer & Natter, 2000). MLP was first developed to mimic the functioning of the brain. It consists of interconnected nodes referred to as processing elements that receive, process, and transmit information. MLP consists of three types of layers: the first layer is known as the input layer and corresponds to the problem input variables with one node for each input variable. The second layer is known as the hidden layer and is useful in capturing nonlinear relationships among variables. The final layer is known as the output layer and corresponds to the classification being predicted (Baranoff, Sager, & Shively, 2000). There are many software packages available for analyzing MLP models. We chose the NeuroIntelligence package (Alyuda, 2003). This software applies artificial intelligence techniques to automatically find an efficient MLP architecture. Typically, the application of MLP requires a training data set and a testing data set (Lek & Guegan, 1999). The training data set is used to train the MLP and must have enough examples of data to be representative of the overall problem. The testing data set should be independent of the training set and is used to assess the classification accuracy of the MLP after training. Following Lim and Kirikoshi (2005), a quasi-Newton algorithm with weight updates occurring after each epoch was used for MLP training. The learning rate was set at 0.1. After 100 iterations the correct classification rate (CCR) reached 99.8% as seen in Fig. 2. Table 4 reports the properties and predictive accuracy of the MLP model.
As can be observed, the MLP classifier predicted the training sample with 99.8% accuracy and the validation sample with 98.3% accuracy.
5.3. Probabilistic neural network
The MLP is the most frequently used neural network technique in pattern recognition (Bishop, 1999) and classification problems (Sharda, 1994). However, numerous researchers document the disadvantages of the MLP approach. For example, Calderon and Cheh (2002) argue that the standard MLP network is subject to problems of local minima. Swicegood and Clark (2001) claim that there is no formal method of deriving an MLP network configuration for a given classification task. Thus, there is no direct method of finding the ultimate structure for the modeling process. Consequently, the refining process can be lengthy, accomplished by iterative testing of various architectural parameters and keeping only the most successful structures. Wang (1995) argues that standard MLP provides unpredictable solutions in terms of classifying statistical data. An alternative NN architecture, the PNN, is a non-linear, nonparametric pattern recognition modeling technique that was originally introduced to the neural network literature by Specht (1990). PNNs require no assumptions about the distributions of the random variables used to classify; they can even handle multi-modal distributions. They train quickly and perform as well as, or better than, MLP networks. They have the ability to provide mathematically sound confidence levels and are relatively insensitive to outliers (Singer & Bliss, 2003). While the MLP network requires a validation data set (i.e., wasted cases) to search for over-fitting, PNNs use all available data in model building. PNNs feature a feed-forward architecture and supervised training algorithm similar to back-propagation. The training pattern is presented to the input layer.
The main role of the input layer is to map all the external signals into hidden layers by a scaling function through which each input neuron normalizes the range of external signals into a specific range that the neural network can process. The neurons in the hidden layer aim to add flexibility to the performance of the PNN so as to record the classification knowledge extracted from the training pattern. There must be at least as many neurons in the hidden layer as the number of training patterns (Tam, Tong, Lau, & Chan, 2005). The summation layer consists of one neuron for each data class and sums the outputs from all hidden neurons of each respective data class. The output layer has one neuron for each possible category. The network produces an activation, a value between zero and one, in the output layer corresponding to the probability density function estimated for that category. The output with the highest value represents the most probable category. PNNs are used for classification problems where the objective is to assign cases to one of a number of discrete classes (Hunter, 2000). Theoretically, the PNN can classify out-of-sample data with the maximum probability of success when enough training data is given (Enke & Thawornwong, 2005). The PNN has been extensively used in various pattern classification tasks across several domains due to its ease of training and sound statistical foundation in Bayesian estimation theory. For example, Yang and Marjorie (1999) utilized a PNN to predict financial crises in oil industry companies in the USA. Jin and Srinivasan (2001) proposed a new technique for freeway incident detection using PNN. Hajmeer and Basheer (2002) used PNN to study the classification of bacterial growth. Chen, Leung, and Daouk (2003) applied PNN to stock index forecasting. Huang (2004) applied PNN to predict the class of leukemia and colon cancer.
Gerbec, Gasperic, Smon, and Gubina (2005) used PNN to classify consumers’ electricity load profiles. Xue, Zhang, Liu, Hu, and Fan (2005) classified 102 active compounds from diverse medicinal plants with anticancer activity. Jin and Englande (2006) used PNN to classify whether conditions in a lake are safe for swimming. Wilson (2006) successfully tested the PNN on 209 seizures obtained from an epilepsy-monitoring unit. Laskari, Meletiou, Tasoulis, and Vrahatis (2006) evaluated the performance of PNN on approximation problems related to cryptography. These applications show that while PNN has been applied to many areas, little attention has been paid to applying PNN to consumer profiling and market segmentation problems. There are many computer software packages available for building and analyzing NNs. Because of its extensive capabilities for building networks based on a variety of training and learning methods, the NeuralTools Professional package (Palisade Corporation, 2005) was chosen to conduct the PNN analysis in this study. This software automatically scales all input data. Scaling involves mapping each variable to a range with minimum and maximum values of 0 and 1. NeuralTools Professional software uses a non-linear scaling function known as the ‘tanh’, which scales inputs to a (−1, 1) range. This function tends to squeeze data together at the low and high ends of the original data range. It may thus be helpful in reducing the effects of outliers (Tam et al., 2005). Table 5 reports the properties and predictive accuracy of the PNN model. As can be observed, the PNN classifier predicted both training and testing samples with 100% accuracy.
5.4. Self-organizing maps
The SOM, also called Kohonen map, is a heuristic model for exploring and visualizing patterns in high dimensional datasets. It was first introduced to the neural networks community by Kohonen (1982).
SOM can be viewed as a clustering technique that identifies clusters in a dataset without the rigid assumptions of linearity or normality of more traditional statistical techniques. Indeed, like k-means, it clusters data based on an unsupervised competitive algorithm where each cluster has a fixed coordinate in a topological map (Audrain-Pontevia, 2006). The SOM is trained based on an unsupervised training algorithm where no target output is provided and the network evolves until convergence. Based on Gladyshev's theorem, it has been shown that SOM models have almost sure convergence (Lo & Bavarian, 1993). The SOM consists of only two layers: the input layer which classifies data according to their similarity, and the output layer of radial neurons arranged in a two-dimensional map (Fig. 3). Output neurons self-organize into an ordered map, and neurons with similar weights are placed together. They are connected to adjacent neurons by a neighborhood relation, dictating the topology of the map (Moreno, Marco, & Olmeda, 2006). The number of neurons can vary from a few dozen to several thousand. Since the SOM compresses information while preserving the most important topological and metrical relationships of the primary data elements on the display, it can also be used for pattern classification (Silven, Niskanen, & Kauppinen, 2003).
Due to the unsupervised character of their learning algorithm and their excellent visualization ability, SOMs have recently been used in myriad classification and clustering tasks.
Examples include classifying cognitive performance in schizophrenic patients and healthy individuals (Silver & Shmoish, 2008), mutual funds classification (Moreno et al., 2006), speech quality assessment (Mahdi, 2006), vehicle routing (Ghaziri & Osman, 2006), network intrusion detection (Zhong, Khoshgoftaar, & Seliya, 2007), anomalous behavior in communication networks (Frota, Barreto, & Mota, 2007), compounds pattern recognition (Yan, 2006), market segmentation (Kuo, Ho, & Hu, 2002) and classifying magnetic resonance brain images (Chaplot, Patnaik, & Jagannathan, 2006). There are many software packages available for analyzing SOM models. We chose the SOMine package version 5.0 (Viscovery Software GmbH, 2008). This software applies artificial intelligence techniques to automatically find efficient SOM clusters. To visualize the cluster structure, some authors use the unified distance matrix (U-matrix) (e.g., Vijayakumar, Damayanti, Pant, & Sreedhar, 2007). However, this method does not give crisp boundaries to the clusters (Worner & Gevrey, 2006). In this study a hierarchical cluster analysis with a Ward linkage method was applied to the SOM to clearly delineate the edges of each cluster. The number of neurons was set to 2000. There are two learning algorithms for SOM (Kohonen, 2001): the sequential or stochastic learning algorithm and the batch learning algorithm. In the former, the reference vectors are updated immediately after a single input vector is presented. In the latter, the update is done using all input vectors. While the batch algorithm does not suffer from convergence problems, the sequential algorithm is stochastic in nature and is less likely to be trapped in a local minimum. Following Ding and Patra (2007), we chose the sequential learning algorithm to train the SOM. Fig. 4 shows the cluster indicator. This figure clearly shows that the SOM converges successfully after 50 iterations.
The SOM cluster results are shown in Fig. 5.
This two-dimensional hexagonal grid shows clear division of the input pattern into three clusters. Since the order on the grid reflects the neighborhood within the data, features of the data distribution can be read off from the emerging landscape on the grid. For example, it can be seen that the green-colored cluster is the smallest cluster with a frequency of 9.28%. This is the non-users cluster. Respondents in this cluster are characterized by low perceived usefulness, ease of use and compatibility of e-government services and less favorable attitudes toward e-government services. Surprisingly, this cluster includes individuals who are more civically minded. The red-colored cluster represents the users cluster. This cluster accounts for 19.33% of respondents. The respondents in this cluster are characterized by high perceived usefulness, ease of use, compatibility, and favorable attitudes toward e-government. Table 6 summarizes the basic information in each cluster.
Based on the SOM-Ward clusters, feature or component maps can be constructed (Vesanto, 1999). These maps are also known in the literature as ‘temperature maps’ (Churilov & Flitman, 2006). On these maps, the nodes which share similar information are organized in close color proximity to each other. Fig. 6 shows the feature maps for every cluster and for all input attributes. Feature maps show the distribution of values of the respective input component over the map. Relationships between variables can be inspected by visually comparing the pattern of shaded pixels for each map; similarity of the patterns indicates strong monotonic relationships between the variables. The name of the displayed input component appears on top of each map. The color scale at the bottom of the component window shows that blue is used for low values, green for mid-range values and red for high values.
From the feature maps we note, for example, that the extreme values represented in red in the e-government feature map in the users’ cluster match the extreme values found in the perceived usefulness, ease of use and compatibility feature maps. Interestingly, the non-user cluster includes the highest constellation of red pixels in the civic mindedness attribute. Through the use of colors, one can immediately see that the trust portion is significantly larger than the rest in the trust window for the users cluster, implying that trust is positively related to e-government services usage, a result that was previously confirmed by other researchers (e.g., Bélanger, 2008). In essence, these colorful maps reveal the existence of previously theorized assumptions and can even create new ones. The maps also make it possible to find subgroups that do not follow the main theoretical assumptions. For example, when red dots are found in the middle of the green area, this signals the presence of deviant subgroups. When either blue or red nodes form two clearly separated areas, this might be considered a sign of non-linear correlation (Thuneberg & Hotulainen, 2006).
5.5. Classification and regression trees
CART is a nonparametric technique developed by Breiman, Friedman, Olshen, and Stone (1984) to classify observations into distinct groups based on a set of characteristics, using the decision tree methodology. The technique was introduced to overcome the inherent limitations in the automatic interaction detector (AID) and the chi-square automatic interaction detector (CHAID) techniques. Unlike AID or CHAID, CART can work in classification tree mode with categorical predictor variables, or in regression tree mode with interval or ratio scaled predictors. CART recursively splits a dataset into non-overlapping subgroups based on the independent variables until splitting is no longer possible (Baker & Song, 2008).
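A single step of the recursive splitting can be sketched in a few lines of Python. The sketch scores candidate thresholds on one predictor with Gini impurity, the criterion the study reports using; the function names and the tiny trust/status sample are hypothetical and serve only as illustration, not as the CART 6.0 implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one predictor."""
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical trust scores (1-5 scale) and usage labels, for illustration only.
trust = [1, 2, 2, 3, 4, 4, 5, 5]
status = ["non-user", "non-user", "non-user", "user", "user", "user", "user", "user"]

threshold, impurity = best_split(trust, status)
print(threshold, impurity)  # 2 0.0
```

A full tree would apply the same search to every predictor, keep the best split, and recurse on each resulting subgroup before pruning.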
CART has been widely applied in various fields of research such as mortgage default (Feldman & Gross, 2005), reliability (Bevilacqua, Braglia, & Montanari, 2003), detecting user web search preferences (Pendharkar, 2006), female sexuality (Wiegel, Meston, & Rosen, 2005), detecting change in consumer behavior (Kim, Song, Kim, & Kim, 2005), determining the role of race in capital cases (Berk, Li, & Hickman, 2005), site quality evaluation (Corona, Dettori, Filigheddu, Maetzke, & Scotti, 2005), and auditor change decisions (Calderon & Ofobike, 2008). In this study CART software version 6.0 (Steinberg & Golovnya, 2006) was used to build classification trees. The Gini index was used in the splitting process while test sample estimation was implemented to evaluate the predictive performance of each classifier. Following D’Alisa et al. (2006), the 10-fold validation approach with re-substitution was adopted. This consists of simulating 10 different samples by subtracting randomly each time 10% of the subjects and duplicating randomly another 10%. After each run, the original sample is restored. The final tree represents the best trade-off between variance explanation and variance stability across 10 “different” samples. Overall correct classification rate obtained from CART was 99.48%. Fig. 7 depicts the final obtained pruned CART tree. From this figure we see that trust in e-government systems plays the most important role in rule induction. Table 7 summarizes the rules and the classified results from the CART tree. Fig. 8 represents the significant predictors arranged according to their importance in profiling e-government services users. These are the variables proposed by the most widely used CART classification method namely, the Gini reduction method. Other methods such as the symmetric Gini method and the class probability methods reached similar results. 
From this figure we see that trust, attitudes and compatibility are the most important factors in determining e-government services adoption behavior in Egypt.
5.6. Multivariate adaptive regression splines
MARS is a relatively novel data mining technique developed by Friedman (1991). This technique combines classical linear regression, mathematical construction of splines and binary recursive partitioning to produce a local model in which the relationship between response and predictors is either linear or nonlinear, by approximating the underlying function with a set of adaptive piecewise linear regressions termed basis functions (BF) (Jesus & Angel, 2004). MARS develops models in a forward growing stage by adding the BF that is most effective in error-minimizing, while it allows for the detection of interactions by multiplication of a term already entered in the model with another candidate basis function (Flouris & Duffy, 2006). The BF transform makes it possible to blank out certain regions of a variable by making them zero, allowing the model to focus on specific sub-groups of the data deemed important (Ture, Kurt, Kurum, & Ozdamar, 2005). The power of MARS for building prediction and classification models has been demonstrated in many applications such as information technology productivity studies (Ko & Osei-Bryson, 2006), genetics (York & Eaves, 2001), biomedical analysis (Deconinck, Ates, Callebaut, Van Gyseghem, & Heyden, 2005), network intrusion detection (Peddabachigari, Abraham, Grosan, & Thomas, 2007), credit scoring (Lee & Chen, 2005), finance (Abraham, 2002), software maintainability (Zhou & Leung, 2007) and cancer diagnosis (Chou, Lee, Shao, & Chen, 2004). In this study we used the MARS 2.0 package (Steinberg, Colla, & Martin, 1999) to conduct the analysis. The overall correct classification rate obtained from MARS was 99.10% (sensitivity = 0.931 and specificity = 0.997).
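The basis-function idea can be illustrated with a small Python sketch. It assumes the usual hinge form max(0, x − knot) and its mirror image; the knots, coefficients and predictor names below are hypothetical and are not the fitted model from this study.

```python
def hinge(x, knot, direction=+1):
    """Basis function: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))

def mars_predict(trust, civic):
    """Evaluate a toy MARS model built from hypothetical knots and coefficients."""
    bf1 = hinge(trust, 3.0, +1)        # active only where trust exceeds 3
    bf2 = hinge(trust, 3.0, -1)        # active only where trust is below 3
    bf3 = bf1 * hinge(civic, 2.0, +1)  # interaction: product of two basis functions
    return 0.5 + 0.2 * bf1 - 0.1 * bf2 + 0.05 * bf3

print(mars_predict(3.0, 2.0))  # all basis functions are zero, so only the intercept remains
print(mars_predict(4.0, 4.0))
```

Each hinge is zero on one side of its knot, which is how the model blanks out regions of a variable, and the product term shows how an interaction enters the model.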
To help interpret the models obtained, we visualize major two-way interactions between independent variables. Fig. 9 is a typical example of such two-way interactions. For example, the lower left part of the graph represents the model's predicted surface for the dependent variable (i.e., e-government services adoption) when only considering the interaction effect between trust and civic mindedness. MARS shifts values on the contribution axis so that the minimum value is 0. Color codes represent different contribution value intervals. From this figure we see that for low levels of trust, the probability of using e-government services rises steeply with civic mindedness and remains relatively high except at very low and very high levels of civic mindedness.
5.7. Support vector machines
SVMs were developed by Vapnik (1995) as a novel type of machine learning. SVMs are a set of related supervised learning methods used for classification and regression. In the case of classification, SVMs obtain the ‘optimal’ boundary between two classes in a vector space independently of the probabilistic distributions of training vectors in the data set. If the categories are linearly separable, the aim of SVMs is to find the ‘optimal’ hyperplane boundary which separates both classes, classifying not only the training set but also unknown samples. When the classes are not linearly separable, the input data are implicitly mapped into a higher dimensional space by a kernel function, e.g., the Gaussian radial basis function (Berrueta, Alonso-Salces, & Heberger, 2007). An operating model of SVM is shown in Fig. 10. The dashed black line in this figure is the SVM solution. According to this idea, e-government services usage can be viewed as a simple SVM application, the classification of linearly separable classes; that is, a citizen either belongs to the user category or does not.
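How a trained SVM assigns a class can be sketched as follows. The Python example assumes a Gaussian radial basis function kernel and an already-solved model; the support vectors, coefficients and the gamma value are hypothetical placeholders, and the training step that would normally produce them is omitted.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian radial basis function: exp(-gamma * ||a - b||^2)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

def classify(x, support_vectors, coefficients, bias=0.0):
    """The sign of the kernel-weighted sum over support vectors decides the class."""
    score = bias + sum(c * rbf_kernel(sv, x)
                       for sv, c in zip(support_vectors, coefficients))
    return "user" if score >= 0 else "non-user"

# Hypothetical support vectors as (trust, attitude) scores. Each coefficient is
# the multiplier times the class label (+1 for user, -1 for non-user).
support_vectors = [(4.0, 4.0), (1.0, 1.0)]
coefficients = [1.0, -1.0]

print(classify((4.0, 3.5), support_vectors, coefficients))  # user
print(classify((1.0, 2.0), support_vectors, coefficients))  # non-user
```

Replacing the kernel with a plain dot product would recover the linear hyperplane case described for linearly separable classes.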