English ISI article No. 22188
Article title
Applying text and data mining techniques to forecasting the trend of petitions filed to e-People
Article code: 22188
Publication year: 2010
Length of English article: 14 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Expert Systems with Applications, Volume 37, Issue 10, October 2010, Pages 7255–7268

Keywords
Text mining, Data mining, Petition, Keyword extracting, Document clustering, Forecasting, e-Government, Open Innovation, e-People

Abstract

As the Internet has become a virtual place where citizens unite and their opinions are promptly turned into action, two-way communication between the government sector and citizens has become more important among e-Government activities. Hence, the Anti-corruption and Civil Rights Commission (ACRC) of the Republic of Korea has constructed an online petition portal system named e-People. In addition, national Open Innovation through e-People has gained increasing attention, because e-People can serve as a virtual space where citizens participate in improving national law and policy simply by filing petitions to e-People as the voice of the nation. However, there are currently problems and challenging issues to be solved before e-People can function as a virtual space for national Open Innovation based on petitions collected from citizens. First, there is no objective and systematic method for analyzing the large number of petitions filed to e-People without a great deal of manual work by petition inspectors. Second, e-People needs to forecast the trend of petitions more accurately and quickly than petition inspectors can, to support better decisions on national law and policy strategy. Therefore, in this paper, we propose a framework that applies text and data mining techniques not only to analyze a large number of petitions filed to e-People but also to predict the trend of petitions. In detail, we apply text mining techniques to the unstructured data of petitions to elicit keywords and to identify groups of petitions from the elicited keywords. Moreover, we apply data mining techniques to the structured data of the identified petition groups in order to forecast the trend of petitions.
Our approach reduces the time-consuming manual work of reading and classifying a large number of petitions, and contributes to increased accuracy in evaluating the trend of petitions. Ultimately, it helps petition inspectors pay more attention to detecting and tracking important groups of petitions that may grow into nationwide problems. Further, petitions ordered by the trend values of their petition groups can be used as a baseline for making better decisions on national law and policy strategy.

Introduction

Owing to the development of information and communication technology (ICT), particularly the Internet, government sectors around the world have been transforming themselves into electronic government (e-Government), also known as digital government. The construction of e-Government aims at providing citizens with services quickly and accurately, making government work more effective, innovating through the redesign of work processes, and raising national competitiveness by improving productivity (Lee & Jung, 2004). The anticipated benefits of e-Government therefore include greater efficiency, greater convenience, improved services, better accessibility of public services, less corruption, more transparency, revenue growth, and cost reductions (Atkinson & Castro, 2008). According to Palvia and Sharma (2007), the various activities of e-Government can be summarized into four categories with respect to interaction domains. The first type pushes information over the Internet, e.g. regulatory services, issue briefs, and notifications. The second type aims at improving two-way communication between a government agency and a citizen, a business, or another government agency; in these models, users can engage in dialogue with government agencies and post problems, comments, or requests to them. The third type helps conduct transactions such as lodging tax returns and applying for services and grants. The fourth type is based on governance, e.g. online polling, voting, and campaigning. Among these four types, the second has recently become more important, especially in the Republic of Korea, because the Internet has started to be used as a virtual space where citizens unite and their opinions are promptly turned into action such as a mass rally.
Therefore, the Anti-corruption and Civil Rights Commission (ACRC), one of the government sectors in the Republic of Korea, decided to strengthen its collection of public opinion by building an online petition portal system. In June 2006, ACRC constructed the online petition portal system called e-People in order to hear the voice of the nation, merging the scattered online channels that had collected civil complaints and petitions. ACRC gained recognition for developing e-People, which won the Best Demonstration Stand Award at the e-Challenge Conference and Exhibition 2008 held in Stockholm, Sweden (ACRC, 2008). Nevertheless, we expect that e-People will face another challenge as Open Innovation gains increasing attention as a new paradigm for innovation in business. In short, the concept behind Open Innovation is that companies cannot rely entirely on their own research but should instead buy or license processes or inventions from other companies (Chesbrough, 2003). Similarly, government sectors will need to consider putting Open Innovation into action, innovating national law and policy not only from the inside, i.e. by themselves, but also from the outside, i.e. through citizens. Thus, we predict that the concept of Open Innovation will affect the activities of e-Government sooner or later. We therefore concluded that e-People should evolve into a virtual space where the boundaries between citizens and government sectors dissolve, so that citizens can conveniently participate in innovating the government system by filing petitions to e-People. However, problems and challenging issues remain before e-People can realize the concept of Open Innovation in the government sector and raise e-Government in the Republic of Korea to the next level. They can be summarized into the following two matters.
First, there is no objective and systematic method for analyzing the large number of petitions filed to e-People without a great deal of manual work by petition inspectors. Petitions in e-People are collected from citizens all over the Republic of Korea, so the number of petitions to be read exceeds what inspectors can handle manually. Besides, more than half of the data in a petition is text, which makes it difficult for petition inspectors to understand petitions at a glance. This makes it hard to grasp the voice of the nation from the petitions filed to e-People. Therefore, we need to take advantage of text mining techniques. If keywords are elicited from the text of petitions and the petitions are clustered into petition groups using those keywords, petition inspectors will be able to focus on analyzing the trend of petitions, and thus on grasping the voice of the nation, with far less manual reading. However, our literature survey found little research on applying text mining techniques to petitions to overcome these problems, so such research is needed. Second, the trend of petitions needs to be forecast more accurately and quickly by e-People itself rather than by petition inspectors, in support of better national law and policy strategy. At present, e-People recognizes the importance of petitions only after they have actually become serious, because they are evaluated through manual analysis by petition inspectors. It therefore takes a long time to find important petitions that might grow into nationwide problems, which delays the planning and implementation of the related national law and policy strategy.
However, if prediction models are built by applying data mining techniques to petitions, e-People will be able to predict the trend of petitions more accurately and quickly than petition inspectors. In the end, the predicted trend of petitions will make it possible for government sectors to make better decisions on national law and policy strategy. Applying data mining techniques to forecast the trend of petitions is therefore the second challenging issue we address. Hence, we propose a framework that applies text and data mining techniques to petitions filed to e-People to solve the problems and challenging issues stated above. Specifically, we apply text mining techniques to unstructured data to elicit keywords from petitions and to identify groups of petitions from the elicited keywords. Moreover, we apply data mining techniques to the structured data of the identified petition groups to forecast the trend of petitions. To sum up our contributions, we provide an objective and systematic method for analyzing a large number of petitions filed to e-People and predicting the trend of petitions while reducing the manual work of petition inspectors. Consequently, we help government sectors make better decisions on national law and policy strategy on the basis of the petition trends forecast by our approach. The rest of the paper is structured as follows. In Section 2, we introduce the taxonomy of e-Government and review related work on keyword extraction and document clustering in text mining, and on forecasting models built with data mining techniques. In Section 3, we explain the framework of our methodology in three subsections: eliciting keywords from petitions, identifying petition groups, and forecasting the trend of petitions.
In Section 4, we apply the methodology proposed in Section 3 to 8 groups of petitions filed to e-People, the online petition portal system constructed by ACRC of the Republic of Korea. In detail, we perform 8-fold validation on prediction models based on RBFNs and C5.0, repeatedly dividing the 8 petition groups 8 times into 7 petition groups for the training set and the remaining petition group for the test set. We then discuss the implications of the validation results. Finally, in Section 5, we conclude the paper with a discussion of contributions and further research.
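
The 8-fold validation described above holds out one petition group per fold and trains on the other seven. The following is a minimal sketch of that splitting scheme only, not the authors' implementation; the group labels and the function name `leave_one_group_out` are hypothetical, and the model training step (RBFNs or C5.0) is deliberately left out.

```python
from typing import Iterator, List, Tuple


def leave_one_group_out(groups: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_groups, test_group) pairs, holding out each group once."""
    for held_out in groups:
        training = [g for g in groups if g != held_out]
        yield training, held_out


# Hypothetical labels standing in for the 8 petition groups.
petition_groups = [f"group_{i}" for i in range(1, 9)]
folds = list(leave_one_group_out(petition_groups))

for training, held_out in folds:
    # A prediction model would be trained on `training` and tested on `held_out` here.
    print(f"train on {len(training)} groups, test on {held_out}")
```

Splitting by whole petition group, rather than by individual petition, means each model is always tested on a group it has never seen during training.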

Conclusion

In this paper, we put forward a framework that applies text and data mining techniques to petitions filed to e-People in order to solve the problems and challenging issues introduced in Section 1. To sum up the framework explained in the three subsections of Section 3: first, we suggested steps by which keywords are elicited from unstructured data, i.e. <title> ⋯ </title> and <body> ⋯ </body> of petitions (XML). Second, we proposed how petition groups are identified from consistent clusters of petitions after clustering analyses on petitions with the elicited keywords, and how newly filed petitions are classified into one of the petition groups. Third, we suggested forecasting the trend of petitions filed to e-People by adopting two data mining techniques, RBFNs and C5.0, with the formulas explained in Table 1 that produce feature values from the structured data of petitions for each petition group. As the application in Section 4, we collected 4,217 petitions filed to e-People over 92 days from July 1, 2008 to September 30, 2008. Using the 46 keywords elicited from the 4,217 petitions, we identified 8 petition groups (see Table 2). On the basis of the petition information of the 8 identified petition groups, we performed 8-fold validation on the constructed prediction models. Their estimated accuracies in the testing phase turned out to be 62.35–86.96% for the prediction models based on RBFNs and 93.32–97.98% for those based on C5.0 (see Table 3). The trend values predicted for the 8 petition groups are shown as graphs in Fig. 9. There we found that, for petition groups 2, 3, and 5, the prediction models forecast the time at which the predicted trend value turns from ‘L’ to ‘H’ earlier than the petition inspectors' manual evaluations changed from ‘L’ to ‘H’ (see Fig. 10, Fig. 11 and Fig. 12).
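
The first step summarized above reads the <title> and <body> elements of XML petitions and elicits keywords from them. The following is a rough sketch of that idea. It assumes a hypothetical `<petition>` wrapper element, English text, a tiny made-up stopword list, and plain term-frequency counting in place of the paper's actual elicitation steps, which this excerpt does not detail.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, List

# Hypothetical stopword list; a real system would use a language-specific one.
STOPWORDS = {"the", "a", "of", "to", "in", "and", "is", "on", "for", "at", "from"}


def petition_text(xml_string: str) -> str:
    """Concatenate the text of the <title> and <body> children of one petition."""
    root = ET.fromstring(xml_string)
    title = root.findtext("title", default="")
    body = root.findtext("body", default="")
    return f"{title} {body}"


def top_keywords(petitions: Iterable[str], k: int = 5) -> List[str]:
    """Return the k most frequent non-stopword terms across all petitions."""
    counts: Counter = Counter()
    for xml_string in petitions:
        tokens = re.findall(r"[a-z]+", petition_text(xml_string).lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


# Made-up sample petitions, for illustration only.
sample = [
    "<petition><title>Noise from road construction</title>"
    "<body>Construction noise continues at night.</body></petition>",
    "<petition><title>Road repair request</title>"
    "<body>The road near the school needs repair.</body></petition>",
]
print(top_keywords(sample, k=3))
```

The elicited keywords could then serve as features for clustering petitions into groups. Korean petition text would need a morphological analyzer rather than the simple regular-expression tokenizer used here.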
In this way, our methodology transformed a great number of petitions into petition groups that petition inspectors can analyze, by applying text mining techniques to the unstructured data of petitions, i.e. by identifying petition groups through clustering petitions with the elicited keywords. We expect this to reduce the time-consuming manual work of reading and classifying petitions, so that petition inspectors can concentrate more efficiently on the daily analysis of continually filed petitions. Subsequently, by applying data mining techniques to the structured data of petitions, we could predict the trend of petitions with an appropriate degree of accuracy, and we expect that our forecasting models based on RBFNs and C5.0 will be able to replace petition inspectors’ decision making on the trend of petitions. Besides, we found that our forecasting models based on RBFNs and C5.0 can predict the moment when the trend value turns to ‘H’ earlier than the petition inspectors. This will help government sectors concentrate on improving the related national law and policy strategy by saving the time spent finding and tracking significant groups of petitions that might grow into nationwide problems. Moreover, if the priorities of petitions are evaluated with respect to the trend values of their petition groups, this will indicate the priorities of the related laws and policies to be improved by government sectors. Eventually, these contributions will help e-People evolve into a virtual space where the boundaries between citizens and government sectors dissolve, so that citizens can easily participate in innovating national law and policy simply by filing petitions to e-People as the voice of the nation. As further work, we would like to improve the performance of eliciting keywords from petitions by adding visualization methods based on semantic networks.
This can benefit our approach because visualization methods are known to be suitable for representing unstructured data and its analysis results, and semantic networks take the relationships among keywords into account. In addition, we wish to introduce additional formulas for new feature values and to study how to find priorities among the feature values used as input variables for the prediction models. Finally, we plan to evolve the e-People system by implementing our framework of applying text and data mining techniques to petitions filed to e-People. This will play an important role as a reference model for realizing Open Innovation in e-Government by enabling citizens to participate conveniently in innovating the national system by filing petitions to e-People.
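
The comparison reported in the conclusion, where the models flag the turn from ‘L’ to ‘H’ before the inspectors do, amounts to locating the first ‘L’ to ‘H’ transition in two label sequences and measuring the gap. The sketch below illustrates only that comparison. The daily label sequences are invented for the example, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def first_turn_to_high(labels: Sequence[str]) -> Optional[int]:
    """Return the index of the first 'H' that directly follows an 'L', else None."""
    for day in range(1, len(labels)):
        if labels[day - 1] == "L" and labels[day] == "H":
            return day
    return None


def lead_time(predicted: Sequence[str], manual: Sequence[str]) -> Optional[int]:
    """Days by which the predicted turn precedes the manual one (positive = earlier)."""
    predicted_turn = first_turn_to_high(predicted)
    manual_turn = first_turn_to_high(manual)
    if predicted_turn is None or manual_turn is None:
        return None
    return manual_turn - predicted_turn


# Invented daily trend labels for one petition group, for illustration only.
predicted_labels = list("LLLHHH")
manual_labels = list("LLLLLH")
print(lead_time(predicted_labels, manual_labels))
```

A positive lead time means the model signalled the rise before the manual evaluation did. `None` means at least one sequence never turns from ‘L’ to ‘H’.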