Download English ISI article No. 22273
Article title (translated from Persian)

Global data mining: An empirical study of current trends, future forecasts and technology diffusions

English title
Global data mining: An empirical study of current trends, future forecasts and technology diffusions
Article code: 22273
Publication year: 2012
Length of English article: 10 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Expert Systems with Applications, Volume 39, Issue 9, July 2012, Pages 8172–8181

Translated keywords
Data mining - Research trends and forecasts - Technology diffusion - Bibliometric method
English keywords
Data mining, Research trends and forecasts, Technology diffusions, Bibliometric methodology
Article preview

English abstract

Using a bibliometric approach, this paper analyzes research trends and forecasts of data mining from 1989 to 2009 by searching for the heading “data mining” in the topic field of the SSCI database. The search of SSCI journals over this period returned 1181 articles on data mining. The paper classifies these articles by eight categories (publication year, citation, country/territory, document type, institute name, language, source title and subject area) and examines the distribution within each category in order to explore how data mining technologies developed over the period and to analyze technology tendencies and forecasts on the basis of those results. The paper also performs the K-S test to check whether the author productivity distribution follows Lotka’s law. In addition, the analysis reviews the historical literature to trace the technology diffusion of data mining. The paper provides a roadmap for future research, abstracts technology trends and forecasts, and facilitates knowledge accumulation so that data mining researchers can save some time since core knowledge will be concentrated in core categories. This implies that the phenomenon “success breeds success” is more common in higher quality publications.

English introduction

Data mining is an interdisciplinary field that combines artificial intelligence, database management, data visualization, machine learning, mathematical algorithms, and statistics. Data mining, also known as knowledge discovery in databases (KDD) (Chen et al., 1996; Fayyad et al., 1996a), is a rapidly emerging field. This technology provides different methodologies for decision-making, problem solving, analysis, planning, diagnosis, detection, integration, prevention, learning, and innovation. It is motivated by the need for new techniques to help analyze, understand or even visualize the huge amounts of stored data gathered from business and scientific applications. It is the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant structures, from large amounts of data stored in databases, data warehouses, or other information repositories. It can help companies make better decisions and stay competitive in the marketplace. The major data mining functions developed in commercial and research communities include summarization, association, classification, prediction and clustering. These functions can be implemented using a variety of technologies, such as database-oriented techniques, machine learning and statistical techniques (Fayyad, Piatetsky-Shapiro, & Smyth, 1996b). Data mining was defined by Turban, Aronson, Liang, and Sharda (2007, p. 305) as a process that uses statistical, mathematical, artificial intelligence and machine-learning techniques to extract and identify useful information and subsequently gain knowledge from large databases. In an effort to develop new insights into practice-performance relationships, data mining has been used to investigate improvement programs, strategic priorities, environmental factors, manufacturing performance dimensions and their interactions (Hajirezaie, Husseini, Barfourosh, et al., 2010).
Berson et al. (2000), Lejeune (2001), Ahmed (2004) and Berry and Linoff (2004) also defined data mining as the process of extracting or detecting hidden patterns or information from large databases. With an enormous amount of customer data, data mining technology can provide business intelligence to generate new opportunities (Bortiz and Kennedy, 1995; Fletcher and Goss, 1993; Langley and Simon, 1995; Lau et al., 2003; Salchenberger et al., 1992; Su et al., 2002; Tam and Kiang, 1992; Zhang et al., 1999). Recently, a number of data mining applications and prototypes have been developed for a variety of domains (Brachman, Khabaza, Kloesgen, Piatetsky-Shapiro, & Simoudis, 1996), including marketing, banking, finance, manufacturing and health care. In addition, data mining has been applied to other types of data such as time-series, spatial, telecommunications, web, and multimedia data. In general, the data mining process and the technique and function to be applied depend very much on the application domain and the nature of the data available. Using a bibliometric approach, the paper analyzes technology trends and forecasts of data mining from 1989 to 2009 by searching for the heading “data mining” in the topic field of the SSCI database. The paper surveys and classifies data mining articles by eight categories (publication year, citation, document type, country/territory, institute name, language, source title and subject area) and examines the distribution within each category in order to explore how the technologies and applications of data mining developed over the period and to analyze technology trends and forecasts on the basis of those results. In addition, the analysis reviews the historical literature to trace the technology diffusion of data mining.
The analysis provides a roadmap for future research, abstracts technology trends and forecasts, and facilitates knowledge accumulation so that data mining researchers can save some time since core knowledge will be concentrated in core categories. This implies that the phenomenon “success breeds success” is more common in higher quality publications.

English conclusion

Using a bibliometric approach, the paper analyzes technology trends and forecasts of data mining from 1989 to 2009 by searching for the heading “data mining” in the topic field of the SSCI database. The search of SSCI journals over this period returned 1181 articles on data mining. The paper surveys and classifies these articles by eight categories (publication year, citation, document type, country/territory, institute name, language, source title and subject area) and examines the distribution within each category in order to explore how the technologies and applications of data mining developed over the period and to analyze technology trends and forecasts on the basis of those results. The paper also performs the K-S test to check whether the author productivity distribution follows Lotka’s law. The results have several important implications:
(1) Based on the distribution by publication year, data mining has the potential to keep growing and to become more popular in the future.
(2) The distribution of citations indicates that the existing upward trend in data mining is expected to continue.
(3) By country/territory, the US, England and Taiwan are the top three, and together they account for 64.61% of total publications. Australia, Canada, the P.R.C. and Germany are also major contributors of academic work in data mining research. As for the relationship between article production and citation, Finland has only ten data mining articles, yet they have been cited 474 times. The other countries/territories broadly follow the article production ranking.
(4) By institution, Noish, Pennsylvania State University and the University of Wisconsin are the most prominent scholarly affiliations in data mining research. Judging by the locations of these affiliations, the U.S. remains the most productive country in this research area. As for the relationship between article production and citation, Yale University has only nine data mining articles, yet they have received the largest number of citations in the domain. The other institutions broadly follow the article production ranking.
(5) The article is the dominant document type in data mining research.
(6) English remains the dominant language in data mining research.
(7) By subject, the categories most relevant to data mining are information science & library science; computer science & information systems; operations research & management science; management; computer science & artificial intelligence; economics; computer science & interdisciplinary applications; public, environmental & occupational health; engineering, electrical & electronic; and environmental studies. These will become the most important categories for data mining researchers. Citations follow the article production ranking except in statistics & probability; social sciences & mathematical methods; economics; computer science & artificial intelligence; engineering, electrical & electronic; and computer science & information systems.
(8) By source, the journals most active in publishing data mining research are Expert Systems with Applications, Journal of the American Medical Informatics Association, Journal of the Operational Research Society, Journal of the American Society for Information Science and Technology, Information Processing & Management, International Journal of Geographical Information Science, Journal of Information Science, Online Information Review, Information & Management, Decision Support Systems and Resources Policy. These will become the most critical journals for data mining researchers. Citations follow the article production ranking except for Decision Support Systems, Information & Management, Journal of the American Society for Information Science, International Journal of Geographical Information Science and Scientometrics.
(9) According to the K-S test, the author productivity distribution predicted by Lotka holds for data mining. Data mining fits Lotka’s law because the proportion of authors who published one article is close to the constant c, so the difference between the observed and expected values is smaller than the K-S critical value, and the data mining distribution therefore fits the slope of Lotka’s law.
The research findings can be extended to investigate author productivity by analyzing variables such as chronological and academic age, number and frequency of previous publications, access to research grants, job status, etc. In this way the characteristics of authors with high, medium and low publishing activity can be identified. The findings can also help governments and enterprises to judge scientific research trends and forecasts of data mining, and to understand the scale of development of data mining research by analyzing the growth in the number of article authors. Resources are limited, especially for emerging and developing countries and for small and medium enterprises. Based on the above information, governments and enterprises may infer collective tendencies and the demand for scientific researchers in data mining, which can inform appropriate training strategies and policies in the future. The analysis provides a roadmap for future research, abstracts technology trends and forecasts, and facilitates knowledge accumulation so that data mining researchers can save some time since core knowledge will be concentrated in core categories.
This implies that the phenomenon “success breeds success” is more common in higher quality publications.
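The K-S check of Lotka’s law mentioned in the conclusion can be sketched roughly as follows. This is a minimal illustration of the general technique, not the authors’ own procedure: it assumes Lotka’s law in the form "proportion of authors with x articles = c / x^n", estimates n by least squares on the log-log counts, normalises c over a truncated series, and compares the largest gap between the observed and expected cumulative proportions with a large-sample critical value of 1.63 / sqrt(number of authors), the usual coefficient for the 0.01 level (1.36 would correspond to 0.05). The function names, the truncation limit and the sample counts are all hypothetical and do not come from the paper.

```python
import math


def fit_lotka(author_counts, max_x=10000):
    """Estimate Lotka's exponent n and constant c.

    author_counts maps x (articles per author) to the number of authors
    who published exactly x articles; every count must be positive.
    """
    xs = sorted(author_counts)
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(author_counts[x]) for x in xs]
    m = len(xs)
    sum_x, sum_y = sum(log_x), sum(log_y)
    sum_xx = sum(a * a for a in log_x)
    sum_xy = sum(a * b for a, b in zip(log_x, log_y))
    # Least-squares slope of log(authors) on log(articles); n is its negative.
    slope = (m * sum_xy - sum_x * sum_y) / (m * sum_xx - sum_x ** 2)
    n = -slope
    # Assumption: c normalises the expected proportions c / x**n so that
    # they sum to 1 over x = 1..max_x (a truncated series).
    c = 1.0 / sum(k ** -n for k in range(1, max_x + 1))
    return n, c


def ks_test_lotka(author_counts, n, c, coefficient=1.63):
    """Return (d_max, critical_value, fits) for the K-S comparison."""
    total_authors = sum(author_counts.values())
    cum_observed = 0.0
    cum_expected = 0.0
    d_max = 0.0
    for x in range(1, max(author_counts) + 1):
        cum_observed += author_counts.get(x, 0) / total_authors
        cum_expected += c / x ** n
        d_max = max(d_max, abs(cum_observed - cum_expected))
    # Large-sample critical value; the coefficient sets the significance level.
    critical = coefficient / math.sqrt(total_authors)
    return d_max, critical, d_max < critical


if __name__ == "__main__":
    # Hypothetical author counts, for illustration only (not the paper's data).
    sample = {1: 800, 2: 190, 3: 85, 4: 45, 5: 30}
    n, c = fit_lotka(sample)
    d_max, critical, fits = ks_test_lotka(sample, n, c)
    print(f"n={n:.3f} c={c:.3f} d_max={d_max:.4f} critical={critical:.4f} fits={fits}")
```

Under this sketch, the distribution is taken to conform to Lotka’s law when `d_max` is smaller than `critical`, which mirrors the reasoning in point (9): when the observed share of single-article authors is close to c, the cumulative gap stays small.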