Download English ISI article No. 22613
Article title

Business information extraction from semi-structured webpages
Article code: 22613
Publication year: 2004
Length of English article: 8 pages (PDF)
Source

Publisher : Elsevier - Science Direct

Journal : Expert Systems with Applications, Volume 26, Issue 4, May 2004, Pages 575–582

Keywords
Electronic commerce, Internet shopping mall, Business information, Information extraction, Agent
Article preview

Abstract

To protect online consumers, as the OECD Guidelines recommend, Internet shopping malls should provide information about their business on their webpages. In Korea, the Consumer Protection Law in Electronic Commerce required Internet shopping malls to provide their business information so that consumers could easily identify them. Since most Korean Internet shopping malls present this business information in a semi-structured format on their homepages, a software agent can easily identify it. To investigate automatically whether Internet shopping malls provide business information, this article proposes methods for gathering the URLs of Internet shopping malls, monitoring alterations of webpages, and extracting business information. Business information extraction in our research is based on synonyms and indicator words of the attributes. We used inductive learning to raise the efficiency of information extraction. Through experiments, we showed the potential of our agent system. The average extraction accuracy of our agent system was 89.3%.
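As a rough illustration of extraction driven by attribute-name synonyms and value indicator words, the following Python sketch looks for a known synonym in the footer text and then checks that what follows looks like a plausible value. The synonym lists, the regular expressions, the function name `extract`, and the sample footer are all hypothetical and are not taken from the paper; the inductive learning step mentioned in the abstract is not shown.

```python
import re

# Hypothetical synonym table: attribute -> labels that may introduce it.
SYNONYMS = {
    "representative": ["Representative", "CEO"],
    "telephone": ["Tel", "Phone", "Telephone"],
    "fax": ["Fax", "Facsimile"],
    "email": ["E-mail", "Email"],
}

# Hypothetical indicator patterns: what a plausible value looks like.
PHONE = re.compile(r"\+?\d[\d\-() ]{6,}\d")
INDICATORS = {
    "representative": re.compile(r"[^\W\d_]+(?: [^\W\d_]+)*"),
    "telephone": PHONE,
    "fax": PHONE,
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def extract(text):
    """Return {attribute: value} for every attribute whose label and value are found."""
    found = {}
    for attr, labels in SYNONYMS.items():
        for label in labels:
            # Locate the label as a whole word, followed by an optional ':' or '.'.
            m = re.search(r"\b" + re.escape(label) + r"\b\s*[:.]?\s*", text, re.IGNORECASE)
            if not m:
                continue
            # The value must start right after the label and match the indicator.
            value = INDICATORS[attr].match(text, m.end())
            if value:
                found[attr] = value.group().strip()
                break
    return found


sample = ("Representative: Kim Chulsoo | Tel: 02-123-4567 | "
          "Fax: 02-123-4568 | E-mail: help@example.com")
result = extract(sample)
```

In this sketch an attribute that has no recognizable label is simply absent from the result, so a caller could flag a site when any required attribute is missing from the returned dictionary.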

Introduction

The factors that affect public confidence in Internet shopping malls include the reputation of the shopping mall, the clearness of its business information, its protection policies for consumers' private information, and its security policies for payment. Among these factors, clearness of business information is a basic factor that can lead to confidence in shopping malls in the electronic commerce environment. The OECD announced the Guidelines for Consumer Protection in Electronic Commerce in 1999 (OECD, 1999). The OECD Guidelines and the Guidelines of Membership Nations, created shortly thereafter, specify that Internet shopping malls should provide at least a basic minimum of business information on their webpages, including the name of the business, the name of the representative, geographical address, telephone number, fax number, e-mail address, and business license number. As examples, Fig. 1 depicts two homepages that include business information: BEST BUY Co., Inc. (www.bestbuy.com) in the US and LGeshop (www.lgeshop.com) in Korea. As these examples show, most Internet shopping malls in the US provide their business information in an unstructured format, scattered over several pages, whereas most Korean Internet shopping malls provide it at the bottom of their homepages in a semi-structured format. In Korea, the Consumer Protection Law in Electronic Commerce, which came into effect in July 2002, required Internet shopping malls to provide a minimum of seven items of business information (the name of the business, the name of the representative, geographical address, telephone number, fax number, e-mail address, and business license number) so that consumers could easily identify them. Therefore, in Korea, Internet shopping malls must provide their business information on their webpages.
Since most Korean Internet shopping malls provide their business information in a semi-structured format, an agent can identify it more easily than in other countries such as the US. If a shopping mall intentionally omits all or part of the required business information, it can be regarded as a suspect of online fraud. If an organization detected Internet shopping malls that lack business information and admonished them for not providing it, public confidence in electronic commerce would be enhanced; however, it is difficult for a person to visit a large number of Internet shopping malls' homepages to investigate whether or not they provide business information. According to an announcement by the Korea National Statistical Office in March 2003, the number of Internet shopping malls in Korea was 3188, generating 571.0 billion won (475.8 million dollars) in sales. Those numbers increased by 35.8 and 28.6%, respectively, compared with 1 year earlier and are expected to increase continuously (Korea National Statistical Office, 2003). In December 2002, Internet users were estimated to make up 59.4% of the Korean population, and 31.0% of those Internet users were estimated to have made online purchases (Korea Network Information Center, 2003). These statistics show the growth of electronic commerce. As electronic commerce grows, subversive activities such as fraud will become increasingly prevalent. The Korea Consumer Protection Board assisted 4631 customers with complaints related to electronic commerce in the first half of 2002, up 110.1% compared with 2001. Grievances concerning electronic commerce made up 2.9% of all customer complaints, compared with 1.2% in 2001, showing that the share of complaints related to electronic commerce is growing rapidly (Korea Consumer Protection Board, 2002).
To assess how well Internet shopping malls provide business information, 33 monitors of the Cyber Consumer Council (www.consumer.go.kr) investigated whether or not 380 Internet shopping malls provided business information on their webpages (Korea Fair Trade Commission, 2000). However, it was very difficult for the members of the monitoring organizations to visit a great number of Internet shopping malls to investigate their business information. For efficient investigation, a software agent is needed. Because Internet shopping malls open and close their businesses on the web every day, public organizations should monitor Internet shopping malls at all times. A monitoring system to investigate the business information of Internet shopping malls should have the following components: an agent to collect the URLs of Internet shopping malls, an agent to trace whether or not the webpages of Internet shopping malls have been altered, and an agent to extract business information from webpages. To address these issues, the article is organized as follows. In Section 2, we review the literature on systems that monitor fraudulent sites and on the implementation of wrappers. In Section 3, we propose methods to gather the URLs of Internet shopping malls, to monitor alterations of webpages, and to extract business information. In Section 4, we report experiments that show the usefulness of the agent system. Finally, in Section 5, we conclude by suggesting further applications and limitations of the proposed agent system.
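The component that traces whether a webpage has been altered can be pictured with a small Python sketch. This excerpt does not say how the paper detects alterations, so the approach below (comparing SHA-256 fingerprints of whitespace-normalized HTML) is an assumption chosen for illustration, and the names `fingerprint` and `detect_changes` and the example URLs are hypothetical.

```python
import hashlib


def fingerprint(html: str) -> str:
    """Hash the page after collapsing whitespace, so that pure
    re-indentation is not reported as an alteration."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous: dict, pages: dict):
    """Compare fetched pages with stored fingerprints.

    previous: {url: fingerprint} from the last run
    pages:    {url: html} fetched in this run
    Returns (changed_urls, updated_fingerprints); new URLs count as changed.
    """
    changed = []
    current = {}
    for url, html in pages.items():
        digest = fingerprint(html)
        current[url] = digest
        if previous.get(url) != digest:
            changed.append(url)
    return changed, current


# First run: every URL is new, so every URL is reported.
run1 = {
    "http://mall-a.example/": "<p>Tel: 02-123-4567</p>",
    "http://mall-b.example/": "<p>Fax: 02-123-4568</p>",
}
changed1, store = detect_changes({}, run1)

# Second run: mall-a only changes its whitespace, mall-b changes its content.
run2 = {
    "http://mall-a.example/": "<p>Tel:   02-123-4567</p>\n",
    "http://mall-b.example/": "<p>Fax: 02-999-0000</p>",
}
changed2, store = detect_changes(store, run2)
```

Only the pages reported as changed would need to be passed to the extraction agent again, which keeps repeated monitoring runs cheap.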

Conclusion

This study can be useful for government authorities and consumer protection organizations that seek to promote the growth of electronic commerce and protect online consumers. First, the agent can find Internet shopping malls that intentionally did not register with the local government or the KECDMA. Second, this study can help to grade the clearness of business information and to find suspects of unethical business practices. If Internet shopping malls provide correct business information, it will increase the accuracy with which the agent system extracts business information and raise consumer confidence. Providing business information on their webpages is a starting point for Internet shopping malls to protect online consumers and gain public confidence. The Korean case of providing business information in a semi-structured format and using a software agent to investigate its provision can be useful for the promotion of electronic commerce in other countries. The software agent is very efficient at extracting the business information that Internet shopping malls provide on their webpages. Since synonyms of attribute names and indicator words of attribute values affect the accuracy of extraction results, it is important to manage the database of synonyms and indicator words. Although the agent system we have proposed extracts the business information of Internet shopping malls efficiently, it has some limitations, because webpages are not uniformly structured and business information is written in diverse forms. The specific limitations are as follows.
• Graphic information: some shopping malls provide business information in graphic form. In this case, the agent cannot extract it.
• Vagueness of representation: in some cases, the name of the business and the name of the representative are provided without any indicator word. In this case, if the attribute name is not provided, the agent cannot extract those attribute values.
• Unintelligible characters in business information: if the start and end points of the business information are vague, or if there are HTML tags between attributes, it is difficult for the agent to locate attribute names and extract attribute values correctly.
Since the programming techniques used to build homepages are growing more diverse and more sophisticated, it is not easy for the agent to cope with those changes. For efficient information extraction by the agent as well as for consumer protection, it would be better if Internet shopping malls kept to the following criteria when publishing business information on their webpages.
• All business information should be written on the homepage.
• All business information should be written in the business information area of the homepage.
• All business information should be written in text mode.
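To make the limitations concerning HTML tags and graphic information more concrete, the following Python sketch reduces a page fragment to its visible text before any attribute matching and counts the images it cannot read. It uses the standard library `html.parser` module; the class name `TextOnly`, the function `visible_text`, and the sample fragment are hypothetical, and this preprocessing step is an assumption for illustration, not a description of the paper's agent.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text fragments and count <img> tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.images = 0

    def handle_starttag(self, tag, attrs):
        # Text rendered inside an image is invisible to a text-based agent.
        if tag == "img":
            self.images += 1

    def handle_data(self, data):
        # Keep only non-empty text between tags.
        if data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return (text_with_tags_removed, number_of_images)."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts), parser.images


fragment = '<div>Tel<b>:</b> <span>02-123-4567</span><img src="addr.gif"></div>'
text, image_count = visible_text(fragment)
```

Here the label and the value, which are separated by tags in the markup, end up adjacent in `text`, while a nonzero `image_count` hints that some business information may be published in graphic form and would need manual review. A fuller version would also have to skip the contents of script and style elements.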