Article title

Data mining applied to material acquisition budget allocation for libraries: design and development

Article code	Publication year	Pages (English article)
22039	2003	11 pages (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal : Expert Systems with Applications, Volume 25, Issue 3, October 2003, Pages 401–411

Keywords

Acquisition budget allocation, Circulation, Data mining

Abstract

Library management frequently needs to demonstrate the value of acquired materials, so that the allocated acquisition budget is put to its most beneficial use. Knowledge in circulation databases can be explored in depth to address this need. In this paper, a data mining based model (DMBA) is designed and developed to help allocate the library material acquisition budget by analyzing how users have actually utilized library materials. The model uses the ID3 algorithm to explore explanatory knowledge via information theory, and statistics to derive appropriateness via utilization gain. The main output of the DMBA is a set of weights, obtained by combining the explored explanatory knowledge with appropriateness, which serves as the basis for allocating the library material acquisition budget among departments. The developed DMBA is supported by a practical application case.

Introduction

The objective of budget allocation is to facilitate planning and resource allocation in an organization by itemizing, dividing, and examining all of the products and services offered to patrons (Robort, 1998). Seer (2000) notes that in the library budget allocation process it is important to explain how the numbers used as the basis of allocation are arrived at. To make the most advantageous use of budgeted materials, the budget allocated to an academic department should appropriately reflect the use that department has made of materials. Because the budget is increasingly limited, it should be an inviolable policy that the more use of materials a department makes, the more budget it is allocated. Consequently, knowledge discovered in circulation databases is valuable to an academic library in allocating budget to departments according to appropriateness. Regarding the acquisition budget allocation decision, Greaves (1974) indicated that circulation statistics have to be taken into account to reflect practical demands. The survey by Tuten and Lones (1995) and the study by Budd and Adams (1989) also emphasized that circulation statistics are among the most widely cited factors in budget allocation decisions. Furthermore, in a study measuring resource utilization efficiency in university libraries, Chen (1997) listed book circulation as one of the most important dimensions. More recently, Wise and Perushek (2000), in research that used goal programming to solve the acquisition allocation problem, pointed out that circulation is a reliable factor for evaluating the success of material utilization.
These research works have substantially aided the improvement of the budget allocation process. However, they reveal a drawback: the lack of a model that can effectively derive utilization performance with respect to the use of materials in the different subjects represented by academic departments. As a consequence, they cannot provide a proper indicator on which the budget allocation operation can rely. For example, when budgeting, the information that ‘the circulation database of this academic year indicated that the department of International Trade made much more use of materials in its subject than others in theirs’ supports a decision better than the information that ‘the circulation database of this academic year indicated that the department of International Trade made much more use of materials than others’.
The techniques used to help determine academic library acquisition budget allocation generally include goal programming and statistics. Goal programming develops mathematical models that offer near-optimal solutions to problems with multiple, competing, and conflicting objectives, given the rank order of the goals and the constraints on the relevant factors in advance. Wise and Perushek (1996, 2000) have demonstrated the successful use of goal programming for the acquisition allocation problem. The statistical approach concentrates on the shared ratio of each factor contained in a hierarchical decision tree (Anderson, Sweeney, & Williams, 1994). Although both techniques have contributed significantly to supporting library budget allocation decisions for departments, a remaining problem is how to bring out the information contained in historical circulation data. In other words, the use of materials that a department has made should be appropriately reflected in its acquisition budget allocation for the coming academic year.
In response to this need, Kao, Chang, and Lin (2003) present a decision support model that can discover information in circulation databases for library acquisition budget allocation. However, the heavy manual labor required for its complex computation may limit its applicability, so the model should be moved from manual operation toward computerization. Exploring meaningful information about a department's material utilization in a historical circulation database is a fairly complex task. Simply taking the percentage of records in which a department borrowed materials, out of all records collected in the circulation database over a period of time, is not sufficient. The preprocessing of circulation data, the definition of the strength with which a material belongs to a department, and the complexity of gain computation all complicate the task. Data collection and preparation involve gathering the circulation data from daily operations and storing it in a database, cleaning out unnecessary attributes (or fields) and any missing data, and reconstructing the created database if necessary. The strength with which a material belongs to a department needs to be suitably defined so that the total utilization gain reflecting each department's performance can be computed. For example, a material categorized under the subject of financial economics may be defined as related to the department of Accounting with a degree of ‘absolutely matching’, to Information Management with ‘partially matching’, and to Electrical Engineering with ‘not matching’. This implies that when a material classified under financial economics is utilized, the department of Accounting performs better than both Information Management and Electrical Engineering, because it made use of a material in its own subject more appropriately.
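To make the contrast between a raw record share and a strength-aware measure concrete, here is a minimal sketch. It is not the paper's gain computation: the numeric values attached to the three linguistic strengths, the strength table, and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict

# ASSUMPTION: the source names the linguistic strengths but gives no numbers;
# these values are illustrative only.
STRENGTH_VALUE = {
    "absolutely matching": 1.0,
    "partially matching": 0.5,
    "not matching": 0.0,
}

# Hypothetical strength table: (category, department) -> linguistic strength.
STRENGTH = {
    ("financial economics", "Accounting"): "absolutely matching",
    ("financial economics", "Information Management"): "partially matching",
    ("financial economics", "Electrical Engineering"): "not matching",
}

# Hypothetical circulation records: (department, category of borrowed material).
RECORDS = [
    ("Accounting", "financial economics"),
    ("Accounting", "financial economics"),
    ("Information Management", "financial economics"),
    ("Electrical Engineering", "financial economics"),
]


def raw_share(records):
    """Fraction of all circulation records attributed to each department."""
    counts = defaultdict(int)
    for dept, _category in records:
        counts[dept] += 1
    total = len(records)
    return {dept: n / total for dept, n in counts.items()}


def weighted_use(records):
    """Sum of strength values over each department's records.

    Pairs missing from the strength table are treated as 'not matching'.
    """
    totals = defaultdict(float)
    for dept, category in records:
        label = STRENGTH.get((category, dept), "not matching")
        totals[dept] += STRENGTH_VALUE[label]
    return dict(totals)
```

With the sample data, Information Management and Electrical Engineering have the same raw share, but the strength-weighted totals separate them, which is the distinction the paragraph above argues a plain percentage cannot capture.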
In this case, an assumption has to be made that a department uses its own allocated budget to acquire materials only in its subject. Although we believe that some exceptions may occur and are very difficult to eliminate, the general policy is that the more suitable the use, the more budget is allocated. For example, although the department of Electrical Engineering may acquire materials in the subject of financial economics, the use made by Electrical Engineering will be regarded as not matching when its performance is measured. In this study, we do not attempt to discourage a department from using library materials outside its subject. However, when the aim is to make the value of the allocated budget evident, we have to consider with greater care whether or not a department makes suitable use of materials.
Data mining (DM), with its capability for description and prediction, can explore patterns in large historical databases that are meaningful, interpretable, and decision-supportable (Chen et al., 1996; Fayyad, 1996; Fayyad & Stolorz, 1997; Hirota & Pedrycz, 1999). Lee and Siau (2001) have described the use of major DM techniques, including artificial intelligence, the decision tree paradigm, genetic algorithms, visualization, and statistics. Applications reported in the literature over the past few years also attest to the increased use of DM, such as a hotel data mart (Sung and Sang, 1998), personal bankruptcy prediction (Donato et al., 1999), customer service support (Hui & Jha, 2000), and the journal special issue edited by Kohavi and Provost (2001). Bigus (1996) and Adriaans and Zantinge (1996) also presented fundamental concepts for the applicability of DM to business problems, covering market segmentation, customer ranking, real estate pricing, sales forecasting, customer profiling, and prediction of the bid behavior of pilots.
In this study, discovered knowledge in an explanatory form, together with a measure of appropriateness, is used to support the library acquisition budget allocation decision. For example, the explanatory knowledge that ‘the material categories that the department of International Trade made use of were more concentrated than those of the other departments’ is helpful for the library budget allocation operation. Likewise, the appropriateness statement that ‘the appropriateness of the use of materials by the department of International Trade was much higher than that of others’ is also decision-supportable. A circulation database is usually very large, so the complexity of gain computation may significantly affect processing efficiency. This research employs Structured Query Language (SQL) to overcome this problem. SQL is a well-known database technique for efficiently retrieving data that satisfy a user's requirements (Connolly, Beg, & Strachan, 1996). Han et al. (1996), Imielinski and Virmani (1999), and Meo, Psaila, and Ceri (1998) have successfully used it to explore knowledge in large databases. SQL is therefore employed in this study to help with data preprocessing. Advanced information technology facilitates computer-based system development when complex computation is involved. In the present research, a computer-based, data mining based decision support model (DMBA) is designed and developed. It embeds a circulation database mining mechanism that explores knowledge in circulation databases to support the library budget allocation decision and helps produce a table containing the final allocated acquisition budget for academic departments.
The remainder of this paper is organized as follows. Section 2 describes the research methodology, including the architecture of DMBA, circulation data preprocessing, knowledge discovery in circulation databases, and the generation of budget allocation weights. An example used to test and validate the developed DMBA is described in Section 3. Section 4 presents a practical application case for the developed DMBA. Section 5 presents the conclusion and directions for future research.
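The SQL-based preprocessing mentioned in the introduction (storing circulation records, cleaning out missing data, and summarizing use per department) might look roughly like the sketch below. The table layout, column names, and sample rows are hypothetical; the excerpt does not describe the actual schema or queries used in DMBA. Python's standard `sqlite3` module is used only as a convenient way to run the SQL.

```python
import sqlite3


def aggregate_circulation(rows):
    """Load raw circulation rows, drop incomplete ones, and count loans.

    `rows` is an iterable of (patron_id, department, category, loan_date)
    tuples. The schema is an assumption made for this sketch.
    Returns a list of (department, category, loans) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE circulation ("
            "patron_id TEXT, department TEXT, category TEXT, loan_date TEXT)"
        )
        conn.executemany("INSERT INTO circulation VALUES (?, ?, ?, ?)", rows)

        # Cleaning step: remove records with a missing department or category.
        conn.execute(
            "DELETE FROM circulation "
            "WHERE department IS NULL OR category IS NULL"
        )

        # Summarize use per department and material category.
        cursor = conn.execute(
            "SELECT department, category, COUNT(*) AS loans "
            "FROM circulation "
            "GROUP BY department, category "
            "ORDER BY department, category"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data; the last row has no department and is discarded.
SAMPLE_ROWS = [
    ("p1", "Accounting", "financial economics", "2002-09-01"),
    ("p2", "Accounting", "financial economics", "2002-09-03"),
    ("p3", "Information Management", "databases", "2002-09-04"),
    ("p4", None, "databases", "2002-09-05"),
]
```

Pushing the grouping and counting into the database, rather than looping over records in application code, is the kind of efficiency gain the text attributes to SQL when the circulation database is large.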

Conclusion

In this paper, we have highlighted the value of circulation data processing, developed a decision support model (DMBA) that embeds the data mining technique for the library budget allocation operation, demonstrated the use of DMBA, and described a practical application for LKSUT. The current study of DMBA extends the research conducted by Kao et al. (2003) by computerizing the model to increase its applicability. It employs SQL to efficiently preprocess the circulation data when necessary, uses information theory to measure the degree of concentration of the categories observed in the circulation data table, uses statistics to compute utilization gain as the degree of appropriateness, and combines the degree of concentration with appropriateness to derive the final weights as the basis for acquisition budget allocation decisions. It offers a new way of processing the circulation data at hand to elicit information that interprets the data more appropriately. The developed DMBA has been used to efficiently support decisions regarding library acquisition budget allocation. Although the DMBA can serve as a budget allocation advisor, the final results are still strongly affected by subjective information provided by management. For example, the definition of linguistic strength for categories and departments and the value of ξ are factors that need to be determined carefully. Additionally, the data collected in this study was limited to circulating materials. The approach can be extended to other types of materials, typically on-line materials, by analyzing hits and time spent to obtain a more complete picture for material acquisitions.
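The excerpt does not give the formulas behind the degree of concentration or the final weights, so the following is only a sketch of one plausible reading. It assumes concentration is measured as one minus normalized Shannon entropy over a department's category counts, and that concentration and appropriateness are blended with a hypothetical mixing coefficient `mix` before normalization. No correspondence between `mix` and the paper's ξ is claimed.

```python
import math


def concentration(category_counts):
    """One minus normalized Shannon entropy of a department's loans.

    Returns 1.0 when all loans fall in a single category and 0.0 when
    loans are spread evenly. Assumes at least one loan is recorded.
    This measure is an assumption, not the paper's stated formula.
    """
    n_categories = len(category_counts)
    if n_categories <= 1:
        return 1.0
    total = sum(category_counts.values())
    probs = [c / total for c in category_counts.values() if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(n_categories)


def allocation_weights(concentrations, appropriateness, mix=0.5):
    """Blend two per-department scores and normalize them to sum to 1.

    `mix` is a hypothetical coefficient in [0, 1]; the blending rule is
    an illustrative assumption.
    """
    scores = {
        dept: mix * concentrations[dept] + (1.0 - mix) * appropriateness[dept]
        for dept in concentrations
    }
    total = sum(scores.values())
    if total == 0:
        # No signal at all: fall back to equal weights.
        return {dept: 1.0 / len(scores) for dept in scores}
    return {dept: score / total for dept, score in scores.items()}
```

Multiplying each resulting weight by the total acquisition budget would give a per-department figure. As the conclusion notes, such results depend heavily on subjective inputs, which in this sketch are `mix` and the appropriateness scores.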
While librarians are paying increased attention to easing access, filtering and retrieving knowledge sources, and bringing new services onto the Web, and while users are industriously searching for what they need and working out what is really good for them, the budget still has the final say. The availability of materials via the Internet is rapidly changing library strategy, driving a transition from print to electronic forms. Importantly, as mentioned by Miller (1999), the large expenditures of the past decade have revealed that a rather swift shift, or reallocation, of the collection budget from print to electronic publications makes the budget allocation decision more complex and difficult. For example, what can be relied on when deciding which electronic journals or e-books are good for a library? Indisputably, data collection via daily circulation work will be greatly influenced by the way users make use of on-line materials, which in turn makes the budget allocation operation even more difficult. For example, how should the number of logins and the number of entrances be handled when arrivals are analyzed? Although many issues and arguments have been and are being brought into discussion and research, it is our belief that ‘allocating more money for the things that are more essential’ should be an inviolable policy while budgeting. After all, the budget is not unlimited. Furthermore, it is advantageous to discover unknown information in historical data to support not only budget allocation decisions but also other management topics, such as personalized information services based on individual transaction profiles. Finally, a decision support model that processes large amounts of data should be computerized to increase its applicability.