Persian translation of the article title

Infrastructure service brokering for minimum-cost data procurement based on a quality–quantity model

English title

Brokering infrastructure for minimum cost data procurement based on quality–quantity models

Article code	Year of publication	Number of pages (English article)
16964	2008	15 pages (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal: Decision Support Systems, Volume 45, Issue 1, April 2008, Pages 95–109

Keywords (Persian translation)

Data quality - Information market - Information economics - Quality cost optimization - Bundled data - Brokering services - Integer linear programming

English keywords

Data quality, Information market, Information economics, Quality cost optimization, Bundle of data, Brokering service, Integer linear programming


Abstract (English)

Inter-organization business processes involve the exchange of structured data across information systems. We assume that data are exchanged under given conditions of quality (offered or required) and price. Data offers may include bundling schemes, whereby different types of data are offered together with a single associated price and quality. We describe a brokering algorithm for obtaining data from peers that minimizes the overall cost under quality requirement constraints. The algorithm extends query processing techniques over multiple database schemas to automatically derive an integer linear programming problem that returns an optimal matching of data providers to data consumers under realistic economic cost models.
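To make the optimization concrete, the following is a minimal sketch of a generic minimum-cost coverage ILP of the kind the abstract describes. It is an illustrative assumption, not the paper's exact model: quality is collapsed to a single score per bundle (the paper works with per-record quality vectors), and all symbols are introduced here. R is the set of requested data items, each with a minimum quality q^*_d; B is the set of offered bundles; D_b is the set of data items in bundle b, offered at price c_b with quality q_b; x_b is a binary purchase variable.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{b \in B} c_b\, x_b \\
\text{s.t.} \quad & \sum_{\substack{b \in B:\; d \in D_b,\\ q_b \ge q^{*}_{d}}} x_b \;\ge\; 1 && \forall d \in R \\
& x_b \in \{0,1\} && \forall b \in B
\end{aligned}
```

Each constraint requires every requested item to be supplied by at least one purchased bundle whose quality meets that item's threshold. Because bundles are indivisible, a bundle may be bought even when only part of its content is needed.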

Introduction (English)

For large businesses and public sector agencies, good management of information assets has long been key to their effectiveness in delivering quality services to users, and many organizations have processes to manage the quality of their data. Recently, advances in technology for the large-scale deployment of information services, for example over service-oriented software infrastructures, have enabled cost-effective data exchange across organizations. In business terms, this means that it is becoming increasingly feasible for organizations to (i) purchase or otherwise acquire data from other peers, and (ii) exploit their own information assets for marketing purposes. These capabilities may be used to offer advanced services to users. Thus, a general trend is for organizations to acquire the information needed to support user services from third parties. Several studies have analyzed the economic relevance of the potential information market. Public agencies have been found to be by far the greatest producers of information, and the information they create and disseminate is often relevant to both private and public processes, products, and services. In [33] an analysis of the commercial exploitation of public sector information is presented for both the USA and the European Union (EU). The study shows that the economic value of the EU information market in 2000 amounted to approximately 10% of that of the US market, which was valued at 750 billion dollars, and it recommended regulating the information market to provide further incentives for trading public sector information across and within member states. To understand the implications of this trend, the size of the information market must be considered together with the quality of the information traded, since quality will presumably affect the cost of data and hence the overall information market. Data quality has been recognized as an issue since the 1990s. 
General frameworks for describing data quality properties, or dimensions, are available in the literature [37] and [38]. For instance, accuracy characterizes how well data represent the corresponding real-world entities. Another major issue in the information market is the offering of bundles of data, which are indivisible units of data, each with a single associated price and quality level. Indeed, both the cost structure behind the production and sale of digital information goods and the need to implement anti-competitive strategies can induce more and more data providers to offer indivisible units of different types of data (see, for example, [28]). Focusing again on the public sector, it is well known that public agencies, in order to provide services to citizens and businesses, manage large registries with overlapping and heterogeneous data, and exchange large data flows. On the one hand, these many registries overlap heavily; on the other, they are usually managed and updated with different policies, resulting in different levels of accuracy and of other quality dimensions. In many data-intensive processes, sources are combined, and it is important for agencies and private users to be able to choose and compose data on the basis of the desired target quality. In other words, the availability of such overlapping data sources may be seen as an opportunity for data consumers, who may use a quality-driven query processing strategy [27] that builds the global data set from a differentiated offer of data of different qualities. Furthermore, data quality has a cost and, at the same time, heavily influences the quality, cost, and revenues of the processes that use the data. 
In considering the relationship between quality and the cost of quality, some authors start their analysis from a parallel between the emerging information market and established markets for other goods [7], with the final purpose of defining criteria for data quality control and improvement. As for other types of goods, these activities have a cost that is a component of the selling price. Furthermore, in order to devise rational methodologies for improving data quality, several authors have proposed data quality cost classifications [13] and [25] and cost/quality optimization procedures [6] that investigate the various types of costs caused by poor data quality (costs of non-quality). Quality-driven query processing and cost/quality optimization have been addressed only recently. In particular, in the field of Decision Support Systems, data quality models that support and improve decision making in different situations have been studied in [36], [10] and [20]. In this paper we propose a brokering algorithm that provides a cost/quality broker service to facilitate the procurement of data from third parties, based on the assumptions that consumer interest in data depends on both its cost and its quality, and that distinct data can be sold together in a bundle with a single associated quality and price. The algorithm takes as input (i) the offer of data from a set of providers, including possible bundling schemes, together with its quality and cost, (ii) global, integrated knowledge of the information content offered by the providers, and (iii) a query expressing the data demand, namely the data requested by consumers and their required quality; it returns the optimal choice in terms of selected data, their quality, and their cost. We note that the broker service can be used as a decision support system by managers responsible for information acquisition activities. The rest of the paper is organized as follows. 
Section 2 presents the information procurement scenario underlying our approach, together with an overview of the algorithm and basic definitions. The two phases of the algorithm, decomposition and optimization, are detailed in Sections 3 and 4, respectively. Related work is discussed in Section 5, and Section 6 concludes the paper.

Conclusion (English)

We have presented a brokering algorithm that supports managers in buying information from multiple data sources characterized by different costs and qualities. The algorithm accepts (i) a collection of quality vectors, one for each record in the sources, and (ii) a query over a global schema, together with the mappings from the local schemas to the global schema (in a local-as-view setting). It computes the most complete answer to the global query with the best cost–quality ratio. The algorithm consists of two phases. In the first phase, the schema mappings are used to identify a set of local fragments for the query result. In the second phase, a variable is associated with each fragment, and the fragments' quality and cost are used to formulate the constraints of an ILP problem. The solution of the problem is a complete answer that satisfies the quality constraints at minimal cost. The first phase includes a special case of the query subsumption problem; however, the simplifying assumptions on the conditions make it polynomial. Although the second phase is known to be NP-complete, the size of the problem, which is determined by the number of providers and of local schemas, is expected to be small. In practice, the algorithm is meant to be used by decision-makers responsible for acquiring quality data from third parties. The query answer is computed under the assumption that the quality vectors supplied by the data providers reflect the best quality information known to each provider. In reality, providers may supply misleading quality information to their advantage, and consumers normally have no way to verify this information during query execution. As mentioned in Section 5, this issue is normally addressed by assuming that trusted third parties have auditing authority over the providers and can issue penalties when the information is found to be untruthful. 
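The two phases can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not the paper's: the "fragments" of phase 1 are reduced to the set of requested items each bundle can supply at the required quality, quality is a single scalar per bundle rather than a per-record quality vector, and phase 2 uses exhaustive search over bundle subsets in place of an ILP solver, which is only viable for the small instances the conclusion anticipates. The names `Bundle`, `eligible_items` and `cheapest_cover`, and the example providers, prices and thresholds, are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Bundle:
    """Hypothetical bundle offer: indivisible data items sold at one price and one quality."""
    provider: str
    items: frozenset  # data items contained in the bundle
    price: float
    quality: float    # single scalar score (assumption; the paper uses quality vectors)


def eligible_items(bundle, requested, min_quality):
    """Phase 1 (simplified): requested items this bundle can supply at the required quality."""
    return frozenset(
        d for d in bundle.items & requested if bundle.quality >= min_quality[d]
    )


def cheapest_cover(offers, requested, min_quality):
    """Phase 2 (exhaustive search in place of an ILP solver): the minimum-cost
    list of bundles covering every requested item, or (None, inf) if none exists."""
    # Keep only bundles that supply at least one requested item at the required quality.
    candidates = [(b, eligible_items(b, requested, min_quality)) for b in offers]
    candidates = [(b, e) for b, e in candidates if e]
    best, best_cost = None, float("inf")
    # Enumerate every non-empty subset of candidates (2^n subsets: small instances only).
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            covered = frozenset().union(*(e for _, e in combo))
            cost = sum(b.price for b, _ in combo)
            if covered >= requested and cost < best_cost:
                best, best_cost = [b for b, _ in combo], cost
    return best, best_cost


offers = [
    Bundle("P1", frozenset({"name", "address"}), 10.0, 0.95),
    Bundle("P2", frozenset({"name"}), 4.0, 0.90),
    Bundle("P3", frozenset({"address", "income"}), 7.0, 0.80),
    Bundle("P4", frozenset({"income"}), 5.0, 0.85),
]
requested = frozenset({"name", "address", "income"})
min_quality = {"name": 0.90, "address": 0.90, "income": 0.80}

best, cost = cheapest_cover(offers, requested, min_quality)
print([b.provider for b in best], cost)  # ['P1', 'P4'] 15.0
```

If the address threshold is relaxed to 0.80, bundle P3 becomes eligible for address as well and the optimum changes to P2 plus P3 at a cost of 11.0, which shows how the quality requirements, and not just prices, drive the choice. Beyond a handful of bundles, an ILP solver would replace the enumeration.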
The algorithm can usefully be extended to support a coordinated spot market, where multiple consumers simultaneously request portions of data with specified quality levels, and multiple suppliers submit their offers and the associated quantity–quality matrices to a Central Public Supplier (CPS) mediator. For instance, the CPS might be in charge of selling data owned by multiple local public agencies to individuals, businesses, and other public agencies. In this case, to exploit quantity/quality discounts as much as possible, the CPS could coordinate the purchasing process by collecting the overall demand and offer and then matching them. In particular, the problem of allocating offered data among consumers can be formulated as a simple extension of the ILP presented in Section 4. We are interested in implementing the DSS presented in this paper and in developing the complete model underlying the coordinated spot market outlined above.
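One possible reading of that extension, offered only as a hedged sketch since the formulation is not given here, lets the CPS buy each bundle once and serve every consumer who needs its content. The notation is hypothetical: consumers k ∈ K each request a set of items R_k with thresholds q^*_{k,d}; each bundle b ∈ B has price c_b, content D_b and a scalar quality q_b; y_b is a binary purchase variable.

```latex
\begin{aligned}
\min_{y} \quad & \sum_{b \in B} c_b\, y_b \\
\text{s.t.} \quad & \sum_{\substack{b \in B:\; d \in D_b,\\ q_b \ge q^{*}_{k,d}}} y_b \;\ge\; 1 && \forall k \in K,\ \forall d \in R_k \\
& y_b \in \{0,1\} && \forall b \in B
\end{aligned}
```

Pooling demand in this way captures the saving from not buying the same bundle twice, but not volume-discount price schedules or the division of the total cost among consumers; modeling those would require additional variables and constraints.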