The performance of financial decision-making directly concerns both businesses and individuals. Data quality is a key factor for decision performance. As the availability of online financial data increases, so does the problem of data quality. In this paper, a taxonomy is created for data quality problems. More importantly, an ontology-based framework is proposed to improve the quality of online financial data. An empirical evaluation of the framework with the financial data of real-world firms provides preliminary evidence for the effectiveness of the framework. The framework is expected to support decision-making in finance and in other domains where data is spread across multiple sources that overlap yet complement one another in content.
Today's widespread financial problems and the economic downturn highlight the importance of financial decision-making to individuals, businesses, and organizations. Intelligence gathering is the first stage of decision-making [57], and data quality is a key factor for decision performance [29]. It is reported [63] that 20% of asset managers, investment bankers, and hedge fund professionals spend between 25% and 50% of their time validating data, which prevents them from focusing on tasks that contribute to the bottom line. According to a recent study of the costs and other consequences of dirty or inconsistent data in the secondary mortgage market in the U.S. [22], inaccurate data results in slow and expensive loan processing, weak underwriting, incorrect portfolio management, and other costs to lenders and mortgage investors. Given that financial data, including financial statements, market data, and business news, are increasingly used by investors in stock market predictions [16,53], data quality has become an important and widespread issue in financial decision-making.
The problems with financial data come in a variety of forms, chiefly ambiguity, inconsistency, missing values, inaccuracy, misrepresentation, and incompleteness [40]. For instance, missing values are not uncommon in Standard & Poor's Compustat North America dataset. Such problems can directly impact the performance of financial decision-making. Against this backdrop, this study aims to answer the following research question: How should one address the quality problems of financial data so as to improve the performance of financial decision-making?
Both qualitative and quantitative approaches have been proposed to address various types of data quality problems [42]. For example, missing values can be replaced with global means or the most probable values [15]. Nevertheless, validating data quality is a challenging and time-consuming task [59]. This is especially true for financial data, as an increasing amount of it becomes available on the Internet. The high frequency [25], high diversity, and interdependency of financial data render conventional static approaches ineffective. Financial data therefore calls for a synergistic semantic alignment of various resources to improve its quality.
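The conventional replacement strategies mentioned above can be illustrated with a minimal sketch. It assumes missing entries are encoded as `None`, and it reads "most probable value" as the most frequent observed value, which is only one simple interpretation; the function names are hypothetical and are not taken from the cited work.

```python
from statistics import mean, mode

def impute_with_mean(values):
    """Replace None entries with the global mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to estimate from; return the input unchanged.
        return list(values)
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def impute_with_mode(values):
    """Replace None entries with the most frequent observed value.

    Assumption: the mode stands in for the "most probable value".
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    fill = mode(observed)
    return [fill if v is None else v for v in values]
```

Both functions are static: they consult only the column being repaired and ignore what other sources report for the same firm, which is the limitation the paragraph above points to.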
This study proposes a framework for identifying and addressing data quality problems, following the design science research framework [28]. Three types of artifacts are created in this study. First, this study proposes an ontology-based framework to address the quality problems associated with online financial data. The framework is motivated by one unique feature of financial data, namely redundancy. Specifically, financial data about a firm is duplicated across multiple, complementary online sources such as Yahoo!Finance, Google Finance, MSN Money Central, and Compustat. Yet the data are heterogeneous across different sources, even within the highly regulated financial domain. The ontology is expected to address this heterogeneity by enabling the mapping of data across different sources.
Second, this study creates a taxonomy and formalization of quality problems associated with financial data. The taxonomy, comprising six types of quality problems, including missing values, is organized along two dimensions: the foundation and the abstraction level of ontology. Third, this study introduces a baseline method for evaluating the performance of financial decision-making that is based on fuzzy theories. In view of the uncertainty involved in financial decision-making, the neuro-fuzzy approach is expected to be more robust when faced with data quality problems. The results of this study demonstrate that the proposed framework is effective for improving the quality of financial decision-making.
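The idea of exploiting redundancy across sources can be sketched as follows. This is an illustration under stated assumptions, not the framework itself: a plain dictionary stands in for the ontology, and the source names, field labels, concept names, and the 1% tolerance are all hypothetical.

```python
# Hypothetical mapping from source-specific field labels to shared concepts.
# In the framework this role is played by the ontology; a dict is a stand-in.
FIELD_TO_CONCEPT = {
    "source_a": {"Total Revenue": "revenue", "Net Income": "net_income"},
    "source_b": {"Revenue": "revenue", "Net Earnings": "net_income"},
}

def align(source, record):
    """Rename a source record's fields to the shared concept names."""
    mapping = FIELD_TO_CONCEPT[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def reconcile(aligned_records, rel_tol=0.01):
    """Merge aligned records from several sources.

    A value missing (None) in one source is filled from another source.
    Concepts whose values differ by more than rel_tol (an assumed
    threshold) are reported as conflicts; the first value seen is kept.
    """
    merged, conflicts = {}, []
    for record in aligned_records:
        for concept, value in record.items():
            if value is None:
                continue
            if concept not in merged:
                merged[concept] = value
                continue
            scale = max(abs(merged[concept]), abs(value))
            if abs(merged[concept] - value) > rel_tol * scale:
                if concept not in conflicts:
                    conflicts.append(concept)
    return merged, conflicts
```

For example, if one source reports revenue but omits net income while a second source reports both, `reconcile` returns a record with both values and an empty conflict list; if the two sources disagree on revenue beyond the tolerance, `"revenue"` appears in the conflict list for further inspection.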
The remainder of this paper is organized as follows. Section 2 provides background information on financial decision-making, financial data quality, and ontology. Section 3 presents a taxonomy of problems associated with online financial data. Section 4 introduces the ontology-based framework for improving data quality in financial decision-making. The framework is evaluated in Section 5 and the results are presented and discussed in Section 6. Section 7 concludes the paper.
An ontology-based framework is proposed in this research to improve financial data quality. The framework can be used to address various types of problems associated with online financial data. The positive impact of the framework on the performance of financial decision-making is empirically demonstrated with asset valuation.
Improving the quality of financial data is just the first step toward effective financial decision-making. Given that the financial market is dynamic, the domain knowledge modeled in FinO (Financial Ontology) should evolve accordingly.