A text-based decision support system for financial sequence prediction
Article code | Publication year | Length of English article |
---|---|---|
5738 | 2011 | 10 pages (PDF) |
Publisher: Elsevier - ScienceDirect
Journal: Decision Support Systems, Volume 52, Issue 1, December 2011, Pages 189–198
English abstract
Although most quantitative financial data are analyzed using traditional statistical, artificial intelligence or data mining techniques, the abundance of online electronic financial news articles has opened up new possibilities for intelligent systems that can extract and organize relevant knowledge automatically in a usable format. Most information extraction systems require a hand-built dictionary of templates and thus need continual modification to accommodate new patterns that are observed in the text. In this research, we propose a novel text-based decision support system (DSS) that (i) extracts event sequences from shallow text patterns, and (ii) predicts the likelihood of the occurrence of events using a classifier-based inference engine. The prediction relies on two major, but complementary, feature sets: adjacent events and a set of information-theoretic functions. In contrast to other approaches, the proposed text-based DSS gives explanatory hypotheses about its predictions from a coalition of intimations learned from the inference engine, while preserving robustness and without indulging in formalism. We investigate more than 2000 financial reports with 28,000 sentences. Experiments show that the prediction accuracy of our model outperforms similar statistical models by 7% for the seen data while significantly improving the prediction accuracy for the unseen data. Further comparisons substantiate the experimental findings.
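The abstract names "a set of information-theoretic functions" as one of the two feature sets but does not list them here. As a hedged illustration only, the sketch below computes pointwise mutual information (PMI) between adjacent events in extracted sequences. PMI is an assumed stand-in for that family of functions, not the paper's stated choice, and the event labels and the names `adjacent_pairs` and `pmi_table` are hypothetical.

```python
import math
from collections import Counter


def adjacent_pairs(sequences):
    """Yield every (previous_event, next_event) pair from each sequence."""
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            yield prev, nxt


def pmi_table(sequences):
    """Return {(a, b): PMI in bits} for every observed adjacent event pair.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), where the probabilities are
    relative frequencies over adjacent pairs. This is an illustrative
    assumption, not the feature definition used in the paper.
    """
    pair_counts = Counter(adjacent_pairs(sequences))
    total = sum(pair_counts.values())
    first_counts = Counter()   # how often an event opens a pair
    second_counts = Counter()  # how often an event closes a pair
    for (a, b), n in pair_counts.items():
        first_counts[a] += n
        second_counts[b] += n
    return {
        (a, b): math.log2(n * total / (first_counts[a] * second_counts[b]))
        for (a, b), n in pair_counts.items()
    }


# Hypothetical event sequences, as they might come out of the extraction step.
sequences = [
    ["profit_warning", "share_price_fall"],
    ["profit_warning", "share_price_fall"],
    ["dividend_increase", "share_price_rise"],
    ["dividend_increase", "share_price_rise"],
]
scores = pmi_table(sequences)
```

A positive score means the two events occur next to each other more often than their individual frequencies would suggest, which is the kind of signal a classifier can use alongside the adjacent events themselves.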
English introduction
There has long been a strong interest in applying computational intelligence to the analysis of financial data. Such analysis has traditionally concerned forecasting based on past price data. One area of limited success in financial prediction comes from textual data [39]. Textual data contain more information than numeric data because the former not only allow us to predict financial trends but also provide us with justification of the predictions. For example, a news article on a company containing words and phrases such as “shortfall”, “risk of default”, “resignation” gives reason to expect a fall in the company's stock price, even if the company's reported financial figures appear sound. The current availability of huge volumes of financial electronic text has created a pressing need for better knowledge discovery and the construction of applications for managing the knowledge that is extracted. Most of the existing research into financial text mining or knowledge discovery from text (KDT) relies on the identification of a predefined set of keywords. In this approach, a text is usually scanned for a specific type of event template, such as corporate acquisitions. The main goal is to fill in the values for sets of handcrafted and predefined template slots. Consequently, the construction of event extraction templates is a fairly laborious activity. It is difficult to design templates that anticipate all of the possible combinations of events or objects of interest that can be described, as well as to cover trivial redescriptions. Given the weakness of the approach and the demand for high-level representations, that is, not just keywords, to take advantage of linguistic knowledge, there has been considerable interest in the development of an automatic means of learning shallow event patterns from text, without indulging in linguistic formalism. 
In this article, we propose a novel inference engine for financial text sequence prediction that brings together the benefits of shallow text processing and classifier-based inferences to produce effective knowledge discovery. Our approach aims to extract key underlying event sequences from financial texts and then hypothesize and assess incoming, even new and unseen, event sequences in the prediction. Unlike similar approaches, an inferential mechanism is developed in the engine that can extract event sequences from a collection of relevant texts, and collate the sequences in such a manner that both explicit and implicit information can be tailored to the needs of users. The task we are addressing and the problem of predicting the financial event sequence can be stated as follows. Given a corpus of financial documents that demonstrate event sequences, we explain how to extract all of the event sequences from the texts and predict the interesting and unseen relationships between them. The rest of this article is organized as follows. Section 2 provides an overview of the research that applies linguistic style information to enhance KDT. Section 3 gives an overview of our system and issues regarding text preprocessing, shallow parsing, textual information generalization, and event sequence extraction. Section 4 describes the design of the inference engine for event sequence prediction. A practical boosting algorithm is introduced into the engine to produce a set of prediction rules. Numerous features that characterize the sequences and their latent inter-event relations are captured in the engine. The system prototype is implemented and we conduct a series of experiments to evaluate and compare the engine with the hidden Markov model. Section 5 provides an overview of our experimental design. We also quantify the outcome and give a detailed analysis of the results in our evaluation. Finally, the conclusions and further research directions are presented in Section 6.
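The introduction says only that "a practical boosting algorithm" produces a set of prediction rules and does not name the variant. The sketch below therefore swaps in plain AdaBoost with one-feature decision stumps as an assumption, to show how weighted rules over adjacent-event features could be learned. The feature names, labels, and the functions `stump`, `train_adaboost`, and `predict` are all hypothetical.

```python
import math


def stump(x, feature, polarity):
    """Weak rule: vote `polarity` if the feature is present, else the opposite."""
    return polarity if feature in x else -polarity


def train_adaboost(examples, labels, rounds=5):
    """Train AdaBoost over presence/absence stumps.

    examples: list of sets of active feature names.
    labels:   list of +1 / -1 (for example, "target event follows" or not).
    Returns a list of (alpha, feature, polarity) rules.
    """
    n = len(examples)
    weights = [1.0 / n] * n
    features = sorted(set().union(*examples))
    rules = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for f in features:
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(examples, labels, weights)
                          if stump(x, f, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        rules.append((alpha, f, polarity))
        # Raise the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump(x, f, polarity))
                   for x, y, w in zip(examples, labels, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return rules


def predict(rules, x):
    """Sign of the alpha-weighted vote of all learned rules."""
    score = sum(alpha * stump(x, f, p) for alpha, f, p in rules)
    return 1 if score >= 0 else -1


# Hypothetical training data: features describe the events preceding a target.
examples = [
    {"prev=profit_warning"},
    {"prev=profit_warning", "prev2=resignation"},
    {"prev=dividend_increase"},
    {"prev=dividend_increase", "prev2=resignation"},
]
labels = [1, 1, -1, -1]  # +1: a share price fall follows; -1: it does not
rules = train_adaboost(examples, labels)
```

Each learned rule is readable as "if this feature is present, vote this way with this weight", which is one way a boosted classifier can offer the kind of explanatory hints the authors describe.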
English conclusion
The efficient-market hypothesis (EMH) in finance asserts that price movements are an unbiased or rational reflection of all past or incoming publicly available information about future earning prospects. However, models of bounded rationality have suggested that humans are inherently limited in their ability to process information [36]. A decision support system that can unveil the hidden regularities and uncover the trends in price movement found in financial news has become a necessity in this age of information explosion. While most DSS make use of clean and orderly data, usually normalized in their own databases, in this paper we have devised a DSS and demonstrated a technique that can extract knowledge from text with limited human intervention. This research does not simply study the structure of financial news articles. More importantly, it illustrates how useful textual information can be extracted from pieces of raw text and, consequently, how it is converted into event sequences. Unlike all other similar approaches, our DSS does not rely on a bag of words; rather, it uses a shallow language model to eliminate incorrectly conflated information. As a result, the event collocations, which demonstrate the intertwined links among the events, can be captured in the sequences. In addition, we have moved one step forward in exploring how the extracted event sequences can be used to support financial sequence prediction. Our DSS collates the prior event sequences in such a way that both explicit and implicit knowledge can participate in the predictions. The predictions look for events that are not stated in the financial news but underlie it. Another important contribution of this research is the design of the classifier-based inference engine.
We have provided a detailed discussion of the inferential mechanism of the inference engine and explained how the engine is applied to predictive analytics, as in other DSS. The engine provides a robust and efficient means of handling predictions in financial texts, supported by the extracted event sequences, which tend to provide evidence for why the predictions can be made. We have also evaluated the system and provided a head-to-head comparison with the state-of-the-art hidden Markov model. The results demonstrate that our engine outperforms its counterpart. While texts usually convey more diverse and “rich” information than purely numerical data, it is well recognized that it is complicated, if not impossible, to exploit recurring textual patterns that can support or perform sequence predictive analytics. The instrumental rules used in sequence prediction are not always obvious. Our approach has provided some hints for extracting such rules, no matter how hidden they may be in the transactional data, particularly when facing tremendous collections of textual data. Lastly, while the findings are certainly interesting, we have to accept that the experiments conducted may inevitably involve some minimal human intervention, particularly in sense tagging. Indeed, the system is intended primarily as a prototype for illustrating the relevant issues in an uncomplicated setting. A massive infusion of linguistic knowledge into the text-based DSS is obligatory when the DSS is fully deployed. Further investigation is needed to extend the work to an unrestricted and fully functional model; however, we have demonstrated and implemented one of the alternatives for revealing, from texts, the relevant knowledge that can support DSS.