Wavelet-based multiresolution analysis for data cleaning and its application to water quality management systems
Article code | Publication year | Length of English article |
---|---|---|
4439 | 2008 | 10 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 35, Issue 3, October 2008, Pages 1301–1310
English abstract
Data cleaning techniques are useful for extracting desirable knowledge or interesting patterns from existing databases in engineering applications. The major problems of conventional techniques (e.g., the Fourier transform technique) are that they (1) are more appropriate for linear systems than for nonlinear systems, and (2) depend strictly on state space functions. In this study, a wavelet-based multiresolution analysis technique (WMAT) is proposed for reducing noise induced by complex uncertainty. The approach is applied to a river water quality simulation system to show its practicability in data cleaning and parameter estimation. Clean data are prepared by running Thomas' river water quality model, and polluted data are synthesized by mixing the clean data with white Gaussian noise. The results show that WMAT does not distort the clean data and can effectively reduce the noise in the polluted data. The data denoised by WMAT are then used for estimating the modeling parameters. The parameters estimated with data denoised through WMAT are much closer to the real values than those estimated (1) with polluted data that have not been denoised and (2) with data denoised through the Fourier analysis technique. It is thus recommended that the prepared data not be used for estimating the modeling parameters until they have been cleaned with WMAT.
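The experiment summarized in the abstract (clean data mixed with white Gaussian noise, then cleaned by multiresolution analysis) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Haar wavelet for brevity rather than the Db3 base function the study adopts, it applies soft thresholding with the universal threshold as one common shrinkage rule, and it replaces the output of Thomas' model with a hypothetical exponential decay profile. All function names and numeric values are illustrative.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Split a signal into a coarse approximation and per-level details.

    len(signal) must be divisible by 2**levels.
    Details are returned finest level first.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        evens, odds = approx[0::2], approx[1::2]
        details.append([(a - b) / SQRT2 for a, b in zip(evens, odds)])
        approx = [(a + b) / SQRT2 for a, b in zip(evens, odds)]
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, rebuilding from the coarsest level up."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        current = rebuilt
    return current

def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by the threshold."""
    shrunk = abs(value) - threshold
    return math.copysign(shrunk, value) if shrunk > 0.0 else 0.0

def wavelet_denoise(signal, levels=4):
    approx, details = haar_decompose(signal, levels)
    # Assumption: noise level estimated from the finest details
    # via the median absolute value (a common robust estimate).
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    shrunk = [[soft_threshold(d, threshold) for d in level] for level in details]
    return haar_reconstruct(approx, shrunk)

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical stand-in for a simulated concentration profile along a river.
rng = random.Random(0)
clean = [10.0 * math.exp(-0.01 * i) for i in range(256)]
polluted = [c + rng.gauss(0.0, 0.5) for c in clean]  # white Gaussian noise
denoised = wavelet_denoise(polluted)

print("RMSE polluted vs clean:", rmse(polluted, clean))
print("RMSE denoised vs clean:", rmse(denoised, clean))
```

Comparing the two printed errors gives a rough sense of how much noise the shrinkage step removes. A smoother base function would be expected to track the profile with less blockiness than Haar.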
English introduction
With the increasing ease of generating, collecting, and storing data, we are living in an expanding universe of too much data (Sörensen & Janssens, 2003). Extracting useful information from these abundant data is a must. Data mining is a process of extracting desirable knowledge or interesting patterns from existing databases for special purposes. The process mainly covers six stages: data collection, data preprocessing, feature extraction, pattern recognition, data visualization, and results evaluation. Conventional data mining techniques include decision trees, multicriteria analysis (Zeng & Trauth, 2005), artificial neural networks (Zeng et al., 2003), statistical analysis (Battista & Visini, 2006), and Bayesian data analysis (He, Chan, Huang, & Zeng, 2006). They have been applied in a variety of fields such as marketing, management, health care, computing, astronomy, bioinformatics, high-energy physics, chemistry, and environmental management (Hong et al., 2003; Babovic et al., 2002; Liu and Shih, 2005; Kusiak et al., 2005; Ye, 2003; Huang, 2006; Yin et al., 2007; Fisher et al., 2007). These studies normally assumed that the data to be mined are reliable and accurate. However, data arising from investigation, experiment, and simulation processes may be polluted by noise signals due to subjective and/or objective errors (Li and Shue, 2004; Mu, 1996). For example, experimental errors may result from measurement, reading, recording, and external conditions; simulation errors might cover model uncertainty, parameter uncertainty, and computation errors. Since these noisy signals are likely to distort the results of data mining, they must be removed (that is, the signals must be denoised) before any original data are used. Signals can be denoised through the application of a set of linear filters (Bell and Martin, 2004; Constable, 1978; Mu, 1996).
However, one problem of these filters is that they are more appropriate for linear systems than for nonlinear systems. Another problem is that they depend on state space functions. In fact, most signals are nonlinear and can hardly be represented by a specific state space function. In addition, the Fourier analysis technique (FAT) is a classical tool for reducing noise, but it is only suitable for denoising data/signals containing steady noise. Because noise is often unsteady in real-world cases, its application remains limited. To overcome the problems of traditional denoising techniques, more sophisticated techniques such as the wavelet-based multiresolution analysis technique (WMAT) have been proposed. WMAT is useful for denoising multi-dimensional spatial/temporal signals containing steady or unsteady noise. It has been widely applied to engineering systems for pattern recognition and knowledge discovery (Avci, 2007; Ceylan and Özbay, 2007; Duport et al., 1996; Galal, 2002; Hobbs and Hepenstal, 1989; Hsieh and Kuo, 2008; Li and Shue, 2004; Lung, 2006; Lung, 2007; Mallat, 1989; Murtagh et al., 1995; Osowski and Nghia, 2002; Otazu and Pujol, 2006; Schutze, 2001; Sorzano et al., 2006; Subasi, 2007; Starck and Murtagh, 1998; Tirtom et al., 2008). Nevertheless, few of these studies addressed water quality management systems, where the water quality monitoring data need to be used for parameter estimation (Dohan and Whitfield, 1997a; Dohan and Whitfield, 1997b). Therefore, the objective of this study is to propose a wavelet-based multiresolution analysis technique (WMAT) for cleaning polluted water quality monitoring data. The technique, together with the traditional Fourier analysis technique (FAT), will be applied to a numerical example to illustrate the performance of WMAT in data cleaning.
In addition, both the denoised and the non-denoised data will be applied to a water quality management system to address parameter estimation issues.
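The excerpt does not give the water quality model equations or the estimation procedure, so the following is only a sketch of what estimating a parameter from a concentration profile can look like. It assumes a hypothetical first-order decay profile C(x) = C0 · exp(−k·x) and fits k by ordinary least squares on the log-transformed concentrations. The function name, the parameter values, and the noise level are all assumptions made for illustration.

```python
import math
import random

def fit_first_order_decay(distances, concentrations):
    """Least-squares fit of ln C = ln C0 - k * x. Returns (C0, k).

    Assumes every concentration is strictly positive.
    """
    logs = [math.log(c) for c in concentrations]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Hypothetical "true" parameters and sampling locations.
true_c0, true_k = 10.0, 0.3
distances = [0.1 * i for i in range(51)]
clean = [true_c0 * math.exp(-true_k * x) for x in distances]

# Polluted data: clean profile plus white Gaussian noise.
rng = random.Random(1)
noisy = [c + rng.gauss(0.0, 0.2) for c in clean]

c0_clean, k_clean = fit_first_order_decay(distances, clean)
c0_noisy, k_noisy = fit_first_order_decay(distances, noisy)

print("k from clean data:", k_clean)
print("k from noisy data:", k_noisy)
```

In a workflow like the one the study describes, the denoised profile would be passed to the same fitting routine in place of the noisy one, and the three estimates would be compared against the known parameter value.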
English conclusion
In this study, three groups of water quality data are collected to test the performance of WMAT on noise reduction. The data are composed of the CBOD, NBOD, and DO concentrations in a practical river segment with 3801 cross-sections. Ideally, these data would be acquired from field experiments. However, it is hard and costly to undertake such a huge monitoring task. Therefore, we synthesize the data through a water quality model. In the system, a river segment with five sections is used to generate one-dimensional spatial data. Based on these data, polluted data are produced by mixing in white Gaussian noise. WMAT and FAT are then employed to reduce the noise in the polluted data. The results show that the data denoised by WMAT are much closer to the original data than those denoised by FAT, which demonstrates the denoising effectiveness of WMAT. This is because FAT can only deal with steady noise or noise with low frequency, while WMAT is suitable for any type of noise. Although only simulated data are tested, WMAT could be extended to practical data polluted by noise in other studies. Without large volumes of high-quality data, the accuracy and reliability of conclusions drawn from data mining are problematic. However, distortion caused by various subjective or objective noise sources can hardly be detected from the data themselves. As a result, the first step of data mining is to acquire clean data before any conclusions are drawn. It can thus be proposed that the prepared data be denoised using WMAT before their use. Such a process would be safe according to the denoising results in this study, because clean data are not distorted by the denoising process, while the noise in polluted data is effectively reduced. Therefore, WMAT provides an effective tool for water quality management. The collected data denoised by WMAT are used for modeling parameter estimation.
The results show that the parameters estimated using data denoised by WMAT are closer to the real values than those estimated using data denoised by FAT and those estimated without denoising. In particular, as the noise intensity increases, the parameters estimated from data denoised by WMAT remain more accurate than those from FAT and those without denoising. This indicates the superiority of WMAT in cleaning the polluted data. Besides modeling parameter estimation, many other environmental applications require the collected data to be denoised. They include model calibration and verification, contaminant concentration prediction, optimal water quality planning, and so on. Although not described in this study, WMAT could be used in dealing with these problems. This study only addresses noise removal for spatial series. In practice, one-dimensional time series may also need to be cleaned. A number of studies have focused on time series prediction and periodic analysis. However, few of them incorporate data cleaning techniques into the water quality management framework. Although this study uses WMAT to clean water quality monitoring data in one-dimensional spatial series, the technique can be extended to time-series data. The Db3 base function is adopted for the multiresolution analysis. In wavelet theory, many types of base functions can be used, such as Haar, Daubechies, biorthogonal, and Morlet wavelets. Since different base functions are applicable to different types of noisy signals, the denoising performance of different wavelets deserves to be compared. Selection of an optimal, suitable wavelet is left to future studies.
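One reason the choice of base function matters is the number of vanishing moments: the Daubechies wavelet of order 3 (Db3) has three, so its detail coefficients are zero for any locally quadratic signal, whereas Haar has only one. The sketch below illustrates this with a single decomposition level and periodic extension. It is a hedged illustration rather than the study's procedure: the Db3 filter is built from its closed-form expression, the helper names are hypothetical, and the quadratic test signal is an assumption chosen to make the effect visible.

```python
import math

def haar_lowpass():
    return [1.0 / math.sqrt(2.0)] * 2

def db3_lowpass():
    """Daubechies-3 scaling filter from its closed-form expression."""
    a = math.sqrt(10.0)
    b = math.sqrt(5.0 + 2.0 * a)
    scale = 16.0 * math.sqrt(2.0)
    return [
        (1 + a + b) / scale,
        (5 + a + 3 * b) / scale,
        (10 - 2 * a + 2 * b) / scale,
        (10 - 2 * a - 2 * b) / scale,
        (5 + a - 3 * b) / scale,
        (1 + a - b) / scale,
    ]

def highpass_from(lowpass):
    """Quadrature mirror filter: g[k] = (-1)^k * h[L-1-k]."""
    n = len(lowpass)
    return [(-1) ** k * lowpass[n - 1 - k] for k in range(n)]

def detail_coefficients(signal, lowpass):
    """One decomposition level of details, using periodic extension."""
    g = highpass_from(lowpass)
    n = len(signal)
    return [
        sum(g[k] * signal[(2 * i + k) % n] for k in range(len(g)))
        for i in range(n // 2)
    ]

def interior(details, filter_length):
    """Drop coefficients whose filter window wraps around the signal end."""
    wrapped = (filter_length - 2) // 2
    return details[: len(details) - wrapped]

# Smooth (quadratic) test signal: any detail energy here is a
# base-function artifact rather than noise.
smooth = [(i / 64.0) ** 2 for i in range(64)]

haar_max = max(abs(d) for d in interior(detail_coefficients(smooth, haar_lowpass()), 2))
db3_max = max(abs(d) for d in interior(detail_coefficients(smooth, db3_lowpass()), 6))

print("largest interior Haar detail:", haar_max)
print("largest interior Db3 detail: ", db3_max)
```

When the smooth part of a signal contributes almost nothing to the detail coefficients, a threshold applied to those coefficients acts mainly on the noise. Whether this makes Db3 the best choice for a given data set is the open comparison the conclusion points to.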