Total suspended solids (TSS) are a major pollutant that affects waterways all over the world. Predicting TSS values is of interest for quality control in wastewater processing. Because TSS is measured infrequently, time-series data for TSS were constructed using influent flow rate and influent carbonaceous biochemical oxygen demand (CBOD). We investigated different scenarios involving daily average influent CBOD and influent flow rate measured at 15-min intervals. Then, we used five data-mining algorithms, i.e., multi-layered perceptron, k-nearest neighbor, multivariate adaptive regression spline, support vector machine, and random forest, to construct day-ahead, time-series prediction models for TSS. Historical TSS values were used as input parameters to predict current and future values of TSS. A sliding-window approach was used to improve the results of the predictions.
Total suspended solids (TSS) are considered to be one of the major pollutants that contribute to the deterioration of water quality, leading to higher costs for water treatment, decreases in fish resources, and degraded aesthetics of the water (Bilotta and Brazier, 2008). The activities associated with wastewater treatment include control of water quality, protection of the shoreline, and identification of the economic life of protective structures (Mamais et al., 1998). Predicting suspended sediments is important in controlling the quality of wastewater. TSS is an important parameter because excess TSS depletes the dissolved oxygen (DO) in the effluent water. Thus, it is imperative to know the values of influent TSS at future time horizons in order to maintain the desired characteristics of the effluent.
Industrial facilities usually measure the water quality parameters of their influents two or three times a week, and the measurements include CBOD, pH, and TSS (Choi and Park, 2002 and Cartensen et al., 1996). Thus, the infrequently recorded data must be modified to make them suitable for time-series analysis. Sufficient associated parameters must be available to develop accurate TSS prediction models. Wastewater treatment involves complex physical, chemical, and biological processes that cannot be accurately represented in parametric models. Understanding the relationships among the parameters of the wastewater treatment process can be accomplished by mining the historical data. A detailed description of various wastewater treatment plant (WWTP) modeling approaches is given in Gernaey et al. (2004). Their review focuses mainly on the application of white-box modeling and artificial intelligence to capture the behavior of numerous WWTP processes. Poch et al. (2004) developed an environmental decision support system (EDSS) to build real-world wastewater treatment processes. In another study, Rivas et al. (2008) used a mathematical programming approach to identify the WWTP design parameters.
Data-mining algorithms are useful in wastewater research. Examples of data-mining applications reported in the literature include the following: (1) prediction of the inlet and outlet biochemical oxygen demand (BOD) using multi-layered perceptrons (MLPs) and function-linked neural networks (FNNs) (Patricia et al., 2004); (2) modeling the impact of the biological treatment process with time-delay neural networks (TDNN) (Zhu et al., 1998); (3) predicting future values of influent flow rate using a k-step predictor (Tan et al., 1991); (4) estimation of flow patterns using auto-regressive with exogenous input (ARX) filters (Lindqvist et al., 2005); (5) clustering-based step-wise process estimation (Gibert et al., 2010); and (6) rapid performance evaluation of WWTP using an artificial neural network (Raudly et al., 2007).
In the research reported in this paper, the influent flow rate and the influent CBOD were used as inputs to estimate TSS. Due to the limitations of the industrial data-acquisition system, the TSS values are recorded only two or three times per week. Time-series prediction models, however, require data recorded at consistent intervals. Thus, we established two research goals: (1) to construct TSS time series using influent flow rate and influent CBOD as inputs and (2) to develop models that can predict TSS using the TSS values recorded in the past.
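As an illustration of the second goal, the sketch below shows one common way to turn a regularly sampled series into training samples in which a window of past values is paired with the next value. It is a minimal example under stated assumptions, not the procedure used in this paper: the function name make_lagged_samples, the window length of three, and the TSS values are all hypothetical.

```python
from typing import List, Tuple


def make_lagged_samples(
    series: List[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of n_lags past values with the value that follows it."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # Slide the window one step at a time over the series.
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])  # the n_lags values before time t
        targets.append(series[t])            # the value to be predicted
    return inputs, targets


# Hypothetical daily TSS values (illustrative only, not data from the paper).
tss = [110.0, 120.0, 115.0, 130.0, 125.0]
X, y = make_lagged_samples(tss, n_lags=3)
```

Each row of X can then serve as the input vector of any regression model, with the matching entry of y as its target.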
The paper is organized as follows. Section 2 provides details of the dataset used in the research. In Section 3, the TSS time-series models are discussed. In Section 4, data-mining models are constructed for predicting TSS. The computational results are discussed in Section 5. Section 6 concludes the paper with topics suggested as future research.
Data-mining algorithms were applied to predict total suspended solids (TSS). Numerous scenarios involving carbonaceous biochemical oxygen demand (CBOD) and influent flow rate were investigated to construct the TSS time series. The multi-layered perceptron (MLP) model performed best among the five data-mining models that were derived for predicting TSS. The accuracy of the predictions was improved further by an iterative construction of MLP models. The values of TSS were predicted seven days in advance with accuracies that ranged from 73% to 79%.
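One generic way to obtain a seven-day-ahead prediction from a day-ahead model is to feed each prediction back into the input window. The sketch below illustrates that idea only; it is an assumption about how a multi-day horizon can be reached, not a description of the iterative MLP construction used in this paper. The names recursive_forecast and window_mean are hypothetical, and the window mean stands in for a trained model.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    predict_next: Callable[[List[float]], float],
    n_lags: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps ahead by reusing each prediction as an input."""
    if n_lags < 1 or len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])
    forecasts: List[float] = []
    for _ in range(horizon):
        next_value = predict_next(window)
        forecasts.append(next_value)
        # Drop the oldest value and append the newest prediction.
        window = window[1:] + [next_value]
    return forecasts


def window_mean(window: List[float]) -> float:
    """Placeholder one-step model (assumption): the mean of the window."""
    return sum(window) / len(window)


# Hypothetical TSS history (illustrative only, not data from the paper).
history = [100.0, 110.0, 120.0]
forecast = recursive_forecast(history, window_mean, n_lags=3, horizon=7)
```

Because every step after the first depends on earlier predictions, errors can accumulate over the horizon, which is one reason multi-day accuracy is usually reported separately from day-ahead accuracy.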
The research reported in this paper indicates that accurate predictions are feasible if sufficient data are available. The performance of the models constructed in the research will be tested for long-term prediction once more data become available. Future research will involve application of the proposed models to optimize the performance of wastewater treatment plants (WWTPs).