In this paper, we present a new method for estimating null values in relational database systems using automatic clustering and multiple regression techniques. First, we present a new automatic clustering algorithm for clustering numerical data. The proposed automatic clustering algorithm does not require the number of clusters to be determined in advance and does not require the data in the database to be sorted in advance. Then, based on the proposed automatic clustering algorithm and multiple regression techniques, we present a new method to estimate null values in relational database systems. The proposed method for estimating null values in relational database systems only needs to process a particular cluster instead of the whole database. It achieves a higher average estimation accuracy rate than the existing methods for estimating null values in relational database systems.
In this information age, many enterprises use relational database systems for storing and processing data. In real-world applications, null values may exist in relational database systems. When some attributes of a relational database system have null values, the system may not operate properly. Therefore, how to estimate null values in relational database systems is an important research topic. In recent years, some methods have been proposed to estimate null values in relational database systems (Chang and Chen, 2006; Chen and Chen, 2000; Chen and Hsiao, 2005; Chen and Huang, 2003; Chen and Lee, 2003; Chen and Yeh, 1998; Cheng and Wang, 2006; Huang and Chen, 2002; Lee and Chen, 2002).
Chen and Chen (2000) presented a method to estimate null values in a distributed relational database environment. Chen and Huang (2003) presented a method to generate weighted fuzzy rules for estimating null values in relational database systems using genetic algorithms. Chen and Yeh (1998) presented a method to estimate null values in relational database systems by generating fuzzy rules. Cheng and Wang (2006) presented some methods for estimating null values in relational database systems that cluster the data and use fuzzy correlations and a distance similarity measure to calculate the correlation between different attributes. Chen and Hsiao (2005) presented an automatic clustering algorithm to estimate null values in relational database systems. Huang and Chen (2002) presented a method for estimating null values in relational database systems with negative dependency relationships between attributes. Lee and Chen (2002) presented a method to estimate null values in relational database systems using genetic algorithms.
In this paper, we present a new method for estimating null values in relational database systems using automatic clustering and multiple regression techniques. First, we present a new automatic clustering algorithm for clustering numerical data. The proposed automatic clustering algorithm does not require the number of clusters to be determined in advance and does not require the data in the database to be sorted in advance. Then, based on the proposed automatic clustering algorithm and multiple regression techniques, we present a new method to estimate null values in relational database systems. The proposed method for estimating null values in relational database systems only needs to process a particular cluster instead of the whole database. It achieves a higher average estimation accuracy rate than Chen and Chen’s method (2000), Chen and Yeh’s method (1998), Chen and Hsiao’s method (2005) and Cheng and Wang’s method (2006) for estimating null values in relational database systems.
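As a rough illustration of the cluster-then-regress idea, the following Python sketch estimates a missing value from the records of a single cluster only. It is not the algorithm proposed in this paper: the clustering step is a simple threshold-based leader clustering used as a stand-in (it shares only the properties of needing no preset number of clusters and no sorting), and the function names, the `threshold` parameter, and the toy data are all assumptions made for illustration.

```python
from math import dist


def leader_cluster(records, threshold):
    """Stand-in clustering (NOT the paper's algorithm): a record joins the
    first cluster whose leader lies within `threshold`, else starts a new one.
    Records are (x, y) pairs; clustering uses the predictor tuple x only."""
    clusters = []  # each entry: (leader_x, list of member records)
    for x, y in records:
        for leader, members in clusters:
            if dist(leader, x) <= threshold:
                members.append((x, y))
                break
        else:
            clusters.append((x, [(x, y)]))
    return clusters


def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting.
    Raises ZeroDivisionError if the system is singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_regression(xs, ys):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0, *x] for x in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def estimate_null(complete_records, x_query, threshold):
    """Estimate the missing y for x_query using only the nearest cluster."""
    clusters = leader_cluster(complete_records, threshold)
    _, members = min(clusters, key=lambda c: dist(c[0], x_query))
    coeffs = fit_regression([x for x, _ in members], [y for _, y in members])
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x_query))


# Toy data (hypothetical), deliberately unsorted. Records near (1, 1) follow
# y = 1 + 2*x1 + 3*x2; records near (10, 10) follow y = 100 - x1 + 0.5*x2.
records = [
    ((10.0, 10.0), 95.0), ((1.0, 1.0), 6.0), ((2.0, 1.0), 8.0),
    ((11.0, 10.0), 94.0), ((1.0, 2.0), 9.0), ((10.0, 11.0), 95.5),
    ((2.0, 2.0), 11.0), ((11.0, 11.0), 94.5),
]

if __name__ == "__main__":
    print(estimate_null(records, (1.5, 1.5), threshold=3.0))
```

Because the regression is fitted only on the members of the cluster nearest to the query record, each estimate follows the local relationship of that cluster rather than a single model fitted to the whole table. The sketch assumes every cluster holds enough non-collinear records for the normal equations to be solvable.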
In this paper, we have presented a new method for estimating null values in relational database systems using automatic clustering and multiple regression techniques. First, we presented a new automatic clustering algorithm for clustering numerical data. The algorithm does not require the number of clusters to be determined in advance and does not require the data in the database to be sorted in advance. Then, based on the proposed automatic clustering algorithm and multiple regression techniques, we presented a new method to estimate null values in relational database systems. The proposed method achieves a higher average estimation accuracy rate than the existing methods (Chen and Chen, 2000; Chen and Hsiao, 2005; Chen and Yeh, 1998; Cheng and Wang, 2006) for estimating null values in relational database systems. In the future, we will extend the proposed approach to the fuzzy database environment, which can relax the restriction of traditional databases to crisp values.