The design and implementation of web-based education systems have grown exponentially in recent years, spurred by the fact that neither students nor teachers are bound to a specific location and that this form of computer-based education is virtually independent of any specific hardware platform. These systems accumulate a great deal of information which is very valuable for analyzing students’ behavior and for assisting authors in the detection of possible errors, shortcomings and improvements. However, the vast quantities of data these systems can generate daily are very difficult to manage manually, and authors demand tools which assist them in this task, preferably on a continuous basis. The use of data mining is a promising approach to achieving this objective (Romero and Ventura, 2006; Romero and Ventura, 2007).
In the knowledge discovery in databases (KDD) process, the data mining step consists of the automatic extraction of implicit and interesting patterns from large data collections. Data mining techniques and tasks include statistics, clustering, classification, outlier detection, association rule mining, sequential pattern mining, text mining and subgroup discovery, among others (Klösgen & Zytkow, 2002).
In recent years, researchers have begun to investigate various data mining methods in order to help teachers improve e-learning systems. A review is given in Romero and Ventura (2007). These methods allow the discovery of new knowledge based on students’ usage data.
Subgroup discovery is a specific method for discovering descriptive rules (Klösgen, 1996; Wrobel, 1997). The objective is to discover characteristics of subgroups with respect to a specific property of interest (represented in the rule consequent). It must be noted that subgroup discovery aims at discovering individual rules (or local patterns of interest), which must be represented in explicit symbolic form and which must be relatively simple in order to be recognized as actionable by potential users. Therefore, the subgroups discovered in data have an explanatory nature, and the interpretability of the extracted knowledge for the final user is a crucial aspect in this field. This task has been applied to different domains: detection of patient groups at risk of atherosclerotic coronary heart disease (Gamberger & Lavrac, 2002b), mining UK traffic data (Kavsek, Lavrac, & Bullas, 2002), personal web pages (Nakada & Kunifuji, 2003), identification of interesting diagnostic patterns to supplement a medical documentation and consultation system (Atzmueller, Puppe, & Buscher, 2004) and marketing problems (del Jesus, González, Herrera, & Mesonero, 2007).
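To make the notion of a subgroup rule concrete, the following minimal sketch evaluates one rule of the form "conditions → property of interest" on a toy data set. The attribute names (`quizzes`, `mark`), the records and the choice of measures (coverage, support, confidence and weighted relative accuracy, which are commonly used in the subgroup discovery literature) are illustrative assumptions; they are not the data or necessarily the exact measures used in this work.

```python
from dataclasses import dataclass
from typing import Dict, List

Record = Dict[str, str]


@dataclass
class Rule:
    """A descriptive rule: IF all conditions hold THEN target_attr = target_value."""
    conditions: Dict[str, str]  # antecedent: attribute -> required value
    target_attr: str            # property of interest (rule consequent)
    target_value: str

    def covers(self, record: Record) -> bool:
        # A record is covered when it satisfies every condition of the antecedent.
        return all(record.get(a) == v for a, v in self.conditions.items())


def evaluate(rule: Rule, records: List[Record]) -> Dict[str, float]:
    """Compute common descriptive measures for one rule (assumed definitions)."""
    n = len(records)
    covered = [r for r in records if rule.covers(r)]
    n_cond = len(covered)
    n_class = sum(1 for r in records if r[rule.target_attr] == rule.target_value)
    n_both = sum(1 for r in covered if r[rule.target_attr] == rule.target_value)

    coverage = n_cond / n
    support = n_both / n
    confidence = n_both / n_cond if n_cond else 0.0
    # Weighted relative accuracy: coverage times the gain in target frequency
    # inside the subgroup with respect to the whole data set.
    wracc = coverage * (confidence - n_class / n)
    return {"coverage": coverage, "support": support,
            "confidence": confidence, "wracc": wracc}


# Hypothetical toy data: quiz activity level and final mark of eight students.
records = [
    {"quizzes": "high", "mark": "pass"},
    {"quizzes": "high", "mark": "pass"},
    {"quizzes": "high", "mark": "pass"},
    {"quizzes": "high", "mark": "fail"},
    {"quizzes": "low", "mark": "fail"},
    {"quizzes": "low", "mark": "fail"},
    {"quizzes": "low", "mark": "pass"},
    {"quizzes": "low", "mark": "fail"},
]

rule = Rule(conditions={"quizzes": "high"}, target_attr="mark", target_value="pass")
measures = evaluate(rule, records)
print(measures)
```

A positive weighted relative accuracy indicates that the property of interest is more frequent inside the subgroup than in the data set as a whole, which is what makes the rule potentially interesting to the user.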
This work proposes the application of subgroup discovery to the usage data of the course management system Moodle at the University of Cordoba, Spain. Moodle is a free open source course management system designed to help educators create effective online learning communities. Moodle has a flexible array of course activities such as forums, chats, quizzes, resources, choices, surveys, or assignments. Our objective is to obtain rules which describe relationships between the student’s usage of the different activities and modules provided by this e-learning system and the final score obtained in the courses. These rules can help the teacher to discover beneficial or detrimental relationships between the use of web-based educational resources and the student’s learning.
We focus our attention on a subgroup discovery algorithm based on genetic algorithms (GAs) called SDIGA (Subgroup Discovery Iterative Genetic Algorithm). SDIGA is an evolutionary model for the extraction of fuzzy rules for the subgroup discovery task. This algorithm is described in detail in del Jesus et al. (2007). Its main characteristics are presented in this paper.
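As an illustration of what a fuzzy rule condition involves, the sketch below maps a numeric usage attribute to linguistic labels and computes the degree to which a student matches a rule antecedent. It assumes a uniform partition with triangular membership functions and the minimum as conjunction operator; the attribute names, ranges and labels are hypothetical, and the code does not reproduce SDIGA itself.

```python
from typing import Dict, List, Tuple


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Membership degree of x in a triangular fuzzy set (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x: float, lo: float, hi: float, labels: List[str]) -> Dict[str, float]:
    """Degrees of x in a uniform triangular partition of [lo, hi].

    Label i peaks at lo + i * step, so the first and last labels peak at the
    ends of the range (assumes at least two labels).
    """
    step = (hi - lo) / (len(labels) - 1)
    degrees = {}
    for i, label in enumerate(labels):
        peak = lo + i * step
        degrees[label] = triangular(x, peak - step, peak, peak + step)
    return degrees


def antecedent_degree(student: Dict[str, float],
                      conditions: Dict[str, str],
                      domains: Dict[str, Tuple[float, float]],
                      labels: List[str]) -> float:
    """Matching degree of a student with a conjunction of fuzzy conditions.

    The minimum is used as the conjunction operator (an assumption).
    """
    return min(
        fuzzify(student[attr], *domains[attr], labels)[label]
        for attr, label in conditions.items()
    )


LABELS = ["low", "medium", "high"]
# Hypothetical attributes and value ranges.
DOMAINS = {"quiz_attempts": (0.0, 10.0), "forum_posts": (0.0, 20.0)}

quiz_degrees = fuzzify(4.0, 0.0, 10.0, LABELS)
student = {"quiz_attempts": 4.0, "forum_posts": 15.0}
degree = antecedent_degree(
    student, {"quiz_attempts": "medium", "forum_posts": "high"}, DOMAINS, LABELS
)
print(quiz_degrees, degree)
```

With fuzzy labels, a student can belong partially to a subgroup, so rules such as "IF quiz attempts is medium AND forum posts is high" remain readable while avoiding hard cut-off points on the numeric attributes.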
We compare the results obtained by this algorithm with those obtained by two classical subgroup discovery methods: Apriori-SD (Kavsek & Lavrac, 2006) and CN2-SD (Lavrac, Kavsek, Flach, & Todorovski, 2004). Furthermore, we also use an algorithm for class association rule discovery, CBA (Classification Based on Association) (Liu, Hsu, & Ma, 1998). We present an experimental study in which SDIGA obtains the best results for our educational mining problem.
This paper is arranged in the following way: Section 2 describes the problem of discovering rules in e-learning and surveys some specific work in the area. Section 3 introduces the subgroup discovery task, the type of rules and quality measures used and the fuzzy evolutionary approach. Section 4 describes the e-learning case study, the experimentation carried out and the analysis of results. Finally, the conclusions and further research are outlined.
In this work we have described the application of subgroup discovery to e-learning, with the case study of the Moodle course management system. We have used real usage data collected from students at the University of Cordoba, Spain.
We have compared the results obtained by different algorithms for subgroup discovery, showing the suitability of evolutionary subgroup discovery for this problem. In particular, the SDIGA algorithm obtains a small number of rules which are highly understandable for the teacher. Compared with the other algorithms, it also obtains similar values in the rules’ quality measures and the best results in the accuracy of the rules.
Our final objective is to show the discovered rules and their measures to the teacher, so that they can decide on course improvements. We have shown how the teacher can use the information provided by these rules to make decisions concerning the courses’ activities and types of students in order to improve the course.