Using Data Warehouse and Data Mining Resources for Ongoing Assessment of Distance Learning

Daniela Resende Silva¹

E-mail: [email protected]

Marina Teresa Pires Vieira

E-mail: [email protected]

Department of Computer Sciences

UFSCar - Federal University of São Carlos

Rod. Washington Luís, Km 235

Caixa Postal 676

13565-905 / São Carlos - SP - Brazil

Phone/Fax:(55 16) 260-8232

¹ MPhil scholarship - CAPES/Brazil

0-473-08801-0/01 $20.00 © 2002 IEEE

Abstract

This paper discusses the use of Data Warehouse and Data Mining resources to aid in the assessment of distance learning of students enrolled in distance courses. Information considered relevant for the assessment of distance learning is presented, as is the modeling of a data warehouse to store this information and the MultiStar environment, which allows for knowledge discovery to be performed in the data warehouse.

1. Introduction

A variety of applications have benefited from the use of Data Warehousing technology [1, 2, 3] to support management analyses, which can be obtained through the use of Data Mining [4]. The joint use of Data Warehousing and Data Mining techniques is a trend in KDD (Knowledge Discovery in Data Warehousing) applications, referred to herein as KDW (Knowledge Discovery in Data Warehouse), since the data in a warehouse are better prepared for data mining.

This paper discusses how the data warehouse and data mining resources can be used for the assessment of distance learning and proposes the MultiStar environment for KDW to support this assessment.

Several studies focus on supporting student assessment, among them those of [5, 6] and [7]. Some studies apply data mining resources to Web log information [8, 9, 10, 11].

The work proposed herein presents an approach that differs from the existing ones for the ongoing assessment of distance learning, using some of the aspects relating to those utilized in the above cited studies.

Section 2 provides a set of information to guide the implementation of ongoing assessment of learning in distance learning environments, while Section 3 briefly discusses the modeling of a data warehouse based on the set of information proposed. Section 4 presents the implementation of this data warehouse using the MultiStar environment, and finally, Section 5 lists our conclusions to this paper.

2. Ongoing Assessment of Distance Learning

The teaching-learning process naturally produces information about the status of a student's activities in a course. The study of this information and the decisions based on this study characterize the ongoing assessment of the learner.

In most computational environments for distance learning involving some kind of student assessment, this is done by collecting the student's interactions with the environment (the student's actions). Analyzing the student's history of interactions can reveal how the manner in which he conducts his studies influences the extent to which he profits from the course. Today there is a wide range of environments available for distance courses. To identify how these environments assess the student's assimilation, a survey was made of the ones most frequently cited in the literature, as documented by [12]. Five mechanisms to support the ongoing assessment of distance learning were identified through this survey:

• tracking of the student's actions;
• redirection through evaluation;
• records of messages from lists;
• records of messages from forums;
• records of messages from chats.

The results of this survey show a tendency for these environments to support the tracking of some student activities in order to monitor the student's learning.

Most of these environments contain a small set of information that tracks the path the student has taken during the course. This set varies from one environment to another, according to criteria not divulged by their designers.

Although there is no standard set of requisites to assess the student's learning, there are clearly two types of information to guide the implementation of ongoing assessment of learning in distance learning environments:

• Information about the student's actions and communication [13]. This information can aid in understanding how the student's interactions with the environment and with other course participants influence his learning. Two types of student interaction can be identified:

  - Student-Person Interactions: those in which the student interacts with other course participants, such as the teacher, the assistant teacher or another student, through some communication mechanism. For these interactions, it is useful to know, for instance, the subject of the message and the mechanism employed (chat, email, list, forum, etc.).

  - Student-Material Interactions: those in which the student interacts with the didactic material (content pages, tests, exercises, etc.). For these interactions, it is useful to know, for example, how much time was spent on them, whether the interaction consisted of downloading or uploading, which discipline the material belongs to, and what link was used to access the material.

• Information about the student's activities in the course [8]. This kind of information, which depends on a rule established by the teacher, strongly influences the determination of whether or not the student has actually learned. Each activity proposed by the teacher may have a result: for instance, participation or not in a conference, the grade given for an assignment, and so on. This type of information depends on the activities proposed for the course and on the way the teacher has chosen to validate them, i.e., the criterion used to decide whether or not the student has carried them out.
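The two kinds of information described above could be captured as simple records. The following is a minimal sketch only: the class names, field names and the helper function are illustrative assumptions, not the schema used by the authors.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mechanism(Enum):
    """Communication mechanisms named in the text."""
    CHAT = "chat"
    EMAIL = "email"
    LIST = "list"
    FORUM = "forum"


class AccessType(Enum):
    DOWNLOAD = "download"
    UPLOAD = "upload"


@dataclass
class StudentPersonInteraction:
    student_id: str
    subject: str          # subject of the message
    mechanism: Mechanism  # communication mechanism employed


@dataclass
class StudentMaterialInteraction:
    student_id: str
    discipline: str       # discipline the material belongs to
    link: str             # link used to access the material
    access_type: AccessType
    seconds_spent: int    # time spent on the material


@dataclass
class ActivityResult:
    student_id: str
    activity: str
    accomplished: bool             # outcome of the teacher's validation rule
    grade: Optional[float] = None  # present only for graded activities


def total_time_on_material(events, student_id):
    """Sum the time one student spent on didactic material."""
    return sum(e.seconds_spent for e in events if e.student_id == student_id)
```

Keeping the three record types separate mirrors the distinction the text draws between communication, access to material, and teacher-validated activities.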

3. Ongoing Assessment of Distance Learning using Data Warehouse Resources

The relevant information for ongoing assessment of distance learning can be stored in a data warehouse to support management decisions. This study explores the use of a data warehouse with these characteristics for the application of data mining techniques, allowing patterns of student behavior to be identified and thereby supporting decision making in the ongoing assessment of the student.

In this work, the modeling of the data warehouse follows the fact constellation schema [2], incorporating generalization hierarchies for the fact or dimension tables of the data warehouse.

Figure 1 shows part of the data warehouse that was developed based on the information discussed in the previous section. The gray boxes in this figure represent fact tables, i.e., tables that store information about a subject, about which measures (or facts) are defined (highlighted in bold). The remaining boxes represent the dimension tables, which hold the values that determine the fact table measures. The representation of a fact table with its dimension tables is called a Star Schema. Parts A and B of Figure 1 represent two star schemas.

Figure 1. Fact constellation schema for the Activity and Personal Interaction.
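To make the star schema idea concrete, the sketch below builds one small star in an in-memory SQLite database: an Activity fact table surrounded by three of its dimensions. The table layout, column names, student names and the "dial-up" connection type are assumptions made for illustration; only the table names, the Accomplished? measure, the TypeOfConnection attribute and the ATA institution come from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to slice the measures.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name TEXT,
    type_of_connection TEXT
);
CREATE TABLE institution (institution_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER);

-- Fact table: one row per activity, holding dimension keys and the measure.
CREATE TABLE activity (
    student_id INTEGER REFERENCES student(student_id),
    institution_id INTEGER REFERENCES institution(institution_id),
    time_id INTEGER REFERENCES time_dim(time_id),
    accomplished INTEGER  -- the Accomplished? measure stored as 0 or 1
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ana", "superfast"), (2, "Rui", "dial-up")])
conn.execute("INSERT INTO institution VALUES (1, 'ATA')")
conn.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, 1999), (2, 2000)])
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 1), (1, 1, 2, 1), (2, 1, 1, 1), (2, 1, 2, 0)])

# Aggregate the measure along an attribute of the Student dimension.
rows = conn.execute("""
    SELECT s.type_of_connection, AVG(a.accomplished)
    FROM activity AS a
    JOIN student AS s ON s.student_id = a.student_id
    GROUP BY s.type_of_connection
    ORDER BY s.type_of_connection
""").fetchall()
print(rows)
```

The fact table carries only keys and measures, so any dimension attribute can be used for grouping by joining it in.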


Information about the activities developed by the

student during the course can be stored in the data

warehouse, as illustrated in part A of Figure 1, while

information about the student-person interactions can

follow the model shown in part B of Figure 1.

The PersonalInteraction fact table shown in part B of Figure 1 is specialized into four different interactions: InteractionViaChat, InteractionViaEmail, InteractionViaList and InteractionViaForum. The semantics of this hierarchical structure is reflected in the measures and dimensions of the specialized facts: these fact tables contain all the dimensions and measures of PersonalInteraction. In analytical terms, this makes it possible to examine, in each specialized fact, the dimensions and measures common to all the personal interactions as well as the information specific to each interaction (via chat, via email, via list or via forum), considering the instances pertinent to the fact table in question. The PersonalInteraction fact table itself is used when one wishes to analyze measures and attributes common to all the types of personal interaction.
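The generalization hierarchy can be pictured with class inheritance, where each specialized fact carries everything the parent defines plus its own attributes. This is only an analogy under stated assumptions: the fields chat_room and thread_title are hypothetical, since the paper does not list the attributes specific to each specialization.

```python
from dataclasses import dataclass, fields


@dataclass
class PersonalInteraction:
    """Parent fact: attributes common to every personal interaction."""
    student_id: int
    type_of_interaction: str
    reply: bool


@dataclass
class InteractionViaChat(PersonalInteraction):
    chat_room: str      # hypothetical chat-specific attribute


@dataclass
class InteractionViaForum(PersonalInteraction):
    thread_title: str   # hypothetical forum-specific attribute


def attribute_names(fact_class):
    """List the attributes of a fact, inherited ones first."""
    return [f.name for f in fields(fact_class)]


def common_view(facts):
    """Project facts of any specialization onto the parent's attributes."""
    names = attribute_names(PersonalInteraction)
    return [{name: getattr(fact, name) for name in names} for fact in facts]
```

Calling attribute_names on a child class shows the inherited attributes alongside the specific one, while common_view corresponds to analyzing all interactions through the parent fact.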

An analysis of Figure 1 reveals that the stars of the Activity and PersonalInteraction facts have common dimensions: Student, Course, Discipline, Institution, Group and Time. Joining these two stars forms a constellation with two facts that share six dimensions. This union is advantageous because, in addition to avoiding the duplication of data, in practice it means that the measures and dimensions of these two facts can be analysed jointly, crossing information about the interactions and activities developed by the students. One kind of analysis that can be made, for example, is to check whether the student's interactions influence performance in the course activities.
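A joint analysis over two facts that share a dimension might look like the sketch below, which summarizes each fact per student and then combines the summaries through the shared Student dimension. The column names and sample rows are assumptions for illustration, and this is not how MultiStar itself is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE activity (student_id INTEGER, accomplished INTEGER);
CREATE TABLE personal_interaction (student_id INTEGER, type_of_interaction TEXT);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ana"), (2, "Rui")])
conn.executemany("INSERT INTO activity VALUES (?, ?)",
                 [(1, 1), (1, 1), (2, 1), (2, 0)])
conn.executemany("INSERT INTO personal_interaction VALUES (?, ?)",
                 [(1, "chat"), (1, "chat"), (1, "forum")])

# Aggregate each fact on its own first, so that joining the two facts
# does not multiply rows, then align them on the shared dimension.
rows = conn.execute("""
    WITH act AS (
        SELECT student_id, AVG(accomplished) AS share_done
        FROM activity GROUP BY student_id
    ),
    inter AS (
        SELECT student_id, COUNT(*) AS n_interactions
        FROM personal_interaction GROUP BY student_id
    )
    SELECT s.name, COALESCE(inter.n_interactions, 0), act.share_done
    FROM student AS s
    LEFT JOIN act ON act.student_id = s.student_id
    LEFT JOIN inter ON inter.student_id = s.student_id
    ORDER BY s.student_id
""").fetchall()
print(rows)
```

Each output row places a student's interaction count next to the share of activities accomplished, which is the kind of crossed view the shared dimensions make possible.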

Figure 2 illustrates the fact constellation schema of the data warehouse developed to assess distance learning. A fact constellation is a collection of stars.

In addition to the information about activities and personal interactions, this data warehouse contains the following information:

• the student's interaction (access) with the didactic material (centered on the StudentMaterialInteraction fact table), involving the attributes DurationOfTheAccess, LinkOfTheMaterialAccessed, TypeOfAccess (download or upload), etc.;

• the tests the student has taken (centered on the Test fact table), with the attributes Grade, NumberOfIncorrectlyAnsweredQuestions, etc.; and

• whether the student has passed the tests upon conclusion of a discipline (centered on the Approval fact table), with the attributes Dropped-out?, Passed?, TemporarilySuspended?, etc.

For legibility, Figure 2 groups the Student, Course, Discipline, Institution, Time and Group dimensions shared by all the facts into one entity, to avoid the clutter that individual links would cause.

The data warehouse in Figure 2 shows various indirect relationships among the fact tables. This opens up a wide range of possibilities when combining measures and dimensions to carry out analyses, e.g.,

• analyze whether there is a relation between a student's score, his personal interactions and his accessing of the didactic material (involving the Test, PersonalInteraction and StudentMaterialInteraction facts);

• verify the influence of factors such as communication and study on learning (involving the PersonalInteraction and StudentMaterialInteraction facts);

• discover if the type of connection a student possesses influences the number of times he accesses the environment (involving the Student dimension and the StudentMaterialInteraction fact);

• find activities that are more effective in given courses, age groups, level of schooling, etc. (involving the Course and Student dimensions and the Activity fact).

These analyses can be made using the environment for Knowledge Discovery in Data Warehouses (KDW) described in the following section.

Figure 2. Fact constellation for ongoing assessment.

4. A KDW Application for Assessment of Distance Learning

Commercial tools can be used to carry out management analyses in the data warehouse presented in the previous section; however, they support only simple analyses, i.e., those using one fact and its dimension tables, e.g., identifying the profiles of students more prone to dropping out of a course (involving the Student dimension table and the Approval fact table).

However, there are important analyses that can be performed in this warehouse which require a comparison of the different aspects of the student's learning process. Examples of this type of analysis were given in the previous section.

To support this type of broad analysis, i.e., those involving more than one fact (star), an environment called MultiStar was developed for knowledge discovery [14]. This environment allows the selection of the information to which data mining tasks will be applied, providing resources for the recognition of fact constellations and the treatment of generalization hierarchies. By recognizing fact constellations, MultiStar allows for analyses involving facts that belong to the same constellation, i.e., facts that share dimensions. The treatment of generalization hierarchies involving the relationship of inheritance among the fact or dimension tables of a data warehouse does not require the user to understand the concept on which it is based.

In the MultiStar environment, for a generalization hierarchy between fact or dimension tables, characteristics inherited from the parent tables are displayed automatically in the child tables, making the hierarchies clear to the user. With regard to the fact constellations, when a dimension or measure is selected, the MultiStar environment allows for the selection of only the fact tables that are related directly or indirectly with the selected information.

Figures 3 and 4 exemplify the use of the MultiStar environment for knowledge discovery in the data warehouse in Figure 2. These figures portray how the selection and mining of information in this environment can be performed. Field 1 of Figure 3 represents the fact tables of Figure 2 which, upon being expanded (fields 2, 3 and 4), show the attributes that represent the subjects under analysis in the fact table (called measures or facts) and information about the related dimension tables.

Figure 3. MultiStar: selecting information.

The purpose of the data selection process illustrated in Figure 3 is to support an analysis of the influence of the chat interactions on the student's activities. Thus, a selection was made in the data warehouse of the Student dimension common to the Activity (field 2), Approval (field 3) and PersonalInteraction (field 4) fact tables, the TypeOfInteraction and Reply? measures in the PersonalInteraction fact table, the Passed? measure of the Approval fact table, and the Accomplished? measure of the Activity fact table. This analysis was restricted to students of the ATA Institution during the period of 1999 to 2001. This led to the creation of filters (field 5) for the attribute Name of the dimension Institution (field 6) and for the attribute Year of the dimension Time (field 7), both of which are attributes of dimensions common to the three fact tables.

The information selected is stored in a data cube² called Interactions and Activities, which contains all the attributes of the Student dimension table (as shown in Figure 1) and the measures cited below.

² A data cube [4] is a structure composed of dimensions and facts organized to facilitate analyses of the data.

Figure 4. MultiStar: mining data.

Once the data has been selected, MultiStar provides resources for the application of data mining tasks so that patterns can be extracted based on those data. Figure 4 shows the interface for the application of data mining on the data selected in Figure 3.

In Field 1 of Figure 4, the user selects the cube to be analyzed (the Interactions and Activities cube was selected here). Field 2 shows the attributes of the selected cube (dimensions and measures). The user must choose one attribute from each dimension of the cube (the attribute TypeOfConnection from the Student dimension table was selected). These attributes, together with the measures of the cube (Accomplished? from the Activity fact table, Passed? from the Approval fact table, and TypeOfInteraction and Reply? from the PersonalInteraction fact table, in our example), compose a view to be mined. Field 5 shows the cube filter selected.

A mining task is selected in Field 3, and the parameters for this task are defined in Field 4. The data mining tasks available in the environment are Association [15], Classification [16] and Clustering [17]. Each of these tasks allows the data to be analyzed from a different standpoint.

The data mining task chosen was Classification, with the purpose of classifying the student according to the measure Passed?.

When this mining task is performed, MultiStar textually presents the patterns it finds. The patterns resulting from the classification task are expressed through rules, as shown in the example below:

    IF   Accomplished? = yes, and
         TypeOfConnection = superfast, and
         TypeOfInteraction = chat, and
         Reply? = yes
    THEN Passed? = yes

The number of cases in which a rule occurs and the degree of reliability of the rule are indicated for each rule found.

5. Conclusions

This paper discusses the relevant information for ongoing assessment of learning in computational distance learning environments, proposing a solution to aid in this ongoing assessment through the use of data warehouse and data mining resources. Modeling of a data warehouse was presented to illustrate the information identified, as well as the MultiStar environment, which allows for knowledge discovery in this data warehouse.

The authors intend to present the results of the application of data mining tasks in the next version of the environment in a form that is more intuitive for the user, using graphic resources.

An intelligent tutor can also be developed to automatically guide the student in his learning process, based on the results of the data mining tasks applied to the data warehouse discussed herein.

6. References

[1] W.H. Inmon, Building the Data Warehouse, John Wiley & Sons, 2nd edition, 1996

[2] R. Kimball, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley Professio, 1996

[3] R. Kimball, L. Reeves, M. Ross and W. Thornthwaite, The Data Warehouse Lifecycle Toolkit, Wiley Computer Publishing, 1998

[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 1st edition, New York: Morgan Kaufmann, 2000

[5] K. Nurmela, E. Lehtinen, T. Palonen, Evaluating CSCL Log Files by Social Network Analysis, In: Computer Support for Collaborative Learning, Stanford, USA, 1999. Proceedings. p. 434-441

[6] M. Rahkila and M. Karjalainen, Evaluation of Learning in Computer Based Education Using Log Systems. In: ASEE/IEEE Frontiers in Education Conference, 29., San Juan, Puerto Rico, 1999, Proceedings. p. 16-21

[7] S.L. Tanimoto, Towards an Ontology for Alternative Assessment in Education. Meeting of IEEE Learning Technology Standards Committee, Pittsburgh, USA, 1998

[8] J. Pei, J. Han, B. Mortazavi-Asl and H. Zhu, Mining Access Patterns Efficiently from Web Logs, In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, 2000, Proceedings. p. 396-407

[9] O.R. Zaiane, M. Xin and J. Han, Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs, In: Advances in Digital Libraries Conference, Santa Barbara, USA, 1998, Proceedings. p. 19-29

[11] B. Mortazavi-Asl, Discovering and Mining User Web-Page Traversal Patterns, MPhil. Dissertation, Simon Fraser University, 1999, p. 93

[12] D.R. Silva and M.T.P. Vieira, An Ongoing Assessment Model in Distance Learning, In: Proceedings of Internet and Multimedia Systems and Applications, Honolulu, USA, 2001

[13] C. Vrasidas and M.S. McIsaac, Factors Influencing Interaction in an Online Course, The American Journal of Distance Education, v. 13, n. 3, 1999

[14] D.R. Silva, A Tool for Knowledge Discovery using Data Warehousing and its Application on the Ongoing Assessment of Distance Learning. MPhil. Dissertation, Department of Computer Science, UFSCar, São Carlos, Brazil, 2002, 108p. (In Portuguese)

[15] R. Agrawal, T. Imielinski and A. Swami, Mining Associations between Sets of Items in Massive Databases. In: ACM SIGMOD International Conference on the Management of Data. New York, USA, 1993. Proceedings. NY: ACM Press, 1993, p. 207-216

[16] J.R. Quinlan, Induction of Decision Trees. Machine Learning, 1:81-106, 1986

[17] P. Cheeseman and J. Stutz, Bayesian Classification (AutoClass): Theory and Results, In: Advances in Knowledge Discovery in Databases, 1995. 10., Proceedings. AAAI Press, p. 61-83, 1995
