7/29/2019 Using Data Warehouse and Data Mining Resources for Ongoing Assessment of Distance Learning
1/6
Using Data Warehouse and Data Mining Resources for
Ongoing Assessment of Distance Learning
Daniela Resende Silva (1)
E-mail: [email protected]
Marina Teresa Pires Vieira
E-mail: [email protected]
Department of Computer Sciences
UFSCar - Federal University of São Carlos
Rod. Washington Luís, Km 235
Caixa Postal 676
13565-905 / São Carlos - SP - Brazil
Phone/Fax:(55 16) 260-8232
1 MPhil scholarship-CAPES/Brazil
0-473-08801-0/01 $20.00 © 2002 IEEE

Abstract

This paper discusses the use of Data Warehouse and Data Mining
resources to aid in the assessment of distance learning of students
enrolled in distance courses. Information considered relevant for the
assessment of distance learning is presented, as is the modeling of a
data warehouse to store this information and the MultiStar
environment, which allows knowledge discovery to be performed in the
data warehouse.

1. Introduction

A variety of applications have benefited from the use of Data
Warehousing technology [1, 2, 3] to support management analyses,
which can be obtained through the use of Data Mining [4]. The joint
use of Data Warehousing and Data Mining techniques is a trend in KDD
(Knowledge Discovery in Data Warehousing) applications (referred to
herein as KDW, Knowledge Discovery in Data Warehouse), since the data
in a warehouse are better prepared for data mining.

Several studies focus on supporting student assessment, among them
those of [5, 6] and [7]. Some studies apply data mining resources to
Web log information [8, 9, 10, 11].

The work proposed herein presents an approach that differs from the
existing ones for the ongoing assessment of distance learning, using
some of the aspects relating to those utilized in the above-cited
studies.

This paper discusses how data warehouse and data mining resources can
be used for the assessment of distance learning and proposes the
MultiStar environment for KDW to support this assessment.

Section 2 provides a set of information to guide the implementation
of ongoing assessment of learning in distance learning environments,
while Section 3 briefly discusses the modeling of a data warehouse
based on the set of information proposed. Section 4 presents the
implementation of this data warehouse using the MultiStar
environment, and finally, Section 5 lists our conclusions.

2. Ongoing Assessment of Distance Learning

The teaching-learning process naturally produces information about
the status of a student's activities in a course. The study of this
information and the decisions based on this study characterize the
ongoing assessment of the learner.

In most computational environments for distance learning involving
some kind of student assessment, this is done by collecting the
student's interactions with the environment (the student's actions).
Analyzing the student's history of interactions can reveal how the
manner in which he conducts his studies influences the extent to
which he profits from the course.

Today there is a wide range of environments available for distance
courses. To identify how these environments assess the student's
assimilation, a survey was made of the ones most frequently cited in
the literature, as documented by [12]. Five mechanisms to support the
ongoing assessment
of distance learning were identified through this survey:
- tracking of the student's actions;
- redirection through evaluation;
- records of messages from lists;
- records of messages from forums;
- records of messages from chats.

The results of this survey show a tendency for these environments to
support the tracking of some student activities in order to monitor
the student's learning.
Most of these environments contain a small set of
information that tracks the path the student has taken
during the course. This set varies from one environment to another,
according to criteria not divulged by their designers.
Although there is no standard set of requisites to assess the
student's learning, there are clearly two types of information to
guide the implementation of ongoing assessment of learning in
distance learning environments:

- Information about the student's actions and communication [13].
  This information can aid in understanding how the student's
  interactions with the environment and with other course
  participants influence his learning. Two types of student
  interaction can be identified:

  - Student-Person Interactions: those in which the student interacts
    with other course participants, such as the teacher, the
    assistant teacher or another student, through some communication
    mechanism. With regard to these interactions, it is interesting
    to know, for instance, the subject of the message and the
    mechanism (chat, email, list, forum, etc.) employed.

  - Student-Material Interactions: those in which the student
    interacts with the didactic material (content pages, tests,
    exercises, etc.). About these interactions, it is interesting to
    know, for example, how much time was spent on them, whether the
    interaction consisted of downloading or uploading, which
    discipline the material belongs to, what link was used to access
    the material, etc.

- Information about the student's activities in the course [8]. This
  kind of information, which depends on a rule established by the
  teacher, strongly influences the determination of whether or not
  the student has actually learned. Each activity proposed by the
  teacher may have a result: for instance, participation or not in a
  conference, the grade given for an assignment, and so on. This type
  of information depends on the activities proposed for the course
  and the way the teacher has chosen to validate them, i.e., the
  criterion used to decide whether or not the student has carried
  them out.
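
As an illustration only, the two types of information described above
could be recorded as follows. This is a minimal Python sketch: the
class names, field names and sample values are assumptions made for
this example and are not taken from any particular distance learning
environment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PersonInteraction:
    """Student-person interaction (hypothetical record layout)."""
    student: str
    mechanism: str  # chat, email, list or forum
    subject: str    # subject of the message


@dataclass
class MaterialInteraction:
    """Student-material interaction (hypothetical record layout)."""
    student: str
    discipline: str
    link: str
    access_type: str  # "download" or "upload"
    seconds_spent: int


@dataclass
class ActivityResult:
    """Result of a course activity, validated by the teacher's rule."""
    student: str
    activity: str
    accomplished: bool


def time_per_student(interactions):
    """Total time (in seconds) each student spent on the material."""
    totals = defaultdict(int)
    for item in interactions:
        totals[item.student] += item.seconds_spent
    return dict(totals)


# Invented sample log of student-material interactions.
log = [
    MaterialInteraction("ana", "Databases", "/db/unit1", "download", 300),
    MaterialInteraction("ana", "Databases", "/db/test1", "upload", 120),
    MaterialInteraction("rui", "Databases", "/db/unit1", "download", 60),
]
totals = time_per_student(log)
print(totals)
```

Records of this kind are the raw material that the data warehouse
discussed in the next section would organize into facts and
dimensions.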
3. Ongoing Assessment of Distance Learning
using Data Warehouse Resources
The relevant information for ongoing assessment of distance learning can be stored in a data warehouse to
support management decisions. This study explores the
use of a data warehouse with these characteristics for the
application of data mining techniques, allowing for
patterns of student behavior to be identified, thereby
favoring decision making for ongoing assessment of the
student.
In this work, the modeling of the data warehouse
follows the fact constellation schema [2], incorporating
generalization hierarchies for the fact or dimension tables of
the data warehouse.
Figure 1 shows part of the data warehouse that was developed based on
the information discussed in the previous section. The gray boxes in
this figure represent fact tables, i.e., tables that store
information about a subject, about which measures (or facts) are
defined (highlighted in bold). The remaining boxes represent the
dimension tables, which store the values that determine the fact
table measures. The representation of a fact table with its dimension
tables is called a star schema. Parts A and B of Figure 1 represent
two star schemas.
Figure 1. Fact constellation schema for the Activity
and Personal Interaction.
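
To make the star schema concrete, the following minimal sketch builds
one fact table with three dimension tables in an in-memory SQLite
database and aggregates a measure by a dimension attribute. The
column names, keys and sample rows are assumptions chosen for
illustration; they are not the actual schema of Figure 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (attribute names are illustrative assumptions)
    CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT,
                          type_of_connection TEXT);
    CREATE TABLE Course (course_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE "Time" (time_id INTEGER PRIMARY KEY, year INTEGER);

    -- Fact table: one row per activity result, keyed by its dimensions
    CREATE TABLE Activity (
        student_id   INTEGER REFERENCES Student(student_id),
        course_id    INTEGER REFERENCES Course(course_id),
        time_id      INTEGER REFERENCES "Time"(time_id),
        accomplished INTEGER  -- measure: 1 = yes, 0 = no
    );

    INSERT INTO Student VALUES (1, 'Ana', 'superfast'),
                               (2, 'Rui', 'dial-up');
    INSERT INTO Course VALUES (1, 'Databases');
    INSERT INTO "Time" VALUES (1, 2000), (2, 2001);
    INSERT INTO Activity VALUES (1, 1, 1, 1), (1, 1, 2, 1),
                                (2, 1, 1, 0), (2, 1, 2, 1);
""")

# Analyze the measure of the fact table by an attribute of a dimension:
# the share of accomplished activities per type of connection.
rows = conn.execute("""
    SELECT s.type_of_connection, AVG(a.accomplished)
    FROM Activity a
    JOIN Student s ON s.student_id = a.student_id
    GROUP BY s.type_of_connection
    ORDER BY s.type_of_connection
""").fetchall()
print(rows)
```

The fact table holds only foreign keys and measures, while the
descriptive attributes used for grouping and filtering live in the
dimension tables.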
Information about the activities developed by the
student during the course can be stored in the data
warehouse, as illustrated in part A of Figure 1, while
information about the student-person interactions can
follow the model shown in part B of Figure 1.
The PersonalInteraction fact table shown in part B of Figure 1 is
specialized into four different interactions: InteractionViaChat,
InteractionViaEmail, InteractionViaList and InteractionViaForum. The
semantics of this hierarchical structure is translated into the
measures and dimensions of the specialized facts. These fact tables
contain all the dimensions and measures of PersonalInteraction. In
analytical terms, this represents the possibility of examining, in
each fact of the specialization, the dimensions and measures common
to all the personal interactions as well as the specific information
about each interaction (via chat, via email, via list or via forum),
considering the instances pertinent to the fact table in question.
For analytical purposes, the PersonalInteraction fact table is used
when one wishes to analyze measures and attributes common to all the
types of personal interaction.
An analysis of Figure 1 reveals that the stars of the
Activity and PersonalInteraction facts have common
dimensions: Student, Course, Discipline, Institution,
Group and Time. Joining these two stars forms a
constellation with two facts that share six dimensions.
This union is advantageous because, in addition to
avoiding the duplication of data, in practice it means that
the measures and dimensions of these two facts can be
analysed jointly, crossing information about the
interactions and activities developed by the students. One
kind of analysis that can be made, for example, is to check
whether the student's interactions influence his performance in
the course activities.
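
A joint analysis of this kind can be sketched as follows, again with
SQLite. The table layout and the data are assumptions made for
illustration. Each fact is aggregated per student before the two are
combined through the shared Student dimension; joining the raw fact
rows directly would multiply rows and distort the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shared dimension and two fact tables (illustrative layout)
    CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Activity (student_id INTEGER, accomplished INTEGER);
    CREATE TABLE PersonalInteraction (student_id INTEGER,
                                      type_of_interaction TEXT);

    INSERT INTO Student VALUES (1, 'Ana'), (2, 'Rui'), (3, 'Eva');
    INSERT INTO Activity VALUES (1, 1), (1, 1), (2, 0), (2, 1), (3, 0);
    INSERT INTO PersonalInteraction VALUES
        (1, 'chat'), (1, 'chat'), (1, 'email'), (2, 'forum');
""")

rows = conn.execute("""
    WITH act AS (
        -- Activity fact summarized per student
        SELECT student_id, AVG(accomplished) AS completion_rate
        FROM Activity GROUP BY student_id
    ),
    inter AS (
        -- PersonalInteraction fact summarized per student
        SELECT student_id, COUNT(*) AS n_interactions
        FROM PersonalInteraction GROUP BY student_id
    )
    SELECT s.name, COALESCE(i.n_interactions, 0), a.completion_rate
    FROM Student s
    JOIN act a ON a.student_id = s.student_id
    LEFT JOIN inter i ON i.student_id = s.student_id
    ORDER BY s.name
""").fetchall()
print(rows)
```

Each output row pairs a student's number of personal interactions
with his share of accomplished activities, which is the kind of
crossed information the constellation makes available.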
Figure 2 illustrates the fact constellation schema of the data
warehouse developed to assess distance learning. A fact constellation
is a collection of stars.
In addition to the information about activities and
personal interactions, this data warehouse contains the
following information:
- the student's interaction (access) with the didactic material
  (centered on the StudentMaterialInteraction fact table), involving
  the attributes DurationOfTheAccess, LinkOfTheMaterialAccessed,
  TypeOfAccess (download or upload), etc.;
- the tests the student has taken (centered on the Test fact table),
  with the attributes Grade, NumberOfIncorrectlyAnsweredQuestions,
  etc.; and
- whether the student has passed the tests upon conclusion of a
  discipline (centered on the Approval fact table), with the
  attributes Dropped-out?, Passed?, TemporarilySuspended?, etc.
For purposes of legibility, Figure 2 groups the Student, Course,
Discipline, Institution, Time and Group dimensions shared by all the
facts into one entity, to avoid the visual clutter that linking each
of them to every fact would cause.
Figure 2. Fact constellation for ongoing assessment.

The data warehouse in Figure 2 shows various indirect relationships
among the fact tables. This opens up a wide range of possibilities
when combining measures and dimensions to carry out analyses, e.g.,

- analyze whether there is a relation between a student's score, his
  personal interactions and his accessing of the didactic material
  (involving the Test, PersonalInteraction and
  StudentMaterialInteraction facts);
- verify the influence of factors such as communication and study on
  learning (involving the PersonalInteraction and
  StudentMaterialInteraction facts);
- discover if the type of connection a student possesses influences
  the number of times he accesses the environment (involving the
  Student dimension and the StudentMaterialInteraction fact);
- find activities that are more effective in given courses, age
  groups, levels of schooling, etc. (involving the Course and Student
  dimensions and the Activity fact).

These analyses can be made using the environment for Knowledge
Discovery in Data Warehouses (KDW) described in the following
section.

4. A KDW Application for Assessment of
Distance Learning

Commercial tools can be used to carry out management analyses in the
data warehouse presented in the previous section; however, they
support only simple analyses, i.e., those using only one fact and its
dimension tables, e.g., identifying the profiles of students more
prone to dropping out of a course (involving the Student dimension
table and the Approval fact table).

However, there are important analyses that can be performed in this
warehouse which require a comparison of the different aspects of the
student's learning process. Examples of this type of analysis were
given in the previous section.

To support this type of broad analysis, i.e., analyses involving more
than one fact (star), an environment called MultiStar was developed
for knowledge discovery [14]. This environment allows the selection
of the information to which data mining tasks will be applied,
providing resources for the recognition of fact constellations and
the treatment of generalization hierarchies. By recognizing fact
constellations, MultiStar allows for analyses involving facts that
belong to the same constellation, i.e., facts that share dimensions.
The treatment of generalization hierarchies involving the
relationship of inheritance among the fact or dimension tables of a
data warehouse does not require the user to understand the concept on
which it is based.

In the MultiStar environment, for a generalization hierarchy between
fact or dimension tables, characteristics inherited from the parent
tables are displayed automatically in the child tables, making the
hierarchies clear to the user. With regard to the fact
constellations, when a dimension or measure is selected, the
MultiStar environment allows for the selection of only the fact
tables that are related directly or indirectly with the selected
information.

Figures 3 and 4 exemplify the use of the MultiStar environment for
knowledge discovery in the data warehouse in Figure 2. These figures
portray how the selection and mining of information in this
environment can be performed. Field 1 of Figure 3 represents the fact
tables of Figure 2 which, upon being expanded (fields 2, 3 and 4),
show the attributes that represent the subjects subjected to analysis
in the fact table (called measures or facts) and information about
the related dimension tables.

Figure 3. MultiStar: selecting information.

The purpose of the data selection process illustrated in Figure 3 is
to support an analysis of the influence of the chat interactions on
the student's activities. Thus, a selection was made in the data
warehouse of the Student dimension common to the Activity (field 2),
Approval (field 3) and PersonalInteraction (field 4) fact tables, the
TypeOfInteraction and Reply? measures in the PersonalInteraction fact
table, the Passed? measure of the Approval fact table, and the
Accomplished? measure of the Activity fact table. This analysis was
restricted to students of the ATA Institution during the period of
1999 to 2001. This led to the creation of filters (field 5) for the
attribute Name of the dimension Institution (field 6) and for the
attribute Year of the dimension Time (field 7), both of which are
attributes of dimensions common to the three fact tables.

The information selected is stored in a data cube (2) called
Interactions and Activities, which contains all the attributes of the
Student dimension table (as shown in Figure 1) and the measures
selected (Accomplished?, Passed?, TypeOfInteraction and Reply?).

2 A data cube [4] is a structure composed of dimensions and facts
organized to facilitate analyses of the data.

Figure 4. MultiStar: mining data.

Once the data have been selected, MultiStar provides resources for
the application of data mining tasks so that patterns can be
extracted based on those data. Figure 4 shows the interface for the
application of data mining on the data selected in Figure 3.

In Field 1 of Figure 4, the user selects the cube to be analyzed (the
Interactions and Activities cube was selected here). Field 2 shows
the attributes of the selected cube (dimensions and measures). The
user must choose one attribute from each dimension of the cube (the
attribute TypeOfConnection from the Student dimension table was
selected). These attributes together with the measures of the cube
(Accomplished? from the Activity fact table, Passed? from the
Approval fact table, and TypeOfInteraction and Reply? from the
PersonalInteraction fact table, in our example) compose a view to be
mined. Field 5 shows the cube filter selected.

A mining task is selected in Field 3, and the parameters for this
task are defined in Field 4. The data mining tasks available in the
environment are Association [15], Classification [16] and Clustering
[17]. Each of these tasks allows the data to be analyzed from a
different standpoint.

The data mining task chosen was Classification, with the purpose of
classifying the student according to the measure Passed?.

When this mining task is performed, MultiStar textually presents the
patterns it finds. The patterns resulting from the classification
task are expressed through rules, as shown in the example below:

IF Accomplished? = yes, and
   TypeOfConnection = superfast, and
   TypeOfInteraction = chat, and
   Reply? = yes
THEN Passed? = yes

The number of cases in which a rule occurs and the degree of
reliability of the rule are indicated for each rule found.

5. Conclusions

This paper discusses the relevant information for ongoing assessment
of learning in computational distance learning environments,
proposing a solution to aid in this ongoing assessment through the
use of data warehouse and data mining resources. Modeling of a data
warehouse was presented to illustrate the information identified, as
well as the MultiStar environment, which allows for knowledge
discovery in this data warehouse.

The authors intend to present the results of the application of data
mining tasks in the next version of the environment in a more
user-intuitive form, using graphic resources.

An intelligent tutor can also be developed to automatically guide the
student in his learning process, based on the results of the data
mining tasks applied to the data warehouse discussed herein.

6. References
[1] W.H. Inmon, Building the Data Warehouse, John Wiley & Sons, 2nd
edition, 1996.
[2] R. Kimball, The Data Warehouse Toolkit: Practical Techniques for
Building Dimensional Data Warehouses, John Wiley Professio, 1996.
[3] R. Kimball, L. Reeves, M. Ross and W. Thornthwaite, The Data
Warehouse Lifecycle Toolkit, Wiley Computer Publishing, 1998.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 1st
edition, New York: Morgan Kaufmann, 2000.
[5] K. Nurmela, E. Lehtinen and T. Palonen, Evaluating CSCL Log Files
by Social Network Analysis, In: Computer Support for Collaborative
Learning, Stanford, USA, 1999. Proceedings. p. 434-441.
[6] M. Rahkila and M. Karjalainen, Evaluation of Learning in Computer
Based Education Using Log Systems, In: ASEE/IEEE Frontiers in
Education Conference, 29., San Juan, Puerto Rico, 1999. Proceedings.
p. 16-21.
[7] S.L. Tanimoto, Towards an Ontology for Alternative Assessment in
Education, Meeting of IEEE Learning Technology Standards Committee,
Pittsburgh, USA, 1998.
[8] J. Pei, J. Han, B. Mortazavi-Asl and H. Zhu, Mining Access
Patterns Efficiently from Web Logs, In: Pacific-Asia Conference on
Knowledge Discovery and Data Mining, Kyoto, Japan, 2000. Proceedings.
p. 396-407.
[9] O.R. Zaiane, M. Xin and J. Han, Discovering Web Access Patterns
and Trends by Applying OLAP and Data Mining Technology on Web Logs,
In: Advances in Digital Libraries Conference, Santa Barbara, USA,
1998. Proceedings. p. 19-29.
[11] B. Mortazavi-Asl, Discovering and Mining User Web-Page Traversal
Patterns, MPhil. Dissertation, Simon Fraser University, 1999, p. 93.
[12] D.R. Silva and M.T.P. Vieira, An Ongoing Assessment Model in
Distance Learning, In: Proceedings of Internet and Multimedia Systems
and Applications, Honolulu, USA, 2001.
[13] C. Vrasidas and M.S. McIsaac, Factors Influencing Interaction in
an Online Course, The American Journal of Distance Education, v. 13,
n. 3, 1999.
[14] D.R. Silva, A Tool for Knowledge Discovery using Data
Warehousing and its Application on the Ongoing Assessment of Distance
Learning, MPhil. Dissertation, Department of Computer Science,
UFSCar, São Carlos, Brazil, 2002, 108p. (In Portuguese)
[15] R. Agrawal, T. Imielinski and A. Swami, Mining Association Rules
between Sets of Items in Large Databases, In: ACM SIGMOD
International Conference on the Management of Data, New York, USA,
1993. Proceedings. NY: ACM Press, 1993, p. 207-216.
[16] J.R. Quinlan, Induction of Decision Trees, Machine Learning,
1:81-106, 1986.
[17] P. Cheeseman and J. Stutz, Bayesian Classification (AutoClass):
Theory and Results, In: Advances in Knowledge Discovery in Databases,
1995. 10., Proceedings. AAAI Press, p. 61-83, 1995.