

    Using Data Warehouse and Data Mining Resources for Ongoing Assessment of Distance Learning

    Daniela Resende Silva¹


    Marina Teresa Pires Vieira


    Department of Computer Sciences

    UFSCar - Federal University of São Carlos

    Rod. Washington Luís, Km 235

    Caixa Postal 676

    13565-905 São Carlos, SP, Brazil

    Phone/Fax: (55 16) 260-8232

    ¹ MPhil scholarship, CAPES/Brazil

    0-473-08801-0/01 $20.00 © 2002 IEEE

    Abstract

    This paper discusses the use of Data Warehouse and Data Mining resources to aid in the assessment of the distance learning of students enrolled in distance courses. It presents the information considered relevant for the assessment of distance learning, the modeling of a data warehouse to store this information, and the MultiStar environment, which allows knowledge discovery to be performed in the data warehouse.

    1. Introduction

    A variety of applications have benefited from the use of Data Warehousing technology [1, 2, 3] to support management analyses, which can be obtained through the use of Data Mining [4]. The joint use of Data Warehousing and Data Mining techniques is a trend in KDD applications for data warehousing (referred to herein as KDW, Knowledge Discovery in Data Warehouse), since the data in a warehouse are better prepared for data mining.

    This paper discusses how data warehouse and data mining resources can be used for the assessment of distance learning and proposes the MultiStar environment for KDW to support this assessment.

    Several studies focus on supporting student assessment, among them those of [5, 6] and [7]. Some studies apply data mining resources to Web log information [8, 9, 10, 11]. The work proposed herein presents an approach to the ongoing assessment of distance learning that differs from these existing ones, while drawing on some of the aspects used in the studies cited above.

    Section 2 provides a set of information to guide the implementation of ongoing assessment of learning in distance learning environments, while Section 3 briefly discusses the modeling of a data warehouse based on the proposed set of information. Section 4 presents the implementation of this data warehouse using the MultiStar environment, and Section 5 presents our conclusions.

    2. Ongoing Assessment of Distance Learning

    The teaching-learning process naturally produces information about the status of a student's activities in a course. The study of this information, and the decisions based on this study, characterize the ongoing assessment of the learner.

    In most computational environments for distance learning that involve some kind of student assessment, this is done by collecting the student's interactions with the environment (the student's actions). Analyzing the student's history of interactions can reveal how the manner in which he conducts his studies influences the extent to which he profits from the course.

    Today there is a wide range of environments available for distance courses. To identify how these environments assess the student's assimilation of content, a survey was made of the environments most frequently cited in the literature, as documented in [12]. This survey identified five mechanisms that support the ongoing assessment of distance learning: tracking of the student's actions; redirection through evaluation; records of messages from lists; records of messages from forums; and records of messages from chats. The results of this survey show a tendency for these environments to support the tracking of some student activities in order to monitor the student's learning.

    Most of these environments contain a small set of

    information that tracks the path the student has taken

    during the course. This set varies from one environment to another, according to criteria not disclosed by the environments' designers.

    Although there is no standard set of requirements for assessing the student's learning, two types of information can clearly guide the implementation of ongoing assessment of learning in distance learning environments:

    Information about the student's actions and communication [13]. This information can aid in understanding how the student's interactions with the environment and with other course participants influence his learning. Two types of student interaction can be identified:

    Student-Person Interactions: those in which the student interacts with other course participants, such as the teacher, the assistant teacher or another student, through some communication mechanism. For these interactions, it is useful to know, for instance, the subject of the message and the mechanism employed (chat, email, list, forum, etc.).

    Student-Material Interactions: those in which the student interacts with the didactic material (content pages, tests, exercises, etc.). For these interactions, it is useful to know, for example, how much time was spent on them, whether the interaction consisted of downloading or uploading, which discipline the material belongs to, which link was used to access the material, etc.

    Information about the student's activities in the course [8]. This kind of information, which depends on a rule established by the teacher, strongly influences the determination of whether or not the student has actually learned. Each activity proposed by the teacher may have a result: for instance, participation or not in a conference, the grade given for an assignment, and so on. This type of information depends on the activities proposed for the course and on the way the teacher has chosen to validate them, i.e., the criterion used to decide whether or not the student has carried them out.

    3. Ongoing Assessment of Distance Learning using Data Warehouse Resources

    The information relevant for ongoing assessment of distance learning can be stored in a data warehouse to support management decisions. This study explores the use of a data warehouse with these characteristics for the application of data mining techniques, allowing patterns of student behavior to be identified and thereby supporting decision making in the ongoing assessment of the student.

    In this work, the data warehouse is modeled following the fact constellation schema [2], incorporating generalization hierarchies for the fact or dimension tables of the data warehouse.

    Figure 1 shows part of the data warehouse that was developed based on the information discussed in the previous section. The gray boxes in this figure represent fact tables, i.e., tables that store information about a subject for which measures (or facts, highlighted in bold) are defined. The remaining boxes represent dimension tables, which store the values that characterize the fact table measures. A fact table represented together with its dimension tables is called a star schema. Parts A and B of Figure 1 represent two star schemas.

    Figure 1. Fact constellation schema for the Activity and Personal Interaction facts.


    Information about the activities developed by the

    student during the course can be stored in the data

    warehouse, as illustrated in part A of Figure 1, while

    information about the student-person interactions can

    follow the model shown in part B of Figure 1.
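    A star schema such as part A can be illustrated with a minimal sketch, assuming SQLite and hypothetical column names, since Figure 1 is not reproduced here: only the table names Activity, Student, Course and Time and the measure Accomplished? come from the text. The fact table holds a foreign key to each dimension plus the measure, and a typical star query joins the fact to one dimension and aggregates the measure.

```python
import sqlite3

# Hypothetical star schema: one fact table (Activity) surrounded by
# dimension tables. Column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT, type_of_connection TEXT);
CREATE TABLE Course  (course_id  INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Time    (time_id    INTEGER PRIMARY KEY, year INTEGER);

-- The fact table holds foreign keys to every dimension plus the measures.
CREATE TABLE Activity (
    student_id   INTEGER REFERENCES Student(student_id),
    course_id    INTEGER REFERENCES Course(course_id),
    time_id      INTEGER REFERENCES Time(time_id),
    accomplished INTEGER  -- measure "Accomplished?" stored as 0/1
);
"""

def build_star(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                     [(1, "Ana", "superfast"), (2, "Rui", "dial-up")])
    conn.execute("INSERT INTO Course VALUES (1, 'Databases')")
    conn.executemany("INSERT INTO Time VALUES (?, ?)", [(1, 2000), (2, 2001)])
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 1), (1, 1, 2, 1), (2, 1, 1, 0), (2, 1, 2, 1)])

def accomplished_by_connection(conn: sqlite3.Connection) -> list:
    # A typical star query: join the fact to one dimension, aggregate a measure.
    return conn.execute("""
        SELECT s.type_of_connection, SUM(a.accomplished), COUNT(*)
        FROM Activity a JOIN Student s ON s.student_id = a.student_id
        GROUP BY s.type_of_connection
        ORDER BY s.type_of_connection
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn)
print(accomplished_by_connection(conn))  # [('dial-up', 1, 2), ('superfast', 2, 2)]
```

    Part B (PersonalInteraction) would follow the same pattern, with its own measures in the fact table and keys to the same dimensions.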

    The PersonalInteraction fact table shown in part B of Figure 1 is specialized into four different interactions: InteractionViaChat, InteractionViaEmail, InteractionViaList and InteractionViaForum. The semantics of this hierarchical structure is reflected in the measures and dimensions of the specialized facts: these fact tables contain all the dimensions and measures of PersonalInteraction. In analytical terms, this makes it possible to examine, in each specialized fact, the dimensions and measures common to all personal interactions as well as the information specific to each interaction (via chat, via email, via list or via forum), considering only the instances pertinent to the fact table in question. For analytical purposes, the PersonalInteraction fact table is used when one wishes to analyze measures and attributes common to all the types

    of personal interaction.
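    The generalization hierarchy can be sketched with class inheritance, assuming Python dataclasses as a stand-in for the specialized fact tables. The attributes student_id, year and subject, the chat_room and has_attachment fields, and the helper functions are hypothetical; type_of_interaction and reply mirror the TypeOfInteraction and Reply? measures of PersonalInteraction. Only two of the four subtypes are shown.

```python
from dataclasses import dataclass, fields

# Sketch of the generalization hierarchy (names are hypothetical): each
# specialized fact inherits every dimension and measure of PersonalInteraction.
@dataclass
class PersonalInteraction:
    student_id: int            # key into the Student dimension
    year: int                  # attribute of the Time dimension
    type_of_interaction: str   # measure TypeOfInteraction
    reply: bool                # measure Reply?
    subject: str               # subject of the message

@dataclass
class InteractionViaChat(PersonalInteraction):
    chat_room: str = ""        # hypothetical chat-specific attribute

@dataclass
class InteractionViaEmail(PersonalInteraction):
    has_attachment: bool = False  # hypothetical email-specific attribute

def field_names(cls) -> list[str]:
    """Attributes visible in a (possibly specialized) fact, inherited ones first."""
    return [f.name for f in fields(cls)]

def reply_rate(facts: list[PersonalInteraction]) -> float:
    """Parent-level measure: uses only attributes shared by every subtype."""
    return sum(f.reply for f in facts) / len(facts) if facts else 0.0

facts = [
    InteractionViaChat(1, 2000, "chat", True, "exam", chat_room="room-a"),
    InteractionViaEmail(1, 2000, "email", False, "grades"),
    InteractionViaChat(2, 2001, "chat", True, "project"),
]
# Analyzing a specialized fact considers only the instances pertinent to it.
chats = [f for f in facts if isinstance(f, InteractionViaChat)]

print(field_names(InteractionViaChat))
print(reply_rate(facts), reply_rate(chats))
```

    The parent-level function works unchanged on any mix of subtypes, which mirrors using the PersonalInteraction table for measures common to all interactions.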

    An analysis of Figure 1 reveals that the stars of the

    Activity and PersonalInteraction facts have common

    dimensions: Student, Course, Discipline, Institution,

    Group and Time. Joining these two stars forms a

    constellation with two facts that share six dimensions.

    This union is advantageous because, in addition to

    avoiding the duplication of data, in practice it means that

    the measures and dimensions of these two facts can be analyzed jointly, crossing information about the interactions and the activities carried out by the students. One kind of analysis that can be made, for example, is to check whether the student's interactions influence the performance

    of the course activities.
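    The joint analysis of two stars can be sketched as a drill-across query, assuming SQLite and hypothetical tables that keep only the shared Student dimension. Each fact is summarized separately per student and only then joined; joining raw Activity rows to raw PersonalInteraction rows would pair every activity with every interaction of the same student and inflate the counts.

```python
import sqlite3

# Two facts sharing the Student dimension (a two-star constellation).
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Activity (student_id INTEGER, accomplished INTEGER);
CREATE TABLE PersonalInteraction (student_id INTEGER, type_of_interaction TEXT);
INSERT INTO Student VALUES (1, 'Ana'), (2, 'Rui');
INSERT INTO Activity VALUES (1, 1), (1, 1), (2, 0), (2, 1);
INSERT INTO PersonalInteraction VALUES (1, 'chat'), (1, 'email'), (1, 'chat'), (2, 'forum');
""")

# Drill-across: summarize each fact by the shared dimension, then join the summaries.
DRILL_ACROSS = """
WITH act AS (
    SELECT student_id, SUM(accomplished) AS done, COUNT(*) AS total
    FROM Activity GROUP BY student_id
),
inter AS (
    SELECT student_id, COUNT(*) AS interactions
    FROM PersonalInteraction GROUP BY student_id
)
SELECT s.name, COALESCE(i.interactions, 0), a.done, a.total
FROM Student s
JOIN act a ON a.student_id = s.student_id
LEFT JOIN inter i ON i.student_id = s.student_id
ORDER BY s.student_id
"""
rows = conn.execute(DRILL_ACROSS).fetchall()
print(rows)  # [('Ana', 3, 2, 2), ('Rui', 1, 1, 2)]
```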

    Figure 2 illustrates the fact constellation schema of the data warehouse developed to assess distance learning. A fact constellation is a collection of stars that share dimension tables.

    In addition to the information about activities and

    personal interactions, this data warehouse contains the

    following information:

    the student's interaction (access) with the didactic material (centered on the StudentMaterialInteraction fact table), involving attributes such as DurationOfTheAccess, LinkOfTheMaterialAccessed and TypeOfAccess (download or upload);

    the tests the student has taken (centered on the Test fact table), with attributes such as Grade and NumberOfIncorrectlyAnsweredQuestions; and

    whether the student has passed upon conclusion of a discipline (centered on the Approval fact table), with attributes such as Dropped-out?, Passed? and TemporarilySuspended?.

    For legibility, Figure 2 groups the Student, Course, Discipline, Institution, Time and Group dimensions, which are shared by all the facts, into a single entity, avoiding the clutter of drawing every link.

    Figure 2. Fact constellation for ongoing assessment.

    The data warehouse in Figure 2 shows various indirect relationships among the fact tables. This opens up a wide range of possibilities for combining measures and dimensions in analyses, e.g.:

    analyze whether there is a relation between a student's score, his personal interactions and his accessing of the didactic material (involving the Test, PersonalInteraction and StudentMaterialInteraction facts);

    verify the influence of factors such as communication and study on learning (involving the PersonalInteraction and StudentMaterialInteraction facts);

    discover whether the type of connection a student has influences the number of times he accesses the environment (involving the Student dimension and the StudentMaterialInteraction fact);

    find activities that are more effective in given courses, age groups, levels of schooling, etc. (involving the Course and Student dimensions and the Activity fact).

    These analyses can be made using the environment for Knowledge Discovery in Data Warehouses (KDW) described in the following section.

    4. A KDW Application for Assessment of Distance Learning

    Figures 3 and 4 exemplify the use of the MultiStar environment for knowledge discovery in the data warehouse in Figure 2. These figures show how information can be selected and mined in this environment. Field 1 of Figure 3 represents the fact tables of Figure 2 which, when expanded (fields 2, 3 and 4), show the attributes that represent the subjects under analysis in each fact table (called measures or facts) and information about the related dimension tables.

    Figure 3. MultiStar: selecting information.

    The purpose of the data selection process illustrated in Figure 3 is to support an analysis of the influence of chat interactions on the students' activities. Thus, the following were selected in the data warehouse: the Student dimension common to the Activity (field 2), Approval (field 3) and PersonalInteraction (field 4) fact tables; the TypeOfInteraction and Reply? measures of the PersonalInteraction fact table; the Passed? measure of the Approval fact table; and the Accomplished? measure of the Activity fact table. This analysis was restricted to students of the ATA Institution during the period from 1999 to 2001, which led to the creation of filters (field 5) for the attribute Name of the dimension Institution (field 6) and for the attribute Year of the dimension Time (field 7), both

    of which are attributes of dimensions common to the three

    fact tables.
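    The two filters amount to predicates on attributes of dimensions shared by the three fact tables. A minimal sketch, assuming the selection has already been flattened into one record per student (the record layout, the select function and the sample values are hypothetical), applies Institution Name = "ATA" and 1999 <= Year <= 2001 and then keeps only the selected measures:

```python
from typing import Iterable

# Hypothetical flattened rows: dimension attributes plus the selected measures.
rows = [
    {"institution": "ATA", "year": 1999, "student": "Ana",
     "type_of_interaction": "chat", "reply": True, "passed": True, "accomplished": True},
    {"institution": "ATA", "year": 2002, "student": "Rui",
     "type_of_interaction": "chat", "reply": False, "passed": False, "accomplished": False},
    {"institution": "XYZ", "year": 2000, "student": "Lia",
     "type_of_interaction": "email", "reply": True, "passed": True, "accomplished": True},
]

MEASURES = ("type_of_interaction", "reply", "passed", "accomplished")

def select(rows: Iterable[dict], institution: str, first_year: int, last_year: int) -> list[dict]:
    """Apply the dimension filters, then keep the student plus the chosen measures."""
    return [
        {"student": r["student"], **{m: r[m] for m in MEASURES}}
        for r in rows
        if r["institution"] == institution and first_year <= r["year"] <= last_year
    ]

selected = select(rows, "ATA", 1999, 2001)
print([r["student"] for r in selected])  # ['Ana']
```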

    Commercial tools can be used to carry out

    management analyses in the data warehouse presented in

    the previous section; however, they support only simple analyses, i.e., those using a single fact table and its dimension tables, such as identifying the profiles of students more prone to dropping out of a course (involving the Student dimension table and the Approval fact table).

    Nevertheless, there are important analyses that can be performed in this warehouse that require a comparison of different aspects of the student's learning process.

    Examples of this type of analysis were given in the

    previous section.

    To support this type of broad analysis, i.e., those

    involving more than one fact (star), an environment called

    MultiStar was developed for knowledge discovery [14].

    This environment allows the user to select the information to which data mining tasks will be applied, and it provides resources for recognizing fact constellations and handling generalization hierarchies. By recognizing fact constellations, MultiStar allows analyses involving facts that belong to the same constellation, i.e., facts that share dimensions. MultiStar's handling of generalization hierarchies, i.e., of inheritance relationships among the fact or dimension tables of a data warehouse, does not require the user to understand the

    concept on which it is based.
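    Recognizing a constellation can be viewed as a reachability problem on a graph that links each fact table to the dimensions it references. The sketch below is not MultiStar's implementation: the six shared dimensions follow the text, while the Material dimension and the ServerLoad fact are hypothetical, added only to show direct versus indirect reachability and exclusion. Starting from a selected dimension, a breadth-first search returns every fact table related to it directly or through shared dimensions.

```python
from collections import deque

# Hypothetical schema map: fact table -> dimensions it references.
SHARED = {"Student", "Course", "Discipline", "Institution", "Group", "Time"}
SCHEMA = {
    "Activity": SHARED,
    "PersonalInteraction": SHARED,
    "StudentMaterialInteraction": SHARED | {"Material"},  # Material is hypothetical
    "Test": SHARED,
    "Approval": SHARED,
    "ServerLoad": {"Server"},  # hypothetical fact outside the constellation
}

def reachable_facts(start_dimension: str, schema: dict[str, set[str]]) -> set[str]:
    """Breadth-first search alternating dimension -> fact -> dimension."""
    seen_dims, seen_facts = {start_dimension}, set()
    queue = deque([start_dimension])
    while queue:
        dim = queue.popleft()
        for fact, dims in schema.items():
            if dim in dims and fact not in seen_facts:
                seen_facts.add(fact)
                for d in dims - seen_dims:  # dimensions not visited yet
                    seen_dims.add(d)
                    queue.append(d)
    return seen_facts

print(sorted(reachable_facts("Material", SCHEMA)))
# ['Activity', 'Approval', 'PersonalInteraction', 'StudentMaterialInteraction', 'Test']
```

    Selecting Material reaches StudentMaterialInteraction directly and the other four facts indirectly through the shared dimensions, while ServerLoad stays unavailable for selection.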

    The information selected is stored in a data cube²

    called Interactions and Activities, which contains all the

    attributes of the Student dimension table (as shown in

    Figure 1) and the measures cited below.
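    A data cube, as defined in footnote 2, can be sketched as a set of aggregates over every combination of dimension values, with a placeholder "ALL" marking a rolled-up dimension (the convention of the CUBE operator). The records and the cube function below are hypothetical, with TypeOfConnection and Passed? borrowed from the example; a real cube would hold more dimensions and measures.

```python
from collections import Counter
from itertools import product

# Hypothetical selected records: (TypeOfConnection, Passed?)
records = [
    ("superfast", True), ("superfast", True), ("superfast", False),
    ("dial-up", True), ("dial-up", False),
]

def cube(records):
    """Count records for every group-by combination; "ALL" marks a rolled-up dimension."""
    counts = Counter()
    for connection, passed in records:
        # Each record contributes to 2**2 cells: each dimension kept or rolled up.
        for cell in product((connection, "ALL"), (passed, "ALL")):
            counts[cell] += 1
    return counts

c = cube(records)
print(c[("superfast", True)], c[("superfast", "ALL")], c[("ALL", True)], c[("ALL", "ALL")])
# 2 3 3 5
```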

    In the MultiStar environment, for a generalization hierarchy between fact or dimension tables, the characteristics inherited from the parent tables are displayed automatically in the child tables, making the hierarchies clear to the user. With regard to fact constellations, when a dimension or measure is selected, MultiStar allows only the fact tables related directly or indirectly to the selected information to be selected.

    ² A data cube [4] is a structure composed of dimensions and facts organized to facilitate analyses of the data.

    In the example of Figures 3 and 4, the data mining task chosen was Classification, with the purpose of classifying the student according to the measure Passed?.

    When this mining task is performed, MultiStar

    textually presents the patterns it finds. The patterns

    resulting from the classification task are expressed

    through rules, as shown in the example below:

        IF Accomplished? = yes, and
           TypeOfConnection = superfast, and
           TypeOfInteraction = chat, and
           Reply? = yes
        THEN Passed? = yes

    The number of cases in which a rule occurs and the

    degree of reliability of the rule are indicated for each rule

    found.
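    The paper does not define how the number of cases and the reliability of a rule are computed. Assuming they correspond to the usual support count and confidence of a classification rule, the example rule can be evaluated over a mined view as follows; the view rows and the evaluate function are hypothetical.

```python
# Hypothetical mined view: one row per student, attribute names from the example rule.
KEYS = ("Accomplished?", "TypeOfConnection", "TypeOfInteraction", "Reply?", "Passed?")
view = [dict(zip(KEYS, values)) for values in [
    ("yes", "superfast", "chat", "yes", "yes"),
    ("yes", "superfast", "chat", "yes", "yes"),
    ("yes", "superfast", "chat", "yes", "no"),
    ("no", "dial-up", "email", "no", "no"),
]]

RULE_IF = {"Accomplished?": "yes", "TypeOfConnection": "superfast",
           "TypeOfInteraction": "chat", "Reply?": "yes"}
RULE_THEN = ("Passed?", "yes")

def evaluate(rule_if: dict, rule_then: tuple, rows: list[dict]) -> tuple[int, float]:
    """Return (number of cases, confidence) for an IF ... THEN ... rule."""
    covered = [r for r in rows if all(r[k] == v for k, v in rule_if.items())]
    attribute, value = rule_then
    cases = sum(1 for r in covered if r[attribute] == value)
    confidence = cases / len(covered) if covered else 0.0
    return cases, confidence

cases, confidence = evaluate(RULE_IF, RULE_THEN, view)
print(f"cases={cases}, confidence={confidence:.2f}")  # cases=2, confidence=0.67
```

    Here three students satisfy the IF part and two of them passed, so the rule covers two cases with a reliability of about 67%.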

    Figure 4. MultiStar: mining data.

    Once the data has been selected, MultiStar provides resources for applying data mining tasks so that patterns can be extracted from those data. Figure 4 shows the interface for applying data mining to the data selected in Figure 3.

    In Field 1 of Figure 4, the user selects the cube to be analyzed (the Interactions and Activities cube was selected here). Field 2 shows the attributes of the selected cube (dimensions and measures). The user must choose one attribute from each dimension of the cube (here, the attribute TypeOfConnection from the Student dimension table was selected). These attributes, together with the measures of the cube (in our example, Accomplished? from the Activity fact table, Passed? from the Approval fact table, and TypeOfInteraction and Reply? from the PersonalInteraction fact table), compose a view to be mined. Field 5 shows the selected cube filter.

    A mining task is selected in Field 3, and the parameters for this task are defined in Field 4. The data mining tasks available in the environment are Association [15], Classification [16] and Clustering [17]. Each of these tasks allows the data to be analyzed from a different standpoint.

    5. Conclusions

    This paper discusses the information relevant for ongoing assessment of learning in computational distance learning environments and proposes a solution to aid in this ongoing assessment through the use of data warehouse and data mining resources. The modeling of a data warehouse was presented to illustrate the information identified, as well as the MultiStar environment, which allows for knowledge discovery in this data warehouse.

    In the next version of the environment, the authors intend to present the results of the data mining tasks in a more intuitive form, using graphic resources. An intelligent tutor could also be developed to automatically guide the student in his learning process, based on the results of the data mining tasks applied to the data warehouse discussed herein.

    6. References

    [1] W.H. Inmon, Building the Data Warehouse, 2nd edition, John Wiley & Sons, 1996.

    [2] R. Kimball, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley Professional, 1996.

    [3] R. Kimball, L. Reeves, M. Ross and W. Thornthwaite, The Data Warehouse Lifecycle Toolkit, Wiley Computer Publishing, 1998.

    [4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 1st edition, New York: Morgan Kaufmann, 2000.

    [5] K. Nurmela, E. Lehtinen and T. Palonen, Evaluating CSCL Log Files by Social Network Analysis, In: Computer Support for Collaborative Learning, Stanford, USA, 1999, Proceedings, p. 434-441.

    [6] M. Rahkila and M. Karjalainen, Evaluation of Learning in Computer Based Education Using Log Systems, In: 29th ASEE/IEEE Frontiers in Education Conference, San Juan, Puerto Rico, 1999, Proceedings, p. 16-21.

    [7] S.L. Tanimoto, Towards an Ontology for Alternative Assessment in Education, Meeting of the IEEE Learning Technology Standards Committee, Pittsburgh, USA, 1998.

    [8] J. Pei, J. Han, B. Mortazavi-Asl and H. Zhu, Mining Access Patterns Efficiently from Web Logs, In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, 2000, Proceedings, p. 396-407.

    [9] O.R. Zaiane, M. Xin and J. Han, Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs, In: Advances in Digital Libraries Conference, Santa Barbara, USA, 1998, Proceedings, p. 19-29.

    [11] B. Mortazavi-Asl, Discovering and Mining User Web-Page Traversal Patterns, MPhil. Dissertation, Simon Fraser University, 1999, p. 93.

    [12] D.R. Silva and M.T.P. Vieira, An Ongoing Assessment Model in Distance Learning, In: Proceedings of Internet and Multimedia Systems and Applications, Honolulu, USA, 2001.

    [13] C. Vrasidas and M.S. McIsaac, Factors Influencing Interaction in an Online Course, The American Journal of Distance Education, v. 13, n. 3, 1999.

    [14] D.R. Silva, A Tool for Knowledge Discovery using Data Warehousing and its Application on the Ongoing Assessment of Distance Learning, MPhil. Dissertation, Department of Computer Science, UFSCar, São Carlos, Brazil, 2002, 108p. (in Portuguese).

    [15] R. Agrawal, T. Imielinski and A. Swami, Mining Associations between Sets of Items in Massive Databases, In: ACM SIGMOD International Conference on the Management of Data, New York, USA, 1993, Proceedings, NY: ACM Press, 1993, p. 207-216.

    [16] J.R. Quinlan, Induction of Decision Trees, Machine Learning, 1:81-106, 1986.

    [17] P. Cheeseman and J. Stutz, Bayesian Classification (AutoClass): Theory and Results, In: Advances in Knowledge Discovery in Databases, 1995, 10., Proceedings, AAAI Press, p. 61-83.
