time table scheduling in data mining

7/21/2019 time table scheduling in data mining


Chapter 1

INTRODUCTION

This chapter introduces timetabling. It describes the basic concepts and the types of timetabling
problem, and then sets out the objectives of this study and the outline of the thesis.

1.1 Introduction to Data Mining

Data mining is the process of analyzing data from large databases. As the name suggests, it means
searching for valuable information in a large database. Data mining is also known as knowledge
discovery.

Generally, data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful information:
information that can be used to increase revenue, cut costs, or both. Data mining software is one
of a number of analytical tools for analyzing data. It allows users to analyze data from many
different dimensions or angles, categorize it, and summarize the relationships identified.
Technically, data mining is the process of finding correlations or patterns among dozens of fields
in large relational databases. The overall goal of the data mining process is to extract information
from a data set and transform it into an understandable structure for further use.

Data mining is a process that uses a variety of data analysis tools to discover patterns and
relationships in data that may be used to make valid predictions. The first and simplest analytical
step in data mining is to describe the data: summarize its statistical attributes (such as means
and standard deviations), visually review it using charts and graphs, and look for potentially
meaningful links among variables (such as values that often occur together). But data description
alone cannot provide an action plan. You must build a predictive model based on patterns
determined from known results, and then test that model on results outside the original sample. A
good model should never be confused with reality (we know a road map isn't a perfect
representation of the actual road), but it can be a useful guide to understanding our business. The
final step is to empirically verify the model. For example, suppose that from a database of
customers who have already responded to a particular offer, we have built a model predicting which
prospects are likeliest to respond to the same offer; verifying the model means checking those
predictions against how prospects actually respond.

1.1.1 Importance of Data Mining

We can simply define data mining as a process that involves searching, collecting, filtering and



analyzing the data. This is not the standard or accepted definition, but it covers the whole
process. Large amounts of data can be retrieved from various websites and databases, in the form
of data relationships, correlations and patterns. With the advent of computers, the internet and
large databases, it has become possible to collect large amounts of data. The collected data can be
analyzed systematically to help identify relationships and find solutions to existing problems.
Governments, private companies, large organizations and businesses of all kinds collect large
volumes of data for the purposes of business and research development. The collected data can be
stored for future use, which matters because finding and searching for information from websites,
databases and other internet sources may take a long time.

1.1.2 How does Data mining work?

While large-scale information technology has been evolving separate transaction and

analytical systems, data mining provides the link between the two. Data mining software

analyzes relationships and patterns in stored transaction data based on open-ended user queries.

Several types of analytical software are available: statistical, machine learning, and neural

networks. Data mining consists of five major elements:

• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.

1.1.3 Data Mining (KDD) Process

• Understand the application domain
• Identify data sources and select target data
• Pre-process: cleaning, attribute selection
• Data mining to extract patterns or models
• Post-process: identifying interesting or useful patterns
• Incorporate patterns in real-world tasks



Figure 1.1 Data mining process

1.1.4 Data Mining Techniques

a) Classification

Classification consists of examining the features of a newly presented object and assigning it to
a predefined class. The classification task is characterized by well-defined classes and a
training set consisting of pre-classified examples. The task is to build a model that can be
applied to unclassified data in order to classify it. Examples of classification tasks include:

• Classification of credit applicants as low, medium or high risk
• Classification of mushrooms as edible or poisonous
• Determination of which home telephone lines are used for internet access
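As an illustration only, the following Python sketch classifies a new object by copying the class of the most similar pre-classified example (a nearest-neighbour rule). The choice of technique, the two features (income in thousands and years employed) and the toy training set are assumptions made for this example; they are not taken from this study.

```python
def classify(training_set, features):
    """Return the class of the training example closest to `features`."""
    def squared_distance(example):
        example_features, _label = example
        return sum((a - b) ** 2 for a, b in zip(example_features, features))

    _nearest_features, nearest_label = min(training_set, key=squared_distance)
    return nearest_label


# Hypothetical pre-classified credit applicants:
# (income in thousands, years employed) -> risk class.
training_set = [
    ((20, 1), "high"),
    ((45, 5), "medium"),
    ((80, 10), "low"),
]

label = classify(training_set, (75, 9))
```

Here the new applicant is closest to the third training example, so it receives that example's class.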

b) Clustering

Clustering is the task of segmenting a diverse group into a number of similar subgroups or
clusters. What distinguishes clustering from classification is that clustering does not rely on
predefined classes. The records are grouped together on the basis of self-similarity. Clustering
is often done as a prelude to some other form of data mining or modeling. For example, clustering
might be the first step in a market segmentation effort: instead of trying to come up with a
one-size-fits-all rule, the customers are first divided into clusters, and the next question is
what kind of promotion works best for each cluster.
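One widely used clustering algorithm is k-means, which this thesis refers to later as k-mean clustering. The sketch below is a minimal, assumed implementation of standard k-means on two-dimensional points, not the modified algorithm developed in this work; the sample points and starting centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Standard k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old centroid
                centroids[i] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
```

No class labels are supplied: the two groups emerge only from the distances between the points, which is the difference from classification described above.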

c) Association Rules

An association rule is a rule which implies certain association relationships among a set of
objects (such as "occur together" or "one implies the other") in a database. Given a set of
transactions, where each transaction is a set of literals (called items), an association rule is an
expression of the form X ⇒ Y, where X and Y are sets of items. The intuitive meaning of such a
rule is that transactions of the database which contain X tend to contain Y. An example of an
association rule is: "30% of farmers that grow wheat also grow pulses; 2% of all farmers grow
both of these items". Here 30% is called the confidence of the rule, and 2% the support of the
rule. The problem is to find all association rules that satisfy user-specified minimum support and
minimum confidence constraints.
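Support and confidence can be made concrete with a short Python sketch. It assumes that each transaction is the set of crops one farmer grows; the five-farmer data set is invented for illustration and does not reproduce the 30% and 2% figures of the example above.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical toy data: each set lists the crops grown by one farmer.
farmers = [
    {"wheat", "pulses"},
    {"wheat"},
    {"wheat", "rice"},
    {"rice"},
    {"pulses", "rice"},
]

rule_support = support(farmers, {"wheat", "pulses"})          # 1 of 5 farmers
rule_confidence = confidence(farmers, {"wheat"}, {"pulses"})  # 1 of 3 wheat growers
```

A rule-mining procedure would keep the rule wheat ⇒ pulses only if both values reach the user-specified minimum thresholds.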

d) Regression

Regression is a data mining (machine learning) technique used to fit an equation to a dataset; it
is a data mining function that predicts a number. Age, weight, distance, temperature, income, or
sales could all be predicted using regression techniques. The simplest form of regression, linear
regression, uses the formula of a straight line (y = mx + b) and determines the appropriate values
for m and b to predict the value of y based upon a given value of x. Regression functions are used
to determine the relationship between the dependent variable (target field) and one or more
independent variables. The dependent variable is the one whose values you want to predict, whereas
the independent variables are the variables that you base your prediction on.
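The text does not say how m and b are determined. Ordinary least squares is the usual choice and is assumed in the following Python sketch, which fits the line to a small invented data set and then predicts y for a new x.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]  # independent variable
ys = [3.0, 5.0, 7.0, 9.0]  # dependent variable (target field)

m, b = fit_line(xs, ys)
predicted = m * 5.0 + b    # prediction for a new value x = 5
```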

1.1.5 Advantages of data mining

• Provides new knowledge from existing data
    o Public databases
    o Government sources
    o Company databases
    o Old data can be used to develop new knowledge
• New knowledge can be used to improve services or products
• Improvements lead to:
    o Bigger profits
    o More efficient service

1.1.6 Disadvantages of data mining

• User privacy/security
• Amount of data is overwhelming
• Great cost at implementation stage
• Possible misuse of information
• Possible inaccuracy of data

1.2 Scheduling and Timetabling

1.2.1 Scheduling

Scheduling is one of the important tasks encountered in real-life situations. Scheduling problems
take many forms, such as personnel scheduling, production scheduling and educational timetable
scheduling. Educational timetable scheduling is a difficult task because of the many constraints
that need to be satisfied in order to obtain a feasible solution. The educational timetable
scheduling problem is known to be NP-hard. NP stands for nondeterministic polynomial time; saying
that the problem is NP-hard means that no exact algorithm is known that solves it in polynomial
time. Methodologies such as Genetic Algorithms (GAs) and Evolutionary Algorithms (EAs) have been
used with mixed success.
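To give a feeling for why exhaustive search is impractical, the following Python sketch counts the candidate timetables under a simplifying assumption: every event is independently given one timeslot and one room, and infeasible combinations are counted as well. The function name and the problem sizes are hypothetical.

```python
def search_space_size(num_events, num_timeslots, num_rooms):
    """Number of ways to give every event one (timeslot, room) pair."""
    return (num_timeslots * num_rooms) ** num_events


small = search_space_size(5, 4, 2)       # 8 ** 5 candidate timetables
larger = search_space_size(50, 40, 10)   # 400 ** 50 candidate timetables
```

Even the second, still modest, instance has more than 10^100 candidates, which is why heuristic methods are used instead of enumeration.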

Scheduling theory is concerned with the optimal allocation of scarce resources to activities
over time. The practice of this field dates back to the first time two humans contended for a
shared resource and developed a plan to share it without bloodshed. The theory of the design of
algorithms for scheduling is younger but still has a significant history: the earliest papers in
the field were published more than 40 years ago.

Scheduling problems arise in a variety of settings, as illustrated by the following examples:

• Consider the central processing unit of a computer that must process a sequence of jobs
  that arrive over time.
• Consider a team of five astronauts preparing for the reentry of their space shuttle into the
  atmosphere.
• Consider a factory that produces different sorts of gadgets. Each gadget must first be
  processed by machine 1, then machine 2, then machine 3, where different gadgets require
  different amounts of processing time on different machines.
• Consider an academic environment, which requires the scheduling of a given set of
  courses and meetings between students and lecturers. Each course takes place in a
  particular hall and each hall has its capacity. We must also make sure that no student or
  lecturer is booked for more than one appointment at the same time.

1.2.2 Scheduling of timetabling

The general area of scheduling has been the subject of intense research for a number of decades.

Scheduling and timetabling are typically viewed as two separate activities, with the term

scheduling used as a generic term to cover specific types of problems in this area. Consequently,

timetable construction can be considered a special case of the generic scheduling activity.

In the most general terms, scheduling can be described as the constrained allocation of resources
to objects placed in space-time in such a way that the total cost of the resources used is
minimized. Examples of this class of problem can be seen in transport scheduling and delivery
vehicle routing, where the business-driven objective is to minimize the total cost function.

Timetable construction is the allocation, subject to constraints, of given resources to objects

being placed in space-time in such a way as to satisfy or nearly satisfy a desirable set of possible

objectives. Class timetables and exam timetables are examples of these problems where all hard

constraints must be satisfied to generate a valid solution. [1]

Thus the term scheduling covers all aspects of the activity of allocating resources and, at the

same time, satisfying some predetermined objective. However, due to the enormity of the

problem, it becomes necessary to classify the scheduling problem into specialized activities such

as timetabling. Thus, in practical terms the timetabling problem can be described as scheduling a

sequence of lectures between teachers and students in a prefixed time period (typically a week),

satisfying a set of varying constraints.



1.3 What is Timetabling?

Timetabling problems are a specific type of scheduling problem and are mainly concerned with

the assignment of events to timeslots subject to constraints with the resultant solution

constituting a timetable. Wren (1996) defined timetabling in the following way:

"Timetabling is the allocation, subject to constraints, of given resources to objects being
placed in space-time, in such a way as to satisfy as nearly as possible a set of desirable
objectives."

Based on the definition given by Wren (1996), we need to know whether sufficient resources are
available for a given event to take place at its specified time, as well as which resources are
allocated to it. The goal is to optimize some objective function that depends on the application
domain at hand. For example, in examination timetabling, the function to optimise is usually the
gap between two examinations that a student has to sit, i.e. the aim is to spread each student's
examinations throughout the examination period. The basic terminology used in timetabling
problems is summarised in Table 1.1.

Table 1.1 Basic terminology used in timetabling

Terminology         Definition
Event               An activity to be scheduled. Examples include examinations and courses.
Timeslot (period)   An interval of time in which events can be scheduled.
Resource            A resource required by events. Examples include rooms and equipment
                    (e.g. projectors).
Constraint          A restriction on scheduling the events. Examples include room capacity and
                    a specific timeslot.
Individual          A person who has to attend the events.
Conflict            Two events clash with each other if they have at least one individual in
                    common and are scheduled in the same timeslot.
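The definition of a conflict in Table 1.1 translates directly into code. The following Python sketch is illustrative only; the event names, attendee sets and timeslot numbers are invented, and the data layout (one dictionary of attendees and one of timeslots) is an assumption.

```python
from itertools import combinations


def find_conflicts(attendees, timeslot_of):
    """Return pairs of events that share an individual and a timeslot."""
    conflicts = []
    for a, b in combinations(sorted(attendees), 2):
        same_slot = timeslot_of[a] == timeslot_of[b]
        shared_individuals = attendees[a] & attendees[b]
        if same_slot and shared_individuals:
            conflicts.append((a, b))
    return conflicts


# Hypothetical events with the individuals who must attend them.
attendees = {
    "Maths": {"alice", "bob"},
    "Physics": {"bob", "carol"},
    "History": {"dave"},
}
timeslot_of = {"Maths": 1, "Physics": 1, "History": 1}

conflicts = find_conflicts(attendees, timeslot_of)
```

All three events share a timeslot, but only Maths and Physics have an individual in common, so only that pair is reported.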



The constraints in timetabling can be divided into two categories: hard and soft. Hard constraints

cannot be violated. Soft constraints are not essential but their satisfaction is highly desirable in

order to produce a good quality timetable.

1.3.1 Hard Constraints: Hard constraints [9] [15] are constraints that physically cannot
be violated; a timetable that violates any of them can never be acceptable. For example, a
lecturer cannot be in two places at once. The following is a list of hard constraints:

1. Classrooms must not be double booked.

2. Every class must be scheduled exactly once.

3. Lecturers must not be double booked.

4. A lecturer must not be booked when he/she is unavailable.

5. Some classes need to be held consecutively, for example labs.

6. Some classes require particular rooms; for example, experiments must be held in particular
laboratories.

7. Classrooms must be large enough to hold the class scheduled in it.
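As a hedged illustration, the following Python sketch checks three of the hard constraints listed above (1, 3 and 7) for a candidate timetable. The record layout, field names and sample data are assumptions made for this example and are not the representation used in this work.

```python
from collections import Counter


def hard_violations(assignments, room_capacity):
    """List the violated hard constraints for a list of class assignments."""
    violations = []
    room_use = Counter((a["room"], a["slot"]) for a in assignments)
    lecturer_use = Counter((a["lecturer"], a["slot"]) for a in assignments)
    # Constraint 1: classrooms must not be double booked.
    for (room, slot), count in room_use.items():
        if count > 1:
            violations.append(f"room {room} double booked in slot {slot}")
    # Constraint 3: lecturers must not be double booked.
    for (lecturer, slot), count in lecturer_use.items():
        if count > 1:
            violations.append(f"lecturer {lecturer} double booked in slot {slot}")
    # Constraint 7: the classroom must be large enough for the class.
    for a in assignments:
        if a["size"] > room_capacity[a["room"]]:
            violations.append(f"class {a['name']} does not fit in room {a['room']}")
    return violations


# Hypothetical candidate timetable.
assignments = [
    {"name": "DB", "lecturer": "Rao", "room": "R1", "slot": 1, "size": 40},
    {"name": "AI", "lecturer": "Rao", "room": "R2", "slot": 1, "size": 30},
    {"name": "OS", "lecturer": "Sen", "room": "R2", "slot": 2, "size": 60},
]
room_capacity = {"R1": 50, "R2": 50}

violations = hard_violations(assignments, room_capacity)
```

A timetable is feasible with respect to these three constraints only when the returned list is empty.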

1.3.2 Soft constraints: Some constraints [9] [15] are less straightforward to define. Usually,
these constraints should be fulfilled as far as possible. A timetable that violates them is still
usable, but it is inconvenient for students or teachers. The following are the soft constraints:

1. Teachers may prefer specific time slots.

2. Teachers may prefer specific rooms.

3. Certain kinds of subjects should not be in contiguous time slots.

4. Some lecturers do not wish to have classes assigned consecutively in time.

5. There are preferred hours in which a lecturer's classes might be scheduled.

6. Most students and some lecturers do not wish to have empty periods in their timetables.



7. Classes should be distributed evenly over the week.

8. A classroom much larger than the class should not be booked for it.

9. More than one member of staff might need to be assigned to a particular class.

Ideally, a timetable should satisfy all hard and soft constraints. In practice it is usually
difficult to meet all of them: hard constraints must not be violated in any case, so some soft
constraints may have to be sacrificed in order to find a feasible timetable.

1.4 Classification of Educational Timetabling Problems

Schaerf (1999a) classified educational timetabling into three main classes i.e. school

timetabling, course timetabling and examination timetabling. They share the same basic

characteristics of the general timetabling problem but can still have significant differences

between them. Each one of them has its own constraints, requirements and rules. More details on

educational timetabling can be found in Burke et al. (2004e). In this section, a classification of

educational timetabling and its properties are discussed. We divided educational timetabling into

two categories i.e. school timetabling and university timetabling (which consists of examination

timetabling and course timetabling).

1.4.1 School Timetabling

The school timetabling problem is concerned with the weekly scheduling of all the lessons of a
school. The problem consists of a set of teachers, classes, subjects/lessons and weekly periods,
where the weekly periods are predefined. The task is to assign lessons to periods, and a teacher
to a particular class at a given time, while satisfying a set of constraints in order to produce a
feasible timetable. Examples of constraints in the school timetabling problem are capacities,
locations, teacher loads, rest time between two lessons and other personal preferences.

Examples of research on school timetabling can be found in Abramson (1991) who employed

simulated annealing, Carrasco and Pato (2001) who employed a multi-objective genetic

algorithm and Legierski (2003) who applied a constraint-based approach.

1.4.2 University Timetabling

The university timetabling problem can be grouped into two categories: (i) course (or lecture)

timetabling and (ii) examination timetabling. The course timetabling problem is the process of



assigning timeslots and rooms so that meetings between lecturers and students can take place.

The examination timetabling problem refers to the assignment of timeslots and rooms so that

students can take examinations. These two (examination and course) timetabling problems are

fairly similar in some superficial ways, but there are some distinct underlying differences

between them. In examination timetabling, several examinations can be assigned to one (large)

room at the same time. However, this is not possible for course timetabling where only one

course can be assigned to one room.

a) The Examination Timetabling Problem

The examination timetabling problem represents a major administrative activity for academic

institutions. It is often a difficult and demanding process and it affects a significant number of

people. Romero (1982) reports that there are three broad categories of people that are affected by

its outcome: administrators, academic staff and students. Many universities are seeing an

increasing number of student enrolments into a wider variety of courses and an increasing

number of combined degree courses. This adds to the growing challenge of developing examination
timetabling software that caters for the broad spectrum of constraints and demands of educational
institutions across the world. The quality of a timetable should therefore be evaluated from
several points of view.

Carter and Laporte (1996) defined the examination timetabling problem as:

"The assigning of examinations to a limited number of available time periods in such a way
that there are no conflicts or clashes."

The examination timetabling problem is very common in both schools and universities. It is

concerned with allocating a set of examinations, into a limited number of timeslots (periods),

subject to a set of constraints. Carter et al. (1994) stated that the basic challenge of
examination timetabling is to schedule examinations over a limited number of timeslots so as to
avoid conflicts and to satisfy a number of side constraints. In this case, avoiding conflicts is a
hard constraint and the side constraints are soft constraints.

The generally accepted hard constraints for the examination timetabling problem are (i) there

must be enough seating capacity and (ii) no student should be required to sit two examinations at



the same time. Solutions that satisfy all the hard constraints are called feasible. On the other

hand, there might be some requirements that are not essential. These are referred to as soft

constraints. Common soft constraints are (i) students should not be scheduled to sit more than
one examination in a day and (ii) each student's examinations should be spread as evenly as
possible over the schedule.

In a real-world situation it is usually impossible to satisfy all the soft constraints, but
minimizing their violation increases the quality of the solution. Quality is measured by a penalty
function that reflects the extent to which a timetable violates its soft constraints.
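A penalty function of this kind might be sketched in Python as follows. The sketch covers only soft constraint (i), counting for every student each pair of examinations that falls on the same day; the unit weight and the sample data are assumptions, and practical penalty functions usually combine several weighted terms.

```python
def soft_penalty(student_exams, exam_day, weight_same_day=1):
    """Penalise, per student, every pair of examinations held on the same day."""
    penalty = 0
    for exams in student_exams.values():
        days = [exam_day[exam] for exam in exams]
        for day in set(days):
            n = days.count(day)
            penalty += weight_same_day * (n * (n - 1) // 2)
    return penalty


# Hypothetical enrolments and two candidate examination timetables.
student_exams = {
    "s1": ["Maths", "Physics"],
    "s2": ["Maths", "History"],
    "s3": ["Physics", "History"],
}
crowded = {"Maths": 1, "Physics": 1, "History": 2}
spread = {"Maths": 1, "Physics": 2, "History": 3}

crowded_penalty = soft_penalty(student_exams, crowded)
spread_penalty = soft_penalty(student_exams, spread)
```

The timetable with the lower penalty is preferred, provided both satisfy the hard constraints.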

b) The Course Timetabling Problem

Carter and Laporte (1998) defined course timetabling as:

"a multi-dimensional assignment problem in which students, teachers (or faculty members) are
assigned to courses, course sections or classes; events (individual meetings between students
and teachers) are assigned to classrooms and times"

In course timetabling (which is also sometimes known as class/teacher timetabling), a set of

courses is scheduled into a given number of rooms and timeslots within a week and, at the same

time, students and teachers are assigned to courses so that the meetings can take place. Some

combinatorial models which draw upon graph colouring for simple class-teacher timetabling

problems can be found in de Werra (1996b, 1997b). As in examination timetabling, course

timetabling also involves hard and soft constraints. Examples of hard constraints for the course

timetabling problem are:

1. A student and a teacher cannot be in two places at the same time.

2. Only one course is allowed to be assigned to a timeslot in each classroom.

3. The classroom capacity should be equal to or greater than the number of students attending the
course at a particular timeslot.

Some related soft constraints for course timetabling reported by Socha et al. (2002) are:

1. Teachers may prefer specific time slots.



2. Teachers may prefer specific rooms.

3. Certain kinds of subjects should not be in contiguous time slots.

Some combinations of assignments lead to acceptable timetables, others do not. Such

restrictions follow from conditions imposed by rooms, students or teachers. As stated earlier, in
university course timetabling, a set of courses and associated events is assigned to a set of rooms
and time periods within a week and, at the same time, students and teachers are assigned to the

courses so that the appropriate lessons can take place, subject to a variety of hard and soft

constraints.

1.5 Need of Study

Organizations such as universities and schools use timetables to schedule classes and lectures,
assigning times and places to them in a way that makes the best use of available resources.
Universities in particular increasingly have to deal with a large number of courses and flexible
degree structures. A timetable that is not well designed is inconvenient and expensive in terms of
wasted time and money. Timetabling is a search for good solutions in a space of possible
timetables.

Traditionally, educational staff solved the problem manually. Making a timetable by hand is a
slow, laborious task, performed by people who rely on their knowledge of the resources and
constraints of a specific institution. Generating a university's timetable is a tedious job with
many constraints to be satisfied, and the differing requirements of different departments or
universities must be satisfied as well. Timetable generation is therefore considered a complex
problem, and the result is often unsatisfactory, i.e. it does not meet all the requirements. These
difficulties have motivated the scientific study of the problem and the development of
semi-automated solution techniques for it. Such programs build a set of timetables but still do
not solve the whole problem.

The construction of automated course timetables for academic institutions is a very difficult
problem, with many constraints that have to be respected and a huge search space to be explored
even when the problem input is not particularly large, because the number of possible timetables
grows exponentially. In addition, the problem itself does not have a widely accepted definition,
since different departments face different variations of it. The problem has therefore proven to
be very complex. Timetables are considered feasible provided



the so-called hard constraints are respected. However, to obtain high-quality timetabling
solutions, soft constraints, which express a set of desirable conditions for classes and
teachers, should also be satisfied. In this work, a modified k-means clustering algorithm is used
with the aim of producing a more accurate timetable, with higher precision and recall and a
shorter execution time.

1.6 Research Objectives

The main objective of this thesis work is to make full use of the university's resources in an
automated timetable generator. The goals of the thesis work are:

1. To analyse the problems that exist in timetabling.

2. To create a system that uses resources efficiently and effectively, removing redundancy and
ambiguity, so that the system is cost-effective and user-friendly.

3. To compare the existing system with the new one.

In this work we measure the accuracy, precision and recall of the generated timetable and compare
the execution time of timetable generation. We use an improved k-means clustering algorithm and
show in our analysis that its execution time is less than that of the k-means clustering
algorithm.

1.7 Scope of Work

A timetable is crucial for educational institutions and schools. We have found an effective
solution to the problem of adjusting timetables in a simple and economical way. Through its
interactive interface, a timetable can be generated quickly and smoothly. The timetable software
is designed so that the whole timetable can be made easily and the most effective time schedules
can be fixed according to the school's needs and suggestions. With effective tools of this kind,
unnecessary delays and confusion over timetables can be avoided.



1.8 Structure of dissertation

The dissertation has been organized into six chapters. A brief description of the content of

these chapters is given in the following paragraphs:

Chapter 1 provides an overview and introduction to timetable scheduling. It also introduces

various timetabling problems and concentrates upon particular research issues concerned with

university timetabling problems. This chapter presents the need of study and the aims of the

research.

Chapter 2 introduces the background of timetabling problems. It reviews and analyses the current

published research on the subject of university timetabling.

Chapter 3 introduces the various techniques for solving the timetabling problem.

Chapter 4 introduces the methodology used in this research work for solving the timetable

scheduling problem.

Chapter 5 shows the working of the new system, with snapshots of the results. It also compares the
new system with the existing system and reports the accuracy, precision, recall and execution time
of the new system.

Chapter 6 describes the conclusion and future work.



Chapter 2

SURVEY OF LITERATURE

This chapter discusses the literature reviewed during the research work. Various research papers
and journals were studied during this period.

2.1 Background

Timetabling is known to be an NP-complete problem (NP stands for nondeterministic polynomial
time), i.e. there is no known efficient way to locate a solution. The most striking characteristic
of NP-complete problems is that no efficient algorithm for finding the best solution to them is
known. Hence, in order to find a solution to a timetabling problem, a heuristic approach is
chosen. This heuristic approach leads to a set of good solutions (but not necessarily the best
solution). In a general educational timetabling problem, a set of events (e.g. courses and exams)
is assigned to a certain number of timeslots (time periods) subject to a set of constraints, which
often makes the problem very difficult to solve in real-world circumstances [2]. In fact,
large-scale timetables such as university timetables may need many hours of work by qualified
people or teams in order to produce high-quality timetables with optimal constraint satisfaction
[7] and optimization of the timetable's objectives at the same time. These constraints are of two
types: hard and soft. Hard constraints are those that cannot be violated while a timetable is
being computed. For example, for a teacher to be scheduled for a timeslot, the teacher must be
available for that timeslot. A solution is acceptable only when no hard constraint is violated.
Soft constraints, on the other hand, are those that should be satisfied in the solution as far as
possible. For example, although importance is given to a teacher's schedule, the focus is on
producing a valid timetable, and this can leave a teacher with a free timeslot. Thus, while
addressing the timetabling problem, hard constraints have to be adhered to, and at the same time an
effort is made to satisfy as many soft constraints as possible. Due to the complexity of the
problem, most of the work done concentrates on heuristic algorithms which try to find good
approximate solutions [8]. Some of these include Genetic Algorithms (GA) [8], Tabu Search [10],
Simulated Annealing [11] and, more recently, Scatter Search methods. Heuristic optimization
methods are explicitly aimed at good feasible solutions that may not be optimal, for cases where
the complexity of the problem or the limited time available does not allow an exact solution.
Generally, two questions arise: (i) how fast the



solution is computed and (ii) how close the solution is to the optimal one. A trade-off is often
required between time and quality. This is handled by running simpler algorithms more than once,
comparing the results obtained with those of more complicated algorithms, and comparing the
effectiveness of different heuristics. Heuristic methods are evaluated empirically because of the
analytical difficulty involved in deriving worst-case results for the problem. In its simplest
form the scheduling task consists of mapping class, teacher and room combinations (which have
already been pre-allocated) onto time slots.

2.2 Literature Review

Many approaches and models have been proposed for dealing with the variety of timetable

problems. Problems range from the construction of semester or annual timetables in schools,

colleges and universities to exam timetabling at the end of the period. Early timetable activities

were carried out manually and a typical timetable once constructed remained static with only a

few changes necessary, in order to fine tune it every semester or year. However, the nature of

education has changed substantially over the years and thus the requirements of timetables have

become much more complicated than they used to be. Consequently the need for automated

timetable generation is increasing and thus the development of a timetable generation system

that generates valid solutions is essential. As a result, during the last 30 years, many papers
related to automated timetabling have been published in conferences, proceedings and journals. In
addition, several applications have been developed and implemented with varying degrees of success.

The early techniques used in solving timetabling problems were based on a simulation of the

human approach in resolving the problem. These included techniques based on successive

augmentation that were called direct heuristics. These techniques were based on the idea of

creating a partial timetable by scheduling the most constrained lecture first and then extending

this partial solution lecture by lecture until all lectures were scheduled [3]. The next step was

for researchers to apply general techniques such as integer and linear programming, graph coloring

and network flow to the timetable problem. The first two papers published on

timetable construction using these general techniques are generally attributed to Kuhn and

Haynes. Kuhn's paper adopts a mathematical approach to the fundamental timetable problem,

in contrast to Haynes' paper, which concentrates on the more practical aspects of

scheduling events for a conference. Interest in timetable solution generators increased



dramatically in the 1960s mainly due to the more common availability of computers to perform

the number crunching required by the algorithms developed. [4]

The first non-heuristic approach was developed by Gotlieb in 1963. It is based on the now

famous process of reducing the availability array and was presented at the Munich IFIP congress.

This was arguably the first paper on this partitioning approach. It was further enhanced by

Berghuis, who introduced the concept of virtual classes or teachers to obtain the classical

bipartite problem. Typically these papers were based on a heuristic approach. This work was

followed by many papers that discussed the problem but contributed very little new work.

In the late 1960s, some attempts to limit the general problem by considering case

examples began to be published. For instance, Lawrie in 1969 developed a model for

the school timetable problem by using an integer linear programming approach.

During the 1970s several authors adopted a heuristic approach to the

timetable problem. For example, Junginger in 1972 reduced the timetable

problem to a three-dimensional transport problem. Schmidt and Strohlein in 1973

predicted that the generation of timetables by computer would be heavily influenced by the devices at

hand, with timetable programming moving from remote handling in huge computing centers to

microcomputer centres owned by schools and directly handled by teachers on their desktops.

The major general techniques that seem to have been prevalent in the 1970s and 1980s have

their roots in artificial intelligence and are based on algorithms supported by simulated

annealing, tabu search and genetic algorithm methods. Papers in the literature typically

described a substantial software implementation and this is supported by the presentation of

results of the application of the method in one or more cases. Furthermore, there were a number

of important surveys of timetabling literature that were published in the 1980s. [4]

DeWerra in 1985 listed the various problems dealing with timetabling in a formal way and

provided different formulations in an attempt to solve them. He also described the approaches

considered the most important at that time. Carter in 1986 analyzed a survey, which discussed

actual applications of timetables at several universities. He also provided details of a tutorial

guide for practitioners on selecting and/or designing an algorithm for their own institutions. [5]



Junginger in 1986 described research work in Germany on the school timetable problem and the

underlying approaches, which were based on direct heuristics. Corne et al. in 1994 provided a survey

of Genetic Algorithm applications to timetabling, discussed future perspectives of such approaches

and compared the results obtained with those of other approaches. Although papers

published in the 1990s solved timetable problems using the above artificial intelligence based

techniques, a new approach, also rooted in Artificial Intelligence, was emerging and has

gained prominence: Constraint Satisfaction Programming (CSP). [6]

Abramson in 1991 used Simulated Annealing as an optimization technique. The possibility of

adding cost components was discussed in an attempt to include the more complex scheduling

constraints that arise in schools. The paper also described how weighting the cost components

allows one component to be made more important than others. He implemented this on a

parallel computer system and showed that the speed of the algorithm improved along with the

results. Cooper and Kingston in 1993 described a computer program that solved a problem

taken from a large and highly constrained high school without any simplifications. A timetable

specification language was provided that helped to handle many constraints in a uniform way.

Schaerf in 1999 provided a survey of the different techniques used in timetable generation.

Constraint satisfaction techniques were stressed as an important addition to the tools that are

used in solving the timetabling problem. [10]

Kennedy and Eberhart in 1995 developed the Particle Swarm Optimization (PSO) algorithm for

optimization. Shu-Chuan Chu and Yi-Tin Chen in 2006 developed a school timetable using

PSO. They observed that PSO has many successful applications in continuous optimization

problems. The main contribution of their work is to apply PSO to the discrete problem of

timetable scheduling. [7]

Ahmed Hamdi Abu Absa and Dr. Sana Wafa Ai Sayegh [8] explained the details of an

implementation of Genetic Algorithms (GA) used for a university timetable generator.

The paper presents a program, written in Java, that creates an efficient timetable without

constraint violations for a simple university timetable problem. The study tested the effects of

mutation rate and population size. The paper concluded that the Genetic Algorithm approach is

very effective and useful for lecture timetabling problems.

Alberto Colorni et al. [9] analyzed the results of an automated timetabling problem solved by a

genetic algorithm. They described the automated timetable problem as representative of the class

of multi-constrained, NP-hard, combinatorial optimization problems with real-world application.

The paper compares two versions of the genetic algorithm, with and without local search, both

to a handmade timetable and to two other approaches based on simulated

annealing and tabu search. The results show that the genetic algorithm with local search and tabu

search with relaxation perform better than simulated annealing and handmade timetables. When

they tested the algorithm, both the didactical requirements and the teachers'

preferences were better satisfied. The total cost obtained with the newly built system was much

lower than that of the handmade timetable.

E.K. Burke et al. [10] discussed automatic timetable generation using traditional

methods such as graph coloring and advanced methods such as genetic algorithms. The

paper addresses examination timetabling and argues that genetic algorithms are very

useful general-purpose optimization tools that may be applied to a wide range of very difficult

problems.

Khaled Mahar [2] proposed a genetic algorithm with a simple representation that handles all the

university timetables at once and is easily modified to create an accurate timetable that

satisfies the constraints that must not be broken. The algorithm is applied to

create timetables for the college of Arab Academy for Science and Technology in Egypt; the

results are very satisfactory and no hard constraint violation is encountered. The program was

tested with different population sizes and different crossover and mutation rates. The paper also

provides an overview of different techniques for automatic generation of university timetables,

such as tabu search, simulated annealing, genetic algorithms, graph coloring heuristics, constraint

programming and network flow models. The paper shows that as

the population size increases the cost changes faster, but a large population takes too much

running time and memory.



Dipti Srinivasan et al. [11] presented an evolutionary algorithm based approach to solving a large

constrained university timetabling problem. Heuristics and context-based reasoning were also

used to obtain feasible timetables in a reasonable time. The complete

course timetabling system presented in the paper was validated, tested and discussed using

data from a university. The results show that implementing the intelligent adaptive

mutation operator led to a more than tenfold improvement in the performance of the

evolutionary algorithm.

Prof. Swapna Borde et al. [12] presented a hybrid algorithm for the university timetabling problem

that combines two techniques: the If Else Algorithm and the Graph Coloring

Algorithm. The first is based on simple if-else statements, which can easily be used in any

programming language, and the second is based on connected graphs. The hybrid

algorithm removes the disadvantages of the individual methods and provides a more efficient

timetable generating algorithm.

Hana Rudova et al. [13] focused on hard constraints with preference propagation for soft

constraints. They extended the constraint logic programming technique so that soft constraints

can be partially satisfied, and applied this method to the timetabling problem of Purdue

University. The model and the search methods applied to the large lecture room

component are presented, and the computational results are analyzed. Their solution was able to

satisfy the course requests of 98% of students.

Ashish Jain, Dr. Suresh Jain and Dr. P.K. Chande [14] described the various genetic operators,

such as selection, mutation and crossover. The best chromosomes are selected from the group of

chromosomes on the basis of the fitness function, and crossover is used to exchange

timetable information as required. The paper claims that an evolutionary

genetic algorithm approach is an effective and powerful method for solving the course

timetabling problem.

Yao-Te Wang et al. [15] proposed a practical automatic timetable scheduling system based on

students' needs, in which the process is divided into two stages. In the first stage, students' needs in

course selection are determined and associations among the courses selected by students are extracted

using the association mining technique; in the second stage, a genetic algorithm is used

to arrange the course timetable. The study is based on students' willingness in course selection:

it analyzes student learning performance and teachers' preferred schedules, determines the cost

function value of each class period, and then applies the genetic algorithm to exchange class

periods and produce an optimal course timetable. According to the authors, the automatic course

scheduling system can not only efficiently replace conventional manual

timetable scheduling, but also produce course timetables that truly fulfill users' needs and

increase students' and teachers' satisfaction. The system is also said to be

capable of improving the interaction among students, teachers and the school,

creating a good relationship among the three parties.

Kuldeep Kumar et al. [6] noted in their paper that the university timetabling problem has a very

large number of feasible solutions. Some method is required to measure the overall

quality of different solutions, so that they can be compared and the

best one selected. The Genetic Algorithm is

generally an appropriate technique for university timetabling problems, since it gives a number

of alternative solutions that satisfy most of the hard constraints.

Nikita Desai [16] observed that a large number of tools are used for solving the timetabling problem

based on the resources provided by the user, but that these tools ignore some resources.

They focus mainly on the specifications of classrooms, teachers and subjects, and are not able to

accommodate resources related to human factors such as the likes, dislikes and weaknesses of the

teachers and students. She presented a survey conducted to find the preferences of teachers and

concluded with rules extracted using a classification method. She argued that these rules allow

resources to be properly utilized for accurate timetable generation.

Carpente [17] presented an application that solves the complex school timetabling process,

adjusting the available resources so that they are fully utilized in the

automatically generated solution. The application interacts with the Academic Administration

Official Systems (AAOS), which simplifies the hard phase of entering the data, and complete

solutions are provided efficiently by different heuristic techniques. The application can be easily

updated through a purpose-designed user interface.

Ho Sheau Fen, Irene et al. [18] described a PSO technique for solving the university

timetabling problem that applies constraint-based reasoning to PSO. The proposed

algorithm is tested using real data from Teknologi University, Malaysia. The results are compared

against standard PSO and hybrid PSO-local search: the proposed algorithm gives

better results than the others, but the computational time it needs to generate a solution is

slightly longer than that of hybrid PSO-local search and standard PSO.

Ruey-Maw Chen et al. [19] explained that the course timetabling problem is an NP-complete problem.

They used PSO to solve it because of its fast convergence, small number of parameter settings

and ability to fit dynamic environmental characteristics. They checked the performance of PSO and

SPSO with and without local search. They indicate that, once local search is added, the outcomes

are significantly better than those obtained by using PSO or SPSO alone. Moreover, the

performance of SPSO with local search is better than that of PSO with local search.

Elizabeth Montero et al. [20] analyzed PSO for a dynamic problem in which

new courses and exams can appear during the semester. They use the forward checking algorithm,

and this approach can efficiently handle the creation of new courses or exams after the initial,

static start-up planning.

Danial Qarouni - Faral et al. [21] used swarm intelligence, which is based on social

psychological principles and also contributes to engineering applications. The paper applies

PSO to the classic timetabling problem. The results show that the number of errors is

reduced in comparison with previous approaches.

LI Lin et al. [22] addressed the course-scheduling problem with the PSO algorithm.

They obtained an initial population with higher performance by using an improved greedy strategy,

so as to improve the efficiency of the algorithm. The paper analyzed the application of PSO in a

course scheduling system, adding the greedy strategy to the algorithm when selecting the initial

particle swarm. The initial population is then much closer to the optimal solution, and the

convergence speed of the algorithm is enhanced. Because various kinds of hard and soft

constraints are also taken into account, the difficulty of course scheduling is reduced.

ZHU Jihrong et al. [23] explained a new adaptive particle swarm optimization algorithm. In the

presented algorithm, every particle chooses its inertial factor according to its own fitness and

that of the optimal particle. With better fitness, the particle chooses a smaller inertial factor. The

simulation results show that the proposed algorithm is effective and robust, that

it has the advantage of global convergence and that it can effectively

alleviate the problem of premature convergence. The experimental results also

show that the suggested algorithm is greatly superior to PSO and APSO in terms of robustness.

R.C. Eberhart et al. [24] compared two methods of particle swarm optimization:

PSO using an inertia weight and PSO using a

constriction factor. Five benchmark functions are used for the comparison. It is concluded that

the best approach is to use the constriction factor while limiting the maximum velocity Vmax

to the dynamic range of the variable Xmax on each dimension. The results also indicate that

improved performance can be obtained by carefully selecting the inertia weight w and the

coefficients c1 and c2.
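The constriction-factor variant summarised above can be illustrated with a minimal sketch. It is not the code of [24]: the function names, the swarm size, the sphere test function and the common parameter choice c1 = c2 = 2.05 are assumptions made for illustration. The constriction factor follows the widely used formula chi = 2 / |2 - phi - sqrt(phi^2 - 4 phi)| with phi = c1 + c2 > 4, and the velocity is clamped to Vmax = Xmax.

```python
import math
import random


def constriction_factor(c1: float, c2: float) -> float:
    """Constriction factor chi for phi = c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("c1 + c2 must be greater than 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def sphere(x):
    """Illustrative benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, swarm=10, iters=50, x_max=5.0,
        c1=2.05, c2=2.05, seed=0):
    """Minimise `objective` with constriction-factor PSO, Vmax = Xmax."""
    rng = random.Random(seed)
    chi = constriction_factor(c1, c2)
    v_max = x_max  # velocity limited to the dynamic range of the variable
    pos = [[rng.uniform(-x_max, x_max) for _ in range(dim)]
           for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]               # personal best positions
    gbest = min(pbest, key=objective)[:]      # global best position
    start = objective(gbest)
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = chi * (vel[i][d]
                           + c1 * r1 * (pbest[i][d] - pos[i][d])
                           + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] += vel[i][d]
            if objective(pos[i]) < objective(pbest[i]):
                pbest[i] = pos[i][:]
                if objective(pbest[i]) < objective(gbest):
                    gbest = pbest[i][:]
    return gbest, start


best, start_value = pso(sphere)
```

With c1 = c2 = 2.05 the factor is roughly 0.73. The inertia-weight variant would replace the multiplication by chi with a weight w applied only to the previous velocity.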

Almost all of the papers in the literature describe a substantial software implementation. In

addition, this is supported by the presentation of results of the application of the method in one or

more test cases. The results obtained are measured against manual results but unfortunately, the

absence of a common definition of the various problems and of widely accepted benchmarks

prevents the comparison of the algorithms among each other. The computational complexity of

the proposed systems is determined only through computing time. However comparisons are

difficult as hardware varies from case to case. Furthermore, there seems to be a substantial gap

between theoretical discussion and the application of software to test cases on the one hand, and

effective, realistic timetables that can be used in everyday operations on the other. Therefore, a

timetable generator that is to be practical and effective needs to be flexible enough

to overcome these problems.



Chapter 3

TECHNIQUES

3.1 Techniques Applied to the Timetabling Problem

A timetabling problem can be defined as the scheduling of a certain number of lectures, which

are to be attended by a specific group of students and given by a teacher, over a definite period of

time. Each lecture requires certain resources that are limited in number and must fulfill certain

specific requirements. In particular, building a timetable automatically is extremely difficult

because of the diversity of constraints that must be taken into account.

The most usual methods for solving this problem are inherited from operations research (OR), such as

graph coloring and mathematical programming, or from Genetic Algorithms [9]. These well-known

and widely used methods have given good results. However, OR-inherited methods generally

lack flexibility (i.e. modifying the data may make it necessary to reconsider the initial

model); moreover, it is difficult to find a model that includes all the constraints. For local

search methods (where most of the constraints are put in the objective function) or for Genetic

Algorithms (where the constraints are active in the fitness function), the user frequently obtains

solutions by tuning rather than by defining a search strategy dedicated to the problem.

This section divides the related techniques applied to university timetabling problems into five

categories, i.e. constraint-based methods, cluster-based methods, graph-based approaches,

heuristic approaches and genetic approaches. The details of these categories are discussed in the

following sub-sections. There are also many other approaches to timetabling, such as

population-based approaches, meta-heuristic methods, multi-criteria approaches,

hyper-heuristic/self-adaptive approaches, case-based reasoning, knowledge-based and fuzzy-based

approaches.

3.1.1 Constraint-based Methods

Reasoning approaches are considered new methodologies in problem solving. Two types of

reasoning approaches are applied to the UCT problem: Case-Based Reasoning (CBR) [28]

approaches and Constraint-Based Reasoning approaches. CBR

approaches solve new timetabling problems by reusing

previous timetables and previous construction methodology, selected by means of

similarity measures. The big challenge for these approaches is a definition for



Cluster methods were classified as one of the four major approaches by Carter and Laporte

(1996). The idea of the cluster method was first introduced by Desroches et al. (1978). White and

Chan (1979) and White and Haddad (1983) describe cluster methods as

a three-phase approach. In the first phase, the examinations are grouped into

timeslots to construct a feasible timetable. The second phase attempts to reduce second-order

conflicts by considering permutations of timeslots. The third phase aims

to improve the solution quality further by moving particular examinations

between timeslots, for example by employing a hill climbing local search.

3.1.3 Graph-based Approaches

Graph coloring is concerned with coloring the vertices of a given graph using a given number of

colors. Consider the examination timetabling problem. We need to schedule all the

examinations within a limited number of timeslots in such a way that any clashing examinations

(i.e. examinations that have at least one student in common) are scheduled in different timeslots.

This problem can be viewed as a graph coloring model in which the vertices represent the

examinations, the colors represent the timeslots and the edges represent the conflicts between

examinations. The vertices of the graph should be colored using p colors so that no two vertices

connected by an edge are assigned the same color; normally only a limited number

of colors is available.

A definition of the concepts and terms that relate to a graph is given before progressing with the

explanation of this model. An undirected graph G = (V, E) is a representation that consists of a

set of vertices, V = {v1, ..., vn}, and a set of edges, E. If (vi, vj) is an edge in a graph G = (V, E),

then vertex vi is adjacent to vertex vj (Burke et al., 2004a). Figure 3.1 shows the representation of

an undirected graph on the vertex set {v1, v2, v3, v4, v5}.



Figure 3.1 An undirected graph G = (V, E)

Other related definitions are:

The degree of a vertex is the number of edges connected to it. For example, from Figure 3.1,

vertex v1 has a degree of 3.

The chromatic number of a graph is the minimum number of colors necessary to color the

vertices, so that no two vertices connected by an edge are both assigned the same color.

For a better understanding about the relationship between the graph coloring problem and the

timetabling problem, an example of course timetabling is presented in Figure 3.2.

From Figure 3.2, we can see that there are five different courses, coded A, B, C, D and E. One

possible goal is to find the minimum number of timeslots needed to schedule the five

courses. The edges represent clashes between courses: if there is an edge between two vertices,

the corresponding courses cannot be scheduled in the same timeslot. In our example, course A

cannot be scheduled at the same time as courses B and C, course B cannot be scheduled at the

same time as courses A and D, and so on. Clearly 3 colors (timeslots) are needed to schedule this

problem. Courses A and E could be colored red, course D could be colored yellow, course B could

be colored blue and course C could be colored yellow or blue. The colors correspond to

timeslots. The graph coloring problem is concerned with finding the chromatic number of a

graph (which is the minimum number of colors required to color the graph). From the graph in

Figure 3.2, it is easy to see that the chromatic number is 3.

Figure 3.2 A graph model for a simple course timetabling problem

A variety of graph coloring based heuristics for constructing a clash-free timetable is available in

the literature.
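As an illustration of such heuristics, the following minimal sketch colors a clash graph greedily in largest-degree-first order. Figure 3.2 is not reproduced here, so the edge list EDGES is an assumption: it was chosen to agree with the clashes and the example coloring described above (A clashes with B and C, B with A and D, and so on). The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed clash list, consistent with the text but not taken from Figure 3.2.
EDGES: List[Tuple[str, str]] = [
    ("A", "B"), ("A", "C"), ("B", "D"), ("B", "E"), ("D", "E"),
]


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from a list of clashing pairs."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def greedy_coloring(adjacency: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign each vertex the smallest color not used by its neighbors.

    Vertices are processed in order of decreasing degree (ties broken by
    name), i.e. the most constrained courses are scheduled first.
    """
    order = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    colors: Dict[str, int] = {}
    for vertex in order:
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


graph = build_graph(EDGES)
timeslots = greedy_coloring(graph)  # course -> timeslot index
```

A greedy ordering always yields a clash-free assignment, but in general it does not guarantee that the number of colors used equals the chromatic number.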

3.1.4 Heuristic Approach

Heuristic-based approaches use heuristic concepts and heuristic search to construct

solutions for many problems and give good results. Over recent decades, a

large body of literature has appeared on heuristic approaches to timetabling problems.

Some heuristic approaches employ

heuristic ordering, where a heuristic is used to measure the difficulty of scheduling a particular

course and to resolve conflicts with other courses [2]. These approaches order the courses using

heuristics and then assign them sequentially to suitable time slots, so that the courses in each

period are free of conflicts with each other.

Heuristic optimization methods explicitly aim at good feasible solutions, which may not be

optimal, where the complexity of the problem or the limited time available does not allow an

exact solution. Generally, two questions arise: (i) How fast is the solution computed? and (ii) How

close is the solution to the optimal one? A tradeoff is often required between time and quality.

It is handled by running simpler algorithms more than once, comparing the results with those of

more complicated algorithms, and comparing the effectiveness of different heuristics. Heuristic

methods are evaluated empirically because analysing the problem's worst-case result is difficult.

In its simplest form, the scheduling task consists of mapping class, teacher and room

combinations (which have already been pre-allocated) onto time slots.

One possible approach is as follows. We define a tuple as a particular combination of identifiers,

such as class, teacher and room, which is supplied as an input to the problem [2]. The problem

now becomes one of mapping tuples onto period slots such that tuples which occupy the same

period slot are disjoint (have no identifiers in common). If tuples are assigned arbitrarily to

periods, then in anything but the most trivial cases a number of clashes will exist. We can use

the number of clashes in a timetable as an objective measure of the quality of the schedule. Thus,

we adopt the number of clashes as the cost of any given schedule. It is simple to measure the cost

of a schedule. For each period of the week, we count the number of occurrences of

each class, teacher and room identifier; an identifier that occurs more than once in a period is a

clash. The cost of the entire timetable is the sum of the

individual costs. This procedure is discussed in more detail in Abramson [21].

The proposed algorithm helps to solve the timetabling problem while giving importance to teacher

availability. It uses a heuristic approach to give a general solution to the school timetabling

problem. It takes as user input the number of subjects, the number of teachers, the subjects each

teacher takes, the number of days in a week for which the timetable needs to be set, the number of

time slots in a day and the maximum number of lectures a teacher can conduct in a week. It

initially uses a randomly generated subject sequence to make a temporary timetable. While

generating this sequence, care is taken to avoid repetition of subjects within a day. After this, the

teacher availability for each of the subjects allocated to the respective slot is checked. Every time a

teacher is available for the subject at the allocated slot, the subject and the teacher are entered

into the output data structure and marked as final. Before the subject is allocated to the

output data structure, a check is also made on the maximum number of lectures the teacher

can conduct. If the teacher has been allocated more than the allowed maximum number of

lectures, the subject is moved into a Clash data structure.

To avoid cycling and to improve the search, this

variable selection criterion can be randomized. Several methods [22] can be

applied, e.g.: a random walk technique (with a given probability p, a random variable is selected);

selecting not the worst variable but a random variable that is bad enough (e.g., from the top N

worst variables); or selecting a variable according to a probability based on the above-mentioned

criteria (e.g., roulette wheel selection).
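The clash-count cost described above can be sketched as follows. This is only an illustration, not the implementation of Abramson [21]: the function name, the dictionary layout of the schedule and the sample data are hypothetical, and the sketch assumes that an identifier occurring n times in one period contributes n - 1 clashes.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A tuple is (class, teacher, room); all identifiers below are made up.
LessonTuple = Tuple[str, str, str]


def timetable_cost(schedule: Dict[int, List[LessonTuple]]) -> int:
    """Total number of clashes over all periods.

    Assumption: within one period, an identifier that occurs n times
    counts as n - 1 clashes (zero if it occurs once).
    """
    total = 0
    for tuples in schedule.values():
        for position in range(3):  # 0 = class, 1 = teacher, 2 = room
            counts = Counter(t[position] for t in tuples)
            total += sum(n - 1 for n in counts.values() if n > 1)
    return total


schedule = {
    0: [("C1", "T1", "R1"), ("C2", "T1", "R2")],  # teacher T1 double-booked
    1: [("C1", "T2", "R1"), ("C2", "T3", "R1")],  # room R1 double-booked
    2: [("C1", "T1", "R1"), ("C2", "T2", "R2")],  # no clash
}

cost = timetable_cost(schedule)  # 1 + 1 + 0 = 2
```

A schedule with cost 0 maps every tuple to a period in which all tuples are disjoint, which is the feasibility condition stated in the text.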

The main advantage of these orderings is that they are easy to implement. After the courses have

been ordered, a variety of approaches can be used to choose the best time slot for each course.

The disadvantages of the simulated annealing (SA) method are that it needs a long time to reach

good solutions and that some parameters must be supplied with care [16]. Another meta-heuristic

used to solve the timetabling problem is

the Tabu Search (TS) method, which remembers the features of prior solutions to avoid visiting

them again. This reduces the search space and gets results relatively quickly.

3.1.5 Genetic approaches

Genetic Searching (GS) algorithms are another meta-heuristic approach, employed to

obtain high quality timetables. Many papers in the literature apply

genetic algorithms to timetabling problems, such as [28].

In general, a genetic searching method starts by producing randomized timetables, which form

the parent population for the timetabling problem. Each generated timetable is then converted

to a consistent timetable by eliminating courses that conflict with other courses. Some

initial timetables may be empty, with no courses scheduled. Next, a selection criterion is

applied to choose the timetables that are used to produce a new parent population using genetic

operators [6]. This operation is repeated until the produced solution contains all scheduled courses

and the soft constraints are satisfied to the maximum degree. The general genetic algorithm is as

follows:

Create a Random Initial State

An initial population is created from a random selection of solutions (which are analogous to

chromosomes).

Evaluate Fitness

A value for fitness is assigned to each solution (chromosome) depending on how close it actually

is to solving the problem (thus arriving at the answer to the desired problem). (These solutions are

not to be confused with answers to the problem; think of them as possible characteristics that the

system would employ in order to reach the answer.)

Reproduce (& Children Mutate)

Those chromosomes with a higher fitness value are more likely to produce offspring (which

can mutate after reproduction). The offspring is a product of the father and mother, whose

composition consists of a combination of genes from them (this process is known as crossing

over).

Next Generation

If the new generation contains a solution that produces an output that is close enough or equal to

the desired answer then the problem has been solved. If this is not the case, then the new

generation will go through the same process as their parents did. This will continue until a

solution is reached.
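The four steps above can be sketched as a small program. This is a minimal illustration under stated assumptions, not the algorithm of any of the cited papers: a chromosome is taken to be a list giving the timeslot of each course, fitness is expressed as a cost (the number of clashing pairs sharing a slot), and the clash list, tournament selection, single-point crossover, elitism and all parameter values are hypothetical choices.

```python
import random
from typing import List, Tuple

# Hypothetical problem data: pairs of course indices that must not share a slot.
CLASHES: List[Tuple[int, int]] = [(0, 1), (0, 2), (1, 3), (1, 4), (3, 4)]
NUM_COURSES = 5
NUM_SLOTS = 3


def cost(chromosome: List[int]) -> int:
    """Evaluate fitness as a cost: clashing pairs in the same slot (0 is ideal)."""
    return sum(1 for a, b in CLASHES if chromosome[a] == chromosome[b])


def tournament(population, rng, k=3):
    """Select the lowest-cost individual among k random contenders."""
    return min(rng.sample(population, k), key=cost)


def crossover(mother, father, rng):
    """Single-point crossover: a prefix from one parent, the rest from the other."""
    point = rng.randrange(1, NUM_COURSES)
    return mother[:point] + father[point:]


def mutate(child, rng, rate=0.1):
    """Move each course to a random slot with probability `rate`."""
    return [rng.randrange(NUM_SLOTS) if rng.random() < rate else gene
            for gene in child]


def evolve(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Create a random initial state.
    population = [[rng.randrange(NUM_SLOTS) for _ in range(NUM_COURSES)]
                  for _ in range(pop_size)]
    history = []  # best cost seen at the start of each generation
    for _ in range(generations):
        best = min(population, key=cost)
        history.append(cost(best))
        if cost(best) == 0:          # a clash-free timetable has been found
            break
        next_gen = [best]            # elitism: keep the best unchanged
        while len(next_gen) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            next_gen.append(mutate(child, rng))
        population = next_gen        # next generation
    return min(population, key=cost), history


best_timetable, cost_history = evolve()
```

Because the best individual is carried over unchanged, the best cost never gets worse from one generation to the next; without elitism a good timetable could be lost through crossover or mutation.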



Chapter 4

EXPERIMENTAL PROCEDURES

This chapter describes the experimental procedures followed and processing parameters

selected in the present study.

4.1 Methodology

Making a class schedule is an NP-complete problem. The problem can be solved

using a heuristic search algorithm or a genetic algorithm, but this only works for

simple cases. For more complex inputs and requirements, finding a reasonably good solution can

take a while, or it may be impossible. In this dissertation work we use the improved K-means

clustering algorithm and decision tree techniques for solving the timetabling problem.

4.2 Research Design

The thesis work was carried out in a number of stages, from problem selection to a

literature review of the state of the art in automated timetable generation on the

Java platform. Most of the time was spent on identifying and selecting the problem and on the

literature review. Selecting the optimization algorithms and understanding how they work also

took a lot of time. We divided the overall research into four stages, as shown in Figure 4.1 below:

Figure 4.1 Research Methodology (stages: Problem Identification & Selection; Literature Review; Select Appropriate Algorithm Toolbox; Optimization Results)

In this research work we used the improved K-means clustering algorithm to cluster the data set. Clustering means finding groups of objects such that the objects in one group are similar to one another and different from the objects in other groups. The traditional K-means algorithm is a widely used clustering algorithm with a wide range of applications. The improved K-means clustering algorithm starts from an analysis of the advantages and disadvantages of the traditional K-means clustering algorithm and improves it in two respects: the choice of the initial cluster centers and the determination of the K value. Simulation experiments indicate that the improved clustering algorithm is not only more stable during the clustering process but also reduces, or even avoids, the impact of noise data in the data set, so that the final clustering result is more accurate and effective.

We also used the decision tree technique of data mining to classify the clustered data set. A decision tree is a flow-chart-like tree structure in which each internal node is drawn as a rectangle and each leaf node is drawn as an oval. Every internal node has two or more child nodes and contains a split, which tests the value of an expression of the attributes. The arcs from an internal node to its children are labeled with the distinct outcomes of the test. Each leaf node has a class label associated with it.
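
The structure just described can be sketched in Java as follows. This is a minimal illustration only: the attribute names (`experience`, `prefersMorning`), the outcome labels and the class labels are hypothetical and are not taken from the timetable database, and the tree is built by hand rather than learned from data.

```java
import java.util.HashMap;
import java.util.Map;

public class DecisionTreeSketch {
    // A node is either an internal split or a leaf carrying a class label.
    static abstract class Node {
        abstract String classify(Map<String, String> record);
    }

    // Leaf node: holds the class label.
    static class Leaf extends Node {
        final String label;
        Leaf(String label) { this.label = label; }
        String classify(Map<String, String> record) { return label; }
    }

    // Internal node: tests one attribute; each arc is labeled with one outcome.
    static class Split extends Node {
        final String attribute;
        final Map<String, Node> children = new HashMap<>();
        Split(String attribute) { this.attribute = attribute; }

        Split arc(String outcome, Node child) {
            children.put(outcome, child);
            return this;
        }

        String classify(Map<String, String> record) {
            Node child = children.get(record.get(attribute));
            if (child == null)
                throw new IllegalArgumentException("no arc for " + attribute);
            return child.classify(record);
        }
    }

    // Hypothetical hand-built tree used only for illustration.
    static Node exampleTree() {
        return new Split("experience")
            .arc("senior", new Split("prefersMorning")
                .arc("yes", new Leaf("morning-slot"))
                .arc("no", new Leaf("afternoon-slot")))
            .arc("junior", new Leaf("remaining-slot"));
    }

    public static void main(String[] args) {
        Map<String, String> teacher = new HashMap<>();
        teacher.put("experience", "senior");
        teacher.put("prefersMorning", "no");
        System.out.println(exampleTree().classify(teacher)); // prints "afternoon-slot"
    }
}
```

Classifying a record means following, from the root, the arc whose label matches the record's value for the tested attribute until a leaf is reached.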

This dissertation work is implemented on the Java platform. Java is a computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere" (WORA), meaning that code that runs on one platform does not need to be recompiled to run on another. Java applications are typically compiled to bytecode (class files) that can run on any Java virtual machine (JVM) regardless of computer architecture. Java is, as of 2014, one of the most popular programming languages in use. The main steps of this research work are the following:

1. Dynamically/manually create the database.

2. Connect the database with the Java NetBeans IDE.

3. Preprocess the data set with the clustering algorithm.

4. Classify the data set clusters using a decision tree.

5. Discover knowledge from the classified data and generate the timetable schedule.

4.2.1 Dynamically/manually create the database

In this dissertation work, we first create the database that holds teacher registrations, student registrations, the subjects of the seven semesters of B.Tech Computer Science, teacher attendance, the room numbers of the college and the time slots. The database is named timetable scheduling and has 13 tables in total. The teacher table holds all the information about the teachers. A teacher's information is entered into this table during the registration procedure, and a photograph of the teacher is also taken at the time of registration. The student table contains the registration information of the students; a student's information is entered into it when a new student takes admission in the college or during registration. The tables semesterfirst, semestersecond, semesterthird, semesterforth, semesterfifth, semestersixth and semesterseventh hold the subjects of the respective semesters. The attendance table records the presence or absence of teachers, and the timetable table holds the timeslots of the college.

4.2.2 Connect the database with Java NetBeans IDE

After creating the database we need to connect it with Java NetBeans. We connect the database with Java NetBeans so that we can enter data into the database dynamically through the user interface and fetch data from the database through the same interface when required. Through this connection, the teachers' registration information, the students' registration information, the attendance of teachers and so on are entered into the database dynamically, and any information can be fetched from the database when required. Using this connection we fetch the data from the database and schedule the timetable for the seven semesters of the course.

4.2.3 Preprocess the data set with clustering algorithm

After entering the data into the database, we need to preprocess the data set with a clustering algorithm. In this dissertation work we use the improved K-means clustering algorithm to create the clusters for teachers, subjects, rooms and timeslots. We use the improved K-means clustering algorithm instead of the traditional K-means algorithm because it has a number of advantages over K-means and overcomes several of its disadvantages. The K-means and improved K-means algorithms are described in the following sections.

4.2.3.1 K-means algorithm

The K-means clustering algorithm was proposed by J. B. MacQueen in 1967 to deal with the problem of data clustering. Because the algorithm is relatively simple, it has had a wide influence on scientific research and industrial applications [30]. It is a partitioning method: taking K as a parameter, it divides n objects into K clusters so that similarity within a cluster is relatively high and similarity between clusters is relatively low, while minimizing the total distance between the values in each cluster and the cluster center. The center of each cluster is the mean value of the objects in that cluster, and similarity is calculated with respect to this mean value. The similarity measure used by the algorithm is the reciprocal of the Euclidean distance.

a) Procedure of K-means Algorithm

1. Distribute all objects to K different clusters at random;

2. Calculate the mean value of each cluster, and use this mean value to represent the cluster;

3. Re-distribute each object to the closest cluster according to its distance to the cluster centers;

4. Update the mean value of each cluster, that is, recalculate the mean value of the objects in each cluster;

5. Calculate the criterion function E, and repeat steps 3 to 5 until the criterion function converges.

Usually, the K-means algorithm adopts the squared-error criterion as its criterion function, defined as:

$$E = \sum_{j=1}^{K} \sum_{\substack{i=1 \\ x_i \in C_j}}^{n} \lVert x_i - m_j \rVert^2 \qquad (1)$$

Here E is the total squared error of all the objects in the data set, xi is a data object belonging to cluster Cj, and mj is the mean value of cluster Cj (x and m are both multi-dimensional). The purpose of this criterion is to make the generated clusters as compact and as well separated as possible.
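
The procedure and the criterion function E can be sketched in Java as follows. This is a minimal illustration rather than the code used in this work. Two details are assumptions made for the example: the random initial distribution is done by shuffling the objects and dealing them out round-robin so that no cluster starts empty, and a cluster that becomes empty during iteration simply keeps its previous mean. Squared Euclidean distance is used for the comparison, which gives the same nearest cluster as the Euclidean distance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }

    // Recompute the mean of every cluster; an empty cluster keeps its old mean.
    static void updateMeans(double[][] x, int[] assign, double[][] means) {
        int k = means.length, dim = x[0].length;
        double[][] sum = new double[k][dim];
        int[] count = new int[k];
        for (int i = 0; i < x.length; i++) {
            count[assign[i]]++;
            for (int d = 0; d < dim; d++) sum[assign[i]][d] += x[i][d];
        }
        for (int j = 0; j < k; j++)
            if (count[j] > 0)
                for (int d = 0; d < dim; d++) means[j][d] = sum[j][d] / count[j];
    }

    // Criterion function E: total squared error over all clusters.
    static double criterion(double[][] x, int[] assign, double[][] means) {
        double e = 0;
        for (int i = 0; i < x.length; i++) e += squaredDistance(x[i], means[assign[i]]);
        return e;
    }

    // Returns, for each object, the index of the cluster it ends up in.
    static int[] cluster(double[][] x, int k, long seed) {
        int n = x.length;
        int[] assign = new int[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed));
        for (int pos = 0; pos < n; pos++) assign[order.get(pos)] = pos % k; // step 1
        double[][] means = new double[k][x[0].length];
        updateMeans(x, assign, means);                                       // step 2
        double previous = Double.POSITIVE_INFINITY;
        double e = criterion(x, assign, means);
        while (e < previous) {                         // step 5: stop when E no longer decreases
            previous = e;
            for (int i = 0; i < n; i++) {              // step 3: move to the closest center
                int nearest = 0;
                for (int j = 1; j < k; j++)
                    if (squaredDistance(x[i], means[j]) < squaredDistance(x[i], means[nearest]))
                        nearest = j;
                assign[i] = nearest;
            }
            updateMeans(x, assign, means);             // step 4
            e = criterion(x, assign, means);
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10} };
        int[] assign = cluster(data, 2, 7L);
        for (int i = 0; i < data.length; i++)
            System.out.println("object " + i + " -> cluster " + assign[i]);
    }
}
```

On the six sample points in `main`, the three points near the origin end up in one cluster and the three points near (10, 10) in the other; which of the two gets index 0 depends on the seed.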

b) Analysis of the Performance of K-means Algorithm

Advantages:

1. The K-means algorithm is a classic algorithm for solving clustering problems; it is relatively simple and fast.

2. For large data collections the algorithm is relatively scalable and efficient, because its complexity is O(ntk), where n is the number of objects, k is the number of clusters and t is the number of iterations. Usually k ≪ n and t ≪ n. The algorithm usually ends in a local optimum.

3. Because it relies on the Euclidean distance, it can only process numerical values, for which the results have a good geometrical and statistical meaning.

Disadvantages:


The inherent properties of the K-means clustering algorithm determine its limitations, which are as follows:

1. The K value is the most important parameter of the K-means clustering algorithm, yet there is no general rule for choosing K (the number of clusters to generate). The algorithm is also sensitive to the initial values: different initial values may generate different clusters.

2. The K-means clustering algorithm depends strongly on the initial cluster centers. If the initial cluster centers are far away from the true cluster centers of the data, the number of iterations can become very large, and the final clustering is more likely to fall into a local optimum, resulting in incorrect clustering results.

3. The K-means clustering algorithm is very sensitive to noise data objects. If the data set contains a certain amount of noise data, the noise affects the final clustering results and leads to errors.

4. The K-means clustering algorithm has great difficulty discovering clusters of arbitrary shape.

5. The K-means clustering algorithm is limited by the amount of data. In every iteration the cluster to which each data object belongs must be adjusted and the cluster centers recomputed, so for very large amounts of data the algorithm becomes expensive to apply.

4.2.3.2 Research Directions for the K-means Clustering Algorithm

Research on the K-means clustering algorithm mainly follows two directions.

First, the determination of the K value. As the above analysis shows, the K value, which fixes the number of initial cluster centers, has a far-reaching impact on the whole clustering process and on the final clustering results, while in practical applications it is very difficult to determine K directly or in a single attempt [30]. In particular, when the amount of pending data is very large, determining the K value becomes very difficult. At present, two approaches to determining the K value are relatively effective: the cost function based on distance, and the propagation clustering algorithm based on nearest neighbors. The former finds the minimum of the cost function and thereby obtains the corresponding K value. The latter uses a nearest-neighbor clustering algorithm to calculate an appropriate number of cluster centers; this number serves as the maximum K value for the K-means clustering algorithm, from which the optimal value of K is obtained.

Second, the choice of the initial cluster centers. The K-means clustering algorithm solves the problem iteratively: after the first step, each iteration must improve the clustering result to some extent; otherwise the iteration terminates. The traditional K-means clustering algorithm uses the squared error of the clusters, and whether the criterion function value still changes, as its termination condition. Because the search always moves in the direction that decreases the criterion function value, the clustering results obtained with this criterion function easily fall into a local minimum [31].

Accordingly, the improvement of the K-means algorithm is mainly reflected in the following two aspects:

Optimize the initial cluster centers: find a set of data points that reflects the characteristics of the data distribution and use it as the initial cluster centers, so as to support the division of the data to the greatest extent.

Optimize the calculation of the cluster centers and of the distance from the data points to the cluster centers, so that it better matches the goal of clustering.

4.2.3.3 Improved K-means Clustering Algorithm

a) Related Concepts

Definition 1 The distance between data points and the cluster center. The distance between a data point xi and a cluster center kj is defined as follows [5]:

(2)

where w represents the number of attributes of the data point xi.

Definition 2 The density parameter. The density parameter of a data point is the number of data points contained in a given scope. The scope is a circle that takes a not-yet-counted data point xi as its center and a given value as its radius. The greater the density around xi, the greater the value of the density parameter.

Definition 3 The core data points. If the y-neighborhood of a data point contains at least PTS_min data points, then the data point is called a core data point.

Definition 4 The cluster center. Unlike the traditional cluster center adjustment, the improved clustering algorithm adds the weight of each data point to the calculation of the cluster center. Data points near the cluster center receive larger weights, whereas data points far from the cluster center receive smaller weights. The formula of the cluster center is defined as follows:

(3)

where j represents the jth cluster, h is the number of data points in the cluster, and djh represents the distance between the hth data point belonging to the cluster and the cluster center, with the distances dj1, dj2, ..., djh taken in sorted order.

Definition 5 The Euclidean distance between data points and the cluster center. The distance between a data point and the cluster centers determines the cluster to which the data point belongs. The formula of the Euclidean distance is defined as follows:

(4)

where j denotes the jth cluster cj, i denotes the ith data point xi, and dji is the Euclidean distance between data point xi and the cluster center cj. The remaining two quantities in the formula are the squared error of the cluster cj and the sum of the squared errors of the K clusters.

b) Improved K-means Algorithm Description

Algorithm 1: Improved K-means Algorithm

Input: data set X containing n data points; the number of clusters k.

Output: k clusters that satisfy the convergence of the criterion function.

Program process:

Step 1 Initialize the cluster centers.

Step 1.1 Select a data point xi from data set X, mark it as counted, and compute the distance between xi and every other data point in X. Each time a distance meets the distance threshold, mark that data point as counted and add 1 to the density value of xi.

Step 1.2 Select a data point that has not yet been marked as counted, mark it as counted and compute its density value. Repeat Step 1.2 until all the data points in X have been marked as counted.

Step 1.3 Select the data points whose density value is greater than the threshold and add them to the high-density area set D.

Step 1.4 From the high-density area set D, select the data point with relatively high density and add it to the initial cluster center set. Then find the remaining k-1 data points such that the distances among the k initial cluster centers are as large as possible.

Step 2 Assign the n data points of data set X to the closest cluster.

Step 3 Adjust each of the K cluster centers by formula (3).

Step 4 Calculate the distance of each data object from each cluster center by formula (4), and redistribute the n data points to the corresponding clusters.

Step 5 Adjust each of the K cluster centers by formula (3).

Step 6 Calculate the criterion function E using formula (1) and check for convergence. If E has converged, the algorithm ends; otherwise, return to Step 4.
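
Step 1 can be sketched in Java as follows. This is one possible reading under stated assumptions, not the implementation used in this work: the parameters `radius` and `minDensity` are hypothetical names for the distance threshold and the density threshold, the first center is taken to be the densest point in D, and the remaining k-1 centers are chosen greedily so that each new center is as far as possible from the centers already chosen (a farthest-first heuristic that stands in for "the distances among the k initial cluster centers are the largest").

```java
import java.util.ArrayList;
import java.util.List;

public class DensityInitSketch {
    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return Math.sqrt(s);
    }

    // Steps 1.1-1.2: density of a point = number of other points within the radius.
    static int[] densities(double[][] x, double radius) {
        int[] density = new int[x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x.length; j++)
                if (i != j && distance(x[i], x[j]) <= radius) density[i]++;
        return density;
    }

    // Steps 1.3-1.4: returns the indices of k initial cluster centers.
    static List<Integer> initialCenters(double[][] x, int k, double radius, int minDensity) {
        int[] density = densities(x, radius);
        List<Integer> dense = new ArrayList<>();       // high-density area set D
        for (int i = 0; i < x.length; i++)
            if (density[i] > minDensity) dense.add(i);
        if (dense.size() < k)
            throw new IllegalArgumentException("fewer than k high-density points");

        List<Integer> centers = new ArrayList<>();
        int first = dense.get(0);                      // densest point in D (first one on ties)
        for (int i : dense) if (density[i] > density[first]) first = i;
        centers.add(first);

        while (centers.size() < k) {                   // farthest-first choice (assumption)
            int best = -1;
            double bestGap = -1;
            for (int i : dense) {
                if (centers.contains(i)) continue;
                double gap = Double.MAX_VALUE;         // distance to the nearest chosen center
                for (int c : centers) gap = Math.min(gap, distance(x[i], x[c]));
                if (gap > bestGap) { bestGap = gap; best = i; }
            }
            centers.add(best);
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] data = {
            {0, 0}, {0, 1}, {1, 0}, {1, 1},
            {10, 10}, {10, 11}, {11, 10}, {11, 11},
            {50, 50}                                   // isolated noise point
        };
        System.out.println(initialCenters(data, 2, 1.5, 2));
    }
}
```

In the sample data the isolated point at (50, 50) has no neighbours within the radius, so it never enters D and cannot be chosen as an initial center, which illustrates how the density filter reduces the influence of noise data.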



4.2.4 Classify the data set clusters using Decision Tree

After forming the clusters of the data set, we need to classify them so that we can assign teachers to courses, subjects to teachers and classrooms to classes without any clash. In this dissertation work we classify the data set clusters using the decision tree technique. With the decision tree technique we also try to satisfy the soft constraints on the timetable schedule, such as assigning subjects to teachers according to their choice, giving preference to more experienced teachers, and assigning classrooms and timeslots according to the teachers' choice.

4.2.4.1 Classification

Classification consists of examining the features of a newly presented object and assigning it to a predefined class. The classification task is characterized by well-defined classes and a training set consisting of pre-classified examples. The task is to build a model that can be applied to unclassified data in order to classify it. Examples of classification tasks include:

Classification of credit applicants as low, medium or high risk

Classification of mushrooms as edible or poisonous

Determination of which home telephone lines are used for internet access

Predictive modeling can sometimes, though not necessarily desirably, be seen as a black box that makes predictions about the future based on information from the past and present. Some models are better than others in terms of accuracy, and some are better than others in terms of understandability; for example, the models range from easy to understand to incomprehensible (in order of understandability): decision trees, rule induction, regression models, and neural networks. Classification is one kind of predictive modeling. More specifically, classification is the
