Predictive Job Scheduling in a Connection Limited System Using Parallel Genetic Algorithm (Synopsis)

7/27/2019 Predictive Job Scheduling in a Connection Limited System Using Parallel Genetic Algorithm (Synopsis)

INTRODUCTION

Most job-scheduling approaches for parallel machines apply space sharing, which means allocating CPUs/nodes to jobs in a dedicated manner and sharing the machine among multiple jobs by allocating them to different subsets of nodes. Some approaches apply time sharing (or, more precisely, a combination of time and space sharing), i.e., they use multiple time slices per CPU/node. Job scheduling determines when and where to execute each job, given a stream of parallel jobs and a set of computing resources. In a standard working model, when a parallel job arrives in the system, the scheduler tries to allocate the required number of processors to the job for the duration of its runtime and, if they are available, starts the job immediately. If the requested processors are currently unavailable, the job is queued and scheduled to start at a later time. The most commonly evaluated metrics include system metrics, such as system utilization and throughput, and user metrics, such as turnaround time and wait time. The typical charging model is based on the total amount of resources a job uses (resources × runtime).
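
The standard working model described above can be sketched as a strict first-come-first-served space-sharing scheduler. This is a minimal sketch under stated assumptions: integer time units, jobs listed in arrival order, no backfilling, and a hypothetical class SpaceSharingFcfs. It returns each job's start time. From that, the wait time (start - arrival), the turnaround time (start + runtime - arrival) and the charge (processors × runtime) follow directly.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch: strict FCFS space sharing on a machine with a fixed number of processors. */
class SpaceSharingFcfs {

    /**
     * Returns the start time of each job. Jobs must be listed in arrival order;
     * arrival[i], procs[i] and runtime[i] describe job i.
     */
    static int[] startTimes(int totalProcs, int[] arrival, int[] procs, int[] runtime) {
        int[] start = new int[arrival.length];
        // Running jobs as {endTime, processors}, earliest end time first.
        PriorityQueue<int[]> running = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        int free = totalProcs;
        int clock = 0;
        for (int i = 0; i < arrival.length; i++) {
            if (procs[i] > totalProcs) {
                throw new IllegalArgumentException("job " + i + " requests more processors than exist");
            }
            // Strict FCFS: a job starts no earlier than its arrival or its predecessor's start.
            clock = Math.max(clock, arrival[i]);
            // Release the processors of jobs that have already finished.
            while (!running.isEmpty() && running.peek()[0] <= clock) {
                free += running.poll()[1];
            }
            // Requested processors unavailable: the job waits for running jobs to finish.
            while (free < procs[i]) {
                int[] finished = running.poll();
                clock = finished[0];
                free += finished[1];
            }
            start[i] = clock;
            free -= procs[i];
            running.add(new int[] {clock + runtime[i], procs[i]});
        }
        return start;
    }

    /** Charge under the "resources x runtime" model. */
    static long charge(int procs, int runtime) {
        return (long) procs * runtime;
    }
}
```

Take a 4-processor machine with jobs (arrival, processors, runtime) = (0, 2, 5), (1, 4, 3) and (2, 1, 2). For these, startTimes returns 0, 5 and 8, which gives wait times of 0, 4 and 6. The third job waits behind the second even though two processors are idle at time 2. Backfilling variants of FCFS are designed to reclaim exactly this kind of idle capacity.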


Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"


Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using techniques such as classification, association rule mining, and clustering. Data mining is a complex topic with links to multiple core fields such as computer science, and it adds value to seminal computational techniques from statistics, information retrieval, machine learning, and pattern recognition.

Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

o Massive data collection

o Powerful multiprocessor computers

o Data mining algorithms

Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by the second quarter of 1996.[1] In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.


Overview of the System

There are two main types of scheduling: system-level scheduling and application-level scheduling. The scheduling system analyzes the load on every node and selects one node to run the job. The scheduling policy aims to optimize the overall performance of the whole system. If the system is heavily loaded, the scheduling system has to balance the load and increase throughput and resource utilization under restricted conditions. This kind of scheduling is known as system-level scheduling.
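
The node-selection step of system-level scheduling can be illustrated with a least-loaded policy. This sketch rests on assumptions the synopsis does not state. A node's load is taken to be its number of running jobs divided by the number of jobs it can run at once. The class LeastLoadedNode and its select method are hypothetical names.

```java
/** Hypothetical sketch of system-level node selection by least relative load. */
class LeastLoadedNode {

    /**
     * Returns the index of the node with the lowest running/capacity ratio,
     * or -1 if every node is already running its maximum number of jobs.
     */
    static int select(int[] running, int[] capacity) {
        int best = -1;
        double bestLoad = Double.POSITIVE_INFINITY;
        for (int i = 0; i < running.length; i++) {
            if (capacity[i] <= 0 || running[i] >= capacity[i]) {
                continue; // node has no free job slot
            }
            double load = (double) running[i] / capacity[i];
            if (load < bestLoad) { // strict '<' keeps the lowest index on ties
                bestLoad = load;
                best = i;
            }
        }
        return best;
    }
}
```

With running = {3, 1, 2} and capacity = {4, 2, 8}, the relative loads are 0.75, 0.5 and 0.25, so node 2 is selected. If every node is full, the method returns -1 and the job stays queued.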

If multiple jobs arrive within a unit scheduling time slot, the scheduling system should allocate an appropriate number of jobs to every node in order to finish these jobs under a defined objective. Typically, the objective is the minimal average execution time. This scheduling policy is application-oriented, so we call it application-level scheduling.
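
As an illustration of application-level scheduling, suppose the nodes are identical and every job's runtime is known. Suppose also that "average execution time" is read as the average completion time of the batch. Under those assumptions, a classical optimal policy is the shortest-processing-time-first (SPT) rule, dealing the sorted jobs to the nodes in round-robin order. The class name SptBatch is hypothetical, and the synopsis itself does not prescribe this rule.

```java
import java.util.Arrays;

/** Hypothetical sketch: SPT allocation of a batch of jobs to identical nodes. */
class SptBatch {

    /** Returns the average completion time when jobs are dealt round-robin in SPT order. */
    static double averageCompletion(int[] runtimes, int nodes) {
        if (runtimes.length == 0) {
            return 0.0;
        }
        int[] sorted = runtimes.clone();
        Arrays.sort(sorted); // shortest job first
        long[] finish = new long[nodes]; // current finish time of each node
        long totalCompletion = 0;
        for (int k = 0; k < sorted.length; k++) {
            int node = k % nodes; // deal the sorted jobs round-robin
            finish[node] += sorted[k];
            totalCompletion += finish[node]; // completion time of this job
        }
        return (double) totalCompletion / sorted.length;
    }
}
```

For runtimes 4, 1, 3 and 2 on two nodes, the jobs complete at 1, 2, 4 and 6, so the sketch returns 3.25.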

A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. They are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover (also called recombination).


Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes, the genotype, or the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, and multiple individuals are stochastically selected from the current population (based on their fitness) and modified (recombined and possibly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.
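
The fitness-based stochastic selection step can be sketched with fitness-proportionate ("roulette-wheel") selection, which is one common choice. The text above does not prescribe a particular scheme, and the class name RouletteSelection is hypothetical. The sketch assumes non-negative fitness values.

```java
import java.util.Random;

/** Hypothetical sketch of fitness-proportionate (roulette-wheel) selection. */
class RouletteSelection {

    /** Returns an index chosen with probability proportional to its non-negative fitness. */
    static int select(double[] fitness, Random rng) {
        double total = 0.0;
        for (double f : fitness) {
            if (f < 0) {
                throw new IllegalArgumentException("fitness must be non-negative");
            }
            total += f;
        }
        if (total == 0.0) {
            return rng.nextInt(fitness.length); // all fitness values zero: pick uniformly
        }
        double spin = rng.nextDouble() * total; // a random point on the "wheel"
        double cumulative = 0.0;
        for (int i = 0; i < fitness.length; i++) {
            cumulative += fitness[i];
            if (spin < cumulative) {
                return i;
            }
        }
        // Only reached if rounding pushes spin up to the total: return the last positive entry.
        for (int i = fitness.length - 1; i >= 0; i--) {
            if (fitness[i] > 0) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable: total > 0 implies a positive entry");
    }
}
```

In a population with fitness {1, 3}, the second individual is picked about three times as often as the first. Individuals with zero fitness are never picked unless all fitness values are zero.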

A typical genetic algorithm requires two things to be defined:

1. a genetic representation of the solution domain,

2. a fitness function to evaluate the solution domain.


A standard representation of the solution is an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates a simple crossover operation. Variable-length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming, and free-form representations are explored in HBGA (human-based genetic algorithms).
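
Both parents have the same fixed length, so crossover reduces to cutting them at the same position and swapping the tails. Here is a minimal sketch, assuming a boolean[] genome and a hypothetical class BitStringOperators. One-point crossover and independent bit-flip mutation are common operators, but their use here is an assumption.

```java
import java.util.Random;

/** Hypothetical sketch of variation operators on fixed-length bit strings. */
class BitStringOperators {

    /** One-point crossover: the two children swap tails after a random cut point. */
    static boolean[][] onePointCrossover(boolean[] a, boolean[] b, Random rng) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("parents must have equal length >= 2");
        }
        int cut = 1 + rng.nextInt(a.length - 1); // cut point in 1..length-1
        boolean[] c1 = a.clone();
        boolean[] c2 = b.clone();
        for (int i = cut; i < a.length; i++) {
            c1[i] = b[i];
            c2[i] = a[i];
        }
        return new boolean[][] {c1, c2};
    }

    /** Bit-flip mutation: each bit flips independently with probability rate. */
    static boolean[] mutate(boolean[] genome, double rate, Random rng) {
        boolean[] child = genome.clone();
        for (int i = 0; i < child.length; i++) {
            if (rng.nextDouble() < rate) {
                child[i] = !child[i];
            }
        }
        return child;
    }
}
```

Crossing an all-0 parent with an all-1 parent always yields two complementary children whose first bit comes from their "own" parent and whose last bit comes from the other parent.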

The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem-dependent. For instance, in the knapsack problem we want to maximize the total value of objects that we can put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the size of the objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of the values of all objects in the knapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
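
The knapsack fitness function translates directly into code. This sketch follows the text exactly: it returns the sum of values if the total size fits, and 0 otherwise. The class name KnapsackFitness and the example data are hypothetical.

```java
/** Hypothetical sketch of the knapsack fitness function described above. */
class KnapsackFitness {

    /**
     * genome[i] says whether object i is in the knapsack. Returns the total value of the
     * chosen objects, or 0 if their total size exceeds the capacity (an invalid solution).
     */
    static int fitness(boolean[] genome, int[] values, int[] sizes, int capacity) {
        int totalValue = 0;
        int totalSize = 0;
        for (int i = 0; i < genome.length; i++) {
            if (genome[i]) {
                totalValue += values[i];
                totalSize += sizes[i];
            }
        }
        return totalSize <= capacity ? totalValue : 0;
    }
}
```

Consider values {60, 100, 120}, sizes {10, 20, 30} and capacity 50. Choosing the second and third objects scores 220, while choosing all three exceeds the capacity and scores 0.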


Once the genetic representation and the fitness function are defined, a GA initializes a population of solutions randomly and then improves it through repeated application of the mutation, crossover, and selection operators.
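
Putting representation, fitness and operators together gives the complete generational loop. It starts with random initialization, then repeats selection, crossover and mutation until either a maximum number of generations is reached or a satisfactory fitness is found. The sketch uses the toy OneMax problem (maximize the number of 1 bits) so that it stays self-contained. Binary tournament selection, one-point crossover, bit-flip mutation and keeping the best individual (elitism) are illustrative choices that the text does not prescribe, and OneMaxGa is a hypothetical name.

```java
import java.util.Random;

/** Hypothetical sketch: a generational GA on OneMax (maximize the number of 1 bits). */
class OneMaxGa {

    static int fitness(boolean[] genome) {
        int ones = 0;
        for (boolean bit : genome) {
            if (bit) ones++;
        }
        return ones;
    }

    /** Binary tournament: the fitter of two randomly chosen individuals. */
    static boolean[] tournament(boolean[][] pop, Random rng) {
        boolean[] a = pop[rng.nextInt(pop.length)];
        boolean[] b = pop[rng.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    /** Runs the GA and returns the best individual found. */
    static boolean[] run(int length, int popSize, int maxGenerations, double mutationRate, long seed) {
        Random rng = new Random(seed);
        // 1. Random initial population.
        boolean[][] pop = new boolean[popSize][length];
        for (boolean[] g : pop) {
            for (int i = 0; i < length; i++) g[i] = rng.nextBoolean();
        }
        boolean[] best = pop[0];
        for (boolean[] g : pop) {
            if (fitness(g) > fitness(best)) best = g;
        }
        // Stop at the generation limit or once the optimum (all 1s) is reached.
        for (int gen = 0; gen < maxGenerations && fitness(best) < length; gen++) {
            boolean[][] next = new boolean[popSize][];
            next[0] = best.clone(); // elitism: keep the best individual unchanged
            for (int k = 1; k < popSize; k++) {
                // 2. Selection.
                boolean[] p1 = tournament(pop, rng);
                boolean[] p2 = tournament(pop, rng);
                // 3. One-point crossover.
                int cut = rng.nextInt(length + 1);
                boolean[] child = new boolean[length];
                for (int i = 0; i < length; i++) child[i] = i < cut ? p1[i] : p2[i];
                // 4. Bit-flip mutation.
                for (int i = 0; i < length; i++) {
                    if (rng.nextDouble() < mutationRate) child[i] = !child[i];
                }
                next[k] = child;
            }
            pop = next;
            for (boolean[] g : pop) {
                if (fitness(g) > fitness(best)) best = g;
            }
        }
        return best;
    }
}
```

The best individual is copied unchanged into each new generation, so the best fitness never decreases from one generation to the next. The loop stops early once all bits are 1, which is the "satisfactory fitness" condition for OneMax.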


Abstract

Job scheduling is a key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. Intelligence is the key factor lacking in today's job-scheduling techniques. Genetic algorithms are powerful search techniques based on the mechanisms of natural selection and natural genetics.

Multiple jobs are handled by the scheduler, and the resources the jobs need are in remote locations. Here we assume that the resources a job needs are in one location rather than split over nodes, and that each node holding a resource runs a fixed number of jobs.

The existing algorithms are non-predictive and employ greedy algorithms or variants of them. The efficiency of the job-scheduling process would increase if previous experience and genetic algorithms were used.

In this paper, we propose a model of the scheduling algorithm in which the scheduler learns from previous experience, so that job scheduling becomes more effective as time progresses.
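
The synopsis does not say how the scheduler learns from previous experience. One simple possibility, shown here purely as an assumption, is to keep an exponentially weighted moving average of each node's observed load. That average then serves as the prediction for the next scheduling slot. The class LoadPredictor and the smoothing factor alpha are hypothetical.

```java
/** Hypothetical sketch: per-node load prediction by exponential smoothing. */
class LoadPredictor {
    private final double alpha;       // weight of the newest observation, 0 < alpha <= 1
    private final double[] predicted; // current prediction for each node
    private final boolean[] seen;     // whether each node has been observed yet

    LoadPredictor(int nodes, double alpha) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
        this.predicted = new double[nodes];
        this.seen = new boolean[nodes];
    }

    /** Records the load observed on a node during the last scheduling slot. */
    void observe(int node, double load) {
        if (!seen[node]) {
            predicted[node] = load; // the first observation initializes the estimate
            seen[node] = true;
        } else {
            predicted[node] = alpha * load + (1 - alpha) * predicted[node];
        }
    }

    /** Predicted load of a node for the next slot (0 if it has never been observed). */
    double predict(int node) {
        return predicted[node];
    }

    /** Node with the lowest predicted load; ties go to the lowest index. */
    int leastLoaded() {
        int best = 0;
        for (int i = 1; i < predicted.length; i++) {
            if (predicted[i] < predicted[best]) best = i;
        }
        return best;
    }
}
```

With alpha = 0.5, observing loads 4 and then 2 on a node gives a prediction of 3. A larger alpha makes the predictor react faster to recent changes, while a smaller one smooths out short spikes.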


Description of Problem

Similar systems that are already available are non-predictive and employ greedy algorithms or variants of them. That is, the existing systems do not predict the situation in advance, so jobs cannot be scheduled across the network in a way that utilizes resources at the optimal level. The problem is to reduce the processing overhead during scheduling.

The proposed system works on data transfer between computers (PCs) of two different networks.

Existing Method

Data mining algorithms can be categorized as follows:

o Association algorithms

o Classification

o Clustering algorithms

Classification:

The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to specific variable(s) you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with values "Good" and "Bad."

Clustering:

The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.
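
Clustering can be illustrated with k-means, a widely used clustering algorithm that the synopsis itself does not name. This minimal sketch measures squared Euclidean distance over all variables of each point. It alternately assigns each point to its nearest centre and moves each centre to the mean of its members, stopping when no point changes cluster. The class KMeans and the choice of the first k points as initial centres are assumptions.

```java
import java.util.Arrays;

/** Hypothetical sketch of k-means clustering, measuring distance over all variables. */
class KMeans {

    /** Returns the cluster index (0..k-1) of each point. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("need 1 <= k <= number of points");
        }
        int n = points.length;
        int dim = points[0].length;
        // Assumption: the first k points serve as the initial centres.
        double[][] centres = new double[k][];
        for (int c = 0; c < k; c++) centres[c] = points[c].clone();
        int[] label = new int[n];
        Arrays.fill(label, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centre (squared Euclidean distance).
            for (int p = 0; p < n; p++) {
                int nearest = 0;
                double nearestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = 0.0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[p][j] - centres[c][j];
                        d += diff * diff;
                    }
                    if (d < nearestDist) {
                        nearestDist = d;
                        nearest = c;
                    }
                }
                if (label[p] != nearest) {
                    label[p] = nearest;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched cluster
            // Update step: move every centre to the mean of its members.
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int p = 0; p < n; p++) {
                count[label[p]]++;
                for (int j = 0; j < dim; j++) sum[label[p]][j] += points[p][j];
            }
            for (int c = 0; c < k; c++) {
                if (count[c] == 0) continue; // an empty cluster keeps its old centre
                for (int j = 0; j < dim; j++) centres[c][j] = sum[c][j] / count[c];
            }
        }
        return label;
    }
}
```

Take the points (1, 1), (8, 8), (1.2, 0.8), (8.5, 7.5) and (0.9, 1.1) with k = 2. The three points near (1, 1) end up in one cluster and the two near (8, 8) in the other.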

Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly and quickly from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.


Proposed System

Job scheduling is a key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. In the proposed system, a genetic algorithm is applied across the systems in the network, and jobs are scheduled according to the predicted load.
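
The synopsis does not give the encoding or fitness function of its genetic algorithm. As an assumed illustration of scheduling jobs according to the predicted load, the sketch below encodes a schedule as an array in which gene j is the node assigned to job j. It minimizes the predicted makespan, meaning the finish time of the busiest node once each node's predicted existing load is counted. Binary tournament selection, uniform crossover, random-reset mutation, elitism, and the names JobAssignmentGa, makespan, fittest and run are hypothetical choices.

```java
import java.util.Random;

/** Hypothetical sketch: a GA that assigns jobs to nodes so as to minimize predicted makespan. */
class JobAssignmentGa {

    /** Finish time of the busiest node: its predicted existing load plus its assigned runtimes. */
    static double makespan(int[] assign, double[] runtime, double[] predictedLoad) {
        double[] finish = predictedLoad.clone();
        for (int job = 0; job < assign.length; job++) {
            finish[assign[job]] += runtime[job];
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double f : finish) max = Math.max(max, f);
        return max;
    }

    /** Binary tournament: of two random individuals, keep the one with the smaller makespan. */
    static int[] tournament(int[][] pop, double[] runtime, double[] load, Random rng) {
        int[] a = pop[rng.nextInt(pop.length)];
        int[] b = pop[rng.nextInt(pop.length)];
        return makespan(a, runtime, load) <= makespan(b, runtime, load) ? a : b;
    }

    /** The individual with the smallest makespan in a population. */
    static int[] fittest(int[][] pop, double[] runtime, double[] load) {
        int[] best = pop[0];
        for (int[] g : pop) {
            if (makespan(g, runtime, load) < makespan(best, runtime, load)) best = g;
        }
        return best;
    }

    /** Evolves job-to-node assignments; gene j of an individual is the node that runs job j. */
    static int[] run(double[] runtime, double[] predictedLoad, int popSize, int generations, long seed) {
        Random rng = new Random(seed);
        int jobs = runtime.length;
        int nodes = predictedLoad.length;
        int[][] pop = new int[popSize][jobs];
        for (int[] g : pop) {
            for (int j = 0; j < jobs; j++) g[j] = rng.nextInt(nodes); // random initial schedules
        }
        int[] best = fittest(pop, runtime, predictedLoad);
        for (int gen = 0; gen < generations; gen++) {
            int[][] next = new int[popSize][];
            next[0] = best.clone(); // elitism: never lose the best schedule found so far
            for (int k = 1; k < popSize; k++) {
                int[] p1 = tournament(pop, runtime, predictedLoad, rng);
                int[] p2 = tournament(pop, runtime, predictedLoad, rng);
                int[] child = new int[jobs];
                for (int j = 0; j < jobs; j++) {
                    child[j] = rng.nextBoolean() ? p1[j] : p2[j]; // uniform crossover
                    if (rng.nextDouble() < 1.0 / jobs) {
                        child[j] = rng.nextInt(nodes); // random-reset mutation
                    }
                }
                next[k] = child;
            }
            pop = next;
            best = fittest(pop, runtime, predictedLoad);
        }
        return best;
    }
}
```

For runtimes {4, 3, 3, 2, 2} on two idle nodes, the best possible makespan is 7 (for example {4, 3} and {3, 2, 2}). On an instance this small the GA is expected to find such an assignment. A greedy or FCFS placement can serve as the baseline when the two approaches are compared.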

Here the system takes care of scheduling the data packets between the source and destination computers.

o Job scheduling is used to route the packets at all the ports of the router.

o A queue of data packets is maintained, and the scheduling algorithm is applied to it.

o First Come First Serve (FCFS) scheduling and genetic algorithm scheduling are invoked for the source and destination.

o The two algorithms are compared.


Hardware specifications:

Processor : Intel Processor IV

RAM : 128 MB

Hard disk : 20 GB

CD drive : 40 x Samsung

Floppy drive : 1.44 MB

Monitor : 15 Samtron color

Keyboard : 108 mercury keyboard

Mouse : Logitech mouse

Software specifications:

Operating System : Windows XP/2000

Language used : J2SDK 1.4.0 (Java), JCreator IDE


Module Design

Simulated Model:

The simulated network model is constructed by grouping computers into Network 0 and Network 1. A router is placed between the two networks, and data from one network flows through it to the other network.

First Come First Serve Algorithm:

The packet transfer between the networks is implemented using the FCFS algorithm.
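
FCFS service at a router port amounts to transmitting queued packets strictly in arrival order. Here is a minimal sketch. It assumes each packet has a known arrival time and transmission time, and that the port sends one packet at a time. The class FcfsPort and its nested Packet type are hypothetical.

```java
import java.util.Queue;

/** Hypothetical sketch: FCFS service of packets at one router output port. */
class FcfsPort {

    /** A packet with an arrival time and the time needed to transmit it. */
    static final class Packet {
        final double arrival;
        final double transmitTime;

        Packet(double arrival, double transmitTime) {
            this.arrival = arrival;
            this.transmitTime = transmitTime;
        }
    }

    /**
     * Serves packets strictly in queue order (the queue must already be sorted by arrival)
     * and returns each packet's waiting time, in the same order. The queue is emptied.
     */
    static double[] waitingTimes(Queue<Packet> queue) {
        double[] wait = new double[queue.size()];
        double portFreeAt = 0.0; // time at which the port finishes its current packet
        int k = 0;
        while (!queue.isEmpty()) {
            Packet p = queue.poll(); // head of the queue = earliest arrival
            double start = Math.max(p.arrival, portFreeAt);
            wait[k++] = start - p.arrival;
            portFreeAt = start + p.transmitTime;
        }
        return wait;
    }
}
```

Suppose packets arrive at times 0, 1, 2 and 10 with transmission times 3, 2, 1 and 1. The waiting times are then 0, 2, 3 and 0, for an average of 1.25. An average of this kind is one possible measure for comparing FCFS with the genetic algorithm.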

Genetic Algorithm:

The packet transfer between the networks is implemented using a genetic algorithm. The algorithm details are discussed in the proposed system design.

Projecting Result and Comparison:

The data transfer between the source and destination networks is shown by drawing the path between the source and the destination. To draw the path, the points across the network are also collected. The results of the two algorithms are displayed to the user in a separate frame so that the efficiency of the genetic algorithm can be seen.
