Predictive Job Scheduling in a Connection Limited System Using Parallel Genetic Algorithm (Synopsis)

  • 7/27/2019 Predictive Job Scheduling in a Connection Limited System Using Parallel Genetic Algorithm (Synopsis)

    Predictive Job Scheduling in a Connection Limited System using Parallel Genetic Algorithm (Synopsis)

    INTRODUCTION

    Most job-scheduling approaches for parallel machines apply space sharing, which means allocating CPUs/nodes to jobs in a dedicated manner and sharing the machine among multiple jobs by allocating them to different subsets of nodes. Some approaches apply time sharing (or, more precisely, a combination of time and space sharing), i.e. they use multiple time slices per CPU/node. Given a stream of parallel jobs and a set of computing resources, job scheduling determines when and where to execute each job. In a standard working model, when a parallel job arrives at the system, the scheduler tries to allocate the required number of processors to the job for the duration of its runtime and, if they are available, starts the job immediately. If the requested processors are currently unavailable, the job is queued and scheduled to start at a later time. The most commonly evaluated metrics include system metrics such as system utilization and throughput, and user metrics such as turnaround time and wait time. The typical charging model is based on the total amount of resources used by a job (resources $\times$ runtime).
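
The standard working model described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Job` record, the function name `simulate_fcfs`, and the strict first-come-first-served queue order are illustrative choices, not details taken from the synopsis.

```python
import heapq
from collections import namedtuple

# Hypothetical job record: arrival time, processors requested, runtime.
Job = namedtuple("Job", "name arrival procs runtime")

def simulate_fcfs(jobs, total_procs):
    """Space sharing with a strict FCFS queue: each job holds its
    processors for its whole runtime; a job that cannot start waits."""
    free = total_procs
    running = []   # min-heap of (finish_time, procs) for running jobs
    results = {}
    clock = 0
    for job in sorted(jobs, key=lambda j: j.arrival):
        if job.procs > total_procs:
            raise ValueError("job requests more processors than exist")
        clock = max(clock, job.arrival)
        # Release processors of jobs that have finished by now.
        while running and running[0][0] <= clock:
            free += heapq.heappop(running)[1]
        # Not enough free processors: wait for running jobs to finish.
        while free < job.procs:
            finish, procs = heapq.heappop(running)
            clock = max(clock, finish)
            free += procs
        free -= job.procs
        heapq.heappush(running, (clock + job.runtime, job.procs))
        results[job.name] = {
            "wait": clock - job.arrival,
            "turnaround": clock + job.runtime - job.arrival,
            "charge": job.procs * job.runtime,  # resources x runtime
        }
    return results
```

Each result entry carries the user metrics (wait and turnaround time) and the charge computed as processors multiplied by runtime.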

    Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

    Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"

    Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining and clustering. Data mining is a complex topic with links to multiple core fields such as computer science, and it adds value to rich seminal computational techniques from statistics, information retrieval, machine learning and pattern recognition.

    Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

    o Massive data collection
    o Powerful multiprocessor computers
    o Data mining algorithms

    Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by the second quarter of 1996 [1]. In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.

    Overview of the System

    There are two main types of scheduling: system-level scheduling and application-level scheduling. In system-level scheduling, the scheduling system analyzes the load on every node and selects one node to run the job. The scheduling policy aims to optimize the overall performance of the whole system. If the system is heavily loaded, the scheduling system has to balance the load and increase throughput and resource utilization under restricted conditions.
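
A minimal sketch of system-level node selection follows. It assumes that each node's load is summarized by a single number and that "least loaded" is the selection rule; the names `pick_node` and `dispatch` are hypothetical and the synopsis does not prescribe a specific rule.

```python
def pick_node(loads):
    """Return the name of the node with the lowest current load.
    `loads` maps a node name to a load figure (e.g. queued work)."""
    return min(loads, key=loads.get)

def dispatch(jobs, loads):
    """Send each (job, cost) pair to the least-loaded node and
    add the job's cost to that node's load."""
    loads = dict(loads)  # work on a copy, leave the caller's dict alone
    placement = {}
    for job, cost in jobs:
        node = pick_node(loads)
        loads[node] += cost
        placement[job] = node
    return placement, loads
```

When several nodes share the lowest load, `min` returns the first one in the dictionary's iteration order, which is one arbitrary but deterministic tie-break.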

    If multiple jobs arrive within a unit scheduling time slot, the scheduling system allocates an appropriate number of jobs to every node so that these jobs finish under a defined objective. The objective is usually to minimize the average execution time. This scheduling policy is application-oriented, so we call it application-level scheduling.
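
One way to picture application-level scheduling is a batch allocation that serves the shortest jobs first on whichever node becomes free earliest. This is a sketch under assumptions: runtimes are known estimates, nodes are identical, the batch is non-empty, and the function name `allocate_batch` is hypothetical. The synopsis does not state which allocation rule the authors use.

```python
import heapq

def allocate_batch(runtimes, node_count):
    """Assign jobs (name -> estimated runtime) to identical nodes,
    shortest job first, always on the node that is free earliest.
    Returns the plan and the average completion time."""
    nodes = [(0.0, n) for n in range(node_count)]  # (time node is free, node id)
    heapq.heapify(nodes)
    plan = {}
    total_completion = 0.0
    for name, runtime in sorted(runtimes.items(), key=lambda kv: kv[1]):
        free_at, node = heapq.heappop(nodes)
        finish = free_at + runtime
        plan[name] = (node, finish)
        total_completion += finish
        heapq.heappush(nodes, (finish, node))
    return plan, total_completion / len(runtimes)
```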

    A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. They are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover (also called recombination).

    Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes, the genotype, or the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated, and multiple individuals are stochastically selected from the current population (based on their fitness) and modified (recombined and possibly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm has terminated because it reached the maximum number of generations, a satisfactory solution may or may not have been reached.
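
The generational loop described above can be sketched as follows. The function name `run_ga`, the parameter defaults, binary tournament selection, single-point crossover and keeping the best individual unchanged (elitism) are all illustrative assumptions rather than details from the synopsis. The chromosome length is assumed to be at least 2.

```python
import random

def run_ga(fitness, length, pop_size=30, max_generations=200,
           target=None, mutation_rate=0.02, seed=0):
    """Evolve bit strings; stop at `target` fitness or after
    `max_generations`. Returns (best individual, generations used)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select():
        # Binary tournament: pick two at random, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for generation in range(max_generations):
        if target is not None and fitness(best) >= target:
            return best, generation          # satisfactory fitness reached
        next_population = [best[:]]          # elitism (an added assumption)
        while len(next_population) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, length)   # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]       # bit-flip mutation
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best, max_generations             # generation limit reached
```

For example, `run_ga(sum, 20, target=20)` searches for the all-ones string of length 20; if the generation limit is hit first, the returned individual is simply the best one found.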

    A typical genetic algorithm requires two things to be defined:

    1. a genetic representation of the solution domain,
    2. a fitness function to evaluate the solution domain.

    A standard representation of the solution is an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates a simple crossover operation. Variable-length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming, and free-form representations are explored in human-based genetic algorithms (HBGA).
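
Because fixed-size arrays align position by position, crossover can be a pair of slices. The sketch below shows single-point crossover; the function name and the optional `rng` parameter are illustrative, and parents are assumed to have at least two genes.

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length parents at a random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    cut = rng.randrange(1, len(parent_a))  # cut strictly inside the array
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2
```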

    The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem we want to maximize the total value of objects that we can put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the total size of the selected objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of the values of all objects in the knapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
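
The knapsack fitness rule above translates almost directly into code. The function and parameter names here are illustrative.

```python
def knapsack_fitness(bits, values, sizes, capacity):
    """Sum of the values of the selected objects, or 0 if the
    selection does not fit in the knapsack."""
    chosen = [i for i, bit in enumerate(bits) if bit == 1]
    if sum(sizes[i] for i in chosen) > capacity:
        return 0  # invalid representation: capacity exceeded
    return sum(values[i] for i in chosen)
```

With values `[60, 100, 120]`, sizes `[10, 20, 30]` and capacity 50, the bit array `[0, 1, 1]` is valid and scores 220, while `[1, 1, 1]` exceeds the capacity and scores 0.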

    Once the genetic representation and the fitness function are defined, the GA initializes a population of solutions randomly, then improves it through repeated application of the mutation, crossover, and selection operators.

    Abstract

    Job scheduling is the key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. Intelligence is the key factor lacking in today's job scheduling techniques. Genetic algorithms are powerful search techniques based on the mechanisms of natural selection and natural genetics.

    The scheduler handles multiple jobs, and the resources a job needs are in remote locations. Here we assume that the resources a job needs are in a single location, not split over nodes, and that each node that has a resource runs a fixed number of jobs.

    The existing algorithms are non-predictive and employ greedy algorithms or variants of them. The efficiency of the job scheduling process would increase if previous experience and genetic algorithms were used.

    In this paper, we propose a model of the scheduling algorithm in which the scheduler learns from previous experience, so that job scheduling becomes more effective as time progresses.

    Description of Problem

    The similar systems already available are non-predictive and employ greedy algorithms or variants of them. That is, the existing systems do not predict the situation in advance, so jobs in the network cannot be scheduled in a way that uses the resources at the optimal level. The problem is to reduce the processing overhead during scheduling.

    The proposed system deals with data transfer between computers on two different networks.

    Existing Method

    Data mining algorithms can be categorized as follows:

    o Association algorithms
    o Classification
    o Clustering algorithms

    Classification:

    The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to specific variable(s) you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with values "Good" and "Bad."

    Clustering:

    The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.
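
As an illustration of clustering with distance measured over all variables, the sketch below uses k-means with squared Euclidean distance. The synopsis does not name a clustering method, so k-means, the function name `kmeans`, and the caller-supplied starting centroids are assumptions made for this example.

```python
def kmeans(points, centroids, iterations=10):
    """Group points (tuples of numbers) around the nearest centroid,
    then move each centroid to the mean of its group; repeat."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance over every coordinate.
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[i])),
            )
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups
```

The groups are mutually exclusive because every point is assigned to exactly one centroid in each pass.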

    Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

    Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

    Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
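
The retail example of products that are often purchased together can be sketched as a simple pair count over shopping baskets. The function name `frequent_pairs`, the `min_support` threshold and the sample data in the usage note are illustrative assumptions. Full association rule mining involves more than this counting step.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Count how many baskets contain each pair of products and keep
    the pairs seen in at least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

For four made-up baskets in which beer and diapers appear together three times and every other pair appears twice, a threshold of 3 keeps only the pair `("beer", "diapers")`.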

    Proposed System

    Job scheduling is the key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. The genetic algorithm, a popular technique, is used in the systems across the network, and jobs are scheduled according to the predicted load.

    Here the system takes care of scheduling data packets between the source and destination computers:

    o Job scheduling routes the packets at all the ports of the router.
    o A queue of data packets is maintained and the scheduling algorithm is applied to it.
    o First Come First Serve scheduling and genetic algorithm scheduling are invoked for the source and destination.
    o A comparison of the two algorithms is shown.

    Hardware specifications:

    Processor    : Intel Processor IV
    RAM          : 128 MB
    Hard disk    : 20 GB
    CD drive     : 40 x Samsung
    Floppy drive : 1.44 MB
    Monitor      : 15 Samtron color
    Keyboard     : 108 mercury keyboard
    Mouse        : Logitech mouse

    Software Specification

    Operating System : Windows XP/2000
    Language used    : J2sdk1.4.0, JCreator

    Module Design

    Simulated Model :

    The simulated network model is constructed by grouping computers into Network 0 and Network 1. A router is placed between the two networks, and data flows from one network to the other through it.

    First Come First Serve Algorithm:

    Packet transfer between the networks is implemented using the FCFS algorithm.

    Genetic Algorithm:

    Packet transfer between the networks is implemented using the genetic algorithm. The algorithm details are discussed in the proposed system design.

    Projecting Result and Comparison:

    The data transfer between the source and destination networks is shown by drawing the path between source and destination. To draw the path, the points across the network are also collected. The results of the two algorithms are displayed to the user in a separate frame so that the efficiency of the genetic algorithm can be seen.
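
A comparison of FCFS and genetic algorithm scheduling can be sketched on a toy router queue. Everything specific here is an assumption made for illustration, not the synopsis's design: all packets are taken to be queued at once on a single output link, each packet has a known service time, the objective is the mean waiting time, and the GA uses a permutation encoding with order crossover, swap mutation and truncation selection. At least two packets are assumed.

```python
import random

def mean_wait(order, service_times):
    """Mean time packets spend queued when sent in `order`."""
    clock, total = 0, 0
    for idx in order:
        total += clock               # this packet waited until now
        clock += service_times[idx]
    return total / len(order)

def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the remaining positions
    in p2's order, so the child is still a permutation."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[i:j]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]

def ga_schedule(service_times, pop_size=20, generations=100, seed=0):
    """Evolve a sending order; the FCFS order is seeded into the
    population, so the result is never worse than FCFS."""
    rng = random.Random(seed)
    n = len(service_times)
    fcfs = list(range(n))            # arrival order
    cost = lambda order: mean_wait(order, service_times)
    population = [fcfs] + [rng.sample(fcfs, n) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]   # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = order_crossover(p1, p2, rng)
            a, b = rng.sample(range(n), 2)        # swap mutation
            child[a], child[b] = child[b], child[a]
            children.append(child)
        population = survivors + children
    return min(population, key=cost)
```

Calling `mean_wait(list(range(n)), service_times)` gives the FCFS figure, and `mean_wait(ga_schedule(service_times), service_times)` gives the GA figure for the same queue, which is the kind of side-by-side result the comparison module displays.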