
Knowledge Representation and Expert Systems for Mineral Processing using Infobright

Alberto Rui Frutuoso Barroso
Natural Resources Engineering
School of Engineering
Laurentian University
Sudbury, Canada
ar [email protected]

Greg Baiden
Penguin Automated Systems Inc
School of Engineering
Laurentian University
Sudbury, Canada
[email protected]

Julia Johnson
Department of Mathematics and Computer Science
Laurentian University
Sudbury, Canada
[email protected]

Abstract - Open source tools for Knowledge Representation in databases and the implementation of a real time expert system for mineral processing operations (size reduction and enrichment) are discussed. The use of a column-oriented database system (Infobright IEE) to store quantitative data from sensors that measure feed size distribution, feed rate, aeration rate, pulp density, pH and temperature allows low latency database query responses and real time process control and analysis. Qualitative metadata can be generated with the use of mathematical process models (simulation outputs, reduction equations, transforms), and from the natural language analysis of process data (reagents and ore mineralogy). The toolkits Wordnet and the Natural Language Toolkit (NLTK) were used for metadata generation, processing qualitative text information present in process databases, and for generating data for subsequent inference engine rule checking. We took advantage of the power and ease of the programming language Python to implement a framework for fuzzy and rough set rules generation, and to create an on-line-analytical-processing (OLAP) system for reporting production process parameters.

Keywords - Expert Systems; Artificial Intelligence; Infobright; CLIPS; Pyke; Rough Sets; Fuzzy Sets; OLAP cubes

    I. INTRODUCTION

In a real-time expert system used in mineral processing operations (grinding, flotation, cyclone separation), multiple types of input data need to be processed and used to control equipment on the mill production floor. In grinding and flotation units we have vibration sensors detecting operational malfunctions, high speed video cameras and processing units giving dimensional values of rock particles that are exiting the grinding circuits, gamma ray devices giving density values of the liquid-solid mixtures (slurries), pH sensors, and high speed temperature and pressure measurements (pumping circuits, autoclaves, etc.). Such are examples of quantitative data.

In contrast, qualitative data are present in local databases, such as the characteristics of froth products used, the type of rods (or balls) used in the grinders, parameters adjusted with manual procedures, and operators' comments present in the form of short natural language (English) texts containing domain-specific terse abbreviations. There is evidence [1] that fuzzy set decisions help to process uncertain quantitative data from the equipment sensors, with the options to be implemented in local Programmable Logic Controllers (PLCs) or in a centralized system using OPC data links.
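To illustrate the kind of fuzzy set decision referred to above, the following minimal sketch grades a flotation pH reading into overlapping categories. It is not taken from the system described here; the category breakpoints are invented for illustration only.

```python
# Minimal fuzzy membership sketch for a flotation pH reading.
# The category breakpoints are illustrative assumptions, not plant values.

def triangular(x, a, b, c):
    """Triangular membership function peaking at b, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def ph_memberships(ph):
    """Degree of membership of a pH value in three fuzzy categories."""
    return {
        "acidic":   triangular(ph, 0.0, 4.0, 7.0),
        "neutral":  triangular(ph, 6.0, 7.0, 8.0),
        "alkaline": triangular(ph, 7.0, 10.0, 14.0),
    }

if __name__ == "__main__":
    print(ph_memberships(7.5))
```

A PLC-side rule would then act on the category with the highest membership rather than on a crisp threshold.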

In a previous application [2], the qualitative data used to populate the expert system were extracted from local databases and transformed and loaded (ETL operations) into the knowledge database. In conjunction with [3] we conclude that a qualitative knowledge database for mineral processing can be established only after removing imprecise data emerging from the knowledge acquisition phase, allowing subsequent efficient searches by the inference engine (Figure 1 from [4]). Typically the inference engine will parse statements, assign degrees of belief, examine and fire rules, use customized search strategies, provide explanations and justifications, communicate these to users and external programs, and process the problem solving results [4].

NLTK and Wordnet are versatile tools to discard redundant information from the data before storing it in the knowledge database. NLTK and Wordnet are discussed here as precursors of rule-based representation, semantic networks and frames, the three main methods used for knowledge representation in intelligent decision support systems [5].
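The rule-examining and rule-firing cycle described above can be sketched as a minimal forward-chaining loop; the facts and rules below are invented for illustration and are not taken from the production system.

```python
# Minimal forward-chaining inference sketch: repeatedly fire any rule
# whose premises are all satisfied until no new facts can be derived.
# The facts and rules are illustrative only.

def forward_chain(facts, rules):
    """Derive the closure of a fact set under (premises -> conclusion) rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (["high_vibration", "grinder_running"], "possible_bearing_fault"),
    (["possible_bearing_fault"], "raise_alarm"),
]

if __name__ == "__main__":
    print(forward_chain(["high_vibration", "grinder_running"], rules))
```

A real inference engine (CLIPS, Pyke) adds conflict resolution, degrees of belief and explanation facilities on top of this basic cycle.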

    Figure 1. Expert System Components

    2010 IEEE International Conference on Granular Computing

978-0-7695-4161-7/10 $26.00 © 2010 IEEE

    DOI 10.1109/GrC.2010.133



II. COLUMN-BASED DATABASE MANAGEMENT SYSTEMS

Column based database management systems (DBMS) access the database content reading and writing entire columns, allowing fast searches in large databases and data warehouses. Various approaches to column based database management systems have been used in different applications, and some solutions have been tuned to specific problems (Netezza Skimmer [6]). Hybrid software-hardware solutions add the power of a column based architecture with the performance of a SQL query processor implemented in a field programmable gate array (FPGA). Kickfire [7] and Xtremedata [8] are examples of analytic appliances with capacities up to 10 petabytes, and query performance improvements up to 100x (10x minimum).

A. Infobright system

In contrast with hardware based FPGAs, Infobright [9] is a high performance analytic software system designed to handle specific queries on large data sets. Infobright technology [10] combines a column-oriented database with a Knowledge Grid architecture [11] to deliver low waiting times in data analysis. The data are partitioned and physical data structures built with a self-managing structure that eliminates the need for standard database indexes. Infobright provides scalability with solutions up to 50 terabytes using a single server, and the 10:1 (up to 40:1) data compression allows a significant reduction of the storage media of the database while delivering rapid responses to complex queries.

The APIs supported by Infobright are extensive and among them a mineral processing engineer will surely find one that he/she prefers. The APIs include: C, C++, C#, Borland Delphi (via dbExpress), Eiffel, SmallTalk, Java (with a native Java driver implementation), Lisp, Perl, PHP, Python, Ruby, REALbasic, FreeBasic, and Tcl. Infobright supports ANSI SQL-92 with some SQL-99 extensions, standard database interfaces (including ODBC, JDBC and native connections), 500 database users with up to 32 concurrent queries (depending on the number of CPU cores and the amount of memory), and admits a variety of schema designs. Two versions are available: a GPL2-licensed, open source Community Edition (ICE), and a commercially licensed Enterprise Edition (IEE).

ICE, being a self-contained system on PCs, was convenient to use for initial implementation and testing, but deficiencies regarding limited data types were observed for the application at hand. A complete comparison matrix has been provided on the IEE/ICE site. In summary, it is faster to migrate a MySQL database to IEE than to ICE.

Infobright Enterprise Edition (IEE) was obtained for use in this project and others at Laurentian University on the basis of a special academic promotional offer. It was installed on an 8-core, 8 gigabyte RAM server running the Debian Linux operating system. IEE's MySQL pluggable storage engine architecture allowed the database server to be accessed using an SQL client running on Windows.

III. EXTRACT, TRANSFORM AND LOAD TOOLS

Extract, transform, and load (ETL) are functions used in the population of databases, and typically consist in extracting data from outside sources (files or OPC servers), transforming it (deleting, filtering, etc.), and loading it into the target database [10].
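As a concrete illustration of the three steps, the following self-contained sketch extracts rows from a semicolon-separated export, filters out malformed readings, and loads the remainder into an SQLite table standing in for the target database. The field names and values are invented for illustration.

```python
import csv
import io
import sqlite3

# Illustrative raw sensor export; field names and values are invented.
RAW = """timestamp;feed_rate;ph
2010-05-01 10:00;312.5;9.1
2010-05-01 10:01;bad;9.2
2010-05-01 10:02;305.0;9.0
"""

def etl(raw_text, conn):
    # Extract: parse the semicolon-separated export.
    rows = csv.DictReader(io.StringIO(raw_text), delimiter=";")
    # Transform: keep only rows whose measurements parse as numbers.
    clean = []
    for row in rows:
        try:
            clean.append((row["timestamp"], float(row["feed_rate"]), float(row["ph"])))
        except ValueError:
            continue  # drop malformed readings
    # Load: insert the cleaned rows into the target table.
    conn.execute("CREATE TABLE IF NOT EXISTS readings (ts TEXT, feed_rate REAL, ph REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", clean)
    return len(clean)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(etl(RAW, conn))  # 2 of the 3 rows survive the transform step
```

In the framework described here the load target would be an Infobright table rather than SQLite, but the three-phase shape is the same.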

A. How will ETL be used for mineral processing data?

ETL operations are needed for equipment operator comments on error acknowledgments and for log files generated in programmable logic controller (PLC) systems. The PLC that controls a mine hoist or a conveyor transporting the feed (ore) to the mill is one good example of log data generation of parameters, in this case the transported weight, velocity, number of trips, downtime, MTBF, etc.
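For example, mean time between failures (MTBF) can be recovered from such a log with a few lines of Python; the log format below is hypothetical, chosen only to make the computation concrete.

```python
# Compute MTBF from hypothetical conveyor PLC log lines of the form
# "<timestamp> FAULT <code>". The log format is invented for illustration.
from datetime import datetime

LOG = """2010-05-01 08:00 FAULT E21
2010-05-01 14:00 FAULT E07
2010-05-02 02:00 FAULT E21
"""

def mtbf_hours(log_text):
    """Mean time between consecutive FAULT entries, in hours."""
    times = [
        datetime.strptime(line.split(" FAULT")[0], "%Y-%m-%d %H:%M")
        for line in log_text.strip().splitlines()
        if " FAULT " in line
    ]
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

if __name__ == "__main__":
    print(mtbf_hours(LOG))  # (6 + 12) / 2 = 9.0 hours
```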

B. AWK scripts

AWK is a programming language designed for processing text-based data, either in files or data streams, and was created at Bell Labs in the 1970s. The name AWK is derived from the family names of its authors Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK scripts find use for mineral processing data to extract numbers and values from text log files generated in production equipment.

C. Palo ETL server

Key issues of Palo are ETL and cube technology. The open source Palo ETL Server 3.0 [12] is an Extract, Transform and Load software designed for importing and exporting large quantities of data to/from Palo databases. Data are extracted from heterogeneous sources, and master and transaction data are transformed and loaded into Palo models, illustrated by the vertical arrows in figure 2. The Palo ETL Server allows automatic data imports. Established relational databases can be connected as data sources via a standardized interface. This includes Infobright databases used in the expert system framework described pictorially shortly.

Complex transformations and aggregations, for example, models for cyclone circuits and grinder loops, can be represented within a Palo model. Palo ETL Server 3.0 can be operated both from the command line level and, more conveniently, using the ETL web client. Palo uses online analytical processing cube technology for its data structure. Manipulations and analyses of data may be executed from multiple perspectives. The arrangement of data into cubes overcomes a limitation of relational databases that makes them unsuitable for near instantaneous analysis and display.

It is proposed to couple the advantages of column-oriented relational databases regarding compression and the resulting speed enhancements with modeling of instantaneous phenomena found in mineral processing applications afforded by Palo cube technology for transforming between relational and OLAP databases.
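The cube idea can be illustrated with a toy aggregation: measures keyed by several dimensions, rolled up along any one of them. The dimension names and figures below are invented for illustration.

```python
from collections import defaultdict

# Toy OLAP-style roll-up: tonnage measures keyed by (circuit, shift, day)
# dimensions. All names and numbers are invented for illustration.
FACTS = [
    ("grinder_1", "day",   "mon", 120.0),
    ("grinder_1", "night", "mon",  95.0),
    ("flotation", "day",   "mon",  80.0),
    ("grinder_1", "day",   "tue", 110.0),
]

def roll_up(facts, axis):
    """Sum the measure over all dimensions except the chosen axis
    (0 = circuit, 1 = shift, 2 = day)."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[axis]] += fact[3]
    return dict(totals)

if __name__ == "__main__":
    print(roll_up(FACTS, 0))  # totals per circuit
```

A cube server such as Palo pre-structures such roll-ups so that every axis can be queried at near interactive speed.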

    Figure 2. Jedox Palo ETL Server Architecture [12]

IV. METADATA GENERATION

We are showing software to implement in complex mineral processing projects, which have the goal of maximum production efficiency. The use of an efficient expert system will reduce the time to execute optimization cycles. The magnitude of the project dictates that some areas are not developed, and metadata generation is one of them. It suffices here to provide one example of metadata generation from a previous project.

The Dublin Core metadata element set, ISO 15836:2009 (ISO, 2009), is an example of the need to create well structured metadata. The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements (Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights).
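A process record annotated with the 15 DCMES elements might look as follows; only the element names come from the standard, the values are invented for illustration.

```python
# The 15 Simple Dublin Core (DCMES) element names, applied to a
# hypothetical mill log record. The values are illustrative only.
DCMES_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]

def describe_record(**values):
    """Build a DCMES record, leaving unsupplied elements empty."""
    return {element: values.get(element.lower(), "") for element in DCMES_ELEMENTS}

record = describe_record(
    title="Grinder 1 shift log",
    creator="Shift operator",
    date="2010-05-01",
    format="text/plain",
    language="en",
)

if __name__ == "__main__":
    print(record["Title"], len(record))
```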

A. DCMES used for mineral processing metadata

Only at the highest level (level 1) of the DCMES standard can we normalize process metadata. The layered architecture is diagrammed at http://dublincore.org/metadata-basics/. At Level 1, interoperability among applications sharing metadata is based on a shared vocabulary. Participants within an application environment agree upon the terms to use in their metadata and on how those terms are defined. Interoperability with the rest of the world outside of the implementation environment is generally not a priority. Most existing metadata applications operate at level 1. When metadata is automatically generated from raw data present in log servers, compliance with level 1 of the DCMES architecture is under consideration.

Generating metadata from qualitative noisy data, compatible with the Dublin Core qualifiers, requires a powerful language and toolboxes to achieve that objective. Python comes with complete implementations of the Wordnet and NLTK toolboxes and hence provides an excellent replacement for AWK scripts in the task of text extraction.

B. Wordnet and NLTK code example

The following code is an example of word synonym search using Wordnet and Python. The objective is to transform the text to a level 1 compliant form.

    ## ExpertS.py ##
    import Tkinter
    import nltk
    import MySQLdb
    from nltk.book import *
    from MySQLdb import Connect

    words = text1.tokens  # token list taken from the nltk.book corpora

    def GenMeta(palavras):
        from nltk.corpus import wordnet as wn
        conta = 0
        while conta < palavras:  # len(words):
            for synset in wn.synsets(words[conta]):
                print synset.definition
            conta = conta + 1

    palavras = 10
    GenMeta(palavras)
    conn = Connect(host="localhost",
                   user="root", passwd="123456")

C. Field trimming with AWK

PERL and Python are other examples of powerful text processing facilities, but the simplicity of AWK as a Turing-complete programming language allows creating lean code to manipulate text-based data and feed it to our real time processing expert system. The following code is a function available in the ETL Datamelt Toolkit. The objective is to trim characters (e.g. spaces or special characters) from a raw text file. The code is self documenting with comments internal to it.

    #!/usr/bin/awk -f
    # [email protected]
    # http://datamelt.com

    # function to remove blanks on both sides of the string
    function trim(value)
    {
        sub(/^ */, "", value)
        sub(/ *$/, "", value)
        return value
    }

    # begin of processing
    BEGIN {
        # setting the file's field separator
        FS = ";";
        OFS = ";";
    }

    {
        # trim every field of the current record, then print it
        for (i = 1; i <= NF; i++)
            $i = trim($i)
        print
    }


V. BLENDING INFOBRIGHT WITH OPC SERVERS AND DATA MANAGEMENT SYSTEMS

The two essential tasks to be done in a database migration of any sort are: first, export the data from the original source database and, second, import the data into the target database. The syntax for the Infobright MySQL export command is SELECT . . . INTO OUTFILE . . . FROM . . . WHERE . . ..

The Infobright analytical engine has differences when compared with a standard MySQL DBMS:

- Declaration of storage engine type
- Lack of need for indices or partition schemes
- Lack of referential integrity checks
- Removal of constraints
- Minor data type differences (ICE, 2010)
- Supported character sets and collations
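Because of these differences, migrating a MySQL schema typically means rewriting the DDL, chiefly dropping index definitions and declaring Infobright's BRIGHTHOUSE storage engine. A rough sketch of that rewriting follows; the regexes and the example table are invented for illustration and are not a complete converter.

```python
import re

# Rough sketch of DDL rewriting for an Infobright target: drop index
# definitions and switch the storage engine declaration. The regexes and
# the example table are illustrative, not a complete converter.
MYSQL_DDL = """CREATE TABLE readings (
  ts DATETIME,
  feed_rate DOUBLE,
  KEY idx_ts (ts)
) ENGINE=InnoDB;"""

def to_infobright(ddl):
    # remove KEY/INDEX lines (Infobright needs no indexes)
    ddl = re.sub(r"^\s*(KEY|INDEX)\b.*\n", "", ddl, flags=re.MULTILINE)
    # tidy a trailing comma left before the closing parenthesis
    ddl = re.sub(r",\s*\n\)", "\n)", ddl)
    # declare the BRIGHTHOUSE storage engine instead of InnoDB
    return ddl.replace("ENGINE=InnoDB", "ENGINE=BRIGHTHOUSE")

if __name__ == "__main__":
    print(to_infobright(MYSQL_DDL))
```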

A. Integration with OPC client-server systems

Mineral processing function is typically automated with programmable logic controllers (PLCs), which read analog and discrete values from the sensors wired to the I/Os. OPC servers (Figure 3) can be used to map those values to MySQL databases for post-processing [13].

    Figure 3. Dataporter CommServer [14]

B. Integration with PI systems

Invensys Process Engineering Suite (PES), Wonderware, and Osisoft PI systems (figure 4) are examples of data management systems used in mineral processing operations (Xstrata, ValeInco).

Xstrata is a major global diversified mining group. Xstrata Nickel, Sudbury Operation has approximately 900 employees and produces nickel and copper smelter products. The 2008 Sudbury Smelter annual production rates were 64,906 tons nickel-in-concentrate, 17,811 tons copper-in-concentrate and 2,698 tons cobalt-in-concentrate.

Key data items, to name just a few, among those mentioned in Xstrata's 2009 Regional, Divisional and Site Sustainability Reports (published April 2010) follow (their units of measure are parenthesized):

Environmental indicators

- Direct energy use (PJ), Total energy use (PJ), Total water use (ML)
- Direct and total greenhouse gas emissions (both measured in CO2 equivalent million tons)
- Sulphur dioxide stack emissions (tons)
- Oxides of nitrogen stack emissions (tons)
- Total recycling and reuse of water (ML)
- Land disturbed (hectares)
- Land rehabilitated (hectares)

Production indicators and their units of measure

- Ferrochrome (kt)
- Vanadium pentoxide (k lbs)
- Ferrovanadium (k kg)
- Thermal coal (mt)
- Coking coal (mt)
- Semi-soft coking (mt)
- Total coal (mt)
- Total mined copper (contained metal) (kt)
- Total mined gold (contained metal) (koz)
- Nickel (kt)
- Ferronickel (kt)
- Cobalt (kt)
- Zinc in concentrate production (kt)
- Zinc metal production (kt)
- Lead in concentrate production (kt)
- Lead metal production (kt)

Such measured quantities are related in different ways to other items of interest, for example:

- Indirect energy consumption by primary source
- Energy saved due to conservation and efficiency improvements
- NOx, SOx, and other significant air emissions by type and weight
- Total water discharge by quality and destination
- Total weight of waste by type and disposal method
- Total number and volume of significant spills
- Weight of transported, imported, exported, or treated hazardous waste
- Identity, size, protected status, and biodiversity value of water bodies and related habitats significantly affected by discharges of water and runoff
- Extent of impact of initiatives to mitigate environmental impacts of products and services
- Percentage of products sold and their packaging materials that are reclaimed by category
- Value and number of significant fines and non-monetary sanctions for non-compliance with environmental laws and regulations
- Extent of environmental impacts of transporting products and other goods and materials
- Total amount of land owned, leased, and managed for production activities or extractive use; total land disturbed, total land rehabilitated
- The number/percentage of sites identified as requiring biodiversity management plans, and with plans in place
- Percentage of product(s) derived from secondary materials

The Xstrata local operation uses PI systems for operational, event, and real-time data management, recording quantities that eventually feed into national and global reports by extracting data from sensors positioned in production operations. However, users of PI systems require an aid to help them find and evaluate specific data values emitted from sensors and the relationships among their data types. Sensors may either produce or consume data, sometimes switching roles in response to perceived (consumed) inputs from their environment. It is useful to view sensors within a consumer/producer paradigm because the large body of research into data mining in commercial and business applications can be brought to bear upon the staggering knowledge management needs of a mineral processing plant.

    Figure 4. Osisoft PI systems [15]

Recommender systems connect users with items to consume (purchase, view, listen to, etc.) by associating the content of recommended items or the opinions of other individuals with the consuming user's actions or opinions [16]. A sensor in its role as consumer expresses an interest in data from its environment either through its perceptual instrument or by data received from other sensors. Data items from other sensors that might be of interest to a given sensor (the consumer) are recommended based on the sensors on a site that have the most traffic, on certain characteristics of the consumer (e.g. strength of its signal), or on a historical analysis of the past behavior of the consumer as a prediction for future producer/consumer interactions.
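A historical analysis of that kind can be sketched with a simple similarity measure over past consumption: the peer sensor whose history is most similar to the consumer's suggests what to recommend next. The sensor names and histories below are invented for illustration.

```python
# Toy item-recommendation sketch: recommend to a consumer sensor the data
# items consumed by its most similar peer. Names and histories are invented.
HISTORIES = {
    "vibration_1": {"rpm", "bearing_temp", "load"},
    "vibration_2": {"rpm", "bearing_temp", "slurry_density"},
    "camera_1":    {"particle_size", "feed_rate"},
}

def jaccard(a, b):
    """Similarity of two consumption histories (shared over total items)."""
    return len(a & b) / len(a | b)

def recommend(consumer, histories):
    """Items consumed by the most similar peer but not yet by the consumer."""
    own = histories[consumer]
    peers = {name: hist for name, hist in histories.items() if name != consumer}
    best = max(peers, key=lambda name: jaccard(own, peers[name]))
    return sorted(peers[best] - own)

if __name__ == "__main__":
    print(recommend("vibration_1", HISTORIES))
```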

VI. OPEN SOURCE EXPERT SYSTEMS

Expert systems can be implemented from scratch using high level programming languages (Python, Lua, Ruby) and specialized modules for the inference engines (PyFuzzyLib, Pyke). A more desirable approach for engineers is to use a ready-to-populate open source expert system, two possibilities of which are described in the remainder of this section.

A. CLIPS - C Language Integrated Production System

The first versions of CLIPS [17] were developed at the NASA Johnson Space Center in 1984, trying to eliminate the problems of the LISP language. Nowadays CLIPS is a public domain software tool to develop expert systems that supports three different programming paradigms: rule-based, object-oriented and procedural. CLIPS is written in C for portability and speed and interfaces with Python; the procedural programming capabilities provided by CLIPS are similar to capabilities found in languages such as C, Java, Ada, and LISP.

B. D3Web Knowledge System

The d3web system [18] is a Java-based prototyping and development toolkit for distributed knowledge systems. It includes a knowledge modelling environment tool (KnowME), a visual knowledge acquisition tool, and an evaluation & management tool.

D3web offers various problem-solving methods including:

- categorical and heuristic rules
- decision trees and decision tables
- set-covering models
- case-based reasoning

VII. TYING IT ALL TOGETHER

The architecture for a real time expert system in a mineral processing plant is illustrated in figure 5. The implementation requires an OPC server that translates different PLC protocols (Modbus, Profibus, CAN, DeviceNet, etc.) to standard TCP-IP socket connections. The OPC server (figure 4) will populate the Infobright IEE database as a MySQL compatible database. A list of freely available OPC servers is given in [13].

The OPC server (orange box) sends and receives digital and analog signals to PLC and DCS systems. The OPC-Infobright connection (between the orange and blue boxes) is the recommended scenario for a typical mineral processing automation system. In this connection we have Human Machine Interfaces (HMI) and Distributed Computer Systems (DCS) with supervisory control and data acquisition (SCADA) systems. A customized ETL is required for each of these distinct cases. The tools described in [13] are either pre-built or supplied as ready-to-build source code, or both.


Some are evaluation versions and some are downloadable from the Web.

The grey boxes represent the Expert System, which can be implemented using CLIPS, Python (for the connection with the database) and Pyke (for the inference engine), or D3Web. Light yellow illustrates a typical implementation of report generation and OLAP cube creation. Manual and automatic process control, allowing the test and implementation of distinct control strategies (stochastic, heuristic, deterministic, Monte Carlo methods), is placed in the human machine interface block.

In future work the ETL operations for qualitative data will be implemented using Python toolboxes, while the quantitative data will be processed using normal algorithmic calculus implementations.

    Figure 5. Real Time Mineral Processing Expert System

VIII. CONCLUSION

In the current mineral processing plants in Northern Ontario, Canada (e.g., producing copper/nickel matte from sulphidic ores), the level of automation is increasing due to increased demands for productivity and efficiency as well as the need for compliance regarding environmental factors. Hundreds of analog and discrete signals given by sensors and control systems are stored in databases either directly or using real-time data management infrastructures [15]. The implementation of a real time expert system to make use of such data requires low latency database read cycles, and column oriented databases tuned for performance and single variable analysis. Most automation communications systems can be integrated with an OPC server to send and receive data from central or distributed systems. The divide and conquer strategy implemented in automation systems with local PID and fuzzy logic control makes way for distributed artificial intelligence through remote expert systems.

This paper has addressed mineral processing data needs with a focus on the application of data mining and data warehousing techniques. We have provided a framework in which a variety of software packages are put together to support the development and implementation of a real time expert system in a mineral processing plant. The packages are CLIPS, D3web, and Pyke, either freely downloadable or open source available for purchase. It was found that the software for ETL (extract, transform, and load) showed variability among applications, requiring evaluation and selection from a variety of available products. Those products have been enumerated. Additionally, the parameters by which the ETL products should be evaluated have been listed.

REFERENCES

[1] R. K. Brouwer, "Fuzzy rule extraction from a feed forward neural network by training a representative fuzzy neural network using gradient descent," International Journal of Uncertainty, pp. 673-698, December 2005.

[2] J. Johnson and G. Johnson, "Infobright for analyzing social sciences data," Comm. Computer and Information Science: Database Theory and Applications, pp. 90-98, March 2009.

[3] P. Vaillancourt and J. Johnson, "Monitoring network aware sensors using BACnet," IJCNS Int. Journal of Computer Science and Network Security, pp. 15-23, November 2006.

[4] T. Yalcin, Advanced mineral processing, LU-ENGR5207, pp. 144-161, September 2007.

[5] EUNITE roadmap, http://www.eunite.org/eunite/index.htm.

[6] Netezza, Analytic appliance, http://www.netezza.com.

[7] Kickfire, Kickfire's SQL chip, http://www.kickfire.com.

[8] XtremeData, SQL in silicon, http://www.xtremedata.com.

[9] Infobright, Open source data warehousing, http://www.infobright.com.

[10] D. Slezak, J. Wroblewski, V. Eastwood, and P. Synak, "Brighthouse: An analytic data warehouse for ad-hoc queries," PVLDB, pp. 1337-1345, June 2008.

[11] D. Slezak and M. Kowalski, "Intelligent data granulation on load: Improving Infobright's knowledge grid," Lecture Notes in Computer Science, vol. 5899, pp. 12-25, 2009.

[12] Jedox Plan Analyse Report, http://www.jedox.com, 2010.

[13] OPCconnect, http://www.opcconnect.com/freesrv.php.

[14] COMMserver, OPC servers powered by CommServer, http://www.commsvr.com/Products/OPCServer.aspx.

[15] OSIsoft PI system, http://www.osisoft.com, April 2010.

[16] J. B. Schafer, "The application of data mining to recommender systems," Encyclopedia of Data Warehousing and Mining, pp. 44-48, Mar. 2006.

[17] CLIPS, A tool for building expert systems, http://www.clipsrules.sourceforge.net, April 2010.

[18] D3Web Knowledge Systems, http://www.d3web.sourceforge.net, April 2010.
