
Knowledge Representation and Expert Systems for Mineral Processing using Infobright

Alberto Rui Frutuoso Barroso
Natural Resources Engineering
School of Engineering
Laurentian University
Sudbury, Canada
ar [email protected]

Greg Baiden
Penguin Automated Systems Inc
School of Engineering
Laurentian University
Sudbury, Canada
[email protected]

Julia Johnson
Department of Mathematics and Computer Science
Laurentian University
Sudbury, Canada
[email protected]

Abstract - Open source tools for Knowledge Representation in databases and the implementation of a real time expert system for mineral processing operations (size reduction and enrichment) are discussed. The use of a column-oriented database system (Infobright IEE) to store quantitative data from sensors that measure feed size distribution, feed rate, aeration rate, pulp density, pH and temperature allows low latency database query responses and real time process control and analysis. Qualitative metadata can be generated with the use of mathematical process models (simulation outputs, reduction equations, transforms), and from the natural language analysis of process data (reagents and ore mineralogy). The toolkits Wordnet and the Natural Language Toolkit (NLTK) were used for metadata generation, processing qualitative text information present in process databases, and for generating data for subsequent inference engine rule checking. We took advantage of the power and ease of the programming language Python to implement a framework for fuzzy and rough set rules generation, and to create an on-line-analytical-processing (OLAP) system for reporting production process parameters.

Keywords - Expert Systems; Artificial Intelligence; Infobright; CLIPS; Pyke; Rough Sets; Fuzzy Sets; OLAP cubes

    I. INTRODUCTION

In a real-time expert system used in mineral processing operations (grinding, flotation, cyclone separation), multiple types of input data need to be processed and used to control equipment on the mill production floor. In grinding and flotation units we have vibration sensors detecting operational malfunctions, high speed video cameras and processing units giving dimensional values of rock particles that are exiting the grinding circuits, gamma ray devices giving density values of the liquid-solid mixtures (slurries), pH sensors, and high speed temperature and pressure measurements (pumping circuits, autoclaves, etc.). Such are examples of quantitative data.

In contrast, qualitative data are present in local databases, such as the characteristics of froth products used, the type of rods (or balls) used in the grinders, parameters adjusted with manual procedures, and operators' comments present in the form of short natural language (English) texts containing domain-specific terse abbreviations. There is evidence [1] that fuzzy set decisions help to process uncertain quantitative data from the equipment sensors, with the options to be implemented in local Programmable Logic Controllers (PLCs) or in a centralized system using OPC data links.
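To illustrate the kind of fuzzy set decision referred to above, the following minimal sketch grades a flotation pH reading into overlapping categories. It is not taken from the system described here; the category breakpoints are invented for illustration only.

```python
# Minimal fuzzy membership sketch for a flotation pH reading.
# The category breakpoints are illustrative assumptions, not plant values.

def triangular(x, a, b, c):
    """Triangular membership function peaking at b, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def ph_memberships(ph):
    """Degree of membership of a pH value in three fuzzy categories."""
    return {
        "acidic":   triangular(ph, 0.0, 4.0, 7.0),
        "neutral":  triangular(ph, 6.0, 7.0, 8.0),
        "alkaline": triangular(ph, 7.0, 10.0, 14.0),
    }

if __name__ == "__main__":
    print(ph_memberships(7.5))
```

A PLC-side rule would then act on the category with the highest membership rather than on a crisp threshold.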

In a previous application [2], the qualitative data used to populate the expert system were extracted from local databases and transformed and loaded (ETL operations) into the knowledge database. In conjunction with [3] we conclude that a qualitative knowledge database for mineral processing can be established only after removing imprecise data emerging from the knowledge acquisition phase, allowing subsequent efficient searches by the inference engine (Figure 1 from [4]). Typically the inference engine will parse statements, assign degrees of belief, examine and fire rules, use customized search strategies, provide explanations and justifications, communicate these to users and external programs, and process the problem solving results [4].

NLTK and Wordnet are versatile tools to discard redundant information from the data before storing it in the knowledge database. NLTK and Wordnet are discussed here as precursors of rule-based representation, semantic networks and frames, the three main methods used for knowledge representation in intelligent decision support systems [5].
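The rule-examining and rule-firing cycle described above can be sketched as a minimal forward-chaining loop; the facts and rules below are invented for illustration and are not taken from the production system.

```python
# Minimal forward-chaining inference sketch: repeatedly fire any rule
# whose premises are all satisfied until no new facts can be derived.
# The facts and rules are illustrative only.

def forward_chain(facts, rules):
    """Derive the closure of a fact set under (premises -> conclusion) rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (["high_vibration", "grinder_running"], "possible_bearing_fault"),
    (["possible_bearing_fault"], "raise_alarm"),
]

if __name__ == "__main__":
    print(forward_chain(["high_vibration", "grinder_running"], rules))
```

A real inference engine (CLIPS, Pyke) adds conflict resolution, degrees of belief and explanation facilities on top of this basic cycle.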

    Figure 1. Expert System Components

    2010 IEEE International Conference on Granular Computing

978-0-7695-4161-7/10 $26.00 © 2010 IEEE

    DOI 10.1109/GrC.2010.133



II. COLUMN-BASED DATABASE MANAGEMENT SYSTEMS

Column based database management systems (DBMS) access the database content reading and writing entire columns, allowing fast searches in large databases and data warehouses. Various approaches to column based database management systems have been used in different applications, and some solutions have been tuned to specific problems (Netezza Skimmer [6]). Hybrid software-hardware solutions add the power of a column based architecture with the performance of a SQL query processor implemented in a field programmable gate array (FPGA). Kickfire [7] and Xtremedata [8] are examples of analytic appliances with capacities up to 10 petabytes, and query performance improvements up to 100x (10x minimum).

A. Infobright system

In contrast with hardware based FPGAs, Infobright [9] is a high performance analytic software system designed to handle specific queries on large data sets. Infobright technology [10] combines a column-oriented database with a Knowledge Grid architecture [11] to deliver low waiting times in data analysis. The data are partitioned and physical data structures built with a self-managing structure that eliminates the need for standard database indexes. Infobright provides scalability with solutions up to 50 terabytes using a single server, and the 10:1 (up to 40:1) data compression allows a significant reduction of the storage media of the database while delivering rapid responses to complex queries.

The APIs supported by Infobright are extensive and among them a mineral processing engineer will surely find one that he/she prefers. The APIs include: C, C++, C#, Borland Delphi (via dbExpress), Eiffel, SmallTalk, Java (with a native Java driver implementation), Lisp, Perl, PHP, Python, Ruby, REALbasic, FreeBasic, and Tcl. Infobright supports ANSI SQL-92 with some SQL-99 extensions, standard database interfaces (including ODBC, JDBC and native connections), 500 database users with up to 32 concurrent queries (depending on the number of CPU cores and the amount of memory), and admits a variety of schema designs. Two versions are available: a GPL2-licensed, open source Community Edition (ICE), and a commercially licensed Enterprise Edition (IEE).

ICE, being a self-contained system on PCs, was convenient to use for initial implementation and testing, but deficiencies regarding limited data types were observed for the application at hand. A complete comparison matrix has been provided on the IEE/ICE site. In summary, it is faster to migrate a MySQL database to IEE than to ICE.

Infobright Enterprise Edition (IEE) was obtained for use in this project and others at Laurentian University on the basis of a special academic promotional offer. It was installed on an 8-core, 8 gigabyte RAM server running the Debian Linux operating system. IEE's MySQL pluggable storage engine architecture allowed the database server to be accessed using an SQL client running on Windows.

III. EXTRACT, TRANSFORM AND LOAD TOOLS

Extract, transform, and load (ETL) are functions used in the population of databases, and typically consist in extracting data from outside sources (files or OPC servers), transforming it (deleting, filtering, etc.), and loading it into the target database [10].
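As a concrete illustration of the three steps, the following self-contained sketch extracts rows from a semicolon-separated export, filters out malformed readings, and loads the remainder into an SQLite table standing in for the target database. The field names and values are invented for illustration.

```python
import csv
import io
import sqlite3

# Illustrative raw sensor export; field names and values are invented.
RAW = """timestamp;feed_rate;ph
2010-05-01 10:00;312.5;9.1
2010-05-01 10:01;bad;9.2
2010-05-01 10:02;305.0;9.0
"""

def etl(raw_text, conn):
    # Extract: parse the semicolon-separated export.
    rows = csv.DictReader(io.StringIO(raw_text), delimiter=";")
    # Transform: keep only rows whose measurements parse as numbers.
    clean = []
    for row in rows:
        try:
            clean.append((row["timestamp"], float(row["feed_rate"]), float(row["ph"])))
        except ValueError:
            continue  # drop malformed readings
    # Load: insert the cleaned rows into the target table.
    conn.execute("CREATE TABLE IF NOT EXISTS readings (ts TEXT, feed_rate REAL, ph REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", clean)
    return len(clean)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(etl(RAW, conn))  # 2 of the 3 rows survive the transform step
```

In the framework described here the load target would be an Infobright table rather than SQLite, but the three-phase shape is the same.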

A. How will ETL be used for mineral processing data?

ETL operations are needed for equipment operator comments on error acknowledgments and for log files generated in programmable logic controller (PLC) systems. The PLC that controls a mine hoist or a conveyor transporting the feed (ore) to the mill is one good example of log data generation of parameters, in this case the transported weight, velocity, number of trips, downtime, MTBF, etc.
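For example, mean time between failures (MTBF) can be recovered from such a log with a few lines of Python; the log format below is hypothetical, chosen only to make the computation concrete.

```python
# Compute MTBF from hypothetical conveyor PLC log lines of the form
# "<timestamp> FAULT <code>". The log format is invented for illustration.
from datetime import datetime

LOG = """2010-05-01 08:00 FAULT E21
2010-05-01 14:00 FAULT E07
2010-05-02 02:00 FAULT E21
"""

def mtbf_hours(log_text):
    """Mean time between consecutive FAULT entries, in hours."""
    times = [
        datetime.strptime(line.split(" FAULT")[0], "%Y-%m-%d %H:%M")
        for line in log_text.strip().splitlines()
        if " FAULT " in line
    ]
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

if __name__ == "__main__":
    print(mtbf_hours(LOG))  # (6 + 12) / 2 = 9.0 hours
```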

B. AWK scripts

AWK is a programming language designed for processing text-based data, either in files or data streams, and was created at Bell Labs in the 1970s. The name AWK is derived from the family names of its authors Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK scripts find use for mineral processing data to extract numbers and values from text log files generated in production equipment.

C. Palo ETL server

Key issues of Palo are ETL and cube technology. The open source Palo ETL Server 3.0 [12] is an Extract, Transform and Load software designed for importing and exporting large quantities of data to/from Palo databases. Data are extracted from heterogeneous sources, and master and transaction data are transformed and loaded into Palo models, illustrated by the vertical arrows in figure 2. The Palo ETL Server allows automatic data imports. Established relational databases can be connected as data sources via a standardized interface. This includes Infobright databases used in the expert system framework described pictorially shortly.

Complex transformations and aggregations, for example, models for cyclone circuits and grinder loops, can be represented within a Palo model. Palo ETL Server 3.0 can be operated both from the command line level and, more conveniently, using the ETL web client. Palo uses online analytical processing cube technology for its data structure. Manipulations and analyses of data may be executed from multiple perspectives. The arrangement of data into cubes overcomes a limitation of relational databases that makes them unsuitable for near instantaneous analysis and display.

It is proposed to couple the advantages of column-oriented relational databases regarding compression and the resulting speed enhancements with modeling of instantaneous phenomena found in mineral processing applications afforded by Palo cube technology for transforming between relational and OLAP databases.
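The cube idea can be illustrated with a toy aggregation: measures keyed by several dimensions, rolled up along any one of them. The dimension names and figures below are invented for illustration.

```python
from collections import defaultdict

# Toy OLAP-style roll-up: tonnage measures keyed by (circuit, shift, day)
# dimensions. All names and numbers are invented for illustration.
FACTS = [
    ("grinder_1", "day",   "mon", 120.0),
    ("grinder_1", "night", "mon",  95.0),
    ("flotation", "day",   "mon",  80.0),
    ("grinder_1", "day",   "tue", 110.0),
]

def roll_up(facts, axis):
    """Sum the measure over all dimensions except the chosen axis
    (0 = circuit, 1 = shift, 2 = day)."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[axis]] += fact[3]
    return dict(totals)

if __name__ == "__main__":
    print(roll_up(FACTS, 0))  # totals per circuit
```

A cube server such as Palo pre-structures such roll-ups so that every axis can be queried at near interactive speed.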

    Figure 2. Jedox Palo ETL Server Architecture [12]

IV. METADATA GENERATION

We are showing software to implement in complex mineral processing projects, which have the goal of maximum production efficiency. The use of an efficient expert system will reduce the time to execute optimization cycles. The magnitude of the project dictates that some areas are not developed, and metadata generation is one of them. It suffices here to provide one example of metadata generation from a previous project.

The Dublin Core metadata element set, ISO 15836:2009 (ISO, 2009), is an example of the need to create well structured metadata. The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements (Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights).
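A process record annotated with the 15 DCMES elements might look as follows; only the element names come from the standard, the values are invented for illustration.

```python
# The 15 Simple Dublin Core (DCMES) element names, applied to a
# hypothetical mill log record. The values are illustrative only.
DCMES_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]

def describe_record(**values):
    """Build a DCMES record, leaving unsupplied elements empty."""
    return {element: values.get(element.lower(), "") for element in DCMES_ELEMENTS}

record = describe_record(
    title="Grinder 1 shift log",
    creator="Shift operator",
    date="2010-05-01",
    format="text/plain",
    language="en",
)

if __name__ == "__main__":
    print(record["Title"], len(record))
```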

A. DCMES used for mineral processing metadata

Only at the highest level (level 1) of the DCMES standard can we normalize process metadata. The layered architecture is diagrammed at http://dublincore.org/metadata-basics/. At Level 1, interoperability among applications sharing metadata is based on a shared vocabulary. Participants within an application environment agree upon the terms to use in their metadata and on how those terms are defined. Interoperability with the rest of the world outside of the implementation environment is generally not a priority. Most existing metadata applications operate at level 1. When metadata is automatically generated from raw data present in log servers, compliance with level 1 of the DCMES architecture is under consideration.

Generating metadata from qualitative noisy data, compatible with the Dublin Core qualifiers, requires a powerful language and toolboxes to achieve that objective. Python comes with complete implementations of the Wordnet and NLTK toolboxes and hence provides an excellent replacement for AWK scripts in the task of text extraction.

B. Wordnet and NLTK code example

The following code is an example of word synonym search using Wordnet and Python. The objective is to transform the text to a level 1 compliant form.

    ## ExpertS.py ##
    import Tkinter
    import nltk
    import MySQLdb
    from nltk.book import *
    from MySQLdb import Connect

    words = text1.tokens  # token list taken from the nltk.book corpora

    def GenMeta(palavras):
        from nltk.corpus import wordnet as wn
        conta = 0
        while conta < palavras:  # len(words):
            for synset in wn.synsets(words[conta]):
                print synset.definition
            conta = conta + 1

    palavras = 10
    GenMeta(palavras)
    conn = Connect(host="localhost",
                   user="root", passwd="123456")

C. Field trimming with AWK

PERL and Python are other examples of powerful text processing facilities, but the simplicity of AWK as a Turing-complete programming language allows creating lean code to manipulate text-based data and feed it to our real time processing expert system. The following code is a function available in the ETL Datamelt Toolkit. The objective is to trim characters (e.g. spaces or special characters) from a raw text file. The code is self documenting with comments internal to it.

    #!/usr/bin/awk -f
    # [email protected]
    # http://datamelt.com

    # function to remove blanks on both sides of the string
    function trim(value)
    {
        sub(/^ */, "", value)
        sub(/ *$/, "", value)
        return value
    }

    # begin of processing
    BEGIN {
        # setting the file's field separator
        FS = ";";
        OFS = ";";
    }

    {
        # trim every field of the current record, then print it
        for (i = 1; i <= NF; i++)
            $i = trim($i)
        print
    }


V. BLENDING INFOBRIGHT WITH OPC SERVERS AND DATA MANAGEMENT SYSTEMS

The two essential tasks to be done in a database migration of any sort are: first, export the data from the original source database and, second, import the data into the target database. The syntax for the Infobright MySQL export command is SELECT . . . INTO OUTFILE . . . FROM . . . WHERE . . ..

The Infobright analytical engine has differences when compared with a standard MySQL DBMS:

- Declaration of storage engine type
- Lack of need for indices or partition schemes
- Lack of referential integrity checks
- Removal of constraints
- Minor data type differences (ICE, 2010)
- Supported character sets and collations
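Because of these differences, migrating a MySQL schema typically means rewriting the DDL, chiefly dropping index definitions and declaring Infobright's BRIGHTHOUSE storage engine. A rough sketch of that rewriting follows; the regexes and the example table are invented for illustration and are not a complete converter.

```python
import re

# Rough sketch of DDL rewriting for an Infobright target: drop index
# definitions and switch the storage engine declaration. The regexes and
# the example table are illustrative, not a complete converter.
MYSQL_DDL = """CREATE TABLE readings (
  ts DATETIME,
  feed_rate DOUBLE,
  KEY idx_ts (ts)
) ENGINE=InnoDB;"""

def to_infobright(ddl):
    # remove KEY/INDEX lines (Infobright needs no indexes)
    ddl = re.sub(r"^\s*(KEY|INDEX)\b.*\n", "", ddl, flags=re.MULTILINE)
    # tidy a trailing comma left before the closing parenthesis
    ddl = re.sub(r",\s*\n\)", "\n)", ddl)
    # declare the BRIGHTHOUSE storage engine instead of InnoDB
    return ddl.replace("ENGINE=InnoDB", "ENGINE=BRIGHTHOUSE")

if __name__ == "__main__":
    print(to_infobright(MYSQL_DDL))
```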

A. Integration with OPC client-server systems

Mineral processing function is typically automated with programmable logic controllers (PLCs), which read analog and discrete values from the sensors wired to the I/Os. OPC servers (Figure 3) can be used to map those values to MySQL databases for post-processing [13].

    Figure 3. Dataporter CommServer [14]

B. Integration with PI systems

Invensys Process Engineering Suite (PES), Wonderware, and Osisoft PI systems (figure 4) are examples of data management systems used in mineral processing operations (Xstrata, ValeInco).

Xstrata is a major global diversified mining group. Xstrata Nickel, Sudbury Operation has approximately 900 employees and produces nickel and copper smelter products. The 2008 Sudbury Smelter annual production rates were 64,906 tons nickel-in-concentrate, 17,811 tons copper-in-concentrate and 2,698 tons cobalt-in-concentrate.

Key data items, to name just a few, among those mentioned in Xstrata's 2009 Regional, Divisional and Site Sustainability Reports (published April 2010) follow (their units of measure are parenthesized):

Environmental indicators

- Direct energy use (PJ), Total energy use (PJ), Total water use (ML)
- Direct and total greenhouse gas emissions (both measured in CO2 equivalent million tons)
- Sulphur dioxide stack emissions (tons)
- Oxides of nitrogen stack emissions (tons)
- Total recycling and reuse of water (ML)
- Land disturbed (hectares)
- Land rehabilitated (hectares)

Production indicators and their units of measure

- Ferrochrome (kt)
- Vanadium pentoxide (k lbs)
- Ferrovanadium (k kg)
- Thermal coal (mt)
- Coking coal (mt)
- Semi-soft coking (mt)
- Total coal (mt)
- Total mined copper (contained metal) (kt)
- Total mined gold (contained metal) (koz)
- Nickel (kt)
- Ferronickel (kt)
- Cobalt (kt)
- Zinc in concentrate production (kt)
- Zinc metal production (kt)
- Lead in concentrate production (kt)
- Lead metal production (kt)

Such measured quantities are related in different ways to other items of interest, for example:

- Indirect energy consumption by primary source
- Energy saved due to conservation and efficiency improvements
- NOx, SOx, and other significant air emissions by type and weight
- Total water discharge by quality and destination
- Total weight of waste by type and disposal method
- Total number and volume of significant spills
- Weight of transported, imported, exported, or treated hazardous waste
- Identity, size, protected status, and biodiversity value of water bodies and related habitats significantly affected by discharges of water and runoff
- Extent of impact of initiatives to mitigate environmental impacts of products and services
- Percentage of products sold and their packaging materials that are reclaimed by category
- Value and number of significant fines and non-monetary sanctions for non-compliance with environmental laws and regulations
- Extent of environmental impacts of transporting products and other goods and materials
- Total amount of land owned, leased, and managed for production activities or extractive use; total land disturbed, total land rehabilitated
- The number/percentage of sites identified as requiring biodiversity management plans, and with plans in place
- Percentage of product(s) derived from secondary materials

The Xstrata local operation uses PI systems for operational, event, and real-time data management, recording quantities that eventually feed into national and global reports by extracting data from sensors positioned in production operations. However, users of PI systems require an aid to help them find and evaluate specific data values emitted from sensors and the relationships among their data types. Sensors may either produce or consume data, sometimes switching roles in response to perceived (consumed) inputs from their environment. It is useful to view sensors within a consumer/producer paradigm because the large body of research into data mining in commercial and business applications can be brought to bear upon the staggering knowledge management needs of a mineral processing plant.

    Figure 4. Osisoft PI systems [15]

Recommender systems connect users with items to consume (purchase, view, listen to, etc.) by associating the content of recommended items or the opinions of other individuals with the consuming user's actions or opinions [16]. A sensor in its role as consumer expresses an interest in data from its environment either through its perceptual instrument or by data received from other sensors. Data items from other sensors that might be of interest to a given sensor (the consumer) are recommended based on the sensors on a site that have the most traffic, on certain characteristics of the consumer (e.g. strength of its signal), or on a historical analysis of the past behavior of the consumer as a prediction for future producer/consumer interactions.
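A historical analysis of that kind can be sketched with a simple similarity measure over past consumption: the peer sensor whose history is most similar to the consumer's suggests what to recommend next. The sensor names and histories below are invented for illustration.

```python
# Toy item-recommendation sketch: recommend to a consumer sensor the data
# items consumed by its most similar peer. Names and histories are invented.
HISTORIES = {
    "vibration_1": {"rpm", "bearing_temp", "load"},
    "vibration_2": {"rpm", "bearing_temp", "slurry_density"},
    "camera_1":    {"particle_size", "feed_rate"},
}

def jaccard(a, b):
    """Similarity of two consumption histories (shared over total items)."""
    return len(a & b) / len(a | b)

def recommend(consumer, histories):
    """Items consumed by the most similar peer but not yet by the consumer."""
    own = histories[consumer]
    peers = {name: hist for name, hist in histories.items() if name != consumer}
    best = max(peers, key=lambda name: jaccard(own, peers[name]))
    return sorted(peers[best] - own)

if __name__ == "__main__":
    print(recommend("vibration_1", HISTORIES))
```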

VI. OPEN SOURCE EXPERT SYSTEMS

Expert systems can be implemented from scratch using high level programming languages (Python, Lua, Ruby) and specialized modules for the inference engines (PyFuzzyLib, Pyke). A more desirable approach for engineers is to use a ready-to-populate open source expert system, two possibilities of which are described in the remainder of this section.

A. CLIPS - C Language Integrated Production System

The first versions of CLIPS [17] were developed at the NASA Johnson Space Center in 1984, trying to eliminate the problems of the LISP language. Nowadays CLIPS is a public domain software tool to develop expert systems that supports three different programming paradigms: rule-based, object-oriented and procedural. CLIPS is written in C for portability and speed and interfaces with Python; the procedural programming capabilities provided by CLIPS are similar to capabilities found in languages such as C, Java, Ada, and LISP.

B. D3Web Knowledge System

The d3web system [18] is a Java-based prototyping and development toolkit for distributed knowledge systems. It includes a knowledge modelling environment tool (KnowME), a visual knowledge acquisition tool, and an evaluation & management tool.

D3web offers various problem-solving methods including:

- categorical and heuristic rules
- decision trees and decision tables
- set-covering models
- case-based reasoning

VII. TYING IT ALL TOGETHER

The architecture for a real time expert system in a mineral processing plant is illustrated in figure 5. The implementation requires an OPC server that translates different PLC protocols (Modbus, Profibus, CAN, DeviceNet, etc.) to standard TCP-IP socket connections. The OPC server (figure 4) will populate the Infobright IEE database as a MySQL compatible database. A list of freely available OPC servers is given in [13].

The OPC server (orange box) sends and receives digital and analog signals to PLC and DCS systems. The OPC-Infobright connection (between the orange and blue boxes) is the recommended scenario for a typical mineral processing automation system. In this connection we have Human Machine Interfaces (HMI) and Distributed Computer Systems (DCS) with supervisory control and data acquisition (SCADA) systems. A customized ETL is required for each of these distinct cases. The tools described in [13] are either pre-built or supplied as ready-to-build source code, or both.


Some are evaluation versions and some are downloadable from the Web.

The grey boxes represent the Expert System, which can be implemented using CLIPS, Python (for the connection with the database) and Pyke (for the inference engine), or D3Web. Light yellow illustrates a typical implementation of report generation and OLAP cube creation. Manual and automatic process control, allowing the test and implementation of distinct control strategies (stochastic, heuristic, deterministic, Monte Carlo methods), is placed in the human machine interface block.

In future work the ETL operations for qualitative data will be implemented using Python toolboxes, while the quantitative data will be processed using normal algorithmic calculus implementations.

    Figure 5. Real Time Mineral Processing Expert System

VIII. CONCLUSION

In the current mineral processing plants in Northern Ontario, Canada (e.g., producing copper/nickel matte from sulphidic ores), the level of automation is increasing due to increased demands for productivity and efficiency as well as the need for compliance regarding environmental factors. Hundreds of analog and discrete signals given by sensors and control systems are stored in databases either directly or using real-time data management infrastructures [15]. The implementation of a real time expert system to make use of such data requires low latency database read cycles, and column oriented databases tuned for performance and single variable analysis. Most automation communications systems can be integrated with an OPC server to send and receive data from central or distributed systems. The divide and conquer strategy implemented in automation systems with local PID and fuzzy logic control makes way for distributed artificial intelligence through remote expert systems.

This paper has addressed mineral processing data needs with a focus on the application of data mining and data warehousing techniques. We have provided a framework in which a variety of software packages are put together to support the development and implementation of a real time expert system in a mineral processing plant. The packages are CLIPS, D3web, and Pyke, either freely downloadable or open source available for purchase. It was found that the software for ETL (extract, transform, and load) showed variability among applications, requiring evaluation and selection from a variety of available products. Those products have been enumerated. Additionally, the parameters by which the ETL products should be evaluated have been listed.

REFERENCES

[1] R. K. Brouwer, "Fuzzy rule extraction from a feed forward neural network by training a representative fuzzy neural network using gradient descent," International Journal of Uncertainty, pp. 673-698, December 2005.

[2] J. Johnson and G. Johnson, "Infobright for analyzing social sciences data," Comm. Computer and Information Science: Database Theory and Applications, pp. 90-98, March 2009.

[3] P. Vaillancourt and J. Johnson, "Monitoring network aware sensors using BACnet," IJCNS Int. Journal of Computer Science and Network Security, pp. 15-23, November 2006.

[4] T. Yalcin, Advanced mineral processing, LU-ENGR5207, pp. 144-161, September 2007.

[5] EUNITE roadmap, http://www.eunite.org/eunite/index.htm.

[6] Netezza, Analytic appliance, http://www.netezza.com.

[7] Kickfire, Kickfire's SQL chip, http://www.kickfire.com.

[8] XtremeData, SQL in silicon, http://www.xtremedata.com.

[9] Infobright, Open source data warehousing, http://www.infobright.com.

[10] D. Slezak, J. Wroblewski, V. Eastwood, and P. Synak, "Brighthouse: An analytic data warehouse for ad-hoc queries," PVLDB, pp. 1337-1345, June 2008.

[11] D. Slezak and M. Kowalski, "Intelligent data granulation on load: Improving Infobright's knowledge grid," Lecture Notes in Computer Science, vol. 5899, pp. 12-25, 2009.

[12] Jedox Plan Analyse Report, http://www.jedox.com, 2010.

[13] OPCconnect, http://www.opcconnect.com/freesrv.php.

[14] COMMserver, OPC servers powered by CommServer, http://www.commsvr.com/Products/OPCServer.aspx.

[15] OSIsoft PI system, http://www.osisoft.com, April 2010.

[16] J. B. Schafer, "The application of data mining to recommender systems," Encyclopedia of Data Warehousing and Mining, pp. 44-48, Mar. 2006.

[17] CLIPS, A tool for building expert systems, http://www.clipsrules.sourceforge.net, April 2010.

[18] D3Web Knowledge Systems, http://www.d3web.sourceforge.net, April 2010.
