Knowledge Representation and Expert Systems for Mineral Processing using
Infobright
Alberto Rui Frutuoso Barroso
Natural Resources Engineering
School of Engineering
Laurentian University
Sudbury, Canada
ar frutuosobarroso@laurentian.ca
Greg Baiden
Penguin Automated Systems Inc
School of Engineering
Laurentian University
Sudbury, Canada
gbaiden@laurentian.ca
Julia Johnson
Department of Mathematics
and Computer Science
Laurentian University
Sudbury, Canada
jjohnson@cs.laurentian.ca
Abstract: Open source tools for Knowledge Representation in databases and the implementation of a real time expert system for mineral processing operations (size reduction and enrichment) are discussed. The use of a column-oriented database system (Infobright IEE) to store quantitative data from sensors that measure feed size distribution, feed rate, aeration rate, pulp density, pH and temperature allows low latency database query responses and real time process control and analysis. Qualitative metadata can be generated with the use of mathematical process models (simulation outputs, reduction equations, transforms), and from the natural language analysis of process data (reagents and ore mineralogy). The toolkits Wordnet and the Natural Language Toolkit (NLTK) were used for metadata generation, processing qualitative text information present in process databases, and for generating data for subsequent inference engine rule checking. We took advantage of the power and ease of the programming language Python to implement a framework for fuzzy and rough set rules generation, and to create an on-line-analytical-processing (OLAP) system for reporting production process parameters.
Keywords: Expert Systems; Artificial Intelligence; Infobright; CLIPS; Pyke; Rough Sets; Fuzzy Sets; OLAP cubes
I. INTRODUCTION
In a real-time expert system used in mineral processing
operations (grinding, flotation, cyclone separation), multiple
types of input data need to be processed and used to control
equipment on the mill production floor. In grinding and flota-
tion units we have vibration sensors detecting operational
malfunctions, high speed video cameras and processing units
giving dimensional values of rock particles that are exiting
the grinding circuits, gamma ray devices giving density
values of the liquid-solid mixtures (slurries), pH sensors,
and high speed temperature and pressure measurements
(pumping circuits, autoclaves, etc.). These are examples of
quantitative data.
In contrast, qualitative data are present in local databases,
such as the characteristics of froth products used, type of
rods (or balls) used in the grinders, parameters adjusted with
manual procedures, and operators' comments present in the
form of short natural language (English) texts containing
terse domain-specific abbreviations. There is evidence [1]
that fuzzy set decisions help to process uncertain quantitative
data from the equipment sensors, with the option of
implementing them in a local programmable logic controller
(PLC) or in a centralized system using OPC data links.
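As an illustration of how a fuzzy set turns an uncertain sensor reading into graded memberships, the following minimal Python sketch fuzzifies a pH value with triangular membership functions. The set names and breakpoints in `PH_SETS` are hypothetical and are not taken from [1] or from any plant.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for pulp pH (breakpoints are assumptions).
PH_SETS = {
    "acidic": (4.0, 6.0, 8.0),
    "neutral": (6.0, 8.0, 10.0),
    "alkaline": (8.0, 10.0, 12.0),
}


def fuzzify(ph):
    """Return the degree of membership of a pH reading in each term."""
    return {label: triangular(ph, *abc) for label, abc in PH_SETS.items()}


memberships = fuzzify(8.5)
```

A reading can belong partly to two neighbouring terms at once, which is what lets a rule base respond smoothly to noisy measurements.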
In a previous application [2], the qualitative data used
to populate the expert system were extracted from local
databases, transformed, and loaded (ETL operations)
into the knowledge database. In conjunction with [3] we
conclude that a qualitative knowledge database for mineral
processing can be established only after removing imprecise
data emerging from a knowledge acquisition phase, allow-
ing subsequent efficient searches by the inference engine
(Figure 1 from [4]). Typically the inference engine will
parse statements, assign degrees of belief, examine and fire
rules, use customized search strategies, provide explanations
and justifications, communicate these to users and external
programs, and process the problem solving results [4].NLTK and Wordnet are versatile tools to discard re-
dundant information from the data before storing it in the
knowledge database. NLTK and Wordnet are discussed here
as precursors of the Rule-based representation, semantic
networks and frames that are the three main methods used
for knowledge representation in intelligent decision support
systems [5].
Figure 1. Expert System Components
2010 IEEE International Conference on Granular Computing
978-0-7695-4161-7/10 $26.00 2010 IEEE
DOI 10.1109/GrC.2010.133
II. COLUMN BASED DATABASE MANAGEMENT SYSTEMS
Column based database management systems (DBMS)
access the database content by reading and writing entire
columns, allowing fast searches in large databases and data
warehouses. Various approaches to column based database
management systems have been used in different applications,
and some solutions have been tuned to specific
problems (Netezza Skimmer [6]). Hybrid software-hardware
solutions combine the power of a column based architecture with
the performance of a SQL query processor implemented in
a field programmable gate array (FPGA). Kickfire [7] and
Xtremedata [8] are examples of analytic appliances with
capacities up to 10 petabytes and query performance
improvements of up to 100x (10x minimum).
A. Infobright system
In contrast with hardware based FPGAs, Infobright [9]
is a high performance analytic software system designed
to handle specific queries on large data sets. Infobright
technology [10] combines a column-oriented database with
a Knowledge Grid architecture [11] to deliver low waiting
times in data analysis. The data are partitioned, and the physical
data structures are built with a self-managing structure that
eliminates the need for standard database indexes. Infobright
scales up to 50 terabytes on
a single server, and its 10:1 (up to 40:1) data compression
allows a significant reduction of the storage media of the
database while delivering rapid response to complex queries.
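To illustrate how coarse statistics kept per block of a column can stand in for a conventional index, the following Python sketch answers a threshold query using only per-pack minimum and maximum values wherever possible. It is a simplified illustration of the general idea and not Infobright's implementation: the pack size, the function names `build_packs` and `count_greater_than`, and the restriction to min/max summaries are all assumptions made for the example.

```python
# Illustrative sketch only: per-pack min/max summaries let a query skip
# or fully accept whole packs without reading their individual values.
PACK_SIZE = 4  # hypothetical, kept tiny for demonstration


def build_packs(values, pack_size=PACK_SIZE):
    """Split one column into packs and record min/max for each pack."""
    packs = []
    for start in range(0, len(values), pack_size):
        chunk = values[start:start + pack_size]
        packs.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return packs


def count_greater_than(packs, threshold):
    """Count rows with value > threshold; return (count, packs_scanned)."""
    count = 0
    scanned = 0
    for pack in packs:
        if pack["max"] <= threshold:
            continue  # no row in this pack can qualify
        if pack["min"] > threshold:
            count += len(pack["rows"])  # every row qualifies
            continue
        scanned += 1  # only ambiguous packs are read row by row
        count += sum(1 for v in pack["rows"] if v > threshold)
    return count, scanned


ph_column = [6.8, 6.9, 7.0, 7.1, 8.2, 8.4, 8.3, 8.5, 7.4, 9.1, 7.2, 7.9]
packs = build_packs(ph_column)
result = count_greater_than(packs, 8.0)
```

In this example only one of the three packs has to be read in full; the other two are resolved from their summaries alone.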
The APIs supported by Infobright are extensive and
among them a mineral processing engineer will surely find
one that he/she prefers. The APIs include: C, C++, C#,
Borland Delphi (via dbExpress), Eiffel, SmallTalk, Java
(with a native Java driver implementation), Lisp, Perl, PHP,
Python, Ruby, REALbasic, FreeBasic, and Tcl. Infobright
supports ANSI SQL-92 with some SQL-99 extensions,
standard database interfaces (including ODBC, JDBC and
native connections), 500 database users with up to 32
concurrent queries (depending on number of CPU cores and
amount of memory), and admits a variety of schema designs.
Two versions are available: a GPL2-licensed, open source
Community Edition (ICE), and a commercially licensed
Enterprise Edition (IEE).
ICE, being a self-contained system on PCs, was convenient
to use for initial implementation and testing, but deficiencies
regarding limited data types were observed for the application
at hand. A complete comparison matrix has been
provided on the IEE/ICE site. In summary, it is faster
to migrate a MySQL database to IEE than to ICE.
Infobright Enterprise Edition (IEE) was obtained for use
in this project and others at Laurentian University on the
basis of a special academic promotional offer. It was installed
on an 8-core server with 8 gigabytes of RAM running the
Debian Linux operating system. IEE's MySQL pluggable
storage engine architecture allowed the database server to
be accessed using an SQL client running on Windows.
III. EXTRACT, TRANSFORM AND LOAD TOOLS
Extract, transform, and load (ETL) are functions used
in the population of databases, and typically consist of
extracting data from outside sources (files or OPC servers),
transforming it (deleting, filtering, etc.), and loading it into the
target database [10].
A. How will ETL be used for mineral processing data?
ETL operations are needed for equipment operator comments
on error acknowledgments and for log files generated
by programmable logic controller (PLC) systems. The PLC
that controls a mine hoist or a conveyor transporting the feed
(ore) to the mill is a good example of a source of logged
parameters, in this case the transported weight, velocity,
number of trips, downtime, mean time between failures (MTBF), etc.
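The three ETL steps can be sketched for a hoist log as follows. This is a minimal illustration under stated assumptions: the log line layout, the field names (`trip`, `weight_t`, `speed_mps`) and the table name `hoist_trips` are invented for the example, and the standard-library `sqlite3` module stands in for the MySQL-compatible target database.

```python
import sqlite3

# Hypothetical hoist log lines; the field layout is an assumption.
RAW_LOG = [
    "2010-04-01 06:00;HOIST1;trip=1;weight_t=12.4;speed_mps=9.8",
    "2010-04-01 06:07;HOIST1;trip=2;weight_t= 11.9 ;speed_mps=10.1",
    "2010-04-01 06:15;HOIST1;ALARM ack by operator",
]


def extract(lines):
    """Extract: split each raw line into fields on ';'."""
    for line in lines:
        yield line.split(";")


def transform(records):
    """Transform: keep complete trip records and convert text to numbers."""
    for fields in records:
        pairs = dict(f.split("=", 1) for f in fields[2:] if "=" in f)
        if {"trip", "weight_t", "speed_mps"} <= pairs.keys():
            yield (fields[0], fields[1], int(pairs["trip"]),
                   float(pairs["weight_t"]), float(pairs["speed_mps"]))


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE hoist_trips "
        "(ts TEXT, unit TEXT, trip INTEGER, weight_t REAL, speed_mps REAL)"
    )
    conn.executemany("INSERT INTO hoist_trips VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_LOG)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM hoist_trips").fetchone()[0]
total_weight = conn.execute("SELECT SUM(weight_t) FROM hoist_trips").fetchone()[0]
```

The operator comment on the third line carries no numeric fields and is filtered out here; in the framework described in this paper such text would instead go to the qualitative processing path.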
B. AWK scripts
AWK is a programming language that is designed for
processing text-based data, either in files or data streams,
and was created at Bell Labs in the 1970s. The name AWK
is derived from the family names of its authors Alfred Aho,
Peter Weinberger, and Brian Kernighan. AWK scripts are
used with mineral processing data to extract numbers and
values from text log files generated by production equipment.
C. Palo ETL server
Key features of Palo are ETL and cube technology. The
open source Palo ETL Server 3.0 [12] is Extract,
Transform and Load software designed for importing and
exporting large quantities of data to/from Palo databases.
Data are extracted from heterogeneous sources, and master
and transaction data are transformed and loaded into Palo
models, as illustrated by the vertical arrows in Figure 2. The
Palo ETL Server allows automatic data imports. Established
relational databases can be connected as data sources via
a standardized interface. This includes the Infobright databases
used in the expert system framework shown in Figure 5.
Complex transformations and aggregations, for example
models for cyclone circuits and grinder loops, can be represented
within a Palo model. Palo ETL Server 3.0 can
be operated both from the command line and, more
conveniently, using the ETL web client. Palo uses online
analytical processing (OLAP) cube technology for its data structure.
Manipulations and analyses of data may be executed from
multiple perspectives. The arrangement of data into cubes
overcomes a limitation of relational databases that makes
them unsuitable for near instantaneous analysis and display.
It is proposed to couple advantages of column-oriented
relational databases regarding compression and the resulting
speed enhancements with modeling of instantaneous phe-
nomena found in mineral processing applications afforded
by Palo cube technology for transforming between relational
and OLAP databases.
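The idea behind a cube, namely pre-aggregating measurements along every combination of dimensions so that any perspective can be read back directly, can be sketched in a few lines of Python. This is a conceptual illustration and does not use the Palo API; the dimension names, the `*` roll-up marker and the sample values are assumptions.

```python
from collections import defaultdict

# Hypothetical fact rows: (shift, circuit, parameter, value)
FACTS = [
    ("day", "grinding", "feed_rate_tph", 120.0),
    ("day", "flotation", "feed_rate_tph", 115.0),
    ("night", "grinding", "feed_rate_tph", 110.0),
    ("night", "flotation", "feed_rate_tph", 105.0),
]


def build_cube(facts):
    """Pre-aggregate sums for every kept/rolled-up dimension combination."""
    cube = defaultdict(float)
    for *coords, value in facts:
        # Each bit of mask decides whether a dimension is kept or rolled up.
        for mask in range(2 ** len(coords)):
            key = tuple(c if (mask >> i) & 1 else "*"
                        for i, c in enumerate(coords))
            cube[key] += value
    return cube


cube = build_cube(FACTS)
grinding_total = cube[("*", "grinding", "feed_rate_tph")]
day_total = cube[("day", "*", "*")]
grand_total = cube[("*", "*", "*")]
```

Once the cube is built, each view is a single dictionary lookup rather than a scan of the fact rows, which is the property that makes near instantaneous display feasible.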
Figure 2. Jedox Palo ETL Server Architecture [12]
IV. METADATA GENERATION
We present software for use in complex mineral
processing projects whose goal is maximum
production efficiency. The use of an efficient expert system
will reduce the time needed to execute optimization cycles. The
magnitude of the project dictates that some areas are not
yet developed, and metadata generation is one of them. It suffices
here to provide one example of metadata generation from a
previous project.
The Dublin Core metadata element set, ISO 15836:2009
(ISO, 2009), illustrates the need for well
structured metadata. The Simple Dublin Core Metadata
Element Set (DCMES) consists of 15 metadata elements
(Title, Creator, Subject, Description, Publisher, Contributor,
Date, Type, Format, Identifier, Source, Language, Relation,
Coverage, Rights).
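A metadata record built on these 15 elements can be represented as a simple mapping. The sketch below is one possible representation, not a standard library for Dublin Core; the helper name `make_record` and the sample values are hypothetical.

```python
DCMES_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


def make_record(**fields):
    """Return a dict holding all 15 elements; reject unknown names."""
    unknown = set(fields) - set(DCMES_ELEMENTS)
    if unknown:
        raise KeyError("not a DCMES element: %s" % sorted(unknown))
    # Elements that were not supplied are present but empty (None).
    return {name: fields.get(name) for name in DCMES_ELEMENTS}


# Hypothetical record describing an operator log entry.
record = make_record(
    Title="Flotation cell 3 operator log",
    Creator="shift operator",
    Date="2010-04-01",
    Format="text/plain",
    Language="en",
)
```

Restricting records to the agreed element names is what gives applications that share the metadata a common vocabulary.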
A. DCMES used for mineral processing metadata
Only at the highest level (level 1) of DCMES standard
can we normalize process metadata. The layered architecture
is diagrammed at http://dublincore.org/metadata-basics/. At
Level 1, interoperability among applications sharing metadata
is based on a shared vocabulary. Participants within
an application environment agree upon the terms to use
in their metadata and on how those terms are defined.
Interoperability with the rest of the world outside of the
implementation environment is generally not a priority. Most
existing metadata applications operate at level 1. When
metadata is automatically generated from raw data present
in log servers, compliance with level 1 of the DCMES
architecture is under consideration.
Generating metadata from noisy qualitative data, compatible
with the Dublin Core qualifiers, requires a powerful language
and toolboxes. Complete implementations of the Wordnet and
NLTK toolboxes are available for Python, which
hence provides an excellent replacement for AWK scripts
in the task of text extraction.
B. Wordnet and NLTK code example
The following code is an example of word synonym
search using Wordnet and Python. The objective is to
transform the text to a level 1 compliant form.
## ExpertS.py ##
import Tkinter
import nltk
import MySQLdb
from nltk.book import *
from MySQLdb import Connect

def GenMeta(palavras):
    # Print the Wordnet definition of every synset of the first
    # 'palavras' words. The list 'words' is assumed to be defined
    # elsewhere; it does not appear in this listing.
    from nltk.corpus import wordnet as wn
    conta = 0
    while conta < palavras:  # len(words):
        for synset in wn.synsets(words[conta]):
            print synset.definition
        conta = conta + 1

palavras = 10
conn = Connect(host="localhost",
               user="root", passwd="123456")
C. Fields trim with AWK
Perl and Python are other examples of powerful text
processing facilities, but the simplicity of AWK, a
Turing-complete programming language, allows creating lean code
to manipulate text-based data and feed it to our real time
processing expert system. The following code is a function
available in the ETL Datamelt Toolkit. The objective is to
trim characters (e.g., spaces or special characters) from a raw text file.
The code is self documenting with comments internal to it.
#!/usr/bin/awk -f
# uwe.geercken@datamelt.com
# http://datamelt.com
# function to remove blanks on both sides of the string
function trim(value)
{
    sub(/^ */,"",value)
    sub(/ *$/,"",value)
    return value
}
# begin of processing
BEGIN {
    # setting the files field separator
    FS=";";
    OFS=";";
}
{
for(i=1;i
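The AWK listing is cut off inside the loop over fields. For comparison, and not as a reconstruction of the missing lines, the same field-trimming idea can be sketched in Python; the function name `trim_fields` and its parameters are hypothetical.

```python
def trim_fields(line, sep=";", strip_chars=" "):
    """Strip the given characters from both ends of every field."""
    return sep.join(field.strip(strip_chars) for field in line.split(sep))


# A raw semicolon-separated record with stray blanks around the fields.
cleaned = trim_fields("  mill 2 ; 7.4 ;  ok  ")
```

Blanks inside a field (as in "mill 2") are preserved; only the padding around each field is removed.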
V. BLENDING INFOBRIGHT WITH OPC SERVERS AND
DATA MANAGEMENT SYSTEMS
The two essential tasks in a database migration
of any sort are: first, export the data from the original
source database and, second, import the data into the target
database. The Infobright MySQL export command has
the form: SELECT ... INTO OUTFILE ... FROM ... WHERE ...
The Infobright analytical engine has differences when
compared with a standard MySQL DBMS:
Declaration of storage engine type
Lack of need for indices or partition schemes
Lack of referential integrity checks
Removal of constraints
Minor data type differences (ICE, 2010)
Supported character sets and collations
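The export and import steps, together with the explicit storage engine declaration, can be sketched as SQL statements assembled in Python. This is a hedged illustration: the table name, file path and column list are hypothetical, the helper functions are not part of any library, and the engine name BRIGHTHOUSE is an assumption about the Infobright release in use that should be checked against its documentation. The statements are built by plain string formatting for readability only and must not be fed untrusted input.

```python
def export_sql(table, outfile, where=None, sep=";"):
    """Build a MySQL SELECT ... INTO OUTFILE statement for the source side."""
    sql = "SELECT * INTO OUTFILE '%s' FIELDS TERMINATED BY '%s' FROM %s" % (
        outfile, sep, table)
    if where:
        sql += " WHERE %s" % where
    return sql


def import_sql(table, infile, sep=";"):
    """Build a LOAD DATA INFILE statement for the target side."""
    return "LOAD DATA INFILE '%s' INTO TABLE %s FIELDS TERMINATED BY '%s'" % (
        infile, table, sep)


def target_ddl(table, columns, engine="BRIGHTHOUSE"):
    """Build a CREATE TABLE with an explicit engine and no indexes or keys."""
    cols = ", ".join("%s %s" % (name, sqltype) for name, sqltype in columns)
    return "CREATE TABLE %s (%s) ENGINE=%s" % (table, cols, engine)


# Hypothetical sensor table used only for illustration.
ddl = target_ddl("sensors", [("ts", "DATETIME"), ("tag", "VARCHAR(32)"),
                             ("value", "DOUBLE")])
exp = export_sql("sensors", "/tmp/sensors.csv", where="ts >= '2010-01-01'")
imp = import_sql("sensors", "/tmp/sensors.csv")
```

Note that the generated table definition carries no index, key or constraint clauses, in line with the differences listed above.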
A. Integration with OPC client-server systems
Mineral processing functions are typically automated with
programmable logic controllers (PLC), which read analog and
discrete values from the sensors wired to their I/Os. OPC
servers (Figure 3) can be used to map those values to
MySQL databases for post-processing [13].
Figure 3. Dataporter CommServer [14]
B. Integration with PI systems
Invensys Process Engineering Suite (PES), Wonderware,
and OSIsoft PI systems (Figure 4) are examples of data
management systems used in mineral processing operations
(Xstrata, ValeInco).
Xstrata is a major global diversified mining group. Xstrata
Nickel, Sudbury Operation has approximately 900
employees and produces nickel and copper smelter products.
The 2008 Sudbury Smelter annual production rates were
64,906 tons nickel-in-concentrate, 17,811 tons copper-in-
concentrate and 2,698 tons cobalt-in-concentrate.
Key data items, to name just a few, among those mentioned
in Xstrata's 2009 Regional, Divisional and Site Sustainability
Reports (published April 2010) follow (their units of measure
are parenthesized):
Environmental indicators
Direct energy use (PJ)
Total energy use (PJ)
Total water use (ML)
Direct and total greenhouse gas emissions (both measured in CO2 equivalent million tons)
Sulphur dioxide stack emissions (tons)
Oxides of nitrogen stack emissions (tons)
Total recycling and reuse of water (ML)
Land disturbed (hectares)
Land rehabilitated (hectares)
Production indicators and their units of measure
Ferrochrome (kt)
Vanadium pentoxide (k lbs)
Ferrovanadium (k kg)
Thermal coal (mt)
Coking coal (mt)
Semi-soft coking (mt)
Total coal (mt)
Total mined copper (contained metal) (kt)
Total mined gold (contained metal) (koz)
Nickel (kt)
Ferronickel (kt)
Cobalt (kt)
Zinc in concentrate production (kt)
Zinc metal production (kt)
Lead in concentrate production (kt)
Lead metal production (kt)
Such measured quantities are related in different ways to
other items of interest, for example:
Indirect energy consumption by primary source
Energy saved due to conservation and efficiency improvements
NOx, SOx, and other significant air emissions by type and weight
Total water discharge by quality and destination
Total weight of waste by type and disposal method
Total number and volume of significant spills
Weight of transported, imported, exported, or treated hazardous waste
Identity, size, protected status, and biodiversity value of water bodies and related habitats significantly affected by discharges of water and runoff
Extent of impact of initiatives to mitigate environmental impacts of products and services
Percentage of products sold and their packaging materials that are reclaimed by category
Value and number of significant fines and non-monetary sanctions for non-compliance with environmental laws and regulations
Extent of environmental impacts of transporting products and other goods and materials
Total amount of land owned, leased, and managed for production activities or extractive use; total land distributed, total land rehabilitated
The number/percentage of sites identified as requiring biodiversity management plans, and with plans in place
Percentage of product(s) derived from secondary materials
The Xstrata local operation uses PI systems for oper-
ational, event, and real-time data management recording
quantities that eventually feed into national and global
reports by extracting data from sensors positioned in produc-
tion operations. However, users of PI systems require an aid
to help them find and evaluate specific data values emitted
from sensors and the relationships among their data types.
Sensors may either produce or consume data, sometimes
switching roles in response to perceived (consumed) inputs
from their environment. It is useful to view sensors within
a consumer/producer paradigm because the large body of
research into data mining in commercial and business
applications can be brought to bear upon the staggering
knowledge management needs of a mineral processing plant.
Figure 4. Osisoft PI systems [15]
Recommender systems connect users with items to consume
(purchase, view, listen to, etc.) by associating the
content of recommended items or the opinions of other
individuals with the consuming user's actions or opinions
[16]. A sensor in its role as consumer expresses an interest
in data from its environment either through its perceptual
instrument or by data received from other sensors. Data
items from other sensors that might be of interest to a
given sensor (the consumer) are recommended based on
the sensors on a site that have the most traffic, on certain
characteristics of the consumer (e.g., strength of its signal), or
on a historical analysis of the past behavior of the consumer
as a prediction for future producer/consumer interactions.
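A minimal Python sketch of such a recommendation follows, combining two of the criteria named above: site-wide traffic of each producer and the consumer's interaction history. The scoring formula, the weight of 10 on past interactions, and the sensor names are assumptions chosen only to make the example concrete.

```python
def recommend(consumer, traffic, history, top_n=2, history_weight=10):
    """Rank producer sensors for one consumer sensor.

    traffic: {producer: number of messages sent site-wide}
    history: {(consumer, producer): number of past interactions}
    """
    scores = {}
    for producer, sent in traffic.items():
        if producer == consumer:
            continue  # a sensor is not recommended to itself
        past = history.get((consumer, producer), 0)
        scores[producer] = sent + history_weight * past
    # Highest score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical sensors and counts.
traffic = {"pH_1": 50, "density_1": 80, "temp_1": 20, "vib_1": 65}
history = {("vib_1", "temp_1"): 7, ("vib_1", "pH_1"): 1}
suggested = recommend("vib_1", traffic, history)
```

Here the temperature sensor ranks first for the vibration sensor despite its low traffic, because the two have interacted often in the past.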
VI. OPEN SOURCE EXPERT SYSTEMS
Expert systems can be implemented from scratch using
high level programming languages (Python, Lua, Ruby) and
specialized modules for the inference engines (PyFuzzyLib,
Pyke). A more desirable approach for engineers is to use a
ready to populate open source expert system, two possibilities
of which are described in the remainder of this section.
A. CLIPS - C Language Integrated Production System
The first versions of CLIPS [17] were developed at
NASA-Johnson Space Center in 1984, in an attempt to eliminate
the problems of the LISP language. Nowadays CLIPS is
a public domain software tool for developing expert systems
that supports three different programming paradigms: rule-based,
object-oriented and procedural. CLIPS is written in
C for portability and speed, and interfaces with Python. The
procedural programming capabilities provided by CLIPS are
similar to capabilities found in languages such as C, Java,
Ada, and LISP.
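The rule-based paradigm can be illustrated with a small forward-chaining loop in plain Python. This sketch does not use CLIPS or its syntax; it only shows the match-and-fire cycle that such a tool automates. The facts and rules are hypothetical examples and not validated process knowledge.

```python
def forward_chain(facts, rules):
    """Fire rules repeatedly until no new fact can be derived."""
    facts = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            # A rule fires when all its conditions are known facts
            # and its conclusion has not been asserted yet.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                fired.append(name)
                changed = True
    return facts, fired


# Hypothetical rules: (name, set of conditions, conclusion)
RULES = [
    ("r1", {"pulp_density_high", "feed_rate_high"}, "cyclone_overload"),
    ("r2", {"cyclone_overload"}, "reduce_feed_rate"),
    ("r3", {"ph_low"}, "add_lime"),
]

facts, fired = forward_chain({"pulp_density_high", "feed_rate_high"}, RULES)
```

The conclusion of one rule becomes a condition of the next, so a chain of inferences is produced from the initial sensor-derived facts, while rules whose conditions are not met stay silent.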
B. D3Web Knowledge System
The d3web system [18] is a Java-based prototyping and
development toolkit for distributed knowledge systems. It includes
a knowledge modelling environment tool (KnowME),
a visual knowledge acquisition tool, and an evaluation
& management tool.
D3web offers various problem-solving methods including:
categorical and heuristic rules
decision trees and decision tables
set-covering models
case-based reasoning
VII. TYING IT ALL TOGETHER
The architecture for a real time expert system in a
mineral processing plant is illustrated in Figure 5. The implementation
requires an OPC server that translates different
PLC protocols (Modbus, Profibus, CAN, DeviceNet, etc.) to
standard TCP/IP socket connections. The OPC server (Figure
3) will populate the Infobright IEE database as a MySQL
compatible database. A list of freely available OPC servers
is given in [13].
The OPC server (orange box) exchanges digital
and analog signals with PLC and DCS systems. The OPC to
Infobright connection (between orange and blue boxes)
is the recommended scenario for a typical mineral processing
automation system. In this connection we have
Human Machine Interfaces (HMI) and Distributed Control
Systems (DCS) with supervisory control and data acquisition
(SCADA) systems. A customized ETL is required for each
of these distinct cases. The tools described in [13] are either
pre-built or supplied as ready to build source code, or both.
Some are evaluation versions and some are downloadable
from the Web.
The grey boxes represent the Expert System, which can be
implemented using CLIPS, Python (for connection with the
database) and Pyke (for the inference engine), or D3Web.
Light yellow illustrates a typical implementation of report
generation and OLAP cube creation. Manual and automatic
process control, allowing the test and implementation of
distinct control strategies (stochastic, heuristic, deterministic,
Monte Carlo methods), is placed in the human machine
interface block.
In future work the ETL operations for qualitative data
will be implemented using Python toolboxes, while the
quantitative data will be processed using normal algorithmic
calculus implementations.
Figure 5. Real Time Mineral Processing Expert System
VIII. CONCLUSION
In the current mineral processing plants in Northern
Ontario, Canada (e.g., producing copper/nickel matte from
sulphidic ores), the level of automation is increasing due to
increased demands for productivity and efficiency as well as
the need for compliance regarding environmental factors.
Hundreds of analog and discrete signals given by sensors
and control systems are stored in databases either directly
or using real-time data management infrastructures [15]. The
implementation of a real time expert system to make use
of such data requires low latency database read cycles, and
column oriented databases tuned for performance and single
variable analysis. Most automation communications systems
can be integrated with an OPC server to send and receive
data from central or distributed systems. The divide and
conquer strategy implemented in automation systems with
local PID and fuzzy logic control makes way for distributed
artificial intelligence through remote expert systems.
This paper has addressed mineral processing data needs
with a focus on the application of data mining and data
warehousing techniques. We have provided a framework
in which a variety of software packages are put together to support
the development and implementation of a real time expert
system in a mineral processing plant. The packages include
CLIPS, D3web, and Pyke, and are either freely downloadable or open
source products available for purchase. It was found that the software
for ETL (extract, transform, and load) showed variability
among applications requiring the evaluation and selection
from a variety of available products. Those products have
been enumerated. Additionally, the parameters by which the
ETL products should be evaluated have been listed.
REFERENCES
[1] R. K. Brouwer, "Fuzzy rule extraction from feed forward neural network by training a representative fuzzy neural network using gradient descent," International Journal of Uncertainty, pp. 673-698, December 2005.
[2] J. Johnson and G. Johnson, "Infobright for analyzing social sciences data," Comm. Computer and Information Science: Database Theory and Applications, pp. 90-98, March 2009.
[3] P. Vaillancourt and J. Johnson, "Monitoring network aware sensors using BACnet," IJCNS Int. Journal of Computer Science and Network Security, pp. 15-23, November 2006.
[4] T. Yalcin, "Advanced mineral processing," LU-ENGR5207, pp. 144-161, September 2007.
[5] EUNITE roadmap, http://www.eunite.org/eunite/index.htm.
[6] Netezza, "Analytic appliance," http://www.netezza.com.
[7] Kickfire, "Kickfire's SQL chip," http://www.kickfire.com.
[8] XtremeData, "SQL in silicon," http://www.xtremedata.com.
[9] Infobright, "Open source data warehousing," http://www.infobright.com.
[10] D. Slezak, J. Wroblewski, V. Eastwood, and P. Synak, "Brighthouse: An analytic data warehouse for ad-hoc queries," PVLDB, pp. 1337-1345, June 2008.
[11] D. Slezak and M. Kowalski, "Intelligent data granulation on load: Improving Infobright's knowledge grid," Lecture Notes in Computer Science, vol. 5899, pp. 12-25, 2009.
[12] Jedox Plan Analyse Report, http://www.jedox.com, 2010.
[13] OPCconnect, http://www.opcconnect.com/freesrv.php.
[14] COMMserver, "OPC servers powered by commserve," http://www.commsvr.com/Products/OPCServer.aspx.
[15] OSIsoft PI system, http://www.osisoft.com, April 2010.
[16] J. B. Schafer, "The application of data mining to recommender systems," Encyclopedia of Data Warehousing and Mining, pp. 44-48, Mar. 2006.
[17] CLIPS, "A tool for building expert systems," http://www.clipsrules.sourceforge.net, April 2010.
[18] D3Web Knowledge Systems, http://www.d3web.sourceforge.net, April 2010.