Four ways to represent computer executable rules

Cover Page

Uploaded June 24, 2011

Four Ways to Represent Computer-Executable Rules

Author: Jeffrey G. Long

Date: July 25, 2008

Forum: Talk presented at the InterSymp 2008 Conference, sponsored by the International Institute for Advanced Studies in Systems Research and Cybernetics (IIAS). Paper published in conference proceedings, available at http://iias.info/pdf_general/Booklisting.pdf

Contents

Preprint of Article

Slides (without accompanying text) for the presentation

License

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.


Four Ways to Represent Computer-Executable Rules

Jeffrey G. Long

Abstract

Rules have long been used by society but have rarely been studied explicitly in their own right. They are increasingly recognized as interesting and useful abstractions. The recent trend towards business rules has brought the subject front-and-center in the business world, as has interest in work-process re-engineering over the past twenty years. Rules for computerized applications are currently represented in three ways:

- as software instructions
- as production rules in the rulebase of an expert system
- as pairs of XML tags.

Each of these has its strengths and weaknesses. This paper discusses these approaches and briefly describes a proposed fourth approach, namely representing most rules in a relational DBMS. I view this as an exercise in notational engineering, i.e. examining alternative representations to select the one that is “best” in some engineering sense.

Key Words: Business Rules; Software; Expert Systems; XML; Relational Databases

General Features of Rules

Any manner of representing rules must address several fundamental features, including:

- what kinds of events can initiate a cascade of rule executions
- the sequence in which rules are to be inspected, if sequence matters (including loops)
- the various conditions under which each rule is to be inspected and/or fired
- what happens if no rule, one rule, or multiple rules are found that match the selection criteria
- how to resolve conflicts if multiple actions are prescribed
- when and how to stop or complete a rule cascade.

To serve as a rule management system, such a system must also record metadata such as:

- who created or updated the rule, and when
- why the rule was created/updated
- by what means the rule was created or updated (manually, by import, by software, etc.)
- whether the rule can safely be changed without consulting others
- what kind of further “research” ought to be done regarding a rule, if any (e.g. are there questions about the rule? Might it be obsolete?).

Software Rules

Software rules are implemented as lines of code in a computer language such as Java. Such rules are typically called “business logic” rather than “business rules,” and are specified in terms of four standard programming constructs:

- an ordered sequence of instructions
- loops, used to specify conditional re-iterations of rules
- If-Then-Else statements that select among two or more options
- Case statements that select among multiple options.
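The sequence, loop, and If-Then-Else constructs, together with the parameter mechanism described below, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names and the parameter constant are hypothetical, and it assumes (as one common convention) that a fiscal year is labeled by the calendar year in which it ends.

```python
from datetime import date

# Hypothetical parameter, stored as data rather than hard-coded:
# the month (1-12) in which a particular user's fiscal year ends.
FISCAL_YEAR_END_MONTH = 6


def fiscal_year(d: date, year_end_month: int = FISCAL_YEAR_END_MONTH) -> int:
    """Return the fiscal year containing d, labeled by the calendar year
    in which that fiscal year ends (an assumed convention)."""
    # If-Then-Else: the parameter value selects the branch.
    if d.month > year_end_month:
        return d.year + 1
    else:
        return d.year


def tag_fiscal_years(dates: list[date],
                     year_end_month: int = FISCAL_YEAR_END_MONTH) -> list[tuple[date, int]]:
    """Apply the same rule to every date."""
    tagged = []
    # Loop: re-apply the rule to each record, in order (sequence).
    for d in dates:
        tagged.append((d, fiscal_year(d, year_end_month)))
    return tagged


if __name__ == "__main__":
    print(fiscal_year(date(2008, 7, 25)))      # 2009: the year ends in June
    print(fiscal_year(date(2008, 7, 25), 12))  # 2008: calendar-year user
```

Changing the parameter changes behavior without touching the branching code, but only for variations the designer anticipated; any other variation still requires a programmer to add a new parameter and branch.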

The result of executing a software rule is that either (a) internal or external data values are updated, or (b) program control passes to a specified portion of the program, where further rules are found and executed.

Because different situations often call for similar but slightly different rules, designers often introduce parameters: the code reads a parameter (typically stored as a data element) and branches to another section of code based on its value. This allows software designers to anticipate predictable differences in the way different users might want the system to work. An example of a parameter is the fiscal year-end month, which lets an accounting system handle the fact that any month may be the end of a particular user’s fiscal year.

Representing rules as software provides a very fine-grained ability to express complex and contingent rules. The downside is that there are always many such rules, typically thousands or more, so a typical software application, or even a single software object, contains thousands to millions of lines of code. This large code corpus is difficult to comprehend and, since it must evolve with new rules, incurs significant life-cycle maintenance costs. As with any complex system, changing one part may have unanticipated consequences for other parts. And since only programmers can update the code, there is always a risk of miscommunication between subject experts and programmers.

Production Rules

An expert system has an inference engine that knows only the rules of inference, a rulebase that specifies the rules (called productions), and an initial set of facts (the environment). Rules are triggered by facts, and every rule that matches the current environment is selected. These rules are added to an agenda, any conflicts are resolved (often via rule prioritization), and the remaining rules are fired.
The result of firing a rule is a change to the facts (asserting new facts or withdrawing existing ones), which may then cause other rules to fire. This process continues until a specified end-point is reached or until no rules remain on the agenda.
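The match, conflict-resolution, and firing cycle just described can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CLIPS or Jess: facts are plain tuples, a rule’s conditions are a set of facts that must all be present, conflicts are resolved by a numeric priority (ties broken by rule name), and, as in CLIPS-style engines, one rule fires per cycle before the agenda is rebuilt. The class, function, and second rule name are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

Fact = Tuple[str, str]  # e.g. ("customer", "premium")


@dataclass(frozen=True)
class Production:
    name: str
    conditions: FrozenSet[Fact]             # If-part: all must hold
    adds: FrozenSet[Fact] = frozenset()     # Then-part: facts to assert
    removes: FrozenSet[Fact] = frozenset()  # Then-part: facts to withdraw
    priority: int = 0                       # used for conflict resolution


def run(rules: Iterable[Production], facts: Iterable[Fact],
        max_cycles: int = 100) -> Tuple[Set[Fact], List[str]]:
    rules = list(rules)
    facts = set(facts)
    fired: List[str] = []
    for _ in range(max_cycles):
        # Match: every rule whose conditions hold goes on the agenda,
        # unless firing it would leave the facts unchanged.
        agenda = [r for r in rules
                  if r.conditions <= facts
                  and not (r.adds <= facts and not (r.removes & facts))]
        if not agenda:
            break  # end-point: no more rules on the agenda
        # Select: resolve the conflict set by priority, then by name.
        chosen = max(agenda, key=lambda r: (r.priority, r.name))
        # Act: change the facts, which may enable other rules next cycle.
        facts |= chosen.adds
        facts -= chosen.removes
        fired.append(chosen.name)
    return facts, fired


rules = [
    Production("good-customer-discount",
               frozenset({("product", "regular"), ("customer", "premium")}),
               adds=frozenset({("price-discount", "5%")})),
    Production("flag-discounted-order",
               frozenset({("price-discount", "5%")}),
               adds=frozenset({("order", "discounted")})),
]
final_facts, fired = run(rules, {("product", "regular"), ("customer", "premium")})
print(fired)  # ['good-customer-discount', 'flag-discounted-order']
```

The second rule fires only because the first one changed the facts, which is the cascade described above.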


Production rules are formulated in an If-Then (sometimes If-Then-Else) format. There can be any number of If-conditions, which specify the environmental conditions under which the Then-action(s) will be taken, and any number of Then-actions. Rules are typically stored in a text file that is loaded into memory at runtime, as are the initial facts. The way rules are defined (formatted) has become important for rule interchange among different systems, and in November 2007 the Object Management Group (OMG) released a beta version of its Production Rule Representation (PRR) specification.

This approach has shed light on the kind of thinking that an expert seems to do, namely to look for salient features of a given environment, respond to those features with changes to the environment, and then respond to the changed environment. Its downside is that when the rulebase exceeds a few thousand rules the system may behave unexpectedly, because rule interactions are hard to anticipate and the order of rule execution matters. Another difficulty is that there are many (possibly thousands of) free-standing, independent rules to manage, even when the rules are grouped into rulesets. Yet future expert systems will need to manage not just thousands but hundreds of thousands, even millions, of rules.

XML Rules

Much work has been done in recent years on the design and standardization of XML-based Rule Markup Languages. These are intended to make rules more easily maintainable by non-programmers, to serve the semantic web, and to define rules in a manner not tied to any particular vendor’s technology. A primary driver has been the increasing need to communicate and cooperate with numerous systems not only within an organization but now across organizations (e.g. with customers, vendors, regulatory agencies, etc.). This has led to an interest in externalizing certain rules outside of software so they may be more readily examined and changed.
The Extensible Markup Language (XML) format has been widely adopted as a general framework for the specification of rules (e.g. RuleML, R2ML). XML tags mark the beginning and end of the operators and relations to be checked for a particular rule; these may be nested and combined as necessary. Rules so marked may then be searched for and read by multiple applications. A W3C Working Group is dedicated to producing a Rule Interchange Format (RIF), and the OMG, which is working in a variety of important areas, recently released version 1.0 of its Semantics of Business Vocabulary and Business Rules (SBVR) specification.

One difficulty of this approach is that those who maintain the rules are still left with an enormous number of free-standing, independent rules to manage. Integrity constraints are being developed, but there is still no referential integrity by which an update can cascade to all places where an entity is referenced. Lastly, there is little query or reporting capability by which one can scan or update rules quickly and easily. These problems are similar to those encountered with the software representation of rules. An example of a simple RuleML rule that gives a premium customer a 5% discount on any regular product is shown in Figure 1 below.


<imp>
  <_head>
    <atom>
      <_opr><rel>discount</rel></_opr>
      <var>customer</var>
      <var>product</var>
      <ind>5.0 percent</ind>
    </atom>
  </_head>
  <_body>
    <and>
      <atom>
        <_opr><rel>premium</rel></_opr>
        <var>customer</var>
      </atom>
      <atom>
        <_opr><rel>regular</rel></_opr>
        <var>product</var>
      </atom>
    </and>
  </_body>
</imp>

Figure 1: RuleML for a Price Discount Decision
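To show how an application might read and apply the rule in Figure 1, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It is not a RuleML engine: it assumes, as in Figure 1, that the body is a single <and> of unary atoms, each with exactly one <var>, and it represents facts as hypothetical (relation, individual) pairs.

```python
import xml.etree.ElementTree as ET
from itertools import product as cartesian_product

RULE_XML = """
<imp>
  <_head>
    <atom>
      <_opr><rel>discount</rel></_opr>
      <var>customer</var>
      <var>product</var>
      <ind>5.0 percent</ind>
    </atom>
  </_head>
  <_body>
    <and>
      <atom><_opr><rel>premium</rel></_opr><var>customer</var></atom>
      <atom><_opr><rel>regular</rel></_opr><var>product</var></atom>
    </and>
  </_body>
</imp>
"""


def parse_atom(atom: ET.Element):
    """Return (relation, [(tag, text), ...]) for one <atom>."""
    rel = atom.find("_opr/rel").text
    args = [(child.tag, child.text) for child in atom if child.tag in ("var", "ind")]
    return rel, args


def parse_rule(xml_text: str):
    imp = ET.fromstring(xml_text.strip())
    head = parse_atom(imp.find("_head/atom"))
    body = [parse_atom(a) for a in imp.find("_body/and").findall("atom")]
    return head, body


def apply_rule(head, body, facts):
    """Derive head atoms for every variable binding that satisfies the body.
    Assumes each body atom is unary: rel(var)."""
    candidates = {}
    for rel, args in body:
        [(_, var)] = args
        matching = {ind for (r, ind) in facts if r == rel}
        candidates[var] = candidates.get(var, matching) & matching
    names = list(candidates)
    head_rel, head_args = head
    derived = set()
    for values in cartesian_product(*(sorted(candidates[n]) for n in names)):
        binding = dict(zip(names, values))
        derived.add((head_rel,) + tuple(
            binding[text] if tag == "var" else text for tag, text in head_args))
    return derived


head, body = parse_rule(RULE_XML)
facts = {("premium", "acme"), ("regular", "widget"), ("special", "gizmo")}
print(apply_rule(head, body, facts))
# {('discount', 'acme', 'widget', '5.0 percent')}
```

Nothing in this sketch checks that relations such as premium or regular are defined anywhere else; each rule stands alone, which is the referential-integrity gap noted above.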

Ultra-Structure Rules

Since 1985 I’ve developed and used a fourth approach, called “Ultra-Structure”. This approach removes from the software all business rules that might ever change, leaving only the control logic for a “competency rule engine” as software. The remaining rules are represented via relational tables; there are no data or facts in the system, only rules. Rules can be converted from their natural-language form (e.g. a policy manual) into one or more rules having a canonical form consisting of:

- one or more “If” statements, defining conditions under which the rule should be inspected
- one or more “Then-Consider” statements, defining additional considerations (before deciding what to do next) and/or actions
- one or more metarule data fields specifying who set up the rule, why, whether it can safely be changed without consulting others, etc.

We can then categorize those rules into a small number of formats called “ruleforms” that are defined by their form and meaning, such that any logically possible rule pertaining to an application area (e.g. order processing) can be expressed in some table in the system. This has the profound effect of reducing the myriad known (and future unknown) rules to a manageably small number of tables, typically fewer than 100 for an enterprise system.

Lastly, we can implement each ruleform as a table. All rules having the same number of If-statements and similar meanings are grouped together into one table, with the If-statements (called factors) forming columns that constitute the primary key of the table (thereby guaranteeing the uniqueness of each rule). Other columns in the table (called considerations) represent the Then-Consider statements and the metadata about the rule. Thus, most business rules are represented not as software, and not as data in XML tags, but as records (rows) in a modern RDBMS.

This approach questions decades of focus on software: it treats software as more of a problem than a solution, and focuses instead on rules represented as relational data. When business rules are specified as records in an RDBMS, the only software that remains is control logic that knows nothing about the world except which tables to look at, in what order, and what to do based on the rules selected for execution. Key benefits of this approach are that:

- the amount of software required is reduced by a factor of 10 to 100
- since this control logic is unlikely to change over time, the software and data structures stay remarkably stable even as the rules continue to evolve
- rules can evolve by simply changing data, without any software changes, so many kinds of changes can be implemented immediately
- subject experts and business managers can explain new rules to business analysts (not only programmers), who can then directly update the rules through the RDBMS.

The key benefits of using a relational database to store such rules are that the RDBMS:

- provides access security and logging of changes
- provides utilities for querying and reporting on large numbers (millions) of rules
- guarantees referential integrity
- can easily handle millions of rules as necessary.
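The ruleform idea can be sketched with Python’s built-in sqlite3 module. The table, its columns, and the sample rules are hypothetical (the paper does not give a schema): two factors form the composite primary key, one consideration holds a price multiplier, and two columns hold metarule data. The sketch illustrates three points made above: the key guarantees that each rule is unique, the lookup code is generic, and a rule changes by updating a row rather than by editing software.

```python
import sqlite3


def make_ruleform_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE discount_rules (
            customer_type TEXT NOT NULL,  -- factor
            product_type  TEXT NOT NULL,  -- factor
            price_factor  REAL NOT NULL,  -- consideration
            updated_by    TEXT,           -- metarule: who
            reason        TEXT,           -- metarule: why
            PRIMARY KEY (customer_type, product_type)
        )""")
    with conn:  # commit the initial rules
        conn.executemany(
            "INSERT INTO discount_rules VALUES (?, ?, ?, ?, ?)",
            [("premium", "regular", 0.95, "analyst", "premium customer discount"),
             ("standard", "regular", 1.00, "analyst", "no discount")])
    return conn


def price_factor(conn, customer_type, product_type):
    """Generic control logic: look up the rule selected by the factors.
    Returns None when no rule anticipates this case."""
    row = conn.execute(
        "SELECT price_factor FROM discount_rules "
        "WHERE customer_type = ? AND product_type = ?",
        (customer_type, product_type)).fetchone()
    return row[0] if row else None


def add_rule(conn, *rule) -> bool:
    """Insert a rule; the primary key rejects a second rule for the same factors."""
    try:
        with conn:
            conn.execute("INSERT INTO discount_rules VALUES (?, ?, ?, ?, ?)", rule)
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_ruleform_db()
print(price_factor(conn, "premium", "regular"))  # 0.95
print(add_rule(conn, "premium", "regular", 0.90, "analyst", "conflicting rule"))  # False
# Changing a rule is a data update, not a code change:
with conn:
    conn.execute("UPDATE discount_rules SET price_factor = 0.93, reason = 'promotion' "
                 "WHERE customer_type = 'premium' AND product_type = 'regular'")
print(price_factor(conn, "premium", "regular"))  # 0.93
```

The None result for an unmatched combination is a reminder of the first open issue noted below: the table alone cannot say which conditions no rule has anticipated.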

This approach is not presented as a perfect solution to the software bottleneck. Still to be addressed are (a) the need to determine when conditions arise that no rule in the system has anticipated, (b) the difficulty conventional programmers have with looking in two places (the “data” as well as the software) to understand the logic of a situation, and (c) the semantics of data, such that each data element (such as “order date”) really means the same thing to all parties. The OMG is working to address this last issue with its new SBVR standard. We recently used this approach to create and install an enterprise system for a US$175M wholesale distributor.

References

Long, J., and Denning, D. (1995), “Ultra-Structure: A design theory for complex systems and processes,” Communications of the ACM, Vol. 38, No. 1 (pp. 105-120)

Four Ways to Represent Computer-Executable Rules

Jeffrey G. Long

IIAS Baden-Baden Conference, July 2008

Minimum Requirements of Rule Management

The sequence in which rules are to be inspected, if sequence matters (including loops)

The various conditions under which each rule is to be inspected and/or fired

What happens if no rule, one rule, or multiple rules are found that match selection criteria

How to resolve conflicts if multiple actions are prescribed

When and how to stop/end a rule cascade

Exceptions to rules are rules also.


Conventional Ways to Represent Rules

Software (e.g. Java, C#)

Production Rules (e.g. CLIPS, Jess)

XML (e.g. RuleML, JessML)

Natural languages

Mathematical functions

Chemical formulae

Music notation


Software Rules

If (premium customer) and (regular product)
– Then (discount is 5%)
– Else (discount is 0%)

Select Case (customer category)
– Case “Premium”
    Select Case (product category)
    – Case “Regular”
        discount = 5%


Features of Software as a Notational System

Many valid ways to express a given rule
– both a strength and a weakness, depending on the programmer

Seemingly easy to change
– but changes often create new and unexpected problems

The starting point, stopping point, and sequence of operations are defined wholly and explicitly by the programmer

Control is based on program structure; rules (lines of code) are data-insensitive and ordered

One missing bracket changes a rule, and can make it and the entire system inoperable (unexecutable)


XML Rules

<imp>
  <_head>
    <atom>
      <_opr><rel>discount</rel></_opr>
      <var>customer</var>
      <var>product</var>
      <ind>5.0 percent</ind>
    </atom>
  </_head>
  <_body>
    <and>
      <atom>
        <_opr><rel>premium</rel></_opr>
        <var>customer</var>
      </atom>
      <atom>
        <_opr><rel>regular</rel></_opr>
        <var>product</var>
      </atom>
    </and>
  </_body>
</imp>


XML Rule Markup Features

Vendor-independent standard. Other rule standardization efforts include RIF, PRR, CL, SBVR; open source rules communities include JBoss Rules, Jess, Prova, OO jDrew, Mandarax, XSB, XQuery

Designed for use on Semantic Web – distributed, (partially) open, heterogeneous environments

One missing bracket changes rule, can make it unexecutable


Production Rules

(defrule MAIN::good-customer-discount
   (product is regular)
   (customer is premium)
   =>
   (assert (price-discount is 5%)))


Production Rule Features

The knowledge (rules) and the data (facts and instances) are separated, and the inference engine is used to apply the knowledge to the data

Rules are data-sensitive and unordered; control is based on data state

There are three phases: rule-matching, rule-selection, and rule-execution

There are limited choices during rule selection, depending on the inference engine used to resolve a conflict set


Real-World Rules are More Complex

Must be inspected from the most specific circumstances (exceptions) to the most general (whole classes)

Have multiple circumstances (3-10 “factors”)
– Each factor has many possible values (5+)

Circumstances trigger further inspection of complex “considerations” (e.g. QOH, quantity on hand)

After being selected, additional rules may need to determine final outcome (e.g. lowest price)
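Inspecting rules from the most specific circumstances to the most general can be sketched in Python by generalizing factors one at a time with a wildcard. The wildcard symbol, the sample rules, and the tie-breaking order among equally specific keys are all assumptions made for illustration; only the 5% premium/regular discount comes from the earlier slides.

```python
from itertools import combinations
from typing import Dict, Iterator, Optional, Tuple

ANY = "*"  # hypothetical wildcard: "any value of this factor"

# Factors: (customer_type, product_type) -> price multiplier (consideration).
RULES: Dict[Tuple[str, ...], float] = {
    ("premium", "regular"): 0.95,  # exception for a specific combination
    (ANY, "closeout"): 0.80,       # hypothetical rule for a whole class
    (ANY, ANY): 1.00,              # most general default
}


def candidate_keys(values: Tuple[str, ...]) -> Iterator[Tuple[str, ...]]:
    """Yield lookup keys from most specific (no wildcards) to most general."""
    n = len(values)
    for k in range(n + 1):  # k = number of factors replaced by ANY
        for positions in combinations(range(n), k):
            yield tuple(ANY if i in positions else v for i, v in enumerate(values))


def lookup(values: Tuple[str, ...], rules=RULES) -> Optional[float]:
    for key in candidate_keys(values):
        if key in rules:
            return rules[key]  # first (most specific) match wins
    return None  # no rule anticipated this combination


print(lookup(("premium", "regular")))   # 0.95
print(lookup(("premium", "closeout")))  # 0.8
print(lookup(("standard", "regular")))  # 1.0
```

With n factors this generates up to 2^n candidate keys per lookup; the ordering of the loop, not the order in which rules were written, decides which rule wins.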


But They Don’t Easily Handle Many Rules Having Multiple Factors and Multiple Values

[Figure: order-entry decision tree testing “Product Type = Regular?” and “Customer Type = Premium?”, with outcomes Price = Price * 1.00, Price = Price * 0.95, and Price = Price * 0.90]


Additional Management Requirements

Who created or updated the rule, and when it was last updated

Why the rule was created/updated

By what device the rule was created or updated (manually, by import, by software, etc.)

Whether the rule can safely be changed by a person without consulting others

What kind of further “research” ought to be done regarding a rule, if any, e.g. are there questions about the rule? Might it be obsolete?


Merge Tools & Techniques of:

Information Management
– databases, industrial-strength platforms

Knowledge Management
– repository for knowledge of the organization, both human-oriented and machine-oriented

Knowledge Engineering
– simulation of expert decision-making with continuous decision process improvement


Ultra-Structure Rules


Ultra-Structure Provides Rules with Place-Value

Existing Options
– freedom of expression means complex syntax
– semantics is assigned largely by syntax
– result is great freedom but low manageability

Ultra-Structure
– expression of rules is constrained by ruleforms
– semantics is assigned positionally
– result is adequate freedom plus high manageability


Ruleforms Define Place-Value Rule Semantics


Benefits

Rule-recognition is triggered not by working-memory state but by events; different events involve different rules

Able to define and manage more complex rules
– multiple factors and multiple values per factor address the need for a high number of possible permutations
– multiple considerations applied during rule-recognition

RDBMS permits better management of millions of rules
– using standard RDBMS tools, report-writers, etc.
– can be read and managed by subject experts

Can exchange tables of rules as data
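The first benefit, rule-recognition triggered by events, can be sketched as generic control logic that knows only which ruleforms to consult for each event, and in what order. Everything here is hypothetical, including the event names, the ruleform names, and the choice to keep the event-to-ruleform sequence as data of its own; it is a sketch of the principle, not the Ultra-Structure engine.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ruleforms: factor values -> consideration.
RULEFORMS: Dict[str, Dict[Tuple[str, ...], str]] = {
    "credit_check": {("premium",): "approve", ("standard",): "review"},
    "stock_check": {("widget",): "ship", ("gizmo",): "backorder"},
}

# Hypothetical sequencing data: for each event, the ruleforms to inspect
# in order, and which event fields supply each ruleform's factors.
SEQUENCE: Dict[str, List[Tuple[str, Tuple[str, ...]]]] = {
    "order_entered": [("credit_check", ("customer_type",)),
                      ("stock_check", ("product",))],
    "payment_received": [("credit_check", ("customer_type",))],
}


def handle(event_type: str, event: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Generic control logic: it contains no business rules itself."""
    results = []
    for ruleform, factor_fields in SEQUENCE.get(event_type, []):
        key = tuple(event[field] for field in factor_fields)
        results.append((ruleform, RULEFORMS[ruleform].get(key)))
    return results


print(handle("order_entered", {"customer_type": "premium", "product": "gizmo"}))
# [('credit_check', 'approve'), ('stock_check', 'backorder')]
```

Adding an event, or changing the order of inspection, means editing SEQUENCE rather than handle, which is the sense in which the control logic stays stable while the rules evolve.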


Conclusion

The problems with rule management are primarily caused by how we represent rules

This is a classic notation/representation problem

Ultra-Structure uses a new abstraction (i.e. ruleforms) to provide a time-tested way of assigning meaning by column


References

J. Long, D. Denning (1995), “Ultra-Structure: A design theory for complex systems and processes”; Communications of the ACM Vol. 38, No. 1 (pp. 105-120)

H. Boley, S. Tabet, G. Wagner, “Design Rationale for RuleML: A Markup Language for Semantic Web Rules” at citeseer.ist.psu.edu/boley01design.html

CLIPS Reference Manual (3/28/2008)


Other Articles by JL

Long, J., "Automated Identification of Sensitive Information in Documents Using Ultra-Structure". In Proceedings of the 20th Annual ASEM Conference, American Society for Engineering Management (October 1999)

Long, J., "Editor's Note." In Long, J. (guest editor), Semiotica Special Issue: Notational Engineering, Volume 125-1/3 (1999)

Long, J., "A new notation for representing business and other rules." In Long, J. (guest editor), Semiotica Special Issue: Notational Engineering, Volume 125-1/3 (pp. 215-227) (1999)

Long, J., "How could the notation be the limitation?" In Long, J. (guest editor), Semiotica Special Issue: Notational Engineering, Volume 125-1/3 (1999)


Writings by Others

Shostko, A., “Design of an automatic course-scheduling system using Ultra-Structure.” In Long, J. (guest editor), Semiotica Special Issue: Notational Engineering, Volume 125-1/3 (1999)

Oh, Y., and Scotti, R., “Analysis and Design of a Database using Ultra-Structure Theory (UST) – Conversion of a Traditional Software System to One Based on UST,” Proceedings of the 20th Annual Conference, American Society for Engineering Management (1999)

Parmelee, M., “Design For Change: Ontology-Driven Knowledgebase Applications For Dynamic Biological Domains.” Master’s Paper for the M.S. in I.S. degree, University of North Carolina, Chapel Hill (November 2002)

Maier, C., “CoRE576: An Exploration of the Ultra-Structure Notational System for Systems Biology Research.” Master’s Paper for the M.S. in I.S. degree, University of North Carolina, Chapel Hill (April 2006)
