DataExplorerPush Operator

InfoSphere Streams Version 3.0

Manasa K Rao

Toolkits

© 2012 IBM Corporation


Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

The information on the new product is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information on the new product is for informational purposes only and may not be incorporated into any contract. The information on the new product is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.



Agenda

• Overview
• Architecture Diagram
• Use Cases
• Overview of InfoSphere Data Explorer
• Terminologies and Concepts
• Software Prerequisites
• Using the DataExplorerPush operator
• Update scenario
• Using the optional error output port
• Metrics


Overview

• The DataExplorerPush operator is a Java primitive operator added to the existing BigData toolkit.

• It is a Streams sink adapter that pushes data into the IBM InfoSphere Data Explorer infrastructure.

• It is found in the namespace com.ibm.streams.bigdata.dataexplorer.

• It has one non-windowed input port and an optional error output port.

• It supports sending data of the types int8, int16, int32, int64, uint8, uint16, uint32, float32, float64, timestamp, rstring and ustring.
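The supported-type list above can be turned into a quick schema check before an application is wired up. The following is a minimal sketch in Python, not part of the toolkit: the function name and the (name, type) list representation of a schema are assumptions made for illustration.

```python
# Attribute types the slide lists as supported by DataExplorerPush.
SUPPORTED_TYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32",
    "float32", "float64",
    "timestamp", "rstring", "ustring",
}


def unsupported_attributes(schema):
    """Return the (name, type) pairs whose type is not in SUPPORTED_TYPES.

    `schema` is a list of (attribute name, SPL type name) pairs; this
    representation is a convention of this sketch, not of Streams.
    """
    return [(name, spl_type) for name, spl_type in schema
            if spl_type not in SUPPORTED_TYPES]


schema = [("a", "rstring"), ("b", "int32"), ("c", "uint16"),
          ("d", "float32"), ("e", "ustring")]
# Add a hypothetical collection-typed attribute to show a rejection.
bad = unsupported_attributes(schema + [("f", "list<int32>")])
print(bad)
```

The check is purely a lookup against the list on the slide, so any type the slide does not name (for example uint64) is reported as unsupported by this sketch.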


Architecture Diagram


Use Cases

Consider a large sports equipment manufacturer. In addition to the data sources that already exist within the firm, social media data is an indispensable source of information about user experiences and sentiment regarding its products. Streams can ingest the social media data and send it to InfoSphere Data Explorer using the DataExplorerPush operator. Combined with the existing enterprise data and the knowledge gained from analyzing it, this data can be used to quickly discover positive and negative trends, the causes of the negative trends, and leader-follower patterns, and to act on this information in time.


Overview of Data Explorer

InfoSphere® Data Explorer V8.2 can help organizations discover, navigate, and visualize vast amounts of structured and unstructured information across many enterprise systems and data repositories.

Some of the benefits that InfoSphere Data Explorer offers:

– Unlocks the value of big data by enabling organizations to quickly navigate large volumes of content to discover high-value sources.

– Creates applications that combine structured, semistructured, and unstructured information in a single interface, enabling organizations to create a complete contextual view of topics such as customers, products, employees, projects, and more.

– Delivers a new application framework component that changes the information access paradigm by proactively pushing relevant information to each user based on their activities and business context.

– Empowers organizations to cost-effectively build 360-degree information applications to improve efficiency and solve information-intensive business challenges.


Terminologies and Concepts

BigSearch API – A set of APIs for adding and modifying records in a Data Explorer index. It hides the complexity of the operation from the API user and internally uses the IBM InfoSphere Data Explorer API.

Connection document – A text file containing the information needed to connect to Data Explorer. It has the form:

zookeeperNamespace=<The name of the zookeeper namespace that is created>

zookeeperEndpoints=<A single string or a set of strings specifying the zookeeper endpoints>
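Because the connection document is a plain key=value file, it is easy to validate before starting the application. The Python sketch below is illustrative only: the function name is hypothetical, the host names and ports are made up, and the comma used to separate multiple endpoints is an assumption, since the slide does not state the separator.

```python
def parse_connection_document(text):
    """Parse key=value lines and return them as a dict.

    The zookeeperEndpoints value is split into a list of endpoints.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        settings[key.strip()] = value.strip()

    missing = {"zookeeperNamespace", "zookeeperEndpoints"} - settings.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    # Assumption: multiple endpoints are comma-separated.
    settings["zookeeperEndpoints"] = [
        e.strip() for e in settings["zookeeperEndpoints"].split(",") if e.strip()
    ]
    return settings


# Hypothetical document contents; hosts and ports are placeholders.
sample = (
    "zookeeperNamespace = Test\n"
    "zookeeperEndpoints = zk1.example.com:2181, zk2.example.com:2181\n"
)
settings = parse_connection_document(sample)
print(settings)
```

Failing early on a missing key gives a clearer message than a connection error raised later by the operator.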


Software Prerequisites

• The BigSearch API is required for using the DataExplorerPush operator.

• The BigSearch API and its dependencies must be present in a location accessible to the DataExplorerPush operator.

• The environment variable BIGSEARCH_JAR must be set to point to the BigSearch API jar.

• For example, if the BigSearch API jar file is named bigsearch1.jar and is located in /opt/DataExplorer/lib, then BIGSEARCH_JAR is set as follows:

export BIGSEARCH_JAR='/opt/DataExplorer/lib/bigsearch1.jar'
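A small pre-flight check can confirm that BIGSEARCH_JAR is set and names an existing jar before the application is compiled or submitted. This is a hedged Python sketch; the function name and the messages are invented for illustration, and the check does not verify that the jar's dependencies are present.

```python
import os


def bigsearch_jar_problem(environ=None):
    """Return a description of what is wrong with BIGSEARCH_JAR, or None."""
    environ = os.environ if environ is None else environ
    jar = environ.get("BIGSEARCH_JAR")
    if not jar:
        return "BIGSEARCH_JAR is not set"
    if not jar.endswith(".jar"):
        return f"BIGSEARCH_JAR does not name a .jar file: {jar}"
    if not os.path.isfile(jar):
        return f"BIGSEARCH_JAR points to a missing file: {jar}"
    return None


# Check the real environment of the current process.
print(bigsearch_jar_problem() or "BIGSEARCH_JAR looks usable")
```

Passing a dictionary instead of relying on os.environ makes the function easy to exercise without changing the shell environment.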


Using the DataExplorerPush operator

namespace application;

use com.ibm.streams.bigdata.dataexplorer::DataExplorerPush;

composite DataExplorerPushMain {
  graph
    // Read the sample tweets as comma-separated tuples.
    stream<rstring a, int32 b, uint16 c, float32 d, ustring e> InStream = FileSource() {
      param
        file : "Tweet.txt";
    }

    // Push each tuple to Data Explorer as a record of type Tweet, keyed by attribute c.
    () as Sink1 = DataExplorerPush(InStream) {
      param
        connectionDocument : "/home/streamsuser/connections/DataExplorerConnection.txt";
        recordType : "Tweet";
        recordIdAttribute : "c";
        retrievableAttributes : "a", "b", "c", "d", "e";
        sortableAttributes : "b", "e";
        filterableAttributes : "a";
        nonSearchableAttributes : "a";
        suppress : "d";
    }
}

Contents of DataExplorerConnection.txt

zookeeperNamespace = Test

zookeeperEndpoints = xxxxxxxxx.ibm.com

Contents of Tweet.txt

"Text1",11,1,11.1,"ai\u00f1ata"

"Text2",22,2,22.2,"bi\u00f1ata"

"Text3",33,3,33.3,"ci\u00f1ata"

"Text4",44,4,44.4,"di\u00f1ata"

"Text5",55,5,55.5,"ei\u00f1ata"


Using the DataExplorerPush operator (cont'd)

Attribute 'a' is non-searchable.

Consider the record: "Text2",22,2,22.2,"bi\u00f1ata"

Search using the value of 'b', i.e. 22, yields: (result screenshot)

Search using the value of 'a', i.e. Text2, yields: (result screenshot)


Update scenario

If a record with the same recordId as the current record already exists in the collection and has the same record type, that record is updated. In the Tweet.txt for this scenario, the record with recordId 1 (attribute c) changes its 'a' value from Text1 to Text1Changed.
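The update rule can be modelled as an upsert keyed by record type and record ID. The Python sketch below uses a plain dictionary as a stand-in for a Data Explorer collection; the function name is hypothetical, and replacing the whole stored record on update is an assumption, since the slide does not say whether fields are merged or replaced.

```python
def push(index, record_type, record_id, fields):
    """Insert the record, or update it if (record_type, record_id) exists.

    Returns "insert" or "update". `index` is a plain dict standing in
    for a Data Explorer collection.
    """
    key = (record_type, record_id)
    action = "update" if key in index else "insert"
    # Assumption: an update replaces the stored fields entirely.
    index[key] = dict(fields)
    return action


index = {}
first = push(index, "Tweet", 1, {"a": "Text1", "b": 11})
second = push(index, "Tweet", 1, {"a": "Text1Changed", "b": 11})
print(first, second, index[("Tweet", 1)]["a"])
```

Keying on both values mirrors the rule above: a record with the same ID but a different record type would be stored as a separate entry in this model.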

Contents of Tweet.txt

"Text1Changed",11,1,11.1,"ai\u00f1ata"

"Text2",22,2,22.2,"bi\u00f1ata"

"Text3",33,3,33.3,"ci\u00f1ata"

"Text4",44,4,44.4,"di\u00f1ata"

"Text5",55,5,55.5,"ei\u00f1ata"


Using the optional error output port

namespace application;

use com.ibm.streams.bigdata.dataexplorer::DataExplorerPush;

composite DataExplorerPushMain {
  graph
    stream<rstring a, int32 b, uint16 c, float32 d, ustring e> InStream = FileSource() {
      param
        file : "Tweet.txt";
    }

    // Error output port. The stream name ErrStream is illustrative; the slide omits it.
    stream<tuple<rstring a, ustring e> inTuple, rstring recordId, rstring errorMsg, rstring collectionName, rstring recordType> ErrStream = DataExplorerPush(InStream) {
      param
        connectionDocument : "/home/streamsuser/connections/DataExplorerConnection.txt";
        recordType : "Tweet";
        recordIdAttribute : "c";
        retrievableAttributes : "a", "b", "c", "d", "e";
        sortableAttributes : "b", "e";
        filterableAttributes : "a";
        nonSearchableAttributes : "a";
        suppress : "d";
    }
}

Sample tuples on the error output port:

{a="Text1Changed",e="aiñata"},"1","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"

{a="Text2",e="biñata"},"2","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"

{a="Text3",e="ciñata"},"3","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"

{a="Text4",e="diñata"},"4","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"

{a="Text5",e="eiñata"},"5","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"


Metrics

Four metrics are supported: nRecordsPushed, nRequestsOutstanding, nRecordsFailed and nRecordsWithNonIndexableFields.


Thank You


Backup Slides


Zookeeper Namespace

To create a ZooKeeper namespace, run the following from the BigSearch lib directory:

java -jar xxx.jar -n <namespace> -s <servers> -i <entitymodel file>

Entity model file: this file specifies which Velocity instance or instances the ZooKeeper namespace is being configured for, the name or names of the collections that the data must go to, and the entity type or types of the data being sent.
