DataExplorerPush Operator, InfoSphere Streams Version 3.0. Manasa K Rao, Toolkits. © 2012 IBM Corporation


DataExplorerPush Operator

InfoSphere Streams Version 3.0

Manasa K Rao, Toolkits


Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

The information on the new product is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information on the new product is for informational purposes only and may not be incorporated into any contract. The information on the new product is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.



Agenda

• Overview
• Architecture Diagram
• Use Cases
• Overview of InfoSphere Data Explorer
• Terminologies and Concepts
• Software Prerequisites
• Using the DataExplorerPush operator
• Update scenario
• Using the optional error output port
• Metrics


Overview

The DataExplorerPush operator is a Java primitive operator added to the existing BigData toolkit.

It is a Streams sink adapter that pushes data into the IBM InfoSphere Data Explorer infrastructure.

It is found in the namespace com.ibm.streams.bigdata.dataexplorer.

It has one non-windowed input port and an optional error output port.

It supports sending data of the types int8, int16, int32, int64, uint8, uint16, uint32, float32, float64, timestamp, rstring, and ustring.
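A stream schema can be checked against the supported-type list before the operator is wired into an application. The Python sketch below only illustrates that list; the function name `unsupported_attributes` and the dictionary representation of a schema are assumptions made for this example and are not part of the toolkit.

```python
# Attribute types the slide lists as supported by DataExplorerPush.
SUPPORTED_TYPES = frozenset({
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32",
    "float32", "float64",
    "timestamp", "rstring", "ustring",
})


def unsupported_attributes(schema):
    """Return the sorted names of attributes whose type is not supported.

    `schema` maps attribute names to SPL type names
    (a hypothetical representation chosen for this sketch).
    """
    return sorted(name for name, spl_type in schema.items()
                  if spl_type not in SUPPORTED_TYPES)


if __name__ == "__main__":
    tweet_schema = {"a": "rstring", "b": "int32", "c": "uint16",
                    "d": "float32", "e": "ustring"}
    print(unsupported_attributes(tweet_schema))
    print(unsupported_attributes({"flag": "boolean", "n": "int32"}))
```

An empty result means every attribute uses a type from the list; how the operator itself reacts to other types is not stated in the slides.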


Architecture Diagram


Use Cases

Consider a large sports equipment manufacturing firm. In addition to the multiple data sources that already exist within the firm, social media data is an indispensable source of information about user experiences and sentiment regarding its products. The social media data can be ingested into Streams and sent to InfoSphere Data Explorer using the DataExplorerPush operator. Combined with the existing enterprise data and the knowledge gained from analyzing it, this data can be used to quickly discover positive and negative trends, the causes of the negative trends, and leader-follower patterns, and to act on this valuable information in time.


Overview of Data Explorer

InfoSphere® Data Explorer V8.2 can help organizations discover, navigate, and visualize vast amounts of structured and unstructured information across many enterprise systems and data repositories.

Some of the benefits that InfoSphere Data Explorer offers:

– Unlocks the value of big data by enabling organizations to quickly navigate large volumes of content to discover high-value sources.

– Creates applications that combine structured, semistructured, and unstructured information in a single interface, enabling organizations to create a complete contextual view of topics such as customers, products, employees, projects, and more.

– Delivers a new application framework component that changes the information access paradigm by proactively pushing relevant information to each user based on their activities and business context.

– Empowers organizations to cost-effectively build 360-degree information applications to improve efficiency and solve information-intensive business challenges.


Terminologies and Concepts

BigSearch API – A set of APIs that lets the API user add and modify records in a Data Explorer index while hiding the complexity of the operation. Internally, it uses the IBM InfoSphere Data Explorer API.

Connection document – A text file containing the information needed to connect to Data Explorer. It is of the form:

zookeeperNamespace=<The name of the zookeeper namespace that is created>

zookeeperEndpoints=<A single string or a set of strings specifying the zookeeper endpoints>
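To make the key=value layout concrete, here is a minimal Python sketch that reads a connection document into a dictionary. It is not how the operator itself parses the file; the function name `parse_connection_document`, the host name `zk.example.com`, and the choice to treat both keys as mandatory are assumptions made for illustration. The endpoints value is kept as one string because the slides do not say how multiple endpoints are separated.

```python
REQUIRED_KEYS = {"zookeeperNamespace", "zookeeperEndpoints"}


def parse_connection_document(text):
    """Parse key=value lines into a dict, ignoring blank lines.

    Whitespace around keys and values is stripped, so both
    `key=value` and `key = value` are accepted.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        settings[key.strip()] = value.strip()
    # Assumption for this sketch: both keys must be present.
    missing = REQUIRED_KEYS - settings.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return settings


if __name__ == "__main__":
    sample = "zookeeperNamespace = Test\nzookeeperEndpoints = zk.example.com\n"
    print(parse_connection_document(sample))
```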


Software Prerequisites

The BigSearch API is required for using the DataExplorerPush operator.

The BigSearch API and its dependencies must be present in a location accessible to the DataExplorerPush operator.

The environment variable BIGSEARCH_JAR must be set to point to the BigSearch API jar.

For example, if the BigSearch API jar file is named bigsearch1.jar and is located in /opt/DataExplorer/lib, then BIGSEARCH_JAR is set as follows:

export BIGSEARCH_JAR='/opt/DataExplorer/lib/bigsearch1.jar'
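Before submitting a job, it can help to confirm that the variable is set and points at a real file. The following Python sketch is one way to do that check; the function name `bigsearch_jar_problem` is hypothetical, and the check covers only the jar named by BIGSEARCH_JAR, not its dependencies.

```python
import os


def bigsearch_jar_problem(environ=None):
    """Return a description of what is wrong with BIGSEARCH_JAR, or None if it looks usable."""
    environ = os.environ if environ is None else environ
    jar_path = environ.get("BIGSEARCH_JAR")
    if not jar_path:
        return "BIGSEARCH_JAR is not set"
    if not os.path.isfile(jar_path):
        return f"BIGSEARCH_JAR points to a missing file: {jar_path}"
    return None


if __name__ == "__main__":
    print(bigsearch_jar_problem() or "BIGSEARCH_JAR looks usable")
```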


Using the DataExplorerPush operator

namespace application;

use com.ibm.streams.bigdata.dataexplorer::DataExplorerPush;

composite DataExplorerPushMain {
  graph
    // Read the sample records from Tweet.txt.
    stream<rstring a, int32 b, uint16 c, float32 d, ustring e> InStream = FileSource() {
      param
        file : "Tweet.txt";
    }

    // Push each tuple to Data Explorer as a record of type "Tweet".
    () as Sink1 = DataExplorerPush(InStream) {
      param
        connectionDocument : "/home/streamsuser/connections/DataExplorerConnection.txt";
        recordType : "Tweet";
        recordIdAttribute : "c";
        retrievableAttributes : "a", "b", "c", "d", "e";
        sortableAttributes : "b", "e";
        filterableAttributes : "a";
        nonSearchableAttributes : "a";
        suppress : "d";
    }
}
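In the example, recordIdAttribute, retrievableAttributes, sortableAttributes, filterableAttributes, nonSearchableAttributes, and suppress all refer to attributes of the input stream by name. The Python sketch below checks such a parameter set against a schema. It is an illustration only: the function name `unknown_attribute_references` and the dictionary layout are assumptions, and the slides do not say whether the operator performs a comparable check itself.

```python
# Parameters from the example whose values are input attribute names.
ATTRIBUTE_PARAMS = ("recordIdAttribute", "retrievableAttributes",
                    "sortableAttributes", "filterableAttributes",
                    "nonSearchableAttributes", "suppress")


def unknown_attribute_references(schema_names, params):
    """Map each parameter to the names it references that are not in the schema."""
    known = set(schema_names)
    problems = {}
    for param in ATTRIBUTE_PARAMS:
        unknown = [name for name in params.get(param, []) if name not in known]
        if unknown:
            problems[param] = unknown
    return problems


if __name__ == "__main__":
    params = {
        "recordIdAttribute": ["c"],
        "retrievableAttributes": ["a", "b", "c", "d", "e"],
        "sortableAttributes": ["b", "e"],
        "filterableAttributes": ["a"],
        "nonSearchableAttributes": ["a"],
        "suppress": ["d"],
    }
    print(unknown_attribute_references(["a", "b", "c", "d", "e"], params))
```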

Contents of DataExplorerConnection.txt

zookeeperNamespace = Test

zookeeperEndpoints = xxxxxxxxx.ibm.com

Contents of Tweet.txt

"Text1",11,1,11.1,"ai\u00f1ata"

"Text2",22,2,22.2,"bi\u00f1ata"

"Text3",33,3,33.3,"ci\u00f1ata"

"Text4",44,4,44.4,"di\u00f1ata"

"Text5",55,5,55.5,"ei\u00f1ata"



Using the DataExplorerPush operator (cont'd)

Attribute 'a' is nonSearchable.

Consider the record: "Text2",22,2,22.2,"bi\u00f1ata"

Search using the value of 'b', i.e. 22, yields: [figure not preserved: search results]

Search using the value of 'a', i.e. Text2, yields: [figure not preserved: search results]


Update scenario

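If a record with the same record ID and the same record type as the current record already exists in the collection, an update is performed on that record. In the Tweet.txt contents for this scenario, attribute a of the record with record ID 1 (attribute c) changes from "Text1" to "Text1Changed".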
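The update rule can be modelled in a few lines of Python. This is a toy model of the semantics described above, not Data Explorer code: a plain dictionary stands in for the collection, the function name `push` is hypothetical, and replacing the whole stored record on update is an assumption.

```python
def push(collection, record_type, record_id, fields):
    """Add a record, or update it if the same type and ID are already present.

    Returns "added" or "updated". `collection` is a plain dict standing in
    for a Data Explorer collection (illustrative model only).
    """
    key = (record_type, record_id)
    action = "updated" if key in collection else "added"
    # Assumption: an update replaces the stored fields entirely.
    collection[key] = dict(fields)
    return action


if __name__ == "__main__":
    collection = {}
    print(push(collection, "Tweet", "1", {"a": "Text1", "b": 11}))
    print(push(collection, "Tweet", "1", {"a": "Text1Changed", "b": 11}))
    print(len(collection))
```

Pushing the second record changes the stored value of a without increasing the number of records, which mirrors the scenario on this slide.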

Contents of Tweet.txt

"Text1Changed",11,1,11.1,"ai\u00f1ata"

"Text2",22,2,22.2,"bi\u00f1ata"

"Text3",33,3,33.3,"ci\u00f1ata"

"Text4",44,4,44.4,"di\u00f1ata"

"Text5",55,5,55.5,"ei\u00f1ata"


Using the optional error output port

namespace application;

use com.ibm.streams.bigdata.dataexplorer::DataExplorerPush;

composite DataExplorerPushMain {
  graph
    stream<rstring a, int32 b, uint16 c, float32 d, ustring e> InStream = FileSource() {
      param
        file : "Tweet.txt";
    }

    // The stream name ErrorStream is illustrative; the slide omits the name.
    stream<tuple<rstring a, ustring e> inTuple, rstring recordId, rstring errorMsg,
           rstring collectionName, rstring recordType> ErrorStream = DataExplorerPush(InStream) {
      param
        connectionDocument : "/home/streamsuser/connections/DataExplorerConnection.txt";
        recordType : "Tweet";
        recordIdAttribute : "c";
        retrievableAttributes : "a", "b", "c", "d", "e";
        sortableAttributes : "b", "e";
        filterableAttributes : "a";
        nonSearchableAttributes : "a";
        suppress : "d";
    }
}

{a="Text1Changed",e="aiñata"},"1","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"

{a="Text2",e="biñata"},"2","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"

{a="Text3",e="ciñata"},"3","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"

{a="Text4",e="diñata"},"4","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"

{a="Text5",e="eiñata"},"5","com.ibm.dataexplorer.bigsearch.IndexerException: com.ibm.dataexplorer.bigsearch.IndexerException: Failure.","induceerror","Tweet"
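If the error tuples are written out as text in the form shown above, they can be split back into their five attributes for reporting. The Python sketch below handles exactly that printed form; the function name `parse_error_line` is hypothetical, and the parser assumes that no attribute value inside the braces contains the sequence `},`.

```python
import csv
import re

# Matches name="value" pairs inside the leading {...} tuple literal.
FIELD_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_error_line(line):
    """Split one error-port line into inTuple, recordId, errorMsg, collectionName, recordType."""
    match = re.match(r"^\{(.*?)\},(.*)$", line.strip())
    if match is None:
        raise ValueError(f"unrecognised error line: {line!r}")
    in_tuple = dict(FIELD_RE.findall(match.group(1)))
    # The remaining four attributes are quoted, comma-separated strings.
    record_id, error_msg, collection_name, record_type = next(csv.reader([match.group(2)]))
    return {
        "inTuple": in_tuple,
        "recordId": record_id,
        "errorMsg": error_msg,
        "collectionName": collection_name,
        "recordType": record_type,
    }


if __name__ == "__main__":
    sample = ('{a="Text2",e="biñata"},"2",'
              '"com.ibm.dataexplorer.bigsearch.IndexerException: Failure.",'
              '"induceerror","Tweet"')
    print(parse_error_line(sample))
```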


Metrics

Four metrics are supported: nRecordsPushed, nRequestsOutstanding, nRecordsFailed, and nRecordsWithNonIndexableFields.


Thank You


Backup Slides


Zookeeper Namespace

To create a zookeeper namespace, run the following in the bigsearch lib:

java -jar xxx.jar -n <namespace> -s <servers> -i <entitymodel file>

Entity model file: This file specifies the velocity instance or instances for which this zookeeper is being configured, the name or names of the collections that the data needs to go to, and the entity type or types of the data being sent.