Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled, isolated environment. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
Notices and Disclaimers
2
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
•IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Notices and Disclaimers cont.
3
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Please Note:
4
You are custodian of the most valuable data within the enterprise IF you can release it for business value
Are you an Analytics Rockstar?
5
6
Organizations with a highly engaged workforce significantly outperform those without
The shift to digital now makes analysis of engagement networks possible
7
Can we use analytics to better understand employee engagement and its impact on the business?
ODPi (Open Data Platform Initiative, odpi.org)
10
ODPi is an industry effort to promote and advance the state of Apache Hadoop and Big Data technology for the enterprise. It currently has 24 member companies.
IBM is a founding member of ODPi and is one of 4 members to release a data platform based on the ODP core: IBM Open Platform.
Priorities
− Certifications for ODPi compatible distributions
− Guidelines for ODPi ISVs and consumers
− Introduce more big data projects into ODPi
Data Exchange
Data Scientist & Developer Platform Services
Analytic Services
Data Processing & Management
IBM Open Platform (ibm.biz/ibmopenplatform)
11
IBM Engagement Analytics (ibm.com/engage)
12
Data Exchange
Data Scientist & Developer Platform Services
Analytic Services
Data Processing & Management
Helps each employee better understand their engagement and reputation, and activate their network more effectively for maximum value
The Personal Social Dashboard
14
Activity: Measure of your activity
Reaction: Measure of how people respond to your activity
Eminence: Measure of how people respond to you
Network: Measure of the quality of your network and your role within it
Helps management better understand overall engagement and organizational health, identify issues, and act accordingly
– Shows connectivity within & between teams
– Identifies people who play key roles
– Highlights organizational brittleness
The Organizational Dashboard
15
Many analyses are actionable, with recommendations
17
Understand your engagement & reputation within the social network
Act on your personal recommendations to drive improvement
Based on Recommendation Templates & Network Analysis:
Employee Matching: Based on a person’s social activity, determine whether, and to what level, they fit a specific social engagement trait
Template Instantiation: Generate recommendations that, if followed, can change and strengthen their engagement patterns
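The two steps can be sketched in a few lines of Python. This is only an illustration: the "silent reader" trait, the activity counters ("reads", "posts"), the 0.5 cut-off and the recommendation text are all hypothetical and are not taken from the product.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Dict, Optional

Activity = Dict[str, int]  # hypothetical per-person counters, e.g. {"reads": 10, "posts": 1}


@dataclass
class Trait:
    name: str
    fit: Callable[[Activity], float]  # returns a level between 0.0 (no fit) and 1.0 (full fit)
    template: Template                # recommendation text with $placeholders


def silent_reader_fit(activity: Activity) -> float:
    """Illustrative trait: the person reads a lot but rarely posts."""
    reads = activity.get("reads", 0)
    posts = activity.get("posts", 0)
    if reads == 0:
        return 0.0
    return max(0.0, 1.0 - posts / reads)


SILENT_READER = Trait(
    name="silent reader",
    fit=silent_reader_fit,
    template=Template("$person: you read $reads items but posted $posts; "
                      "try commenting on one of them."),
)


def recommend(person: str, activity: Activity, trait: Trait,
              min_level: float = 0.5) -> Optional[str]:
    """Employee matching first; template instantiation only if the fit is strong enough."""
    level = trait.fit(activity)
    if level < min_level:
        return None
    return trait.template.substitute(
        person=person,
        reads=activity.get("reads", 0),
        posts=activity.get("posts", 0),
    )
```

For example, `recommend("Amy", {"reads": 10, "posts": 1}, SILENT_READER)` produces a recommendation, while a person who posts as often as they read gets none.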
Innovation & Advocacy
18
#1 Collaboration Does Impact Business Outcome
• Engaged employees are 120% more likely to generate Innovation and 150% more likely to demonstrate Customer Advocacy
#2 Optimal Behavior is Different for Everyone
• A variety of interactions most effectively contribute to business outcome
#3 Discovering & Disseminating Optimal Behaviors is Key to Improving Business Outcome
• The Personal Social Dashboard provides such a channel
Employee Retention
19
Does engagement change prior to an attrition event?
− Analyzed organizational, social, and retention data
− Inspected 10,000 random employees as a control group and 1188 employees who quit
Yes! And engagement analytics can help to predict attrition events
− Social Behavior Patterns: less engaged, with differences in types of activity
− Volume of Activity: less activity several months prior to attrition event
− Network: Attrition is viral (common manager, passive and active network)
[Figure: a cloud of "data" labels surrounding "Analytics" and "Business Insights"]
Our scope: making sense of the data
23
Date/time          Latitude     Longitude
...                ...          ...
01/10/2015 16:15   53.3330556   -6.2488889
01/10/2015 16:30   53.4         -6.4666667
01/10/2015 16:45   53.4         -6.4666667
01/10/2015 17:00   53.4         -6.4666667
01/10/2015 17:15   53.4         -6.4666667
01/10/2015 15:45   53.4         -6.4666667
01/10/2015 15:45   53.3330556   -6.4666667
...                ...          ...
Where the person lives
− House, apartment, ...
− Type of neighbourhood
Where the person works
− Potential income indications
− Type of work
Where the person shops
− Type of supermarket
− Practice sport (cycling, running)
...
Locations for one person over one year
25
Date/time          Latitude     Longitude     Person
...                ...          ...           ...
01/10/2015 16:15   53.3330556   -6.2488889    Vincent
01/10/2015 16:15   48.623881    7.747846      Bob
01/10/2015 16:15   28.497371    -81.407531    Sally
01/10/2015 16:15   53.4         -6.4666667    James
01/10/2015 16:30   53.4         -6.4666667    Vincent
01/10/2015 16:30   48.623881    7.747846      Bob
01/10/2015 16:30   28.497371    -81.407531    Sally
...                ...          ...           ...
Social connections (2 or more people at the same location on a regular basis)
− Build general patterns to predict preferences and behaviors
− People who live in X and shop in Y tend to like Z
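A minimal sketch of how "2 or more people at the same location on a regular basis" could be detected from rows like those in the table. It assumes exact matches on timestamp and coordinates and a hypothetical threshold of two sightings; real location data would need rounding or clustering first.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_located_pairs(rows, min_occurrences=2):
    """rows: iterable of (timestamp, latitude, longitude, person).

    Returns {(person_a, person_b): count} for every pair seen at the same
    place and time at least min_occurrences times.
    """
    present = defaultdict(set)
    for timestamp, latitude, longitude, person in rows:
        present[(timestamp, latitude, longitude)].add(person)

    counts = Counter()
    for people in present.values():
        # Sorting gives each pair a stable (alphabetical) key.
        for pair in combinations(sorted(people), 2):
            counts[pair] += 1

    return {pair: n for pair, n in counts.items() if n >= min_occurrences}
```

Pairs that pass the threshold are candidate social connections; people who never share a place and time do not appear in the result.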
Locations for multiple people over one year
26
IBM Connections
Social events
Business Insights
Analytics
Collaboration tool ... that lets you “look under the hood”
− Connections generates discrete events about who did what in the system at a very granular level
− Applying analytics to a large number of events makes it possible to derive patterns, statistics ... business insights
Value of IBM Connections
27
Home page: See what's happening across your social network
Communities: Work with people who share common roles and expertise
Files: Post, share, and discover documents, presentations, images, and more
Micro-blogging: Reach out to your social network for help
Profiles: Find the people you need
Wikis: Create web content together
Activities: Organize your work and tap your professional network
Bookmarks: Save, share, and discover bookmarks
Blogs: Present your own ideas, and learn from others
Forums: Exchange ideas with, and benefit from the expertise of, others
IBM Connections
29
IBM Connections provides APIs and SPIs that allow the value of the social data to be maximized by external systems:
− ALL Connections data can be accessed by external systems
− Open, transparent, breaking down silos
Pull data from IBM Connections
− Programmatically access much of the same information that you can through the IBM Connections user interface
Have Connections push data to you
− All data change (CUD) events in all IBM Connections components can be supplied to external consumers
Connections Maximizes The Value of Social Data
30
[Architecture diagram. Labels: IBM Connections Apps, Common Services, Directory, Search, Person Card, User Directory, Navigational Header, JMX / WSAdmin Administration, RDB, File System]
Connections Architecture
31
[Architecture diagram. Labels: IBM Connections Apps, Common Services, Directory, Search, Person Card, User Directory, Navigational Header, JMX / WSAdmin Administration, RDB, File System; HTTP Server & Proxy Cache; REST API and Connections Atom API (GET, POST, PUT, DELETE; Atom Feed, Atom Entry, JSON, HTML, HTML Form, JavaScript); clients: Browser, Feed Reader, Sametime, Portlets, Lotus Notes, Microsoft Office, Mashups, Your App]
Connections Architecture
32
[Architecture diagram. Labels: IBM Connections Apps, Common Services, Directory, Search, Person Card, User Directory, Navigational Header, JMX / WSAdmin Administration, RDB, File System; HTTP Server & Proxy Cache; REST API and Connections Atom API (GET, POST, PUT, DELETE; Atom Feed, Atom Entry, JSON, HTML, HTML Form, JavaScript); clients: Browser, Feed Reader, Sametime, Portlets, Lotus Notes, Microsoft Office, Mashups, Your App; Event SPI, Integration bus, Other Enterprise Services, Your App]
Connections Architecture
33
Designed to allow 3rd parties to be notified whenever a data change happens in any IBM Connections service
− Real-time events generated by IBM Connections include all create, update, and delete (CUD) operations
− Potential to represent the complete interaction footprint of the enterprise
− Lets you capture, persist, model, analyze, visualize and monetize your enterprise network
SPI (System Programming Interface) vs API (Application Programming Interface)
− The SPI sits at a lower level than the APIs: you contribute Java code at system level
− By contributing Java code written to this SPI, 3rd parties can listen to creation, deletion and update (and more!) events of content within IBM Connections
The Event SPI is the social data fire-hose
34
Events: collections of data generated when activities (data-modifying, notifications) occur in IBM Connections
− In the SPI, an event is represented by a Java bean / object
− An event encapsulates data such as the type of action and the object (and container) involved in the action
Events are delivered to Event Handlers:
− An event handler is a Java class implemented by a 3rd party (you!)
− Event handlers are registered in an XML file (events-config.xml)
− The registration specifies which types of event to send to a given handler
− Connections delivers the Java bean representing the event to the registered event handler(s)
[Diagram labels: Event SPI; Handler 1, Handler 2, ... Handler N; events-config.xml]
Event SPI – Programming aspects
35
The Event SPI relies on event handlers written in Java to allow vendors to listen to and process events generated by the system
− Running external (untrusted) code on Cloud is not possible
− Running 3rd party code on the same WebSphere servers as our applications is not safe
− Multitenancy issues
Introducing Switchbox
− Our plan is to allow customers/vendors to listen to events generated for their own organization on our Cloud applications without running code on our system
− Already leveraged by compliance solutions
− Currently being implemented for broader consumption, not available as of now
Cloud considerations
36
Reliable delivery mechanism
− Delivery at least once, with support for recovery from network failure
− Latency tolerant
Ease of transition between on-premise and Cloud
− Java event handlers implemented for the Event SPI can be run by the Switchbox client
− Main difference: the event handlers are deployed and run on customer infrastructure, outside the IBM Connections datacenter
− The Switchbox client invokes event handlers upon reception of an event
Base for generation of events from most IBM social apps (Sametime)
[Diagram labels: IBM Connections Cloud infrastructure (Event SPI, Switchbox handler, Switchbox server); Customer infrastructure (Switchbox client, Handler 1, Handler 2)]
Switchbox is not currently available. This diagram shows our desire to provide such a solution, so that customers can consume events from their own organization on Cloud.
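At-least-once delivery means the same event can arrive more than once, for instance after a network failure and retry, so handlers generally need to tolerate duplicates. A minimal Python sketch of that idea follows; it assumes every event carries a unique "id" field, which is a hypothetical name, and the real handlers would be Java classes.

```python
from collections import OrderedDict


class DeduplicatingHandler:
    """Wraps a handler so that redelivered events are processed only once."""

    def __init__(self, delegate, max_remembered=10000):
        self._delegate = delegate            # callable taking one event dict
        self._seen = OrderedDict()           # ids of recently processed events
        self._max_remembered = max_remembered

    def handle(self, event):
        event_id = event["id"]               # assumed unique per event
        if event_id in self._seen:
            return False                     # duplicate delivery: skip it
        # Process first: if this raises, the id is not recorded and a
        # redelivery of the same event will be processed again.
        self._delegate(event)
        self._seen[event_id] = True
        if len(self._seen) > self._max_remembered:
            self._seen.popitem(last=False)   # forget the oldest id
        return True
```

The bounded memory of seen ids is a design trade-off: it keeps the handler cheap, at the cost of not catching a duplicate that arrives after more than `max_remembered` other events.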
Cloud considerations
37
blog.entry.created: “Amy Jones posted a blog entry in the blog named XYZ”
Actor: The person who initiated this action.
− Details: External id, name and, if not disabled, email address
Type: Type of action
− Example: CREATE, UPDATE, DELETE, NOTIFY, MEMBERSHIP, ...
Item: General concept for representing an individual entity within a container
− Details: id, name, textual content, HTML and ATOM paths
Container: General concept for representing a "bucket" or "container" that contains other items
− Details: id, name
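To make the four parts concrete, here is a small Python model of the blog example. The class and field names are illustrative only and do not mirror the Java beans of the Event SPI; the item name and ids are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Actor:
    external_id: str
    name: str
    email: Optional[str] = None   # may be absent if email exposure is disabled


@dataclass(frozen=True)
class Item:
    id: str
    name: str
    content: str = ""
    html_path: Optional[str] = None
    atom_path: Optional[str] = None


@dataclass(frozen=True)
class Container:
    id: str
    name: str


@dataclass(frozen=True)
class Event:
    name: str        # e.g. "blog.entry.created"
    type: str        # e.g. "CREATE", "UPDATE", "DELETE"
    actor: Actor
    item: Item
    container: Container


_VERBS = {"CREATE": "created", "UPDATE": "updated", "DELETE": "deleted"}


def describe(event: Event) -> str:
    """Render an event as a short human-readable sentence."""
    verb = _VERBS.get(event.type, event.type.lower())
    return f"{event.actor.name} {verb} '{event.item.name}' in '{event.container.name}'"


example = Event(
    name="blog.entry.created",
    type="CREATE",
    actor=Actor(external_id="hypothetical-id-1", name="Amy Jones"),
    item=Item(id="entry-1", name="My first entry"),
    container=Container(id="blog-1", name="XYZ"),
)
```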
Event SPI – available data in each event
38
Many more data fields encapsulated in events:
− Correlation item set to represent parent-child relationship (events about commenting action)
− Target set, making it possible to deduce interaction between content and people
− Membership delta field, indicating who has been added/removed from a community, activity, ...
− ... see Event SPI documentation for full list (JavaDoc)
Key point: the event model encapsulates all of the data needed to understand the interaction between people, content and containers in the platform
Event SPI – available data in each event
39
Challenges of analytics:
Large incoming event stream
− 100+ CUD events per second
− Growing over the longer term
− Scalable framework for analysis
− Horizontal scale to address growth
(Near) real-time indexing
No data loss
Event SPI in the context of an analytic solution
40
Analysis, even basic, is time consuming, thus:
Analysis should not occur in the event handler, but in an external system (“Analytics Service”)
The event handler should not wait until the analytic service processes the event
− It would result in an accumulation of events at Connections level
− Problematic, as the Connections queue retaining events to be delivered to the event handler has a limited depth
=> Design the event handler to consume and process events as fast as possible, i.e. as the interface between IBM Connections and an external system
[Diagram labels: Event SPI, Event Handler, “Data backbone” (storage for asynchronous processing), Analytics Service]
Goal: retaining as many events as possible for further analysis
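The decoupling can be sketched in Python with an in-memory queue standing in for the data backbone. This only illustrates the pattern: a real backbone would be a durable, distributed store, the real handler is a Java class, and the class and parameter names here are hypothetical.

```python
import queue
import threading


class BufferingHandler:
    """Accepts events without blocking; a background worker does the analysis."""

    def __init__(self, analyze, capacity=1000):
        self._backbone = queue.Queue(maxsize=capacity)  # stand-in for the data backbone
        self._analyze = analyze                         # slow analytics callable
        self.dropped = 0                                # events lost because the buffer was full
        self.failed = 0                                 # events whose analysis raised
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def handle(self, event):
        """Called on the delivery path: must return quickly."""
        try:
            self._backbone.put_nowait(event)
        except queue.Full:
            self.dropped += 1

    def _drain(self):
        while True:
            event = self._backbone.get()
            try:
                self._analyze(event)
            except Exception:
                self.failed += 1
            finally:
                self._backbone.task_done()

    def wait_until_idle(self):
        """Block until every buffered event has been analyzed."""
        self._backbone.join()
```

Counting dropped events makes the limitation of an in-memory buffer visible; a persistent backbone is what lets a real solution aim for no data loss.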
Taming the fire-hose... (1/2)
41
Characteristics of the data backbone
− Distributed and highly available
− Horizontal scale
− High throughput
− Agnostic to consumers' state
Multiple options
− Message broker: MQ / MQTT / ActiveMQ / Apache Kafka
− Database
− ...
Taming the fire-hose... (2/2)
42
Send JSON representation of the event. Serialization to JSON through Open Source GSON library
Java class implementing the EventHandler interface
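The code shown on the original slide is not part of this text. As a rough stand-in, the same idea (serialize the event to JSON, hand it to the broker, return) looks like this in Python. The `publish` callable and the topic name are hypothetical; in practice `publish` could be a thin wrapper around a Kafka producer's send call, and the actual handler would be Java using GSON.

```python
import json


class JsonForwardingHandler:
    """Serializes each event to JSON and forwards it to a message broker."""

    def __init__(self, publish, topic="connections-events"):
        self._publish = publish   # callable(topic: str, payload: bytes), assumed non-blocking
        self._topic = topic

    def handle(self, event: dict) -> None:
        # sort_keys gives a stable byte representation for identical events.
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        self._publish(self._topic, payload)
```

Injecting `publish` keeps the handler independent of any particular broker client and easy to exercise without a running broker.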
Integration with a message broker – Apache Kafka
43
Registration – through events-config.xml
Java class implementing the EventHandler interface
Subscriptions define the events delivered by the SPI to the event handler.
Filtered by event name, source (IBM service), and/or type (CREATE, UPDATE, DELETE, ...)
Properties: name/value pairs injected into the event handler Java class. Typically used to pass configuration settings
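The registration concepts (handler class, subscriptions, injected properties) can be modelled in a few lines of Python. This is a conceptual sketch, not the events-config.xml schema: the class names, the "*" wildcard matching, the "source" values and the "wiki.page.updated" event name are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


class Subscription:
    """Filter on event name, source and type; "*" matches anything."""

    def __init__(self, event_name="*", source="*", type="*"):
        self.event_name = event_name
        self.source = source
        self.type = type

    def matches(self, event):
        return (fnmatchcase(event["name"], self.event_name)
                and fnmatchcase(event["source"], self.source)
                and fnmatchcase(event["type"], self.type))


class Registration:
    """One registered handler: its class, subscriptions and properties."""

    def __init__(self, handler_class, subscriptions, properties=None):
        # Properties are injected into the handler as configuration settings.
        self.handler = handler_class(**(properties or {}))
        self.subscriptions = subscriptions


def dispatch(event, registrations):
    """Deliver the event to every handler with a matching subscription."""
    delivered = 0
    for registration in registrations:
        if any(s.matches(event) for s in registration.subscriptions):
            registration.handler.handle(event)
            delivered += 1
    return delivered


class RecordingHandler:
    """Sample handler that remembers the names of the events it received."""

    def __init__(self, topic="default"):
        self.topic = topic
        self.events = []

    def handle(self, event):
        self.events.append(event["name"])
```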
Integration with a message broker – Apache Kafka
44
Deployment – jar and dependencies made available to the SPI (running in the IBM Connections News application) through a Shared Library in WebSphere Application Server
Integration with a message broker – Apache Kafka
45
IBM Connections provides OpenSocial Activity Streams APIs allowing 3rd parties to push their own events to the Activity Stream
Since Connections 4.5:
− Events pushed through the Activity Stream APIs are also surfaced in the Event SPI
− An option makes it possible NOT to surface an event in the Activity Stream APIs, i.e. to surface it only through the Event SPI
=> 3rd party applications can also participate in the social analytics graph simply by publishing to the Connections Activity Stream APIs
3rd party events can also participate in the social analytics solution
46
Good news:
Events surface, in most cases, all the data needed for analytics purposes (including the content the event is about)
Events about the same object repeat data
− If there are X events about the same object, the item/correlation data set will always contain the most up-to-date information about the referenced object
For an analytic solution, in a nutshell, this means that the Event SPI should be sufficient in most cases
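Because each event repeats the current state of the object it refers to, an analytics store can simply keep the newest snapshot per object. A minimal sketch, assuming each event carries a comparable "timestamp" and an "item" dictionary with an "id" (hypothetical field names):

```python
def apply_event(store, event):
    """Upsert the item snapshot carried by an event; the newest timestamp wins."""
    item = event["item"]
    known = store.get(item["id"])
    # Ignore events that are older than what is already stored, so that
    # out-of-order delivery cannot overwrite fresher data.
    if known is None or event["timestamp"] >= known["timestamp"]:
        store[item["id"]] = {"timestamp": event["timestamp"], **item}
    return store
```

With this rule, no separate fetch of the object is needed after an update event: the event itself is the refresh.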
You can “pull” all data from Connections...
but is it really needed?
Pulling data – when is it needed ?
47
“Push” approach (Event SPI) is sufficient to build most analytic solutions
− All necessary content (textual content, tags, …) is surfaced in every single event
− All operations changing relationships (e.g. adding/removing member, colleague, follower) are surfaced as events
“Pull” (REST APIs) approaches should stay limited to:
1. “Bootstrapping” the Analytics Service based on a Connections system with data existing prior to the introduction of the event handler used in your analytic solution
− Essentially building membership/network data (as needed)
− Seeding the content should not be needed, as it is repeated whenever an event about the content is generated
2. Fetching data not available through the Event SPI
− Relatively “rare” for events generated from Connections
Pulling data – when is it needed ?
48
2 main approaches for pulling data from Connections
1. REST APIs (Atom / OpenSocial format)
− REST-style HTTP based APIs (XML, JSON format)
− Transparency: programmatically access much of the same information that can be accessed through the IBM Connections UI
− “Drink your own champagne”: public APIs used internally by plug-ins, mobile … and even some components' Web UI (Activity Stream, Activities, …)
2. Seedlist
− Designed to allow crawling of Connections data for indexing purposes by a search engine
− Surfaces all content in the system, so it can be of some value for an analytic solution
− HTTP based APIs (Atom XML format)
Pulling data from Connections
49
Example: /forums/seedlist/myserver returns ALL forum entries in the system
− Textual content, author, number of comments, number of recommendations, parent id, ACL
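Since the seedlist is served as Atom XML, a consumer mostly needs a namespace-aware Atom parser. The sketch below parses a tiny hand-written feed using only standard Atom elements; the sample content is invented, and a real seedlist response carries additional fields (comment counts, ACL, ...) that are not modelled here.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample, not an actual seedlist response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>forum-entry-1</id>
    <title>Hypothetical topic</title>
    <author><name>Amy Jones</name></author>
    <content type="text">Body of the forum entry</content>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return one dictionary per Atom entry with its basic fields."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "author": entry.findtext("atom:author/atom:name", default="", namespaces=ATOM),
            "content": entry.findtext("atom:content", default="", namespaces=ATOM),
        })
    return entries
```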
Seedlist
50
REST APIs support basic authentication, form-based authentication and (for most APIs) OAuth
Private data: strict enforcement of access on API calls
− Not very convenient for access by an analytic system...
“Super user”
− Concept of “super user”: access control checks on private data are bypassed
− On-premise: the “super user” is a user mapped to the JEE “admin” role across all Connections services
− On Cloud: impersonation support can help to fetch data for a range of users (progressively being disclosed)
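For the basic authentication case, a client only has to attach a standard HTTP Authorization header to each call. A minimal Python sketch using the standard library; the host name and credentials are placeholders, and the request is built but not sent.

```python
import base64
import urllib.request


def build_request(base_url, path, user, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header("Authorization", "Basic " + token)
    return request


# Placeholder host and credentials; the path is the seedlist example from the slides.
request = build_request("https://connections.example.com",
                        "/forums/seedlist/myserver", "admin", "secret")
```

Basic authentication sends the credentials with every request, so it should only be used over HTTPS, all the more so for an account mapped to the admin role.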
Authentication aspects for the REST APIs
51
In some very specific cases, data is not available in a form easily consumable to build an analytic solution
− Example: getting the list of followers for a given object in the system
− Query the Connections databases directly (in these specific cases only)
− The database schema is private and can change over time
REST APIs (Atom / OS APIs)
Pros:
• Fine granularity: access content / meta-data for a specific object / container
• Access relationship information: APIs are available for fetching membership lists, network information, who liked a given object, ...
Cons:
• Lack of batch retrieval capabilities
Seedlist
Pros:
• Batch retrieval of textual content
• Incremental updates (but the Event SPI is much more suitable for this purpose)
Cons:
• Focused around content: does not expose all the data (missing tags, membership information)
Pulling data from Connections – What to use, when?
52
Leverage the Event SPI as much as possible
− Provides (most of) the data needed for any elaborate analytics solution
− Just let Connections push data to you! Easier, and performs well
“Fill the gaps” by pulling data from the Atom/Seedlist APIs
− Initial loading of relationship / content data
− Data not available through the Event SPI
One final warning:
− An analytic solution has access to private data through the Event SPI and the Atom/Seedlist APIs (with admin role)
=> Ensure your solution is not leaking private data to unauthorized users
Key Points
53
Key parts of typical analytic pipeline
58
IBM Connections!
* Extract: Consume events
* Transform: Transform format
* Load: Load transformed data to database / disk
* Clean data (fetch specific data fields from events, assign unique id to objects)
* Represent social relationship as graph
A property graph has:
− vertices and edges, each of which can have any number of properties
− directed relationships
Graph structure is ideal to represent relationships between entities (people, objects)
− Context around the event
− Cause and effect of an event
− Artefacts related to an event
[Graph example. Vertices: Person A, Person B, Status Update, Status Update Comment. Edges: creates, creates, comments on]
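A property graph of this kind needs very little machinery. The Python sketch below is an in-memory illustration, not a graph database; the vertex ids and the wiring of the three edges (Person A creates the status update, Person B creates the comment, the comment comments on the status update) are an assumed reading of the slide's example.

```python
class PropertyGraph:
    """Directed graph whose vertices and edges carry arbitrary properties."""

    def __init__(self):
        self.vertices = {}   # vertex id -> properties
        self.edges = []      # (source, label, target, properties)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, source, label, target, **properties):
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source, label, target, properties))

    def targets(self, source, label):
        """Vertices reached from source by following edges with this label."""
        return [t for s, l, t, _ in self.edges if s == source and l == label]

    def sources(self, target, label):
        """Vertices that point at target through edges with this label."""
        return [s for s, l, t, _ in self.edges if t == target and l == label]


graph = PropertyGraph()
graph.add_vertex("person_a", kind="person", name="Person A")
graph.add_vertex("person_b", kind="person", name="Person B")
graph.add_vertex("status_1", kind="status update")
graph.add_vertex("comment_1", kind="comment")
graph.add_edge("person_a", "creates", "status_1")
graph.add_edge("person_b", "creates", "comment_1")
graph.add_edge("comment_1", "comments on", "status_1")
```

Following edges backwards from `status_1` answers "who reacted to this content", which is the kind of context, cause and effect the slide refers to.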
Representing Connections data as graph
59
Query graph to generate insights: activity, eminence, reaction, network.
Store score per user and org
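The slides do not give the scoring formulas, so the following Python sketch is only an illustration of the query step: it derives a naive "activity" count (things a person created) and a naive "reaction" count (comments by other people on things that person created) from graph edges. The edge labels and both definitions are assumptions.

```python
from collections import Counter


def score(edges):
    """edges: iterable of (source, label, target) triples.

    Returns {person: {"activity": int, "reaction": int}}.
    """
    edges = list(edges)
    creator = {}           # object id -> person who created it
    activity = Counter()
    for source, label, target in edges:
        if label == "creates":
            creator[target] = source
            activity[source] += 1

    reaction = Counter()
    for source, label, target in edges:
        if label == "comments on":
            author = creator.get(target)      # who is being responded to
            commenter = creator.get(source)   # who wrote the comment
            if author is not None and commenter != author:
                reaction[author] += 1

    people = set(activity) | set(reaction)
    return {p: {"activity": activity[p], "reaction": reaction[p]} for p in people}
```

The resulting per-person dictionary is what would be stored and later surfaced through an API or UI; organization-level scores could be aggregated from it.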
API / UI to surface scores generated in previous step
Volume: ~500 bytes per event + bulk data. At 100 events per second that is 50 KB of event data per second, i.e. about 180 MB per hour, 4.3 GB per day
Velocity: 100s of events per second
Variety: semi-structured data; time and spatial aspects; easy to represent as graph
Veracity: not an issue with Connections, can trust veracity of events from Connections
4 dimensions of Big Data
62
Value of collaboration data:
− From discrete events to generating deep insights about people, network … the whole organization
− Key insights by leveraging Big Data Analytics on events
− Insights only limited by data and your own ability to process it
IBM Connections has its own powerful set of APIs to access most interactions in the system
− Fully available on premise
− Being unlocked on Cloud
Analytic platform available (IBM Open Platform)
− Get started with IBM Open Platform and build on top of it
Key points
66
IBM Open Platform @ ibm.biz/ibmopenplatform
IBM Engagement Analytics @ ibm.com/engage
Event SPI @ ibm.biz/eventspi w/ Java Doc @ ibm.biz/eventspijavadoc
SocialBiz User Group @ www.socialbizug.org
Follow us on Twitter @IBMConnect, @IBMSocialBiz, @marie_wallace
LinkedIn @ ibm.biz/socbizlinkedin; participate in our Social Business group
Facebook @ www.facebook.com/IBMSocialBiz; give us a Like
Social Business Insights Blog @ ibm.com/blogs/socialbusiness; join the conversation!
More resources online
67