© 2014 IBM Corporation AD306 Turbocharge Your Enterprise Social Network with Analytics Vincent Burckhardt, IBM David Robinson, IBM

AD306 - Turbocharge Your Enterprise Social Network With Analytics


DESCRIPTION

Social is generating large volumes of data about the business (who interacts with whom, when, and in what context). However, little of this data is actively used to generate insights that allow the business to work smarter and faster. This technical session describes how to capture and collect interactions within IBM Connections through its public APIs and apply a variety of analytics, including map/reduce and graph analytics, on a scalable Hadoop platform. This allows us to uncover insights into what the corporate network structure looks like, how information propagates across the organization, how opinions are formed, and how resilient the organization is to attrition.


Page 1: AD306 - Turbocharge Your Enterprise Social Network With Analytics

© 2014 IBM Corporation

AD306: Turbocharge Your Enterprise Social Network with Analytics

Vincent Burckhardt, IBM
David Robinson, IBM

Page 2: AD306 - Turbocharge Your Enterprise Social Network With Analytics


Please Note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Page 3: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Agenda

A Peek into Data Science

Extracting IBM Connections data for analytical purposes

Analytics And Connections Data


Page 4: AD306 - Turbocharge Your Enterprise Social Network With Analytics


A Peek Into Data Science

Page 5: AD306 - Turbocharge Your Enterprise Social Network With Analytics

What Is This Thing Called Data Science ?


Credit: Rachel Schutt/Cathy O’Neil

Page 6: AD306 - Turbocharge Your Enterprise Social Network With Analytics

A Single Coffee Receipt

date        time   cashier  location    size  qty  item   spent
12/10/2013  13:09  Chris    Raleigh500  reg   1    mocha  .80

Page 7: AD306 - Turbocharge Your Enterprise Social Network With Analytics

A Year’s Worth Of Coffee Receipts For One Person

date        time   cashier  location    size  qty  item     spent
01/10/2013  13:53  Chris    Raleigh500  reg   1    mocha    .80
01/12/2013  14:02  Doug     Carrabou    reg   1    mocha    .80
01/14/2013  13:09  Nadia    Raleigh500  reg   1    vanilla  .75
02/01/2013  14:02  Nadia    Raleigh500  lg    1    mocha    1.10
03/14/2013  13:14  Chris    Raleigh500  reg   1    blend    .60
04/20/2013  13:32  Nadia    Stardoe     lg    1    mocha    1.10
…
12/14/2013  13:14  Bev      Raleigh500  reg   1    blend    .60
12/20/2013  13:32  Nadia    Winston’s   reg   1    mocha    1.10

Insights
– M-F, 1-2 pm
– 72% Raleigh500
– 75% regular
– 63% mocha
– $.87 avg spending
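The step from raw receipts to insights can be sketched in a few lines of Python. This is only an illustration: the `Receipt` class and the `share` and `summarize` helpers are hypothetical names, and the data is limited to the eight sample rows shown on the slide, so the figures it produces differ from the full-year insights quoted above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    date: str
    time: str
    cashier: str
    location: str
    size: str
    qty: int
    item: str
    spent: float


# Only the sample rows visible on the slide, not the full year of data.
RECEIPTS = [
    Receipt("01/10/2013", "13:53", "Chris", "Raleigh500", "reg", 1, "mocha", 0.80),
    Receipt("01/12/2013", "14:02", "Doug", "Carrabou", "reg", 1, "mocha", 0.80),
    Receipt("01/14/2013", "13:09", "Nadia", "Raleigh500", "reg", 1, "vanilla", 0.75),
    Receipt("02/01/2013", "14:02", "Nadia", "Raleigh500", "lg", 1, "mocha", 1.10),
    Receipt("03/14/2013", "13:14", "Chris", "Raleigh500", "reg", 1, "blend", 0.60),
    Receipt("04/20/2013", "13:32", "Nadia", "Stardoe", "lg", 1, "mocha", 1.10),
    Receipt("12/14/2013", "13:14", "Bev", "Raleigh500", "reg", 1, "blend", 0.60),
    Receipt("12/20/2013", "13:32", "Nadia", "Winston's", "reg", 1, "mocha", 1.10),
]


def share(receipts, field):
    """Return (most common value of `field`, fraction of receipts having it)."""
    counts = Counter(getattr(r, field) for r in receipts)
    value, n = counts.most_common(1)[0]
    return value, n / len(receipts)


def summarize(receipts):
    """Aggregate individual receipts into a handful of per-person insights."""
    return {
        "location": share(receipts, "location"),
        "size": share(receipts, "size"),
        "item": share(receipts, "item"),
        "avg_spent": sum(r.spent for r in receipts) / len(receipts),
    }


if __name__ == "__main__":
    for key, value in summarize(RECEIPTS).items():
        print(key, value)
```

A single receipt says little; the aggregation over many receipts is what turns transactions into something actionable.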

Page 8: AD306 - Turbocharge Your Enterprise Social Network With Analytics

A Year’s Worth Of Coffee Receipts For Many People

date        time   cashier  location    size  qty  item     spent
01/10/2013  13:53  Chris    Raleigh500  reg   1    mocha    .80
01/12/2013  14:02  Doug     Carrabou    reg   1    mocha    .80
01/14/2013  13:09  Nadia    Raleigh500  reg   1    vanilla  .75
02/01/2013  14:02  Nadia    Raleigh500  lg    1    mocha    1.10
03/14/2013  13:14  Chris    Raleigh500  reg   1    blend    .60
04/20/2013  13:32  Nadia    Stardoe     lg    1    mocha    1.10
…
12/14/2013  13:14  Bev      Raleigh500  reg   1    blend    .60
12/20/2013  13:32  Nadia    Winston’s   reg   1    mocha    1.10

person: Joel, Toni, Joni, Joe, Dan, Dave, Ken, Sally

You get the idea…

Page 9: AD306 - Turbocharge Your Enterprise Social Network With Analytics


Business Actions From Insights

From a single transaction (one receipt)

To engaging the customer with relevant actions (many receipts)

- Coupons for food
- Weekend offers ?
- Loyalty card ?
- Employee rewards ?

Page 10: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Datafication

“The process of taking all aspects of life and turning them into data”
– Google’s augmented-reality glasses
– Twitter for thoughts
– LinkedIn for professional networks

Creating new products with data, improving existing products with data

Credit: Kenneth Cukier/Viktor Mayer-Schoenberger, Foreign Affairs, May/June 2013, http://tinyurl.com/ke6cqku

Today we’ll show you how to add IBM Connections to the list

Page 11: AD306 - Turbocharge Your Enterprise Social Network With Analytics

The Value of Connections ?

Obvious value:
– Collaboration tool

Perhaps “not so obvious” value:
– “Social Receipts” … Datafication of Interaction Patterns … Business Insights !

[Diagram labels: Connections, Analytics, Business Insights]

Page 12: AD306 - Turbocharge Your Enterprise Social Network With Analytics


Possible Questions Connections Data Can Help Answer

Are you effectively communicating your message ?

Are others responding to your message ?

Are customers, business partners, contractors, employees responding to your message?

Who are brokers of information in the organization ?

Which Connections communities are the most effective ?

What are the communication patterns like between divisions ?

What are the communication characteristics of high performing organizations ?

Ask Your Question… Find Your Business Value

Page 13: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Extracting IBM Connections data for analytical purposes

Page 14: AD306 - Turbocharge Your Enterprise Social Network With Analytics

IBM Connections

Home page: See what's happening across your social network

Communities: Work with people who share common roles and expertise

Files: Post, share, and discover documents, presentations, images, and more

Micro-blogging: Reach out to your social network for help

Profiles: Find the people you need

Wikis: Create web content together

Activities: Organize your work and tap your professional network

Bookmarks: Save, share, and discover bookmarks

Blogs: Present your own ideas, and learn from others

Forums: Exchange ideas with, and benefit from the expertise of, others

Page 15: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Connections Maximizes The Value of Social Data

IBM Connections provides APIs and SPIs that allow the value of the social data to be maximized by external systems:

– ALL Connections data can be accessed by external systems

– Open, transparent, breaking down silos

Pull data from IBM Connections
– Programmatically access much of the same information that you can through the IBM Connections user interface

Have Connections push data to you
– All data change (CUD) events in all IBM Connections components can be supplied to external consumers

Page 16: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Connections Architecture

[Diagram: IBM Connections apps on top of common services (directory, search, person card, navigational header, administration through JMX / WSAdmin), backed by an RDB, a file system, and a user directory]

Page 17: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Connections Architecture

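[Diagram: the same architecture with the client-facing layer added. Clients (browser, feed reader, Sametime, portlets, Lotus Notes, Microsoft Office, mashups, your app) reach the IBM Connections apps through the HTTP server and proxy cache, using the REST API / Connections Atom API (GET, POST, PUT, DELETE) with HTML, HTML forms, JavaScript, JSON, Atom feeds and Atom entries. Back end as before: common services (directory, search, person card, navigational header, administration through JMX / WSAdmin), RDB, file system, user directory]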

Page 18: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Connections Architecture

[Diagram: the same architecture as on the previous slide, now also showing the Event SPI, an integration bus, other enterprise services, and a second “Your App” box alongside the Event SPI]

Page 19: AD306 - Turbocharge Your Enterprise Social Network With Analytics

The Event SPI is the social data fire-hose

Designed to allow 3rd parties to get notified whenever a data change happens in any of the IBM Connections services

– Real-time events generated by IBM Connections include all create, update, and delete (CUD) operations.

– Potential to represent the complete interaction footprint of the enterprise

– Lets you capture, persist, model, analyze, visualize and monetize your enterprise network

SPI (System Programming Interface) vs API (Application Programming Interface)

– An SPI sits at a lower level than an API: you contribute Java code at system level

– By contributing Java code written to this SPI, 3rd parties can listen to creation, deletion and update (and more!) events of content within IBM Connections

Page 20: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Event SPI – Programming aspects

Events: collections of data generated when activities (data-modifying, notifications) occur in IBM Connections

– In the SPI, an event is represented by a Java bean / object

– An event encapsulates data such as the type of action and the object (and container) involved in the action

Events are delivered to Event Handlers:
– An event handler is a Java class implemented by a 3rd party (you!)
– Event handlers are registered in an XML file (events-config.xml)
  • Instructing what type of event to send to a given handler
– Connections delivers the Java bean representing the event to the registered event handler(s)

[Diagram: the Event SPI delivering events to Handler 1, Handler 2, ... Handler N, as registered in events-config.xml]

Page 21: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Event SPI – available data in each event

Example, blog.entry.created: “Amy Jones posted a blog entry in the blog named XYZ”

Actor: the person who initiated this action.
  Details: External id, name and, if not disabled, email address

Type: type of action
  Example: CREATE, UPDATE, DELETE, NOTIFY, MEMBERSHIP, ..

Item: general concept for representing an individual entity within a container
  Details: id, name, textual content, HTML and ATOM paths

Container: general concept for representing a "bucket" or "container" that contains other items
  Details: id, name
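To make the four parts of an event concrete, here is a minimal sketch in Python of a data model with the fields listed on this slide. It is not the Event SPI itself, which is a Java API: every class and field name below is an assumption chosen for illustration, and the ids in the example are made up.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Actor:
    external_id: str
    name: str
    email: Optional[str] = None  # absent when email exposure is disabled


@dataclass(frozen=True)
class Item:
    id: str
    name: str
    content: str = ""
    html_path: str = ""
    atom_path: str = ""


@dataclass(frozen=True)
class Container:
    id: str
    name: str


@dataclass(frozen=True)
class Event:
    name: str  # e.g. "blog.entry.created"
    type: str  # e.g. "CREATE", "UPDATE", "DELETE"
    actor: Actor
    item: Item
    container: Container

    def to_json(self) -> str:
        """Serialize the whole event, nested parts included, to JSON."""
        return json.dumps(asdict(self), sort_keys=True)


# "Amy Jones posted a blog entry in the blog named XYZ" (hypothetical ids)
EXAMPLE = Event(
    name="blog.entry.created",
    type="CREATE",
    actor=Actor(external_id="user-1", name="Amy Jones"),
    item=Item(id="entry-1", name="My first entry", content="Hello"),
    container=Container(id="blog-1", name="XYZ"),
)

if __name__ == "__main__":
    print(EXAMPLE.to_json())
```

A flat, self-describing record like this is convenient to hand to downstream analytics, since each event carries the actor, the item and its container together.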

Page 22: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Event SPI – available data in each event

Many more data fields encapsulated in events:

– Correlation item set to represent parent-child relationship (events about commenting action)

– Target set, from which the interaction between content and people can be deduced

– Membership delta field, indicating who has been added/removed from a community, activity, ...

– ... see Event SPI documentation for full list (JavaDoc)

Key point: the event model encapsulates all of the data needed to understand the interaction between people, content and containers in the platform

Page 23: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Event SPI in the context of an analytic solution

Challenges of analytics:

Large incoming event stream
– 100+ CUD events per second
– Growing over the longer term
– Needs a scalable framework for analysis
  • Horizontal scale to address growth

(Near) real-time indexing

No data loss

Page 24: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Taming the fire-hose... (1/2)

Analysis, even basic, is time consuming, thus:

Analysis should not occur in the event handler, but in an external system (“Analytics Service”)

The event handler should not wait until the analytic service processes the event

– It would result in an accumulation of events at Connections level

– Problematic as Connections queue retaining events to be delivered to event handler has a limited depth

=> Design the event handler to consume and process events as fast as possible, i.e. as the interface between IBM Connections and an external system

[Diagram: Event SPI, Event Handler, “Data backbone” (storage for asynchronous processing), Analytics Service]

Goal: retaining as many events as possible for further analysis
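The idea of a handler that returns quickly and leaves the slow work to another component can be sketched as follows. This is a Python illustration of the pattern only, not the Java Event SPI: `FastEventHandler`, `handle_event` and `publish` are hypothetical names, and the in-memory queue stands in for the hand-off to a real data backbone.

```python
import json
import queue
import threading


class FastEventHandler:
    """Accept events quickly and publish them to a backbone asynchronously."""

    def __init__(self, publish, max_pending=1000):
        self._publish = publish  # callable that sends one message to the backbone
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0  # events lost because the local buffer was full
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def handle_event(self, event):
        """Called for each event; never blocks on the backbone."""
        try:
            self._pending.put_nowait(json.dumps(event))
        except queue.Full:
            self.dropped += 1

    def _drain(self):
        # Runs in the background thread: forward buffered messages one by one.
        while True:
            message = self._pending.get()
            try:
                self._publish(message)
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until everything accepted so far has been published."""
        self._pending.join()


if __name__ == "__main__":
    sent = []
    handler = FastEventHandler(sent.append)
    for n in range(3):
        handler.handle_event({"name": "blog.entry.created", "n": n})
    handler.flush()
    print(len(sent), "events published")
```

The bounded buffer and the `dropped` counter make the trade-off visible: the handler stays fast, and any loss under overload is at least measurable.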

Page 25: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Taming the fire-hose... (2/2)

Characteristics of the data backbone
– Distributed and highly available
– Horizontal scale
– High throughput
– Agnostic to consumers' state

Multiple options
– Message broker: MQ / MQTT / ActiveMQ / Apache Kafka
– Database
– ...

Page 26: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Integration with a message broker – Apache Kafka

Send JSON representation of the event. Serialization to JSON through Open Source GSON library

Java class implementing the EventHandler interface

Page 27: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Integration with a message broker – Apache Kafka

Registration – through events-config.xml

Java class implementing EventHandler interface

Subscriptions define the events delivered by the SPI to the event handler.

Filtered by event name, source (IBM Connections service), and/or type (CREATE, UPDATE, DELETE, ...)

Properties: name/value pairs injected into the event handler Java class. Typically used to pass configuration settings
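The filtering role of subscriptions can be illustrated with a small Python sketch. It does not reproduce the events-config.xml syntax or the SPI's own matching rules: the `Subscription` class, the `"*"` wildcard convention, and the source names `BLOGS` and `FORUMS` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """One filter on event name, source service and action type ("*" = any)."""

    event_name: str = "*"
    source: str = "*"
    event_type: str = "*"

    def matches(self, event):
        wanted = (
            ("name", self.event_name),
            ("source", self.source),
            ("type", self.event_type),
        )
        return all(value == "*" or value == event.get(key) for key, value in wanted)


def route(event, handlers):
    """Return the names of the handlers whose subscriptions match the event.

    `handlers` maps a handler name to its list of subscriptions.
    """
    return [
        name
        for name, subscriptions in handlers.items()
        if any(s.matches(event) for s in subscriptions)
    ]


HANDLERS = {
    "kafka_forwarder": [Subscription()],  # receives everything
    "new_blog_content": [Subscription(source="BLOGS", event_type="CREATE")],
}

if __name__ == "__main__":
    event = {"name": "blog.entry.created", "source": "BLOGS", "type": "CREATE"}
    print(route(event, HANDLERS))
```

A handler that feeds an analytics backbone would typically subscribe broadly, as `kafka_forwarder` does here, and leave finer filtering to the consumers.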

Page 28: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Integration with a message broker – Apache Kafka

Deployment – jar and dependencies made available to the SPI (running in the IBM Connections News application) through a Shared Library in WebSphere Application Server

Page 29: AD306 - Turbocharge Your Enterprise Social Network With Analytics

3rd party events can also participate in the social analytics solution

IBM Connections provides OpenSocial Activity Streams APIs that allow 3rd parties to push their own events to the Activity Stream

From Connections 4.5:
– Events pushed through the Activity Stream APIs are also surfaced in the Event SPI
– An option makes it possible NOT to surface an event in the Activity Stream APIs, i.e. to surface it only through the Event SPI

=> A 3rd party application can also participate in the social analytics graph simply by publishing to the Connections Activity Stream APIs

Page 30: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Pulling data – when is it needed ?


You can “pull” all data from Connections...

but is it really needed?

Good news:

Events surface, in most cases, all the data needed for analytics purposes (including the content the event is about)

Events about the same object repeat data
– If there are X events about the same object, the item/correlation data set will always contain the most up-to-date information about the referenced object

For an analytic solution, in a nutshell, this means that the Event SPI should be sufficient in most cases
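Because every event repeats the current data of the object it refers to, an analytics service can keep an up-to-date view of content simply by replaying events in order. The Python sketch below illustrates that idea under simplifying assumptions: events are plain dictionaries with a `type` and an `item`, which is a made-up shape and not the SPI's Java bean.

```python
def latest_items(events):
    """Fold an ordered event stream into the latest known state of each item."""
    latest = {}
    for event in events:
        item = event["item"]
        if event["type"] == "DELETE":
            latest.pop(item["id"], None)  # the item no longer exists
        else:
            latest[item["id"]] = item  # the newest event wins
    return latest


# Hypothetical stream: one entry edited once, another created then deleted.
EVENTS = [
    {"type": "CREATE", "item": {"id": "e1", "name": "Draft", "content": "v1"}},
    {"type": "UPDATE", "item": {"id": "e1", "name": "Final", "content": "v2"}},
    {"type": "CREATE", "item": {"id": "e2", "name": "Other", "content": "x"}},
    {"type": "DELETE", "item": {"id": "e2", "name": "Other", "content": "x"}},
]

if __name__ == "__main__":
    print(latest_items(EVENTS))
```

No call back into Connections is needed to learn the final title or content of `e1`; the last event about it already carries them.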

Page 31: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Pulling data – when is it needed ?

“Push” approach (Event SPI) is sufficient to build most analytic solutions
– All necessary content (textual content, tags, …) is surfaced in every single event
– All operations changing relationships (i.e. adding/removing a member, colleague, follower) are surfaced as events

“Pull” (REST APIs) approaches should stay limited to:

1. “Bootstrapping” the Analytics Service from a Connections system with data that existed before the event handler used in your analytic solution was introduced
   • Essentially building membership/network data (as needed)
   • Seeding the content should not be needed, as it is repeated whenever an event about the content is generated

2. Fetching data not available through the Event SPI
   • Relatively “rare” for events generated from Connections

Page 32: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Pulling data from Connections


2 main approaches for pulling data from Connections

1. REST APIs (Atom / OpenSocial format)
– REST-style HTTP based APIs (XML, JSON format)
– Transparency: programmatically access much of the same information that can be accessed through the IBM Connections UI
– “Drink your own champagne”: public APIs used internally by plug-ins, mobile … and even some web UI components (Activity Stream, Activities, …)

2. Seedlist
– Designed to allow crawling of Connections data for indexing purposes by a search engine
– Surfaces all content in the system, so it can be of some value for an analytic solution
– HTTP based APIs (Atom XML format)

Page 33: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Seedlist

Example: /forums/seedlist/myserver returns ALL forum entries in the system
– Textual content, author, number of comments, number of recommendations, parent id, ACL
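Both pull approaches return Atom XML, so the consuming side mostly comes down to walking Atom entries. The Python sketch below shows that step with the standard library only. The sample feed is invented for illustration and is not an actual Seedlist response, which carries additional product-specific fields; only the Atom namespace is standard, and `parse_entries` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up Atom document standing in for a response fetched over HTTP.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <id>urn:example:forum-entry-1</id>
    <title>How do I reset my password?</title>
    <author><name>Amy Jones</name></author>
    <content type="text">Steps I tried so far...</content>
  </entry>
  <entry>
    <id>urn:example:forum-entry-2</id>
    <title>Re: How do I reset my password?</title>
    <author><name>Dan Smith</name></author>
    <content type="text">Try the self-service page.</content>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Extract id, title, author name and text content from each Atom entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        entries.append(
            {
                "id": entry.findtext("atom:id", default="", namespaces=ATOM),
                "title": entry.findtext("atom:title", default="", namespaces=ATOM),
                "author": entry.findtext(
                    "atom:author/atom:name", default="", namespaces=ATOM
                ),
                "content": entry.findtext("atom:content", default="", namespaces=ATOM),
            }
        )
    return entries


if __name__ == "__main__":
    for row in parse_entries(SAMPLE_FEED):
        print(row["author"], "-", row["title"])
```

In a real bootstrap job the feed text would come from an authenticated HTTP request, and paging through the result set would be handled around this function.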

Page 34: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Authentication aspects for the REST APIs

REST APIs support basic authentication, form-based authentication and (for most APIs) OAuth

Private data: strict enforcement of access on API calls
– Not very convenient for access by an analytic system...

“Super user”
– Concept of “super user”: access control checks on private data are bypassed
– The “super user” is a user mapped to the JEE “admin” role across all Connections services

Public data: APIs that access public data don't require authentication

– Provided that the environment is not configured to prevent anonymous access

Page 35: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Pulling data from Connections – What to use, when?

REST APIs (Atom / OS APIs)
Pros
• Fine granularity: access content / meta-data for a specific object / container
• Access relationship information: APIs are available for fetching membership lists, network information, who liked a given object, ...
Cons
• Lack of batch retrieval capabilities

Seedlist
Pros
• Batch retrieval of textual content
• Incremental updates (but the Event SPI is much more suitable for this purpose)
Cons
• Focused around content: does not expose all the data (missing tags, membership information, ...)

In some very specific cases, data is not available in a form that is easy to consume for building an analytic solution
– Example: getting the list of followers for a given object in the system
– Query the Connections databases directly (in these specific cases only)
– The database schema is private and can change over time

Page 36: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Key points

Leverage the Event SPI as much as possible
– Provides (most of) the data needed for any elaborate analytics solution
– Just let Connections push data to you! It is easier and performs well

“Fill the gaps” by pulling data from the Atom/Seedlist APIs

– Initial loading of relationship / content data
– Data not available through the Event SPI

One final warning:
– The analytic solution has access to private data through the Event SPI and the Atom/Seedlist APIs (with admin role)

=> Ensure your solution is not leaking private data to unauthorized users

Page 37: AD306 - Turbocharge Your Enterprise Social Network With Analytics


Analytics And Connections Data

Page 38: AD306 - Turbocharge Your Enterprise Social Network With Analytics

The “Enterprise” Workflow

Stages: Data Sources, ETL, Data Prep, Analytics, Data Consumption

Credit: Paco Nathan

Page 39: AD306 - Turbocharge Your Enterprise Social Network With Analytics

The Analytics Data Service

[Diagram: components of the analytics data service: Hadoop/Zookeeper, Map/Reduce tools, big table DB, graph database, graph analytics, stream processing, pub/sub, workflow coordinator, identity service, data analytics service, UI service, web server, node.js]

Page 40: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Frequently Heard Big Data Dimensions

A fuzzy definition:
– 4Vs: volume, velocity, variety, value
– Can’t fit or be processed on a single machine
– Data intensive vs. compute intensive
– Analytics focused

Page 41: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Big Data Aspects For Us To Consider

Connections data:

semi-structured, line formatted output, that works well with “a hadoop cluster” and graph

time and spatial aspects

de-normalized

combined with multiple data sources

calculations = data too

explored for insights, innovate with data

doesn’t ‘expire’, sticky

The difference between “BI” and “Analytics”
– Hadoop environments are designed to interpret the data at processing time
– Processing attributes chosen by the person processing the data

Page 42: AD306 - Turbocharge Your Enterprise Social Network With Analytics

‘Simple’ Analytics Are Often Best

More data usually beats better algorithms
– LOTS of data with simple algorithms is not a bad plan.

But you will probably always want to ‘sample’ for efficiency

Credit: Anand Rajaraman, Netflix
http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

Page 43: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Handling The Data From Connections

Full Refresh
– Often called “bulk load”

Delta Updates
– Streaming via the SPI

What do you do with the data as it comes in ?
– Files ?
– Directly into stores ?
– Directly into analytics ?

A need for real time analytics ?

Page 44: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Why A Property Graph In Analytics ?

A property graph has:
– key/value properties
– both vertices and edges can have any number of properties
– directed relationships
– (hint: this is not RDF)

Reference: https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph

We want to answer questions like:
– Context around the event
– Cause and effect of an event
– Things related to an event

Property graphs are a very useful tool
– Data science part
– Production part

[Diagram: two vertices with properties “Name: bob” and “Name: roger”, joined by an edge labeled “calls”]
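The definition above can be made concrete with a tiny in-memory property graph in Python. This is a teaching sketch rather than a graph database: the class and method names are hypothetical, the direction of the `calls` edge (bob to roger) is assumed since the slide text does not state it, and the `duration_s` edge property is invented to show that edges carry properties too.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    source: str  # id of the vertex the edge leaves
    target: str  # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Directed graph whose vertices and edges both carry key/value properties."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, properties)

    def add_edge(self, label, source, target, **properties):
        self.edges.append(Edge(label, source, target, properties))

    def out(self, vertex_id, label):
        """Vertices reached from `vertex_id` by following edges with `label`."""
        return [
            self.vertices[e.target]
            for e in self.edges
            if e.source == vertex_id and e.label == label
        ]


def build_example():
    graph = PropertyGraph()
    graph.add_vertex("v1", name="bob")
    graph.add_vertex("v2", name="roger")
    graph.add_edge("calls", "v1", "v2", duration_s=42)  # assumed direction
    return graph


if __name__ == "__main__":
    g = build_example()
    print([v.properties["name"] for v in g.out("v1", "calls")])
```

Questions such as "things related to an event" then become traversals: start at a vertex and follow labeled edges outward.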

Page 45: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Graph Analytics: A Specific Example For Connections Data

em·i·nence

noun \ˈe-mə-nən(t)s\

: a condition of being well-known and successful

Source: Merriam-Webster Online

How might we use graph technology in our analytics service to calculate a person’s eminence ?

Page 46: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Graph Analytics – A Glimpse At Eminence Calculations

[Diagram: Person A creates a Status Update; Person B creates a Status Update Comment, which comments on the Status Update]

Look for this graph pattern, then count comments and weight by who commented, normalize… = an eminence score element

A real eminence score can have 13 or more measures from Connections metadata alone.
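One element of such a score can be sketched in Python as follows. This is an illustration of the "count, weight, normalize" recipe only, not the presenters' actual calculation: the function name, the data shapes, the default weight of 1.0, the choice to ignore comments on one's own updates, and normalization by the top score are all assumptions.

```python
from collections import defaultdict


def comment_eminence(updates, comments, commenter_weight):
    """Score authors by weighted comments received on their status updates.

    updates: dict mapping status update id to its author
    comments: list of (commenter, status update id) pairs
    commenter_weight: dict mapping a person to a weight (default 1.0)
    Returns a dict of scores normalized so that the top author gets 1.0.
    """
    raw = defaultdict(float)
    for commenter, update_id in comments:
        author = updates[update_id]
        if commenter == author:
            continue  # assumption: commenting on your own update does not count
        raw[author] += commenter_weight.get(commenter, 1.0)
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {}
    return {person: score / top for person, score in raw.items()}


if __name__ == "__main__":
    updates = {"u1": "alice", "u2": "bob"}
    comments = [("bob", "u1"), ("carol", "u1"), ("alice", "u2"), ("alice", "u1")]
    print(comment_eminence(updates, comments, {"carol": 2.0}))
```

A fuller score would combine several elements of this kind, each derived from a different graph pattern.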

Page 47: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Visualizing Analytics: A Real Dashboard Example


Scores are fictionalized

Page 48: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Gradually Add More Data and Analytics For Deeper Insights


Finding potentially obese people…

Source: The Wall Street Journal

What other sources of data are there outside of Connections ?

What other data is coming in the Connections Event SPI ? (hint: it can be more than just Connections data)

For us: Connections, Twitter, CRM, Articles, E-mail, Other…

Page 49: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Summary: Find Business Value In Your Connections Data

From “transactions”/“social receipts” To insights

Effective use of Connections APIs

Key insights using Big Data Analytics on Connections Data

Engagement for better productivity and faster execution, at the personal, organizational and company-wide levels

Your insights are limited only by the data and your ability to process it

Page 50: AD306 - Turbocharge Your Enterprise Social Network With Analytics

For More Information

Visit IBM’s Emerging Technology Page !

http://www.ibm.com/sna

http://www.ibm.com/engage

Stop by the Innovation center to see more

I’ll be there to answer your specific questions !

More information about the Connections APIs and SPIs in the IBM Connections product wiki under “Developing”


Page 51: AD306 - Turbocharge Your Enterprise Social Network With Analytics

Access Connect Online to complete your session surveys using any:
– Web or mobile browser
– Connect Online kiosk onsite

Page 52: AD306 - Turbocharge Your Enterprise Social Network With Analytics


Engage Online

SocialBiz User Group: socialbizug.org
– Join the epicenter of Notes and Collaboration user groups

Follow us on Twitter
– @IBMConnect and @IBMSocialBiz

LinkedIn: http://bit.ly/SBComm
– Participate in the IBM Social Business group on LinkedIn

Facebook: https://www.facebook.com/IBMSocialBiz
– Like IBM Social Business on Facebook

Social Business Insights blog: ibm.com/blogs/socialbusiness
– Read and engage with our bloggers

Page 53: AD306 - Turbocharge Your Enterprise Social Network With Analytics


Acknowledgements and Disclaimers

© Copyright IBM Corporation 2014. All rights reserved.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM, the IBM logo, ibm.com, Lotus, and IBM Connections are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Other company, product, or service names may be trademarks or service marks of others.

Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.

The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.