Reliable and Efficient Facebook data processing


Talk for the Socal Piggies meetup on 2013-01-17 at Telesign's office.

Andres Buritica

Socal Piggies

2013-01-17

http://thelinuxkid.com

Python developer

Facebook Graph API experience

Ubernear

FounderDating

Meta

Draws on lessons from Ubernear

Based on traversal of public nodes in Graph

Use case

Pages or users (owners) with public events

Event discovery

Flow

Owner IDs

Update owners

Update owner events

Partial events

Expire events

Event details

Complete events

Why?

Separation of concerns

Parallel processing with separate users/apps/servers

Less data load on batch requests

Want to store all data

Update owners

Check for migrated owners
○ (#21) Page ID <old_id> was migrated to page ID <new_id>. Please update your API calls to the new ID

Move migrated owners to separate table

Add new owners
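A minimal sketch of how the migration check could be handled, assuming the error text arrives as a plain string with numeric IDs in the wording quoted above. The names `parse_migration` and `split_owners` are hypothetical and not taken from the talk's sample code.

```python
import re

# Pattern built from the message quoted on the slide; numeric IDs are an assumption.
MIGRATED_RE = re.compile(
    r"\(#21\) Page ID (?P<old_id>\d+) was migrated to page ID (?P<new_id>\d+)"
)


def parse_migration(message):
    """Return (old_id, new_id) if the message reports a migrated page, else None."""
    match = MIGRATED_RE.search(message or "")
    if match is None:
        return None
    return match.group("old_id"), match.group("new_id")


def split_owners(owner_errors):
    """Split {owner_id: error_message_or_None} into (active, migrated).

    `migrated` maps old ID -> new ID so those rows can be moved to a
    separate table. Owners with any other error are left in `active` here.
    """
    active, migrated = [], {}
    for owner_id, message in owner_errors.items():
        parsed = parse_migration(message)
        if parsed is None:
            active.append(owner_id)
        else:
            migrated[parsed[0]] = parsed[1]
    return active, migrated
```

The new ID from `migrated` can then be added as a new owner.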

Update owner events

Events for owners not checked since datetime

Stop at last event previously collected
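The two rules above can be sketched as follows. This assumes events come back newest first and that the last check time per owner is stored locally; `owners_due` and `new_events` are hypothetical names.

```python
from datetime import datetime, timedelta


def owners_due(last_checked, now, max_age):
    """Return owner IDs never checked, or not checked within `max_age`."""
    return [
        owner_id
        for owner_id, checked in last_checked.items()
        if checked is None or now - checked > max_age
    ]


def new_events(events, last_seen_id):
    """Yield events until the previously collected one is reached.

    Assumes `events` is ordered newest first.
    """
    for event in events:
        if event["id"] == last_seen_id:
            break
        yield event
```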

Expire events

If end_time has passed, move the event to another table

False
○ Data might exist

Should have returned False?
○ [100] Unsupported get request
○ Data might exist

Alias not found
○ (#803) Some of the aliases you requested do not exist...
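One way to sort these responses, as a sketch. It assumes errors arrive in the usual Graph API shape `{"error": {"code": ...}}`; the status labels and the function name are hypothetical.

```python
def classify_event_response(response):
    """Map a per-event response to a coarse status label.

    "maybe_missing" covers both False and error 100, because the slide
    notes that the data might still exist in those cases.
    """
    if response is False:
        return "maybe_missing"
    if isinstance(response, dict) and "error" in response:
        code = response["error"].get("code")
        if code == 100:
            return "maybe_missing"
        if code == 803:
            return "not_found"
        return "error"
    return "ok"
```

Keeping "maybe_missing" separate from "not_found" lets the caller retry the former later instead of discarding the event.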

Event details

Skip completed (no refreshing)

Transient errors (retry)
○ None
○ OAuthException...Error validating application
○ (#1) "Unknown error occured"
○ "(#2) Service temporarily unavailable"
○ "(#4) User request limit reached" (throttle)
○ "(#4) Application request limit reached" (throttle)
○ "(#17) User request limit reached" (throttle)
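A minimal retry sketch for the transient cases listed above. It assumes errors are dictionaries of the form `{"error": {"code": ..., "type": ..., "message": ...}}` and uses exponential backoff, which is an illustrative choice rather than something the talk prescribes. `is_transient` and `retry` are hypothetical names.

```python
import time

# Error codes listed on the slide as transient.
TRANSIENT_CODES = {1, 2, 4, 17}


def is_transient(response):
    """Return True if the response looks like one of the transient failures."""
    if response is None:
        return True
    error = response.get("error") if isinstance(response, dict) else None
    if not error:
        return False
    if error.get("code") in TRANSIENT_CODES:
        return True
    # Assumed reading of "OAuthException...Error validating application".
    return (
        error.get("type") == "OAuthException"
        and "Error validating application" in error.get("message", "")
    )


def retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch` until it returns a non-transient response or attempts run out."""
    response = None
    for attempt in range(attempts):
        response = fetch()
        if not is_transient(response):
            return response
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # back off before the next try
    return response
```

Passing `sleep` as a parameter makes the backoff easy to replace in tests.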

Datetimes

All in ISO-8601

Events
○ date_format modifier has no effect
○ timezones after "Events Timezone Migration"
○ is_date_only
○ legacy without timezone
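A sketch of parsing the three event time shapes mentioned above. The concrete string formats (`2013-01-17`, `2013-01-17T19:00:00`, `2013-01-17T19:00:00-0800`) are assumptions for illustration, and `parse_event_time` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta


def parse_event_time(value, is_date_only=False):
    """Parse an ISO-8601 event time.

    Returns a date for date-only events, an aware datetime when an offset
    is present, and a naive datetime for legacy values without a timezone.
    """
    if is_date_only:
        return datetime.strptime(value, "%Y-%m-%d").date()
    try:
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        # Legacy event without timezone information.
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
```

Callers still need to decide how to compare naive legacy times with aware ones, for example when checking whether end_time has passed.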

Batch requests

POST

50 requests in one

Large or complex requests can time out

Nested calls count towards rate limiting

One top-level access token, many nested ones

Batch request example

User's profile and 50 friends

batch=[
  { "method":"GET", "relative_url":"me" },
  { "method":"GET", "relative_url":"me/friends?limit=50" }
]
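To stay within the limit of 50 requests per batch, a longer list of relative URLs can be split into several POST payloads. The sketch below only builds the form data and sends nothing; `chunk` and `batch_payloads` are hypothetical names, and the token is a placeholder.

```python
import json

MAX_BATCH = 50  # requests allowed in one batch, per the slide


def chunk(items, size=MAX_BATCH):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_payloads(relative_urls, access_token):
    """Yield one POST form payload per group of up to 50 GET operations."""
    for group in chunk(relative_urls):
        operations = [{"method": "GET", "relative_url": url} for url in group]
        yield {"access_token": access_token, "batch": json.dumps(operations)}
```

Each yielded dictionary would be sent as the form body of a single POST to the Graph API endpoint.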

Batch dependencies

Reference results of a previous operation

JSONPath

Child operation executed after parent

Parent returns None unless forced

Batch dependencies example

Get details of 5 friends

batch=[
  { "method":"GET", "name":"get-friends", "relative_url":"me/friends?limit=5" },
  { "method":"GET", "relative_url":"?ids={result=get-friends:$.data.*.id}" }
]
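A sketch of building such a dependent batch in code. The `omit_response_on_success` flag is, to my knowledge, the batch option that forces the parent's response to be returned; treat that as an assumption to check against the current documentation. `result_ref` and `dependent_batch` are hypothetical names.

```python
import json


def result_ref(name, json_path):
    """Build a JSONPath reference to the result of a named operation."""
    return "{result=%s:%s}" % (name, json_path)


def dependent_batch(name, parent_url, child_prefix, json_path, keep_parent=False):
    """Return the JSON for a parent operation and a child that depends on it."""
    parent = {"method": "GET", "name": name, "relative_url": parent_url}
    if keep_parent:
        # Assumed flag: ask for the parent's response as well.
        parent["omit_response_on_success"] = False
    child = {
        "method": "GET",
        "relative_url": child_prefix + result_ref(name, json_path),
    }
    return json.dumps([parent, child])
```

For the example above: `dependent_batch("get-friends", "me/friends?limit=5", "?ids=", "$.data.*.id")`.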

Field expansion

GET

"join" multiple graph queries into a single call

Replacement for FQL

fields, connections, modifiers and identifiers

No JSONPath

Field expansion example

User's name and birthday plus id and picture of the last 10 photos

/me?fields=name,birthday,photos.limit(10).fields(id,picture)
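Nested field expansions get hard to read as strings, so a small builder can help. This is a sketch with a made-up input shape: plain strings for simple fields, and `(name, modifiers, subfields)` tuples for connections. `expand` is a hypothetical helper.

```python
def expand(fields):
    """Render a nested field spec as a field expansion string.

    Each item is either a field name or a tuple
    (name, [(modifier, value), ...], [subfields]).
    """
    parts = []
    for field in fields:
        if isinstance(field, str):
            parts.append(field)
            continue
        name, modifiers, subfields = field
        text = name
        for modifier, value in modifiers:
            text += ".%s(%s)" % (modifier, value)
        if subfields:
            text += ".fields(%s)" % expand(subfields)
        parts.append(text)
    return ",".join(parts)


spec = ["name", "birthday", ("photos", [("limit", 10)], ["id", "picture"])]
url = "/me?fields=" + expand(spec)
```

Here `url` reproduces the request shown in the example above.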

Batch request with field expansions

User's profile and picture link of 10 photos

batch=[
  { "method":"GET", "relative_url":"me" },
  { "method":"GET", "relative_url":"me?fields=photos.limit(10).fields(picture)" }
]

Process

Facebook is constantly improving

Testing crucial

Beta tier

Not covered

Refreshing data

Pagination

Throttling

Insights

Thanks

Sample code at http://ubernear.com

Questions

http://thelinuxkid.com
