GraphConnect 2014 SF: The Business Graph presented by Kurt Freytag, Head of Product and Engineering, CrunchBase
SAN FRANCISCO | 10.22.2014 THE BUSINESS GRAPH
The Business Graph (Why we chose Neo4j to rebuild CrunchBase)
Kurt Freytag Head of Product, CrunchBase
kurt@crunchbase.com 415.891.7761 @kfreytag
5’10”, 155lbs. Coding since 1977
Who Am I?
• Concise History of CrunchBase
• Our Vision
• Why Neo4j?
• Building w/ Neo4j & The Web
• Q&A
What am I Talking About?
• Started in 2007 by Michael Arrington
• Zero dedicated staff from 2007-2013
• Organically became source of truth for Startup Ecosystem
• Millions of Monthly Users
• Ran on two crappy AWS servers
History of CrunchBase - In One Slide
MySQL 5.0, Rails 2.0
• The Complete Graph of the Connected Business World
• Entities: people, products, companies
• Activities: fundings, acquisitions, job changes
• Connections: how everything relates
• Time: the lifecycle of every element
• World’s Most Powerful Startup Community
• Open to all
The Vision of CrunchBase
• A natural way of modeling data
Why Neo4j?
[Graph diagram: Neotechnologies (Company); Emil Eifrem (Founder); Neo4j Enterprise Edition (Product); Seed Round (Funding); Sunstone Capital (Investor); Connor Venture Partners (Investor); Lars Nordwall (COO); Philip Rathle (VP of Products); GraphConnect 2014 (Event); Kurt Freytag (Speaker)]
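The diagram on this slide can be approximated in a few lines of plain Ruby. This is only a sketch of the idea that entities become nodes and connections become typed edges; it is not Neo4j's API. The `Graph`, `Node`, and `Edge` names and the relationship type names (`:has_founder`, `:has_funding_round`, `:has_investor`) are assumptions, since the slide shows only node labels.

```ruby
# Minimal in-memory graph; a stand-in for illustration, not Neo4j's API.
Node = Struct.new(:name, :label)
Edge = Struct.new(:from, :type, :to)

class Graph
  attr_reader :nodes, :edges

  def initialize
    @nodes = {}
    @edges = []
  end

  def add_node(name, label)
    @nodes[name] ||= Node.new(name, label)
  end

  # Adds a typed, directed edge between two existing nodes.
  def connect(from, type, to)
    @edges << Edge.new(@nodes.fetch(from), type, @nodes.fetch(to))
  end

  # Names of nodes reachable from `name` over one outgoing edge of `type`.
  def neighbors(name, type)
    @edges.select { |e| e.from.name == name && e.type == type }
          .map { |e| e.to.name }
  end
end

def build_example_graph
  graph = Graph.new
  graph.add_node("Neotechnologies", :company)
  graph.add_node("Emil Eifrem", :person)
  graph.add_node("Seed Round", :funding_round)
  graph.add_node("Sunstone Capital", :investor)
  graph.add_node("Connor Venture Partners", :investor)

  # Relationship type names are assumed; the slide does not label its edges.
  graph.connect("Neotechnologies", :has_founder, "Emil Eifrem")
  graph.connect("Neotechnologies", :has_funding_round, "Seed Round")
  graph.connect("Seed Round", :has_investor, "Sunstone Capital")
  graph.connect("Seed Round", :has_investor, "Connor Venture Partners")
  graph
end

p build_example_graph.neighbors("Seed Round", :has_investor)
```

Asking "who invested in the seed round?" is then a one-hop traversal rather than a join across tables.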
• A natural way of modeling data
• Adapts easily to changing requirements
Why Neo4j?
[Graph diagram: Neotechnologies (Company); Seed Round (Funding); Sunstone Capital (Investor); Connor Venture Partners (Investor); Investment (×2); John Smith (Lead Investor) (×2)]
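One way to read this slide: a new requirement (recording who led each investment) is met by inserting an Investment node between the funding round and each investor, rather than by migrating a table schema. The sketch below shows that rewrite on plain `[from, type, to]` triples. The function names and edge type names (`:has_investor`, `:has_investment`, `:made_by`, `:led_by`) are hypothetical and chosen only for illustration.

```ruby
# Edges as [from, type, to] triples; a deliberately tiny stand-in for a graph store.
def direct_model
  [
    ["Seed Round", :has_investor, "Sunstone Capital"],
    ["Seed Round", :has_investor, "Connor Venture Partners"]
  ]
end

# Replaces each investor edge with an intermediate Investment node, and attaches
# a lead investor to that node when one is known. Other edges pass through unchanged.
def with_investment_nodes(edges, lead_investors)
  edges.each_with_index.flat_map do |(round, type, investor), i|
    next [[round, type, investor]] unless type == :has_investor

    investment = "Investment ##{i + 1}"
    rewritten = [
      [round, :has_investment, investment],
      [investment, :made_by, investor]
    ]
    lead = lead_investors[investor]
    rewritten << [investment, :led_by, lead] if lead
    rewritten
  end
end

leads = {
  "Sunstone Capital" => "John Smith",
  "Connor Venture Partners" => "John Smith"
}
with_investment_nodes(direct_model, leads).each { |edge| p edge }
```

Existing nodes are untouched; only the edges around the new node change, which is the sense in which the model adapts to the new requirement.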
• A natural way to model data
• Adapts easily to changing requirements
• Built-In Business Intelligence
  • Very specific or very general questions
  • We don’t know the questions in advance
Why Neo4j?
select
  if(tg.described_count > 1, 'complex', 'basic') dupe_class,
  o.normalized_name,
  concat('=hyperlink("http://www.crunchbase.com', o.permalink, '", "', o.name, '")') name_url,
  ifnull(o.domain, '') domain,
  ifnull(o.homepage_url, '') homepage_url,
  if(o.status = 'unknown', '', o.status) status,
  o.permalink,
  ifnull(o.investment_rounds, '') investment_rounds,
  ifnull(o.funding_rounds, '') funding_rounds,
  ifnull(o.relationships, '') relationships,
  ifnull(o.milestones, '') milestones,
  if(o.logo_url is null, '', 'Yes') has_logo,
  length(ifnull(o.overview, '')) overview_length,
  ifnull(o.created_by, '') created_by,
  date_format(o.created_at, '%Y-%m-%d %H:%i:%s') created_at,
  UNIX_TIMESTAMP(o.created_at) ts,
  ( ifnull(o.investment_rounds, 0)*20
    + ifnull(o.funding_rounds, 0)*20
    + ifnull(o.relationships, 0)*10
    + ifnull(o.milestones, 0)
    + length(ifnull(o.overview, ''))
    + if(o.logo_url is null, 0, 50) ) entity_rank,
  o.entity_type,
  o.entity_id
from cb_objects o
join t_duplicate_objects td on td.object_id = o.id
join t_duplicate_groups tg on tg.id = td.duplicate_group_id
where td.max_created_at > FROM_UNIXTIME(i_start_unixtime)
EXPLAIN PLAN
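The entity_rank expression buried in that query is easier to follow as a standalone function. The Ruby sketch below mirrors its arithmetic: missing counts are treated as 0 (as IFNULL does), the overview contributes its length, and a logo adds a flat 50. The method name and keyword arguments are assumptions made for this example; only the weights come from the query. `bytesize` is used because MySQL's LENGTH counts bytes.

```ruby
# Mirrors the entity_rank arithmetic from the SQL on this slide.
# nil counts behave like IFNULL(x, 0); a nil overview behaves like ''.
def entity_rank(investment_rounds: nil, funding_rounds: nil, relationships: nil,
                milestones: nil, overview: nil, logo_url: nil)
  investment_rounds.to_i * 20 +
    funding_rounds.to_i * 20 +
    relationships.to_i * 10 +
    milestones.to_i +
    overview.to_s.bytesize +
    (logo_url.nil? ? 0 : 50)
end

# Illustrative values only, not real CrunchBase data.
rank = entity_rank(funding_rounds: 2, relationships: 3, milestones: 1,
                   overview: "Graph database", logo_url: "logo.png")
puts rank # 0 + 40 + 30 + 1 + 14 + 50 = 135
```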
• A natural way of modeling data
• Adapts easily to changing requirements
• Built-In Business Intelligence
  • Very specific or very general questions
  • We don’t know the questions in advance
• Directly maps to our OO thinking
Why Neo4j?
class Organization < BaseEntity
  relationship :has_funding_round
  relationship :has_customer
  relationship :sponsors_event
  ...
end
[Diagram node: Neotechnologies (Company)]
class FundingRound < BaseActivity
  attribute :announced_on
  attribute :closed_on
  attribute :funding_type
  attribute :series
  attribute :money_raised
  attribute :post_money_valuation
  ...
end
[Diagram node: Seed Round (Funding)]
class HasFundingRound < BaseRelationship
  relationship :has_funding_round
  relationship :has_customer
  relationship :sponsors_event
  ...
end
[Diagram edge: has_funding_round]
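The class definitions on this slide rely on `relationship` and `attribute` class macros whose implementation is not shown. The following is a minimal sketch of how such macros could work in plain Ruby, with no database behind them. The bodies of `BaseEntity` and `BaseActivity`, the `relate` method, and the constructor signatures are all assumptions; relationship classes such as `HasFundingRound` are left out to keep the example short.

```ruby
# Hypothetical base classes: the slides name them but do not show their code.
class BaseEntity
  attr_reader :name

  # Class macro: records a relationship type and defines a reader for it.
  def self.relationship(type)
    relationships << type
    define_method(type) { @related[type] }
  end

  def self.relationships
    @relationships ||= []
  end

  def initialize(name)
    @name = name
    @related = Hash.new { |hash, key| hash[key] = [] }
  end

  # Links this entity to a target over a declared relationship type.
  def relate(type, target)
    unless self.class.relationships.include?(type)
      raise ArgumentError, "unknown relationship #{type}"
    end
    @related[type] << target
    self
  end
end

class BaseActivity
  # Class macro: records an attribute name and defines its accessor.
  def self.attribute(name)
    attributes << name
    attr_accessor name
  end

  def self.attributes
    @attributes ||= []
  end

  def initialize(values = {})
    values.each { |key, value| public_send("#{key}=", value) }
  end
end

class Organization < BaseEntity
  relationship :has_funding_round
  relationship :has_customer
  relationship :sponsors_event
end

class FundingRound < BaseActivity
  attribute :announced_on
  attribute :funding_type
  attribute :money_raised
end

neo = Organization.new("Neotechnologies")
neo.relate(:has_funding_round, FundingRound.new(funding_type: "seed"))
puts neo.has_funding_round.first.funding_type
```

The point of the sketch is the mapping: each declared relationship becomes an edge type, and each declared attribute becomes a property on the node.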
• A natural way of modeling data
• Adapts easily to changing requirements
• Built-In Business Intelligence
  • Very specific or very general questions
  • We don’t know the questions in advance
• Directly maps to our OO thinking
• We move faster
  • Just launched CrunchBase Events @ TC Disrupt London
  • Design, development, QA, and release took 2 weeks
Why Neo4j?
Okay, if Neo’s so awesome, why doesn’t everybody use it?
• CGI
  • design a data model
  • roll-your-own database connection
  • manually write all your queries
• ORM (Hibernate, Doctrine)
  • design a data model
  • build the objects
  • map ‘em through configuration
Databases & the Web - A Brief History
• Today’s languages use datastores as dumb repos
• Generate schemas from code
• Isolate developer from writing queries
• Focus on business logic, not data
• Couple of Problems
  • The DBA role existed for a reason
  • Data modeling is the foundation of a scalable architecture
  • Generated queries can easily be 1,000x less efficient
  • Quick development can lead to slow applications
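One common way generated queries become inefficient is lazy loading that issues one query per row. The sketch below counts queries against a fake in-memory store to show the shape of the problem. It is an illustration, not a benchmark: `CountingStore` and the two loader functions are hypothetical, and the query count is only a proxy for cost. The 1,000x figure on the slide is the speaker's claim and is not measured here.

```ruby
# Fake datastore that only counts how many queries it receives.
class CountingStore
  attr_reader :query_count

  def initialize(rounds_by_company)
    @rounds_by_company = rounds_by_company
    @query_count = 0
  end

  def company_ids
    @query_count += 1
    @rounds_by_company.keys
  end

  def rounds_for(company_id)
    @query_count += 1
    @rounds_by_company.fetch(company_id, [])
  end

  def rounds_for_all(company_ids)
    @query_count += 1
    @rounds_by_company.slice(*company_ids)
  end
end

# Pattern a framework can generate when associations load lazily:
# one query for the list, then one more per row.
def load_lazily(store)
  store.company_ids.to_h { |id| [id, store.rounds_for(id)] }
end

# Hand-written alternative: the same data in one batched query.
def load_batched(store)
  store.rounds_for_all(store.company_ids)
end

data = (1..1000).to_h { |i| [i, ["round-#{i}"]] }
lazy = CountingStore.new(data)
batched = CountingStore.new(data)
load_lazily(lazy)
load_batched(batched)
puts "lazy: #{lazy.query_count} queries, batched: #{batched.query_count} queries"
```

With 1,000 companies the lazy loader issues 1,001 queries and the batched loader issues 2, for identical results.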
Database as a Commodity
• Neo4j is tough to adopt
  • Languages don’t support it out-of-the-box
  • The tools / drivers that exist are immature
  • Neo4j is not plug-n-play
• However…
  • Neo4j is ideal for Object-Oriented development
  • Graphs are a natural fit for many use cases
  • We need to make Neo4j as easy to choose as MySQL
Means that…
[image] + [image] = ?
• ActiveRecord for Neo4j
• Implements a lot of ActiveModel
  • Validations
  • Serialization
  • Callbacks
• Handles all Marshalling / UnMarshalling
• “Feels” like ActiveRecord
• Makes Neo4j plug-n-play for Rails
• We Will Open Source It
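To make the "feels like ActiveRecord" idea concrete, here is a rough plain-Ruby sketch of a model layer with validations, callbacks, and serialization. It is not Deja's API, which these slides do not show, and it does not use ActiveModel. `GraphModel`, its macros, and the in-memory `store` are all hypothetical stand-ins; a real library would write to Neo4j instead of an array.

```ruby
require "json"

# Hypothetical stand-in for an ActiveRecord-style base class; not Deja's API.
class GraphModel
  class << self
    def attribute(name)
      attribute_names << name
      attr_accessor name
    end

    def validates_presence_of(name)
      required << name
    end

    def before_save(method_name)
      callbacks << method_name
    end

    def attribute_names
      @attribute_names ||= []
    end

    def required
      @required ||= []
    end

    def callbacks
      @callbacks ||= []
    end

    # In-memory array used in place of a database.
    def store
      @store ||= []
    end
  end

  def initialize(values = {})
    values.each { |key, value| public_send("#{key}=", value) }
  end

  def valid?
    self.class.required.none? { |name| public_send(name).to_s.strip.empty? }
  end

  # Validates, runs before_save callbacks, then appends to the in-memory store.
  def save
    return false unless valid?

    self.class.callbacks.each { |callback| send(callback) }
    self.class.store << to_h
    true
  end

  def to_h
    self.class.attribute_names.to_h { |name| [name, public_send(name)] }
  end

  def to_json(*args)
    to_h.to_json(*args)
  end
end

class Person < GraphModel
  attribute :name
  attribute :permalink
  validates_presence_of :name
  before_save :assign_permalink

  private

  def assign_permalink
    self.permalink ||= name.downcase.tr(" ", "-")
  end
end

person = Person.new(name: "Kurt Freytag")
person.save
puts person.to_json
```

A record without a name is rejected by `save` before any callback runs, which matches the ordering Rails developers expect from ActiveRecord.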
“Deja”
Thanks. Enjoy.