Getting Started on Google Cloud Platform

  • Getting Started on Google Cloud Platform

    Aaron Taylor

    @ataylor0123

  • access any file in seconds, wherever it is.

    www.meta.sc

  • Folders are outdated

  • Files are scattered

  • Talk Roadmap

    What problems we face at Meta

    How we are solving them using GCP

    How you can get started on GCP

  • Building a product

    No baggage, free to choose whatever stack we want

    Take advantage of latest technologies

    but not quite bleeding edge

  • Engineering Goals

    This will be a complex product, so it needs to be comprehensible to everyone on our team

    Keep the team as lean as possible

    Focus on product, not sysadmin and dev ops

  • Language Choices

    Go chosen as our primary language

    Python for NLP and data analysis

    enables easy experimentation, comfortable for data scientists and developers

    Java/Scala for interacting with Dataflow, Apache Tika, etc.

  • Our Hard Problems

    User onboarding load

    Heterogeneous (changing) data sources

    Unpredictable traffic from web hooks

    Compute loads for file content analysis

    Processing streaming data

  • User Onboarding

    Crawl multiple cloud accounts at once

    Parallel computation

    In-process using Go

    Distributed using tasks

    App Engine task queues
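The in-process half of this slide can be sketched with goroutines and a `sync.WaitGroup`. This is a minimal sketch, not Meta's actual crawler: `crawlFunc`, `result`, and `crawlAll` are hypothetical names, and the fake crawl function in `main` stands in for real calls to a cloud provider's API.

```go
package main

import (
	"fmt"
	"sync"
)

// crawlFunc is a hypothetical stand-in for a call to one cloud provider's API.
// It returns the number of files found in the account.
type crawlFunc func(account string) (int, error)

// result pairs an account with the outcome of crawling it.
type result struct {
	Account string
	Files   int
	Err     error
}

// crawlAll crawls every account concurrently, one goroutine per account,
// and returns the results in the same order as the input slice.
func crawlAll(accounts []string, crawl crawlFunc) []result {
	results := make([]result, len(accounts))
	var wg sync.WaitGroup
	for i, account := range accounts {
		wg.Add(1)
		go func(i int, account string) {
			defer wg.Done()
			n, err := crawl(account)
			// Each goroutine writes only to its own slot, so no mutex is needed.
			results[i] = result{Account: account, Files: n, Err: err}
		}(i, account)
	}
	wg.Wait()
	return results
}

func main() {
	// Fake crawl for illustration only: reports the length of the account name.
	fake := func(account string) (int, error) { return len(account), nil }
	for _, r := range crawlAll([]string{"dropbox", "drive"}, fake) {
		fmt.Println(r.Account, r.Files, r.Err)
	}
}
```

The distributed half would hand each account to a task queue instead of a goroutine; that part is left out because it depends on the App Engine SDK.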

  • Heterogeneous Data

    Remove complexity of third-party services

    Detect changes/breakages in APIs

    Distributed by nature

    Continuous Deployment

    Datastore

    BigQuery

  • Unpredictable Traffic

    Changes are pushed to us through web hooks

    Dropping changes generally unacceptable

    One user should not negatively impact others

    App Engine autoscaling

    Asynchronous task queues

  • Compute Loads

    Rich file content analysis

    Parallel computation

    App Engine Flexible Runtimes

    CPU-based autoscaling

  • Stream Processing

    Efficient handling of high-volume changes

    Collate events in succession, from multiple users

    Google Cloud Pub/Sub

    Google Cloud Dataflow
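One reading of "collate events in succession, from multiple users" is grouping an interleaved stream by user while keeping each user's events in arrival order. The sketch below shows only that grouping step in memory; it does not use Pub/Sub or Dataflow, and `event` and `collate` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical change notification for one file of one user.
type event struct {
	UserID string
	Seq    int    // arrival order within the stream
	Path   string // file the change applies to
}

// collate groups an interleaved stream of events by user.
// Within each user's slice, events keep the order in which they arrived.
func collate(events []event) map[string][]event {
	byUser := make(map[string][]event)
	for _, e := range events {
		byUser[e.UserID] = append(byUser[e.UserID], e)
	}
	return byUser
}

func main() {
	stream := []event{
		{UserID: "alice", Seq: 1, Path: "/notes.txt"},
		{UserID: "bob", Seq: 2, Path: "/budget.xlsx"},
		{UserID: "alice", Seq: 3, Path: "/notes.txt"},
	}
	grouped := collate(stream)

	// Sort the user IDs so the printed order is stable across runs.
	users := make([]string, 0, len(grouped))
	for u := range grouped {
		users = append(users, u)
	}
	sort.Strings(users)
	for _, u := range users {
		fmt.Println(u, len(grouped[u]))
	}
}
```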

  • How we started off

    App Engine is our entry point

    Service Oriented Architecture

    Currently ~37 different services

    Cloud Datastore is our persistence layer

    BigQuery as a data warehouse

  • Documentation

    Lots of information for getting started

    Quality resources for our growing team

    Onboarding new developers without GCP experience has been a breeze

    Google is devoting lots of resources to this area

  • App Engine

    Don't worry about servers

    Cache, task queues, cron, database, logging, monitoring, and more all built in

    Powerful, configurable autoscaling

    Heavy compute on App Engine Flexible Runtimes

  • Development Process

    Build, run, and test services locally

    Continuous deployment to a development project

    Incremental releases go to production project

    Logging and monitoring are easy to set up

  • Problems we faced

    Mantra of "don't worry about scalability" didn't take us very far

    Users have lots and lots of files

    Datastore use optimizations

    Cost issues with App Engine

    Trimming auto-scaling parameters

    Migrated heavy compute to Flexible Runtimes

  • Outside GCP

    Algolia

    Hosts infrastructure for our search indices

    Pusher

    realtime socket connections

    Postmark/Mailchimp

    transactional and campaign-based email

  • Growth of the platform

    Rapid changes and improvements taking place

    Flexible Runtimes

    Container Engine

    Dataflow

    Investing in a documentation overhaul soon

    Support is generally quite responsive

  • Recent Developments

    Introduction of Pub/Sub to our system for all event processing

    Experimenting with Kubernetes/Container Engine

    Dataflow stream processing jobs

    Splitting functionality into multiple projects

  • Quickstart Documentation for Go

    How you can start off

  • Hello World in Go

    https://cloud.google.com/appengine/docs/go/quickstart

  • Server

    hello.go

    package hello

    import (
        "fmt"
        "net/http"
    )

    func init() {
        http.HandleFunc("/", handler)
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        fmt.Fprint(w, "Hello, world!")
    }

  • Configuration

    app.yaml

    runtime: go
    api_version: go1

    handlers:
    - url: /.*
      script: _go_app

  • Deploy

    appcfg.py update .

  • Add a Guestbook

    https://cloud.google.com/appengine/docs/go/gettingstarted/creating-guestbook

  • Datastore

    type Greeting struct {
        Author  string
        Content string
        Date    time.Time
    }

    // guestbookKey returns the key used for all guestbook entries.
    func guestbookKey(c appengine.Context) *datastore.Key {
        // The string "default_guestbook" here could be varied to have multiple guestbooks.
        return datastore.NewKey(c, "Guestbook", "default_guestbook", 0, nil)
    }

    func root(w http.ResponseWriter, r *http.Request) {
        c := appengine.NewContext(r)

        // Ancestor queries, as shown here, are strongly consistent with the High
        // Replication Datastore. Queries that span entity groups are eventually
        // consistent. If we omitted the .Ancestor from this query there would be
        // a slight chance that Greeting that had just been written would not
        // show up in a query.
        q := datastore.NewQuery("Greeting").Ancestor(guestbookKey(c)).Order("-Date").Limit(10)

        greetings := make([]Greeting, 0, 10)
        if _, err := q.GetAll(c, &greetings); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        if err := guestbookTemplate.Execute(w, greetings); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
        }
    }

  • Templates

    var guestbookTemplate = template.Must(template.New("book").Parse(`
    Go Guestbook
    {{range .}}
      {{with .Author}}
        {{.}} wrote:
      {{else}}
        An anonymous person wrote:
      {{end}}
      {{.Content}}
    {{end}}
    `))

  • Forms

    func sign(w http.ResponseWriter, r *http.Request) {
        c := appengine.NewContext(r)
        g := Greeting{
            Content: r.FormValue("content"),
            Date:    time.Now(),
        }

        if u := user.Current(c); u != nil {
            g.Author = u.String()
        }
        // We set the same parent key on every Greeting entity to ensure each Greeting
        // is in the same entity group. Queries across the single entity group
        // will be consistent. However, the write rate to a single entity group
        // should be limited to ~1/second.
        key := datastore.NewIncompleteKey(c, "Greeting", guestbookKey(c))
        _, err := datastore.Put(c, key, &g)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        http.Redirect(w, r, "/", http.StatusFound)
    }

  • Conclusions

    Google Cloud Platform has allowed us to build out Meta in ways that wouldn't otherwise be feasible

    Simplicity of App Engine allows us to focus on product

    Scalability/Availability are built in to the platform

  • access any file in seconds, wherever it is.

    www.meta.sc/careers

    careers@meta.sc