Operationalizing Clojure Confidently

Page 1: Operationalizing Clojure Confidently

Prasanna Gautam Staples-SparX 02/19/2015

Image: http://www.rohitnair.net/img/design/core.jpg

Page 2: Operationalizing Clojure Confidently

“We don't want to focus on the trees (or their leaves) at the expense of the forest.”

- Douglas Hofstadter (“I Am a Strange Loop”)

Page 3: Operationalizing Clojure Confidently

My Clojure Story

Introduced to Clojure; didn’t have prior Lisp experience.

Did my senior project on simulating Mobile Ad-hoc networks using Clojure at Trinity College in 2011.

Started working at ESPN Innovation

Worked in a variety of other languages: Java, Ruby, Python, JavaScript, C++

Clojure was my primary interface to the JVM for experimentation

Decided to use Clojure to deliver ESPN programming to the International Space Station

SparX

Timeline: 2009, 2011, 2011-2013, 2013, 2015

Page 4: Operationalizing Clojure Confidently

Requirements

Cmdr. Chris Cassidy reached out to request regular ESPN programming.

200 MB file limit

Had to be ready every day at noon Central Time

Obvious choice:

Let's hire people to clip and send videos every day!

Page 5: Operationalizing Clojure Confidently

But it’s 2013

Why not automate?

Also, let’s remove ads.

Motive: Validating the video services and interfaces we had been working on.

Ok, so why Clojure?

Page 6: Operationalizing Clojure Confidently

Why Clojure?

Two weeks to deadline

Not all the pieces were clear

No guarantees from upstream services

Human errors abound

Source of data was people pressing buttons

And, systems failing would result in similar behavior

Page 7: Operationalizing Clojure Confidently

Why Clojure?

Immutability

I could keep the system as a “constant” in an ever-changing world

Idempotency - re-run if failed, resume at any point in pipeline.

Java Interop

Even when I had APIs that weren’t written by my group, they were SOAP and XML based. Yay!

Inherently refactorable if designed correctly

Page 8: Operationalizing Clojure Confidently

Post-mortem

Still in production since September 2013

Strictly enforced the “naïve” approach that “should” work

Learned a lot of lessons that go beyond Clojure

This talk is about these lessons

Page 9: Operationalizing Clojure Confidently

“When you're forced to be simple, you're forced to face the real problem.”

- Paul Graham (“Hackers & Painters: Big Ideas from the Computer Age”)

Page 10: Operationalizing Clojure Confidently

Parts of the stack

Core Assumptions

Operations

Familiar Interfaces

Overrides

State

Logging

Error Handling

Iterative Development

Page 11: Operationalizing Clojure Confidently

Core: Timestamps

Programs: items that have a name and “start” and “end” times

Program Segments, Breaks: blocks within a program that “start” and “end” at particular times.

It’s just a map and reduce operation now!!

Take only program segments and make them into a video.
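A minimal sketch of the "map and reduce" idea, assuming each block is a map with a `:type` and `:start`/`:end` offsets in seconds. The data shape and the function names are hypothetical, not taken from the production system.

```clojure
;; Hypothetical data shape: each block has a :type plus :start and :end
;; offsets in seconds from the beginning of the recording.
(def blocks
  [{:type :segment :start 0    :end 600}
   {:type :break   :start 600  :end 780}
   {:type :segment :start 780  :end 1500}
   {:type :break   :start 1500 :end 1620}
   {:type :segment :start 1620 :end 1800}])

(defn segments-only
  "Keep the program segments and drop the breaks."
  [blocks]
  (filter #(= :segment (:type %)) blocks))

(defn clip-ranges
  "Map each kept segment to the [start end] range to cut from the source."
  [blocks]
  (mapv (juxt :start :end) (segments-only blocks)))

(defn total-seconds
  "Reduce the kept segments to the running time of the final video."
  [blocks]
  (reduce + (map #(- (:end %) (:start %)) (segments-only blocks))))

(clip-ranges blocks)   ;; => [[0 600] [780 1500] [1620 1800]]
(total-seconds blocks) ;; => 1500
```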

Page 12: Operationalizing Clojure Confidently

Why was it a good idea?

Bare set of functionality to bind everything together.

Everything else is a useful signal that would make the system “better”, but it is not something to depend on.

Lining up timestamps in a UI makes it dead easy to see where things are not aligned.

TV Programs are events too.

Page 13: Operationalizing Clojure Confidently

Core: Dependency Graph

Your tasks are dependent on previous tasks

What’s the plan when they fail to execute?
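One possible answer, sketched under assumptions: keep the task graph as data and mark anything downstream of a failure as skipped, so a later re-run knows where to pick up. The task names and the `run-in-order` helper are hypothetical, and the execution order is passed in rather than computed.

```clojure
;; Hypothetical task graph: each task lists the tasks it depends on.
(def tasks
  {:fetch-schedule {:deps []}
   :cut-segments   {:deps [:fetch-schedule]}
   :transcode      {:deps [:cut-segments]}
   :upload         {:deps [:transcode]}})

(defn run-in-order
  "Runs tasks in the given order. `run-task` is a function of the task name
  that returns truthy on success. A task whose dependencies did not all
  succeed is marked :skipped instead of being run."
  [tasks order run-task]
  (reduce (fn [status task]
            (let [deps (get-in tasks [task :deps])]
              (assoc status task
                     (cond
                       (not-every? #(= :ok (status %)) deps) :skipped
                       (run-task task)                        :ok
                       :else                                  :failed))))
          {}
          order))

;; Simulate a transcoder outage: every task succeeds except :transcode.
(run-in-order tasks
              [:fetch-schedule :cut-segments :transcode :upload]
              #(not= % :transcode))
;; => {:fetch-schedule :ok, :cut-segments :ok, :transcode :failed, :upload :skipped}
```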

Page 14: Operationalizing Clojure Confidently

Core: Loose Coupling/Lazy Execution

Separate data gathering and execution

You can expose the data to the user with no side-effects.
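A sketch of that separation, assuming a job can be described as a sequence of step maps. `plan`, `execute!` and the step names are hypothetical; the point is only that the plan is plain data until something explicitly runs it.

```clojure
(defn plan
  "Pure data gathering: describes the steps for one program without doing any of them."
  [{:keys [short-name segments]}]
  (concat
   (for [[start end] segments]
     {:step :clip :program short-name :start start :end end})
   [{:step :concat :program short-name}
    {:step :upload :program short-name}]))

(defn execute!
  "Side-effecting half: hands each planned step to `perform!`, a function
  supplied by the caller."
  [steps perform!]
  (doseq [step steps]
    (perform! step)))

(def pti-plan (plan {:short-name "PTI" :segments [[0 600] [780 1500]]}))

;; The plan can be shown to a user, or returned from a dry run, as is.
(map :step pti-plan) ;; => (:clip :clip :concat :upload)

;; Execution is a separate, explicit call; println stands in for real work.
(execute! pti-plan println)
```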

Page 15: Operationalizing Clojure Confidently

On Operations

Functional programs still need operational expertise

If you’re in a big enough company to have an ops team:

They don’t care about your FP patterns - they shouldn’t have to.

Make configurations declarative and readable

Page 16: Operationalizing Clojure Confidently

On Familiar Interfaces

Use standard configuration formats: readable, parseable by anything

I picked YAML

Familiar scheduling

Used cron strings thanks to Quartz

Everything in UTC internally

Timezones treated as side-effects

programs:
  - name: AROUND THE HORN
    short_name: ATH
    start_time: "20:00:00"
  - name: PARDON THE INTERRUPTION
    short_name: PTI
    start_time: "20:30:00"
  - name: SPORTSCENTER
    short_name: SportsCenter
    start_time: "14:00:00"

run:
  cron: 0 0 14 1/1 * ? *

final_tz: America/Anchorage
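A sketch of "UTC internally, timezones as side-effects" using only `java.time`. It assumes the `start_time` values are UTC, which the slides do not state; `utc-start` and `in-zone` are hypothetical helper names.

```clojure
(import '(java.time LocalDate LocalTime ZoneId ZoneOffset ZonedDateTime))

(defn utc-start
  "Combines a date with an \"HH:mm:ss\" start_time, both assumed to be UTC."
  [^LocalDate date ^String start-time]
  (ZonedDateTime/of date (LocalTime/parse start-time) ZoneOffset/UTC))

(defn in-zone
  "Converts a UTC date-time to a display zone. Meant to be called only at the
  edges (UI, file names, notifications), never in the core logic."
  [^ZonedDateTime t ^String tz]
  (.withZoneSameInstant t (ZoneId/of tz)))

(def pti (utc-start (LocalDate/of 2015 2 19) "20:30:00"))

(str pti)                                              ;; => "2015-02-19T20:30Z"
(str (.toLocalTime (in-zone pti "America/Anchorage"))) ;; => "11:30"
```

Everything upstream of `in-zone` compares and sorts plain UTC values, so daylight-saving changes only affect what is displayed.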

Page 17: Operationalizing Clojure Confidently

On Familiar Interfaces

Started with a solid command line interface.

Took the Config and Options abstractions and exposed as REST API.

Options

 Switches                         Default        Desc
 --------                         -------        ----
 -c, --config                     nasamatic.yml  Use this config file path
 -h, --no-help, --help            false          Show Help
 -f, --no-force, --force          false          Force run now instead of using Cron
 -u, --no-upload, --upload        true           Upload or not
 -t, --no-transcode, --transcode  true           Transcode or not
 -B, --hours-before-now           0              How many hours before now to look at
 -d, --no-dry-run, --dry-run      false          Dry Run mode

Page 18: Operationalizing Clojure Confidently

On Familiar Interfaces

Also wrote a Web UI in AngularJS for the Operations team to use in cases of failed runs

The system failed rarely enough that I had to retrain people all the time.

Just gave up and used the CLI tool most of the time

UI breakage due to JavaScript issues

Exposing the API to Slack was more popular

Page 19: Operationalizing Clojure Confidently

On Familiar Interfaces

One-to-one correspondence between CLI and JSON

Key | Switch | Type | Default | Description
upload | -u, --[no-]upload | flag | TRUE | Upload to the FTP server
transcode | -t, --[no-]transcode | flag | TRUE | Pass the files through transcoder
qc | -q, --[no-]qc | flag | FALSE | Submit file to be QC’d by Pulsar
hours-before-now | -B, --hours-before-now | int | 0 | Number of hours before to look
dry-run | -d, --dry-run | flag | FALSE | Run without affecting filesystem/uploading
filter-by-program-tag | -p, --[no-]filter-by-program-tag | flag | TRUE | Select contiguous programTags from Authnet or not
short-names | -s, --short-names | string | | Programs to select as declared in the configuration file under programs. Default behavior is to run all programs declared in configuration.
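One way to keep that correspondence honest is to drive both front ends from a single option table. The sketch below assumes the CLI switches and the JSON body have already been parsed into maps; `option-specs`, `from-json` and `from-cli` are hypothetical names and cover only a subset of the options.

```clojure
(def option-specs
  "Hypothetical subset of the option table above."
  [{:id :upload           :default true}
   {:id :transcode        :default true}
   {:id :dry-run          :default false}
   {:id :hours-before-now :default 0}])

(defn from-json
  "Options from an already-parsed JSON body (a map with string keys)."
  [body]
  (into {} (for [{:keys [id default]} option-specs]
             [id (get body (name id) default)])))

(defn from-cli
  "Options from already-parsed CLI switches (a map with keyword keys)."
  [switches]
  (into {} (for [{:keys [id default]} option-specs]
             [id (get switches id default)])))

;; The same request through either interface yields the same options map.
(= (from-json {"dry-run" true "hours-before-now" 3})
   (from-cli  {:dry-run true :hours-before-now 3}))
;; => true
```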

Page 20: Operationalizing Clojure Confidently

On Overrides

Core Abstractions - Config and Options

Config: A static set of parameters that defines the general behavior of the program. Doesn’t change too often.

Options: A dynamic set of parameters that can override config per-run.

Every job gets defined entirely by them.
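In Clojure terms this can be as small as a `merge`. A minimal sketch, assuming both config and options are flat maps; the keys mirror the CLI switches from the talk, and the `job` function is hypothetical.

```clojure
(def config
  "Static defaults, as they might be loaded from the YAML file."
  {:upload true :transcode true :dry-run false :hours-before-now 0})

(defn job
  "A job is fully described by config with per-run options layered on top."
  [config options]
  (merge config options))

(job config {})
;; => {:upload true, :transcode true, :dry-run false, :hours-before-now 0}

(job config {:dry-run true :upload false})
;; => {:upload false, :transcode true, :dry-run true, :hours-before-now 0}
```

Because the result is an ordinary map, the exact job that ran can be logged or replayed later.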

Page 21: Operationalizing Clojure Confidently

On State

Keep the least amount of state possible

The system used no database at all for operations.

Relied on the intermediate files that each step leaves behind

Have to keep only last-seen state for live operation.

Re-running is trivial.
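A sketch of how intermediate files can stand in for a database, assuming each step writes exactly one output file. `run-step!` is a hypothetical helper: it does the work only when the output is missing, which is what makes a re-run cheap.

```clojure
(require '[clojure.java.io :as io])

(defn run-step!
  "Runs `produce!` (a function of the output file) only when the output file
  does not exist yet. Returns :ran or :skipped."
  [out-path produce!]
  (let [out (io/file out-path)]
    (if (.exists out)
      :skipped
      (do (produce! out) :ran))))

;; Demo against a scratch file in the temp directory.
(def demo-out
  (str (io/file (System/getProperty "java.io.tmpdir")
                (str "pti-demo-" (System/nanoTime) ".txt"))))

(run-step! demo-out #(spit % "clip")) ;; => :ran
(run-step! demo-out #(spit % "clip")) ;; => :skipped
(io/delete-file demo-out)
```

A real step would probably write to a temporary name and rename on success, so a crash cannot leave a half-written file that looks finished.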

Page 22: Operationalizing Clojure Confidently

On Logging

Timestamp, state, key=value

Parseable by anything! (It was Splunk’s weirdness that led to this)

Can generate metrics from on-going operations without instrumenting further.

Wired to PagerDuty directly
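A minimal sketch of a key=value line formatter, assuming the timestamp is added by the logging framework. `kv-line` is a hypothetical helper, not the logging function used in the talk.

```clojure
(require '[clojure.string :as str])

(defn kv-line
  "Renders a level and alternating key/value arguments as `LEVEL k=v, k=v`."
  [level & kvs]
  (str level " "
       (str/join ", " (for [[k v] (partition 2 kvs)]
                        (str (name k) "=" v)))))

(kv-line "INFO" :job "sleeps" :status "Started" :trace-id "trace-94295")
;; => "INFO job=sleeps, status=Started, trace-id=trace-94295"
```

Any tool that can split on `, ` and `=` can turn such lines back into fields, which is how metrics can be derived without extra instrumentation.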

Page 23: Operationalizing Clojure Confidently

On Error Handling

Find out about the error and try to fix it; if that is not possible, the system should retry the whole process on the next day/job

Parent form generates random trace-id for a job

Passed to all children for that job

Any exceptions are passed via the chain and logged

Back off and Retry — if all else fails, let humans figure it out.
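A sketch of back-off-and-retry in plain Clojure, assuming the step is a zero-argument function that throws on failure. `with-retry` and its doubling delay are illustrative choices, not the production policy.

```clojure
(defn with-retry
  "Calls `f` up to `attempts` times, sleeping `base-ms` after the first
  failure and doubling the wait after each further one. Rethrows the last
  exception when every attempt fails, so a human can take over."
  [attempts base-ms f]
  (loop [n 1 wait base-ms]
    ;; recur is not allowed inside try/catch, so the outcome is captured first.
    (let [result (try
                   {:ok (f)}
                   (catch Exception e
                     (if (< n attempts) ::retry (throw e))))]
      (if (= ::retry result)
        (do (Thread/sleep (long wait))
            (recur (inc n) (* 2 wait)))
        (:ok result)))))

;; Demo: the first two calls fail, the third succeeds.
(def calls (atom 0))

(with-retry 3 10 #(if (< (swap! calls inc) 3)
                    (throw (ex-info "upstream not ready" {}))
                    :uploaded))
;; => :uploaded
```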

Page 24: Operationalizing Clojure Confidently

(defmacro do-with-log
  "Works functionally like a do block -- more or less, it runs all the given forms in order and returns the output of the last form it ran. It logs when the job started, ended or when it runs into any problems. It logs the error and rethrows the Throwable upstream."
  ([[job-name name & {:keys [trace-id] :or {trace-id (str "trace-" (rand-int 100000))}}] & body]
   (if-not name
     (throw (IllegalArgumentException. "You want to provide a name for the block you want to run.")))
   `(let [out# (atom nil)
          start-time# (System/currentTimeMillis)
          ~job-name (str ~name)
          ~'trace-id (str ~trace-id)]
      (infoAm "job" ~job-name "status" "Started" "trace-id" ~trace-id)
      (reset! out# (try
                     ~@body
                     (catch Throwable e#
                       (errorAm "job=" ~job-name "status" "Error" "trace-id" ~trace-id "message" e#)
                       (throw e#))))
      (infoAm "job" ~job-name "status" "Ended" "trace-id" ~trace-id "time_taken" (str (- (System/currentTimeMillis) start-time#) "ms"))
      @out#)))

2014-05-20 00:28:26 INFO utils-verify:1 - trace-id=trace-94295, status=Started, job=sleeps
2014-05-20 00:28:27 INFO utils-verify:1 - trace-id=trace-94295, status=Started, job=throws-error
2014-05-20 00:28:27 ERROR utils-verify:1 - job==throws-error, trace-id=trace-94295, message=java.lang.Throwable: Boo! I errored Out, status=Error
2014-05-20 00:28:27 ERROR utils-verify:1 - job==sleeps, trace-id=trace-94295, message=java.lang.Throwable: Boo! I errored Out, status=Error

The only macro I needed

Page 25: Operationalizing Clojure Confidently

Iterative Development

Used “lein ns-deps-graph” to see the inter-relations between namespaces

Page 26: Operationalizing Clojure Confidently

Operational Clojure

Builds on simple concepts

they’re the units of composition

Sparingly depends on global state, if at all

Leverages existing infrastructure and people

Adapts to changes in scope and requirements

Loosely couples data and execution

Page 27: Operationalizing Clojure Confidently

Future

I had a great time coming up with some of these patterns

Particularly - config and options for jobs

Thinking about open source re-implementations

More Clojure-y things at SparX coming soon. ;)

Page 28: Operationalizing Clojure Confidently

Questions/Comments?