Lessons Learned: Machine Learning and Technical Debt

Matthew Kirk

@mjkirk

Who uses data?

Responsive Enterprise

A Golden Opportunity

The Danger

The High Interest Debt of Machine Learning

What we’re covering

• Boundary Erosion

• Data Dependencies

• Spaghetti Code

• The Real World

`whoami`

• O’Reilly author of Thoughtful Machine Learning. Use code AUTHD for a discount on oreilly.com.

• Former Financial Quant

• Independent Consultant

• @mjkirk

Boundary Erosion

• Entanglement

• Visibility Debt

Entanglement

Entanglement: Solution

• Isolate models as much as possible

• Regularization
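As a rough illustration of the regularization bullet, here is a minimal sketch of ridge (L2) regression for a single feature with no intercept. The helper name `ridge_weight`, the toy data, and the penalty value are assumptions made for this example, not something from the talk. The point is only that a penalty shrinks the learned weight, which limits how far one input can swing the prediction.

```ruby
# Closed-form ridge regression for one feature and no intercept (a
# simplification for this sketch): w = sum(x*y) / (sum(x*x) + penalty).
def ridge_weight(xs, ys, penalty)
  raise ArgumentError, 'xs and ys must have the same length' unless xs.length == ys.length

  sum_xy = xs.zip(ys).sum { |x, y| x * y }
  sum_xx = xs.sum { |x| x * x }
  sum_xy / (sum_xx + penalty).to_f
end

# Hypothetical toy data where y is exactly 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = ridge_weight(xs, ys, 0.0)  # plain least squares
regularized   = ridge_weight(xs, ys, 14.0) # the penalty shrinks the weight
```

With a penalty of zero this is ordinary least squares; raising the penalty pulls the weight toward zero.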

Visibility Debt

Solutions

• Keeping an API Log

• Monitoring of tool use

• No sharing of usernames :)
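One way to keep an API log is to wrap the model so every call records who made it. The sketch below is hypothetical: the `LoggedModel` class, the caller names, and the stand-in lambda model are all assumptions for illustration.

```ruby
require 'time'

# Hypothetical wrapper that records who called the model and when.
class LoggedModel
  attr_reader :access_log

  def initialize(model)
    @model = model      # anything that responds to #call
    @access_log = []
  end

  # Log the caller before delegating to the wrapped model.
  def predict(caller_id, input)
    @access_log << { caller: caller_id, at: Time.now.utc.iso8601, input: input }
    @model.call(input)
  end

  # Distinct consumers seen so far, so no dependency stays undeclared.
  def consumers
    @access_log.map { |entry| entry[:caller] }.uniq
  end
end

doubler = ->(x) { x * 2 } # stand-in for a real model
model = LoggedModel.new(doubler)
model.predict('billing-service', 3)
model.predict('search-team', 5)
model.predict('billing-service', 7)
```

Calling `model.consumers` then lists each distinct caller once, which only works if every consumer identifies itself with its own name.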

Data Dependencies

• Unstable

• Underutilized

Unstable Data

Solution

• Versioning

• Keep a specific version of a dataset, for instance a timestamped version of language data.
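A timestamped dataset version could be as simple as a label built from the snapshot time and a hash of the contents. This is a sketch under assumptions: the naming scheme, the `dataset_version` helper, and the sample data are invented for illustration.

```ruby
require 'digest/sha2'

# Hypothetical naming scheme: <name>-<UTC timestamp>-<short content hash>.
# The hash changes whenever the data changes, so a model can pin the exact
# snapshot it was trained on.
def dataset_version(name, contents, at: Time.now.utc)
  stamp = at.strftime('%Y%m%dT%H%M%SZ')
  digest = Digest::SHA256.hexdigest(contents)[0, 12]
  "#{name}-#{stamp}-#{digest}"
end

snapshot = "word,count\nhello,3\nworld,5\n" # made-up language data
taken_at = Time.utc(2015, 1, 2, 3, 4, 5)
version = dataset_version('language-data', snapshot, at: taken_at)
```

Storing that label next to the trained model makes it possible to tell later which data a given model actually saw.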

Underutilized

Solution

• Feature engineering: PCA, ICA, random feature selection, VIMP (variable importance), etc.
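The techniques named above need more machinery than fits on a slide, so the sketch below swaps in a much simpler stand-in: a correlation filter that drops a feature when it nearly duplicates one already kept. It is not PCA or ICA. The feature names, the data, and the 0.95 cutoff are assumptions for illustration.

```ruby
# Pearson correlation between two equal-length numeric arrays.
# A constant feature gives NaN here; this sketch does not handle that case.
def pearson(a, b)
  n = a.length.to_f
  mean_a = a.sum / n
  mean_b = b.sum / n
  cov = a.zip(b).sum { |x, y| (x - mean_a) * (y - mean_b) }
  var_a = a.sum { |x| (x - mean_a)**2 }
  var_b = b.sum { |y| (y - mean_b)**2 }
  cov / Math.sqrt(var_a * var_b)
end

# Keep a feature only if it is not strongly correlated with one already kept.
def trim_redundant(features, threshold: 0.95)
  kept = {}
  features.each do |name, values|
    redundant = kept.values.any? { |other| pearson(values, other).abs > threshold }
    kept[name] = values unless redundant
  end
  kept.keys
end

# Hypothetical features: height_in carries almost the same signal as height_cm.
features = {
  height_cm: [150.0, 160.0, 170.0, 180.0],
  height_in: [59.1, 63.0, 66.9, 70.9],
  age:       [40.0, 22.0, 35.0, 28.0]
}
selected = trim_redundant(features)
```

Here `height_in` is dropped because it adds almost nothing beyond `height_cm`, while `age` survives.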

Spaghetti Code

• Glue Code

• Pipeline Jungle

• Experimental Paths

• Configuration Debt

Glue Code

R, Matlab, Python, Java. All to use that one implementation.

Solution

• Write your own implementation of the algorithm...

Pipeline Jungle

Conway’s Law

The Clymb’s Database V1.0

PS: No Monitoring on any of this.

Clymb DB V2.0

Solution

• Map your systems, then reduce them (find the minimum cut between systems)

• Reduce organizational disconnects by attending stand-ups and being part of the engineering team

Experimental Paths

Solution: Tombstones

• def run_this_once_in_prod!; Tombstone.new('2014-01-02'); end

• When you think something is dead put a Tombstone on it

• https://www.youtube.com/watch?v=29UXzfQWOhQ
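The slide calls `Tombstone.new` but does not show the class, so the version below is a guess at what a minimal one might look like. The in-memory `sightings` list is an assumption; a real setup would more likely write to a log or metrics system.

```ruby
require 'date'

# Hypothetical Tombstone: every time supposedly dead code still runs, it
# records the date the tombstone was planted and where it was hit.
class Tombstone
  @sightings = []

  class << self
    attr_reader :sightings
  end

  def initialize(planted_on)
    self.class.sightings << {
      planted_on: Date.parse(planted_on),
      seen_at: Time.now.utc,
      backtrace: caller # call stack showing which code is still alive
    }
  end
end

def run_this_once_in_prod!
  Tombstone.new('2014-01-02')
  :done
end

run_this_once_in_prod!
```

If `Tombstone.sightings` stays empty for long enough, the code is probably safe to delete. Any entry means something still calls it.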

Configuration Debt

Solution

• Find optimal configurations regularly

• Revisit initial configuration with new datapoints.

External World Changes

• Fixed Thresholds

• Correlation changes

Fixed Thresholds

• Laws change: the drinking age used to be 19 in many states.

Solution

• Rebuild, or make prediction error part of the objective your model minimizes.

• Minimize cost, where Cost = Actual - Predicted
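One reading of the solution above is to learn the threshold from current data instead of hardcoding it, choosing the cutoff with the fewest mismatches between actual and predicted labels. The data and helper names below are hypothetical, and counting misclassifications is just one possible cost.

```ruby
# Count misclassifications if we predict "allowed" when value >= threshold.
def errors_at(threshold, samples)
  samples.count { |value, label| (value >= threshold) != label }
end

# Pick the candidate threshold with the fewest errors on the given data.
def best_threshold(samples, candidates)
  candidates.min_by { |t| errors_at(t, samples) }
end

# Hypothetical [age, allowed] pairs observed under an old and a new rule.
old_rule = [[18, false], [19, true], [20, true], [21, true], [22, true]]
new_rule = [[18, false], [19, false], [20, false], [21, true], [22, true]]
candidates = (18..22).to_a

old_cutoff = best_threshold(old_rule, candidates)
new_cutoff = best_threshold(new_rule, candidates)
```

Refitting on fresh data moves the cutoff from 19 to 21 without anyone editing a constant.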

Correlations Change

Solution

• Be careful when looking for causal evidence. Ask yourself: what if the model doesn’t work?

• Iterate often

Questions?

The Blissful Land of Opportunity

Lessons Learned in One Slide

Danger               Solution
Entanglement         Regularize or isolate models
Visibility Debt      Keep an access log of who uses what
Unstable Data        Version datasets
Underutilized Data   Trim by finding better features
Glue Code            Write your own implementations
Pipeline Jungle      Find minimum cut in systems
Experimental Paths   Use Tombstones
Configuration Debt   Reconfigure with new datasets
Fixed Thresholds     Include accuracy as part of model
Correlation Changes  Trim non-causal data from models

Links and Contact

• @mjkirk

• Machine Learning: The High-Interest Credit Card of Technical Debt: https://bit.ly/1zs9TXi

• Is that code dead?: http://bit.ly/1sg0B1L

Photo Sources

• Cost of gigabyte: http://royal.pingdom.com/2011/12/19/would-you-pay-7260-for-a-3-tb-drive-charting-hdd-and-ssd-prices-over-time/

• Golden Opportunity: https://flic.kr/p/7xvfZr

• Problems are Opportunities: https://flic.kr/p/ifFos

• Master Charge: https://flic.kr/p/noQUh1

• Erosion: https://flic.kr/p/9agH2q

• Coupler: https://flic.kr/p/ppm9HG

• Fruit Loops: https://flic.kr/p/5rkLhP

• Somewhere in Quản Bạ, Hà Giang: https://flic.kr/p/q4K9Bo

• Data Dependencies: https://flic.kr/p/dVq7vg

• Unstable!: https://flic.kr/p/s7RLj

• Underutilized Piano: https://flic.kr/p/2sZVP

• Spaghetti: https://flic.kr/p/tuwkp

• Glue: https://flic.kr/p/6L13SK

• Pipelines at Google: https://flic.kr/p/pvLQG2