Taming the resource tiger You cannot hide from Physics!


Page 1: Taming the resource tiger

Taming the resource tiger
You cannot hide from Physics!

Page 2: Taming the resource tiger

Mass
How much matter is in an object

Page 3: Taming the resource tiger

Data Storage
• Hard Disk Drive (HDD)
  • Magnetizes a thin film of ferromagnetic material on a disk
  • Reads it with a magnetic head on an actuator arm
• Solid State Drive (SSD)
  • Uses integrated circuit assemblies as memory to store data persistently
  • No moving parts

Page 4: Taming the resource tiger

Areal Storage Density
• SSD
  • 2.8 Tbit/in²
• HDD
  • 1.5 Tbit/in²

Units are terabits per square inch. Numbers as of 2016 (see Wikipedia, our materials are improving).

Page 5: Taming the resource tiger

When hard drives go bad

Page 6: Taming the resource tiger

What does this have to do with PHP?

Page 7: Taming the resource tiger

The chat that worked for 3 days…

Page 8: Taming the resource tiger

Streams: Computing Concept

Definitions
• Idea originating in the 1950s
• Standard way to get input and output
• A source or sink of data

Who uses them
• C: stdin, stderr, stdout
• C++ iostream
• Perl IO
• Python io
• Java
• C#

Page 9: Taming the resource tiger

What is a Stream?
• Access input and output generically
• Can write and read linearly
• May or may not be seekable
• Comes in chunks of data

Page 10: Taming the resource tiger

Why do I care about streams?
• They are created to handle massive amounts of data
• Assume all files are too large to load into memory
• If this means checking size before load, do it
• If this means always treating a file as very large, do it
• PHP streams were meant for this!
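A minimal sketch of "treat every file as very large": the stream is read one line at a time, so memory use stays roughly flat however big the source is. The helper name `countLines` is made up for this illustration, and `php://temp` stands in for a real large file.

```php
<?php
/**
 * Count lines in a stream one line at a time, so memory use stays
 * roughly constant no matter how large the underlying file is.
 * (countLines is a hypothetical helper, not a PHP built-in.)
 *
 * @param resource $stream An open, readable stream.
 */
function countLines($stream): int
{
    $count = 0;
    while (fgets($stream) !== false) {
        $count++;
    }
    return $count;
}

// php://temp stands in for a large file: it keeps small payloads in memory
// and spills to a temporary file once they grow past a size threshold.
$stream = fopen('php://temp', 'w+b');
for ($i = 0; $i < 1000; $i++) {
    fwrite($stream, "line {$i}\n");
}
rewind($stream);

$lineCount = countLines($stream);
fclose($stream);

echo $lineCount, PHP_EOL; // prints 1000
```

The same function works unchanged on a handle from `fopen('/path/to/big.log', 'rb')`, which is the point of accessing input generically through streams.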

Page 11: Taming the resource tiger

What uses streams in PHP?
• EVERYTHING
• include/require(_once)
• stream functions
• file system functions
• many other extensions

Page 12: Taming the resource tiger

How PHP Streams Work

[Diagram labels: ALL IO, Attach Context, Stream Transport, Stream Filter, Stream Wrapper]

Page 13: Taming the resource tiger

Using Streams

Page 14: Taming the resource tiger

You can also do logic on the fly!

Page 15: Taming the resource tiger

What are Filters?
• Perform operations on stream data
• Can be prepended or appended (even on the fly)
• Can be attached to read or write
• When a filter is added for read and write, two instances of the filter are created
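As a rough illustration of attaching filters to the read side and the write side, here is a sketch using two of PHP's built-in string filters (`string.toupper` and `string.rot13`). The sample strings and the use of `php://temp` are assumptions made for the example.

```php
<?php
// Read side: data is transformed as it is pulled out of the stream.
$in = fopen('php://temp', 'w+b');
fwrite($in, "terabytes of cat gifs\n");
rewind($in);
stream_filter_append($in, 'string.toupper', STREAM_FILTER_READ);
$shouted = fgets($in);
fclose($in);

// Write side: data is transformed before it reaches the underlying storage.
$out = fopen('php://temp', 'w+b');
$rot13 = stream_filter_append($out, 'string.rot13', STREAM_FILTER_WRITE);
fwrite($out, "hello streams\n");
stream_filter_remove($rot13); // detach the filter again, on the fly
rewind($out);
$stored = fgets($out);
fclose($out);

echo $shouted; // TERABYTES OF CAT GIFS
echo $stored;  // uryyb fgernzf
```

One thing to keep in mind: a read filter only sees data that has not already been pulled into the stream's read buffer, so it is safest to attach it before the first read.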

Page 16: Taming the resource tiger

Using Filters

Page 17: Taming the resource tiger

Things to watch for!
• Data has an input and output state
• When reading in chunks, you may need to cache in between reads to make filters useful
• Use the right tool for the job
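The "cache in between reads" point can be sketched without a custom filter class: when a record (here, a line) straddles two chunks, the incomplete tail has to be carried over to the next read. The function name `linesFromChunks` and the tiny chunk size are assumptions chosen to force splits in the demo.

```php
<?php
/**
 * Read a stream in fixed-size chunks and yield complete lines, caching any
 * trailing partial line until the next chunk arrives.
 * (linesFromChunks is a hypothetical helper for illustration.)
 *
 * @param resource $stream    An open, readable stream.
 * @param int      $chunkSize Bytes per read; tiny here to force splits.
 */
function linesFromChunks($stream, int $chunkSize = 8): Generator
{
    $carry = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $parts = explode("\n", $carry . $chunk);
        $carry = array_pop($parts); // last piece may be an incomplete line
        foreach ($parts as $line) {
            yield $line;
        }
    }
    if ($carry !== '') {
        yield $carry; // final line without a trailing newline
    }
}

$stream = fopen('php://memory', 'w+b');
fwrite($stream, "alpha,1\nbravo,22\ncharlie,333");
rewind($stream);

$lines = iterator_to_array(linesFromChunks($stream), false);
fclose($stream);

print_r($lines);
```

Without the `$carry` variable, "bravo,22" and "charlie,333" would each arrive in pieces, because the 8-byte chunk boundaries fall in the middle of those lines.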

Page 18: Taming the resource tiger

Throw away your assumptions except for:

There will be Terabytes of Cat Gifs!!

Page 19: Taming the resource tiger

Dimension
Both an object’s size and mathematical space

Page 20: Taming the resource tiger

Random Access Memory (RAM)
• The CPU uses RAM to work
• It randomly shoves data inside and pulls data back out
• RAM is faster than SSD and HDD
• It’s also more expensive

Page 21: Taming the resource tiger

Out of Memory

Page 22: Taming the resource tiger

There are two reasons you’ll see that error
• Recursion recursion recursion recursion
  • Solution: install xdebug and get your stacktrace
• Loading too much data into memory
  • Solution: manage your memory

Page 23: Taming the resource tiger

Inherently PHP hides this problem
• Share nothing architecture
• Extensions with C libraries that hide memory consumption
• FastCGI/CGI blows away processes, restoring memory
• Max child and other Apache settings blow away children, restoring memory

Page 24: Taming the resource tiger

How do I fix it?

Page 25: Taming the resource tiger

Halp, I can’t upload!!

Page 26: Taming the resource tiger

Arrays are evil
• There are other ways to store data that are more efficient
• They should be used for small amounts of data
• No matter how hard you try, there is C overhead

Page 27: Taming the resource tiger

Process with the appropriate tools
• Load data into the appropriate place for processing
• Hint: arrays are IN MEMORY, which is generally not an appropriate place for processing
• Datastores are meant for storing and retrieving data, use them

Page 28: Taming the resource tiger

Select * from table

Page 29: Taming the resource tiger

Use the iteration, Luke
• Lazy fetching exists for database fetching, so use it!
• Always page (window) your result sets from the database. ALWAYS
• Use filters or generators to format or alter results on the fly
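One way the paging-plus-generators advice might look in code is sketched below. `pagedRows`, `formatRows` and the `$fetchPage` callable are hypothetical names; in a real application `$fetchPage` would run a LIMIT/OFFSET (or keyset) query, whereas here a small array stands in for the database so the example is self-contained.

```php
<?php
/**
 * Walk a large result set one page at a time.
 *
 * $fetchPage is a hypothetical callable (offset, limit) => array of rows.
 */
function pagedRows(callable $fetchPage, int $pageSize): Generator
{
    $offset = 0;
    do {
        $rows = $fetchPage($offset, $pageSize);
        foreach ($rows as $row) {
            yield $row;
        }
        $offset += $pageSize;
    } while (count($rows) === $pageSize); // a short page means we are done
}

/** Format rows on the fly; only one row is handled at a time. */
function formatRows(iterable $rows): Generator
{
    foreach ($rows as $row) {
        yield sprintf('#%d %s', $row['id'], strtoupper($row['name']));
    }
}

// Stand-in datastore: 25 fake rows, sliced like a LIMIT/OFFSET query would.
$table = [];
for ($id = 1; $id <= 25; $id++) {
    $table[] = ['id' => $id, 'name' => "cat{$id}"];
}
$fetchPage = fn(int $offset, int $limit): array => array_slice($table, $offset, $limit);

// Collected into an array only so the demo result can be inspected;
// production code would normally just foreach over the generator.
$formatted = iterator_to_array(formatRows(pagedRows($fetchPage, 10)), false);

echo $formatted[0], PHP_EOL;  // #1 CAT1
echo $formatted[24], PHP_EOL; // #25 CAT25
```

At most one page of rows is held at a time, and the formatting step never sees more than a single row.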

Page 30: Taming the resource tiger

The N+1 problem
• In simple terms, nested loops
• Don’t distance yourself too much from your datastore
• Collapse into one or two queries instead
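To make the N+1 pattern concrete, the sketch below uses a made-up in-memory `FakeStore` class that simply counts how many "queries" it serves. Everything in it (class, methods, data) is hypothetical; with a real database the collapsed version would typically be a single `WHERE post_id IN (...)` query or a join.

```php
<?php
/** Hypothetical in-memory datastore that counts how many queries it serves. */
final class FakeStore
{
    public int $queries = 0;
    private array $posts = [1 => 'Streams', 2 => 'Memory', 3 => 'Queues'];
    private array $comments = [[1, 'nice'], [1, 'thanks'], [3, 'more please']];

    public function postIds(): array
    {
        $this->queries++;
        return array_keys($this->posts);
    }

    /** Stands in for: SELECT ... FROM comments WHERE post_id IN (...) */
    public function commentsFor(array $postIds): array
    {
        $this->queries++;
        $grouped = array_fill_keys($postIds, []);
        foreach ($this->comments as [$postId, $text]) {
            if (isset($grouped[$postId])) {
                $grouped[$postId][] = $text;
            }
        }
        return $grouped;
    }
}

// N+1: one query for the posts, then one more per post inside the loop.
$slow = new FakeStore();
$slowResult = [];
foreach ($slow->postIds() as $id) {
    $slowResult[$id] = $slow->commentsFor([$id])[$id];
}

// Collapsed: one query for the posts, one for all of their comments.
$fast = new FakeStore();
$fastResult = $fast->commentsFor($fast->postIds());

printf("N+1: %d queries, collapsed: %d queries\n", $slow->queries, $fast->queries);
// prints "N+1: 4 queries, collapsed: 2 queries"
```

Both approaches produce the same data; the difference is that the first one grows by one query for every extra post.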

Page 31: Taming the resource tiger

Throw away all your assumptions except:

Page 32: Taming the resource tiger

Speed
The rate at which an object covers distance

Page 33: Taming the resource tiger

How does a CPU work?

Page 34: Taming the resource tiger

CPU limitations
• Transmission delays
• Heat
  • Both are materials limitations
• http://www.mooreslaw.org/

Page 35: Taming the resource tiger

Why I no longer overclock

Page 36: Taming the resource tiger

What does this have to do with PHP?
• You are limited by the CPU your site is deployed upon
• Yes, even in a cloud: there are still physical systems running your stuff
• Yes, even in a VM: there are still physical systems running your stuff
• Follow good programming habits
• PROFILE

Page 37: Taming the resource tiger

Good programming habits
• Turn on opcache in production!
• Keep your code error AND WARNING free
• Watch complex logic in loops
  • Short circuit the loop
  • Rewrite to do the logic on the entire set in one step
  • Calculate values only once
  • On small arrays use array_walk
  • On large arrays use generators/iterators
• Use isset instead of in_array if possible
• Profile to find the place to rewrite for slow code issues
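The `isset` versus `in_array` habit, sketched with an assumed list of allowed file extensions: `in_array()` scans the list on every call, while flipping the values into keys once turns each check into a hash lookup.

```php
<?php
$allowed = ['gif', 'png', 'jpg', 'webp'];

// in_array() walks the list linearly on every call.
$slowCheck = in_array('png', $allowed, true);

// Flip values into keys once; each isset() is then a hash lookup.
$allowedSet = array_flip($allowed);
$fastCheck = isset($allowedSet['png']);
$missing   = isset($allowedSet['bmp']);

var_dump($slowCheck, $fastCheck, $missing); // bool(true) bool(true) bool(false)
```

The flip only pays off when the same list is checked many times, for example inside a loop, and it only works for values that are valid array keys (strings and integers). As the slide says, profile before rewriting.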

Page 39: Taming the resource tiger

Distribute the load
• Perfect for heavy processing for some type of data
• Queue code that requires heavy processing but not immediate viewing
• Design your UX so you can inform users of completed jobs
• Cache complex work items

Page 40: Taming the resource tiger

Pick your system
• php-resque
• Gearman
• Beanstalkd
• IronMQ
• RabbitMQ
• ZeroMQ
• Amazon SQS
• Just visit http://queues.io

Page 41: Taming the resource tiger

Job queuing and 10K page pdfs

Page 42: Taming the resource tiger

Keep your CPU happy
• Offload processing
• Use a queue

Page 43: Taming the resource tiger

Velocity
Speed + Direction

Page 44: Taming the resource tiger

Networking 101
• IP: forwards packets of data based on a destination address
• TCP: verifies the correct delivery of data from client to server with error and lost data correction
• Network Sockets: subroutines that provide TCP/IP (and UDP and some other support) on most systems

Page 45: Taming the resource tiger

Packet of Data

Page 46: Taming the resource tiger

Speed in the series of tubes
• Bandwidth: size of your pipe
• Latency: length of your pipe including size changes
• Jitter: air bubbles in your pipe

Page 47: Taming the resource tiger

Network Socket Types
• Stream
  • Connection oriented (tcp)
• Datagram
  • Connectionless (udp)
• Raw
  • Low level protocols

Page 48: Taming the resource tiger

Definitions
• Socket
  • Bidirectional network stream that speaks a protocol
• Transport
  • Tells a network stream how to communicate
• Wrapper
  • Tells a stream how to handle specific protocols and encodings

Page 49: Taming the resource tiger

Using Sockets
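The code from this slide is not part of the transcript, so what follows is only a stand-in sketch of socket streams in PHP, not the original example. It uses `stream_socket_pair()` to get two connected stream sockets without touching the network, and assumes a Unix-like system (on Windows, `STREAM_PF_INET` would be needed instead of `STREAM_PF_UNIX`).

```php
<?php
// A connected pair of stream sockets; Unix domain sockets are assumed here.
[$client, $server] = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);

// Never wait forever on the other side: cap each read at 2 seconds.
stream_set_timeout($client, 2);
stream_set_timeout($server, 2);

// Sockets are bidirectional streams, so the usual stream functions apply.
fwrite($client, "PING\n");
$request = fgets($server);

fwrite($server, "PONG\n");
$reply = fgets($client);

// After a read, timed_out reports whether the timeout was what ended it.
$timedOut = stream_get_meta_data($client)['timed_out'];

fclose($client);
fclose($server);

echo $request, $reply; // PING, then PONG
```

For a real remote service, `stream_socket_client()` returns the same kind of stream resource, and the timeout handling shown here carries over.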

Page 50: Taming the resource tiger

What does this have to do with PHP?
• APIs fail
• APIs go bye-bye
• AWS goes down
  • Or loses network connection to a specific area
  • Or otherwise fails

Page 51: Taming the resource tiger

What do you mean we can’t write files?

Page 52: Taming the resource tiger

Prepare for failure
• Handle timeouts
• Handle failures
• Abstract enough to replace systems if necessary, but only as much as necessary
• If you’re not paying for it, don’t base your business model on it
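A small sketch of handling failure: retry a flaky call a bounded number of times, then fall back to something safe such as cached data. `withRetry`, the fake API closure and the retry policy are all illustrative assumptions. A real caller would also set a network timeout (for HTTP via streams, the `timeout` option of the `http` stream context) so that a hung request turns into a failure it can handle.

```php
<?php
/**
 * Call a flaky operation a limited number of times, then fall back.
 * (withRetry and its policy are hypothetical, for illustration only.)
 */
function withRetry(callable $operation, int $attempts, callable $fallback)
{
    for ($try = 1; $try <= $attempts; $try++) {
        try {
            return $operation($try);
        } catch (RuntimeException $e) {
            // A real system would log $e and probably back off before retrying.
        }
    }
    return $fallback();
}

// Stand-in for a remote API that fails twice before answering.
$flakyApi = function (int $try): string {
    if ($try < 3) {
        throw new RuntimeException("timeout on attempt {$try}");
    }
    return 'fresh data';
};

$recovered = withRetry($flakyApi, 3, fn() => 'cached data');
$degraded  = withRetry($flakyApi, 2, fn() => 'cached data');

echo $recovered, PHP_EOL; // fresh data
echo $degraded, PHP_EOL;  // cached data
```

Keeping the third-party call behind one function like this is also a light form of the "abstract only as much as necessary" advice: the fallback or the provider can change without touching callers.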

Page 53: Taming the resource tiger

Checklist
• Cultivate good coding habits
• Try not to loop logic or processing
• Don’t be afraid to offload work to other systems or services
• Assume every file is huge
• Assume there are 1 million rows in your DB table
• Assume that every network request is slow or going to fail
• Profile to find code bottlenecks, DON’T assume you know the bottleneck
• Wrap 3rd party tools enough to deal with downtime or retirement of APIs

Page 54: Taming the resource tiger

About Me
• http://emsmith.net
• [email protected]
• twitter: @auroraeosrose
• IRC: freenode, auroraeosrose, #phpmentoring
• https://joind.in/talk/9f43b