Page 1: stream processing engine

Stream Processing Engine

王岩 2011-12-8

Page 2: stream processing engine

Agenda

• Architecture
• Multi-thread
• I/O
• Further work

Page 3: stream processing engine

Purpose

• Users can query provenance

Stream processing engine

stream1

stream2

stream3

stream4

result

Provenance: the data in the original incoming streams from which the result is generated

When the user gets a strange result, he may be interested in its provenance

Why query provenance?

Required: if the provenance of a tuple has not been saved to disk, the tuple must not be sent to the user.

Page 4: stream processing engine

Architecture: layered architecture

File layer

Buffer layer

Spe layer

The layer below provides services to the layer above; the layer above invokes the interface provided by the layer below.

Stream processing engine

Buffer for provenance

Disk I/O

Page 5: stream processing engine

Spe layer

• Metadata: stores the metadata of the streams, including CQL statements and data types
• CQL parser: parses a CQL statement and generates the query plan tree
• Query plan processor: processes tuples along the query plan tree
• Utility: provides common services

Component view

Page 6: stream processing engine

Query plan tree (Entity)

[Diagram: query plan tree — leaf operators at the bottom feed a select operator and a chain of join operators, converging on a single root operator]

Page 7: stream processing engine

Operator class diagram

OperatorEntity
    list<QueueEntity*> queueInput; list<QueueEntity*> queueOutput; string id; ...

Subclasses (leaf, select, join, root):
    LeafOperatorEntity, SelectOperatorEntity
    JoinOperatorEntity: RelationWindow* relationWindow1; RelationWindow* relationWindow2; ...
    RootOperatorEntity: list<RelationTuple*> waitTupleList; ...

Page 8: stream processing engine

Query plan tree (Entity)

[Diagram: the same query plan tree, annotated with the three queue types on the edges between operators — common queues, storage queues, and transportation queues]

Page 9: stream processing engine

Queue class diagram

QueueEntity
    OperatorEntity* operatorInput; OperatorEntity* operatorOutput;
    RelationSchema* relationSchema; list<RelationTuple*> tupleList; ...
    void push(); RelationTuple& pop();

Subclasses overriding push(): StorageQueueEntity, TransportationQueueEntity, StorageTransportationQueueEntity

Data flow example schema: attribute1: integer, attribute2: integer, attribute3: string

Page 10: stream processing engine

Queue entity: memory management

• Continuous memory is used as the buffer. Head: the head of the tuples in the buffer; Tail: the tail of the tuples in the buffer.
• In a queue we do not allocate memory for each tuple; we allocate memory for the whole queue, and tuples are saved in the queue's buffer.
• When initialized, head and tail both point at the beginning address of the buffer.
• When a tuple arrives, the head moves forward by the length of one tuple.
• When a tuple leaves, the tail moves forward by the length of one tuple.
• When there is no space for a new tuple, throw an exception: a load-shedding algorithm is needed.
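The head/tail scheme above can be sketched as a fixed-size ring buffer. This is a minimal sketch, not the deck's exact API: the class and member names are hypothetical, and tuples are assumed fixed-length for brevity.

```cpp
#include <cstring>
#include <stdexcept>
#include <vector>

// Sketch of the queue-owned buffer described above: tuples live in one
// contiguous allocation; head advances on push, tail advances on pop.
class TupleQueue {
public:
    TupleQueue(size_t tupleLen, size_t capacity)
        : tupleLen_(tupleLen), buf_(tupleLen * capacity),
          capacity_(capacity), head_(0), tail_(0), count_(0) {}

    // "When a tuple arrives, the head moves forward the length of a tuple."
    void push(const void* tuple) {
        if (count_ == capacity_)   // no space for the new tuple:
            throw std::runtime_error("queue full: need load shedding");
        std::memcpy(&buf_[head_ * tupleLen_], tuple, tupleLen_);
        head_ = (head_ + 1) % capacity_;
        ++count_;
    }

    // "When a tuple leaves, the tail moves forward the length of a tuple."
    void pop(void* out) {
        if (count_ == 0) throw std::runtime_error("queue empty");
        std::memcpy(out, &buf_[tail_ * tupleLen_], tupleLen_);
        tail_ = (tail_ + 1) % capacity_;
        --count_;
    }

    size_t size() const { return count_; }

private:
    size_t tupleLen_;
    std::vector<unsigned char> buf_;  // continuous memory used as the buffer
    size_t capacity_, head_, tail_, count_;
};
```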

Page 11: stream processing engine

Tuple Entity

RelationTuple
    BYTE* bytes;                      // beginning address of the buffer: if the tuple is in a queue it uses the queue's buffer, otherwise it creates its own
    int tuplePosition;                // the offset in the buffer
    int tupleLength;                  // the tuple length
    TIME timestamp;                   // the timestamp of the tuple
    RelationSchema* relationSchema;   // the relation schema of the tuple
    map<string, list<int> > idMap;    // the provenance of the tuple, e.g. map["s1"] = {id1, id2}, map["s2"] = {id4}

Page 12: stream processing engine

Buffer layer

The BufferControl class provides the interface of the buffer layer. The upper layer need not interact with any other class in the layer, and if we change the implementation of the buffer layer we need not change the code of the layer above, as long as the interface stays the same.

Façade design pattern; Singleton design pattern

BufferControl
    instance : singleton; ...
    getInstance(); insert(); delete(); toBeStored(); storing(); isStored(); query()
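The Façade-plus-Singleton combination described here can be sketched as below. Method bodies are stubs with hypothetical signatures; the deck only names the methods, so everything else is an illustrative assumption.

```cpp
#include <string>

// Sketch: one BufferControl instance is the only entry point into the
// buffer layer (Façade); there can never be a second instance (Singleton).
class BufferControl {
public:
    static BufferControl& getInstance() {
        static BufferControl instance;  // lazily constructed, thread-safe in C++11
        return instance;
    }
    BufferControl(const BufferControl&) = delete;
    BufferControl& operator=(const BufferControl&) = delete;

    // Façade: the SPE layer calls only these; the hash tables, vectors and
    // global buffer behind them stay hidden.
    void insert(const std::string& /*stream*/, int /*id*/) { /* copy tuple into pages */ }
    void toBeStored(const std::string& /*stream*/, int /*id*/) { /* mark for storing thread */ }
    bool isStored(const std::string& /*stream*/, int /*id*/) const { return stored_; }
    void storing() { stored_ = true; /* flush marked tuples to the file layer */ }

private:
    BufferControl() = default;
    bool stored_ = false;  // placeholder state for the sketch
};
```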

Page 13: stream processing engine

Provenance life cycle

1. Provenance arrives at the system and is pushed into a queue. This is a storage queue: every tuple pushed in is stored in memory — the insert function is called and the tuple makes a copy in memory.
2. The tuple is then processed along the query plan tree.
3. When the tuple arrives at a transportation queue for the stream, the toBeStored function is called: the id is inserted into a map of tuples to be stored.
4. Another thread stores the tuples at some point. It calls the storing function, scans the map to see which provenance should be stored, and stores that provenance in the file.
5. When the tuple reaches the root operator, the isStored function is called to check whether its provenance has been saved.
6. At some point the system calls the delete function, and the provenance may be deleted from memory.
7. Another thread may call the query function to query provenance.
8. Only once its provenance has been stored is the tuple output to the client.

Page 14: stream processing engine

Page

• A page is continuous memory; its size may be 4 KB, 16 KB, …, 56 KB.
• In this system, pages are used to save two kinds of objects: pages for tuples and pages for bitmaps.
• A bitmap page marks the state of each tuple: 0 = not saved, 1 = saved.

Why use a bitmap? We must save a state (saved / not saved) for every tuple, and a bitmap needs only 1 bit per tuple. For a 10 KB/s stream with 8-byte tuples, that is 10*1024/8 = 1280 states per second; using 1 bit instead of 1 byte per state saves 1280 × 7/8 = 1120 bytes ≈ 1.1 KB/s of memory.

Page 15: stream processing engine

Architecture for buffer layer

[Diagram: a hash table maps each stream name (s1, s2, …) to a vector of page pointers; the hash table has a buffer of its own; a global buffer holds all pages, split into an unused list and a used list; separate structures exist for tuple pages and bitmap pages]

• Each stream name is hashed to a vector.
• Each vector saves the pointers of the pages that hold the data of one stream.
• The hash table has a buffer for itself.
• All pages needed anywhere are allocated from the global buffer.

Page 16: stream processing engine

Architecture for buffer layer

[Diagram repeated: hash table of stream vectors over the global buffer of pages]

Insert tuple: O(1)

Suppose a page is 100 bytes and a tuple of stream1 is 10 bytes, so a page can store 10 tuples. Now a tuple from stream1 with id 21 arrives:

1. Look up the hash table to find the vector for stream1.
2. Check whether the last page in the vector has space for the tuple; if it does, insert the tuple into that page.
3. Here each page holds only 10 tuples and the two existing pages already hold 20 tuples, so there is no space for tuple 21; allocate a page from the buffer.
4. The buffer of the hash table allocates the page from the global buffer, which simply moves a page from the unused list to the used list and returns it.
5. The same happens for the bitmap. The page is added to the buffer and the vector, and the tuple is inserted into this page.
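The insert path above can be sketched like this, using the slide's toy sizes. The structure and function names are hypothetical, and the tuple payload is omitted to keep the allocation logic visible.

```cpp
#include <list>
#include <vector>

// Toy sizes from the slide: 100-byte pages, 10-byte tuples.
const int PAGE_BYTES = 100, TUPLE_BYTES = 10;
const int TUPLES_PER_PAGE = PAGE_BYTES / TUPLE_BYTES;  // 10

struct Page { int used = 0; };  // tuple payload omitted

struct GlobalBuffer {
    std::list<Page*> unused, usedList;
    // "Just move a page from the unused list to the used list, and return it."
    Page* getOnePageToUse() {
        Page* p = unused.front();
        unused.pop_front();
        usedList.push_back(p);
        return p;
    }
};

// Insert one tuple into the stream's vector of pages: append to the last
// page, pulling a fresh page from the global buffer when it is full.
void insertTuple(std::vector<Page*>& streamPages, GlobalBuffer& gb) {
    if (streamPages.empty() || streamPages.back()->used == TUPLES_PER_PAGE)
        streamPages.push_back(gb.getOnePageToUse());  // no space: new page
    streamPages.back()->used++;
}
```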

Page 17: stream processing engine

Buffer layer: sequence diagram for inserting a tuple

[Sequence diagram — participants: BufferControl, PageHashTable, PageVector, ProvenanceBuffer, StreamBuffer. Messages: ifStreamExist(streamName) → true; getInsertablePage(int id); on overflow getMorePage() and getOnePageToUse() → page; getPage(int id) → page; push(data)]

Page 18: stream processing engine

Architecture for buffer layer

[Diagram repeated: hash table of stream vectors over the global buffer of pages]

Find tuple: O(1)

Suppose a page is 100 bytes, a tuple of stream1 is 10 bytes, and the first id held in the vector is 31. Now we want to find the tuple with identifier 45:

• (45 − 31) / 10 = 1 is the index of the page in the vector.
• 45 − 31 − 10 × 1 = 4 is the offset of the tuple within that page.

So the tuple is in the page at index 1 of the vector, at offset 4 within the page, and we have found it. The bitmap works the same way.
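The O(1) lookup reduces to one division and one remainder; a small sketch, with illustrative names:

```cpp
#include <utility>

// Locate a tuple by id: page index = (id - firstId) / tuplesPerPage,
// offset within that page = the remainder. firstId is the id of the
// first tuple currently held in the stream's vector.
std::pair<int, int> locateTuple(int id, int firstId, int tuplesPerPage) {
    int pageIndex = (id - firstId) / tuplesPerPage;  // which page in the vector
    int offset    = (id - firstId) % tuplesPerPage;  // which slot in the page
    return std::make_pair(pageIndex, offset);
}
```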

Page 19: stream processing engine

Release the memory

• If we never release the memory used for saving provenance, memory runs out quickly.
• We do not release memory one tuple at a time; we release one page at a time.
• We look at every provenance identifier still referenced in the query plan tree. These identifiers are considered useful, and all others useless. A page that contains no useful tuples can then be deleted.

Page 20: stream processing engine

Architecture for buffer layer

[Diagram repeated: hash table of stream vectors over the global buffer of pages]

Delete tuples: O(nmp)

Suppose a page is 100 bytes and a tuple of stream1 is 10 bytes. To release memory we scan along the query plan tree and find that the useful identifiers of stream1 are 13, 14 and 16, while the first id of the vector is 1. The first page of the vector therefore contains no useful tuples, so we release it: flush the page, move it from the used page list back to the unused list, delete it from the buffer and the vector, and update the first id of the vector. The bitmap is handled the same way.

Page 21: stream processing engine

User query for provenance: data flow diagram

[Three cases are shown: (1) the user does not query provenance; (2) the user queries provenance that is in pages in memory; (3) the user queries provenance that is not in pages in memory]

Page 22: stream processing engine

Architecture for buffer layer: query provenance

[Diagram repeated: stream vectors, tuple pages and the global buffer, plus a separate set of pages for queries]

When we query the provenance with identifier 31 in stream1:

• If the provenance is in the buffer for tuples, we find it in that page.
• If not, we check whether it is in a page of the buffer for queries.
• If it is in neither, we must read the page from disk, one page of data at a time.

The buffer for queries is bounded; we can set it to hold at most 5 pages. When it is full we must evict one page. The strategy may be LFU (least frequently used): for example, we flush the last page in the query buffer, read the data from disk into it, and put the page at the beginning of the buffer.
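The bounded query buffer can be sketched as below. This follows the eviction the slide literally describes — flush the last page, put the newly read page at the front (the slide labels the strategy LFU; the move-to-front behaviour is what is sketched here). Names are illustrative, and page ids stand in for page contents.

```cpp
#include <deque>

// Bounded buffer of query pages: at most `capacity` pages in memory.
class QueryPageCache {
public:
    explicit QueryPageCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns true on a hit. On a miss, "reads the page from disk",
    // flushing the last page if the buffer is full, and puts the new
    // page at the beginning of the buffer.
    bool access(int pageId) {
        for (std::deque<int>::iterator it = pages_.begin(); it != pages_.end(); ++it)
            if (*it == pageId) return true;              // already in memory
        if (pages_.size() == capacity_) pages_.pop_back();  // flush last page
        pages_.push_front(pageId);                        // new page at the front
        return false;
    }

    std::size_t size() const { return pages_.size(); }

private:
    std::size_t capacity_;
    std::deque<int> pages_;
};
```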

Page 23: stream processing engine

Buffer layer

The client code need not know the implementation details of the tuple, bitmap and query buffers.

Abstract factory design pattern

AbstractBufferFactory
    Page* createPage(); PageHashTable* createPageHashTable(); PageVector* createPageVector(); ProvenanceBuffer* createProvenanceBuffer()

Abstract products: AbstractPage, AbstractPageHashTable, AbstractPageVector, AbstractProvenanceBuffer

Concrete factories and their products:
    TupleBufferFactory → TuplePage, TuplePageHashTable, TuplePageVector, TupleProvenanceBuffer
    BitMapBufferFactory → BitMapPage, BitMapPageHashTable, BitMapPageVector, BitMapProvenanceBuffer
    QueryFactory → QueryPage, QueryPageHashTable, QueryPageVector, QueryProvenanceBuffer

BufferControl (singleton) sits in front of the factories:
    getInstance(); insert(); delete(); toBeStored(); storing(); isStored(); query()
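A minimal sketch of the abstract-factory arrangement, showing only the Page product for two of the three families; the real factories also build the hash table, vector and provenance buffer. The `kind()` method is an illustrative addition, not part of the deck.

```cpp
#include <memory>
#include <string>

// Abstract product: one product family member.
struct AbstractPage {
    virtual ~AbstractPage() {}
    virtual std::string kind() const = 0;
};
struct TuplePage : AbstractPage {
    std::string kind() const { return "tuple"; }
};
struct BitMapPage : AbstractPage {
    std::string kind() const { return "bitmap"; }
};

// Abstract factory: one factory per buffer family, each producing
// matching parts, so client code never names a concrete class.
struct AbstractBufferFactory {
    virtual ~AbstractBufferFactory() {}
    virtual std::unique_ptr<AbstractPage> createPage() const = 0;
};
struct TupleBufferFactory : AbstractBufferFactory {
    std::unique_ptr<AbstractPage> createPage() const {
        return std::unique_ptr<AbstractPage>(new TuplePage());
    }
};
struct BitMapBufferFactory : AbstractBufferFactory {
    std::unique_ptr<AbstractPage> createPage() const {
        return std::unique_ptr<AbstractPage>(new BitMapPage());
    }
};
```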

Page 24: stream processing engine

Multi-threads

• Main thread: does most of the work, including receiving data from the streams
• Storing thread: saves provenance
• I/O thread: deals with I/O with clients, including registering streams, registering CQLs, and querying provenance

Page 25: stream processing engine

Lock: insert (thread 1), read-write lock

Data structures and their types: ProvenanceMap (map), hash table (map), vector (vector), buffer (list), global buffer (list), page (unsigned char []).

[Table: per-structure lock sequence for insert — interleaved read/~read and write/~write steps, with page initialization in between; the tuple page itself is thread-unsafe]

Page 26: stream processing engine

Lock: to be stored (thread 1), read-write lock

[Table: lock sequence over the same data structures — a single write/~write step]

Page 27: stream processing engine

Lock: is tuple stored (thread 1), read-write lock

[Table: lock sequence over the same data structures — read/~read steps down to the bitmap page]

Page 28: stream processing engine

Lock: delete (thread 1), read-write lock

[Table: lock sequence over the same data structures — read/~read steps down to the tuple page, with write/~write around releasing it]

Page 29: stream processing engine

Lock: storing (thread 2), read-write lock

[Table: lock sequence over the same data structures — write/~write and read/~read steps over tuple and bitmap pages, using trywrite on the page]

Page 30: stream processing engine

Lock: query (thread 3), read-write lock

[Table: lock sequence over the same data structures — read/~read steps over tuple and query pages, using tryread on the page, plus write/~write with initialization on the query page]

Page 31: stream processing engine

Lock optimization

• We should reduce the cost of lock management while increasing concurrency.

• The lock for the buffer is useless because no threads ever conflict on it; we can get rid of it.

• The lock for the global buffer can be changed to a mutex.

• Less important operations can use the try variants (tryread/trywrite) instead of blocking.
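The "try" variant suggested above can be sketched with POSIX read-write locks: a non-critical operation attempts the write lock and simply skips its work when the lock is busy, instead of blocking the thread. The function name and the skipped work are illustrative.

```cpp
#include <pthread.h>

// One read-write lock guarding a page, as in the tables above.
pthread_rwlock_t pageLock = PTHREAD_RWLOCK_INITIALIZER;

// Returns true if the page was updated, false if the lock was busy.
// pthread_rwlock_trywrlock fails with EBUSY instead of blocking when
// any other holder (reader or writer) is active.
bool tryMarkPage() {
    if (pthread_rwlock_trywrlock(&pageLock) != 0)
        return false;  // lock busy: skip this round, retry later
    /* ... update the page ... */
    pthread_rwlock_unlock(&pageLock);
    return true;
}
```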

Page 32: stream processing engine

Lock performance analysis

For the read-write lock we used:
• allows concurrent access by multiple threads for reading
• restricts access to a single thread for writes
• write-preferring
• smallest granularity: one page

Performance is lost when several threads operate on the same page:
• page for tuples: readers — storing thread, query thread; writer — main thread
• page for bitmaps: reader — main thread; writer — storing thread
• page for queries: all accesses happen in the I/O thread

Conclusion: likely to improve performance, but experiments are needed.

Page 33: stream processing engine

Concurrency control: still under study

Page 34: stream processing engine

File layer

When writing a tuple into the file:
• Get the offset of the tail of the file
• Append the tuple at the tail of the file
• Flush the buffer
• Add the offset and tuple identifier to the index
• Use a partitioned hash to implement the two-dimensional index
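The write path can be sketched as below. A `std::map` stands in for the partitioned-hash two-dimensional index, whose implementation the deck does not detail; the function name and index layout are assumptions.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// (stream name, tuple id) -> file offset. Placeholder for the
// partitioned-hash two-dimensional index mentioned in the deck.
std::map<std::pair<std::string, int>, long> tupleIndex;

// Append one tuple at the tail of the file and index it.
long writeTuple(FILE* f, const std::string& stream, int id,
                const void* data, std::size_t len) {
    std::fseek(f, 0, SEEK_END);         // get the offset of the tail
    long offset = std::ftell(f);
    std::fwrite(data, 1, len, f);       // append the tuple at the tail
    std::fflush(f);                     // flush the buffer
    tupleIndex[std::make_pair(stream, id)] = offset;  // index the tuple
    return offset;
}
```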

Page 35: stream processing engine

I/O

[Diagram: the system receives stream1–stream4 and handles three kinds of client I/O — registering streams, registering CQLs, and querying provenance]

We implement all of this in one thread, so the I/O must be non-blocking. We do not use one thread per I/O; everything runs in a single thread, which should block only when there is nothing to read or write, so we use I/O multiplexing here.

Page 36: stream processing engine

What is I/O multiplexing?

• Used when an application needs to handle multiple I/O descriptors at the same time,

• and when I/O on any one descriptor can result in blocking.

• The application blocks until any of the registered I/O descriptors becomes readable, writable, or raises an exception condition.

Page 37: stream processing engine

epoll

• epoll is a scalable I/O event notification mechanism

• It is meant to replace the older POSIX select and poll system calls.

[Diagram: how select works — per-descriptor bitmaps (fd = 0…4) mark read, write and exception interest, e.g. a read set of 0 0 1 1 0]
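A minimal epoll round trip on Linux can look like this: register one descriptor for reading and wait until it becomes readable. In the engine, the same loop would multiplex the stream sockets and client query connections; the function name and single-descriptor setup are illustrative.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// Waits up to timeoutMs for fd to become readable.
// Returns the number of ready descriptors (0 on timeout, -1 on error).
int waitReadable(int fd, int timeoutMs) {
    int ep = epoll_create1(0);
    if (ep < 0) return -1;
    epoll_event ev{};
    ev.events = EPOLLIN;   // interested in readability only
    ev.data.fd = fd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(ep);
        return -1;
    }
    epoll_event ready[1];
    int n = epoll_wait(ep, ready, 1, timeoutMs);  // blocks up to timeoutMs
    close(ep);
    return n;
}
```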

Page 38: stream processing engine

Further work

• Implement the multi-threaded design: use a dedicated thread to save the provenance

• Implement the file-layer design: add an index to the provenance saved in the file

• Implement the I/O design

Page 39: stream processing engine

Thank you