1. Azure Stream Analytics Dr. Nico Jacobs, nico@ .be,
@SQLWaldorf Tweet and win an Ignite 2016 ticket #itproceed
2. Why? Traditional Business Intelligence first collects data
and analyzes it afterwards, typically with a latency of one day. But
we live in a fast-paced world: social media, the Internet of Things,
just-in-time production. We want to monitor and analyze streams of
data in near real time, typically with a latency of a few seconds up
to a few minutes.
3. A different kind of query Traditional querying assumes the
data doesn't change while you are querying it: we query a fixed
state. If the data is changing, snapshots and transactions freeze
the data while we query it. Since we query a finite state, our query
should finish in a finite amount of time. (Diagram: table, query,
result table.)
4. A different kind of query When analyzing a stream of data,
we deal with a potentially infinite amount of data. As a consequence,
our query would never end! To solve this problem, most queries use
time windows. (Diagram: stream, temporal query, result stream of
windowed counts.)
5. Azure Stream Analytics In Azure Stream Analytics we create,
manage and run jobs. Every job has at least one input, one query and
one output, but jobs can be more complex: a query can read from
multiple inputs and write to multiple outputs.
6. Inputs Currently two types of input are supported. Data
Stream: an Azure Event Hub or Azure Blob through which we receive a
stream of data. Reference Data: an Azure Blob containing static
reference data (a lookup table). There is no support for Azure
databases or other cloud storage (yet).
7. Temporal query The query is written in SQL! No Java or .NET
coding skills are needed. It is mainly a subset of T-SQL, with a few
extra keywords added to deal with temporal queries.
8. Output Results are stored in one of the following
destinations. Azure Blob storage: creates log files with temporal
query results; ideal for archiving. SQL database: stores results in
an Azure SQL Database table; ideal as a source for traditional
reporting and analysis. Event hub: sends an event to an event hub;
ideal for generating actionable events such as alerts or
notifications. Azure Table storage: more structured than blob
storage, easier to set up than a SQL database, and durable (in
contrast to an event hub). PowerBI.com: ideal for near real-time
reporting!
9. Time for action! Online feedback on this talk: browse to
itprofeed.azurewebsites.net. (Pipeline: Event hub, Azure Stream
Analytics, PowerBI.com.)
10. Demos 1. Create an Azure Service Bus Event Hub 2. Implement
applications to send data into the Event Hub 3. Create an Azure
Stream Analytics job 4. Link the input 5. Create an output 6. Write
and test a query 7. Start the job
11. Create Azure Event Hub Azure Event Hub is the newest
component in Azure Service Bus. It is typically used to collect
sensor and app data. An event hub collects and temporarily stores
thousands of events per second.
12. Implement application for sending events
13. Create Azure Stream Analytics job Currently only available
in the old Azure portal. Preferably put the job in the same region
as the Event Hub and the data storage.
14. Link the input An event hub does not assume any data
format, but Stream Analytics needs to parse the data. Three data
formats are supported: JSON, CSV and Apache Avro (binary JSON). No
columns are specified.
15. Create an output Five output options: Azure Table or Blob
storage, SQL Database, Event Hub or PowerBI.com. Blob and event hub
outputs do not require predefined metadata; again CSV, JSON and Avro
are supported. When storing information in a SQL Database or Azure
Table storage, we need to create the table in which we will store
the results upfront: metadata is needed upfront.
16. Create Query In the query window we can write two types of
statements. SELECT statements extract a stream of results from one
or more input streams; at least one is required, and a WITH clause
can be used to write more complex constructs or increase
parallelism. CREATE TABLE statements specify type information for
our input stream(s).
17. Simple SELECT statement SELECT &lt;column list&gt; | * FROM
&lt;input&gt; [WHERE &lt;condition&gt;]. This query simply produces a filtered
output stream based on the input stream. In the SELECT statement and
WHERE clause we can use functions such as DATEDIFF, but many
functions from T-SQL are not available: e.g. we can use CAST but not
CONVERT.
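As a sketch, a filtering query of this shape could look as follows; the input name `input` and all column names are hypothetical examples, not part of the talk:

```sql
-- Pass through only high temperature readings.
-- 'input', 'deviceId' and 'temperature' are made-up names.
SELECT
    deviceId,
    temperature,
    CAST(temperature AS bigint) AS roundedTemperature
FROM input
WHERE temperature > 30
```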
18. Testing a query Trial-and-error query development would be
slow: starting a Stream Analytics job takes some minutes, inspecting
the outcome of a job means checking tables or blobs, and we cannot
modify a query while it is running. Luckily, when a job is stopped,
we can run the query on data from a JSON text file and see the
outcome in the browser. There is even a sample input option.
19. Data types Very simple type system: bigint, float,
nvarchar(max) and datetime. Inputs will be cast into one of these
types. We can control these types with a CREATE TABLE statement:
this does not create a table, but just a data type mapping for the
inputs.
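A minimal sketch of such a type mapping, assuming a hypothetical input named `input` with made-up column names:

```sql
-- Declares the types of the incoming fields;
-- no table is actually created.
CREATE TABLE input (
    deviceId nvarchar(max),
    reading float,
    eventTime datetime
);
```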
20. Group by Group by returns data aggregated over a certain
subset of the data. How do we define a subset in a stream? With
windowing functions! Each GROUP BY requires a windowing function.
21. Windowing functions (diagram from MSDN)
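A tumbling-window aggregation sketch, again with hypothetical input and column names:

```sql
-- Count events per device over non-overlapping 10-second windows.
SELECT
    deviceId,
    System.Timestamp AS windowEnd,  -- end time of each window
    COUNT(*) AS eventCount
FROM input
GROUP BY deviceId, TumblingWindow(second, 10)
```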
22. Timestamp by A record can have multiple timestamps
associated with it: e.g. the time a phone call starts, ends, is
submitted to the event hub, is processed by Azure Stream Analytics,
and so on. By default the timestamp used in the temporal SQL queries
is System.Timestamp: the event hub arrival time or the blob's
last-modified date. But we can include an explicit timestamp in the
data we provide; in that case we must follow the FROM in our
temporal query with TIMESTAMP BY.
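As a sketch, with a hypothetical payload field `callStartTime` carrying the application timestamp:

```sql
-- Use the application-supplied timestamp instead of the arrival time.
SELECT callId, COUNT(*) AS callCount
FROM input TIMESTAMP BY callStartTime
GROUP BY callId, TumblingWindow(minute, 1)
```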
23. JOIN We can combine multiple event streams, or an event
stream with reference data, via a join (inner join) or a left outer
join. In the join clause we specify the time window in which we want
the join to take place; we use a special version of DATEDIFF for
this.
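A sketch joining two hypothetical streams (`starts` and `ends`) on a key within a five-second window:

```sql
-- Match each start event with an end event that arrives
-- between 0 and 5 seconds later.
SELECT s.callId, s.startTime, e.endTime
FROM starts s
JOIN ends e
    ON s.callId = e.callId
    AND DATEDIFF(second, s, e) BETWEEN 0 AND 5
```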
24. INTO clause We can have multiple outputs. Without an INTO
clause we write to the destination named "output"; with an INTO
clause we can choose the appropriate destination for every SELECT.
E.g. send all events to blob storage for big data analysis, but send
special events to an event hub for alerting.
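A sketch with two hypothetical output names, `archive` (blob storage) and `alerts` (event hub):

```sql
-- Archive everything to blob storage.
SELECT * INTO archive FROM input

-- Send only critical readings to the alerting event hub.
SELECT deviceId, temperature
INTO alerts
FROM input
WHERE temperature > 90
```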
25. Out of order inputs What if the event from 6:54:32 arrives
after the event from 6:55:55? Trick: buffer the data for n minutes:
all events that arrive less than n minutes late will be processed
(the tolerance window). What do we do with everything that arrives
more than n minutes late? We either skip them (drop) or pretend they
happened just now (adjust).
26. Scaling By default every job consists of 1 streaming unit.
A streaming unit can process up to 1 MB/second. When higher
throughput is needed we can activate up to 6 streaming units per
regular query. If the input is a partitioned event hub, we can write
partitioned queries and partitioned subqueries (WITH clause). A
non-partitioned query with a 3-fold partitioned subquery can use
(1 + 3) * 6 = 24 streaming units!
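A partitioned-subquery sketch, assuming the input event hub is partitioned; `input` and the column names are hypothetical, while PartitionId is the built-in partition key:

```sql
-- Partitioned subquery: each event hub partition is processed in parallel.
WITH PartitionedCounts AS (
    SELECT PartitionId, COUNT(*) AS eventCount
    FROM input PARTITION BY PartitionId
    GROUP BY PartitionId, TumblingWindow(minute, 1)
)
-- Non-partitioned step combines the per-partition results.
SELECT System.Timestamp AS windowEnd, SUM(eventCount) AS totalEvents
FROM PartitionedCounts
GROUP BY TumblingWindow(minute, 1)
```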
27. Pricing Azure Stream Analytics costs 0.55 per streaming
unit per day (roughly 17 per month) plus 0.0008 per GB of
throughput. So, when processing about 10 million events at a maximum
rate of 1 MB/sec, this costs less than 18 a month.
28. Machine Learning Sensor thresholds are not always constant,
but Azure can learn which values preceded issues: Azure Machine
Learning.
29. Summary Azure Stream Analytics is a PaaS version of
StreamInsight. It processes streams of events via temporal queries,
supports multiple input and output formats, and scales to large
volumes of events. Temporal queries are written in a SQL variant.
30. And win a Lumia 635 The feedback form will be sent to you
by email. Give me (more) feedback!
31. Follow TechNet Belgium @technetbelux. Subscribe to the
TechNet newsletter: aka.ms/benews. Be the first to know.