What’s new in the Azure Data Platform

Data Platform Airlift

21 October \\ Microsoft Lisbon Experience

What’s new in the Azure Data Platform

Ricardo Peres

Luis Calado

Azure DocumentDB

Azure Search

Azure Machine Learning Marketplace

Azure SQL Database

Azure Data Lake

Azure Data Factory

Agenda

Headlines

Core Concepts

Resources

Indexes

Querying

Paging

Updating

Transactions

Partition Resolvers

User Defined Functions

Stored Procedures

Triggers

Security

Limits

Search

Best Practices

Headlines

NoSQL database as a service for JSON documents

Schemaless

RESTful

Part of Azure – only available online

Highly scalable

Several bindings (.NET, JavaScript, Python, ...)

Core Concepts

Resources (1 of 3)

Documents that live in DocumentDB

All have a unique addressable URL (_rid or id):

https://{account}.documents.azure.com/dbs/{_rid-db}/colls/{_rid-col}/docs/{_rid-doc}

All live inside a collection

A collection lives inside a database

A database belongs to an account

A collection can take different kinds of documents
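As a rough illustration of the account / database / collection / document hierarchy above, the address of a document can be assembled from its identifiers. `buildDocumentUrl` is a hypothetical helper, not part of any SDK, and the identifiers in the usage line are made up.

```javascript
// Hypothetical helper: fills in the document URL pattern shown above.
function buildDocumentUrl(account, dbRid, collRid, docRid) {
  var path = ['dbs', dbRid, 'colls', collRid, 'docs', docRid].join('/');
  return 'https://' + account + '.documents.azure.com/' + path;
}

// Made-up identifiers, for illustration only
var url = buildDocumentUrl('myaccount', 'db1', 'coll1', 'doc1');
console.log(url);
```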

Resources (2 of 3)

Either POCOs or inherit from Resource

Some built-in properties:

If an id property is not specified, one will be provided (Guid)

Case matters!

Property | User Settable | Purpose
_rid | No | System generated, unique and hierarchical identifier
_etag | No | HTTP etag required for optimistic concurrency control
_ts | No | Last updated timestamp
_self | No | Unique addressable URL
id | Yes | User defined unique name
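A minimal local sketch of the "id is provided when missing" and "case matters" points. It only imitates the behavior described on the slide; `ensureId` and `newGuid` are hypothetical helpers, and the GUID generator is a stand-in based on `Math.random`, not what the service uses.

```javascript
// Stand-in GUID generator for illustration only (not cryptographically strong).
function newGuid() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
    var r = Math.floor(Math.random() * 16);
    var v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Returns a copy of the document that is guaranteed to have an "id" property.
// Only the lowercase "id" counts, because property names are case-sensitive.
function ensureId(doc) {
  if (typeof doc.id === 'string' && doc.id.length > 0) {
    return Object.assign({}, doc);
  }
  return Object.assign({}, doc, { id: newGuid() });
}

var withId = ensureId({ id: 'session-1', title: 'DocumentDB' });
var withoutId = ensureId({ Id: 'session-2', title: 'Azure Search' });
console.log(withId.id);    // keeps the user-defined id
console.log(withoutId.id); // a generated GUID, since "Id" is not "id"
```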

Resources (3 of 3)

Can have attachments:

https://{account}.documents.azure.com/dbs/{_rid-db}/colls/{_rid-col}/docs/{_rid-doc}/attachments/{_id-attch}

Additional properties:

Property | User Settable | Purpose
contentType | Yes | The content type of the attachment
media | Yes | The URL link or file path where the attachment resides

Indexes (1 of 2)

Consistency can be configured per collection:

Consistent: indexes are updated synchronously

Lazy: indexes are updated asynchronously

Indexes (2 of 2)

By default, all paths are indexed; this can be overridden

Three kinds of property indexes:

Hashed: for exact matches

Range: for range comparisons, ordering

Spatial: for geospatial queries

Three kinds of property value indexes (from JSON):

String (precision: 1-100 or -1)

Number (precision: 1-8 or -1)

Point

A collection can have several indexes at once

If a collection does not have an index, it cannot be queried except by id or self link!
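A sketch of what an indexing policy combining these options might look like, written as a JavaScript object. The property names are an assumption based on the JSON indexing policy format of the service at the time, and the excluded path `/notes/*` is invented for the example, so check the shape against the current documentation before relying on it.

```javascript
// Assumed policy shape: lazy index updates, a range index on numbers,
// a hash index on strings, and one subtree left out of the index.
var indexingPolicy = {
  indexingMode: 'lazy', // or 'consistent'
  automatic: true,
  includedPaths: [
    {
      path: '/*',
      indexes: [
        { kind: 'Range', dataType: 'Number', precision: -1 },
        { kind: 'Hash', dataType: 'String', precision: 3 }
      ]
    }
  ],
  excludedPaths: [
    { path: '/notes/*' } // hypothetical path that is never queried
  ]
};

console.log(JSON.stringify(indexingPolicy, null, 2));
```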

Querying – SQL (1 of 3)

Returns JSON

Joins only inside document (collections)

No comparison of different data types (undefined)

Math: +, -, *, /, %

Bitwise: |, &, ^, <<, >>, >>>

Logical: AND, OR, NOT

Comparison: =, !=, <, >, <=, >=, <>

String: ||

Ternary and coalesce: ?, ??

IN, BETWEEN, ORDER BY

Parameterized – no SQL injection
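To illustrate the parameterized form, the sketch below keeps the SQL text free of user input and carries the values separately. The `{ query, parameters: [{ name, value }] }` shape mirrors the query specification accepted by the SDKs as far as I know; `buildQuerySpec` itself is a hypothetical helper.

```javascript
// Builds a query specification: the SQL text only contains named placeholders,
// and the values travel separately in the parameters array.
function buildQuerySpec(sql, values) {
  var parameters = Object.keys(values).map(function (name) {
    return { name: '@' + name, value: values[name] };
  });
  return { query: sql, parameters: parameters };
}

var spec = buildQuerySpec(
  'SELECT * FROM Families f WHERE f.lastName = @lastName AND f.address.state = @state',
  { lastName: 'Andersen" OR true', state: 'WA' } // hostile input stays a plain value
);
console.log(JSON.stringify(spec));
```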

SQL functions:

Math: ABS, CEILING, EXP, FLOOR, LOG, LOG10, POWER, ROUND, SIGN, SQRT, SQUARE, TRUNC, ACOS, ASIN, ATAN, ATN2, COS, COT, DEGREES, PI, RADIANS, SIN, TAN

Type checking: IS_ARRAY, IS_BOOL, IS_NULL, IS_NUMBER, IS_OBJECT, IS_STRING, IS_DEFINED, IS_PRIMITIVE

String: CONCAT, CONTAINS, ENDSWITH, INDEX_OF, LEFT, LENGTH, LOWER, LTRIM, REPLACE, REPLICATE, REVERSE, RIGHT, RTRIM, STARTSWITH, SUBSTRING, UPPER

Array: ARRAY_CONCAT, ARRAY_CONTAINS, ARRAY_LENGTH, ARRAY_SLICE

Spatial: ST_DISTANCE, ST_WITHIN, ST_ISVALID, ST_ISVALIDDETAILED

SQL Ternary and coalesce: ?, ??

SELECT (c.grade < 5)? "elementary": "other" AS gradeLevel

FROM Families.children[0] c

SELECT f.lastName ?? f.surname AS familyName

FROM Families f

Projecting into new JSON objects:

SELECT { "state": f.address.state, "city": f.address.city, "name": f.id }
FROM Families f
WHERE f.id = "AndersenFamily"

Result: [{ "$1": { "state": "WA", "city": "seattle", "name": "AndersenFamily" } }]

Creating arrays:

SELECT [f.address.city, f.address.state] AS CityState
FROM Families f

Result: [ { "CityState": [ "seattle", "WA" ] }, { "CityState": [ "NY", "NY" ] } ]

Returning single values:

SELECT VALUE "Hello World"

Result: [ "Hello World" ]

Querying - LINQ

LINQ functions:

Math: Abs, Acos, Asin, Atan, Ceiling, Cos, Exp, Floor, Log, Log10, Pow, Round, Sign, Sin, Sqrt, Tan, Truncate

String: Concat, Contains, EndsWith, IndexOf, Count, ToLower, TrimStart, Replace, Reverse, TrimEnd, StartsWith, Substring, ToUpper

Array: Concat, Contains, and Count

Spatial: Distance, Within, IsValid, and IsValidDetailed

Paging

Can specify maximum number of items to retrieve

Has more results / get next results

Ordering
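The "has more results / get next results" loop can be sketched against an in-memory stand-in. Everything here is hypothetical: `queryPage` fakes a collection, and the continuation token is simply the next offset, whereas real continuation tokens are opaque strings handed back by the service.

```javascript
// In-memory stand-in for a query: returns one page plus a continuation token.
function queryPage(items, maxItemCount, continuation) {
  var start = continuation ? parseInt(continuation, 10) : 0;
  var next = start + maxItemCount;
  return {
    results: items.slice(start, next),
    continuation: next < items.length ? String(next) : null
  };
}

// Drains all pages: keep asking while a continuation token comes back.
function readAll(items, maxItemCount) {
  var all = [];
  var pages = 0;
  var continuation = null;
  do {
    var response = queryPage(items, maxItemCount, continuation);
    all = all.concat(response.results);
    continuation = response.continuation;
    pages++;
  } while (continuation !== null);
  return { items: all, pages: pages };
}

var topics = ['DocumentDB', 'Search', 'ML Marketplace', 'SQL Database', 'Data Lake', 'Data Factory'];
var outcome = readAll(topics, 4);
console.log(outcome.pages, outcome.items.length);
```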

Updating

Inserts:

From POCO

From Stream

Batching: Document Explorer, Data Migration Tool, Stored Procedures

Replaces: concurrency control from ETags

Deletes: by self link or id
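ETag-based concurrency control on replaces can be pictured with a tiny in-memory store. This is a sketch of the idea only: `createStore` is hypothetical, the ETag is a simple counter, and the 412 status stands for the HTTP "precondition failed" answer given when the ETag no longer matches.

```javascript
// Tiny in-memory store that mimics ETag-based optimistic concurrency.
function createStore() {
  var docs = {};
  var version = 0;
  return {
    upsert: function (doc) {
      version++;
      docs[doc.id] = Object.assign({}, doc, { _etag: String(version) });
      return docs[doc.id];
    },
    read: function (id) {
      return Object.assign({}, docs[id]);
    },
    // Replaces only when the caller's ETag still matches the stored one.
    replace: function (doc, expectedEtag) {
      var current = docs[doc.id];
      if (!current || current._etag !== expectedEtag) {
        return { ok: false, status: 412 };
      }
      return { ok: true, document: this.upsert(doc) };
    }
  };
}

var store = createStore();
store.upsert({ id: 's1', room: 'A' });
var first = store.read('s1');
var second = store.read('s1');
var r1 = store.replace({ id: 's1', room: 'B' }, first._etag);  // succeeds
var r2 = store.replace({ id: 's1', room: 'C' }, second._etag); // stale ETag, rejected
console.log(r1.ok, r2.ok, store.read('s1').room);
```

The second writer has to re-read the document and retry, which is the whole point of the check.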

Transactions

No explicit transactions

Implicit inside triggers and stored procedures – only at collection level

Partition Resolvers

Specified per database

Possibly several

Can decide on which collection a document is to be saved or retrieved from

Included:

HashPartitionResolver: distributes data evenly across collections

RangePartitionResolver<T>: for when there is a “natural” ordering, such as with date and time
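The two resolver styles can be sketched in a few lines. This is not the SDK's implementation: the hash resolver below uses a plain FNV-1a hash with a modulo as a simple stand-in, and all function names and collection links are hypothetical.

```javascript
// FNV-1a string hash, used here only as a simple, deterministic stand-in.
function fnv1a(text) {
  var hash = 0x811c9dc5;
  for (var i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hash-style resolver: spreads partition keys across the given collections.
function createHashResolver(collectionLinks) {
  return function (partitionKey) {
    return collectionLinks[fnv1a(String(partitionKey)) % collectionLinks.length];
  };
}

// Range-style resolver: each range is [min, max] inclusive and maps to one collection.
function createRangeResolver(ranges) {
  return function (partitionKey) {
    for (var i = 0; i < ranges.length; i++) {
      if (partitionKey >= ranges[i].min && partitionKey <= ranges[i].max) {
        return ranges[i].collectionLink;
      }
    }
    throw new Error('No collection covers key ' + partitionKey);
  };
}

var byHash = createHashResolver(['dbs/demo/colls/c0', 'dbs/demo/colls/c1']);
var byYear = createRangeResolver([
  { min: '2014-01-01', max: '2014-12-31', collectionLink: 'dbs/demo/colls/y2014' },
  { min: '2015-01-01', max: '2015-12-31', collectionLink: 'dbs/demo/colls/y2015' }
]);
console.log(byHash('user-42'), byYear('2015-10-21'));
```

The range example relies on ISO dates sorting correctly as strings, which is why date and time make a natural range key.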

User Defined Functions

JavaScript-based

Exist in collections

No side effects

var regexMatchUdf = new UserDefinedFunction {
    Id = "REGEX_MATCH",
    // The body is JavaScript; it returns true when input matches the pattern
    Body = @"function (input, pattern) {
        return input.match(pattern) !== null;
    }"
};

SELECT udf.REGEX_MATCH(s.Id, "ardo") FROM Session s
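Because a UDF body is plain JavaScript with no side effects, its logic can be tried locally before it is registered. The sketch below runs the same expression over a made-up list of names; `regexMatch` and the sample data are for illustration only.

```javascript
// Same logic as the UDF body, as a plain function that can be tested locally.
function regexMatch(input, pattern) {
  return input.match(pattern) !== null;
}

var speakers = ['Ricardo Peres', 'Luis Calado'];
var matches = speakers.filter(function (name) {
  return regexMatch(name, 'ardo');
});
console.log(matches);
```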

Stored Procedures

JavaScript-based

Can do batching

Implicit transactions

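function (gender) {
    var response = getContext().getResponse();
    var collection = getContext().getCollection();
    // Parameterized query: the gender argument never becomes part of the SQL text
    var query = {
        query: 'SELECT * FROM c WHERE c.Gender = @gender',
        parameters: [{ name: '@gender', value: gender }]
    };
    var accepted = collection.queryDocuments(collection.getSelfLink(), query, {},
        function (err, documents, options) {
            if (err) throw err;
            response.setBody(JSON.stringify(documents));
        });
    if (!accepted) throw new Error('The query was not accepted by the server.');
}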

Triggers

JavaScript-based

Two types:Pre trigger

Post trigger

function updateTrigger() {
    var request = getContext().getRequest();
    var doc = request.getBody();
    // Runs before the document is written, so the added property is persisted
    doc['message'] = 'Added by trigger';
    request.setBody(doc);
}
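A pre-trigger like this one can be exercised outside the service by faking the small part of the server-side context it touches. The harness below is a hypothetical local stand-in: only `getContext().getRequest().getBody()` and `setBody()` are imitated, nothing else of the real API.

```javascript
// The pre-trigger under test.
function updateTrigger() {
  var request = getContext().getRequest();
  var doc = request.getBody();
  doc['message'] = 'Added by trigger';
  request.setBody(doc);
}

// Hypothetical harness: installs a fake getContext(), runs the trigger,
// and returns the request body as the trigger left it.
function runPreTrigger(trigger, incomingDocument) {
  var body = incomingDocument;
  var fakeContext = {
    getRequest: function () {
      return {
        getBody: function () { return body; },
        setBody: function (value) { body = value; }
      };
    }
  };
  var previous = globalThis.getContext;
  globalThis.getContext = function () { return fakeContext; };
  try {
    trigger();
  } finally {
    globalThis.getContext = previous; // restore whatever was there before
  }
  return body;
}

var saved = runPreTrigger(updateTrigger, { id: 's1' });
console.log(saved);
```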

Security

Access keys:

Master (single)

Read only (multiple)

Database users – specify use at DocumentClient level

Permissions for users over resources (resource tokens: default expiration is 1h, up to 5h):

Resources:

Collections

Documents

Attachments

Stored procedures

Triggers

User defined functions
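The expiration rule for resource tokens (1 hour by default, 5 hours at most) can be captured in a small helper that an application might use before asking for a token. `resolveTokenLifetime` is hypothetical and only encodes the two figures from the slide.

```javascript
var DEFAULT_TOKEN_SECONDS = 60 * 60;  // 1 hour, the default mentioned above
var MAX_TOKEN_SECONDS = 5 * 60 * 60;  // 5 hours, the maximum mentioned above

// Hypothetical helper: picks the lifetime to request for a resource token.
function resolveTokenLifetime(requestedSeconds) {
  if (requestedSeconds === undefined || requestedSeconds === null) {
    return DEFAULT_TOKEN_SECONDS;
  }
  if (typeof requestedSeconds !== 'number' || !(requestedSeconds > 0)) {
    throw new RangeError('Lifetime must be a positive number of seconds');
  }
  // Round up to whole seconds and never exceed the maximum
  return Math.min(Math.ceil(requestedSeconds), MAX_TOKEN_SECONDS);
}

console.log(resolveTokenLifetime(), resolveTokenLifetime(7200), resolveTokenLifetime(86400));
```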

Limits

Feature | Limit
Maximum Request Units / second / collection | 2500
Maximum execution time for stored procedure and trigger | (not given)
Provisioned document storage / collection | 50 GB
Maximum collections per database account* | 100
Maximum document storage per database (100 collections)* | (not given)
Maximum length of the id property | 255 chars
Maximum request size of document and attachment | 512 KB
Maximum number of JOINs per query* | 5
Number of stored procedures, triggers and UDFs per collection* | (not given)
Number of users per database account | 500,000
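Two of these limits are easy to check on the client before a write is attempted. The sketch assumes that the request size can be approximated by the UTF-8 length of the serialized document, which ignores any protocol overhead; `checkDocument` is a hypothetical helper.

```javascript
var MAX_ID_LENGTH = 255;            // characters, from the limits above
var MAX_REQUEST_BYTES = 512 * 1024; // 512 KB, from the limits above

// Returns a list of problems; an empty list means the document passes both checks.
function checkDocument(doc) {
  var problems = [];
  if (typeof doc.id === 'string' && doc.id.length > MAX_ID_LENGTH) {
    problems.push('id is longer than ' + MAX_ID_LENGTH + ' characters');
  }
  // Approximation: size of the JSON text encoded as UTF-8
  var bytes = new TextEncoder().encode(JSON.stringify(doc)).length;
  if (bytes > MAX_REQUEST_BYTES) {
    problems.push('serialized size of ' + bytes + ' bytes exceeds ' + MAX_REQUEST_BYTES);
  }
  return problems;
}

var small = checkDocument({ id: 's1', title: 'DocumentDB' });
var tooBig = checkDocument({ id: 'x'.repeat(300), payload: 'y'.repeat(600 * 1024) });
console.log(small.length, tooBig.length);
```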

Search

Based on Elasticsearch and Lucene

.NET + REST APIs

Can retrieve data from DocumentDB

Best Practices

Cache the DocumentClient instance

Choose right collection index update policy

Index only properties that will be searchable and with appropriate values – watch out for ranges

Store small documents

Measure and tune request costs

Retrieve only what you need – paging, projections

Cache self links – they never change

Use partition resolvers for distributing burden

Beware throttling!
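For the throttling point, a client typically waits and retries when a request is rejected for exceeding the provisioned request units. The sketch assumes the service answers with status 429 and an `x-ms-retry-after-ms` header, and falls back to a capped exponential backoff when the header is absent; `retryDelayMs` and the response shape are hypothetical.

```javascript
// Decides how long to wait before retrying, or returns null when no retry applies.
function retryDelayMs(response, attempt, maxAttempts) {
  if (response.status !== 429 || attempt >= maxAttempts) {
    return null;
  }
  var headers = response.headers || {};
  var hinted = Number(headers['x-ms-retry-after-ms']);
  if (isFinite(hinted) && hinted >= 0) {
    return hinted; // trust the server's hint when present
  }
  // Fallback: exponential backoff starting at 100 ms, capped at 5 seconds
  return Math.min(100 * Math.pow(2, attempt), 5000);
}

console.log(retryDelayMs({ status: 429, headers: { 'x-ms-retry-after-ms': '250' } }, 0, 5));
console.log(retryDelayMs({ status: 429 }, 3, 5));
console.log(retryDelayMs({ status: 200 }, 0, 5));
```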

Meet the Competition

MongoDB

Open source + support model

No joins

Aggregations

Time to live

Offline deployment

Replication

Eventual consistency

ACID transactions

Map/Reduce

Several programming languages supported

RavenDB

Open source + support model

Joins across documents

Aggregations

Expiry

Offline deployment

Replication

Eventual consistency

ACID transactions

Map/Reduce

.NET, REST

References

Query Playground: https://www.documentdb.com/sql/demo

.NET Azure DocumentDB Samples: https://github.com/Azure/azure-documentdb-net

DocumentDB Studio: https://studiodocumentdb.codeplex.com/

Azure DocumentDB Data Migration Tool: http://www.microsoft.com/en-us/download/details.aspx?id=46436

Pricing: https://azure.microsoft.com/en-us/pricing/details/documentdb/

Connecting DocumentDB with Azure Search using indexers: https://azure.microsoft.com/en-us/documentation/articles/documentdb-search-indexer/

A search-as-a-service solution allowing developers to incorporate great search experiences into applications without managing infrastructure or needing to become search experts.

Type Ahead

Facets

Hit Highlighting

Spelling Mistakes

Geo-Spatial Search

Paging

Sorting & Scoring

New indexers (SQL Database and DocumentDB)

New language support (35 languages including pt-PT)

Index creation in the new Management Portal

New Regions

New APIs for index creation

• Distance

• Intersection

Full Text Search

Secure data with authentication, authorization and encryption

Extended Events

[Diagram: Azure Machine Learning workflow. Labels: Azure Portal (Azure Ops Team); ML Studio (Data Scientist); HDInsight; Azure Storage; Training Set from on-prem; Azure Portal & ML API service (Azure Ops Team); PowerBI/Dashboards; Mobile Apps; Web Apps; ML API service (Developer)]

ML Studio and the Data Scientist

• Access and prepare data

• Create, test and train models

• Collaborate

• One click to stage for production via the API service

Azure Portal & ML API service and the Azure Ops Team

• Create ML Studio workspace

• Assign storage account(s)

• Monitor ML consumption

• See alerts when model is ready

• Deploy models to web service

ML API service and the Developer

• Tested models are available as a URL that can be called from any endpoint

Business users easily access results from anywhere, on any device

[Diagram: Azure Machine Learning components. Labels: Event Hubs; ML Studio; ML API Service; Microsoft Azure Portal; Blob Storage; ML Apps Marketplace; ML Operationalization; ML Studio; ML Algorithms]

[Diagram labels: Observation, Pattern, Theory, Hypothesis]

Descriptive Analytics: What happened?

Diagnostic Analytics: Why did it happen?

Predictive Analytics: What will happen?

Prescriptive Analytics: How can we make it happen?

Top-Down

[Diagram labels: Confirmation, Theory, Hypothesis, Observation]

[Diagram: Implement Data Warehouse. Labels: Physical Design; Development; Reporting & Analytics Development; Install and Tune; Reporting & Analytics Design; Dimension Modelling; ETL Design; Setup Infrastructure; Understand Corporate Strategy; Data sources; BI and analytic; Data warehouse; Gather Requirements; Business Requirements; Technical Requirements]

Ingest: regardless of requirements

Store: in native format without schema definition

Analyze: using analytic engines like Hadoop

[Diagram labels: Interactive queries; Batch queries; Machine Learning; Data warehouse; Real-time analytics; Devices]

Store and analyse data of any kind and size

Develop faster, debug and optimise smarter

Interactively explore patterns in your data

No learning curve—use U-SQL, Spark, Hive, HBase and Storm

Managed and supported with an enterprise-grade SLA

Dynamically scales to match your business priorities

Enterprise-grade security with Azure Active Directory

Built on YARN, designed for the cloud

[Diagram: AZURE DATA LAKE ecosystem. TOOLS: Visual Studio, PowerShell. Other labels, grouped in the original under PLATFORMS, APPLICATIONS and DATA INTEGRATION TOOLS: Azure Data Factory; Azure Stream Analytics*; HDInsight; Azure SQL; AzureML*; Informatica*; Cloudera*; Hortonworks*; RevolutionR*; PowerBI*; 3rd Party; Open Source]

Last Name | First Name | Country | Age | …
Flasko | Mike | Canada | 32
Anand | Subbaraj | USA | 30
Gaurav | Malhotra | USA | 72
… | … | … | …

Last Name | First Name | At risk of churning
Flasko | Mike | Yes
Anand | Subbaraj | No
Gaurav | Malhotra | Yes
… | … | …

[Diagram: Data Sources, Ingest, Transform & Analyze, Publish. Labels: Call Log Files; Customer Table; Customer Churn Table; Customer Call Details; Customers Likely to]
