U-SQL Does SQL (SQLBits 2016)

Michael Rys, Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com



“Top 5” Surprises for SQL Users

• AS is not as: C# keywords and SQL keywords overlap, and making the language case-insensitive would be costly. Better to build capabilities than to tinker with syntax.
• = != ==: remember that the expression language is C#, so equality is written ==.
• null IS NOT NULL: C# nulls are two-valued.
• PROCEDURES but no WHILE
• No UPDATE or MERGE: transform/recook the data instead.
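To make the two-valued null surprise concrete, here is a small Python model (not U-SQL). The helpers `sql_equals`, `csharp_equals`, and `sql_where` are hypothetical: `sql_equals` mimics SQL's three-valued `=` by returning `None` for UNKNOWN, while `csharp_equals` mimics C#'s `==`, where `null == null` is `true`. A filter that keeps only TRUE rows then behaves very differently under the two rules.

```python
from typing import Callable, List, Optional


def sql_equals(a: Optional[object], b: Optional[object]) -> Optional[bool]:
    """Hypothetical model of SQL's three-valued '=': NULL compared to anything is UNKNOWN (None)."""
    if a is None or b is None:
        return None  # UNKNOWN
    return a == b


def csharp_equals(a: Optional[object], b: Optional[object]) -> bool:
    """Hypothetical model of C#'s two-valued '==': null == null is true, null == value is false."""
    return a == b


def sql_where(rows: List[Optional[str]], predicate: Callable[[Optional[str]], Optional[bool]]) -> List[Optional[str]]:
    """A WHERE clause keeps a row only if the predicate is TRUE; UNKNOWN rows are dropped."""
    return [r for r in rows if predicate(r) is True]


rows: List[Optional[str]] = [None, "Contoso", None]
print(len(sql_where(rows, lambda r: sql_equals(r, None))))     # 0: NULL = NULL is UNKNOWN
print(len(sql_where(rows, lambda r: csharp_equals(r, None))))  # 2: null == null is true
```

This is why a U-SQL predicate such as `Customer == null` selects rows with missing values, where the same comparison in T-SQL would select nothing.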


Let’s do some SQL with U-SQL


Transforming Rowsets

@customers =
    SELECT Customer.ToUpper() AS Customer
    FROM @orders
    WHERE Customer.Contains("Contoso");

• Customer.ToUpper() and Customer.Contains("Contoso") are C# expressions.
• Use WHERE for filtering.
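The query applies C# string methods row by row. As a rough Python analogue (not U-SQL), `str.upper()` stands in for `ToUpper()` and the `in` operator stands in for `Contains()`. The `orders` data and the `select_contoso_customers` helper are hypothetical. Like C#'s `String.Contains(string)`, Python's `in` is case-sensitive, so the lowercase "contoso" row is filtered out.

```python
from typing import Dict, List

# Hypothetical sample rows standing in for the @orders rowset.
orders: List[Dict[str, object]] = [
    {"Customer": "Contoso Ltd", "Amount": 120},
    {"Customer": "Fabrikam", "Amount": 80},
    {"Customer": "Contoso Pharma", "Amount": 45},
    {"Customer": "contoso lowercase", "Amount": 10},
]


def select_contoso_customers(rows: List[Dict[str, object]]) -> List[Dict[str, str]]:
    """Model of: SELECT Customer.ToUpper() AS Customer FROM @orders WHERE Customer.Contains("Contoso")."""
    return [
        {"Customer": str(row["Customer"]).upper()}  # projection: C# ToUpper()
        for row in rows
        if "Contoso" in str(row["Customer"])        # filter: C# Contains(), case-sensitive
    ]


customers = select_contoso_customers(orders)
print(customers)  # [{'Customer': 'CONTOSO LTD'}, {'Customer': 'CONTOSO PHARMA'}]
```

Note that the filter tests the original column value, not the upper-cased projection, mirroring how WHERE is evaluated before SELECT.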


Grouping & Aggregation

@rows =
    SELECT Customer, SUM(Amount) AS TotalAmount
    FROM @orders
    GROUP BY Customer;

Many other aggregations are possible. You can define your own aggregator with C#!
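Grouping with a user-defined aggregator can be pictured as one aggregator instance per group that is initialized, fed each row's value, and asked for a final result. The Python sketch below models `SUM` that way; the `SumAggregator` class and `group_by_sum` function are hypothetical, and their init/accumulate/terminate life cycle is only loosely modeled on the pattern U-SQL's C# user-defined aggregators follow.

```python
from typing import Dict, Iterable, List, Tuple


class SumAggregator:
    """Hypothetical aggregator with an init/accumulate/terminate life cycle."""

    def init(self) -> None:
        self.total = 0

    def accumulate(self, value: int) -> None:
        self.total += value

    def terminate(self) -> int:
        return self.total


def group_by_sum(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Model of: SELECT Customer, SUM(Amount) AS TotalAmount FROM @orders GROUP BY Customer."""
    groups: Dict[str, SumAggregator] = {}
    for customer, amount in rows:
        if customer not in groups:  # first row of a new group
            agg = SumAggregator()
            agg.init()
            groups[customer] = agg
        groups[customer].accumulate(amount)
    # Output keeps first-seen group order here; a U-SQL rowset has no guaranteed order.
    return [(customer, agg.terminate()) for customer, agg in groups.items()]


orders = [("Contoso", 100), ("Fabrikam", 50), ("Contoso", 25)]
print(group_by_sum(orders))  # [('Contoso', 125), ('Fabrikam', 50)]
```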


Grouping & Aggregation (2)

@rows =
    SELECT Customer, SUM(Amount) AS TotalAmount
    FROM @orders
    GROUP BY Customer
    HAVING TotalAmount > 1000000;

HAVING filters the output of a GROUP BY.
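The difference between HAVING and WHERE shows up when no single input row passes the test but a group total does. In this Python model (the data and `group_by_sum_having` are hypothetical), no order exceeds 1,000,000 on its own, so a row-level filter returns nothing, while the HAVING-style filter on the aggregated totals keeps Contoso.

```python
from typing import Dict, Iterable, List, Tuple


def group_by_sum_having(rows: Iterable[Tuple[str, int]], threshold: int) -> List[Tuple[str, int]]:
    """Model of: ... GROUP BY Customer HAVING TotalAmount > threshold."""
    totals: Dict[str, int] = {}
    for customer, amount in rows:  # GROUP BY Customer with SUM(Amount)
        totals[customer] = totals.get(customer, 0) + amount
    # HAVING runs on the aggregated groups, not on the input rows.
    return [(customer, total) for customer, total in totals.items() if total > threshold]


orders = [("Contoso", 600_000), ("Contoso", 500_000), ("Fabrikam", 900_000)]
where_style = [row for row in orders if row[1] > 1_000_000]  # row-level filter
print(where_style)                                   # []
print(group_by_sum_having(orders, 1_000_000))        # [('Contoso', 1100000)]
```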


Sorting a Rowset

@customers =
    SELECT *
    FROM @customers
    ORDER BY Amount ASC
    FETCH FIRST 3 ROWS;

SELECT with ORDER BY requires a FETCH FIRST!
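One way to read this rule: a rowset has no guaranteed order, so ORDER BY inside a SELECT only matters when it decides which rows survive, that is, as a top-n. The Python model below (hypothetical `Customer` rows and `fetch_first` helper) computes that top-n with `heapq.nsmallest`, which the Python documentation describes as equivalent to `sorted(iterable, key=key)[:n]`.

```python
import heapq
from typing import List, NamedTuple


class Customer(NamedTuple):
    name: str
    amount: int


def fetch_first(rows: List[Customer], n: int) -> List[Customer]:
    """Model of: SELECT * FROM @customers ORDER BY Amount ASC FETCH FIRST n ROWS."""
    # Same result as sorted(rows, key=...)[:n], without sorting the whole input.
    return heapq.nsmallest(n, rows, key=lambda c: c.amount)


customers = [Customer("A", 40), Customer("B", 10), Customer("C", 30), Customer("D", 20)]
print([c.name for c in fetch_first(customers, 3)])  # ['B', 'D', 'C']
```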


Sorting on OUTPUT

OUTPUT @customers
TO @"/output.tsv"
ORDER BY Amount ASC
USING Outputters.Tsv();
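ORDER BY on OUTPUT controls the row order of the written file, and needs no FETCH. A rough Python analogue follows. It uses the standard `csv` module with a tab delimiter and an in-memory buffer instead of `/output.tsv`. The two-column layout and the `output_tsv_ordered` helper are assumptions, and the sketch does not attempt to reproduce the built-in outputter's quoting or encoding details.

```python
import csv
import io
from typing import List, TextIO, Tuple


def output_tsv_ordered(rows: List[Tuple[str, int]], out: TextIO) -> None:
    """Model of: OUTPUT @customers TO "/output.tsv" ORDER BY Amount ASC USING Outputters.Tsv()."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for name, amount in sorted(rows, key=lambda r: r[1]):  # ORDER BY Amount ASC
        writer.writerow([name, amount])


buffer = io.StringIO()
output_tsv_ordered([("Contoso", 30), ("Fabrikam", 10), ("Litware", 20)], buffer)
print(buffer.getvalue(), end="")
```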


Creating Constant Rowsets in Script

@departments =
    SELECT *
    FROM (VALUES
            (31, "Sales"),
            (33, "Engineering"),
            (34, "Clerical"),
            (35, "Marketing")
         ) AS D(DepID, DepName);


@t = EXTRACT date string, time string, author string, tweet string
     FROM "/Samples/Data/MyTwitterHistory.csv"
     USING Extractors.Csv();

@m = SELECT new ARRAY<string>(tweet.Split(' ').Where(x => x.StartsWith("@"))) AS refs
     FROM @t;

@t = SELECT author, "authored" AS category
     FROM @t
     UNION ALL
     SELECT r.Substring(1) AS r, "referenced" AS category
     FROM @m CROSS APPLY EXPLODE(refs) AS Refs(r);

@res = SELECT author, category, COUNT(*) AS tweetcount
       FROM @t
       GROUP BY author, category;

OUTPUT @res
TO "/Samples/Data/Output/MyTwitterAnalysis.csv"
ORDER BY tweetcount DESC
USING Outputters.Csv();

Figure: CROSS APPLY EXPLODE turns each array in @m(refs), such as [@me, @you] and [@him, @her], into one row per element in Refs(r): @me, @you, @him, @her.
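The whole Twitter script can be traced in a few lines of Python. This is a model, not U-SQL: the `tweets` data and the `extract_refs`, `cross_apply_explode`, and `analyze` helpers are hypothetical. Python's `str.split(" ")` behaves like C#'s `Split(' ')` here, flattening the per-tweet lists plays the role of CROSS APPLY EXPLODE, and a `Counter` over the concatenated lists plays UNION ALL followed by GROUP BY with COUNT(*). A tweet with no mentions yields an empty array and therefore contributes no exploded rows.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical sample data: (author, tweet) pairs standing in for @t.
tweets: List[Tuple[str, str]] = [
    ("alice", "hi @me and @you"),
    ("bob", "ping @him @her"),
    ("alice", "nothing to see"),
]


def extract_refs(tweet: str) -> List[str]:
    """Model of tweet.Split(' ').Where(x => x.StartsWith("@"))."""
    return [word for word in tweet.split(" ") if word.startswith("@")]


def cross_apply_explode(arrays: Iterable[List[str]]) -> List[str]:
    """Model of CROSS APPLY EXPLODE(refs) AS Refs(r): one row per array element."""
    return [r for refs in arrays for r in refs]


def analyze(rows: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    authored = [(author, "authored") for author, _ in rows]
    refs = cross_apply_explode(extract_refs(tweet) for _, tweet in rows)
    referenced = [(r[1:], "referenced") for r in refs]  # r.Substring(1) drops the '@'
    counts = Counter(authored + referenced)  # UNION ALL, then GROUP BY + COUNT(*)
    # ORDER BY tweetcount DESC; ties keep first-seen order in this model only.
    return sorted(((a, c, n) for (a, c), n in counts.items()), key=lambda t: -t[2])


print(analyze(tweets))
```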


U-SQL Joins

Join operators
• INNER JOIN
• LEFT, RIGHT, or FULL OUTER JOIN
• CROSS JOIN
• SEMIJOIN (equivalent to an IN subquery)
• ANTISEMIJOIN (equivalent to a NOT IN subquery)

Notes
• ON clause comparisons must have the simple form rowset.column == rowset.column, or be AND conjunctions of such equality comparisons.
• If a comparand is not a column, wrap it into a column in a previous SELECT.
• If the comparison operation is not ==, put it into the WHERE clause.
• If there is no equality comparison at all, turn the join into a CROSS JOIN.

Reason: the syntax calls out which joins are efficient.
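The Python model below illustrates these rules. SEMIJOIN and ANTISEMIJOIN reduce to set membership, like IN and NOT IN, and return each left row at most once even when the right side has duplicate keys. For the equality-only ON rule, the sketch uses a hash join, one common strategy for equi-joins (the slides do not say which strategy U-SQL picks), and applies a non-equality test such as `o.Amount > l.Limit` afterwards as a WHERE filter. All rowsets, column names, and helper functions here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample rowsets.
orders: List[Tuple[str, int]] = [("Contoso", 100), ("Fabrikam", 40), ("Litware", 70), ("Contoso", 5)]
vips: List[str] = ["Contoso", "Litware", "Litware"]  # duplicate key on purpose
limits: List[Tuple[str, int]] = [("Contoso", 50), ("Litware", 80)]


def semijoin(left: List[Tuple[str, int]], right_keys: List[str]) -> List[Tuple[str, int]]:
    """Model of SEMIJOIN: keep each left row that has at least one match (like IN)."""
    keys = set(right_keys)
    return [row for row in left if row[0] in keys]


def antisemijoin(left: List[Tuple[str, int]], right_keys: List[str]) -> List[Tuple[str, int]]:
    """Model of ANTISEMIJOIN: keep each left row that has no match (like NOT IN)."""
    keys = set(right_keys)
    return [row for row in left if row[0] not in keys]


def hash_join_then_where(left: List[Tuple[str, int]], right: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Equality join on Customer via a hash index; the non-== test runs afterwards as a filter."""
    index: Dict[str, List[int]] = defaultdict(list)
    for customer, limit in right:  # build side
        index[customer].append(limit)
    joined = [
        (customer, amount, limit)
        for customer, amount in left  # probe side: ON o.Customer == l.Customer
        for limit in index.get(customer, [])
    ]
    return [row for row in joined if row[1] > row[2]]  # WHERE o.Amount > l.Limit


print(semijoin(orders, vips))
print(antisemijoin(orders, vips))
print(hash_join_then_where(orders, limits))
```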


U-SQL Analytics

Windowing Expression

Window_Function_Call 'OVER' '('
    [ Over_Partition_By_Clause ]
    [ Order_By_Clause ]
    [ Row_Clause ]
')'.

Window_Function_Call :=
    Aggregate_Function_Call
  | Analytic_Function_Call
  | Ranking_Function_Call.

Windowing Aggregate Functions

ANY_VALUE, AVG, COUNT, MAX, MIN, SUM, STDEV, STDEVP, VAR, VARP

Analytics Functions

CUME_DIST, FIRST_VALUE, LAST_VALUE, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK; soon: LEAD/LAG

Ranking Functions

DENSE_RANK, NTILE, RANK, ROW_NUMBER
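To see how the ranking functions differ on ties, the Python model below computes ROW_NUMBER, RANK, and DENSE_RANK over a partition clause and an order clause. The data and the `rank_within_partitions` helper are hypothetical. The rows are sorted by the partition key and then by the window order, and numbering restarts in each partition. In SQL, the relative ROW_NUMBER of tied rows is not deterministic; this model happens to keep input order.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical rows: (Region, Customer, Amount)
rows: List[Tuple[str, str, int]] = [
    ("East", "A", 100),
    ("East", "B", 300),
    ("East", "C", 300),
    ("West", "D", 50),
]


def rank_within_partitions(data: List[Tuple[str, str, int]]) -> List[Tuple[str, str, int, int, int, int]]:
    """Model of ROW_NUMBER(), RANK() and DENSE_RANK()
    OVER (PARTITION BY Region ORDER BY Amount DESC)."""
    result = []
    ordered = sorted(data, key=lambda r: (r[0], -r[2]))  # partition key, then window order
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        rank = dense_rank = 0
        prev_amount = None
        for row_number, (region, customer, amount) in enumerate(partition, start=1):
            if amount != prev_amount:  # a new distinct ORDER BY value
                rank = row_number      # RANK skips positions after ties
                dense_rank += 1        # DENSE_RANK does not
                prev_amount = amount
            result.append((region, customer, amount, row_number, rank, dense_rank))
    return result


for r in rank_within_partitions(rows):
    print(r)
```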


Inserting New Data


INSERT
• INSERT constant values
• INSERT from queries
• Multiple INSERTs

INSERT constant values

INSERT INTO T VALUES (1, "text", new SQL.MAP<string,string>("key","value"));

INSERT from queries

INSERT INTO T SELECT col1, col2, col3 FROM @rowset;

Multiple INSERTs into the same table
• Are supported
• Generate a separate file per insert in physical storage, which can lead to performance degradation
• Recommendations:
  • Try to avoid small inserts
  • Rebuild the table after frequent insertions with: ALTER TABLE T REBUILD;
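The file-per-insert behavior, and why a rebuild helps, can be pictured with a toy Python model. The `ToyTable` class is hypothetical and deliberately ignores how U-SQL actually lays out table storage. It only shows that each INSERT adds one file, that every scan must touch all files, and that a rebuild, in the spirit of `ALTER TABLE T REBUILD`, keeps the same rows in fewer files.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Row = Tuple[int, str]


@dataclass
class ToyTable:
    """Hypothetical toy model: each INSERT adds one file; rebuild compacts them into one."""
    files: List[List[Row]] = field(default_factory=list)

    def insert(self, rows: List[Row]) -> None:
        self.files.append(list(rows))  # one new file per INSERT statement

    def scan(self) -> List[Row]:
        return [row for f in self.files for row in f]  # a read touches every file

    def rebuild(self) -> None:
        """Analogue of ALTER TABLE T REBUILD: same rows, fewer files."""
        merged = self.scan()
        self.files = [merged] if merged else []


t = ToyTable()
for i in range(5):
    t.insert([(i, f"row {i}")])  # five small inserts -> five files
print(len(t.files))  # 5
t.rebuild()
print(len(t.files))  # 1
```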


http://aka.ms/AzureDataLake