Michael Rys, Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Does SQL
“Top 5” Surprises for SQL Users
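• AS is not as
  • U-SQL keywords are case-sensitive and must be uppercase
  • C# keywords and SQL keywords overlap
  • Making the language case-insensitive would be costly, so it is better to build capabilities than to tinker with syntax
• = != ==
  • Remember: expressions use the C# expression language, so comparisons use ==
• null IS NOT NULL
  • C# nulls are two-valued: a comparison involving null is true or false, never SQL's UNKNOWN (for example, null == null is true)
• PROCEDURES but no WHILE
• No UPDATE or MERGE
  • Transform/recook the data instead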
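To make the null surprise concrete, here is a small Python model (not U-SQL) contrasting SQL's three-valued comparison with C#'s two-valued comparison of nullable values. Python's None stands in for null; the function names are hypothetical.

```python
from typing import List, Optional


def sql_equals(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL-style '=': any comparison with NULL yields UNKNOWN (modeled as None)."""
    if a is None or b is None:
        return None
    return a == b


def csharp_equals(a: Optional[int], b: Optional[int]) -> bool:
    """C#-style '==' on nullable values: always true or false; null == null is true."""
    return a == b


def csharp_less_than(a: Optional[int], b: Optional[int]) -> bool:
    """C# lifted '<': false whenever either operand is null."""
    if a is None or b is None:
        return False
    return a < b


rows: List[Optional[int]] = [1, None, 3]

# SQL: WHERE x = NULL keeps nothing, because UNKNOWN is not TRUE.
sql_kept = [x for x in rows if sql_equals(x, None) is True]

# U-SQL: WHERE x == null keeps the null row, because the C# comparison is simply true.
usql_kept = [x for x in rows if csharp_equals(x, None)]
```

In U-SQL you therefore test for null with `== null` directly; there is no UNKNOWN to propagate.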
Let’s do some SQL with U-SQL
Transforming Rowsets
@customers = SELECT Customer.ToUpper() AS Customer FROM @orders WHERE Customer.Contains("Contoso");
• Customer.ToUpper() and Customer.Contains("Contoso") are C# expressions
• Use WHERE for filtering
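A Python model of what this statement computes, assuming @orders holds rows with a Customer column (the sample rows are made up). WHERE is applied to the input rows first, then SELECT projects the transformed column. Python's `in` mirrors C# `string.Contains(string)` as a case-sensitive substring test, and `str.upper()` matches `ToUpper()` for ASCII data.

```python
from typing import Dict, List

orders: List[Dict[str, object]] = [
    {"Customer": "Contoso Ltd", "Amount": 100},
    {"Customer": "Fabrikam", "Amount": 250},
    {"Customer": "Contoso East", "Amount": 75},
]


def transform(rows: List[Dict[str, object]]) -> List[Dict[str, str]]:
    """Model of: SELECT Customer.ToUpper() AS Customer FROM rows WHERE Customer.Contains("Contoso")."""
    return [
        {"Customer": str(r["Customer"]).upper()}   # SELECT ... AS Customer
        for r in rows
        if "Contoso" in str(r["Customer"])         # WHERE, evaluated on the input column
    ]


customers = transform(orders)
```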
Grouping & Aggregation
@rows = SELECT Customer, SUM(Amount) AS TotalAmount FROM @orders GROUP BY Customer;
• Many other aggregations are possible. You can define your own aggregator with C#!
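A sketch of GROUP BY as "one aggregator instance per group". U-SQL's C# user-defined aggregators follow an Init/Accumulate/Terminate shape; the Python classes below loosely mirror that shape and are hypothetical, not the actual C# interface. `SumAggregate` stands in for the built-in SUM, and `ConcatAggregate` is an invented example of a custom aggregator.

```python
from typing import Any, Callable, Dict, List


class SumAggregate:
    """Stand-in for the built-in SUM."""

    def init(self) -> None:
        self.total = 0

    def accumulate(self, value: int) -> None:
        self.total += value

    def terminate(self) -> int:
        return self.total


class ConcatAggregate:
    """Hypothetical user-defined aggregator: joins the group's values with ';'."""

    def init(self) -> None:
        self.parts: List[str] = []

    def accumulate(self, value: Any) -> None:
        self.parts.append(str(value))

    def terminate(self) -> str:
        return ";".join(self.parts)


def group_by(rows: List[Dict[str, Any]], key: str, value: str,
             make_aggregate: Callable[[], Any]) -> Dict[Any, Any]:
    """Model of: SELECT key, AGG(value) FROM rows GROUP BY key."""
    groups: Dict[Any, Any] = {}
    for row in rows:
        k = row[key]
        if k not in groups:
            agg = make_aggregate()   # one aggregator per distinct key
            agg.init()
            groups[k] = agg
        groups[k].accumulate(row[value])
    return {k: agg.terminate() for k, agg in groups.items()}


orders = [
    {"Customer": "Contoso", "Amount": 100},
    {"Customer": "Fabrikam", "Amount": 250},
    {"Customer": "Contoso", "Amount": 75},
]
```

Swapping `SumAggregate` for `ConcatAggregate` changes the aggregation without touching the grouping logic, which is the point of pluggable aggregators.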
Grouping & Aggregation (2)
@rows = SELECT Customer, SUM(Amount) AS TotalAmount FROM @orders GROUP BY Customer HAVING TotalAmount > 1000000;
• HAVING filters the output of a GROUP BY
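The difference between HAVING and WHERE is when the filter runs. This Python model (sample data and function names are illustrative) filters groups after aggregation for HAVING, and individual rows before aggregation for WHERE, which can give different answers for the same threshold.

```python
from collections import defaultdict
from typing import Dict, List


def group_sum(rows: List[Dict]) -> Dict[str, int]:
    """GROUP BY Customer with SUM(Amount) AS TotalAmount."""
    totals: Dict[str, int] = defaultdict(int)
    for r in rows:
        totals[r["Customer"]] += r["Amount"]
    return dict(totals)


def having(rows: List[Dict], threshold: int) -> Dict[str, int]:
    """HAVING TotalAmount > threshold: filter the groups produced by GROUP BY."""
    return {c: t for c, t in group_sum(rows).items() if t > threshold}


def where_then_group(rows: List[Dict], threshold: int) -> Dict[str, int]:
    """WHERE Amount > threshold: filter individual rows before they are grouped."""
    return group_sum([r for r in rows if r["Amount"] > threshold])


orders = [
    {"Customer": "A", "Amount": 600000},
    {"Customer": "A", "Amount": 500000},
    {"Customer": "B", "Amount": 1200000},
    {"Customer": "C", "Amount": 900000},
]
```

Customer A qualifies under HAVING (its total is 1,100,000) but not under WHERE, because neither of its individual orders exceeds 1,000,000.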
Sorting a rowset
@customers = SELECT * FROM @customers ORDER BY Amount ASC FETCH FIRST 3 ROWS;
• SELECT with ORDER BY requires a FETCH FIRST!
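In a SELECT, ordering presumably only matters because it decides which rows FETCH FIRST keeps; the resulting rowset itself is not guaranteed to stay ordered. A minimal Python model of `ORDER BY Amount ASC FETCH FIRST 3 ROWS` (the helper name and data are hypothetical):

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def order_fetch_first(rows: List[T], key: Callable[[T], object], n: int) -> List[T]:
    """Model of ORDER BY <key> ASC FETCH FIRST n ROWS: sort ascending, keep the first n rows."""
    return sorted(rows, key=key)[:n]


customers = [("Contoso", 50), ("Fabrikam", 10), ("Litware", 40), ("Northwind", 30), ("Tailspin", 20)]
top3 = order_fetch_first(customers, key=lambda c: c[1], n=3)
```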
Sorting on OUTPUT
OUTPUT @customers TO @"/output.tsv" ORDER BY Amount ASC USING Outputters.Tsv();
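When the goal is a sorted file rather than a top-N rowset, ORDER BY belongs on OUTPUT. The Python sketch below models that by writing tab-separated lines in sort order to a string; it is an illustration only and does not model the built-in outputter's header, quoting, or encoding options. The function name is hypothetical.

```python
import csv
import io
from typing import Callable, Iterable, Sequence


def write_tsv_sorted(rows: Iterable[Sequence], key: Callable) -> str:
    """Model of OUTPUT ... ORDER BY <key> ASC USING a TSV outputter: rows are written in sort order."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in sorted(rows, key=key):
        writer.writerow(row)
    return buf.getvalue()


tsv = write_tsv_sorted([("Contoso", 30), ("Fabrikam", 10)], key=lambda r: r[1])
```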
Creating Constant Rowsets in Script
@departments = SELECT * FROM (VALUES (31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing")) AS D(DepID, DepName);
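The `AS D(DepID, DepName)` alias is what gives the anonymous VALUES tuples their column names. A Python model of the same constant rowset, using a namedtuple to play the role of the column list:

```python
from collections import namedtuple

# AS D(DepID, DepName): name the columns of the VALUES rows.
D = namedtuple("D", ["DepID", "DepName"])

values = [(31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing")]
departments = [D(*v) for v in values]

# Columns can now be referenced by name, as in a later SELECT over @departments.
sales_ids = [d.DepID for d in departments if d.DepName == "Sales"]
```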
@t = EXTRACT date string, time string, author string, tweet string FROM "/Samples/Data/MyTwitterHistory.csv" USING Extractors.Csv();
@m = SELECT new SQL.ARRAY<string>(tweet.Split(' ').Where(x => x.StartsWith("@"))) AS refs FROM @t;
@t = SELECT author, "authored" AS category FROM @t UNION ALL SELECT r.Substring(1) AS r, "referenced" AS category FROM @m CROSS APPLY EXPLODE(refs) AS Refs(r);
@res = SELECT author, category, COUNT( * ) AS tweetcount FROM @t GROUP BY author, category;
OUTPUT @res TO "/Samples/Data/Output/MyTwitterAnalysis.csv" ORDER BY tweetcount DESC USING Outputters.Csv();
[Figure: CROSS APPLY EXPLODE. @m(refs) holds one array per row, e.g. (@me, @you) and (@him, @her); exploding it yields one row per array element in Refs(r): @me, @you, @him, @her.]
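The whole tweet script can be traced in a few lines of Python. This is a model of the query's logic under simplifying assumptions: input is already parsed into (author, tweet) pairs (the CSV extraction and the date/time columns are not modeled), and the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def tweet_analysis(tweets: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """tweets: list of (author, tweet) pairs."""
    # @m: for each tweet, the array of words starting with "@" (tweet.Split(' ').Where(...)).
    m = [[w for w in tweet.split(" ") if w.startswith("@")] for _, tweet in tweets]
    # First half of the UNION ALL: one "authored" row per tweet.
    rows = [(author, "authored") for author, _ in tweets]
    # Second half: CROSS APPLY EXPLODE(refs) yields one row per array element;
    # r.Substring(1) drops the leading "@".
    rows += [(r[1:], "referenced") for refs in m for r in refs]
    # GROUP BY author, category with COUNT(*) AS tweetcount.
    counts = Counter(rows)
    # ORDER BY tweetcount DESC (ties keep first-seen order here; U-SQL makes no such promise).
    return sorted(((a, c, n) for (a, c), n in counts.items()), key=lambda t: t[2], reverse=True)


result = tweet_analysis([
    ("mrys", "hi @you and @me"),
    ("usql", "cc @you"),
    ("mrys", "no mentions"),
])
```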
U-SQL Joins
Join operators
• INNER JOIN
• LEFT, RIGHT, or FULL OUTER JOIN
• CROSS JOIN
• SEMIJOIN
  • Equivalent to an IN subquery
• ANTISEMIJOIN
  • Equivalent to a NOT IN subquery
Notes
• ON clause comparisons must have the simple form rowset.column == rowset.column, or be AND conjunctions of such simple equality comparisons
• If a comparand is not a column, wrap it into a column in a previous SELECT
• If the comparison operation is not ==, put it into the WHERE clause
• Turn the join into a CROSS JOIN if there is no equality comparison
Reason: the syntax makes explicit which joins are efficient
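Why equality joins are singled out: an equality condition lets the engine match rows by key, for example with a hash table, instead of comparing every pair. The Python sketch below models an equi-join as a hash join (one possible strategy, not necessarily what the U-SQL optimizer picks), applies a non-equality condition afterwards as a WHERE filter, and models SEMIJOIN/ANTISEMIJOIN as key-membership tests. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def hash_equijoin(left: List, right: List, lkey: Callable, rkey: Callable) -> List[Tuple]:
    """INNER JOIN ... ON l.k == r.k: build a hash index on the right input, probe it with the left."""
    index: Dict = defaultdict(list)
    for r in right:
        index[rkey(r)].append(r)
    return [(l, r) for l in left for r in index.get(lkey(l), [])]


def semijoin(left: List, right: List, lkey: Callable, rkey: Callable) -> List:
    """SEMIJOIN: left rows whose key appears in right (like IN); each left row at most once."""
    keys = {rkey(r) for r in right}
    return [l for l in left if lkey(l) in keys]


def antisemijoin(left: List, right: List, lkey: Callable, rkey: Callable) -> List:
    """ANTISEMIJOIN: left rows whose key does not appear in right (like NOT IN)."""
    keys = {rkey(r) for r in right}
    return [l for l in left if lkey(l) not in keys]


orders = [("c1", 100), ("c2", 50), ("c3", 70)]   # (customer, amount)
limits = [("c1", 80), ("c2", 60)]                # (customer, limit)

# The equality goes in ON; the ">" comparison goes in WHERE, applied to the joined pairs.
pairs = hash_equijoin(orders, limits, lambda o: o[0], lambda l: l[0])
over_limit = [(o, l) for o, l in pairs if o[1] > l[1]]
```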
U-SQL Analytics
Windowing Expression
Window_Function_Call 'OVER' '(' [ Over_Partition_By_Clause ] [ Order_By_Clause ] [ Row_Clause ] ')'.
Window_Function_Call := Aggregate_Function_Call
                      | Analytic_Function_Call
                      | Ranking_Function_Call.
Windowing Aggregate Functions
ANY_VALUE, AVG, COUNT, MAX, MIN, SUM, STDEV, STDEVP, VAR, VARP
Analytics Functions
CUME_DIST, FIRST_VALUE, LAST_VALUE, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK; soon: LEAD/LAG
Ranking Functions
DENSE_RANK, NTILE, RANK, ROW_NUMBER
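The ranking functions differ only in how they treat ties within a partition. A Python model of `ROW_NUMBER()`, `RANK()` and `DENSE_RANK()` `OVER (PARTITION BY p ORDER BY o ASC)`; the function name and sample data are hypothetical, and the order of tied rows here follows Python's stable sort, which U-SQL does not promise.

```python
from itertools import groupby
from typing import Callable, List, Tuple


def ranked(rows: List, partition_key: Callable, order_key: Callable) -> List[Tuple]:
    """Returns (row, row_number, rank, dense_rank) for every row."""
    out = []
    # Sort by partition, then by the ORDER BY key, so each partition is contiguous.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    for _, part in groupby(ordered, key=partition_key):
        prev_key = None
        rank = dense_rank = 0
        for row_number, row in enumerate(part, start=1):   # ROW_NUMBER restarts per partition
            k = order_key(row)
            if row_number == 1 or k != prev_key:
                rank = row_number      # RANK skips numbers after a tie
                dense_rank += 1        # DENSE_RANK does not skip
                prev_key = k
            out.append((row, row_number, rank, dense_rank))
    return out


rows = [("A", 20), ("B", 5), ("A", 10), ("A", 30), ("A", 20)]
result = ranked(rows, partition_key=lambda r: r[0], order_key=lambda r: r[1])
```

For the tie on ("A", 20), both rows get RANK 2 and DENSE_RANK 2; the next row gets RANK 4 but DENSE_RANK 3.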
Inserting New Data
INSERT
• INSERT constant values
• INSERT from queries
• Multiple INSERTs
INSERT constant values
INSERT INTO T VALUES (1, "text", new SQL.MAP<string,string>("key","value"));
INSERT from queries
INSERT INTO T SELECT col1, col2, col3 FROM @rowset;
Multiple INSERTs into the same table
• Are supported
• Generate a separate file per insert in physical storage, which can lead to performance degradation
• Recommendations:
  • Try to avoid small inserts
  • Rebuild the table after frequent insertions with: ALTER TABLE T REBUILD;
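A toy Python model of the small-files effect described above: each INSERT appends one "file", so many small inserts mean many files to open on every scan, and a rebuild merges them into one without changing the table's contents. This is an illustration under simplifying assumptions; it does not reflect the actual physical layout of U-SQL tables (partitioning, distribution, file format), and `ToyTable` is a hypothetical name.

```python
from typing import List, Tuple


class ToyTable:
    """Toy model: each insert adds one file; rebuild compacts all files into one."""

    def __init__(self) -> None:
        self.files: List[List[Tuple]] = []

    def insert(self, rows: List[Tuple]) -> None:
        # INSERT: a separate file per statement, however few rows it holds.
        self.files.append(list(rows))

    def rebuild(self) -> None:
        # ALTER TABLE ... REBUILD (modeled): merge all files into a single file.
        merged = [row for f in self.files for row in f]
        self.files = [merged] if merged else []

    def scan(self) -> List[Tuple]:
        # A scan has to touch every file; fewer files means less per-file overhead.
        return [row for f in self.files for row in f]


t = ToyTable()
for i, text in enumerate(["a", "b", "c"]):
    t.insert([(i, text)])
```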
Additional Resources
Documentation: U-SQL Reference Doc: http://aka.ms/usql_reference
Sample Projects:
https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/2-Ambulance-Structured%20Data
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis
http://aka.ms/AzureDataLake