Upload
adam-book
View
82
Download
1
Embed Size (px)
Citation preview
Redshift WLM Work Load Management
Sponsors
Find me on LinkedIn
AWS Certifications
Presented by Adam Book
Redshift Use Cases
Enterprise Data Warehouse
• Reduce costs by extending DW rather then extending HW
• Migrate completely from existing DW systems
• Respond Faster to business
Companies with Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
SaaS Companies
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude
Amazon Redshift System Architecture
Amazon Redshift Node Types
3.7 vCPU ECU Mem Storage I/O
dc1.large 2 7 15 0.16 TB SSD 0.20 GB /s
dc1.8xlarge 32 104 244 2.56 TB SSD 3.70 GB/s
vCPU ECU Mem Storage I/O
ds2.xlarge 4 14 31 2TB HDD 0.40 GB /s
ds2.8xlarge 36 116 244 16 TB HDD 3.30 GB /s
Dense Compute
Dense Storage
Columnar Storage
SSN Name Age City ST
101259797 SMITH 88 JUNO AL
892375862 CHIN 37 POMONA CA
318370701 HANDU 12 CHICAGO IL
Columnar storage for database storage tables is an important factor in optimizing analytic query performance because it drastically reduces the overall disk I/O requirements and reduces the amount of data you need to load from disk
101259797 SMITH 88 JUNO AL 892375862 CHIN 37 POMONA CA 318370701 HANDU 12 CHICAGO IL
Block 1 Block 2 Block 3
What is WLM?
WLM (Work Load Management) is a feature for managing queues while running queries on Amazon Redshift.
Steps to WLM ?
• Defining Query Queues• Modifying the WLM Configuration • WLM Queue Assignment Rules• Assigning Queries to Queues• WLM Dynamic and Static Configuration
Properties• WLM System Tables and Views
Tuned Queries still running S-L-O-W?
Image by http://www.pexels.com/
The Query Submission
• Leader Node– Parses query and creates logical tree– Query rewritten for optimization– Compiled to C++– Submitted to Compute Nodes
Managing workloads
• How many queries can run concurrently in each queue• How much memory is allocated among the queues• How queries are routed to queues, based on criteria such as the
user who is running the query or the query label • Query timeout settings for a queue
WLM Queue Assignment Rules
When a user runs a query, WLM assigns the query to the first matching que based on the rules.
Redshift Queues
• Super User queue: For Administrators (1 concurrent query)
• Worker queue: For things like bi queries (5 concurrent queries)
• Processing queue: for background processing
WLM Queues
Queue Concurrency User Groups Query Groups
Superuser 1 superuser
1 4 UG1
2 4 QGB
3 4 UG3 QGC
Default 4
The following table shows a WLM configuration with the Superuser queue and four user-defined queues.
Creating Queues(WLM configuration)
• Add a new queue in the WLM configuration – Set the concurrently level – Add User group name – Allocate Memory Percentage– Set the Timeout in milliseconds (optional)
WLM Dynamic Memory Allocation
The workload manager uses the following process to manage the transition.
1. WLM recalculates the memory allocation for each new query slot.2. If a query slot is not actively being used by a running query, WLM removes the slot, which makes that
memory available for new slots.3. If a query slot is actively in use, WLM waits for the query to finish.4. As active queries complete, the empty slots are removed and the associated memory is freed.5. As enough memory becomes available to add one or more slots, new slots are added.6. When all queries that were running at the time of the change finish, the slot count equals the new
concurrency level, and the transition to the new WLM configuration is complete.
WLM Settings tips:
One of the cases where you can gain the most from setting WLM is when you have time-consuming queries filling your queue. Without WLM, such time-consuming queries will clog your queue. Even the queries that run fast will take longer if it has to wait longer if it has to wait for these slower queries to finish running.
Separate out time-consuming queries
WLM Settings tips:
Since the resource of each node is shared among concurrent processes, increasing the concurrency level total too high will limit the overall throughput. In the AWS development guide, they recommend setting the concurrency level total (the total of the concurrency levels for all queues) to 15 or lower.
Set the total concurrency levels to 15 or lower
WLM Settings tips:
With many queues, the amount of allocated memory for each queue becomes smaller. On top of that, there is the recommended limit of a total of 15 concurrency levels, so it becomes difficult ot maximize Redshift performance if you have too many queues.
Avoid adding too many queues
WLM Settings tips:
Redshift has reserved a “superuser” queue for running commands related to the system, troubleshooting, or for manual operations in emergency situations. So, you won’t need to set another queue for such purposes.
Use superuser for troubleshooting
set query_group to ‘superuser’; analyze; vacuum;reset query_group;
Query: Existing WLM config
SELECT service_class, num_query_tasks, evictable, eviction_threshold, name FROM stv_wlm_service_class_configWHERE service_class > 5;
Run the following query to check the existing WLM configuration.
Service_class | num_query_tasks| evictable|eviction_threshold| name-------------------------------------------------------------------------------- 6| 8| false| 0| Service class #1 7| 6| true| 125163520| Service class #2 8| 10| true| 300000000| Service class #3 9| 1| false| 0| Service class #4
Query: After last restart
SELECT service_class, num_executing_queries, num_executed_queries FROM stv_wlm_service_class_stateWHERE service_class > 5ORDER BY service_class;
Run the following query after the last cluster restart to show the number or queries that have either executed or have completed executing.
Service_class | num_executing_queries| num_executed_queries-------------------------------------------------------------------------------- 6| 2| 7023 7| 1| 2057 8| 1| 2970 9| 0| 0
Quick Redshift Tuning Tips
• Number of nodes, processors or slices• Node Types• Data Distribution• Data Sort Order• Dataset size• Concurrent operations• Query structure • Code compilation
Factors Affecting Query Performance:
Compression EncodingsEncoding Type Keyword in CREATE TABLE and
ALTER TABLEData Types
Raw (no compression)
RAW All
Byte Dictionary BYTEDICT All except Boolean
Delta Delta
Delta32K
SMALLINT, INT, BIGINT, DATE, TIMESTAMP, DECIMAL
INT, BIGINT, DATE, TIMESTAMP, DECIMALLZO LZO All except BOOLEAN, REAL, and DOUBLE
Precision
Run-length RUNLENGTH ALL
Compression EncodingsEncoding Type Keyword in CREATE TABLE and
ALTER TABLEData Types
Mostlyn MOSTLY8
MOSTLY16
MOSTLY32
SMALLINT, INT, BIGINT, DECIMAL
INT, BIGINT, DECIMAL
BIGINT, DECIMALText TEXT255
TEXT32K
VARCHAR only
VARCHAR only
Zstandard ZSTD All
Questions?
Image by http://www.gratisography.com/
Interested in SponsoringAWS Atlanta?
Image by http://www.gratisography.com/