28
January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction) The Baker’s Dozen Business Intelligence Webcast Radio 13 SQL Server -Business Intelligence Productivity Tips Kevin S. Goff Microsoft SQL Server MVP BDBI Webcast Radio

January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

  • Upload
    maisie

  • View
    48

  • Download
    3

Embed Size (px)

DESCRIPTION

BDBI Webcast Radio. The B aker’s D ozen B usiness I ntelligence Webcast Radio 13 SQL Server -Business Intelligence Productivity Tips Kevin S. Goff Microsoft SQL Server MVP . January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction). BDBI Radio. - PowerPoint PPT Presentation

Citation preview

Page 1: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

January 15, 201413 Tips for using SQL Server Change Data Capture

(Introduction)

The Baker’s Dozen Business Intelligence Webcast Radio

13 SQL Server -Business Intelligence Productivity Tips

Kevin S. GoffMicrosoft SQL Server MVP

BDBI Webcast Radio

Page 2: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 2

BDBI Radio• Produced and Hosted by Kevin S. Goff, Microsoft SQL

Server MVP and author of CoDe Magazine“Baker’s Dozen” Productivity Series from 2004-2013– http://www.codemag.com/People/Bio/Kevin.Goff

• Pre-recorded webcast – Available twice a month• Site: http://www.BDBIRadio.com• My overall blog/site: http://www.KevinSGoff.net• Program format:

– 13 quick tips on different SQL/BI/SharePoint topics– In any week, tips might cover one topic (13 features in PowerPivot,

13 SQL Interview topics), or 13 random tips on anything from data warehousing to MDX/DAX programming, to maybe even a few .NET tips for scenarios when the .NET and BI worlds collide

– (Email me at [email protected] if you have questions)

01/15/2014

Page 3: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 3

• Today: 13 tips for getting started with SQL Server Change Data Capture

• Change Data Capture (CDC) was a new feature added to SQL Server 2008 Enterprise

• A major feature added to the database engine to support “audit trail” capabilities

• This is an introductory session, first of a two-part series• Next broadcast I’ll have a more advanced session

01/15/2014

BDBI Radio

Page 4: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 4

1. Change Data Capture as a concept and why it’s important 2. History of capturing changes to data in SQL Server3. How SQL Server CDC works and how it differs from past solutions4. Configuring a database for CDC5. Configuring tables for CDC6. A simple CDC Demo with a table and some inserts, updates, deletes7. Viewing the CDC Log tables8. Querying the log tables9. CDC Latency10. Issue with CDC – capturing “who” made the change11. Handling DDL Changes12. Turning off SQL Server Agent13. Limitations of Change Data Capture

01/15/2014

BDBI Radio – Topics for today

Page 5: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 5

• For years, database developers/DBAs have looked for best way to…– Capture Changes that occur to data – Write out the changes to log (audit trail) tables

• Log (Audit Trail) tables usually contain three pieces of information:– Who changed data – When was it changed– What was the change (i.e. Discount % changed from 5% to 7%)

• In some industries, this functionality is absolutely critical• Need to know full history of changes – a customer profile might change

7 times in a year. Want to know each change, who made it, and when• Used in data warehousing systems – to feed a data warehouse with

“what was added and what was changed” since the previous load• Challenge is capturing EVERY change – whether made by the

application, by stored procedures, by import jobs, or manual updates in a query window in Management Studio

1 – Why Change Data Capture is important

01/15/2014

Page 6: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 6

• Just to illustrate that last point – on the challenge of capturing EVERY change – whether made by the application, by stored procedures, by import jobs, or manual updates in a query window in Management Studio

1 – Why Change Data Capture is important

01/15/2014

Need mechanism that will capture what was changed, regardless of the

original process

Some solutions are more robust and more effective than others

Page 7: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 7

2 – History of capturing changes

01/15/2014

• Prior to SQL Server 2008, here were the solutions available:• Database Triggers:

– PRO: They are “tried and true” and they work, regardless of update scenario– CONs:

• Often requires large amount of code (some will generate code via scripts & metadata)• In some instances, they can adversely affect performance

• Use of OUTPUT INTO statement added in SQL 2005– PRO: Simpler than triggers – can redirect changes directly into INSERT/UPDATE statement– CONs:

• Every DML statement on a table must be modified to include the OUTPUT INTO clause. Might not be practical in open-ended systems where UPDATES occur many ways

• Implement logic in the application layer or manually in stored procs– CONs: same as with OUTPUT INTO (might not handle every update scenario)

• Purchase third-party product (SQLAudit)– Debate on Build vs Buy, though some third-party products work well

Page 8: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 8

3 – How CDC works & how it differs

01/15/2014

• Found in SQL Server Enterprise, starting in 2008• Requires the use of SQL Server Agent• Mechanism in the Database Engine• Captures insert/update/delete activity and posts the details

to change tracking tables• Details include:

– State of an inserted row, after insert– State of an updated row (both before and after update)– State of a row just before deletion– Change tracking log tables include all versions of the row!

• Feature not enabled by default – must run a few system stored procedures to configure

Page 9: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 9

3 – How CDC works & how it differs

01/15/2014

• Database engine reads the transaction log asynchronously, using SQL Server Agent Jobs

• Logs details associated with CRUD operations to change tracking tables

• Change tracking tables contain complete version history of the rows

• Note: CDC does not track “who” made the change – we need to handle that manually

• CDC Jobs clean up change tracking logs rows that are 3+ days old

Page 10: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 10

4 – Configuring a database for CDC

01/15/2014

• SQL Server Agent must be ON• This might be a problem for organizations that manage database

services using a Managed Services area – some of them might use other approaches for scheduling jobs and might not enable SQL Server Agent

• Again, CDC requires SQL Server Agent• By default, CDC is not enabled on a database• To enable CDC on a database:

• Exec sys.sp_cdc_enable_db

Page 11: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 11

5 – Configuring tables for CDC

01/15/2014

• We’ll create a simple table with three columns• Then we need to enable CDC for the table, using the system stored

procedure exec sys.sp_cdc_enable_table

• This will do two things:• Creates a change tracking table to store all versions of inserted & modified rows• Creates a job in SQL Server Agent to scan transaction log for DML statements

• Again, SQL Server Agent MUST be turned on!!! • Next slide, we’ll see what SQL Server created

Specify the schema, the table, and an optional role for

user access to the change tracking

table that will be created

CDC creates 2 jobs in SQL Server Agent

Page 12: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 12

5 – Configuring tables for CDC

01/15/2014

Capture job runs system stored procedure

sys.sp_MScdc_capture_job

Cleanup job runs system stored procedure

sys.sp_MScdc_cleanup_job

Under system tables, we see the Change Tracking table for the base table TestEmployees. We’ll look at entries in the table in a few minutes

Page 13: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 13

6 – CDC Demo with inserts, updates, deletes

01/15/2014

• Insert two rows, then change a rate on one of the rows, then delete a row

• We want to see the state of each row after the inserts, before and after each update, and when the delete occurs

• Next step we’ll see what’s in the change tracking log

Page 14: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 14

7 – Viewing the CDC Log tables

01/15/2014

• First two rows ($Operation = 2) represent the state of the rows after the insert (for the 2 new rows)

• Second two rows ($Operation =3 and 4) represent the “old version of the row” and “new version of the row” when Katy’s rate was changed from 21.00 to 22.00

• The third row ($Operation = 1) represents the state of Katy’s row at the time of the deletion

• We’ll look at the Log Sequence Number (_$start_lsn) later - this is important!• One thing missing: datetime of the update?

Each row represents state of the row for each insert, update

(old state, new state) and delete

Page 15: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 15

8 – Querying the log tables

01/15/2014

• On the issue of the timestamp (datetime) not being in the log table• SQL Server stores the time in a separate table (cdc.lsn_time_mapping), with a

reference to the log sequence number column from the change tracking table

Time for all updates is stored in cdc.lsn_time_mapping

We can join this to the Change tracking table on the LSN

• Time mapping table holds the Begin and End time for each update

• We can join this table to the Change tracking table by the Start_LSN column

• This also holds the Transaction ID – helpful for debugging

• Reason for storing this in separate table: transactions might span multiple base tables

Page 16: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 16

8 – Querying the log tables

01/15/2014

• Here’s the JOIN

The transaction ID is useful for debugging

“what actually happened”

Remember, data only lives in the log tables for 3 days!!!

Page 17: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 17

9 – CDC Latency

01/15/2014

• After you insert/update/delete data from the core tables, you might not immediately see data in the change tracking logs

• Because the CDC jobs scan the transaction log asynchronously, there is a slight period of delay/latency

• A transaction that updates a million rows in five seconds might not necessarily see data in the change tracking logs for at least another five seconds

• You can adjust these jobs for different levels of latency – I’ll cover that next webcast

Page 18: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 18

10 – Handling who made an update

01/15/2014

• Change Data Capture does not capture one potentially key piece of information: who made the change

• Because CDC scans the transaction log asynchronously, after the transaction occurs, it cannot capture the user who made the change

• We need to implement some kind of workaround

Page 19: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 19

10 – Handling who made an update

01/15/2014

• Run SSMS as a different user (SQLUser)

• Run an update and change rows

Even tough current user is SQLUser, an update still

marks last user as original user (both in the

Employees table and also the change tracking log)

Page 20: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 20

10 – Handling who made an update

01/15/2014

• But wait – when we created dbo.TestCustomer, didn’t we include this in the DDL?• That only works on INSERTS, not UPDATES

• OK , since the Change Tracking Log always inserts new rows (even when original tables are updated), can’t we change the DDL for the Change Tracking Log and make the LastUser column use a default value for current user?• This is a great question, and often asked.• Unfortunately, when SQL Server CDC inserts a row into the Change Tracking table,

the “current user” is the SQL Service admin account, not the current user• So, considering an update could happen many different ways, how can

we get CDC to accurately capture the user who made the change???

Page 21: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 21

10 – Handling who made an update

01/15/2014

• Only ways:• Explicitly specify the LastUser Column when the original table is originally

updated• Might not always be possible or practical, if you have an open system where updates can

occur many different ways

• Database Trigger (ironically) to mark the LastUser in the original base table on a change – this will ripple through to the Change Tracking table

Code accesses the INSERTED and DELETED system tables, which are

“exposed” by the transaction log and represent the state of the row BEFORE (DELETED) and AFTER (INSERTED) the

update

Compare the “old value” and “new value” of each column whose change would warrant updating the Last User. This is the ONLY

WAY to detect an actual change was made

Page 22: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 22

11 – Handling DDL Changes

01/15/2014

• Suppose you add a column to a core table• Will Change Data Capture automatically start logging

that new column? • Unfortunately, no • We need to go through a series of steps:

1. Alter the core table to add the new column2. Back up the change tracking table to another temp table3. Disable CDC on that table (with a CDC stored procedure)4. Re-enable CDC on that table (with a CDC stored procedure), which will add

a reference to the new column (but will wipe out the data in the change tracking log, which is why we backed it up)

5. Select all of the data from the temp table (from step 2) back into the new change tracking table

Page 23: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 23

11 – Handling DDL Changes

01/15/2014

Page 24: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 24

12 – Turning off SQL Server Agent

01/15/2014

• Suppose you turn off SQL Server Agent and then add rows to a table

• Since the CDC scan job must be running in SQL Server Agent, turning Agent off will keep any logging from occurring

• However, when you turn Agent back on, the job will run and “pick up” any changes from the transaction log that weren’t initially captured

• Believe it or not, even if you attempt to truncate the transaction log in this period of time, any “uncaptured changes” will still be captured when Agent is turned back on!

Page 25: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 25

13 – Limitations of Change Data Capture• Only available in Enterprise Edition of SQL Server• Cannot truncate tables involved in Change Data Capture• Cannot be implemented on tables that do not have a

primary key• Cannot use CDC on a table that has a columnstore index

01/15/2014

Page 26: January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)

13 tips for SQL Server CDC (Intro) 26

Next webcast: Advanced CDC • Tools in SSIS 2012 to work in conjunction with SSIS• Options for adjusting latency/delay of CDC scan jobs• Additional system stored procedures for using CDC• Performance benchmarks using CDC

01/15/2014