VMworld 2013: vSphere Data Protection (VDP) Technical Deep Dive and Troubleshooting Session

VMworld 2013. Presenters: Darryl Hing (VMware Canada) and Jacy Townsend (VMware).

VMware vSphere Data Protection (VDP) Technical Deep Dive And Troubleshooting Session

Darryl Hing, VMware Canada

Jacy Townsend, VMware

BCO4756

Agenda

What Is VDP?

Concepts

Gathering the log bundle

Log Analysis

Backup Best Practices

Commands

Resources

Overview

• File-level and image-level backups, both full and incremental
• Variable-length block deduplication
• Replacement for VMware Data Recovery (VDR)
• Optimized for virtual environments
• Advanced deduplication
• Backup and recovery

Key Features

• Up to 100 VMs per appliance
• Up to 8 TB of deduplicated backup data capacity per appliance
• Up to 10 VDP virtual appliances supported per vCenter

Key Features

• Powered by EMC Avamar
• Bundled with vSphere 5.1 Essentials & +, Standard, Enterprise & Enterprise Plus
• Variable-length deduplication

Important URLs

Configuration: https://<VDP_IP>:8543/vdp-configure

Management URL: https://<vCenter_IP>:9443/vsphere-client

FLR Portal: https://<VDP_IP>:8543/flr

Default Credentials: root/changeme

Terminologies – General Backup

• Snapshot: Preserves the state of a VM at a point in time, including its power state.
• Full Backup: A complete backup of the VM.
• Differential: Files changed since the last full backup.
• Incremental: Files changed since the last backup of any type.
• File Level Restore (FLR): Restores files individually.

Terminologies – Backup Types

• Full Backup
• Cumulative or Differential
• Incremental
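To make the difference between the backup types concrete, the sketch below works out which backups a restore would have to read under each scheme. The function name `restore_chain` and the sample schedules are illustrative assumptions, not part of VDP.

```python
def restore_chain(backups, target):
    """Return the indexes of the backups needed to restore backups[target].

    backups is a list of "full", "differential", or "incremental" entries
    in the order they were taken.
    """
    chain = []
    i = target
    while i >= 0:
        chain.append(i)
        kind = backups[i]
        if kind == "full":
            return list(reversed(chain))
        if kind == "differential":
            # A differential holds every change since the last full backup,
            # so jump straight back to that full backup.
            i = max(j for j in range(i) if backups[j] == "full")
        else:
            # An incremental only holds changes since the previous backup.
            i -= 1
    raise ValueError("no full backup precedes the target")


incremental_week = ["full", "incremental", "incremental", "incremental"]
differential_week = ["full", "differential", "differential", "differential"]

print(restore_chain(incremental_week, 3))   # every backup in the chain
print(restore_chain(differential_week, 3))  # the full plus the last differential
```

Incrementals keep each backup small but lengthen the restore chain; differentials grow over time but need only two backup sets to restore.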

Terminologies – VMware Specific

• CBT (Changed Block Tracking): Identifies the disk sectors that have been altered.
• Microsoft VSS (Volume Shadow Copy Service): Takes automatic or manual backups and snapshots of data.
• Quiescing: Pausing or altering running processes that can modify the disk during a backup.
• Steady State: Reached when the amount of data being imported to the dedupe store is less than or equal to the amount of data being pruned.

Terminologies - RPO & RTO

• Recovery Time Objective (RTO): How quickly you need to have applications back up and running after downtime.
• Recovery Point Objective (RPO): The point to which data must be restored to successfully resume work.

(Timeline diagram with markers: last backup, major incident, backup data restored; spans labeled RPO and RTO.)
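As a worked example of the two objectives, the sketch below measures the data-loss window and the downtime for one incident and compares them with the targets. The timestamps, the 24-hour RPO, and the 2-hour RTO are made-up values for illustration.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup, incident, restored):
    """Return (data loss window, downtime) for one incident."""
    data_loss_window = incident - last_backup  # compare against the RPO
    downtime = restored - incident             # compare against the RTO
    return data_loss_window, downtime


# Hypothetical incident timeline.
last_backup = datetime(2013, 3, 4, 22, 0)
incident = datetime(2013, 3, 5, 9, 30)
restored = datetime(2013, 3, 5, 13, 30)

loss, downtime = recovery_metrics(last_backup, incident, restored)

rpo = timedelta(hours=24)  # assumed objective
rto = timedelta(hours=2)   # assumed objective
print("RPO met:", loss <= rpo)      # 11.5 hours of changes were lost
print("RTO met:", downtime <= rto)  # 4 hours passed before service resumed
```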

Terminologies - Deduplication

• Identifies duplicate or redundant data
• Stores only unique data
• Saves pointers instead of multiple copies
• Consumes less disk space

(Diagram: source objects VM-A, with blocks A B C / D E F / 1 2 3, and VM-B, with blocks F E D / 3 1 2 / C A B, are reduced to pointers into a single set of unique blocks A B C D E F 1 3 2, which is then compressed to X Y Z.)
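The VM-A and VM-B example from the diagram can be sketched in a few lines. This is a minimal illustration of pointer-based deduplication, assuming single-character blocks and SHA-1 as the block identifier; it is not the appliance's implementation.

```python
import hashlib


def dedupe(objects):
    """Store each unique block once and keep per-object pointer lists."""
    store = {}     # hash -> unique block
    pointers = {}  # object name -> ordered list of hashes
    for name, blocks in objects.items():
        refs = []
        for block in blocks:
            digest = hashlib.sha1(block.encode()).hexdigest()
            store.setdefault(digest, block)  # only the first copy is stored
            refs.append(digest)
        pointers[name] = refs
    return store, pointers


vms = {
    "VM-A": list("ABCDEF123"),
    "VM-B": list("FED312CAB"),
}
store, pointers = dedupe(vms)

total_blocks = sum(len(blocks) for blocks in vms.values())
print(f"{total_blocks} source blocks, {len(store)} stored")

# Any object can be rebuilt by following its pointers.
rebuilt = "".join(store[h] for h in pointers["VM-B"])
print(rebuilt)
```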

Backup Process

1. Sticky Byte Factoring
2. Compression
3. Hashing
4. Store hash and data on GSAN

Sticky Byte Algorithm

• Data chunks average 24 kB in size
• Data chunks vary in size between 1 kB and 64 kB

First Backup: 40 kB of data factored into chunks of 10 kB, 25 kB, and 5 kB.

10000000001000000000
00100000000000000000
10010001000001000001
10101010001010001010

Subsequent Backup – Change in VM: 40 kB of data factored into chunks of 5 kB, 10 kB, and 25 kB.

10000000000000000000
00110000000000000000
10010001000001000001
10101010001010001010
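The idea behind variable-length chunking is that chunk boundaries are chosen from the data itself, so a small change early in a disk does not shift every later chunk. The sketch below is a generic content-defined chunker, not the Sticky Byte algorithm itself: the rolling hash, the 13-bit mask, and the function name `chunk` are assumptions chosen for illustration. Only the 1 kB and 64 kB size limits come from the slide.

```python
import random


def chunk(data, min_size=1024, max_size=64 * 1024, mask=0x1FFF):
    """Split data at content-defined boundaries (illustrative only)."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Simple shift-and-add hash; its low bits depend only on recent bytes.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        at_boundary = size >= min_size and (rolling & mask) == 0
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(200_000))
changed = b"new data" + original  # insert a few bytes at the front

first = chunk(original)
second = chunk(changed)
shared = set(first) & set(second)
print(len(first), "chunks before,", len(second), "after,", len(shared), "shared")
```

With fixed-size blocks, the eight inserted bytes would shift every block boundary. With content-defined boundaries, the chunker typically resynchronizes after the first chunk, so most chunks are already in the store.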

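Terminologies – Compression

• Chunks are compressed to 30% to 50% of their original size
• The average compressed chunk size is 12 kB to 16 kB
• Compression is applied only when it achieves at least 25% compression

(Diagram: 40 kB of data factored into chunks of 10 kB, 25 kB, and 5 kB, which compress to 2 kB, 1 kB, and 5 kB, for 8 kB in total.)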
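The 25% rule can be expressed as a small decision function. The sketch below uses `zlib` as a stand-in compressor, which is an assumption; the slide does not say which compression algorithm the appliance uses.

```python
import random
import zlib


def maybe_compress(chunk, min_saving=0.25):
    """Return (payload, compressed?) and keep the raw chunk if savings are small."""
    packed = zlib.compress(chunk)
    if len(packed) <= len(chunk) * (1 - min_saving):
        return packed, True
    return chunk, False


# Repetitive data compresses well; random data does not.
repetitive = b"A" * 10_000
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))

for label, data in (("repetitive", repetitive), ("random", noisy)):
    payload, compressed = maybe_compress(data)
    print(label, len(data), "->", len(payload), "compressed" if compressed else "stored as-is")
```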

Terminologies – Hashing

1. Data is used to create the hash, but it is not converted into the hash.
2. The hash created from each data object is called an atomic hash.
3. Atomic hashes are combined to create composites.
4. Hashing continues until a single root hash for the backup is created.
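The four steps describe a hash tree. The sketch below builds one, assuming SHA-1 as the hash function and a fan-out of four hashes per composite; both are illustrative choices rather than details taken from the slides.

```python
import hashlib


def atomic_hash(chunk):
    """Hash of a single data chunk; the chunk itself is stored separately."""
    return hashlib.sha1(chunk).digest()


def root_hash(chunks, fan_out=4):
    """Combine atomic hashes into composites until one root hash remains."""
    if not chunks:
        raise ValueError("a backup needs at least one chunk")
    level = [atomic_hash(c) for c in chunks]
    while len(level) > 1:
        # Each composite is the hash of a group of lower-level hashes.
        level = [
            hashlib.sha1(b"".join(level[i:i + fan_out])).digest()
            for i in range(0, len(level), fan_out)
        ]
    return level[0]


chunks = [f"chunk-{n}".encode() for n in range(10)]
root = root_hash(chunks)
print(root.hex())

# Changing any chunk changes the root hash of the backup.
modified = chunks[:5] + [b"changed"] + chunks[6:]
print(root_hash(modified).hex())
```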

VMware Backup History

• VDP: 2013 -> TBA
• VDR: 2009 -> TBA
• VCB: 2006 - 2010

Log Procurement

1. Open the VDP configure URL (https://<VDP_IP>:8543/vdp-configure).
2. Click “Collect Logs”.
3. Name the log bundle appropriately.

How To Scope a VDP Issue

1. Who?
2. What?
3. When?
4. Where?

Core Services

• MCS (Scheduler): /usr/local/avamar/var/mc/server_log/mcserver.log
• AvaAgent (Worker Thread): /usr/local/avamarclient/var-proxy-N/avagent.log
• AvVcbImage (VMware API Module): /usr/local/avamarclient/var-proxy-N/<Jobname>-<EPOCH>-vmimage[w|l].log
• AvTar (Deduplication and Compression): /usr/local/avamarclient/var-proxy-N/<Jobname>-<EPOCH>-vmimage[w|l]_avtar.log
• GSAN (Storage): /data01/cur/gsan.log

Log Locations

Installation
• /usr/local/avamar/var/vdr/server_logs/vdr-configure.log

Configuration
• /usr/local/avamar/var/avi/server_log/avinstaller.log*
• /usr/local/avamar/var/avi/server_log/AvamarInstallSles*.log

Log Locations

Backup and Restore
• /usr/local/avamar/var/mc/server_log/mcserver.log*
• /usr/local/avamar/var/vdr/server_logs/vdr-server*
• /usr/local/avamar/var/log/dpnctl.log*
• /usr/local/avamarclient/var-proxy-N/avagent*.log
• /usr/local/avamarclient/var-proxy-N/<jobname>-<EPOCH>-vmimage[w|l].log
• /usr/local/avamarclient/var-proxy-N/<jobname>-<EPOCH>-vmimage[w|l]_avtar.log
• /data01/cur/gsan.log

Log Locations

File Level Restore (FLR)
• /usr/local/avamar/var/flr/server_log/flr-server.log
• /usr/local/avamarclient/bin/logs/FlrMerged.log
• /usr/local/avamarclient/bin/logs/VmwareFlr.log
• /usr/local/avamarclient/bin/logs/VmwareFlrWs.log

ALG File – About the Job

<proxyDirectives>
  <flag type="string" value="vm-221" name="vm_moref" />
  <flag type="string" value="Windows Server 2008 R2" name="guest_fullname" />
  <flag type="string" value="VDPTest" name="vmname" />
  <flag type="string" value="[VMStore1] VDPTest/VDPTest.vmx" name="vmx_path" />
  <flag type="string" value="/VDP_Lab" name="vmware_datacenter" />
  <flag type="string" value="192.168.8.31" name="esxserver" />
  <flag type="string" value="192.168.8.43" name="vmware_server" />
</proxyDirectives>
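When many work orders need checking, the job details can be pulled out of the `proxyDirectives` block programmatically. The sketch below parses the block shown on the slide with Python's standard XML parser. It assumes the block has already been extracted from the ALG file as well-formed XML, since the rest of the file's layout is not shown here; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The proxyDirectives block as shown on the slide (shortened).
ALG_SNIPPET = """<proxyDirectives>
  <flag type="string" value="vm-221" name="vm_moref" />
  <flag type="string" value="VDPTest" name="vmname" />
  <flag type="string" value="[VMStore1] VDPTest/VDPTest.vmx" name="vmx_path" />
  <flag type="string" value="192.168.8.31" name="esxserver" />
  <flag type="string" value="192.168.8.43" name="vmware_server" />
</proxyDirectives>"""


def read_proxy_directives(xml_text):
    """Return the flag name/value pairs as a dictionary."""
    root = ET.fromstring(xml_text)
    return {flag.get("name"): flag.get("value") for flag in root.iter("flag")}


job = read_proxy_directives(ALG_SNIPPET)
print(job["vmname"], "on host", job["esxserver"], "via vCenter", job["vmware_server"])
```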

LOG File – About the Process

2013-03-05 01:03:37 avvcbimage Info <9754>: VDDK IO 102400.00 MB, Performance: 297.5 MB/minute, Duration: 05:44:15
2013-03-04 16:38:53 avvcbimage Warning <14654>: The in-use blocks (pass 1) could not be found for 'VDP-136243273610203b57a3b4bb8946f82f4a78bdb8e0d0da870a', using disk extents.
2013-03-05 01:09:25 avvcbimage Error <9769>: Timeout on wait for spawned avtar process to complete
2013-03-05 01:09:25 avvcbimage FATAL <16018>: The datastore information from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not permit a restore or backup.
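The log lines follow a regular pattern of timestamp, module, severity, message code, and message, which makes them easy to filter. The sketch below pulls out the Error and FATAL entries. The regular expression is inferred from the sample lines on the slide and assumes one event per line; it is not taken from product documentation.

```python
import re

LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<module>\S+) "
    r"(?P<level>Info|Warning|Error|FATAL) "
    r"<(?P<code>\d+)>: (?P<message>.*)$"
)


def problems(lines, levels=("Error", "FATAL")):
    """Return (time, level, code, message) for lines at the given levels."""
    hits = []
    for line in lines:
        match = LINE.match(line)
        if match and match.group("level") in levels:
            hits.append(match.group("time", "level", "code", "message"))
    return hits


SAMPLE = [
    "2013-03-05 01:03:37 avvcbimage Info <9754>: VDDK IO 102400.00 MB, "
    "Performance: 297.5 MB/minute, Duration: 05:44:15",
    "2013-03-05 01:09:25 avvcbimage Error <9769>: Timeout on wait for "
    "spawned avtar process to complete",
    "2013-03-05 01:09:25 avvcbimage FATAL <16018>: The datastore information "
    "from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not "
    "permit a restore or backup.",
]

for time, level, code, message in problems(SAMPLE):
    print(time, level, code, message)
```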

Finding the Work Order Logs Quickly

For each work order (.alg) file, the loop below prints the start time, the VM name, and the file name. Quoting "$i" keeps file names that contain spaces intact.

# cd /usr/local/avamarclient/var-proxy-3
# for i in *.alg; do grep -m 1 " START" "$i" | rev | awk '{print $4" "$5}' | rev; grep vmname "$i" | awk -F\" '{print $4}'; echo "$i"; echo; done

2013-03-04 16:32:14
VM_Name_1
Daily 5 Day Retention-1362432700504-618a82a5277ebb1dd536b018a407a21582926e6a-3016-vmimagew.alg

2013-03-05 16:07:30
VM_Name_1
Daily 5 Day Retention-1362517629476-6acb4658af622ac48a52d73247aad95b1887af7c-3016-vmimagew.alg

Scenario 1

Logs
• /usr/local/avamar/var/mc/server_log/mcserver.log*
• /usr/local/avamar/var/vdr/server_logs/vdr-server*
• /usr/local/avamar/var/log/dpnctl.log*
• /usr/local/avamarclient/var-proxy-N/avagent*.log
• /data01/cur/gsan.log

Scenario 1

LOG File

2013-03-05 23:01:35 avvcbimage Info <16001>: Found 1 disk(s), 0 snapshots, and 1 snapshot ctk files, on the VMs datastore.
2013-03-05 23:01:35 avvcbimage Warning <16002>: Too many extra snapshot files (1) were found on the VMs datastore. This can cause a problem for the backup or restore.
2013-03-05 23:01:35 avvcbimage FATAL <16018>: The datastore information from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not permit a restore or backup.
2013-03-05 23:01:35 avvcbimage Info <0000>: Starting graceful (staged) termination, Too many pre-existing snapshots will not permit a restore. (wrap-up stage)
2013-03-05 23:01:35 avvcbimage Error <9759>: createSnapshot: snapshot creation failed

Scenario 2

When?

$ grep "Node restarted" ./data01/cur/err.log
2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted

Why?

2013/02/26-17:52:35.07740 {0.0} [strtask.6:3281] <0055> checkpoint cp.20130223140423 3300 out of 3590 stripes complete
2013/02/26-17:52:36.21084 {0.0} [perfbeat.0:273] WARN: <0963> server node 0.0 is swapping: check configuration
2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted
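The "When?" and "Why?" steps amount to finding each restart and then reading the few seconds of log before it. The sketch below automates that. It assumes every line starts with a timestamp in the format shown on the slide; the five-second window and the function names are arbitrary illustrative choices.

```python
from datetime import datetime, timedelta


def parse_time(line):
    """Parse the leading timestamp, e.g. 2013/02/26-17:52:38.81009."""
    return datetime.strptime(line.split()[0], "%Y/%m/%d-%H:%M:%S.%f")


def events_before_restart(lines, window_seconds=5):
    """For each 'Node restarted' line, return the lines logged just before it."""
    context = []
    for line in lines:
        if "Node restarted" not in line:
            continue
        restart = parse_time(line)
        earliest = restart - timedelta(seconds=window_seconds)
        context.append([
            other for other in lines
            if "Node restarted" not in other
            and earliest <= parse_time(other) <= restart
        ])
    return context


SAMPLE = [
    "2013/02/26-17:52:35.07740 {0.0} [strtask.6:3281] <0055> checkpoint "
    "cp.20130223140423 3300 out of 3590 stripes complete",
    "2013/02/26-17:52:36.21084 {0.0} [perfbeat.0:273] WARN: <0963> server "
    "node 0.0 is swapping: check configuration",
    "2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted",
]

for group in events_before_restart(SAMPLE):
    for line in group:
        print(line)
```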

Scenario 2 – Successful Checkpoint Sample

/data01/cur/err.log

2013/02/27-14:18:54.19296 {0.0} [manage:196] <0054> checkpoint cp.20130227141853 started
2013/02/27-14:18:58.14928 {0.0} [strtask.1:3247] <0055> checkpoint cp.20130227141853 300 out of 3595 stripes complete
2013/02/27-14:19:00.72912 {0.0} [strtask.2:3483] <0055> checkpoint cp.20130227141853 600 out of 3595 stripes complete
<SNIP>
2013/02/27-14:19:27.42271 {0.0} [manage:2746] <0056> checkpoint cp.20130227141853 completed
2013/02/27-14:19:27.50773 {0.0} [sched.cp:3263] <4301> completed checkpoint maintenance
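A successful checkpoint has a "started" line (code 0054) and a matching "completed" line (code 0056). The sketch below lists checkpoints that started but never completed, which is the pattern seen when the node restarts mid-checkpoint. The pattern is inferred from the sample lines above and is an assumption about the log format.

```python
import re

CHECKPOINT = re.compile(
    r"<(0054|0056)>\s+checkpoint (cp\.\d+) (started|completed)"
)


def unfinished_checkpoints(lines):
    """Return checkpoint tags that have a start line but no completion line."""
    started, completed = set(), set()
    for line in lines:
        match = CHECKPOINT.search(line)
        if not match:
            continue
        tag, state = match.group(2), match.group(3)
        (started if state == "started" else completed).add(tag)
    return sorted(started - completed)


GOOD = [
    "2013/02/27-14:18:54.19296 {0.0} [manage:196] <0054> checkpoint "
    "cp.20130227141853 started",
    "2013/02/27-14:18:58.14928 {0.0} [strtask.1:3247] <0055> checkpoint "
    "cp.20130227141853 300 out of 3595 stripes complete",
    "2013/02/27-14:19:27.42271 {0.0} [manage:2746] <0056> checkpoint "
    "cp.20130227141853 completed",
]

print(unfinished_checkpoints(GOOD))      # nothing outstanding
print(unfinished_checkpoints(GOOD[:2]))  # the completion line is missing
```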

Scenario 3 – Storage Performance

/data01/cur/gsan.log

2013/01/24-01:09:47.04134 {0.0} [perfbeat.7:197] WARN: <1060> perfbeat::outoftolerance mask=[backup,restore] average=2191.09 limit=219.1092 mbpersec=0.04

Scenario 2

Performance Data

# grep perfbeat /data01/cur/err.log | awk '{print $1"="$10}' | awk -F= '{print $1" - "$3}'
2013/02/18-13:16:05.93532 - 10.95
2013/02/18-13:19:40.12223 -
2013/02/18-13:20:44.07831 - 25.40

Swapping

2013/02/18-13:19:40.12223 {0.0} [perfbeat.0:218] WARN: <0963> server node 0.0 is swapping: check configuration
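The awk pipeline picks the throughput by field position, which is why the swapping line shows up with an empty value. As an alternative, the sketch below reads the `key=value` pairs by name, so lines without an `mbpersec` field are reported explicitly. It assumes the perfbeat line format shown in the samples; the function name is hypothetical.

```python
import re

PAIR = re.compile(r"(\w+)=(\S+)")


def throughput(lines):
    """Return (timestamp, MB/s or None) for every perfbeat line."""
    rows = []
    for line in lines:
        if "perfbeat" not in line:
            continue
        fields = dict(PAIR.findall(line))
        value = fields.get("mbpersec")
        rows.append((line.split()[0], float(value) if value is not None else None))
    return rows


SAMPLE = [
    "2013/01/24-01:09:47.04134 {0.0} [perfbeat.7:197] WARN: <1060> "
    "perfbeat::outoftolerance mask=[backup,restore] average=2191.09 "
    "limit=219.1092 mbpersec=0.04",
    "2013/02/18-13:19:40.12223 {0.0} [perfbeat.0:218] WARN: <0963> server "
    "node 0.0 is swapping: check configuration",
    "2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted",
]

for stamp, mbps in throughput(SAMPLE):
    print(stamp, "no throughput reported" if mbps is None else f"{mbps} MB/s")
```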

What Next?

1. Review the monitor logs (vmware.log) at the time of the incident for both the VDP appliance and the target VM.
2. Review the vCenter logs at the time of the incident.
3. Review the ESX logs (hostd/vmkernel) at the time of the incident.

Backup Best Practices - Troubleshooting

• Redeploy VDP: Should only be used to resume daily backups. Should not be used as a workaround except in extreme conditions.
• Scope (W5): Define who, what, when, where, and why.
• Communications: Understand how the product works and which modules communicate with other modules.

Backup Best Practices – Administration

• Plan: Plan your deployment.
• Storage: Ensure your storage infrastructure can handle the capacity and load. Always use HCL hardware.
• Separate: Separate and group the workload between appliances or deduplication stores.

Backup Best Practices – Administration

• Set and Forget: Check backups regularly; do not set and forget.
• Single Points of Failure: Think about single points of failure and consider correcting these conditions.
• Consumption: At 60% space utilization or higher, be mindful of storage consumption.

Backup Best Practices – Administration

• On-Demand Backups: Limit on-demand backups during the maintenance window.
• On-Demand Maintenance: Avoid initiating on-demand maintenance activities: checkpoint (CP), CP validation, or garbage collection (GC).

Backup Best Practices – Administration

Weekly
• Check the status of the deduplication store (checkpoints).
• Check the status of the backup subsystems.
• Review any failed backups.

Monthly / Quarterly
• Test the restore plan. Ensure business continuity.
• Review and correct any new trends.
• Review storage performance and storage growth.

Commands - MCCLI

show-prop

root@vdp:~/#: mccli server show-prop
State                             Full Access
Total capacity                    535.7 GB
Capacity used                     1.7 GB
Server utilization                0.3%
Bytes protected                   10.0 GB
Time since Server initialization  21 days 21h:48m
Last checkpoint                   2013-03-27 11:26:37 PDT
Last validated checkpoint         2013-03-27 11:26:37 PDT
System Name                       vdp.vdp.lab
IP address                        192.168.2.99:26000
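For regular health checks, the `show-prop` output can be turned into a dictionary and compared with the 60% utilization guideline from the best practices. The sketch below assumes that property names and values are separated by two or more spaces, which may not match the real column layout exactly; the function names are hypothetical.

```python
import re

# Sample output, laid out with an assumed two-or-more-space column gap.
SAMPLE = """\
State                             Full Access
Total capacity                    535.7 GB
Capacity used                     1.7 GB
Server utilization                0.3%
Last checkpoint                   2013-03-27 11:26:37 PDT
"""


def parse_show_prop(text):
    """Split each line into a property name and its value."""
    props = {}
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props


def utilization_warning(props, threshold=60.0):
    """True when server utilization has reached the threshold (percent)."""
    used = float(props["Server utilization"].rstrip("%"))
    return used >= threshold


props = parse_show_prop(SAMPLE)
print(props["Total capacity"], props["Capacity used"])
print("Review storage consumption:", utilization_warning(props))
```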

Commands - MCCLI

show-services

root@vdp:~/#: mccli server show-services
Name                             Status
-------------------------------- ---------------------------
Hostname                         vdp.vdp.lab
IP Address                       192.168.2.99
Load Average                     0.97
Last Admin Datastore Flush       2013-04-18 07:45:00 PDT
PostgreSQL database              Running
192.168.2.103                    All vCenter connections OK.

Resources

References
• Datasheet: http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-with-Operations-Management-Datasheet.pdf
• Admin Guide: http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-Data-Protection-Administration-Guide.pdf
• VDDK Guide: https://www.vmware.com/support/developer/vddk/vddk-511-releasenotes.html

Other VMware Activities Related to This Session

• HOL: HOL-SDC-1305, Business Continuity and Disaster Recovery In Action
• Group Discussions: BCO1002-GD, Data Protection and Backup with Jeff Hunter

THANK YOU
