Source code repository: https://github.com/starpos/vmbkp
Vmbkp: An Online Backup Tool for VMware vSphere
Oct 15, 2010
HOSHINO Takashi
Cybozu Labs, Inc.
What is Vmbkp?
• Backup software for virtual machines in a VMware vSphere environment
– Online full/differential/incremental backup
– Multi-generation backup management
– Efficient archive access with sequential IO and reverse diffs
– Command-line interface for scheduling with cron
Supported platform
• VMware vSphere 4
– vCenter server managing several ESX(i)s
– Single ESX(i) (not tested)
– Free ESXi is not supported (snapshot fails)
• Backup server
– Linux on an x86_64 host
– Confirmed on CentOS 5.5 64-bit
Hardware Architecture
[Figure: the Vmbkp server connects to the VMware vSphere vCenter Server over the LAN (control and information retrieval via the vSphere SOAP protocol) and to the VM storage of the ESX(i) hosts over the SAN (data transfer via the VDDK protocol); archives are written to the backup storage.]
You can use NBD transfer without a SAN.
Commands
• Update: get and save information about all available VMs
• Backup: back up the specified VM or group, or all VMs
• Restore: restore the specified archived generation as a new VM
• Check: check that backup archives are valid
• Status: show the status of backup archives
Commands – cont.
• Destroy: remove a virtual machine from the vSphere environment
• Clean: delete archives of virtual machines
• List: get a list of virtual machines satisfying specified conditions
• Help: show usage
Workflow
[Figure: backup and restore workflows, divided into user tasks and Vmbkp tasks.]
Backup
– User task: prepare config, (register to cron), back up target VMs
– Vmbkp task: read config/profiles, get vSphere information, export ovf (without disks), create snapshot, (get changed block info), back up vmdk files, delete snapshot, (delete previous dump), update profiles
Restore
– User task: prepare config, restore target VMs
– Vmbkp task: read config/profiles, import ovf, add disks to new VM, restore vmdk files
Configuration files
• Global (required)
– Global configuration:
  • Backup directory
  • Number of generations to keep
  • Path of the vmdkbkp binary used to back up/restore vmdk files
  • vSphere authentication information
• Group (optional)
– Group configuration (sets of VMs) for convenient use
Layout of Archive Files
• <backup dir>/
– AllVM profile
• <backup dir>/<vm>/
– VM profile
• <backup dir>/<vm>/<generation>/
– Generation profile
– Ovf file for VM configuration
– Dump/digest/rdiff/bmp files for each vmdk
Profiles
• Allvm
– Information/status of all VMs in the target vSphere environment
– Updated by the update command
• Vm
– Information/status of the archives of a VM
– Created/updated by the backup command and referred to by the restore command
• Generation
– Information/status of each backup generation of a VM
– Created by the backup command and referred to by the restore command
Software Architecture
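[Figure: Vmbkp software architecture. Users and cron invoke the command-line interface, which drives the backup/restore controller. The controller reaches the VMware vSphere vCenter Server (snapshots, Ovf, changed blocks) through the SOAP wrapper over the VI Java library, and reaches the ESX(i) hosts and SAN storage through the Vmdkbkp wrapper over Vmdkbkp, the C++ vmdk backup/restore tool/library built on the VDDK C library. A utility library handles bitmaps, XML (Ovf), and config/profiles.]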
Required Tools and Libraries
• Java SE 1.6: java, javac, and jar commands
• VI-Java 2.1GA: SOAP wrapper
• G++ 4.4
• Boost 1.43: shared_ptr, scoped_array, thread, and iostreams
• VDDK 1.2.0: Virtual Disk Development Kit by VMware
Source Code Overview (Java)
• control/*
– Command-line I/F
– Backup/restore controller
– Vmdkbkp wrapper
• soap/*
– SOAP (VI-Java) wrapper
• utility/*
– Utilities for Ovf, bitmap, command line, etc.
• config/*
– Config/profile parser and accessor
• profile/*
– Semantic-level config/profile managers
VmdkBkp (C++ code)
What is VmdkBkp?
• Online backup software for remote/local vmdk files in VMware vSphere environments
– Currently supports vSphere version 4
• Written in C++
• Uses the VDDK library by VMware
• Used by the Vmbkp (Java) tool
Archive Files
• Dump/Rdiff
– Archive of VMDK metadata and blocks, without zero blocks
– Dump is a full archive; Rdiff is a reverse differential one
– Dump + Rdiff = Previous dump
• Digest
– MD5 digest data for all blocks of the VMDK
– Used to check equality of blocks and to validate the corresponding dump/rdiff files
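A minimal C++17 sketch of how a digest can validate a dump, assuming a dump is modeled as an in-memory map from block index to non-zero block contents and a digest as one hash per block. vmdkbkp uses MD5; this sketch substitutes 64-bit FNV-1a purely to stay self-contained, and the helper names are hypothetical, so it shows the bookkeeping rather than the real file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Block = std::vector<std::uint8_t>;
using Dump = std::map<std::uint64_t, Block>;  // block index -> non-zero block
using Digest = std::vector<std::uint64_t>;    // one hash per block of the vmdk

// Stand-in for MD5 (assumption): 64-bit FNV-1a, not collision-resistant.
std::uint64_t blockHash(const Block& b) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::uint8_t byte : b) {
        h ^= byte;
        h *= 1099511628211ULL;
    }
    return h;
}

// A dump is valid if every block it stores matches its digest entry and
// every block it omits hashes like an all-zero block.
bool dumpMatchesDigest(const Dump& dump, const Digest& digest,
                       std::size_t blockSize) {
    if (!dump.empty() && dump.rbegin()->first >= digest.size()) return false;
    const std::uint64_t zeroHash = blockHash(Block(blockSize, 0));
    for (std::uint64_t i = 0; i < digest.size(); ++i) {
        auto it = dump.find(i);
        const std::uint64_t h =
            (it == dump.end()) ? zeroHash : blockHash(it->second);
        if (h != digest[i]) return false;
    }
    return true;
}
```

Because zero blocks are not stored in a dump, blocks missing from it are compared against the hash of an all-zero block.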
Supported Commands
• Dump: execute a full/differential/incremental dump
• Restore: execute a restore with dump/rdiff
• Check: validate dump/rdiff with digest data
• Print: print dump/rdiff/digest in human-readable form
• Digest: make a digest from a dump
• Merge: make a past dump from the current dump and past rdiff(s)
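The Merge command rests on the identity Dump + Rdiff = Previous dump. The C++17 sketch below applies that identity repeatedly, under two assumptions not stated in the slides: an rdiff holds the previous contents of every block that changed, and an all-zero entry means the block used to be zero. `applyRdiff` and `mergeToPast` are hypothetical names, not vmdkbkp's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Block = std::vector<std::uint8_t>;
using Dump = std::map<std::uint64_t, Block>;   // non-zero blocks only
using Rdiff = std::map<std::uint64_t, Block>;  // previous contents of changed blocks

bool isZero(const Block& b) {
    return std::all_of(b.begin(), b.end(),
                       [](std::uint8_t x) { return x == 0; });
}

// Dump + Rdiff = previous dump. An all-zero rdiff entry (assumption) means the
// block was zero in the previous generation, so it is dropped from the dump.
void applyRdiff(Dump& dump, const Rdiff& rdiff) {
    for (const auto& [index, oldBlock] : rdiff) {
        if (isZero(oldBlock)) dump.erase(index);
        else dump[index] = oldBlock;
    }
}

// Apply rdiffs from newest to oldest, e.g. {rdiff(2->1), rdiff(1->0)},
// turning the current dump into the dump of an older generation.
Dump mergeToPast(Dump current, const std::vector<Rdiff>& rdiffsNewestFirst) {
    for (const Rdiff& r : rdiffsNewestFirst) applyRdiff(current, r);
    return current;
}
```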
How to Backup Remote Vmdk
• Command line:
vmdkbkp dump [connect options] --mode [full/diff/incr]
  --vm [vm moref] --snapshot [snapshot moref] --remote [disk path]
  --dumpin [previous dump] --dumpout [current dump]
  --digestin [previous digest] --digestout [current digest]
  --bmpin [changed block bitmap]
  --rdiffout [current-previous rdiff]
• Inputs/Outputs:
– Full: only --dumpout and --digestout are required
– Diff: all options except --bmpin are required
– Incr: all options are required
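The per-mode requirements above can be checked mechanically. A hypothetical C++ helper (not part of vmdkbkp) that reports which archive input/output options are missing; it deliberately ignores the connect options and --vm, --snapshot, and --remote.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: lists the archive options missing for a dump mode.
//   full: --dumpout, --digestout
//   diff: all archive options except --bmpin
//   incr: all archive options
std::vector<std::string> missingDumpOptions(const std::string& mode,
                                            const std::set<std::string>& given) {
    std::vector<std::string> required;
    if (mode == "full") {
        required = {"--dumpout", "--digestout"};
    } else if (mode == "diff") {
        required = {"--dumpin", "--dumpout", "--digestin", "--digestout",
                    "--rdiffout"};
    } else if (mode == "incr") {
        required = {"--dumpin", "--dumpout", "--digestin", "--digestout",
                    "--rdiffout", "--bmpin"};
    } else {
        throw std::invalid_argument("unknown mode: " + mode);
    }
    std::vector<std::string> missing;
    for (const std::string& opt : required)
        if (given.count(opt) == 0) missing.push_back(opt);
    return missing;
}
```

For example, an incremental dump given everything except `--bmpin` reports exactly `--bmpin` as missing.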
Full Backup
• Ovf
– VM configuration data (without disk information)
• Dump
– Full data of the vmdk (without zero blocks)
• Digest
– Digest data of all blocks
[Figure: full backup. The Vmbkp tool writes the VM configuration as Ovf, reads all blocks of the virtual disk (vmdk), stores the non-zero blocks in the dump, and writes a digest covering all blocks.]
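A minimal C++17 model of the full backup, assuming the vmdk is an in-memory vector of equally sized blocks: every block is read and hashed, and only non-zero blocks go into the dump. OVF export is not modeled, FNV-1a stands in for MD5 to keep the example self-contained, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Block = std::vector<std::uint8_t>;
using Dump = std::map<std::uint64_t, Block>;  // non-zero blocks only
using Digest = std::vector<std::uint64_t>;    // one hash per block

// Stand-in for the per-block MD5 (assumption): 64-bit FNV-1a.
std::uint64_t blockHash(const Block& b) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::uint8_t byte : b) {
        h ^= byte;
        h *= 1099511628211ULL;
    }
    return h;
}

bool isZero(const Block& b) {
    return std::all_of(b.begin(), b.end(),
                       [](std::uint8_t x) { return x == 0; });
}

struct FullBackup {
    Dump dump;      // non-zero blocks of the vmdk
    Digest digest;  // hashes of all blocks, zero blocks included
};

// Every block is read and hashed; only non-zero blocks are stored.
FullBackup fullBackup(const std::vector<Block>& vmdk) {
    FullBackup out;
    out.digest.reserve(vmdk.size());
    for (std::uint64_t i = 0; i < vmdk.size(); ++i) {
        assert(vmdk[i].size() == vmdk[0].size());  // equally sized blocks
        out.digest.push_back(blockHash(vmdk[i]));
        if (!isZero(vmdk[i])) out.dump.emplace(i, vmdk[i]);
    }
    return out;
}
```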
Differential Backup
• Rdiff
– Reverse difference data of the vmdk
– Dump’ + Rdiff’ = Dump
• The dump of the previous generation can be deleted after the current backup, since it can be reconstructed from Dump’ and Rdiff’
[Figure: differential backup. The Vmbkp tool reads the VM configuration and all blocks of the vmdk, together with the previous generation's Dump and Digest, and writes the current generation's Ovf’, Dump’ (non-zero blocks), and Digest’, plus Rdiff’.]
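A C++17 sketch of the differential case, under simplifying assumptions (in-memory blocks, FNV-1a standing in for MD5, hypothetical names, unchanged disk size). In this sketch, changes are detected by comparing each block's hash with the previous digest, so the previous dump is only consulted for blocks that actually changed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Block = std::vector<std::uint8_t>;
using Dump = std::map<std::uint64_t, Block>;   // non-zero blocks only
using Rdiff = std::map<std::uint64_t, Block>;  // previous contents of changed blocks
using Digest = std::vector<std::uint64_t>;

// Stand-in for MD5 (assumption): 64-bit FNV-1a.
std::uint64_t blockHash(const Block& b) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::uint8_t byte : b) {
        h ^= byte;
        h *= 1099511628211ULL;
    }
    return h;
}

bool isZero(const Block& b) {
    return std::all_of(b.begin(), b.end(),
                       [](std::uint8_t x) { return x == 0; });
}

struct DiffBackup {
    Dump dump;      // Dump'
    Digest digest;  // Digest'
    Rdiff rdiff;    // Rdiff', so that Dump' + Rdiff' = Dump
};

DiffBackup diffBackup(const std::vector<Block>& vmdk, const Dump& prevDump,
                      const Digest& prevDigest) {
    assert(vmdk.size() == prevDigest.size());  // assumption: disk size unchanged
    DiffBackup out;
    for (std::uint64_t i = 0; i < vmdk.size(); ++i) {
        const Block& cur = vmdk[i];
        const std::uint64_t h = blockHash(cur);
        out.digest.push_back(h);
        if (!isZero(cur)) out.dump.emplace(i, cur);
        if (h != prevDigest[i]) {
            // A block missing from the previous dump was a zero block.
            auto it = prevDump.find(i);
            out.rdiff.emplace(i, it != prevDump.end() ? it->second
                                                      : Block(cur.size(), 0));
        }
    }
    return out;
}
```

Applying the resulting rdiff to the new dump reproduces the previous dump, which is why the previous generation's dump can be deleted.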
Incremental Backup
• Changed Block Information
– The set of addresses of blocks changed since the previous backup
[Figure: incremental backup. Same as the differential backup, except that the Vmbkp tool reads only the changed blocks of the vmdk, as given by the changed block information.]
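The incremental variant differs only in which blocks are read. In this C++17 sketch (in-memory model, FNV-1a instead of MD5, hypothetical names), `readBlock` stands in for a vmdk read and is called only for blocks flagged in the changed-block bitmap; unflagged blocks are carried over from the previous dump and digest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Block = std::vector<std::uint8_t>;
using Dump = std::map<std::uint64_t, Block>;   // non-zero blocks only
using Rdiff = std::map<std::uint64_t, Block>;  // previous contents of changed blocks
using Digest = std::vector<std::uint64_t>;

// Stand-in for MD5 (assumption): 64-bit FNV-1a.
std::uint64_t blockHash(const Block& b) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::uint8_t byte : b) {
        h ^= byte;
        h *= 1099511628211ULL;
    }
    return h;
}

bool isZero(const Block& b) {
    return std::all_of(b.begin(), b.end(),
                       [](std::uint8_t x) { return x == 0; });
}

struct IncrBackup { Dump dump; Digest digest; Rdiff rdiff; };

IncrBackup incrBackup(const std::function<Block(std::uint64_t)>& readBlock,
                      const std::vector<bool>& changed, const Dump& prevDump,
                      const Digest& prevDigest, std::size_t blockSize) {
    assert(changed.size() == prevDigest.size());
    IncrBackup out;
    for (std::uint64_t i = 0; i < changed.size(); ++i) {
        auto prev = prevDump.find(i);
        if (!changed[i]) {  // unchanged: reuse previous digest and dump entry
            out.digest.push_back(prevDigest[i]);
            if (prev != prevDump.end()) out.dump.emplace(i, prev->second);
            continue;
        }
        Block cur = readBlock(i);  // the only vmdk access in this sketch
        assert(cur.size() == blockSize);
        const std::uint64_t h = blockHash(cur);
        out.digest.push_back(h);
        if (h != prevDigest[i])  // flagged blocks may still be unchanged
            out.rdiff.emplace(i, prev != prevDump.end() ? prev->second
                                                        : Block(blockSize, 0));
        if (!isZero(cur)) out.dump.emplace(i, std::move(cur));
    }
    return out;
}
```

The digest comparison still matters: a flagged block whose contents did not actually change produces no rdiff entry.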
Vmdk Archives Relationships
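[Figure: 0.vmdk is full-dumped to 0.dump and 0.digest. After some data is written to the VM, giving 1.vmdk, 1.vmdk is archived in three ways: a full dump; a diff dump against 0.dump/0.digest, which also yields 1-0.rdiff; and an incr dump using 1.bitmap, which rdiff2bmp derives from 1-0.rdiff. Each path produces 1.dump and 1.digest.]
Check that the dump/digest files produced along all possible paths are the same, using check_dump_and_dump and check_digest_and_digest.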
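As we read the figure, the rdiff2bmp step turns a reverse diff into a changed-block bitmap, so the incremental path can be tested without vSphere's changed block information. A minimal sketch of that conversion, assuming an rdiff is modeled as a map from block index to the block's previous contents; the real rdiff and bitmap file formats are not shown, and the function is only named after the step.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Block = std::vector<std::uint8_t>;
using Rdiff = std::map<std::uint64_t, Block>;  // block index -> previous contents

// Every block recorded in the rdiff changed between the two generations, so
// marking exactly those blocks yields a bitmap usable by an incremental dump.
std::vector<bool> rdiff2bmp(const Rdiff& rdiff, std::uint64_t numBlocks) {
    std::vector<bool> bmp(numBlocks, false);
    for (const auto& entry : rdiff) {
        assert(entry.first < numBlocks);
        bmp[entry.first] = true;
    }
    return bmp;
}
```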
Vmdk Archives Relationships – cont.
[Figure: restore and merge paths between 0.vmdk, 1.vmdk, and the archives 0.dump, 0.digest, 1.dump, 1.digest, and 1-0.rdiff.]
• Restore 0.dump to a vmdk, full dump it to 0r.dump, and check that 0.dump and 0r.dump are the same.
• Merge 1.dump and 1-0.rdiff to 0m.dump, digest 0m.dump to 0m.digest, and check that 0.{dump,digest} and 0m.{dump,digest} are the same.
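The first check is a round trip. A C++17 sketch under simplifying assumptions (in-memory blocks, disk size passed explicitly rather than read from dump metadata, hypothetical names): restoring writes zero blocks wherever the dump has no entry, so a full dump of the restored disk should equal the original dump.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Block = std::vector<std::uint8_t>;
using Dump = std::map<std::uint64_t, Block>;  // non-zero blocks only

bool isZero(const Block& b) {
    return std::all_of(b.begin(), b.end(),
                       [](std::uint8_t x) { return x == 0; });
}

// Restore: start from an all-zero disk and write every block in the dump.
std::vector<Block> restoreDump(const Dump& dump, std::uint64_t numBlocks,
                               std::size_t blockSize) {
    std::vector<Block> vmdk(numBlocks, Block(blockSize, 0));
    for (const auto& [index, block] : dump) {
        assert(index < numBlocks && block.size() == blockSize);
        vmdk[index] = block;
    }
    return vmdk;
}

// Full dump of a vmdk: keep only the non-zero blocks.
Dump fullDump(const std::vector<Block>& vmdk) {
    Dump dump;
    for (std::uint64_t i = 0; i < vmdk.size(); ++i)
        if (!isZero(vmdk[i])) dump.emplace(i, vmdk[i]);
    return dump;
}
```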
Software Architecture of vmdkbkp
• Command: parse the command line and execute it
• Util: configuration, time, etc.
• Header: manage headers/blocks of dump/rdiff/digest files
• Exception: exceptions and related macros
• Manager: manage (1) the VDDK connection, (2) vmdk file access, and (3) dump/rdiff/digest file access
• Serialize: StringMap/integer data serializer
• Bitmap: bitmap data serializer
[Figure: module diagram grouping Command, Util, Header, Exception, Manager, Serialize, and Bitmap into the command executor, specific components, and general components.]
VDDK Control with Fork
• Solves the following problem: after a SCSI reservation conflict error, re-initializing VDDK for SAN transfer inevitably fails and falls back to NBD transfer.
VDDK Control with Fork – cont.
• Main process
– VddkWorker (parent): provides the same interface as VddkManager/VmdkManager
– VddkController: manages processes and communicates with the child
• Forked process
– VddkWorker (child): wraps VddkManager/VmdkManager and communicates with the parent
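A minimal POSIX sketch of the pattern, assuming a request/response protocol over two pipes: the `ForkedWorker` object plays the role of VddkWorker (parent) plus VddkController, and the forked child plays VddkWorker (child). As we read the slide, the point is that VDDK state lives only in the child, so re-initializing means discarding the child and forking a new one rather than initializing VDDK twice in one process. `childWork` is a placeholder; no VDDK calls are made, and error handling is minimal.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Placeholder for work the child would do with its own VDDK connection.
std::uint64_t childWork(std::uint64_t request) { return request * 2 + 1; }

// Read/write exactly n bytes, retrying on short transfers and EINTR.
bool readAll(int fd, void* buf, std::size_t n) {
    char* p = static_cast<char*>(buf);
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return false;  // EOF or error
        p += r;
        n -= static_cast<std::size_t>(r);
    }
    return true;
}

bool writeAll(int fd, const void* buf, std::size_t n) {
    const char* p = static_cast<const char*>(buf);
    while (n > 0) {
        ssize_t r = write(fd, p, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return false;
        p += r;
        n -= static_cast<std::size_t>(r);
    }
    return true;
}

class ForkedWorker {
public:
    ForkedWorker() {
        int toChild[2], toParent[2];
        if (pipe(toChild) != 0 || pipe(toParent) != 0)
            throw std::runtime_error("pipe failed");
        pid_ = fork();
        if (pid_ < 0) throw std::runtime_error("fork failed");
        if (pid_ == 0) {  // child: serve requests until the parent closes the pipe
            close(toChild[1]);
            close(toParent[0]);
            std::uint64_t req;
            while (readAll(toChild[0], &req, sizeof req)) {
                std::uint64_t rep = childWork(req);
                if (!writeAll(toParent[1], &rep, sizeof rep)) break;
            }
            _exit(0);  // skip parent-owned destructors and stdio flushing
        }
        close(toChild[0]);
        close(toParent[1]);
        out_ = toChild[1];
        in_ = toParent[0];
    }
    ForkedWorker(const ForkedWorker&) = delete;
    ForkedWorker& operator=(const ForkedWorker&) = delete;
    ~ForkedWorker() {
        close(out_);  // child sees EOF and exits
        close(in_);
        int status = 0;
        waitpid(pid_, &status, 0);
    }
    std::uint64_t call(std::uint64_t req) {
        std::uint64_t rep = 0;
        if (!writeAll(out_, &req, sizeof req) || !readAll(in_, &rep, sizeof rep))
            throw std::runtime_error("worker process failed");
        return rep;
    }

private:
    pid_t pid_ = -1;
    int out_ = -1;
    int in_ = -1;
};
```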
Multi-threaded Archive Manager
• Improves performance with gzipped, multi-stream dump/restore/check/merge operations
• Archive Managers: interface of archive accesses specialized for each command
• Archive IO Managers: multi-threaded/single-threaded stream access for each archive file
• DataReader, DataWriter: worker thread and its controller for gzip compression/decompression
• Queue: thread-safe FIFO
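The Queue component is described only as a thread-safe FIFO. A minimal bounded version in C++17 is sketched below as an assumption, not vmdkbkp's actual class; in this pipeline a producer such as an archive reader would push blocks and a gzip worker thread would pop them. Compression itself is omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full; returns false if the queue was closed.
    bool push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_ || closed_; });
        if (closed_) return false;
        items_.push_back(std::move(item));
        notEmpty_.notify_one();
        return true;
    }

    // Blocks while the queue is empty; returns std::nullopt once it is
    // closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop_front();
        notFull_.notify_one();
        return item;
    }

    // Signals end of stream to all waiting producers and consumers.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        notEmpty_.notify_all();
        notFull_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable notEmpty_, notFull_;
    std::deque<T> items_;
    std::size_t capacity_;
    bool closed_ = false;
};
```

The bound provides back-pressure: a fast producer blocks instead of buffering a whole archive in memory.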
Restore/Check with MultiArchiveManager
[Figure: Archive Manager vs. Multi Archive Manager, each handling a full dump and two rdiffs; in the Multi Archive Manager, some streams are shown waiting.]
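One way to read the Multi Archive Manager is as a streaming merge: the full dump and every rdiff are sorted by block index, so all of them can be read sequentially in one pass, with streams whose next block index is larger waiting their turn. A C++17 sketch under that interpretation, with in-memory maps standing in for archive streams and hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Block = std::vector<std::uint8_t>;
using Dump = std::map<std::uint64_t, Block>;   // non-zero blocks, sorted by index
using Rdiff = std::map<std::uint64_t, Block>;  // previous contents, sorted by index

bool isZero(const Block& b) {
    return std::all_of(b.begin(), b.end(),
                       [](std::uint8_t x) { return x == 0; });
}

// Restore a past generation from the current dump and rdiffs (newest first),
// consuming each stream once, front to back.
Dump restoreStreaming(const Dump& dump, const std::vector<Rdiff>& rdiffsNewestFirst) {
    using Stream = std::pair<Dump::const_iterator, Dump::const_iterator>;
    std::vector<Stream> streams;
    streams.emplace_back(dump.begin(), dump.end());
    for (const Rdiff& r : rdiffsNewestFirst) streams.emplace_back(r.begin(), r.end());

    Dump out;
    for (;;) {
        // Smallest block index at the head of any stream.
        bool any = false;
        std::uint64_t next = 0;
        for (const auto& [cur, end] : streams)
            if (cur != end && (!any || cur->first < next)) { next = cur->first; any = true; }
        if (!any) break;  // all streams exhausted

        // Later (older) streams overwrite earlier ones; advance all at `next`.
        const Block* chosen = nullptr;
        for (auto& [cur, end] : streams)
            if (cur != end && cur->first == next) { chosen = &cur->second; ++cur; }
        if (!isZero(*chosen)) out.emplace(next, *chosen);
    }
    return out;
}
```

For a given block, the oldest rdiff that contains it is closest to the target generation, so it takes precedence over newer rdiffs and over the dump. A linear scan over the stream heads is fine for a few rdiffs; a heap would scale better.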
Restore with SAN
• Problems in restore with SAN
– Auto-allocation fails for thin vmdks
– Auto-allocation is too slow for thick vmdks
– There is no efficient allocation API
• If zero-block restore with NBD were faster, it could be used as the allocation method
– It is not fast, though
Future Work
• Improve parallelism
– Solve the SCSI reservation conflict problem
– Multi-threaded compression
• Restore with SAN
– Depends on an efficient block allocation API in VDDK