XtreemStore – Scalable Storage Management Software Without Limits. YOUR DATA. YOUR CONTROL
Software Product Portfolio
FileLock – software WORM solution for audit-proof long-term data archiving on disk systems
Scalable sync & share solution for secure data exchange for organizations and enterprises
XtreemStore – software for high-performance data archiving; parallel file system with integrated archive
ArchiveManager – Windows and Linux based HSM software for long-term data archiving up to petabyte scale
New Products
Established Products
Product Family
Archive Manager - the Basis for XtreemStore
[Diagram: an archive server with performance disk serves applications (PACS, video, PrePress, CAD/CAM, DMS, email/files, scientific data, others) over NFS/CIFS; data is archived to local and remote disk and tape archives, connected via SAN or TCP/IP.]
Client – Server Architecture
Keep it simple: "all in one box"
Several clients for high throughput
Windows/Linux clients for seamless integration into the CIFS or NFS world
[Diagram: Windows and Linux clients (Client 1 … Client n) access GAM servers (Windows/Linux) over CIFS/NFS; each server manages disk and tape storage.]
Advantages of different configurations
[Diagram: UNIX and Windows clients connect over the LAN to Archive Manager servers with several disk and tape pools; partitioning provides physical separation of data, and a remote copy is kept on additional tape media.]
Key Features
Online access to "unlimited content"
Multiple media strategy
− Secure and cost-effective archiving on tape and/or removable disk
− Fast access on disk
Partitioning for different applications ("data separation")
Remote copy functionality
Several levels of backup/recovery options
Free selection of archive hardware (servers, disks, tape libraries)
Direct access through the file system, additional API optional (see the sketch after this list)
Native support for CIFS (Active Directory) & NFS
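Because the archive is exposed as a normal file system, applications need no special API to store or retrieve data. A minimal sketch in Python, assuming a hypothetical archive mount point /mnt/archive; the HSM decides transparently whether content lives on disk or tape:

```python
from pathlib import Path

# Hypothetical mount point of the GAM/XtreemStore archive file system.
ARCHIVE = Path("/mnt/archive/projects/demo")

# Writing works like on any POSIX file system; the HSM later decides
# whether the content stays on performance disk or migrates to tape.
ARCHIVE.mkdir(parents=True, exist_ok=True)
(ARCHIVE / "result.dat").write_bytes(b"measurement data ...")

# Reading is equally transparent; a file already migrated to tape is
# recalled by the HSM before the read returns.
data = (ARCHIVE / "result.dat").read_bytes()
print(len(data), "bytes read back from the archive")
```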
XtreemStore HPC – POSIX Archive
Need for a high-performance POSIX archive
The POSIX interface is common to virtually all applications
But standard file systems have limitations
ArchiveManager inherits those limitations when working with standard file systems
Major limits:
− the number of files within one file system
− the throughput of a single file system
− throughput for huge files (10 TB and more)
To digest large data volumes from parallel sources, GAM needs an embedded, adapted parallel file system (P/HSM-FS): "everything needs to be parallel"
GAM + P/HSM-FS = XtreemStore
P/HSM-FS is an adaptation of BeeGFS
New product: XtreemStore – ArchiveManager (GAM) with a built-in parallel HSM file system
XtreemStore is storage management software (HSM, archive, backup) for high-performance applications
Parallel access through a meta file system with a POSIX interface to an unlimited amount of storage on disk and tape
Grid structure built on standard PC hardware, exposed over NFS/CIFS and driving an unlimited amount of standard hardware
Software Architecture XtreemStore
[Diagram: parallel HSM file system. A BeeGFS metadata server and BeeGFS clients sit in front of n storage servers; each storage server runs a BeeGFS storage service together with a GAM client and GAM server, primary disk (RAID 5/6), and secondary disk and tape.]
BeeGFS is adapted to work with GAM
Each BeeGFS storage node becomes an HSM node
This works well as an extremely fast archive or as a target for HPC HSM
Some high-speed BeeGFS operations are slowed down
For HPC improvements the architecture is being reworked (see a later chart)
Overview XS-GAM
[Diagram: a Linux BeeGFS client node combining primary disk (RAID 5/6) with a GAM client and GAM server that manage secondary disk and tape.]
New architecture: Pools of nodes
Nodes may have different characteristics
The life cycle of data spans several types of storage
Data eventually end up on tape
All data are managed in one namespace
Near future: seamless integration of HPC and HSM
[Diagram: a BeeGFS metadata server and Windows BeeGFS clients in front of pools of storage nodes with different characteristics: high-speed nodes (e.g. SSD), medium nodes (e.g. SAS), and long-term nodes that run a GAM client and server with primary disk (RAID 5/6) plus secondary disk and tape.]
Storage Management for Lustre: Overview
[Diagram: Lustre clients and servers connect through the Lustre HSM API to the XtreemStore Parallel Data Mover, driven by the Robinhood policy engine. Backend technologies include NAS devices (Isilon etc.), p/XS/FS, GRAU ArchiveManager, object stores (S3 etc., future), and XtreemStore HSM with a storage adapter for disk and tape from any vendor.]
Use Case 1: Standard Throughput
[Diagram: many Lustre clients and OSTs; a single Parallel Data Mover (PDM) moves data through the HSM API over a single path to a NAS device, e.g. ArchiveManager.]
Use Case 2: High Throughput
[Diagram: many Lustre clients and OSTs; several PDMs, each on its own Lustre client, move data through the HSM API in parallel to a NAS device with parallel ingest, e.g. XtreemStore HSM or Isilon.]
XtreemStore Parallel Data Mover (XS-PDM) for HPC
XS - Parallel Data Mover (currently POSIX only)
The XS Parallel Data Mover is pure software
It may run on
− the source machine
− the target machine
− or on dedicated compute nodes
The number of streams running in parallel is not limited
Possible Environment
In theory the PDM works in any environment; throughput depends only on the hardware infrastructure.
A grid-structured environment matches these requirements best.
To scale to full speed it needs source and target nodes that can deliver and ingest the required amount of data, and a network without bottlenecks.
The number of streams running in parallel is not limited.
[Diagram: several Data Mover instances, each copying between a source node and a target node.]
How the PDM works
Large files are copied chunk-wise (a minimal sketch follows the diagram below)
− Each PDM node copies one set of chunks
− The number of threads and the chunk sizes are freely configurable
− The throughput capacity of target and source nodes should be matched
Small files are copied single-threaded; the threshold is configurable
[Diagram: a large source file split into chunks 1–9 and A–G; several Data Mover instances each copy their set of chunks to the corresponding positions in the target file.]
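A minimal sketch of this chunk-wise scheme, not the actual PDM implementation: the chunk size, thread count, and small-file threshold below are illustrative stand-ins for the configurable parameters named above.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024 * 1024            # freely configurable chunk size
SMALL_FILE_THRESHOLD = 8 * 1024 * 1024   # below this, copy in one stream
NUM_THREADS = 4                          # one worker per "data mover" in this sketch

def copy_chunk(src, dst, offset, length):
    """Copy one chunk at the given offset from the source to the target file."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))

def parallel_copy(src, dst):
    size = os.path.getsize(src)
    if size <= SMALL_FILE_THRESHOLD:
        # Small files: a single stream is cheaper than coordinating chunks.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            fout.write(fin.read())
        return
    # Pre-allocate the target, then let the workers fill their chunk sets.
    with open(dst, "wb") as fout:
        fout.truncate(size)
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        futures = []
        for offset in range(0, size, CHUNK_SIZE):
            length = min(CHUNK_SIZE, size - offset)
            futures.append(pool.submit(copy_chunk, src, dst, offset, length))
        for f in futures:
            f.result()   # surface any copy error

# Example (illustrative paths):
# parallel_copy("/scratch/big_result.dat", "/mnt/archive/big_result.dat")
```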
Customer Installations
Customer installation samples
Lustre installation at the University of Rijeka
BeeGFS with continuous backup functionality at an automotive supplier
Lustre-to-Lustre installation at an Intel lab
Lustre + XtreemStore at University of Rijeka
Standard architecture
About 20 Lustre client nodes
4 Lustre storage nodes
4 HSM nodes for throughput
− 2 physical machines
− 2 HSM nodes each
1 tape library with 400 tapes and 4 LTO-6 drives
In production since November 2015
Scratch File System gets Archive + Continuous Backup
[Diagram: Lustre clients and Lustre storage nodes form the scratch file system on primary storage; a Parallel Data Mover master with several Data Mover instances feeds XtreemStore (GAM client and server) as the HSM for archive and continuous backup.]
Rijeka Functionality
Lustre clients run HPC applications
HSM functionality (see the sketch below)
− "Old" data are moved to XtreemStore
− Storage on Lustre is released
− Files remain transparently accessible through Lustre
− Limited read access at the archive level
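The behaviour described above corresponds to the standard Lustre HSM user commands. A rough sketch of the calls involved, assuming the XtreemStore data mover is registered as the Lustre copytool; the path is illustrative only:

```python
import subprocess

def lfs(*args):
    """Run an lfs(1) subcommand on a Lustre client and return its output."""
    return subprocess.run(["lfs", *args], check=True,
                          capture_output=True, text=True).stdout

path = "/lustre/scratch/project/old_result.dat"   # illustrative path

lfs("hsm_archive", path)        # copy the file into the archive via the copytool
lfs("hsm_release", path)        # free the Lustre blocks; the file stays visible
print(lfs("hsm_state", path))   # inspect HSM state (archived, released, ...)

# Any later read triggers a transparent recall; an explicit restore would be:
lfs("hsm_restore", path)
```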
BeeGFS + XtreemStore at Automotive Supplier
Scale-out NAS device
3 BeeGFS storage nodes
2 XtreemStore HSM nodes
− 2 physical machines
− 2 virtual machines each
2 tape libraries with 50 tapes and 2 drives
In production since February 2016
Installation – Automotive Supplier: Continuous Backup
[Diagram: Windows CIFS clients (× n) access the system through Samba gateways running as Linux BeeGFS clients; three Linux storage servers run BeeGFS, each with e.g. 15 TB of RAID 5/6 disk, connected over an internal 10 Gb LAN; GAM manages archive disk plus two tape libraries, scaling from 250 TB to several PB.]
All data are synchronized from BeeGFS into the GAM HSM.
Currently all data are kept in both file systems.
GAM acts as a backup for the disk archive.
Inactive data may be removed from BeeGFS and then remain available on GAM only.
Capacity can be extended on demand at every level.
[Diagram continued: a BeeGFS metadata server and a BeeGFS client running the sync tool; additional Linux BeeGFS clients are optional; clients connect over an external 1 Gb LAN.]
Functionality
BeeGFS offers a scale-out file system
− NFS & CIFS (through Samba)
HSM functionality
− "Old" data are moved to the HSM nodes
− Storage on the primary nodes is released
Backup functionality (a sketch of such a policy follows this list)
− "Important" data are copied to the HSM nodes
− Files are kept on the primary storage nodes
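A minimal sketch of what such a combined backup-and-release policy could look like; the mount points, the 90-day age threshold, and the mirroring logic are assumptions for illustration, not the vendor's actual sync tool:

```python
import os
import shutil
import time

PRIMARY = "/mnt/beegfs"          # scale-out primary file system (assumed mount)
HSM = "/mnt/gam-hsm"             # GAM/XtreemStore HSM file system (assumed mount)
RELEASE_AFTER = 90 * 24 * 3600   # drop primary copies not read for 90 days

def sync_and_release():
    now = time.time()
    for root, _dirs, files in os.walk(PRIMARY):
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(HSM, os.path.relpath(src, PRIMARY))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            st = os.stat(src)
            # Continuous backup: copy anything newer than the HSM copy.
            if not os.path.exists(dst) or os.path.getmtime(dst) < st.st_mtime:
                shutil.copy2(src, dst)
            # HSM part: remove inactive data from the primary file system;
            # it then remains available on the GAM side only.
            if now - st.st_atime > RELEASE_AFTER:
                os.remove(src)

if __name__ == "__main__":
    sync_and_release()
```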
Scratch File System gets Archive + Continuous Backup
[Diagram: the primary Lustre scratch file system with its clients and storage nodes; the XtreemStore Parallel Data Mover (master plus several Data Mover instances) copies data to a secondary Lustre with its own clients and storage nodes as secondary storage.]
Lustre + PDM + Lustre at Intel Lab
Primary Lustre clients run HPC applications
The secondary Lustre acts as a scale-out NAS device
Backup functionality
− "Important" data are copied to the secondary Lustre
− Files are kept on the primary storage nodes
− In regular operation, files are read-only on the secondary Lustre
− In disaster situations, operation may switch to the secondary Lustre
− The primary Lustre may be rebuilt from the secondary Lustre
YOUR DATA. YOUR CONTROL
www.graudata.com