
Page 1

Storage on the Lunatic Fringe

Thomas M. Ruwart

University of Minnesota

Digital Technology Center

Intelligent Storage Consortium

[email protected]

Page 2

Orientation

• Who are the lunatics?

• What are their requirements?

• Why is this interesting to the Storage Industry?

• What is SNIA doing about this?

• Conclusions

Page 3

Who are the Lunatics?

• DoE Accelerated Strategic Computing Initiative (ASCI)
  - BIG data, locally and widely distributed, high bandwidth access, relatively few users, secure, short-term retention
• High Energy Physics (HEP) – Fermilab, CERN, DESY
  - BIG data, locally distributed, widely available, moderate number of users, sparse access, long-term retention
• NASA – Earth Observing System Data Information Systems (EOSDIS)
  - Moderately sized data, locally distributed, widely available, large number of users, very long-term retention
• DoD – NSA
  - Lots of little data – trillions of files, locally distributed, relatively few users, secure, long-term retention
• DoD – Army High Performance Computing Centers and Naval Research Center
  - BIG data, locally and widely distributed, relatively few users, high bandwidth access, secure, very long-term reliable retention

Page 4

A bit of History

• 1990 – Supercomputer Centers operating with HUGE disk farms of 50-100 GB!
• 1990 – Laptop computers have 50MB internal disk drives!
• 1992 – Fast/wide SCSI runs at breakneck speeds of 20 MB/sec!
• 1994 – Built a 1+TB array of disks with a single SGI xFS file system and wrote a single 1TB file
  - Used 4GB disks in 7+1 RAID 5 disk arrays
  - 36 disk arrays mounted in 5 racks
• 1997 – ASCI Mountain Blue – 75TB – distributed
• 2002 – ASCI Q – 700TB – online, high performance, pushing the limits of traditional [legacy] block-based file systems

Page 5

The not-too-distant Future

• 2004 – ASCI Red Storm – 240TB – online, high bandwidth, massively parallel
• 2005 – ASCI Purple – 3000TB – online, high performance, OSD/Lustre
• 2006 – NASA RDS – 6000TB – online, global access, CAS, OSD, Data Grids, Lustre?
• 2007 – DoE Fermilab / CERN – 3 PB/year online/nearline, global sparse access
• 2010 – Your laptop will have a 1TB internal disk that will still be barely adequate for MS Office™

Page 6

DoE ASCI

• 1998 – Mountain Blue – Los Alamos
  - 48 128-processor SGI Origin 2000 systems
  - 75TB disk storage
• 2002 – Q
  - 310 32-processor machines + 64 32-processor I/O nodes
  - 2048 2Gb FC connections to the 64 I/O nodes
  - 2048 2Gb FC connections to the disk storage subsystem
  - 692 TB disk storage, 20GB/sec bandwidth
  - 2 file systems of 346TB each
  - 4 file system layers between the application and the disk media
• 2004 – Red Storm
  - 10,000 processors, 10TB main memory
  - 240TB disk, 50 GB/sec bandwidth
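For a feel of how the Q configuration's bandwidth spreads across its I/O hardware, here is a minimal back-of-envelope sketch in Python. The figures come from the bullets above; the even division of bandwidth across nodes and links is an assumption for illustration, not a measured profile.

```python
# Rough per-node arithmetic for the ASCI Q I/O configuration described above.
# Assumes an even spread of the delivered bandwidth; illustrative only.

GB = 10**9

io_nodes = 64
fc_links = 2048                 # 2Gb Fibre Channel connections to the I/O nodes
delivered_bw = 20 * GB          # 20 GB/sec aggregate file system bandwidth

links_per_node = fc_links / io_nodes
bw_per_node = delivered_bw / io_nodes
bw_per_link = delivered_bw / fc_links

print(f"{links_per_node:.0f} FC links per I/O node")                 # 32
print(f"~{bw_per_node / 10**6:.0f} MB/sec delivered per I/O node")   # ~312 MB/sec
print(f"~{bw_per_link / 10**6:.0f} MB/sec average per FC link")      # ~10 MB/sec
```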

Page 7

DoE ASCI Purple Requirements

• Parallel I/O Bandwidth – Multiple (up to 60,000) clients access one file at hundreds of GB/sec.
• Support for very large (multi-petabyte) file systems
• Single files of multi-terabyte size must be permitted.
• Scalable file creation & metadata operations
  - Tens of millions of files in one directory
  - Thousands of file creates per second within the same directory
• Archive-Driven Performance – The file system should support high bandwidth data movement to tertiary storage.
• Adaptive Pre-fetching – Sophisticated pre-fetch and write-behind schemes are encouraged, but a method to disable them must accompany them.
• Flow Control & Quality of I/O Service
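To put these targets in perspective, a minimal back-of-envelope sketch in Python follows. The aggregate bandwidth and create rate below are assumed mid-range values, since the requirements only state "hundreds of GB/sec" and "thousands of creates per second".

```python
# Back-of-envelope numbers for the ASCI Purple requirements above.
# The specific bandwidth and create-rate values are assumptions chosen
# inside the ranges quoted on the slide.

GB = 10**9

clients = 60_000                # "up to 60,000" clients accessing one file
aggregate_bw = 300 * GB         # "hundreds of GB/sec"; assume ~300 GB/sec

# Average share per client if every client reads the same file at once
per_client = aggregate_bw / clients
print(f"~{per_client / 10**6:.0f} MB/sec per client on average")     # ~5 MB/sec

creates_per_sec = 5_000         # "thousands of creates per second"; assume 5,000
seconds_to_ten_million = 10_000_000 / creates_per_sec
print(f"10M files land in one directory in ~{seconds_to_ten_million / 60:.0f} minutes")  # ~33 min
```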

Page 8

HEP – Fermilab and CMS

• The Compact Muon Solenoid (CMS)
  - $750M experiment being built at CERN in Switzerland
  - Will be active in 2007
  - Data rate from the detectors is ~1 PB/sec
  - Data rate after filtering is ~hundreds of MB/sec
• The Data Problem
  - Dataset for a single experiment is ~1PB
  - Several experiments per year are run
  - Must be made available to 5000 scientists all over the planet (Earth, primarily)
  - Dense dataset, sparse data access by any one user
  - Access patterns are not deterministic
• HEP experiments cost ~US$1B, last 20 years, involve thousands of collaborators at hundreds of institutions worldwide, and collect and analyze several petabytes of data per year
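As a rough sanity check on those rates, the sketch below (plain Python; the filtered rate and duty cycle are assumptions, since the slide only says "hundreds of MB/sec") works out the filtering reduction and the annual archive volume.

```python
# Back-of-envelope check of the CMS data rates quoted above.
# The filtered rate and duty cycle are assumptions, not CMS specifications.

PB = 10**15
MB = 10**6
SECONDS_PER_YEAR = 365 * 24 * 3600

detector_rate = 1 * PB          # ~1 PB/sec off the detectors
filtered_rate = 300 * MB        # "hundreds of MB/sec"; assume ~300 MB/sec

reduction = detector_rate / filtered_rate
print(f"filtering discards all but 1 part in ~{reduction:.1e}")       # ~3.3e+06

duty_cycle = 0.33               # assume the experiment takes data ~1/3 of the year
annual_volume = filtered_rate * SECONDS_PER_YEAR * duty_cycle
print(f"archived per year: ~{annual_volume / PB:.1f} PB")             # ~3.1 PB
```

The ~3 PB/year result is consistent with the Fermilab/CERN projection on the earlier slide, which is why the duty-cycle assumption is stated explicitly.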

Page 9

[Diagram: LHC Data Grid Hierarchy (CMS as example; Atlas is similar). Tiers shown: Tier 0+1 at CERN (online system, event reconstruction, ~PByte/sec, ~100 MBytes/sec), Tier 1 regional centers (FermiLab USA, French, German, and Italian regional centers, ~2.5 Gbits/sec links), Tier 2 centers (~0.6-2.5 Gbps), Tier 3 institutes (~0.25 TIPS), and Tier 4 workstations (100-1000 Mbits/sec), with a physics data cache supporting analysis and event simulation.]

CERN/CMS data goes to 6-8 Tier 1 regional centers, and from each of these to 6-10 Tier 2 centers.

Physicists work on analysis "channels" at 135 institutes. Each institute has ~10 physicists working on one or more channels.

2000 physicists in 31 countries are involved in this 20-year experiment, in which DOE is a major player.

CMS detector: 15m x 15m x 22m, 12,500 tons, $700M (human figure shown for 2m scale).

Courtesy of Harvey Newman, CalTech, and CERN.

Page 10

NASA EOSDIS

• Remote Data Store Project:
  - Build a 6PB data archive with a life expectancy of at least 20 years, probably more
  - Make data and data products available to 2 million users
• What to use?
  - Online versus nearline
  - SCSI vs ATA
  - Tape vs optical
  - How much of each, and when?
• Data Grids?
• Dealing with technology life cycles – continual migration
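One reason "continual migration" is on the list: even a single full copy of the archive is a long-running job. A minimal sketch in Python, assuming a few hypothetical aggregate migration bandwidths (these are not EOSDIS figures):

```python
# How long does one full migration pass over a 6PB archive take?
# The aggregate bandwidths below are hypothetical, for illustration only.

PB = 10**15
GB = 10**9

archive = 6 * PB
for aggregate_bw_gb in (1, 5, 20):          # sustained GB/sec during migration
    seconds = archive / (aggregate_bw_gb * GB)
    days = seconds / 86_400
    print(f"{aggregate_bw_gb:>3} GB/sec -> ~{days:.0f} days per full pass")
# 1 GB/sec -> ~69 days, 5 GB/sec -> ~14 days, 20 GB/sec -> ~3 days
```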

Page 11

DoD NSA

• How to deal with a trillion files?
  - At 256 bytes of metadata per file, one trillion files need 256TB just for the file system metadata
  - File system resiliency
  - Backups? Forget it.
• File creation rate is a challenge – 32,000 file creates per second, sustained for a year, will generate 1 trillion files
• How to search for any given file
• How to search for any given piece of information inside all the files
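Both figures on this slide fall out of simple arithmetic; a minimal check in Python, using only the numbers quoted above:

```python
# Verify the two trillion-file figures quoted above.

files = 10**12                      # one trillion files
metadata_per_file = 256             # bytes of file system metadata per file

metadata_total = files * metadata_per_file
print(f"metadata alone: {metadata_total / 10**12:.0f} TB")                      # 256 TB

creates_per_sec = 32_000
seconds_per_year = 365 * 24 * 3600
files_per_year = creates_per_sec * seconds_per_year
print(f"{creates_per_sec} creates/sec for a year: {files_per_year:.2e} files")  # ~1.01e+12
```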

Page 12

DoD MSRC

• 500TB per year data growth

• Longevity of data retention is critical
  - 100% reliable access to any piece of data for 20+ years
• Security is critical
• Reasonably quick access to any piece of data from anywhere at any time
• Heterogeneous computing and storage environment
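The growth and retention bullets compound: a minimal sketch in Python, assuming the 500TB/year rate simply holds flat (in practice growth rates themselves tend to grow):

```python
# Cumulative archive implied by flat 500TB/year growth over the retention window.
# Flat growth is an assumption for illustration; real growth is usually faster.

TB = 10**12
growth_per_year = 500 * TB
years = 20

total = growth_per_year * years
print(f"after {years} years: {total / 10**15:.0f} PB, all of it still 100% accessible")  # 10 PB
```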

Page 13

History has shown…

• The problems that the Lunatic Fringe is working on today are the problems that the mainstream storage industry will face in 5-10 years
• Legacy block-based file systems break at these scales
• Legacy network file system protocols cannot scale to meet these extreme requirements

Page 14

Looking Forward

Page 15

What happens when….

• NEC Announces a 10Tbit Memory Chip

• Disk drives reach 1TByte and beyond

• MEMS devices become commercially viable

• Holographic Storage Devices become commercially viable

• Interface speeds reach 1Tbit/sec

• Intel develops the sub-space channel

• Vendors need better ways to exploit the capabilities of these technologies rather than react to them

Page 16

Common thread

• Their data storage capacity, access, and retention requirements are continually increasing
• Some of the technologies and concepts the Lunatic Fringe is looking at include:
  - Object-based Storage Devices
  - Intelligent Storage
  - Data Grids
  - Borg Assimilation Technologies, etc.

Page 17

How does SNIA make a difference?

• Act as a point to achieve critical mass behind emerging technologies such as OSD, SMI, and Intelligent Storage
• Make sure that these emerging technologies come to market from the beginning as standards (not as proprietary implementations that migrate to standards)
• Help to get beyond the potential barrier for emerging technologies such as OSD and Intelligent Storage
• Help to generate vendor and user awareness and education regarding future trends and emerging technologies

Page 18

Conclusions

• Lunatic Fringe users will continue to push the limits of existing hardware and software technologies
• The Lunatic Fringe is a moving target – there will always be a Lunatic Fringe well beyond where you are
• The storage industry at large should pay more attention to:
  - What they are doing
  - Why they are doing it
  - What they learn

Page 19

References

• University of Minnesota Digital Technology Center – www.dtc.umn.edu
• ASCI – www.llnl.gov/ASCI/platforms

• Fermilab – www.fnal.gov

• NASA EOSDIS – www.nasa.gov

• NSA – www.dod.mil

Page 20

Contact Info

[email protected]