View
2
Download
0
Category
Preview:
Citation preview
Data Management Systems
• Storage Management
• Memory hierarchy
• Segments and file storage
• Database buffer cache
• Storage techniques in context
• Basic principles
• Blocks instead of pages
• Tablespaces, segments, extents
• Updates, free space
Gustavo Alonso
Institute of Computing Platforms
Department of Computer Science
ETH Zürich Storage - Segments and File storage 1
Storage in databases
• Remember the two key guarantees provided by a database in regard to storage• Data is persistent
• Data is recoverable
• Even when failures occur!!
• Add as well the following property:• Physical data independence
• Taken together these three aspects play a big role on how databases store data to maintain these properties while still achieving the necessary performance.
Storage - Segments and File storage 2
Storage Management
Physical storage
Blocks, files, segments
Pages in memory
Physical records
Logical records (tuples)
Logical data (tables, schemas)
Relations, views
Queries, Transactions (SQL)
Record Interface
Record Access
Page access
File Access
Application
Logical view (logical data)
Access Paths
Physical data in memory
Page structure
Storage allocation
3Storage - Segments and File storage
Disclaimers
• As usual: • Standard database architectures based on slow, hard drive disks (HDD) with a
high latency for seek operations (random access)
• Assuming most of the database is not in memory
• Today, some things are different:• More main memory available
• Different storage media (SSD, NVM, network attached storage)
• Principles remain the same and illustrate the underlying problem rather than a particular implementation
Storage - Segments and File storage 4
Explanations using a real system
• Many of the explanations that follow are based on Oracle’s database• Cover all the basics• Provide a good example of how a real system works• Demonstrate the many tuning knobs available in a database• Prove why some people consider databases too complicated (more about this later in the course)
• Ideas and concepts are nevertheless generic and they are somewhat similar across systems
• Logical storage:• https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage-
structures.html#GUID-4AF2D61A-8675-4D48-97A4-B20F401ADA16• https://docs.oracle.com/cd/B19306_01/server.102/b14220/logical.htm
• Disk storage:• https://docs.oracle.com/cd/B19306_01/server.102/b14220/physical.htm• https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/physical-storage-
structures.html#GUID-008A1F08-9C75-4E9F-A70B-41FB942C60B4
Storage - Segments and File storage 5
Problem statement
• A database is doing many things at the same time (in the same way an Operating System is managing many different user processes at the same time)
• Each “thing” (a query, a system process, a database component) active at any point in time needs its own logical view of the data (and correspondingly, of memory, and disk). This is the same as the OS giving a process the impression it is alone in the machine.
• A database engine creates such virtual, logical views of the system using its own mechanisms (and different from the OS)
Storage - Segments and File storage 6
Logical and Physical storage in Oracle 19
• Entity-relationship diagram (crow’s foot implies one-to-many)
• Logical:• Tablespaces
• Segments
• Extent
• Block
• Physical:• Data File
• OS block
Storage - Segments and File storage 7
https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage-structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D
Tablespaces
• A tablespace is a logical data unit in the database:• Schema related:
• A table• An index• Several tables (clustered tables)
• Engine related:• Data structures for the database engine (result buffers, undo buffers, etc.)
• A tablespace provides a logical representation of the principle of spatial locality (keep together what belongs together)• Does not necessarily mean all data is continuous• It means all the information and all the data of a tablespace is under the same
umbrella
• Space (memory/disk) is allocated to tablespaces
Storage - Segments and File storage 8
Segments, Extents, and Blocks (example)
Storage - Segments and File storage 9
https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage-structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D
Generalizing
• A given logical object in a database (a table, an index) is “stored” in a tablespace
• A tablespace is organized into segments (each schema object has a segment)
• Segments have space allocated to them in the form of extents. A segment can have several extents
• Extents are sets of contiguously allocated data blocks. Extents are mapped to one data file (and typically to a file)
• Data blocks are the smallest allocation unit of space (not necessarily a page, databases typically use blocks larger than an OS page). The size of the blocks is a tunable parameter.
Storage - Segments and File storage 10
Translating
• Tablespace = keep together what needs to be kept together
• Segment = an object in the schema (or part of an object if it is partitioned)
• Extent = groups of continuously allocated pages
• Blocks = glorified pages
• Why so complicated?:• Tablespaces = logical locality• Segments = allocate (virtual) space to objects• Extents = allocate contiguous physical space• Blocks = the unit of space allocation
Storage - Segments and File storage 11
More explanations and analogies
• Tablespaces provide a logical unit to refer to the storage of well defined entities (a table, an index, a buffer, etc.)
• Virtual space is allocated to a Tablespace in a segment. A segment acts as a form of virtual memory where everything belonging to the same entity appears continuous and can be treated as a single unit
• Actual space is allocated to a segment through extents. Extents are sets of blocks that are physically contiguous on storage and can be allocated as a whole
• The space in a segment is divided into blocks which typically correspond to several OS pages.
Storage - Segments and File storage 12
Segments I
• A segment allocates the equivalent of virtual memory to a tablespace
• Common setup:• Tablespace -> table
• Tablespace has one segment
• Segment has one or more extents
• These structures, like all examples that follow, are created to provide physical data independence. Higher layers only need to know in which tablespace a table is. This allows the engine to change everything below by simply changing the pointers to the segment, to the extents, etc.
Storage - Segments and File storage 13
Segments II
• The mapping of a segment to a tablespace simplifies the manipulation of database entities:• CREATE TABLE T• DROP TABLE T• CREATE INDEX I ON table_name (attributes)• DROP INDEX I
• If we need more space for a table or an index, the table is still referred to by its segment, but the segment indexes as many extents as needed.
• If we partition a table, each partition has its own segment but the table is still referred to by its tablespace
Storage - Segments and File storage 14
Extents
Storage - Segments and File storage 15
• Extents provide optimal access by ensuring logically continuous allocation of blocks (blocks can end up non continuous on disk because of RAID, for instance)
• Acts as a form of virtual address space:• Allocate and release blocks to the
extent• Easy sequential search• Can be dropped as an unit https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage-
structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D
The big square is a data file!
Space available in the data file
Dynamic extents
• When a segment is created, it is allocated an extent (how big? => tunable parameter)
• If more space is needed, another extent is created
• The new extent is not contiguous with the previous one and can be on a different data file
Storage - Segments and File storage 16
The big square is a data file!
Why extents?
• Better than the alternatives:
Storage - Segments and File storage 17
Static file mapping
Segment (starting address, size)
Easy to maintainHighly efficient (performance)Poor utilization (space)No flexibility
Dynamic extents Dynamic block mapping
Difficult to maintainNon contiguousPoor performanceMaximum flexibility
Why extents?
• Databases must optimize along many dimensions.
• A static file mapping is very easy to manage but induces fragmentation and provides no flexibility
• Dynamic block mapping is extremely flexible but data is not contiguous and is expensive to maintain
• Extents are a compromise:• An extent provides a static mapping to a set of blocks (like a static file
mapping)
• When more space is needed, extents are dynamically allocated (like with dynamic block mapping)
Storage - Segments and File storage 18
Managing space in extents
• Extents are collections of blocks. • How large should a extent be?• What do we do when we need more extents?
• Modern databases offer many tunable parameters to control extents• Uniform allocation: all extents are of the same size (tunable) and when more
space is needed, another extent is added• Automatic: when more space is needed, a new extent is added and its size
optimized according to come criterion.
• Typical database design trade-off: we want a very large extent to have the data contiguous but that induces fragmentation and it is not efficient. Instead, allocate groups of contiguous blocks as needed.
Storage - Segments and File storage 19
Allocating and sizing new extents
• A good way to allocate new extents is to increase their size exponentially (~1.25)
• Better to allocate a bit more than to be constantly allocating new extents (if a table grows, it is likely to keep growing)
• Bounds the size of the extent directory
Storage - Segments and File storage 20
Dynamic extents
Blocks vs OS pages
• Operating systems deal with pages to implement virtual memory
• Databases typically store a lot of data:• A table, an index might span many OS pages• Allocating space page by page at OS granularity is too much overhead• If one OS page belongs to a table, it is likely we will also be interested in the
other pages allocated to that table (locality)
• Hence, databases use blocks rather than OS pages
• This is why, although most databases today use the OS I/O system for writing to disk, they manage their own files and memory representation
• Later on we will see the structure of database pages in memory
Storage - Segments and File storage 21
Block structure: slotted pages
• A block is structured as follows:
• Header: address and type of segment (index, table, etc.)
• Table directory: schema of the table stored in the block
• Row directory: pointers to the actual tuples stored in the block
• Free space
• Row data (tuples) stored in the block
• The directory grows downwards, the space for tuples is used upwards
Storage - Segments and File storage 22
Slotted pages
• The row directory allows to insert tuples anywhere in the block (and change their position) while maintain a simple addressing schema:
(block, slot)
• How the space within a block is managed depends on many optimizations
• We will look into the block structure in more detail later on
Storage - Segments and File storage 23
Optimizing the use of blocks I
• Percentage Free• Determines how much space in
each block is reserved for updating tuples instead of using it for storing new tuples
• This is needed because an update can result in a bigger tuple than the original one
UPDATE TSET Adress = “AStreetWithAVeryLongName”WHERE LegiNr = 12345678
Storage - Segments and File storage 24
Optimizing the use of Blocks II
• Percentage used• Determines how much space needs to
be free in a block before the free space can be used to insert new tuples
• Blocks are unavailable to inserting new tuples until they have the given amount of free space
• It is needed because if updates can make tuples smaller (freeing up space), they can also make them bigger (needing space). The combination of both parameters avoids thrashing on the page
Storage - Segments and File storage 25
Storage - Segments and File storage 26
Fragmentation within blocks
• Like any unit of storage, blocks can suffer form fragmentation. However, compaction is a very expensive procedure
• Compaction often done only when the block has enough space for an INSERT or UPDATE but the space is not contiguous. Otherwise, the available space is used without reorganizing the tuples in the block.
• UPDATE might require to move a tuple from one block to another if the new size does not fit in the block where the tuple is. In such a case, the tuple is inserted into a new block and the original space is used to store a pointer to the new row. The ID of the tuple does not change (which means an indirection is needed to find the tuple).
Storage - Segments and File storage 27
Where to find space?
• A segment contains one or more free lists• Not done at the extent level because the search for space would be more
complex
• A free list contains pointers to blocks that have usable free space
• Using several free lists helps avoiding contention when performing parallel inserts or updates
• The free list is updated as transactions execute INSERT, DELETE, or UPDATE statements using the rules established with PCTFREE and PCTUSED
Storage - Segments and File storage 28
Part 2Part 2
Tables have no order
• Recall that tables have no order on the tuples?• A property of the relational model
• A consequence of how data is organized
• Example with extents
Storage - Segments and File storage 29
Table T Extent
Part 1 Part 1
Part 3
Part 3
Part 4
Part 4
SELECT * FROM TORDER BY …
The art of writing to disk
• Databases provide concurrency control and recovery.• Concurrency control: database is correct even if several transactions modified
the data at the same tie
• Recovery: the state of the database can be recovered in the event of failures
• This makes relational database engines unique and imposes a lot of design and performance constraints
• It is, nevertheless, one of the major advantages of database engines
• We will study concurrency control and recovery in more detail later on, here we just point out some of the implications in terms of dealing with storage.
Storage - Segments and File storage 30
Shadow paging
Storage - Segments and File storage 31
SEGMENT
Data
Free List
EXTENT
1 2 3 4 5 6 7
1 2 3 4 5 6 7
8 9
10 11 12 13 14 15 16 17 18
8 9 10 11 12 13 14 15 16 17 18
UPDATE TSET attribute_1 = 1000WHERE ID = 17
Modifies a tuple in page 5
Shadow paging
Storage - Segments and File storage 32
SEGMENT
Data
Free List
EXTENT
1 2 3 4 5 6 7
1 2 3 4 5 6 7
8 9
10 11 12 13 14 15 16 17 18
8 9 10 11 12 13 14 15 16 17 18
UPDATE TSET attribute_1 = 1000WHERE ID = 17
Modifies a tuple in page 5
Creates a copy of page 5 and puts it into page 8
5’
Does the update on the new page
Shadow paging
Storage - Segments and File storage 33
SEGMENT
Data
Free List
EXTENT
1 2 3 4 5 6 7
1 2 3 4 8 6 7
8 9
10 11 12 13 14 15 16 17 18
8 9 10 11 12 13 14 15 16 17 18
UPDATE TSET attribute_1 = 1000WHERE ID = 17
Modifies a tuple in page 5
Creates a copy of page 5 and puts it into page 8
Does the update on the new page
When the transaction commits, makes page 8 the correct page and marks the old one as free
5
Shadow Paging, pros and cons
• Shadow paging is good for:• Recovery: it is enough to make sure that only clean pages (committed) are
written to disk and then the disk contains only clean data
• Recovery: undo of an uncommitted transaction is easy, throw the page away
• Concurrency Control: A transaction can use the copies it makes without interfering with queries which are reading the original copies (snapshot isolation)
• Shadow paging is bad because:• Page access and management is complicated (clean vs dirty pages, copy a
page, refresh lists on the segment when transaction commits, etc.)
• Every update ends up moving a page somewhere else, destroying locality
Storage - Segments and File storage 34
When to use shadow paging
• Shadow paging makes sense when the overhead of creating a copy of the page and managing the copies is small enough• Main memory (e.g., use the Copy on Write feature of the OS)
• Possibly with Flash and NVM
Storage - Segments and File storage 35
Delta Files
• Using delta files, before modifying a page, one makes a copy of the page
• The copy is stored in a delta file and can be used to undo the changes
• The update is performed directly on the original page
• The copy in the delta file can be discarded when no longer needed
Storage - Segments and File storage 36
EXTENT
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
EXTENT
1 2 3 4 5’ 6 7 8 9
10 11 12 13 14 15 16 17 18
5
DELTA
Implementing delta files
• Delta files are used for many different purposes and in many different ways• Keep the old data in the delta file
• Favors commits, simplifies undo, allows looking at older data
• Keep the new data in the delta file• Favors aborts, allows to delay the propagation of updates
• Oracle• used rollback segments (segments specifically used to store the old copy of
the data)
• Now it uses undo-tablespaces
Storage - Segments and File storage 37
Summary Segments and file storage
• I/O is a big component of the performance of a database engine
• Many different architectures and possibilities
• Many opportunities for optimization and trade-offs
• Plays a big role in concurrency control and recovery (see later)
• Plays a big role in how data is accessed and manipulated (see later)
• Many details changing as storage evolves from disk to Flash to NVM to network attached storage …
Storage - Segments and File storage 38
Recommended