29
Transactional Storage for MySQL FAST. RELIABLE. PROVEN. InnoDB Internals: InnoDB File Formats and Source Code Structure Heikki Tuuri CEO Innobase Oy Vice President, Development Oracle Corporation MySQL Conference, April 2009 Calvin Sun Principal Engineer Oracle Corporation

Inno Db Internals Inno Db File Formats And Source Code Structure

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Inno Db Internals Inno Db File Formats And Source Code Structure

Transactional Storage for MySQL

FAST. RELIABLE. PROVEN.

InnoDB Internals: InnoDB File Formats and Source Code

Structure

Heikki Tuuri

CEO Innobase Oy

Vice President, Development

Oracle Corporation

MySQL Conference, April 2009

Calvin Sun

Principal Engineer

Oracle Corporation

Page 2: Inno Db Internals Inno Db File Formats And Source Code Structure

Today’s Topics

• Goals of InnoDB

• Key Functional Characteristics

• InnoDB Design Considerations

• InnoDB Architecture

• InnoDB On Disk Format

• Source Code Structure

• Q & A

Page 3: Inno Db Internals Inno Db File Formats And Source Code Structure

Goals of InnoDB

• OLTP oriented

• Performance, Reliability, Scalability

• Data Protection

• Portability

Page 4: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Key Functional

Characteristics

• Full transaction support

• Row-level locking

• MVCC

• Crash recovery

• Efficient IO

Page 5: Inno Db Internals Inno Db File Formats And Source Code Structure

Design Considerations

• Modeled on Gray & Reuter’s “Transactions Processing: Concepts & Techniques”

• Also emulated the Oracle architecture

• Added unique subsystems• Doublewrite

• Insert buffering

• Adaptive hash index

• Designed to evolve with changing hardware & requirements

Page 6: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Architecture

IO

Buffer

File Space Manager

Transaction

Handler API Embedded InnoDB API

Cursor / Row

Mini-

transactionLockB-tree

Page

Server Applications

Page 7: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB On Disk Format

• InnoDB Database Files

• InnoDB Tablespaces

• InnoDB Pages / Extents

• InnoDB Rows

• InnoDB Indexes

• InnoDB Logs

• File Format Design Considerations

Page 8: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Database Files

ibdata files

System tablespace

internaldata

dictionary

MySQL Data Directory

InnoDB

tables

OR innodb_file_per_table

.ibd files

.frm files

undologs

insert buffer

Page 9: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Tablespaces

• A tablespace consists of multiple files and/or raw disk partitions. file_name:file_size[:autoextend[:max:max_file_size]]

• A file/partition is a collection of segments.

• A segment consists of fixed-length pages.

• The page size is always 16KB in uncompressed tablespaces, and 1KB-16KB in compressed tablespaces (for both data and index).

Page 10: Inno Db Internals Inno Db File Formats And Source Code Structure

System Tablespace

• Internal Data Dictionary

• Undo

• Insert Buffer

• Doublewrite Buffer

• MySQL Replication Info

Page 11: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Tablespaces

Extent

Segment

Extent

Extent Extent

an extent = 64 pages

Extent

Trx id

Row

Field 1

Roll pointer

Field pointers

Field 2 Field n

Row

Page

Row

Row

Row Row

Leaf node segment

Tablespace

Rollback segment

Non-leaf node segment

RowRow

Page 12: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Pages

Symbol Value Notes

FIL_PAGE_INODE 3 File segment inode

FIL_PAGE_INDEX 17855 B-tree node

FIL_PAGE_TYPE_BLOB 10 Uncompressed BLOB page

FIL_PAGE_TYPE_ZBLOB 11 1st compressed BLOB page

FIL_PAGE_TYPE_ZBLOB2 12 Subsequent compressed BLOB page

FIL_PAGE_TYPE_SYS 6 System page

FIL_PAGE_TYPE_TRX_SYS 7 Transaction system page

othersi-buf bitmap, I-buf free list, file space header, extent desp page, new allocated page

InnoDB Page TypesInnoDB Page Types

Page 13: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Pages

A page consists of: a page header, a page

trailer, and a page body (rows or other

contents).

Page header

Page trailer

row offset array

Row RowRow

Row

Row

RowRow

Row

Row RowRow

Page 14: Inno Db Internals Inno Db File Formats And Source Code Structure

Page Declares

typedef struct /* a space address */{ulint pageno; /* page number within the file */ulint boffset; /* byte offset within the page */

} fil_addr_t;

typedef struct {ulint checksum; /* checksum of the page (since 4.0.14) */ulint page_offset; /* page offset inside space */fil_addr_t previous; /* offset or fil_addr_t */fil_addr_t next; /* offset or fil_addr_t */dulint page_lsn; /* lsn of the end of the newest

modification log record to the page */PAGE_TYPE page type; /* file page type */dulint file_flush_lsn;/* the file has been flushed to disk

at least up to this lsn */int space_id; /* space id of the page */char data[]; /* will grow */ulint page_lsn; /* the last 4 bytes of page_lsn */ulint checksum; /* page checksum, or checksum magic, or 0 */} PAGE, *PAGE;

Page 15: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Compressed Pages

•InnoDB keeps a “modification log” in each page

•Updates & inserts of small records are written to the log w/o page reconstruction; deletes don’t even require uncompression

•Log also tells InnoDB if the page will compress to fit page size

•When log space runs out, InnoDB uncompresses the page, applies the changes and recompresses the page

Page header

modification log

Page trailer

page directory

compressed data

BLOB pointers

empty space

Page 16: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Rows

prefix(768B) ……

overflowpage

COMACT formatCOMACT format

Record hdr Trx ID Roll ptr Fld ptrs overflow-page ptr .. Field values

overflowpage

… …

DYNAMIC formatDYNAMIC format

20 bytes

Page 17: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Indexes - Primary

●Data rows are stored in the B-tree leaf nodes of a clustered index

●B-tree is organized by primary key or non-null unique key of table, if defined; else, an internal column with 6-byte ROW_ID is added.

xxxxxxxxxxxx----

nnnnnnnnnnnn001001001001----

275275275275276 276 276 276 ––––500500500500

clustered(primary key)

index

501501501501----

630630630630631631631631

----768768768768

769769769769----

800800800800801801801801

----949949949949

950950950950----

xxxxxxxxxxxx

001 001 001 001 ––––500500500500

801 801 801 801 ––––nnnnnnnnnnnn

500 500 500 500 ––––800800800800

PK valuesPK valuesPK valuesPK values001 001 001 001 ---- nnnnnnnnnnnn

Key valuesKey valuesKey valuesKey values501501501501----630630630630+ data for + data for + data for + data for

corresponding rowscorresponding rowscorresponding rowscorresponding rows

……

Primary Index

Page 18: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Indexes - Secondary

● Secondary index B-tree leaf nodes contain, for each key value, the primary keys of the corresponding rows, used to access clustering index to obtain the data

clustered(primary key)

index

clustered(primary key)

index

Secondary index

PK valuesPK valuesPK valuesPK values001 001 001 001 ---- nnnnnnnnnnnn

B-tree leaf nodes, containing data

key valueskey valueskey valueskey valuesA ZA ZA ZA Z

B-tree leaf nodes, containing PKs

Secondary index

key valueskey valueskey valueskey valuesA ZA ZA ZA Z

B-tree leaf nodes, containing PKs

Secondary Index

Page 19: Inno Db Internals Inno Db File Formats And Source Code Structure

DATA

InnoDB Logging

Rollback segments

Log Buffer Buffer Pool

redo

logrollback

Log File

#1Log File

#2

log thread

write thread

log filesibdata files

Page 20: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Redo Log

Redo log structure:

Space id PageNo OpCode Data

end of log

min LSN

start of loglast checkpoint

Page 21: Inno Db Internals Inno Db File Formats And Source Code Structure

File Format Management• Builtin InnoDB format: “Antelope”

• New “Barracuda” format enables compression,ROW_FORMAT=DYNAMIC

• Fast index creation, other features do notrequire Barracuda file format

• Builtin InnoDB can access “Antelope”databases, but not “Barracuda”databases

• Check file format tag in system tablespace on startup

• Enable a file format with new dynamic parameter innodb_file_format

• Preserves ability to downgrade easily

.ibddata files

(file pertable)

Page 22: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB File Format Design

Considerations• Durability

• Logging, doublewrite, checksum;

• Performance• Insert buffering, table compression

• Efficiency• Dynamic row format, table compression

• Compatibility• File format management

Page 23: Inno Db Internals Inno Db File Formats And Source Code Structure

Source Code Structure

• 31 subdirectories

• Relevant InnoDB source files on file formats• Tablespace: fsp0fsp {.c, .ic, .h}

• Page: page0page, page0zip {.c, .ic, .h}

• Log: log0log {.c, .ic, .h}

Page 24: Inno Db Internals Inno Db File Formats And Source Code Structure

Source Code Subdirectories

• buf

• data

• db

• dict

• dyn

• eval

• fil

• fsp

• fut

• ha

• handler

• ibuf

• include

• lock

• log

• math

• mem

• mtr

• os

• page

• pars

• que

• read

• rem

• row

• srv

• sync

• thr

• trx

• usr

• ut

Page 25: Inno Db Internals Inno Db File Formats And Source Code Structure

Summary:

Durability, Performance,

Compatibility & Efficiency

• InnoDB is the leading transactional storage engine for MySQL

• InnoDB’s architecture is well-suited to modern, on-line transactional applications; as well as embedded applications.

• InnoDB’s file format is designed for high durability, better performance, and easy to manage

Page 26: Inno Db Internals Inno Db File Formats And Source Code Structure

For More Information …

2009 MySQL User Conference

InnoDB Birds of a Feather

Wed 7:30pm

Ballroom C

• Heikki Tuuri: Concurrency Control: How it Really Works, Thurs, 2:50pm

Please visit www.innodb.com,

blogs.innodb.com and forums.innodb.com

Page 27: Inno Db Internals Inno Db File Formats And Source Code Structure

Q U E S T I O N SQ U E S T I O N SA N S W E R SA N S W E R S

Page 28: Inno Db Internals Inno Db File Formats And Source Code Structure

companyan

Embedded

Hot Backup

Plugin

Page 29: Inno Db Internals Inno Db File Formats And Source Code Structure

InnoDB Size Limits

• Max # of tables: 4 G

• Max size of a table: 32TB

• Columns per table: 1000

• Max row size: n*4 GB • 8 kB if stored on the same page

• n*4 GB with n BLOBs

• Max key length: 3500

• Maximum tablespace size: 64 TB

• Max # of concurrent trxs: 1023