MySQL 101

MYSQL 101JASON NGUYEN

AGENDA

• Introduction

• Storage engine

• SQL Language

• Data types

• Index

• Normalization

• Guidelines for good schema design

INTRODUCTION

• What’s MySQL?

• Since 1995

• Written in C/C++

• RDBMS (Relational Database Management System)

STORAGE ENGINE

• What is it?

• Stores, handles, and retrieves information from a table

• There are two types of storage engines in MySQL: Transactional and non-Transactional

• MySQL provides support for thirteen difference storage engines.

• Most people who use two storage engines: MyISAM and InnoDB

• Default storage engine for MySQL <= 5.5 was MyISAM and other was InnoDB

• Comparison between InnoDB and MyISAM (ref: http://en.wikipedia.org/wiki/Comparison_of_MySQL_database_engines)

• Choosing the right engine

http://en.wikipedia.org/wiki/Comparison_of_MySQL_database_engines)

DATA TYPES INTEGER

DATA TYPES FLOATING-POINT

DATA TYPES DATE AND TIME

DATA TYPES STRING

CHOOSE YOUR NUMERIC DATA TYPES

• Favorites signs of poor design:

• INT(1)

• BIGINT AUTO_INCREMENT

• No UNSIGNED used

• DECIMAL(31,0)


• INT(1) – 1 does not mean 1 digit

• 1 represents client output display format only

• INT is 4 bytes, TINYINT is 1 byte

• TINYINT UNSIGNED can store from 0 – 255

• BIT is even better when values are 0 - 1


• BIGINT is not need for AUTO_INCREMENT

• INT UNSIGNED can store 4.3 billion values

• BIGINT is applicable for some columns

• E.g: Summation of values


• Best practice

• All integer columns UNSIGNED unless there is a reason otherwise

• Adds a level of data integrity for negative values

OTHER DATA TYPE EFFICIENCIES

• TIMESTAMP v DATETIME

• Suitable for EPOCH only values

• TIMESTAMP is 4 bytes

• DATETIME is 8 bytes

OTHER DATA TYPE EFFICIENCIES

• CHAR(n)

• Use VARCHAR(n) for variable values

• E.g. CHAR(128) when storing ~ 10 bytes

APPLICATION DATA TYPE EFFICIENCIES

• Using Codes or ENUM

• A description is a presentation layer function

• e.g. 'M', 'F' instead of 'Male', 'Female’

• e.g. 'A', 'I' instead of 'Active', 'Inactive’

• BINARY(16/20) v CHAR(32/40)

• MD5() or HASH() Hex value with twice the length

• INT UNSIGNED for IPv4 address

• VARCHAR(15) results in average 12 bytes v 4 bytes

NOT NULL

• Saves up to a byte per column per row of data

• Double benefit for indexed columns

• NOT NULL DEFAULT '' is bad design

• Always use NOT NULL unless there is a reason why not

INDEX

• What are indexes for?

• Speed up access for database

• Help to enforce constrains (UNIQUE, FOREIGN KEY)

Queries can ran without any indexes

But it can take a really long time

OVERHEAD OF THE INDEXING

• Indexes are costly; Do not add more than you need

• In most cases extending index is better then adding new one

• Writes: Updating indexes is often major cost of database writes

• Reads: Wasted space on disk and in memory; additional overhead during query optimization

GUIDELINES FOR GOOD SCHEMA DESIGN

• These are only guidelines, not rules, and your code will work if they're not followed. But you'll have an easier path if you adopt them

• DO name your database tables in the singular, not plural

• DON'T prepend db table names to field names

• DON'T include a table prefix in the model class name

• DO name a table's own ID column "id"

• AVOID semantically-meaningful primary key names

• DO define foreign-key relationships in the database schema

• DO name your foreign key fields ending in "id”

THANK YOU!

Data & Analytics

MySQL 101