Data Management for Quantitative Biology - Database Systems (continued) LIMS and E-lab books by Dr....

Dr. Sven Nahnsen / Dr. Marius Codrea

Quantitative Biology Center (QBiC)

Data Management for Quantitative Biology

Lecture 5: Database systems (continued)

LIMS and E-lab books

Many database designs & concepts

[Figure. Source: http://dataconomy.com/wp-content/uploads/2014/07/fig2large.jpg]

Databases

DB = "A database is an organized collection of data" http://en.wikipedia.org/wiki/Database

DB = DB + data model for the application at hand (business logic) + implementation

DB = DB + database management system (DBMS). Software that enables:

• Create entries

• Read (retrieve)

• Update / edit

• Delete

DB = DB + administration (user privileges, monitoring)
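The four DBMS operations listed above (Create, Read, Update, Delete, often abbreviated CRUD) map directly onto SQL statements. Below is a minimal sketch using Python's built-in `sqlite3` module. It assumes an in-memory SQLite database as a stand-in for a full DBMS such as MySQL. The `mice` table layout is also an assumption, based on the fields used later in this lecture.

```python
import sqlite3

# In-memory SQLite database standing in for a full DBMS such as MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mice (Mouse_number INTEGER PRIMARY KEY, Gender TEXT, Age INTEGER, Treatment TEXT)"
)

# Create: insert a new record.
conn.execute("INSERT INTO mice VALUES (?, ?, ?, ?)", (1, "Male", 3, "Vitamin A"))

# Read (retrieve): query the record back.
row = conn.execute(
    "SELECT Gender, Treatment FROM mice WHERE Mouse_number = ?", (1,)
).fetchone()

# Update / edit: change one field in place.
conn.execute("UPDATE mice SET Age = ? WHERE Mouse_number = ?", (4, 1))
age = conn.execute("SELECT Age FROM mice WHERE Mouse_number = 1").fetchone()[0]

# Delete: remove the record again.
conn.execute("DELETE FROM mice WHERE Mouse_number = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM mice").fetchone()[0]

conn.commit()
print(row, age, remaining)
```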

Selected database systems

I. Relational databases

II. NoSQL databases

MongoDB

Specific characteristics MongoDB vs MySQL

More details here: http://db-engines.com/en/system/MongoDB%3BMySQL

System property        MongoDB               MySQL
Initial release        2009                  1995
Current release        3.0.2, April 2015     5.6.24, April 2015
Triggers               No                    Yes
MapReduce              Yes                   No
Foreign keys           No                    Yes
Transaction concepts   No                    ACID*

*A database transaction must be Atomic, Consistent, Isolated and Durable.
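Atomicity, the "A" in ACID, means a transaction is all or nothing. The sketch below uses SQLite through Python's `sqlite3` as a stand-in for MySQL/InnoDB. The table and values are hypothetical. A transaction first updates a row and then violates a NOT NULL constraint. Using the connection as a context manager rolls the whole transaction back, so the earlier update is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mice (Mouse_number INTEGER PRIMARY KEY, Treatment TEXT NOT NULL)")
conn.execute("INSERT INTO mice VALUES (1, 'Vitamin A')")
conn.commit()

try:
    # "with conn" commits on success and rolls back the whole
    # transaction if an exception escapes the block.
    with conn:
        conn.execute("UPDATE mice SET Treatment = 'Vitamin B' WHERE Mouse_number = 1")
        # This statement violates NOT NULL and raises IntegrityError ...
        conn.execute("INSERT INTO mice VALUES (2, NULL)")
except sqlite3.IntegrityError:
    pass

# ... so the UPDATE above is rolled back too: all or nothing.
treatment = conn.execute("SELECT Treatment FROM mice WHERE Mouse_number = 1").fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM mice").fetchone()[0]
print(treatment, count)
```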

[Figure: Mice table and Samples table, showing fields, records 1 to 6, the primary key of each table, and the foreign key in the Samples table that references Mice.Mouse_number]

● The values of the primary key uniquely identify the rows of a table

● The foreign key links each row of the host table to exactly one record in the referenced table

Terminology - Relational databases

[Figure: Mice table and Samples table]

Samples are RELATED to mice

1:N one-to-many relationship

Relational databases (Normalization)

Foreign Keys

CREATE TABLE samples (
    Sample_ID SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    Mouse_number SMALLINT UNSIGNED NOT NULL,
    Timepoint VARCHAR(15) NOT NULL,
    PRIMARY KEY (Sample_ID),
    FOREIGN KEY (Mouse_number) REFERENCES mice(Mouse_number) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
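The foreign key above is enforced by the database engine itself. Below is a sketch of the same schema in SQLite via Python's `sqlite3`. This involves some assumptions. SQLite has no `UNSIGNED`, spells auto-increment as `AUTOINCREMENT` and has no `ENGINE` clause. SQLite also only enforces foreign keys after `PRAGMA foreign_keys = ON`. The `mice` table definition is hypothetical, because the lecture does not show it. The sketch checks that a sample pointing at a non-existent mouse is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE mice (
        Mouse_number INTEGER PRIMARY KEY,
        Gender       TEXT NOT NULL,
        Treatment    TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE samples (
        Sample_ID    INTEGER PRIMARY KEY AUTOINCREMENT,
        Mouse_number INTEGER NOT NULL,
        Timepoint    TEXT NOT NULL,
        FOREIGN KEY (Mouse_number) REFERENCES mice(Mouse_number) ON DELETE CASCADE
    )""")
conn.execute("INSERT INTO mice VALUES (2, 'Female', 'Vitamin A')")
conn.execute("INSERT INTO samples (Mouse_number, Timepoint) VALUES (2, '5h')")

# A sample that points at a mouse that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO samples (Mouse_number, Timepoint) VALUES (99, '24h')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

n_samples = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(rejected, n_samples)
```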

Queries

SELECT Gender, Treatment, COUNT(*) AS count_per_gender FROM mice GROUP BY Gender, Treatment;

“How many males and how many females per treatment?”
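Below is a runnable version of the grouping query, using SQLite through Python's `sqlite3` with six hypothetical mice. `GROUP BY` collapses all rows that share a (Treatment, Gender) combination into one output row, and `COUNT(*)` counts the rows in each group. Only grouped columns and aggregates are selected. In standard SQL, this is required for every non-aggregated column in the select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mice (Mouse_number INTEGER PRIMARY KEY, Gender TEXT, Treatment TEXT)")
# Hypothetical example data.
conn.executemany(
    "INSERT INTO mice VALUES (?, ?, ?)",
    [(1, "Male", "Vitamin A"), (2, "Female", "Vitamin A"), (3, "Male", "Vitamin A"),
     (4, "Female", "Vitamin B"), (5, "Female", "Vitamin B"), (6, "Male", "Vitamin B")],
)
rows = conn.execute(
    "SELECT Treatment, Gender, COUNT(*) AS count_per_gender "
    "FROM mice GROUP BY Treatment, Gender ORDER BY Treatment, Gender"
).fetchall()
for treatment, gender, n in rows:
    print(treatment, gender, n)
```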

JOIN queries

SELECT Sample_ID, Treatment, Timepoint, mice.Mouse_number
FROM samples JOIN mice ON samples.Mouse_number = mice.Mouse_number
WHERE mice.Mouse_number = 2;

“What samples do I have from mouse number 2?”
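The join reassembles information that normalization split across two tables. Below is a sketch with SQLite via Python's `sqlite3`, using hypothetical mice and samples. Each sample row is matched to its mouse through `Mouse_number`, and the result is then filtered to mouse number 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mice (Mouse_number INTEGER PRIMARY KEY, Treatment TEXT)")
conn.execute(
    "CREATE TABLE samples (Sample_ID INTEGER PRIMARY KEY, Mouse_number INTEGER, Timepoint TEXT)"
)
# Hypothetical example data.
conn.executemany("INSERT INTO mice VALUES (?, ?)", [(1, "Vitamin A"), (2, "Vitamin B")])
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(10, 1, "5h"), (11, 2, "5h"), (12, 2, "24h")],
)
rows = conn.execute(
    """SELECT samples.Sample_ID, mice.Treatment, samples.Timepoint, mice.Mouse_number
       FROM samples JOIN mice ON samples.Mouse_number = mice.Mouse_number
       WHERE mice.Mouse_number = ?
       ORDER BY samples.Sample_ID""",
    (2,),
).fetchall()
print(rows)
```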

Relational “facts”

1. Rigid schema (once the structure is defined, it may be difficult to adjust)

2. Normalization introduces/requires additional tables, joins, indices and it scatters data

3. Each field in each record has a single value of a pre-defined type

[Figure: Mice table and Samples table, 1:N one-to-many relationship]

Relational “facts” 1

Generalization to other projects/experiments in the lab?

Rigid schema (once the structure is defined, it may be difficult to adjust)

[Figure: a generic Organisms table replaces the Mice table; the 1:N one-to-many relationship between the Samples table and the Mice table is BROKEN (deleted relationship)]

The samples table was defined with FOREIGN KEY (Mouse_number) REFERENCES mice(Mouse_number) ON DELETE CASCADE, so its foreign key still points at the mice table, not at the new organisms table.

[Figure: Organisms table, Projects table]

Relational “facts” 2

[Figure: Users table]

Many users can be involved in many projects. With many roles?

[Figure: Projects_Users join table linking the Projects table and the Users table]

Normalization introduces/requires additional tables, joins, indices and scatters data

[Figure: Projects_Users table]

CREATE INDEX usr ON Projects_Users (User_ID);
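A many-to-many relationship needs a third table, and answering a simple question then takes two joins. Below is a sketch in SQLite via Python's `sqlite3`. The column names `Title`, `Name` and `Role` and all the data are hypothetical. The join table holds one row per (project, user, role) association. The index `usr` speeds up lookups by user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (Project_ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE users    (User_ID    INTEGER PRIMARY KEY, Name  TEXT);
    -- Join table: one row per (project, user, role) association.
    CREATE TABLE Projects_Users (
        Project_ID INTEGER REFERENCES projects(Project_ID),
        User_ID    INTEGER REFERENCES users(User_ID),
        Role       TEXT,
        PRIMARY KEY (Project_ID, User_ID, Role)
    );
    CREATE INDEX usr ON Projects_Users (User_ID);
""")
# Hypothetical example data.
conn.executemany("INSERT INTO projects VALUES (?, ?)", [(1, "Vaccine study"), (2, "Proteomics")])
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Hans"), (2, "Anna")])
conn.executemany(
    "INSERT INTO Projects_Users VALUES (?, ?, ?)",
    [(1, 1, "PI"), (2, 1, "Analyst"), (1, 2, "Technician")],
)

# Two joins are needed to get from a user to the titles of their projects.
rows = conn.execute("""
    SELECT projects.Title, Projects_Users.Role
    FROM users
    JOIN Projects_Users ON users.User_ID = Projects_Users.User_ID
    JOIN projects       ON projects.Project_ID = Projects_Users.Project_ID
    WHERE users.Name = ?
    ORDER BY projects.Title
""", ("Hans",)).fetchall()
print(rows)
```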

Each field in each record has a single value of a pre-defined type

[Figure: a table with columns Primary Key, Field 1, Field 2, Field 3]

A 2-D map (tuples)

A single value ?!?

What if a person has 2 affiliations and thus 2 addresses, 2 phone numbers, etc?

Normalization? Again?

DB = "A database is an organized collection of data" http://en.wikipedia.org/wiki/Database

● Can we allow for “some” heterogeneity of the data?

● Can the records be highly similar but not necessarily identical? (e.g., most of the users having just 1 phone number but others more?)

MongoDB is a document-oriented DB

{ Mouse_number: "1", Gender: "Male", Age: 3, Treatment: "Vitamin A" }

Field:value pairs

Document ~ Record

http://www.mongodb.org/

MongoDB Documents


Documents are stored as BSON (binary JSON)

They closely resemble data structures in programming languages (key-value associations)

Each field can be

● null

● a single value (integer, string, etc.)

● an array of many values

● other embedded documents

● a reference to another document

MongoDB Collections

Documents are stored in Collections

Collection ~ Table

{ Mouse_number: "6", Gender: "Female", Age: 2, Treatment: "Vitamin B" }

Different representation, but the challenge remains the same:

Model the relationships between data

[Figure: entities Organisms, Projects, Affiliations and Samples]

Design & operational mechanisms

Relational databases

● Primary Key

● Foreign Key

● Join Tables

MongoDB

● Unique ID

● References

● Embedding

MongoDB – Field types

{ _id: <ObjectID1>,
  Username: { first_name: "Hans", last_name: "Meyer" },
  Gender: "Male",
  Age: 30,
  Phones: ["+490777", "+350777"],
  Affiliations_id: <UUID_affiliation> }

Users document

● array

● embedded document

● reference

● Unique ID

● Unique ID _id: <ObjectID1>

Acts as a primary key

ObjectId is a 12-byte BSON type, constructed using:

● a 4-byte value representing the seconds since the Unix epoch,

● a 3-byte machine identifier,

● a 2-byte process id, and

● a 3-byte counter, starting with a random value.

http://docs.mongodb.org/manual/reference/object-id/

ObjectId("507f1f77bcf86cd799439011")
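Because the first four bytes are a timestamp, the creation time of a document can be read directly from its `_id`. Below is a minimal Python sketch that splits the 24-hex-character ObjectId above into the four fields listed. It assumes the 4/3/2/3-byte layout described here and reads every field as a big-endian integer. Newer MongoDB versions replace the machine and process bytes with a single random value, so only the timestamp and counter are meaningful there. `parse_object_id` is a hypothetical helper, not a driver API.

```python
from datetime import datetime, timezone


def parse_object_id(hex_id: str) -> dict:
    """Split a 12-byte ObjectId into the fields of the 4/3/2/3-byte layout."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # seconds since the Unix epoch
    return {
        "timestamp": seconds,
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": int.from_bytes(raw[4:7], "big"),
        "pid": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = parse_object_id("507f1f77bcf86cd799439011")
print(parts["created"].isoformat(), parts["counter"])
```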

● array Phones: ["+490777", "+350777"]

● When the field is indexed, each value in the array gets its own index entry

● A query matches if ANY value in the array matches
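In the mongo shell, `db.users.find( { Phones: "+350777" } )` returns users whose `Phones` array contains that number. Below is a plain-Python simulation of this equality rule. It is a simplified sketch that does not call MongoDB, and `field_matches` is a hypothetical helper. A scalar query value matches an array field if any element equals it, or if the whole array equals it.

```python
from typing import Any


def field_matches(doc: dict, field: str, value: Any) -> bool:
    """Equality match with simplified MongoDB-like array semantics."""
    stored = doc.get(field)
    if isinstance(stored, list):
        # An array field matches if ANY element equals the value,
        # or if the whole array equals the value.
        return value in stored or stored == value
    return stored == value


# Hypothetical users collection.
users = [
    {"_id": 1, "Phones": ["+490777", "+350777"]},
    {"_id": 2, "Phones": ["+490888"]},
]
hits = [u["_id"] for u in users if field_matches(u, "Phones", "+350777")]
print(hits)
```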

{ _id: <ObjectID1>, Username: { first_name: "Hans", last_name: "Meyer" }, Gender: "Male" }

● embedded document

● Pre-joined data?

● Can be indexed

● Query at any level on any field


● reference Affiliations_id: <UUID_affiliation>

Affiliations document

{ _id: <UUID_affiliation>, Name: "My lab", Address: "Tübingen" }

Where is the catch?

● "In MongoDB, write operations are atomic at the document level, and no single write operation can atomically affect more than one document or more than one collection."

● OK, then references (normalized model) are not really foreign keys that the DB engine resolves: "Client-side applications must issue follow-up queries to resolve the references." (see next slide)

● “A denormalized data model with embedded data combines all related data for a represented entity in a single document. This facilitates atomic write operations since a single write operation can insert or update the data for an entity.”

● OK, denormalize. But the maximum BSON document size is 16 MB.

http://docs.mongodb.org/manual/
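What "client-side applications must issue follow-up queries" means in practice is shown in the sketch below. Plain Python dicts keyed by `_id` stand in for the `users` and `affiliations` collections, and `find_one` is a hypothetical helper, so no MongoDB server or driver is involved. The application reads the user and then performs a second lookup to follow `Affiliations_id`. Nothing in the data model prevents the reference from dangling once the affiliation is deleted.

```python
# Hypothetical in-memory "collections": dicts keyed by _id stand in for
# db.users and db.affiliations (no MongoDB server involved).
affiliations = {
    "aff-1": {"_id": "aff-1", "Name": "My lab", "Address": "Tübingen"},
}
users = {
    "u-1": {
        "_id": "u-1",
        "Username": {"first_name": "Hans", "last_name": "Meyer"},
        "Affiliations_id": "aff-1",
    },
}


def find_one(collection: dict, _id: str):
    """Stand-in for a single-document lookup by _id."""
    return collection.get(_id)


# Query 1: fetch the user document.
user = find_one(users, "u-1")
# Query 2: the application itself follows the reference.
affiliation = find_one(affiliations, user["Affiliations_id"])
print(affiliation["Name"])

# No foreign key constraint: deleting the affiliation leaves a dangling reference.
del affiliations["aff-1"]
dangling = find_one(affiliations, user["Affiliations_id"]) is None
print(dangling)
```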

Foreign key “ON DELETE CASCADE”

“Mouse number 3 went wrong. Let's just delete it.”

SELECT * from samples;

DELETE from mice where Mouse_number = 3;

SELECT * from samples;

Where have these two samples gone? The ON DELETE CASCADE clause deleted them together with mouse 3.
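The cascade can be reproduced with SQLite via Python's `sqlite3`, which again needs `PRAGMA foreign_keys = ON` before it enforces foreign keys. The table contents are hypothetical: mouse 3 has two samples and mouse 2 has one. Deleting mouse 3 also deletes every sample that references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.execute("CREATE TABLE mice (Mouse_number INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE samples (
        Sample_ID    INTEGER PRIMARY KEY,
        Mouse_number INTEGER NOT NULL
            REFERENCES mice(Mouse_number) ON DELETE CASCADE,
        Timepoint    TEXT NOT NULL
    )""")
# Hypothetical data: mouse 3 has two samples, mouse 2 has one.
conn.executemany("INSERT INTO mice VALUES (?)", [(2,), (3,)])
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(1, 2, "5h"), (2, 3, "5h"), (3, 3, "24h")],
)

before = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
conn.execute("DELETE FROM mice WHERE Mouse_number = 3")
after = conn.execute("SELECT Sample_ID FROM samples").fetchall()
print(before, after)
```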

The key challenge

Find the right structure of the documents (references and embedded documents) that best fit

● the requirements of the application (queries, updates), i.e. data usage

● the performance of the database engine


Model the relationships between data: 1:N

[Figure: Organisms and Samples]

● Organisms reference their samples: Sample_ids: [ ]

● Samples reference their organism: Organism_id:

Depends on the most frequent question?

● What samples do I have from Organism X?

● Where did Sample Y come from?

● How many samples? Will we reach the 16 MB limit?

● Organism embeds multiple samples
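Below is a sketch of the three options as plain Python dicts, with hypothetical IDs and fields, and of how each answers the two questions. The size check uses the length of the JSON text as a rough proxy for BSON size. This is an assumption, since real BSON sizes differ, but it gives an idea of how many embedded samples fit under the 16 MB document limit.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # 16 MB limit for a single BSON document

# Option A: the organism keeps an array of references to its samples.
organism_a = {"_id": 4, "Species": "human", "Sample_ids": [10, 11]}

# Option B: each sample keeps a reference to its organism.
samples_b = [
    {"_id": 10, "Organism_id": 4, "Timepoint": "5h"},
    {"_id": 11, "Organism_id": 4, "Timepoint": "24h"},
]

# Option C: the organism embeds its samples.
organism_c = {
    "_id": 4,
    "Species": "human",
    "Samples": [{"_id": 10, "Timepoint": "5h"}, {"_id": 11, "Timepoint": "24h"}],
}

# "What samples do I have from organism 4?"
from_a = organism_a["Sample_ids"]                                # read one document
from_b = [s["_id"] for s in samples_b if s["Organism_id"] == 4]  # search all samples
from_c = [s["_id"] for s in organism_c["Samples"]]               # read one document

# "Where did sample 11 come from?" is a single read in option B;
# options A and C would have to search the organisms instead.
origin_b = next(s["Organism_id"] for s in samples_b if s["_id"] == 11)


def approx_size(doc: dict) -> int:
    """Rough document size via JSON text; real BSON sizes differ somewhat."""
    return len(json.dumps(doc).encode("utf-8"))


# Roughly how many embedded samples of this shape fit under the limit (option C)?
per_sample = approx_size(organism_c["Samples"][0])
capacity = MAX_DOC_BYTES // per_sample
print(from_a, from_b, from_c, origin_b, capacity)
```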

Relational databases: each field in each record has a single value of a pre-defined type

[Figure: a 2-D map (tuples) with columns Primary Key, Field 1, Field 2, Field 3]

MongoDB: nested documents

[Figure: nested documents with _id, Field 1, Field 2, Field 3]

Queries

{ _id: 4, Project_ID: 2, Species: "human", Gender: "", Age: 30, Treatment: "Vaccine A" }

Organisms

db.organisms.insert( { Project_ID: 2, Species: "human", Gender: "", Age: 30, Treatment: "Vaccine A" } )

Queries

Organisms

db.organisms.find( { Project_ID: { $eq : 2} })

SELECT * FROM organisms WHERE Project_ID = 2;

Queries

Organisms

db.organisms.find( { $and: [ { Species: /^h/ }, { Age: { $gt: 20 } } ] } )

SELECT * FROM organisms WHERE Species LIKE 'h%' AND Age > 20;
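MongoDB regular expressions are unanchored. A pattern such as `/h.*/` matches an "h" anywhere in the string, whereas SQL `LIKE 'h%'` requires the string to start with "h", so the anchored equivalent is `/^h/`. Below is a plain-Python simulation of the `$and` query. The helper and data are hypothetical and no server is involved. Python's `re.search` plays the role of MongoDB's unanchored matching.

```python
import re

# Hypothetical organisms; "chimpanzee" contains an "h" but does not start with one.
organisms = [
    {"_id": 4, "Species": "human", "Age": 30},
    {"_id": 5, "Species": "chimpanzee", "Age": 25},
    {"_id": 6, "Species": "human", "Age": 12},
]


def find_species_older_than(docs, pattern, min_age):
    """Sketch of { $and: [ {Species: <regex>}, {Age: {$gt: min_age}} ] }.

    re.search mirrors unanchored regex matching; documents missing
    either field are skipped, since neither condition would match them.
    """
    rx = re.compile(pattern)
    return [
        d["_id"]
        for d in docs
        if "Species" in d and "Age" in d
        and rx.search(d["Species"]) and d["Age"] > min_age
    ]


anchored = find_species_older_than(organisms, r"^h", 20)     # same as LIKE 'h%'
unanchored = find_species_older_than(organisms, r"h.*", 20)  # "h" anywhere
print(anchored, unanchored)
```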

Schema flexibility

Organisms

{ _id: 14, Project_ID: 5, Species: "human", Gender: "Female", Age: 10, Genetic_background: "WT" }

Data IS the schema!

Queries

Organisms

db.organisms.find( { Genetic_background: { $exists: true } } )

SELECT ???
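One possible relational answer is that the column must first exist for every row before it can be queried. Below is a sketch contrasting the two sides, using hypothetical data. On the document side, a plain-Python membership test simulates `$exists: true`. On the relational side, SQLite via `sqlite3` needs an `ALTER TABLE` before `IS NOT NULL` can be used. These are not exactly equivalent, because `$exists: true` also matches documents where the field is present but set to null.

```python
import sqlite3

# Document side: heterogeneous records, the field is simply absent in some.
organisms = [
    {"_id": 4, "Species": "human", "Age": 30, "Treatment": "Vaccine A"},
    {"_id": 14, "Species": "human", "Age": 10, "Genetic_background": "WT"},
]
with_background = [d["_id"] for d in organisms if "Genetic_background" in d]

# Relational side: the schema must be changed for all rows (NULL where unknown).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organisms (_id INTEGER PRIMARY KEY, Species TEXT, Age INTEGER)")
conn.executemany("INSERT INTO organisms VALUES (?, ?, ?)", [(4, "human", 30), (14, "human", 10)])
conn.execute("ALTER TABLE organisms ADD COLUMN Genetic_background TEXT")
conn.execute("UPDATE organisms SET Genetic_background = 'WT' WHERE _id = 14")
rows = conn.execute(
    "SELECT _id FROM organisms WHERE Genetic_background IS NOT NULL"
).fetchall()
print(with_background, rows)
```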

Model the relationships between data: 1:N

● Organism embeds multiple samples


Queries

{ _id: 4, Project_ID: 2, Species: "human", Gender: "", Age: 30,
  Samples: [ { _id: 10, Timepoint: "5h" }, { _id: 11, Timepoint: "24h" } ],
  Treatment: "Vaccine A" }

Organisms

db.organisms.find( { _id: 4, 'Samples._id': 11 } )

db.organisms.find( { _id: 4, 'Samples.Timepoint': '5h' } )
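Dot notation such as `'Samples._id'` descends into embedded documents and, when it meets an array, applies the rest of the path to every element. Below is a simplified plain-Python simulation of that rule. `get_path` and `find` are hypothetical helpers, not the MongoDB API. The document stores `_id` and `Samples._id` as numbers, so quoting them as strings (`'4'`, `'11'`) would match nothing. The sketch shows both cases.

```python
from typing import Any


def get_path(value: Any, path: list[str]) -> list[Any]:
    """Collect all values reachable via a dotted path, descending into arrays."""
    if not path:
        return [value]
    if isinstance(value, list):
        # A path step applies to each element of an array.
        return [v for item in value for v in get_path(item, path)]
    if isinstance(value, dict) and path[0] in value:
        return get_path(value[path[0]], path[1:])
    return []


def find(collection: list[dict], query: dict) -> list[dict]:
    """Return documents where every 'dotted.field': value pair matches."""
    return [
        doc for doc in collection
        if all(expected in get_path(doc, key.split(".")) for key, expected in query.items())
    ]


organisms = [{
    "_id": 4, "Project_ID": 2, "Species": "human", "Gender": "", "Age": 30,
    "Samples": [{"_id": 10, "Timepoint": "5h"}, {"_id": 11, "Timepoint": "24h"}],
    "Treatment": "Vaccine A",
}]

by_sample_id = [d["_id"] for d in find(organisms, {"_id": 4, "Samples._id": 11})]
by_timepoint = [d["_id"] for d in find(organisms, {"_id": 4, "Samples.Timepoint": "5h"})]
as_strings = find(organisms, {"_id": "4", "Samples._id": "11"})  # type mismatch: no match
print(by_sample_id, by_timepoint, as_strings)
```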

Summary

● Database design requires both technical and substantial domain-specific knowledge

● Normalization

● Indices

● Primary Key

● Foreign Key

● Join Tables

MongoDB

● Unique ID

● References

● Embedding

Hint: http://en.wikipedia.org/wiki/Category:Web_application_frameworks

Laboratory information management system (LIMS)

[Figure: entities Organisms, Projects, Affiliations and Samples]

An underlying data structure of a simple LIMS design

LIMS definition

http://en.wikipedia.org/wiki/Laboratory_information_management_system

„A Laboratory Information Management System (LIMS), sometimes referred to as a Laboratory Information System (LIS) or Laboratory Management System (LMS), is a software-based laboratory and information management system that offers a set of key features that support a modern laboratory's operations.“

LIMS properties and functionality

http://en.wikipedia.org/wiki/Laboratory_information_management_system

● Metadata of any sample entering the laboratory

● Tracking of processes throughout sample treatment and preparation; scheduling of the sample and the associated analytical workload

● Quality control associated with the sample and the utilized equipment and inventory

● Inspection, approval, and compilation of the sample data for reporting and/or further analysis

Advantages of LIMS

● Fewer transcription errors

● Faster sample processing

● Real-time control of data and metadata

● Reproducibility of experimental processes

● Direct electronic reporting to clients

● Despite many advantages,...

Disadvantages of LIMS

● Customization of LIMS

● Interface is required

● Adequate validation to ensure data quality

With a good LIMS in place, we can consider Electronic Laboratory Notebooks

Electronic laboratory notebooks (ELN)

http://en.wikipedia.org/wiki/Electronic_lab_notebook

An electronic lab notebook (also known as electronic laboratory notebook, or ELN) is a computer program designed to replace paper laboratory notebooks. Lab notebooks in general are used by scientists, engineers and technicians to document research, experiments and procedures performed in a laboratory. A lab notebook is often maintained to be a legal document and may be used in a court of law as evidence.

Prominent use case: review process

[Figure: three-ring binders. Source: http://rushthecourt.net/mag/wp-content/uploads/2010/09/Three-Ring-Binders.jpg]

● You submit a paper

● A review process of several months is not unusual

● Reviewers ask for a more detailed description of the experiments you did two years back

Traditional Paper Lab Books

ELN, a survey

Journal of Laboratory Automation 18(3) 229–234, 2012 Society for Laboratory Automation and Screening

DOI: 10.1177/2211068212471834

Examples of ELN software

Practical issues

● Lab technicians “have only two hands”

● Labs are often not equipped with desktop PCs

● Data security of ELNs poses challenges

● Scientists are classically reluctant adopters

● There is activation energy required to change work habits

● In academic science there is no formal obligation

● Establishment requires stringent modeling (see previous slides on databases) or significant investments into existing tools

Mobile application of ELNs

Nature Methods 8, 541–543 (2011) doi:10.1038/nmeth.1631

● Handwriting capture technology

● All functionality as on paper

● Sketch and manipulate equations

● Draw figures

● All notes can be linked, reordered, archived, edited, tagged, annotated and bundled in virtual 'notebooks' representing different projects

Easy solutions

Evernote as lab notebook

Journal of Laboratory Automation 18(3) 229–234, 2012 Society for Laboratory Automation and Screening

DOI: 10.1177/2211068212471834
