© 2013 DBCLS Licensed under CC BY 2.1 JAPAN
Current Status and Future Development of Technology for Using Large-Scale Data in the Life Sciences (生命科学分野の大規模データ利用技術開発の現状と今後の展開)
Technology development of database integration in lifescience
Hidemasa Bono (坊農 秀雅, a.k.a. @bonohu), Database Center for Life Science (DBCLS)
Pictures from Togo Picture Gallery http://g86.dbcls.jp/togopic/

Technology development of database integration in lifescience

DESCRIPTION

Experiments using DNA microarrays and next-generation DNA sequencers (NGS) produce so much data that it is hard for wet-lab biologists to handle. At the same time, a new style of research that makes full use of public databases, which accumulate the data released alongside publications, is attracting attention. At DBCLS, we develop technology for reusing such big data, including DNA sequences from NGS, and provide information that helps wet-lab biologists become self-reliant in information technology. This talk presents the current status of the project and its outlook.

These are the slides for the talk "Current Status and Future Development of Technology for Using Large-Scale Data in the Life Sciences", given at the 3rd SPARC Japan Seminar 2013, "Redefining the Impact of Research Outputs in the Open Access Era: Reuse and Altmetrics Today" (第3回 SPARC Japan セミナー2013). http://www.nii.ac.jp/sparc/event/2013/20131025.html

Page 2: Technology development of database integration in lifescience

Who we are: togoDB
• The integrated database project in Japan
• Collaborative effort to recycle data
  – Provide data that can be reused easily
  – Retain data that is part of "public data"

[Diagram labels: Togo, Headquarters, Technology developer, DNA data archiver, Universities & institutes, Data organizer]

http://biosciencedbc.jp/

Page 3: Technology development of database integration in lifescience

NBDC portal

http://biosciencedbc.jp/

Page 4: Technology development of database integration in lifescience

http://integbio.jp/dbcatalog/

Page 5: Technology development of database integration in lifescience

For interpreting big data in life science

1. Developing database integration technology
2. Creating reliable content

Page 6: Technology development of database integration in lifescience

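DB integration with RDF

Genome sequence information and a wide variety of annotation data are integrated by developing dedicated ontologies and data conversion programs for each source and converting the data to RDF.

Genome sequences
• NCBI: BioProject/RefSeq (existing reference sequences)
• DDBJ: Annotation pipeline/GTPS (newly sequenced genomes)

Annotation
• UniProt: protein functions and links
• Formats: GFF3, GTF, GVF, DAS, BED ...
• Tools: Cufflinks, BLAST, InterPro ...

Ontologies
• NCBO: BioPortal, OBO (GO, SO ...)
• DBCLS: MEO, GMO, MCCV ...

Experiments and metadata
• INSDC, NCBI: SRA, GEO
• DBCLS: RefEx, Kusarinoko
• GOLD, GSC: environmental metadata
• Bulk data: literature, images ...

Slide from the Togo Day Symposium 2013 (トーゴーの日シンポジウム2013), "Toward Realizing Database Integration 2" (データベース統合の実現に向けて2) by Shinobu Okamoto (岡本忍, DBCLS)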
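The idea on this slide, converting separate sources to RDF so they can be combined, can be illustrated with a minimal sketch. It models RDF statements as plain (subject, predicate, object) tuples rather than using an RDF library, and every URI, predicate name, and record below is a hypothetical placeholder, not taken from the DBCLS ontologies or converters.

```python
# Minimal sketch of RDF-style integration using plain Python tuples.
# All URIs and predicates are hypothetical; they are not DBCLS vocabulary.
EX = "http://example.org/"

# Source A: a genome-sequence record converted to triples.
genome_triples = {
    (EX + "gene/g1", EX + "onSequence", EX + "seq/chr1"),
    (EX + "gene/g1", EX + "start", "1000"),
    (EX + "gene/g1", EX + "end", "2500"),
}

# Source B: an annotation record that happens to use the same gene URI.
annotation_triples = {
    (EX + "gene/g1", EX + "encodes", EX + "protein/p1"),
    (EX + "protein/p1", EX + "hasFunction", EX + "term/kinase_activity"),
}


def merge(*graphs):
    """Integrate graphs by set union; shared URIs link the sources."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def functions_of_gene(graph, gene):
    """Follow gene -> protein -> function across the merged graph."""
    result = []
    for protein in objects(graph, gene, EX + "encodes"):
        result.extend(objects(graph, protein, EX + "hasFunction"))
    return result


graph = merge(genome_triples, annotation_triples)
print(functions_of_gene(graph, EX + "gene/g1"))
```

The point of the sketch is that once both sources name the gene with the same URI, integration reduces to a union of statements, and a question spanning both sources becomes a simple traversal. A real deployment would use an RDF store and SPARQL instead of tuples.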

Page 7: Technology development of database integration in lifescience

Big data in lifescience
• Output mostly from machines
  – NGS (Next Generation Sequencers)
    • over 100M lines, 2 GB in size per sample
    • Ethical issues: personal human genomes
• So many variations in...
  – Data format
  – Application: re-sequencing, de novo sequencing, RNA-seq, ...
  – Annotation: granularity of metadata

Pictures from Togo Picture Gallery http://g86.dbcls.jp/togopic/

[Figure labels: SRA, GEO, ArrayExpress; Genome, Metagenome; RNA-seq, ChIP-seq, microarray (GeneChip, Oligoarray)]
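To put the "over 100M lines, 2 GB per sample" figure in perspective, here is a small back-of-the-envelope sketch. The slide does not name the file format; the code assumes unwrapped FASTQ, where each read occupies four lines, and takes 1 GB as 10^9 bytes. The function names and the two-read sample are illustrative only.

```python
import io

# Assumption: unwrapped FASTQ, four lines per read
# (header, sequence, '+' separator, quality string).
LINES_PER_RECORD = 4


def reads_from_line_count(n_lines):
    """Convert a FASTQ line count into a read count."""
    if n_lines % LINES_PER_RECORD != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // LINES_PER_RECORD


def count_lines(handle):
    """Count lines by streaming, so a multi-gigabyte file never sits in memory."""
    return sum(1 for _ in handle)


def mean_bytes_per_line(total_bytes, n_lines):
    """Average line length implied by a file size and a line count."""
    return total_bytes / n_lines


# Tiny in-memory stand-in for a real FASTQ file (two reads).
sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")
print(reads_from_line_count(count_lines(sample)))  # 2

# The slide's figures: 100M lines and 2 GB per sample.
print(reads_from_line_count(100_000_000))                 # 25000000
print(mean_bytes_per_line(2_000_000_000, 100_000_000))    # 20.0
```

Under these assumptions, one sample of that size would hold about 25 million reads, which is why streaming access rather than loading whole files is the usual approach.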

Page 8: Technology development of database integration in lifescience

1. DBCLS SRA
• Yellow pages for archived NGS data
  – Indexed by metadata. Search by...
    • Statistics
    • Publications
    • Diseases
  – Direct link to the original DB (SRA)
• Pre-calculated QC data

Pipeline to help users re-use public NGS data:
Search data → Download → Quality Check → Data processing → Analysis

http://SRA.dbcls.jp/

Page 9: Technology development of database integration in lifescience

Statistics: studies

http://SRA.dbcls.jp/

Page 10: Technology development of database integration in lifescience

Search by publications

http://bit.ly/sra2pubmed

Page 11: Technology development of database integration in lifescience

Search by diseases

Page 12: Technology development of database integration in lifescience

Search by diseases (cont.)

Nakazato T, Ohta T, Bono H. Experimental design-based functional mining and characterization of high-throughput sequencing data in the Sequence Read Archive. PLOS ONE. 2013; doi: 10.1371/journal.pone.0077910

Page 13: Technology development of database integration in lifescience

For interpreting big data in life science

1. Developing database integration technology
2. Creating reliable content

Page 14: Technology development of database integration in lifescience

Reviews of newly published papers (新着論文レビュー)

Creative Commons Attribution 2.1 Japan

http://first.lifesciencedb.jp/

Page 15: Technology development of database integration in lifescience

Interdisciplinary reviews (領域融合レビュー)

Creative Commons Attribution 2.1 Japan

http://leading.lifesciencedb.jp/

Page 16: Technology development of database integration in lifescience

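統合TV (togoTV)
• Video tutorials for databases and tools
  – Search by the name of a database or tool
• Also includes videos of the AJACS integrated database training lectures (統合データベース講演会AJACS)
• Also available on YouTube
• About 700 videos (including updated versions)

Creative Commons Attribution 2.1 Japan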

http://togotv.dbcls.jp/

http://youtube.com/togotv

Page 17: Technology development of database integration in lifescience

http://allie.dbcls.jp/

Page 18: Technology development of database integration in lifescience

inMeXes

http://docman.dbcls.jp/im/

Page 19: Technology development of database integration in lifescience

Page 20: Technology development of database integration in lifescience

Link to the LifeScience Dictionary site

Page 21: Technology development of database integration in lifescience

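Summary
• In the life sciences, centers such as DBCLS are working on database integration
  1. Developing database integration technology
  2. Creating reliable content
• We are currently in the "get people to use it first" phase
• Meanwhile, archived data is exploding because the performance of measurement instruments is improving faster than Moore's law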

Page 22: Technology development of database integration in lifescience

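Where we should go from here
• Change the situation in which researchers are reluctant to release data
  – Make sure data are cited properly
  – Put an end to the selling of data obtained with public research funds
• Spread the understanding that "circulating data brings benefits"
  – Better tracking functions
  – More success stories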

Page 23: Technology development of database integration in lifescience

Hidemasa Bono after his Genome karaoke presentation at the GENOME INFORMATICS meeting, Wellcome Trust Genome Campus, Hinxton, Cambridge, U.K.

Lead the next scientific revolution. Submit your best work to PLoS Biology. www.plos.org

I choose

Page 24: Technology development of database integration in lifescience

I still choose Open Access.
• BMC Genomics
  – Associate Editor (since December 2008)
• PLOS Supporter :)