adv bi unit 1

8/9/2019 adv bi unit 1


contained 10,378,022 records with a total of 11,302,156,937 bases; see the EMBL DB statistics page.

It can be accessed and searched through the SRS system at EBI, or one can download the entire database as flat files. An example of what an entry looks like is given for the human raf oncogene protein, ID: HSRAFR.

GenBank www.ncbi.nlm.nih.gov/Genbank/

The GenBank nucleotide database is maintained by the



primary ones, or have a different organization of the data to better suit some specific purpose. However, the nucleotide sequences themselves should always be available in the EMBL/GenBank databases. In this sense, the databases below are secondary databases.

UniGene www.ncbi.nlm.nih.gov/UniGene/

The UniGene system attempts to process the GenBank sequence data into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.

SGD genome-www.stanford.edu/Saccharomyces/

The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae.

EBI Genomes www.ebi.ac.uk/genomes/

This web site provides access and statistics for the completed genomes, and information about ongoing projects.

Genome Biology www.ncbi.nlm.nih.gov/Genomes/

The Genome Biology site at



Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system which produces and maintains automatic annotation of eukaryotic genomes.

Protein Sequence

The two protein sequence databases SWISS-PROT and PIR are different from the nucleotide databases in that they are both curated. This means that groups of designated curators (scientists) prepare the entries from literature and/or contacts with external experts.

SWISS-PROT, TrEMBL www.expasy.ch/sprot/

SWISS-PROT is a protein sequence database which strives to provide a high level of annotations (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

It was started in 1986 by Amos Bairoch in the Department of Medical Biochemistry at the University of Geneva. This database is generally considered one of the best protein sequence databases in terms of the quality of the annotation. Release 39.12 (11 Jan 2001) contained 92,211 entries.

TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT. The procedure that is used to produce it was developed by Rolf Apweiler. Release 15.1 (5 Jan 2001) contained 378,152 entries. The annotation of an entry in TrEMBL has not (yet) reached the standards required for inclusion into SWISS-PROT proper.

SWISS-PROT and TrEMBL are developed by the SWISS-PROT groups at the Swiss Institute of Bioinformatics (SIB) and at EBI. The databases can be accessed and searched through the SRS system at ExPASy, or one can download the entire database as one single flat file. An example of what an entry looks like is given for the human raf oncogene protein, ID KRAF_HUMAN.

The SWISS-PROT database has some legal restrictions: the entries themselves are copyrighted, but freely accessible and usable by academic researchers. Commercial companies must pay a license fee to SIB to use SWISS-PROT.

PIR pir.georgetown.edu/

The Protein Information Resource (PIR) is a division of the



PIR grew out of Margaret Dayhoff's work in the middle of the 1960s. It strives to be comprehensive, well-organized, accurate, and consistently annotated. However, it is generally believed that it does not reach the level of completeness in the entry annotation as does SWISS-PROT. Although SWISS-PROT and PIR overlap extensively, there are still many sequences which can be found in only one of them.

One can search for entries or do sequence similarity searches at the PIR site. The database can also be downloaded as a set of files. An example of what an entry looks like is given for the human raf-1 oncogene protein, ID TVHUF6.

PIR also produces the NRL-3D, which is a database of sequences extracted from the three-dimensional structures in the Protein Databank (PDB) (see also the following page in this lecture).



domain, it contains a multiple alignment of a set of defining sequences (the seeds) and the other sequences in SWISS-PROT and TrEMBL that can be matched to that alignment.

The database was started in 1996 and is maintained by a consortium of scientists, among them Erik Sonnhammer (CGR, KI, Sweden), Sean Eddy (WashU, St Louis, USA), Richard Durbin, Alan Bateman and Ewan Birney (Sanger Centre, UK). Release 5.5 (Sep 2000) contains 2478 families.

The alignments can be converted into hidden Markov models (HMM), which can be used to search for domains in a query protein sequence. The software HMMER (by Sean Eddy) is the computational foundation for Pfam. The domain structure of protein sequences in SWISS-PROT and TrEMBL is available directly from the Pfam web sites, and it is also possible to search for domains in other sequences using servers at the web sites.
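
As a rough illustration of how an alignment can be turned into a searchable model, the sketch below builds a position-specific scoring profile from a toy alignment and slides it along a query. It is a deliberately simplified stand-in for a profile HMM (no insert or delete states) and is not the method HMMER uses. The seed sequences, the pseudocount and all function names are assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return one log-odds table per alignment column (uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(len(alignment[0])):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, segment):
    """Sum the column scores for an ungapped segment of the query."""
    return sum(column[aa] for column, aa in zip(profile, segment))

def best_hit(profile, query):
    """Return (score, start) of the best-scoring window in the query."""
    width = len(profile)
    return max(
        (score(profile, query[i:i + width]), i)
        for i in range(len(query) - width + 1)
    )

seeds = ["GKVT", "GRVT", "GKIT"]      # hypothetical toy seed alignment
profile = build_profile(seeds)
hit_score, hit_start = best_hit(profile, "AAGKVTAA")
print(hit_start, round(hit_score, 2))
```

A real profile HMM additionally models insertions and deletions relative to the alignment, which is what lets it find domains that do not align without gaps.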

The technology behind Pfam/HMMER will be discussed in a lecture later in this course.

The Pfam database can be searched, or used to identify domains in a sequence, or downloaded from the websites above. An example of a multiple sequence alignment that defines a protein family (domain) is given for the Raf-like Ras-binding domain (Pfam name RBD, accession code PF02196).


The Pfam database is licensed under the G

Primary and Secondary databases.

Primary Databases:

Databases consisting of data derived experimentally, such as nucleotide sequences and three-dimensional structures, are known as primary databases.

Primary databases:

• have grown tremendously over the years
• contain information on the sequence or structure alone, plus associated annotation information

Secondary Databases:

Data derived from the analysis or treatment of primary data, such as secondary structures, hydrophobicity plots, and domains, are stored in secondary databases.

• contain information derived from a primary database, such as conserved sequences, signature sequences and active site residues of protein families, arrived at by multiple sequence alignment of a set of related proteins
• a secondary structure database contains entries of the PDB in an organized way (for instance, by classification of all PDB entries according to structures like alpha-helix or β-sheets) and also information on conserved secondary structure motifs of a particular protein

Composite databases:

• join a variety of different primary database sources, which obviates the need to search multiple resources

Primary databases -

GenBank:

• The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.
• This database is produced and maintained by the



• In the more than 30 years since its establishment, GenBank has become the most important and most influential database for research in almost all biological fields, whose data are accessed and cited by millions of researchers around the world.
• GenBank is built by direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centers.
• Only original sequences can be submitted to GenBank. Direct submissions are made to GenBank using BankIt, which is a Web-based form, or the stand-alone submission program, Sequin.
• Upon receipt of a sequence submission, the GenBank staff examines the originality of the data, assigns an accession number to the sequence and performs quality assurance checks.
• The submissions are then released to the public database, where the entries are retrievable by Entrez or downloadable by FTP.
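
Retrieval by accession number through Entrez can be sketched as below. The snippet only builds a request URL for the NCBI E-utilities `efetch` endpoint and makes no network call; the accession used is a placeholder chosen for illustration, and the helper name is an assumption of this example.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def genbank_fetch_url(accession):
    """Build a URL that requests one nucleotide record in GenBank flat-file format."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession number assigned at submission
        "rettype": "gb",      # GenBank flat-file record
        "retmode": "text",    # plain text rather than XML
    }
    return EUTILS_EFETCH + "?" + urlencode(params)

url = genbank_fetch_url("X03484")   # placeholder accession for illustration
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; bulk downloads are better served by the FTP releases mentioned above.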

EMBL:

• The EMBL database (www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources.
• The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA).
• Data are exchanged daily between the collaborating institutes.
• Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments.
• Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office.



• Other tools are available for sequence similarity searching (e.g. FASTA and BLAST).

DDBJ:

• The DNA Data Bank of Japan (DDBJ) is a biological database that collects DNA sequences.



• DDBJ is primarily funded by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT).
• The principal purpose of DDBJ operations is to improve the quality of



• The sequence database consists of sequence entries. Sequence entries are composed of different line types, each with their own format. For standardization purposes the format of SWISS-PROT follows as closely as possible that of the EMBL database.
• TrEMBL contains high-quality computationally analyzed records, which are enriched with automatic annotation.
• It was introduced in response to increased dataflow resulting from genome projects, as the time- and labour-consuming manual annotation process of UniProtKB/Swiss-Prot could not be broadened to include all available protein sequences.



• The translations of annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered in UniProtKB/TrEMBL. UniProtKB/TrEMBL also contains sequences from PDB, and from gene prediction, including Ensembl, RefSeq and CCDS.
• Due to the nature of the source, UniProtKB/TrEMBL is highly redundant and the quality of the annotation is very variable. As well as the original annotations carried over from EMBL-Bank, additional annotations are added based on a series of automated annotation workflows.
• As the entries in UniProtKB/TrEMBL are manually reviewed by the UniProt curators, they graduate into UniProtKB/Swiss-Prot (the human-curated section of UniProtKB) and may be merged into existing entries which describe the same gene in the same species.
• The usual Swiss-Prot annotation pipeline involves the manual annotation of TrEMBL entries, their integration into Swiss-Prot, with their original accession number, and subsequent deletion from TrEMBL.
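
The entry layout mentioned earlier in this list (entries built from line types, each with its own format) can be illustrated with a small parser. This is a minimal sketch assuming the common flat-file convention of a two-letter line code followed by the data and a `//` terminator; the example entry is invented and does not correspond to any real record.

```python
from collections import defaultdict

def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):          # entry terminator
            break
        code, data = line[:2], line[5:].strip()
        if code.strip():                   # sequence lines start with blanks
            fields[code].append(data)
    return dict(fields)

# Invented entry used only to exercise the parser.
example = """ID   EXAMPLE_HUMAN   Reviewed;   10 AA.
AC   X00000;
DE   Hypothetical example protein.
OS   Homo sapiens (Human).
SQ   SEQUENCE   10 AA;
     MKVLAAGIVG
//"""

entry = parse_entry(example)
print(entry["ID"], entry["AC"])
```

Keeping every line type in a list preserves fields that legitimately span several lines, such as descriptions or references.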

Secondary databases:

PROSITE:

• PROSITE is a protein database. It consists of entries describing the protein families, domains and functional sites as well as amino acid patterns and profiles in them.
• It is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families.
• Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.
• PROSITE currently contains patterns and profiles specific for more than a thousand protein families or domains.
• Each of these signatures comes with documentation providing background information on the structure and function of these proteins.
• The ProRule section of PROSITE consists of manually created rules that can automatically generate annotation in the UniProtKB/Swiss-Prot format based on PROSITE motifs.


• PROSITE's uses include identifying possible functions of newly discovered proteins and analysis of known proteins for previously undetermined activity.
• PROSITE offers tools for protein sequence analysis and motif detection (see sequence motif, PROSITE patterns). It is part of the ExPASy proteomics analysis servers.
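
Motif detection with a PROSITE-style pattern can be sketched by translating the pattern notation into a regular expression. The converter below is a minimal sketch that covers the common elements (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` as terminus anchors); it does not handle anchors placed inside brackets. The function name and the test sequence are assumptions of this example.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into Python regular expression syntax."""
    regex = pattern.rstrip(".")                            # trailing period ends a pattern
    regex = regex.replace("-", "")                         # separators carry no meaning
    regex = regex.replace("{", "[^").replace("}", "]")     # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")      # repeat counts
    regex = regex.replace("x", ".")                        # any residue
    regex = regex.replace("<", "^").replace(">", "$")      # N- and C-terminal anchors
    return regex

# N-glycosylation site pattern written in PROSITE notation.
glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
match = re.search(glyco, "AANGSAA")        # invented test sequence
print(glyco, match.start())
```

The order of the replacements matters: braces must be rewritten before parentheses are turned into braces, otherwise repeat counts would be mistaken for forbidden-residue sets.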

PRINTS:

• The PRINTS database is a collection of so-called "fingerprints".
• It provides both a detailed annotation resource for protein families, and a diagnostic tool for newly determined sequences.
• A fingerprint is a group of conserved motifs taken from a multiple sequence alignment - together, the motifs form a characteristic signature for the aligned protein family.
• The motifs themselves are not necessarily contiguous in sequence, but may come together in 3D space to define molecular binding sites or interaction surfaces.


• The particular diagnostic strength of fingerprints lies in their ability to distinguish sequence differences at the clan, superfamily, family and subfamily levels.
• This allows fine-grained functional diagnoses of uncharacterised sequences, allowing, for example, discrimination between family members on the basis of the ligands they bind or the proteins with which they interact, and highlighting potential oligomerisation or allosteric sites.

• A;-
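
The idea of a fingerprint as several separate motifs that must all be present, in order but not necessarily adjacent, can be sketched as follows. This is a simplified illustration that uses regular expressions and a greedy left-to-right search; it is not how PRINTS itself scores matches, and the motifs and sequence are invented for the example.

```python
import re

def match_fingerprint(sequence, motifs):
    """Return the start of each motif if all occur in order without overlap, else None."""
    positions = []
    cursor = 0
    for motif in motifs:
        hit = re.compile(motif).search(sequence, cursor)
        if hit is None:
            return None                # one missing motif breaks the fingerprint
        positions.append(hit.start())
        cursor = hit.end()             # the next motif must start after this one
    return positions

fingerprint = ["GK[ST]", "DE.D", "W..P"]     # hypothetical motifs
hits = match_fingerprint("AAGKSAAAADEADAAWAAPA", fingerprint)
print(hits)
```

A partial match (some motifs present, others absent) is often still informative in practice, which a scoring scheme can capture better than this all-or-nothing check.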



• View protein domain architectures
• Examine species distribution
• Follow links to other databases
• View known protein structures



The database can be searched by e-mail and World Wide Web (WWW) servers (http://blocks.fhcrc.org/help) to classify protein and nucleotide sequences. The description of a protein family by its conserved regions focuses on the family's characteristic and distinctive sequence features, thus reducing noise. Databases of conserved features of protein families can be utilized to classify sequences from proteins, cDNA



BioGRID

• The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein and genetic interactions created in 2003.
• It strives to provide a comprehensive resource of protein-protein and genetic interactions for all major model organism species while attempting to remove redundancy to create a single mapping of interactions.
• The Biological General Repository for Interaction Datasets (BioGRID) database was developed to house and distribute collections of protein and genetic interactions from major model organism species.
• Users of the BioGRID can search for their protein of interest and retrieve annotation, as well as physical and genetic interaction data as reported by the primary literature and compiled by in-house large-scale curation efforts.


• Originally separated into organism-specific databases, the newest version now provides a unified front end allowing for searches across several organisms simultaneously.
• The BioGRID is funded by the BBSRC,
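
Removing redundancy to obtain a single mapping of interactions, as described above, can be sketched by treating each interaction as an unordered pair and merging the reports that support it. This is an illustrative sketch only; the record layout, the gene symbols and the source labels are assumptions, not BioGRID's actual schema or data.

```python
def merge_interactions(records):
    """Collapse duplicate reports of the same undirected interaction."""
    merged = {}
    for gene_a, gene_b, source in records:
        # Sort the pair so that A-B and B-A map to the same key.
        key = tuple(sorted((gene_a.upper(), gene_b.upper())))
        merged.setdefault(key, set()).add(source)
    return merged

records = [                       # hypothetical curated reports
    ("RAF1", "HRAS", "paper-1"),
    ("HRAS", "RAF1", "paper-2"),
    ("raf1", "hras", "paper-1"),
    ("RAF1", "MAP2K1", "paper-3"),
]
interactions = merge_interactions(records)
for pair, sources in sorted(interactions.items()):
    print(pair, sorted(sources))
```

Keeping the set of supporting sources per pair preserves the evidence trail while still presenting one entry per interaction.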



• Each of the member databases of InterPro contributes towards a different niche, from very high-level, structure-based classifications (SUPERFAMILY and CATH-Gene3D) through to quite specific sub-family classifications (PRINTS and PANTHER).



• The data, typically obtained by X-ray crystallography or



• A motivation for this classification is to determine the evolutionary relationship between proteins.
• Proteins with the same shapes but having little sequence or functional similarity are placed in different "superfamilies", and are assumed to have only a very distant common ancestor.
• Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.

The SCOP database is freely accessible on the internet. SCOP was created in 1994. The source of protein structures is the Protein Data Bank. The unit of classification of structure in SCOP is the protein domain.

The shapes of domains are called "folds" in SCOP. Domains belonging to the same fold have the same major secondary structures in the same arrangement with the same topological connections.


The levels of SCOP are as follows.

• Class: Types of folds, e.g., beta sheets.
• Fold: The different shapes of domains within a class.
• Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor.
• Family: The domains in a superfamily are grouped into families, which have a more recent common ancestor.
• Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein.
• Species: The domains in "protein domains" are grouped according to species.
• Domain: part of a protein. For simple proteins, it can be the entire protein.
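
The upper four levels listed above are commonly written as a dotted classification string such as `a.1.1.2` (class, fold, superfamily, family). The sketch below parses such strings and reports the deepest level two domains share; the type and function names are assumptions of this example, and the strings used are illustrative rather than taken from specific SCOP entries.

```python
from typing import NamedTuple, Optional

class ScopPath(NamedTuple):
    scop_class: str
    fold: int
    superfamily: int
    family: int

def parse_sccs(sccs: str) -> ScopPath:
    """Split a dotted classification string into its four levels."""
    cls, fold, superfamily, family = sccs.split(".")
    return ScopPath(cls, int(fold), int(superfamily), int(family))

def deepest_shared_level(a: ScopPath, b: ScopPath) -> Optional[str]:
    """Name the lowest level of the hierarchy at which two domains still agree."""
    shared = None
    for name, x, y in zip(("class", "fold", "superfamily", "family"), a, b):
        if x != y:
            break
        shared = name
    return shared

first = parse_sccs("a.1.1.2")
second = parse_sccs("a.1.1.3")
print(deepest_shared_level(first, second))
```

Two domains that agree down to the superfamily level are, in SCOP's terms, assumed to share at least a distant common ancestor.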

CATH:

The CATH Protein Structure Classification is a semi-automatic, hierarchical classification of protein domains. CATH shares many broad features with its principal rival, SCOP; however, there are also many areas in which the detailed classification differs greatly.

Only crystal structures solved to resolution better than 4.0 angstroms are considered, together with



Homologous superfamily: indicative of a demonstrable evolutionary relationship. Equivalent to the superfamily level of SCOP.

Class is determined according to the secondary structure composition and packing within the structure. Three major classes are recognised: mainly-alpha, mainly-beta and alpha-beta.
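
A toy version of assigning a class from secondary structure composition is sketched below. It looks only at the fraction of helix and strand residues in a per-residue string and ignores packing entirely; the 10% cut-off, the `H`/`E` letter codes and the function name are arbitrary assumptions for illustration and are not CATH's actual criteria.

```python
def assign_class(secondary_structure, threshold=0.10):
    """Label a domain from a per-residue string: H = helix, E = strand, other = coil."""
    length = len(secondary_structure)
    helix = secondary_structure.count("H") / length
    strand = secondary_structure.count("E") / length
    has_helix = helix >= threshold      # illustrative cut-off only
    has_strand = strand >= threshold
    if has_helix and has_strand:
        return "alpha-beta"
    if has_helix:
        return "mainly-alpha"
    if has_strand:
        return "mainly-beta"
    return "unassigned"

print(assign_class("--HHHHHHHH--HHHHHH--"))
print(assign_class("-EEEEE--EEEEE--EEEE-"))
print(assign_class("-HHHHHH--EEEEE--HHHH"))
```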

EuroCarbDB:

• EuroCarbDB is an EU-funded initiative for the creation of software and standards for the systematic collection of carbohydrate structures and their experimental data.
• The EUROCarbDB project is a design study for a technical framework, which provides sophisticated, freely accessible, open-source informatics tools and databases to support glycobiology and glycomic research.


• EUROCarbDB is a relational database containing glycan structures, their biological context and, when available, primary and interpreted analytical data from high-performance liquid chromatography, mass spectrometry and nuclear magnetic resonance experiments.
• Database content can be accessed via a web-based user interface.
• The database is complemented by a suite of glycoinformatics tools, specifically designed to assist the elucidation and submission of glycan structure and experimental data when used in conjunction with contemporary carbohydrate research workflows.
• The project includes a database of known carbohydrate structures and experimental data, specifically mass spectrometry, HPLC and



• A specific design objective of the architecture of the database was to allow for the extension and incorporation of new modules and tools to support further types of experimental data and workflows.

PubChem Compound:

PubChem is a database of chemical molecules and their activities against biological assays. The system is maintained by the



Substances,

BioAssay,

• PubChem Compound is a searchable database of chemical structures with validated chemical depiction information provided to describe substances in PubChem Substance.
• Structures stored within PubChem Compound are pre-clustered and cross-referenced by identity and similarity groups.

• PubChem Compound includes over 5M compounds.

• Molecular chemical properties, and descriptors.
• Simple Elemental Searches (all compounds containing Gallium) allow searching with specific element restrictions.
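
An elemental search of the kind described above can be sketched by extracting element symbols from molecular formulas and filtering on them. This is a minimal sketch assuming simple formulas without parentheses or charges; the small compound table and the function names are assumptions of this example and say nothing about how PubChem implements its search.

```python
import re

# One capital letter, an optional lowercase letter, then an optional count.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")

def elements_in(formula):
    """Return the set of element symbols that appear in a simple formula."""
    return {symbol for symbol, _count in ELEMENT.findall(formula)}

def search_by_elements(compounds, required, excluded=()):
    """Names of compounds containing all required and none of the excluded elements."""
    required, excluded = set(required), set(excluded)
    return [
        name
        for name, formula in compounds.items()
        if required <= elements_in(formula)
        and not (excluded & elements_in(formula))
    ]

compounds = {                         # small illustrative table
    "gallium arsenide": "GaAs",
    "gallium nitride": "GaN",
    "glucose": "C6H12O6",
    "trimethylgallium": "C3H9Ga",
}
with_gallium = search_by_elements(compounds, {"Ga"})
inorganic_gallium = search_by_elements(compounds, {"Ga"}, excluded={"C"})
print(with_gallium, inorganic_gallium)
```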

DrugBank:

• The database contains more than 30 million unique molecules from over 50 data sources including: U.S. Food and Drug Administration (FDA),



weight range, CAS numbers, suppliers, etc. The search can be used to widen or restrict already found results.

• Structure searching on mobile devices can be done using free apps for iOS (iPhone/iPod/iPad) and for Android.

and Cambridge Structural Database.

• The Cambridge Structural Database (CSD) is a repository for small molecule crystal structures.
• Scientists use single-crystal X-ray crystallography to determine the crystal structure of a compound.
• Once the structure is solved, information about the structure is saved in a file (CIF format) and deposited in the CSD.
• Other scientists can search and retrieve structures from the database.
• The information consists of the space group symmetry of the crystalline phase, its cell parameters, and the relative atomic coordinates of all the atoms in the cell in 3D.


• Scientists can use the CSD to compare existing data with that obtained from crystals grown in their laboratories.
• The information can also be used to visualize the structure in a variety of software such as atoms, powdercell etc.
• It is also possible to calculate what the theoretical powder diffraction pattern of the phase would look like. This option is particularly important for analytical reasons because it facilitates the identification of phases present in a crystalline powder mixture without the need for growing crystals.
• Many of the small molecules are organic compounds of the sort that could potentially act as medical drugs, and a very important use of the CSD is for structural comparisons among related molecules that can suggest new leads for drug design.
• The CSD is compiled and maintained by the Cambridge Crystallographic Data Centre.
• Each crystal structure undergoes extensive validation and cross-checking by expert chemists and crystallographers to ensure that the CSD is maintained to the highest possible standards.
• Also, each database entry is enriched with bibliographic, chemical and physical property information, adding further value to the raw structural data.
• These editorial processes are vital for enabling scientists to interpret structures in a chemically meaningful way.
• The CSD is continually updated with new structures (>40,000 new structures each year) and with improvements to existing entries.
• With regular web-updates and early online access to newly published structures you can keep fully informed of the latest research.
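
The calculation of theoretical powder diffraction peak positions mentioned above can be illustrated for the simplest case. The sketch assumes a cubic unit cell, where the spacing of the (hkl) planes is d = a / sqrt(h² + k² + l²), and applies Bragg's law, λ = 2 d sin θ. The cell edge, wavelength and function name are illustrative assumptions; a real calculation would also need the space group and atomic coordinates to obtain peak intensities.

```python
import math

def two_theta_cubic(a, hkl, wavelength=1.5406):
    """Return the diffraction angle 2-theta in degrees for a cubic cell, or None.

    a and wavelength are in angstroms; the default wavelength is Cu K-alpha.
    """
    h, k, l = hkl
    d = a / math.sqrt(h * h + k * k + l * l)      # interplanar spacing
    sin_theta = wavelength / (2.0 * d)            # Bragg's law
    if sin_theta > 1.0:
        return None                               # reflection not reachable
    return 2.0 * math.degrees(math.asin(sin_theta))

# Illustrative cubic cell edge of 5.64 angstroms (roughly that of rock salt).
for reflection in [(1, 1, 1), (2, 0, 0), (2, 2, 0)]:
    print(reflection, round(two_theta_cubic(5.64, reflection), 2))
```

Comparing a list of such calculated angles with a measured powder pattern is the basis of phase identification without growing single crystals.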
