KFC/STBI Structural Bioinformatics · fch.upol.cz/wp-content/uploads/2015/07/04_stbi_databases.pdf ·...
-
KFC/STBI Structural Bioinformatics
04: Databases
Karel Berka
-
http://www.rcsb.org/pdb/
Databases: there is no shortage of them…
-
Primary structural databases
• PDBe: Protein Data Bank in Europe – the PDB supplemented with data from BMRB (NMR) and EMDB (EM)
• PDBsum – collects additional information about each structure
• PDBwiki: A community annotated knowledge base of biological molecular structures – a Wikipedia for PDB structures
• NDB: Nucleic Acid Structure Database – database of nucleic acid structures
• CSD: Cambridge Structural Database – database of small-molecule crystal structures (paid)
• MODBASE: Database of Comparative Protein Structure Models – database of protein models
-
Secondary databases
• SCOP: Structural Classification of Proteins – finding structural families of proteins
• CATH – finding structural families of proteins
• GENE3D – structural genomics
• 3Dee – Database of Protein Domain Definitions
• FSSP: Fold Classification based on Structure-Structure Assignments – based on an exhaustive all-against-all 3D comparison (using DALI) of the protein structures currently in the Protein Data Bank (PDB)
-
PDBe http://www.ebi.ac.uk/pdbe/
• A comprehensive relational database of macromolecular structures
-
Example of an Atlas page, in this case for PDB entry 1E9F.
Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317 © The Author(s) 2009. Published by Oxford University Press.
PDBe
Navigation menu
Sequence annotated from other databases:
Uniprot
CATH
Pfam
SCOP
-
Schematic overview of the process by which SIFTS files are generated (see text for details).
Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317 © The Author(s) 2009. Published by Oxford University Press.
SIFTS format
Structure Integration with Function, Taxonomy and Sequence
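SIFTS provides residue-level mappings between PDB chains and UniProt sequences. A minimal sketch of how such a mapping is applied once it has been read into contiguous segments; the `Segment` record, its field names and the example numbers are hypothetical and do not reproduce the SIFTS file schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One contiguous PDB-to-UniProt mapped stretch (hypothetical layout, not SIFTS XML)."""
    pdb_start: int  # first PDB residue number in the segment
    pdb_end: int    # last PDB residue number in the segment (inclusive)
    unp_start: int  # UniProt position that corresponds to pdb_start


def pdb_to_uniprot(resnum: int, segments: list[Segment]) -> Optional[int]:
    """Return the UniProt position for a PDB residue number, or None if it is unmapped."""
    for seg in segments:
        if seg.pdb_start <= resnum <= seg.pdb_end:
            return seg.unp_start + (resnum - seg.pdb_start)
    return None  # e.g. an expression tag, or a residue absent from the UniProt sequence


# Illustrative mapping: PDB 1-50 -> UniProt 21-70, PDB 55-120 -> UniProt 80-145.
segments = [Segment(1, 50, 21), Segment(55, 120, 80)]
print(pdb_to_uniprot(10, segments))
print(pdb_to_uniprot(52, segments))
```

Real PDB numbering can carry insertion codes (e.g. 52A), so a fuller version would key residues on (number, insertion code) rather than a plain integer.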
-
PDBe services
• PDBeMine – http://www.ebi.ac.uk/pdbe-srv/msdmine – supports ad-hoc queries and data analysis based on the relational PDBe database
• OLDERADO – http://www.ebi.ac.uk/pdbe/olderado/ – clustering information for NMR entries in the PDB
• PDBeAnalysis – http://www.ebi.ac.uk/pdbe-as/PDBeValidate – validation and analysis of PDBe data
• PDBeTemplate – http://www.ebi.ac.uk/pdbe-as/PDBeTemplate/ – search of local residue interactions in the PDB
• PDBeFold – http://www.ebi.ac.uk/msd-srv/ssm/ – Secondary Structure Matching (SSM) service for comparing protein structures in 3D
• PDBePISA – http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html – search and analysis of protein interfaces, surfaces and assemblies
• PDBeMotif – http://www.ebi.ac.uk/pdbe-site/PDBeMotif/ – query and analysis of structure, sequence motifs and interactions
• PDBeChem – http://www.ebi.ac.uk/msd-srv/chempdb – ligand search using the PDB reference dictionary
• EMsearch – http://www.ebi.ac.uk/pdbe-srv/emsearch – search system for the EM Database
• PDBeLite – http://www.ebi.ac.uk/pdbe-srv/pdbelite – search system based on the relational PDBe database
• PDBeView – http://www.ebi.ac.uk/pdbe-srv/view – text-based and advanced PDB search tool
• PDBeMapQuick – http://www.ebi.ac.uk/pdbe-as/PDBeMapQuick/ – quick access to cross-references to external databases by PDB ID
• PDBeStatus – http://www.ebi.ac.uk/pdbe-as/pdbStatus – search system to query the status of PDB entries
• BIObar – http://www.ebi.ac.uk/pdbe/docs/biobar.html – search system implemented as a toolbar application for Mozilla browsers
-
Biobar
• A toolbar search application for Mozilla/Netscape or Firefox browsers
• http://biobar.mozdev.org/
• Simple and quick retrieval of data from PDBe and 45 other databases
-
PDBeChem
• "Ligands" in the PDB
• Bound molecules (e.g. sugars, lipids, inhibitors, coenzymes and cofactors)
• Unique three-letter code
– atoms, element types, connectivity, bond orders, stereochemical configuration
• Search by
– ligand code
– ligand name
– formula
– non-stereo SMILES
– stereo SMILES
– exact stereo structure
– fingerprint similarity
– fragment expression
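Search by fingerprint similarity ranks ligands by how many structural features (bits) they share with the query. The slide does not say which fingerprint or coefficient PDBeChem uses; the sketch below assumes the widely used Tanimoto coefficient and represents each fingerprint as a set of on-bit indices. The bit values are made up for illustration, only the three-letter codes are real PDB ligand codes.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query: set[int], library: dict[str, set[int]], threshold: float = 0.5):
    """Return (ligand code, score) pairs scoring at or above threshold, best first."""
    hits = [(code, tanimoto(query, fp)) for code, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints keyed by ligand code (bit indices are invented).
library = {
    "ATP": {1, 4, 7, 9, 12, 15},
    "ADP": {1, 4, 7, 9, 12},
    "HEM": {2, 3, 20, 21},
}
query = {1, 4, 7, 9, 12, 15}
print(rank_by_similarity(query, library))
```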
-
Example of a graphically defined query that can be submitted to PDBeMotif.
Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317 © The Author(s) 2009. Published by Oxford University Press.
PDBeMotif
• Search by
a) ligands and their 3D environment
b) protein families (SCOP, CATH, UniProt, EC number)
c) protein secondary structures and various 3D motifs (PROSITE, beta turns, catalytic sites, etc.)
d) sequences of protein Φ/Ψ angles
• Results:
a) multiple sequence alignment
b) 3D multiple alignment of fragments, motifs and protein chains
c) interaction statistics
d) motif characteristics and property distribution charts
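Searching by Φ/Ψ sequences requires the backbone torsion angles of every residue. Both are ordinary dihedral angles: Φ(i) is defined by C(i−1), N(i), Cα(i), C(i) and Ψ(i) by N(i), Cα(i), C(i), N(i+1). A pure-Python sketch; the input layout (one dict per residue with 'N', 'CA', 'C' coordinates) is an assumption, and chain breaks are not detected.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees (IUPAC sign convention), in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues: list[dict[str, Vec]]) -> list[tuple]:
    """(phi, psi) per residue; phi is None for the first residue, psi None for the last."""
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```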
-
• Define search by ligand
• Define search by sequence motif (pattern)
• Define search by metal site geometry
• Define search by environment
– has the same environment
– has a similar environment
PDBe-site page
• Compare ligand environments.
• Analyze interactions between ligand and protein.
• Compare binding environment.
• Look for ligands within a certain environment.
• Superpose binding sites and ligands.
• Predict what could bind to that empty pocket in your structure.
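Sequence motifs (patterns) are commonly written in PROSITE notation. A sketch that translates the core PROSITE syntax into a Python regular expression: `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden ones, `(n)` or `(n,m)` for repeats, and `<`/`>` for the chain termini. It does not handle every corner of the syntax (e.g. `>` inside brackets), and it is not how PDBe-site itself evaluates patterns. The test uses the N-glycosylation pattern `N-{P}-[ST]-{P}`.

```python
import re

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'C-x(2,4)-C-x(3)-[LIVMFYWC]' into a regex."""
    pattern = pattern.strip().rstrip(".")
    regex = ""
    if pattern.startswith("<"):  # anchored at the N-terminus
        regex += "^"
        pattern = pattern[1:]
    anchor_end = pattern.endswith(">")  # anchored at the C-terminus
    if anchor_end:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            part = "."
        elif residue.startswith("{"):
            part = "[^" + residue[1:-1] + "]"
        else:
            part = residue  # a single residue or a [..] class is already valid regex
        if low is not None:
            part += "{" + low + ("," + high if high else "") + "}"
        regex += part
    if anchor_end:
        regex += "$"
    return regex


print(prosite_to_regex("N-{P}-[ST]-{P}."))
```

`re.finditer` with the resulting expression reports non-overlapping hits only; overlapping motif occurrences need a lookahead or a position-by-position scan.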
-
What assembly can my structure have?
PDBePISA
• PQS – Protein Quaternary Structure
• very difficult to obtain by prediction; in practice it comes from crystallography and EM
-
The new EMViewer 3D visualization Java applet is available on the EMDB Atlas pages and allows interactive generation of isosurface representations.
Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317 © The Author(s) 2009. Published by Oxford University Press.
EMviewer
-
PDBsum
-
Schematic diagrams from the PDBsum 'Protein page' for entry 1a5z: lactate dehydrogenase from Thermotoga maritima (16).
Laskowski R A Nucl. Acids Res. 2009;37:D355-D359© 2008 The Author(s)
PDBsum
• Aims to gather all information in one place
• Additional analyses
– secondary structure schematic
– Ligplot
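Ligplot diagrams show which protein residues interact with a bound ligand. The first step, finding residues close to the ligand, can be sketched as a plain distance search. The 4.0 Å cutoff and the (label, coordinates) atom layout are assumptions; Ligplot itself additionally classifies contacts (hydrogen bonds versus hydrophobic contacts) by its own criteria, which this sketch does not attempt.

```python
import math


def ligand_contacts(ligand_atoms, protein_atoms, cutoff=4.0):
    """Residues with any atom within cutoff (Å) of any ligand atom, closest first.

    Atoms are (label, (x, y, z)) pairs; for protein atoms the label is a residue id
    such as 'ASP 85'. Returns a list of (residue id, closest distance).
    """
    closest = {}
    for _, lig_xyz in ligand_atoms:
        for res_id, prot_xyz in protein_atoms:
            d = math.dist(lig_xyz, prot_xyz)
            if d <= cutoff and d < closest.get(res_id, math.inf):
                closest[res_id] = d
    return sorted(closest.items(), key=lambda item: item[1])


# Invented coordinates, for illustration only.
ligand = [("O1", (0.0, 0.0, 0.0)), ("C2", (1.5, 0.0, 0.0))]
protein = [("ASP 85", (3.0, 0.0, 0.0)), ("ASP 85", (0.0, 2.8, 0.0)),
           ("LEU 12", (8.0, 0.0, 0.0)), ("SER 30", (1.5, 3.9, 0.0))]
print(ligand_contacts(ligand, protein))
```

The double loop is fine for one ligand; for whole structures a spatial grid or k-d tree avoids comparing every atom pair.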
-
Extracts from the protein–protein interaction diagrams in PDBsum for PDB entry 1mmo, a non-haem iron hydroxylase from Methylococcus capsulatus (17).
Laskowski R A Nucl. Acids Res. 2009;37:D355-D359© 2008 The Author(s)
PDBsum interfaces
-
NDB
-
NDB
• DNA
• RNA
-
NDB: 3D structure, 2D structure
RNAview
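RNAview derives the 2D structure (the base pairs) of an RNA from its 3D coordinates. A common compact way of writing such a secondary structure is dot-bracket notation; the sketch below assumes that notation (it is not necessarily RNAview's own output format) and recovers the base pairs with a stack. Pseudoknots, which need extra bracket types, are rejected.

```python
def dot_bracket_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted 0-based (i, j) base pairs from a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)  # remember the opening base
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # close the most recent open base
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} (pseudoknots not handled)")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(dot_bracket_pairs("(((...)))..((..))"))
```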
-
CSD
• The Cambridge Structural Database
• www.ccdc.cam.ac.uk
• small molecules
• paid; an open set of 500 compounds is available for teaching purposes

DB         Content                       Total (2009)   Per year
PDB        Proteins                      50730          6000
NDB        Nucleic acids                 3555           500
CSD        Organics, metal-organics      488057         40000
ICSD       Inorganics & minerals         100200         9000
CRYSTMET   Metals, alloys, inorganics    119600         9000
-
CSD components
-
WebCSD
-
Mercury
• Mercury visualiser
– crystal structure visualisation program by CCDC
• Free
• Teaching subset embedded
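Crystallographic files such as CIF store atom positions as fractional coordinates relative to the unit cell, so a viewer has to convert them to Cartesian coordinates before drawing. A sketch of the standard conversion, assuming the common convention with the a axis along x and b in the xy plane; this illustrates the crystallography, not Mercury's internal code.

```python
import math


def frac_to_cart(frac, a, b, c, alpha, beta, gamma):
    """Convert fractional (u, v, w) to Cartesian (x, y, z) in Å; cell angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume from the six cell parameters.
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    u, v, w = frac
    x = a * u + b * cg * v + c * cb * w
    y = b * sg * v + c * (ca - cb * cg) / sg * w
    z = volume / (a * b * sg) * w
    return (x, y, z)


# Centre of a cubic 10 Å cell is (5, 5, 5) Å, up to floating-point rounding.
print(frac_to_cart((0.5, 0.5, 0.5), 10, 10, 10, 90, 90, 90))
```

A quick sanity check for any cell: the fractional vectors (1,0,0), (0,1,0) and (0,0,1) must map to vectors of length a, b and c.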
-
And back to proteins...
-
Classification of protein structures
Class: similar secondary-structure content
Architecture, Topology (Fold): structural similarity
Superfamily: probably a common ancestor
• SCOP, CATH, FSSP, 3Dee
-
SCOP
• Structural Classification of Proteins
• Manual classification of protein structural domains based on similarities of their amino acid sequences and three-dimensional structures.
• SCOP uses four levels of hierarchical structural classification:
– class: general "structural architecture" of the domain
– fold: similar arrangement of regular secondary structures, but without evidence of evolutionary relatedness
– superfamily: sufficient structural and functional similarity to infer a divergent evolutionary relationship, but not necessarily detectable sequence homology
– family: some sequence similarity can be detected.
Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.
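SCOP labels each domain with a concise classification string (sccs) of four dot-separated fields: class, fold, superfamily and family (e.g. a.1.1.2). Comparing two strings from the top shows the deepest level the domains share, and the definitions above then say what that implies. A sketch; the example codes are illustrative and the MEANING texts paraphrase the slide.

```python
from typing import Optional

SCOP_LEVELS = ("class", "fold", "superfamily", "family")

# What sharing each level implies, paraphrasing the SCOP definitions above.
MEANING = {
    "family": "some sequence similarity can be detected",
    "superfamily": "probable divergent evolutionary relationship",
    "fold": "similar secondary-structure arrangement, no evidence of relatedness",
    "class": "same general structural class only",
    None: "different classes",
}


def parse_sccs(sccs: str) -> dict[str, str]:
    """Split an sccs such as 'a.1.1.2' into its four named levels."""
    parts = sccs.split(".")
    if len(parts) != len(SCOP_LEVELS):
        raise ValueError(f"expected 4 dot-separated fields, got {sccs!r}")
    return dict(zip(SCOP_LEVELS, parts))


def deepest_shared_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Deepest SCOP level two domains share, walking from class downwards."""
    a, b = parse_sccs(sccs_a), parse_sccs(sccs_b)
    shared = None
    for level in SCOP_LEVELS:
        if a[level] != b[level]:
            break  # lower levels are only meaningful inside a shared parent
        shared = level
    return shared


level = deepest_shared_level("a.1.1.2", "a.1.2.1")
print(level, "->", MEANING[level])
```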
-
CATH
• Manually curated hierarchical classification of protein domain structures
• More automated than SCOP
• Class
– secondary structure content (mainly alpha, mainly beta, mixed alpha/beta or 'few secondary structures')
• Architecture
– general arrangement of the secondary structures, irrespective of the connectivity between them (e.g. alpha/beta sandwich)
• Topology (Fold)
– connectivity of the secondary structures in the chain
• Homologous Superfamily
– domains that are believed to be related by a common ancestor
• S-levels
– automated clustering based on sequence identity
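A CATH node ID such as 3.40.50.720 encodes the hierarchy directly: each prefix names the ancestor node at the level above (3 = class, 3.40 = architecture, 3.40.50 = topology, 3.40.50.720 = homologous superfamily). A sketch that expands an ID level by level; the class names follow the list above, and the example ID is only illustrative.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# Class names as given on the slide; any other class number falls back to "other".
CATH_CLASSES = {
    "1": "Mainly alpha",
    "2": "Mainly beta",
    "3": "Mixed alpha/beta",
    "4": "Few secondary structures",
}


def describe_cath(node_id: str) -> list[str]:
    """Expand a CATH node ID into one 'Level: prefix' line per hierarchy level."""
    fields = node_id.split(".")
    if not 1 <= len(fields) <= len(CATH_LEVELS) or not all(f.isdigit() for f in fields):
        raise ValueError(f"not a CATH node ID: {node_id!r}")
    lines = []
    for depth in range(1, len(fields) + 1):
        label = CATH_LEVELS[depth - 1]
        if depth == 1:
            label += f" ({CATH_CLASSES.get(fields[0], 'other')})"
        lines.append(f"{label}: {'.'.join(fields[:depth])}")
    return lines


for line in describe_cath("3.40.50.720"):
    print(line)
```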
-
CATH
-
GENE3D
• Gene3D – a large collection of CATH protein domain assignments for ENSEMBL genomes and UniProt sequences
– functional information, as well as taxonomic distributions, multi-domain architectures and protein-protein interaction (PPI) data.
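A multi-domain architecture is the order in which domains (here, CATH superfamilies) occur along a protein sequence. A sketch that builds such an architecture string from domain assignments, assuming a hypothetical (start, end, superfamily) tuple format with continuous, non-overlapping domains; real CATH domains can be discontinuous, which this ignores, and the IDs are illustrative.

```python
def domain_architecture(assignments: list[tuple[int, int, str]]) -> str:
    """Join superfamily IDs in sequence order, e.g. '3.40.50.720-3.90.110.10'.

    Each assignment is (start, end, superfamily_id), 1-based and inclusive.
    """
    ordered = sorted(assignments, key=lambda dom: dom[0])
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] <= prev[1]:
            raise ValueError(f"overlapping domains {prev} and {cur}")
    return "-".join(sf for _, _, sf in ordered)


print(domain_architecture([(160, 300, "3.90.110.10"), (1, 150, "3.40.50.720")]))
```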
-
FSSP – fold classification www2.embl-ebi.ac.uk/dali/fssp/
structurally superimposed proteins (by DALI)
"Distance-matrix ALIgnment"
-
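DALI, used to build FSSP, compares proteins through their intramolecular Cα–Cα distance matrices rather than their raw coordinates. DALI's real score (an elastic similarity score combined with a search for the best alignment) is more elaborate; as a simpler relative of the same idea, the sketch below computes the distance-matrix RMS difference (dRMS) for two structures whose residues are already aligned one-to-one. No superposition is needed, because internal distances do not change under rotation or translation.

```python
import math


def distance_matrix(coords):
    """All-against-all Cα distance matrix (list of lists) for one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_matrix_rms(coords_a, coords_b):
    """RMS difference of two intramolecular distance matrices (dRMS).

    Assumes residue i of A is aligned to residue i of B.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally long structures with at least 2 residues")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    diffs = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs))


# A rotated and translated copy has the same distance matrix, so dRMS is ~0.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (7.6, 3.8, 1.0)]
b = [(-y + 10.0, x, z) for x, y, z in a]  # 90° rotation about z, then a shift
print(distance_matrix_rms(a, b))
```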
3Dee – domains
http://www.compbio.dundee.ac.uk/3Dee/
Hierarchy of individual domains
Clustering by structural similarity
Dengler, U., Siddiqui, A. S. & Barton, G. J. (2001). Protein structural domains: Analysis of the 3Dee domains database. Proteins 42, 332-344. Siddiqui, A. S., Dengler, U. & Barton, G. J. (2001). 3Dee: A database of protein structural domains. Bioinformatics 17, 200-201.
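One simple way to obtain a hierarchy of domains by structural similarity, not necessarily the procedure 3Dee uses, is single-linkage clustering repeated at progressively stricter thresholds: clusters at a stricter threshold always nest inside those at a looser one. A sketch using union-find; the domain names and pairwise distances (think of them as hypothetical structural distances, e.g. RMSD in Å) are invented.

```python
def single_linkage_clusters(names, distances, threshold):
    """Group domains linked by any chain of pairs with distance <= threshold.

    `distances` maps frozenset({a, b}) -> distance; missing pairs count as unrelated.
    """
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for pair, d in distances.items():
        if d <= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


names = ["d1", "d2", "d3", "d4", "d5"]
distances = {
    frozenset({"d1", "d2"}): 1.2,
    frozenset({"d2", "d3"}): 2.0,
    frozenset({"d4", "d5"}): 1.0,
    frozenset({"d3", "d4"}): 6.0,
}
for t in (8.0, 2.5, 1.1):  # looser to stricter: a nested hierarchy
    print(t, single_linkage_clusters(names, distances, t))
```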
-
Databases that didn't make it into this lecture...
• Relibase – protein-ligand interactions
• Modbase, SWISS-MODEL Repository, MMDB – model databases
• MolMovdb – Macromolecular Motions database
• Plenty of others, mostly specific to a particular problem – e.g. just for cytochromes P450:
• CYPED, SuperCyp, Cytochrome P450 Homepage, Fungal CYP database, CYPallelles, Arabidopsis Cytochrome P450s, Cytochrome P450 Drug Interactions Table, and others.
• Beyond that, there is nothing left but to use Google. :o)