
  • KFC/STBI Structural Bioinformatics

    04: Databases

    Karel Berka

  • http://www.rcsb.org/pdb/

    Databases: there are quite a few of them…

  • Primary structural databases

    • PDBe: Protein Data Bank in Europe – supplements the PDB with data from BMRB (NMR) and EMDB (EM)

    • PDBsum – collects additional information about each structure

    • PDBwiki: A community annotated knowledge base of biological molecular structures – a wiki about PDB structures

    • NDB: Nucleic Acid Structure Database – database of nucleic acid structures

    • CSD: Cambridge Structural Database – database of small-molecule crystal structures (paid)

    • MODBASE: Database of Comparative Protein Structure Models – database of protein models

  • Secondary databases

    • SCOP: Structural Classification of Proteins – finding structural families of proteins

    • CATH – finding structural families of proteins

    • GENE3D – structural genomics

    • 3Dee – Database of Protein Domain Definitions

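    • FSSP: Fold classification based on Structure-Structure alignment of Proteins – based on an exhaustive all-against-all 3D structure comparison of the protein structures currently in the Protein Data Bank (PDB)

    • DALI: Distance-matrix ALIgnment – the structure comparison method used to superimpose the FSSP structures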

  • PDBe

    http://www.ebi.ac.uk/pdbe/

    • Comprehensive relational database of macromolecular structures

  • Example of an Atlas page, in this case for PDB entry 1E9F.

    Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317 © The Author(s) 2009. Published by Oxford University Press.

    PDBe Atlas page: navigation menu; sequence annotated with data from other databases (UniProt, CATH, Pfam, SCOP)

  • Schematic overview of the process by which SIFTS files are generated (see text for details).

    Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317 © The Author(s) 2009. Published by Oxford University Press.

    SIFTS format: Structure Integration with Function, Taxonomy and Sequence
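    SIFTS links PDB residues to positions in UniProt sequences. As a rough illustration of what such a residue-level mapping does, here is a minimal Python sketch that assumes the mapping has already been reduced to contiguous segments in which both numberings advance together. The `Segment` layout and the `pdb_to_uniprot` helper are hypothetical; real SIFTS files use their own XML and tabular formats, and PDB insertion codes are ignored here.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """Hypothetical, simplified mapping segment (not the real SIFTS schema).

    Within one segment, PDB residue numbers and UniProt positions
    increase together one-to-one.
    """
    chain: str
    pdb_start: int
    pdb_end: int
    unp_start: int


def pdb_to_uniprot(segments: list[Segment], chain: str, resnum: int) -> Optional[int]:
    """Return the UniProt position of a PDB residue, or None if it is unmapped."""
    for seg in segments:
        if seg.chain == chain and seg.pdb_start <= resnum <= seg.pdb_end:
            # Same offset from the segment start in both numbering schemes.
            return seg.unp_start + (resnum - seg.pdb_start)
    # Residues between segments (e.g. tags or linkers absent from UniProt).
    return None


segments = [Segment("A", 1, 50, 20), Segment("A", 55, 120, 75)]
print(pdb_to_uniprot(segments, "A", 10))  # 29
print(pdb_to_uniprot(segments, "A", 60))  # 80
print(pdb_to_uniprot(segments, "A", 52))  # None
```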

  • PDBe services

    PDBeMine: supports ad-hoc queries and data analysis based on the relational PDBe database (http://www.ebi.ac.uk/pdbe-srv/msdmine)

    OLDERADO: clustering information for NMR entries in the PDB (http://www.ebi.ac.uk/pdbe/olderado/)

    PDBeAnalysis: validation and analysis of PDBe data (http://www.ebi.ac.uk/pdbe-as/PDBeValidate)

    PDBeTemplate: search of local residue interactions in the PDB (http://www.ebi.ac.uk/pdbe-as/PDBeTemplate/)

    PDBeFold: Secondary Structure Matching (SSM) service for comparing protein structures in 3D (http://www.ebi.ac.uk/msd-srv/ssm/)

    PDBePISA: search and analysis of Protein Interfaces, Surfaces and Assemblies (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html)

    PDBeMotif: query and analysis of structure, sequence motifs and interactions (http://www.ebi.ac.uk/pdbe-site/PDBeMotif/)

    PDBeChem: ligand search using the PDB reference dictionary (http://www.ebi.ac.uk/msd-srv/chempdb)

    EMsearch: search system for the EM Database (http://www.ebi.ac.uk/pdbe-srv/emsearch)

    PDBeLite: search system based on the relational PDBe database (http://www.ebi.ac.uk/pdbe-srv/pdbelite)

    PDBeView: text-based and advanced PDB search tool (http://www.ebi.ac.uk/pdbe-srv/view)

    PDBeMapQuick: quick access to cross-reference information to external databases based on PDB ID (http://www.ebi.ac.uk/pdbe-as/PDBeMapQuick/)

    PDBeStatus: search system to query the status of PDB entries (http://www.ebi.ac.uk/pdbe-as/pdbStatus)

    BIObar: search system implemented as a toolbar application for Mozilla browsers (http://www.ebi.ac.uk/pdbe/docs/biobar.html)

  • Biobar

    A toolbar search application for Mozilla/Netscape or Firefox browsers (http://biobar.mozdev.org/)

    Simple and quick retrieval of data from PDBe and 45 other databases

  • PDBeChem

    • "Ligands" in the PDB

    • Bound molecules (e.g. sugars, lipids, inhibitors, coenzymes and cofactors)

    • Unique three-letter code – the definition records atoms, element types, connectivity, bond orders and stereochemical configuration

    • Search by:

      – ligand code

      – ligand name

      – formula

      – non-stereo SMILES

      – stereo SMILES

      – exact stereo structure

      – fingerprint similarity

      – fragment expression
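    Fingerprint-similarity search compares binary structural fingerprints of ligands. A common score for this is the Tanimoto coefficient (bits set in both fingerprints divided by bits set in either); whether PDBeChem used exactly this measure is an assumption here. The sketch below takes fingerprints as precomputed sets of "on" bit indices; how those bits are derived from a structure is out of scope, and the ligand codes and bit sets are made up.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit-set fingerprints."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def similar_ligands(query: set[int], library: dict[str, set[int]], threshold: float = 0.5):
    """Return (code, score) pairs scoring at least `threshold`, best match first."""
    scored = [(code, tanimoto(query, fp)) for code, fp in library.items()]
    hits = [(code, s) for code, s in scored if s >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up codes and fingerprints, for illustration only.
library = {"XX1": {1, 2, 3, 4}, "XX2": {1, 2, 3, 9}, "XX3": {7, 8}}
print(similar_ligands({1, 2, 3, 4}, library))  # [('XX1', 1.0), ('XX2', 0.6)]
```

    The threshold of 0.5 is an arbitrary example value; raising it narrows the hit list to closer analogues.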

  • Example of a graphically defined query that can be submitted to PDBeMotif.

    Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317 © The Author(s) 2009. Published by Oxford University Press.

    PDBeMotif

    • Search by:

    a) ligands and their 3D environment

    b) protein families (SCOP, CATH, UniProt, EC number)

    c) protein secondary structures and various 3D motifs (PROSITE, beta turns, catalytic sites, etc.)

    d) sequences of protein Φ/Ψ angles

    • Results:

    a) multiple sequence alignment

    b) 3D multiple alignment of fragments, motifs and protein chains

    c) interaction statistics

    d) charts of motif characteristics and property distributions
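    Searching by Φ/Ψ sequences presupposes computing the backbone torsion angles: Φ(i) is the dihedral defined by atoms C(i-1), N(i), CA(i), C(i), and Ψ(i) the one defined by N(i), CA(i), C(i), N(i+1). Below is a minimal pure-Python sketch of that dihedral calculation (not PDBeMotif's own code); coordinates are plain (x, y, z) tuples and all helper names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle about the p1-p2 bond, in degrees within [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> float:
    """Phi(i) from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec, ca: Vec, c: Vec, n_next: Vec) -> float:
    """Psi(i) from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)), 6))   # 0.0 (cis)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)), 6))  # 180.0 (trans)
```

    A Φ/Ψ "sequence" for a chain is then just these two angles evaluated residue by residue.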

  • PDBe-site page

    • Define search by ligand

    • Define search by sequence motif (pattern)

    • Define search by metal site geometry

    • Define search by environment: has same environment / has similar environment

    • Compare ligand environments.

    • Analyze interactions between ligand and protein.

    • Compare binding environment.

    • Look for ligands within a certain environment.

    • Superpose binding sites and ligands.

    • Predict what could bind that empty pocket in your structure.
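    Environment-based searches need a working definition of a ligand's environment. One simple, commonly used definition is the set of protein residues having any atom within a distance cutoff of any ligand atom. The sketch assumes a 4 Å cutoff and a hypothetical `Atom` record; PDBe-site's actual definition of an environment may differ.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record."""
    chain: str
    resnum: int
    resname: str
    name: str
    xyz: tuple[float, float, float]


def ligand_environment(protein: list[Atom], ligand: list[Atom], cutoff: float = 4.0):
    """Residues with at least one atom within `cutoff` angstroms of any ligand atom.

    Brute-force O(N*M) distance check; large structures would need a spatial index.
    """
    residues = set()
    for atom in protein:
        if any(math.dist(atom.xyz, lig.xyz) <= cutoff for lig in ligand):
            residues.add((atom.chain, atom.resnum, atom.resname))
    return sorted(residues)


protein = [
    Atom("A", 10, "SER", "OG", (0.0, 0.0, 0.0)),
    Atom("A", 11, "GLY", "CA", (10.0, 0.0, 0.0)),
    Atom("A", 12, "HIS", "NE2", (3.0, 0.0, 0.0)),
]
ligand = [Atom("L", 1, "LIG", "C1", (1.0, 0.0, 0.0))]
print(ligand_environment(protein, ligand))  # [('A', 10, 'SER'), ('A', 12, 'HIS')]
```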

  • What assembly can my structure form?

    PDBePISA

    • PQS – Protein Quaternary Structure

    • Very difficult to obtain by prediction; it comes mainly from crystallography and EM

  • The new EMViewer 3D visualization Java applet is available on the EMDB Atlas pages and allows interactive generation of isosurface representations.

    Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317 © The Author(s) 2009. Published by Oxford University Press.

    EMviewer

  • PDBsum

  • Schematic diagrams from the PDBsum 'Protein page' for entry 1a5z: lactate dehydrogenase from Thermotoga maritima (16).

    Laskowski R A Nucl. Acids Res. 2009;37:D355-D359 © 2008 The Author(s)

    PDBsum

    • Aims to keep all the information in one place

    • Additional analyses:

      – secondary structure schematic

      – Ligplot
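    A secondary-structure schematic starts from a per-residue assignment (for example DSSP's one-letter codes) collapsed into continuous helix and strand elements. A minimal sketch of that collapsing step follows. The grouping of codes (H, G, I as helix; E as strand; everything else as coil) is an assumption of this example, not PDBsum's documented rule.

```python
from itertools import groupby

# Assumed grouping of DSSP codes into drawable elements; the rest counts as coil.
ELEMENT = {"H": "helix", "G": "helix", "I": "helix", "E": "strand"}


def ss_elements(ss: str, min_len: int = 1) -> list[tuple[str, int, int]]:
    """Collapse a per-residue secondary-structure string into
    (element, start, end) runs, 1-based and inclusive, skipping coil."""
    elements = []
    pos = 1
    for kind, run in groupby(ss, key=lambda code: ELEMENT.get(code, "coil")):
        length = len(list(run))
        if kind != "coil" and length >= min_len:
            elements.append((kind, pos, pos + length - 1))
        pos += length
    return elements


print(ss_elements("--HHHH--EEE-EE-"))
# [('helix', 3, 6), ('strand', 9, 11), ('strand', 13, 14)]
print(ss_elements("--HHHH--EEE-EE-", min_len=3))
# [('helix', 3, 6), ('strand', 9, 11)]
```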

  • Extracts from the protein–protein interaction diagrams in PDBsum for PDB entry 1mmo, a non-haem iron hydroxylase from Methylococcus capsulatus (17).

    Laskowski R A Nucl. Acids Res. 2009;37:D355-D359 © 2008 The Author(s)

    PDBsum interfaces

  • NDB

    • DNA

    • RNA

  • NDB: 3D structure and the corresponding 2D structure

    RNAview
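    A 2D structure like the one shown is built from the base pairs identified in the 3D model. One compact textual form of a 2D structure is dot-bracket notation. The hypothetical helper below renders a base-pair list that is assumed to be already known (it does not read RNAview output) and rejects crossing pairs (pseudoknots), which plain brackets cannot express.

```python
def _crosses(p: tuple[int, int], q: tuple[int, int]) -> bool:
    """True if two base pairs interleave (i < k < j < m), i.e. form a pseudoknot."""
    (i, j), (k, m) = sorted(p), sorted(q)
    return i < k < j < m or k < i < m < j


def dot_bracket(length: int, pairs: list[tuple[int, int]]) -> str:
    """Render 1-based base pairs as dot-bracket notation."""
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            if _crosses(pairs[a], pairs[b]):
                raise ValueError(f"pairs {pairs[a]} and {pairs[b]} cross (pseudoknot)")
    s = ["."] * length
    for pair in pairs:
        i, j = sorted(pair)
        if s[i - 1] != "." or s[j - 1] != ".":
            raise ValueError(f"base {i} or {j} is already paired")
        s[i - 1], s[j - 1] = "(", ")"
    return "".join(s)


print(dot_bracket(10, [(1, 10), (2, 9), (3, 8)]))  # (((....)))
```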

  • CSD

    • The Cambridge Structural Database

    • www.ccdc.cam.ac.uk

    • small molecules

    • paid, plus an open set of 500 compounds for teaching purposes

    DB        | Content                     | Total (2009) | Added per year
    PDB       | Proteins                    | 50730        | 6000
    NDB       | Nucleic acids               | 3555         | 500
    CSD       | Organics, metal-organics    | 488057       | 40000
    ICSD      | Inorganics & minerals       | 100200       | 9000
    CRYSTMET  | Metals, alloys, inorganics  | 119600       | 9000

  • CSD components

  • WebCSD

  • Mercury

    • Mercury visualiser – crystal structure visualisation program by CCDC

    • Free

    • Teaching subset embedded

  • And back to proteins…

  • Classification of protein structures

    Class: similar secondary-structure content

    Architecture / Fold (Topology): structural similarity

    Superfamily: probably the same ancestor

    • SCOP, CATH, FSSP, 3Dee

  • SCOP

    • Structural Classification of Proteins

    • Manual classification of protein structural domains based on similarities of their amino acid sequences and three-dimensional structures.

    • SCOP uses four levels of hierarchical structural classification:

      – class: general "structural architecture" of the domain

      – fold: similar arrangement of regular secondary structures, but without evidence of evolutionary relatedness

      – superfamily: sufficient structural and functional similarity to infer a divergent evolutionary relationship, but not necessarily detectable sequence homology

      – family: some sequence similarity can be detected.

    Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.
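    Because each SCOP level is nested inside the one above, the relationship between two domains can be read off as the deepest level at which their lineages still agree. A minimal sketch, assuming each domain is given as a hypothetical (class, fold, superfamily, family) tuple of labels; the interpretations paraphrase the level definitions above.

```python
from typing import Optional

SCOP_LEVELS = ("class", "fold", "superfamily", "family")

# What agreement down to each level implies, paraphrasing the SCOP definitions.
MEANING = {
    "class": "same general structural architecture",
    "fold": "same arrangement of secondary structures, no evidence of common ancestry",
    "superfamily": "probable common ancestor, sequence homology not necessarily detectable",
    "family": "some sequence similarity detectable",
}


def deepest_shared_level(a: tuple[str, ...], b: tuple[str, ...]) -> Optional[str]:
    """Deepest level at which two lineages agree, scanning from the top down."""
    shared = None
    for level, x, y in zip(SCOP_LEVELS, a, b):
        if x != y:
            break  # lower levels are nested, so agreement cannot resume
        shared = level
    return shared


# Hypothetical lineages, for illustration only.
d1 = ("cls-1", "fold-7", "sf-3", "fam-1")
d2 = ("cls-1", "fold-7", "sf-3", "fam-2")
level = deepest_shared_level(d1, d2)
print(level, "->", MEANING[level])
# superfamily -> probable common ancestor, sequence homology not necessarily detectable
```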

  • CATH

    • Manually curated hierarchical classification of protein domain structures

    • More automated than SCOP

    • Class – secondary structure content (mainly-alpha, mainly-beta, mixed alpha/beta or 'few secondary structures')

    • Architecture – general arrangement of the secondary structures irrespective of the connectivity between them (e.g. alpha/beta sandwich)

    • Topology (Fold) – connectivity of the secondary structures in the chain

    • Homologous Superfamily – domains that are believed to be related by a common ancestor

    • S-levels – automated clustering based on sequence identity
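    The S-levels group domains by sequence identity, named after the identity threshold used (e.g. S35 for 35 %). The greedy single-pass clustering below only illustrates threshold-based clustering in general; it is a swapped-in simple scheme, not CATH's actual procedure, and it assumes the sequences are already aligned to equal length, with "-" marking gaps.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def greedy_clusters(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Each sequence joins the first cluster whose representative (first member)
    it matches at >= threshold percent identity; otherwise it founds a new cluster."""
    clusters: list[list[str]] = []
    for name, seq in seqs.items():
        for cluster in clusters:
            if percent_identity(seqs[cluster[0]], seq) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical aligned domain sequences.
seqs = {"d1": "ACDEFGHIKL", "d2": "ACDEFGHIKV", "d3": "WWWWWWWWWW"}
print(greedy_clusters(seqs, 35.0))  # [['d1', 'd2'], ['d3']]
print(greedy_clusters(seqs, 95.0))  # [['d1'], ['d2'], ['d3']]
```

    The result depends on input order, one reason real pipelines use more careful clustering.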

  • CATH

  • GENE3D

    • Gene3D – a large collection of CATH protein domain assignments for Ensembl genomes and UniProt sequences

      – functional information, as well as taxonomic distributions, multi-domain architectures and protein-protein interaction (PPI) data

  • FSSP – fold classification

    www2.embl-ebi.ac.uk/dali/fssp/

    Structurally superimposed proteins (by DALI: "Distance-matrix ALIgnment")
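    DALI compares structures through their intramolecular distance matrices rather than by direct superposition. As a simplified sketch, the function below scores two residue lists that are assumed to be already equivalenced one-to-one, using an elastic similarity term of the form published for DALI (θ = 0.20, α = 20 Å; treat the exact constants as assumptions). Real DALI also searches for the best residue equivalence, which this sketch does not attempt.

```python
import math


def distance_matrix(coords: list[tuple[float, float, float]]) -> list[list[float]]:
    """All pairwise distances between C-alpha coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def elastic_score(coords_a, coords_b, theta: float = 0.20, alpha: float = 20.0) -> float:
    """DALI-style elastic similarity of two equivalenced residue lists.

    Residue k of A is assumed to correspond to residue k of B. Each off-diagonal
    pair contributes (theta - |dA - dB| / d_mean) * exp(-(d_mean / alpha) ** 2);
    each diagonal element contributes theta. Coordinates within one structure
    are assumed distinct, so d_mean is never zero.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("equivalenced residue lists must have equal length")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    score = 0.0
    for i in range(len(da)):
        for j in range(len(da)):
            if i == j:
                score += theta
                continue
            d_mean = (da[i][j] + db[i][j]) / 2
            deviation = abs(da[i][j] - db[i][j]) / d_mean
            score += (theta - deviation) * math.exp(-((d_mean / alpha) ** 2))
    return score


chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
moved = [(x + 5.0, y, z) for x, y, z in chain]  # rigid translation
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 3.8, 0.0)]
print(math.isclose(elastic_score(chain, moved), elastic_score(chain, chain)))  # True
print(elastic_score(chain, bent) < elastic_score(chain, chain))  # True
```

    Because only internal distances enter the score, a rigidly moved copy scores the same as the original, while a locally bent one scores lower.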

  • 3Dee – domains

    http://www.compbio.dundee.ac.uk/3Dee/

    Hierarchy of individual domains

    Clustering by structural similarity

    Dengler, U., Siddiqui, A. S. & Barton, G. J. (2001). Protein structural domains: Analysis of the 3Dee domains database. Proteins 42, 332-344.

    Siddiqui, A. S., Dengler, U. & Barton, G. J. (2001). 3Dee: A database of protein structural domains. Bioinformatics 17, 200-201.

  • Databases that did not make the cut…

    • Relibase – protein-ligand interactions

    • ModBase, SWISS-MODEL Repository, MMDB – model databases

    • MolMovDB – Macromolecular Motions database

    • And many more, mostly specific to a particular problem, e.g. for cytochromes P450 alone:

    • CYPED, SuperCyp, Cytochrome P450 Homepage, Fungal CYP database, CYPalleles, Arabidopsis Cytochrome P450s, Cytochrome P450 Drug Interactions Table, and others.

    • Beyond that, there is nothing left but to use Google. :o)