Post on 12-Jan-2016
Outline
● Database
• Entity-relationship model
• Normalized identifiers
● Python program for managing annotations
• Technical choices
• Process
• Remarks and possible improvements
● Annotation comparisons
● Conclusion
Database – Entity-relationship model
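Database – Normalized identifiers
+------------------------+----------------------------------+----------------------------------+-------------------+
| NORMALIZED IDENTIFIERS | AFFX IDENTIFIERS                 | RESNET IDENTIFIERS               | BIOCO IDENTIFIERS |
+------------------------+----------------------------------+----------------------------------+-------------------+
| GENE_SYMBOL            | Gene Symbol                      | Name                             | GENE_SYMBOL       |
+------------------------+----------------------------------+----------------------------------+-------------------+
| ALIAS                  | NA                               | Alias                            | NA                |
+------------------------+----------------------------------+----------------------------------+-------------------+
| GENE_NAME              | Gene Title                       | Description                      | GENE_NAME         |
+------------------------+----------------------------------+----------------------------------+-------------------+
| ENTREZ_ID              | Entrez Gene                      | LocusLink ID                     | LOCUS_ID          |
+------------------------+----------------------------------+----------------------------------+-------------------+
| UNIGEN_ID              | UniGene ID                       | Unigene ID                       | UNIGENE           |
+------------------------+----------------------------------+----------------------------------+-------------------+
| CHROMOSOME_LOCATION    | Chromosomal Location             | homo sapiens chromosome position | MAP               |
+------------------------+----------------------------------+----------------------------------+-------------------+
| OMIM_ID                | OMIM                             | OMIM ID                          | OMIM              |
+------------------------+----------------------------------+----------------------------------+-------------------+
| ENSEMBL_ID             | Ensembl                          | NA                               | NA                |
+------------------------+----------------------------------+----------------------------------+-------------------+
| ACCESS_NUMBER          | RefSeq Transcript ID             | Genbank ID                       | ACC_NUM           |
|                        | Annotation Transcript Cluster    |                                  |                   |
+------------------------+----------------------------------+----------------------------------+-------------------+
| REFSEQ_ID              | RefSeq Protein ID                | NA                               | REFSEQ            |
+------------------------+----------------------------------+----------------------------------+-------------------+
| SWISSPROT_ID           | SwissProt                        | Swiss-Prot Accession             | NA                |
+------------------------+----------------------------------+----------------------------------+-------------------+
| GO_ID                  | Gene Ontology Biological Process | GO_ID                            | GO_ID             |
|                        | Gene Ontology Molecular Function |                                  |                   |
|                        | Gene Ontology Cellular Component |                                  |                   |
+------------------------+----------------------------------+----------------------------------+-------------------+
| GO_DESCRIPTION         | Gene Ontology Biological Process | GO_DESCRIPTION                   | GO_ID             |
|                        | Gene Ontology Molecular Function |                                  |                   |
|                        | Gene Ontology Cellular Component |                                  |                   |
+------------------------+----------------------------------+----------------------------------+-------------------+
| PATHWAY                | Pathway                          | KEGG pathway                     | PATH              |
+------------------------+----------------------------------+----------------------------------+-------------------+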
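The identifier table above can be read as one lookup table per source. The following is a minimal sketch, assuming each annotation row arrives as a dictionary keyed by the source's own column names. The names `AFFX_TO_NORMALIZED`, `RESNET_TO_NORMALIZED` and `normalize_record` are hypothetical, and only rows whose mapping is unambiguous in the table are included.

```python
# Hypothetical lookup tables derived from the identifier table:
# source column name -> normalized identifier.
AFFX_TO_NORMALIZED = {
    "Gene Symbol": "GENE_SYMBOL",
    "Gene Title": "GENE_NAME",
    "Entrez Gene": "ENTREZ_ID",
    "UniGene ID": "UNIGEN_ID",
    "Chromosomal Location": "CHROMOSOME_LOCATION",
    "OMIM": "OMIM_ID",
    "Ensembl": "ENSEMBL_ID",
    "RefSeq Protein ID": "REFSEQ_ID",
    "SwissProt": "SWISSPROT_ID",
    "Pathway": "PATHWAY",
}

RESNET_TO_NORMALIZED = {
    "Name": "GENE_SYMBOL",
    "Alias": "ALIAS",
    "Description": "GENE_NAME",
    "LocusLink ID": "ENTREZ_ID",
    "Unigene ID": "UNIGEN_ID",
    "homo sapiens chromosome position": "CHROMOSOME_LOCATION",
    "OMIM ID": "OMIM_ID",
    "Swiss-Prot Accession": "SWISSPROT_ID",
    "KEGG pathway": "PATHWAY",
}


def normalize_record(record, mapping):
    """Return a new dict keyed by normalized identifiers.

    Every identifier known to `mapping` starts as "NA"; source columns
    that are absent from `mapping` are ignored.
    """
    normalized = {target: "NA" for target in mapping.values()}
    for source_name, value in record.items():
        if source_name in mapping and value != "":
            normalized[mapping[source_name]] = value
    return normalized


# Values taken from the comparison tables for probe 1552318_at.
affx_row = {"Gene Symbol": "DCTN5", "Gene Title": "Dynactin 5 (p25)", "Entrez Gene": "84516"}
resnet_row = {"Name": "ARC", "LocusLink ID": "312312"}

affx_normalized = normalize_record(affx_row, AFFX_TO_NORMALIZED)
resnet_normalized = normalize_record(resnet_row, RESNET_TO_NORMALIZED)
print(affx_normalized["GENE_SYMBOL"], resnet_normalized["GENE_SYMBOL"])
```

Keeping the mapping as data rather than code means a new source only needs a new dictionary, not a new parser.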
Annotation management – Technical choices
Implementation in Python 2.3.4
• Easy and quick to use.
• An excellent exercise in the context of this internship.
• Modules available for working with MySQL (MySQLdb) and for reading .csv files (csvReader).
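To illustrate the combination of a CSV reader and a database module, here is a minimal sketch. It is not the original program: it is written for Python 3 rather than 2.3.4, and it uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server. The `Probe Set ID` header, the `affy` table layout and the function name `load_annotations` are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Stand-in for an annotation file; a real run would open a .csv file instead.
SAMPLE_CSV = io.StringIO(
    '"Probe Set ID","Gene Symbol","Entrez Gene"\n'
    '"1552318_at","DCTN5","84516"\n'
    '"1552449_a_at","LOC653486","653486"\n'
)


def load_annotations(csv_file, connection):
    """Read annotation rows from csv_file and insert them into table affy.

    Returns the number of rows inserted.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS affy "
        "(probe TEXT, gene_symbol TEXT, entrez_id TEXT)"
    )
    reader = csv.DictReader(csv_file)
    rows = [
        (row["Probe Set ID"], row["Gene Symbol"], row["Entrez Gene"])
        for row in reader
    ]
    # Parameterized insert: values are passed separately from the SQL text.
    connection.executemany("INSERT INTO affy VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


connection = sqlite3.connect(":memory:")
inserted = load_annotations(SAMPLE_CSV, connection)
symbols = [
    row[0]
    for row in connection.execute("SELECT gene_symbol FROM affy ORDER BY probe")
]
print(inserted, symbols)
```

With MySQLdb the same pattern applies, but the parameter placeholder is `%s` rather than `?`.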
Annotation management – Remarks and future improvements
● Standardize the file formats, in both content and form
● Apply the same processing to 2 different organisms
● Relation between 2 probesets belonging to different species for the same gene
● Processing of experiments
• Comparative analysis CUFI/NULI (Berthiaume) vs CF/non-CF (Wright)
• Comparative analysis NULI/DMNQ vs ATII/TNF (Berthiaume)
Annotation comparison – Differences between gene_symbol values
+--------------+-----------+---------------+---------------+
| probe        | Affy      | ResNet        | BioCo         |
+--------------+-----------+---------------+---------------+
| 1552279_a_at | PCFT      | SARM1         | HCP1          |
| 1552318_at   | DCTN5     | ARC           | GIMAP1        |
| 1552393_at   | ENTHD1    | FLJ25421      | RP1-172B20.3  |
| 1552394_a_at | ENTHD1    | FLJ25421      | RP1-172B20.3  |
| 1552405_at   | NLRP5     | MATER         | NALP5         |
| 1552411_at   | DEFB106B  | DEFB106       | DEFB106A      |
| 1552412_a_at | DEFB106B  | DEFB106       | DEFB106A      |
| 1552449_a_at | LOC653486 | RYD5          | SCGB1C1       |
| 1552514_at   | WBP2NL    | MGC26816      | CTA-250D10.11 |
| 1552531_a_at | NLRP11    | PYPAF6        | NALP11        |
| 1552641_s_at | LOC732419 | TOB3          | ATAD3B        |
| 1552641_s_at | LOC727868 | TOB3          | ATAD3B        |
| 1552641_s_at | ATAD3A    | TOB3          | ATAD3B        |
| 1552663_a_at | ERC1      | ELKS          | RAB6IP2       |
| 1552833_at   | B3GNT6    | IMAGE:4907098 | B3Gn-T6       |
| 1552834_at   | B3GNT6    | IMAGE:4907098 | B3Gn-T6       |
| 1552882_a_at | FAM123B   | FLJ39827      | RP11-403E24.2 |
| 1552927_at   | MAP3K7IP3 | MGC45404      | TAB3          |
| 1552928_s_at | MAP3K7IP3 | MGC45404      | TAB3          |
| 1552932_at   | NLRP6     | PYPAF5        | NALP6         |
| 1553002_at   | DEFB105B  | DEFB105       | DEFB105A      |
| 1553247_a_at | CYP4F8    | ZNF564        | ZNF709        |
| 1553315_at   | SLFNL1    | FLJ23878      | RP11-348A7.4  |
| 1553320_s_at | LOC641983 | MGC26484      | CDC14C        |
| 1553320_s_at | CDC14B    | MGC26484      | CDC14C        |
| 1553320_s_at | LOC648060 | MGC26484      | CDC14C        |
| 1553326_at   | RXFP2     | GREAT         | LGR8          |
| 1553340_s_at | AMAC1     | AMAC          | AMAC1L2       |
| 1553590_at   | FAM27E1   | MGC42630      | LOC158318     |
| 1553639_a_at | GBP2      | PERC          | PPARGC1B      |
| 1553639_a_at | GBP4      | PERC          | PPARGC1B      |
| 1553695_a_at | NLRX1     | FLJ21478      | NOD9          |
| 1553761_at   | C22orf30  | MGC50372      | RP4-694E4.2   |
| 1553817_at   | LOC727983 | POM121L1      | DKFZP434P211  |
| 1553817_at   | LOC651452 | POM121L1      | DKFZP434P211  |
| 1553817_at   | LOC728451 | POM121L1      | DKFZP434P211  |
| 1553817_at   | LOC646074 | POM121L1      | DKFZP434P211  |
| 1553817_at   | LOC728418 | POM121L1      | DKFZP434P211  |
| 1553818_x_at | LOC727983 | POM121L1      | DKFZP434P211  |
| 1553818_x_at | LOC651452 | POM121L1      | DKFZP434P211  |
+--------------+-----------+---------------+---------------+
Annotation comparison – Differences between gene_name values
+--------------+-----------------------------------------------+-----------------------------------------------+-----------------------------------------------+
| probe        | Affy                                          | ResNet                                        | BioCo                                         |
+--------------+-----------------------------------------------+-----------------------------------------------+-----------------------------------------------+
| 1552318_at   | Dynactin 5 (p25)                              | activity-regulated cytoskeleton-associated pr | GTPase, IMAP family member 1                  |
| 1553247_a_at | cytochrome P450, family 4, subfamily F, polyp | zinc finger protein 564                       | zinc finger protein 709                       |
| 1553562_at   | CD8b molecule                                 | CD8 antigen, beta polypeptide 1 (p37)         | CD8 antigen, beta polypeptide (p37)           |
| 1553822_at   | receptor (chemosensory) transporter protein 1 | receptor transporting protein 1               | receptor transporter protein 1                |
| 1553823_a_at | receptor (chemosensory) transporter protein 1 | receptor transporting protein 1               | receptor transporter protein 1                |
| 1553993_s_at | mediator of RNA polymerase II transcription,  | immunoglobulin kappa variable 1/OR-1          | mediator of RNA polymerase II transcription,  |
| 1554194_at   | CDNA clone IMAGE:4825132                      | PDZ and LIM domain 2 (mystique)               | KIAA1967                                      |
| 1554260_a_at | FRY-like                                      | Rac GTPase activating protein 1               | furry homolog-like (Drosophila)               |
| 1554344_s_at | similar to aquaporin 12A                      | aquaporin 12B                                 | aquaporin 12A                                 |
| 1554511_at   | WW and C2 domain containing 1                 | KIBRA protein                                 | WW, C2 and coiled-coil domain containing 1    |
| 1554762_a_at | WW and C2 domain containing 2                 | BH3-only member B protein                     | WW, C2 and coiled-coil domain containing 2    |
| 1555671_at   | islet cell autoantigen 1,69kDa-like           | amyotrophic lateral sclerosis 2 (juvenile) ch | amyotrophic lateral sclerosis 2 (juvenile) ch |
| 1555833_a_at | CDNA FLJ38849 fis, clone MESAN2008936         | immunity-related GTPase family, Q             | nucleophosmin (nucleolar phosphoprotein B23,  |
| 1555855_at   | Aldo-keto reductase family 1, member C2 (dihy | aldo-keto reductase family 1, member C1 (dihy | 20-alpha (3-alpha)-hydroxysteroid dehydrogen  |
| 1555855_at   | Aldo-keto reductase family 1, member C2 (dihy | aldo-keto reductase family 1, member C1 (dihy | aldo-keto reductase family 1, member C1 (dihy |
| 1555856_s_at | Aldo-keto reductase family 1, member C2 (dihy | aldo-keto reductase family 1, member C1 (dihy | 20-alpha (3-alpha)-hydroxysteroid dehydrogen  |
| 1555856_s_at | Aldo-keto reductase family 1, member C2 (dihy | aldo-keto reductase family 1, member C1 (dihy | aldo-keto reductase family 1, member C1 (dihy |
| 1555913_at   | gon-4-like (C. elegans)                       | gon-4 homolog (C.elegans)                     | gon-4-like (C.elegans)                        |
| 1555950_a_at | CD55 molecule, decay accelerating factor for  | decay accelerating factor for complement (CD5 | CD55 antigen, decay accelerating factor for c |
| 1556078_at   | Hypothetical protein LOC143286                | chromosome 10 open reading frame 6            | mitochondrial ribosomal protein L43           |
| 1556088_at   | olfactory receptor, family 5, subfamily T, me | RPA interacting protein                       | complement component 1, q subcomponent bindin |
+--------------+-----------------------------------------------+-----------------------------------------------+-----------------------------------------------+
Annotation comparison – Differences between Entrez_id values
+--------------+--------+--------+--------+
| probe        | Affy   | ResNet | BioCo  |
+--------------+--------+--------+--------+
| 1552281_at   | 5826   | 378941 | 283375 |
| 1552281_at   | 5826   | 72002  | 283375 |
| 1552281_at   | 5826   | 72086  | 283375 |
| 1552302_at   | 728772 | 103625 | 113277 |
| 1552302_at   | 728772 | 217203 | 113277 |
| 1552303_a_at | 728772 | 103625 | 113277 |
| 1552303_a_at | 728772 | 217203 | 113277 |
| 1552318_at   | 84516  | 312312 | 170575 |
| 1552318_at   | 84516  | 16205  | 170575 |
| 1552318_at   | 84516  | 11838  | 170575 |
| 1552318_at   | 84516  | 23237  | 170575 |
| 1552318_at   | 84516  | 53837  | 170575 |
| 1552318_at   | 84516  | 54323  | 170575 |
| 1552318_at   | 84516  | 97989  | 170575 |
| 1552381_at   | 84669  | 272009 | 135295 |
| 1552449_a_at | 653486 | 338417 | 147199 |
| 1552474_a_at | 7402   | 25257  | 2593   |
| 1552474_a_at | 7402   | 14431  | 2593   |
| 1552474_a_at | 7402   | 103105 | 2593   |
| 1552611_a_at | 23091  | 362552 | 3716   |
| 1552611_a_at | 23091  | 84598  | 3716   |
| 1552611_a_at | 23091  | 16451  | 3716   |
| 1552611_a_at | 23091  | 100022 | 3716   |
| 1552611_a_at | 23091  | 230508 | 3716   |
| 1552611_a_at | 23091  | 319959 | 3716   |
| 1552641_s_at | 55210  | 388767 | 83858  |
| 1552641_s_at | 55210  | 170769 | 83858  |
+--------------+--------+--------+--------+
Annotation comparison – Differences between Unigen_id values
+--------------+-----------+-----------+-----------+
| probe        | Affy      | ResNet    | BioCo     |
+--------------+-----------+-----------+-----------+
| 1007_s_at    | Hs.631988 | Mm.5021   | Hs.520004 |
| 1007_s_at    | Hs.631988 | Rn.7807   | Hs.520004 |
| 1053_at      | Hs.647062 | Mm.383189 | Hs.139226 |
| 1053_at      | Hs.647062 | Rn.113319 | Hs.139226 |
| 1552263_at   | Hs.431850 | Rn.34914  | Hs.568258 |
| 1552263_at   | Hs.431850 | Mm.196581 | Hs.568258 |
| 1552264_a_at | Hs.431850 | Rn.34914  | Hs.568258 |
| 1552264_a_at | Hs.431850 | Mm.196581 | Hs.568258 |
| 1552281_at   | Hs.94395  | Mm.22983  | Hs.524506 |
| 1552281_at   | Hs.94395  | Hs.556043 | Hs.524506 |
| 1552286_at   | Hs.437691 | Mm.159369 | Hs.534515 |
| 1552301_a_at | Hs.143046 | Mm.33477  | Hs.178728 |
| 1552301_a_at | Hs.143046 | Rn.28432  | Hs.178728 |
| 1552309_a_at | Hs.632387 | Mm.200188 | Hs.22370  |
| 1552309_a_at | Hs.632387 | Rn.107975 | Hs.22370  |
| 1552314_a_at | Hs.185774 | Mm.227733 | Hs.469543 |
| 1552315_at   | Hs.647087 | Mm.25405  | Hs.159955 |
| 1552315_at   | Hs.647087 | Rn.10086  | Hs.159955 |
| 1552315_at   | Hs.647087 | Hs.40888  | Hs.159955 |
| 1552316_a_at | Hs.647087 | Rn.10086  | Hs.159955 |
| 1552316_a_at | Hs.647087 | Hs.40888  | Hs.159955 |
| 1552316_a_at | Hs.647087 | Mm.25405  | Hs.159955 |
| 1552318_at   | Hs.435941 | Hs.40888  | Hs.159955 |
| 1552318_at   | Hs.435941 | Mm.25405  | Hs.159955 |
| 1552318_at   | Hs.435941 | Rn.10086  | Hs.159955 |
| 1552330_at   | Hs.513832 | Hs.567640 | Hs.534773 |
| 1552337_s_at | Hs.591609 | Rn.141410 | Hs.386365 |
+--------------+-----------+-----------+-----------+
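In the rows shown for Unigen_id, every Affy and BioCo value starts with Hs., while many ResNet values start with Mm. or Rn. The UniGene prefix encodes the organism (Hs for Homo sapiens, Mm for Mus musculus, Rn for Rattus norvegicus), so part of the apparent disagreement comes from mixing species. A minimal sketch of a prefix filter follows; the function and variable names are hypothetical and this step is not described in the original program.

```python
from collections import Counter

# UniGene cluster prefixes for the three organisms seen in the table.
ORGANISM_BY_PREFIX = {
    "Hs": "Homo sapiens",
    "Mm": "Mus musculus",
    "Rn": "Rattus norvegicus",
}


def organism_of(unigene_id):
    """Return the organism encoded in the prefix of a UniGene ID."""
    prefix = unigene_id.split(".", 1)[0]
    return ORGANISM_BY_PREFIX.get(prefix, "unknown")


# ResNet values copied from the first rows of the Unigen_id table.
resnet_ids = ["Mm.5021", "Rn.7807", "Mm.383189", "Rn.113319", "Hs.556043"]

counts = Counter(organism_of(unigene_id) for unigene_id in resnet_ids)
human_only = [
    unigene_id for unigene_id in resnet_ids
    if organism_of(unigene_id) == "Homo sapiens"
]
print(counts)
print(human_only)
```

Filtering on the prefix before comparing would leave only the human ResNet identifiers to compare against the Affy and BioCo values.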
Annotation comparison – Differences between Chromosome_location values
+--------------+--------------------------+------------------+----------------------------------------------+
| probe        | Affy                     | ResNet           | BioCo                                        |
+--------------+--------------------------+------------------+----------------------------------------------+
| 1552281_at   | 14q24.3                  | 12q13.3          | 12q13.2                                      |
| 1552318_at   | 16p12.1                  | 8q24.3           | 7q36.1                                       |
| 1553034_at   | 21q22.3                  | 1q44             | 1q43-q44                                     |
| 1553432_s_at | 16p12.1                  | 16p12.2          | 16p12.2|16p12.2                              |
| 1553639_a_at | 1p22.2                   | 5q32             | 5q33.1                                       |
| 1554500_a_at | 1q43|1q23.1              | 1q43             | 1q43|1q23.1 according to Sierra (Genomics 79 |
| 1554500_a_at | 1q43|1q23.1              | 1q43             | 177                                          |
| 1554500_a_at | 1q43|1q23.1              | 1q43             | 2002) [AFS]                                  |
| 1555282_a_at | 1p22.2                   | 5q32             | 5q33.1                                       |
| 1555671_at   | 2q33.1                   | 2q33.2           | 2q33                                         |
| 1556088_at   | 11q11                    | 17p13.2          | 17p13.3                                      |
| 1557203_at   | Xq13.1                   | Xq13.1-q13.2     | Xq13.2                                       |
| 1557886_at   | 17q24.1-q24.2            | 17q24.2          | 17q24.1                                      |
| 1559285_at   | 17q21.31                 | 14q32.3-qter     | 14q32.3-qter|14q32                           |
| 1559501_at   | 21q22.2                  | 2p22-p21         | 21q22.12                                     |
| 1559917_a_at | 21q22.2                  | 2p22-p21         | 21q22.12                                     |
| 1561669_at   | 3p12-p11.1               | 3p11.1           | 3p11.2                                       |
| 1563221_at   | 12q23                    | 12q24.11         | 5q32                                         |
| 1563488_at   | 12q24.31-q24.32          | 12q24.3          | 12q24.32                                     |
| 1565454_at   | Xp11.22-p11.21           | Xp11.22          | Xp11.21                                      |
| 1565772_at   | 11q13-q14                | 11q13.5          | 11q14.1                                      |
| 1567862_at   | 1q42.12                  | 1                | 1q42                                         |
| 1568884_at   | 7q22                     | 7q22-q32         | 4q28                                         |
| 1569519_at   | 1p36.13                  | 1p13-p11         | 1q21.1                                       |
| 1569519_at   | 1q21.2                   | 1p13-p11         | 1q21.1                                       |
| 201003_x_at  | 1q32                     | 1q32.2           | 20q13.2                                      |
| 201003_x_at  | 3q26.31                  | 1q32.2           | 20q13.2                                      |
| 201003_x_at  | 3q26.31                  | 1q32             | 20q13.2                                      |
| 201104_x_at  | 1q21.1                   | 1p13-p11         | 1q12-1q21.2                                  |
| 202938_x_at  | 22q13.2                  | 22q13            | 22q13.2-q13.31                               |
| 203624_at    | Xp22.32; Ypter-p11.2     | Xp22.3 or Yp11.3 | Xp22.32                                      |
| 203624_at    | Xp22.32; Ypter-p11.2     | Xp22.3 or Yp11.3 | Ypter-p11.2                                  |
| 204171_at    | 17p11.2                  | 17q23.2          | 17q23.1                                      |
| 206290_s_at  | 1q43|1q23.1              | 1q43             | 1q43|1q23.1 according to Sierra (Genomics 79 |
| 206290_s_at  | 1q43|1q23.1              | 1q43             | 177                                          |
+--------------+--------------------------+------------------+----------------------------------------------+
Annotation comparison – Differences between OMIM_ID values
+--------------+-------+--------+--------+--------+
| probe        | id    | Affy   | ResNet | BioCo  |
+--------------+-------+--------+--------+--------+
| 121_at       | 44932 | 218700 | 167415 | 167415 |
| 121_at       | 44932 | 218700 | 167415 | 218700 |
| 121_at       | 44932 | 167415 | 167415 | 218700 |
| 1255_g_at    | 44933 | 602093 | 600364 | 602093 |
| 1255_g_at    | 44933 | 600364 | 600364 | 602093 |
| 1255_g_at    | 44933 | 602093 | 600364 | 600364 |
| 1494_f_at    | 44941 | 211980 | 122720 | 122720 |
| 1494_f_at    | 44941 | 211980 | 608054 | 122700 |
| 1494_f_at    | 44941 | 122700 | 122720 | 122700 |
| 1494_f_at    | 44941 | 211980 | 608054 | 122720 |
| 1494_f_at    | 44941 | 122700 | 122720 | 122720 |
| 1494_f_at    | 44941 | 122720 | 608054 | 122700 |
| 1494_f_at    | 44941 | 122720 | 608054 | 122720 |
| 1494_f_at    | 44941 | 211980 | 122720 | 122700 |
| 1494_f_at    | 44941 | 122700 | 608054 | 122700 |
| 1494_f_at    | 44941 | 122700 | 608054 | 122720 |
| 1494_f_at    | 44941 | 122720 | 122720 | 122700 |
| 1552281_at   | 44959 | 603214 | 608730 | 608730 |
| 1552304_at   | 44973 | 152427 | 603313 | 603313 |
| 1552306_at   | 44974 | 152427 | 603313 | 603313 |
| 1552332_at   | 44994 | 609761 | 609761 | 609823 |
| 1552332_at   | 44994 | 609823 | 609761 | 609761 |
| 1552332_at   | 44994 | 609823 | 609761 | 609823 |
| 1552334_at   | 44995 | 609823 | 609761 | 609761 |
| 1552334_at   | 44995 | 609823 | 609761 | 609823 |
| 1552334_at   | 44995 | 609761 | 609761 | 609823 |
+--------------+-------+--------+--------+--------+
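The 121_at rows of the OMIM_ID comparison are consistent with a three-way join on the probe that keeps every combination in which the three values are not all identical. The sketch below works under that assumption; the actual query used to produce the tables is not given in the slides. It uses `sqlite3` in place of MySQL, the table and column names are hypothetical, and the per-source values are inferred from the 121_at rows.

```python
import sqlite3

SOURCES = ("affy", "resnet", "bioco")

connection = sqlite3.connect(":memory:")
for source in SOURCES:
    # One hypothetical table per annotation source.
    connection.execute(f"CREATE TABLE {source} (probe TEXT, omim_id TEXT)")

# OMIM values per source for probe 121_at, inferred from the table rows.
connection.executemany(
    "INSERT INTO affy VALUES (?, ?)",
    [("121_at", "218700"), ("121_at", "167415")],
)
connection.executemany(
    "INSERT INTO resnet VALUES (?, ?)", [("121_at", "167415")]
)
connection.executemany(
    "INSERT INTO bioco VALUES (?, ?)",
    [("121_at", "167415"), ("121_at", "218700")],
)

# Keep every combination where the three sources do not all agree.
DIFF_QUERY = """
SELECT a.probe, a.omim_id, r.omim_id, b.omim_id
FROM affy AS a
JOIN resnet AS r ON r.probe = a.probe
JOIN bioco AS b ON b.probe = a.probe
WHERE NOT (a.omim_id = r.omim_id AND r.omim_id = b.omim_id)
ORDER BY a.omim_id DESC, b.omim_id
"""
differences = connection.execute(DIFF_QUERY).fetchall()


def values_by_source(connection, probe):
    """Return the set of OMIM values each source holds for one probe."""
    return {
        source: {
            row[0]
            for row in connection.execute(
                f"SELECT omim_id FROM {source} WHERE probe = ?", (probe,)
            )
        }
        for source in SOURCES
    }


sets = values_by_source(connection, "121_at")
for row in differences:
    print(row)
print(sets["affy"] == sets["bioco"])
```

When a source holds several values per probe, the join multiplies the rows (2 x 1 x 2 combinations here, minus the one where all three agree). Comparing the sets per probe instead shows that Affy and BioCo hold the same two OMIM values for 121_at, and that ResNet holds only one of them.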