tics techniques cod ek nov07
Post on 06-Apr-2018
-
8/3/2019 tics Techniques COD EK Nov07
Bioinformatics tools and techniques
Into the heart of darkness
Elaine Kenny
Colm O'Dushlaine
15/11/07
-
Summary
Simple overviews of some of the tools and methods used by EK and COD:
TK notebook
get_hapmap_snps.pl: retrieve HM genotype information for a list of SNPs
GeneViewer.pl & cross_ref.pl: visualise e.g. SNPs in the context of other genomic landmarks. Score SNPs depending on how many of these landmarks they overlap with
ld_expander.pl: find SNPs in LD with SNPs of interest, based on user-specified r2 and LD window (distance between SNPs)
STATA
VIM: command line text editor
Lab website
-
TK notebook
Application for saving notes, to-do lists, daily
logs, and any other kind of textual information
in a place where you can find it all again, and
where related information is easily found
Easy to edit and rapidly searchable
DEMO editing
DEMO search
-
get_hapmap_snps.pl
Simple script to read in a 1-column list of
SNPs and retrieve HapMap genotypes
Can select population and strand
DEMO
Retrieved data can be loaded into HaploView
DEMO
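The input described above is just one SNP identifier per line. A minimal Python sketch of reading such a list is shown here; it is not the code of get_hapmap_snps.pl, and the function name read_snp_list and the rs-number check are assumptions made for illustration. The HapMap retrieval step itself is deliberately left out.

```python
import re

# Assumed identifier style (dbSNP rs numbers, e.g. rs12345).
RS_ID = re.compile(r"^rs\d+$")


def read_snp_list(lines):
    """Return the unique rs IDs from a 1-column list, keeping input order."""
    seen = set()
    snps = []
    for line in lines:
        snp = line.strip()
        if not RS_ID.match(snp):
            continue  # skip blank lines and anything that is not an rs ID
        if snp not in seen:
            seen.add(snp)
            snps.append(snp)
    return snps


if __name__ == "__main__":
    example = ["rs123\n", "rs456\n", "\n", "rs123\n", "not_a_snp\n"]
    print(read_snp_list(example))
```

Dropping duplicates before querying keeps the request list as short as possible, which matters for a remote resource.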
-
cross_ref_scored.pl
Score SNPs based on how many putatively functional regions they overlap with:
On a per gene / chromosome basis
Gene basis:
Type: perl cross_ref_scored.pl file_A file_B file_C ...
where
file_A - 2-column file of SNPs (format = id, location)
file_B - 3-column file of EXONS (format = id/name, start, stop)
file_C ... - whatever you want (format = id/name, start, stop),
i.e. other regions like CpGs, TFBS, clusters. Any order.
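The scoring idea can be sketched in a few lines of Python. This is an illustration of the technique, not the script itself: the function names are hypothetical, coordinates are assumed to be inclusive, and a SNP is assumed to earn one point per region file it overlaps, however many regions in that file contain it.

```python
def load_regions(rows):
    """Convert (id/name, start, stop) rows into a list of (start, stop) ints."""
    return [(int(start), int(stop)) for _name, start, stop in rows]


def score_snps(snps, region_sets):
    """Score each SNP by the number of region sets it falls inside.

    snps        -- iterable of (snp_id, location) pairs (the 2-column file)
    region_sets -- one region list per 3-column file (exons, CpGs, TFBS, ...)
    """
    scores = {}
    for snp_id, location in snps:
        pos = int(location)
        scores[snp_id] = sum(
            any(start <= pos <= stop for start, stop in regions)
            for regions in region_sets
        )
    return scores


if __name__ == "__main__":
    snps = [("rs1", 150), ("rs2", 500), ("rs3", 1200)]
    exons = load_regions([("ex1", 100, 200), ("ex2", 1000, 1300)])
    cpgs = load_regions([("cpg1", 140, 160)])
    print(score_snps(snps, [exons, cpgs]))
```

The linear scan over regions is fine for a single gene; for whole chromosomes, sorted regions with a binary search or an interval tree would be the usual replacement.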
-
cross_ref_scored.pl example output:
Can then be merged with HapMap / Perlegen data to retrieve MAF data for SNPs
-
Merge cross_ref_scored data with HapMap/Perlegen data using merge_per_hap.pl
Type:
perl merge_per_hap.pl perlegen.txt hapmap.txt overlapped_region_scored.txt
Where:
hapmap.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq),
perlegen.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq)
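A rough Python sketch of this kind of merge is given below. It is not merge_per_hap.pl: the function names and output layout are hypothetical, and MAF is derived as min(f, 1 - f) from the reference allele frequency, which assumes biallelic SNPs.

```python
def maf(ref_allele_freq):
    """Minor allele frequency from a reference allele frequency (biallelic SNP assumed)."""
    f = float(ref_allele_freq)
    return min(f, 1.0 - f)


def merge_scores_with_freqs(scores, hapmap_rows, perlegen_rows):
    """Attach HapMap and Perlegen MAFs to each scored SNP, joining on rsid.

    scores        -- dict of rsid -> overlap score
    hapmap_rows   -- iterable of (rsid, ref_allele, ref_allele_freq)
    perlegen_rows -- iterable of (rsid, ref_allele, ref_allele_freq)
    SNPs missing from a source get None for that source.
    """
    hapmap = {rsid: maf(freq) for rsid, _allele, freq in hapmap_rows}
    perlegen = {rsid: maf(freq) for rsid, _allele, freq in perlegen_rows}
    return {
        rsid: {
            "score": score,
            "hapmap_maf": hapmap.get(rsid),
            "perlegen_maf": perlegen.get(rsid),
        }
        for rsid, score in scores.items()
    }


if __name__ == "__main__":
    merged = merge_scores_with_freqs(
        {"rs1": 2, "rs2": 0},
        [("rs1", "A", "0.75")],
        [("rs1", "A", "0.25"), ("rs2", "G", "0.5")],
    )
    for rsid, record in merged.items():
        print(rsid, record)
```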
-
cross_ref.pl applied to WGA data
cross_ref.pl: Scoring SNPs throughout genome
Data analysed on coding/non-coding basis
(coding)
perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.coding.txt 22 \
  WTCCC_T2D_chr22_without_inferred.forCrossRef \
  WGA_databases/coding_non_synon_SNPs_UCSC.clean=3 \
  WGA_databases/coding_synon_SNPs_UCSC.clean=2 \
  WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid=1 \
  WGA_databases/Triplexes_may2006.bed=2 \
  WGA_databases/splice_site_SNPs_UCSC.clean=2 \
  > Overlapped_regions_scored.WTCCC.chr22.coding.log &
(input-dependent, coding/non-coding dependent, arbitrary)
(noncoding)
perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.NONcoding.txt 22 \
  WTCCC_T2D_chr22_without_inferred.forCrossRef \
  WGA_databases/TFBS.chr22=1 \
  WGA_databases/CpG_islands_UCSC.uniqid=1 \
  WGA_databases/Most_conserved_phastConsElements17way_UCSC.clean=1 \
  WGA_databases/promoters_knowngene_hg18.txt=1 \
  WGA_databases/sno_or_miRNA_UCSC.uniqid=1 \
  > Overlapped_regions_scored.WTCCC.chr22.NONcoding.log &
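Each database argument in the commands above has the form path=N. The sketch below assumes that N is a per-database weight added to a SNP's score whenever the SNP overlaps a region from that database; this reading is an assumption from the command lines, not something confirmed by the script's source. Function names are hypothetical and coordinates are treated as inclusive.

```python
def parse_weighted_arg(arg):
    """Split a 'path=weight' argument into (path, integer weight)."""
    path, _sep, weight = arg.rpartition("=")
    if not path:
        raise ValueError(f"expected path=weight, got {arg!r}")
    return path, int(weight)


def weighted_score(position, weighted_regions):
    """Sum the weights of every database that has a region containing position.

    weighted_regions -- list of (weight, [(start, stop), ...]), one per database
    """
    return sum(
        weight
        for weight, regions in weighted_regions
        if any(start <= position <= stop for start, stop in regions)
    )


if __name__ == "__main__":
    print(parse_weighted_arg("WGA_databases/coding_non_synon_SNPs_UCSC.clean=3"))
    databases = [
        (3, [(100, 200)]),  # e.g. non-synonymous coding SNP positions
        (2, [(300, 400)]),  # e.g. synonymous coding SNP positions
        (1, [(140, 160)]),  # e.g. exons
    ]
    print(weighted_score(150, databases))
```

Using rpartition splits on the last "=", so a path that itself contains "=" would still parse.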
-
cross_ref.pl
cross_ref.pl output:
Load into STATA. If SNPs have e.g. association p-values, calculate adjusted p-value (R. Anney) as
-log10[P] + [cross_ref_score]
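As a worked illustration of the formula, a SNP with P = 0.001 and a cross_ref score of 3 gets 3 + 3 = 6. The same calculation as a small Python function (the deck does this in STATA; the function name and the input check are assumptions added here):

```python
import math


def adjusted_score(p_value, cross_ref_score):
    """Return -log10(P) + cross_ref_score for one SNP."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    return -math.log10(p_value) + cross_ref_score


if __name__ == "__main__":
    print(adjusted_score(0.001, 3))
```

Because the score is added on the -log10 scale, each extra point of cross_ref score counts the same as a tenfold smaller p-value.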
-
GeneViewer.pl
GeneViewer.pl: Visualise overlapping features (e.g. exons, SNPs etc.) along e.g. your gene of interest (html output)
-
ld_expander.pl
Find proxies (SNPs in LD) for a list of SNPs
User specifies the r2 and LD window
Currently configured to obtain proxies from HM CEU
Result is a list of additional proxy SNPs that have
been obtained by LD expansion
DEMO
Note: don't LD expand >150000 SNPs, or HapMap will ban you!
COD has an alternative version that uses local pre-computed pairwise LD SNP files
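LD expansion against a local pairwise LD table can be sketched as follows. This is not ld_expander.pl or COD's alternative version: the row layout (snp_a, pos_a, snp_b, pos_b, r2), the function name, and the default thresholds are all assumptions for illustration.

```python
def ld_expand(query_snps, ld_pairs, min_r2=0.8, max_distance=250000):
    """Return proxy SNPs in LD with any query SNP.

    ld_pairs     -- iterable of (snp_a, pos_a, snp_b, pos_b, r2) rows
    min_r2       -- user-specified r2 threshold
    max_distance -- user-specified LD window (max distance between the two SNPs)
    """
    query = set(query_snps)
    proxies = set()
    for snp_a, pos_a, snp_b, pos_b, r2 in ld_pairs:
        if float(r2) < min_r2:
            continue  # pair is not in strong enough LD
        if abs(int(pos_a) - int(pos_b)) > max_distance:
            continue  # pair lies outside the LD window
        if snp_a in query and snp_b not in query:
            proxies.add(snp_b)
        elif snp_b in query and snp_a not in query:
            proxies.add(snp_a)
    return sorted(proxies)


if __name__ == "__main__":
    pairs = [
        ("rs1", 100, "rs2", 200, 0.90),
        ("rs1", 100, "rs3", 900000, 0.95),
        ("rs4", 50, "rs1", 100, 0.50),
        ("rs5", 120, "rs1", 100, 0.85),
    ]
    print(ld_expand(["rs1"], pairs))
```

Only SNPs that are not already in the query list are returned, so the result is the set of additional proxies gained by expansion.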
-
STATA
Extremely powerful and flexible
>65k rows handled shock horror!
Can write scripts to automate tasks, e.g. read in file,
do analysis, save results
When you use the GUI to run commands, the
commands are shown in the command window, so
you can save them in a do file
COD, EK and R. Anney strongly advocate this as a
platform for both file manipulation and statistical
analysis
-
http://www.wtccc.org.uk/
STATA example using WTCCC data
Bipolar Disorder,
Coronary Artery Disease,
Crohn's Disease,
Hypertension,
Rheumatoid Arthritis,
Type 1 Diabetes,
Type 2 Diabetes
-
DATA FORMAT
3 folders:
Basic: each case collection against the pooled control groups 58C and UKBS
Combined cases: combining phenotypically relevant case collections (e.g. RA/T1D, autoimmune)
Combined controls: combining other case collections as controls
Data are split by chromosome
-
Questions
How do I get all of the chromosome data for my gene of interest into one file?
How do I easily search all of the SNP information for my gene(s) of interest?
Create a .do file for all manipulations that you want to carry out on the data
DEMO
Good starting resource: http://www.ats.ucla.edu/stat/stata/
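The demo used a STATA .do file. For readers who want to see the logic spelled out, the two steps (pool the per-chromosome data, then keep SNPs inside the gene(s) of interest) are sketched below in Python. The row layout, function names, and gene coordinates are hypothetical and do not reflect the actual WTCCC file format.

```python
from itertools import chain


def combine_chromosomes(per_chromosome_rows):
    """Concatenate per-chromosome row lists into one list."""
    return list(chain.from_iterable(per_chromosome_rows))


def snps_in_genes(rows, genes):
    """Keep rows whose position lies inside any gene of interest.

    rows  -- dicts with 'rsid', 'chrom' and 'pos' keys (assumed layout)
    genes -- dict of gene name -> (chrom, start, stop), inclusive coordinates
    """
    hits = []
    for row in rows:
        for gene, (chrom, start, stop) in genes.items():
            same_chrom = str(row["chrom"]) == str(chrom)
            if same_chrom and start <= int(row["pos"]) <= stop:
                hits.append({**row, "gene": gene})
    return hits


if __name__ == "__main__":
    chr1 = [{"rsid": "rs1", "chrom": 1, "pos": 500}]
    chr22 = [
        {"rsid": "rs2", "chrom": 22, "pos": 1500},
        {"rsid": "rs3", "chrom": 22, "pos": 9000},
    ]
    genes = {"GENE_A": (22, 1000, 2000)}  # made-up coordinates
    for hit in snps_in_genes(combine_chromosomes([chr1, chr22]), genes):
        print(hit)
```

Keeping these steps in a script, whether a .do file or something like the above, means the same extraction can be rerun for a new gene list without repeating manual work.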
-
VIM
Vi Improved. Mainly UNIX but cross-platform text editor (available for Windows).
Full list of commands outside scope of this demonstration
Very fast and efficient, esp. with search and replace functions on large datasets
Regular expression pattern matching
DEMO
Integrates with Cygwin (www.cygwin.com), a very useful UNIX emulator for Windows
-
Group website
Some useful stuff up there!
Please send information about current
projects etc. Good for our image as a group
and minimal effort required on your part
DEMO
-
Conclusions
Small summary of some things you can do
Slides and video demonstrations will be online at: http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/Protocols/
COD & EK available for advice (Fridays 9-9.02am)
These things will help you in your work!!