Characterizing and predicting cyanobacterial blooms in an 8-year amplicon sequencing
time-course
Authors
Nicolas Tromas1*, Nathalie Fortin2, Larbi Bedrani1, Yves Terrat1, Pedro Cardoso4, David Bird3,
Charles W. Greer2 and B. Jesse Shapiro1*
Author affiliations
1- Département de sciences biologiques, Université de Montréal, 90 Vincent-d’Indy, Montréal,
QC H2V 2S9, Canada
2- National Research Council Canada, Energy, Mining and Environment, 6100 Royalmount
Avenue, Montréal, QC H4P 2R2, Canada
3- Université du Québec à Montréal, Faculté des sciences, Département des sciences biologiques,
Case postale 8888, Succ Centre-ville, Montréal, QC H3C 3P8, Canada
4- Finnish Museum of Natural History, University of Helsinki, P.O. Box 17 (Pohjoinen
Rautatiekatu 13) 00014 Helsinki, Finland
*Corresponding authors: B. Jesse Shapiro. Phone: 514-343-6033. E-mail:
[email protected]; Nicolas Tromas. Phone 514-343-3188. E-mail:
Supplementary methods
Sampling
Sampling was performed exactly as described in Fortin et al. (2015). We used a subset of these
samples for this study.
The positions of the two stations, sampled from April to November between 2006 and 2013, were:
Littoral Station latitude: 45.038907379749226 longitude: -73.07982623577118
Pelagic Station latitude: 45.036837653060864 longitude: -73.10624599456787
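
As a rough check on how far apart the two stations are, the great-circle distance between the coordinates listed above can be computed with the haversine formula. This is a minimal sketch, not part of the original methods: the spherical Earth radius of 6371 km and the function name haversine_km are assumptions made for illustration.

```python
import math

# Station coordinates as listed above (decimal degrees).
LITTORAL = (45.038907379749226, -73.07982623577118)
PELAGIC = (45.036837653060864, -73.10624599456787)

# ASSUMPTION: mean Earth radius for a spherical approximation.
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


if __name__ == "__main__":
    print(f"{haversine_km(LITTORAL, PELAGIC):.2f} km")
```

Under this approximation the two stations lie roughly two kilometres apart.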
As described in Fortin et al. (2015), the two stations were sampled as follows: three sub-samples
of surface water were collected with an acrylic tube (93 cm long by 10 cm in diameter)
submerged vertically beneath the water surface, and then combined. Sampling was
performed between 10:00 and 14:00.
Total nutrients were measured directly from the collected water, and dissolved nutrient
concentrations were measured from the filtrate of a GF/F 0.7 μm glass-fiber filter (Whatman,
Inc., Florham Park, NJ). These analyses were performed in the laboratory of the Groupe de
recherche interuniversitaire en limnologie et en environnement aquatique (GRIL).
Water temperatures were obtained with a YSI 6600 v2 water quality multi-probe (YSI, Yellow
Springs, Ohio, USA). Air temperature and daily precipitation data were obtained from
Environment Canada (National Climate Data and Information Archive).
We measured microcystin concentrations by filtering between 20 and 500 mL of each water sample,
depending on the density of the planktonic biomass, through a Whatman GF/F 0.7 μm glass-fiber
filter (Whatman, Inc., Florham Park, NJ). The filter and the filtrate were used to detect
intracellular (particulate) and extracellular (dissolved) toxins, respectively.
On the same day as the extraction, we filtered between 130 and 250 mL of each water sample,
depending on the amount of suspended solids and/or biomass, onto 0.2-μm hydrophilic
polyethersulfone membranes (Millipore, Etobicoke, ON). The filters were stored at -80°C until
DNA extraction.
DNA library preparation
The first amplification targeted the V4 region of the 16S rRNA gene using the following primers:
U515_forward:
ACACGACGCTCTTCCGATCTYRYRGTGCCAGCMGCCGCGGTAA and
E786_reverse:
CGGCATTCCTGCTGAACCGCTCTTCCGATCTGGACTACHVGGGTWTCTAAT. PCR
reactions were performed with a Mastercycler nexus GSX1 (Eppendorf) using 2 μl of extracted
DNA, 14.25 μl of sterile water, 5 μl of HF buffer (New England Biolabs), 0.5 μl of dNTPs, 0.25 μl of
Phusion High-Fidelity DNA Polymerase (New England Biolabs), and 1.5 μl of forward and
reverse primers. Lahr and Katz (2009) showed that Phusion DNA polymerase may favor
PCR chimera formation under specific conditions, such as a high number of cycles. To minimize
chimera formation and limit possible PCR artifacts, samples were amplified in quadruplicate with a
maximum of 22 cycles under the following conditions: initial denaturation at 98°C for 30 seconds; 22
cycles of 98°C for 15 seconds, 54°C for 40 seconds, and 72°C for 30 s kb–1; and a final
elongation step at 72°C for one minute. Negative and positive controls, as well as mock
communities, were included in the amplification. Mock communities were constructed as described in
Preheim et al. (2013) and generously provided by that group. PCR products were purified
with the ZYMO DNA Clean & Concentrator (ZYMO Research) following the
manufacturer’s protocol. A second PCR step was used to integrate Illumina adapter sequences
and a 9-bp barcode for library recognition. To do so, we mixed 4 μl of the pooled and purified
PCR products with 10.25 μl of sterile water, 5 μl of HF buffer, 0.5 μl of dNTPs, 0.25 μl of Phusion DNA
Polymerase and 2.5 μl of the following primers: PE-III-PCR-forward:
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
and PE-III-PCR-001-096-reverse:
CAAGCAGAAGACGGCATACGAGATNNNNNNNNNCGGTCTCGGCATTCCTGCTGAAC
CGCTCTTCCGATCT (where N indicates the unique barcode). Indexing was performed under the
following thermal conditions: initial denaturation at 98°C for 30 seconds, then 7 cycles of
98°C for 30 seconds, 83°C for 30 seconds, and 72°C for 30 seconds. This second
amplification was performed in triplicate to minimize PCR errors. Triplicates were then
pooled, purified with the Agencourt AMPure XP kit (Beckman Coulter) and quantified with a
Qubit 2.0 Fluorometer (Invitrogen). Indexed samples were pooled to obtain a final concentration
of 10-20 ng/μl, then diluted and denatured according to Illumina’s protocol for paired-end
sequencing on a MiSeq (Illumina).
Clustering pipeline: SmileTrain pipeline that incorporates dbOTUcaller (Preheim et al.,
2013, https://github.com/spacocha/dbOTUcaller)
Running SmileTrain (from https://github.com/almlab/SmileTrain )
After SmileTrain installation:
export PYTHONPATH=$PYTHONPATH:/path_where_SmileTrain_is_installed
If you have three files (R1, R2, and index):
python /path/SmileTrain/tools/convert_3file_to_2file.py R1.fastq R2.fastq Index.fastq R1_output.fastq R2_output.fastq
Then run dbOTUcaller (a dry run is shown here with the parameters used in this paper):
python /path/SmileTrain/otu_caller.py -f Run_R1.fasta -r Run_R2_fasta -p
GTGCCAGCMGCCGCGNNNN -q GGACTACHVGGGTWTCNNNN -b barcodes.txt -n
10 --split --primers --merge --demultiplex --qfilter --dereplicate --index --dbotu --k_fold 10
--dbotu_chimeras --dbotu_split --maxee 0.5 --dry_run
Splitting fastq
dry run: test that files are non-empty: Run_R1.fasta
dry run: test that destinations are free: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9
dry run: test that files are non-empty: Run_R2_fasta
dry run: test that destinations are free: Run_R2_fasta.0 Run_R2_fasta.1 Run_R2_fasta.2
Run_R2_fasta.3 Run_R2_fasta.4 Run_R2_fasta.5 Run_R2_fasta.6 Run_R2_fasta.7
Run_R2_fasta.8 Run_R2_fasta.9
python /path/SmileTrain/split_fastq.py Run_R1.fasta 10
python /path/SmileTrain/split_fastq.py Run_R2_fasta 10
dry run: test that files are non-empty: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9
dry run: test that files are non-empty: Run_R2_fasta.0 Run_R2_fasta.1 Run_R2_fasta.2
Run_R2_fasta.3 Run_R2_fasta.4 Run_R2_fasta.5 Run_R2_fasta.6 Run_R2_fasta.7
Run_R2_fasta.8 Run_R2_fasta.9
Merging reads
dry run: test that files are non-empty: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9 Run_R2_fasta.0 Run_R2_fasta.1 Run_R2_fasta.2 Run_R2_fasta.3
Run_R2_fasta.4 Run_R2_fasta.5 Run_R2_fasta.6 Run_R2_fasta.7 Run_R2_fasta.8
Run_R2_fasta.9
dry run: test that destinations are free: Run_R1.fasta.0.tmp Run_R1.fasta.1.tmp
Run_R1.fasta.2.tmp Run_R1.fasta.3.tmp Run_R1.fasta.4.tmp Run_R1.fasta.5.tmp
Run_R1.fasta.6.tmp Run_R1.fasta.7.tmp Run_R1.fasta.8.tmp Run_R1.fasta.9.tmp
Run_R2_fasta.0.tmp Run_R2_fasta.1.tmp Run_R2_fasta.2.tmp Run_R2_fasta.3.tmp
Run_R2_fasta.4.tmp Run_R2_fasta.5.tmp Run_R2_fasta.6.tmp Run_R2_fasta.7.tmp
Run_R2_fasta.8.tmp Run_R2_fasta.9.tmp
python /path/SmileTrain/check_intersect.py Run_R1.fasta.0 Run_R2_fasta.0
python /path/SmileTrain/check_intersect.py Run_R1.fasta.1 Run_R2_fasta.1
python /path/SmileTrain/check_intersect.py Run_R1.fasta.2 Run_R2_fasta.2
python /path/SmileTrain/check_intersect.py Run_R1.fasta.3 Run_R2_fasta.3
python /path/SmileTrain/check_intersect.py Run_R1.fasta.4 Run_R2_fasta.4
python /path/SmileTrain/check_intersect.py Run_R1.fasta.5 Run_R2_fasta.5
python /path/SmileTrain/check_intersect.py Run_R1.fasta.6 Run_R2_fasta.6
python /path/SmileTrain/check_intersect.py Run_R1.fasta.7 Run_R2_fasta.7
python /path/SmileTrain/check_intersect.py Run_R1.fasta.8 Run_R2_fasta.8
python /path/SmileTrain/check_intersect.py Run_R1.fasta.9 Run_R2_fasta.9
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.0 -reverse Run_R2_fasta.0 -fastq_truncqual 2 -fastqout Run_R1.fasta.0.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.1 -reverse Run_R2_fasta.1 -fastq_truncqual 2 -fastqout Run_R1.fasta.1.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.2 -reverse Run_R2_fasta.2 -fastq_truncqual 2 -fastqout Run_R1.fasta.2.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.3 -reverse Run_R2_fasta.3 -fastq_truncqual 2 -fastqout Run_R1.fasta.3.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.4 -reverse Run_R2_fasta.4 -fastq_truncqual 2 -fastqout Run_R1.fasta.4.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.5 -reverse Run_R2_fasta.5 -fastq_truncqual 2 -fastqout Run_R1.fasta.5.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.6 -reverse Run_R2_fasta.6 -fastq_truncqual 2 -fastqout Run_R1.fasta.6.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.7 -reverse Run_R2_fasta.7 -fastq_truncqual 2 -fastqout Run_R1.fasta.7.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.8 -reverse Run_R2_fasta.8 -fastq_truncqual 2 -fastqout Run_R1.fasta.8.tmp
/usr/local/bin/usearch -fastq_mergepairs Run_R1.fasta.9 -reverse Run_R2_fasta.9 -fastq_truncqual 2 -fastqout Run_R1.fasta.9.tmp
dry run: test that files are non-empty: Run_R1.fasta.0.tmp Run_R1.fasta.1.tmp
Run_R1.fasta.2.tmp Run_R1.fasta.3.tmp Run_R1.fasta.4.tmp Run_R1.fasta.5.tmp
Run_R1.fasta.6.tmp Run_R1.fasta.7.tmp Run_R1.fasta.8.tmp Run_R1.fasta.9.tmp
rm Run_R1.fasta.0
rm Run_R1.fasta.1
rm Run_R1.fasta.2
rm Run_R1.fasta.3
rm Run_R1.fasta.4
rm Run_R1.fasta.5
rm Run_R1.fasta.6
rm Run_R1.fasta.7
rm Run_R1.fasta.8
rm Run_R1.fasta.9
rm Run_R2_fasta.0
rm Run_R2_fasta.1
rm Run_R2_fasta.2
rm Run_R2_fasta.3
rm Run_R2_fasta.4
rm Run_R2_fasta.5
rm Run_R2_fasta.6
rm Run_R2_fasta.7
rm Run_R2_fasta.8
rm Run_R2_fasta.9
mv Run_R1.fasta.0.tmp Run_R1.fasta.0
mv Run_R1.fasta.1.tmp Run_R1.fasta.1
mv Run_R1.fasta.2.tmp Run_R1.fasta.2
mv Run_R1.fasta.3.tmp Run_R1.fasta.3
mv Run_R1.fasta.4.tmp Run_R1.fasta.4
mv Run_R1.fasta.5.tmp Run_R1.fasta.5
mv Run_R1.fasta.6.tmp Run_R1.fasta.6
mv Run_R1.fasta.7.tmp Run_R1.fasta.7
mv Run_R1.fasta.8.tmp Run_R1.fasta.8
mv Run_R1.fasta.9.tmp Run_R1.fasta.9
dry run: test that files are non-empty: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9
Removing primers
dry run: test that files are non-empty: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9
dry run: test that destinations are free: Run_R1.fasta.0.tmp Run_R1.fasta.1.tmp
Run_R1.fasta.2.tmp Run_R1.fasta.3.tmp Run_R1.fasta.4.tmp Run_R1.fasta.5.tmp
Run_R1.fasta.6.tmp Run_R1.fasta.7.tmp Run_R1.fasta.8.tmp Run_R1.fasta.9.tmp
python /path/SmileTrain/remove_primers.py Run_R1.fasta.0 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.0.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.1 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.1.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.2 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.2.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.3 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.3.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.4 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.4.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.5 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.5.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.6 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.6.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.7 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.7.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.8 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.8.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
python /path/SmileTrain/remove_primers.py Run_R1.fasta.9 GTGCCAGCMGCCGCGNNNN --max_primer_diffs 1 --output Run_R1.fasta.9.tmp --reverse_primer GGACTACHVGGGTWTCNNNN
dry run: test that files are non-empty: Run_R1.fasta.0.tmp Run_R1.fasta.1.tmp
Run_R1.fasta.2.tmp Run_R1.fasta.3.tmp Run_R1.fasta.4.tmp Run_R1.fasta.5.tmp
Run_R1.fasta.6.tmp Run_R1.fasta.7.tmp Run_R1.fasta.8.tmp Run_R1.fasta.9.tmp
mv Run_R1.fasta.0.tmp Run_R1.fasta.0
mv Run_R1.fasta.1.tmp Run_R1.fasta.1
mv Run_R1.fasta.2.tmp Run_R1.fasta.2
mv Run_R1.fasta.3.tmp Run_R1.fasta.3
mv Run_R1.fasta.4.tmp Run_R1.fasta.4
mv Run_R1.fasta.5.tmp Run_R1.fasta.5
mv Run_R1.fasta.6.tmp Run_R1.fasta.6
mv Run_R1.fasta.7.tmp Run_R1.fasta.7
mv Run_R1.fasta.8.tmp Run_R1.fasta.8
mv Run_R1.fasta.9.tmp Run_R1.fasta.9
dry run: test that files are non-empty: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9
Demultiplexing
dry run: test that files are non-empty: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9
dry run: test that destinations are free: q.0.tmp q.1.tmp q.2.tmp q.3.tmp q.4.tmp q.5.tmp
q.6.tmp q.7.tmp q.8.tmp q.9.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.0 barcodes2.txt --max_barcode_diffs 1
--output q.0.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.1 barcodes2.txt --max_barcode_diffs 1
--output q.1.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.2 barcodes2.txt --max_barcode_diffs 1
--output q.2.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.3 barcodes2.txt --max_barcode_diffs 1
--output q.3.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.4 barcodes2.txt --max_barcode_diffs 1
--output q.4.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.5 barcodes2.txt --max_barcode_diffs 1
--output q.5.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.6 barcodes2.txt --max_barcode_diffs 1
--output q.6.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.7 barcodes2.txt --max_barcode_diffs 1
--output q.7.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.8 barcodes2.txt --max_barcode_diffs 1
--output q.8.tmp
python /path/SmileTrain/map_barcodes.py Run_R1.fasta.9 barcodes2.txt --max_barcode_diffs 1
--output q.9.tmp
dry run: test that files are non-empty: q.0.tmp q.1.tmp q.2.tmp q.3.tmp q.4.tmp q.5.tmp q.6.tmp
q.7.tmp q.8.tmp q.9.tmp
mv q.0.tmp Run_R1.fasta.0
mv q.1.tmp Run_R1.fasta.1
mv q.2.tmp Run_R1.fasta.2
mv q.3.tmp Run_R1.fasta.3
mv q.4.tmp Run_R1.fasta.4
mv q.5.tmp Run_R1.fasta.5
mv q.6.tmp Run_R1.fasta.6
mv q.7.tmp Run_R1.fasta.7
mv q.8.tmp Run_R1.fasta.8
mv q.9.tmp Run_R1.fasta.9
dry run: test that files are non-empty: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9
Quality filtering
dry run: test that files are non-empty: Run_R1.fasta.0 Run_R1.fasta.1 Run_R1.fasta.2
Run_R1.fasta.3 Run_R1.fasta.4 Run_R1.fasta.5 Run_R1.fasta.6 Run_R1.fasta.7 Run_R1.fasta.8
Run_R1.fasta.9
dry run: test that destinations are free: q.0.tmp q.1.tmp q.2.tmp q.3.tmp q.4.tmp q.5.tmp
q.6.tmp q.7.tmp q.8.tmp q.9.tmp
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.0
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.1
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.2
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.3
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.4
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.5
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.6
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.7
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.8
python /path/SmileTrain/check_fastq_format.py Run_R1.fasta.9
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.0 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.0.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.1 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.1.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.2 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.2.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.3 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.3.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.4 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.4.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.5 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.5.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.6 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.6.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.7 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.7.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.8 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.8.tmp
/usr/local/bin/usearch -fastq_filter Run_R1.fasta.9 -fastq_truncqual 2 -fastq_maxee 0.5 -fastaout
q.9.tmp
python /path/SmileTrain/combine_fasta.py --output q.fst q.0.tmp q.1.tmp q.2.tmp q.3.tmp q.4.tmp
q.5.tmp q.6.tmp q.7.tmp q.8.tmp q.9.tmp
dry run: test that files are non-empty: q.fst
rm q.0.tmp
rm q.1.tmp
rm q.2.tmp
rm q.3.tmp
rm q.4.tmp
rm q.5.tmp
rm q.6.tmp
rm q.7.tmp
rm q.8.tmp
rm q.9.tmp
Dereplicating sequences
python /path/SmileTrain/derep_fulllength.py q.fst --output q.derep.fst
dry run: test that files are non-empty: q.derep.fst
Indexing samples
dry run: test that files are non-empty: q.fst q.derep.fst
dry run: test that destinations are free: q.index
python /path/SmileTrain/index.py q.fst q.derep.fst --output q.index
dry run: test that files are non-empty: q.index
dbOTU: aligning sequences
perl /path/dbOTUcaller/perllib/temp_071514.pl q.derep.fst q.index unique
/path/mothur/mothur "#align.seqs(fasta=unique.fa,
reference=/path/SmileTrain/tmpdir/silva.bacteria.m45.25434.11887.m668.filter.fasta)"
/path/mothur/mothur "#screen.seqs(fasta=unique.align, start=5, minlength=250)"
perl /path/dbOTUcaller/perllib/filter_mat_from_fasta.pl unique.f0.mat unique.good.align >
unique.f0.good.mat
dry run: test that files are non-empty: unique.good.align unique.f0.good.mat
dbOTU: progressive clustering
perl /path/dbOTUcaller/perllib/find_replace_seq_dash-period.pl unique.good.align
unique.good.align.ng
perl /path/dbOTUcaller/perllib/fasta2uchime_size.pl unique.f0.good.mat unique.good.align.ng
unique.good.align.ng.size
/usr/local/bin/usearch -cluster_otus unique.good.align.ng.size --uc unique.97.uc -otus
unique.97.otus.fa -fastaout unique.97.fastaout.fa
perl /path/dbOTUcaller/perllib/USEARCH_fastaout2list.pl unique.97.fastaout.fa unique.97.uc.list
/usr/local/bin/usearch -sortbylength unique.97.otus.fa -output unique.97.sorted.fa
/usr/local/bin/usearch -cluster_smallmem unique.97.sorted.fa -id 0.96 --uc unique.96.uc -centroids unique.96.otus.fa
perl /path/dbOTUcaller/perllib/UC2list3.pl unique.96.uc unique.96.uc.list
/usr/local/bin/usearch -sortbylength unique.96.otus.fa -output unique.96.sorted.fa
/usr/local/bin/usearch -cluster_smallmem unique.96.sorted.fa -id 0.95 --uc unique.95.uc -centroids unique.95.otus.fa
perl /path/dbOTUcaller/perllib/UC2list3.pl unique.95.uc unique.95.uc.list
/usr/local/bin/usearch -sortbylength unique.95.otus.fa -output unique.95.sorted.fa
/usr/local/bin/usearch -cluster_smallmem unique.95.sorted.fa -id 0.94 --uc unique.94.uc -centroids unique.94.otus.fa
perl /path/dbOTUcaller/perllib/UC2list3.pl unique.94.uc unique.94.uc.list
/usr/local/bin/usearch -sortbylength unique.94.otus.fa -output unique.94.sorted.fa
/usr/local/bin/usearch -cluster_smallmem unique.94.sorted.fa -id 0.93 --uc unique.93.uc -centroids unique.93.otus.fa
perl /path/dbOTUcaller/perllib/UC2list3.pl unique.93.uc unique.93.uc.list
/usr/local/bin/usearch -sortbylength unique.93.otus.fa -output unique.93.sorted.fa
/usr/local/bin/usearch -cluster_smallmem unique.93.sorted.fa -id 0.92 --uc unique.92.uc -centroids unique.92.otus.fa
perl /path/dbOTUcaller/perllib/UC2list3.pl unique.92.uc unique.92.uc.list
/usr/local/bin/usearch -sortbylength unique.92.otus.fa -output unique.92.sorted.fa
/usr/local/bin/usearch -cluster_smallmem unique.92.sorted.fa -id 0.91 --uc unique.91.uc -centroids unique.91.otus.fa
perl /path/dbOTUcaller/perllib/UC2list3.pl unique.91.uc unique.91.uc.list
/usr/local/bin/usearch -sortbylength unique.91.otus.fa -output unique.91.sorted.fa
/usr/local/bin/usearch -cluster_smallmem unique.91.sorted.fa -id 0.90 --uc unique.90.uc -centroids unique.90.otus.fa
perl /path/dbOTUcaller/perllib/UC2list3.pl unique.90.uc unique.90.uc.list
/usr/local/bin/usearch -sortbylength unique.90.otus.fa -output unique.90.sorted.fa
perl /path/dbOTUcaller/perllib/merge_progressive_clustering4.pl
unique.97.uc.list,unique.96.uc.list,unique.95.uc.list,unique.94.uc.list,unique.93.uc.list,unique.92.uc.list,unique.91.uc.list,unique.90.uc.list unique.PC.final.list
dry run: test that files are non-empty: unique.PC.final.list
Calling dbOTUs
python /path/dbOTUcaller/dbOTUcaller.py unique.f0.good.mat unique.good.align
unique.dbOTU -k 10.0 -p 0.0001 -d 0.1 -s unique.PC.final.list
/path/mothur/mothur "#degap.seqs(fasta=unique.dbOTU.fasta)"
dry run: test that files are non-empty: unique.dbOTU.list unique.dbOTU.ng.fasta
unique.dbOTU.mat unique.dbOTU.log
Removing chimeras from dbOTUs de novo
perl /path/dbOTUcaller/perllib/fasta2filter_from_mat_SmileTrain.pl unique.dbOTU.mat
q.derep.fst > unique.dbOTU.ng.fasta
/usr/local/bin/usearch -uchime_denovo unique.dbOTU.ng.fasta -nonchimeras
unique.dbOTU.nonchimera.fasta -strand plus
perl /path/dbOTUcaller/perllib/filter_mat_from_fasta_SmileTrain.pl unique.dbOTU.mat
unique.dbOTU.nonchimera.fasta > unique.dbOTU.nonchimera.mat
dry run: test that files are non-empty: unique.dbOTU.nonchimera.fasta
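
The quality-filtering step in the dry run above passes -fastq_maxee 0.5 to usearch, i.e. reads whose expected number of errors exceeds 0.5 are discarded. The expected number of errors of a read is the sum of the per-base error probabilities implied by its Phred scores, P = 10^(-Q/10). The sketch below illustrates that calculation only; it is not the usearch implementation. The function names are hypothetical, and the Phred+33 quality encoding is an assumption.

```python
def phred_from_ascii(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    ASSUMPTION: Phred+33 encoding.
    """
    return [ord(char) - offset for char in quality_string]


def expected_errors(phred_scores):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_maxee(phred_scores, maxee=0.5):
    """True if the read's expected errors do not exceed the threshold."""
    return expected_errors(phred_scores) <= maxee


if __name__ == "__main__":
    # Hypothetical reads of 250 bases with uniform quality.
    for q in (30, 20):
        scores = [q] * 250
        print(q, round(expected_errors(scores), 3), passes_maxee(scores))
```

For example, a hypothetical 250-base read with a uniform score of Q30 has about 0.25 expected errors and would be kept, whereas the same read at Q20 has about 2.5 expected errors and would be discarded.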
Phylogenetic tree construction
The same alignment, produced by the SmileTrain pipeline, was used as input to four different
methods to infer a phylogenetic tree: maximum likelihood (ML) using either (1) MEGA (version 7) or
(2) RAxML (version 7.0.3), approximate ML using (3) FastTree (version 2.1.3), or (4) relaxed
neighbor joining using clearcut (version 1.0.9). The latter three methods were run in
QIIME with the script make_phylogeny.py and its -t option. Using each of these trees, we then
computed weighted UniFrac distances and repeated the PERMANOVA (adonis), PERMDISP and PCoA
analyses to compare seasons (essentially repeating the analyses shown in Figure 2 and Table S6). All
trees yielded nearly identical results (PERMANOVA adonis R2 = 0.1, P < 0.01; PERMDISP P < 0.001);
therefore, we report results for FastTree only.
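
For readers unfamiliar with the R2 statistic reported above, the following sketch shows how a one-factor PERMANOVA R2 can be computed from a distance matrix: the total sum of squares is the sum of squared pairwise distances divided by the number of samples, the within-group sum of squares is the same quantity computed within each group, and R2 is the fraction of the total not explained within groups. This is an illustration only, not the adonis implementation used in the paper. It omits the permutation test that yields the P value, and the function name and toy distance matrix are hypothetical.

```python
from itertools import combinations


def permanova_r2(dist, groups):
    """One-factor PERMANOVA R2 from a symmetric distance matrix.

    dist   -- square matrix (list of lists) of pairwise distances
    groups -- list of group labels, one per sample
    """
    n = len(groups)
    # Total sum of squares: squared pairwise distances divided by n.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, accumulated group by group.
    ss_within = 0.0
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    return 1.0 - ss_within / ss_total


if __name__ == "__main__":
    # Hypothetical toy data: two samples per season, identical within a season.
    toy = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [1, 1, 0, 0],
    ]
    print(permanova_r2(toy, ["spring", "spring", "summer", "summer"]))
```

In the toy matrix all of the variation lies between the two groups, so R2 reaches its maximum; an R2 of 0.1, as reported above, means that season accounts for about a tenth of the total variation in community distances.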
Supplementary references
Fortin N, Munoz-Ramos V, Bird D, Lévesque B, Whyte LG, Greer CW. (2015). Toxic
cyanobacterial bloom triggers in Missisquoi Bay, Lake Champlain, as determined by next-
generation sequencing and quantitative PCR. Life 5: 1346–1380.
Lahr DJG, Katz LA. (2009). Reducing the impact of PCR-mediated recombination in molecular
evolution and environmental studies using a new-generation high-fidelity DNA polymerase.
BioTechniques 47: 857–866.
Preheim SP, Perrotta AR, Martin-Platero AM, Gupta A, Alm EJ. (2013). Distribution-based
clustering: using ecology to refine the operational taxonomic unit. Appl Environ Microbiol 79:
6593–6603.