Supplementary information

Title: Canine transmissible venereal tumor genome reveals ancient introgression from coyotes to pre-contact dogs in North America

Xuan Wang 1, Bo-Wen Zhou 1,2, Melinda A. Yang 3, Ting-Ting Yin 1, Fang-Liang Chen 4, Sheila C. Ommeh 5, Ali Esmailizadeh 6, Melissa M. Turner 7, Andrei D. Poyarkov 8, Peter Savolainen 9, Guo-Dong Wang 1,2,10, Qiaomei Fu 3,11, Ya-Ping Zhang 1,2,10

1 State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China

2 Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650223, China

3 Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, IVPP, CAS, Beijing, 100044, China

4 Kunming Police Dog Base of the Ministry of Public Security, Kunming, 650204, China

5 Animal Biotechnology Group, Institute of Biotechnology Research, Jomo Kenyatta University of Agriculture and Technology, Nairobi 00200, Kenya

6 Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, PB 76169-133, Iran

7 Department of Forestry and Environmental Resources, Fisheries, Wildlife, and Conservation Biology Program, North Carolina State University, Raleigh, NC 27695, USA

8 Severtsov Institute of Ecology and Evolution, Russian Academy of Science, Leninskiy prospect, 33, Moscow, 119071, Russia

9 Department of Gene Technology, KTH-Royal Institute of Technology, Science for Life Laboratory, Tomtebodavägen 23A, Solna, 17165, Sweden

10 Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China

11 Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Beijing, 100044, China

These authors contributed equally: Xuan Wang, Bo-Wen Zhou, Melinda A. Yang

Correspondence: Guo-Dong Wang ([email protected]) or Qiaomei Fu ([email protected]) or Ya-Ping Zhang ([email protected])

File Description: Supplementary Note, Supplementary Methods, Supplementary References and Supplementary Figures


Supplementary Note

Recent whole genome sequencing (WGS) studies of ancient and modern canids all

indicate an intricate evolutionary history with high population turnover1-9.

These findings are in large part due to increased

availability of ancient genomic sequences, which are a powerful resource for

elucidating past demographic history of different species. Ancient genomic studies

have extensively refined our understanding of genetic history and adaptive evolution

in humans10, 11 and the evolutionary history of domestication in livestock12 and crops13.

Thus, the sequencing of more ancient canid genomes will play a pivotal role in

clarifying the genetic history of canine evolution and dog domestication14. However,

ancient genomes need not always derive from past specimens.

The canine transmissible venereal tumor (CTVT) is the oldest known somatic cell line,

originating from cancer cells transmitted from a host to other canids during the mating

process15. Since it was shown ten years ago that living cells from an ancient host

could be transmitted among canids, the origin of CTVT has been studied

continuously16. Multi-locus genetic analyses indicate that all CTVT cells are derived

from a single neoplastic clone of an original founder individual that lived many

generations ago17. Thus, CTVT cells can be treated as “living fossils”, whose genetic

material can provide insight into the founder and its population. Studies first narrowed

the CTVT founder (the original canid infected with CTVT) to a spitz-type dog or

wolf17, and later phylogenomic analyses further indicated that the CTVT founder is

potentially an Arctic sled dog15, 18. However, horizontal transfer of mitochondrial

DNA (mtDNA) from infected dogs into the CTVT cells has occurred at least five

times19, making the maternal genealogy of the CTVT founder untraceable. Recent

comparison of the CTVT genetic data with a more comprehensive canine reference

panel including pre-contact dogs (PCDs) from North America argued that the CTVT

founder is the closest detectable lineage to PCDs, and that this clade possessed

introgression from wild canids in North America1.

However, these previous studies may not have taken into account several potential biases in

the genotyping methods for CTVT samples and the strategy for selecting loci. First,

contamination by host cells dilutes the proportion of reads from tumor cells. In addition, massive

copy number variation (CNV) resulting from chromosomal instability in early somatic

evolution15, 20, 21 can dilute (deletion) or concentrate (duplication)

reads from tumor cells. These factors can lead to genotyping errors when using direct

germline calling methods or a rigid variant allele fraction (VAF) interval1, 15, 18

(Supplementary information, Methods). Second, each CTVT genome represents a

complex mixture of entities – systematic errors, alleles inherited by the founder,

lineage-specific somatic mutations, and earlier somatic mutations. Somatic mutations

resulting in polymorphic genotypes help to test whether tumor cells originated from a

single founder or multiple clonal origins, as observed for facial tumors found in

Tasmanian Devils22, but their inclusion biases evolutionary analyses. For instance, in

phylogenetic analyses, branch lengths18 are overestimated and the likelihood of long

branch attraction increases1. In population genetic analyses, somatic mutations add

extreme outliers in a principal component analysis (PCA)1 and potentially affect the

significance of several statistical tests due to increased sharing between somatic

mutant alleles and unrelated germline populations. Previous studies, while aware of

this complexity, still used multiple CTVT samples both to confirm that these samples

have a single origin and to search for the origin of the founder alongside other germline

samples in the same genetic analyses1, 15, 18. Third, uneven sample sizes and

different levels of sequencing depth of the reference panel may also bias tests of

admixture23. Previously used genomic reference panels often contain unbalanced

sample sizes for different sub-populations of dogs and wild canids. In addition, most village

dogs24, except those in East Asia, were sequenced to a low to medium depth of less

than 10×.

We collected new CTVT samples and modern canids, and then used a newly developed

tool and a refined strategy to address these biases. Two new CTVT samples used in the

present study, named KM1 and KM2, were obtained in Kunming, China

(Supplementary information, Fig. S1 and Table S2). WGS was performed on host

(20× depth) and tumor (40× depth) tissues of each sample. We also included WGS

data for three previously published CTVT samples from Australia, Brazil15, and

Gambia1. Together, these five CTVTs from four continents allow us to exclude

lineage-specific somatic mutations.

Along with the accumulation of genome-wide canine data, estimation of CTVT’s

origin has gradually improved1, 15, 17, 18. However, as mentioned in previous studies15,

18, the integrity of the reference panel may influence how accurately CTVT’s origin

can be estimated. As CTVT has evolved for thousands of years, comparison to ancient

canine samples is most useful for directly tracing its origin, but despite great advances,

ancient DNA is still difficult to sequence. Instead, village dogs can be alternative

genetic proxies of ancient populations from the last millennium, assuming no

dramatic population exchange happened. We collected village dogs from diverse

geographical regions to avoid the influence of population exchange and admixture in

some “ancient” breeds by European dogs during colonization. Closely related wild

canids, such as gray wolves, coyotes, and golden jackals, are also useful to test

whether CTVT originated from wild canids or possesses partial genetic ancestry from

them. Thus, we collected some wild canine samples to build a more integrated genetic

panel. In total, we additionally collected 22 canids from around the world

(Supplementary information, Table S1), including two golden jackals (Canis aureus)

from western Russia, one coyote (Canis latrans) from western North America, two

gray wolves (Canis lupus) from Iran, and village dogs (Canis lupus familiaris) from

East Asia, the Indian Peninsula, Central Asia, the Middle East, and Africa. Most of

these canids were sequenced to an average depth of 20× (Supplementary information,

Table S1). To maximize the spatial and temporal resolution of canine genetic

diversity, we selected another 81 canids from previously published samples (strategy

of selection is described in Supplementary information, Methods). Present-day

samples include coyotes from North America25, worldwide gray wolves4, 6, 8, 18, 25-27,

village dogs6, 28, European breed dogs18, 27, and breed dogs from other regions6, 8, 18, 24,

28-32. We also included seven ancient North American dogs1, three ancient European

dogs4, 5, and the ~34,900-year-old Taimyr wolf7 (Supplementary information, Table

S1, locations of dogs are depicted in Fig. 1f).

Using the filtering, mapping, and single nucleotide polymorphism (SNP) calling

procedures described in the Supplementary information, Methods, we jointly called

24.1M SNPs from 92 present-day canids. The eleven ancient canids were ascertained

and genotyped at these 24.1M SNPs using previously developed methods4.

Collectively, the 24.1M SNP panel for these 103 modern and ancient canids was

used as our reference panel to study the genetic ancestry of the CTVT founder. This

reference panel is much denser than those used in previous studies, giving us the opportunity

to develop a refined landscape of the genetic ancestry making up CTVT’s genome.

Next, we performed ploidy, contamination (cellularity) and CNV analyses of the

CTVT samples using sequenza33, a grid-based Bayesian estimation method.

The ploidy of the five CTVTs ranged from 1.8 to 2, a narrow range

(Supplementary information, Fig. S2), confirming that no drastic chromosomal variation

at the whole genome level occurred during worldwide dispersal21. The CNV profiles

of the five CTVTs showed a conserved pattern similar to that found in previous

results15, 21 (Fig. 1g), suggesting that the five CTVT samples had a singular origin.

Methods of studying somatic mutations keep advancing as the fields of mutational

mechanisms34, 35, intratumor heterogeneity36-38 and clonal evolution39, 40 in human

tumors progress. As chromosomal instability is considered the predominant somatic

mutational type in the tumorigenesis of CTVT20, the CNV profile is necessary to

determine the genotype at local sites. Thus, we developed a method, the transmissible

tumor genotyper (ttgeno), which is the first genotyping tool designed specifically to

analyze whole genome sequencing data from paired transmissible tumors and their

hosts, to obtain per-site allelic copy number of the tumor (Supplementary information,

Methods). This tool simultaneously takes into account the ploidy, contamination,

local copy number state of both host and tumor, and small indels in the tumor.

It does not model sub-clonal structure, because previous studies have shown that CTVT is

already almost homogeneous15, 20. We genotyped each CTVT using this tool,

obtaining successful genotyping rates from 95.5% to 97.4%.
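The effect of host contamination and local copy number on the expected read fraction can be illustrated with a simple two-component mixture. The sketch below is not the ttgeno implementation (that is described in the Supplementary Methods); the function names, the binomial likelihood, and the error floor `err` are assumptions made purely for illustration.

```python
from math import log


def expected_vaf(purity, tumor_alt, tumor_cn, host_alt, host_cn=2):
    """Expected alt-read fraction in a biopsy that mixes tumor and host cells.

    purity    : fraction of cells that are tumor (1 - host contamination)
    tumor_alt : copies of the alt allele in the tumor at this site
    tumor_cn  : total tumor copy number at this site
    host_alt  : copies of the alt allele in the host (0, 1 or 2)
    """
    alt = purity * tumor_alt + (1.0 - purity) * host_alt
    total = purity * tumor_cn + (1.0 - purity) * host_cn
    if total <= 0:
        raise ValueError("no DNA is expected at this site")
    return alt / total


def most_likely_tumor_alt_copies(alt_reads, depth, purity, tumor_cn,
                                 host_alt, err=0.01):
    """Pick the tumor alt-allele copy count that best explains the reads.

    Assumption: reads are binomial draws around the expected fraction, and
    `err` keeps the probability away from 0 and 1 to tolerate sequencing error.
    """
    best_copies, best_loglik = None, float("-inf")
    for copies in range(tumor_cn + 1):
        p = expected_vaf(purity, copies, tumor_cn, host_alt)
        p = min(max(p, err), 1.0 - err)
        loglik = alt_reads * log(p) + (depth - alt_reads) * log(1.0 - p)
        if loglik > best_loglik:
            best_copies, best_loglik = copies, loglik
    return best_copies
```

Under this model the expected fraction depends jointly on purity and copy number, which is why a single fixed VAF interval can misclassify sites in deleted or duplicated regions.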

The genotyped CTVT genome is a mixture of different variant classes. These

include systematic errors, alleles inherited by the founder, lineage-specific somatic

mutations, and earlier somatic mutations. Assuming a single origin for CTVT,

lineage-specific somatic mutations can be identified as genotype-polymorphic

sites among multiple worldwide CTVT samples. That is, alleles inherited by the

founder and earlier somatic mutations should be genotype-monomorphic among

CTVT samples. We found ~1.7G genotype-monomorphic sites, allowing one missing

CTVT sample at each site. Another 2.9M sites were genotype-polymorphic loci

among the five CTVTs, allowing two missing CTVT samples at each site. We used

the genotype-polymorphic sites to assess the relationship between these five CTVTs

(Supplementary information, Fig. S3) and excluded these from subsequent analyses.

However, the remaining ~1.7G of genotype-monomorphic sites can either contain an

inherited germline allele that originated from the CTVT founder, or an early somatic

mutation that was not in the CTVT founder and arose before the CTVT became

widespread. We hypothesized that SNPs that are both genotype-monomorphic in the

CTVT samples and polymorphic in the reference panel (i.e. biallelic intersection) are

likely inherited germline polymorphisms, whereas private genotype-monomorphic

alleles found only in the CTVT samples are likely early somatic mutations. We found

that of the ~1.7G sites that were genotype-monomorphic in five CTVT samples,

17.4M sites (2M non-ref alleles) are biallelic polymorphic in the reference panel,

while 1.5M sites were private to CTVT samples. However, while we restrict our study

to those loci that are genotype-monomorphic in five CTVT samples but polymorphic

in the reference panel, alternative possibilities remain: 1) Some genotype-

monomorphic loci private to CTVT may in fact be germline polymorphic loci

belonging to an ancient population that has undergone drift. 2) Some CTVT alleles

in the 17.4M biallelic intersected sites may be early somatic mutations. For instance,

some polymorphic loci in the reference panel may have mutated more recently than

the time the CTVT founder lived and by chance matched early somatic mutations in

the CTVT lineage. 3) Some genotype-monomorphic loci may contain early somatic

alleles that mutated to alleles observed in coyotes and golden jackals.
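The site-selection logic described above can be sketched as two small classification steps. This is a minimal illustration, assuming genotypes are represented as hashable values (with `None` for a missing call) and alleles as sets of bases; the function names and the "uninformative" label are hypothetical and are not taken from the authors' pipeline.

```python
def classify_ctvt_site(genotypes, max_missing_mono=1, max_missing_poly=2):
    """Classify one site from the genotypes of the five CTVT samples.

    Monomorphic sites tolerate one missing sample and polymorphic sites
    tolerate two, following the thresholds stated in the text.
    """
    called = [g for g in genotypes if g is not None]
    missing = len(genotypes) - len(called)
    distinct = set(called)
    if len(distinct) == 1 and missing <= max_missing_mono:
        return "monomorphic"
    if len(distinct) > 1 and missing <= max_missing_poly:
        return "polymorphic"
    return "excluded"


def label_monomorphic_site(ctvt_alleles, panel_alleles):
    """Split genotype-monomorphic sites by comparison with the reference panel.

    ctvt_alleles  : set of alleles carried by CTVT at the site
    panel_alleles : set of alleles observed in the reference panel
    """
    if len(panel_alleles) == 2 and ctvt_alleles <= panel_alleles:
        return "intersected"    # biallelic in the panel: likely germline
    if ctvt_alleles - panel_alleles:
        return "private"        # allele unseen in the panel: likely somatic
    return "uninformative"      # panel is invariant and CTVT matches it
```

Only sites labeled "intersected" in this sketch would correspond to the loci carried forward to the population genetic analyses.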

Different mutagenic processes often generate different characteristic imprints, which

are combinations of mutation types, termed “signatures”35, 41. We assessed the extent

of somatic and germline mutations contained in the 1.5M genotype-monomorphic loci

private to CTVT samples, and the 17.4M biallelic intersected sites, testing our

assumption that the intersected loci are mostly germline mutations. To do so, we performed an

analysis of mutation signatures using signeR42, a method based on an empirical

Bayesian treatment of the non-negative mutational spectra matrix factorization model.

We did not distinguish ancestral and derived alleles for the variants, so mutation

signatures were used to determine relative similarities or differences among samples,

rather than to represent past mutagenic mechanisms. Eight signatures were estimated

from the 96 tri-nucleotide mutational spectra matrix at the 17.4M intersected sites for

both CTVT and the samples in reference panel, and for the 1.5M CTVT-private

alleles (Supplementary information, Fig. S4 and Bayesian Information Criterion in

Supplementary information, Fig. S5). The contribution of each signature in all

samples reveals that Signature8, which is similar to the C>T signature found in

CTVT in previous studies15, 18, is enriched in the CTVT-private alleles (96.7%), but

contributes less than 11.3% in all germline samples (Fig. 1a). Among the other signatures, Signature1,

Signature2, and Signature7 are enriched in golden jackals, and Signature4 and

Signature5 are enriched in coyotes (Supplementary information, Figs. S6-S7). For the

set of 17.4M intersected loci, the contribution of Signature8 in CTVT is only 9.5%,

and the contributions of most other signatures do not differ from those found in other dogs (P>0.05,

Kruskal-Wallis rank sum test). The exception is Signature6, which is found in a

diverse set of dogs, and for which the test may have lacked statistical power because

only one sample represents the CTVT at these loci (Supplementary information, Figs.

S6-S7). Unsupervised clustering of samples based on the relationship of these eight

signatures reveals that different species and populations form general clades with a

few exceptions, and the CTVT of the 17.4M intersected loci clusters into a clade

composed of dogs (Fig. 1b). Recent studies about the mutation rate and mutational

mechanisms reflected by mutation signatures show that these signatures can

distinguish between somatic and germline mutations41, 43 among different species43

and between different germline populations44-47, a pattern we also find here (Fig. 1b).

These results indicate that most genotype-monomorphic sites in the CTVT genome that

are polymorphic in the reference panel are inherited germline SNPs. Thus, we treated

the alleles at the 17.4M sites as directly inherited from the CTVT founder and used these sites in

subsequent population genetic analyses. Only a small proportion (3.3%) of germline

signatures was estimated in the CTVT-private alleles, so the loss of these loci in

subsequent analyses is negligible, with 17.4M SNPs remaining for inferring the

genetic ancestry of CTVT. We describe the pipeline from genotyping to loci selection

in Fig. 1c. By excluding bias from genotyping errors and somatic mutations, this

pipeline allows us to reconstruct a hypothetical ancient

canid, “the CTVT founder”.
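The 96-channel tri-nucleotide spectrum that serves as input to the signature analysis can be built by folding each variant and its flanking bases onto the pyrimidine strand. The sketch below shows only that counting step, not signeR's Bayesian factorization; the function names are hypothetical, and the variant is oriented as reference-to-alternate because the text notes that ancestral and derived alleles were not distinguished.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutation_channel(context, alt):
    """Return the channel label, e.g. 'A[C>T]G', for one single-base variant.

    context : three reference bases (5' flank, mutated base, 3' flank)
    alt     : the alternate base at the middle position
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt == context[1]:
        raise ValueError("need a 3-base context and a different alt base")
    if context[1] in "AG":
        # Purine reference base: report the variant on the opposite strand.
        context = context.translate(_COMPLEMENT)[::-1]
        alt = alt.translate(_COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_spectrum(variants):
    """Count channels over an iterable of (context, alt) pairs for one sample."""
    return Counter(mutation_channel(context, alt) for context, alt in variants)
```

Stacking one such count vector per sample gives the non-negative samples-by-96 matrix that a factorization method decomposes into signatures and per-sample contributions.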

In previous analyses, CTVT grouped most closely with, but not within, pre-contact dogs

(PCDs) from North America1. In a phylogenetic analysis of the CTVT founder with

our comprehensive canine reference panel, the topology of the maximum-likelihood (ML)

tree (Fig. 1d) shows that the CTVT founder clusters among several PCDs. Specifically, the

CTVT founder is placed closest to the Baum Village (SAMEA104190271) and Uyak

(SAMEA104190274) dogs. The PCD clade then groups with modern Arctic sled dogs

(ASDs), confirming that PCDs and ASDs share a common ancestor1. However, our

results do not support the monophyly of all PCDs1. In particular, we

noted that the Koster dog (SAMEA104190270) is basal to all other dogs, indicating

that this PCD may possess ancestry from a wild canid.

Our whole genome sequencing data also support the PCD/ASD clade as the most

basal dog lineage, similar to Ní Leathlobhair et al.1, but unlike other studies in which southern

East Asian dogs (EADs) are the most basal4, 6. Ní Leathlobhair et al. proposed that the

discrepancy may be due to post-divergence gene flow from European dogs (EUDs) to

specific EADs. However, exclusion or inclusion of the putative admixed EADs in

neighbor-joining (NJ) trees or ML trees did not result in any distinct changes

(Supplementary information, Figs. S8-S10). When we excluded PCDs and the CTVT

founder, EADs became the most basal clade in the NJ tree, rather than ASDs

(Supplementary information, Fig. S12). These results suggest that PCDs are the main

source of the phylogenetic uncertainty. If the PCD clade carries introgression from wild canids,

this clade, even together with ASDs, could be pulled outside of EADs. We tested

this hypothesis in subsequent analyses.

We additionally included three published ancient European dogs’ genomes (NGD,

Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog) in our reference

panel4, 5. We found that HXH and NGD clustered together in the EUD clade, while

CTC is outside of the EUD clade. As previous studies have shown, HXH and NGD

are thought to be derived almost completely from ancient EUDs, while CTC has a

mixture of European and southern East Asian-related ancestry4, 6, 48. Three modern

“American native” breeds (chihuahua and two hairless dogs) also cluster into the

EUD clade (Fig. 1d), which suggests that these three breeds carry ancestry related to

EUDs. Our results are consistent with a population history in which EUDs almost

completely replaced Native American dog lineages1.

PCA excluding the golden jackals shows that PC1 (4.62% of variation) distinguishes

all coyotes and gray wolves from all dogs, and PC2 (3.35% of variation) separates

coyotes from gray wolves, as well as different sub-populations within dogs (Fig. 1b).

We found that the CTVT founder is closest to the PCD cluster. The Taimyr wolf

(TMR) is located between PCDs and worldwide wolves, supporting placement of the

TMR at the split between the dog and wolf lineage7. Ancient European dogs (HXH

and NGD) and modern American dogs fall mostly within an EUD cluster. All

worldwide dogs cluster relatively tightly, but PCDs, including the CTVT founder,

cluster separately from the dog cluster and more closely to wild canids, suggesting

that PCDs may have admixed with wild canids in North America. Our PCA results

generally agree with the ML tree and previous results1, although the dog closest to

the CTVT founder in the PCA, the Port au Choix dog (SAMEA104190273;

Supplementary information, Fig. S13), differs from the closest dog in the ML tree.

In order to validate the above phylogenetic relationships, we then examined the

genetic relatedness between the CTVT founder and each dog in the reference panel by

performing outgroup f3-statistics49, 50, using coyotes as the outgroup. Higher f3 values

indicate more shared drift between the samples, and therefore higher genetic

similarities. The CTVT founder showed the most genetic similarity with the pre-

contact Port au Choix dog, followed by the pre-contact Uyak dog, and then by most

other PCDs and ASDs (Fig. 1f). In particular, the Koster dog is not very genetically

similar to the CTVT founder, but it does share high genetic similarity with coyotes.

Overall, our results further support our phylogenetic and principal component

analyses, where we found that the CTVT founder is genetically most similar to PCDs.
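The outgroup f3-statistic used here measures the drift shared by two populations since their split from the outgroup. A minimal sketch follows, assuming per-site alternate-allele frequencies are already available for each population; the function name is hypothetical, and the finite-sample bias correction and block-jackknife standard errors applied by standard software are omitted.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Naive outgroup f3: the mean of (o - a) * (o - b) over sites.

    Each argument is a sequence of alternate-allele frequencies (0..1),
    one value per SNP, aligned across the three populations.
    """
    if not (len(outgroup) == len(pop_a) == len(pop_b)) or not outgroup:
        raise ValueError("inputs must be non-empty and equally long")
    products = [(o - a) * (o - b)
                for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)
```

In the setting above, `outgroup` would hold coyote frequencies, `pop_a` the CTVT founder, and `pop_b` each dog in turn; ranking the dogs by this value would reproduce the kind of comparison shown in Fig. 1f.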

To test whether admixture exists in PCDs and the CTVT founder, we

performed unsupervised ancestral clustering (K) analyses with ADMIXTURE51 on

pruned loci of the reference panel with the CTVT founder, excluding two golden

jackals (K ranges from 2 to 7, Supplementary information, Fig. S14a, and cross-validation error

in Supplementary information, Fig. S14b). At K=2, coyotes and gray

wolves separate from all dogs, while all PCDs and the CTVT founder possess

notable amounts of ancestry that is also present in modern coyotes and gray wolves.

At K=3, coyotes separate from gray wolves, with wolves from the Qinghai-Tibetan

Plateau possessing ancestry related to modern coyotes. PCDs and the CTVT founder

still possess ancestry from coyotes and gray wolves. At K=4, all dogs split into two

major clades representing the Eastern and Western Eurasian lineages for dogs. All

PCDs and the CTVT founder cluster into the Eastern Eurasian lineage, further

supporting an East Eurasian origin for PCDs. At K=5, dogs from the Indian Peninsula

separate from Western Eurasian dogs. Ancestry related to Indian dogs also appears in

some dogs from East Asia, the Middle East, and Central Asia, potentially due to

admixture with Indian dog populations. At K=6, the PCDs, the CTVT founder, and

the modern ASDs form a single clade, separating from southern East Asian dogs,

indicating that they share a close genetic relationship. At K=7, African dogs separate

from West Eurasian dogs. In summary, all results support the population structure

observed in our previous phylogenetic and principal component analyses. We find

that the main ancestral component found in the CTVT founder is associated primarily

with PCDs with minor components found predominantly in wild canine populations.

Specifically, we find that the Koster dog possesses a main ancestral component from

gray wolves, consistent with results suggesting that the Koster dog is the most basal

branch of all dogs on the phylogeny and the most related to wild canids in the PCA.

One possibility is that the Koster dog is a recent offspring of backcrosses to dogs after

initial hybridization with American wild canids. We also find that the three modern

“American native” breeds all possess European-like genetic ancestry. Finally, we also

find the ancestral components possessed by three ancient European dogs are

consistent with Botigué et al.’s result4, indicating continuity of European-like genetic

ancestry from the Neolithic period to modern dogs.

To further investigate whether the CTVT founder and PCDs experienced

introgression from a population distantly related to dogs, we calculated D-statistics49

to test whether significant asymmetry (positive D value, Z>3) exists between Pop1

and Pop2 using the form D(Pop1, Pop2; Candidate Introgressor, Outgroup). As

extensive gene flow exists in the genus Canis52, we used an ~11× Andean fox

(Lycalopex culpaeus) genome24 as the outgroup, which we genotyped at the 17.4M

intersected SNPs, randomly calling alleles at heterozygous sites to account for low

depth. We tested every non-dog group as a candidate introgressor for the CTVT

founder using D(CTVT founder, Pop2; Introgressor, Andean Fox), where Pop2 was

each canid population in turn (Supplementary information, Fig. S15 and Table S3).

Only coyotes were found to be a robust candidate introgressor. Coyotes from

Monterey showed significantly positive D-statistics for most Pop2 populations except

the other coyotes, New World wolves, and PCDs (Z>3.7). Coyotes from California

and Alabama are also potential candidate introgressors, but modern coyotes from the

Midwest, Ohio, and Florida were not robust introgressors, with several D(CTVT

founder, Pop2; Introgressor, Andean Fox) ~0. We found that coyotes from the

Midwest, Ohio, and Florida are closer to wolves and dogs than coyotes from the

Monterey area, California, and Alabama (Supplementary information, Table S3),

indicating potential gene flow among American canids53-55 or ancestral population

structure among coyotes. Similar to previous analyses1, we found that two PCDs (i.e.

Port au Choix, Weyanoke Old Town) showed significantly positive D(CTVT founder,

Pop2; PCD, Andean Fox) statistics for all Pop2 populations (Z>46), indicating the

close relationship between the CTVT founder and PCDs in our panel. Taken together,

the CTVT founder is likely an ancient American dog with introgression from

populations carrying ancestry related to coyotes from the Monterey area, California,

and Alabama. We also tested whether other dogs (Pop1) possessed introgression from

coyotes by using D(Pop1, Pop2; Coyote, Andean Fox), where Pop2 was tested using

all other groups in turn (Supplementary information, Fig. S16). We found no evidence

of introgression from coyotes in any dog population except PCDs and the CTVT

founder. Due to the CTVT founder’s high coverage, we used it as a surrogate for

PCDs to test whether any other canids carry ancestry from PCDs (Supplementary

information, Fig. S17). Only Arctic sled dogs in North America show more similarity

to PCDs, followed by Siberian and Alaskan huskies. However, whether asymmetric

D-statistics indicate introgression from closely related populations or shared

ancestry cannot be determined without high-density sampling of ancient and

modern PCDs and ASDs over a broad geographical region and time frame.
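The D-statistic of the form D(Pop1, Pop2; Candidate Introgressor, Outgroup) and its Z-score can be sketched from per-site allele frequencies. The code below is a simplified illustration and not the software used in the study: the function names are hypothetical, and it uses an unweighted delete-one block jackknife, whereas standard implementations weight blocks by their SNP counts.

```python
from math import sqrt


def abba_baba_terms(p1, p2, p3, out):
    """Numerator and denominator sums of D(P1, P2; P3, Outgroup).

    Inputs are aligned per-site alternate-allele frequencies. With this sign
    convention, a positive D means P1 shares more alleles with P3 than P2 does.
    """
    sites = list(zip(p1, p2, p3, out))
    num = sum((a - b) * (c - d) for a, b, c, d in sites)
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d) for a, b, c, d in sites)
    return num, den


def d_statistic(p1, p2, p3, out):
    """D computed over all supplied sites."""
    num, den = abba_baba_terms(p1, p2, p3, out)
    return num / den


def jackknife_z(block_terms):
    """D and Z from per-block (numerator, denominator) pairs.

    Each block is a contiguous genomic window; leaving one block out at a
    time gives a standard error that tolerates linkage between nearby SNPs.
    """
    n = len(block_terms)
    if n < 2:
        raise ValueError("at least two blocks are required")
    total_num = sum(num for num, _ in block_terms)
    total_den = sum(den for _, den in block_terms)
    d_all = total_num / total_den
    leave_one_out = [(total_num - num) / (total_den - den)
                     for num, den in block_terms]
    mean_loo = sum(leave_one_out) / n
    se = sqrt((n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out))
    return d_all, d_all / se
```

For a test such as D(CTVT founder, Pop2; Coyote, Andean Fox), `p1` would hold the CTVT founder's allele frequencies and `out` the Andean fox calls; the text treats Z>3 as significant asymmetry.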

To confirm our result of introgression from coyotes to the CTVT founder shown in D-

statistics analyses, we utilized the coyote-specific diagnostic alleles53, fd-statistics56,

and fdM-statistics57 in sliding windows, as well as RFMix58 to infer the local ancestry

in the genome of the CTVT founder (Fig. 1g). We found the results were consistent

using these methods, with several regions introgressed from coyotes. From the RFMix

estimation, the estimated proportion of introgression from coyotes is ~0.9%. We also

used the F4-ratio test49 to estimate the proportion of coyote-related ancestry in the

CTVT founder and dogs sampled from America. We found the introgressed

proportion is 2.6%±0.5% (Z=5.735) for the CTVT founder and 4.9%±0.5%

(Z=10.334) for the Port au Choix dog, whereas the proportion is negative for the

Greenland dog and close to zero for the Alaska malamute, Mexico naked, and

Peruvian naked dog. The proportion inferred by RFMix is smaller than the F4-ratio,

likely because RFMix finds local ancestral segments using a smoothing algorithm for

limited generations after initial hybridization. These results reveal introgression from

coyotes into PCDs, but none in later introduced American dogs. We also identified

introgressed regions from New World wolves (NWW, 1.9%, physical length

proportion) in the genome of the CTVT founder when three ancestral references were

used in RFMix, supported by an F4-ratio estimate of 6.2%±1.1% (Z=5.742). Similar

to coyote introgression, the Port au Choix dog also has a higher admixture proportion

from NWW (F4-ratio, 12.0%±1.1%, Z=10.205). The extensive gene flow between

wild canids and PCDs may reflect overlap in the ecological habitats of PCDs and

coyotes.

TreeMix59 determines the graph structure of ancestral populations that allows for both

population splits and potential gene flow by using genome-wide allele frequency data

and a Gaussian approximation of genetic drift. We used TreeMix to investigate the

genetic relationship between the CTVT founder, PCDs, other ancient and present-day

canids (Supplementary information, Figs. S18-S21). The ML tree without admixture

(m=0, Supplementary information, Fig. S18a) showed that EAD form the basal clade

of all dogs. Other dogs split into two major clades: one is composed of ASDs and

PCDs in North America, while the other one is composed of Western Eurasian dogs

and African dogs. The topology is consistent with the NJ tree constructed without

PCDs. We observe that the PCD/CTVT founder clade clusters with the Greenland dog

and Alaskan malamute, and this super clade in turn clusters with two kinds of huskies

as a sister clade. This indicates that present-day Arctic dogs in North America may

possess high amounts of genetic ancestry inherited from the ancestral population of

the CTVT founder and sampled PCDs. This result is inconsistent with our

phylogenetic results in the text and other previous results1, but appeared in some

phylogenies when we adjusted samples used in the reference panel (not shown). In

view of this result, the refined evolutionary history of American dogs from their initial

introduction until the present day is still uncertain and may require increased

sampling of ancient and modern American dogs. We visualized the matrix of residuals

(Supplementary information, Fig. S18b) to determine how the estimated genetic

relationship between each pair of canids fits the model. A high residual indicates that

the pair does not fit the graph model and may be a candidate for an admixture event.

We find three candidate admixture events: 1) between coyotes and the PCD/CTVT

founder, 2) between Siberian and Alaskan huskies, and 3) between Indian and African

village dogs. In a reticulate ML graph allowing three admixture events, a migration

event from the coyote lineage to the PCD/CTVT founder clade is included (Fig. 1h,

matrix of residuals in Supplementary information, Fig. S21b). The other two events

reflect the extensive admixture in Eurasian canids. The topology of the graph

remained unchanged when migration events were included. Thus, several methods

support the presence of gene flow from coyotes into the ancient native dog population

represented by the CTVT founder and PCDs. This reticulate graph is also concordant

with the Out of Southern East Asia hypothesis for living dogs

suggested in previous studies4, 6 (Fig. 1h), where East Asian dogs are the basal clade

of all dogs, and two major superclades are found in the dog phylogeny, representing

two migration routes into the regions of Far East-America and Indian Peninsula-West

Eurasia6.

Due to the low sequencing depth of most PCDs, we included only one or two PCDs in

the statistical analyses used to determine introgression, but the ADMIXTURE results

suggest that introgression was extensive, over a long timeframe and across a broad

region (Supplementary information, Fig. S14a). However, previously published results

on ancient mtDNA of morphologically identified PCDs classified the vast majority as

haplotypes belonging to either dogs or wolves, but not coyotes1, 60-64. The

proportion of coyote introgression in PCDs was therefore likely uneven between mtDNA and

autosomal markers. A model-based survey demonstrated that sex-biased

introgression arises when asymmetries exist between the sexes in fitness or mating

behavior in the hybridizing migrant pool or in the source species65. Some studies have

claimed that male-biased introgression from dogs occurred in modern eastern

coyotes53, 66. The sex bias of transient coyotes has not been recorded in such detail67, 68,

which suggests that further work is needed to determine whether transient coyotes are

male-biased. Studies indicate a high level of diversity in bone morphology in early

PCDs69, but whether the coyote-PCD hybridization occurred naturally or was

introduced by humans to develop new breeds is still unknown. Classic cases such as

the Tibetan Mastiff acquiring adaptation to hypoxia at high elevations due to

introgression from Tibetan wolves70, 71 also highlight the importance of denser

sampling in the future to study whether any introgressed regions from coyotes are

under selection in PCDs.

Although studies based on mtDNA provide a timeframe for the initial introduction of

PCDs with humans1, 60, the refined demography of American dogs is still

controversial, especially for Arctic dogs in North America1, 61-64, 72. Our results are

also insufficient to resolve this issue. Recent studies reveal a complex population

history of Native Americans73, 74, which suggests that American dogs have complex

histories associated not only with hybridization with wild canids but also with human

migration within the Americas. High quality genome data from PCDs are in high

demand to answer these questions. Thus, the CTVT founder, inferred from the

geographically dispersed CTVT samples, is a useful high-quality proxy for PCDs.

The CTVT-private genotype-monomorphic sites will greatly aid cancer evolution

studies75, and more importantly, the extraction of the CTVT founder genome from

genotype-monomorphic sites in CTVT samples is invaluable to canine population

studies. Thus, we provide the genotype-monomorphic diploidized sites of the five

geographically dispersed CTVTs in the DogGD database of the iDog76 platform for

researchers to conveniently use in future studies.

Supplementary Methods

Sample collection and data set aggregation

Canine transmissible venereal tumor sample collection The two CTVT samples

used in this study (both from male dogs), named KM1 and KM2, were obtained in

Kunming, China (Supplementary information, Fig. S1). Both the tumors and blood

from the host dogs were collected with the approval of their owners. After

cytodiagnosis and surgical removal under anesthesia at two separate animal hospitals,

the tumors were soaked in absolute ethanol. Blood from each host was

collected from a forelimb vein and stored in EDTA anticoagulant tubes. WGS data

for three published CTVT samples (24T, 79T, 609T) and their corresponding hosts

(24H, 79H, 609H) were also downloaded for use1, 15.

Compiling the reference panel To investigate the genetics of the CTVT founder,

we collected all published canine WGS data to date. Then we selected samples using

criteria to balance the sample size of sub-populations and improve the quality of the panel:

1) If both high-depth (>20×) and low-depth (<10×) samples existed at one geographic

point, the low-depth samples were excluded. 2) If the distribution of a sample’s

sequencing depth along chromosomes was not uniform, the sample was excluded. 3) If

close relatives within two generations were identified based on kinship coefficients77, only one of

them was retained. 4) If village dogs from outside Europe possessed extreme EUD

ancestry and clustered with EUDs, photos of these dogs were checked to identify

admixed characters, and these samples were excluded. 5) All ancient samples were

retained. 6) EUDs were selected randomly but in a geographically dispersed manner, to a

sample size proportionate to that of other groups. Specifically, we sampled two

golden jackals from Western Russia, as it was previously shown that African and

Israeli golden jackals show admixture related to wolves and dogs, suggesting that they

may be more closely related to wolves and dogs than to coyotes4, 8, 78. Using these

conditions, we included WGS data from 103 canids in the reference panel, containing

worldwide gray wolves (Canis lupus), dogs (Canis lupus familiaris), coyotes (Canis

latrans), and golden jackals (Canis aureus) (Supplementary information, Table S1).

Genomic library construction and sequencing

CTVT samples Genomic DNA was extracted from the CTVT tumors and the

blood of their corresponding hosts using the QIAGEN DNeasy Blood & Tissue kit. The

DNA extracts were then sent to Tianjin Novogene Bioinformatics Technology Co.,

Ltd after vacuum freeze-drying for sequencing. Four paired-end libraries (insert size:

250bp, 650bp, 800bp, 650bp) were constructed for DNA extracts from each CTVT

sample and one paired-end library (insert size: 300bp) was constructed for each host.

Sequencing was carried out for paired-end reads on the Illumina HiSeq2000 platform

according to the manufacturer’s instructions. For the raw reads, sequencing adapters

were removed. Contaminated reads (chloroplast, mitochondrial, bacterial and viral

sequences, etc.) were screened by alignment to the NCBI-NR database using

megablast (version 2.2.26)79, 80 with the parameters “-v 1 -b 1 -e 1e-5 -m 8 -a 13”. The

in-house script duplication_rm.v2 was used to remove duplicated read pairs.

Low-quality reads were then removed if they met any of the following conditions:

1) ≥10% unidentified nucleotides (N); 2) presence of adapter sequence; 3)

>20% of bases with a Phred quality below 5. Finally, the average

sequencing depth for both CTVT tumors was ~40×, and the average sequencing depth

for both of their hosts was ~20×.
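
The three read-level conditions above can be sketched as a single predicate. This is only an illustration of the stated thresholds, not the in-house filtering script: the function name is hypothetical, adapter detection is assumed to have been done elsewhere and is passed in as a flag, and Phred scores are assumed to be already decoded to integers.

```python
from typing import Sequence


def keep_read(bases: str, phred: Sequence[int], has_adapter: bool = False) -> bool:
    """Return True if a read passes the three filters quoted in the text."""
    n = len(bases)
    if n == 0 or len(phred) != n:
        return False
    # 1) reads with >= 10% unidentified nucleotides (N) are removed
    n_unknown = bases.upper().count("N")
    if n_unknown * 10 >= n:
        return False
    # 2) reads with adapters are removed (detection assumed done upstream)
    if has_adapter:
        return False
    # 3) reads with > 20% of bases below Phred quality 5 are removed
    low_quality = sum(1 for q in phred if q < 5)
    if low_quality * 5 > n:
        return False
    return True
```

Integer comparisons are used instead of floating-point ratios so that reads sitting exactly on a threshold (for example, 1 N in 10 bases) are classified deterministically.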

Newly collected canids For newly collected canids, we sent Whatman® FTA®

Cards containing their DNA to Tianjin Novogene Bioinformatics Technology Co.,

Ltd for sequencing. Paired-end genomic sequence libraries were constructed with an

insert size of 250-400 bp, and sequencing was carried out on the Illumina HiSeq

XTen platform. The filtering scheme used is the same as that described above.

Sequencing depth and coverage information are shown in Supplementary information,

Table S1.

Sequence data pre-processing and variant calling

We mapped all clean reads to the CanFam3.1

(ftp://hgdownload.soe.ucsc.edu/goldenPath/canFam3) reference sequence using bwa

mem -M (version 0.7.5-r1140)81. Mapped reads were sorted using samtools sort

(version 1.5)82. We applied picard (version 2.9.0,

http://broadinstitute.github.io/picard/) to remove duplicated reads and merged BAM

files for multiple lanes. Indels were realigned using the GenomeAnalysisTK

(GATK, version 3.7.0)83 IndelRealigner. Base quality was recalibrated using GATK

BQSR to produce a final BAM file for each sample. The depth and coverage for all

samples were calculated using GATK DepthOfCoverage.

We applied the GATK HaplotypeCaller to simultaneously call variants (SNPs and

indels) from the final BAM files of 92 modern canids. We removed SNPs

within three base pairs of an indel using the bcftools (version 1.5)84 SnpGap option. Biallelic

SNPs were retained after removing sites that met any of the hard-filter criteria

QD < 2.0, MQ < 40.0, FS > 60.0, SOR > 3.0,

MQRankSum < -12.5, or ReadPosRankSum < -8.0, in accordance with the GATK

Tutorials (https://software.broadinstitute.org/gatk/documentation/article?id=2806). To

retain private alleles belonging to golden jackals and coyotes, we filtered out SNPs

where the minor allele count was less than two. Then, we removed SNPs with a

non-missing genotype rate below 0.9 using vcftools (version 0.1.15)85, giving a final

set of 24.2M SNPs. These sites were then genotyped for each ancient canid individually

using the script aDNA_GenoCaller.py as described before4. In total, 24.1M biallelic SNPs

were retained from the union of modern and ancient canine SNPs. For analyses

restricted to a subset of samples, the corresponding subset of SNPs was extracted using

bcftools view -S samples_list -c 1:minor.
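
The hard-filter thresholds quoted above can be written as a small lookup, shown here as a sketch rather than the GATK implementation. It assumes the per-site annotations have already been parsed from the VCF INFO field into a dictionary; the function name is hypothetical, and skipping annotations that are absent from a site is an assumption of this sketch.

```python
# Thresholds quoted in the text; a site is removed if ANY comparison is true.
HARD_FILTERS = {
    "QD": ("<", 2.0),
    "MQ": ("<", 40.0),
    "FS": (">", 60.0),
    "SOR": (">", 3.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def fails_hard_filter(info: dict) -> bool:
    """Return True if the site meets at least one hard-filter criterion."""
    for key, (op, threshold) in HARD_FILTERS.items():
        value = info.get(key)
        if value is None:
            # Assumption: an annotation missing from the site is not evaluated.
            continue
        if op == "<" and value < threshold:
            return True
        if op == ">" and value > threshold:
            return True
    return False
```

For instance, a site annotated with QD = 1.5 would be removed regardless of its other annotations, whereas a site with QD exactly 2.0 would not fail on that criterion.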

Ploidy, cellularity and copy number variation analyses of CTVTs

The GC content of all chromosomes (excluding unplaced contigs) was calculated in

50bp windows. Base content, sequencing depth and strand information at all sites

were extracted from the BAM files of every CTVT sample and its corresponding host

using sequenza-utils bam2seqz and binned using sequenza-utils seqz_binning in 50bp

windows. We used the median normalization method to determine the depth ratio, the fast

method for segmentation, and three alleles so that at each locus we could estimate the

tumor genotype. The gender parameter was assigned according to the host’s gender.

Finally, ploidy and cellularity were estimated using the sequenza (version 2.1.2)33

package in R. CNV profiles were then estimated using the estimated ploidy and

cellularity results.

Development of a transmissible tumor genotyper pipeline

In order to perform accurate population genetic analyses to study the CTVT founder

and its relationship to the reference panel, a comprehensive genotyped set of loci that

are not somatic single nucleotide variations is required. Thus, we first need to obtain the per-site

genotype of each CTVT sample, and then judge whether a polymorphism

exists at a specific locus among CTVTs, in order to classify it as a recent somatic mutation or

a potential early somatic/germline genotype-monomorphic site. Finally, through

mutation spectrum deconvolution, we can assess the contribution of different types of

mutations in the sets.

We first give two examples to demonstrate the weakness of previous methods based on

VAF, and then describe the first tool for obtaining per-site genotypes from

paired whole genome sequencing data of a transmissible tumor and its host, named the

transmissible tumor genotyper (ttgeno).

The first example:

Assume a homozygous site AA in the host and a threshold of beta VAF > 0.1 for

calling a heterozygous genotype in the tumor. If 1) the real genotype of the tumor is CC

without copy number alteration, 2) the local read depth is the same as in the host sample, and

3) the sequencing probability of each haplotype is the same, then the observed VAF

increases as the contamination from host cells increases. Only when the

contamination is less than 10% is the tumor correctly genotyped as CC; otherwise,

it is falsely genotyped as AC.

The second example:

Assume a homozygous site AA in the host and a threshold of beta VAF > 0.1 for

calling a heterozygous genotype in the tumor. If 1) the real genotype of the tumor is

“ACCCCCCCCCC” with copy number alteration to 11, 2) the sequencing probability

of each haplotype is the same, and 3) there is no contamination from host cells, then the biallelic

genotyping result is “CC”, but the true diploid genotype may be “AC”, with the chromosomal

segment carrying the “C” haplotype amplified.
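
The arithmetic behind the two examples can be checked with a short sketch. It assumes that the VAF in these examples is the fraction of reads carrying the host allele A, that each chromosome copy contributes reads in proportion to its copy number, and that "contamination" is the fraction of host cells in the tumor sample. The function names are hypothetical.

```python
def host_allele_fraction(contamination: float,
                         tumor_a_copies: int,
                         tumor_total_copies: int,
                         host_a_copies: int = 2,
                         host_total_copies: int = 2) -> float:
    """Expected fraction of reads carrying allele A in the tumor sample."""
    a_reads = (contamination * host_a_copies
               + (1 - contamination) * tumor_a_copies)
    all_reads = (contamination * host_total_copies
                 + (1 - contamination) * tumor_total_copies)
    return a_reads / all_reads


def naive_call(fraction_a: float, threshold: float = 0.1) -> str:
    """Threshold-based call: heterozygous AC only if the A fraction exceeds 0.1."""
    return "AC" if fraction_a > threshold else "CC"


# First example: tumor CC (no A copies, 2 copies in total), host AA.
low = host_allele_fraction(0.05, tumor_a_copies=0, tumor_total_copies=2)
high = host_allele_fraction(0.20, tumor_a_copies=0, tumor_total_copies=2)

# Second example: one A copy among 11 copies, no contamination.
amplified = host_allele_fraction(0.0, tumor_a_copies=1, tumor_total_copies=11)
```

With 5% contamination the tumor is called CC, with 20% it is called AC, and in the amplified case the A fraction is 1/11 (about 0.09), so the threshold call is CC even though an A copy is present.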

When these two factors exist at the same time, the genotyping result will be biased

depending on the level of contamination and whether chromosomal instability exists. In addition,

read depth and the few CNVs in the host also affect genotyping results. Thus, we

developed ttgeno to jointly consider contamination, ploidy, the CNVs of both tumor and host,

and read depth:

1) We called SNPs and indels of the host using GATK HaplotypeCaller and filtered them

using the same strategy described in the “Sequence data pre-processing and variant

calling” section of Materials and Methods. SNPs within three base pairs of an indel

were annotated as ambiguous sites.

2) We then extracted the genotypes of small deletions in the host to recalibrate the

local depth ratio of the tumor and its corresponding host, as deletions in the host

do not meet the global assumption of diploidy for normal samples analyzed in

sequenza.

3) We called deletions in the tumor BAM file and in the corresponding host BAM

file using GATK Mutect2. Genotypes for the host’s deletions extracted from the

Mutect2 results were combined with the results of step 2), to account for deletions missed

by GATK HaplotypeCaller. Read counts covering local deletion regions in the tumor

were used to recalibrate the estimated local copy number, as the large segments of

copy number variation were broken under the tolerance of small indels in sequenza.

4) We called deletions in the tumor BAM files using bcftools mpileup, with

parameters -Q 20, -d 500 and -L 500, to account for deletions missed by GATK

Mutect2. Read counts covering local deletion regions in the tumor were combined

with the results of GATK Mutect2 in step 3).

5) We determined the large CNV in the host using CNVnator (version 0.3.3)86, using

bins of 400 bp. The CNV regions were filtered with length>1000 bp, e-val1<0.01,

q0<0.5, an overlap ratio with gaps

(http://hgdownload.soe.ucsc.edu/goldenPath/canFam3/database/gap.txt.gz) <0.5,

and overlap ratio with repeatmask regions

(http://hgdownload.soe.ucsc.edu/goldenPath/canFam3/database/rmsk.txt.gz) <0.5.

The large CNVs in the host were also used to recalibrate the local depth ratio, as CNVs

in the host do not meet the global assumption of diploidy for normal samples

analyzed in sequenza.

6) The CNV of the tumor was estimated using sequenza.

7) We generated the multi-informative seqz file for the tumor and host using the

results from samtools mpileup, with the command sequenza-utils bam2seqz. All

host SNPs with genotypes that differed from the GATK HaplotypeCaller and Mutect2

results were annotated as ambiguous sites.

8) We generated the per-site read count acgt file for the tumor using the results from

samtools mpileup, with the command sequenza-utils pileup2acgt. If a site was

ambiguous in the host, we excluded it from genotyping. For the main genotyping

process, we first recalibrated the local CNV status of the host. Second, we inferred the

host’s allelic read contamination ratio in the tumor data based on global

cellularity, the local copy number of the host and the allelic sequencing depth of the tumor.

Third, we recalibrated the local CNV status of the tumor if small deletions were

found. Finally, we calculated the allelic copy number using the calibrated allelic read

counts and the local CNV status of the tumor. These steps were implemented

in Perl and are available at https://github.com/xuan-wang/ttgeno. The proportion of sites

at which the sum of the per-site allelic copy numbers differs from the

recalibrated locus copy number is approximately 0.00001 at the whole-genome

level.
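
The idea of removing host-derived reads before converting allelic read counts to allelic copy numbers (step 8) can be sketched as follows. This is a simplified illustration and not the ttgeno code: it assumes reads are sampled in proportion to cell fraction times copy number, takes the host genotype as a string with one character per host copy, and uses hypothetical function and parameter names and made-up read counts.

```python
def allelic_copy_numbers(tumor_counts: dict, host_genotype: str,
                         cellularity: float, tumor_cn: int,
                         host_cn: int = 2) -> dict:
    """Estimate tumor allelic copy numbers after subtracting host reads.

    tumor_counts: observed read counts per allele in the tumor sample.
    host_genotype: host alleles, one character per copy (e.g. "GG").
    cellularity: fraction of tumor cells in the tumor sample.
    """
    depth = sum(tumor_counts.values())
    # Expected share of reads that originate from contaminating host cells.
    host_share = (1 - cellularity) * host_cn
    host_fraction = host_share / (host_share + cellularity * tumor_cn)
    corrected = {}
    for allele, count in tumor_counts.items():
        expected_host_reads = (depth * host_fraction
                               * host_genotype.count(allele) / host_cn)
        corrected[allele] = max(0.0, count - expected_host_reads)
    total = sum(corrected.values())
    if total == 0:
        return {allele: 0 for allele in tumor_counts}
    # Scale the corrected counts to the local tumor copy number.
    return {a: round(tumor_cn * c / total) for a, c in corrected.items()}


# Hypothetical site: host GG, tumor copy number 4, cellularity 0.8.
counts = {"A": 0, "C": 22, "G": 78, "T": 0}
estimate = allelic_copy_numbers(counts, "GG", cellularity=0.8, tumor_cn=4)
```

In this toy case the estimate is one copy of C and three copies of G. Because each allele is rounded independently, the allelic copy numbers need not always sum to the local copy number.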

An example of the ttgeno output is given below:

Chr  Pos   Ref  A  C  G  T  Total
1    1836  G    0  0  4  0  4
1    1838  G    0  1  3  0  4
1    1841  C    1  2  1  0  4
1    1842  T    0  0  0  1  1

Tumor sites that were ambiguous in the host and uncovered sites were excluded from the

output. Sites with a copy number lower than 2 were masked as missing, because the

ancestral diploid genotype is unknown. We treated the per-site diploidized

genotype derived from the allelic copy numbers as the genotype of the tumor under the assumption of

maximum parsimony. In the example output, the genotype of site chr1:1838 is CG, chr1:1841

is ACG, chr1:1842 is NN, and chr1:1836 is homozygous GG. We

performed this conversion because the absolute copy number may differ among

samples, which would otherwise give inconsistent information when identifying the

genotype-monomorphic sites among transmissible tumors.
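
The conversion from allelic copy numbers to a diploidized genotype can be sketched as below, using the four example rows from the demo output. The function name is hypothetical and the alphabetical ordering of alleles is an assumption chosen to match the genotypes quoted in the text; this is an illustration, not the ttgeno source.

```python
def diploidized_genotype(copies: dict) -> str:
    """Collapse allelic copy numbers (A, C, G, T) to a diploidized genotype."""
    total = sum(copies.values())
    if total < 2:
        # Copy number below 2: the ancestral diploid genotype is unknown.
        return "NN"
    present = sorted(allele for allele, n in copies.items() if n > 0)
    if len(present) == 1:
        return present[0] * 2  # homozygous
    return "".join(present)


# Rows from the example output (chr1 positions 1836, 1838, 1841, 1842).
demo = {
    1836: {"A": 0, "C": 0, "G": 4, "T": 0},
    1838: {"A": 0, "C": 1, "G": 3, "T": 0},
    1841: {"A": 1, "C": 2, "G": 1, "T": 0},
    1842: {"A": 0, "C": 0, "G": 0, "T": 1},
}
genotypes = {pos: diploidized_genotype(row) for pos, row in demo.items()}
```

Only the presence or absence of each allele is kept, so two tumors carrying the same alleles at different absolute copy numbers receive the same genotype.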

CTVT loci selection

We selected the genotype-monomorphic sites among five CTVTs, allowing one

missing sample, as any genotype-polymorphic site is likely the result of somatic

mutations. An unrooted neighbor-joining phylogeny was constructed by MEGA-CC

(version 7.0.25)87 based on the genotype-polymorphic sites (allowing two missing

samples) to show the diversity among CTVTs. By comparison with the SNPs from the

reference panel, we further defined two categories: the first containing the biallelic

intersection between the genotype-monomorphic sites of CTVTs and the reference

panel’s SNPs, and the second containing CTVT-private genotype-monomorphic sites.
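
The selection rule for genotype-monomorphic sites (identical genotype in all called CTVTs, with at most one missing sample) can be sketched as a simple predicate. The function name and the use of "NN" as the missing-genotype code are assumptions for illustration.

```python
from typing import Sequence


def is_genotype_monomorphic(genotypes: Sequence[str],
                            max_missing: int = 1,
                            missing: str = "NN") -> bool:
    """True if all called samples share one genotype and few are missing."""
    called = [g for g in genotypes if g != missing]
    n_missing = len(genotypes) - len(called)
    if n_missing > max_missing or not called:
        return False
    return len(set(called)) == 1
```

Setting max_missing=2 and negating the shared-genotype test would give the corresponding rule for the genotype-polymorphic sites used in the neighbor-joining phylogeny.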

Mutation signatures analyses

We utilized signeR42 to factorize the 96 trinucleotide mutational counts matrix of

mutation signatures to assess the extent of somatic and germline mutations contained

in the intersected SNPs and the CTVT-private alleles. The bar plot of contributions

from each signature is made with the medians of the evaluations of each signature

from the combined set of samples. Default distance and cluster methods were used to

group samples according to their levels of exposure to the signatures. Differentially

active signatures among previously defined groups of samples (GDJ, golden jackals;

CYT, coyotes; WOLF, worldwide wolves; DOG, worldwide dogs; CTVT intersected,

the biallelic intersected SNPs between genotype-monomorphic sites and the SNPs of

reference panel; CTVT private, CTVT-private genotype-monomorphic alleles) were

determined in signeR by Kruskal-Wallis Rank Sum Test. Ultra-low depth ancient

samples were excluded in mutation signature analyses.

Population phylogeny analysis

We constructed approximately maximum-likelihood phylogenetic trees using

FastTreeMP (version 2.1)88. The bootstrap replicates were generated using the

defaults in FastTreeMP. We adopted the same methods to construct two other ML

trees using a subset of samples excluding specific admixed EADs, or the PCDs and

the CTVT founder (Supplementary information, Figs. S9, S11). In addition, we built

three NJ trees using MEGA-CC (version 7.0.25)87 to compare the effect of different

methods of phylogenetic estimation. The NJ trees were built using the same SNPs and

subsets of samples used for the ML trees (Supplementary information, Figs. S8, S10

and S12). 100 replicates were generated to calculate bootstrap values. All tree figures

were illustrated using GDJs as outgroup in FigTree (version 1.4.3,

http://tree.bio.ed.ac.uk/software/figtree) and colored according to geography.

Principal component analysis

PCA for the subset of samples excluding GDJs was performed using SNPRelate

(version 1.12.0)89. SNPs were pruned by linkage disequilibrium, resulting in 76K

SNPs in subsequent analyses. PCA figures were created using the R package ggplot2

(version 3.0.0)90. Colors were assigned according to geography similar to that found

in the phylogenies. Specifically, we colored each PCD separately to distinguish them

from each other (Supplementary information, Fig. S13).

Population structure analysis

Population structure was inferred using ADMIXTURE (version 1.3.0)51 on pruned loci

from PCA analyses, with the number of inferred ancestries (K) ranging from two to

seven. We used the lowest cross-validation error across all K to determine the

optimal number of inferred ancestral populations. The best value was found for two

ancestral populations (K=2), which had the overall minimum

cross-validation error.
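
Choosing K by the lowest cross-validation error can be sketched as below. The sketch assumes that the cross-validation errors have been collected from ADMIXTURE log output in lines of the form "CV error (K=2): ..."; the numerical values shown are made up for illustration and are not results from this study.

```python
import re

CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_text: str) -> int:
    """Return the K with the lowest cross-validation error in the log text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no cross-validation errors found")
    return min(errors, key=errors.get)


# Hypothetical log excerpt (values are illustrative only).
example_log = """\
CV error (K=2): 0.41
CV error (K=3): 0.43
CV error (K=4): 0.47
"""
chosen_k = best_k(example_log)
```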

Statistical analyses

The symmetry statistical tests were performed by programs within the Admixtools

software package49, with the setting numchrom=38. The genetic map for our SNPs

was inferred from Auton et al.24, using

https://github.com/armartin/ancestry_pipeline/blob/master/makeMap.py. Standard

errors were estimated by Admixtools using a weighted block jackknife based on the

genetic map as previously described in Patterson et al.49. We performed outgroup f3-

statistics analysis49, 50 of the form f3(CTVT founder, Pop2; Coyote), to assess the

relative genetic similarity of the CTVT founder to present-day dogs (Pop2). f3-

statistics were calculated using the qp3Pop program (version 412). We created a heat

map of the outgroup f3-statistics using the R package ggplot290 and a public map in R

package ggmap91. We performed D-statistics (ABBA-BABA tests) analysis49 of the

form D(Pop1, Pop2; Pop3, Andean Fox) for all combinations of groups to assess

potential introgression between populations. The ~11× Andean fox (Lycalopex

culpaeus) genome24 was used as the outgroup, and due to its low sequencing depth,

alleles were called randomly at heterozygous sites. Of the PCDs, only two were

used, SAMEA104190273 and SAMEA104190275, as the others had very low

sequencing depth. Labels are shown in Supplementary information, Table S1 and

results are shown in Supplementary information, Table S3. Some results not directly

relevant to this study were included for completeness. D-statistics were calculated

using the qpDstat program (version 712). The proportion of introgression from

western coyotes (Monterey area, California and Alabama) to the CTVT founder, Port

au Choix dog, Greenland dog and Alaskan Malamute (X in turn) was calculated using

F4(Andean Fox, SEAD; X, Siberian Huskies) / F4(Andean Fox, SEAD; Coyotes,

Siberian Huskies). The proportion of introgression from western coyotes to Mexico

and Peruvian naked dogs (X) was calculated using F4(Andean Fox, SEAD; X,

Newgrange dog) / F4(Andean Fox, SEAD; Coyotes, Newgrange dog). The proportion

of introgression from New World wolves to the CTVT founder and Port au Choix dog

was calculated using F4(Andean Fox, SEAD; X, Siberian Huskies) / F4(Andean Fox,

SEAD; New World wolves, Siberian Huskies). F4-ratio statistics were calculated

using the qpF4ratio program (version 300). Labels of samples used in F4-ratio are

recorded in Supplementary information, Table S1 and results can be found in

Supplementary information, Table S4.
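
The F4-ratio used above, F4(Andean Fox, SEAD; X, Siberian Huskies) / F4(Andean Fox, SEAD; Coyotes, Siberian Huskies), can be illustrated with a minimal sketch. It assumes per-site allele frequencies and the standard definition f4(A, B; C, D) = mean of (a−b)(c−d); the function names and toy frequencies are hypothetical, and the standard errors reported in the text (from the block jackknife in qpF4ratio) are not reproduced.

```python
from typing import Sequence


def f4(a: Sequence[float], b: Sequence[float],
       c: Sequence[float], d: Sequence[float]) -> float:
    """f4(A, B; C, D) as the mean over sites of (a - b) * (c - d)."""
    products = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    return sum(products) / len(products)


def f4_ratio(outgroup, sead, x, source, baseline) -> float:
    """Estimated proportion of source-related ancestry in population X."""
    return f4(outgroup, sead, x, baseline) / f4(outgroup, sead, source, baseline)


# Hypothetical toy frequencies (not data from this study).
fox = [0.0, 0.0, 1.0, 0.0]
sead = [1.0, 0.0, 0.0, 1.0]
coyote = [0.0, 1.0, 1.0, 0.0]
husky = [1.0, 1.0, 0.0, 1.0]
# Target built as 10% coyote-like and 90% husky-like at every site.
target = [0.1 * c + 0.9 * h for c, h in zip(coyote, husky)]
alpha = f4_ratio(fox, sead, target, coyote, husky)
```

Because the toy target is an exact 10:90 mixture at every site, the ratio recovers a proportion of 0.1; real data add drift and sampling noise, which is why a standard error accompanies each estimate.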

Local ancestry inference

According to D-statistic results, coyotes from the Midwest, Ohio and Florida are more

closely related to dogs and wolves than coyotes from Monterey area, California and

Alabama are. Thus, we used coyotes from the Monterey area, California and Alabama

as the coyote-ancestral reference in the following local ancestry inference. High

frequency coyote-private non-reference (non-ref) alleles can be used as diagnostic

alleles to test for admixture53. We extracted fixed non-ref alleles (positions are

colored blue in the second inner circle of Fig. 1g) from the SNP set private to all

coyotes using bcftools view -S westerncoyotes.list -c 6:nref, and checked the allelic

state at these loci in the genome of the CTVT founder. If the CTVT founder shared the

coyote diagnostic allele, we plotted the position of the locus (red in the first inner circle

of Fig. 1g). We then calculated fd-statistics56, fdM-statistics57 and D-statistics in 500

kb sliding windows with a step size of 250 kb using

https://github.com/simonhmartin/genomics_general/blob/master/ABBABABAwindows.py,

setting the Andean fox as the outgroup, the three aforementioned coyotes as

introgressors, the Greenland dog and Alaskan Malamute as Pop2, and the CTVT

founder as Pop1. The top 1% of negative windows for the fd-statistics and fdM-statistics

are highlighted, and windows of negative D-statistics are highlighted green

as a reference. Beagle5 (version 28Sep18.793)92 was used to phase the SNPs excluding

GDJs and ultra-low depth samples for RFMix58 local ancestry inference. The average

effective population size was set to 60,000 in accordance with previous studies4, 6. The

three mentioned CYT, four North NWW, Greenland dog and ASD were set as

ancestral references. We performed RFMix (version v2.03-r0) with additional

parameters: -e=40 --reanalyze-reference -G 25. The physical proportion of each

ancestry was the average value weighted by the length of each chromosome segment, and the

local ancestry of the two haplotypes is recorded in Supplementary information, Table S5.
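
The length-weighted physical proportion of each ancestry can be sketched as follows. The sketch assumes that local ancestry calls have been reduced to (start, end, ancestry) segments pooled over both haplotypes; the function name, segment coordinates and labels are hypothetical and do not reflect the RFMix output format.

```python
from typing import Dict, Iterable, Tuple


def ancestry_proportions(
        segments: Iterable[Tuple[int, int, str]]) -> Dict[str, float]:
    """Physical proportion of each ancestry, weighted by segment length."""
    lengths: Dict[str, int] = {}
    for start, end, ancestry in segments:
        lengths[ancestry] = lengths.get(ancestry, 0) + (end - start)
    total = sum(lengths.values())
    if total == 0:
        raise ValueError("segments have zero total length")
    return {ancestry: n / total for ancestry, n in lengths.items()}


# Hypothetical segments on one haplotype (coordinates in bp).
example_segments = [
    (0, 600, "dog"),
    (600, 700, "coyote"),
    (700, 1000, "dog"),
]
proportions = ancestry_proportions(example_segments)
```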

TreeMix analysis

We applied TreeMix (version 1.13)59 to investigate the genetic relationship between

the CTVT founder, PCDs, other ancient and present-day canids. Ultra-low depth

ancient samples, Tibetan wolves and Eastern coyotes were excluded, and labels of

samples used are recorded in Supplementary information, Table S1. Allele

frequencies were counted with PLINK (version v1.90b5.2)93 and converted to TreeMix

input using the script plink2treemix.py. The analysis was performed with the

parameters: -k 100 -root CYT. To further investigate how well the tree model fits the

data, we visualized the matrix of residuals for the tree model with no admixture. We

tested trees for zero to three migration events and show results for zero and three

migration events.
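
The series of runs described above could be scripted as in the following minimal Python sketch, which builds one TreeMix command line per number of migration events (the -m option) using the parameters stated in the text. The input file name, output prefix and function name are assumptions made for illustration only.

```python
def treemix_commands(input_path, out_prefix, max_migrations=3,
                     block_size=100, root="CYT"):
    """Build one TreeMix command per number of migration edges (0..max)."""
    commands = []
    for m in range(max_migrations + 1):
        commands.append([
            "treemix",
            "-i", input_path,        # allele-count file (hypothetical name below)
            "-k", str(block_size),   # SNPs per block, as stated in the text
            "-root", root,           # root population, as stated in the text
            "-m", str(m),            # number of migration events
            "-o", f"{out_prefix}.m{m}",
        ])
    return commands


commands = treemix_commands("canids.treemix.frq.gz", "canids")
for cmd in commands:
    print(" ".join(cmd))
```

Each command could then be executed with subprocess.run(cmd, check=True) on a system where TreeMix is installed.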

Accession number

Sequencing data is archived and available in the Genome Sequence Archive (GSA,

http://bigd.big.ac.cn/gsa/). The accession number for the 22 newly collected canids is

CRA000938, and the accession number of the two CTVT samples and their

corresponding hosts is CRA000939. The genotype-monomorphic sites based on the

five CTVT samples are available in the DogGD database from the iDog platform

(http://bigd.big.ac.cn/doggd/pages/modules/download/download.jsp). The

transmissible tumor genotyper is available at https://github.com/xuan-wang/ttgeno.

Supplementary References

1. Ní Leathlobhair M, et al. The evolutionary history of dogs in the Americas.

Science 361, 81 (2018).

2. Ostrander EA, Wayne RK, Freedman AH, Davis BW. Demographic history,

selection and functional diversity of the canine genome. Nat. Rev. Genet. 18,

705 (2017).

3. Freedman AH, Wayne RK. Deciphering the origin of dogs: from fossils to

genomes. Annu. Rev. Anim. Biosci. 5, 281-307 (2017).

4. Botigué LR, et al. Ancient European dog genomes reveal continuity since the

Early Neolithic. Nat. Commun. 8, 16082 (2017).

5. Frantz LAF, et al. Genomic and archaeological evidence suggest a dual origin

of domestic dogs. Science 352, 1228-1231 (2016).

6. Wang G-D, et al. Out of Southern East Asia: the natural history of domestic

dogs across the world. Cell Res. 26, 21 (2015).

7. Skoglund P, Ersmark E, Palkopoulou E, Dalén L. Ancient wolf genome

reveals an early divergence of domestic dog ancestors and admixture into

high-latitude breeds. Curr. Biol. 25, 1515-1519 (2015).

8. Freedman AH, et al. Genome sequencing highlights the dynamic early history

of dogs. PLoS Genet. 10, e1004016 (2014).

9. Fan Z, et al. Worldwide patterns of genomic variation and admixture in gray

wolves. Genome Res. 26, 163-173 (2016).

10. Yang MA, Fu Q. Insights into modern human prehistory using ancient

genomes. Trends Genet. 34, 184-196 (2018).

11. Nielsen R, et al. Tracing the peopling of the world through genomics. Nature

541, 302 (2017).

12. MacHugh DE, Larson G, Orlando L. Taming the past: ancient DNA and the

study of animal domestication. Annu. Rev. Anim. Biosci. 5, 329-351 (2017).

13. Pont C, et al. Paleogenomics: reconstruction of plant evolutionary trajectories

from modern and ancient DNA. Genome Biol. 20, 29 (2019).

14. Larson G, et al. Rethinking dog domestication by integrating genetics,

archeology, and biogeography. Proc. Natl. Acad. Sci. USA 109, 8878-8883

(2012).

15. Murchison EP, et al. Transmissible dog cancer genome reveals the origin and

history of an ancient cell lineage. Science 343, 437-440 (2014).

16. Ostrander EA, Davis BW, Ostrander GK. Transmissible tumors: breaking the

cancer paradigm. Trends Genet. 32, 1-15 (2016).

17. Murgia C, et al. Clonal origin and evolution of a transmissible cancer. Cell

126, 477-487 (2006).

18. Decker B, et al. Comparison against 186 canid whole-genome sequences

reveals survival strategies of an ancient clonally transmissible canine tumor.

Genome Res. 25, 1646-1655 (2015).

19. Strakova A, et al. Mitochondrial genetic diversity, selection and

recombination in a canine transmissible cancer. eLife 5, e14552 (2016).

20. Ujvari B, Papenfuss AT, Belov K. Transmissible cancers in an evolutionary

context. BioEssays 38, S14-S23 (2016).

21. Thomas R, et al. Extensive conservation of genomic imbalances in canine

transmissible venereal tumors (CTVT) detected by microarray-based CGH

analysis. Chromosome Res. 17, 927 (2009).

22. Stammnitz MR, et al. The origins and vulnerabilities of two transmissible

cancers in Tasmanian devils. Cancer Cell 33, 607-619.e615 (2018).

23. Meirmans PG. Subsampling reveals that unbalanced sampling affects structure

results in a multi-species dataset. Heredity 122, 276-287 (2019).

24. Auton A, et al. Genetic recombination is targeted towards gene promoter

regions in dogs. PLoS Genet. 9, e1003984 (2013).

25. vonHoldt BM, et al. Whole-genome sequence analysis shows that two

endemic species of North American wolf are admixtures of the coyote and

gray wolf. Sci. Adv. 2, e1501714 (2016).

26. Zhang W, et al. Hypoxia adaptations in the grey wolf (Canis lupus chanco)

from Qinghai-Tibet Plateau. PLoS Genet. 10, e1004466 (2014).

27. Marsden CD, et al. Bottlenecks and selective sweeps during domestication

have increased deleterious genetic variation in dogs. Proc. Natl. Acad. Sci.

USA 113, 152 (2016).

28. Gou X, et al. Whole-genome sequencing of six dog breeds from continuous

altitudes reveals adaptation to high-altitude hypoxia. Genome Res. 24, 1308-

1315 (2014).

29. Kim H-M, et al. Whole genome comparison of donor and cloned dogs. Sci.

Rep. 3, 2998 (2013).

30. Kim RN, et al. Genome analysis of the domestic dog (Korean Jindo) by

massively parallel sequencing. DNA Res. 19, 275-288 (2012).

31. Wiedmer M, et al. A RAB3GAP1 SINE insertion in Alaskan Huskies with

Polyneuropathy, Ocular Abnormalities, and Neuronal Vacuolation (POANV)

resembling human Warburg Micro Syndrome 1 (WARBM1). G3: Genes,

Genomes, Genet. 6, 255-262 (2016).

32. Liu Y-H, et al. Whole-genome sequencing of African dogs provides insights

into adaptations against tropical parasites. Mol. Biol. Evol. 35, 287-298 (2018).

33. Favero F, et al. Sequenza: allele-specific copy number and mutation profiles

from tumor sequencing data. Ann. Oncol. 26, 64-70 (2015).

34. Lee JK, Choi YL, Kwon M, Park PJ. Mechanisms and consequences of cancer

genome instability: lessons from genome sequencing studies. Annu. Rev.

Pathol. 11, 283-312 (2016).

35. Helleday T, Eshtad S, Nik-Zainal S. Mechanisms underlying mutational

signatures in human cancers. Nat. Rev. Genet. 15, 585-598 (2014).

36. Rosenthal R, McGranahan N, Herrero J, Swanton C. Deciphering genetic

intratumor heterogeneity and its impact on cancer evolution. Annu. Rev.

Cancer Biol. 1, 223-240 (2017).

37. Schwartz R, Schaffer AA. The evolution of tumour phylogenetics: principles

and practice. Nat. Rev. Genet. 18, 213-229 (2017).

38. Salk JJ, Schmitt MW, Loeb LA. Enhancing the accuracy of next-generation

sequencing for detecting rare and subclonal mutations. Nat. Rev. Genet. 19,

269-285 (2018).

39. Shpak M, Lu J. An evolutionary genetic perspective on cancer biology. Annu.

Rev. Ecol. Evol. Syst. 47, 25-49 (2016).

40. Wu CI, Wang HY, Ling S, Lu X. The ecology and evolution of cancer: the

ultra-microevolutionary process. Annu. Rev. Genet. 50, 347-369 (2016).

41. Alexandrov LB, et al. Signatures of mutational processes in human cancer.

Nature 500, 415-421 (2013).

42. Rosales RA, et al. signeR: an empirical Bayesian approach to mutational

signature discovery. Bioinformatics 33, 8-16 (2017).

43. Milholland B, et al. Differences between germline and somatic mutation rates

in humans and mice. Nat. Commun. 8, 15183 (2017).

44. Smith TCA, Arndt PF, Eyre-Walker A. Large scale variation in the rate of

germ-line de novo mutation, base composition, divergence and diversity in

humans. PLoS Genet. 14, e1007254 (2018).

45. Rahbari R, et al. Timing, rates and spectra of human germline mutation. Nat.

Genet. 48, 126-133 (2016).

46. Harris K, Pritchard JK. Rapid evolution of the human mutation spectrum.

eLife 6, e24284 (2017).

47. Scally A. Global clues to the nature of genomic mutations in humans. eLife 6,

e27605 (2017).

48. Shannon LM, et al. Genetic structure in village dogs reveals a Central Asian

domestication origin. Proc. Natl. Acad. Sci. USA 112, 13639-13644 (2015).

49. Patterson N, et al. Ancient admixture in human history. Genetics 192, 1065

(2012).

50. Raghavan M, et al. Upper Palaeolithic Siberian genome reveals dual ancestry

of Native Americans. Nature 505, 87 (2013).

51. Alexander DH, Novembre J, Lange K. Fast model-based estimation of

ancestry in unrelated individuals. Genome Res. 19, 1655-1664 (2009).

52. Gopalakrishnan S, et al. Interspecific gene flow shaped the evolution of the

genus Canis. Curr. Biol. 28, 3441-3449.e3445 (2018).

53. Monzon J, Kays R, Dykhuizen DE. Assessment of coyote-wolf-dog admixture

using ancestry-informative diagnostic SNPs. Mol. Ecol. 23, 182-197 (2014).

54. vonHoldt BM, et al. A genome-wide perspective on the evolutionary history

of enigmatic wolf-like canids. Genome Res. 21, 1294-1305 (2011).

55. Sinding M-HS, et al. Population genomics of grey wolves and wolf-like

canids in North America. PLoS Genet. 14, e1007745 (2018).

56. Martin SH, Davey JW, Jiggins CD. Evaluating the use of ABBA-BABA

statistics to locate introgressed loci. Mol. Biol. Evol. 32, 244-257 (2015).

57. Malinsky M, et al. Genomic islands of speciation separate cichlid ecomorphs

in an East African crater lake. Science 350, 1493-1498 (2015).

58. Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: a discriminative

modeling approach for rapid and robust local-ancestry inference. Am. J. Hum.

Genet. 93, 278-288 (2013).

59. Pickrell JK, Pritchard JK. Inference of population splits and mixtures from

genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).

60. Leonard JA, et al. Ancient DNA evidence for Old World origin of New World

dogs. Science 298, 1613-1616 (2002).

61. Castroviejo-Fisher S, et al. Vanishing Native American dog lineages. BMC

Evol. Biol. 11, 73 (2011).

62. Brown SK, Darwent CM, Sacks BN. Ancient DNA evidence for genetic

continuity in arctic dogs. J. Archaeol. Sci. 40, 1279-1288 (2013).

63. Witt KE, et al. DNA analysis of ancient dogs of the Americas: Identifying

possible founding haplotypes and reconstructing population histories. J. Hum.

Evol. 79, 105-118 (2015).

64. van Asch B, et al. Pre-Columbian origins of Native American dog breeds,

with only limited replacement by European dogs, confirmed by mtDNA

analysis. Proc. R. Soc. B 280, 20131142 (2013).

65. Patten MM, Carioscia SA, Linnen CR. Biased introgression of mitochondrial

and nuclear genes: a comparison of diploid and haplodiploid systems. Mol.

Ecol. 24, 5200-5210 (2015).

66. Wheeldon TJ, et al. Y-chromosome evidence supports asymmetric dog

introgression into eastern coyotes. Ecol. Evol. 3, 3005-3020 (2013).

67. Bekoff M, Wells MC. Behavioral ecology of coyotes: social organization,

rearing patterns, space use, and resource defense. Z. Tierpsychol. 60, 281-305

(1982).

68. Gese EM, Ruff RL. Scent-marking by coyotes, Canis latrans: the influence of

social and ecological factors. Anim. Behav. 54, 1155-1166 (1997).

69. Perri A, et al. New evidence of the earliest domestic dogs in the Americas. Am.

Antiq. 84, 68-87 (2019).

70. Miao B, Wang Z, Li Y. Genomic analysis reveals hypoxia adaptation in the

Tibetan Mastiff by introgression of the gray wolf from the Tibetan Plateau.

Mol. Biol. Evol. 34, 734-743 (2017).

71. vonHoldt B, Fan Z, Ortega-Del Vecchyo D, Wayne RK. EPAS1 variants in

high altitude Tibetan wolves were selectively introgressed into highland dogs.

PeerJ 5, e3522 (2017).

72. Brown SK, Darwent CM, Wictum EJ, Sacks BN. Using multiple markers to

elucidate the ancient, historical and modern relationships among North

American Arctic dog breeds. Heredity 115, 488 (2015).

73. Posth C, et al. Reconstructing the deep population history of Central and South

America. Cell 175, 1185-1197 e1122 (2018).

74. Moreno-Mayar JV, et al. Early human dispersals within the Americas. Science

362, eaav2621 (2018).

75. Ostrander EA, Dreger DL, Evans JM. Canine cancer genomics: lessons for

canine and human health. Annu. Rev. Anim. Biosci. 7, 449-472 (2019).

76. Tang B, et al. iDog: an integrated resource for domestic dogs and wild canids.

Nucleic Acids Res. 47, D793-D800 (2018).

77. Manichaikul A, et al. Robust relationship inference in genome-wide

association studies. Bioinformatics 26, 2867-2873 (2010).

78. Koepfli K-P, et al. Genome-wide evidence reveals that African and Eurasian

golden jackals are distinct species. Curr. Biol. 25, 2158-2165 (2015).

79. Morgulis A, et al. Database indexing for production MegaBLAST searches.

Bioinformatics 24, 1757-1764 (2008).

80. Zhang Z, Schwartz S, Wagner L, Miller W. A greedy algorithm for aligning

DNA sequences. J. Comput. Biol. 7, 203-214 (2000).

81. Li H. Aligning sequence reads, clone sequences and assembly contigs with

BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997v1 (2013).

82. Li H, et al. The sequence Alignment/Map format and SAMtools.

Bioinformatics 25, 2078-2079 (2009).

83. McKenna A, et al. The Genome Analysis Toolkit: A MapReduce framework

for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297-

1303 (2010).

84. Li H. A statistical framework for SNP calling, mutation discovery, association

mapping and population genetical parameter estimation from sequencing data.

Bioinformatics 27, 2987-2993 (2011).

85. Danecek P, et al. The variant call format and VCFtools. Bioinformatics 27,

2156-2158 (2011).

86. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to

discover, genotype, and characterize typical and atypical CNVs from family

and population genome sequencing. Genome Res. 21, 974-984 (2011).

87. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics

analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870-1874 (2016).

88. Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-

likelihood trees for large alignments. PLoS One 5, e9490 (2010).

89. Zheng X, et al. A high-performance computing toolset for relatedness and

principal component analysis of SNP data. Bioinformatics 28, 3326-3328

(2012).

90. Wickham H. (eds) ggplot2: elegant graphics for data analysis (Springer, 2016).

91. Kahle D, Wickham H. ggmap: spatial visualization with ggplot2. R Journal 5,

144-161 (2013).

92. Browning SR, Browning BL. Rapid and accurate haplotype phasing and

missing-data inference for whole-genome association studies by use of

localized haplotype clustering. Am. J. Hum. Genet. 81, 1084-1097 (2007).

93. Chang CC, et al. Second-generation PLINK: rising to the challenge of larger

and richer datasets. GigaScience 4, 7 (2015).

Supplementary Figures

Supplementary information, Fig. S1. Tumor appearance (KM2, a male dog).

Supplementary information, Fig. S2. Ploidy and cellularity estimation using

sequenza. The log posterior probability (LPP) of the observed data was calculated

for a range of candidate ploidy and cellularity values. The point estimate is the ploidy

and cellularity with maximum LPP. The 95% C.R. (Confidence Region) is the

smallest (not necessarily contiguous) set of points with a total posterior

probability >0.95. The background color indicates the rank of the LPP (blue = most

likely, white = least likely), provided here to contrast other possible parameters that

are very unlikely under our model but might still be of interest. Local maxima are

indicated with a “+” and indicate possible alternative solutions.
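
The definition of the 95% region given in this caption can be illustrated with a short Python sketch: grid points are ranked by posterior probability and added until the cumulative probability exceeds 0.95. The grid values and function name are hypothetical and are not taken from the sequenza output.

```python
import math


def credible_region(log_posterior, mass=0.95):
    """Return the smallest set of grid points whose posterior mass exceeds `mass`.

    `log_posterior` maps (ploidy, cellularity) to an unnormalised log
    posterior probability; the grid layout is a hypothetical example.
    """
    if not log_posterior:
        raise ValueError("log_posterior is empty")
    peak = max(log_posterior.values())
    # Subtract the peak before exponentiating to avoid numerical underflow.
    weights = {point: math.exp(lpp - peak) for point, lpp in log_posterior.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    region, cumulative = [], 0.0
    for point, weight in ranked:
        region.append(point)
        cumulative += weight / total
        if cumulative > mass:
            break
    return region


# Made-up posterior over four (ploidy, cellularity) candidates.
grid = {
    (2.0, 0.7): math.log(0.60),
    (2.1, 0.7): math.log(0.30),
    (4.0, 0.4): math.log(0.08),
    (3.0, 0.5): math.log(0.02),
}
region = credible_region(grid)
point_estimate = region[0]  # the point with maximum LPP
```

Because points are taken in order of decreasing probability, the resulting set need not be contiguous on the grid, as noted in the caption.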

Supplementary information, Fig. S3. Unrooted neighbor-joining tree based on 2.9M

genotype-polymorphic loci (allowing 2 missing samples) among 5 CTVTs. Node

labels indicate bootstrap values.

Supplementary information, Fig. S4. Mutation spectrum. Signatures barplot with

error bars reflecting the sample percentiles 0.05, 0.25, 0.75, and 0.95 for each entry.

Supplementary information, Fig. S5. Boxplot of Bayesian Information Criterion

values of signeR analysis, showing that the optimal number of signatures is 8.

Supplementary information, Fig. S6. Differential Exposure Analysis plot showing

group-specific exposed signatures. P-values were calculated by comparing the

exposures of each signature among sample groups with the Kruskal-Wallis rank sum test.

The Benjamini & Hochberg (1995) method was used to adjust P-values in

the post-hoc tests. GDJ, golden jackals; CYT, coyotes; CTVT intersected, CTVT

sites intersected with the panel’s SNPs; CTVT private, CTVT private alleles.
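
The Benjamini & Hochberg adjustment mentioned in this caption can be sketched in Python as follows. The function name and the example P-values are hypothetical and are not taken from the signeR results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that the adjusted
    # values stay monotone: adj(rank) = min over ranks >= rank of p * m / rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up raw P-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```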

Supplementary information, Fig. S7. Box plots showing the significant

differences found for each signature when groups are compared against each

other. X-axis labels represent eigen groups assigned according to the contribution of

the signature in different sample groups. The Y axis represents the mutation counts of

the signature assigned to each sample group. GDJ, golden jackals; CYT, coyotes; PCD,

pre-contact dogs; DOG, all dogs except PCDs; WOLF, all wolves except Taimyr wolf.

Supplementary information, Fig. S8. A neighbor-joining tree based on whole

genomes of 104 individuals. The golden jackals are the outgroup. Node labels

indicate bootstrap values. GDJ, golden jackals; CYT, coyotes; NWW, New World

wolves; OWW, Old World wolves; TMR, Taimyr wolf; PCD, pre-contact dogs; ASD,

Arctic sled dogs; EAD, East Asia dogs; NCD, Northern China dogs; IPD, India

Peninsula dogs; MECAD, Middle East and Central Asia dogs; AFD, African dogs;

MSD, mixed sled dogs; EUD, European dogs; NGD, Newgrange dog; HXH,

Herxheim dog; CTC, Cherry Tree Cave dog; CTVT_F, the CTVT founder.

Supplementary information, Fig. S9. An approximate maximum-likelihood tree

based on whole genomes of 99 individuals (excluding specific admixed East Asian

dogs). The golden jackals are the outgroup. Node labels indicate bootstrap values.

GDJ, golden jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World

wolves; TMR, Taimyr wolf; PCD, pre-contact dogs; ASD, Arctic sled dogs; EAD,

East Asia dogs; NCD, Northern China dogs; IPD, India Peninsula dogs; MECAD,

Middle East and Central Asia dogs; AFD, African dogs; MSD, mixed sled dogs; EUD,

European dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave

dog; CTVT_F, the CTVT founder.

Supplementary information, Fig. S10. A neighbor-joining tree based on whole

genomes of 99 individuals (excluding specific East Asian admixed dogs). The

golden jackals are the outgroup. Node labels indicate bootstrap values. GDJ, golden

jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; TMR,

Taimyr wolf; PCD, pre-contact dogs; ASD, Arctic sled dogs; EAD, East Asia dogs;

NCD, Northern China dogs; IPD, India Peninsula dogs; MECAD, Middle East and

Central Asia dogs; AFD, African dogs; MSD, mixed sled dogs; EUD, European dogs;

NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog; CTVT_F,

the CTVT founder.

Supplementary information, Fig. S11. An approximate maximum-likelihood tree

based on whole genomes of 96 individuals (excluding PCDs and the CTVT

founder). The golden jackals are the outgroup. Node labels indicate bootstrap values.

GDJ, golden jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World

wolves; TMR, Taimyr wolf; ASD, Arctic sled dogs; EAD, East Asia dogs; NCD,

Northern China dogs; IPD, India Peninsula dogs; MECAD, Middle East and Central

Asia dogs; AFD, African dogs; MSD, mixed sled dogs; EUD, European dogs; NGD,

Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog; CTVT_F, the

CTVT founder.

Supplementary information, Fig. S12. A neighbor-joining tree based on whole

genomes of 96 individuals (excluding PCDs and the CTVT founder). The golden

jackals are the outgroup. East Asian dogs are basal to all other dogs, consistent with

recently published work. Node labels indicate bootstrap values. GDJ, golden jackals;

CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; TMR, Taimyr

wolf; ASD, Arctic sled dogs; EAD, East Asia dogs; NCD, Northern China dogs; IPD,

India Peninsula dogs; MECAD, Middle East and Central Asia dogs; AFD, African

dogs; MSD, mixed sled dogs; EUD, European dogs; NGD, Newgrange dog; HXH,

Herxheim dog; CTC, Cherry Tree Cave dog; CTVT_F, the CTVT founder.

Supplementary information, Fig. S13. Principal components analysis of 102

individuals (excluding golden jackals). Each PCD is colored separately to

distinguish it from the others. CYT, coyotes; WOLF, worldwide wolves; TMR, Taimyr wolf;

DOG, all dogs except PCDs; PCD, pre-contact dogs; CTVT_F, the CTVT founder.

Supplementary information, Fig. S14. a Population structure between the CTVT

founder, ancient and contemporary canids. ADMIXTURE clustering for K=2 to K=7

on pruned sites excluding golden jackals. Vertical lines represent individuals. Colors

indicate different ancestral components. The cross-validation error is lowest when

K=2. CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; PCD, pre-

contact dogs; ASD, Arctic sled dogs; EAD, East Asia dogs; NCD, Northern China

dogs; IPD, India Peninsula dogs; MECAD, Middle East and Central Asia dogs; AFD,

African dogs; MSD, mixed sled dogs; EUD, European dogs; CTC, Cherry Tree Cave

dog; CTVT_F, the CTVT founder. b Cross-validation plot for the ADMIXTURE

analyses. K ranges from 2 to 7.

Supplementary information, Fig. S15. D(CTVT founder, Pop2; Pop3, Andean Fox),

with the Z-score given on the x axis. Dashed lines indicate ±3 of Z score. GDJ, golden

jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; DOG,

all dogs except PCDs; PCD, pre-contact dogs. Details of the Pop3 labels are recorded in

Supplementary information, Table S1.
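
For readers unfamiliar with the statistic plotted here, the Python sketch below computes D(W, X; Y, Z) from per-site allele frequencies following the definition of Patterson et al.49 and derives a Z-score from a simple delete-one-block jackknife. The input layout, the block assignment, the toy data and the unweighted jackknife are simplifying assumptions; this is not the pipeline used to produce the figure. Under this sign convention, a positive value indicates excess allele sharing between W and Y (or between X and Z).

```python
import math


def d_with_block_jackknife(sites, block_ids):
    """Return (D, standard error, Z) for D(W, X; Y, Z).

    `sites` holds (w, x, y, z) allele frequencies per site and `block_ids`
    the genomic block of each site; both layouts are hypothetical.
    """
    if len(sites) != len(block_ids):
        raise ValueError("sites and block_ids must have the same length")
    blocks = {}
    for (w, x, y, z), block in zip(sites, block_ids):
        sums = blocks.setdefault(block, [0.0, 0.0])
        sums[0] += (w - x) * (y - z)                          # numerator term
        sums[1] += (w + x - 2 * w * x) * (y + z - 2 * y * z)  # denominator term
    if len(blocks) < 2:
        raise ValueError("at least two blocks are needed for the jackknife")
    total_num = sum(s[0] for s in blocks.values())
    total_den = sum(s[1] for s in blocks.values())
    d_value = total_num / total_den
    # Recompute D with each block removed in turn (unweighted jackknife).
    leave_one_out = [(total_num - n) / (total_den - d) for n, d in blocks.values()]
    g = len(leave_one_out)
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((v - mean_loo) ** 2 for v in leave_one_out)
    standard_error = math.sqrt(variance)
    z_score = d_value / standard_error if standard_error > 0 else math.nan
    return d_value, standard_error, z_score


# Toy data: four blocks containing only the two informative site patterns.
w_y_shared = (1, 0, 1, 0)  # W and Y carry the same allele
x_y_shared = (0, 1, 1, 0)  # X and Y carry the same allele
counts = [(3, 1), (2, 2), (3, 1), (4, 0)]
sites, block_ids = [], []
for block, (n_wy, n_xy) in enumerate(counts):
    sites += [w_y_shared] * n_wy + [x_y_shared] * n_xy
    block_ids += [block] * (n_wy + n_xy)

d_value, standard_error, z_score = d_with_block_jackknife(sites, block_ids)
significant = abs(z_score) > 3  # threshold marked by the dashed lines
```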

Supplementary information, Fig. S16. D(Pop1, Pop2; Coyote, Andean Fox), with

the Z-score given on the x axis. Dashed lines indicate ±3 of Z score. NWW, New

World wolves; OWW, Old World wolves; DOG, all dogs except PCDs; PCD,

pre-contact dogs; CTVT_F, the CTVT founder. Details of the Pop3 labels are recorded in

Supplementary information, Table S1.

Supplementary information, Fig. S17. D(Pop1, Pop2; CTVT founder, Andean Fox),

with the Z-score given on the x axis. Dashed lines indicate ±3 of Z score. GDJ, golden

jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; DOG,

all dogs except PCDs; PCD, pre-contact dogs. Details of the Pop3 labels are recorded in

Supplementary information, Table S1.

Supplementary information, Fig. S18. a TreeMix graph without migration edges. b

Matrix of residuals. CYT, coyotes; NWW, New World wolves; OWW, Old World

wolves; EAD, East Asia dogs; PCD, pre-contact dogs; SIH, Siberian Husky; ALH,

Alaskan Husky; ALM, Alaskan Malamute; GRD, Greenland dog; NCD, Northern

China dogs; ESL, East Siberian Laika; IPD, India Peninsula dogs; MECAD, Middle

East and Central Asia dogs; AFD, African dogs; SAM, Samoyed; EUD, European

dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog;

CTVT_F, the CTVT founder.

Supplementary information, Fig. S19. a TreeMix graph with one migration edge. b

Matrix of residuals. CYT, coyotes; NWW, New World wolves; OWW, Old World

wolves; EAD, East Asia dogs; PCD, pre-contact dogs; SIH, Siberian Husky; ALH,

Alaskan Husky; ALM, Alaskan Malamute; GRD, Greenland dog; NCD, Northern

China dogs; ESL, East Siberian Laika; IPD, India Peninsula dogs; MECAD, Middle

East and Central Asia dogs; AFD, African dogs; SAM, Samoyed; EUD, European

dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog;

CTVT_F, the CTVT founder.

Supplementary information, Fig. S20. a TreeMix graph with two migration edges. b

Matrix of residuals. CYT, coyotes; NWW, New World wolves; OWW, Old World

wolves; EAD, East Asia dogs; PCD, pre-contact dogs; SIH, Siberian Husky; ALH,

Alaskan Husky; ALM, Alaskan Malamute; GRD, Greenland dog; NCD, Northern

China dogs; ESL, East Siberian Laika; IPD, India Peninsula dogs; MECAD, Middle

East and Central Asia dogs; AFD, African dogs; SAM, Samoyed; EUD, European

dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog;

CTVT_F, the CTVT founder.

Supplementary information, Fig. S21. a TreeMix graph with three migration edges. b

Matrix of residuals. CYT, coyotes; NWW, New World wolves; OWW, Old World

wolves; EAD, East Asia dogs; PCD, pre-contact dogs; SIH, Siberian Husky; ALH,

Alaskan Husky; ALM, Alaskan Malamute; GRD, Greenland dog; NCD, Northern

China dogs; ESL, East Siberian Laika; IPD, India Peninsula dogs; MECAD, Middle

East and Central Asia dogs; AFD, African dogs; SAM, Samoyed; EUD, European

dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog;

CTVT_F, the CTVT founder.