June 25-29, 2018 Hameau de l’étoile, Montpellier, France


INFORMATION

[Map: St Roch Train Station and the bus meeting point]

Meeting Point

Bus station, on the Sète bridge (in front of the train station).
A bus will leave Montpellier on Monday at 5 pm.
Bus responsible: Gonché Danesh: +33 (0)6 31 93 47 79

In case of problems: Olivier Gascuel: 33 (0) 06 48 12 14 82

Location

The conference will be held at the Hameau de l’Etoile, a hamlet dedicated to seminars and conferences, located at St Martin de Londres, about 25 km north of Montpellier (south of France).

Domaine Le Hameau de l’Etoile
Route de Frouzet
34380 ST-MARTIN-DE-LONDRES
Tel (+33) 04 67 55 75 73
Fax (+33) 04 67 55 09 10

Taxi:

Taxi de St Martin de Londres
Call « Florent » first at 06 69 34 03 03
Example weekday rates: Saint-Roch train station = 59 € / Airport = 74 €
More details: http://www.hameaudeletoile.com/en/access-plan.html

Hotels in Montpellier :

Practical INFORMATION

Program

Monday, June 25

18:00 Arrival at Hameau de l’Etoile in the afternoon (from 4:00 pm; bus from the train station at 5 pm)

18:00-19:30 Stand « Peer Community in »

19:30 Welcoming drink

20:30 Dinner

Tuesday, June 26

9:20-9:30 - Conference start!

9:30-10:45 - KEYNOTE : M. BLANCHETTE - Mining ancient mammalian genomes

10:45-11:15 Coffee break

11:15-12:35 - 4x 20 min TALKS (including questions)
F. POUYET - Identifying the neutrally evolving fraction of the human genome to infer demography and selection
B. PEREZ-LAMARQUE - Modeling tools for studying host-associated microbiota inheritance from metabarcoding data
C. METZIG - Prediction of Successful Clades in S. aureus in a Europe-wide database
J. CORANDER - Negative frequency-dependent selection in the evolution of bacteria

12:45 Lunch

14:00-15:00 - 3x 20 min TALKS (including questions)
T. SANCHEZ - End-to-end Deep Learning Approach for Demographic History Inference
F. PARDI - Alignment-free phylogenetic placement of metagenomes via ancestral k-mers
D. BRYANT - Inferring the evolution of Niches

15:00-15:15 Break

15:15-16:30 - KEYNOTE : I. MOLTKE - Inferring relatedness from genetic data

16:30 Coffee break & Free time

18:00-19:00 Stand « Peer Community in »

19:00-20:30 - POSTERS Wine and posters (1 to 11)

20:30 Dinner

Wednesday, June 27

9:30-10:45 - KEYNOTE : V. MININ - Algorithmic and statistical advances in phylogenetic stochastic mapping

10:45-11:15 Coffee break

11:15-12:15 - 3x 20 min TALKS (including questions)
R. DURBIN - Selection on hybridisation and gene flow in the Lake Malawi cichlid fish adaptive radiation
A. ZHUKOVA - Reconstructing and Predicting the Emergence and Transmission of Drug Resistance Mutations in HIV
M. FOURMENT - Phylogenetic inference with streaming data using sequential Monte Carlo

12:30 Lunch and free afternoon (canoe, hiking, theorems, etc.)

20:30 Dinner

Thursday, June 28

9:30-10:45 - KEYNOTE : C. COLIJN - Using genomic data and modelling to forecast population composition and design vaccines

10:45-11:15 Coffee break

11:15-12:35 - 4x 20 min TALKS (including questions)
T. VAUGHAN - An MCMC algorithm for Bayesian inference of hard polytomies
P. BILLER - On the robustness of Evolvability under different mutators
M. MOSLONKA-LEFEBVRE - The phylodynamics of partner notification and contact tracing in HIV epidemics
J. PENSAR - Genome-wide epistasis analysis

13:00 Lunch

14:00-15:00 - 3x 20 min TALKS (including questions)
K. PARAG - Optimally Robust Design for Inference under Phylogenetic Coalescent Models
I. ARBISSER - FST never satisfies the triangle inequality for biallelic markers with distinct allele frequencies
V. PAVINATO - Tracking selection in time-series population genomic data using ABC random forests

15:00-15:15 Break

15:15-16:30 - KEYNOTE : L. DURET - Biased gene conversion: the dark side of recombination

16:30 Coffee break & Free time

19:00-20:30 - POSTERS Wine and posters (12 to 22)

20:30 Dinner

Friday, June 29

9:30-10:45 - KEYNOTE : J. LAGERGREN - Reconstructing tissue trees

10:45-11:15 Coffee break

11:15-12:30 - KEYNOTE : J. NOVEMBRE - New lenses on genetic variation: Tools for understanding geographic structure

12:45 Lunch and then farewell session

14:15 Bus to Montpellier

Keynote speakers

> Mathieu BLANCHETTE
Computational Genomics Lab, McGill University, Montreal, Quebec
Mining ancient mammalian genomes

> Caroline COLIJN
Imperial College London
Using genomic data and modelling to forecast population composition and design vaccines

> Laurent DURET
UMR CNRS 5558 - LBBE «Biométrie et Biologie Évolutive» UCB Lyon 1
Biased gene conversion: the dark side of recombination

> Jens LAGERGREN
Computational Biology department, KTH Royal Institute of Technology, Stockholm
Reconstructing tissue trees

> Vladimir MININ
University of California, Irvine
Algorithmic and statistical advances in phylogenetic stochastic mapping

> Ida MOLTKE
The Bioinformatics Center, Department of Biology, University of Copenhagen
Inferring relatedness from genetic data

> John NOVEMBRE
Department of Human Genetics, Department of Ecology and Evolution, University of Chicago
New lenses on genetic variation: Tools for understanding geographic structure


Talks

> ILANA ARBISSER, NOAH ROSENBERG
Biology Department, Stanford University, California, USA

FST never satisfies the triangle inequality for biallelic markers with distinct allele frequencies

The population differentiation statistic FST is often treated as a pairwise distance measure between populations. As was known to Sewall Wright, however, FST is not a true metric because allele frequencies exist for which it does not satisfy the triangle inequality. We prove that a stronger result holds: for biallelic markers whose allele frequencies differ across three populations, FST never satisfies the triangle inequality. We study the deviation from the triangle inequality as a function of the allele frequencies of three populations, identifying frequency vectors at which the deviation is maximal. Next, we examine the extent to which FST fails to satisfy the triangle inequality in genome-wide data from human populations, finding that some loci have frequencies that produce deviations near the maximum. We discuss the consequences of the theoretical results for various types of data analysis, including multidimensional scaling and inference of neighbor-joining trees based on pairwise FST matrices.
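The triangle-inequality failure described in this abstract can be checked numerically. The sketch below is only an illustration and not the authors' code: it assumes the two-population definition FST = (H_T - H_S) / H_T with equally weighted populations, which for a biallelic locus with allele frequencies p and q simplifies to (p - q)^2 / ((p + q)(2 - p - q)). The function names are hypothetical.

```python
from itertools import combinations


def fst_biallelic(p: float, q: float) -> float:
    """F_ST between two equally weighted populations at a biallelic locus.

    Assumed definition: F_ST = (H_T - H_S) / H_T with heterozygosity 2x(1 - x),
    which simplifies to (p - q)^2 / ((p + q) * (2 - p - q)).
    """
    denom = (p + q) * (2.0 - p - q)
    if denom == 0.0:
        # Both populations are fixed for the same allele: F_ST is undefined.
        raise ValueError("locus is monomorphic in both populations")
    return (p - q) ** 2 / denom


def triangle_gap(p1: float, p2: float, p3: float) -> float:
    """Largest pairwise F_ST minus the sum of the other two.

    A positive value means the triangle inequality is violated.
    """
    d = sorted(fst_biallelic(a, b) for a, b in combinations((p1, p2, p3), 2))
    return d[2] - (d[0] + d[1])


if __name__ == "__main__":
    for freqs in [(0.1, 0.5, 0.9), (0.1, 0.2, 0.3)]:
        print(freqs, "gap =", triangle_gap(*freqs))
```

For both example triples the gap is positive, in line with the claim that the inequality fails whenever the three frequencies are distinct.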

> PRISCILA BILLER[1]; JULES LALLOUETTE[1]; VINCENT LIARD[1]; LAURENT GUÉGUEN[2]; GUILLAUME BESLON[1][3]; ERIC TANNIER[1][2]
[1] INRIA Grenoble Rhône-Alpes, Villeurbanne, France; [2] Université Lyon 1 LBBE UMR5558, Villeurbanne, France; [3] Université Lyon 1 LIRIS UMR5205, Villeurbanne, France

On the robustness of Evolvability under different mutators

The theory of Evolvability consists in studying the evolution of living organisms as a computational learning process. It defines the possibilities of a population under Darwinian selection to evolve in a certain direction, in a reasonable amount of time. While its robustness to certain parameters has been theoretically assessed, this theory has never been experimentally tested. We use a standard in silico experimental evolution tool to compare some predictions of the theory and the behavior of digital populations designed to resemble biological organisms. We show two surprising results, which we then explain by investigating some details of the theory. First, the evolvability of monotone conjunctions is not reproduced by the experiments. We show that this is due to the mutation algorithm, by proving an exponential waiting time to the target if its definition varies. Second, parity functions do evolve to short targets under the binomial distribution. We confirm this result by an evolvability proof in that case. These results not only jeopardize the natural distinction between these two classes of functions, but most importantly, shed light on some unexpected properties of Evolvability. An important one is that the generous definition of the mutation algorithm, while being per se an advantage compared to other abstract evolutionary models, also allows, if it is misused, for the inclusion of oracles that are incompatible with the principles of a Darwinian evolution. Unfortunately these oracles are extensively used in the current evolvability proofs.

> DAVID BRYANT
University of Otago, Dunedin, NZ

Inferring the evolution of Niches

The ecological niche of an organism is, abstractly, the set of environments which it can live in. We have been using techniques based on species distribution data and physiological models to infer niches for different species of trees. My interest has been in studying how those niches evolved down the phylogeny, and what that might say about speciation and adaptation. Being a mathematician, though, I’ll talk about the theoretical challenges this has thrown up, like how to model random changes in a set over time. Our results so far are only preliminary, but have led to some interesting questions.

> JUKKA CORANDER
UiO/Sanger/University of Helsinki

Negative frequency-dependent selection in the evolution of bacteria

Many bacterial species are composed of multiple lineages distinguished by extensive variation in gene content. These often cocirculate in the same habitat, but the evolutionary and ecological processes that shape these complex populations are poorly understood. Addressing these questions is particularly important for Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen, because the changes in population structure associated with the recent introduction of partial-coverage vaccines have substantially reduced pneumococcal disease. Here we show that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes. Functional analysis suggested that these loci may be subject to negative frequency-dependent selection (NFDS) through interactions with other bacteria, hosts or mobile elements. Correspondingly, these genes had similar frequencies in four populations with dissimilar lineage compositions. These frequencies were maintained following substantial alterations in lineage prevalences once vaccination programmes began. Fitting a multilocus NFDS model of post-vaccine population dynamics to three genomic datasets using Approximate Bayesian Computation generated reproducible estimates of the influence of NFDS on pneumococcal evolution, the strength of which varied between loci. Simulations replicated the stable frequency of lineages unperturbed by vaccination, patterns of serotype switching and clonal replacement. This framework highlights how bacterial ecology affects the impact of clinical interventions.

Corander et al. Nature Ecology & Evolution, 2017, doi.org/10.1038/s41559-017-0337-x

> RICHARD DURBIN[1,2], HANNES SVARDAL[1], MILAN MALINSKY[3], ALEXANDRA TYERS[4], MARTIN GENNER[5], ERIC MISKA[6], GEORGE TURNER[4]
[1] Department of Genetics, University of Cambridge, UK; [2] Wellcome Sanger Institute, Hinxton, UK; [3] University of Basel, Switzerland; [4] School of Biological Sciences, University of Bangor, UK; [5] Department of Biology, University of Bristol, UK; [6] Gurdon Institute, University of Cambridge, UK.

Selection on hybridisation and gene flow in the Lake Malawi cichlid fish adaptive radiation

The hundreds of cichlid fish species in Lake Malawi constitute the most extensive recent vertebrate adaptive radiation. We have mapped its genomic diversity by sequencing 134 individuals covering 73 species across all major lineages. Average sequence divergence between species pairs is only 0.1-0.25%. These divergence values overlap diversity within species, with 82% of heterozygosity shared between species. Phylogenetic analyses suggest that no single species tree adequately represents all species relationships, with evidence for substantial gene flow at multiple times. Common signatures of selection on visual and oxygen transport genes shared by distantly related deep water species point to both adaptive introgression and independent selection. Sequencing of related riverine Astatotilapia species from East African rivers indicates that the Malawi radiation arose from a hybridisation between at least two previously separated lineages, and that differentially fixed variants contributed from the ancestral lineages have been under adaptive selection within the Malawi radiation. More recently we have sequenced the genomes of nearly 200 additional species and over 1000 samples. In addition to discussing the studies described above, I hope to illustrate some of the questions concerning finer scale evolutionary processes within the species complex that we hope to address using these data, and some of the technical issues that these analyses create.

> MATHIEU FOURMENT[1]; BRIAN CLAYWELL[2]; VU DINH[2]; CONNOR MCCOY[2]; FREDERICK MATSEN[2]; AARON DARLING[1]
[1] ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia; [2] Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA

Phylogenetic inference with streaming data using sequential Monte Carlo

A major shortcoming of current phylogenetic methods is their inability to quickly incorporate new data as it becomes available. Adding new data to an analysis usually requires re-computing the entire analysis. Each analysis run can take decades of CPU time or weeks on a supercomputer facility, making re-analysis of large data sets impractical. This limitation could be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference: continuously amending new data and updating the estimate of the posterior probability distribution. After reviewing the SMC framework, I will describe how SMCs can be adapted to the streaming phylogeny problem by adding new taxa to the backbone tree. Similar to Bayesian MCMC algorithms, designing a good transition kernel is the most challenging aspect of implementing an SMC algorithm and I will present and compare some kernels that vary in complexity and efficiency. Preliminary results show that the streaming phylogeny algorithm can outperform the widely-used MCMC-based algorithm implemented in MrBayes in terms of speed without incurring a significant loss in accuracy.

> CORNELIA METZIG, CAROLINE COLIJN
Imperial College London

Prediction of Successful Clades in S. aureus in a Europe-wide database

Predicting the evolution of pathogen strains, in particular of features like antimicrobial resistance, is a challenging research question. We study this using the example of Staphylococcus aureus, which is one of the most important human pathogens and causes a variety of diseases such as skin infection, food poisoning, and bone and joint infections. One strain, the meticillin-resistant S. aureus (MRSA), is a major cause of both community- and hospital-acquired infections that are difficult to treat. We use data from a Europe-wide survey conducted in 2006 and 2011, in which a total of 3753 isolates were sampled across 453 hospitals in 25 countries, of which 1130 were resistant and 2621 non-resistant, which is a result of study design rather than of prevalence of cases. While most strains exist throughout Europe, some can be localised to a particular country. We constructed a phylogenetic tree, which we separated into clades. We truncate the extracted clades to isolates from earlier sampled cases (e.g. from isolates sampled in 2006), which we call the clade trunk. In this work we study ways to predict the success of a clade trunk, in order to understand which strains of S. aureus are expanding. We calculate a variety of tree summary statistics of these trunks (imbalance measures, counts of small substructures, and features using branch length, i.e. genetic distance between nodes). In addition, we use available metadata about the isolates of a clade, such as presence or absence of certain resistance genes. On these data we train a k-nearest neighbour and a random forest classifier for predicting expansion (success) of a clade trunk (measured in number of tips sampled in 2011). This method allows us to study the success of clades as well as the most informative predictors of success.
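As an illustration of the kind of tree summary statistics mentioned in this abstract, the sketch below computes one imbalance measure (the Colless index) and one count of small substructures (the number of cherries) for a rooted binary tree. The abstract does not say which specific statistics were used, so these two are assumptions, as is the nested-tuple tree representation and the function name.

```python
def tree_stats(tree):
    """Return (n_tips, colless_index, n_cherries) for a rooted binary tree.

    Hypothetical representation: a tip is any non-tuple label, and an
    internal node is a 2-tuple (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0  # a single tip: no imbalance, no cherries
    left, right = tree
    n_left, colless_left, cherries_left = tree_stats(left)
    n_right, colless_right, cherries_right = tree_stats(right)
    # Colless index: sum over internal nodes of |tips on left - tips on right|.
    colless = colless_left + colless_right + abs(n_left - n_right)
    # A cherry is an internal node whose two children are both tips.
    is_cherry = 1 if n_left == 1 and n_right == 1 else 0
    cherries = cherries_left + cherries_right + is_cherry
    return n_left + n_right, colless, cherries


if __name__ == "__main__":
    balanced = (("a", "b"), ("c", "d"))
    caterpillar = ((("a", "b"), "c"), "d")
    print(tree_stats(balanced))     # (4, 0, 2)
    print(tree_stats(caterpillar))  # (4, 3, 1)
```

Vectors of such statistics, one per clade trunk, are the sort of features that could then be passed to a classifier.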

> MATHIEU MOSLONKA-LEFEBVRE; JAKUB VOZNICA; MIRAINE DAVILA FELIPE; FRÉDÉRIC LEMOINE; ANNA ZHUKOVA; OLIVIER GASCUEL
Unité Bioinformatique Evolutive, C3BI – USR 3756 Institut Pasteur & CNRS, Paris, France

The phylodynamics of partner notification and contact tracing in HIV epidemics

Thanks to the growing popularity of phylodynamics approaches, it is now well established that pathogen genomes can be used to trace back infectious disease dynamics in space, time, and particular risk groups. The models used in phylodynamics share the common assumption that new cases are detected randomly, as observed in public health screening programs that offer diagnosis to patients irrespective of their position in infection chains. Examples include routine HIV testing for pregnant women or victims of incidents. However, health policies and patients’ spontaneous practices that lead to the identification of new cases do not reduce to random screening. A key counter-example is given by partner notification (PN), one of the most widely used control measures against sexually transmitted infections such as HIV. PN is a highly targeted epidemic control measure, where an infected index patient notifies his / her partner(s) of their risk of infection, so they can in turn get diagnosed and treated, if need be. It follows that the viral strains of index cases and subsequent notified patients are likely to be far more closely related (in terms of genetics and sampling time) than if the sampling of cases was representative of the whole infected population, a bias that could alter the conclusion of phylodynamics studies. In practice, the notification process can be done by the patients themselves (the so-called WHO “passive notification” or “spontaneous notification”) or be mediated by health care professionals and health authorities (referred to as “assisted notification” or “contact tracing”). Health experts currently debate the scope of PN, especially spontaneous PN, as it is difficult to estimate through surveys. Here, we assess and model the public health impact of PN on HIV epidemics from viral genetic data. Our goal is twofold: i) inform policy-making for efficient PN implementation based on a benchmark of PN relative impacts in different settings; ii) measure the observation biases induced by PN, and elaborate an alternative baseline model that accounts for PN and could limit the risk of biased conclusions. We introduce and explore a novel phylodynamics framework, the Birth-Death with Partner Notification (BDPN) model, which involves a non-Markovian process regarding the sampling of cases. We use this model to analyse genetic data of HIV-1 collected all around the world, and show that PN has a strong impact on the shape of phylogenies. This impact clearly depends on the region, country, health policy and risk group. The model parameters are estimated using a regression-based ABC approach and specifically designed summary statistics. Several test strategies are proposed to compare the standard null model to our BDPN model. Our results suggest that standard phylodynamics approaches tend to put too much emphasis on transmission dynamics, without properly modelling the diversity of detection pathways, a pitfall that could alter their conclusions.

> KRIS PARAG; OLIVER PYBUS
Department of Zoology, Oxford University

Optimally Robust Design for Inference under Phylogenetic Coalescent Models

The coalescent process models how changes in the size of a population influence the genealogical patterns of sequences sampled from that population. The estimation of these hidden population changes given the observed sequence phylogeny is an important problem in epidemiology, phylogeography and even human demography. While there is extensive literature developing statistical methods for coalescent inference, there is comparatively little research on how to optimally apply these methods or even on what ‘optimal’ means. The work that does exist is simulation based, allowing no general directives to be derived. As a result, any design strategies used have been heuristic and method specific. We examine three design problems: temporal sampling for the heterochronous, time-varying coalescent; spatial sampling for the structured coalescent and time discretisation in sequentially Markovian coalescent models. In all cases we prove that any design (i) working in the logarithm of the parameters to be inferred (e.g. population size), and (ii) distributing informative events evenly among these log-parameters (e.g. coalescences), is optimally robust. ‘Optimally robust’ means that both the total uncertainty on our estimates and the maximum uncertainty over any estimate are minimised. These results provide the first firm theoretical basis for why transforming to a log-population or choosing bin sizes to equalise coalescent counts is not just a good choice, but actually the best choice. Given its persistence across models, the log-even paradigm may be fundamental to coalescent design.
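The idea of "choosing bin sizes to equalise coalescent counts" can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: it takes a list of coalescent event times and splits them into a chosen number of time bins holding as nearly equal numbers of events as possible. The function name and the example times are hypothetical.

```python
def equal_count_bins(event_times, n_bins):
    """Split coalescent event times into n_bins groups of near-equal size.

    Returns a list of lists; the first (len % n_bins) bins hold one extra
    event so that counts differ by at most one.
    """
    times = sorted(event_times)
    if n_bins < 1 or n_bins > len(times):
        raise ValueError("n_bins must be between 1 and the number of events")
    base, extra = divmod(len(times), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)
        bins.append(times[start:start + size])
        start += size
    return bins


if __name__ == "__main__":
    # Hypothetical coalescent times (arbitrary units), denser near the present.
    times = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4]
    for group in equal_count_bins(times, 3):
        print(group[0], "to", group[-1], "holds", len(group), "events")
```

Each bin would then carry one (log) population size parameter, so every parameter is informed by roughly the same number of coalescences.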

> BENJAMIN LINARD [1], FABIO PARDI [1,2]
[1] LIRMM, Univ Montpellier, CNRS, France; [2] Institut de Biologie Computationnelle (IBC), Montpellier, France

Alignment-free phylogenetic placement of metagenomes via ancestral k-mers

Metagenomics projects are scaling up and spreading to many fields such as ecology, environmental monitoring, plant pathology and medicine, in particular through the study of microbiomes and viromes. Central to all applications is the challenge of taxonomic classification of metagenomic sequences (reads or contigs). The most informative way to do this is phylogenetic placement, which seeks the evolutionary origin of each “query” metagenomic sequence within a reference phylogeny. Traditionally, this entails (1) aligning the query to the reference alignment (usually via a profile HMM), and (2) seeking the maximum likelihood position in the reference tree for the query sequence. In this talk we describe a novel, fast approach for phylogenetic placement. Based on the reference phylogeny and alignment, our algorithm calculates the posterior probabilities of k-mers (words of length k) within sequences originating from any branch in the reference phylogeny. All the k-mers having a non-negligible probability are stored in a database, together with their likely phylogenetic origin(s). Each query metagenomic sequence can then be matched against this database, producing both a genomic localization on the reference alignment and the phylogenetic placement itself on the reference tree. One advantage of our approach is that the database can be reused for the analysis of several different metagenomes. Experimental evaluation of our software (RAPPAS - Rapid, Alignment-free Phylogenetic Placement via Ancestral Sequences) was based on simulated datasets obtained from real-life environmental and clinical data (bacterial and viral metagenomes). We show that our method is significantly faster than the fastest available methods, and its placement accuracy is comparable to that of maximum likelihood approaches. Finally, we will discuss its potential for the detection of recombination in a metagenome through screening of metagenomic sequences with chimaeric phylogenetic origins.
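The database lookup described in this abstract can be pictured with a toy example. The sketch below is not RAPPAS and makes several assumptions for illustration: the database is a plain dictionary mapping each k-mer to per-branch probabilities, a fixed small `default_prob` stands in for k-mers that were not stored for a branch, and a query is scored by summing log-probabilities over its k-mers. All names and values are hypothetical.

```python
import math


def kmers(seq, k):
    """All overlapping words of length k in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def place_query(query, db, k, default_prob=0.01):
    """Score every branch in a toy k-mer database and return the best one.

    db maps k-mer -> {branch_name: probability}. Branches missing for a
    k-mer receive default_prob (an assumed threshold value).
    """
    branches = {branch for hits in db.values() for branch in hits}
    scores = {branch: 0.0 for branch in branches}
    for word in kmers(query, k):
        hits = db.get(word, {})
        for branch in branches:
            scores[branch] += math.log(hits.get(branch, default_prob))
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    toy_db = {
        "ACG": {"branch_1": 0.9, "branch_2": 0.1},
        "CGT": {"branch_1": 0.8},
        "TTT": {"branch_2": 0.9},
    }
    best, scores = place_query("ACGT", toy_db, k=3)
    print(best, scores)
```

Because the database depends only on the reference tree and alignment, the same dictionary can serve any number of queries, which mirrors the reuse advantage the abstract points out.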

> JOHAN PENSAR [1]; JUKKA CORANDER [1,3,4]; SANTERI PURANEN [2,3]; MAIJU PESONEN [2]; YINGYING XU [2]
[1] Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland; [2] Department of Computer Science, Aalto University, Espoo, Finland; [3] Department of Biostatistics, University of Oslo, Oslo, Norway; [4] Wellcome Trust Sanger Institute, Cambridge, United Kingdom

Genome-wide epistasis analysis

The potential for genome-wide modeling of epistasis has recently surfaced as a result of the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. For example, Direct Coupling Analysis (DCA), which has earlier been shown to yield valuable predictions for single protein structures, has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. In particular, a recently proposed DCA method employed several computational tricks to enable full model fitting over 100 000 polymorphisms, which represents the amount of core genomic variation observed in analyses of many bacterial species. Still, for problems at that scale, the method is computationally quite heavy as it considers all positions simultaneously in a global manner. We have examined an alternative local approach, which is ultimately based on pairwise tests. In addition to being computationally much less demanding, we show that the local method is accuracy-wise competitive with the state-of-the-art DCA method on synthetic datasets generated to mimic real-world population sequence data of bacteria.

> BENOÎT PEREZ-LAMARQUE[1,2]; HÉLÈNE MORLON[1]
[1] Département de Biologie, Institut de Biologie, École Normale Supérieure, CNRS UMR 8197, 46 rue d’Ulm 75005 Paris, France; [2] Institut de Systématique, Evolution, Biodiversité, Muséum national d’Histoire naturelle, UMR 7205, 45 rue Buffon, 75005 Paris, France

Modeling tools for studying host-associated microbiota inheritance from metabarcoding data

Microbiotas play a central role in the functioning of multicellular life, yet understanding their inheritance during host evolutionary history remains an important challenge. Microorganisms are either acquired from the environment during the life of the host (i.e. environmental acquisition), transmitted across generations with a faithful association with their hosts (i.e. vertical transmission) or transmitted across populations or species by host-switch (i.e. horizontal transmission). Few quantitative tools exist for studying microbiota inheritance. Tools developed for studying host-symbiont associations are based on a cophylogenetic approach, which estimates events, such as host-switches, by reconciling the phylogenies of the host with symbiont phylogenies. Such tools are not well adapted to studying the microbiota evolution for which robust phylogenies are often unavailable or very computationally intensive to generate. Here, we propose a model-based framework for quantifying the preponderance of vertical versus horizontal transmission during the evolution of host-associated microbial taxa. Our approach assumes that the evolution of each microbial Operational Taxonomic Unit (OTU) is independent of one another, and we model the evolution of microbial sequences within each OTU along a host phylogeny. We compute the probability distribution of a given microbial sequence alignment under a model including vertical and horizontal transmission. Given a host phylogeny and the associated microbial sequences at the tips of the phylogeny, we use this probability distribution to compute the maximum likelihood estimate of the number of past host-switches. We develop a model selection procedure, including testing for strict vertical transmission and independent host-symbiont evolution, that allows identifying OTUs that evolved with their host from whole-microbiota high-throughput sequencing data. We test our approach using simulations, and provide an empirical application on the gut microbiota of the Hominids family. We find patterns of cospeciation between the great apes and several bacterial species related to fibrolytic capacities; our results highlight the role of varying digestive abilities during Hominids evolution.

References:
Bright et al. (2010), A complex journey: transmission of microbial symbionts, Nature Reviews Microbiology
De Vienne et al. (2013), Cospeciation vs host-shift speciation: methods for testing, evidence from natural associations and relation to coevolution, New Phytologist
Groussin et al. (2017), Unraveling the processes shaping mammalian gut microbiomes over evolutionary time, Nature Communications
Huelsenbeck et al. (2000), A Bayesian framework for the analysis of cospeciation, Evolution
Ochman et al. (2010), Evolutionary relationships of wild hominids recapitulated by gut microbial communities, PLoS Biology

> FANNY POUYET [1,2]; SIMON AESCHBACHER [1,2,3]; ALEXANDRE THIÉRY [1,2]; LAURENT EXCOFFIER[1,2]
[1] University of Bern [2] Swiss Institute of Bioinformatics [3] University of Zurich, Switzerland

Identifying the neutrally evolving fraction of the human genome to infer demography and selection

Identifying a set of variants that are evolving neutrally is crucial to both reconstruct population history and determine the molecular basis of evolutionary adaptations. Here, we focus on the number of derived alleles per individual, a statistic that is the same in expectation for any individual of our sample, irrespective of the demography of its population. We find this statistic to be strongly correlated with local recombination rate in a number of human populations that we examined, and argue that this pattern is best explained by background selection (BGS). This correlation is also observed in non-transcribed regions and at non-conserved sites genome wide. Genomic regions of high recombination show no effect of BGS, but they are strongly affected by GC-biased gene conversion (gBGC). These two processes taken together affect more than 95% of the variants in our genome. We show that genomic regions usually used for demographic inference or for building null distributions to detect selection in other parts of the genome are sensitive to BGS and gBGC. In particular, we provide evidence that previous reconstructions of human demographic history based on synonymous sites or on non-coding regions, even far away from transcribed regions (> 50kb), are likely biased. We advocate that future inference should be based on sites unaffected by BGS and gBGC. Similar biases may have affected demographic inference in other mammals and in many other eukaryotes, but the statistic we use to identify these biases can be easily used to define a set of unbiased sites in other organisms, provided that a recombination map is available.
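The statistic at the centre of this abstract, the number of derived alleles per individual, is simple to compute once genotypes are polarised. The sketch below is an illustration only: it assumes a hypothetical genotype matrix in which each entry is the number of derived copies (0, 1 or 2) an individual carries at a site, and an optional list of site indices (for example, the sites falling in one recombination-rate class). The function name is not from the source.

```python
def derived_allele_counts(genotypes, sites=None):
    """Number of derived alleles carried by each individual.

    genotypes[i][s] is the number of derived copies (0, 1 or 2) that
    individual i carries at site s. If sites is given, only those site
    indices are counted.
    """
    counts = []
    for row in genotypes:
        selected = row if sites is None else [row[s] for s in sites]
        counts.append(sum(selected))
    return counts


if __name__ == "__main__":
    # Two hypothetical diploid individuals typed at three sites.
    genotypes = [
        [0, 1, 2],
        [1, 1, 0],
    ]
    print(derived_allele_counts(genotypes))
    print(derived_allele_counts(genotypes, sites=[0, 2]))
```

Computing the counts separately for sites in different recombination-rate classes is one way the comparison described above could be set up.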

> THÉOPHILE SANCHEZ[1]; GUILLAUME CHARPIAT[1]; FLORA JAY[1]
[1] Laboratoire de Recherche en Informatique (LRI) - Centre National de Recherche Scientifique (CNRS) - Institut National de Recherche en Informatique et en Automatique (INRIA) - Université Paris-Sud - Université Paris-Saclay, Orsay, France

End-to-end Deep Learning Approach for Demographic History Inference

Recent methods for demographic history inference have achieved good results, circumventing the complexity of raw genomic data by summarizing them into handcrafted features called summary statistics. We developed a new approach based on deep learning that takes as input the variant sites found within a sample of individuals from the same population, and infers demographic descriptor values without relying on these predefined summary statistics. Our model processes raw data and learns automatically how to embed them in order to solve a demographic prediction task. We compare our approach to several methods frequently used in population genetics, such as Approximate Bayesian Computation[1] and Random Forests[2], and highlight the advantages of our end-to-end deep learning framework.

[1] Boitard, S., Rodríguez, W., Jay, F., Mona, S. & Austerlitz, F. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach. PLOS Genetics 12, e1005877 (2016).
[2] Raynal, L. et al. ABC random forests for Bayesian parameter inference. Peer Community in Evolutionary Biology 100036 (2017). doi:10.24072/pci.evolbiol.100036

Talks

> TIMOTHY VAUGHAN [1]; TANJA STADLER [1]; PATRICK HOSCHEIT [2]; OLIVER PYBUS [3]
[1] Department of Biosystems Science and Engineering (D-BSSE), ETH Zurich, Basel, Switzerland; [2] Institut National de la Recherche Agronomique (INRA), Jouy-en-Josas, France; [3] Department of Zoology, University of Oxford, Oxford, UK
An MCMC algorithm for Bayesian inference of hard polytomies
Modern phylogenetic inference methods are almost exclusively restricted to the inference of binary trees, in which every ancestral node has exactly two offspring. This limitation explicitly precludes the study of hard polytomies: true multifurcations resulting from, for example, near-instantaneous bursts of speciation (in the macroevolutionary context) or infection (in the epidemiological context). In this talk, I will present a novel Markov chain Monte Carlo (MCMC) algorithm for sampling the space of phylogenetic time trees under models that allow for hard polytomies. This algorithm is implemented as an extension of the BEAST 2 phylogenetic inference platform [1], and can be combined with existing molecular substitution models for application to a wide variety of molecular data sets. So far the sampler has been applied to Bayesian inference under the Lambda-coalescent [2,3], a generalization of the Kingman coalescent that supports multiple mergers of ancestral lineages. After demonstrating the ability of the algorithm to correctly recover known polytomies from genetic data sets simulated under a Lambda-coalescent model, I will discuss a preliminary application of the algorithm to the joint inference of polytomies and model parameters from publicly available Ebola virus genomes sampled from the Kailahun district of Sierra Leone during the 2014-2015 West African Ebola virus epidemic.
[1] Bouckaert et al., PLoS Comput Biol, 2014
[2] Pitman, Annals of Probability, 1999
[3] Sagitov, Journal of Applied Probability, 1999
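To make the idea of multiple mergers concrete, here is a small Python sketch of merger rates in one well-known Lambda-coalescent family, the Beta(2 - alpha, alpha) coalescent. The abstract does not say which Lambda measure was used, so the choice of this family, the function names and the parameter values are assumptions for illustration only. With b lineages present, each specific group of k lineages merges at rate B(k - alpha, b - k + alpha) / B(2 - alpha, alpha), where B is the Beta function.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """Rate at which one specific group of k out of b lineages merges
    under the Beta(2 - alpha, alpha) coalescent, for 1 < alpha < 2."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 1 and 2")
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return math.exp(log_beta(k - alpha, b - k + alpha)
                    - log_beta(2.0 - alpha, alpha))


def total_merger_rates(b, alpha):
    """Total rate of k-mergers (summed over all groups of size k)."""
    return {k: math.comb(b, k) * beta_coalescent_rate(b, k, alpha)
            for k in range(2, b + 1)}


# With alpha = 1.5 and three lineages, triple mergers (a hard polytomy)
# occur at a non-negligible rate next to the usual pairwise mergers.
rates_three_lineages = total_merger_rates(3, 1.5)
```

As alpha approaches 2 the rates for k > 2 vanish and pairwise mergers occur at rate 1, which is the Kingman coalescent that yields strictly binary trees.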

> VITOR PAVINATO [1,2]; JEAN-MICHEL MARIN [2]; MIGUEL NAVASCUÉS [1]
[1] UMR Centre de Biologie pour la Gestion des Populations (CBGP), INRA, Montpellier, France; [2] Institut Montpelliérain Alexander Grothendieck (IMAG), Université de Montpellier, Montpellier, France
Tracking selection in time-series population genomic data using ABC random forests
Recent theoretical work has shown that the interaction between the signals of demographic change and selection can bias the inference of population size changes and migration rates, and can lead to the spurious identification of adaptive loci in genome scans. In this context, the joint estimation of selection and demography is a necessity, but it is not yet fully implemented. Methods of joint inference will give us a better picture of past and present evolutionary changes, since they can use most of the information present in population genomic datasets and properly account for the demographic signal when searching for selection. We propose the use of Approximate Bayesian Computation (ABC), a simulation-based method, to implement the joint inference of demography and selection parameters in population genomic studies. Traditional ABC approaches are computationally expensive, which makes their use challenging in some scenarios, particularly those including selection. This has changed with the introduction of random forests (RF) in ABC, which can alleviate the computational requirements. In addition, ABC-RF circumvents the limitations imposed by the choice of summary statistics and implements model selection better. We present this new approach with the analysis of time-series population genomic datasets. It also has the potential to be applied to more complex demographic scenarios, that is, cases with more than one event of population contraction, expansion or admixture.

> ANNA ZHUKOVA [1]; SOTA ISHIKAWA [1,2]; OLIVIER GASCUEL [1]
[1] Unité Bioinformatique Evolutive, C3BI – USR 3756 Institut Pasteur & CNRS, Paris, France; [2] Department of Biological Sciences, the University of Tokyo, Tokyo, Japan
Reconstructing and Predicting the Emergence and Transmission of Drug Resistance Mutations in HIV
Drug resistance mutations (DRMs) appearing in HIV under drug selection pressure are especially dangerous when transmitted: the recipients become infected by strains that are not susceptible to certain types of treatment. In this study we describe and compare the patterns of drug resistance in different countries. We reconstruct ancestral states for the presence/absence of common surveillance DRMs in phylogenetic trees of HIV using the PASTML software (https://github.com/saishikawa/PASTML). In a tree annotated with PASTML, we can discriminate between two types of resistant nodes: (1) those whose parent node does not have the DRM, which could correspond to independent emergence of DRMs under treatment, and (2) those whose parent node is also resistant; such nodes form clusters of potentially transmitted drug resistance. We show that the frequency of these two patterns varies strongly between countries: for example, in Africa we find mostly resistant nodes of type (1), while in higher-income countries with broad access to anti-retroviral treatment (ART) we see nodes of both types. Lastly, we study how these patterns evolve over time and the method's ability to predict the future distribution of DRMs in the studied populations.
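The two node types described above can be told apart with a single pass over an annotated tree. The Python sketch below assumes the ancestral states have already been reconstructed and are available as plain dictionaries; the data layout, the function name and the toy tree are hypothetical, and the code does not call PASTML itself.

```python
def classify_resistant_nodes(parent, resistant):
    """parent maps each node to its parent (the root maps to None);
    resistant maps each node to True/False for carrying the DRM.
    Returns (emergence, transmitted): resistant nodes whose parent lacks
    the DRM (type 1) and resistant nodes whose parent is also resistant
    (type 2). A resistant root is counted as type 1 (an assumption)."""
    emergence, transmitted = [], []
    for node, has_drm in resistant.items():
        if not has_drm:
            continue
        up = parent[node]
        if up is not None and resistant[up]:
            transmitted.append(node)
        else:
            emergence.append(node)
    return emergence, transmitted


# Toy annotated tree (hypothetical node names and states).
toy_parent = {"root": None, "a": "root", "b": "root",
              "a1": "a", "a2": "a", "b1": "b", "b2": "b"}
toy_resistant = {"root": False, "a": True, "a1": True, "a2": False,
                 "b": False, "b1": True, "b2": False}
toy_emergence, toy_transmitted = classify_resistant_nodes(toy_parent,
                                                          toy_resistant)
```

Counting the two lists per country would give the kind of frequency comparison mentioned in the abstract.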

Posters

Poster 1
> PAULINA BOLIVAR [2]; LAURENT DURET [1]; CARINA MUGAL [2]; LAURENT GUÉGUEN [1]
[1] Laboratoire de Biométrie et Biologie Évolutive (LBBE), Lyon, France; [2] Department of Evolutionary Biology, Uppsala, Sweden
Stochastic mapping and the analysis of selection
Probabilistic modeling is a powerful way to study molecular evolution, and many efficient statistical approaches exist in phylogeny to infer the evolutionary process explaining observed sequences. In this context, stochastic mapping is an efficient approach to estimate many features of interest of the process on sites and branches, such as the count of substitutions of a given type. In the context of the evolution of coding sequences, we have adapted this approach to the analysis of selection. This allows a rigorous model-based definition of dN and dS, and better estimates of selection. Gene sequence composition usually changes along phylogenies. These changes strongly bias the usual estimates of selection, which do not take this non-stationarity into consideration. With this new method, it is possible to estimate dN and dS accurately even with changing base composition. Another feature biases the estimates of selection. During recombination, GC-biased gene conversion (gBGC) biases fixation probabilities towards GC alleles. This shifts both neutral and non-neutral substitutions towards GC, and the effect is confounded with actual selection. With stochastic mapping, it is possible to consider exclusively substitutions that are independent of gBGC, and then to estimate selection accurately. With this approach we find the expected relation between population size and selection in bird genomes.

Poster 2
> LINA HERBST [1]; THOMAS LI [2]; MIKE STEEL [3]
[1] Greifswald University, Institute of Mathematics and Computer Science, Greifswald, Germany; [2] School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand; [3] Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
Quantifying the accuracy of ancestral state prediction in a phylogenetic tree under maximum parsimony
In phylogenetic studies, biologists often wish to estimate the ancestral discrete character state at an interior vertex $v$ of an evolutionary tree $T$ from the states that are observed at the leaves of the tree. A simple and fast estimation method, maximum parsimony, takes the ancestral state at $v$ to be any state that minimises the number of state changes in $T$ required to explain the character's evolution on $T$. Here, we investigate the reconstruction accuracy of this estimation method further, under a simple symmetric model of state change, and obtain a number of new results, both for 2-state characters and for $r$-state characters ($r>2$). Our results rely on establishing new identities and inequalities, based on a coupling argument that involves a simpler ‘coin toss’ approach to ancestral state reconstruction.
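For readers unfamiliar with maximum parsimony, the estimate at the root of a binary tree can be computed with the classical bottom-up pass of Fitch's algorithm. The Python sketch below is a generic textbook version, not code from the poster; the nested-tuple tree encoding and the function name are assumptions. To estimate the state at another interior vertex $v$, the tree can be re-rooted at $v$ first.

```python
def fitch_root_states(tree):
    """tree is either a leaf state (any hashable non-tuple value) or a
    pair (left_subtree, right_subtree). Returns (states, changes): the
    set of most parsimonious states at the root of `tree` and the
    minimum number of state changes needed below it."""
    if not isinstance(tree, tuple):
        return {tree}, 0
    left_states, left_changes = fitch_root_states(tree[0])
    right_states, right_changes = fitch_root_states(tree[1])
    shared = left_states & right_states
    if shared:
        # Both subtrees can agree on a state: no extra change is needed.
        return shared, left_changes + right_changes
    # Otherwise any state from either side costs exactly one more change.
    return left_states | right_states, left_changes + right_changes + 1


# Four leaves with states a, a, a, b on the tree ((a, a), (a, b)).
toy_states, toy_changes = fitch_root_states((("a", "a"), ("a", "b")))
```

When the returned set contains several states, parsimony alone cannot decide between them, which is one reason reconstruction accuracy is worth quantifying.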


Poster 3
> DAMIEN CORREIA [1,3]; FRÉDÉRIC LEMOINE [1]; VINCENT LEFORT [2]; OLIVIA DOPPELT-AZEROUAL [1]; FABIEN MAREUIL [1]; SARAH COHEN-BOULAKIA [3]; OLIVIER GASCUEL [1,2]
[1] Unité de Bioinformatique Evolutive et Hub Bioinformatique et Biostatistique, C3BI - USR 3756 Institut Pasteur et CNRS, Paris, France; [2] Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier, LIRMM - UMR 5506 CNRS et Université de Montpellier, France; [3] Laboratoire de Recherche en Informatique, LRI - CNRS UMR 8623 et Université Paris-Saclay, Orsay, France
New Generation Phylogeny.fr
With 50,000 data analyses per month and more than 2,500 citations, Phylogeny.fr [1] is heavily used both nationally and internationally. Phylogeny.fr aims at facilitating the analysis of phylogenetic data by providing users with a way to chain tools widely used in phylogenetics, such as MUSCLE, T-Coffee, TNT and PhyML, in a user-friendly interface. However, since its development, users’ needs have evolved and new tools and workflows have been published, promoting new practices. For example, users now run Phylogeny.fr for teaching, with possibly hundreds of users at the same time, or employ it in batch mode, submitting a large number of requests to the same server. In response to these changes, we have developed NGPhylogeny.fr, an entirely redesigned web application dedicated to phylogenetic analyses, which: (1) integrates state-of-the-art phylogenetic tools such as BMGE, Noisy, FastTree, PhyML-SMS, MrBayes, MAFFT and Booster; (2) is flexible enough to ease the periodic update of tools and workflows; (3) is scalable in terms of number of runs, thanks to a Galaxy [2,3] backend, on which NGPhylogeny.fr relies for workflow execution; (4) is based on Django, a popular web framework, which makes it easy to install and maintain. As a successor of Phylogeny.fr, NGPhylogeny.fr is a powerful resource made available to the community.
In this poster, we will not only introduce the final architecture of NGPhylogeny.fr but also outline a set of new and particularly interesting use cases demonstrating the benefit of using NGPhylogeny.fr for both novice and expert users.
[1] Dereeper A., Guignon V., Blanc G., Audic S., Buffet S., Chevenet F., Dufayard JF., Guindon S., Lefort V., Lescot M., Claverie J.-M.* and Gascuel O.* Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 2008, 36 (Web Server issue):W465-9. [2,842 citations, cf. Google Scholar]
[2] Goecks J., Nekrutenko A., Taylor J. and Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11(8):R86.
[3] Mareuil F., Doppelt-Azeroual O. and Ménager H. A public Galaxy platform at Pasteur used as an execution engine for web services [version 1]. F1000Research 2017, 6:1030 (poster) (doi: 10.7490/f1000research.1114334.1)

Poster 4
> PIJUS SIMONAITIS [1]; ANNIE CHATEAU [1,2,3]; KRISTER SWENSON [1,2,3]
[1] LIRMM, Université de Montpellier, Montpellier, France; [2] Institut de Biologie Computationnelle (IBC), Montpellier, France; [3] CNRS
A framework for cost-constrained genome rearrangement under Double Cut and Join
The study of genome rearrangement has many flavours, but they are all somehow tied to edit distances on variations of a multi-graph called the breakpoint graph. We study a weighted 2-break distance on Eulerian 2-edge-colored multi-graphs, which generalizes weighted versions of several Double Cut and Join problems, including those on genomes with unequal gene content. We affirm the connection between cycle decompositions and edit scenarios first discovered with the Sorting By Reversals problem. Using this, we show that the problem of finding a parsimonious scenario of minimum cost on an Eulerian 2-edge-colored multi-graph (with a general cost function for 2-breaks) can be solved by decomposing the problem into independent instances on simple alternating cycles. For breakpoint graphs, and a more constrained cost function based on coloring the vertices, we give a polynomial-time algorithm for finding a parsimonious 2-break scenario of minimum cost, while showing that finding a non-parsimonious 2-break scenario of minimum cost is NP-hard.

Poster 5
> EMMA SAULNIER [1,2]; OLIVIER GASCUEL [2,3]; SAMUEL ALIZON [1]
[1] Laboratoire des Maladies Infectieuses et Vecteurs : Ecologie, Génétique, Evolution et Contrôle (MIVEGEC), Montpellier, France; [2] Institut de Biologie Computationnelle (IBC), Montpellier, France; [3] Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), Paris, France
Assessing the contribution of post-mortem transmission for the 2014-2016 Ebola outbreak, using regression-ABC

Poster 6
> GONCHÉ DANESH [1]; EMMA SAULNIER [2]; SAMUEL ALIZON [1]
[1] Laboratoire “Maladies Infectieuses et Vecteurs, Ecologie, Evolution et Contrôle” (UMR CNRS, IRD, UM), Montpellier, France; [2] Institut de Biologie Computationnelle, LIRMM (UMR CNRS, UM), Montpellier, France
Phylodynamics of infections: comparing simulators
The field of phylodynamics hypothesizes that the way pathogens spread leaves footprints in their genomes [1]. There are several phylodynamic inference methods, but they are rarely compared. Recently, the Phylogenetics and Networks for Generalized HIV Epidemics in Africa consortium (PANGEA-HIV) launched a contest to compare and evaluate phylodynamic methods. Viral sequences were simulated from two very detailed individual-level models and research groups were invited to analyze these data [2]. We propose to use an approximate Bayesian computation (ABC) approach that relies on simulating phylogenetic trees [3]. This requires simulating the trajectory of the epidemic according to a specific model, and then simulating a sampled transmission tree based on this trajectory. Here we present two simulators developed in Rcpp: one for the simulation of stochastic trajectories, based on Gillespie’s Stochastic Simulation Algorithms [4], and one for the simulation of the tree, based on the coalescent approach. We compared the performance of our simulators with that of the rcolgem R package [5][6] and the software MASTER [7].
References:
[1] Grenfell, B. T., Pybus, O. G., Gog, J. R., Wood, J. L., Daly, J. M., Mumford, J. A., & Holmes, E. C. (2004). Unifying the epidemiological and evolutionary dynamics of pathogens. Science, 303(5656), 327-332.
[2] Pillay D, Herbeck J, Cohen MS, et al. (2015). The PANGEA-HIV Consortium: Phylogenetics and Networks for Generalised HIV Epidemics in Africa. The Lancet Infectious Diseases, 15(3), 259-261.
[3] Ratmann O, et al. (2016). Phylogenetic Tools for Generalized HIV-1 Epidemics: Findings from the PANGEA-HIV Methods Comparison
[3] Saulnier E, Gascuel O, Alizon S. (2017). Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study. Ferguson NM, ed. PLoS Computational Biology, 13(3), e1005416.
[4] Keeling, M. J. & Rohani, P. (2008). Modeling infectious diseases in humans and animals. Princeton.
[5] Volz EM. (2012). Complex population dynamics and the coalescent under neutrality. Genetics, 190(1), 187-201.
[6] Rasmussen DA, Volz EM, Koelle K. (2014). Phylodynamic inference for structured epidemiological models. PLoS Comput Biol, 10(4), e1003570.
[7] Vaughan TG, Drummond AJ. (2013). A stochastic simulator of birth-death master equations with application to phylodynamics. Mol Biol Evol, 30(6), 1480-1493.
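The first of the two simulation steps, drawing a stochastic epidemic trajectory with Gillespie's direct method, can be sketched in a few lines of Python. The poster's simulators are written in Rcpp and the epidemiological model is not given in the abstract, so the SIR model, the parameter names and the values below are assumptions chosen only to show the algorithm.

```python
import random


def gillespie_sir(beta, gamma, s0, i0, r0=0, t_max=float("inf"), seed=None):
    """Gillespie direct method for a closed SIR epidemic (gamma > 0).
    Infection occurs at rate beta*S*I/N and recovery at rate gamma*I.
    Returns a list of (time, S, I, R) states, one per event."""
    rng = random.Random(seed)
    t, s, i, r = 0.0, s0, i0, r0
    n = s + i + r
    trajectory = [(t, s, i, r)]
    while i > 0:
        rate_infection = beta * s * i / n
        rate_recovery = gamma * i
        total = rate_infection + rate_recovery
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose which event fires, in proportion to its rate.
        if rng.random() * total < rate_infection:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        trajectory.append((t, s, i, r))
    return trajectory


# One seeded toy run: 99 susceptibles, 1 infected, R0 = beta / gamma = 2.
toy_trajectory = gillespie_sir(beta=2.0, gamma=1.0, s0=99, i0=1, seed=1)
```

A tree simulator would then take such a trajectory as input, for example to set time-varying coalescent rates.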

Poster 7
> JAKUB TRUSZKOWSKI [1]; FABIO PARDI [1]; CELINE SCORNAVACCA [2]
[1] CNRS-LIRMM, MAB; [2] CNRS-ISEM, Université de Montpellier
Rapidly computing gene tree probabilities for concordant gene trees
Abstract: The evolutionary histories of individual genes often differ from the species history due to diverse phenomena such as incomplete lineage sorting, duplications and horizontal gene transfer. As a result, gene tree topologies offer limited information about the species tree, and inference of species trees is a non-trivial computational problem. Statistical inference of species trees from gene tree topologies has been limited by the computational difficulty of computing the likelihood of a gene tree topology given the species tree. All known exact algorithms for the problem scale exponentially with the number of taxa in the gene tree, which limits their use to relatively small data sets. In this work, we present an algorithm that can rapidly compute the probability of any gene tree topology concordant with a species tree under the multispecies coalescent. Our approach is based on a novel dynamic programming strategy which leads to an algorithm with a time complexity of O(n^4) in the worst case. In experiments, the algorithm runs up to 5 orders of magnitude faster than standard methods and allows us to compute likelihoods for gene trees with up to a thousand leaves in a few minutes. A slight modification of the algorithm enables us to compute the total probability that the species tree produces a concordant gene tree. We are currently investigating extensions to our algorithm to efficiently compute gene tree probabilities for general gene trees.

Poster 8
> LEO VAN IERSEL [1]; REMIE JANSSEN [1]; MARK JONES [1]; YUKIHIRO MURAKAMI [1]; NORBERT ZEH [2]
[1] Delft Institute of Applied Mathematics, Delft University of Technology, Delft, the Netherlands; [2] Faculty of Computer Science, Dalhousie University, Halifax, Canada
Phylogenetic inference using weakly displayed gene trees
Abstract from the corresponding paper: ‘Polynomial-Time Algorithms for Phylogenetic Inference Problems’, available on arXiv (number 1802.00317). A common problem in phylogenetics is to try to infer a species phylogeny from gene trees. We consider different variants of this problem. The first variant, called Unrestricted Minimal Episodes Inference, aims at inferring a species tree based on a model of speciation and duplication where duplications are clustered in duplication episodes. The goal is to minimize the number of such episodes. The second variant, Parental Hybridization, aims at inferring a species network based on a model of speciation and reticulation. The goal is to minimize the number of reticulation events. It is a variant of the well-studied Hybridization Number problem with a more generous view on which gene trees are consistent with a given species network. We show that these seemingly different problems are in fact closely related and can, surprisingly, both be solved in polynomial time, using a structure we call “beaded trees”. However, we also show that methods based on these problems have to be used with care because the optimal species phylogenies always have some restricted form. To overcome this problem, we introduce a new variant of Unrestricted Minimal Episodes Inference that minimizes the duplication episode depth. We prove that this new variant of the problem can also be solved in polynomial time.

Poster 9
> FLORIAN MASSIP [1,2]; MARC LAURENT [3]; CAROLINE BROSSAS [3]; MARIE-NOELLE PRIOLEAU [3]; LAURENT DURET [1]; FRANCK PICARD [1]
[1] Laboratoire de Biométrie et de Biologie Evolutive, Université Lyon 1, Villeurbanne, France; [2] Max Delbruck Center for Molecular Medicine, Berlin, Germany; [3] Institut Jacques Monod, Université Paris Descartes, Paris, France
Evolution of Replication Origins in Vertebrate Genomes: Rapid Turnover Despite Selective Constraints
In vertebrate genomes, DNA replication starts in specific regions, called replication origins (Oris). Despite the critical role of the replication process in maintaining genome integrity, the genetic features that induce Ori activity are not well understood. To decipher the sequence features essential for Ori activity, we generated Ori data in the chicken DT40 cell line, providing the first genome-wide map of Oris in a bird, and conducted the first comparative study of Oris in vertebrates. We find that Ori loci present a specific nucleotide composition signature, which is very similar in the three species studied (human, mouse and chicken). We next analysed the mutational constraints that shape human replication origins. On a short evolutionary time scale, we find a strong signature of negative selection at Ori loci in the human population, confirming the importance of Ori sequence for replication activity. In contrast, we find that Oris have experienced a rapid turnover during vertebrate evolution. Indeed, the spatial density of Oris differs strongly between human, mouse and chicken. In addition, pairwise comparisons of Ori maps revealed that only 8 to 21% of Ori loci are conserved between two species. Finally, at this time scale, sequence conservation is not strongly affected by the presence of Oris. Our results thus indicate that the sequence content of Oris plays an important role in the firing of replication in vertebrates.
However, these sequence constraints are evolving rapidly in vertebrate genomes, and do not result in the presence of long, well-defined motifs. This leads us to propose describing replication firing as a continuous, strongly multifactorial process. In this model, Oris would be hotspots in which the combination of genetic and epigenetic factors creates an environment favoring replication firing. As such, these hotspots would be able to adapt rapidly to variations in genome architecture.


Poster 10
> VENELIN MITOV [1,2]; KRZYSZTOF BARTOSZEK [3]; GEORGIOS ASIMOMITIS [1]; TANJA STADLER [1,2]
[1] ETH Zurich, Zurich, Switzerland; [2] Swiss Institute of Bioinformatics, Lausanne, Switzerland; [3] Linköping University, Linköping, Sweden
A quadratic polynomial formulation enables fast likelihood calculation for multivariate Gaussian phylogenetic comparative models
Since Felsenstein described the independent contrasts method (1), phylogenetic comparative methods (PCMs) have become a primary set of tools for quantifying phylogenetic signal, accounting for common history in comparative studies and testing evolutionary hypotheses, such as neutral evolution, adaptive radiation, and punctuated equilibrium (2, 3). Numerous PCMs have been developed to model the link between micro-evolutionary forces acting at the generation level and phenotype distributions at the population level observable after many generations (2, 3). Most PCMs currently in use are Gaussian phylogenetic models because, under the model assumptions, the joint trait distribution at the tips of the phylogeny is multivariate Gaussian (4). Recently, Ho and Ané (5) proposed a linear-time algorithm for calculating the likelihood of Gaussian phylogenetic models that satisfy a generalized three-point structure condition. For any model other than Brownian motion (BM), however, a model-specific non-linear transformation has to be applied to the tree. This transformation is complex, in particular for non-ultrametric trees and for multiple-trait models (see, e.g., the appendix in (6)). As an alternative, we propose a generic linear-time algorithm for calculating the likelihood of multiple-trait Gaussian models, using a quadratic polynomial representation of the likelihood function. Our approach is similar to, although more general than, the integration-based likelihood calculation algorithms proposed in (7, 8).
The key advantage of this algorithm is that it supports any tree shape, eliminating the need for a tree transformation and making it possible to implement any Gaussian model that satisfies the following two conditions: a) the mean of the underlying stochastic process at the end of a time interval of the tree depends linearly on the ancestral value; b) the variance-covariance matrix of the stochastic process after a time interval does not depend on the ancestral value. These conditions are met by most commonly used Gaussian phylogenetic models (2, 3). During the talk, I will present a new R package implementing the quadratic polynomial algorithm and several applications, including estimating the heritability of pathogen traits (9) and stepwise AIC selection of Gaussian PCMs fitted to the evolution of body and brain mass in mammals (in preparation).
References:
1. J. Felsenstein, Am. Nat. 125, 1–15 (1985).
2. T. F. Hansen, E. P. Martins, Evolution 50, 1404 (1996).
3. M. W. Pennell, L. J. Harmon, Annals of the New York Academy of Sciences. 1289, 90–105 (2013).
4. B. C. O’Meara, AREES. 43, 267–285 (2012).
5. L. S. T. Ho, C. Ané, Syst. Biol. 63, 397–408 (2014).
6. E. W. Goolsby, J. Bruggeman, C. Ané, Methods in Ecology and Evolution. 8, 22–27 (2016).
7. R. G. FitzJohn, Methods in Ecology and Evolution. 3, 1084–1092 (2012).
8. O. G. Pybus et al., PNAS. 109, 15066–15071 (2012).
9. V. Mitov, T. Stadler, Mol. Biol. Evol. msx328 (2018).
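To give a feel for why a quadratic polynomial representation yields a single-pass likelihood, here is a deliberately reduced Python sketch for one trait evolving under Brownian motion with unit rate (so the variance along a branch equals its length). It is not the authors' R package and does not cover the multivariate case; the tree encoding and function names are assumptions. Each subtree contributes a log-likelihood of the form a*x^2 + b*x + c in the value x at its parent, and integrating out an internal node keeps that form.

```python
import math


def tip_coefficients(y, v):
    """Coefficients (a, b, c) of log N(y; x, v) = a*x^2 + b*x + c,
    viewed as a function of the parent value x."""
    return (-0.5 / v, y / v,
            -0.5 * y * y / v - 0.5 * math.log(2 * math.pi * v))


def integrate_branch(A, B, C, v):
    """The data below a node have log-likelihood A*z^2 + B*z + C in the
    node value z. Integrate z out against N(z; x, v) and return the
    coefficients of the result as a quadratic in the parent value x."""
    p = A - 0.5 / v  # coefficient of z^2 in the exponent, always negative
    a = -0.5 / v - 1.0 / (4.0 * p * v * v)
    b = -B / (2.0 * p * v)
    c = (C - 0.5 * math.log(2 * math.pi * v)
         + 0.5 * math.log(math.pi / -p) - B * B / (4.0 * p))
    return (a, b, c)


def prune(node):
    """node is (branch_length, trait_value) for a tip, or
    (branch_length, [child nodes]) for an internal node."""
    length, payload = node
    if not isinstance(payload, list):
        return tip_coefficients(payload, length)
    A = B = C = 0.0
    for child in payload:
        a, b, c = prune(child)
        A, B, C = A + a, B + b, C + c
    return integrate_branch(A, B, C, length)


def log_likelihood(root_children, root_value):
    """Log-likelihood of the tip data given the trait value at the root."""
    A = B = C = 0.0
    for child in root_children:
        a, b, c = prune(child)
        A, B, C = A + a, B + b, C + c
    return A * root_value ** 2 + B * root_value + C
```

Every node is visited once, so the cost is linear in the number of nodes, and no tree transformation is needed for non-ultrametric trees.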


Poster 11
> ELISE KERDONCUFF [1,2]; AMAURY LAMBERT [2,3]; GUILLAUME ACHAZ [1,2]
[1] Institut de Systématique, Evolution, Biodiversité (ISYEB), MNHN, Paris, France; [2] Centre Interdisciplinaire de Recherche en Biologie (CIRB), Collège de France, Paris, France; [3] Laboratoire de Probabilités et Modèles Aléatoires (LPMA), UPMC, Paris, France
Detection of strong decline in populations by genomic approaches
Only 5% of described species have a conservation status. This lack of information arises because the methods used to assess conservation status cannot be generalized to all species. We thus developed a model that describes the DNA of individuals sampled from a population evolving under different demographic scenarios. Using this model, we studied features of DNA sequences under a constant population size or a strong population decline. We developed a new method to study demography based on the length of compatible blocks along the genome, i.e. blocks of nucleotides within which we cannot detect recombination events. The lengths of compatible blocks depend on the frequency of recombination events, which is influenced by the ancestral history of the population.
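One common way to decide that no recombination event is detectable between two biallelic sites is the four-gamete test: the sites are compatible when fewer than four distinct haplotype combinations are observed. The abstract does not state which criterion the authors use, so the test, the greedy left-to-right partition and the names in the Python sketch below are assumptions for illustration.

```python
def compatible(site_a, site_b):
    """Four-gamete test for two biallelic sites, each given as a list of
    alleles (0/1) across the same sampled haplotypes."""
    return len(set(zip(site_a, site_b))) < 4


def compatible_blocks(sites):
    """Greedy left-to-right partition of consecutive sites into runs in
    which every pair of sites passes the four-gamete test.
    Returns (start, end) site-index pairs with end exclusive."""
    blocks = []
    start = 0
    for j in range(1, len(sites)):
        if any(not compatible(sites[i], sites[j]) for i in range(start, j)):
            blocks.append((start, j))
            start = j
    if sites:
        blocks.append((start, len(sites)))
    return blocks


# Four haplotypes, four sites (toy data): sites 0 and 2 show all four
# gametes, so a block boundary falls between sites 1 and 2.
toy_sites = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 0, 1], [0, 1, 1, 1]]
toy_blocks = compatible_blocks(toy_sites)
```

Converting site indices to genomic coordinates would give block lengths in nucleotides, the quantity whose distribution is studied in the abstract.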

Poster 12
> SYLVAIN PULICANI [1,2,3]; KRISTER SWENSON [1,2]; ERIC RIVALS [1,2]; GIACOMO CAVALLI [3]
[1] Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Montpellier, France; [2] Institut de Biologie Computationnelle (IBC), Montpellier, France; [3] Institut de Génétique Humaine (IGH), Montpellier, France
Linking Chromosomal Rearrangement to Chromatin Structure in Drosophila
Chromosomal rearrangements such as inversions, duplications and transpositions have been known for more than a century and have been linked to aberrant phenotypes and cancers. They are initiated by double-strand breaks of the DNA, which delimit the segment involved. The organized structure of chromatin has been revealed by technologies such as 3C/Hi-C and FISH. This spatial organization is functionally important and conserved throughout evolution, and its disruption is linked to tumorigenesis. What is the influence of the 3D structure of the genome on the distribution of rearrangement breakpoints? To answer this question, we compute evolutionary scenarios based on chromosome structures. We design methods that infer such scenarios while favoring breakpoints that are close in space.

Poster 13
> FABIAN FREUND [3]; SEBASTIAN MATUSZEWSKI [N/A]; ELISE KERDONCUFF [1,2]; JEFFREY JENSEN [4]; MARGUERITE LAPIERRE [N/A]; AMAURY LAMBERT [2]; GUILLAUME ACHAZ [1,2]
[1] Atelier de BioInformatique, ISYEB (UMR7205), Muséum National d’Histoire Naturelle, Paris, France; [2] SMILE, CIRB (UMR7241), Collège de France, Paris, France; [3] University of Hohenheim, Stuttgart, Germany; [4] Arizona State University, Tempe, United States
The mystery of the U-shaped spectra
Since the advent of the neutral theory of molecular evolution, the standard neutral model (mutation-drift equilibrium) has served as the reference model in light of which we routinely interpret population genetics data. Using either forward diffusion or backward coalescent approaches, it can be shown that the expected distribution of mutation frequencies (the so-called site frequency spectrum, SFS) is proportional to 1/f, where f is the frequency of the mutations. We have assembled a large collection of genome-wide SFS (averages over thousands of loci) from a diverse set of organisms, both eukaryotic (plants, fungi, animals) and prokaryotic (bacteria and archaea). All the observed SFS were compared to the standard neutral SFS and, with only a few exceptions, none fits the expectations of the reference model. The observed SFS typically show an excess of low- and high-frequency variants, leading to what is known as U-shaped spectra. Including simple demographies (e.g. monotonic growth or decline) does not result in a perfect fit, nor does the inclusion of mis-orientation errors (erroneous swaps of derived and ancestral states). We have therefore derived theoretical expectations from multiple-merger coalescent (MMC) models (i.e. Beta-coalescents and Xi-coalescents, both of which can be tuned by a single parameter) together with exponential growth and mis-orientation. MMC models emerge from modes of evolution in which a few individuals can concentrate the parenthood of a large fraction of the population. This includes some selective regimes (e.g. genetic draft) and extremely skewed offspring distributions (e.g. sweepstakes reproduction). Using both likelihood-based approaches and least-squares regressions, we show that these three-parameter models (mis-orientation, growth rate and concentrated parenthood) show an excellent fit to many of the observed SFS. We thus discuss the possible replacement of the old reference model by a new one that does fit most of the data (which is what is expected from a null model!). We further discuss the potential causes of the excellent fit of this new model, together with the challenging observation that the standard neutral model typically fails to fit the data.
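The 1/f expectation and the effect of mis-orientation can be written down directly. In the Python sketch below, the expected unfolded SFS for a sample of n sequences has entries theta/i for i = 1..n-1, and a fraction e of sites has its derived and ancestral states swapped, which moves weight from class i to class n-i. The function names, theta = 1 and e = 0.1 are assumptions for illustration; the multiple-merger and growth components of the authors' models are not included.

```python
def neutral_sfs(n, theta=1.0):
    """Expected unfolded SFS under the standard neutral model:
    entry i-1 holds theta / i, the expected number of sites where the
    derived allele is carried by i of the n sampled sequences."""
    return [theta / i for i in range(1, n)]


def misorient(sfs, e):
    """Apply a mis-orientation rate e: at a fraction e of sites the
    derived and ancestral states are swapped, so class i is observed
    as class n - i."""
    return [(1.0 - e) * x + e * y for x, y in zip(sfs, reversed(sfs))]


# With 10 sequences, a 10% mis-orientation rate already bends the
# monotonically decreasing neutral spectrum upwards at high frequency.
toy_neutral = neutral_sfs(10)
toy_observed = misorient(toy_neutral, 0.1)
```

In this toy case the last class exceeds the one before it, giving a mild U shape; according to the abstract, mis-orientation alone does not reproduce the observed spectra.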

Poster 14
> JAMES R. OTIENO [1]; EVERLYN M. KAMAU [1]; OKETCH JOHN [1]; NGOI JOYCE [1]; CHARLES N. AGOTI [1,2]; GICHUKI ALEXANDER [1]; ANN BETT [1]; MWANAJUMA NGAMA [1]; PATRICIA A. CANE [3]; PAUL KELLAM [4,5]; COTTEN MATTHEW [6]; LEMEY PHILIPPE [7]; AND D. JAMES NOKES [1,8]
[1] Epidemiology and Demography Department, Kenya Medical Research Institute (KEMRI) – Wellcome Trust Research Programme, Kilifi, Kenya; [2] Department of Biomedical Sciences, Pwani University, Kilifi, Kenya; [3] Public Health England, Salisbury, United Kingdom; [4] Department of Medicine, Division of Infectious Diseases, Imperial College London, London, UK; [5] Kymab Ltd., Babraham Research Campus, Cambridge, UK; [6] Department of Viroscience, Erasmus Medical Center, Rotterdam, Netherlands; [7] Department of Microbiology and Immunology, KU Leuven - University of Leuven, Leuven, Belgium; [8] School of Life Sciences and Zeeman Institute (SBIDER), University of Warwick, Coventry, United Kingdom
Whole genome analysis unravels the epidemiological and evolutionary dynamics of RSV genotype ON1 strains
Abstract: The introduction of a novel virus variant, RSV genotype ON1, into a host community and its replacement of previously circulating strains implies a fitness advantage over those strains, and the virus should possess some genomic ‘fitness signatures’. Furthermore, such an introduction, observed and tracked through time, provides an opportunity to better understand the transmission (introduction and spread) and evolutionary dynamics of the virus. Through generation and analysis of whole genome sequences from genotype ON1 viruses isolated in Kilifi, and comparison with the global dataset, we observe signature amino acid substitutions that distinguish between ON1 and GA2 viruses and could have fitness implications. In addition, RSV transmission into this coastal Kenyan location is characterized by multiple introductions (from varied locations), of which only a small proportion become successful. This study offers new insights into the molecular and epidemiological processes that define RSV-A evolution.

Poster 15> JEREMY MANRY [1,2], ESTELLE MARION [3], CHRISTIAN JOHNSON [4,5], ANNICK CHAUTY [4,6], THIERRY GATEAU [4,6], LAURENT MARSOLLIER [3], LAURENT ABEL [1,2,7], ALEXANDRE ALCAÏS [1,2]

[1] Laboratory of Human Genetics of Infectious Diseases, Necker Branch, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR 1163, Paris, France; [2] Imagine Institute, Paris Descartes-Sorbonne Paris Cité University, Paris, France; [3] INSERM UMR-U892 and CNRS U6299, team 7, University of Angers, CHU d’Angers, Angers, France; [4] Fondation Raoul Follereau, Paris, France; [5] Centre Interfacultaire de Formation et de Recherche en Environnement pour le Développement Durable, Université d’Abomey Calavi, Bénin; [6] Centre de Dépistage et de Traitement de la lèpre et de l’Ulcère de Buruli (CDTLUB), Pobè, Benin; [7] St Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, Rockefeller University, New York, NY, USA; [7] Laboratoire de Bactériologie, CHU d’Angers, Angers, France

Transcriptomic landscape of paradoxical reaction in Buruli ulcer

Buruli ulcer, caused by Mycobacterium ulcerans, was identified as a neglected emerging infectious disease claiming more than 2,000 victims each year. If diagnosed late, which is often the case since the ulcer is painless, the disease can lead to permanent disabilities. It is known that approximately 25% of these patients will develop a so-called paradoxical reaction (PR) during or after treatment: no live mycobacteria are detected, showing that the treatment is effective, yet the symptoms worsen (e.g. the ulcer gets bigger or new ulcers appear). Such reactions are also found in tuberculosis and leprosy patients. Here, 44 Buruli ulcer patients were followed during their treatment and blood was collected at different time points. During the follow-up, 9 patients developed a PR.

Comparing the whole-blood transcriptomes of these patients with those of the other patients allows us to identify genes that are differentially expressed between the two groups and that could thus explain why some people develop a PR and others do not. Until now, only the comparison at day 30 after the beginning of the antibiotic treatment, between patients who will not develop a PR and those who will, has given significant results, highlighting a group of 68 differentially expressed genes (DEG), with a log2 fold change always lower than 1, after Benjamini-Hochberg FDR correction. Of note, the rather low sample size in this study offers limited power to detect such genes, and an increase in sample size could lead to the identification of many more DEG. In the near future, additional patients will be integrated into the study to circumvent this issue. Another issue is the normalization of the samples. Here, all the samples were normalized together. Should we separate samples where a PR has been diagnosed? A comparison with the reversal reaction observed in leprosy (Orlova et al., PLoS Genet, 2013) will be carried out, and genes and pathways unique and common to both reactions will be identified. Ultimately, our work will provide the first transcriptomic study of PR, will allow us to better understand the mechanisms involved in the susceptibility to the disease and its symptoms, and will pave the way for further studies aiming at identifying patients at high risk of developing a PR.
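
The Benjamini-Hochberg FDR correction mentioned in this abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the function name `benjamini_hochberg` and the example p-values are assumptions made for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustrative only)
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
```

A gene would then be reported as differentially expressed when its adjusted value falls below the chosen FDR level; the abstract does not state which level was used.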

Poster 16> MARIE MOREL[1], FRÉDÉRIC LEMOINE[1], ANNA ZHUKOVA[1], OLIVIER GASCUEL[1,2]

[1] Unité Bioinformatique Evolutive, C3BI - USR 3756 Institut Pasteur & CNRS, Paris, France; [2] Institut de Biologie Computationnelle (IBC) & CNRS, Montpellier, France

Revealing Convergent Evolution in Virus Genomes

Convergent evolution at the molecular/genetic level is the independent acquisition of identical DNA substitutions in different lineages. Molecular convergence is increasingly studied thanks to the growing amount of genome-wide data. Indeed, the acquisition of similar traits at the phenotypic level has been studied for many years, but without really being able to explain it at the genetic level [1]. Several studies now focus on understanding to what extent phenotypic convergence can be related to similar changes at the genetic level [2]. In higher eukaryotes, convergent evolution at the genetic level is assumed to be quite rare, but in organisms with very high mutation rates, such as viruses, we find several examples of the independent emergence of the same mutations in different lineages [3,4]. These convergent mutations are important to study since they can help to 1) predict evolutionary pathways for viruses under selective constraints, 2) better understand these constraints, for example in the case of treatments or host specificity, and 3) evaluate the interest of using these mutated regions as targets for therapeutic drugs. Here, we compare several existing methods using large HIV data sets. These methods rely on different models that we can classify into three groups (topological, identical and profile based), according to the definition of convergent substitution proposed by Rey et al. [5]. We then compare the results with known drug resistance mutations to evaluate the specificity and sensitivity of each method on this kind of data set, for which convergent evolution is already well documented.

[1] Storz, J.F. (2016). Causes of molecular convergence and parallelism in protein evolution. Nat Rev Genet 17, 239–250.
[2] Rosenblum, E.B., Parent, C.E., and Brandt, E.E. (2014). The Molecular Basis of Phenotypic Convergence. Annual Review of Ecology, Evolution, and Systematics 45, 203–226.
[3] Bertels, F., Metzner, K.J., and Regoes, R.R. (2017). Convergent evolution as an indicator for selection during acute HIV-1 infection. bioRxiv 168260.
[4] Vignuzzi, M., and Higgs, S. (2017). The Bridges and Blockades to Evolutionary Convergence on the Road to Predicting Chikungunya Virus Evolution. Annu Rev Virol 4, 181–200.
[5] Rey, C., Semon, M., Gueguen, L., and Boussau, B. (2018). Accurate detection of convergent substitutions. bioRxiv 247296.

Poster 17> ALISSA SEVERSON [1]; SHAI CARMI[2]; NOAH ROSENBERG[3]

[1] Department of Genetics, Stanford University; [2] Braun School of Public Health, the Hebrew University of Jerusalem; [3] Department of Biology, Stanford University

The effect of consanguinity on between-individual identity-by-descent sharing

Consanguineous unions, in which mating pairs share a recent common ancestor, produce offspring whose two genomic copies possess increased sharing of long segments inherited identical-by-descent (IBD). In a population, consanguinity increases the rate at which IBD segments pair within individuals to produce runs of homozygosity (ROH). The extent to which such unions affect IBD sharing between rather than within individuals, however, is not immediately evident from within-individual levels. Using the fact that the time to the most recent common ancestor (TMRCA) for a pair of genomes at a specific locus is inversely related to IBD sharing between the genomes in the neighborhood of the locus, we study IBD sharing for a pair of genomes sampled either within an individual or in different individuals. We develop a coalescent model for a set of mating pairs in a diploid population, treating the fraction of consanguineous unions as a parameter. Considering a variety of types of consanguinity, we determine the effect of the consanguinity rate on TMRCA for lineage pairs sampled either within an individual or in different individuals. We find that consanguinity not only increases within-individual IBD sharing but also increases between-individual IBD sharing, with the magnitude of the effect increasing with the kinship coefficient of the type of consanguineous union. Considering ROH and IBD computations in Jewish populations with consanguinity rates reported from demographic data, the model provides an explanation for the observation that increases in consanguinity and ROH levels inflate between-individual IBD sharing in a population.

Poster 18> CARINA F. MUGAL[1]; INGEMAR KAJ[2]

[1] Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden; [2] Department of Mathematics, Uppsala University, Uppsala, Sweden

Statistical inference of the rate of adaptive evolution between closely related species in a time-dependent Poisson random field framework

Rates of protein evolution are frequently measured by the dN/dS ratio (commonly denoted as ω), which quantifies the strength of selection in protein-coding genes as judged from sequence divergence between two or more lineages. Incorporating information on the frequency of synonymous and non-synonymous polymorphisms within a population in a McDonald-Kreitman test framework allows ω to be split into a non-adaptive and an adaptive part, and thereby provides information on the rate of adaptive protein evolution (ωa). Current approaches to infer ωa build on a stationary Poisson random field model, which is based on the indirect assumption that sequence divergence and species divergence are identical, an assumption that is generally violated and reasonable only for distantly related species. The violation of this underlying assumption leads to an overestimation of ωa, in particular for closely related species, where the impact of ancestral and lineage-specific polymorphisms on sequence divergence is substantial. Here, we invoke a time-dependent Poisson random field model (Kaj I. and Mugal C.F. Theor. Pop. Biol. 2016, 111, p51-64), and show that estimates of ωa can be expressed as a function of divergence time, sample size and the true rate of adaptive protein evolution. Hence, knowledge of divergence time seems crucial for accurate inference of the rate of adaptive protein evolution. We propose a statistical framework to address this issue.
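
For orientation, the classical McDonald-Kreitman-style decomposition that this abstract takes as its starting point can be sketched as below. This is the textbook estimator, assumed here for illustration (α = 1 − (Ds·Pn)/(Dn·Ps), then ωa = α·ω); it is not the time-dependent framework proposed in the poster, and the function names and counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Textbook McDonald-Kreitman estimate of the proportion of adaptive
    non-synonymous substitutions.

    dn, ds: counts of non-synonymous / synonymous fixed differences
    pn, ps: counts of non-synonymous / synonymous polymorphisms
    """
    return 1.0 - (ds * pn) / (dn * ps)


def split_omega(omega, alpha):
    """Split omega into (adaptive, non-adaptive) parts: omega = omega_a + omega_na."""
    omega_a = alpha * omega
    return omega_a, omega - omega_a


# Hypothetical counts, for illustration only
alpha = mk_alpha(dn=20, ds=40, pn=10, ps=40)
omega_a, omega_na = split_omega(omega=0.2, alpha=alpha)
```

According to the abstract, an estimate of this stationary kind tends to overestimate ωa for closely related species, which is what motivates making divergence time explicit.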


Poster 19> NICOLA F. MUELLER[1][2]; HUW A. OGILVIE[3][4]; CHI ZHANG[5][6]; ALEXEI DRUMMOND[3]; TANJA STADLER[1][2]

[1] Department of Biosystems Science and Engineering at ETH Zuerich, Basel, Switzerland; [2] Swiss Institute of Bioinformatics (SIB), Switzerland; [3] Centre for Computational Evolution, Auckland, New Zealand; [4] Division of Ecology and Evolution, Canberra, Australia; [5] Key Laboratory of Vertebrate Evolution and Human Origins, Beijing, China; [6] Center for Excellence in Life and Paleoenvironment, Beijing, China

Inference of species histories in the presence of gene flow

Speciation occurs when populations become genetically isolated from one another. This isolation, which may be initially driven by geography or other environmental factors, results in the accumulation of genetic differences over time. These differences in turn can reinforce existing genetic isolation by introducing genetic incompatibility. On the other hand, gene flow during speciation is a homogenizing force that counteracts the process (Sousa and Hey, 2013). The multispecies coalescent model (MSC) (Rannala and Yang, 2003; Liu et al., 2009; Heled and Drummond, 2010) reconstructs the species tree while accounting for the discordance of gene trees from multiple loci due to incomplete lineage sorting. However, it assumes complete isolation (no gene flow) after speciation and can thus be biased in the presence of migration (Leache et al., 2014). The isolation-with-migration (IM) model (Nielsen and Wakeley, 2001; Hey and Nielsen, 2004; Wilkinson-Herbots, 2008), on the other hand, explicitly models gene flow after speciation (reviewed in Sousa and Hey, 2013). These methods, however, require a fixed species tree topology and node rank (i.e., oldest to youngest). An additional complication lies in the need to sample the migration histories (states of each lineage) using MCMC sampling, which limits the complexity of scenarios that can be analyzed.

In this talk, I will present a novel isolation-with-migration model (AIM) to model the speciation process in the presence of gene flow. I will then go on to demonstrate the application of this method to the inference of the evolutionary history of great apes accounting for horizontal gene flow. In contrast to previous methods, our approach does not restrict the species tree topology to be fixed but allows it to be co-inferred with the gene trees, population sizes, and migration rates, while integrating over all possible migration histories in a computationally tractable way. This approach is an extension of the marginal approximation of the structured coalescent (Mueller et al., 2017) and relies on solving a set of ordinary differential equations (ODEs) that describe the state of each lineage over time. We have implemented this model as part of StarBEAST2 (Ogilvie et al., 2017), a package for the phylogenetic software platform BEAST2 (Bouckaert et al., 2014).

Poster 20> BENJAMIN NGUYEN-VAN-YEN[1,2]; BERNARD CAZELLES[2]; RICHARD PAUL[1]

[1] Génomique fonctionnelle des maladies infectieuses, Institut Pasteur, Paris, France; [2] Institut de biologie de l’École Normale Supérieure, Paris, France

Phylodynamics for a better understanding of dengue

Dengue causes an estimated 100 to 400 million cases and 25,000 deaths every year; it has been emerging quickly in tropical countries, and is already highly endemic in many low-income environments. The disease exhibits complex dynamics, with four interacting subtypes, a mosquito vector, and a high proportion of asymptomatic and unreported cases; but the classically available incidence surveillance data are often insufficient to fit mechanistic models that take this complexity into account. The increasing availability of pathogen genome sequences and the development of phylodynamic methods [1] offer an opportunity to complement these classical data. A few studies have already started looking at joint inference [2, 3], and show that case-report data and sequence data can inform us about different features of the transmission process. Phylodynamic methods have been successfully used with outbreaks and endemic diseases when the coverage is high (Ebola, HIV, Influenza...) [4, 5], but the methods used in the field are still very much in flux. It is therefore important to compare those different methods. We simulate case-report and sequence data from a fairly realistic eco-evolutionary process, to see how much information the two sources of data give us about the process and how well different methods exploit them, from MCMC and regression ABC [6] to PMCMC. We then use the methods on dengue data from French Polynesia collected during the 2001-2006 period. The case-report data provide more confident estimates than the sequence data, but put together we obtain better estimates, in particular for the reporting rate, which is in line with other studies [2, 3]. We further aim to apply the same methodology to more comprehensive data from an ongoing study in Kampong Cham, Cambodia, to evaluate the efficacy of vector control in schools.

References
[1] Erik M Volz. “Complex population dynamics and the coalescent under neutrality”. In: Genetics 190.1 (2012), pp. 187–201.
[2] David A Rasmussen, Oliver Ratmann, and Katia Koelle. “Inference for nonlinear epidemiological models using genealogies and time series”. In: PLoS computational biology 7.8 (2011), e1002136.
[3] Lucy M Li, Nicholas C Grassly, and Christophe Fraser. “Quantifying transmission heterogeneity using both pathogen phylogenies and incidence time series”. In: Molecular biology and evolution 34.11 (2017), pp. 2982–2995.
[4] Trevor Bedford et al. “Global circulation patterns of seasonal influenza viruses vary with antigenic drift”. In: Nature 523.7559 (2015), pp. 217–220.
[5] Daniel J Park et al. “Ebola virus epidemiology, transmission, and evolution during seven months in Sierra Leone”. In: Cell 161.7 (2015), pp. 1516–1526.
[6] Emma Saulnier, Olivier Gascuel, and Samuel Alizon. “Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study”. In: PLoS computational biology 13.3 (2017), e1005416.

Poster 21> MIRAINE DÁVILA FELIPE[1]; JEAN-BAKA DOMELEVO ENTFELLNER[2]; FRÉDÉRIC LEMOINE[1]; and OLIVIER GASCUEL[1,3]

[1] Unité Bioinformatique Evolutive / C3BI USR 3756, Institut Pasteur & CNRS, Paris, France; [2] Department of Computer Science & South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa; [3] Méthodes et Algorithmes pour la Bioinformatique, IBC - LIRMM UMR 5506, Université de Montpellier & CNRS, Montpellier, France

Mathematical and statistical properties of the phylogenetic transfer distance

The transfer distance was introduced in the classification framework by Day (1981) and has been studied since then by other authors in the context of phylogenetic tree matching. Recently, the transfer distance was shown to be effective for assessing the branch support of phylogenies with large data sets, thus providing a relevant alternative to Felsenstein’s bootstrap. This distance allows a branch b in a reference tree T to be compared to a branch b* from another tree T*, both on the same set X of n taxa. Roughly speaking, the transfer distance between these branches is the number of taxa that must be transferred from one side of b* to the other in order to obtain b. Then, if we take the minimum of the transfer distance from b to all branches in T*, we get a transfer index, measuring the degree of agreement of T* with the reference branch b. We study and discuss the main properties of the transfer distance and index when this reference branch b is compared to a tree T* drawn according to a null model.
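
The definition above can be sketched in a few lines of Python. This is an illustrative sketch only, under the assumption that each branch is represented by the set of taxa on one of its sides; the names `transfer_distance` and `transfer_index` and the toy taxa are hypothetical, and the loop over branches is the naive approach rather than an optimized algorithm.

```python
def transfer_distance(b, b_star, taxa):
    """Number of taxa to move across b_star so that it matches b.

    b and b_star are each given as the set of taxa on one side of the
    branch; taxa is the full set X shared by both trees.
    """
    n = len(set(taxa))
    # Taxa on which the two chosen sides disagree
    d = len(set(b) ^ set(b_star))
    # The side chosen to describe b_star is arbitrary, so also try the
    # complementary side (which disagrees on the remaining n - d taxa)
    return min(d, n - d)


def transfer_index(b, other_tree_branches, taxa):
    """Minimum transfer distance from b to any branch of the other tree."""
    return min(transfer_distance(b, bs, taxa) for bs in other_tree_branches)


# Toy example on six taxa (hypothetical data)
X = {"A", "B", "C", "D", "E", "F"}
b = {"A", "B", "C"}
branches_of_T_star = [{"A", "B"}, {"A", "B", "D"}, {"E", "F"}]
index = transfer_index(b, branches_of_T_star, X)
```

In this toy case the closest branch is {A, B} | {C, D, E, F}: moving the single taxon C across it yields b, so the index is 1.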

Poster 22> SOTA ISHIKAWA[1,2]; ANNA ZHUKOVA[1]; WATARU IWASAKI[2]; OLIVIER GASCUEL[1]

[1] Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France; [2] Department of Biological Sciences, the University of Tokyo, Tokyo, Japan

A fast likelihood method to reconstruct and visualize ancestral scenarios of character evolution

The reconstruction of ancestral scenarios is used in a number of domains to study the evolution of characters along a phylogenetic tree, for example in phylogeography to trace back the geographical locations and moves of species or virus strains, with sequences to infer ancestral molecular characters and their changes in time, or in ecology to reconstruct past traits. The usual methods to reconstruct ancestral scenarios are based on parsimony and likelihood, assuming a probabilistic evolutionary model. In the likelihood framework one commonly uses the marginal posterior probabilities of the character states, and the joint reconstruction of the most likely scenario. Both approaches are somewhat unsatisfactory. Marginal reconstructions provide users with state probabilities, but these are difficult to interpret and use, as they are not associated with decisions (e.g. states A and C are likely and both must be kept, but we can discard G and T). Joint reconstructions select a unique solution for each tree node, and thus do not account for the uncertainty inherent to ancestral inferences. We propose a fast and simple approach, which lies between these two extremes. We use decision-theory concepts and the Brier criterion to associate each node in the tree with a set of likely states. In the tree regions where the uncertainty is low, a unique state is associated with the nodes. In the uncertain parts, typically the most ancient ones around the tree root, several states are associated with the nodes, reflecting the uncertainty of the inferences.

The algorithm scales well and can be applied to very large trees comprising tens of thousands of leaves. To visualize the results, we cluster the neighboring nodes associated with the same states, and use graphical tools from the Cytoscape.js library. Our results on simulated data consistently show the accuracy of the approach and its robustness regarding evolutionary model violations. Then, we apply the method to a large dataset of more than 3,000 sequences from HIV-1 subtype C, sampled all around the world. We study both geographical character states, and the emergence and transmission of drug resistance mutations. Results are quite convincing: we retrieve and visualize in a few minutes of computing time the main transmission routes of HIV-1C; we demonstrate that drug resistance mutations mostly emerge independently under the treatment pressure, but some resistance clusters are found, likely corresponding to transmissions among untreated patients. Moreover, these results are robust to phylogenetic uncertainty. Our software is freely available as PASTML (https://github.com/saishikawa/PASTML).
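
One plausible reading of the Brier-criterion step described in this abstract is sketched below in Python. It is an assumption-laden illustration, not the PASTML implementation: for a single node, it keeps the k most probable states, where k minimizes the squared distance between the marginal posterior probabilities and a uniform prediction over those k states. The function name `likely_states` and the probabilities are hypothetical.

```python
def likely_states(marginals):
    """Select a set of likely states for one node.

    marginals: dict mapping each state to its marginal posterior
    probability at the node (values are expected to sum to 1).
    """
    ranked = sorted(marginals, key=marginals.get, reverse=True)
    best_k, best_score = 1, float("inf")
    for k in range(1, len(ranked) + 1):
        # Candidate prediction: uniform over the k most probable states,
        # scored by its squared distance to the marginals (Brier-type score)
        score = sum(
            (marginals[s] - (1.0 / k if i < k else 0.0)) ** 2
            for i, s in enumerate(ranked)
        )
        if score < best_score:
            best_k, best_score = k, score
    return set(ranked[:best_k])


# Hypothetical marginal probabilities at two nodes
confident_node = {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02}
uncertain_node = {"A": 0.45, "C": 0.45, "G": 0.06, "T": 0.04}
kept_confident = likely_states(confident_node)
kept_uncertain = likely_states(uncertain_node)
```

With these numbers the first node keeps the single state A, while the second keeps both A and C and discards G and T, mirroring the example given in the abstract.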

Peer Community in : a free, public and transparent peer-review system for preprints

To offer an alternative to the current publication system, which is particularly costly and not very transparent, we initiated the Peer Community in (PCI) project, with PCI Evolutionary Biology, PCI Ecology, PCI Paleontology and, possibly in a few weeks, PCI Computational Statistics. PCI is based on the publication of evaluations (peer-reviews) and recommendations of articles not yet published, but deposited, and freely accessible, in electronic form in an open archive available on the Internet (e.g. arXiv, bioRxiv). These evaluations and recommendations are carried out voluntarily by researchers without any link with private publishers.

Publication fees disappear: PCI makes it possible to validate, distribute and consult the articles submitted to it free of charge. Delays in access to information disappear: the scientific articles evaluated are deposited in the open archives as soon as they are written. The system becomes transparent: reviews, editorial decisions, authors’ responses and recommendations are published on the website of the scientific community concerned.

Interested? Need further explanation on how to proceed? Please contact us at: [email protected]
evolbiol.peercommunityin.org
http://peercommunityin.org

Stand

Map