École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017tou30390.pdf ·...

260

Upload: others

Post on 17-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture
elodie
Zone de texte
École doctorale et discipline ou spécialité :
Page 2: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture
Page 3: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture
Page 4: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture
Page 5: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

1

AcknowledgementsRemerciements

Jesouhaiteraisicichaleureusementremerciertouteslespersonnesquiontcontribuéaubon déroulement de cette thèse de façon plus ou moins directe, par leur aide, leursconseilsouleursoutien.Merci tout d’abord à Jérôme Chave,mon directeur de thèse à Toulouse, pourm’avoirouvertlesportesdel’écologie,unedisciplinedontj’ignoraistoutoupresqueaudébutdemathèse.Ayantété jusqu’alorsprincipalementconfrontéà larecherchethéorique, j’aidécouvert un univers où les données, leur production, leur analyse et leurinterprétation,sontlenerfdelaguerre.Monadaptationnes’estpasfaiteenunjour,etje mesure aujourd’hui le chemin parcouru. Je te remercie Jérôme de tes efforts pourm’intégrer à la discipline en début de thèse, notamment enme permettant de partirdeux fois faire du terrain aux Nouragues, une expérience inoubliable. Merci pour tesconseilstoujoursavisésettesrelecturesattentives.Tonexigence,tonenthousiasmeettonénergieinépuisables,tonreculsurladisciplineettalargeculturescientifiquem’ontinspirétoutaulongdemathèse.Merci à Hélène Morlon, ma directrice de thèse à Paris, de m’avoir accueilli pour madernièreannéedethèse.Outrequecetteannéesupplémentairem’aétéprécieusepourterminermathèsedansdebonnesconditions,cetteimmersiondanslamacro-évolutiona été pour moi l’occasion de découvrir une autre façon d’aborder la recherche enécologieetévolution.MerciHélènepourtesconseils,tonsoutienettapatience.Etmercidem’avoirfaitprofiterdecetteatmosphèreunique,oùdynamisme,rigueuretefficacitériment si bien avec convivialité, bienveillance et décontraction. Nous n’avons pas eul’occasiondetravaillerensembleautantquejel’auraissouhaitéaucoursdecettethèse,etjemeréjouisparconséquentdepouvoircontinueràlefairedanslefutur.I am very grateful to Christopher Quince and Corinne Vacher for taking the time toreviewthisthesis.Thankyouverymuchforyourcarefulreadingandyourconstructivecriticism.IalsothankSébastienBrosseandFrancescoFicetolaforagreeingtobepartofthejury.

Merci àmes co-auteurset collaborateurs, sur le travaildesquelsunegrandepartiedecettethèseestbasée.

Merci enparticulier à Lucie Zinger, dont le travail d’analyse et d’interprétationdes données utilisées dans cette thèse a été crucial. Lucie, tu as été un précieux pontaveclabiologietoutaulongdecettethèsepourlephysiciendeformationquejesuis,etjetesuisinfinimentreconnaissantpourtesnombreusesexplicationsetconseils.MerciàPierreTaberletetEricCoissacdem’avoir faitbénéficierde leurexpertisestate-of-the-artdanslaproductionetl’analysedesdonnéesmetabarcoding.MerciàHeidySchimannpour son aide dans l’interprétation biologique des données, ainsi que pour son aide

Page 6: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

2

logistique sur le terrain. Merci à Amaia Iribar pour son aide indispensable dans lapréparationduterrain,sacontributionà laproductiondesdonnées,ainsiquepoursaremarquable efficacité pour surmonter les difficultés technique, logistique etadministrativedetoutordre,letoutdansunebonnehumeurinaltérable.MerciàSophieManzietElianeLouisannapourleuraidesurleterrainetleurtravaildewetlab.MerciàVincent Schilling pour sa contribution aux analyses bioinformatiques, son aide sur leterrain, et pour son humour en tant que camarade de carbet aux Nouragues. MerciégalementàSaintOmerCazaletAudreySagnepourleuraidesurleterrainàParacouetArbocel,etàDanielBoutaudpoursestrèsutilesporte-piochonstout-terrain.MercienfinàElodieCourtoisetBlaiseTymendem’avoirfaitdécouvrirlaGuyaneetlesNouraguesendébutdethèse,etpourcesmomentspartagéssurleterrain.

Je remercie en outre Antoine Fouquet et Jean-Pierre Vacher de m’avoir faitconfiancepour l’analysede leursdonnées.Merci également àHélèneHolotapour sonefficacité, sa disponibilité et sa gentillesse, à Blaise Tymen pour son aide avec lesdonnées Lidar, et àMélanie Roy, Antoine Fouquet, Gaël Grenouillet et Lounès Chikhi,entreautres,pourd’enrichissantesdiscussionsetpourleursconseils.

Je voudrais ensuite remercier un certain nombre de personnes qui, si elles n’ont pasdirectementcontribuéaucontenudecette thèse,ontéclairémes journéesde travailàToulouseetParisetontfaitleseldecesquatreannéesdevie.

Merci à mes co-bureaux toulousains Jessica et Félix pour toutes ces longuesdiscussions, et pour avoir supporté mon humour avec bienveillance en toutecirconstance.Merciàmesquatre«camaradesdepromo»àEDB,Arthur,Paul,Isabelleet Jean-Pierre, avec qui cela a été un immense plaisir de partager ces trois années àToulouse.MerciàBoris,Blaise,Mathieu,Olivia,Léa,Josselin,Aurèle,Luc,Nico,Camille,Céline,Marine,Lucie,Alice, Sébastien, Jade,Kévin, Jan,Fabian, Isabel… pour tous cesmomentspartagés,etàtouslesmembresd’EDBjeunesetmoinsjeunespourl’ambianceremarquabledulabo.

MerciàSimonetaux locatairessuccessifsde l’inénarrablemaisonde lacultureFrançois Magendie, à Louise et ses plongées dans le monde du théâtre, à Etienne etMathildeetleurs«écolesd’été»hippies,àGuillem,Claire,Hélène,Lucie,Lisa-Louetauxautres,pouravoirbrillammentpeuplé ces années toulousaines.Merci à Jean-PierreetSébastiendem’avoirinitiéàl’herpéto.MerciàAlexpoursonaccueilàMontpellieretcesdébatspassionnéssurlascienceetl’écologie.MerciauxAméricains:MarcetLéoetleurinspirant«SiliconValleyspirit»,etMatthieu,fidèlebirdingbuddy.EtmerciàSimonetFlorian pour leurs – trop rares – immersions dans l’informatique quantique et lesréseauxd’énergie.

Ungrandmerciégalementàmesco-bureauxparisiens.Marc,Odile, Julien,Eric,Leandro,Olivier,votreaccueilchaleureuxdansethorsdulaboagrandementfacilitématransitionparisienne.

Merci enfin aux anciens, aux vieux de la vieille, Grenoblois d’ici et d’ailleurs:Thibault,Mathieu,Arantxa,David,Lucas,Aurore,Carl,Vio… Jevouscompteparmi lesamis,maisc’estdéjàpresquelafamille.

Je remercie pour finir, last but not least, mes parents et mon frère, pour leursoutiensansfailleetôcombienindispensable.

Page 7: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

3

TableofContents

Introduction 5

I. Whatdrivestheassemblyofecologicalcommunities? 6II. DNA-basedbiodiversitypatterns 21III. Statisticalapproaches 34IV. Objectivesandoutline 53

Chapter1 69

CausesofvariationinsoilbetadiversityacrossdomainsoflifeinthetropicalforestsofFrenchGuiana

Chapter2 113

InferringneutralbiodiversityparametersusingenvironmentalDNAdatasets

Chapter3 163

TopicmodellingrevealsspatialstructureinaDNA-basedbiodiversitysurvey

Discussion 203

I. Synthesis 204II. Perspectives 208

Appendix 221

Large-scaleDNAbarcodingofAmazoniananuransleadstoanewdefinitionofbiogeographicalsubregionsintheGuianaShieldandrevealsavastunderestimationofdiversityandlocalendemism

Page 8: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

4

Page 9: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

5

Introduction

Page 10: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

6

I. Whatdrivestheassemblyofecological

communities?

Scienceconsistsinfindingpatternsinacollectionofisolatedobservationssoastogain

understanding of the processes that generated them. Natural sciences began with

attempts at classifying thediversity of the living organisms into categories (Aristotle,

IVth cent. BC), and this classification has been developed and perfected over the

centuries into themodernbinomialnomenclature (Linnaeus,1753).But classification

effortswere not limited to the description of species. Associations of species, and in

particular plant associations, were named using the samemodel, and were carefully

described based on their taxonomic composition and the abiotic properties of their

environment(Braun-Blanquet&Pavillard,1922).Eventhoughforestplantassociations

wereobservedshiftingthroughtime,thisphenomenonwasdescribedasmirroringthe

life cycle of individual organisms, from ‘youth’ to ‘senescence’ (Clements, 1916). The

underlying idea was that the organization of the living world obeyed static and

deterministic rules, which were to be uncovered. This idea was encouraged by the

discoveryoftheelegantlawsthatgovernphysicsandchemistry.

By contrast, early discoveries on evolution and biogeography (Darwin, 1859;

Wallace,1876)broughttheideathatchanceandhistoryhaveplayedanoverwhelming

roleinshapingthemodernlivingworld.Gleason(1926)andTansley(1935)werethe

first to contend that the diversity of plant associations was not well described by

discrete vegetation types, and that species associations were rather the transient

outcome of random dispersal events, constrained by abiotic conditions and species

interactions. Later, Hutchinson (1961), MacArthur (1972), Diamond (1975), Hubbell

(1979),Ricklefs(1987),andBrown(1995),amongothers,havesuccessivelyelaborated

on this idea, laying the foundations of modern community ecology. The term

‘community’ refers to all the organisms coexisting in a given location and at a given

time. Itmay also refer to a taxonomic subgroup of these organisms, such as a ‘plant

community’.

Page 11: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

7

The question of the relative role played by deterministic and stochastic

processesinshapingecologicalcommunitiesremainscentraltoecology.Inthissection,

I first argue that addressing this question is key to our ability to preserve natural

ecosystemsandtopredicttheirresponsetohumanperturbations.Ithenbrieflyreview

themechanismsofcommunityassemblythathavebeenproposed.

Motivations1.

The increasing awareness of the threats posed to natural ecosystems by human

activitieshasaddedasenseofurgencytothestudyofecologicalprocesses.Indeed,the

fate of the Earth’s biodiversity, and beyond it, of the ecosystems on which human

societiesrelyforfood,water,cleanair,health,andrawmaterials,hasbecomeamajor

sourceofconcern(Daily,1997).Asaconsequence,theoreticaladvancesinecologycan

no longer be considered in isolation from their practical implications. In particular,

manypredictionsrelevanttopolicy-makingstronglydependonassumptionsregarding

the mechanisms of community assembly. Thus, data-driven understanding of

community assembly is critical to well-informed policy-making. Three examples are

givenbelow:thepredictionofecosystemstabilityandstateshiftsinresponsetohuman

perturbations, thepredictionof the impactofclimatechange,and theconservationof

biodiversity.

Measuringecosystemstabilitytoperturbationsisasubjectofactiveresearch,as

istherelationshipbetweenbiodiversityandecosystemstability(McCann,2000;Tilman

et al., 2006; Loreau & deMazancourt, 2013). In this context, natural ecosystems are

commonlyrepresentedasstablecommunitiesheldtogetherbyspeciesinteractions,in

partbecausethisrepresentationlendsitselfwelltotheoreticalapproaches(Arnoldiet

al., 2016).Drawingon this framework, it hasbeenhypothesized that the responseof

ecosystems to perturbations may bear a similarity with that of physical systems

exhibitingcriticalphasetransitions(cf.Fig.1;Schefferetal.,2012).Accordingly,‘tipping

points’, sudden and difficult-to-reverse shifts in a system’s state in response to

Page 12: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

8

perturbation,shouldbeexpected(Brooketal.,2013).Moreover,suchstateshiftscould

be possibly predicted in advance through the identification of early-warning signals

(Carpenteretal., 2011; Schefferetal., 2012).While this typeof non-linearbehaviour

hasbeenevidenced in lakeecosystems (Carpenteretal., 2011), it remainsdifficult to

studyempirically,andknowledgeofcommunityassemblyprocesses iskey toprovide

realisticassumptionsforthetheoreticalpredictionofpossibletippingpoints.

Figure 1. The response of ecosystems to human-induced stress is commonly studied using anetwork representationof ecological communities, envisionedas stableentitiesheld togetherbyinteractions.Dependingonnetworkconnectivityandmodularity,theresponsemaybelinear(left) or exhibit a tipping point (right). Data-driven knowledge of community assemblyprocessesismuchneededtoinformsuchmodels.AdaptedfromSchefferetal.(2012).

Climatechangehasbecometheforemostthreattomanyecosystems,especially

thosethatarelessdirectlyimpactedbyhumanactivities.Speciesdistributionmodelling

is an important tool to predict the effect of climate change on biodiversity (Miller,

2010).Itconsistsininferringtheabioticrequirementsofindividualspeciesfromtheir

observed geographic distribution, and predicting their future distribution based on

predictedchangesinabioticconditions.Theneedtotakeintoaccountprocessesother

Anticipating Critical TransitionsMarten Scheffer,1,2* Stephen R. Carpenter,3 Timothy M. Lenton,4 Jordi Bascompte,5William Brock,6 Vasilis Dakos,1,5 Johan van de Koppel,7,8 Ingrid A. van de Leemput,1 Simon A. Levin,9Egbert H. van Nes,1 Mercedes Pascual,10,11 John Vandermeer10

Tipping points in complex systems may imply risks of unwanted collapse, but also opportunitiesfor positive change. Our capacity to navigate such risks and opportunities can be boosted bycombining emerging insights from two unconnected fields of research. One line of work isrevealing fundamental architectural features that may cause ecological networks, financialmarkets, and other complex systems to have tipping points. Another field of research is uncoveringgeneric empirical indicators of the proximity to such critical thresholds. Although suddenshifts in complex systems will inevitably continue to surprise us, work at the crossroads of theseemerging fields offers new approaches for anticipating critical transitions.

About 12,000 years ago, the Earth sud-denly shifted from a long, harsh glacialepisode into the benign and stable Hol-

ocene climate that allowed human civilization todevelop. On smaller and faster scales, ecosystemsoccasionally flip to contrasting states. Unlike grad-ual trends, such sharp shifts are largely unpre-dictable (1–3). Nonetheless, science is now carvinginto this realm of unpredictability in fundamentalways. Although the complexity of systems suchas societies and ecological networks prohibits ac-curate mechanistic modeling, certain features turnout to be generic markers of the fragility that maytypically precede a large class of abrupt changes.Two distinct approaches have led to these in-sights. On the one hand, analyses across networksand other systems with many components haverevealed that particular aspects of their structuredetermine whether they are likely to have criticalthresholds where they may change abruptly; onthe other hand, recent findings suggest that cer-tain generic indicators may be used to detect if asystem is close to such a “tipping point.”We high-light key findings but also challenges in these

emerging research areas and discuss how excit-ing opportunities arise from the combination ofthese so far disconnected fields of work.

The Architecture of FragilitySharp regime shifts that punctuate the usual fluc-tuations around trends in ecosystems or societiesmay often be simply the result of an unpredict-able external shock. However, another possibilityis that such a shift represents a so-called criticaltransition (3, 4). The likelihood of such tran-sitions may gradually increase as a system ap-proaches a “tipping point” [i.e., a catastrophicbifurcation (5)], where a minor trigger can invokea self-propagating shift to a contrasting state. Oneof the big questions in complex systems scienceis what causes some systems to have such tipping

points. The basic ingredient for a tipping pointis a positive feedback that, once a critical pointis passed, propels change toward an alternativestate (6). Although this principle is well under-stood for simple isolated systems, it is more chal-lenging to fathom how heterogeneous structurallycomplex systems such as networks of species,habitats, or societal structures might respond tochanging conditions and perturbations. A broadrange of studies suggests that two major featuresare crucial for the overall response of such sys-tems (7): (i) the heterogeneity of the componentsand (ii) their connectivity (Fig. 1). How theseproperties affect the stability depends on the na-ture of the interactions in the network.

Domino effects. One broad class of networksincludes those where units (or “nodes”) can flipbetween alternative stable states and where theprobability of being in one state is promoted byhaving neighbors in that state. Onemay think, forinstance, of networks of populations (extinct ornot), or ecosystems (with alternative stable states),or banks (solvent or not). In such networks, het-erogeneity in the response of individual nodesand a low level of connectivity may cause the net-work as a whole to change gradually—rather thanabruptly—in response to environmental change.This is because the relatively isolated and differ-ent nodes will each shift at another level of an en-vironmental driver (8). By contrast, homogeneity(nodes beingmore similar) and a highly connectednetwork may provide resistance to change until athreshold for a systemic critical transition is reachedwhere all nodes shift in synchrony (8, 9).

This situation implies a trade-off between lo-cal and systemic resilience. Strong connectivity

REVIEW

1Department of Environmental Sciences, Wageningen Univer-sity, Post Office Box 47, NL-6700 AA Wageningen, Nether-lands. 2South American Institute for Resilience and SustainabilityStudies (SARAS), Maldonado, Uruguay. 3Center for Limnology,University of Wisconsin, 680 North Park Street, Madison, WI53706, USA. 4College of Life and Environmental Sciences,University of Exeter, Hatherly Laboratories, Prince of WalesRoad, Exeter EX44PS, UK. 5Integrative Ecology Group, EstaciónBiológica de Doñana, Consejo Superior de InvestigacionesCientíficas, E-41092 Sevilla, Spain. 6Department of Economics,University of Wisconsin, 1180 Observatory Drive, Madison, WI53706, USA. 7Spatial Ecology Department, Royal NetherlandsInstitute for Sea Research (NIOZ), Post Office Box 140, 4400AC,Yerseke, Netherlands. 8Community and Conservation EcologyGroup, Centre for Ecological and Evolutionary Studies (CEES),University of Groningen, Post Office Box 11103, 9700 CCGroningen, Netherlands. 9Department of Ecology and Evolu-tionary Biology, PrincetonUniversity, Princeton, NJ 08544–1003,USA. 10University of Michigan and Howard Hughes MedicalInstitute, 2045 Kraus Natural Science Building, 830 North Uni-versity, Ann Arbor, MI 48109–1048, USA. 11Santa Fe Institute,1399 Hyde Park Road, Santa Fe, NM 87501, USA.

*To whom correspondence should be addressed. E-mail:[email protected]

Modularity

Stress

Sta

te

Sta

te

Stress

Connectivity

Heterogeneity

Adaptive capacity Resistance to change

Local losses Local repairs

Gradual change Critical transitions

+ +Homogeneity

+ +

+ +

Fig. 1. The connectivity and homogeneity of the units affect the way in which distributed systems withlocal alternative states respond to changing conditions. Networks in which the components differ (areheterogeneous) and where incomplete connectivity causes modularity tend to have adaptive capacity inthat they adjust gradually to change. By contrast, in highly connected networks, local losses tend to be“repaired” by subsidiary inputs from linked units until at a critical stress level the system collapses. Theparticular structure of connections also has important consequences for the robustness of networks,depending on the kind of interactions between the nodes of the network.

19 OCTOBER 2012 VOL 338 SCIENCE www.sciencemag.org344

on July 20, 2017

http://science.sciencemag.org/

Dow

nloaded from

Page 13: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

9

thanabioticrequirements,suchasspeciesinteractions,dispersallimitation,adaptation,

and phenotypic plasticity, has long been acknowledged (Guisan & Thuiller, 2005),

neverthelessmostpredictionsarestillobtainedwhileignoringtheseprocesses(Wiszet

al.,2013).Anotherapproachtopredictingtheeffectofclimatechangeonecosystemsis

through the dynamical simulation of ecosystems, either by simulating each organism

individually or using coarser models (Fisher et al., 2014). Building such models,

especially at the level of individual organisms, requires a clear understanding of the

processesrelevanttocommunityassemblyanddynamics.

Lastly, knowledge of community assembly is necessary to guide conservation

efforts.Assumptionsonthemechanismsofcommunityassemblyplayakeyroleinthe

debate on the optimal design of natural reserves (Cabeza & Moilanen, 2001) or on

species sensitivity to extinction (Tilman et al., 1994). Such assumptions are also

required toestimate theamountofbiodiversityharboured in species-richandpoorly

knownecosystems.Astraightforwardwaytoproceedistoassumethattherelationship

betweenthenumberofindividualsandthenumberofspecies,observedforasampleof

individuals, holds for the entire ecosystem. This reasoning implies that community

assembly can be regarded as random at the scale of the ecosystem. It has been for

instanceappliedtoAmazoniantrees,yieldinganestimatedtotalof16,000treespecies

extrapolatedfromabout5,000observedspecies(terSteegeetal.,2013).

Deterministicprocesses2.

Thedeterministicprocessesofcommunityassemblycanbedecomposedintotwomajor

components:abioticfilteringandbioticinteractions.

‘Abiotic filtering’ is a metaphor referring to the fact that species can only

establishthemselvesinlocationswhereabioticconditionssuittheirneeds:hence,any

givenlocationhostsonlyasubsetofthespeciesthatwouldhavetheabilitytoreachit

(Kraftetal., 2015).While this concept is very general, it has its roots in the studyof

plantcommunityassembly(Noble&Slatyer,1977). Inthiscontext,abiotic filtersmay

Page 14: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

10

include temperature, precipitation, soil nutrients, soil pH, soil grain size, soil water

content,soildepthandbedrock.

Biotic interactions refer to any type of interaction between organisms, either

betweenorwithinspecies,andcanbebroadlycategorizedintocompetition,predation,

parasitism, commensalism andmutualism (Schemske etal., 2009). Biotic interactions

mayfacilitateorhindertheestablishmentofaspeciesinacommunitydependingonthe

typeofinteraction,andassuchtheiractiononcommunityassemblymaybereferredto

as ‘biotic filtering’. Biotic and abiotic filtering are sometimes jointly referred to as

‘habitat filtering’ (Maireetal., 2012). Indirectbiotic interactions across trophic levels

mayhavecomplexandnon-trivialoutcomes.Forinstance, ifweassumethatatrophic

networkcanbedecomposedintodiscretetrophiclevels,increasingabundancesamong

the species belonging to a given trophic level (e.g., carnivores) lead to decreasing

abundances in the trophic level immediately below (e.g., herbivores), and in turn to

increasing abundances one level lower (primary producers), a process known as a

‘trophiccascade’(Paine,1980;Polisetal.,2000).Interspecificinteractionmayalsotake

theformofamodificationofsurroundingabioticconditionsbyorganisms,forinstance

by so-called ‘ecosystem engineer’ species (Wright et al., 2002), or simply through

shadinginthecaseofplants,thusblurringthelinebetweenabioticandbioticfiltering.

Withinasingletrophiclevel,competitionisconsideredtobethedominanttype

ofbioticinteractions(Chesson,2000).The‘competitiveexclusionprinciple’statesthat

the coexistence of two species competing for the same resource is not stable (Gause,

1932;MacArthur,1958;Hutchinson,1961;Armstrong&McGehee,1980).Indeed,ifone

of thespecieshasanevenslightcompetitiveadvantage, itwilleventuallyoutcompete

the other. Thus, any set of coexisting species is expected to exhibit differences in the

waytheyexploittheirhabitat.Thishasledtotheconceptof‘niche’,whichrefersinits

broader meaning to the relationship between a species and its habitat, including its

resource use, its interactionswith other species, and theway its occupies its habitat

both spatially and temporally (cf. Fig. 2; Grinnell, 1917; Hutchinson, 1957; Chase &

Leibold,2003).Aspecies’nichemayberepresentedasahypervolumeinthespaceofall

availableresourcesandpossiblehabitatuses.

Page 15: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

11

Figure 2.Aclassicalexampleofnichepartitioning:habitatpreferencesamongclosely relatedwarblerspeciesintheborealforestsofNorthAmerica.(A)CapeMay,(B)Blackburnian,(C)Bay-breasted, (D) Yellow-rumped, and (E) Black-throated Green warblers favour different treelayersanddifferenttreeheightswhenforagingforinsectsduringthebreedingseason.AdaptedfromMacArthur(1958).

In spite of theoretical predictions, the coexistence of many similar species

competing for a common resource in homogeneous environments is a common

occurrenceinnature.Thisisforinstancethecaseinspecies-richcommunitiessuchas

tropical forest trees and oceanic phytoplankton communities. This apparent paradox

has been called the ‘paradox of the plankton’ (Hutchinson, 1961). Thus, additional

October, 1958 WARBLER POPULATION ECOLOGY W (3

feeding. For this reason, differences between the species' feeding positions and behavior have been observed in detail.

For the purpose of describing the birds' feeding zone, the number of seconds each observed bird spent in each of 16 zones was recorded. (In the summer of 1956 the seconds were counted by saying "thousand and one, thousand and two, . . ." all subsequent timing was done by stop watch. When the stop watch became available, an attempt was made to calibrate the counted seconds. It was found that each counted second was approxi- mately 1.25 true seconds.) The zones varied with height and position on branch as shown in Figure 2. The height zones were ten foot units measured from the top of the tree. Each branch could be

divided into three zones, one of bare or lichen- covered base (B), a middle zone of old needles

(M), and a terminal zone of new (less than 1.5 years old) needles or buds (T). Thus a measure- ment in zone T3 was an observation between 20 and 30 feet from the top of the tree and in the terminal part of the branch. Since most of the trees were 50 to 60 feet tall, a rough idea of the height above the ground can also be obtained from

the measurements. There are certain difficulties concerning these

measurements. Since the forest was very dense, certain types of behavior rendered birds invisible. This resulted in all species being observed slightly disproportionately in the open zones of the trees. To combat this difficulty each bird was observed

for as long as possible so that a brief excursion into an open but not often-frequented zone would be compensated for by the remaining part of the observation. I believe there is no serious error in this respect. Furthermore, the comparative aspect is independent of this error. A different difficulty arises from measurements of time spent in each zone. The error due to counting should not affect results which are comparative in nature. If a bird sits very still or sings, it might spend a

large amount of time in one zone without actually requiring that zone for feeding. To alleviate this trouble, a record of activity, when not feeding, was kept. Because of these difficulties, non-parametric statistics have been used throughout the analysis of the study to avoid any a priori assumptions about distributions. One difficulty is of a dif- ferent nature; because of the density of the vegeta- tion and the activity of the warblers a large number of hours of watching result in disappointingly few seconds of worthwhile observations.

The results of these observations are illustrated in Figures 2-6 in which the species' feeding zones are indicated on diagrammatic spruce trees. While

4 9- fes 43. 8

13-2_-__ , -13.8

z ZO. 6-4_-ZI 3 8.4 . r '\ .3

I, .

4.0- / ~~V-5. 0

OBSERVAIO I OBEVAIN

~%

%T

- ~ ~ ~ ~ -

FIG. 2. Cape May warbler feeding position. The zones of most concentrated activity are shaded until at least 50% of the activity is in the stippled zones.

the base zone is always proximal to the trunk of the tree, as shown, the T zone surrounds the M, and is exterior to it but not always distal. For each species observed, the feeding zone is illus- trated. The left side of each illustration is the percentage of the number of seconds of observa- tions of the species in each zone. On the right hand side the percentage of the total number of times the species was observed in each zone is entered. The stippled area gives roughly the area in which the species is most likely to be found. More specifically, the zone with the highest per- centage is stippled, then the zone with the second highest percentage, and so on until at least fifty percent of the observations or time lie within the stippled zone.

Early in the investigation it became apparent that there were differences between the species' feeding habits other than those of feeding zones. Subjectively, the black-throated green appeared omnervous," the bay-breasted slow and "deliberate." In an attempt to make these observations objective, the following measurements were taken on feeding birds. Then a bird landed after a flight, a count

This content downloaded from 129.199.24.197 on Tue, 09 May 2017 16:57:19 UTCAll use subject to http://about.jstor.org/terms

October, 1958 WARBLER POPULATION ECOLOGY 605

34.8 ,z4.7 1 ~ 10.5_\'l',1 7

______. __SE-3b

2 ...Q / _ 13 0 2.7J A A-\- 6.4

11.0- io4 . 672.6'

/ 3/

J/

% OF TOTAL % OF TOTAL lqUMBER (1631) MUMBER (7 7)

OF SECODS OF OF

OB5PE:RVAT1O0E O B53EP:VATIONE 3

FIG. 5. Blackburnian warbler feeding position. The zones of most concentrated activity are shaded until at least 50% of the activity is in the stippled zones.

tion. To give a nonparametric test of the signifi- cance of these differences Table III is required.

Each motion was classified according to the di- rection in which the bird moved farthest. Thus, in 47 bay-breasted warbler observations of this type, the bird moved predominantly in a radial direction 32 times. Applying a X9 test to these, bay-breasted and blackburnian are not different but all others are significantly (P<.O1) different from one another and from bay-breasted and blackburnian.

There is one further quantitative comparison which can be made between species, providing ad- ditional evidence that during normal feeding be- havior the species could become exposed to dif- ferent types of food. During those observations of 1957 in which the bird was never lost from sight, occurrence of long flights, hawking, or hovering was recorded. A flight was called long if it went between different trees and was greater than an estimated 25 feet. Hawking is dis- tinguished from hovering by the fact that in hawk- ing a moving prey individual is sought in the air, while in hovering a nearly stationary prey indi-

L .59/ti ,t, 4.

:} 1 9 50

7 /

// /:0..:io: 7.0 \7

% OF TOTrAL OF TOT^AL. NqUM:BER L(416 6) 1;tUM-BM1:Z (Z 2R4S) OF SECONDS OF OF

O:B S3E1EV.ATI O OBS5:ERVAT I OqS

FIG. 6. Bay-breasted warbler feeding position. The zones of most concentrated activity are shaded until at least 50,01 of the activity is in the stippled zones.

duall is sought amid the foliage. This informa- tion is summarized in Table IV.

Both Cape May and myrtle hawk and undertake long flights significantly more often than any of the other species. Black-throated green hovers significantly more often than the others.

At this point it is possible to summarize differ- ences in the species' feeding behavior in the breed- ing season. Unfortunately, there are very few original descriptions in the literature for com- parison. The widely known writings of William Brewster (Griscom 1938), Ora Knight (1908), and S. C. Kendeigh (1947) include the best ob- servations that have been published. Based upon the observations reported by, these authors, the other scattered published observations, and the observations made during this study, the following comparison of the species' feeding behavior seems warranted.

Cape May W~arbler. The foregoing data show that this species feeds more consistently near the top of the tree than any species expect black- burnian, from which it differs principally in type

This content downloaded from 129.199.24.197 on Tue, 09 May 2017 16:57:19 UTCAll use subject to http://about.jstor.org/terms

604 ROBERT H. MACARTHUR Ecology, Vol. 39, No. 4

1 Al I t

9.8-I

_______~ 3 1_ HMlA

7.s/ l 1.*.* 10. 6

3~~~~~** '<34

/ l:,53.6 93.6\ ' \

/

5em.

% OF TOTAi. 7 OF TOTAL NUMBER (477 7) NUMBER (z263)

OF 5ECONDS OF OF

OZ5ERVATION OB5ERVATIONS

FIG. 3. Myrtle warbler feeding position. The zones of most concentrated activity are shaded until at least 50% of the activity is in the stippled zones.

of seconds was begun and continued until the bird was lost from sight. The total number of flights (visible uses of the wing) during this period was recorded so that the mean interval between uses of the wing could be computed.

The results for 1956 are shown in Table I. The results for 1957 are shown in Table II. Except for the Cape May fewer observations were taken than in 1956.

By means of the sign test (Wilson, 1952), treating each observation irrespective of the num- ber of flights as a single estimate of mean interval between flights, a test of the difference in activity can be performed. These data are summarized in

the following inequality, where < is interpreted

to mean "has smaller mean interval between flights. with 95% certainty."

1 5.72 s, 1 ' -Z4.9

18.9

_____________ .13

1.4 / ' /& /2.3 3.34./ ..

% OF TOTAL % OF TOTAL

NUMBPIL (2 611) mu MBEP. (1 64)

OF 5ECOMID5 OF OF

OBS:EPrVAT1:O}.I OB.SE.VATI O 5

FIG. 4. Black-throated green warbler feeding position. The zones of most concentrated activity are shaded until at least 50% of the activity is in the stippled zones.

their time searching in the foliage for food, some

appear to crawl along branches and others to hop across branches. To measure this the following procedure was adopted. All motions of a bird from place to place in a tree were resolved into

components in three independent directions. The natural directions to use were vertical, radial, and tangential. When an observation was made in which all the motion was visible, the number of feet the bird moved in each of the three direc- tions was noted. A surpringing degree of di- versity was discovered in this way as is shown in Figure 7. Here, making use of the fact that the sum of the three perpendicular distances from an interior point to the sides of an equilateral triangle is independent of the position of the point, the proportion of motion in each direction is re- corded within a triangle. Thus the Cape May

Black-throated green 95 Blackburnian 99 f Cape May t K< Myrtle f < |Bay-breastedf The differences in feeding behavior of the

warblers can be studied in another way. For, while all the species spend a substantial part of

moves predominantly in a vertical direction, black- throated green and myrtle in a tangential direction, bay-breasted and blackburnian in a radial direc-

This content downloaded from 129.199.24.197 on Tue, 09 May 2017 16:57:19 UTCAll use subject to http://about.jstor.org/terms

A B C

D E

Page 16: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

12

mechanisms need to be considered to account for species coexistence in such

communities (Tilman, 1982; Chesson, 2000). Even though a vast number of potential

mechanisms of species coexistence has been proposed (Palmer, 1994), they can be

roughly divided into ‘equalizing’ mechanisms, that reduce competitive differences

between species, and ‘stabilizing’mechanisms, that balance the effect of interspecific

competition(Chesson,2000).

Intraspecific competition represents one stabilizing mechanism. It has indeed

beenfoundempiricallythatcompetitionamongconspecificindividualsisoftenatleast

as intense as among different species (Connell, 1983). Predation and parasitism are

another important cause of negative intraspecific interactions among prey or host

species.Indeed,thefactthatpredatorsandparasitestendtospecializeononeorafew

speciesinducesa‘negativedensity-dependence’,i.e.favourslowerpopulationdensities.

This effect, known as the Janzen-Connell effect,was first proposed for tropical forest

trees (Connell, 1970; Janzen, 1970). Lastly, spatial and temporal fluctuations in

environmental conditions are also a stabilizing mechanism favouring species

coexistence(Chase&Leibold,2003;seesectionI.4below).

Competition, predation and parasitism act also as equalizing mechanisms.

Indeed, interspecific competition eliminates less competitive species from the

community, while predation and parasitism effectively offsets the competitive

advantageofthemostsuccessfulspecies(Chesson,2000).Theimportanceofequalizing

mechanisms and intraspecific competition in species-rich communities has prompted

some ecologists to propose that competitive differences between organisms could be

altogetherneglectedinsuchsystems(Hubbell,2001),asdiscussedinthefollowing.

Stochasticprocesses3.

However complex and fascinating the interplay of species’ niches is, community

assembly cannot be fully understoodwithout considering the influence of geography

andhistoryoncommunitycomposition(MacArthur,1972;Ricklefs,1987).Firstly, the

Page 17: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

13

capacity todisperse is finite in all species: offspringaremore likely tobe foundnear

parent individuals.Thus, community composition in a given location isdependenton

thepoolof species thatarewithindispersaldistanceof that location, andon random

dispersal events. The limiteddispersal of individuals generates spatial clusters in the

distributionofaspecies(Houchmandzadeh,2009),andthuscausesspatialvariationsin

communitycompositionevenintheabsenceofothermechanisms.Secondly,ifthereare

nocompetitivedifferencesbetweentwocompetingspecies,thefactthanoneislocally

commonandtheotherrareisduetochancealone.Therelativeabundancesofthetwo

speciesareexpectedtofluctuaterandomlyovertime,untiloneeventuallygoesextinct.

Thus,overasufficiently longperiodoftime,competitiveexclusionisexpectedtotake

place even in the absence of competitive differences. The larger the number of

competingspeciesinagivenlocation,thelowertheaveragepopulationofeachspecies

is,andthefasterthecommunitywilllosespeciestorandomdemographicfluctuations.

Thisprocesshasbeencalleddemographicorecologicaldrift,byanalogytotheprocess

ofgeneticdriftinpopulationgenetics(Etienne&Alonso,2007).

The ‘neutrality’ assumption is defined as the absence of any competitive

differences among individuals, irrespective of the species they belong to (Watterson,

1974; Caswell, 1976). Since dispersal limitation and demographic drift take place

independently of any competitive differences between organisms, they are often

referred to as ‘neutral’ processes, even though they are also present in non-neutral

systems. Under a dynamics governed by dispersal limitation and demographic drift,

ecological communities never reach equilibrium: their composition indefinitely shifts

overtime.Nevertheless,ifthetotalnumberofindividuals,thespeciesrichness,andthe

dispersal capacity of individuals remain constant over time, community structure

reaches a stationary state that can be described statistically as a function of these

parameters.

MacArthur & Wilson (1967) were the first to build dispersal limitation and

demographicdriftintoamodel,whichtheyusedasafoundationfora‘theoryofisland

biogeography’aimedatexplainingspeciesrichnessonislands.Theyreasonedthatthe

number of species found on a coastal island results from an equilibrium between

Page 18: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

14

immigration of new species from the mainland and species extinction on the island

throughdemographicdrift,eventhoughtheydidnotexplicitlyinterprettheirtheoryas

neutral.The twoprocessesarestochasticand their relative frequencydetermines the

number of species found on the island at any given time (cf. Fig. 3). They further

assumedthat the immigrationratedependsonthedistanceto themainland,andthat

theextinctionratedependsontheisland’sarea,thusenablingempiricalcomparisonof

theirtheorytoobservations(Simberloff&Wilson,1969).

Figure 3. MacArthur & Wilson (1967) were the first to combine dispersal limitation anddemographicdriftintoasimplemodel,thataimsatexplainingthenumberofspeciesfoundonislands.Theyassumedthatthenumberofspeciesresultsfromadynamicequilibriumbetweenstochasticimmigrationandextinction,whicharedependentondistancetothemainlandandonislandsize,respectively.AdaptedfromHubbell(2001).

The theory of island biogeography was later expanded to better account for

empirical observations (Brown & Kodric-Brown, 1977). It was also proposed that it

mightapplymoregenerallytoanypatchofisolatedhabitat(Brown,1978).Inparallel,

Watterson (1974) and Caswell (1976) used the mathematical tools of population

MacARTHUR AND WILSON’S RADICAL THEORY

Fig. 1.3. Various enhancements to the basic equilibrium hypothe-sis of MacArthur and Wilson do not change the dispersal assemblyassumption underlying the model. Downwardly bowing immigrationand extincton curves were added to characterize the effects of compe-tition on these rates, but all species, whether early or late colonizers,good or bad competitors, experience the same changes in rates. Sim-ilarly, the effects of island distance from the mainland and island sizeon immigration and extinction rates, respectively, operate equally onall species.

and Wilson fully appreciated the implications of this rad-ical assumption. A majority of their 1967 monographwas devoted to discussing such topics as species differ-ences in colonization strategies, causes of species differ-ences in extinction rates, temporal patterning in the orderin which species would successfully establish, and so on—alldifferences forbidden by their model! Although MacArthurand Wilson (1967) wrote about traditional ecological pro-cesses such as competition, the actual parameters of theirmodel were immigration and extinction rates, distance from

17

Page 19: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

15

genetics tomodelneutral communitiesat the levelof individualorganisms insteadof

thelevelofspecies,thusprovidingamoremechanisticdescriptionoftheprocesses,but

withoutincludingdispersallimitation.Hubbell(1979,1997,2001)eventuallycombined

both ideas into an influential neutral model, which he used as a basis to propose a

‘unifiedneutraltheoryofbiodiversityandbiogeography’.Histheorynotonlystatesthe

importanceofdemographicdriftanddispersallimitationforcommunityassembly,but

also proposes that they may be the dominating mechanisms in some species-rich

communities,especiallytropicalforesttreesandcoralreefs.Indeed,stronginterspecific

competition and predation could act as equalizing mechanisms between species in

these communities, as mentioned earlier, and combine with strong intraspecific

competitiontomakeallindividualsofallspecieseffectivelyequivalent(Scheffer&van

Nes, 2006). Another hypothesis is that in highly diversified communities, complex

interspecificinteractionscouldaverageoutatthescaleofthecommunity,leadingtoan

‘emergentneutrality’(Holt,2006).

In Hubbell’s model, the mainland’s species reservoir, called the

‘metacommunity’,undergoesademographicdriftwhererandomextinctionsareoffset

by random speciation events. The island, or ‘local community’, also undergoes a

demographicdrift,butrandomextinctionsareoffsetbythedispersal,orimmigration,of

individualsfromthesourcemetacommunity.Sincethemodelisneutral,allindividuals

are considered to have the same dispersal capacity, irrespective of the species they

belong to.The scopeof the theory isnot limited to isolatedhabitatpatches: the local

community may represent any spatially delineated ecological community, while the

metacommunityrepresentstheregionalpoolofspeciesconstitutedbytheaggregation

ofall localcommunities.Themodel iscontrolledbytwoparameters, the frequencyof

speciation events in the metacommunity, which determines the regional species

richness,andthefrequencyofimmigrationintothelocalcommunity.Theimmigration

fluxintothelocalcommunitymodulatesitsconnectivitywiththemetacommunity:the

stronger the immigration flux, the more species-rich and the more similar to the

metacommunity the local community is. Hubbell’s model and subsequent related

neutral models (Etienne & Alonso, 2007) are amenable to several quantitative

Page 20: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

16

predictions,andthustostatisticaltesting(this isdiscussedinmoredetails insections

II.1andIII.4).

Two distinct neutrality assumptions can be distinguished in Hubbell’s neutral

theory: one regarding themetacommunity dynamics, over an evolutionary timescale,

andoneregardingthelocalcommunitydynamics,overthetimescaleofanindividual’s

lifetime. Predictions regarding local community structure, namely the relationship

betweenareaandspeciesrichness,thedecayoftaxonomicsimilaritywithdistance,and

the distribution of relative species abundances (see section II.1), integrate both

assumptions. They are in good qualitative agreement with empirical data (Hubbell,

2001), nevertheless most datasets exhibit quantitative departure from neutrality

(McGill et al., 2006). The assumption of a neutral diversification dynamics in the

metacommunitycanbetestedseparately,andhasbeenshowntobeunrealistic.Indeed,

themean species agepredictedbyHubbell’smodel arenot consistentwith empirical

measurements(Ricklefs,2003,2006),andtheshapeofthepredictedphylogenetictrees

does not match that of empirically reconstructed trees (Davies et al., 2011). Hence,

recent approacheshave instead focusedon testing separately theassumptionof local

neutral assembly through immigration, with contrasting results depending on the

system(Sloanetal.,2006;Jabotetal.,2008;Ofiteruetal.,2010;Harrisetal.,2015).

Even though comparison of empirical patterns to model predictions suggests

that real ecological communities are rarely neutral, Hubbell’s neutral theory retains

important merits (Alonso et al., 2006). Indeed, it has been pointed out that all the

processesofcommunityecologyareunderpinnedbyonlyfourfundamentalprocesses:

naturalselection,demographicdrift,speciation,anddispersal(Vellend,2010).Yet,the

majorityof ecological literature focusesononlyoneof them,natural selection,which

underpins all niche differences between species and thus all deterministic ecological

processes. In contrast, Hubbell’s neutral theory focuses on the three remaining

fundamental processes, which are inherently stochastic, and places them in a

quantitative framework. In practice, neutral models are essential tools for twomain

uses (Rosindell et al., 2012). Firstly, they may serve as a ‘null model’ against which

empiricalpatternscanbecontrasted,soastoidentifycaseswhereneutralprocessesare

Page 21: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

17

sufficienttoexplainthedataandcaseswheretheyarenot.Secondly,theymayserveas

a parsimonious approximation to real systems, and as a foundation for incorporating

relevantnon-neutralmechanisms,suchasnichedifferences(Chisholm&Pacala,2010),

environmentalstochasticity(Kalyuzhnyetal.,2015),negativedensity-dependence(Du

etal.,2011),oramorerealisticspeciationdynamics(Rosindelletal.,2010).

Spatialandtemporalscales4.

Community assembly involves a range of temporal and spatial scales spanningmany

orders ofmagnitude - from the evolutionary timescale to the behaviour of individual

organisms,andfromtheglobalscaletothescaleofmicroorganisms(Chave,2013).The

continental scale is therealmofbiogeography,wherespeciesdistributionreflects the

geological and evolutionary history of continents (Cox et al., 2016), as well the

latitudinalgradientofdiversity(Hillebrand,2004).Attheoppositeend,moststudieson

species interactions focus on a limited number of individuals. Community ecology is

concernedwiththeintermediatescales(Lawton,1999):namely,withinabiogeographic

unit (Morrone, 2015), but encompassing a number of individuals large enough for

statistical patterns to emerge. The scale at which statistical patterns start emerging

depends on the type of organisms considered, and will differ by many orders of

magnitudebetweenplantsandbacteria.

Nicheandneutralprocessesmightalternatelydominateatdifferentspatialand

temporal scales.Firstly, locallyobservedspecies interactionsdonotprecluderandom

species assembly over larger spatial and temporal scales. Indeed, the majority of

interspecificinteractionsareopportunisticandvaryacrossspaceandtime(Holt,1996;

Poisot et al., 2014), despite much-studied instances of specialized interspecific

interactions such as plant-pollinator mutualisms (Rønsted et al., 2005). Secondly,

speciesdynamically adapt theirniche to the local competitive context, either through

plasticityorthroughnaturalselection.Forinstance,closelyrelatedspecieswithmostly

disjointgeographicaldistributionsareknowntodisplaygreaterphenotypicdifferences

Page 22: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

18

(such as a difference in size)wherever they co-occur, a process known as ‘character

displacement’ (Brown & Wilson, 1956). Natural selection has been found to have

measureableeffectsonphenotypeovertimescalesasshortasafewgenerationswhen

speciesareconfrontedwithasuddenchangeintheirbioticorabioticenvironment,thus

questioning the legitimacy of the traditional separation between the timescales of

evolutionaryandcommunityassemblyprocesses(Ghalamboretal.,2015).

Figure 4. Community assembly processes depend on the spatial and temporal scalesconsidered: current geographical patterns of tree diversity in Europe might reflect on-goingdispersalfromiceagetreerefugia,whichstarted14,000yearsago.Topright,bottomleftandbottomright:geographicaldistributionoftreediversity(increasingfromyellowtoblue)forall60Europeantreespecies,the45temperatespeciesandthe15borealspecies,respectively.Topleft: accessibility through dispersal from ice age tree refugia (black dots). Adapted fromSvenning&Skov(2007).

Anotherkeyaspectofcommunityassemblyishowfastcommunitycomposition

responds to abiotic change, relative to thepace of the abiotic change itself. Indeed, if

abiotic change is fast enough relative to community response, the community may

never reach equilibrium, thus leading to an apparently random dynamics. This

present time, i.e. extrinsic ecogeographical factors. Notably,it is easy to imagine that the cold-hardy Quercus robur hadmore and more northerly located refugia than Q. cerris andfor this reason could achieve an earlier and faster postglacialspread. The first tree species to spread would have metmuch less competition from other tree species than late-spreading species, which would have had to spread throughwell-established late-successional forest communities. Sven-ning & Skov (2004) did in fact find a relatively strongpositive correlation between range filling and cold hardinessin European trees.

C A N C U R R E N T P A T T E R N S O F T R E E D I V E R S I T Y B EP R E D I C T E D F R O M A S I M P L E M E A S U R E O FA C C E S S I B I L I T Y F R O M G L A C I A L R E F U G I A ?

While it is evident that climate and to a lesser extent otherenvironmental factors such as soil do constrain Europe-wide tree species diversity and distribution patterns (Walter& Breckle 1986; Pigott 1991; Sykes et al. 1996; Svenning &

Skov 2005), we will now consider the extent to which thesepatterns could entirely be caused by dispersal.

If diversity patterns were entirely driven by limiteddispersal out of the glacial refugia, we expect that the areasthat are most accessible from the refugia, i.e. located closestto the greatest number of refugia, would harbour the greatestnumber of species. Figure 2 shows the pattern of accessi-bility across Central and Northern Europe as well as theobserved pattern of tree species richness. The accessibility(ACC) of each grid cell in the receiving area (Central andNorthern Europe) was computed as the inverse of thesummed distances to all grid cells in the source area. Hence,the more distant a receiving grid cell on average is locatedfrom any one source cell the lower its accessibility. Thesource area was set to be Southern Europe at 43–46! N, aspostglacial expansions into Central and Northern Europeprimarily took place from or via this region (e.g. Petit et al.2002; Magri et al. 2006). Albeit some of the most cold-tolerant tree species had LGM refugia somewhat furthernorth, especially in eastern Europe (Willis & van Andel

Figure 2 Top-left: The accessibility of each 50 · 50 m grid cell in Central and Northern Europe to postglacial immigration from the ice agetree refugia, computed as the inverse of the summed distances to all grid cell in the source area (Southern Europe at 43–46! N). Top-right:The current native species richness of tree species (60 species in total, 2–31 species per cell) in Europe. Right: Bottom-left: The current nativespecies richness of temperate tree species (45 species in total, 0–22 species per cell). Bottom-right: The current native species richness ofboreal tree species (15 species in total, 0–10 species per cell). Colour coding corresponds to 10 equal frequency categories, with yellow overgreen to blue representing low to high accessibility and few to many species, respectively.

456 J.-C. Svenning and F. Skov Idea and Perspective

" 2007 Blackwell Publishing Ltd/CNRS

Page 23: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

19

phenomenonmaybemorepervasivethanitseems:forinstance,ithasbeenshownthat

thedispersaloftreespeciesinEuropefollowingtheendofthelasticeageisstillanon-

going process (cf. Fig. 4; Svenning & Skov, 2007). In contrast, organisms with short

generation time and high dispersal ability are able to track environmental changes

more efficiently. Additionally, if several local communities are connected by a

permanent and strong enough dispersal flux, they may never reach the optimal

composition that would be expected based on local abiotic conditions (Gravel et al.,

2006). A local community will also bemore prone to demographic stochasticity if it

hostsasmallerpopulationsize(Fisher&Mehta,2014).Theseobservationshaveledto

thedevelopmentinthelastdecadeof‘metacommunitytheory’,afamilyofmathematical

models aiming at reconcilingneutral andnicheprocessesby explicitly accounting for

spatialand temporaldynamics (Leiboldetal.,2004).However,unlikesimplerneutral

models,thesemodelsdonotprovidepredictionsthatareeasilyamenabletostatistical

comparisonwithempiricaldata.

Lastly,mostoftheexistingknowledgeoncommunityassemblycomesfromthe

study of plants and vertebrates, and the extension of community ecology to

microorganisms is comparatively very recent (Curtis & Sloan, 2005; Martiny et al.,

2006;seesectionII.2).Whilethefundamentalprocessesofcommunityassemblyapply

toalllivingorganisms,theyoperateoververydifferentscalesformicroorganisms,and

their relative importance is likely to differ (Hanson et al., 2012). It has long been

considered that microorganisms had effectively infinite dispersal capacity, and that

abiotic filtering was the dominant process of community assembly (Baas Becking,

1934). Microbial communities have indeed been found to be very sensitive to local

abioticconditionsanddominatedbyspecialisttaxa(Ramirezetal.,2014;Mariadassou

etal.,2015).Nevertheless,thisviewhasnowbeennuanced,anddispersallimitationhas

beenshowntoplayaroleaswell(Ofiteruetal.,2010;Martinyetal.,2011;Roguetetal.,

2015). While microorganisms tend to be more cosmopolitan than larger organisms,

biogeographic patterns do exist (Hanson et al., 2012; Livermore & Jones, 2015).

Microorganismshavealsobeenfoundableofcomplexinteractionsbeyondcompetition

(Corderoetal.,2012).

Page 24: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

20

Page 25: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

21

II. DNA-basedbiodiversitypatterns

Mostofecologicalknowledgecomes fromstudiesperformedat the levelof individual

species, and from this perspective, the singularity of each species and sometimes of

each individual isstriking.Thus,ecologistshave longwonderedwhethergeneral laws

werehidingbehindthecollectionofidiosyncrasies(Lawton,1999).Integrativedataon

speciesrichness,abundanceandspatialoccurrencehavebeengatheredwiththehope

that they would yield insight into the general mechanisms of community assembly

(Brown, 1995). The underlying idea is that, as in statistical physics, informative

statisticalpropertiesmightemergefromtheobservationofa largeenoughnumberof

individualsandspeciesirrespectiveofthedetailsofspeciesidentities.

Inthissection,Ifirstintroducetwotypesofintegrativepatternsthathavebeen

widely studied in community ecology: the distribution of species abundances, and

spatial patterns. I then discuss why the emergence of automated data collection is

opening new horizons for the study of these patterns. Lastly, I briefly present the

ecosystem that this thesismore specifically focuses on, the tropical forests of French

Guiana.

Integrativebiodiversitypatterns1.

a. Speciesrelativeabundances

The distribution of species abundances in a random sample of individuals takes two

forms in the ecological literature: the ‘rank-abundance distribution’ (RAD), or

‘Whittaker’splot’,consistsoftheabundancesniofallSspeciesinthesamplerankedby

decreasing abundance,while the ‘species abundancedistribution’ (SAD), or ‘Preston’s

Page 26: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

22

plot’, is the distribution of the numberΦ!of species having abundance n for all the

possible n values in 𝑛!,… ,𝑛! (cf. Fig. 5; Preston, 1948; Whittaker, 1965). To

accommodate the limited amount of data, the SAD is usually binned into abundance

categories. This binning step leads to a loss of information, thus the RAD is more

informative than the SAD. Nevertheless, the SAD has often been the preferred

distributionbecauseitiseasiertohandlemathematicallyandtoderivefromtheoretical

models.This is linked to the fact that it canbe interpreteduponnormalizationas the

probabilitydistributionfortheabundanceofarandomlychosenspeciesinthesample.

Because of the wide range of abundances typically observed in empirical data,

abundances are often log-transformed in SAD and RAD – in SAD, this amounts to

binning species into abundance classes of exponentially increasing width from the

lowest abundance class (one individual) to the highest, following the example of

Preston(1948).

Itwasnoticedearlyonthatthedistributionofspeciesabundancestendedtobe

similar in species-rich communities. Indeed, within a single trophic level, there are

usuallyafewcommonspeciesandalongtailofrarespecies–simplyput,‘mostspecies

are rare’ (cf. Fig. 5). This spurred attempts at finding a general explanation for this

pattern. Fisher et al. (1943) and Preston (1948) were the first to propose statistical

distributionstofitthedistributionofspeciesabundances.

Fisher assumed that the sampled species abundances followed a negative-

binomialdistributionwithoutthezero-abundanceclass,andderivedaSADoftheform

𝔼 Φ! = 𝛼𝑥! 𝑛,whereαisaconstantparameter,𝑥isafunctionofαandofsamplesize

N(with0 < 𝑥 < 1),and𝔼 Φ! isthestatisticallyexpectedvalueofΦ!(cf.sectionIII.3.b;

Chave, 2004). Since 𝔼 Φ!!!!! = −𝛼 ln(1− 𝑥), this distribution is called the ‘log-

series’.Aremarkablepropertyofthismodelisthattheexpectednumberofspecies𝔼 𝑆

in the sample is given as a function of the number of sampled individuals N by

𝔼 𝑆 = 𝛼 ln(1+ 𝑁 𝛼). Hence, the parameter𝛼is sufficient to predict the observed

speciesrichnessasafunctionofthesamplingeffort.Itcanthusbeusedasasampling-

independent measure of the community’s diversity. The value of𝛼 can be easily

Page 27: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

23

visualized in the RAD representation, since the log-transformed abundances are

expectedtodecreaselinearlywithslope−1 𝛼asafunctionofspeciesrank(cf.Fig.5).

Preston (1948) argued in contrast that a log-normal SAD best fitted empirical

data,i.e.𝔼 Φ! ∝ 𝑒! !"!!! !!! with𝜇and𝜎constantparameters.Anotabledifference

between the two SADs is that the log-normal distribution exhibits a mode (i.e., the

abundanceclasswiththemostspeciesisnotthelowestabundanceclass),whileFisher’s

log-seriesdoesnot.Prestonexplainedthefactthatbothsituationscouldbeencountered

inempiricaldatabytheeffectofsampling:acommunityinwhichthe‘true’SAD(i.e.,for

METACOMMUNITY DYNAMICS

Fig. 5.7. Preston-type plot of relative species abundance for treespecies >10 cm dbh in the 50 ha BCI plot, compared with expecta-tions from the lognormal, and from the zero-sum multinomial of theunified neutral theory, for θ = 50 and m = 0"10. The error bars are±1 standard deviation.

Fig. 5.8. Preston-type plot of relative species abundance for treespecies >10 cm dbh in the 50 ha Pasoh plot, compared with expecta-tions from the lognormal, and from the zero-sum multinomial of theunified neutral theory, for θ = 180 and m = 0"15. The error bars are±1 standard deviation.

135

METACOMMUNITY DYNAMICS

Fig. 5.9. Fitted and observed dominance-diversity distributions fortrees >10 cm dbh in the 50 ha plot on Barro Colorado Island, Panama.The best fit θ had a value of 50. Note the departure of the metacom-munity distribution for very rare species, but that the observed distri-bution is fit well once dispersal limitation (m = 0"10) is taken intoaccount. The error bars are ±1 standard deviation.

each figure. The metacommunity logseries distribution isthe diagonal line extending downward beyond the empiricalcurves to the lower right. The metacommunity distributionwas calculated for a fitted θ value of 50 in the case of theBCI forest, and for a fitted θ value of 180 in the Pasoh for-est. Then the parameters for dispersal limitation and localcommunity size were included to predict the local commu-nity dominance-diversity curves in each forest plot. Localcommunity size was 20,541 trees > 10 cm dbh in the BCIplot, but it was 28% higher (26,331) in the Pasoh plot. Thepreviously estimated values of m of 0.10 and 0.15 for BCIand Pasoh, respectively, were used. The precision of thepredicted local dominance-diversity curves in each plot isreadily apparent from figures 5.9 and 5.10. The expecteddistributions fit even the abundances of the rarest species

137

Figure5:SpeciesAbundanceDistribution(top)andRankAbundanceDistibution(bottom)formaturetreesinthe50-haBarroColoradoIsland(BCI)monitoredforestplot(Panama).Maturetreesaredefinedasstemswithdiameterlargerthan10cmatbreastheight(or‘>10cmdbh’).Thedispersal-limitedHubbell’smodelisfittedtothedata(𝜃 = 50,𝑚 = 0.1),andiscomparedwiththelog-normalSAD(top;dashedline),andwiththeRADofFisher’smodel(bottom;dashedline).Fisher’smodelisequivalenttoHubbell’smodelwithoutdispersallimitation(i.e.,case𝑚 = 1)forlargesamplesize.Errorbarsindicate±1standarddeviation.

Page 28: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

24

aninfinitenumberofindividuals)islog-normalcanloseitsmodeifunder-sampled,and

be mistaken for a log-series. It has since then been acknowledged that the effect of

samplingisindeedparamountinourabilitytodistinguishbetweendifferently-shaped

SAD by curve-fitting (Sloan et al., 2007). In the RAD representation with log-

transformed abundances, a log-normal SAD takes the form of an S-shaped curve, the

commonspeciesbeingcommonerandtherarespeciesrarerthaninFisher’slog-series.

Later models have focused on finding a mechanistic justification for the

proposed distributions.MacArthur (1957) proposed that species relative abundances

resultedfromtherandompartitioningofthenichespacebetweenthedifferentspecies

of the community. A number of more sophisticated niche partitioning models’ were

subsequentlyproposed(Tokeshi,1996;McGilletal.,2007).However,Hubbell’sneutral

model is themechanisticmodel thathasbeen themost successful at fittingempirical

SADs (Hubbell, 2001; cf. section I.3 and III.4). Indeed, the metacommunity SAD

convergestowardFisher’slog-seriesforalargeenoughsamplesizeandischaracterized

by a ‘fundamental biodiversity number’ θ that converges toward Fisher’s𝛼(Chave,

2004). Intheabsenceofdispersal limitation,the localcommunity isarandomsample

fromtheregionalmetacommunity,andhencealsoexhibitsalog-series-likeSAD.Inthe

presenceofdispersallimitationhowever,thedepletionofrarespeciesandtheincrease

inabundanceoflocallycommonspeciesleadtoalog-normal-likeSAD(cf.Fig.5).Thus,

Hubbell’sneutralmodelcanapproximateboththelog-seriesandthelog-normalSADs,

whileprovidingamechanistic justification for themand fullyaccounting forsampling

effects.

Nevertheless,ithasbeenshownthatmanytypesofnon-neutralprocessescould

yieldSADs similar toneutralones (Chaveetal., 2002;Pueyoetal., 2007;Chisholm&

Pacala, 2010). It has also been argued that the log-normal distribution fits empirical

SADsatleastaswellasHubbell’slocalcommunitySAD(McGill,2003).Thelog-normalis

still themostpopularchoicewhenitcomestochoosingarealistically-shapedSADfor

modellingpurposesirrespectiveoftheunderlyingmechanisms(Connollyetal.,2017).A

log-normal SAD is not in itself very informative on the mechanisms of community

assembly.Indeed,thelog-normaldistributionisthelimitingprobabilitydistributionfor

Page 29: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

25

anyproductofsufficientlymanyrandomvariables,asaconsequenceofthecentrallimit

theorem(cf.sectionIII.2andIII.3.a),thusalog-normalSADcouldariseastheresultof

anytypeofmultiplicativeprocess.Moregenerally,ithasbeensuggestedthattherange

ofempiricallyobservedSADscouldsimplyresultfromtheiterativespatialaggregation

of smaller-scale SADs, a phenomenon described as a ‘spatial analogy of central limit

theorem’(Sizlingetal.,2009).Asaconsequence,ithasbeencalledfor,ontheonehand,

more statistically powerful tests than simple curve-fitting (Chave et al., 2006; Al

Hammal et al., 2015), and on the other hand, testing multiple predicted patterns

simultaneouslyinsteadofsolelytheSAD(McGilletal.,2007).

b. Spatialpatterns

Spatial patterns form a second family of integrative patterns in ecology. The

relationshipbetweenthesampledareaandthenumberofsampledspeciesistheoldest

suchpattern tohavebeenstudied(Watson,1859).Thiscurvewas first regardedasa

meantoassesswhetheracommunityhadbeenadequatelysampled,i.e.toensurethat

onlyamarginalnumberofnewspecieswouldappearinthesampleifthesampledarea

were to be increased. It was soon realized that the species-area relationship (SAR)

mightalsocontainvaluableinformationregardingspatialcommunitystructure.Indeed,

attheregionalscale,thenumberofspeciesSwasfoundtoconsistentlyfollowapower

law𝑆 ∝ 𝐴!asafunctionofareaA,wheretheexponentztakesvaluesbetween0.15and

0.40(Arrhenius,1921;Williamson,1988).This ‘law’has laterbeenobservedtobreak

downattheextremes,eitherforareasthatarebelowapproximately1km2(forplants

or vertebrates), or conversely for areas that exceed the boundaries of a single

biogeographic unit (cf. Fig. 6; Preston, 1960; Shmida &Wilson, 1985). The resulting

curveexhibitsan ‘S’ shapeona log-logscale,witha lineardomain in thecentralpart

correspondingtothepower-lawbehaviourdescribedabove,andsteeperslopesatboth

ends.

Page 30: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

26

The three domains of the SAR reflect different processes at play. At the local

scale,theSARdirectlyresultsfromsamplingthelocalspeciesabundancedistribution:

thenumberofdetectedspeciesfirstincreaseslinearlywithareaandthenprogressively

slowsdownasonlytherarerspeciesremaintobesampled.Attheglobalscale,theSAR

approacheslinearityagainasspecieswithdistinctevolutionaryhistoryaresampledin

differentbiogeographiczones.At intermediatescales, thepower-lawregimereflectsa

slowincreaseinspeciesrichnesswithareaoncethelocalspeciesrichnesshasbeenfully

sampled. This increase corresponds to a shift in species composition with distance,

referred to as ‘beta-diversity’ by Whittaker (1960), i.e. the link between ‘alpha-

diversity’, the number of species in the local community, and ‘gamma-diversity’, the

numberofspeciesattheregionalscale.

Figure6:Numberofbirdspeciesasafunctionofarea;datafromPreston(1960).TheS-shapedSpecies-Area Relationship introduces two characteristic spatial scales for a given taxonomicgroup (verticaldashed lines), separating ‘local’, ‘intermediate’, and ‘large’ scales.The studyofbetadiversitymostlyfocuseson‘intermediate’scales,whilebiogeographyismostlyconcernedwith‘large’scales.AdaptedfromHubbell(2001).

Conceptually, beta-diversity is the variation in taxonomic composition among

siteswithinaregionofinterest.However,severalquantitativedefinitionscoexist.One

approachistoconsiderbeta-diversityasaquantityβthatlinksthemeanlocaldiversity

CHAPTER S IX

Fig. 6.2. Species-area curve for the world’s avifauna, spanning spa-tial scales from less than one acre to the entire surface of the Earth.The S-shaped curve suggests that the sampling units change as area isincreased, from individuals, to species ranges, and finally to differentbiogeographic realms at local, regional to subcontinental, and finallyto intercontinental spatial scales. Data from Preston (1960).

steep once again over large intercontinental spatial scales,until the area of the entire world was included. The changein slope implies that scale-dependent changes in samplingunits are occurring, a possibility of which Preston (1960) wasclearly well aware. I will discuss this scale-dependent changein a more formal theoretical treatment shortly.

A similar S-shaped curve was obtained by Shmida andWilson (1985), who plotted plant species-area relationshipson local to global scales (fig. 6.3). They argued that thechange in the form of the curve reflected changes in thebiological determinants of plant species richness. On verylocal scales, niche-assembly rules would dominate. On some-what larger spatial scales, mass effects and habitat diver-sity would become important. They define mass effects as animmigration subsidy from regional populations of a speciesthat would go locally extinct without this immigration sub-

158

Page 31: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

27

α to the regional diversity γ through𝛾 = 𝛼𝛽, so that the regional diversity can be

partitioned into independent within-community and among-community components

(Whittaker,1960;Jost,2007).Thespatialscalethatseparatesalpha-andbeta-diversity

maybedefinedasthescalewitnessingtheregimeshiftintheSAR.Anotherapproachis

tomeasure beta-diversity independently of alpha- and gamma-diversity as themean

taxonomic similarity between sites or as the variance of the community matrix

(Legendre & De Caceres, 2013). The community matrix is the matrix describing the

numberofindividualsperspeciesandpersites,takingusuallyspeciesascolumnsand

sitesasrows.Awealthofsimilaritymetricscanbeusedtocomparesitestoeachother,

depending for instanceon theweight given to rare species, onwhether the sampling

effort ishomogeneousamongsitesornot, andonwhetherabundance informationor

onlyspeciesoccurrenceshouldbetakenintoaccount(Legendre&DeCaceres,2013).

Taxonomicsimilarityiswellknowntodecreasewithdistance,ageneralpattern

ofecologythatisrelatedtothemonotonousincreaseofdiversitywitharea(Soininenet

al.,2007).Dependingonthemechanismsofcommunityassembly,this ‘distance-decay

of similarity’ can be interpreted either as the result of dispersal limitation, or as the

consequence of new habitats and community types being encountered. A major

motivationforthestudyofbeta-diversityliesinthefactthatitisanindirectmeansto

investigate thedriversof community assembly. Indeed, taxonomic similaritybetween

sitescanbecomparedtodistanceandtoenvironmentalsimilarity,soastoempirically

assess therelative importanceofdispersalandabiotic filtering inshapingcommunity

composition (Tuomistoetal., 2003).Thisquestionmayalsobe addressedbydirectly

comparing taxonomic compositionwith quantitative environmental descriptors using

multivariate statistical methods, an approach deemed more statistically powerful

(Legendreetal.,2005,2008;cf.sectionIII.2).

Formally, the distance-decay of similarity can be described using the pair-

correlation function of statistical physics, i.e. the probability for two individuals at a

givendistance tobelong to thesamespecies (Chave&Leigh,2002;Zillioetal., 2005;

Houchmandzadeh, 2009). Predictions for both the SAR and the distance-decay of

similaritycanbeobtainedfromaspatiallyexplicitversionofHubbell’sneutralmodel.

Page 32: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

28

Theneutral predictions are in qualitative agreementwith observations, including the

tri-phasic SAR (Hubbell, 2001; Condit et al., 2002). Nevertheless, as in the case of

speciesabundancedistributions, thisdoesnotprecludeothermechanismsfrombeing

involved.

EnvironmentalDNAdata2.

Collecting the large amounts of data required to study integrative patterns has long

been a tedious and challenging task (Lawton et al., 1998). Direct taxonomic

identification relies on rare expert knowledge, and is prone to errors. Sampling

protocolsaredifficult tostandardize,andowingtotheamountofworkinvolved,data

collection may spread over long periods of time – sometimes years – which may

introducebiases.Lastbutnotleast,onlyasmallfractionofbiodiversitycanbedirectly

sampledandidentifiedbyahumanobserver,mostlyvertebratesandplants.Asaresult,

datasets available for the study of integrative biodiversity patterns have long been

relatively rare and limited in their taxonomic extent. However, major technological

advances have been made over the last decades that now allow for the automatic

collectionofecologicaldata.Theseadvancesareallrelatedtotheexponentialincrease

in computer power that took place over the same period of time, and that has

dramaticallyimpactedallfieldsofscienceandindustry.

For instance,remotesensingofecological featuresover largespatialscalescan

be achieved using Lidar and hyperspectral imaging. Lidar is a small-wavelength

equivalent of radar (either airborne or ground-based) that allows for fine-grain 3D

imaging. Hyperspectral imaging consists in recording images (from a plane or a

satellite) foramuch largerspectrumofelectromagneticwavelengths than thehuman

eye does: the additional information may for instance be used for the automated

identification of tree species from their spectral signature, especiallywhen combined

withLidardata(Alonzoetal.,2014).

Page 33: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

29

Arguably, theonerecenttechnological innovationwiththestrongest impacton

biology has been high-throughput DNA sequencing (Schuster, 2007). While DNA

sequencingmethodshaveexistedsincethe1970s(Sangeretal.,1977),abreakthrough

occurredaround2005bywhichsequencingspeedwasmultipliedbyseveralordersof

magnitude for a fraction of the cost of previousmethods (cf. Fig. 7; Margulies et al.,

2005).Since2011,thedominanthigh-throughputDNAsequencingmethodisIllumina

sequencing,whichconsistsinspreadingandattachingthetargetDNAstrandsonaflat

surface and synthesizing the complementary strands using four-colour fluorescent

nucleotides(Bentleyetal.,2008).Byrecordingtheorderofappearanceofthedifferent

colours at the location of each DNA strand with a fast and high-resolution camera,

millions of strands can be simultaneously sequenced with high accuracy. The main

limitation of the method is on the length of the sequenced strands, which currently

cannotexceed150or300basepairs,dependingontheexacttechnology.

The idea of using DNA sequencing to study biodiversity predates high-

throughput sequencing, and was introduced as a mean to study microorganisms

(Giovannoni et al., 1990). Indeed, most microorganisms can only be detected in the

environmentthroughtheirDNA,collectedfromsoilorwatersamples(Pace,1997).The

idea was to identify a short DNA sequence satisfying two properties. First, it should

haveconservedextremitiesacrosstherangeoftargetedtaxa,sothatitcanbeamplified

byPCRusing a single pair of primers frombulkDNA. Second, its central part should

exhibitrandommutationsmakingthedifferenttaxadistinguishable,i.e.itshouldnotbe

understrongevolutionaryselection.Suchasequence iscalledabarcode,andthefirst

that has been used is the 16S rRNA gene of prokaryotes, which codes for the RNA

formingthesmall (16S)subunitof theprokaryoticribosome(Giovannonietal.,1990;

Pace, 1997). DNA barcodes have soon also been recognized as amean to bypass the

needfortraditionaltaxonomicexpertiseinidentifyinglargerorganisms,forwhichDNA

can be directly extracted from tissue (Hebert et al., 2003). Nevertheless, barcode

sequences can only be attributed to known taxa once a reference database has been

establishedforthebarcode.Whennoreferencedatabaseisavailablefortheorganisms

under study, molecular Operational Taxonomic Units (OTUs) defined based on

Page 34: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

30

sequencesimilarityaresubstitutedforspeciesinanalyses.Moreover,dependingonthe

frequency at which mutations occur in the barcode sequence, the comparison of

sequencesacrossspeciesmaynotbecongruentwithtraditionalspeciesdelineation,and

twobarcodestargetingthesametaxonomicgroupmayhavewidelydiffering levelsof

taxonomicresolution.

before 2003, hundreds of articles have been publishedsince the emergence of the DNA barcoding concept.Clearly, standardization was an important step in thedevelopment of DNA-based species identification, and ithas encouraged extensive international efforts to build tax-onomic reference libraries of the standardized regions.However, the barcoding standards were designed to iden-tify species from more or less intact DNA isolated fromsingle specimens using Sanger sequencing, and focusmore on the variability of the amplified region than onthe nonvariability of the primer sites and the length of thetargeted DNA region.

The emergence of DNA metabarcoding in relation

to next-generation sequencing and to the needs

of the scientific community

Here, we introduce the term ‘DNA metabarcoding’ to desig-nate high-throughput multispecies (or higher-level taxon)identification using the total and typically degraded DNAextracted from an environmental sample (i.e. soil, water, fae-ces, etc.). Species identification from bulk samples of entireorganisms (e.g. Chariton et al. 2010; Creer et al. 2010; Pora-zinska et al. 2010; Hajibabaei et al. 2011), where the organ-isms are isolated prior to analysis, can also be considered asDNA metabarcoding. Below, we will restrict our consider-

ations to the analysis of environmental DNA (eDNA),because analysis of bulk samples has very relaxed technicalconstraints compared to that of environmental samples.Bulk samples are usually composed of a restricted taxo-nomic group and provide high-quality DNA allowing theuse of a longer barcode, even the standardized ones. We willalso emphasize that because the goal of DNA metabarcodingis to identify taxa, it should be clearly differentiated frommetagenomics that ‘describes the functional and sequence-based analysis of the collective microbial genomes containedin an environmental sample’ (Riesenfeld et al. 2004).

The emergence of DNA metabarcoding was because oftechnology catching up to a scientific need. Standardized(‘traditional’) DNA barcoding does not fulfil all the needsof ecologists. As it is designed to identify single specimenswith DNA that is more or less intact, it typically requiresthe isolation of a suitable specimen to be analysed, whichis time-consuming and, for some taxonomic groups, diffi-cult or virtually impossible. Consequently, standardizedbarcoding is limited in the number of specimens that it canidentify. We must therefore accept that standardized DNAbarcoding is not ideal for high-throughput species identifi-cation for use in ecological studies, although it has an obvi-ous added value in many situations where classical speciesidentification is difficult and in facilitating the discovery ofnew species. These limitations have been surmounted bythe increasing availability of NGS machines that permithigh-throughput techniques such as DNA metabarcoding.At the moment, sequencing platforms can produce up to6 billions of sequence reads of 100 bp per run, with thepossibility to implement paired-end experiments (Glenn2011). Thus, it is not any more a problem to obtain severalthousands of sequence reads per amplicon, and the lengthof the sequence reads is already fully compatible with theshort fragment lengths required for eDNA metabarcoding.There is no doubt that the technology will improve stillfurther. As a consequence, NGS has the potential to pro-vide an enormous amount of information per experimentfrom in-depth sequencing of uniquely tagged amplicons(Binladen et al. 2007; Valentini et al. 2009). So, why not useeDNA to simultaneously identify many species in a singleexperiment? After some initial experiments based onPCR ⁄ cloning ⁄ sequencing (Willerslev et al. 2003, 2007), theapproach using NGS has already demonstrated its poten-tial, for analysing plant communities using soil samples(Yoccoz et al. 2012), for reconstructing past plant or animalcommunities using permafrost or ice samples (Haile et al.2009; Sønstebø et al. 2010; Boessenkool et al. 2012; Jørgen-sen et al. 2012a,b; Epp et al. submitted), for tracking earth-worms using soil samples (Bienert et al. 2012), formonitoring vertebrate biodiversity (Andersen et al. 2012),or for diet analysis using faeces or stomach content as asource of DNA (see review in Pompanon et al. 2012).

However, there are significant constraints when designingan eDNA metabarcoding study. First, eDNA is often highlydegraded, and long fragments of several hundreds of basepairs cannot be reliably amplified (Willerslev et al. 2004;Hansen et al. 2006). Second, because many species have tobe amplified in the same PCR experiment, it is extremely

DNA-based species identification(RFLP/Southern blots)

DNA barcoding(PCR – Sanger sequencing)

eDNA metabarcoding(PCR – NGS)

eDNA metabarcoding(Shotgun – NGS)

DNA-based species identification(PCR – Sanger sequencing)

Time

1990

2000

2010

PCR

Nex

t Gen

erat

ion

Sequ

enci

ng (N

GS)

eDNA metabarcoding(Capture probes – NGS)

Method implemented forspecies identification

Availabletechniques

?

Fig. 1 DNA-based species identification. Past and currentapproaches, and possible future trends.

2046 NE XT -G EN ERAT ION DN A M E TAB ARCOD ING

! 2012 Blackwell Publishing Ltd

Figure7:EvolutionofDNA-basedspeciesidentificationovertime.ThesuccessiveintroductionofPolymeraseChainReaction(PCR)andHigh-throughput(orNextGeneration)Sequencingtoecologyhavetransformedthefield,andautomateddatacollectionusingmolecularapproachesisdevelopingfast.AdaptedfromTaberletetal.(2012b).

Page 35: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

31

With the advent of high-throughput sequencing, several thousands to several

millionsof barcode sequences cannowbe readily sequenced froma singlebulkDNA

sample.Asaconsequence,theuseofbarcodesequencingtomeasurebiodiversityfrom

environmentalDNA–or‘metabarcoding’–hasboomed(Biketal.,2012;Taberletetal.,

2012b,a;Bohmannetal.,2014).Thevastdiversityofthemicrobialworldonlystartsto

befullygrasped,andwholenewswathesofthetreeoflifearebeingdiscovered(Huget

al., 2016). Ambitious projects aim at sampling microbial diversity across the globe,

eitheronland(Gilbertetal.,2014)orintheocean(deVargasetal.,2015).Inparallel,

metabarcodingcanbeusedasafastandstandardizedmeanstogatherinformationon

macroscopic organisms, either using environmental DNA or, for small enough

organisms,DNAextractedfroma‘soup’ofsampledspecimen(Andersenetal.,2012;Yu

etal.,2012;Gibsonetal.,2014).Thiswealthofdatahasledtoarenewalofinterestin

the study of integrative biodiversity patterns and biogeography, which were until

recently entirely unknown formicroorganisms (Martiny etal., 2006; Fuhrman, 2009;

Hanson et al., 2012). Since metabarcoding is but the simplest method to exploit the

information contained in environmental DNA, and is being replaced by approaches

makinguseofalargerfractionoftheorganisms’genomeassequencingcapacitykeeps

increasing(Taberletetal.,2012b),thetrendtowardincorporatingsequencingdatainto

ecologicalstudiesisprobablyjuststarting.

ThetropicalforestsofFrenchGuiana3.

Tropicalforestsareestimatedtoconcentratehalfofglobalbiodiversity,andareassuch

thearchetypical‘hyperdiverse’ecosystem(Scheffersetal.,2012).Theyhaveplayedan

historicalrole ingeneratinghypotheses inecologyandevolution,especiallyregarding

themechanismsofspeciescoexistence(Wright,2002).Indeed,likethephytoplanktonic

communities at the origin of the ‘paradox of the plankton’ (Hutchinson, 1961), they

harbour formany taxonomic groups awide range of species competing for the same

resources.Hubbell’sneutraltheoryofbiodiversityhasbeenelaboratedbasedprimarily

Page 36: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

32

ontheobservationoftropicalforesttreecommunities(Hubbell,2001),andmuchofthe

ensuing debate has initially focused on these communities as well (McGill, 2003;

Ricklefs, 2003; Volkov et al., 2003). In addition to their unparalleled biodiversity,

tropicalforestsarealsothoughttoharbourthemajorityofthenon-microbialterrestrial

taxa still unknown to science (Scheffers et al., 2012). Hence, the automated

measurementof integrativepatterns iswell suited to their study, and is inparticular

uniquelycomprehensivecomparedtootherpossibleapproaches.

Figure 8. Whether ecosystems can be considered pristine depends on the temporal scaleconsidered: map showing estimated changes in phosphorus (P) concentrations over time inSouthAmericafollowingthesuddenextinctionofmostlargemammalspecies12,000yearsagoand the consecutive disruption of nutrient transport through dung, likely caused by humanarrival.AdaptedfromDoughtyetal.(2013).

UnlikemostlandecosystemsonEarth,asignificant,iffastdwindling,fractionof

tropicalforestscanstillbeconsideredtobeinapristinestate,thusguaranteeingaccess

NATURE GEOSCIENCE DOI: 10.1038/NGEO1895LETTERS

70° W 60° W

70° W 60° W

a

b

c

d

500

400

300

kg P km¬2

200

100

0

70° W 60° W

60° W70° W

Steady state 15,000 years ago

Current day Change in steady state

Steady state in 28,000 years

10° S

10° S

10° S

10° S

Figure 3 |Map showing changing ecosystem P concentrations in South America due to megafauna extinctions. a, The steady-state estimate of Pconcentrations in the Amazon basin before the megafaunal extinctions with a lateral diffusivity �excreta value of 4.4 km2 yr�1. b, The current-day estimateof P concentrations 12,000 years after the extinctions with current animals and a �excreta value of 0.027 km2 yr�1. c, Estimated P concentrations in theAmazon basin 28,000 years in the future. d, The difference between the pre- and post-extinction equilibrium (a and c).

Table 1 |Average�excreta ⇤↵B (km2 yr�1) for each continent calculated for modern species and modern plus extinct species.

North America South America Australia Eurasia Africa

Number of species extinct 65 64 45 9 13Mean weight of extinct animals (kg) 846 1,156 188 2,430 970Modern �excreta ⇤↵B 13,876 12,934 21,804 21,779 265,621Modern+extinct fauna �excreta ⇤↵B 140,716 (±38,000) 283,854 (±81,000) 48,250 (±8,000) 118,349 (±29,000) 324,848 (±18,000)Percentage of original 10% (±2%) 5% (±1%) 45% (±6%) 18% (±4%) 82% (±4%)

Bottom row is the percentage of the original �excreta ⇤↵B remaining. The error represents an uncertainty in extinct species distribution of 30%.

on the loss rate (K ) which is a large source of uncertainty.).Our simulated modern-day distribution of P does not includethe large diversity of parent material and soil evolutionary stageswhich greatly impact observations of soil P across Amazonia(Supplementary Fig. S3), and instead represents the change inaccessible P in the biomass-necromass-soil continuum (‘ecosystemP’) andnot total P. EcosystemP concentrations in intact Amazonianforests could, therefore, potentially continue to decrease (to >90%of steady state) for 17 (between 3 and 43) thousand years into thefuture as a legacy of the Pleistocenemegafauna extinctions.

Although we have concentrated our analysis on Amazonia, itis likely that there were similar changes in nutrient transfer onall continents that experienced megafaunal extinction, albeit withvariations in the local nutrient gradients and the key limitingmacro-or micronutrients. Using data on Pleistocene megafaunal bodymasses, we estimate that � decreased drastically on all continents.Africa, the continent on which modern humans co-evolved withmegafauna, is the only continent with most (82%) of the lateralnutrient distribution capacity still intact (Table 1). The largestdeclines (90–95%) were in the Americas. It seems that Eurasia alsoshowed a large decline despite only nine extinctions, because theextinct megafauna were large (for example mammoths) whereasAustralia showed a moderate decline despite a large numberof extinctions, because the extinct megafauna were relativelysmall. However, these are estimates of non-pressured populationdensities, and ranges and current values for Africa and Eurasia

are probably reduced owing to current pressures on megafauna,because of decreases in megafaunal population size and restrictionson their free movement across landscapes.

Following the extinction of the megafauna, humans eventuallyappropriated much of the net primary production that had beenconsumed by the extinct animals23,24. Did we also take over theirrole of nutrient dispersal? People currently provide nutrients asfertilizer to agricultural systems, but much of this gets concentratednear agriculture, suggesting that humans act as concentratingagents rather than diffusive agents like the herbivorous megafauna.Therefore, compared to earlier eras, the post-megafaunal world ischaracterized by greater heterogeneity in nutrient availability25.

Our framework for estimating nutrient diffusion by animals canbe applied to modern ecosystems globally, and even incorporatedinto global land biosphere models demonstrating the ecosystemservice of nutrient dispersal. This service is analogous to that playedby arteries in the human body, with large animals acting as arteriesof ecosystems transporting nutrients further and smaller animalsacting as capillaries distributing nutrients to smaller subsectionsof the ecosystem. Therefore, after the demise of its large animals,the Amazon basin has lost its nutrient ‘arteries’ and the widespreadassumption of P limitation in the Amazon basin may be a relic ofan ecosystem without the functional connectedness it once had3.This new mathematical framework provides a potential tool ofquantifying the important but rarely recognized biogeochemicalservices provided by existing large animals. Therefore, those

NATURE GEOSCIENCE | VOL 6 | SEPTEMBER 2013 | www.nature.com/naturegeoscience 763

Page 37: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

33

tonaturalprocessesunaffectedbyhumanactivities.Amazonia represents theworld’s

largest tropical forest, andwithin it, French Guiana counts among its least disturbed

parts(Hansenetal.,2013).Nevertheless,recentfindingshavechallengedtheideaofan

entirely pristine Amazonian basin. Indeed, it appears that human population density

wasrelativelyhigh inplacesuntil theEuropeanconquest (Heckenbergeretal.,2008).

Moreover,ona longertimescale, theAmazonianbasinmaystillbe inatransientstate

following the suddendisappearanceofmost largemammal species12,000years ago,

which was likely caused by the arrival of human hunters and has had deep

consequencesonnutrienttransportandseeddispersal(cf.Fig.8;Doughtyetal.,2013).

TworesearchstationshavebeenestablishedinFrenchGuianainthe1980sfor

research on Amazonian biodiversity. This is where the data used in this thesis have

beencollected.TheNouraguesresearchstationisabout100kminland,intheheartof

theNouraguesnaturalreserve,andisdevotedtothestudyoftheundisturbedlowland

forestaswellasoftheneighbouringinselberg.TheParacouresearchstation,nearthe

coast, is devoted to the study of the long-term effects of logging on biodiversity

(Gourlet-Fleuryetal., 2004). Inboth stations, soils are acidic andnutrient-poor, as is

typical intropical forests,withamoresandysoil inParacouandamoreclayeysoil in

Nouragues. The mean rainfall is about 3,000 mm per year, with relatively strong

seasonalvariation,andtemperatureisaround26°Cthroughouttheyear.

Page 38: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

34

III. Statisticalapproaches

In this section, I introduce the statistical approachesused in this thesis. I first briefly

review the classical approaches of community ecology. I then introduce Hubbell’s

neutral model and Dirichlet mixture models, which are respectively the foci of the

second and third chapters of this thesis, by emphasizing their commonmathematical

structurebasedontheDirichletdistributionanditsDirichletprocessextension.

Comparingmodelstodatainecology1.

Inphysics,empiricaldatacanoftenbesatisfyinglycharacterizedbyaone-dimensional

mathematicalfunctionfittedtothedata(‘curvefitting’).Thisisnotthecaseinecology,

because observations do not as a rule tightly follow the prediction of a theoretical

model, andbecausedatapoints are always relatively scarce and costly to acquire.To

take full advantage of the available data, it is hence essential to account for the

statistical distribution of the observations around the fittedmodel, and often for the

statisticaldependencebetweenobservations.Intheabsenceofatheoreticalprediction,

deterministic trends in the relationshipbetweenvariables are conversely assumed to

be very simple (e.g., linear). Thus, ecologicalmodels aiming at comparisonwith data

need to be expressed in probabilistic terms, and model fitting heavily relies on

likelihood-basedinference(Fisher,1925;Pawitan,2001).

Thelikelihoodfunctionofamodelisgivenbytheprobabilitydistribution𝑝(𝑋|𝜃)

for the dataX to be observed conditional on themodel’s parameters𝜃. Themodel is

fittedtodatabymaximizingthelikelihoodfunction𝐿 𝜃 𝑋 = 𝑝(𝑋|𝜃),whichisameans

of simultaneously estimating themodel’s parameters as𝜃(𝑋) = argmax! 𝐿(𝜃|𝑋) and

measuringthegoodness-of-fitas𝐿(𝑋) = max! 𝐿(𝜃|𝑋) .Inpractice,thelogarithmofthe

likelihood is maximized, and the normalization factor in the likelihood expression is

discarded.Dependingonthesituation, the focusmaybeonmeasuringgoodness-of-fit

Page 39: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

35

oronestimatingandinterpretingmodelparameters.Ifseveralalternativemodelsareto

becomparedtoeachother,thiscanbeachievedbycomparingtheAkaikeInformation

Criterionforeachmodel,equalto2𝐾 − 2 ln 𝐿(𝑋),whereKisthenumberofparameters

in themodel (Akaike, 1974; Burnham&Anderson, 2002). If only onemodel is to be

compared to the data, themost popular approach is to assess how likely the data at

handwouldbetobeobservediftheyweretobegeneratedbytheprobabilisticmodel

underconsideration.Tothisend,thevaluetakenbya‘teststatistics’-forinstancethe

log-likelihoodln 𝐿(𝑋)-iscomparedtoitstheoreticaldistributiongiventhemodel.The

thresholdforrejectingthemodelwithreasonableconfidenceistraditionallysetat5%

probability,followingtheexampleofFisher(1925).

Another approach to likelihood-based inference consists in estimating the full

probability distribution of themodel’s parameters conditional on the data instead of

only their most likely value (Gelman et al., 2014). This approach is called Bayesian

inference, in contrast to maximum-likelihood inference, since the full probability

distribution of the model’s parameters 𝜃 is given by Bayes’ equation 𝑝 𝜃 𝑋 =

𝑝(𝑋|𝜃)𝑝(𝜃)/𝑝(𝑋)(Bayes&Price,1763).Anotherdistinctionbetweenbothapproaches

is that maximum-likelihood inference assumes that

argmax! 𝑝(𝜃|𝑋) = argmax! 𝑝(𝑋|𝜃) , and thus implicitly that 𝑝(𝜃) is a uniform

distribution.Incontrast,𝑝(𝜃)isoftenusedtoexpresspriorbeliefonparametervalues

in Bayesian inference. The normalization factor𝑝 𝑋 = 𝑝(𝑋|𝜃)𝑝(𝜃)! , or marginal

likelihood,canthenbeusedasameasureofgoodness-of-fitaccountingforallpossible

parameter choices. Because it is less analytically tractable than maximum-likelihood

inference,Bayesianinferencehasbeenlessemployedhistorically.However,itcannow

be performed numerically, and even though it is usually more computationally

demanding than maximum-likelihood inference, it has become increasingly popular

withthesteadyincreaseincomputerpower.Oneofthereasonsofitssuccessisthatit

canaccommodatecomplexmodelsinwhichthelikelihoodisdifficulttomaximize.

Page 40: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

36

Thestatisticaltoolsofcommunityecology2.

Univariatemodelssuchassimplelinearregression,whereobservationsareregardedas

realizationsofasingerandomvariable,canbedistinguishedfrommultivariatemodels

where observations result from several non-independent random variables. The

analysis of communitymatrices relies onmultivariate statisticalmethods, where the

abundance, or theoccurrence, of eachof thep taxa is regardedas a randomvariable

with a realization at each of the n sampling sites. Not all of the many multivariate

methods classically used in community ecology are explicitly model-based: they

typicallycombinemultivariatelinearregression,eigenvaluedecompositionandtheuse

of (dis)similarity metrics (Legendre & Legendre, 2012). Their results are often

interpretedwithintheframeworkofthe‘analysisofvariance’(ANOVA),whichconsists

inpartitioningthevarianceoftheobservedvariablesintocomponentscorrespondingto

differentsourcesofvariation.

Themultivariatemethods that include an eigenvaluedecomposition step (or a

generalizedversionofit)arecalled‘ordination’methods.Acornerstoneofmultivariate

analysis is Principal Component Analysis (PCA), a simple ordination method of

widespreadusewellbeyondecology.Itconsistsinrotatingpobservedvariablesaround

their mean so as to obtain p uncorrelated variables ordered by decreasing variance.

Namely, then-by-pmatrixT containing thep newvariables isobtainedas thematrix

product𝑇 = 𝑋𝑊,where𝑋isthen-by-pmatrixcontainingthecentredoriginalvariables,

and𝑊 the p-by-p matrix formed by the eigenvectors of the covariance matrix

1 𝑛 − 1 𝑋!𝑋orderedbydecreasingeigenvalues.ThefirstuseofPCAistodecorrelate

the data. It may also be used for reducing data dimensionality by discarding the

independent variables accounting for the least variance. Thus, PCA allows for

conveniently representing the data by projecting them on the two or three axes that

accountforthemostvariance.Toinvestigatethedependenceofacommunitymatrixon

a set of explanatory variables, such as environmental variables measured at the

samplingsites,aclassicalmethodistoperformamultivariate linearregressionofthe

communitymatrix on the explanatory variables, followed by a PCA on thematrix of

Page 41: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

37

fitted coefficients, a method known as Canonical Redundancy Analysis (RDA). Using

partiallinearregression,RDAcanbeextendedinto‘partialRDA’tocomparetheeffect

ofseveralsetsofexplanatoryvariablesonthecommunitymatrix.

Clustering methods constitute another family of extensively used statistical

methods in ecology (Legendre & Legendre, 2012), as well asmore generally in data

miningandmachine learning (Bishop,2006; Jain,2010).Theyaimatpartitioning the

data into ‘natural’ clustersofobservations,bysearching forstructure in thematrixof

pairwisesimilaritybetweenobservations.Assuch,theirscopeoverlapstosomeextent

withthatofexploratoryordinationmethodssuchasPCA.Intheterminologyofmachine

learning, clustering algorithms are ‘unsupervised’ algorithms, i.e. they aim at

discovering patterns without being provided any prior information, in contrast to

‘supervised’algorithmsaimingatclassifyingpatternsbasedonpre-existingcriteria.

Themostpopularclusteringalgorithms inecologyare ‘hierarchical’ones.They

consist inrecursivelysplitting thedata intoclustersofobservationsstarting fromthe

whole dataset – or conversely, recursively agglomerating clusters of observations

startingfromtheindividualobservations–bymaximizingbetween-clusterdissimilarity

at each step. Dissimilarity between two clusters is most commonlymeasured as the

meanpairwisedissimilaritybetweentheobservationsofeachcluster,amethodcalled

UPGMA (‘Unweighted Pair Group Method with Arithmetic Mean’). The pairwise

dissimilarity between observations can bemeasured using any dissimilarity metrics,

whichisoftenanadvantageinecologyowingtothewiderangeofdissimilaritymetrics

in use (Legendre & De Caceres, 2013; cf. section II.1.b). Another advantage of

hierarchical clustering is that the result can be displayed as a tree of hierarchically

nested clusters (or ‘dendrogram’): in addition to visualizingdata structure, thishelps

choose the number of clusters according to the desired level of similarity within

clusters.Hierarchicalclusteringishowevercomputationallyintensiveforlargedatasets.

Moreover, because splits – or merges – decided at each hierarchical step cannot be

undoneandhaveastrongimpactonthesubsequentsteps,thealgorithmmaybeeasily

trappedinsuboptimalsolutionsforlargeandnoisydatasets.

Page 42: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

38

‘Partitional’ algorithms,which consist in searching for the optimal partition of

thedataintoapredefinednumberofclusters,formasecondfamilyofalgorithmsthat

are better adapted to large datasets (Jain, 2010). The most widespread partitional

algorithm isk-means clustering,which formally consists in finding thek clusters that

minimizewithin-clustervarianceintheEuclidianspaceofobservations,withkafixed

parameter.Unlikehierarchical clustering,which ispurelyheuristic, theproblemofk-

means clustering can be reframed as the fit of amultivariate statisticalmodel to the

data(specifically,a‘Gaussianmixturemodel’).Thisishoweverachievedusingheuristic

algorithms,whichmayconvergetosuboptimalsolutions.Themostcommonalgorithm

consists in randomly setting the position of the k cluster centres in the space of

observations, delineating the clusters by assigning each observation to the closest

clustercentrebasedonEuclidiandistance,and then iterativelyreshaping theclusters

using theirmean in the previous step as their new centre, until convergence. Lastly,

‘networkscience’providesa rangeofclusteringalgorithms thatarebasedonagraph

representation of the similaritymatrix (Rosvall et al., 2009; Fortunato, 2010). These

methods that are well adapted to large datasets have recently enjoyed a rise in

popularity in ecology (Vilhena&Antonelli, 2015;Bloomfieldetal., 2017;Wangetal.,

2017).

A pervasive assumption in classical statistical models is that observations are

normally distributed – i.e., followGaussian probability distributions. This assumption

may be explicit, or sometimes implicit. For instance, model fitting by least-square

regression amounts to maximizing the log-likelihood of independent identically

distributednormalvariablescentredon the fittedmodel.Likewise, theassumption in

PCA that the observed variables can be entirely characterized by their mean and

varianceimpliesthattheyarenormallydistributed,sincethispropertyisuniquetothe

Gaussian distribution. A justification for the normality assumption is that an

observationonasamplecantypicallyberegardedasthesum,orthemeanoutcome,of

manyrandomdraws,yetthecentrallimittheoremstatesthatthemeanofasufficiently

largenumberof randomvariables is alwaysnormallydistributed.Thank to themany

convenient mathematical properties of the Gaussian distribution, exact analytical

Page 43: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

39

expressions have been obtained for maximum-likelihood estimators and for the

theoretical distribution of test statistics. Prior to the advent of computers, such

analytical resultswereanessential condition for thepracticalusefulnessof statistical

models.This ishowevernotthecaseanymore,andtheexplorationofmodelsthatare

notbasedontheGaussiandistributionisnowpossible.

TheDirichletdistributionanditsDirichletprocessextension3.

a. TheDirichletdistribution

Notallnaturalprocessesareadditive,andasaconsequence,notallquantitiescanbe

assumed to be normally distributed as the sum of a large number of random draws.

Some processes are multiplicative, and a direct consequence of the central limit

theoremisthattheproductofalargenumberofrandomdrawswillfollowalog-normal

distribution. Indeed, for N random variables𝑋! ,ln 𝑋!!!!! = ln𝑋!!

!!! . Hence, the

central limit theorem states thatln 𝑋!!!!! is normally distributed for large N. It

follows from the definition of the log-normal distribution that 𝑋!!!!! is log-normally

distributed.Asmentioned insectionII.1.a, this isapossibleexplanation for theoften-

observed log-normaldistributionof species abundances. Indeed, if the abundancesof

species are independent of each other, a species’ change in abundance through time

maytaketheformofarandommultiplicativefactorappliedtoitsreproductiveoutput

ateachgeneration,dependingforinstanceonenvironmentalfluctuations.

However, if changes in species abundance are rather driven by demographic

drift, as assumed in a neutral framework, relative species abundances are better

described by the following process: starting from abundances 𝑎!,… ,𝑎! , where𝑎! is

thenumberof individuals inspecies i,oneoftheSspecies ispickedateachtimestep

with probability equal to its relative abundance (or equivalently, one individual is

pickedatrandominthepopulation),anditsabundanceisincreasedbyoneindividual.If

this sampling scheme, called a Pólya urn, is repeated indefinitely, the distribution of

Page 44: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

40

species relative abundances 𝑥!,… , 𝑥! will follow the Dirichlet distribution of

parameters 𝑎!,… ,𝑎! , which may be regarded as a distribution over distributions

(Blackwell&MacQueen,1973):

𝑝 𝑥!,… , 𝑥!!!|𝑎!,… ,𝑎! =Γ 𝑎!!

!!!

Γ 𝑎!!!!!

𝑥!!!!!

!

!!!

𝑥! = 1− 𝑥!!!!

!!!

Γis the gamma function generalizing the factorial to real numbers and taking value

Γ 𝑎 = 𝑎 − 1 !when𝑎isapositiveinteger.NotethatthedescriptionofthePólyaurn

originallyinvolvesdrawingballsofdifferentcoloursfromanurninsteadofindividuals

ofdifferentspeciesfromacommunity.

If there is no a priori reason to assume differences between the S species,

parsimony leads to setting all initial abundances𝑎! to the same value𝑎(‘symmetric’

Dirichlet distribution). In that case, some species will randomly emerge as more

abundant than others over time in the Pólya urn sampling scheme, since any above-

averageabundancetendstobeamplified.Theshapeofthelimitingdistributionafteran

infinitenumberoftimesteps isheavily influencedbythe ‘concentrationparameter’𝑎,

which can formally take any positive real value. If𝑎ismuch smaller than 1, the first

speciestobepickedbythesamplingschemewillhaveitsabundanceupdatedto𝑎 + 1,

andwillhaveadisproportionatelyhigherprobabilitytobepickedagainatthenexttime

step. Conversely, if𝑎is much larger than 1, the fact that a species’ abundance is

increased by 1 has little influence on its subsequent probability to be picked. Thus,

dependingonthevalueof𝑎relativeto1,thesymmetricDirichletdistributioncaneither

describeaspeciesabundancedistributionwithafewdominantspeciesandmanyrare

one (𝑎 ≪ 1), reminiscentof the structureobserved in species-richcommunities,or in

contrast a veryeven species abundancedistribution (𝑎 ≫ 1). In thegeneral case, any

set of parameters 𝑎!,… ,𝑎! can be rewritten as 𝜃𝑝!,… ,𝜃𝑝! , with 𝑎!!!!! = 𝜃and

𝑝! = 𝑎! 𝜃, so that 𝑝!!!!! = 1. The Dirichlet distributionwith asymmetric parameters

behaves similarly to the symmetric case, except that the relative abundance𝑥! of

Page 45: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

41

species i hasmean𝑝! overallpossibledraws from theDirichletdistribution,while its

varianceisdeterminedbythevalueof𝜃 𝑆.

The symmetric Dirichlet distribution is the distribution that Fisher (1943)

implicitlyassumedforrelativespeciesabundancestoderivethelog-seriesSAD,defined

as𝔼 Φ! = 𝛼𝑥! 𝑛 (cf. section II.1.a). He assumed that the number of sampled

individuals per species followed a negative-binomial distribution of parameters

(𝛼 𝑆 , 𝑥)(withoutthezero-abundanceclass,becausethelattercannotbeobserved),as

the result of Poisson sampling from a large number S of Gamma-distributed species

abundances with shape parameter𝛼 𝑆and rate parameter 1− 𝑥 𝑥. The negative-

binomial distribution 𝑃!" can indeed be obtained as

𝑃!" 𝑘|𝛼 𝑆 , 𝑥 = 𝑃! 𝑘|𝜆 𝑝!(𝜆|𝛼 𝑆 , (1− 𝑥) 𝑥)𝑑𝜆!! , where 𝑃! and 𝑝! denote the

Poisson and Gamma distributions. Yet, if S species have abundances𝑛! identically

distributed as Gamma 𝛼 𝑆 ,𝜃 , their relative abundances𝑛! 𝑁 , where𝑁 = 𝑛!!!!! ,

followasymmetricDirichletdistributionwithconcentrationparameter𝛼 𝑆(Devroye,

1986).SinceFisherassumed𝛼 𝑆 ≪ 1toobtainthelog-series,thisindeedcorresponds

totheregimeofveryunevenrelativespeciesabundances.

b. TheDirichletprocessandtheEwenssamplingformula

As it is apparent in the case of Fisher’s log-series, a limitation of the Dirichlet

distribution as amean to describe species relative abundances is that it requires the

number S of species to be fixed in advance. It is hence appealing to generalize the

Dirichlet distributionbymakingS tend toward infinity. Let us consider the time step

𝑁 + 1of the Pólya urn sampling scheme with symmetric concentration parameter𝑎,

where N individuals have already been added to the original𝑆𝑎 individuals. The

probability to pick species i is(𝑛! + 𝑎) 𝑁 + 𝑆𝑎 , where𝑛! is the number of times

species ihasalreadybeenpicked.Hence, theprobability topickoneof the𝑆!species

that have already been picked at least once is (𝑁 + 𝑆!𝑎) 𝑁 + 𝑆𝑎 , while the

probability to pick one of the 𝑆 − 𝑆! species that have never been picked is

Page 46: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

42

(𝑆 − 𝑆!)𝑎 𝑁 + 𝑆𝑎 . If we simultaneously make S tend toward infinity and𝑎tend

toward 0, keeping the product𝑆𝑎 equal to a constant𝜃 , we obtain an infinite-

dimensional version of thePólya urn, called theHoppe urn,where the probability to

pickanexistingspeciesiattimestep𝑁 + 1is𝑛! 𝑁 + 𝜃 andtheprobabilitytopicka

newspeciesis𝜃 𝑁 + 𝜃 (Hoppe,1984).Afteraninfinitenumberoftimesteps,species

relative abundances are distributed according to a Dirichlet process of concentration

parameter𝜃anduniformbasedistribution,whichcanberegardedasthelimitoftheS-

dimensional symmetric Dirichlet distribution of concentration parameter𝜃 𝑆when S

tendstowardinfinity(Ferguson,1973;Tehetal.,2006).

More generally, a Dirichlet process of concentration parameter𝜃 and base

distribution𝒑 = 𝑝! !∈ℕ∗ can be regarded as the limit of the S-dimensional Dirichlet

distributionofconcentrationparameters 𝜃𝑝!,… ,𝜃𝑝! ,where 𝑝!!!!! = 1,whenStends

toward infinity (Ferguson, 1973). The infinite base distribution p is the distribution

fromwhich new species are sampled during the Hoppe urn scheme of parameter𝜃:

each new species is sampled from an infinite number of possible species labelswith

probabilityweightsp. If the basedistribution is uniform, as assumed in theprevious

paragraph,anever-encounteredlabelissimplyassignedtoeachnewspecies.

The Dirichlet process ismost intuitively understood by sampling from it. IfN

individuals are sampled from relative species abundances described by a Dirichlet

processofparameter𝜃anduniformbasedistribution,theirpartition Φ!,… ,Φ! intoS

species, whereΦ! is the number of species with abundance n, obeys the ‘Ewens

samplingformula’ofparameters(𝜃,𝑁)(Ewens,1972):

𝑃 Φ!,… ,Φ!|𝜃,𝑁 =𝑁!𝜃 !

1Φ!!

𝜃𝑛

!!!

!!!

where 𝜃 ! = Γ(𝜃 + 𝑁) Γ(𝜃).ThisformulaalsodescribesthepartitionofNindividuals

intoSspeciesobtainedbystoppingaHoppeurnschemeofparameter𝜃atstepN,thus

theDirichlet processdoesnot need to be explicitly defined for theEwens formula to

emerge from the Hoppe urn scheme. For a large enough sample, theΦ! are

approximatelydrawnfromindependentPoissonrandomvariableswithparameter𝜃 𝑛

Page 47: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

43

(Crane,2016).AremarkablepropertyoftheEwensformulaisthatityieldsasampling-

invariantdescriptionofrelativespeciesabundancescharacterizedbytheparameter𝜃:

indeed, any random subsample of𝑁! < 𝑁individuals taken from the initial sample

obeystheEwenssamplingformulaofparameters(𝜃,𝑁!).Moreover, theprobabilityof

observingSspeciesinasampleofNindividualsdoesnotdependontheexactpartition

but only on𝜃andN, as𝑃 𝑆 𝜃,𝑁 = 𝑠(𝑁, 𝑆)𝜃! 𝜃 ! ,where the function𝑠denotes the

absolutevalueof theStirlingnumbersof the first kind (Ewens,1972).Thus,𝜃canbe

regarded as a sampling-invariantmeasure of diversity in a species pool describedby

Ewenssamplingformula,irrespectiveofwhetherthisspeciespoolisfiniteorinfinite.

Neutralmodels4.

TheEwensformulawasfirstdiscoveredbyEwens(1972)inthecontextofpopulation

genetics. Indeed, it arises as the stationary distribution of allele frequency in the

Wright-Fisher andMoranmodels,whichdescribe theneutraldynamicsof alleles in a

population (Fisher, 1930; Wright, 1931; Moran, 1958; Wakeley, 2009). More

importantly for ecologists, the Ewens formula is also the stationary distribution of

species frequency in Hubbell’s neutral model of biodiversity, which was directly

inspired by population genetics (Hubbell, 2001). These models all bear some

resemblancetotheHoppeurnsamplingscheme,exceptthattheyaccountforthedeath

ofindividuals,sothatthetotalnumberofindividualsremainsconstantovertime.The

Wright-Fisher model assumes that all N individuals die at each time step and are

replacedbyanewgenerationofNnewindividuals.Theallelesofthesenewindividuals

aresampled(withreplacement)fromtheallelesinthepreviousgeneration,exceptfora

small probability in each new individual ofmutating into a never-encountered allele.

This translates into a demographic drift of allele frequency through time, and, over

longer time scales, by a turnover in thepoolof alleles through randommutationand

extinctionevents.TheMoranmodelissimilarbutassumesthanindividualsdieandare

replaced one at a time,which allows for overlapping generations. Hubbell’smodel is

almostidenticaltotheMoranmodel,exceptthatallelesarereinterpretedasspeciesand

Page 48: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

44

mutation as speciation, and that dying individuals cannot be replaced by their own

offspring. Even though all three models have the same stationary abundance

distribution described by Ewens formula, the exact expression of𝜃depends on the

model’sdynamics(Etienne&Alonso,2007).InHubbell’smodel,𝜃 = (𝑁 − 1) 𝜈 (1− 𝜈),

where𝜈isthespeciationprobabilityateachtimestep,i.e.theprobabilitythatthedying

individualisreplacedbyanewspecies.

ThekeyinnovationofHubbell’smodelcomparedtopopulationgeneticsmodels

isthatitalsoincludesthedescriptionofadispersal-limitedlocalcommunityconnected

to the regional metacommunity through immigration. The dynamics of the local

community is identical to that of the metacommunity, except that new species arise

through immigration instead of speciation: at each time step, an individual dies and

there isprobabilitym that thereplacing individualresults fromimmigration fromthe

metacommunity insteadof from local reproduction(if𝑚 = 1, there isno limitation to

dispersal). The difference is that unlike individuals arising through speciation, an

immigrating individual may belong to a species that is already present in the local

community.Thus,thestationarydistributionofspeciesfrequencyinalocalcommunity

of size N obeys a Ewens sampling formula of parameter𝐼 = (𝑁 − 1)𝑚 (1−𝑚) ,

modified to account for the fact that the immigrating ancestors to the current local

communityaresampledfromtheEwensformulaofparameter𝜃(Etienne&Olff,2004).

Theresultingtwo-layersamplingformulawasderivedbyEtienne(2005).The‘Etienne

sampling formula’ can also be regarded as the result of ‘dispersal-limited sampling’

fromEwens formula,which can be defined as a type of skewed sampling (Etienne&

Alonso, 2005). Importantly, Etienne formula still satisfies the sampling-invariance

propertyofEwensformula,i.e.anyrandomsubsampleof𝑁! < 𝑁individualswillfollow

theEtienneformulaofparameters 𝜃, 𝐼,𝑁! .

EwensandEtiennesamplingformulaallowforlikelihood-basedinferenceofthe

neutralparameters𝜃andI,aswellasforrigorousstatisticaltestsofmodelfit(cf.Fig.9;

Etienne & Olff, 2005; Etienne, 2007; Al Hammal et al., 2015). In practice, the

metacommunitycannotbedirectlyobservedand isusuallyregardedas infinite,while

thelocalcommunityisequatedwiththeobservedsampleofindividuals.Thus,𝜃andm

Page 49: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

45

are often chosen as model parameters instead of𝜃and I, reflecting the fact that the

numberofindividualsisknowninthelocalcommunitybutnotinthemetacommunity.I

canbeinterpretedastheeffectivenumberofindividualsinthemetacommunitythatare

in direct competition with the local community for reproduction. Furthermore, data

usuallyconsistofsamplesfromseverallocalcommunities.Thisconsiderablyincreases

statisticalpower,sincestatisticalinferencedoesnotonlyrelyontheshapeofthelocal

abundancedistributions,butalsoonthetaxonomicoverlapbetweenlocalcommunities.

Exact sampling formulas have been derived both for the case where all local

communitieshavethesameimmigrationparameterm(Etienne,2007)andforthecase

wheretheydonot(Etienne,2009).

Figure 9. Bayesian inference of neutral parameters based on the ‘Etienne sampling formula’:map showing the joint posterior probability density of𝜃andm for the tree abundance data(>10cmdbh)ofthe50-haBarroColoradoIslandmonitoredplot.AdaptedfromEtienne&Olff(2004).

As our approach focuses on the full multivariateprobability distribution P ½~JS jI ; H; J ", all information inthe data is used, or in other words, the curve-fitting exerciseis applied to the dominance-diversity (rank-abundance)curve. This is in contrast with McGill (2003) and Volkovet al. (2003) who lump abundances in (arbitrary) logarithmicabundance classes, and then fit a curve through the resultingspecies-abundance distribution.

In our analysis we only accounted for process error, theerror inherent in the stochastic model, and not formeasurement error (false identifications of species orincorrect abundance counts) that may play a very importantrole as well. The Bayesian framework is ideally suited toaccommodate such errors. This is however beyond thescope of this paper, as our primary goal is to present a newgenealogical modelling approach rather than a detailedBayesian data analysis.

We have presented a novel approach that shines newlights on the neutral model improving our understanding ofbiodiversity and particularly of the role of immigration. LikeVolkov et al. (2003), we expect our analysis to have

important consequences in population genetics andevolutionary biology, but in a more complete way thantheir approach, because our model predicts the relatednessof individuals and thus illustrates that genetic and speciesdiversity are inseparable aspects of biodiversity. It provides anew test of the neutral model: positive correlation betweengenetic and species diversity (Vellend 2003) may beinterpreted as evidence of neutral processes.

A C K N O W L E D G E M E N T S

We thank M. Boer, J. Chave, H. Heesterbeek, M. Rietkerk,M. Ritchie, P. de Ruiter and two anonymous reviewers forhelpful comments on earlier versions of the manuscript, andC. Bos, B. Carlin, M. Carlson, S. Chib, O. Diserud, P. Laud,I. Ntzoufras and A. McKane for clarifying their Bayesianand modelling approaches. Funding for Rampal Etienne wasprovided by the priority program !Biodiversity in DisturbedEcosystems" of the Netherlands Organization for ScientificResearch. Part of the work for this article was carried out atthe Tropical Nature Conservation and Vertebrate EcologyGroup, Wageningen University and Research Centre,Wageningen, The Netherlands, and at EnvironmentalScience, Faculty of Geosciences, Utrecht University,Utrecht, The Netherlands.

R E F E R E N C E S

Aitkin, M. (1991). Posterior Bayes factors. J. R. Stat. Soc. B, 53, 111–142.

Bell, G. (2001). Neutral macroecology. Science, 293, 2413–2418.

Figure 2 Joint posterior probability densityplot of the two model parameters h (funda-mental biodiversity number measuringregional diversity) and m (immigration prob-ability m :¼ I

I þ J % 1) for trees with diameterat breast height equal to or larger than 10 cmin the 1982 census of the Barro ColoradoIsland dataset (J ¼ 20 741, Condit et al.

1996). The maximum occurs at hopt ¼ 44.6and mopt ¼ 0.20 and the correlation coeffi-cient is r ¼ )0.61. The joint posteriordensity is obtained with a Metropolis–Hastings Markov Chain Monte Carloalgorithm, treating the elements of thespecies-ancestry abundance vector ~nSA aslatent variables. We used the Jeffreys priordistribution for h and I. Total sample size (offive parallel chains) and lag were 7 875 000and five iterations respectively.

Table 1 Interpretation of the Bayes factor B10 in comparing model1 to model 0 (Kass & Raftery 1995)

B10 Evidence against model 0

1–3 Not worth more than a bare mention3–20 Positive20–150 Strong>150 Very strong

174 R. S. Etienne and H. Olff

!2004 Blackwell Publishing Ltd/CNRS

Page 50: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

46

Despite the interest of exact sampling formulas for statistical inference,

approximate approaches have proved more practical in some instances. When the

number of samples is large enough, themetacommunity compositionmay be simply

approximated as the sum of all samples, instead of being explicitly modelled. In so

doing,onecanavoidmakinganyassumptiononthemetacommunitywhenestimating

immigration rates, or when testing the assumption of dispersal-limited neutral

community assembly (Sloan et al., 2006; Jabot et al., 2008; Harris et al., 2015).

Furthermore,whenabundancedataareunavailableorunreliable,theimmigrationrate

fromthemetacommunitymaybeestimatedsolelybasedontheoccurrenceofspecies

acrosssamples(Sloanetal.,2006).Alimitationofexactsamplingformulasisthattheir

computationisnumericallydemandingwhenthenumberofindividualsbecomeslarge.

Analternativeapproachisthentorepresentthesampleascontinuousspeciesrelative

abundances rather than in a fully discrete way. The species relative abundances

𝑥!,… , 𝑥! in a large dispersal-limited sample containing S species may be

approximatedasfollowingtheDirichletdistributionofparameters 𝐼𝑝!,… , 𝐼𝑝! ,where

𝑝!,… ,𝑝! aretherelativeabundancesofthoseSspeciesinthemetacommunity(Sloan

et al., 2007). In turn, 𝑝!,… ,𝑝! can be approximated as following the symmetric

Dirichlet distribution of parameter𝜃 𝑆(Woodcock et al., 2007). As is apparent from

sectionIII.b,thesecontinuousapproximationsmaybeextendedtothecaseofaninfinite

numberofspeciesSbymodellingtherelativeabundances𝒙inthelocalcommunityasa

Dirichlet process of parameter𝐼and base distribution𝒑 = 𝑝! !∈ℕ∗ , and the relative

abundancespinthemetacommunityasaDirichletprocessofparameter𝜃anduniform

base distribution (Harris et al., 2015). Such a model is referred to as a ‘hierarchical

Dirichletprocess’inthelanguageofmachinelearning.

While multivariate likelihood expressions are powerful tools for statistical

inference,theyaredifficulttovisualize,andone-dimensionalSADsmaybebettersuited

for intuitively understanding themodel’s behaviour. For instance, the non dispersal-

limitedSADinHubbell’smodelisequalto(Moran,1958;Vallade&Houchmandzadeh,

2003):

Page 51: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

47

𝔼 Φ!|𝜃,𝑁 =𝜃𝑛𝑁 + 1− 𝑛 !

𝑁 + 𝜃 − 𝑛 !

and converges toward Fisher’s log-serieswith𝜃 = 𝛼for a large enough numberN of

individuals(Chave,2004). Ingeneral, theSADcanberegardedas the firstmomentof

the multivariate sampling formula, since it is obtained as:

𝔼 Φ!|𝜃,𝑁 = Φ!𝑃 Φ!,… ,Φ!|𝜃,𝑁!!,…,!!| !!!!!!

!!!

A more straightforward approach to deriving this quantity is to express Hubbell’s

dynamical model through the approximate conditional transition probabilities

𝑃(𝑛 + 1|𝑛,𝜃),𝑃(𝑛 − 1|𝑛,𝜃)and𝑃(𝑛|𝑛,𝜃)thatagivenspecieswithcurrentabundancen

will have abundances𝑛 + 1 ,𝑛 − 1 , or n at the next time step, respectively. The

stationaryprobabilitydistributionofthis‘masterequation’thenprovidesanestimateof

𝔼 Φ!|𝜃,𝑁 ,oncemultipliedbytheobservednumberSofspeciesinthesample(Volkov

etal.,2003;Alonso&McKane,2004;McKaneetal.,2004;O’Dwyeretal.,2009).Unlike

the exact ‘genealogical’ approach described above, this approach typical of statistical

physicsdoesnot explicitly account for the interdependencebetween species, induced

by the constraint of a fixed total number of individuals through time (‘mean field’

approach). While this constraint was originally deemed a key element of the model

since it accounts for competition between species (Hubbell, 2001), both the

genealogical and themaster equation approaches have been found to yield the same

SADexpressionforalargeenoughsample(Etienneetal.,2007).

Categoricalmixturemodels5.

Letusassumethattherelativeabundances𝒙 = 𝑥!,… , 𝑥! ofSspecies,with 𝑥!!!!! = 1,

follow a Dirichlet distribution of parameters 𝒂 = 𝑎!,… ,𝑎! . The categorical

distributiondescribesthechoiceofoneoutofSspecies(orcategories)withprobability

weights𝒙. Itcanberegardedasaspecialcaseofthemultinomialdistribution,defined

as𝑃 𝒏 𝑁,𝒙 = 𝑁! 𝑛!!…𝑛!! 𝑥!!! … 𝑥!

!! , which describesmore generally the outcome

Page 52: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

48

ofNsuccessivecategoricaldrawswithprobabilityweights𝒙.Aremarkablepropertyof

the Dirichlet distribution is that it is the conjugate prior of the categorical and

multinomialdistributions.Namely, ifamultinomialsample𝒏 = 𝑛!,… ,𝑛! isobserved

from𝒙,with 𝑛!!!!! = 𝑁, then theposteriordistributionof𝒙given theobservations𝒏

still follows a Dirichlet distribution, but with parameters updated to𝒂+ 𝒏 = 𝑎! +

𝑛!,… ,𝑎! + 𝑛! to account for the observations. Fundamentally, this means that the

Dirichlet distribution is the “naturaldistributionoccurringwhentheprobability thata

forthcomingobservationisofcertainclassonlydependsonthenumberoftimesthisclass

hasalreadybeenobservedandonthetotalnumberofobservationsmadesofar” (Crane,

2016).

The posterior distribution of 𝒙 given the observations 𝒏 is defined as

𝑝 𝒙|𝒏,𝑁,𝒂 = 𝑃 𝒏 𝑁,𝒙 𝑝 𝒙 𝒂 /𝑃(𝒏|𝑁,𝒂) . The marginal likelihood 𝑃 𝒏|𝑁,𝒂 =

𝑃 𝒏 𝑁,𝒙 𝑝 𝒙 𝒂 𝑑𝒙𝒙 is the ‘Dirichlet-multinomial’ distribution of parameters 𝑁,𝒂 ,

i.e. the distribution of a N-individual multinomial sample with Dirichlet-distributed

probability weights of parameters𝒂. The Dirichlet-multinomial distribution can be

regardedasafinite-dimensionalversionoftheEwensformula(Crane,2016).

Because theDirichlet distribution is the conjugate prior of the categorical and

multinomial distributions, it is the natural prior in any probabilisticmodel involving

categorical or multinomial sampling from discrete classes. This is the case of

‘categoricalmixturemodels’,whichdescribeobservationsassampledfromamixtureof

K classes,with ‘mixtureweights’𝜽𝒌 = 𝜃! !∈ !,! , verifying 𝜃!!!!! = 1. The different

classes are typically not directly observable: instead, each is characterized by a

probabilitydistributionofparameters𝝓𝒌fromwhichall theobservationsassigned to

classkaresampled.Theprobabilitydistributionassociatedwitheachclassmaybefor

instance Gaussian if observations are continuous, or categorical if observations are

discrete.Ifthegoalofstatisticalinferenceistocapturedatastructure,thefocuswillbe

onestimatingthemixtureweights𝜽𝒌ofthedifferentclassesgiventheobservations,as

wellas theparameters𝝓𝒌of theprobabilitydistributionassociatedwitheachclass. If

thegoalistoclustertheobservations(ortoclassifythem,ifinferenceisconductedina

supervisedway),thefocuswillbeonassigningtoeachobservationitsmostlikelyclass

Page 53: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

49

k. In a fully Bayesian setting, the parameters𝝓𝒌 of the probability distribution

associatedwithclasskmayalsobegivenapriordistribution,suchasaDirichletpriorin

thecaseofcategoricalobservations.Inthislattercase,themodelmaybereferredtoas

a‘Dirichletmixturemodel’,sinceitdescribesobservationsascategoricalormultinomial

samplesfromamixtureofDirichlet-distributedclasses.

Thisfamilyofmodelshasfoundapplicationsinmanyfields.Inparticular,Holmes

etal.(2012)appliedsuchamodeltoinvestigatethestructureofmicrobialcommunities

sampled by environmentalDNA sequencing. They assume that each sample is a local

community belonging to one of K possible classes, which they interpret as

‘metacommunities’.Toeachofthesemetacommunitiesisassignedamixtureweight𝜃! ,

whichistheprobabilityforasampletooriginatefromit.Ametacommunitykisdefined

byprobabilityweights𝝓𝒌 = 𝜙!! !∈ !,!over theSOTUsobserved in thedataset, from

which local OTU abundances are sampled. These probability weights are themselves

Dirichlet-distributed with parameters𝒂𝒌 = 𝑎!! !∈ !,!. For practical purposes, the

parameters 𝑎!! may be further assumed to follow a ‘hyperprior’ distribution

parameterizedby‘hyperparameters’,soastoreducethenumberoffixedparametersto

estimate.

A version of this model was introduced earlier in population genetics, and

implemented in the software Structure (Pritchard et al., 2000). In the context of

population genetics, each sample is an individual, each class is a population, and

observations consist in the alleles found at a number of loci in each individual. As a

consequence,themodelexhibitsafewminordifferencescomparedtothatofHolmeset

al.(2012).SincethereareLobservedlociperindividual,eachclasskisnotdefinedby

onedistribution,butbyLdistributions𝝓𝒌,𝒍 = 𝜙!!,!

!∈ !,!!overthe𝑆! possibleallelesat

locusl,eachofthesedistributionshavingDirichletprior.Moreover,onlyonecategorical

draw from𝝓𝒌,𝒍is observed at each locus in each individual, instead of amultinomial

sample.

Page 54: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

50

Figure 10. Valle et al. (2014) applied Latent Dirichlet Allocation to identify forest treeassemblages in theEasternUnitedStates,basedon treecensusdata from34,174 forestplots.Maps show the relative proportion of each of the𝐾 = 11LDA classes in each forest plot.AdaptedfromValleetal.(2014).

In the same paper, Pritchard et al. (2000) proposed a second slightly more

sophisticatedmodel,whichincludesthepossibilityofadmixturebetweenpopulations.

This is achieved by relaxing the assumption that each individualm originates from a

single population, and by assuming instead that it originates from a mixture of K

populationswithindividual-specificweights𝜽𝒎 = 𝜃!! !∈ !,! .Asinthemodelwithout

admixture, the K populations are each defined by a single set of L distributions𝝓𝒌,𝒍

acrossthedataset.Thus,eachobservedalleleistheresultofacategoricaldrawfromthe

0.00 0.01 0.02 0.03 0.04 0.05 0.06

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Minnesota

0.74

0.05 0.10 0.15 0.20 0.25 0.30 0.35

0.05

0.10

0.15

0.20

0.25

0.30

0.35

Indiana

0.7

Rel

ativ

e ab

unda

nce

(200

8–20

12)

Relative abundance (1999–2003) Relative abundance (1998)

(a)

(b)

Figure 4 Spatial and temporal patterns for tree plots in Eastern United States. Panel (a) depicts the spatial distribution of the proportion of eachcomponent community. Each subpanel corresponds to a component community (numbers in the lower right corner, see Table S1) except for the colour keyin the lower right. Panel (b) shows temporal patterns of relative abundance of the oak community in Minnesota (left subpanels) and Indiana (rightsubpanels). Upper subpanels show the relative abundance of community 4 in earlier and current forest inventories (numbers in red are the proportion ofplots indicating a decline in relative abundance). Lower subpanels show the spatial distribution of this decline, based on an inverse-distance weightedinterpolation. Data from Minnesota refer only to re-measured plots while data from Indiana were grouped into latitude–longitude bins because no plotswere re-measured. Only bins with at least four plots in 1998 and 2008–2012 are used.

© 2014 The Authors. Ecology Letters published by John Wiley & Sons Ltd and CNRS.

Letter A new multivariate tool for biodiversity data 1597

Page 55: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

51

individual-specific weights𝜽𝒎 , followed by a second categorical draw from the

population-specificandlocus-specificweights𝝓𝒌,𝒍.Anotherversionofthismodelwith

admixture was independently proposed under the name ‘Latent Dirichlet Allocation’

(LDA) by Blei et al. (2003) in the field of natural language processing, a subfield of

machinelearning,toaddresstheproblemof‘topicmodelling’.Inthiscontext,theaimof

themodel is to decompose text documents into topics based on their word content.

Each class or topick is defined by its distribution𝝓𝒌 = 𝜙!! !∈ !,!over theS distinct

words observed in the whole text corpus, and each documentm is a mixture, with

document-specificweights𝜽𝒎,ofmultinomialsamplesfromthesedistributions.

This model with admixture has proved very successful and has been

subsequently extended, both in its population genetics version (Falush et al., 2003,

2007;Hubiszetal.,2009)andinitstopicmodellingversion(Griffiths&Steyvers,2004;

Rosen-Zvietal.,2004;Tehetal.,2006;Blei,2012).Thelatter(LDA)hasbeenappliedto

awiderangeofdomainspertainingtomachinelearningwhereitsabilitytohandlelarge

andcomplexdatasetshasbeenpraised,includingsatelliteimageprocessing(Vaduvaet

al., 2013), bioinformatics (Liu et al., 2010), fraud detection in telecommunications

(Olszewski, 2012) and social sciences (Mauch et al., 2015). In particular, it has been

recently applied to spatially and temporally explicit forest tree composition data in

ecology, where its ability to decompose samples into classes learnt over the whole

datasetallowsforcapturingsmoothspatialandtemporalgradientsacrossthesamples

(cf.Fig.10;Valleetal.,2014).Relatedmodelshavealsobeenappliedtothedetectionof

different source environments in microbial community samples, with a focus on

supervised inference: Knights et al. (2011) applied this approach to the detection of

contamination in amedical environment,while Shafieietal. (2015)proposedamore

sophisticated two-layer model, where each class is itself a mixture of higher-level

classes.

Asinthecaseofneutralmodels,alimitationofDirichlet-multinomialmodelsis

that the number of classesmust be specified in advance. A number ofmethods have

beenusedtohelpselect thenumberofclasses(Airoldietal.,2010).Nevertheless, the

most rigorous approach is to design a model with a potentially infinite number of

Page 56: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

52

classes,anapproachreferredtoas‘nonparametricBayesian’,sincethesizeofthemodel

is not fixed in advance by a parameter. This can be achieved by setting a Dirichlet

processprioroverthemixtureweights,sincetheDirichletprocessis,liketheDirichlet

distribution,conjugate to thecategoricalandmultinomialdistributions (Crane,2016).

ThisamountstomakingthenumberKofclassestendtowardinfinity.

In the infinite-dimensional extension of the model without admixture, the

mixture weights𝜽 = 𝜃! !∈ℕ∗ over classes follow a Dirichlet process of uniform base

distributionoverclasslabels,whileeachclassk isdefinedasinthefinite-dimensional

case by its distribution𝝓𝒌 = 𝜙!! !∈ !,!over the S possible observations (Teh et al.,

2006).Inthemodelwithadmixturehowever,ahierarchicalDirichletprocessneedsto

be defined. Indeed, if an independent Dirichlet process of uniform base distribution

weretobeassignedineachsamplemasapriortothemixtureweights𝜽𝒎 = 𝜃!! !∈ℕ∗ ,

twodocumentswouldnothaveanyclassincommon.Thus,intheinfinite-dimensional

extension of the model with admixture, the mixture weights𝜽𝒎in each sample m

originate from a Dirichlet process of base distribution𝜷 over classes, while the

distribution𝜷follows itselfaDirichletprocessofuniformbasedistributionoverclass

labels (Teh et al., 2006). Likewise, two local communities in the infinite-dimensional

approximationofHubbell’sneutralmodelwouldnothaveanyspeciesincommonifnot

forthehierarchicalDirichletprocessconstruction(cf.sectionIII.4).

Page 57: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

53

IV. Objectivesandoutline

Objectives1.

MostofEarth’sbiodiversityisconcentratedinafewhyperdiverseecosystems,suchas

tropicalforests.Yet,themechanismsthatpermitthecoexistenceofsuchalargenumber

ofspeciesarenotfullyunderstood.Inparticular,therelativeinfluenceofdeterministic

niche processes and stochastic dispersal limitation has long been debated. One

approach to address this question is through the study of integrative biodiversity

patterns, such as the distribution of species abundances and the turnover of species

compositionthroughspace.Atatimewhenhumanactivitiesthreatenbothbiodiversity

and the associated ecosystems, a better understanding of these patterns and of the

underlyingmechanismsismuchneeded.

Amajorobstacleliesinthedifficultytomeasurebiodiversity.Indeed,ithaslong

reliedondirecthumanobservation.However,recenttechnologicaladvancesnowmake

automateddatacollectionpossible,whichcouldalleviatethisproblem.Environmental

DNA sequencing is especially promising for improving our understanding of

biodiversity patterns. Indeed, it eases and standardizes the measurement of

biodiversity, increases the amount of available data by orders of magnitude, and

dramaticallyexpandstherangeofaccessibletaxa.Inparticular,itallowsfortakinginto

accountmicrobialdiversity,arguablythe‘hiddenpartofthebiodiversityiceberg’.

Nevertheless,takingadvantageofthisnewtypeofdataischallenging.First,the

rangeofinformationtypesthatcanbecollectedisrestricted,inthatnocomplementary

measurements,suchassizeforinstance,canbemadeonorganisms.Inmostcases,even

taxonomic information is relatively impreciseowing to the lackof referencedatabase

for the retrieved DNA sequences. Thus, inference is mostly based on patterns of

unidentified OTUs. Second, because observations are indirect and noisy, their

interpretation isnotas straightforwardas in thecaseofdirect censusesof individual

Page 58: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

54

organisms. Third, the high diversity of microbial communities makes for large and

sparsedatasets,towhichexistingstatisticalapproachesarenotwellsuited.

Theoverarchinggoalof this thesiswas to investigatehowenvironmentalDNA

sequencing, and more generally the automated collection of ecological data, could

contribute to our understanding of biodiversity patterns and of their underlying

mechanisms.Thisworkwasmotivatedbytwoobservations.First,theoreticalmodelsin

ecologyareforthemostpartnotorientedtowardcomparisonwithdata,andwhenthey

are,asinthecaseofHubbell’sneutralmodel,theyarecentredonindividualorganisms,

which hampers their comparison to environmental DNA data. Second, existing

statisticalmethodsinecologyhavelimitationsintheirabilitytotacklesuchdata.Thus,

thisworkhasanimportantmethodologicalcomponent.Asecondgoalofthisthesiswas

toapply thedevelopedapproaches tosoilDNAdatacollected in the forestsofFrench

Guiana, so as to better understand community assembly in tropical forests. This

includesadatasetthatwascollectedaspartofthisthesis.

Outline2.

The first chapter addresses the issue of measuring beta diversity patterns from

environmentalDNAdata, andof using thesepatterns todisentangledispersal-limited

andniche-basedprocessesacrossthedifferentdomainsoflife.Tothisend,asoilDNA

datasetwascollectedinFrenchGuiana,inforestplotsthatareapproximatelyregularly

spacedona logarithmic scale.A rangeof soilpropertieswasalsomeasured from the

soil samples. Three approaches are compared: distance-based analyses using

dissimilaritymetrics, raw-data analyses usingmultivariate ordination, and fitting the

neutral prediction for the decay of taxonomic similarity with distance. These

approaches are typical of those used to analyse classical biodiversity census data. In

addition,theeffectonhumandisturbancethroughloggingisassessed,basedonamore

limitednumberofplotspresentingagradientofloggingintensities.

Page 59: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

55

Thesecondchapterfocusesonspeciesabundancedistributionsmeasuredfrom

environmentalDNAdata,andaddresses theproblemofcomparing thispattern to the

prediction of Hubbell’s neutral model. Indeed, it was unknown to what extent this

patternmay remain informative in spiteof thepotentialnoise. Simulation results are

presented, that quantify how the estimates of the neutral diversity and dispersal

parameters are biased when inferred from environmental DNA data. A benchmark

datasetoflimitedextentisusedtoassessthelevelofnoisethatistobeexpectedinreal

data.

Like the first chapter, the third chapter discusses spatial patterns in

environmentalDNAdata, but itproposesanapproachdiffering from those classically

followed in ecology. It investigates the potential of amodel-based statisticalmethod,

Latent Dirichlet Allocation, to decompose the data into assemblages of spatially co-

occurring OTUs. In addition, a method is proposed to measure the stability of the

decomposition.Theapproachistestedthroughsimulations,andbyapplyingittoalarge

soilDNAdataset.Thisdataset followsaregularspatialsamplingschemeovera forest

plot,andwascollectedinFrenchGuianabeforethestartofthisthesis.Theinsightson

soilcommunitystructureprovidedbytheapproacharediscussed,makinguseofLidar

measurementsofenvironmentalfeatures.

Finally, the discussion provides a synthesis of the results, and discusses the

perspectivesarisingfromthisthesis.

Page 60: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

56

References

Airoldi, E.M., Erosheva, E.A., Fienberg, S.E., Joutard, C., Love, T. & Shringarpure, S. (2010)ReconceptualizingtheclassificationofPNASarticles.PNAS,107,20899–20904.

Akaike, H. (1974) A new look at the statistical model identification. IEEE transactions onautomaticcontrol,19,716–723.

AlHammal,O.,Alonso,D.,Etienne,R.S.&Cornell,S.J.(2015)WhenCanSpeciesAbundanceDataRevealNon-neutrality?PlosComputationalBiology,11,23.

Alonso,D.,Etienne,R.S.&McKane,A.J.(2006)Themeritsofneutraltheory.TrendsinEcology&Evolution,21,451–457.

Alonso, D. & McKane, A.J. (2004) Sampling Hubbell’s neutral theory of biodiversity. EcologyLetters,7,901–910.

Alonzo, M., Bookhagen, B. & Roberts, D.A. (2014) Urban tree species mapping usinghyperspectralandlidardatafusion.RemoteSensingofEnvironment,148,70–83.

Andersen,K.,Bird,K.L.,Rasmussen,M.,Haile, J., Breuning-Madsen,H.,Kjaer,K.H.,Orlando, L.,Gilbert,M.T.P. &Willerslev, E. (2012)Meta-barcoding of “dirt” DNA from soil reflectsvertebratebiodiversity.MolecularEcology,21,1966–1979.

Aristotle(IVthcent.BC)HistoryofAnimals.Armstrong, R.A. & McGehee, R. (1980) Competitive exclusion. The American Naturalist, 115,

151–170.Arnoldi, J.-F., Loreau, M. & Haegeman, B. (2016) Resilience, reactivity and variability: A

mathematicalcomparisonofecologicalstabilitymeasures.JournalofTheoreticalBiology,389,47–59.

Arrhenius,O.(1921)Speciesandarea.JournalofEcology,9,95–99.Baas Becking, L.G.M. (1934) Geobiologie of inleiding tot demilieukunde., W.P. Van Stockum &

Zoon,TheHague,theNetherlands.Bayes,M.&Price,M.(1763)Anessaytowardssolvingaprobleminthedoctrineofchances.by

thelaterev.mr.bayes,frscommunicatedbymr.price,inalettertojohncanton,amfrs.PhilosophicalTransactions(1683-1775),370–418.

Bentley,D.R.,Balasubramanian,S.,Swerdlow,H.P.,Smith,G.P.,Milton,J.,Brown,C.G.,Hall,K.P.,Evers,D.J.,Barnes,C.L.&Bignell,H.R.(2008)Accuratewholehumangenomesequencingusingreversibleterminatorchemistry.nature,456,53.

Bik,H.M.,Porazinska,D.L.,Creer,S.,Caporaso,J.G.,Knight,R.&Thomas,W.K.(2012)Sequencingour way towards understanding global eukaryotic biodiversity. Trends in Ecology &Evolution,27,233–243.

Bishop, C.M. (2006) PatternRecognition andMachine Learning, Springer. Michael Jordan, JonKleinberg,BernhardSchölkopf.

Blackwell,D.&MacQueen,J.B.(1973)FergusondistributionsviaPólyaurnschemes.Theannalsofstatistics,353–355.

Blei, D. (2012) Probabilstic Topic Models. Communication of the Association for ComputingMachinery,55,77–84.

Blei,D.M.,Ng,A.Y.&Jordan,M.I.(2003)LatentDirichletAllocation.JournalofMachineLearningResearch,3,993–1022.

Page 61: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

57

Bloomfield, N.J., Knerr, N. & Encinas-Viso, F. (2017) A comparison of network and clusteringmethodstodetectbiogeographicalregions.Ecography.

Bohmann,K.,Evans,A.,Gilbert,M.T.P.,Carvalho,G.R.,Creer,S.,Knapp,M.,Yu,D.W.&deBruyn,M.(2014)EnvironmentalDNAforwildlifebiologyandbiodiversitymonitoring.TrendsinEcology&Evolution,29,358–367.

Braun-Blanquet&Pavillard(1922)Vocabulairedesociologievégétale.Brook,B.W.,Ellis,E.C.,Perring,M.P.,Mackay,A.W.&Blomqvist,L. (2013)Does the terrestrial

biospherehaveplanetarytippingpoints?TrendsinEcology&Evolution,28,396–401.Brown,J.H.(1995)Macroecology,UniversityofChicagoPress.Brown,J.H.(1978)Thetheoryofinsularbiogeographyandthedistributionofborealbirdsand

mammals.GreatBasinNaturalistMemoirs,209–227.Brown, J.H. & Kodric-Brown, A. (1977) Turnover Rates in Insular Biogeography: Effect of

ImmigrationonExtinction.Ecology,58,445–449.Brown,W.L.&Wilson,E.O.(1956)Characterdisplacement.Systematiczoology,5,49–64.Burnham,K.P.&Anderson,D.R.(2002)Modelselectionandmultimodelinference,Springer,New

York.Cabeza, M. & Moilanen, A. (2001) Design of reserve networks and the persistence of

biodiversity.TrendsinEcology&Evolution,16,242–248.Carpenter, S.R., Cole, J.J., Pace, M.L., Batt, R., Brock, W.A., Cline, T., Coloso, J., Hodgson, J.R.,

Kitchell,J.F.,Seekell,D.A.,Smith,L.&Weidel,B.(2011)EarlyWarningsofRegimeShifts:AWhole-EcosystemExperiment.Science,332,1079.

Caswell,H. (1976)Community structure - neutralmodel analysis.EcologicalMonographs,46,327–354.

Chase, J.M. & Leibold, M.A. (2003) Ecological niches: linking classical and contemporaryapproaches,UniversityofChicagoPress.

Chave,J.(2004)Neutraltheoryandcommunityecology.EcologyLetters,7,241–253.Chave,J.(2013)Theproblemofpatternandscaleinecology:whathavewelearnedin20years?

EcologyLetters,16,4–16.Chave, J.,Alonso,D.&Etienne,R.S. (2006)Theoreticalbiology -Comparingmodelsof species

abundance.Nature,441,E1–E1.Chave, J. & Leigh, E.G. (2002) A spatially explicit neutral model of beta-diversity in tropical

forests.TheoreticalPopulationBiology,62,153–168.Chave, J., Muller-Landau, H.C. & Levin, S.A. (2002) Comparing classical community models:

Theoreticalconsequencesforpatternsofdiversity.AmericanNaturalist,159,1–23.Chesson,P. (2000)Mechanismsofmaintenanceof speciesdiversity.AnnualReviewofEcology

andSystematics,31,343–+.Chisholm, R.A. & Pacala, S.W. (2010) Niche and neutral models predict asymptotically

equivalent species abundance distributions in high-diversity ecological communities.Proceedings of the National Academy of Sciences of the United States of America, 107,15821–15825.

Clements, F.E. (1916) Plant succession: an analysis of the development of vegetation, CarnegieInstitutionofWashington.

Condit, R., Pitman, N., Leigh, E.G., Chave, J., Terborgh, J., Foster, R.B., Nunez, P., Aguilar, S.,Valencia,R.,Villa,G.,Muller-Landau,H.C.,Losos,E.&Hubbell,S.P.(2002)Beta-diversity

Page 62: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

58

intropicalforesttrees.Science,295,666–669.Connell, J.H. (1983) On the prevalence and relative importance of interspecific competition:

evidencefromfieldexperiments.TheAmericanNaturalist,122,661–696.Connell,J.H.(1970)Ontheroleofnaturalenemiesinpreventingcompetitiveexclusioninsome

marineanimalsandinrainforesttrees.Dynamicsofpopulations.Connolly,S.R.,Hughes,T.P.&Bellwood,D.R.(2017)Aunifiedmodelexplainscommonnessand

rarityoncoralreefs.EcologyLetters.Cordero,O.X.,Wildschutte,H.,Kirkup,B.,Proehl,S.,Ngo,L.,Hussain,F.,LeRoux,F.,Mincer,T.&

Polz, M.F. (2012) Ecological Populations of Bacteria Act as Socially Cohesive Units ofAntibioticProductionandResistance.Science,337,1228.

Cox,C.B.,Moore,P.D.&Ladle,R.(2016)Biogeography:anecologicalandevolutionaryapproach,JohnWiley&Sons.

Crane,H.(2016)TheUbiquitousEwensSamplingFormula.StatisticalScience,31,1–19.Curtis, T.P. & Sloan, W.T. (2005) Exploring microbial diversity - A vast below. Science, 309,

1331–1333.Daily,G.(1997)Nature’sservices:societaldependenceonnaturalecosystems,IslandPress.Darwin,C.(1859)OntheOriginofSpeciesbyMeansofNaturalSelection.Davies, T.J., Allen, A.P., Borda-de-Agua, L., Regetz, J.&Melian, C.J. (2011)Neutral biodiversity

theory can explain the imbalance of phylogenetic trees but not the tempo of theirdiversification.Evolution.

Devroye, L. (1986) Sample-based non-uniform random variate generation. Proceedings of the18thconferenceonWintersimulation,pp.260–265.ACM.

Diamond,J.M.(1975)Assemblyofspeciescommunities.EcologyandEvolutionofCommunities,pp.342–444.Cody,M.L.&Diamond,J.M.,Cambridge,MA.

Doughty,C.E.,Wolf,A.&Malhi,Y.(2013)ThelegacyofthePleistocenemegafaunaextinctionsonnutrientavailabilityinAmazonia.NatureGeoscience,6,761–764.

Du, X., Zhou, S. & Etienne, R.S. (2011) Negative density dependence can offset the effect ofspecies competitive asymmetry: A niche-based mechanism for neutral-like patterns.JournalofTheoreticalBiology,278,127–134.

Etienne, R.S. (2007) A neutral sampling formula for multiple samples and an “exact” test ofneutrality.EcologyLetters,10,608–618.

Etienne,R.S. (2005)Anewsampling formula forneutralbiodiversity.EcologyLetters,8,253–260.

Etienne,R.S. (2009)Maximumlikelihoodestimationofneutralmodelparameters formultiplesamples with different degrees of dispersal limitation. Journal of Theoretical Biology,257,510–514.

Etienne, R.S. & Alonso, D. (2005) A dispersal-limited sampling theory for species and alleles.EcologyLetters,8,1147–1156.

Etienne,R.S.&Alonso,D. (2007)Neutral community theory:Howstochasticityanddispersal-limitationcanexplainspeciescoexistence.JournalofStatisticalPhysics,128,485–510.

Etienne,R.S.,Alonso,D.&McKane,A.J.(2007)Thezero-sumassumptioninneutralbiodiversitytheory.JournalofTheoreticalBiology,248,522–536.

Etienne, R.S. & Olff, H. (2004) A novel genealogical approach to neutral biodiversity theory.EcologyLetters,7,170–175.

Page 63: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

59

Etienne,R.S.&Olff,H.(2005)Confrontingdifferentmodelsofcommunitystructuretospecies-abundancedata:aBayesianmodelcomparison.EcologyLetters,8,493–504.

Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical populationbiology,3,87–112.

Falush, D., Stephens, M. & Pritchard, J.K. (2007) Inference of population structure usingmultilocusgenotypedata:dominantmarkersandnullalleles.MolecularEcologyNotes,7,574–578.

Falush, D., Stephens, M. & Pritchard, J.K. (2003) Inference of population structure usingmultilocus genotype data: Linked loci and correlated allele frequencies.Genetics,164,1567–1587.

Ferguson, T.S. (1973) A Bayesian analysis of some nonparametric problems. The annals ofstatistics,209–230.

Fisher,C.K.&Mehta,P.(2014)Thetransitionbetweenthenicheandneutralregimesinecology.Proceedings of the National Academy of Sciences of the United States of America, 111,13111–13116.

Fisher, J.B., Huntzinger, D.N., Schwalm, C.R. & Sitch, S. (2014) Modeling the TerrestrialBiosphere.AnnualReviewofEnvironmentandResources,39,91–123.

Fisher,R.A.(1925)Statisticalmethodsforresearchworkers,GenesisPublishingPvtLtd.Fisher,R.A.(1930)Thegeneticaltheoryofnaturalselection:acompletevariorumedition,Oxford

UniversityPress.Fisher,R.A.,Corbet,A.S.&Williams,C.B.(1943)Therelationbetweenthenumberofspeciesand

thenumberof individuals ina randomsampleofananimalpopulation.TheJournalofAnimalEcology,42–58.

Fortunato,S.(2010)Communitydetectioningraphs.PhysicsReports.Fuhrman, J.A. (2009) Microbial community structure and its functional implications.Nature,

459,193–199.Gause, G.F. (1932)Experimental studies on the struggle for existence: 1.Mixedpopulation of

twospeciesofyeast.JournalofExperimentalBiology,9,389–402.Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B. (2014) Bayesian data analysis, Chapman &

Hall/CRCBocaRaton,FL,USA.Ghalambor,C.K.,Hoke,K.L.,Ruell,E.W.,Fischer,E.K.,Reznick,D.N.&Hughes,K.A.(2015)Non-

adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature.Nature,525,372–375.

Gibson,J.,Shokralla,S.,Porter,T.M.,King,I.,vanKonynenburg,S.,Janzen,D.H.,Hallwachs,W.&Hajibabaei,M.(2014)Simultaneousassessmentofthemacrobiomeandmicrobiomeinabulk sample of tropical arthropods through DNA metasystematics. Proceedings of theNationalAcademyofSciencesoftheUnitedStatesofAmerica,111,8007–8012.

Gilbert, J.A., Jansson, J.K. & Knight, R. (2014) The Earth Microbiome project: successes andaspirations.BmcBiology,12.

Giovannoni,S.J.,Britschgi,T.B.,Moyer,C.L.&Field,K.G.(1990)GeneticdiversityinSargassoSeabacterioplankton.Nature,345,60–63.

Gleason,H.A. (1926)The individualisticconceptof theplantassociation.BulletinoftheTorreyBotanicalClub,7–26.

Gourlet-Fleury,S.,Ferry,B.,Molino, J.-F.,Petronelli,P.&Schmitt,L. (2004)Experimentalplots:

Page 64: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

60

keyfeatures,Elsevier.Gravel,D.,Canham,C.D.,Beaudet,M.&Messier,C.(2006)Reconcilingnicheandneutrality:the

continuumhypothesis.EcologyLetters,9,399–409.Griffiths,T.&Steyvers,M.(2004)CollapsedGibbsSamplingforLDA.101,5228–5235.Grinnell,J.(1917)Theniche-relationshipsoftheCaliforniaThrasher.TheAuk,34,427–433.Guisan, A. & Thuiller, W. (2005) Predicting species distribution: offering more than simple

habitatmodels.EcologyLetters,8,993–1009.Hansen,M.C., Potapov, P.V.,Moore, R., Hancher,M., Turubanova, S.A., Tyukavina, A., Thau, D.,

Stehman, S.V., Goetz, S.J., Loveland, T.R., Kommareddy, A., Egorov, A., Chini, L., Justice,C.O. & Townshend, J.R.G. (2013) High-Resolution Global Maps of 21st-Century ForestCoverChange.Science,342,850–853.

Hanson,C.A.,Fuhrman,J.A.,Horner-Devine,M.C.&Martiny,J.B.H.(2012)Beyondbiogeographicpatterns:processesshapingthemicrobiallandscape.NatureReviewsMicrobiology.

Harris,K.,Parsons,T.L.,Ijaz,U.Z.,Lahti,L.,Holmes,I.&Quince,C.(2015)Linkingstatisticalandecological theory: Hubbell’s Unified Neutral Theory of Biodiversity as a HierarchicalDirichletProcess.Proc.IEEE,PP,1–14.

Hebert,P.D.N.,Cywinska,A.,Ball,S.L.&DeWaard,J.R.(2003)BiologicalidentificationsthroughDNAbarcodes.ProceedingsoftheRoyalSocietyB-BiologicalSciences,270,313–321.

Heckenberger,M.J.,Russell,J.C.,Fausto,C.,Toney,J.R.,Schmidt,M.J.,Pereira,E.,Franchetto,B.&Kuikuro,A.(2008)Pre-ColumbianUrbanism,AnthropogenicLandscapes,andtheFutureoftheAmazon.Science,321,1214.

Hillebrand,H.(2004)Onthegeneralityofthelatitudinaldiversitygradient.AmericanNaturalist,163,192–211.

Holmes,I.,Harris,K.&Quince,C.(2012)DirichletMultinomialMixtures:GenerativeModelsforMicrobialMetagenomics.PlosOne,7.

Holt,R.D.(2006)Emergentneutrality.TrendsinEcology&Evolution,21,531–533.Holt, R.D. (1996) Food Webs in Space: An Island Biogeographic Perspective. Food Webs:

IntegrationofPatterns&Dynamics(ed.byG.A.Polis)andK.O.Winemiller),pp.313–323.SpringerUS,Boston,MA.

Hoppe, F.M. (1984)Polya-likeurns and theEwens sampling formula. JournalofMathematicalBiology,20,91–94.

Houchmandzadeh, B. (2009) Theory of neutral clustering for growing populations. PhysicalReviewE,80,8.

Hubbell, S.P. (1997)Aunified theory of biogeography and relative species abundance and itsapplicationtotropicalrainforestsandcoralreefs.CoralReefs,16,S9–S21.

Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32),PrincetonUniversityPress.

Hubbell,S.P.(1979)Treedispersion,abundance,anddiversityinatropicaldryforest.Science,203,1299–1309.

Hubisz,M.J.,Falush,D.,Stephens,M.&Pritchard,J.K.(2009)Inferringweakpopulationstructurewiththeassistanceofsamplegroupinformation.MolecularEcologyResources,9,1322–1332.

Hug,L.A.,Baker,B.J.,Anantharaman,K.,Brown,C.T.,Probst,A.J.,Castelle,C.J.,Butterfield,C.N.,Hernsdorf, A.W., Amano, Y., Ise, K., Suzuki, Y., Dudek, N., Relman, D.A., Finstad, K.M.,

Page 65: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

61

Amundson,R.,Thomas,B.C.&Banfield,J.F.(2016)Anewviewofthetreeoflife.NatureMicrobiology,1,16048.

Hutchinson, G.E. (1957) Concluding remarks. Cold Spring Harbor Symposia on QuantitativeBiology,22,415–427.

Hutchinson,G.E.(1961)Theparadoxoftheplankton.TheAmericanNaturalist,95,137–145.Jabot, F., Etienne, R.S. & Chave, J. (2008) Reconciling neutral community models and

environmentalfiltering:theoryandanempiricaltest.Oikos,117,1308–1320.Jain, A.K. (2010) Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31,

651–666.Janzen,D.H.(1970)Herbivoresandthenumberoftreespeciesintropicalforests.TheAmerican

Naturalist,104,501–528.Jost,L.(2007)Partitioningdiversityintoindependentalphaandbetacomponents.Ecology,88,

2427–2439.Kalyuzhny, M., Kadmon, R. & Shnerb, N.M. (2015) A neutral theory with environmental

stochasticityexplainsstaticanddynamicpropertiesofecologicalcommunities.EcologyLetters,18,572–580.

Knights,D.,Kuczynski, J.,Charlson,E.S.,Zaneveld, J.,Mozer,M.C.,Collman,R.G.,Bushman,F.D.,Knight, R. & Kelley, S.T. (2011) Bayesian community-wide culture-independentmicrobialsourcetracking.NatureMethods,8,761-U107.

Kraft, N.J.B., Adler, P.B., Godoy, O., James, E.C., Fuller, S. & Levine, J.M. (2015) Communityassembly,coexistenceandtheenvironmentalfilteringmetaphor.FunctionalEcology,29,592–599.

Lawton,J.H.(1999)AreThereGeneralLawsinEcology?Oikos,84,177.Lawton, J.H., Bignell, D.E., Bolton, B., Bloemers, G.F., Eggleton, P., Hammond, P.M., Hodda, M.,

Holt,R.D.,Larsen,T.B.&Mawdsley,N.A.(1998)Biodiversityinventories,indicatortaxaandeffectsofhabitatmodificationintropicalforest.Nature,391,72–76.

Legendre, P., Borcard, D. & Peres-Neto, P.R. (2005) Analyzing beta diversity: partitioning thespatialvariationofcommunitycompositiondata.EcologicalMonographs.

Legendre, P., Borcard, D. & Peres-Neto, P.R. (2008) Analyzing or explaining beta diversity?Comment.Ecology,89,3238–3244.

Legendre, P. & De Caceres, M. (2013) Beta diversity as the variance of community data:dissimilaritycoefficientsandpartitioning.EcologyLetters,16,951–963.

Legendre,P.&Legendre,L.(2012)NumericalEcology,Elsevier.Leibold,M.A., Holyoak,M.,Mouquet, N., Amarasekare, P., Chase, J.M., Hoopes,M.F., Holt, R.D.,

Shurin, J.B., Law,R., Tilman,D., Loreau,M.&Gonzalez,A. (2004)Themetacommunityconcept:aframeworkformulti-scalecommunityecology.EcologyLetters,7,601–613.

Linnaeus,C.(1753)SpeciesPlantarum.Liu,B.,Liu,L.,Tsykin,A.,Goodall,G.J.,Green, J.E., Zhu,M.,Kim,C.H.&Li, J. (2010) Identifying

functional miRNA-mRNA regulatory modules with correspondence latent dirichletallocation.Bioinformatics,26,3105–3111.

Livermore, J.A. & Jones, S.E. (2015) Local-global overlap in diversity informsmechanisms ofbacterialbiogeography.ISMEJ,9,2413–2422.

Loreau, M. & de Mazancourt, C. (2013) Biodiversity and ecosystem stability: a synthesis ofunderlyingmechanisms.EcologyLetters,16,106–115.

Page 66: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

62

MacArthur, R.H. (1972)Geographical ecology:patterns in thedistributionof species, PrincetonUniversityPress.

MacArthur,R.H. (1957)On therelativeabundanceofbirdspecies.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,43,293–5.

MacArthur,R.H.(1958)Populationecologyofsomewarblersofnortheasternconiferousforests.Ecology,39,599–619.

MacArthur, R.H. & Wilson, E. 0. (1967) The theory of island biogeography. Monographs inPopulationBiology,1.

Maire,V.,Gross,N.,Börger,L.,Proulx,R.,Wirth,C.,Pontes,L.daS.,Soussana,J.-F.&Louault,F.(2012) Habitat filtering and niche differentiation jointly explain species relativeabundancewithingrasslandcommunitiesalongfertilityanddisturbancegradients.NewPhytologist,196,497–509.

Margulies,M.,Egholm,M.,Altman,W.E.,Attiya,S.,Bader,J.S.,Bemben,L.A.,Berka,J.,Braverman,M.S.,Chen,Y.-J.,Chen,Z.&others (2005)Genomesequencing inopenmicrofabricatedhighdensitypicoliterreactors.Nature,437,376.

Mariadassou,M.,Pichon,S.&Ebert,D.(2015)Microbialecosystemsaredominatedbyspecialisttaxa.EcologyLetters,18,974–982.

Martiny, J.B.H., Bohannan, B.J.M., Brown, J.H., Colwell, R.K., Fuhrman, J.A., Green, J.L., Horner-Devine, M.C., Kane, M., Krumins, J.A., Kuske, C.R., Morin, P.J., Naeem, S., Øvreås, L.,Reysenbach, A.-L., Smith, V.H. & Staley, J.T. (2006) Microbial biogeography: puttingmicroorganismsonthemap.NatureReviewsMicrobiology,4,102–112.

Martiny, J.B.H., Eisen, J.A., Penn, K., Allison, S.D. & Horner-Devine, M.C. (2011) Drivers ofbacterialbeta-diversitydependonspatialscale.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,108,7850–7854.

Mauch,M.,MacCallum,R.M.,Levy,M.&Leroi,A.M.(2015)Theevolutionofpopularmusic:USA1960-2010.RoyalSocietyopenscience,2,150081–150081.

McCann,K.S.(2000)Thediversity-stabilitydebate.Nature,405,228.McGill,B.J.(2003)Atestoftheunifiedneutraltheoryofbiodiversity.Nature,422,881–885.McGill, B.J., Etienne, R.S., Gray, J.S., Alonso, D., Anderson, M.J., Benecha, H.K., Dornelas, M.,

Enquist, B.J., Green, J.L., He, F.L., Hurlbert, A.H.,Magurran, A.E.,Marquet, P.A.,Maurer,B.A., Ostling, A., Soykan, C.U., Ugland, K.I. & White, E.P. (2007) Species abundancedistributions: moving beyond single prediction theories to integration within anecologicalframework.EcologyLetters,10,995–1015.

McGill,B.J.,Maurer,B.A.&Weiser,M.D.(2006)Empiricalevaluationofneutraltheory.Ecology,87,1411–1423.

McKane, A.J., Alonso, D. & Sole, R.V. (2004) Analytic solution of Hubbell’s model of localcommunitydynamics.TheoreticalPopulationBiology,65,67–73.

Miller,J.(2010)SpeciesDistributionModeling.GeographyCompass,4,490–509.Moran,P.A.P. (1958)Randomprocessesingenetics.MathematicalProceedingsoftheCambridge

PhilosophicalSociety,pp.60–71.CambridgeUniversityPress.Morrone, J.J. (2015) Biogeographical regionalisation of the world: a reappraisal. Australian

SystematicBotany,28,81.Noble, I.R. & Slatyer, R.O. (1977) Post-fire succession of plants in Mediterranean ecosystems.

Symposium on Environmental Consequences of Fire and Fuel Management in

Page 67: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

63

MediterraneanEcosystems,PaloAlto,CA,USA,pp.27–36.O’Dwyer, J.P.,Lake,J.K.,Ostling,A.,Savage,V.M.&Green,J.L.(2009)Anintegrativeframework

for stochastic, size-structured community assembly. Proceedings of the NationalAcademyofSciencesoftheUnitedStatesofAmerica,106,6170–6175.

Ofiteru, I.D., Lunn,M., Curtis, T.P.,Wells, G.F., Criddle, C.S., Francis, C.A. & Sloan,W.T. (2010)Combined niche and neutral effects in amicrobial wastewater treatment community.Proceedings of the National Academy of Sciences of the United States of America, 107,15345–15350.

Olszewski,D.(2012)EmployingKullback-LeiblerdivergenceandLatentDirichletAllocationforfrauddetectionintelecommunications.IntelligentDataAnalysis,16,467–485.

Pace,N.R.(1997)Amolecularviewofmicrobialdiversityandthebiosphere.Science,276,734–740.

Paine,R.T.(1980)FoodWebs:Linkage,InteractionStrengthandCommunityInfrastructure.TheJournalofAnimalEcology,49,666.

Palmer,M.W. (1994)Variation in species richness: towards a unification of hypotheses.FoliaGeobotanica,29,511–530.

Pawitan, Y. (2001) Inall likelihood: statisticalmodellingand inferenceusing likelihood, OxfordUniversityPress.

Poisot, T., Stouffer, D.B. & Gravel, D. (2014) Beyond species: why ecological interactionnetworksvarythroughspaceandtime.BioRxivpreprint.

Polis,G.A.,Sears,A.L.,Huxel,G.R.,Strong,D.R.&Maron, J. (2000)When isa trophiccascadeatrophiccascade?TrendsinEcology&Evolution,15,473–475.

Preston,F.W.(1948)Thecommonness,andrarity,ofspecies.Ecology,29,254–283.Preston,F.W.(1960)Timeandspaceandthevariationofspecies.Ecology,41,611–627.Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population structure using

multilocusgenotypedata.Genetics,155,945–959.Pueyo, S., He, F. & Zillio, T. (2007) The maximum entropy formalism and the idiosyncratic

theoryofbiodiversity.EcologyLetters,10,1017–1028.Ramirez,K.S.,Leff, J.W.,Barberan,A.,Bates,S.T.,Betley, J.,Crowther,T.W.,Kelly,E.F.,Oldfield,

E.E., Shaw, E.A., Steenbock, C., Bradford, M.A., Wall, D.H. & Fierer, N. (2014)Biogeographicpatterns inbelow-grounddiversity inNewYorkCity’sCentralParkaresimilartothoseobservedglobally.ProceedingsoftheRoyalSocietyB-BiologicalSciences,281,9.

Ricklefs,R.E.(2003)AcommentonHubbell’szero-sumecologicaldriftmodel.Oikos,100,185–192.

Ricklefs, R.E. (1987) Community diversity: relative roles of local and regional processes.Science(Washington),235,167–171.

Ricklefs, R.E. (2006) The unified neutral theory of biodiversity: Do the numbers add up?Ecology,87,1424–1431.

Roguet,A.,Laigle,G.S.,Therial,C.,Bressy,A.,Soulignac,F.,Catherine,A.,Lacroix,G.,Jardillier,L.,Bonhomme,C., Lerch,T.Z.&Lucas,F.S. (2015)Neutral communitymodel explains thebacterialcommunityassemblyinfreshwaterlakes.FemsMicrobiologyEcology,91,11.

Rønsted, N., Weiblen, G.D., Cook, J.M., Salamin, N., Machado, C.A. & Savolainen, V. (2005) 60million years of co-divergence in the fig–wasp symbiosis. Proceedings of the Royal

Page 68: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

64

SocietyB:BiologicalSciences,272,2593–2599.Rosen-Zvi,M.,Gri_ffiths,T.,Steyvers,M.&Smyth,P.(2004)TheAuthor-TopicModelforAuthors

andDocuments.Rosindell, J., Cornell, S.J., Hubbell, S.P.& Etienne, R.S. (2010) Protracted speciation revitalizes

theneutraltheoryofbiodiversity.EcologyLetters,13,716–727.Rosindell, J., Hubbell, S.P., He, F., Harmon, L.J. & Etienne, R.S. (2012) The case for ecological

neutraltheory.TrendsinEcology&Evolution,27,203–208.Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European Physical

JournalSpecialTopics,178,13–23.Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L., Coulson, A.R., Fiddes, J.C., Hutchison, C.A.,

Slocombe,P.M.&Smith,M. (1977)NucleotidesequenceofbacteriophageφX174DNA.nature,265,687–695.

Scheffer,M.,Carpenter,S.R.,Lenton,T.M.,Bascompte,J.,Brock,W.,Dakos,V.,VandeKoppel,J.,Van de Leemput, I.A., Levin, S.A., Van Nes, E.H. & others (2012) Anticipating criticaltransitions.science,338,344–348.

Scheffer, M. & van Nes, E.H. (2006) Self-organized similarity, the evolutionary emergence ofgroupsof similarspecies.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,103,6230–6235.

Scheffers,B.R., Joppa,L.N.,Pimm,S.L.&Laurance,W.F.(2012)Whatweknowanddon’tknowaboutEarth’smissingbiodiversity.TrendsinEcology&Evolution,27,501–510.

Schemske,D.W.,Mittelbach,G.G.,Cornell,H.V.,Sobel,J.M.&Roy,K.(2009)IsThereaLatitudinalGradient in the ImportanceofBiotic Interactions?AnnualReviewofEcology,Evolution,andSystematics,40,245–269.

Schuster, S.C. (2007)Next-generation sequencing transforms today’sbiology.NatureMethods,5,16–18.

Shafiei, M., Dunn, K.A., Boon, E., MacDonald, S.M.,Walsh, D.A., Gu, H. & Bielawski, J.P. (2015)BioMiCo:asupervisedBayesianmodelforinferenceofmicrobialcommunitystructure.Microbiome,3,8.

Shmida, A.V.I. & Wilson, M.V. (1985) Biological determinants of species diversity. Journal ofbiogeography,1–20.

Simberloff,D.S.&Wilson,E.O.(1969)ExperimentalZoogeographyofIslands:TheColonizationofEmptyIslands.Ecology,50,278–296.

Sizling, A.L., Storch, D., Sizlingova, E., Reif, J. & Gaston, K.J. (2009) Species abundancedistribution results from a spatial analogy of central limit theorem.Proceedingsof theNationalAcademyofSciencesoftheUnitedStatesofAmerica,106,6691–6695.

Sloan,W.T.,Lunn,M.,Woodcock,S.,Head,I.M.,Nee,S.&Curtis,T.P.(2006)Quantifyingtherolesof immigrationandchance inshapingprokaryotecommunitystructure.EnvironmentalMicrobiology,8,732–740.

Sloan,W.T.,Woodcock, S., Lunn,M.,Head, I.M.&Curtis,T.P. (2007)Modeling taxa-abundancedistributions in microbial communities using environmental sequence data.MicrobialEcology.

Soininen,J.,McDonald,R.&Hillebrand,H.(2007)Thedistancedecayofsimilarityinecologicalcommunities.Ecography,30,3–12.

ter Steege, H., Nigel, C.A., Sabatier, D., Baraloto, C., Salomao, R.P., Guevara, J.E., Phillips, O.L.,

Page 69: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

65

Castilho,C.V.,Magnusson,W.E.,Molino, J.F.,Monteagudo,A.,Vargas,P.N.,Montero, J.C.,Feldpausch, T.R., Coronado, E.N.H., Killeen, T.J., Mostacedo, B., Vasquez, R., Assis, R.L.,Terborgh,J.,Wittmann,F.,Andrade,A.,Laurance,W.F.,Laurance,S.G.W.,Marimon,B.S.,Marimon, B.H., Vieira, I.C.G., Amaral, I.L., Brienen, R., Castellanos, H., Lopez, D.C.,Duivenvoorden, J.F.,Mogollon,H.F.,Matos, F.D.D.,Davila,N., Garcia-Villacorta,R.,Diaz,P.R.S., Costa, F., Emilio, T., Levis, C., Schietti, J., Souza, P., Alonso, A., Dallmeier, F.,Montoya,A.J.D.,Piedade,M.T.F.,Araujo-Murakami,A.,Arroyo,L.,Gribel,R.,Fine,P.V.A.,Peres,C.A.,Toledo,M.,Gerardo,A.A.C.,Baker,T.R.,Ceron,C.,Engel,J.,Henkel,T.W.,Maas,P., Petronelli, P., Stropp, J., Zartman, C.E., Daly, D., Neill, D., Silveira,M., Paredes,M.R.,Chave,J.,Lima,D.D.,Jorgensen,P.M.,Fuentes,A.,Schongart,J.,Valverde,F.C.,DiFiore,A.,Jimenez, E.M.,Mora,M.C.P., Phillips, J.F., Rivas, G., van Andel, T.R., von Hildebrand, P.,Hoffman,B.,Zent,E.L.,Malhi,Y.,Prieto,A.,Rudas,A.,Ruschell,A.R.,Silva,N.,Vos,V.,Zent,S.,Oliveira,A.A.,Schutz,A.C.,Gonzales,T.,Nascimento,M.T.,Ramirez-Angulo,H.,Sierra,R.,Tirado,M.,Medina,M.N.U.,vanderHeijden,G.,Vela,C.I.A.,Torre,E.V.,Vriesendorp,C.,Wang,O.,Young,K.R.,Baider,C.,Balslev,H.,Ferreira,C.,Mesones,I.,Torres-Lezama,A.,Giraldo, L.E.U., Zagt, R., Alexiades, M.N., Hernandez, L., Huamantupa-Chuquimaco, I.,Milliken,W.,Cuenca,W.P.,Pauletto,D.,Sandoval,E.V.,Gamarra,L.V.,Dexter,K.G.,Feeley,K., Lopez-Gonzalez, G. & Silman,M.R. (2013)Hyperdominance in theAmazonian TreeFlora.Science,342,325–+.

Svenning, J. & Skov, F. (2007) Could the tree diversity pattern in Europe be generated bypostglacialdispersallimitation?Ecologyletters,10,453–460.

Taberlet,P.,Coissac,E.,Hajibabaei,M.&Rieseberg,L.H.(2012a)EnvironmentalDNA.MolecularEcology,21,1789–1793.

Taberlet, P., Coissac, E., Pompanon, F., Brochmann, C.&Willerslev, E. (2012b)Towards next-generation biodiversity assessment using DNAmetabarcoding.Molecular Ecology, 21,2045–2050.

Tansley,A.G.(1935)TheUseandAbuseofVegetationalConceptsandTerms.Ecology,16,284–307.

Teh,Y.W.,Jordan,M.I.,Beal,M.J.&Blei,D.M.(2006)HierarchicalDirichletProcesses.JournaloftheAmericanStatisticalAssociation,101,1566–1581.

Tilman,D.(1982)Resourcecompetitionandcommunitystructure,Princetonuniversitypress.Tilman,D.,May,R.M.,Lehman,C.L.&Nowak,M.A.(1994)Habitatdestructionandtheextinction

debt.Nature,371,65–66.Tilman,D.,Reich,P.B.&Knops,J.M.H.(2006)Biodiversityandecosystemstabilityinadecade-

longgrasslandexperiment.Nature,441,629–632.Tokeshi, M. (1996) Power Fraction: A New Explanation of Relative Abundance Patterns in

Species-RichAssemblages.Oikos,75,543–550.Tuomisto, H., Ruokolainen, K. & Yli-Halla, M. (2003) Dispersal, environment, and floristic

variationofwesternAmazonianforests.Science,299,241–244.Vaduva, C., Gavat, I. & Datcu, M. (2013) Latent Dirichlet Allocation for Spatial Analysis of

SatelliteImages.IeeeTransactionsonGeoscienceandRemoteSensing,51,2770–2786.Vallade,M.&Houchmandzadeh,B.(2003)Analyticalsolutionofaneutralmodelofbiodiversity.

PhysicalReviewE,68,5.Valle,D.,Baiser,B.,Woodall,C.W.&Chazdon,R.(2014)Decomposingbiodiversitydatausingthe

Page 70: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

66

LatentDirichletAllocationmodel,aprobabilisticmultivariatestatisticalmethod.EcologyLetters,17,1591–1601.

deVargas,C.,Audic,S.,Henry,N.,Decelle,J.,Mahe,F.,Logares,R.,Lara,E.,Berney,C.,LeBescot,N., Probert, I., Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J.-M., Bittner, L.,Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horak, A., Jaillon, O.,Lima-Mendez,G.,Lukes,J.,Malviya,S.,Morard,R.,Mulot,M.,Scalco,E.,Siano,R.,Vincent,F.,Zingone,A.,Dimier,C.,Picheral,M.,Searson,S.,Kandels-Lewis,S.,Acinas,S.G.,Bork,P.,Bowler,C.,Gorsky,G.,Grimsley,N.,Hingamp,P.,Iudicone,D.,Not,F.,Ogata,H.,Pesant,S.,Raes, J.,Sieracki,M.E.,Speich,S.,Stemmann,L.,Sunagawa,S.,Weissenbach, J.,Wincker,P., Karsenti, E. & Tara Oceans, C. (2015) Eukaryotic plankton diversity in the sunlitocean.Science,348.

Vellend,M.(2010)Conceptualsynthesisincommunityecology.QuarterlyReviewofBiology,85,183–206.

Vilhena, D.A. & Antonelli, A. (2015) A network approach for identifying and delimitingbiogeographicalregions.NatureCommunications,6.

Volkov, I.,Banavar, J.R.,Hubbell, S.P.&Maritan,A. (2003)Neutral theoryandrelative speciesabundanceinecology.Nature,424,1035–1037.

Wakeley, J. (2009) Coalescent Theory: An Introduction, Roberts & Company Publishers,GreenwoodVillage.

Wallace,A.R.(1876)TheGeographicalDistributionofAnimals:WithaStudyoftheRelationsofLivingandExtinctFaunasasElucidatingthePastChangesoftheEarth’sSurface:InTwoVolumes.

Wang,H.G.,Wei,Z.,Mei,L.J.,Gu,J.X.,Yin,S.S.,Faust,K.,Raes,J.,Deng,Y.,Wang,Y.L.,Shen,Q.R.&Yin, S.X. (2017) Combined use of network inference tools identifies ecologicallymeaningfulbacterialassociationsinapaddysoil.SoilBiology&Biochemistry,105,227–235.

Watson,H.C.(1859)CybeleBritannica,.Watterson,G.A.(1974)Modelsforthelogarithmicspeciesabundancedistributions.Theoretical

PopulationBiology,6,217–250.Whittaker,R.H.(1965)DominanceandDiversityinLandPlantCommunities.Science,147,250–

260.Whittaker,R.H.(1960)VegetationoftheSiskiyouMountains,OregonandCalifornia.Ecological

Monographs,30,279–338.Williamson,M.H. (1988)Relationshipofspeciesnumbertoarea,distanceandothervariables.Á

In:Myers,AAandGiller,P.S.(eds),Analyticalbiogeography,anintegratedapproachtothestudyofanimalandplantdistributions,ChapmanandHall,pp.91Á115.

Wisz, M.S., Pottier, J., Kissling, W.D., Pellissier, L., Lenoir, J., Damgaard, C.F., Dormann, C.F.,Forchhammer,M.C.,Grytnes,J.-A.,Guisan,A.,Heikkinen,R.K.,Høye,T.T.,Kühn,I.,Luoto,M.,Maiorano,L.,Nilsson,M.-C.,Normand,S.,Öckinger,E.,Schmidt,N.M.,Termansen,M.,Timmermann, A., Wardle, D.A., Aastrup, P. & Svenning, J.-C. (2013) The role of bioticinteractions in shapingdistributionsand realisedassemblagesof species: implicationsforspeciesdistributionmodelling.BiologicalReviews,88,15–30.

Woodcock, S., vanderGast,C.J.,Bell,T., Lunn,M.,Curtis,T.P.,Head, I.M.&Sloan,W.T. (2007)Neutralassemblyofbacterialcommunities.FemsMicrobiologyEcology,62,171–180.

Page 71: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

67

Wright, J.P., Jones, C.G. & Flecker, A.S. (2002) An ecosystem engineer, the beaver, increasesspeciesrichnessatthelandscapescale.Oecologia,132,96–101.

Wright,S.(1931)EvolutioninMendelianpopulations.Genetics,16,97–159.Wright, S.J. (2002) Plant diversity in tropical forests: a review of mechanisms of species

coexistence.Oecologia,130,1–14.Yu,D.W., Ji,Y.Q.,Emerson,B.C.,Wang,X.Y.,Ye,C.X.,Yang,C.Y.&Ding,Z.L. (2012)Biodiversity

soup: metabarcoding of arthropods for rapid biodiversity assessment andbiomonitoring.MethodsinEcologyandEvolution,3,613–623.

Zillio,T.,Volkov,I.,Banavar,J.R.,Hubbell,S.P.&Maritan,A.(2005)Spatialscalinginmodelplantcommunities.PhysicalReviewLetters,95.

Page 72: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Introduction

68

Page 73: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

69

Chapter1CausesofvariationinsoilbetadiversityacrossdomainsoflifeinthetropicalforestsofFrenchGuiana

GuilhemSommeria-Klein1,2,LucieZinger1,2,AmaiaIribar1,ElianeLouisanna3,Sophie

Manzi1,VincentSchilling1,EricCoissac4,HeidySchimann3,PierreTaberlet4,Jérôme

Chave1

1UniversitéToulouse3PaulSabatier,CNRS,IRD,UMR5174LaboratoireEvolutionetDiversitéBiologique(EDB),F-31062Toulouse,France.2EcoleNormaleSupérieure,CNRS,UMR8197InstitutdeBiologiedel’ENS(IBENS),F-75005Paris,France3INRA,AgroParisTech,CIRAD,CNRS,UniversitédesAntilles,UniversitédelaGuyane,UMREcologiedesForetsdeGuyane(EcoFoG),F-97379Kourou,France.4UniversitéGrenobleAlpes,CNRS,UMRLaboratoired'EcologieAlpine(LECA),F-38000Grenoble,France.

Page 74: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

70

Chapteroutline

Betadiversitypatterns,i.e.howtaxonomiccompositionshiftsthroughspace,havelong

been used to infer the mechanisms of community assembly. Indeed, depending on

whether taxonomic composition covaries with environmental conditions or with

geographical distance, it can be inferred whether community assembly is driven by

deterministic niche processes or by neutral dispersal limitation. In this chapter, this

reasoningisappliedtoasoilDNAdatasetcollectedinvarious1-haforestplotsinFrench

Guiana,forarangeofbarcodesspanningmostofthetreeoflife.Toenablebothtypesof

processestobedistinguished,thesampledplotscoverarangeofsoiltypesaswellasa

rangeofinter-plotdistances.Inter-plotdistancesareapproximatelyregularlyspacedon

alogarithmicscale,soastobetterassesstheeffectofdispersallimitationontaxonomic

composition.Indeed,neutraldispersallimitationispredictedtoyieldalineardecrease

of taxonomicsimilaritywith log-distance.Asasidequestion, theeffectofpast logging

activitiesonsoilbiodiversityisassessedbasedonasetofdisturbedforestplots.

Page 75: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

71

Abstract

Disentanglingtheprocessesthatcausetheassemblyofecologicalcommunitiesisakey

challenge, and these include both stochastic (neutral) processes and deterministic

niche filtering. Progress in biodiversity assessment using environmental DNA now

streamlines the studyofbiodiversitypatterns acrossdomainsof life.Using soilDNA

samples,wequantifiedthecausesofvariationinbetadiversitypatternsacrossmajor

taxonomic groups in the lowland tropical forest of French Guiana on a spatial scale

ranging from 40 m to 140 km, for a range of soil physico-chemical properties. We

quantifiedtherespective influenceofsoilconditions,dispersal limitation,andhuman

disturbances on beta diversity. In undisturbed forest plots, we found that the beta

diversity of bacteria and protists was primarily driven by soil conditions, while the

observedpatternsinplants,andtoalesserextentinannelids,werebestexplainedby

dispersal limitation. Both factors had an effect on fungi, arthropods and insects,

whereaswecouldnotdetect influenceofeitherfactoronnematodesandflatworms.

This analysis was consistent with a comparison of our data to the similarity decay

predicted by the neutral theory of biodiversity. These results suggest that spatial

patterns of plant biodiversity across the Amazon do not necessarily extend to other

taxonomicgroups, and that environmental factorsplaya foremost role in explaining

thesepatternsintropicalsoils.Alongthedisturbancegradient,wefoundasignificant

shift in taxonomic composition in two functionally important groups, plants and

annelids,asmallereffectonfungi,andnoeffectintheothergroups.

Page 76: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

72

Page 77: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

73

Introduction

Beta diversity describes the turnover of taxonomic composition through geographical

and environmental space, and yields insight into the mechanisms of community

assembly(Whittaker,1960,1972;Rosenzweig,1995;Gaston&Blackburn,2008).Asa

measureofthespatialvariabilityoftaxonomiccomposition,itmaybebroadlydefinedas

thedifferenceor ratiobetweenregional (gamma)diversityand local (alpha)diversity

(Whittaker, 1960; Chao et al., 2012). This has important practical implications for

biodiversityestimatesandconservation(Bassetetal.,2012;Hubbell,2013;terSteegeet

al.,2013;Socolaretal.,2017).

The extent of beta diversity and its causal mechanisms are dependent on the

spatial scale at which taxonomic turnover is considered (Soininen et al., 2007). Beta

diversityisoftenquantifiedwithinabiogeographicregion,sothatitisnotcausedbya

largeclimaticdifferenceoradifferentbiogeographichistorybetweenstations(Kreft&

Jetz,2010).Variationinbetadiversitycanbeascribedtotwotypesofprocesses:niche-

basedprocesses,whenabiotic andbiotic environmentalheterogeneitydetermines the

spatialdistributionoftaxabasedontheirphenotypicdifferences,andneutralprocesses,

when turnover in taxonomic composition results from demographic stochasticity

combinedwithlimiteddispersal(Leiboldetal.,2004).However,becauseenvironmental

differencestendtoalsobespatiallystructured,bothtypesofprocessesareoftendifficult

todisentangle(Gilbert&Lechowicz,2004).

Onefrontierinthestudyofbetadiversityisthatithasmostoftenbeenrestricted

toasingletaxonomicgroup,andespeciallyforesttrees(Whittaker,1960,1972;Nekola

&White,1999;Conditetal., 2002), amphibians (Baselgaetal., 2012), andarthropods

(Harrison et al., 1992; Novotny et al., 2007; Hortal et al., 2011), and freshwater taxa

(Cottenie, 2005). Studies that have attempted to compare patterns of beta diversity

acrosstaxaarescarce(butseeHarrisonetal.,1992).This is largelybecausetheeffort

needed to coordinate inventories of biological diversity across taxa is enormous, and

increases dramatically for smaller-bodied taxa (Lawton et al., 1998). DNA-based

methods have lifted this constraint and they have dramaticallywidened the range of

Page 78: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

74

taxaforwhichdiversitypatternscanbemeasured.Insteadofcollectingorganismsand

assigning them a taxon label based on observation and on expert knowledge,

identificationisbasedonminuteamountsofbiologicalmaterialandonthesequencing

of universal DNA amplicons (DNA barcodes), a method first developed for

microorganisms (Pace, 1997). This method has been extended to rapid taxonomic

surveys:bulkDNAisextractedfromenvironmentalsamplesandDNAisamplifiedusing

universal primers, then sequenced (Taberlet et al., 2012, Yu et al. 2012). This

environmental DNA approach to biological diversity inventory aims at detecting the

presence of cells or of extracellular DNA for a range of taxa in a sample. Such an

approachisinprincipleapplicabletoanytaxonomicgroupinthetreeoflife(Bahramet

al., 2013; Schuldtetal., 2015; Siles&Margesin, 2016;Vincentetal., 2016). Since it is

possibletonormalizetheDNAextractionandsequencingproceduresformanysamples

atonce,suchanapproachissuitedtotheexplorationofbetadiversitypatterns.

We expect that smaller organisms with short generation times display higher

beta diversity at short spatial scale, i.e. over a few meters, than larger organisms,

becausetheyare locally filteredbyenvironmentalheterogeneity(Ramirezetal.,2014;

Mariadassouetal.,2015).Conversely,thebetadiversityofsmallorganismsispredicted

to be less dependent on distance compared to large organisms, owing to their higher

dispersalability(Soininenetal.,2007).Thus,weexpectthespatialdistributionofsmall

organismstobeprimarilygovernedbynicheeffects,whileweexpectlargeorganismsto

better comply with distance-limited neutral dynamics (Hubbell, 2001; Martiny et al.,

2011).Thesepredictionshavedirectimplicationsforthemaintenanceofbiodiversityin

disturbedlandscapes.Organismswithhigherdispersalabilitiesshouldbefoundevenin

heavilydisturbedhabitats.Ontheotherhand,slowdispersersshouldbemoreaffected

bydisturbances,andwouldalsotakelongertorecolonizehabitatsafterabandonment.

Inthisstudy,wecomparesoilbetadiversitypatternsacrossdomainsoflifeina

lowland tropical rainforest. We collected soil samples at locations separated by a

geographical distance ranging from 40m to 140 km, and spanning a variety of soil

types,whichwe quantified, aswell as a range of humandisturbance intensities.We

targeted taxonomic groups using barcodes with different levels of taxonomic

resolution,whichallowedustotesttherobustnessoftheobservedpatterns.Wethus

address the following questions: 1) What is the relative importance of dispersal

Page 79: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

75

limitation and environmental filtering in explaining beta diversity across taxonomic

groups? 2) How good a fit is the dispersal-limited neutral theory for the various

taxonomicgroups?3)Howdoesbetadiversitydependonforestdisturbancebylogging

activities?Finally,weexplorethe implicationsofour findings forcommunityecology

andfortheconservationoftropicalforestecosystems.

Page 80: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

76

Methods

Samplingscheme1.

Wesampledfifteen1-haplotsintheundisturbedlowlandrainforestofFrenchGuiana,

to which we added four 1-ha plots in disturbed habitats (see below). Geographical

distances between plots in pristine forest are approximately regularly spaced on a

logarithmicscale.Thischoicewasmotivatedbytheexpectationofalinearrelationship

between taxonomic similarity and log-distance in a spatially explicit neutral model

(Chave & Leigh, 2002). Twelve plots are located at the Nouragues research station

(about100kminland;latitude4°5’17"Nandlongitude52°40’48"W;Bongersetal.,

2001),andthreeattheParacouresearchstation(nearthecoast; latitude5°18′Nand

longitude52°53′W;Gourlet-Fleuryetal.2004);seeFig.1forlocations.Allplotsconsist

ofterrafirmeforest,butcoverarangeofsoiltypes(seebelow).

Inaddition to samplingplots inundisturbed forest,wealsosampledareas that

haveundergonedisturbancesofdifferentintensities.AtParacou,someplotshavebeen

experimentally logged at several logging intensities starting in 1986

(https://paracou.cirad.fr/experimental-design). In the twoheaviest logging treatments

(T2andT3),33-56%oftheabovegroundbiomasswaslostduetothefellingoperations.

Eighteenyearsafterlogging,theimpactofloggingactivitieswasstillvisible.Wesampled

twocontiguous1-haplotsinoneofthemostheavilyimpactedareas(P12plot).Wealso

sampled two contiguous 1-ha plots in a 25-ha area (Arbocel plot) 14 km away from

Paracou,thatwasfullyclear-cutin1976andleftregeneratingsincethen.

Withineach1-haplot,wecollectedeightysoilsamplesofabout30geachwithan

auger from themineral soil horizon (~10 cmdeep) along a square grid. Tominimize

samplingbiasandcoarsenthespatialgrain,wepooledsoilsamplesfivebyfivefollowing

across-shapedpatternabout15metersacross,withonesampleatthecentreandfour

samples in the corners (Fig. 2). This resulted in sixteen pooled samples per plot.We

extracted DNA from about 10 g of soil per pooled sample within a few hours after

Page 81: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

77

sample collection, using the protocol described in Zinger etal. (2016). The remaining

soilwasdriedforsubsequentanalysesofsoilproperties.

Figure1:Samplingscheme.Relativepositionofallsampled1-haforestplots,in(A)Paracou,and (B) Nouragues; (C) relative position of the Paracou, Arbocel, and Nouragues sites.Undisturbedplotsareinredandthefourdisturbedplots(twoinParacouandtwoinArbocel)inyellow. In Nouragues, PP and GP denote respectively the Petit Plateau and Grand Plateaupermanentmonitoredplots,and‘GP-liana’denotestheL18subplotinGrandPlateau.

DNA amplification and sequencing yielded read counts for Operational

TaxonomicUnits(OTUs)atsixteensitesperplot(seebelow).Wefurtherpooledthese

samples four by four by averaging relative OTU abundances, so as to obtain one

samplingpointper0.25-haplot(Fig.2).Wedefinedthedistancebetweentwosampling

points as the distance between the centres of the two sets of pooled samples. Some

sampleswere removed from thedatasetowing to insufficientPCRyields (seebelow);

hencesomesamplingpointshavefewerthanfoursamplesoraremissing.

Inselberg summit

PP GP-liana

GP

Balanfois

Parare

P12P11

P06 Nouragues

ParacouArbocel

A B C

D

French Guiana

Page 82: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

78

Soilsampleswerealsopooled fourby four toobtainasinglecompositesample

per0.25-hasubplot.Foreachpooledsoilsample,twelvemeasurementsweremadefrom

about60gofdrysoil.Granulometrydistinguishedtheclay(0-2µm),silt(2-63µm)and

sandfractions(63-2000µm).ThepHofsoilinwatersolutionwasmeasured,aswellas

total carbon (C) andnitrogen (N)mass fractions. Themass fraction of plant-available

phosphorus (P2O5) was measured using the Olsen extraction method. Lastly, a BaCl2

extractionwasperformedandtheconcentrationofmajorelementswasmeasuredusing

ICPMS(Ca,Mg,K,Fe,Mg,andAl).

Molecularandsequenceanalyses2.

WeamplifiedfivebarcodesbyPCRfromsoilsamples,targetingbacteria(16SrRNAgene

V5-V6regions;Fliegerovaetal.,2014),eukaryotes(18SrRNAgenev7region;Guardiola

etal.,2015),Viridiplantae(chloroplastictrnL-P6loop;Taberletetal.,1991),fungi(ITS1)

and insects (mitochondrial 16S rRNA; Clarke et al., 2014). Each soil sample was

amplified thrice independentlybyPCR, following the sameprotocol as inZingeretal.

(2017). Ampliconswere labelledwith a distinct nucleotide tag for each PCR, and six

sequencinglibraries,oneperbarcode,wereprepared.Sequencingwascarriedoutusing

paired-end Illumina sequencing (MiSeq 2x250 for 16S bacteria, 16S insects and ITS

fungi; HiSeq 2x100 for 18S eukaryotes and trnL plants). Negative PCR controls were

included in theprotocol to helpdetect contaminants. ThePCRs that yielded less than

1,000readswerediscardedfromsubsequentanalyses.

Data analyseswere conducted as inZingeretal. (2017). Sequencingdatawere

curated using the OBITools package (Boyer et al., 2016): paired-end reads were

assembled, dereplicated, and low-quality sequences were excluded. The resulting

sequenceswereclusteredintoOTUsusingtheInfomapalgorithm(Rosvalletal.,2009),

withadissimilaritythresholdofthreemismatchesandexponentiallydecreasingweights

onedges.OTUsrepresentedbyasinglesequencewereremoved,andthemostabundant

sequence in the clusterwas taken to be the true sequence. Taxonomic identifications

were assigned to OTUs using the ecotag program in the OBITools package based on

GenbankandSILVAdatabases(Zingeretal.,2017).OTUswithlessthan75%similarity

Page 83: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

79

to any reference sequence were removed, as well as those with a taxonomic

identification outside of the taxonomic group targeted by the barcode. Further steps

were takentominimize thenumberofcontaminantOTUsasdescribed inZingeretal.

(2017). RareOTUswere not removed, and only the relativeOTU abundances in each

samplewereusedforfurtheranalyses.

Figure 2: Sampling scheme in each 1-ha forest plot. In each of the nineteenplots (fifteenundisturbedandfourdisturbed),eightysoilsampleswerecollected(openandfullblackcircles),and were pooled five by five (small dashed crosses). After conducting the molecular andsequenceanalysesonthesixteenpooledsamples(fullblackcircles),resultswerepooledfourbyfour(largedashedcross),andstatisticalanalyseswereperformedontheresultingfoureffectivesamplingpoints(opensquares).Thesixteenpooledsoilsampleswerealsodirectlypooledfourbyfourforconductingsoilanalyses.

Taxonomic identifications for the eukaryote 18S marker were used to assign

OTUs to sub-clades (Table S1): arthropods, insects, annelids, nematodes, flat worms

(Platyhelminthes), protists, fungi, and plants (Viridiplantae). The 18S marker was

0 20 40 60 80 100

020

4060

8010

0

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

● ●

●●

Sampling scheme1-ha plot

(m)

Page 84: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

80

comparedwithmorespecificmarkersforfungi,plants,andinsects(ITS1,trnL,and16S,

respectively; Table S1). A rarefaction analysis was performed for each marker by

sampling with replacement between 1 and 8,000 reads per sample (Fig. S1). For all

markers,thenumberofOTUsreachednearsaturationinmostsamples.

Statisticalanalyses3.

We performed all statistical analyses in R using the ‘vegan’ package (version 2.4-2,

available at https://cran.r-project.org/), and followed the guidelines of Legendre &

Legendre(2012).Analyseswereperformedseparatelyforeachtaxonomicgroup.

WeperformedaPCAonsoilvariablesaftercenteringandnormalizingthem(i.e.,

subtractingtheirmeananddividingthembytheirstandarddeviationoverallsampling

points). Since clay, silt and sand fractions sum to 1, they yield only two independent

measurements;wechosetokeepclayandsiltfractions,asclayandsandfractionswere

almostperfectlyanticorrelated (correlationcoefficientof -0.97; seeResults,TableS3).

BeforeconductingthePCA,welumpedCa,Mg,MnandKconcentrationstogetherintoa

single‘exchangeablecations’variable.Inallfurtheranalyses,weusedthefirstfourPCA

axesasenvironmentalvariables.

We first studied the taxonomic dissimilarity among pairs of sampled locations

(‘distance-based’approach).Wecomputed theSorensen taxonomicdissimilarity index

(numberofnon-sharedOTUsdividedbynumberofOTUsinbothsamples),whichisone

possiblemeasureofoccurrence-basedbetadiversity(Koleffetal.,2003).TheSorensen

index between pairs of sampling points was regressed against their environmental

dissimilarity and against the logarithm of their geographical distance (measured in

meters).Theenvironmentaldissimilaritybetweentwosamplingpointswasdefinedas

theirEuclidiandistancewithrespecttothefoursoilPCAaxes.Totestthesignificanceof

regressionsoftheSorensenindexagainstenvironmentalandgeographicaldistances,we

performed Mantel tests with 999 permutations using simple and partial Pearson’s

correlationcoefficientsasteststatistics(functions‘mantel’and‘mantel.partial’).

Page 85: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

81

Figure 3. Principal Component Analysis of soil variables for the fifteen undisturbed plots,projectedonthefirsttwoaxes(40%and30%oftotalvariance).‘GP-bottom‘correspondstothelowerhalfoftheGP-O13plot,whichbelongstoabottomland.

We then directly compared the taxonomic composition of sampled locations

using canonical ordination ('raw-data' approach; Legendreetal., 2005).We regressed

the OTU abundance data on environmental and spatial variables using Canonical

Redundancy Analysis (RDA; function ‘rda’). We first applied the Hellinger

transformationtoOTUabundancedata(i.e.,square-rootoftherelativeOTUabundances

at each sampling point) and centred them per OTU (i.e., subtracted the mean over

samplingpoints).Weusedthesixselectedsoilvariablesasexplanatoryenvironmental

variables,aftercentringandnormalization.WeusedPrincipalCoordinatesofNeighbour

Matrices (PCNM) as spatial explanatory variables representing different possible

patternsofspatialautocorrelationinthedata(Borcard&Legendre,2002;Borcardetal.,

2004). Two separate PCNM decompositions were performed for the Nouragues and

Paracou sites (function 'pcnm'; Borcard & Legendre, 2002), i.e. in each site we

performed a Principal Coordinates Analysis of the distancematrix between sampling

points, after setting all distances larger than a threshold distance to four times this

threshold distance (chosen as the minimal distance required to connect all sampling

points). We obtained seventeen PCNM variables with positive eigenvalues for

Nouragues,andsixforParacou.PCNMvariablesfrombothsiteswereassembledintoa

●●

●●

●●

● ●

●●

● ●●

●● ●

● ●

●●

●●●

●●

●●

Balanfois

GP

GP−bottom

GP−Liana

Inselberg summit

Paracou Parare

PP

pH

Ctot

Ntot P2O5 Clay

Silt

Al

Fe

Mg+Mn+Ca+K

d = 2

NBA1−ABGH

NBA1−CDEF NBA1−KLMN NBA1−IJOP

NBA2−ABGH

NBA2−CDEF NBA2−KLMN

NBA2−IJOP NINS−ABGH NINS−CDEF NINS−KLMN

NINS−IJOP

NF21−ABGH

NF21−CDEF NF21−KLMN

NF21−IJOP

NH20−ABGH NH20−CDEF

NH20−KLMN

NH20−IJOP

NH21−ABGH NH21−CDEF

NH21−KLMN

NH21−IJOP

NL11−ABGH

NL11−CDEF

NL11−KLMN NL11−IJOP

NL12−ABGH NL12−CDEF

NL12−KLMN NL12−IJOP

NO13−ABGH NO13−CDEF

NO13−KLMN

NO13−IJOP

NPA5−ABGH

NPA5−CDEF NPA5−KLMN

NPA5−IJOP NPA6−ABGH NPA6−CDEF

NPA6−KLMN

NPA6−IJOP

NL18−ABGH

NL18−CDEF

NL18−KLMN

NL18−IJOP

P063−ABGH P063−CDEF P063−KLMN P063−IJOP

P064−ABGH P064−CDEF

P064−KLMN

P064−IJOP

P111−ABGH P111−CDEF P111−KLMN P111−IJOP

pH

Ctot

Ntot P2O5 Clay

Silt

Al

Fe

Mg+Mn+Ca+K

Eigenvalues

Page 86: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

82

single staggered matrix. The two submatrices were connected by adding a ‘dummy’

variable distinguishing Nouragues and Paracou sites by two different values. At each

site,weaddedUTMcoordinates(northingsandeastings)astwoadditionalexplanatory

variablesaftercentringandnormalization,soastoaccountforlinearspatialtrendsthat

cannotbecapturedbyPCNMvariables.

The total variance of taxonomic composition was partitioned between an

environmental component and a spatial component (function 'varpart'; Borcardetal.,

1992; Legendre et al., 2005). Two RDA-based forward selections of environmental

variables and spatial variableswereperformed separately (function 'ordiR2step'with

0.05thresholdp-valueforaddingavariabletothemodel;Blanchetetal.,2008),yielding

twoRDA-basedlinearmodels.WeonlyproceededwithvariableselectionwhentheRDA

conductedonallvariableswassignificant(p < 0.05;Blanchetetal.,2008);whenitwas

notforeitherenvironmentalorspatialvariables,wedidnotpartitionthevariance.

We then tested the predictions of the dispersal-limited neutral theory on the

dataset.Neutral processes arepredicted to yield a decayof taxonomic similaritywith

distance in the absence of dispersal barrier (Chave & Leigh, 2002). We used here

𝐹! 𝐴,𝐵 = 𝑝!!𝑝!!!!!! as ameasure of taxonomic similarity between samplesA andB,

where𝑝!!istheproportionofspeciessinsampleA,𝑝!! thatinsampleB,andSthetotal

number of species. Chave and Leigh (2002) predicted that in a continuous spatially

explicit dispersal-limitedneutralmodelwith spatial density of individuals𝜌, dispersal

parameterizedbyaGaussiankernelwithvariance𝜎!, anda rateof apparitionofnew

species equal to𝜈, 𝐹! 𝐴,𝐵 depends only on the pairwise distance𝑟between samples,

and can be expressed as𝐹! 𝑟 = − 𝑎 ln 𝑟 +𝑏, with𝑏 𝑎 = ln 2𝜈 2𝜎 + 𝛾(where𝛾is

Euler’s constant) and1 𝑎 = 𝜌𝜋𝜎! − ln 𝜈 /2 (cf. Appendix). We measured𝐹! among

pairsofsamplingpoints,regresseditagainstthelog-transformedgeographicaldistance

ln 𝑟 , and assessed significance byMantel test for 999 permutations, using Pearson’s

correlationcoefficientasteststatistics.Themeandispersaldistancepergeneration 2𝜎

canbeobtainedprovidedthatanestimateof𝜌isavailable.Forplants,weassumedthat

most of DNA retrieved came from tree species, and that the forest holds 500mature

trees(≥10cmdbh)perhectare,i.e.𝜌 = 0.05 m!!,whichisclosetoobserveddensities

Page 87: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

83

(seeConditetal.2002).Wealsocomputedthequantity𝜎! 𝜈,whichmaybeinterpreted

astheratiobetweendispersalabilityanddiversificationrate.

Figure4:Occurrence-based(Sorensen)dissimilarityasafunctionoflog-distance.Theredlinefiguresthelinearregression.

Finally,weconductedaseparateanalysistoexplorehowbetadiversitydepends

on logging activities. Because our sampling effort along this disturbance gradientwas

Sore

nsen

dis

sim

ilarit

y

Log10 of geographical distance

●●

●●

● ●

●●●●

●●●●

●●●

●●●●●●

●●●●●

●●

●●

●●●●●●●●

●●●

●●

●●

● ●●

●●●●

●●●

●●●

●●●●

●●

● ●

●●●●●

●●

●●

●●●●●●●●

●●●

● ●

●●●

●●

●●●●

●●●●●●●●●

●●

●●●●

●●

●●●●

● ●

●●●

●●

●●●●●

●●●●●●●●●

●●

●●●●

●●

●●●

●●●

●●●●

●●

●●●

●●●●●

●●

●●●●●●●

●●

●●●●●●●

●●●●

●●●

●●●●●

●●

●●●●●

●●

●●

●●●●●●●

●●●

●●●●●●●

●●●●

● ●●●●●●●

●●

●●●●●●●●●

●●

●●●●●●●●

●●

●●●●●●●

●●

●●●●

●●

●●

●●●●●●●●

●● ●

●●●●●●●

●●

●●●●●●●●

●●●

●●

●●●

●●●●●

●● ●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●●●●

●● ●●

●●●

●●

●●

●●●●

●●●

●●●●●●

●●●

●●●●

●●

●●●●●●●

●●

●●●●●●●

●●●

●●

●●●

●●●●●●

●●

●●●●●●

●●

●●

●●●●●●●●

●●●●●

●●

●●

●●●●●

●● ●

●●●●

●●

●●

●●●●●●●

●●●●

●●

●●●●●●

●●

●●●●●●●

●●

●●

●●●●●●●●

●●●

●●

●●●

●●●●

●●

●●

●●●●●

●●

●●

●●●●●●●●

●●●●

●●●●●

●●

●●●●●●●●

●●

●●●●●●●●

●●●●

●●●●●

● ●●

●●●●

●●

●●

●●●●●●●●

●●●●

● ●

●●●

●●●●●●●●●

●●●● ●●

●●●●●●●●●●

● ●

●●

●●●●●●●

●●●●

●●●●

●●

●● ●

●●

● ●●●●●●●●

●●●●

●●●●●●●●●●●●

●●

●●

●●●●●●●●●

●●●●

●●

●●●●●

●●

●●

●●●●●●●●

●●

●●●●

●●

●●●●

●●

●●●

●●●●●●●●

●●

● ●●●●●

●●●

●●●●

●●

●●●●●●●●

●●

● ●●●●●●●●●●●●

●●

●●●●●●●●●

●●●●

●●

●●●●

●●●●●●●

●●●●

●●●●●●●

●●●●

● ●●●●●●●●

●●

●●●●●●●

●●●●

●●●●●●

●●●

●●●●

●●●●

●●●●

●●●●●●●●

●●●●●●●●●●●●●

●●

● ●●

●●

●●

●●●●●

●●●●●●●

● ●●

●●●

●●

●●●●●

●●●●●●●

●●

●●

●●

● ●●●●●●●●●●●

●●●

●●

●●●●●

●●●●●●●

● ●●

●●

●●●●●●●●●●●●

●●

●●

●●●●●●●●●●●●

●●

● ●●●●●●●●●●●●

●●

●●●●●●●●●●●●●

●●●●●●●●

●●●

●●●●●●●●●

●●

●●●●●●●●

●●

●●●●●●●●●●●

●●●

●●●●●

●●

●●

●●●●

●●

●●●●●

● ●●●

●●●●

●●

●●●●

●● ●

●●●● ●●●●

●●●

●●

●●●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Bacteria 16S

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●●●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●●

● ●

●●

●●

●●

●●

●●●

●●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

● ●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●●

●●●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

●●●

●●●●

●●

●●

●●●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●●

●●●

●●●

●●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●●●

●●

●●

●●●●●

●●

●●●●

●●

●●

●●●●●

● ●

●●

●●

●●

●●●●

●●●●

●●●

● ●●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●●

●●●

●●

● ●

●● ●

●●● ●

●●

●●●

●●●●●●

●●

●●

● ●●●

●● ●

●●●●

●●

●●●

●●

●●

●●●●●●●●

● ●●

●●●

● ●

●●●●●●●

●●●

● ●

●●●

●●

●●●

●●

●●●●

●●●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●●●●

●●●

●●

●●

●●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●

●●

●●●●

●●

● ●●

●●

●●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●●●●●●

●●

●●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●●

●●●●

●●

● ●

●●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Protists 18S

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●●●●

●●

●●

●●●●●

●●●

● ● ●

●●

●●●●

●●

●●

●●

●●

●●

●●●●●●●

●●●●

●●●●●●●●●

●●●

●●

●●●●

●●

●●

●●

●●●

●●

●●

●●●●●●●●●

●●

●●●●●●●●●●●●●

● ●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●●●●

●●●●●●●●

●●●●

●●●●

●●

●●

●●

●●●●

●●●

●●●

●●●●●

●●

●●

●●●

●●●●●

●●●

●●●

●●●●

●●●

●●

●●

●●●●

●●

●●

●●●● ●●

●●●●●●●●●●

●●●●

●●●

●●●

●●

●●

●●●

●●●●●

●●●

● ●●●●●●●●●●●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●●●●●●●

●●

● ●

●●●●●●●●●●●

● ●●

●●

●●

●●●●

●●

●●●●

●● ●●●●●●●●●●

●●

●●

●●●●●

●●

● ●

●●

●●●

●●●

●●●●

●●●

●●●●●●●●●●●

●●●●●●

●●●

●●

●●●●●

●●●

●●

●●●

●●●●

●●●●●●●●●●●

●●●●●●

●●●

●●

●●

●●●

●●

●●●●●●

●●●

●●●●●●●●

●●●●

●●

●●

●●

●●

● ●

●●●

●●●●

●●

●●●

●●●●●●●

●●

●●

●●●

●●●●●● ●

●●

●●●

●●

●● ●

●●●●●●●●●

●●●●●●●●●●●●●

●●●

●●●

●●

●●●●

●●●●●●●●●

●●●●●●●●●●●●●●●

●●

●●

●●

●●

●●●●

●●●●●

●●

●●●●●●●

●●●●●

●● ●

●●●

●●●●●

●●

●● ●

●●●●●●●●

● ●●●●

●●●

●●●●● ● ●●

●●●

●●

●●

●●●●●●●●●●

●●●●

●●●●●●●●●●●

●●

●●

●●

●●

●●

●●●●●●●

●●●●

●●

●●●●●●●●●

●●●

●●

●●●●●●●●●●

●●●

●●●●

●●●●

●●●●

●●●

●●●●●●●●

●●●

● ●●

●●●●

●●●

●●

●●●●●

●●●

●●●●●●●●●

●●●●●●●●●●●●●●●●

●●●● ●

●●● ●●●●●●●●●

●●● ●

●●●●●●●●●●●

● ●

●●

●●

●●

●●●●●●

●●

●●●●●●

●●●●●● ● ●

●● ●

●●● ●●●●●●●●●

●●●

●●●●●●●●●●●●

●●

●●●●

●●●

●●●●●

●●

●●●

●●●

●●●●●

●●

●●

●●●

●●●●

●●●●

●●●

●●●●●●●●●●●●

●●●●●

●●●●●●●

●●●

●●●

●●●●●●

●●●

●●●●●●●●

●●●

●●●●●●●●●●●●

●●●

●●●

●●●●●●●

●●●●●●●●●●●●

●●

●●●●●●●

●●

● ●●●

●●●●●●●●

●●

●●●

●●●●●

●●●

●●●●

●●●●●

●●●●

●●●●●●●●●

●●

●●

●●●

●●●●●

●●

●●●●

●●

●●

●●●

●●●

●●●●●

●● ●

●●●

●●●

●●●●●●

●●

●●

●●

●●

●●

●●●●●●

● ●

●●

● ●

●●

●●●●●●●

●●

●●

●●

●●●●●●●

●●●

●●

●●●●●●

●●

●●●●

●●

●●●●

●●●

●●

●●

●●●●●●

●●●

●●

●●

●●●●●●

● ●

●●●

●●●

●●●●●●

●●●●●●●●●●●●

●●●●●

●●●●●●●●●

●●●●●●●

●●

●●

●●●●

●●

●●●●

●●

●●●●

●●●●

●●●●

●●●●

● ●●●

●●

●●●●

●●

●●

●●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Fungi ITS

● ●

●● ●●●●

●●●●

●●●●●●●● ●●●

●●●●

●●

●●●

●●●

●●

●●●●

●●

●●

●●

● ●

●● ●

●●●

●●

●●

●●●●

●●●●

●●●

●●

●●●

●●●

●●●●

●●●●

●●●

●●

●●

●●

●●

●● ●●●●

●●●●●●

●●

●●●● ●●●●

●●●

●●●● ●

●●

●●

●●●●●●●

●●●

●●

●●●●

● ●

●●

●●●●

●●●●

●●●●

●●●● ●●

●●

●●

●●●●●●

●●

●●●

●●●

●●

●●

●●

●●

●●●●●

●●●

●●●●

●●●●

●●●●●

●●●

●●●●●●●●●●●●●

●●

●●●●●

●●

●●

●●

●●●●●

●●●●●●●●●●●●

●●●●●

●●

●●

●●●

●●●●●●

●●●

●●

●●

●●

●●

●●●●●

●●●

●●●●●●●●● ●

●●

●●●●●●●●●

●●●●●

●●●

●●●●●

●●●●

●●●●

●●●●●●●●

●●●●●

●●●●●●●●

●●●●●●●●

●●

●●●●●

●●

●●●

●●●

●●●

●●

●●●

●●●●●●●●

●●

●●

●●

●●●●●●●●

●●

●●●●

●●

●●

●●

●●●●

●●

●●●

●●●●●●

●●

●●●● ●

●●●●●

●●

●●●●

●●●●

●●●●

●●●●

●●

●●●●

●●

●●

●●●

●●

●●●●

●●●●●●●●

●●●●

●●

●●

●●

●●●●

●●

●●●●

●●●●

●●

●●●●

●●●

●●●●

●●

●●●●

●●

●●

●●

●●

●●

●●● ●

●●●●●●●

●●●●

●●

●●

●●●

●●●●●

●●●●

●●

●●●

● ●●●●●●●

●● ●

●●●

●●

●●

●●

●●●

●●●

●●●●

●●●

●●

●●

●●

●●●

●●●●●

●●

● ●●●

●●

●●

●● ●●

●●

●●●●●

●●

●●

●●●

●●

●● ●●●

●●

●●●

●●●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●●●●

●●●●●●●●●

●●

●●

●●

● ●

●●

●●●●

●●

●●

●●●●●●●●●

●●●

●●

●●●●

●●

●●

● ●

●●

●●●●●●

●●●●●

●●●●●●●

●●●●

●●●

●●

●●

●●

● ●●●

●●

●●

●●●●● ●

●●●●●●

●●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●●●●●●●●

●●

●●

●●

●●

● ●

●●●●●●●●●●●●

●●

●●●●

●●●●●

●●

●●

●●

●●●

●●

●●●●●●

●●

●●

●●●●●●●

●●●●

●●

●●●

●●●

●●

●●

●●●

●●●●●●●

●●

●●

●●

●●

●●

● ●●

●●●●

●●

●●●●●

●●●

●● ●●

●●

●●●

●●

●●●● ●

●●●●●●●

●●

●●

●●

● ● ●

●●●●

●●

●●●●

●●●

●●●●

●●

●●

●●●

●●●

●●●●●

●●

●●

●●

●●

●●●● ●

●●●●●●

●●●

●●●●

●●

● ●

●●●● ●●●●●●●

●●

●●

●●●

●●

●●

●●●●

●●●●●●

●●

●●

●●●

●●

●●●●●

●●

●● ●●●

●●●●

●●●●●●

●●

●●●

●●

●●●●●●●

●●

●●

●●●●

●●●

●●●

● ●

●●●●●●●●

●●●●

●●

●●

●●

●●

● ●

●●●●●●●

●●●

●●

●●●●

●●●●

●●

●●

●●

● ●●

●●

●●

●●●

● ● ●●●

●●●●

●●●●

●●●●

●●

●● ●

●●

●●●

●●

●●●

●●●

●●

●●

●●●

●●●● ●

●●

●●

●●

● ●

●●●

●●●

●●

●●●●

●●●● ●●

●●

●●●

●●●●

●●

●●●

●●

●●●●

●●

●●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●●

●●●

● ●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Plants trnL

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●● ● ●

●●●●●

●●

●●●●

●●

●●●●

●●●

●●

●●●

●●

● ●

●● ●●

●●

●●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

●●●

●●●

●●●

●●

●●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●●●

●●

●●●●

●●●●

●●●

●●

●●

●●

●●●●●●

●●

●●

●●

● ●

●●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●●

●●●

●●●

●●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●

●●●●

● ●

●●●

●●●

●●

●●

●●●●

●●

●●●●

●●●●

●●

● ●

●●●●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●

●●●●

●●

●●

●●

● ●

●●●●●

●●

●●

●●

●●●●

●●●●

●●

●●●●●●

●●●

●●

●●●●●

●●

●●●●

●●

●●

●●●●●

●●

●●●

●●

●●

●●

●●●

●●●●

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●●●

● ●●●

●●

●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●●●●

●●●

● ●

●●

●●

●●

●●●

●●

●●●●

● ●●

●●●

●●

●●

●●●

●●

●●

●●●●●●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●●●

●●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●●●●

●●

●●●

●●

●●●●

●●●●●●

●●

● ●●

●●

● ●

●●●●

●●●

●●●

●●●●

●●

●●

●●●●●

●●

●●●●

● ●

●●●

●●●

●●

●●●

●●●

●●

●●

●●●

●●●

●●

●●●●●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●●●●●●

●●

●●●●●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●●

●●●

● ●

●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●

●●●●

●●●●

●●

●●

● ●●●

●●

●●●

●●●

●●●●

●●●●●

●●●

●●

●●●

●●

●●

●●●

●●●●

● ●●

●●●

●●

● ●

●●●

●●

●●●●

●●

●●

●●

●●

●●●●

●●

●●

● ●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Arthropods 18S

●●

●●●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●

●●

●●●●●

●●

●●

● ●

●●●

●●●●

●●●●

●●

●●●

●●●●

●●

●●

●●

●●●●●●

●●●

●●●●●

●●●

●●●●●●●●

●●

●●●●

●●●

●●●

●●●

●●●●●●●●●

●●

●●●●

●●

●●

●●

●●

●●●●●

●●●●

●●

●● ●●●

●●●●

● ●

●●

●●●

●●●●

● ●

●●●●

●●

●●

●●●

●●

●●●●●●●●

●●●

●●●●●

●●●●●●●●●● ●

●●

●●

●●●●

●●●●●

●●●

●●●●●●●●

●●●●●

●●●

●●●●●

●●

●●●

●●●●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●●

●●●●

●●●

●●●

●●●

●●●●

●●

●●

●●●

●●●●

●●●

●●

●●

● ●●

●●●

●●

●●

●●

●●●●

●●●

●●●●●

●●●

●●●●

●●

●●●●

●●●●

●●

●●●●

●●

●●●●

●●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●

●●

●●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●●●●●

● ●

●●●

●●●

●●

●●●

●●

●●

●●●●●

●●

●●●

●●●

●●●

●●

●●

●●

●●●●●

●●

●●●●

●●

●●

●●● ●●

●●●

●●●●●

●●

●●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

● ●

●●●

●●●

●●

●●●

●●

●●●

●●

●●

●●

● ●●

●●●

●●●●

●●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

● ●

●●

●●●

●● ●●

●●●●

● ●●●

●●●

●●●●●●

●●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●

●●●

●●●

●●

●●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●●

●●●●●●●●

●●●●

●●● ●

●●●

●●●

●●●

●●●●

●●●●

●●

●●●

●●●

●●

●●

●●●

●●●●

●●●●

●●

●●

●●●

●●●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

● ●●

●●●●●

●●

●●●●●●●●●●●

●●

●●●●

●●

●●●

●●●●

●●

●●●●●●

●●●

● ●

●●

●●

●●●●●●

●●

●●●

●●●● ●●

●●●●●

●●

● ●

● ●

●●

●●●●

●●●

●●

●●●●●

●●

●●●●

●●

●●●●●

●●●●●●

●●

●●

●●

●●●

●●●●●

● ●

●●

●●●●●

●●●●●

●●●●

●●●

●●●

●●●●

●●

●●●●

●●

●●

●●●

●●

●●●

●●●

●●●

●●●●

●●●●●●

●●●●●●

●●

●●

●●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●●●●

●●●●

●●

●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Insects 16S

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

● ●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●● ●

● ●●●●

●●

●●

●●

●●●

●●

● ●

●●

●●●●

●●

●●

●●●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●●

●●●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●●

● ●●

●●

●●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

● ●●●

●●●

●●

●●

●●

●●●

●●●●

● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●●

● ●

●●

● ●

●●

●●●

●●

●●

●●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●●

●●

●●●

●●

●●

●●●

●●

● ● ●

●●

●●

●●●● ●

● ●●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

● ●

●●

●●

● ●

●●

●●●

●●

●●

●●●● ●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

● ●●

●●●

●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●● ●

● ●

● ●●●

●●

●●

●●●●

● ●● ●

●●

● ●

●●

●●

●●●

● ●●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Annelids 18S

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

● ● ●

●●

●●

●●●●

●●

●●

●●●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●● ●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●●

●●●

●●

●●●●

●●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

● ●

●●●●

●●●

●●

●●

●●

●●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●●●●●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Nematodes 18S

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●●

●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

● ●●● ●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

● ●●● ●

●● ●●●●●●●●●●●● ●●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●●●●●●●●●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Platyhelminthes 18S

Page 88: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

84

limited,wesimplyinvestigatedtherelativeeffectofdisturbanceandsoilconditionson

the various taxonomic groupswithout accounting for spatial structure.Wemeasured

the Sorensen dissimilarity index among pairs of sampling points in all Paracou and

Arbocel plots, both disturbed and undisturbed. We quantified logging intensity by a

dummyvariabletakingvalue0inundisturbedlocations(ParacouP6andP11areas),1

inmildly disturbed ones (Paracou P12 area), and 2 in strongly disturbed ones (clear

cutting;Arbocel).Wethenfollowedasimilarapproachasforthecomparisonbetween

soil effects and spatial aggregation in themain dataset.We performed amultivariate

linear regression (i.e., aone-dimensionalRDA)of theOTUabundancedata (Hellinger-

transformed and OTU-centred) against the logging intensity variable (centred and

normalized). When the linear regression was significant, we partitioned the total

variance of taxonomic composition between a logging intensity component and a soil

component.Weobtainedthesoilcomponentaspreviously:weperformedaPCAonsoil

variables, kept the first four axes, and built a RDA-based model by forward variable

selection.

Page 89: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

85

Results

Chemical and physical soil properties varied across the samples (Table S2). The pH

rangedfrom3.8to5.5,Ccontentfrom1.9%to4.2%,Ncontentfrom0.12to0.31%,and

P contentwas very low (see also Grau etal., 2017). Soilswere also poor in terms of

exchangeablecationcontent(K+,Ca2+,Mg2+,Mn2+),andvariedsignificantly in termsof

texture,with sandy (up to 80% sand) to clayey (up to 80% clay) soils. Paracou soils

tended tobesandierandmorenutrient-poor thanNouraguessoils.This suggests that

the Nouragues-Paracou comparison compounds geographical distance and

environmentaldistanceeffects.ThefirstPCAaxis(40%of totalvariance)corresponds

toorganicmatter(totalcarbonandnitrogen)andclaycontents,whicharecorrelatedto

aluminium concentration and anticorrelated to pH; the second PCA axis (30% of

variance) corresponds to nutrient and silt contents, the third to phosphorus (13% of

variance)andthefourthtoiron(7%;Fig.3).

Geographicaldistance Soil

Mean𝐷!"#$%&$%

𝑟𝒅𝒊𝒔𝒕 𝑟𝒅𝒊𝒔𝒕,𝒑𝒂𝒓𝒕 slope𝒅𝒊𝒔𝒕 𝑟!"#$ 𝑟𝒔𝒐𝒊𝒍,𝒑𝒂𝒓𝒕 slope𝒔𝒐𝒊𝒍

PlantstrnL 0.42 0.65*** 0.61*** 0.038 0.29*** 0.06 0.011

Bacteria16S 0.49 0.16* -0.02 0.014 0.46*** 0.44*** 0.028

Protists18S 0.60 0.16** 0.05 0.012 0.30*** 0.26*** 0.015

FungiITS 0.87 0.43*** 0.29*** 0.029 0.54*** 0.45*** 0.025

Arthropods18S 0.53 0.36*** 0.29*** 0.026 0.28*** 0.17* 0.014

Insects16S 0.89 0.23*** 0.16** 0.013 0.25*** 0.18** 0.010

Annelids18S 0.35 -0.031 -0.08 -0.004 0.10 0.12 0.009

Nematodes18S 0.70 0.11* 0.09 0.012 0.05 0.02 0.004

Platyhelminthes18S 0.57 -0.079 -0.11* -0.015 0.07 0.10 0.009

Table1:Linearregressionoftaxonomicdissimilarity(Sorensenindex)againstsoilandgeographicaldistance.𝑟!"#$ ,𝑟!"#$ ,𝑟!"#$,!"#$ ,𝑟!"#$,!"#$arethesimpleandpartialPearson’scorrelationcoefficients.SignificancewasassessedusingManteltests:***forp<0.001;**for0.001<p<0.01;*for0.01<p<0.05.

Page 90: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

86

Sorensendissimilarityvariedacross taxonomicgroups, and for thesamegroup

depending on the tested DNA barcode (Table 1; see Table S4, Fig. S2 and S3 for a

comparisonbetweenbarcodeswithingroup). Itwashighest for insects16Sand fungi

ITS (ca. 0.9 in average), and lowest for annelids and plants trnL (ca. 0.4 in average).

When plotted against log-transformed geographical distance (Fig. 4), Sorensen

dissimilarityshowedastronglysignificantcorrelationforplants,fungi,arthropodsand

insects,aweakcorrelationforprotists,bacteria,andnematodes,andnocorrelationfor

annelids and flat worms (by decreasing order of correlation coefficient; Table 1).

Sorensendissimilaritywasalsoregressedagainstsoildissimilarity(Fig.5).Wefounda

strongcorrelationtosoildissimilarityinfungi,bacteria,protists,plants,arthropodsand

insects, andnocorrelation inannelids, flatwormsandnematodes (Table1).To testa

possible collinearity between soil dissimilarity and geographical distance, we finally

computedthepartialcorrelationrdist,parttolog-distanceconditionalonsoildissimilarity.

Thepartial correlation to log-distancewassignificant inplants, fungi, arthropods,and

insects,butnotintheothergroups.Conversely,whencomputingthepartialcorrelation

rsoil,part to soildissimilarity conditionalon log-distance, the correlationwas retained in

fungi,bacteria,protists,insectsandarthropods,butlostinplants.

RDA-basedpartitioningofbeta-diversityshowedthatenvironmentalfactorsand

spatialaggregation togetherexplainedaproportionofbeta-diversity thatranged from

45%inbacteriatozeroinflatworms(Fig.6,Tables2,S5).Withinthefractionofbeta

diversityexplainedbysoileffects,thefirsttwosoilPCAaxeswerethemainexplanatory

factors,withthesilt-nutrientaxisplayingaparticularlyimportantroleinbacteria(Fig.

S4). The relative contribution of spatial aggregation and soil properties varied across

groups,withamajoreffectofspatialaggregationrelativetosoilinannelidsandplants,

while both effects were of the same magnitude for bacteria. While the collinearity

betweenenvironmentalandspatialvariables introduceduncertaintyas to theiractual

relative importance to beta diversity, pure spatial aggregation explained an equal or

higherproportionofthevariationcomparedtopureenvironmentalfactorsinallgroups.

Forbacteriaandprotists,thiscontrastswiththeconclusionsofdistance-basedanalyses.

The fit of the neutral prediction for the decay of taxonomic similarity𝐹!with

geographical distance was statistically significant for plants, bacteria, protists, fungi,

insectsandannelids,butnotforarthropods,nematodesandflatworms(TableS6,Fig.

Page 91: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

87

S6).Atagivengeographicaldistance, the𝐹!statistic tendedtobemorescatteredthan

Sorensendissimilarityandtoexhibitoutliers(Fig.S6).Assumingadensityofoneplant

individualper20m2,asmeasured formatureneotropical forest trees,weestimateda

mean dispersal distance per generation of 43 m in plants. The dispersal to

diversificationratio𝜎! 𝜈washighestforfungiandinsects,intermediateforplantsand

annelids,andsmallestforprotistsandbacteria(TableS6).

Finally, we found that past logging activities had the strongest effect on plant

composition (TableS7,Fig.S6).Theyalsohadaneffectonannelids,whichwas larger

than the effect of soil conditions, and a small but strongly significant effect on fungi.

However,theyhadlittletonodetectableeffectonothergroups.

Puresoilfraction Mixedfraction Purespatial

fractionTotalexplained

variance

PlantstrnL 2.4*** 7.8 11.0*** 21.1***Bacteria16S 12.7*** 18.5 14.0*** 45.2***Protists18S 2.2** 8.7 10.0*** 20.8***FungiITS 3.8*** 4.9 5.9*** 14.5***Arthropods18S 1.5* 2.8 2.4** 6.7***Insects16S 0.1 1.3 1.5** 2.9***Annelids18S 5.5** 5.5 15.3*** 26.2***Nematodes18S 1.4** 1.4 2.4*** 5.2***Platyhelminthes18S NA NA NA NA

Table 2: Fractions of variance (adjusted R2, in%) explained by Canonical RedundancyAnalysisforenvironment-onlyandspatial-onlymodels.Significance:***forp<0.001;**forp<0.01;*forp<0.05.

Page 92: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

88

Discussion

We have explored the patterns of soil beta diversity in the tropical forests of French

Guianabasedonfifteenundisturbed1-haplots,aswellasfourdisturbedplots.Distance-

basedanalysesusingSorensendissimilaritysuggest thatatourstudyscale,plantbeta

diversity is driven predominantly by geographical distance, bacteria and protist beta

diversitybysoilproperties,whilefungi,arthropodandinsectbetadiversitydependson

both types of factors. Finally, annelid, nematode and flatwormbeta diversity did not

correlatewithanyofthesefactors.Theobservationthatbothgeographicaldistanceand

environmentplayarole inexplainingcommunityassemblyhasalreadybeenreported

for a range of taxonomic groups, either in eukaryotes or in bacteria (Cottenie, 2005;

Thompson&Townsend,2006;Martinyetal.,2011).However,ourresultsareoneofthe

rarecasestudieswherebetadiversityhasbeenquantifiedacrossthesamesitesovera

broadrangeoftaxonomicgroups.

The dependence of plant beta diversity on geographical distance in tropical

forests has been reported in the past, and has been presented as evidence for the

importance of dispersal-limited neutral processes in shaping these ecological

communities(Conditetal.,2002).Likewise,thestrongdependenceofbetadiversityon

soil conditions in unicellular organisms (bacteria and protists) is in agreement with

expectations(Soininenetal.,2007;Ramirezetal.,2014).Whilewecouldexpectfungito

be primarily responsive to environmental conditions owing to their good dispersal

abilities, widespread plant-fungi associations may be responsible for the observed

dependenceonbothenvironmentalconditionsandgeographicaldistance(Bahrametal.,

2013).Indeed,dispersalishamperedbyhostspecificity,andthethedistributionofhost-

specificfungaltaxareflectsthatoftheirplanthosts.

For insects, previous studieshave reported a lowbetadiversity (Novotnyetal.,

2007; Basset et al., 2012). However, these studies have primarily focused on above-

groundherbivores,whichareknowntohavegooddispersalability.Incontrast,wehave

sampledsoil-dwellinginsects,andthusourfindingthattheseorganismshavehighbeta

diversity, influenced by both soil properties and dispersal limitation, does not

Page 93: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

89

necessarily contradicts the results of previous publications. However, our finding is

significant because it shows that spatial patterns of biodiversity in insects cannot be

easily generalized across ecosystem compartments. Finally, annelids, nematodes and

flatworms are represented by a limited number of OTUs (Table S1), and the lack of

patternsinthesegroupsmightbeduetoalackofstatisticalpower.

Sore

nsen

dis

sim

ilarit

y

Soil dissimilarity

●●

●●

● ●

●●● ●

●● ●

●●●

●● ●●●

●●

● ● ●

●●

● ●

●●

●●●● ●

●●●

●●

●●

● ●●

●●

● ●

● ●●

● ●●

●●

●●

●●

●●

●●

● ● ●

●●

● ●

●● ●●●● ●

●●●

● ●

●●

●●

●● ●

●●

●●

● ●● ●●

● ●

●●

●●

● ●

●●●●

● ●

●● ●

●●

●●

●●●

●● ●

●●● ●

●●

● ●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●●

●●

●●●

●●

●●●● ●●●

● ●

●●

●●●● ●

●●

●●

●●

●●

●● ●●●

●●

● ●● ●●

● ●

●●

●●● ● ●

●●

● ●●

●●

●●●● ●

●●

●●

●●●● ●●

●●

●●

●●

●● ● ● ●●

●●

●●

●● ● ●

●●

● ●

●●

●●●

● ●

●●

●●●●

●●

●●

●●●

● ●● ●

●● ●

●●

● ●●

●●

● ●

●●

●●●● ●●

●●●

●●

●● ●

●●

● ●●

●●● ●

●● ●

●●

●●

●●

●●

●● ●

●●

●●

●●

● ●

●●

●●

●● ● ●

● ● ●

●●

●●

●● ●●

●●

●●●

●● ●

● ●●

●●

●●

●●

●●● ● ●

●●

● ●

●●●●

●● ●

●●●

●●

● ●●

●●

●●● ●

●●

●●

●●

● ●

●●

● ●

●● ●●●● ●

●●●

● ●

● ●

●●

●●

● ●●

●● ●

●● ● ●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●●●●●

●●

●●

●●

●● ●

●●

● ●

●●● ●●● ●

● ●●

●●

●●●

●● ●●

●●

●●

●●

● ● ●

●●

● ●

●●

● ●●● ●

● ●●●

●●●●●

●●

●●●

●● ●

●●

● ●

●●● ●●

●●

● ●●●

●●

●●●

●● ●

●● ● ●

●●

● ●

●● ●●●● ●

●●

●●

● ●

●● ●

●●

●●● ●

●●●

●● ●●●●

●●●●●

●●●

●●

●●

● ●

●●

● ●●

●●

●● ● ●

●●●

●●

●●●

●●

●●●●

●●

●●●●

● ●●

●●

●●●●●

●●●●

●●

●●

●●● ●

●●

●●●

●●

●●

●●

●●●●

●●

● ●

●●● ●

● ●●●●●

●● ●●

●●

●●●

●●

●●●

● ●● ●● ●●

● ●

● ●● ●●

●●

●●●●

●●

● ●● ●● ●●

●● ●

● ●●● ●●

●●●

●●●

●●

●● ●●

●●

●●●

●● ●

●●

●●●●

●●●●●

●●

● ●●

●●

● ●●● ●

●●●●

●● ●●

●● ●●●

●●

●● ● ●●● ●

●●●

●● ● ●●●

●●

●●●

●●●

●●●

●● ● ● ●●

● ●●

●●●● ●

●●●

●●●●

●●

● ●●

●●

● ●

●● ● ●●

●●

●● ●●●

●●●

●● ●

●●

●●●

●●

●●●● ●●

●●

●●

● ●

●●●● ●

●●● ●●●●

● ●●

● ●

●● ● ●●

●●

●● ●●●

● ●●

●●

●●

● ●●

●● ●● ●●●

●●

●●

●●● ●

● ●● ●

●●●●

●●

● ●●

● ●●

●●●● ●●

●●

●●

●● ●

●●● ●● ●●

●●●

●●●●●

●●●

●● ● ●●●● ●●

●●

●● ● ●●●●●

●●

●●●

●●●●

●●●●

●●●

●●●● ●

●●

●●

●●

●●

●●●

● ●●●

●●● ●

●● ● ●

●●

●●●●

●●●

●● ● ●●●●●

●●●

●●●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Bacteria 16S

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

● ●●

● ●

●●

●●●

● ●

●●

●●

●●

●●●

● ●

●●

●● ●

● ●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

● ●●

● ● ●●

●● ●

●●

●●

●●●

● ●

●●

●●

●●

●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●● ●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●●

● ●

●●

●●

●●●

● ●

●●

●●

●●

●●

●● ●

●●●

● ●

●●

●●

●●

●●

● ●●

●●●

●●●

●●●

●●

●●

●● ●

●●

● ●

●●

●●

● ● ●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●●

● ● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●●

● ●●

●●

●●●

●●

●●

●●

●●

●●●

●●

● ●

● ●

●●

●●

●●

●●●

●●

●●

●● ●

●●

●●

●● ●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●●

●●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ● ●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

●●●

●●

●●●

●●●

●●

●● ●

●●

●●

● ●

●●

●●

●●

●● ●

●●●

●●●

●●

●●

●●

●●●●

●●

●● ● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

●● ●

●●

●●

● ●

●●

●●

●●

●●

●●●●●

●●

●●

●●●● ●

●●

●●

●●

●●

●●●

● ●● ●

●● ●

● ●●●

●●

●●

●●●

●●

●●

● ●●

●●

●●

●● ●

●●

●●

●●

●● ●

●●● ●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●● ●●

●●

●●●

●●

●●

●●●●●

●●

● ●●

● ● ●

● ●

●● ●●

●●●

●●●

●●

●●●

●●

● ●●

●●

●●●

●●●●●

●●●●

●●

●●

●●

●● ●

●●

●●

● ●

●●

●●

●●

●●● ●●

●●

●●

● ●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●●

●●

●●

●●

●●● ●●

●●●

●●

● ●

● ●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●

● ●●

● ●

●●●

●●

●●

●●●

● ●

●●● ●

●●

● ●●

●●

●●

●●

● ●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●●

●● ●●

●●

●●

●●

●●

●●

● ●

●●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Protists 18S

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ● ● ●● ● ●●

●●

● ●

●●●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●●● ●

● ●

● ●● ●

●●● ●

●●● ●●

●●

●●

● ● ●●

●●

●●

● ●

●●●

●●

●●

● ●●●● ● ●● ●

● ●

●●●

●●●

● ● ●●●

●●

●●● ●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

● ●●● ●

● ●

● ●●

●●

●●

●●●● ● ●

●●●

●●

● ●●

●●

●●

●●

● ●●●

● ● ●

● ●●

● ●●

●●

●●

●●

●●●

●●●●●

●●

●●

●●●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●●●

●●

●● ●●●

●●●

●●

●●

●●

●●

● ●

●●

● ●●

●●●

●●

● ●●

●●● ●●●●

●●●

●●●

●●

●●

●●

●●

●●

● ●●

● ●

●●

● ●●●

● ●●

● ●

●●

●●●●

●●

●●●●●

● ●●

●●

●●

●●●

●●

●●

●●

●●● ●●●

●●

●● ● ●●●

● ●

●●● ●

●●

●●

●●

●●●

●●

● ● ● ●

●●●

●●● ● ●●●●

●●

●●

●●●

●●●

● ●

●●●●●

●●

● ●

●●

● ●●●

●● ● ● ●●● ● ● ●●

●●●●

● ●

●●●

●●

●●

●●

●●

● ● ● ●●●

●●

●●

●●●

● ●●●

● ● ●●

●●

●●

●●

●●

● ●

●●

●● ●

●●

●●●

●● ●● ●

●●●

●●

●● ●

●●●●●

● ●●

●●●

● ●

●●●

●●● ●●●● ● ●

●●

● ●●●

● ●●● ●●

● ●●

●●

● ●

●●

●●●

●●●

● ● ●● ●● ●●

●● ●●●●

● ●●●●●

● ●

●●

● ●

●●

●●

●●

●●

●● ●● ●

● ●●●

● ● ●

●●●●●

● ● ●

●●●

●●●

●●

● ●

●●●

●●

●● ●●

● ●●

●●● ●●

●●

●●●●

● ●● ●

●●

● ●

●●

●●● ●●●

● ●●●

●●

●●

● ● ●●● ● ●● ●●

●●

●●

●●

●●

●●

● ●●● ●●

●● ● ●

●●

● ●

●●●

● ●● ●●

●●●

●●

●● ●●● ● ●

● ● ●●

●●●

●● ● ●

●●●●

●●●

●●

● ●●

● ● ● ●●●

●●

●●●

●●● ●

●●

●●

● ●●

● ●

●●

●●

● ● ●●●● ●

●●●●●

●● ● ●●●●● ●●●

●●

●● ●●●● ● ●● ● ●

●●

●●● ●

●●● ●●●●

● ●●●●●

●●

●●

●●

● ●

●●

● ●●●

●●

●● ● ●●●

●●●●●●● ●

●● ●

● ●●● ●● ● ● ●●● ●

●●

●●● ●●●●●

●●●●●

●●

●●

●●

● ●●

●● ●●●

● ●

●●●

●●●

●●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●●

●● ●●●●

● ●●●●●

●● ●●●

●●● ● ●

●●●

●●

●●●

● ●●●●●

●●●

●●

●●● ●●●

● ● ●

●●● ●●●● ●●●●●

● ● ●

●●●

● ● ●●●

●●

●●

●●●

●●●●●●●

●●

● ●●●

●● ●

● ●

●●●●

●● ● ●●●●●

●●

● ●●

●● ● ●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●●

●● ●●

● ●●

●●

●●

● ●●●●

●●

● ● ●●

●●

●●

●●●

●● ●

●●●●

●● ●

● ●●

●●●

●● ●●

●●

●●

●●

● ●

●●

●●

●●● ●●●

● ●

●●

● ●

●●

●●●

●●●●

●●

●●

●●

● ●●● ●●

●●

●●

● ●● ●●

●●

●● ●

●●

●● ●●

●●●

●●

●●

●●

●●●●

●●●

● ●

●●

● ●●

●●●

● ●

●●●

●●●

●●●●

●●

●●●

●●●●●●●

●●

●●●

●●

●●●

●●●●●●

●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

● ●●

● ●●●

●●●●

●●●●

●●

●●●

●●

●●

●●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Fungi ITS

● ●

● ● ● ● ●●

●●● ●

●● ●

●●●●●● ●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●● ●

● ●●

●●

● ●

●●● ●

●●

● ●● ●

●●

●●●

●●

●●

●●●

●●●

●● ●●

●●

●●

●●

●●

● ● ● ● ●●

●●● ●●●

●●

●●● ●●●

● ●

●●

●● ●●●

●●

●●

● ●● ●● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●● ●●

●●●

●● ●●

●●

●●

●● ●

●●●

●●

●● ●

●● ●

●●

●●

● ●

●●

●●

●●●●●

●●●●

●●● ●

● ●●

●●

●●

● ●●●●

●●● ●● ●● ●

●●

●●●●

●●

●●

●●

●●

● ●●●●●

●●

● ●●

●●● ●●

●●

● ●●

●●

●●

●●

● ●● ● ●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●● ● ●

●●●

●●●

●●

●●●●●

●●

●●

●● ●● ●

●●

●●●●●

●● ●●

●●● ●

●●●

●●●● ●

● ●●●●

●●●●

● ●●

●●● ●

●●● ●

● ●

●●● ●

● ●

●●

●●●

●●●

●●

● ●●

●● ● ●●●● ●

● ●

● ●

●●

●●●● ● ● ●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●●

●●● ● ●

● ●

● ●●

● ●

●●●

● ●

●●

● ●●●

●● ●

●●●

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●

●●●●

●●●

●●

●●●● ●

●●

●●

●●

●●

●● ●●

● ●

●●●

●●●

● ●

●●

●●

●●

● ●●●

● ●

●●●●

●●

●●

●●

●●

●●

● ●● ●

●●● ●

● ●●

● ●●●

●●

●●

● ●●

●●

●●

●●

● ●●

●●

●●

● ●●● ● ●● ●

●●●

●●●

● ●

● ●

●●

● ●●

●●●

●●

●●

●●●

● ●

●●

●●

●● ●

●●

●● ●

● ●

●● ●●

● ●

●●

●●●●

●●

●● ● ●●

●●

●●

●●

●●

●● ● ●●

●●

● ●●

●●●

● ●

●●

●●

●● ●

● ●

●●

●●

●●

●● ● ●●

●● ●

●●

● ●● ●

●●

●●

●●

● ●

●●

●●●

●●

● ●

●●● ●●●

● ●●

●● ●

●●

●●

● ●

● ●

●●

● ●

●●

●●

●●●

●●●

●●● ● ●●● ●●

● ● ●●

●●

●●

●●

●●

● ●●●

●●

●●

●● ● ●●●

●●

●●

●●

●●

●●●●

● ●

●●

●●

●●

●●

● ●●

●●

●●● ● ●

●● ● ●●●

●●

●●

●●

●●

●●●

●● ●●

●● ●

●●

●●

●● ●●

● ● ●●●

●●

●●

● ●

●●●

● ●

●● ● ●

●●

●●

●●

●● ● ●

● ● ●●●

●●

● ●

●●●

●● ●

●●

●●

●●●

●● ● ● ●●

●● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●● ● ●

●● ●

●●●●●

●●●

●●

● ●●● ●

●●● ● ●

●●

●●

●●

●●

●● ●

● ● ●●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●●

●●

●● ●● ●

●●●

●●●

●●●

●●

●●

●●

●●

● ●●● ● ●●●

● ●●

● ●

●●

●●●

●●

●●

●●●●

●● ●● ●●

●●

●●

●●●

●●

●●●●

●●

●●● ●●

●●●

●●●

●●●

●●

●●

●●

●●●

● ● ●●

● ●

●●

●●● ●

●●

●●

●●

●●

●●● ● ●●

●●

●●●●

●●

●●

●●

● ●

●●● ●

●●●

● ●●

●●

●● ●●

●●●

●●

●●

●●

● ●●

●●

●●

● ●●

●● ●●●

●● ●

●●

●●●

●●●

●●

●●●

●●

●●●

● ●

●● ●

●●●

● ●

●●

●●●

● ●●

●●●

● ●

●●

●●

● ●

● ● ●

●●●

●●

●●

●●

●● ●

● ●●

●●

●●

● ● ●●●●

●● ●

●●

●●●

●●

●●●

●●

●●

●●

●●●●

●●

● ●

●●

●●

●●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Plants trnL

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●●●

● ●●

● ●

●●

●●● ●

● ●

●●

●●

●●●

● ●

●●●

●●

●●

● ● ●●

●●

●●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●●

● ●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●

● ●●

●●

●●

●●

●●

●●●●●

● ●

●●

●●

● ●

●●●●

●●

●●

● ●

●●

●●

●● ●

●●

●●

● ● ●

●●

● ●●

●●●

●●

●●●

●●

●●

● ●

● ●●

●●

●●

● ●●

● ●

●●

●● ●

●●●

● ● ●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●●●

● ●

● ●●

●●

●●

●●

●● ●●

●●

●●

● ●

●●●●

●●

●●

●● ●● ●

●●

●●

●●

● ●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

●●

●●

● ●●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●●

●●

●●

●● ●●

● ●

●●

●●

● ●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●● ●

●●

●● ●

●●

●●

● ● ●

●●

●●

●●

● ●●

●●

● ●

●●

●●

●●●

●●

●●

● ●

●● ●

●●

●●●

●●

●●

●●

●●●

●●● ●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

● ●●

●●●

●●

●●

●●●

● ●

●●

●●●●● ●●●●

●●

●●

● ●

●●

●●

● ●

●●

●● ●●

●●●●●

●●

●● ●

●●

● ●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●●

●●

● ● ●●

●●● ●●●

●●

● ●●

●●

●●

● ●●

●●

●●

●●●

●●

●●

●● ●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

● ●

● ●●●●●

●●

● ●

●●

●●

●●

●● ●

●●

●●

●●

●●● ● ● ●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●●●

● ●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●

● ●●

●●

●●

●●

●● ●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●● ●

●●

●●

●●

●●

●●

●● ●

●●

●● ●

●●

●●

● ●

●●

●● ● ●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●●●

●●

●●●

●●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●●●

● ●●

●●●

●●

●●

●●●

●●

●● ●

●●

●●

●●

●●

● ● ●●

●●

●●●

● ●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Arthropods 18S

●●

● ●● ● ●

●●

●●●

●●

●●

●●

●●

● ●

● ● ●●

● ●

●●

●●●

●●

●●

● ●

●●

●●●● ●

●● ● ●

●●

●● ●

● ●●●

●●

● ●

● ●

●●●● ● ●

●●●

●● ●

●●●

● ●

●●●●

●●● ●●●

●●●●

●● ●

● ● ●

●●●

●● ●

●●●●

●●

●●

● ● ●●

●●

●●

●●

●●

●●●●

● ●● ●

●●

●●●● ●

●●

●●

●●

● ●

●●

● ●●●

●●

●●●●

● ●

●●

● ●●

● ●

●●●●●●

●●

●●●

●●●●●

●●

●●● ●●●

● ●●

●●

●●

●●●

● ● ● ●●●● ●

●● ● ●●

●●●

●●

● ●●

●●●

●●

● ●●

●●

●● ●

● ●●●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●●

●● ●

●●

●●

●●

●●●

●●

● ●●

●● ●

●●

● ●

●●

● ●

●●●

●●● ●

●●●

●●

●●

● ●●

●●

● ●

●●

●●

●●●

●●●

●●● ●●

●● ●

●●●●

●●

● ●●●

● ● ●●

●●

●●● ●

●●

●● ●●

●●●

●●●

●●

●●

● ●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●●

●●● ●●●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●●

●●●

●●●

● ●

●●

●●

● ● ● ●●

●●

●●

●●

●●

● ●

●●● ●●

●●

● ●●

●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●

●● ●

●●

●●

● ●

●●●

●●

● ●●

●●

●●

●●

●● ●

●● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

● ●

●●●

●●

● ●

●●

●●

●● ●●

●●

●●

●●●●

●●●

● ●●●●●

● ● ●

●●

● ●

●●

●●

●●

●●●

● ●●●

●●

●●

●●●

●●

●●

●● ●●

●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●● ●

●●

●●

●●

●●

●●

●●

● ●● ●●

●●

●● ●

●●

●●●●

●●●●

● ●

●●●

●●●

●●

●●

●●●

●●● ●

●●●●

●●

●●

● ●●

●●●

●●

●●

● ●●

●●

● ●

●●

●●

●●

●●●● ●

●●

●●

● ●●

●●

●● ●

● ●

●●●

●●

● ●● ●●●

●●

●●●

●●

● ●●

●●●

●●

●●●● ●

●●

● ●

●●

●●

●●● ●●

●●

● ●●

● ●●

●●●

●●●●

●●

●●

● ●

●●

●●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

● ●●

●●●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

●●● ●●

●●

● ●●

●● ●● ●

●●

●●

● ●●●

● ●

●●● ●

●●

● ●

●●●

●●

●●●

●●●

● ●●

●●●

●●●

● ●●

●●

●●●●

●●

●●

●●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●●●

●●

●●●

●●

●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Insects 16S

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

● ● ●

● ●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●● ●

●●

●●

● ●

● ●

● ●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●●

●● ● ●

●● ●● ●

●●

●●

●●

●●●

●●

●●

●●

●● ●●

●●

● ●

●●●●

●●

●●

● ●

●●

● ●

●●●

●●

●●

● ●

● ●

●●

●●●

●● ●●

● ●

● ●

●●●

● ●

●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

● ●●

●● ●

● ●

●●

● ●

●●

●●

●●●

● ●●

● ●

●●

● ●

●●

●● ●

● ●

●●

● ●

●●

● ●●

● ●

● ●

●●●●●

● ●●

●●

●●●

●● ●

●●

● ●

●●

● ●

●●

● ●

● ●

●●

●●

●●

● ●

● ●

●●

● ●

●●

●●

● ●

● ●

●●

● ●

●●

●●

●●

● ● ●

●●

●●

●●

● ●

● ● ●●

●●●

● ●

● ●

●●

● ●●

●●●●

●●

● ● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

● ● ●

●●

● ●

● ●

● ●

●●

●●●

● ●

●●

●●●

●●

●●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●

● ●

●●

● ●●

●●

●●

●●

●●

● ●

●● ●

● ●

●●

●●

● ●

● ●

●●●

●●

●●

● ●

● ●

●● ●

●●

●●

●●●

●●●

●●

● ●●

●●

●●

● ●

●●

●●

●●

●● ●

● ●

●● ●

●●

●●●

●●

●●

●●

●●

●● ●

●●

●●

●●●● ●

●● ●●

●●

●●

●●●

● ●

● ●

●●

●●●

●●

●●

● ●

● ●

●●

● ●

●●

● ●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●● ●

●●

●●

● ●

●●

● ●

● ●

●●

●●

●●

●●

●●●● ●

●●

●●●

●●

●●

● ●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

● ●●

●●●

●●●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●● ● ●

●●●●

●●

●●

●●

● ●

●●●

●●●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Annelids 18S

● ●

●●

●●

●●

●●

●●●

●● ●

●●

●●

●●●

●●

●●

● ●●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

●●

● ●

● ●

●●

●●

●●

● ●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

● ●

●●

● ●

●●●

●●

●●

●● ●●

●●

●● ●

●●

● ●

●●

●●●

●●

●●

● ●

●●

●●

●●

● ●

● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●● ●

●● ●

●● ●

●●

●●

●●

●●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●● ●

● ●

● ●

●●●

●●

●●

●●

●●

● ●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●● ●●

●●

●●

●●

● ●

●●

●●

●●●

● ●

●●

●●

● ●

●●

● ●

● ●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ● ●●

●● ●● ●● ●● ●●

●●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Nematodes 18S

● ●

● ●

● ●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

● ●

●●

●●

● ●

●● ●

● ●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●●

●●

● ●

●●

●● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ● ●

●●

●●

● ● ●

●●

●●

●●

● ●●

● ● ●

●●

●●

●●

●●● ● ● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

● ● ●

●●

●●

● ●●

● ● ●

●●

●●

●●

●●

●●

● ● ●● ●

●● ● ●● ● ● ●●●● ● ●●●● ●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●● ● ●●●●●●●●●

●●

●●

●●

● ●

●●

●● ●

●●

●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Platyhelminthes 18S

Page 94: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

90

Figure5:Occurrence-based(Sorensen)dissimilarityasafunctionofsoildissimilarity.Soildissimilarity is computed as the Euclidian distance between the first four PCA axes of themeasuredsoilvariables.Theredlinefiguresthelinearregression.

TheRDA(‘rawdata’)approachledtoslightlydifferentconclusionsthanMantel-

basedcorrelations,andbroughtadditionalinsight.Asignificantfractionofannelidbeta

diversitycouldbeexplainedbyspatialaggregation,whiledistance-basedanalysesusing

Sorensen dissimilarity did not detect any signal in this group. This is in linewith the

limited dispersal abilities reported for annelids in this area (Decaëns et al., 2016).

Spatial aggregation was also found to be an important factor explaining the spatial

distributionofprotistsandbacteriainadditiontosoilproperties.Incontrast,wefound

little explanatory power for insects and arthropods. Overall, this is in line with the

highersensitivity tospatial structurereported in the literature for ‘rawdata’analyses

(Legendreetal.,2005),eventhoughtheinterpretationofthisspatialstructureasbeing

indicative of neutral processes is not straightforward (Smith & Lundholm, 2010).

However, a potential problem in our study design is that the logarithmic geographic

samplingschemeisnotideallysuitedtothedescriptionbyPCNMvariables.Becauseof

thechallengingnatureofextractingDNAonsite tominimizecontaminations,wecould

notmultiply the number of sampling points, butwe hope to address the issue of the

samplingdesignforDNA-basedbetadiversityanalysesinaforthcomingcontribution.

The fit of the neutral prediction for the decay of similarity with distance was

significant forall taxonomicgroupsexceptarthropods,nematodesandflatworms,but

waspoorerthanthefitofSorensendissimilaritytolog-distance.Apossibleconfounding

factoristhatunliketheSorensenindex,the𝐹! similaritymeasureissensibletonoisein

OTUabundances,andmayalsobebiasedbyunevensamplingeffortamongsamplesin

DNA-based data. Overall, a decay of𝐹!similarity with distance was detected in the

groups forwhich raw-data analyses showed an effect of spatial aggregation,which is

consistentwith the fact that both types of analysis rely on abundance information. In

particular,adecayof𝐹! similaritywithdistancewasfoundinannelidswhilenonewas

detected using Sorensen dissimilarity, which suggests that in this group, differences

between samples lie in the abundance pattern of OTUs rather in their occurrence

pattern.

Page 95: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

91

Figure 6: Variance partitioning between soil PCA axes and spatial structure (PCNMdecomposition).Thespatialmodel is thereunionof two independentPCNMdecompositions,one for the Nouragues sampling sites and one for the Paracou sampling sites, plus the UTMcoordinatesinbothgroupsofsites.ThetwoPCNMdecompositionsareconnectedbyadummyvariablethattakesonevalueinNouraguesandanotherinParacou.Forwardvariableselectionisperformed on soil and spatial variables before variance partitioning. Hatching indicates non-significantpurefractions.

Our estimate of 43m for themean dispersal distance per generation in plants

wasclosetothatmeasuredempiricallyforneotropicaltrees(39m;Conditetal.,2002),

andtothatestimatedbyfittingtheneutralsimilaritydistance-decaypredictiontotree

censusdata (between40and73m;Conditetal.,2002).Becausean importantpartof

theretrievedplantDNAoriginates fromthetreerootsystem,conflatingthedensityof

plant individuals with that measured for trees may be a reasonable assumption,

howeversuchestimatesforthedensityofindividualsaredifficulttoobtainintheother

Pure spatialMixedPure soil

0.0

0.1

0.2

0.3

0.4

Varia

nce

parti

tioni

ng (R

DA)

Bacter

ia 16

S

Protists

18S

Fungi

ITS

Fungi

18S

Plants

trnL

Plants

18S

Arthrop

ods 1

8S

Annelid

s 18S

Nemato

des 1

8S

Platyh

elmint

hes 1

8S

Insec

ts 18

S

Insec

ts 16

S

Page 96: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

92

taxonomicgroups.Thedispersaltodiversificationratio𝜎! 𝜈isdirectlymeasuredasthe

ratiobetween the intercept𝑏and theslope𝑎of the linear regressionof𝐹!against log-

distance(cf.Appendix).Lower𝜎! 𝜈in fungiand insectsreflects the lowmean levelof

similaritybetweensamplesinthesegroups,whichis,underaneutralmodel,indicative

offasterdiversificationthandispersal,whilethereversewouldholdtrueinhigher-𝜎! 𝜈

bacteriaandprotists(TableS6).

The challenge of measuring beta diversity is critical in conservation biology

(Koleff et al., 2003; Socolar et al., 2017), and today the vast majority of the lowland

tropicallandscapesarepartlydeforestedoratleastdegradedbyhumanactivities,with

directandmeasurableimpactonbiologicaldiversity(Barlowetal.,2016).Thetropical

forestsof FrenchGuianahave experienced low ratesof forest clearanceover thepast

decade (Hansen et al., 2013) and our sampling sites can therefore be considered as

undisturbed, and a baseline for the many studies focused on disturbed landscapes.

Hence, in our study, the processes shaping community assembly are unlikely to be

ascribed to human factors. We acknowledge that humans may have had previously

unnoticed impactsonbiodiversityespeciallyoncultivatedplants (Heckenbergeretal.,

2008) or earthworms (Marichal et al., 2010), however the great majority of our

undisturbedsitesarelocatedfarfrompresentorhistoricallocationsofdisturbancesand

wearethereforefairlyconfidentthatthepatternswehaveuncoveredarecontingenton

natural processes. However, to better quantify the possible magnitude of human

disturbances,wealsostudiedhowbetadiversityisalteredbyintensiveloggingandby

clear-cutting,atsiteswheretheforesthashadatleast18yearstorecover.Differencesin

vegetationareeasilynoticeableonthefield,andareindeedreflectedinourDNA-based

study.Thisanalysis,althoughlimitedinthenumberofsamples,alsoshowsaneffectof

pastloggingactivitiesonannelidsandtoalesserextentonfungi;howeverlittleimpact

ontheothercomponentsofsoilbiodiversitycanbedetected.

The current study is predicated on our assumption that DNA-basedmetrics of

betadiversitydocapturethesameecologicalprocessesasclassicones.Wedidfindthat

our data capture most of the diversity present in our soil samples, as indicated by

rarefaction analyses (Fig. S1).We also testedwhether our resultsweredependent on

the choice of the DNA barcode, by comparing the results obtained for the same

taxonomic groupwith twodistinctDNAbarcodes (Tables S4, S5, Fig. S4, S5). Inmost

Page 97: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

93

cases, the results appear robust to the choice of the DNA barcode, even though we

detected,asexpected,moresignalinthespecificbarcodesforplants, insectsandfungi

than in the generic 18S barcode of lower taxonomic resolution. Overall, althoughwe

emphasize that current DNA-based inventories do not always capture the same

taxonomicgrain as classic surveys, this approachhas the advantageofbeing scalable,

anditshouldthusbeappropriateforrapidbiodiversityinventories,especiallyinfragile,

orthreatenedecosystems.

Page 98: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

94

Acknowledgements

WethankMaximeRéjou-Méchainforfruitfuldiscussions.Thisworkhasbenefitedfrom

“Investissement d’Avenir” grants managed by the French Agence Nationale de la

Recherche (CEBA, ref. ANR-10-LABX-25-01 and TULIP, ref. ANR-10-LABX-0041;

ANAEE-France:ANR-11-INBS-0001),anadditionalANRgrant(METABARproject;PIP.

Taberlet), and funds fromCNRS.Work has been carried out at the CNRSNouragues

Research Station, within the Nouragues Natural Reserve, and at the CIRAD Paracou

ResearchStation.Wethankthemanagersofbothresearchstations.

Page 99: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

95

References

Bahram,M.,Koljalg,U.,Courty,P.-E.,Diedhiou,A.G.,Kjoller,R.,Polme,S.,Ryberg,M.,Veldre,V.&Tedersoo,L.(2013)Thedistancedecayofsimilarityincommunitiesofectomycorrhizalfungiindifferentecosystemsandscales.JournalofEcology,101,1335–1344.

Barlow, J., Lennox,G.D.,Ferreira, J.,Berenguer,E.,Lees,A.C.,Nally,R.M.,Thomson, J.R.,Ferraz,S.F. de B., Louzada, J., Oliveira, V.H.F., Parry, L., Solar, R.R. de C., Vieira, I.C.G., Aragão,L.E.O.C., Begotti, R.A., Braga, R.F., Cardoso, T.M., Jr, R.C. de O., Jr, C.M.S., Moura, N.G.,Nunes, S.S., Siqueira, J.V., Pardini, R., Silveira, J.M., Vaz-de-Mello, F.Z., Veiga, R.C.S.,Venturieri,A.&Gardner,T.A. (2016)Anthropogenicdisturbance in tropical forestscandoublebiodiversitylossfromdeforestation.Nature.

Baselga, A., Gómez-Rodríguez, C. & Lobo, J.M. (2012) Historical legacies in world amphibiandiversity revealed by the turnover and nestedness components of beta diversity.PLoSOne,7,e32341.

Basset,Y.,Cizek,L.,Cuénoud,P.,Didham,R.K.,Guilhaumon,F.,Missa,O.,Novotny,V.,Ødegaard,F.,Roslin,T.,Schmidl,J.,Tishechkin,A.K.,Winchester,N.N.,Roubik,D.W.,Aberlenc,H.-P.,Bail, J., Barrios, H., Bridle, J.R., Castaño-Meneses, G., Corbara, B., Curletti, G., Duarte daRocha,W.,DeBakker,D.,Delabie, J.H.C.,Dejean,A.,Fagan,L.L.,Floren,A.,Kitching,R.L.,Medianero,E.,Miller,S.E.,GamadeOliveira,E.,Orivel,J.,Pollet,M.,Rapp,M.,Ribeiro,S.P.,Roisin, Y., Schmidt, J.B., Sørensen, L. & Leponce, M. (2012) Arthropod Diversity in aTropicalForest.Science,338,1481–1484.

Blanchet, F.G., Legendre, P. & Borcard, D. (2008) Forward selection of explanatory variables.Ecology,89,2623–2632.

Bongers, F., Charles-Dominique, P., Forget, P.-M.&Théry,M. (2001)Nouragues:dynamicsandplant-animalinteractionsinaNeotropicalrainforest,SpringerScience&BusinessMedia.

Borcard, D. & Legendre, P. (2002) All-scale spatial analysis of ecological data by means ofprincipalcoordinatesofneighbourmatrices.EcologicalModelling,153,51–68.

Borcard,D.,Legendre,P.,Avois-Jacquet,C.&Tuomisto,H.(2004)Dissectingthespatialstructureofecologicaldataatmultiplescales.Ecology,85,1826–1832.

Borcard,D.,Legendre,P.&Drapeau,P.(1992)Partiallingoutthespatialcomponentofecologicalvariation.Ecology,73,1045–1055.

Boyer,F.,Mercier,C.,Bonin,A.,LeBras,Y.,Taberlet,P.&Coissac,E.(2016)OBITOOLS:aUNIX-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16,176–182.

Chao, A., Chiu, C.-H. & Hsieh, T.C. (2012) Proposing a resolution to debates on diversitypartitioning.Ecology,93,2037–2051.

Chave, J. & Leigh, E.G. (2002) A spatially explicit neutral model of beta-diversity in tropicalforests.TheoreticalPopulationBiology,62,153–168.

Clarke, L.J., Soubrier, J., Weyrich, L.S. & Cooper, A. (2014) Environmental metabarcodes forinsects: in silicoPCRrevealspotential for taxonomicbias.MolecularEcologyResources,14,1160–1170.

Condit, R., Pitman, N., Leigh, E.G., Chave, J., Terborgh, J., Foster, R.B., Nunez, P., Aguilar, S.,Valencia,R.,Villa,G.,Muller-Landau,H.C.,Losos,E.&Hubbell,S.P.(2002)Beta-diversityintropicalforesttrees.Science,295,666–669.

Cottenie, K. (2005) Integrating environmental and spatial processes in ecological community

Page 100: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

96

dynamics.EcologyLetters,8,1175–1182.Decaëns, T., Porco, D., James, S.W., Brown, G.G., Chassany, V., Dubs, F., Dupont, L., Lapied, E.,

Rougerie,R.&Rossi,J.-P.(2016)DNAbarcodingrevealsdiversitypatternsofearthwormcommunities inremotetropical forestsofFrenchGuiana.SoilBiologyandBiochemistry,92,171–183.

Fliegerova,K.,Tapio,I.,Bonin,A.,Mrazek,J.,Callegari,M.L.,Bani,P.,Bayat,A.,Vilkki,J.,Kopečný,J.&Shingfield,K.J.(2014)EffectofDNAextractionandsamplepreservationmethodonrumenbacterialpopulation.Anaerobe,29,80–84.

Gaston,K.&Blackburn,T.(2008)Patternandprocessinmacroecology,JohnWiley&Sons.Gilbert, B. & Lechowicz, M.J. (2004) Neutrality, niches, and dispersal in a temperate forest

understory. Proceedings of the National Academy of Sciences of the United States ofAmerica,101,7651–7656.

Gourlet-Fleury, S., Ferry,B.,Molino, J.-F.,Petronelli,P.&Schmitt,L. (2004)Experimentalplots:keyfeatures,Elsevier.

Grau,O.,Peñuelas,J.,Ferry,B.,Freycon,V.,Blanc,L.,Desprez,M.,Baraloto,C.,Chave,J.,Descroix,L.&Dourdain,A.(2017)Nutrient-cyclingmechanismsotherthanthedirectabsorptionfromsoilmaycontrol foreststructureanddynamics inpoorAmazoniansoils.ScientificReports,7,45017.

Guardiola,M.,Uriz,M.J.,Taberlet,P.,Coissac,E.,Wangensteen,O.S.&Turon,X.(2015)Deep-sea,deep-sequencing:metabarcodingextracellularDNAfromsedimentsofmarinecanyons.PloSone,10,e0139633.

Hansen,M.C., Potapov, P.V., Moore, R., Hancher,M., Turubanova, S.A., Tyukavina, A., Thau, D.,Stehman,S.V.,Goetz,S.J.,Loveland,T.R.,Kommareddy,A.,Egorov,A.,Chini,L.,Justice,C.O.&Townshend, J.R.G. (2013)High-ResolutionGlobalMapsof21st-CenturyForestCoverChange.Science,342,850–853.

Harrison,S.,Ross,S.J.&Lawton, J.H. (1992)BetaDiversityonGeographicGradients inBritain.JournalofAnimalEcology,61,151–158.

Heckenberger,M.J.,Russell,J.C.,Fausto,C.,Toney,J.R.,Schmidt,M.J.,Pereira,E.,Franchetto,B.&Kuikuro,A.(2008)Pre-ColumbianUrbanism,AnthropogenicLandscapes,andtheFutureoftheAmazon.Science,321,1214.

Hortal,J.,Diniz-Filho,J.A.F.,Bini,L.M.,Rodríguez,M.Á.,Baselga,A.,Nogués-Bravo,D.,Rangel,T.F.,Hawkins,B.A.&Lobo,J.M.(2011)Iceageclimate,evolutionaryconstraintsanddiversitypatternsofEuropeandungbeetles.EcologyLetters,14,741–748.

Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32),PrincetonUniversityPress.

Hubbell,S.P. (2013)Tropical rain forestconservationand the twinchallengesofdiversityandrarity.EcologyandEvolution,3,3263–3274.

Koleff,P.,Gaston,K.J.&Lennon,J.J.(2003)Measuringbetadiversityforpresence–absencedata.JournalofAnimalEcology,72,367–382.

Kreft,H.&Jetz,W.(2010)Aframeworkfordelineatingbiogeographicalregionsbasedonspeciesdistributions.JournalofBiogeography,37,2029–2053.

Lawton,J.H.,Bignell,D.E.,Bolton,B.,Bloemers,G.F.,Eggleton,P.,Hammond,P.M.,Hodda,M.,Holt,R.D., Larsen, T.B.&Mawdsley,N.A. (1998)Biodiversity inventories, indicator taxa andeffectsofhabitatmodificationintropicalforest.Nature,391,72–76.

Legendre, P., Borcard, D. & Peres-Neto, P.R. (2005) Analyzing beta diversity: partitioning thespatialvariationofcommunitycompositiondata.EcologicalMonographs.

Page 101: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

97

Legendre,P.&Legendre,L.(2012)NumericalEcology,Elsevier.Leibold, M.A., Holyoak, M., Mouquet, N., Amarasekare, P., Chase, J.M., Hoopes, M.F., Holt, R.D.,

Shurin, J.B., Law, R., Tilman,D., Loreau,M.&Gonzalez, A. (2004) Themetacommunityconcept:aframeworkformulti-scalecommunityecology.EcologyLetters,7,601–613.

Mariadassou,M.,Pichon,S.&Ebert,D.(2015)Microbialecosystemsaredominatedbyspecialisttaxa.EcologyLetters,18,974–982.

Marichal,R.,Martinez,A.F.,Praxedes,C.,Ruiz,D.,Carvajal,A.F.,Oszwald,J.,delPilarHurtado,M.,Brown, G.G., Grimaldi, M., Desjardins, T. & others (2010) Invasion of Pontoscolexcorethrurus (Glossoscolecidae, Oligochaeta) in landscapes of the Amazoniandeforestationarc.AppliedSoilEcology,46,443–449.

Martiny,J.B.H.,Eisen,J.A.,Penn,K.,Allison,S.D.&Horner-Devine,M.C.(2011)Driversofbacterialbeta-diversitydependonspatialscale.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,108,7850–7854.

Nekola, J.C.&White,P.S. (1999)Thedistancedecayofsimilarity inbiogeographyandecology.JournalofBiogeography,26,867–878.

Novotny, V., Miller, S.E., Hulcr, J., Drew, R.A.I., Basset, Y., Janda, M., Setliff, G.P., Darrow, K.,Stewart, A.J.A., Auga, J., Isua, B., Molem, K., Manumbor, M., Tamtiai, E., Mogia, M. &Weiblen, G.D. (2007) Low beta diversity of herbivorous insects in tropical forests.Nature,448,692–695.

Pace,N.R.(1997)Amolecularviewofmicrobialdiversityandthebiosphere.Science,276,734–740.

Ramirez,K.S., Leff, J.W.,Barberan,A., Bates, S.T., Betley, J., Crowther,T.W.,Kelly, E.F.,Oldfield,E.E., Shaw, E.A., Steenbock, C., Bradford, M.A., Wall, D.H. & Fierer, N. (2014)Biogeographic patterns in below-grounddiversity inNewYork City’s Central Park aresimilartothoseobservedglobally.ProceedingsoftheRoyalSocietyB-BiologicalSciences,281,9.

Rosenzweig,M.L.(1995)Speciesdiversityinspaceandtime,CambridgeUniversityPress.Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European Physical

JournalSpecialTopics,178,13–23.Schuldt, A., Wubet, T., Buscot, F., Staab, M., Assmann, T., Böhnke-Kammerlander, M., Both, S.,

Erfmeier,A.,Klein,A.-M.,Ma,K.,Pietsch,K.,Schultze,S.,Wirth,C.,Zhang,J.,Zumstein,P.&Bruelheide, H. (2015)Multitrophic diversity in a biodiverse forest is highly nonlinearacrossspatialscales.NatureCommunications.

Siles, J.A. & Margesin, R. (2016) Abundance and Diversity of Bacterial, Archaeal, and FungalCommunitiesAlonganAltitudinalGradientinAlpineForestSoils:WhatAretheDrivingFactors?MicrobialEcology,72,207–220.

Smith,T.W.&Lundholm,J.T.(2010)Variationpartitioningasatooltodistinguishbetweennicheandneutralprocesses.Ecography,33,648–655.

Socolar, J.B.,Gilroy, J.J.,Kunin,W.E.&Edwards,D.P. (2017)HowShouldBeta-Diversity InformBiodiversityConservation?TrendsinEcology&Evolution,31,67–80.

Soininen,J.,McDonald,R.&Hillebrand,H.(2007)Thedistancedecayofsimilarityinecologicalcommunities.Ecography,30,3–12.

ter Steege, H., Nigel, C.A., Sabatier, D., Baraloto, C., Salomao, R.P., Guevara, J.E., Phillips, O.L.,Castilho, C.V.,Magnusson,W.E.,Molino, J.F.,Monteagudo,A., Vargas, P.N.,Montero, J.C.,Feldpausch, T.R., Coronado, E.N.H., Killeen, T.J., Mostacedo, B., Vasquez, R., Assis, R.L.,Terborgh, J.,Wittmann,F.,Andrade,A.,Laurance,W.F.,Laurance,S.G.W.,Marimon,B.S.,

Page 102: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

98

Marimon, B.H., Vieira, I.C.G., Amaral, I.L., Brienen, R., Castellanos, H., Lopez, D.C.,Duivenvoorden, J.F.,Mogollon, H.F.,Matos, F.D.D., Davila, N., Garcia-Villacorta, R., Diaz,P.R.S., Costa, F., Emilio, T., Levis, C., Schietti, J., Souza, P., Alonso, A., Dallmeier, F.,Montoya,A.J.D.,Piedade,M.T.F.,Araujo-Murakami,A.,Arroyo,L.,Gribel,R., Fine,P.V.A.,Peres,C.A.,Toledo,M.,Gerardo,A.A.C.,Baker,T.R.,Ceron,C.,Engel,J.,Henkel,T.W.,Maas,P., Petronelli, P., Stropp, J., Zartman, C.E., Daly, D., Neill, D., Silveira, M., Paredes, M.R.,Chave,J.,Lima,D.D.,Jorgensen,P.M.,Fuentes,A.,Schongart,J.,Valverde,F.C.,DiFiore,A.,Jimenez, E.M., Mora, M.C.P., Phillips, J.F., Rivas, G., van Andel, T.R., von Hildebrand, P.,Hoffman,B.,Zent,E.L.,Malhi,Y.,Prieto,A.,Rudas,A.,Ruschell,A.R.,Silva,N.,Vos,V.,Zent,S.,Oliveira,A.A.,Schutz,A.C.,Gonzales,T.,Nascimento,M.T.,Ramirez-Angulo,H.,Sierra,R.,Tirado,M.,Medina,M.N.U.,vanderHeijden,G.,Vela,C.I.A.,Torre,E.V.,Vriesendorp,C.,Wang,O.,Young,K.R.,Baider,C.,Balslev,H.,Ferreira,C.,Mesones, I.,Torres-Lezama,A.,Giraldo, L.E.U., Zagt, R., Alexiades, M.N., Hernandez, L., Huamantupa-Chuquimaco, I.,Milliken,W.,Cuenca,W.P.,Pauletto,D.,Sandoval,E.V.,Gamarra,L.V.,Dexter,K.G.,Feeley,K., Lopez-Gonzalez, G. & Silman,M.R. (2013) Hyperdominance in the Amazonian TreeFlora.Science,342,325–+.

Taberlet,P.,Coissac,E.,Hajibabaei,M.&Rieseberg,L.H.(2012)EnvironmentalDNA.MolecularEcology,21,1789–1793.

Taberlet,P.,Gielly,L.,Pautou,G.&Bouvet,J.(1991)Universalprimersforamplificationofthreenon-codingregionsofchloroplastDNA.Plantmolecularbiology,17,1105–1109.

Thompson, R.&Townsend, C. (2006)A trucewith neutral theory: local deterministic factors,speciestraitsanddispersallimitationtogetherdeterminepatternsofdiversityinstreaminvertebrates.JournalofAnimalEcology,75,476–484.

Vincent, J.B., Weiblen, G.D. & May, G. (2016) Host associations and beta diversity of fungalendophytecommunitiesinNewGuinearainforesttrees.MolecularEcology,25,825–841.

Whittaker,R.H.(1972)EvolutionandMeasurementofSpeciesDiversity.Taxon,21,213–251.Whittaker,R.H.(1960)VegetationoftheSiskiyouMountains,OregonandCalifornia.Ecological

Monographs,30,279–338.Yu,D.W., Ji, Y.Q.,Emerson,B.C.,Wang,X.Y., Ye,C.X., Yang,C.Y.&Ding,Z.L. (2012)Biodiversity

soup:metabarcodingofarthropodsforrapidbiodiversityassessmentandbiomonitoring.MethodsinEcologyandEvolution,3,613–623.

Zinger, L., Chave, J., Coissac, E., Iribar, A., Louisanna, E., Manzi, S., Schilling, V., Schimann, H.,Sommeria-Klein,G.&Taberlet,P.(2016)ExtracellularDNAextractionisafast,cheapandreliable alternative for multi-taxa surveys based on soil DNA. Soil Biology andBiochemistry,96,16–19.

Zinger, L., Taberlet, P., Schimann, H., Bonin, A., Boyer, F., De Barba, M., Gaucher, P., Gielly, L.,Giguet-Covex,C.,Iribar,A.,Réjou-Méchain,M.,Rayé,G.,Rioux,D.,Schilling,V.,Tymen,B.,Viers,J.,Zouiten,C.,Thuiller,W.,Coissac,E.&Chave,J.(2017)Soilcommunityassemblyvariesacrossbodysizesinatropicalforest.BioRxiv.

Page 103: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

99

SupplementaryInformation

#OTUs #Reads

PlantstrnL 776 5,142,400

Plants18S 71 366,646

Bacteria16S 11,380 3,863,620

Protists18S 295 240,223

FungiITS 4,312 2,151,746

Fungi18S 386 832,153

Arthropods18S 342 463,057

Insects16S 3,497 1,331,880

Insects18S 70 185,446

Annelids18S 18 145,044

Nematodes18S 81 10,672

Platyhelminthes18S 32 15,619

TableS1:NumberofOTUsandreadcountpertaxonomicgroup.

Page 104: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

100

pH Ctot Ntot P2O5 Clay Silt Sand Al Fe Mg Mn K Ca

Unit None (g/kg) (mg/kg) (%) (%) (%) (cmol+/kg)

Inselbergsummit 4.9 30.5 1.8 <5.0 27.5 4.6 67.9 2.1 0.081 0.27 0.011 0.097 0.08

PP-F21 4.9 29.1 1.9 5.8 33.6 4.0 62.6 2.6 0.068 0.13 0.011 0.088 0.09

PP-H20 4.3 34.3 2.1 7.3 48.1 4.5 47.4 2.5 0.144 0.46 0.018 0.127 0.26

PP-H21 4.8 31.1 2.2 6.0 51.0 4.3 44.7 2.3 0.063 0.37 0.017 0.082 0.25

GP-L11 5.0 35.7 3.0 5.3 73.3 12.8 13.9 1.2 0.006 0.67 0.125 0.113 1.48

GP-L12 4.6 37.0 3.1 7.3 71.8 16.8 11.4 1.6 0.006 0.43 0.215 0.112 0.75

GP-O13 4.3 32.6 2.1 6.8 71.7 12.8 15.6 3.5 0.030 0.51 0.070 0.114 0.75

GP-Liana 5.2 27.6 2.6 7.8 52.0 30.4 17.6 0.4 0.005 1.65 0.252 0.143 3.64

Balanfois-1 3.9 41.8 2.9 <5.0 78.7 7.6 13.8 3.5 0.041 0.22 0.028 0.084 0.35

Balanfois-2 3.9 40.7 2.9 <5.0 79.6 6.0 14.4 3.3 0.046 0.31 0.025 0.087 0.27

Parare-5 4.4 35.9 2.6 10.3 64.5 12.3 23.3 2.8 0.056 0.59 0.020 0.114 0.21

Parare-6 4.0 38.1 2.5 11.3 55.5 19.2 25.4 4.0 0.058 0.24 0.019 0.116 0.16

Paracou-06.3 5.0 19.2 1.2 6.5 13.4 7.2 79.4 1.2 0.043 0.22 <0.010 0.078 0.10

Paracou-06.4 4.9 20.1 1.3 8.3 12.2 7.0 80.9 1.2 0.035 0.16 <0.010 0.058 0.11

Paracou-11.1 5.0 20.1 1.2 6.8 16.3 8.0 75.7 1.6 0.053 0.23 <0.010 0.080 0.14

Paracou-12.1 4.6 27.8 1.7 <5.0 27.0 7.6 65.5 2.3 0.124 0.24 <0.010 0.080 0.07

Paracou-12.2 4.6 20.6 1.2 7.3 16.5 6.5 77.1 1.9 0.113 0.21 <0.010 0.079 0.16

Arbocel-7.3 4.6 30.7 1.9 7.0 22.0 10.0 68.0 1.6 0.143 0.32 <0.010 0.085 0.13

Arbocel-7.4 4.6 30.3 1.9 <5.0 24.6 10.7 64.7 1.7 0.147 0.29 <0.010 0.076 0.16

Table S2:Mean soil variables in all nineteen 1-ha plots.Eachvalue is theaverageof fourseparatemeasurements, eachmade on twenty pooled soil samples. Al, Fe,Mg,Mn, K, and Caconcentrationsareexpressedincmolofpositivechargesperkg.Valuesinitalicscorrespondtodisturbedplots.

Page 105: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

101

pH Ctot Ntot P2O5 Cly Silt Al Fe Mg Mn K Ca

pH 1 -0.59 -0.41 -0.02 -0.53 0.09 -0.82 -0.17 0.27 0.19 -0.01 0.32

Ctot 1 0.91 -0.04 0.74 0.11 0.60 -0.03 0.05 0.08 0.33 -0.01

Ntot 1 -0.01 0.83 0.36 0.35 -0.31 0.33 0.41 0.47 0.31

P2O5 1 -0.01 0.27 0.08 -0.13 0.12 0.14 0.30 0.07

Clay 1 0.24 0.48 -0.45 0.27 0.41 0.51 0.29

Silt 1 -0.22 -0.39 0.76 0.67 0.52 0.76

Al 1 0.15 -0.38 -0.38 0.04 -0.45

Fe 1 -0.34 -0.56 -0.23 -0.48

Mg 1 0.70 0.60 0.91

Mn 1 0.56 0.81

K 1 0.59

Ca 1

VIF 5.035.9 44.8 1.5 12.5 4.58.33.3 7.2 5.0 3.2 10.8

TableS3:Correlationcoefficientsbetweensoilvariablesinthefifteenundisturbedplots.Bold font indicates correlation coefficients above 0.70. Variance Inflation Factors (VIF) arecomputedasthediagonalelementsoftheinversecorrelationmatrix.

Page 106: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

102

Geographicaldistance Soil

Mean𝐷!"#$%&$%

𝑟𝒅𝒊𝒔𝒕 𝑟𝒅𝒊𝒔𝒕,𝒑𝒂𝒓𝒕 slope𝒅𝒊𝒔𝒕 𝑟!"#$ 𝑟𝒔𝒐𝒊𝒍,𝒑𝒂𝒓𝒕 slope𝒔𝒐𝒊𝒍

PlantstrnL 0.42 0.65*** 0.61*** 0.038 0.29*** 0.06 0.011

Plants18S 0.50 0.22** 0.15* 0.022 0.24*** 0.17* 0.016

FungiITS 0.87 0.43*** 0.29*** 0.029 0.54*** 0.45*** 0.025

Fungi18S 0.45 0.31*** 0.20* 0.022 0.39*** 0.31*** 0.019

Insects16S 0.89 0.23*** 0.16** 0.013 0.25*** 0.18** 0.010

Insects18S 0.57 0.07 0.05 0.008 0.06 0.03 0.005

TableS4:Linearregressionoftaxonomicdissimilarity(Sorensenindex)againstsoilandgeographicaldistance:comparisonbetweenbarcodeswithintaxonomicgroups(cf.Table2). 𝑟!"#$ ,𝑟!"#$ ,𝑟!"#$,!"#$ ,𝑟!"#$,!"#$are the simple and partial Pearson’s correlation coefficients.SignificancewasassessedusingManteltests:***forp<0.001;**for0.001<p<0.01;*for0.01<p<0.05.

Puresoilfraction Mixedfraction Purespatial

fractionTotalexplained

variance

PlantstrnL 2.4*** 7.8 11.0*** 21.1***

Plants18S 1.4 6.8 15.1*** 23.3***

FungiITS 3.8*** 4.9 5.9*** 14.5***

Fungi18S 4.1*** 15.2 11.8*** 31.2***

Insects16S 0.1 1.3 1.5** 2.9***

Insects18S NA NA NA NA

Table S5: Fractionsof variance (adjustedR2, in%)explainedbyCanonicalRedundancyAnalysis for environment-only and spatial-onlymodels: comparison between barcodeswithintaxonomicgroups(cf.Table3).Significance:***forp<0.001;**forp<0.01;*forp<0.05.

Page 107: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

103

𝑅 1 𝑎(×10!) 𝑏/𝑎 log!" 𝜎! 𝜈

PlantstrnL 0.26*** 0.55 24 -20

Plants18S 0.16** 0.24 33 -28

Bacteria16S 0.22*** 6.8 48 -42

Protists18S 0.33*** 0.075 46 -40

FungiITS 0.14*** 2.3 13 -11

Fungi18S 0.23*** 0.76 31 -26

Arthropods18S 0.07 0.71 43 -37

Insects16S 0.08** 1.1 15 -13

Insects18S 0.09 0.11 33 -28

Annelids18S 0.25*** 0.061 30 -26

Nematodes18S 0.06 1.0 53 -46

Platyhelminthes18S 0.10 0.13 34 -30

Table S6: Fitting the neutral prediction for the decay of taxonomic similarity withdistance(Chave&Leigh,2002).𝐹! 𝐴,𝐵 = 𝑝!!𝑝!!!

!!! ,where𝑝!!istheproportionofspeciessin sample A and𝑝!! that in sample B, is regressed against the log-transformed geographicaldistance r between samples (expressed in meters), as𝐹! 𝑟 = 𝑎 ln 𝑟 + 𝑏:𝑅is the correlationcoefficient;***,**and*denotethesignificanceassessedbyManteltest(𝑝 < 0.001,𝑝 < 0.01and𝑝 < 0.05,respectively);and𝜎! 𝜈(expressedinsquare meters)istheratiobetweenthevariance𝜎!ofthedispersalkernelandtheneutralspeciationprobability𝜈.

Page 108: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

104

Loggingonly

Pureloggingfraction

Mixedfraction

Puresoilfraction

Totalexplainedvariance

PlantstrnL 12.9*** 4.0*** 9.0 3.5** 16.4***Plants18S 10.9*** 3.1* 7.8 2.3 13.1***Bacteria16S 11.9*** 1.9* 10.0 18.2*** 30.1***

Protists18S 4.8** 1.1 3.6 4.8** 9.6**FungiITS 4.3*** 1.6*** 2.6 5.2*** 9.4***Fungi18S 7.6*** 4.7*** 2.9 8.6*** 16***Arthropods18S 1.7 NA NA NA NAInsects16S 1.6* NA NA NA NAInsects18S 3.4 NA NA NA NAAnnelids18S 6.0* 6.4* -0.3 5.4* 11.4**Nematodes18S 1.9* NA NA NA NAPlatyhelminthes18S 0.6 NA NA NA NA

TableS7:Fractionsofvariance(adjustedR2,in%)explainedbyCanonicalRedundancyAnalysisforloggingintensityandforsoilconditions.Significance:***forp<0.001;**forp<0.01;*forp<0.05.

Page 109: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

105

Taxonomicgroup Selectedspatialvariables

Bacteria16S UTMN.Nouragues UTME.Nouragues Nouragues.MEM.1 Nouragues.MEM.5

Protists18S UTMN.Nouragues UTME.Nouragues Nouragues.MEM.1 Paracou.MEM.1PlantstrnL UTMN.Nouragues UTME.Nouragues Nouragues.MEM.1 Nouragues.MEM.5 Paracou.Nouragues UTME.Paracou Plants18S UTMN.Nouragues UTME.Nouragues Nouragues.MEM.1 Nouragues.MEM.4 Nouragues.MEM.5 Nouragues.MEM.8 Nouragues.MEM.12 Nouragues.MEM.15 Nouragues.MEM.16 FungiITS UTMN.Nouragues UTME.Nouragues Nouragues.MEM.1 Fungi18S UTMN.Nouragues UTME.Nouragues Nouragues.MEM.1 Arthropods18S UTMN.Nouragues UTME.Nouragues Annelids18S UTMN.Nouragues Nouragues.MEM.1 Paracou.MEM.3 Nematodes18S UTMN.Nouragues Platyhelminthes18S Noselectedmodel Insects16S UTME.Nouragues Paracou.Nouragues Nouragues.MEM.2 Nouragues.MEM.11 Nouragues.MEM.15 Insects18S Noselectedmodel

TableS8:Selectedspatialmodelsafterforwardvariableselection.Selectionisappliedonthe following variables: UTM coordinates in Nouragues and Paracou (‘UTMN.Nouragues’,‘UTME. Nouragues’, ‘UTMN.Paracou’, ‘UTME.Paracou’), the dummy variable connectingNouragues and Paracou sites (‘Paracou.Nouragues’), and PCNM variables in Nouragues(‘Nouragues.MEM.1’ to ‘Nouragues.MEM.17’) and Paracou (‘Paracou.MEM.1’ to‘Paracou.MEM.7’),whichrepresentdifferentpossiblepatternsofspatialautocorrelation.

Page 110: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

106

Figure S1: Rarefaction analyses. In each sample and for each barcode, we sampled withreplacementbetween1and8,000reads,andplotted thecorrespondingnumberofOTUs(onecurvepersampleandperbarcode).

Eukaryotes18S

PlantstrnL FungiITS

Bacteria16S Insects16S

Page 111: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

107

Figure S2: Occurrence-based (Sorensen) dissimilarity as a function of log-distance;comparisonbetweenbarcodeswithintaxonomicgroups(cf.Fig.4).Theredlinefiguresthelinearregression.

●●

●●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●●●

●●

●●●

●●

●●

●●

● ●

●●

●●●●●

●●●●

●●

●●●

●●●●

●●

●●

●●

●●●●●●

●●●

●●●●●

●●●

●●●●●●●● ●

●●●●

●●●

●●●

●●●

●●●●●●●●●

●●

●●●●

●●

●●

●●

●●●●●

●●●●

●●

● ●●●

●●●

● ●

●●

●●●

●●●●

● ●

●●●●

●●

●●

●●●

●●

●●●●●●●

●●●

●●●●●

●●●●●●●●●●

●●●

●●

●●●●

●●●●●

●●●

●●●●●●●●

●●●●●

●●●

●●●●●

●●

●●●

●●●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●

●●●

●●●

●●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

● ●●

●●●

●●

●●

●●●●

●●

●●●●●

●●●

●●●●

●●

●●●●

●●●●

●●

●●●●

●●

●●●●

●●●

●●●

●●

●●

●●●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●●●●●

● ●

●●●

●●●

●●

●●●

●●

●●

●●●●●

●●

●●●

●●●

●●●

●●

●●

●●●●●

●●

●●●●

●●

●●

●●● ●●

●●●

●●●●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●●

●●●●

●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

● ●

●●

●●●

●●

●●

●●●●

● ●●●

●●●

●●●●●●

●●●

●●

●●

●●

●●

●●●●●●●

●●

●●

●●●

●●

●●

●●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●●

●●●

●●●●●●●●

●●●●

●●●

●●

●●●

●●●

●●●

●●●●

●●●●

●●

●●●

●●●

●●

●●

●●●

●●●●

●●●●

●●

●●

●●●

●●●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

● ●●

●●●●●

●●

●●●●●●●●●●●

●●

●●●●

●●

●●●

●●●●

●●

●●●●●●

●●

● ●

●●

●●

●●●●●●

●●

●●●

●●

●● ●●

●●●●●

●●

● ●

● ●

●●

●●●

●●●

●●●●●

●●

●●●●

●●

●●●●●

●●●

●●●

●●

●●

●●

●●●

●●●●●

● ●

●●

●●●●●

●●●●●

●●●●

●●●

●●●

●●●●

●●

●●●●

●●

●●

●●●

●●

●●●

●●●

●●●

●●●

●●●●●●

●●●●●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●●●●

●●●●

●●

●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Insects 16S

●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●●

● ●

●●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●●

● ●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●●

●●

●●●

●●

●●

●●●

●●●

●●

●● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●●

●●

● ●

●●

●●

● ●

●●●●

● ●

●●

● ●

●●●

● ●●●

●●

●●

●●

●●

●●●

● ●

●●

● ●

●●

●●

●●

●● ●

● ●

●●

●●

●●

●●●

●●●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●●●●

●● ●

●●

●●

●●

●●

●●

●●

● ●

● ●●●

●●

● ●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●●

●●●

●●

●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Insects 18S

●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●●●●●●

●●

●●

●●●●●

●●●

● ● ●

●●

●●●●

●●

●●

●●

●●

●●

●●●●●●●

●●●●

●●●●●●●●●

●●●

●●

●●●●

●●

●●

●●

●●●

●●

●●

●●●●●●●●●

●●

●●●●●●●●●●●●●

● ●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●●●●

●●●●●●●●

●●●●

●●●●

●●

●●

●●

●●●●

●●●

●●●

●●●●●

●●

●●●

●●●●●

●●●

●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●● ●●

●●●●●●●●●

●●●●

●●●

●●●

●●

●●

●●●

●●●●●

●●●

● ●●●●●●●●●●●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●●●●●●●

●●

● ●

●●●●●●●●●●●

● ●●

●●

●●

●●●●

●●

●●●●

●● ●●●●

●●●●●●

●●

●●

●●●●●

●●

● ●

●●

●●●

●●●

●●●●

●●●

●●●●●●●●●●●

●●●●●●

●●●

●●

●●

●●●

●●●

●●

●●●

●●●●

●●●●●●●●●●●

●●●●●●

●●●

●●

●●

●●●

●●

●●●●●●

●●●

●●●●●●●

●●●●

●●

●●

●●

●●

● ●

●●

●●●●

●●

●●●

●●●●●●●

●●

●●

●●●

●●●●●● ●

●●

●●●

●●

●● ●

●●●●●

●●●●●

●●●●●●●●●●●●●

●●●

●●●

●●

●●●●

●●●●●●●●●

●●●●●●●●●●●●●●●

●●

●●

●●

●●

●●●

●●●●●

●●

●●●●●●●

●●●●●

●● ●

●●

●●●●●

●●

●● ●

●●●●●●●●

● ●●●●

●●●

●●●●● ● ●●

●●●

●●

●●

●●●●●●●●●●

●●●●

●●●●●●●●●●●

●●

●●

●●

●●

●●

●●●●●●●

●●●●

●●

●●●●●●●●●

●●●

●●

●●●●●●●●●●

●●●

●●●●

●●●●

●●●●

●●

●●●●●●●●

●●●

● ●●

●●●●

●●●

●●

●●●●●

●●●

●●●●●●●●

●●

●●●●●●●●●●●●●●●●

●●●● ●

●●● ●●●●●●●●●

●●

● ●●●●●●●●●●●●

● ●

●●

●●

●●

●●●●●●

●●●●●●

●●●●●● ● ●

●● ●

●●● ●●●●●●●●●

●●●

●●●●●●●●●●●●

●●

●●●●

●●●

●●●●●

●●

●●●

●●●

●●●●●

●●

●●

●●

●●●●

●●●●

●●●

●●●●●●●●●●●●

●●●

●●

●●●●●●●

●●●

●●●

●●●●●●

●●●

●●●●●●●●

●●●

●●●●●●●●●●●●

●●●

●●●

●●●●●●●

●●●●●●●●●●●●

●●

●●●●●●●

●●

● ●●●

●●●●●●●●

●●

●●●

●●●●●

●●●

●●●●

●●●●

●●●●

●●●

●●●●●●

●●

●●

●●

●●●●●

●●

●●●●

●●

●●

●●●

●●●

●●●●●

●● ●

●●●

●●●

●●●●●●

●●

●●

●●

●●

●●

●●●●●●

● ●

●●

● ●

●●

●●●●●●●

●●

●●

●●●●●●●

●●●

●●

●●●●●●

●●

●●●●

●●

●●●●

●●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●●●●

● ●

●●●

●●●

●●●●●●

●●●●●●●●●●●●

●●●●

●●●●●●●●●

●●●●●●●

●●

●●

●●●●

●●

●●●

●●

●●●●

●●●●

●●●●

●●●●

● ●●●

●●

●●●●

●●

●●

●●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Fungi ITS

●● ● ●

●●

●●

●●

●●●●●

●●●●●

●●

●●

●●

●●●●

●●

●●●●

●●

●●

●●

● ●●

●●●

●●●

●●●●●

●●●

●●

●●

●●●●

●●

●●●●

●●

●●

●●●

●●●●●●●

●●

●●

●●

●●●●●

●●●●

●●

●●

●●

●●●

●●●●

●●●●

●●

●●●●

●●●●●●●●●●●

●●

●●●

●●

●●●

●●●

●●●●

●●

●●●●

●●

●●●

●●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●●●●

●●●

●●●

●●●●

●●●

●●●

●●

●●●

●●●●●

●●●●

●●●

●●●●●●●●●

●●●●●●●●

●●●

●●●

●●

●●

● ●

●●

●●

●●●

●●●●●

●●

●●●●

●●●●

●●●

●●

●●●

●●

●●

●●●

●●●●

●●

●●●●●●●

●●

●●

●●●●

●●●

●●

●●●●

●●●●

●●●●

● ●

●●●●●

●●●●

●●●●●

●●●

●●●●

●●

●●●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●●●●●

●●

●●

●●

●●●●

●●●●●

●●

●●

●●

●●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●●

●●

●●●●●

●●●●

●●●

●●●

●●●

●●●●

●●

●●

● ●

●●●

●●●

●●●

●●

●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●●●

●●

●●●

●●

●●

●●●

●●●●●

●●●

●●

●●

● ●●

●●

●●

●●

●●

●● ●●●

●●

●●●

●●

●●●

●●

● ●

●●●

●●●

●●●●

●●

●●

●●●●

●●●

●●●

●●

●●

●●

●●●

●●●

●●●

●●●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●●●

● ●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●●

●●

●●●●●

●●

●●

●●●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●● ●

●●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●●

●●●

●●

●●●

● ●●●

●●●

●●●

●●●

●●

●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

● ●●●

●●

●●●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●●

●●

●●●●

●●●

●●●

●●

●●

●●

●●●

●●●●

●●

●●●●●●●

● ●●

●●

●●

●●●●●

●●●

● ●

●●

●●

●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●●

●●

●●●

● ●●

●●

●●●

●●

● ●●

● ●●●●●●

●●●●●

●●●●●●

●●

●●

● ●

●●

●●

●●●

●●●●●

●●

●●●

●●●●●●●●●

●●

●●

●●●●

●●

●●●●

● ●●

●●

●●

●●

●●●●●●

●●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●●

●●●●●●

●●

●●●●

●●

● ●●

●●

● ●

●●

●●●●

●●

●●●

●●

●●

●●

● ●

●●

●●●●

●● ●●

●●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Fungi 18S

●●

●● ●●●●

●●●●

●●●●●●●● ●●●

●●●●

● ●

●●●

●●

●●

●●●●

●●

●●

●●

● ●

●● ●

●●●

●●

●●

●●●●

●●●●

●●●

●●

●●●

●●●

●●●●

●●●●

●●●

●●

●●

●●

●●

●● ●●●●

●●●●●●

●●

●●●● ●

●●●

●●

●●●● ●

●●

●●

●●●●●●●

●●●

●●

●●●

● ●

●●

●●●●

●●

●●

●●●●

●●●● ●●

●●

●●

●●●●●●

●●

●●●

●●●

●●

●●

●●

●●

●●●●●

●●

●●●●

●●●●

●●●●●

●●●

●●●●●

●●●●●●●●

●●

●●●●●

●●

●●

●●●●●

●●●●

●●●●●●●●

●●●●●

●●

●●

●●●

●●●●●●

●●●

●●

●●

●●

●●●●●

●●●

●●●●●●●●● ●

●●

●●●●●●●●●

●●●●●

●●●

●●●●●

●●●●

●●●●

●●●●●●●●

●●●●●

●●

●●●●●●

●●●●●

●●●●●

●●●●

●●

●●●

●●●

●●●

●●

●●●

●●●●●●●●

●●

●●

●●

●●●●●●●●

●●

●●●●

●●

●●

●●●

●●

●●●

●●●●●●

●●

●●●● ●

●●●●

●●

●●●●

●●●●

●●●

●●●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●●●●●

●●●●

●●

●●

●●●

●●

●●●●

●●●●

●●

●●

●●

●●●

●●●●

●●

●●●●

●●

●●

●●

●●

●●

● ●●●●●●●●

●●●●

●●

●●

●●●

●●●●●

●●●●

●●

●●●

● ●●●●●●●

●● ●

●●●

●●

●●

●●

●●●

●●●

●●●●

●●

●●

●●

●●

●●●●●●●

●●

● ●●●

●●

●●

●● ●●

●●

●●●●●

●●

●●

●●●

●●

●● ●●●

●●

●●●

●●●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●●●●

●●●●●●●●●

●●

●●

●●

● ●

●●

●●●●

●●

●●

●●●●●●●●●

●●●

●●

●●●●

●●

●●

● ●

●●

●●

●●●●

●●●●●

●●●●●●●

●●●●

●●●

●●

●●

● ●●●

●●

●●

●●●●● ●

●●●

●●●

●●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●●●●●●●●

●●

●●

●●

●●

● ●

●●●●●●●●●●●●

●●

●●●

●●●●●

●●

●●

●●

●●

●●

●●●●●●

●●

●●

●●●●●●●

●●

●●

●●

●●●

●●●

●●

●●

●●●

●●●●●●●

●●

●●

●●

●●

● ●●

●●●●

●●

●●●●●

●●●

●● ●●

●●

●●●

●●

●●●● ●

●●●●●●●

●●

●●

● ● ●

●●●●

●●

●●●●

●●●

●●●●

●●

●●

●●●

●●●

●●●●●

●●

●●

●●

●●

●●●● ●

●●●●●●

●●●

●●●

●●

● ●

●●●● ●●●●●●●

●●

●●

●●

●●

●●

●●●●

●●●●●●

●●

●●

●●●

●●

●●●●●

●●

●● ●●●

●●●●

●●●●●●

●●

●●

●●

●●●●●●●

●●

●●

●●●●

●●●

●●●

● ●

●●●●●●●●

●●●●

●●

●●

●●

●●

● ●

●●●●●●●

●●●

●●

●●●●

●●●●

●●

●●

●●

● ●●

●●

●●

●●●

● ● ●●●

●●●●

●●●●

●●●●

●●

●● ●

●●

●●●

●●

●●

●●●

●●

●●

●●●

●●

●● ●

●●

●●

●●

● ●

●●●

●●●

●●

●●●

●●●

● ●●

●●

●●●●

●●

●●●

●●

●●●●

●●

●●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

● ●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Plants trnL

●●

●●●

●●

●●

●●●●

● ●

●●

●●●●

●●

●●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●

●●●●●●

● ●●

●●

●●

●●

●●●●●

●●

●●

●●●●

●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

● ●●

●●●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●●●

●●

●●

●●●

●●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●●

● ●●

●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●

●●

●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●●

●●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Plants 18S

Sore

nsen

dis

sim

ilarit

y

Log10 of geographical distance (m)

Page 112: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

108

Figure S3: Occurrence-based (Sorensen) dissimilarity as a function of log-distance;comparisonbetweenbarcodeswithintaxonomicgroups(cf.Fig.5).Theredlinefiguresthelinearregression.

Sore

nsen

dis

sim

ilarit

y

Soil dissimilarity

●●

● ●● ● ●

●●

●●●

●●

●●

●●

●●

● ●

● ● ●●

● ●

●●

●●

●●

●●

● ●

●●

●●●● ●

●● ● ●

●●

●● ●

● ●●

●●

● ●

● ●

●●●● ● ●

●●●

●● ●

●●●

● ●

●●●●

●●● ●●●

●●●●

●●

● ● ●

●●●

●● ●

●●●●

●●

●●

● ● ●●

●●

●●

●●

●●●●

● ●● ●

●●

●●● ●

●●

●●

● ●

●●

● ●●●

●●

●●●●

● ●

●●

● ●●

● ●

●●●●●●

●●●

●●●●●

●●

●●● ●

●●● ●

●●

●●

●●●

● ● ● ●●●● ●

●● ● ●●

●●●

●●

● ●●

●●●

●●

● ●●

●●

●● ●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

● ●

●●

●●

●●

●●

●●

●●

● ●●

●● ●

●●

● ●

●●

● ●

●●

●●

●●● ●

●●●

●●

●●

● ●●

●●

● ●

●●

●●●

●●

●●● ●●

●●

●●●●

●●

● ●●●

●● ●

●●

●●● ●

●●

●● ●●

●●●

●●●

●●

●●

● ●●●

●●

●●

● ●●

●●

●●

●●

●●

●●

● ●●

●●

●●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●●

●●● ●●●●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●●

●●●

● ●

●●

● ● ● ●●

●●

●●

●●

●●

● ●

●●● ●●

●●

● ●●

●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●●

●●

● ●●

●●

●●

●●

● ●

●● ●

●●

● ●

● ●

●●

●●

●●

●●

●●

● ●

●●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●●●

●●●

● ●●●●●

● ● ●

●●

● ●

●●

●●

●●●

● ●●●

●●

●●

●●●

●●

●●

●● ●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●● ●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●● ●

●●

●● ●●

●●●●

● ●

●●●

●●●

●●

●●

●●●

●●● ●

●●●●

●●

●●

● ●●

●●●

●●

●●

● ●●

●●

● ●

●●

●●

●●

●●●● ●

●●

●●

● ●●

●●

●● ●

● ●

●●●

●●

● ●● ●●●

●●

●●●

●●

● ●●

●●●

●●

●●●● ●

●●

● ●

●●

●●

●●● ●●

●●

● ●●

●●

●●●●

●●●●

●●

●●

● ●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

● ●●

●●●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

●●● ●●

●●

● ●●

●● ●● ●

●●

●●

● ●●●

● ●

●●● ●

●●

● ●

●●●

●●

●●●

●●●

● ●●

●●●

●●●

●●

●●

● ●●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●●●

●●

●●●

●●

●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Insects 16S

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●

● ●

● ●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●●

●●●

●●

● ●

●● ●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

● ●

●●

● ●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

● ● ●

●●

●●

●●●

●●

●●

●● ●

●●

● ●

●●

● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●● ●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

● ● ●

● ● ●●

● ●

●●

●●

●●

● ●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ● ●

●●

●●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●● ●●

● ●

●●

●●

●●

●● ●

●●

●●●

●●

● ●

●●

●●

●●

●●

● ●

●●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●●

●●

●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Insects 18S

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ● ● ●● ● ●●

●●

● ●

●●●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●●● ●

● ●

● ●●

●●● ●

●●● ●●

●●

●●

● ● ●●

●●

●●

●●

●●●

●●

●●

● ●●●● ● ●● ●

● ●

●●●

●●●

● ● ●●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

● ●●● ●

● ●

● ●●

●●

●●

●●●● ● ●

●●●

●●

● ●●

●●

● ●

●●

● ●●●

● ● ●

● ●●

● ●●

●●

●●

●●●

●●●

●●

●●

●●

●●●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●●●

●●● ●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●●

●●

● ●●

●●● ●●●●

●●●

●●●

●●

●●

●●

●●

●●

● ●●

● ●

●●

● ●●●

● ●●

● ●

●●

●●●●

●●

●●●●●

● ●●

●●

●●

●●●

●●

●●

●●

●●● ●●●

●●

●● ● ●●●

● ●

●●

● ●●

●●

●●

●●

●●●

●●

● ● ● ●

●●●

●●● ● ●●●●

●●

●●

●●●

●●●

● ●

●●

●●●

●●

● ●

●●

● ●●●

●● ● ●●

●● ● ● ●●

●●●

●● ●

●●

●●

●●

●●

●●

● ● ● ●●●

●●

●●

●●●

● ●●●

●● ●●

●●

●●

●●

●●

● ●

●●

●● ●

●●

●●●

●● ●● ●

●●●

●●

●● ●

●●●●●

● ●●

●●●

● ●

●●●

●●● ●●

●● ● ●●

●●

● ●●●

● ●●● ●●

● ●●

●●

● ●

●●

●●●

●●

●● ● ●● ●

● ●●

●● ●●●●

● ●●●●●

● ●

●●

●●

●●

●●

●●

●● ●● ●

● ●●●

● ● ●

●●●●●

● ● ●

●●

●●●

●●

● ●

●●●

●●

●● ●●

● ●●

●●● ●●

●●

●●●●

● ●● ●

●●

● ●

●●

●●● ●●●

● ●●●

●●

●●

● ● ●●● ● ●● ●●

●●

●●

●●

●●

●●

● ●●● ●●

●● ● ●

●●

● ●

●●●

● ●● ●●

●●●

●●

●● ●

●● ● ●● ● ●

●●●

●● ● ●

●●●●

●●●

●●

● ●●

● ● ● ●●●

●●

●●●

●●●●

●●

●●

● ●●

● ●

●●

●●

● ● ●●●●

●●

●●●●●

●● ● ●●

●●● ●●●

●●

●● ●●●● ● ●● ● ●

●●

●●● ●

●●● ●●●●

● ●●●●●

●●

●●

●●

● ●

●●

● ●●●

●● ● ●●●

●●●●●●● ●

●● ●

● ●●● ●● ● ● ●●● ●

●●

●●● ●●●● ●

●●●●●

●●

●●

●●

● ●●

●●

●●●

● ●

●●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●● ●●●●

● ●●●●●

●●

●●●

●●● ● ●

●●●

●●

●●●

● ●●●●

●●●

●●

●●● ●●●

● ● ●

●●● ●●●● ●●●●●

● ● ●

●●

● ● ●●●

●●

●●

●●●

●●●

●●●●

●●

● ●●●

●● ●

● ●

●●●●

●● ● ●●●●●

●●

● ●●

●● ● ●

●●

●●

●●

●●

●●●

●●

●●

● ●●

●● ●●

● ●●

●●

●●

● ●●●●

●●

● ● ●●

●●

●●

●●●

●●●

●●●●

●● ●

● ●●

●●●

●● ●●

●●

●●

●●

● ●

●●

●●

●●● ●●●

● ●

●●

● ●

●●

●●●

●●●●

●●

●●

● ●●● ●●

●●

●●

● ●● ●●

●●

●● ●

●●

●● ●●

●●●

●●

●●

●●

●●●●

●●●

● ●

●●

● ●●

●●●

● ●

●●●

●●●

●●●●

●●

●●●

●●●●●●●

●●

●●●

●● ●

●●●●●●

●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

● ●●

● ●●●

●●●●

●●●●

●●

●●●

●●

●●

●●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Fungi ITS

●●● ●

●●

●●

●●

●● ●●

●●●● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

● ●●

●●

● ●●●

●●

● ●

●●

●●

● ●

●●

●●●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●●●

●●●●

●●

●●

● ●

●●

●●●

●●●●

●●

●●

●●

●●

●●●

● ●●●

●●

●●

● ●●

●●

● ●●

●● ●

●●

●●

●●

●●●●

●●

● ● ●

●●●

●●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●●●●

● ●●

●●●

●●

●●

● ●●

●●●

●●

●● ●

●●

●●●

●●

●●

●●●

●● ●

●●● ●

●●

●●

●● ●● ●●

●●●

●●

● ●

●●

●●

● ●

● ●

●●

●●

●●●●●

●●

●●●

●● ●●

●● ●

●●

● ●●

●●

●●

●●●

●●●●

●●

●● ●

● ●●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●● ●●

●●●●

●●

●● ●● ●

●●●●

●●●● ●

●● ●

● ●●●

●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

● ●

● ●

●●●

● ●● ●

●●

●●

●●

●●

●●

●●

●●● ● ●

● ●

● ●

●●

●●

●● ●

●●

●●●●

●●

● ●

●●

● ● ●

●●

●●

●●●

●●●●

● ●

●●

●●●

●● ● ●

●●●

●●●

●●

● ●●●

●●

● ●

●●

●●●

●● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●●●

●●

● ●●

● ●

●●

● ●●

●● ●

●●

● ●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●

● ●

●●●

●●

● ●●

●●

●●

●●

●●●

●●

● ●

● ●

●●

●● ●●

●●●

●●

●●

●●

● ●

●●

●●

●●●

●● ●

● ●

●●

●● ●

●●

●●

●●●

●●

●●●

●●

●● ● ●

●●

●●

●●

●●

●●●●

● ●

●●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●●●

● ●

●●

●●

●●●●●

●●

● ●

●●● ●

●●

●●

●●

● ●●

●●

●●

●●●

● ●

●●

● ●

●● ●

●●

●●

●● ●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●●

●●

●●●

●●

●●●

● ● ●●

●● ●

●●●

●● ●

●●

●●

●●

●●

● ●●

● ●●●

●●

●● ●

●●

●●●

●●

●●

●●

● ●●

●●

●●

● ●●●

● ●

●●●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●● ●●

●●

● ● ●

● ●

●●

●●

● ●●

●● ●

●●

●● ●

●●●

● ●●

● ●

● ●

●●●●

●●●

●●

●●

●●

●● ●

●●

●●

●●●

●●

●●● ●

● ●

●●

●●

●●

●●●

● ●

●●●

● ● ●

●●

●●

●●

●●●

● ●●●

●● ●

●● ●●

●●●

●● ●

● ●

●●

● ●

●●

●●

●● ●

●●

●●●

●●

●●●

●●

● ● ●● ●●●

●●

●●

●●

●●

● ●

●●●●

● ●●

●●

●●

● ●

●●● ●● ●

● ●●●

●●

●●

●●

●●

●●

● ●

●●● ●

●●●

●●

●●

●●

●●●●

●● ● ●

●●

●●

●●

●●

●●

● ●●

●●

●●

● ●

●●● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Fungi 18S

●●

● ● ● ● ●●

●●● ●

●● ●

●●●●●● ●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●● ●

● ●●

●●

● ●

●●● ●

●●

● ●● ●

●●

●●●

●●

●●

●●●

●●●

●● ●●

●●

●●

●●

●●

● ● ● ● ●●

●●● ●●●

●●

●●● ●●

●● ●

●●

●● ●●●

●●

●●

● ●● ●● ●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●● ●●

●●●

●● ●●

●●

●●

●● ●

●●●

●●

●● ●

●● ●

●●

●●

●●

●●

●●

●●●●●

●●●●

●●● ●

● ●●

●●

●●

● ●●●●

●●● ●● ●● ●

●●

●●●●

●●

●●

●●

● ●●

●●●●

●● ●

●●

●● ●●

●●

● ●●

●●

●●

●●

● ●● ● ●●

●●

●●

●●

●●

●●

● ●●

●●

●●● ● ●

●●●

●●●

●●

●●●●●

●●

●●

●● ●● ●

●●

●●●●●

●● ●●

●●● ●

●●●

●●

●● ●● ●●

●●

●●

●●

●●

●●

●●● ●

●● ●● ●

●●● ●

●●

●●

●●●

●●●

●●

● ●●

●● ●●

●●● ●

● ●

● ●

●●

●●●● ● ● ●●

●●

●● ●

●●

●●

●●

●●

●●●

●●●

● ●●

● ●

● ●●

● ●

●●

● ●

●●

● ●●●

●● ●

●●●

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●

●●●

●●●

●●

●●● ●

●●

●●

●●

●● ●

● ●

●●●

●●●

● ●

●●

●●

●●

● ●●●

● ●

●●●●

●●

●●

●●

●●

● ●

● ●●●

● ●● ●●

● ●●●

●●

●●

● ●●

●●

●●

●●

● ●●

●●

●●

● ●●● ● ●● ●

●●●

●●●

● ●

● ●

●●

● ●●

●●●

●●

●●

●●

● ●

●●

●●

● ●●

●●● ●

● ●

●● ●●

● ●

●●

●●●●

●●

●● ● ●●

●●

●●

●●

●●

●● ● ●●

●●

●●

●●●

● ●

●●

●●

●● ●

● ●

●●

●●

●●

●● ● ●●

●● ●

●●

● ●● ●

●●

●●

●●

● ●

●●

●●●

●●

● ●

●●● ●●●

● ●●

●● ●

●●

●●

● ●

● ●

●●

● ●

●●

●●

●●●

●●●

●●● ● ●●● ●●

● ● ●●

●●

●●

●●

● ●●●

●●

●●

●● ● ●●●

●●

●●

●●

●●●●

● ●

●●

●●

●●

●●

●●

●●

●●● ● ●

●● ● ●●●

●●

●●

●●

●●

●●●

●● ●

●●

● ●●

●●

●●

● ●●

● ● ●●

●●

●●

● ●

●●

● ●

●● ● ●

●●

●●

●●

●● ● ●

● ● ●●

●●

● ●

●●●

●● ●

●●

●●

●●●

●● ●

● ●●●

● ●

●●

●●

●●

● ●●

●●

●●

●●

●●● ● ●

●● ●

●●●●

●●

●●●

●●

● ●●● ●

●●● ● ●

●●

●●

●●

●● ●

● ● ●●

●●

●●●

●●●

●●

●●

● ●

●●

●●●

●●

●●●

●●●

●●

●●

●●

●●

●● ●● ●

●●●

●●●

●●●

●●

●●

●●

● ●●● ● ●●

●● ●

● ●

●●

●●

●●

●●

●●●●

●●●●

●●

●●

●●

● ●●

●●

●●●●

●●

●●● ●●

●●●

●●●

●●●

●●

●●

●●

●●●

●● ●●

● ●

●●

●●● ●

●●

●●

●●

●●

●●● ● ●●

●●

●●●●

●●

●●

●●

● ●

●●● ●

●●●

● ●●

●●

●● ●●

●●●

●●

●●

●●

● ●●

●●

●●

● ●●

●● ●●●

●● ●

●●

●●●

●●●

●●

●●●

●●

●●●

● ●

● ●

●●●

● ●

●●

●●●

● ●

●●●

● ●

●●

●●

● ●

● ● ●

●●●

●●

●●●

●● ●

● ●●

●●

● ● ●●●●

●● ●

●●

●●●

●●

●●●

●●

●●

●●●●

●●

● ●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Plants trnL

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

● ●●●

●● ●

●●

●●●

● ●●

●●

●●

●●

●●● ●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

● ●

● ●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●

●●

●●●

● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●● ●

●●

●●

●● ●

●●

● ●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●

● ●

●●

●●

● ●

●●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●● ●

●●●

●●

●●

●●

●●

●●●

●●

● ●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

● ●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●● ●●

●●

●● ●

●●

● ●

● ●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

0 2 4 6 8

0.0

0.2

0.4

0.6

0.8

1.0

Plants 18S

Page 113: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

109

Figure S4: Proportion of variance explained by RDA for each soil variable. Only soilvariablesselectedbyforwardvariableselection(outofthesixinitialvariables)areshown.

AllPCA axis 4 FePCA axis 3 PPCA axis 2 Silt−nutrientsPCA axis 1 Clay−C−N−Al−pH

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Prop

ortio

n of

exp

lain

ed v

aria

nce

(RDA

)

Bacter

ia 16

S

Protists

18S

Fungi

ITS

Fungi

18S

Plants

trnL

Plants

18S

Arthrop

ods 1

8S

Annelid

s 18S

Nemato

des 1

8S

Platyh

elmint

hes 1

8S

Insec

ts 18

S

Insec

ts 16

S

Page 114: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

110

Figure S5: Testing the neutral prediction for the decay of taxonomic similarity withgeographical distance: F2 similarity as a function of log-distance.Red linedenotes linearregression.Notethaty-scalevariesacrosstaxonomicgroups.

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●●

●●●

●●●●

●●

●●●

●●

●●

●●●● ●

●●●

●●

●●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●●●

●●●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●●

●●●●

●●●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●●

●●●

●●●

●●

●●

●●

●●●●●

●●●●

● ●●●

●●

●●●●●●●●●

●●

●●●●

●●

●●●●

● ●●●

●●

●●

●●

●●

●●●●●

●●●●

●●

●●

●●

●●

●●

●●

● ●●●●

●●●●

●●

●●

●●●

●●

● ●●●●

●●●●

●●

●●●

● ●●●●

●●●●

●●

●●●

●●

● ●●●●

●●●●

●●

●●●

●●

● ●●●●

●●●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●●

●●●●

●●●●

●●

●●

●●

● ●●●●

●●

●●

●●

●●●

●●●●

●●●●

●●●

●●●●

●● ●

●●

●●

●●●

●●●●

● ● ●

●●

●●

●●●●

●●●●

●●●

●●●●

●●

●●

●●●

●●●●

●●●

●●●

●●

●●●

●●●●

●●

●●●

●●●●

●●●●

●●

●●●●

●●●●

●●

●●●●

●●

●●●●●●

●●

●●●●

●●●●

●●

●●●●

● ●

●●●●

●●●

●●●●

●●

●●●

●●●●

●●●

●●●

● ●●●

●●

2 3 4 5

0.00

20.

006

0.01

00.

014 Bacteria 16S

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

2 3 4 50.

20.

30.

40.

50.

60.

7

Protists 18S

● ●●●

●●

●●●●●●●●●●●●●● ●●●●●●●●

●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ● ● ●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●● ● ●

● ●●●●●●●●●●●●●● ●●●

●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●

●● ●●● ●●●●●

●●●●●●●● ●●●●●●

●●●

●●●●●●●●●●●●● ●●●●●●●●●●●●●

●●●

●●●●●●●●●●●●● ●●●

●●●●●

●●●●●●●●●●●●●●●● ●●●●●●●●●●

●●● ● ●●●●●●●●●●●●

●● ●●●●●●●

●●●●●●●●●●●●

●●●● ●●●●●●●●●●●●●

●●●●●●●●●●●●

● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●

●●●●●●●●●●●●●●● ●●●

●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●

●●●● ●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●

● ●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●● ●● ●●●●●● ●●●●●●●●

●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●

●●●●●● ●●●●●●

●●●

●●●●●●●●●●●●● ●●●●●●●●●●

●●

●●●●●●● ●●●

●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●

●●●

●●●● ●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●

●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●● ● ●● ●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●

●●● ●●●●●●●●

●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●

●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●

●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●

● ●

●● ●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●

●●●● ●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●● ● ●

●● ●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●

● ●●● ●●●● ●●●●●●●●●●●● ●●●●●●●●●●

●●

●● ●●●● ●●●●●●●●●●●● ●●●●●●●●●●

●●

● ● ●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●

●●

● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●

● ●● ●●●●●●●●●●●● ●●●●●●●●●●●●● ● ●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●

● ●

●●

●●● ●●●● ●●●●●●●●●●●●

●●

●●● ●●●● ●●●●●●●●●●

●●

●●

●●● ●●●● ●●●●●●●●●●●●

●●●●●● ●●●●●●●●●●●●

●●●● ●●●●●●●●●●●●

● ●

●●●● ●●●●●●●●●●●●

●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●

● ●●●●●●●●●●●●● ● ●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●● ●●

●●● ●

●●●● ● ● ●●● ●●●●● ● ●●● ●●●●● ●●● ●●●●● ●●●●●●

●● ●

●●●● ●●●●●●●●

● ●●●

●●

2 3 4 5

0.00

0.05

0.10

0.15

Fungi ITS

●●

●●

●●●

●●●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●

●●●

●●

● ● ●

● ●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●●

●●

●●

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●●●

●●●●

●●●●

●●

● ●

●●

●●●●●●●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●

●●●●●

●●●

●●●●

●●

●●●

●●●

●●

●●●

●●●●●●

●●●●

●●

●●●

●●

●●

●●

●●●

●●

●●●●●●●●

●●

●● ●

●●

●●

●●●●

●●●

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●●

●●●●

●●●

●●●

●●

●●

●●●

●●

●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●●

●●●●●

●●●

●●

●●●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●●●

●●

●●●

●●

●●

●●●

●●

●●●●●●

●●

●●

●●●●

●●●● ●

●●

●●●

●●●●

●●●

●●

●●●

●●●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●●

●●

●●●●

●●●●

● ●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●●

●●

●●●●

●●●

●●●

●●●

●●●●

●●●

●●●●

●●●●

●●●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

● ●

●●●

●●

●●

●●

●●●

●●

●●● ●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●●●

●●●

●●

●●

●●

●●●●

●●●●●

● ●●

●●

●●●●

●●●

●●

●●

● ●●

●●

●●●

●●●●

●●●

● ●

●●●●

●●

●●●●

●●●

●●

●●

●●●

●●●

●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●●●●

●●●●●

●●●●●

●●

●●●●●

●●●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

2 3 4 5

0.00

0.04

0.08

0.12

Plants trnL

● ●● ● ●

●●●●●●

●●

●●●●●

●●●● ●

●●●●●●●●●●●●●

●●●●●●

●●●●

●●●●

●●●

●●

●● ●

●●

●●●

●●●

●●

●●

●●

●●

● ● ●●● ●●●

●●●●●●●●●●●●● ●●●●●

●●●●

●●

●●●●●

●●

●●●●

●●●●

●●

●●●●

●●●●

●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●

●●●●●

●●●●

●●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●●●●●●

●●●●●●●●●●●●●●●●●●

●●●●

●●

●●●●●●●●

●●●●●●●●

●●●●●●●

● ●●●

●●

●●●●●●●●●●●●●●●

●●●

●●●●●●●

●●

●●

●●●●

●●●

●●

●●

●●●●●●●●●●●●●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●●

●●●

●●●●

●●●

●●

●●

●●

●●

●●●●●

●●●●●●

●●●●●●●●●●●● ●

●●●●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●● ●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●●●●

● ●●●

●●●●●

●● ●

●●●

●●

●●

●●●

●●

● ●●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

● ●

●●

●●

●●

●●

● ●●●

●●●●●

●●

●●

● ●●

●●

●●

●●

●●

●●●●

●●

●●

●●●●

●● ●

● ●●●

● ●●●

●●●●

●●●●

●●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●●

●●

● ●

●●

●●●

●●

● ●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●

●●

● ●●●

●●●●

●●●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●●●●●●●●●

●●

●●●

●●●●

●●●

●●●

●●

●●

● ●

●●

●●

●●

●● ●

●●

●●●●●●●●●●●●●●

●●

●●

●●●●

●●

●●●

●●

●●

●●●●●

●●●●●●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

2 3 4 5

0.0

0.1

0.2

0.3

0.4

Arthropods 18S

●●● ● ●●● ●●●●●●

●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●

●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●●●●● ● ●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●

●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●● ●●●●●

●●●●

●●●●●

●●

●●●

●●●●

●●●●●●

● ●●●●●●●●●●●●

● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●

●●●●●●●●●●●●●●●●●● ●

●●●●

●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●

●●●●●

●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●

●●●●●●●

●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●

●● ●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●

●●●●● ●●●●●

●●●

● ●●●

●●●●●●● ●●●●●●●

●●●●●●● ●●●●●●●● ●●●●●●●●

●● ●●

●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●

●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●●●●● ●● ● ●●● ●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●● ● ● ●●●

●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●●●●● ● ●●●

●●

●●●●●● ●●●●●●●●●

● ●●●●●●●●●●●●● ●●

●●●●●●●●●●

● ●●●●●●●

●●●

●●●●

●●●●●

●●

● ●

●●●●●

●●● ●●●●●●●●●●● ●●●

●●●●●●●●● ● ●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●● ●

●●●●●●●●● ●

●●●

●●●●● ●●●●●●●●●●●●●●

●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●

●●

●●● ●●● ●●●●●●●●●

● ●●●●●●●●●●●●● ●

● ●●● ●

●●●●●●●

● ●●●●●●●●●●●●● ● ●●● ●●● ●●●●●●●

●●●● ●●●●●●●●●●●●● ●●● ●●● ●●●●●●●

●●

●●●●●●●● ●● ●●● ●●●●●●●

● ●●●●●●●●●●●●●● ●●● ●●●

●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●

●●●●●●●● ●●●

●●●●●●●●●● ●●

●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●

●●●●●●●●●●

●●●●●●●●● ●●●●●●●●●

● ● ● ●●●

●●●

●●●●●●●●●●●●● ● ●●● ●●●

●●●●●●●●●●●●

● ●●● ●●●● ●●●●●●●●●●●●●●● ●●●● ●●●●●●●●●●●●

●● ●●●●

●●●●●●●●●● ●●●

●●●●●●●●●●●●●●●● ●●●●●●●●●●●●

●●

●●●

●●●

●●

●●●●●●●● ●●●●●●●●●●●●●

●●

●●●

●●

● ● ●●● ●

●●●● ●

●●●

●●●●● ● ●●● ●●●●● ●●● ●●●●● ●●●●

●●

●●●●● ●●

●●●●●● ●●●

●●

2 3 4 50.

00.

10.

20.

30.

4

Insects 16S

●●

●●●

●●

●●●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●●

●●●

●●●

●●

●●

●●

●●

●●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●●

●●●●

●●

●●

●●●●●

●●●

● ●

●●

●●

●●●●

●●●●

●●

●●●

●●●

●●●

●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●●

●●●

●●●

●●●●

●●

●●

●●

●●

●●●●

●●

●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●●●●●●●●●

●●●

● ●

●●

●●

●●

●●

●●

● ●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●●●

●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●●●

●●●●

●●

2 3 4 5

0.2

0.4

0.6

0.8

Annelids 18S

●●

●●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●●

● ●●●●●●

●●●●●

●●

● ●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●

●● ●

●●

●●●

●●

●●●

●●●●

●●

●●

●●

●●●

●●●

● ●

●●●

●●

●●

●●●●●

●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●●●

●●●

●●●

●●

●●●

●●●●

●●●●●

●●●

●●

●●●●

●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●●●●

●●●

●●

●●●●●●●●●●●●●

●●●●●●●●●

●●●●●●●●●●●●

●●●●●●●●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●●

●●

●●

●●● ●

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●● ●●●

●●●●

●●

●●

●●

●●●●

●●

●●●

●●●●●

●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●●●

●●●●●●

●●

●●●●●●

●●●● ●●●●

●●●

●●●

●●● ●

●●●●●●●●●

●●

●●●

●●●

●●

●●●

●●●●

●●

●●●

●●●●●

●●●

●●

●●●

●●

●●

●●●●●●

●●●●

●●

●●●●●●●●●

●●●

● ●

●●●●

●●

●●●

●●

●●●

●●●●

●●

●●●

●●

●●●●

● ●●●●

●●

●●

● ●

●●●●

● ●●●

●●

● ●●●●

●●

●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●●●

● ●●

● ●●●

●●

●●●

●●●●●

●●

●●●

● ●

●●

●●

●●

●●

●●●●●●

●●●●

●●●

●●●●●

●●●●

●●

●●●●

●●●

●●●●●●●●●

●●●

●●●●●

●●

●●●●●

●●●

●●●

●●

●●●●

●●●●●●●●

●●●

● ●●

●●

●●

●●

●● ●

●●●●

●●

●●

●●

●●

●●●

●●●●

●●●

●●●●

●●

●●●

●●

●●●

●●

●●●●

● ●

●●

●●

●●●●

●●

●●

● ●●●●

●●

●●●

●●

●●

●●

●●

●●

●●●●

● ●

●●

●●●

●●●

●●

●●●

●●●

●●

●●

●●●

● ● ●

●●

● ●●●

●●

●●●

●●

●●●

●●●●

●●●●●

●●

●●

●●●

●●

●●

●●

●●●●

●●●●●●

●●●

●●●

●●

●●

●●●● ●●●●●●●●●●●●

●●

●●

●●●●

●●●●●●

●●●

●●●●●●●

●●

●●●●●●●●

●●●●

●●

● ●●

●●●

●●

●●

●●

● ●

●●

●● ●●

●●

●●●

●●●●

●●●

●●

●●

●●

2 3 4 5

0.0

0.1

0.2

0.3

0.4

Nematodes 18S

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●●

●●

●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●●

●●●●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●●●●●●

●●

●●●

●●●●●●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

● ●●● ●●

● ●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●●●

●●●●

●●

●●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●●

●●

● ●

●●

●●●● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●

●●

●● ●●● ●

●● ●●●●●●●●●●●● ●●●●●●

●●

●●

●●●●

●●●●

●●

●●●

●●

●●●●

●●

●●●

●●

●●

●●●●●●●●

●●

●●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●

●●●

●●●●

●●

●●

●●●

●●●

●●

●●

●●

●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●●●●●●●●●●●● ●●●●●●●●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●●●

●●

●●●

●●

●●

● ●●●●●●●●●●●●

●●

●●●

●●

●●

●● ●

●●●●●●

●●

●●●

● ●

●●●●

●●

●●

●●

●●●

●●

●●

●●

2 3 4 5

0.0

0.2

0.4

0.6

0.8

Platyhelminthes 18S

F 2

Log10 of geographical distance (m)

Page 115: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

111

Appendix: Fitting the neutral prediction for the distance-decay of

similarity

Chave & Leigh (2002) derived an analytical prediction for the decay of taxonomic

similaritywith distance in a continuous spatially explicit version of Hubbell’s neutral

model of biodiversity, where individuals have spatial density𝜌and where dispersal

follows a radially symmetric Gaussian probability density

𝑃 𝑟 = 1 2𝜋𝜎! exp − 𝑟! 2𝜎! as a function of distance r. They predict that the

stationaryprobability𝐹! 𝑟 that two randomly selected individualsdistantof rbelong

tothesamespeciesdecreasesas:

𝐹! 𝑟 ≃2𝐾!

𝑟 2𝜈2𝜎

ln 1𝜈 + 2𝜌𝜋𝜎!

provided thatr is larger thanσ. Inourdataset, theminimalvalue takenby r is40m,

whichisapproximatelyequaltothemeandispersaldistancepergenerationfortropical

trees(Conditetal.,2002).Becausethemeandispersaldistancepergenerationis 2𝜎in

themodel, and because trees are likely to be the organismswith the largestσ in our

study,theassumptionthatrislargerthanσcanberegardedhereasreasonable.

The parameter𝜈 it is the speciation probability in the underlying neutral

dynamics,i.e.theprobabilityforanewlybornindividualtobelongtoanewspecies.This

parametercharacterizes theregionalspeciesdiversity foragivenpopulationsize.The

function𝐾! 𝑟 is themodifiedBessel function of the secondkind andof zeroth order,

thatcanbeapproximatedas𝐾! 𝑟 ≃ − ln 𝑟 2 − 𝛾if𝑟 ≪ 1,where𝛾isEuler’sconstant.

Because𝜈 ≪ 1, this approximation can be regarded as valid in our case where𝑟 =

𝑟 2𝜈 𝜎.Theprobability𝐹! 𝑟 thenbecomes:

𝐹! 𝑟 ≃ −2 ln 𝑟 2𝜈

2𝜎 + 2𝛾

ln 1𝜈 + 𝜌𝜋𝜎!

Page 116: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter1–DNA-basedBetaDiversity

112

Inempiricaldata,theprobability𝐹! 𝐴,𝐵 thatarandomlyselectedindividualin

site A belongs to the same species as a randomly selected individual in site B can be

measuredas𝐹! 𝐴,𝐵 = 𝑝!!𝑝!!!!!! ,where𝑝!!istheproportionofspeciessinsiteAand

𝑝!! thatinsiteB,andSisthetotalnumberofspeciesinbothsites(Chave&Leigh,2002).

We can thus compute the quantity𝐹! 𝐴,𝐵 for every pair of sampling points and

performedthelinearregression𝐹! = −𝑎 ln 𝑟 +𝑏,whereristhedistancebetweentwo

samplingpoints(inmeters).Byidentificationwiththemodel’sprediction,weobtain:

𝑏𝑎 = ln

2𝜈2𝜎 + 𝛾

1𝑎 = 𝜌𝜋𝜎! +

12 ln

1𝜈

The first equation provides the value of 𝜎 as a function of 𝜈 , 𝑎 and 𝑏 as

𝜎! 𝜈 = 1 2 exp − 𝑏 𝑎 + 𝛾 ,whilethesumofthetwoequationsprovidesthevalueof

𝜎 as a function of𝜌 ,𝑎 and𝑏 as the solution of:𝜌𝜋𝜎! − ln 2𝜎 + 𝑏 + 1 𝑎 + 𝛾 +

ln 2 2.

Page 117: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

113

Chapter2Inferring neutral biodiversity parametersusingenvironmentalDNAdatasets

GuilhemSommeria-Klein1,LucieZinger1,PierreTaberlet2,EricCoissac2,JérômeChave1

AspublishedinScientificReports,

Volume6,2016

1UniversitéToulouse3PaulSabatier,CNRS,UMR5174LaboratoireEvolutionetDiversitéBiologique,F-31062Toulouse,France.2UniversitéGrenobleAlpes,CNRS,UMR5553Laboratoired’EcologieAlpine,F-38000Grenoble,France.

Page 118: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

114

Chapteroutline

The distribution of species abundances has been one of themost intensively studied

patternsinecology,andtheuseofenvironmentalDNAcoulddramaticallyincreaseour

abilitytomeasureempiricalspeciesabundancedistributionsoverawiderangeoftaxa.

However, DNA-based abundance measurements are noisy and difficult to interpret

comparedtoclassicalcensusesofindividualorganisms.Thischapterdiscussestowhich

extent and under which conditions the whole species abundance distribution may

nevertheless remain informative. The bias on the estimates of Hubbell’s neutral

parameters is takenasameasureof this lossof information. Indeed,Hubbell’sneutral

theoryhasbeenthefirsttoproposearealisticquantitativepredictionforthispatternon

mechanistic grounds. Even though the underlying assumptions have been much

debated, this model remains fundamental as a null model against which non-neutral

effects can be contrasted. It also provides a characterization of species abundance

distributionsbasedontwoparameters,onemeasuringintrinsicdiversity,andtheother

measuringtheconnectivitybetweenlocalandregionalcommunitiesthroughmigration.

The problem is addressed by simulating several plausible sources of bias, based on

literatureandonassumptionsbackedbyabenchmarkdataset.

Page 119: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

115

Abstract

TheDNApresentintheenvironmentisauniqueandincreasinglyexploitedsourceof

information for conducting fast and standardized biodiversity assessments for any

type of organisms. The datasets resulting from these surveys are however rarely

compared to the quantitative predictions of biodiversity models. In this study, we

simulate neutral taxa-abundance datasets, and add simulated noise typical of DNA-

based biodiversity surveys. The resulting noisy taxa abundances are used to assess

whether the two parameters of Hubbell’s neutral theory of biodiversity can still be

estimated.We find thatparameters canbe inferredprovided thatPCRnoiseon taxa

abundances does not exceed a certain threshold. However, inference is seriously

biased by the presence of artifactual taxa. The uneven contribution of organisms to

environmental DNA owing to size differences and barcode copy number variability

doesnot impedeneutralparameter inference,providedthat thenumberofsequence

readsusedforinferenceissmallerthanthenumberofeffectivelysampledindividuals.

Hence, estimating neutral parameters from DNA-based taxa abundance patterns is

possible but requires some caution. In studies that include empirical noise

assessments,ourcomprehensivesimulationbenchmarkprovidesobjectivecriteriato

evaluatetherobustnessofneutralparameterinference.

Page 120: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

116

Page 121: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

117

Introduction

Theobservationofbiodiversitypatternssuchasthediversity,relativeabundanceand

spatialdistributionoforganismsunderpinsmuchofecological theory (Brown,1995;

Rosenzweig,1995;Hubbell,2001).Yetempiricalmeasurementsofthesepatternsare

noisy. In all cases, some taxa are countedmore effectively than others, and error is

generated bymisidentification. Amajor question iswhether this noise is significant

enough to undermine comparisons between empirical measurements and models

(Hilborn&Mangel,1997;Legendre&Legendre,2012).This issuehasrecently taken

on new significance following the advent of DNA-based biodiversity exploration

methods, which are developing fast and hold the promise of rapid, repeatable and

comprehensivebiodiversitymeasurements(Biketal.,2012;Taberletetal.,2012).Yet

they are also less direct than classic biodiversity surveys and entail poorly assessed

noisesources.Inthisstudy,weaskhowtheparameterestimatesofHubbell’sneutral

theory,oneofthemostprominentquantitativebiodiversitymodelsofthelastdecade

(Hubbell,2001;Etienne&Alonso,2007;Rosindelletal.,2012),areaffectedbynoisein

taxa-abundance datasets. We focus on the type of noise generated in DNA-based

surveys, and specifically in DNA metabarcoding surveys (see below; Taberlet et al.,

2012), currently the most popular method for environmental DNA analysis.

Nevertheless,ourresultscanapplymoregenerally.

DNAmetabarcodingisamulti-taxaextensionoftheDNA-basedidentificationof

singlespecimenfromtissuesamplesusingauniversalDNA-barcodesequence(Hebert

et al., 2003). It consists in amplifying a short DNA barcode by PCR from the DNA

extracted fromanenvironmental sample (e.g. soil,water,bulksampleoforganisms),

and sequencing the product by high-throughput sequencing. This method is not

restricted to the detection of known taxa and hence allows for comprehensive

biodiversity measurement. DNA metabarcoding was initially developed to study

bacterialcommunities(Giovannonietal.,1990;Huberetal.,2007;Roeschetal.,2007;

Zinger et al., 2012), but has since been extended to many other groups including

archaea(Schleperetal.,2005)andeukaryoticclades(e.g.plants,earthworms,insects,

Page 122: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

118

fungi;Bienertetal.,2012;Yoccozetal.,2012;Yuetal.,2012;Tedersooetal.,2014).It

ishencenowpossibletostudypatternsofdiversityacrossalldomainsoflife(Ramirez

etal., 2014;Tedersooetal., 2015).However,DNAmetabarcodingobservationshave

seldom been compared to the predictions of biodiversity models (Hubbell, 2001;

Ricklefs,2004).

Over the past decade, the neutral theory of biodiversity has represented a

significantadvanceininterpretingempiricalbiodiversitypatternswithinanecological

guild(Hubbell,2001;Etienne&Alonso,2007;Rosindelletal.,2012).Hubbell’sneutral

model is simple, easily generates biodiversity patterns, allows for exact maximum-

likelihood parameter inference from taxa-abundance distributions, and neutral

predictions on taxa-abundance distributions compare well with empirical surveys

(Etienne,2005;Etienne&Alonso,2005;Jabot&Chave,2009).InHubbell’smodel,sites

vacatedbythedeathofanindividualarereplacedbytheoffspringoflocalindividuals

orbyimmigrants.Birth,deathandimmigrationalloccurirrespectiveofthetaxonthe

organismbelongsto(neutralityhypothesis).Immigrantsaredrawnfromamuchlarger

(regional)poolofindividuals,andtheadditionofnewtaxaintheregionalpoolismade

possibleby(rare)speciationevents.Hubbell’smodelhastwoparameters:θdescribes

the taxon diversity of the regional pool, and m is the immigration rate from the

regionalpoolintothesampledcommunity(seeSupplementaryNote1).

ThepredictionsofHubbell’sneutralmodelhavesofarbeenprimarilycompared

tointegrativepatternsobtainedformacroorganismsusingclassiccensusdata,suchas

theabundancedistributionoftropicalforesttrees(Hubbell,2001).Somestudieshave

alsoappliedneutralmodelstoenvironmentalDNAdatatointerpretthecompositionof

microbial communities. Sloan et al. (2006, 2007) and (Woodcock et al., 2007)

Woodcock et al. (2007) developed a continuous approximation to Hubbell’s model

adapted to large-sized bacterial populations. They focused on estimating the rate of

immigration into the local community independentlyof assumptionson the regional

pool of taxa, by comparing taxa occurrence in multiple samples (Sloan et al., 2006;

Drakare & Liess, 2010; Ostman et al., 2010; Ayarza & Erijman, 2011; Roguet et al.,

2015) or by measuring the turnover of taxa over time (Ofiteru et al., 2010). The

composition of many microbial communities was found to be compatible with

Page 123: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

119

stochasticimmigrationoftaxaofequivalentfitnessfromaregionalpool,atoddswith

the classic assumption that deterministic niche sorting explains the assemblage of

microbial communities (Baas Becking, 1934; Fenchel & Finlay, 2004). Another

approach is tosimultaneouslyestimate thediversityand immigrationparametersby

fitting the taxa-abundance distribution, as it has been commonly done for classic

censuses ofmacroorganisms. Dumbrell etal.(2010) and Lee etal. (2013) did so on

fungal andbacterialDNAdatausingmaximum-likelihoodparameter inferencebased

ontheexactEtiennesamplingformulas(Etienne,2005,2007,2009),whileHarrisetal.

(2015)followedaBayesianapproachinspiredbythefieldofmachinelearning.

Most DNA-based studies comparing empirical abundance patterns to the

predictionsofneutralmodelshavebeenlimitedbythepoordetectabilityofraretaxa

owing to the methods used (Sanger sequencing, DGGE, t-RFLP, ARISA). High-

throughputsequencingnowallowsforimprovedsamplingandprovidesbetterquality

data.Nevertheless,metabarcodingdataarenotdirectlycomparablewithclassiccensus

dataowing tobothexperimental andbiological factors.First,bothPCRamplification

and sequencing produce artifacts. During the PCR amplification, DNA polymerase

makesmistakeswhenreplicatingDNAstrands,ataratethatdependsonenzymetypes.

DNA strands suffer further damage during the high-temperature denaturation step

(Pienaar et al., 2006; Quince et al., 2011; Degnan & Ochman, 2012). Furthermore,

Illuminasequencinggeneratesbetween10-3and10-2errorsperbasepair(Rossetal.,

2013). Clustering algorithms are used to cluster the reads displaying errors with

respect to the original sequence into a singleMolecularOperationalTaxonomicUnit

(MOTU; Sipos et al., 2010; Coissac et al., 2012; Mahe et al., 2014). While these

approaches strongly reduce the number of artifacts in the data, they do not exclude

artifactualMOTUs that aremore difficult to detect (e.g. chimerical fragments, highly

degraded sequences). Second, unbalanced PCR amplification and sequencing among

taxadistortstherelativeabundancesofMOTUs(Siposetal.,2007;Amendetal.,2010;

Airdetal.,2011;Nguyenetal.,2015).Third,relativeabundancesarefurtherbiasedby

noisesourcesinherenttotheuseofDNAbarcodes,suchasthestrongvariabilityofthe

barcode copy number among taxa (Kembel etal., 2012;Weber& Pawlowski, 2013).

Thisproblemisevenmoreseriousformulticellularorganismsbecausethereadcount

shouldalsodependoncellabundance.Abundancesarefurtherbiasedbythevariable

Page 124: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

120

rate of DNA release into the environment through excreted, sloughed or decaying

material(Andersenetal.,2012;Maruyamaetal.,2014;Klymusetal.,2015).

Inthispaper,weconductsimulationstoaddresshowthesourcesofuncertainty

mentionedabovemaydistortparameterestimatesinHubbell’sneutraltheory,andwe

discuss theconceptualdifferencesbetween individual-basedandenvironmentalDNA

approaches to themeasurement of biodiversity.We ask the following questions: 1)

whatistheeffectofartifactualMOTUsandabundancenoiseonestimatingtheneutral

diversity parameter? 2) Can we use the same approach for multicellular as for

unicellularorganisms?3)Whataretheeffectsofthedifferentnoisesourcesonneutral

parameterinferencewhenaccountingfordispersallimitation?

Page 125: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

121

Methods

SamplingfromHubbell’sneutralmodel1.

We generated samples of J individuals following the stationary taxa-abundance

distribution of Hubbell’s neutral model. The immigration from the regional pool of

diversityparameterθintothesampledcommunitycanbeeithercharacterizedbythe

immigrationratemorby thenormalized immigrationparameter𝐼 = !!!!

𝐽 − 1 that

doesnotdependonthesamplesizeJandisthusinvariantbysampling.If𝑚 ≪ 1,I is

approximatedbytheproductJm,noted𝑁!𝑚inSloanetal.(2006,2007).

Wefirstassumednodispersallimitation(i.e.𝑚 = 1).Wegeneratedasampleby

runningJtimesthefollowingalgorithmparameterizedbyθ:atstepj,drawindividual

j+1 from a new taxon with probability𝜃/(𝑗 +𝜃), or draw one of the j individuals

already present and add an individual j+1 of the same taxon. This algorithm, due to

Hoppe(1984),partitionsJindividualsintoarandomnumberToftaxaaccordingtothe

Ewensdistributionofparameterθ(Ewens,1972).

Wethengeneratedsamples fromadispersal-limitedneutralcommunityusing

thetwo-stepprocedureprovidedinEtienne(2005)whichpartitionsJindividualsinto

arandomnumberToftaxa.First,werunJtimesHoppe’salgorithmasdescribedabove

butwithparameterI,soastopartitiontheJindividualsintoAimmigratingancestors.

Second, we run A times the algorithm with parameter θ, so as to partition the A

immigrating ancestor into T taxa, thus taking into account the taxa-abundance

distributionintheregionalpool.Finally,weassignthe J individualstothetaxonomic

identityoftheirimmigratingancestor.

We generated samples ofJ =105 individuals.We explored a realistic range ofparametervalues:θin[1,500]andmin[0.001,1].

Page 126: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

122

SimulatingnoiseinDNAsequencereads:experimentalnoise2.

WesimulatedtheDNAmetabarcodingprocedurebysamplingNsequencereadsfrom

therelativetaxaabundancesoftheneutralmodel,possiblyaftermodifyingtherelative

abundancesaccordingtosimulatednoisesources(seebelow).Wepresenttheresults

obtained for the valueN =104 , a typical number of Illumina sequence reads for oneenvironmentalsample.

In order to test the effect of misidentification bias on neutral parameter

inference,weaddedartifactualMOTUstothedata,whilekeepingthenumberofreads

constant. We assumed that each true MOTU with a read abundance r generates a

random number of artifactual MOTUs, drawn from a multinomial distribution with

weight r. We added either singletons, or MOTUs with larger read abundances. We

obtainedanexampleof artifactualMOTUswith realistic abundance structure froma

benchmark experiment (see below and SupplementaryMethods). Drawing on these

empirical data,we simulated read abundances in the followingway: each artifactual

MOTUwas assumed to have an abundance of 1 read ifr <50 , or an abundance x ifr ≥50 ,wherexliesbetween1andr /50 withaprobabilitydensityp(x)=

1log(r/50)x .

Molecular experimental procedures introducebiases also in read abundances,

becausetheefficiencyofPCRamplificationandsequencingisvariableacrossMOTUs.

For instance, PCR amplification is less efficient if PCR priming sites differ from the

primersequence(Siposetal.,2007),orifthebarcodesequenceistoolongorGC-rich

(Airdetal.,2011).Asaresult,thereadabundancedistributionofMOTUsisnoisedwith

respect to theDNAbarcode abundancedistribution in the sample.Weassumed that

thenoise takes the formofa lognormallydistributedmultiplicativenoiseonrelative

abundances,withmean1andlogstandarddeviation𝜎!"#.Thischoiceisparsimonious

because this noise is predominantly due to PCR (Aird et al., 2011), and the

multiplicativeamplificationofDNAstrandsbyPCRgeneratesamultiplicativenoiseon

abundances. This multiplicative noise can be further assumed to result from the

product of random independent variables and thus to be lognormally distributedby

virtue of the central limit theorem. We tested the effect of noise intensity𝜎!"#on

Page 127: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

123

neutralparameterinference.Forcompleteness,wealsotestedtheeffectofanadditive

Gaussiannoiseofstandarddeviation𝜎!"" onMOTUsrelativeabundances,fordifferent

𝜎!"" values.Thistypeofnoisecanberegardedassimulatingthenoisegeneratedinthe

sequencingstep.

To illustrate our modelling choices with empirical data, we produced a

benchmark dataset obtained by mixing the DNA of 16 plant species in known

quantities.TheexperimentanditsresultsaredetailedintheSupplementaryMethods.

Afterfollowingstandarddatacurationprotocols,wefoundthatthedatasetcontained

33%ofartifactualMOTUsanddisplayedalognormallydistributedmultiplicativenoise

onrelativeabundancesoflogstandarddeviation𝜎!"# = 1.2.Wereportedthesevalues

onthefiguresasexamplesofrealisticnoiseintensities.

SimulatingnoiseinDNAsequencereads:‘biological’noise3.

Irrespective of experimental noise, variability in the number of barcode copies per

individualmaycausebiasintheinterpretationofreadabundances.Forbacteria(16S

rDNA)orprotists(18SrDNA),barcodecopynumbervariability innuclearDNAisan

importantcontributiontoabundancenoise(Kembeletal.,2012;Weber&Pawlowski,

2013):Kembeletal.(2012)foundthatthebarcodecopynumberofthe16SrDNAgene

follows a zero-truncated Poisson distribution of parameter𝜆 = 4across a range of

bacterial clades. Formulticellular eukaryotes, organellic barcodes are typically used,

andtheysimilarlydisplayvariablecopynumberspercellacrosstaxaandtissuetypes.

Toassessthisissue,wetestedhowazero-truncatedPoisson-distributedmultiplicative

noiseaffectsneutralparameter inference, forvariousvaluesof theparameterλ.The

intensity of this noise is measured by the coefficient of variation (i.e., standard

deviation over mean) of the zero-truncated Poisson distribution. Since it reaches a

maximumatλ =1.8 ,noiseintensityismaximalforthisvalue.Formulticellularorganisms,thevariabilityinthenumberofbarcodecopiesper

individual is further amplified because the number of cells may vary vastly across

individuals, owing to body-size differences. We simulated size differences between

Page 128: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

124

individuals followingasimpleandgenericapproach.As inO’Dwyeretal. (2009),we

assumedthatallindividuals,irrespectiveofthetaxontheybelongto,growinsizeover

timeataconstantrategfromaninitialnumberofcellsn0atbirth,anddieataconstant

rated. The stationary probability densitypind(n) of having a numbern of cells for arandomly chosen individual is given by the solution of the von Foerster equation

(O’Dwyeretal.,2009):𝑝!"# 𝑛 = !!𝑒!

!!(!!!!)(seeSupplementaryNote2).Weusedthis

distribution todrawanumbern of cellsbetweenn0 and infinity for each individual,

andmodifiedtheMOTUsrelativeabundancesaccordingly.Notethatwesimulatedsize

differencesbetweenindividualsandnotbetweentaxa,whichwouldhavebeenakinto

simulatingamultiplicativenoiseontaxaabundancesasabove.Wetestedtheeffecton

neutralparameterinferenceforarangeofvaluesof !!!!

+ 1,theratioofthemeancell

number!!+ 𝑛!dividedbytheinitialcellnumber𝑛!.Noiseintensityismeasuredbythe

coefficientofvariation1 (1+ !!𝑛!)oftheprobabilitydensity𝑝!"# 𝑛 .Itisboundedby

1 for !!!!

≫ 1, which corresponds to the case of taxa spanning large ranges of body

sizes,suchastreesorvertebrates.

Organismsmaybeentirely contained in theenvironmental sample if theyare

sufficiently small, orwhenDNA is extracted from amixture of directly sampled live

organisms,suchasinsectsfromalighttrap(bulksamples;Yuetal.,2012).However,in

most cases, only small fractions of these organisms are sampled (e.g. roots, pollen,

seeds, spores, faeces, and different secretion types), or even only extracellular DNA

resulting fromcelldeathandsubsequentdestructionofcellstructure(Levy-Boothet

al., 2007; Taberlet et al., 2012). Thus, the abundance distribution of environmental

DNAalsodependsonthekineticsofDNAreleaseanddegradationintheenvironment.

We assumed that this dynamics is fast with respect to changes in community

composition,sothatthe ‘stock’ofenvironmentalDNAis inasteadystate.Underthis

assumption, the rate of DNA release through the death of organisms is roughly

proportionaltothetotalnumberofcellsofthecurrentlylivingindividuals.Inaddition,

therateofenvironmentalDNAreleasebyalivingorganismreflectsitsmetabolicrate

and we assumed it to scale as the power 3/4 of body mass (or cell number), as

predicted by themetabolic theory of ecology61. DNA degradation ratewas assumed

Page 129: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

125

uniform across individuals. Even though we focus here on multicellular organisms,

unicellularorganismsdoexcreteDNAmaterialanddifferinmetabolicratesaswell.

Based on the assumptions of the previous paragraph, we simulated the

abundancedistributionofenvironmentalDNAasfollows.We(1)generatedaneutral

sampleofindividuals,(2)assignedanumberofcellsnbetweenn0andinfinitytoeach

individualasabove,(3)countedafirstcontributiondn ofeachindividualtothestock

of environmentalDNA,withd thedeath rate, (4) andcounteda secondcontribution

𝑟!𝑛!!of each individual to the stock of environmental DNA, with r0 the rate of DNA

releaseforahypotheticalone-cellindividual.Thus,environmentalDNAabundanceper

individual is proportional to𝑛 + !!!𝑛!!rather than n. We tested the effect on neutral

parameter inference by varying 𝑟! 𝑑 , the parameter controlling the relative

contributionoflivinganddeadorganismstoenvironmentalDNA.

Estimatingtheneutralmodelparametersfromthetaxa-abundance4.

distribution

We estimated the parameters of Hubbell’s neutral model by maximum-likelihood

inference from the simulated taxa-abundancedistribution foranumberof simulated

noise sources. To test the influence of noise,we compared the estimated parameter

valuesθ andI with the values of θ and I used to generate the initial samples ofindividuals. For each set of parameters and noise intensity, we generated 100

simulatedsamples.Wereportedthemeanandstandarddeviationoftherelativebiases

(𝜃 − 𝜃) 𝜃andlog!"(𝐼 𝐼)overthe100realizations.

In the absence of dispersal limitation, the Ewens distribution permits the

inference of θ by likelihood maximization. The maximum-likelihood estimator of θ,

hereafter referred to as theEwens estimator, is implicitly given by𝑇 = !!!!

!!!!!! as a

functionofthenumberToftaxaandthenumberJofindividuals(Ewens,1972).Inthe

dispersal-limitedcase,theEtiennedistributionprovidesanexactlikelihoodexpression

for the simultaneous inference of θ and I (Etienne, 2005), as implemented in the

Page 130: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

126

software Tetame (Jabot et al., 2008). As noted previously in the literature, the

likelihoodlandscapeoftheEtienneformulaoftendisplaystwolocalmaxima(Etienne

etal.,2006;Jabot&Chave,2009).Tofindthetrueparametervalues,wefirstestimated

θ using the Ewens estimator, and selected the local maximum with the θ estimate

closesttothevalueyieldedbytheEwensestimator.Priortotheseanalyses,wetested

the performances of both estimators on unbiased neutral data depending on

parametervaluesandsamplesize(seeSupplementaryNote3).

IntypicalenvironmentalDNAdata,thenumberJofindividualsinthesampleis

unknown.Asalreadydoneinpreviousstudies(Leeetal.,2013),weusedthenumber

of sequence reads as an effective number of individuals. This is possible owing to a

mathematicalpropertyoftheEwensandEtiennedistributions:bothdistributionsare

invariantbysamplingwithoutreplacement(Etienne&Alonso,2005),hencemaximum-

likelihood inference yields the same results on any random sample from the

community,andonanyrandomsubsamplefromaninitialsample(uptoapossiblebias

in the estimator). As a consequence, read abundances can be used for neutral

parameter inference, as long as the reads can be regarded as forming a subsample

withoutreplacementoftheinitialindividuals.Thisassumptionishowevernotalways

verified in empirical data (see Discussion). The invariance property of Etienne

distributiononlyholdsifthedistributionisexpressedasafunctionofI,thereforewe

usedheretheimmigrationparameterIinsteadofmforthepurposeofinference.Inthe

following,malwaysreferstothevalueintheinitialsampleofJindividuals.

Intheabsenceofdispersallimitation,θcanalsobeestimatedfromtheslopeof

the ranked log-abundance curve, a method that has the advantage of being

independentofJ.Indeed,thelogarithmof𝔼[𝑃!],theexpectedrelativeabundanceofthe

ith most abundant taxon, is given by:log 𝔼[𝑃!] = − log𝜃 − 𝑖 log(1+ 1 𝜃)(Ewens &

Tavaré, 1997). For simulated abundancenoise,we estimatedθ using thismethod in

additiontoEwensestimator.Werestrictedthelinearregressiontothelineardomain

of the ranked log-abundance curve. We also compared the performance of both

inferencemethods intheabsenceofsimulatednoise forsamplesof102,103,104and

105 sequence reads and for initial samples of individuals of different sizes (see

SupplementaryNote4).

Page 131: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

127

Results

We first included artifactual MOTUs in a simulated sample and tested the effect on

estimatingthediversityparameterθoftheneutralmodelwithoutdispersallimitation.

The relative bias(𝜃 − 𝜃) 𝜃increasedwith the proportion of artifactualMOTUs, first

linearlyandthenfasterthanlinearly(Fig.1a-b).Itdidnotdependontheinitialθvalue

oronthereadabundanceoftheintroducedartifactualMOTUs.Thestandarddeviation

of𝜃wasnotmodifiedbythepresenceofartifactualMOTUs.

Next, we simulated PCR noise, modelled as a lognormally distributed

multiplicativenoisewith log standarddeviation𝜎!"#. Thisnoisehadnoeffecton the

inference of theθ parameter below a threshold𝜎!"#,!! . For𝜎!"# > 𝜎!"#,!! , θ was

underestimated.Thevalueof𝜎!"#,!!decreasedwith increasingθ but remainedof the

orderof1forθbetween1and500(𝜎!"#,!! ≈ 5for𝜃 = 1and𝜎!"#,!! ≈ 0.5for𝜃 = 500;

seeFig.1c-d).WealsoappliedanadditiveGaussiannoiseofstandarddeviation𝜎!"" to

therelativeabundances.Thistypeofnoiseintroducedabiasin𝜃forvaluesof𝜎!"" at

leastoneorderofmagnitudelargerthantherelativeabundanceoftheleastabundant

MOTUs(SupplementaryFig.S1).Neithertypeofnoiseaffectedthestandarddeviation

of𝜃(Fig. 1, Supplementary Fig. S1). These results held both inmaximum-likelihood

inferenceandwhenusinglinearregressionontherankedlog-abundance.

We then simulated the variability in barcode copy number by applying a

multiplicative noise distributed according to a zero-truncated Poisson distribution.

Thistypeofnoisehadnoeffectonθinference,evenforthemaximumnoiseintensityat

𝜆 = 1.8(Fig. 1e-f). We accounted for body size differences by assuming a steadily

growingcellnumbernoverthecourseofanindividual’slife,andbyvaryingtheratio

𝑔 𝑑𝑛! + 1ofthemeannumberofcells𝑔 𝑑 + 𝑛!dividedbytheinitialnumberofcells

𝑛!.We found that this ratio had no effect on themean and standard deviation ofθ,

evenatlargevalues(Fig.1g-h).Wealsotestedtheeffectofassigninganenvironmental

DNA mass proportional to𝑛 + !!!𝑛!!to individuals (where n is the cell number) to

reflectthejointeffectofmortality(nterm)andcellularturnover(n!!term,proportional

Page 132: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

128

tometabolic rate).Wedidnot findanyeffectonθ inferenceeven for largevaluesof

𝑟! 𝑑(SupplementaryFig.S1).

0 50 100 150

15

5050

0 θ = 2030% artif. MOTUs

a

MOTU abundance rank

Read

abu

ndan

ce (l

og s

cale

)

0.0 0.1 0.2 0.3 0.4 0.5

0.0

0.5

1.0

1.5

Benchmark dataset

b

θ = 1θ = 20θ = 500

Proportion of artifactual MOTUs(θ

−θ)

θ

0 20 40 60 80 100

15

5050

0

θ = 20σlog = 1.2

c

MOTU abundance rank

Read

abu

ndan

ce (l

og s

cale

)

0.05 0.10 0.20 0.50 1.00 2.00

−0.8

−0.4

0.0

0.2

0.4

d

Benchmark dataset

σlog (log scale)

(θ−

θ)θ

0 20 40 60 80 100 120

15

5050

0 θ = 20λ = 4

e

MOTU abundance rank

Read

abu

ndan

ce (l

og s

cale

)

0 2 4 6 8

−0.4

−0.2

0.0

0.2

0.4

Bacteria Kembel et al. 2012

f

λ

(θ−

θ)θ

0 20 40 60 80 100 120

15

5050

0 θ = 20g (dn0) = 1000

g

MOTU abundance rank

Read

abu

ndan

ce (l

og s

cale

)

1e−03 1e−01 1e+01 1e+03

−0.4

−0.2

0.0

0.2

0.4

h

g (dn0) (log scale)

(θ−

θ)θ

Page 133: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

129

Figure1:Neutralparameterinferencewithoutdispersallimitation.Leftpanels:meanMOTUrank abundancedistributionsover100 realizations forθ = 20in a104-read sample,without(dashedblueline)andwith(blackline)simulatednoise:(a)30%artifactualMOTUsadded(asmeasuredinbenchmarkdataset),(c)multiplicativelognormalnoiseoflogstandarddeviationσ!"# = 1.2(as measured in benchmark dataset), (e) multiplicative zero-truncated Poissonnoisesimulatingbarcodecopynumbervariability(Poissonparameterλ = 4; cf.Kembeletal.2012),and(g)sizestructureamongindividuals,foraratio !

!!!= 1000(meanbodymassover

birthmass).Rightpanels:meanandstandarddeviationover100realizationsoftherelativebias on the θ estimate in a 104-read sample, forθ = 1(green),θ = 20(black) andθ = 500(red),asafunctionof(b)theproportionofartifactualMOTUs(dashedbluelineunderlinesthelineardependence),(d)thelognormalnoiseintensityσ!"#,(f)thePoissonparameterλ,and(h)theratio𝑔 (𝑑𝑛!)

!!!!

.

Finally, we replicated the analysis in the presence of dispersal limitation (i.e.

assuming that𝑚 < 1 ). We found that the dispersal-limited maximum-likelihood

estimatorcanbestronglybiasedevenintheabsenceofsimulatednoisewhendispersal

limitation is toostrongor tooweak,especially for largeθvalues (seeSupplementary

Note 3). Therefore, we limited ourselves to parameter values that could be well

estimated in the absence of simulated noise. Provided the immigration rate is large

enough (𝑚 > 0.1 ), the relative bias (𝜃 − 𝜃) 𝜃 depended on the proportion of

artifactual MOTUs similarly to the m=1 m = 1case. For lower values of m, the

dependence of(𝜃 − 𝜃) 𝜃on the proportion of artifactualMOTUswas even stronger

(Fig. 2a-b). The relative biaslog!"(𝐼 𝐼)on the normalized immigration parameter

increased linearly with the proportion of artifactual MOTUs. Applying a lognormal

multiplicativenoiseof logstandarddeviation𝜎!"#onMOTUsrelativeabundancesdid

not bias the estimation of (θ, I) below a noise threshold𝜎!"#,!!identical to the one

foundwithout dispersal limitation. The threshold𝜎!"#,!!decreased only slightlywith

decreasingmvalue.Above𝜎!"#,!! ,θwasunderestimatedandIoverestimated(Fig.2c-

d). Applying an additive Gaussian noise of standard deviation𝜎!"" to the relative

abundancesintroducedabiasforvaluesof𝜎!"" largerthantherelativeabundanceof

theleastabundantMOTUs(SupplementaryFig.S2).Amultiplicativenoisedistributed

according to a zero-truncated Poisson had no influence on the parameter estimates

(Fig.2e-f),andlikewiseanexponentiallydistributednumberofcellsstillhadnoeffect

onparameterinferenceinthedispersal-limitedcase(Fig.2g-h,SupplementaryFig.S2).

Page 134: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

130

Discussion

Although they provide an unparalleled amount of information, biodiversity studies

basedonenvironmentalDNAalsohavelimitations.Oneofthemisthattheabundance

of sequence reads corresponding to a given molecular taxonomic unit does not

necessarily reflect the true population abundance of the corresponding taxon. Our

analysisoffersaquantitativeassessmentoftheimportanceofthisissueinattempting

torelateenvironmentalDNAdatasetswiththeoreticalmodelpredictions.

Our goal was to assess when amplicon-based DNA read abundance data can

offerbiological insights into thepredictionsofHubbell’s neutral theory.We selected

Hubbell’smodeloverothermodelspredictingtaxa-abundancedistributionsbecauseit

incorporatesanumberofkeyfeaturesforanybiodiversitymodelsuchasdemographic

stochasticityanddispersallimitation(Vellend,2010).Estimatingtheparametersθand

m of the neutral model is useful in interpreting biodiversity patterns even if the

communityisnotgovernedbypurelyneutralmechanisms(Jabotetal.,2008).Indeed,

θ is closely related to Fisher’s biodiversity index, and is an unbiased index of

biodiversity,whilemquantifieshowthelocalsampleisconnectedtoitssurroundings.

Wesimulatedtaxaabundancedatasetsfromaneutralmodelandaddednoisetothem

usingarangeofplausiblenoisetypesandintensities.Weshowedthattheparameters

θand I could still be reliably estimated by maximum likelihood inference from the

simulated sequence reads, provided that artifactual MOTUs are rare, and that

lognormal noise on relative read abundances is below a log standard deviation

thresholdthatdependsonθ.Wealsoshowedthatunderourmodellingassumptions,

neutral inference is unbiased for assemblages of multicellular organisms and for

variablebarcodecopynumbers.Finally,we found that thenoise termshada similar

effect on parameter inference when fitting the one-parameter version of the model

(withoutdispersallimitation)andwhenfittingHubbell’sdispersal-limitedmodel.

Oneof themajordifferencesbetweenenvironmentalDNAsurveysandclassic

biodiversitysurveysisthatthenumberofsampledindividualsisusuallynotmeasured.

Page 135: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

131

Yet,mostbiodiversitymeasuresassumetheknowledgeoftheorganisms’samplesize.

To solve this problem, we assumed in our simulations that the number of reads is

several times smaller than the number of effectively sampled individuals:𝑁 = 10!

sequencereads for𝐽 = 10!initial individuals.Under thisassumption,sequencereads

may be seen as a random subsample of the individuals, and because themaximum-

likelihood approach of the neutral theory relies on sampling formulas that are

invariant under subsampling, it follows that the inference on reads is unbiased (see

SupplementaryNote 4). Generating a larger number of individuals did not alter our

resultsbutwascomputationallyprohibitivewithouralgorithm.

The assumption that the number of sampled individuals exceeds that of

sequence reads is reasonable for prokaryotes (Whitman et al., 1998) and

microorganisms in general, but is unrealistic for larger organisms. One empirical

method to test whether the sequencing data meet the requirement for neutral

maximum-likelihoodinferenceistotakeasmallersubsampleofreadsandcheckthat

theparameterestimatesareunchanged.Ifnot,oneshoulddecreasesamplesizeuntil

stability is achieved (see SupplementaryNote 4). If environmental DNA data do not

consistofadiscretenumberofreads,asisthecaseint-RFLPandARISA,anarbitrarily

setsamplesizemaybeused(Leeetal.,2013).Thenumberof individualscanalsobe

estimatedempirically, as inWoodcocketal. (2007)orDumbrelletal. (2010). In the

neutralmodelwithoutdispersallimitation,amorestraightforwardapproachistoinfer

θ from the slope of the ranked log-abundance distribution, but this requires an

arbitrarydelimitation of the linear domain of the curve, and it is reliable only if the

read sample is large enough and contains a large enough taxonomic diversity. A

general rule is that the sampling scheme should be suited to the size and spatial

density of the target organisms: for large organisms, multiple spatially distributed

environmentalsamplesshouldbepooledsoastosampleasufficientlylargenumberof

individuals.Forinstance,capturingtheabundancedistributionofplanttaxafromsoil

DNAsamples requirespoolinga sufficientnumberof soil samplesovera sufficiently

largearea.

Page 136: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

132

0.0 0.1 0.2 0.3 0.4 0.5

02

46

810

Benchmark dataset

a

m = 1m = 0.1m = 0.01m = 0.001

Proportion of artifactual MOTUs

log 1

0[I

I]

0.0 0.1 0.2 0.3 0.4 0.5

−0.5

0.0

0.5

1.0

b

Benchmark dataset

Proportion of artifactual MOTUs

(θ−θ)

θ

0.05 0.10 0.20 0.50 1.00 2.00

−20

24

68

10

c

Benchmark dataset

σlog (log scale)

log 1

0[I

I]

0.05 0.10 0.20 0.50 1.00 2.00

−0.5

0.0

0.5

d

Benchmark dataset

σlog (log scale)

(θ−θ)

θ

0 2 4 6 8

−10

12

3

e

Bacteria Kembel et al. 2012

λ

log 1

0[I

I]

0 2 4 6 8

−0.4

0.0

0.2

0.4

0.6

f

Bacteria Kembel et al. 2012

λ

(θ−θ)

θ

1e−03 1e−01 1e+01 1e+03

−10

12

3

g

g (dn0) (log scale)

log 1

0[I

I]

1e−03 1e−01 1e+01 1e+03

−0.4

0.0

0.4

0.8

h

g (dn0) (log scale)

(θ−θ)

θ

Page 137: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

133

Figure2:Neutralparameterinferenceinthepresenceofdispersallimitation.Wesimulateda10!-read sample and computed the mean and standard deviation over 100 realizations of(𝜃 − 𝜃) 𝜃 and log!"(𝐼 𝐼) . Results are plotted for𝜃 = 20 and form = 1 (black),m = 0.1(green),m = 0.01(blue) andm = 0.001(red). Panels a-b: variation with the proportion ofartifactualMOTUs(dashedbluelineunderlinesthelineardependence).Panels c-d:variationwith the log standard deviation σ!"# of a multiplicative lognormal noise on relativeabundances. Panels e-f: variation with the parameter λ of a multiplicative zero-truncatedPoissonnoise.Panelsg-h:variationwiththebodysizeratio𝑔 (𝑑𝑛!)

!!!!

.

When accounting for dispersal limitation, a single sample of sequence reads

does not always provide enough information to reliably infer both θ and I from the

taxa-abundance distribution, even in the absence of additional noise source. The

maximum-likelihoodestimatormaybestronglybiasedwhentheimmigrationrateinto

the local community is either too low or too high, and increasingly so for larger

θ values (see Supplementary Note 3). Since these biases decrease with larger read

samplesize,thenumberofsequencereadsshouldbeaslargeaspossibleaslongasit

does not preclude using the sequence reads for parameter inference. Moreover, in

ordertoavoidbiasinthecaseofweakdispersallimitation,theEwensestimatorshould

be favoured whenever it yields a higher likelihood value than the dispersal-limited

estimator.

Inpractice,environmentalDNAstudiesoftensamplethesameregionalspecies

pool in different locations, which allows for more robust multi-sample maximum-

likelihood inference (Etienne, 2007, 2009). It should be noted however that exact

maximum-likelihood inference can be computationally prohibitive in the dispersal-

limitedcaseforlargernumbersofreadsthanweusedinthisstudyorinthecaseofa

multi-sample approach with large read samples (Lee et al., 2013). Continuous

approximationsdrawingontheworkofSloanetal.(2006,2007)andWoodcocketal.

(2007) might then be preferred, such as the Bayesian formulation of Harris et al.

(2015).

Our analysis reveals that the presence of artifactual MOTUs is the most

detrimentaltoneutralparameterinference.Bioinformaticsmethodsaimingatlimiting

thenumber of artifactualMOTUs shouldbe carefully applied to the sequencingdata

Page 138: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

134

beforeanyattemptatestimatingbiodiversityindices(Siposetal.,2010;Coissacetal.,

2012; Mahe et al., 2014). However, these methods do not guarantee a complete

filtering of artifactual MOTUs from empirical datasets. In particular, chimeric

sequences formed at the PCR stagemay bemisconstrued asMOTUs. Because these

sequences are generated by rare error-generating PCR events, they should be

predominantly representedby few reads.Thusone strategy for removingartifactual

MOTUsconsistsinignoringallMOTUsbelowanempiricallysetabundancethreshold.

However,indoingso,welosetheinformationontherelationshipbetweenthenumber

ofreadsandthenumberofMOTUs.Hencewesuggestthatamoresatisfactorymethod

tomitigatethisproblemistotakeasufficientlysmallsubsampleofthesequencereads

soastotrimouttheartifactualMOTUs.

ThepresenceofartifactualMOTUsinoursimulatedtaxaassemblagesmanifests

itselfbyabreakintheslopeoftherankedlog-abundancecurve(Fig.1a,seealsoFig.S3

in Supplementary Methods). Thus, the adequate subsample size for an empirical

dataset may be chosen so as to trim out the MOTUs with abundances below an

observedbreakintherankedlog-abundancecurve.Anotherfindingofourstudyisthat

forthesameproportionofartifactualMOTUs,theθestimatehasasimilarrelativebias

acrossθvaluesandtheIestimateasimilarrelativebiasacrossIvalues.Therefore, if

artifactual MOTUs cannot be entirely excluded in an environmental DNA dataset,

conclusionsshouldbebasedonratiosofneutralparameterestimatesamongsamples

ratherthanonabsolutevalues.

We modelled PCR noise using a lognormally distributed multiplicative noise

term. We found a threshold noise value beyond which the inference of the neutral

parametersbecomesbiased.Thisthresholdwasfoundtobelowerforlargerθvalues.

For instance, the empirical noise intensity𝜎!"# = 1.2measured on our benchmark

datasetwasnearorbelowthethreshold𝜎!"#,!!forθvaluesuptoca.𝜃 = 20,whilefor

larger θ values, it was responsible for a moderate underestimation of θ (20% for

𝜃 = 500)and fora seriousoverestimationof I.Nevertheless,ourbenchmarkdataset

was here used for illustrative purposes, and noise intensity may differ in other

datasets. In metabarcoding studies, noise intensity likely depends on the barcode,

taxonomicgroupandwetlaboratoryprotocol.Thereforewestronglyadvisetoinclude

Page 139: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

135

at least onebenchmarkdataset as part of any environmentalDNA study to quantify

noise intensity.Empiricalnoiseassessmentscanthenbecomparedtooursimulation

results.

WealsosimulatedaGaussianadditivenoiseonabundancedataandfoundthat

ithadadisproportionateeffectontheleastabundantMOTUs,thusdistortingthetaxa-

abundancedistribution: parameter inferencewasbiased if the standarddeviationof

thenoisewaslargerthantheabundanceoftheleastabundantMOTUs.Hereagain,itis

possible to correct for this type of noise in empirical datasets by subsampling the

sequence reads. Additive noise can be considered to model the abundance noise

generated by the sequencing step or by a single PCR cycle, while the succession of

severalPCRcyclesproducesamultiplicativeabundancenoise.

Another potential bias is due to the indirect relationship between the

number of DNA barcode sequences in the sample and the number of sampled

individuals. In particular, in the case ofmulticellular individuals, some of themmay

contributedisproportionatelymore thanothers.Given thevariabilityandcomplexity

oftheassociatednoisestructure,wechosetofollowamodellingapproachretainingas

much generality as possible. We size-biased our samples by assuming that DNA

availabilityintheenvironmentisproportionaltobodymass,ortotheturnoverofbody

mass (i.e. the metabolic rate). We found that neutral parameter estimates are not

modifiedbysizestructure in thecommunity, irrespectiveofhowstronglystructured

thecommunityis,whichisaninterestingandgeneralresult.

Our approach to accounting for body size is directly inspired from the size-

structuredneutralmodelofO’Dwyeretal.(2009).Thismodelintegratesthegrowthof

individuals intoaneutralpopulationdynamicswithoutdispersal limitation,andmay

offeranalyticalpredictionsfortheneutral“SpeciesBiomassDistribution”(SBD)while

accounting for the dependence of birth, death and growth rates on the size of

individuals.When individuals grow inbody size at a constant rate andneitherbirth

nor death rates depend on size, this model predicts the same SBD as obtained

analyticallyunderourassumptionofindependentexponentiallydistributedsizes(see

SupplementaryNote2).OurchoiceofarateofenvironmentalDNAreleasescalingwith

the3/4thpowerofbodymassismotivatedbyapredictionofthemetabolictheoryof

Page 140: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

136

ecology,whichrelates themetabolic rate to thebodymass inoneof the fewgeneral

lawsofecology(Westetal.,1997).

Eventhoughourmodellingapproachderivesfromtheoreticalconsiderations,it

isalsosupportedbysomeempiricalevidence:ithasbeenshownthattherateofDNA

detectionintheenvironmentisbiasedbythesizeoforganisms(Andersenetal.,2012;

Maruyamaetal.,2014;Klymusetal.,2015),andthefactthatDNAabundanceshould

scale non-linearly with body mass has been experimentally verified in fishes

(Maruyama et al., 2014). Nevertheless, the noise introduced by size structure,

fragments of organisms and extracellular DNA certainly has a farmore complicated

structurethanwesimulated.Forinstance,ratesofDNAreleaseintotheenvironment

and of DNA degradation both depend on taxa and on local conditions, and fluctuate

temporally (Levy-Booth et al., 2007; Barnes et al., 2014; Strickler et al., 2015).

Moreover,theunevenspatialdistributionofenvironmentalDNAmaypreventproperly

samplingthetaxa-abundancedistributioninthecommunity,especiallyifwholepieces

of living or decaying multicellular organisms are contained in the environmental

sample. Poolingmultiple spatially distributed samples should help average out local

heterogeneity.

Inthisstudy,weconsideredthatdepartureofthenumberofDNAbarcodereads

fromtherealtaxonabundanceisasourceofbias.However,thissourceofbiasmaybe

generallyseenastheaccumulationofmutationsduringreplication.Inecology,theonly

type of replication taken into consideration is demography, butDNAmetabarcoding

data are also the result of cellular and PCR replication processes. Since the

assumptionsoftheneutraltheoryaregenericandapplytoanycollectionofreplicating,

mutating,andpotentiallydispersingentities,wecouldreplaceindividualorganismsby

DNAbarcodesasourbasicreplicatingentities,andreinterprettheneutralparameters

accordingly.Asa consequence,weexpect the taxa-abundancestructurepredictedby

theneutraltheorytoberobustaslongastheDNAbarcodesdonotdiffertoomuchin

theirreplicating,mutatinganddispersingabilities.

This study demonstrates that inferring the parameters of Hubbell’s neutral

model from the taxa-abundance distribution is possible even in noised biodiversity

Page 141: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

137

datasets.Wetestedthishypothesisforarangeofbiologicallyplausiblenoisetermson

simulated metabarcoding data, and we provide guidance for neutral parameter

inference from such data. Our results indicate that whether an environmental DNA

dataset really reflects the sampled communitydependsonnoise intensity.Theyalso

suggest that this question can be answered by computing simple metrics on a

benchmarkdatasetandcomparingthemtooursimulations.Theonlywaytoquantify

thenoiselevelistoconductcarefulbenchmarkingexperiments,whichwilldependon

theexactsamplingandanalysisprotocol.

Page 142: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

138

Acknowledgements

We thankRyanChisholm,FabienLarocheand JamesO’Dwyer for fruitfuldiscussion.

Thisworkhasbenefitedfrom“Investissementd’Avenir”grantsmanagedbytheFrench

AgenceNationaledelaRecherche(CEBA,ref.ANR-10-LABX-25-01andTULIP,ref.ANR-

10-LABX-0041)andfromanadditionalANRgrant(METABARproject;PIP.Taberlet).

Page 143: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

139

References

Aird,D., Ross,M.G., Chen,W.-S., Danielsson,M., Fennell, T., Russ, C., Jaffe, D.B., Nusbaum, C.&Gnirke, A. (2011) Analyzing and minimizing PCR amplification bias in Illuminasequencinglibraries.GenomeBiology,12.

Amend, A.S., Seifert, K.A. & Bruns, T.D. (2010) Quantifying microbial communities with 454pyrosequencing:doesreadabundancecount?MolecularEcology,19,5555–5565.

Andersen, K., Bird, K.L., Rasmussen,M., Haile, J., Breuning-Madsen,H., Kjaer, K.H., Orlando, L.,Gilbert, M.T.P. &Willerslev, E. (2012)Meta-barcoding of “dirt” DNA from soil reflectsvertebratebiodiversity.MolecularEcology,21,1966–1979.

Ayarza, J.M. & Erijman, L. (2011) Balance of Neutral and Deterministic Components in theDynamicsofActivatedSludgeFlocAssembly.MicrobialEcology,61,486–495.

Baas Becking, L.G.M. (1934) Geobiologie of inleiding tot demilieukunde., W.P. Van Stockum &Zoon,TheHague,theNetherlands.

Barnes, M.A., Turner, C.R., Jerde, C.L., Renshaw, M.A., Chadderton, W.L. & Lodge, D.M. (2014)EnvironmentalconditionsinfluenceeDNApersistenceinaquaticsystems.EnvironmentalScience&Technology,48,1819–1827.

Bienert,F.,DeDanieli,S.,Miquel,C.,Coissac,E.,Poillot,C.,Brun,J.&Taberlet,P.(2012)TrackingearthwormcommunitiesfromsoilDNA.MolecularEcology,21,2017–2030.

Bik,H.M.,Porazinska,D.L.,Creer,S.,Caporaso,J.G.,Knight,R.&Thomas,W.K.(2012)Sequencingour way towards understanding global eukaryotic biodiversity. Trends in Ecology &Evolution,27,233–243.

Boyer,F.,Mercier,C.,Bonin,A.,LeBras,Y.,Taberlet,P.&Coissac,E.(2016)OBITOOLS:aUNIX-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16,176–182.

Brown,J.H.(1995)Macroecology,UniversityofChicagoPress.Coissac,E.,Riaz,T.&Puillandre,N.(2012)BioinformaticchallengesforDNAmetabarcodingof

plantsandanimals.MolecularEcology,21,1834–1847.Degnan, P.H. & Ochman, H. (2012) Illumina-based analysis ofmicrobial community diversity.

ISMEJournal,6,183–194.Drakare,S.&Liess,A.(2010)Localfactorscontrolthecommunitycompositionofcyanobacteria

in lakes while heterotrophic bacteria follow a neutral model. Freshwater Biology, 55,2447–2457.

Dumbrell,A.J.,Nelson,M.,Helgason,T.,Dytham,C.&Fitter,A.H. (2010)Relativerolesofnicheandneutralprocesses instructuringasoilmicrobial community. ISMEJournal,4,337–345.

Etienne, R.S. (2007) A neutral sampling formula for multiple samples and an “exact” test ofneutrality.EcologyLetters,10,608–618.

Etienne,R.S. (2005)Anew sampling formula forneutral biodiversity.EcologyLetters,8, 253–260.

Etienne,R.S. (2009)Maximum likelihoodestimationofneutralmodelparameters formultiplesamples with different degrees of dispersal limitation. Journal of Theoretical Biology,257,510–514.

Etienne, R.S. & Alonso, D. (2005) A dispersal-limited sampling theory for species and alleles.EcologyLetters,8,1147–1156.

Page 144: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

140

Etienne,R.S.&Alonso,D. (2007)Neutral community theory:Howstochasticity anddispersal-limitationcanexplainspeciescoexistence.JournalofStatisticalPhysics,128,485–510.

Etienne,R.S.,Alonso,D.&McKane,A.J.(2007)Thezero-sumassumptioninneutralbiodiversitytheory.JournalofTheoreticalBiology,248,522–536.

Etienne,R.S.,Latimer,A.M.,Silander,J.A.&Cowling,R.M.(2006)Commenton“Neutralecologicaltheory reveals isolation and rapid speciation in a biodiversity hot spot.” Science,311,610B–+.

Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical populationbiology,3,87–112.

Ewens, W.J. & Tavaré, S. (1997) Multivariate Ewens Distribution. Discrete MultivariateDistributions(ed.byJohnson,.

Fenchel, T. & Finlay, B.J. (2004) The ubiquity of small species: Patterns of local and globaldiversity.Bioscience,54,777–784.

vonFoerster,H.(1959)Someremarksonchangingpopulations.KineticsofCellularProliferation,pp.382–399.Stohlman,F.

Giovannoni,S.J.,Britschgi,T.B.,Moyer,C.L.&Field,K.G.(1990)GeneticdiversityinSargassoSeabacterioplankton.Nature,345,60–63.

Harris,K.,Parsons,T.L.,Ijaz,U.Z.,Lahti,L.,Holmes,I.&Quince,C.(2015)Linkingstatisticalandecological theory: Hubbell’s Unified Neutral Theory of Biodiversity as a HierarchicalDirichletProcess.Proc.IEEE,PP,1–14.

Hebert,P.D.N.,Cywinska,A.,Ball,S.L.&DeWaard, J.R.(2003)Biological identificationsthroughDNAbarcodes.ProceedingsoftheRoyalSocietyB-BiologicalSciences,270,313–321.

Hilborn,R.&Mangel,M.(1997)Theecologicaldetective:confrontingmodelswithdata,PrincetonUniversityPress.

Hoppe, F.M. (1984) Polya-like urns and the Ewens sampling formula. JournalofMathematicalBiology,20,91–94.

Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32),PrincetonUniversityPress.

Huber, J.A.,MarkWelch,D.,Morrison,H.G.,Huse,S.M.,Neal,P.R.,Butterfield,D.A.&Sogin,M.L.(2007)Microbialpopulationstructuresinthedeepmarinebiosphere.Science,318,97–100.

Jabot,F.&Chave,J.(2009)Inferringtheparametersoftheneutraltheoryofbiodiversityusingphylogeneticinformationandimplicationsfortropicalforests.EcologyLetters,12,239–248.

Jabot, F., Etienne, R.S. & Chave, J. (2008) Reconciling neutral community models andenvironmentalfiltering:theoryandanempiricaltest.Oikos,117,1308–1320.

Kembel, S.W., Wu, M., Eisen, J.A. & Green, J.L. (2012) Incorporating 16S gene copy numberinformation improves estimates of microbial diversity and abundance. PlosComputationalBiology,8,11.

Klymus,K.E.,Richter,C.A.,Chapman,D.C.&Paukert,C.(2015)QuantificationofeDNAsheddingrates from invasive bighead carp Hypophthalmichthys nobilis and silver carpHypophthalmichthysmolitrix.BiologicalConservation,183,77–84.

Lee,J.E.,Buckley,H.L.,Etienne,R.S.&Lear,G.(2013)Bothspeciessortingandneutralprocessesdrive assembly of bacterial communities in aquatic microcosms. Fems MicrobiologyEcology,86,288–302.

Legendre,P.&Legendre,L.(2012)NumericalEcology,Elsevier.

Page 145: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

141

Levy-Booth,D.J.,Campbell,R.G.,Gulden,R.H.,Hart,M.M.,Powell,J.R.,Klironomos,J.N.,Pauls,K.P.,Swanton,C.J.,Trevors,J.T.&Dunfield,K.E.(2007)CyclingofextracellularDNAinthesoilenvironment.SoilBiology&Biochemistry,39,2977–2991.

Mahe, F., Rognes, T., Quince, C., de Vargas, C. & Dunthorn,M. (2014) Swarm: robust and fastclusteringmethodforamplicon-basedstudies.Peerj,2.

Maruyama,A.,Nakamura,K.,Yamanaka,H.,Kondoh,M.&Minamoto,T.(2014)ThereleaserateofenvironmentalDNAfromjuvenileandadultfish.PlosOne,9,13.

Nguyen,N.H., Smith,D., Peay, K.&Kennedy, P. (2015) Parsing ecological signal fromnoise innextgenerationampliconsequencing.NewPhytologist,205,1389–1393.

O’Dwyer, J.P.,Lake, J.K.,Ostling,A.,Savage,V.M.&Green, J.L. (2009)An integrative frameworkforstochastic,size-structuredcommunityassembly.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,106,6170–6175.

Ofiteru, I.D., Lunn, M., Curtis, T.P.,Wells, G.F., Criddle, C.S., Francis, C.A. & Sloan,W.T. (2010)Combined niche and neutral effects in a microbial wastewater treatment community.Proceedings of the National Academy of Sciences of the United States of America, 107,15345–15350.

Ostman, O., Drakare, S., Kritzberg, E.S., Langenheder, S., Logue, J.B. & Lindstrom, E.S. (2010)Regionalinvarianceamongmicrobialcommunities.EcologyLetters,13,118–127.

Pienaar, E., Theron, A., Nelson, A. & Viljoen, H.J. (2006) A quantitative model of erroraccumulationduringPCRamplification.ComputationalBiologyandChemistry,30, 102–111.

Quince, C., Lanzen, A., Davenport, R.J. & Turnbaugh, P.J. (2011) Removing noise frompyrosequencedamplicons.BMCBioinformatics,12,18.

Ramirez,K.S., Leff, J.W.,Barberan,A., Bates, S.T., Betley, J., Crowther,T.W.,Kelly, E.F.,Oldfield,E.E., Shaw, E.A., Steenbock, C., Bradford, M.A., Wall, D.H. & Fierer, N. (2014)Biogeographic patterns in below-grounddiversity inNewYork City’s Central Park aresimilartothoseobservedglobally.ProceedingsoftheRoyalSocietyB-BiologicalSciences,281,9.

Ricklefs, R.E. (2004) A comprehensive framework for global patterns in biodiversity. EcologyLetters,7,1–15.

Roesch, L.F., Fulthorpe, R.R., Riva, A., Casella, G., Hadwin, A.K.M., Kent, A.D., Daroub, S.H.,Camargo,F.A.O.,Farmerie,W.G.&Triplett,E.W.(2007)Pyrosequencingenumeratesandcontrastssoilmicrobialdiversity.ISMEJournal,1,283–290.

Roguet,A.,Laigle,G.S.,Therial,C.,Bressy,A.,Soulignac,F.,Catherine,A.,Lacroix,G.,Jardillier,L.,Bonhomme, C., Lerch, T.Z.& Lucas, F.S. (2015)Neutral communitymodel explains thebacterialcommunityassemblyinfreshwaterlakes.FemsMicrobiologyEcology,91,11.

Rosenzweig,M.L.(1995)Speciesdiversityinspaceandtime,CambridgeUniversityPress.Rosindell, J., Hubbell, S.P., He, F., Harmon, L.J. & Etienne, R.S. (2012) The case for ecological

neutraltheory.TrendsinEcology&Evolution,27,203–208.Ross,M.G.,Russ,C.,Costello,M.,Hollinger,A.,Lennon,N.J.,Hegarty,R.,Nusbaum,C.&Jaffe,D.B.

(2013)Characterizingandmeasuringbiasinsequencedata.GenomeBiology,14.Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European Physical

JournalSpecialTopics,178,13–23.Schleper,C.,Jurgens,G.&Jonuscheit,M.(2005)Genomicstudiesofuncultivatedarchaea.Nature

ReviewsMicrobiology,3,479–488.Sipos, M., Jeraldo, P., Chia, N., Qu, A.I., Dhillon, A.S., Konkel, M.E., Nelson, K.E., White, B.A. &

Page 146: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

142

Goldenfeld,N.(2010)RobustcomputationalanalysisofrRNAhypervariabletagdatasets.PlosOne,5,8.

Sipos,R.,Szekely,A.J.,Palatinszky,M.,Revesz,S.,Marialigeti,K.&Nikolausz,M.(2007)Effectofprimer mismatch, annealing temperature and PCR cycle number on 16S rRNA gene-targettingbacterialcommunityanalysis.FEMSMicrobiologyEcology,60,341–350.

Sloan,W.T.,Lunn,M.,Woodcock,S.,Head,I.M.,Nee,S.&Curtis,T.P.(2006)Quantifyingtherolesof immigrationand chance in shapingprokaryote community structure.EnvironmentalMicrobiology,8,732–740.

Sloan,W.T.,Woodcock, S., Lunn,M.,Head, I.M.&Curtis, T.P. (2007)Modeling taxa-abundancedistributions in microbial communities using environmental sequence data.MicrobialEcology.

Strickler,K.M., Fremier,A.K.&Goldberg,C.S. (2015)Quantifying effectsofUV-B, temperature,andpHoneDNAdegradation inaquaticmicrocosms.BiologicalConservation,183, 85–92.

Taberlet,P.,Coissac,E.,Hajibabaei,M.&Rieseberg,L.H.(2012)EnvironmentalDNA.MolecularEcology,21,1789–1793.

Taberlet,P.,Coissac,E.,Pompanon,F.,Gielly,L.,Miquel,C.,Valentini,A.,Vermat,T.,Corthier,G.,Brochmann, C. & Willerslev, E. (2007) Power and limitations of the chloroplast trnL(UAA)intronforplantDNAbarcoding.Nucleicacidsresearch,35,e14–e14.

Tedersoo,L.,Bahram,M.,Cajthaml,T.,Põlme,S.,Hiiesalu, I.,Anslan,S.,Harend,H.,Buegger,F.,Pritsch, K., Koricheva, J. & Abarenkov, K. (2015) Tree diversity and species identityeffectsonsoilfungi,protistsandanimalsarecontextdependent.IsmeJournal.

Tedersoo, L., Bahram,M., Polme, S., Koljalg, U., Yorou, N.S.,Wijesundera, R., Ruiz, L.V., Vasco-Palacios,A.M.,Thu,P.Q.,Suija,A.,Smith,M.E.,Sharp,C.,Saluveer,E.,Saitta,A.,Rosas,M.,Riit,T.,Ratkowsky,D.,Pritsch,K.,Poldmaa,K.,Piepenbring,M.,Phosri,C.,Peterson,M.,Parts,K., Partel,K.,Otsing, E.,Nouhra, E.,Njouonkou,A.L.,Nilsson,R.H.,Morgado, L.N.,Mayor,J.,May,T.W.,Majuakim,L.,Lodge,D.J.,Lee,S.S.,Larsson,K.H.,Kohout,P.,Hosaka,K.,Hiiesalu,I.,Henkel,T.W.,Harend,H.,Guo,L.D.,Greslebin,A.,Grelet,G.,Geml,J.,Gates,G., Dunstan, W., Dunk, C., Drenkhan, R., Dearnaley, J., De Kesel, A., Dang, T., Chen, X.,Buegger,F.,Brearley,F.Q.,Bonito,G.,Anslan,S.,Abell,S.&Abarenkov,K.(2014)Globaldiversityandgeographyofsoilfungi.Science,346,1078–+.

Vellend,M.(2010)Conceptualsynthesisincommunityecology.QuarterlyReviewofBiology,85,183–206.

Volkov, I., Banavar, J.R.,Hubbell, S.P.&Maritan,A. (2003)Neutral theory and relative speciesabundanceinecology.Nature,424,1035–1037.

Weber,A.A.T.&Pawlowski,J.(2013)Canabundanceofprotistsbeinferredfromsequencedata:AcasestudyofForaminifera.PlosOne,8,8.

West,G.B.,Brown,J.H.&Enquist,B.J.(1997)Ageneralmodelfortheoriginofallometricscalinglawsinbiology.Science,276,122–126.

Whitman, W.B., Coleman, D.C. & Wiebe, W.J. (1998) Prokaryotes: The unseen majority.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,95,6578–6583.

Woodcock, S., vanderGast, C.J., Bell, T., Lunn,M., Curtis, T.P.,Head, I.M.& Sloan,W.T. (2007)Neutralassemblyofbacterialcommunities.FemsMicrobiologyEcology,62,171–180.

Yoccoz,N.G.,Brathen,K.A.,Gielly,L.,Haile,J.,Edwards,M.E.,Goslar,T.,vonStedingk,H.,Brysting,A.K.,Coissac,E.,Pompanon,F.,Sonstebo,J.H.,Miquel,C.,Valentini,A.,deBello,F.,Chave,

Page 147: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

143

J.,Thuiller,W.,Wincker,P.,Cruaud,C.,Gavory,F.,Rasmussen,M.,Gilbert,M.T.P.,Orlando,L., Brochmann, C., Willerslev, E. & Taberlet, P. (2012) DNA from soil mirrors planttaxonomicandgrowthformdiversity.MolecularEcology,21,3647–3655.

Yu,D.W., Ji, Y.Q.,Emerson,B.C.,Wang,X.Y., Ye,C.X., Yang,C.Y.&Ding,Z.L. (2012)Biodiversitysoup:metabarcodingofarthropodsforrapidbiodiversityassessmentandbiomonitoring.MethodsinEcologyandEvolution,3,613–623.

Zinger, L., Gobet, A. & Pommier, T. (2012) Two decades of describing the unseenmajority ofaquaticmicrobialdiversity.MolecularEcology,21,1878–1896.

Page 148: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

144

SupplementaryInformation

Figure S1: Effect of additive noise andmetabolic rate on neutral parameter inference.Leftpanels:meanMOTUrankabundancedistributionsover100realizations for𝜃 = 20ina10!-read sample, without (dashed blue line) andwith (black line) simulated noise: (a) additiveGaussian noise of standard deviation𝜎!"" = 5. 10!!(5 times the relative abundance1 𝑁 =10!!of the least abundantMOTUs), and (c) size structureamong individuals andnon-linearscalingofDNAreleasewithbodymass, forabodysizeratio !

!!!= 1,000andaratio!!

!= 100

betweenmetabolicrateanddeathrate.Rightpanels:meanandstandarddeviationover100realizations of the relative bias on the𝜃estimate in a10!-read sample, for𝜃 = 1(green),𝜃 = 20(black)and𝜃 = 500(red),asafunctionof(b)theadditivenoiseintensity𝜎!"" ,and(d)theratio!!

!.

0 20 40 60 80 100 120

15

5050

0 θ = 20σadd = 5.10−4

a

MOTU abundance rank

Rea

d ab

unda

nce

(log

scal

e)

1e−05 1e−03 1e−01 1e+01

−0.6

−0.2

0.0

0.2

0.4

b

1/N

θ = 1θ = 20θ = 500

σadd (log scale)

(θ−θ)

θ

0 20 40 60 80 100 120

15

5050

0 θ = 20r0 d = 100

c

MOTU abundance rank

Rea

d ab

unda

nce

(log

scal

e)

1e−02 1e+00 1e+02 1e+04

−0.4

−0.2

0.0

0.2

0.4

d

r0 d (log scale)

(θ−θ)

θ

Page 149: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

145

Figure S2:Effectofadditivenoiseandmetabolic rateonneutralparameter inference in thepresenceofdispersallimitation.Wesimulateda10!-readsampleandcomputedthemeanandstandard deviation over 100 realizations of(𝜃 − 𝜃) 𝜃andlog!"(𝐼 𝐼). Results are plotted for𝜃 = 20and for𝑚 = 1(black),𝑚 = 0.1(green),𝑚 = 0.01(blue) and𝑚 = 0.001(red). Panelsa-b: variation with the noise intensity𝜎!"" of an additive Gaussian noise on relativeabundances(1 𝑁 = 10!!istherelativeabundanceoftheleastabundantMOTUs).Panelsc-d:variationwiththeratio!!

!betweenmetabolicrateanddeathrate.

1e−05 1e−04 1e−03 1e−02 1e−01

−2−1

01

2a

1/N

m = 1m = 0.1m = 0.01m = 0.001

σadd (log scale)

log 1

0[I

I]

1e−05 1e−03 1e−01 1e+01−0.6

−0.2

0.2

0.4

0.6

b

1/N

σadd (log scale)(θ

−θ)

θ

1e−02 1e+00 1e+02 1e+04

−10

12

3

c

r0 d (log scale)

log 1

0[I

I]

1e−02 1e+00 1e+02 1e+04

−0.4

0.0

0.2

0.4

0.6

d

r0 d (log scale)

(θ−θ)

θ

Page 150: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

146

SupplementaryMethods:Quantifyingnoiseusingabenchmark

dataset

Tobuildourbenchmarkdataset,wemixedthegenomicDNAextractedfrom16Alpine

plant species in known quantities (Table S1), and we amplified and sequenced the

chloroplasttrnLP6-loopbarcode(primerg-h;Taberletetal.,2007).Amplificationand

sequencing were replicated eight times. The DNA concentrations of the different

species in the mixture scaled logarithmically, with a doubling in genomic DNA

concentration from one species to the next more abundant. The 16 species thus

spannedalargerangeofDNAconcentration(1.10-5ng/µLto1ng/µL),representative

oftheDNAabundancesfoundinenvironmentalsamples.

The PCR mixtures comprised 2 ng DNA template, 10 µl of AmpliTaq Gold®

MasterMix (LifeTechnologies,Carlsbad,CA,USA),0.25µMofeachprimer,3.2µgof

BSA (Roche Diagnostic, Basel, Switzerland) for a final reaction volume of 20 µl.

Thermocycling conditions consisted of an initial denaturation step (95°C, 10 min)

followedby35cyclesofdenaturationat95°C(30s),primerannealingat50°C(30s)

andelongationat72°C(1min),andbyafinalextensionstep(72°C,7min).Amplicons

werethenpurified(MinEluteTMPCRpurificationkit,Qiagen),pooled,loadedonaHiSeq

Illuminalaneandsequencedusingthepaired-endtechnology.Thereadcoveragewas

about105Illuminasequencereadsforeachoftheeightreplicates.

Thesequencingdatawerefirstcuratedfollowingclassicalproceduresusingthe

OBITools package(Boyer et al., 2016), consisting in paired-end read assembly, read

assignationtotheirrespectivesamplesanddereplication.Sequencesoflengthshorter

than 10 nucleotides or containing ambiguous nucleotides were excluded. The

sequenceswerethenprocessedusingtheInfomapclusteringalgorithm(Rosvalletal.,

2009),tominimizethenumberofartifactualMOTUsbyclusteringsequencestogether

based on their similarity. The dataset is considered as a network of sequences

connected by links weighted according to sequence similarity. We used weights

decreasing exponentially with the number of nucleotide differences between

sequences and we discarded the links for more than 5 nucleotide differences. All

replicates were lumped for this clustering analysis. In parallel, all sequences were

Page 151: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

147

assignedtoataxonusingthebarcodesofthe16speciesasareferencedatabase(Table

S1).

The clustering algorithm yielded 48 clusters (i.e. MOTUs), 24 of which were

foundonlyinsomeofthereplicates(Fig.S3a).Eachinputspecieswasrepresentedas

themost abundant sequenceof aMOTU found in all 8 realizations. Takingonly into

account theMOTUs shared across replicates, the proportion of artifactualMOTUs in

thecurateddatasetis33%(Fig.S3b).Usingthetaxonomicassignationofallsequences

tothemostsimilarofthe16species,wefoundthateachartifactualMOTUoriginates

from a single species and is at least 50 times less abundant than the species that

generated it (Fig. S3a). Therefore, artifactual MOTUs have little impact on the

abundance of the true MOTUs in the dataset. Moreover, the number of artifactual

MOTUs generated by a species is proportional to the latter’s read abundance r (Fig.

S3c), and the log-abundance of these artifactual MOTUs is uniformly distributed

between0andlog(𝑟 50).OurmodelingchoiceforsimulatingartifactualMOTUswith

realisticabundancesbuiltontheseempiricalobservations.

The amplification factor, i.e. the ratio between the read abundance and the

initialDNAconcentration,was found tobeapproximatelyconstantover therangeof

DNA concentrations spanned in the dataset (Fig. S3d). However, it varied across

species and replicates. This results in a multiplicative noise on relative abundances

that is approximately lognormally distributed, with logarithm standard deviation

𝜎!"# = 1.2 (Fig. S3e). Seventy-three percent of the variance of the logarithm is

explained by differences among species (likely related to the variability in barcode

copy number and in efficiency of PCR amplification) while the remaining variance

correspondstothevariabilityamongrealizations(Fig.S3d).

References:

Boyer,F.,Mercier,C.,Bonin,A.,LeBras,Y.,Taberlet,P.&Coissac,E.(2016)OBITOOLS:aUNIX-inspired software package for DNAmetabarcoding.Molecular Ecology Resources,16,176–182.

Rosvall,M., Axelsson, D. & Bergstrom, C.T. (2009) Themap equation.TheEuropeanPhysicalJournalSpecialTopics,178,13–23.

Taberlet,P.,Coissac,E.,Pompanon,F.,Gielly,L.,Miquel,C.,Valentini,A.,Vermat,T.,Corthier,G.,

Page 152: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

148

Brochmann, C. &Willerslev, E. (2007) Power and limitations of the chloroplast trnL(UAA)intronforplantDNAbarcoding.Nucleicacidsresearch,35,e14–e14.

Species Dilutionfactor

Sequence Sequencelength(nt)

SequenceGCcontent

(%)Taxusbaccata 1.000000 atccgtattataggaacaataattttattttctagaaaagg 41 24.39Salviapratensis 0.500000 atcctgttttctcaaaacaaaggttcaaaaaacgaaaaaaaaaag 45 26.67

Populustremula 0.250000atcctatttttcgaaaacaaacaaaaaaacaaacaaaggttcataaagacagaataagaatacaaaag 68 25.00

Rumexacetosa 0.125000 ctcctcctttccaaaaggaagaataaaaaag 31 35.48

Carpinusbetulus 0.062500atcctgttttcccaaaacaaataaaacaaatttaagggttcataaagcgagaataaaaaag 61 27.87

Fraxinusexcelsior 0.031250 atcctgttttcccaaaacaaaggttcagaaagaaaaaag 39 33.33Piceaabies 0.015625 atccggttcatggagacaatagtttcttcttttattctcctaagataggaaggg 54 38.89

Loniceraxylosteum 0.007813 atccagttttccgaaaacaagggtttagaaagcaaaaatcaaaaag 46 32.61Abiesalba 0.003906 atccggttcatagagaaaagggtttctctccttctcctaaggaaagg 47 44.68

Acercampestre 0.001953atcctgttttacgagaataaaacaaagcaaacaagggttcagaaagcgagaaaggg 56 39.29

Brizamedia 0.000977atccgtgttttgagaaaacaagggggttctcgaactagaatacaaaggaaaag 53 39.62

Rosacanina 0.000488 atcccgttttatgaaaacaaacaaggtttcagaaagcgagaataaataaag 51 31.37Capsellabursa-pastoris 0.000244 atcctggtttacgcgaacacaccggagtttacaaagcgagaaaaaagg 48 45.83

Geraniumrobertianum 0.000122atccttttttacgaaaataaagaggggctcacaaagcgagaatagaaaaaaag 53 33.96

Rhododendronferrugineum 0.000061 atccttttttcgcaaacaaacaaagattccgaaagctaaaaaaaag 46 30.43

Lotuscorniculatus 0.000031atcctgctttacgaaaacaagggaaagttcagttaagaaagcgacgagaaaaatg 55 38.18

TableS1:Listandcharacteristicsofthe16plantspeciesincludedinthebenchmarkdataset.

Page 153: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

149

Figure S3: Empirical results for the benchmark dataset obtained by mixing the DNA of 16plantspecies,thenamplifyingbyPCRandsequencingonanIlluminaplatformthechloroplasttrnLP6-loopbarcode,witheightreplicates.Panela:Readabundanceofthe16species(�)andoftheartifactualMOTUs(∘,∘),averagedoverthereplicates,asafunctionofthespeciesinitialabundance.SomeartifactualMOTUswere found ineveryrealization(∘),butotherswerenot(∘).Thebluedottedlinesdelineatetheabundancedomainchosentomodeltheabundancesofartifactual MOTUs. Panel b: Number of reads per MOTU as a function of the MOTU’sabundance rank, including and excluding artifactual MOTUs (black and dashed blue,respectively). Panel c:LinearrelationshipbetweenthenumberofartifactualMOTUsandthe

1e+00 1e−01 1e−02 1e−03 1e−04

●●

●●

●●●

●●

●● ●

● ●●

0.1

1010

00

aN

umbe

r of r

eads

(log

sca

le)

Species initial abundance (log scale)5 10 15 20

550

500

5000

5000

0

b

MOTU abundance rank

Num

ber o

f rea

ds (l

og s

cale

)●

● ●

●●●●●●●●●●

0.5 0.4 0.3 0.2 0.1 0.0

c

01

23

Species initial abundance

Num

ber o

f arti

fact

ual M

OTU

s

1e−04 1e−03 1e−02 1e−01 1e+00

−4−2

02

4

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●●●

●●●

●●

●●●●

●●

●●●

●●●

●●

●●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

d

Species initial abundance (log scale)

log(

ampl

ifica

tion

fact

or)

log(amplification factor)−3 −2 −1 0 1 2

05

1015

20

e

Page 154: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

150

relativeabundanceofthespeciesthatgeneratedthem(themostabundantspeciesisexcluded,as well as the MOTUs found in only some of the replicates). Panel d: Logarithm of theamplification factor, i.e. the ratio between the read abundance and the initial DNAconcentration, as a function of the initial DNA concentration of the species (dotted lines:standarddeviation𝜎!"# = 1.2overallspeciesandallreplicates).Panel e:Probabilitydensityof the logarithm of the amplification factor over the 16 species and the 8 realizations,approximatelynormallydistributed.

Page 155: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

151

SupplementaryNote1:Hubbell’sneutralmodel

Hubbell’s neutral model of biodiversity describes a large pool of JM individuals

undergoing randomdeath, birth and speciation events in the followingway: at each

timestep,oneindividualatrandomdies,andisreplacedbyanewindividual.Thisnew

individualbelongstoataxonnotpreviouslyfoundinthecommunitywithprobability

ν, or to oneof the already existing taxawithprobability1-ν. In the latter case, each

taxon has a probability to be picked proportionally to its abundance in the

community1. In the absence of dispersal limitation, the multivariate steady-state

distributionof taxaabundances is called theEwensdistributionand is characterized

bythesingleparameter𝜃 = !!!!

(𝐽! − 1)(Ewens,1972;Etienne&Alonso,2005).Any

sampleconsistingof𝐽 < 𝐽! individualsdrawnatrandomfromthecommunityfollows

alsotheEwensdistributionofparameterθ.

Adispersal-limitedversionof thismodel isdefinedas follows(Hubbell,2001;

Etienne & Alonso, 2005). New taxa disperse into a single local community by

immigrationfromaregionalpool,whichfollowsthemodelwithoutdispersallimitation

describedabove.Whenanindividualdies, it isreplacedbyanimmigratingindividual

withprobabilitym,andbytheoffspringofalocalindividualwithprobability1-m.Two

immigrantsmaybelongto thesametaxon.Themultivariatesteady-statedistribution

of taxa abundances in the dispersal-limited local community depends on two

parameters: the dispersal parameter 𝐼 = !!!!

(𝐽 − 1) , where J is the number of

individualsinthelocalcommunity,andthediversityparameterθoftheregionalpool

(Etienne,2005).Anysampledrawnatrandomfromthelocalcommunityalsofollows

theEtiennedistributionofparametersθandI(Etienne&Alonso,2005).

References:

Etienne, R.S. & Alonso, D. (2005) A dispersal-limited sampling theory for species and alleles.EcologyLetters,8,1147–1156.

Etienne,R.S. (2005)Anew sampling formula forneutral biodiversity.EcologyLetters,8, 253–260.

Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical population

Page 156: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

152

biology,3,87–112.Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32),

PrincetonUniversityPress.

Page 157: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

153

SupplementaryNote2:Modelingsizedifferences

ModelingsizedifferencesusingthevonFoersterequation1.

The von Foerster equation (von Foerster, 1959; O’Dwyer et al., 2009) describes a

populationwhereindividualsgrowinnumberofcells(ormass)nwithagrowthrate

g(n),andwheretheydiewithadeathrate𝑑 𝑛 .Theevolutionofthenumber𝑗 𝑛, 𝑡 d𝑛

ofindividualswithanumberofcellsbetweennandn+dnattimetisgivenby:

𝜕𝑗(𝑛, 𝑡)𝜕𝑡 = −

𝜕 𝑔 𝑛 𝑗 𝑛, 𝑡𝜕𝑛 − 𝑑 𝑛 𝑗 𝑛, 𝑡

When g(n) and𝑑 𝑛 are independent of n, the stationary (i.e., time-independent)

solutionofthevonForsterequationis:

𝑗 𝑛 = 𝐽𝑑𝑔 𝑒

!!!(!!!!)

whereJistheconstantpopulationsize,givenby:

𝐽 = 𝑗 𝑛!

!!𝑑𝑛

Therefore,arandomlychosenindividualinthepopulationhasanumbernofcellswith

probabilitydensity:

𝑝!"# 𝑛 =𝑗 𝑛𝐽 =

𝑑𝑔 𝑒

!!!(!!!!)

Weusedthisprobabilitydensitytodrawanumberofcellsbetweenn0andinfinityfor

eachindividualoftheneutralsample.Themean< 𝑛 >andthecoefficientofvariation

𝜎! < 𝑛 >ofthenumberofcellsofarandomlychosenindividualaregivenby:

< 𝑛 >=𝑔𝑑 + 𝑛!

𝜎!< 𝑛 > =

1𝑑𝑔 𝑛! + 1

Page 158: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

154

ComparisonwithO’Dwyeretal.(2009)’ssize-structuredneutralmodel2.

O’Dwyer et al. (2009) transformed the deterministic von Foerster equation into a

probabilistic equation, and integrated it into the master equation of Volkov et al.

(2003), which describes a neutral dynamics without dispersal limitation in a

probabilisticway.Theresultingsize-structuredneutralmodelpredictsthat, insteady

state, if the growth rate𝑔 𝑛 , the birth rate𝑏 𝑛 and the death rate𝑑 𝑛 are

independentofn, and if !!≫ 𝑛!(i.e., individuals growmuch larger than their size at

birth),arandomlychosenspecieswillhaveatotalnumberofcells(oratotalbiomass)

nwithprobabilitydensity:

𝑝!" 𝑛 =𝜈𝑏𝑛 (𝑒

!!!!! ! − 𝑒!!!!)

where𝜈is the speciation rate of the neutralmodel (𝜈 𝑏 ≪ 1). Adding size structure

doesnotmodifytheprobabilityforarandomlychosenspeciestohaveJindividuals:

𝑃!" 𝐽 =𝜈𝑏𝐽

𝑏𝑑

!

While themodel of O’Dwyer etal. (2009) explicitly accounts for the coupling

between the demographic dynamics and the growth of individuals, we generated a

neutral sample of individuals and then assigned an independent number of cells to

eachindividual.Therefore,underourassumptions,thenumbersofcellsofthedifferent

individuals are described by independent and identically distributed exponential

random variables𝑁! , and for!!≫ 𝑛!, the total number of cells of a species with J

individualsfollowsanErlangdistribution:

𝑁!

!

!!!

∼ 𝐸𝑟𝑙𝑎𝑛𝑔(𝑔𝑑 , 𝐽)

withprobabilitydensity:

𝑝!" 𝑛 𝐽 = 𝑝!"#$%&(!!,!)𝑛 =

1𝐽 − 1 !

𝑑𝑔

!

𝑛!!!𝑒!!!!

Page 159: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

155

Theprobabilitydensityforaspeciesofhavingatotalnumberofcellsn is thengiven

by:

𝑝!" 𝑛 = 𝑃!" 𝐽 𝑝!" 𝑛 𝐽!

!!!

Combining the expressions of 𝑃!"(𝐽) and 𝑝!" 𝑛 𝐽 above, we obtain the same

expression for𝑝!" 𝑛 as predicted by the size-structured model of O’Dwyer et al.

(2009).Therefore,inthesimplecasewhereg 𝑛 ,𝑏 𝑛 and𝑑 𝑛 areindependentofthe

number of cells n, explicitly accounting for the coupling between demographic

dynamicsandindividualgrowthisequivalenttoassumingaswedidthatallindividuals

haveindependentandidenticallydistributednumbersofcells.

The modelling approach of Volkov et al. (2003) and O’Dwyer et al. (2009)

differs from that of Ewens (1972) and Etienne (2005). The former consists in

describing the population dynamics of a single specieswith a fluctuating number of

individuals, independently of the remaining of the community, and then considering

that the results hold for every species in the community (“mean-field” approach). In

contrast,theEwensandEtiennedistributionsareobtainedbyexplicitlyconsideringa

communitywithaconstantnumberofindividualsandafluctuatingnumberofspecies

through time. However, the two approaches yield identical stationary distributions

providedthatthenumberofspeciesislargeenough(Etienneetal.,2007).

References:

Etienne,R.S. (2005)Anew sampling formula forneutral biodiversity.EcologyLetters,8, 253–260.

Etienne,R.S.,Alonso,D.&McKane,A.J.(2007)Thezero-sumassumptioninneutralbiodiversitytheory.JournalofTheoreticalBiology,248,522–536.

Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical populationbiology,3,87–112.

vonFoerster,H.(1959)Someremarksonchangingpopulations.KineticsofCellularProliferation,pp.382–399.Stohlman,F.

O’Dwyer, J.P.,Lake, J.K.,Ostling,A.,Savage,V.M.&Green, J.L. (2009)An integrative frameworkforstochastic,size-structuredcommunityassembly.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,106,6170–6175.

Page 160: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

156

Volkov, I., Banavar, J.R.,Hubbell, S.P.&Maritan,A. (2003)Neutral theory and relative speciesabundanceinecology.Nature,424,1035–1037.

Page 161: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

157

SupplementaryNote3:Estimatorperformancewithoutsimulated

noise

Weexploredhowthemaximum-likelihoodneutralestimatorsbehaveintheabsenceof

simulated noise over the range of tested parameter values (𝜃in [1, 500] andm in

[0.001,1]).WefoundthatwhiletheEwensestimatorisverylittlebiased(Fig.S4a-b),

thedispersal-limitedestimatorcanbestronglybiaseddependingonparametervalues

and sample size (Fig. S4c-f). The dispersal-limited estimator underestimates θ and

overestimatesIwhentheimmigrationrateintothelocalcommunityistoosmall,and

overestimatesθ and underestimates Iwhen the immigration rate is too large. In the

case of our10!-read sample, values of I around𝐼 = 10! (i.e.𝑚 = 0.01in the10!-

individualsample)allowforthe leastbiasedestimationof(𝜃, 𝐼).Biasesarestrongest

for𝜃 > 100.

Forbothestimatorsstandarddeviationandbiasdecreasewithsamplesize,but

amuch larger sample size is required to obtain accurate estimates in the dispersal-

limited case than in the absence of dispersal limitation. While sample sizes of ca.

𝑁 = 100are sufficient for the Ewens estimator, sample sizes of𝑁 = 10!are still not

sufficientforsomeparametervaluesinthedispersal-limitedcase.Larger𝜃valuesand

smaller I values require larger sample sizes. Estimating the neutral parameters

simultaneouslyfromseveralreadsamplesreducesthesebiases(Etienne,2007).

Reference:

Etienne, R.S. (2007) A neutral sampling formula for multiple samples and an “exact” test ofneutrality.EcologyLetters,10,608–618.

Page 162: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

158

Figure S4: Neutral parameter inference without simulated noise, for different parametervalues. The mean and standard deviation of the relative biases on parameter estimates areplotted over 500 realizations. Panels a-b:𝜃 inference without dispersal limitation, as afunction of (a) the input𝜃value and (b) the read number𝑁, for𝜃equal to 1, 20, and 500.Panelsc-d:𝜃andlog!"(𝐼)inferenceasafunctionoftheinput𝜃value,formequalto0.1,0.01,and 0.001. Panels e-f: 𝜃 andlog!"(𝐼) inference as a function of the read number𝑁 , for𝑚 = 0.01andfor𝜃equalto1,20,and500.

1 2 5 10 50 200

−0.3

−0.1

0.1

0.3

a

θ (log scale)

(θ−θ)

θ

100 200 500 1000 2000

−0.4

0.0

0.4

b

θ = 1θ = 20θ = 500

Read number N (log scale)

(θ−θ)

θ

1 2 5 10 50 200

−20

12

34

c

θ (log scale)

log 1

0[I

I]

1 2 5 10 50 200

−10

12

34

d

m = 0.1m = 0.01m = 0.001

θ (log scale)

(θ−θ)

θ

100 500 2000 5000

−2−1

01

2

e

Read number N (log scale)

log 1

0[I

I]

100 500 2000 5000

0.0

0.2

0.4

0.6

f

θ = 1θ = 20θ = 500

Read number N (log scale)

(θ−θ)

θ

Page 163: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

159

SupplementaryNote4:Neutralparameterinferencewiththenumber

ofindividualsunknown

Because exact maximum-likelihood inference of the neutral parameters relies on

sampling formulas that are invariant under subsampling, it is possible to use the

sequence reads as effective individuals as long as we can consider the reads as a

subsample from the initial individuals. Therefore, there should be less reads than

individuals. A further complication is that the sequence reads are sampled with

replacementfromtheinitialindividualsinoursimulations(i.e.theyareamultinomial

samplefromtherelativeabundances)insteadofwithoutreplacementasrequiredfor

theinvariancepropertytohold.Hencethereshouldbeinfactseveraltimeslessreads

thanindividuals,becausesamplingwithandwithoutreplacementareequivalentonly

inthiscase.

Toillustratethisassumption,weexploredhowtheEwensmaximum-likelihood

estimatorbehavesintheabsenceofsimulatednoisedependingontheinitialnumber

of individuals J, for𝑁 = 10!,𝑁 = 10!,𝑁 = 10! and 𝑁 = 10! , and for 𝜃 = 20 . As

expected, the Ewens estimator yields an unbiased𝜃estimate as long as the initial

numberof individuals isca.oneorderofmagnitude larger thanthenumberofreads

(Fig. S5a-d).We then simulated a number of reads larger than the initial number of

individuals (𝑁 = 10! reads for 𝐽 = 10! individuals, and𝑁 = 10! reads for𝐽 = 10!

individuals),andtooksmallersubsamplesofreadsfromtheoriginalreadsampleuntil

reachingastable𝜃maximum-likelihoodestimate.Asexpected,the𝜃estimatebecomes

stableundersubsamplingforsamplesatleastoneorderofmagnitudesmallerthanthe

initialnumber𝐽ofindividuals.Usingthismethod,weachievedanunbiasedestimation

of𝜃in spite of the small initial number of individuals (Fig. S5e-f). In the dispersal-

limited case, we expect the maximum likelihood estimator based on the Etienne

samplingformulatobehavesimilarly.

WealsocomparedestimatingθusingtheEwensestimatorandestimatingθby

linearregressionontherankedlog-abundance.Wefoundthatbothmethodsperform

similarlywhen thenumberof reads isoneorderofmagnitude larger than the initial

numberof individuals,andthatwhenthisconditionisnotmet, linearregressionstill

Page 164: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

160

provides an unbiased θ estimate (Fig. S5a-d). However, unlike maximum likelihood

inference, linear regressionon the ranked log-abundance is not reliablewheneither

thenumberofreadsortheinitialnumberofindividualsistoolow(lowerthanca.500

for𝜃 = 20; Fig. S5a-d), orwhen there is too little taxonomicdiversity in the sample.

Moreover,the𝜃estimatedependsonthearbitrarydelimitationofthelineardomainof

thecurve.

1e+02 1e+03 1e+04 1e+05

−0.8

−0.4

0.0

a

Number of reads N

J (log scale)

(θ−θ)

θ

1e+02 1e+03 1e+04 1e+05

−0.8

−0.4

0.0

b

Number of reads N

J (log scale)

(θ−θ)

θ

1e+02 1e+03 1e+04 1e+05

−0.6

−0.4

−0.2

0.0

c

Number of reads N

J (log scale)

(θ−θ)

θ

1e+02 1e+03 1e+04 1e+05

−0.4

−0.2

0.0

0.2

d

Number of reads N

J (log scale)

(θ−θ)

θ

−0.3

−0.1

0.1

e

N J

1e+02 1e+03 1e+04 1e+05Nsubsample (log scale)

(θ−θ)

θ

−0.5

−0.3

−0.1

0.1

f

N J

1e+02 1e+03 1e+04Nsubsample (log scale)

(θ−θ)

θ

Page 165: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

161

Figure S5:𝜃inference without dispersal limitation and without simulated noise for 𝜃 = 20.Themeanandstandarddeviationof therelativebiason theθ estimateareplottedover100realizations.Panels a-d:θ inferencebymaximumlikelihood(black)andbylinearregressionon the ranked log-abundance (blue), as a function of the initial number of individuals J, (a)for 𝑁 = 10! reads, (b)𝑁 = 10! reads, (c)𝑁 = 10! reads and (d)𝑁 = 10! reads (linearregression too inaccurate to be plotted for 𝑁 = 10!). Panels e-f: Maximum-likelihood θestimateasafunctionofthesize𝑁!!"#$%&'( ofthereadsubsampleusedforestimation,startingfromanoriginal sampleof (e)𝑁 = 10!readsor (f)𝑁 = 10!reads.Anunbiasedθestimate isobtainedwhen𝑁!"#!$%&'( isatleastoneorderofmagnitudesmallerthantheinitialnumberofindividuals(e)𝐽 = 10!or(f)𝐽 = 10!.

Page 166: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter2–NeutralParameterInference

162

Page 167: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

163

Chapter3TopicmodellingrevealsspatialstructureinaDNA-basedbiodiversitysurvey

GuilhemSommeria-Klein1,LucieZinger1,2,EricCoissac3,AmaiaIribar1,Heidy

Schimann4,PierreTaberlet3,JérômeChave1

1UniversitéToulouse3PaulSabatier,CNRS,IRD,UMR5174LaboratoireEvolutionetDiversitéBiologique(EDB),F-31062Toulouse,France.2EcoleNormaleSupérieure,CNRS,UMR8197InstitutdeBiologiedel’ENS(IBENS),F-75005Paris,France.3UniversitéGrenobleAlpes,CNRS,UMRLaboratoired'EcologieAlpine(LECA),F-38000Grenoble,France.4INRA,UMR745EcoFoG(AgroParisTech,CIRAD,CNRS,UniversityoftheFrenchWestIndies,UniversityofFrenchGuiana),F-97387Kourou,France.

Page 168: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

164

Chapteroutline

Thesecondchapterexploredtheeffectofnoiseontheinterpretabilityofenvironmental

DNA data. In this third chapter, another challenge of environmental DNA data is

addressed,namelythefactthatmicrobialdatasetstypicallyyieldalargenumberofrare

OTUs, and that sampling effort cannot be controlled across samples. As in the first

chapter, the focus is here on datasets containingmany spatially distributed samples.

However, while the first chapter aimed at comparing the taxonomic composition of

samples with respect to their spatial layout and to environmental descriptors, this

chapterdescribesamethodtoexplore thestructureofanenvironmentalDNAdataset

independently of any additional information. The results can then be interpreted in

regardofcontextualdata.Thismethod,which iscloselyrelated tomethodsalready in

use inmicrobiology, is suited to large and sparsedatasets, and accounts for sampling

effects. It consists in decomposing the data into assemblages of OTUs based on their

propensitytoco-occuracrosssamples.Inthischapter,itistestedusingsimulationsand

byapplyingittoalargesoilDNAdatasetcollectedoveraforestplotfollowingaregular

sampling scheme. A measure of the stability of the decomposition is also proposed.

Lastly,theapplicationofthisapproachtoecologicaldataisdiscussedmoregenerally.Of

particular interest is that thismethod ismodel-based, and could thusbe extendedby

modifying the underlying model, including by the addition of more mechanistic

elements.

Page 169: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

165

Abstract

High-throughput sequencing of amplicons from environmental DNA samples has

become a major method for rapid, standardized and comprehensive biodiversity

assessments, allowing for the study of all life formswithin a single sample.However,

data interpretation is often difficult because a large number of rare taxa confound

patterns. Hence, retrieving and describing the structure of such datasets requires

efficientmethodsfordimensionalityreduction.Here,wedescribethefirstapplicationof

Latent Dirichlet Allocation (LDA) to an environmental DNA dataset. LDA uses a

probabilisticmodel todecomposesamples intooverlappingassemblagesbasedon the

co-occurrenceoftaxaandthecovarianceoftheirabundances.Itaccountsforsampling

effectsandaccommodateslargeandsparsedatasets.Weshowthatthegroupingoftaxa

into assemblages can be tested statistically, and to this end develop a measure of

assemblagestability.WethenapplyaLDAalgorithmtoa largesoilsurveyofbacteria,

protists and metazoans in a 12-ha plot of primary tropical forest. The LDA analysis

reveals thatbacterial and protist assemblages display a strong spatial structurewhile

metazoans do not. Furthermore, bacteria and protists exhibit very similar spatial

patterns,whichmatchthetopographicalfeaturesoftheplot.WeconcludethatLDAisa

computationally efficient and robust method to detect and interpret the structure of

large DNA-based biodiversity datasets.We discuss the possible future applications of

thisapproachinbiodiversityscience.

Page 170: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

166

Page 171: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

167

Introduction

High-throughput sequencing is shedding a new light on the study of biodiversity

patternsacrossdomainsof life.Asimpleandefficientmethod is ‘DNAmetabarcoding’

(Taberletetal.,2012),whichconsists inamplifyingandsequencingagenomicmarker

(‘DNAbarcode’) intheDNAcontainedinenvironmentalsamplessuchassoil,wateror

feces(Thomsen&Willerslev,2015).Theresultingsequencescanthenbeclusteredinto

molecular Operational Taxonomic Units (OTUs), which serve as proxies of species in

biodiversity assessments, and which can possibly be assigned to known taxa after

comparison to reference databases. Metabarcoding data typically consist of a

‘communitymatrix’thatliststheOTUsfoundineachenvironmentalsample,aswellas

theirreadcounts.

Agoalofcommunityecologyistounderstandpatternsofspeciesco-occurrence

andturnoveracrossspace.Letusassumethatmanysampleshavebeencollectedacross

space, in a regular fashion. So far, the search for community structure has been

performed using multivariate ordination, as well as distance-based or partitioning-

based clustering (Legendre & Legendre, 2012). These methods have proven their

efficiency, but they have limitationswhen it comes to analysing datasetswith a very

largenumberofOTUs, andmany rareOTUs, resulting in largeand sparse community

matrices (Holmes et al., 2012). Their results are also biased by the uneven sampling

effort across samples in metabarcoding data, since sampling effort depends on the

amountofDNAretrievedandonPCRyieldforeachsample.

Probabilistic approaches to detecting data structure offer an alternative to

ordinationmethodsbyexplicitlymodellingthesamplingprocessthatunderliesthedata

(Holmes et al., 2012). This can be achieved using a so-called mixture model, which

assumesthatthedataarestructuredintoamixtureofseveral(unobserved)component

units, eachwith a distinctive taxonomic composition. Under thismodel, the observed

discretesamplesofsequencereads,whichmaybeofdifferentsizes,aresampledfrom

Page 172: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

168

thismixture.Thecomponentunitscanthenbeinferredfromthedatausingmaximum-

likelihoodorBayesianinference,whichproviderigorousmeansofassessinggoodness-

of-fit and of selecting the number of component units. Mixture models have been

successfully used in microbiology (Knights et al., 2011; Holmes et al., 2012; Ding &

Schloss,2014;Shafieietal.,2015)andincommunityecology(Valleetal.,2014),either

inanunsupervisedway(dataclustering)orinasupervisedway(dataclassification).In

particular, Valle et al. (2014) used Latent Dirichlet Allocation (LDA) to cluster tree

abundance data across forest plots into component assemblages – or ‘component

communities’.Theyshowedthatthismethodperformedbetterthanhierarchicalandk-

meansclusteringonsimulateddata.Here,weexplore thepotentialof thismethod for

theanalysisoflargemetabarcodingdatasets.

LDAdecomposessamplesintoamixtureofcomponentassemblages,whichmay

themselvesoverlapintheirtaxonomiccomposition.Thecomponentassemblagescanbe

interpretedascommunitiesofco-occurringtaxa.Becauseeachsampleisrepresentedby

a mixture of component assemblages, the model captures the smooth turnover in

speciescompositionalongenvironmentalgradients(Valleetal.,2014).Thismodelwas

originally introduced by Blei etal. (2003) to decompose large sets of text documents

into topics (a problem known as ‘topic modelling’), based solely on their word

frequency, and has been subsequently extended to the analysis of large and complex

datasets in various fields (see Blei (2012) for a review). The same model has been

independently introduced in population genetics tomodel population structure using

the distribution of alleles across individuals, and is now a cornerstone of population

genetics analyses (model with admixture in the Structure software; Pritchard et al.,

2000).

OneissuefortheapplicationofLDAtometabarcodingisthattheinterpretation

thatcanbemadeofabundance information, i.e. theDNAreadcountperOTU,remains

debated (Nguyen et al., 2015; Sommeria-Klein et al., 2016). For bacteria, it seems

possible to relate the read count to the number of cells in the sample (Kembel etal.,

2012),while in the case ofmacro-organisms, the read countmaybe indicative of the

taxon’s biomass in the environment (Andersen et al., 2012; Klymus et al., 2015).

Nevertheless,metabarcodingdataareoftenbestusedasoccurrencedata,anditisthus

important to evaluate the applicability of LDA to occurrence-based datasets. Second,

Page 173: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

169

depending on how strongly structured the data are, the LDA algorithm may fail to

converge to an optimal solution. It is indeed acknowledged in the literature that the

resultofLDAdecompositionmayvaryfromoneruntotheother(Steyvers&Griffiths,

2007;Balagopalan,2012;Valleetal.,2014).Hence,itwouldbeimportanttoquantifythe

robustness of the LDA decomposition, especially since environmental DNA data are

noisy.Wefirstaddresstheseproblemsonsimulateddata,andthenturntotheanalysis

of an empirical metabarcoding dataset describing the soil biodiversity of bacteria,

protistsandmetazoansoveralargetropicalforestplotinFrenchGuiana(Zingeretal.,

2017).We thusaddresshere the followingquestions: (1) canLDAaccurately retrieve

assemblages from occurrence data, (2) can we define a stability metric for the

decomposition of metabarcoding data into component assemblages, and (3) can

componentassemblagesretrievedfromempiricaldataberelatedtovariationinabiotic

conditions? Finally, we discuss our results in light of those obtained bymultivariate

methods(Zingeretal.,2017).

Page 174: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

170

Methods

LatentDirichletAllocation1.

LDA decomposition takes as an input a community matrix representing samples by

columns and OTUs by lines, where the entries are the read counts per OTU in each

sample.Occurrencedatacanalsobeprovidedasaninput,sincetheyareaspecialcaseof

abundancedatawhereOTUabundancesonly takevalues0or1. Inference consists in

fitting a generative model to the observed community matrix. The generative model

describesawaytogeneratethedatabasedontwoassumptions:thedataarestructured

intoKassemblages,whereKisafixedparameter,andeachsampleisamixtureoftheK

assemblages in Dirichlet-distributed proportions. The model involves unobserved

(‘latent’) variables describing the underlying decomposition of the data into the K

assemblages,andthe fittingprocessconsists inestimatingthemost likelyvalueof the

latentvariablesandofthemodel’sparametersgiventheobserveddata(Fig.1).

The generative model consists of the following steps. For sequence read n in

samplem,assemblagemembershipznisgeneratedbyacategoricaldrawfromavector

ofKmixtureweights 𝜃!! !∈ !,! (i.e.,oneoutofKcategoriesischosenatrandomwith

probability weights 𝜃!! !∈ !,! ). Then, the OTU membership wn is generated by a

categoricaldrawfromavectorofVmixtureweights 𝜙!!!

!∈ !,!,whereVisthenumber

of distinct OTUs in the whole dataset. The mixture weights 𝜃!! represent the

decompositionofeachsamplem intotheKassemblages,whilethemixtureweights𝜙!!

representthetaxonomiccompositionofeachassemblagek.Themodelfurtherassumes

thatthemixtureweights𝜃!!followforeachsamplemasymmetricDirichletdistribution

ofmixingparameterα.Therefore,foreachsamplem:

𝜽𝒎 = 𝜃!! !∈ !,! ∼ 𝐷𝑖𝑟𝑖𝑐ℎ𝑙𝑒𝑡 𝛼

Andthen,foreachsequencereadninsamplem:

𝑧! ∼ 𝐶𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑐𝑎𝑙 𝜽𝒎

Page 175: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

171

𝑤! ∼ 𝐶𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑐𝑎𝑙 𝝓𝒛𝒏

Thus, fitting the generativemodel to the observed data consists in finding the

most likely assemblage mixtures 𝜽𝒎 for the M samples, the most likely OTU

compositions𝝓𝒌 for the K assemblages, and the most likely value for the mixing

parameterαoftheDirichletdistribution.Thevalueofαindicateswhetherthesamples

tend to be decomposed into an evenmixture of component assemblageswith similar

abundances (case𝛼 > 1) or into an uneven mixture dominated by one or a few

componentassemblages(case𝛼 < 1).Asharpspatialsegregationoftheassemblagesis

associatedwithaαvaluemarkedlylowerthanunity.TheDirichletdistributionisused

as a prior primarily because it is the conjugate prior of the categorical distribution,

whicheasesanalyticalcalculations.

Figure 1. Illustration of Latent Dirichlet Allocation’s (LDA) principle. LDA decomposesacommunitymatrixwith discrete abundance information (e.g., read count) intoK assemblagesbasedontheco-occurrenceofOTUsandthecovarianceoftheirabundancesacrosssamples.Kisfixed beforehand and can be selected using likelihood-based model selection methods. Theassemblage mixture 𝜃!! !∈ !,! in each sample m, with 𝜃!!!

!!! = 1 , and the taxonomiccomposition 𝜙!! !∈ !,! ofeachassemblagek,with 𝜙!!!

!!! = 1,areinferredfromthedata.

InferenceusingaVariationalExpectation-Maximizationalgorithm2.

WefittedthegenerativemodeltotheobserveddatausingtheVariationalExpectation-

Maximization (VEM) algorithm proposed and implemented by Blei et al. (2003), and

Sample1 Sample2 Sample3

OTU1nOTU2★OTU3u

nnnn ★

nn ★★u

★★★uuu

Sample1 Sample2 Sample3

Assemblage1Assemblage2

10

0.50.5

01

Assemblage1 Assemblage2

OTU1nOTU2★OTU3u

0.750.250

00.50.5

θkm

φνk

LDAdecomposi<onK=2assemblages

Sample1 Sample2 Sample3

OTU1nOTU2★OTU3u

nnnn ★

nn ★★u

★★★uuu

Page 176: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

172

wrapped into theRpackage ‘topicmodels’byGrün&Hornik (2011).Compared to the

often-followed Bayesian approach of Griffiths & Steyvers (2004), this approach is

computationallyfaster,estimatesallparametersandallowsforabetter-justifieduseof

AICforselectingthenumberofassemblages.Thealgorithmusesapproximatelikelihood

maximizationtoestimatetheparametersαand𝝓 = 𝜙!! !∈ !,! !∈ !,!,aswellas the

posterior distribution of the latent variables 𝒛 = 𝑧! !∈ !,!! !∈ !,!and 𝜽 =

𝜃!! !∈ !,! !∈ !,!giventhedata𝒘 = 𝑤! !∈ !,!! !∈ !,!

.

First,wesetthemodelparametersto𝛼 = 0.1andtorandomlychosenvaluesfor

φ .Then,thefollowingtwostepsarerepeateduntilthelikelihood(ormoreprecisely,a

lower bound for the likelihood) converges. The variational step approximates the

posteriordistribution𝑃 𝒛,𝜽|𝒘,𝛼,𝝓 ofzandθ ,giventhedatawandgiventhecurrent

values of α and𝝓. This is achieved by minimizing the Kullback-Leibler divergence

between a variational approximation and the true posterior. The Expectation-

Maximization(EM)stepestimatestheparametersαand𝝓bymaximizingthemarginal

log-likelihood 𝐿 𝛼,𝝓 = ln 𝑃 𝒘|𝛼,𝝓 , making use of the approximation to the

posteriordistribution𝑃 𝒛,𝜽|𝒘,𝛼,𝝓 foundinthevariationalstep(Bleietal.,2003;Grün

& Hornik, 2011). We used a convergence threshold of 10-7 for the EM step and a

convergencethresholdof10-8forthevariationalstepinallouranalyses.

Thisalgorithmprovidesanestimateofthemarginallog-likelihoodln 𝑃 𝒘|𝛼,𝝓

of the final decomposition, that can be used to compare different realizations of the

algorithmortocomputethemodel’sAIC.Itisadeterministicalgorithminthesensethat

it consists in a simple iterative optimization. However, the resultmay depend on the

initializationforthetaxonomiccomposition𝝓ofassemblages.

Computingtheoptimalnumberofassemblages3.

WeselectedthenumberKofassemblagesbasedonAIC.Thereisnorigorousexpression

ofAICforamodelsuchasLDA(Burnham&Anderson,2002),butwechosetocompute

the AIC as2(𝐿 𝛼,𝝓 + 𝐾(𝑉 − 1)+ 1), where𝐿 𝛼,𝝓 = ln 𝑃 𝒘|𝛼,𝝓 is the marginal

log-likelihoodoftheLDAdecomposition.Indeed,thereare𝐾(𝑉 − 1)freeparametersto

Page 177: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

173

beestimated in𝝓 = 𝜙!! !∈ !,! !∈ !,!,plusthemixingparameter𝛼.This is thesame

expressionastheoneusedelsewhere(Than&Ho,2012).Weusedthelowerboundon

themarginallog-likelihoodcomputedaspartoftheVEMalgorithmasanapproximation

for𝐿 𝛼,𝝓 . We also tried to correct the AIC for small sample size as2[𝐿 𝛼,𝝓 +

𝐾 𝑉 − 1 + 1 1+ !!], whereM is the number of samples (Burnham & Anderson,

2002),butthisdidnotmodifyourresults,andwedonotreporttheseanalyseshere.

Assessingthestabilityofthedecomposition4.

TheLDAdecompositionreflectstheco-occurrencestructureofOTUsamongsamples,as

wellasthecovariancestructureoftheirabundancesinthecaseofabundancedata.Ifthe

dataarenotstronglystructured,theymayexhibitacomplexlikelihoodlandscape,which

increasesthechancethatthealgorithmreachesalocallikelihoodmaximum.Toaddress

this issue, we ran the algorithm a hundred times starting from random initial

assemblages𝝓𝒌,andweselectedonlytherealizationwiththehighest likelihoodvalue

for interpretation. We also measured the stability of the decomposition across the

hundred realizations,with two goals inmind:measuring how strongly structured the

data are, and assessing whether the realization with highest likelihood has indeed

reached the optimal solution. We removed the occasional realizations with α values

much larger than 1 from the analysis, because they correspond to non-informative

solutionswhereallsamplescontainallassemblagesinsimilarproportions.

Tomeasurethestabilityofthedecompositionacrossrealizations,wefirstneeded

todefineameasureofsimilaritybetweentwopossibledecompositionsofthedata.We

computeditasthemeansimilaritybetweentheassemblagesofthetwodecompositions.

Therefore, itboilsdowntodefiningameasureofsimilaritybetweentwoassemblages.

WeusedthesymmetrisedKullback-Leibler(sKL)divergence,ameasureofdissimilarity

between two distributions that stems from information theory and that is commonly

used in statistics and machine learning (Burnham & Anderson, 2002; Meila, 2006;

Steyvers&Griffiths, 2007).TheKullback-Leiblerdivergence (or relative entropy)of a

distribution 𝒒 = 𝑞! !∈ !,! relative to a distribution 𝒑 = 𝑝! !∈ !,! is defined as

Page 178: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

174

𝐷 𝒑 𝒒 = 𝑝! ln 𝑝! 𝑞!!!!! , with 𝑝!!

!!! = 1 and 𝑞!!!!! = 1 (Kullback, 1959). It

measurestheamountofinformationlostwhenapproximatingthedistributionpbythe

distributionq. The symmetrisedKullback-Leiblerdivergencebetweenp andq is then

defined as𝐷! 𝒑,𝒒 = 𝐷 𝒑 𝒒 + 𝐷 𝒒 𝒑 2. Between two assemblages𝑘!and𝑘!, the

sKL divergence can be computed either based on their spatial distribution, i.e.

𝐷! 𝜽𝒌𝟏 𝜃!!!!

!!! ,𝜽𝒌𝟐 𝜃!!!!

!!! ,orbasedontheirOTUcomposition, i.e.𝐷! 𝝓𝒌𝟏 ,𝝓𝒌𝟐 .

Thus,wewereable tomeasureboth thespatialandthe taxonomicsimilaritybetween

twoassemblages.Since𝐷! 𝒑,𝒒 isinfiniteassoonasthereisatleastoneiin 1,𝑁 that

verifies𝑝! = 0or𝑞! = 0, we avoided infinite sKL divergence values by setting a lower

boundineveryentryofθ andφ ,equaltotheinverseofthesumofallelementsinthe

communitymatrix(i.e.,theinverseofthetotalnumberofreadsinthecaseofabundance

data,ortheinverseofthetotalnumberofoccurrencesinthecaseofoccurrencedata).

Therefore,everypointwherebothdistributionstakevaluesbelowthisthresholdhasa

nullcontributionto𝐷! 𝒑,𝒒 .

We used the sKL divergence to define the similarity measure

𝜎 𝑘!, 𝑘! = 𝐷! 𝑘!, 𝑘! !"# − 𝐷! 𝑘!, 𝑘! 𝐷! 𝑘!, 𝑘! !"# between two assemblages𝑘!

and𝑘!,where 𝐷! 𝑘!, 𝑘! !"#is the average sKLdivergenceover1000 randomizations

of the assemblages.When computing spatial similarity,weperformed randomizations

by randomly shifting the spatial distribution of one assemblage with respect to the

other, so as to account for spatial autocorrelation (Fortin & Payette, 2002). When

computingtaxonomicsimilarity,weperformedrandompermutationsoftheOTUsinone

distributionwithrespecttotheother.Thesimilarity𝜎 𝑘!, 𝑘! isequalto1foraperfect

match, and to0when theassemblagesareas similar as expectedby chance.We then

defined the similarity between two decompositions𝑑!and𝑑!as the mean similarity

betweentheirbest-matchingassemblages,i.e.𝑆 𝑑!,𝑑! = 𝜎 𝑘!, 𝑘!∗ 𝑘! 𝐾!!!!! ,where

assemblage 𝑘!∗ 𝑘! is the best match in decomposition 𝑑! of assemblage 𝑘! in

decomposition𝑑!, as deduced from the comparison of𝜎values.Whenmore than one

assemblage 𝑘! in decomposition 𝑑! had a best match with assemblage 𝑘!∗ in

decomposition𝑑!,weforcedaone-to-onecorrespondencebetweentheassemblagesof

both decompositions by giving priority to higher𝜎values. This situation should be

Page 179: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

175

rarely encountered however, since assessing stability mostly involves comparing

decompositionsthatcloselyresembleeachother.

Figure2.Assessing thestabilityof theLDAdecompositionusing themetric I.Eachpanelrepresents100realizationsof theLDAalgorithmwithrandomassemblage initializations foramockdataset. Inbothcases,therealizationwithhighest likelihood(i.e., thebestrealization) iscompared to each of the 99 others by plotting their similarity S as a function of their log-likelihooddifference. Themetric I is defined as the intercept of the linear regression (dashedblue line). Two cases are illustrated: (a) realizations grow increasingly similar to the bestrealization as their likelihood increases (𝐼 = 1), and (b) dissimilar realizations with similarlikelihood coexist (𝐼 = 0.5). Values of I close to 1 indicate that the best realization is likely tohavereachedtheoptimalsolution.

Wemeasured the stability of the decomposition across𝑛 = 100realizations by

computing two metrics. First, we computed the mean similarity across all pairs of

realizations 𝑆 ! = 𝑆 𝑑!,𝑑! 𝑛 𝑛 − 1 2!!,!! .Themoresimilartherealizationsare

irrespectiveof the initialcondition, themorestronglystructuredthedataare likelyto

be. Second, we compared the realization with highest likelihood (i.e., the best

realization) to each of the𝑛 − 1others. To assess whether the best realization had

indeed reached the optimal solution, we plotted for each pair their similarity𝑆as a

functionoftheir log-likelihooddifference(Fig.2).Weperformeda linearregressionof

thesimilarityagainstthelog-likelihooddifference,andusedtheintercept𝐼!asametric.

●●●●●●●●●●●●●●●●●●●●●

●●●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000 5000

Llh difference with best realization

Sim

ilarit

y S

to b

est r

ealiz

atio

n

a

●●

●●

●●

●●

●●

0.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000 5000

Llh difference with best realization

b

Page 180: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

176

Thismetric𝐼!assesseswhether the realizations tend to be increasingly similar to the

best realization as their likelihood increases, i.e. to ‘converge’ toward the best

realization. Values of𝐼! close to 1 mean that we can be confident that the best

realization has reached the global likelihood maximum, provided that the space of

possible initializations has been adequately sampled. We computed both metrics for

spatial ( 𝑆!"#$. ! , 𝐼!"#$.,! ) and taxonomic ( 𝑆!"#$. ! , 𝐼!"#$.,! ) similarities between

assemblages.

Simulateddata5.

To test the performance of the LDA algorithm on occurrence-transformed data with

respect to the original abundance data, we simulated a metabarcoding dataset. This

simulated dataset comprised 1,131 samples containing a total of 1,000 OTUs and

decomposedinto5assemblages.WefirstdefinedtheassemblagesbydrawingtheirOTU

compositionfromaDirichletdistributionofmixingparameter0.02.Wethenassignedto

each sample a mixture of assemblages in proportions determined by a sinusoidal

function of the sample’s index, so that the relative abundances of all 5 assemblages

successively peak at 100% (Fig. 3). Combining the assemblage mixture and the

taxonomiccompositionofassemblages,weobtainedtherelativeabundancesofOTUsin

eachsample.Wegeneratedthesimulateddatasetbysampling1,000sequencereadsper

sample from these relativeabundances,which resulted inanaveragediversityof105

OTUspersample.

Page 181: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

177

Figure3:LDAdecompositionofsimulatedoccurrenceandabundancedata.LDAappliedto

asimulateddatasetwith5assemblages,1,000MOTUs,1,131samples,and1,000sequencereads

persample,(a)fortheoriginalabundancedata,and(b)fortheoccurrencedataderivedfromthe

samedataset.EachplotshowstheassemblageproportionsestimatedbyLDAforK=5(coloured

lines; only the realization with highest likelihood out of 100 is shown) and the simulated

assemblageproportions(dashedblacklines).

Tropicalforestsoilmetabarcodingdataset6.

We applied LDA to an empiricalmetabarcoding dataset describing the biodiversity of

bacteria, protists and metazoans over a 300x400 m tropical forest plot (called Petit

Plateau;Chaveetal.,2008)attheNouraguesEcologicalResearchStation, ina lowland

tropical forest of central French Guiana (Bongers et al., 2001). Site conditions, data

collection,laboratoryprocedures,andsequencingfilteringproceduresarealldescribed

indetailinZingeretal.(2017)andareonlybrieflysummarizedhere.

Thesamplingcampaignwasconductedtowardstheendofthe2012dryseason.

Soilsampleswerecollectedfromthemineralhorizon(~10cmdeep)usingasoilauger

every10monasquaregridcoveringtheplotandexcludingtheedges,whichresultedin

1,131 soil samples (Fig. S1). Extracellular DNA was extracted in the field from each

sample (Zinger et al., 2016). The present study uses data from two DNA barcodes

0 200 600 1000

0.0

0.2

0.4

0.6

0.8

1.0

Asse

mbl

age

prop

ortio

ns in

sam

ples

a − Abundance

Samples0 200 400 600 800 1000

0.0

0.2

0.4

0.6

0.8

1.0

b − Occurrence

Samples

Page 182: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

178

amplified by PCR and sequenced on high-throughput Illumina sequencers, targeting

bacteria(16SrDNA),andalleukaryotes(18SrDNA).Thesequencingdatawerecurated

using theOBITools package (Boyer etal., 2016). Sequenceswere clustered intoOTUs

based on their similarity using the Infomap algorithm (Rosvall et al., 2009) with a

similarity cut-off of 3mismatches, so as to cluster spurious sequences resulting from

PCRandsequencingerrors.EachOTUwasgivenataxonomicassignationbycomparing

itssequencetothefollowingreferencedatabases:GenBankr197fortheeukaryotic18S

marker, andSILVA for thebacterial16Smarker.Sequencematching todatabaseswas

conductedusingtheecotagprogramincludedintheOBIToolspackage.Basedonthese

taxonomic assignations, we further split the eukaryotic 18S dataset into protists,

arthropods,annelids,nematodes,andflatworms(Platyhelminthes).

Out of the 1,131 samples, a number of samples were excluded from the

sequencingresultsforeachbarcodeduetoinsufficientPCRyields(7.2%ofsamplesfor

bacteriaand0.2%foreukaryotes).Weinterpolatedthecontentofthemissingsamples

by samplingwith replacement themeannumberof readsper sample from the (up to

eight)non-emptynearestneighbouringsamplesonthegrid.WethenappliedtheLDAon

either the read-abundance data, or on the occurrence-transformed data, defining the

absenceofanOTUinasamplestrictlyaszeroread-abundanceinthesample.Wedidnot

trimthedataforrareOTUs,orforOTUsrepresentedinasinglesample.

A fine-grained description of the forest canopy structure and topography was

obtainedusingasmall-footprintLiDARsurveycarriedoutoverthesamplingsiteinthe

same year as the soil sampling (2012; Rejou-Mechain et al., 2015). This allowed the

generation ofmaps of topography, slope, and canopy height from the LiDAR cloud of

points. The topography of the plot is relatively smooth, with amaximal difference in

elevationof30m.Mapsofsoilwetness(Beven&Kirkby,1979)andlightatgroundlevel

were also derived from the LiDARmeasurements (Tymenetal., 2017).We compared

theLiDAR-deriveddatawiththemetabarcodingdatabycomputingthemeanvaluesof

the environmental variables over 10-m-by-10-m cells centred on the soil sampling

points. We sought a biological interpretation for the retrieved assemblages by

comparing their spatial distribution to the distribution of LiDAR-obtained

environmental variables. To do so, we computed Pearson’s correlation coefficient

between the spatial distributions and assessed the significance of the correlation by

Page 183: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

179

performing 100,000 spatial randomizations, i.e. shifting randomly one spatial

distributionwithrespecttotheother,soastoaccountforspatialautocorrelation.

Table 1: Stability of LDA decomposition for occurrence data. For each of the taxonomicgroupsunderstudy:totalnumberofMOTUs;optimalnumberofassemblages𝐾!"#(!"#)obtainedfromAICminimization; spatial and taxonomic stability for three assemblages asmeasuredbythe 𝑆 !""and𝐼!""metrics;estimatedvalueofthemixingparameterαinthebestrealizationoutof100forthreeassemblages.

𝐾 = 3

Richness 𝐾!"#(!"#) 𝑆!"#$. !"" 𝐼!"#$.,!"" 𝑆!"#$. !"" 𝐼!"#$.,!"" 𝛼!"#$ !"#$.

Bacteria16S 20,162 5 0.85 1.0 0.95 1.0 0.16

Protists18S 1,648 2 0.68 1.0 0.95 1.0 0.082

Arthropods18S 1,881 2 0.62 0.62 0.91 0.93 0.11

Nematodes18S 378 2 0.33 0.49 0.88 0.94 0.05

Platyhelminthes18S 126 2 0.52 0.50 0.86 0.88 7.0

Annelids18S 51 2 0.41 0.57 0.83 0.90 0.035

Page 184: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

180

Results

We first appliedLatentDirichletAllocationdecomposition toa simulateddataset, and

compared the results for abundance and occurrence data. AICminimization correctly

recoveredthesimulatednumberofassemblages(five)inbothcases(Fig.S2).Inthecase

ofoccurrencedata,LDAyieldedmoreevenassemblagemixturesthansimulated(Fig.3).

Thealgorithmreachedtheoptimalsolutionmorereliablyforoccurrencedatathanfor

abundancedata( S!"#$. !"" = 0.98foroccurrencedata, 𝑆!"#$. !"" = 0.89forabundance

data,𝐼!"#$.,!"" = 1.0in both cases; cf. Fig. S2). Next, we applied the analysis to the

tropical forest soil dataset. Using the read-abundance data, the optimal number of

assemblages was always larger than 50, while it ranged between 2 and 5 for the

occurrencedata,dependingonthetaxonomicgroup(Table1).Asalsoobservedonthe

simulateddata,theLDAalgorithmconvergedmorereliablytowardtheoptimalsolution

foroccurrencedatathanforabundancedata(TableS1).ThusweconcludethatLDAcan

be effectively applied to occurrence-based biodiversity data. In the rest, we describe

resultsobtainedusingoccurrencedataandassuming𝐾 = 3assemblages,avalueclose

tothatminimizingtheAICacrosstaxonomicgroups.

Wefoundcleardifferenceswhencomparingbacteriaandunicellulareukaryotes

(henceforth denoted as protists) tometazoans (arthropods, annelids, flat worms and

nematodes).Bacteriaandprotistsdisplayedastrongerspatialstructureatthescaleof

ourstudyplot,asdeducedfromthespatialstabilityofthedecomposition:thesimilarity

intercept𝐼!"#$.,!""was equal to 1.0 (Table 1, Fig. S3), with a mean similarity across

realizations S!"#$. !""of0.85and0.68, respectively (Table 1). In contrast, metazoans

displayed a lower similarity intercept (0.49 ≤ 𝐼!"#$.,!"" ≤ 0.62 ), and also a lower

similarity across realizations (0.33 ≤ S!"#$. !"" ≤ 0.62). We also found that spatial

structurewaspositively correlated to taxonomicdiversity,measuredbyOTUrichness

(correlation coefficient𝜌 = 0.85between S!"#$. !!!and the number of OTUs; Fig. 4).

The taxonomic stability of the assemblages was higher than their spatial stability,

followingthesametrendsasthespatialstability,butwithlesspronounceddifferences

amongtaxonomicgroups(Table1).

Page 185: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

181

Figure 4. Stability for occurrence data measured as the mean similarity acrossrealizations 𝑺 𝟏𝟎𝟎,asafunctionofthenumberofOTUs.Themetric 𝑆 !""ismeasuredbasedon the (a) spatial and (b) taxonomic similarity between assemblages. The blue line figures alinear regression, and the shaded area its standard error. Pearson’s correlation coefficient is𝜌 = 0.85forspatialstabilityand𝜌 = 0.92fortaxonomicstability.

For all taxonomic groups except flat worms, the estimates of the mixing

parameterαweremuchsmallerthan1(Table1),indicatingastrongspatialsegregation

amongassemblages.Inbacteriaandprotists,thedecompositionintothreeassemblages

wasstronglylinkedtotopographicalfeatures(Fig.4,TableS2).Theblueassemblageof

Fig. 4 was associated with terra firme areas, defined as areas of higher topography,

gentler slope, and lower soil wetness. The green assemblage was associated with

hydromorphic areas, defined as displaying the opposite environmental correlations

(TableS2).Finally,thespatialdistributionoftheredassemblagematchedthelocationof

exposed rock patches that are scattered across the forest plot, based on direct

observations. In metazoans, we were unable to identify similar terra firme and

hydromorphicassemblages(Fig.S4,TableS2),howeveroneassemblageinarthropods

andnematodesdidmatchtheexposedrockspatialpattern(TableS3).Thisexposedrock

assemblagewasindeedconsistentlyfoundtobethemosttaxonomicallydistinctiveinall

taxonomic groups.Neither light at ground level nor canopy height explained the LDA

decompositioninanyofthetaxonomicgroups.

Bacteria

Protists

Annelids

Nematodes

Platyhelminthes

Arthropods

0.25

0.50

0.75

1.00

2 3 4 5Number of OTUs (log10)

Spat

ial s

tabi

lity

a

●●

BacteriaProtists

Annelids

Nematodes

Platyhelminthes

Arthropods

0.80

0.85

0.90

0.95

1.00

2 3 4 5Number of OTUs (log10)

Taxo

nom

ic s

tabi

lity

b

Page 186: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

182

Discussion

Large environmental DNA datasets offer a unique opportunity to unlock some of the

major challenges in community ecology, yet as a result data accumulation is

accelerating, thuscreatingtheneed fornovelmethodsadaptedto thesedata.Herewe

havepresentedthepotentialoftheLatentDirichletAllocationmethodfortheanalysisof

metabarcodingdata.Thismodel-basedmethodisadaptedtolargeandsparsedatasets.

Itassignsaprobabilityweightforeachsampletobelongtoanassemblagebasedonthe

OTUs in thissample,andalso infers thecomposition inOTUsofeachassemblage(see

Table S4). It thus goes beyond a categorical classification of samples and generates

biologically interpretable assemblages. Here, we further elaborate on the advantages

andlimitationsofthisapproach,andontheimplicationstotheanalysisoftheforestsoil

dataset.

DiscussingtheassumptionsofLDA.Unlikeinclassicalmultivariatemethods,noprior

transformationofthedataisrequired:inputdataconsistofdiscreteOTUabundances,or

occurrences,andsamplesizesmayvaryacrosssamples.Inputdataarenotrequiredto

meetanormalityassumption,thedefinitionofadissimilaritymetricisnotrequired,and

LDA thusmakes amore parsimonious use of the data. The assumptionsmade by the

underlyingmodelareminimal:theDirichletprioristhenaturalpriorfortheparameters

ofthecategoricaldistribution,anditissufficientlyflexibletofitmostdatasets(O’Brien

&Record,2016).Onecouldtakeasteptowardmoremechanisticmodellingbyadding

moreassumptionstotheLDAapproach.For instance,onecouldassumethataneutral

dynamicstakesplacewithinassemblages,sothattheirtaxonomiccompositionfollows

the taxa-abundance distribution predicted by Hubbell’s neutral theory of biodiversity

(Hubbell, 2001;Harrisetal., 2015).Assuming aDirichlet prior also on the taxonomic

composition of assemblages, as done in the Bayesian version of LDA (Griffiths &

Steyvers, 2004; Valle et al., 2014), is a first step in that direction, since the Dirichlet

distributionapproximatestheneutraltaxa-abundancedistributionforalargenumberof

taxa.

Page 187: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

183

Assessing therobustnessof theLDAdecompositionandselecting thenumberof

assemblages. In many applications of LDA, the question of the robustness of the

decompositioniscrucial.Howevertherobustnessofthealgorithm,asmeasuredbythe

similarity of the output across runs, has rarely been assessed, probably because it

entails a serious computational burden. Here we have proposed a practical way to

measure the similarity across runs based on the symmetrised Kullback-Leibler

divergence,andhaveusedittoquantifyhowstablethedecompositioniswithrespectto

initialization. We have computed two complementary stability metrics. First, 𝑆

measuresthemeansimilarityacrosspairsofrealizations.Thisstabilitymetricisgeneral

sinceitisnotcentredonthebestrealization,andmeasureshowstronglystructuredthe

data are. Second, I is the similarity intercept obtained by comparing the highest-

likelihoodrealizationtoallothersthroughalinearregressionoftheirsimilarityagainst

their log-likelihood difference. This second stability metric takes account of the

likelihoodinformation,islesscomputationallyintensive,andisusedtoassesswhether

therealizationwithhighestlikelihoodhasreachedtheoptimalsolution.

The symmetrised Kullback-Leibler (sKL) divergence is suited to assessing

stabilitybecauseitissensitivetosmalldifferencesbetweendistributions.However,itis

unbounded,whichmakesitdifficulttointerpret.BynormalizingthesKLdivergenceby

itsmeanvalueoverrandomizations,wedefinedasimilarityindexσequalto1whenthe

distributions are identical and to 0when they are nomore similar than expected by

chance.This indexalsoaccounts for spatial autocorrelation in thedatabyperforming

spatialrandomizations.

To compute the similarity between two decompositions, we consider only the

similarity between the best-matching assemblages of both decompositions, thus

discardingpart of the information.Thismethodworkswellwhen thedecompositions

aresimilar,howeversimilarityisundesirablylowwhenassemblagesaremergedorsplit

between the two decompositions. This could be corrected by computing the sKL

divergence between the full partitioning of the data in both decompositions, i.e. the

assignment of every sequence read to an assemblage, instead of comparing pairs of

assemblages.Whilethisistheapproachadvocatedforintheclusteringliterature(Meila,

Page 188: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

184

2006;Vinhetal.,2010),itwouldlikelybeverycomputationallyintensiveinthecaseof

LDA.

ThereisnouniquemethodtoselectthenumberofLDAcomponentunits(Airoldi

etal.,2010).HereweuseAICminimizationasan indicationof theoptimalnumberof

assemblages. Another commonly used method consists in splitting the data into a

learningsetandatestset,andoptimizingthepredictivepowerofthemodelonthetest

set, as measured by a perplexity function (Blei et al., 2003). A more sophisticated

methodistofollowthenon-parametricmodellingapproachof(Tehetal.,2006),where

the number of assemblages ismodelled as a random latent variable that is estimated

from the data. However, this method proved to have convergence issues on our

empirical data. Stability of the algorithm’s output could also be used as a criterion to

select the number of assemblages. When a large number of LDA component units is

selected,anadditionalstepofanalysisusingsimplerstatisticalmethodsmaybeneeded

torepresentandinterprettheresultoftheLDAdecomposition(Mauchetal.,2015).

Tropical forest soil biodiversity decomposition. By applying LDA to an

environmentalDNAdataset,wedescribedthespatialstructureofbacterial,protistand

metazoansoilcommunitiesina12-hatropicalforestplot.Thespatialpatternsretrieved

byLDAforthesetaxonomicgroupsallowedustoshedlightonsoilcommunitystructure

(seealsoZingeretal.,2017).

We applied the LDA algorithm to metabarcoding data with no further

transformation than clustering the sequences to avoid defining spurious OTUs. We

verified that the interpolation of missing samples played no role in generating the

observed patterns. The AIC minimization yielded between 2 and 5 assemblages for

occurrencedatadependingonthetaxonomicgroup,butweusedthevalue𝐾 = 3across

groupstofacilitateintercomparisonandbecausetheLDAdecompositionisrobusttothe

number of assemblages close to the optimum (Fig. S7). For the 20,162-OTU bacterial

dataset, the largest dataset considered in this study, numerical inference of the LDA

decompositionforthreeassemblagestookabout25minutesforoccurrencedataand35

minutes for abundance data, which amounts to respectively 48 and 60 hours when

running100realizationsofthealgorithmtoteststability.

Page 189: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

185

Abun

danc

eO

ccur

renc

eBacteria Protists Fungi

a b c

d e f

110 120 130Topography (m a.s.l.)

0 5 10Wetness

h

0.1 0.2 0.3 0.4 0.5Slope

i

a b

cd e

f

110 120 130Topography (m a.s.l.)

g

0 5 10Wetness

h

0.1 0.2 0.3 0.4 0.5Slope

i

100m

a b c

d

e

f

110 120 130Topography (m a.s.l.)

g

0 5 10Wetness

0.1 0.2 0.3 0.4 0.5Slope

i

Topographic Wetness Index

Figure5:Spatialdistributionofmicroorganismassemblages,for𝑲 = 𝟑assemblages.

SpatialdistributionoftheassemblagesobtainedfromindependentLDAdecompositionsofbacteriaandprotists,for(a-b)abundanceand(c-d)occurrencedata.Sampledlocationsareindicatedbydarkdots,andtheassemblagemixturebetweensampleshasbeeninterpolatedusingordinarykriging.Terrafirme(inblue),hydromorphic(ingreen)andexposedrock(inred)assemblagescanbeidentifiedineachtaxonomicgroup,basedoncorrelationsto(e-f)Lidar-derivedtopography,TopographicWetnessIndexandslope,aswellasonfieldobservations.Thespatialpatternsretrievedforabundancedataaresimilartothoseobtainedwithoccurrencedatabutlessstronglycorrelatedtotopographicvariables.

Page 190: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

186

Thestabilityanalysisof thealgorithm indicates that communitiesofunicellular

organisms (i.e. bacteria andprotists) aremarkedly structured at the scale of theplot,

while metazoan communities are less so. The stability of the decomposition is also

stronglycorrelatedwiththenumberofOTUs,whichspansseveralordersofmagnitude

across taxonomic groups (Fig. 4, Table1). Thus, the lower statistical power in groups

containing fewer OTUs could explain this pattern. However, it is more likely due to

ecologicaldifferencesbetweengroups.Indeed,thispatternisconfirmedbyZingeretal.

using ordination-based variation partitioning between environmental and spatial

components.

Furthermore,thetwounicellularorganismgroupscaneachbedecomposedinto

threespatiallysegregatedassemblagesmatchingplottopography.Whilethecovariation

ofmicroorganism compositionwith topographywas already detected in Zinger et al.,

spatialpatternscanherebedirectlyrepresentedundertheformofassemblagesthatare

characteristic of the different topographic conditions (Fig. 5, Table S4). These spatial

patternscanalsobeshowntobesimilarbetweenbacteriaandprotists,whichisbotha

novel insight and a hint that the assemblages retrieved by LDAdo reflect community

structure. One assemblage associated with patches of exposed rock was retrieved in

bacteriaandprotistsbutalsoinarthropodsandnematodes.Itstaxonomiccomposition

is particularly distinctive (Fig. S7), which might be explained by the high amount of

decaying organic matter retained between the boulders in these patches. A current

limitationofLDAisthatitsabilitytocomparetaxonomiccompositiontoenvironmental

data is limited to computing simple correlations between the spatial distribution of

retrieved assemblages and environmental variables. This is in contrast to ordination-

basedmethods such as CanonicalRedundancyAnalysis, and improving on this aspect

wouldbeausefuldirectionofresearch.

Using occurrence versus abundance data. The use of occurrence data was

computationallyfaster,andledtomorestableandmoreinterpretablepatterns.Because

biodiversity data typically display a wide range of taxonomic abundances (Fig. S5),

switching from abundance to occurrence data amounts to dramatically increasing the

weight of rare taxa. In the empirical dataset, these OTUs constitute the bulk of the

Page 191: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

187

diversity: OTUs tallying on average less than one sequence read per samplemake up

over 85% of the total number of OTUs in bacteria and protists (Fig. S5). They play a

significant role in shaping the patterns, since removing them erases the retrieved

occurrence-basedspatialpatterns(Fig.S6).Thishintsattheimportanceofraretaxain

defining communities of microorganisms. A possible caveat however is that some of

thoserareOTUsmightbegeneratedbyremnantPCRerrorsinthedata.IfPCRerrorsare

repeatable for a given DNA sequence, this would produce groups of consistently co-

occurringOTUsandthusartificiallyincreasethestabilityofoccurrence-basedpatterns.

Conclusion. LDA is an efficientmethod to detect structure in the large and complex

datasetsgeneratedbyenvironmentalDNAsequencingmethods.Therepresentationof

spatialbiodiversitypatternsderived fromLDA iseasily interpretable, and themethod

comeswithameasureofhowstronglythisrepresentationissupportedbythedata.LDA

couldbeused toexplore thebiogeographicpatternsarising in larger-scaleDNA-based

biodiversitysurveyssuchastheEarthMicrobiomeProject(Gilbertetal.,2014)andthe

Tara Oceans Project (Sunagawa et al., 2015). It could also be applied in non-spatial

samplingdesigns,suchastimeseries.Lastly,LDAisoneexampleofafamilyofmodels,

which could for instance find applications in the study of plant-microorganism

interactions (Rosen-Zvi et al., 2004). We hope this study will stimulate research on

model-basedmethodsofdataanalysisfortheecologicalinterpretationofenvironmental

DNAstudies.

Page 192: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

188

Acknowledgements

We thank Dylan Craven, BartHaegeman,HélèneMorlon, TimPaine,MélanieRoy andMarc-AndréSelosseforfruitfuldiscussions.WethankBlaiseTymenforhishelpwiththe

Lidardata.Thisworkhasbenefited from “Investissementd’Avenir” grantsmanagedby

theFrenchAgenceNationaledelaRecherche(CEBA,ref.ANR-10-LABX-25-01andTULIP,

ref.ANR-10-LABX-0041;ANAEE-France:ANR-11-INBS-0001), anadditionalANRgrant

(METABARproject;PIP.Taberlet), funds fromCNRS.Wearegrateful to theGenotoul

bioinformaticsplatformToulouseMidi-Pyrénéesforprovidingcomputingresources.

Page 193: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

189

References

Airoldi, E.M., Erosheva, E.A., Fienberg, S.E., Joutard, C., Love, T. & Shringarpure, S. (2010)ReconceptualizingtheclassificationofPNASarticles.PNAS,107,20899–20904.

Andersen, K., Bird, K.L., Rasmussen,M., Haile, J., Breuning-Madsen,H., Kjaer, K.H., Orlando, L.,Gilbert, M.T.P. &Willerslev, E. (2012)Meta-barcoding of “dirt” DNA from soil reflectsvertebratebiodiversity.MolecularEcology,21,1966–1979.

Balagopalan,A.(2012)ImprovingTopicReproducibilityinTopicModels.Beven,K.J.&Kirkby,M.J. (1979)Aphysicallybased,variablecontributingareamodelofbasin

hydrology.HydrologicalSciencesBulletin,24,43–69.Blei, D. (2012) Probabilstic Topic Models. Communication of the Association for Computing

Machinery,55,77–84.Blei,D.M.,Ng,A.Y.&Jordan,M.I.(2003)LatentDirichletAllocation.JournalofMachineLearning

Research,3,993–1022.Bongers, F., Charles-Dominique, P., Forget, P.-M.&Théry,M. (2001)Nouragues:dynamicsand

plant-animalinteractionsinaNeotropicalrainforest,SpringerScience&BusinessMedia.Boyer,F.,Mercier,C.,Bonin,A.,LeBras,Y.,Taberlet,P.&Coissac,E.(2016)OBITOOLS:aUNIX-

inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16,176–182.

Burnham,K.P.&Anderson,D.R.(2002)Modelselectionandmultimodelinference,Springer,NewYork.

Chave,J.,Olivier,J.,Bongers,F.,Châtelet,P.,Forget,P.-M.,vanderMeer,P.,Norden,N.,Riéra,B.&Charles-Dominique,P.(2008)Above-groundbiomassandproductivityinarainforestofeasternSouthAmerica.JournalofTropicalEcology,24,355–366.

Ding,T.&Schloss,P.D.(2014)Dynamicsandassociationsofmicrobialcommunitytypesacrossthehumanbody.Nature,509,357–+.

Fortin,M.J.& Payette, S. (2002)How to test the significance of the relation between spatiallyautocorrelated data at the landscape scale: A case study using fire and forest maps.Ecoscience,9,213–218.

Gilbert, J.A., Jansson, J.K. & Knight, R. (2014) The Earth Microbiome project: successes andaspirations.BmcBiology,12.

Griffiths,T.&Steyvers,M.(2004)CollapsedGibbsSamplingforLDA.101,5228–5235.Grün,B.&Hornik,K.(2011)topicmodels:anRpackageforfittingtopicmodels.Harris,K.,Parsons,T.L.,Ijaz,U.Z.,Lahti,L.,Holmes,I.&Quince,C.(2015)Linkingstatisticaland

ecological theory: Hubbell’s Unified Neutral Theory of Biodiversity as a HierarchicalDirichletProcess.Proc.IEEE,PP,1–14.

Holmes,I.,Harris,K.&Quince,C.(2012)DirichletMultinomialMixtures:GenerativeModelsforMicrobialMetagenomics.PlosOne,7.

Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32),PrincetonUniversityPress.

Kembel, S.W., Wu, M., Eisen, J.A. & Green, J.L. (2012) Incorporating 16S gene copy numberinformation improves estimates of microbial diversity and abundance. PlosComputationalBiology,8,11.

Klymus,K.E.,Richter,C.A.,Chapman,D.C.&Paukert,C.(2015)QuantificationofeDNAsheddingrates from invasive bighead carp Hypophthalmichthys nobilis and silver carp

Page 194: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

190

Hypophthalmichthysmolitrix.BiologicalConservation,183,77–84.Knights,D.,Kuczynski, J., Charlson,E.S., Zaneveld, J.,Mozer,M.C.,Collman,R.G.,Bushman,F.D.,

Knight,R.&Kelley,S.T.(2011)Bayesiancommunity-wideculture-independentmicrobialsourcetracking.NatureMethods,8,761-U107.

Kullback,S.(1959)InformationTheoryandStatistics,JohnWiley&Sons.Legendre,P.&Legendre,L.(2012)NumericalEcology,Elsevier.Mauch,M.,MacCallum,R.M.,Levy,M.&Leroi,A.M.(2015)Theevolutionofpopularmusic:USA

1960-2010.RoyalSocietyopenscience,2,150081–150081.Meila,M.(2006)Comparingclusterings—aninformationbaseddistance.JournalofMultivariate

Analysis,98,873–895.Nguyen,N.H., Smith,D., Peay, K.&Kennedy, P. (2015) Parsing ecological signal fromnoise in

nextgenerationampliconsequencing.NewPhytologist,205,1389–1393.O’Brien,J.&Record,N.(2016)ThepowerandpitfallsofDirichlet-multinomialmixturemodels

forecologicalcountdata.BioRxivpreprint.Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population structure using

multilocusgenotypedata.Genetics,155,945–959.Rejou-Mechain,M.,Tymen,B.,Blanc,L.,Fauset,S.,Feldpausch,T.R.,Monteagudo,A.,Phillips,O.L.,

Richard,H.&Chave,J.(2015)Usingrepeatedsmall-footprintLiDARacquisitionstoinferspatialandtemporalvariationsofahigh-biomassNeotropical forest.RemoteSensingofEnvironment,169,93–101.

Rosen-Zvi,M.,Gri_ffiths,T.,Steyvers,M.&Smyth,P.(2004)TheAuthor-TopicModelforAuthorsandDocuments.

Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European PhysicalJournalSpecialTopics,178,13–23.

Shafiei, M., Dunn, K.A., Boon, E., MacDonald, S.M.,Walsh, D.A., Gu, H. & Bielawski, J.P. (2015)BioMiCo:asupervisedBayesianmodel for inferenceofmicrobialcommunitystructure.Microbiome,3,8.

Sommeria-Klein, G., Zinger, L., Taberlet, P., Coissac, E. & Chave, J. (2016) Inferring neutralbiodiversityparametersusingenvironmentalDNAdatasets.Scientificreports,6.

Steyvers,M.&Griffiths,T.(2007)ProbabilisticTopicModels.LatentSemanticAnalysis:ARoadtoMeaning (ed. by T. Landauer), D. McNamara), S. Dennis), and W. Kintsch), LaurenceErlbaum.

Sunagawa, S., Coelho, L.P., Chaffron, S., Kultima, J.R., Labadie, K., Salazar, G., Djahanschiri, B.,Zeller,G.,Mende,D.R.,Alberti,A.,Cornejo-Castillo,F.M.,Costea,P.I.,Cruaud,C.,d’Ovidio,F.,Engelen,S.,Ferrera, I.,Gasol, J.M.,Guidi,L.,Hildebrand,F.,Kokoszka,F.,Lepoivre,C.,Lima-Mendez,G.,Poulain, J.,Poulos,B.T.,Royo-Llonch,M.,Sarmento,H.,Vieira-Silva,S.,Dimier,C.,Picheral,M.,Searson,S.,Kandels-Lewis,S.,Bowler,C.,deVargas,C.,Gorsky,G.,Grimsley,N.,Hingamp,P.,Iudicone,D.,Jaillon,O.,Not,F.,Ogata,H.,Pesant,S.,Speich,S.,Stemmann, L., Sullivan,M.B.,Weissenbach, J.,Wincker, P., Karsenti, E., Raes, J., Acinas,S.G., Bork, P. & Tara Oceans, C. (2015) Structure and function of the global oceanmicrobiome.Science,348.

Taberlet,P.,Coissac,E.,Hajibabaei,M.&Rieseberg,L.H.(2012)EnvironmentalDNA.MolecularEcology,21,1789–1793.

Teh,Y.W., Jordan,M.I.,Beal,M.J.&Blei,D.M.(2006)HierarchicalDirichletProcesses. JournaloftheAmericanStatisticalAssociation,101,1566–1581.

Than,K.&Ho,T.B.(2012)FullySparseTopicModels.

Page 195: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

191

Thomsen,P.F.&Willerslev,E.(2015)EnvironmentalDNA-Anemergingtoolinconservationformonitoringpastandpresentbiodiversity.BiologicalConservation,183,4–18.

Tymen,B.,Vincent,G.,Courtois,E.A.,Heurtebize, J.,Dauzat, J.,Marechaux, I.&Chave, J. (2017)Quantifying micro-environmental variation in tropical rainforest understory atlandscapescalebycombiningairborneLiDARscanningandasensornetwork.AnnalsofForestScience,2,1–13.

Valle,D.,Baiser,B.,Woodall,C.W.&Chazdon,R.(2014)DecomposingbiodiversitydatausingtheLatentDirichletAllocationmodel,aprobabilisticmultivariatestatisticalmethod.EcologyLetters,17,1591–1601.

Vinh,N.X.,Epps,J.&Bailey,J.(2010)Informationtheoreticmeasuresforclusteringscomparison:variants, properties, normalization and correction for chance. Journal of MachineLearningResearch,11,2837–2854.

Zinger, L., Chave, J., Coissac, E., Iribar, A., Louisanna, E., Manzi, S., Schilling, V., Schimann, H.,Sommeria-Klein, G. & Taberlet, P. (2016) Extracellular DNA extraction is a fast, cheapand reliable alternative for multi-taxa surveys based on soil DNA. Soil Biology andBiochemistry,96,16–19.

Zinger, L., Taberlet, P., Schimann, H., Bonin, A., Boyer, F., De Barba, M., Gaucher, P., Gielly, L.,Giguet-Covex,C.,Iribar,A.,Rejou-Mechain,M.,Raye,G.,Rioux,D.,Schilling,V.,Tymen,B.,Viers,J.,Zouiten,C.,Thuiller,W.,Coissac,E.&Chave,J.(2017)Soilcommunityassemblyvariesacrossbodysizesinatropicalforest.bioRxiv.

Page 196: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

192

Spatialstability(K=3)

Taxonomicstability(K=3)

Abundancedata

Occurrencedata

Abundancedata

Occurrencedata

⟨ S!"#$.⟩ !""

I !"#$.,!""

⟨ S!"#$.⟩ !""

I !"#$.,!""

⟨ S!"#$.⟩ !

""

I !"#$.,!""

⟨ S!"#$.⟩ !

""

I !"#$.,!""

Bacteria16S

0.88

1.0

0.85

1.0

0.99

1.0

0.95

1.0

Protists18S

0.72

0.87

0.68

1.0

0.92

0.96

0.95

1.0

Arthropods18S

0.46

0.65

0.62

0.62

0.65

0.78

0.91

0.93

Nem

atodes18S

0.43

0.67

0.33

0.49

0.69

0.87

0.88

0.94

Platyhelminthes18S

0.45

0.69

0.52

0.50

0.66

0.83

0.86

0.88

Annelids18S

0.63

0.81

0.41

0.57

0.75

0.85

0.83

0.90

TableS1:StabilityofLDAdecompositionforoccurrenceandabundancedata.Foreachofthetaxonomicgroupsunderstudy,

spatialandtaxonom

icstabilityforthreeassemblagesasmeasuredbythe⟨ 𝑆⟩ !""and𝐼 !""metrics,forabundanceandoccurrence

data.

SupplementaryInformation

Page 197: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

193

Topography Wetness Slope

Terrafirme

Bacteria16S 0.36** -0.27** -0.26**Protists18S 0.23** -0.15** -0.17**Arthropods18S 0.18** -0.16*** -0.027Nematodes18S 0.15** -0.091** -0.081**Platyhelminthes18S 0.12** -0.11** -0.043Annelids18S 0.023 0.042 -0.091*

Hydromorphic

Bacteria16S -0.43*** 0.40*** 0.31**Protists18S -0.23** 0.10* 0.21***Arthropods18S -0.097* 0.10** -0.015Nematodes18S -0.096*** 0.10** 0.045Platyhelminthes18S 0.044 -0.099** 0.0087Annelids18S -0.057 0.058 0.022

Exposedrock

Bacteria16S 0.00024 -0.078** 0.0084Protists18S -0.052 0.084 -0.025Arthropods18S -0.12* 0.083 0.070Nematodes18S -0.071 -0.018 0.049Platyhelminthes18S -0.14** 0.19** 0.027Annelids18S 0.0098 -0.072* 0.075**

Table S2: Correlation coefficients between the spatial distribution of assemblages andabiotic variables. p-values p were computed based on 100,000 spatial randomizations.Significantcorrelationcoefficientsare indicatedby*,**,*** (𝑝 < 0.05, 𝑝 < 0.01, 𝑝 < 0.001),andadditionally by bold font when they are consistent with a hydromorphic or terra firmeinterpretation. Taxonomic groups in bold are those that can be assigned a ‘terra firme’ or‘hydromorphic’ label based on correlations to topography,wetness and slope, or an ‘exposedrock’labelbasedoncorrelationtothe‘exposedrock’bacterialassemblage(seeTableS3).

Page 198: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

194

Table S3: Correlation coefficients𝝆𝒔𝒑𝒂𝒕. between the spatial distribution of bacterialassemblages and the assemblages in other taxonomic groups. p-valuespwere computedbased on 100,000 spatial randomizations. Significant correlation coefficients are indicated by*,**,*** ( 𝑝 < 0.05, 𝑝 < 0.01, 𝑝 < 0.001 ), and correlation coefficients larger than 0.50 areindicatedbyboldfont.LabelsofassemblagesarethesameasinTableS2.

Bacteria16S

Terrafirme Hydromorphic Exposedrock

1stassemblage

-

Terrafirme

Protists18S 0.76*** -0.40*** -0.53***

Arthropods18S 0.23*** 0.12** -0.43***

Nematodes18S 0.24*** -0.19*** -0.098***

Platyhelminthes18S 0.32*** -0.10** -0.29***

Annelids18S 0.23*** 0.0046 -0.29***

2ndassemblage

-

Hydromorphic

Protists18S -0.45*** 0.51*** 0.022

Arthropods18S 0.16*** -0.21*** 0.022

Nematodes18S 0.10*** 0.16*** -0.29***

Platyhelminthes18S 0.20*** -0.13*** -0.12**

Annelids18S -0.064 0.13* -0.059*

3rdassemblage

-

Exposedrock

Protists18S -0.56*** -0.055 0.76***

Arthropods18S -0.65*** 0.12* 0.69***

Nematodes18S -0.48*** 0.045 0.56***

Platyhelminthes18S -0.48*** 0.20*** 0.38***

Annelids18S -0.18*** -0.076** 0.31***

Page 199: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

195

Table S4: Five most abundant OTUs per bacterial assemblage (out of 20,162 bacterialOTUs),foroccurrenceandabundancedata.

OTUproportion Taxonomicassignment

Occurrence-basedassemblages

Terrafirme

8.1.10-4 Acidobacteria 8.1.10-4 Acidobacteriaceae(Subgroup1)sp. 8.1.10-4 Acidobacteriaceae(Subgroup1)sp. 8.0.10-4 Acetobacteraceaesp. 8.0.10-4 unculturedHolophagasp. …

Hydromorphic

6.2.10-4 Acidothermaceaesp. 6.2.10-4 unculturedHolophagasp. 6.1.10-4 Nitrosomonadaceaesp. 5.9.10-4 unculturedHolophagasp. 5.9.10-4 Haliangiaceaesp. …

Exposedrock

6.1.10-4 RhizobialesIncertaeSedissp. 5.8.10-4 unculturedAcetobacteraceaebacterium 5.8.10-4 unculturedAcidobacteriaceaebacterium 5.7.10-4 Acidobacteriaceae(Subgroup1)sp. 5.7.10-4 Bacteria …

Abundance-basedassemblages

Terrafirme

4.0.10-2 Acidobacteria 3.0.10-2 unculturedNitrosococcussp. 2.6.10-2 unculturedBacillaceaebacterium 2.2.10-2 Acidothermaceaesp. 1.9.10-2 Alcaligenaceaesp. …

Hydromorphic

1.7.10-2 Alcaligenaceaesp. 1.6.10-2 unculturedThermosporotrichaceaebacterium 1.6.10-2 unculturedBacillaceaebacterium 1.4.10-2 Acidobacteria 1.3.10-2 Acidothermaceaesp. …

ExposedRock

3.8.10-2 Acidothermaceaesp. 1.6.10-2 Acidothermaceaesp. 1.5.10-2 unculturedNitrosococcussp. 1.0.10-2 unculturedSteroidobactersp. 1.0.10-2 Xanthobacteraceaesp. …

Page 200: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

196

Figure S1. Soil sampling over 12 ha of tropical forest. 1,131 soil samples (one every 10meters) were taken from the mineral soil horizon on a permanent plot of relativelyhomogeneous primary plateau forest at the Nouragues Ecological Research Station, FrenchGuiana.

300m400m

100m

Page 201: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

197

FigureS2.LDAappliedtoasimulateddatasetwith5assemblages,1,000MOTUs,1,131samples,and1,000 sequence readsper sample, (a,c) for the original abundancedata, and (b,d) for theoccurrencedataderivedfromthesamedataset.Panels(a,b)showthecomparisonbetweentherealization with highest likelihood and the 99 others using the spatial similarity𝑆!"#$. .𝑆!"#$. !"" = 0.98 for occurrence data, 𝑆!"#$. !"" = 0.89 for abundance data,𝐼!"#$.,!"" = 1.0 inboth cases; cf. Fig. 2. Panels (c,d) show AIC comparison between different K values, with 3realizationsperKvalue.

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●

●●●

●●●

●●●

●●

●●

0 50000 100000 150000

0.65

0.75

0.85

0.95

Spat

ial s

imila

rity

to b

est r

ealiz

atio

n

Llh difference with best realization

a

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●● ●

0 2000 4000 6000

0.6

0.7

0.8

0.9

1.0

Llh difference with best realization

b

● ●

● ● ●

2 3 4 5 6 7 8

8.8e

+06

9.4e

+06

1.0e

+07

● ●

● ● ●

● ●

● ● ●

● ● ● ●

c

Number K of assemblages

AIC

● ● ● ●

2 3 4 5 6 7 8

1220

000

1260

000

● ● ● ●

● ● ●●

● ● ●

d

Number K of assemblages

Page 202: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

198

FigureS3.StabilityofLDAdecomposition(𝑲 = 𝟑)forthedifferenttaxonomicgroups.Therealizationwithhighestlikelihoodoutof100iscomparedtothe99othersbasedontheirspatialsimilarity(y-axis)andontheirlog-likelihooddifference(x-axis),foroccurrencedataandforallthe taxonomic groupsunder study. The intercept I of the linear regression (dashedblue line)showsadifferencebetweenunicellularorganisms(𝐼 = 1.0)andmetazoans(𝐼 < 0.62).

●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●● ● ●●

●●

●●●●

●●

0.00

0.25

0.50

0.75

1.00

0 2000 4000 6000 8000Llh difference

with best realization

Spat

ial s

imila

rity

to b

est r

ealiz

atio

n

a − Bacteria●●●●●

●●●●●

●●●●

●●

●●●●●

●●●●

●●●●

●●●●●

●●

●●●

●●●

●●

●●

●●

● ●

●●

● ● ●

0.00

0.25

0.50

0.75

1.00

0 300 600 900Llh difference

with best realization

b − Protists

●●

●●●

●●

●●●

●●

●●●

●●●●

●●

●●

●●●

●●

● ●

●●

●●

0.00

0.25

0.50

0.75

1.00

0 200 400 600Llh difference

with best realization

Spat

ial s

imila

rity

to b

est r

ealiz

atio

n

c − Arthropods

● ● ●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

●●●

●●

0.00

0.25

0.50

0.75

1.00

0 100 200 300 400Llh difference

with best realization

d − Nematodes

● ●●●

●●●●●●

●●●●●●●●●

●●●●●

●●●●●●●●●

●●

●●●●

●●●●

●●

●●

●●●

●●●●●●●●●●●●●

●●●

●●●●

●●

●●●

●●●●

●●●●

0.00

0.25

0.50

0.75

1.00

0 200 400 600Llh difference

with best realization

Spat

ial s

imila

rity

to b

est r

ealiz

atio

n

e − Platyhelminthes●

●●

●●

●●

●●

●●

0.00

0.25

0.50

0.75

1.00

0 200 400 600Llh difference

with best realization

f − Annelids

Page 203: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

199

Figure S4: Spatialdistributionof eukaryotic assemblages, for𝑲 = 𝟑assemblages.SpatialdistributionoftheassemblagesobtainedfromindependentLDAdecompositionsofarthropods,nematodes,flatworms(Platyhelminthes)andannelids,foroccurrence(a-d)andabundance(e-h)data.Asinfigure4,sampledlocationsareindicatedbydarkdots,andtheassemblagemixturebetweensampleshasbeeninterpolatedusingordinarykriging.Foroccurrencedata,an‘exposedrock’ assemblage (in red) can be identified in arthropods and nematodes based on spatialcorrelationtothebacterial‘exposedrock’assemblage(TableS3).An‘exposedrock’assemblagemaybedistinguishedinflatwormsandannelidsaswellbutislessconspicuousthere.

Arthropods Nematodes

Occ

urre

nce

Platyhelminthes AnnelidsAb

unda

nce

a b c d

e f g h

Page 204: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

200

FigureS5:Rankedlog-abundancedistributionforbacteriaandprotists.

0 5000 15000

01

23

45

0 500 1000 15000

12

34

50 2000 6000 10000

01

23

45

OTUs ranked by abundance

OTUs ranked by abundance

MOTUs ranked by abundance

Abun

danc

e (lo

g 10 o

f rea

d nu

mbe

r)a - Bacteria b - Protists c - Fungi

Page 205: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

201

Figure S6:Effectofdatapre-processingonLDAdecomposition.Spatialdistributionoftheassemblages obtained from independent LDA decompositions of bacteria and protists foroccurrencedata, (a-b)withoutanyfilteringofrareOTUs,(c-d)afterremovingOTUsoccurringonly ina single sample, and (e-f) after removingOTUswith less thanone readper sampleonaverage (low-abundance OTUs). Removing single-sample OTUs brought little change to thedecomposition.Removinglow-abundanceOTUsontheotherhandyieldedverydegradedspatialpatterns in bacteria and protists, hinting at the important role of rareMOTUs in defining theretrievedassemblages.

a b

c d

e f i

Occ

urre

nce,

no

sing

le-s

ampl

e M

OTU

Occ

urre

nce,

no

low

-abu

ndan

ce

MO

TUO

ccur

renc

e

Bacteria Protists

Page 206: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Chapter3–TopicModelling

202

Figure S7: Spatial distribution of microorganism assemblages for occurrence data, for𝑲 = 𝟑 and for𝑲 = 𝑲𝐦𝐢𝐧[𝐀𝐈𝐂] assemblages (bacteria:𝐾!"# !"# = 5 ; protists:𝐾!"# !"# = 2 ).Decompositionsfor𝐾 = 𝐾!"#[!"#]andforK=3differprimarilythroughsplittingormergingofassemblages,withoutmajordisruptionofthespatialpatterns.ThisillustratestherobustnessofLDA decomposition to the number of assemblages close to the optimum. The exposed rockassemblage(darkred) is leftunchangedforKbetween2and5 inbacteriaandprotists,whichindicatesastrongtaxonomicdistinctiveness.

a b

c d f

Bacteria Protists Fungi

K =

3K m

in[A

IC]

Page 207: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

203

Discussion

Page 208: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

204

I. Synthesis

While data acquisition in ecology has long been dominated by low-technology

approachesrelyingmostlyondirecthumanobservation,thefieldhasrecentlywitnessed

a trend toward automated data acquisition. In particular, automated biodiversity

measurementscannowbeobtainedthroughthesequencingofenvironmentalDNA.This

method has originated in microbiology, where it is often the only means to obtain

informationontheorganismsunderstudy,butcannowbeappliedwithincreasingease

toanytypeoforganism.Thisprovidesecologistswithanunprecedentedinfluxofdata,

whichalsocreatesnewchallenges.

WhileDNA-baseddataareauniquemeanstoobtainexhaustiveandstandardized

biodiversitymeasurements,itremainsuncertaintowhatextenttheywillhelpsolvethe

classical questions of community ecology. Indeed, these data are obtained through

indirect observation of the targeted organisms, and lack in detail and accuracy

comparedtodirectobservations:inasense,qualityistradedforquantity.Thisentailsa

shiftfromstudiesrichinbiologicaldetailstowardthestudyofstructureandpatternsin

largedatasets.Moreover, thesheeramountofdataproduced is in itselfanobstacle to

theuseoftheclassicalstatisticalapproachesofecology.Conversely,currenttheoretical

modelsinecologyareoftennotwellsuitedforcomparisonwithdata.

The characteristics of environmental DNA data make them well suited to the

studyofintegrativepatternsofbiodiversity,forwhichthequantityandexhaustiveness

ofavailabledatamattermorethandetailed informationonindividualtaxa. Integrative

patterns have long been a key source of information for addressing one of the core

questionsof communityecology:whatare thedriversof communityassembly, and in

particular, when do dispersal limitation and demographic drift supersede abiotic

filteringandspeciesinteractionsasthemaindrivers?Thefirstandthirdchaptersofthis

thesis explore the use of environmental DNA data for the study of spatially explicit

biodiversitypatterns,whilethesecondchapterfocusesonrelativespeciesabundances.

ThesepatternsarestudiedinthetropicalforestofFrenchGuiana,a‘hyperdiverse’and

Page 209: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

205

poorlyknownecosystem;twocharacteristicsthatmakeautomateddatacollectionmost

needed.

ThefirstchaptershowshowenvironmentalDNAdatacanbeusedtoinvestigate

thedriversofbetadiversityinaspatiallyexplicitcontext,asithasbeendonepreviously

for classical data such as tree censuses in monitored forest plots. On a spatial scale

rangingfrom40mto140kmbetweensamplingpoints,adecayoftaxonomicsimilarity

withdistanceisobservedinmostgroups,i.e.plants,fungi,arthropods,insects,annelids,

bacteria, and protists, but not in nematodes and flatworms. Clear differences can be

observed between domains of life regarding the relative influence of geographic

distanceandabioticconditionsonbetadiversity:thedatahintatapredominanteffectof

dispersal limitation in plants and annelids, a predominant effect of abiotic filtering in

bacteriaandprotists,andamixtureofboth in fungi,arthropodsand insects.Thebeta

diversity of fungi and soil insects appears to be especially high. These findings are in

agreementwithexpectationsandpreviousempirical results forplants andunicellular

organisms (Condit et al., 2002; Soininen et al., 2007; Ramirez et al., 2014), but bring

some novel insight for annelids, fungi and insects. In addition, the inclusion of a few

forest plots subject to past logging activities indicates that even after two decades at

least,aneffectcanbedetectedonplantandannelidcomposition,aswellasonfungitoa

lesserextent,whereas it isnot thecase forothergroups.Thus, large-scalepatternsof

biodiversitycannowbereadilymeasuredandcomparedacrossatropicalforest’swhole

rangeoftaxausingenvironmentalDNA.

The second chapter focuses on relative species abundances, a pattern that has

been extensively used to test the predictions of theoretical models of community

assembly,especiallysinceHubbell’sworkontheneutraltheoryofbiodiversity(Hubbell,

2001). A major obstacle in exploiting species abundance patterns generated using

environmentalDNAisthatabundanceinformationisunreliable,becauseitisnoisyand

difficulttointerpret.However,simulationsshowthatevenifabundancemeasurements

areunreliable for individual taxa, valuable information can still be retrieved from the

species abundance distribution as a whole, as long as the noise is not too strong. In

particular, the parameters that characterize diversity and connectivity in a neutral

communitymaystillbereliablyestimated.Thanktothesampling-invariancepropertyof

neutralmodels,sequencingreadsmaybeusedasdiscreteabundanceunits inplaceof

Page 210: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

206

individualsaslongastheDNAoriginatesfromanumberofindividualslargerthanthe

numberofreads.Whileitisusuallythecaseformicroorganisms,thisconditionmaynot

beverifiedforlargerorganisms.Lastly,greatcareshouldbetakeninclusteringspurious

OTUsgeneratedduringPCRamplificationandDNAsequencing,sincetheystronglybias

neutralparameterestimates.

When the spatial distribution of species is shaped at least partly by niche

processes or by limiteddispersal, the structure of spatially distributed environmental

DNAdatashouldbemarkedbytheseprocesses.However,thissignalmaybefaintand

complex. Moreover, it is usually obscured by a large number of rare species and an

uneven sampling effort across samples. The third chapter shows how a categorical

mixturemodelsimilartosomeofthemodelsusedinmicrobiology,populationgenetics

ortextdocumentmodelling,LatentDirichletAllocation,canbeusedtoretrievespatial

patternsinaregularly-sampled12-haforestplot.Unliketheclassicalpattern-detection

tools of community ecology, such as simple ordination and clustering algorithms, this

model is designed to accommodate discrete abundance data in a large number of

unevenly sized samples, and performswell on large and sparse communitymatrices.

Even though the fitted model parameters may depend on the initialization of the

inference algorithm, this uncertainty can be quantified by measuring the similarity

between the outputs of different runs. The stability of the output across initial

conditions may even be used as an empirical measure of how strong the spatial

structureis.

In the 12-ha forest plot, the strongest structure is detected for bacteria and

protists.Moreover,thespatialpatternsofthesetwogroupsareverysimilar,andmatch

the topography of the forest plot. This is in agreement with the findings of the first

chapter,sinceabioticfilteringwasfoundtheretostronglyinfluencethebetadiversityof

these groups. In contrast, spatial structure in arthropods and annelids isweak,which

indicatesthatthespatialscaleandthelevelofenvironmentalheterogeneityina12-ha

plot are insufficient todetect theprocesses thatwere found toacton thesegroupsat

largerspatialscales.

Overall, we conclude that environmental DNA data can offer a uniquely

comprehensive, if somewhat crude, perspective on community structure in a complex

Page 211: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

207

and species-rich ecosystem. In addition to the classical tools of community ecology,

model-basedstatisticalmethodscanbeborrowedfromfieldsmoreaccustomedtolarge

and complex datasets, and put to good use to take full advantage of these data. The

development of ecology into a data-rich field should foster the development of

theoreticalmodels thatcanbecomparedtodatausingrigorousstatisticalapproaches,

following the example of Hubbell’s neutral model and its subsequent theoretical

developments (Etienne, 2005; Harris et al., 2015). Building on generative models

stemming frommachine learning, such as Latent Dirichlet Allocation, is one possible

avenueforthedevelopmentofsuchmodels,asdiscussedinthefollowing.

Page 212: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

208

II. Perspectives

Aquaticcommunities 1.

Thisthesisaimedatexploringgeneralapproachesfortheanalysisandinterpretationof

large biodiversity datasets, with the underlying goal of understanding community

assembly processes from biodiversity patterns. Nevertheless, it mostly focuses on

communityassemblyinlandecosystems,especiallyasstudiedthroughtheamplification

andsequencingofDNAextractedfromsoilsamples.Onemayfollowasimilarapproach

forstudyingcommunitiesofaquaticorganismsbyextractingDNAfromwatersamples.

Experimentally, themethodconsists in filteringwater throughameshsoas to collect

smalllivingorganismsaswellasfragmentsorsloughedmaterialfromlargerorganisms.

In particular, this approach allows for the study of planktonic microorganisms (i.e.,

suspended in thewater column and passively transported bywatermovements), the

knowledge ofwhich is so far very fragmented, despite them forming the basis of the

ocean’s foodwebandbeing responsible for theproductionof half of the atmospheric

dioxygen(Fieldetal.,1998).

The Tara project is an unprecedented and on-going effort to sample marine

planktonic communities in various locations spread across the world’s oceans (de

Vargas etal., 2015). Samplingwas conducted chiefly in the open ocean from2009 to

2012, withmore recent campaigns focusing onmore specific habitats. Samples were

collectedatdifferentdepths,andusingdifferentmeshsizessoastoassignthesampled

organismstodifferentsizeranges.TheLatentDirichletAllocationapproachofthethird

chapteriscurrentlybeingappliedtothisdataset,soastounderstandthebiogeography

andcommunitystructureofplanktoniceukaryotesacrosstheworld’socean.

Page 213: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

209

Figure1:Biogeographicpatternsinoceanicplanktonpredictedbyaneutralagent-basedmodel.Transportbyoceaniccurrentwassimulatedduring1,400yearswithconstantmutationrate starting from a single genome. Biogeographic regions are distinguished based on theirdominantOTUs,definingOTUsat either (top)99.9%similarityor (bottom)99.5%similarity.AdaptedfromHellwegeretal.(2014).

However, it is unclear what a suitable neutral model would be for planktonic

communities. Indeed, unlike land organisms, planktonic organisms do not actively

disperse.Instead,thelocalcommunityistransportedovertimealongoceaniccurrents,

and slowlymixeswith surrounding communities along theway.Hubbell’smodel of a

localcommunityunderconstantimmigrationflowcouldberegardedasasuitablemodel

for a planktonic community followed through time along an oceanic current

(‘Lagrangian’ perspective). However, whether several simultaneously sampled

communitiescanbeconsideredas independentandundergoing immigration fromthe

same metacommunity depends on their positions relative to oceanic currents.

We first consider the diversity in the model,which is not a realistic estimate of the actualdiversity in the surface ocean microbe popula-tion because we simulate super-individuals, butit illustrates the behavior of model. When themodel starts with a diverse population (that is,each cell has a different genome) and no muta-tion, diversity decreases monotonically as OTUsare lost by extinction andnot gained bymutation(Fig. 1A). After ~100 years, the population con-sists of ~10 OTUs, resident in relatively distinctspatial regions, and the rate of OTU loss becomeslimited by dispersal between these provinces (28).At that time, the model starts to predict higherOTU richness than a neutral theory model thatdoes not consider dispersal limitation (31). Therate of OTU loss becomes low, but the popu-lations continue to mix (28) and the probabilityof extinction remains greater than zero. At100,000 years, the model includes two OTUs inthe Southern Ocean and everywhere else. Themodel should eventually reduce to one OTU,although this is not realized in the 100,000-yearsimulation.When the model is initialized with a diverse

population and includes mutation, it also ex-hibits an initial rapid loss in diversity but then

levels off at an OTU richness slightly higher thanthe simulation without mutation (Fig. 1B). Forthese simulations, we determined the diversityfrom a sample of the population (100 cells) by per-forming pairwise whole-genome BLAST (BasicLocal Alignment Search Tool) alignment, iden-tifying OTUs using 99.9% whole-genome iden-tity cutoff and then up-scaling to the true richness(in the model) using Chao1, a nonparametric spe-cies estimator that extrapolates from the sampledata to “true” richness (see supplementary mate-rials and methods). The OTU richness is variableover time because of stochastic transport andsampling (see fig. S4), but that is identical forall simulations. Therefore, the difference in OTUrichness between the simulations with and with-out mutation can be attributed solely to muta-tion. The difference is relatively small but increaseswith higher taxonomic resolution (99.95% cut-off) or mutation rate (×3). For a simulation start-ing with a uniform population (all cells havethe same genome) and including mutation, theOTU richness starts at one and then increasesonce sufficient mutations accumulate to exceedthe OTU threshold. After ~200 years, the simu-lations starting diverse and uniform converge. Atthat time, model has reached a dynamic steady

state where the rate of OTU loss by extinction isbalanced by the rate of OTU gain by mutation.From a practical perspective, this shows thatspecifying different initial conditions or runningthe model any longer would not change thediversity.The model is then used to explore the role

of neutral evolution in producing biogeographicpatterns. As an example, we compared the ge-nomes of cells fromHawaii and the Gulf of Alaska(Fig. 2B). For the simulation starting diversewithout mutation, the difference (nucleotide di-vergence) is 100% until ~700 years when it ab-ruptly decreases to 0% (Fig. 2A). This is causedbya takeover of the Central Pacific province by a cellfrom the North Pacific province or a coalescenceof these two subpopulations (27) (see also movieS1 around 700 years). The simulation starting di-verse with mutation also starts at 100% anddecreases at the coalescence event, but thenincreases again as the two subpopulations di-verge. The simulation starting uniform initiallyhas 0% difference but immediately starts toincrease and then converges with the simulationstarting diverse and including mutation. Coales-cence events are stochastic (see also fig. S4B),and we observed two such events over the 1500-year simulation period. There are also occasionalabrupt drops in nucleotide divergence, which aredue to vagrant cells that enter a province but donot establish. The magnitude of nucleotide di-vergence is a function of the growth and muta-tion rates (figs. S5 and S6).The time between coalescence events puts a

limit to how much two provinces can diverge,and in this case, the model predicts up to 0.5%difference (99.5% identity). These results can berelated to observations. If two cells are sampledfrom these two locations and their genomes aresequenced and compared, 0 to 0.5% of the ob-served difference can be attributed to neutral pro-cesses. This level of divergence is substantial butconsiderably lower than what is commonly con-sidered a species (>95% identity). We map outthe biogeographic pattern produced by neutralevolution for Hawaii compared with all locationsacross the globe (Fig. 2B). The model predictsthat nucleotide divergence generally increaseswith distance from Hawaii. However, the diver-gence is larger for the North Pacific than theIndian Ocean, so distance and/or the presenceof landmasses are not necessarily good proxiesof dispersal barriers. We also compile this in-formation for all locations into an atlas of neu-tral biogeography (table S1).We mapped out the biogeographic pattern

produced by neutral evolution using fragmentrecruitment (7), which is akin to in silico DNAhybridization. Specifically, we took the single-cell genomes (SCGs) of the OTUs that were main-tained in the simulation starting uniform at 1400years (see Fig. 1B) and recruited fragments col-lected on a 10°-by-10° grid. We assigned each gridbox to the highest-recruiting SCG (i.e., the dom-inant OTU) and colored them accordingly, il-lustrating the provinces produced by neutralevolution andmaintained by dispersal limitation

1348 12 SEPTEMBER 2014 • VOL 345 ISSUE 6202 sciencemag.org SCIENCE

Fig. 3. Biogeographic patterns (OTU provinces) in global surface ocean microbes predicted by aneutral agent-based model and quantified by metagenomics fragment recruitment. Alignment offragments collected on a 10°-by-10° grid (number of samples n = 10,000 at each box, fragment length l =1000 base pairs) with SCGs from OTUs remaining at 1400 years (see Fig. 1B). (Top) 99.9% and(bottom) 99.5% BLAST identity. “Start Uniform” simulation, where all initial cells have the same,completely random genome, is shown. The colors demarcate areas with common dominant OTUs.

RESEARCH | REPORTS

Page 214: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

210

Simulations of the transport of plankton by currents between stations could help

measure their level of connectivity (seeFig. 1; Followsetal., 2007;Wardetal., 2012;

Hellwegeretal.,2014),andserveasthebasisforinference-orientedmodellingefforts.

Topicmodellingofbiodiversitydata2.

AsdiscussedinthesectionIII.5oftheIntroduction,LatentDirichletAllocationisavery

versatilemethod,thathasbeenemployedinavarietyofcontextsfarbeyonditsoriginal

intendeduseasa ‘naturallanguageprocessing’method.Itcouldbecomearoutinetool

fortheanalysisofenvironmentalDNAdata,astheverysimilarStructuresoftwarehas

become in population genetics. While the third chapter focuses on the analysis of

spatially distributed samples, LDA could prove equally useful for the analysis of time

series,orwhenbothaspatialandatemporaldimensionsarepresent,as inValleetal.

(2014).Itcouldalsobeusedtoanalysesamplesthatareneitherspatiallynortemporally

distributed. This is for instance often the case of humanmicrobiome data, which are

currently collected in large quantities, and the interpretation of which is an active

domain of research inmedical sciences (Huttenhower etal., 2012). Another potential

application is the analysis of the bacterial communities found in sewage plants, the

understanding of which is of critical importance for the optimization of wastewater

treatment(Ofiteruetal.,2010).

Theuseofgenerativemixturemodelsisnotnewinmicrobiology:thesemethods

have been first introduced to the field with the works of Knights et al. (2011) and

Holmes et al. (2012). However, probably because ecology and microbiology are still

relatively separate scientific fields, andbecause theuseof environmentalDNAdata is

more recent in ecology, generative mixture models have been little used so far in

ecology, except for the effort of Valle et al. (2014) on classical tree census data.

Furthermore, focus in microbiology appears to have beenmostly on models without

admixture (i.e.,where samplesbelong to a single assemblage, cf. Introduction), unlike

topic models. While LDA is one of the simplest topic models (along with the earlier

Probabilistic Latent Semantic Analysis model, or PLSA; Hofmann, 2001), many

extensions have been developed for the analysis of text documents since its original

Page 215: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

211

introduction. The adaptation of these methods to bioinformatics, e.g. for the

classification of DNA sequences or the identification of protein function, has been

extensivelyexplored(Liuetal.,2016).Ecology,andmicrobiology,wouldbenefitfroma

similareffortorientedtowardbiodiversitydata.Inthefollowing,Ireviewafewpossible

examples.

Figure2:Terrestrial biogeographicunits of theworld inferred from thedistributionof21,037speciesofamphibians,birdsandmammals. InferencewasperformedusingUPGMAhierarchical clustering on phylogenetic dissimilarity. Thick lines denote main biogeographicboundaries (separating ‘realms’) and dotted lines denote minor ones (separating ‘regions’).AdaptedfromHoltetal.(2013).

First,theapproachpresentedinthethirdchaptercanbeusedirrespectiveofthe

spatial scale at which the data are collected, andmay for instance be applied to the

definitionofbiogeographicunits.AsidefromthesequencingofenvironmentalDNA,the

development of DNA sequencing methods now allows for efficiently and accurately

assigningtoataxonanycollectedbiologicalmaterial,onceasuitablereferencedatabase

hasbeenestablished.Thus,abetterusecanbemadeofthelargenumberofspecimens

eithercollectedinthefieldorstoredinmuseumcollections,andtheresultingdatamay

be used to study biogeographic patterns in a data-driven way. In recent years,

alternatives to the classical hierarchical clustering approach (followed for instance in

Holt et al., 2013; see Fig. 2) have been sought to address this problem (Vilhena &

Antonelli,2015;Bloomfieldetal.,2017).TheAppendixillustratesthepotentialofLDAin

(3). Using existing knowledge of his time (6),mostly on the distributions and taxonomic rela-tionships of broadly defined vertebrate families,Wallace divided the world into six terrestrialzoogeographic units largely delineated by what wenow know as the continental plates. Despite rely-ing on limited information and lacking a statisticalbasis, Wallace’s original map is still in use today.

Wallace’s original zoogeographic regional-ization scheme considered ancestral relationshipsamong species, but subsequent schemes generallyused data only on the contemporary distribu-tions of species without explicitly consideringphylogenetic relationships (7–9). Phylogenetictrees contain essential information on the evolu-tionary relationships of species and have be-come increasingly available in recent decades,permitting the delineation of biogeographic re-gions as originally envisioned by Wallace. The

opportunity now exists to use phylogenetic in-formation for grouping assemblages of speciesinto biogeographic units on a global scale. In ad-dition to permitting a sound delimitation of bio-geographic regions, phylogenetic informationallows quantifying phylogenetic affinities amongregions (e.g., 10). Newly developed statisticalframeworks facilitate the transparent character-ization of large biogeographic data sets while min-imizing the need for subjective decisions (11).

Here, we delineated the terrestrial zoogeo-graphic realms and regions of the world (12) byintegrating data on the global distributions andphylogenetic relationships of the world’s am-phibians (6110 species), nonpelagic birds (10,074species), and nonmarine mammals (4853 species),a total of 21,037 vertebrates species [see (13) fordetails]. Pairwise phylogenetic beta diversity (pb)metrics were used to quantify change in phyloge-netic composition among species assemblagesacross the globe. Analyses of combined taxa pbvalues identified a total of 20 zoogeographic re-gions, nested within 11 larger realms, and quan-tified phylogenetic relatedness among all pairs ofrealms and regions (Fig. 1, figs. S1 and S2, andtables S1 and S2). We also used pb to quantifythe uniqueness of regions, with the Australian(mean pb = 0.68),Madagascan (mean pb = 0.63),and South American (mean pb = 0.61) regionsbeing the most phylogenetically distinct assem-blages of vertebrates (Fig. 2). These evolutiona-rily unique regions harbor radiations of speciesfrom several clades that are either restricted to agiven region or found in only a few regions.

Our combined taxa map (Fig. 1) contrastswith some previously published global zoogeo-graphic maps derived exclusively from data onthe distribution of vertebrate species (8, 9, 11).The key discrepancy between our classification

of zoogeographic regions and these previousclassifications is the lack of support for previ-ous Palearctic boundaries, which restricted thisbiogeographic region to the higher latitudes ofthe Eastern Hemisphere. The regions of centraland eastern Siberia are phylogenetically moresimilar to the arctic parts of the Nearctic region,as traditionally defined, than to other parts ofthe Palearctic (fig. S2). As a result, our newlydefined Palearctic realm extends across thearctic and into the northern part of the WesternHemisphere (Fig. 1 and fig. S1). These resultsbear similarities with the zoogeographic map of(11) derived from data on the global distributionof mammal families. In addition, our results sug-gest that the Saharo-Arabian realm is interme-diate between theAfrotropical and Sino-Japaneserealms [see the nonmetric multidimensional scaling(NMDS) plot in fig. S2]. Finally, we newly definethe Panamanian, Sino-Japanese, and Oceanianrealms [but see the Oceanian realm of Udvardyin (14) derived from data on plants].

Our classification of vertebrate assemblagesinto zoogeographic units exhibits some interest-ing similarities with Wallace’s original classi-fication, as well as some important differences(fig. S3). For example, Wallace classified islandseast of Borneo and Bali in his Australian region(fig. S3), which is analogous to our Oceanian andAustralian realms combined (Fig. 1 and fig. S1).In contrast, we find that at least some of theseislands (e.g., Sulawesi) belong to our Orientalrealm, which spans Sundaland, Indochina, andIndia (Fig. 1 and fig. S1). Moreover, our Ocean-ian realm is separate from theAustralian realm andincludes New Guinea together with the PacificIslands (14), whereas Wallace lumped thesetwo biogeographic units into the Australian re-gion. Wallace further argued that the Makassar

1Center for Macroecology, Evolution, and Climate, Depart-ment of Biology, University of Copenhagen, 2100 CopenhagenØ, Denmark. 2Biodiversity and Climate Research Centre (BiK-F)and Senckenberg Gesellschaft für Naturforschung, Sencken-berganlage 25, 60325 Frankfurt, Germany. 3Department of Bio-geography and Global Change, National Museum of NaturalSciences, Consejo Superior de Investigaciones Científicas, Calle deJosé Gutiérrez Abascal, 2, 28006 Madrid, Spain. 4Centro deInvestigação em Biodiversidade e Recursos Genéticos, Universi-dade de Évora, Largo dos Colegiais, 7000 Évora, Portugal.5Center forMacroecology, Evolution, and Climate, Natural HistoryMuseum of Denmark, University of Copenhagen, 2100CopenhagenØ,Denmark. 6Department of Ecology andEvolution,Stony Brook University, Stony Brook, NY 11794–5245, USA.7Department of Vertebrate Zoology, MRC-116, National Mu-seum of Natural History, Smithsonian Institution, Post OfficeBox 37012, Washington, DC 20013–7012, USA. 8BiodiversityResearch Group, School of Geography and the Environment,Oxford University Centre for the Environment, South Parks Road,Oxford OX1 3QY, UK.

*These authors contributed equally to this work.†To whom correspondence should be addressed. E-mail:[email protected]

Fig. 1. Map of the terrestrial zoogeographic realms and regions of the world.Zoogeographic realms and regions are the product of analytical clustering ofphylogenetic turnover of assemblages of species, including 21,037 species ofamphibians, nonpelagic birds, and nonmarine mammals worldwide. Dashedlines delineate the 20 zoogeographic regions identified in this study. Thick

lines group these regions into 11 broad-scale realms, which are named. Colordifferences depict the amount of phylogenetic turnover among realms. (For moredetails on relationships among realms, see the dendrogram and NMDS plot infig. S1.) Dotted regions have no species records, and Antarctica is not included inthe analyses.

www.sciencemag.org SCIENCE VOL 339 4 JANUARY 2013 75

REPORTS

on

Aug

ust 2

9, 2

016

http

://sc

ienc

e.sc

ienc

emag

.org

/D

ownl

oade

d fr

om

Page 216: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

212

this respect, by applying it to a large dataset of Amazonian frogs assembledwith the

helpofDNAidentification.LDAprovedinthiscaseanefficientwaytoinfertheoptimal

number of biogeographic units, assess the strength of the underlying signal, and

distinguishbetweensharpanddiffuseboundariesbetweenbiogeographicunits.

Nevertheless, unlike distance-based clustering, LDA does not allow for taking

phylogenetic information intoaccount,and improvingon thisaspectcouldbeauseful

research avenue. This problem could be for instance approached through the

Generalized Polya urn LDA model (Mimno et al., 2011). Furthermore, despite its

shortcomings, distance-based hierarchical clustering is appreciated by ecologists

because it provides additional hindsight on the relationship between the different

samples,andbecauseitoffersthepossibilityofchoosingthenumberofclustersbased

on the hierarchical tree. The hierarchical LDAmodel (or hLDA; Griffiths etal., 2004),

whichdescribesahierarchyofnestedtopics,couldbeappealinginthisrespect.

Second, the study of interactions between taxa constitutes a central interest of

communityecology.LargeenvironmentalDNAdatasetsprovideindirectinformationon

potential interactions between taxa through the co-occurrence of OTUs and the

covarianceof theirabundances (seeFig.3;Faust&Raes,2012).LDAassemblagesare

inferred based on this information, and thus reflect the presence of potential

interactions within each assemblage. Nevertheless, the application of LDA

decomposition separately todifferent taxonomic groups, asdone in the third chapter,

does not provide any information on the possible interactions between these groups.

Conversely, when LDA is applied to the whole dataset, it is not possible to explicitly

distinguish between subgroups of preferentially interacting taxa, such as plants and

fungi for instance. This shortcoming could be addressed by using the ‘author-topic

model’,anextensionofLDAaimingataccountingforsample‘metadata’,suchasauthors

inatextdocument(Rosen-Zvietal.,2004,2010).ThismodelisidenticaltoLDAexcept

thateachdocument(orsample)isnotdirectlycharacterizedbyamixtureoftopics(or

assemblages), but by its authors, to each ofwhich is assigned amixture of topics. In

practice, authorsmaybeanydiscrete labels, andcould for instancecorrespond to the

oneor few treespeciessurroundingeachsoil sample ifone is interested in tree-fungi

interactions.Themethodwouldthusyieldamixtureoffungiassemblagesforeachtree

species, and indirectly a mixture of assemblages for each sample based on the tree

Page 217: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

213

speciessurroundingit.Asimplerversionofthismodelmightalsobeconsideredwherea

singleassemblagecharacterizeseachtreespecies.

Figure3:Occurrence-based inferenceof interactionsbetweenprokaryoticOTUs fromaglobaldataset(Chaffronetal.,2010).EachnoderepresentsanOTU,andedgesbetweennodesrepresent significant associations based on co-occurrence. Edge thickness increases withsignificance.AdaptedfromFaust&Raes(2012).

Moregenerally,ecologicalstudiesoftendonotlimitthemselvestoexploringthe

structure of a single type of data, but attempt at uncovering statistical relationships

betweendifferent typesofdata,suchas taxonomicandenvironmentaldata.While the

author-topicmodelonlyallowsforaddingdiscretelabelstoeachsample,othermodels

such as Dirichlet-multinomial Regression (Mimno & McCallum, 2012) can also

accommodate continuous attributes, and could be used to account for environmental

measurements in themodel. The goodness-of-fit of themodelwithout environmental

Rule: if Bacteroides (OTU 4612) and Lachnospiraceae incertae

sedis (OTU b1228) are present, then Faecalibacterium (OTU b418) is also present

Colour code

Bacterial phyla

AcidobacteriaActinobacteriaAquificaeBacteroidetesChloroflexiFirmicutesGemmatimonadetesNitrospiraePlanctomycetesProteobacteriaSpirochaetesThermodesulfobacteriaTM7Verrucomicrobia

Archaeal phyla

CrenarchaeotaEuryarchaeota

OTUs above phylum level are white

Node fill colour code

Bacteroides

Clostridia unclassifiedFaecalibacterium

LachnospiraceaeLachnospiraceae unclassified and incertae sedis

Roseburia

Ruminococcaceae unclassifiedRuminococcus

Subdoligranulum

Node border colours distinguish between different OTUs

REVIEWS

542 | AUGUST 2012 | VOLUME 10 www.nature.com/reviews/micro

© 2012 Macmillan Publishers Limited. All rights reserved

Page 218: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

214

datacouldthenbecomparedtothatofthemodelaccountingforthesedata,ideallyusing

AIC.Thiswouldentail a slightlydifferentuseof topicmodelling than thatof the third

chapter: namely, shifting the focus from the exploration of data structure toward

hypothesistesting.However,bothapproacheshavetheirownmerits.

Finally,theapplicationoftopicmodellingtoecologyneedsnotbelimitedtotaxa

abundanceandoccurrencedata. It could for instancebeextended toexonsequencing

data describing functional types, or to RNA sequencing data characterizing gene

expression. Moreover, aside from the major technological revolution that is high-

throughputDNAsequencing,otherpromisingtechnologiesarecurrentlybeingadapted

toautomateddatacollectioninecology,notablyLidarandhyperspectralimaging.Topic

modellinghasbeensuccessfullyusedtoretrievepatternsfromimages(Luoetal.,2015),

and could possibly also find application in the analysis of remote-sensing ecological

data.

Statisticalversusmechanisticmodelling3.

Topicmodelling is but one ofmany competingbranches ofmachine learning that are

currentlyactivelydevelopedtoexploittheever-increasingamountofdataproducedby

currenttechnologies(Bishop,2006).Overtherecentyears,somebranchesofmachine

learninghavebecomeparticularlyprominent,especiallymulti-layeredneuralnetworks

underthenameof‘deeplearning’(LeCunetal.,2015).Suchmethodsareindeedefficient

atdetectingstructureinlargedatasets,andhavebeenrecentlyappliedtobioinformatics

problems such as DNA sequence classification (Rizzo et al., 2016). However, these

methods are not based on an easily interpretable model. As such, they can only be

fruitfully applied to supervised learning tasks, i.e. to situations where correct and

incorrectresultscanbetoldapartapriori,whicharemoretypicalofengineeringthan

basicscience.

In contrast, topic models have a mathematical structure that is similar to the

multivariate formulationofneutralmodels, asdiscussed inHarrisetal. (2015)and in

thethirdpartoftheIntroduction.Thisparallelcouldbeexploitedtobuildmixedmodels

Page 219: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

215

combiningtheadvantagesofbothtypesofmodels.Forinstance,alocalcommunity,such

as an island,may receive immigrating individuals from different source communities

thathavedistinct(known)taxonomiccompositions.Byassuminganeutraldynamicsin

the local community, and modelling the origin of immigrating individuals by a topic

model, one could possibly infer from the local taxonomic composition the relative

contribution of the different source communities. Conversely, starting from a topic

model as in the third chapter, one could assume that a neutral dynamics takes place

withineachassemblage.

While topic and neutral models may seem to be of different nature, the

distinction between statistical and mechanistic models is more tenuous than may

appear at first glance. The first topic modelling papers mentioned mechanistic

arguments to justify theirmodels, arguing than theymirrored theway humanswrite

text documents, and some subsequent developments try to better account for the

structure of natural language (Wallach, 2006). When applied to ecological data, the

assumptionthatlocalcommunitiesareamixtureofseveralassemblagesofco-occurring

taxa constitutes a genuine biological hypothesis. Conversely, the realism of the

hypothesesinHubbell’sneutralmodelhasbeenmuchdebated(Rosindelletal.,2012),

and onemight argue that itsmost valuable hindsight is on the nature of the species

abundance distribution pattern itself: namely, thatmost empirical species abundance

distributions can be approximately decomposed into orthogonal diversity and

connectivitycomponents,irrespectiveoftheirexactmechanisticinterpretation(Jabotet

al.,2008).

While a very flexiblemodel is undesirablewhen one aims at testingmodelling

hypotheses on data, it becomes an advantage when one aims at characterizing the

systemathandthrougha limitednumberofrelevantparameters.This isoftenamore

realistic prospect when faced with large datasets resulting from automated data

collection. However, as illustrated by the case of Hubbell’s neutral model, relevant

parameters cannotbedeterminedwithout anunderstandingof thebasicprocesses at

play. Moreover, characterizing a system is of little use if this does not entail the

possibility of predictions and generalization. A right balance is thus to find between

flexibilityandfalsifiabilityinbuildingmodelsfortheanalysisoflargedatasets,andthe

Page 220: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

216

shift toward inference-oriented models should not preclude building them on first

principles(Marquetetal.,2014).

Page 221: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

217

References

Bishop, C.M. (2006) Pattern Recognition andMachine Learning, Springer. Michael Jordan, JonKleinberg,BernhardSchölkopf.

Bloomfield, N.J., Knerr, N. & Encinas-Viso, F. (2017) A comparison of network and clusteringmethodstodetectbiogeographicalregions.Ecography.

Chaffron,S.,Rehrauer,H.,Pernthaler,J.&vonMering,C.(2010)Aglobalnetworkofcoexistingmicrobes fromenvironmentalandwhole-genomesequencedata.GenomeResearch,20,947–959.

Condit, R., Pitman, N., Leigh, E.G., Chave, J., Terborgh, J., Foster, R.B., Nunez, P., Aguilar, S.,Valencia,R.,Villa,G.,Muller-Landau,H.C.,Losos,E.&Hubbell,S.P.(2002)Beta-diversityintropicalforesttrees.Science,295,666–669.

Etienne,R.S. (2005)Anew sampling formula forneutral biodiversity.EcologyLetters,8, 253–260.

Faust, K. & Raes, J. (2012) Microbial interactions: from networks to models. Nature ReviewsMicrobiology,10,538–550.

Field, C.B., Behrenfeld,M.J., Randerson, J.T. & Falkowski, P. (1998) Primary production of thebiosphere:integratingterrestrialandoceaniccomponents.Science,281,237–240.

Follows, M.J., Dutkiewicz, S., Grant, S. & Chisholm, S.W. (2007) Emergent Biogeography ofMicrobialCommunitiesinaModelOcean.Science,315,1843.

Griffiths,T.L.,Jordan,M.I.,Tenenbaum,J.B.&Blei,D.M.(2004)Hierarchicaltopicmodelsandthenestedchineserestaurantprocess.Advancesinneuralinformationprocessingsystems,pp.17–24.

Harris,K.,Parsons,T.L.,Ijaz,U.Z.,Lahti,L.,Holmes,I.&Quince,C.(2015)Linkingstatisticalandecological theory: Hubbell’s Unified Neutral Theory of Biodiversity as a HierarchicalDirichletProcess.Proc.IEEE,PP,1–14.

Hellweger,F.L.,vanSebille,E.&Fredrick,N.D.(2014)Biogeographicpatternsinoceanmicrobesemergeinaneutralagent-basedmodel.Science.

Hofmann, T. (2001) Unsupervised learning by probabilistic latent semantic analysis.Machinelearning,42,177–196.

Holmes,I.,Harris,K.&Quince,C.(2012)DirichletMultinomialMixtures:GenerativeModelsforMicrobialMetagenomics.PlosOne,7.

Holt,B.,Lessard,J.P.,Borregaard,M.K.,Fritz,S.A.,Araujo,M.B.,Dimitrov,D.,Fabre,P.H.,Graham,C.H.,Graves,G.R.,Jonsson,K.A.,Nogues-Bravo,D.,Wang,Z.H.,Whittaker,R.J.,Fjeldsa,J.&Rahbek,C.(2013)AnUpdateofWallace’sZoogeographicRegionsoftheWorld.Science,339,74–78.

Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32),PrincetonUniversityPress.

Huttenhower,C.,Gevers,D.,Knight,R.,Abubucker,S.,Badger, J.H.,Chinwalla,A.T.,Creasy,H.H.,Earl, A.M., FitzGerald, M.G., Fulton, R.S., Giglio, M.G., Hallsworth-Pepin, K., Lobos, E.A.,Madupu,R.,Magrini,V.,Martin,J.C.,Mitreva,M.,Muzny,D.M.,Sodergren,E.J.,Versalovic,J.,Wollam,A.M.,Worley,K.C.,Wortman,J.R.,Young,S.K.,Zeng,Q.,Aagaard,K.M.,Abolude,O.O.,Allen-Vercoe,E.,Alm,E.J.,Alvarado,L.,Andersen,G.L.,Anderson,S.,Appelbaum,E.,Arachchi, H.M., Armitage, G., Arze, C.A., Ayvaz, T., Baker, C.C., Begg, L., Belachew, T.,Bhonagiri, V., Bihan, M., Blaser, M.J., Bloom, T., Bonazzi, V., Paul Brooks, J., Buck, G.A.,

Page 222: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

218

Buhay,C.J.,Busam,D.A.,Campbell, J.L.,Canon,S.R.,Cantarel,B.L.,Chain,P.S.G.,Chen, I.-M.A., Chen, L., Chhibba, S., Chu, K., Ciulla, D.M., Clemente, J.C., Clifton, S.W., Conlan, S.,Crabtree,J.,Cutting,M.A.,Davidovics,N.J.,Davis,C.C.,DeSantis,T.Z.,Deal,C.,Delehaunty,K.D.,Dewhirst,F.E.,Deych,E.,Ding,Y.,Dooling,D.J.,Dugan,S.P.,MichaelDunne,W.,ScottDurkin,A., Edgar,R.C., Erlich,R.L., Farmer, C.N., Farrell, R.M., Faust,K., Feldgarden,M.,Felix, V.M., Fisher, S., Fodor, A.A., Forney, L.J., Foster, L., Di Francesco, V., Friedman, J.,Friedrich, D.C., Fronick, C.C., Fulton, L.L., Gao, H., Garcia, N., Giannoukos, G., Giblin, C.,Giovanni,M.Y.,Goldberg, J.M.,Goll, J.,Gonzalez,A.,Griggs,A.,Gujja,S.,KinderHaake,S.,Haas,B.J.,Hamilton,H.A.,Harris,E.L.,Hepburn,T.A.,Herter,B.,Hoffmann,D.E.,Holder,M.E.,Howarth,C.,Huang,K.H.,Huse,S.M.,Izard,J.,Jansson,J.K.,Jiang,H.,Jordan,C.,Joshi,V., Katancik, J.A., Keitel, W.A., Kelley, S.T., Kells, C., King, N.B., Knights, D., Kong, H.H.,Koren,O.,Koren,S.,Kota,K.C.,Kovar,C.L.,Kyrpides,N.C.,LaRosa,P.S.,Lee,S.L.,Lemon,K.P.,Lennon,N.,Lewis,C.M.,Lewis,L.,Ley,R.E.,Li,K.,Liolios,K.,Liu,B.,Liu,Y.,Lo,C.-C.,Lozupone, C.A.,DwayneLunsford,R.,Madden,T.,Mahurkar,A.A.,Mannon, P.J.,Mardis,E.R., Markowitz, V.M., Mavromatis, K., McCorrison, J.M., McDonald, D., McEwen, J.,McGuire, A.L., McInnes, P., Mehta, T., Mihindukulasuriya, K.A., Miller, J.R., Minx, P.J.,Newsham,I.,Nusbaum,C.,O’Laughlin,M.,Orvis,J.,Pagani,I.,Palaniappan,K.,Patel,S.M.,Pearson,M., Peterson, J., Podar,M., Pohl, C., Pollard, K.S., Pop,M., Priest,M.E., Proctor,L.M., Qin, X., Raes, J., Ravel, J., Reid, J.G., Rho, M., Rhodes, R., Riehle, K.P., Rivera, M.C.,Rodriguez-Mueller, B., Rogers, Y.-H., Ross, M.C., Russ, C., Sanka, R.K., Sankar, P., FahSathirapongsasuti, J., Schloss, J.A., Schloss, P.D., Schmidt, T.M., Scholz, M., Schriml, L.,Schubert,A.M.,Segata,N.,Segre,J.A.,Shannon,W.D.,Sharp,R.R.,Sharpton,T.J.,Shenoy,N.,Sheth,N.U.,Simone,G.A.,Singh,I.,Smillie,C.S.,Sobel,J.D.,Sommer,D.D.,Spicer,P.,Sutton,G.G.,Sykes,S.M.,Tabbaa,D.G.,Thiagarajan,M.,Tomlinson,C.M.,Torralba,M.,Treangen,T.J.,Truty,R.M.,Vishnivetskaya,T.A.,Walker, J.,Wang,L.,Wang,Z.,Ward,D.V.,Warren,W.,Watson,M.A.,Wellington,C.,Wetterstrand,K.A.,White,J.R.,Wilczek-Boney,K.,Wu,Y.,Wylie, K.M.,Wylie, T., Yandava, C., Ye, L., Ye, Y., Yooseph, S., Youmans, B.P., Zhang, L.,Zhou,Y.,Zhu,Y.,Zoloth,L.,Zucker,J.D.,Birren,B.W.,Gibbs,R.A.,Highlander,S.K.,Methé,B.A., Nelson, K.E., Petrosino, J.F., Weinstock, G.M., Wilson, R.K. & White, O. (2012)Structure, function anddiversityof thehealthyhumanmicrobiome.Nature,486, 207–214.

Jabot, F., Etienne, R.S. & Chave, J. (2008) Reconciling neutral community models andenvironmentalfiltering:theoryandanempiricaltest.Oikos,117,1308–1320.

Knights,D.,Kuczynski, J., Charlson,E.S., Zaneveld, J.,Mozer,M.C.,Collman,R.G.,Bushman,F.D.,Knight,R.&Kelley,S.T.(2011)Bayesiancommunity-wideculture-independentmicrobialsourcetracking.NatureMethods,8,761-U107.

LeCun,Y.,Bengio,Y.&Hinton,G.(2015)Deeplearning.Nature,521,436–444.Liu, L., Tang, L., Dong, W., Yao, S. & Zhou, W. (2016) An overview of topic modeling and its

currentapplicationsinbioinformatics.SpringerPlus,5.Luo, W., Stenger, B., Zhao, X. & Kim, T.-K. (2015) Automatic Topic Discovery for Multi-Object

Tracking.AAAI,pp.3820–3826.Marquet,P.A.,Allen,A.P.,Brown,J.H.,Dunne,J.A.,Enquist,B.J.,Gillooly,J.F.,Gowaty,P.A.,Green,

J.L.,Harte,J.,Hubbell,S.P.,O’Dwyer,J.,Okie,J.G.,Ostling,A.,Ritchie,M.,Storch,D.&West,G.B.(2014)OnTheoryinEcology.Bioscience,64,701–710.

Mimno,D.&McCallum,A.(2012)Topicmodelsconditionedonarbitraryfeatureswithdirichlet-multinomialregression.arXivpreprintarXiv:1206.3278.

Page 223: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

219

Mimno,D.,Wallach,H.M., Talley, E., Leenders,M.&McCallum,A. (2011)Optimizing SemanticCoherence inTopicModels.Proceedingsofthe2011ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.

Ofiteru, I.D., Lunn, M., Curtis, T.P.,Wells, G.F., Criddle, C.S., Francis, C.A. & Sloan,W.T. (2010)Combined niche and neutral effects in a microbial wastewater treatment community.Proceedings of the National Academy of Sciences of the United States of America, 107,15345–15350.

Ramirez,K.S., Leff, J.W.,Barberan,A., Bates, S.T., Betley, J., Crowther,T.W.,Kelly, E.F.,Oldfield,E.E., Shaw, E.A., Steenbock, C., Bradford, M.A., Wall, D.H. & Fierer, N. (2014)Biogeographic patterns in below-grounddiversity inNewYork City’s Central Park aresimilartothoseobservedglobally.ProceedingsoftheRoyalSocietyB-BiologicalSciences,281,9.

Rizzo,R.,Fiannaca,A.,LaRosa,M.&Urso,A.(2016)ADeepLearningApproachtoDNASequenceClassification. Computational Intelligence Methods for Bioinformatics and Biostatistics:12th International Meeting, CIBB 2015, Naples, Italy, September 10-12, 2015, RevisedSelected Papers (ed. by C. Angelini), P.M. Rancoita), and S. Rovetta), pp. 129–140.SpringerInternationalPublishing,Cham.

Rosen-Zvi,M.,Chemudugunta,C.,Griffiths,T.,Smyth,P.&Steyvers,M.(2010)LearningAuthor-TopicModelsfromTextCorpora.AcmTransactionsonInformationSystems,28.

Rosen-Zvi,M.,Gri_ffiths,T.,Steyvers,M.&Smyth,P.(2004)TheAuthor-TopicModelforAuthorsandDocuments.

Rosindell, J., Hubbell, S.P., He, F., Harmon, L.J. & Etienne, R.S. (2012) The case for ecologicalneutraltheory.TrendsinEcology&Evolution,27,203–208.

Soininen,J.,McDonald,R.&Hillebrand,H.(2007)Thedistancedecayofsimilarityinecologicalcommunities.Ecography,30,3–12.

Valle,D.,Baiser,B.,Woodall,C.W.&Chazdon,R.(2014)DecomposingbiodiversitydatausingtheLatentDirichletAllocationmodel,aprobabilisticmultivariatestatisticalmethod.EcologyLetters,17,1591–1601.

deVargas,C.,Audic,S.,Henry,N.,Decelle,J.,Mahe,F.,Logares,R.,Lara,E.,Berney,C.,LeBescot,N., Probert, I., Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J.-M., Bittner, L.,Chaffron,S.,Dunthorn,M.,Engelen,S.,Flegontova,O.,Guidi,L.,Horak,A.,Jaillon,O.,Lima-Mendez, G., Lukes, J.,Malviya, S.,Morard,R.,Mulot,M., Scalco, E., Siano,R., Vincent, F.,Zingone,A.,Dimier,C.,Picheral,M., Searson, S.,Kandels-Lewis, S.,Acinas, S.G.,Bork,P.,Bowler,C.,Gorsky,G.,Grimsley,N.,Hingamp,P.,Iudicone,D.,Not,F.,Ogata,H.,Pesant,S.,Raes,J.,Sieracki,M.E.,Speich,S.,Stemmann,L.,Sunagawa,S.,Weissenbach,J.,Wincker,P.,Karsenti,E.&TaraOceans,C. (2015)Eukaryoticplanktondiversity in thesunlitocean.Science,348.

Vilhena, D.A. & Antonelli, A. (2015) A network approach for identifying and delimitingbiogeographicalregions.NatureCommunications,6.

Wallach,H.M.(2006)Topicmodeling:beyondbag-of-words.Proceedingsofthe23rdinternationalconferenceonMachinelearning,pp.977–984.ACM.

Ward,B.A.,Dutkiewicz,S.,Jahn,O.&Follows,M.J.(2012)Asize-structuredfood-webmodelfortheglobalocean.LimnologyandOceanography,57,1877–1891.

Page 224: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Discussion

220

Page 225: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

221

AppendixLarge-scaleDNAbarcodingofAmazoniananuransleadstoanewdefinitionofbiogeographicalsubregionsintheGuianaShieldandrevealsavastunderestimationofdiversityandlocalendemism

Jean-PierreVachera,GuilhemSommeria-Kleina,FrancescoFicetolab,MiguelTrefaut

Rodriguesc,PhilippeJ.R.Kokd,BriceP.Noonane,AndrewSnydere,RawienJairamf,Paul

Ouboterf,JerrianeOliveiraGomesg,TeresaC.S.Avila-Piresg,JucivaldoDiasLimah,Raffael

Ernsti,MichelBlancj,MaëlDewynterk,TimJ.Colsonl,SergioM.deSouzam,PedroNunesn,

AugustinCamachoo,MauroTeixeirap,RenatoRecoderq,JoséCassimiror,Quentin

Martinezs,ChristianMartyt,PhilippeGaucheru,ChristopheThébauda,AntoineFouqueta,u.

Page 226: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

222

aLaboratoireEDB,UMR5174,CNRS-UPS-IRD,Toulouse,France;bLaboratoired'ÉcologieAlpine(LECA),UMR5553,Grenoble,France;cUniversidade de São Paulo, Instituto de Biociências, Departamento de Zoologia, Caixa Postal11.461,CEP05508-090,SãoPaulo,SP,Brazil;dAmphibian Evolution Lab, Biology Department,Vrije Universiteit Brussel, 2 Pleinlaan, 1050Brussels,Belgium;eDepartmentofBiology,507ShoemakerHall,University,MS38677,USA;f National Zoological Collection Suriname (NZCS), Anton de Kom University of Suriname,Paramaribo,Suriname;gMuseuParaenseEmílioGoeldi,LaboratóriodeHerpetologia/CZO,Av.Perimetral,1901;TerraFirme,Belém,Pará,Brazil;hCentro de Pesquisas Zoobotânicas e Geologicas (CPZG), Instituto de Pesquisas Cientificas eTecnológicasdoEstadodoAmapá(IEPA),Macapá,AP,Brazil;IMuseumofZoology,SenckenbergNaturalHistoryCollectionsDresden,KönigsbrückerLandst.159,01109Dresden,Germany;jPointeMaripa,RN2/PK35,93711,Roura,FrenchGuiana;k Biotope, Agence Amazonie-Caraïbes, 30 Domaine de Montabo, Lotissement Ribal, 97300Cayenne,FrenchGuiana;lAdresses8rueMareschal,30900Nîmes,France;timpasseJeanGalot,Montjoly,FrenchGuiana;uLaboratoire Écologie, évolution, interactions des systèmes amazoniens (USR 3456 LEEISA),UniversitédeGuyane,CNRSGuyane,Cayenne,FrenchGuiana.

Page 227: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

223

Introduction

Amazoniaencompassesabout40%oftheworld’stropicalforests(Sioli,1984;Hubbellet

al.,2008;Hoorn&Wesselingh,2010),andmanytaxonomicgroupsreachtheirhighest

speciesrichness in thisregion(Antonelli&Sanmartín,2011; Jenkinsetal.,2013).The

processes that have given rise to this exceptionally highdiversity have long intrigued

biologists(Wallace,1852;Bates,1863).Amazoniagivesanappearanceofhomogeneity,

because it is a vast and seemingly uniform extent of forest that is faunistically very

distinct from other Neotropical regions (Dinerstein et al., 1995; Olson et al., 2001;

Antonelli& Sanmartín, 2011;Vilhena&Antonelli, 2015).However, this ismisleading:

temperaturesandrainfallvarywidelyacrossAmazonia(Mayle&Power,2008),andso

dovegetation types (Anderson,2012;Hughesetal.,2013).Moreover,Amazoniahada

tumultuousclimatologicalandgeologicalpast,mainlycausedbytheAndeanupliftand

the setting-up of the Rio Amazonas watershed during the late Tertiary (Hoorn et al.,

2010).

ThedistributionofspecieswithinAmazoniaisknowntorelatetothislarge-scale

environmental heterogeneity. The observed congruence between the geographic

distribution of birds and primates on the one hand and themajor interfluves on the

other hand (Wallace, 1852; Haffer, 1974) led to the definition of biogeographic

subregions (BSRs), coined as “Amazonian areas of endemism” (Wallace, 1852;Haffer,

1974;Cracraft,1985).However,thereisstilllittleconsensusonhowtobestdelimitand

nameBSRs,withmanytermsbeingusedinterchangeably(Vilhena&Antonelli,2015).In

fact, the very existence and boundaries of different BSRs across Amazonia and the

relative degree of endemism within them have simply never been analysed using

modern analytic tools (e.g., clustering) and large species assemblages having

unambiguousdistributiondata(Nelsonetal.,1990;Morrone,2005;Nakaetal.,2012).

Moreover,currentknowledgeonthedelimitationofAmazonianBSRsismainlybasedon

birds, the best-known taxonomic group, as well as primates and plants displaying

limited distributions in Amazonia. The explanatory power of the Amazonian BSR as

currently defined remains questionable until their boundaries are proven to match

Page 228: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

224

acrossmultiple taxonomicgroups.However, thisseemsunlikelybecause thesegroups

have overall high dispersal abilities, and their distribution patterns may be poor

predictorsforlessvagiletaxa(Claramuntetal.,2011;Pigot&Tobias,2015;Zizkaetal.,

2016).

Because small terrestrial vertebrates such as anurans have more limited

dispersalabilitiesandpossiblyagreatersensitivitytoenvironmentalvariation,theyare

bettersuitedtothedelineationofrelevantbioregions(Zeisset&Beebee,2008).Anuran

assemblagesmay display different or finer geographic patterns than those previously

described at the continental (Vilhena & Antonelli, 2015) or the regional scale

(Vasconcelos et al., 2014). For instance, one of the rare studies unambiguously

delimitingBSRsinAmazoniafoundwell-delimitedbioregionsintheGuianaShieldbased

onthedistributionofbirdspecies,includingalargehomogeneousregionspanningthe

easternpartoftheGuianaShield(Nakaetal.,2012).Yet,studiesonanuranamphibians

suggest a finerbiogeographic structure in theEasternGuianaShield,wheredivergent

lineages of frogs exhibit concordant distribution limits (Fouquet et al., 2012d, 2013,

2016). In thispaper,weaimatdelimitingAmazonianbioregions inadata-drivenway

basedonanewlycollecteddatasetofmolecularanurandiversity,withaparticularfocus

ontheEasternGuianaShield.

RevealingthebasicgeographicalstructureofspeciesdiversityinAmazoniaisnot

onlyofcrucialimportanceforconservation(DaSilvaetal.,2005),itisalsoanimportant

prerequisite for the study of the processes that gave rise to present-day diversity

patterns.IdentifyingBSRsinAmazoniamayhelpidentifythephysicalbarriersrelevant

to speciation, define the contact zones between closely related parapatric taxa, and

capture the effects of dispersal limitation in the structure of Amazonian communities

(e.g., Moura et al., 2016). Many hypotheses have been proposed to explain

heterogeneities in species distribution across Amazonia, including landscape change

inducedbylateTertiaryclimatefluctuations(Haffer1969),theupliftoftheAndes,and

continuousdispersal across large rivers (Hayes&Sewlal, 2004;Antonellietal., 2010;

Hoorn et al., 2010), or past environmental gradients (Colinvaux et al., 2000). These

differenthypotheseshavebeenverifiedforsometaxonomicgroupsatdifferentspatial

andtemporalscales(Hall&Harvey,2002),butthereisstillnoconsensusaboutthemain

driversofdiversificationwithinAmazonia.

Page 229: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

225

TwomajorchallengestoourunderstandingofthebasicstructureofAmazonian

biodiversity are the scarcity of occurrence data and the imprecision of species

delineation(WallaceanandLinneanshortfalls).Theseshortfallsareparticularlyobvious

insmallterrestrialvertebratessuchasanurans(Ficetolaetal.,2014).Almostallanuran

taxa with large ranges in Amazonia exhibit deep divergences when analysed with

genetic tools, suggesting that they comprise several species, each with a restricted

distribution(Fouquetetal.,2007a;Funketal.,2012;Geharaetal.,2014;Fouquetetal.,

2015b;Ferrãoetal.,2016;Fouquetetal.,2016).Thesestudiestypicallyimplythatthe

actual species richness in these groups is more than twice that estimated from

morphology only. Therefore, ranges of Amazonian amphibians used in broad

biodiversityassessmentssuchastheInternationalUnionfortheConservationofNature

(IUCN) Red list are likely to be largely inaccurate (Ficetola et al., 2014). Out of 427

amphibianspeciesinhabitingthe6millionkm2ofAmazoniaaccordingtoIUCN,atleast

150species(35%)aredistributedovermorethan1millionkm2(Fouquetetal.,2007a).

Suchahighproportionofbroadly-distributed species seemsunlikely (Wynn&Heyer,

2001),becauseamphibiansusuallydisplaylowdispersalcapacitiesandoftenhavesmall

niches (Duellman & Trueb, 1994;Wells, 2010). This gap in our understanding of the

actualdiversityanddistributionofspeciescouldseriouslyinvalidateconclusionsdrawn

fromIUCNdata(Fodenetal.,2013;Jenkinsetal.,2013,2015;Pimmetal.,2014;Feeley

&Silman,2016).

Theoverallaimsofthisstudywere(1)toobtainanewgeoreferenceddatasetof

Amazonian anurans based onmolecular diversity,with a focus on the easternGuiana

Shield (EGS) (east of the Tepuis, and north of Rio Negro and Rio Amazonas), (2) to

provide estimates of the number of species and of their distributions in this part of

Amazonia, (3) to infer data-driven spatial boundaries betweenBSRs, aswell as to re-

assesstheirrateofendemism.Giventhatanuranspeciesboundariesanddistributions

areplaguedwithuncertainty inAmazoniaandthat IUCNdataareoftenout-datedand

imprecise, it is necessary to use occurrence records linked to taxonomic frameworks

based on clear criteria. Therefore, we conducted extensive fieldwork to collect

specimensrepresentativeofpresent-daydiversityatthescaleoftheentireregion,and

obtained mitochondrial DNA sequences (16S rDNA) from these specimens. We also

included inouranalysespubliclyavailablesequences fromotherspecimens.Basedon

Page 230: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

226

thesesequences,wegeneratedtwonewtaxonomicframeworksforAmazoniananurans.

Our dataset represents the largest molecular diversity dataset gathered so far in

Amazoniaforanytaxonomicgroup.

Page 231: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

227

Materialandmethods

Fieldwork1.

Weundertook fieldwork in several localities throughout theGuiana Shield, notably in

southernSuriname,FrenchGuiana,andtheBrazilianstatesofAmapáandRoraima.We

collectedspecimensofasmanyanuranspeciesaspossibleperlocalitybynocturnaland

diurnal active searches (visual and acoustic). Each specimen was identified and

photographed. They were subsequently euthanized using an injection of Xylocaine®

(lidocainechlorhydrate).Tissuesamples(liverormuscletissuefromthighortoe-clip)

wereremovedandstoredin95%ethanol,whilespecimensweretaggedandfixed(using

formalin10%)beforebeing transferred to70%ethanol forpermanent storage.These

field surveys allowed us to cover the anuran communities of the EGS at an

unprecedented fine scale (Fig.1A).Wecompleted thesedata for the restofAmazonia

withloansofmaterialfromseveralinstitutions,notablyfromUniversidaddeSaoPaulo

for the upper Madeira, lower Xingu, Abacaxis and Purus Rivers. Ultimately, the total

numberofanalysedsamplesreached4,681.

Moleculardata2.

We extracted DNA from the samples using the Wizard Genomic extraction protocol

(Promega;Madison,WI,USA).Wetargetedac.a.400bpfragmentof themitochondrial

16SrDNAusingMiSeqandSangertechniques(SupplementaryMethods).Weeventually

generated4,492sequences.

Additionally, we retrieved from GenBank (as of the 1st August 2015) all

sequences of species congenericwith those occurring in theGuiana Shield, aswell as

sequences of Adelphobates and Phyzelaphryne, two genera restricted to southern

Amazonia.Weremoved low-qualityor tooshortsequences,aswellasduplicates from

Page 232: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

228

the same specimen. We obtained approximate geographical coordinates for most of

theserecordssearchingtheoriginalpapers,localityinformation,orcollectiondatabases.

The finaldatasetcontained11,166sequences,10,254ofwhichweregeotagged.

ThisbarcodedatasetisprobablythemostextensivegatheredsofarinAmazoniaforany

vertebrate group. 8,181 records are fromAmazonia proper, including 4,634 from the

EGS, while the remaining are from adjacent regions. The obtained sequences were

alignedwithMAFFTv.7(Katoh&Standley,2013).Weusedtheresultingalignment to

generate a neighbour-joining tree using pairwise deletion and p-distancemodel with

MEGAv.7.0.16(Kumaretal.,2016).

Taxonomicframeworks3.

Whilethereisvalidcriticismagainstrelianceonsimplisticsingle-sequenceapproaches

to species delineation (Goldstein & DeSalle, 2011; Krishna Krishnamurthy & Francis,

2012), such approaches can take us further toward the comparative quantification of

biodiversityoverdifferentspatialscales(Emersonetal.,2011;Yuetal.,2012; Jietal.,

2013). In thecaseofAmazoniananurans, clearandexhaustivedelimitationof species

boundariesbasedonmorphology,acousticsandmoleculardataremainsoutofreach.As

aconsequence,manyspeciesgroupshaveaveryconfusedtaxonomyleadingtofrequent

misidentification, lumpingof undescribed specieswithin a single taxon, and assigning

speciestopolyphyleticgroups.ThisresultsinlargelyinaccurateIUCNdata.Inorderto

compare our sequence dataset to IUCN data, we built two different taxonomic

frameworks. The TAXO1 taxonomic framework is conservative, linking as much as

possibleeachsequencetoanominaltaxonsoastoformamonophyleticgroup,whilethe

TAXO2taxonomicframeworkresultsfromapurelyDNA-basedspeciesdelineation(see

below).

ForTAXO1,ourgoalwastogroupundernominal taxathesequences forminga

monophyletic group according to the neighbour-joining tree, so as to obtain the

geographicrangeofthelineagesalreadyconsideredbytheIUCN.Originalfieldworkand

GenBank assignments were often contradictory because of the above-mentioned

Page 233: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

229

reasonsandbecauseoftaxonomicchangessubsequenttoidentification,andwerethus

oftenmodified.Wefirstidentifiedthesequencesthatcouldbeunambiguouslylinkedto

a nominal taxon by considering the literature (e.g. sequences from type series), the

knownrangeofthetaxon,andthelocationofthetypelocality.Then,wecheckedwether

thisidentificationwasinaccordancewiththeIDofthemostcloselyrelatedsequences.If

in accordance, this taxon ID was applied to the sequences until another taxon was

applicable to more distant lineage. When a taxon was found to be paraphyletic, we

checked for possible misidentification, and whether one of the lineages could be

identified as another taxon. When paraphyly was ambiguous, we kept the original

identification.Whenparaphylywasunambiguous,oneofthelineageswasidentifiedas

the nominal taxonwhile the other oneswere identified as “sp.” if they did not share

affinitieswithanothertaxon.Inafewcases,twoormoretaxawerelargelyintricatewith

shallow genetic distances among sequences and remained ambiguous despite the

allopatric distribution of the lineages.We then considered them as single taxon (e.g.

Atelopushoogmoedi,A.flavescens)giventheyrepresentsinglelineageandsinglepatchof

distribution.Ultimately,we think thatTAXO1provides a representativeupdate of the

currenttaxonomicknowledgeforAmazoniananurans.941specieswereconsideredin

TAXO1,including365occurringinAmazonia.

For TAXO2 we applied the Automatic Barcode Gap Discovery (ABGD) species

delineation method (Puillandre et al., 2012) to our sequence dataset. We performed

ABGDanalysesfromthesourcecodewithdefaultsettings(JC69,Pmin:0.001,Pmax:0.1,

steps:10,Nbbins:20)oneachgenus,andattributedanumbertoeachcandidatespecies

retrievedintheanalysis.ComputationswereperformedontheEDB-CalcClusterhosted

bythelaboratory"ÉvolutionetDiversitéBiologique"(EDB),usingasoftwaredeveloped

by the Rocks(r) Cluster Group (San Diego Supercomputer Center, University of

California, San Diego and its contributors. In 24 instances (17 concerning Amazonian

taxa),differentnominaltaxainTAXO1werelumpedintoauniquecandidatespeciesin

TAXO2becauseofashallowmtDNAdivergencebetweenthem(notablyinAtelopusspp.

and Osteocephalus ssp.). As these correspond to clearly distinct species based on

morphologyandacoustic,andformmonophyleticgroupsinpreviousstudies(butherein

with shallow divergence or recovered ambiguously paraphyletic due to the low

resolutioninour400bp-long16Sfragment),weconsideredthemasfalsenegativeand

Page 234: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

230

we applied to them the same taxonomic assignment as in TAXO1. Ultimately, 1,246

specieswereconsideredinTAXO2,including746occurringinAmazonia.

Third, we compiled amphibian species range data from the IUCN

(http://www.iucnredlist.org/technical-documents/spatial-data#amphibians), which is

themostwidely used amphibian distribution database. In order tomake this dataset

comparablewithTAXO1andTAXO2,weexcluded22genera(433species)thatareonly

partlyoverlappingwithourfocalarea,i.e.,westernAmazonia,northernAndes,Caatinga

andCerrados.OnegenusfromtheTepuis(Metaphryniscus)wasalsoomittedgiventhat

no sequences were available, as well as two introduced species (Eleutherodactylus

johnstoneiandLithobatescatesbeianus).Overall,51generawereusedinouranalyses.

Studyareaandspeciesdistributiondata4.

Ouranalysesfocusedonarectangularareathatincludesthewholecentral,easternand

northern parts of Amazonia (excludingmost of thewestern and southern parts). The

limitsofourstudyareawereW72°W47°andS11°N9°.Weappliedagridof1°Í1°

(500cells)tothisarea.ThisincludestheGuianaShield(Lujan&Armbruster,2011),the

centralandeasternpartsof theRioAmazonasdrainage,andthenorthernpartsof the

RioPurus,RioMadeira,RioTapajós,RioXingú,andRioTocantinsdrainages(Fig.1A)as

wellasperipheralnon-Amazonianareas.

We then estimated the putative range of each species by creating convex

polygonsoutofouroccurrencedatasetsTAXO1(358speciestotalwithinthefocalarea)

andTAXO2(596species)withthesppackageimplementedinR(RDevelopmentCore

Team,2016).ThenumbersofAmazonianspeciesincludedinTAXO1andTAXO2differ

from those occurring within the focal area because this area encompasses non-

Amazonian areas and excludes western and southern parts of Amazonia. We then

interpolated the occurrence of species in each cell of our study area for the three

datasets.Weexcludedspeciesoccurringinlessthanthreelocalitiesandcellswithless

than fivespecies in them, thus removingpoorly sampledspecies, thatdidnotprovide

enoughinformationforrangereconstruction,andpoorlysampledperipheralcells.118

Page 235: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

231

specieswerediscardedinTAXO1and318inTAXO2.Finally,weconsidered240species

inTAXO1,278inTAXO2,and440intheIUCNdatasetwithinthefocalarea(Fig.1D,E,

F).

Figure1.(A)Alloccurrencesinthebarcodingdatasetandinsetofthefocalarea;(B)AmazonianAreasofEndemismfromSmithetal.,2014;(C)speciesrichnessmappedfromoccurrencesdatafrom TAXO1 and TAXO2, which provide identical results; (D) species richness mapped fromTAXO1afterpolygontransformationandexclusionofrarespecies;(E)speciesrichnessmapped

Page 236: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

232

from TAXO2 after polygon transformation and exclusion of rare species; (F) species richnessmappedfromthedistributiondataofIUCNconsideredinouranalyses.

IdentificationofBiogeographicSubregions5.

TodelimitBSRsbasedonspeciesoccurrence,wedecomposedthecommunitymatrix-

i.e., the matrix listing the species occurring in each grid cell - using Latent Dirichlet

Allocation (Blei et al., 2003; Valle et al., 2014). LDA is an unsupervised clustering

methodbasedonaprobabilisticmodel,whichassumesthatseveralspeciesassemblages

coexistoverthestudyarea,thenumberKofwhichisfixedbeforehand.Thismethodhas

major advantages compared to classic clustering (e.g., hierarchical or k-means

clustering). First, it is likelihood-based, thusproviding rigorous tools for selecting the

number of assemblages and comparing decompositions. Second, assemblages may

partially overlap in taxonomic composition, and a given grid cell may either be

dominatedbyoneassemblageorcontainamixtureofassemblages.Thus, itallows for

modellinggradualchangesintaxonomiccompositionoverspace.Amixingparameterα

isestimatedaspartoftheinferenceprocedure,andindicateswhetherthesamplestend

tobedecomposedintoanevenmixtureofassemblages(case𝛼 > 1)orintoanuneven

mixturedominatedbyoneassemblage(case𝛼 < 1).

WeusedtheVariationalExpectationMaximization(EM)algorithmimplemented

byBleietal.(2003)andwrappedintotheRpackagetopicmodels(Grün&Hornik,2011)

forparameterinference,withaconvergencethresholdof10!!fortheEMstepand10!!

for the variational step.We assessed the reliability of the solution by comparing the

taxonomic composition of assemblages between 100 realizations of the algorithm

starting from random initial conditions. We only interpreted the decomposition

correspondingtotherealizationwiththehighestlikelihoodoutof100.Weselectedthe

numberKofassemblagesbyAICminimization.Werepresentedthespatialdistribution

of assemblages on a map after ordinary Kriging between cells (R package gstat ;

Pebesma, 2004). We also computed the Jaccard taxonomic dissimilarity between

assemblages and displayed it as a dendrogram. Additionally, we decomposed the

Page 237: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

233

datasets into K=3 assemblages to assess the coarser biogeographic structure of the

studyarea.SeeSommeria-Kleinetal.(inprep.)forfurthermethodologicaldetails.

Page 238: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

234

Results

Underestimation of species richness. Based on our analyses, among the 363

Amazonian species found in TAXO1, 53 genetic lineages could not be associatedwith

any nominal taxa. In the EGS, most of these undescribed lineages were already

documented (e.g., Adelophryne sp., Scinax sp. 2, or Pristimantis sp. 1) (Fouquet et al.,

2007b, 2012b). In southern and western Amazonia however, several lineages are

reportedhereforthefirsttime(e.g.,Allobatessp.“Divisor”,Amazophrynellasp.“Acre”,

Dendropsophussp.“Xingú”),Thissuggeststhatspeciesdiversityhasbeenwellsampled

inthelowlandsoftheGuianaShield,butnotintherestofAmazonia.Ourdatasetsalso

provide evidence of range extension formany taxa compared to previous knowledge.

ThisisforexamplethecaseofScinaxnasicus,whichextendstotheSipaliwinisavannah

(Suriname), Pristimantis koheleri, to the southern part of the Guiana Shield, or

Synapturanusmirandariberoi,tothesouthernpartoftheAmazonasdrainage.However,

mostofthesenewlydocumentedpopulationsarehighlygeneticallydivergentfromthe

populations lying within the known range of the species and are considered as

independentspeciesinTAXO2.

Infact,246TAXO1speciesdisplaysplits,yielding568species(Í2.3)inTAXO2.

TAXO2 provides 1,548 pairwise comparisons among species that are lumped as

conspecific in TAXO1. 39% of these average pairwise distances (p-distance pairwise

deletion)wereabove6%,athresholdbelievedtoconservativelydelimitspecies(Vences

et al., 2005; Fouquet et al., 2007a) and 85% were above 3% (Fig. 2A). In terms of

taxonomy, 436 TAXO2 species cannot be assigned to any of the 310 nominal taxa of

TAXO1. These observations suggest that the TAXO1 framework remains

overconservativeinmanyinstances.

Page 239: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

235

Figure2.A:HistogramoftheaveragepairwisedistancesamongTAXO2speciesconsideredasasingle TAXO1 species (white bars) and among TAXO2 species considered as different TAXO1species(redbars;this lastdistributionwasrandomlysampledtoharbourthesamenumberofcomparisonsthan inthepreviousone);(B-C)Examplesofgeneticandgeographicpatterns fortwoPanamazonian singleTAXO1species thatprovidedrasticallydifferentpatterns inTAXO2;Leptodactyluspetersiibeing split into 16 specieswhereasHypsiboascalcaratus is only split intwo candidate species in TAXO2. The colours of the lineages on the tree correspond to thecoloursoftheoccurrencepointsandareasonthemap.†indicatescandidatespeciesthatwerediscardedfromtheanalysesinTAXO2(lessthanthreelocalityrecords).

Page 240: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

236

Anumberofdistinctpatternsofdistributionemergefromtheoccurrencedataof

TAXO1 and TAXO2. We highlight three of them that segregate groups of species

occurring in the EGS: Guiana Shield endemic groups; Panamazonian allopatric groups

andwidespreadspecies.Thefirstpatternconcernsfivegroupsthatareendemictothe

GuianaShieldandoccurinboththehighlandsandthelowlands:Adelophryne(4species

in TAXO1 vs.4 in TAXO2),Otophryne (3 vs.3 species),Synapturanus (3 vs.4 species),

Anomaloglossus (15 vs.29 species),Vitreorana ritae clade (3 vs.3 species),Hypsiboas

benitezi clade (3 vs. 3 species). Among them, only Anomaloglossus seems to have

substantiallydiversifiedinthelowlands.Secondly,thevastmajorityofspeciesoccurring

in the EGS are nested inwidespread Amazonian or lowlands Neotropical clades (Fig.

2B).Most of these cladesdisplaydeepdivergence amongpopulations (above6%; e.g.

Leptodactyluspetersii–16candidate species inTAXO2)andcontain several candidate

specieswithmorerestrictedranges.Finally,78speciesoutof358(22%)inTAXO1,45

out of 596 (8%) in TAXO2 and 142 out of 440 (32%) in IUCN actually have broad

distributions(>1millionskm2)withinourfocalstudyarea(e.g.,H.calcaratus)(Fig.2C).

Biogeographical subregions.Wedecomposed theTAXO1,TAXO2 and IUCNdatasets

usingLatentDirichletAllocation.AICminimizationyieldedanoptimalnumberofspecies

assemblages close toK = 8 for all three datasets (Fig. S2). The retrieved assemblages

were found to be spatially segregated (mixing parameter α much smaller than 1:

𝛼!"#$ = 0.021,𝛼!"#$! = 0.019,𝛼!"#$! = 0.016 ) and contiguous. We could thus

interpretthemasBSRs.TheLDAdecompositionwasfoundtobereliableforthethree

datasetsbasedonitsstabilityover100realizations(Fig.S2).

Page 241: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

237

Figure 3. Maps generated by interpolating the eight-assemblage Latent Dirichlet Allocation(LDA)decompositionofthespeciesoccurrencedata(A,B,C),andcorrespondingdendrogramsshowingtherelationshipsbetweentheeightassemblagesrecoveredintheLDAdecompositionusing average Jaccard taxonomic dissimilarity (based on the presence/absence of species inassemblages). (A) TAXO1; (B) TAXO2; (C) IUCN data. The white dashed lines represent theapproximateboundariesoftheBSRforathree-assemblageLDAdecomposition(inpanel[B],thenorth-westernandsouth-easternregionsbelongtothesameassemblage).Thenumbersonthemaps correspond to the numbers attributed to assemblages for each dataset. (D) TAXO1; (E)TAXO2;(F)IUCNdata.

Eventhoughnotidentical,thespatialboundariesoftheeightBSRsretrievedfor

TAXO1andTAXO2wereverysimilar(Fig.3A-B).ThelowlandsoftheEGSwereclearly

separatedfromtherestofthestudyareabytheRioAmazonasandthePantepuiregion.

Moreover,theEGSwasalsofoundtoexhibitsomeinternalstructure,sincethisareawas

composedofthreeindependentBSRs,allfoundinbothTAXO1andTAXO2despitelarge

differencesinthedistributionofthespeciesconsidered(e.g.,Leptodactyluspetersii).One

of these three BSRs (BSR1 on Fig. 3A-B) comprised the southern part of Guyana,

Roraima and the Northern parts of Pará and Amazonas states (Brazil). A second one

(BSR2 on Fig. 3A-B) comprised the northern part of Guyana and adjacent Venezuela.

Finally,a thirdone(BSR3onFig.3A-B)comprisedthestateofAmapá(Brazil),French

Page 242: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

238

Guiana,andSuriname.ThesethreeBSRwereretrievedasasingleclusterinthecoarser

3-assemblage LDA decomposition. Taxonomic comparison between assemblages

indicatedthatamongthesethreeBSR,BSR1andBSR3weremoresimilartoeachother,

inbothTAXO1andTAXO2(Fig.1D,E).TheonlynotabledifferencebetweenTAXO1and

TAXO2intheEGSareawasthattheboundariesofBSR1matchedwelltheRioNegroand

Rio Amazonas in TAXO2, while BSR1 extended somewhat further west across the

RupununisavannahinTAXO1.TheboundariesbetweenBSRsinthisspecificareawere

also sharper in TAXO2 than in TAXO1. Outside of the EGS area, there was a striking

matchbetweenBSRboundariesandRioMadeirainTAXO1thatwasalreadyrecovered

in the 3-assemblage decomposition. In contrast, the Purus and Tapajòs Rivers were

foundtobeeachatthecenterofaBSRinbothTAXO1andTAXO2.

ThedistributionofBSRsusingtheIUCNdatabaseprovidedamarkedlydifferent

pattern, notably not matching the EGS boundaries. The three Guianas (Guyana,

Suriname,andFrenchGuiana)weregroupedtogetherinoneBSR,excludingthenorth-

westernpartofGuyanaand includingadjacentareasofAmapáandPará (Brazil).The

southernpartoftheEGSwasgroupedwiththesouthernpartoftheAmazondrainage,

thusencompassingRioAmazonas(Fig.3C).

Speciesrichnessandendemism.Intermsofspeciesrichnessandendemism,thethree

datasetsareradicallydifferent.TheBSR1ofIUCNiscomposedof119species,27.7%of

whichareendemic(Table1),andisgeographicallycomparabletothelumpingtogether

ofBSR2and3inTAXO1andTAXO2.Yet,despiteencompassingasmallergeographical

area,theBSR3ofTAXO1alonedisplayssimilarvaluesofrichnessandendemismasthe

BSR1 of IUCN. When considering the three Guiana Shield BSRs together in TAXO1,

richness(184species)andendemism(57%)aremuchhigherthanintheBSR1ofIUCN.

These metrics increase to 250 species and 82.4 % endemism in TAXO2 for the EGS

(Table1).BSR2(NorthernGuyana)containsthehighestnumberofendemicspeciesin

both taxonomic frameworks, reaching 75%endemism inTAXO2 (Table 1),while the

highestspeciesrichness(130inTAXO2)isfoundinBSR3(Suriname,FrenchGuianaand

Amapá).

Page 243: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

239

UICN TAXO1 TAXO2

Partition BSR Species

richness

Endemic

species

Endemism

rate (%)

Species

richness

Endemic

species

Endemism

rate (%)

Species

richness

Endemic

species

Endemism

rate (%)

K = 8

1 119 33 27.7 89 4 0.4 71 25 35.2

2 – – – 85 46 54.1 90 68 75.5

3 – – – 118 30 25.4 130 77 59.2

K = 3 1 – – – 184 105 57 250 206 82.4

Table1:SpeciesrichnessandendemismineachoftheBSRscoveringtheEGS.Thefigurespresentedinthistableincludesingletons(specieswithonlyoneoccurrencepoint)andspeciesthatoccurinlessthanthreecells.BSRnumberscorrespondtothosedisplayedinFig.3.ForK=3,assemble1actuallycorrespondstotheEGS.

Page 244: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

240

Discussion

Underestimation of species richness and regional endemism in Amazonia.We

analysedourmoleculardiversitydatausing two alternative taxonomic frameworks: a

conservativeframeworkTAXO1inwhichsequenceswereasmuchaspossibleclustered

intomonophyleticgroupsaroundpreviouslydescribednominaltaxa,andaframework

TAXO2 in which species were delineated solely based on the molecular distance

betweensequences.Ourspeciesdelineationanalysiscorroboratesprevioussuggestions

that the actual number of anuran species occurring in Amazonia remains vastly

underestimated (Fouquet et al., 2007a; Funk et al., 2012; Ferrão et al., 2016). The

numberofspeciesretrievedinTAXO2(746)andthelevelofdivergenceamongthemare

particularlystrikinginmanygroups.

OurTAXO1datasetcomprises363Amazonianspecies,whichisclosetothe427

speciesrecordedbytheIUCN.However,oursamplingeffort is lowoutsidetheEGS,as

illustratedbythefactthatwedonotretrieveseveralnominaltaxaincludedintheIUCN

database.Therefore,theactualnumberofspeciesislikelytobelargelyunderestimated

in TAXO1 outside the EGS. Moreover, TAXO1 remains over-conservative in many

instances, as the level of genetic divergencewithin species is often very high. TAXO2

suggests the existence of more than twice the number of species found in TAXO1.

ConsideringthatunevensamplingisevenmoreofanissueinTAXO2thaninTAXO1,as

manyofour candidate species areonly retrieved inoneor a few localities, theactual

species count for Amazonia is likely to be substantially more than twice the current

count. Hence, comparisons between taxonomic frameworks should be limited to the

EGS,whereoursamplingeffortishighest.WhenconsideringsolelytheEGS,thenumber

ofcandidatespeciesretrievedinTAXO2is1.34timeshigherthanforTAXO1(Table2).

A species delineation solely based on mtDNA divergence remains overly

simplistic and cannot reliably delineate the species occurring in the region since it

necessarilyoverestimates theactualnumberof species in somecases (falsepositives)

and underestimates in others (false negatives) (Hickerson et al., 2006). The pitfalls

inherent to the sole use of shortmtDNA sequences for species delineation have been

Page 245: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

241

already extensively discussed (Hubert&Hanner, 2015).Nevertheless, inmost groups

for which the boundaries among species have been investigated using integrative

taxonomy,mtDNAdivergenceofsimilarmagnitudeasusedinthisstudytodifferentiate

between intra- and interspecific genetic divergence was generally associated with

phenotypicoracousticdifferentiationaswell(Funketal.,2012;Fouquetetal.,2015b;

Ortega-Andradeetal.,2015;Fouquetetal.,2016).Moreover,TAXO2subdivisionshave

already been proven to be associated with morphological or acoustic differences in

severalgroups(Jansenetal.,2011;Fouquetetal.,2013;Ferrãoetal.,2016).Thus, the

TAXO2 taxonomic framework takes into account finer subdivisions that certainly

correspond tophenotypically distinct species inmany cases, and it is highlyprobable

thattheprevalenceoffalsepositivesremainslimited.Incontrast,somefalsenegatives

weredetectedsinceseveralnominal taxawereretrievedasasinglecandidatespecies

using ABGD (e.g.,Atelopus flavescens and A. hoogmoedi,O. oophagus andO. taurinus).

ThesewerecorrectedinTAXO2buttheprevalenceoffalsenegativesremainsdifficultto

evaluate in most groups where species boundaries have not been investigated using

phenotypic traits. Overall, the present work provides an important update to the

documentation of Amazonian anuran diversity, which will undoubtedly contribute to

stimulatetheprocessofspeciesdelineationanddescription.

Ifourworkprovidesaglimpseofhow farwestill are fromreachinga realistic

estimateof thenumberof speciesoccurring throughoutAmazonia, italsoprovidesan

evenmorestrikingviewofthedegreeofregionalendemism.Ourestimatesoftherateof

endemismforthefrogsoftheEGSreach57.0%basedonTAXO1and82.4%basedon

TAXO2.ThesefiguresaretwotofourtimeshigherthantheestimateoftheIUCNforthe

samearea.Theyarealso1.0to1.4timeshigherthantherateofendemismoffrogsinthe

wholegeologicallydefinedGuianaShield,whichalsoencompassesVenezuelaandpartof

Colombia (Señaris&MacCulloch, 2005). In comparison, only 7.7%of bird species are

endemic to the whole Guiana Shield, 29 % of reptile species, and 11 % of mammal

species (Hollowell&Reynolds, 2005).These figures are still certainlyunderestimated

(Lim,2012),especiallyforreptiles(Geurgas&Rodrigues,2010;deOliveiraetal.,2016),

buttaxonomyhasprobablyreachedamuchmorestablelevelforbirdsandmammalsin

theGuianaShieldthanforanurans.IncomparisonwithothertropicalAmericanregions,

Page 246: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

242

51.3%ofthevertebratespeciesfromtheAtlanticForestofBrazilareendemic,and46.2

%ofthevertebratesfromthetropicalAndesareendemic(Myersetal.,2000).

A simpleandroughextrapolationbasedon the species richnessandendemism

weobtainedfortheEGS(184–250specieswith57–82%endemism)appliedtotheeight

AmazonianBSRsretrievedinouranalysis leadstoca.1,472–2,000speciesinourfocal

area,which represent about three to five times the 427 species that are supposed to

occur inAmazonia according to the IUCN.Enhancingdata coverage in order to refine

theseestimationswouldrequireextensivefieldworkinremoteareas.Nevertheless,new

predictiveapproachesbasedonthedetectionofcrypticdiversity(Espíndolaetal.,2016)

maypermit to get amoreprecise estimate of species richness and endemism in each

BSR,andthereforewouldhelptargetingareaswheretofocussampling.

Biogeographic division of the eastern Guiana Shield. The extent of the BSRs

retrieved forTAXO1andTAXO2arevery similar in spiteof theuseof twodrastically

differenttaxonomicframeworks.Incontrast,theBSRsretrievedfromtheIUCNdatabase

areverydifferentanddonotcorrespondtoanylandscapefeature.Nobarriereffectof

the lowerRioAmazonas isevendistinguishable.This ismost likelyresulting fromthe

artificiallylargedistributionofmanyspeciescontainedinthisdatabaseonbothsidesof

thisriver.

The locationof theRioMadeiramatcheswell theboundarybetweenBSR5and

BSR6 in TAXO1,which is in accordancewithwhat has already been shown for other

groups of terrestrial vertebrates, such as birds (Fernandes et al., 2012; Ribas et al.,

2012) and primates (Cortés-Ortiz et al., 2003). The sharpness of this pattern is not

obviousinTAXO2,butthisisprobablyduetotheremovalofmanysingletonsfromthe

dataset after species delineation. Another interesting aspect is the lack of apparent

suture effect between the Purus and the Solimões drainages, also in accordancewith

whathaspreviouslybeenfoundforothergroupofterrestrialvertebrates(Cortés-Ortiz

etal.,2003;Fernandesetal.,2012;Ribasetal.,2012).Theseriversdisplayameandering

behaviour associated with an unstable course over time, thus enabling gene flow

throughconnectionbetweenpopulationslocatedonbothsidesanddispersalofspecies

fromone interfluve to theother (Aleixo,2004,2006;Batesetal., 2004; Jacksonetal.,

Page 247: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

243

2013).Onthecontrary,wideriversintheBrazilianshieldsuchasRioMadeiradisplaya

putativelymorestablecourseovertimeandaremorelikelytoactaslonglastingsuture

zones that might have promoted diversification or at least been more efficient in

preventingdispersal(Antonellietal.,2010;Moraesetal.,2016).Suchcharacteristicsare

alsofoundinriversoftheEGS(Fernandesetal.,2012;Fouquetetal.,2012a,2015a),but

exceptfortheRioBrancoandRioNegro,theimpactoftheGuianaShieldriversongene

flowthroughdispersallimitationmightnotbeasimportantasfortheAmazonianrivers

of theBrazilianShield, owing to the smaller extentof the catchments and the smaller

width of the rivers themselves. This is reflected in our results, as the suture zones

betweenthethreeBSRsoftheEGSdonotcorrespondtoanymajordrainage.Infact,itis

more likely that the delimitation of these assemblages resulted from the combined

influence of past climatic and landscape changes (Fouquet etal., 2012c). The current

climatic characteristics of the EGS are heterogeneous, with a large dryer corridor

observedinthesouthernpart(Mayle&Power,2008),wherepatchesofsavannahsare

found today. This corridor alsomatches the suture zone betweenBSR1 vs. BSR2 and

BSR3. The strong climatic fluctuations in the Neotropics during the Miocene and

Plioceneplayedacrucialroleinthediversificationofseveralorganisms(Antonellietal.,

2010).Morerecentclimatefluctuationsandassociatedlandscapemodificationsduring

thePleistocenecertainlyhelpedmaintainthediversitythatresultedfromdiversification

eventsduringtheMioceneandPlioceneperiods(Carnaval&Bates,2007).

TheouterlimitsofthethreeBSRsmatchwellthedelimitationoftheGuiananarea

retrieved for birds (Naka, 2011), confirming the relevance of qualifying the EGS as a

biogeographic area. Nonetheless, using anuran assemblages as a model revealed

biogeographic heterogeneity within this region that could not be detected with bird

assemblages, likely because birds have much higher dispersal abilities than anurans

(Pigot&Tobias,2015).Thedistinctivenessof theBSRs compared to the remainingof

thedataset isalsoreflectedinthestructureofthedendrogramillustratingthe levelof

taxonomic similarity between assemblages (Fig. 3D, E). The southern limit of BSR1

corresponds to Rio Amazonas for both TAXO1 and TAXO2. This is congruent with

previousstudiesonterrestrialvertebratesindicatingthatthisriverisastrongbarrierto

gene flowand that it structures speciesassemblages (Cortés-Ortizetal., 2003;Haffer,

2008; Ribas et al., 2012). The delineation of the western part of BSR1 differs across

Page 248: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

244

datasets. It coincides perfectly with the lower Rio Negro, and the Rio Branco and

associatedsavannahs(Rupununi) inTAXO2butextends furtherwest inTAXO1.These

differencesareinherenttothescarcersamplingwestandsouth-westoftheRioNegro

andRioBranco,weakening the sharpnessof the analysis in that zone, aphenomenon

that becomes even more prevalent in TAXO2 because of the further taxonomic

subdivisions. Another reason could be the inclusion of both forest and open habitat

speciesinouranalysis,whichcouldblurthepatterninareaswherebothsavannahand

forestarefound.

It is interestingtonote that the limitsof theBSRsof theEGSarerathersimilar

when considering either aK=3 or aK=8 decomposition, for both TAXO1 and TAXO2.

Thisindicatesthatastrongco-occurrencesignalunderliesthedelineationoftheseBSRs,

especially in the caseof the twonorthernmostones (BSR2andBSR3)whosewestern

and eastern boundaries coincide perfectly with the ones retrieved in the three-

assemblagedecomposition(Fig.3).

Conclusion.Despitebeingfarfromexhaustive,ourbarcodingdatasetisthelargestever

gathered forAmazonia,andweargue that it isclose frombeingexhaustivewithin the

EGS. Of course, the patterns we obtained need to be confirmed in other taxonomical

groups, and need even for the anurans to be much improved outside the EGS.

Nevertheless, our results help us understand the spatial scale of the sampling efforts

needed to capture the actual diversity of Amazonia. It implies notably that the

magnitudeoftheLinneanandWallaceanshortfallsinAmazoniaissolargethatwecould

questiontheconclusionsoflarge-scalestudiesbasedoncurrentlyadmittedbiodiversity

data inAmazonia (Feeley& Silman, 2011; Foden etal., 2013). In fact, evenwith very

coarsedata(IUCN),theyestimatedthatAmazonianamphibiansarehighlythreatenedby

climatechange.Consideringthatmanyspecieswerenotincludedandthattheyactually

harbour much narrower distributions, we can hypothesise that the situation is even

moreworrying.IfadegreeofendemismsimilartotheoneweestimatedwithintheEGS

actually occurs across Amazonia, the impact of habitat loss could have been

underestimated.ItisespeciallythecasealongtheArcofdeforestation(Vedovatoetal.,

2016),whereentire faunalassemblages thatmayharbourahighdegreeofendemism

Page 249: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

245

areatriskofextinction(DaSilvaetal.,2005).Moreover,onlyBSR3encompassesalarge

proportion of protected areas in the EGS. In contrast, BSR2 (northern Guyana) only

harbourstwoprotectedareasandtheBSR1onlyencompassesthreebiologicalreserves

(REBIO), four national forests (FLONA) and three national parks (PARNA) in its

Brazilian part. Such results demonstrate the importance of deciphering the basic

structureoftheAmazoniandiversityinordertoconserveitefficiently.

Page 250: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

246

Acknowledgements

Thiswork has benefited from an 'Investissement d'Avenir' grantmanaged byAgence

Nationale de la Recherche (CEBA, ref.ANR-10-LABX-25-01), France. We would like to

thank the following people for their help on the field: Daniel Baudin, Sébastien Cally,

ElodieCourtois,AndyLorenzini,BenoîtVillette.WethankPierreSolbèsatLaboratoire

Évolution et Diversité Biologique (Toulouse, France) for support with the EDB-cCacl

cluster.

Page 251: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

247

References

Aleixo, A. (2004) Historical diversification of a terra-firme forest bird superspecies: aphylogeographic perspective on the role of different hypotheses of Amazoniandiversification.Evolution,58,1303–1317.

Aleixo,A.(2006)HistoricaldiversificationoffloodplainforestspecialistspeciesintheAmazon:acasestudywithtwospeciesoftheaviangenusXiphorhynchus(Aves:Dendrocolaptidae).BiologicalJournaloftheLinneanSociety,89,383–395.

Anderson,L.O. (2012)Biome-ScaleForestProperties inAmazoniaBasedonFieldandSatelliteObservations.RemoteSensing,4.

Antonelli, A., Quijada-Mascareñas, A., Crawford, A.J., Bates, J.M., Velazco, P.M. & Wüster, W.(2010)MolecularstudiesandphylogeographyofAmazoniantetrapodsandtheirrelationtogeologicalandclimaticmodels. InHoorn,C.,Wesselingh,F.:Amazonia,LandscapeandSpeciesEvolution,1stedition.Blackwellpublishing,386–404.

Antonelli, A. & Sanmartín, I. (2011)Why are there somany plant species in the Neotropics?Taxon,60,403–414.

Bates,H.W.(1863)ThenaturalistontheRiverAmazons,arecordofadventures,habitsofanimals,sketchesofBrazilianandIndianlifeandaspectsofnatureundertheEquatorduringelevenyearsoftravel,JohnMurray,London.

Bates,J.M.,Haffer,J.&Grismer,E.(2004)AvianmitochondrialDNAsequencedivergenceacrossaheadwaterstreamoftheRioTapajós,amajorAmazonianriver.JournalofOrnithology,145,199–205.

Blei,D.M.,Ng,A.Y.&Jordan,M.I.(2003)LatentDirichletAllocation.JournalofMachineLearning,3,993–1022.

Carnaval,A.C.&Bates, J.M. (2007)AmphibianDNAshowsmarkedgenetic strucureand tracksPleistoceneclimatechangeinNortheasternBrazil.Evolution,61,2942–2957.

Claramunt, S., Derryberry, E.P., Remsen, J. V & Brumfield, R.T. (2011) High dispersal abilityinhibitsspeciationinacontinentalradiationofpasserinebirds.ProceedingsoftheRoyalSocietyB:BiologicalSciences.

Colinvaux, P.A., De Oliveira, P.E. & Bush, M.B. (2000) Amazonian and Neotropical plantcommunities on glacial time-scales: The failure of the aridity and refuge hypotheses.QuaternaryScienceReviews,19,141–169.

Cortés-Ortiz, L., Bermingham, E., Rico, C., Rodrıguez-Luna, E., Sampaio, I. & Ruiz-Garcıa, M.(2003) Molecular systematics and biogeography of the Neotropical monkey genus,Alouatta.MolecularPhylogeneticsandEvolution,26,64–81.

Cracraft, J. (1985) Historical Biogeography and Patterns of Differentiation within the SouthAmericanAvifauna:AreasofEndemism.OrnithologicalMonographs,49–84.

Dinerstein,E.,Olson,D.M.,Graham,D.J.,Webster,A.L.,Primm,S.A.,Bookbinder,M.P.&Ledec,G.(1995)AConservationAssessmentoftheTerrestrialEcoregionsofLatinAmericaandtheCaribbean,Washigton(DC):WorldBank.

Duellman,W.E.&Trueb,L.(1994)BiologyofAmphibians,JohnHopkinsUniversityPress.Emerson, B.C., Cicconardi, F., Fanciulli, P.P. & Shaw, P.J.A. (2011) Phylogeny, phylogeography,

phylobetadiversity and themolecular analysis of biological communities.PhilosophicalTransactionsoftheRoyalSocietyB:BiologicalSciences,366.

Page 252: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

248

Espíndola,A.,Ruffley,M.,Smith,M.L.,Carstens,B.C.,Tank,D.C.&Sullivan, J. (2016) Identifyingcryptic diversity with predictive phylogeography. Proceedings of the Royal Society B:BiologicalSciences,283.

Feeley, K.J. & Silman, M.R. (2016) Disappearing climates will limit the efficacy of Amazonianprotectedareas.DiversityandDistributions,22,1081–1084.

Feeley,K.J.&Silman,M.R.(2011)Thedatavoidinmodelingcurrentandfuturedistributionsoftropicalspecies.GlobalChangeBiology,17,626–630.

Fernandes, A.M., Wink, M. & Aleixo, A. (2012) Phylogeography of the chestnut-tailed antbird(Myrmeciza hemimelaena) clarifies the role of rivers in Amazonian biogeography.JournalofBiogeography,39,1524–1535.

Ferrão,M., Colatreli, O., de Fraga, R., Kaefer, I.L.,Moravec, J. & Lima, A.P. (2016)High speciesrichnessofScinaxtreefrogs(Hylidae)inathreatenedAmazonianlandscaperevealedbyanintegrativeapproach.PLoSONE,11,e0165679.

Ficetola,G.F.,Rondinini,C.,Bonardi,A.,Katariya,V.,Padoa-Schioppa,E.&Angulo,A.(2014)Anevaluationof the robustnessof global amphibian rangemaps. JournalofBiogeography,41,211–221.

Foden,W.B.,Butchart, S.H.M., Stuart, S.N., Vié, J.-C.,Akçakaya,H.R.,Angulo,A.,DeVantier, L.M.,Gutsche,A.,Turak,E.,Cao,L.,Donner,S.D.,Katariya,V.,Bernard,R.,Holland,R.A.,Hughes,A.F., O’Hanlon, S.E., Garnett, S.T., Şekercioğlu, Ç.H. &Mace, G.M. (2013) Identifying theWorld’sMostClimateChangeVulnerableSpecies:ASystematicTrait-BasedAssessmentofallBirds,AmphibiansandCorals.PLoSONE,8,e65427.

Fouquet, A., Courtois, E.A., Baudain, D., Lima, J.D., Souza, S.M., Noonan, B.P. & Rodrigues,M.T.(2015a)Thetrans-riverinegeneticstructureof28Amazonianfrogspeciesisdependentonlifehistory.JournalofTropicalEcology,31,361–373.

Fouquet,A.,Gilles,A.,Vences,M.,Marty,C.,Blanc,M.&Gemmell,N.J.(2007a)UnderestimationofspeciesrichnessinneotropicalfrogsrevealedbymtDNAanalyses.PlosOne,2,e1109.

Fouquet,A.,Ledoux, J.-B.,Dubut,V.,Noonan,B.P.&Scotti, I. (2012a)Theinterplayofdispersallimitation, rivers, and historical events shapes the genetic structure of an Amazonianfrog.BiologicalJournaloftheLinneanSociety,106,356–373.

Fouquet,A.,Loebmann,D.,Castroviejo-Fisher,S.,Padial, J.M.,Orrico,V.G.D.,Lyra,M.L.,Roberto,I.J.,Kok,P.J.R.,Haddad,C.F.B.&Rodrigues,M.T.(2012b)FromAmazoniatotheAtlanticforest:MolecularphylogenyofPhyzelaphryninaefrogsrevealsunexpecteddiversityanda striking biogeographic pattern emphasizing conservation challenges. MolecularPhylogeneticsandEvolution,65,547–561.

Fouquet,A.,Martinez,Q.,Courtois,E.A.,Dewynter,M.,Pineau,K.,Gaucher,P.,Blanc,M.,Marty,C.&Kok,P.J.R.(2013)AnewspeciesofthegenusPristimantis(Amphibia,Craugastoridae)associatedwiththemoderatelyelevatedmassifsofFrenchGuiana.Zootaxa,3750,569–586.

Fouquet,A.,Martinez,Q.,Zeidler,L.,Courtois,E.A.,Gaucher,P.,Blanc,M.,Lima,J.D.,Souza,S.M.,Rodrigues, M.T. & Kok, P.J.R. (2016) Cryptic diversity in the Hypsiboas semilineatusspeciesgroup(Amphibia,Anura)withthedescriptionofanewspeciesfromtheeasternGuianaShield.Zootaxa,4084,79–104.

Fouquet,A.,Noonan,B.P.,Rodrigues,M.T.,Pech,N.,Gilles,A.&Gemmell,N.J. (2012c)Multiplequaternary refugia in the Eastern Guiana Shield revealed by comparativephylogeographyof12frogspecies.SystematicBiology,61,461–489.

Fouquet, A., Orrico, V.G.D., Ernst, R., Blanc, M., Martinez, Q., Vacher, J.-P., Rodrigues, M.T.,

Page 253: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

249

Ouboter,P.,Jairam,R.&Ron,S.(2015b)AnewDendropsophusFitzinger,1843(Anura:Hylidae)oftheparvicepsgroupfromthelowlandsoftheGuianaShield.Zootaxa,4052,39–64.

Fouquet,A.,Recoder,R.,Teixeira Jr.,M.,Cassimiro, J.,Amaro,R.C.,Camacho,A.,Damasceno,R.,Carnaval, A.C., Moritz, C. & Rodrigues, M.T. (2012d) Molecular phylogeny andmorphometric analyses revealdeepdivergencebetweenAmazonia andAtlantic ForestspeciesofDendrophryniscus.MolecularPhylogeneticsandEvolution,62,826–838.

Fouquet, A., Vences, M., Salducci, M.D., Meyer, A., Marty, C., Blanc, M. & Gilles, A. (2007b)Revealingcrypticdiversityusingmolecularphylogeneticsandphylogeography in frogsof the Scinax ruber andRhinellamargaritifera species groups.MolecularPhylogeneticsandEvolution,43,567–582.

Funk,W.C.,Caminer,M.&Ron,S.R.(2012)HighlevelsofcrypticspeciesdiversityuncoveredinAmazonianfrogs.ProceedingsoftheRoyalSocietyB:BiologicalSciences,279,1806–1814.

Gehara,M.,Crawford,A.J.,Orrico,V.G.D.,Rodríguez,A., Lötters, S., Fouquet,A.,Barrientos,L.S.,Brusquetti,F.,DelaRiva,I.,Ernst,R.,Urrutia,G.G.,Glaw,F.,Guayasamin,J.M.,Hölting,M.,Jansen,M.,Kok,P.J.R.,Kwet,A., Lingnau,R., Lyra,M.,Moravec, J., Pombal Jr, J.P.,Rojas-Runjaic, F.J.M., Schulze, A., Señaris, J.C., Solé, M., Rodrigues,M.T., Twomey, E., Haddad,C.F.B.,Vences,M.&Köhler,J.(2014)Highlevelsofdiversityuncoveredinawidespreadnominaltaxon:continentalphylogeographyoftheneotropicaltreefrogDendropsophusminutus.PLoSONE,9,e103958.

Geurgas, S.R. & Rodrigues, M.T. (2010) The hidden diversity of Coleodactylus amazonicus(Sphaerodactylinae, Gekkota) revealed bymolecular data.Molecular Phylogenetics andEvolution,54,583–593.

Goldstein, P.Z. & DeSalle, R. (2011) Integrating DNA barcode data and taxonomic practice:Determination,discovery,anddescription.BioEssays,33,135–147.

Grün, B. & Hornik, K. (2011) topicmodels: An R Package for Fitting Topic Models. Journal ofStatisticalSoftware;Vol1,Issue13.

Haffer, J. (1974) Avian speciation in Tropical South America, Nuttall Ornithological Club, 14,Cambridge,Massachusetts.

Haffer, J. (2008)Hypotheses to explain the origin of species in Amazonia.Brazilian JournalofBiology,68,917–947.

Hall,J.P.W.&Harvey,D.J.(2002)ThephylogeographyofAmazoniarevisited:newevidencefromriodinidbutterflies.Evolution,56,1489–1497.

Hayes,F.E.&Sewlal,J.-A.N.(2004)TheAmazonRiverasadispersalbarriertopasserinebirds:effectsofriverwidth,habitatandtaxonomy.JournalofBiogeography,31,1809–1818.

Hickerson, M.J., Stahl, E.A. & Lessios, H.A. (2006) Test for Simultaneous Divergence usingApproximateBayesianComputation.Evolution,60,2435–2453.

Hollowell,T.&Reynolds,R.P.(2005)ChecklistoftheterrestrialvertebratesoftheGuianaShield.BulletinoftheBiologicalSocietyofWashington.

Hoorn, C. & Wesselingh, F.P. (2010) Introduction: Amazonia, landscape and species evolution.Amazonia:landscapeandspeciesevolution,pp.1–6.Wiley-Blackwell.

Hoorn, C., Wesselingh, F.P., ter Steege, H., Bermudez, M.A., Mora, A., Sevink, J., Sanmartín, I.,Sanchez-Meseguer, A., Anderson, C.L., Figueiredo, J.P., Jaramillo, C., Riff, D., Negri, F.R.,Hooghiemstra,H.,Lundberg,J.,Stadler,T.,Särkinen,T.&Antonelli,A.(2010)AmazoniaThrough Time: Andean Uplift, Climate Change, Landscape Evolution, and Biodiversity.Science,330,927–931.

Page 254: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

250

Hubbell,S.P.,He,F.,Condit,R.,Borda-de-Água,L.,Kellner, J.&terSteege,H.(2008)HowmanytreespeciesarethereintheAmazonandhowmanyofthemwillgoextinct?ProceedingsoftheNationalAcademyofSciences,105,11498–11504.

Hubert,N.&Hanner,R. (2015)DNABarcoding,speciesdelineationandtaxonomy:ahistoricalperspective.DNABarcodes,3,44–58.

Hughes,C.E.,Pennington,R.T.&Antonelli,A.(2013)NeotropicalPlantEvolution:AssemblingtheBigPicture.BotanicalJournaloftheLinneanSociety,171,1–18.

Jackson, N.D., Austin, C.C., Haffer, J., Capparella, A., Haffer, J., Gascon, C.,Malcolm, J., Patton, J.,Silva, M. da, Bogart, J., Hayes, F., Sewlal, J., Colwell, R., Slatkin, M., Peres, C., Patton, J.,daSilva, M., McLuckie, A., Lamb, T., Schwalbe, C., McCord, R., Fouquet, A., Ledoux, J.,Dubut,V.,Noonan,B.,Scott,I.,Brice,J.,Stølum,H.,Akin,J.,Mather,C.,Brooks,G.,Fitch,H.,Achen,P.,Jackson,N.,Austin,C.,Jackson,N.,Austin,C.,Soltis,D.,Morris,A.,McLachlan,J.,Manos,P.,Soltis,P.,Pyron,R.,Burbrink,F.,Excoffier,L.,Smouse,P.,Quattro, J., Jackson,N.,Glenn,T.,Hagen,C.,Austin,C.,Irwin,D.,Kocher,T.,Wilson,A.,Austin,C.,Spataro,M.,Peterson,S.,Jordon,J.,McVay,J.,DeWoody,J.,Schupp,J.,Kenefic,L.,Busch,J.,Murfitt,L.,Hoffman, J., Amos, W., Pompanon, F., Bonin, A., Bellemain, E., Taberlet, P., Weir, B.,Cockerham, C., Kalinowski, S., Raymond, M., Rousset, F., Stamatakis, A., Pritchard, J.,Stephens, M., Donnelly, P., Jost, L., Hedrick, P., Excoffier, L., Hedrick, P., Slatkin, M.,Balloux,F.,Lugon-Moulin,N.,Paetkau,D.,Waits,L.,Clarkson,P.,Craighead,L.,Strobeck,C.,Gaggiotti,O.,Lange,O.,Rassmann,K.,Gliddon,C.,Crawford,N.,Heller,R.,Siegismund,H., Faubet, P., Gaggiotti, O., Barton, N., Slatkin, M., Pemberton, J., Slate, J., Bancroft, D.,Barrett, J.,Dakin,E.,Avise,J.,Evanno,G.,Regnaut,S.,Goudet, J.,Brandley,M.,Guiher,T.,Pyron,R.,Winne,C.,Burbrink,F.,O’Donnell,R.,Mock,K.,Austin,J.,Lougheed,S.,Boag,P.,Fontanella,F., Feldman,C., Siddall,M.,Burbrink,F.,Guiher,T.,Burbrink,F., Starkey,D.,Shaffer, H., Burke, R., Forstner, M., Iverson, J., Zamudio, K., Savage, W., Makowsky, R.,Chesser,J.,Rissler,L.,Niemiller,M.,Fitzpatrick,B.,Miller,B.,Li,J.,Yeung,C.,Tsai,P.,Lin,R.,Yeh,C.,Postma,E.,Noordwijk,A.van,Gavrilets,S.,Tatarenkov,A.,Healey,C.,Avise,J.,Gavrilets, S., Li,H., Vose,M., Jackson, S.,Webb,R., Anderson,K.,Overpeck, J.,Webb, T.,Haywood,A.,Valdes,P.,Sellwood,B.,Kaplan,J.,Dowsett,H.,Saucier,R.,Estoup,A.,Jarne,P.,Cornuet,J.,O’Reilly,P.,Canino,M.,Bailey,K.,Bentzen,P.,Frazier,D.,Kesel,R.,Blum,M.,Guccione,M.,Wysocki,D.,Robnett,P.,Rutledge,E.,Smith,L.,Baker,J.,Killgore,K.&Kasul,R.(2013)TestingtheRoleofMeanderCutoffinPromotingGeneFlowacrossaRiverineBarrierinGroundSkinks(Scincellalateralis).PLoSONE,8,e62812.

Jansen, M., Bloch, R., Schulze, A. & Pfenninger, M. (2011) Integrative inventory of Bolivia’slowlandanuransrevealshiddendiversity.ZoologicaScripta,40,567–583.

Jenkins, C.N., Alves,M.A.S., Uezu, A. & Vale,M.M. (2015) Patterns of Vertebrate Diversity andProtectioninBrazil.PLoSONE,10,e0145064.

Jenkins,C.N.,Pimm,S.L.& Joppa,L.N. (2013)Globalpatternsof terrestrialvertebratediversityandconservation.ProceedingsoftheNationalAcademyofSciences,110,E2602–E2610.

Ji, Y.,Ashton,L., Pedley, S.M.,Edwards,D.P.,Tang,Y.,Nakamura,A.,Kitching,R.,Dolman,P.M.,Woodcock,P.,Edwards,F.A.,Larsen,T.H.,Hsu,W.W.,Benedick,S.,Hamer,K.C.,Wilcove,D.S., Bruce, C., Wang, X., Levi, T., Lott, M., Emerson, B.C. & Yu, D.W. (2013) Reliable,verifiableandefficientmonitoringofbiodiversityviametabarcoding.EcologyLetters,16,1245–1257.

Katoh, K. & Standley, D.M. (2013) MAFFT Multiple Sequence Alignment Software Version 7:Improvements inPerformanceandUsability.MolecularBiologyandEvolution,30,772–

Page 255: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

251

780.Krishna Krishnamurthy, P. & Francis, R.A. (2012) A critical review on the utility of DNA

barcodinginbiodiversityconservation.BiodiversityandConservation,21,1901–1919.Kumar, S., Stecher, G. & Tamura, K. (2016)MEGA7:Molecular Evolutionary Genetics Analysis

version7.0forbiggerdatasets.MolecularBiologyandEvolution,33,1870–1874.Lim, B.K. (2012) Preliminary assessment of Neotropical mammal DNA barcodes: an

underestimationofbiodiversity.TheOpenZoologyJournal,5,10–17.Lujan,N.K.&Armbruster,J.W.(2011)TheGuianaShield.HistoricalBiogeographyofNeotropical

FreshwaterFishes,pp.211–224.TheRegentsoftheUniversityofCalifornia.Mayle,F.E.&Power,M.J.(2008)ImpactofadrierEarly–Mid-HoloceneclimateuponAmazonian

forests.PhilosophicalTransactionsoftheRoyalSocietyB:BiologicalSciences,363,1829–1838.

Moraes,L.J.C.L.,Pavan,D.,Barros,M.C.&Ribas,C.C.(2016)Thecombinedinfluenceofriverinebarriers and flooding gradients on biogeographical patterns for amphibians andsquamatesinsouth-easternAmazonia.JournalofBiogeography,43,2113–2124.

Morrone, J.J. (2005) Biogeographic areas and transition zones of Latin America and theCaribbeanislandsbasedonpanbiogeographicandcladisticanalysesoftheentomofauna.AnnualReviewofEntomology,51,467–494.

Myers,N.,Mittermeier,R.A.,Mittermeier,C.G.,daFonseca,G.A.B.&Kent, J. (2000)Biodiversityhotspotsforconservationpriorities.Nature,403,853–858.

Naka, L.N. (2011) Avian distribution patterns in the Guiana Shield: implications for thedelimitationofAmazonianareasofendemism.JournalofBiogeography,38,681–696.

Naka, L.N., Bechtoldt, C.L., Henriques L. Magalli Pinto & Brumfield, R.T. (2012) The Role ofPhysicalBarriers in theLocationofAvianSutureZones in theGuianaShield,NorthernAmazonia.TheAmericanNaturalist,179,E115–E132.

Nelson,B.W.,Ferreira,C.A.C.,daSilva,M.F.&Kawasaki,M.L.(1990)Endemismcentres,refugiaandbotanicalcollectiondensityinBrazilianAmazonia.Nature,345,714–716.

deOliveira,D.P.,deCarvalho,V.T.&Hrbek,T.(2016)CrypticdiversityinthelizardgenusPlica(Squamata):phylogeneticdiversityandAmazonianbiogeography.ZoologicaScripta,45,630–641.

Olson,D.M.,Dinerstein,E.,Wikramanayake,E.D.,Burgess,N.D.,Powell,G.V.N.,Underwood,E.C.,D’amico,J.A.,Itoua,I.,Strand,H.E.,Morrison,J.C.,Loucks,C.J.,Allnutt,T.F.,Ricketts,T.H.,Kura, Y., Lamoreux, J.F.,Wettengel,W.W., Hedao, P. & Kassem, K.R. (2001) TerrestrialEcoregionsof theWorld:ANewMapofLifeonEarth:Anewglobalmapof terrestrialecoregionsprovidesaninnovativetoolforconservingbiodiversity.BioScience,51,933–938.

Ortega-Andrade,H.M.,Rojas-Soto,O.R.,Valencia,J.H.,EspinosadelosMonteros,A.,Morrone,J.J.,Ron,S.R.&Cannatella,D.C.(2015)InsightsfromIntegrativeSystematicsRevealCrypticDiversity inPristimantisFrogs (Anura:Craugastoridae) fromtheUpperAmazonBasin.PLoSONE,10,e0143392.

Pebesma, E. J. (2004) Multivariable geostatistics in S: the gstat package. Computers &Geosciences,30,683-691.

Pigot, A.L. & Tobias, J.A. (2015) Dispersal and the transition to sympatry in vertebrates.ProceedingsoftheRoyalSocietyB:BiologicalSciences,282,20141929.

Pimm,S.L., Jenkins,C.N.,Abell,R.,Brooks,T.M.,Gittleman, J.L., Joppa,L.N.,Raven,P.H.,Roberts,C.M. & Sexton, J.O. (2014) The biodiversity of species and their rates of extinction,

Page 256: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Appendix–BiogeographyofAmazonianAnurans

252

distribution,andprotection.Science,344.Puillandre, N., Lambert, A., Brouillet, S. & Achaz, G. (2012) ABGD, Automatic Barcode Gap

Discoveryforprimaryspeciesdelimitation.MolecularEcology,21,1864–1877.RDevelopmentCoreTeam(2016)R:ALanguageandEnvironmentforStatisticalComputing.R

FoundationforStatisticalComputingViennaAustria,0,{ISBN}3-900051-07-0.Ribas,C.C.,Aleixo,A.,Nogueira,A.C.R.,Miyaki,C.Y.&Cracraft, J. (2012)Apalaeobiogeographic

model for biotic diversification within Amazonia over the past three million years.ProceedingsoftheRoyalSocietyB:BiologicalSciences,279,681LP-689.

Señaris,J.C.&MacCulloch,R.D.(2005)Amphibians.ChecklistoftheTerrestrialVertebratesoftheGuianaShield(ed.byT.Hollowell)andR.P.Reynolds),pp.9–23.BulletinoftheBiologicalSocietyofWashingtonno.13.

Da Silva, J.M.C., Rylands, A.B.&Da Fonseca, G.A.B. (2005) The fate of theAmazonian areas ofendemism.ConservationBiology,19,689–694.

Sioli,H.(1984)TheAmazon:LimnologyandLandscapeEcologyofaMightyTropicalRiveranditsBasin,W.Junk,Dordrecht,TheNetherlands.

Valle,D.,Baiser,B.,Woodall,C.W.&Chazdon,R.(2014)DecomposingbiodiversitydatausingtheLatentDirichletAllocationmodel,aprobabilisticmultivariatestatisticalmethod.Ecologyletters,17,1591–1601.

Vedovato, L.B., Fonseca, M.G., Arai, E., Anderson, L.O. & Aragão, L.E.O.C. (2016) The extent of2014forestfragmentationintheBrazilianAmazon.RegionalEnvironmentalChange,1–6.

Vences, M., Thomas, M., van der Meijden, A., Chiari, Y. & Vieites, D.R. (2005) Comparativeperformanceofthe16SrRNAgeneinDNAbarcodingofamphibians.FrontiersinZoology,2,1–12.

Vilhena, D.A. & Antonelli, A. (2015) A network approach for identifying and delimitingbiogeographicalregions.NatureCommunications,6,6848.

Wallace, A.R. (1852) On the monkeys of the Amazon. Proceedings of the Zoological Society ofLondon,20,107–110.

Wells,K.D.(2010)TheEcologyandBehaviorofAmphibians,UniversityofChicagoPress,Chicago.Wynn, A. & Heyer,W.R. (2001) Do geographicallywidespread species of tropical amphibians

exist? An estimate of genetic relatedness within the neotropical frog Leptodactylusfuscus(Schneider,1799)(Anura,Leptodactylidae).TropicalZoology,14,255–285.

Yu, D.W., Ji, Y., Emerson, B.C., Wang, X., Ye, C., Yang, C. & Ding, Z. (2012) Biodiversity soup:metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring.MethodsinEcologyandEvolution,3,613–623.

Zeisset, I. & Beebee, T.J.C. (2008) Amphibian phylogeography: a model for understandinghistoricalaspectsofspeciesdistributions.Heredity,101,109–119.

Zizka,A.,Steege,H.Ter,Pessoa,M.D.C.R.&Antonelli,A.(2016)Findingneedlesinthehaystack:WheretolookforrarespeciesintheAmericantropics.Ecography.

Page 257: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture
Page 258: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture
Page 259: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Author:GuilhemSommeria-Klein

Title: Frommodels to data: understanding biodiversity patterns from environmentalDNAdata

Supervisors:JérômeChave,HélèneMorlon

Abstract: Integrative patterns of biodiversity, such as the distribution of taxaabundances and the spatial turnover of taxonomic composition, have been underscrutiny from ecologists for a long time, as they offer insight into the general rulesgoverning the assembly of organisms into ecological communities. Thank to recentprogressinhigh-throughputDNAsequencing,thesepatternscannowbemeasuredinafast and standardized fashion through the sequencing of DNA sampled from theenvironment (e.g. soil or water), instead of relying on tedious fieldwork and rarenaturalistexpertise.Theycanalsobemeasuredforthewholetreeoflife,includingthevast and previously unexplored diversity ofmicroorganisms. Taking full advantage ofthisnewtypeofdataischallenginghowever:DNA-basedsurveysareindirect,andsufferas such from many potential biases; they also produce large and complex datasetscompared to classical censuses. The first goal of this thesis is to investigate howstatisticaltoolsandmodelsclassicallyusedinecologyorcomingfromotherfieldscanbeadapted to DNA-based data so as to better understand the assembly of ecologicalcommunities.The secondgoal is toapply theseapproaches to soilDNAdata from theAmazonianforest,theEarth’smostdiverselandecosystem.

Twobroadtypesofmechanismsareclassically invokedtoexplaintheassemblyofecologicalcommunities:‘neutral’processes,i.e.therandombirth,deathanddispersalof organisms, and ‘niche’ processes, i.e. the interaction of the organisms with theirenvironment and with each other according to their phenotype. Disentangling therelative importance of these two types of mechanisms in shaping taxonomiccompositionisakeyecologicalquestion,withmanyimplicationsfromestimatingglobaldiversity to conservation issues. In the first chapter, thisquestion is addressedacrossthetreeoflifebyapplyingtheclassicalanalytictoolsofcommunityecologytosoilDNAsamplescollectedfromvariousforestplotsinFrenchGuiana.

The second chapter focuses on the neutral aspect of community assembly. AmathematicalmodelincorporatingthekeyelementsofneutralcommunityassemblyhasbeenproposedbyS.P.Hubbellin2001,makingitpossibletoinferquantitativemeasuresofdispersalandofregionaldiversityfromthelocaldistributionoftaxaabundances.Inthischapter,thebiasesintroducedwhenreconstructingthetaxaabundancedistributionfromenvironmentalDNAdataarediscussed,andtheirimpactontheestimationofthedispersalandregionaldiversityparametersisquantified.

The third chapter focuses on how non-random differences in taxonomiccomposition across a group of samples, resulting from various community assemblyprocesses,canbeefficientlydetected,representedandinterpreted.Amethodoriginallydesignedtomodelthedifferenttopicsemergingfromasetoftextdocumentsisappliedhere to soilDNAdata sampled along a grid over a large forest plot in FrenchGuiana.Spatialpatternsofsoilmicroorganismdiversityaresuccessfullycaptured,andrelatedtofinevariationsinenvironmentalconditionsacrosstheplot.Finally,theimplicationsofthethesisfindingsarediscussed.Inparticular,thepotentialoftopicmodellingforthemodellingofDNA-basedbiodiversitydataisstressed.

Keywords:spatialbiodiversitypatterns,speciesabundancedistribution,beta-diversity,environmental DNA, metabarcoding, soil biodiversity, tropical forest, French Guiana,statistical modeling of biodiversity, neutral theory of biodiversity, topic modeling

Page 260: École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf · Merci à Simon et aux locataires successifs de l’inénarrable maison de la culture

Auteur:GuilhemSommeria-KleinTitre : Desmodèle aux données: comprendre la structure de la biodiversité à partir de l'ADNenvironnementalDirecteursdethèse:JérômeChave,HélèneMorlonLieuetdatedesoutenance:UniversitéPaulSabatier,Toulouse,le14septembre2017

Résumé:Ladistributiondel’abondancedesespècesenunsite,etlasimilaritédelacompositiontaxonomiqued’unsiteà l’autre,sontdeuxmesuresdelabiodiversitéayantservidelonguedatede base empirique aux écologues pour tenter d’établir les règles générales gouvernantl’assemblage des communautés d’organismes. Pour ce type de mesures intégratives, leséquençage haut-débit d'ADN prélevé dans l'environnement («ADN environnemental»)représenteunealternativerécenteetprometteuseauxobservationsnaturalistestraditionnelles.Cette approche présente l’avantage d’être rapide et standardisée, et donne accès à un largeéventaildetaxonsmicrobiensjusqu’alorsindétectables.Toutefois,cesjeuxdedonnéesdegrandetailleà la structurecomplexe sontdifficilesàanalyser, et le caractère indirectdesobservationscomplique leur interprétation. Le premier objectif de cette thèse est d’identifier les modèlesstatistiques permettant d’exploiter ce nouveau type de données pour mieux comprendrel’assemblagedescommunautés.Ledeuxièmeobjectifestdetesterlesapprochesretenuessurdesdonnéesdebiodiversitédusolenforêtamazonienne,collectéesenGuyanefrançaise.

Deux grands types de processus sont invoqués pour expliquer l'assemblage descommunautésd’organismes : lesprocessus "neutres", indépendantsde l’espèce considérée,quesont la naissance, la mort et la dispersion des organismes, et les processus liés à la nicheécologiqueoccupéeparlesorganismes,c'est-à-direlesinteractionsavecl’environnementetentreorganismes.Démêlerl'importancerelativedecesdeuxtypesdeprocessusdansl’assemblagedescommunautés est une question fondamentale en écologie ayant de nombreuses implications,notamment pour l'estimation de la biodiversité et la conservation. Le premier chapitre abordecettequestionà travers la comparaisond’échantillonsd'ADNenvironnementalprélevésdans lesol de diverses parcelles forestières en Guyane française, via les outils classiques d’analysestatistiqueenécologiedescommunautés.

Le deuxième chapitre se concentre sur les processus neutres d’assemblages descommunautés. S.P. Hubbell a proposé en 2001 un modèle décrivant ces processus de façonprobabiliste,etpouvantêtreutilisépourquantifierlacapacitédedispersiondesorganismesainsique leur diversité à l’échelle régionale simplement à partir de la distribution d’abondance desespèces observée en un site. Dans ce chapitre, les biais liés à l’utilisation de l’ADNenvironnementalpourreconstituerladistributiond’abondancedesespècessontdiscutés,etsontquantifiésauregarddel’estimationdesparamètresdedispersionetdediversitérégionale.

Le troisièmechapitre seconcentresur lamanièredont lesdifférencesnon-aléatoiresdecompositiontaxonomiqueentresiteséchantillonnés,résultantdesdiversprocessusd’assemblagedes communautés, peuvent être détectées, représentées et interprétés. Un modèle statistiqueconçuàl'originepourclassifierlesdocumentsàpartirdesthèmesqu’ilsabordentesticiappliquéà des échantillons de sol prélevés selon une grille régulière au sein d’une grande parcelleforestière. La structure spatiale de la composition taxonomique des microorganismes estcaractériséeavecsuccèsetreliéeauxvariations finesdesconditionsenvironnementalesauseindelaparcelle.

Les implications des résultats de la thèse sont enfin discutées. L'accent est mis enparticulier sur lepotentieldesmodèles thématique («topicmodels»)pour lamodélisationdesdonnéesdebiodiversitéissuesdel’ADNenvironnemental.

Mots-clés : structure spatiale de la biodiversité, distribution d’abondance d’espèces, diversitébeta,ADNenvironnemental,metabarcoding,biodiversitédusol,forêttropicale,Guyanefrançaise,modélisationstatistiquedelabiodiversité,théorieneutredelabiodiversité,topicmodeling

Disciplineadministrative:EcologieIntituléetadressedulaboratoire:LaboratoireEvolution&DiversitéBiologique(EDB)UMR5174(CNRS/UPS/IRD),UniversitéPaulSabatier,Bâtiment4R1118routedeNarbonne,31062Toulousecedex9,France.