Place: Amphithéâtre de la Délégation du CNRS
THESIS
Thesis defense of Hugo Menet
Jury:
Lars Arvestad, Professor, Stockholm University, Sweden, Reviewer
Catherine Matias, Research Director, CNRS, Reviewer
Gergely Szöllősi, Researcher, Eötvös Loránd University, Budapest, Hungary, Reviewer
Sabine Peres, Professor, Lyon 1, Examiner
Eric Tannier, Research Director, INRIA, Thesis supervisor
Vincent Daubin, Research Director, CNRS, Thesis co-supervisor
Abstract:
Biological systems like holobionts are made of entities at many scales
(macro-organisms, micro-organisms, genes...), which are, on the one hand, bound to a
common history because they all function together and depend on one another, and,
on the other hand, driven by their individual interests. The evolution of such a system
is approached by phylogenetic reconciliation, which describes the coevolution of two
different levels: genes and species, or hosts and symbionts, for example. This
restriction to two levels has confined the use of reconciliation to either molecular
studies on gene and species trees or ecological studies on host-symbiont associations.
The holobiont concept is an opportunity to bring all these scales together by modeling
multi-level inter-dependencies. In this thesis, we explore and extend reconciliation to
model such multi-level systems.
Phylogenetic reconciliation is a phylogenetic method that arose at the intersection of
two communities: one studying the coevolution of hosts and symbionts, the other
comparing gene and species trees. Despite this shared origin, the two communities
now tend not to interact much, even though they have a lot to learn from each other. We
review the development of these methods, take a generic approach, and highlight
the recent advances that propose more integrative models, moving toward multi-level
reconciliation.
In recent years, new methods have been proposed to integrate the evolution of
species, gene and gene domain, or geography, host and symbiont, but none has
yet investigated the levels at the heart of the holobiont: host, symbiont and genes,
and none works in a probabilistic setting with horizontal transfers. We reimplemented
ALE, a probabilistic DTL reconciliation software, and extended it to consider the
reconciliation of three levels: host, symbiont, and gene. This new probabilistic
model of the evolution of three nested levels allows gene transfer, host switch, gene
duplication, symbiont diversification inside a host, and gene or symbiont loss. Given
three phylogenetic trees, we devise a Monte Carlo algorithm able to infer joint
scenarios and compute their likelihood under the model, accounting for the dependence
of gene transfer rates on the host-symbiont reconciliation as well as the impact of
ghost lineages on these rates. As in ALE, we use amalgamation to account for
uncertainty in the gene trees, but also to infer the symbiont tree, using universal
single-copy genes to provide a distribution of topologies for that tree. This method was
evaluated using a simulated dataset on which we showed its capacity to distinguish
models of 2-level and 3-level coevolution using the computed likelihood. On
aphid/enterobacteria systems, it retrieves transfers better than the host-unaware
method.
With a potentially exponential number of most parsimonious solutions, reconciliation
output can be hard to interpret, notably when considering multiple sampled
scenarios or multiple gene families, and even more so for multi-level
systems. Few graphical tools exist, and none is both generic and able to read
RecPhyloXML, a common format endorsed by the gene-species community. We propose
Thirdkind, a software tool we developed that outputs a graphical display of a
reconciliation scenario as an SVG file. It is easy to install and use. It can handle the
embedding of three trees produced by our 3-level reconciliation framework
and can summarize the evolution of multiple gene families or sampled scenarios in a
single figure by aggregating redundant transfers.
A fascinating example of a complex coevolutionary history is the relationship between
Helicobacter pylori and its human host. Helicobacter pylori is a bacterial pathogen
that is believed to have followed its human host during its ancestral migrations,
through the colonization of Africa, Asia, Europe, Oceania, and the Americas. The
bacterial strains are structured in populations whose geographical distribution is
mostly congruent with that of their host. One of the significant discrepancies is
the European population, which seems to result from introgression between two
ancestral populations, one related to modern African populations and the other to
modern Asian populations. These hypotheses rely on Bayesian models that assign SNPs
to populations, using whole genomes or a small subset of genes via Multi Locus
Sequence Typing. We
took a more phylogeny-focused approach using a dataset constructed in the team,
with a phylogeny for 120 strains, including the ancestral strain found in
Ötzi, dated to 5,000 years ago, and 1034 gene trees. We applied reconciliation to gene
trees and population trees to better understand the mixed origins of the genes in
the European population. This new approach, which relies on matching certain
leaves of the gene trees (here the European ones) uniformly to all the leaves of the
upper tree and then looking at the posterior probability of each match, could easily
be transposed to other problems. We also used our 3-level reconciliation framework to
test different population trees.