Thursday, March 28 at 13:00

Tristan Mary Huard (INRA-Moulon)

by Vincent Daubin - March 28

Some contributions to the estimation of genetic distances between populations

We consider the problem of evaluating the level of divergence between K
populations. Each population is characterized by its allelic frequency profile, where allelic
frequencies are assumed to be estimated from a sample at several (typically thousands or
millions of) markers. In this context the FST is a widely used criterion for quantifying the
divergence between two populations, and it can also be adapted to detect genomic regions
that exhibit a divergence level substantially higher than the rest of the genome.
Still, the concept of FST remains ambiguous, with different available definitions assumed to
be "connected" in some sense, and how to estimate the FST when there are more
than 2 populations is still an open question, the most popular strategy being to consider all
possible pairs of populations successively.
In this presentation we will first propose a hierarchical model for the history of population
divergence and show that the two classical definitions of the FST (as provided by Hudson
and by Weir & Cockerham) actually measure independent quantities. We will then provide an
estimation procedure based on the moment estimators suggested by Bhatia (in the case
of 2 populations) and show how both the FST components and the history of population
divergence may be jointly estimated. Lastly, we will consider the problem of detecting genomic
regions under selection and provide a segmentation procedure for the identification of such
regions. Both the estimation and the segmentation procedures will be illustrated on the 1KG
human genome dataset, which gathers several human populations sampled around the world.
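
As a point of reference for the two-population case mentioned above, here is a minimal Python sketch of a moment estimator of Hudson's FST in the "ratio of averages" form commonly attributed to Bhatia and co-authors, combined with the usual all-pairs strategy for K populations. It is an illustration under stated assumptions, not the procedure presented in the talk: the function names hudson_fst and pairwise_fst are hypothetical, n1 and n2 are assumed to be the numbers of sampled allele copies and to be constant across markers, and the hierarchical model and joint estimation described in the abstract are not covered.

```python
from itertools import combinations


def hudson_fst(p1, p2, n1, n2):
    """Moment estimate of Hudson's FST between two populations.

    p1, p2: per-marker sample allele frequencies (sequences of floats in [0, 1]).
    n1, n2: number of sampled allele copies per population
            (assumption: the same at every marker, and at least 2).
    The estimate is a ratio of sums over markers ("ratio of averages").
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must cover the same markers")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        return float("nan")  # no between-population variation to normalise by
    return num / den


def pairwise_fst(freqs, sizes):
    """Apply hudson_fst to every pair of populations (the all-pairs strategy).

    freqs: dict mapping population name -> list of allele frequencies.
    sizes: dict mapping population name -> number of sampled allele copies.
    """
    return {
        (i, j): hudson_fst(freqs[i], freqs[j], sizes[i], sizes[j])
        for i, j in combinations(sorted(freqs), 2)
    }
```

Calling pairwise_fst on a dictionary of K frequency profiles returns one estimate per pair of populations, K(K-1)/2 values in total. Note that the estimate can be slightly negative when two populations have nearly identical frequencies, because of the sample-size correction in the numerator.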