UMR CNRS 5558 - LBBE "Biométrie et Biologie Évolutive" UCB Lyon 1 - Bât. Grégor Mendel 43 bd du 11 novembre 1918 69622 VILLEURBANNE cedex
I am generally interested in developing statistical and machine learning methods to solve problems in molecular biology.
In particular, I have ongoing projects on:
- Finding genetic determinants of bacterial resistances, with collaborators at Biomérieux.
- Improving the inference of phylogenetic trees, with local collaborators at LBBE.
- Detecting differential splicing, with local collaborators at LBBE.
- Removing unwanted variation from gene expression data, with collaborators at Agendia, UCSF and WEHI.
- To appear
Johann Gagnon-Bartsch, Laurent Jacob, and Terence P. Speed.
Removing unwanted variation from high dimensional data with negative controls. To appear as a monograph from the Institute of Mathematical Statistics.
- Technical reports
Magali Jaillard, Maud Tournoud, Leandro Lima, Vincent Lacroix, Jean-Baptiste Veyrieras, and Laurent Jacob (2017).
Representing Genetic Determinants in Bacterial GWAS with Compacted De Bruijn Graphs.
Laurent Jacob and Terence P. Speed (2016).
The healthy ageing gene expression signature for Alzheimer’s disease diagnosis: a random sampling perspective.
Miles E. Lopes, Laurent Jacob, and Martin J. Wainwright (2011).
A more powerful two-sample test in high dimensions using random projection.
Guillaume Obozinski*, Laurent Jacob*, and Jean-Philippe Vert (2011).
Group lasso with overlaps: the latent group lasso approach.
Anne-Claire Haury, Laurent Jacob, and Jean-Philippe Vert (2010).
Improving stability and interpretability of gene expression signatures.
The code I use for my research is available either as a tarball or as an R package:
- Code and data used to generate the plots in our random sampling study of the healthy ageing signature paper.
- RUVnormalize: remove unwanted variation from gene expression data (Bioconductor).
- FlipFlop: identify transcripts and their abundances from RNA-Seq data (Bioconductor).
- NCIGraph: use NCI pathway database integration networks in R (Bioconductor).
- DEGraph: detect differentially expressed gene networks from expression data (Bioconductor).
- Overlasso: code and data used for the ICML 2009 overlapping group lasso paper, provided for reproducibility purposes. For efficiency, I recommend using the SPAMS library instead.
- Clustered multi-task: code and data used for the NIPS 2008 multi-task paper.
- GPCR: code and data used for the 2008 GPCR paper.
- MHC: code and data used for the 2008 epitope prediction paper. We also developed KISS, a web application.
- Spring 2014-2017, "Statistique en grande dimension pour la génomique", Master 2 Mathématique en Action, Univ. Lyon 1.
- Fall 2015, "Statistical learning and applications", M2 Informatique, ENS Lyon.
- Fall 2010, teaching assistant for the "PB HLTH C240C/STAT C245C: Biostatistical Methods: Computational Statistics with Applications in Biology and Medicine" class, UC Berkeley.
- Fall 2010, "Using networks in machine learning and statistics", BioInfoSummer 2010 summer school, Melbourne, Australia.
- Fall 2010, "Statistical Methods and Software for mRNA-Seq and ChIP-Seq", 3-day class at Centro de Investigacion Principe Felipe, Valencia, Spain.
- Spring 2007-2009, "Apprentissage en bioinformatique et noyaux pour structures" for the machine learning module of École des Mines de Paris (in French). Lecture slides, material and questions for the practical session.
Here are a few presentations I gave on some of my projects:
- FlipFlop: identify transcripts and their abundances from RNA-Seq data.
- RUV: remove unwanted variation from gene expression data.
- DEGraph: detect differentially expressed gene networks from expression data.