
UMR CNRS 5558 - LBBE "Biométrie et Biologie Évolutive" UCB Lyon 1 - Bât. Grégor Mendel 43 bd du 11 novembre 1918 69622 VILLEURBANNE cedex
I am generally interested in developing statistical and machine learning methods to solve problems in molecular biology.
In particular, I have ongoing projects on:
- GWAS for bacterial genomes and metagenomes, with collaborators at Pendulum and UC Berkeley.
- Improving the inference of phylogenetic trees, with local collaborators at LBBE.
- Predicting phenotypes from sequences using sparse learning and convolutional networks, with collaborators at Inria.
- To appear
Laurent Jacob, Anke Witteveen, Inès Beumer, Leonie Delahaye, Diederik Wehkamp, Jeroen van den Akker, Mireille Snel, Bob Chan, Arno Floore, Niels Bakx, Guido Brink, Coralie Poncet, Jan Bogaerts, Mauro Delorenzi, Martine Piccart, Emiel Rutgers, Fatima Cardoso, Terence Speed, Laura van ’t Veer and Annuska Glas.
Controlling technical variation amongst 6693 patient microarrays of the randomized MINDACT trial
Published in Communications Biology (2020).
Dexiong Chen, Laurent Jacob, and Julien Mairal.
Convolutional Kernel Networks for Graph-Structured Data
To appear in ICML 2020.
Thierry Wirth, Marine Bergot, Jean-Philippe Rasigade, Bruno Pichon, Maxime Barbier, Patricia Martins-Simoes, Laurent Jacob, Rachel Pike, Pierre Tissieres, Jean-Charles Picaud, Angela Kearns, Philip Supply, Marine Butin, Frédéric Laurent.
Niche specialization and spread of a multidrug-resistant Staphylococcus capitis clone involved in neonatal sepsis
Published in Nature Microbiology (2020).
Dexiong Chen, Laurent Jacob, and Julien Mairal.
Recurrent Kernel Networks
Published in Adv. Neural Information Processing Systems (NeurIPS) 2019.
Dexiong Chen, Laurent Jacob, and Julien Mairal.
Biological Sequence Modeling with Convolutional Kernel Networks.
Short version accepted in RECOMB 2019, full version published in Bioinformatics 2019.
Laurent Jacob, Florence Combes, and Thomas Burger.
PEPA test: fast and powerful differential analysis from relative quantitative proteomics data using shared peptides
Published in Biostatistics 2019.
Johann Gagnon-Bartsch, Laurent Jacob, and Terence P. Speed.
Removing unwanted variation from high dimensional data with negative controls.
To appear as a monograph from the Institute of Mathematical Statistics.
- Technical reports
Magali Jaillard, Maud Tournoud, Leandro Lima, Vincent Lacroix, Jean-Baptiste Veyrieras, and Laurent Jacob (2017).
Representing Genetic Determinants in Bacterial GWAS with Compacted De Bruijn Graphs
Laurent Jacob and Terence P. Speed (2016).
The healthy ageing gene expression signature for Alzheimer’s disease diagnosis: a random sampling perspective. Short version published in Genome Biology.
Miles E. Lopes, Laurent Jacob, and Martin J. Wainwright (2011).
A more powerful two-sample test in high dimensions using random projection.
Guillaume Obozinski*, Laurent Jacob*, and Jean-Philippe Vert (2011).
Group lasso with overlaps: the latent group lasso approach.
Anne-Claire Haury, Laurent Jacob, and Jean-Philippe Vert (2010).
Improving stability and interpretability of gene expression signatures.
- Software
The code I use for my research is available either as a tarball or an R package:
- CKN-seq: convolutional kernel networks for transcription binding site prediction.
- OnAge: test of between-group differences in the onset of senescence.
- dbgwas: software and data used in our bacterial GWAS paper.
- Code used in the pepa test proteomics paper.
- Code and data used to generate the plots in our random sampling study of the healthy ageing signature paper.
- RUVnormalize: remove unwanted variation from gene expression data (bioconductor).
- FlipFlop: identify transcripts and their abundances from RNASeq data (bioconductor).
- NCIGraph: use NCI pathway database integration networks in R (bioconductor).
- DEGraph: detect differentially expressed gene networks from expression data (bioconductor).
- Overlasso: code and data used for the ICML 2009 overlapping group lasso paper, for reproducibility purposes. For efficiency, I recommend using the SPAMS library instead.
- Clustered multi-task: code and data used for the NIPS 2008 multi-task paper.
- GPCR: code and data used for the 2008 GPCR paper.
- MHC: code and data used for the 2008 epitope prediction paper. We also developed KISS, a web application.
- Teaching
- From 2021, "Advanced machine learning theory", Master 2 Mathématiques avancées, ENS Lyon.
- Since 2020, "Introduction to big data and machine learning", Master 2 bioinformatique, Univ. Lyon 1.
- Since 2018, "Statistical learning, theory and applications", CNRS formation entreprises.
- Since 2014, "Statistique en grande dimension pour la génomique", Master 2 Mathématique en Action, Univ. Lyon 1.
- Fall 2015, "Statistical learning and applications", M2 Informatique, ENS Lyon.
- Fall 2010, Teaching assistant for the "PB HLTH C240C/STAT C245C: Biostatistical Methods: Computational Statistics with Applications in Biology and Medicine" class, UC Berkeley.
- Fall 2010, "Using networks in machine learning and statistics", BioInfoSummer 2010 summer school, Melbourne, Australia.
- Fall 2010, "Statistical Methods and Software for mRNA-Seq and ChIP-Seq", 3-day class at Centro de Investigacion Principe Felipe, Valencia, Spain.
- Spring 2007-2009, "Apprentissage en bioinformatique et noyaux pour structures" for the machine learning module of École des Mines de Paris (in French). Lecture slides, material and questions for the practical session.