CSTB team: Complex Systems and Translational Bioinformatics

ArchivesStructure

Previous team organisation 2013-2015

Theme : LBGI Bioinformatique et Génomique Intégratives
lbgi.fr

The LBGI (Integrative Bioinformatics and Genomics) theme, led by Olivier Poch and Julie Thompson, focuses on a thriving area of health research: translational bioinformatics. Our main objective is to develop a robust IT infrastructure capable of managing big data in order to extract relevant knowledge in a "bed-patient" approach. In this context, we are particularly interested in rare genetic diseases and in the pathophysiological mechanisms underlying them, which often shed light on the biological processes altered in more common diseases such as obesity, diabetes or cancer.

Research topics

The LBGI is devoted to the development of robust, automated and integrated in silico approaches (analytical approaches, statistics, data integration and mining, knowledge extraction and representation, etc.) in order to study the evolution and behavior of complex biological systems ("hyperstructures", networks, etc.) in humans and various animal models. Building on our integrated IT approaches and long-standing collaborations at the international, national and local levels, the LBGI participates in the analysis of complex systems involved in various human diseases, including the study of functional deficiencies related to retinal or brain diseases, the identification of genetic variations related to ciliopathies and the characterization of the genomic and transcriptomic context in various cancers.

Operations

The work of the LBGI is organized around two main complementary axes:

  • "Translational IT" (Julie Thompson), to develop an IT infrastructure dedicated to the integrated analysis of the "big data" resulting from high-throughput studies of human genetic diseases. This includes the design and development of original data management systems (storage, quality control, heterogeneous data integration) and analysis tools dedicated to data mining and extraction of biomedical knowledge. An important aspect is the development of intuitive user interfaces to facilitate access by biologists and clinicians.
  • "Systems bioinformatics" (Olivier Poch/Odile Lecompte), to develop research in the emerging field of the analysis of complex biological systems, in order to understand genotype-phenotype relationships and to answer questions related to human diseases. This includes integrated studies of evolutionary, "omics" and patient data, particularly those concerning ciliopathies, and the development of a systemic approach to the relationships between mutations and biological networks in diseases.

Keywords

.........


Theme : SONIC (Stochastic Optimisation and Nature Inspired Computing)

The SONIC theme (Stochastic Optimisation and Nature Inspired Computing), led by Pierre Collet, studies and uses techniques to tackle complex problems that are intractable for exact methods. Nature-inspired methods are favoured for their robustness and their very good exploration of the search space. The team mainly uses:

  • evolutionary algorithms, including:
    • genetic algorithms (applied to discrete and combinatorial problems),
    • evolution strategies (applied to continuous problems),
    • genetic programming (applied to learning and data mining problems),
    • multi-objective evolutionary optimisation (for industrial problems that need to optimise several antagonistic criteria at the same time),
  • ant colony optimisation,
  • emerging approaches (boids, particle swarm optimisation).
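
To make the evolutionary loop behind these methods concrete, here is a minimal sketch of a (mu + lambda) evolution strategy on a continuous toy problem. It is only an illustration, not the team's code or the EASEA platform: the function names (`sphere`, `evolve`), the parameter values and the fixed mutation step `sigma` are all assumptions chosen for brevity.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def evolve(fitness, dim=5, mu=10, lam=40, sigma=0.3, generations=200, seed=0):
    """Minimal (mu + lambda) evolution strategy with a fixed Gaussian mutation step.

    All parameter names and defaults are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    # Initial population: mu random points in [-5, 5]^dim.
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = rng.choice(population)
            # Mutation: add Gaussian noise to every coordinate of the parent.
            offspring.append([v + rng.gauss(0, sigma) for v in parent])
        # "Plus" selection: parents and offspring compete, the mu best survive.
        population = sorted(population + offspring, key=fitness)[:mu]
    return population[0]


best = evolve(sphere)
print("best fitness:", sphere(best))
```

Because selection keeps the best of parents and offspring, the best fitness never gets worse from one generation to the next. Each of the `lam` offspring is evaluated independently, which is the kind of workload that parallelises well.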

The team is among the international leaders in the use of massively parallel graphics cards (GPGPU) for scientific computation by artificial evolution and for artificial intelligence. It was the first to obtain speed-ups of about three orders of magnitude over a modern CPU core on generic optimisation problems, using the EAsy Specification of Evolutionary Algorithms (EASEA) platform [1]. Typically, one day of computation on a machine with multiple GPU cards is equivalent to several years of computation on a modern PC-compatible computer, which makes it possible to tackle problems that cannot be addressed by other techniques.

The goal is ambitious: to implement a true artificial intelligence that is competitive with human intelligence on a PC-type computer equipped with several graphics cards. There are two main types of projects: fundamental projects dealing with the adaptation of evolutionary algorithms to the characteristics of these new cards, and applied projects that test the developed algorithms on real problems, which are often very different from toy problems such as benchmarks.

Keywords

.........


Theme : Theoretical Bioinformatics

The theoretical bioinformatics theme has been led by Christian Michel for more than 30 years.

(i) Combinatorial study of circular codes (C. Michel) Scientific context: Circular codes were discovered in genes in 1996. These sets of words are very poorly understood from a mathematical point of view. Results: A new concept in the so-called "pearl" code theory makes it possible to describe varieties of comma-free codes and circular codes. Strong circular codes, which are more constrained than comma-free codes, have been identified. Recently (2016), a graph-theoretic approach was used to obtain new theorems on circular codes formed by words of finite length over a finite alphabet.
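
As an illustration of the two kinds of codes mentioned above, the following sketch tests a small set of equal-length words for the comma-free property (directly from its definition) and for circularity. The circularity test relies on an acyclicity criterion for a graph built from the prefixes and suffixes of the words; this is our understanding of the graph-theoretic characterisation, and it is stated here as an assumption rather than as the authors' exact formulation. The function names and example codes are hypothetical.

```python
from itertools import product


def is_comma_free(code):
    """True if no word of `code` occurs in a shifted frame of any concatenation x + y.

    Assumes all words have the same length n (e.g. n = 3 for trinucleotides).
    """
    words = set(code)
    n = len(next(iter(words)))
    for x, y in product(words, repeat=2):
        pair = x + y
        # Shifts 1..n-1 are the "wrong" reading frames inside x + y.
        if any(pair[k:k + n] in words for k in range(1, n)):
            return False
    return True


def code_graph(code):
    """Directed graph with an edge prefix -> suffix for every split of every word."""
    graph = {}
    for word in code:
        for i in range(1, len(word)):
            graph.setdefault(word[:i], set()).add(word[i:])
            graph.setdefault(word[i:], set())
    return graph


def is_circular(code):
    """Assumed criterion: the code is circular iff its prefix/suffix graph is acyclic."""
    graph = code_graph(code)
    unvisited, in_progress, done = 0, 1, 2
    state = dict.fromkeys(graph, unvisited)

    def has_cycle(node):
        # Depth-first search; meeting an "in progress" node means a cycle.
        state[node] = in_progress
        for nxt in graph[node]:
            if state[nxt] == in_progress:
                return True
            if state[nxt] == unvisited and has_cycle(nxt):
                return True
        state[node] = done
        return False

    return not any(state[v] == unvisited and has_cycle(v) for v in graph)


example = {"AAC", "ACT", "TGG"}
print(is_circular(example), is_comma_free(example))
```

In this example, the concatenation AAC·TGG contains ACT in a shifted frame, so the set is not comma-free, while its graph has no cycle. By contrast, {"ACG", "CGA"} contains two circular permutations of the same word and yields the cycle A → CG → A.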

(ii) Probabilistic models of gene evolution by substitution of genetic motifs (E. Benard, C. Michel) Scientific context: Classical nucleotide evolution models (Jukes and Cantor, 1969; Kimura, 1980, 1981) are generalized to genetic motifs of finite size. Results: Using a mathematical approach based on the Kronecker operators (product and sum), these extended models give the exact occurrence probability (analytic solution) of a genetic motif of any size (dinucleotides, trinucleotides, etc.) over time, as a function of the substitution parameters (transitions and transversions) associated with each site of the motif. Evolution can be followed in the direct sense (from the past to the present) and in the reverse sense (from the present to the past). The introduction of these Kronecker operators made it possible to solve the problem of probabilistic models of gene evolution by substitution of genetic motifs, which had been open since 1990.
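
The role of the Kronecker operators can be illustrated on the simplest case. The sketch below is not the authors' derivation: it assumes that the two sites of a dinucleotide evolve independently, each under a Jukes-Cantor model with a hypothetical rate $\alpha$ per substitution, and uses the standard identity $e^{A \oplus B} = e^{A} \otimes e^{B}$. The symbols $J_4$ (all-ones matrix), $I_4$ (identity), $Q_1$ and $Q_2$ are notation introduced here.

```latex
% One site, Jukes-Cantor: every off-diagonal rate equals \alpha.
Q = \alpha\,(J_4 - 4 I_4), \qquad P(t) = e^{tQ}, \qquad
P_{ii}(t) = \tfrac14 + \tfrac34\, e^{-4\alpha t}, \qquad
P_{ij}(t) = \tfrac14 - \tfrac14\, e^{-4\alpha t} \quad (i \neq j).

% Two independent sites with rate matrices Q_1 and Q_2 (16 dinucleotide states):
Q^{(2)} = Q_1 \oplus Q_2 = Q_1 \otimes I_4 + I_4 \otimes Q_2, \qquad
P^{(2)}(t) = e^{t\,(Q_1 \oplus Q_2)} = e^{tQ_1} \otimes e^{tQ_2}.

% Example with Q_1 = Q_2 = Q: probability that the dinucleotide AA is again AA at time t.
P^{(2)}_{AA,AA}(t) = \left(\tfrac14 + \tfrac34\, e^{-4\alpha t}\right)^{2}.
```

The Kronecker sum builds the generator for the motif from the per-site generators, and the Kronecker product then gives the motif-level transition probabilities in closed form. The same pattern extends to trinucleotides with three factors.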

(iii) Probabilistic models of gene evolution by substitution, insertion and deletion of genetic motifs (S. Lèbre, C. Michel) Scientific context: There are very few probabilistic models of gene evolution involving nucleotide substitution, insertion and deletion processes. One reason is the mathematical difficulty, both in the modeling itself and in the determination of analytical solutions. Results: We developed a more general class of evolution models in which the insertion and deletion parameters are explicit parameters independent of the substitution parameters. The idea is to introduce a concept derived from population dynamics in order to obtain a system of differential equations combining the classical substitution process with the insertion/deletion process. By deriving a general solution valid for any diagonalizable substitution matrix, we obtained an analytic expression of the occurrence probability of the nucleotides as a function of time, the eigenvalues and eigenvectors of the substitution matrix, the vector of nucleotide insertion rates, the total insertion rate and the vector of initial nucleotide probabilities. The analytic solutions are nontrivial and involve Gaussian hypergeometric functions and Kronecker operators (product and sum). Various mathematical properties were obtained: time scale, time decomposition, time inversion and time transformation as a function of the length of the sequence.

(iv) Stochastic models for the inference of genetic networks (S. Lèbre) These stochastic approaches focus on the reconstruction of gene regulatory networks. We developed the ARTIVA (Auto Regressive TIme VArying) network model, whose distinctive feature is a dependency structure that varies over time for continuous data. A reversible-jump Markov chain Monte Carlo (MCMC) method was specifically adapted to infer this model from gene expression time series. We then refined the model by introducing an exchange of information between the successive structures of the network. Following the recent transfer of Sophie Lèbre to the University of Montpellier, this research theme has been discontinued.

Keywords

.........