Presentation of the CSTB team
Complex Systems are present everywhere around us. They can be defined as interacting reconfigurable entities, structured on several emerging levels of organization, in which the whole cannot be understood without the parts and the parts cannot be understood without the whole.
A "complex system" is generally any system comprising a large number of heterogeneous entities, between which local interactions create multiple levels of structuring and collective organization. Examples include natural systems ranging from biomolecules and living cells to social networks and the ecosphere, as well as sophisticated artificial systems such as the Internet, large power grids or any large scale distributed software.
Biological systems are unique in the complexity of their functioning and regulation, and the integrated study of the multiple levels that contribute to the final behavior of these systems now represents a new challenge for the scientific community. Thanks to the ever increasing quantities of data that describe each component of the system in detail, new opportunities exist to develop descriptive and predictive modeling approaches. These developments are applicable to the whole field of complex systems science, from social networks to finance.
In medicine, this 'systemic' awareness has led to the emergence of a new field of interdisciplinary research: translational medicine. This field aims to understand and exploit the diversity of clinical and phenotypic manifestations of diseases in patients to better understand and model the emergence and evolution of diseases. Ultimately, the aim of these developments is to lead to optimized and personalized treatments.
The "Complex Systems and Translational Bioinformatics" (CSTB) team covers a broad spectrum of research in computer science, from bioinformatics to artificial intelligence.
In this context, CSTB intends to actively participate in 4P developments (Participative, Predictive, Preventive and Personalized) by developing original solutions in the fields of education, health or industry within two research themes:
- BIOGIM has long-standing experience in the analysis, annotation and mining of biomedical data. In particular, in the field of rare genetic diseases, BIOGIM seeks to identify the associations between genotype and phenotype and to understand the patterns and trends in the data. Traditional methods, which have been successful in the study of simple systems, are limited when applied to complex dynamic systems, where the genetic background of each patient carries a large number of variations that interact with each other to produce effects from the atomic level up to the whole organism.
- The questions we are now addressing focus on how to identify critical points in a complex biological system and how to predict the impact of disturbances (mutations, drugs, for example) on the stability and behavior of the system. This requires a multi-scale and multi-modal theoretical modeling of the biological functions and their regulation that underlie the observed phenotypes, while taking into account their dynamic interactions with the environment.
- BIONICS has expertise in the modeling of complex systems and in nature-inspired optimization algorithms, including artificial evolution and artificial immune systems. These systems are inherently massively parallel and asynchronous, which matches the IT of the 21st century, built from networks of massively parallel computers.
- The applications of nature-inspired complex systems include IT security and the search for patterns (artificial immune systems), optimization and artificial intelligence (artificial evolution), ecosystems for computation and teaching (biological ecosystems) and, of course, participatory, predictive, preventive and personalized translational medicine (which applies to all complex systems).
- Indeed, on the basis of the observed data ("Participative"), we try to determine "Predictive" models that make it possible to implement "Preventive" measures in a "Personalized" way, for the Factory of the Future, for IT security, for health (patient networks) and for education (student / teacher networks).
The team also coordinates the BICS (Bio-Informatics and Complex Systems) platform of the ICUBE laboratory, offering the community a unique portal to databases and software for bioinformatics (BISTRO), data mining (ClowdFlows), massively parallel computation (EASEA CLOUD) and education (POEM).
Internationally, the team coordinates (with Le Havre University) the UNESCO UniTwin CS-DC, a Complex Systems Digital Campus that brings together more than 120 universities (>3 million students, >3000 researchers in 28 countries).
Previous team organisation 2013-2015
The LBGI (Bioinformatics and Integrative Genomics) theme, led by Olivier Poch and Julie Thompson, focuses on a thriving area of health research: translational bioinformatics. Our main objective is to develop a robust IT infrastructure capable of managing big data in order to extract relevant knowledge in a "bedside" approach. In this context, we are particularly interested in the study of rare genetic diseases and in understanding the pathophysiological mechanisms involved in these diseases, which are often also relevant to understanding altered biological processes in more common diseases, such as obesity, diabetes or cancer.
The LBGI is devoted to the development of robust, automated and integrated in silico approaches (analytical approaches, statistics, data integration and mining, extraction and representation of knowledge...) in order to study the evolution and behavior of complex biological systems ("Hyperstructures", networks, etc.) in humans and various animal models. Taking advantage of our integrated IT approaches and long-standing collaborations at the international, national and local levels, the LBGI participates in the analysis of complex systems involved in various human diseases, including the study of functional deficiencies related to diseases of the retina or the brain, the identification of genetic variations related to ciliopathies and the characterization of the genomic and transcriptomic context in various cancers.
The work of the LBGI is organized around two main complementary axes:
- "Translational IT" (Julie Thompson), to develop an IT infrastructure dedicated to the integrated analysis of the "big data" resulting from high-throughput studies of human genetic diseases. This includes the design and development of original data management systems (storage, quality control, heterogeneous data integration) and analysis tools dedicated to data mining and extraction of biomedical knowledge. An important aspect is the development of intuitive user interfaces to facilitate access by biologists and clinicians.
- "Systems bioinformatics" (Olivier Poch/Odile Lecompte), to develop research in the emerging field of the analysis of complex biological systems, in order to understand genotype-phenotype relationships and to answer questions related to human diseases. This includes integrated studies of evolutionary, "omics" and patient data, particularly those concerning ciliopathies, and the development of a systemic approach to the relationships between mutations and biological networks in diseases.
The SONIC theme (Stochastic Optimisation and Nature Inspired Computing), led by Pierre Collet, studies and uses techniques to tackle complex problems that cannot be solved by exact methods. Nature-inspired methods are favored for their robustness and their very good exploration of the search space. The team mainly uses:
- evolutionary algorithms, including:
  - genetic algorithms (applied to discrete and combinatorial problems),
  - evolution strategies (applied to continuous problems),
  - genetic programming (applied to learning and data mining problems),
  - multi-objective evolutionary optimisation (for all industrial problems that need to optimize several antagonistic criteria at the same time),
- ant colony optimisation,
- emerging approaches (BOIDS, particle swarm optimisation).
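As an illustration of the first family in the list above, here is a minimal sketch of a generational genetic algorithm on a discrete toy problem (OneMax: maximise the number of 1 bits). It is not the EASEA platform nor the team's code; the function names, the operators chosen (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions made for the example.

```python
import random


def one_max(bits):
    """Toy fitness: number of 1 bits (to be maximised)."""
    return sum(bits)


def tournament(pop, fitness, rng, k=3):
    """Return the fittest of k individuals drawn at random."""
    return max(rng.sample(pop, k), key=fitness)


def evolve(n_bits=30, pop_size=40, generations=60, p_mut=None, seed=0):
    """Hypothetical minimal genetic algorithm; returns (best, history)."""
    rng = random.Random(seed)
    if p_mut is None:
        p_mut = 1.0 / n_bits  # on average one flipped bit per child
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=one_max)
        history.append(one_max(best))
        children = [best[:]]  # elitism: the best individual survives
        while len(children) < pop_size:
            p1 = tournament(pop, one_max, rng)
            p2 = tournament(pop, one_max, rng)
            cut = rng.randrange(1, n_bits)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b
                     for b in child]              # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=one_max)
    history.append(one_max(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", history)
```

Because every child is evaluated independently of the others, the inner loop is the part that lends itself to parallel evaluation; this sketch runs it sequentially.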
The team is currently at the highest international level in the use of massively parallel graphics cards (GPGPU) for scientific computation by artificial evolution and for artificial intelligence, having been the first to obtain speed-ups of about three orders of magnitude compared to a modern CPU core on generic optimization problems with the EASEA (EAsy Specification of Evolutionary Algorithms) platform. Typically, one day of calculation on a computer with multiple GPU cards becomes equivalent to several years of computation on a modern PC-compatible computer, which makes it possible to tackle problems that cannot be addressed by other techniques.
The goal is ambitious: to implement a true artificial intelligence that is competitive with human intelligence on a PC-type computer equipped with several graphics cards. There are two main types of projects: fundamental projects dealing with the adaptation of evolutionary algorithms to the characteristics of these new cards, and applied projects that test the developed algorithms on real problems, which are often very different from toy problems such as benchmarks.
The theoretical bioinformatics theme has been led by Christian Michel for more than 30 years.
(i) Combinatorial study of circular codes (C. Michel) Scientific context: Circular codes were discovered in genes in 1996. These sets of words are very poorly understood from a mathematical point of view. Results: A new concept in the so-called "pearl" code theory allows us to describe varieties of comma-free codes and circular codes. Strong circular codes, which are more constrained than comma-free codes, were identified. In 2016, a graph-theoretic approach was used to obtain new theorems on circular codes formed by words of finite length over a finite alphabet.
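To make these notions concrete, the following sketch checks two properties of a small set of trinucleotides. The comma-free test follows the usual definition (no word of the code may be read in a shifted frame across two concatenated words). The circularity test assumes the graph criterion that a code is circular when its associated graph, with one edge from each proper prefix of a word to the matching suffix, has no cycle; the text above does not state which theorems were obtained, so treat this criterion and all function names as assumptions of the example.

```python
from itertools import product


def is_comma_free(code):
    """True if no word of the code appears in a shifted reading frame
    of the concatenation of any two words of the code."""
    words = set(code)
    n = len(next(iter(words)))
    for x, y in product(words, repeat=2):
        joined = x + y
        for shift in range(1, n):
            if joined[shift:shift + n] in words:
                return False
    return True


def code_graph(code):
    """Graph with an edge prefix -> suffix for every split of every word
    (assumed construction, e.g. ACG gives A -> CG and AC -> G)."""
    edges = {}
    for word in code:
        for i in range(1, len(word)):
            edges.setdefault(word[:i], set()).add(word[i:])
    return edges


def is_acyclic(edges):
    """Depth-first search for a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = GREY
        for nxt in edges.get(node, ()):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                return False  # back edge: a cycle exists
            if colour == WHITE and not visit(nxt):
                return False
        state[node] = BLACK
        return True

    return all(visit(v) for v in list(edges)
               if state.get(v, WHITE) == WHITE)


def is_circular(code):
    """Circularity under the assumed graph criterion."""
    return is_acyclic(code_graph(code))


if __name__ == "__main__":
    for code in ({"AAC", "GTT"}, {"AAC", "ACG", "GTT"}, {"ACG", "CGA"}):
        print(sorted(code), is_comma_free(code), is_circular(code))
```

The second example set illustrates why the comma-free property is the stronger one: ACG can be read across AAC followed by GTT, so the set is not comma-free, yet its graph has no cycle.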
(ii) Probabilistic models of gene evolution by substitution of genetic motifs (E. Benard, C. Michel) Scientific context: Classical nucleotide evolution models (Jukes and Cantor, 1969; Kimura, 1980, 1981) are generalized to genetic motifs of finite size. Results: Using a mathematical approach based on the Kronecker operators (product and sum), these extended models allow us to determine the exact probability of occurrence (analytic solution) of a genetic motif of any size (dinucleotides, trinucleotides, etc.) over time as a function of the substitution parameters (transitions and transversions) associated with each site of the studied motifs. Evolution can be followed in the direct sense (from the past to the present) and in the reverse sense (from the present to the past). The introduction of these Kronecker operators made it possible to solve this problem of probabilistic models of gene evolution by substitution of genetic motifs, which had been open since 1990.
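The role of the Kronecker product can be sketched in the simplest setting, assuming that each site of the motif evolves independently under the Kimura two-parameter model (1980), whose closed-form transition probabilities are standard. Under that assumption the transition matrix of a motif is the Kronecker product of the per-site matrices. This is only an illustration of the operator, not the authors' general solution; the function names and parameter values are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"
# Transitions are A<->G and C<->T; every other change is a transversion.
TRANSITION_PAIRS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k80_matrix(t, alpha, beta):
    """4x4 substitution probabilities after time t under the Kimura
    two-parameter model (alpha: transition rate, beta: rate of each
    of the two transversions)."""
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    same = 0.25 + 0.25 * e1 + 0.5 * e2
    transition = 0.25 + 0.25 * e1 - 0.5 * e2
    transversion = 0.25 - 0.25 * e1

    def entry(x, y):
        if x == y:
            return same
        return transition if (x, y) in TRANSITION_PAIRS else transversion

    return [[entry(x, y) for y in NUCLEOTIDES] for x in NUCLEOTIDES]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def motif_index(motif):
    """Index of a motif in lexicographic order (AA, AC, AG, AT, CA, ...)."""
    index = 0
    for nucleotide in motif:
        index = 4 * index + NUCLEOTIDES.index(nucleotide)
    return index


def motif_transition_matrix(t, site_params):
    """Transition matrix for a motif with one (alpha, beta) pair per
    site, assuming the sites evolve independently."""
    matrix = [[1.0]]
    for alpha, beta in site_params:
        matrix = kron(matrix, k80_matrix(t, alpha, beta))
    return matrix


if __name__ == "__main__":
    # Hypothetical rates for the two sites of a dinucleotide.
    p = motif_transition_matrix(0.5, [(1.0, 0.5), (0.2, 0.1)])
    print(p[motif_index("AC")][motif_index("GT")])
```

A trinucleotide model is obtained by passing three `(alpha, beta)` pairs, which yields a 64 x 64 matrix.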
(iii) Probabilistic models of gene evolution by substitution, insertion and deletion of genetic motifs (S. Lèbre, C. Michel) Scientific context: There are very few probabilistic models of gene evolution involving nucleotide substitution, insertion and deletion processes. One of the reasons for this lies in the mathematical difficulty, both of the modeling itself and of the determination of analytical solutions. Results: We developed a more general class of evolution models in which the insertion and deletion parameters are explicit parameters independent of the substitution parameters. The idea is based on the introduction of a concept derived from population dynamics to obtain a system of differential equations combining the classical substitution process with the insertion/deletion process. By deriving a general solution valid for any diagonalizable substitution matrix, we obtained an analytic expression of the probability of occurrence of the nucleotides as a function of time, the eigenvalues and eigenvectors of the substitution matrix, the vector of the insertion rates of the nucleotides, the total insertion rate and the vector of the initial probabilities of the nucleotides. The analytic solutions are nontrivial and involve Gaussian hypergeometric functions and Kronecker operators (product and sum). Various mathematical properties were obtained: time scale, time decomposition, time inversion and time transformation as a function of the length of the sequence.
(iv) Stochastic models for the inference of genetic networks (S. Lèbre) These stochastic approaches focus on the reconstruction of genetic regulation networks. We have developed the ARTIVA (Auto Regressive TIme VArying) network model, whose particularity is to allow the dependency structure to vary over time for continuous data. A reversible-jump Markov chain Monte Carlo (MCMC) method has been specifically adapted for the inference of this model from time series of gene expression. We then refined the model by introducing an exchange of information between the successive structures of the network. Following the recent transfer of Sophie Lèbre to the University of Montpellier, this research theme has been stopped.
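The idea of a dependency structure that changes over time can be sketched with a generic first-order autoregressive simulation in which the coefficient matrix is piecewise constant. This is not the ARTIVA implementation and it does not perform any inference; the function name, the phase encoding and the values used are assumptions chosen for illustration.

```python
import random


def simulate_time_varying_ar(phases, n_steps, x0, noise_sd=0.0, seed=0):
    """Simulate x(t) = b + A x(t-1) + noise, where (A, b) depends on the
    phase that is active at time t.

    phases: list of (start_time, A, b); the first phase must start at 0.
            A[i][j] is the weight of gene j at time t-1 on gene i at
            time t, so a zero entry means "no edge j -> i" in that phase.
    """
    rng = random.Random(seed)
    series = [list(x0)]
    n_genes = len(x0)
    for t in range(1, n_steps):
        # Active phase: the latest one that has already started.
        _, coeffs, intercepts = max(
            (p for p in phases if p[0] <= t), key=lambda p: p[0])
        prev = series[-1]
        series.append([
            intercepts[i]
            + sum(coeffs[i][j] * prev[j] for j in range(n_genes))
            + rng.gauss(0.0, noise_sd)
            for i in range(n_genes)
        ])
    return series


if __name__ == "__main__":
    # Two hypothetical genes: gene 0 drives gene 1 until t = 3,
    # then the edge disappears.
    phases = [
        (0, [[1.0, 0.0], [1.0, 0.0]], [0.0, 0.0]),
        (3, [[1.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    ]
    for row in simulate_time_varying_ar(phases, 5, [1.0, 0.0]):
        print(row)
```

Inferring the change points and the edges of each phase from such a series is the hard part, which is where the MCMC procedure mentioned above comes in.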