Abstract
The gene is a fundamental concept of genetics, which emerged with the Mendelian paradigm of heredity at the beginning of the 20th century. However, the concept has since diversified. Somewhat different narratives and models of the gene developed in several sub-disciplines of genetics, that is, in classical genetics, population genetics, molecular genetics, genomics, and, more recently, systems genetics. Here, I ask how the diversity of the concept impacts data-integration and data-mining strategies for bioinformatics, genomics, statistical genetics, and data science. I also consider the theoretical background of the concept of the gene in the ideas of empiricism and experimentalism, as well as reductionist and anti-reductionist narratives on the concept. Finally, a few strategies of analysis from published examples of data-mining projects are discussed, and the examples are re-interpreted in the light of the theoretical material. I argue that the choice of an optimal level of abstraction for the gene is vital for a successful genome analysis.
Keywords: anti-reductionism; data-mining; experimentalism; gene concept; reductionism; scientific method
Year: 2020 PMID: 33286713 PMCID: PMC7597212 DOI: 10.3390/e22090942
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Several modeling approaches have been applied to the gene concept.
| Gene Concept | Modeling Approach | Empirical Evidence |
|---|---|---|
| Mendel’s cellular elements. | A black box model. The gene has certain functions, but nothing is known about the components of the genic mechanism, or how they interact. | Influenced by Darwinian natural history and associated 19th century evolutionary debates. Mendel performed experiments, but might have selectively reported them [ |
| The gene of classical genetics, and the chromosomal theory of heredity. | A sketch (i.e., a semi-transparent glass box). Mechanistic details are fuzzy, but the gene has several well-defined and empirically proven properties. | Initially, the evidence came from field experiments in plant genetics. Increasingly, there were also experiments in animal genetics for which specialized model systems were set up in laboratories (such as Morgan’s fruit fly model). |
| The gene of population genetics. | Mathematical equations. Statistical analyses. Probability theory. | The observations of genetic variability in natural or artificial populations. Some experiments, especially in the context of artificial evolution. |
| The molecular gene. | A transparent glass box. | Experiments and genetic engineering focusing on the simplest organisms, to establish the basic principles. Next, the basic principles were extended to other species. |
| The gene of genomics. | A hierarchy of domains and functional units. | High-throughput screens from the surveys of populations, or from experimental groups. |
| The evolutionary gene. | The model includes information on the gene’s evolutionary history, in particular on the pattern of duplications and speciation events. Data-mining strategies can be anti-reductionist, for example genes can be grouped and analyzed as gene families. | Morphological or sequence characters; gene and protein sequences can be aligned and phylogenetic trees can be constructed. |
| The virtual gene. | Computational data integration; data storage in relational databases. Data-mining strategies can be anti-reductionist, for example genes can be grouped and analyzed as pathways or networks. | Many kinds of empirical data can be integrated using bioinformatics pipelines and databases. Data can be browsed using genome browsers, or data-mined using statistical tools and visualization. |
Examples of data-mining of genomics datasets. Examples of reductionist and anti-reductionist data-mining strategies in genomics are given. Analytical strategy and the focus of the analysis are specified, as well as the type of evidence and the main result.
| Strategy | Focus | Main Result | Reference |
|---|---|---|---|
| Reductionist | Promoter | A correlation between the size of promoter architecture and the breadth of expression was detected. | [ |
| Reductionist | Gene | First, an integrative data-mining procedure was used to clone the most endothelial-specific genes. The procedure was combined with experimental verification. | [ |
| Anti-reductionist | Gene family | The analysis of the gene family of roundabouts suggested that magic roundabout (ROBO4) is an endothelial-specific ohnolog of ROBO1. ROBO4 neo-functionalized to an essential new role in angiogenesis. | [ |
| Anti-reductionist | Signaling pathway | The evolution of the TGF-beta signaling pathway in the animal kingdom was analyzed. The components of the pathway were found to have emerged with the first animals, and diversified in vertebrates. | [ |
| Anti-reductionist | Signaling network | 2R-WGD was found to have remodeled the signaling network of vertebrates. This macro-mutation facilitated the evolution of key vertebrate evolutionary novelties and environmental adaptations. | [ |
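
The contrast between reductionist and anti-reductionist strategies in the table above can be illustrated with a minimal Python sketch. It is not taken from any of the cited projects: the scores, the placeholder names `GENE_A` and `GENE_B`, the group label "pathway X", and the helper `summarize_by_group` are all hypothetical, chosen only to show how the same gene-level data can be analyzed at two levels of abstraction.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-gene scores (e.g. some expression-specificity measure).
# All values are invented for illustration only.
gene_scores = {
    "ROBO1": 0.2,
    "ROBO4": 0.9,
    "GENE_A": 0.5,   # placeholder gene name
    "GENE_B": 0.7,   # placeholder gene name
}

# Hypothetical mapping from each gene to a higher-level unit
# (a gene family or a pathway).
gene_to_group = {
    "ROBO1": "roundabout family",
    "ROBO4": "roundabout family",
    "GENE_A": "pathway X",
    "GENE_B": "pathway X",
}


def summarize_by_group(scores, mapping):
    """Aggregate gene-level scores into per-group means."""
    grouped = defaultdict(list)
    for gene, score in scores.items():
        grouped[mapping[gene]].append(score)
    return {group: mean(values) for group, values in grouped.items()}


# Reductionist view: each gene is analyzed on its own.
top_gene = max(gene_scores, key=gene_scores.get)

# Anti-reductionist view: genes are grouped and analyzed as units.
group_means = summarize_by_group(gene_scores, gene_to_group)

print(top_gene)
print(group_means)
```

The choice of mapping is where the level of abstraction enters: replacing `gene_to_group` with a promoter-level or network-level mapping changes the unit of analysis without changing the underlying data.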