Vladimir Trifonov, Laura Pasqualucci, Riccardo Dalla-Favera, Raul Rabadan.
Abstract
Recent developments in extracting and processing biological and clinical data are enabling quantitative approaches to the study of living systems. High-throughput sequencing (HTS), expression profiles, proteomics, and electronic health records (EHR) are some examples of such technologies. Extracting meaningful information from these technologies requires careful analysis of the large volumes of data they produce. In this note, we present a set of fractal-like distributions that commonly appear in the analysis of such data. The first set of examples is drawn from an HTS experiment. Here, the distributions appear as part of the evaluation of the sequencing error rate and the identification of tumorigenic genomic alterations. The other examples are obtained from risk factor evaluation and from the analysis of relative disease prevalence and co-morbidity as these appear in EHR. The distributions are also relevant to the identification of subclonal populations in tumors and to the study of quasi-species and intrahost diversity of viral populations.
Year: 2011 PMID: 22355706 PMCID: PMC3240948 DOI: 10.1038/srep00191
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Thomae's function, a self-similar function over the rational numbers in the unit interval (top left).
The human genome is diploid, with two strands per chromosome. The reads covering a position of the genome can originate from any of the four strands (top right). For every position, two ratios are rational numbers: the number of reads from one strand divided by the total number of reads from its chromosome, and the number of reads from one chromosome divided by the total number of reads covering the position. Each of these ratios follows a self-similar distribution (bottom).
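The link between read-count ratios and Thomae's function can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes, purely for illustration, that the total number of reads at a position is uniform between 1 and a chosen maximum and that each read comes from a given strand with probability 0.5. The names `thomae` and `simulate_ratios` are hypothetical. Thomae's function assigns the value 1/q to a rational p/q written in lowest terms (and 0 to irrational numbers, which cannot arise here).

```python
from collections import Counter
from fractions import Fraction
import random


def thomae(x: Fraction) -> Fraction:
    """Thomae's function on a rational x = p/q in lowest terms: returns 1/q.

    Fraction always stores its value in lowest terms, so the denominator
    can be read off directly.
    """
    return Fraction(1, x.denominator)


def simulate_ratios(n_positions: int, max_depth: int, seed: int = 0) -> Counter:
    """Count how often each strand ratio k/depth occurs across positions.

    Assumptions (illustrative only): depth is uniform on 1..max_depth and
    each read falls on the strand of interest with probability 0.5.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_positions):
        depth = rng.randint(1, max_depth)  # total reads at this position
        k = sum(rng.random() < 0.5 for _ in range(depth))  # reads from one strand
        counts[Fraction(k, depth)] += 1  # ratio is reduced automatically
    return counts


if __name__ == "__main__":
    counts = simulate_ratios(n_positions=10_000, max_depth=40)
    # Show the most frequent ratios alongside the value of Thomae's function.
    for ratio, n in counts.most_common(10):
        print(f"{ratio}: observed {n} times, thomae = {thomae(ratio)}")
```

Under these assumptions, ratios with small denominators can be reached from many different read depths, so a histogram of the counts would be expected to show spikes at low-denominator fractions, which is the qualitative shape of Thomae's function.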
Figure 2. Coverage in the cancer sequencing experiment (top).
Coverage of the two copies of the cancer genome (bottom left). Coverage of the two strands of a fixed copy of the cancer genome (bottom right).
Figure 3. Comparing the co-morbidity of various conditions with the 2009 H1N1 pandemic versus seasonal influenza.