
Lineage-specific genomics: Frequent birth and death in the human genome: The human genome contains many lineage-specific elements created by both sequence and functional turnover.

Robert S Young

Abstract

Frequent evolutionary birth and death events have created a large quantity of biologically important, lineage-specific DNA within mammalian genomes. The birth and death of DNA sequences is so frequent that the total number of these insertions and deletions in the human population remains unknown, although there are differences between these groups, e.g. transposable elements contribute predominantly to sequence insertion. Functional turnover - where the activity of a locus is specific to one lineage, but the underlying DNA remains conserved - can also drive birth and death. However, this does not appear to be a major driver of divergent transcriptional regulation. Both sequence and functional turnover have contributed to the birth and death of thousands of functional promoters in the human and mouse genomes. These findings reveal the pervasive nature of evolutionary birth and death and suggest that lineage-specific regions may play an important but previously underappreciated role in human biology and disease.
© 2016 The Authors BioEssays Published by WILEY Periodicals, Inc.

Keywords:  enhancers; evolution; indels; lineage-specific biology; promoters; transcriptional regulation

Year:  2016        PMID: 27231054      PMCID: PMC4949557          DOI: 10.1002/bies.201500192

Source DB:  PubMed          Journal:  Bioessays        ISSN: 0265-9247            Impact factor:   4.345


Abbreviations: CAGE, cap analysis of gene expression; ChIP, chromatin immunoprecipitation; DHS, DNaseI hypersensitive site; H3K27ac, acetylated histone H3 lysine 27; H3K4me2, dimethylated histone H3 lysine 4; H3K4me3, trimethylated histone H3 lysine 4; lncRNA, long non‐coding RNA; LTR, long terminal repeat; miRNA, microRNA; TF, transcription factor.

Introduction

The extensive diversity observed between individuals and across species is reflected in high levels of genetic diversity. The study of this diversity between mammalian species has been possible on a genome‐wide scale since the publication of the first complete drafts of the human and mouse genomes in 2001 1 and 2002 2, respectively. The subsequent emergence of next‐generation sequencing technologies 3 has led to an explosion of whole‐genome sequencing, such that public databases now host many mammalian genomes, and 39 of these can currently be directly viewed and compared through the Ensembl genome browser (www.ensembl.org) 4. The first personal genome sequence was only published in 2007 5, but has now been joined by a number of studies, including the 1,000 Genomes Project which sequenced over 1,000 individual genomes 6 and those from deCODE Genetics which sequenced over 2,500 individual genomes from the Icelandic population 7. This wealth of data has stimulated the field of comparative genomics, which investigates both the similarities and differences between genomes. Much early work focussed on identifying shared features between sequenced genomes and restricted itself to the small proportion of the genome which encodes proteins. Many of these genes have been deeply conserved throughout evolution from yeast to human, the lineages of which diverged approximately one billion years ago 8. The number of protein‐coding genes found in vertebrate species is relatively constant and, unexpectedly, does not appear to correlate with our assumptions regarding organismal complexity 9. Not all protein‐coding genes, however, are evolutionarily ancient. For example, C20orf203 is found only in the human genome, and is absent from closely related primates. This gene is highly expressed in the brain, and is further upregulated in Alzheimer's disease, which suggests a potential role for this lineage‐specific gene in the development of the disease 10.
There are now several reports of genes that have been born and died in various species, through a variety of mechanisms (fully reviewed by Kaessmann 11). Comparative genomics has also been applied to studying the remaining, non‐coding, regions of the genome, which make up almost 99% of the genome 12 and contain a wealth of transcriptional and regulatory elements. MicroRNAs (miRNAs) – short, approximately 22 nt long, non‐coding RNA genes primarily involved in negative regulation of protein‐coding genes 13 – are often deeply conserved across a range of divergent species 14. Long non‐coding RNAs (lncRNAs) are a relatively unstudied class of non‐coding transcripts that are over 200 bp long 15. lncRNAs show modest evolutionary constraint, which has been interpreted as indicating that these sequences have been conserved across species because they encode a biological function 16, 17, 18. Similarly, non‐coding regulatory elements such as enhancers, which positively regulate gene expression at a distance 19, have been computationally predicted in regions which show increased evolutionary conservation across species. This approach has been demonstrated to have a 45% success rate in predicting enhancers using comparative genomics alone 20. As for protein‐coding genes, there are also corresponding examples of non‐coding RNA genes that have emerged during evolution, such as the mouse‐specific lncRNA Poldi. This lncRNA is restricted to the post‐meiotic cells of the testis, and promotes sperm motility and testis development 21. Similarly, the miRNA miR‐941 – which was born in the human lineage between one and six million years ago from the expansion of an evolutionarily unstable tandem repeat sequence – was recently discovered to be important for neurotransmitter signalling in the brain 22. There are a handful of known examples of enhancer birth and death in the human genome.
For example, the non‐coding element HACNS1 has evolved rapidly in humans and only the human sequence, but not the orthologous primate sequence, is able to function as a limb enhancer in mouse reporter experiments 23. Conversely, an enhancer for the AR gene, which is conserved across most mammalian species, has been deleted from the human genome. The activity of this enhancer is correlated with the formation of whiskers and penile spines in non‐human mammals, and it has been speculated that this loss in humans may be linked to increased monogamous reproductive strategies relative to other primates 24. On a genome‐wide scale, non‐coding elements are less conserved between mammalian species than protein‐coding genes 25, 26. This has led many to speculate that organismal diversity is not in fact driven by changes to the protein‐coding gene set, but by divergence in the regulatory mechanisms responsible for controlling their expression 27. Due to their increased volatility, much recent work has therefore focussed on the birth and death of such non‐coding, regulatory elements within the human genome. In this essay, I will first discuss sequence turnover, where sequence is either inserted (born) or deleted (dies) along a lineage. I will then describe analyses of functional elements that have been identified through experimental profiling (Box 1) and were subsequently shown to completely turn over between lineages despite conservation of the underlying DNA sequence (functional turnover). Where such profiling has been done in multiple species, it is possible to further define functional turnovers as gains and losses down individual lineages and throughout this essay I will also refer to these as birth and death events, respectively. Finally, I will examine those studies that have considered both the birth and death of sequence and function in the same experimental system.
I will show that the birth and death of entire regulatory elements are frequent occurrences within the human genome, and will suggest that future research is likely to focus on both the transcriptional regulatory and phenotypic consequences of these events for normal and perhaps also pathogenic human diversity.

Box 1

Chromatin immunoprecipitation followed by sequencing (ChIP‐seq): Identifies the locations of histone modifications or the sites of a DNA‐binding protein, such as a transcription factor, by high‐throughput sequencing of DNA pulled down with an antibody specific to the modification or protein of interest.
DNase I hypersensitivity sites sequencing (DNase‐seq): Discovers all classes of active regulatory elements by digesting accessible chromatin which is not packaged into nucleosomes, followed by high‐throughput DNA sequencing.
Cap analysis of gene expression (CAGE): Clones the 5′ ends of transcribed mRNA molecules and then subjects them to high‐throughput sequencing to precisely identify the sites of transcription initiation at promoters and other transcribed elements, such as enhancers.

Sequence turnover is common in the human genome

The insertion or deletion of sequence along one of the lineages that separates two species (collectively known as ‘indels’) results in gaps in the sequence alignments which describe the relationship between orthologous sequences within the two genomes being compared. Insertions can be discriminated from deletions by comparing the sequences of three or more genomes simultaneously. The principle of parsimony makes the assumption that the most likely evolutionary history for a set of related sequences is the scenario that can be explained by the minimal number of mutations. In this way, a sequence is defined as having been deleted if it is present in the outgroup species, while an inserted sequence will be absent from this species (Fig. 1). This type of analysis also allows one to identify the lineage along which the insertion or deletion has taken place. Most alignment tools and analysis programs, however, treat these gaps as missing data.
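The parsimony rule described above can be written as a small decision function. The following is a minimal sketch, not taken from any published pipeline: the function name, its arguments and the returned labels are illustrative assumptions, and it presumes that presence or absence of the sequence has already been determined for two species and one outgroup.

```python
def classify_gap(in_first: bool, in_second: bool, in_outgroup: bool,
                 first: str = "human", second: str = "mouse") -> str:
    """Classify an alignment gap between two species using an outgroup.

    Hypothetical helper: each flag states whether the sequence is present
    in that genome. Parsimony assumes a single mutation explains the gap.
    """
    if in_first == in_second:
        # Present in both or absent from both: no gap to explain.
        return "no gap between the two species"
    carrier, lacking = (first, second) if in_first else (second, first)
    if in_outgroup:
        # The outgroup carries the sequence, so it is ancestral and was lost.
        return f"deletion on the {lacking} lineage"
    # The outgroup lacks the sequence, so it arose in the carrier.
    return f"insertion on the {carrier} lineage"


# Sequence in human only, absent from mouse and the outgroup.
print(classify_gap(True, False, False))
# Sequence in human and the outgroup, absent from mouse.
print(classify_gap(True, False, True))
```

With more than one outgroup, or with recurrent mutation at the same site, a single presence/absence flag is no longer sufficient and a tree-aware reconstruction would be needed.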
Figure 1

By comparison to a third outgroup species, the lineage along which a mutation took place can be identified and an alignment gap (indel) can be classified as an insertion or deletion. In this example, a gap in the orthologous sequence in both the mouse and pig genomes reveals that novel sequence has been inserted in the human genome – as shown by the blue triangle – and that there has been a birth of sequence on the human lineage. Conversely, if there is no gap in the orthologous pig sequence, then a sequence death (pink triangle) is inferred to have taken place on the mouse lineage.

Within the human population, there are likely to be many millions of polymorphic indels which are found in some, but not all, individuals. A study of 79 diverse human genomes reported almost two million small indels 28. However, the limited overlap between this and other studies suggests that this is an under‐estimation of the total number of indels segregating within the human population, and that there are many indels yet to be discovered 28. Longer regions of sequence that have either been inserted or deleted within an individual genome – known as structural variants and generally defined as longer than 1 kb – are less common, and only approximately 20,000 have so far been detected by the 1,000 Genomes Project 6. Within individuals, these variants of different lengths have been found to be associated with differences in gene expression 29, 30. Polymorphic indels disrupt the coding sequence of over 6% of annotated human genes, but 72% of genes contain an intronic indel 31. The reduced frequency of evolutionarily conserved, functional material within intronic sequences 32 implies that these coding sequence‐disrupting indels are likely to confer a substantial genetic load. Many deletions are shared across populations, and have been present since humans migrated out of Africa 33.
The lower average frequency of deletions relative to insertions in the population suggests that sequence loss is more damaging than the birth of new sequence 34. The distribution of indels throughout the genome has been used to quantify the amount of functional, but lineage‐specific, sequence within the genome. This model assumes that indels occur randomly within the genome and that unexpectedly large distances between indels therefore contain sequence which is preserved by natural selection 35, presumably because this sequence conveys a biological function as yet unknown. By comparing the quantity of sequence defined to be functional using this metric from a range of different pairs of species alignments, the authors determined that this quantity rapidly decreases as the evolutionary distance between the species being compared increases 36. This implies that most functional material is conserved only within a narrow range of related species, and that there must be a rapid turnover of functional sequence and a large quantity of lineage‐specific sequence within mammalian genomes. This rate of sequence turnover is not constant between genomes, and appears to be higher along the mouse lineage, where sequence is preferentially deleted at a particularly high rate 37, 38. The vast majority of this evolutionarily volatile sequence is found outside protein‐coding gene borders, and it has been predicted that 110–143 Mb (50%) of functional non‐coding DNA sequence within the human genome has turned over in the last 130 million years 39.
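The reasoning behind the indel‐distribution model can be illustrated with a toy calculation. This is a simplified sketch and not the published method: it assumes that, under neutrality, every aligned position starts an indel independently with a single known probability (`p_indel`, a hypothetical parameter), so that the lengths of indel‐free segments are geometrically distributed. An excess of long segments over this expectation is then the signal attributed to selection.

```python
def expected_long_segments(n_segments: int, p_indel: float, min_len: int) -> float:
    """Expected number of indel-free segments of at least min_len positions.

    Under the assumed neutral model, P(length >= k) = (1 - p_indel) ** k.
    """
    return n_segments * (1.0 - p_indel) ** min_len


def excess_long_segments(segment_lengths, p_indel: float, min_len: int) -> float:
    """Observed minus expected count of segments of at least min_len positions."""
    observed = sum(1 for length in segment_lengths if length >= min_len)
    expected = expected_long_segments(len(segment_lengths), p_indel, min_len)
    return observed - expected


# Made-up segment lengths (in bp) between consecutive indels.
lengths = [10, 20, 500, 30, 800]
print(excess_long_segments(lengths, p_indel=0.05, min_len=200))
```

A real analysis would have to estimate the neutral indel rate from the data and allow it to vary along the genome, neither of which this sketch attempts.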

What mechanisms drive sequence birth and death in the genome?

There are a number of molecular mechanisms that insert or delete sequence in the genome. Transposable elements, which are capable of jumping around the genome, make up approximately half of the human genome 40 and are divided into two major classes. Retrotransposons duplicate via an RNA intermediate (Fig. 2A) before the new copy is re‐integrated into the genome at a distant site. Retrotransposition does not typically include the copying or movement of intronic and surrounding regulatory DNA. The other class, DNA transposons, uses a cut‐and‐paste mechanism (Fig. 2B) in which the entire DNA sequence is excised and then re‐integrated into the genome. Repetitive elements, and particularly retrotransposons, are enriched at both species‐specific enhancers and gene promoters 41. Promoters, which are the site of RNA polymerase II complex assembly and transcription initiation 42, are enriched for repetitive elements only at sequences that have been inserted, rather than deleted, in both the human and mouse genomes 43. One class of transposons, known as long terminal repeats (LTRs), is particularly common at tissue‐restricted promoters, which is consistent with the previously reported role for LTRs in driving such a limited expression profile 44, 45. Despite the association between simple repetitive elements and sequence deletion 46, no such relationship was found between repetitive sequences and promoters which have been deleted along either the human or mouse lineages 43. Instead, simple repeats were found to be enriched at newly inserted promoters that are broadly expressed, but it remains unknown which types of this family of repeats are responsible or the manner in which they drive widespread expression across tissues.
Figure 2

DNA sequences can be inserted and deleted by various mechanisms. Retrotransposons replicate via an RNA intermediate (A) while other transposable elements use a cut‐and‐paste mechanism to integrate the DNA sequence at a distant site (B). Unequal crossing‐over during cell division (C) and replication slippage (D) can also result in the birth and death of DNA sequences.

Sequence can also be inserted into or deleted from the genome through the activity of normal cellular processes, such as recombination and replication. Unequal crossing‐over takes place when non‐homologous regions are paired during cell division. This can result in one chromosome gaining sequence and the other losing the same sequence (Fig. 2C), but this exchange of sequence need not be reciprocal as shown here. Indeed, a comparative analysis of the human, chimp and macaque genomes has suggested that recombination is more associated with the gain of sequence than sequence loss in the human genome 47. DNA replication can also create indels through replication slippage (Fig. 2D), if the DNA polymerase skips over a region (known as ‘forward slippage’) to remove sequence or skips back to replicate a region twice (known as ‘backward slippage’), resulting in the insertion of a second copy of the sequence. These replication errors are most frequent at regions containing nearby tandem duplications and, although it has been suggested that they are responsible for most births of recently arisen short insertions in the human lineage 48, replicative errors are actually more likely to be associated with the loss of sequence across the entire genome 47. Despite reports of sequence gain and loss within the human genome, their accurate discovery remains a difficult task, requiring the development of specialised computational pipelines 49.
The two commonly used genome‐wide alignments have been built with different methods – the Ensembl EPO pipeline 50 builds alignments and reconstructs candidate ancestral genome sequences across multiple species simultaneously while the UCSC BLASTZ alignments 51 are generated from small, local sequence alignments of two species which are then extended into larger blocks of related DNA sequences. Multi‐species alignments are then built separately using these pairwise alignments. The difference in these approaches results in substantial differences in the amount of aligning sequence, e.g. UCSC aligns 1.0 Gb (33%) of the human genome to mouse while Ensembl aligns only 820 Mb (26%), and similar discrepancies in the amount of sequence which is estimated to have been gained or lost within the human genome. Furthermore, progressive alignment algorithms incur a greater penalty when creating insertions rather than deletions 52, which hinders robust discrimination of these separate classes of mutations. Further improvements in our ability to identify regions which are inserted and deleted within whole‐genome alignments and the human population are likely to take account of the different mechanisms, and the sometimes complicated evolutionary histories, which generate these events. Our increased knowledge of the driving forces behind these events should improve our ability to predict and accurately detect when an insertion or deletion has truly taken place, rather than, as now, solely defining them as positions within the genome where alignment pipelines fail to identify orthologous sequences.

Tissue‐restricted regulatory elements show frequent functional turnover

Genome sequencing projects have been followed by a second wave of functional genomics studies, as exemplified by the work of the ENCODE consortium 53. Functional genomics combines experimental techniques with advances in DNA sequencing to investigate the functional role of genes and other regulatory DNA sequences throughout the genome (see Box 1). Large‐scale functional turnover of both transcription factor (TF) binding and promoter locations has been reported. A comparison of four liver‐specific TFs (FOXA2, HNF1A, HNF4A and HNF6) in human and mouse revealed that 41–89% of their binding locations within aligning sequence were found in only one of these two species 54, implying a substantial rate of functional turnover. This suggests that there are many births and deaths of these binding sites along the two lineages, but this could not be confirmed from the data published in this study (Fig. 1). This high rate of TF binding turnover in the liver takes place across much of the animal clade (Fig. 3A) 26 and can even be detected between individual rodent lineages, suggesting that these turnover events are evolutionarily very rapid 55.
Figure 3

A: TF binding at the PCK1 locus in the livers of five vertebrates (human, mouse, dog, monodelphis, chicken). A conserved binding site is highlighted by the green box, and a human‐inserted site by the blue box. Other binding sites show more complicated evolutionary histories. Reproduced with permission from Schmidt et al. 26. B: Compensatory turnover of TF binding sites. The locus retains TF binding and a similar transcriptional response in both species, but each TF binding site has turned over.

Complete functional turnover of genetic elements, such as enhancers and promoters, is also frequent between mammalian species. While turnover is less prevalent in cis‐regulatory modules that contain multiple TFs bound to the same locus 55, only 279 (less than 1%) of enhancers active in the liver alone are conserved across 10 placental genomes 41. Promoters defined epigenetically by the presence of trimethylated histone H3 lysine 4 (H3K4me3) in the same system seem to be less susceptible than enhancers to this type of functional turnover 41, but this may not reflect the true turnover rate as promoters defined by their transcriptional output using CAGE turn over more frequently 56 (see also below ‘Both sequence and functional turnover contribute to the birth and death of functional promoters’ for a more detailed discussion where promoter turnovers were polarised into births and deaths). There may be further differences between promoter and TF‐binding site functional turnover because these events at promoters are often accompanied by changes to the underlying sequence 43 while binding site locations within rodents can turn over without changes to the underlying DNA sequence 55. It remains unclear what mechanisms, such as the lack of cooperative binding partners or a compaction of the local chromatin state, are responsible for driving these TF turnover events.
The function of DNA‐binding factors or the length of their DNA‐binding motif may also be related to their evolutionary volatility. For example, the binding locations of the insulator protein CTCF – which also has an unusually long binding motif – are much more conserved between mammalian species than those of most transcription factors, and are therefore less likely to be gained or lost between mammalian species 57. Fifteen per cent (5,178/33,966) of alignable binding sites in human are also present in each of macaque, mouse, rat and dog 58. Unlike TFs which often possess tissue‐specific roles in regulating gene expression, CTCF binding sites are largely consistent across tissues 59. CTCF is important for regulating the three‐dimensional structure of the genome, e.g. by insulating transcriptionally active from inactive regions 60 and it also demarcates the borders of DNA sequences which are anchored to the nuclear periphery 61. Like individual CTCF binding sites, this structural role for CTCF appears to be conserved as is the higher‐order genome structure which it regulates 62. Both the mouse ENCODE 63 and FANTOM5 64 collaborative projects have carried out comparative functional genomics analyses across a range of tissues, and confirmed this rapid functional turnover throughout the human and mouse genomes. Tissue‐restricted elements are more susceptible to turning over, perhaps due to the increased functional constraints on pleiotropic elements that are active across tissues, and the immune system and testis appear to be the tissues throughout the body with the greatest rates of turnover 43. While it is likely that many of the changes observed within immune cells are driven by positive natural selection to avoid host pathogenicity 65, it is currently unclear to what extent sexual selection and the locally elevated mutation rate at active sites in the germ cells 43, 66, 67, 68 contribute to the functional element turnover within the testis.

The transcriptional and phenotypic consequences of functional turnover remain unclear

There is evidence that the turnover of an individual binding site can be compensated by the birth of a binding site at a second site within a gene locus (Fig. 3B). For example, approximately 25% of species‐specific TF binding site functional losses in the liver were mirrored by the gain of a separate, species‐specific, binding site within 10 kb 26. Similarly, while 53% of genes targeted in an OCT4 knockdown in embryonic stem cells in both human and mouse contained nearby OCT4‐NANOG binding, only 15% of these binding sites were found at the same position in both species 57. These observations suggest that, while the rapid evolutionary turnover of TF binding sites may be driven by a high mutation rate at these sites 68, this may be matched by a strong selective pressure to prevent a subsequent divergence in the transcriptional output regulated by these factors 69. Furthermore, the trans environment within the cell is more conserved than the individual elements themselves, as TF‐to‐TF interactions 70 and TF network topologies 71 are similar between human and mouse. These results are consistent with the independent observation that human chromosome 21, when inserted into mouse hepatocytes, behaves in largely the same manner as the human chromosome when in human cells. This again suggests that species‐specific binding is due to changes at binding sites themselves but that the same trans‐acting TFs can still drive DNA binding in both species 72.
There is little evidence that the majority of these lineage‐specific elements, although defined by their functional activity, are directly involved in transcriptional regulation. It is known that not all TF binding sites have a direct effect on gene expression 73, 74. Binding sites that are identified in the liver and are conserved in multiple mammalian species show greater functional enrichments (e.g. disease ontology annotations), and are found near genes with higher expression than species‐specific binding sites 75, suggesting that those sites that have been gained or lost down individual lineages are less likely to be functionally important in positively driving gene expression. Furthermore, gene expression and nearby TF binding divergence in the liver do not appear to be generally correlated within closely related mouse species 69. However, these conclusions are contradicted by the observation that mouse‐ and human‐specific binding of the glucocorticoid receptor in macrophages was associated with species‐specific upregulation of neighbouring genes upon glucocorticoid stimulation 76. Further studies of this type are required to determine whether the liver or the macrophage is the more representative system. Population genetics studies also support the argument that few regulatory elements that have been born along the human lineage possess a biological function. Although enriched for disease‐ and trait‐associated variants, the nucleotide diversity for human‐specific DHSs is relatively high and comparable to that of fourfold‐degenerate sites in exonic sequence, which further suggests that DHSs possess a relatively limited proportion of functional sequence 77. These elements do, however, experience at least some purifying selection, as indicated by their reduced diversity relative to sequences defined biochemically to be inactive 78. The ultimate test of functionality of lineage‐specific regions is to disrupt them to determine their biological role. Whether, as predicted computationally 36, these elements are frequently responsible for a phenotype has yet to be tested in a systematic manner.
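The compensatory turnover test described at the start of this section, a lost binding site paired with a gained site within 10 kb, can be sketched as a simple distance check. This is an illustrative assumption of how such a criterion might be computed, not the procedure used in the cited study: the function names are hypothetical, and all positions are assumed to lie on the same chromosome in a shared coordinate system.

```python
from typing import Sequence


def has_compensatory_gain(lost_pos: int, gained_positions: Sequence[int],
                          window: int = 10_000) -> bool:
    """True if any gained site lies within `window` bp of the lost site."""
    return any(abs(gained - lost_pos) <= window for gained in gained_positions)


def compensated_fraction(lost_positions: Sequence[int],
                         gained_positions: Sequence[int],
                         window: int = 10_000) -> float:
    """Fraction of lost sites that have a gained site within the window."""
    if not lost_positions:
        return 0.0
    hits = sum(has_compensatory_gain(pos, gained_positions, window)
               for pos in lost_positions)
    return hits / len(lost_positions)


# Made-up coordinates: only the first lost site has a gain within 10 kb.
lost = [1_000, 50_000, 200_000, 900_000]
gained = [8_000, 500_000]
print(compensated_fraction(lost, gained))
```

Proximity alone does not show that the gained site restores the lost regulatory input, so a fraction computed this way is best read as an upper bound on true compensation.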

Functional births and deaths of regulatory elements may be associated with expression changes at nearby genes

With data from only two species, these studies are largely limited to describing turnover events, and matched data from a third species is required to discriminate functional births from deaths along individual lineages (Fig. 1). Shibata et al. 79 measured DNaseI hypersensitive sites (DHSs) in human, chimp and macaque fibroblasts and identified hundreds of gains and losses along both the human and chimp lineages. DHSs that were born along each lineage were associated with up‐regulation of nearby target genes, while DHSs which died were associated with the concomitant down‐regulation of nearby genes. However, most differential gene expression could not be explained by the simple gain or loss of DHS sites. Both gained and lost DHSs were more likely to be experiencing positive selection specifically along the lineage in which they had been gained and lost, respectively. Active enhancers and promoters were similarly identified in embryonic limbs from human, macaque and mouse by profiling the location of acetylated histone H3 lysine 27 (H3K27ac) 80. Promoters identified using these data have gained activity along the human lineage more rapidly than enhancers (13 vs. 11%), but the vast majority of both classes of elements are gained through the co‐option (exaptation) of existing sequence rather than insertion of novel sequence 80. A similar study in the same species mapped two epigenetic marks (dimethylated histone H3 lysine 4, H3K4me2, and H3K27ac) during human, macaque and mouse corticogenesis to confirm a high rate of human‐specific promoter and enhancer birth 81. These human‐specific elements were frequently found at, or near to, genes important for cortical development, suggesting that they may play important roles in regulating human‐specific aspects of this important biological process.
A collection of histone modifications and protein‐binding sites has also been profiled in matched human, mouse and pig pluripotent stem cells, where divergence in the intensity of these binding factors at gene promoters is correlated with gene expression divergence 82. However, these authors did not explicitly examine functional birth and death of these elements between the lineages studied. These three‐species experiments also differ from those mentioned above in their methodology for detecting lineage‐specific elements. Those studies described above which focused on liver‐specific transcription factors identified binding regions in each species independently and then defined lineage‐specific regions as those in orthologous regions for which no binding peak had been discovered in other species. The description of an individual region as being lineage‐specific is dependent on the genome‐wide alignments used to identify orthology as these show clear discrepancies in the amount of sequence which can be aligned between species (see ‘What mechanisms drive sequence birth and death in the genome?’ above). The degree of overlap required to identify orthologous regions also affects the detection of functional turnover events. Some studies consider a single 1 bp overlap between regions as sufficient to define them as being conserved while others have required at least a 50% overlap in reciprocal comparisons between species 41, which will reduce the number of lineage‐specific regions that can be identified from the same data. In contrast, the studies describing functional genomics data from other tissues 79, 80, 81 combine these alignments with statistical methods, such as edgeR 83, to detect lineage‐specific regions as those orthologous regions which also show differential levels of histone modifications or chromatin accessibility between species.
This approach does not depend on calling peaks in all species and will therefore account for regions where the evidence for binding just misses the significance threshold for calling a peak in one of the related species. The use of a statistical framework also makes it possible to quantify the confidence that a single region is truly lineage‐specific, and to assess how these regions differ from those identified in the same system that show binding in all species, albeit at significantly different levels. Although they depend on replicated functional genomics datasets to make these statistical assessments, these more complex approaches, which use more than simple genomic overlaps, will likely be considered the more robust way to detect functional turnover in future.
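The two overlap criteria mentioned in this section, a single shared base pair versus at least 50% overlap in reciprocal comparisons, can give different answers for the same pair of regions. The Python sketch below shows how such a disagreement arises. It assumes both regions have already been projected onto one coordinate system and are given as half‐open `(start, end)` intervals; the function names and example coordinates are hypothetical and do not reproduce the pipeline of any cited study.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open [start, end) in shared coordinates


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def meets_min_bp(a: Interval, b: Interval, min_bp: int = 1) -> bool:
    """Permissive criterion: at least `min_bp` shared base pairs."""
    return overlap_bp(a, b) >= min_bp


def meets_reciprocal(a: Interval, b: Interval, min_fraction: float = 0.5) -> bool:
    """Strict criterion: the shared span covers at least `min_fraction`
    of region a AND at least `min_fraction` of region b."""
    shared = overlap_bp(a, b)
    if shared == 0:
        return False
    len_a, len_b = a[1] - a[0], b[1] - b[0]
    return shared >= min_fraction * len_a and shared >= min_fraction * len_b


if __name__ == "__main__":
    peak_a = (100, 200)   # made-up coordinates
    peak_b = (190, 400)   # shares only 10 bp with peak_a
    print(meets_min_bp(peak_a, peak_b), meets_reciprocal(peak_a, peak_b))
```

Because the outcome for a given pair depends on the criterion chosen, counts of lineage‐specific regions from studies that use different thresholds are not directly comparable.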

Both sequence and functional turnover contribute to the birth and death of functional promoters

While both mechanisms of birth and death in the genome, sequence turnover and functional turnover, are clearly important contributors to lineage‐specific genomics, it is only recently that they have been explicitly investigated simultaneously in the same experimental system. The FANTOM5 project, which identified promoter locations across a range of matched human and mouse cell lines and tissues, described the half‐life of promoters when aligned to increasingly divergent species 64. Evolutionary history varied with both expression profile and promoter class, where broadly expressed protein‐coding promoters and tissue‐restricted ncRNA promoters were more deeply conserved. Similar patterns have been observed within aligning exonic sequence of both protein‐coding genes in human 84 and lncRNAs in Drosophila 18. The sequences of a large number of promoters have been born or died along the human lineage (conservatively 2,472 and 2,818, respectively) since its divergence from mouse 43. As seen for regulatory elements within the ENCODE datasets, the gain and loss of promoters was enriched within immune cells and the testes, whereas brain‐biased promoters were less likely to show either type of sequence turnover. Genes that experienced at least one of these turnover events were enriched for evidence of positive selection acting on their coding sequence, suggesting promoter turnover may be related to adaptive evolution throughout the encoded protein, and not just at the turnover site 79. However, within the human population, both inserted and deleted promoters showed no evidence of either positive or purifying selection, suggesting that, as for the species‐specific TF binding sites described above, many of these may not be phenotypically relevant.
Many promoters whose sequence has been conserved between human and mouse have experienced functional turnover (22% and 13% of aligned promoters in human and mouse, respectively), as they show no detectable evidence of transcription in the other species. These species‐specific promoters are specifically associated with decreased evolutionary constraint at the promoter elements 43. Similar levels of evolutionary constraint were seen at promoters with matched, divergent or reduced expression between species, suggesting that differences in transcriptional output were not driven by sequence changes at the promoter or at cis‐regulatory elements found at a constant distance from the promoter. This contrasts with the inverse correlation seen between expression and substitution rate divergence at promoters activated in lipopolysaccharide‐stimulated macrophages 85. Whether these differences are specific to the macrophage time course profiled in that study, or are a general feature of stress‐response genes, remains unclear. While sequence gain and loss are clearly important factors in promoter evolution along the mouse and human lineages, the lack of well‐matched data across more species remains the limiting factor for resolving the large number of functional turnovers into births and deaths at aligned sequence. As shown in Fig. 4, beyond the liver, the number of sequenced mammalian genomes exceeds the number of tissues that have been comprehensively profiled experimentally in multiple species. Even when datasets from multiple tissues are available from consortia such as ENCODE and FANTOM, matched samples are usually only available for human and mouse, precluding the discrimination of functional gain from loss at aligning sequences.
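The distinction drawn in this section between sequence turnover and functional turnover can be summarised as a two‐step decision. The Python sketch below is an illustration under simple assumptions: each promoter carries two pre‐computed flags (whether its sequence aligns to the other species, and whether transcription is detected there), and the type `Promoter`, the functions `turnover_class` and `tally`, and the toy records are all invented for this example.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Promoter(NamedTuple):
    name: str
    aligns_to_other_species: bool       # orthologous sequence exists
    transcribed_in_other_species: bool  # transcription detected there


def turnover_class(p: Promoter) -> str:
    """Assign a promoter to one of three classes from two-species data."""
    if not p.aligns_to_other_species:
        return "sequence turnover"      # inserted or deleted sequence
    if not p.transcribed_in_other_species:
        return "functional turnover"    # sequence kept, activity differs
    return "conserved"


def tally(promoters: Iterable[Promoter]) -> Counter:
    """Count promoters in each class."""
    return Counter(turnover_class(p) for p in promoters)


if __name__ == "__main__":
    toy = [  # made-up records, not real data
        Promoter("p1", True, True),
        Promoter("p2", True, False),
        Promoter("p3", False, False),
    ]
    print(tally(toy))
```

With only two species, neither turnover class says whether the event was a birth or a death; that requires matched data from at least one further species.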
Figure 4

Current state of published, functional genomics data from various mammalian species which are related as shown in the phylogenetic tree. Branch lengths indicate the genome‐wide estimate of the neutral substitution rate at fourfold degenerate sites. ChIP‐seq datasets describing the location of several TFs 53, CTCF 57, H3K27ac and H3K4me3 40 in the liver are available for up to 20 mammalian species, nine of which are shown here. Further studies of three species simultaneously have examined DHSs in fibroblasts 72; various histone modifications and TFs in pluripotent stem cells 75; H3K27ac in the developing limb 73; and H3K4me2 and H3K27ac during corticogenesis 74. The ENCODE, mouse ENCODE and FANTOM consortiums have published large collections of datasets from human and mouse tissues and cell lines. Comparative functional genomics studies within populations of the same species are likely to be a focus of future research.


Conclusions and prospects

The birth and death of both sequence and function is a common occurrence within the human genome, and represents an important contributor to genetic diversity. These turnover events have been observed at both distal regulatory elements and functional promoters and confirm that, while a useful predictor, evolutionary conservation is not required to identify functioning, lineage‐specific elements in the human genome. The availability of large amounts of functional genomics data in both human and mouse has already allowed evolutionary turnover to be investigated across a number of tissues. Extending the available datasets to more distantly related species will permit functional turnover events to be resolved into births and deaths along individual lineages (Fig. 4). Investigating the dynamics of births and deaths within the human population should reveal any phenotypic consequences, and whether they are associated with diseases such as autoimmune disorders, which could aid the development of personalised strategies to treat them. Investigations into the combined effects of sequence and functional turnover in the birth and death of genetic elements have only recently been attempted. Further work will likely focus on the direct relationship between these; for example, one might ask whether functional deaths result in biologically unimportant sequence that is then a target for sequence deletion and the complete removal of the element from the genome. Despite this work, the biological relevance of these evolutionarily volatile elements remains unclear. Do they drive the diversification of gene expression profiles, or do they simply represent the neutral churn of redundant genetic elements in the genome? By carefully matching current datasets to increasing amounts of functional genomics data from multiple species 86, we now have an exciting opportunity to reveal the role of evolutionary birth and death in shaping the mammalian genome and its regulatory apparatus.
The author has declared no conflicts of interest.
  86 in total

1.  The deep evolution of metazoan microRNAs.

Authors:  Benjamin M Wheeler; Alysha M Heimberg; Vanessa N Moy; Erik A Sperling; Thomas W Holstein; Steffen Heber; Kevin J Peterson
Journal:  Evol Dev       Date:  2009 Jan-Feb       Impact factor: 1.930

2.  Rapidly evolving human promoter regions.

Authors:  Martin S Taylor; Tim Massingham; Yoshihide Hayashizaki; Piero Carninci; Nick Goldman; Colin A M Semple
Journal:  Nat Genet       Date:  2008-11       Impact factor: 38.330

Review 3.  Next-generation sequencing platforms.

Authors:  Elaine R Mardis
Journal:  Annu Rev Anal Chem (Palo Alto Calif)       Date:  2013       Impact factor: 10.745

4.  A map of the cis-regulatory sequences in the mouse genome.

Authors:  Yin Shen; Feng Yue; David F McCleary; Zhen Ye; Lee Edsall; Samantha Kuan; Ulrich Wagner; Jesse Dixon; Leonard Lee; Victor V Lobanenkov; Bing Ren
Journal:  Nature       Date:  2012-08-02       Impact factor: 49.962

5.  Conservation and divergence in Toll-like receptor 4-regulated gene expression in primary human versus mouse macrophages.

Authors:  Kate Schroder; Katharine M Irvine; Martin S Taylor; Nilesh J Bokil; Kim-Anh Le Cao; Kelly-Anne Masterman; Larisa I Labzin; Colin A Semple; Ronan Kapetanovic; Lynsey Fairbairn; Altuna Akalin; Geoffrey J Faulkner; John Kenneth Baillie; Milena Gongora; Carsten O Daub; Hideya Kawaji; Geoffrey J McLachlan; Nick Goldman; Sean M Grimmond; Piero Carninci; Harukazu Suzuki; Yoshihide Hayashizaki; Boris Lenhard; David A Hume; Matthew J Sweet
Journal:  Proc Natl Acad Sci U S A       Date:  2012-03-26       Impact factor: 11.205

6.  A promoter-level mammalian expression atlas.

Authors:  Alistair R R Forrest; Hideya Kawaji; Michael Rehli; J Kenneth Baillie; Michiel J L de Hoon; Vanja Haberle; Timo Lassmann; Ivan V Kulakovskiy; Marina Lizio; Masayoshi Itoh; Robin Andersson; Christopher J Mungall; Terrence F Meehan; Sebastian Schmeier; Nicolas Bertin; Mette Jørgensen; Emmanuel Dimont; Erik Arner; Christian Schmidl; Ulf Schaefer; Yulia A Medvedeva; Charles Plessy; Morana Vitezic; Jessica Severin; Colin A Semple; Yuri Ishizu; Robert S Young; Margherita Francescatto; Intikhab Alam; Davide Albanese; Gabriel M Altschuler; Takahiro Arakawa; John A C Archer; Peter Arner; Magda Babina; Sarah Rennie; Piotr J Balwierz; Anthony G Beckhouse; Swati Pradhan-Bhatt; Judith A Blake; Antje Blumenthal; Beatrice Bodega; Alessandro Bonetti; James Briggs; Frank Brombacher; A Maxwell Burroughs; Andrea Califano; Carlo V Cannistraci; Daniel Carbajo; Yun Chen; Marco Chierici; Yari Ciani; Hans C Clevers; Emiliano Dalla; Carrie A Davis; Michael Detmar; Alexander D Diehl; Taeko Dohi; Finn Drabløs; Albert S B Edge; Matthias Edinger; Karl Ekwall; Mitsuhiro Endoh; Hideki Enomoto; Michela Fagiolini; Lynsey Fairbairn; Hai Fang; Mary C Farach-Carson; Geoffrey J Faulkner; Alexander V Favorov; Malcolm E Fisher; Martin C Frith; Rie Fujita; Shiro Fukuda; Cesare Furlanello; Masaaki Furino; Jun-ichi Furusawa; Teunis B Geijtenbeek; Andrew P Gibson; Thomas Gingeras; Daniel Goldowitz; Julian Gough; Sven Guhl; Reto Guler; Stefano Gustincich; Thomas J Ha; Masahide Hamaguchi; Mitsuko Hara; Matthias Harbers; Jayson Harshbarger; Akira Hasegawa; Yuki Hasegawa; Takehiro Hashimoto; Meenhard Herlyn; Kelly J Hitchens; Shannan J Ho Sui; Oliver M Hofmann; Ilka Hoof; Furni Hori; Lukasz Huminiecki; Kei Iida; Tomokatsu Ikawa; Boris R Jankovic; Hui Jia; Anagha Joshi; Giuseppe Jurman; Bogumil Kaczkowski; Chieko Kai; Kaoru Kaida; Ai Kaiho; Kazuhiro Kajiyama; Mutsumi Kanamori-Katayama; Artem S Kasianov; Takeya Kasukawa; Shintaro Katayama; Sachi Kato; Shuji Kawaguchi; Hiroshi Kawamoto; Yuki I Kawamura; 
Tsugumi Kawashima; Judith S Kempfle; Tony J Kenna; Juha Kere; Levon M Khachigian; Toshio Kitamura; S Peter Klinken; Alan J Knox; Miki Kojima; Soichi Kojima; Naoto Kondo; Haruhiko Koseki; Shigeo Koyasu; Sarah Krampitz; Atsutaka Kubosaki; Andrew T Kwon; Jeroen F J Laros; Weonju Lee; Andreas Lennartsson; Kang Li; Berit Lilje; Leonard Lipovich; Alan Mackay-Sim; Ri-ichiroh Manabe; Jessica C Mar; Benoit Marchand; Anthony Mathelier; Niklas Mejhert; Alison Meynert; Yosuke Mizuno; David A de Lima Morais; Hiromasa Morikawa; Mitsuru Morimoto; Kazuyo Moro; Efthymios Motakis; Hozumi Motohashi; Christine L Mummery; Mitsuyoshi Murata; Sayaka Nagao-Sato; Yutaka Nakachi; Fumio Nakahara; Toshiyuki Nakamura; Yukio Nakamura; Kenichi Nakazato; Erik van Nimwegen; Noriko Ninomiya; Hiromi Nishiyori; Shohei Noma; Shohei Noma; Tadasuke Noazaki; Soichi Ogishima; Naganari Ohkura; Hiroko Ohimiya; Hiroshi Ohno; Mitsuhiro Ohshima; Mariko Okada-Hatakeyama; Yasushi Okazaki; Valerio Orlando; Dmitry A Ovchinnikov; Arnab Pain; Robert Passier; Margaret Patrikakis; Helena Persson; Silvano Piazza; James G D Prendergast; Owen J L Rackham; Jordan A Ramilowski; Mamoon Rashid; Timothy Ravasi; Patrizia Rizzu; Marco Roncador; Sugata Roy; Morten B Rye; Eri Saijyo; Antti Sajantila; Akiko Saka; Shimon Sakaguchi; Mizuho Sakai; Hiroki Sato; Suzana Savvi; Alka Saxena; Claudio Schneider; Erik A Schultes; Gundula G Schulze-Tanzil; Anita Schwegmann; Thierry Sengstag; Guojun Sheng; Hisashi Shimoji; Yishai Shimoni; Jay W Shin; Christophe Simon; Daisuke Sugiyama; Takaai Sugiyama; Masanori Suzuki; Naoko Suzuki; Rolf K Swoboda; Peter A C 't Hoen; Michihira Tagami; Naoko Takahashi; Jun Takai; Hiroshi Tanaka; Hideki Tatsukawa; Zuotian Tatum; Mark Thompson; Hiroo Toyodo; Tetsuro Toyoda; Elvind Valen; Marc van de Wetering; Linda M van den Berg; Roberto Verado; Dipti Vijayan; Ilya E Vorontsov; Wyeth W Wasserman; Shoko Watanabe; Christine A Wells; Louise N Winteringham; Ernst Wolvetang; Emily J Wood; Yoko Yamaguchi; Masayuki 
Yamamoto; Misako Yoneda; Yohei Yonekura; Shigehiro Yoshida; Susan E Zabierowski; Peter G Zhang; Xiaobei Zhao; Silvia Zucchelli; Kim M Summers; Harukazu Suzuki; Carsten O Daub; Jun Kawai; Peter Heutink; Winston Hide; Tom C Freeman; Boris Lenhard; Vladimir B Bajic; Martin S Taylor; Vsevolod J Makeev; Albin Sandelin; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal:  Nature       Date:  2014-03-27       Impact factor: 49.962

7.  Human-specific loss of regulatory DNA and the evolution of human-specific traits.

Authors:  Cory Y McLean; Philip L Reno; Alex A Pollen; Abraham I Bassan; Terence D Capellini; Catherine Guenther; Vahan B Indjeian; Xinhong Lim; Douglas B Menke; Bruce T Schaar; Aaron M Wenger; Gill Bejerano; David M Kingsley
Journal:  Nature       Date:  2011-03-10       Impact factor: 49.962

8.  Species-specific transcription in mice carrying human chromosome 21.

Authors:  Michael D Wilson; Nuno L Barbosa-Morais; Dominic Schmidt; Caitlin M Conboy; Lesley Vanes; Victor L J Tybulewicz; Elizabeth M C Fisher; Simon Tavaré; Duncan T Odom
Journal:  Science       Date:  2008-09-11       Impact factor: 47.728

9.  An expansive human regulatory lexicon encoded in transcription factor footprints.

Authors:  Shane Neph; Jeff Vierstra; Andrew B Stergachis; Alex P Reynolds; Eric Haugen; Benjamin Vernot; Robert E Thurman; Sam John; Richard Sandstrom; Audra K Johnson; Matthew T Maurano; Richard Humbert; Eric Rynes; Hao Wang; Shinny Vong; Kristen Lee; Daniel Bates; Morgan Diegel; Vaughn Roach; Douglas Dunn; Jun Neri; Anthony Schafer; R Scott Hansen; Tanya Kutyavin; Erika Giste; Molly Weaver; Theresa Canfield; Peter Sabo; Miaohua Zhang; Gayathri Balasundaram; Rachel Byron; Michael J MacCoss; Joshua M Akey; M A Bender; Mark Groudine; Rajinder Kaul; John A Stamatoyannopoulos
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

Review 10.  Mammalian RNA polymerase II core promoters: insights from genome-wide studies.

Authors:  Albin Sandelin; Piero Carninci; Boris Lenhard; Jasmina Ponjavic; Yoshihide Hayashizaki; David A Hume
Journal:  Nat Rev Genet       Date:  2007-05-08       Impact factor: 53.242
