Baocheng Guo1,2,3, Ming Zou1, Takahiro Sakamoto4, Hideki Innan4.
Abstract
In his influential book "Evolution by Gene Duplication", Ohno postulated that a frameshift mutation could lead to a new function after duplication. However, frameshift mutations are generally thought to be deleterious and have therefore drawn little attention as a source of functional innovation in duplicate evolution. To examine this possibility, we report an exhaustive survey of the genomes of human, mouse, zebrafish, and fruit fly. We identified 80 duplicate genes that involved frameshift mutations after duplication. In most cases (55/80), the frameshift mutation was located close to the C-terminus, which indicates that a frameshift that changes the reading frame in only a small part at the end of a duplicate may have contributed to adaptive evolution (e.g., human genes NOTCH2NL and ARHGAP11B), whereas a frameshift affecting a larger part would likely be too deleterious to survive. A few cases (11/80) involved multiple frameshift mutations, exhibiting various patterns of modification of the reading frame. The functionality of duplicate genes involving frameshift mutations was confirmed by sequence characteristics and expression profiles, suggesting a potential role of frameshift mutation in creating functional novelty. We thus showed that genomes contain non-negligible numbers of genes that have experienced frameshift mutations following gene duplication. Our results demonstrate the potential importance of frameshift mutations in molecular evolution, as Ohno verbally argued 50 years ago.
Keywords: ARHGAP11B; NOTCH2NL; Ohno; frameshift mutation; gene duplication
Year: 2022 PMID: 35205235 PMCID: PMC8872073 DOI: 10.3390/genes13020190
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. An example of duplicated genes, with one copy having experienced a frameshift mutation (H10, OR2T7, and OR2T27 in human in Table 1).
Table 1. Duplicated genes with frameshift mutations.
| Species | Group ID | Gene ID of Original Copy | Gene ID of Derived Copy | Gene Name of Original Copy | Gene Name of Derived Copy | Gene ID of Outgroup | No. of Frameshift Mutations | Type | Length (No. of Amino Acids) | Expression |
|---|---|---|---|---|---|---|---|---|---|---|
| Human | H01 | ENSG00000235233.8 | ENSG00000204520.12 | | | ENSPTRT00000033163.5 | 1 | C **** | 15/223/385 *** | Quantitative level |
| Human | H02 | ENSG00000235233.8 | ENSG00000231225.9 | | | ENSPTRT00000033163.5 | 1 | C | 15/289/385 | Quantitative level |
| Human | H03 | ENSG00000235233.8 | ENSG00000233051.9 | | | ENSPTRT00000033163.5 | 2 | N&C | 27/194/385&15/194/385 | Quantitative level |
| Human | H04 | ENSG00000233439.7 | ENSG00000206458.9 | | | ENSPTRT00000066839.2 | 1 | C | 24/63/152 | Quantitative level |
| Human | H05 | ENSG00000170122.5 | ENSG00000184492.6 | | | ENSMICG00000036465 | 1 | C | 116/408/439 | Quantitative level |
| Human | H06 | ENSG00000153779.10 | ENSG00000176679.8 | | | ENSPCOT00000008535.1 | 1 | C | 37/185/241 | Pattern/Quantitative level ***** |
| Human | H07 | ENSG00000170122.5 | ENSG00000273514.1 | | | ENSMICG00000036465 | 1 | C | 55/417/439 | Quantitative level |
| Human | H08 | ENSG00000275568.4 | ENSG00000187951.10 | | | ENSPTRT00000012671.3 | 1 | C | 47/267/1023 | Pattern/Quantitative level |
| Human | H09 | ENSG00000204149.10 | ENSG00000204172.12 | | | ENSPTRG00000029891 | 1 | C | 15/658/686 | Quantitative level |
| Human | H10 | ENSG00000187701.3 | ENSG00000281395.1 | | | ENSGGOG00000003840 | 1 | N | 17/308/317 | NA |
| Human | H11 | ENSG00000211678.2 | ENSG00000211676.2 | | | NA | 1 | N | 16/47/50 | Pattern/Quantitative level |
| Human | H12 | ENSG00000274070.1 | ENSG00000239521.7 | | | NA | 1 | N | 36/163/329 | Quantitative level |
| Human | H13 | ENSG00000258405.9 | ENSG00000221874.4 | | | ENSCCAG00000033386 | 1 | C | 12/233/253 | Quantitative level |
| Human | H14 | ENSG00000134545.13 | ENSG00000183542.5 | | | ENSMUST00000053708.8 * | 1 | C | 15/158/233 | Pattern/Quantitative level |
| Human | H15 | ENSG00000134545.13 | ENSG00000255819.7 | | | ENSMUST00000053708.8 | 1 | C | 15/150/228 | Quantitative level |
| Human | H16 | ENSG00000182816.8 | ENSG00000186980.6 | | | NA | 1 | C | 26/65/175 | Pattern/Quantitative level |
| Human | H17 | ENSG00000152086.8 | ENSG00000243910.7 | | | NA | 1 | C | 40/241/450 | Quantitative level |
| Mouse | M01 | ENSMUSG00000060816.2 | ENSMUSG00000062546.4 | | | ENSRNOT00000086882.1 | 1 | C | 13/279/309 | Pattern/Quantitative level |
| Mouse | M02 | ENSMUSG00000094918.3 | ENSMUSG00000096641.6 | | | ENSRNOT00000060670.1 | 1 | C | 16/102/1017 | Pattern/Quantitative level |
| Mouse | M03 | ENSMUSG00000066487.3 | ENSMUSG00000090544.2 | | | ENSRNOT00000019508.6 | 1 | N | 13/274/302 | Pattern/Quantitative level |
| Mouse | M04 | ENSMUSG00000066487.3 | ENSMUSG00000044533.15 | | | ENSRNOT00000019508.6 | 1 | C | 26/293/302 | Pattern/Quantitative level |
| Mouse | M05 | ENSMUSG00000091733.1 | ENSMUSG00000096372.1 | | | ENSRNOT00000046644.4 * | 1 | N | 51/213/211 | Pattern/Quantitative level |
| Mouse | M06 | ENSMUSG00000072066.6 | ENSMUSG00000055228.7 | | | ENSRNOT00000061526.2 | 1 | C | 62/106/353 | Pattern/Quantitative level |
| Mouse | M07 | ENSMUSG00000099974.1 | ENSMUSG00000053820.4 | | | ENSRNOT00000039850.3 | 1 | C | 26/128/172 | Pattern/Quantitative level |
| Mouse | M08 | ENSMUSG00000109516.1 | ENSMUSG00000109396.1 | | | ENSRNOT00000058760.2 | 1 | C | 13/301/313 | NA |
| Mouse | M09 | ENSMUSG00000091477.1 | ENSMUSG00000072595.2 | | | NA | 1 | C | 15/177/201 | Pattern/Quantitative level |
| Mouse | M10 | ENSMUSG00000096446.1 | ENSMUSG00000079244.3 | | | ENSRNOT00000042743.5 | 1 | C | 19/167/198 | Pattern/Quantitative level |
| Mouse | M11 | ENSMUSG00000099115.1 | ENSMUSG00000092086.1 | | | NA | 1 | N | 28/357/351 | Pattern/Quantitative level |
| Mouse | M12 | ENSMUSG00000031320.9 | ENSMUSG00000098559.1 | | | ENSRNOT00000076978.3 | 1 | C | 11/263/266 | Pattern/Quantitative level |
| Mouse | M13 | ENSMUSG00000062456.3 | ENSMUSG00000081906.2 | | | ENSRNOT00000052231.4 | 3 | M | 15/191/192 | Pattern/Quantitative level |
| Mouse | M14 | ENSMUSG00000047980.6 | ENSMUSG00000091411.1 | | | ENSRNOT00000032208.3 | 1 | C | 18/102/117 | Pattern/Quantitative level |
| Mouse | M15 | ENSMUSG00000099294.1 | ENSMUSG00000081607.2 | | | NA | 1 | C | 20/183/112 | Pattern/Quantitative level |
| Mouse | M16 | ENSMUSG00000055942.13 | ENSMUSG00000074369.12 | | | ENSRNOT00000045660.2 | 1 | C | 27/151/218 | Pattern/Quantitative level |
| Mouse | M17 | ENSMUSG00000094472.1 | ENSMUSG00000094856.1 | | | ENSRNOT00000091672.1 | 1 | C | 19/338/500 | Pattern/Quantitative level |
| Mouse | M18 | ENSMUSG00000024766.14 | ENSMUSG00000086875.1 | | | ENSRNOT00000035013.4 | 2 | M | 20/233/399 | Pattern/Quantitative level |
| Mouse | M19 | ENSMUSG00000108596.1 | ENSMUSG00000062997.6 | | | NA | 1 | C | 48/125/123 | Pattern/Quantitative level |
| Mouse | M20 | ENSMUSG00000067919.8 | ENSMUSG00000058186.13 | | | ENSRNOT00000081537.1 | 1 | C | 11/645/685 | Pattern/Quantitative level |
| Mouse | M21 | ENSMUSG00000061829.3 | ENSMUSG00000071490.3 | | | NA | 1 | C | 55/357/367 | Pattern/Quantitative level |
| Mouse | M22 | ENSMUSG00000061829.3 | ENSMUSG00000060024.1 | | | NA | 1 | N | 79/384/367 | Pattern/Quantitative level |
| Mouse | M23 | ENSMUSG00000078495.10 | ENSMUSG00000078496.9 | | | ENSRNOT00000034746.6 | 1 | C | 16/368/547 | Pattern/Quantitative level |
| Mouse | M24 | ENSMUSG00000100296.1 | ENSMUSG00000069289.1 | | | MGP_PahariEiJ_T0040451.1 | 1 | C | 11/311/316 | Pattern/Quantitative level |
| Mouse | M25 | ENSMUSG00000095638.1 | ENSMUSG00000093922.1 | | | NA | 3 | N&M | 34/117/126&9/117/126 | Pattern/Quantitative level |
| Mouse | M26 | ENSMUSG00000096515.5 | ENSMUSG00000091008.1 | | | ENSRNOT00000087773.1 | 1 | C | 14/117/117 | Pattern/Quantitative level |
| Mouse | M27 | ENSMUSG00000051242.1 | ENSMUSG00000045062.4 | | | ENSMOCT00000026210.1 | 3 | M | 20/829/828 | Pattern/Quantitative level |
| Mouse | M28 | ENSMUSG00000067199.4 | ENSMUSG00000070526.2 | | | ENSNGAT00000028350.1 | 1 | C | 17/283/274 | Pattern/Quantitative level |
| Mouse | M29 | ENSMUSG00000027925.2 | ENSMUSG00000050635.1 | | | NA | 1 | C | 56/76/109 | Pattern/Quantitative level |
| Mouse | M30 | ENSMUSG00000035783.8 | ENSMUSG00000099104.1 | | | NA | 1 | C | 21/156/221 | Pattern/Quantitative level |
| Zebrafish | Z01 | ENSDARG00000099789.1 | ENSDARG00000095189.1 | | | NA | 2 | C | 20/157/173 | Pattern/Quantitative level |
| Zebrafish | Z02 | ENSDARG00000090975.3 | ENSDARG00000103701.1 | | NA | XP_025757774.1 | 1 | N | 29/91/133 | Pattern/Quantitative level |
| Zebrafish | Z03 | ENSDARG00000092779.1 | ENSDARG00000102941.1 | | | NA | 1 | N | 14/193/229 | Pattern/Quantitative level |
| Zebrafish | Z04 | ENSDARG00000097099.1 | ENSDARG00000095593.1 | | | XP_018956261.1 | 1 | N | 15/63/66 | Pattern/Quantitative level |
| Zebrafish | Z05 | ENSDARG00000102028.1 | ENSDARG00000102853.1 | | | XP_018942195.1 | 1 | N | 21/233/266 | Pattern/Quantitative level |
| Zebrafish | Z06 | ENSDARG00000092625.2 | ENSDARG00000092202.2 | | | ENSIPUT00000009250.1 | 1 | C | 15/194/214 | Pattern/Quantitative level |
| Zebrafish | Z07 | ENSDARG00000095545.2 | ENSDARG00000094878.2 | | | ENSIPUT00000009250.1 | 1 | C | 34/198/203 | Pattern/Quantitative level |
| Zebrafish | Z08 | ENSDARG00000074279.1 | ENSDARG00000078586.3 | NA | NA | CI01000026_05016210_05017522 * | 1 | N | 22/221/233 | Pattern/Quantitative level |
| Zebrafish | Z09 | ENSDARG00000095444.1 | ENSDARG00000093963.2 | | | ENSIPUT00000009250.1 | 1 | N | 24/202/268 | Pattern/Quantitative level |
| Zebrafish | Z10 | ENSDARG00000092512.1 | ENSDARG00000093579.1 | | | XP_018949576.1 | 2 | M | 34/218/462 | Pattern/Quantitative level |
| Zebrafish | Z11 | ENSDARG00000090975.3 | ENSDARG00000099246.1 | | NA | XP_025757774.1 | 1 | N | 20/108/133 | Pattern/Quantitative level |
| Zebrafish | Z12 | ENSDARG00000095532.1 | ENSDARG00000095026.1 | | | NA | 1 | C | 63/216/232 | Pattern/Quantitative level |
| Zebrafish | Z13 | ENSDARG00000092512.1 | ENSDARG00000091890.1 | | | XP_018949576.1 | 1 | C | 54/288/462 | Pattern/Quantitative level |
| Zebrafish | Z14 | ENSDARG00000099200.1 | ENSDARG00000077910.4 | | | ENSIPUT00000032654.1 | 1 | N | 12/257/347 | Pattern/Quantitative level |
| Zebrafish | Z15 | ENSDARG00000043445.5 | ENSDARG00000093845.3 | | | XP_016119820.1 | 1 | C | 37/360/384 | Pattern/Quantitative level |
| Zebrafish | Z16 | ENSDARG00000094001.3 | ENSDARG00000093588.1 | | | NA | 1 | C | 15/70/90 | Pattern/Quantitative level |
| Zebrafish | Z17 | ENSDARG00000104631.1 | ENSDARG00000078683.3 | | | ENSIPUT00000017639.1 | 1 | C | 77/413/387 | Pattern/Quantitative level |
| Zebrafish | Z18 | ENSDARG00000094001.3 | ENSDARG00000094329.1 | | | NA | 2 | C | 18/73/90 | Pattern/Quantitative level |
| Zebrafish | Z19 | ENSDARG00000017984.10 | ENSDARG00000096625.1 | | | NA | 1 | C | 13/332/394 | Pattern/Quantitative level |
| Zebrafish | Z20 | ENSDARG00000043894.5 | ENSDARG00000102034.1 | | | XP_016341397.1 | 2 | C | 38/124/124 | Pattern/Quantitative level |
| Zebrafish | Z21 | ENSDARG00000026704.6 | ENSDARG00000097373.2 | | | XP_016407654.1 | 1 | N | 18/489/550 | Pattern/Quantitative level |
| Zebrafish | Z22 | ENSDARG00000078195.2 | ENSDARG00000104993.2 | | | XP_018967896.1 | 1 | C | 11/110/150 | Pattern/Quantitative level |
| Zebrafish | Z23 | ENSDARG00000100941.1 | ENSDARG00000086498.3 | | | CI01000308_00041387_00048067 | 1 | C | 9/229/907 | Pattern/Quantitative level |
| Zebrafish | Z24 | ENSDARG00000102866.1 | ENSDARG00000105247.1 | | | NA | 1 | C | 35/206/223 | Pattern/Quantitative level |
| Zebrafish | Z25 | ENSDARG00000095057.2 | ENSDARG00000070858.4 | | | XP_016375730.1 | 2 | M | 34/226/267 | Pattern/Quantitative level |
| Zebrafish | Z26 | ENSDARG00000097997.1 | ENSDARG00000089747.3 | | | NA | 1 | N | 42/330/391 | Pattern/Quantitative level |
| Zebrafish | Z27 | ENSDARG00000068363.5 | ENSDARG00000092930.1 | | | CI01000028_05937526_05949610 | 1 | N | 10/112/577 | Pattern/Quantitative level |
| Zebrafish | Z28 | ENSDARG00000069869.3 | ENSDARG00000103341.1 | | | NA | 1 | C | 16/215/240 | Pattern/Quantitative level |
| Zebrafish | Z29 | ENSDARG00000099359.1 | ENSDARG00000086495.2 | | | KKF19921.1 * | 1 | C | 12/1344/1430 | Pattern/Quantitative level |
| Zebrafish | Z30 | ENSDARG00000001431.10 | ENSDARG00000104713.1 | | | ENSEBUT00000022154.1 | 1 | C | 32/151/898 | Pattern/Quantitative level |
| Zebrafish | Z31 | ENSDARG00000101249.2 | ENSDARG00000105342.1 | | | ENSTNIG0000000877 | 1 | C | 46/69/103 | Pattern/Quantitative level |
| Fruit fly | F01 | FBgn0053236 | FBgn0053242 | | | KMQ81491.1 | 2 | M | 19/171/172 | Pattern/Quantitative level |
| Fruit fly | F02 | FBgn0262610 | FBgn0262575 | NA | NA | FBgn0167536 | 1 | C | 20/192/196 | Pattern/Quantitative level |
* An outgroup sequence was obtained and used to determine the original and derived copies, but was not used for sequence analyses due to bad alignment. ** This was an exceptional case in which the shorter copy (Vmn1r213) was considered ancestral even with no outgroup, because Vmn1r214 was also the ancestral copy for the case of M21. *** The length of the frameshifted region/the length of the derived copy/the length of the original copy. **** “C”, “M”, and “N” mean that the frameshift changed the reading frame in the C-terminus, middle region, or N-terminus of the amino acid sequence of a gene, respectively. ***** Showing tissue-specific expression divergence/only quantitative level expression divergence between duplicates with frameshift mutation. NA: not available.
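The Length column packs three numbers into one field (frameshifted region/derived copy/original copy, per footnote ***), with `&` separating entries when a group carries more than one frameshifted region. A minimal Python sketch of how such a field could be unpacked is given below; the names `FrameshiftLength` and `parse_length_field` are illustrative assumptions, not part of the original study.

```python
from typing import List, NamedTuple


class FrameshiftLength(NamedTuple):
    """One 'shifted/derived/original' entry, all in amino acids."""
    shifted: int   # length of the frameshifted region
    derived: int   # length of the derived copy
    original: int  # length of the original copy

    @property
    def fraction_of_original(self) -> float:
        # Size of the frameshifted region relative to the original copy.
        return self.shifted / self.original


def parse_length_field(field: str) -> List[FrameshiftLength]:
    """Parse a Length cell such as '27/194/385&15/194/385'."""
    records = []
    for part in field.split("&"):
        # Drop footnote asterisks that may trail the numbers in the table.
        cleaned = part.replace("*", "").strip()
        shifted, derived, original = (int(x) for x in cleaned.split("/"))
        records.append(FrameshiftLength(shifted, derived, original))
    return records


# Usage with two cells copied from the table (groups H01 and H03).
h01 = parse_length_field("15/223/385 ***")
h03 = parse_length_field("27/194/385&15/194/385")
```

For H01 this gives a single entry whose `fraction_of_original` is 15/385, which is small, as would be expected for a frameshift confined to the end of the protein.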
Figure 2. Locations of typical frameshift mutations along their coding sequences, in bp from the start codon. The original and derived copies are presented in light blue and blue, respectively, with frameshifted regions in red. Upright triangles with positive numbers mark insertions, and inverted triangles with negative numbers mark deletions; the numbers give the sizes of the indels. (A) One representative case for type-C. (B) One representative case for type-N. (C) Seven cases involving multiple frameshift mutations: H12, CASTOR3, and CASTOR2 in human; H10, OR2T7, and OR2T27 in human; H03, MICA, and MICA in human; M13, Rpl-ps1, and Rpl-ps6 in mouse; M18, Lipo5, and Lipo3 in mouse; M27, Pcdhb7, and Pcdhb9 in mouse; Z20 and Z25 in zebrafish (gene names are unavailable); and F01, CSNK2B, and CSNK2B in fruit fly.
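The sign convention in this caption (positive for insertions, negative for deletions) makes it easy to reason about where the reading frame is shifted: the frame is out of phase wherever the running sum of indel sizes is not a multiple of 3. The sketch below illustrates that arithmetic under simplifying assumptions: all positions are on a single coordinate axis, premature stop codons are ignored, and the function name and example indels are hypothetical rather than taken from the paper.

```python
from typing import List, Tuple


def frameshifted_segments(
    indels: List[Tuple[int, int]], cds_length: int
) -> List[Tuple[int, int]]:
    """Return (start, end) intervals where the reading frame is shifted.

    indels: (position in bp, size) pairs; size > 0 is an insertion,
            size < 0 is a deletion.
    cds_length: length of the coding sequence in bp.
    """
    segments = []
    offset = 0          # cumulative frame offset, always 0, 1, or 2
    shift_start = None  # start of the currently open shifted segment
    for position, size in sorted(indels):
        offset = (offset + size) % 3
        if offset != 0 and shift_start is None:
            shift_start = position           # frame leaves phase here
        elif offset == 0 and shift_start is not None:
            segments.append((shift_start, position))  # frame restored
            shift_start = None
    if shift_start is not None:
        # Frame never restored: shifted through the end of the sequence.
        segments.append((shift_start, cds_length))
    return segments


# Hypothetical examples (not from the paper):
late_single = frameshifted_segments([(250, -2)], 300)
compensated = frameshifted_segments([(100, 1), (160, -1)], 300)
in_frame = frameshifted_segments([(50, 3)], 300)
```

Here `late_single` is shifted from the indel to the end of the sequence, `compensated` is shifted only between the two indels because the second restores the frame, and `in_frame` has no shifted segment because a 3-bp indel keeps the frame.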
Figure 3. (A) Distribution of the locations of frameshift mutations mapped on the original copy, rescaled to the (0,1) interval. (B) Distribution of the size of the frameshifted region relative to the original copy. (C) Distribution of Ks for type-C, -N, and -M. (D) Distribution of Ka/Ks for type-C, -N, and -M. Data for the four species are pooled in (C) and (D). (E) Distribution of Ks for human, mouse, zebrafish, and fruit fly. (F) Distribution of Ka/Ks for human, mouse, zebrafish, and fruit fly. Data for type-C, -N, and -M are pooled in (E) and (F). The data were binned with a window of size 0.1 in all panels.
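The rescaling and binning described for panel (A) can be sketched as follows: each frameshift position is divided by the length of the original copy and counted in one of ten windows of width 0.1. This is an illustrative sketch only; the function name, the choice to place a ratio of exactly 1.0 in the last bin, and the sample values are assumptions, not details reported by the authors.

```python
from typing import Iterable, List, Tuple


def bin_relative_positions(
    sites: Iterable[Tuple[int, int]], n_bins: int = 10
) -> List[int]:
    """Count sites per bin after rescaling position/length to [0, 1].

    sites: (position, length of original copy) pairs in the same unit.
    n_bins: number of equal-width bins (10 gives a window of 0.1).
    """
    counts = [0] * n_bins
    for position, length in sites:
        if length <= 0 or not 0 <= position <= length:
            raise ValueError(f"invalid site: {position}/{length}")
        # Integer arithmetic avoids floating-point rounding at bin borders;
        # a site at the very end (ratio 1.0) is counted in the last bin.
        index = min(position * n_bins // length, n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sites: one near the start, three near the end.
example_counts = bin_relative_positions(
    [(5, 100), (95, 100), (100, 100), (370, 385)]
)
```

With these sample values, one site falls in the first window and three in the last, the kind of skew toward the end of the sequence that a predominance of type-C cases would produce.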