Zachary R Hanna, James B Henderson, Jeffrey D Wall, Christopher A Emerling, Jérôme Fuchs, Charles Runckel, David P Mindell, Rauri C K Bowie, Joseph L DeRisi, John P Dumbacher.
Abstract
We report here the assembly of a northern spotted owl (Strix occidentalis caurina) genome. We generated Illumina paired-end sequence data at 90× coverage using nine libraries with insert lengths ranging from ∼250 to 9,600 nt and read lengths from 100 to 375 nt. The genome assembly comprises 8,108 scaffolds totaling 1.26 × 10⁹ nt in length, with an N50 length of 3.98 × 10⁶ nt. We calculated the genome-wide fixation index (FST) between S. o. caurina and the closely related barred owl (Strix varia) as 0.819. We examined 19 genes that encode proteins with light-dependent functions in our genome assembly as well as in that of the barn owl (Tyto alba). We present genomic evidence for the loss of three of these genes in S. o. caurina and four in T. alba. We suggest that most light-associated gene functions have been maintained in owls and that their loss has not proceeded to the same extent as in other dim-light-adapted vertebrates.
Keywords: Aves; Strigidae; Strigiformes; bird; nuclear genome
Year: 2017 PMID: 28992302 PMCID: PMC5629816 DOI: 10.1093/gbe/evx158
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Specimen Data
| Specimen | County | State | Country | Date | Specimen Institution |
|---|---|---|---|---|---|
| CAS:ORN:98821 | Marin County | CA | United States | 26 Jun 2005 | California Academy of Sciences |
| CNHM<USA-OH>:ORNITH | Hamilton County | OH | United States | 29 Nov 2010 | Cincinnati Museum Center |
Note.—Information regarding the S. o. caurina and S. varia individuals from which we obtained genomic sequences for this study including the county, state, country, and date of collection for each specimen as well as the specimen code and institution where each specimen is archived.
Metrics of Preliminary Assemblies
| Assembly | Contig N50 (nt) | Scaffold N50 (nt) | Total Length of Assembly (Gnt) | Ns (%) | Total Number of Scaffolds | Number of Scaffolds > 1 Mnt in Length | Partial CEGs Found by CEGMA | Complete CEGs Found by CEGMA |
|---|---|---|---|---|---|---|---|---|
| 1 | 9,499 | 3,869,235 | 1.275 | 4.77 | 51,843 | 292 | 231 | 205 |
| 2 | 12,096 | 3,522,724 | 1.274 | 4.40 | 48,264 | 295 | 233 | 205 |
| 3 | 10,425 | 4,007,375 | 1.272 | 4.88 | 47,075 | 0 | 226 | 200 |
| 5 | 10,315 | 4,164,870 | 1.272 | 4.45 | 46,146 | 287 | 232 | 206 |
| 6 | 9,142 | 3,780,867 | 1.275 | 4.86 | 51,615 | 296 | 230 | 202 |
| 7 | 9,802 | 3,478,271 | 1.274 | 4.42 | 54,240 | 327 | 233 | 209 |
| 8 | 12,650 | 3,665,028 | 1.271 | 4.18 | 43,092 | 313 | 231 | 204 |
| 9 | 12,006 | 3,587,241 | 1.271 | 4.66 | 44,939 | 307 | 226 | 201 |
| 10 | 12,487 | 3,586,666 | 1.271 | 4.26 | 44,345 | 314 | 232 | 204 |
| 11 | 14,651 | 3,917,141 | 1.276 | 4.26 | 50,636 | 293 | 234 | 217 |
| 12 | 14,627 | 3,728,521 | 1.276 | 4.28 | 50,349 | 305 | 234 | 219 |
| 13 | 14,672 | 3,917,121 | 1.276 | 4.26 | 50,129 | 293 | 234 | 217 |
| 14 | 13,967 | 3,431,044 | 1.300 | 4.50 | 127,384 | 318 | 238 | 218 |
Note.—Continuity and completeness summary statistics for our preliminary assemblies. Before calculating these statistics, we removed contigs/scaffolds <300 nt to exclude unassembled reads. We used the very restrictive definition that every single N splits a scaffold into separate contigs. “Partial CEGs found by CEGMA” is the number of the 248 CEGs that CEGMA found in the assembly in at least partial completeness. An asterisk and bolded font mark the preliminary assembly that we chose to use as the basis for the final assembly.
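The restrictive contig definition used for these preliminary statistics can be sketched in Python as below. This is a minimal illustration, assuming each assembly is available as a list of plain nucleotide strings; the function names `split_into_contigs` and `preliminary_contigs` are hypothetical and not part of the authors' pipeline. Sequences shorter than 300 nt are dropped first, then every run of one or more Ns ends a contig.

```python
import re


def split_into_contigs(scaffold: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold into contigs at every run of at least `min_gap` Ns.

    min_gap=1 reproduces the restrictive rule: any single N ends one
    contig and starts the next.
    """
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile("[Nn]{%d,}" % min_gap)
    # re.split leaves empty strings where a scaffold starts or ends with Ns
    return [piece for piece in gap.split(scaffold) if piece]


def preliminary_contigs(sequences: list[str], min_len: int = 300) -> list[str]:
    """Drop sequences shorter than min_len, then split the rest at every N."""
    contigs: list[str] = []
    for seq in sequences:
        if len(seq) >= min_len:
            contigs.extend(split_into_contigs(seq, min_gap=1))
    return contigs


if __name__ == "__main__":
    print(split_into_contigs("ACGTNACGTNNNNAC"))             # ['ACGT', 'ACGT', 'AC']
    print(split_into_contigs("ACGTNACGTNNNNAC", min_gap=2))  # ['ACGTNACGT', 'AC']
```

The `min_gap` parameter is included only to show how a less restrictive definition (for example, a minimum run of 25 Ns) would change the contig count.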
Final Assembly Metrics
| Assembly Version | No Gap-Closing, No Scaffolds or Contigs Removed | Gap-Closed, No Scaffolds or Contigs Removed | Gap-Closed, Scaffolds and Contigs <1,000 nt Removed |
|---|---|---|---|
| Number of scaffolds | 3,754,960 | 3,754,960 | 8,108 |
| Total size of scaffolds | 1,884,397,264 nt | 1,882,081,621 nt | 1,255,541,132 nt |
| Longest scaffold | 15,783,852 nt | 15,750,186 nt | 15,750,186 nt |
| Shortest scaffold | 128 nt | 128 nt | 1,000 nt |
| Number of scaffolds > 1 K nt | 8,112 (0.2%) | 8,095 (0.2%) | 8,095 (99.8%) |
| Number of scaffolds > 10 K nt | 1,754 (0.0%) | 1,746 (0.0%) | 1,746 (21.5%) |
| Number of scaffolds > 100 K nt | 661 (0.0%) | 661 (0.0%) | 661 (8.2%) |
| Number of scaffolds > 1 M nt | 303 (0.0%) | 303 (0.0%) | 303 (3.7%) |
| Number of scaffolds > 10 M nt | 9 (0.0%) | 9 (0.0%) | 9 (0.1%) |
| Mean scaffold size | 502 nt | 501 nt | 154,852 nt |
| Median scaffold size | 150 nt | 150 nt | 1,904 nt |
| N50 scaffold length (L50 scaffold count) | 1,843,286 nt (209) | 1,836,279 nt (209) | 3,983,020 nt (92) |
| N60 scaffold length (L60 scaffold count) | 622,124 nt (370) | 619,581 nt (371) | 3,012,707 nt (129) |
| N70 scaffold length (L70 scaffold count) | 255 nt (216,251) | 255 nt (218,976) | 2,162,240 nt (178) |
| N80 scaffold length (L80 scaffold count) | 174 nt (1,110,583) | 174 nt (1,113,245) | 1,545,070 nt (246) |
| N90 scaffold length (L90 scaffold count) | 143 nt (2,336,958) | 143 nt (2,338,577) | 618,731 nt (372) |
| scaffold %GC | 42.81% | 43.82% | 41.31% |
| scaffold %N | 2.89% | 0.74% | 1.10% |
| Percentage of assembly in scaffolded contigs | 66.4% | 65.7% | 98.5% |
| Percentage of assembly in unscaffolded contigs | 33.6% | 34.3% | 1.5% |
| Average number of contigs per scaffold | 1.0 | 1.0 | 3.4 |
| Average length of break (>25 Ns) between contigs in scaffold | 311 | 703 | 716 |
| Number of contigs | 3,929,029 | 3,774,552 | 27,252 |
| Number of contigs in scaffolds | 179,939 | 22,372 | 21,478 |
| Number of contigs not in scaffolds | 3,749,090 | 3,752,180 | 5,774 |
| Total size of contigs | 1,830,109,624 nt | 1,868,296,631 nt | 1,241,823,123 nt |
| Longest contig | 186,255 nt | 1,259,046 nt | 1,259,046 nt |
| Shortest contig | 5 nt | 128 nt | 130 nt |
| Number of contigs > 1 K nt | 123,891 (3.2%) | 23,915 (0.6%) | 23,915 (87.8%) |
| Number of contigs > 10 K nt | 37,347 (1.0%) | 12,373 (0.3%) | 12,373 (45.4%) |
| Number of contigs > 100 K nt | 58 (0.0%) | 3,909 (0.1%) | 3,909 (14.3%) |
| Number of contigs > 1 M nt | 0 (0.0%) | 8 (0.0%) | 8 (0.0%) |
| Mean contig size | 466 nt | 495 nt | 45,568 nt |
| Median contig size | 150 nt | 150 nt | 6,702 nt |
| N50 contig length (L50 contig count) | 7,855 nt (46,856) | 81,400 nt (4,678) | 171,882 nt (2,057) |
| N60 contig length (L60 contig count) | 3,275 nt (81,600) | 33,521 nt (8,121) | 134,419 nt (2,876) |
| N70 contig length (L70 contig count) | 254 nt (448,715) | 255 nt (254,729) | 98,604 nt (3,955) |
| N80 contig length (L80 contig count) | 170 nt (1,346,255) | 173 nt (1,148,692) | 66,668 nt (5,484) |
| N90 contig length (L90 contig count) | 142 nt (2,548,877) | 142 nt (2,367,845) | 34,559 nt (8,023) |
Note.—Assembly metrics (contaminant and mitochondrial sequences removed) before gap-closing, after gap-closing, and after both gap-closing and removal of all contigs and scaffolds <1,000 nt in length. Scaffolds were split into contigs at every string of 25 or more Ns.
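The NX/LX pairs in the table follow the usual definition: sort the scaffold (or contig) lengths from longest to shortest; NX is the length at which the running total first reaches X% of the summed length, and LX is the number of sequences needed to get there. For example, a scaffold N50 of 3,983,020 nt with L50 = 92 means that the 92 longest scaffolds together contain at least half of the final assembly's 1,255,541,132 nt. The sketch below is a generic implementation of that definition; the helper name `nx_lx` is ours, and the source does not say which tool produced these statistics.

```python
def nx_lx(lengths: list[int], x: float = 50.0) -> tuple[int, int]:
    """Return (NX length, LX count) for a list of sequence lengths.

    Lengths are sorted longest first; NX is the length at which the
    running total first reaches x% of the summed length, and LX is the
    number of sequences needed to reach it.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # compare running/total >= x/100 without dividing
        if running * 100 >= total * x:
            return length, count
    raise AssertionError("unreachable: the running total always reaches x%")


if __name__ == "__main__":
    toy = [100, 80, 60, 40, 20]   # 300 nt in total
    print(nx_lx(toy))             # (80, 2): 100 + 80 >= 150
    print(nx_lx(toy, 90))         # (40, 4): 100 + 80 + 60 + 40 >= 270
```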
Summary of Conserved Ortholog Searches
| Assembly | Draft: No Gap-Closing, Contigs/Scaffolds <300 nt Removed | Draft: Gap-Closed, No Removal of Small Contigs/Scaffolds | Final: Gap-Closed, Contigs/Scaffolds <1,000 nt Removed | Final: Gap-Closed, Contigs/Scaffolds <1,000 nt Removed |
|---|---|---|---|---|
| Method | CEGMA | CEGMA | CEGMA | BUSCO |
| Total conserved orthologs examined | 248 | 248 | 248 | 3,023 |
| Complete orthologs (% of total) | 221 (89.11%) | 228 (91.94%) | 228 (91.94%) | 2,605 (86.17%) |
| At least partial orthologs (% of total) | 235 (94.76%) | 236 (95.16%) | 235 (94.76%) | 2,815 (93.12%) |
| Duplicated orthologs (% of total) | 92 (37.10%) | 83 (33.47%) | 99 (39.92%) | 46 (1.52%) |
| Missing orthologs (% of total) | 13 (5.24%) | 12 (4.84%) | 13 (5.24%) | 208 (6.88%) |
Note.—Number of conserved orthologous genes found in the final assembly (gap-closed, contigs/scaffolds <1,000 nt removed) using the CEGMA and BUSCO tools. To illustrate how gap-closing and removal of small fragments affect assembly completeness metrics, we also include the results of CEGMA gene searches on two draft versions of the final assembly: one without gap-closing and with contigs/scaffolds <300 nt removed, and one with gap-closing and no small contigs/scaffolds removed.
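The percentages in this table are simple proportions of the searched gene set (248 CEGs for CEGMA, 3,023 orthologs for BUSCO), and the missing count equals the total minus the genes found at least partially (for example, 3,023 − 2,815 = 208). A small Python check, using a hypothetical helper `ortholog_summary`, reproduces the two final-assembly columns:

```python
def ortholog_summary(total: int, complete: int, at_least_partial: int,
                     duplicated: int) -> dict[str, tuple[int, float]]:
    """Counts and percentages (two decimals) for a CEGMA/BUSCO-style search."""
    def pct(n: int) -> float:
        return round(100 * n / total, 2)

    # genes not found even partially
    missing = total - at_least_partial
    return {
        "complete": (complete, pct(complete)),
        "at_least_partial": (at_least_partial, pct(at_least_partial)),
        "duplicated": (duplicated, pct(duplicated)),
        "missing": (missing, pct(missing)),
    }


if __name__ == "__main__":
    # final assembly, CEGMA column
    print(ortholog_summary(248, 228, 235, 99))
    # final assembly, BUSCO column
    print(ortholog_summary(3023, 2605, 2815, 46))
```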
Comparative Statistics of Avian Genomes
| Species (Common Name) | Scaffold N50 (nt) | No. Scaffolds/Contigs | Contig N50 (nt) | Length (Gnt) | Ns (%) | Complete CEGs (% of 248) | Partial CEGs (% of 248) |
|---|---|---|---|---|---|---|---|
| Northern Spotted Owl | 3,983,020 | 8,108 | 171,882 | 1.26 | 1.10 | 228 (91.94%) | 235 (94.76%) |
| Barn Owl | 51,873 | 166,092 | 19,113 | 1.14 | 1.02 | 144 (58.06%) | 198 (79.84%) |
| Downy Woodpecker | 2,086,781 | 85,828 | 29,578 | 1.17 | 3.72 | 196 (79.03%) | 216 (87.10%) |
| Zebra Finch | 62,374,962 | 37,095 | 38,644 | 1.23 | 0.75 | 192 (77.42%) | 214 (86.29%) |
| Bald Eagle | 669,725 | 346,419 | 10,218 | 1.26 | 3.97 | 217 (87.50%) | 240 (96.77%) |
| Golden Eagle | 9,230,743 | 1,141 | 215,151 | 1.19 | 1.07 | 226 (91.13%) | 238 (95.97%) |
| Chimney Swift | 3,839,435 | 60,234 | 33,918 | 1.13 | 4.02 | 191 (77.02%) | 222 (89.52%) |
| Chicken | 82,310,166 | 23,474 | 2,905,620 | 1.23 | 0.96 | 226 (91.13%) | 237 (95.56%) |
Note.—Comparative statistics of our S. o. caurina assembly with those of a selection of other avian genome assemblies.
Repetitive Element Summary
| Type Level 1 | Type Level 2 | Type Level 3 | Type Level 4 | Number of Elements | Element Total Length (nt) | Assembly Portion (%) |
|---|---|---|---|---|---|---|
| Total interspersed repeats | | | | | 175,287,790 | 9.31 |
| Retroelement | Total retroelements | | | 727,006 | 168,672,903 | 8.96 |
| Retroelement | Total SINEs | | | 40,360 | 4,770,020 | 0.25 |
| Retroelement | SINE | ALU | | 53 | 6,194 | 0.00 |
| Retroelement | SINE | MIR | | 15,510 | 1,558,420 | 0.08 |
| Retroelement | Penelope | | | 169 | 35,110 | 0.00 |
| Retroelement | Total LINEs | | | 486,310 | 115,604,290 | 6.14 |
| Retroelement | LINE | LINE1 | | 622 | 58,117 | 0.00 |
| Retroelement | LINE | LINE2 | | 3,116 | 317,864 | 0.02 |
| Retroelement | LINE | L3/CR1 | | 28,122 | 5,153,289 | 0.27 |
| Retroelement | LINE | CRE/SLACS | | 0 | 0 | 0.00 |
| Retroelement | LINE | L2/CR1/Rex | | 452,030 | 109,807,316 | 5.83 |
| Retroelement | LINE | R1/LOA/Jockey | | 0 | 0 | 0.00 |
| Retroelement | LINE | R2/R4/NeSL | | 131 | 44,590 | 0.00 |
| Retroelement | LINE | RTE/Bov-B | | 15 | 3,492 | 0.00 |
| Retroelement | LINE | L1/CIN4 | | 98 | 23,441 | 0.00 |
| Retroelement | Total LTR elements | | | 200,336 | 48,298,593 | 2.57 |
| Retroelement | LTR | BEL/Pao | | 0 | 0 | 0.00 |
| Retroelement | LTR | ERV_classI | | 983 | 122,219 | 0.01 |
| Retroelement | LTR | ERV_classII | | 400 | 54,854 | 0.00 |
| Retroelement | LTR | ERVL | | 436 | 91,660 | 0.00 |
| Retroelement | LTR | ERVL-MaLRs | | 51 | 4,838 | 0.00 |
| Retroelement | LTR | Gypsy/DIRS1 | | 111 | 14,921 | 0.00 |
| Retroelement | LTR | Retroviral | | 197,967 | 47,947,799 | 2.55 |
| Retroelement | LTR | Ty1/Copia | | 0 | 0 | 0.00 |
| DNA element | Total DNA elements | | | 37,526 | 5,628,486 | 0.30 |
| DNA element | En-Spm | | | 0 | 0 | 0.00 |
| DNA element | hAT-Charlie | | | 418 | 28,220 | 0.00 |
| DNA element | hobo-Activator | | | 4,235 | 719,417 | 0.04 |
| DNA element | MuDR-IS905 | | | 0 | 0 | 0.00 |
| DNA element | PiggyBac | | | 0 | 0 | 0.00 |
| DNA element | Tc1-IS630-Pogo | | | 806 | 141,663 | 0.01 |
| DNA element | TcMar-Tigger | | | 528 | 39,074 | 0.00 |
| DNA element | Tourist/Harbinger | | | 9,255 | 958,360 | 0.05 |
| DNA element | Other (Mirage, P-element, Transib) | | | 0 | 0 | 0.00 |
| | | | | 0 | 0 | 0.00 |
| | | | | 6,225 | 986,401 | 0.05 |
| Total small RNA, satellite, simple, and low-complexity repeats | | | | 1,907,394 | 232,038,709 | 12.33 |
| Small RNA | | | | 12,051 | 1,645,166 | 0.09 |
| Satellites | | | | 1,261,021 | 185,995,538 | 9.88 |
| Simple repeats | | | | 564,508 | 40,568,395 | 2.16 |
| Low complexity repeats | | | | 69,814 | 3,829,610 | 0.20 |
Note.—Summary of the repeat elements found during two rounds of repeat masking (homology-based masking followed by de novo model-based masking). We report each type of repeat element at the category level appropriate to it; the “Type Level” columns organize these categories hierarchically.
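The summary rows of this table are sums of the rows beneath them: the SINE, LINE, and LTR totals add up to 727,006 retroelements covering 168,672,903 nt; the retroelement total, the DNA-element total, and the two remaining interspersed rows (0 and 986,401 nt) add up to 175,287,790 nt; and the small RNA, satellite, simple, and low-complexity rows add up to 1,907,394 elements and 232,038,709 nt. The “Assembly Portion (%)” values are reproduced by dividing by 1,882,081,621 nt, the total size of the gap-closed assembly before fragments <1,000 nt were removed (Final Assembly Metrics table); that denominator is our inference from the arithmetic and is not stated in the source. The sketch below checks these relationships; the constant and helper names are hypothetical.

```python
# ASSUMPTION: denominator inferred from the arithmetic, not stated in the
# source (gap-closed assembly size before removing fragments < 1,000 nt).
ASSEMBLY_NT = 1_882_081_621

# (number of elements, total length in nt), copied from the table
RETROELEMENTS = {
    "SINEs": (40_360, 4_770_020),
    "LINEs": (486_310, 115_604_290),
    "LTR elements": (200_336, 48_298_593),
}
OTHER_REPEATS = {
    "Small RNA": (12_051, 1_645_166),
    "Satellites": (1_261_021, 185_995_538),
    "Simple repeats": (564_508, 40_568_395),
    "Low complexity repeats": (69_814, 3_829_610),
}
# Lengths (nt): retroelement total, DNA-element total, and the two
# remaining interspersed rows listed after the DNA elements
INTERSPERSED_LENGTHS = [168_672_903, 5_628_486, 0, 986_401]


def column_sum(rows: dict[str, tuple[int, int]], index: int) -> int:
    """Sum one column (0 = element count, 1 = length in nt) over the rows."""
    return sum(values[index] for values in rows.values())


def portion(length_nt: int, denominator: int = ASSEMBLY_NT) -> float:
    """Percentage of the assembly, rounded to two decimals as in the table."""
    return round(100 * length_nt / denominator, 2)


if __name__ == "__main__":
    print(column_sum(RETROELEMENTS, 0), column_sum(RETROELEMENTS, 1))  # 727006 168672903
    interspersed = sum(INTERSPERSED_LENGTHS)
    print(interspersed, portion(interspersed))                         # 175287790 9.31
    other_nt = column_sum(OTHER_REPEATS, 1)
    print(column_sum(OTHER_REPEATS, 0), other_nt, portion(other_nt))   # 1907394 232038709 12.33
```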
Library Alignment Statistics
| Library | Mean Paired and Unpaired Read Genome Coverage Postfiltering (X) | SD of Paired and Unpaired Read Genome Coverage Postfiltering (X) | Fraction of Aligned Bases From Unpaired Reads | Total Fraction of Filtered Aligned Bases | Fraction Aligned Bases Filtered Due to Mapping Quality < 20 | Fraction Aligned Bases Filtered as Duplicates | Fraction Aligned Bases Filtered as Low Quality With Q < 20 | Fraction Aligned Bases Filtered as Second Observation From Overlapping Reads | Fraction Aligned Bases Filtered From Regions Already with > 1,000× coverage |
|---|---|---|---|---|---|---|---|---|---|
| Nextera350bp lane 1 | 4.369 | 5.484 | 0.048 | 0.533 | 0.060 | 0.444 | 0.004 | 0.023 | 1.52E-03 |
| Nextera350bp lane 2 | 11.162 | 8.960 | 0.039 | 0.559 | 0.056 | 0.480 | 0.005 | 0.017 | 1.43E-03 |
| Hydroshear | 1.093 | 2.784 | 0.004 | 0.549 | 0.033 | 0.429 | 0.005 | 0.081 | 2.03E-03 |
| Nextera550bp lane 1 | 2.741 | 3.708 | 0.393 | 0.096 | 0.034 | 0.038 | 0.011 | 0.011 | 1.05E-03 |
| Nextera550bp lane 2 | 5.790 | 5.435 | 0.327 | 0.126 | 0.032 | 0.066 | 0.019 | 0.008 | 1.26E-03 |
| Nextera700bp | 23.357 | 14.710 | 0.041 | 0.216 | 0.046 | 0.126 | 0.009 | 0.032 | 3.64E-03 |
| noPCR550bp | 3.244 | 2.661 | 0.241 | 0.059 | 0.013 | 0.003 | 0.014 | 0.029 | 4.32E-04 |
| PCR900bp | 1.978 | 1.894 | 0.073 | 0.052 | 0.012 | 0.024 | 0.014 | 0.001 | 3.34E-04 |
| MP4kb | 2.528 | 2.745 | 0.300 | 0.361 | 0.048 | 0.306 | 0.002 | 0.004 | 5.36E-04 |
| MP7kb | 2.528 | 2.734 | 0.256 | 0.449 | 0.045 | 0.397 | 0.002 | 0.004 | 4.53E-04 |
| MP11kb | 1.641 | 2.205 | 0.168 | 0.652 | 0.046 | 0.601 | 0.001 | 0.004 | 2.56E-04 |
| CMCB41533 | 15.552 | 12.253 | 0.030 | 0.341 | 0.299 | 0.037 | 2.37E-04 | 2.59E-03 | 2.50E-03 |
Note.—Alignment statistics for all Sequoia (Strix occidentalis caurina) libraries and the CMCB41533 (Strix varia) library calculated using Picard’s CollectWgsMetrics.
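As a consistency check on these values, the “Total Fraction of Filtered Aligned Bases” for each library equals, within rounding, the sum of the five filter columns (mapping quality, duplicates, base quality, overlapping reads, and coverage above 1,000×); the unpaired-read fraction is not part of that total. The Python sketch below encodes the tabled values and tests that relationship. Because each value is printed to three decimals or three significant figures, six rounded numbers can disagree by up to about 0.003, which we use as the tolerance; the names `ROWS`, `residual`, and `consistent_rows` are hypothetical.

```python
# (total filtered, MAPQ < 20, duplicates, base Q < 20, overlapping reads,
#  > 1,000x coverage), copied from the table
ROWS = {
    "Nextera350bp lane 1": (0.533, 0.060, 0.444, 0.004, 0.023, 1.52e-3),
    "Nextera350bp lane 2": (0.559, 0.056, 0.480, 0.005, 0.017, 1.43e-3),
    "Hydroshear": (0.549, 0.033, 0.429, 0.005, 0.081, 2.03e-3),
    "Nextera550bp lane 1": (0.096, 0.034, 0.038, 0.011, 0.011, 1.05e-3),
    "Nextera550bp lane 2": (0.126, 0.032, 0.066, 0.019, 0.008, 1.26e-3),
    "Nextera700bp": (0.216, 0.046, 0.126, 0.009, 0.032, 3.64e-3),
    "noPCR550bp": (0.059, 0.013, 0.003, 0.014, 0.029, 4.32e-4),
    "PCR900bp": (0.052, 0.012, 0.024, 0.014, 0.001, 3.34e-4),
    "MP4kb": (0.361, 0.048, 0.306, 0.002, 0.004, 5.36e-4),
    "MP7kb": (0.449, 0.045, 0.397, 0.002, 0.004, 4.53e-4),
    "MP11kb": (0.652, 0.046, 0.601, 0.001, 0.004, 2.56e-4),
    "CMCB41533": (0.341, 0.299, 0.037, 2.37e-4, 2.59e-3, 2.50e-3),
}
# Six rounded values, each off by at most ~0.0005
TOLERANCE = 0.003


def residual(row: tuple[float, ...]) -> float:
    """Reported total minus the sum of the five filter components."""
    total, *parts = row
    return total - sum(parts)


def consistent_rows(rows=ROWS, tol: float = TOLERANCE) -> dict[str, bool]:
    """Map each library to whether its total matches its components."""
    return {name: abs(residual(row)) <= tol for name, row in rows.items()}


if __name__ == "__main__":
    for name, row in ROWS.items():
        print(f"{name:22s} residual = {residual(row):+.4f}")
    print("all consistent:", all(consistent_rows().values()))
```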
Genomic Locations of Selected Microsatellite Loci
| Locus | Primer | References | Usage Comments | Length Primer | Length Alignment | Mismatches | Genome Scaffold | Genome Start | Genome End | Microsatellite Length (nt) |
|---|---|---|---|---|---|---|---|---|---|---|
| 13D8 | F | ( | population genetics ( | 22 | 22 | 0 | scaffold88 | 4,241,040 | 4,241,019 | 187 |
| 13D8 | R | | | 21 | 21 | 0 | scaffold88 | 4,240,854 | 4,240,874 | |
| 15A6 | F | ( | population genetics ( | 21 | 21 | 0 | scaffold233 | 2,208,703 | 2,208,723 | 148 |
| 15A6 | R | | | 19 | 16 | 0 | scaffold233 | 2,208,847 | 2,208,832 | |
| 1C6 | F | ( | None | 20 | 20 | 0 | scaffold178 | 2,550,734 | 2,550,753 | 110 |
| 1C6 | R | | | 20 | 20 | 0 | scaffold178 | 2,550,843 | 2,550,824 | |
| 4E10 | F | ( | None | 22 | 22 | 0 | scaffold11 | 768,391 | 768,371 | 230 |
| 4E10 | R | | | 22 | 22 | 0 | scaffold11 | 768,162 | 768,183 | |
| 4E10.2 | F | ( | population genetics ( | 18 | 18 | 0 | scaffold11 | 780,562 | 780,579 | 226 |
| 4E10.2 | R | | | 18 | 18 | 0 | scaffold11 | 780,787 | 780,770 | |
| 6H8 | F | ( | population genetics ( | 21 | 21 | 0 | scaffold103 | 3,773,885 | 3,773,865 | 93 |
| 6H8 | R | | | 16 | 16 | 0 | scaffold103 | 3,773,793 | 3,773,808 | |
| 8G11 | F | ( | None | 18 | — | — | — | — | — | — |
| 8G11 | R | | | 17 | — | — | — | — | — | |
| Bb126 | F | ( | hybrid diagnostic ( | 20 | 20 | 0 | scaffold219 | 2,548,147 | 2,548,166 | 185 |
| Bb126 | R | | | 24 | 24 | 0 | scaffold219 | 2,548,331 | 2,548,308 | |
| BOOW18 | F | ( | hybrid diagnostic ( | 19 | 19 | 1 | scaffold244 | 648,444 | 648,426 | 205 |
| BOOW18 | R | | | 20 | 20 | 1 | scaffold244 | 648,240 | 648,259 | |
| FEPO5 | F | ( | population genetics ( | 22 | 22 | 0 | scaffold138 | 720,315 | 720,336 | 270 |
| FEPO5 | R | | | 25 | 25 | 2 | scaffold138 | 720,584 | 720,560 | |
| Oe045 | F | ( | hybrid diagnostic ( | 23 | 23 | 2 | scaffold173 | 3,777,655 | 3,777,677 | 127 |
| Oe045 | R | | | 19 | 19 | 0 | scaffold173 | 3,777,781 | 3,777,763 | |
| Oe053 | F | ( | population genetics ( | 23 | 23 | 1 | scaffold136 | 299,240 | 299,262 | 218 |
| Oe053 | R | | | 22 | 22 | 1 | scaffold136 | 299,457 | 299,436 | |
| Oe128 | F | ( | hybrid diagnostic ( | 27 | 27 | 0 | scaffold722 | 802,232 | 802,206 | 319 |
| Oe128 | R | | | 24 | 24 | 0 | scaffold722 | 801,914 | 801,937 | |
| Oe129 | F | ( | population genetics ( | 24 | 21 | 2 | scaffold529 | 3,066,759 | 3,066,739 | 266 |
| Oe129 | R | | | 24 | 24 | 1 | scaffold529 | 3,066,497 | 3,066,520 | |
| Oe149 | F | ( | population genetics ( | 21 | 21 | 1 | scaffold11 | 51,010 | 50,990 | 258 |
| Oe149 | R | | | 20 | 20 | 0 | scaffold11 | 50,753 | 50,772 | |
| Oe3-7 | F | ( | population genetics ( | 20 | 19 | 1 | scaffold35 | 572,329 | 572,347 | 129 |
| Oe3-7 | R | | | 23 | 23 | 0 | scaffold35 | 572,456 | 572,434 | |
Note.—Locations of commonly used microsatellite loci in our draft genome assembly. We searched for all of the primer pairs used in several S. occidentalis population genetics studies as well as all of those designed for use in S. o. lucida (Thode et al. 2002). The “Primer” column designates the forward (“F”) or reverse (“R”) primer. The “References” column gives the citation of the publication that originally described each primer pair. The “Usage Comments” column gives the citation(s) of the publication(s) in which a primer pair has been used for population-level study of S. occidentalis and/or study of S. occidentalis × S. varia hybrids. “Length Alignment” refers to the length of the BLASTN (Altschul et al. 1997; Camacho et al. 2009) alignment. “Microsatellite Length” refers to the inferred length of the microsatellite PCR product, based on the lengths of the primers and their mapping positions in the genome assembly.
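The inferred product lengths can be reproduced from the primer rows: take the outermost mapped coordinates of the two primers, then add back any primer bases that BLASTN did not align (for example, 3 nt for the 15A6 reverse primer, which is 19 nt long but aligned over 16). The Python sketch below assumes that unaligned bases sit at the primers' outer ends and therefore lengthen the product; under that assumption it matches every value in the table, but it is our reconstruction, not a documented procedure from the source. `PrimerHit` and `product_length` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerHit:
    """One BLASTN hit of a primer against the assembly (values from the table)."""
    primer_len: int  # "Length Primer"
    aln_len: int     # "Length Alignment"
    start: int       # "Genome Start"
    end: int         # "Genome End"


def product_length(fwd: PrimerHit, rev: PrimerHit) -> int:
    """Inferred PCR product length for a primer pair on the same scaffold.

    ASSUMPTION: unaligned primer bases lie at the primers' outer ends, so
    they add to the span between the outermost aligned coordinates.
    """
    coords = (fwd.start, fwd.end, rev.start, rev.end)
    span = max(coords) - min(coords) + 1
    overhang = (fwd.primer_len - fwd.aln_len) + (rev.primer_len - rev.aln_len)
    return span + overhang


if __name__ == "__main__":
    # 15A6: the 19-nt reverse primer aligned over only 16 nt
    f = PrimerHit(21, 21, 2_208_703, 2_208_723)
    r = PrimerHit(19, 16, 2_208_847, 2_208_832)
    print(product_length(f, r))  # 148
```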
Fig. 1—Demographic history of Strix occidentalis caurina and Strix varia, with bootstrap replicates. (Panel A) Demographic history estimated for S. o. caurina. (Panel B) Demographic history estimated for S. varia.