Hindrik HD Kerstens, Richard PMA Crooijmans, Bert W Dibbits, Addie Vereijken, Ron Okimoto, Martien AM Groenen.
Abstract
BACKGROUND: Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken.
Year: 2011 PMID: 21291514 PMCID: PMC3039614 DOI: 10.1186/1471-2164-12-94
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. Sequencing and mapping results for the four chicken breeds analyzed for structural variation
| Breed | Raw reads | Paired l32q20¹ | Concordant² | Neither end³ | One end⁴ | Diff chr⁵ | Too short⁶ | Too long⁷ | Relative orientation⁸ |
|---|---|---|---|---|---|---|---|---|---|
| | 31.61 | 23.59 | 76.14 | 23.22 | 0.52 | 0.02 | 470 | 22547 | 549 |
| | 29.70 | 21.84 | 73.30 | 25.81 | 0.64 | 0.14 | 1019 | 22058 | 1872 |
| | 34.82 | 24.83 | 78.26 | 21.14 | 0.48 | 0.01 | 2108 | 21209 | 335 |
| | 32.28 | 20.64 | 76.60 | 22.64 | 0.54 | 0.07 | 7388 | 22058 | 1030 |
Paired-end sequencing of RRLs resulted in the indicated number of raw reads per breed. Sequencing read counts are in millions. Mapping percentages are relative to Paired l32q20.
¹ Paired l32q20 = paired reads had the RRL restriction tag trimmed to 32 bp and were filtered for a minimum per-base quality of 20;
² Concordant = both reads of a read pair mapped in the expected orientation relative to each other and at the expected distance according to the RRL size range;
³ Neither end = neither read of a read pair mapped to the reference;
⁴ One end = only one read of a read pair was mapped;
⁵ Diff chr = both reads of a read pair mapped, but to different chromosomes;
⁶ Too short = both reads of a read pair mapped in the expected orientation relative to each other but at a closer distance than expected based on the RRL size range;
⁷ Too long = both reads of a read pair mapped at a larger distance from each other than expected;
⁸ Relative orientation = the reads of a read pair mapped in an orientation relative to each other other than that expected based on the reference chicken genome.
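The mapping categories defined in these footnotes can be read as a decision procedure. The sketch below is one possible illustration, not the authors' pipeline: the `Mate` record, the function name, the forward/reverse test for "expected orientation", and the order in which the checks are applied are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mate:
    """Hypothetical record for one mapped read of a pair."""
    chrom: str
    pos: int       # leftmost mapped coordinate on the reference
    forward: bool  # True if the read mapped to the forward strand


def classify_pair(a: Optional[Mate], b: Optional[Mate],
                  min_span: int, max_span: int, read_len: int = 32) -> str:
    """Assign a read pair to one of the categories listed in the footnotes."""
    if a is None and b is None:
        return "neither_end"
    if a is None or b is None:
        return "one_end"
    if a.chrom != b.chrom:
        return "diff_chr"
    left, right = sorted((a, b), key=lambda m: m.pos)
    # Assumption: the expected orientation is forward read upstream,
    # reverse read downstream (reads pointing towards each other).
    if not (left.forward and not right.forward):
        return "relative_orientation"
    span = right.pos + read_len - left.pos  # distance spanned on the reference
    if span < min_span:
        return "too_short"
    if span > max_span:
        return "too_long"
    return "concordant"
```

With an RRL size range of 150-200 bp, a pair such as `Mate("1", 1000, True)` and `Mate("1", 1150, False)` spans 182 bp and is classified as concordant.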
Table 2. RRL construction simulated by an in silico AluI digest of the WASHUC2 build of the reference chicken genome
| Line | Size range (bp) | Number of fragments | Genome fraction | Sequenced (32 bp reads) | RRL coverage calculated |
|---|---|---|---|---|---|
| | 150-200 | 583826 | 101 Mb (8%) | 18.7 Mb (1.5%) | 37-40X |
| | 125-200 | 947538 | 151 Mb (12%) | 30.3 Mb (2.4%) | 22-26X |
Fragments were collected in corresponding size ranges as used in the in vitro RRL preparation. The total number of collected fragments and number of bases captured are indicators of what genome fraction was sampled. Based on trimmed reads, the fraction of the genome actually sequenced was calculated. The number of raw read pairs obtained (see Table 1) divided by the number of fragments is an indicator of the RRL coverage.
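The arithmetic described in this note can be sketched as follows. This is a minimal illustration rather than the authors' script: the function names are hypothetical, counting one 32 bp read per fragment is an assumption that happens to reproduce the "Sequenced" column, and using the filtered Paired l32q20 counts (23.59 and 21.84 million) as the numerator for the 150-200 bp library is likewise an assumption, chosen because it reproduces the reported 37-40X range.

```python
READ_LENGTH_BP = 32  # trimmed read length given in the table header


def sequenced_bases(n_fragments: int, read_length: int = READ_LENGTH_BP) -> int:
    """Bases sequenced if each fragment contributes one trimmed read (assumption)."""
    return n_fragments * read_length


def rrl_coverage(n_read_pairs: float, n_fragments: int) -> float:
    """Average number of read pairs per in silico RRL fragment."""
    return n_read_pairs / n_fragments


# 150-200 bp library: 583826 in silico fragments
print(round(sequenced_bases(583826) / 1e6, 1))  # 18.7 (Mb)
print(round(rrl_coverage(23.59e6, 583826)))     # 40
print(round(rrl_coverage(21.84e6, 583826)))     # 37
```

The same calculation with 947538 fragments and 24.83 or 20.64 million pairs gives roughly 26 and 22, in line with the 22-26X reported for the 125-200 bp library.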
Figure 1. Sequence coverage of the RRL. The x-axis shows the sequence coverage of RRL fragments estimated from read-pair clusters; the y-axis shows the frequency with which each coverage occurred (log10 scale).
Figure 2. Distribution of fragment sizes for concordantly mapping reads in the four sequenced chicken breeds. For unclear reasons, broiler 2 had a markedly higher representation of smaller fragments (long left shoulder), whereas fragments in the 180-200 bp size range were two orders of magnitude less abundant than in the three other breeds.
Table 3. Comparison of the mapping quality and distribution between concordantly and discordantly mapping read pairs
| Chromosome | Number of mapping read pairs | Average mapping quality | Mapping density | RRL density |
|---|---|---|---|---|
| | 5329141 | 67.92 | 38 | |
| | 3968343 | 68.14 | 39 | |
| | 3344481 | 68.87 | 34 | |
| | 2758645 | 68.53 | 34 | |
| | 1975228 | 68.53 | 32 | |
| | 1258393 | 68.31 | 30 | |
| | 1336228 | 68.78 | 29 | |
| | 1119526 | 68.63 | 27 | |
| | 1016524 | 68.16 | 25 | |
| | 761372 | 68.20 | 30 | |
| | 677920 | 68.56 | 32 | |
| | 864303 | 68.33 | 24 | |
| | 780565 | 68.47 | 24 | |
| | 740461 | 67.86 | 21 | |
| | 669260 | 68.56 | 19 | |
| | 722054 | 68.78 | 19 | |
| | 1845751 | 68.05 | 40 | |
The numbers of concordant and discordant (in italics) mapping read pairs per chromosome are given. The average mapping quality of concordantly and discordantly mapping read pairs was calculated per chromosome. The distribution of mapping read pairs over the genome was evaluated by calculating the mapping density, that is, the chromosome length divided by the number of concordantly or discordantly mapping read pairs. RRL density was calculated to ascertain the contribution of the RRL approach to differences in mapping density; it is the chromosome length divided by the (in silico) estimated number of RRL fragments.
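Both densities follow the same formula, which can be sketched as below. The function name is hypothetical and the 200 Mb chromosome length is a round illustrative value, not a figure taken from the source.

```python
def density(chrom_length_bp: int, n_items: int) -> float:
    """Average number of base pairs per item (read pair or RRL fragment)."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return chrom_length_bp / n_items


# Illustrative: a chromosome of about 200 Mb with 5329141 mapping read pairs
# has, on average, one read pair per ~38 bp.
print(round(density(200_000_000, 5329141)))  # 38
```

Passing the number of in silico RRL fragments on the same chromosome instead of the read-pair count gives the RRL density.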
Table 4. Validation of structural polymorphisms
| Prediction | Confirmation | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| 251 | 1 | X | 97 | 2 | NA | |||||
| 402 | 3 | 97 | 1,2 | 10_1627991-1628223 | 232 | 170 | 1,2 | |||
| 414 | 2 | 93 | W | NA | ||||||
| 640 | 1 | X | 99 | 1 | NA | |||||
| 661 | 121 | X | X | 77 | W,B,1,2 | NA | ||||
| 729 | 4 | X | 94 | W,2 | 3_110574268-110574832 | 564 | 165 | W,2 | ||
| 780 | 6 | X | 96 | W,1,2 | NA | |||||
| 884 | 1 | X | X | 99 | 1 | NA | ||||
| 970 | 2 | X | 99 | 1 | NA | |||||
| 1248 | 3 | 73 | 2 | 1_188914114-188915200 | 1086 | 162 | B,1,2 | |||
| 1319 | 1 | 97 | 2 | 2_55356006-55357163 | 1157 | 162 | 1,2 | |||
| 1376 | 2 | 70 | 2 | 4_23256240-23257477 | 1237 | 139 | W,B,1,2 | |||
| 5845 | 1 | X | 90 | W | 2_112569238-112574924 | 5686 | 159 | W | ||
| 19574 | 15 | X | 96 | W,1 | - | - | - | - | ||
| 8128 | 489 | X | 93 | 2 | 1_61836457-61844398 | 7941 | 187 | W,B,1,2 | ||
| 64 | 48 | 71 | B,1,2 | 2_152470660* | 1,2 | |||||
| 86 | 39 | 69 | 2 | 3_19576932 | 115 | 201 | W,B,1,2 | |||
| 229 | 141 | 79 | B,1,2 | 4_43663736-43663781 | 45 | 184 | W,B,1,2 | |||
| 274 | 10 | 76 | B,1,2 | 6_6687386-6687469 | 83 | 191 | B,1,2 | |||
| 283 | 140 | X | 74 | B,1,2 | 2_46860428-46860509 | 81 | 202 | B,1,2 | ||
| 360 | 4 | 76 | 1 | 3_67474749-67474961 | 212 | 148 | 1 | |||
| 367 | 21 | X | 72 | B | 1_189692870-189693048 | 178 | 189 | B | ||
| 544 | 4 | 69 | 1,2 | 7_28561048-28561407 | 359 | 185 | 12 | |||
| 662 | 2 | 60 | 1 | 1_44948882-44949390 | 508 | 154 | W,B,1,2 | |||
| 868 | 2 | X | 97 | 2 | 1_99177206-99177957 | 751 | 117 | B,1,2 | ||
Structural variants (SVs) 13-18 were chosen before application of the empirical rule (span-size deviation) × n > 500, whereas SVs 50-59 were chosen after. Span size is the distance (in base pairs) on the reference sequence spanned by discordantly mapping read pairs. The number of observed discordantly mapping read pairs that support the presence of the structural variant is given by n. CMP is flagged when concordantly mapping read pairs were also observed in that genomic region. Discordantly mapping read pairs spanning an assembly problem in the reference genome are flagged in the RE column. The alternative mapping quality of a predicted SV is the average mapping quality calculated over the discordantly mapping read pairs within a cluster. Deletion breakpoints are given in the notation chr_start-stop, whereas insertion breakpoints are given in the notation chr_position. Breakpoints marked not acquired (NA) correspond to false positive SV predictions, whereas breakpoints for SV27 were not acquired for technical reasons and those for SV50 could not be acquired accurately because of low sequence complexity. W = white egg layer; B = brown egg layer; 1 = broiler 1; 2 = broiler 2.
*Due to the low sequence complexity, the exact location of the insertion could not be determined.
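The two breakpoint notations used in the table (chr_start-stop for deletions, chr_position for insertions) can be parsed mechanically. The sketch below is illustrative only; the `Breakpoint` type, the function names, and the regular expression are assumptions for the example, and the deletion size is taken as stop minus start, which matches the sizes listed alongside the confirmed breakpoints.

```python
import re
from typing import NamedTuple, Optional


class Breakpoint(NamedTuple):
    chrom: str
    start: int
    stop: Optional[int]  # None for insertions (chr_position notation)


# chr_start-stop for deletions, chr_position for insertions; a trailing "*"
# (footnote marker) is tolerated.
_PATTERN = re.compile(r"^(?P<chrom>[^_]+)_(?P<start>\d+)(?:-(?P<stop>\d+))?\*?$")


def parse_breakpoint(text: str) -> Breakpoint:
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised breakpoint notation: {text!r}")
    stop = match.group("stop")
    return Breakpoint(match.group("chrom"), int(match.group("start")),
                      int(stop) if stop else None)


def deletion_size(bp: Breakpoint) -> int:
    """Length of the deleted reference segment in base pairs."""
    if bp.stop is None:
        raise ValueError("insertion breakpoints have no deletion size")
    return bp.stop - bp.start


print(deletion_size(parse_breakpoint("10_1627991-1628223")))  # 232
```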
Figure 3. PCR-based genotyping at the breed level (A) and the individual level (B). A) Genotyping for the presence of SVs in breeds, represented by pooled samples. Except for SV50 and SV51, a small PCR fragment (see Table 4 for approximate sizes and breed encoding) that was absent in the reference was expected in some of the breeds that have the deletion. In SV50 and SV51, a slightly larger PCR fragment than that observed in the reference was expected in breeds that have the insertion. B) Genotyping for the presence of SVs in eight individuals of breeds in which the SV was detected in pooled samples. Except for SV50 and SV51, a small PCR fragment was expected in individuals homozygous for the deletion and for SVs in which the reference genotype is too long for PCR. Heterozygous individuals in which both genotypes can be spanned by PCR (see Table 4) show two bands. In SV50 and SV51, both PCR fragments, which differ slightly in size, are expected in heterozygous individuals, whereas only the larger fragment is expected in individuals homozygous for the insertion.
Figure 4. Distinguishing putative deletions from false positives in genotyping validation results obtained by PCR. Predicted deletions in the initial validation study that were confirmed are in green; those that could not be confirmed are in red. The black line represents the discrimination rule (span-size difference) × n > 500, which is valid for 220-720 bp. The SV predictions that were selected based on the model and confirmed are in blue.
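The discrimination rule can be written as a one-line predicate. The sketch below is a hedged illustration: the source does not define "span-size difference" at this point, so measuring it from a 220 bp baseline is an assumption inferred from the lower bound of the stated 220-720 bp validity range, and the function and parameter names are hypothetical.

```python
def passes_discrimination_rule(span_size: int, n: int,
                               baseline: int = 220, threshold: int = 500) -> bool:
    """Return True if (span_size - baseline) * n exceeds the threshold.

    baseline=220 is an assumption, not a value stated in the source.
    """
    return (span_size - baseline) * n > threshold


# Under this assumption a 402 bp span supported by 3 read pairs passes
# ((402 - 220) * 3 = 546), whereas a 251 bp span supported by a single
# read pair does not ((251 - 220) * 1 = 31).
print(passes_discrimination_rule(402, 3))  # True
print(passes_discrimination_rule(251, 1))  # False
```

With the same baseline, a single supporting read pair is sufficient only when the span exceeds 720 bp, which would be consistent with the upper bound of the range given in the caption.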
Figure 5. Venn diagrams representing the distribution of predicted deletions in the four chicken breeds at mapping constraints 60 (left) and 35 (right). The number of structural variants is proportionally represented per breed, and line colors were assigned as follows: green = brown egg layer; blue = broiler 1; red = broiler 2; and purple = white egg layer. For example, the area that is surrounded by the blue line in the left diagram represents SVs found in broiler 1. Of these, 23 were specific for broiler 1 (yellow area), and 28 were shared with broiler 2 (dark yellow area surrounded by both the blue and red lines). The orange area surrounded by the blue, red, and green lines represents 18 SVs shared by broiler 1, broiler 2, and brown egg layers. The red area in the middle of the diagram surrounded by the four line colors represents 20 SVs shared by the four breeds analyzed.
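The region counts in such a Venn diagram amount to tallying, for every SV, the exact set of breeds in which it was found. The sketch below illustrates that tally; the function name is hypothetical and the input strings are made-up examples that reuse the breed codes W, B, 1, and 2, not data from the study.

```python
from collections import Counter
from typing import Iterable


def sharing_pattern_counts(breeds_per_sv: Iterable[str]) -> Counter:
    """Count SVs per exact combination of breeds (one Venn region each)."""
    return Counter(
        frozenset(code.strip() for code in codes.split(","))
        for codes in breeds_per_sv
    )


# Illustrative input: one comma-separated breed-code string per SV.
example = ["1", "1,2", "2,1", "B,1,2", "W,B,1,2"]
counts = sharing_pattern_counts(example)
print(counts[frozenset({"1", "2"})])  # 2
```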
Figure 6. Size distribution of predicted deletions at two mapping constraints.
Figure 7. Distribution of predicted SVs over the chicken chromosomes. Shown are chicken chromosomes in which 186 deletions (red) and 2 insertions (blue) were identified.
Table 5. Analyses of putative deletions for their effects on gene annotations
| Breakpoints* | Transcript(s) | Modification | Protein |
|---|---|---|---|
| 8_4940538-4940787 | ENSGALT00000005255 | Truncation last exon | Flavin_mOase |
| 14_14073018-14073274 | ENSGALT00000003325 | Truncation exon 9 or 5' deletion exon 10 | PDZ domain |
| 3_78504957-78505263 | ENSGALT00000025445 | 5' deletion in last exon | Ionic channel |
| 9_6501514-6501912 | ENSGALT00000008864/40988 | 5' deletion in exon 4 | Transcription factor |
| 1_70753183-70753846 | ENSGALT00000022933 | Truncation exon 10 | EGF-like |
| 1_13962380-13963075 | ENSGALT00000013428 | Truncation exon 2 | Unknown |
| 11_748787-749698 | ENSGALT00000002076/23151 | Truncation last exon | ADP-ribosylation factor-like |
Putative deletions with breakpoints predicted in exons were further analyzed in Ensembl [26]. Involved transcripts and protein functions were identified and putative modifications recorded.
*Breakpoints are estimated from the mapping results and might differ by a few tens of bases from the exact genomic locations.
Table 6. Putative functional annotations of predicted SVs
| Coding | Repeats | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| 280 | 43.9 | 0.36 | 5 | 19.6 | 5.3 | 5.0 | 25.0 | 36.1 | 42.9 | |
| 186 | 43.0 | 0.54 | 3.8 | 18.8 | 4.3 | 3.2 | 26.9 | 36.6 | 41.9 | |
SVs of data subsets aamq 35 and aamq 60 were annotated based on their mapping location on the chicken genome. SVs were analyzed to determine whether they mapped within genes, within exons, or partially overlapped exons.
¹ CR1 = chicken repeat 1 [36];
² GGLTR = Gallus gallus long terminal repeat;
³ other = other specific repeat classes;
⁴ SVs that mapped in repetitive sequences were analyzed for signatures of common repeats in the chicken genome and scanned for tandem repeats identified by Tandem Repeat Finder [37];
⁵ SVs that mapped in repetitive sequences were analyzed for signatures of simple repeats identified by the DUST algorithm [38];
⁶ The fraction of SVs that mapped in intronic and intergenic regions not identified as repetitive or low complexity is given in column "%!".
Table 7. Annotation of confirmed deletions and DNA signatures at breakpoints
| Breakpoints | Gene | Exons | Repeats | Signatures |
|---|---|---|---|---|
| 4_43663736-43663781 | ENSGALG00000010719 | ENSGALE00000116074 | | MH |
| 2_46860428-46860509 | ENSGALG00000012116 | | | |
| 6_6687386-6687469 | | | | |
| 1_189692870-189693048 | | | | |
| 3_67474749-67474961 | | | | |
| 10_1627991-1628223 | ENSGALG00000001729 | | trf1 | MH |
| 7_28561048-28561407 | ENSGALG00000011699 | | dust | |
| 1_44948882-44949390 | | | dust | |
| 3_110574268-110574832 | ENSGALG00000016679 | | CR1-F0, Z-REP, trf, dust | |
| 1_99177206-99177957 | | | | |
| 1_188914114-188915200 | | | dust, trf | |
| 2_55356006-55357163 | ENSGALG00000012402 | | dust, trf | |
| 4_23256240-23257477 | ENSGALG00000020249 | | dust, trf | |
| 2_112569238-112574924 | | | CR1-Y4, dust, trf | |
| 1_61836457-61844398 | ENSGALG00000012956 | | CR1-D2, Mariner1, GG, dust | MH |
Deletions were annotated based on their mapping position on the chicken genome, and deleted sequences were analyzed for common and more chicken-specific repeats. trf = repeats identified by Tandem Repeat Finder [37]; dust = simple repeats identified by the DUST algorithm [38]; CR1 = chicken repeat 1 [36]; Z-REP = macrosatellite family on chicken chromosome Z [39]; GG = repeats on the chicken genome identified by RECON [40]. We also analyzed the DNA sequence at SV breakpoints for signatures indicating the mechanism by which the SVs were formed, and we identified microhomology (MH) in some cases.
Figure 8. Microhomologies detected in sequenced SVs. Shown are the three SVs in which microhomology (grey boxes) was detected at the SV breakpoints.