Kevin C Lambirth, Adam M Whaley, Jessica A Schlueter, Kenneth L Bost, Kenneth J Piller.
Abstract
Transgenic crops have become a staple in modern agriculture, and are typically characterized using a variety of molecular techniques involving proteomics and metabolomics. Characterization of the transgene insertion site is of great interest, as disruptions, deletions, and genomic location can affect product selection and fitness, and regulatory agencies require identification of these regions and confirmation of their integrity. Here, we present CONTRAILS (Characterization of Transgene Insertion Locations with Sequencing), a straightforward, rapid and reproducible method for identifying transgene insertion sites in highly complex and repetitive genomes using low-coverage paired-end Illumina sequencing and traditional PCR. The pipeline requires little to no troubleshooting and is not restricted to any genome type, so it can be used for many molecular applications. Using whole-genome sequencing of in-house transgenic Glycine max, a legume with a highly repetitive and complex genome, we used CONTRAILS to identify the location of a single T-DNA insertion at single-base resolution.
Keywords: Agrobacterium; Insertion; Junction sequences; Next generation sequencing; Transfer DNA; Transformation
Abbreviations: FISH, Fluorescent In-situ Hybridization; IGB, Integrated Genome Browser; NGS, Next-Generation Sequencing; T-DNA, Transfer DNA; hTG, human thyroglobulin
Year: 2015 PMID: 26697366 PMCID: PMC4664744 DOI: 10.1016/j.gdata.2015.09.001
Source DB: PubMed Journal: Genom Data ISSN: 2213-5960
Fig. 1. Experimental pipeline. Flowchart detailing each major step in the pipeline, from DNA extraction and sequencing to alignment to the reference genome and T-DNA sequence.
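The alignment step in this pipeline can be illustrated with a minimal sketch. It assumes the paired-end alignments are available as SAM text records, which is an assumption: the text here does not name the aligner or the file format. The function name `cross_reference_pairs`, the reference name `scaffold_ST77`, and the demo records are hypothetical. The sketch keeps reads that map to the host genome while their mate maps to the T-DNA sequence, which is the kind of discordant pair the method relies on.

```python
def cross_reference_pairs(sam_lines, transgene_ref):
    """Return (read_name, genome_ref, genome_pos) for every read that maps
    outside `transgene_ref` while its mate maps to `transgene_ref`."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM has 11 mandatory columns
            continue
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        rnext = fields[6]
        # FLAG bits: 0x1 = paired, 0x4 = read unmapped, 0x8 = mate unmapped
        if not (flag & 0x1) or (flag & 0x4) or (flag & 0x8):
            continue
        # In SAM, "=" in RNEXT means the mate is on the same reference as the read
        mate_ref = rname if rnext == "=" else rnext
        if rname != transgene_ref and mate_ref == transgene_ref:
            hits.append((qname, rname, pos))
    return hits


# Hypothetical records for illustration only (SEQ and QUAL left as "*")
DEMO_SAM = [
    "@HD\tVN:1.6",
    "pairA\t65\tChr03\t44332446\t60\t100M\tscaffold_ST77\t187\t0\t*\t*",
    "pairA\t129\tscaffold_ST77\t187\t60\t100M\tChr03\t44332446\t0\t*\t*",
    "pairB\t99\tChr03\t1000\t60\t100M\t=\t1200\t300\t*\t*",
]

for hit in cross_reference_pairs(DEMO_SAM, "scaffold_ST77"):
    print(hit)
```

Only the genome-side mate of each cross-reference pair is reported, because its position is what bounds the insertion locus; concordant pairs such as `pairB` are ignored.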
Fig. 2. Plasmid map of hTG construct. The hTG plasmid map shows all regions included in the transformation plasmid utilized in the Agrobacterium transformation of the original ST77 event. The T-DNA construct contains the soybean β-conglycinin promoter (7S), tobacco etch virus translational enhancer element (TEV), human thyroglobulin gene (hTG), cauliflower mosaic virus terminator element (T35S) followed by the selectable marker cassette comprised of the nopaline synthase promoter (NOS promoter), phosphinothricin acetyltransferase gene (BAR ORF), and nopaline synthase terminator element (NOS Term). The aadA region of the vector confers antibiotic resistance to spectinomycin and streptomycin for selection of Agrobacterium.
Discordant read pairs and sequences. All discordant read pairs for ST77 and the position of the start of the read are shown, as well as their mated sequence and pair relationship.
| Read origin | Start base | Mate pair relationship | Read sequence |
|---|---|---|---|
| Chr03 | 44,332,446 | Mate1; other read matches reverse reference; this read is one of a pair | ATTAGGATGACCCGACATGTCTCTTAGAATGAGTAACATAAAAC1TAGAATTATGGAAATTAGAATATTTCAAGAGCCTTTCACTTCAACTGATTATAAG |
| scaffold ST77 | 187 | Mate2; this read matches reverse reference; this read is one of a pair | AGTCACGACGTTGTAAAACGACGGCCAGTGCCAAGCTTGCATGCCTGCAGGATCCATGCCCTTCATTTGCCGCrTATTAATTAATTTGGTAACAGTCCGT |
| scaffold ST77 | 11,433 | Mate1; other read matches reverse reference; this read is one of a pair | CGGCGTTAATTCAGTACATTAAAAACGTCCGCAATGTGTTATTAAGTTGTCTAAGCGTCAATTTGTTTACACCACAATATATCCTGTTCAACATTCAACA |
| Chr03 | 44,332,928 | Mate2; this read matches reverse reference; this read is one of a pair | TAATAATAAAACAAGTAGTCCTTGGCTAGTTGGCTTACTTTTCATGTTTTAAGGAAACAAGTTGAGGAAGGGAAAAAATGTTGATACTGCTGCrCGTACG |
| Chr03 | 44,332,927 | Mate1; this read matches reverse reference; this read is one of a pair | ATAATAATAAAACAAGTAGTCCTTGGCTAGTTGGCTTACTTTTCATGTTTTAAGGAAACAAGTTGAGGAAGGGAAAAAATGTTGATACTACTACTCGTAC |
| scaffold ST77 | 11,269 | Mate2; other read matches reverse reference; this read is one of a pair | AAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCmCCAGTCGGGAAACCTGTCGTGC |
| scaffold ST77 | 226 | Mate1; this read matches reverse reference; this read is one of a pair | CATGCCTGCAGGATCCATGCCCTTCATTTGCCGCTTATTAATTAATTTGCTAACAGTCCGTACTAATCAGTTACTTATCCrTCCTGCATCATAATTAATC |
| Chr03 | 44,332,559 | Mate2; other read matches reverse reference; this read is one of a pair | ATTTAGTTAATACAACGTGGATGAAGAAAGGAAAGACATTAGAGAAAGAGTAAGCAAATAACGCACTCGATTTGTTATCTAATTAGTATGCTGTTGTACC |
Fig. 3. Insert location range and PCR verification. (A) The established maximum range of the location of the T-DNA insert based on discordant paired-end read mates. The discordant paired read reported farthest upstream began at base 44,332,659. The discordant paired read reported farthest downstream began at base 44,332,827. (B) Primer sequences and attributes used in the amplification of right and left border T-DNA junction sequences. The resulting products and their sizes are shown for the transgenic sample analyzed in duplicate, including a wild-type control using primers F1 and R2 to amplify the genomic insert locus in the absence of the hTG T-DNA.
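The bounds quoted in panel (A) can be related to the Chr03 start bases in the discordant read table with a short calculation. The sketch below is one reading that happens to reproduce those bounds, not a statement of how the authors computed them. It assumes that each read is 100 bases long, that a forward-strand read extends rightward from its listed start base, and that a reverse-strand read extends leftward from its listed start base. The names `CHR03_MATES` and `insertion_window` are hypothetical.

```python
READ_LENGTH = 100  # assumption: length of each read listed in the table

# (start_base, maps_to_reverse_strand) for the four Chr03 mates in the table.
# Strand is taken from the "this read matches reverse reference" annotation.
CHR03_MATES = [
    (44_332_446, False),
    (44_332_928, True),
    (44_332_927, True),
    (44_332_559, False),
]


def insertion_window(mates, read_length=READ_LENGTH):
    """Return (left_bound, right_bound) of the candidate insertion interval.

    Forward-strand genomic mates sit upstream of the insert and point toward
    it, so the insert lies past the end of the innermost forward read.
    Reverse-strand genomic mates sit downstream, so the insert lies before
    the far end of the innermost reverse read.
    """
    forward_ends = [start + read_length for start, is_rev in mates if not is_rev]
    reverse_ends = [start - read_length for start, is_rev in mates if is_rev]
    if not forward_ends or not reverse_ends:
        raise ValueError("need at least one mate on each side of the insert")
    return max(forward_ends), min(reverse_ends)


left, right = insertion_window(CHR03_MATES)
print(left, right, right - left)  # prints: 44332659 44332827 168
```

Under these assumptions the candidate interval spans 168 bases, which is small enough to bracket with the junction PCR primers described in panel (B).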
Fig. 4. Aligned sequenced PCR products and insert layout. (A) Section of the insert location between two soybean genes. Colored bars represent sequences from the PCR amplicons of the junction sites that aligned to the soybean reference genome on chromosome 3. Purple is the product from primer F1, green from primer R1, yellow from primer F2, and blue from primer R2. 40 bases of genomic DNA have been deleted as a result of the insertion, shown as the uncolored region between the primer products. Start bases for each primer product are shown, as well as their alignment to either the sense or antisense DNA strand. (B) Illustration of the constructed consensus sequence of the T-DNA insert locus, showing the location of the primers used for junction characterization, flanking genomic DNA sequences, and inserted T-DNA elements.