Sun Zhou, Guoli Ji, Xiaolin Liu, Pei Li, James Moler, John E Karro, Chun Liang.
Abstract
BACKGROUND: Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some GenBank dbEST sequences have proven to be "unclean". Identifying cDNA termini/ends and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities/errors in the wet-lab procedures for cDNA library construction.
Year: 2012 PMID: 22554190 PMCID: PMC3424822 DOI: 10.1186/1472-6750-12-16
Source DB: PubMed Journal: BMC Biotechnol ISSN: 1472-6750 Impact factor: 2.563
Figure 1. The expanded definitions of cDNA terminal structures. The original four canonical cDNA termini, 5TSS, 3TSS, 5TNS and 3TNS [12], have been expanded by adding sub-categories.
The top patterns of abnormal sequences detected in 172,229 ESTs generated from UGALGB
| No. | EST end | Pattern¹ | Count² | Percentage³ | Description | Example⁴ |
|---|---|---|---|---|---|---|
| 1 | 5′ | N,3TSS-3,N,…,3TSS-3,N | 27426 | 30.74% | Single or multiple poly(A) | Additional file |
| 2 | 3′ | (N,)5TNS-4,N | 7707 | 9.28% | | Additional file |
| 3 | 3′ | (N,)5TNS-5,N | 6814 | 8.21% | The vector fragment 2 ( | Additional file |
| 4 | 5′ | N,3TSS-5,V | 1414 | 1.59% | The vector fragment 2 ( | Additional file |
| 5 | 5′ | N,5TNS-1,N | 1386 | 1.55% | 3′-like terminus, i.e., poly(T) | Additional file |
| 6 | 3′ | 5TNS-2,N | 1238 | 1.49% | | Additional file |
| 7 | 5′ | N,3TSS-5,N | 873 | 0.98% | The vector fragment 2 ( | Additional file |
| 8 | 5′ | N,3TSS-4,V | 800 | 0.90% | | Additional file |
1 V stands for vector sequence, while N stands for non-vector sequence.
2 Total number of sequences for a given case.
3 Of the 172,229 ESTs, 83,021 are designated as 3′-ESTs (with ".b" in their sequence names) and 89,208 as 5′-ESTs (with ".g" in their sequence names). The percentage is the total number of sequences for each case divided by the number of all 3′-end or 5′-end ESTs, respectively.
4 All examples are displayed in Additional file 1: Figure S1.
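The percentages in the table follow from footnote 3: each count is divided by the number of ESTs sequenced from the same end. The arithmetic can be sketched in Python as below. The sketch assumes the read direction can be taken from the ".b" / ".g" marker in the sequence name, as the footnote describes. The function names `est_end` and `percentage` are illustrative only and are not part of AFST.

```python
# Totals per sequenced end, taken from footnote 3.
TOTAL_BY_END = {"3'": 83_021, "5'": 89_208}


def est_end(name: str) -> str:
    """Infer the sequenced end from an EST name.

    Assumed convention (from footnote 3): '.b' marks a 3'-EST,
    '.g' marks a 5'-EST. Names without either marker are rejected.
    """
    if ".b" in name:
        return "3'"
    if ".g" in name:
        return "5'"
    raise ValueError(f"cannot infer sequenced end from name: {name!r}")


def percentage(count: int, end: str) -> float:
    """Share of all ESTs from the same end, as a percentage with 2 decimals."""
    return round(100 * count / TOTAL_BY_END[end], 2)


print(est_end("FLD1_38_A06.g1_A029"))  # 5'
print(percentage(27_426, "5'"))        # 30.74
print(percentage(7_707, "3'"))         # 9.28
```

Dividing by the per-end total rather than by all 172,229 ESTs keeps the 3′ and 5′ patterns comparable, since the two groups differ in size.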
Summary of restriction enzyme cutting abnormality (RECA)
| 5′: N,3TNS-1,V,3TSS-2 | 5′: a cDNA | FLD1_38_A06.g1_A029 (Fig. S2 A) | ||
| 5′: 5TSS,N,V,3TSS-2 | 5′: a cDNA sense strand | RTDR1_20_F07.g1_A015 (Fig. S2 C) | ||
| 5′: 5TSS,3TNS-1,V,3TSS-2 | No cDNA | NXRV076_A06_F (Fig. S2 F) | ||
| 5′: 5TSS-2,V,5TNS-1,N | 5′: a cDNA | NXRV_013_E07_F (Fig. S2 G) | ||
| 5′: 5TSS-2,V,N,3TSS | 5′: a cDNA sense strand | RTFEPL1_26_F12.g1_A029 (Fig. S2 I) | ||
| Neither of the two enzyme sites is cut off. | 5′: 5TSS-2,V,3TSS-2 | No cDNA | NXCI_011_D03_F (Fig. S2 K) | |
| Both enzyme sites are cut off, but the vector fragment that should have been removed remains. | 5′: N,3TNS-1,V,5TNS-1,N | 5′: | RTCNT1_24_B05.g1_A029 (Fig. S2 M) | |
| XhoI cuts off at wrong site from the vector. | 5′: N,V,3TSS-2 | 5′: a cDNA sense strand | COLD1_26_G12.b1_A029 (Fig. S2 N) | |
| EcoRI cuts off at wrong site from the vector. | 5′: V,N | 5′: a cDNA sense strand | RTCA1_14_E09.g1_A029 (Fig. S2 O) |
Figure 2. The expected construction of cDNA insertion and all types of Restriction Enzyme Cutting Abnormality (RECA). The label “Expected” marks the expected construction of the cDNA library. Sequencing direction is indicated as 3′ or 5′ with an arrow. VF1 (vector fragment 1) and VF2 (vector fragment 2) refer to the left and right vector borders of the cloning sites. A, B, C, D, E and F are special types of RECA, defined as follows. RECA-Type A: the EcoRI site is cut off but the XhoI site remains intact. A1: cDNA is inserted with inversion; A2: cDNA is inserted without inversion; A3: adapter/linker fragments are inserted. RECA-Type B: the XhoI site is cut off but the EcoRI site remains intact. B1: cDNA is inserted with inversion; B2: cDNA is inserted without inversion. RECA-Type C: neither of the two enzyme sites is cut off. RECA-Type D: both enzyme sites are cut off, but the excised vector fragment remains. RECA-Type E: XhoI cuts the vector at the wrong site. RECA-Type F: EcoRI cuts the vector at the wrong site. Yellow indicates the EcoRI recognition site or EcoRI sticky end. Brown indicates the XhoI recognition site or XhoI sticky end. Blue represents the plasmid vector. Dark green denotes adapter/linker fragments. The direction of the cDNA insert is shown by a red gradient: the cDNA sense strand runs from deep red to light red, whereas the cDNA non-sense strand runs from light red to deep red.
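The definitions of RECA types A to D in the Figure 2 caption amount to a small decision table over the state of the two restriction sites. The Python sketch below restates that table and nothing more. The `SiteStatus` record and the `reca_main_type` function are hypothetical names introduced for illustration, not AFST's implementation. How each flag would be derived from a read is left open. Types E and F (cutting at the wrong site) and the sub-types A1 to A3, B1 and B2 are deliberately not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteStatus:
    """Hypothetical summary of one clone's cloning sites."""
    ecori_cut: bool                 # EcoRI site was cut off
    xhoi_cut: bool                  # XhoI site was cut off
    fragment_remains: bool = False  # excised vector fragment is still present


def reca_main_type(status: SiteStatus) -> Optional[str]:
    """Map site status to RECA type A-D as worded in the Figure 2 caption.

    Returns None when both sites are cut and the fragment is gone,
    i.e. the case is not one of types A-D.
    """
    if status.ecori_cut and not status.xhoi_cut:
        return "A"  # EcoRI site cut off, XhoI site intact
    if status.xhoi_cut and not status.ecori_cut:
        return "B"  # XhoI site cut off, EcoRI site intact
    if not status.ecori_cut and not status.xhoi_cut:
        return "C"  # neither site cut off
    return "D" if status.fragment_remains else None


print(reca_main_type(SiteStatus(ecori_cut=True, xhoi_cut=False)))  # A
print(reca_main_type(SiteStatus(True, True, fragment_remains=True)))  # D
```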
Figure 3. Detailed illustration of two sub-categories of Type A Restriction Enzyme Cutting Abnormality (RECA-Type A). RECA-Type A indicates that the EcoRI site of the vector is cut off whereas the XhoI site is kept. In A1 the cDNA is inserted with inversion; in A2 it is inserted without inversion. Because XhoI and EcoRI sticky ends cannot be ligated directly, a random sequence fragment has been detected between the vector and the cDNA end. Blue stands for the plasmid vector, yellow for EcoRI, brown for XhoI, red for cDNA, gray for a random sequence fragment, pink for Adapter1, and green either for poly(A) in the sense strand of cDNA or for poly(T) in the non-sense strand of cDNA.
Numbers of each type of RECA sequences
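| Source | A1 | A2 | A3 | B1 | B2 | C | D | E | F | Other¹ | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UGALAB | 432 | 101 | 0 | 107 | 100 | 0 | 2 | 9 | 9 | 5 | 765 |
| NCSUFBG | 3 | 0 | 37 | 18 | 12 | 250 | 0 | 0 | 0 | 2 | 322 |
| Total | 436 | 101 | 37 | 125 | 112 | 250 | 1 | 9 | 9 | 7 | 1087 |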
1 Other types include two cases: (1) sequences with complicated patterns whose type is hard to determine; (2) sequences whose quality is too poor to determine the sequence type.
Figure 4. Schematic view of double-termini adapters showing two types of concatenation.
Figure 5. Snapshots of AFST user interfaces. a: The main interface allows users to upload their sequences, specify relevant information about vector and adapter/linker sequences, initiate data processing, and obtain tabular results showing abnormalities. b: Details of a normal sequence. The high-quality region between 5TNS-4 (from 2 to 62, marked with blue and green) and 3TNS (from 900 to 926, marked with pink, yellow and blue) is the final clean sequence (i.e., the region with a light red background). The color legends and their meanings can be found by clicking ‘color table’. c: Details of an abnormal sequence. This sequence has a RECA abnormality (RECA-Type A1), where the double-stranded cDNA insert is inverted in its orientation and inserted into the double-stranded plasmid vector after enzyme digestion. The vector sequence region between 5TNS-2 (highlighted with blue and brown) and 5TSS-1 (highlighted with yellow and pink) is the part that, in theory, should have been cut off by enzyme digestion.