Roger A Hoskins, Mark Stapleton, Reed A George, Charles Yu, Kenneth H Wan, Joseph W Carlson, Susan E Celniker.
Abstract
cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest to the improvement of large gene collections in model organisms and in humans. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.
Year: 2005 PMID: 16326860 PMCID: PMC1301602 DOI: 10.1093/nar/gni184
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Primer3 parameter settings
| Parameter | Setting |
|---|---|
| Primer length | 23 bases ±2 |
| Max. number of Ns in primer sequence | 0 |
| Product size | Full length of the annotated transcript |
| Primer Tm | 65 °C ±5 |
| GC clamp | Most 3′ base must be G or C |
| GC content | 50% ±20 |
| Max. complementarity (self) | 8 |
| Max. complementarity (paired primer) | 8 |
| Max. mononucleotide repeat in primer | 5 bases |
| Max. end stability | 9 |
ᵃ See for score calculation methods.
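The sequence-level constraints in this table can be checked directly on a candidate primer. The Python sketch below is a hypothetical helper, not Primer3 itself: it tests length, ambiguous bases (N), GC content, the 3′ G/C clamp and mononucleotide runs, and deliberately omits the melting-temperature, complementarity and end-stability criteria, which Primer3 scores with its own thermodynamic models. The names `primer_problems`, `gc_percent` and `longest_run` are illustrative assumptions.

```python
from itertools import groupby

# Constraint windows transcribed from the Primer3 settings table.
LENGTH_RANGE = (21, 25)    # 23 bases +/- 2
GC_RANGE = (30.0, 70.0)    # 50% +/- 20
MAX_POLY_X = 5             # longest allowed mononucleotide run


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in an uppercase sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest stretch of a single repeated base."""
    return max(len(list(group)) for _, group in groupby(seq))


def primer_problems(seq: str) -> list:
    """Names of the table constraints a primer violates (empty list if none)."""
    seq = seq.upper()
    if not seq:
        return ["empty"]
    problems = []
    if not LENGTH_RANGE[0] <= len(seq) <= LENGTH_RANGE[1]:
        problems.append("length")
    if "N" in seq:
        problems.append("contains N")
    if not GC_RANGE[0] <= gc_percent(seq) <= GC_RANGE[1]:
        problems.append("GC content")
    if seq[-1] not in "GC":
        problems.append("no 3' GC clamp")
    if longest_run(seq) > MAX_POLY_X:
        problems.append("mononucleotide repeat")
    return problems


if __name__ == "__main__":
    # Primer 1 for the 961-nt transcript in the Experimental design table.
    print(primer_problems("GGGAACGCAACCGCGTAAAGC"))    # []
    # A made-up primer ending in A with a run of six A's.
    print(primer_problems("AAAAAAGCGTACGTACGTACGTA"))  # ["no 3' GC clamp", 'mononucleotide repeat']
```

Returning a list of named violations, rather than a single pass/fail flag, makes it easy to see which constraint a rejected primer broke.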
Summary of selected cDNA clones
| Gene name | Clone ID | GenBank accession nos | Classification | Annotated transcript length | cDNA insert length | Annotated CDS length | cDNA ORF length |
|---|---|---|---|---|---|---|---|
|  | IP01413 |  | match | 961 | 962 | 603 | 603 |
|  | IP01330 |  | n.d., S268P | 2317 | 2147 | 1581 | 1581 |
| B-H2 | IP01479 |  | match | 3089 | 3034 | 1935 | 1935 |
| bsh | IP01040 |  | 5′ extension | 1524 | 2034 | 1281 | 1287 |
| btn | NC | N/A | co-ligated | 2332 | N/A | 474 | N/A |
| C15 | IP08859 |  | n.d., S113I | 1105 | 1880 | 1017 | 1017 |
| Cdk7 | IP01401 |  | match | 1392 | 1457 | 1059 | 1059 |
|  | IP01389 |  | match | 1626 | 1743 | 900 | 900 |
| dmrt11E | NC | N/A | genomic | 1134 | N/A | 1131 | N/A |
| dmrt99B | IP01169 |  | match | 1533 | 2343 | 1530 | 1530 |
| dys | IP08837 |  | 3′ extension | 2707 | 3484 | 2643 | 2655 |
|  | IP08836 |  | 3′ short | 7830 | 2631 | 6477 | 2110 |
| e(y)2 | IP01143 |  | match | 481 | 468 | 303 | 303 |
| eve | NC | N/A | SLIP artifact | 1468 | N/A | 1128 | N/A |
| Fer2 | NC | N/A | SLIP artifact | 840 | N/A | 837 | N/A |
| ftz | IP01266 |  | match | 1758 | 1758 | 1230 | 1230 |
| gcm2 | IP01423 |  | match | 2415 | 2257 | 1818 | 1818 |
| gsb | IP01408 |  | match | 1452 | 1652 | 1281 | 1281 |
| H15 | IP01538 |  | match | 2555 | 2606 | 1980 | 1980 |
|  | NC | N/A | 3′ short | 7002 | N/A | 5877 | N/A |
| hbn | IP01393 |  | match | 1802 | 1790 | 1227 | 1227 |
| Her | IP01491 |  | match | 450 | 631 | 447 | 447 |
| HGTX* | IP01125 |  | match | 3049 | 3229 | 1539 | 1539 |
| HLH3B | IP01280 |  | match | 1353 | 1434 | 1128 | 1128 |
| HLH4C | IP01307 |  | match | 1424 | 1456 | 501 | 501 |
| HLHm7 | IP09063 |  | co-ligated | 723 | 1061 | 558 | 558 |
| HLHmdelta | IP01594 |  | match | 1016 | 1017 | 519 | 519 |
| HLHmgamma | IP08862 |  | match | 842 | 959 | 615 | 615 |
| lbl | IP08853 |  | exon variant | 1847 | 1752 | 1116 | 882 |
| nau | IP01012 |  | exon variant | 1534 | 1450 | 996 | 984 |
| nht | IP01149 |  | exon variant | 780 | 966 | 777 | 735 |
| OdsH | IP01524 |  | match | 1226 | 1310 | 1146 | 1146 |
| Poxn | IP01592 |  | match | 2178 | 2468 | 1275 | 1275 |
| Rfx | NC | N/A | 3′ short | 3943 | N/A | 2691 | N/A |
| rn | IP01358 |  | exon variant | 3661 | 3118 | 2838 | 1626 |
| ro | IP01518 |  | match | 1241 | 1202 | 1050 | 1050 |
|  | IP01323 |  | exon variant | 732 | 609 | 450 | 417 |
| sc | IP01419 |  | match | 1422 | 1432 | 1035 | 1035 |
| sens | IP01345 |  | match | 2450 | 2461 | 1623 | 1623 |
| sisA | IP01195 |  | match | 768 | 770 | 567 | 567 |
|  | NC | N/A | genomic | 3159 | N/A | 2007 | N/A |
| Sox15 | IP09065 |  | n.d., P319L | 3654 | 3638 | 2352 | 2352 |
| Sox21a | IP01552 |  | co-ligated | 1167 | 2993 | 1164 | 1164 |
| Su(z)2 | IP01427 |  | co-ligated | 6313 | 2218 | 4104 | 1806 |
| sv | IP01047 |  | exon variant | 4690 | 920 | 2382 | 537 |
| TfIIA-S-2 | IP09007 |  | co-ligated | 2917 | 4415 | 1527 | 1527 |
| TfIIEbeta | IP01109 |  | match | 1052 | 1022 | 876 | 876 |
| tj | NC | N/A | genomic | 1530 | N/A | 1527 | N/A |
| tll | IP01133 |  | match | 1938 | 1942 | 1356 | 1356 |
|  | IP01285 |  | exon variant | 4114 | 3504 | 3282 | 2670 |
| zen | NC | N/A | SLIP artifact | 1272 | N/A | 1059 | N/A |
| CG10147 | IP01005 |  | match | 1347 | 1792 | 1344 | 1344 |
| CG10309 | IP01015 |  | exon variant | 2778 | 3308 | 2775 | 2772 |
| CG10348 | IP08802 |  | 5′ extension | 1593 | 2054 | 1590 | 1629 |
| CG10431 | IP01025 |  | 5′ extension | 2151 | 3382 | 2148 | 2352 |
| CG11085 | IP01054 |  | exon variant | 828 | 1576 | 825 | 885 |
| CG11152 | IP01059 |  | match | 1800 | 2320 | 1797 | 1797 |
| CG11294 | IP01065 |  | match | 946 | 1021 | 783 | 783 |
| CG12029 | IP01101 |  | 5′ extension | 503 | 3021 | 327 | 2253 |
| CG13287 | NC | N/A | 3′ short | 1386 | N/A | 1383 | N/A |
| CG15258 | IP01147 |  | match | 591 | 702 | 588 | 588 |
| CG15336 | IP01157 |  | antisense | 546 | 705 | 543 | 318 |
| CG15710 | IP01184 |  | match | 798 | 932 | 795 | 795 |
| CG15782 | IP01192 |  | gene merge | 455 | 2383 | 426 | 1920 |
| CG1663 | IP01201 |  | match | 1164 | 1411 | 1161 | 1161 |
| CG16779 | NC | N/A | co-ligated | 5943 | N/A | 5940 | N/A |
| CG16899 | IP01211 |  | exon variant | 1074 | 1928 | 1071 | 1326 |
| CG17186 | IP01220 |  | match | 1146 | 1433 | 1143 | 1143 |
| CG17195 | IP01224 |  | 5′ extension | 737 | 942 | 723 | 849 |
| CG17196 | IP01227 |  | 5′ short, u.s. | 831 | 890 | 828 | 495 |
| CG17197 | IP01230 |  | dicistronic | 951 | 1656 | 948 | 870 |
| CG17198 | IP01235 |  | 5′ extension | 873 | 1087 | 858 | 897 |
| CG17287 | IP01239 |  | 5′ extension | 1017 | 1801 | 1014 | 1080 |
| CG17328 | IP01243 |  | match | 1413 | 1480 | 1239 | 1239 |
| CG17385 | IP01247 |  | match | 837 | 1192 | 834 | 834 |
| CG17568 | IP01252 |  | 3′ extension | 1509 | 1856 | 1506 | 1539 |
| CG17801 | NC | N/A | 5′ & 3′ short | 1054 | N/A | 1035 | N/A |
| CG17803 | IP01257 |  | 5′ extension | 1401 | 2038 | 1329 | 1761 |
| CG18476* | IP01261 |  | match | 2954 | 2982 | 2808 | 2808 |
| CG2120 | NC | N/A | genomic | 1035 | N/A | 1032 | N/A |
| CG30417 | IP01291 |  | match | 807 | 863 | 804 | 804 |
| CG30431* | IP01295 |  | match | 1810 | 1847 | 1254 | 1254 |
| CG30443* | IP01303 |  | match | 1771 | 1801 | 1686 | 1686 |
| CG31241* | IP01327 |  | match | 2020 | 2053 | 1473 | 1473 |
| CG31612* | IP01335 |  | match | 3308 | 3452 | 2961 | 2961 |
| CG32611* | IP08939 |  | SLIP artifact | 3313 | 3001 | 3309 | 2136 |
| CG32705* | IP01380 |  | 5′ short | 4705 | 1215 | 4101 | 1188 |
| CG32767* | IP01381 |  | n.d., frame | 7670 | 5378 | 3843 | 3672 |
| CG32772* | IP01388 |  | exon variant | 2476 | 3027 | 1629 | 1560 |
| CG3485 | IP01409 |  | intron | 993 | 1677 | 990 | N/A |
| CG40351* | IP01431 |  | 3′ short | 5846 | 2006 | 4924 | 1750 |
| CG4318 | IP01435 |  | 5′ extension | 699 | 1487 | 696 | 708 |
| CG4328 | IP01440 |  | n.d., frame | 1593 | 1491 | 1590 | 816 |
| CG4565 | IP01448 |  | 3′ extension | 672 | 886 | 669 | 825 |
| CG4676 | NC | N/A | intron | 1008 | N/A | 852 | N/A |
| CG4956 | IP01459 |  | 5′ extension | 858 | 1057 | 855 | 906 |
| CG5245 | IP01468 |  | exon variant | 1506 | 1542 | 1503 | 1422 |
| CG6118 | IP09048 |  | 5′ short, u.s. | 2832 | 4052 | 2829 | 2646 |
| CG7368 | IP08855 |  | match | 1593 | 2849 | 1590 | 1590 |
| CG7691 | IP01563 |  | match | 852 | 1306 | 849 | 849 |
| CG7963 | NC | N/A | co-ligated | 966 | N/A | 963 | N/A |
| CG8089 | IP01584 |  | n.d., frame | 1875 | 1998 | 1872 | N/A |
| CG8117 | IP08861 |  | match | 489 | 792 | 486 | 486 |
| CG9793 | IP09168 |  | intron | 1041 | 1227 | 1038 | N/A |
ᵃ Genes represented in our EST collection are indicated by asterisks.
ᵇ Clone ID numbers for compromised clones that were not fully sequenced are not reported (NC).
ᶜ Clones that were not fully sequenced were not submitted to GenBank and have no accession numbers (N/A).
ᵈ Clone classifications relative to the Release 4.1 annotations are indicated. Nucleotide discrepancies (n.d.) are reported with either the corresponding difference in the predicted protein sequence or an indication of a frameshift (n.d., frame). 5′ short clones with upstream in-frame stop codons (5′ short, u.s.), genomic contaminants (genomic) and retained introns (intron) are also indicated by abbreviations. All other classes are given in full.
ᵉ Release 4.1 annotated transcript lengths and annotated CDS lengths are reported in nucleotides. For genes with multiple annotated transcripts, the length of the transcript that most closely matches the cDNA sequence is reported.
ᶠ cDNA insert and ORF lengths are reported in nucleotides. For clones with unfinished sequences, these data are not known with confidence (N/A). For clones classified as ‘co-ligated’, ‘genomic contaminant’ or ‘retained intron’, ORF lengths are not reported (N/A).
ᵍ Rpb4 and Ada2A were separate gene annotations in Release 3.1 but are merged into one in Release 4.1. The cDNAs recovered in the two experiments correspond to different Release 4.1 transcript isoforms.
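How a cDNA ORF length and the ‘5′ short, u.s.’ flag might be derived from an insert sequence can be shown with a short sketch. It assumes the insert is read on its sense strand (directional cloning), treats ATG as the only start codon and TAA, TAG and TGA as stop codons, ignores ORFs that lack a stop codon, and counts the stop codon in the ORF length; whether the table counts it is not stated. An in-frame stop codon upstream of the ATG means the ORF cannot extend further 5′ in that transcript, which is why such clones can still be considered full-length. The function names are hypothetical, and this is not the authors' annotation pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Longest ATG-to-stop ORF on the sense strand as (start, end), end exclusive.

    ORFs without an in-frame stop codon are ignored; returns None if none exist.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def has_upstream_inframe_stop(seq: str, orf_start: int) -> bool:
    """True if any codon in frame with the ORF and 5' of its ATG is a stop codon."""
    seq = seq.upper()
    return any(seq[i:i + 3] in STOP_CODONS
               for i in range(orf_start % 3, orf_start, 3))


if __name__ == "__main__":
    insert = "TAGCCCATGAAATTTTAAC"  # toy insert: stop, spacer, ATG ... TAA
    start, end = longest_orf(insert)
    print(start, end, end - start)                   # 6 18 12
    print(has_upstream_inframe_stop(insert, start))  # True
```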
Figure 1. Description of SLIP (self-ligation of inverse PCR products). (A) A pair of oppositely directed PCR primers is designed within an exon of a target gene. The primers abut at their 5′ ends with no overlap, and their 5′ ends are phosphorylated. (B) The primers are used to amplify specific clones from a plasmid cDNA library. The positions of the primers (arrows) within a target cDNA are shown, with the vector indicated in white and the cloned cDNA insert indicated in black-and-white cross-hatch. Each resulting linear product is a complete copy of a target clone, comprising the intact vector and the entire insert, which is split into two parts at the position of the PCR primers. Self-ligation of the linear PCR products into circles regenerates the original target cDNA clones. The restriction enzyme DpnI, which cleaves only Dam-methylated DNA, is used to digest the unamplified plasmid library DNA, leaving the unmethylated, self-ligated amplification products intact. These products are cloned, sequenced and analyzed as described in the text to identify bona fide target-specific cDNAs.
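The logic of panels A and B can be modeled with plain strings. The sketch below is a toy simulation under stated assumptions, not a laboratory protocol: a plasmid is a short linear string read as a circle, `abutting_primers` derives two primers whose 5′ ends meet at a chosen split point, `inverse_pcr_product` opens the circle at that point, `same_circle` checks that rejoining the two ends restores the original clone, and `dpni_survivors` models DpnI as cutting only methylated molecules that contain a GATC site. All names and the dictionary representation of molecules are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def abutting_primers(template: str, split: int, fwd_len: int = 23, rev_len: int = 23):
    """Oppositely directed primers whose 5' ends meet at `split` with no overlap."""
    if split < rev_len or split + fwd_len > len(template):
        raise ValueError("primers would run past the ends of the template string")
    # Forward primer: top-strand sequence starting at the split, extends rightward.
    forward = template[split:split + fwd_len].upper()
    # Reverse primer: reverse complement of the bases just left of the split.
    reverse = reverse_complement(template[split - rev_len:split])
    return forward, reverse


def inverse_pcr_product(plasmid: str, split: int) -> str:
    """Linear inverse-PCR product: the whole circular plasmid opened at the split."""
    return plasmid[split:] + plasmid[:split]


def same_circle(a: str, b: str) -> bool:
    """True if linear strings a and b describe the same circular sequence."""
    return len(a) == len(b) and b in a + a


def dpni_survivors(molecules):
    """Keep molecules DpnI leaves intact; it cuts only Dam-methylated GATC sites."""
    kept = []
    for m in molecules:
        circular_seq = m["seq"] + m["seq"][:3]  # catch a GATC spanning the origin
        if not (m["methylated"] and "GATC" in circular_seq):
            kept.append(m)
    return kept


if __name__ == "__main__":
    plasmid = "ATGCCGTAGGCT"  # toy "plasmid"; real clones are kilobases long
    fwd, rev = abutting_primers(plasmid, 6, 3, 3)
    print(fwd, rev)                       # TAG CGG
    product = inverse_pcr_product(plasmid, 6)
    print(product)                        # TAGGCTATGCCG
    print(same_circle(product, plasmid))  # True
    library = {"name": "library plasmid", "seq": "GGATCCAAGATC", "methylated": True}
    pcr = {"name": "SLIP product", "seq": "GGATCCAAGATC", "methylated": False}
    print([m["name"] for m in dpni_survivors([library, pcr])])  # ['SLIP product']
```

Because the primers abut rather than overlap, the product is the plasmid opened at a single point, so joining its two ends restores the original sequence without duplicating or deleting bases; overlapping primers would duplicate the overlap at the ligation junction.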
Experimental design
| Gene name | Transcript length | Primer 1 | Primer 2 |
|---|---|---|---|
|  | 4743 | GCGAGAGAGAAAGAGCGTACGAG | TCTCGTGGTTTCCTCCTGACC |
|  | 961 | GGGAACGCAACCGCGTAAAGC | GGGCATTTCTCCGGATAACAGAG |
|  | 2422 | GGAACTCCATGGTTTTGTATAATCC | TGGTGTGTTCATTCTGATGTGC |
| amos | 1154 | AATCGGGTACCTGAGCGGATCG | GCCAACCTCTTGAGGATCAGCAG |
| ato | 1483 | AACTGCCATTGGTCGTGCCACTC | GGTGGTGAGTTGCAGCGGTCTC |
| BBS2 | 1650 | TGGAGTTCGAGCGTATAGCCACTG | CTTCCTCGGTGGCCTCATTCC |
| B-H2 | 3089 | CCGGAAATGTCCGCAACAACG | TGGCATTGTGGTCATGTGTGG |
| bsh | 1524 | TCCCACTACAACGGAGATCAG | AGTTCCGTGTCGCGAGTGGTG |
| Bteb2 | 962 | CCGGACTTAAGTGATTGGGAGCAG | CGGACATACGGTCAGGTCATTG |
| btn | 2332 | TCACTCTTTCCACTTCACAACATGC | AAACAAAATGAGAGTGTGCAAATG |
| C15 | 1105 | CCATTGAGCGAGTCCCTGCAGTC | CGTTTCGGACTCGTCGTAGCAG |
| cato | 570 | CCGGAATGGCAATTCTTGGATG | CGATAAGAAAGCCCCCTGTCC |
| Cdk7 | 1392 | CTAAACGATGCTGCCCAATGC | GAGCCTCATAAATAGCACGAAAATG |
| CrebB-17A | 1080 | GACCGGGTGCTTGGTGTCAAC | CCCAAATGCTCACCTGCAGTC |
|  | 1626 | ACTGCCCGTTGAAATTCAGAATAC | GGGGAGAGGGAATCGGCCTAC |
| dimm | 1173 | GTGCCACCAGACGAACTTCACAG | GACGGACGGGTCGAGAACTTCC |
| dmrt11E | 1134 | TCGCTGTGTTGTACCTCATGC | GCATGCAAGGGATCGGACTCG |
|  | 978 | CAAGAAGCTCTGCACCTACAAGAAC | TGACCCCGCAGCTCTGAAATG |
| dmrt99B | 1533 | CGCCTTGAAGGGACACAAACG | CTGACCACTCCGTGGTTCCTG |
| dys | 2707 | ACGAAGGGCGCCTCGAAGATG | CGATTTGTTTGCATCGAATCTTG |
|  | 8834 | AAAATTTTCACGGTTGCTTAAATGG | TGCGTCTGTTTAAATGTCACTCTTC |
|  | 540 | CCGAGCTACGAGGTGATGATGG | CGACAAGTGTTTTCAGGTTGTCC |
| e(y)2 | 481 | AGATACGCGACACAAGGATGAGC | AAGTCTTGCAAATTACCAAGTTTCC |
| E5 | 1575 | GAGATCGGCTCCACTAAGGGTCAG | GTCGAGGATTCGCCCACAATC |
|  | 5994 | AGTTTCCGCCGCATTGTAATTG | CATTCAGCAAGTATTCTGCTTTCTC |
| eve | 1468 | ATCCTTCCTGGTTACCCGGTACTGC | ACCTCGCTCCTGCCAGTTACTTC |
| fd3F | 1083 | CGGTCACCTGTGGGCCATTTC | GCTCCTTGGGGCGCTTTAACTC |
| fd64A | 1098 | GGCCTTCTACTACCAGGGCATCG | GGTGAAAACGATCCGCACATCAG |
| fd96Ca | 1119 | CCGCTCAGCGATATCTACAAG | CAACATTTTCTCCGGCGAACTC |
| fd96Cb | 825 | TGGCCTTCGATATGTTCGAGAATG | TGGGATGAAGTGTCCAGTAGGAG |
| Fer2 | 840 | CCAGCAGCATTATATGCAACATAGC | ATGTGACGCAGGTTGTTGGAG |
| ftz | 1758 | TGTACAACATGTATCACCCCCACAG | TGTTCATGTTGTCGGCGTAGCTG |
| gcm2 | 2924 | GGGCTTTCGAATCGCGGAAAAC | TGCGAACAGGCAACACTTGAG |
| gsb | 1452 | AGCTGGAGTCCGTCCCTGTGTC | GCTGCCATCTCCACGATTTGG |
| H15 | 2555 | GACCGCAAATACGGGCGTAAAG | TCCGGTTTTTCGTGCTATTTATC |
| ham | 3327 | ACATGCAGCGAGTGGCACCAG | CCTTGGACGCACAGGACATCTG |
|  | 7002 | AGGAAACCCAAGAGCGAAACTCC | TTATCTTCGCCTATTTTTCCACTTC |
| hbn | 1802 | AAAAACCAACTTGTAGCAAGTGAAG | TCTTATTTTGTTAGCGATTTTCCAG |
| Her | 450 | CCCAATTGATTGCTATTGGAGTGG | CTCTGATATACTCCGGATGTAGGC |
| HGTX* | 3049 | CATATAGCCTGATCTCGTTCAAATC | GGTAACTCCGTGGCCGGAAAATATC |
| HLH3B | 1353 | CCGGGCACCTGAACGGTAATG | ATCGCGGTGACTCGTTGGTCTG |
| HLH4C | 1424 | ACCGAAATCAGTGGTGCAAATAGC | GCTGGACACTGGACTTTCTTGC |
| HLH54F | 1066 | GATGCCAGTTCTCAAAGCTCCCAAC | CTCATCGAAGTCGTCATCAAAGAAC |
| HLHm7 | 723 | CTCCGCAAGCTGAAAGAGTCTAAG | ATGCTGCACGGTGACTTCCAG |
| HLHmdelta | 1016 | ACAATGGCCGTTCAGGGTCAG | GTATAATGGGTTTTGATTTGGTGTG |
| HLHmgamma | 842 | CTGGAACTTACCGTCACCCATTTGC | GATATCGGCTTTCTCCAAACG |
| Hmx | 792 | GATGGCAACTCGAAGAGAAAGAAG | ACCATGTGGCGAGGAGCTGTC |
| lbe | 2045 | GTCAGTATCGTCAGTACCGACTTG | AAGCCTTGTACACTCAAATCTTGC |
| lbl | 1847 | CCGTAAGGATACAGCCAGGATGTGC | CTTAGCTCCAAACTCTTTTCTACGG |
| nau | 1534 | CGTACGGTCCGCAAATCGAAGTC | CGAGTGTGTGTACCGCCTTCC |
| nerfin-2 | 2088 | AGTGTCGGGCATTACCAGCAATC | CGACGAAGTGAGTGGTGTCTGG |
| Neu2 | 1149 | TCCAACACCATATGCAAGTCCTG | CAGTGGATCGTGCTTCTCAAC |
| nht | 780 | GCAAGGCAAAAGTCTCCATAAAG | TTGCATCCTGAGAGCCTGAGTC |
| OdsH | 1226 | GCCCCAAATCCGGAAATTAGTC | ATCCATGGACAAGTTGAGAACG |
| org-1 | 2100 | TGCTATGGCAACGACTACTGG | GTTGTAGTCCGTTAGGCTGGTGTC |
| Poxn | 2178 | GCCTGAGACTGAGCATCCTAATAGC | CCAAATGGCGTTGCTCGAACTG |
| Rfx | 3943 | ACCCAGAAGATGTCAACAGTCGTG | TTCGCCTGGTCAGCTCTTTAC |
| rn | 3661 | TCGTTCGTTGTAACGCCCTACC | GGGGTACGAGCGGAACTGGTG |
| ro | 1241 | CATAGCGAACACTACGATTCTATCC | GGATCTAAGCTGTCACTCCTTTTG |
|  | 2422 | TGAGGAGCTGCGCCAAATACTCG | TCCTCGAAGCGACCCTCTAGTGAAG |
| sc | 1422 | CGGCTCCATATAATGTAGACCAATC | GCGAGGAACCAGGCGATAGAG |
| sens | 2450 | GATTTGTGCAGTGAACAGTATTGAG | TCACTTTCTTGGCGTTGTGATCTTG |
|  | 2820 | GATTTGGGCTGTCGGCTTCAC | GTTTTCCCATTGTCCGGGCATC |
| sisA | 768 | CCGCACTATCGACAGCATCGTC | GTGAACTGCTTCCGCGATACG |
| slou | 2778 | ACAGGCACACAACACGGCACATC | GGCAAGTCAATAGCTAAATGCTG |
| Sox100B | 1945 | CTGAAAGCCGAGCAGAAGAAGG | GGCATAGTTTGCCAAAACCAG |
|  | 3159 | CAGGACACGGAGAACAAATAAGTCC | GCCAATCTACACTAAACATCGATTC |
| Sox15 | 3654 | GCGCTATCCGCTGTTTGTATCTTG | TCTTGGGAAAATGAAAATTCACG |
| Sox21a | 1167 | TAGGCTCTGGATCGGGAAACAC | CTCCCACACTCATACCCATGC |
| Su(z)2 | 6313 | ACCTGCAAAACACGCACAACAC | GCATCTTTCTGCCTATTCTATCTGC |
| sv | 4690 | AGAGCACGATTCCCAACATCTGC | AGCTGGCCTGTACTTGTATTAAGG |
| TfIIA-S-2 | 462 | CTGGGCAGAACGCTCCAGGAC | CGTTGTGGCCCTGTAATGTTGATAG |
| TfIIEbeta | 1052 | ACCGCCGCCTAGCGATGATTC | GGAGCTGGTCTATCGGGCTTG |
| tj | 1530 | GTGAAGCGCGAGGATCACAGTC | ATGGCCCAAAGTCGTGACCTG |
| tll | 1938 | AATTCAATTTGTGCAAGCGTTTC | TCACTTGGCACTGGTGTATCTTTG |
|  | 8413 | GCTGCAAATCAAATTGTCACGTTTC | AGCTGTGGTTGGGCCATCTTC |
| vnd | 3036 | TTTAAGTTGCCCTACCAGGATACC | TTCCAGACATAGTTCGATTTAGGC |
| zen | 1272 | CGATGTTAACCCCATCGGTCTG | TGATGATGACCATAGATCAAATCAC |
| zen2 | 942 | TTTCTGTCGGGATCGACTGTCGTG | CAGTTAGAAAACGCTCCTGCGTATC |
| CG10147 | 1347 | GAAGTTCCACTCCTTCCGAGCAC | TCCAGAAGCTCGTAGCACTCC |
| CG10309 | 2778 | AGGGACGCGCCAAGAATCTGAG | CGGGTGAGTAGATGTTCGTCTTG |
| CG10348 | 1593 | AAGCGGAGAAGCAGTTTCGATCAG | GCCCTTGGCATGGTGATTTAAG |
| CG10431 | 2151 | TCGGAAGATGACTCCATGAGTGG | AACCCTAATTGATGGCACAGC |
| CG10887 | 2031 | TTCACTCAACAGCAAATAGTGTTCC | AAGAATGTGAACGGCTTTGGTG |
| CG11072 | 171 | CACATCCATCAGAGCCCATAAG | CCCTCTGAACAAACTAGTGTGACG |
| CG11085 | 828 | ATAGACATTGAGGATCGATCTACGC | ATCCGAATCGTCGTTGGAGAGC |
| CG11152 | 1800 | GACTTTGGAACAATACCGCCTCCAG | ATCGTCGTTGATTGAGCTTGTGAAG |
| CG11294 | 946 | GATCTGCTCACGGAGTATATGTTTG | GCCGGGCAGCGGTGATATAAAAG |
| CG11762 | 957 | GGTGGATTTGGACGATGTTCCTG | AATCTTAGTCCGCTTATCATGTGC |
| CG11966 | 1764 | CAACTACATGCAGAGTGCCTATCAC | TTGTTGCTCTCATCCGGCAGTC |
| CG12029 | 503 | CGTTACAACCGCCGAAATAATCC | CACCGTTCGTACGCTCATCAAAC |
| CG13287 | 1386 | CACCGGGTGGAAAGACCACTC | GCGATGTGGGTGTCTCATTGTTG |
| CG13296 | 1398 | GCATGACCACTTCATCGACAGAAAC | TGGTCGGTGTGATACTGGTGCTC |
| CG1379 | 1107 | GGAAACCTGACACTGGGTGATTC | GCATCCGATTAGGTCATCAATCTC |
| CG15258 | 591 | CAAAATCGCATAACGCAGGAG | CTGGATCGACCGGATGTGTTC |
| CG15269 | 1764 | AAGGAGCGTAAATCCGCTCAGG | GCCCAGAATCTTACTGATGCTAAAG |
| CG15336 | 546 | ATCGGACGCGCATCTATTGGAATC | AGCCGCAGAACTCGCACATAAG |
| CG15398 | 885 | AGGGAGCCGAGGAAATGTCTTTGC | CACTTGCTTTGTTCGCCAGGACTTG |
| CG15455 | 921 | TCTACCGATCGCCTTCAAGGTTTTG | GTTTTGTTTGAGCGCCAGTGC |
| CG15696 | 540 | CCACCCAGCATCTTATGTCTCAAAG | GCGTACGGATGGAAAGGCAAG |
| CG15710 | 798 | CCTGTTTGCAGCCGATAAAAGAG | GCGCCACGTACACCTTGGTAAC |
| CG15782 | 455 | TGTCTATGCCCGCGAAATGCTC | TCGGGATAATGGGCTTCCTTG |
| CG1663 | 1164 | GGCGGAGGTACAGGAATCTTTC | AACTTGGATTCCTGAGCAATAGAC |
| CG16779 | 5943 | CCGGAAATGTTGGCAGATGTC | ATCCGGATACCCGATGGTCAG |
| CG16899 | 1074 | TTCTTGCGGTCGAAGGATAATGAG | TCTCTCTGCATCGGAGAACATGAG |
| CG17075 | 2907 | AAGTCCCGCAAAGGGAGTAGTGAC | TCGCTGTGCGACTCCGTCAAG |
| CG17186 | 1146 | CTCGTGCAACGTCTGCGGCTAC | AACAGTGACGATTCCACCTCAGAC |
| CG17195 | 737 | TGCATTTTGAGACGGGACCAC | GGTGTTGCACAAAACGCAGTG |
| CG17196 | 831 | TGGCCTGCTACCGTACAAGCTCAG | GCAAGTTTCCCAGGATATTGTATG |
| CG17197 | 951 | AGTACTTGGCGCGTCGAAATC | TGGTGCAGGCCCATATCAAAC |
| CG17198 | 873 | CAGTTTTTGGGATTGTTTGGACAG | TGGCATTACGTAGAAAGCTTCG |
| CG17287 | 1017 | CGGTGCATTTTCGTAGCTCCAG | AAACCATGTCCATAGAAACATGAAC |
| CG17328 | 1413 | TTTTTAAAACCGATGGCCTACCTTC | CTCTTATTTTAGCGCATTGCATC |
| CG17385 | 837 | AGACGGTGGCCAATCAGTTCAG | CGTCGAATTCTACCTCCATGC |
| CG17568 | 1509 | ATTAGCGAACTAATCGATTTTGCAG | AAGCGTATAGCACTCCGTACACAGC |
| CG17801 | 1054 | GCCGCGTCATATTTGCCCATC | TGGCGTTCATTCTGTTCTAACC |
| CG17803 | 1401 | AGACGACGGAAAAGTACAACATTC | GCCTAAGCTTTTTGTCGACCTG |
| CG18476* | 2954 | ATCAATCTGGATGCAACTGGTAGTC | TTCGGTTAGCTCGCATAAAATCTCC |
| CG2120 | 1035 | CGATCTATTGGAACTTGTGAATGG | TATTCCGAGCCCGGATTCAGC |
| CG30417 | 807 | CAACTGCTGGGCCTCCACGATTAC | CGGATCCAGTGGCTCGTAGAAAAC |
| CG30431* | 1810 | TCGTCCGTATAGATCGGGGCTTC | TTGGCATAGTTGTTTTCCTTGC |
| CG30443* | 1771 | GGAGACTACCCCGAACCTCCAC | TCGGCCTGGTTGGACGATGAC |
| CG31224* | 7059 | CAAAGCGGAAGACGAGAGGAAAG | TGCGCTCCTTCTTCCAATATACAAG |
| CG31241* | 2020 | GCGCCCAAGTACTGCTACTTCTTC | TCTCGAGGTAGAGGCTTCTGG |
| CG31612* | 3308 | ATGTTTCGGCGACGCTTATCAGTAG | GAGTGTCAGCAGAGATTCTTGTTCG |
| CG31632* | 3371 | AAAATGCAAGCCACTTCCGGTCAG | GGCCACGTCCATAACGGTTTTATC |
| CG32532* | 4422 | GCTGGAAAATTTCAATGGAGCGAAG | GGGGGCGTCCTTTCCTGATTTC |
| CG32611* | 3313 | TTGCCGAGCTGCAAACCTTAG | AACGGACGTCCTTCAATATCAC |
| CG32705* | 4705 | TGTCCGTGGGTGAACAACTGC | GTGATGATCGAAGGTCTCTATGC |
| CG32767* | 7670 | CGCAACAGATCGAATTTATACTGC | ATAGGGCGCTATCGTTAATGG |
| CG32772* | 2476 | CTCCGAAGACGTGGATCTGATATTC | TCGTCCTGCGACTGTAGGTTCTC |
| CG3485 | 993 | GATCTCTGAATGCACGGACTGTGAC | ACTTGTGCTATTGAAACCAGCAG |
| CG40351* | 5846 | AGACCCGTCCTATCCAACAATAAC | CCTGGCATTAAACCATCGTAAC |
| CG4318 | 699 | ATCCTTTTCCCAAGACCATTTGC | CCTTCGGTAGAACCGGGAAGC |
| CG4328 | 1593 | GCCGGAGTAGCCCACAATGAC | TTTGGTACTCGCACTCCTTCC |
| CG4374 | 2577 | CGAGTTTTGGCACCAGGACAAG | TAGCCCTGTAACGTGGGATCG |
| CG4565 | 672 | AATTCTGTTCTTTTGAACCCCTGTC | GTACTCGTCCGCCAAGAATTTG |
| CG4575 | 285 | GAAGAATATGGAGGCCTTTCAAAC | CGTATCGCGCCCACTTTGACG |
| CG4676 | 1008 | CTTTGCAGCACTGTCTCAGTTGG | CGATTTTTGGCTCTGCTGCTAAC |
| CG4956 | 858 | AGTTGCATTGGCCATAAAAATCAG | GGCAAAGAAGGTGCAGTGGTG |
| CG5245 | 1506 | AGTTAAAGGCGTCCCGTCGAAGC | GGCTCCAAATCTTTGCCATAACTAC |
| CG5369 | 846 | TATCCGAGAGCAATACGATCCTC | CGGATAGGTCGGTAATGTCTATGTC |
| CG6118 | 2832 | TCGAAGATCATCAAGAAGTGGAAC | CAGTGGAAGCGGCACATTGAG |
| CG7056 | 819 | CGCTGCGCTTCAATCCCATCTAC | AAGTGGGAGCCAGGACAGCAC |
| CG7368 | 1593 | TCCGCTGGTTTCCACCGTGAC | GCTGTGCTCGTCGTGAATTGC |
| CG7691 | 852 | ATTGGCGACAGGGCGACGATATAC | CGCCGCTCGGTTCACTTTGAC |
| CG7786 | 579 | CCATTCCCCAGAATCTTCGGCTAAC | AAATGGAGTAGGACGGATTGTTC |
| CG7963 | 966 | CCTTGCAGCCAATGTATTTATGC | TACACAGGTCCACTTGCATCTCC |
| CG8089 | 1875 | GTGCTGTCGAGGATTCAGGGAGAAG | TTGCACTTGCAGTGGCAGGAAG |
| CG8117 | 489 | GCGACTTGAACGGCTGCAAGG | CGTAAATTGCGTCCTCCAGTTTAG |
| CG9571 | 783 | TCCATCCGATCGCTGCTCTCC | GAAGCTGGACTGGAAGATTGGTG |
| CG9793 | 1041 | TTGGAAGTCCAAAGGGAGTTGCTC | ACAGCGTTCCCTAAAGGATATGG |
| CG9895 | 1233 | TACAGGTTAAATGGATCACGACTGC | CTCTGCGCTCCAGACGACGAC |
ᵃ Genes represented in our EST collection are indicated by asterisks.
ᵇ Release 4.1 annotated transcript lengths are reported in nucleotides. For genes with multiple annotated transcripts, the length of the longest transcript is reported.
ᶜ Rpb4 and Ada2A were separate gene annotations in Release 3.1 but are merged into one in Release 4.1.
Summary of cDNA clones recovered
| Classification | Clone count |
|---|---|
| ORF identical to gene annotation | 43 |
| ORF alters gene annotation | |
| 5′ extension | 10 |
| 3′ extension | 3 |
| 5′ short with upstream in-frame stop codon | 2 |
| Exon variant | 12 |
| Dicistronic | 1 |
| Gene merge | 1 |
| Subtotal: high-quality, full-length cDNAs | 72 |
| Compromised clones: | |
| Nucleotide discrepancy | 6 |
| Short | |
| 5′ short | 1 |
| 3′ short | 5 |
| 5′ and 3′ short | 1 |
| Co-ligated insert | 7 |
| Antisense transcript | 1 |
| Genomic contaminant | 4 |
| Retained intron | 3 |
| SLIP artifact | 4 |
| Subtotal: compromised cDNAs | 32 |
| Gene-specific clones recovered | 104 |
ᵃ One cDNA clone was selected per target gene.
ᵇ The clone encodes a protein that is identical to the corresponding Release 4.1 annotation.
ᶜ These clones encode proteins that differ from their corresponding annotations. ‘5′ extension’ and ‘3′ extension’ clones encode additional N-terminal and C-terminal residues, respectively, relative to the annotation. ‘5′ short with upstream in-frame stop codon’ clones encode full-length ORFs that lack sequence encoding N-terminal residues present in the annotation; they may represent alternatively spliced products. ‘Exon variant’ clones differ from the annotation at internal positions in the CDS and represent alternatively spliced products.
ᵈ The sequences of these clones contain nucleotide differences, most likely errors introduced by reverse transcriptase during library construction, that cause a missense or frameshift change in the ORF relative to the annotated CDS.
ᵉ These clones lack sequence encoding the N-terminal portion of the annotated protein (‘5′ short’), the C-terminal portion (‘3′ short’), or both (‘5′ and 3′ short’).
ᶠ These clones contain sequences from two unrelated genes and almost certainly result from two cDNA molecules being cloned into the same plasmid vector during library construction. In three such cases, the clones encode proteins that are identical to the targeted annotation.
ᵍ The sequence of the clone overlaps the annotated gene model but is transcribed from the opposite strand.
ʰ These clones lack a poly(A) tail; they are genomic clones that contaminate the cDNA libraries.
ⁱ These clones are polyadenylated and retain unprocessed intron sequence.
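The subtotals in this table can be checked with simple arithmetic. The sketch below transcribes the clone counts and, taking the 153 targeted genes from the abstract, expresses the subtotals as recovery rates; the percentages are derived here and are not reported in the source.

```python
HIGH_QUALITY = {
    "ORF identical to gene annotation": 43,
    "5' extension": 10,
    "3' extension": 3,
    "5' short with upstream in-frame stop codon": 2,
    "Exon variant": 12,
    "Dicistronic": 1,
    "Gene merge": 1,
}
COMPROMISED = {
    "Nucleotide discrepancy": 6,
    "5' short": 1,
    "3' short": 5,
    "5' and 3' short": 1,
    "Co-ligated insert": 7,
    "Antisense transcript": 1,
    "Genomic contaminant": 4,
    "Retained intron": 3,
    "SLIP artifact": 4,
}
GENES_SCREENED = 153  # number of target genes stated in the abstract

high = sum(HIGH_QUALITY.values())
compromised = sum(COMPROMISED.values())
total = high + compromised

print(high, compromised, total)                               # 72 32 104
print(f"high-quality: {100 * high / GENES_SCREENED:.1f}%")    # high-quality: 47.1%
print(f"any clone: {100 * total / GENES_SCREENED:.1f}%")      # any clone: 68.0%
```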
Figure 2. cDNA sequences improve gene annotations. (A) Comparison of cDNA IP01040 to the targeted Release 4.1 annotated gene model CG10604. Exons (filled boxes), introns (connecting lines), start codons (green) and stop codons (red) are indicated. The positions of the PCR primers used in the SLIP screening experiment are shown (arrows not to scale). The cDNA subsumes the gene model and extends beyond it by 829 bases at the 5′ end, including 5′-UTR sequence and sequence encoding an additional 142 N-terminal amino acids, and by 344 bases at the 3′ end. (B) Comparison of cDNA IP01192 to the three corresponding gene models CG14781, CG15782 and CG15783. The positions of the PCR primers within the target gene model CG15782 are indicated. The cDNA shows that the three annotated gene models are parts of a single gene with one long ORF.
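The two comparisons in this figure, an ORF that starts upstream of the annotated start codon (A) and an ORF that spans several annotated gene models (B), can be expressed as a rough coordinate test. The sketch below is a simplified, hypothetical classifier, not the authors' procedure: it assumes plus-strand genes, half-open genomic coordinates for the annotated CDS and the cDNA ORF, and a list of neighbouring annotated CDS intervals. It compares only CDS boundaries, so UTR extensions such as the 344 extra 3′ bases in panel A do not change the class, and exon-level differences (exon variants) are not detected. Identical boundaries do not prove an identical protein, so that case is labeled "same CDS boundaries" rather than "match".

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open genomic coordinates on the plus strand


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def classify_orf(orf: Interval, target: Interval, neighbours: List[Interval]) -> str:
    """Rough, boundary-only classification of a cDNA ORF against its target CDS."""
    if any(overlaps(orf, other) for other in neighbours):
        return "gene merge"
    start_diff = target[0] - orf[0]  # > 0: ORF begins upstream of the annotated start
    end_diff = orf[1] - target[1]    # > 0: ORF ends downstream of the annotated stop
    if start_diff == 0 and end_diff == 0:
        return "same CDS boundaries"
    if start_diff > 0 and end_diff == 0:
        return "5' extension"
    if start_diff == 0 and end_diff > 0:
        return "3' extension"
    if start_diff < 0 and end_diff == 0:
        return "5' short"
    if start_diff == 0 and end_diff < 0:
        return "3' short"
    return "other"


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to mimic panels A and B.
    print(classify_orf((100, 1300), (500, 1300), []))                        # 5' extension
    print(classify_orf((0, 3000), (1000, 1500), [(0, 900), (1600, 3000)]))  # gene merge
```

For minus-strand genes the start and end comparisons would have to be mirrored.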