Andreas Rolfs, Yanhui Hu, Lars Ebert, Dietmar Hoffmann, Dongmei Zuo, Niro Ramachandran, Jacob Raphael, Fontina Kelley, Seamus McCarron, Daniel A Jepson, Binghua Shen, Munira M A Baqui, Joseph Pearlberg, Elena Taycher, Craig DeLoughery, Andreas Hoerlein, Bernhard Korn, Joshua LaBaer.
Abstract
We report the production and availability of over 7000 fully sequence-verified plasmid ORF clones representing over 3400 unique human genes. These ORF clones were derived using the human MGC collection as template and were produced in two formats: with and without stop codons. Thus, this collection supports the production of either native protein or proteins with fusion tags added to either or both ends. The template clones used to generate this collection were enriched in three ways. First, gene redundancy was removed. Second, clones were selected to represent the best available GenBank reference sequence. Finally, a literature-based software tool was used to evaluate the list of target genes to ensure that it broadly reflected biomedical research interests. The target gene list was compared with 4000 human diseases and over 8500 biological and chemical MeSH classes in approximately 15 million publications recorded in PubMed at the time of analysis. This analysis revealed that, relative to the genome and the MGC collection, this collection is enriched for genes with published associations with a wide range of diseases and biomedical terms, without displaying a particular bias towards any single disease or concept. Thus, this collection is likely to be a powerful resource for researchers who wish to study protein function in a set of genes with documented biomedical significance.
Year: 2008 PMID: 18231609 PMCID: PMC2211400 DOI: 10.1371/journal.pone.0001528
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Genes associated with Disease Classes (MeSH) in Publications.
The clone target list (HFLEX7000) was compared with all human genes (EntrezGene, 2004) and all genes represented by MGC (2004) with respect to published relationships of the genes to human diseases. The targeted genes are distributed across disease classes in proportions similar to those of the other gene lists, but show a general enrichment for disease-related genes (Table 1; Supplementary Table S2).
Table 1. MeSH Term Analysis for Gene and Disease Associations in Publications; Examples
| Disease MeSH Term | Genome (#) | Genome (%) | MGC (#) | MGC (%) | HFLEX7000 (#) | HFLEX7000 (%) | MGC/Genome | HFLEX7000/Genome | HFLEX7000/MGC |
| Neoplasms | 7614 | 23.0 | 3904 | 40.0 | 1884 | 56.4 | 1.74 | 2.45 | 1.41 |
| Pathological Conditions, Signs and Symptoms | 7464 | 22.5 | 3631 | 37.2 | 1775 | 53.2 | 1.65 | 2.36 | 1.43 |
| Nervous System Diseases | 5600 | 16.9 | 2753 | 28.2 | 1366 | 40.9 | 1.67 | 2.42 | 1.45 |
| Congenital, Hereditary, and Neonatal Diseases and Abnormalities | 5269 | 15.9 | 2599 | 26.6 | 1287 | 38.5 | 1.67 | 2.42 | 1.4 |
| Digestive System Diseases | 4710 | 14.2 | 2465 | 25.2 | 1253 | 37.5 | 1.77 | 2.64 | 1.49 |
| Immune System Diseases | 4583 | 13.8 | 2302 | 23.6 | 1170 | 35.0 | 1.70 | 2.53 | 1.49 |
| Skin and Connective Tissue Diseases | 4386 | 13.2 | 2256 | 23.1 | 1192 | 35.7 | 1.74 | 2.70 | 1.55 |
| Urologic and Male Genital Diseases | 4025 | 12.2 | 2103 | 21.5 | 1087 | 32.6 | 1.77 | 2.68 | 1.51 |
| Endocrine System Diseases | 4016 | 12.1 | 2060 | 21.1 | 1047 | 31.4 | 1.74 | 2.59 | 1.49 |
| Cardiovascular Diseases | 3969 | 12.0 | 2034 | 20.8 | 1031 | 30.9 | 1.74 | 2.58 | 1.48 |
| Hemic and Lymphatic Diseases | 3956 | 11.9 | 2049 | 21.0 | 1026 | 30.7 | 1.76 | 2.57 | 1.47 |
| Female Genital Diseases and Pregnancy Complications | 3826 | 11.6 | 2011 | 20.6 | 1028 | 30.8 | 1.78 | 2.66 | 1.50 |
| Nutritional and Metabolic Diseases | 3670 | 11.1 | 1897 | 19.4 | 955 | 28.6 | 1.75 | 2.58 | 1.47 |
| Respiratory Tract Diseases | 3524 | 10.6 | 1808 | 18.5 | 944 | 28.3 | 1.74 | 2.66 | 1.53 |
| Musculoskeletal Diseases | 3518 | 10.6 | 1751 | 17.9 | 913 | 27.3 | 1.69 | 2.57 | 1.53 |
| Disorders of Environmental Origin | 3466 | 10.5 | 1809 | 18.5 | 935 | 28.0 | 1.77 | 2.68 | 1.51 |
| Virus Diseases | 2963 | 8.9 | 1527 | 15.6 | 804 | 24.1 | 1.75 | 2.69 | 1.54 |
| Mental Disorders | 2903 | 8.8 | 1451 | 14.8 | 785 | 23.5 | 1.69 | 2.68 | 1.58 |
| Bacterial Infections and Mycoses | 2817 | 8.5 | 1441 | 14.7 | 754 | 22.6 | 1.73 | 2.65 | 1.53 |
| Eye Diseases | 2700 | 8.2 | 1318 | 13.5 | 709 | 21.2 | 1.65 | 2.60 | 1.57 |
| Stomatognathic Diseases | 2101 | 6.3 | 1069 | 10.9 | 583 | 17.5 | 1.72 | 2.75 | 1.60 |
| Otorhinolaryngologic Diseases | 1882 | 5.7 | 941 | 9.6 | 503 | 15.1 | 1.69 | 2.65 | 1.56 |
| Parasitic Diseases | 1739 | 5.3 | 918 | 9.4 | 493 | 14.8 | 1.79 | 2.81 | 1.57 |
Examples of MedGene analysis of disease term associations with genes in PubMed, using all human genes (2004), unique genes in MGC (2004), or HFLEX7000 (targets). For each disease class, the number and percentage of genes associated with it are shown. The ratios of MeSH term associations in MGC or HFLEX7000 relative to the genome, and in HFLEX7000 relative to MGC, indicate whether MGC or HFLEX7000 is biased towards specific MeSH terms.
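The ratio columns of the disease table can be reproduced from its percentage columns. The following is a minimal sketch, not the MedGene tool itself: the function name `enrichment` and the choice of rows are illustrative assumptions, and the percentages are copied from the table. Because the published percentages are rounded to one decimal, recomputed ratios for classes with small percentages may differ from the table in the last digit.

```python
# Percentages copied from three rows of the disease MeSH table:
# (genome %, MGC %, HFLEX7000 %)
rows = {
    "Neoplasms": (23.0, 40.0, 56.4),
    "Pathological Conditions, Signs and Symptoms": (22.5, 37.2, 53.2),
    "Nervous System Diseases": (16.9, 28.2, 40.9),
}


def enrichment(genome_pct, mgc_pct, hflex_pct):
    """Return (MGC/Genome, HFLEX7000/Genome, HFLEX7000/MGC), rounded to 2 decimals."""
    return (
        round(mgc_pct / genome_pct, 2),
        round(hflex_pct / genome_pct, 2),
        round(hflex_pct / mgc_pct, 2),
    )


# Hypothetical helper output: one tuple of ratios per disease class.
ratios = {term: enrichment(*pcts) for term, pcts in rows.items()}
for term, r in ratios.items():
    print(term, r)
```

A ratio above 1 in the HFLEX7000/MGC column means that a larger share of the target list than of MGC has a published association with that disease class.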
Figure 2. Genes associated with Biological MeSH Terms in Publications.
The clone target list (HFLEX7000) was compared with all human genes (EntrezGene, 2004) and all genes represented by MGC (2004) with respect to published relationships of the genes to all biological MeSH terms and MeSH nodes (34). The targeted genes are distributed across MeSH terms in proportions similar to those of the other gene lists, but show a general enrichment for genes related to MeSH terms (Table 2; Supplementary Table S3).
Table 2. Biological and Chemical MeSH Class Analysis for Genes Associated in Publications with MeSH Class; Examples
| Biological/Chemical MeSH Class | Genome (#) | Genome (%) | MGC (#) | MGC (%) | HFLEX7000 (#) | HFLEX7000 (%) | MGC/Genome | HFLEX7000/MGC | HFLEX7000/Genome |
| Transcription, Genetic | 6625 | 20.01 | 3457 | 35.38 | 1707 | 51.12 | 1.77 | 1.45 | 2.56 |
| Recombinant Proteins | 6454 | 19.49 | 3431 | 35.11 | 1697 | 50.82 | 1.80 | 1.45 | 2.61 |
| Cell Survival | 2912 | 8.79 | 1567 | 16.04 | 824 | 24.68 | 1.82 | 1.54 | 2.81 |
| Subcellular Fractions | 2753 | 8.31 | 1590 | 16.27 | 785 | 23.51 | 1.96 | 1.44 | 2.83 |
| Cyclic AMP | 2429 | 7.34 | 1271 | 13.01 | 670 | 20.07 | 1.77 | 1.54 | 2.74 |
| Protein Kinases | 2391 | 7.22 | 1307 | 13.37 | 668 | 20.01 | 1.85 | 1.50 | 2.77 |
| Insulin | 2212 | 6.68 | 1157 | 11.84 | 613 | 18.36 | 1.77 | 1.55 | 2.75 |
| Mitosis | 1917 | 5.79 | 1073 | 10.98 | 554 | 16.59 | 1.90 | 1.51 | 2.87 |
| RNA Splicing | 1950 | 5.89 | 1075 | 11.00 | 553 | 16.56 | 1.87 | 1.51 | 2.81 |
| DNA Replication | 1988 | 6.00 | 1071 | 10.96 | 540 | 16.17 | 1.83 | 1.48 | 2.69 |
| Antigens, Neoplasm | 1786 | 5.39 | 930 | 9.52 | 509 | 15.24 | 1.76 | 1.60 | 2.83 |
| Drug Resistance | 1645 | 4.97 | 910 | 9.31 | 486 | 14.56 | 1.87 | 1.56 | 2.93 |
| Drug Interactions | 1642 | 4.96 | 863 | 8.83 | 457 | 13.69 | 1.78 | 1.55 | 2.76 |
| Cell Membrane Permeability | 1195 | 3.61 | 669 | 6.85 | 365 | 10.93 | 1.90 | 1.60 | 3.03 |
| Staurosporine | 1012 | 3.06 | 570 | 5.83 | 331 | 9.91 | 1.91 | 1.70 | 3.24 |
| Synapses | 1128 | 3.41 | 581 | 5.95 | 312 | 9.34 | 1.75 | 1.57 | 2.74 |
| Lipoproteins | 1024 | 3.09 | 554 | 5.67 | 305 | 9.13 | 1.83 | 1.61 | 2.95 |
| Genes, Lethal | 1083 | 3.27 | 589 | 6.03 | 300 | 8.98 | 1.84 | 1.49 | 2.75 |
| Cell Extracts | 943 | 2.85 | 563 | 5.76 | 295 | 8.83 | 2.02 | 1.53 | 3.10 |
| Adenosine | 957 | 2.89 | 527 | 5.39 | 281 | 8.42 | 1.87 | 1.56 | 2.91 |
| Hormones | 976 | 2.95 | 499 | 5.11 | 271 | 8.12 | 1.73 | 1.59 | 2.75 |
| Anti-Inflammatory Agents | 891 | 2.69 | 473 | 4.84 | 270 | 8.09 | 1.80 | 1.67 | 3.01 |
| Peptide Library | 718 | 2.17 | 419 | 4.29 | 245 | 7.34 | 1.98 | 1.71 | 3.38 |
| Microglia | 700 | 2.11 | 406 | 4.15 | 241 | 7.22 | 1.97 | 1.74 | 3.41 |
| Tamoxifen | 776 | 2.34 | 445 | 4.55 | 237 | 7.10 | 1.94 | 1.56 | 3.03 |
| Pain | 812 | 2.45 | 391 | 4.00 | 229 | 6.86 | 1.63 | 1.71 | 2.80 |
| Liver Regeneration | 694 | 2.10 | 419 | 4.29 | 227 | 6.80 | 2.05 | 1.59 | 3.24 |
| Aspirin | 643 | 1.94 | 351 | 3.59 | 194 | 5.81 | 1.85 | 1.62 | 2.99 |
| Nucleosomes | 612 | 1.85 | 359 | 3.67 | 194 | 5.81 | 1.99 | 1.58 | 3.14 |
Examples of BioGene analysis of biological and chemical MeSH class associations with genes in PubMed, using all human genes (2004), unique genes in MGC (2004), or HFLEX7000 (targets). For each class, the number and percentage of genes associated with it are shown (#, %). The ratios of MeSH class associations in MGC or HFLEX7000 relative to the genome, and in HFLEX7000 relative to MGC, indicate whether MGC or HFLEX7000 is biased towards specific MeSH terms.
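One simple way to look for bias towards a single class is to check how tightly the enrichment ratios cluster. The sketch below is an illustrative check under stated assumptions, not the authors' method: it takes the HFLEX7000/Genome ratios of six arbitrarily chosen rows from the table and reports their range. The variable names and the choice of rows are hypothetical.

```python
from statistics import mean

# HFLEX7000/Genome ratios copied from six rows of the MeSH class table.
hflex_vs_genome = {
    "Transcription, Genetic": 2.56,
    "Recombinant Proteins": 2.61,
    "Insulin": 2.75,
    "Pain": 2.80,
    "Mitosis": 2.87,
    "Microglia": 3.41,
}

lowest = min(hflex_vs_genome.values())
highest = max(hflex_vs_genome.values())
average = mean(hflex_vs_genome.values())

# If one class dominated the collection, its ratio would sit far above the rest.
spread = highest / lowest

print(f"min={lowest:.2f} max={highest:.2f} mean={average:.2f} max/min={spread:.2f}")
```

For these rows, every class is enriched by a factor between roughly 2.5 and 3.5, which is consistent with a general enrichment rather than a skew towards one concept.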
Figure 3. Workflow diagram of clone production.
The entire production process, from the design of primers to the production of glycerol stocks, is shown. The process started by identifying MGC clones in the available plates and then creating array files along with matching PCR primer order files that included two primers anchored at the 3′ end, one for each format. The primers were used to amplify the ORFs from the matching MGC clones. PCR products were monitored in agarose gels, and products were purified prior to capture via the In-Fusion reaction. Competent bacterial strains were transformed with the reaction product, and 4 of the resulting colonies per format were isolated robotically and used to prepare 15% glycerol stocks. Prior to sequencing, a single isolate plate of 96 targets was created. As indicated, step-specific results were stored in our LIMS.
Clone Production and Sequencing Summary
| | Type | Phase 1 (1st isolate) | Phase 2 (2nd isolate) | Total |
| ORF Target | CLOSED | 3557 | 277 | 3557 |
| ORF Target | FUSION | 3557 | 327 | 3557 |
| Avg ORF size and range (bp) | ALL | 1268 (99–4,785) | 1491 (171–4,395) | 1268 (99–4,785) |
| Clones for sequence validation | CLOSED | 3535 | 277 | 3812 |
| Clones for sequence validation | FUSION | 3496 | 327 | 3823 |
| Number of reads | ALL | 20672 | 1957 | 22629 |
| Average number of reads per clone | ALL | 2.9 | 3.2 | 2.9 |
| Accepted clones | CLOSED | 3240 (91%) | 223 (81%) | 3463 (91%) |
| Accepted clones | FUSION | 3242 (92%) | 258 (79%) | 3500 (92%) |
| Accepted clones | ALL | 6482 | 481 | 6963 |
| Accepted clone match perfect with reference | ALL | 6226 | 443 | 6669 (95.8%) |
| Accepted clone with silent mutation(s) only | ALL | 94 | 13 | 107 (1.5%) |
| Accepted clone with 1 missense | ALL | 162 | 25 | 187 (2.7%) |
| Rejected clones | ALL | 646 (9%) | 119 (20%) | 765 (10%) |
| Clones rejected for linker changes | ALL | 158 | 43 | 201 (26%) |
| Clones rejected for CDS changes (ins/del/nonsense/mis) | ALL | 187 | 48 | 235 (31%) |
| Clones rejected for no/incomplete/wrong assembly | ALL | 301 | 28 | 329 (43%) |
| Accepted ORFs | CLOSED | 3097 (88%) | 222 (80%) | 3295 (93%) |
| Accepted ORFs | FUSION | 3075 (88%) | 256 (78%) | 3259 (92%) |
| Accepted ORFs | ALL | 3387 | 453 | 3447 (97%) |
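The percentages in the Total column of the clone production table can be checked against its counts. The sketch below is a minimal arithmetic check with counts copied from the table; the helper `pct` and the dictionary names are illustrative assumptions. It reproduces the acceptance rates per format and the breakdowns within accepted and rejected clones.

```python
# Counts copied from the "Total" column of the clone production table.
sequenced = {"CLOSED": 3812, "FUSION": 3823}   # clones for sequence validation
accepted = {"CLOSED": 3463, "FUSION": 3500}    # accepted clones per format

accepted_breakdown = {
    "perfect match with reference": 6669,
    "silent mutation(s) only": 107,
    "one missense": 187,
}
rejected_breakdown = {
    "linker changes": 201,
    "CDS changes": 235,
    "no/incomplete/wrong assembly": 329,
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


total_accepted = sum(accepted_breakdown.values())   # equals 3463 + 3500
total_rejected = sum(rejected_breakdown.values())

acceptance_rate = {fmt: pct(accepted[fmt], sequenced[fmt]) for fmt in sequenced}
accepted_share = {k: pct(v, total_accepted) for k, v in accepted_breakdown.items()}
rejected_share = {k: pct(v, total_rejected) for k, v in rejected_breakdown.items()}

print(acceptance_rate)
print(accepted_share)
print(rejected_share)
```

The accepted-clone categories sum to 6963 and the rejection categories sum to 765, matching the table totals. The percentages reported for those categories are therefore shares of the accepted and rejected clones respectively, not of all sequenced clones.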