Sony Malhotra, Ramanathan Sowdhamini.
Abstract
The interaction of proteins with their respective DNA targets is known to control many high-fidelity cellular processes. A comprehensive survey of sequenced genomes for DNA-binding proteins (DBPs) helps in understanding their distribution and associated functions in a particular genome. The availability of the fully sequenced genome of Arabidopsis thaliana enables a review of the distribution of DBPs in this model plant genome. We used profiles of both structure-based and sequence-based DNA-binding families, derived from the PDB and PFam databases, to perform the survey. This resulted in 4471 proteins identified as DNA-binding in the Arabidopsis genome, distributed across 300 different PFam families. Apart from several plant-specific DNA-binding families, certain RING fingers and leucine zippers also had high representation. Our search protocol helped to assign a DNA-binding property to several proteins that were previously marked as unknown, putative or hypothetical in function. The distribution of Arabidopsis genes having a role in plant DNA repair was particularly studied and noted for functional mapping. The functions observed to be overrepresented in the plant genome include DNA-3-methyladenine glycosylase activity, alkylbase DNA N-glycosylase activity and DNA-(apurinic or apyrimidinic site) lyase activity, suggesting their role in specialized functions such as gene regulation and DNA repair.
Year: 2013 PMID: 23775796 PMCID: PMC3753632 DOI: 10.1093/nar/gkt505
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of the search protocol: three sensitive sequence search methods, namely PSI-BLAST, RPS-BLAST and HMMScan, were used to perform a comprehensive search for DBPs in the Arabidopsis genome.
The number of proteins in the A. thaliana genome that were identified as DNA-binding using three different methods
| Set | Method identifying the protein | Number of proteins | Proteins belonging to same group and family |
|---|---|---|---|
| I | HMMScan, PSI-BLAST, RPS-BLAST | 376 | 376 |
| II | HMMScan, PSI-BLAST | 463 | 463 |
| | HMMScan, RPS-BLAST | 600 | 598 |
| | PSI-BLAST, RPS-BLAST | 686 | 685 |
| III | PSI-BLAST | 271 | 258 |
| | RPS-BLAST | 453 | 290 |
| | HMMScan | 618 | 358 |
The proteins were divided into three sets I, II and III depending on the number of methods that identify them as DNA binding.
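The grouping into sets I, II and III can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `partition_by_support`, the input layout (one set of protein IDs per method) and the toy IDs `P1` to `P5` are all assumptions made for the example.

```python
from collections import defaultdict


def partition_by_support(hits):
    """Group proteins by how many search methods identified them.

    hits: dict mapping a method name to the set of protein IDs it reported.
    Returns a dict with keys "I" (three methods), "II" (two methods) and
    "III" (one method); each value maps protein ID -> frozenset of methods.
    Assumes exactly three methods, as in the table above.
    """
    methods_per_protein = defaultdict(set)
    for method, proteins in hits.items():
        for protein in proteins:
            methods_per_protein[protein].add(method)

    label_for_count = {3: "I", 2: "II", 1: "III"}
    sets = {"I": {}, "II": {}, "III": {}}
    for protein, methods in methods_per_protein.items():
        sets[label_for_count[len(methods)]][protein] = frozenset(methods)
    return sets


# Toy input with hypothetical protein IDs (not taken from the survey).
hits = {
    "HMMScan": {"P1", "P2", "P4"},
    "PSI-BLAST": {"P1", "P2", "P3"},
    "RPS-BLAST": {"P1", "P5"},
}
result = partition_by_support(hits)
```

Keeping the contributing methods alongside each protein makes it possible to split set II and set III by method combination, which is how the rows of the table are organized.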
Figure 2. Distribution of proteins: the proteins in the Arabidopsis genome that were identified as DNA-binding in searches using the structure-based families were studied for their distribution in structural groups and families. The most highly populated group was helix-turn-helix.
Overrepresented DNA-binding PFam families in At-Dbome
| Pfam ID | Pfam Name | Normalized occurrence | Description |
|---|---|---|---|
| PF13724 | DNA_binding_2 | 42.86 | Domain often found on ovate proteins, plant Ku70-interacting proteins involved in DNA double-strand break repair |
| PF08744 | NOZZLE | 25.89 | NOZZLE is a transcription factor that plays a role in patterning the proximal–distal and adaxial–abaxial axes |
| PF04689 | S1FA | 24.53 | S1FA is a DBP found in plants that specifically recognizes the negative promoter element S1F |
| PF04618 | HD-ZIP_N | 23.90 | Homeodomain leucine zipper (HDZip) genes encode putative transcription factors that are unique to plants. |
| PF02362 | B3 | 20.47 | The B3 DNA-binding domain (DBD) is a highly conserved domain found exclusively in transcription factors from higher plants |
| PF02365 | NAM | 19.24 | NAM transcription factors are plant development proteins. |
| PF00097 | zf-C3HC4 | 19.10 | Zinc finger |
| PF06217 | GAGA_bind | 19.08 | This family includes gbp, a protein from soybean that binds to GAGA element dinucleotide repeat DNA |
| PF04640 | PLATZ | 16.72 | Plant AT-rich sequence and zinc-binding proteins (PLATZ) are zinc-dependent DBPs. They bind to AT-rich sequences and function in transcriptional repression |
| PF02701 | zf-Dof | 16.67 | Zinc finger found in several DBPs of higher plants |
| PF07716 | bZIP_2 | 16.36 | Basic leucine zipper |
| PF06200 | tify | 15.58 | The tify domain is a 36-amino acid domain found only among Embryophyta (land plants). It is found in a variety of plant transcription factors that contain GATA domains |
| PF02183 | HALZ | 15.30 | Plant-specific leucine zipper that is always found associated with a homeobox |
| PF13921 | Myb_DNA-bind_6 | 15.07 | MYB like DNA-binding domain |
| PF14215 | bHLH-MYC_N | 13.24 | MYB and MYC family regulate the biosynthesis of phenylpropanoids in several plant species |
| PF03859 | CG-1 | 12.18 | Sequence-specific DBP |
| PF03110 | SBP | 12.08 | Plant-specific transcription factors |
| PF13639 | zf-RING_2 | 11.91 | RING finger domain |
| PF08879 | WRC | 11.76 | WRC is named after the conserved Trp-Arg-Cys motif; it contains two distinctive features: a putative nuclear localization signal and a zinc-finger motif (C3H). It is suggested that WRC functions in DNA binding |
| PF07777 | MFMR | 11.17 | Multifunctional mosaic region |
| PF02309 | AUX_IAA | 11.06 | Plant-specific repressors of auxin-induced gene expression |
| PF02536 | mTERF | 10.53 | Leucine zipper |
DBP families in At-Dbome were analysed for their occurrence in the Arabidopsis genome and in PFam. A normalized ratio of occurrence was calculated for each DNA-binding family in At-Dbome. This table shows the 22 PFam families (with their descriptions and normalized occurrences) that were observed to be overrepresented in the plant genome.
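The formula behind the normalized occurrence is not given here. One plausible reading, offered as an assumption and not as the authors' stated method, is the percentage of a family's PFam members that are found in Arabidopsis. The sketch below implements only that reading; the function name `normalized_occurrence` and the counts in the example are hypothetical.

```python
def normalized_occurrence(count_in_genome, count_in_pfam):
    """Percentage of a family's PFam members that occur in one genome.

    ASSUMPTION: this definition is a guess at what "normalized occurrence"
    means; the source does not state the formula.
    """
    if count_in_pfam <= 0:
        raise ValueError("count_in_pfam must be positive")
    return 100.0 * count_in_genome / count_in_pfam


# Hypothetical counts: 3 family members in the genome out of 7 in PFam.
example = round(normalized_occurrence(3, 7), 2)
```

Under this reading, a family restricted to plants would score high simply because Arabidopsis supplies a large share of its known members, which would fit the plant-specific families that dominate the table.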
Figure 3. Distribution of DNA repair proteins: the proteins in At-Dbome were further analysed for their involvement in DNA repair processes. The figure shows the subset of proteins with a DNA repair function and their distribution across families. The most populated family was RuvB_N.
Figure 4. Genome-wide survey in four genomes: using a similar search protocol and the sequence-based families, a genome-wide survey was performed in three other genomes, namely C. elegans, D. melanogaster and S. cerevisiae. The numbers of proteins identified as DBPs in A. thaliana, S. cerevisiae, C. elegans and D. melanogaster were 4471, 720, 2028 and 2620, respectively.
Figure 5. Log odds score for GO molecular functions: the proteins in At-Dbome were mapped to their PFam families. GO mapping was performed for these families, and the over- and underrepresented functions in the Arabidopsis genome were studied. The functions observed to be overrepresented in the plant genome were involved in DNA repair mechanisms.
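The exact log odds definition used for Figure 5 is not stated here. A common generic form, shown only as an illustrative assumption, compares the frequency of a GO term in the set of interest with its frequency in a background set and takes the base-2 logarithm of the ratio. The function name `log_odds`, its parameters and the counts are hypothetical.

```python
import math


def log_odds(count_in_set, total_in_set, count_in_background, total_in_background):
    """Base-2 log ratio of a term's frequency in a set versus a background.

    ASSUMPTION: a generic log odds form, not necessarily the score used in
    the survey. Positive values mean the term is overrepresented in the set,
    negative values mean it is underrepresented.
    """
    if min(count_in_set, count_in_background) <= 0:
        raise ValueError("counts must be positive to take a logarithm")
    if min(total_in_set, total_in_background) <= 0:
        raise ValueError("totals must be positive")
    freq_set = count_in_set / total_in_set
    freq_background = count_in_background / total_in_background
    return math.log2(freq_set / freq_background)


# Hypothetical counts: a term annotating 20 of 100 families in the set
# but only 5 of 100 in the background is overrepresented.
score_over = log_odds(20, 100, 5, 100)
score_under = log_odds(5, 100, 20, 100)
```

A symmetric score of this kind places over- and underrepresented functions on opposite sides of zero, which is convenient for a single bar chart.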