Literature DB >> 29334890

Characterization of the late embryogenesis abundant (LEA) proteins family and their role in drought stress tolerance in upland cotton.

Richard Odongo Magwanga1,2, Pu Lu1, Joy Nyangasi Kirungu1, Hejun Lu1, Xingxing Wang1, Xiaoyan Cai1, Zhongli Zhou1, Zhenmei Zhang1, Haron Salih1, Kunbo Wang3, Fang Liu4.   

Abstract

BACKGROUND: Late embryogenesis abundant (LEA) proteins are large groups of hydrophilic proteins with major role in drought and other abiotic stresses tolerance in plants. In-depth study and characterization of LEA protein families have been carried out in other plants, but not in upland cotton. The main aim of this research work was to characterize the late embryogenesis abundant (LEA) protein families and to carry out gene expression analysis to determine their potential role in drought stress tolerance in upland cotton. Increased cotton production in the face of declining precipitation and availability of fresh water for agriculture use is the focus for breeders, cotton being the backbone of textile industries and a cash crop for many countries globally.
RESULTS: In this work, a total of 242, 136 and 142 LEA genes were identified in G. hirsutum, G. arboreum and G. raimondii respectively. The identified genes were classified into eight groups based on their conserved domain and phylogenetic tree analysis. LEA 2 were the most abundant, this could be attributed to their hydrophobic character. Upland cotton LEA genes have fewer introns and are distributed in all chromosomes. Majority of the duplicated LEA genes were segmental. Syntenic analysis showed that greater percentages of LEA genes are conserved. Segmental gene duplication played a key role in the expansion of LEA genes. Sixty three miRNAs were found to target 89 genes, such as miR164, ghr-miR394 among others. Gene ontology analysis revealed that LEA genes are involved in desiccation and defense responses. Almost all the LEA genes in their promoters contained ABRE, MBS, W-Box and TAC-elements, functionally known to be involved in drought stress and other stress responses. Majority of the LEA genes were involved in secretory pathways. Expression profile analysis indicated that most of the LEA genes were highly expressed in drought tolerant cultivars Gossypium tomentosum as opposed to drought susceptible, G. hirsutum. The tolerant genotypes have a greater ability to modulate genes under drought stress than the more susceptible upland cotton cultivars.
CONCLUSION: The finding provides comprehensive information on LEA genes in upland cotton, G. hirsutum and possible function in plants under drought stress.

Entities:  

Keywords:  Cotton (Gossypium spp); Drought; Gene expression; Gene ontology; Genome; Identification; LEA proteins; miRNAs

Mesh:

Substances:

Year:  2018        PMID: 29334890      PMCID: PMC5769447          DOI: 10.1186/s12863-017-0596-1

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

Drought stress has resulted in to massive losses in crop production and also has altered the natural equilibrium of the environment [1]. To save the ecosystem and enhance production, advanced molecular breeding is the recipe for activation and regulation of specific stress-related genes [2]. Water deficit stress do led to a series of changes including biochemical alterations like accumulation of osmolytes and specific proteins involved in stress tolerance [3]. One of the proteins that play a role in the mechanism of drought resistance is the LEA types of protein known as dehydrin [4]. In cotton production, drought is the main abiotic stress responsible for plant growth compromise and severe yield loss. Even though cotton is considered to be relatively tolerant to water deficit, its optimal growth and yield negatively affected when water supply is limited or interrupted [5]. Water is an essential element for biotic component of the biosphere, such that various responses have evolved to withstand water deficit in all plants and animals, to enable them withstand long periods of water deprivation by adopting a type of life condition known as anhydrobiosis [6]. There is great agronomic significance to understand cotton plant responses to water deficit due to the huge economic losses that results from drought [7]. Cotton metabolism and yield are negatively affected under water deficit conditions, especially at flowering stage [8]. Plants have acquired an evolutionary response to withstand the effect of low water availability, a condition that can disadvantage their growth and development. As immobile organisms, plants possess diverse strategies of responses to drought. Among the molecules highly associated with plant responses to water limitation are the late embryogenesis abundant (LEA) proteins [9]. These proteins are widespread in the plant kingdom and highly enriched during the late stages of embryogenesis and in vegetative tissues in response to water deficit [10]. LEA proteins were first discovered more than 30 years ago and were observed to accumulate at late stages of plant seed development [11]. The LEA proteins have been found in various tissues of abiotic stressed plants and non-plant organisms known to be tolerant to desiccation, such as bacteria and some invertebrates [12]. LEA proteins are members of a large group of hydrophilic, glycine-rich proteins present in a wide range of plant species [13]. This class of proteins are known to be intrinsically disordered in their structures and are mainly expressed under water deprivation condition [14]. The LEA genes are highly diverse, with wide distribution in the plant kingdom and has pivotal role in various stress tolerance responses [15]. Scientific investigations on LEA protein families have been on-going for more than two decades [16]. Although there has been a strong association of LEA protein families with environmental stress tolerance of significance drought and cold stress [17], LEA protein families for most of that time, their function has been entirely obscure [18]. Considerable evidence gives an indication that LEA genes are involved in desiccation, though their precise function is unknown [19]. The bacterial group 1 LEA proteins have the ability to block enzyme inactivation upon freeze–thaw treatments in vitro and it has analogous functions to plant LEA proteins [10]. Therefore, there is need to conduct a genome wide characterization of LEA protein families in cotton. The recent upland cotton genome publications, G. hirsutum [20], G. arboreum [21] and Gossypium raimondii [22], enabled us to carry out the identification and characterization of all cotton LEA genes. In this study, we identified 242, 136 and 142 candidate LEA proteins in G. hirsutum, G. arboreum and G. raimondii respectively, analysed their phylogenetic tree relationships, chromosomal positions, duplicated gene events, gene structure, conserved motif compositions and profiling analysis of gene expression from different cotton plant organs. Our results provides a strong platform for better understanding of the roles and evolutionary history of LEA genes, and will help in future studies of the molecular and biological functions of LEA protein families in cotton.

Methods

Identification of LEA gene families

The conserved LEA protein domains were downloaded from Hidden Markov Model (HMM) (PF 03760, PF03168, PF03242, PF02987, PF0477, PF10714, PF04927 and PF00257. In order to identify the LEA proteins in cotton, the HHM profile of LEA protein was subsequently employed as a query to perform a HMMER search (http://hmmer.janelia.org/) [23] against the G. hirsutum and G. arboreum, which were obtained from cotton genome project (http://www.cgp.genomics.org.cn) and G. raimondii genome downloaded from Phytozome (http://www.Phytozome.net/), with E-value <0.01. All redundant sequences were discarded from further analysis based on cluster W17 alignment results. SMART and PFAM database were used to verify the presence of the LEA gene domains. The isoelectric points and molecular mass of LEA proteins were estimated by ExPASy Server tool (http://web.expasy.org/compute_pi/). In addition, subcellular location prediction of upland cotton, Gossypium hirsutum LEA proteins was conducted using the TargetP1.1 (http://www.cbs.dtu.dk/services/TargetP/) server [24] and Protein Prowler Subcellular Localisation Predictor version 1.2 (http://bioinf.scmb.uq.edu.au/pprowler_webapp_1-2/) [25]. Validation and determination of the possible cell compartmentalization of the LEA protein was done by WoLFPSORT (https://wolfpsort.hgc.jp/) [26].

Chromosomal locations and syntenic analysis

The chromosomal distribution of LEA genes were mapped on cotton chromosomes based on gene position, from up down by Circos-0.69 (http://circos.ca/) [27]. Homologous genes of G. hirsutum, G. raimondii and G. arboreum were identified by BLASTP with threshold >80% in similarity and at least in 80% alignment ratio to their protein total lengths. Default parameters were maintained in all of the steps. Tandem duplications were designated as multiple genes of one family located within the same or neighbouring intergenic region [28]. The Ks/Ka value is an important tool in determining selection pressure acting on the protein coding genes. The genes paralogous pair, which has Ka/Ks, ratio greater than 1, denotes activating evolution under beneficial selection, indicating that at least some of the mutations were advantageous. When the ratio is equal to 1, then the mutation is neutral but if the ratio is less than 1, it implies that the mutation is disadvantageous or under purifying selection [29]. In the estimation of Ks and Ka substitution rate, we used an alignment of multiple nucleotide sequences of homologous genes which code for LEA proteins. In this research, paralogous pairs were aligned using MEGA 6.0. synonymous substitution (Ks) and non-synonymous substitution (Ka) rate were obtained by Dnasp [30].

Phylogenetic analyses, gene structure organization and motif composition of the LEA proteins in cotton

Full-length sequences of G. hirsutum, G. arboreum, G. raimondii, P. tabuliformis and A. thaliana LEA proteins were first aligned using ClustalW on MEGA 6 software [31] then conducted phylogenetic analyses based on protein sequences, with neighbour joining (NJ) method. Support for each node was tested with 1000 bootstrap replicates. The analysis of phylogenetic tree was carried out on upland cotton, G. hirsutum. The gene structures were obtained through comparing the genomic sequences and their predicted coding sequences from the cotton genome project. In addition, MEME (Multiple Expectation Maximization for Motif EliCitation) online program (http:// meme.nbcr.net/meme/cgi-bin/meme.cgi) [32], was used to identify the conserved protein motifs, with maximum number of different motif at 20; the minimum and largest base sequence width of 6 and 50 respectively.

Prediction of miRNAs targeted LEA genes

The miRNA sequences were obtained from miRBase (http://www.mirbase.org/) [33], the Plant miRNA database (http://bioinformatics.cau.edu.cn/PMRD/) [34] and EST database (http://www.ncbi.nlm.nih.gov/ nucest) LEA genes targeted by miRNAs were predicted by searching 5′ and 3´ UTRs and the CDS of all LEA genes for complementary sequences of the cotton miRNAs using the psRNATarget server with default parameters (http://plantgrn.noble.org/psRNATarget/?function=3) [35].

Promoter cis-element analysis

Promoter sequences (2 kb upstream of the translation start site) of all LEA genes were obtained from the cotton genome project (http://cgp.genomics.org.cn/page/species/index.jsp).Transcriptional response elements of LEA genes promoters were predicted using using the PLACE database (http://www.dna.affrc.go.jp/PLACE/signals can.html) [36].

Gene ontology (GO) annotation

The functional grouping of LEA proteins sequences and the analysis of annotation data were executed using Blast2GO PRO software version 4.1.1 (https://www.blast2go.com). Blast2GO annotation associates genes or transcripts with GO terms using hierarchical terms, cellular component (CC), biological process (BP) and molecular function (MF). Genes were described in three categories of the GO classification terms: molecular function, biological processes and cellular components.

Plant materials and treatment

One-month-old cotton seedlings of G. tomentosum-AD3–00 (P0601211), G. hirsutum-CRI-12 (G09091801–2) and their BC2F1 genotypes, with G. tomentosum as the donor and G. hirsutum as the recurrent parent were used to examine the expression patterns of the LEA genes under drought condition. G. tomentosum is drought tolerant genotype while G. hirsutum is drought susceptible genotype. The two upland cotton accessions are perennially grown and maintained by our research group, in Sanya Island, Hainan province, China. Plants were grown in boxes, with dimension of 41 × 41 cm, with a depth of 30 cm and with three biological replications in the greenhouse located at the cotton research institute, Chinese Academy of Agricultural Sciences (CAAS), Anyang, Henan province, China. The greenhouse conditions were set with temperature at 23 ± 1 °C and a 14-h light/10-h dark photoperiod. After one month of growth, watering was totally withdrawn from drought treated seedlings but not in control. The samples for RNA extraction were collected at 0, 7 and 14th day of drought stress exposure, for plants under drought and control. Root, stem and leaf were the main organs of target in this study.

RNA isolation and qRT-PCR verification

RNA extraction kit, EASYspin plus plant RNA kit, obtained from Aid Lab, China was used to extract total RNA from roots, stems and leaves. The quality and concentration of each RNA sample was determined using gel electrophoresis and a NanoDrop 2000 spectrophotometer. Only RNAs which met the criterion 260/280 ratio of 1.8–2.1, 260/230 ratio ≥ 2.0, were used for further analyses and stored at −80 °C. The cotton constitutive Ghactin7 gene, forward “ATCCTCCGTCTTGACCTTG” and reverse sequence “TGTCCGTCAGGCAACTCAT” was used as a reference gene and specific LEA genes primers were used for qRT-PCR validation. The first-strand cDNA synthesis was carried out with TranScript-All-in-One First-Strand cDNA Synthesis SuperMix for qPCR, obtained from TRAN, it was used in accordance with the manufacturer’s instructions. Primer Premier 5 was used to design 43 LEA primers with melting temperatures of 55–60 °C, primer lengths of 18–25 bp, and amplicon lengths of 101–221 bp. Details of the primers are shown in (Additional file 1: Table S1). Fast Start Universal SYBRgreen Master (Rox) (Roche, Mannheim, Germany) was used to perform qRT-PCR in accordance with the manufacturer’s instructions. Reactions were prepared in a total volume of 20 μL, containing 10 μL of SYBR green master mix, 2 μL of cDNA template, 6 μL of ddH2O, and 2 μL of each primer to make a final concentration of 10 μM.

Results

Identification of LEA genes in cotton

The HMM profile of the Pfam LEA domains (PF3760, PF03168, PF03242, PF02987, PF00477, PF10714, PF00257 and PF 04927) were used as the query to identify LEA genes in the cotton genomes. Two hundred and eighty LEA genes were identified in upland cotton, Gossypium hirsutum, one hundred-seventy LEA genes in G. raimondii and one hundred-fifty LEA genes in G. arboreum. All the LEA genes were analyzed manually using the SMART and PFAM database (http://pfam.xfam.org/) to verify the presence of the LEA gene domain. Finally, 242, 136 and 142 candidate LEA proteins were identified in G. hirsutum, G. arboreum and G. raimondii respectively. All identified LEA genes were grouped into eight groups, ranging from LEA 1 to LEA 6, dehydrin and seed maturation protein (SMP). To validate our classification of upland cotton LEA genes, we compared the LEA genes nomenclature with previous identification adopted by Hundertmark and Hincha [12] and Bies-Etheve et al. [37] (Table 1).
Table 1

LEA proteins distribution in upland cotton compared with other plants

LEA genes grouping in this studyPfamHundertmark et al. (2008)Bies-Etheve et al. (2008) Arabidopsis G. hirsutum (AD)G. arboreum (A)G. raimondii (D) Pinus tabuliformis TOTALS
LEA 1PF03760LEA1LEA47944327
LEA 2PF03168LEA2LEA7315785891335
LEA 3PF03242LEA3LEA641667639
LEA 4PF02987LEA4LEA31213871656
LEA 5PF00477LEA5LEA16976331
LEA 6PF10714PvLEA18LEA83424013
SMPPF04927SMPLEA56101617049
DEHYDRINPF00257DehydrinLEA2102488151
TOTALS5124213614230601
LEA proteins distribution in upland cotton compared with other plants The physicochemical parameters of each LEA gene were calculated by using ExPASy, an online tool [38]. Most of the LEA proteins in the same family had similar physicochemical parameters. Cotton LEAs of the LEA 4 contained a greater number of amino acid residues as depicted by their protein lengths (aa), followed closely by the dehydrins (Table 2). Dehydrins have been found to contain high number of amino acid residues from the structural analysis of LEA genes in Brassica napus [39]. Cotton LEA_6 family members all had relatively low molecular masses, ranging from 10.177 to 11.9634 kDa, similar findings was also reported in the analysis of B. napus LEA genes, in which all the LEA 6 genes had lower molecular masses [39]. Approximately two- thirds of the cotton LEA proteins had high isoelectric points Pl ≥ 7.0, including majority of LEA 2 family.
Table 2

LEA gene in upland cotton, Gossypium hirsutum and their sequence characteristics and subcellular location prediction and chromosome position

GENE IDPROTEIN TYPEGENE ANNOTATIONLENGTH (aa)PlMM(aa)Chr NOStartEndPositionSub cellular localization
WolfpsortPprowlerTargetP
CotAD_ 04417DEHYDRINDEHYDRIN-4986.3310,753.6At_chr082,270,8632,271,159296nuclother_
CotAD_ 07367DEHYDRINDEHYDRIN-2414316.31160,755.92scaffold72.12,201,8742,209,4557581plasother_
CotAD_ 08352DEHYDRINDEHYDRIN-231605.4517,797.83scaffold190.1910,057911,6471590nuclother_
CotAD_ 10,502DEHYDRINDEHYDRIN-82358.1826,216.65Dt01_chr152,276,2302,277,176946nuclother_
CotAD_ 11,398DEHYDRINDEHYDRIN-13516.525513.09Dt06_chr25776,882777,140258nuclotherS
CotAD_ 13,947DEHYDRINDEHYDRIN-194494.9249,643.3Dt13_chr181,193,1271,196,4733346nuclother_
CotAD_ 15,928DEHYDRINDEHYDRIN-181807.9819,341.26Dt12_chr26638,305639,4451140nuclother_
CotAD_ 16,331DEHYDRINDEHYDRIN-151285.4614,436.09Dt08_chr24877,320877,836516mitoother_
CotAD_ 19,173DEHYDRINDEHYDRIN-51807.9819,330.23At_chr12883,350884,5121162cytoother_
CotAD_ 22,357DEHYDRINDEHYDRIN-21975.4822,224.51At_chr02824,855825,544689chloother_
CotAD_ 27,143DEHYDRINDEHYDRIN-141359.4914,720.04Dt07_chr16115,939116,435496chloother_
CotAD_ 29,610DEHYDRINDEHYDRIN-161725.8919,224.36Dt08_chr24936,161936,813652cytoother_
CotAD_ 31,255DEHYDRINDEHYDRIN-101995.522,423.72Dt02_chr14711,572712,266694chloother_
CotAD_ 35,513DEHYDRINDEHYDRIN-121614.7917,827.6Dt05_chr19393,056394,6851629chloother_
CotAD_ 42,408DEHYDRINDEHYDRIN-203449.0137,987.42Dt13_ch18453,295454,3291034chloother_
CotAD_ 46,550DEHYDRINDEHYDRIN-11359.3314,717.04At_chr0169,39669,892496cytoother_
CotAD_ 50,983DEHYDRINDEHYDRIN-92438.8226,199.73Dt02_chr14297,459298,432973cytoother_
CotAD_ 53,264DEHYDRINDEHYDRIN-222115.0423,789.31scaffold1899.195,85196,576725cytoother_
CotAD_ 57,587DEHYDRINDEHYDRIN-172115.2223,654.28Dt09_chr23187,778188,503725cytoother_
CotAD_ 64,203DEHYDRINDEHYDRIN-61786.4819,521.14At_chr13125,424126,358934chloother_
CotAD_ 65,889DEHYDRINDEHYDRIN-116085.6267,130.02Dt03_chr17396,530401,9885458cytoother_
CotAD_ 70,948DEHYDRINDEHYDRIN-213328.4637,102.53Dt13_chr1841,71442,8801166chloother_
CotAD_ 75,267DEHYDRINDEHYDRIN-73328.2237,096.42At_chr13210,365211,5311166nuclother_
CotAD_ 75,537DEHYDRINDEHYDRIN-35335.3958,793.08At_chr03254,566259,5194953plasother_
CotAD_ 16,594LEA1LEA 1–71156.913,315.99Dt13_chr181,203,6591,204,098439mitoother_
CotAD_ 16,595LEA1LEA 1–81156.913,315.99Dt13_chr181,204,7161,205,155439cytoother_
CotAD_ 17,186LEA1LEA 1–11655.8117,459.58At_chr061,273,7091,274,303594plasother_
CotAD_ 20,491LEA1LEA 1–61656.0817,343.5Dt06_chr25298,147298,732585cytoother_
CotAD_ 30,219LEA1LEA 1–21136.312,106.42Dt02_chr1423,53624,035499chloother_
CotAD_ 31,140LEA1LEA 1–31649.316,871.44Dt02_chr1428,05328,616563cytoother_
CotAD_ 48,976LEA1LEA 1–91168.0113,437.07scaffold842.1284,300284,742442chloother_
CotAD_ 51,667LEA1LEA 1–41649.3316,897.52Dt02_chr14443,597444,162565nuclother_
CotAD_ 52,203LEA1LEA 1–54209.145,990.36Dt02_chr14444,540450,3455805cytoother_
CotAD_ 00275LEA2LEA 2–9827410.0929,834.66Dt09_chr232,049,1642,049,988824chlootherM
CotAD_ 00465LEA2LEA 2–1013049.5933,689.28Dt09_chr233,367,7093,368,9361227chlootherM
CotAD_ 00799LEA2LEA 2–1543378.9638,982.02scaffold26.12,048,6052,051,8043199golgotherM
CotAD_ 00808LEA2LEA 2–15522610.0826,011.22scaffold26.12,100,1302,100,810680cytootherM
CotAD_ 01033LEA2LEA 2–1052029.0622,587.14Dt10_chr201,010,9841,011,592608chlootherM
CotAD_ 01298LEA2LEA 2–10721810.2224,021.4Dt10_chr205,288,4145,289,070656cytoother_
CotAD_ 01321LEA2LEA 2–1082389.5426,020.28Dt10_chr205,756,5145,757,230716cytoSPM
CotAD_ 01385LEA2LEA 2–89247727,497.03Dt09_chr23159,994161,6901696cytoother_
CotAD_ 01700LEA2LEA 2–1002609.3628,399.83Dt09_chr232,815,7792,816,561782cytoother_
CotAD_ 02652LEA2LEA 2–9721210.0523,764.43Dt09_chr232,032,5082,033,146638mitoother_
CotAD_ 03037LEA2LEA 2–632629.0528,472.57Dt05_chr191,838,0691,841,1453076cytoother_
CotAD_ 03649LEA2LEA 2–343209.8235,345.6At_chr091,775,7191,776,8441125cytoother_
CotAD_ 03784LEA2LEA 2–751166.8213,537.66Dt07_chr16548,573548,923350chloother_
CotAD_ 05724LEA2LEA 2–3219710.0522,442.51At_chr091,755,5471,756,140593chloother/SP_
CotAD_ 05725LEA2LEA 2–332389.7327,552.78At_chr091,759,5121,760,228716nuclSP_
CotAD_ 06037LEA2LEA 2–11520510.0722,125.81Dt13_ch1890,45591,072617chloSP_
CotAD_ 07087LEA2LEA2–32069.7522,853.64At_chr022,101,7302,102,350620plasother_
CotAD_ 08181LEA2LEA 2–992028.6122,460.02Dt09_chr232,228,5672,229,175608cytoSP_
CotAD_ 08350LEA2LEA 2–1521985.0222,266.98scaffold190.1904,483905,252769chloSP_
CotAD_ 08837LEA2LEA 2–1252458.7726,376.34scaffold280.151,95253,9592007golgother_
CotAD_ 09578LEA2LEA 2–302609.3928,406.84At_chr091,173,1181,173,900782chloother_
CotAD_ 09685LEA2LEA 2–9325110.0727,153.8Dt09_chr23686,730687,485755chloSPC
CotAD_ 09732LEA2LEA 2–962329.4425,906.5Dt09_chr231,174,9941,175,943949chloother_
CotAD_ 10,376LEA2LEA 2–482779.9230,152.74Dt01_chr15100,033100,866833chloSPM
CotAD_ 11,658LEA2LEA 2–842639.8229,835.19Dt08_chr24199,067199,858791cytoSPM
CotAD_ 11,875LEA2LEA 2–1471756.9520,070.28scaffold42.1647,003647,530527chloSPM
CotAD_ 11,876LEA2LEA 2–14820910.0123,563.32scaffold42.1677,153677,782629chloSP_
CotAD_ 11,878LEA2LEA 2–1492269.4925,841.73scaffold42.1688,643689,323680chloSP_
CotAD_ 11,879LEA2LEA 2–1501299.4515,037.05scaffold42.1690,007690,396389chloSP_
CotAD_ 12,375LEA2LEA 2–251908.5921,328.78At_chr09106,696107,358662chloSP_
CotAD_ 13,115LEA2LEA 2–861929.4220,770.35Dt08_chr241,183,0521,183,630578extrSP_
CotAD_ 13,584LEA2LEA 2–672509.9628,048.83Dt06_chr25364,402365,154752golgSP_
CotAD_ 13,827LEA2LEA 2–1143607.8740,945.87Dt12_chr26971,427972,7481321E.R.mTPC
CotAD_ 14,147LEA2LEA 2–162129.9423,855.54At_chr07774,151774,789638mitoSPS
CotAD_ 15,892LEA2LEA 2–1123077.734,741.21Dt12_chr26406,661409,1532492chlomTP_
CotAD_ 16,731LEA2LEA 2–9425810.0128,519.44Dt09_chr23724,944725,720776chloother_
CotAD_ 17,044LEA2LEA 2–171514.8416,422.87At_chr07972,098972,634536cytoSPS
CotAD_ 17,045LEA2LEA 2–182199.7923,930.18At_chr07992,750993,409659cytoSP_
CotAD_ 17,062LEA2LEA 2–192449.7827,393.16At_chr071,176,1611,176,895734chlomTPC
CotAD_ 17,101LEA2LEA 2–92229.2625,294.09At_chr0694,60195,269668mitoSPS
CotAD_ 17,102LEA2LEA 2–1020910.3523,661.48At_chr06122,145122,774629nuclSP_
CotAD_ 17,103LEA2LEA 2–112656.730,299.29At_chr06134,827135,702875mitoSPC
CotAD_ 17,649LEA2LEA 2–372358.526,726.9At_chr10359,189361,4402251chloSP_
CotAD_ 18,210LEA2LEA 2–14120310.1722,501.33scaffold377.1414,366414,977611cytoSPC
CotAD_ 18,233LEA2LEA 2–14520310.0822,406.26scaffold377.1560,351560,962611chloSP_
CotAD_ 18,546LEA2LEA 2–951739.9119,695.85Dt09_chr23893,109893,715606chlomTPM
CotAD_ 18,729LEA2LEA 2–1422779.9230,227.97scaffold336.1433,013433,846833chloSP_
CotAD_ 19,078LEA2LEA 2–422169.8324,007.7At_chr1218,10318,753650nuclother_
CotAD_ 19,107LEA2LEA 2–431839.0420,031.24At_chr12312,977313,528551chloSPC
CotAD_ 19,205LEA2LEA 2–462976.8333,395.7At_chr121,142,9541,145,4752521chloSPS
CotAD_ 19,213LEA2LEA 2–381009.6411,538.35At_chr10410,401410,703302chlomTP_
CotAD_ 19,214LEA2LEA 2–391819.3220,628.72At_chr10411,491412,036545nuclSPS
CotAD_ 19,375LEA2LEA 2–1112258.5725,956.2Dt11_chr211,023,1651,023,842677golgmTPC
CotAD_ 20,020LEA2LEA 2–132509.8927,947.68At_chr06997,148997,900752mitoSP_
CotAD_ 20,308LEA2LEA 2–721919.5621,054.44Dt06_chr251,390,0371,390,612575chlocTPC
CotAD_ 21,731LEA2LEA 2–622449.8327,381.21Dt05_chr191,524,6331,525,367734nuclother_
CotAD_ 21,924LEA2LEA 2–11026210.1628,411.4Dt11_chr21855,130855,918788nuclSPS
CotAD_ 23,646LEA2LEA 2–742049.8121,921.93Dt07_chr1634,22734,841614nuclSP_
CotAD_ 24,019LEA2LEA 2–7120310.0422,391.06Dt06_chr25652,496653,107611mitoSPS
CotAD_ 24,497LEA2LEA 2–1062638.6429,247.79Dt10_chr201,715,9591,719,0933134chloother_
CotAD_ 24,499LEA2LEA 2–1381757.6620,026.25scaffold238.1343,833344,360527chloSP_
CotAD_ 25,271LEA2LEA 2–13920910.0123,559.33scaffold238.1356,524357,153629nuclSP_
CotAD_ 26,038LEA2LEA 2–1402269.4325,852.71scaffold238.1383,318383,998680chloSPS
CotAD_ 26,981LEA2LEA 2–2627410.0929,936.66At_chr09239,371240,195824chloother_
CotAD_ 27,453LEA2LEA 2–1312399.7626,994.13scaffold477.1176,874179,5262652mitoSP_
CotAD_ 27,789LEA2LEA 2–1511849.4120,135.39scaffold699.1759,396759,950554E.R.SPM
CotAD_ 28,249LEA2LEA 2–271509.2416,764.6At_chr09274,309275,011702nuclSP_
CotAD_ 28,252LEA2LEA 2–132228.6524,982.77At_chr07296,696299,0632367mitoother_
CotAD_ 28,872LEA2LEA 2–572579.126,949.97Dt03_chr171,936,8281,937,663835nuclSP_
CotAD_ 29,279LEA2LEA 2–1163059.6634,588.47Dt13_chr18639,522641,5492027chloSP_
CotAD_ 31,344LEA2LEA 2–1321015.5111,711.01scaffold1346.1193,028193,333305chloother_
CotAD_ 31,535LEA2LEA 2–82407.8927,649.86At_chr05790,866791,588722vacuother_
CotAD_ 31,536LEA2LEA 2–1362109.1923,875.63scaffold1346.1213,521214,153632plasSP_
CotAD_ 31,537LEA2LEA 2–13325410.2227,558.52scaffold1841.1200,526201,290764nuclcTPC
CotAD_ 31,780LEA2LEA 2–873109.9334,525.38Dt08_chr241,487,2961,488,5161220chloother_
CotAD_ 31,782LEA2LEA 2–902107.7223,638.39Dt09_chr23194,606195,238632chloSP_
CotAD_ 31,860LEA2LEA 2–1532069.8222,839.69scaffold257.11,162,4061,163,026620cytoSP_
CotAD_ 31,906LEA2LEA 2–1372329.6626,256.38scaffold769.1292,760295,4312671cytomTPM
CotAD_ 31,936LEA2LEA 2–531524.7416,462.97Dt01_chr15598,039598,839800mitoSP_
CotAD_ 32,487LEA2LEA 2–363059.9733,718.76At_chr11169,902171,2171315mitoother_
CotAD_ 32,645LEA2LEA 2–661999.322,785.41Dt06_chr25246,850247,449599chloSPC
CotAD_ 32,847LEA2LEA 2–242499.7927,707.74At_chr0961,15561,904749extrcTPC
CotAD_ 33,143LEA2LEA 2–543059.6334,544.43Dt02_chr141,894,1741,896,1972023chloSP_
CotAD_ 33,144LEA2LEA 2–602408.4927,655.92Dt05_chr19151,373152,095722chloSPS
CotAD_ 34,476LEA2LEA 2–923209.9235,579.84Dt09_chr23448,827449,9521125cytoSP_
CotAD_ 34,798LEA2LEA 2–682229.2325,253.03Dt06_chr25385,794386,462668nuclSP_
CotAD_ 35,069LEA2LEA 2–6920910.2523,628.4Dt06_chr25396,513397,142629chloSP_
CotAD_ 35,091LEA2LEA 2–702887.132,755.52Dt06_chr25403,729404,595866extrSP_
CotAD_ 35,514LEA2LEA 2–612065.923,420.27Dt05_chr19399,904400,524620mitoSPC
CotAD_ 36,328LEA2LEA 2–1444504.9249,131.5scaffold821.1548,888550,2401352chloother_
CotAD_ 36,446LEA2LEA 2–782319.4724,949.39Dt08_chr2458,78259,477695chloother_
CotAD_ 36,583LEA2LEA 2–1462068.8822,761.2scaffold821.1625,818626,438620chloother_
CotAD_ 37,776LEA2LEA 2–912029.0222,357.93Dt09_chr23337,931338,539608chloSPS
CotAD_ 37,888LEA2LEA 2–2128310.1531,410.18At_chr082,313,4182,314,5781160chloother_
CotAD_ 38,978LEA2LEA 2–852109.7622,644.27Dt08_chr24376,609377,241632nuclSPS
CotAD_ 39,064LEA2LEA 2–502109.4823,699.74Dt01_chr15220,837221,469632chloother_
CotAD_ 39,719LEA2LEA 2–521916.2920,961.07Dt01_chr15397,156397,731575nuclmTP_
CotAD_ 40,324LEA2LEA 2–152049.8121,780.76At_chr07720,430721,044614plasSP_
CotAD_ 41,569LEA2LEA 2–4720810.1922,559.45At_chr13343,514344,140626chloother_
CotAD_ 41,571LEA2LEA 2–882709.5630,627.54Dt09_chr2364,51265,324812chloSP_
CotAD_ 41,925LEA2LEA 2–1281889.2221,941.4scaffold1231.194,27094,836566nuclother_
CotAD_ 42,599LEA2LEA 2–1293739.943,118.75scaffold1231.196,29798,5172220cytoOtherM
CotAD_ 44,357LEA2LEA 2–1432109.3423,874.6scaffold1088.1451,853452,485632cytootherC
CotAD_ 45,324LEA2LEA 2–1092569.9928,431.93Dt11_chr2155,31761,8296512chloother_
CotAD_ 46,873LEA2LEA 2–292591028,603.52At_chr09355,476356,255779vacuother_
CotAD_ 47,322LEA2LEA2–52209.8524,666.72At_chr03430,461431,123662chloSP_
CotAD_ 47,454LEA2LEA 2–1306616.1473,583.12scaffold1851.1116,914132,92416,010cyskSPC
CotAD_ 47,495LEA2LEA 2–7631810.0935,234.15Dt07_chr161,185,3271,186,4791152chloSPS
CotAD_ 47,749LEA2LEA 2–772519.4127,769.63Dt07_chr161,400,3231,401,078755chlomTPM
CotAD_ 48,050LEA2LEA 2–1032179.2824,968.87Dt10_chr20968,935969,588653mitoother_
CotAD_ 48,069LEA2LEA 2–1041819.5720,577.73Dt10_chr20970,347970,892545extrSPS
CotAD_ 48,336LEA2LEA 2–582119.1223,479.93Dt04_chr22552,418553,053635nuclSP_
CotAD_ 48,753LEA2LEA 2–122109.2823,676.69At_chr06482,445483,077632mitoSP_
CotAD_ 48,769LEA2LEA 2–283049.5633,675.21At_chr09334,020335,2451225nuclSP_
CotAD_ 49,818LEA2LEA 2–1193174.6335,274.16scaffold2616.121,21922,172953cytoSP_
CotAD_ 53,045LEA2LEA 2–1022067.5822,650.27Dt10_chr20363,682364,302620cytocTPC
CotAD_ 53,263LEA2LEA 2–2325110.1127,168.81At_chr0924,39725,152755chloSP_
CotAD_ 53,981LEA2LEA 2–1232476.5927,715.29scaffold3326.142,20943,9441735mitocTPC
CotAD_ 54,337LEA2LEA 2–141524.8416,453.02At_chr07366,521367,321800chloSPS
CotAD_ 55,224LEA2LEA 2–552109.6623,769.83Dt03_chr17607,531608,163632mitoSPM
CotAD_ 56,356LEA2LEA 2–221739.9619,737.98At_chr0918,71219,318606chloSPS
CotAD_ 56,696LEA2LEA 2–562139.5123,750.48Dt03_chr17634,717635,358641nuclSP_
CotAD_ 58,358LEA2LEA 2–11320910.1923,626.51Dt12_ch26897,133897,762629chloSPS
CotAD_ 59,405LEA2LEA 2–613209.935,457.72Dt05_chr19251,378252,5011123chloSPS
CotAD_ 60,279LEA2LEA 2–1242478.7626,619.63scaffold2414.150,03752,0482011chloSP_
CotAD_ 60,435LEA2LEA2–12519.5727,952.81At_chr01137,428138,183755chlocTPC
CotAD_ 60,617LEA2LEA 2–492109.5123,780.9Dt01_chr15198,189198,821632mitoSP_
CotAD_ 61,173LEA2LEA2–72159.8424,043At_chr0459,86460,511647chlocTP_
CotAD_ 61,391LEA2LEA 2–511916.2920,884.97Dt01_chr15284,374284,949575chloSPC
CotAD_ 62,996LEA2LEA2–23189.9535,356.25At_chr01176,895178,0451150nuclSP_
CotAD_ 63,174LEA2LEA 2–1173779.7741,228.93scaffold3177.119,13721,2212084E.R.SPC
CotAD_ 64,004LEA2LEA 2–732199.6523,825.02Dt07_chr1634,19834,857659chloother_
CotAD_ 64,120LEA2LEA 2–4121810.1424,050.43At_chr129511607656chloSP_
CotAD_ 64,346LEA2LEA 2–642108.9923,572.5Dt06_chr2559,64360,275632chloSP_
CotAD_ 64,347LEA2LEA 2–652359.4426,111.93Dt06_chr2562,52463,231707plascTPC
CotAD_ 64,657LEA2LEA 2–4026210.2228,516.58At_chr11144,295145,083788vacuSP_
CotAD_ 65,119LEA2LEA 2–792068.8822,733.19Dt08_chr2459,66060,280620golgSP_
CotAD_ 65,370LEA2LEA 2–1263269.9936,098.18scaffold3528.184,69686,7842088chloother_
CotAD_ 66,245LEA2LEA 2–824504.9448,836.2Dt08_chr24121,249122,6011352chloother_
CotAD_ 66,538LEA2LEA2–62119.4723,424.96At_chr0459,28259,917635chloSP_
CotAD_ 66,551LEA2LEA 2–1182259.2825,226.24scaffold3976.120,35421,031677cytoother_
CotAD_ 66,774LEA2LEA 2–802169.9224,090.84Dt08_chr2468,42469,074650chloSP_
CotAD_ 66,775LEA2LEA 2–812259.6125,078.29Dt08_chr2472,94573,622677chloSPS
CotAD_ 67,823LEA2LEA 2–202229.4923,928.26At_chr08132,953133,621668cytoSPS
CotAD_ 68,063LEA2LEA2–42189.323,245.72At_chr03191,498192,154656cytoSP_
CotAD_ 68,189LEA2LEA 2–352066.7122,579.21At_chr1067,60768,227620chlocTPC
CotAD_ 69,737LEA2LEA 2–1342139.7523,867.69scaffold2095.1202,243202,884641chloSPS
CotAD_ 69,738LEA2LEA 2–1352109.8823,893.04scaffold2095.1208,893209,525632chloSP_
CotAD_ 70,003LEA2LEA 2–421919.6320,942.44At_chr12171,793172,368575cytocTPC
CotAD_ 70,190LEA2LEA 2–1204304.8148,185.02scaffold4817.131,92137,1405219cytoother_
CotAD_ 70,192LEA2LEA 2–1221304.7414,420.49scaffold4817.138,39738,789392nuclSPM
CotAD_ 71,431LEA2LEA 2–591869.5820,579.98Dt05_chr1965,53066,090560extrother_
CotAD_ 72,458LEA2LEA 2–1271929.5420,613.31scaffold3083.191,82892,406578cyskSP_
CotAD_ 72,913LEA2LEA 2–1213154.6335,071.89scaffold4398.134,68936,3901701cyskSP_
CotAD_ 73,966LEA2LEA 2–453209.9635,484.73At_chr12365,460366,5831123chloother_
CotAD_ 74,713LEA2LEA 2–832119.1223,479.93Dt08_chr24158,342158,977635golgother_
CotAD_ 76,129LEA2LEA 2–4420910.1923,626.51At_chr12317,009317,638629chloother_
CotAD_ 01504LEA3LEA 3–13938.8210,469.06Dt09_chr231,171,1081,171,516408chlootherC
CotAD_ 04558LEA3LEA 3–11009.7510,627.15At_chr04288,394288,796402chlomTPS
CotAD_ 04559LEA3LEA 3–21009.3410,419.9At_chr04291,007291,400393chloSPS
CotAD_ 21,416LEA3LEA 3–9929.839667.02Dt04_chr22127,037127,390353chlomTPC
CotAD_ 22,634LEA3LEA 3–111009.1810,503.02Dt04_chr22853,079853,496417mitoSPS
CotAD_ 23,118LEA3LEA 3–12999.5610,350.8Dt04_chr22855,441855,830389cytomTPS
CotAD_ 24,498LEA3LEA 3–81266.2713,502.83Dt03_chr171,079,7731,082,2222449chlomTPM
CotAD_ 26,668LEA3LEA 3–16989.3410,595.99scaffold141.1891,415891,795380chlomTPM
CotAD_ 33,003LEA3LEA 3–151207.0213,729.35scaffold944.1428,674429,107433chlomTPM
CotAD_ 35,021LEA3LEA 3–5859.779781.28At_chr11537,390537,732342chlomTPM
CotAD_ 36,999LEA3LEA 3–41266.2713,484.86At_chr081,181,2171,183,6292412cytomTPM
CotAD_ 40,972LEA3LEA 3–71249.5114,155.5At_chr13122,903123,887984chlomTPM
CotAD_ 41,714LEA3LEA 3–141059.311,363.76Dt11_chr21445,383445,797414chlomTPM
CotAD_ 43,605LEA3LEA 3–3929.759709.1At_chr04475,252475,623371nuclmTPC
CotAD_ 46,270LEA3LEA 3–61059.5111,449.87At_chr11673,169673,590421golgmTPS
CotAD_ 56,728LEA3LEA 3–10989.6610,549.99Dt04_chr22520,431520,811380golgmTPS
CotAD_ 00667LEA4LEA 4–122399.0326,335.09scaffold26.1961,112961,910798chlocTPC
CotAD_ 02872LEA4LEA 4–45695.8962,916.57Dt05_chr19496,264498,0581794mitoother_
CotAD_ 05963LEA4LEA 4–52665.229,264.99Dt05_chr192,777,8422,778,9351093extrSPS
CotAD_ 09404LEA4LEA 4–71279.2113,553.37Dt07_chr162,927,0502,928,024974chlocTPC
CotAD_ 09405LEA4LEA 4–81099.9612,066.81Dt07_chr162,946,0072,947,0471040chlomTPM
CotAD_ 10,044LEA4LEA 4–36345.7868,352.06At_chr07513,343515,4542111nuclmTP_
CotAD_ 13,989LEA4LEA 4–1010910.0412,094.87Dt13_chr181,907,7251,908,7661041chlomTPM
CotAD_ 22,633LEA4LEA 4–61366.9314,614.04Dt06_chr25240,527241,024497chloother_
CotAD_ 23,824LEA4LEA 4–94055.8844,549.64Dt12_chr262,459,6142,460,9131299chloother_
CotAD_ 50,359LEA4LEA4–12845.1331,311.17At_chr03809,118810,1651047cytoSPS
CotAD_ 62,314LEA4LEA 4–112399.0326,258.97scaffold3310.1996510,763798cytocTPC
CotAD_ 62,659LEA4LEA4–25685.9662,738.55At_chr0614,80716,5971790cytoother_
CotAD_ 74,061LEA4LEA 4–44055.6944,521.59At_chr12408,138409,4361298nuclother_
CotAD_ 03264LEA5LEA 5–41105.5511,915.92Dt06_chr251,093,1531,093,598445nuclother_
CotAD_ 07516LEA5LEA 5–21235.7813,999.94At_chr091,687,2671,687,988721cytoother_
CotAD_ 22,539LEA5LEA 5–91025.4911,072.09scaffold613.1610,955611,364409chloother_
CotAD_ 31,869LEA5LEA 5–81025.4911,072.09scaffold1551.1323,034323,444410nuclother_
CotAD_ 33,321LEA5LEA 5–71105.5511,972.03scaffold1788.1214,115214,558443chloother_
CotAD_ 46,888LEA5LEA 5–11447.7616,526.85At_chr08563,572564,7741202chloother_
CotAD_ 48,469LEA5LEA 5–51718.3919,503.27Dt08_chr24159,604160,8051201nuclother_
CotAD_ 56,699LEA5LEA 5–6948.110,073.96Dt10_chr2081,02081,398378chloother_
CotAD_ 57,519LEA5LEA 5–3948.110,073.96At_chr12153,745154,123378vacuother_
CotAD_ 13,789LEA6LEA 6–3867.89580.48Dt12_chr26651,517651,777260nuclother_
CotAD_ 19,623LEA6LEA6–1944.7610,176.98At_chr011,336,3401,336,624284extrother_
CotAD_ 44,941LEA6LEA 6–411411.8311,963.43scaffold3339.162,11962,571452cytoother_
CotAD_ 53,438LEA6LEA 6–2944.7510,257.11Dt07_chr1678,65478,938284chloSPM
CotAD_ 11,594SMPSMP-102644.8526,898.86scaffold189.1992,496993,467971cytoother_
CotAD_ 12,680SMPSMP-31694.6317,180.76At_chr071,633,7541,634,391637cytoother_
CotAD_ 12,681SMPSMP-41444.6114,950.53At_chr071,636,1231,636,635512chloother_
CotAD_ 12,682SMPSMP-52584.5626,168.03At_chr071,639,4671,640,458991cytoother_
CotAD_ 39,233SMPSMP-71714.4917,755.88Dt13_chr181,963,7151,964,463748chloother_
CotAD_ 43,455SMPSMP-62614.7926,971.01Dt12_chr261,316,5991,317,7061107chloother_
CotAD_ 45,390SMPSMP-12526.3526,222.21At_chr01335,790337,5781788chloother_
CotAD_ 51,205SMPSMP-92536.4626,151.04scaffold1984.1277,433279,2621829chloother_
CotAD_ 66,708SMPSMP-22586.4427,923.82At_chr04123,169126,3373168cytoother_
CotAD_ 67,721SMPSMP-82644.9326,885.82scaffold4155.134,27735,248971cytoother_

LEA: late embryogenesis abundant protein; LEA1, 2, 3, 4, 5 and 6 indicates the sub families of LEA proteins while the −1, −2,-3…. Represents the protein annotation number i.e. LEA1–1, the first member of LEA1 sub family; SMP: seed maturation protein; chlo: chloroplast; cyto: cytoplasm; extr: extracellular part of the cell; nucl: nucleus; mito: mitochondrion; cysk: cytoskeleton; golg: golgi body; vacu: vacuole; plas: plasma membrane; E.R: endoplasmic reticulum; SP: Secretory pathway (presence of a signal peptide); mTP: mitochondrial targeting peptide; cTP: chloroplast transit peptide; Other (nucleus, cytoplasmic, or otherwise). C: cytoplasm; S: secretory pathway; M: mitochondrion and -: others/other cell organelles; chr: chromosome; Dt: sub genome D and At: sub-genome A

LEA gene in upland cotton, Gossypium hirsutum and their sequence characteristics and subcellular location prediction and chromosome position LEA: late embryogenesis abundant protein; LEA1, 2, 3, 4, 5 and 6 indicates the sub families of LEA proteins while the −1, −2,-3…. Represents the protein annotation number i.e. LEA1–1, the first member of LEA1 sub family; SMP: seed maturation protein; chlo: chloroplast; cyto: cytoplasm; extr: extracellular part of the cell; nucl: nucleus; mito: mitochondrion; cysk: cytoskeleton; golg: golgi body; vacu: vacuole; plas: plasma membrane; E.R: endoplasmic reticulum; SP: Secretory pathway (presence of a signal peptide); mTP: mitochondrial targeting peptide; cTP: chloroplast transit peptide; Other (nucleus, cytoplasmic, or otherwise). C: cytoplasm; S: secretory pathway; M: mitochondrion and -: others/other cell organelles; chr: chromosome; Dt: sub genome D and At: sub-genome A The only LEA proteins with all its members having Pl < 7, were the SMPs, this results is in agreement to Pl values obtained for SMPs in Brassica napus, all had Pl < 7.0 [39]. The grand average of hydropathy (GRAVY) results as obtained from ExPASy indicated that cotton LEA 2 proteins are the most hydrophobic, with all except three with GRAVY values <0. The rest of the LEA proteins were highly hydrophilic, with almost all of the groups had gravy value of less than 0, these results are consistent with those of the LEA proteins in Arabidopsis thaliana [40]. Low hydrophobicity and high net charge are the main characteristics of other LEA proteins [38] which enables them to be totally or partially disordered, this unique features is an attribute which gives the LEA proteins the ability to form flexible structural elements such as molecular chaperones which are integral for the protection of plants from desiccation effects [41]. TargetP1.1 (http://www.cbs.dtu.dk/services/TargetP/) server [24] and Protein Pprowler Subcellular Localisation Predictor version 1.2 (http://bioinf.scmb.uq.edu.au/Pprowler_webapp_1–2/) [25], were used to predict the subcellular location of 242 Gossypium hirsutum LEA proteins, most of the LEA proteins were predicted to participate in the secretory pathway, same as the Brassica napus LEA proteins [39] (Table 2 and Additional file 2: Table S2). We further used WoLFPSORT [26] to investigate the particular cell compartments in which the LEA proteins were embedded in, 148 LEA genes were predicted to be chloroplasts proteins, 47 as cytoplasm proteins, 20 as mitochondrion proteins, 35 as nucleus proteins, 11 as Golgi body proteins, 7 as extracellular proteins, 7 as plasma proteins, 4 as vacuole proteins and 3 as endoplasmic reticulum proteins. The details of other characteristics of the nucleic acid and protein sequences are provided in (Table 2). LEA genes have ubiquitous distribution across cell compartments with unique subcellular localization [42]. LEA 4 gene families were found to be widely distributed in cell structures such as cytosol, mitochondria, plastid, ER, and pexophagosome [42]. The unique and wide distribution of LEA genes within the various cell structures is to establish interactions with various cellular membranes under stress conditions. The broad subcellular distribution of LEA proteins highlights the requirement for each cellular compartment to be provided with protective mechanisms to cope with drought stress [17]. In Summary, both experimental and prediction data indicates that LEA proteins have wide distribution in subcellular compartments [42].

Phylogenetic analyses, gene structure and protein motifs of LEA genes in upland cotton

To examine the evolutionary history and relationships of LEA protein families, an unrooted phylogenetic tree was constructed from alignments of the full lengths of LEA gene sequences with Neighbor-joining method based on similarities of the LEA genes in upland cotton, G. hirsutum. We constructed phylogenetic tree of all the groups of LEA genes separately, which we further combined with intron-exon and motifs to unearth more information about phylogenetic tree and LEA genes similarities (Fig. 1). Gene structural diversity and conserved motif divergence are possible mechanisms for the evolution of multigene families [43]. To gain further information into the structural diversity of cotton LEA genes, we analyzed the exon / intron organization in the full-length cDNAs with their corresponding genomic DNA sequences of individual LEA genes in cotton (Fig. 1). Most closely related LEA gene members within the same groups shared similar gene structures in terms of either intron numbers or exon lengths. For example, LEA 1,3,4,5, SMP and dehydrins genes had one to four introns with exception of LEA 2 and 6, which had zero to five introns. This result is in agreement with earlier finding in which dehydrin were found to have introns [44]. By contrast, the gene structure appeared to be more variable in LEA 2 which had the largest number of genes, with sizes of exon/intron structure variants with striking distinctions (Additional file 3: Figure S1). The result suggest the divergence functions of this group of protein family in upland cotton.
Fig. 1

Phylogenetic tree, gene structure and motif compositions of LEA genes in upland cotton. The phylogenetic tree was constructed using MEGA 6.0. Exon/intron structures of LEA genes in upland cotton, exons introns and up / down-stream were represented by yellow boxes, black lines and blue boxes, respectively. Protein motif analysis represented by different colours, and each motif represented by number

Phylogenetic tree, gene structure and motif compositions of LEA genes in upland cotton. The phylogenetic tree was constructed using MEGA 6.0. Exon/intron structures of LEA genes in upland cotton, exons introns and up / down-stream were represented by yellow boxes, black lines and blue boxes, respectively. Protein motif analysis represented by different colours, and each motif represented by number Twenty-five distinct motifs were identified. Motifs 1, 2, 3, 4, 5 and 6 were common among all the different groups of LEA genes, similar motifs have been previously identified in other plant species, including maize [45], Arabidopsis thaliana [40], tomato [46] among other plants. Motif analysis of the cotton LEA proteins showed that members of each LEA group possess several group-specific conserved motifs (Table 3). Similar features have been reported for LEA proteins in Solanum lycopersicum [46], Arabidopsis [40], Prunus [47] and poplar [48]. For example, a distinctive and conserved motif in the dehydrin group is the repeated motif, EKKGIMDKIKEKLPG (motif K, richness in lysine residues), in this study, we identified a unique motif among the dehydrin families, GEGREKKGFLEKIKEKLPGHHKKTEEAS, which we named as K1, because of the close similarity with the K- motif. In addition, the commonly known motifs such as EHHEKKGIMDKIKEKLPGHH (K motif) and HSLLEKLHRSNSSSSSSSSDE (S- motif) were also observed. K motif is rich in lysine (K) residues and it is known for protective role of enzymatic activities from the drought effects [49]. The motif pattern formation indicates that cotton LEA proteins are actively involved in various biological processes and are group specific in terms of their activities. The distinct nature of the conserved motifs observed in all the LEA protein families, gives an indication that, the LEA proteins evolved from the gene expansion within their specific gene families. In addition, LEA 4 gene families were found to contain repeats of conserved motif 3, in which in some cases, the repeats were 5, the same attribute was also noted in which they were found to have tendencies of harboring repeat motifs, more so motif 8 [37]. We further did comparison of the common motifs with already identified motif, by the use of Tomtom motif comparison tool, adopting the distance measure of Sandelin-Wasserman function [50]. Motif 1 had 23 matches, with 5 jolma2013, 3 JASPERCORE2014 vertebrates and 15-uniprobe mouse. In motif 2, had 35 matches, 5 jolma2013, 5JASPERCORE2014 vertebrates and 25-uniprobe mouse. With MEME functional tool, we were able to affirm the similarities of our motifs to already published motif in the motif database.
Table 3

A consensus amino acid sequence of the different motifs features of each upland cotton LEA protein families

The colour scheme of the logo indicates amino acid types. Polar: green = uncharged; blue = +vely charged; red = -vely charged; Non-polar: violet/purple = aliphatic. As described by Dure, 2001

A consensus amino acid sequence of the different motifs features of each upland cotton LEA protein families The colour scheme of the logo indicates amino acid types. Polar: green = uncharged; blue = +vely charged; red = -vely charged; Non-polar: violet/purple = aliphatic. As described by Dure, 2001

Phylogenetic analyses of the LEA -proteins in cotton with other plants

To get a better understanding of the evolutionary history and relationships of LEA gene families in cotton to other plants, multiple sequence alignment of 242 genes for G. hirsutum, 136 genes for G. arboreum, 146 genes for G. raimondii, 30 genes for Pinus tabuliformis and 51 genes for Arabidopsis LEA protein sequences (Fig. 2) were done. The boot strap values for some nodes of the NJ tree were low due to long sequence similarities. The reliability of the phylogenetic tree was done by reconstructing the phylogenetic tree with minimal evolution method. The trees produced by the two methods were identical thus the results were consistent. Based on the Phylogenetic tree analysis, LEA genes in cotton were further classified into eight (8) groups. LEA 2 was the largest group with 334 genes from G. hirsutum (157), G. raimondii (89), G. arboreum (85), A. thaliana (3) and P. tabuliformis (1). All the ortholog genes in LEA 2 were found in upland cotton, G. hirsutum, G. arboreum and G. raimondii genome while no ortholog genes were observed between upland cotton, G. hirsutum to either Arabidopsis thaliana and or P. tabuliformis. The second largest group were LEA 4, with highest number of genes 13 and 16 in P. tabuliformis and upland cotton respectively. Upland cotton, G. hirsutum contained the highest numbers of LEA genes of the following groups, LEA 1, LEA 2, LEA 3, LEA 5, LEA 6, SMP and dehydrin with the exception of LEA 4. Among all the LEA gene groups, only LEA 6 had fewer genes, 10 and 3 gene in cotton genome and Arabidopsis respectively (Table 1). The total number of ortholog genes between upland cotton, G. hirsutum, G. arboreum and G. raimondii were 201 out of 601 genes mapped on the Phylogenetic tree, which translates to 33%.
Fig. 2

Phylogenetic relationship of LEA genes in three cotton species with Arabidopsis and Pinus tabuliformis. Neighbor-joining phylogeny of 242 genes for G. hirsutum, 136 genes for G. arboreum, 146 genes for G. raimondii, 30 genes for Pinus tabuliformis and 51 Arabidopsis LEA protein sequences, as constructed by MEGA 6.0. The difference colours mark the various LEA gene types

Phylogenetic relationship of LEA genes in three cotton species with Arabidopsis and Pinus tabuliformis. Neighbor-joining phylogeny of 242 genes for G. hirsutum, 136 genes for G. arboreum, 146 genes for G. raimondii, 30 genes for Pinus tabuliformis and 51 Arabidopsis LEA protein sequences, as constructed by MEGA 6.0. The difference colours mark the various LEA gene types In this study, no ortholog genes were detected between upland cotton, G. hirsutum, P. tabuliformis and Arabidopsis. All the ortholog genes were detected only among the cotton species; this could be due to their evolution. Upland cotton, G. hirsutum emerged through hybridization mechanism between A and D genome [51]. Based on the results, there was close relationship between Arabidopsis and cotton species as compared to P. tabuliformis. The LEA genes seems to have a common evolutionary history [37], the aggregation pattern of the genes within the tree showed that LEA 1, LEA3, LEA 4, LEA5, LEA 6 and SMP had a common origin, similar results have also been obtained in the analysis of LEA genes in potato, in which SMP, LEA 1, LEA4 and LEA 5 shared a common point of origin [52]. A unique feature on the abundance of cotton LEA genes and their distribution was observed in which LEA 2 formed the majority of the cotton LEA gene families (Table 1 and Fig. 2). Analysis of the LEA genes in monocots and dicots, nearly half of the species containing LEA genes, the majority of the genes belong to the LEA 4 and dehydrin families [39]. The analysis of cotton LEA genes with other plants revealed that main differences occur in the LEA 2 genes (Fig. 2). The abundance of LEA 2 genes was lowest in Pinus tabuliformis (1) and Arabidopsis (3) and higher in G. hirsutum (157), G. arboreum (85) and G. raimondii (89). Similar results were also been observed in which lower proportions of other LEA gene families were observed in grapevine but significantly higher number of LEA 2 genes were observed in rice and poplar [53]. It is important to note that, such large number of LEA 2 families have not been described in the previously investigated genomes of poplar [48], rice [11] and Arabidopsis [40]. This result may be explained in part by the improved annotation of the higher plant genomes available at Phytozome (v10.2) and also by the fact that LEA 2 is an unusual group composed of ‘a typical’ proteins of hydrophobic nature. This finding suggests that the LEA protein families in higher plants may be larger and much more complex than previously described. On the other hand, minor variations were observed in the other upland cotton LEA gene families. Based on this result, the entire LEA 2 gene families probably were the last to evolve among the LEA gene families in higher plants.

Chromosomal distribution of cotton genes encoding LEA proteins

To determine the chromosomal locations of cotton LEA genes based on their positions, data retrieved from the whole cotton genome sequences were used. Chromosome distribution was done by BLASTN search against G. hirsutum and G. arboreum in cotton genome project and G.raimondii genome database in Phytozome (http://www.phytozome.net/cotton.php). One hundred and eighty six (186) upland cotton LEA genes were mapped in all chromosomes by Map chart and 56 upland LEA genes into unknown chromosomes (scaffold). A plot of LEA genes on the cotton genome shows that LEA loci are found on every chromosome (Fig. 3). The distribution of the mapped LEA genes were more in Dt with 110 (59%) compared to At, with only 76 (41%) genes. However, the densities of these loci were high on Dt_chr 09, with 9% of all the LEA genes. Gene loss was observed on At_chr 05, with a single gene compared to its homolog chromosome Dt_chr 05, which had 9 genes. Similar case was also noted on chromosome At_chr02 and Dt_chr02 with 2 and 7genes respectively. This result indicates an element of gene loss during the hybridization period, as result of crossing over or other internal or external chromosomal phenomenon.
Fig. 3

LEA genes distribution in tetraploid upland cotton, Gossypium hirsutum chromosomes. Chromosomal position of each LEA genes was mapped according to the upland cotton genome. The chromosome number is indicated at the top of each chromosome

LEA genes distribution in tetraploid upland cotton, Gossypium hirsutum chromosomes. Chromosomal position of each LEA genes was mapped according to the upland cotton genome. The chromosome number is indicated at the top of each chromosome In the A genome of, G. arboreum 136 LEA genes were mapped across all the 13 chromosomes, high density of these loci were observed on chromosome 10, which contained 21 genes, translating to 15% of all the LEA genes in the genome. The mapping of the gene loci were generally uniform, the lowest loci density was observed on chromosome 9, with 5 genes (4%), followed by chr 2, chr5 and chr 8, with 6 genes each (Fig. 3). In D genome, (G. raimondii), 143 LEA genes were distributed across all the chromosomes. The highest gene loci density was in chromosome 9 with 18 genes (13%) and the lowest density was in chromosome 12, with only 5 genes (3%). The mapping of the LEA genes in both diploid and tetraploid cotton chromosomes, tend to have a unique clustering pattern, high density LEA gene clusters were observed in specific chromosomal regions, either at the upper arm, lower arm or the middle region of the chromosomes as shown on chromosomes At_ch01, Dt01_chr15, Dt02_chr14, Dt05_chr19 and Dt10_chr20 within the AD genome, chr02, chr05, chr06 and chr07 in A genome and in D genome, ch07 and chr11 (Fig. 4). The clustering pattern of the LEA gene and chromosomal location could be attributed to LEA gene duplication patterns [37].
Fig. 4

LEA genes distribution in A and D cotton chromosomes: Chromosomal position of each LEA genes was mapped according to the upland cotton genome. The chromosome number is indicated at the top of each chromosome. a: chromosomes of the diploid cotton of A genome, G. arboreum; b: chromosomes of the diploid cotton of D genome, G. raimondii

LEA genes distribution in A and D cotton chromosomes: Chromosomal position of each LEA genes was mapped according to the upland cotton genome. The chromosome number is indicated at the top of each chromosome. a: chromosomes of the diploid cotton of A genome, G. arboreum; b: chromosomes of the diploid cotton of D genome, G. raimondii In general, genes which belong to the same family are distributed across the entire chromosomes in order to ensure their maximum functionalization [54]; this was evident in LEA 2 genes, which was distributed across the entire chromosomes of both tetraploid and diploid cotton (Figs. 3 and 4). It was unique to find some members of LEA gene families with restricted distribution, mainly found in some chromosomes but not all like dehydrin despite of their numbers, this implies that dehydrin like genes have the tendency to duplicate and evolve more conservatively within a particular chromosome.

Gene duplication and syntenic analysis

Expansion of gene families occurs through three processes namely, segmental duplication, tandem duplication and whole genome duplication [55]. Homologous and orthologous genes are the products of gene duplication events. Duplicated genes function in stress response and development processes in plants [56]. To analyse the relationships between the LEA genes and gene duplication events, syntenic blocks of LEA genes were combined among G. hirsutum, G. raimondii and G. arboreum (Fig. 5). A total of 241 LEA genes were duplicated across the three cotton genome. The most duplicated genes were detected between G. hirsutum and its ancestors, G. arboreum and G. raimondii, this could be due the origin of G. hirsutum, as a result of polyploidization of A and D genome (Table 4). Two types of duplication, tandem and segmental duplication event were identified. Majority of the duplicated LEA genes, were segmental, this implied that, segmental gene duplication, had a major contributing factor during the evolution time [57]. The Ka/Ks ratio is a pointer to selective pressure acting on a protein-coding gene. It has been observed that some systematic bias in some species do occur more easily in the process of nucleotides substitutions because of species diversity and high mutation rate do accelerates the changes in amino acid proportions [58]. The analysis of the Ka /Ks ratios of the 156 paralogous pairs, were less than 1 and only 20 had ratios of more than 1. Majority of LEA 2, LEA 4 and SMP had very low Ka/Ks ratios; the highest Ka/Ks ratio of 2.59265 was noted in LEA 6. This result is consistent to the previous findings of Brassica napus LEA genes, LEA 3 and LEA 6, families recorded higher Ka/Ks ratios, whereas LEA 5 and LEA 2 gene families recorded lowest Ka/Ks ratios [39]. In general Ka/Ks for paralogous gene pair of LEA genes had a range of 0 to 2.593 with mean of 0.4717. This result gives an indication that the LEA genes have been influenced extensively by purifying selection during the process of their evolution. LEA 2 gene families preferentially do have conserved structure and functions under selective pressure [59].
Fig. 5

Syntenic relationships among LEA genes from G. hirsutum, G. raimondii and G. arboretum. G. hirsutum, G. raimondii and G. arboretum chromosomes are indicated in different colours. The putative orthologous LEA genes between G. hirsutum and G. raimondii, G. hirsutum and G. arboretum, and G. raimondii and G. arboretum by different colours

Table 4

Gene duplication, Ks, Ka and Ka/Ks values calculated for paralogous LEA gene pairs in cotton genome

GENE FAMILYParalogous gene pairsLength (aa)KaKsKa/KsNegative/purifying selectionP-Value (Fisher)
AB
DEHYDRINCotAD_66,538CotAD_74,7136330.016770.040940.409579YES0.0891624
DEHYDRINCotAD_70,948CotAD_75,2679960.334290.422030.792098YES0.122937
DEHYDRINCotAD_53,981Cotton_A_054447530.005330.010720.49715YES0.365412
DEHYDRINCotAD_65,119Cotton_A_135736180.015160.047620.318478YES0.0332022
DEHYDRINCotAD_66,245Cotton_A_1358113500.016150.037160.434571YES0.0218162
DEHYDRINCotAD_64,004Cotton_A_143546570.020390.079080.257869YES0.00188419
DEHYDRINCotAD_66,774Cotton_A_231726480.01010.055470.182079YES0.0030156
DEHYDRINCotAD_66,775Cotton_A_231736750.026520.063140.419977YES0.0380326
DEHYDRINCotAD_60,435Cotton_A_243716992.876752.002421.43664NO0.319287
DEHYDRINCotAD_58,358Cotton_A_296926330.011970.056830.210713YES0.00684175
DEHYDRINCotAD_66,538Cotton_A_335486330.004170.006590.63267YES0.549017
DEHYDRINCotAD_70,948Cotton_A_403659960.334350.43850.762473YES0.0864744
DEHYDRINCotAD_75,267Cotton_A_403659960.003850.00940.409167YES0.292983
DEHYDRINCotAD_65,119Gorai.008G218500.16180.002140.006660.321634YES0.36944
DEHYDRINCotton_A_13573Gorai.008G218500.16180.012970.05480.236675YES0.00834018
LEA1CotAD_01033CotAD_081816060.051460.154160.3338YES0.000633352
LEA1CotAD_00275CotAD_39,7198220.014760.034760.424567YES0.130131
LEA1CotAD_01033CotAD_46,5506060.04930.187160.263397YES1.42E-05
LEA1CotAD_01298CotAD_64,1206540.04040.134640.300053YES0.000332152
LEA1CotAD_00275Cotton_A_140098220.014750.045030.327622YES0.0184625
LEA1CotAD_01298Cotton_A_229326540.033030.122810.268983YES0.000202033
LEA1CotAD_01033Cotton_A_275436060.051460.14560.353443YES0.00202823
LEA1CotAD_01298Gorai.004G155000.16302.636033.658040.720612YES0.753791
LEA1CotAD_01033Gorai.009G305100.16060.051530.144990.355394YES0.00205763
LEA2CotAD_11,658_Cotton_A_404997890.023090.017511.3191NO0.985982
LEA2CotAD_01700CotAD_095787800.012020.026340.456421YES0.151105
LEA2CotAD_02652CotAD_14,1476360.008350.019740.422805YES0.226532
LEA2Cotton_A_13471CotAD_17,103180.942.323971.113230.778217YES837
LEA2CotAD_13,584CotAD_20,0207500.008780.017110.513478YES0.287642
LEA2CotAD_17,062CotAD_21,7317320.007190.041610.17276YES0.00506705
LEA2CotAD_03649CotAD_31,3449600.011030.022140.497982YES0.177084
LEA2CotAD_17,101CotAD_31,5356660.015760.040260.391498YES0.077316
LEA2CotAD_17,102CotAD_31,5366270.004220.033650.125467YES0.0107355
LEA2Cotton_A_31083CotAD_35,0699392.27481.838581.23726NO0.447623
LEA2CotAD_19,214CotAD_35,5145430.009550.025090.380645YES0.19023
LEA2CotAD_19,623CotAD_36,9992820.032960.047710.69081YES0.631725
LEA2CotAD_18,546CotAD_37,7765190.010160.041950.242116YES0.0375368
LEA2CotAD_03649CotAD_37,8889600.045220.541420.08352YES9.32E-36
LEA2CotAD_40,972CotAD_38,9785910.960252.041930.470264YES0.00125135
LEA2CotAD_32,847CotAD_39,0646120.011060.04610.239936YES0.0153075
LEA2CotAD_12,375CotAD_42,4085972.420621.682881.43838NO0.288342
LEA2CotAD_28,872CotAD_44,9417200.01410.013691.03042NO0.900519
LEA2CotAD_08181CotAD_46,5506060.006540.049750.131503YES0.00250188
LEA2CotAD_25,271CotAD_48,7694050.006470.010630.609395YES0.539117
LEA2CotAD_28,252CotAD_53,2634920.013560.042850.31656YES0.069282
LEA2CotAD_09685CotAD_53,9817530.007110.043860.162112YES0.00252472
LEA2CotAD_35,091CotAD_60,4357530.030160.076890.392211YES0.0144267
LEA2CotAD_46,873CotAD_60,6176300.008350.034520.241852YES0.0372109
LEA2CotAD_46,888CotAD_61,3915730.013870.053130.261111YES0.0175133
LEA2CotAD_35,069CotAD_62,9969540.005510.036430.151164YES0.0017334
LEA2CotAD_17,045CotAD_64,0046570.022470.065230.344452YES0.0157104
LEA2CotAD_36,328CotAD_64,3466300.017770.075640.23489YES0.000973496
LEA2CotAD_21,924CotAD_64,6577860.013730.046930.292471YES0.0120925
LEA2CotAD_50,359CotAD_66,5386330.016770.040940.409579YES0.0891624
LEA2CotAD_19,078CotAD_66,7746480.010090.048420.208435YES0.00834864
LEA2CotAD_53,438CotAD_68,1896180.023410.028980.80786YES0.519399
LEA2CotAD_20,308CotAD_70,0035730.009150.022910.399181YES0.206152
LEA2CotAD_03649CotAD_73,9669600.045970.5270.087231YES4.70E-35
LEA2CotAD_37,888CotAD_73,9669600.015280.04420.345761YES0.0157353
LEA2CotAD_23,118CotAD_74,06112150.016110.068820.234049YES5.00E-05
LEA2CotAD_59,405CotAD_76,12962700.006540YES0
LEA2CotAD_13,584Cotton_A_018457500.008780.022940.382644YES0.139381
LEA2CotAD_20,020Cotton_A_0184575000.005680YES0
LEA2CotAD_01700Cotton_A_021967800.099920.589860.169389YES8.68E-22
LEA2CotAD_09578Cotton_A_021967800.09030.599440.150635YES7.07E-24
LEA2Gorai.007G048400.1Cotton_A_022945762.546711.772811.43654NO0.3084
LEA2CotAD_02652Cotton_A_023706360.012560.033110.379343YES0.101339
LEA2CotAD_14,147Cotton_A_023706360.004160.013120.316903YES0.244174
LEA2CotAD_09685Cotton_A_054447530.00890.043870.202818YES0.00516244
LEA2CotAD_10,376Cotton_A_056258310.006450.034440.187227YES0.00723285
LEA2CotAD_19,375Cotton_A_064356750.013450.055410.242679YES0.00759106
LEA2CotAD_01700Cotton_A_070367800.015510.037010.419158YES0.0752732
LEA2CotAD_09578Cotton_A_070367800.003420.010370.33004YES0.256013
LEA2Cotton_A_02196Cotton_A_070367800.094280.607650.155148YES1.27E-23
LEA2CotAD_12,681Cotton_A_082124320.031210.049280.633189YES0.35887
LEA2Gorai.005G043200.1Cotton_A_083347920.005070.015260.332323YES0.16956
LEA2CotAD_03649Cotton_A_086639600.005490.004371.25606NO0.744588
LEA2CotAD_37,888Cotton_A_086639600.043780.558390.07841YES1.73E-37
LEA2CotAD_10,044Cotton_A_0947319020.002740.002281.20458NO0.731531
LEA2CotAD_46,888Cotton_A_095965730.009220.04530.203506YES0.0147038
LEA2CotAD_46,873Cotton_A_096156300.008350.034520.241852YES0.0372109
LEA2CotAD_32,487Cotton_A_132406300.004250.019170.221854YES0.103356
LEA2CotAD_17,101Cotton_A_134696660.001950.013180.148053YES0.121749
LEA2CotAD_31,535Cotton_A_134696660.013770.047180.291898YES0.0234164
LEA2CotAD_31,536Cotton_A_134706270.002110.033730.062455YES0.00360292
LEA2CotAD_17,103Cotton_A_134718372.587122.323971.11323NO0.778217
LEA2CotAD_17,045Cotton_A_143546570.002010.012620.159578YES0.13409
LEA2CotAD_17,062Cotton_A_143707320.00990.026480.374024YES0.0618224
LEA2CotAD_21,731Cotton_A_143707320.008990.023540.381916YES0.138838
LEA2CotAD_03649Cotton_A_144789600.045920.529720.086683YES3.29E-35
LEA2CotAD_37,888Cotton_A_144789600.012470.035280.353401YES0.0321315
LEA2CotAD_25,271Cotton_A_146764050.006470.032340.200226YES0.085476
LEA2CotAD_31,140Cotton_A_159987470.001740.00580.300994YES0.356655
LEA2CotAD_20,308Cotton_A_176255730.013750.022960.59881YES0.347235
LEA2CotAD_44,941Cotton_A_179867200.012330.013690.900555YES0.874489
LEA2CotAD_13,827Cotton_A_1864511042.120921.896531.11832NO0.642563
LEA2CotAD_21,924Cotton_A_189197860.010280.052190.196967YES0.00026749
LEA2CotAD_19,078Cotton_A_2317264800.006720YES0
LEA2CotAD_35,069Cotton_A_243569540.005510.031780.173291YES0.00508945
LEA2CotAD_35,091Cotton_A_243716993.503091.611862.17333NO0.036477
LEA2CotAD_22,539Cotton_A_251954081.232651.241120.993172YES1
LEA2CotAD_23,646Cotton_A_272826090.025870.037380.692044YES0.542393
LEA2CotAD_23,646Cotton_A_273006090.042490.131350.323481YES0.000630664
LEA2Cotton_A_27282Cotton_A_273006090.043630.118180.369227YES0.00568388
LEA2CotAD_08181Cotton_A_2754360600.006970YES0
LEA2CotAD_40,972Cotton_A_296595910.966592.07090.466747YES0.00123143
LEA2CotAD_48,976Cotton_A_2977966000.006420YES0
LEA2CotAD_19,214Cotton_A_308895430.002370.00830.285978YES0.347253
LEA2CotAD_35,514Cotton_A_308895430.007160.016590.431351YES0.312651
LEA2CotAD_35,513Cotton_A_308906510.021930.051020.429783YES0.0738291
LEA2CotAD_13,115Cotton_A_310595760.02070.03790.546252YES0.312514
LEA2CotAD_30,219Cotton_A_324955970.011050.036260.304817YES0.0618481
LEA2CotAD_50,359Cotton_A_335486330.016780.033880.495283YES0.175709
LEA2CotAD_74,713Cotton_A_335486330.016780.033880.495283YES0.175709
LEA2CotAD_23,118Cotton_A_3811712150.016110.060770.265138YES0.000321992
LEA2CotAD_56,699Cotton_A_385346390.020210.044930.449883YES0.106618
LEA2CotAD_56,696Cotton_A_385356300.018380.022690.809786YES0.670475
LEA2CotAD_59,405Cotton_A_403636270.006360.040160.158424YES0.00848415
LEA2CotAD_46,888Gorai.001G122700.15730.00460.01480.310385YES0.238274
LEA2CotAD_46,873Gorai.001G124400.16300.002080.006740.30909YES0.361889
LEA2CotAD_28,872Gorai.005G203000.17200.012330.027620.446407YES0.170613
LEA2CotAD_44,941Gorai.005G203000.17200.001750.013680.127787YES0.0998325
LEA2Cotton_A_17986Gorai.005G203000.17200.010550.027620.382183YES0.12817
LEA2CotAD_30,219Gorai.006G104100.15970.008840.007071.25015NO0.743557
LEA2Cotton_A_32495Gorai.006G104100.15970.011060.043620.25359YES0.0255988
LEA2CotAD_17,101Gorai.006G150200.16660.019770.040180.491966YES0.209339
LEA2CotAD_31,535Gorai.006G150200.16660.003910.019810.197433YES0.082505
LEA2Cotton_A_13469Gorai.006G150200.16660.017770.040180.442182YES0.10598
LEA2CotAD_23,646Gorai.006G199800.16090.042490.114110.372373YES0.00460089
LEA2Cotton_A_27282Gorai.006G199800.16090.043640.101270.4309YES0.0292702
LEA2Cotton_A_27300Gorai.006G199800.16090.008520.014740.578415YES0.130872
LEA2Cotton_A_02294Gorai.007G048400.15762.546711.772811.43654NO0.3084
LEA2Gorai.002G235700.1Gorai.007G048400.15762.493871.67861.48568NO0.261229
LEA2Cotton_A_29142Gorai.007G365600.16300.026410.052370.504231YES0.0996897
LEA2CotAD_08181Gorai.009G305100.16060.004350.021030.206723YES0.090366
LEA2CotAD_19,214Gorai.010G176400.15430.009550.016640.574182YES0.403293
LEA2CotAD_35,514Gorai.010G176400.154300.008220YES0
LEA2Cotton_A_30889Gorai.010G176400.15430.007160.008250.8675YES0.63691
LEA3CotAD_31,344CotAD_37,8889600.04450.582010.076463YES1.76E-39
LEA3CotAD_31,344CotAD_73,9669600.045250.566750.079847YES1.38E-37
LEA3CotAD_31,344Cotton_A_086639600.011030.026620.414477YES0.0917464
LEA3CotAD_31,344Cotton_A_144789600.04520.569760.079332YES9.48E-38
LEA3CotAD_68,063Cotton_A_350396540.00410.006110.670634YES0.565408
LEA3CotAD_76,129Cotton_A_403636270.006360.033310.190971YES0.0241426
LEA4CotAD_73,966Cotton_A_086639600.044530.543630.081917YES8.99E-37
LEA4CotAD_73,966Cotton_A_144789600.005520.008650.63778YES0.447461
LEA4Cotton_A_08663Cotton_A_144789600.044480.546470.081397YES6.22E-37
LEA4CotAD_48,769Cotton_A_1467640500.021410YES0
LEA4CotAD_64,120Cotton_A_229326540.006070.012780.475195YES0.348209
LEA4Gorai.004G155000.1Cotton_A_229326302.642063.593110.735312YES0.675748
LEA4CotAD_46,550Cotton_A_275436060.006540.042420.154221YES0.00764013
LEA4CotAD_64,120Gorai.004G155000.16302.601952.912770.893293YES0.834736
LEA4Cotton_A_22932Gorai.004G155000.16302.642063.593110.735312YES0.675748
LEA4CotAD_70,003Gorai.006G083600.15730.013780.022820.603673YES0.351221
LEA4Cotton_A_17625Gorai.006G083600.15730.018410.022870.804972YES0.670413
LEA4CotAD_46,550Gorai.009G305100.16060.006550.027910.234738YES0.0613694
LEA4Cotton_A_27543Gorai.009G305100.16060.004350.013950.311662YES0.239375
LEA5CotAD_39,719Cotton_A_140098220.003250.009760.333365YES0.258994
LEA5CotAD_53,263Cotton_A_228894920.016310.042790.38131YES0.103316
LEA5CotAD_74,061Cotton_A_3811712150.004260.014760.28873YES0.0817353
LEA6CotAD_42,408Cotton_A_138546423.295241.270992.59265NO0.00291028
LEA6CotAD_60,617Gorai.001G124400.16300.010460.041520.251878YES0.00495557
LEA6Cotton_A_09615Gorai.001G124400.16300.010460.041520.251878YES0.00495557
SMPCotton_A_31083CotAD_62,9969392.266581.800771.25867NO0.353457
SMPCotAD_47,454Cotton_A_074518100.011410.059670.191218YES0.000658403
SMPCotAD_61,391Cotton_A_095965730.00460.007360.624435YES0.545436
SMPCotAD_64,657Cotton_A_189197860.006830.005081.345NO0.764969
SMPCotAD_62,996Cotton_A_2435695400.013510YES0
SMPCotton_A_31083Cotton_A_243569392.266581.85241.22359NO0.3981
SMPCotAD_61,391Gorai.001G122700.15730.018550.053130.349233YES0.0424313
SMPCotton_A_09596Gorai.001G122700.15730.013870.04530.306205YES0.042074
SMPCotton_A_02294Gorai.002G235700.16270.006140.03770.162756YES0.0142575
SMPCotAD_61,173Gorai.004G137100.16450.018610.046460.400532YES0.0204369
SMPCotton_A_31127Gorai.004G137100.16450.019650.04310.45597YES0.124339
SMPCotAD_47,454Gorai.009G452500.18101.799161.141251.57648NO0.0145321
SMPCotton_A_07451Gorai.009G452500.18101.850451.146841.61353NO0.0110846

A B: paralogous gene pair; aa amino acids, K non-synonymous substitutions per non-synonymous site, K synonymous substitutions per synonymous site); K/K the ratio, SMP seed maturation protein, LEA Late embryogenesis abundance, Yes presence of purifying selection while NO absence of purifying selection

Syntenic relationships among LEA genes from G. hirsutum, G. raimondii and G. arboretum. G. hirsutum, G. raimondii and G. arboretum chromosomes are indicated in different colours. The putative orthologous LEA genes between G. hirsutum and G. raimondii, G. hirsutum and G. arboretum, and G. raimondii and G. arboretum by different colours Gene duplication, Ks, Ka and Ka/Ks values calculated for paralogous LEA gene pairs in cotton genome A B: paralogous gene pair; aa amino acids, K non-synonymous substitutions per non-synonymous site, K synonymous substitutions per synonymous site); K/K the ratio, SMP seed maturation protein, LEA Late embryogenesis abundance, Yes presence of purifying selection while NO absence of purifying selection

Prediction of LEA genes (mRNA) targeted by miRNAs in upland cotton

Large groups of small RNAs, known as microRNAs (miRNAs) are reported as the regulators in plant adaptation to abiotic stresses [60]. In transgenic creeping bent grass, Agrostis stolonifera, over expression of rice miR319a showed enhanced salt and drought tolerance [61]; over expression of miR396c and miR394 in plants was due to hypersensitive to salinity stress [62]. In cotton, Gossypium hirsutum, a group of miRNAs and their targets have been identified, and some of them respond to salt and drought stresses [60]. To get more information on LEA genes functions, we carried out the prediction of miRNAs targets on LEA transcripts (mRNA) by the use of psRNATarget, the same as been used for other functional genes in cotton [63]. Out of 242 upland cotton LEA genes, 89 genes were found to be targeted by 63 miRNAs, representing 37% of all the LEA genes (Additional file 4: Table S3). The highest levels of target were detected on the following genes with more than 6 miRNAs; CotAD_00799 (6 miRNAs), CotAD_06037 (9 miRNAs), CotAD_13,827 (6 miRNAs), CotAD_19,205 (6 miRNAs), CotAD_31,936 (6 miRNAs) CotAD_33,143 (6miRNAs), CotAD_41,925 (8miRNAs) and CotAD_69,738 (7miRNAs) as highlighted in (Additional file 4: Table S3). The rest of the genes were either targeted by one or not more than 5 miRNAs. The high number of miRNAs targeting LEA genes could possibly had direct or indirect correlation to their stress tolerance levels to abiotic stress more so drought. Some specific miRNAs had high level of target to various genes such as ghr-miR164 (5 genes), ghr-miR2949a-3p (7 genes), ghr-miR2950 (10 genes), ghr-miR7492a (10 genes), ghr-miR7492b (10 genes), ghr-miR7492c (10 genes), ghr-miR7495a (10 genes), ghr-miR7495b (10 genes), ghr-miR7504a (5 genes), ghr-miR7507 (5 genes), ghr-miR7510a (6 genes), ghr-miR7510b (10 genes), ghr-miR827b (4 genes) and lastly ghr-miR827c (4 genes). It has been found that miRNAs might be playing a role in response to drought and salinity stresses through targeting a series of stress-related genes [60]. Cotton ghr-miR7510b not only involved in drought stress but also highly up regulated in ovule and fibre, thus has an integral role in fibre formation [64]. Deep sequencing of miRNA under drought and salinity, ghr-miR408a, ghr-miR2911, ghr-miR156a/c/d and ghr-miR3954a/b were found to have differential expression in either of the stress factors, drought and salt stress [60]. The biological processes, molecular functions and cellular components of cotton LEA genes were examined according Gene Ontology (GO) data base. Blast2GO v4.0 was used to carry out the analysis (Fig. 6 and Additional file 5: Table S4). The results showed that the 242 LEA genes were putatively involved in a range of biological processes. Of the 5 terms of biological processes defined by Blast2Go terms, most LEA genes were predicted to function in the response to desiccation (~29%), followed by response to stress and response to defense. Molecular function prediction indicated that all 242 LEA genes, majority were involved in signal transducer activity, transferase activity and DNA binding. In cellular component prediction of LEA genes exhibited to be involved in membrane were 117, membrane parts (113), cell (13) and cell part (13). Higher numbers of upland cotton, G. hirsutum LEA genes were mainly involved in cellular component and molecular functions and few were found to be involved in biological processes. In all the LEA groups, molecular functions, biological process and cellular components were noted except in LEA 1 in which only two GO functions, biological process and cellular components were observed.
Fig. 6

Gene Ontology (GO) annotation results for upland cotton LEA genes. GO analysis of 242 LEA protein sequences predicted for their involvement in biological processes (BP), molecular functions (MF) and cellular components (CC). For the results presented as detailed bar diagrams, as illustrated in Additional file 5: Table S4

Gene Ontology (GO) annotation results for upland cotton LEA genes. GO analysis of 242 LEA protein sequences predicted for their involvement in biological processes (BP), molecular functions (MF) and cellular components (CC). For the results presented as detailed bar diagrams, as illustrated in Additional file 5: Table S4 Promoter sequences, 2 kb upstream and downstream of the translation start and stop site of all the LEA genes were obtained from the cotton genome project. Transcriptional response elements of LEA genes promoters were analyzed using the PLACE database (http://www.dna.affrc.go.jp/PLACE/signalscan.html) [36]. In order to determine cis –acting regulator element, we queried a section of the sequence of each gene, but only the start and end codon were used for the selection of cis–promoter elements. Analysis of the promoter region of all upland cotton LEA genes identified the presence of various stress responsive cis-acting regulatory elements, including DRE/CRT, ABRE, LTRE and MYBS. These stress-responsive elements were relatively abundant in the promoters of the upland cotton LEA genes, more specifically ABREs (Fig. 7 and Additional file 6: Table S5), indicating that LEA proteins may have an important functional role in drought stress response and tolerance in upland cotton, G. hirsutum. There were significant differences in the average proportions of the promoter elements detected within the different LEA gene families (Fig. 7).
Fig. 7

Average number of the cis-promoters ABRELATERD1 (ACGTG), DRECRTCOREAT (G/ACCGAC), MYBCORE (TAACTG), LTRE1HVBLT49 (CCGAC) and others in promoter region of Gossypium hirsutum LEA genes from each LEA families. The promoter regions were analyzed in the 1 kb upstream promoter region of translation start site using the PLACE database

Average number of the cis-promoters ABRELATERD1 (ACGTG), DRECRTCOREAT (G/ACCGAC), MYBCORE (TAACTG), LTRE1HVBLT49 (CCGAC) and others in promoter region of Gossypium hirsutum LEA genes from each LEA families. The promoter regions were analyzed in the 1 kb upstream promoter region of translation start site using the PLACE database The upland cotton LEA genes from LEA 2, LEA3, SMP and dehydrins gene families contained the highest average proportions of stress-responsive elements, while those from LEA 1 and LEA 6 contained the lowest proportions. ABRELATERD1 (ACGTG) was the dominant cis promoter elements, similar findings, with the predominance of ABRE cis-element, have been reported for LEA genes in tomato [46], Arabidopsis [40] and Chinese plum [47]. ABRE is a cis-acting element majorly involved in abscisic acid signaling in response to abiotic stresses, while DRE/CRT and LTRE are major cis-acting regulatory elements involved in the ABA-independent gene expression in response to water deficit (DRE/CRT) and cold (DRE/CRT and LTRE) [65]. MYBS is well-studied cis-acting promoter element with key role in the abscisic acid-dependent signaling pathway in response to drought, salt and cold [66].

Upland cotton LEA genes expression analysis under drought stress

To examine the expression profile of LEA proteins family in various tissues under drought stress treatments, we selected 42 LEA genes based on phylogenetic tree analysis, intron–exon and protein motif features, for each LEA group. Three cotton genotypes, G. tomentosum, a wild type known to be drought resistant, G. hirsutum, an elite cultivar, though drought susceptible cultivar and their backcross type BC2F1 generation were cultivated in the greenhouse under drought simulated and well watered condition. The qRT-PCR analysis was done on the three sets of accessions on different plant organs, roots, stems, and leaves. The results showed that LEA genes were differentially expressed under drought treatment across different tissues tested. Based on the cluster analysis, gene expression profiling were categorized into 2 groups, sub group 1, included 15 genes; the majority in this cluster of genes were up regulated in G. tomentosum in all tissues after 14 days of stress exposure and down regulation after 7 days of stress except in root, which showed partial expression. In G. hirsutum, majority of the genes were up regulated after treatment except in leave tissues in which the genes showed down regulation. In BC2F1, majority of the genes were down regulated after one week of exposure but expression was high after two weeks under drought stress. This result showed that these genes might be involved either directly or indirectly in drought stress, and their role was majorly concentrated in roots and stem. The second cluster, with 28 genes, the majority of the genes showed down regulation after one week of stress in all the tissues across the three genotypes. Some genes were up regulated across the three genotypes after 2 weeks of drought treatment. The results exhibited differential expression pattern in the 3 genotypes tested (Fig. 8). Some of the LEA genes were differentially expressed in the three plant organs and genotypes tested while others showed same expression pattern in different tissues, this could be due to functional divergence of LEA genes during plant development, for instance, CotAD_16,595 and CotAD_40,972 were highly expressed in the roots in all the three genotypes (Fig. 1), implying that they could be responsible for enhancing roots traits to drought tolerance. CotAD_13,827 and CotAD_31,906 were highly expressed in the leaves while others such as CotAD_10,044 and CotAD_03264 were highly up regulated in the stem. Further analysis of the expression showed that more than a half the upland cotton LEA genes were increased in roots and leaves at 7th and 14th day of stress as opposed to the stem. Roots and leaves tissues are highly sensitive to drought, the roots is the first organ to be affected by water deficit [67]. The leaves wilt or become chlorotic in stress conditions and affects photosynthesis process [68].
Fig. 8

Differential expression of upland cotton LEA genes under drought stress. The heat map was visualized using Mev.exe program. (Showed by log2 values) in control, and in treated samples 7 and 14 days after drought treatment. a – BC2F1 (offspring), b – Gossypium tomentosum and c – Gossypium hirsutum. (i) Yellow – up regulated, blue – down regulated and black- no expression. (i). Percentage of genes exhibiting different responses to dehydration in leaf, root and stem of BC2F1; (ii). Percentage of genes exhibiting different responses to dehydration in leaf, root and stem of Gossypium tomentosum (iii). Percentage of genes exhibiting different responses to dehydration in leaf, root and stem of Gossypium hirsutum

Differential expression of upland cotton LEA genes under drought stress. The heat map was visualized using Mev.exe program. (Showed by log2 values) in control, and in treated samples 7 and 14 days after drought treatment. a – BC2F1 (offspring), b – Gossypium tomentosum and c – Gossypium hirsutum. (i) Yellow – up regulated, blue – down regulated and black- no expression. (i). Percentage of genes exhibiting different responses to dehydration in leaf, root and stem of BC2F1; (ii). Percentage of genes exhibiting different responses to dehydration in leaf, root and stem of Gossypium tomentosum (iii). Percentage of genes exhibiting different responses to dehydration in leaf, root and stem of Gossypium hirsutum

Discussion

LEA proteins family is a large and widely diversified across plant kingdom [15]. The LEA gene family has been identified in several crops, such as rice and maize [69], in other organisms such as invertebrates and microorganisms [70]. However, characterization of the LEA protein family and their role in drought stress tolerance in upland cotton has never been reported. In this study, we identified different numbers of LEA genes in G. hirsutum (242), G. arboreum (136), G. raimondii (142), A. thaliana (51) and P. tabuliformis (30). The number of LEA genes in cotton genome AD (G. hirsutum) were higher than A (G. arboreum) and D (G. raimondii). The number of LEA genes in AD is approximately 1.78 and 1.70 times that in A and D respectively. The high number of LEA genes in G. hirsutum were more likely caused by gene duplication and the conservation of the LEA genes during the polyploidization process, signifying the important role played by these groups of gene families in the process of plant growth and development [40]. Gene duplication, is a major feature of genomic architecture, with cardinal role in the process of plant genomic and organismal evolution, resulting into new raw genetic materials for genetic drift, mutation and selection, which ultimately results into emergence of new gene functions and evolution of gene networks [71]. Gene duplication mechanism not only lead to the expansion of genome content but aids in the diversification of gene function to ensure adequate adaptability and evolution of plants [55]. The tetraploid cotton have undergone whole genome duplication events during their evolution period [72] and G. hirsutum emerged through allopolyploidy [73]. In this study, only 46 (17%) tandemly duplicated genes were detected, similarly only 6 genes were found to be tandemly duplicated in Brassica napus, perhaps because the evolution of upland cotton and Brassica napus are due to whole genome duplication [74, 75]. Majority of the upland LEA genes showed a close relationship with respect to the block locations of G. arboreum and G. raimondii LEA genes. A phylogenetic analysis provided evidence of the contribution of whole genome duplication contribution to Upland cotton LEA genes abundance. LEA gene expansion through whole genome duplication have been observed in Arabidopsis [37] and Brassica napus [86]. The Gossypium arboreum genome contained 136 LEA genes and G. raimondii genome had 142 LEA genes; therefore, a WGD process would be expected to produce more than 242 LEA genes in Gossypium hirsutum. The LEA genes numbers proportions in G. hirsutum (tetraploid) implied that a larger number of the duplicated LEA genes were lost or became functionless after whole genome duplication. The loss of Upland cotton LEA genes could have been due to chromosomes rearrangement, the same mechanism was also observed in the case of Brassica [76]. The expansion of LEA genes in upland cotton was majorly through segmental duplication, 44% (130/242) of the upland cotton LEA genes emerged through segmental duplication. This finding is concurrent to observation made in Brassica 72 out of 108 genes occurred through segmental duplication [39] and in Arabidopsis in which 24% of its LEA genes arose through segmental type of gene duplication [40]. In synteny analysis, we identified 241 pairs with high similarity, implying that most LEA gene family members are embedded in highly-conserved syntenic regions, and some genes were either lost or recovered. The loss or gain of genes within the syntenic region have been observed in a number of gene families not only in LEA genes [77]. Characterization and structural analysis of genes with major functions on abiotic and biotic stress factors have been found to have fewer introns [48]. The analysis of the upland cotton LEA genes, LEA 1, 3, 4, 5, SMP and dehydrins genes had one to four intron with exception of LEA 2 and 6, which had zero to five introns. The reduced intron numbers in stress responsive genes have been recorded, such as trehalose-6-phosphate synthase gene family which plays an important role in abiotic stress and metabolic regulation [78]. The existence of introns in a genome is argued to cause enormous burden on the host [79]. The burden is because the introns requires a spliceosome, which is among the largest molecular complexes in the cell, comprising 5 small nuclear RNAs and more than 150 proteins [79]. It has also been found that intron transcription is costly in terms of time and energy [80]. Moreover, introns can extend the length of the nascent transcript, resulting into an additional expense for transcription [81]. The motif protein analysis and composition of each LEA gene family largely varied, although some amino acid-rich regions were detected, similar to previous studies done on Arabidopsis [40] and legumes [82]. We found that that genes belonging to the same families exhibited similar gene structure and motif composition. This results is consistent to previous studies which recorded similar exon - intron and protein motif within the same group of the LEA genes [23]. LEA proteins have disordered structure along their sequences due to their amino acid compositions [83]. LEA proteins play key roles in the plant cell despite of their disordered structure [41], they have the ability to form chaperons with other elements [84].The structural flexibility of the LEA proteins facilitate interactions with other macromolecules, such as membrane proteins, hence cell membrane stability during drought stress [85]. These results demonstrate that LEA proteins have intrinsic characteristics which enables them to functions as flexible integrators in protecting other molecules under drought stress and other forms of abiotic stress factors [86]. In relation to gene ontology (GO) analysis, biological processes, molecular functions and cellular components are features of genes or gene products that enable us to understand the diverse molecular functions of proteins. Cellular component and molecular activities were highest among the upland cotton LEA proteins, this could be in line with their functions of protecting the membranes and enzymes to maintain cellular activities under drought stress conditions [87]. The finding in this study is concurrent to previous studies which reported that LEA proteins are mainly located in subcellular regions such as chloroplast, nucleus, cytoplasm and mitochondria in Arabidopsis [40] and tomato [46]. The subcellular localization and the role of the LEA protein in the cell are positively correlated. Binding to different molecules such as ATP binding (GO: 0005524), sequence-specific DNA binding (GO: 0003700) and zinc ion binding (GO: 0008270) were the major activities for the action of upland LEA proteins as molecular function. Binding of LEA proteins to nucleic acids in order to protect cellular structures by constructing hydrogen network was reported, which is related to the roles of LEA proteins in drought stress tolerance [88]. In addition, LEA protein family groups have been found to enhance membrane stabilization through chaperons formation with phospholipids and other sugar molecules as described in model membranes under drought condition [87]. The molecular function of LEA proteins in drought stress may be through the binding activity. In addition, biological processes in response to stress factors were dominant, response to desiccation (GO: 0009269); abscisic acid transport (GO: 0080168); response to stress (GO: 0006950); response to water (GO: 0009415); auxin-activated signalling pathway (GO: 0009734); response to water deprivation (GO: 0009414); response to cytokinin (GO: 0009735) and phosphorylation (GO: 0016310). These biological roles detected in cotton LEA proteins were consistent with earlier findings of biological functions of the LEA proteins such as oxidant scavenging activity, enzyme and nucleic acid preservation and membrane maintenance, these biological functions protect cell structures from the deleterious effects of drought and other abiotic stress factors [89]. Our findings is further supported by the highly up regulation of LEA proteins in various studies done in transgenic modal plant, Arabidopsis [90] and bacteria [91]. The small RNAs are a diverse class of non-coding regulatory with important function in gene regulation under drought stress conditions by destroying the target gene transcripts in plants [92]. The analysis of upland cotton miRNAs showed that 89 LEA transcripts were targeted by 63 different miRNAs. The NAC gene family are plant-specific transcriptional factors known to play diverse roles in various plant developmental processes, MYB is a transcriptional factor family mainly involved in controlling various processes like responses to biotic and abiotic stresses, development, differentiation, metabolism, defense among other biological processes while mitogen-activated protein kinase (MAPK) gene families also do play an important roles in plant growth, development and defense response. The three plant transcriptional factors, MYB, NAC and MAPK are ranked top under the context of drought and salinity indicating their important roles for the plant to combat drought and salinity stress. Through target prediction, a series of cotton miRNAs were found to be associated with MYB, NAC and MAPK genes including miR164 [60, 93]. In this research work miR164 was found to target four (5) LEA genes, CotAD_03784, CotAD_07516, CotAD_19,375, CotAD_24497and CotAD_63,174. The association of these 5 LEA genes with miR164, which have been found to be linked to highly ranked plants transcription factors under drought and salt stress, provides a strong evidence of a major role played by LEA genes in drought stress. A small RNA like miR827 have been found to confer drought tolerance in transgenic Arabidopsis, homologous form of the same miRNA, denoted as Hv-miR827, have been proved to confer drought tolerance in barley [94]. The same miRNA, ghr-miR827a/b/c/d, was found to target 12 different LEA genes, this implied that these genes targeted by the miRNA had a direct functional role in enhancing drought tolerance in upland cotton. Gene promoters, also termed as cis-element, play various key roles in the transcriptional regulation of genes controlling a number of abiotic stress and plant hormones responses. Phyto-hormones enhance the ability of plants to adapt to changing environments. Many abiotic stress-related and plant hormones-related cis-elements, including W-Box, MBS, HSE, ABRE and TCA-elements, have been identified [95]. All of these and other cis-elements were detected in our investigation. Therefore, the results obtained is in agreement to the various cis-acting element detected in the analysis of LEA genes in various plants such as tomato [46], Chinese plum [47], in brassica [39] and poplar [48]. In each LEA gene, contained more than four cis-elements related to abiotic stress signal responsiveness, which provides strong evidence, that these genes might have important functions under different drought stresses. A number of studies have shown that LEA genes do have a significant contribution in drought stress [69]. From the heatmap and expression pattern of cotton LEA gene families, high number of LEA genes showed higher expression levels across all the plant organs tested. The high expression levels of these genes under drought condition, indicates the maintenance functions during stress conditions, leading to drought tolerance of the plants. A unique observation was made, in which high percentage of the LEA 2 genes, used in the expression analysis, almost all showed high expression, and it would be of interest to characterize this group of LEA gene family in upland cotton. High expression pattern of the LEA genes have been observed in Brassica napus [39], Prunus meme [47], Arabidopsis [40] and sweet orange [53]. LEA genes have been found to have wide distribution and abundance among the terrestrial plants as opposed to aquatic plants, the abundance of these gene families could be pegged on to their conservative role, aquatic plants do not suffer from drought stress, thus the shrinking number of LEA genes. Therefore, the finding of this work and previous publication on function of LEA genes may explain why the LEA gene family have a wider distribution in terrestrial plants but not moss plants [96]. LEA families with close taxonomic relationships generally exhibited similar scales and distributions. However, the scales of the LEA gene family differ in upland cotton Gossypium hirsutum and other higher plants such as Theobroma cacao L, sweet orange, Arabidopsis among others; this could be due to changes in the environmental conditions. The high number of LEA genes in upland cotton, suggested that stress adaptation might have initiated the evolution of protein coding sequences which are LEA specific. Similar observation have been reported in maize in which adaptation to abiotic stress led to evolution of protein coding sequences, leading to the variation of LEA genes in maize compared to rice [97]. We carried out the expression profiling of the LEA genes in drought susceptible upland cotton, Gossypium hirsutum, drought tolerant cultivar Gossypium tomentosum and their BC2F1 offspring. High numbers of genes were found to be highly up regulated in G. tomentosum than in G. hirsutum (Figs. 8 and 9). The results obtained is concurrent to similar findings which have been reported in maize landraces with varying drought stress tolerance levels when compared at the transcriptional level [98]. This result implied that more tolerant cotton genotypes had a greater ability to rapidly adjust more genes under drought stress than the more susceptible cotton cultivars. Moreover, rapid adjustments of greater number of differentially expressed genes and of different transcriptome factor families is considered an important trait of the drought tolerant genotypes [99].
Fig. 9

Quantitative PCR analysis of the selected LEA genes. Abbreviations: Rt (root), Sm (stem) and Lf (leaf). 0, 7 and 14 days of stress. Gh - Gossypium hirsutum, Gt - Gossypium tomentosum and BC-BC2F1 offspring. Y-axis: relative expression (2−ΔΔCT)

Conclusions

This research work provides the very first detailed analysis, characterization and expression profile of upland LEA genes under drought condition. A total of 242 LEA genes were identified in upland cotton and divided into eight groups. Chromosomal mapping and syntenic analysis showed that all the LEA genes were distributed in all the cotton chromosomes with some genes clustering either on the upper arm or the middle region of the chromosomes. Segmental gene duplication was found to have played a major role in the expansion of upland cotton LEA genes coupled with whole genome duplication. High numbers of cotton LEA genes had few introns. Genes belonging to the same family exhibited similar gene structures and protein motif composition. Expression profiling of the selected LEA genes showed differential expression under drought treatment across different plant organs. The outcome of this research provides the most current information thus will increases our understanding of LEA genes in cotton and the general role in drought stress tolerance. This work lays the very first foundation for further investigations of the very specific functions of these LEA proteins in cotton in reference to drought stress and other abiotic stress factors. Quantitative PCR analysis of the selected LEA genes. Abbreviations: Rt (root), Sm (stem) and Lf (leaf). 0, 7 and 14 days of stress. Gh - Gossypium hirsutum, Gt - Gossypium tomentosum and BC-BC2F1 offspring. Y-axis: relative expression (2−ΔΔCT) List of primers used for upland cotton, Gossypium hirsutum LEA genes expression analysis under drought stress. (DOCX 19 kb) LEA gene in upland cotton, Gossypium hirsutum and their subcellular location prediction. The colour scheme indicates where the genes are sub-localized. (DOCX 90 kb) Phylogenetic tree, gene structure and motif compositions of LEA 2 genes in upland cotton. The phylogenetic tree was constructed using MEGA 6.0. Exon/intron structures of LEA genes in upland cotton, exons introns and up / down-stream were represented by yellow boxes, black lines and blue boxes, respectively. Protein motif analysis represented by different colours, and each motif represented by number. (PDF 3609 kb) LEA genes and mRNA targets. (DOCX 53 kb) Gene ontology (GO) terms annotation of LEA genes in upland cotton. (DOCX 66 kb) Cis element analysis of putative LEA promoters related to drought stress. (DOCX 719 kb) Data. Newick format for the phylogenetic tree. (NWK 19 kb)
  83 in total

1.  Preformed structural elements feature in partner recognition by intrinsically unstructured proteins.

Authors:  Monika Fuxreiter; István Simon; Peter Friedrich; Peter Tompa
Journal:  J Mol Biol       Date:  2004-05-14       Impact factor: 5.469

2.  Inventory, evolution and expression profiling diversity of the LEA (late embryogenesis abundant) protein gene family in Arabidopsis thaliana.

Authors:  Natacha Bies-Ethève; Pascale Gaubier-Comella; Anne Debures; Eric Lasserre; Edouard Jobet; Monique Raynal; Richard Cooke; Michel Delseny
Journal:  Plant Mol Biol       Date:  2008-02-12       Impact factor: 4.076

3.  Late embryogenesis abundant proteins: versatile players in the plant adaptation to water limiting environments.

Authors:  Yadira Olvera-Carrillo; José Luis Reyes; Alejandra A Covarrubias
Journal:  Plant Signal Behav       Date:  2011-04-01

4.  Identification of the trehalose-6-phosphate synthase gene family in winter wheat and expression analysis under conditions of freezing stress.

Authors:  D W Xie; X N Wang; L S Fu; J Sun; W Zheng; Z F Li
Journal:  J Genet       Date:  2015-03       Impact factor: 1.166

5.  Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution.

Authors:  Fuguang Li; Guangyi Fan; Cairui Lu; Guanghui Xiao; Changsong Zou; Russell J Kohel; Zhiying Ma; Haihong Shang; Xiongfeng Ma; Jianyong Wu; Xinming Liang; Gai Huang; Richard G Percy; Kun Liu; Weihua Yang; Wenbin Chen; Xiongming Du; Chengcheng Shi; Youlu Yuan; Wuwei Ye; Xin Liu; Xueyan Zhang; Weiqing Liu; Hengling Wei; Shoujun Wei; Guodong Huang; Xianlong Zhang; Shuijin Zhu; He Zhang; Fengming Sun; Xingfen Wang; Jie Liang; Jiahao Wang; Qiang He; Leihuan Huang; Jun Wang; Jinjie Cui; Guoli Song; Kunbo Wang; Xun Xu; John Z Yu; Yuxian Zhu; Shuxun Yu
Journal:  Nat Biotechnol       Date:  2015-04-20       Impact factor: 54.908

6.  Gene cloning and characterization of a soybean (Glycine max L.) LEA protein, GmPM16.

Authors:  Ming-der Shih; Shu-Chin Lin; Jaw-Shu Hsieh; Chi-Hua Tsou; Teh-Yuan Chow; Tsai-Piao Lin; Yue-Ie C Hsing
Journal:  Plant Mol Biol       Date:  2005-03-24       Impact factor: 4.076

7.  Functional dissection of hydrophilins during in vitro freeze protection.

Authors:  José L Reyes; Francisco Campos; Hui Wei; Rajeev Arora; Yongil Yang; Dale T Karlson; Alejandra A Covarrubias
Journal:  Plant Cell Environ       Date:  2008-08-26       Impact factor: 7.228

8.  Identification of a novel LEA protein involved in freezing tolerance in wheat.

Authors:  Kentaro Sasaki; Nikolai Kirilov Christov; Sakae Tsuda; Ryozo Imai
Journal:  Plant Cell Physiol       Date:  2013-11-20       Impact factor: 4.927

9.  Genome sequence of the cultivated cotton Gossypium arboreum.

Authors:  Fuguang Li; Guangyi Fan; Kunbo Wang; Fengming Sun; Youlu Yuan; Guoli Song; Qin Li; Zhiying Ma; Cairui Lu; Changsong Zou; Wenbin Chen; Xinming Liang; Haihong Shang; Weiqing Liu; Chengcheng Shi; Guanghui Xiao; Caiyun Gou; Wuwei Ye; Xun Xu; Xueyan Zhang; Hengling Wei; Zhifang Li; Guiyin Zhang; Junyi Wang; Kun Liu; Russell J Kohel; Richard G Percy; John Z Yu; Yu-Xian Zhu; Jun Wang; Shuxun Yu
Journal:  Nat Genet       Date:  2014-05-18       Impact factor: 38.330

Review 10.  Effects of abiotic stress on plants: a systems biology perspective.

Authors:  Grant R Cramer; Kaoru Urano; Serge Delrot; Mario Pezzotti; Kazuo Shinozaki
Journal:  BMC Plant Biol       Date:  2011-11-17       Impact factor: 4.215

View more
  51 in total

1.  Comparative analysis of transcriptome in two wheat genotypes with contrasting levels of drought tolerance.

Authors:  Jitendra Kumar; Samatha Gunapati; Shahryar F Kianian; Sudhir P Singh
Journal:  Protoplasma       Date:  2018-04-12       Impact factor: 3.356

Review 2.  How do plants remember drought?

Authors:  Ayan Sadhukhan; Shiva Sai Prasad; Jayeeta Mitra; Nadeem Siddiqui; Lingaraj Sahoo; Yuriko Kobayashi; Hiroyuki Koyama
Journal:  Planta       Date:  2022-06-10       Impact factor: 4.116

3.  Transcriptomics for Drought Stress Mediated by Biological Processes in-relation to Key Regulated Pathways in Gossypium darwinii.

Authors:  Cuilian Xu; Muhammad Kashif Ilyas; Richard Odongo Magwanga; Hejun Lu; M Kashif Riaz Khan; Zhongli Zhou; Yujun Li; Zhengcheng Kuang; Asif Javaid; Danish Ibrar; Abdul Ghafoor; Kunbo Wang; Fang Liu; Haodong Chen
Journal:  Mol Biol Rep       Date:  2022-07-30       Impact factor: 2.742

4.  GBS-SNP and SSR based genetic mapping and QTL analysis for drought tolerance in upland cotton.

Authors:  Ravi Prakash Shukla; Gopal Ji Tiwari; Babita Joshi; Kah Song-Beng; Sushma Tamta; N Manikanda Boopathi; Satya Narayan Jena
Journal:  Physiol Mol Biol Plants       Date:  2021-08-20

5.  Late Embryogenesis Abundant (LEA)5 Regulates Translation in Mitochondria and Chloroplasts to Enhance Growth and Stress Tolerance.

Authors:  Barbara Karpinska; Nurhayati Razak; Daniel S Shaw; William Plumb; Eveline Van De Slijke; Jennifer Stephens; Geert De Jaeger; Monika W Murcha; Christine H Foyer
Journal:  Front Plant Sci       Date:  2022-06-16       Impact factor: 6.627

6.  Effect of thermospermine on expression profiling of different gene using massive analysis of cDNA ends (MACE) and vascular maintenance in Arabidopsis.

Authors:  G H M Sagor; Stefan Simm; Dong Wook Kim; Masaru Niitsu; Tomonobu Kusano; Thomas Berberich
Journal:  Physiol Mol Biol Plants       Date:  2021-03-14

7.  Genome-wide association analysis reveals seed protein loci as determinants of variations in grain mold resistance in sorghum.

Authors:  Habte Nida; Gezahegn Girma; Moges Mekonen; Alemu Tirfessa; Amare Seyoum; Tamirat Bejiga; Chemeda Birhanu; Kebede Dessalegn; Tsegau Senbetay; Getachew Ayana; Tesfaye Tesso; Gebisa Ejeta; Tesfaye Mengiste
Journal:  Theor Appl Genet       Date:  2021-01-16       Impact factor: 5.699

8.  Overexpression of Ks-type dehydrins gene OeSRC1 from Olea europaea increases salt and drought tolerance in tobacco plants.

Authors:  Samuel Aduse Poku; Zafer Seçgin; Musa Kavas
Journal:  Mol Biol Rep       Date:  2019-08-05       Impact factor: 2.316

9.  Contrasting roles of GmNAC065 and GmNAC085 in natural senescence, plant development, multiple stresses and cell death responses.

Authors:  Bruno Paes Melo; Isabela Tristan Lourenço-Tessutti; Otto Teixeira Fraga; Luanna Bezerra Pinheiro; Camila Barrozo de Jesus Lins; Carolina Vianna Morgante; Janice Almeida Engler; Pedro Augusto Braga Reis; Maria Fátima Grossi-de-Sá; Elizabeth Pacheco Batista Fontes
Journal:  Sci Rep       Date:  2021-05-27       Impact factor: 4.379

10.  Transcriptome Analysis of Tolerant and Susceptible Maize Genotypes Reveals Novel Insights about the Molecular Mechanisms Underlying Drought Responses in Leaves.

Authors:  Joram Kiriga Waititu; Xingen Zhang; Tianci Chen; Chunyi Zhang; Yang Zhao; Huan Wang
Journal:  Int J Mol Sci       Date:  2021-06-29       Impact factor: 5.923

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.