
SSR-Linkage map of interspecific populations derived from Gossypium trilobum and Gossypium thurberi and determination of genes harbored within the segregating distortion regions.

Pengcheng Li1,2, Joy Nyangasi Kirungu1, Hejun Lu1, Richard Odongo Magwanga1,3, Pu Lu1, Xiaoyan Cai1, Zhongli Zhou1, Xingxing Wang1, Yuqing Hou1, Yuhong Wang1, Yanchao Xu1, Renhai Peng4, Yingfan Cai2, Yun Zhou2, Kunbo Wang1, Fang Liu1.   

Abstract

Wild cotton species carry significant agronomic traits that can be introgressed into elite cultivated varieties. A genetic map is important for exploring, identifying and mining genes that carry such traits. In this study, 188 F2 mapping individuals were developed from Gossypium thurberi (female) and Gossypium trilobum (male) and genotyped with simple sequence repeat (SSR) markers. A total of 12,560 SSR markers developed by Southwest University (hence coded SWU) were screened, of which only 994 were polymorphic, and 849 markers were linked on all 13 chromosomes. The map had a length of 1,012.458 cM with an average marker distance of 1.193 cM. Segregation distortion regions (SDRs) were observed on Chr01, Chr02, Chr06, Chr07, Chr09, Chr10 and Chr11, with a large proportion of the SDRs skewed towards the heterozygous genotype. Syntenic blocks revealed good collinearity between the genetic map and the physical map of G. raimondii, compared with the Dt_sub genomes of G. hirsutum and G. barbadense. A total of 2,496 genes were mined within the SSR-related regions. The proteins encoded by the genes mined within the SDRs had varied physicochemical properties: molecular weights ranged from 6.586 to 252.737 kDa, charge from -39.5 to 52, grand average of hydropathy (GRAVY) from -1.177 to 0.936 and isoelectric point (pI) from 4.087 to 12.206. The low (negative) GRAVY values detected indicate that these proteins are largely hydrophilic, a property common among stress-responsive genes. RNA-seq analysis revealed that many of the genes were highly upregulated at various stages of fiber development; for instance, Gorai.002G241300 was highly upregulated at 5, 10, 20 and 25 days post anthesis (DPA).
Validation by RT-qPCR further suggested that the genes mined within the SDRs may play a significant role during fiber development. We therefore infer that Gorai.007G347600 (TFCA), Gorai.012G141600 (FOLB1), Gorai.006G024500 (NMD3), Gorai.002G229900 (LST8) and Gorai.002G235200 (NSA2) are important in fiber development and, in turn, fiber quality; further research is needed to elucidate their exact roles in the fiber development process. Because the average distance between markers is small, the genetic map constructed between the two wild species paves the way for mapping quantitative trait loci (QTLs), and the genes mined in the SSR regions will provide insight for identifying key genes that can be introgressed into cultivated cotton.


Year:  2018        PMID: 30419064      PMCID: PMC6231669          DOI: 10.1371/journal.pone.0207271

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Cotton (Gossypium spp.) is the most important fiber crop and a source of animal feed and edible oil. However, cotton production is significantly threatened by various abiotic and biotic stresses, a situation worsened by intensive selection and inbreeding, which have narrowed its genetic base [1]. Improving cotton yield and fiber quality is important for the survival of the cotton industry [2]. Serious outbreaks of diseases and pests have caused great losses in fiber production and quality. Researchers face difficulties in developing new cotton varieties to meet emerging challenges because of the limited diversity in the germplasm of commercially cultivated upland cotton [3]. Wild cotton species are a rich reservoir of genetic material, much of which carries potentially valuable agronomic traits [4]. Several useful alleles have been introgressed into elite cultivars through interspecific hybridization; for instance, upland cotton (G. hirsutum) with long fiber length was developed through trispecies hybridization involving G. thurberi, G. raimondii and G. barbadense [5]. Drought resistance in upland cotton has been achieved by utilizing alleles from Asiatic cottons [6], and fertile hybrid germplasm has been produced with diploid Australian Gossypium species [7]. Cotton species are either diploid or tetraploid: the diploids have 13 chromosomes per haploid set (2n = 2x = 26), while the tetraploids arose from the merger of two diploid genomes followed by whole-genome duplication, giving 2n = 4x = 52 chromosomes [8]. The diploid species are subdivided into the A, B, C, D, E, F, G and K genomes, while seven species are known for the AD genome [9].
Among the diploid cotton genomes, the D genome has been found to harbor a large number of significant agronomic traits, such as superior fiber quality and high tolerance to both biotic and abiotic stresses [10,11]. In a number of quantitative trait locus (QTL) mapping studies in tetraploid cotton, more QTLs have been mapped on the Dt_sub genome than on the At_sub genome [12-15]. This underscores the significance of the D genome and the potential of genes from D-genome cotton species to improve elite cultivars, which have a narrow genetic base due to intensive selection and inbreeding [16-18]. The two wild species used here, G. thurberi and G. trilobum, are diploid D-genome cottons. G. thurberi Todaro (D1) is native to the Sonoran Desert of Mexico and parts of the southwestern United States of America (USA). G. thurberi has desirable characteristics that can be introgressed into elite cultivars, such as fiber fineness, fiber strength, long fiber, prolific boll bearing, and resistance to Fusarium wilt, frost and cotton bollworms [19]. In addition, G. thurberi has been found to be highly resistant to silverleaf whitefly [20]. The second parent used in this research, G. trilobum (D8), is endemic to western and central Mexico. Its glabrous leaves are a key character for its identification, and it has important agronomic traits such as resistance to Verticillium wilt and drought tolerance [21]. Genetic maps from interspecific crosses in cotton have become vital tools for understanding genome structure and exploring important agronomic traits, and they provide a basis for developing new DNA markers for high-density maps [22]. Currently, only a limited number of genetic maps have been constructed from interspecific crosses between wild D-genome cotton species.
Simple sequence repeats (SSRs) are considered among the markers of choice for genome mapping because they are PCR-based, codominant, multiallelic and hypervariable [23]. SSR markers, derived either from genomic regions or from expressed sequence tags (ESTs), are considered essential for the construction of genetic maps. In addition, EST-SSR markers have been used extensively to unravel the complexity of eukaryotic genomes because they are directly tagged to functional genes [24]. In this study, we developed an F2 population from a cross between two wild D-genome cotton species, G. thurberi and G. trilobum, and used SWU SSR mono-markers to genotype 188 F2 individuals. The genotypes were used to construct a genetic map, and the map enabled us to identify candidate genes, including transcription factors, that may have a profound effect on fiber development. The linkage map and the mined genes will provide a basis for genetic studies such as marker-assisted selection (MAS) and gene transformation.

Materials and methods

Parental materials

G. thurberi (female parent) was crossed with G. trilobum (male parent) to obtain the F1 generation, which was then self-pollinated to produce F2 individuals. A total of 274 F2 plants were obtained by selfing the F1, and 188 of them were randomly selected for genotyping with the polymorphic markers. The two parental materials and the F2 progenies were developed at the National Wild Cotton Nursery in Sanya, Hainan Island, China.

DNA extraction, quantification and electrophoresis

Leaves from the parents, the F1 individual and the F2 progenies were collected and stored at -80 °C. DNA was extracted following the CTAB method [25]. DNA concentration and quality, including the level of RNA contamination, were then checked with a NanoDrop spectrophotometer based on the A260/A280 ratio [26]. The concentration of genomic DNA was also estimated by comparing the size and intensity of each sample band with those of a DNA mass ladder used as a sizing standard. Each sample was then diluted according to its concentration until it was within the working concentration range of 10–100 μg/μl. Polymerase chain reaction (PCR) amplification was conducted on a TAKARA Bio Inc TP 600 thermal cycler. Electrophoresis of the PCR products followed the method described in [25] with minor modifications: the amplified products were separated on 8% denaturing polyacrylamide gels and visualized by silver nitrate staining [27].
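The quality check and dilution step can be sketched as below. This is a minimal illustration, not the protocol used in the study: the 1.8–2.0 A260/A280 acceptance window is an assumed rule of thumb (pure DNA typically reads about 1.8, and higher ratios can point to RNA carry-over), and the dilution simply applies C1V1 = C2V2 in whatever concentration unit the samples were measured in.

```python
def purity_ok(a260: float, a280: float, low: float = 1.8, high: float = 2.0) -> bool:
    """Return True if A260/A280 falls inside an assumed acceptance window.

    The 1.8-2.0 window is an illustrative rule of thumb, not a value from the study.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return low <= a260 / a280 <= high


def dilution_volumes(stock_conc: float, target_conc: float,
                     final_volume: float) -> tuple[float, float]:
    """Apply C1*V1 = C2*V2 and return (stock volume, diluent volume).

    Both concentrations must share a unit; volumes are returned in the
    unit of final_volume.
    """
    if not 0 < target_conc <= stock_conc:
        raise ValueError("target must be positive and not above the stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


if __name__ == "__main__":
    print(purity_ok(1.85, 1.0))            # ratio 1.85 -> True
    print(dilution_volumes(250, 50, 100))  # (20.0, 80.0)
```

The sample readings and volumes in the usage lines are invented; only the arithmetic is meant to carry over.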

Application of SSR markers in genotyping the F2 progenies derived from the two diploid parental lines

We used expressed sequence tag-simple sequence repeat (EST-SSR) mono-markers developed by Southwest University, China, hence the acronym SWU. The SWU markers were developed from the G. raimondii genome. A total of 12,560 markers were screened for polymorphism, yielding 994 polymorphic loci, which were used to genotype the 188 F2 individuals; details of the SWU markers, including forward and reverse primer sequences, are summarized in S1 Table. Individuals with the genotype of the male parent (G. trilobum), the female parent (G. thurberi) and heterozygous F2 progenies were scored as A, B and H, respectively, and missing data were designated as '-'. Loci amplified by multi-allelic markers were named separately using the primer name followed by the suffix a, b, c or d.
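A minimal sketch of the scoring scheme described above: it counts the A (G. trilobum-type), B (G. thurberi-type), H (heterozygous) and '-' (missing) calls for one marker, and names the loci of a multi-allelic primer. The primer name SWU00001 is hypothetical, and leaving a single-locus primer without a suffix is an assumption.

```python
from collections import Counter

# Genotype codes as scored in this study: A = G. trilobum-type, B = G. thurberi-type,
# H = heterozygous, "-" = missing.
CODES = ("A", "B", "H", "-")


def genotype_counts(calls: str) -> dict[str, int]:
    """Count each genotype code for one marker across the F2 individuals."""
    unknown = set(calls) - set(CODES)
    if unknown:
        raise ValueError(f"unexpected genotype codes: {sorted(unknown)}")
    counts = Counter(calls)
    return {code: counts[code] for code in CODES}


def locus_names(primer: str, n_loci: int) -> list[str]:
    """Name the loci amplified by one primer pair: primer + a, b, c, d."""
    suffixes = "abcd"
    if not 1 <= n_loci <= len(suffixes):
        raise ValueError("expected between 1 and 4 loci per primer")
    if n_loci == 1:
        return [primer]  # assumption: single-locus markers keep the bare primer name
    return [primer + s for s in suffixes[:n_loci]]


if __name__ == "__main__":
    print(genotype_counts("AAHHB-H"))  # {'A': 2, 'B': 1, 'H': 3, '-': 1}
    print(locus_names("SWU00001", 2))  # ['SWU00001a', 'SWU00001b']
```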

Genetic map construction

Linkage groups were formed with JoinMap 4.0 using a maximum recombination frequency of 0.40 and a LOD score of 2.5 [28]. Because these markers are newly developed, linkage groups were assigned to chromosomes based on BLAST searches of the marker sequences. The linkage groups were then drawn with MapChart 2.3 [29]. A chi-square (χ2) test was performed to determine whether markers deviated significantly from Mendelian segregation ratios, and markers showing segregation distortion were indicated by asterisks (*P < 0.05, **P < 0.01, ***P < 0.005, ****P < 0.001, *****P < 0.0005, ******P < 0.0001, *******P < 0.00005). Markers that deviated significantly from the expected Mendelian ratio of 3:1 for dominant markers or 1:2:1 for codominant markers were classified as distorted and were used to determine segregation distortion in the linkage groups [30].
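The χ2 test and asterisk coding can be sketched in plain Python. This is an illustration, not JoinMap's implementation: it uses the closed-form upper-tail probabilities for 1 and 2 degrees of freedom (the only cases needed for 3:1 and 1:2:1 tests), and the genotype counts in the example are made up.

```python
import math

# Thresholds matching the asterisk codes in the text (* = P < 0.05, ** = P < 0.01, ...).
THRESHOLDS = (0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005)


def chi_square(observed: list[int], ratio: list[int]) -> float:
    """Pearson chi-square statistic against an expected Mendelian ratio."""
    n, parts = sum(observed), sum(ratio)
    expected = [n * r / parts for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def upper_tail_p(chi2: float, df: int) -> float:
    """P(X >= chi2) for a chi-square variable with df = 1 or df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("only df = 1 (3:1) and df = 2 (1:2:1) are handled here")


def stars(p: float) -> str:
    """One asterisk per threshold that p falls below."""
    return "*" * sum(p < t for t in THRESHOLDS)


def codominant(a: int, h: int, b: int) -> tuple[float, float, str]:
    """Test A:H:B counts of a codominant marker against 1:2:1."""
    chi2 = chi_square([a, h, b], [1, 2, 1])
    p = upper_tail_p(chi2, df=2)
    return chi2, p, stars(p)


def dominant(present: int, absent: int) -> tuple[float, float, str]:
    """Test band-present vs band-absent counts of a dominant marker against 3:1."""
    chi2 = chi_square([present, absent], [3, 1])
    p = upper_tail_p(chi2, df=1)
    return chi2, p, stars(p)


if __name__ == "__main__":
    print(codominant(47, 94, 47))   # perfect 1:2:1 -> (0.0, 1.0, '')
    print(codominant(30, 100, 58))  # chi2 ~ 9.11, P ~ 0.0105 -> '*'
```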

Gene mining, protein characterization and GO functional annotation

Genes were mined using the physical positions of the flanking markers. Because the markers were developed from G. raimondii, the SSR marker sequences were used as BLAST queries against the G. raimondii reference genome assembly. Using these physical positions and the cotton genome database, all genes were retrieved for each chromosome. The approach was similar to that used by Magwanga et al. [11] to obtain conserved genes between two tetraploid cottons, G. hirsutum and G. tomentosum. Further analyses were carried out on the mined genes to determine the characteristics of the encoded proteins and their putative roles in cotton through GO annotation, which was performed with Blast2GO [31]. In addition, the isoelectric point (pI), grand average of hydropathy (GRAVY), charge and molecular mass of the encoded proteins were estimated with the ExPASy server (http://web.expasy.org/compute_pi/).
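ExPASy computes these properties on its server. As a minimal local illustration of one of them, the sketch below computes GRAVY as the mean Kyte-Doolittle hydropathy value over all residues (the definition used by ExPASy ProtParam) and treats a negative value as hydrophilic, the criterion behind the hydrophilic/hydrophobic split reported in the results. The peptide strings are toy examples, not cotton proteins.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = protein.strip().upper().rstrip("*")  # tolerate a trailing stop symbol
    if not seq:
        raise ValueError("empty protein sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def is_hydrophilic(protein: str) -> bool:
    """Negative GRAVY -> hydrophilic; zero or positive -> hydrophobic."""
    return gravy(protein) < 0


if __name__ == "__main__":
    print(round(gravy("ACDE"), 3))  # (1.8 + 2.5 - 3.5 - 3.5) / 4 = -0.675
    print(is_hydrophilic("IVL"))    # GRAVY ~ 4.167 -> False
```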

RNA expression analysis and RT-qPCR validation of the highly upregulated genes

RNA-seq data for the genes mined within the SDRs were obtained from the Cotton Functional Genomics Database (https://cottonfgd.org/). The data were for the reference genome, G. raimondii, profiled at different stages of fiber development. The raw RNA-seq values were log2-transformed and used to construct a heatmap. We then selected 50 highly upregulated genes and carried out real-time quantitative polymerase chain reaction (RT-qPCR) analysis with gene-specific primers (S2 Table) to validate their possible roles in fiber development. Flowers of the two parental lines were tagged, and samples were harvested at 0, 5, 25 and 30 DPA for RT-qPCR. The RT-qPCR analysis was carried out as outlined by Magwanga et al. [1], with cotton GrActin (forward primer "ATCCTCCGTCTAGACCTTG", reverse primer "TGTCCATCAGGCAACTCAT") as the reference gene.
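The text does not state the pseudocount used for the log2 transform or the formula behind the RT-qPCR quantification. The sketch below assumes log2(x + 1) for the heatmap values and the widely used Livak 2^-ΔΔCt method for relative expression, with GrActin as the reference gene and an early time point (for example 0 DPA) as the calibrator; the Ct values in the example are invented.

```python
import math


def log2_transform(values: list[float], pseudocount: float = 1.0) -> list[float]:
    """log2(value + pseudocount); the pseudocount keeps zero expression finite."""
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    return [math.log2(v + pseudocount) for v in values]


def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_cal: float, ct_reference_cal: float) -> float:
    """2^-ΔΔCt: normalize the target to the reference gene, then to the calibrator."""
    delta_sample = ct_target - ct_reference               # ΔCt in the sample of interest
    delta_calibrator = ct_target_cal - ct_reference_cal   # ΔCt in the calibrator sample
    return 2 ** -(delta_sample - delta_calibrator)


if __name__ == "__main__":
    print(log2_transform([0, 1, 3]))                     # [0.0, 1.0, 2.0]
    print(relative_expression(22.0, 18.0, 25.0, 18.0))  # ΔΔCt = -3 -> 8.0
```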

Collinearity analysis

A BLASTN search (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was performed with the thresholds E ≤ 1 × 10⁻⁵, identity ≥ 80% and matched length ≥ 200 bp. The SSR sequences were used as queries against the genome assemblies of (AD)1 [32] and (AD)2 [33] for the collinearity analysis. Markers with the best hits were selected, and the Circos program (http://circos.ca) was used to draw the circos maps.
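The filtering criteria can be applied to BLAST tabular output as sketched below. This assumes the searches are run with command-line BLAST+ `blastn -outfmt 6` (the text cites the NCBI web service) and that "best hit" means the hit with the highest bit score; the marker and chromosome names in the example are hypothetical.

```python
import csv
import io

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def best_hits(tabular: str, max_evalue: float = 1e-5,
              min_identity: float = 80.0, min_length: int = 200) -> dict[str, dict]:
    """Keep hits that pass all thresholds and return the top-bitscore hit per query."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue, identity = float(hit["evalue"]), float(hit["pident"])
        length, bitscore = int(hit["length"]), float(hit["bitscore"])
        if evalue > max_evalue or identity < min_identity or length < min_length:
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[hit["qseqid"]] = {
                "subject": hit["sseqid"],
                "start": int(hit["sstart"]),
                "end": int(hit["send"]),
                "bitscore": bitscore,
            }
    return best


if __name__ == "__main__":
    example = "\n".join([
        "SWU00001\tA01\t95.0\t250\t5\t1\t1\t250\t1000\t1249\t1e-80\t400",
        "SWU00001\tD01\t92.0\t240\t9\t1\t1\t240\t5000\t5239\t1e-70\t350",
        "SWU00002\tA05\t85.0\t150\t10\t0\t1\t150\t200\t349\t1e-30\t180",
    ])
    print(best_hits(example))  # SWU00002 fails the 200 bp length filter
```

Keeping only the single top hit per marker mirrors "markers with the best hits were chosen"; a stricter pipeline might also drop markers whose top two hits have near-equal scores.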

Results

Parental polymorphism

A total of 12,560 SWU primers were used to screen for interspecific polymorphism between the two parental lines, G. trilobum and G. thurberi, and their F1. Only 994 markers (7.91% of those screened) were polymorphic. Of these, 132 (13.3%) were scored as dominant and 862 (86.7%) as codominant. The rate of polymorphism of the EST-SSR (eSSR) markers used was low, possibly because DNA sequences in transcribed regions are conserved [34]. Low levels of eSSR polymorphism have been reported in other plants, for instance peanut (6.8%) [35], maize (1.4%), rice (4.7%), sorghum (3.6%) and wheat (3.2%) [36], and low eSSR polymorphism rates have also been reported in Gossypium species [37-39]. Nevertheless, eSSRs remain markers of choice because of their ability to detect incomplete dominance, their low cost and their good genomic coverage, despite the lower polymorphism observed in some plants [40].
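The percentages above follow directly from the marker counts in the text; a short arithmetic check:

```python
def percent(part: int, whole: int) -> float:
    """Share of part in whole, as a percentage."""
    return 100 * part / whole


screened, polymorphic = 12_560, 994
dominant, codominant = 132, 862
assert dominant + codominant == polymorphic  # every polymorphic marker is one or the other

print(f"polymorphic: {percent(polymorphic, screened):.2f}%")   # 7.91%
print(f"dominant:    {percent(dominant, polymorphic):.1f}%")    # 13.3%
print(f"codominant:  {percent(codominant, polymorphic):.1f}%")  # 86.7%
```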

Genetic map construction and determination of the segregation distortion regions (SDRs)

All polymorphic markers used to genotype the F2 progenies were successfully scored and used to construct the linkage map in JoinMap. A total of 849 of the 994 polymorphic markers were linked and distributed across all 13 chromosomes of the D genome (Fig 1). Details of the SSR markers and the allele scores used to construct the genetic map are given in S3 Table. Markers were distributed symmetrically along the linkage groups without clustering of loci; 145 loci remained unlinked because of high distortion. The genetic map spanned 1,012.458 cM with an average marker distance of 1.193 cM. The chromosome with the most marker loci was Chr09 with 93 (11%) markers, followed by Chr05 with 89 (10.5%), while Chr02 had the fewest, with only 21. The largest gap between adjacent loci was on Chr03 (19.969 cM), while Chr04, Chr08, Chr10 and Chr11 had the smallest gap of 0.001 cM (Table 1). The longest chromosome was Chr10, spanning 103.563 cM, and the shortest was Chr02, at 28.665 cM. A total of 714 loci (84.216%) conformed to Mendelian ratios, while 135 (15.783%) deviated from them; Chr07 and Chr11 had the most distorted loci (23 each), while Chr12 had the fewest (only 2) (Table 1). Some regions of the linkage groups contained large clusters of segregation distortion loci (SDLs); these were referred to as segregation distortion regions (SDRs). A total of eight SDRs were identified: Chr01, Chr02, Chr06, Chr09, Chr10 and Chr11 each had a single SDR, designated SDR1, SDR2, SDR6, SDR9, SDR10 and SDR11, respectively, while Chr07 had two, SDR7-1 and SDR7-2. Large clusters of distorted loci within these regions were observed on Chr02, Chr06, Chr07 and Chr11.
The largest SDRs were skewed towards the heterozygous genotype (Table 2).
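The text does not give a numerical rule for calling an SDR. The sketch below assumes one plausible rule, a run of at least three adjacent distorted loci along a linkage group, to illustrate how clusters of SDLs are grouped into SDRs; the marker names and positions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str
    position_cm: float
    distorted: bool  # True if the chi-square test flagged this locus


def find_sdrs(loci: list[Locus], min_run: int = 3) -> list[tuple[float, float, list[str]]]:
    """Group runs of >= min_run adjacent distorted loci into candidate SDRs.

    Returns (start cM, end cM, locus names) for each run, ordered by position.
    min_run = 3 is an illustrative assumption, not a threshold from the study.
    """
    regions: list[tuple[float, float, list[str]]] = []
    run: list[Locus] = []

    def close_run() -> None:
        # Record the current run if it is long enough, then start a new one.
        if len(run) >= min_run:
            regions.append((run[0].position_cm, run[-1].position_cm,
                            [locus.name for locus in run]))
        run.clear()

    for locus in sorted(loci, key=lambda item: item.position_cm):
        if locus.distorted:
            run.append(locus)
        else:
            close_run()
    close_run()  # a run may extend to the end of the linkage group
    return regions


if __name__ == "__main__":
    group = [Locus("m1", 0.0, False), Locus("m2", 1.2, True), Locus("m3", 2.0, True),
             Locus("m4", 3.1, True), Locus("m5", 4.0, False), Locus("m6", 5.5, True)]
    print(find_sdrs(group))  # [(1.2, 3.1, ['m2', 'm3', 'm4'])]
```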
Fig 1

Genetic maps for the 13 chromosomes of the F2 interspecific individuals derived from the cross between G. thurberi and G. trilobum.

Markers in blue are distorted; markers in red and underlined indicate the distorted regions on each chromosome.

Table 1

Characteristics of the genetic map.

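Chr. | Mapped markers | Map size (cM) | Av. marker distance (cM) | Smallest gap (cM) | Largest gap (cM) | Gaps <10 cM | Ave. %SD | SD toward G. thurberi | SD toward G. trilobum | SD toward heterozygote | Number of SD
Chr01 | 60 | 102.761 | 1.713 | 0.018 | 15.699 | 54 | 20 | 5 | 4 | 3 | 12
Chr02 | 21 | 28.665 | 1.365 | 0.003 | 3.554 | 20 | 42.857 | 2 | 1 | 7 | 10
Chr03 | 56 | 63.601 | 1.136 | 0.004 | 19.969 | 52 | 8.929 | 0 | 4 | 1 | 5
Chr04 | 70 | 59.229 | 0.846 | 0.001 | 13.685 | 59 | 8.571 | 1 | 5 | 0 | 6
Chr05 | 89 | 92.563 | 1.04 | 0.002 | 8.12 | 80 | 8.989 | 2 | 5 | 1 | 8
Chr06 | 73 | 64.213 | 0.88 | 0.002 | 8.255 | 69 | 10.959 | 4 | 3 | 1 | 8
Chr07 | 60 | 69.003 | 1.15 | 0.006 | 13.503 | 57 | 38.333 | 4 | 1 | 18 | 23
Chr08 | 64 | 80.053 | 1.251 | 0.001 | 7.053 | 60 | 7.813 | 2 | 1 | 2 | 5
Chr09 | 93 | 96.559 | 1.038 | 0.004 | 13.289 | 81 | 10.753 | 4 | 2 | 4 | 10
Chr10 | 58 | 103.563 | 1.786 | 0.001 | 12.157 | 46 | 20.69 | 3 | 2 | 7 | 12
Chr11 | 82 | 64.604 | 0.788 | 0.001 | 8.104 | 71 | 28.049 | 4 | 0 | 19 | 23
Chr12 | 41 | 88.288 | 2.153 | 0.067 | 16.95 | 34 | 4.878 | 1 | 1 | 0 | 2
Chr13 | 82 | 99.356 | 1.212 | 0.002 | 7.421 | 65 | 13.415 | 6 | 5 | 0 | 11
Totals | 849 | 1012.46 | 1.193 | 0.009 | 11.366 | 748 | 15.783 | 38 | 34 | 63 | 135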

SD: segregation distortion; cM: centiMorgans; G: Gossypium; Chr: chromosome
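The average marker distance column in Table 1 is consistent with dividing each chromosome's map size by its number of mapped markers (rather than by the number of intervals between them). The check below reproduces a few rows and the whole-map value from the text; it is an arithmetic illustration, not part of the mapping software.

```python
# (chromosome, mapped markers, map size in cM), copied from Table 1 and the text.
ROWS = [
    ("Chr01", 60, 102.761),
    ("Chr07", 60, 69.003),
    ("Chr11", 82, 64.604),
    ("Chr12", 41, 88.288),
    ("Whole map", 849, 1012.458),
]


def average_marker_distance(markers: int, size_cm: float) -> float:
    """Map length divided by the number of mapped markers (cM per marker)."""
    return size_cm / markers


for name, markers, size_cm in ROWS:
    print(f"{name}: {average_marker_distance(markers, size_cm):.3f} cM")
# Chr01 1.713, Chr07 1.150, Chr11 0.788, Chr12 2.153, Whole map 1.193
```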

Table 2

Analysis of the protein structure and domain features of the genes located within the SDRs.

Gene ID | Domain | Gene name | Description | Chr. | Start | End | Length (bp) | Exon no. | Mean exon length (bp) | Mean intron length (bp)
Gorai.002G229500 | - | - | - | Chr02 | 58,870,423 | 58,870,953 | 531 | 1 | 531 | intronless
Gorai.002G229600 | - | AGP23 | Arabinogalactan peptide 23 | Chr02 | 58,874,911 | 58,875,444 | 534 | 1 | 534 | intronless
Gorai.002G229700 | - | - | - | Chr02 | 58,886,296 | 58,886,637 | 342 | 1 | 342 | intronless
Gorai.002G235600 | - | - | - | Chr02 | 59,612,406 | 59,614,356 | 1,951 | 4 | 329.3 | 211.3
Gorai.002G235700 | - | - | - | Chr02 | 59,615,961 | 59,616,590 | 630 | 1 | 630 | intronless
Gorai.006G069500 | - | - | - | Chr06 | 28,104,135 | 28,105,057 | 923 | 2 | 394.5 | 134
Gorai.006G099600 | - | AGD14 | Probable ADP-ribosylation factor GTPase-activating protein AGD14 | Chr06 | 34,010,085 | 34,014,482 | 4,398 | 8 | 208.1 | 390.4
Gorai.007G221700 | - | - | - | Chr07 | 25,764,600 | 25,767,676 | 3,077 | 3 | 691.3 | 501.5
Gorai.007G347400 | - | - | - | Chr07 | 57,766,119 | 57,766,961 | 843 | 2 | 304.5 | 234
Gorai.007G348800 | - | - | - | Chr07 | 57,896,009 | 57,902,120 | 6,112 | 13 | 185.9 | 307.9
Gorai.007G349100 | - | CLE-4A-1 | CLAVATA3/ESR (CLE)-related protein 4A-1 | Chr07 | 57,924,593 | 57,926,084 | 1,492 | 1 | 1,492.00 | intronless
Gorai.007G349300 | - | - | - | Chr07 | 57,944,764 | 57,948,953 | 4,190 | 10 | 113.4 | 339.6
Gorai.007G355900 | - | - | - | Chr07 | 58,647,174 | 58,648,236 | 1,063 | 2 | 458.5 | 146
Gorai.009G367400 | - | - | - | Chr09 | 49,208,616 | 49,209,005 | 390 | 1 | 390 | intronless
Gorai.009G374800 | - | rbm48 | RNA binding protein 48 | Chr09 | 50,946,132 | 50,950,973 | 4,842 | 7 | 126.3 | 374.8
Gorai.010G009800 | - | - | - | Chr10 | 718,711 | 720,437 | 1,727 | 2 | 318 | 1,091.00
Gorai.011G137000 | - | SS4 | Probable starch synthase 4, chloroplastic/amyloplastic | Chr11 | 21,004,037 | 21,007,418 | 3,382 | 4 | 93.8 | 729
Gorai.011G137600 | - | - | - | Chr11 | 21,157,120 | 21,157,891 | 772 | 1 | 772 | intronless
Gorai.011G137700 | - | - | - | Chr11 | 21,161,530 | 21,162,074 | 545 | 1 | 545 | intronless
Gorai.011G137900 | - | - | - | Chr11 | 21,170,291 | 21,172,601 | 2,311 | 5 | 237.8 | 280.5
Gorai.011G141300 | - | - | 23 kDa jasmonate-induced protein | Chr11 | 22,263,233 | 22,264,410 | 1,178 | 2 | 539.5 | 99
Gorai.011G142900 | - | - | - | Chr11 | 22,543,659 | 22,549,407 | 5,749 | 7 | 209.1 | 374.2
Gorai.012G141700 | - | - | - | Chr12 | 31,509,025 | 31,511,304 | 2,280 | 4 | 301.3 | 358.3
Gorai.006G021800 | PF00010 | PRE5 | Transcription factor PRE5 | Chr06 | 5,591,652 | 5,593,108 | 1,457 | 3 | 178.3 | 416.5
Gorai.007G349000 | PF00025 | - | ADP-ribosylation factor | Chr07 | 57,911,136 | 57,917,575 | 6,440 | 7 | 193.6 | 829.8
Gorai.007G346300 | PF00046 | WOX11 | WUSCHEL-related homeobox 11 | Chr07 | 57,419,364 | 57,421,615 | 2,252 | 3 | 393.3 | 536
Gorai.001G088900 | PF00067 | CYP81D1 | Cytochrome P450 81D1 | Chr01 | 9,602,033 | 9,604,342 | 2,310 | 2 | 874 | 562
Gorai.001G089000 | PF00067 | CYP81E8 | Cytochrome P450 81E8 | Chr01 | 9,618,655 | 9,620,814 | 2,160 | 7 | 149.1 | 186
Gorai.002G241100 | PF00067 | CYP94C1 | Cytochrome P450 94C1 | Chr02 | 60,454,083 | 60,455,817 | 1,735 | 1 | 1,735.00 | intronless
Gorai.006G081800 | PF00067 | - | Cytochrome P450 CYP736A12 | Chr06 | 31,044,712 | 31,046,253 | 1,542 | 3 | 439 | 112.5
Gorai.007G347700 | PF00067 | CYP89A2 | Cytochrome P450 89A2 | Chr07 | 57,779,914 | 57,781,455 | 1,542 | 1 | 1,542.00 | intronless
Gorai.007G347800 | PF00067 | CYP89A2 | Cytochrome P450 89A2 | Chr07 | 57,788,882 | 57,790,696 | 1,815 | 1 | 1,815.00 | intronless
Gorai.011G158300 | PF00069 | LECRKS2 | Receptor like protein kinase S.2 | Chr11 | 28,179,900 | 28,182,398 | 2,499 | 1 | 2,499.00 | intronless
Gorai.011G162200 | PF00069 | BAK1 | BRASSINOSTEROID INSENSITIVE 1-associated receptor kinase 1 | Chr11 | 30,386,854 | 30,389,231 | 2,378 | 4 | 281 | 418
Gorai.009G374600 | PF00071 | RABA1F | Ras-related protein RABA1f | Chr09 | 50,937,237 | 50,940,126 | 2,890 | 2 | 543 | 1,804.00
Gorai.006G021900 | PF00076 | BPA1 | Binding partner of ACD11 1 | Chr06 | 5,598,552 | 5,602,965 | 4,414 | 5 | 373.4 | 556.3
Gorai.011G142700 | PF00076 | - | - | Chr11 | 22,525,363 | 22,528,458 | 3,096 | 3 | 829 | 304.5
Gorai.010G009900 | PF00083 | At1g75220 | Sugar transporter ERD6-like 6 | Chr10 | 725,683 | 731,830 | 6,148 | 19 | 109.6 | 223.8
Gorai.011G154700 | PF00083 | PHT1-5 | Probable inorganic phosphate transporter 1–5 | Chr11 | 26,839,390 | 26,841,160 | 1,771 | 1 | 1,771.00 | intronless
Gorai.007G347300 | PF00140 | SIGB | RNA polymerase sigma factor sigB | Chr07 | 57,758,649 | 57,762,734 | 4,086 | 8 | 273.8 | 270.9
Gorai.012G141500 | PF00141 | poxN1 | Peroxidase N1 | Chr12 | 31,500,409 | 31,501,774 | 1,366 | 3 | 400.7 | 82
Gorai.007G350800 | PF00179 | UBC22 | Ubiquitin-conjugating enzyme E2 22 | Chr07 | 58,098,369 | 58,102,739 | 4,371 | 7 | 170 | 502.3
Gorai.011G137500 | PF00182 | EP3 | Endochitinase EP3 | Chr11 | 21,155,488 | 21,156,581 | 1,094 | 2 | 502.5 | 89
Gorai.012G141400 | PF00223 | psaA | Photosystem I P700 chlorophyll a apoprotein A1 | Chr12 | 31,499,254 | 31,499,712 | 459 | 2 | 219 | 21
Gorai.007G353100 | PF00225 | CENPE | Centromere-associated protein E | Chr07 | 58,374,076 | 58,394,945 | 20,870 | 34 | 135.8 | 492.5
Gorai.007G346400 | PF00249 | MYB44 | Transcription factor MYB44 | Chr07 | 57,434,322 | 57,436,192 | 1,871 | 3 | 565 | 88
Gorai.007G348600 | PF00249 | MYB39 | Transcription factor MYB39 | Chr07 | 57,887,881 | 57,889,534 | 1,654 | 4 | 302.8 | 147.7
Gorai.010G010000 | PF00249 | RL6 | Protein RADIALIS-like 6 | Chr10 | 734,790 | 736,367 | 1,578 | 2 | 144 | 1,290.00
Gorai.002G231300 | PF00282 | SDC | Serine decarboxylase | Chr02 | 59,077,239 | 59,080,226 | 2,988 | 5 | 426.2 | 195.5
Gorai.002G235500 | PF00293 | NUDT27 | Nudix hydrolase 27, chloroplastic | Chr02 | 59,603,474 | 59,606,648 | 3,175 | 6 | 299.2 | 276
Gorai.006G024400 | PF00295 | At1g80170 | Probable polygalacturonase At1g80170 | Chr06 | 6,292,096 | 6,294,261 | 2,166 | 8 | 151.1 | 136.7
Gorai.006G023400 | PF00328 | PPIP5K1 | Inositol hexakisphosphate and diphosphoinositol-pentakisphosphate kinase 1 | Chr06 | 6,109,671 | 6,122,787 | 13,117 | 29 | 135.8 | 325.8
Gorai.001G122300 | PF00332 | At2g27500 | Glucan endo-1,3-beta-glucosidase 14 | Chr01 | 15,285,802 | 15,288,965 | 3,164 | 4 | 455 | 111
Gorai.007G347200 | PF00385 | LHP1 | Chromo domain-containing protein LHP1 | Chr07 | 57,754,602 | 57,758,058 | 3,457 | 6 | 357.8 | 262
Gorai.002G229900 | PF00400 | lst8 | Protein LST8 homolog | Chr02 | 58,908,149 | 58,913,518 | 5,370 | 11 | 151.9 | 369.9
Gorai.003G137300 | PF00403 | ATX1 | Copper transport protein ATX1 | Chr03 | 39,650,473 | 39,651,557 | 1,085 | 3 | 302.7 | 88.5
Gorai.002G241200 | PF00428 | RPP2B | 60S acidic ribosomal protein P2B | Chr02 | 60,460,978 | 60,462,786 | 1,809 | 4 | 190.3 | 349.3
Gorai.002G235100 | PF00534 | DGD1 | Digalactosyldiacylglycerol synthase 1, chloroplastic | Chr02 | 59,555,124 | 59,562,274 | 7,151 | 7 | 429 | 691.3
Gorai.010G007600 | PF00590 | rsmI | Ribosomal RNA small subunit methyltransferase I | Chr10 | 482,439 | 486,710 | 4,272 | 11 | 158.5 | 252.9
Gorai.002G229800 | PF00595 | CTPA3 | Carboxyl-terminal-processing peptidase 3, chloroplastic | Chr02 | 58,890,304 | 58,897,730 | 7,427 | 12 | 185 | 473.4
Gorai.011G135300 | PF00612 | IQD14 | Protein IQ-DOMAIN 14 | Chr11 | 20,432,093 | 20,437,539 | 5,447 | 7 | 314 | 541.5
Gorai.009G367200 | PF00646 | TULP7 | Tubby-like F-box protein 7 | Chr09 | 49,203,689 | 49,207,350 | 3,662 | 4 | 441.8 | 631.7
Gorai.007G221600 | PF00651 | At5g67385 | BTB/POZ domain-containing protein At5g67385 | Chr07 | 25,760,938 | 25,764,226 | 3,289 | 5 | 539.8 | 147.5
Gorai.011G168600 | PF00654 | CLC-D | Chloride channel protein CLC-d | Chr11 | 34,350,554 | 34,362,793 | 12,240 | 23 | 164.3 | 382.7
Gorai.001G121600 | PF00777 | GALT29A | Beta-1,6-galactosyltransferase GALT29A | Chr01 | 14,837,936 | 14,840,197 | 2,262 | 1 | 1,191.00 | intronless
Gorai.003G137400 | PF00831 | RPL35 | 60S ribosomal protein L35 | Chr03 | 39,654,703 | 39,656,209 | 1,507 | 4 | 196.5 | 240.3
Gorai.007G356000 | PF00931 | At4g27220 | Probable disease resistance protein At4g27220 | Chr07 | 58,647,540 | 58,664,752 | 17,213 | 8 | 931.6 | 557
Gorai.006G069600 | PF01015 | - | 40S ribosomal protein S3a | Chr06 | 28,109,329 | 28,112,694 | 3,366 | 7 | 177.3 | 354.2
Gorai.007G347100 | PF01113 | DAPB2 | 4-hydroxy-tetrahydrodipicolinate reductase 2, chloroplastic | Chr07 | 57,748,162 | 57,753,918 | 5,757 | 9 | 132.9 | 570.1
Gorai.001G121800 | PF01161 | CET2 | CEN-like protein 2 | Chr01 | 15,030,129 | 15,031,205 | 1,077 | 4 | 145 | 165.7
Gorai.011G137100 | PF01169 | At1g68650 | GDT1-like protein 5 | Chr11 | 21,017,320 | 21,017,693 | 374 | 2 | 96 | 182
Gorai.002G235200 | PF01201 | nsa2 | Ribosome biogenesis protein NSA2 homolog | Chr02 | 59,562,732 | 59,565,531 | 2,800 | 10 | 132.2 | 162.2
Gorai.011G181800 | PF01326 | PPD | Pyruvate, phosphate dikinase, chloroplastic | Chr11 | 43,199,802 | 43,207,641 | 7,840 | 21 | 161.6 | 222.3
Gorai.011G136900 | PF01336 | At3g11710 | Lysine-tRNA ligase, cytoplasmic | Chr11 | 20,997,065 | 21,003,264 | 6,200 | 17 | 138.8 | 232.4
Gorai.002G231600 | PF01370 | TKPR2 | Tetraketide alpha-pyrone reductase 2 | Chr02 | 59,112,375 | 59,118,713 | 6,339 | 6 | 223.7 | 999.4
Gorai.007G349200 | PF01373 | BMY1 | Beta-amylase | Chr07 | 57,929,522 | 57,934,365 | 4,844 | 8 | 237.5 | 420.6
Gorai.002G234600 | PF01397 | CAD1-A | (+)-delta-cadinene synthase isozyme A | Chr02 | 59,497,295 | 59,509,754 | 12,460 | 6 | 326.2 | 2,099.20
Gorai.006G099700 | PF01412 | AGD14 | Probable ADP-ribosylation factor GTPase-activating protein AGD14 | Chr06 | 34,014,603 | 34,015,470 | 868 | 3 | 100 | 284
Gorai.010G007400 | PF01419 | JAL3 | Jacalin-related lectin 3 | Chr10 | 450,000 | 455,270 | 5,271 | 7 | 277.9 | 554.3
Gorai.009G366600 | PF01436 | - | - | Chr09 | 49,122,684 | 49,126,097 | 3,414 | 7 | 281.3 | 240.8
Gorai.010G012200 | PF01457 | GA17800 | Leishmanolysin-like peptidase | Chr10 | 943,286 | 949,671 | 6,386 | 17 | 202.1 | 184.4
Gorai.006G024700 | PF01459 | - | Mitochondrial outer membrane protein porin of 34 kDa | Chr06 | 6,318,742 | 6,321,487 | 2,746 | 6 | 233.7 | 268.8
Gorai.011G160100 | PF01471 | - | - | Chr11 | 29,253,663 | 29,256,718 | 3,056 | 7 | 221.1 | 233
Gorai.007G350700 | PF01485 | ARI7 | Probable E3 ubiquitin-protein ligase ARI7 | Chr07 | 58,079,030 | 58,087,989 | 8,960 | 17 | 153.4 | 392.3
Gorai.010G007500 | PF01565 | CKX5 | Cytokinin dehydrogenase 5 | Chr10 | 477,052 | 481,619 | 4,568 | 5 | 429 | 605.8
Gorai.006G084300 | PF01657 | CRK26 | Cysteine-rich receptor-like protein kinase 26 | Chr06 | 31,770,489 | 31,773,653 | 3,165 | 7 | 352.3 | 116.5
Gorai.001G121500 | PF01754 | SAP3 | Zinc finger A20 and AN1 domain-containing stress-associated protein 3 | Chr01 | 14,830,583 | 14,831,995 | 1,413 | 2 | 564.5 | 284
Gorai.002G234400 | PF02045 | NFYA10 | Nuclear transcription factor Y subunit A-10 | Chr02 | 59,489,436 | 59,493,917 | 4,482 | 6 | 278.8 | 561.8
Gorai.012G141600 | PF02152 | FOLB1 | Dihydroneopterin aldolase 1 | Chr12 | 31,505,796 | 31,507,873 | 2,078 | 3 | 261.3 | 647
Gorai.007G350900 | PF02182 | SUVH1 | Histone-lysine N-methyltransferase, H3 lysine-9 specific SUVH1 | Chr07 | 58,110,934 | 58,115,639 | 4,706 | 2 | 1,484.00 | 1,738.00
Gorai.010G012000 | PF02365 | NAC053 | NAC domain-containing protein 53 | Chr10 | 903,218 | 907,562 | 4,345 | 7 | 221.6 | 465.7
Gorai.010G008100 | PF02458 | HHT1 | Omega-hydroxypalmitate O-feruloyl transferase | Chr10 | 538,869 | 541,376 | 2,508 | 2 | 874.5 | 759
Gorai.011G158900 | PF02636 | SPAC25A8.03c | NADH dehydrogenase [ubiquinone] complex I, assembly factor 7 homolog | Chr11 | 28,720,032 | 28,729,435 | 9,404 | 14 | 129 | 584.5
Gorai.001G088800 | PF02704 | RSI-1 | Protein RSI-1 | Chr01 | 9,592,197 | 9,594,141 | 1,945 | 3 | 248.7 | 599.5
Gorai.011G141100 | PF02776 | - | Acetolactate synthase 3, chloroplastic | Chr11 | 22,244,554 | 22,246,858 | 2,305 | 1 | 2,305.00 | intronless
Gorai.002G241400 | PF02922 | SBEI | 1,4-alpha-glucan-branching enzyme 1, chloroplastic/amyloplastic | Chr02 | 60,478,914 | 60,489,748 | 10,835 | 23 | 135.5 | 336.9
Gorai.007G347600 | PF02970 | TFCA | Tubulin-folding cofactor A | Chr07 | 57,776,551 | 57,778,600 | 2,050 | 4 | 197.3 | 420.3
Gorai.003G137500 | PF03081 | EXO70A1 | Exocyst complex component EXO70A1 | Chr03 | 39,661,653 | 39,663,784 | 2,132 | 1 | 2,132.00 | intronless
Gorai.002G235800 | PF03168 | YLS9 | Protein YLS9 | Chr02 | 59,628,429 | 59,630,118 | 1,690 | 1 | 1,690.00 | intronless
Gorai.011G135200 | PF03405 | - | Stearoyl-[acyl-carrier-protein] 9-desaturase, chloroplastic | Chr11 | 20,394,798 | 20,400,526 | 5,729 | 3 | 459.7 | 2,175.00
Gorai.011G137800 | PF04012 | IM30 | Membrane-associated 30 kDa protein, chloroplastic | Chr11 | 21,162,830 | 21,169,837 | 7,008 | 12 | 160.6 | 461.9
Gorai.001G121400 | PF04212 | SKD1 | Protein SUPPRESSOR OF K(+) TRANSPORT GROWTH DEFECT 1 | Chr01 | 14,812,123 | 14,817,289 | 5,167 | 8 | 234.8 | 469.9
Gorai.007G348700 | PF04674 | EXL6 | Protein EXORDIUM-like 6 | Chr07 | 57,893,387 | 57,894,787 | 1,401 | 2 | 572.5 | 256
Gorai.009G367100 | PF04690 | YAB5 | Axial regulator YABBY 5 | Chr09 | 49,185,118 | 49,189,637 | 4,520 | 8 | 137.6 | 488.4
Gorai.006G024500 | PF04981 | NMD3 | 60S ribosomal export protein NMD3 | Chr06 | 6,301,408 | 6,304,180 | 2,773 | 2 | 1,041.50 | 690
Gorai.011G142800 | PF05419 | GUN4 | Tetrapyrrole-binding protein, chloroplastic | Chr11 | 22,533,055 | 22,534,264 | 1,210 | 1 | 1,210.00 | intronless
Gorai.006G067100 | PF05553 | - | - | Chr06 | 26,559,871 | 26,560,732 | 862 | 1 | 862 | intronless
Gorai.002G237800 | PF05577 | PRCP | Lysosomal Pro-X carboxypeptidase | Chr02 | 60,091,336 | 60,098,128 | 6,793 | 9 | 214.7 | 585.1
Gorai.006G032900 | PF05691 | RFS5 | Probable galactinol-sucrose galactosyltransferase 5 | Chr06 | 8,508,771 | 8,511,521 | 2,751 | 4 | 619.5 | 91
Gorai.011G141200 | PF05695 | ycF2:3 -A | Protein YcF2:3 | Chr11 | 22,256,670 | 22,258,052 | 1,383 | 2 | 201 | 981
Gorai.011G142600 | PF05773 | GCN2 | Probable serine/threonine-protein kinase GCN2 | Chr11 | 22,507,944 | 22,525,343 | 17,400 | 28 | 140.9 | 498.3
Gorai.007G347500 | PF06219 | - | - | Chr07 | 57,772,596 | 57,776,558 | 3,963 | 4 | 506 | 643.7
Gorai.010G007700 | PF07526 | BLH11 | BEL1-like homeodomain protein 11 | Chr10 | 487,521 | 490,937 | 3,417 | 4 | 395.8 | 611.3
Gorai.006G024600 | PF07714 | At5g15080 | Probable receptor-like protein kinase At5g15080 | Chr06 | 6,313,178 | 6,317,740 | 4,563 | 6 | 347.3 | 487.2
Gorai.007G349400 | PF07714 | At3g07070 | Serine/threonine-protein kinase At3g07070 | Chr07 | 57,951,540 | 57,955,486 | 3,947 | 5 | 365.6 | 529.8
Gorai.009G347700 | PF07797 | - | - | Chr09 | 43,080,219 | 43,081,973 | 1,755 | 3 | 376 | 118.5
Gorai.002G241300 | PF07992 | AFRR | Monodehydroascorbate reductase | Chr02 | 60,463,573 | 60,466,896 | 3,324 | 10 | 177.9 | 171.7
Gorai.007G353300 | PF08159 | nol10 | Nucleolar protein 10 | Chr07 | 58,413,029 | 58,420,998 | 7,970 | 16 | 169.8 | 350.3
Gorai.002G235300 | PF08263 | TMK3 | Receptor-like kinase TMK3 | Chr02 | 59,576,586 | 59,580,461 | 3,876 | 3 | 1,164.30 | 190
Gorai.010G010100 | PF08263 | At2g16250 | Probable LRR receptor-like serine/threonine-protein kinase At2g16250 | Chr10 | 751,818 | 757,264 | 5,447 | 5 | 725 | 455.5
Gorai.011G162100 | PF08263 | RCH2 | Receptor-like protein kinase 2 | Chr11 | 30,378,807 | 30,382,872 | 4,066 | 2 | 1,983.50 | 99
Gorai.009G367300 | PF08523 | MBF1B | Multiprotein-bridging factor 1b | Chr09 | 49,207,335 | 49,209,624 | 2,290 | 4 | 227.5 | 460
Gorai.011G137400 | PF09247 | TAF1 | Transcription initiation factor TFIID subunit 1 | Chr11 | 21,135,957 | 21,154,677 | 18,721 | 21 | 320.1 | 600
Gorai.006G099800 | PF09405 | - | - | Chr06 | 34,021,045 | 34,027,401 | 6,357 | 12 | 260 | 294.3
Gorai.007G353400 | PF09713 | - | - | Chr07 | 58,422,969 | 58,430,414 | 7,446 | 9 | 161.3 | 735.6
Gorai.011G170500 | PF10153 | efg1 | rRNA processing protein efg1 | Chr11 | 37,394,490 | 37,397,919 | 3,430 | 8 | 189.6 | 254.9
Gorai.009G366500 | PF10517 | At5g54830 | Cytochrome b561, DM13 and DOMON domain-containing protein At5g54830 | Chr09 | 49,101,413 | 49,107,783 | 6,371 | 2 | 1,677.00 | 3,017.00
Gorai.002G235400 | PF11571 | MED27 | Mediator of RNA polymerase II transcription subunit 27 | Chr02 | 59,598,071 | 59,602,249 | 4,179 | 7 | 274.9 | 375.8
Gorai.006G032700 | PF12530 | RST1 | Protein RST1 | Chr06 | 8,479,478 | 8,494,554 | 15,077 | 25 | 231.6 | 386.9
Gorai.007G221800 | PF12767 | - | - | Chr07 | 25,772,747 | 25,774,873 | 2,127 | 2 | 889.5 | 348
Gorai.003G137600 | PF12796 | At5g02620 | Ankyrin repeat-containing protein At5g02620 | Chr03 | 39,671,615 | 39,673,455 | 1,841 | 3 | 542 | 107.5
Gorai.001G122200 | PF12937 | At1g67190 | F-box/LRR-repeat protein At1g67190 | Chr01 | 15,251,697 | 15,256,244 | 4,548 | 3 | 773 | 1,114.50
Gorai.001G121300 | PF13041 | PCMP-H60 | Pentatricopeptide repeat-containing protein At2g27610 | Chr01 | 14,807,577 | 14,810,668 | 3,092 | 3 | 982 | 73
Gorai.006G023300 | PF13041 | PCMP-E79 | Putative pentatricopeptide repeat-containing protein At3g28640 | Chr06 | 6,089,690 | 6,092,140 | 2,451 | 1 | 2,451.00 | intronless
Gorai.010G009700 | PF13041 | At1g19525 | Pentatricopeptide repeat-containing protein At1g19525 | Chr10 | 712,070 | 715,930 | 3,861 | 3 | 930.7 | 534.5
Gorai.007G353200 | PF13419 | Nanp | N-acylneuraminate-9-phosphatase | Chr07 | 58,396,486 | 58,400,809 | 4,324 | 6 | 219.8 | 601
Gorai.011G142500 | PF13540 | ACR4 | Serine/threonine-protein kinase-like protein ACR4 | Chr11 | 22,503,343 | 22,506,977 | 3,635 | 1 | 3,616.00 | intronless
Gorai.006G099500 | PF13637 | Ank2 | Ankyrin-2 | Chr06 | 33,992,982 | 33,995,636 | 2,655 | 4 | 362.3 | 402
Gorai.011G158400 | PF13833 | CML22 | Probable calcium-binding protein CML22 | Chr11 | 28,182,638 | 28,185,225 | 2,588 | 5 | 256 | 327
Gorai.002G231400 | PF13837 | - | - | Chr02 | 59,082,050 | 59,084,231 | 2,182 | 1 | 2,182.00 | intronless
Gorai.002G231500 | PF13837 | GT-2 | Trihelix transcription factor GT-2 | Chr02 | 59,100,465 | 59,102,613 | 2,149 | 2 | 1,025.00 | 99
Gorai.002G234500 | PF13855 | At1g74360 | Probable LRR receptor-like serine/threonine-protein kinase At1g74360 | Chr02 | 59,494,567 | 59,496,769 | 2,203 | 3 | 695.3 | 58.5
Gorai.010G012100 | PF13867 | - | - | Chr10 | 917,545 | 920,366 | 2,822 | 6 | 221.5 | 298.6
Gorai.006G032800 | PF14259 | At3g27700 | Zinc finger CCCH domain-containing protein 41 | Chr06 | 8,495,295 | 8,503,847 | 8,553 | 6 | 546.8 | 1,054.40
Gorai.011G162300 | PF14291 | - | - | Chr11 | 30,391,532 | 30,394,378 | 2,847 | 5 | 207 | 453
Gorai.009G374700 | PF14929 | - | - | Chr09 | 50,940,936 | 50,945,978 | 5,043 | 9 | 230 | 371.6

Genetic maps for the 13 chromosomes of the F2 interspecific individuals derived from the cross between G. thurberi and G. trilobum.

The markers in blue are distorted, while markers in red and underlined indicate the distorted regions on each chromosome. SD: segregation distortion; cM: centimorgans; G: Gossypium; Chr: chromosome

Gene mining, protein characterization and Gene Ontology (GO) functional annotations of the mined genes

Using the sequences of all 846 SSR markers, we conducted a BLAST search against the D5 genome and examined the regions 20 kb upstream and downstream of each SSR location; 2,496 genes were identified. The genes were distributed across all chromosomes: Chr09 had the most genes (316), followed by Chr11 (257), while Chr02 had the fewest (48). The genes were characterized for their physiochemical properties. The grand average of hydropathy (GRAVY) values of the encoded proteins ranged from -2.335 to 1.654, their molecular weights from 5.935 to 437.729 kDa, their charges from -170.5 to 66 and their isoelectric points (pI) from 3.435 to 12.839. Based on their GRAVY values, 2,049 genes were hydrophilic and only 446 were hydrophobic (S4 Table). The GO BLAST analysis detected all three GO categories; the largest was cellular component, with 9 functions, while the smallest was the molecular component, with 6 functions (Fig 2). In the cellular component (CC) category, the following genes were found to harbor critical functions: Gorai.009G374600, Gorai.012G141400, Gorai.007G347200, Gorai.001G121600, Gorai.011G137100, Gorai.010G012200, Gorai.011G141200, Gorai.007G353300 and Gorai.007G221800. The CC functions detected were nucleus (GO:0005634), chloroplast (GO:0009507), membrane (GO:0016020), integral to membrane (GO:0016021), integral to Golgi membrane (GO:0030173) and SAGA-type complex (GO:0070461).
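The two steps described above, collecting genes within 20 kb of each SSR hit and classifying the encoded proteins by the sign of their GRAVY value, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes BLAST hits have already been reduced to (chromosome, start, end) coordinates on the D5 assembly, treats "within 20 kb" as overlap with the hit extended by 20 kb on each side, and computes GRAVY as the mean Kyte-Doolittle hydropathy per residue (Kyte and Doolittle, 1982). The function names are hypothetical; the gene coordinates come from Table 2, while the SSR hit position is invented for the example.

```python
from dataclasses import dataclass

WINDOW_BP = 20_000  # 20 kb upstream and downstream of each SSR hit

# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_ssr(chrom, hit_start, hit_end, genes, window=WINDOW_BP):
    """Return genes on `chrom` overlapping the SSR hit extended by `window` bp per side."""
    lo, hi = hit_start - window, hit_end + window
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Characters outside the 20 standard amino acids (e.g. '*', 'X') are skipped.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


def hydropathy_class(protein):
    """Negative GRAVY is read as hydrophilic, non-negative as hydrophobic."""
    return "hydrophilic" if gravy(protein) < 0 else "hydrophobic"


# Coordinates from Table 2; the SSR hit used below is hypothetical.
EXAMPLE_GENES = [
    Gene("Gorai.002G231400", "Chr02", 59_082_050, 59_084_231),
    Gene("Gorai.002G235300", "Chr02", 59_576_586, 59_580_461),
    Gene("Gorai.002G235400", "Chr02", 59_598_071, 59_602_249),
]

if __name__ == "__main__":
    hits = genes_near_ssr("Chr02", 59_590_000, 59_590_100, EXAMPLE_GENES)
    print([g.gene_id for g in hits])  # ['Gorai.002G235300', 'Gorai.002G235400']
    print(hydropathy_class("MKKDEERR"))  # hydrophilic
```

Treating a GRAVY of exactly zero as hydrophobic is an arbitrary convention here; the hydrophilic/hydrophobic split reported above only depends on the sign of the score.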
Fig 2

GO annotation results from the mined genes, the highest being cellular component with 9 functions.

The integrity of cell membranes and other membranous structures is essential for normal plant function. When plants are exposed to any form of stress, the excessive production of reactive oxygen species (ROS) degrades the cell membrane, disrupting the osmotic balance within the cell and eventually leading to cell death [41,42]. In the molecular function (MF) category, 60 genes were involved, with 27 different molecular functions, such as transferase activity, transferring phosphorus-containing groups (GO:0016772); calcium ion binding (GO:0005509); protein binding (GO:0005515); and transferase activity, transferring acyl groups other than amino-acyl groups (GO:0016747). Among genes with a potential contribution to cotton fiber development, a gene in the ligon lintless-1 (Li) gene region was found to harbor several molecular functions, including transferring acyl groups other than amino-acyl groups (GO:0016747), which has been shown to play an important role in cotton fiber elongation [43]. Finally, 15 GO functions were detected in the biological process category, including oxidation-reduction process (GO:0055114), translational elongation (GO:0006414), folic acid-containing compound metabolic process (GO:0006760), response to light stimulus (GO:0009416), regulation of transcription, DNA-dependent (GO:0006355) and microtubule-based movement (GO:0007018). Oxidation-reduction processes are important in plant responses to stress [44]; the detection of this biological function among the genes within the SDRs therefore suggests that these genes may have stress-responsive functions that enhance plant survival under abiotic stress. Detailed information on the GO functions and the genes involved is summarized in S5 Table.
In their identification and characterization of late embryogenesis abundant (LEA) proteins in cotton, Magwanga et al. [1] found that the term integral to membrane (GO:0016021) was detected for over 95% of the LEA genes, and postulated that it has a functional role in maintaining cell membrane integrity. Moreover, an analysis of the genes that could have been introgressed into a BC2F2 backcross population, developed from G. tomentosum, a drought- and salt-resistant donor parent, and G. hirsutum, a high-yielding tetraploid cotton that is more susceptible to various forms of abiotic stress [18], revealed several GO functions, some of which were also detected for the genes within the SDRs in this study, an indication that these genes could play an important role in the plant.

Analysis of the structure of the genes within the SDR

We analyzed the structures of the genes found within the segregation distortion regions on Chr01, Chr02, Chr06, Chr07, Chr09, Chr10 and Chr11, which contained 9, 35, 16, 33, 10, 12 and 30 genes, respectively. Of all the genes within the SDRs, 22 were intronless; notable among them were Gorai.011G142800 (Tetrapyrrole-binding protein, chloroplastic), Gorai.002G235800 (Protein YLS9), Gorai.011G141100 (Acetolactate synthase 3, chloroplastic), Gorai.011G158300 (Receptor like protein kinase S.2), Gorai.007G347800 (Cytochrome P450 89A2), Gorai.011G142500 (Serine/threonine-protein kinase-like protein ACR4), Gorai.001G121600 (Beta-1,6-galactosyltransferase GALT29A) and Gorai.006G023300 (Putative pentatricopeptide repeat-containing protein At3g28640) (Table 2). Of all the genes within the SDRs, 23 were classified as having unknown domains, accounting for 16% of the genes mined within the SDRs across the seven chromosomes; the remaining genes belonged to various domains. The dominant domain among the remaining 123 genes was Cytochrome P450 (PF00067), with six genes: Gorai.001G088900 (Cytochrome P450 81D1), Gorai.001G089000 (Cytochrome P450 81E8), Gorai.002G241100 (Cytochrome P450 94C1), Gorai.006G081800 (Cytochrome P450 CYP736A12), Gorai.007G347700 (Cytochrome P450 89A2) and Gorai.007G347800 (Cytochrome P450 89A2). Cytochromes P450 (CYPs) are a superfamily of proteins that contain heme as a cofactor and are therefore hemoproteins [45]. CYPs use a variety of small and large molecules as substrates in enzymatic reactions. They are the terminal oxidase enzymes in electron transfer chains, broadly categorized as P450-containing systems. The term "P450" derives from the spectrophotometric peak at the wavelength of the enzyme's absorption maximum (450 nm) when it is in the reduced state and bound to carbon monoxide.
In plants, CYPs are involved in numerous biosynthetic reactions, leading to the production of plant hormones, the synthesis of secondary metabolites, fatty acid conjugation, the lignification of various plant tissues and the production of various defensive compounds [46]. Cytochrome P450 genes make up about 1% of the genes in annotated plant genomes, and their number and diversity are believed to underlie the wide range of bioactive compounds that plants produce [47]. The detection of these genes within the SDRs could reflect the significance of these regions in the evolution of new functions within the plant transcriptome.
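The gene-structure columns of Table 2 appear to give, for each gene, the number of exons together with average exon and intron lengths, with single-exon genes marked intronless. A minimal sketch of deriving such a summary from exon coordinates, assuming 1-based inclusive (start, end) pairs for one transcript (for example, parsed from a GFF3 annotation); the function name and input format are hypothetical:

```python
def gene_structure(exons):
    """Summarize one gene model from its exon coordinates.

    exons: iterable of (start, end) pairs, 1-based and inclusive.
    Returns exon count, mean exon length, mean intron length (None when the
    gene has a single exon), an intronless flag and the genomic span.
    """
    exons = sorted(exons)
    if not exons:
        raise ValueError("a gene model needs at least one exon")
    exon_lengths = [end - start + 1 for start, end in exons]
    # intron = gap between the end of one exon and the start of the next
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]
    return {
        "exons": len(exons),
        "mean_exon_len": sum(exon_lengths) / len(exon_lengths),
        "mean_intron_len": (
            sum(intron_lengths) / len(intron_lengths) if intron_lengths else None
        ),
        "intronless": not intron_lengths,
        "span": exons[-1][1] - exons[0][0] + 1,
    }


if __name__ == "__main__":
    print(gene_structure([(1, 100), (201, 350), (451, 500)]))
    print(gene_structure([(6_089_690, 6_092_140)]))  # single exon: intronless
```

If a gene consisted of exactly one exon spanning 6,089,690–6,092,140, this would reproduce the 2,451 bp intronless entry listed for Gorai.006G023300; real annotations with several transcripts would first need one representative model chosen per gene.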

Collinearity analysis between the genetic map and the physical map of the reference genome, G. raimondii (D5)

We performed a collinearity analysis between the genetic map constructed from G. thurberi and G. trilobum and the physical map of the reference genome, G. raimondii. The full sequences of all SSR markers were used in a BLAST search against the G. raimondii genome, and the matching sites were extracted from the BLAST results for collinearity analysis. After removal of redundant markers, the 846 SSR markers on the genetic linkage map produced 869 loci, 95.9% of which were consistent between the two maps (Fig 3); however, 36 markers did not conform to the physical map of the reference genome (Table 3). Six inversions were detected, on Chr02, Chr03, Chr05, Chr07 and Chr13, together with four translocations on Chr03, Chr08 and Chr09 (Table 4). Overall, the map developed was of high resolution. Comparing genetic and physical maps is important for confirming the order of genetic markers, because sequence-based physical maps provide independent support for the genetic marker order [48]. The collinearity analysis between our genetic map and the physical map of the reference genome, G. raimondii, indicated good collinearity between the chromosomes of the two maps, which also supports the accuracy of the genetic map.
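The reported consistency rate follows from the counts above: with 36 non-conforming loci out of 869, (869 - 36) / 869 ≈ 95.9%. A minimal sketch of that bookkeeping, assuming each locus has already been reduced to its linkage group and the chromosome of its best BLAST hit; the helper names are hypothetical, and the sample rows are copied from Table 3:

```python
def chromosome_of(label):
    """Strip the 'G_' (genetic) or 'P_' (physical) prefix used in Table 3."""
    return label.split("_", 1)[-1]


def conforms(linkage_group, physical_chrom):
    """True when a locus falls on the same chromosome in both maps."""
    return chromosome_of(linkage_group) == chromosome_of(physical_chrom)


def percent_consistent(total_loci, nonconforming):
    """Percentage of loci whose genetic and physical chromosomes agree."""
    if total_loci <= 0:
        raise ValueError("total_loci must be positive")
    return 100.0 * (total_loci - nonconforming) / total_loci


# First rows of Table 3 (marker, linkage group, physical chromosome).
TABLE3_ROWS = [
    ("SWU18284", "G_Chr01", "P_Chr09"),
    ("SWU15054", "G_Chr01", "P_Chr06"),
    ("SWU16907", "G_Chr01", "P_Chr08"),
]

if __name__ == "__main__":
    print([m for m, g, p in TABLE3_ROWS if not conforms(g, p)])
    print(round(percent_consistent(869, 36), 1))  # 95.9
```

This only captures chromosome-level disagreement; order disagreements within a chromosome (the inversions of Table 4) need a separate check on marker order.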
Fig 3

Collinearity between genetic and physical map of D5.

The different colors represent the various syntenic block regions in the chromosomes.

Table 3

Marker inconformity between the genetic and physical map.

| Marker | Linkage group | Genetic location (cM) | Physical chromosome | Physical start (bp) | Physical end (bp) |
|---|---|---|---|---|---|
| SWU18284 | G_Chr01 | 52.019 | P_Chr09 | 22316264 | 22316364 |
| SWU15054 | G_Chr01 | 61.469 | P_Chr06 | 23180998 | 23181098 |
| SWU16907 | G_Chr01 | 72.737 | P_Chr08 | 14869954 | 14870054 |
| SWU12734 | G_Chr02 | 15.036 | P_Chr03 | 39667891 | 39667991 |
| SWU21545 | G_Chr02 | 19.215 | P_Chr12 | 31489218 | 31489318 |
| SWU15145 | G_Chr03 | 2.45 | P_Chr06 | 31792020 | 31792120 |
| SWU14306 | G_Chr03 | 32.028 | P_Chr05 | 32177040 | 32177140 |
| SWU22022 | G_Chr03 | 32.614 | P_Chr13 | 22964679 | 22964779 |
| SWU22040 | G_Chr03 | 32.803 | P_Chr13 | 24293706 | 24293806 |
| SWU22028 | G_Chr03 | 33.245 | P_Chr13 | 23380707 | 23380807 |
| SWU18938 | G_Chr04 | 0 | P_Chr09 | 70170969 | 70171069 |
| SWU18954 | G_Chr04 | 3.422 | P_Chr09 | 70478586 | 70478686 |
| SWU12135 | G_Chr06 | 0 | P_Chr03 | 7642054 | 7642154 |
| SWU15193 | G_Chr07 | 28.58 | P_Chr06 | 34002005 | 34002105 |
| SWU14289 | G_Chr07 | 33.571 | P_Chr05 | 30830808 | 30830908 |
| SWU19619 | G_Chr07 | 41.198 | P_Chr10 | 50389834 | 50389934 |
| SWU10330 | G_Chr08 | 36.173 | P_Chr01 | 15267145 | 15267245 |
| SWU13865 | G_Chr09 | 0 | P_Chr04 | 61146573 | 61146673 |
| SWU13887 | G_Chr09 | 12.515 | P_Chr04 | 62008452 | 62008552 |
| SWU15131 | G_Chr10 | 38.761 | P_Chr06 | 31048453 | 31048553 |
| SWU16204 | G_Chr10 | 73.246 | P_Chr07 | 30300267 | 30300367 |
| SWU14046 | G_Chr11 | 42.846 | P_Chr05 | 8058887 | 8058987 |
| SWU12579 | G_Chr11 | 51.861 | P_Chr03 | 33127023 | 33127123 |
| SWU16010 | G_Chr13 | 19.276 | P_Chr07 | 16783626 | 16783726 |
| SWU16010 | G_Chr13 | 19.276 | P_Chr07 | 16783691 | 16783791 |
| SWU19829 | G_Chr13 | 38.499 | P_Chr10 | 61306040 | 61306140 |
| SWU16525 | G_Chr13 | 44.267 | P_Chr07 | 55378850 | 55378950 |
| SWU16010 | G_Chr13 | 47.694 | P_Chr07 | 16783626 | 16783726 |
| SWU16010 | G_Chr13 | 47.694 | P_Chr07 | 16783691 | 16783791 |
| SWU12058 | G_Chr13 | 49.895 | P_Chr03 | 4567547 | 4567647 |
| SWU11359 | G_Chr13 | 52.381 | P_Chr02 | 28492364 | 28492464 |
| SWU13955 | G_Chr13 | 55.048 | P_Chr05 | 3003174 | 3003274 |
| SWU12482 | G_Chr13 | 55.598 | P_Chr03 | 27549570 | 27549670 |
| SWU10804 | G_Chr13 | 57.636 | P_Chr01 | 50739519 | 50739619 |
| SWU13263 | G_Chr13 | 58.021 | P_Chr04 | 18325212 | 18325312 |
| SWU13263 | G_Chr13 | 58.021 | P_Chr04 | 18325260 | 18325360 |
Table 4

Chromosomes showing inversion and translocation between the genetic and physical map of D5.

| Chromosome | Markers | Genetic location (cM) | Physical location (Mb) | Event |
|---|---|---|---|---|
| Chr02 | SWU11559-SWU11210 | 6.558–8.012 | 15.95–50.80 | Inversion |
| Chr03 | SWU12367-SWU22028 | 31.53–33.245 | 19.33–23.38 | Translocation |
| Chr03 | SWU12731-SWU12733 | 24.321–24.575 | 39.56–39.87 | Translocation |
| Chr03 | SWU12611-SWU12621 | 28.579–28.597 | 34.51–34.81 | Inversion |
| Chr05 | SWU14526-SWU14481 | 40.808–41.701 | 48.23–51.89 | Inversion |
| Chr07 | SWU16069-SWU16057 | 27.244–27.499 | 19.34–20.05 | Inversion |
| Chr08 | SWU16899-SWU16916 | 66.435–67.516 | 14.37–15.76 | Translocation |
| Chr09 | SWU18603-SWU18528 | 86.688–87.121 | 38.71–46.91 | Translocation |
| Chr13 | SWU21880-SWU21921 | 54.134–54.181 | 8.43–11.56 | Inversion |
| Chr13 | SWU22169-SWU22142 | 51.086–51.253 | 35.16–36.79 | Inversion |
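Table 4 lists intervals where the genetic and physical marker orders disagree. One simple way to flag candidate inversions, sketched below under stated assumptions, is to order markers by cM and report runs of at least `min_len` consecutive markers whose physical coordinates decrease. The threshold, the function name and the example coordinates are hypothetical; the sketch assumes the linkage group is oriented the same way as the assembly, and it does not handle translocations (segments whose hits fall on another chromosome).

```python
def reversed_runs(markers, min_len=3):
    """Return (first, last) marker names for runs of at least `min_len`
    consecutive markers whose physical positions decrease while their
    genetic positions increase.

    markers: (name, position_cM, position_bp) tuples from one linkage group
    whose BLAST hits fall on the matching physical chromosome.
    """
    ordered = sorted(markers, key=lambda m: m[1])  # genetic order
    runs, start = [], 0
    for i in range(1, len(ordered) + 1):
        # extend the current run while the physical coordinate keeps falling
        if i < len(ordered) and ordered[i][2] < ordered[i - 1][2]:
            continue
        if i - start >= min_len:
            runs.append((ordered[start][0], ordered[i - 1][0]))
        start = i
    return runs


if __name__ == "__main__":
    demo = [  # hypothetical markers: m3..m5 run backwards on the assembly
        ("m1", 0.0, 100), ("m2", 1.0, 200), ("m3", 2.0, 900),
        ("m4", 3.0, 800), ("m5", 4.0, 700), ("m6", 5.0, 1000),
    ]
    print(reversed_runs(demo))  # [('m3', 'm5')]
```

Requiring several consecutive reversals keeps single misplaced markers, which are more likely BLAST or genotyping artifacts, from being reported as inversions.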

Collinearity analysis of the genetic map against the physical maps of the Dt subgenomes of G. hirsutum (GhDt) and G. barbadense (GbDt)

A BLAST search with the 846 SSR markers from the genetic map was conducted against the Dt subgenomes of G. hirsutum and G. barbadense; of these, 745 and 337 markers aligned to the G. hirsutum (GhDt) and G. barbadense (GbDt) assemblies, respectively. Of the aligned markers, 16% in GhDt and 15% in GbDt were consistent between the two maps, whereas 85% in both subgenomes did not conform between the genetic and physical maps (Fig 4A, Fig 4B, S6 Table and S7 Table). The two collinearity analyses showed a closer relationship between the genetic map and the physical map of GhDt than of GbDt, as shown by the higher number of markers in conformity with GhDt, which led to better syntenic block formation between the genetic map and the physical map of GhDt. Good syntenic blocks were, however, observed for chromosome 3 in both subgenomes. This could mean that more genes have been introgressed from the two wild cotton species into G. hirsutum than into G. barbadense. In previous studies of introgression, wild accessions accounted for a higher percentage of the introgression within G. hirsutum (43.7%) than improved accessions (18.4%), whereas within G. barbadense wild accessions accounted for 33.1% and improved accessions for only 27.1%; wild accessions thus contributed more introgression in G. hirsutum than in G. barbadense [49]. These results agree with earlier reports that fibre quality traits, such as long fibre length, have been introgressed into G. hirsutum from its wild D-genome progenitors, such as G. thurberi [5]. Moreover, G. hirsutum and G. barbadense are said to share a common ancestry, but human activities and abiotic factors have led them to evolve different agronomic traits [50]. Their genome sequences have made it possible to study the divergence of the two species through comparative analysis.
Fig 4

Collinearity analysis.

(A) Analysis between the genetic map and the physical map of GhDt; (B) analysis between the genetic map and the physical map of GbDt.

RNA sequence data analysis and RT-qPCR validation of genes in the SDR regions

RNA-seq data for the genes in the SDRs were obtained through their homeologs in upland cotton, G. hirsutum, sequenced at different stages of fiber development (0, 5, 10, 20 and 25 days post anthesis, DPA), retrieved from the cotton genome database and analyzed. Based on their expression profiles, the genes were classified into four groups. Group 1 genes were highly upregulated, though some only at certain stages of fiber development. Group 2 genes exhibited differential expression, with some upregulated and others either downregulated or not expressed. In Group 3, most genes were downregulated, with only a few either partially upregulated or not expressed. Group 4 genes were not expressed at any stage of fiber development (Fig 5). Furthermore, 30 highly upregulated genes were later validated by RT-qPCR at the same fiber development stages. Most of the genes were more strongly induced in G. thurberi than in G. trilobum, an indication that G. thurberi has a higher potential for producing superior fibers than G. trilobum (Fig 6A). Similar findings have been reported previously, in which G. thurberi was found to have superior fibers [19]. The expression patterns of the fifty (50) genes analysed by RT-qPCR revealed that these genes could play important roles at various stages of cotton fiber development; for instance, Gorai.007G347600 (Tubulin-folding cofactor A), Gorai.012G141600 (Dihydroneopterin aldolase 1), Gorai.006G024500 (60S ribosomal export protein NMD3), Gorai.002G229900 (Protein LST8 homolog) and Gorai.002G235200 (Ribosome biogenesis protein NSA2 homolog) were highly upregulated at different stages of fiber development.
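This section does not state how relative expression was computed from the RT-qPCR cycle-threshold (Ct) values. As a point of reference, the sketch below assumes the widely used 2^-ΔΔCt (Livak) method with a single reference gene and near-100% amplification efficiency; the function name, the choice of calibrator (for example, one DPA stage or one parent) and all Ct values are hypothetical.

```python
from statistics import mean


def fold_change_ddct(target_sample, ref_sample, target_calibrator, ref_calibrator):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of replicate Ct values: the gene of interest and
    the reference gene in the test sample, then the same pair in the
    calibrator sample. Assumes both assays amplify with ~100% efficiency.
    """
    d_ct_sample = mean(target_sample) - mean(ref_sample)
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # target reaches threshold 3 cycles earlier (relative to the reference gene)
    # in the sample than in the calibrator -> 2^3 = 8-fold higher expression
    print(fold_change_ddct([22.0] * 3, [18.0] * 3, [25.0] * 3, [18.0] * 3))  # 8.0
```

Heat maps of such data are commonly drawn on log2 fold change, which for this method is simply -ΔΔCt.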
Studies of the role of tubulin in cotton fiber development have found tubulin genes to be upregulated at specific stages: transcripts of the α-tubulin genes GhTua2/3 and GhTua4 were highly upregulated from 10 to 20 DPA, while GhTua1 and GhTua5 transcripts were highly upregulated from 0 DPA up to 14 DPA and dropped significantly at 16 DPA with the onset of secondary wall synthesis [51-53]. In our RNA-seq and RT-qPCR results, the highly upregulated genes were mainly enzymes that perform catalytic activities. Similar results have been observed in maize, where a cluster of five adjacent genes (Bx1–Bx5) encodes enzymes for successive steps in the biosynthesis of the cyclic hydroxamic acid 2,4-dihydroxy-1,4-benzoxazin-3-one [54]. We further analysed the gene structures of the 50 highly upregulated genes identified from the RNA-seq results; all of them contained introns except six: Gorai.010G010100 (Probable LRR receptor-like serine/threonine-protein kinase At2g16250), Gorai.011G141100 (Acetolactate synthase 3, chloroplastic), Gorai.002G231400 (uncharacterized gene), Gorai.011G158300 (Receptor like protein kinase S.2), Gorai.007G350900 (Histone-lysine N-methyltransferase, H3 lysine-9 specific SUVH1) and Gorai.006G024500 (60S ribosomal export protein NMD3) (Fig 6B). Several stress-responsive genes have been found to be heavily laden with introns, such as the LEA genes [55], cyclin-dependent kinase (CDK) genes [56] and G protein-coupled receptors (GPCRs) [57].
Fig 5

Heat map of RNA-seq expression of the mined genes in the SDRs during fiber development.

The heat map was visualized using the Mev.exe program (log2 values). Red, upregulated; green, downregulated; black, no expression. DPA: days post anthesis.

Fig 6

RT-qPCR validation of the selected genes.

(A) Heat map of the 50 highly upregulated genes from the RNA-seq analysis, visualized using the Mev.exe program (log2 values). Red, upregulated; green, downregulated; black, no expression. (B) Gene structure analysis of the 50 selected genes. Abbreviations: DPA, days post anthesis; Gth, Gossypium thurberi; Gtr, Gossypium trilobum.

Discussion

Construction of genetic maps has become increasingly important for marker-assisted selection (MAS) in plants and for efficient gene mapping. SSR markers are of great significance in the construction of genetic maps, since these markers are abundant and have high levels of transferability, low levels of intra-locus, relative abundance and good genome coverage [58]. To develop our genetic map, we used EST-SSR (eSSR) SWU markers; EST-SSR markers have been used to construct several genetic maps in cotton [59-61]. EST-SSRs are based on expressed sequences and are conserved across cotton species and other closely related plant species. Wild cotton is known to have both advantageous and disadvantageous traits; the development of genetic maps from interspecific and intraspecific crosses therefore aids the introgression of advantageous alleles into cultivated varieties. We developed a genetic map from an interspecific cross between two wild diploid cotton species of the D genome. So far, few genetic maps developed from the diploid D genome have been reported. The constructed genetic map had a total length of 1,012.458 cM with an average distance between loci of 1.193 cM, and a total of 849 marker loci were used in its development. The map had a much higher marker density than earlier maps from diploid genomes; for instance, a map between G. arboreum and G. herbaceum had a total length of 1,109 cM with an average marker distance of 7.92 cM [59]. Although the map was relatively short compared with other dense genetic maps, its average distance between two marker loci (1.193 cM) was smaller than in previously developed maps, indicating that the map is suitable for quantitative trait locus (QTL) analysis and gene mining [62]. The genetic map between the two sister species G. thurberi and G. trilobum is the second to have been developed from wild species in the D genome; the first was developed from an interspecific cross between G. davidsonii and G. klotzschianum. By contrast, several maps have been developed using A-genome diploid cotton. Segregation distortion (SD) is the deviation of observed genotypic frequencies from expected Mendelian ratios [63]. This phenomenon has been reported in other plants such as maize [64], barley [65] and potato [66]. In our genetic map, segregation distortion loci (SDLs) occurred on all chromosomes, although they were unevenly distributed across the 13 chromosomes. Chr07 and Chr11 had the highest number of SDLs, with 23 distorted loci each. Similar results have been recorded on other genetic maps in Gossypium spp. [61]: Chr02, Chr07 and Chr11 [67], and Chr07 (>50% of the loci distorted). Some SDLs were clustered in specific chromosomal regions, which were designated segregation distortion regions (SDRs). The largest SDRs were observed on Chr02, Chr06, Chr07-2 and Chr1, and they were skewed towards the heterozygous genotype, similar to results reported by Liang et al. [38]. The skew towards heterozygosity could be due to genetic loci being expressed at different times, leading to gametophytic and zygotic selection. Earlier cotton genetic maps also located larger SDRs on Chr02 [39]; Chr02 had three SDRs (>50% of loci distorted) [68]; and SDRs were found on Chr02, Chr16 and Chr18 [69]. Interestingly, in most of these genetic maps Chr02 had fewer marker loci. SDRs showing similar patterns of distortion in the same chromosomal regions across several related populations could lead to the identification of common genetic factors causing the phenomenon [64].
From these results, we concluded that Chr02 could carry vital genes segregating around these SDRs and thereby causing distorted segregation of the flanking markers. Hence, the genes within these regions need to be mined. Such genes would help to unravel the causes of SDRs through the identification of important traits and genome-wide association studies; for example, Bovill et al. [70] identified a gene for crown rot resistance in wheat around an SDR, and similarly the Sr36 locus was detected in an SDR on chromosome 2B [71]. The GRAVY values of the mined genes were both positive and negative, indicating that the proteins they encode are both hydrophilic and hydrophobic in nature [72]. The identified genes were more often hydrophilic than hydrophobic; many encoded proteins and enzymes related to metabolism and disease/defense, which are mainly active in solution, consistent with the predominance of hydrophilic proteins. We analyzed the genes located in the SDRs to determine whether they had a role in the segregation distortion of the flanking markers. The SDRs shared common gene domains: Cytochromes P450 (CYPs) appeared in almost all SDRs, with six members, followed by the Myb-like DNA-binding domain with four members and the multiprotein-bridging factor 1 domain with three members; these genes could have caused the distorted segregation of the flanking markers. Alternatively, the distortion could be due to the same gametophyte factors or to unknown genes segregating in the population, producing segregation distortion in the same chromosomal regions [73]. We further observed that the chromosomes with the largest distortions had the highest numbers of genes; interestingly, Chr02 had the fewest markers (21) and the shortest map (28.665 cM) but the highest number of SDR genes (35).
This implies that the genes located in these regions could have been segregating due to zygotic, gametophytic or other underlying factors; more research is therefore needed on the genes located in the SDRs of chromosome 2. The expression analysis of the genes in the SDRs showed that some of them were involved in fibre development. Most of the upregulated genes were involved in enzyme catalytic activities; examples include Serine decarboxylase, Tetraketide alpha-pyrone reductase 2, Digalactosyldiacylglycerol synthase 1 (chloroplastic) and Mediator of RNA polymerase II transcription subunit 27. Cytochromes P450 (CYPs) were the dominant domain among the genes mined within the SDRs. This superfamily is among the largest groups of enzymes in plants and plays vital roles in a range of metabolic pathways [74]. The name derives from the spectral absorbance maximum at 450 nm produced when carbon monoxide binds to the enzyme in its reduced state [75]. CYPs have been found to function in all eukaryotes. The two species G. thurberi and G. trilobum are also known to be resistant to the soil-borne fungal diseases Fusarium wilt and Verticillium wilt, respectively; these enzymes play major roles in fungal metabolism, including biochemical reactions, adaptation to hostile environments and detoxification of chemicals [76], which could explain their higher number. In addition, G. thurberi possesses other beneficial agronomic traits, such as resistance to cotton bollworm and silverleaf whitefly, which may also be reflected in the higher number of CYPs: these enzymes play important roles in both insects and plants, participating in the metabolism of plant toxins by insects and in the production of defense compounds by plants [77].
SDRs are a common feature in plants and are believed to have significant effects on mapping and breeding applications. High levels of distortion have been found in a number of plants; for instance, in Medicago sativa L., 24% and 34% of markers were distorted in the mapping of the F1 generation, while in the F2 generation distortion reached a very high level of 68% per linkage group [78,79]. SD has also been observed on rice chromosome 9 in doubled-haploid and recombinant inbred populations [64]. Segregation distortion, believed to be caused by a group of genetic elements near the centromeres of chromosomes and now seen as a potentially powerful evolutionary force, has been observed in monocotyledonous plants such as maize [80]. The detection of these genes provides further evidence of the significant role of SDRs in plants.
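For a codominant SSR in an F2 population, the Mendelian expectation is 1:2:1 across the two homozygous classes and the heterozygote, which the supplementary allele-score file records as A, H and B (with '-' for missing data). The sketch below tests each marker with a chi-square goodness-of-fit test (2 degrees of freedom, critical value 5.991 at P = 0.05), reports which genotype class is in excess, and groups consecutive distorted markers into candidate regions. The test, the significance level and the `min_run = 3` clustering rule are assumptions for illustration rather than the criteria used in this study, and the function names and genotype counts are hypothetical.

```python
import math
from collections import Counter

# Codominant F2 expectation: A (one parent), H (heterozygote), B (other parent)
EXPECTED_121 = {"A": 0.25, "H": 0.50, "B": 0.25}


def genotype_counts(scores):
    """Count A/H/B calls in a marker's score string; '-' (missing) is skipped."""
    tally = Counter(call for call in scores if call in EXPECTED_121)
    return {g: tally[g] for g in EXPECTED_121}


def chi_square_121(counts):
    """Chi-square goodness of fit to 1:2:1; returns (statistic, p-value).

    With three genotype classes there are 2 degrees of freedom, for which
    the chi-square upper-tail probability is exactly exp(-x / 2).
    """
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no genotype calls")
    stat = sum((counts[g] - n * p) ** 2 / (n * p) for g, p in EXPECTED_121.items())
    return stat, math.exp(-stat / 2)


def skewed_towards(counts):
    """Genotype class with the largest excess over its 1:2:1 expectation."""
    n = sum(counts.values())
    return max(EXPECTED_121, key=lambda g: counts[g] - n * EXPECTED_121[g])


def distortion_regions(markers, alpha=0.05, min_run=3):
    """Group runs of >= min_run consecutive distorted markers into regions.

    markers: (name, score_string) tuples in linkage-map order.
    Returns (first_marker, last_marker) for each region.
    """
    regions, run = [], []
    for name, scores in list(markers) + [(None, None)]:  # sentinel closes last run
        distorted = (
            scores is not None
            and chi_square_121(genotype_counts(scores))[1] < alpha
        )
        if distorted:
            run.append(name)
            continue
        if len(run) >= min_run:
            regions.append((run[0], run[-1]))
        run = []
    return regions


# Hypothetical score strings for 188 F2 plants
balanced = "A" * 47 + "H" * 94 + "B" * 47      # exactly 1:2:1
excess_het = "A" * 30 + "H" * 128 + "B" * 30   # skewed towards heterozygotes

if __name__ == "__main__":
    print(chi_square_121(genotype_counts(excess_het)))
    print(skewed_towards(genotype_counts(excess_het)))  # H
```

With many markers tested, a multiple-testing correction (or a stricter alpha) is often applied before declaring loci distorted; the sketch leaves `alpha` as a parameter for that reason.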

Conclusions

Genetic maps between wild cotton species are significant for identifying vital alleles with profound agronomic benefits that could be introgressed into elite cotton cultivars. Cotton farming faces challenges arising from environmental stresses such as cold, drought and salinity. The genes mined within the SSR regions of the two species could help molecular breeders introgress the identified vital genes into already cultivated cotton. Two wild species of the D genome, G. thurberi as the female parent and G. trilobum as the male parent, were used to construct a fine genetic map, which will provide a basic tool for researchers to evaluate QTLs and identify novel genes along the SSR regions. The genetic map had a length of 1,012.458 cM with an average distance between loci of 1.193 cM. A total of 849 loci were successfully mapped on all 13 chromosomes. Chromosomal regions with obvious segregation distortion were identified in this map; approximately 16% of mapped markers showed distorted segregation in the F2 progenies. A total of 2,495 genes were mined within the SSR regions and characterized for their physiochemical properties. We further analyzed the genes within the SDRs with the aim of identifying genes that could be segregating within them, and noted that a common gene domain, cytochrome P450 (CYP), appeared in almost all SDRs, with six members. Further analysis of this gene domain could help future molecular breeders understand the role it plays in SDRs. The constructed linkage map will allow future breeders to identify markers linked to traits of interest and use them in marker-assisted breeding programs and genome-wide studies.

Ethical approval and consent to participate

Neither ethical approval nor consent to participate was sought for this research.

Details of SWU marker sequences.

(XLSX)

Details of primers used for RT-qPCR analysis.

(DOCX)

SWU markers and their allele scores.

The male parent G. trilobum, the female parent G. thurberi and the heterozygous F2 progenies were scored as A, B and H, respectively. Missing data were designated '-'. Multi-allelic markers were named separately by primer name followed by the suffix a, b, c or d. (XLSX)

Physiochemical properties of the proteins encoded by the mined genes from the genetic map developed between the two wild cotton species of the D genome.

(XLSX)

Gene Ontology analysis of the genes obtained within the SDRs.

(DOCX)

Marker inconformity between the genetic map and the physical map of G. hirsutum (GhDt).

(DOCX)

Marker inconformity between the genetic map and the physical map of G. barbadense (GbDt).

(DOCX)

1. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat.

Authors:  Ramesh V Kantety; Mauricio La Rota; David E Matthews; Mark E Sorrells
Journal:  Plant Mol Biol       Date:  2002 Mar-Apr

2. Functional genomics of P450s (review).

Authors:  Mary A Schuler; Daniele Werck-Reichhart
Journal:  Annu Rev Plant Biol       Date:  2003

3. Animal and plant cytochrome P-450 systems (review).

Authors:  J L Riviere; F Cabanne
Journal:  Biochimie       Date:  1987 Jun-Jul

4. A simple method for displaying the hydropathic character of a protein.

Authors:  J Kyte; R F Doolittle
Journal:  J Mol Biol       Date:  1982-05-05

5. QTL mapping of yield and fiber traits based on a four-way cross population in Gossypium hirsutum L.

Authors:  Hongde Qin; Wangzhen Guo; Yuan-Ming Zhang; Tianzhen Zhang
Journal:  Theor Appl Genet       Date:  2008-07-05

6. Simple sequence repeat genetic linkage maps of A-genome diploid cotton (Gossypium arboreum).

Authors:  Xue-Xia Ma; Bao-Liang Zhou; Yan-Hui Lü; Wang-Zhen Guo; Tian-Zhen Zhang
Journal:  J Integr Plant Biol       Date:  2008-04

7. QTL Mapping for Fiber and Yield Traits in Upland Cotton under Multiple Environments.

Authors:  Hantao Wang; Cong Huang; Huanle Guo; Ximei Li; Wenxia Zhao; Baosheng Dai; Zhenhua Yan; Zhongxu Lin
Journal:  PLoS One       Date:  2015-06-25

8. Simple Sequence Repeat (SSR) Genetic Linkage Map of D Genome Diploid Cotton Derived from an Interspecific Cross between Gossypium davidsonii and Gossypium klotzschianum.

Authors:  Joy Nyangasi Kirungu; Yanfeng Deng; Xiaoyan Cai; Richard Odongo Magwanga; Zhongli Zhou; Xingxing Wang; Yuhong Wang; Zhenmei Zhang; Kunbo Wang; Fang Liu
Journal:  Int J Mol Sci       Date:  2018-01-11

9. Utility of EST-derived SSR in cultivated peanut (Arachis hypogaea L.) and Arachis wild species.

Authors:  Xuanqiang Liang; Xiaoping Chen; Yanbin Hong; Haiyan Liu; Guiyuan Zhou; Shaoxiong Li; Baozhu Guo
Journal:  BMC Plant Biol       Date:  2009-03-24

10. Whole Genome Analysis of Cyclin Dependent Kinase (CDK) Gene Family in Cotton and Functional Evaluation of the Role of CDKF4 Gene in Drought and Salt Stress Tolerance in Plants.

Authors:  Richard Odongo Magwanga; Pu Lu; Joy Nyangasi Kirungu; Xiaoyan Cai; Zhongli Zhou; Xingxing Wang; Latyr Diouf; Yanchao Xu; Yuqing Hou; Yangguang Hu; Qi Dong; Kunbo Wang; Fang Liu
Journal:  Int J Mol Sci       Date:  2018-09-05       Impact factor: 5.923

View more