
Strong Selectional Forces Fine-Tune CpG Content in Genes Involved in Neurological Disorders as Revealed by Codon Usage Patterns.

Rekha Khandia1, Anushri Sharma1, Taha Alqahtani2, Ali M Alqahtani2, Yahya I Asiri2, Saud Alqahtani2, Ahmed M Alharbi3, Mohammad Amjad Kamal4,5,6,7.   

Abstract

Neurodegenerative disorders cause irreversible damage to neurons and adversely affect quality of life. Protein misfolding and aggregation in specific parts of the brain, mitochondrial dysfunction, calcium load, proteolytic stress, and oxidative stress are among the causes of neurodegenerative disorders. Altered metabolism has also been associated with neurodegeneration: glutamine and alanine are reduced in patients with transient global amnesia, and higher homocysteine-cysteine disulfide, lower methionine, and decreased serum urea have been observed in Alzheimer's disease patients. Neurodegeneration thus appears to be a culmination of altered metabolism. The objective of this study was to analyze compositional attributes, physical properties of the encoded proteins, and selectional and mutational forces influencing codon usage preferences in a panel of genes that are directly or indirectly involved in metabolism and contribute to neurodegeneration. Various parameters, including gene composition, dinucleotide frequencies, relative synonymous codon usage (RSCU), codon adaptation index (CAI), neutrality and parity plots, and several protein indices, were computed and analyzed to determine the codon usage pattern and the factors affecting it. Intrinsic protein properties such as the grand average of hydropathicity (GRAVY) index, isoelectric point, hydrophobicity, and acidic, basic, and neutral amino acid content were found to influence codon usage. In genes up to 800 amino acids long, GC3 content was highly variable, while GC12 content was relatively constant. Genes maintain an optimum CpG content to sustain the high expression levels required of genes involved in metabolism. Low codon usage bias was also observed alongside higher protein expression levels. Compositional parameters and the nucleotide at the second codon position played essential roles in explaining the extent of bias. 
Overall, the analysis indicated that selection pressure dominates, while compositional constraints and mutational forces also shape codon usage.
Copyright © 2022 Khandia, Sharma, Alqahtani, Alqahtani, Asiri, Alqahtani, Alharbi and Kamal.


Keywords:  RSCU; codon usage; dinucleotide ratio; fine tuning of CpG dinucleotide; metabolism-related genes; neurodegeneration

Year:  2022        PMID: 35757545      PMCID: PMC9226491          DOI: 10.3389/fnins.2022.887929

Source DB:  PubMed          Journal:  Front Neurosci        ISSN: 1662-453X            Impact factor:   5.152


Introduction

Neurodegenerative disorders are incurable and debilitating conditions characterized by progressive degeneration and, potentially, death of nerve cells. These diseases pose a major threat to human health through declining quality of life and premature mortality, and they carry an economic burden associated with long-term in-home caregiving. Many neurodegenerative disorders are associated with protein misfolding and aggregation in specific brain regions (Soto, 2003). The most common neurodegenerative disorders associated with misfolded protein aggregation are Alzheimer’s disease (AD) and Parkinson’s disease (PD). Multiple lines of evidence implicate Aβ and tau in AD and α-synuclein (α-syn) in PD. However, it remains unclear whether these abnormal proteins are a consequence or a cause of disease (Bourdenx et al., 2017). Misfolding and aggregation might be attributable to mutations in disease-related genes. Other pathologies shared among neurodegenerative diseases include mitochondrial dysfunction, glutamate toxicity, calcium load, proteolytic stress, and oxidative stress (Muddapu et al., 2020). Changes in disease-specific proteins are associated with enhanced oxidative stress, initiation of inflammatory processes, and neuronal damage (Ballard et al., 2011). A comparative study of genome-wide gene expression data from 93 brain tissue samples obtained from patients with AD, PD, Huntington’s disease (HD), amyotrophic lateral sclerosis (ALS), and multiple sclerosis revealed a large number of dysregulated genes associated with each disorder, but no gene was shared across all conditions (Durrenberger et al., 2015). This finding suggests that no single shared mechanism underlies neurodegenerative disorders. However, Durrenberger et al. (2015) did not assess protein expression or post-translational modifications, so this conclusion may be misleading. In neurodegenerative diseases, specific neuronal clusters are more likely to serve as the primary sites from which pathology spreads (Fu et al., 2018). These vulnerable populations share specific morphological features, including long-range neuronal projections and extensive synaptic connections, which make them selectively vulnerable because of the higher metabolic cost of maintaining structural integrity (Pacelli et al., 2015). Muddapu et al. (2020) proposed that the pathological markers of neurodegenerative diseases, including protein misfolding, oxidative stress, and mitochondrial dysfunction, are direct consequences of metabolic anomalies. For instance, insulin plays a role in cholesterol metabolism, which is essential to myelination, and in the regulation of amyloid-degrading enzymes (Wang et al., 2014). Insulin resistance disrupts glucose metabolism, resulting in hyperglycemia and oxidative stress, which lead to an inflammatory response and neuronal damage. Altered amino acid levels have been documented in the brain and serum of AD patients. Since glutamate and its metabolite gamma-aminobutyric acid (GABA) are excitatory and inhibitory neurotransmitters, respectively, we can speculate that alterations in glutamate may adversely affect neural functioning (Esposito et al., 2013). Glutamine and alanine levels are also reduced in the blood of patients with transient global amnesia (Sancesario et al., 2013). Higher homocysteine-cysteine disulfide and lower methionine levels have been documented in the serum of AD patients. In the normal human brain, the activity of the enzyme ornithine transcarbamoylase is very low, so the brain lacks a complete urea cycle (Bensemain et al., 2009). AD patients show a 44% decline in serum urea (González-Domínguez et al., 2015). 
All this evidence points to a central role for metabolic malfunction in neurodegenerative disorders. Having observed this potential connection between metabolic disturbances and neurodegeneration, we set out to study metabolism-associated genes whose malfunction contributes to neurodegeneration. For patients presenting with neurological features such as cerebral edema, cerebellar ataxia, coma, seizures, stroke, and intellectual disability, together with hyperammonemia, protein avoidance, low plasma citrulline, and hypoargininemia, commercial genetic diagnosis is available, and information on the genes involved can be obtained. We therefore used commercially available information and assessed a panel of 60 neurodegeneration-associated genes that are directly or indirectly involved in the metabolism or transport of metabolites in brain cells. These genes are associated with several neurodegeneration symptoms, including neurocognitive deficiencies, attention-deficit/hyperactivity disorder, developmental delays, seizures, learning disabilities, lethargy, somnolence, refusal to feed, vomiting, tachypnea, respiratory alkalosis, fatal neonatal encephalopathy with hypotonia, and many others. Proteins are built from 20 amino acids, which are encoded by 61 sense codons. Except for methionine and tryptophan, every amino acid is encoded by two or more synonymous codons. Synonymous codons are not used equally, and some are often preferred over others. This phenomenon, called codon usage bias (CUB), is attributed to various factors, including overall compositional constraints (Deka and Chakraborty, 2014), selectional or mutational forces (Hershberg and Petrov, 2008), gene expression levels (Zhou et al., 2016), and the tRNA pool (Quax et al., 2015). Codon choice, in turn, affects gene expression. 
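As an illustration of how codon usage bias is commonly quantified, the following minimal Python sketch computes relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons of its amino acid were used equally, so that RSCU = 1 indicates no preference. It assumes the standard genetic code and an in-frame CDS; the names `rscu` and `SYNONYMS` are hypothetical and are not taken from the authors' analysis pipeline.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Group the 61 sense codons into synonymous families, keyed by amino acid.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(cds):
    """Return {codon: RSCU} for one in-frame CDS (U is read as T)."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for aa, family in SYNONYMS.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so skip it
        expected = total / len(family)  # count expected under equal usage
        for c in family:
            values[c] = counts[c] / expected
    return values
```

For example, `rscu("GCTGCTGCCGCA")` returns 2.0 for GCT, 1.0 for GCC and GCA, and 0.0 for GCG, marking GCT as over-represented among the four alanine codons. Amino acids absent from a sequence are skipped rather than assigned a value, which is only one of several conventions in use.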
Genes with higher expression levels exhibit a higher codon adaptation index (CAI), and the most abundant proteins have higher CAI values (Henry and Sharp, 2007). Bioinformatics and biomedical research have expanded our understanding of the pathobiology of neurodegenerative disorders. Thus far, however, little research has examined genes involved in neurodegeneration from a metabolic perspective. In the present study, we analyzed the codon usage pattern of 60 relevant genes to elucidate the forces (mutational, selectional, or compositional) acting upon them. We calculated various indices, including parity and nucleotide skews, to determine compositional disproportion. Neutrality, parity, and ENc-GC3 plots, together with regression analysis between nucleotide compositions, were used to reveal the impact of evolutionary forces. In addition, CAI and relative synonymous codon usage (RSCU) were determined to evaluate codon preferences, and various statistical methods were employed to examine associations between molecular features. These analyses identified molecular signatures, evolutionary forces acting on the genes, and codon usage patterns of genes involved in metabolism and neurodegeneration. The results of this study provide insight into the factors affecting codon choice, along with information on the expression levels of these genes.
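The CAI and the positional GC values used in neutrality plots can be sketched as follows, under stated assumptions. In the standard CAI definition, each codon's relative adaptiveness w is its count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon (equivalently, the ratio of their RSCU values), and a gene's CAI is the geometric mean of w over its codons. The reference set, the choice to skip codons never seen in it, and the exclusion of Met, Trp, and stop codons are assumptions of this sketch, as is the hypothetical helper `gc12_gc3`, which returns the GC12 and GC3 values that a neutrality plot places on its axes. None of these function names come from the original study.

```python
import math
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
EXCLUDED = {"M", "W", "*"}  # assumption: single-codon amino acids and stops carry no choice


def split_codons(cds):
    """In-frame codons of a CDS made only of A/C/G/T (U is read as T)."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)
            if cds[i:i + 3] in CODON_TABLE]


def relative_adaptiveness(reference_cds):
    """w = codon count / count of the most used synonymous codon in the reference set."""
    counts = Counter(c for cds in reference_cds for c in split_codons(cds))
    top = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        top[aa] = max(top.get(aa, 0), n)
    return {codon: n / top[CODON_TABLE[codon]] for codon, n in counts.items()}


def cai(cds, weights):
    """Geometric mean of w over informative codons; codons absent from the reference are skipped."""
    logs = [math.log(weights[c]) for c in split_codons(cds)
            if CODON_TABLE[c] not in EXCLUDED and c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def gc12_gc3(cds):
    """Mean GC at codon positions 1 and 2 (GC12) and GC at position 3 (GC3)."""
    codons = [c for c in split_codons(cds) if CODON_TABLE[c] not in EXCLUDED]
    if not codons:
        return float("nan"), float("nan")
    gc = [sum(c[p] in "GC" for c in codons) / len(codons) for p in range(3)]
    return (gc[0] + gc[1]) / 2, gc[2]
```

With a toy reference set in which GCT occurs three times and GCC once, w(GCT) = 1 and w(GCC) = 1/3, so a gene reading GCT GCC scores √(1/3) ≈ 0.577, while a gene using only GCT scores 1.0. Regressing GC12 on GC3 across genes gives the neutrality-plot slope; how it is interpreted is left to the analysis itself.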

Materials and Methods

Data Collection

The genes analyzed were obtained from the NGS Neurodegenerative Disorders Multi-Gene Panel listed in the NCBI Genetic Testing Registry. Next-generation sequencing is recommended for neurodegenerative symptoms with evidence of disturbed metabolism (cerebral edema, cerebellar ataxia, coma, seizures, stroke, and intellectual disability along with hyperammonemia, protein avoidance, low plasma citrulline, and hypoargininemia). The Laboratory of Genome Diagnostics (LGD-AUMC), Academic Medical Center, University of Amsterdam, offers this NGS multi-gene panel for diagnostic purposes, and their gene panel was used in the present study. In total, 183 transcripts belonging to 60 genes were studied; Table 1 lists each gene with its function and the number of transcripts used. To be included, a transcript/coding sequence (CDS) had to be in a complete reading frame and contain no characters other than A, T, G, or C (i.e., no ambiguity codes such as R, Y, or B, representing A/G, C/T, and C/G/T, respectively). In addition, sequences containing internal stop codons (UAA, UAG, or UGA) were excluded. In total, 264,096 nucleotides (88,032 codons) were analyzed.
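The inclusion criteria above can be expressed as a small filter. This is a sketch under one plausible reading of the text: "in a reading frame" is taken to mean a length divisible by three, a single terminal stop codon is assumed to be allowed while internal stops are rejected, and the function names `passes_filters` and `tally` are hypothetical.

```python
VALID = set("ACGT")
STOPS = {"TAA", "TAG", "TGA"}  # UAA, UAG, UGA written in the DNA alphabet


def passes_filters(cds):
    """True if the CDS meets the inclusion criteria as interpreted here."""
    cds = cds.upper().replace("U", "T")
    if len(cds) == 0 or len(cds) % 3 != 0:
        return False  # not a complete reading frame
    if not set(cds) <= VALID:
        return False  # ambiguity codes such as R, Y, B, or N
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Assumption: a terminal stop codon is allowed; any internal stop rejects the CDS.
    return not any(c in STOPS for c in codons[:-1])


def tally(cds_list):
    """Return (nucleotides, codons) summed over the sequences that pass."""
    kept = [s for s in cds_list if passes_filters(s)]
    nucleotides = sum(len(s) for s in kept)
    return nucleotides, nucleotides // 3
```

Because every retained sequence has a length divisible by three, the nucleotide total is always exactly three times the codon total, consistent with the reported 264,096 nucleotides and 88,032 codons.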
TABLE 1

Genes involved in neurodegenerative disorders, with their chromosomal location, associated disease, neurological manifestations, number of transcripts studied, and metabolic function.

S No. | Name of gene | Synonyms | Chromosomal location | Disease involved | Neurological manifestations | No. of transcripts studied | Metabolic function (if any)
1 | ABCD1 | ALD; AMN; ALDP; ABC42 | Xq28 | Adrenoleukodystrophy | X-linked adrenoleukodystrophy (X-ALD) affects the white matter and the adrenal cortex. | 1 | Encodes a transporter protein localized to the peroxisomal membrane that participates in the metabolism of very-long-chain fatty acids
2 | ADSL | ASL; AMPS; ASASE | 22q13.1 | Adenylosuccinate lyase deficiency | A disorder characterized by intellectual disability, psychomotor delay and/or regression, seizures, and autistic features | 4 | Participates in purine biosynthesis
3 | ALDH7A1 | EPD; PDE; ATQ1 | 5q23.2 | Pyridoxine-dependent epilepsy | Intractable seizures within the first weeks to months of life; responsive to large daily supplements of pyridoxine | 2 | Codes for the enzyme α-aminoadipic semialdehyde (α-AASA) dehydrogenase, which breaks down lysine in the brain
4 | APTX | AOA; AOA1; AXA1; EAOH; EOAHA; FHA-HIT | 9p21.1 | Coenzyme Q10 deficiency | Associated with multisystem involvement, including neurologic manifestations such as fatal neonatal encephalopathy with hypotonia; a late-onset, slowly progressive multiple-system atrophy-like phenotype (neurodegeneration with autonomic failure and various combinations of parkinsonism and cerebellar ataxia, and pyramidal dysfunction); and dystonia, spasticity, seizures, and intellectual disability. | 23 | Role in single-strand DNA break repair
5 | ARG1 | arginase1 | 6q23.2 | Arginase deficiency | In untreated individuals, arginase deficiency is characterized by episodic hyperammonaemia of variable degree that is infrequently severe enough to be life-threatening or to cause death. | 3 | A manganese-containing enzyme catalyzing the final step in the urea cycle, disposing of toxic ammonia by converting L-arginine to L-ornithine and urea
6 | ARSA | ASA; MLD | 22q13.33 | Metachromatic leukodystrophy | Arylsulfatase A deficiency (also known as metachromatic leukodystrophy or MLD) is characterized by damage to the myelin sheath, resulting in progressive motor and cognitive impairment | 6 | Codes for the enzyme arylsulfatase A, which helps process sulfatides, a subgroup of sphingolipids
7 | ASL | ASAL | 7q11.21 | Argininosuccinate lyase deficiency | The severe neonatal-onset form is characterized by hyperammonaemia (neurocognitive deficiencies, attention-deficit/hyperactivity disorder, developmental delay, seizures, and learning disability) within the first few days after birth that can manifest as increasing lethargy, somnolence, refusal to feed, vomiting, tachypnea, and respiratory alkalosis. Absence of treatment leads to worsening lethargy, seizures, coma, and even death | 4 | Involved in alanine, aspartate, and glutamate metabolism, urea cycle metabolism, and metabolism of amino acids
8 | ASS1 | ASS; CTLN1 | 9q34.11 | Citrullinemia type I | The later-onset form is associated with intense headaches, blind spots (scotomas), problems with balance and muscle coordination (ataxia), and lethargy. | 2 | Codes for the enzyme argininosuccinate synthase 1, which participates in the urea cycle
9 | BCKDHA | MSU; MSUD1; OVD1A; BCKDE1A | 19q13.2 | Maple syrup urine disease | Accumulation of leucine, isoleucine, and valine and their byproducts is toxic to the nervous system and leads to seizures, developmental delay, and the other health problems associated with maple syrup urine disease. | 2 | Encodes alpha-keto acid dehydrogenase, involved in the breakdown of leucine, isoleucine, and valine
10 | BCKDHB | E1B; BCKDE1B; BCKDH E1-beta | 6q14.1 | Maple syrup urine disease | Accumulation of leucine, isoleucine, and valine and their byproducts is toxic to the nervous system and leads to seizures, developmental delay, and the other health problems associated with maple syrup urine disease. | 3 | Encodes alpha-keto acid dehydrogenase, involved in the breakdown of leucine, isoleucine, and valine
11 | CBS | CBSL; HIP4 | 21q22.3 | Classic homocystinuria | Characterized by involvement of the eye, skeletal system, vascular system, and CNS | 5 | Codes for cystathionine beta-synthase, which converts homocysteine and serine to cystathionine
12 | COQ2 | MSA1; CL640; COQ10D1; PHB:PPT | 4q21.23 | Coenzyme Q10 deficiency, primary 1 | Primary coenzyme Q10 (CoQ10) deficiency is usually associated with multisystem involvement, including neurologic manifestations such as fatal neonatal encephalopathy with hypotonia; a late-onset, slowly progressive multiple-system atrophy-like phenotype (neurodegeneration with autonomic failure and various combinations of parkinsonism and cerebellar ataxia, and pyramidal dysfunction); and dystonia, spasticity, seizures, and intellectual disability. | 2 | Required in the biosynthetic pathway of CoQ (ubiquinone). This enzyme catalyzes the prenylation of p-hydroxybenzoate with an all-trans polyprenyl group.
13 | COQ8A | COQ8; ADCK3; ARCA2; CABC1; SCAR9; COQ10D4 | 1q42.13 | Coenzyme Q10 deficiency, primary 1 | Deficiency is usually associated with multisystem involvement, including neurologic manifestations such as fatal neonatal encephalopathy with hypotonia; a late-onset, slowly progressive multiple-system atrophy-like phenotype (neurodegeneration with autonomic failure and various combinations of parkinsonism and cerebellar ataxia, and pyramidal dysfunction); and dystonia, spasticity, seizures, and intellectual disability. | 1 | Required in the biosynthetic pathway of CoQ (ubiquinone). This enzyme catalyzes the prenylation of p-hydroxybenzoate with an all-trans polyprenyl group
14 | COQ9 | COQ10D5; C16orf49 | 16q21 | Coenzyme Q10 deficiency, primary 1 | Primary coenzyme Q10 (CoQ10) deficiency is usually associated with multisystem involvement, including neurologic manifestations such as fatal neonatal encephalopathy with hypotonia; a late-onset, slowly progressive multiple-system atrophy-like phenotype (neurodegeneration with autonomic failure and various combinations of parkinsonism and cerebellar ataxia, and pyramidal dysfunction); and dystonia, spasticity, seizures, and intellectual disability. | 1 | Required in the biosynthetic pathway of CoQ (ubiquinone). This enzyme catalyzes the prenylation of p-hydroxybenzoate with an all-trans polyprenyl group
15 | CPS1 | PHN; GATD6; CPSASE1 | 2q34 | Congenital hyperammonemia, type I | A rare, severe disorder of urea cycle metabolism typically characterized either by neonatal-onset severe hyperammonemia that occurs a few days after birth and manifests with lethargy, vomiting, hypothermia, seizures, coma, and death, or by a presentation outside the newborn period at any age with (sometimes) milder symptoms of hyperammonemia | 4 | Encodes carbamoyl phosphate synthetase I, which converts ammonium into carbamoyl phosphate and plays an intricate role in arginine and pyrimidine metabolism
16 | CYP27A1 | CTX; CP27; CYP27 | 2q35 | Cerebrotendinous xanthomatosis (cholestanol storage disease) | Cerebrotendinous xanthomatosis (CTX) is a lipid storage disease characterized by infantile-onset diarrhea, childhood-onset cataract, adolescent- to young adult-onset tendon xanthomas, and adult-onset progressive neurologic dysfunction (dementia, psychiatric disturbances, pyramidal and/or cerebellar signs, dystonia, atypical parkinsonism, peripheral neuropathy, and seizures). | 1 | Encodes sterol 27-hydroxylase, which breaks down cholesterol to form the bile acid chenodeoxycholic acid and maintains normal cholesterol levels in the body.
17 | DBT | E2; E2B; BCATE2; BCKADE2; BCKAD-E2; BCKDH-E2; BCOADC-E2 | 1p21.2 | Maple syrup urine disease | Elevated concentrations of branched-chain amino acids (BCAAs; leucine, isoleucine, and valine) and alloisoleucine, as well as a generalized disturbance of amino acid concentration ratios, are present in blood, and the maple syrup odor can be detected in cerumen | 1 | Encodes part of the branched-chain alpha-keto acid dehydrogenase (BCKD) enzyme complex, which helps break down the branched-chain amino acids leucine, isoleucine, and valine
18 | DDC | AADC | 7p12.2-12.1 | Deficiency of aromatic-L-amino-acid decarboxylase | Autosomal recessive inborn error of neurotransmitter metabolism that leads to combined serotonin and catecholamine deficiency | 7 | Encodes the aromatic L-amino acid decarboxylase (AADC) enzyme, which converts L-dopa and 5-hydroxytryptophan to dopamine and serotonin, respectively
19 | DLD | LAD; DLDD; DLDH; GCSL; PHE3; OGDC-E3 | 7q31.1 | Maple syrup urine disease | Characterized by an overlapping continuum that ranges from early-onset neurologic manifestations to adult-onset liver involvement and, rarely, a myopathic presentation. | 4 | Encodes the enzyme dihydrolipoamide dehydrogenase; involved in the metabolism of leucine, isoleucine, and valine
20 | GAMT | PIG2; CCDS2; TP53I2; HEL-S-20 | 19p13.3 | Inborn errors of creatine metabolism | GAMT deficiency is characterized by symptoms ranging from mild intellectual disability and speech delay to severe intellectual disability, seizures, movement disorder, and behavior disorder | 2 | This enzyme participates in glycine, serine, and threonine metabolism and in arginine and proline metabolism.
21 | GATM | AGAT; CCDS3; FRTS1 | 15q21.1 | Inborn errors of creatine metabolism | Intellectual disability, seizures, and behavior disorder that can include autistic behaviors and self-mutilation | 2 | Encodes arginine:glycine amidinotransferase; participates in the synthesis of glycine, arginine, and methionine.
22 | GCDH | GCD; ACAD5 | 19p13.13 | Glutaric aciduria, type 1 | Results in acute bilateral striatal injury and subsequent complex movement disorders; increased risk of renal disease | 2 | Encodes glutaryl-CoA dehydrogenase; involved in the metabolism of tryptophan, lysine, and hydroxylysine
23 | GCH1 | GCH; DYT5; DYT14; DYT5a; GTPCH1; HPABH4B; GTP-CH-1 | 14q22.2 | Dopa-responsive dystonia (Segawa syndrome) | Typically characterized by signs of parkinsonism that may be relatively subtle. Such signs may include slowness of movement (bradykinesia), tremors, stiffness and resistance to movement (rigidity), balance difficulties, and postural instability. | 4 | Encodes GTP cyclohydrolase 1, involved in the production of tetrahydrobiopterin, a molecule needed to process phenylalanine into tyrosine
24 | GNS | G6S | 12q14.3 | Sanfilippo syndrome | A rare autosomal recessive lysosomal storage disease affecting the metabolism of mucopolysaccharides. Signs and symptoms include behavioral changes, sleep disorders, mental developmental delays, and seizures | 1 | Affects the metabolism of mucopolysaccharides; glycosaminoglycan metabolism is another related pathway
25 | HLCS | HCS | 21q22.13 | Holocarboxylase synthetase deficiency | Deficiency leads to complications of metabolic acidosis, seizures, and hyperammonemia that can result in long-term neurological sequelae and developmental disability | 8 | Encodes holocarboxylase synthetase; related to inborn errors of biotin metabolism
26 | HGSNAT | RP73; HGNAT; MPS3C; TMEM76 | 8p11.21-11.1 | Sanfilippo syndrome | Signs and symptoms include behavioral changes, sleep disorders, mental developmental delays, and seizures | 4 | Encodes a lysosomal acetyltransferase, one of several enzymes involved in the lysosomal degradation of heparan sulfate
27 | HPRT1 | HPRT; HGPRT | Xq26.2-26.3 | Lesch-Nyhan syndrome | Lesch-Nyhan disease (LND), at the most severe end, presents with motor dysfunction resembling severe cerebral palsy, intellectual disability, and self-injurious behavior | 1 | Encodes hypoxanthine phosphoribosyltransferase 1; associated with a rare disorder of purine metabolism
28 | IDS | ID2S; MPS2; SIDS | Xq28 | Mucopolysaccharidosis type I/II | Short stature; macrocephaly with or without communicating hydrocephalus; macroglossia; hoarse voice; conductive and sensorineural hearing loss; hepatosplenomegaly; dysostosis multiplex; spinal stenosis; and carpal tunnel syndrome. | 2 | Codes for iduronate 2-sulfatase, responsible for the breakdown of the large sugar molecules glycosaminoglycans
29 | IDUA | IDA; MPS1; MPSI | 4p16.3 | Mucopolysaccharidosis type I/II | Neurological complications may include damage to neurons. Pain and impaired motor function (ability to start and control muscle movement) may result from compressed nerves or nerve roots in the spinal cord or in the peripheral nervous system. | 3 | Encodes alpha-L-iduronidase, an enzyme involved in the metabolism of glycosaminoglycans (GAGs).
30 | IVD | IVDH; ACAD2 | 15q15.1 | Isovaleryl-CoA dehydrogenase deficiency | A rare autosomal disorder characterized by abnormalities in the metabolism of leucine. The genetic deficiency of IVD results in an accumulation of isovaleric acid, which is toxic to the central nervous system and leads to isovaleric acidemia | 7 | Encodes isovaleryl-CoA dehydrogenase, a mitochondrial matrix enzyme that catalyzes the third step in leucine catabolism.
31 | LMBRD1 | NESI; LMBD1; MAHCF; C6orf209 | 6q13 | Cobalamin F disorder | Cognitive and neurological impairment | 4 | Encodes LMBR1 domain containing 1, involved in the transport and metabolism of cobalamin
32 | MAN2B1 | MANB; LAMAN | 19p13.13 | Methylmalonic acidemia with homocystinuria | Developmental delay, eye defects, neurological problems, and blood abnormalities. | 2 | Encodes alpha-mannosidase. Non-functional LMBD1 protein prevents the release of vitamin B12 from lysosomes, ultimately decreasing the production of methionine and causing accumulation of homocysteine
33 | MLYCD | MCD | 16q23.3 | Malonic aciduria | Developmental delay in early childhood, seizures, hypotonia, diarrhea, vomiting, metabolic acidosis, hypoglycemia, ketosis, abnormal urinary compounds, lactic acidemia, and hypertrophic cardiomyopathy | 1 | Encodes malonyl-CoA decarboxylase, involved in fatty acid metabolism and important in muscle and brain metabolism
34 | MMAA | cblA | 4q31.21 | Vitamin B12-responsive methylmalonic acidemia type cblA | In the neonatal period, the disease can present with lethargy, vomiting, hypotonia, hypothermia, respiratory distress, severe ketoacidosis, hyperammonemia, neutropenia, and thrombocytopenia and can result in death within the first four weeks of life. In the infantile/non-B12-responsive phenotype, infants are normal at birth but develop lethargy, vomiting, dehydration, failure to thrive, hepatomegaly, hypotonia, and encephalopathy within a few weeks to months of age. Major secondary complications of methylmalonic acidemia include intellectual impairment (variable); tubulointerstitial nephritis with progressive renal failure; “metabolic stroke” | 2 | Encodes a protein involved in the metabolism of certain proteins, fats (lipids), and cholesterol and in the transport of vitamin B12
35 | MMAB | cblB | 12q24.11 | Vitamin B12-responsive methylmalonic acidemia type cblB | Major secondary complications of methylmalonic acidemia include intellectual impairment (variable); tubulointerstitial nephritis with progressive renal failure; “metabolic stroke” (acute and chronic basal ganglia injury) causing a disabling movement disorder with choreoathetosis, dystonia, and para/quadriparesis; pancreatitis; growth failure; functional immune impairment; and optic nerve atrophy. | 1 | Metabolism of cobalamin associated B
36 | MMACHC | cblC | 1p34.1 | Methylmalonic acidemia with homocystinuria | Adolescents and adults may present with neuropsychiatric symptoms, progressive cognitive decline, thromboembolic complications, and/or subacute combined degeneration of the spinal cord. | 2 | The encoded enzyme helps convert vitamin B12 into adenosylcobalamin (AdoCbl) or methylcobalamin.
37 | MMADHC | cblD; C2orf25; CL25022 | 2q23.2 | Homocystinuria, methylmalonic acidemia, methylmalonic acidemia with homocystinuria | Infants may present with poor feeding and slow growth, neurologic abnormality, and, rarely, hemolytic uremic syndrome (HUS). | 1 | Encodes a protein involved in converting vitamin B12 into adenosylcobalamin (AdoCbl) or methylcobalamin (MeCbl)
38 | MMUT | MCM; MUT | 6p12.3 | Methylmalonic acidemia | All phenotypes are characterized by periods of relative health and intermittent metabolic decompensation. Major secondary complications of methylmalonic acidemia include intellectual impairment (variable); tubulointerstitial nephritis with progressive renal failure; “metabolic stroke” (acute and chronic basal ganglia injury) causing a disabling | 1 | Encodes methylmalonyl-CoA mutase, responsible for a particular step in the breakdown of several protein building blocks (amino acids), specifically isoleucine, methionine, threonine, and valine. The enzyme also helps break down certain types of fats (lipids) and cholesterol. Related pathways include amino acid metabolism and carbon metabolism
39 | MOCS1 | MIG11; MOCOD; MOCS1A; MOCS1B | 6p21.2 | Combined molybdoflavoprotein enzyme deficiency | Molybdenum cofactor deficiency is a rare condition characterized by brain dysfunction (encephalopathy), atrophy of brain tissue, and microcephaly | 7 | Encodes a protein involved in the formation (biosynthesis) of a molecule called molybdenum cofactor.
40 | MOCS2 | MPTS; MCBPE; MOCO1; MOCODB | 5q11.2 | Combined molybdoflavoprotein enzyme deficiency | Rare autosomal recessive metabolic disorder characterized by neonatal onset of intractable seizures, opisthotonus, and facial dysmorphism associated with hypouricemia and elevated urinary sulfite levels | 2 | Encodes molybdopterin synthase, involved in the formation (biosynthesis) of molybdenum cofactor.
41 | MTHFR | MTHR_HUMAN | 1p36.22 | Homocystinuria due to MTHFR deficiency | A genetic mutation that may lead to high levels of homocysteine in the blood and low levels of folate and other vitamins; leads to depression, anxiety, bipolar disorder, nerve pain, etc. | 2 | A key enzyme involved in the metabolism of folate and the breakdown of homocysteine.
42 | MTR | MS; HMAG; cblG | 1q43 | Homocystinuria | Adolescents and adults may have neuropsychiatric symptoms, progressive cognitive decline, thromboembolic complications, and/or subacute combined degeneration of the spinal cord. | 3 | Encodes methionine synthase, involved in converting the amino acid homocysteine to methionine
43 | MTRR | MSR; cblE | 5p15.31 | Homocystinuria | Elevated levels of homocysteine result in neurodegenerative disorders, recurrent pregnancy loss, and neural tube defects | 5 | Encodes methionine synthase reductase; involved in the conversion of homocysteine to methionine.
44 | NAGLU | NAG; CMT2V; MPS3B; UFHSD; MPS-IIIB | 17q21.2 | Charcot-Marie-Tooth disease, mucopolysaccharidosis type III | A rare autosomal recessive lysosomal storage disease with signs and symptoms of behavioral changes, sleep disorders, mental developmental delays, and seizures | 1 | Encodes N-acetyl-alpha-glucosaminidase, involved in the step-wise breakdown of large molecules called glycosaminoglycans
45 | NPC1 | NPC; POGZ; SLC65A1 | 18q11.2 | Niemann-Pick disease | Patients with the classical phenotype present with progressive neurological disease in late infancy to adolescence, with clumsy gait, ataxia, school failure, behavior problems, and a characteristic supranuclear downward gaze paralysis | 1 | Encodes a protein present in the membranes of lysosomes and endosomes, involved in the movement of cholesterol and other types of fats (lipids) within cells and across cell membranes.
46 | NPC2 | HE1; EDDM1 | 14q24.3 | Niemann-Pick disease | Patients with the classical phenotype present with progressive neurological disease in late infancy to adolescence, with clumsy gait, ataxia, school failure, behavior problems, and a characteristic supranuclear downward gaze paralysis | 3 | Encodes a protein present in the membranes of lysosomes and endosomes, involved in the movement of cholesterol and other types of fats (lipids) within cells and across cell membranes.
47 | OTC | OCTD; OTCD | Xp11.4 | Ornithine carbamoyltransferase deficiency | A rare genetic disorder of urea cycle metabolism and ammonia detoxification. Neuropsychological complications of ornithine transcarbamylase (OTC) deficiency include developmental delay, learning disabilities, intellectual disability, attention-deficit/hyperactivity disorder (ADHD), and executive function deficits | 1 | Urea cycle metabolism and ammonia detoxification
48 | PCBD1 | PCD; PHS; DCOH; PCBD | 10q22.1 | Tetrahydrobiopterin deficiency | Abnormality of the nervous system; brain and/or spinal cord issue; neurologic abnormalities; neurological abnormality | 3 | Codes for the enzyme pterin-4 alpha-carbinolamine dehydratase, involved in recycling tetrahydrobiopterin, which helps convert phenylalanine into another amino acid, tyrosine
49 | PCCA | Propionyl-CoA carboxylase subunit alpha | 13q32.3 | Propionic acidemia | Developmental delays or regression, movement disorders, or cardiomyopathy. | 11 | Codes for propionyl-CoA carboxylase subunit alpha, which helps in propionate metabolism
50 | PCCB | PCCase subunit beta | 3q22.3 | Propionic acidemia | Growth impairment, intellectual disability, seizures, basal ganglia lesions | 2 | Encodes the enzyme propionyl-CoA carboxylase; helps break down isoleucine, methionine, threonine, and valine
51 | PDSS1 | DPS; SPS; TPT; COQ1; TPRT; COQ1A; TPT 1; hDPS1; COQ10D2 | 10p12.1 | Coenzyme Q10 deficiency | In the most severe cases, the condition becomes apparent in infancy and causes severe brain dysfunction combined with muscle weakness (encephalomyopathy) and the failure of other body systems | 3 | Involved in CoQ biosynthesis
52 | PDSS2 | DLP1; COQ1B; hDLP1; COQ10D3; C6orf210; bA59I9.3 | 6q21 | Coenzyme Q10 deficiency, primary 1 | Deficiency is usually associated with multisystem involvement, including neurologic manifestations such as fatal neonatal encephalopathy with hypotonia; a late-onset, slowly progressive multiple-system atrophy-like phenotype (neurodegeneration with autonomic failure and various combinations of parkinsonism and cerebellar ataxia, and pyramidal dysfunction); and dystonia, spasticity, seizures, and intellectual disability. | 1 | Metabolism of coenzyme Q
53 | PNP | NP; PUNP; PRO1837 | 14q11.2 | Purine-nucleoside phosphorylase deficiency | A rare immune disease characterized by progressive immunodeficiency leading to recurrent and opportunistic infections, autoimmunity, and malignancy, as well as neurologic manifestations | 1 | Encodes purine-nucleoside phosphorylase. PNP catalyzes the reversible cleavage of inosine to hypoxanthine and guanosine to guanine
54 | PNPO | PDXPO; HEL-S-302 | 17q21.32 | Pyridoxal phosphate-responsive seizures | PNPOD is an autosomal recessive inborn error of metabolism resulting in vitamin B6 deficiency that manifests as neonatal-onset severe seizures and subsequent encephalopathy. | 1 | Encodes pyridoxamine 5′-phosphate oxidase; involved in the metabolism of vitamin B6.
55 | PTS | PTPS | 11q23.1 | BH4-deficient hyperphenylalaninemia A | 6-Pyruvoyl-tetrahydropterin synthase (PTPS) deficiency is one of the causes of malignant hyperphenylalaninemia due to tetrahydrobiopterin deficiency. Tetrahydrobiopterin deficiency not only causes hyperphenylalaninemia but is also responsible for defective neurotransmission of monoamines because of malfunctioning tyrosine and tryptophan hydroxylases | 1 | Encodes 6-pyruvoyltetrahydropterin synthase; indirectly involved in processing several amino acids
56 | QDPR | DHPR; PKU2; HDHPR; SDR33C1 | 4p15.32 | Dihydropteridine reductase deficiency | An autosomal recessive condition characterized by BH4-deficient hyperphenylalaninemia, depletion of dopamine and serotonin, and progressive cognitive and motor deficits. | 2 | Encodes quinonoid dihydropteridine reductase (QDPR), which catalyses the regeneration of tetrahydrobiopterin (BH4), a cofactor for monoamine synthesis, phenylalanine hydroxylation, and nitric oxide production
57 SGSH HSS; SFMD; MPS3A 17q25.3Sanfilippo syndromeA rare autosomal recessive lysosomal storage disease with symptoms of behavioral changes, sleep disorders, mental developmental delays3Metabolism of mucopolysaccharide. Glycosaminoglycan metabolism in Lysosome
58 SLC2A1 CSE; PED; DYT9; GLUT; DYT17; DYT18; EIG12; GLUT1; HTLVR; GLUT-1; SDCHCN; GLUT1DS 1p34.2GLUT1 deficiency syndrome 1Classic phenotype is characterized by infantile-onset seizures, delayed neurologic development, acquired microcephaly, and complex movement disorders.1Encodes for a protein called protein called the glucose transporter protein type 1 (GLUT1). n the brain it is involved in the movement of glucose across the blood brain barrier. Protect and maintain neurons.
59 SLC6A19 HND; B0AT1 5p15.33Hartnup diseaseA rare metabolic disorder belonging to the neutral aminoaciduria, mainly characterized by skin photosensitivity, ocular and neuropsychiatric features, due to abnormal renal and gastrointestinal transport of neutral amino acids1Codes for protein called system B(0) neutral amino acid transporter 1 (B0AT1). B0AT1 transports the neutral amino acids
60 SPR SDR38C1 2p13.2Dopa-responsive dystonia due to sepiapterin reductase deficiencyDeficiency (SRD), which ranges from significant motor and cognitive deficits to only minimal findings. Clinical features include motor and speech delay, parkinsonian signs, intellectual disability, psychiatric and/or behavioral abnormalities1Codes for epiapterin reductase enzyme. Participates indirectly in processing amino acids
The list of genes involved in neurodegenerative disorders with the location on the human chromosome, the disease involved, and the number of transcripts.

Nucleobase Compositional Analysis

The nucleobase composition was calculated for all 183 CDSs. The counts and percentages of A, T, C, and G were determined, together with the composition at the first, second, and third codon positions (A1, T1, C1, G1, A2, T2, C2, G2, A3, T3, C3, G3). Total AT% and GC% were calculated, along with AT3%, GC3%, and the GC content at the first and second codon positions combined (GC12). The %GC at the different codon positions (%GC1, %GC2, and %GC3) helps decipher the relationship between codon usage and compositional, selectional, and mutational forces (Mazumder et al., 2014). These calculations were performed using the CAIcal server developed by Puigbò et al. (2008), available at http://genomes.urv.es/CAIcal/ (Supplementary Table 1).
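The position-specific composition described above can be sketched in a few lines of Python. This is a minimal illustration, not the CAIcal implementation; it assumes an in-frame CDS containing only A/T/C/G and takes GC12 as the mean of GC1 and GC2. The function name `position_composition` is hypothetical.

```python
from collections import Counter


def position_composition(cds: str) -> dict:
    """Percent A/T/C/G at each codon position, plus GC1-3, GC12, AT3 and overall GC/AT.

    Assumes an in-frame coding sequence; trailing bases that do not complete
    a codon are ignored, and percentages are taken over A/T/C/G only.
    """
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    result = {}
    for pos in range(3):
        counts = Counter(codon[pos] for codon in codons)
        total = sum(counts[b] for b in "ATCG")
        for base in "ATCG":
            result[f"{base}{pos + 1}"] = 100 * counts[base] / total
    for pos in (1, 2, 3):
        result[f"GC{pos}"] = result[f"G{pos}"] + result[f"C{pos}"]
    result["GC12"] = (result["GC1"] + result["GC2"]) / 2  # mean of positions 1 and 2
    result["AT3"] = 100 - result["GC3"]
    overall = Counter(seq)
    n = sum(overall[b] for b in "ATCG")
    result["GC"] = 100 * (overall["G"] + overall["C"]) / n
    result["AT"] = 100 - result["GC"]
    return result


comp = position_composition("ATGGCCAAATTTGGC")  # toy CDS, not from this study
print(comp["GC12"], comp["GC3"])  # 40.0 60.0
```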

Dinucleotide Abundance

The 16 dinucleotides that can be formed from the four nucleotides were subjected to odds ratio analysis. The odds ratio of each dinucleotide, obtained by dividing its observed frequency by its expected frequency, is presented in Table 2. The calculations were performed using DNASTAR Lasergene Inc.[1] software. Dinucleotides with an odds ratio below 0.78 were considered underrepresented, and those above 1.25 overrepresented (Kunec and Osterrieder, 2016).
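A minimal sketch of the odds ratio calculation, assuming the common definition ρXY = f(XY)/(f(X)·f(Y)) computed over overlapping dinucleotides on a single strand. DNASTAR Lasergene may normalize frequencies differently, so values from this sketch need not match Table 2 exactly; the function names are hypothetical, and the 0.78 and 1.25 cut-offs are those given above.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds_ratios(seq: str) -> dict:
    """rho(XY) = f(XY) / (f(X) * f(Y)) for the 16 dinucleotides of one sequence.

    f(X) is the mononucleotide frequency; f(XY) is the frequency among the
    len(seq) - 1 overlapping dinucleotides.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        ratios[x + y] = (di[x + y] / n_di) / expected if expected else float("nan")
    return ratios


def representation(rho: float, low: float = 0.78, high: float = 1.25) -> str:
    """Classify an odds ratio with the 0.78 / 1.25 cut-offs."""
    if rho != rho:  # NaN: one of the two nucleotides is absent
        return "undefined"
    if rho < low:
        return "under"
    if rho > high:
        return "over"
    return "unbiased"


rho = dinucleotide_odds_ratios("ATATATAT")  # toy sequence
print(round(rho["AT"], 3), representation(rho["AA"]))  # 2.286 under
```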
TABLE 2

Dinucleotide analysis showing odds ratio of genes.

Dinucleotide odds ratio
Gene name AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
ABCD1 0.429 0.744 1.087 0.486 1.123 1.709 0.994 1.295 0.923 1.731 1.810 0.787 0.272 0.937 1.359 0.315
ADSL 0.924 0.913 1.260 1.022 1.345 0.892 0.521 1.045 1.289 1.178 0.995 0.718 0.551 0.821 1.413 1.113
ALDH7A1 1.340 0.729 1.329 0.945 1.107 0.740 0.319 1.080 1.334 0.967 1.361 0.924 0.562 0.810 1.577 0.875
APTX 1.458 0.796 1.420 0.969 1.265 0.973 0.224 1.028 1.365 0.942 1.237 0.734 0.553 0.777 1.400 0.859
ARG1 1.580 1.125 1.377 0.862 1.221 0.820 0.204 1.173 1.455 0.670 1.287 0.670 0.688 0.802 1.215 0.850
ARSA 0.259 0.883 0.857 0.421 1.050 2.358 0.721 1.608 0.843 1.619 1.648 0.614 0.270 0.876 1.497 0.477
ASL 0.559 0.921 1.280 0.591 1.386 1.345 0.721 1.150 1.177 1.495 1.751 0.703 0.218 0.841 1.386 0.477
ASS1 1.099 0.866 1.241 0.788 1.422 1.512 0.659 0.931 1.137 1.099 1.396 0.814 0.323 1.047 1.163 0.504
BCKDHA 0.563 0.958 1.168 0.755 1.432 1.557 0.887 1.066 1.006 1.492 1.617 0.635 0.443 0.934 1.078 0.407
BCKDHB 1.117 0.659 1.088 1.165 1.083 0.900 0.409 1.054 1.083 0.934 1.237 0.809 0.746 0.953 1.329 1.434
CBS 0.667 0.888 1.270 0.619 1.246 1.349 0.983 0.915 1.397 1.427 1.843 0.756 0.135 0.828 1.327 0.350
COQ2 0.894 0.847 0.881 0.847 0.941 1.116 0.672 1.277 0.908 1.277 1.405 0.800 0.726 0.766 1.432 1.210
COQ8A 0.659 0.791 1.293 0.634 1.326 1.655 0.725 1.169 1.194 1.474 1.556 0.543 0.189 0.955 1.202 0.634
COQ9 0.653 0.904 1.490 0.669 1.556 1.255 0.569 1.054 1.205 1.523 1.406 0.720 0.301 0.753 1.389 0.552
CPS1 1.363 0.845 1.187 1.110 1.291 0.913 0.195 1.036 1.294 0.785 1.051 0.820 0.553 0.892 1.521 1.143
CYP27A1 0.652 0.853 1.194 0.562 1.134 1.585 0.682 1.334 1.113 1.485 1.434 0.702 0.361 0.813 1.424 0.672
DBT 1.735 0.807 1.138 1.293 1.116 0.751 0.199 1.116 1.160 0.707 0.707 0.796 0.961 0.917 1.326 1.271
DDC 0.980 0.807 1.173 0.879 1.244 1.119 0.540 1.140 1.223 1.282 1.325 0.701 0.381 0.834 1.505 0.865
DLD 1.598 0.691 1.313 1.279 1.122 0.437 0.160 0.948 1.301 0.813 0.936 0.902 0.859 0.725 1.544 1.373
GAMT 0.485 1.001 0.916 0.600 1.243 1.780 1.074 1.085 1.053 1.580 1.822 0.611 0.221 0.822 1.253 0.453
GATM 1.188 0.958 1.017 1.240 1.284 1.054 0.453 1.039 1.240 0.921 0.869 0.661 0.690 0.898 1.351 1.136
GCDH 0.553 0.824 1.199 0.664 1.168 1.377 0.879 1.168 1.254 1.433 1.722 0.726 0.258 0.959 1.341 0.473
GCDH1 1.034 0.635 1.383 0.793 1.051 1.332 0.967 0.849 1.237 1.529 1.501 0.691 0.517 0.703 1.113 0.663
GNS 1.110 1.042 1.119 0.926 1.255 1.090 0.386 1.197 1.139 0.878 1.052 0.820 0.685 0.917 1.341 1.042
HCLS 0.982 0.926 1.333 0.759 1.221 1.177 0.495 1.153 1.314 0.883 1.217 0.871 0.482 1.060 1.240 0.886
HGSNAT 0.872 0.733 0.877 0.961 0.985 1.052 0.464 1.442 1.057 1.045 1.173 0.824 0.529 1.112 1.586 1.288
HPRT1 1.463 0.756 1.049 1.512 0.927 0.610 0.195 0.951 1.463 0.634 0.976 0.829 0.927 0.683 1.683 1.341
IDS 0.807 0.905 0.882 1.004 1.149 1.633 0.527 1.252 1.106 0.897 1.008 0.736 0.535 1.126 1.330 1.102
IDUA 0.314 1.027 0.827 0.276 1.103 2.185 1.549 1.196 0.823 2.007 1.502 0.679 0.204 0.815 1.133 0.361
IVD 0.717 0.701 1.210 0.788 1.158 1.150 0.568 1.321 1.144 1.484 1.556 0.724 0.390 0.862 1.581 0.647
LMBRD1 1.531 0.755 0.911 1.454 0.853 0.499 0.203 1.227 1.040 0.628 0.672 0.796 1.227 0.900 1.350 1.954
MAN2B1 0.622 1.060 0.973 0.559 1.305 1.561 0.949 1.176 0.986 1.553 1.508 0.696 0.295 0.818 1.319 0.617
MLYCD 0.778 0.756 1.124 0.465 1.016 1.275 1.210 1.242 1.080 1.815 1.675 0.659 0.248 0.897 1.221 0.540
MMAA 1.605 0.917 1.299 1.045 1.159 0.739 0.242 1.070 1.261 0.713 1.032 0.803 0.841 0.841 1.236 1.197
MMBB 0.936 0.851 1.319 0.681 1.191 1.362 0.894 0.979 1.255 1.277 1.511 0.681 0.404 0.936 1.000 0.723
MMACHC 0.525 1.039 1.249 0.598 1.217 2.203 0.493 1.374 1.144 1.311 1.112 0.567 0.525 0.734 1.280 0.630
MMADHC 1.564 0.845 1.151 1.222 1.079 0.593 0.144 1.097 1.330 0.647 0.755 0.899 0.791 0.827 1.600 1.456
MMUT 1.655 0.746 1.187 1.300 1.101 0.675 0.220 1.052 1.286 0.803 0.980 0.718 0.845 0.824 1.400 1.208
MOCS1 0.687 0.697 1.391 0.685 1.347 1.555 0.517 1.214 1.154 1.431 1.533 0.672 0.267 0.951 1.354 0.546
MOCS2 1.654 0.462 1.519 1.250 1.000 0.788 0.327 0.885 1.365 0.885 0.692 0.942 0.846 0.865 1.365 1.154
MTHFR 0.784 0.941 1.315 0.603 1.217 1.591 0.610 1.221 1.366 1.177 1.402 0.677 0.276 0.929 1.296 0.595
MTR 1.342 0.818 1.291 1.099 1.155 0.950 0.311 1.021 1.375 0.931 1.139 0.706 0.677 0.738 1.410 1.036
MTRR 1.359 0.894 1.296 0.979 1.298 1.006 0.317 1.106 1.160 0.894 0.889 0.738 0.709 0.933 1.181 1.243
NAGLU 0.380 0.760 1.047 0.430 1.104 1.693 0.911 1.413 0.818 1.908 1.843 0.653 0.316 0.760 1.420 0.545
NPC1 0.859 0.938 0.918 0.930 1.197 1.055 0.492 1.268 0.976 0.964 0.984 0.993 0.609 1.055 1.527 1.235
NPC2 1.499 0.708 1.227 0.885 1.239 1.428 0.378 1.180 0.909 1.097 0.920 0.920 0.673 0.991 1.322 0.625
OTC 1.549 0.857 1.293 1.173 1.278 0.722 0.271 1.083 1.248 0.887 0.917 0.662 0.797 0.887 1.233 1.143
PCBD1 1.147 1.107 1.288 0.704 1.650 1.026 0.322 1.087 1.087 1.208 1.288 0.785 0.322 0.745 1.509 0.725
PCCA 1.408 0.730 1.285 1.209 1.194 0.602 0.368 0.908 1.294 0.896 1.040 0.925 0.737 0.843 1.462 1.097
PCCB 0.888 0.805 1.062 0.995 1.368 1.033 0.543 0.975 1.116 1.121 1.315 0.893 0.378 0.961 1.523 1.024
PDSS1 1.377 0.865 1.207 1.135 1.146 0.843 0.634 0.959 1.278 0.986 0.799 0.837 0.782 0.887 1.262 1.003
PDSS2 1.188 0.707 1.241 1.041 1.148 0.787 0.374 1.268 1.161 1.028 1.068 0.814 0.681 1.054 1.388 1.054
PNP 0.957 0.976 1.252 0.902 1.344 1.031 0.276 1.086 1.252 0.865 1.270 0.976 0.534 0.865 1.565 0.847
PNPO 0.897 0.856 1.223 0.530 0.897 1.406 0.591 1.386 1.366 1.141 1.488 0.815 0.346 0.876 1.508 0.673
PTS 1.721 0.732 1.062 1.465 0.915 0.696 0.439 0.805 1.281 0.805 0.989 0.952 1.025 0.622 1.574 0.915
QDPR 0.943 0.721 1.117 0.756 1.047 1.129 0.756 1.210 1.175 1.513 1.839 0.593 0.349 0.780 1.431 0.640
SGSH 0.467 1.151 0.956 0.342 1.173 2.004 1.178 1.329 1.000 1.587 1.267 0.707 0.276 0.942 1.160 0.462
SLC2A1 0.476 0.660 0.909 0.747 1.256 1.516 0.736 1.396 0.834 1.451 1.321 0.920 0.227 1.277 1.559 0.714
SLC6A19 0.395 0.950 0.832 0.697 1.311 1.613 0.840 1.479 0.874 1.286 1.361 0.857 0.294 1.395 1.345 0.471
SPR 0.795 0.795 0.938 0.285 0.815 1.427 1.039 1.610 0.978 1.875 1.631 0.754 0.224 0.795 1.631 0.408

Relative Synonymous Codon Usage Analysis

RSCU, an index of codon bias, is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally (Deb et al., 2021). RSCU values were obtained using the CAIcal server developed by Puigbò et al. (2008), available at http://genomes.urv.es/CAIcal/. RSCU values are not affected by gene length or amino acid composition. Codons with values above 1.6 are considered overrepresented, and those below 0.6 underrepresented (Yu et al., 2021).
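RSCU can be computed directly from codon counts as the observed count divided by the mean count of all synonymous codons for that amino acid. The sketch below assumes the standard genetic code and an in-frame CDS; it is illustrative and not the CAIcal code, and `rscu` and `flag` are hypothetical names.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def rscu(cds: str) -> dict:
    """RSCU = codon count / (mean count of the synonymous codons of its amino acid).

    Codons of amino acids absent from the sequence are left out because
    their RSCU is undefined.
    """
    seq = cds.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def flag(value: float) -> str:
    """Over-/under-representation using the 1.6 / 0.6 thresholds."""
    return "over" if value > 1.6 else "under" if value < 0.6 else "neutral"


values = rscu("CTGCTGCTGCTA")  # three CTG and one CTA, all leucine
print(values["CTG"], values["CTA"], values["TTA"])  # 4.5 1.5 0.0
```

Because leucine has six codons, a gene using only CTG would give CTG an RSCU of 6, which is why the maxima reported for six-fold amino acids can exceed 4.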

The Parity Rule 2 Plot Analysis

Parity analysis reveals the bias between A/T and G/C at the third codon position. The AT bias (A3%/[A3% + T3%]) is plotted on the ordinate and the GC bias (G3%/[G3% + C3%]) on the abscissa. According to Parity Rule 2, A = T and G = C within a single DNA strand when there is no bias in mutation or selection (Sueoka, 1995). The center of the plot, where both values equal 0.5, therefore represents the state in which selection and mutational forces are in balance (Sueoka, 1999).
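The two PR2 coordinates can be sketched as follows, assuming, as the text describes, that all third codon positions are counted; some studies restrict the analysis to fourfold degenerate sites instead. The function name is hypothetical.

```python
def pr2_coordinates(cds: str):
    """Return (GC bias, AT bias) at the third codon position for a PR2 plot.

    GC bias = G3 / (G3 + C3) goes on the abscissa and AT bias = A3 / (A3 + T3)
    on the ordinate; a coordinate is None when its denominator is zero.
    """
    seq = cds.upper()
    third = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    a3, t3 = third.count("A"), third.count("T")
    g3, c3 = third.count("G"), third.count("C")
    gc_bias = g3 / (g3 + c3) if g3 + c3 else None
    at_bias = a3 / (a3 + t3) if a3 + t3 else None
    return gc_bias, at_bias


print(pr2_coordinates("AAAGGCATTCCG"))  # (0.5, 0.5), the center of the plot
```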

Neutral Evolution Analysis

Neutrality plots are useful for quantifying the relative influence of mutational pressure and other forces, such as selection (Khandia et al., 2019). The neutrality plot is obtained by plotting %GC12 against %GC3. The slope of the regression line (the regression coefficient) is taken as a measure of the equilibrium between mutation and selection pressure (Sueoka, 1988); a slope approximating 1 indicates the dominance of mutational forces (Yu et al., 2021).
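The neutrality-plot slope is an ordinary least-squares regression of GC12 on GC3 across genes. A minimal sketch, with hypothetical toy values rather than data from this study:

```python
def neutrality_regression(gc3: list, gc12: list):
    """Ordinary least-squares fit GC12 = slope * GC3 + intercept across genes."""
    n = len(gc3)
    mean_x, mean_y = sum(gc3) / n, sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = neutrality_regression([30.0, 50.0, 70.0], [45.0, 50.0, 55.0])
print(slope, intercept)  # 0.25 37.5
```

A slope close to 1 would point to mutational pressure; a slope as low as the 0.25 in this toy input would usually be read as GC12 changing little with GC3, which points toward selection.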

Codon Adaptation Index

The CAI value reflects how well the codon usage of a gene is adapted to that of the organism and, by extension, its expected expression level (Munjal et al., 2020). CAI values range between 0 and 1: values approaching 1 indicate higher expressivity, while values near 0 indicate lower expressivity (Sharp and Li, 1987). The calculations were performed using the CAIcal server developed by Puigbò et al. (2008), available at http://genomes.urv.es/CAIcal/. The reference codon usage table for Homo sapiens was obtained from the codon usage database[2] and comprises 93,487 coding sequences (40,662,582 codons).
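CAI is the geometric mean of the relative adaptiveness w of each codon, where w is the usage of a codon in a reference set divided by that of the most used synonymous codon (Sharp and Li, 1987). The sketch below assumes a reference table given as raw codon counts and excludes stop codons, ATG, and TGG. The toy reference counts and the `floor` guard against log(0) are hypothetical and not part of CAIcal.

```python
import math
from collections import defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def relative_adaptiveness(ref_counts: dict) -> dict:
    """w = codon count / count of the most used synonymous codon in the reference set."""
    families = defaultdict(list)
    for codon, aa in CODE.items():
        families[aa].append(codon)
    w = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons, ATG (Met) and TGG (Trp) offer no synonymous choice
        top = max(ref_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # amino acid missing from the reference table
        for c in codons:
            w[c] = ref_counts.get(c, 0) / top
    return w


def cai(cds: str, w: dict, floor: float = 0.01) -> float:
    """Geometric mean of w over the scorable codons of a CDS.

    `floor` is a hypothetical guard that replaces w = 0 to avoid log(0).
    """
    seq = cds.upper()
    logs = [math.log(max(w[codon], floor))
            for codon in (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
            if codon in w]
    return math.exp(sum(logs) / len(logs))


# Toy reference counts for leucine and alanine only (not the human table).
reference = {"CTG": 40, "CTC": 20, "CTT": 10, "CTA": 5, "TTA": 5, "TTG": 10,
             "GCC": 30, "GCT": 20, "GCA": 15, "GCG": 5}
w = relative_adaptiveness(reference)
print(cai("ATGCTGGCC", w), round(cai("CTCGCA", w), 6))  # 1.0 0.5
```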

Intrinsic Codon Bias Index

The intrinsic codon bias index (ICDI) is analogous to the Nc value and is independent of optimal codons. ICDI values range between 0 and 1: a value of 1 indicates extremely high bias, whereas 0 indicates equal usage of all synonymous codons. Values below 0.3 indicate comparatively low bias (Freire-Picos et al., 1994). ICDI values were calculated using the formula provided by Freire-Picos et al. (1994).
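One way to sketch ICDI is to score each amino acid with k synonymous codons as S = Σ(RSCU − 1)² / (k(k − 1)), which is 0 for equal usage and 1 when a single codon is used, and then average S over amino acids. This per-amino-acid form and the plain average over the amino acids present in the gene are assumptions made for illustration; Freire-Picos et al. (1994) should be consulted for the exact weighting.

```python
from collections import defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def icdi(rscu: dict) -> float:
    """Mean over amino acids of S = sum((RSCU - 1)^2) / (k * (k - 1)).

    `rscu` maps codon -> RSCU. Stop codons, single-codon amino acids (Met, Trp)
    and amino acids with no codon in the mapping are skipped.
    """
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(codon)
    scores = []
    for codons in families.values():
        k = len(codons)
        if k < 2 or not any(c in rscu for c in codons):
            continue
        values = [rscu.get(c, 0.0) for c in codons]
        scores.append(sum((r - 1) ** 2 for r in values) / (k * (k - 1)))
    return sum(scores) / len(scores)


# Leucine fully biased toward CTG (S = 1), alanine used evenly (S = 0).
icdi_value = icdi({"CTG": 6.0, "CTC": 0.0, "CTT": 0.0, "CTA": 0.0, "TTA": 0.0, "TTG": 0.0,
                   "GCC": 1.0, "GCT": 1.0, "GCA": 1.0, "GCG": 1.0})
print(icdi_value)  # 0.5
```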

Nc Determination and Plotting Nc-GC3% Curve

The Nc-GC3 curve was plotted to evaluate the role of compositional constraints. Nc values, which reflect the degree of codon usage bias, were determined using the CodonW 1.4.4 program. Nc ranges from 20 to 61 (Munjal et al., 2020). A value of 61 arises when all synonymous codons are used equally, so no bias is detected, whereas a value of 20 arises when only one codon is used for each amino acid, indicating the highest bias. In general, an Nc value below 35 indicates strong codon preference, and a value above 50 indicates random codon usage (Wang et al., 2018).
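Wright's Nc combines the homozygosity F = (nΣp² − 1)/(n − 1) of each amino acid, averaged within the 2-, 3-, 4- and 6-fold degeneracy classes, as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6; the expected curve for the Nc-GC3 plot is Nc = 2 + s + 29/(s² + (1 − s)²), with s the GC3s fraction. The sketch below is a simplified reading of Wright (1990), not the CodonW 1.4.4 code, which handles edge cases differently; the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZE = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds: str) -> float:
    """Wright's Nc: 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    seq = cds.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(codon)
    f_by_class = defaultdict(list)  # degeneracy k -> F values of amino acids with n >= 2
    for codons in families.values():
        k, n = len(codons), sum(counts[c] for c in codons)
        if k > 1 and n > 1:
            sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
            f_by_class[k].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Wright's substitute when Ile is absent
    if any(mean_f.get(k, 0.0) <= 0.0 for k in CLASS_SIZE):
        raise ValueError("a degeneracy class is missing or has F <= 0")
    return min(61.0, 2 + sum(CLASS_SIZE[k] / mean_f[k] for k in CLASS_SIZE))


def expected_nc(gc3s: float) -> float:
    """Expected Nc when codon usage is shaped only by GC3s (given as a fraction)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


# One codon per degenerate amino acid, each used twice: maximal bias.
one_codon_each = ["TTT", "TAT", "TGT", "CAT", "CAA", "AAT", "AAA", "GAT", "GAA",
                  "ATT", "CCT", "ACT", "GTT", "GCT", "GGT", "CTT", "TCT", "CGT"]
print(effective_number_of_codons("".join(c * 2 for c in one_codon_each)))  # 20.0
print(expected_nc(0.5))  # 60.5
```

Genes whose observed Nc lies well below `expected_nc(GC3s)` are usually taken to be shaped by factors beyond compositional constraint.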

Principal Component Analysis

Principal component analysis (PCA) is a multivariate statistical approach; in codon usage analysis, its two major axes (axis 1 and axis 2) represent the two components that account for the most variation in the data. For the PCA, the RSCU values of each sequence were arranged into a 59-dimensional vector corresponding to the 59 synonymous codons; the stop codons, the initiation codon (AUG), and the tryptophan codon (UGG) were excluded.
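The 59-codon vector and the first two principal axes can be sketched without external libraries, using power iteration with deflation on the covariance matrix. This is an illustrative stand-in for the software actually used; the function names, the iteration count, and the toy input are assumptions.

```python
import math
import random

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# 59 codons kept for PCA: drop the three stops, ATG (Met) and TGG (Trp).
SYNONYMOUS = sorted(c for c, aa in CODE.items() if aa not in "*MW")


def rscu_vector(rscu: dict) -> list:
    """Order a codon -> RSCU mapping as a 59-dimensional vector (missing codons -> 0.0)."""
    return [rscu.get(c, 0.0) for c in SYNONYMOUS]


def mat_vec(m: list, v: list) -> list:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def principal_components(rows: list, n_components: int = 2, iters: int = 300, seed: int = 0):
    """Leading principal axes and per-row scores via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]  # center each column
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = 1.0
        for _ in range(iters):
            w = mat_vec(cov, v)
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break  # no variance left along any direction
            v = [t / norm for t in w]
        if norm == 0.0:
            break
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        axes.append(v)
        # Deflate: remove the found component so the next pass finds the following one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)] for a in range(d)]
    scores = [[sum(r[j] * axis[j] for j in range(d)) for axis in axes] for r in x]
    return axes, scores


print(len(SYNONYMOUS))  # 59
toy = [[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
axes, scores = principal_components(toy)
print([round(abs(a), 3) for a in axes[0]], [round(abs(a), 3) for a in axes[1]])  # [1.0, 0.0, 0.0] [0.0, 1.0, 0.0]
```

For real data, each row would be `rscu_vector(...)` for one CDS, and `scores` would give the coordinates on axis 1 and axis 2.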

Protein Indices

Protein indices were calculated using formulas available in the literature or software programs. The grand average of hydropathicity (GRAVY), which summarizes the hydrophobic or hydrophilic character of a protein, was calculated using the scale of Kyte and Doolittle (1982). Its value ranges between –2 and +2 for most proteins, with positive values indicating more hydrophobic proteins and negative values more hydrophilic ones. The AROMA value represents the frequency of aromatic amino acids (tryptophan, tyrosine, and phenylalanine) in the protein. Variation in these indices is taken to indicate selection pressure. The isoelectric point (pI) of a protein is the pH at which it carries no net electrical charge. Hydrophobicity values were also determined because hydrophobicity plays a role in protein-protein interactions (Young et al., 1994); this notion is supported by the fact that some of the most extensive hydrophobic surfaces are found in membrane proteins (Hagemans et al., 2015). The aliphatic index, a measure of the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine), was calculated using the formula given by Ikai (1980); it is positively correlated with the thermal stability of proteins. The percentages of acidic, basic, and neutral amino acids were also calculated. All indices were calculated using the ExPASy ProtParam tool (Gasteiger et al., 2005) or the Peptide 2.0 tool available at https://www.peptide2.com/.
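GRAVY, aromaticity, and the aliphatic index reduce to simple averages over the sequence. The sketch below assumes the Kyte-Doolittle scale for GRAVY and Ikai's coefficients (2.9 for Val, 3.9 for Ile and Leu) for the aliphatic index; it is not the ProtParam or Peptide 2.0 code, does not cover the isoelectric point, and uses a hypothetical peptide.

```python
# Kyte-Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
                  "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
                  "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
                  "Y": -1.3, "V": 4.2}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aromaticity(protein: str) -> float:
    """Fraction of aromatic residues (Phe, Trp, Tyr)."""
    return sum(protein.count(aa) for aa in "FWY") / len(protein)


def aliphatic_index(protein: str) -> float:
    """Ikai (1980): mole% Ala + 2.9 * mole% Val + 3.9 * (mole% Ile + mole% Leu)."""
    mole = {aa: 100 * protein.count(aa) / len(protein) for aa in "AVIL"}
    return mole["A"] + 2.9 * mole["V"] + 3.9 * (mole["I"] + mole["L"])


peptide = "MKVLAAFWY"  # hypothetical peptide
print(round(gravy(peptide), 3), round(aromaticity(peptide), 3), round(aliphatic_index(peptide), 1))
# 1.133 0.333 97.8
```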

Statistical Analysis

Correlation, regression, and correspondence analyses were performed and plotted using PAST4 software. Basic calculations, including sums and averages, were done in Microsoft Excel.
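The correlation coefficients reported in the Results can be reproduced in outline with Pearson's r and its t statistic (n − 2 degrees of freedom). This is a minimal standard-library sketch, not PAST4; converting t to a p-value requires the Student t distribution, which is omitted here.

```python
import math


def pearson(x: list, y: list):
    """Pearson r and its t statistic, t = r * sqrt((n - 2) / (1 - r^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # A perfect correlation gives an infinite t with the sign of r.
    t = math.copysign(math.inf, r) if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


r, t = pearson([1, 2, 3, 4], [1, 3, 2, 4])  # toy values
print(round(r, 3), round(t, 3))  # 0.8 1.886
```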

Results

Compositional Analysis

The genes involved in neurodegeneration exhibited a widely variable compositional pattern. The stem graph in Figure 1 shows the overall percentage of each of the four nucleotides in the different genes. %A ranged from 13.74% (IDUA) to 31.12% (DBT), %C from 16.62% (DLD) to 38.98% (IDUA), %T from 14.80% (IDUA) to 33.92%, and %G from 19.57% (LMBRD1) to 33.91% (CBS). C showed the greatest range of variation (22.35%), while G showed the least (14.34%). Overall %GC varied between 36.93% (LMBRD1) and 71.45% (CBS), while GC3 varied between 26.82% (LMBRD1) and 89.14% (CBS).
FIGURE 1

Stem graph for nucleotide composition of genes envisaged in present study.

GC Content Correlation With Protein Length

The GC components (%GC12 and %GC3) were analyzed to determine their relationship with protein length. On average, %GC12 remained relatively constant, accounting for 40%-60% of the composition, whereas %GC3 fluctuated widely toward both low and high values across protein lengths (Figure 2).
FIGURE 2

Relation of GC content (GC12 and GC3) with protein length. Red dots show GC12 composition and blue dots show GC3 composition.

Relationship of Compositional Properties and Codon Bias

Nc was significantly correlated with several compositional parameters. C1, A2, and G3 were negatively correlated with Nc (r = –0.197, p < 0.05; r = –0.243, p < 0.01; and r = –0.198, p < 0.01, respectively), while T, T1, T2, and C2 were positively correlated with Nc (r = 0.165, p < 0.05; r = 0.156, p < 0.05; r = 0.237, p < 0.01; and r = 0.286, p < 0.001, respectively). Because a lower Nc denotes stronger bias, higher C1, A2, and G3 content is associated with stronger codon usage bias.

Relationship Between Codon Usage Bias and Nucleotides at the Third Codon Position

Regression coefficients of 0.547, 0.555, –0.515, and –0.482 were obtained for Nc-A3, Nc-T3, Nc-G3, and Nc-C3, respectively (Figure 3). Because a lower Nc denotes stronger bias, the negative regression coefficients for Nc-G3 and Nc-C3 suggest that G3 and C3 positively influence CUB.
FIGURE 3

Regression analysis between Nc and nucleotide content at the third codon position.

Relationship Between Overall Composition and Composition at the Third Position of the Codon

Regression analysis between the overall content of each nucleotide (%A, %T, %C, %G) and its content at the third codon position was used to assess the effect of compositional properties on mutational pressure (Figure 4). C and A contributed almost equally (49.3% and 48.7%, respectively), whereas G contributed the least (37.6%) toward mutational pressure.
FIGURE 4

Regression analysis between overall nucleotide content and nucleotide content at the third codon position.

Effects of Dinucleotide Content

The frequency of dinucleotide occurrence is of great interest because some dinucleotide combinations deviate significantly from their expected values. This deviation is measured as an odds ratio (observed/expected frequency). CpG and TpA are commonly underrepresented dinucleotides (Venter et al., 2001). Notably, 23.33% of genes displayed unbiased CpG usage, with an odds ratio above 0.78, and in the IDUA gene CpG was overrepresented (odds ratio 1.549; Table 2). Nucleotide composition is also reported to play an important role in determining TpA and CpG content (Munjal et al., 2020); however, we observed exceptions: despite lower GC content, CpG was present at unbiased levels in the GCDH and MAN2B1 genes, and despite low AT content, TpA was present at unbiased levels in the LMBRD1, MMAA, and MOCS2 genes.

The RSCU value indicates the relative frequency of a codon; codons with RSCU values above 1.6 and below 0.6 are overrepresented and underrepresented, respectively (Paul et al., 2018). Across the genes, CG-ending codons were underrepresented: TCG, CCG, ACG, and GCG had RSCU values below 0.6 in 81.66%, 70%, 68%, and 72% of sequences, respectively. The TA-ending codons TTA, CTA, ATA, and GTA were also underrepresented, with RSCU values below 0.6 in 71.67%, 76.67%, 71.67%, and 68.33% of sequences, respectively. Among all codons, CTG and GTG had the highest average RSCU values (2.55 and 2.01, respectively), with maximum values of 4.73 and 3.27. Table 3 lists the RSCU values of the genes analyzed in this study, highlighting the codon with the highest RSCU for each amino acid.
TABLE 3

RSCU values of genes highlighting the codons with highest RSCU values.

Name of genes
Amino acid One-letter code Codon ABCD1 ADSL ALDH7A1 APTX ARG1 ARSA ASL ASS1 BCKDHA BCKDHB CBS COQ2 COQ8A COQ9 CPS1 CYP27A1 DBT DDC DLD GAMT
PhenylalanineFTTT0.2610.818751.28750.9913041.0106670.5386670.6670.5330.5561.4853330.27081.3850.41.1111.2580.6671.2381.2541.545250.8445
FTTC1.7391.181250.71251.0086960.9893331.4613331.3331.4671.4440.5146671.72920.6151.60.8890.7421.3330.7620.7460.454751.2
LeucineLTTA00.6390.4560.2297390.3120.0963330.079750.1760000.4325000.548750.4480.8570.5345711.546750
LTTG0.2551.070750.91151.0244350.9466670.3863330.21650.3530.5451.7790.31041.02650.7740.5141.2910.53710.5185711.631250.316
LCTT0.2551.5660.98651.4213040.9086670.290.405250.5290.3641.7790.24380.97350.290.3431.0060.1790.8570.6204291.459750
LCTC1.660.550751.44250.9225220.7110.972.326752.1181.0910.7531.07040.5941.3551.5430.685751.4331.1430.5224290.1841.263
LCTA0.1910.232250.1520.4495221.3043330.1926670.216500.5450.8440.12240.91800.6860.639750.2690.7140.3974290.496750
LCTG3.6381.94152.0521.9523041.8173334.0643332.7552.8243.4550.8444.25322.05553.5812.9141.82853.1341.4293.4067140.68154.421
IsoleucineIATT0.3751.44251.35451.6288261.59666701.014750.8890.130.9310.76161.50.8570.751.749251.2631.50.5288571.72750.231
IATC2.5311.22251.2660.9160430.8626672.5833331.81622.7391.0342.23840.6411.9291.8750.949251.5790.51.9870.584252.769
IATA0.0940.3350.3790.4553480.5406670.4166670.169250.1110.131.03400.8590.2140.3750.30150.15810.4845710.688250
ValineVGTT0.140.552750.93250.3439130.3663330.3866670.2840.13301.690.21321.9410.2420.2671.1480.3641.8950.5534292.036250
VGTC0.7720.75050.64250.5218260.5753331.2133330.813251.4671.5240.8186670.40340.48550.9700.9510.7270.4210.9075710.386251.667
VGTA0.140.60050.7421.0528260.5220000.3810.6733330.10620.360500.80.3280.1210.7370.1180.85050
VGTG2.9472.0961.6832.0808262.5366672.42.90352.42.0950.8186673.2771.2132.7882.9331.5742.7880.9472.4208570.7272.333
SerineSTCT0.6381.7741.1561.7119571.2856671.2666670.296250.51.51.80.4861.12250.61.921.3660.751.51.0722.205750
STCC1.5320.533751.05251.2781740.8856672.4333330.851252.51.0710.91.60881.7622.551.21.3661.750.9381.4948570.644252.4
STCA0.6381.0680.84250.6350871.0286670.2166670.510750.250.8570.91.12280.9860.750.961.4260.51.8751.2392862.101751.2
STCG1.660.1637500000.68150.250.4290.30.5360.252000.1190.7500.1810.23950.6
SAGT0.1281.291251.4741.0452171.0856670.4333331.0220.251.0711.20.39121.0070.450.720.8321.250.9381.2184290.404250
SAGC1.4041.1691.4741.3293911.7143331.652.6382.251.0710.91.85440.87051.651.20.89110.750.7937140.404251.8
ProlinePCCT0.4571.019752.09551.7973481.3440.989333010.52.0133330.4470.68550.9140.8571.1231.1431.8571.3721.1630.211
PCCC2.41.58150.3810.8572170.6306671.7446673.23651.42.8330.561.82361.4822.1711.7141.01051.60.5711.1844290.529752.316
PCCA0.3430.980251.14250.8074782.0253330.6806670.38210.51.320.6471.13150.80.8571.642250.81.4291.0988571.85450.842
PCCG0.80.41850.3810.53760900.5853330.3820.60.1670.1066671.08240.7010.1140.5710.224250.4570.1430.3451430.4530.632
ThreonineTACT0.5520.922251.7261.1753481.2213330.8296670.990750.6670.2862.3403330.09441.340.5710.6321.1870.81.6550.8565711.336250
TACC1.2410.8020.78450.9892610.5223332.2491.8532.8891.7140.2371.98621.141.7141.0531.0551.60.5521.0328570.467252.133
TACA0.8282.115251.17551.8353912.2563330.6266670.330250.2220.8571.4220.33761.260.7621.8951.5380.61.6551.2094291.865250.533
TACG1.3790.16050.314000.2950.825750.2221.14301.58180.260.9520.4210.2210.1380.9005710.3311.333
AlanineAGCT0.51.049751.04552.2092.1110.8850.554750.5710.5390.8676670.80960.99450.6960.4711.7390.8331.750.6872.4240
AGCC2.61.40351.13650.8992170.6112.4183332.266252.5712.15751.3191.80361.1122.4931.2941.1132.1670.8751.9438570.46351.778
AGCA0.251.275251.50.8917831.2776670.5210.808750.5710.58350.9663330.47280.76150.4641.2941.0430.51.250.9274291.023250.444
AGCG0.650.2720.3185000.1756670.370250.2860.7190.8466670.9141.1320.3480.9410.1040.50.1250.4420.08951.778
TyrosineYTAT0.371.446251.6750.5391740.6113330.4360.4110.7370.6961.3736670.250.7140.5710.6670.8420.63220.2938570.90450.25
YTAC1.630.553750.3251.4608261.3886671.5641.5891.2631.3040.6263331.751.2861.4291.3331.1581.36801.7061431.09551.75
HistidineHCAT0.3530.869750.62851.0926090.7223330.9063330.558750.80.1541.1850.11440.90.47611.03401.40.7162861.360750.25
HCAC1.6471.130251.37150.9073911.2776671.0936671.441251.21.8460.8151.88561.11.52410.96620.61.2837140.639251.75
GlutamineQCAA0.1670.145750.2510.5863481.1776670.1133330.168750.5880.1380.2446670.170.26650.1250.1820.538250.4570.50.1695710.71450
QCAG1.8331.854251.7491.4136520.8223331.8866671.831251.4121.8621.7553331.831.73351.8751.8181.461751.5431.51.8304291.28552
AsparagineNAAT0.1540.80651.0240.8527830.7080.5556670.75950.3750.4440.8890.3131.6360.0950.41.0410.51.0530.8707141.435250.333
NAAC1.8461.19350.9761.1472171.2921.4443331.24051.6251.5561.1111.6870.3641.9051.60.9591.50.9471.1292860.564751.667
LysineKAAA0.2140.91.1550.7841740.6780.250.5970.6670.2221.3610.2480.9080.3030.2220.9470.251.221.0415711.235250.6
KAAG1.7861.10.8451.2158261.3221.751.4031.3331.7780.6391.7521.0921.6971.7781.0531.750.780.9584290.764751.4
Aspartic acidDGAT0.51.016251.2361.5859570.8536670.4153330.4710.750.871.5373330.57240.8240.60.9331.2530.7371.4170.6331431.612750.545
DGAC1.50.983750.7640.4140431.1463331.5846671.5291.251.130.4626671.42761.1761.41.0670.7471.2630.5831.3668570.387251.455
Glutamic acidEGAA0.3530.929251.07151.1396521.5623330.2110.3390.5710.2221.3613330.23161.1430.4490.481.071750.5621.7860.8284291.496250
EGAG1.6471.070750.92850.8603480.4376671.7891.6611.4291.7780.6386671.76840.8571.5511.520.928251.4380.2141.1715710.503752
CysteineCTGT0.4440.612251.5780.67682620.4206670.30.40.41.50.92721.6530.44401.201.4290.6541431.3740
CTGC1.5561.387750.4221.32317401.5793331.71.61.60.51.07280.3471.55620.820.5711.3458570.6262
ArginineRCGT0.611.221500.94639101.0483330.41550.60.1710.1906670.22340.95350.2931.0910.8420.51.7650.2254290.264750.545
RCGC2.0341.403250.9750.40547801.4576671.453751.81.7140.6883331.73641.38950.8780.8180.7371.1671.4121.3617140.760252.182
RCGA0.1020.94151.1051.1670870.9791.1660.56550.60.3431.37733300.6760.4390.8180.6320.6671.0590.4275711.0250
RCGG2.2371.403250.490.4413910.4896672.1536672.5271.52.5711.1066671.90841.50951.9021.3640.7372.16701.45102.727
RAGA0.1020.8931.841.5377833.2450000.1711.3773330.17121.23150.4390.5451.6840.6671.4121.6257143.950
RAGG0.9150.13651.591.501871.2870.1741.03851.51.0291.2603331.96060.242.0491.3641.3680.8330.3530.90857100.545
GlycineGGGT0.6551.252250.7640.2047830.4936670.4473330.601750.5160.5160.5663330.09121.04650.4260.8421.1170.4440.9030.3985711.613750.25
GGGC2.0360.85451.0171.6986960.8891.8276671.3242.1941.9350.9656671.57640.95052.0431.4740.9371.5560.7740.8845710.716752.5
GGGA0.3641.216751.75050.4413481.9260.4940.45050.3870.2581.5570.18241.4290.4260.6321.2610.6671.6771.3442861.44050
GGGG0.9450.676250.46851.6552610.6916671.2311.623750.9031.290.9103332.150.5741.1061.0530.6851.3330.6451.3718570.228751.25
PhenylalanineFTTT1.1270.76350.910751.21.2211251.22851.5561.0220.2496671.0057141.12150.5450.7370.6670.88451.21.5291.50.7261.7145
FTTC1.1110.8731.23651.089250.80.7788750.77150.4440.9781.0836670.9942860.87851.4551.2631.3331.11550.80.4710.51.054333
LeucineLTTA0.251.26650.05750.07150.6770.42050.18950.5710.2066670.05933301.48650.05500.92301.7141.250.750.238
LTTG0.51.10.4620.875251.1610.6911250.913751.7141.4673330.1786670.4014291.2150.440.4480.2311.1670.4290.750.8440.559
LCTT0.51.55350.4041.4350.291.096250.860751.1431.0443330.9083330.9924291.6060.440.3580.9230.2651.2861.252.1560.672556
LCTC10.28651.3851.1271.1611.569751.231750.8570.9883331.190.8648570.41251.1561.5221.3851.53950.64310.6561.502556
LCTA0.250.573500.717750.4840.1498750.238750.2860.2826670.0593330.2427140.357250.220.1790.2310.74450.750.50.5620.437333
LCTG3.51.223.6921.77352.2262.07252.56551.4292.0106673.6046673.4985710.9223.6883.4932.3082.2841.1791.251.0312.590333
IsoleucineIATT0.4621.60750.82551.254751.051.2258751.3487520.5540.2110.881.691750.5360.1760.6670.31651.2861.751.5580.688556
IATC2.5381.242.17451.118251.51.279751.285750.41.7146671.7891.8975710.38852.1432.47121.416510.50.5771.794889
IATA00.152500.6270.450.4933750.36550.60.73133300.2224290.920.3210.3530.3331.26650.7140.750.8650.516667
ValineVGTT0.4711.250.6060.937751.0320.835750.88350.81.2223330.1833330.4241.619250.4860.37500.389511.5651.4470.449778
VGTC1.4120.51651.1290.937751.0320.9860.840251.20.7881.5943331.2451430.79350.6490.6251.0910.38950.3330.1740.171.009667
VGTA00.4330.17450.9140.3870.2001250.46950.8000.4697140.91350.3780.12500.9321.1671.0431.0210.293667
VGTG2.1181.82.0911.210751.5481.9781251.806751.21.9896672.2223331.8617140.67352.4862.8752.9092.291.51.2171.3622.247111
SerineSTCT0.4621.11050.62550.283750.8111.0991252.3317501.5270.2063331.4047141.902750.20.3530.6320.41452.0691.3042.0491.096778
STCC1.8462.5131.1630.765751.6221.5581.12051.0911.4666672.8891.3858570.3621.31.2350.6321.2430.2070.2611.0241.623556
STCA0.9231.40250.35850.471250.9730.838250.155751.0910.74300.4552861.17910.8820.3161.6571.8621.5651.610.594
STCG0.46201.79151.148750.1620.0990.54300.4476670.770.2118570.1980.90.8821.89500.2070.52200.174556
SAGT0.4620.3510.5370.3831.6221.0991250.9552.1820.60633301.3122861.3770.50.3530.9470.41451.2411.8260.7320.865778
SAGC1.8460.62351.5242.9480.8111.3063750.89451.6361.2096672.1346671.230.9812.12.2941.5792.27150.4140.5220.5851.645667
ProlinePCCT0.5221.28710.80950.9731.2096251.2961.7781.5383330.5121.6114291.5750.8240.8571.1431.1491.0531.751.7780.843444
PCCC2.2611.1922.251.7140.8651.6790.89801.2803331.5806671.2905711.312251.7651.2861.4291.58450.6320.50.5561.626
PCCA0.871.45150.50.333252.0540.59951.559752.2220.6873330.5636670.4102861.112750.3530.2861.4291.04652.1051.751.5560.774667
PCCG0.3480.0690.251.1430.1080.5120.24600.4933331.3430.68814301.0591.57100.22050.21100.1110.755889
ThreonineTACT0.5331.0940.19650.54551.7141.0478751.2972.5450.6233330.221.1651.8960.4910.60.2351.29551.9132.5881.6220.938111
TACC2.40.9832.1581.2121.0291.4551.313750.7272.2563331.5981.5788570.6061.6841.41.1760.61350.870.2350.9731.716778
TACA0.5331.47051.0720.8791.0291.062250.7140.7270.5610.8866671.2042861.211251.26311.6471.72751.2171.1761.4051.270778
TACG0.5330.4530.5741.36350.2290.435750.675500.561.2950.0520.28650.56110.9410.36350000.074111
AlanineAGCT0.5522.07850.62750.58551.291.3108751.309251.20.9496670.5091.0854291.9620.5240.3080.7271.37751.3331.5242.0871.194778
AGCC1.3790.7852.6311.36651.1611.5888751.55051.22.2506671.5756671.8655710.542252.0951.8462.3642.12451.0370.7620.581.945
AGCA0.5520.8960.2290.8761.1610.67250.74551.20.690.2833330.5107141.289750.5240.2050.1820.4981.3331.7141.1590.503333
AGCG1.5170.240.5131.17150.3870.427750.39450.40.1096671.6320.5380.2060.8571.6410.72700.29600.1740.357
TyrosineYTAT0.5710.61250.7691.051.1540.9951.0671.4551.2626670.0316670.6611.201250.5290.4621.14301.1431.4291.2631.186667
YTAC1.4291.38751.2310.950.8461.0050.9330.5450.7373331.3016671.3390.798751.4711.5380.85720.8570.5710.7370.813333
HistidineHCAT0.61.1390.3251.2751.0910.7576250.648251.61.1990.4986670.5638571.1430.2960.7691.3330.80350.7691.751.0670.974444
HCAC1.40.8611.6750.7250.9091.2423751.351750.40.8011.5013331.4361430.8571.7041.2310.6671.19651.2310.250.9331.025556
GlutamineQCAA00.30950.1050.84450.50.444750.12700.5780.1083330.2274291.643750.2480.250.3640.503510.8890.5880.175222
QCAG21.69051.8951.15551.51.555251.87321.4221.8916671.7725710.356251.7521.751.6361.496511.1111.4121.824778
AsparagineNAAT0.6671.0210.66550.363250.9710.520750.9702520.7096670.2176671.2277141.5240.7080.26711.121.3331.440.477778
NAAC1.3330.9791.33451.636751.0291.479251.0297501.2903331.1156670.7722860.4761.2921.73310.900.6670.561.522222
LysineKAAA0.61.16650.43750.474750.6670.64050.93510.65533300.2807141.265750.5830.5450.80.3431.5331.2861.1430.562556
KAAG1.40.83351.56251.525251.3331.35951.06511.3446671.3333331.7192860.734251.4171.4551.21.6570.4670.7140.8571.437444
Aspartic acidDGAT0.5451.0270.3641.42750.6210.6798750.785251.1431.1426670.1320.7481.623250.7310.3640.5450.6411.0531.4671.550.971111
DGAC1.4550.9731.6360.57251.3791.3201251.214750.8570.8573331.8681.2520.376751.2691.6361.4551.3590.9470.5330.451.028889
Glutamic acidEGAA0.4711.0030.2860.48050.90.9340.983251.5561.0676670.3950.3745711.558250.3110.5790.8240.52251.0831.6671.7090.528556
EGAG1.5290.9971.7141.51951.11.0661.016750.4440.9323331.6051.6254290.441751.6891.4211.1761.47750.9170.3330.2911.471444
CysteineCTGT00.64450.73351.083250.8571.24550.967251.50.6236670.9111.0472861.697250.51.14301.121.11120.650889
CTGC21.35551.26650.916751.1430.75451.032750.51.3763331.0890.9527140.302751.50.85720.900.88901.349111
ArginineRCGT01.21450.18500.4140.1911250.1860.50.3356670.260.4791430.67850.4290.1671.3330.90.4620.4621.1430.381
RCGC2.6670.68551.29250.6750.6210.4113751.3880.51.6723.3386671.2262860.305252.4862.33310.90.2310.4620.4290.550667
RCGA0.6671.41450.36950.150.6210.3213751.338510.6476670.5683330.8130.914750.4290.33301.81.3851.8461.2860.695667
RCGG2.6670.6432.03131.4481.1861250.576500.9333330.8916671.8182860.44151.543210.90.46200.2862.285556
RAGA01.07151.01450.6751.2412.2813751.005751.50.7386670.1730.2062862.440.4290.51.3331.21.8462.3081.7141.167444
RAGG00.97151.1081.51.6551.6086251.50552.51.6720.7683331.4571.220.6860.6671.3330.31.6150.9231.1430.919556
GlycineGGGT0.4440.650.62550.513250.7180.3728750.499750.50.6176670.7373330.5358571.245750.6290.48800.30950.9141.4741.1030.621556
GGGC1.4811.3331.64051.841751.0261.712251.0807511.5483332.3571.9131430.885751.61.7561.5651.69050.3430.2110.4831.433
GGGA0.5931.850.43250.5261.1281.0951251.55521.0546670.2760.3797141.557250.80.391.0430.1432.0571.68420.926556
GGGG1.4810.16651.3011.1191.1280.8196250.864750.50.7793330.6296671.1710.31150.9711.3661.3911.8570.6860.6320.4141.018889
PhenylalanineFTTT0.711.3446671.25780.5291.0110.3553331.4670.8333331.2671.2591.2411.8331.0670.7141.50.92850.6566670.4210.140
FTTC1.290.6553330.74221.4710.9891.6446670.5331.1666670.7330.7410.7590.1670.9331.2860.51.07151.3433331.5791.862
LeucineLTTA0.0910.9013330.711200.40300.61201.1302730.16050.8363330.7060.692000.2860.12800.080.125
LTTG0.4550.8693330.59580.5521.0301.4690.1666671.0812731.2820.8891.1760.2310.5221.0910.5710.25666700.080.5
LCTT0.7270.9113331.53680.4140.940.3080.6120.9333330.9662730.7951.6731.1760.9231.0431.6360.2860.1110.30500.5
LCTC1.2271.1011.19240.8970.8961.6770.85700.6576361.1990.8823330.7061.1540.7831.09122.0733331.3221.920.75
LCTA0.13550.3266670.6250.2760.5370.6616670.6120.1666670.8271820.4810.5340.7060.4620.2610.5450.5710.2566670.3050.160
LCTG3.3641.8903331.33883.8622.1943.3541.8374.7333331.3372732.08351.1851.5292.5383.3911.6362.2863.1746674.0683.764.125
IsoleucineIATT0.17151.4606671.233601.21.1666670.93801.7721820.8231.4033331.1430.50.751.80.5480.4243330.3240.3670
IATC2.6150.9813330.6752.6471.21.2221.6882.4166670.8412731.9330.5563331.4292.250.751.22.07152.5756672.5142.6333
IATA0.2130.5581.09140.3530.60.6113330.3750.5833330.3865450.24351.0403330.4290.251.500.38100.16200
ValineVGTT0.2111.1311.43340.3080.5710.7056670.801.2098181.26751.6296670.3331.0430.2861.3331.6580.1476670.4550.0750.211
VGTC1.051.0106670.259211.2950.780.5330.7556670.7551.30650.3980.50.871.429001.5540.8181.2080.632
VGTA0.2110.3840.96940.0770.3050.2313331.0670.4890.5941820.23750.9536671.1670.34801.1110.2650.10333300.0750
VGTG2.5281.4741.33782.6151.8292.2826671.62.7556671.4411.1881.01866721.7392.2861.5562.0772.1953332.7272.6423.158
SerineSTCT0.4311.1861.40540.6251.031.8473331.36401.5434551.6121.7503330.8181.3851.12500.7160.2813330.5140.4620.9
STCC2.1051.4893330.86742.251.27301.0911.90.4255450.71650.5841.3640.4621.8751.51.07351.2852.43.2312.4
STCA0.3731.0546671.64920.1250.7270.1111.3641.2333331.5418180.9841.0046671.0910.462000.7160.3613330.6860.2310.3
STCG0.36450.0653330.32960.1250.5450000.40609100.3863330.5450000.5160.5016670.3430.3460.9
SAGT1.1771.3290.79460.6251.2730.7166670.81800.9815450.89551.5363330.9552.3081.8751.500.3006670.6860.1150.6
SAGC1.550.8756670.9532.251.1523.3251.3642.8666671.1013641.79150.7381.2271.3851.12532.9793.2703331.3711.6150.9
ProlinePCCT0.6741.5606671.24560.6361.5380.9266671.7142.5333331.7229092.040.6551.3331.3331.6841.60.8890.3256670.5220.6251.333
PCCC2.0261.1470.73081.6360.9231.5446670.28600.6630.94151.07333311.3331.4740.80.8892.0926672.78321.333
PCCA0.9390.8531.54560.8181.1691.221.7141.4666670.9260910.78451.7280.6671.3330.8421.60.4440.4146670.5220.250.667
PCCG0.36050.4393330.4780.9090.3690.3090.28600.6880.2340.54410001.7781.1666670.1741.1250.667
ThreonineTACT0.81.1543330.86841.10310.3333330.8330.6666671.2211820.7410.7536671.751.8670.7271.3330.87050.8870.61501
TACC2.2671.1146671.35261.9311.5623.05533310.6666671.0777271.7781.1451.251.3331.81801.7572.1376671.8461.9432
TACA0.5331.4693331.49480.2760.9380.6113331.66721.4209091.0372.0230.50.80.3640.6671.12150.5010.7690.9140.5
TACG0.40.2620.28420.690.500.50.6666670.2802730.4440.0783330.501.09120.2510.4750.7691.1430.5
AlanineAGCT1.1221.2676670.85080.5890.8641.0953331.5381.3443331.3047270.7941.0076671.8180.90921.7141.6760.5813330.9410.250.462
AGCC1.6061.4323331.4132.0211.8181.6426670.4621.9986670.7694551.35551.4873330.971.2731.2501.02852.3053332.23532.462
AGCA0.78251.1856671.6130.5890.7730.7141.6920.6571.5744551.32151.0761.0911.8180.51.1430.6480.3140.4710.250.615
AGCG0.4890.1140.12320.80.5450.5473330.30800.3514550.5290.4290.12100.251.1430.6480.7993330.3530.50.462
TyrosineYTAT0.27950.921.580.6670.81511.27301.2786360.9230.7503331.3331.1110.6671.250.66700.9230.51
YTAC1.72051.080.421.3331.18510.72720.7213641.0771.2496670.6670.8891.3330.751.33321.0771.51
HistidineHCAT0.33051.0803331.37420.2860.63220.8891.0666671.4761820.7270.9623331.3850.750.6671.21.50.1880.80.3330
HCAC1.66950.9196670.62581.7141.36801.1110.9333330.5238181.2731.0376670.6151.251.3330.80.51.8121.21.6672
GlutamineQCAA0.1080.6476671.05720.2290.3270.6550.8240.7856670.8810910.8120.4576670.6320.6150.5450.500.3383330.2860.0910.364
QCAG1.8921.3523330.94281.7711.6731.3451.1761.2143331.1189091.1881.5423331.3681.3851.4551.521.6616671.7141.9091.636
AsparagineNAAT0.59051.1956671.08720.6961.0491.4541.3750.2443331.1247271.2381.5563331.1580.54511.4550.4520.4923330.4290.1480.2
NAAC1.40950.8043330.91281.3040.9510.5460.6251.7556670.8752730.7620.4436670.8421.45510.5451.5481.5076671.5711.8521.8
LysineKAAA0.31850.7996670.78080.51.2861.1033331.1541.1333331.1750.50.980.7620.8331.1431.8180.51650.4443330.3750.1250.714
KAAG1.68151.2003331.21921.50.7140.8966670.8460.8666670.8251.51.021.2381.1670.8570.1821.48351.5556671.6251.8751.286
Aspartic acidDGAT0.36951.0006671.24780.370.78311.4120.41.3928181.0621.1670.751.0770.8331.3330.94450.2526670.57100.4
DGAC1.63050.9993330.75221.631.21710.5881.60.6071820.9380.8331.250.9231.1670.6671.05551.7473331.42921.6
Glutamic acidEGAA0.4761.1426671.05360.20.9581.3146671.20.9186671.3330910.69351.4086670.8571.3330.4171.3331.10.2320.3330.1480.308
EGAG1.5240.8573330.94641.81.0420.6853330.81.0813330.6669091.30650.5913331.1430.6671.5830.6670.91.7681.6671.8521.692
CysteineCTGT0.52250.8431.10.51.1840.3553331.33321.4172731.0911.39433312100.50.4813330.6670.1250.8
CTGC1.47751.1570.91.50.8161.6446670.66700.5827270.9090.60566710121.51.5186671.3331.8751.2
ArginineRCGT0.60150.2356670.23040.2450.3914.40.80.5713330.5941821.2350.5941.5791.4120.5711.200.7856670.57100.75
RCGC1.5040.5396670.682.0821.1740000.4571820.8820.5866670.63202.2863.61.3332.2413332.5711.6671.875
RCGA0.8230.7406670.83040.8571.04301.600.8913640.8821.7026670.3161.4120.8571.21.3330.2413330.2860.6670.75
RCGG1.5041.1690.60961.7141.30400.81.2856670.5831820.8820.9540.9470.7061.42901.3331.8213332.57121.875
RAGA0.75151.7376672.280.1220.9130.821.5713332.7955451.0592.0163331.8950.3530.857000.50300.3330.375
RAGG0.8161.5766671.36960.981.1740.80.82.5716670.6786361.0590.1466670.6322.1180020.40801.3330.375
GlycineGGGT0.6890.5230.57880.1330.3461.1621.0910.2223330.9198180.7581.1103330.810.5710.80.640.5416670.6960.1740.174
GGGC1.93651.1743330.78322.3331.4811.0286670.7272.5776671.0941821.0520.5226670.641.1431.3330.81.441.7012.3482.3482.261
GGGA0.64451.4926671.85580.41.3831.6761.45501.1253641.26351.4986671.61.1430.7621.60.960.7526670.3480.4350.348
GGGG0.73050.810.78321.1330.790.1333330.7271.20.8607270.92650.8683330.960.7141.3330.80.961.0046670.6091.0431.217
RSCU values of the genes, highlighting the codons with the highest RSCU values.
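RSCU values like those tabulated above can be reproduced with a short script. The sketch below is a minimal illustration, not the software used in the study. It assumes the standard genetic code and a single in-frame coding sequence, and it performs no stop-codon or length checks. RSCU is the observed count of a codon divided by the count expected if all synonymous codons were used equally, so values above 1 mark preferred codons.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1); codons ordered T, C, A, G at each position.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AAS))

# Group sense codons by the amino acid they encode (stop codons '*' excluded).
SYNONYMS = defaultdict(list)
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def codon_counts(seq):
    """Count in-frame codons of a coding sequence (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def rscu(seq):
    """RSCU = observed codon count / count expected under uniform synonymous usage."""
    counts = codon_counts(seq)
    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


print(round(rscu("CTGCTGCTGTTA")["CTG"], 2))  # 4.5
```

In the example, leucine occurs four times across six synonymous codons, so each codon is expected 4/6 times; three CTG codons give an RSCU of 4.5.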

Analysis of Rare Codons

The codons with a frequency of occurrence below 1% were considered rare codons. Codons TTA, CTA, ATA, GTA, TCG, CCG, ACG, GCG, CGT, CGC, CGA, and TGT were rare codons (Figure 5). Codon usage is tissue-specific, so codon usage in the brain may differ from that in other tissues. We therefore used the human brain dataset of Piovesan et al. (2013), taking the per-million codon frequencies (codon bias) and the per-million frequencies of codon counts multiplied by the respective expression levels (hereafter called codonome bias). A strong, statistically significant positive correlation (r = 0.665, p < 0.001) was observed between the codon bias of the genes studied here and the codonome bias of brain-specific genes, suggesting tissue- and cell-specific selective pressure on the genes related to neurodegeneration with metabolic consequences examined in the present study. Changes in the expression profile of isoacceptor tRNAs might result from adaptation and selection in response to changes in transcriptome codon usage (Dittmar et al., 2006). Enrichment of tRNA-Arg-TCT has also been reported, suggesting a tissue-specific role of tRNAs in translation (Torres et al., 2019).
FIGURE 5

Rare codons for the neurodegeneration-associated gene transcripts. A “rare codon” was defined by calculating the frequency of occurrence of all codons in the coding sequences (threshold: <1%, i.e., fewer than 10 per 1,000 codons).
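The rare-codon criterion in the caption can be sketched as follows. This is an illustration under two assumptions not stated in the text: codon counts are pooled across all transcripts, and stop codons are excluded from the denominator. Codons seen fewer than 10 times per 1,000 sense codons are flagged as rare.

```python
from collections import Counter

# Standard genetic code (NCBI table 1); only sense codons are scored.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))
SENSE = [c for c, aa in CODE.items() if aa != "*"]


def per_thousand(seqs):
    """Frequency of each sense codon per 1,000 sense codons, pooled over all sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper().replace("U", "T")
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    total = sum(counts[c] for c in SENSE)
    return {c: 1000 * counts[c] / total for c in SENSE}


def rare_codons(seqs, threshold=10.0):
    """Codons used fewer than `threshold` times per 1,000 codons (i.e., below 1%)."""
    freqs = per_thousand(seqs)
    return sorted(c for c, f in freqs.items() if f < threshold)


print(rare_codons(["GCT" * 995 + "GCC" * 5])[:3])
```

Codons that never occur have a frequency of zero and are therefore reported as rare, which matches the "<1%" definition literally.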

As per the parity rule, in the absence of mutational or selection forces on a gene’s codon usage, the base content follows Chargaff’s rule: A = T and G = C. The A3%, T3%, C3%, and G3% (nucleotide contents at the third codon position) were calculated to determine A3/(A3 + U3) as the AT bias and G3/(G3 + C3) as the GC bias. When A3/(A3 + U3) is plotted on the abscissa against G3/(G3 + C3) on the ordinate and both values equal 0.5, all data points lie at the center of the plot (Young et al., 1994). However, codon usage is generally governed by selection pressure, mutation pressure, or both, together with other forces such as compositional pressure. In the present study, the average position was x = 0.449 ± 0.152 (AT bias) and y = 0.511 ± 0.054 (GC bias) (Figure 6). Because x < 0.5 and y > 0.5, T is preferred over A and G is preferred over C.
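The two parity-plot coordinates can be computed per gene as in the minimal sketch below. Many studies restrict this calculation to fourfold-degenerate codons; because the text describes third-position contents in general, this illustration uses all third codon positions, which is an assumption.

```python
def third_position_skews(seq):
    """Return (A3/(A3+T3), G3/(G3+C3)) from the third codon positions of an in-frame CDS."""
    seq = seq.upper().replace("U", "T")
    third = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    a3, t3 = third.count("A"), third.count("T")
    g3, c3 = third.count("G"), third.count("C")
    # NaN signals that the denominator is empty for this gene.
    at_bias = a3 / (a3 + t3) if a3 + t3 else float("nan")
    gc_bias = g3 / (g3 + c3) if g3 + c3 else float("nan")
    return at_bias, gc_bias


x, y = third_position_skews("GCTGCTGCAGCG")
print(round(x, 3), round(y, 3))  # 0.333 1.0
```

Here x < 0.5 indicates a preference for T over A and y > 0.5 a preference for G over C, the same reading applied to the study's averages.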
FIGURE 6

Parity plot generated using A3/(A3 + U3) as the abscissa and G3/(G3 + C3) as the ordinate. The plot shows a preference for T over A and for G over C.

Degree of Codon Bias

The Nc value determines the degree of codon bias: the higher the Nc, the lower the bias. A gene with an Nc value below 35 generally has strong codon bias (Sheikh et al., 2020), whereas a gene with an Nc value above 50 uses codons nearly at random, indicating the weakest codon bias. Nc values between 35 and 50 indicate moderate bias (Wang et al., 2018). In the present study, Nc values ranged from 33.9 to 59.9, indicating that the genes exhibited a wide range of codon bias. An Nc < 35, representing very high bias, was observed for 2.17% of transcripts; 40.21% had moderate bias (Nc 35-50), and 56.60% displayed low bias (Nc > 50).
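The Nc values and the 35/50 cut-offs described above can be illustrated with a sketch of Wright's (1990) effective number of codons, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of the k-fold degenerate amino acids. It assumes the standard genetic code and one in-frame coding sequence. Skipping amino acids with fewer than two observations or a non-positive F, and averaging F2 and F4 when isoleucine is missing, are simplifications of this illustration rather than the study's software. A value of exactly 50 is treated as moderate here because the text does not fix that boundary.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1) and synonymous codon groups.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))
SYNONYMS = defaultdict(list)
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(seq):
    """Wright's Nc: 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    f_values = defaultdict(list)
    for codons in SYNONYMS.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp are non-degenerate; F needs at least 2 observations
        homozygosity = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * homozygosity - 1) / (n - 1)
        if f > 0:  # simplification: drop non-positive small-sample estimates
            f_values[k].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # isoleucine missing
    missing = set(CLASS_SIZES) - set(mean_f)
    if missing:
        raise ValueError(f"no usable amino acids in class(es) {sorted(missing)}")
    nc = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(nc, 61.0)


def bias_category(nc):
    """Thresholds used in the text: <35 high, 35-50 moderate, >50 low bias."""
    if nc < 35:
        return "high"
    return "moderate" if nc <= 50 else "low"
```

A gene that uses exactly one codon per amino acid reaches the minimum Nc of 20, while perfectly uniform synonymous usage approaches the maximum of 61.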

The Effect of Compositional Constraint in Shaping Codon Usage

The Nc-GC3 curve was plotted to further elucidate the effects of mutational, selectional, or compositional constraints on codon usage. If codon usage were driven solely by the GC content at the third codon position, all data points would lie on the expected curve. If a gene is subject to translational selection, its data point lies well below the expected curve (Sablok et al., 2011). In the present study, all low-Nc points lay well below the curve, suggesting that forces other than compositional constraints, such as selection, affected codon usage (Figure 7).
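The expected curve referred to above is usually Wright's (1990) function of GC3s, the GC content at synonymous third positions expressed as a fraction: expected Nc = 2 + s + 29/(s² + (1 − s)²). The sketch below computes it, together with the relative distance of an observed Nc below the curve. That ratio is a commonly used summary added for illustration; the text does not report it.

```python
def expected_enc(gc3s):
    """Expected Nc if codon usage were shaped only by GC3s (a fraction from 0 to 1)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_ratio(observed_nc, gc3s):
    """(expected - observed) / expected; larger values mean points further below the curve."""
    expected = expected_enc(gc3s)
    return (expected - observed_nc) / expected


print(expected_enc(0.5))  # 60.5
```

Points with a large positive ratio sit well below the curve, which is the pattern the text interprets as selection acting beyond compositional constraint.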
FIGURE 7

The relationship between compositional constraint and codon usage bias. The ENc-GC3 curve indicates the presence of selection and mutational forces on the codon bias of genes. Data points far from the expected curve indicate forces other than compositional constraints acting on codon choice.

Neutrality Analysis for Quantitation of Mutation and Selection Pressure

Although the Nc-GC3 plot can reveal the key factors shaping codon bias, it cannot quantify the relative contributions of directional mutation pressure and selection pressure. For this, a neutrality plot is required, constructed using the %GC3 and %GC12 contents of the genes. The GC3 content of the genes varied from 27.7 to 89.6%, while GC1 and GC2 varied from 41.7 to 89% and from 30.1 to 76.8%, respectively. Linear regression of GC12% on GC3% gave GC12% = 0.1732 × GC3% + 39.18 (R² = 0.315), indicating that GC3 explains 31.5% of the variance in GC12. The regression coefficient of 0.1732 indicates that mutational forces contribute 17.32%, while selection and other factors contribute 82.68%. These results indicate that selection pressure dominates over mutational pressure (Figure 8).
FIGURE 8

Neutrality plot analysis revealed a 17.32% contribution of mutational forces and an 82.68% contribution of selection forces acting on 60 genes related to neurodegenerative disorders.

Effect of Mutational Forces on Gene Expression

The CAI is an index of gene expressivity: the higher the CAI value, the greater the expressivity of the gene (Kumar et al., 2021). CAI was regressed against %A3, %T3, %C3, and %G3 to determine the effect of mutational forces on gene expression. For all four nucleotides, the regression line was almost flat (regression coefficients of –0.0047, –0.0045, 0.004, and 0.004 for A3, T3, C3, and G3, respectively). These results indicate that mutational forces had little effect and that gene expression is influenced mainly by selectional forces. The intrinsic codon deviation index (ICDI) estimates codon bias when optimal codons are unknown (Rodríguez-Belmonte et al., 1996). According to Freire-Picos et al. (1994), an ICDI value greater than 0.5 indicates a high level of bias. ICDI values correlated well with Nc values. The lowest ICDI value was 0.05, for 5-methyltetrahydrofolate-homocysteine methyltransferase (MTR) transcript variant 2, and the highest was 0.501, for solute carrier family 6 member 19 (SLC6A19). The average ICDI was 0.182 ± 0.008, indicating a generally low bias.
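The CAI used above follows Sharp and Li (1987): each codon receives a relative adaptiveness w (its count in a reference set of highly expressed genes divided by the count of its most used synonym), and the CAI of a gene is the geometric mean of w over its codons. The sketch below is illustrative only. The reference sequences are hypothetical input, the pseudo-count of 0.5 for codons absent from the reference is an assumption of this sketch, and amino acids absent from the reference end up with w = 1 for every codon.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1) and synonymous codon groups.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))
SYNONYMS = defaultdict(list)
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def codon_counts(seq):
    """Count in-frame codons (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w = count / max synonymous count in a (hypothetical) highly expressed reference set."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(codon_counts(seq))
    weights = {}
    for codons in SYNONYMS.values():
        if len(codons) == 1:
            continue  # Met and Trp carry no information about codon choice
        observed = {c: counts[c] or pseudocount for c in codons}
        best = max(observed.values())
        for c in codons:
            weights[c] = observed[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of w over the gene's codons that have a weight."""
    used = {c: n for c, n in codon_counts(seq).items() if c in weights}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(n * math.log(weights[c]) for c, n in used.items()) / total)


w = relative_adaptiveness(["CTG" * 4 + "CTC" * 2])
print(round(cai("CTGCTC", w), 4))  # 0.7071
```

Because a geometric mean is used, a single very rare codon lowers CAI more than an arithmetic mean would.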

The Interrelationship Between Codon Usage Bias and Properties of Protein

Protein indices, including protein length, GRAVY, aromaticity (AROMO), isoelectric point (PI), hydrophobicity index, aliphatic index, instability index, and percentages of acidic, basic, and neutral amino acids, were estimated and subjected to correlation analysis (Barbhuiya et al., 2020). Correlation analysis between codon usage bias and these protein indices revealed that Nc was positively associated with PI (r = 0.176; p < 0.05) and with acidic (r = 0.216; p < 0.01) and basic (r = 0.347; p < 0.0001) residues. Nc was negatively correlated with GRAVY (r = –0.173; p < 0.05), hydropathicity (r = –0.220; p < 0.01), and neutral amino acids (r = 0.146; p < 0.05). Protein length was not correlated with Nc. In the PCA biplot (Figure 9), arrows represent codons and dots represent the PCA scores of the sequences; the longest vectors (codons) contribute most to the principal components. The ellipse encloses the sequences with 95% confidence, and the PCBD1 and PTS genes lay outside it. The scree plot revealed that PC1 captured 51.56% of the variation and PC2 captured 6.13%.
FIGURE 9

A biplot depiction of PCA. Dots represent the sequences, and arrows represent codons. The ellipse shows the 95% confidence limit.

Discussion

In the present study, the codon usage pattern was analyzed for 183 transcripts belonging to 60 genes involved in neurodegeneration with metabolic disturbances. Initial analysis of overall nucleotide composition revealed a random pattern of A, T, C, and G usage in the sequences. Variation was observed in total nucleotide composition as well as in GC composition. Nucleotide C had the largest range of variability (22.35%), while G had the smallest (14.34%). GC3 showed the most variation of all codon positions, possibly owing to degeneracy at the third codon position. Dinucleotide frequency significantly influences codon usage bias and can be considered a genetic signature of a species (De Amicis and Marchetti, 2000). Underrepresentation of the TpA dinucleotide has been reported in multiple studies: TpA-containing mRNAs are more susceptible to degradation by cellular RNases (an mRNA-destabilizing effect), TpA contributes to stop codons (TAA and TAG) (Khandia et al., 2019), and selectional forces tend to keep TpA content low. CpG dinucleotides are also predisposed to mutation through deamination of 5-methylcytosine at CpG sites, resulting in C→T changes. The CpG dinucleotide is approximately 42 times more mutable than predicted from random mutation; however, the exact proportion cannot be estimated because the degree of cytosine methylation in vertebrates varies (Cooper and Youssoufian, 1988). For this reason, expression constructs for protein production and gene therapy are designed to avoid CpG (Bauer et al., 2010). Despite selection forces acting against the TpA and CpG dinucleotides, our study found unbiased representation of TpA and CpG in 18.33 and 23.33% of sequences, respectively. A study of a humanized green fluorescent protein (GFP) gene examined the 60 CpG dinucleotides within its coding region. 
Detectable protein expression decreased as the number of CpGs decreased, independent of the promoter used. A similar experiment reported that CpG-depleted mRNA levels decreased fivefold in the nucleus and eightfold in the cytoplasm. The decrease in GFP reporter expression associated with CpG depletion was related more to a decline in mRNA copy number than to translational efficiency, and the effect was gene-independent (Bauer et al., 2010). This result indicates that intragenic CpG influences de novo transcriptional activity. Such experimental evidence implies that, although CpG tends to mutate faster than other dinucleotides and makes the gene vulnerable to loss of function, it is still essential for optimal gene expression; hence, a fine-tuned balance is needed to achieve optimal protein expression. Apart from de novo transcription, CpG has a role in the stability of RNA transcripts: with an increasing proportion of CpG, mRNA stability and subsequent protein expression also increase (Duan and Antezana, 2003). Genes related to metabolism or metabolite transport are expected to be highly expressed to meet the cell's metabolic and transport demands. Despite the tendency of CpG towards mutation and loss of function, these genes retained CpG at an adequate level, which could explain the unbiased usage of CpG in our study and underscores the selectional forces that keep CpG at a level that maintains high expression. In a BRCA1- or BRCA2-mutant chicken DT40 cell line model of spontaneous mutation, NCG-to-NTG mutations were 11 times more likely than the mean mutation rate (Zámborszky et al., 2017). In our study, we found strikingly high RSCU values for CTG (RSCU > 1.6; highest 4.73) and GTG (RSCU > 1.6; highest 3.28) codons in some genes, with overrepresentation of CTG and GTG in 78.33 and 68.33% of genes, respectively. 
This finding is consistent with the transition of CpG dinucleotides to TpG. It is further supported by the observation that the precursors of CTG and GTG (codons CCG and GCG) were overrepresented in only 3.26 and 4.34% of coding sequences, respectively, and underrepresented in 85.86 and 80.97% of genes. Thus, the underrepresentation of CCG and GCG can be attributed to their conversion to CTG and GTG, culminating in CTG and GTG overrepresentation. Several factors affect biased codon choice, including genetic drift, mutation pressure, natural selection, composition, secondary protein motifs, the protein’s physical properties, transcriptional factors, the external environment, and tRNA abundance (Ikemura, 1981). However, natural selection, mutation pressure with genetic drift (Chen et al., 2014; LaBella et al., 2019), and compositional constraints (Jia et al., 2015) are the major factors. In the present study, parity, neutrality, and ENc-GC3 analyses, together with the abundance of specific codons and dinucleotides, suggest that selection is a major force, with a contribution from mutation. Investigation of the role of compositional properties in codon bias revealed that three of the four nucleotides at the second codon position (A2, C2, and T2) significantly affect the bias. A2 correlated negatively with Nc, while C2 and T2 correlated positively. This result is consistent with the work of Saier (2019), who argued that the second codon position is the most important in determining the nature of the genetic code. Overall, our analyses suggest that selection, mutation, and composition are the forces that may shape codon usage. In living organisms, GC content ranges from approximately 20% to 80%. 
When GC content at each of the three codon positions was plotted against overall genomic GC content, each showed a positive correlation; however, the slopes differed, being steepest at the third position, followed by the first and then the second codon position (Muto and Osawa, 1987). Because mutations arise randomly, advantageous ones are retained by selection. Constraints on mutation are highest at the second codon position and lowest at the third. This can be attributed to the fact that the second codon position specifies the type of amino acid, the first specifies the particular amino acid, and the third is redundant because several bases specify the same amino acid. How the second position specifies the type of amino acid can be illustrated as follows: with T, A, or C at the second position, the resulting amino acids are hydrophobic, hydrophilic, or semipolar, respectively. G is the exception: like C at the second position, it mostly yields semipolar amino acids, with two exceptions (arginine, a strongly hydrophilic amino acid, and UGA, a stop codon). Regression of Nc on nucleotide content at the third codon position revealed positive associations with %A3 and %T3 and negative associations with %C3 and %G3, with %T3 having the highest (positive) regression coefficient. Parity plot analysis revealed that T was preferred over A and G over C. The disproportionate usage of these nucleotides suggests natural selection on the codon usage bias of genes linked to neurodegeneration (Uddin and Chakraborty, 2016). Neutrality analysis indicated the dominance of selection and other forces, such as compositional constraints, in shaping codon usage: mutational forces contributed only 17.32%, and GC3 accounted for 31.5% of the variance in GC12. 
Similar to Nc, ICDI is a parameter for evaluating codon usage bias; its value ranges between 0 and 1, and higher values (toward 1) indicate stronger codon usage bias. In the present study, the average ICDI value was 0.182 ± 0.008, indicating a generally low bias. Nc analysis also revealed relatively low codon usage bias. This low bias coincided with high CAI values, suggesting that these genes are highly expressed. CAI quantifies the synonymous codon usage bias of a DNA or RNA sequence (i.e., the similarity of its codon usage to that of a reference set). A high CAI suggests strong selection on a gene to use codons that contribute to high-level protein expression (Sharp and Li, 1987; Puigbò et al., 2007). Genes with higher CAI values tend to use more optimal codons. CAI values ranged from 0.71 to 0.885. In E. coli, the highest CAI (0.84) has been reported for the rplL gene encoding ribosomal protein L7/12, one of the most abundant proteins in the species (DiRienzo and Inouye, 1979). In the present study, all genes had high CAI values, suggesting high expression and underscoring the importance of these genes in physiological functions. When CAI values were regressed on nucleotide composition at the third codon position, the very low regression coefficients indicated that gene expression was little affected by mutational forces and was driven mainly by selectional forces.
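The under- and overrepresentation of TpA and CpG discussed in this section is commonly quantified with the dinucleotide odds ratio, ρXY = f(XY) / (f(X) · f(Y)), the relative-abundance measure popularized by Karlin and colleagues. In the minimal sketch below, the lower cut-off of 0.78 matches the threshold used in this study, while the upper cut-off of 1.23 is a commonly used convention and an assumption of this illustration. Overlapping dinucleotides are counted, and the sequence is assumed to contain both nucleotides of the pair.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq, pair):
    """rho_XY = f(XY) / (f(X) * f(Y)), using overlapping dinucleotide counts."""
    seq = seq.upper().replace("U", "T")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    f_xy = di[pair] / (len(seq) - 1)
    f_x = mono[pair[0]] / len(seq)
    f_y = mono[pair[1]] / len(seq)
    return f_xy / (f_x * f_y)


def representation(rho, low=0.78, high=1.23):
    """Classify an odds ratio as under-, over-, or unbiased representation."""
    if rho < low:
        return "under"
    if rho > high:
        return "over"
    return "unbiased"


rho = dinucleotide_odds_ratio("CGCG", "CG")
print(round(rho, 3), representation(rho))  # 2.667 over
```

A CpG odds ratio between the two cut-offs corresponds to the "unbiased representation" reported for a subset of the genes.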

Conclusion

The present study explored the codon usage pattern and the various forces acting on 183 transcripts of 60 genes involved in neurodegeneration associated with metabolic ailments. Analyses revealed a random pattern in the overall composition of the four standard nucleotides, with nucleotide C showing the largest range of variability (22.35%). Across protein lengths of up to 800 amino acids, GC12 remained constant with increasing length, while GC3 fluctuated widely. Overall codon usage bias was low, as indicated by high Nc values and low ICDI values. An investigation of the effects of compositional parameters on codon usage bias revealed that the second codon position is critical, as shown by the significant correlations of A2, C2, and T2 with Nc (p < 0.001). The genes were highly expressed, as evidenced by their high CAI values, consistent with their involvement in critical physiological processes. Other analyses, including neutrality analysis, the parity plot, and the Nc-GC3 curve, indicated the dominance of selection pressure along with compositional and mutational constraints. Underrepresentation of the TpA and CpG dinucleotides in the transcripts was attributed to selectional pressure. However, unbiased representation (odds ratio > 0.78) of TpA and CpG was observed in 18.33 and 23.33% of genes, respectively. These dinucleotides are important as parts of regulatory elements (TATA box, stop codons, polyadenylation signal), in the thermodynamic stability of mRNA, and in missense mutations. The unbiased representation of these dinucleotides suggests that selectional forces finely tune CpG content to obtain the optimum rate of protein expression needed to meet the high demand for these metabolism- and metabolite transport-related genes. Loss of this fine-tuning may contribute to neurological ailments. Notably, we observed very high RSCU values for CTG and GTG codons, attributed to C-to-T transitions. 
This observation suggests that mutational forces act to eliminate CpG, while selection pressure acts in the opposite direction to maintain high protein expression; this critical balance fine-tunes the CpG content of genes associated with neurodegeneration.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Ethics Statement

Ethical review and approval were not required for the study on human participants in accordance with the local legislation and institutional requirements.

Author Contributions

RK: conceptualization. RK and AS: methodology and writing—original draft preparation. RK, TA, AA, YA, SA, and AMA: validation. RK, TA, and AA: formal analysis. RK, AS, TA, AA, YA, SA, AMA, and MK: writing—review, editing, and visualization. MK: final validation. All authors have read and agreed to the published version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References (57 in total; first 10 listed)

1. N Sueoka. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene. 1999-09-30.

2. Clive Ballard; Serge Gauthier; Anne Corbett; Carol Brayne; Dag Aarsland; Emma Jones. Alzheimer's disease. Lancet. 2011-03-01.

3. Parvin A Barbhuiya; Arif Uddin; Supriyo Chakraborty. Analysis of compositional properties and codon usage bias of mitochondrial CYB gene in anura, urodela and gymnophiona. Gene. 2020-05-11.

4. N Sueoka. Intrastrand parity rules of DNA base composition and usage biases of synonymous codons. J Mol Evol. 1995-03.

5. Mathieu Bourdenx; Nikolaos Stavros Koulakiotis; Despina Sanoudou; Erwan Bezard; Benjamin Dehay; Anthony Tsarbopoulos. Protein aggregation and neurodegeneration in prototypical neurodegenerative diseases: Examples of amyloidopathies, tauopathies and synucleinopathies. Prog Neurobiol. 2015-07-21.

6. Zaira Esposito; Lorena Belli; Sofia Toniolo; Giuseppe Sancesario; Claudio Bianconi; Alessandro Martorana. Amyloid β, glutamate, excitotoxicity in Alzheimer's disease: are we on the right track? CNS Neurosci Ther. 2013-04-18.

7. Ashok Munjal; Rekha Khandia; Kishor K Shende; Jayashankar Das. Mycobacterium lepromatosis genome exhibits unusually high CpG dinucleotide content and selection is key force in shaping codon usage. Infect Genet Evol. 2020-06-06.

8. Allison Piovesan; Lorenza Vitale; Maria Chiara Pelleri; Pierluigi Strippoli. Universal tight correlation of codon bias and pool of RNA codons (codonome): The genome is optimized to allow any distribution of gene expression values in the transcriptome from bacteria to humans. Genomics. 2013-03-01.

9. Abigail L LaBella; Dana A Opulente; Jacob L Steenwyk; Chris Todd Hittinger; Antonis Rokas. Variation and selection on codon usage bias across an entire subphylum. PLoS Genet. 2019-07-31.

10. Abdullah Sheikh; Abdulla Al-Taher; Mohammed Al-Nazawi; Abdullah I Al-Mubarak; Mahmoud Kandeel. Analysis of preferred codon usage in the coronavirus N genes and their implications for genome evolution and vaccine design. J Virol Methods. 2020-01-05.
