Chemistry:Micropeptide

From HandWiki
Short description: Short length polypeptides
Micropeptides can be translated from sORFs located in 5' UTRs, small genes, polycistronic mRNAs, or mis-annotated lncRNAs.

Micropeptides (also referred to as microproteins) are polypeptides fewer than 100-150 amino acids in length that are encoded by short open reading frames (sORFs).[1][2][3] In this respect, they differ from many other active small polypeptides, which are produced through the posttranslational cleavage of larger polypeptides.[1][4] In terms of size, micropeptides are considerably shorter than "canonical" proteins, which have an average length of 330 and 449 amino acids in prokaryotes and eukaryotes, respectively.[5] Micropeptides are sometimes named according to their genomic location. For example, the translated product of an upstream open reading frame (uORF) might be called a uORF-encoded peptide (uPEP).[6] Micropeptides generally lack N-terminal signaling sequences, suggesting that they are likely to be localized to the cytoplasm.[1] However, some micropeptides have been found in other cell compartments, as indicated by the existence of transmembrane micropeptides.[7][8] Micropeptides are found in both prokaryotes and eukaryotes.[1][9][10] The sORFs from which micropeptides are translated can be encoded in 5' UTRs, small genes, or polycistronic mRNAs. Some micropeptide-coding genes were originally mis-annotated as long non-coding RNAs (lncRNAs).[11]

Given their small size, sORFs were originally overlooked. However, hundreds of thousands of putative micropeptides have since been identified through various techniques in a multitude of organisms. Only a small fraction of these candidates have had their expression and function confirmed. Those that have been functionally characterized generally have roles in cell signaling, organogenesis, and cellular physiology. As more micropeptides are discovered, so are more of their functions. One regulatory function is that of peptoswitches, which, when directly or indirectly activated by small molecules, stall ribosomes and thereby inhibit expression of downstream coding sequences.[11]

Identification

Various experimental techniques exist for identifying potential sORFs and their translational products. These techniques are useful only for identifying sORFs that may produce micropeptides, not for direct functional characterization.

RNA sequencing

One method for finding potential sORFs, and therefore micropeptides, is through RNA sequencing (RNA-Seq). RNA-Seq uses next-generation sequencing (NGS) to determine which RNAs are expressed in a given cell, tissue, or organism at a specific point in time. This collection of data, known as a transcriptome, can then be used as a resource for finding potential sORFs.[1] Because sORFs encoding fewer than 100 aa are highly likely to occur by chance, further study is necessary to validate candidates obtained using this method.[11]
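As a rough illustration of how a transcriptome might be mined for candidate sORFs, the following Python sketch scans a transcript sequence in all three forward frames for ATG-initiated ORFs within a chosen length range. The function names, the length thresholds, and the uniform-codon assumption behind the chance estimate are illustrative choices, not a method described in the cited sources.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(transcript, min_codons=10, max_codons=100):
    """Return (start, end, frame) for ATG-initiated ORFs within the codon bounds.

    Coordinates are 0-based and end-exclusive; the end includes the stop codon.
    Codon counts include the start codon but exclude the stop codon.
    The thresholds are hypothetical defaults chosen for illustration.
    """
    seq = transcript.upper().replace("U", "T")
    hits = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] != "ATG":
                pos += 3
                continue
            # Walk downstream in frame until the first stop codon.
            end = pos + 3
            while end + 3 <= len(seq) and seq[end:end + 3] not in STOP_CODONS:
                end += 3
            if end + 3 > len(seq):
                break  # no in-frame stop codon before the transcript ends
            n_codons = (end - pos) // 3
            if min_codons <= n_codons <= max_codons:
                hits.append((pos, end + 3, frame))
            pos = end + 3  # skip nested in-frame start codons
    return hits


def chance_of_open_frame(n_codons, stop_fraction=3 / 64):
    """Probability that n random codons contain no stop codon.

    Assumes all 64 codons are equally likely, which real genomes violate.
    """
    return (1 - stop_fraction) ** n_codons
```

Under the uniform-codon assumption, short stretches without a stop codon are common in random sequence, which is one way to see why an sORF found by scanning alone is only a candidate.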

Ribosome profiling (Ribo-Seq)

Ribosome profiling has been used to identify potential micropeptides in a growing number of organisms, including fruit flies, zebrafish, mice, and humans.[11] One method uses compounds such as harringtonine, puromycin, or lactimidomycin to stop ribosomes at translation initiation sites.[12] This indicates where active translation is taking place. Translation elongation inhibitors, such as emetine or cycloheximide, may also be used to obtain ribosome footprints, which are more likely to correspond to a translated ORF.[13] If a ribosome is bound at or near a sORF, that sORF putatively encodes a micropeptide.[1][2][14]
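The idea that ribosome occupancy at a sORF marks it as a candidate can be sketched as a simple interval-overlap count. The example below is a minimal sketch that assumes footprints and sORFs have already been mapped to the same transcript coordinates; the function names and the read threshold are hypothetical, and real analyses typically apply further criteria.

```python
def count_footprints(sorfs, footprints):
    """Count ribosome footprints overlapping each sORF.

    sorfs: dict mapping a sORF name to (start, end), 0-based and end-exclusive.
    footprints: list of (start, end) intervals in the same coordinate system.
    """
    counts = {}
    for name, (orf_start, orf_end) in sorfs.items():
        counts[name] = sum(
            1
            for fp_start, fp_end in footprints
            if fp_start < orf_end and fp_end > orf_start  # half-open overlap
        )
    return counts


def supported_sorfs(sorfs, footprints, min_reads=5):
    """Keep sORFs covered by at least min_reads footprints (arbitrary cutoff)."""
    counts = count_footprints(sorfs, footprints)
    return {name: n for name, n in counts.items() if n >= min_reads}
```

A sORF that passes such a filter is still only putatively translated; the count shows ribosome occupancy, not the presence of a stable peptide.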

Mass spectrometry

Mass spectrometry (MS) is the gold standard for identifying and sequencing proteins. Using this technique, investigators are able to determine if polypeptides are, in fact, translated from a sORF.

Proteogenomic applications

Proteogenomics combines proteomics, genomics, and transcriptomics, which is important when looking for potential micropeptides. One proteogenomic method uses RNA-Seq data to create a custom database of all possible polypeptides. Liquid chromatography followed by tandem MS (LC-MS/MS) is then performed to provide sequence information for translation products. Comparison of the transcriptomic and proteomic data can be used to confirm the presence of micropeptides.[1][2]
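The database-and-match workflow described above can be sketched in a few lines of Python. This is a simplified illustration, assuming the standard genetic code, forward-strand three-frame translation, and exact substring matching of observed peptide sequences; the function names and the minimum length are hypothetical, and production pipelines rely on dedicated search engines with statistical scoring.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def build_database(transcripts, min_len=8):
    """Map each candidate polypeptide to the transcripts that could encode it.

    Candidates run from the first methionine of a stop-delimited segment to
    its stop codon. min_len is an arbitrary illustrative cutoff.
    """
    database = {}
    for name, rna in transcripts.items():
        seq = rna.upper().replace("U", "T")
        for frame in range(3):
            # Drop the final segment, which is not terminated by a stop codon.
            for segment in translate(seq[frame:]).split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    database.setdefault(segment[start:], set()).add(name)
    return database


def match_peptides(database, observed):
    """For each observed peptide, list the transcripts whose candidates contain it."""
    matches = {}
    for peptide in observed:
        sources = set()
        for candidate, names in database.items():
            if peptide in candidate:
                sources |= names
        matches[peptide] = sorted(sources)
    return matches
```

An observed peptide that maps only to a candidate from a transcript annotated as non-coding would be the kind of evidence this comparison is meant to surface.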

Phylogenetic conservation

Phylogenetic conservation can be a useful tool, particularly when sifting through a large database of sORFs. A sORF is more likely to encode a functional micropeptide if it is conserved across numerous species.[11][12] However, this approach will not work for all sORFs. For example, those that are encoded by lncRNAs are less likely to be conserved, given that lncRNAs themselves do not have high sequence conservation.[2] Further experimentation is necessary to determine whether a functional micropeptide is in fact produced.
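One simple way to put a number on conservation is the fraction of alignment columns that are identical in every species. The sketch below assumes the orthologous peptide sequences have already been aligned to equal length by an external tool, with "-" marking gaps; the function name and the scoring scheme are illustrative assumptions rather than a method taken from the cited studies.

```python
def conserved_fraction(aligned):
    """Fraction of alignment columns identical across all species.

    aligned: dict mapping a species name to its aligned peptide sequence.
    Columns containing a gap in any species are counted as not conserved.
    """
    seqs = list(aligned.values())
    if not seqs or not seqs[0]:
        raise ValueError("need at least one non-empty aligned sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for column in zip(*seqs) if len(set(column)) == 1 and column[0] != "-"
    )
    return identical / length
```

A score like this could be used to rank candidates in a large sORF collection, with the caveat noted above that low conservation does not rule out function.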

Validating protein-coding potential

Antibodies

Custom antibodies targeted to the micropeptide of interest can be useful for quantifying expression or determining intracellular localization. As is the case with most proteins, low expression may make detection difficult. The small size of the micropeptide can also make it difficult to choose an epitope against which to raise the antibody.[2]

Tagging with CRISPR-Cas9

Genome editing can be used to add FLAG/MYC or other small peptide tags to an endogenous sORF, thus creating fusion proteins. In most cases, this method is beneficial in that it can be performed more quickly than developing a custom antibody. It is also useful for micropeptides for which no epitope can be targeted.[2]

In vitro translation

This process entails cloning the full-length micropeptide cDNA into a plasmid containing a T7 or SP6 promoter. This method utilizes a cell-free protein-synthesizing system in the presence of 35S-methionine to produce the peptide of interest. The products can then be analyzed by gel electrophoresis and the 35S-labeled peptide is visualized using autoradiography.[2]

Databases and repositories

Several repositories and databases have been created for both sORFs and micropeptides. A repository of small ORFs discovered by ribosome profiling can be found at sORFs.org.[15][16] A repository of putative sORF-encoded peptides in Arabidopsis thaliana can be found at ARA-PEPs.[17][18] A database of small proteins, especially those encoded by non-coding RNAs, can be found at SmProt.[19][20]

Prokaryotic examples

To date, most micropeptides have been identified in prokaryotic organisms. While most have yet to be fully characterized, of those that have been studied, many appear to be critical to the survival of these organisms. Because of their small size, prokaryotes are particularly susceptible to changes in their environment, and as such have developed methods to ensure their existence.

Escherichia coli (E. coli)

Micropeptides expressed in E. coli exemplify bacterial environmental adaptations. Most of these have been classified into three groups: leader peptides, ribosomal proteins, and toxic proteins. Leader peptides regulate transcription and/or translation of proteins involved in amino acid metabolism when amino acids are scarce. Ribosomal proteins include L36 (rpmJ) and L34 (rpmH), two components of the 50S ribosomal subunit. Toxic proteins, such as ldrD, can kill cells or inhibit growth when present at high levels, reducing the host cell's viability.[21]

Salmonella enterica (S. enterica)

In S. enterica, the MgtC virulence factor is involved in adaptation to low-magnesium environments. The hydrophobic peptide MgtR binds to MgtC, causing its degradation by the FtsH protease.[9]

Bacillus subtilis (B. subtilis)

The 46 aa Sda micropeptide, expressed by B. subtilis, represses sporulation when replication initiation is impaired. By inhibiting the histidine kinase KinA, Sda prevents the activation of the transcription factor Spo0A, which is required for sporulation.[10]

Staphylococcus aureus (S. aureus)

In S. aureus, a group of micropeptides, 20-22 aa in length, is excreted during host infection to disrupt neutrophil membranes, causing cell lysis. These micropeptides allow the bacterium to avoid degradation by the human immune system's main defenses.[22][23]

Eukaryotic examples

Micropeptides have been discovered in eukaryotic organisms from Arabidopsis thaliana to humans. They play diverse roles in tissue and organ development, as well as maintenance and function once fully developed. While many are yet to be functionally characterized, and likely more remain to be discovered, below is a summary of recently identified eukaryotic micropeptide functions.

Arabidopsis thaliana (A. thaliana)

The POLARIS (PLS) gene encodes a 36 aa micropeptide. It is necessary for proper vascular leaf patterning and cell expansion in the root. This micropeptide interacts with developmental PIN proteins to form a critical network for hormonal crosstalk between auxin, ethylene, and cytokinin.[24][25][26]

ROTUNDIFOLIA4 (ROT4) in A. thaliana encodes a 53 aa peptide, which localizes to the plasma membrane of leaf cells. The mechanism of ROT4 function is not well understood, but mutants have short, rounded leaves, indicating that this peptide may be important in leaf morphogenesis.[27]

Zea mays (Z. mays)

Brick1 (Brk1) encodes a 76 aa micropeptide, which is highly conserved in both plants and animals. In Z. mays, it was found to be involved in morphogenesis of leaf epithelia by promoting multiple actin-dependent cell polarization events in the developing leaf epidermis.[28] Zm401p10 is an 89 aa micropeptide, which plays a role in normal pollen development in the tapetum. After mitosis, it is also essential for the degeneration of the tapetum.[29] Zm908p11 is a micropeptide 97 aa in length, encoded by the Zm908 gene, which is expressed in mature pollen grains. It localizes to the cytoplasm of pollen tubes, where it aids in their growth and development.[30]

Drosophila melanogaster (D. melanogaster)

The evolutionarily conserved polished rice (pri) gene, known as tarsal-less (tal) in D. melanogaster, is involved in epidermal differentiation. This polycistronic transcript encodes four similar peptides, which range from 11 to 32 aa in length. They function to truncate the transcription factor Shavenbaby (Svb). This converts Svb into an activator that directly regulates the expression of target effectors, including miniature (m) and shavenoid (sha), which are together responsible for trichome formation.[31]

Danio rerio (D. rerio)

The Elabela gene (Ela) (a.k.a. Apela, Toddler) is important for embryogenesis.[32] It is specifically expressed during late blastula and gastrula stages. During gastrulation, it is critical in promoting the internalization and animal-pole-directed movement of mesendodermal cells. After gastrulation, Ela is expressed in the lateral mesoderm, the endoderm, and the anterior and posterior notochord. Although it was annotated as a lncRNA in zebrafish, mouse, and human, the 58-aa ORF was found to be highly conserved among vertebrate species. Ela is processed by removal of its N-terminal signal peptide and then secreted into the extracellular space. Its 34-aa mature peptide serves as the first endogenous ligand to a GPCR known as the Apelin Receptor.[32][33] The genetic inactivation of Ela or Aplnr in zebrafish results in heartless phenotypes.[34][35]

Mus musculus (M. musculus)

Myoregulin (Mln) is encoded by a gene originally annotated as a lncRNA. Mln is expressed in all three types of skeletal muscle, and works similarly to the micropeptides phospholamban (Pln) in cardiac muscle and sarcolipin (Sln) in slow (Type I) skeletal muscle. These micropeptides interact with sarcoplasmic reticulum Ca2+-ATPase (SERCA), a membrane pump responsible for regulating Ca2+ uptake into the sarcoplasmic reticulum (SR). By inhibiting Ca2+ uptake into the SR, they delay muscle relaxation. Similarly, the endoregulin (ELN) and another-regulin (ALN) genes code for transmembrane micropeptides that contain the SERCA binding motif and are conserved in mammals.[7]

Myomixer (Mymx), encoded by the gene Gm7325, is a muscle-specific peptide 84 aa in length that plays a role during embryogenesis in myoblast fusion and skeletal muscle formation. It localizes to the plasma membrane, associating with a fusogenic membrane protein, Myomaker (Mymk). In humans, the gene encoding Mymx is annotated as uncharacterized LOC101929726. Orthologs are found in the turtle, frog, and fish genomes as well.[8]

Homo sapiens (H. sapiens)

In humans, NoBody (non-annotated P-body dissociating polypeptide), a 68 aa micropeptide, was discovered in the long intervening noncoding RNA (lincRNA) LINC01420. It has high sequence conservation among mammals, and localizes to P-bodies. It enriches proteins associated with 5’ mRNA decapping. It is thought to interact directly with Enhancer of mRNA Decapping 4 (EDC4).[36]

ELABELA (ELA) (a.k.a. APELA) is an endogenous hormone that is secreted as a 32 amino acid micropeptide by human embryonic stem cells.[32] It is essential to maintain the self-renewal and pluripotency of human embryonic stem cells. It signals in an autocrine fashion through the PI3K/AKT pathway via an as yet unidentified cell surface receptor.[37] In differentiating mesendodermal cells, ELA binds to, and signals via, APLNR, a GPCR which can also respond to the hormonal peptide APLN.

The C7orf49 gene, conserved in mammals, is predicted to produce three micropeptides through alternative splicing. MRI-1 was previously found to be a modulator of retrovirus infection. The second predicted micropeptide, MRI-2, may be important in non-homologous end joining (NHEJ) of DNA double-strand breaks. In co-immunoprecipitation experiments, MRI-2 bound to Ku70 and Ku80, two subunits of Ku, which play a major role in the NHEJ pathway.[38]

The 24 amino acid micropeptide, Humanin (HN), interacts with the apoptosis-inducing protein Bcl2-associated X protein (Bax). In its active state, Bax undergoes a conformational change which exposes membrane-targeting domains. This causes it to move from the cytosol to the mitochondrial membrane, where it inserts and releases apoptogenic proteins such as cytochrome c. By interacting with Bax, HN prevents Bax targeting of the mitochondria, thereby blocking apoptosis.[39]

A micropeptide of 90 aa, ‘Small Regulatory Polypeptide of Amino Acid Response’ or SPAAR, was found to be encoded in the lncRNA LINC00961. It is conserved between human and mouse, and localizes to the late endosome/lysosome. SPAAR interacts with four subunits of the v-ATPase complex, inhibiting mTORC1 translocation to the lysosomal surface where it is activated. Down-regulation of this micropeptide enables mTORC1 activation by amino acid stimulation, promoting muscle regeneration.[40]

References

  1. "Little things make big things happen: A summary of micropeptide encoding genes". EuPA Open Proteomics 3: 128–137. 2014. doi:10.1016/j.euprot.2014.02.006. 
  2. "Mining for Micropeptides". Trends in Cell Biology 27 (9): 685–696. September 2017. doi:10.1016/j.tcb.2017.04.006. PMID 28528987. 
  3. "Detailed analysis of putative genes encoding small proteins in legume genomes". Frontiers in Plant Science 4: 208. 2013. doi:10.3389/fpls.2013.00208. PMID 23802007. 
  4. "Lilliputians get into the limelight: novel class of small peptide genes in morphogenesis". Development, Growth & Differentiation 50 Suppl 1: S269–76. June 2008. doi:10.1111/j.1440-169x.2008.00994.x. PMID 18459982. 
  5. "Protein-length distributions for the three domains of life". Trends in Genetics 16 (3): 107–9. March 2000. doi:10.1016/s0168-9525(99)01922-8. PMID 10689349. 
  6. "Short Open Reading Frames and Their Encoded Peptides". Proteomics 18 (10): e1700035. May 2018. doi:10.1002/pmic.201700035. PMID 29691985. 
  7. "A micropeptide encoded by a putative long noncoding RNA regulates muscle performance". Cell 160 (4): 595–606. February 2015. doi:10.1016/j.cell.2015.01.009. PMID 25640239. 
  8. "Control of muscle formation by the fusogenic micropeptide myomixer". Science 356 (6335): 323–327. April 2017. doi:10.1126/science.aam9361. PMID 28386024. Bibcode: 2017Sci...356..323B. 
  9. "Peptide-assisted degradation of the Salmonella MgtC virulence factor". The EMBO Journal 27 (3): 546–57. February 2008. doi:10.1038/sj.emboj.7601983. PMID 18200043. 
  10. "Replication initiation proteins regulate a developmental checkpoint in Bacillus subtilis". Cell 104 (2): 269–79. January 2001. doi:10.1016/s0092-8674(01)00211-2. PMID 11207367. 
  11. "Emerging evidence for functional peptides encoded by short open reading frames". Nature Reviews. Genetics 15 (3): 193–204. March 2014. doi:10.1038/nrg3520. PMID 24514441. 
  12. "Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation". The EMBO Journal 33 (9): 981–93. May 2014. doi:10.1002/embj.201488411. PMID 24705786. 
  13. "Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes". Cell Reports 8 (5): 1365–79. September 2014. doi:10.1016/j.celrep.2014.07.045. PMID 25159147. 
  14. "Ribosome Profiling as a Tool to Decipher Viral Complexity". Annual Review of Virology 2 (1): 335–49. November 2015. doi:10.1146/annurev-virology-100114-054854. PMID 26958919. 
  15. "sORFs.org: repository of small ORFs identified by ribosome profiling" (in en). http://sorfs.org/. 
  16. "sORFs.org: a repository of small ORFs identified by ribosome profiling". Nucleic Acids Research 44 (D1): D324–9. January 2016. doi:10.1093/nar/gkv1175. PMID 26527729. 
  17. "ARA-PEPs: A Repository of putative sORF-encoded peptides in Arabidopsis thaliana". https://www.biw.kuleuven.be/CSB/ARA-PEPs/. 
  18. "ARA-PEPs: a repository of putative sORF-encoded peptides in Arabidopsis thaliana". BMC Bioinformatics 18 (1): 37. January 2017. doi:10.1186/s12859-016-1458-y. PMID 28095775. 
  19. "SmProt: a database of small proteins encoded by annotated coding and non-coding RNA loci". http://bioinfo.ibp.ac.cn/SmProt/. 
  20. "SmProt: a database of small proteins encoded by annotated coding and non-coding RNA loci". Briefings in Bioinformatics 19 (4): 636–643. July 2018. doi:10.1093/bib/bbx005. PMID 28137767. 
  21. "Small membrane proteins found by comparative genomics and ribosome binding site models". Molecular Microbiology 70 (6): 1487–501. December 2008. doi:10.1111/j.1365-2958.2008.06495.x. PMID 19121005. 
  22. "Identification of novel cytolytic peptides as key virulence determinants for community-associated MRSA". Nature Medicine 13 (12): 1510–4. December 2007. doi:10.1038/nm1656. PMID 17994102. 
  23. "Small stress response proteins in Escherichia coli: proteins missed by classical proteomic studies". Journal of Bacteriology 192 (1): 46–58. January 2010. doi:10.1128/jb.00872-09. PMID 19734316. 
  24. "The POLARIS gene of Arabidopsis encodes a predicted peptide required for correct root growth and leaf vascular patterning". The Plant Cell 14 (8): 1705–21. August 2002. doi:10.1105/tpc.002618. PMID 12172017. 
  25. "The POLARIS peptide of Arabidopsis regulates auxin transport and root growth via effects on ethylene signaling". The Plant Cell 18 (11): 3058–72. November 2006. doi:10.1105/tpc.106.040790. PMID 17138700. 
  26. "Interaction of PLS and PIN and hormonal crosstalk in Arabidopsis root development". Frontiers in Plant Science 4: 75. 2013. doi:10.3389/fpls.2013.00075. PMID 23577016. 
  27. "Overexpression of a novel small peptide ROTUNDIFOLIA4 decreases cell proliferation and alters leaf shape in Arabidopsis thaliana". The Plant Journal 38 (4): 699–713. May 2004. doi:10.1111/j.1365-313x.2004.02078.x. PMID 15125775. 
  28. "A small, novel protein highly conserved in plants and animals promotes the polarized growth and division of maize leaf epidermal cells". Current Biology 12 (10): 849–53. May 2002. doi:10.1016/s0960-9822(02)00819-9. PMID 12015123. 
  29. "Zm401p10, encoded by an anther-specific gene with short open reading frames, is essential for tapetum degeneration and anther development in maize". Functional Plant Biology 36 (1): 73–85. 2009. doi:10.1071/fp08154. PMID 32688629. 
  30. "Zm908p11, encoded by a short open reading frame (sORF) gene, functions in pollen tube growth as a profilin ligand in maize". Journal of Experimental Botany 64 (8): 2359–72. May 2013. doi:10.1093/jxb/ert093. PMID 23676884. 
  31. "Small peptides switch the transcriptional activity of Shavenbaby during Drosophila embryogenesis". Science 329 (5989): 336–9. July 2010. doi:10.1126/science.1188158. PMID 20647469. Bibcode: 2010Sci...329..336K. 
  32. "ELABELA: a hormone essential for heart development signals via the apelin receptor". Developmental Cell 27 (6): 672–80. December 2013. doi:10.1016/j.devcel.2013.11.002. PMID 24316148. 
  33. "Toddler: an embryonic signal that promotes cell movement via Apelin receptors". Science 343 (6172): 1248636. February 2014. doi:10.1126/science.1248636. PMID 24407481. 
  34. Deshwar, Ashish R; Chng, Serene C; Ho, Lena; Reversade, Bruno; Scott, Ian C (2016-04-14). Robertson, Elizabeth. ed. "The Apelin receptor enhances Nodal/TGFβ signaling to ensure proper cardiac development". eLife 5: e13758. doi:10.7554/eLife.13758. ISSN 2050-084X. PMID 27077952. 
  35. Scott, Ian C.; Masri, Bernard; D'Amico, Leonard A.; Jin, Suk-Won; Jungblut, Benno; Wehman, Ann M.; Baier, Herwig; Audigier, Yves et al. (March 2007). "The g protein-coupled receptor agtrl1b regulates early development of myocardial progenitors". Developmental Cell 12 (3): 403–413. doi:10.1016/j.devcel.2007.01.012. ISSN 1534-5807. PMID 17336906. 
  36. "A human microprotein that interacts with the mRNA decapping complex". Nature Chemical Biology 13 (2): 174–180. February 2017. doi:10.1038/nchembio.2249. PMID 27918561. 
  37. Ho, Lena; Tan, Shawn Y. X.; Wee, Sheena; Wu, Yixuan; Tan, Sam J. C.; Ramakrishna, Navin B.; Chng, Serene C.; Nama, Srikanth et al. (2015-10-01). "ELABELA Is an Endogenous Growth Factor that Sustains hESC Self-Renewal via the PI3K/AKT Pathway". Cell Stem Cell 17 (4): 435–447. doi:10.1016/j.stem.2015.08.010. ISSN 1875-9777. PMID 26387754. 
  38. "A human short open reading frame (sORF)-encoded polypeptide that stimulates DNA end joining". The Journal of Biological Chemistry 289 (16): 10950–7. April 2014. doi:10.1074/jbc.c113.533968. PMID 24610814. 
  39. "Humanin peptide suppresses apoptosis by interfering with Bax activation". Nature 423 (6938): 456–61. May 2003. doi:10.1038/nature01627. PMID 12732850. Bibcode: 2003Natur.423..456G. 
  40. "mTORC1 and muscle regeneration are regulated by the LINC00961-encoded SPAR polypeptide". Nature 541 (7636): 228–232. January 2017. doi:10.1038/nature21034. PMID 28024296. Bibcode: 2017Natur.541..228M. 
