Protein and peptide sequences are usually represented as strings of amino acids written in the well-known one-letter code endorsed by the IUPAC. However, there is still no clear consensus on how to represent ‘proteoforms’ and ‘peptidoforms’, that is, all possible variations of a protein or peptide sequence, including modifications, both artefactual and post-translational (PTMs). There are multiple ways of encoding mass modifications, and extended discussion has taken place to reach a consensus. The community therefore needs a standard notation for proteoforms and peptidoforms that can be embedded in many relevant PSI (and potentially other) file formats.
The PSI has developed a format called PEFF (PSI Extended FASTA Format) that can be used to represent proteoforms. Additionally, the Consortium for Top Down Proteomics (CTDP) developed a notation called ProForma v1, which also aims to represent proteoforms.
This format specification represents the consensus for the standard representation of proteoforms and peptidoforms. This notation aims to support the main proteomics approaches, including bottom-up (focused on peptides/peptidoforms) and top-down (focused on proteins/proteoforms) approaches.
The ProForma notation is a string of characters that linearly represents one or more peptidoform/proteoform primary structures, with the possibility of linking peptidic chains together. It is not meant to represent secondary or tertiary structures.
Canonical IUPAC amino acids
EMEVEESPEK
EM[Oxidation]EVEES[UNIMOD:21]PEK
EM[L-methionine sulfoxide]EVEES[MOD:00046]PEK
EM[R:L-methionine (R)-sulfoxide]EVEES[RESID:AA0037]PEK
Cross-linkers using the XL-MOD ontology
EMEVTK[XLMOD:02001#XL1]SESPEK[#XL1]
EVTSEKC[L-cystine (cross-link)#XL1]LEMSC[#XL1]EFD
Glycans using the Glycan Naming Ontology (GNO)
YPVLN[GNO:G62765YT]VTMPN[GNO:G02815KT]NSNGKFDK
EM[+15.9949]EVEES[-79.9663]PEK
RTAAX[+367.0537]WT
Elemental formulas and Glycan compositions
SEQUEN[Formula:C12H20O2]CE
SEQUEN[Glycan:HexNAc1Hex2]CE
[iTRAQ4plex]-EMEVNESPEK-[Methyl]
{Glycan:Hex}EMEVNESPEK
[Phospho]?EMEVTSESPEK
EMEVT[#g1]S[#g1]ES[Phospho#g1]PEK
PROT(EOSFORMS)[+19.0523]ISK
<13C>ATPEILTVNSIGQLK
<[S-carboxamidomethyl-L-cysteine]@C>ATPEILTCNSIGCLK
ELV[info:AnyString]IS
ELV[+11.9784|info:suspected frobinylation]IS
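To illustrate how the simplest of the examples above could be read by software, here is a minimal Python sketch. It is not a reference parser: it assumes that only residue-level square-bracket tags are present and deliberately rejects terminal, labile, range, ambiguity and global modifications. The function names parse_simple_proforma, stripped_sequence and total_mass_delta are hypothetical and are not taken from any existing ProForma library.

```python
import re

# Illustrative assumption: one uppercase residue letter followed by zero or
# more [tag] groups, e.g. EM[Oxidation]EVEES[UNIMOD:21]PEK.
_TOKEN = re.compile(r"([A-Z])((?:\[[^\[\]]*\])*)")
_TAG = re.compile(r"\[([^\[\]]*)\]")
_MASS = re.compile(r"[+-]\d+(?:\.\d+)?")


def parse_simple_proforma(text):
    """Return a list of (residue, [tags]) pairs for a simple ProForma string.

    Raises ValueError on any syntax this sketch does not cover
    (terminal tags, {labile} tags, <global> tags, (ranges), '?' prefixes).
    """
    residues = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at position {pos}: {text[pos:]!r}")
        residues.append((match.group(1), _TAG.findall(match.group(2))))
        pos = match.end()
    return residues


def stripped_sequence(residues):
    """Rebuild the unmodified amino acid sequence."""
    return "".join(residue for residue, _ in residues)


def total_mass_delta(residues):
    """Sum every tag part that is a signed numeric mass shift such as +15.9949.

    Tag parts separated by '|' (for example an info: note) are checked
    individually; named modifications and accessions are ignored.
    """
    total = 0.0
    for _, tags in residues:
        for tag in tags:
            for part in tag.split("|"):
                if _MASS.fullmatch(part):
                    total += float(part)
    return total


if __name__ == "__main__":
    parsed = parse_simple_proforma("EM[Oxidation]EVEES[UNIMOD:21]PEK")
    print(stripped_sequence(parsed))
    print(total_mass_delta(parse_simple_proforma("EM[+15.9949]EVEES[-79.9663]PEK")))
```

Keeping the tags as uninterpreted strings is a deliberate simplification: resolving names and accessions such as Oxidation or UNIMOD:21 would require the corresponding controlled vocabularies, which this sketch does not attempt.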