MPEP 2412.05(d)
Representation and Symbols of Amino Acid Sequence Data

Ninth Edition of the MPEP, Revision 07.2022, Last Revised in February 2023

2412.05(d)    Representation and Symbols of Amino Acid Sequence Data [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 C.F.R. 1.832  Representation of nucleotide and/or amino acid sequence data in the "Sequence Listing XML" part of a patent application filed on or after July 1, 2022.

  • *****

  • (c) The representation and symbols for amino acid sequence data shall conform to the requirements of paragraphs (c)(1) through (4) of this section.
    • (1) The amino acids in an amino acid sequence must be represented in the manner described in paragraphs 24 and 25 of WIPO Standard ST.26.
    • (2) All amino acids, including modified amino acids and "unknown" amino acids, within an amino acid sequence must be represented using the symbols set forth in paragraphs 26–29 and 32 of WIPO Standard ST.26.
    • (3) Modified amino acids within an amino acid sequence must be described in the manner discussed in paragraphs 29 and 30 of WIPO Standard ST.26.
    • (4) A region containing a known number of contiguous "X" residues for which the same description applies may be jointly described in the manner described in paragraph 34 of WIPO Standard ST.26.
  • *****

I.    REPRESENTATION OF AN AMINO ACID SEQUENCE

WIPO Standard ST.26, paragraph 24, specifies that the amino acids in an amino acid sequence must be represented in the amino to carboxy direction from left to right. The amino and carboxy groups must not be represented in the sequence.

WIPO Standard ST.26, paragraph 25, indicates that the first amino acid in the sequence is residue position number 1, including amino acids preceding the mature protein, for example, pre-sequences, pro-sequences, pre-pro-sequences and signal sequences. When an amino acid sequence is circular in configuration and the ring consists solely of amino acid residues linked by peptide bonds, i.e., the sequence has no amino and carboxy termini, applicant must choose the amino acid in residue position number 1. Numbering is continuous through the entire sequence in the amino to carboxy direction.

II.    SYMBOLS FOR AN AMINO ACID SEQUENCE

WIPO Standard ST.26, paragraph 26, specifies that all amino acids in a sequence must be represented using the symbols set forth in Table 3: List of Amino Acids Symbols (reproduced in MPEP § 2412.03(a)). Only uppercase letters must be used. Any symbol used to represent an amino acid is the equivalent of only one residue.

WIPO Standard ST.26, paragraph 27, indicates that where an ambiguity symbol (representing two or more amino acids in the alternative) is appropriate, the most restrictive symbol should be used, as listed in Table 3: List of Amino Acids Symbols (MPEP § 2412.03(a)). For example, if an amino acid in a given position could be aspartic acid or asparagine, the symbol "B" should be used, rather than "X". The symbol "X" will be construed as any one of "A", "R", "N", "D", "C", "Q", "E", "G", "H", "I", "L", "K", "M", "F", "P", "O", "S", "U", "T", "W", "Y", or "V", except where it is used with a further description in the feature table. The symbol "X" must not be used to represent anything other than an amino acid. A single modified or "unknown" amino acid may be represented by the symbol "X", together with a further description in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a "feature table"). For representation and inclusion of sequence variants, see MPEP § 2412.05(c). For details of how to represent variants in a "Sequence Listing XML," see MPEP § 2413.01(g), subsection XII.
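The "most restrictive symbol" rule of paragraph 27 can be illustrated with a minimal sketch. The function name and data layout below are hypothetical. The "B" mapping and the 22 residues covered by "X" come from the text above; the "Z" and "J" mappings are an assumption about the contents of Table 3, which is not reproduced in this section and should be checked against MPEP § 2412.03(a).

```python
# The 22 residues that "X" is construed as when used without further description.
STANDARD = frozenset("ARNDCQEGHILKMFPOSUTWYV")

# Ambiguity symbols and the residues each stands for.
# "B" is stated in the text above; "Z" and "J" are assumed from Table 3.
AMBIGUITY = {
    "B": frozenset("DN"),  # aspartic acid or asparagine
    "Z": frozenset("EQ"),  # glutamic acid or glutamine (assumption)
    "J": frozenset("IL"),  # isoleucine or leucine (assumption)
}


def most_restrictive_symbol(candidates):
    """Return the narrowest one-letter symbol covering the candidate residues."""
    residues = frozenset(candidates)
    if not residues or not residues <= STANDARD:
        raise ValueError("candidates must be one or more uppercase residue symbols")
    if len(residues) == 1:
        # A single known residue needs no ambiguity symbol.
        return next(iter(residues))
    for symbol, members in AMBIGUITY.items():
        if residues == members:
            return symbol
    # No narrower symbol applies, so fall back to "X".
    return "X"
```

Under these assumptions, a position that could be aspartic acid or asparagine yields "B", while a position that could be aspartic acid or glutamic acid falls back to "X", which would then need a further description in the feature table.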

WIPO Standard ST.26, paragraph 28, specifies that disclosed amino acid sequences separated by internal terminator symbols, represented for example by "Ter" or asterisk "*" or period "." or a blank space, must be included as separate sequences for each amino acid sequence that contains at least four specifically defined amino acids and is encompassed by the description of sequences found in MPEP § 2412.05(a), referencing paragraph 7 of WIPO Standard ST.26. Each such separate sequence must be assigned its own sequence identifier. Terminator symbols and spaces must not be included in a "Sequence Listing XML". This means that the element INSDSeq_sequence must disclose the sequence using only the symbols appropriate for the sequence, as set forth in Table 1: List of Nucleotides Symbols and Table 3: List of Amino Acids Symbols (reproduced in MPEP § 2412.03(a)). The sequence must not include numbers, punctuation or whitespace characters (WIPO Standard ST.26, paragraph 57).
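As an illustration only, the splitting and symbol check described above might be sketched as follows. The function name is hypothetical. The sketch handles the asterisk, period and whitespace terminators but not "Ter", and it assumes that "specifically defined" means any amino acid symbol other than "X" and that the allowed amino acid symbols are the 22 residues listed above plus "B", "Z", "J" and "X"; both assumptions should be verified against MPEP § 2412.05(a) and Table 3.

```python
import re

# Assumed set of amino acid symbols permitted in INSDSeq_sequence.
ALLOWED = frozenset("ARNDCQEGHILKMFPOSUTWYVBZJX")

# Internal terminators handled by this sketch: "*", ".", and whitespace.
TERMINATORS = re.compile(r"[*.\s]+")


def split_at_terminators(raw):
    """Split a disclosed sequence at terminators and keep qualifying segments."""
    kept = []
    for segment in TERMINATORS.split(raw):
        if not segment:
            continue
        invalid = set(segment) - ALLOWED
        if invalid:
            # Lowercase letters, digits, and other punctuation are rejected.
            raise ValueError(f"invalid symbols in sequence: {sorted(invalid)}")
        # Assumption: every symbol except "X" counts as specifically defined.
        defined = sum(1 for symbol in segment if symbol != "X")
        if defined >= 4:
            kept.append(segment)
    return kept
```

Each string returned by this sketch would correspond to a separate sequence with its own sequence identifier; segments with fewer than four specifically defined residues are dropped from the result.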

WIPO Standard ST.26, paragraph 29, specifies that modified amino acids, including D-amino acids, should be represented in the sequence as the corresponding unmodified amino acids whenever possible. Any modified amino acid in a sequence that cannot otherwise be represented by any other symbol in Table 3: List of Amino Acids Symbols (reproduced in MPEP § 2412.03(a)), i.e., an "other" amino acid, must be represented by "X". The symbol "X" is the equivalent of only one residue.

Any "unknown" amino acid must be represented by the symbol "X" in the sequence. An "unknown" amino acid designated as "X" must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a "feature table") using the feature key "UNSURE" and optionally the qualifier "note". The symbol "X" is the equivalent of only one residue (WIPO Standard ST.26, paragraph 32).
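A feature describing an "unknown" residue could be assembled roughly as in the sketch below, which uses Python's standard xml.etree.ElementTree module. Only INSDFeature_location is named in this section; the other element names (INSDFeature, INSDFeature_key, INSDFeature_quals, INSDQualifier, INSDQualifier_name, INSDQualifier_value) are assumptions about the ST.26 structure and should be checked against MPEP § 2413.01(g). The function name and the note text are hypothetical.

```python
import xml.etree.ElementTree as ET


def unsure_feature(position, note=None):
    """Build a feature element marking one residue position as UNSURE."""
    feature = ET.Element("INSDFeature")  # assumed element name
    ET.SubElement(feature, "INSDFeature_key").text = "UNSURE"
    ET.SubElement(feature, "INSDFeature_location").text = str(position)
    if note is not None:
        # The "note" qualifier is optional for the UNSURE feature key.
        quals = ET.SubElement(feature, "INSDFeature_quals")
        qualifier = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qualifier, "INSDQualifier_name").text = "note"
        ET.SubElement(qualifier, "INSDQualifier_value").text = note
    return feature


# Hypothetical example: the "X" at residue position 7 is an unknown amino acid.
element = unsure_feature(7, "identity of residue not determined")
xml_text = ET.tostring(element, encoding="unicode")
```

The sketch produces a fragment only; a complete "Sequence Listing XML" has further required elements that are outside the scope of this illustration.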

III.    DESCRIPTION OF MODIFIED AMINO ACIDS WITHIN AN AMINO ACID SEQUENCE

WIPO Standard ST.26, paragraph 29, specifies that modified amino acids, including D-amino acids, should be represented in the sequence as the corresponding unmodified amino acids whenever possible. Any modified amino acid in a sequence that cannot otherwise be represented by any other symbol in Table 3: List of Amino Acids Symbols (reproduced in MPEP § 2412.03(a)), i.e., an "other" amino acid, must be represented by "X". The symbol "X" is the equivalent of only one residue.

WIPO Standard ST.26, paragraph 30, provides that a modified amino acid must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a "feature table"). Where applicable, the feature keys "CARBOHYD" or "LIPID" should be used together with the qualifier "note". The feature key "MOD_RES" should be used for other post-translationally modified amino acids together with the qualifier "note"; otherwise the feature key "SITE" together with the qualifier "note" should be used. The value for the qualifier "note" must either be an abbreviation set forth in Table 4: List of Modified Amino Acids (reproduced in MPEP § 2412.03(b)), above, or the complete, unabbreviated name of the modified amino acid. The abbreviations set forth in Table 4, or the complete, unabbreviated names must not be used in the sequence itself.
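The choice of feature key in paragraph 30 amounts to a short decision sequence, sketched below. The function name, the parameter names and the category strings are hypothetical and serve only to illustrate the order of the tests; whether a given modification counts as carbohydrate, lipid or post-translational remains a judgment for the applicant.

```python
def feature_key_for_modification(kind, post_translational):
    """Pick the feature key for a modified amino acid.

    kind: "carbohydrate", "lipid", or "other" (hypothetical categories).
    post_translational: True if the modification is post-translational.
    """
    if kind == "carbohydrate":
        return "CARBOHYD"
    if kind == "lipid":
        return "LIPID"
    if post_translational:
        # Other post-translationally modified amino acids.
        return "MOD_RES"
    # Any remaining modified amino acid.
    return "SITE"
```

In every case the chosen key is paired with the qualifier "note", whose value is either a Table 4 abbreviation or the complete, unabbreviated name of the modified amino acid.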

IV.    JOINTLY DESCRIBING A REGION OF AN AMINO ACID SEQUENCE

WIPO Standard ST.26, paragraph 34, provides that a region containing a known number of contiguous "X" residues for which the same description applies may be jointly described using the syntax "x..y" as the location descriptor in the element INSDFeature_location (see MPEP § 2413.01(g), subsection IV, for information regarding INSDFeature_location). For representation and inclusion of sequence variants, see MPEP § 2412.05(c). For details of how to represent variants in a "Sequence Listing XML," see MPEP § 2413.01(g), subsection XII.
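The "x..y" location syntax can be illustrated with a minimal sketch that finds runs of contiguous "X" residues and reports their 1-based positions. The function name is hypothetical, and the sketch cannot determine whether the same description actually applies to every residue in a run; that condition must be confirmed separately before a region is described jointly.

```python
import re


def x_run_locations(sequence):
    """Return location descriptors for each run of contiguous "X" residues."""
    locations = []
    for match in re.finditer(r"X+", sequence):
        start = match.start() + 1  # residue numbering begins at position 1
        end = match.end()          # exclusive 0-based end equals inclusive 1-based end
        if start == end:
            locations.append(str(start))
        else:
            locations.append(f"{start}..{end}")
    return locations


# Positions 3 through 6 form one run; position 9 is a single "X".
example = x_run_locations("MKXXXXAYXG")
```

For the example sequence the sketch yields "3..6" for the four-residue run and "9" for the isolated residue.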