MPEP 2423.01
Format and Symbols To Be Used in a "Sequence Listing"

Ninth Edition of the MPEP, Revision 07.2022, Last Revised in February 2023


2423.01    Format and Symbols To Be Used in a "Sequence Listing" [R-07.2022]

[Editor Note: This section is not applicable to applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). See MPEP §§ 2412-2419 for guidance on WIPO ST.26 requirements for applications filed on or after July 1, 2022.]

37 CFR 1.822 sets forth the format and symbols to be used for listing nucleotide and/or amino acid sequence data. The symbols for representing the nucleotide and/or amino acid characters in the sequences are set forth in Appendices A and C to Subpart G of Part 1 of the CFR. See MPEP § 2422(I). No other symbols shall be used in nucleotide and amino acid sequences.

The "modified base" and "modified and unusual amino acid" symbols appearing in Appendices B and D to Subpart G of Part 1 of the CFR (see 37 CFR 1.822 and MPEP § 2422(I)) are not to be set forth in the sequences recited in the "Sequence Listing". However, "modified base" or "modified and unusual amino acid" symbols may be used in the written description and/or drawing portions of the specification.

To properly enter notations for modified bases or amino acids in the "Sequence Listing", the Feature section of the "Sequence Listing" should be used. That is, a modified base or amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or amino acid is one of those listed in Appendices B and D to Subpart G of Part 1 of the CFR and the modification is also set forth in the Feature section of the "Sequence Listing". Otherwise, all nucleotide bases or amino acids not appearing in Appendices A and C to Subpart G of Part 1 of the CFR must be listed in a given sequence as "n" or "Xaa," respectively, with further information given in the Feature section of the "Sequence Listing" by including one or more feature keys listed in Appendices E and F to Subpart G of Part 1 of the CFR. See 37 CFR 1.822(b).
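The substitution rule described above for nucleotide bases can be illustrated with a minimal sketch. It is not an official tool. The symbol sets are assumptions: `UNMODIFIED_BASES` is assumed to mirror Appendix A, `MODIFIED_TO_UNMODIFIED` is a hypothetical two-entry excerpt standing in for Appendix B, and the function name and the shape of the feature notes are invented for illustration. The sketch always presents a listed modified base as its unmodified counterpart, which the rule permits but does not require.

```python
# Illustrative excerpts only (assumptions); consult Appendices A and B to
# Subpart G of 37 CFR Part 1 for the authoritative symbol tables.
UNMODIFIED_BASES = set("acgturymkswbdhvn")
MODIFIED_TO_UNMODIFIED = {"m5c": "c", "gm": "g"}  # hypothetical excerpt


def encode_sequence(tokens):
    """Return (sequence_string, feature_notes) for a list of base tokens."""
    symbols, features = [], []
    for position, token in enumerate(tokens, start=1):
        if token in UNMODIFIED_BASES:
            # Permitted symbol: recite it in the sequence unchanged.
            symbols.append(token)
        elif token in MODIFIED_TO_UNMODIFIED:
            # Listed modified base: recite the unmodified base and record
            # the modification for the Feature section.
            symbols.append(MODIFIED_TO_UNMODIFIED[token])
            features.append({"position": position, "note": token})
        else:
            # Anything else: recite "n" and explain in the Feature section.
            symbols.append("n")
            features.append({"position": position, "note": token})
    return "".join(symbols), features
```

Under these assumptions, `encode_sequence(["a", "m5c", "zz", "g"])` yields the sequence string `acng` together with two notes, one for position 2 and one for position 3, each of which would still need an appropriate feature key from Appendix E.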

37 CFR 1.822(b) and 37 CFR 1.822(d) require the use of three-letter symbols for amino acids in the "Sequence Listing". Each three-letter symbol must be presented with the first character in upper case and the remaining two characters in lower case. Applicants are encouraged to use the three-letter symbols for amino acids throughout the disclosure, instead of the one-letter symbols, for easier reading of the application and any patent issuing therefrom.
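The capitalization requirement for three-letter symbols can be checked mechanically. The following is a minimal sketch, not an official validator. The one-letter to three-letter mapping is an assumption based on common usage and should be verified against Appendix C to Subpart G of Part 1 of the CFR; the function names are hypothetical.

```python
import re

# Assumed mapping (common one-letter to three-letter usage); the
# authoritative list is Appendix C to Subpart G of 37 CFR Part 1.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx", "X": "Xaa",
}

# One upper-case letter followed by exactly two lower-case letters.
THREE_LETTER_CASE = re.compile(r"[A-Z][a-z]{2}")


def has_required_case(symbol: str) -> bool:
    """Check only the casing pattern, not membership in Appendix C."""
    return THREE_LETTER_CASE.fullmatch(symbol) is not None


def to_three_letter(one_letter_seq: str) -> list[str]:
    """Convert one-letter codes to three-letter symbols.

    Raises KeyError for a character that is not in the assumed mapping.
    """
    return [ONE_TO_THREE[ch] for ch in one_letter_seq.upper()]
```

For example, `has_required_case("Ala")` is true while `has_required_case("ALA")` and `has_required_case("ala")` are false, and `to_three_letter("MKV")` returns `["Met", "Lys", "Val"]`.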

37 CFR 1.822(c) through (e) set forth the format for presenting sequence data, specifying the manner in which the characters in sequences are to be grouped, spaced, presented, and numbered.