MPEP 2422
Nucleotide and/or Amino Acid Sequence Disclosures in Patent Applications

This is the Ninth Edition of the MPEP, Revision 08.2017, Last Revised in January 2018


2422    Nucleotide and/or Amino Acid Sequence Disclosures in Patent Applications [R-07.2015]

37 C.F.R. 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications.

  • (a) Nucleotide and/or amino acid sequences as used in §§ 1.821 through 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this section. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "n" defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2, herein incorporated by reference. (Hereinafter "WIPO Standard ST.25 (1998)"). This incorporation by reference was approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of WIPO Standard ST.25 (1998) may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies of ST.25 may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC. Nucleotides and amino acids are further defined as follows:
    • (1) Nucleotides: Nucleotides are intended to embrace only those nucleotides that can be represented using the symbols set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 1. Modifications, e.g., methylated bases, may be described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 2, but shall not be shown explicitly in the nucleotide sequence.
    • (2) Amino acids: Amino acids are those L-amino acids commonly found in naturally occurring proteins and are listed in WIPO Standard ST.25 (1998), Appendix 2, Table 3. Those amino acid sequences containing D-amino acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated using the symbols shown in WIPO Standard ST.25 (1998), Appendix 2, Table 3 with the modified positions; e.g., hydroxylations or glycosylations, being described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 4, but these modifications shall not be shown explicitly in the amino acid sequence. Any peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc., is embraced by this definition.
  • (b) Patent applications which contain disclosures of nucleotide and/or amino acid sequences, in accordance with the definition in paragraph (a) of this section, shall, with regard to the manner in which the nucleotide and/or amino acid sequences are presented and described, conform exclusively to the requirements of §§ 1.821 through 1.825.
  • (c) Patent applications which contain disclosures of nucleotide and/or amino acid sequences must contain, as a separate part of the disclosure, a paper copy disclosing the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of §§ 1.822 and 1.823. This paper copy is hereinafter referred to as the "Sequence Listing." Each sequence disclosed must appear separately in the "Sequence Listing." Each sequence set forth in the "Sequence Listing" shall be assigned a separate sequence identifier. The sequence identifiers shall begin with 1 and increase sequentially by integers. If no sequence is present for a sequence identifier, the code "000" shall be used in place of the sequence. The response for the numeric identifier <160> shall include the total number of SEQ ID NOs, whether followed by a sequence or by the code "000."
  • (d) Where the description or claims of a patent application discuss a sequence that is set forth in the "Sequence Listing" in accordance with paragraph (c) of this section, reference must be made to the sequence by use of the sequence identifier, preceded by "SEQ ID NO:" in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application.
  • (e) A copy of the "Sequence Listing" referred to in paragraph (c) of this section must also be submitted in computer readable form in accordance with the requirements of § 1.824. The computer readable form is a copy of the "Sequence Listing" and will not necessarily be retained as a part of the patent application file. If the computer readable form of a new application is to be identical with the computer readable form of another application of the applicant on file in the Patent and Trademark Office, reference may be made to the other application and computer readable form in lieu of filing a duplicate computer readable form in the new application if the computer readable form in the other application was compliant with all of the requirements of these rules. The new application shall be accompanied by a letter making such reference to the other application and computer readable form, both of which shall be completely identified. In the new application, applicant must also request the use of the compliant computer readable "Sequence Listing" that is already on file for the other application and must state that the paper copy of the "Sequence Listing" in the new application is identical to the computer readable copy filed for the other application.
  • (f) In addition to the paper copy required by paragraph (c) of this section and the computer readable form required by paragraph (e) of this section, a statement that the content of the paper and computer readable copies are the same must be submitted with the computer readable form, e.g., a statement that "the information recorded in computer readable form is identical to the written sequence listing."
  • (g) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage under 35 U.S.C. 371, applicant will be notified and given a period of time within which to comply with such requirements in order to prevent abandonment of the application. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission includes no new matter.
  • (h) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing an international application under the Patent Cooperation Treaty (PCT), which application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, applicant will be sent a notice necessitating compliance with the requirements within a prescribed time period. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission does not include matter which goes beyond the disclosure in the international application as filed. If applicant fails to timely provide the required computer readable form, the United States International Searching Authority shall search only to the extent that a meaningful search can be performed without the computer readable form and the United States International Preliminary Examining Authority shall examine only to the extent that a meaningful examination can be performed without the computer readable form.
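
The thresholds in paragraph (a) and the numbering rules in paragraph (c) are mechanical enough to express in a few lines. The Python sketch below is illustrative only: the function names are hypothetical, sequences are assumed to be plain strings of Table 1 symbols (or lists of Table 3 symbols), detecting branched molecules is left to the caller, and it reads paragraph (a) as requiring both the minimum length and at least four specifically defined residues, which is an interpretation rather than quoted rule text.

```python
from typing import Optional

NUCLEOTIDE_MIN_LENGTH = 10    # 1.821(a): ten or more nucleotides
AMINO_ACID_MIN_LENGTH = 4     # 1.821(a): four or more amino acids
MIN_SPECIFICALLY_DEFINED = 4  # 1.821(a): fewer than four defined residues are excluded


def nucleotide_is_covered(seq: str) -> bool:
    """Apply the 1.821(a) thresholds to an unbranched nucleotide string.

    'n' is not "specifically defined", so it counts toward length but not
    toward the four-defined-residue minimum.
    """
    seq = seq.lower()
    defined = sum(1 for base in seq if base != "n")
    return len(seq) >= NUCLEOTIDE_MIN_LENGTH and defined >= MIN_SPECIFICALLY_DEFINED


def protein_is_covered(residues: list[str]) -> bool:
    """Same test for an unbranched amino acid sequence given as three-letter symbols."""
    defined = sum(1 for r in residues if r != "Xaa")
    return len(residues) >= AMINO_ACID_MIN_LENGTH and defined >= MIN_SPECIFICALLY_DEFINED


def assign_seq_ids(sequences: list[Optional[str]]) -> list[tuple[int, str]]:
    """Number entries 1, 2, 3, ... per 1.821(c); a missing sequence becomes "000"."""
    return [(i, "000" if s is None else s) for i, s in enumerate(sequences, start=1)]


entries = assign_seq_ids(["acgtacgtac", None, "acgtnnnnnn"])
total_160 = len(entries)  # <160> counts every SEQ ID NO, including "000" entries
```

For example, a ten-base string containing seven "n" symbols meets the length threshold but has only three specifically defined bases, so the sketch treats it as excluded.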

I.    INCORPORATION BY REFERENCE OF WIPO ST.25 (1998) IN 37 CFR 1.821

37 CFR 1.821 incorporates by reference the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25 (1998), including Tables 1 through 6 of Appendix 2. Copies may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC 20408. These tables are reproduced below. The 1998 version of WIPO ST.25 is available online at www.wipo.int/standards/en/archives.html. Note that the standard was revised in December 2009, and the current version is available online at www.wipo.int/export/sites/www/standards/en/pdf/03-25-01.pdf.

WIPO Standard ST.25 (1998), Appendix 2, Table 1, provides that the bases of a nucleotide sequence should be represented using the following one-letter symbols for nucleotide sequence characters:

Symbol  Meaning                                Origin of designation
a       a                                      adenine
g       g                                      guanine
c       c                                      cytosine
t       t                                      thymine
u       u                                      uracil
r       g or a                                 purine
y       t/u or c                               pyrimidine
m       a or c                                 amino
k       g or t/u                               keto
s       g or c                                 strong interactions 3H-bonds
w       a or t/u                               weak interactions 2H-bonds
b       g or c or t/u                          not a
d       a or g or t/u                          not c
h       a or c or t/u                          not g
v       a or g or c                            not t, not u
n       a or g or c or t/u, unknown, or other  any
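
Because each symbol in Table 1 stands for a set of bases, ambiguity codes can be checked by set membership. The sketch below assumes thymine and uracil may be treated as interchangeable for matching, mirroring the table's "t/u" notation; the function names are hypothetical. It also reflects 37 CFR 1.821(a)(1), under which modified bases are not shown in the sequence itself, so a Table 2 symbol such as "i" is rejected.

```python
# Table 1 symbols mapped to the bases they stand for. Thymine and uracil are
# folded together as "t" for matching (an assumption mirroring "t/u").
_TABLE1 = {
    "a": "a", "g": "g", "c": "c", "t": "t", "u": "t",
    "r": "ga", "y": "tc", "m": "ac", "k": "gt", "s": "gc", "w": "at",
    "b": "gct", "d": "agt", "h": "act", "v": "agc", "n": "agct",
}
IUPAC_BASES = {symbol: frozenset(bases) for symbol, bases in _TABLE1.items()}


def _normalize(base: str) -> str:
    """Lower-case a base and treat 'u' as 't' for comparison."""
    base = base.lower()
    return "t" if base == "u" else base


def uses_table1_symbols(seq: str) -> bool:
    """True if every character is a Table 1 symbol; Table 2 codes such as 'i' fail."""
    return all(ch in IUPAC_BASES for ch in seq.lower())


def symbol_matches(symbol: str, base: str) -> bool:
    """Does a concrete base fall within a (possibly ambiguous) Table 1 symbol?"""
    return _normalize(base) in IUPAC_BASES[symbol.lower()]


def pattern_matches(pattern: str, concrete: str) -> bool:
    """Compare an ambiguous pattern with a concrete sequence, position by position."""
    return len(pattern) == len(concrete) and all(
        symbol_matches(p, b) for p, b in zip(pattern, concrete)
    )
```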

WIPO Standard ST.25 (1998), Appendix 2, Table 2, provides that modified bases may be represented as the corresponding unmodified bases in the sequence itself if the modification is further described in numeric identifier <223> of the Feature section of the sequence listing. The symbols from the list below may be used in the description (i.e., the specification and drawings, or in the Feature section of the sequence listing) but these symbols may not be used in the sequence itself. Modifications not listed in Table 2 may also be represented as the corresponding unmodified base in the sequence itself, and the modification should be described using its full chemical name in the Feature section of the sequence listing.

Symbol Meaning
ac4c  4-acetylcytidine 
chm5u  5-(carboxyhydroxymethyl)uridine 
cm  2'-O-methylcytidine 
cmnm5s2u  5-carboxymethylaminomethyl-2-thiouridine 
cmnm5u  5-carboxymethylaminomethyluridine 
d  dihydrouridine
fm  2'-O-methylpseudouridine
gal q  beta, D-galactosylqueuosine
gm  2'-O-methylguanosine
i  inosine
i6a  N6-isopentenyladenosine
m1a  1-methyladenosine
m1f  1-methylpseudouridine
m1g  1-methylguanosine
m1i  1-methylinosine
m22g  2,2-dimethylguanosine
m2a  2-methyladenosine
m2g  2-methylguanosine
m3c  3-methylcytidine
m5c  5-methylcytidine
m6a  N6-methyladenosine
m7g  7-methylguanosine
mam5u  5-methylaminomethyluridine
mam5s2u  5-methoxyaminomethyl-2-thiouridine
man q  beta, D-mannosylqueuosine
mcm5s2u  5-methoxycarbonylmethyl-2-thiouridine
mcm5u  5-methoxycarbonylmethyluridine
mo5u  5-methoxyuridine
ms2i6a  2-methylthio-N6-isopentenyladenosine
ms2t6a  N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine
mt6a  N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine
mv  uridine-5-oxyacetic acid-methylester
o5u  uridine-5-oxyacetic acid
osyw  wybutoxosine
p  pseudouridine
q  queuosine
s2c  2-thiocytidine
s2t  5-methyl-2-thiouridine
s2u  2-thiouridine
s4u  4-thiouridine
t  5-methyluridine
t6a  N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine
tm  2'-O-methyl-5-methyluridine
um  2'-O-methyluridine
yw  wybutosine
x  3-(3-amino-3-carboxy-propyl)uridine, (acp3)u
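
Table 2 symbols belong in the Feature section rather than in the sequence. The minimal sketch below uses a hypothetical helper and a deliberately partial map from a few Table 2 symbols to their unmodified parent bases. It checks that the sequence shows the unmodified base, then emits the corresponding <221>/<222>/<223> lines. The exact spacing and the "(start)..(end)" location layout used for numeric identifier <222> are assumptions made for illustration.

```python
# Parent (unmodified) base for a few Table 2 symbols; hypothetical and
# deliberately partial, extend from Table 2 as needed.
PARENT_BASE = {"m5c": "c", "m6a": "a", "m1a": "a", "m7g": "g", "m2g": "g", "m3c": "c"}


def modified_base_features(seq: str, modifications: dict[int, str]) -> list[str]:
    """Build feature lines for modified bases at 1-based positions.

    The sequence itself must already show the unmodified base; the Table 2
    symbol appears only in <223>.
    """
    lines: list[str] = []
    for pos in sorted(modifications):
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the sequence")
        symbol = modifications[pos]
        shown = seq[pos - 1].lower()
        expected = PARENT_BASE[symbol]
        if shown != expected:
            raise ValueError(
                f"position {pos} shows {shown!r}; {symbol} needs unmodified {expected!r}"
            )
        lines += [
            "<220>",
            "<221>  modified_base",
            f"<222>  ({pos})..({pos})",
            f"<223>  {symbol}",
        ]
    return lines
```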

WIPO Standard ST.25 (1998), Appendix 2, Table 3, provides that the amino acids should be represented using the following three-letter symbols with the first letter as a capital.

Symbol Meaning
Ala  Alanine 
Cys  Cysteine 
Asp  Aspartic Acid 
Glu  Glutamic Acid 
Phe  Phenylalanine 
Gly  Glycine 
His  Histidine 
Ile  Isoleucine 
Lys  Lysine 
Leu  Leucine 
Met  Methionine 
Asn  Asparagine 
Pro  Proline 
Gln  Glutamine 
Arg  Arginine 
Ser  Serine 
Thr  Threonine 
Val  Valine 
Trp  Tryptophan 
Tyr  Tyrosine 
Asx  Asp or Asn 
Glx  Glu or Gln 
Xaa  unknown or other
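
Sequence data are often stored in one-letter form, while Table 3 calls for three-letter symbols. The conversion sketch below assumes one-letter IUPAC input (the one-letter codes do not appear in Table 3 itself) and raises an error for any residue without a Table 3 symbol instead of guessing. The function name is hypothetical.

```python
# Conventional one-letter codes mapped to the three-letter symbols of Table 3.
# Only the three-letter side comes from Table 3; the one-letter input format
# is an assumption.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe", "G": "Gly",
    "H": "His", "I": "Ile", "K": "Lys", "L": "Leu", "M": "Met", "N": "Asn",
    "P": "Pro", "Q": "Gln", "R": "Arg", "S": "Ser", "T": "Thr", "V": "Val",
    "W": "Trp", "Y": "Tyr", "B": "Asx", "Z": "Glx", "X": "Xaa",
}


def to_three_letter(seq: str) -> list[str]:
    """Convert a one-letter protein string to Table 3 symbols, rejecting anything else."""
    out = []
    for i, code in enumerate(seq.upper(), start=1):
        if code not in ONE_TO_THREE:
            raise ValueError(f"residue {i} ({code!r}) has no Table 3 symbol")
        out.append(ONE_TO_THREE[code])
    return out
```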

WIPO Standard ST.25 (1998), Appendix 2, Table 4, provides that modified and unusual amino acids may be represented as the corresponding unmodified amino acids in the sequence itself if the modification is further described in numeric identifier <223> of the Feature section of the sequence listing. The symbols from the list below may be used in the description (i.e., the specification and drawings, or in the Feature section of the sequence listing) but these symbols may not be used in the sequence itself. Modifications not listed in Table 4 may also be represented as the corresponding unmodified amino acid in the sequence itself, and the modification should be described using its full chemical name in the Feature section of the sequence listing.

Symbol Meaning
Aad  2-Aminoadipic acid 
bAad  3-Aminoadipic acid 
bAla  beta-Alanine, beta-Aminopropionic acid
Abu  2-Aminobutyric acid 
4Abu  4-Aminobutyric acid, piperidinic acid 
Acp  6-Aminocaproic acid 
Ahe  2-Aminoheptanoic acid 
Aib  2-Aminoisobutyric acid 
bAib  3-Aminoisobutyric acid 
Apm  2-Aminopimelic acid 
Dbu  2,4-Diaminobutyric acid 
Des  Desmosine 
Dpm  2,2'-Diaminopimelic acid
Dpr  2,3-Diaminopropionic acid 
EtGly  N-Ethylglycine 
EtAsn  N-Ethylasparagine 
Hyl  Hydroxylysine 
aHyl  allo-Hydroxylysine 
3Hyp  3-Hydroxyproline 
4Hyp  4-Hydroxyproline 
Ide  Isodesmosine 
aIle  allo-Isoleucine 
MeGly  N-Methylglycine, sarcosine 
MeIle  N-Methylisoleucine 
MeLys  6-N-Methyllysine 
MeVal  N-Methylvaline 
Nva  Norvaline 
Nle  Norleucine 
Orn  Ornithine 

WIPO Standard ST.25 (1998), Appendix 2, Table 5, provides for feature keys related to DNA sequences.

Key Description
allele  a related individual or strain contains stable, alternative forms of the same gene which differ from the presented sequence at this location (and perhaps others)
attenuator  (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; (2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription 
C_region  constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain 
CAAT_signal  CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG (C or T) CAATCT
CDS  coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation 
conflict  independent determinations of the "same" sequence differ at this site or region 
D-loop  displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein 
D-segment  diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain 
enhancer  a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter 
exon  region of genome that codes for portion of spliced mRNA; may contain 5'UTR, all CDSs, and 3'UTR 
GC_signal  GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG 
gene  region of biological interest identified as a gene and for which a name has been assigned 
iDNA  intervening DNA; DNA which is eliminated through any of several kinds of recombination 
intron  a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it 
J_segment  joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains 
LTR  long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses 
mat_peptide  mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS)
misc_binding  site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind)
misc_difference  feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base)
misc_feature  region of biological interest which cannot be described by any other feature key; a new or rare feature 
misc_recomb  site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral)
misc_RNA  any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA)
misc_signal  any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin)
misc_structure  any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop)
modified_base  the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value)
mRNA  messenger RNA; includes 5' untranslated region (5'UTR), coding sequences (CDS, exon) and 3' untranslated region (3'UTR)
mutation  a related strain has an abrupt, inheritable change in the sequence at this location 
N_region  extra nucleotides inserted between rearranged immunoglobulin segments 
old_sequence  the presented sequence revises a previous version of the sequence at this location 
polyA_signal  recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA 
polyA_site  site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation 
precursor_RNA  any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)
prim_transcript  primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)
primer_bind  non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements 
promoter  region on a DNA molecule involved in RNA polymerase binding to initiate transcription 
protein_bind  non-covalent protein binding site on nucleic acid 
RBS  ribosome binding site 
repeat_region  region of genome containing repeating units 
repeat_unit  single repeat element 
rep_origin  origin of replication; starting site for duplication of nucleic acid to give two identical copies 
rRNA  mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins 
S_region  switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell 
satellite  many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA 
scRNA  small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote 
sig_peptide  signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence 
snRNA  small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions 
source  identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible
stem_loop  hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA 
STS  Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs 
TATA_signal  TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T)
terminator  sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein 
transit_peptide  transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle 
tRNA  mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence 
unsure  author is unsure of exact sequence in this region 
V_region  variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments 
V_segment  variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide 
variation  a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others)
3'clip  3'-most region of a precursor transcript that is clipped off during processing 
3'UTR  region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein 
5'clip  5'-most region of a precursor transcript that is clipped off during processing 
5'UTR  region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein 
-10_signal  Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT
-35_signal  a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa or TGTTGACA

WIPO Standard ST.25 (1998), Appendix 2, Table 6, provides for feature keys related to protein sequences.

Key Description
CONFLICT  different papers report differing sequences 
VARIANT  authors report that sequence variants exist 
VARSPLIC  description of sequence variants produced by alternative splicing 
MUTAGEN  site which has been experimentally altered 
MOD_RES  post-translational modification of a residue 
  ACETYLATION  N-terminal or other
  AMIDATION  generally at the C-terminal of a mature active peptide
  BLOCKED  undetermined N- or C-terminal blocking group
  FORMYLATION  of the N-terminal methionine
  GAMMA-CARBOXYGLUTAMIC ACID
  HYDROXYLATION  of asparagine, aspartic acid, proline or lysine
  METHYLATION  generally of lysine or arginine
  PHOSPHORYLATION  of serine, threonine, tyrosine, aspartic acid or histidine
  PYRROLIDONE CARBOXYLIC ACID  N-terminal glutamate which has formed an internal cyclic lactam
  SULFATATION  generally of tyrosine
LIPID  covalent binding of a lipidic moiety
  MYRISTATE  myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue
  PALMITATE  palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue
  FARNESYL  farnesyl group attached through a thioether bond to a cysteine residue
  GERANYL-GERANYL  geranyl-geranyl group attached through a thioether bond to a cysteine residue
  GPI-ANCHOR  glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein
  N-ACYL DIGLYCERIDE  N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages
DISULFID  disulfide bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the ‘FROM’ and ‘TO’ endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link 
THIOLEST  thiolester bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thiolester bond 
THIOETH  thioether bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thioether bond 
CARBOHYD  glycosylation site; the nature of the carbohydrate (if known) is given in the description field 
METAL  binding site for a metal ion; the description field indicates the nature of the metal 
BINDING  binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field 
SIGNAL  extent of a signal sequence (prepeptide)
TRANSIT  extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody)
PROPEP  extent of a propeptide 
CHAIN  extent of a polypeptide chain in the mature protein 
PEPTIDE  extent of a released active peptide 
DOMAIN  extent of a domain of interest on the sequence; the nature of that domain is given in the description field 
CA_BIND  extent of a calcium-binding region 
DNA_BIND  extent of a DNA-binding region 
NP_BIND  extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field 
TRANSMEM  extent of a transmembrane region 
ZN_FING  extent of a zinc finger region 
SIMILAR  extent of a similarity with another protein sequence; precise information, relative to that sequence is given in the description field 
REPEAT  extent of an internal sequence repetition 
HELIX  secondary structure: Helices, for example, Alpha-helix, 3(10) helix, or Pi-helix 
STRAND  secondary structure: Beta-strand, for example, Hydrogen bonded beta-strand, or Residue in an isolated beta-bridge
TURN  secondary structure: Turns, for example, H-bonded turn (3-turn, 4-turn, or 5-turn)
ACT_SITE  amino acid(s) involved in the activity of an enzyme 
SITE  any other interesting site on the sequence 
INIT_MET  the sequence is known to start with an initiator methionine 
NON_TER  the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key 
NON_CONS  non consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them 
UNSURE  uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment 
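
Item (A) in section II requires numeric identifier <221> to use selections from Tables 5 and 6. The sketch below checks feature keys against the 1998 tables reproduced above, on the assumption that those keys carry over to the 2009 revision. It treats the modification names listed under MOD_RES and LIPID as description values rather than separate keys, which is an interpretation. Matching is case-sensitive because the tables distinguish, for example, "conflict" (nucleotides) from "CONFLICT" (proteins). The function name is hypothetical.

```python
# Feature keys transcribed from Table 5 (nucleotides) and Table 6 (proteins).
NUCLEOTIDE_KEYS = frozenset("""
allele attenuator C_region CAAT_signal CDS conflict D-loop D-segment enhancer
exon GC_signal gene iDNA intron J_segment LTR mat_peptide misc_binding
misc_difference misc_feature misc_recomb misc_RNA misc_signal misc_structure
modified_base mRNA mutation N_region old_sequence polyA_signal polyA_site
precursor_RNA prim_transcript primer_bind promoter protein_bind RBS
repeat_region repeat_unit rep_origin rRNA S_region satellite scRNA sig_peptide
snRNA source stem_loop STS TATA_signal terminator transit_peptide tRNA unsure
V_region V_segment variation 3'clip 3'UTR 5'clip 5'UTR -10_signal -35_signal
""".split())

# Top-level Table 6 keys only; names listed under MOD_RES and LIPID are
# treated as description values (an interpretation).
PROTEIN_KEYS = frozenset("""
CONFLICT VARIANT VARSPLIC MUTAGEN MOD_RES LIPID DISULFID THIOLEST THIOETH
CARBOHYD METAL BINDING SIGNAL TRANSIT PROPEP CHAIN PEPTIDE DOMAIN CA_BIND
DNA_BIND NP_BIND TRANSMEM ZN_FING SIMILAR REPEAT HELIX STRAND TURN ACT_SITE
SITE INIT_MET NON_TER NON_CONS UNSURE
""".split())


def unlisted_feature_keys(keys: list[str], is_protein: bool) -> list[str]:
    """Return the <221> keys that do not appear in the relevant table (case-sensitive)."""
    vocabulary = PROTEIN_KEYS if is_protein else NUCLEOTIDE_KEYS
    return [k for k in keys if k not in vocabulary]
```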

II.    FILING INTERNATIONALLY

The requirements of 37 CFR 1.821 through 37 CFR 1.825 are the result of an effort to harmonize the USPTO requirements with international sequence listing requirements to the extent possible. The requirements of 37 CFR 1.821 through 37 CFR 1.825 substantially correspond to the requirements of WIPO Standard ST.25. PatentIn Version 3.5.1 software (see MPEP § 2430) generates sequence listings that meet all of the requirements of WIPO Standard ST.25. The requirements of 37 CFR 1.821 through 37 CFR 1.825, however, are less stringent than the requirements of WIPO Standard ST.25. Thus, applicants who wish to file in countries which adhere to WIPO Standard ST.25 should consider the following when not using PatentIn Version 3.5.1:

  • (A) The data in numeric identifier <221> must use selections from Tables 5 and 6 of WIPO Standard ST.25 (2009) to comply with that standard. The terms from these Tables are considered language-neutral vocabulary;
  • (B) Where the sequence listing forming part of the international application contains free text, e.g., free text in numeric identifier <223>, any such free text shall be repeated in the main part of the description in the language thereof. It is recommended that the free text in the language of the main part of the description be put in a specific section of the description called "Sequence Listing Free Text";
  • (C) A sequence listing filed after the international filing date is generally not considered to be part of the disclosure and usually will not be published as part of the international application publication (see PCT Article 34 and PCT Rules 26 and 91 for exceptions);
  • (D) Paragraphs 4(v) and 4bis(iv) of WIPO Standard ST.25 (2009) require the specific wording "the information recorded in electronic form furnished under PCT Rule 13ter is identical to the sequence listing"; and
  • (E) WIPO Standard ST.25 (2009), paragraph 24, requires a blank line between numeric identifiers in the sequence listing when the digit in the first or second position of the numeric identifier changes.
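
The blank-line rule in item (E) can be applied mechanically by comparing the first two digits of consecutive numeric identifiers. In the minimal Python sketch below, the function name and the sample listing are hypothetical, and lines that do not begin with a numeric identifier (such as sequence data) are passed through without affecting the comparison.

```python
import re
from typing import Optional

_IDENTIFIER = re.compile(r"<(\d{3})>")


def add_separator_lines(lines: list[str]) -> list[str]:
    """Insert a blank line wherever the first or second digit of the identifier changes."""
    out: list[str] = []
    previous: Optional[str] = None  # first two digits of the last identifier seen
    for line in lines:
        match = _IDENTIFIER.match(line)
        if match:
            leading = match.group(1)[:2]
            # Add a separator only on a change, and never two blank lines in a row.
            if previous is not None and leading != previous and out and out[-1] != "":
                out.append("")
            previous = leading
        out.append(line)
    return out


listing = [
    "<110>  Example Applicant", "<120>  Example Title", "<160>  1",
    "<210>  1", "<211>  10", "<212>  DNA", "<213>  Artificial Sequence",
    "<400>  1", "acgtacgtac  10",
]
formatted = add_separator_lines(listing)
```

For instance, <210>, <211>, <212>, and <213> stay together, while a blank line separates <213> from <400>.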

Requirements related to the submission of sequence listings may also differ between filing in the United States and filing internationally. For example, where an international application is filed on paper, the sequence listing part of the international application must also be provided on paper, although the search copy must be filed in electronic form, e.g., on a CD or, in the RO/US, as an ASCII text file via EFS-Web. Also, any tables filed in an international application must be an integral part of the application, i.e., they cannot be submitted as a separate file in text format.