pdb2seq
Manual
The following sections briefly describe how to run PDB2SEQ from the command line, the required input, and the output files it produces.
Introduction
PDB2SEQ extracts residue sequences from atomic coordinate files in .pdb or .cif format and writes them in residue sequence data (.seq) format (FASTA).
A specific model and chain from the input file may be selected with
the --model and --chain options.
Note: many tools exist to convert atomic models to residue sequences, and different parsers may handle edge cases differently. Such edge cases include insertion codes, duplicated residue identifiers, missing residues, and dummy residues inserted into models by ATSAS modelling applications.
PDB2SEQ applies the same residue interpretation rules that are used internally by other ATSAS applications operating on atomic models. Using PDB2SEQ therefore ensures that the generated sequence is consistent with how ATSAS interprets the corresponding model.
Mixing sequences generated by external tools with ATSAS applications that operate on atomic models may otherwise lead to subtle mismatches between the sequence and the model.
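To illustrate why parsers can disagree, the following Python sketch builds a one-letter sequence from the fixed-column ATOM records of a .pdb file. It is not the rule set PDB2SEQ applies: the residue key, the 20-letter mapping, the 'X' fallback for unknown names, and the function name `naive_sequence` are assumptions chosen for illustration. Toggling `use_insertion_codes` shows how a single parsing decision changes the result.

```python
# Illustrative sketch only: a naive sequence extractor, NOT the ATSAS/PDB2SEQ rules.

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def naive_sequence(pdb_text: str, use_insertion_codes: bool = True) -> str:
    """Return a one-letter sequence from the ATOM records of fixed-column PDB text.

    Residues are keyed by (chain ID, residue number) and, optionally, the
    insertion code. HETATM records (e.g. modified residues) are ignored here,
    which is another point where parsers may differ. Unknown names map to 'X'.
    """
    seen = set()
    letters = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()   # columns 18-20
        chain = line[21:22]              # column 22
        res_seq = line[22:26].strip()    # columns 23-26
        i_code = line[26:27].strip() if use_insertion_codes else ""  # column 27
        key = (chain, res_seq, i_code)
        if key in seen:
            continue  # further atoms of an already recorded residue
        seen.add(key)
        letters.append(THREE_TO_ONE.get(res_name, "X"))
    return "".join(letters)
```

With `use_insertion_codes=False`, a residue that differs from its predecessor only by an insertion code is silently dropped, which is exactly the kind of discrepancy described above.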
Running pdb2seq
Usage:
$ pdb2seq [OPTIONS] [MODEL]
The OPTIONS known by PDB2SEQ are described in the next section; the required argument MODEL is described in the section on input files.
Command-line arguments and options
PDB2SEQ requires the following command-line arguments:
| Argument | Description |
|---|---|
| MODEL | Exactly one atomic model file in .pdb or .cif format. |
PDB2SEQ accepts absolute as well as relative paths to the atomic coordinate MODEL in .pdb or .cif format. Instead of a file name, '-' may be given to read data from stdin; data on stdin is assumed to be in .pdb format for historic reasons.
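Because data on stdin is always read as .pdb, piping an mmCIF file this way would be misinterpreted. A minimal Python sketch of a guard, assuming mmCIF content can be recognised by its leading `data_` block header (a heuristic, not a PDB2SEQ feature). The helper names `looks_like_mmcif` and `pdb2seq_from_stdin` are hypothetical, and `pdb2seq` is assumed to be on `PATH`:

```python
import subprocess
from pathlib import Path


def looks_like_mmcif(text: str) -> bool:
    """Heuristic: the first non-blank, non-comment line of mmCIF data starts with 'data_'."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and CIF comments
        return stripped.lower().startswith("data_")
    return False


def pdb2seq_from_stdin(model_path: str) -> str:
    """Pipe a .pdb model to 'pdb2seq -' and return what it writes to stdout."""
    text = Path(model_path).read_text()
    if looks_like_mmcif(text):
        # stdin is interpreted as .pdb, so .cif files should be passed by file name
        raise ValueError("stdin input is read as .pdb; pass .cif files as a path")
    result = subprocess.run(
        ["pdb2seq", "-"], input=text, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing a .cif file by its path instead of through stdin avoids the problem entirely; the check only catches the mistake early.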
PDB2SEQ recognizes the following command-line options:
| Short option | Long option | Description |
|---|---|---|
| | --chain <LETTER> | Select one chain from a file with multiple chains (single letter). |
| | --model <ID> | Select one model from a file with multiple models. |
| -o | --output <FILE> | Write output to FILE instead of stdout. |
| -v | --version | Print version information and exit. |
| -h | --help | Print a summary of arguments, options, and exit. |
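When PDB2SEQ is driven from a script, the documented long options can be assembled into an argument list. A minimal Python sketch; the function name `pdb2seq_command` and its parameter names are hypothetical, and only `--chain`, `--model` and `--output` from the table above are used:

```python
from typing import List, Optional, Union


def pdb2seq_command(
    model: str = "-",
    chain: Optional[str] = None,
    model_id: Optional[Union[int, str]] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build a pdb2seq argument list; model='-' means read .pdb data from stdin."""
    if chain is not None and len(chain) != 1:
        raise ValueError("--chain expects a single letter")
    args = ["pdb2seq"]
    if chain is not None:
        args += ["--chain", chain]
    if model_id is not None:
        args += ["--model", str(model_id)]
    if output is not None:
        args += ["--output", output]
    args.append(model)  # the MODEL argument comes last
    return args
```

The resulting list can be passed to `subprocess.run(..., check=True)`; building it separately keeps the option handling testable without running PDB2SEQ.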
Runtime output
PDB2SEQ writes the resulting sequences to stdout or to the file specified by
--output.
pdb2seq input files
PDB2SEQ accepts atomic models or dummy atom models in .pdb or .cif format.
Use --chain to restrict output to a single chain ID and --model to restrict
output to a single model. Without --model, all models are converted; without
--chain, all chains are converted.
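The effect of the two selectors can be pictured as grouping atom records by (model, chain) and then filtering. The Python sketch below assumes that models are delimited by MODEL records and identified by their serial number, that files without MODEL records form a single model "1", and that the chain ID sits in column 22. Whether PDB2SEQ identifies models the same way is not stated here, and the helper names `split_models_and_chains` and `select` are hypothetical:

```python
from typing import Dict, List, Optional, Tuple

Groups = Dict[Tuple[str, str], List[str]]


def split_models_and_chains(pdb_text: str) -> Groups:
    """Group ATOM/HETATM lines by (model serial, chain ID)."""
    groups: Groups = {}
    model = "1"  # assumption: no MODEL record means a single model "1"
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            fields = line[6:].split()
            model = fields[0] if fields else model
        elif line.startswith(("ATOM", "HETATM")):
            chain = line[21:22]  # column 22
            groups.setdefault((model, chain), []).append(line)
    return groups


def select(groups: Groups, model: Optional[str] = None, chain: Optional[str] = None) -> Groups:
    """Mimic --model/--chain filtering: None keeps all models or all chains."""
    return {
        key: lines
        for key, lines in groups.items()
        if (model is None or key[0] == model) and (chain is None or key[1] == chain)
    }
```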
pdb2seq output files
PDB2SEQ outputs sequences in residue sequence data (.seq)
format (FASTA). By default, output is written to stdout; use --output to save
to a file.
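Since the output is plain FASTA, it can be read back with a few lines of Python, for example from the file written by `pdb2seq 6lyz.pdb > 6lyz.seq`. This sketch keeps each header verbatim because the header layout PDB2SEQ writes is not specified here; `read_fasta` is a hypothetical helper, not part of ATSAS:

```python
from typing import List, Optional, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, keeping file order.

    The header is returned verbatim without the leading '>'; sequence lines
    that appear before the first header are ignored.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

A list of pairs rather than a dictionary is returned so that records with identical headers, such as the same chain from several models, are not silently merged.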
Examples
Extract sequences for all chains in the input model:
$ pdb2seq 6lyz.pdb > 6lyz.seq
Select chain A and write the sequence to a file:
$ pdb2seq --chain A --output chainA.seq 6lyz.cif
Select a specific model from an ensemble:
$ pdb2seq --model 2 6lyz.pdb