Manual

The following sections briefly describe how to run PDB2SEQ from the command line, the required input, and the output files it produces.

Introduction

PDB2SEQ extracts residue sequences from atomic coordinate files in .pdb or .cif format and writes them in residue sequence data (.seq) format, which is FASTA.

A specific model and chain from the input file may be selected with the --model and --chain selectors.

Note: many tools exist to convert atomic models to residue sequences. Different parsers may handle edge cases differently, for example insertion codes, duplicated residue identifiers, missing residues, or dummy residues inserted into models by ATSAS modelling applications.

PDB2SEQ applies the same residue interpretation rules that are used internally by other ATSAS applications operating on atomic models. Using PDB2SEQ therefore ensures that the generated sequence is consistent with how ATSAS interprets the corresponding model.

Mixing sequences generated by external tools with ATSAS applications operating on atomic models may otherwise lead to subtle mismatches or unintended inconsistencies.

Running pdb2seq

Usage:

$ pdb2seq [OPTIONS] <MODEL>

OPTIONS known by PDB2SEQ are described in the next section; the required argument MODEL is described in the section on input files.

Command-line arguments and options

PDB2SEQ requires the following command-line argument:

Argument	Description
MODEL	Exactly one atomic model file in .pdb or .cif format.

PDB2SEQ accepts absolute as well as relative paths to the atomic coordinate MODEL in .pdb or .cif format. Instead of a file name, the argument may be given as '-' to read data from stdin. For historical reasons, input on stdin is assumed to be in .pdb format.
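Reading from stdin allows PDB2SEQ to be fed by another program without a temporary file. The following is a minimal sketch, assuming a gzip-compressed .pdb file; the helper name seq_from_gz and any file names are hypothetical, and only the documented '-' argument is used.

```shell
# Hypothetical helper: decompress a gzip-compressed .pdb file and pass it
# to pdb2seq on stdin. Data on stdin is assumed to be in .pdb format,
# so this sketch is not suitable for compressed .cif files.
seq_from_gz() {
    gzip -dc "$1" | pdb2seq -
}
```

Calling seq_from_gz model.pdb.gz > model.seq would then save the resulting sequences to model.seq.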

PDB2SEQ recognizes the following command-line options:

Short option	Long option	Description
	--chain <LETTER>	Select one chain from a file with multiple chains (single letter).
	--model <ID>	Select one model from a file with multiple models.
-o	--output <FILE>	Write output to FILE instead of stdout.
-v	--version	Print version information and exit.
-h	--help	Print a summary of arguments and options, then exit.

Runtime output

PDB2SEQ writes the resulting sequences to stdout or to the file specified by --output.

pdb2seq input files

PDB2SEQ accepts atomic coordinate data, either atomic models or dummy atom models, in .pdb or .cif format.

Use --chain to restrict output to a single chain ID and --model to restrict output to a single model. If neither --model nor --chain is specified, all models and chains are converted.
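Because --chain selects exactly one chain per run, writing each chain to its own file takes one call per chain. The following is a minimal sketch, assuming the chain IDs are known in advance; the helper name split_chains and the output naming scheme are hypothetical.

```shell
# Hypothetical helper: write one .seq file per requested chain.
# Usage: split_chains MODEL CHAIN [CHAIN...]
split_chains() {
    model=$1
    shift
    base=${model%.*}    # strip the file extension, e.g. "complex.pdb" -> "complex"
    for chain in "$@"; do
        # One pdb2seq call per chain, using only the documented options.
        pdb2seq --chain "$chain" --output "${base}_${chain}.seq" "$model"
    done
}
```

For a hypothetical model complex.pdb with chains A and B, split_chains complex.pdb A B would request the output files complex_A.seq and complex_B.seq.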

pdb2seq output files

PDB2SEQ outputs sequences in residue sequence data (.seq) format (FASTA). By default, output is written to stdout; use --output to save to a file.
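Since the output is FASTA, every record begins with a header line starting with '>', so the number of sequences in a .seq file can be checked with standard tools. The following is a minimal sketch; the helper name count_records is hypothetical, and how PDB2SEQ groups models and chains into records is not specified here.

```shell
# Hypothetical helper: count FASTA records in a file.
# Each record starts with a header line beginning with '>'.
count_records() {
    grep -c '^>' "$1"
}
```

Comparing the count for a .seq file against the number of models and chains expected in the input is a quick sanity check before passing the sequences on.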

Examples

Extract sequences for all chains in the input model:

$ pdb2seq 6lyz.pdb > 6lyz.seq

Select chain A and write the sequence to a file:

$ pdb2seq --chain A --output chainA.seq 6lyz.cif

Select a specific model from an ensemble:

$ pdb2seq --model 2 6lyz.pdb