Manual

The following sections briefly describe the method implemented in SREFLEX, how to run SREFLEX from the command-line, the required input and the produced output files.

Introduction

The SREFLEX program uses normal mode analysis (NMA) in Cartesian space to estimate the flexibility of high-resolution models of biological macromolecules and improve their agreement with experimental small angle X-ray and neutron scattering (SAXS and SANS) data. The method starts from a given structural conformation and a corresponding SAS profile that the structure does not fit well. The structure is partitioned into pseudo-domains, either based on user input or automatically from the protein dynamics as predicted by NMA. The algorithm proceeds hierarchically: it first probes large rearrangements and then progresses to smaller and more localized movements. The output consists of a set of structural models representing possible conformational changes that improve the agreement with the experimental SAS profile.

A mode to generate a pool of conformers from an initial structure file is also available.

Running sreflex

Usage:

$ sreflex [OPTIONS] <SAS FILE> <COORD FILE>

The restrained refinement stage that SREFLEX performs first treats the input structure as a set of rigid bodies or pseudo-domains. By default, SREFLEX partitions the structure based on its NMA and defines (pseudo-) domains that can move with relative independence of each other. The algorithm is explained in detail in the SREFLEX article. This automatic partitioning procedure is the default behaviour if a single coordinate file in PDB or mmCIF format is provided as the <COORD FILE> argument. Results may be improved if the user defines custom domains beforehand; these definitions can be based, for example, on structural or evolutionary information. In such a case, the coordinates for the user-defined domains should be provided to SREFLEX as separate coordinate files in a comma-separated list (no spaces) in place of the <COORD FILE> argument. This deactivates the automatic partitioning procedure, and each coordinate file provided by the user is treated as a pseudo-domain or rigid body during the initial restrained refinement stage.

Command-Line Arguments and Options

SREFLEX recognizes the following command-line arguments.

Argument Description
SAS FILE Required. Exactly one experimental SAS data (.dat) file.
COORD FILE Required. The atomic coordinate file in .pdb or .cif format.

Absolute as well as relative paths to data files are accepted. Instead of a file name, one of the arguments may be given as ‘-’ to read data from stdin.

SREFLEX recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.

Short Option Long Option Description
  --concoord Use third-party CONCOORD for the second refinement stage (Linux only).
-p --prefix <PREFIX> Prefix to prepend to the output directory; the default is wd_sreflex.
-q --quiet Suppress screen output.
-t --threads <INT> Select the number of CPU cores/threads to use. All available cores are used by default; that number is also the upper limit for this parameter.
-r --ratio <FLOAT> Convergence ratio, default 0.7, range [0.5:0.9].
-f --first <INT> First SAS input data point to consider, default 1.
-N --neutron Work with SANS data.
-n --nmtop <INT> Top normal mode to consider, default is 16, range [9:64].
-s --skip <STAGE> Skip RESTRAINED or UNRESTRAINED refinement stage.
-P --pool <INT> Specify the maximum number of conformers to generate from the initial structure. Independent of scattering data.
  --lm <N> Maximum order of harmonics; default: 20, minimum: 1, maximum: 100. This defines the resolution of the calculated curve. The default value should be sufficient in most use cases. For large or extended particles, higher orders could improve the results at the cost of an increased run time. This value must be increased whenever the maximum scattering angle (smax) is increased.
  --fb <N> Order of Fibonacci grid; default: 17, minimum: 10, maximum: 18. The order of the Fibonacci grid defines the number of points describing the surface of the macromolecule. Higher grid orders give a more accurate surface representation but are more CPU expensive. Only used if option shell=directional (the default).
  --smax <SM> Maximum scattering angle in inverse angstroms, either for calculating the theoretical curve up to SM or for fitting to SM; default: 0.5\(\AA^{-1}\), maximum: 2.0\(\AA^{-1}\)
  --ns <N> Number of calculated data points; default: 101, maximum = 10001.
  --units <N> Angular units of the experimental data: 1 = \(\AA^{-1}\), \(s = 4\pi sin(\theta)/\lambda\); 2 = \(\mathrm{nm}^{-1}\), \(s = 4 \pi sin(\theta)/\lambda\); 3 = \(\AA^{-1}\), \(s = 2 sin(\theta)/\lambda\); 4 = \(\mathrm{nm}^{-1}\), \(s = 2 sin(\theta)/\lambda\). By default, an attempt is made to estimate the unit scale.
  --dns <VALUE> Solvent density; default: \(0.334 \mathrm{e}/\AA^3\), the electron density of pure water. Solvents with high salt concentration may have a somewhat higher electron density.
  --dro <VALUE> Contrast of hydration shell, default: \(0.03 \mathrm{e}/\AA^3\)
  --constant Enables constant subtraction. This operation accounts for possible systematic errors due to mismatched buffers in the experimental data.
  --explicit-hydrogens Use explicit hydrogens provided in the atomic structure file; default: use implicit hydrogen groups determined by looking up the number of hydrogens in components.cif.
  --energy <eV> X-ray energy in eV, required for energy correction in anomalous SAXS only.
  --shell <VALUE> Shell kind, one of ‘directional’ (classic CRYSOL) or ‘water’ (previously CRYSOL3)
  --alternative-names Enable alternative (old) atom naming for all atomic structure files; default: disabled. See also: components.cif
  --implicit-hydrogen <N> Set this to a value N>=0 to override ‘unable to determine number of hydrogens’ errors.
  --sub-element <NAME> Set this to a valid element to override ‘unable to determine element’ errors.
-v --version Print version information and exit.
-h --help Print a summary of arguments and options, then exit.
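
The four conventions listed for the --units option differ only by a length scale (nm versus angstrom) and by a factor of 2π between the two definitions of s. The following Python sketch illustrates that relationship by converting a value given in any of the four conventions to convention 1. The function name to_inverse_angstrom_4pi is hypothetical and is not part of SREFLEX or CRYSOL; it only shows the arithmetic and does not reproduce how the program estimates the unit scale.

```python
import math

# Unit codes as listed for the --units option:
#   1: 1/angstrom, s = 4*pi*sin(theta)/lambda
#   2: 1/nm,       s = 4*pi*sin(theta)/lambda
#   3: 1/angstrom, s = 2*sin(theta)/lambda
#   4: 1/nm,       s = 2*sin(theta)/lambda
NM_CODES = {2, 4}
TWO_SIN_CODES = {3, 4}


def to_inverse_angstrom_4pi(s, units):
    """Convert s from the given unit code to convention 1.

    Hypothetical helper for illustration only.
    """
    if units not in (1, 2, 3, 4):
        raise ValueError("units must be 1, 2, 3 or 4")
    if units in NM_CODES:
        # 1 nm = 10 angstrom, so 1 nm^-1 = 0.1 angstrom^-1
        s = s / 10.0
    if units in TWO_SIN_CODES:
        # 4*pi*sin(theta)/lambda = 2*pi * (2*sin(theta)/lambda)
        s = s * 2.0 * math.pi
    return s
```

For instance, a value of 5.0 in convention 2 corresponds to 0.5 in convention 1, which is the default --smax.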

sreflex Input Files

SREFLEX expects background-subtracted experimental SAS data (.dat).

SREFLEX expects an atomic coordinate file in .pdb or .cif format.

NMA calculations are based on backbone atoms (centroids): alpha carbons (CA) for proteins and sugar C1’ for nucleotides. Please note that SREFLEX parses centroid atoms from the ATOM record. The program will not work properly if input coordinates lack backbone atoms. Non-solvent HETATM entries are grouped by residue identifier and number. These are then associated to the closest ATOM centroid in the structure for the application of rotations and translations. Theoretical scattering calculations consider all non-H ATOM and non-solvent HETATM entries.
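
Because the program depends on centroid atoms being present in ATOM records, it can be useful to check a PDB file before a run. The sketch below counts CA and C1' atoms in ATOM records using the fixed-column PDB layout (atom name in columns 13 to 16). The function count_centroids and the sample lines are hypothetical and only illustrate the check; this is not SREFLEX's own parser, it does not handle mmCIF input, and it does not recognize the older C1* naming.

```python
def count_centroids(pdb_lines):
    """Count CA and C1' atoms that appear in ATOM records.

    Hypothetical pre-flight check for fixed-column PDB text.
    HETATM records are ignored, so a calcium ion named CA is not counted.
    """
    counts = {"CA": 0, "C1'": 0}
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue
        # Columns 13-16 (0-based slice 12:16) hold the atom name.
        name = line[12:16].strip()
        if name in counts:
            counts[name] += 1
    return counts


# Made-up sample records, for illustration only.
SAMPLE = [
    "ATOM      1  N   MET A   1      26.981  53.977  40.085  1.00 40.83           N",
    "ATOM      2  CA  MET A   1      26.091  52.849  39.889  1.00 37.14           C",
    "ATOM      3  C1'   A B   1      10.000  11.000  12.000  1.00 20.00           C",
    "HETATM    4 CA    CA A 101       1.000   2.000   3.000  1.00 10.00          CA",
]

if __name__ == "__main__":
    print(count_centroids(SAMPLE))
```

A structure for which both counts are zero has no centroid atoms of either kind in its ATOM records.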

sreflex Output Files

Upon execution, SREFLEX will create a directory to store results. The name of the output directory can be set through the prefix option. If the directory exists already, a number will be appended to create a new directory.

Output Description
log.txt Contains the same information as the screen output and is updated during execution of the program.
report.txt Summarizes execution details and results for each generated model.
models/*.pdb Models are created for both restrained (rc.pdb) and unrestrained (uc.pdb) stages in PDB format. Within each coordinate file, the REMARK section contains further details.
fits/*.fit Fit of the calculated scattering curve versus the experimental data.

Examples

The examples are based on the known conformational change that adenylate kinase undergoes upon catalytic activity. PDB entry 1ake (chain A) was used to simulate a SAXS profile for the closed conformation (available as example file closed1akeA.dat). PDB entry 4ake (chain A) corresponds to the open conformational state (available as example file open4akeA.pdb). SREFLEX will refine the 4ake open conformation until it matches the SAXS profile of the closed conformation.

Default run

Use SREFLEX to model the conformational change of adenylate kinase:

$ sreflex closed1akeA.dat open4akeA.pdb

SREFLEX will create a directory called wd_sreflex where the output files (report.txt, log.txt, models and fits) can be found upon completion of the run.

Custom domain definitions

To illustrate the definition of custom structural domains, open4akeA.pdb was separated into two files (open1.pdb and open2.pdb) corresponding to the SCOP domain classification of the 4ake PDB entry. In this case, SREFLEX will refine the structure considering both PDB files as rigid bodies for the initial restrained refinement stage, disabling the automatic structure partitioning approach. The --prefix option is used to store the results in a new directory named customDomains. Note that there are no spaces separating the PDB filenames in the <COORD FILE> argument:

$ sreflex --prefix=customDomains closed1akeA.dat open1.pdb,open2.pdb 
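
One possible way to prepare such domain files is to split the ATOM and HETATM records of a PDB file by residue number. The Python sketch below is an illustration under stated assumptions: the functions split_into_domains and coord_argument are hypothetical, the residue ranges passed to them are up to the user (they are not the SCOP boundaries used for the example files), and chain identifiers and insertion codes are ignored.

```python
def split_into_domains(pdb_lines, domains):
    """Split ATOM/HETATM lines into one list per domain.

    `domains` is a list of domains, each given as a list of inclusive
    (first, last) residue-number ranges, so discontinuous domains are
    possible. Residues outside every range are dropped.
    Hypothetical helper; chains and insertion codes are ignored.
    """
    parts = [[] for _ in domains]
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        # Columns 23-26 (0-based slice 22:26) hold the residue number.
        resseq = int(line[22:26])
        for index, ranges in enumerate(domains):
            if any(first <= resseq <= last for first, last in ranges):
                parts[index].append(line)
                break
    return parts


def coord_argument(filenames):
    """Join domain file names into a comma-separated list without spaces."""
    for name in filenames:
        if "," in name:
            raise ValueError("file names must not contain commas: " + name)
    return ",".join(filenames)
```

After writing each part to its own file, coord_argument(["open1.pdb", "open2.pdb"]) gives the string open1.pdb,open2.pdb used in place of <COORD FILE> above.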

CRYSOL batch arguments

Command line arguments can be forwarded to CRYSOL to directly modify how theoretical intensities are calculated during refinement.

$ sreflex --lm 20 --smax 1.0 closed1akeA.dat open4akeA.pdb

Please use these and other options with caution, as they can prevent the program from working properly.