Manual

The following describes SASREFMX, a special version of the rigid body modelling program SASREF for structural analysis of transient complexes and weak oligomers from polydisperse scattering data.

SASREFMX couples rigid body modelling with mixture analysis, estimating volume fractions of associated states and dissociation products. This manual describes interactive configuration, required input files, and produced output files.

Introduction

SASREFMX performs quaternary structure modelling of a complex particle formed by subunits with known atomic structures against scattering data from polydisperse samples. Multiple curves can be fitted simultaneously, e.g. profiles measured at different concentration, temperature, pH, or ionic strength conditions, reflecting different affinities and association states.

The algorithm can account for symmetry (global, subunit-specific, and curve-specific). A simulated annealing protocol constructs an interconnected ensemble of subunits without steric clashes while minimizing discrepancy between experimental and predicted scattering.

SASREFMX models each curve as a linear combination of the intact complex and predefined dissociation products. These dissociation products are defined implicitly via a dummy entry in the data control file and explicitly in the last row of the cross-correlation matrix.
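
As an illustration of the linear combination described above, and assuming the simplest two-component form (this manual does not spell out the exact expression or normalisation used internally), the intensity fitted to the \(k\)-th curve can be sketched as follows. The symbols \(v_k\), \(I_{\mathrm{complex}}\), and \(I_{\mathrm{diss}}\) are introduced here for illustration only.

```latex
I_k(s) = v_k \, I_{\mathrm{complex}}(s) + (1 - v_k) \, I_{\mathrm{diss}}(s),
\qquad 0 \le v_k \le 1
```

Here \(v_k\) plays the role of the volume fraction of the associated state for curve \(k\), \(I_{\mathrm{complex}}(s)\) is the scattering computed from the intact complex, and \(I_{\mathrm{diss}}(s)\) is the scattering computed from the dissociation products.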

For further details of the rigid body modelling approach, please refer to the SASREF manual.

Running sasrefmx

Command-Line

Usage:

$ sasrefmx [OPTIONS]

SASREFMX is configured in interactive mode; command-line options are used to supplement the run.

Arguments and Options

SASREFMX recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.

| Short Option | Long Option | Description |
| --- | --- | --- |
| | --seed=<INT> | Set the seed for the random number generator |
| | --model-format=<FMT> | Format of 3D models, one of: cif, pdb (default: cif) |
| | --alternative-names | Enable alternative atom naming for all atomic structure files; default: disabled. |
| | --implicit-hydrogen=<N> | Set this to a value N>=0 to override ‘unable to determine number of hydrogens’ errors. |
| | --sub-element=<NAME> | Set this to a valid element to override ‘unable to determine element’ errors. |
| -h | --help | Print a summary of arguments, options, and exit. |
| -v | --version | Print version information and exit. |
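
When runs are scripted, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only the options listed above; the helper name `build_command` is hypothetical and not part of SASREFMX. It only builds the argument list and does not start the program, since the configuration itself is still provided interactively.

```python
import shlex


def build_command(seed=None, model_format="cif", alternative_names=False):
    """Assemble an argument list for sasrefmx (hypothetical helper)."""
    if model_format not in ("cif", "pdb"):
        raise ValueError("model format must be 'cif' or 'pdb'")
    argv = ["sasrefmx", f"--model-format={model_format}"]
    if seed is not None:
        # A fixed seed makes the random number sequence reproducible.
        argv.append(f"--seed={int(seed)}")
    if alternative_names:
        argv.append("--alternative-names")
    return argv


if __name__ == "__main__":
    # Print the command as a single shell-quoted string.
    print(shlex.join(build_command(seed=42, model_format="pdb")))
```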

Interactive Configuration

SASREFMX runs in dialog mode. A substantial part of the setup is provided through configuration files.

There are two modes, EXPERT and USER. In EXPERT mode, more parameters can be changed. In USER mode, fewer questions are asked and defaults are used for most parameters. The default values are identical in both modes.

An interactive answers file (.ans) may be used to record and replay configurations, enabling repeatable runs without re-entering parameters.

| Screen text | Mode | Default value | Description |
| --- | --- | --- | --- |
| Computation mode (User or Expert) | U\|E | USER | Mode selection. |
| Log file name | U\|E | N/A | Project identifier, used as prefix for output files. |
| Enter project description | U\|E | N/A | Free text stored in the log file. |
| Symmetry: Pn(2) (n=2-9) | U\|E | P1 | Master symmetry. Subunits and curves may use lower symmetry. Supported symmetries are P1, P2-P9, P222, and P32-P92. |
| File name with curves info | U\|E | N/A | Configuration file for scattering profiles to fit. |
| File name with smearing parameters | U\|E | empty | Optional resolution-smearing settings (Resolution function (.res)). If no file is given, smearing is disabled. |
| File name with subunits info | U\|E | N/A | Configuration file for atomic models. |
| File name with cross-dependencies | U\|E | N/A | Configuration file defining which subunits contribute to each curve and dissociation product. |
| Cross penalty weight | E | 10.0 | Weight of steric-overlap penalty in mutation acceptance. |
| Disconnectivity penalty weight | E | 10.0 | Weight of connectivity penalty in mutation acceptance. |
| File name, contacts conditions, CR for none <.cnd> | U\|E | empty | Optional distance restraints file. |
| Contacts penalty weight | E | 10.0 | Weight for contact-restraint violations. Asked only if a contacts file is provided. |
| Expected particle shape: Prolate, Oblate, or Unknown | U\|E | UNKNOWN | Optional anisometry restraint type. |
| Anisometry penalty weight | E | 1.0 | Weight of anisometry penalty. Skipped if shape is UNKNOWN. |
| Expected direction of anisometry: aLong Z, aCross Z, or Unknown | U\|E | UNKNOWN | Asked only if shape is known and symmetry is P2. |
| Shift penalty weight | E | 1.0 | Weight for displacement of the full model from origin. |
| Spatial step in angstroems | E | 5.0 | Max random translation per mutation; asked per subunit. |
| Angular step in degrees | E | 20.0 | Max random rotation per mutation; asked per subunit. |
| Initial annealing temperature | E | 10.0 | Starting temperature of simulated annealing. |
| Annealing schedule factor | E | 0.9 | Cooling factor per temperature step. |
| Max # of iterations at each T | E | var | Default is 5000 * total number of subunits. |
| Max # of successes at each T | E | var | Default is 500 * total number of subunits. |
| Min # of successes to continue | E | var | Default is 50 * total number of subunits. |
| Max # of annealing steps | E | 100 | Hard stop for annealing steps. |
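
The annealing defaults in the table above can be turned into concrete numbers. The sketch below assumes geometric cooling, i.e. the temperature is multiplied by the schedule factor once per step; the function names are illustrative and not part of SASREFMX. With the default initial temperature and factor, step 4 evaluates to 7.29, which agrees with the `T: 0.729E+01` value shown for `j: 4` in the sample runtime output.

```python
def annealing_defaults(n_subunits):
    """Per-temperature limits derived from the total number of subunits."""
    return {
        "max_iterations": 5000 * n_subunits,
        "max_successes": 500 * n_subunits,
        "min_successes": 50 * n_subunits,
    }


def temperature_at_step(step, initial=10.0, factor=0.9):
    """Temperature at a 1-based annealing step, assuming geometric cooling."""
    if step < 1:
        raise ValueError("step numbers start at 1")
    return initial * factor ** (step - 1)


if __name__ == "__main__":
    print(annealing_defaults(2))
    for step in (1, 4, 100):
        print(step, f"{temperature_at_step(step):.3E}")
```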

Runtime Output

At runtime, the output includes temperature-step progress, the best \(\chi^2\) values, and the associated volume fractions:

j:   4 T: 0.729E+01 Suc:  1000 Eva:    12497 CPU:  0.208E+03 F:99.4301 Pen: 13.803
The best chi^2 values:11.64871 5.96331
Associated volume fractions: 0.42157 0.64983

The fields can be interpreted as follows:

| Field | Description |
| --- | --- |
| j | Step number. Starts at 1 and increases monotonically. |
| T | Temperature value, reduced each step by the annealing schedule factor. |
| Suc | Number of successful mutations in the current temperature step. |
| Eva | Accumulated number of function evaluations. |
| CPU | Elapsed wall-clock time since start of annealing. |
| F | Best target function value obtained so far. |
| Pen | Total penalty value of the best target function. |
| The best chi^2 values | \(\chi^2\) values for each fitted experimental curve. |
| Associated volume fractions | Volume fraction of the associated state for each fitted experimental curve. |
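
To track convergence across a run, the progress lines can be extracted from the log file. The following sketch is written against the sample output shown above only; the exact column widths and spacing of real logs are an assumption, and the function names are hypothetical.

```python
import re

# Pattern modelled on the sample progress line; whitespace is treated as flexible.
PROGRESS_RE = re.compile(
    r"j:\s*(?P<j>\d+)\s+T:\s*(?P<T>\S+)\s+Suc:\s*(?P<Suc>\d+)\s+"
    r"Eva:\s*(?P<Eva>\d+)\s+CPU:\s*(?P<CPU>\S+)\s+"
    r"F:\s*(?P<F>\S+)\s+Pen:\s*(?P<Pen>\S+)"
)


def parse_progress(line):
    """Return the fields of a progress line as a dict, or None if it does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    raw = match.groupdict()
    integer_fields = ("j", "Suc", "Eva")
    return {
        key: int(value) if key in integer_fields else float(value)
        for key, value in raw.items()
    }


def parse_values(line, label):
    """Return the floats that follow 'label:' on a line, or None if the label is absent."""
    if not line.startswith(label):
        return None
    return [float(item) for item in line[len(label):].lstrip(":").split()]


if __name__ == "__main__":
    sample = "j:   4 T: 0.729E+01 Suc:  1000 Eva:    12497 CPU:  0.208E+03 F:99.4301 Pen: 13.803"
    print(parse_progress(sample))
    print(parse_values("The best chi^2 values:11.64871 5.96331", "The best chi^2 values"))
```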

Graphical Interface

There is no graphical interface for SASREFMX.

sasrefmx Input Files

Three configuration files are compulsory:

Data control file

  • The first row gives the total number \(K\) of rows that follow.
  • \(K\) rows follow; each row has 8 whitespace-separated values (columns).
  • The last of these \(K\) rows is a dummy line describing the dissociation products. Most of its fields are placeholders, but its symmetry is still used when the constructs are generated. The cross-correlation file must contain a matching last row.

Values/columns for each data row, in order:

| Column | Description | Valid values |
| --- | --- | --- |
| 1 | File name with experimental SAS data (.dat). | An existing file name without whitespace. |
| 2 | D2O fraction in solvent, or -1.0 for X-ray data | -1.0, [0.0-1.0] |
| 3 | Symmetry for this construct under this condition | Pn, Pn2, with n=1,…,9 |
| 4 | Define angular units of the experimental SAS data (.dat). | 1, 2, 3, 4 |
| 5 | Fraction of the curve to be fitted | [0.1-1.0] |
| 6 | Setting number of an optional Resolution function (.res) | 0-15 |
| 7 | Weight of this curve in the target function | [0.0-1.0] |
| 8 | If Y, a constant background is adjusted automatically | Y\|N |
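
A data control file can be checked before starting a run. The sketch below reads the layout described above (a count line followed by rows of 8 columns); it assumes that blank lines can be ignored and that the file contains no comments, neither of which is stated in this manual. `CurveEntry` and `parse_data_control` are illustrative names, and value ranges are not validated.

```python
from typing import NamedTuple


class CurveEntry(NamedTuple):
    filename: str
    d2o_fraction: float      # -1.0 marks X-ray data
    symmetry: str
    angular_units: int
    fit_fraction: float
    resolution_setting: int
    weight: float
    fit_background: bool


def parse_data_control(text):
    """Parse the text of a data control file into a list of CurveEntry rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    count = int(lines[0].split()[0])
    rows = lines[1:]
    if len(rows) != count:
        raise ValueError(f"expected {count} rows, found {len(rows)}")
    entries = []
    for row in rows:
        fields = row.split()
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns in row: {row!r}")
        name, d2o, sym, units, frac, res, weight, bkg = fields
        if bkg.upper() not in ("Y", "N"):
            raise ValueError(f"background flag must be Y or N: {row!r}")
        entries.append(CurveEntry(name, float(d2o), sym.upper(), int(units),
                                  float(frac), int(res), float(weight),
                                  bkg.upper() == "Y"))
    return entries


if __name__ == "__main__":
    example = (
        "2\n"
        "weak_tetramer.dat -1.00 P222 1 1.0 0 1.0 y\n"
        "dummy.dat         -1.00 P1   1 1.0 0 1.0 y\n"
    )
    for entry in parse_data_control(example):
        print(entry)
```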

Subunits control file

The subunits control file describes the rigid bodies:

  • The first row shall indicate the total number \(M\) of rows following
  • \(M\) rows follow; each row has 4 whitespace-separated values/columns

Values/columns for each data row, in order:

| Column | Description | Valid values |
| --- | --- | --- |
| 1 | Model file name (PDB or mmCIF) | An existing file name without whitespace. |
| 2 | Whether to shift the subunit to origin at initialization | Y\|N |
| 3 | Movement limitations: N = none; F = fixed; X = along X only; Y = along Y only; Z = along Z only; D = along (1,1,1) only | N/F/X/Y/Z/D |
| 4 | Symmetry applied to this subunit | Pn, Pn2, with n=1,…,9 |

Cross-correlation file

The cross-correlation file ties the experimental data and subunits together. It contains a matrix with \(K\) rows and \(M\) columns, where \(K\) is the number of rows in the data control file (the fitted curves plus the dummy entry) and \(M\) is the number of subunits.

Entry (i,j) of this matrix specifies the contribution of the j-th subunit to the i-th data set.

For SANS curves the matrix value ([0.0-1.0 or -1.0]) is the subunit perdeuteration (level of D2O in expression medium); -1.0 means the subunit is not present in that construct. For X-ray curves, use 0.0 if the subunit is present, -1.0 otherwise.

As the data control file adds a dummy entry for dissociation products, the last row of the cross-correlation file has to account for this:

  • -1 means the subunit is absent from dissociation products.
  • 0 means the subunit is part of a single sub-complex product.
  • Positive integer values define stoichiometric coefficients for fully dissociated components.
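
The rules above can be checked mechanically. The following sketch parses a cross-correlation matrix, verifies its shape against the number of data control rows and subunits, and interprets the last row according to the three cases listed. The function names and the subunit labels are illustrative; rejecting any value other than -1, 0, or a positive integer is an assumption made here.

```python
def parse_cross_matrix(text, n_rows, n_subunits):
    """Parse the matrix and check that it has n_rows rows and n_subunits columns."""
    rows = [[float(value) for value in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) != n_rows:
        raise ValueError(f"expected {n_rows} rows, found {len(rows)}")
    for row in rows:
        if len(row) != n_subunits:
            raise ValueError(f"expected {n_subunits} columns, found {len(row)}")
    return rows


def describe_dissociation(last_row, names):
    """Classify each subunit according to its entry in the last matrix row."""
    result = {"sub_complex": [], "free": {}, "absent": []}
    for name, value in zip(names, last_row):
        if value == -1:
            result["absent"].append(name)        # not among the products
        elif value == 0:
            result["sub_complex"].append(name)   # part of the single sub-complex
        elif value > 0 and value == int(value):
            result["free"][name] = int(value)    # stoichiometric coefficient
        else:
            raise ValueError(f"unexpected entry {value!r} for {name}")
    return result


if __name__ == "__main__":
    # Matrix of the transient heterodimer example: three curves plus the dummy row.
    table = "0.0 0.0\n0.0 0.0\n0.0 0.0\n1.0 1.0\n"
    matrix = parse_cross_matrix(table, n_rows=4, n_subunits=2)
    print(describe_dissociation(matrix[-1], ["A", "B"]))
```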

Distance restraints

Distance restraints may be imposed via an optional contacts conditions file:

dist 7.0
1 0 0 2 1 1
dist 5.0
2 0 0 3 1 1
dist 7.0
1 342 342 2 25 25
1 350 350 2 17 17
dist 6.0
1 290 297 2  64 79
dist 7.0
1 1 0 3 1 0

dist 7.0 means that the minimum distance between CA atoms of selected residue ranges (or P atoms in nucleotides) should not exceed 7 Å.

In a line without dist, the 1st and 4th values are ordinal subunit numbers. The 2nd-3rd and 5th-6th values specify residue ranges in first and second subunits, respectively; 0 means the last residue/nucleotide.

If multiple alternatives are listed after one dist line, the smallest alternative distance is compared to the threshold.
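
The structure of the contacts conditions file can be sketched as a small parser: each dist line opens a restraint, and the following lines are its alternatives. This is an illustration based only on the example and description above; `parse_contacts` and `is_satisfied` are hypothetical names, and the distances passed to `is_satisfied` must be supplied by the caller, since no coordinates are read here.

```python
def parse_contacts(text):
    """Group range lines under the preceding 'dist' line."""
    restraints = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].lower() == "dist":
            restraints.append({"max_distance": float(fields[1]), "alternatives": []})
            continue
        if not restraints:
            raise ValueError("range line found before any 'dist' line")
        if len(fields) != 6:
            raise ValueError(f"expected 6 values in line: {line!r}")
        sub1, first1, last1, sub2, first2, last2 = (int(item) for item in fields)
        # Each alternative pairs (subunit, range start, range end) of two subunits.
        restraints[-1]["alternatives"].append(
            ((sub1, first1, last1), (sub2, first2, last2))
        )
    return restraints


def is_satisfied(restraint, distances):
    """Compare the smallest alternative distance to the threshold."""
    return min(distances) <= restraint["max_distance"]


if __name__ == "__main__":
    example = "dist 7.0\n1 342 342 2 25 25\n1 350 350 2 17 17\n"
    parsed = parse_contacts(example)
    print(parsed)
    # Hypothetical distances for the two alternatives, in angstroms.
    print(is_satisfied(parsed[0], [9.5, 6.8]))
```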

Please refer to the SASREF manual for details.

Important (new as of ATSAS-4.0): in the presence of symmetry, subunit numbering differs from older versions. First all symmetry mates of subunit 1 are listed, then of subunit 2, and so on.
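
The numbering rule can be illustrated with a short sketch. It assumes that ordinal numbers start at 1 and that the number of copies given for each subunit includes the original copy together with its symmetry mates; both points are assumptions, and `ordinal_numbers` is a hypothetical helper.

```python
def ordinal_numbers(copies_per_subunit):
    """Return, for each subunit, the ordinal numbers assigned to its copies."""
    numbering = []
    next_ordinal = 1
    for copies in copies_per_subunit:
        # All copies of one subunit are numbered before moving to the next subunit.
        numbering.append(list(range(next_ordinal, next_ordinal + copies)))
        next_ordinal += copies
    return numbering


if __name__ == "__main__":
    # Two subunits with two copies each.
    print(ordinal_numbers([2, 2]))
```

Under these assumptions, a restraint that should involve the second subunit would use ordinal number 3 rather than 2 when each subunit has two copies.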

sasrefmx Output Files

After each simulated annealing step, SASREFMX writes output files using the log-file prefix. Existing files with the same prefix are overwritten.

| Extension | Description |
| --- | --- |
| .log | Copy of screen output |
| .pdb or .cif | Current complex model in PDB or mmCIF format (depends on model-format). |
| -i.fit | Fit for the i-th fitted experimental curve, including refined volume fraction. |

Examples

Quaternary Structure Analysis of a Weak Tetramer

Let weak_tetramer.dat be a SAXS profile from a polydisperse sample containing an equilibrium between a P222 tetramer and its monomer. The monomer atomic structure is in monomer.pdb. This can be described by:

Data control file, curves.con:

2
weak_tetramer.dat -1.00 P222 1 1.0 0 1.0 y
dummy.dat         -1.00 P1   1 1.0 0 1.0 y

Subunits control file, subunits.con:

1
monomer.pdb Y N P222

Cross-correlation file, table.con:

0.0
0.0

Modelling of a Transient Heterodimer

An equimolar mixture of proteins A and B yields an equilibrium between AB and unbound A/B. A concentration series with different associated fractions can be fitted simultaneously.

Data control file, curves.con:

4
transient_c1.dat -1.00 P1 1 1.0 0 1.0 y
transient_c2.dat -1.00 P1 1 1.0 0 1.0 y
transient_c3.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat        -1.00 P1 1 1.0 0 1.0 y

Subunits control file, subunits.con:

2
a.pdb Y N P1
b.pdb Y N P1

Cross-correlation file, table.con:

0.0 0.0
0.0 0.0
0.0 0.0
1.0 1.0

Complex with Excess of a Component in Solution

If an excess of subunit C is required to form a stable ABC complex, the sample is polydisperse: it contains the complex together with free C. Configuration files:

Data control file, curves.con:

2
mixture.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat   -1.00 P1 1 1.0 0 1.0 y

Subunits control file, subunits.con:

3
a.pdb Y N P1
b.pdb Y N P1
c.pdb Y N P1

Cross-correlation file, table.con:

0.0 0.0 0.0
-1.0 -1.0 0.0

Presence of a Sub-Complex in the Mixture

If subunit C is limiting (e.g. in a stoichiometry study), the sub-complex AB may be present together with the ternary complex ABC. Configuration files:

Data control file, curves.con:

2
mixture.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat   -1.00 P1 1 1.0 0 1.0 y

Subunits control file, subunits.con:

3
a.pdb Y N P1
b.pdb Y N P1
c.pdb Y N P1

Cross-correlation file, table.con:

0.0 0.0 0.0
0.0 0.0 -1.0