Manual

This manual describes two special versions of the rigid body modelling program SASREF: SASREFCV (formerly known as SASREF7), for fitting SANS contrast variation series (optionally combined with SAXS), and SASREFMX, for the structural analysis of transient complexes and weak oligomers from polydisperse data. In the latter case, the rigid body modelling is coupled with mixture analysis, whereby the volume fractions of the dissociation products are estimated. The two approaches have much in common, so this manual covers the dialog prompts, the required configuration and input files, and the output produced for both programs.

Introduction

SASREF CV / MX perform quaternary structure modelling of a complex particle formed by subunits with known atomic structure against SAS data: a contrast variation series in the case of SASREFCV and data from a polydisperse system in the case of SASREFMX. Multiple data sets can be fitted simultaneously, e.g. different D2O content and/or perdeuteration (in SASREFCV) or profiles recorded at different conditions (concentration, temperature, pH, ionic strength) yielding different affinity of the complex particle (in SASREFMX). Both algorithms can account for symmetry (which can be subunit- and data-specific). A simulated annealing protocol is employed to construct an interconnected ensemble of subunits without steric clashes, while minimizing the discrepancy between the experimental scattering data and the curves predicted from the appropriate subunit assemblies. In the case of SASREFMX, the experimental data are fitted by a linear combination of the profiles calculated from the intact particle and from the dissociation products. For further details of the rigid body modelling approach, please refer to the SASREF manual and to the papers cited above.

Running SASREFCV/SASREFMX

Command-Line Arguments and Options

Short Option	Long Option	Description
	--seed=<INT>	Set the seed for the random number generator
	--model-format=<FMT>	Format of 3D models, one of: cif, pdb (default: cif)
	--implicit-hydrogen=<N>	Set this to a value N>=0 to override ‘unable to determine number of hydrogens’ errors.
	--sub-element=<NAME>	Set this to a valid element to override ‘unable to determine element’ errors.
-h	--help	Print a summary of arguments, options, and exit.
-v	--version	Print version information and exit.

Interactive Configuration

SASREF CV / MX can only be run in dialog mode; apart from the options listed above, no command-line arguments are accepted. As in MONSA, a significant amount of user input is provided through configuration files. There are two modes, EXPERT and USER. In EXPERT mode, the user can adjust more parameters. In USER mode, fewer questions are asked because default values are used for most of the program parameters. The default settings are the same in both modes. SASREF CV / MX interactive prompt:

Screen Text	Default	Asked in USER mode?	Description
Computation mode (User or Expert)	USER	Y	Mode selection.
Log file name	N/A	Y	Project identifier, will be used as a prefix for all output file names.
Enter project description	N/A	Y	Any text that will be stored in the log file.
Symmetry: Pn(2) (n=2-6)	P1	Y	“Master” (highest order) symmetry. Individual subunits or scattering profiles may have lower order symmetry. Supported symmetries are: P1 (no symmetry), P2-P6, P222, P32-P62. The n-fold axis is typically Z; if there is an additional two-fold axis, it coincides with Y.
File name with curves info	N/A	Y	Configuration file for the scattering profiles to be fitted.
File name with smearing parameters	empty	Y	If required, SASREF smears the theoretical curves using the resolution function introduced by J. Skov Pedersen et al. (1990), J. Appl. Cryst., 23, 321. It is mostly needed for SANS data but can also be applied to SAXS data from a non-point source. Please refer to the MONSA manual for an explanation of the file format. If no file name is provided, no smearing is applied.
File name with subunits info	N/A	Y	Configuration file for the atomic models of the subunits.
File name with cross-dependencies	N/A	Y	Configuration file with the cross-correlation table between the scattering curves and the contributing subunits.
Cross penalty weight	10.0	N	How much the Cross Penalty shall influence the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If clashes between the subunits are observed, try increasing this penalty weight.
Disconnectivity penalty weight	10.0	N	How much the Disconnectivity Penalty shall influence the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If the resulting arrangement of subunits is not interconnected, try increasing this penalty weight.
File name, contacts conditions, CR for none<.cnd>	empty	Y	If information on the interface between certain subunits is available in terms of contacting residues, it may be used as a modelling restraint. The information is provided in a file with a special format. By default no information is given.
Contacts penalty weight	10.0	N	How much improper contacts shall influence the acceptance or rejection of a mutation. If unsure, use the default value. If the desired interfaces are not obtained, try increasing this penalty weight. This question is only asked if the contacts conditions file is provided.
Expected particle shape: Prolate, Oblate, or Unknown	UNKNOWN	Y	If prior studies indicate that the particle’s shape is either PROLATE or OBLATE, the anisometry option can be used to penalize particles that do not correspond to the expected anisometry. By default, anisometry is ‘UNKNOWN’.
Anisometry penalty weight	1.0	N	How much improper anisometry shall influence the acceptance or rejection of a mutation. If unsure, use the default value. This question is skipped if the Expected particle shape is ‘UNKNOWN’.
Expected direction of anisometry: aLong Z, aCross Z, or Unknown	UNKNOWN	Y	This question is only asked if the Expected particle shape is not ‘UNKNOWN’ and the symmetry is ‘P2’. The user can specify whether the symmetry axis coincides with (ALONG) or is perpendicular to (ACROSS) the anisometry axis.
Shift penalty weight	1.0	N	How much a shift of the entire complex from the origin shall influence the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. This penalty is necessary to keep the model close to the origin so that the higher order harmonics are not lost and the scattering is computed accurately.
Spatial step in angstroems	5.0	N	Maximal random shift of a subunit at a single modification of the system in the course of simulated annealing. This question is asked for each subunit.
Angular step in degrees	20.0	N	Maximal random rotation angle of a subunit at a single modification of the system in the course of simulated annealing. Setting it to zero may be useful to keep the mutual orientations of certain subunits, e.g. if NMR RDC data are available. This question is asked for each subunit.
Initial annealing temperature	10.0	N	Starting temperature of simulated annealing protocol.
Annealing schedule factor	0.9	N	Factor by which the temperature is decreased; 0.9 is a good average value. If slower cooling is wanted, increase the value (e.g. to 0.95).
Max # of iterations at each T	var	N	Finalize temperature step and cool after this many iterations at the latest. The default value is 5000 * total number of subunits.
Max # of successes at each T	var	N	Finalize temperature step and cool after at most this many successful mutations. The default value is 500 * total number of subunits.
Min # of successes to continue	var	N	Stop simulated annealing if fewer than this many successful mutations are achieved within a single temperature step. The default value is 50 * total number of subunits.
Max # of annealing steps	100	N	Stop if simulated annealing is not finished after this many steps. The slower the system is cooled, the more temperature steps are required.
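
The defaults in the table can be worked through numerically. The following Python sketch computes the subunit-dependent limits and the temperature at a given step. It assumes (this is not stated explicitly in the table) that the temperature is simply multiplied by the schedule factor once after every step; with the default settings this agrees with the value T = 7.29 shown at step 4 in the sample runtime output. The function names are illustrative and not part of SASREF.

```python
def annealing_defaults(n_subunits):
    """Default per-temperature limits, as listed in the table above."""
    return {
        "max_iterations": 5000 * n_subunits,
        "max_successes": 500 * n_subunits,
        "min_successes": 50 * n_subunits,
    }


def temperature(step, t0=10.0, factor=0.9):
    """Temperature at a 1-based step, ASSUMING T is multiplied by
    `factor` once after each completed step (hypothetical model)."""
    return t0 * factor ** (step - 1)


if __name__ == "__main__":
    print(annealing_defaults(2))
    for factor in (0.9, 0.95):
        # Temperature reached at the last of the default 100 steps
        print(factor, temperature(100, factor=factor))
```

Under this assumption, a factor closer to 1 leaves the system hotter after the same number of steps, which is why slower cooling may require a larger maximum number of annealing steps.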

Runtime Output

At runtime, two lines of output are generated for each temperature step:

j:   4 T: 0.729E+01 Suc:  1000 Eva:    12497 CPU:  0.208E+03 F:99.4301 Pen: 13.803
The best chi values:11.64871 5.96331

The fields can be interpreted as follows, top-left to bottom-right:

Field	Description
j	Step number. Starts at 1, increases monotonically.
T	Temperature measure, starts at an arbitrary high value, decreases each step by the annealing schedule factor.
Suc	Number of successful mutations in this temperature step. Limited by the minimum and maximum number of successes. The number of successes should decrease slowly; the first couple of steps should be terminated by the maximum number of successes criterion. If instead the maximum number of iterations is reached, or the number of successes drops suddenly by a large amount, the system should probably be cooled more slowly.
Eva	Accumulated number of function evaluations.
CPU	Elapsed wall-clock time since the annealing procedure was started.
F	The best target function value obtained so far.
Pen	Accumulated penalty value of the best target function.
The best chi values	For each curve out of the total number of curves, the χ value of the best target function is given.
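
For monitoring or plotting a run, the two runtime lines can be parsed with a few lines of Python. This is a minimal sketch that assumes the log lines keep exactly the layout of the sample above; the helper names `parse_step_line` and `parse_chi_line` are hypothetical and not part of the ATSAS package.

```python
import re

# A field is "name:" followed by optional spaces and a number
# (plain or in Fortran-style E notation), as in the sample line.
STEP_FIELD = re.compile(r"(\w+):\s*([-+0-9.Ee]+)")


def parse_step_line(line):
    """Return the 'name: value' pairs of a temperature-step line as floats."""
    return {name: float(value) for name, value in STEP_FIELD.findall(line)}


def parse_chi_line(line):
    """Return the chi values listed after the colon, one per curve."""
    return [float(x) for x in line.split(":", 1)[1].split()]


step = parse_step_line(
    "j:   4 T: 0.729E+01 Suc:  1000 Eva:    12497 "
    "CPU:  0.208E+03 F:99.4301 Pen: 13.803"
)
chi = parse_chi_line("The best chi values:11.64871 5.96331")
print(step["j"], step["T"], step["F"], chi)
```

All fields are returned as floats, including the integer counters, to keep the sketch short.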

SASREFMX additionally outputs the volume fraction of the intact construct for each of the fitted curves:


 Associated volume fractions: 0.42157 0.64983
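
To illustrate what such a volume fraction means, the sketch below fits a single curve as a two-component mixture. It assumes the model I(s) = v * I_intact(s) + (1 - v) * I_diss(s), with both computed profiles on a common scale and on the same s grid as the data. This normalization, the closed-form least-squares estimate and all names are assumptions made for illustration; they are not a description of the SASREFMX internals.

```python
def fit_volume_fraction(i_exp, sigma, i_intact, i_diss):
    """Weighted least-squares estimate of v in
    I(s) = v * I_intact(s) + (1 - v) * I_diss(s), clamped to [0, 1]."""
    num = 0.0
    den = 0.0
    for ie, sg, ic, idp in zip(i_exp, sigma, i_intact, i_diss):
        d = ic - idp             # sensitivity of the model to v
        w = 1.0 / (sg * sg)      # statistical weight of the point
        num += w * d * (ie - idp)
        den += w * d * d
    if den == 0.0:
        raise ValueError("profiles are identical; v is undetermined")
    return min(1.0, max(0.0, num / den))


# Synthetic, noise-free check with made-up intensities (not real data)
i_intact = [100.0, 60.0, 25.0, 8.0]
i_diss = [50.0, 40.0, 22.0, 9.0]
v_true = 0.42157
i_exp = [v_true * a + (1.0 - v_true) * b for a, b in zip(i_intact, i_diss)]
sigma = [1.0, 1.0, 0.5, 0.5]
v_fit = fit_volume_fraction(i_exp, sigma, i_intact, i_diss)
print(v_fit)
```

Because the model is linear in v, the estimate has a closed form and needs no iteration.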

SASREF CV / MX Input Files

Three compulsory configuration files are to be created containing the information about:

(a) - the scattering data,
(b) - the subunits (rigid bodies) and
(c) - the contribution of each subunit to each scattering curve.

Data control file (curves.con in the following examples) has the following format: ```
The first line contains one integer K (total number of scattering curves for SASREFCV and total number of scattering curves+1 for SASREFMX)
K lines, each containing 8 parameters related to the scattering data set:

| Field | Acceptable values | Description |
|----|----|----|
| 1. | N/A | File name with the experimental data (*.dat) in ASCII format containing 3 columns: (1) experimental scattering vector, (2) experimental intensity and (3) experimental errors |
| 2. | [-1.0, 0.0-1.0] | D2O fraction in the solvent, or -1.0 for X-ray scattering data |
| 3. | [P1-P6, P222-P62] | Symmetry for the given construct at the given conditions (which may be different from the overall symmetry) |
| 4. | [1,2] | Angular units: 1 = _4πsin(θ)/λ_ in Å^-1^, 2 = _4πsin(θ)/λ_ in nm^-1^ |
| 5. | [0.1-1.0] | Fraction of the curve to be fitted |
| 6. | [0-15] | Setting number: number of the column in the optional ["Resolution file"](monsa.html#smearing) containing the information for smearing. This value must be 0 for X-ray curves and for neutron curves for which smearing information is not available. Neutron scattering curves with the same non-zero setting number must have the same angular axis and number of experimental points. |
| 7. | [0.0-1.0] | Weight of the curve in the target function. |
| 8. | [Y/N] | If Y, a constant background will be automatically adjusted for this curve (this can be useful, for example, to correct for incoherent background in neutron data) |

In the case of SASREFMX, the last line describes the dissociation products; all of its values are dummies except for the symmetry.

**Subunits control file** (subunits.con in the
[examples](#examples)) describes the rigid bodies. Its format is the following:

The first line contains one integer M (total number of subunits)
M lines, each containing 4 parameters related to the subunit:

| Field | Acceptable values | Description |
|----|----|----|
| 1. | N/A | Model file name; coordinates in PDB or mmCIF format |
| 2. | [Y/N] | Whether or not to shift the subunit to the origin at the beginning |
| 3. | [N/F/X/Y/Z/D] | Movement limitations. N = no limitations; F = subunit is fixed; X = rotations/translations along the X axis only; Y = rotations/translations along the Y axis only; Z = rotations/translations along the Z axis only; D = rotations/translations along the (1,1,1) vector only. |
| 4. | [P1-P6, P222-P62] | Symmetry applied to the given subunit (may be different from the overall symmetry) |

**Cross-correlation file** (table.con in the
[examples](#examples)) contains a table which sets the relationship between the
subunits and the scattering profiles. The number of its columns equals the
[total number of the subunits](#input-subunits) (M) and the number of its rows
equals the [total number of the scattering curves](#input-curves) (K). The
value in the _i-th_ column and _j-th_ row gives the contribution of the _i-th_
subunit to the _j-th_ scattering data set. For a SANS curve this value ([0.0-1.0
or -1.0]) is the subunit perdeuteration (D~2~O content in the solution where the
protein is expressed), whereby -1.0 means that the given subunit is not present
in the corresponding construct. For an X-ray scattering curve, 0.0 is to be used
if the subunit is present.
In the case of SASREFMX, the last row describes the dissociation products, which
contribute to all the curves. Here, a positive integer, 0 or -1 is allowed:
-1 means that the subunit is not among the dissociation products, 0 means that
the subunit is a part of a sub-complex which is a dissociation product of a
larger assembly (there can be at most one such sub-complex), and a positive
integer gives the molar ratio (stoichiometry) of the subunit in the original
(fully dissociated) sample.
**Distance restraints** may be imposed on the model using an optional contacts
conditions file in the following format:

  dist 7.0
  1 0 0 2 1 1
  dist 5.0
  2 0 0 3 1 1
  dist 7.0
  1 342 342 2 25 25
  1 350 350 2 17 17
  dist 6.0
  1 290 297 2  64 79
  dist 7.0
  1 1 0 3 1 0

"dist 7.0" means that the minimum distance between the CA atoms of the residues
(or the P atoms of the nucleotides) specified in the following lines should not
exceed 7 Å. In each line without the keyword "dist", the first and the fourth
numbers are the ordinal numbers of the two subunits in contact. The second and
third numbers give the residue/nucleotide range in the first subunit, and the
fifth and sixth numbers give the range in the second subunit; the contact may
involve any residue/nucleotide of the first range and any of the second range.
0 means the last residue/nucleotide of the subunit.
If two (or more) alternatives are given after the line with the keyword "dist",
the program compares the smallest distance among them with the specified
one.
Please refer to the [SASREF](sasref.html#input-format) manual for more details.
**Important (new as of ATSAS-4.0.0):** there is a difference in the numbering of
the subunits compared to previous versions in the presence of symmetry. First,
all symmetry mates of the first subunit appear, then of the second and so on
until the last one.

## SASREF CV / MX Output Files
After each simulated annealing step, SASREF CV / MX creates a set of output
files. Each file name consists of a customizable [prefix](#dialog-project)
with an extension appended. If a prefix has been used before, existing files
will be overwritten without warning.

| Extension | Description |
|----|----|
| .log | Contains the same information as the screen output and is updated during execution of the program. |
| .pdb or .cif | The current model of the entire complex in PDB or mmCIF format. The header section of the file contains information about the application used and about the parameters of the model, e.g. penalties and χ^2^. |
| -_i_.fit | Fit of the scattering curve computed from the complex (subcomplex) versus the corresponding experimental data. _i_ stands for the [construct](#dialog-construct) number. Columns in the output file are: 's', 'I~exp~' and 'I~comp~'. |


# Examples

## Building a Complex against X-ray and Contrast Variation SANS Data Sets
A simulated example of the T7 DNA Polymerase Ternary Complex with DCTP (PDB entry
1t8e). Files containing the atomic coordinates of the three subunits are:

phsave1.pdb - Polymerase
phsave2.pdb - DCTP
phsave3.pdb - DNA

Simulated SAS data contain 17 curves in total: 2 X-ray profiles (from the entire
complex and from the binary construct without DNA) + 15 neutron scattering
curves from the complex [(series of D~2~O content: 0, 40, 55, 70 and 100%
D~2~O) * (3 perdeuterations of DCTP: 0, 50 and 100%)]:

x-prot.dat          X-ray protein complex
x-compl.dat         X-ray, ternary complex
complh_0.dat        ternary complex with protonated DCTP in 0%D2O
complh_40.dat       in 40%D2O
complh_55.dat       in 55%D2O
complh_70.dat       in 70%D2O
complh_100.dat      in 100%D2O
compl50d_0.dat      50% deuterated DCTP in 0%D2O
compl50d_40.dat     in 40%D2O
compl50d_55.dat     in 55%D2O
compl50d_70.dat     in 70%D2O
compl50d_100.dat    in 100%D2O
compl100d_0.dat     fully deuterated DCTP in 0%D2O
compl100d_40.dat    in 40%D2O
compl100d_55.dat    in 55%D2O
compl100d_70.dat    in 70%D2O
compl100d_100.dat   in 100%D2O

Content of the curves.con file:

17
x-prot.dat -1.00 P1 1 1.0 0 1.0 y
x-compl.dat -1.00 P1 1 1.0 0 1.0 y
complh_0.dat 0.00 P1 1 1.0 0 1.0 y
complh_40.dat 0.40 P1 1 1.0 0 1.0 y
complh_55.dat 0.55 P1 1 1.0 0 1.0 y
complh_70.dat 0.70 P1 1 1.0 0 1.0 y
complh_100.dat 1.00 P1 1 1.0 0 1.0 y
compl50d_0.dat 0.00 P1 1 1.0 0 1.0 y
compl50d_40.dat 0.40 P1 1 1.0 0 1.0 y
compl50d_55.dat 0.55 P1 1 1.0 0 1.0 y
compl50d_70.dat 0.70 P1 1 1.0 0 1.0 y
compl50d_100.dat 1.00 P1 1 1.0 0 1.0 y
compl100d_0.dat 0.00 P1 1 1.0 0 1.0 y
compl100d_40.dat 0.40 P1 1 1.0 0 1.0 y
compl100d_55.dat 0.55 P1 1 1.0 0 1.0 y
compl100d_70.dat 0.70 P1 1 1.0 0 1.0 y
compl100d_100.dat 1.00 P1 1 1.0 0 1.0 y

Content of the subunits.con file:

3
phsave1.pdb Y F P1
phsave2.pdb Y N P1
phsave3.pdb Y N P1

Content of the table.con file:

0.0 0.0 -1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0

A listing of questions/answers for a sample run in the USER mode is as follows:

Computation mode (User or Expert) ...... < User >:
Log file name .......................... < .log >: test1
Enter project description .............. : T7 DNA POLYMERASE WITH DCTP, 17 curves
Symmetry: Pn(2) (n=2-6) ................ < P1 >:
File name with curves info ............. < .con >: curves
File name with smearing parameters ..... < .res >:
File name with subunits info ........... < .con >: subunits
File name with cross-dependencies ...... < .con >: table
File name, contacts conditions, CR for none < .cnd >:
Expected particle shape: <P>rolate, <O>blate, or <U>nknown .......................... < Unknown >:
...

## Quaternary Structure Analysis of a Weak Tetramer
Let weak_tetramer.dat be a SAXS profile from a polydisperse sample containing a
tetramer with P222 symmetry and its monomer (i.e. oligomeric equilibrium). The
atomic structure of the latter is contained in monomer.pdb. Content of the
curves.con file is then:

2
weak_tetramer.dat -1.00 P222 1 1.0 0 1.0 y
dummy.dat -1.00 P1 1 1.0 0 1.0 y

Content of the subunits.con file:

1
monomer.pdb Y N P222

Content of the table.con file:

0.0
0.0

## Modelling of a Transient Heterodimer
An equimolar mixture of proteins A and B yields an equilibrium between the AB
complex and its components in the unbound state. A concentration series with
distinct volume fractions of the intact complex can be fitted simultaneously.
The content of the configuration files in this case is as follows.
curves.con :

4
transient_c1.dat -1.00 P1 1 1.0 0 1.0 y
transient_c2.dat -1.00 P1 1 1.0 0 1.0 y
transient_c3.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat -1.00 P1 1 1.0 0 1.0 y

subunits.con :

2
a.pdb Y N P1
b.pdb Y N P1

table.con :

0.0 0.0
0.0 0.0
0.0 0.0
1.0 1.0

## Complex with Excess of a Component in Solution
If an excess of subunit C is needed to form a stable complex ABC, the
resulting sample will be polydisperse. The content of the configuration files in
this case is as follows.
curves.con :

2
mixture.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat -1.00 P1 1 1.0 0 1.0 y

subunits.con :

3
a.pdb Y N P1
b.pdb Y N P1
c.pdb Y N P1

table.con :

0.0 0.0 0.0
-1.0 -1.0 0.0

## Presence of a Sub-Complex in the Mixture
If subunit C is deficient (e.g. in a stoichiometric study), some amount of the
sub-complex AB might be present in the mixture with the ternary complex ABC. The
content of the configuration files is then as follows.
curves.con :

2
mixture.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat -1.00 P1 1 1.0 0 1.0 y

subunits.con :

3
a.pdb Y N P1
b.pdb Y N P1
c.pdb Y N P1

table.con :

0.0 0.0 0.0
0.0 0.0 -1.0

```