sasrefcv/sasrefmx
Manual
The following describes two special versions of the rigid body modelling program SASREF: SASREFCV (formerly known as SASREF7), for fitting SANS contrast variation series (which can also be combined with SAXS), and SASREFMX, for the structural analysis of transient complexes and weak oligomers from polydisperse data. In the latter case, rigid body modelling is coupled with mixture analysis, whereby the volume fractions of the dissociation products are estimated. The two approaches have much in common, so this manual covers the dialog prompts, the required configuration and input files, and the output produced by both programs.
Introduction
SASREF CV / MX perform quaternary structure modeling of a complex particle formed by subunits with known atomic structure against SAS data sets from a contrast variation series and from a polydisperse system, respectively. Multiple data sets can be fitted simultaneously, e.g. different D2O content and/or perdeuteration (in SASREFCV) or profiles recorded at different conditions (concentration, temperature, pH, ionic strength) yielding different affinity of the complex particle (in SASREFMX). Both algorithms are capable of accounting for symmetry (which can be subunit- and data-specific). A simulated annealing protocol is employed to construct an interconnected ensemble of subunits without steric clashes, while minimizing the discrepancy between the experimental scattering data and the curves predicted from the appropriate subunit assemblies. In case of SASREFMX, the experimental data are fitted by a linear combination of the profiles calculated from the intact particle and from the dissociation products. For further details of the rigid body modelling approach please refer to the SASREF manual and to the papers cited above.
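The linear combination used for polydisperse data can be illustrated with a minimal Python sketch. It assumes a two-component mixture (the intact particle plus one pooled profile for the dissociation products), profiles already on a common intensity scale, and a plain weighted least-squares estimate of the volume fraction. The function names are hypothetical and the sketch is not the program's actual implementation, which may treat scaling and several dissociation products differently.

```python
def fit_volume_fraction(i_exp, sigma, i_intact, i_dissoc):
    """Least-squares volume fraction v of the intact particle, clipped to [0, 1].

    Assumed model: I_mix = v * I_intact + (1 - v) * I_dissoc.
    """
    num = 0.0
    den = 0.0
    for ie, sg, ia, ib in zip(i_exp, sigma, i_intact, i_dissoc):
        d = (ia - ib) / sg   # weighted difference between the two profiles
        r = (ie - ib) / sg   # weighted residual against the dissociation products
        num += d * r
        den += d * d
    v = num / den if den > 0.0 else 0.0
    return min(1.0, max(0.0, v))


def chi_square(i_exp, sigma, i_model):
    """Mean squared weighted residual between experiment and model."""
    n = len(i_exp)
    return sum(((ie - im) / sg) ** 2
               for ie, sg, im in zip(i_exp, sigma, i_model)) / n


# Synthetic check: mix two made-up profiles with v = 0.4 and recover it.
i_intact = [100.0, 60.0, 20.0, 5.0]
i_dissoc = [50.0, 40.0, 25.0, 10.0]
i_exp = [0.4 * a + 0.6 * b for a, b in zip(i_intact, i_dissoc)]
sigma = [1.0] * 4
v = fit_volume_fraction(i_exp, sigma, i_intact, i_dissoc)
i_model = [v * a + (1.0 - v) * b for a, b in zip(i_intact, i_dissoc)]
```

For a fixed pair of profiles the optimal fraction has a closed form, which is why no iterative search is needed in this simplified two-component case.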
Running sasrefcv/sasrefmx
Command-Line Arguments and Options
Short Option | Long Option | Description |
---|---|---|
 | --seed=<INT> | Set the seed for the random number generator |
 | --model-format=<FMT> | Format of 3D models, one of: cif, pdb (default: cif) |
 | --alternative-names | Enable alternative atom naming for all atomic structure files; default: disabled. |
 | --implicit-hydrogen=<N> | Set this to a value N>=0 to override ‘unable to determine number of hydrogens’ errors. |
 | --sub-element=<NAME> | Set this to a valid element to override ‘unable to determine element’ errors. |
-h | --help | Print a summary of arguments, options, and exit. |
-v | --version | Print version information and exit. |
Interactive Configuration
SASREF CV / MX are run in dialog mode; apart from the options listed above, no command line arguments are accepted. As with MONSA, a significant amount of the user input is provided through configuration files.
There are two modes, EXPERT and USER. In EXPERT mode, the user can adjust more parameters. In USER mode, fewer questions are asked because default values are used for most of the program parameters. The default settings are the same in both modes.
SASREF CV / MX interactive prompt:
Screen Text | Default | Asked in USER-mode? | Description |
---|---|---|---|
Computation mode (User or Expert) | USER | Y | Mode selection. |
Log file name | N/A | Y | Project identifier, will be used as a prefix for all output file names. |
Enter project description | N/A | Y | Any text that will be stored in the log file. |
Symmetry: Pn(2) (n=2-6) | P1 | Y | “Master” (highest order) symmetry. Individual subunits or scattering profiles may have lower order symmetry. Supported symmetries are: P1 (no symmetry), P2-P6, P222, P32-P62. The n-fold axis is typically Z, if there is in addition a two-fold axis, it coincides with Y. |
File name with curves info | N/A | Y | Configuration file for the scattering profiles to be fitted |
File name with smearing parameters | empty | Y | If required, SASREF smears the theoretical curves using the resolution function introduced by J. Skov Pedersen et al. (1990), J. Appl. Cryst., 23, 321. It is mostly needed for SANS data but can also be applied to a non-point SAXS source. Please refer to the MONSA manual for an explanation of the file format. If no file name is provided, no smearing is applied. |
File name with subunits info | N/A | Y | Configuration file for the atomic models of subunits |
File name with cross-dependencies | N/A | Y | Configuration file with the cross-correlation table between the scattering curves and the contributing subunits. |
Cross penalty weight | 10.0 | N | How much the Cross Penalty shall influence the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If clashes between the subunits are observed, try increasing this penalty weight. |
Disconnectivity penalty weight | 10.0 | N | How much the Disconnectivity Penalty shall influence the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If the subunits are not arranged in an interconnected assembly, try increasing this penalty weight. |
File name, contacts conditions, CR for none <.cnd> | empty | Y | If the information on interface between certain subunits in terms of contacting residues is available, it may be used as a modeling restraint. The information is provided in a file with special format. By default no information is given. |
Contacts penalty weight | 10.0 | N | How much improper contacts shall influence the acceptance or rejection of a mutation. If unsure, use the default value. If desired interfaces are not obtained, try increasing this penalty weight. This question is only asked if the contacts conditions file is provided. |
Expected particle shape: Prolate, Oblate, or Unknown | UNKNOWN | Y | If, due to prior studies, it is known that the particle’s shape shall be either PROLATE or OBLATE, one may use the anisometry option to enforce a penalty on particles that do not correspond with the expected anisometry. By default, anisometry is ‘UNKNOWN’. |
Anisometry penalty weight | 1.0 | N | How much improper anisometry shall influence the acceptance or rejection of a mutation. If unsure, use the default value. This question is skipped if the Expected particle shape is ‘UNKNOWN’. |
Expected direction of anisometry: aLong Z, aCross Z, or Unknown | UNKNOWN | Y | This question is only asked if the expected particle shape is not ‘UNKNOWN’ and the symmetry is ‘P2’. The user can specify whether the symmetry axis coincides with (ALONG) or is perpendicular to (ACROSS) the anisometry axis. |
Shift penalty weight | 1.0 | N | How much shift from the origin of the entire complex shall influence the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. This penalty is necessary to keep the model close to the origin so that the higher order harmonics are not lost and the scattering is computed accurately. |
Spatial step in angstroems | 5.0 | N | Maximal random shift of a subunit at a single modification of the system in the course of simulated annealing. This question is asked for each subunit. |
Angular step in degrees | 20.0 | N | Maximal random rotation angle of a subunit at a single modification of the system in the course of simulated annealing. Setting it to zero may be useful to keep the mutual orientations of certain subunits, e.g. if NMR RDC data are available. This question is asked for each subunit. |
Initial annealing temperature | 10.0 | N | Starting temperature of simulated annealing protocol. |
Annealing schedule factor | 0.9 | N | Factor by which the annealing temperature is decreased; 0.9 is a good average value. If slower cooling is wanted increase the value (e.g. to 0.95). |
Max # of iterations at each T | var | N | Finalize temperature step and cool after this many iterations at the latest. The default value is 5000 * total number of subunits. |
Max # of successes at each T | var | N | Finalize temperature step and cool after at most this many successful mutations. The default value is 500 * total number of subunits. |
Min # of successes to continue | var | N | Stop simulated annealing if fewer than this many successful mutations are achieved within a single temperature step. The default value is 50 * total number of subunits. |
Max # of annealing steps | 100 | N | Stop if simulated annealing is not finished after this many steps. The slower the system is cooled, the more temperature steps are required. |
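The defaults that depend on the number of subunits, and the cooling schedule, can be summarised in a short Python sketch. It assumes geometric cooling in which the temperature is multiplied by the schedule factor once per step, starting from the initial temperature at step 1; this is inferred from the sample runtime line in this manual (T = 0.729E+01 at j = 4 with the defaults 10.0 and 0.9) rather than taken from the program source. The function names are hypothetical.

```python
def annealing_defaults(n_subunits):
    """Default per-temperature limits, as documented in the table above."""
    return {
        "max_iterations": 5000 * n_subunits,
        "max_successes": 500 * n_subunits,
        "min_successes": 50 * n_subunits,
    }


def temperature(step, t_initial=10.0, factor=0.9):
    """Temperature at a 1-based annealing step, assuming geometric cooling."""
    return t_initial * factor ** (step - 1)


limits = annealing_defaults(3)                      # e.g. a three-subunit complex
schedule = [temperature(j) for j in range(1, 6)]    # first five temperature steps
```

With a schedule factor of 0.95 instead of 0.9 the temperature falls more slowly, so more annealing steps are needed to reach the same final temperature.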
Runtime Output
At runtime, two lines of output are generated for each temperature step:
j: 4 T: 0.729E+01 Suc: 1000 Eva: 12497 CPU: 0.208E+03 F:99.4301 Pen: 13.803
The best chi^2 values:11.64871 5.96331
The fields can be interpreted as follows, top-left to bottom-right:
Field | Description |
---|---|
j | Step number. Starts at 1, increases monotonically. |
T | Temperature measure, starts at an arbitrary high value, decreases each step by the annealing schedule factor. |
Suc | Number of successful mutations in this temperature step. Limited by the minimum and maximum number of successes. The number of successes should slowly decrease, the first couple of steps should be terminated by the maximum number of successes criterion. If instead the maximum number of iterations are done, or the number of successes drops suddenly by a large amount, the system should probably be cooled more slowly. |
Eva | Accumulated number of function evaluations. |
CPU | Elapsed wall-clock time since the annealing procedure was started. |
F | The best target function value obtained so far. |
Pen | Accumulated penalty value of the best target function. |
The best chi^2 values | For each fitted curve, the \(\chi^2\) value of the best target function is given. |
SASREFMX additionally outputs the volume fraction of the intact construct for each of the fitted curves:
Associated volume fractions: 0.42157 0.64983
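To monitor a run, the progress lines can be extracted from the log file with a few lines of Python. The sketch below assumes the log lines look exactly like the samples shown above; real output is fixed-width and its spacing may differ, so the pattern may need adjusting. The names `parse_step` and `parse_values` are hypothetical.

```python
import re

# Pattern for the per-temperature progress line shown above.
STEP_RE = re.compile(
    r"j:\s*(?P<j>\d+)\s+T:\s*(?P<T>\S+)\s+Suc:\s*(?P<suc>\d+)\s+"
    r"Eva:\s*(?P<eva>\d+)\s+CPU:\s*(?P<cpu>\S+)\s+"
    r"F:\s*(?P<F>\S+)\s+Pen:\s*(?P<pen>\S+)"
)


def parse_step(line):
    """Return the fields of a progress line as a dict, or None if no match."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    return {
        "j": int(m["j"]),
        "T": float(m["T"]),
        "Suc": int(m["suc"]),
        "Eva": int(m["eva"]),
        "CPU": float(m["cpu"]),
        "F": float(m["F"]),
        "Pen": float(m["pen"]),
    }


def parse_values(line, label):
    """Return the list of floats that follows 'label', or None if absent."""
    _, sep, tail = line.partition(label)
    if not sep:
        return None
    return [float(tok) for tok in tail.lstrip(": ").split()]


step = parse_step("j: 4 T: 0.729E+01 Suc: 1000 Eva: 12497 "
                  "CPU: 0.208E+03 F:99.4301 Pen: 13.803")
chi2 = parse_values("The best chi^2 values:11.64871 5.96331",
                    "The best chi^2 values")
fractions = parse_values("Associated volume fractions: 0.42157 0.64983",
                         "Associated volume fractions")
```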
sasrefcv/sasrefmx Input Files
Three compulsory configuration files must be created, containing information about:
- the scattering data (data control file),
- the subunits, i.e. the rigid bodies (subunits control file),
- the contribution of each subunit to each scattering curve (cross-correlation file).
Data control file
The data control file (data.con here; named curves.con in the examples below) has the following format:
- The first line contains one integer K (total number of scattering curves for SASREFCV and total number of scattering curves+1 for SASREFMX)
- K lines, each containing 8 parameters separated by white space which are related to the scattering data set
These parameters are, in order:
Description | Valid values |
---|---|
File name with experimental SAS data (.dat) | |
D2O fraction in the solvent or -1.0, if X-ray scattering data | -1.0, [0.0-1.0] |
Symmetry for the given construct at the given conditions (which may be different from the overall symmetry) | Pn, Pn2, with n=1,…,6 |
Angular units: 1 = \(4 \pi \sin(\theta)/\lambda\) in \(\textrm{\AA}^{-1}\), 2 = \(4 \pi \sin(\theta)/\lambda\) in \(\textrm{nm}^{-1}\) | [1,2] |
Fraction of the curve to be fitted | [0.1-1.0] |
Setting number: number of the column in the optional “Resolution file” containing the information for smearing. This value must be 0 for X-ray curves and for the neutron curves, for which smearing information is not available. Neutron scattering curves with the same non-zero setting number must have the same angular axis and number of experimental points. | [0-15] |
Weight of the curve in the target function. | [0.0-1.0] |
If Y, a constant background will be automatically adjusted for this curve (this could be useful for example to correct for incoherent background in neutron data) | Y or N |
The last line in case of SASREFMX describes the dissociation products and contains all dummy values except for the symmetry.
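As an illustration of the format, a data control file can be assembled with a small Python helper. The helper names `curve_line` and `data_control` are hypothetical; the defaults mirror the values used in the examples of this manual (X-ray data, P1, angular units 1, full curve, no smearing, weight 1.0, background adjustment on), and the range checks follow the table above.

```python
def curve_line(filename, d2o=-1.0, symmetry="P1", units=1, fraction=1.0,
               setting=0, weight=1.0, background="y"):
    """Format one 8-parameter line of the data control file."""
    if d2o != -1.0 and not 0.0 <= d2o <= 1.0:
        raise ValueError("D2O fraction must be -1.0 (X-ray) or within [0, 1]")
    if units not in (1, 2):
        raise ValueError("angular units must be 1 or 2")
    if not 0.1 <= fraction <= 1.0:
        raise ValueError("fraction of the curve must be within [0.1, 1.0]")
    if not 0 <= setting <= 15:
        raise ValueError("setting number must be within [0, 15]")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be within [0, 1]")
    if background.upper() not in ("Y", "N"):
        raise ValueError("background flag must be Y or N")
    return (f"{filename} {d2o:.2f} {symmetry} {units} "
            f"{fraction} {setting} {weight} {background}")


def data_control(lines):
    """Prefix the curve lines with their count K and join them."""
    return "\n".join([str(len(lines))] + list(lines)) + "\n"


# SASREFMX case: one measured curve plus the extra line for dissociation products.
text = data_control([
    curve_line("weak_tetramer.dat", symmetry="P222"),
    curve_line("dummy.dat"),
])
```

For SASREFMX the extra last line is counted in K, which is why the helper simply counts every line it is given.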
Subunits control file
The subunits control file (subs.con here; named subunits.con in the examples below) describes the rigid bodies. Its format is the following:
- The first line contains one integer M (total number of subunits)
- M lines, each containing 4 parameters related to the subunit:
These parameters are, in order:
Description | Valid values |
---|---|
Model file name; coordinates in PDB or mmCIF format | |
Whether to shift the subunit to the origin during the initialization phase | Y or N |
Movement limitations: N = no limitations; F = subunit will be fixed; X = rotations/translations along X axis only; Y = rotations/translations along Y axis only; Z = rotations/translations along Z axis only; D = rotations/translations along (1,1,1) vector only | [N/F/X/Y/Z/D] |
Symmetry applied to the given subunit (may be different from the overall symmetry) | Pn, Pn2, with n=1,…,6 |
Cross-correlation file
The cross-correlation file (table.con in the following examples) contains a table which sets the relationship between the subunits and the scattering profiles. The number of its columns equals the total number of subunits (M) and the number of its rows equals the total number of scattering curves (K). The value in the i-th column and j-th row gives the contribution of the i-th subunit to the j-th scattering data set. For a SANS curve this value (-1.0 or within [0.0-1.0]) is the subunit perdeuteration (D2O content in the solution where the protein is expressed), whereby -1.0 means that the given subunit is not present in the corresponding construct. For an X-ray scattering curve, 0.0 is to be used if the subunit is present.
In case of SASREFMX, the last row describes the dissociation products, which contribute to all the curves. Only -1, 0 or a positive integer is allowed here: -1 means that the subunit is not among the dissociation products; 0 means that the subunit is part of a sub-complex that is a dissociation product of a larger assembly (at most one such sub-complex is allowed); a positive integer gives the molar ratio (stoichiometry) of the subunit in the original (fully dissociated) sample.
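Because the three configuration files must agree in their dimensions, a quick consistency check before starting a run can save time. The Python sketch below is a hypothetical helper, not part of ATSAS. It checks that the table has K rows and M columns and that every value is in the range described above, treating the last row specially when `mixture=True` (the SASREFMX case).

```python
def parse_table(text):
    """Read a cross-correlation table into a list of rows of floats."""
    return [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]


def check_table(rows, n_curves, n_subunits, mixture=False):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if len(rows) != n_curves:
        problems.append(f"expected {n_curves} rows, found {len(rows)}")
    for j, row in enumerate(rows, start=1):
        if len(row) != n_subunits:
            problems.append(f"row {j}: expected {n_subunits} columns, "
                            f"found {len(row)}")
            continue
        is_products_row = mixture and j == len(rows)
        for i, value in enumerate(row, start=1):
            if is_products_row:
                # -1, 0 or a positive integer (molar ratio)
                ok = value == -1.0 or (value >= 0.0 and value == int(value))
            else:
                # -1.0 (absent) or a perdeuteration within [0, 1]
                ok = value == -1.0 or 0.0 <= value <= 1.0
            if not ok:
                problems.append(f"row {j}, column {i}: invalid value {value}")
    return problems


# Transient heterodimer: three curves plus the dissociation products row.
rows = parse_table("0.0 0.0\n0.0 0.0\n0.0 0.0\n1.0 1.0\n")
problems = check_table(rows, n_curves=4, n_subunits=2, mixture=True)
```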
Distance restraints
Distance restraints may be imposed on the model using an optional contacts conditions file in the following format:
dist 7.0
1 0 0 2 1 1
dist 5.0
2 0 0 3 1 1
dist 7.0
1 342 342 2 25 25
1 350 350 2 17 17
dist 6.0
1 290 297 2 64 79
dist 7.0
1 1 0 3 1 0
“dist 7.0” means that the minimum distance between the CA atoms of the residues (or the P atoms of the nucleotides) specified in the following lines should not exceed 7 \(\text{\AA}\). Each line without the keyword “dist” contains six numbers: the first and the fourth are the ordinal numbers of the two subunits in contact; the second and third delimit the range of residues/nucleotides in the first subunit, and the fifth and sixth delimit the range in the second subunit. The restraint is satisfied if any residue/nucleotide of the first range is in contact with any residue/nucleotide of the second range. 0 means the last residue/nucleotide of the subunit. If two (or more) alternatives are given after a line with the keyword “dist”, the program compares the smallest distance among them with the specified one.
Please refer to the SASREF manual for more details.
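The grouping of contact lines under each “dist” keyword can be made explicit with a short Python parser. This is a sketch based only on the format described above; the function name and the dictionary keys are hypothetical. The value 0 (last residue/nucleotide) is kept as written, since resolving it requires the length of the subunit.

```python
def parse_contacts(text):
    """Group contact lines under their preceding 'dist' keyword."""
    restraints = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0].lower() == "dist":
            # A new restraint: maximum allowed distance in angstroms.
            restraints.append({"max_distance": float(tokens[1]),
                               "alternatives": []})
            continue
        if not restraints:
            raise ValueError("contact line found before any 'dist' line")
        sub_a, first_a, last_a, sub_b, first_b, last_b = map(int, tokens)
        restraints[-1]["alternatives"].append({
            "subunit_a": sub_a, "range_a": (first_a, last_a),
            "subunit_b": sub_b, "range_b": (first_b, last_b),
        })
    return restraints


sample = """dist 7.0
1 0 0 2 1 1
dist 5.0
2 0 0 3 1 1
dist 7.0
1 342 342 2 25 25
1 350 350 2 17 17
dist 6.0
1 290 297 2 64 79
dist 7.0
1 1 0 3 1 0
"""
restraints = parse_contacts(sample)
```

In the sample, the third restraint carries two alternatives, so it is satisfied if either residue pair comes within 7 angstroms.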
Important (new as of ATSAS-4.0): there is a difference in the numbering of the subunits compared to previous versions in the presence of symmetry. First, all symmetry mates of the first subunit appear, then of the second and so on until the last one.
sasrefcv/sasrefmx Output Files
After each simulated annealing step, SASREFCV and SASREFMX create a set of output files. Each file name starts with a customizable prefix followed by an extension. If a prefix has been used before, existing files are overwritten without notice.
Extension | Description |
---|---|
.log | A copy of the screen output |
.pdb or .cif | The current model of the entire complex in either PDB or mmCIF format, depending on the model-format option. |
-i.fit | Fit of the scattering curve computed from the i-th (sub-)complex, where i is the number of the construct. |
Examples
Building a Complex against X-ray and Contrast Variation SANS Data Sets
A simulated example of T7 DNA Polymerase Ternary Complex with DCTP (PDB entry 1t8e). Files containing the atomic coordinates of the three subunits are:
phsave1.pdb - Polymerase
phsave2.pdb - DCTP
phsave3.pdb - DNA
Simulated SAS data contain 17 curves in total: 2 X-ray profiles (from the entire complex and from the binary construct without DNA) + 15 neutron scattering curves from the complex [(series of D2O content: 0, 40, 55, 70 and 100% D2O) times (3 perdeuterations of DCTP: 0, 50 and 100%)]:
File name | Description | D2O content |
---|---|---|
x-prot.dat | X-ray protein complex | – |
x-compl.dat | X-ray, ternary complex | – |
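complh_0.dat | ternary complex with protonated DCTP | 0% |
complh_40.dat | ternary complex with protonated DCTP | 40% |
complh_55.dat | ternary complex with protonated DCTP | 55% |
complh_70.dat | ternary complex with protonated DCTP | 70% |
complh_100.dat | ternary complex with protonated DCTP | 100% |
compl50d_0.dat | 50% deuterated DCTP | 0% |
compl50d_40.dat | 50% deuterated DCTP | 40% |
compl50d_55.dat | 50% deuterated DCTP | 55% |
compl50d_70.dat | 50% deuterated DCTP | 70% |
compl50d_100.dat | 50% deuterated DCTP | 100% |
compl100d_0.dat | fully deuterated DCTP | 0% |
compl100d_40.dat | fully deuterated DCTP | 40% |
compl100d_55.dat | fully deuterated DCTP | 55% |
compl100d_70.dat | fully deuterated DCTP | 70% |
compl100d_100.dat | fully deuterated DCTP | 100% |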
Content of the curves.con file:
17
x-prot.dat -1.00 P1 1 1.0 0 1.0 y
x-compl.dat -1.00 P1 1 1.0 0 1.0 y
complh_0.dat 0.00 P1 1 1.0 0 1.0 y
complh_40.dat 0.40 P1 1 1.0 0 1.0 y
complh_55.dat 0.55 P1 1 1.0 0 1.0 y
complh_70.dat 0.70 P1 1 1.0 0 1.0 y
complh_100.dat 1.00 P1 1 1.0 0 1.0 y
compl50d_0.dat 0.00 P1 1 1.0 0 1.0 y
compl50d_40.dat 0.40 P1 1 1.0 0 1.0 y
compl50d_55.dat 0.55 P1 1 1.0 0 1.0 y
compl50d_70.dat 0.70 P1 1 1.0 0 1.0 y
compl50d_100.dat 1.00 P1 1 1.0 0 1.0 y
compl100d_0.dat 0.00 P1 1 1.0 0 1.0 y
compl100d_40.dat 0.40 P1 1 1.0 0 1.0 y
compl100d_55.dat 0.55 P1 1 1.0 0 1.0 y
compl100d_70.dat 0.70 P1 1 1.0 0 1.0 y
compl100d_100.dat 1.00 P1 1 1.0 0 1.0 y
Content of the subunits.con file:
3
phsave1.pdb Y F P1
phsave2.pdb Y N P1
phsave3.pdb Y N P1
Content of the table.con file:
0.0 0.0 -1.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.5 0.0
0.0 0.5 0.0
0.0 0.5 0.0
0.0 0.5 0.0
0.0 0.5 0.0
0.0 1.0 0.0
0.0 1.0 0.0
0.0 1.0 0.0
0.0 1.0 0.0
0.0 1.0 0.0
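Since the 17 rows of curves.con and table.con follow a regular pattern (two X-ray curves, then five D2O contents for each of three DCTP perdeuterations), they can be generated rather than typed by hand. The Python sketch below reproduces the listings above as strings; the variable names are arbitrary and writing the strings to disk is left to the reader.

```python
d2o_percent = [0, 40, 55, 70, 100]
# (file name prefix, DCTP perdeuteration) for the three neutron series
dctp_series = [("complh", 0.0), ("compl50d", 0.5), ("compl100d", 1.0)]

# Two X-ray curves: binary construct without DNA, then the ternary complex.
curve_rows = ["x-prot.dat -1.00 P1 1 1.0 0 1.0 y",
              "x-compl.dat -1.00 P1 1 1.0 0 1.0 y"]
table_rows = ["0.0 0.0 -1.0",   # DNA (third subunit) absent
              "0.0 0.0 0.0"]

for prefix, deuteration in dctp_series:
    for percent in d2o_percent:
        name = f"{prefix}_{percent}.dat"
        curve_rows.append(f"{name} {percent / 100:.2f} P1 1 1.0 0 1.0 y")
        # Polymerase and DNA are protonated; only DCTP perdeuteration varies.
        table_rows.append(f"0.0 {deuteration:.1f} 0.0")

curves_con = "\n".join([str(len(curve_rows))] + curve_rows) + "\n"
table_con = "\n".join(table_rows) + "\n"
```

Generating both files in the same loop keeps the row order of table.con aligned with the curve order in curves.con, which the format requires.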
A listing of questions/answers for a sample run in the USER mode is as follows:
Computation mode (User or Expert) ...... < User >:
Log file name .......................... < .log >: test1
Enter project description .............. : T7 DNA POLYMERASE WITH DCTP, 17 curves
Symmetry: Pn(2) (n=2-6) ................ < P1 >:
File name with curves info ............. < .con >: curves
File name with smearing parameters ..... < .res >:
File name with subunits info ........... < .con >: subunits
File name with cross-dependencies ...... < .con >: table
File name, contacts conditions, CR for none < .cnd >:
Expected particle shape: <P>rolate, <O>blate,
or <U>nknown .......................... < Unknown >:
...
Quaternary Structure Analysis of a Weak Tetramer
Let weak_tetramer.dat be a SAXS profile from a polydisperse sample containing a tetramer with P222 symmetry and its monomer (i.e. oligomeric equilibrium). The atomic structure of the latter is contained in monomer.pdb. Content of the curves.con file is then:
2
weak_tetramer.dat -1.00 P222 1 1.0 0 1.0 y
dummy.dat -1.00 P1 1 1.0 0 1.0 y
Content of the subunits.con file:
1
monomer.pdb Y N P222
Content of the table.con file:
0.0
0.0
Modelling of a Transient Heterodimer
An equimolar mixture of proteins A and B yields an equilibrium between the AB complex and its components in the unbound state. Concentration series with distinct volume fractions of the intact complex can be fitted simultaneously. Content of the configuration files in this case is as follows. curves.con :
4
transient_c1.dat -1.00 P1 1 1.0 0 1.0 y
transient_c2.dat -1.00 P1 1 1.0 0 1.0 y
transient_c3.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat -1.00 P1 1 1.0 0 1.0 y
subunits.con :
2
a.pdb Y N P1
b.pdb Y N P1
table.con :
0.0 0.0
0.0 0.0
0.0 0.0
1.0 1.0
Complex with Excess of a Component in Solution
If an excess of subunit C is needed for the formation of a stable complex ABC, the resulting sample will be polydisperse. Content of the configuration files in this case is as follows. curves.con :
2
mixture.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat -1.00 P1 1 1.0 0 1.0 y
subunits.con :
3
a.pdb Y N P1
b.pdb Y N P1
c.pdb Y N P1
table.con :
0.0 0.0 0.0
-1.0 -1.0 0.0
Presence of a Sub-Complex in the Mixture
If subunit C is lacking (e.g. in a stoichiometric study), some amount of the sub-complex AB might be present in the mixture with the ternary ABC. Content of the configuration files is then as follows. curves.con :
2
mixture.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat -1.00 P1 1 1.0 0 1.0 y
subunits.con :
3
a.pdb Y N P1
b.pdb Y N P1
c.pdb Y N P1
table.con :
0.0 0.0 0.0
0.0 0.0 -1.0