Manual

The following sections briefly describe the method implemented in MONSA, usage in dialog mode as well as the required input and the produced output files.

Introduction

MONSA is an extended version of DAMMIN for multiphase bead modelling which allows one to fit simultaneously multiple curves (e.g. from X-ray and/or neutron contrast variation series).

Running MONSA

MONSA reads in multiple data sets and information about the contrasts and volume fractions of the phases in a particle. The program can simultaneously fit data recorded at different instrumental settings and also with different radiations (e.g. X-rays and neutrons). The structure of the input data is therefore somewhat complicated. The program requires:

  • a MASTER file (*.mst) containing the general phase information and references to CONTROL file(s);
  • CONTROL file(s) (*.con) containing the smearing information for the given setting, information about contrasts and references to DATA files (*.dat);
  • DATA files (*.dat) containing raw experimental data at different contrasts;
  • a PDB-like file defining the number of phases and the SEARCH VOLUME for the model.

Command-Line Arguments and Options

MONSA recognizes the following command-line options:

Option Description
--model-format=<FMT> Format of 3D models, one of: cif, pdb (default: cif)
--help Print a summary of arguments, options, and exit.
--version Print the ATSAS version and exit.

Interactive Configuration

Screen Text Default Description
Log file name: N/A An identifier (up to six characters) to define all the output files names
Project description: N/A Text description of the problem
Master file name: N/A Name of the master file
Maximum order of harmonics: 14 The more harmonics, the more accurate the reconstruction becomes, but the slower the process. May be between 5 and 20
DAM coordinates file name: N/A Name of the Search Volume file generated by BODIES.
Symmetry: Pn or Pn2 (n=1,2,3,4,5,6): P1 Specify the symmetry to enforce on the particle.
Reset (unfix) all atoms [ Y / N ]: No If ‘Y’, the phase indices allowed for the atoms in the pdb file are set to.
Atomic radius: var If the file is prepared by BODIES, the value is read from the file.
Atomic volume: var This is (4/3)πr^3^ / 0.74 (volume per sphere for dense packing).
Preference for non-solvent contacts: 0.3 With a value of 0.0, the phase of the atom (solvent or protein) does not influence the looseness penalty weight. When this value is increased, non-solvent contacts are preferred through the calculation of the looseness penalty weight. If unsure, use the default value.
Looseness penalty weight: 50 How much the Looseness Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value. If sharp edges are observed instead of smooth surfaces, try decreasing this penalty weight.
Discontiguity penalty weight: 50 How much the Discontiguity Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value.
Randomize the initial DAM [ Y / N ] Yes If ‘Y’, the starting model is randomized
Fix the overall scale factor [ Y / N ] No If No (recommended), the overall scale factor, as well as the individual relative scale factors for all the data sets, will be determined automatically. If the scale factor is known (data on absolute scale), it may be fixed and entered manually.
Volume fraction penalty weight 50 How much the Volume Fraction Penalty should influence the acceptance or rejection of phase changes.
Rg penalty weight 0.0 How much the radius of gyration penalty should influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty.
Center penalty weight (negative = WeiPer): 0.0 How much the Center Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value.
Initial annealing temperature : 10 If the value is too high, it could take ages for the system to cool down. If the value is too low, the system can be trapped in a local minimum. If unsure use the default value.
Annealing schedule factor : 0.9 Factor by which the temperature is decreased; 0.95 is a good average value. Faster cooling for smaller systems is possible (0.9), but slower cooling (0.99) needs to be applied more often.
Max # of iteration at each T: var Finalize temperature step and cool after this many iterations at the latest.
Max # of successes at each T: var Finalize temperature step and cool after at most this many successful phase changes.
Min # of successes to continue: var Stop if not at least this many successful state changes within a single temperature step can be done.
Number of annealing steps: 100 Stop after this number of steps if the system has not cooled down before.
Plot the final fits [ Y / N ]: No Display the final fits.
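The atomic volume quoted in the table follows directly from the atomic radius. A minimal sketch of that calculation is given below; the function name `atomic_volume` is hypothetical and not part of MONSA.

```python
import math

# Packing fraction of densely packed equal spheres, as quoted in the table.
PACKING_FRACTION = 0.74

def atomic_volume(radius):
    """Volume per dummy atom: sphere volume divided by the packing fraction."""
    return (4.0 / 3.0) * math.pi * radius ** 3 / PACKING_FRACTION
```

The result is in the cube of whatever length unit the radius is given in.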

Runtime Output

On runtime, two lines of output will be generated for each temperature step:

jAnn: 1 T: 0.100E+02 iSuc: 11718 nEva: 12542 CPU: 0.4056E+02

SqfVal: 22.8539 Rf: 22.25999 Los: 0.1312 Dis: 0.0464 Sca: 0.342E+01

The fields can be interpreted as follows, top-left to bottom-right:

Field Description
jAnn Step number. Starts at 1, increases monotonically.
T Temperature measure; starts at an arbitrary high value and is multiplied by the annealing schedule factor after each step.
iSuc Number of successful phase changes in this temperature step. The number of successes should decrease slowly; the first few steps should be terminated by the maximum number of successes criterion. If instead the maximum number of iterations per step is reached, or the number of successes suddenly drops by a large amount, the system should probably be cooled more slowly.
nEva Accumulated number of function evaluations.
CPU Elapsed CPU cycles since the annealing procedure was started.
SqfVal Goodness of the model (fit + penalties).
Rf Goodness of fit of simulated data versus experimental data, does not take penalties into account.
Los Contribution of Looseness Penalty, not taking the Looseness Penalty Weight into account.
Dis Contribution of Discontiguity Penalty, not taking the Discontiguity Penalty Weight into account.
Sca Scale factor
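For monitoring a run, the two lines printed per temperature step can be read into a dictionary keyed by field name. The sketch below is not part of MONSA: the names `parse_step` and `expected_temperature` are hypothetical, and the pattern assumes every field is printed as `Name: value` as in the sample above. The second function reproduces the geometric cooling implied by the annealing schedule factor.

```python
import re

# Matches "Name: value" pairs such as "T: 0.100E+02" or "iSuc: 11718".
PAIR = re.compile(r"(\w+):\s*([-+]?[0-9.]+(?:[Ee][-+]?[0-9]+)?)")

def parse_step(line1, line2):
    """Merge the 'Name: value' pairs of both output lines into one dict of floats."""
    return {name: float(value) for name, value in PAIR.findall(line1 + " " + line2)}

def expected_temperature(t0, factor, step):
    """Temperature at a 1-based step number for a geometric cooling schedule."""
    return t0 * factor ** (step - 1)

# The sample output shown above.
step = parse_step(
    "jAnn: 1 T: 0.100E+02 iSuc: 11718 nEva: 12542 CPU: 0.4056E+02",
    "SqfVal: 22.8539 Rf: 22.25999 Los: 0.1312 Dis: 0.0464 Sca: 0.342E+01",
)
print(step["T"], step["iSuc"], step["Rf"])
```

Comparing `step["T"]` with `expected_temperature(10, 0.9, step["jAnn"])` is a quick check that a log was produced with the default initial temperature and schedule factor.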

MONSA Input Files

Master File

The master file contains the general phase information: volumes of the different phases, radii of gyration, connectivity etc. It has the following structure:


 Line 1    Title (up to 80 characters)
 Line 2    Four theoretical volumes
           of individual phases (required)
 Line 3    Four theoretical radii of gyration in Ångström (even if your data are in nm-1, see Control File)
           of individual phases (optional)
 Line 4    Connectivity indicators of phases (required):
           '1' for 'interconnected', '0' for 'disconnected', '-1' for 'symmetry defined'
 Line 5    Control file name and Npts for Guinier fit
           (no fit if the latter is equal to '-1')
  ... OPTIONAL ...
 Line 6    Control file name and Npts for Guinier fit
           (no fit if the latter is equal to '-1')
...

  etc      Erroneous lines skipped; read to the end

The program works with up to four-component particles. If the number of components (phases) is less than four, just put zeroes for the values required for this phase.
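As an illustration of this layout, the following sketch assembles the text of a master file and pads the per-phase values with zeroes up to four phases. It is a hypothetical helper, not an ATSAS tool: the name `master_file_text` is invented here, the optional line with the radii of gyration is always written, and the control file name is put in single quotes as in the test example of this manual.

```python
def master_file_text(title, volumes, rgs, connectivity, control_files):
    """Build master file text; per-phase lists are padded with zeroes to four entries."""
    def pad(values):
        values = list(values)
        if len(values) > 4:
            raise ValueError("at most four phases are supported")
        return values + [0] * (4 - len(values))

    lines = [title[:80]]                                           # Line 1: title
    lines.append(" ".join(f"{v:g}" for v in pad(volumes)))         # Line 2: volumes
    lines.append(" ".join(f"{v:g}" for v in pad(rgs)))             # Line 3: radii of gyration
    lines.append(" ".join(str(int(c)) for c in pad(connectivity))) # Line 4: connectivity
    for name, npts in control_files:                               # Line 5 onwards
        lines.append(f"'{name}' {npts}")
    return "\n".join(lines) + "\n"

text = master_file_text(
    "Two-phase test", [3.7e5, 8.7e5], [49.0, 61.0], [0, 1], [("test.con", 10)]
)
print(text)
```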

Control File

The control file contains the smearing information for the given setting, information about contrasts and references to the data file. It has the following structure:


 Line 1    Resolution file name, resolution setting number (free format)
 Line 2    Output file name for the fits (not used)        (free format)
 Line 3    Title                                           (character*80)
 Line 4    Number of points in the setting                 (free format)
           (put negative number to indicate nm-1 as angular units)
 Line 5    Data file name, contrasts and constants         (free format)
  etc      Erroneous lines skipped; read to the end

The information about the data sets is given in the format:


Filename    Dro1        Dro2       Dro3       Dro4      Mult  Const   Weight

Field Description
Filename Filename of the scattering pattern (up to 15 characters).
Dron Contrast of the nth phase.
Mult The scattering pattern is multiplied by this factor after constant subtraction.
Const Constant subtracted from the scattering pattern.
Weight Relative weight of the data set.
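The order of operations for Const and Mult matters: the constant is subtracted first and the result is then multiplied. The sketch below splits one data set line into its fields and applies both operations. The function names are hypothetical, and any fields after Weight are simply ignored by this sketch.

```python
import shlex

def parse_data_set_line(line):
    """Split a data set line into filename, four contrasts, Mult, Const and Weight."""
    fields = shlex.split(line)  # shlex removes the quotes around the filename
    return {
        "filename": fields[0],
        "contrasts": [float(x) for x in fields[1:5]],
        "mult": float(fields[5]),
        "const": float(fields[6]),
        "weight": float(fields[7]),
    }

def apply_const_and_mult(intensities, const, mult):
    """Subtract Const from every intensity, then multiply the result by Mult."""
    return [(value - const) * mult for value in intensities]

entry = parse_data_set_line("'0r1.dat'  2.00  4.00  0.00  0.00  1.000  0.0  1.00")
print(entry["filename"], entry["contrasts"])
print(apply_const_and_mult([10.0, 4.0], const=2.0, mult=0.5))
```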

Smearing

If required, MONSA smears the theoretical curves using the resolution function introduced by J. Skov Pedersen et al. (1990), J. Appl. Cryst., 23, 321. Several subroutines for data smearing were provided by J. Skov Pedersen and modified for use in MONSA. The resolution file must have the following format (the numbers describe a setting at the RISOE SANS instrument):

Row Value Description
1 0.8 Effective collimation slit diameter in cm.
2 0.35 Effective sample diameter in cm.
3 300 Collimation distance in cm.
4 105 Sample-detector distance in cm.
5 3 Wavelength λ in Å.
6 0.18 Relative wavelength spread Δλ/λ.
7 1.1 Pixel size in cm.
8 0.0000 Averaging error (accounted for in Pixel size).

[Figure: Diagram of a SANS instrument showing the lengths required for the ill.res file]

If the file is corrupted or does not exist, no smearing is performed. An example of the resolution file is given below. The resolution setting number is the column number in the resolution file.


 0.00001, 0.00001,   0.00001 , 0.8    , 0.8
 0.00001, 0.00001,   0.00001 , 0.30   , 0.35
 1100.  , 200.   ,    100.   , 300.   , 100
  180.  , 125.   ,    100.   , 110.   , 100
  6.0   ,  5.6   ,     1.    ,  3.22  , 6.
  0.10  ,  0.09  ,    0.01   , 0.18   , 0.18
 0.0001 ,  1.57  ,    0.01   , 1.1    , 1.1
 0.0000 , 0.0000 ,    0.0000 , 0.0000 , 0.0000
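To inspect which parameters a given resolution setting number selects, the file can be read column by column. The following is a minimal sketch rather than MONSA's own reader: the name `read_setting` and the dictionary keys are invented for illustration, and the setting number is assumed to count columns starting from 1.

```python
# One key per row of the resolution file, in the order of the table above.
ROW_NAMES = [
    "collimation_slit_diameter_cm",
    "sample_diameter_cm",
    "collimation_distance_cm",
    "sample_detector_distance_cm",
    "wavelength_angstrom",
    "wavelength_spread",
    "pixel_size_cm",
    "averaging_error",
]

def read_setting(text, setting):
    """Return the eight parameters of one setting (1-based column number, assumed)."""
    rows = [line.split(",") for line in text.splitlines() if line.strip()]
    if len(rows) < len(ROW_NAMES):
        raise ValueError("a resolution file needs eight rows")
    return {name: float(row[setting - 1]) for name, row in zip(ROW_NAMES, rows)}

EXAMPLE = """\
 0.00001, 0.00001,   0.00001 , 0.8    , 0.8
 0.00001, 0.00001,   0.00001 , 0.30   , 0.35
 1100.  , 200.   ,    100.   , 300.   , 100
  180.  , 125.   ,    100.   , 110.   , 100
  6.0   ,  5.6   ,     1.    ,  3.22  , 6.
  0.10  ,  0.09  ,    0.01   , 0.18   , 0.18
 0.0001 ,  1.57  ,    0.01   , 1.1    , 1.1
 0.0000 , 0.0000 ,    0.0000 , 0.0000 , 0.0000
"""
print(read_setting(EXAMPLE, 4))
```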

Data Files

The experimental data files must have the following structure:


            1st line     - comment
            2nd line etc - s, I(s), Err(s) in free format

where s = 4π sin(θ)/λ in Å^-1^, I(s) is the experimental intensity and Err(s) is the standard deviation.
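A data file in this layout can be read with a few lines of code, for example to check the number of points before running MONSA. The sketch below is illustrative only; `read_data` and `momentum_transfer` are hypothetical names, and silently skipping rows that do not contain three numbers is a choice of this sketch, not documented MONSA behaviour.

```python
import math

def read_data(text):
    """Parse a data file: one comment line, then rows of s, I(s), Err(s)."""
    rows = []
    for line in text.splitlines()[1:]:  # the first line is a comment
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            rows.append(tuple(float(p) for p in parts[:3]))
        except ValueError:
            continue  # not a numeric row; skipped in this sketch
    return rows

def momentum_transfer(theta, wavelength):
    """s = 4*pi*sin(theta)/lambda, with theta in radians."""
    return 4.0 * math.pi * math.sin(theta) / wavelength

sample = "comment line\n .600000E-02 .176494E+14 .504240E+12\n"
print(read_data(sample))
```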

Search Volume File

The input file defining the search volume is a PDB-like file containing the coordinates of dummy atoms with extra “phase” information indicating which phase each atom belongs to. The file looks like this:


0         1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123456

ATOM      1  CA  ASP    1      -17.000 -16.957-101.666  1.00 20.00   3 3012
ATOM      2  CA  ASP    1      -17.000   -.957-101.666  1.00 20.00 1 3 3012
ATOM      3  CA  ASP    1      -17.000  15.043-101.666  1.00 20.00 0 1 3012
ATOM      4  CA  ASP    1       -1.000 -16.957-101.666  1.00 20.00   2 3012
ATOM      5  CA  ASP    1       -1.000   -.957-101.666  1.00 20.00   1 202

The characters 1 to 65 in a line are as in a normal PDB file. The remaining columns carry the phase information (column numbers refer to the ruler above the example):

  • Column 67 (iCore): if ‘1’, the phase of this atom is fixed and will not be changed during the search (“core atom”); if iCore is ‘0’ or ‘ ’, the phase of the atom is free to change. The core indicators may be re-computed automatically when loading the model, so that iCore is set to 1 if an atom is surrounded only by atoms of the same phase. In this case, the program changes only interface atoms. This option may be useful if a preliminary model is available.
  • Column 69 (iPhas): the phase index of the atom (its ordinal number in the iAllo array).
  • Column 71 (nAllo): the number of phases allowed for the given atom.
  • Columns 72 etc. (iAllo): the indices of the allowed phases, such that iAllo(iPhas) is the phase of the atom.

This system allows one, if required, to select the phases which can occupy any given point. In the above example of a two-phase system:


Atom 1: free  atom of phase 2
Atom 2: fixed atom of phase 2
Atom 3: free  atom of phase 0 (solvent)
Atom 4: free  atom of phase 1
Atom 5: free  atom of phase 0 (solvent; could be only solvent or phase 2)
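The decoding of these columns can be sketched in a few lines. This is an illustration under stated assumptions, not MONSA code: the function name `parse_phase_fields` is hypothetical, columns are counted from 0 as in the ruler above the example, and every field is assumed to be a single digit.

```python
def parse_phase_fields(line):
    """Decode iCore, iPhas, nAllo and iAllo from one ATOM line of a search volume file."""
    line = line.rstrip("\n").ljust(72)
    fixed = line[67] == "1"        # iCore: '1' = fixed, '0' or blank = free
    i_phas = int(line[69])         # 1-based position within the iAllo list
    n_allo = int(line[71])         # number of allowed phases
    allowed = [int(ch) for ch in line[72:72 + n_allo]]
    if len(allowed) != n_allo or not 1 <= i_phas <= n_allo:
        raise ValueError("inconsistent phase fields")
    return {"fixed": fixed, "phase": allowed[i_phas - 1], "allowed": allowed}

# Atom 2 of the example. The ordinary PDB part is cut or padded to 66 characters
# so that the phase fields start at the expected column.
pdb_part = "ATOM      2  CA  ASP    1      -17.000   -.957-101.666  1.00 20.00"
print(parse_phase_fields(pdb_part[:66].ljust(66) + " 1 3 3012"))
```

Applied to the five example atoms, this decoding gives the phases and fixed/free states listed above.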

In most cases, however, the user does not need to learn the structure of this file. The program BODIES can generate an ellipsoidal (or spherical) search volume for a given number of phases and a given number of dummy atoms. In the general case, one can always use a spherical search volume with a diameter equal to Dmax, as in DAMMIN. MONSA automatically determines the number of phases in the search model when reading this file. The number of dummy atoms in the search volume must not exceed 10000!

The distribution package includes an example of a batch file containing the required answers. Typing monsa144 < test.ans runs the structure determination for the supplied example in batch mode (this may take a day of CPU time on a PC!). The example is taken from the article (a 30S ribosomal subunit-like particle with simulated proteins inside). The model is given in the file model.pdb (phase 1 - proteins, phase 2 - RNA) and the initial search volume in the file sph105-2.pdb (a sphere with a diameter of 210 Å, two-phase system; generated by BODIES). The scattering curves *.dat were computed from this model (see the example control file) and randomized.

NOTE that for any solution obtained using this method, an enantiomorph would yield the same scattering patterns! It was also observed (quite seldom) in test examples that one phase was enantiomorphous whereas the others were not.

MONSA Output Files

With each successful run, MONSA creates a set of output files. Each filename consists of a customizable prefix with an extension appended. If a prefix has been used before, existing files are overwritten without warning.

Extension Description
.log Contains the same information as the screen output and is updated during execution of the program.
-0.pdb This pdb file contains the beads of the solvent (a.k.a. the search volume).
-1.pdb : -n.pdb These pdb files contain the beads of each individual phase.
.pdb This pdb file contains the beads of all the phases and the solvent (a.k.a. the search volume). The beads of the different phases and the solvent are distinguished by their chain number. The header of the file contains information about the application used and about invariants of the particle, e.g. R~g~, volume and molecular mass of the particle.
.fit Fit of the simulated scattering curve versus a smoothed-out version of the real data. Columns in the output file are: ‘s’, ‘c.I~exp~’, ‘c.ErrI~exp~’ and ‘I~FIT~’.

Generating a Search Volume

In previous releases two helper applications, DAMESV and DAMEMB, were included to generate suitable search volumes for MONSA. This functionality was integrated into the search-volume mode of BODIES.

Example

Master file for the test example: contrast variation simulated data of a 30S ribosomal subunit-like particle consisting of “RNA” (phase 2, density = 4.0) with some “proteins” inside (phase 1; density = 2.0)

Master file for quazi-30S model randomized data to s=0.2
 3.7e5   8.7e5    0.00  0.0              ! Desired Volumes
 49.0     61.0    0.00  0.0              ! Desired Rgs
  0        1      0      0               ! Connectivity
'test.con'    10                         ! Control file name; Rgs will be
                                         ! computed from 10 first points

Control file for the test example


  'Point collimation'   1                             !! No smearing
  'test.fit'                                          !! Output fits
  Test for 30S -- use randomized data up to 0.2       !! Title
   98                                                 !! Number of points
'0r1.dat'    2.00       4.00       0.00     0.00      1.000    0.0    1.00    0
'2r1.dat'    0.00       2.00       0.00     0.00      1.000    0.0    1.00    0
'4r1.dat'   -2.00       0.00       0.00     0.00      1.000    0.0    1.00    0
'6r1.dat'   -4.00      -2.00       0.00     0.00      1.000    0.0    1.00    0
'infr1.dat'  1.00       1.00       0.00     0.00      1.e-6    0.0    1.00    0

Here, the data sets ‘?r1.dat’ correspond to the scattering patterns from the test body in solvents with density 0.0, 2.0, 4.0, 6.0. The set ‘infr1.dat’ corresponds to “shape scattering” (infinite contrast). Note that the test would have worked also without the ‘infinite contrast’ data. Please note:

- filename should be given in quotes (up to 15 characters);
- put zeroes as contrasts for phases, which are not present;
- all files in the setting MUST have the same number of points and the same angular axis; if you have data set(s) on another angular grid(s), put them as another setting(s);
- from each data set, a constant "Const" will be subtracted and the result will be multiplied by "Mult";
- the data sets will be weighted with the relative weight "Weight" in the total discrepancy; reducing the weight is equivalent to increasing errors in the data file;
- number of points must not exceed 2048. Choose the value so that the maximal s value becomes 2.5 nm^-1^.

Example of input data

```
Randomized data, RELERR= 3.00 %, file 0.dat 12-NOV-1998 13:22:35
 .600000E-02 .176494E+14 .504240E+12
 .800000E-02 .168392E+14 .486090E+12
 .100000E-01 .159999E+14 .463710E+12
 ….. SKIPPED FOR BREVITY ……
 .194000E+00 .628594E+10 .184596E+09
 .196000E+00 .582946E+10 .179298E+09
 .198000E+00 .591612E+10 .173796E+09
 .200000E+00 .570405E+10 .168174E+09
```

After the [configuration](#interactive-configuration), the program computes the parameters for the initial state and the simulated annealing procedure starts:

```
 ---  Starting values  ---
 Total scale factor      :    3.51404919007708
 Function value          :    733.688635068192
 Overall discrepancy     :    696.618264908644
 SQRT(Overall discr.)    :    26.3935269509144
 DAM looseness           :   0.137137795235494
 DAM discontiguity       :   6.681519817391703E-002
 Overall penalty         :    37.0703701595471

 jAnn:   1  T: 0.100E+02  iSuc: 11718  nEva:    12513  CPU:  0.4555E+02
 SqfVal: 23.3509  Rf: 22.69190  Los: 0.1314 Dis: 0.0517  Sca: 0.338E+01
 jAnn:   2  T: 0.900E+01  iSuc: 11718  nEva:    25119  CPU:  0.9059E+02
 SqfVal: 22.7818  Rf: 22.15299  Los: 0.1243 Dis: 0.0272  Sca: 0.341E+01
 jAnn:   3  T: 0.810E+01  iSuc: 11718  nEva:    37867  CPU:  0.1366E+03
 SqfVal: 22.5775  Rf: 22.01942  Los: 0.1295 Dis: 0.0268  Sca: 0.327E+01
 jAnn:   4  T: 0.729E+01  iSuc: 11718  nEva:    50732  CPU:  0.1830E+03
 SqfVal: 22.5775  Rf: 22.01942  Los: 0.1295 Dis: 0.0268  Sca: 0.327E+01
 jAnn:   5  T: 0.656E+01  iSuc: 11718  nEva:    63648  CPU:  0.2309E+03
 SqfVal: 22.5775  Rf: 22.01942  Los: 0.1295 Dis: 0.0268  Sca: 0.327E+01
 jAnn:   6  T: 0.590E+01  iSuc: 11718  nEva:    76727  CPU:  0.2778E+03
 SqfVal: 22.3977  Rf: 21.72769  Los: 0.1368 Dis: 0.0467  Sca: 0.330E+01
 jAnn:   7  T: 0.531E+01  iSuc: 11718  nEva:    89852  CPU:  0.3235E+03
 SqfVal: 22.2560  Rf: 21.66409  Los: 0.1292 Dis: 0.0197  Sca: 0.329E+01
 jAnn:   8  T: 0.478E+01  iSuc: 11718  nEva:   103078  CPU:  0.3704E+03
 SqfVal: 21.9930  Rf: 21.34937  Los: 0.1377 Dis: 0.0354  Sca: 0.322E+01
```