monsa
Manual
The following sections briefly describe the method implemented in MONSA, usage in dialog mode as well as the required input and the produced output files.
Introduction
MONSA is an extended version of DAMMIN for multiphase bead modelling which allows one to fit simultaneously multiple curves (e.g. from X-ray and/or neutron contrast variation series).
Running MONSA
MONSA reads in multiple data sets and information about the contrasts and volume fractions of the phases in a particle. The program can simultaneously fit data recorded at different instrumental settings and also with different radiations (e.g. X-rays and neutrons). The structure of the input data is therefore somewhat complicated. The program requires:
- a MASTER file (*.mst) containing the general phase information and references to CONTROL file(s);
- CONTROL file(s) (*.con) containing the smearing information for the given setting, information about contrasts and references to DATA files (*.dat);
- DATA files (*.dat) containing raw experimental data at different contrasts;
- a PDB-like file defining the number of phases and the SEARCH VOLUME for the model.
Command-Line Arguments and Options
MONSA recognizes the following command-line options:
Option | Description |
---|---|
--model-format=<FMT> | Format of 3D models, one of: cif, pdb (default: cif) |
--help | Print a summary of arguments and options, then exit. |
--version | Print the ATSAS version and exit. |
Interactive Configuration
Screen Text | Default | Description |
---|---|---|
Log file name: | N/A | An identifier (up to six characters) used as the prefix of all output file names |
Project description: | N/A | Text description of the problem |
Master file name: | N/A | Name of the master file |
Maximum order of harmonics: | 14 | The more harmonics, the more accurate the reconstruction becomes, but the slower the process. May be between 5 and 20 |
DAM coordinates file name: | N/A | Name of the Search Volume file generated by BODIES. |
Symmetry: Pn or Pn2 (n=1,2,3,4,5,6): | P1 | Specify the symmetry to enforce on the particle. |
Reset (unfix) all atoms [ Y / N ]: | No | If ‘Y’, the phase indices allowed for the atoms in the pdb file are reset, so that all atoms are unfixed. |
Atomic radius: | var | If the file is prepared by BODIES, the value is read from the file. |
Atomic volume: | var | This is $(4/3)\pi r^3 / 0.74$ (volume per sphere for dense packing). |
Preference for non-solvent contacts: | 0.3 | With a value of 0.0, the phase of the atom (solvent or protein) does not influence the looseness penalty weight. When this value is increased, non-solvent contacts are preferred through the calculation of the looseness penalty weight. If unsure, use the default value. |
Looseness penalty weight: | 50 | How much the Looseness Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value. If sharp edges are observed instead of smooth surfaces, try decreasing this penalty weight. |
Discontiguity penalty weight: | 50 | How much the Discontiguity Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value. |
Randomize the initial DAM [ Y / N ] | Yes | If ‘Y’, the starting model is randomized |
Fix the overall scale factor [ Y / N ] | No | If No (recommended), the overall scale factor and the individual relative scale factors for all the data sets are determined automatically. If the scale factor is known (data on absolute scale), it may be fixed and entered manually. |
Volume fraction penalty weight | 50 | How much the Volume Fraction Penalty should influence the acceptance or rejection of phase changes. |
Rg penalty weight | 0.0 | How much the radius of gyration penalty should influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. |
Center penalty weight (negative = WeiPer): | 0.0 | How much the Center Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value. |
Initial annealing temperature : | 10 | If the value is too high, it could take ages for the system to cool down. If the value is too low, the system can be trapped in a local minimum. If unsure use the default value. |
Annealing schedule factor : | 0.9 | Factor by which the temperature is decreased; 0.95 is a good average value. Faster cooling for smaller systems is possible (0.9), but slower cooling (0.99) needs to be applied more often. |
Max # of iteration at each T: | var | Finalize temperature step and cool after this many iterations at the latest. |
Max # of successes at each T: | var | Finalize temperature step and cool after at most this many successful phase changes. |
Min # of successes to continue: | var | Stop if fewer than this many successful phase changes occur within a single temperature step. |
Number of annealing steps: | 100 | Stop after this number of steps if the system has not cooled down earlier. |
Plot the final fits [ Y / N ]: | No | Display the final fits. |
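
Two of the values in the table follow from simple formulas, sketched below in Python. The atomic volume uses the expression given for that prompt. The temperature list assumes plain geometric cooling, that is, each step multiplies the temperature by the annealing schedule factor; MONSA's internal implementation is not shown in this manual, and the function names here are purely illustrative.

```python
import math


def atomic_volume(radius: float, packing: float = 0.74) -> float:
    """Volume per dummy atom for dense sphere packing: (4/3)*pi*r^3 / 0.74."""
    return (4.0 / 3.0) * math.pi * radius ** 3 / packing


def temperature_schedule(t_initial: float = 10.0,
                         factor: float = 0.9,
                         n_steps: int = 100) -> list[float]:
    """Temperature of each annealing step, assuming geometric cooling."""
    return [t_initial * factor ** k for k in range(n_steps)]


# First four temperatures for the default settings (T0 = 10, factor = 0.9).
temps = temperature_schedule(n_steps=4)
print([f"{t:.3g}" for t in temps])
```

With the defaults, the first four values are 10, 9, 8.1 and 7.29, which agrees with the `T:` column of the sample runtime output in this manual.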
Runtime Output
At runtime, two lines of output are generated for each temperature step:
jAnn: 1 T: 0.100E+02 iSuc: 11718 nEva: 12542 CPU: 0.4056E+02
SqfVal: 22.8539 Rf: 22.25999 Los: 0.1312 Dis: 0.0464 Sca: 0.342E+01
The fields can be interpreted as follows, top-left to bottom-right:
Field | Description |
---|---|
jAnn | Step number. Starts at 1, increases monotonically. |
T | Temperature measure, starts at an arbitrary high value, decreases each step by the temperature schedule factor |
iSuc | Number of successful phase changes in this temperature step. The number of successes should decrease slowly, and the first couple of steps should be terminated by the maximum number of successes criterion. If instead the maximum number of iterations per step is reached, or the number of successes drops suddenly by a large amount, the system should probably be cooled more slowly. |
nEva | Accumulated number of function evaluations. |
CPU | Elapsed CPU cycles since the annealing procedure was started. |
SqfVal | Goodness of the model (fit + penalties). |
Rf | Goodness of fit of simulated data versus experimental data, does not take penalties into account. |
Los | Contribution of Looseness Penalty, not taking the Looseness Penalty Weight into account. |
Dis | Contribution of Discontiguity Penalty, not taking the Discontiguity Penalty Weight into account. |
Sca | Scale factor |
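
To follow a run programmatically, the two lines per temperature step can be turned into a dictionary. The sketch below is not part of MONSA; it assumes that every field is printed as `Name: value`, as in the sample above, and `parse_step` is a hypothetical helper name.

```python
import re

# Matches "Name: number", where the number may use Fortran-style exponents.
FIELD = re.compile(r"(\w+):\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def parse_step(line1: str, line2: str) -> dict[str, float]:
    """Collect every 'Name: value' pair from the two lines of one step."""
    return {name: float(value)
            for name, value in FIELD.findall(line1 + " " + line2)}


step = parse_step(
    "jAnn: 1 T: 0.100E+02 iSuc: 11718 nEva: 12542 CPU: 0.4056E+02",
    "SqfVal: 22.8539 Rf: 22.25999 Los: 0.1312 Dis: 0.0464 Sca: 0.342E+01",
)
print(step["T"], step["iSuc"], step["Rf"])
```

Watching `iSuc` across steps in this way makes it easy to spot the sudden drop in successes that suggests slower cooling.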
MONSA Input Files
Master File
The master file contains the general phase information: volumes of the different phases, radii of gyration, connectivity etc. It has the following structure:
Line 1 Title (up to 80 characters)
Line 2 Four theoretical volumes of individual phases (required)
Line 3 Four theoretical radii of gyration of individual phases in Ångstrom (even [if your data are in nm-1](#control)) (optional)
Line 4 Connectivity indicators of phases (required): '1' for 'interconnected', '0' for 'disconnected', '-1' for 'symmetry defined'
Line 5 Control file name and Npts for Guinier fit (no fit if the latter is equal to '-1')
... OPTIONAL ...
Line 6 Control file name and Npts for Guinier fit (no fit if the latter is equal to '-1')
...
etc Erroneous lines skipped; read to the end
The program works with up to four-component particles. If the number of components (phases) is less than four, just put zeroes for the values required for this phase.
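The layout above can be checked with a small reader before starting a long run. The following Python sketch is an illustration only: it assumes that text after `!` is a comment (as in the example master file of this manual), that the radii of gyration line is always present, and that file names contain no spaces. `MasterFile` and `parse_master` are hypothetical names, not part of ATSAS.

```python
from dataclasses import dataclass, field


@dataclass
class MasterFile:
    title: str
    volumes: list[float]        # line 2: four theoretical volumes
    rgs: list[float]            # line 3: four radii of gyration
    connectivity: list[int]     # line 4: 1, 0 or -1 per phase
    settings: list[tuple[str, int]] = field(default_factory=list)


def parse_master(text: str) -> MasterFile:
    lines = text.splitlines()
    title = lines[0][:80]
    # Assumption: everything after '!' is a comment; empty lines are dropped.
    body = [line.split("!", 1)[0].strip() for line in lines[1:]]
    body = [line for line in body if line]
    master = MasterFile(
        title,
        [float(x) for x in body[0].split()],
        [float(x) for x in body[1].split()],
        [int(x) for x in body[2].split()],
    )
    for entry in body[3:]:
        parts = entry.split()
        try:
            master.settings.append((parts[0].strip("'"), int(parts[1])))
        except (IndexError, ValueError):
            continue  # erroneous lines are skipped
    return master


EXAMPLE = """Master file for quazi-30S model randomized data to s=0.2
3.7e5 8.7e5 0.00 0.0 ! Desired Volumes
49.0 61.0 0.00 0.0 ! Desired Rgs
0 1 0 0 ! Connectivity
'test.con' 10 ! Control file name; Rgs will be
! computed from 10 first points
"""
master = parse_master(EXAMPLE)
print(master.volumes, master.connectivity, master.settings)
```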
Control File
The control file contains the smearing information for the given setting, information about contrasts and references to the data file. It has the following structure:
Line 1 Resolution file name, resolution setting number (free format)
Line 2 Output file name for the fits (not used) (free format)
Line 3 Title (character*80)
Line 4 Number of points in the setting (free format)
(put negative number to indicate nm-1 as angular units)
Line 5 Data file name, contrasts and constants (free format)
etc Erroneous lines skipped; read to the end
The information about the data sets is given in the format:
Filename Dro1 Dro2 Dro3 Dro4 Mult Const Weight
Field | Description |
---|---|
Filename | Filename of the scattering pattern (up to 15 characters). |
Dro1 ... Dro4 | Contrast of the nth phase (n = 1 to 4). |
Mult | The scattering pattern is multiplied by this factor after constant subtraction. |
Const | Constant subtracted from the scattering pattern. |
Weight | Relative weight of the data set. |
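
As an illustration of how one such line is interpreted, the Python sketch below splits a data set entry into its fields and applies the correction described in the table, (I - Const) * Mult. It is a minimal sketch, not MONSA code: `DataSetEntry`, `parse_entry` and `rescale` are hypothetical names, and any tokens after `Weight` (the example control file carries one extra number per line) are simply ignored.

```python
import shlex
from dataclasses import dataclass


@dataclass
class DataSetEntry:
    filename: str
    contrasts: tuple      # Dro1 .. Dro4
    mult: float
    const: float
    weight: float


def parse_entry(line: str) -> DataSetEntry:
    """Split 'Filename Dro1 Dro2 Dro3 Dro4 Mult Const Weight' into fields."""
    tokens = shlex.split(line)              # removes the quotes around the name
    values = [float(t) for t in tokens[1:8]]
    return DataSetEntry(tokens[0], tuple(values[:4]),
                        values[4], values[5], values[6])


def rescale(intensity: float, entry: DataSetEntry) -> float:
    """Subtract Const first, then multiply by Mult."""
    return (intensity - entry.const) * entry.mult


entry = parse_entry("'infr1.dat' 1.00 1.00 0.00 0.00 1.e-6 0.0 1.00 0")
print(entry.filename, entry.contrasts, rescale(2.0e6, entry))
```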
Smearing
If required, MONSA smears the theoretical curves using the resolution function introduced by J. Skov Pedersen et al. (1990), J. Appl. Cryst., 23, 321. Several subroutines for data smearing were provided by J. Skov Pedersen and modified for use in MONSA. The resolution file must have the following format (the numbers describe a setting at the RISOE SANS instrument):
Row | Value | Description |
---|---|---|
1 | 0.8 | Effective collimation slit diameter in cm. |
2 | 0.35 | Effective sample diameter in cm. |
3 | 300 | Collimation distance in cm. |
4 | 105 | Sample-detector distance in cm. |
5 | 3 | $\lambda$ in Å |
6 | 0.18 | $\delta(\lambda)/\lambda$ |
7 | 1.1 | Pixel size in cm. |
8 | 0.0000 | Averaging error (accounted for in Pixel size). |
If the file is corrupted or does not exist, no smearing is performed. An example of the resolution file is given below. The resolution setting number is the number of the column in the resolution file that describes the setting.
0.00001, 0.00001, 0.00001 , 0.8 , 0.8
0.00001, 0.00001, 0.00001 , 0.30 , 0.35
1100. , 200. , 100. , 300. , 100
180. , 125. , 100. , 110. , 100
6.0 , 5.6 , 1. , 3.22 , 6.
0.10 , 0.09 , 0.01 , 0.18 , 0.18
0.0001 , 1.57 , 0.01 , 1.1 , 1.1
0.0000 , 0.0000 , 0.0000 , 0.0000 , 0.0000
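Because each setting occupies one column, reading a setting means picking the same column from all eight rows. The Python sketch below shows this under two assumptions that the manual does not state explicitly: values are comma-separated as in the example, and the setting number is counted from 1. The row labels and the name `read_setting` are illustrative only.

```python
ROW_NAMES = (
    "collimation_slit_diameter_cm",
    "sample_diameter_cm",
    "collimation_distance_cm",
    "sample_detector_distance_cm",
    "wavelength_A",
    "wavelength_spread",
    "pixel_size_cm",
    "averaging_error",
)


def read_setting(text: str, setting: int) -> dict[str, float]:
    """Return the eight parameters of one setting (assumed 1-based column)."""
    rows = [line for line in text.splitlines() if line.strip()]
    if len(rows) < len(ROW_NAMES):
        raise ValueError("a resolution file needs eight rows")
    column = []
    for row in rows[:len(ROW_NAMES)]:
        cells = [cell.strip() for cell in row.split(",")]
        column.append(float(cells[setting - 1]))
    return dict(zip(ROW_NAMES, column))


EXAMPLE = """0.00001, 0.00001, 0.00001 , 0.8 , 0.8
0.00001, 0.00001, 0.00001 , 0.30 , 0.35
1100. , 200. , 100. , 300. , 100
180. , 125. , 100. , 110. , 100
6.0 , 5.6 , 1. , 3.22 , 6.
0.10 , 0.09 , 0.01 , 0.18 , 0.18
0.0001 , 1.57 , 0.01 , 1.1 , 1.1
0.0000 , 0.0000 , 0.0000 , 0.0000 , 0.0000
"""
setting4 = read_setting(EXAMPLE, 4)
print(setting4)
```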
Data Files
The experimental data files must have the following structure:
1st line - comment
2nd line etc - s, I(s), Err(s) in free format
where $s = 4\pi \sin(\theta)/\lambda$ in Å^-1^, I(s) is the experimental intensity and Err(s) is the standard deviation.
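The definition of s and the file layout can be sketched in Python as follows. Two assumptions are made that the manual does not spell out: the scattering angle is taken as 2θ (the usual small-angle convention), and lines that do not hold three numbers are skipped by the reader. `momentum_transfer` and `read_data` are hypothetical helper names.

```python
import math


def momentum_transfer(two_theta_deg: float, wavelength: float) -> float:
    """s = 4*pi*sin(theta)/lambda, with the scattering angle given as 2*theta."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def read_data(text: str):
    """Return the comment line and a list of (s, I, Err) tuples."""
    lines = text.splitlines()
    comment = lines[0]
    points = []
    for line in lines[1:]:
        parts = line.replace(",", " ").split()
        try:
            s, intensity, err = (float(p) for p in parts[:3])
        except ValueError:
            continue  # assumption: lines without three numbers are ignored
        points.append((s, intensity, err))
    return comment, points


EXAMPLE = """Randomized data, RELERR= 3.00 %
.600000E-02 .176494E+14 .504240E+12
.800000E-02 .168392E+14 .486090E+12
"""
comment, points = read_data(EXAMPLE)
print(len(points), points[0])
```

With the wavelength in Å the result is in Å^-1^; multiplying by 10 converts it to nm^-1^.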
Search Volume File
The input file defining the search volume is a PDB-like file containing the coordinates of dummy atoms with extra “phase” information indicating the phase to which each atom belongs. The file looks like this:
0 1 2 3 4 5 6 7
01234567890123456789012345678901234567890123456789012345678901234567890123456
ATOM 1 CA ASP 1 -17.000 -16.957-101.666 1.00 20.00 3 3012
ATOM 2 CA ASP 1 -17.000 -.957-101.666 1.00 20.00 1 3 3012
ATOM 3 CA ASP 1 -17.000 15.043-101.666 1.00 20.00 0 1 3012
ATOM 4 CA ASP 1 -1.000 -16.957-101.666 1.00 20.00 2 3012
ATOM 5 CA ASP 1 -1.000 -.957-101.666 1.00 20.00 1 202
Characters 1 to 65 of each line are as in a normal PDB file. The remaining columns carry the phase information:
- Column 67 (iCore): if ‘1’, the phase of this atom is fixed and will not be changed during the search (“core atom”); if iCore is ‘0’ or ‘ ‘, the phase of the atom is free to change. The core indicators may be recomputed automatically when loading the model, so that iCore is set to 1 if an atom is surrounded only by atoms of the same phase. In this case, the program will change interface atoms. This option may be useful if a preliminary model is available.
- Column 69 (iPhas): the phase index of the atom (its ordinal number in the iAllo array).
- Column 71 (nAllo): the number of phases allowed for the given atom.
- Columns 72 etc. (iAllo): the indices of the allowed phases, such that iAllo(iPhas) is the phase of the atom.
This system allows one, if required, to select the phases which can occupy any given point. In the above example of a two-phase system:
Atom 1: free atom of phase 2
Atom 2: fixed atom of phase 2
Atom 3: free atom of phase 0 (solvent)
Atom 4: free atom of phase 1
Atom 5: free atom of phase 0 (solvent; could be only solvent or phase 2)
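The decoding of the trailing fields can be sketched in Python. This is an illustration under stated assumptions, not the MONSA reader: column numbers are read against the ruler printed above the example (so they are zero-based string indices), every field is a single digit, and iPhas is a 1-based position in iAllo. `make_line`, `parse_phase_fields` and `PhaseInfo` are hypothetical names; `make_line` only exists to build correctly aligned test lines.

```python
from dataclasses import dataclass


@dataclass
class PhaseInfo:
    fixed: bool      # iCore == '1'
    phase: int       # iAllo(iPhas)
    allowed: list    # iAllo, the phases this atom may take


def make_line(serial, x, y, z, i_core, i_phas, i_allo):
    """Build one line: 66 characters of PDB data, then the phase fields."""
    pdb_part = (f"ATOM  {serial:5d}  CA  ASP  {1:4d}    "
                f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}")
    tail = f" {i_core or ' '} {i_phas} {len(i_allo)}"
    return pdb_part + tail + "".join(str(p) for p in i_allo)


def parse_phase_fields(line: str) -> PhaseInfo:
    line = line.rstrip("\n").ljust(73)
    i_core = line[67]                      # '1' = fixed, '0' or blank = free
    i_phas = int(line[69])                 # 1-based position in iAllo
    n_allo = int(line[71])                 # number of allowed phases
    allowed = [int(ch) for ch in line[72:72 + n_allo]]
    return PhaseInfo(i_core == "1", allowed[i_phas - 1], allowed)


atom2 = parse_phase_fields(make_line(2, -17.0, -0.957, -101.666, "1", 3, [0, 1, 2]))
atom5 = parse_phase_fields(make_line(5, -1.0, -0.957, -101.666, "", 1, [0, 2]))
print(atom2)
print(atom5)
```

The two parsed atoms reproduce the interpretation given above: atom 2 is a fixed atom of phase 2, and atom 5 is a free solvent atom that may only be solvent or phase 2.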
In most cases, however, the user does not need to learn the structure of this file. The program BODIES can generate an ellipsoidal (or spherical) search volume for a given number of phases and a given number of dummy atoms. In the general case, one can always use a spherical search volume with the diameter equal to Dmax, as in DAMMIN. MONSA automatically calculates the number of phases in the search model when reading this file. The number of dummy atoms in the search volume must not exceed 10000!
The distribution package includes an example of a batch file containing the required answers. Typing monsa144 < test.ans runs the structure determination for the supplied example in batch mode (this may take a day of CPU time on a PC!). The example is taken from the article (a 30S ribosomal subunit-like particle with simulated proteins inside). The model is given in the file model.pdb (phase 1 - proteins, phase 2 - RNA), and the initial search volume in the file sph105-2.pdb (a sphere with diameter 210 Å, two-phase system; generated by BODIES). The scattering curves *.dat are computed from this model (see the example of the test control file) and randomized.
NOTE that for any solution obtained using this method, an enantiomorph would yield the same scattering patterns! In test examples it was also observed (quite seldom) that one phase was enantiomorphous whereas the others were not.
MONSA Output Files
With each successful run, MONSA creates a set of output files. Each filename consists of a customizable prefix with an extension appended. If a prefix has been used before, existing files are overwritten without warning.
Extension | Description |
---|---|
.log | Contains the same information as the screen output and is updated during execution of the program. |
-0.pdb | This pdb file contains the beads of the solvent (a.k.a. the search volume). |
-1.pdb : -n.pdb | These pdb files contain the beads of each individual phase. |
.pdb | This pdb file contains the beads of all the phases and the solvent (a.k.a. the search volume). The beads of the different phases and the solvent are distinguished by their chain number. The header of the file contains information about the application used and about invariants of the particle, e.g. R~g~, volume and molecular mass of the particle. |
.fit | Fit of the simulated scattering curve versus a smoothed-out version of the real-data. Columns in the output file are: ‘s’, ‘c.I~exp~’, ‘c.ErrI~exp~’ and ‘I~FIT~’. |
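
Since existing files are replaced silently, it can be worth listing the names a run would write before starting it. The Python sketch below assumes that each name is simply the prefix followed by the extension from the table; the helper names are hypothetical and the check is not something MONSA performs itself.

```python
from pathlib import Path


def output_files(prefix: str, n_phases: int) -> list[str]:
    """File names expected for a run with the given prefix and phase count."""
    names = [f"{prefix}.log", f"{prefix}.pdb", f"{prefix}.fit"]
    # -0.pdb holds the solvent beads, -1.pdb .. -n.pdb the individual phases.
    names += [f"{prefix}-{k}.pdb" for k in range(n_phases + 1)]
    return names


def would_overwrite(prefix: str, n_phases: int, directory: str = ".") -> list[str]:
    """Names from output_files() that already exist in the directory."""
    return [name for name in output_files(prefix, n_phases)
            if (Path(directory) / name).exists()]


print(output_files("run1", 2))
```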
Generating a Search Volume
In previous releases two helper applications, DAMESV and DAMEMB, were included to generate suitable search volumes for MONSA. This functionality was integrated into the search-volume mode of BODIES.
Example
Master file for the test example: contrast variation simulated data of a 30S ribosomal subunit-like particle consisting of “RNA” (phase 2, density = 4.0) with some “proteins” inside (phase 1; density = 2.0)
Master file for quazi-30S model randomized data to s=0.2
3.7e5 8.7e5 0.00 0.0 ! Desired Volumes
49.0 61.0 0.00 0.0 ! Desired Rgs
0 1 0 0 ! Connectivity
'test.con' 10 ! Control file name; Rgs will be
! computed from 10 first points
Control file for the test example
'Point collimation' 1 !! No smearing
'test.fit' !! Output fits
Test for 30S -- use randomized data up to 0.2 !! Title
98 !! Number of points
'0r1.dat' 2.00 4.00 0.00 0.00 1.000 0.0 1.00 0
'2r1.dat' 0.00 2.00 0.00 0.00 1.000 0.0 1.00 0
'4r1.dat' -2.00 0.00 0.00 0.00 1.000 0.0 1.00 0
'6r1.dat' -4.00 -2.00 0.00 0.00 1.000 0.0 1.00 0
'infr1.dat' 1.00 1.00 0.00 0.00 1.e-6 0.0 1.00 0
Here, the data sets ‘?r1.dat’ correspond to the scattering patterns from the test body in solvents with density 0.0, 2.0, 4.0, 6.0. The set ‘infr1.dat’ corresponds to “shape scattering” (infinite contrast). Note that the test would have worked also without the ‘infinite contrast’ data. Please note:
- filename should be given in quotes (up to 15 characters);
- put zeroes as contrasts for phases, which are not present;
- all files in the setting MUST have the same number of points and the same angular axis; if you have data sets on a different angular grid, put them in a separate setting;
- from each data set, a constant "Const" will be subtracted and the result will be multiplied by "Mult";
- the data sets will be weighted with the relative weight "Weight" in the total discrepancy; reducing the weight is equivalent to increasing errors in the data file;
- number of points must not exceed 2048. Choose the value so that the maximal s value becomes 2.5 nm^-1^.
Example of input data
```
Randomized data, RELERR= 3.00 %, file 0.dat 12-NOV-1998 13:22:35
.600000E-02 .176494E+14 .504240E+12
.800000E-02 .168392E+14 .486090E+12
.100000E-01 .159999E+14 .463710E+12
….. SKIPPED FOR BREVITY ……
.194000E+00 .628594E+10 .184596E+09
.196000E+00 .582946E+10 .179298E+09
.198000E+00 .591612E+10 .173796E+09
.200000E+00 .570405E+10 .168174E+09
```
After the [configuration](#interactive-configuration), the program computes the parameters for the initial state and the simulated annealing procedure starts:
```
--- Starting values ---
Total scale factor : 3.51404919007708
Function value : 733.688635068192
Overall discrepancy : 696.618264908644
SQRT(Overall discr.) : 26.3935269509144
DAM looseness : 0.137137795235494
DAM discontiguity : 6.681519817391703E-002
Overall penalty : 37.0703701595471
jAnn: 1 T: 0.100E+02 iSuc: 11718 nEva: 12513 CPU: 0.4555E+02
SqfVal: 23.3509 Rf: 22.69190 Los: 0.1314 Dis: 0.0517 Sca: 0.338E+01
jAnn: 2 T: 0.900E+01 iSuc: 11718 nEva: 25119 CPU: 0.9059E+02
SqfVal: 22.7818 Rf: 22.15299 Los: 0.1243 Dis: 0.0272 Sca: 0.341E+01
jAnn: 3 T: 0.810E+01 iSuc: 11718 nEva: 37867 CPU: 0.1366E+03
SqfVal: 22.5775 Rf: 22.01942 Los: 0.1295 Dis: 0.0268 Sca: 0.327E+01
jAnn: 4 T: 0.729E+01 iSuc: 11718 nEva: 50732 CPU: 0.1830E+03
SqfVal: 22.5775 Rf: 22.01942 Los: 0.1295 Dis: 0.0268 Sca: 0.327E+01
jAnn: 5 T: 0.656E+01 iSuc: 11718 nEva: 63648 CPU: 0.2309E+03
SqfVal: 22.5775 Rf: 22.01942 Los: 0.1295 Dis: 0.0268 Sca: 0.327E+01
jAnn: 6 T: 0.590E+01 iSuc: 11718 nEva: 76727 CPU: 0.2778E+03
SqfVal: 22.3977 Rf: 21.72769 Los: 0.1368 Dis: 0.0467 Sca: 0.330E+01
jAnn: 7 T: 0.531E+01 iSuc: 11718 nEva: 89852 CPU: 0.3235E+03
SqfVal: 22.2560 Rf: 21.66409 Los: 0.1292 Dis: 0.0197 Sca: 0.329E+01
jAnn: 8 T: 0.478E+01 iSuc: 11718 nEva: 103078 CPU: 0.3704E+03
SqfVal: 21.9930 Rf: 21.34937 Los: 0.1377 Dis: 0.0354 Sca: 0.322E+01
```
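The contrasts listed in the example control file can be reproduced from the densities quoted for the test particle (phase 1 "proteins" = 2.0, phase 2 "RNA" = 4.0) and the solvent densities 0.0, 2.0, 4.0 and 6.0. The Python sketch below assumes that the contrast of a phase is its density minus the solvent density, which matches the numbers in the example; the names used are illustrative only.

```python
# Densities of the two phases in the test example (phase number -> density).
PHASE_DENSITY = {1: 2.0, 2: 4.0}

# Solvent density behind each contrast-variation data set of the example.
SOLVENT_DENSITY = {"0r1.dat": 0.0, "2r1.dat": 2.0, "4r1.dat": 4.0, "6r1.dat": 6.0}


def contrasts(solvent_density: float, n_slots: int = 4) -> list[float]:
    """Dro1..Dro4 for one data set; phases that are not present stay at zero."""
    row = [0.0] * n_slots
    for phase, density in PHASE_DENSITY.items():
        row[phase - 1] = density - solvent_density
    return row


for name, solvent in SOLVENT_DENSITY.items():
    print(name, contrasts(solvent))
```

The data set at solvent density 4.0 has zero contrast for phase 2, so in that curve only the "proteins" contribute to the scattering.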