datmw
Manual
The following sections briefly describe how to run DATMW from the command-line, the required input, and the runtime output.
Introduction
DATMW estimates the molecular weight (MW) for proteins based on multiple methods (Hajizadeh et al., 2018). Compared to other methods, its main feature is a Bayesian approach to MW estimation that integrates information from multiple SAXS-derived observables to produce an MW estimate with an uncertainty estimate.
Implemented parameter-free methods, i.e. methods that do not require accurate concentration measurements:
- MW from Porod Invariant
- MW from Porod Volume
- MW from Volume of Correlation
- MW from apparent volume
- MW from Size and Shape
- MW from Bayesian inference, based on all of the above
Implemented methods that require accurate concentration measurements:
- MW from absolute scale
- MW from relative scale, using a standard measurement
See the linked manuals for details on their implemented methods.
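As an illustration of why the concentration-dependent methods need accurate concentrations, the relative-scale idea can be sketched as below. This is an assumption, not a description of DATMW's internals: the manual does not state the formula, and the sketch uses the common relation in which the sample MW scales with the ratio of forward scattering intensities, provided both I(0) values are normalised by concentration and measured under identical conditions. The function name `mw_relative` is hypothetical.

```python
def mw_relative(i0_sample, i0_standard, mw_standard):
    """Relative-scale MW estimate (illustrative sketch only).

    ASSUMPTION: both I(0) values are already divided by their respective
    concentrations and were recorded under the same conditions.
    """
    if i0_standard <= 0:
        raise ValueError("I(0) of the standard must be positive")
    # MW scales linearly with concentration-normalised forward scattering.
    return mw_standard * i0_sample / i0_standard


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the proportionality.
    print(mw_relative(i0_sample=2.0, i0_standard=4.0, mw_standard=66000.0))
```

Any error in either concentration propagates linearly into the result, which is the reason the parameter-free methods listed above avoid this dependency.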
MW from Porod Invariant
In contrast to the Porod Volume and apparent Volume calculations, the integral for the Porod Invariant
\[Q_P = \int_0^\infty s^2 \cdot I(s) \, ds\]
is completed by extrapolation: the Guinier approximation is used in the range \(0 \le s \lt s(\text{first})\), with the \(R_g\) value provided by the user, and the integral is extrapolated to infinity for \(s \cdot R_g \gt 8\). The final MW estimate is then obtained by:
\[MW_{P} = \left. \frac{2 \pi^2 I(0)}{Q_P} \right/ 1.37\]
where 1.37 is an empirically obtained constant.
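The calculation described above can be sketched numerically as follows. This is a minimal illustration, not DATMW's implementation. In particular, the manual does not say how the integral is extrapolated to infinity, so the sketch assumes a Porod-law tail \(I(s) = K s^{-4}\) fitted to the last retained point. The names `trapezoid` and `mw_porod_invariant` are hypothetical, and `first` is assumed to be a 1-based index.

```python
import math


def trapezoid(x, y):
    """Trapezoidal rule over paired samples."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k])
               for k in range(len(x) - 1))


def mw_porod_invariant(s, intensity, rg, i0, first=1, n_low=200):
    """MW from the Porod invariant (illustrative sketch only)."""
    # Keep measured points from index `first` (1-based) up to s * Rg <= 8.
    pts = [(q, i) for q, i in list(zip(s, intensity))[first - 1:]
           if q * rg <= 8.0]
    s_mid = [q for q, _ in pts]
    i_mid = [i for _, i in pts]

    # Low-s part: Guinier approximation on 0 <= s < s(first).
    s_low = [s_mid[0] * k / n_low for k in range(n_low + 1)]
    i_low = [i0 * math.exp(-(q * rg) ** 2 / 3.0) for q in s_low]
    q_low = trapezoid(s_low, [q * q * i for q, i in zip(s_low, i_low)])

    # Measured part: direct numerical integration of s^2 * I(s).
    q_mid = trapezoid(s_mid, [q * q * i for q, i in zip(s_mid, i_mid)])

    # High-s part. ASSUMPTION: Porod tail I(s) = K / s^4, so the
    # remaining integral of s^2 * I(s) from s_max to infinity is K / s_max.
    k_porod = i_mid[-1] * s_mid[-1] ** 4
    q_high = k_porod / s_mid[-1]

    q_p = q_low + q_mid + q_high
    # MW_P = (2 pi^2 I(0) / Q_P) / 1.37, as in the formula above.
    return (2.0 * math.pi ** 2 * i0 / q_p) / 1.37
```

A single-point estimate of \(K\) is sensitive to noise and to oscillations in \(I(s)\); a fit over the last several points would be a more robust choice in practice.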
MW from Bayesian Inference
The Bayesian method combines the likelihoods derived from the individual MW estimates (Porod Invariant, Porod Volume, Volume of Correlation, Apparent Volume, and Size and Shape) to obtain a posterior probability distribution for the molecular weight.
Details of this method are too involved to reproduce here; please see the reference (Hajizadeh et al., 2018).
Running datmw
Usage:
$ datmw [OPTIONS] <SASDATA(S)>
OPTIONS known by DATMW are described in the next section, the required argument SASDATA file(s) in the section on input files.
Command-line arguments and options
DATMW requires the following command line arguments:
| Argument | Description |
|---|---|
| SASDATA(S) | One or more experimental SAS data (.dat) or regularised SAS data (.out) files. |
Absolute as well as relative paths to data files are accepted. Instead of a file name, one of the arguments may be given as '-' to read regularised SAS data (.out) from stdin.
DATMW recognizes the following command-line options:
| Short option | Long option | Description |
|---|---|---|
| | --method=<NAME> | One of: Qp, Porod, Vc, MoW, sizeshape, Bayes, absolute, relative. Default: Bayes. |
| | --i0=<VALUE> | Experimental forward scattering (I(0)). Required for all methods. |
| | --rg=<VALUE> | Experimental Radius of Gyration (Rg). Required for Porod, Qp, Vc, MoW, sizeshape, Bayes. |
| | --first=<N> | Index of the first point to be used. Default: 1. Required for Porod, Qp, Vc, MoW, sizeshape, Bayes. |
| | --psv=<X> | Partial specific volume in units of \(\text{cm}^3/\text{g}\). Default: 0.7425. Used by method=absolute. |
| | --contrast=<X> | Contrast in units of \(10^{10}\,\text{cm}^{-2}\). Default: 2.8086. Used by method=absolute. |
| | --i0_standard=<VALUE> | Forward scattering of the standard. Required by method=relative. |
| | --mw_standard=<VALUE> | Expected MW of the standard (Da). Required by method=relative. |
| -u | --unit=<u|1|2|3|4> | Define angular units of the experimental SAS data (.dat) or regularised SAS data (.out) files. |
| -v | --version | Print version information and exit. |
| -h | --help | Print usage information and exit. |
Notes:
- --i0 is required for all methods.
- --rg and --first are required for Porod, Qp, Vc, MoW, sizeshape, and Bayes.
- For --method=absolute, the provided I(0) must be on an absolute scale; psv and contrast may be supplied or defaults are used.
- For --method=relative, both --i0_standard and --mw_standard must be provided in addition to --i0 of the sample.
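The option rules in the notes above can be sketched as a small helper that assembles an argument list for scripted runs. It uses only the options documented in this section. The function name `build_datmw_command` and its validation behaviour are hypothetical, and it deliberately follows the notes literally, so it does not model the case where a .out file supplies Rg and I(0). Method names are matched case-insensitively because the examples in this manual use lowercase names; that is an assumption about the tool.

```python
PARAMETER_FREE = {"qp", "porod", "vc", "mow", "sizeshape", "bayes"}
ALL_METHODS = PARAMETER_FREE | {"absolute", "relative"}


def build_datmw_command(files, method="Bayes", i0=None, rg=None, first=None,
                        i0_standard=None, mw_standard=None,
                        executable="datmw"):
    """Return an argv list for datmw (illustrative sketch only)."""
    key = method.lower()
    if key not in ALL_METHODS:
        raise ValueError("unknown method: " + method)
    if i0 is None:
        raise ValueError("--i0 is required for all methods")
    if key in PARAMETER_FREE and (rg is None or first is None):
        raise ValueError("--rg and --first are required for " + method)
    if key == "relative" and (i0_standard is None or mw_standard is None):
        raise ValueError("--i0_standard and --mw_standard are required")

    cmd = [executable, f"--method={method}", f"--i0={i0}"]
    # Optional values are appended only when the caller supplied them.
    for name, value in (("rg", rg), ("first", first),
                        ("i0_standard", i0_standard),
                        ("mw_standard", mw_standard)):
        if value is not None:
            cmd.append(f"--{name}={value}")
    cmd.extend(files)
    return cmd


if __name__ == "__main__":
    print(build_datmw_command(["lyzexp.dat"], method="bayes",
                              i0=6.47, rg=15.0, first=2))
```

The returned list can be passed to `subprocess.run` without a shell, which avoids quoting problems with file names that contain spaces.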
Runtime output
DATMW output consists of result lines for each input file with the following values:
- --method=Qp: smax (\(\AA^{-1}\)), MW (Da), file name
- --method=Porod: smax (\(\AA^{-1}\)), Volume (\(\AA^3\)), MW (Da), file name
- --method=Vc: smax (\(\AA^{-1}\)), Vc, MW (Da), file name
- --method=MoW: smax (\(\AA^{-1}\)), Q', V' (apparent volume), V (Volume, \(\AA^3\)), MW (Da), file name
- --method=sizeshape: MW (Da), file name
- --method=Bayes: MW (Da), MW Score, CI lower, CI upper, CI prob., file name
- --method=absolute: MW (Da), file name
- --method=relative: MW (Da), file name
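For scripted post-processing, the per-method column layouts listed above can be turned into a small parser. This is a sketch under the assumption that the values on a result line are separated by whitespace and that the file name is the last field; the dictionary keys and the name `parse_datmw_line` are hypothetical.

```python
# Column names per method, in the order given in the list above.
FIELDS = {
    "qp": ["smax", "mw"],
    "porod": ["smax", "volume", "mw"],
    "vc": ["smax", "vc", "mw"],
    "mow": ["smax", "q_prime", "v_apparent", "volume", "mw"],
    "sizeshape": ["mw"],
    "bayes": ["mw", "mw_score", "ci_lower", "ci_upper", "ci_prob"],
    "absolute": ["mw"],
    "relative": ["mw"],
}


def parse_datmw_line(method, line):
    """Split one result line into named values (illustrative sketch only)."""
    names = FIELDS[method.lower()]
    # Split off exactly len(names) numeric columns; the remainder is the
    # file name, which may itself contain spaces.
    parts = line.split(None, len(names))
    if len(parts) != len(names) + 1:
        raise ValueError("unexpected number of columns for " + method)
    result = {name: float(token) for name, token in zip(names, parts)}
    result["file"] = parts[-1].strip()
    return result


if __name__ == "__main__":
    print(parse_datmw_line("porod", "0.498363 15060.2 10157.5 lyzexp.dat"))
```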
datmw input files
DATMW expects background-subtracted experimental SAS data (.dat) or regularised SAS data (.out) files.
If SASDATA is a regularised SAS data (.out) file, reciprocal space \(R_g\) and \(I(0)\) stated in the file are used, but may be overridden by the corresponding command-line options.
The options --i0, --rg, --first and --unit are applied identically to all input files given in the same invocation.
Examples
Molecular weight estimates from experimental SAS data (.dat), with the values for --rg, --i0 and --first taken from AUTORG:
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=qp lyzexp.dat
0.466667 8183.99 lyzexp.dat
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=porod lyzexp.dat
0.498363 15060.2 10157.5 lyzexp.dat
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=vc lyzexp.dat
0.300184 148.568 11953.7 lyzexp.dat
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=mow lyzexp.dat
0.451733 0.953803E-03 20695.3 14002.3 11552.4 lyzexp.dat
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=sizeshape lyzexp.dat
11202.1 lyzexp.dat
Here the molecular weight estimates are 8184 Da, 10158 Da, 11954 Da, 11552 Da and 11202 Da, respectively. Combined into a single Bayesian estimate:
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=bayes lyzexp.dat
11250.0 0.824794 9950.00 11650.0 0.917916 lyzexp.dat
Here the Bayesian estimate is 11250 Da, with a 92% probability that the MW lies within the credibility interval (CI) of 9950 to 11650 Da.