Manual

The following sections briefly describe how to run DATMW from the command-line, the required input, and the runtime output.

Introduction

DATMW estimates the molecular weight (MW) of proteins from SAXS data using several methods (Hajizadeh et al., 2018). Its distinguishing feature is a Bayesian approach to MW estimation that integrates information from multiple SAXS-derived observables to produce an MW estimate together with an uncertainty estimate.

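Implemented parameter-free methods, i.e. methods that do not require accurate concentration measurements:

  • MW from Porod Invariant
  • MW from Porod Volume
  • MW from Volume of Correlation
  • MW from Apparent Volume
  • MW from Size and Shape
  • MW from Bayesian Inference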

Implemented methods that require accurate concentration measurements:

  • MW from absolute scale
  • MW from relative scale, using a standard measurement

See the linked manuals for details on their implemented methods.

MW from Porod Invariant

In contrast to the Porod Volume and Apparent Volume calculations, the integral of the Porod Invariant

\[Q_P = \int_0^\infty s^2 \, I(s) \, \mathrm{d}s\]

is completed at both ends: at low angles by extrapolation using the Guinier approximation in the range \(0 \le s \lt s(\text{first})\), using the \(R_g\) value provided by the user, and at high angles by extrapolation to infinity for \(s \cdot R_g \gt 8\). The final MW estimate is then obtained by:

\[MW_{P} = \left. \frac{2 \pi^2 I(0)}{Q_P} \right/ 1.37\]

where 1.37 is an empirically obtained constant.
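
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not DATMW's implementation: the function names are hypothetical, --first is assumed to be a 1-based index, s is assumed to be in \(\AA^{-1}\), and the high-angle extrapolation is assumed to follow a Porod tail \(I(s) \approx K / s^4\), which the manual does not specify.

```python
import math


def trapezoid(xs, ys):
    """Integrate tabulated y(x) with the trapezoidal rule."""
    total = 0.0
    for k in range(1, len(xs)):
        total += 0.5 * (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1])
    return total


def mw_from_porod_invariant(s, intensity, rg, i0, first=1):
    """Sketch of MW_P = 2*pi^2*I(0) / Q_P / 1.37 (hypothetical helper)."""
    # Keep points from index `first` (assumed 1-based) up to s*Rg <= 8.
    pts = [(x, y) for x, y in zip(s[first - 1:], intensity[first - 1:])
           if x * rg <= 8.0]
    xs = [x for x, _ in pts]

    # Low-angle part, 0 <= s < s(first): Guinier approximation
    # I(s) = I(0) * exp(-(s*Rg)^2 / 3).
    n = 200
    gs = [xs[0] * k / n for k in range(n + 1)]
    low = trapezoid(
        gs, [g * g * i0 * math.exp(-(g * rg) ** 2 / 3.0) for g in gs])

    # Measured part: trapezoidal integration of s^2 * I(s).
    mid = trapezoid(xs, [x * x * y for x, y in pts])

    # High-angle part: ASSUMED Porod tail I(s) ~ K / s^4, for which the
    # integral of s^2 * I(s) from s_max to infinity equals K / s_max.
    # K is estimated as the mean of s^4 * I(s) over the last 10% of points.
    tail_pts = pts[-max(1, len(pts) // 10):]
    k_porod = sum(x ** 4 * y for x, y in tail_pts) / len(tail_pts)
    high = k_porod / xs[-1]

    q_p = low + mid + high
    return 2.0 * math.pi ** 2 * i0 / q_p / 1.37
```

Because the tail model is an assumption, values from this sketch need not match DATMW output for the same data.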

MW from Bayesian Inference

The Bayesian method combines the likelihoods derived from the individual MW estimates, Porod Invariant, Porod Volume, Volume of Correlation, Apparent Volume and Size and Shape, to obtain a posterior probability distribution for the molecular weight.

The details of this method are too involved to reproduce here; please see the reference (Hajizadeh et al., 2018).
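
To illustrate only the general idea of combining per-method likelihoods into a posterior over MW, here is a toy sketch. It is not DATMW's algorithm: the Gaussian likelihoods, the 10% relative width, the flat prior, the grid limits and the function name are all assumptions made for this example.

```python
import math


def posterior_summary(estimates, rel_sigma=0.1, mass=0.9,
                      lo=1000.0, hi=50000.0, step=50.0):
    """Toy grid posterior: flat prior times ASSUMED Gaussian likelihoods."""
    grid = [lo + step * k for k in range(int((hi - lo) / step) + 1)]

    # Sum of log-likelihoods, one Gaussian per individual estimate.
    log_post = []
    for mw in grid:
        lp = 0.0
        for est in estimates:
            sigma = rel_sigma * est  # assumed relative uncertainty
            lp += -0.5 * ((mw - est) / sigma) ** 2
        log_post.append(lp)

    # Normalise so that the grid probabilities sum to one.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    post = [w / total for w in weights]

    # Grow an interval outward from the mode until it holds `mass`.
    i = j = max(range(len(post)), key=post.__getitem__)
    mode = grid[i]
    covered = post[i]
    while covered < mass:
        left = post[i - 1] if i > 0 else -1.0
        right = post[j + 1] if j < len(post) - 1 else -1.0
        if left >= right:
            i -= 1
            covered += left
        else:
            j += 1
            covered += right
    return mode, grid[i], grid[j], covered
```

The returned tuple mirrors the kind of information DATMW reports for the Bayesian method (an MW value, interval bounds and the probability inside the interval), but the numbers it produces are not comparable to DATMW's.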

Running datmw

Usage:

$ datmw [OPTIONS] <SASDATA(S)>

The OPTIONS known by DATMW are described in the next section; the required SASDATA argument is described in the section on input files.

Command-line arguments and options

DATMW requires the following command line arguments:

Argument    Description
SASDATA(S)  One or more experimental SAS data (.dat) or regularised SAS data (.out) files.

Absolute as well as relative paths to data files are accepted. Instead of a file name, one of the arguments may be given as ‘-’ to read regularised SAS data (.out) from stdin.

DATMW recognizes the following command-line options:

Short option  Long option            Description
              --method=<NAME>        One of: Qp, Porod, Vc, MoW, sizeshape, Bayes, absolute, relative. Default: Bayes.
              --i0=<VALUE>           Experimental forward scattering (I(0)). Required for all methods.
              --rg=<VALUE>           Experimental Radius of Gyration (Rg). Required for Porod, Qp, Vc, MoW, sizeshape, Bayes.
              --first=<N>            Index of the first point to be used. Default: 1. Required for Porod, Qp, Vc, MoW, sizeshape, Bayes.
              --psv=<X>              Partial specific volume in units of \(\text{cm}^3/\text{g}\). Default: 0.7425. Used by method=absolute.
              --contrast=<X>         Contrast in units of \(10^{10}\,\text{cm}^{-2}\). Default: 2.8086. Used by method=absolute.
              --i0_standard=<VALUE>  Forward scattering of the standard. Required by method=relative.
              --mw_standard=<VALUE>  Expected MW of the standard (Da). Required by method=relative.
-u            --unit=<u|1|2|3|4>     Define angular units of the experimental SAS data (.dat) or regularised SAS data (.out) files.
-v            --version              Print version information and exit.
-h            --help                 Print usage information and exit.

Notes:

  • --i0 is required for all methods.
  • --rg and --first are required for Porod, Qp, Vc, MoW, sizeshape, and Bayes.
  • For --method=absolute, the provided I(0) must be on an absolute scale; --psv and --contrast may be supplied, otherwise their defaults are used.
  • For --method=relative, both --i0_standard and --mw_standard must be provided in addition to --i0 of the sample.
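
As a rough guide to what the two concentration-dependent methods compute, the sketch below uses the standard textbook relations for MW from absolute and relative scale. The manual does not state DATMW's exact formulas, so the function names, the assumption that I(0) has already been divided by the concentration, and the unit conventions are all assumptions of this example.

```python
AVOGADRO = 6.02214076e23  # 1/mol


def mw_absolute(i0_per_conc, psv=0.7425, contrast=2.8086):
    """MW (Da) from absolute-scale forward scattering (textbook relation).

    i0_per_conc: I(0)/c in cm^2/g, i.e. I(0) in 1/cm divided by c in g/cm^3
                 (ASSUMED input convention).
    psv:         partial specific volume in cm^3/g.
    contrast:    scattering contrast in units of 10^10 cm^-2.
    """
    contrast_per_mass = contrast * 1.0e10 * psv  # cm/g
    return AVOGADRO * i0_per_conc / contrast_per_mass ** 2


def mw_relative(i0, i0_standard, mw_standard):
    """MW (Da) by scaling against a standard of known MW.

    Both I(0) values are ASSUMED to be normalised by concentration
    and measured under identical conditions.
    """
    return mw_standard * i0 / i0_standard
```

With the default psv and contrast, the absolute-scale relation amounts to roughly 7.2e-4 cm^2/g of I(0)/c per Da of protein.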

Runtime output

DATMW output consists of result lines for each input file with the following values:

  • --method=Qp

    smax (\(\AA^{-1}\)), MW (Da), file name

  • --method=Porod

    smax (\(\AA^{-1}\)), Volume (\(\AA^3\)), MW (Da), file name

  • --method=Vc

    smax (\(\AA^{-1}\)), Vc, MW (Da), file name

  • --method=MoW

    smax (\(\AA^{-1}\)), Q’, V’ (apparent volume), V (Volume, \(\AA^3\)), MW (Da), file name

  • --method=sizeshape

    MW (Da), file name

  • --method=Bayes

    MW (Da), MW Score, CI lower, CI upper, CI prob., file name

  • --method=absolute

    MW (Da), file name

  • --method=relative

    MW (Da), file name
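
For scripted use, a result line can be mapped onto the columns listed above. The following is a small sketch; the field names and the function are hypothetical, and whitespace-separated columns are assumed, as seen in the examples in this manual.

```python
# Column names per method, in the order listed in this section.
FIELDS = {
    "qp": ["smax", "mw"],
    "porod": ["smax", "volume", "mw"],
    "vc": ["smax", "vc", "mw"],
    "mow": ["smax", "q_prime", "v_apparent", "volume", "mw"],
    "sizeshape": ["mw"],
    "bayes": ["mw", "mw_score", "ci_lower", "ci_upper", "ci_prob"],
    "absolute": ["mw"],
    "relative": ["mw"],
}


def parse_result_line(method, line):
    """Split one DATMW result line into named numeric fields plus file name."""
    names = FIELDS[method.lower()]
    # Split off exactly len(names) numeric columns; the remainder is the
    # file name, which may itself contain spaces.
    parts = line.split(None, len(names))
    if len(parts) != len(names) + 1:
        raise ValueError("unexpected number of columns for method " + method)
    record = {name: float(tok) for name, tok in zip(names, parts)}
    record["file"] = parts[-1].strip()
    return record
```

For example, parsing a --method=bayes line yields the MW, the MW score, both CI bounds, the CI probability and the file name under the keys shown in FIELDS.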

datmw input files

DATMW expects background-subtracted experimental SAS data (.dat) or regularised SAS data (.out) files.

If SASDATA is a regularised SAS data (.out) file, reciprocal space \(R_g\) and \(I(0)\) stated in the file are used, but may be overridden by the corresponding command-line options.

The options --i0, --rg, --first and --unit are applied identically to all input files given in a single invocation.
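
When DATMW is driven from a script, the argument list can be assembled programmatically. The helper below is a hypothetical sketch that uses only the options documented above and enforces the requirements listed in the Notes; it does not account for regularised (.out) input, where \(R_g\) and \(I(0)\) may be taken from the file.

```python
# Methods that need --rg according to the Notes above.
NEEDS_RG = {"porod", "qp", "vc", "mow", "sizeshape", "bayes"}


def build_datmw_command(files, method="bayes", i0=None, rg=None, first=None,
                        i0_standard=None, mw_standard=None):
    """Return the argv list for one datmw call over one or more files."""
    name = method.lower()
    if i0 is None:
        raise ValueError("--i0 is required for all methods")
    if name in NEEDS_RG and rg is None:
        raise ValueError("--rg is required for method " + method)
    if name == "relative" and (i0_standard is None or mw_standard is None):
        raise ValueError("--i0_standard and --mw_standard are required")

    cmd = ["datmw", f"--method={method}", f"--i0={i0}"]
    if rg is not None:
        cmd.append(f"--rg={rg}")
    if first is not None:
        cmd.append(f"--first={first}")
    if i0_standard is not None:
        cmd.append(f"--i0_standard={i0_standard}")
    if mw_standard is not None:
        cmd.append(f"--mw_standard={mw_standard}")
    # The same option values apply to every file in this invocation.
    cmd.extend(files)
    return cmd
```

The resulting list can be passed to Python's subprocess.run; files that need different \(R_g\) or \(I(0)\) values require separate invocations.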

Examples

Molecular weight estimates from experimental SAS data (.dat), one per method; the values for --rg, --i0 and --first were obtained from AUTORG.

$ datmw --rg=15.0 --i0=6.47 --first=2 --method=qp lyzexp.dat
   0.466667        8183.99      lyzexp.dat
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=porod lyzexp.dat
   0.498363        15060.2        10157.5      lyzexp.dat
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=vc lyzexp.dat
   0.300184        148.568        11953.7     lyzexp.dat
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=mow lyzexp.dat
   0.451733       0.953803E-03    20695.3        14002.3        11552.4     lyzexp.dat
$ datmw --rg=15.0 --i0=6.47 --first=2 --method=sizeshape lyzexp.dat
    11202.1     lyzexp.dat

Here the molecular weight estimates are 8184 Da, 10158 Da, 11954 Da, 11552 Da and 11202 Da, respectively. The Bayesian method combines the individual estimates into a single one:

$ datmw --rg=15.0 --i0=6.47 --first=2 --method=bayes lyzexp.dat
    11250.0       0.824794        9950.00        11650.0       0.917916     lyzexp.dat

Here the Bayesian estimate is 11250 Da, with a 92% probability that the MW lies within the credibility interval (CI) of 9950-11650 Da.