Manual

The following sections briefly describe the method implemented in CRYSOL, how to run CRYSOL from the command line, the required input files, and the output files it produces.

Introduction

CRYSOL is a program for evaluating the solution scattering from macromolecules with known atomic structure and, optionally, fitting it to experimental scattering curves from Small-Angle X-ray Scattering (SAXS). The input is a coordinate file in .pdb or .cif format containing an X-ray or NMR structure of a protein or a protein-DNA(RNA) complex. In addition, and in contrast to previous versions, as of ATSAS 3.1 CRYSOL also processes dummy atom and dummy residue models correctly.

The program uses multipole expansion of the scattering amplitudes to calculate the spherically averaged scattering pattern and takes into account different representations of the hydration shell. Given SAXS experimental data, CRYSOL can fit the theoretical scattering curve by minimizing the discrepancy (chi-square value) between the calculated scattering and the experimental data. This fitting is done by varying three parameters: (i) the total displaced solvent volume, (ii) the contrast of the hydration shell, and (iii) the relative background.
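As a rough illustration of the discrepancy being minimized, the sketch below computes a reduced chi-square between an experimental and a calculated curve. The function name chi_square, the normalization by N-1, and the use of a single least-squares scale factor are assumptions made for illustration. This is not CRYSOL's own code, and it does not vary the displaced volume, the shell contrast or the background.

```python
def chi_square(i_exp, sigma, i_calc):
    """Return (chi2, scale) for a calculated curve scaled onto experimental data.

    Assumed form: chi2 = 1/(N-1) * sum(((I_exp - scale*I_calc) / sigma)**2),
    where scale is the weighted least-squares optimum.
    All three sequences must be sampled at the same s values.
    """
    if not (len(i_exp) == len(sigma) == len(i_calc)):
        raise ValueError("all curves must have the same number of points")
    if len(i_exp) < 2:
        raise ValueError("at least two data points are required")

    # Weights are the inverse variances of the experimental points.
    weights = [1.0 / (s * s) for s in sigma]

    # Closed-form weighted least-squares scale factor.
    numerator = sum(w * e * c for w, e, c in zip(weights, i_exp, i_calc))
    denominator = sum(w * c * c for w, c in zip(weights, i_calc))
    scale = numerator / denominator

    # Sum of squared, error-normalized residuals.
    total = sum(((e - scale * c) / s) ** 2
                for e, s, c in zip(i_exp, sigma, i_calc))
    return total / (len(i_exp) - 1), scale
```

A value close to 1 indicates that the calculated curve agrees with the data within the experimental errors, provided those errors are estimated realistically.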

Running crysol

Usage:

$> crysol [OPTIONS] [FILE(S)] 

Here, FILE(S) are one or more atomic coordinate files to process, as well as experimental data to fit against. CRYSOL recognizes the following command-line arguments and options.

Command-Line Arguments and Options

CRYSOL recognizes the following command-line arguments.

Argument Description
FILE(S) At least one atomic coordinate file in .pdb, .cif or .ent format and, optionally, one or more experimental SAS data (.dat) files.

Absolute as well as relative paths to data files are accepted. Instead of a file name, one of the arguments may be given as ‘-’ to read data from stdin.

CRYSOL recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.

Short Option Long Option Description
  --model <ID> or -nmr <ID> Select a specific model ID from the coordinate file; default: all model IDs
  --chain <ID> or -cid <ID> Select a specific chain ID from the coordinate file; default: all chain IDs
  --alternative-names or -old Enable alternative (old) atom naming for all atomic structure files; default: disabled. See also: components.cif
  --explicit-hydrogens or -eh Use explicit hydrogens provided in the atomic structure file; default: use implicit hydrogen groups determined by looking up the number of hydrogens in components.cif.
  --lm <N> Maximum order of harmonics; default: 20, minimum: 1, maximum: 100. This defines the resolution of the calculated curve. The default value should be sufficient in most use cases. For large or extended particles, higher orders could improve the results at the cost of an increased run time. This value must be increased whenever the maximum scattering angle (smax) is increased.
  --fb <N> Order of Fibonacci grid; default: 17, minimum: 10, maximum: 18. The order of the Fibonacci grid defines the number of points describing the surface of the macromolecule. Higher grid orders give a more accurate surface representation but require more CPU time. Only used with --shell directional (the default).
  --ns <N> Number of calculated data points; default: 101, maximum: 10001.
  --smax <SM> or -sm <SM> Maximum scattering angle in inverse angstroms, either for calculating the theoretical curve up to SM or for fitting to SM; default: 0.5\(\AA^{-1}\), maximum: 2.0\(\AA^{-1}\)
  --units <N> or -un <N> Angular units of the experimental data: 1 = \(\AA^{-1}\), \(s = 4\pi\sin(\theta)/\lambda\); 2 = \(\mathrm{nm}^{-1}\), \(s = 4\pi\sin(\theta)/\lambda\); 3 = \(\AA^{-1}\), \(s = 2\sin(\theta)/\lambda\); 4 = \(\mathrm{nm}^{-1}\), \(s = 2\sin(\theta)/\lambda\). By default, an attempt is made to estimate the unit scale.
  --dns <VALUE> Solvent density; default: \(0.334 \mathrm{e}/\AA^3\), the electron density of pure water. Solvents with high salt concentration may have a somewhat higher electron density.
  --dro <VALUE> Contrast of hydration shell, default: \(0.03 \mathrm{e}/\AA^3\)
  --constant or -cst Enables constant subtraction. This operation accounts for possible systematic errors due to mismatched buffers in the experimental data.
  --skip-minimization Skip the adjustment of parameters and calculate the fit to the experimental data with the dns/dro values provided.
  --energy <eV> X-ray energy in eV, required for energy correction in anomalous SAXS only.
  --shell <VALUE> Shell kind, one of ‘directional’ (classic CRYSOL) or ‘water’ (previously CRYSOL3)
-p --prefix <FILE> The PREFIX to prepend to any output filename; default: basename of the input file(s).
  --implicit-hydrogen <N> Set this to a value N>=0 to override ‘unable to determine number of hydrogens’ errors.
  --sub-element <NAME> Set this to a valid element to override ‘unable to determine element’ errors.
-v --version Print version information and exit.
-h --help Print a summary of arguments and options, then exit.
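The four conventions accepted by --units differ only by constant factors: 1 \(\AA^{-1}\) equals 10 \(\mathrm{nm}^{-1}\), and \(4\pi\sin(\theta)/\lambda\) is \(2\pi\) times \(2\sin(\theta)/\lambda\). The sketch below converts an angular axis from any of the four conventions to convention 1. The function name to_unit_1 and the lookup table are hypothetical and are not part of CRYSOL.

```python
import math

# Factor that maps a value given in each --units convention onto
# convention 1 (inverse angstroms, s = 4*pi*sin(theta)/lambda).
_TO_UNIT_1 = {
    1: 1.0,                  # already inverse angstroms, 4*pi definition
    2: 0.1,                  # nm^-1 -> A^-1
    3: 2.0 * math.pi,        # 2*sin(theta)/lambda -> 4*pi*sin(theta)/lambda
    4: 2.0 * math.pi * 0.1,  # both conversions combined
}


def to_unit_1(s_values, units):
    """Convert angular values from convention `units` (1-4) to convention 1."""
    if units not in _TO_UNIT_1:
        raise ValueError("units must be 1, 2, 3 or 4")
    factor = _TO_UNIT_1[units]
    return [s * factor for s in s_values]
```

For example, a data set recorded up to 5 \(\mathrm{nm}^{-1}\) in convention 2 reaches 0.5 \(\AA^{-1}\) in convention 1, which matches the default value of --smax.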

Runtime Output

At runtime, a number of parameters are reported for each input model. These are also recorded in the .log output file.

Common Issues

This section describes the most common problems CRYSOL might report and how to mitigate them.

  1. To determine the number of implicit hydrogens, CRYSOL requires components.cif, the Chemical Component Dictionary maintained by the EBI. If not found, an error is reported:
    error: ATSAS resource files not found. Check installation and ATSAS environment variable.
    

    The file is part of the ATSAS package. Please check your installation.

  2. There are multiple ways to handle errors of the kind:
    error: unable to determine number of hydrogens for ...
    
    • in case the model is very old, the alternative-names option may help to identify the correct atom name
    • validate that the reported residue name in the atomic model matches what components.cif (see above) names. Update the residue and/or atom names accordingly.
    • in case the model already contains all hydrogen atoms, use the explicit-hydrogens option so that no attempt is made to determine the number of implicit hydrogens
    • use the implicit-hydrogen option to set the number of hydrogens to assume where it cannot be determined. Previous implementations of crysol used to assume 0 (zero).
  3. There are multiple ways to handle errors of the kind:
    error: unable to determine element for ...
    
    • in case the model is very old, the alternative-names option may help to identify the correct atom name
    • validate that the reported residue name in the atomic model matches what components.cif (see above) names. Update the residue and/or atom names accordingly.
    • use the sub-element option to substitute an element where it cannot be determined. Previous implementations of crysol used to substitute O (Oxygen).

crysol Input Files

CRYSOL reads one or more atomic coordinate files in .pdb, .cif or .ent format. Optionally, one or more experimental SAS data (.dat) files can be supplied.

crysol Output Files

All output file names consist of a configurable prefix followed by a suffix. If no prefix is provided, output file names are generated from the base names of the inputs. For example, “6lyz.cif” generates 6lyz.log, 6lyz.int and 6lyz.abs, and “6lyz.cif lyzexp.dat” creates 6lyz_lyzexp.log and 6lyz_lyzexp.fit in addition. If a prefix is given, the two .log files are merged into one.

Suffix Type Description
.log ASCII A copy of the screen output
.int ASCII Calculated scattering on arbitrary scale
.abs ASCII Calculated scattering on absolute scale
.fit ASCII Fit of the calculated scattering curve versus the experimental data.
.alm BINARY Binary amplitudes.
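The default naming rule described above can be sketched as follows. The helper default_output_names is hypothetical. It reproduces only the two examples given in this section for the case where no prefix is set, and it omits the .alm file because those examples do not list it.

```python
from pathlib import Path


def default_output_names(model, data_files=()):
    """List the output names expected for one model when no prefix is given.

    Assumption: names follow the examples in this section, i.e.
    <model>.log/.int/.abs plus <model>_<data>.log/.fit per data file.
    """
    model_base = Path(model).stem  # "6lyz.cif" -> "6lyz"
    names = [model_base + suffix for suffix in (".log", ".int", ".abs")]

    for data in data_files:
        # Model and data base names are joined with an underscore.
        fit_base = f"{model_base}_{Path(data).stem}"
        names += [fit_base + ".log", fit_base + ".fit"]
    return names
```

Calling default_output_names("6lyz.cif", ["lyzexp.dat"]) yields the five file names from the example above.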

The first line of the .fit file contains the following fields:

Field Description
Dro Optimal hydration shell contrast.
RGT Radius of gyration (in \(\AA\)) estimated from the theoretical curve.
Vol Optimal excluded volume (in \(\AA^3\)).
Chi^2 Discrepancy between theoretical and experimental curves.

Examples

All examples use 6lyz.pdb, lyz_014.dat and lyzexp.dat. They are included in the documentation directory of the installation package.

Calculating scattering intensity without fitting

$ crysol 6lyz.pdb --lm 18 

Calculate the scattering intensity of an atomic model or dummy atom model (.pdb) with a maximum order of harmonics of 18.

Processing of a single atomic model with fitting

$ crysol 6lyz.pdb *.dat --constant 

Calculate the scattering intensity of an atomic model or dummy atom model (.pdb) and fit it to all available experimental data, allowing the subtraction of a constant.

Processing multiple atomic models with fitting

$ crysol 4mld.pdb SASDAB7_fit1_model1.pdb SASDAB7.dat 

Calculate the scattering intensity of the ComE protein alone (4mld.pdb) and in complex with its DNA promoter region (SASDAB7_fit1_model1.pdb), and fit each to the experimental data (SASDAB7.dat).

$ crysol 1f6g.pdb exp_file.dat --model 5 --chain A 

Calculate the scattering intensity of chain A of conformer #5 from the NMR ensemble (1f6g.pdb) and fit it to the experimental data.
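When many models are processed from a script, it can help to assemble the command line programmatically. The sketch below builds an argument list using only options documented in this manual and checks the documented limits for --lm and --smax. The helper crysol_command is hypothetical. It only builds the list and does not run CRYSOL.

```python
import shlex


def crysol_command(files, lm=None, smax=None, constant=False,
                   model=None, chain=None):
    """Return the argument list for one crysol run (hypothetical helper)."""
    # Limits taken from the option table: --lm in 1..100, --smax at most 2.0.
    if lm is not None and not 1 <= lm <= 100:
        raise ValueError("--lm must be between 1 and 100")
    if smax is not None and smax > 2.0:
        raise ValueError("--smax must not exceed 2.0")

    cmd = ["crysol", *files]
    if lm is not None:
        cmd += ["--lm", str(lm)]
    if smax is not None:
        cmd += ["--smax", str(smax)]
    if constant:
        cmd.append("--constant")
    if model is not None:
        cmd += ["--model", str(model)]
    if chain is not None:
        cmd += ["--chain", str(chain)]
    return cmd


def as_shell_line(cmd):
    """Render an argument list as a single, safely quoted shell line."""
    return shlex.join(cmd)
```

The returned list can be passed to subprocess.run(cmd, check=True) on a system where crysol is installed and on the PATH. Note that shell wildcards such as *.dat are not expanded in that case and must be resolved beforehand, for instance with glob.glob.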