Manual

CORAL combines the algorithms of SASREF and BUNCH. The following describes the method implemented in CORAL, details of the dialog prompt as well as the required input and the produced output files.

Introduction

CORAL (COmplexes with RAndom Loops) performs SAXS-based rigid body modeling of complexes with missing fragments (e.g., terminal regions or interdomain linkers). Similar to SASREF, CORAL translates and rotates atomic models of individual domains within a complex. These movements are constrained by the distances between the N- and C-terminal portions of adjacent domains within a chain.

A pre-generated library of self-avoiding random loops composed of dummy residues (DRs) is used, covering linker lengths from 5 to 100 amino acids. Each length is sampled with 20 random structures for every possible end-to-end distance, with a binning step of \(2 \AA\). When a domain is moved, its new position is checked against the library. If a suitable linker is not found, the movement is rejected. If successful, the corresponding random loop is inserted as a placeholder for the missing linker, and its contribution is added to the computed scattering intensity and target function (e.g., overlaps, contact restraints).
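The lookup described above can be pictured with a small sketch. It is illustrative only and not CORAL's actual data structure: the dictionary layout, the names `distance_bin` and `pick_loop`, the toy library contents, and the choice of flooring distances into bins are all assumptions. Only the \(2 \AA\) binning step and the reject-if-missing behaviour come from the text.

```python
import random

BIN_STEP = 2.0  # binning step of end-to-end distances in angstroms (from the text)

def distance_bin(distance, step=BIN_STEP):
    """Map an end-to-end distance to a bin index (flooring is an assumption)."""
    return int(distance // step)

def pick_loop(library, n_residues, distance, rng=random):
    """Return a random stored loop for this linker length and distance, or None.

    None stands for "no suitable linker found", i.e. the trial move is rejected.
    """
    candidates = library.get((n_residues, distance_bin(distance)), [])
    return rng.choice(candidates) if candidates else None

# Hypothetical toy library: a 10-residue linker with loops only in the 20-22 A bin.
toy_library = {(10, distance_bin(20.0)): ["loop_a", "loop_b"]}

print(pick_loop(toy_library, 10, 21.5))  # one of the two stored loops
print(pick_loop(toy_library, 10, 40.0))  # None: the move would be rejected
```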

C- and N-terminal portions can also be randomly selected from the library but do not constrain domain motion. CORAL allows simultaneous fitting of multiple scattering curves from subsets of the system, assuming the same arrangement of domains/subunits in these constructs. Symmetry (as a constraint) and anisometry (as a restraint) can also be considered.

A simulated annealing protocol optimizes the positions and orientations of high-resolution domain models and the approximate conformations of missing polypeptide chain portions.

Running CORAL

Command-Line Arguments and Options

CORAL is mainly run in dialog mode; command-line arguments may only be used to supplement the configuration.

Short Option Long Option Description
  --seed <INT> Set the seed for the random number generator
  --implicit-hydrogen <N> Set this to a value N>=0 to override ‘unable to determine number of hydrogens’ errors.
  --sub-element <NAME> Set this to a valid element to override ‘unable to determine element’ errors.
  --model-format <FMT> Format of 3D models, one of: cif, pdb (default: cif)
-v --version Print version information and exit.
-h --help Print a summary of arguments, options, and exit.

Interactive Configuration

A special configuration file has to be created before running CORAL.

There are two modes, EXPERT and USER. In USER mode, fewer questions are asked and default values are used for most parameters. All USER mode questions are also asked in EXPERT mode. The default settings are the same in both modes.

Screen Text Default Mode Description
Computation mode (User or Expert) USER User Mode selection.
Log file name N/A User Project identifier, will be used as a prefix for all output file names.
Enter project description N/A User Any text that will be stored in the log file.
File name with objects info N/A User File name with the initial configuration.
Fix the subunit at original position? [Y/N] N User The fixation option may be used to keep the desired positions of certain domains. This question is asked for each domain in all symmetry-independent chains. Make sure that the fixed domains are not very far from (0,0,0), otherwise the overall center may be significantly displaced from the origin so that the intensity calculation will be affected.
Pair of domains to group 0,0 User One may force concerted movements of specific domains by pairing them, e.g. to keep a known binding interface. If more than two domains have to be paired, all combinations of the pairs have to be specified. E.g. for pairing the 1st, the 3rd and the 5th domains, one needs to enter successively 1,3; 1,5; 3,5. This question is asked until 0,0 is answered.
DR formfactor multiplier 1.0 Expert The weight of the DR formfactors may be adjusted. For instance, an increased value (~1.2) accounts for extra hydration if it is known that the loops are exposed to the solvent.
Symmetry: P1…19 or Pn2 (n=1,..,12) P1 User Supported symmetries are: P1, P2-P19 (nineteen-fold), P222, P32-P(12)2. The n-fold axis is typically Z, if there is in addition a two-fold axis it coincides with Y.
Cross penalty weight 100.0 Expert How much the Cross Penalty shall influence the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If clashes between the loops or domains are observed, try increasing this penalty weight.
Shift penalty weight 1.0 Expert How much a shift of the entire protein from the origin shall influence the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. This penalty is necessary to keep the model close to the origin so that the higher-order harmonics are not lost and the scattering is computed accurately. Increase the weight if the resulting model is significantly shifted from the origin.
File name, contact conditions, CR for none <.cnd > empty User If the information on contacting residues is available it may be used as a modeling restraint. The information is provided in a file with special format. By default no information is given.
Contacts penalty weight 10.0 Expert How much improper contacts shall influence the acceptance or rejection of a mutation. If unsure, use the default value. If the desired interfaces are not obtained, try increasing this penalty weight. This question is only asked if a contact conditions file is provided.
Input total number of scattering curves 1 User If in addition to the entire complex, the scattering curves of its partial constructs are available, they can be fitted simultaneously assuming the same arrangement of domains in all the constructs.
Account for constant background User Expert Whether or not to adjust a background constant in the fitting
Input first&last residues in 1-st construct 1,var User The range of residues present in the given construct (scattering curve). Residues belonging to different chains are numbered sequentially according to their appearance in the configuration file. This question is asked once for each construct, i.e. as many times as the total number of scattering curves (the answer to the previous question). The default answer is from the first to the last residue.
Enter file name, 1-st experimental data <.dat> N/A User The name of the data file containing the experimental SAXS profile of a certain construct. The question is asked for each construct.
Angular units in the input file : 4pisin(theta)/lambda [1/angstrom] (1) 4pisin(theta)/lambda [1/nm ] (2) 2* sin(theta)/lambda [1/angstrom] (3) 2* sin(theta)/lambda [1/nm ] (4) 1 User Formula for the scattering vector in the data file and its units. The question is asked for each construct.
Fitting range in fractions of Smax 1.0 User Fraction of the scattering curve to fit, starting at the first point. The default (1.0) is the entire curve. The question is asked for each construct.
Spatial step in angstroems 5.0 User Maximal random shift of a domain at a single modification of the system in the course of simulated annealing.
Angular step in degrees 20.0 User Maximal random rotation angle of a domain at a single modification of the system in the course of simulated annealing.
Initial annealing temperature 10.0 Expert Starting temperature of simulated annealing protocol.
Annealing schedule factor 0.9 Expert Factor by which the temperature is decreased at each step; 0.9 is a good average value. If slower cooling is wanted, increase the value (e.g. to 0.95).
Max # of iterations at each T var Expert Finalize temperature step and cool after this many iterations at the latest. The default value depends on the total numbers of domains and residues.
Max # of successes at each T var Expert Finalize temperature step and cool after at most this many successful mutations. The default value depends on the total numbers of domains and residues.
Min # of successes to continue 100 Expert Stop simulated annealing if fewer than this many successful mutations are achieved within a single temperature step.
Max # of annealing steps 100 Expert Stop if simulated annealing is not finished after this many steps. The slower the system is cooled, the more temperature steps are required.
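To get a feeling for how the initial temperature, the schedule factor and the maximum number of steps interact, the cooling can be tabulated with a few lines of Python. This is a minimal sketch assuming plain geometric cooling with the first step run at the initial temperature; the function name `temperature_schedule` is hypothetical and the code is not taken from CORAL.

```python
def temperature_schedule(t_initial=10.0, factor=0.9, max_steps=100):
    """Return the temperature of each annealing step, assuming geometric cooling.

    Assumption: step 1 runs at t_initial, and every later step multiplies
    the previous temperature by `factor`.
    """
    return [t_initial * factor ** step for step in range(max_steps)]

default = temperature_schedule()            # defaults from the table above
slower = temperature_schedule(factor=0.95)  # slower cooling

# Final temperatures after 100 steps: slower cooling ends much hotter, which is
# why a larger factor usually calls for more annealing steps.
print(f"{default[-1]:.3e} {slower[-1]:.3e}")
```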

Runtime Output

At runtime, two lines of output are generated for each temperature step:

 j:   1 T: 0.100E+01 Suc:  1000 Eva:     2711 CPU:  0.503E+02 F:30.8120 Pen: 28.0621
 The best chi values: 1.65827

The fields can be interpreted as follows, top-left to bottom-right:

Field Description
j Step number. Starts at 1, increases monotonically.
T Temperature measure, starts at an arbitrary high value, decreases each step by the annealing schedule factor.
Suc Number of successful mutations in this temperature step. Limited by the minimum and maximum number of successes. The number of successes should decrease slowly; the first couple of steps should be terminated by the maximum number of successes criterion. If the maximum number of iterations is reached instead, or the number of successes suddenly drops by a large amount, the system should probably be cooled more slowly.
Eva Accumulated number of function evaluations.
CPU Elapsed wall-clock time since the annealing procedure was started.
F The best target function value obtained so far.
Pen Accumulated penalty value of the best target function.
The best chi values For each of the fitted curves, the \(\chi^2\) value of the best target function is given.
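For monitoring long runs it can be handy to pull these fields out of the screen output or the log file. The following sketch parses the two sample lines shown above; the regular expression and the function names `parse_step_line` and `parse_chi_line` are assumptions based solely on that sample, so other CORAL versions may need adjustments.

```python
import re

# Pattern derived from the sample output line only (an assumption, not a spec).
STEP_LINE = re.compile(
    r"j:\s*(?P<j>\d+)\s+T:\s*(?P<T>\S+)\s+Suc:\s*(?P<Suc>\d+)\s+"
    r"Eva:\s*(?P<Eva>\d+)\s+CPU:\s*(?P<CPU>\S+)\s+"
    r"F:\s*(?P<F>\S+)\s+Pen:\s*(?P<Pen>\S+)"
)

def parse_step_line(line):
    """Return the fields of a temperature-step line as a dict, or None."""
    match = STEP_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    integers = ("j", "Suc", "Eva")
    return {k: int(v) if k in integers else float(v) for k, v in fields.items()}

def parse_chi_line(line):
    """Return the list of values following 'The best chi values:'."""
    return [float(value) for value in line.split(":", 1)[1].split()]

step = parse_step_line(
    " j:   1 T: 0.100E+01 Suc:  1000 Eva:     2711 "
    "CPU:  0.503E+02 F:30.8120 Pen: 28.0621"
)
chi = parse_chi_line(" The best chi values: 1.65827")
print(step["Suc"], step["F"], chi)
```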

CORAL Input Files

CORAL expects background-subtracted experimental SAS data (.dat) files.

Initial Configuration File

The initial configuration for modeling a complex with CORAL is specified using a configuration file.

Here is an example for a complex with two proteins (chains) A and B:

NTER 20
a1.pdb
LINK 25
a2.pdb
LINK 30
a3.pdb
b1.pdb
LINK 15
b2.pdb
CTER 10

Here the NTER keyword indicates that 20 amino acids are missing at the N-terminus of the first domain, a1.pdb, in chain A. This is followed by a LINK of 25 amino acids to the next domain, a2.pdb, and another LINK of 30 amino acids to the final domain, a3.pdb, in chain A. Chain B starts with b1.pdb, followed by a LINK of 15 amino acids to b2.pdb, and ends with 10 amino acids missing at the C-terminus (CTER).

Format Rules:

  • In the configuration file, NTER and CTER are optional; if present, NTER must be followed by a domain, and CTER must be preceded by a domain.
  • Within a single chain, domains must be separated by LINK. A LINK can only exist between two domains and cannot end a chain.
  • A chain must follow the sequence: NTER - domain - LINK - domain - LINK….
  • Two domains not separated by LINK indicate the start of a new chain.
  • When symmetry is applied, the configuration should describe only the asymmetric part.
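The format rules above can be checked before starting a run. Below is a minimal sketch of such a check, not CORAL's own parser: the function name `parse_configuration` and the returned data layout are hypothetical, and the treatment of NTER as the start of a new chain is an interpretation of the rules.

```python
def parse_configuration(text):
    """Split a configuration into chains and check the format rules.

    Returns a list of chains; each chain is a list of (kind, value) tuples,
    where kind is "NTER", "LINK", "CTER" or "DOMAIN".
    """
    chains, current, prev = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split()
        key = parts[0].upper()
        if key in ("NTER", "LINK", "CTER"):
            if len(parts) != 2:
                raise ValueError(f"{key} needs a residue count: {line!r}")
            entry = (key, int(parts[1]))
        else:
            key, entry = "DOMAIN", ("DOMAIN", line)

        if key == "NTER":
            if prev in ("NTER", "LINK"):
                raise ValueError("NTER may not follow NTER or LINK")
            if current:  # NTER opens a new chain
                chains.append(current)
                current = []
        elif key in ("LINK", "CTER"):
            if prev != "DOMAIN":
                raise ValueError(f"{key} must be preceded by a domain")
        elif prev in ("DOMAIN", "CTER") and current:
            # A domain not separated by LINK starts a new chain.
            chains.append(current)
            current = []
        current.append(entry)
        prev = key
    if prev in ("NTER", "LINK"):
        raise ValueError(f"{prev} cannot end a chain")
    if current:
        chains.append(current)
    return chains

example = """NTER 20
a1.pdb
LINK 25
a2.pdb
LINK 30
a3.pdb
b1.pdb
LINK 15
b2.pdb
CTER 10"""
chains = parse_configuration(example)
print(len(chains), [len(chain) for chain in chains])
```

Applied to the example above, the sketch yields two chains, matching the description of chains A and B.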

Contact Conditions File

The following conditions assume a configuration in P2, with two symmetry mates. They require a distance of \(7\AA\) between the residues 25 and 115 from the asymmetric unit, a distance of \(5\AA\) between the residues 25 and 115 from the same chain, and a distance of \(5\AA\) between residue 40 from the first chain and residue 50 from the second.

dist 7.0
1 25 25 1 115 115
dist 5.0
1 40 40 2 50 50

If two (or more) alternatives are given after the line with the keyword “dist”, the program compares the smallest distance among them with the specified one.

Note that chains are counted (and indexed in the contacts file) as follows:

  • chains 1, 2, …, N as described in the configuration, top to bottom,
  • first symmetry mates of chains 1, 2, …, N
  • second symmetry mates of chains 1, 2, …, N
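Read literally, this counting scheme gives every chain copy a consecutive index. The sketch below spells out that reading; the function name `contact_chain_index` is hypothetical and the formula is an interpretation of the list above, not something taken from CORAL's source.

```python
def contact_chain_index(chain, mate, n_chains):
    """Index of a chain copy as used in the contacts file (assumed scheme).

    chain:    chain number in the configuration, 1..n_chains (top to bottom)
    mate:     0 for the chain as described in the configuration,
              1 for its first symmetry mate, 2 for the second, ...
    n_chains: number of chains N described in the configuration
    """
    if not 1 <= chain <= n_chains:
        raise ValueError("chain must be between 1 and n_chains")
    if mate < 0:
        raise ValueError("mate must be 0 or larger")
    return mate * n_chains + chain

# With two chains in the configuration, the first symmetry mate of chain 1
# would be referred to as chain 3 under this reading.
print(contact_chain_index(chain=1, mate=1, n_chains=2))
```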

CORAL Output Files

After each simulated annealing step, CORAL creates a set of output files. Each filename consists of a customizable prefix with one of the extensions below appended. If a prefix has been used before, existing files will be overwritten without further notice.

Extension Description
.log Contains the same information as the screen output and is updated during execution of the program.
.pdb or .cif Current model of the entire complex in either .pdb or .cif format, depending on the --model-format option. The comments section of the file contains information about the application used and about the parameters of the model, e.g. penalties and goodness-of-fit to the data \((\chi^2)\).
-i.fit Fit of the scattering curve computed from a construct versus the corresponding experimental data; i refers to the respective construct number.
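Because existing files are overwritten silently, it may be worth checking for leftovers of an earlier run before reusing a prefix. The following is a sketch under the assumption that the extensions are appended to the prefix exactly as listed, so that the fit of the first construct of project `gadcam` would be `gadcam-1.fit`; the helper names are hypothetical.

```python
from pathlib import Path

def expected_outputs(prefix, n_curves=1, model_format="cif"):
    """List the output file names a run with this prefix is assumed to write."""
    names = [f"{prefix}.log", f"{prefix}.{model_format}"]
    names += [f"{prefix}-{i}.fit" for i in range(1, n_curves + 1)]
    return names

def existing_outputs(prefix, n_curves=1, model_format="cif", directory="."):
    """Return those expected output files that already exist in `directory`."""
    base = Path(directory)
    return [
        name
        for name in expected_outputs(prefix, n_curves, model_format)
        if (base / name).exists()
    ]

leftovers = existing_outputs("gadcam", n_curves=1, model_format="pdb")
if leftovers:
    print("would be overwritten:", ", ".join(leftovers))
```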

Examples

For the sample run, we use a hexameric glutamate decarboxylase (Gad) with three calmodulin (CaM) molecules bound to three pairs of Gad C-terminal peptides. The structures of the homohexameric Gad core and the 1:2 complex of CaM with the C-terminal peptide are known. The peptide is connected to the Gad core domain by a 22 amino acid linker.

The corresponding configuration file:

m1.pdb
LINK 22
pept1a.pdb
m2.pdb
LINK 22
pept2a.pdb
cama.pdb
m3.pdb
LINK 22
pept1b.pdb
m4.pdb
LINK 22
pept2b.pdb
camb.pdb
m5.pdb
LINK 22
pept1c.pdb
m6.pdb
LINK 22
pept2c.pdb
camc.pdb

Here, m?.pdb and pept??.pdb are the atomic models of the Gad core domains and their C-terminal portions, respectively. Further, cam?.pdb are three copies of the CaM molecule.

Note that the six core domains of Gad have to be fixed so that the hexamer remains intact, and each of the three CaM molecules has to be grouped with its two C-terminal peptides to keep their known arrangement.

CORAL may be run with an answers file like this (empty lines are significant and indicate default values being used):

U
gadcam
Gad hexamer+3CaMs
config
Y
N
Y
N
N
Y
N
Y
N
N
Y
N
Y
N
N
2,4
2,5
4,5
7,9
7,10
9,10
12,14
12,15
14,15
0,0
P1

1
1,3372
gad_cam-mer.dat
1
1.0
5.0
20.0

Save to a file, e.g. gadcam.ans, then run as:

$ coral < gadcam.ans
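Writing the fixation answers and all pair combinations by hand is error-prone for larger complexes. The sketch below generates the same USER-mode answers as listed above; the function `build_answers` and its parameters are hypothetical, and the order of the answers is simply copied from this example, so it may not hold for EXPERT mode or for other CORAL versions.

```python
from itertools import combinations

def build_answers(project, description, config_name, fixed, groups,
                  symmetry, residue_range, data_file, contacts_file="",
                  units="1", fit_fraction="1.0",
                  spatial_step="5.0", angular_step="20.0"):
    """Return the lines of a USER-mode answers file for one scattering curve."""
    lines = ["U", project, description, config_name]
    # One Y/N fixation answer per domain, in configuration order.
    lines += ["Y" if is_fixed else "N" for is_fixed in fixed]
    # Every pair within a group has to be listed explicitly.
    for group in groups:
        lines += [f"{a},{b}" for a, b in combinations(sorted(group), 2)]
    lines.append("0,0")  # terminates the pairing question
    lines += [symmetry, contacts_file, "1", residue_range, data_file,
              units, fit_fraction, spatial_step, angular_step]
    return lines

answers = build_answers(
    project="gadcam",
    description="Gad hexamer+3CaMs",
    config_name="config",
    fixed=[True, False, True, False, False] * 3,  # Gad cores fixed, rest free
    groups=[(2, 4, 5), (7, 9, 10), (12, 14, 15)],  # two peptides + one CaM each
    symmetry="P1",
    residue_range="1,3372",
    data_file="gad_cam-mer.dat",
)
with open("gadcam.ans", "w") as handle:
    handle.write("\n".join(answers) + "\n")
```

The empty string for `contacts_file` produces the empty line that selects the default (no contact conditions).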