Manual

The following describes the method implemented in BUNCH, details of the dialog prompt as well as the required input and the produced output files.

Introduction

BUNCH models multidomain proteins against SAXS data using a combination of rigid body and ab initio approaches. It determines the three-dimensional domain structure of proteins based on multiple scattering datasets from deletion mutants, given the structures of individual domains.

A simulated annealing protocol optimizes the positions and orientations of high-resolution domain models and the conformations of dummy residue (DR) chains. Steric clashes, improper bond or dihedral angles, and overly extended loops are penalized.
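The interplay between the fit quality, the weighted penalties, and the annealing temperature can be illustrated with a short sketch. It assumes that the target function is the sum of the \(\chi^2\) values and the weighted penalties, and that mutations are accepted by the standard Metropolis rule; neither detail is stated in this manual, so treat both as assumptions. The names `target_function` and `accept_mutation` are illustrative and not part of BUNCH. The weights are the defaults listed under Interactive Configuration.

```python
import math
import random

# Default penalty weights as listed in the interactive configuration table.
DEFAULT_WEIGHTS = {
    "angles": 10.0,
    "dihedrals": 1.0,
    "cross": 100.0,
    "extended_loops": 1.0,
    "distances": 10.0,
    "shift": 1.0,
    "contacts": 10.0,
}


def target_function(chi2_values, penalties, weights=DEFAULT_WEIGHTS):
    """Assumed form: sum of chi^2 over all curves plus weighted penalties.

    `penalties` maps a penalty name to its raw (unweighted) value.
    """
    weighted = sum(weights[name] * value for name, value in penalties.items())
    return sum(chi2_values) + weighted


def accept_mutation(f_old, f_new, temperature, rng=random):
    """Metropolis rule (an assumption, not confirmed by the manual):
    always accept improvements, otherwise accept with probability
    exp(-(f_new - f_old) / temperature)."""
    if f_new <= f_old:
        return True
    return rng.random() < math.exp(-(f_new - f_old) / temperature)
```

Under this rule a weight of 0.0 removes the corresponding term from the sum, which matches the statement that a value of 0.0 disables a penalty.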

Theoretical scattering patterns \(I(s)\) are calculated using spherical harmonics from high-resolution coordinates and DRs. Partial scattering amplitudes \(A_{lm}(s)\) depend on the scattering amplitudes in reference positions and on rotational and translational parameters. Reference partial amplitudes are precomputed by CRYSOL. Symmetry, if applicable, is accounted for by configuring the asymmetric part and generating the rest according to symmetry rules.
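As a reminder of how the partial amplitudes combine, the intensity in the usual multipole formulation can be written as below. This is a sketch of the standard expression, not a quotation from the BUNCH source: the index \(k\) (running over the domains and DR chain portions) and the truncation order \(L\) are notation introduced here.

\[
I(s) = 2\pi^2 \sum_{l=0}^{L} \sum_{m=-l}^{l} \left| \sum_{k} A_{lm}^{(k)}(s) \right|^2
\]

Each \(A_{lm}^{(k)}(s)\) is obtained from the reference amplitudes of part \(k\) and its current rotation and translation.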

Running bunch

Before running BUNCH, an initial approximation has to be generated with PRE_BUNCH; it serves as the starting point of the modeling.

Command-Line Arguments and Options

BUNCH is mainly run in dialog mode; command-line arguments may only be used to supplement the configuration.

Short Option  Long Option           Description
              --seed <INT>          Set the seed for the random number generator.
              --model-format <FMT>  Format of the output models, one of: cif, pdb (default: cif).
-v            --version             Print version information and exit.
-h            --help                Print a summary of arguments and options, then exit.

Interactive Configuration

There are two modes, EXPERT and USER. In USER mode, fewer questions are asked and default values are used for most parameters. All USER mode questions are also asked in EXPERT mode. The default settings are the same in both modes.

Screen Text Default Mode Description
Computation mode (User or Expert) USER User Mode selection.
Log file name N/A User Project identifier, will be used as a prefix for all output file names.
Enter project description N/A User Free text that will be stored in the log file.
Initial structure N/A User File name with the input file to BUNCH previously generated with PRE_BUNCH.
DR formfactor multiplier 1.0 Expert The weight of the DR form factors may be adjusted. For instance, an increased value (~1.2) accounts for extra hydration if the loops are known to be exposed to the solvent. A negative value means that the individual (primary sequence-based) form factors are used instead of the averaged one.
Symmetry: P1…19 or Pn2 (n=1,..,12) P1 User Supported symmetries are: P1, P2-P19 (nineteen-fold), P222, P32-P(12)2. The n-fold axis is typically Z, if there is in addition a two-fold axis it coincides with Y.
Angles penalty weight 10.0 Expert How strongly the bond angles penalty influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If the resulting bond angles do not look good, try increasing this penalty weight.
Dihedrals penalty weight 1.0 Expert How strongly the dihedral angles penalty influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If the resulting dihedral angles do not look good, try increasing this penalty weight.
Cross penalty weight 100.0 Expert How strongly the cross penalty influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If clashes between the loops or domains are observed, try increasing this penalty weight.
Extended loops penalty weight 1.0 Expert This weight governs the penalty that keeps the \(R_g\) values of the missing portions “moderate”. Increase this weight if the missing portions are known to form folded domains. Decrease or switch off the penalty if the loops are known to be extended/disordered.
Distances penalty weight 10.0 Expert This weight governs the penalty that ensures that the histogram of the distances between the closest 20 DRs along the chain is compatible with the averaged distribution of 20 successive CA atoms in the backbones of disordered loops.
Shift penalty weight 1.0 Expert How strongly a shift of the entire protein away from the origin influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. This penalty is necessary to keep the model close to the origin so that the higher order harmonics are not lost and the scattering is computed accurately.
File name, contact conditions, CR for none<.cnd > empty User If the information on contacting residues is available it may be used as a modeling restraint. The information is provided in a file with special format. By default no information is given.
Contacts penalty weight 10.0 Expert How strongly improper contacts influence the acceptance or rejection of a mutation. If unsure, use the default value. If the desired interfaces are not obtained, try increasing this penalty weight. This question is only asked if a contact conditions file is provided.
Input total number of scattering curves 1 User If in addition to the entire multidomain protein, the scattering curves of its partial constructs (deletion mutants) are available, they can be fitted simultaneously assuming the same arrangement of domains in all the constructs.
Use Kratky Geometry. N Expert If the answer is Yes, the computed curves are smeared to fit data from a Kratky camera.
Input first&last residues in 1-st construct 1,var User The residue range present in the given construct (scattering curve). This question is asked once for each construct, i.e. as many times as there are scattering curves. The default answer is from 1 to the last residue in the full-length protein.
Enter file name, 1-st experimental data<.dat> N/A User The name of the data file containing the experimental SAXS profile of a certain construct. The question is asked for each construct.
Angular units in the input file : 4*pi*sin(theta)/lambda [1/angstrom] (1) 4*pi*sin(theta)/lambda [1/nm] (2) 2*sin(theta)/lambda [1/angstrom] (3) 2*sin(theta)/lambda [1/nm] (4) 1 User Formula for the scattering vector in the data file and its units. The question is asked for each construct.
Fitting range in fractions of Smax 1.0 User Fraction of the scattering curve to fit, starting at the first point. The default (1.0) is the entire curve. The question is asked for each construct.
Amplitudes, 1-st subunit<.alm> N/A User The name of the file with partial scattering amplitudes of a certain domain computed by CRYSOL. This question is asked once for each domain, i.e. as many times as there are domains.
Fix the subunit at this position? [Y/N] N Expert The fixation option may be used to keep the desired relative arrangement of certain domains, e.g. to keep the known dimerization interface. This question is asked for each domain.
Angular step in degrees 20.0 User Maximal random rotation angle of a chain portion at a single modification of the system in the course of simulated annealing.
Initial annealing temperature 1.0 Expert Starting temperature of simulated annealing protocol.
Annealing schedule factor 0.9 Expert Factor by which the temperature is decreased at each step; 0.9 is a good average value. If slower cooling is wanted increase the value (e.g. to 0.95).
Max # of iterations at each T var Expert Finalize temperature step and cool after this many iterations at the latest. The default value is either 5000 times the number of unfixed domains or 50 times the number of amino acids, whichever is larger.
Max # of successes at each T var Expert Finalize temperature step and cool after at most this many successful mutations. The default value is either 500 times the number of unfixed domains or 5 times the number of amino acids, whichever is larger.
Min # of successes to continue var Expert Stop simulated annealing if fewer than this many successful mutations are achieved within a single temperature step. The default value is either 50 times the number of unfixed domains or 0.5 times the number of amino acids, whichever is larger, but at least 100.
Max # of annealing steps 100 Expert Stop if simulated annealing is not finished after this many steps. The slower the system is cooled, the more temperature steps are required.
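The default limits and the cooling schedule described in the table can be written out as a small calculation. This is a minimal sketch under two assumptions: the 0.5 factor is applied with integer truncation, and the temperature at step j is the initial temperature multiplied by the schedule factor j-1 times (so that step 1 runs at the initial temperature). The function names are illustrative and not part of BUNCH.

```python
def default_annealing_limits(n_unfixed_domains, n_residues):
    """Default per-temperature limits as described in the table above."""
    return {
        # 5000 per unfixed domain or 50 per amino acid, whichever is larger
        "max_iterations": max(5000 * n_unfixed_domains, 50 * n_residues),
        # 500 per unfixed domain or 5 per amino acid, whichever is larger
        "max_successes": max(500 * n_unfixed_domains, 5 * n_residues),
        # 50 per unfixed domain or 0.5 per amino acid, but at least 100;
        # integer truncation of 0.5 * n_residues is an assumption
        "min_successes": max(50 * n_unfixed_domains, n_residues // 2, 100),
    }


def temperature_at_step(step, initial_temperature=1.0, schedule_factor=0.9):
    """Temperature at annealing step `step` (1-based), assuming geometric cooling."""
    return initial_temperature * schedule_factor ** (step - 1)


if __name__ == "__main__":
    print(default_annealing_limits(n_unfixed_domains=1, n_residues=400))
    print(temperature_at_step(100))
```

With the default factor of 0.9 the temperature drops by roughly four to five orders of magnitude over 100 steps; a slower schedule such as 0.95 therefore needs a larger maximum number of annealing steps to reach the same final temperature.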

Runtime Output

At runtime, two lines of output are generated for each temperature step:

 j:   1 T: 0.100E+01 Suc:  1000 Eva:     2711 CPU:  0.503E+02 F:30.8120 Pen: 28.0621
 The best Chi^2 values: 1.65827

The fields can be interpreted as follows, top-left to bottom-right:

Field Description
j Step number. Starts at 1, increases monotonically.
T Temperature measure. Starts at an arbitrary high value and decreases at each step by the annealing schedule factor.
Suc Number of successful mutations in this temperature step. Limited by the minimum and maximum number of successes. The number of successes should decrease slowly, and the first couple of steps should be terminated by the maximum number of successes criterion. If instead the maximum number of iterations is reached, or the number of successes drops suddenly by a large amount, the system should probably be cooled more slowly.
Eva Accumulated number of function evaluations.
CPU Elapsed wall-clock time since the annealing procedure was started.
F The best target function value obtained so far.
Pen Accumulated penalty value of the best target function.
The best Chi^2 values For each curve out of total number of curves, the goodness-of-fit value \(\chi^2\) of the best target function is given.
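For monitoring long runs it can be convenient to extract these fields from the log file. The following is a minimal sketch of a parser for the two lines shown above. It assumes the layout of the sample output exactly as printed in this manual and, for several curves, that the \(\chi^2\) values are separated by whitespace; `parse_step_line` and `parse_chi2_line` are illustrative names, not part of ATSAS.

```python
import re

# Pattern for the first of the two lines printed per temperature step.
STEP_LINE = re.compile(
    r"j:\s*(?P<j>\d+)\s+T:\s*(?P<T>\S+)\s+Suc:\s*(?P<Suc>\d+)\s+"
    r"Eva:\s*(?P<Eva>\d+)\s+CPU:\s*(?P<CPU>\S+)\s+"
    r"F:\s*(?P<F>\S+)\s+Pen:\s*(?P<Pen>\S+)"
)


def parse_step_line(line):
    """Return the fields of a step line as a dict, or None if it does not match."""
    match = STEP_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "j": int(fields["j"]),
        "T": float(fields["T"]),
        "Suc": int(fields["Suc"]),
        "Eva": int(fields["Eva"]),
        "CPU": float(fields["CPU"]),
        "F": float(fields["F"]),
        "Pen": float(fields["Pen"]),
    }


def parse_chi2_line(line):
    """Return the list of values after the colon (whitespace-separated: an assumption)."""
    return [float(token) for token in line.split(":", 1)[1].split()]


if __name__ == "__main__":
    sample = " j:   1 T: 0.100E+01 Suc:  1000 Eva:     2711 CPU:  0.503E+02 F:30.8120 Pen: 28.0621"
    print(parse_step_line(sample))
    print(parse_chi2_line(" The best Chi^2 values: 1.65827"))
```

Plotting Suc against j from the parsed values is a quick way to spot the sudden drops in the number of successes that indicate too fast cooling.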

bunch Input Files

BUNCH expects background-subtracted experimental SAS data (.dat) as well as binary amplitude data (.alm) as calculated by CRYSOL.
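The four angular unit conventions offered in the dialog differ only by constant factors, so data can also be converted beforehand. The sketch below converts values given in any of the four conventions to option 1, \(4\pi\sin(\theta)/\lambda\) in 1/angstrom, using 1/nm = 0.1/angstrom and \(4\pi\sin(\theta)/\lambda = 2\pi \cdot 2\sin(\theta)/\lambda\). The function name `to_option_1` is illustrative and not part of ATSAS.

```python
import math

# Multipliers converting each dialog option to option 1:
# 4*pi*sin(theta)/lambda in 1/angstrom.
TO_OPTION_1 = {
    1: 1.0,                    # already 4*pi*sin(theta)/lambda [1/angstrom]
    2: 0.1,                    # 4*pi*sin(theta)/lambda [1/nm]
    3: 2.0 * math.pi,          # 2*sin(theta)/lambda [1/angstrom]
    4: 2.0 * math.pi * 0.1,    # 2*sin(theta)/lambda [1/nm]
}


def to_option_1(values, option):
    """Convert scattering vector values from the given option to option 1."""
    if option not in TO_OPTION_1:
        raise ValueError("option must be 1, 2, 3 or 4")
    factor = TO_OPTION_1[option]
    return [factor * value for value in values]


if __name__ == "__main__":
    # 1.0 1/nm corresponds to 0.1 1/angstrom
    print(to_option_1([1.0, 2.5], option=2))
```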

Initial approximation

BUNCH requires an initial approximation in .pdb or .cif format as generated by PRE_BUNCH.

Format of contact conditions file

The following conditions assume a configuration in P2, with two symmetry mates. They require a distance of \(7\AA\) between residues 25 and 115 of the asymmetric unit and a distance of \(5\AA\) between residues 40 of the two symmetry-related chains.

dist 7.0
1 25 25 1 115 115
dist 5.0
1 40 40 2 40 40

If two (or more) alternatives are given after the line with the keyword “dist”, the program compares the better (smaller) distance among them with the specified one.
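To check a contact conditions file before a run, it can be read into a simple structure. The sketch below groups the lines following each “dist” keyword as alternatives of one condition. It assumes that the six integers on a line are (symmetry mate, first residue, last residue) for each of the two partners; this reading fits the example above but is an interpretation, and `parse_contact_conditions` is an illustrative name, not part of ATSAS.

```python
def parse_contact_conditions(text):
    """Parse contact conditions into a list of dicts.

    Each dict holds the required distance and a list of alternatives;
    each alternative is a pair of (mate, first, last) tuples.
    The meaning of the three integers is an assumption.
    """
    conditions = []
    for raw_line in text.splitlines():
        tokens = raw_line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0].lower() == "dist":
            conditions.append({"distance": float(tokens[1]), "alternatives": []})
            continue
        if not conditions:
            raise ValueError("residue line found before any 'dist' line")
        values = [int(token) for token in tokens]
        if len(values) != 6:
            raise ValueError("expected six integers, got: " + raw_line)
        conditions[-1]["alternatives"].append((tuple(values[:3]), tuple(values[3:])))
    return conditions


if __name__ == "__main__":
    example = "dist 7.0\n1 25 25 1 115 115\ndist 5.0\n1 40 40 2 40 40\n"
    for condition in parse_contact_conditions(example):
        print(condition)
```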

Important: here, the residue number is the ordinal number of the CA atom in the coordinates file, i.e. in the following excerpt, PRO 32 has a residue number equal to 2.

ATOM 1  N   GLY A 31 -6.047 33.786  1.442
ATOM 2  CA  GLY A 31 -5.711 33.334  0.066
ATOM 3  C   GLY A 31 -4.332 32.718  0.000
ATOM 4  O   GLY A 31 -3.676 32.483  0.995
ATOM 5  N   PRO A 32 -3.874 32.485 -1.215
ATOM 6  CA  PRO A 32 -2.562 31.874 -1.416
ATOM 7  C   PRO A 32 -1.444 32.754 -0.866
ATOM 8  O   PRO A 32 -1.566 33.990 -0.808
ATOM 9  CB  PRO A 32 -2.464 31.760 -2.936
ATOM 10 CG  PRO A 32 -3.446 32.698 -3.473
ATOM 11 CD  PRO A 32 -4.564 32.799 -2.483
ATOM 12 N   LEU A 33 -0.348 32.111 -0.506
ATOM 13 CA  LEU A 33  0.834 32.815 -0.070
ATOM 14 C   LEU A 33  1.392 33.614 -1.230
ATOM 15 O   LEU A 33  1.470 33.154 -2.364
ATOM 16 CB  LEU A 33  1.900 31.869  0.390
ATOM 17 CG  LEU A 33  1.537 31.036  1.611
ATOM 18 CD1 LEU A 33  2.576 29.958  1.797
ATOM 19 CD2 LEU A 33  1.490 31.984  2.815
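The mapping from the residue numbers in the coordinates file to the ordinal CA numbers used in the contact conditions can be tabulated with a few lines of code. This is a minimal sketch that assumes whitespace-separated ATOM lines as in the excerpt above; real PDB files are fixed-column, so lines with merged fields would need column-based parsing instead. The function name `ca_ordinal_numbers` is illustrative.

```python
def ca_ordinal_numbers(atom_lines):
    """Map (residue name, residue number in file) to the ordinal number of its CA atom.

    Assumes whitespace-separated fields in the order:
    record, serial, atom name, residue name, chain, residue number, x, y, z.
    """
    ordinals = {}
    count = 0
    for line in atom_lines:
        tokens = line.split()
        # Only ATOM records are counted, so calcium ions (HETATM) are ignored.
        if len(tokens) < 6 or tokens[0] != "ATOM":
            continue
        if tokens[2] == "CA":
            count += 1
            ordinals[(tokens[3], int(tokens[5]))] = count
    return ordinals


if __name__ == "__main__":
    excerpt = [
        "ATOM 2  CA  GLY A 31 -5.711 33.334  0.066",
        "ATOM 5  N   PRO A 32 -3.874 32.485 -1.215",
        "ATOM 6  CA  PRO A 32 -2.562 31.874 -1.416",
        "ATOM 13 CA  LEU A 33  0.834 32.815 -0.070",
    ]
    print(ca_ordinal_numbers(excerpt))
```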

bunch Output Files

After each simulated annealing step, BUNCH creates a set of output files. Each filename starts with a customizable prefix to which an extension is appended (see table below). If a given prefix has been used before, existing files are overwritten without further notice.

Extension Description
.log Contains the same information as the screen output and is updated during execution of the program.
.pdb or .cif Current model of the entire complex in either .pdb or .cif format, depending on the --model-format option. The comments section of the file contains information about the application used and about the parameters of the model, e.g. penalties and goodness-of-fit to the data \((\chi^2)\).
-i.fit Fit of the scattering curve computed from a construct versus the corresponding experimental data; i refers to the respective construct number.
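Because existing files are overwritten silently, it may be worth checking for leftovers of an earlier run before reusing a prefix. The sketch below lists the file names expected from the table above and reports which of them already exist. It assumes that the fit files are named prefix-1.fit, prefix-2.fit and so on; this is a literal reading of “-i.fit” and has not been verified against the program. Both function names are illustrative.

```python
from pathlib import Path


def expected_output_files(prefix, n_curves, model_format="cif"):
    """File names expected for a run, following the extension table above."""
    if model_format not in ("cif", "pdb"):
        raise ValueError("model_format must be 'cif' or 'pdb'")
    names = [prefix + ".log", prefix + "." + model_format]
    # One fit file per construct; the exact naming is an assumption.
    names += ["{}-{}.fit".format(prefix, i) for i in range(1, n_curves + 1)]
    return names


def files_that_would_be_overwritten(prefix, n_curves, model_format="cif", directory="."):
    """Return the expected output files that already exist in `directory`."""
    base = Path(directory)
    return [
        name
        for name in expected_output_files(prefix, n_curves, model_format)
        if (base / name).exists()
    ]


if __name__ == "__main__":
    print(files_that_would_be_overwritten("gstdh1", n_curves=1, model_format="pdb"))
```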

Examples

This example demonstrates the use of BUNCH to model a dimeric GST-DHFR fusion protein with P2 symmetry, employing SAXS data of the full construct only.

All required files can be found in the ATSAS installation directory, namely in $ATSAS/share/doc/atsas-x.x.x/bunch/example.

Preparation

The atomic coordinates of 1gta (GST) and 1RA9 (DHFR) are required. First extract the monomer of GST, then position and rotate it so that the correct dimer is obtained by its rotation by 180 degrees about the Z-axis.
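Whether a positioned monomer yields the intended dimer can be checked by generating its symmetry mate explicitly. The following sketch applies the n-fold rotation about the Z-axis to a list of coordinates; for P2 this maps (x, y, z) to (-x, -y, z). It illustrates the geometry only and is not the routine used by BUNCH or PRE_BUNCH; `symmetry_mates` is an illustrative name.

```python
import math


def symmetry_mates(coords, n_fold=2):
    """Return n_fold copies of `coords`, each rotated by k * 360/n_fold degrees about Z.

    `coords` is a list of (x, y, z) tuples; the first copy is the unrotated input.
    """
    mates = []
    for k in range(n_fold):
        angle = 2.0 * math.pi * k / n_fold
        c, s = math.cos(angle), math.sin(angle)
        mates.append([(c * x - s * y, s * x + c * y, z) for x, y, z in coords])
    return mates


if __name__ == "__main__":
    monomer = [(1.0, 2.0, 3.0), (4.0, 0.0, -1.0)]
    for mate in symmetry_mates(monomer, n_fold=2):
        print(mate)
```

Atoms of the monomer that lie close to the Z-axis end up close to their own symmetry copies, so the dimerization interface of the positioned monomer should face the axis.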

With these models and the total sequence, arrange the initial approximation:

$ pre_bunch gst-dhfr.seq 1gtaz1.pdb 1ra9.pdb -o gst-dhfr_ini.pdb

Finally, pre-compute the binary amplitude data (.alm) with CRYSOL.

USER Mode With Fixation

In this example, BUNCH is run with the fixation option to keep the GST dimer intact. Assuming that all files are in the current working directory, BUNCH may be run with an answers file like this (empty lines are significant and indicate default values being used):

user
gstdh1
Gst-Dhfr with fixation of Gst
gst-dhfr_ini.pdb
p2

1


gst-d_med.dat
1
1.00
1gtaz100.alm
y
1ra900.alm

20

Save to a file, e.g. user_fixed.ans, then run as:

$ bunch < user_fixed.ans
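Since empty lines carry meaning, it is easy to lose one when editing an answers file by hand. As an alternative, the file can be generated from a list in which None stands for “accept the default”. The sketch below reproduces the answers of the example above; `build_answers` and `run_bunch` are illustrative helpers, and `run_bunch` assumes that the bunch executable is on the PATH. The --seed option is the one listed under Command-Line Arguments and Options.

```python
import subprocess

# Answers of the example above; None produces an empty line (default value).
ANSWERS = [
    "user", "gstdh1", "Gst-Dhfr with fixation of Gst", "gst-dhfr_ini.pdb",
    "p2", None, 1, None, None, "gst-d_med.dat", 1, "1.00",
    "1gtaz100.alm", "y", "1ra900.alm", None, 20,
]


def build_answers(answers):
    """Join answers into the text of an answers file, one answer per line."""
    return "".join(("" if answer is None else str(answer)) + "\n" for answer in answers)


def run_bunch(answers_text, seed=None):
    """Feed the answers to bunch on standard input (assumes bunch is on PATH)."""
    command = ["bunch"]
    if seed is not None:
        command += ["--seed", str(seed)]
    return subprocess.run(command, input=answers_text, text=True, check=True)


if __name__ == "__main__":
    with open("user_fixed.ans", "w") as handle:
        handle.write(build_answers(ANSWERS))
```

Fixing the seed makes repeated runs with the same answers comparable, which helps when judging the effect of a changed penalty weight.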

USER Mode With Contacts

In this example the interface between the GST monomers is ensured by the use of contact conditions. Additional information on the contacts (Asp77 with Pro86 and Met69 with Gly97) is given in the file contacts.cnd. As before, empty lines are significant and indicate default values being used. Note the addition of “contacts.cnd”. Here contacts.cnd contains:

dist 7.0
1 77 77 2 86 86
dist 7.0
1 69 69 2 97 97

BUNCH configuration:

user
gstdh2
Gst-Dhfr with contacts
gst-dhfr_ini.pdb
p2
contacts.cnd
1

gst-d_med.dat
1
1.0
1gtaz100.alm
n
1ra900.alm
n

20

Save to a file, e.g. user_contact.ans, then run as:

$ bunch < user_contact.ans