Manual

Introduction

GASBOR is a program for ab initio reconstruction of protein structure by a chain-like ensemble of dummy residues.

GASBORMX extends GASBORI to account for oligomeric equilibria. In this mode, an ab initio model of a symmetric oligomer is built under the assumption that some fraction of monomers remains free in solution (i.e., a polydisperse sample).

GASBORMX can handle up to approximately 8000 dummy atoms in total (dummy residues plus dummy waters). Since the water shell is typically small (residue-to-water ratio around 3:1), this corresponds to a practical upper limit of about 6000 residues, or a molecular mass of roughly 700 kDa. The runtime scales quadratically with the number of residues, so while small proteins (e.g. lysozyme, 129 residues) may complete in seconds to minutes, large systems can take substantially longer. For proteins larger than ~2000 residues, the internal structure has less impact on the scattering profile, and faster alternatives such as DAMMIF or DAMMIN are recommended for comparable results.
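
As a rough illustration of these limits, the following Python sketch derives the residue limit from the total atom limit and the 3:1 ratio, and estimates relative runtime under quadratic scaling. The constants come from the paragraph above; the function names are hypothetical, and the runtime ratio is only a scaling estimate, not a timing prediction.

```python
# Back-of-the-envelope helpers; not part of ATSAS.
MAX_DUMMY_ATOMS = 8000      # approximate limit: residues plus dummy waters
RESIDUES_PER_WATER = 3      # typical residue-to-water ratio of 3:1


def max_residues(max_atoms=MAX_DUMMY_ATOMS, ratio=RESIDUES_PER_WATER):
    """Residues that fit when every `ratio` residues bring one dummy water."""
    return max_atoms * ratio // (ratio + 1)


def relative_runtime(n_residues, n_reference=129):
    """Runtime relative to a reference protein, assuming quadratic scaling."""
    return (n_residues / n_reference) ** 2


print(max_residues())                   # 6000
print(round(relative_runtime(1290)))    # 100
```

Under these assumptions, a protein ten times the size of lysozyme would be expected to take about a hundred times longer.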

Algorithm description

The algorithm is the same as described for GASBORI and GASBORP.

Running gasbor

Command-Line Arguments and Options

Usage:

$ gasbormx [OPTIONS] [GNOMFILE] [nDR]

GASBORMX accepts the following command-line arguments:

Argument Description
GNOMFILE A relative or absolute path to regularised SAS data (.out).
nDR Number of dummy residues in asymmetric part.

GASBORMX recognizes the following command-line options:

Option Description
--seed <INT> Set the seed for the random number generator
--mo <U|E> Configuration mode, either User or Expert.
--lo <LOG_FILE> Prefix to prepend to output filenames. Default is the base name of the GASBOR input file without extension.
--model-format <FMT> Format of 3D models, one of: cif, pdb (default: cif)
--id <DESCRIPTION> Project description. By default, the command line content is used.
--un <UNIT> Angular unit of the input file, either ‘1’ (\(\AA^{-1}\)) or ‘2’ (\(\text{nm}^{-1}\)); if not given, the application will attempt to guess the units from the data.
--sy <SYMMETRY> Specify the point symmetry of the particle. Point groups P2, …, P9, Pn2 (n = 2, …, 9) or P23 or P432 or PICO are supported. By default, no symmetry is enforced (P1).
--an <ANISOMETRY> Particle anisometry: oblate (O), prolate (P) or unknown (default).
--dr <DIRECTION> Direction of anisometry, applicable with P2 symmetry only: along (L), across (C) or unknown (default).
--help Print usage information and exit.
--version Print version information and exit.
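
A batch invocation can be assembled from the arguments and options listed above. The sketch below only builds the argument list in Python and does not run anything; the helper name, the file name `sample.out`, the prefix and the seed are placeholders, and only options from the table are used, in the space-separated form shown there.

```python
import shlex


def gasbormx_command(gnomfile, n_residues, symmetry="P2", prefix=None, seed=None):
    """Build a gasbormx argument list from documented options (illustrative helper)."""
    cmd = ["gasbormx", "--sy", symmetry]
    if prefix is not None:
        cmd += ["--lo", prefix]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    # Positional arguments follow the options: GNOMFILE, then nDR.
    cmd += [gnomfile, str(n_residues)]
    return cmd


cmd = gasbormx_command("sample.out", 451, prefix="run1", seed=42)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` once the file name points at a real GNOM output file.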

Interactive Configuration

If the optional argument GNOMFILE is omitted, settings available through command-line arguments and options may also be configured interactively as shown in the table below. Otherwise these questions are skipped.

Screen text Default value Description
Computation mode (User or Expert) User Expert mode allows all aspects of the procedure to be configured, while User mode applies default values for most features.

Screen text Mode Default value Description
Log file name U|E N/A Prefix to the output file names.
Enter project description U|E N/A Short description of the run.
Total number of curves to fit U|E 1  
Number of knots on the master grid E 101  

The following questions are repeated for each input curve:

Screen text Mode Default value Description
Input data, GNOM output file name U|E N/A Input file with scattering data and p(r) curve.
Angular units in the input file
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm] (2)
U|E 1 Angular units of the input file, one of \(\AA^{-1}\) or \(\text{nm}^{-1}\). Default is \(\AA^{-1}\).
Portion of the curve to be fitted U|E 1.0 How much of the data is used for fitting, starting from the beginning. The whole curve is used by default.
Volume fraction of monomer (if known) U|E -1.0 If known, enter the volume fraction of the intact monomer (a number between 0 and 1). This value will be kept fixed during modeling. If unknown, leave as default (-1) and the program will determine it automatically during optimization.
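
To illustrate what the monomer volume fraction means, the sketch below forms a mixture curve as a volume-fraction-weighted sum of monomer and oligomer intensities. This linear combination is the usual model for a non-interacting mixture and is an assumption here, not a description of GASBORMX internals; the function name and the sample numbers are made up.

```python
def mixture_intensity(i_monomer, i_oligomer, v_monomer):
    """Weighted sum v*I_mon(s) + (1 - v)*I_olig(s), point by point."""
    if not 0.0 <= v_monomer <= 1.0:
        raise ValueError("volume fraction must lie between 0 and 1")
    return [v_monomer * m + (1.0 - v_monomer) * o
            for m, o in zip(i_monomer, i_oligomer)]


# Toy intensities on a common s grid (made-up numbers).
i_mon = [1.0, 0.5, 0.2]
i_dim = [2.0, 0.75, 0.25]
print(mixture_intensity(i_mon, i_dim, 1.0))   # pure monomer: same as i_mon
print(mixture_intensity(i_mon, i_dim, 0.5))   # equal volume fractions
```

Fixing the fraction at 1 for a curve therefore means that curve is fitted by the monomer alone.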

Then continues as:

Screen text Mode Default value Description
Initial DRM (CR for random) U|E   Starting configuration. Leave empty (press Enter) to use a random start.
Symmetry: P2…9 or Pn2 (n=2,..,9) or P23 or P432 or PICO U|E P2 Expected particle symmetry.
Number of residues in asymmetric part U|E N/A Number of dummy residues in one asymmetric unit.
Fibonacci grid order U|E 12 Controls how many dummy waters are distributed. 0–18.
Radius of the search volume E \(D_{\text{max}}/2\) Radius of the volume in which dummy atoms will be placed. Limits the sampling space.
Histogram penalty weight E 1.0000E-3 Penalizes unrealistic distributions of inter-residue distances.
Bond length penalty weight E 1.0000E-2 Penalizes bond lengths different from \(3.8 \AA\).
Discontiguity penalty weight E 1.0000E-2 Penalizes disconnected dummy residues.
Peripheral penalty weight E 1.0000E0 Encourages compact structures early on. Reduced during annealing.
Expected particle shape: <P>rolate, <O>blate, or <U>nknown U|E Unknown Define if the particle is strongly non-spherical.
Contrast of the hydration layer E 3.0000E-2 Contrast of the hydration layer relative to the solvent.
Seqence file name, CR for none E N/A File with amino acid sequence for more accurate modeling (Enter for none). Lines in this file must not exceed 256 characters.
Weight: 0=s^2,1=const at s<MaxPor,2=aver
Weight: 3=s ,4=const at s<MaxI*s,5=aver
Weight: 6=logarithmic scale
E 2 Weight I(s) fit according to:
- 0: weight \(I(s)\) proportional to \(s^2\)
- 1: weight \(I(s)\) proportional to \(s^2\), but with a constant where \(s < \max(I \cdot s^2)\)
- 2: the average of options 0 and 1
- 3: weight \(I(s)\) proportional to \(s\)
- 4: weight \(I(s)\) proportional to \(s\), but with a constant where \(s < \max(I \cdot s)\)
- 5: the average of options 3 and 4
- 6: calculate fit on logarithmic scale
Account for constant background E Y Subtracts a constant from the data during fitting.
Initial annealing temperature E 1.0000E-3 Controls early-stage flexibility. Higher = more exploration.
Annealing schedule factor E 0.9000 Controls how fast the model becomes less flexible. Closer to 1 = slower cooling.
# of independent atoms to modify E 1 Number of dummy residues moved per annealing step.
Max # of iterations at each T E 55000 Maximum steps allowed at a given temperature.
Max # of successes at each T E 5500 If this many successful moves are made, temperature is decreased early.
Min # of successes to continue E 55 Stops the program if fewer than this number of moves succeed.
Max # of annealing steps E 100 Program ends after this many annealing cycles.
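
The temperature-related parameters can be illustrated with a geometric cooling schedule, in which the temperature is multiplied by the schedule factor after each annealing step. This is the standard simulated-annealing form and is assumed here rather than taken from the GASBORMX source; the function name is hypothetical.

```python
def temperature_at_step(t_initial, factor, step):
    """Temperature at 1-based annealing step `step` under geometric cooling."""
    if step < 1:
        raise ValueError("step numbering starts at 1")
    return t_initial * factor ** (step - 1)


# Starting temperature 0.1 and schedule factor 0.9, as in the Tetanus toxin example.
for j in (1, 2, 96):
    print(j, f"{temperature_at_step(0.1, 0.9, j):.3E}")
```

With these values, step 96 comes out close to 4.5E-06, in line with the temperature reported at j: 96 in the Tetanus toxin example log.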

Runtime Output

After printing the program version number and querying or printing all parameters, GASBORMX displays the message "Simulated annealing procedure started". After each round of simulated annealing at a new temperature, it prints a two-line report:

 j:   1 T: 0.100E+00 Suc: 10500 Eva:    10828 CPU:  0.542E+01 SqF: 0.4870
  Rf: 0.38044 His: 30.50 Bnd: 2.265 Dis:2.6937 Per :0.2471

Report header Description
j: Iteration number.
T: Temperature of iteration.
Suc: Number of successes at given iteration.
Eva: Total number of function evaluations until end of this iteration.
CPU: Total CPU time in seconds since beginning of run until end of this iteration.
SqF: Square root of the target function at the end of iteration
Rf: R-factor penalty at the end of iteration
His: Histogram penalty at the end of iteration
Bnd: Bond length penalty at the end of iteration
Dis: Discontiguity penalty at the end of iteration
Per: Peripheral penalty at the end of iteration
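
For post-processing a log, the two-line report can be split into its numeric fields. The following is a minimal Python sketch, assuming the report layout shown above; the helper name `parse_report` is hypothetical and not part of ATSAS.

```python
import re

# Matches "Key: value" pairs, tolerating spaces around the colon ("Per :0.2471").
PAIR = re.compile(r"(\w+)\s*:\s*([-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?)")


def parse_report(text):
    """Return a dict of the numeric fields found in one two-line report."""
    return {key: float(value) for key, value in PAIR.findall(text)}


report = (
    " j:   1 T: 0.100E+00 Suc: 10500 Eva:    10828 CPU:  0.542E+01 SqF: 0.4870\n"
    "  Rf: 0.38044 His: 30.50 Bnd: 2.265 Dis:2.6937 Per :0.2471\n"
)
fields = parse_report(report)
print(fields["j"], fields["T"], fields["Suc"], fields["Per"])
```

All values are returned as floats, including the integer counters, which keeps the parser to a single pattern.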

After the run is completed, the final \(\chi^2\) values against all data files are printed to the output.
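
For orientation, the sketch below computes a reduced \(\chi^2\) in the conventional way, \(\chi^2 = \frac{1}{N-1}\sum_j \left[(I_{\text{exp}}(s_j) - c\,I_{\text{calc}}(s_j))/\sigma(s_j)\right]^2\), with the scale factor \(c\) obtained by least squares. That GASBORMX uses exactly this form (for instance with respect to the constant background) is an assumption; the function name and the sample numbers are made up.

```python
def reduced_chi2(i_exp, i_calc, sigma):
    """Chi^2 between data and model after least-squares scaling of the model."""
    w = [1.0 / s ** 2 for s in sigma]
    # Scale factor c minimising sum(w * (i_exp - c * i_calc)^2).
    c = (sum(wi * e * m for wi, e, m in zip(w, i_exp, i_calc))
         / sum(wi * m * m for wi, m in zip(w, i_calc)))
    residual = sum(wi * (e - c * m) ** 2 for wi, e, m in zip(w, i_exp, i_calc))
    return residual / (len(i_exp) - 1)


# Made-up numbers: a model exactly proportional to the data gives
# a value that is zero up to rounding.
print(reduced_chi2([2.0, 1.0, 0.5], [4.0, 2.0, 1.0], [0.1, 0.1, 0.1]))
```

Values near 1 indicate that the model agrees with the data within the experimental errors.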

gasbor Input Files

GASBORMX requires one or more regularised SAS data (.out) files as generated by GNOM.

In addition, an optional residue sequence data (.seq) file can be provided in Expert mode.

gasbor Output Files

GASBORMX writes a set of output files whose names all start with a customizable prefix (see the --lo option). If a prefix has been used before, existing files are overwritten without warning.

Extension Description
.log A copy of the screen output
.pdb or .cif The model is provided in either PDB or mmCIF format, depending on the model-format option.
-i.fit Fit of the simulated scattering curve to the smoothed version of the i-th experimental data set. See the interactive configuration for how to change the number of supporting points in the spline interpolation.
-i.fir Fit of the simulated scattering curve to the i-th experimental data set.
-monomer.dat Calculated scattering of the monomer.
-dimer.dat Calculated scattering of the dimer.

Examples

Concentration series of Tetanus toxin

Three curves are available for Tetanus toxin: one corresponds to pure monomer, the other two come from a monomer-dimer equilibrium with unknown volume fractions. In this gasbormx example all three curves are fitted simultaneously, keeping the monomer volume fraction fixed at 1 for the monomer curve and letting the monomer and dimer volume fractions vary for the other two. Overall P2 symmetry is imposed and 451 residues per monomer are generated.

  ***  Ab inito reconstruction of a protein structure    ***
  ***  by a chain-like ensemble of dummy residues (mix)  ***
  ***  Takes into account oligomer-monomer equilibrium   ***
  ***  Please reference: D.I.Svergun, M.V.Petoukhov &    ***
  ***   M.H.J.Koch (2001) Biophys. J. 80, 2946-2953      ***

   Type gasbormx --help for batch mode use

  === GASBOR ATSAS 4.0.1 (6378ba7) started on   01-Oct-2023   16:22:34

 Computation mode (User or Expert) ...... <         User >:
 Log file name .......................... <         .log >: **tetomx1**
 Project identifier ..................................... : tetomx1
 Enter project description .............. : **3 curves**
 Warning: initialising the random seed when it has already been initialised
 Previous seed:     27016039586466748
 New seed:          27020856437574588
 Initialized random seed as ..................... : 27020856437574588
 Total number of curves to fit .......... <            1 >: **3**
 Number of knots on the master grid ..................... : 101
 Curve # ................................................ : 1
 Input data, GNOM output file name ...... <         .out >: **hcm_mer**
 Data set title
 Maximum diameter of the particle ....................... : 10.05
 Radius of gyration ..................................... : 2.938
 Number of GNOM data points ............................. : 591
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)  <            2 >:
 Angular units multiplied by ............................ : 0.1000
 Maximum diameter divided by ............................ : 0.1000
 Maximum s value [1/angstrom] ........................... : 0.2691
 Number of Shannon channels ............................. : 8.608
 Portion of the curve to be fitted ...... <        1.000 >:
 Reduced s maximum ...................................... : 0.2684
 Reduced number of Shannon channels ..................... : 8.585
 Volume fraction of monomer (if known) .. <       -1.000 >: **1**
 Curve # ................................................ : 2
 Input data, GNOM output file name ...... <         .out >: **hcp_a46c**
 Data set title
 Maximum diameter of the particle ....................... : 13.00
 Radius of gyration ..................................... : 3.907
 Number of GNOM data points ............................. : 941
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)  <            2 >:
 Angular units multiplied by ............................ : 0.1000
 Maximum diameter divided by ............................ : 0.1000
 Maximum s value [1/angstrom] ........................... : 0.3384
 Number of Shannon channels ............................. : 14.00
 Portion of the curve to be fitted ...... <        1.000 >:
 Reduced s maximum ...................................... : 0.3381
 Reduced number of Shannon channels ..................... : 13.99
 Volume fraction of monomer (if known) .. <       -1.000 >:
 Curve # ................................................ : 3
 Input data, GNOM output file name ...... <         .out >: **hcp_a48c**
 Data set title
 Maximum diameter of the particle ....................... : 15.00
 Radius of gyration ..................................... : 4.393
 Number of GNOM data points ............................. : 944
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)  <            2 >:
 Angular units multiplied by ............................ : 0.1000
 Maximum diameter divided by ............................ : 0.1000
 Maximum s value [1/angstrom] ........................... : 0.3384
 Number of Shannon channels ............................. : 16.16
 Portion of the curve to be fitted ...... <        1.000 >:
 Reduced s maximum ...................................... : 0.3381
 Reduced number of Shannon channels ..................... : 16.14
 Volume fraction of monomer (if known) .. <       -1.000 >:
 Initial DRM (CR for random) ............ <         .pdb >:
 Symmetry: P2...9 or Pn2 (n=2,..,9)
 or P23 or P432 or PICO ................. <           P2 >:**P2**
 Number of equivalent positions ......................... : 2
 Number of residues in asymmetric part .. <         1156 >: **451**
 Fibonacci grid order ................... <           14 >:
 Number of dummy waters ................................. : 611
 Excluded volume per residue ............................ : 28.73
 Radius of the search volume ............................ : 75.00
 Histogram penalty weight ............................... : 1.000E-03
 Bond length penalty weight ............................. : 1.000E-02
 Discontiguity penalty weight ........................... : 1.000E-02
 Peripheral penalty weight .............................. : 5.000E-02
 Expected particle shape: <P>rolate, <O>blate,
  or <U>nknown .......................... <      Unknown >:
 Contrast of the hydration layer ........................ : 3.000E-02
  Computation of the initial intensity ...
 Histogram penalty value ................................ : 32.79
 Bond length penalty value .............................. : 0.6398
 Initial DRM # of graphs ................................ : 731
 Discontiguity   value .................................. : 4.725
 Peripheral penalty value ............................... : 0.2862
 Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
 *** Accounting for constant background ***
 Constant background subtracted ......................... : -8.012E-04
 Initial R^2 factor ..................................... : 0.5078
 Initial R   factor ..................................... : 0.7126
 Volume fraction, monomer ............................... : 1.000
 Constant background subtracted ......................... : -0.1363
 Initial R^2 factor ..................................... : 7.185
 Initial R   factor ..................................... : 2.681
 Volume fraction, monomer ............................... : 1.000
 Constant background subtracted ......................... : -0.5360
 Initial R^2 factor ..................................... : 7.029
 Initial R   factor ..................................... : 2.651
 Initial penalty ........................................ : 0.1007
 Initial fVal ........................................... : 14.82
 Initial annealing temperature .......................... : 0.100
 Annealing schedule factor .............................. : 0.9000
 # of independent atoms to modify ....................... : 1
 Max # of iterations at each T .......................... : 105000
 Max # of successes at each T ........................... : 10500
 Min # of successes to continue ......................... : 105
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E+00 Suc: 10500 Eva:    10921 CPU:  0.311E+02 SqF: 0.4654
  Rf: 0.35014 His: 36.75 Bnd: 2.707 Dis:2.0684 Per :0.1894
_..._
 j:  96 T: 0.450E-05 Suc:    98 Eva:  3542664 CPU:  0.764E+04 SqF: 0.0942
  Rf: 0.05956 His:  4.96 Bnd: 0.038 Dis:0.0000 Per :0.1477

 Final Chi^2 against raw data ........................... : 1.669
 Final Chi^2 against raw data ........................... : 0.9622
 Final Chi^2 against raw data ........................... : 1.172

  === GASBOR ATSAS 4.0.1 (6378ba7) started on   01-Oct-2023   18:32:46
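
The Shannon channel counts reported in the log are consistent with the usual estimate \(N = D_{\text{max}} \cdot s_{\text{max}} / \pi\), once the diameter has been converted from nm to angstrom. The Python sketch below checks this for the first curve; the function name is hypothetical, and the small difference from the logged value comes from using the rounded numbers printed in the log.

```python
import math


def shannon_channels(d_max_angstrom, s_max_inv_angstrom):
    """Number of Shannon channels, N = Dmax * smax / pi."""
    return d_max_angstrom * s_max_inv_angstrom / math.pi


# Curve 1: Dmax is 10.05 nm in the GNOM file, and the log reports a maximum s
# of 0.2691 1/angstrom after the unit conversion.
d_max = 10.05 / 0.1     # "Maximum diameter divided by 0.1000" gives angstrom
print(f"{shannon_channels(d_max, 0.2691):.2f}")
```

The same calculation with 13.00 nm and 15.00 nm at 0.3384 1/angstrom gives values close to the 14.00 and 16.16 channels logged for curves 2 and 3.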