Manual

Introduction

GASBOR is a program for ab initio reconstruction of protein structure by a chain-like ensemble of dummy residues.

There are two versions of GASBOR: one fits the scattering intensity in reciprocal space (GASBORI), and the other fits the real-space pair-distance distribution function, \(p(r)\) (GASBORP). Both versions use similar algorithms. The reciprocal-space version is slower but typically provides better fits to experimental data. The real-space version is significantly faster and is recommended when the number of dummy residues becomes large, as runtime scales quadratically with that number.

GASBOR can handle up to approximately 8000 dummy particles in total (dummy residues plus dummy waters). Since the water shell is typically small (residue-to-water ratio around 3:1), this corresponds to a practical upper limit of about 6000 residues, or a molecular mass of roughly 700 kDa. Because the runtime scales quadratically with the number of residues, small proteins (e.g. lysozyme, 129 residues) may complete in seconds to minutes, while large systems can take substantially longer. For proteins larger than ~2000 residues, the internal structure has less impact on the scattering profile, and faster alternatives such as DAMMIF or DAMMIN are recommended for comparable results.
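The size limits quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of GASBOR: the average residue mass of 110 Da is an assumption (a commonly used rule of thumb), and the function names are hypothetical.

```python
# Back-of-the-envelope size limits and runtime scaling for GASBOR.
MAX_PARTICLES = 8000         # residues plus dummy waters, as stated in the text
RESIDUES_PER_WATER = 3.0     # residue-to-water ratio of about 3:1, as stated
AVG_RESIDUE_MASS_DA = 110.0  # ASSUMPTION: typical average residue mass in Da


def max_residues(max_particles=MAX_PARTICLES, ratio=RESIDUES_PER_WATER):
    """Number of residues that fit when residues : waters = ratio : 1."""
    return int(max_particles * ratio / (ratio + 1.0))


def approx_mass_kda(n_residues, residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate molecular mass in kDa for a given residue count."""
    return n_residues * residue_mass_da / 1000.0


def relative_runtime(n_residues, n_reference=129):
    """Runtime relative to a reference size, assuming purely quadratic scaling."""
    return (n_residues / n_reference) ** 2


if __name__ == "__main__":
    n = max_residues()
    print(n, approx_mass_kda(n))    # residue limit and approximate mass in kDa
    print(relative_runtime(1290))   # ten times lysozyme's size
```

With these assumptions the limit comes out at 6000 residues and about 660 kDa, in line with the "roughly 700 kDa" quoted above. A protein ten times the size of lysozyme would be expected to take about a hundred times longer.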

Algorithm description

The use of GASBOR is similar to that of DAMMIN or DAMMIF, and most parameters have the same meaning. The most important difference is that the protein structure is represented not by dummy spheres on a lattice (called dummy atoms in DAMMIN / DAMMIF, although they do not correspond to real atoms), but by an ensemble of dummy residues (corresponding to average residue densities) placed anywhere in continuous space, with a preferred number of close-distance neighbours for each residue. The centers of these residues aim to approximate the positions of the \(\text{C}\alpha\) atoms in the protein structure. The number of residues should be equal to that in the protein.

Note, however, that these residues are anonymous, in the sense that their ordinal numbers in the model have nothing to do with the residue numbering in the primary sequence of the protein!

In DAMMIN, it was recommended to discard high angle portions of the scattering patterns; in GASBOR, on the contrary, one should use them. The program is able to fit the data up to a resolution of about 5 \(\AA\), i.e. momentum transfer \(s = 4 \pi \sin(\theta)/\lambda \approx 1.2\ \AA^{-1}\). Accordingly, the program does not subtract any Porod constant from the experimental data.
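The relation between momentum transfer and nominal resolution can be checked numerically. This sketch assumes the conventional definition \(d = 2\pi/s\), which the manual does not state explicitly. The helper names are hypothetical.

```python
import math


def momentum_transfer(theta_rad, wavelength_angstrom):
    """s = 4*pi*sin(theta)/lambda, with 2*theta the scattering angle."""
    return 4.0 * math.pi * math.sin(theta_rad) / wavelength_angstrom


def resolution_angstrom(s_inv_angstrom):
    """Nominal resolution d = 2*pi/s (ASSUMPTION: conventional definition)."""
    return 2.0 * math.pi / s_inv_angstrom


def to_inv_angstrom(s, unit):
    """Convert s to 1/angstrom; unit 1 = 1/angstrom, unit 2 = 1/nm."""
    if unit == 1:
        return s
    if unit == 2:
        return s * 0.1
    raise ValueError("unit must be 1 (1/angstrom) or 2 (1/nm)")


if __name__ == "__main__":
    print(round(resolution_angstrom(1.2), 2))  # resolution reached at s = 1.2 1/angstrom
```

Under this definition, \(s = 1.2\ \AA^{-1}\) corresponds to about 5.2 \(\AA\), consistent with the "resolution of 5 \(\AA\)" quoted above.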

Running gasbor

Command-Line Arguments and Options

Usage:

$ gasbori [OPTIONS] [GNOMFILE] [nDR]
$ gasborp [OPTIONS] [GNOMFILE] [nDR]

Both GASBOR variants accept the following command-line arguments:

Argument Description
GNOMFILE A relative or absolute path to regularised SAS data (.out).
nDR Number of dummy residues in the asymmetric part.

GASBOR recognizes the following command-line options.

Option Description
--seed <INT> Set the seed for the random number generator
--mo <U|E> Configuration mode, either User or Expert.
--lo <LOG_FILE> Prefix to prepend to output filenames. Default is the base name of the GASBOR input file without extension.
--model-format <FMT> Format of 3D models, one of: cif, pdb (default: cif)
--id <DESCRIPTION> Project description. By default, the command line content is used.
--un <UNIT> Angular unit of the input file, either ‘1’ (\(\AA^{-1}\)) or ‘2’ (\(\text{nm}^{-1}\)); if not given, the application will attempt to guess the units from the data.
--sy <SYMMETRY> Specify the point symmetry of the particle. Point groups P1, …, P19, Pn2 (n = 2, …, 12), P23, P432 or PICO (icosahedral) are supported. By default, no symmetry is enforced (P1).
--an <ANISOMETRY> Particle anisometry: oblate (O), prolate (P) or unknown (default).
--dr <DIRECTION> Direction of anisometry, applicable with P2 symmetry only: along (L), across (C) or unknown (default).
--help Print usage information and exit.
--version Print version information and exit.
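For batch use, the command line can be assembled programmatically. The sketch below only builds the argument list and uses only options listed in the table above. The function name is hypothetical, and passing each option value as a separate, space-separated argument is an assumption based on the usage line.

```python
def build_gasbor_command(gnom_file, n_residues, program="gasbori",
                         symmetry=None, unit=None, seed=None,
                         prefix=None, model_format=None):
    """Return an argument list of the form: program [OPTIONS] GNOMFILE nDR."""
    if program not in ("gasbori", "gasborp"):
        raise ValueError("program must be 'gasbori' or 'gasborp'")
    cmd = [program]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if symmetry is not None:
        cmd += ["--sy", symmetry]
    if unit is not None:
        cmd += ["--un", str(unit)]          # 1 = 1/angstrom, 2 = 1/nm
    if prefix is not None:
        cmd += ["--lo", prefix]
    if model_format is not None:
        cmd += ["--model-format", model_format]
    # Positional arguments come last, as in the usage line.
    cmd += [gnom_file, str(n_residues)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_gasbor_command("1trk.out", 680, symmetry="P2", unit=2)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where ATSAS is installed and the GASBOR executables are on the search path.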

Interactive Configuration

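If the optional argument GNOMFILE is omitted, the settings available through command-line arguments and options can be configured interactively, as shown in the tables below; otherwise these questions are skipped. The Mode column indicates whether a question is asked in both User and Expert mode (U|E) or in Expert mode only (E).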

Screen text Default value Description
Computation mode (User or Expert) User Expert mode allows all aspects of the procedure to be configured, while User mode applies default values for most settings.

Screen text Mode Default value Description
Log file name U|E N/A Prefix to the output file names.
Input data, GNOM output file name U|E N/A Input file with scattering data and \(p(r)\) curve.
Enter project description U|E N/A Short description of the run.
Angular units in the input file: 4*pi*sin(theta)/lambda [1/angstrom] (1) or 4*pi*sin(theta)/lambda [1/nm] (2) U|E 1 Angular units of the input file, either \(\AA^{-1}\) (1) or \(\text{nm}^{-1}\) (2). Default is \(\AA^{-1}\).
Portion of the curve to be fitted U|E 1.0 How much of the data is used for fitting, starting from the beginning. The whole curve is used by default.
Number of knots in the curve to fit E 11-201 Number of points used in the regularized fit. Default depends on input data.
Initial DRM (CR for random) U|E   Starting configuration. Leave empty (press Enter) to use a random start.
Symmetry: P1…19 or Pn2 (n=1,..,12) or P23 or P432 or PICO U|E P1 Expected particle symmetry.
Number of residues in asymmetric part U|E N/A Number of dummy residues in one asymmetric unit.
Fibonacci grid order U|E 9 Controls how many dummy waters are distributed. Allowed range: 0–18.
Radius of the search volume E \(D_{\text{max}}/2\) Radius of the volume in which dummy atoms will be placed. Limits the sampling space.
Histogram penalty weight E 1.0000E-3 Penalizes unrealistic distributions of inter-residue distances.
Bond length penalty weight E 1.0000E-2 Penalizes bond lengths different from \(3.8 \AA\).
Discontiguity penalty weight E 1.0000E-2 Penalizes disconnected dummy residues.
Peripheral penalty weight E 1.0000E0 Encourages compact structures early on. Reduced during annealing.
Expected particle shape: <P>rolate, <O>blate, or <U>nknown U|E Unknown Define if the particle is strongly non-spherical.
Contrast of the hydration layer E 3.0000E-2 Contrast of the hydration layer relative to the solvent.
Seqence file name, CR for none E N/A File with amino acid sequence for more accurate modeling (Enter for none). Lines in this file must not exceed 256 characters.
Weight: 0=s^2,1=const at s<MaxPor,2=aver / Weight: 3=s ,4=const at s<MaxI*s,5=aver / Weight: 6=logarithmic scale E 2 Weighting applied to \(I(s)\) in the fit:
- 0: weight \(I(s)\) proportional to \(s^2\)
- 1: weight \(I(s)\) proportional to \(s^2\), but constant for \(s\) below the position of the maximum of \(I(s) \cdot s^2\)
- 2: the average of options 0 and 1
- 3: weight \(I(s)\) proportional to \(s\)
- 4: weight \(I(s)\) proportional to \(s\), but constant for \(s\) below the position of the maximum of \(I(s) \cdot s\)
- 5: the average of options 3 and 4
- 6: calculate the fit on a logarithmic scale
Account for constant background E Y Subtracts a constant from the data during fitting.
Initial scale factor E N/A Starting value for scaling the model to the data.
Fixing threshold for Rf E 0.0 Deprecated. Leave at 0.0.
Fixing threshold for PenCha E 0.0 Deprecated. Leave at 0.0.
Fixing threshold for PenLen E 0.0 Deprecated. Leave at 0.0.
Initial annealing temperature E 1.0000E-3 Controls early-stage flexibility. Higher = more exploration.
Annealing schedule factor E 0.9000 Controls how fast the model becomes less flexible. Closer to 1 = slower cooling.
# of independent atoms to modify E 1 Number of dummy residues moved per annealing step.
Max # of iterations at each T E 40000 Maximum steps allowed at a given temperature.
Max # of successes at each T E 4000 If this many successful moves are made, temperature is decreased early.
Min # of successes to continue E 40 Stops the program if fewer than this number of moves succeed.
Max # of annealing steps E 100 Program ends after this many annealing cycles.
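The two temperature parameters above suggest a geometric cooling schedule. The sketch below assumes that the temperature at annealing step j is the initial temperature multiplied by the schedule factor j - 1 times. This is an inference from the parameter descriptions, not a statement about the GASBOR source code, although it does reproduce the temperatures printed in the example logs of this manual. The function name is hypothetical.

```python
def annealing_temperature(step, t_initial=1.0e-3, factor=0.9):
    """Temperature at a 1-based annealing step, assuming geometric cooling."""
    if step < 1:
        raise ValueError("step numbering starts at 1")
    return t_initial * factor ** (step - 1)


if __name__ == "__main__":
    # Steps that appear in the example logs of this manual.
    for step in (1, 36, 37, 56):
        print(step, format(annealing_temperature(step), ".3e"))
```

With the default values, step 36 gives about 2.50E-05 and step 56 about 3.04E-06, matching the T values reported at those steps in the lysozyme and transketolase examples.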

Runtime Output

After printing the program version number and querying or printing all parameters, GASBOR displays the message "Simulated annealing procedure started". After each round of simulated annealing at a new temperature, it prints a two-line report:

 j:   1 T: 0.100E-02 Suc:  4000 Eva:    12125 CPU:  0.211E+00 SqF: 0.3309
  Rf: 0.04743 His: 15.96 Bnd: 0.493 Dis:0.2389 Per :0.0840

Report header Description
j: Iteration number.
T: Temperature of the iteration.
Suc: Number of successes in the iteration.
Eva: Total number of function evaluations up to the end of this iteration.
CPU: Total CPU time in seconds from the beginning of the run to the end of this iteration.
SqF: Square root of the target function at the end of the iteration.
Rf: R-factor penalty at the end of the iteration.
His: Histogram penalty at the end of the iteration.
Bnd: Bond length penalty at the end of the iteration.
Dis: Discontiguity penalty at the end of the iteration.
Per: Peripheral penalty at the end of the iteration.
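To monitor convergence, the report lines can be parsed into numbers. The following sketch relies only on the layout of the sample report shown above. The function name is hypothetical, and the regular expression tolerates the irregular spacing around the colons (for example `Dis:0.2389` and `Per :0.0840`).

```python
import re

# A field name, optional spaces, a colon, optional spaces, then a number
# in plain or exponent notation.
_FIELD = re.compile(r"(\w+)\s*:\s*([-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?)")


def parse_report(text):
    """Return a dict mapping report headers (j, T, Suc, ...) to float values."""
    return {name: float(value) for name, value in _FIELD.findall(text)}


sample = (
    " j:   1 T: 0.100E-02 Suc:  4000 Eva:    12125 CPU:  0.211E+00 SqF: 0.3309\n"
    "  Rf: 0.04743 His: 15.96 Bnd: 0.493 Dis:0.2389 Per :0.0840\n"
)
report = parse_report(sample)

if __name__ == "__main__":
    for key in ("j", "T", "Suc", "SqF", "Rf"):
        print(key, report[key])
```

All values are returned as floats for simplicity; integer fields such as j, Suc and Eva can be converted with `int()` where needed.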

After the run is completed, the final \(\chi^2\) against the data is printed to the output.

gasbor Input Files

GASBORI and GASBORP require regularised SAS data (.out) as generated by GNOM.

In addition, an optional residue sequence data (.seq) file can be provided in Expert mode.

gasbor Output Files

GASBORI and GASBORP write a set of output files whose names all start with a customizable prefix (the --lo option, or the log file name in interactive mode). If a prefix has been used before, existing files are overwritten without warning.

Due to the different implementations, the output files of GASBORI and GASBORP differ.

Output files of gasbori

Extension Description
.log A copy of the screen output
.pdb or .cif The model is provided in either PDB or mmCIF format, depending on the --model-format option.
.fit Fit of the simulated scattering curve versus a smoothed version of the experimental data. The number of supporting points (knots) in the spline interpolation can be changed in interactive Expert mode.
.fir Fit of the simulated scattering curve versus the experimental data.

Output files of gasborp

Extension Description
.log A copy of the screen output
.pdb or .cif The model is provided in either PDB or mmCIF format, depending on the --model-format option.
.hst Fit of the simulated p(r) versus the provided p(r).
.fir Fit of the simulated scattering curve versus the experimental data.
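Because existing files are overwritten, it can be useful to check for output from a previous run before starting a new one. The sketch below lists the expected file names from the two tables above. It assumes that each name is simply the prefix followed by the extension. The function names are hypothetical.

```python
from pathlib import Path

# Program-specific extensions, taken from the tables above.
_EXTRA_EXTENSIONS = {
    "gasbori": (".fit", ".fir"),
    "gasborp": (".hst", ".fir"),
}


def expected_outputs(prefix, program="gasbori", model_format="cif"):
    """Expected output file names (ASSUMPTION: name = prefix + extension)."""
    if program not in _EXTRA_EXTENSIONS:
        raise ValueError("program must be 'gasbori' or 'gasborp'")
    if model_format not in ("cif", "pdb"):
        raise ValueError("model_format must be 'cif' or 'pdb'")
    extensions = (".log", "." + model_format) + _EXTRA_EXTENSIONS[program]
    return [prefix + ext for ext in extensions]


def existing_outputs(prefix, program="gasbori", model_format="cif", directory="."):
    """Names of expected output files that already exist in the directory."""
    names = expected_outputs(prefix, program, model_format)
    return [name for name in names if (Path(directory) / name).exists()]


if __name__ == "__main__":
    print(expected_outputs("lyz"))
    print(existing_outputs("lyz"))
```

An empty list from `existing_outputs` indicates that a run with this prefix would not overwrite any of the listed files in the given directory.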

Examples

Lysozyme

Lysozyme has no symmetry and 129 residues: enter P1 for symmetry, 129 for the number of residues, and accept the default answers to all other questions. You may also use the command line (type gasbori --help for batch mode use):

$ gasbori gnlyzfu.out 129

Here is the resulting output:

  ***  Ab inito reconstruction of a protein structure    ***
  ***   by a chain-like ensemble of dummy residues       ***
  ***  Please reference: D.I.Svergun, M.V.Petoukhov &    ***
  ***   M.H.J.Koch (2001) Biophys. J. 80, 2946-2953      ***

   Type gasbori --help for batch mode use

  === GASBOR ATSAS 4.0.1 (6378ba7) started on   29-Sep-2023   12:48:23

 Project identifier ..................................... : gnlyzfu
 Project description:
 Initialized random seed as ..................... : 661145759406964600
 Data set title ......................................... : Angular axis n01000.sax             Datafile n10000.sub
 Maximum diameter of the particle ....................... : 50.00
 Radius of gyration ..................................... : 14.33
 Number of GNOM data points ............................. : 230
 Maximum s value [1/angstrom] ........................... : 1.316
 Number of Shannon channels ............................. : 20.94
 Reduced s maximum ...................................... : 1.307
 Reduced number of Shannon channels ..................... : 20.80
 Number of knots in the curve to fit .................... : 42
 Symmetry: P1...19 or Pn2 (n=1,..,12)
 Number of equivalent positions ......................... : 1
  Number of dummy waters ................................ : 90
 Excluded volume per residue ............................ : 28.73
 Radius of the search volume ............................ : 25.00
 Histogram penalty weight ............................... : 1.000E-03
 Bond length penalty weight ............................. : 1.000E-02
 Discontiguity penalty weight ........................... : 1.000E-02
 Peripheral penalty weight .............................. : 1.000
 Contrast of the hydration layer ........................ : 3.000E-02
  Computation of the initial intensity ...
 Histogram penalty value ................................ : 40.11
 Bond length penalty value .............................. : 2.402
 Initial DRM # of graphs ................................ : 60
 Discontiguity   value .................................. : 1.099
 Peripheral penalty value ............................... : 0.2496
 Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
 *** Accounting for constant background ***
 Initial scale factor ................................... : 1.092E-04
 Constant background subtracted ......................... : -0.4002
 Initial R^2 factor ..................................... : 0.1164
 Initial R   factor ..................................... : 0.3412
 Initial penalty ........................................ : 0.3247
 Initial fVal ........................................... : 0.4411
 R-factor fixing threshold .............................. : 0.0
 Fixing threshold PenCha ................................ : 0.0
 Fixing threshold PenLen ................................ : 0.0
 Initial annealing temperature .......................... : 1.000E-03
 Annealing schedule factor .............................. : 0.9000
 # of independent atoms to modify ....................... : 1
 Max # of iterations at each T .......................... : 55000
 Max # of successes at each T ........................... : 5500
 Min # of successes to continue ......................... : 55
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E-02 Suc:  5500 Eva:    11278 CPU:  0.106E+01 SqF: 0.5132
  Rf: 0.09491 His: 25.78 Bnd: 1.398 Dis:0.1870 Per :0.2127
_..._
 j:  36 T: 0.250E-04 Suc:    53 Eva:  1427584 CPU:  0.135E+03 SqF: 0.0912
  Rf: 0.03409 His:  6.26 Bnd: 0.065 Dis:0.0000 Per :0.3664

 Final Chi^2 against raw data ........................... : 1.248

  === GASBOR ATSAS 4.0.1 (6378ba7) finished on   29-Sep-2023   12:44:39
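The number of Shannon channels reported in the log can be cross-checked from the maximum s value and the maximum diameter. This sketch assumes the standard definition \(N_s = s_{\text{max}} D_{\text{max}} / \pi\), which the manual does not spell out. The function name is hypothetical.

```python
import math


def shannon_channels(s_max, d_max):
    """Number of Shannon channels, N_s = s_max * D_max / pi.

    s_max and d_max must use matching units (e.g. 1/angstrom and angstrom).
    """
    return s_max * d_max / math.pi


if __name__ == "__main__":
    # Values from the lysozyme log above: D_max = 50.00, s_max = 1.316,
    # reduced s_max = 1.307.
    print(round(shannon_channels(1.316, 50.0), 2))
    print(round(shannon_channels(1.307, 50.0), 2))
```

For the lysozyme run this gives about 20.94 channels for the full range and 20.80 for the reduced range, in agreement with the values printed in the log.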

Transketolase

Transketolase is a homodimer in solution, and each monomer has 680 residues, giving a total of 1360 residues: enter P2 for symmetry, 680 for the number of residues in the asymmetric part, and accept the default answers to all other questions.

  ***  Ab inito reconstruction of a protein structure    ***
  ***   by a chain-like ensemble of dummy residues       ***
  ***  Please reference: D.I.Svergun, M.V.Petoukhov &    ***
  ***   M.H.J.Koch (2001) Biophys. J. 80, 2946-2953      ***

   Type gasbori --help for batch mode use

  === GASBOR ATSAS 4.0.1 (6378ba7) started on   29-Sep-2023   12:56:50
  
 Computation mode (User or Expert) ...... <         User >:
 Log file name .......................... <         .log >: **log**
 Input data, GNOM output file name ...... <         .out >: **1trk.out**
 Project identificator .................................. : log
 Enter project description .............. : **project**
 Random sequence initialized from ....................... : 164228
  ** Information read from the GNOM file **
 Data set title:    Transketolase collated from n85, o14+o16   6-11-98
 Raw data file name:  trkexp.dat
 Maximum diameter of the particle ....................... : 12.00
  Solution at Alpha =   .164E+01   Rg :   .336E+01   I(0) :    .190E+03
 Radius of gyration ..................................... : 3.360
 Number of GNOM data points ............................. : 283
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)  <            2 >: **2**
 Angular units multiplied by ............................ : 0.1000
 Maximum diameter divided by ............................ : 0.1000
 Maximum s value [1/angstrom] ........................... : 0.3418
 Number of Shannon channels ............................. : 13.06
 Portion of the curve to be fitted ...... <        1.000 >:
 Number of knots in the curve to fit .................... : 26
 Initial DRM (CR for random) ............ <         .pdb >:
 Symmetry: P1...19 or Pn2 (n=1,..,12)
 or P23 or P432 or PICO ................. <           P1 >: **P2**
 Number of equivalent positions ......................... : 2
 Number of residues in asymmetric part .. <          517 >: **680**
 Fibonacci grid order ................... <           15 >:
 Number of dummy waters ................................ : 988
 Excluded volume per residue ............................ : 28.73
 Radius of the search volume ............................ : 60.00
 Histogram penalty weight ............................... : 1.000e-3
 Bond length penalty weight ............................. : 1.000e-2
 Discontiguity penalty weight ........................... : 1.000e-2
 Peripheral penalty weight .............................. : 1.000
 Expected particle shape: <P>rolate, <O>blate,
  or <U>nknown .......................... <      Unknown >:
 Contrast of the hydration layer ........................ : 3.000e-2
  Computation of the initial intensity ...
 Histogram penalty value ................................ : 37.38
 Bond length penalty value .............................. : 1.604
 Initial DRM # of graphs ................................ : 708
 Discontiguity   value .................................. : 2.191
 Peripheral penalty value ............................... : 0.2647
 Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
 *** Accounting for constant background ***
 Initial scale factor ................................... : 5.042e-7
 Constant background subtracted ......................... : 0.3339
 Initial R^2 factor ..................................... : 3.837e-2
 Initial R   factor ..................................... : 0.1959
 Initial penalty ........................................ : 0.3400
 Initial fVal ........................................... : 0.3784
 R-factor fixing threshold .............................. : 0.0
 Fixing threshold PenCha ................................ : 0.0
 Fixing threshold PenLen ................................ : 0.0
 Initial annealing temperature .......................... : 1.000e-3
 Annealing schedule factor .............................. : 0.9000
 # of independent atoms to modify ....................... : 1
 Max # of iterations at each T .......................... : 130000
 Max # of successes at each T ........................... : 13000
 Min # of successes to continue ......................... : 130
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E-02 Suc: 13000 Eva:    14975 CPU:  0.272E+02 SqF: 0.5531
  Rf: 0.11900 His: 36.59 Bnd: 2.213 Dis:0.3588 Per :0.2294
_..._
 j:  56 T: 0.304E-05 Suc:    85 Eva:  3737295 CPU:  0.680E+04 SqF: 0.0797
  Rf: 0.02350 His:  5.37 Bnd: 0.044 Dis:0.0000 Per :0.3197

 Final Chi^2 against raw data ........................... : 1.774

  === GASBOR ATSAS 4.0.1 (6378ba7) finished on   29-Sep-2023   14:57:15
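The unit handling in this run can be retraced by hand. The input file uses \(\text{nm}^{-1}\), so the log reports the angular axis multiplied by 0.1 and the maximum diameter divided by 0.1. The sketch below repeats that conversion and the derived quantities. The value of 3.418 \(\text{nm}^{-1}\) is back-calculated from the log rather than read from the data file, and the function name is hypothetical.

```python
def to_angstrom_units(d_max, s_max, unit):
    """Convert D_max and s_max to angstrom-based units.

    unit 1: input already in angstrom and 1/angstrom.
    unit 2: input in nm and 1/nm (D_max * 10, s_max * 0.1).
    """
    if unit == 1:
        return d_max, s_max
    if unit == 2:
        return d_max * 10.0, s_max * 0.1
    raise ValueError("unit must be 1 (1/angstrom) or 2 (1/nm)")


# D_max = 12.00 nm is taken from the log; s_max = 3.418 1/nm is back-calculated
# from the reported 0.3418 1/angstrom.
d_max, s_max = to_angstrom_units(12.00, 3.418, unit=2)
search_radius = d_max / 2.0   # default search radius is D_max / 2
total_residues = 2 * 680      # 2 equivalent positions (P2) x 680 residues

if __name__ == "__main__":
    print(d_max, round(s_max, 4), search_radius, total_residues)
```

The results, a maximum diameter of 120 \(\AA\), a search radius of 60 \(\AA\) and 1360 residues in total, match the corresponding lines of the log.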

Lysozyme in Expert mode with sequence file

This lysozyme example uses sequence-specific form factors for the dummy residues. Enter E for Expert mode, P1 for symmetry, 129 for the number of residues, lyz.seq for the sequence file, and accept the default answers to all other questions.

  ***  Ab inito reconstruction of a protein structure    ***
  ***   by a chain-like ensemble of dummy residues       ***
  ***  Please reference: D.I.Svergun, M.V.Petoukhov &    ***
  ***   M.H.J.Koch (2001) Biophys. J. 80, 2946-2953      ***

   Type gasbori --help for batch mode use

  === GASBOR ATSAS 4.0.1 (8aa369f3b) started on   02-Oct-2023   13:55:24

 Computation mode (User or Expert) ...... <         User >: **E**
 Log file name .......................... <         .log >: **lyzseq**
 Input data, GNOM output file name ...... <         .out >: **gnlyzfu**
 Project identifier ..................................... : lyzseq
 Enter project description .............. : **use sequence**
Initial random seed? (default: use current time) ..................... :
 Warning: initialising the random seed when it has already been initialised
 Previous seed:     39884949516326408
 New seed:          39888779342845448
 Initialized random seed as ..................... : 39888779342845448
 Data set title ......................................... : Angular axis n01000.sax             Datafile n10000.sub
 Maximum diameter of the particle ....................... : 50.00
 Radius of gyration ..................................... : 14.33
 Number of GNOM data points ............................. : 230
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)  <            1 >:
 Maximum s value [1/angstrom] ........................... : 1.316
 Number of Shannon channels ............................. : 20.94
 Portion of the curve to be fitted ...... <        1.000 >:
 Reduced s maximum ...................................... : 1.307
 Reduced number of Shannon channels ..................... : 20.80
 Number of knots in the curve to fit .... <           42 >:
 Initial DRM (CR for random) ............ <         .pdb >:
 Symmetry: P1...19 or Pn2 (n=1,..,12)
 or P23 or P432 or PICO ................. <           P1 >: **P1**
 Number of equivalent positions ......................... : 1
 Number of residues in asymmetric part .. <           80 >: **129**
 Fibonacci grid order ................... <           10 >:
  Number of dummy waters ................................ : 90
 Excluded volume per residue ............................ : 28.73
 Radius of the search volume ............ <        25.00 >:
 Histogram penalty weight ............... <    1.0000E-3 >:
 Bond length penalty weight ............. <    1.0000E-2 >:
 Discontiguity penalty weight ........... <    1.0000E-2 >:
 Peripheral penalty weight .............. <        1.000 >:
 Expected particle shape: <P>rolate, <O>blate,
  or <U>nknown .......................... <      Unknown >:
 Contrast of the hydration layer ........ <    3.0000E-2 >:
 Seqence file name, CR for none ......... <         .seq >: **lyz**
 Sequence file name ..................................... : lyz.seq
  Computation of the initial intensity ...
 Histogram penalty value ................................ : 27.84
 Bond length penalty value .............................. : 1.772
 Initial DRM # of graphs ................................ : 65
 Discontiguity   value .................................. : 2.152
 Peripheral penalty value ............................... : 0.2892
Weight: 0=s^2,1=const at s<MaxPor,2=aver
Weight: 3=s  ,4=const at s<MaxI*s,5=aver
 Weight: 6=logarithmic scale ............ <            2 >:
 Account for constant background [ Y / N ] <          Yes >:
 *** Accounting for constant background ***
 Initial scale factor ................... <    9.9872E-5 >:
 Constant background subtracted ......................... : -0.4946
 Initial R^2 factor ..................................... : 0.1717
 Initial R   factor ..................................... : 0.4144
 Initial penalty ........................................ : 0.3563
 Initial fVal ........................................... : 0.5280
 Fixing threshold for Rf ................ <        0.000 >:
 Fixing threshold for PenCha ............ <        0.000 >:
 Fixing threshold for PenLen ............ <        0.000 >:
 Initial annealing temperature .......... <    1.0000E-3 >:
 Annealing schedule factor .............. <       0.9000 >:
 # of independent atoms to modify ....... <            1 >:
 Max # of iterations at each T .......... <        55000 >:
 Max # of successes at each T ........... <         5500 >:
 Min # of successes to continue ......... <           55 >:
 Max # of annealing steps ............... <          100 >:
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E-02 Suc:  5500 Eva:    10277 CPU:  0.969E+00 SqF: 0.5257
  Rf: 0.11500 His: 29.47 Bnd: 1.951 Dis:0.0315 Per :0.2139
_..._
 j:  37 T: 0.225E-04 Suc:    36 Eva:  1431936 CPU:  0.134E+03 SqF: 0.0943
  Rf: 0.03583 His:  6.50 Bnd: 0.087 Dis:0.0000 Per :0.4718

 Final Chi^2 against raw data ........................... : 1.335

  === GASBOR ATSAS 4.0.1 (8aa369f3b) finished on   02-Oct-2023   13:58:36
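When several runs are compared, the final \(\chi^2\) can be pulled out of each log file. The sketch below matches the "Final Chi^2 against raw data" line exactly as it appears in the logs above. The function name is hypothetical.

```python
import re

# The label, a run of dots, a colon, then the numeric value.
_CHI2 = re.compile(
    r"Final Chi\^2 against raw data\s*\.*\s*:\s*"
    r"([-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?)"
)


def final_chi2(log_text):
    """Return the final chi^2 from GASBOR log text, or None if it is absent."""
    match = _CHI2.search(log_text)
    return float(match.group(1)) if match else None


sample_line = " Final Chi^2 against raw data ........................... : 1.335\n"

if __name__ == "__main__":
    print(final_chi2(sample_line))
```

To process a saved log, the file contents can be read with `open(path).read()` and passed to `final_chi2`.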