gasbori/gasborp
Manual
Introduction
GASBOR is a program for ab initio reconstruction of protein structure by a chain-like ensemble of dummy residues.
There are two versions of GASBOR: one fits the scattering intensity in reciprocal space (GASBORI), and the other fits the real-space pair-distance distribution function, \(p(r)\) (GASBORP). Both versions use similar algorithms. The reciprocal-space version is slower but typically provides better fits to experimental data. The real-space version is significantly faster and is recommended when the number of dummy residues becomes large, as runtime scales quadratically with that number.
GASBOR can handle up to approximately 8000 dummy particles in total (dummy residues plus dummy waters). Since the water shell is typically small (residue-to-water ratio around 3:1), this corresponds to a practical upper limit of about 6000 residues, or a molecular mass of roughly 700 kDa. Because the runtime scales quadratically with the number of residues, small proteins (e.g. lysozyme, 129 residues) may complete in seconds to minutes, while large systems can take substantially longer. For proteins larger than ~2000 residues, the internal structure has less impact on the scattering profile, and faster alternatives such as DAMMIF or DAMMIN are recommended for comparable results.
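The sizing figures above can be reproduced with a few lines of arithmetic. The following is only a minimal sketch: the helper names are hypothetical, the 3:1 ratio and the 8000 limit are the approximate values quoted in the text, and real run times depend on more than the residue count.

```python
# Back-of-the-envelope sizing for a GASBOR run, using only figures quoted above.
# All names here are illustrative helpers, not part of GASBOR.
MAX_PARTICLES = 8000       # approximate limit on dummy residues plus dummy waters
RESIDUES_PER_WATER = 3     # typical residue-to-water ratio of about 3:1
LYSOZYME_RESIDUES = 129    # reference system mentioned in the text


def max_residues(max_particles=MAX_PARTICLES, ratio=RESIDUES_PER_WATER):
    """Largest residue count that still leaves room for the water shell."""
    # residues + residues / ratio <= max_particles
    return int(max_particles * ratio / (ratio + 1))


def relative_runtime(n_residues, reference=LYSOZYME_RESIDUES):
    """Runtime relative to the reference, assuming purely quadratic scaling."""
    return (n_residues / reference) ** 2


print(max_residues())                   # 6000
print(round(relative_runtime(2000)))    # 240
```

Under these assumptions, a 2000-residue protein would need roughly 240 times the run time of lysozyme, which illustrates why the lattice-based programs are suggested for large systems.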
Algorithm description
The use of GASBOR is similar to that of DAMMIN or DAMMIF, and most parameters have the same meaning. The most important difference is that the protein structure is represented not by dummy spheres on a lattice (called dummy atoms in DAMMIN / DAMMIF, although they do not correspond to real atoms), but by an ensemble of dummy residues (corresponding to average residue densities) placed anywhere in continuous space, with a preferred number of close-distance neighbours for each residue. The centers of these residues aim to approximate the positions of the \(\text{C}\alpha\) atoms in the protein structure. The number of residues should be equal to that in the protein.
Note, however, that these residues are anonymous, in the sense that their ordinal numbers in the model have nothing to do with the numbering in the primary sequence of the protein! Accordingly, the program does not subtract any Porod constant from the experimental data.
In DAMMIN, it was recommended to discard the high-angle portions of the scattering patterns; in GASBOR, on the contrary, one should use them. The program is able to fit the data up to a resolution of 5 \(\AA\), i.e. momentum transfer \(s = 4 \pi \sin(\theta)/\lambda = 1.2 \AA^{-1}\).
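The link between resolution and momentum transfer can be checked numerically. This sketch assumes the common convention \(d = 2\pi/s\), which the text does not state explicitly but which reproduces the quoted pair of values to within rounding; the function names are hypothetical.

```python
import math


def s_from_resolution(d_angstrom):
    """Momentum transfer (1/angstrom) for a real-space resolution d, assuming d = 2*pi/s."""
    return 2.0 * math.pi / d_angstrom


def resolution_from_s(s_inv_angstrom):
    """Real-space resolution (angstrom) for a momentum transfer s, assuming d = 2*pi/s."""
    return 2.0 * math.pi / s_inv_angstrom


print(f"{s_from_resolution(5.0):.2f}")   # 1.26
print(f"{resolution_from_s(1.2):.2f}")   # 5.24
```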
Running gasbor
Command-Line Arguments and Options
Usage:
$ gasbori [OPTIONS] [GNOMFILE] [nDR]
$ gasborp [OPTIONS] [GNOMFILE] [nDR]
Both GASBOR variants accept the following command-line arguments:
Argument | Description |
---|---|
GNOMFILE | A relative or absolute path to regularised SAS data (.out). |
nDR | Number of dummy residues in the asymmetric part. |
GASBOR recognizes the following command-line options.
Option | Description |
---|---|
--seed <INT> | Set the seed for the random number generator |
--mo <U|E> | Configuration mode, either User or Expert. |
--lo <LOG_FILE> | Prefix to prepend to output filenames. Default is the base name of the GASBOR input file without extension. |
--model-format <FMT> | Format of 3D models, one of: cif, pdb (default: cif) |
--id <DESCRIPTION> | Project description. By default, the command line content is used. |
--un <UNIT> | Angular unit of the input file, either ‘1’ (\(\AA^{-1}\)) or ‘2’ (\(\text{nm}^{-1}\)); if not given, the application will attempt to guess the units from the data. |
--sy <SYMMETRY> | Specify the point symmetry of the particle. Point groups P1, …, P19, Pn2 (n = 2, …, 12), P23, P432 or PICO (icosahedral) are supported. By default, no symmetry is enforced (P1). |
--an <ANISOMETRY> | Particle anisometry: oblate (O), prolate (P) or unknown (default). |
--dr <DIRECTION> | Direction of anisometry, applicable with P2 symmetry only: along (L), across (C) or unknown (default). |
--help | Print usage information and exit. |
--version | Print version information and exit. |
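For scripted runs it can help to assemble the argument list programmatically. The sketch below uses only options listed in the table above; the helper name `build_gasbor_command` is hypothetical, and passing each option value as a separate token is an assumption based on how the table is written. The command is only built and printed, not executed.

```python
import shlex


def build_gasbor_command(program, gnomfile, n_residues, *, symmetry=None,
                         seed=None, prefix=None, unit=None, model_format=None):
    """Build an argument list for gasbori or gasborp (illustrative helper)."""
    if program not in ("gasbori", "gasborp"):
        raise ValueError("program must be 'gasbori' or 'gasborp'")
    cmd = [program]
    # Options first, following the usage line: [OPTIONS] [GNOMFILE] [nDR]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if prefix is not None:
        cmd += ["--lo", prefix]
    if unit is not None:
        cmd += ["--un", str(unit)]
    if symmetry is not None:
        cmd += ["--sy", symmetry]
    if model_format is not None:
        cmd += ["--model-format", model_format]
    cmd += [gnomfile, str(n_residues)]
    return cmd


cmd = build_gasbor_command("gasbori", "1trk.out", 680, symmetry="P2", unit=2, seed=42)
print(shlex.join(cmd))
# gasbori --seed 42 --un 2 --sy P2 1trk.out 680
```

The resulting list can be passed to `subprocess.run(cmd)` once the ATSAS binaries are on the `PATH`.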
Interactive Configuration
If the optional argument GNOMFILE is omitted, settings available through command-line arguments and options may also be configured interactively as shown in the table below. Otherwise these questions are skipped.
Screen text | Default value | Description |
---|---|---|
Computation mode (User or Expert) | User | Expert mode allows all aspects of the procedure to be configured, while User mode applies default values for most features. |
Screen text | Mode | Default value | Description |
---|---|---|---|
Log file name | U|E | N/A | Prefix to the output file names. |
Input data, GNOM output file name | U|E | N/A | Input file with scattering data and p(r) curve. |
Enter project description | U|E | N/A | Short description of the run. |
Angular units in the input file: 4*pi*sin(theta)/lambda [1/angstrom] (1), 4*pi*sin(theta)/lambda [1/nm] (2) | U|E | 1 | Angular units of the input file, one of \(\AA^{-1}\) or \(\text{nm}^{-1}\). Default is \(\AA^{-1}\). |
Portion of the curve to be fitted | U|E | 1.0 | How much of the data is used for fitting, starting from the beginning. The whole curve is used by default. |
Number of knots in the curve to fit | E | 11-201 | Number of points used in the regularized fit. Default depends on input data. |
Initial DRM (CR for random) | U|E | Random start | Starting configuration. Leave empty (press Enter) to use a random start. |
Symmetry: P1…19 or Pn2 (n=1,..,12) or P23 or P432 or PICO | U|E | P1 | Expected particle symmetry. |
Number of residues in asymmetric part | U|E | N/A | Number of dummy residues in one asymmetric unit. |
Fibonacci grid order | U|E | 9 | Controls how many dummy waters are distributed. Valid values are 0–18. |
Radius of the search volume | E | \(D_{\text{max}}/2\) | Radius of the volume in which dummy atoms will be placed. Limits the sampling space. |
Histogram penalty weight | E | 1.0000E-3 | Penalizes unrealistic distributions of inter-residue distances. |
Bond length penalty weight | E | 1.0000E-2 | Penalizes bond lengths different from \(3.8 \AA\). |
Discontiguity penalty weight | E | 1.0000E-2 | Penalizes disconnected dummy residues. |
Peripheral penalty weight | E | 1.0000E0 | Encourages compact structures early on. Reduced during annealing. |
Expected particle shape: <P>rolate, <O>blate, or <U>nknown | U|E | Unknown | Specifies whether the particle is strongly non-spherical. |
Contrast of the hydration layer | E | 3.0000E-2 | Contrast of the hydration layer relative to the solvent. |
Seqence file name, CR for none | E | N/A | File with amino acid sequence for more accurate modeling (Enter for none). Lines in this file must not exceed 256 characters. |
Weight: 0=s^2,1=const at s<MaxPor,2=aver Weight: 3=s ,4=const at s<MaxI*s,5=aver Weight: 6=logarithmic scale | E | 2 | Weighting of the \(I(s)\) fit: - 0: weight proportional to \(s^2\) - 1: weight proportional to \(s^2\), but constant for \(s\) below the position of the maximum of \(I \cdot s^2\) - 2: the average of options 0 and 1 - 3: weight proportional to \(s\) - 4: weight proportional to \(s\), but constant for \(s\) below the position of the maximum of \(I \cdot s\) - 5: the average of options 3 and 4 - 6: fit calculated on a logarithmic scale |
Account for constant background | E | Y | Subtracts a constant from the data during fitting. |
Initial scale factor | E | N/A | Starting value for scaling the model to the data. |
Fixing threshold for Rf | E | 0.0 | Deprecated. Leave at 0.0. |
Fixing threshold for PenCha | E | 0.0 | Deprecated. Leave at 0.0. |
Fixing threshold for PenLen | E | 0.0 | Deprecated. Leave at 0.0. |
Initial annealing temperature | E | 1.0000E-3 | Controls early-stage flexibility. Higher = more exploration. |
Annealing schedule factor | E | 0.9000 | Controls how fast the model becomes less flexible. Closer to 1 = slower cooling. |
# of independent atoms to modify | E | 1 | Number of dummy residues moved per annealing step. |
Max # of iterations at each T | E | 40000 | Maximum steps allowed at a given temperature. |
Max # of successes at each T | E | 4000 | If this many successful moves are made, temperature is decreased early. |
Min # of successes to continue | E | 40 | Stops the program if fewer than this number of moves succeed. |
Max # of annealing steps | E | 100 | Program ends after this many annealing cycles. |
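The interplay of the initial annealing temperature and the annealing schedule factor can be sketched as a geometric cooling schedule. This is an assumption rather than a documented formula: it supposes that the temperature is multiplied by the schedule factor after every annealing step, which is consistent with the T values printed in the example logs of this manual. The function name is hypothetical.

```python
def temperature_at_step(step, initial_temperature=1.0e-3, schedule_factor=0.9):
    """Temperature at annealing step `step` (1-based), assuming geometric cooling."""
    if step < 1:
        raise ValueError("step must be 1 or larger")
    return initial_temperature * schedule_factor ** (step - 1)


# Print the assumed temperature for a few annealing steps with default settings.
for step in (1, 2, 36, 56):
    print(step, f"{temperature_at_step(step):.3e}")
```

With the default values, step 36 evaluates to about 2.50E-05 and step 56 to about 3.04E-06, which agrees with the last report lines of the lysozyme and transketolase examples. A schedule factor closer to 1 lowers the temperature more slowly and therefore needs more annealing steps to reach the same temperature.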
Runtime Output
After printing the program version number and querying or printing all parameters, GASBOR displays the message "Simulated annealing procedure started". After each round of simulated annealing at a new temperature, it prints a report line:
j: 1 T: 0.100E-02 Suc: 4000 Eva: 12125 CPU: 0.211E+00 SqF: 0.3309
Rf: 0.04743 His: 15.96 Bnd: 0.493 Dis:0.2389 Per :0.0840
Report header | Description |
---|---|
j: | Iteration number. |
T: | Temperature of iteration. |
Suc: | Number of successes at given iteration. |
Eva: | Total number of function evaluations until end of this iteration. |
CPU: | Total CPU time in seconds since beginning of run until end of this iteration. |
SqF: | Square root of the target function at the end of iteration. |
Rf: | R-factor penalty at the end of iteration. |
His: | Histogram penalty at the end of iteration. |
Bnd: | Bond length penalty at the end of iteration. |
Dis: | Discontiguity penalty at the end of iteration. |
Per: | Peripheral penalty at the end of iteration. |
After the run is completed, the final \(\chi^2\) against the raw data is printed to the output.
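The report line shown above can be parsed into named fields for monitoring. The sketch below also recombines the fields into SqF under an assumption that is not stated in this manual: that the target function is the squared R-factor plus the weighted sum of the penalties, using the default penalty weights from the interactive configuration table. This reproduces the first report line to within rounding. Later lines are not expected to match, because the peripheral penalty weight is reduced during annealing. The function names are hypothetical.

```python
import math
import re

# First report line from the example above, joined into one string.
REPORT = (
    "j: 1 T: 0.100E-02 Suc: 4000 Eva: 12125 CPU: 0.211E+00 SqF: 0.3309 "
    "Rf: 0.04743 His: 15.96 Bnd: 0.493 Dis:0.2389 Per :0.0840"
)

# Default penalty weights from the interactive configuration table.
WEIGHTS = {"His": 1.0e-3, "Bnd": 1.0e-2, "Dis": 1.0e-2, "Per": 1.0}


def parse_report(text):
    """Return the 'key: value' pairs of a report line as a dict of floats."""
    pairs = re.findall(r"(\w+)\s*:\s*([-+0-9.Ee]+)", text)
    return {key: float(value) for key, value in pairs}


def sqrt_target(fields, weights=WEIGHTS):
    """Assumed relation: SqF = sqrt(Rf^2 + sum of weight * penalty)."""
    penalty = sum(weights[name] * fields[name] for name in weights)
    return math.sqrt(fields["Rf"] ** 2 + penalty)


fields = parse_report(REPORT)
print(f"reported SqF = {fields['SqF']:.4f}, recomputed = {sqrt_target(fields):.4f}")
```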
gasbor Input Files
GASBORI and GASBORP require regularised SAS data (.out) as generated by GNOM.
In addition, an optional residue sequence file (.seq) can be provided in Expert mode.
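The interactive configuration table notes that lines in the sequence file must not exceed 256 characters. A quick pre-flight check could look like the following sketch. It makes no assumption about the content of the file and only inspects line lengths; the function name is hypothetical.

```python
MAX_LINE_LENGTH = 256  # limit quoted for lines in the sequence file


def overlong_lines(text, limit=MAX_LINE_LENGTH):
    """Return the 1-based numbers of all lines longer than `limit` characters."""
    return [number
            for number, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]


# Two synthetic lines: the first is exactly at the limit, the second exceeds it.
sample = "A" * 256 + "\n" + "A" * 257 + "\n"
print(overlong_lines(sample))   # [2]
```

For a real file, the text could be read with `open("lyz.seq").read()` before calling the function.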
gasbor Output Files
GASBORI and GASBORP write a set of output files whose names all start with a customizable prefix (the --lo option or the log file name prompt). If a prefix has been used before, existing files will be overwritten without warning.
Due to the different implementations, the output files of GASBORI and GASBORP differ.
Output files of gasbori
Extension | Description |
---|---|
.log | A copy of the screen output |
.pdb or .cif | The model is provided in either PDB or mmCIF format, depending on the model-format option. |
.fit | Fit of the simulated scattering curve versus a smoothed version of the experimental data. See the interactive configuration for how to change the number of supporting points in the spline interpolation. |
.fir | Fit of the simulated scattering curve versus the experimental data. |
Output files of gasborp
Extension | Description |
---|---|
.log | A copy of the screen output |
.pdb or .cif | The model is provided in either PDB or mmCIF format, depending on the model-format option. |
.hst | Fit of the simulated p(r) versus the provided p(r). |
.fir | Fit of the simulated scattering curve versus the experimental data. |
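The two tables above can be turned into a small checklist of files to expect after a run. This sketch assumes that each file name is simply the prefix followed by the extension, which the tables suggest but do not state outright; the helper name is hypothetical.

```python
def expected_outputs(prefix, program, model_format="cif"):
    """List the output file names expected for a given prefix and GASBOR variant."""
    specific = {"gasbori": [".fit"], "gasborp": [".hst"]}
    if program not in specific:
        raise ValueError("program must be 'gasbori' or 'gasborp'")
    if model_format not in ("cif", "pdb"):
        raise ValueError("model_format must be 'cif' or 'pdb'")
    # Files written by both variants, followed by the variant-specific fit file.
    common = [".log", "." + model_format, ".fir"]
    return [prefix + extension for extension in common + specific[program]]


print(expected_outputs("lyzseq", "gasbori"))
# ['lyzseq.log', 'lyzseq.cif', 'lyzseq.fir', 'lyzseq.fit']
```

Because existing files are overwritten silently, such a list can also be used to check for name clashes before starting a new run.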
Examples
Lysozyme
Lysozyme has no symmetry and 129 residues: enter P1 symmetry, 129 residues and default answers to all other questions.
You may also use the command line (type gasbori --help for batch mode use):
$ gasbori gnlyzfu.out 129
Here is the resulting output:
*** Ab inito reconstruction of a protein structure ***
*** by a chain-like ensemble of dummy residues ***
*** Please reference: D.I.Svergun, M.V.Petoukhov & ***
*** M.H.J.Koch (2001) Biophys. J. 80, 2946-2953 ***
Type gasbori --help for batch mode use
=== GASBOR ATSAS 4.0.1 (6378ba7) started on 29-Sep-2023 12:48:23
Project identifier ..................................... : gnlyzfu
Project description:
Initialized random seed as ..................... : 661145759406964600
Data set title ......................................... : Angular axis n01000.sax Datafile n10000.sub
Maximum diameter of the particle ....................... : 50.00
Radius of gyration ..................................... : 14.33
Number of GNOM data points ............................. : 230
Maximum s value [1/angstrom] ........................... : 1.316
Number of Shannon channels ............................. : 20.94
Reduced s maximum ...................................... : 1.307
Reduced number of Shannon channels ..................... : 20.80
Number of knots in the curve to fit .................... : 42
Symmetry: P1...19 or Pn2 (n=1,..,12)
Number of equivalent positions ......................... : 1
Number of dummy waters ................................ : 90
Excluded volume per residue ............................ : 28.73
Radius of the search volume ............................ : 25.00
Histogram penalty weight ............................... : 1.000E-03
Bond length penalty weight ............................. : 1.000E-02
Discontiguity penalty weight ........................... : 1.000E-02
Peripheral penalty weight .............................. : 1.000
Contrast of the hydration layer ........................ : 3.000E-02
Computation of the initial intensity ...
Histogram penalty value ................................ : 40.11
Bond length penalty value .............................. : 2.402
Initial DRM # of graphs ................................ : 60
Discontiguity value .................................. : 1.099
Peripheral penalty value ............................... : 0.2496
Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
*** Accounting for constant background ***
Initial scale factor ................................... : 1.092E-04
Constant background subtracted ......................... : -0.4002
Initial R^2 factor ..................................... : 0.1164
Initial R factor ..................................... : 0.3412
Initial penalty ........................................ : 0.3247
Initial fVal ........................................... : 0.4411
R-factor fixing threshold .............................. : 0.0
Fixing threshold PenCha ................................ : 0.0
Fixing threshold PenLen ................................ : 0.0
Initial annealing temperature .......................... : 1.000E-03
Annealing schedule factor .............................. : 0.9000
# of independent atoms to modify ....................... : 1
Max # of iterations at each T .......................... : 55000
Max # of successes at each T ........................... : 5500
Min # of successes to continue ......................... : 55
Max # of annealing steps ............................... : 100
==== Simulated annealing procedure started ====
j: 1 T: 0.100E-02 Suc: 5500 Eva: 11278 CPU: 0.106E+01 SqF: 0.5132
Rf: 0.09491 His: 25.78 Bnd: 1.398 Dis:0.1870 Per :0.2127
_..._
j: 36 T: 0.250E-04 Suc: 53 Eva: 1427584 CPU: 0.135E+03 SqF: 0.0912
Rf: 0.03409 His: 6.26 Bnd: 0.065 Dis:0.0000 Per :0.3664
Final Chi^2 against raw data ........................... : 1.248
=== GASBOR ATSAS 4.0.1 (6378ba7) finished on 29-Sep-2023 12:44:39
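The Shannon channel counts printed in the log above can be reproduced from the maximum diameter and the maximum s value. The sketch assumes the usual definition \(N = D_{\text{max}} \cdot s_{\text{max}} / \pi\), which is not spelled out in this manual but matches the printed numbers; the function name is hypothetical.

```python
import math


def shannon_channels(d_max, s_max):
    """Number of Shannon channels, assuming N = D_max * s_max / pi."""
    return d_max * s_max / math.pi


# Values taken from the lysozyme log: D_max = 50.00, s_max = 1.316 and 1.307.
print(f"{shannon_channels(50.00, 1.316):.2f}")   # 20.94
print(f"{shannon_channels(50.00, 1.307):.2f}")   # 20.80
```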
Transketolase
Transketolase is a homodimer in solution, and each monomer has 680 residues, giving a total of 1360 residues: enter P2 for symmetry, 680 for residues and default answers to all other questions.
*** Ab inito reconstruction of a protein structure ***
*** by a chain-like ensemble of dummy residues ***
*** Please reference: D.I.Svergun, M.V.Petoukhov & ***
*** M.H.J.Koch (2001) Biophys. J. 80, 2946-2953 ***
Type gasbori --help for batch mode use
=== GASBOR ATSAS 4.0.1 (6378ba7) started on 29-Sep-2023 12:56:50
Computation mode (User or Expert) ...... < User >:
Log file name .......................... < .log >: **log**
Input data, GNOM output file name ...... < .out >: **1trk.out**
Project identificator .................................. : log
Enter project description .............. : **project**
Random sequence initialized from ....................... : 164228
** Information read from the GNOM file **
Data set title: Transketolase collated from n85, o14+o16 6-11-98
Raw data file name: trkexp.dat
Maximum diameter of the particle ....................... : 12.00
Solution at Alpha = .164E+01 Rg : .336E+01 I(0) : .190E+03
Radius of gyration ..................................... : 3.360
Number of GNOM data points ............................. : 283
Angular units in the input file :
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm ] (2) < 2 >: **2**
Angular units multiplied by ............................ : 0.1000
Maximum diameter divided by ............................ : 0.1000
Maximum s value [1/angstrom] ........................... : 0.3418
Number of Shannon channels ............................. : 13.06
Portion of the curve to be fitted ...... < 1.000 >:
Number of knots in the curve to fit .................... : 26
Initial DRM (CR for random) ............ < .pdb >:
Symmetry: P1...19 or Pn2 (n=1,..,12)
or P23 or P432 or PICO ................. < P1 >: **P2**
Number of equivalent positions ......................... : 2
Number of residues in asymmetric part .. < 517 >: **680**
Fibonacci grid order ................... < 15 >:
Number of dummy waters ................................ : 988
Excluded volume per residue ............................ : 28.73
Radius of the search volume ............................ : 60.00
Histogram penalty weight ............................... : 1.000e-3
Bond length penalty weight ............................. : 1.000e-2
Discontiguity penalty weight ........................... : 1.000e-2
Peripheral penalty weight .............................. : 1.000
Expected particle shape: <P>rolate, <O>blate,
or <U>nknown .......................... < Unknown >:
Contrast of the hydration layer ........................ : 3.000e-2
Computation of the initial intensity ...
Histogram penalty value ................................ : 37.38
Bond length penalty value .............................. : 1.604
Initial DRM # of graphs ................................ : 708
Discontiguity value .................................. : 2.191
Peripheral penalty value ............................... : 0.2647
Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
*** Accounting for constant background ***
Initial scale factor ................................... : 5.042e-7
Constant background subtracted ......................... : 0.3339
Initial R^2 factor ..................................... : 3.837e-2
Initial R factor ..................................... : 0.1959
Initial penalty ........................................ : 0.3400
Initial fVal ........................................... : 0.3784
R-factor fixing threshold .............................. : 0.0
Fixing threshold PenCha ................................ : 0.0
Fixing threshold PenLen ................................ : 0.0
Initial annealing temperature .......................... : 1.000e-3
Annealing schedule factor .............................. : 0.9000
# of independent atoms to modify ....................... : 1
Max # of iterations at each T .......................... : 130000
Max # of successes at each T ........................... : 13000
Min # of successes to continue ......................... : 130
Max # of annealing steps ............................... : 100
==== Simulated annealing procedure started ====
j: 1 T: 0.100E-02 Suc: 13000 Eva: 14975 CPU: 0.272E+02 SqF: 0.5531
Rf: 0.11900 His: 36.59 Bnd: 2.213 Dis:0.3588 Per :0.2294
_..._
j: 56 T: 0.304E-05 Suc: 85 Eva: 3737295 CPU: 0.680E+04 SqF: 0.0797
Rf: 0.02350 His: 5.37 Bnd: 0.044 Dis:0.0000 Per :0.3197
Final Chi^2 against raw data ........................... : 1.774
=== GASBOR ATSAS 4.0.1 (6378ba7) finished on 29-Sep-2023 14:57:15
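In this run the input file is in \(\text{nm}^{-1}\), so the log reports that the angular units were multiplied by 0.1 and the maximum diameter divided by 0.1. The sketch below repeats that conversion. The function name is hypothetical, and the input value of 3.418 \(\text{nm}^{-1}\) is back-computed from the 0.3418 \(\AA^{-1}\) printed in the log rather than read from the data file.

```python
FACTOR = 0.1  # conversion factor reported in the log


def nm_to_angstrom_units(d_max_nm, s_max_inv_nm, factor=FACTOR):
    """Convert D_max from nm to angstrom and s_max from 1/nm to 1/angstrom."""
    d_max_angstrom = d_max_nm / factor
    s_max_inv_angstrom = s_max_inv_nm * factor
    return d_max_angstrom, s_max_inv_angstrom


d_max, s_max = nm_to_angstrom_units(12.00, 3.418)
print(f"{d_max:.2f} {s_max:.4f}")   # 120.00 0.3418
print(f"{d_max / 2:.2f}")           # 60.00
```

Half of the converted maximum diameter equals the radius of the search volume shown in the log, in line with the \(D_{\text{max}}/2\) default listed in the interactive configuration table.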
Lysozyme in Expert mode with sequence file
This lysozyme example uses sequence-specific form factors for the dummy residues. Enter E for Expert mode, P1 symmetry, 129 residues, lyz.seq for the sequence and default answers to all other questions.
*** Ab inito reconstruction of a protein structure ***
*** by a chain-like ensemble of dummy residues ***
*** Please reference: D.I.Svergun, M.V.Petoukhov & ***
*** M.H.J.Koch (2001) Biophys. J. 80, 2946-2953 ***
Type gasbori --help for batch mode use
=== GASBOR ATSAS 4.0.1 (8aa369f3b) started on 02-Oct-2023 13:55:24
Computation mode (User or Expert) ...... < User >: **E**
Log file name .......................... < .log >: **lyzseq**
Input data, GNOM output file name ...... < .out >: **gnlyzfu**
Project identifier ..................................... : lyzseq
Enter project description .............. : **use sequence**
Initial random seed? (default: use current time) ..................... :
Warning: initialising the random seed when it has already been initialised
Previous seed: 39884949516326408
New seed: 39888779342845448
Initialized random seed as ..................... : 39888779342845448
Data set title ......................................... : Angular axis n01000.sax Datafile n10000.sub
Maximum diameter of the particle ....................... : 50.00
Radius of gyration ..................................... : 14.33
Number of GNOM data points ............................. : 230
Angular units in the input file :
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm ] (2) < 1 >:
Maximum s value [1/angstrom] ........................... : 1.316
Number of Shannon channels ............................. : 20.94
Portion of the curve to be fitted ...... < 1.000 >:
Reduced s maximum ...................................... : 1.307
Reduced number of Shannon channels ..................... : 20.80
Number of knots in the curve to fit .... < 42 >:
Initial DRM (CR for random) ............ < .pdb >:
Symmetry: P1...19 or Pn2 (n=1,..,12)
or P23 or P432 or PICO ................. < P1 >: **P1**
Number of equivalent positions ......................... : 1
Number of residues in asymmetric part .. < 80 >: **129**
Fibonacci grid order ................... < 10 >:
Number of dummy waters ................................ : 90
Excluded volume per residue ............................ : 28.73
Radius of the search volume ............ < 25.00 >:
Histogram penalty weight ............... < 1.0000E-3 >:
Bond length penalty weight ............. < 1.0000E-2 >:
Discontiguity penalty weight ........... < 1.0000E-2 >:
Peripheral penalty weight .............. < 1.000 >:
Expected particle shape: <P>rolate, <O>blate,
or <U>nknown .......................... < Unknown >:
Contrast of the hydration layer ........ < 3.0000E-2 >:
Seqence file name, CR for none ......... < .seq >: **lyz**
Sequence file name ..................................... : lyz.seq
Computation of the initial intensity ...
Histogram penalty value ................................ : 27.84
Bond length penalty value .............................. : 1.772
Initial DRM # of graphs ................................ : 65
Discontiguity value .................................. : 2.152
Peripheral penalty value ............................... : 0.2892
Weight: 0=s^2,1=const at s<MaxPor,2=aver
Weight: 3=s ,4=const at s<MaxI*s,5=aver
Weight: 6=logarithmic scale ............ < 2 >:
Account for constant background [ Y / N ] < Yes >:
*** Accounting for constant background ***
Initial scale factor ................... < 9.9872E-5 >:
Constant background subtracted ......................... : -0.4946
Initial R^2 factor ..................................... : 0.1717
Initial R factor ..................................... : 0.4144
Initial penalty ........................................ : 0.3563
Initial fVal ........................................... : 0.5280
Fixing threshold for Rf ................ < 0.000 >:
Fixing threshold for PenCha ............ < 0.000 >:
Fixing threshold for PenLen ............ < 0.000 >:
Initial annealing temperature .......... < 1.0000E-3 >:
Annealing schedule factor .............. < 0.9000 >:
# of independent atoms to modify ....... < 1 >:
Max # of iterations at each T .......... < 55000 >:
Max # of successes at each T ........... < 5500 >:
Min # of successes to continue ......... < 55 >:
Max # of annealing steps ............... < 100 >:
==== Simulated annealing procedure started ====
j: 1 T: 0.100E-02 Suc: 5500 Eva: 10277 CPU: 0.969E+00 SqF: 0.5257
Rf: 0.11500 His: 29.47 Bnd: 1.951 Dis:0.0315 Per :0.2139
_..._
j: 37 T: 0.225E-04 Suc: 36 Eva: 1431936 CPU: 0.134E+03 SqF: 0.0943
Rf: 0.03583 His: 6.50 Bnd: 0.087 Dis:0.0000 Per :0.4718
Final Chi^2 against raw data ........................... : 1.335
=== GASBOR ATSAS 4.0.1 (8aa369f3b) finished on 02-Oct-2023 13:58:36