gasbormx
Manual
Introduction
GASBOR is a program for ab initio reconstruction of protein structure by a chain-like ensemble of dummy residues.
GASBORMX extends GASBORI to account for oligomeric equilibria. In this mode, an ab initio model of a symmetric oligomer is built under the assumption that some fraction of monomers remains free in solution (i.e., a polydisperse sample).
GASBORMX can handle up to approximately 8000 dummy atoms in total (dummy residues plus dummy waters). Since the water shell is typically small (residue-to-water ratio around 3:1), this corresponds to a practical upper limit of about 6000 residues, or a molecular mass of roughly 700 kDa. The runtime scales quadratically with the number of residues, so while small proteins (e.g. lysozyme, 129 residues) may complete in seconds to minutes, large systems can take substantially longer. For proteins larger than ~2000 residues, the internal structure has less impact on the scattering profile, and faster alternatives such as DAMMIF or DAMMIN are recommended for comparable results.
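The size limits quoted above can be turned into a quick feasibility check. The sketch below is illustrative only: the function names are hypothetical, the average residue mass of 110 Da is a common rule of thumb rather than a GASBORMX parameter, and the quadratic scaling yields a relative runtime estimate, not an absolute one.

```python
# Rough feasibility check based on the limits quoted in this manual.
MAX_RESIDUES = 6000           # practical upper limit stated above
SHAPE_ONLY_THRESHOLD = 2000   # above this, DAMMIF or DAMMIN are suggested
AVG_RESIDUE_MASS_DA = 110.0   # assumption: typical average residue mass


def estimated_mass_kda(n_residues):
    """Approximate molecular mass in kDa for a given residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def relative_runtime(n_residues, reference=129):
    """Runtime relative to a reference protein (lysozyme, 129 residues),
    assuming the quadratic scaling described in the text."""
    return (n_residues / reference) ** 2


def advice(n_residues):
    """Return a short recommendation for the total number of residues."""
    if n_residues > MAX_RESIDUES:
        return "too large"
    if n_residues > SHAPE_ONLY_THRESHOLD:
        return "consider DAMMIF or DAMMIN"
    return "suitable"


# A P2 dimer with 451 residues per monomer has 902 residues in total.
total = 2 * 451
print(total, estimated_mass_kda(total), advice(total))
```

With these assumptions, 6000 residues correspond to about 660 kDa, in line with the "roughly 700 kDa" quoted above.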
Algorithm description
The algorithm is the same as described for GASBORI and GASBORP.
Running gasbor
Command-Line Arguments and Options
Usage:
$ gasbormx [OPTIONS] [GNOMFILE] [nDR]
GASBORMX accepts the following command-line arguments:
Argument | Description |
---|---|
GNOMFILE | A relative or absolute path to regularised SAS data (.out). |
nDR | Number of dummy residues in asymmetric part. |
GASBORMX recognizes the following command-line options:
Option | Description |
---|---|
--seed <INT> | Set the seed for the random number generator |
--mo <U|E> | Configuration mode, either User or Expert. |
--lo <LOG_FILE> | Prefix to prepend to output filenames. Default is the base name of the GASBOR input file without extension. |
--model-format <FMT> | Format of 3D models, one of: cif, pdb (default: cif) |
--id <DESCRIPTION> | Project description. By default, the command line content is used. |
--un <UNIT> | Angular unit of the input file, either ‘1’ (\(\AA^{-1}\)) or ‘2’ (\(\text{nm}^{-1}\)); if not given, the application will attempt to guess the units from the data. |
--sy <SYMMETRY> | Specify the point symmetry of the particle. Point groups P2, …, P9, Pn2 (n = 2, …, 9) or P23 or P432 or PICO are supported. By default, no symmetry is enforced (P1). |
--an <ANISOMETRY> | Particle anisometry: oblate (O), prolate (P) or unknown (default). |
--dr <DIRECTION> | Direction of anisometry, applicable with P2 symmetry only: along (L), across (C) or unknown (default). |
--help | Print usage information and exit. |
--version | Print version information and exit. |
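For scripted runs it can help to assemble the command line programmatically. The following is a minimal sketch that uses only options listed in the table above; the helper `build_command` is hypothetical, and passing each option value as a separate token (rather than, say, with `=`) is an assumption based on how the table is written. The command is only printed, not executed.

```python
import re
import shlex

# Symmetry strings named in the options table: P1 (default), P2..P9,
# Pn2 with n = 2..9, P23, P432 and PICO.
_SYMMETRY = re.compile(r"^(P[1-9]|P[2-9]2|P23|P432|PICO)$")


def build_command(gnomfile, n_residues, symmetry="P1", prefix=None,
                  seed=None, model_format="cif"):
    """Return the gasbormx invocation as a list of arguments."""
    symmetry = symmetry.upper()
    if not _SYMMETRY.match(symmetry):
        raise ValueError(f"unsupported symmetry: {symmetry}")
    if model_format not in ("cif", "pdb"):
        raise ValueError(f"unsupported model format: {model_format}")
    cmd = ["gasbormx", "--sy", symmetry, "--model-format", model_format]
    if prefix is not None:
        cmd += ["--lo", prefix]
    if seed is not None:
        cmd += ["--seed", str(int(seed))]
    # Positional arguments: GNOM file, then residues in the asymmetric part.
    cmd += [gnomfile, str(int(n_residues))]
    return cmd


print(shlex.join(build_command("hcp_a46c.out", 451, symmetry="P2",
                               prefix="tetomx1")))
```

The resulting list can be passed to `subprocess.run` once the option syntax has been checked against `gasbormx --help`.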
Interactive Configuration
If the optional argument GNOMFILE is omitted, settings available through command-line arguments and options may also be configured interactively as shown in the table below. Otherwise these questions are skipped.
Screen text | Default value | Description |
---|---|---|
Computation mode (User or Expert) | User | Expert mode allows all aspects of the procedure to be configured, while User mode applies default values for most features. |
Screen text | Mode | Default value | Description |
---|---|---|---|
Log file name | U|E | N/A | Prefix to the output file names. |
Enter project description | U|E | N/A | Short description of the run. |
Total number of curves to fit | U|E | 1 | |
Number of knots on the master grid | E | 101 | |
The following questions are repeated for each input curve:
Screen text | Mode | Default value | Description |
---|---|---|---|
Input data, GNOM output file name | U|E | N/A | Input file with scattering data and p(r) curve. |
Angular units in the input file 4\*pi\*sin(theta)/lambda [1/angstrom] (1) 4\*pi\*sin(theta)/lambda [1/nm] (2) | U|E | 1 | Angular units of the input file, one of \(\AA^{-1}\) or \(\text{nm}^{-1}\). Default is \(\AA^{-1}\). |
Portion of the curve to be fitted | U|E | 1.0 | How much of the data is used for fitting, starting from the beginning. The whole curve is used by default. |
Volume fraction of monomer (if known) | U|E | -1.0 | If known, enter the volume fraction of the intact monomer (a number between 0 and 1). This value will be kept fixed during modeling. If unknown, leave as default (-1) and the program will determine it automatically during optimization. |
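The role of the monomer volume fraction can be illustrated with the usual description of a two-component mixture, in which the total scattering is the volume-fraction-weighted sum of the component intensities. This is a sketch under that assumption, with intensities normalised to the same concentration; it is not GASBORMX's internal code, and the function name and toy values are hypothetical.

```python
def mixture_intensity(i_monomer, i_oligomer, v_monomer):
    """Scattering of a monomer-oligomer mixture on a common s grid.

    i_monomer, i_oligomer: intensities of the pure components.
    v_monomer: volume fraction of monomer, between 0 and 1.
    """
    if not 0.0 <= v_monomer <= 1.0:
        raise ValueError("volume fraction must lie between 0 and 1")
    if len(i_monomer) != len(i_oligomer):
        raise ValueError("curves must share the same s grid")
    return [v_monomer * m + (1.0 - v_monomer) * o
            for m, o in zip(i_monomer, i_oligomer)]


# Toy intensities at two s values (made-up numbers for illustration).
monomer = [1.0, 0.5]
dimer = [4.0, 1.0]
print(mixture_intensity(monomer, dimer, 1.0))  # pure monomer curve
print(mixture_intensity(monomer, dimer, 0.5))  # equal volume fractions
```

Fixing the fraction to 1 reproduces the pure monomer curve, which is the situation for a data set known to contain only monomers.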
Then continues as:
Screen text | Mode | Default value | Description |
---|---|---|---|
Initial DRM (CR for random) | U|E | | Starting configuration. Leave empty (press Enter) to use a random start. |
Symmetry: P2…9 or Pn2 (n=2,..,9) or P23 or P432 or PICO | U|E | P2 | Expected particle symmetry. |
Number of residues in asymmetric part | U|E | N/A | Number of dummy residues in one asymmetric unit. |
Fibonacci grid order | U|E | 12 | Controls how many dummy waters are distributed. Allowed values: 0–18. |
Radius of the search volume | E | \(D_{\text{max}}/2\) | Radius of the volume in which dummy atoms will be placed. Limits the sampling space. |
Histogram penalty weight | E | 1.0000E-3 | Penalizes unrealistic distributions of inter-residue distances. |
Bond length penalty weight | E | 1.0000E-2 | Penalizes bond lengths different from \(3.8 \AA\). |
Discontiguity penalty weight | E | 1.0000E-2 | Penalizes disconnected dummy residues. |
Peripheral penalty weight | E | 1.0000E0 | Encourages compact structures early on. Reduced during annealing. |
Expected particle shape: <P>rolate, <O>blate, or <U>nknown | U|E | Unknown | Defines whether the particle is strongly non-spherical. |
Contrast of the hydration layer | E | 3.0000E-2 | Contrast of the hydration layer relative to the solvent. |
Sequence file name, CR for none | E | N/A | File with amino acid sequence for more accurate modeling (Enter for none). Lines in this file must not exceed 256 characters. |
Weight: 0=s^2,1=const at s<MaxPor,2=aver Weight: 3=s ,4=const at s<MaxI*s,5=aver Weight: 6=logarithmic scale | E | 2 | Weighting of the \(I(s)\) fit: - 0: weight \(I(s)\) proportional to \(s^2\) - 1: weight \(I(s)\) proportional to \(s^2\), but with a constant where \(s < \max(I \cdot s^2)\) - 2: the average of options 0 and 1 - 3: weight \(I(s)\) proportional to \(s\) - 4: weight \(I(s)\) proportional to \(s\), but with a constant where \(s < \max(I \cdot s)\) - 5: the average of options 3 and 4 - 6: calculate fit on logarithmic scale |
Account for constant background | E | Y | Subtracts a constant from the data during fitting. |
Initial annealing temperature | E | 1.0000E-3 | Controls early-stage flexibility. Higher = more exploration. |
Annealing schedule factor | E | 0.9000 | Controls how fast the model becomes less flexible. Closer to 1 = slower cooling. |
# of independent atoms to modify | E | 1 | Number of dummy residues moved per annealing step. |
Max # of iterations at each T | E | 55000 | Maximum steps allowed at a given temperature. |
Max # of successes at each T | E | 5500 | If this many successful moves are made, temperature is decreased early. |
Min # of successes to continue | E | 55 | Stops the program if fewer than this number of moves succeed. |
Max # of annealing steps | E | 100 | Program ends after this many annealing cycles. |
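The interplay of the annealing parameters can be sketched as a geometric cooling schedule with a stop rule. The functions below are hypothetical and only paraphrase the table; the geometric form is an assumption that is consistent with the log in the Examples section, where an initial temperature of 0.100 and a schedule factor of 0.9 give T = 0.450E-05 at step 96.

```python
def temperature(step, t_initial, factor):
    """Temperature at annealing step `step` (1-based), assuming that the
    temperature is multiplied by `factor` after every step."""
    return t_initial * factor ** (step - 1)


def should_stop(step, successes, min_successes, max_steps):
    """Stop when too few moves succeeded or the step limit is reached."""
    return successes < min_successes or step >= max_steps


# Values taken from the example log: T0 = 0.1, factor = 0.9.
for step in (1, 2, 96):
    print(step, f"{temperature(step, 0.1, 0.9):.3E}")

# Step 96 of the example had 98 successes, below the minimum of 105.
print(should_stop(96, successes=98, min_successes=105, max_steps=100))
```

A schedule factor closer to 1 lowers the temperature more slowly, so more steps are needed before the number of successful moves drops below the minimum.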
Runtime Output
After printing the program version number and querying or printing all parameters, GASBORMX displays the message "Simulated annealing procedure started". After each round of simulated annealing at a new temperature, it prints a report line:
j: 1 T: 0.100E+00 Suc: 10500 Eva: 10828 CPU: 0.542E+01 SqF: 0.4870
Rf: 0.38044 His: 30.50 Bnd: 2.265 Dis:2.6937 Per :0.2471
Report header | Description |
---|---|
j: | Iteration number. |
T: | Temperature of iteration. |
Suc: | Number of successes at given iteration. |
Eva: | Total number of function evaluations until end of this iteration. |
CPU: | Total CPU time in seconds since beginning of run until end of this iteration. |
SqF: | Square root of the target function at the end of iteration. |
Rf: | R-factor penalty at the end of iteration. |
His: | Histogram penalty at the end of iteration. |
Bnd: | Bond length penalty at the end of iteration. |
Dis: | Discontiguity penalty at the end of iteration. |
Per: | Peripheral penalty at the end of iteration. |
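To follow convergence from a saved log, the report lines can be parsed into numbers. The sketch below assumes the "label: value" layout of the sample report shown above, including the irregular spacing around some colons; the function name `parse_report` is hypothetical.

```python
import re

# A label (letters), optional spaces, a colon, optional spaces, a number
# in plain or exponent notation such as 0.100E+00.
_FIELD = re.compile(r"([A-Za-z]+)\s*:\s*([-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?)")


def parse_report(text):
    """Return a dict mapping report labels (j, T, Suc, ...) to floats."""
    return {label: float(value) for label, value in _FIELD.findall(text)}


report = (
    "j: 1 T: 0.100E+00 Suc: 10500 Eva: 10828 CPU: 0.542E+01 SqF: 0.4870\n"
    "Rf: 0.38044 His: 30.50 Bnd: 2.265 Dis:2.6937 Per :0.2471"
)
values = parse_report(report)
print(values["T"], values["Suc"], values["Rf"], values["Per"])
```

Collecting `SqF` and `Rf` over all steps gives a simple convergence trace for the run.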
After the run is completed, the final \(\chi^2\) values against all data files are printed to the output.
gasbor Input Files
GASBORMX requires one or more regularised SAS data (.out) files as generated by GNOM.
In addition, an optional residue sequence data (.seq) file can be provided in Expert mode.
gasbor Output Files
GASBORMX writes a set of output files whose names all start with a customizable prefix (see the --lo option). If a prefix has been used before, existing files are overwritten without warning.
Extension | Description |
---|---|
.log | A copy of the screen output |
.pdb or .cif | The model is provided in either PDB or mmCIF format, depending on the model-format option. |
-i.fit | Fit of the simulated scattering curve versus the ith smoothed version of the experimental data. See the interactive configuration for how to change the number of supporting points (knots) in the spline interpolation. |
-i.fir | Fit of the simulated scattering curve versus the ith experimental data set. |
-monomer.dat | Calculated scattering of the monomer. |
-dimer.dat | Calculated scattering of the dimer. |
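Because existing files are overwritten silently, it may be worth checking for earlier output before reusing a prefix. The helper below is a hypothetical sketch; it assumes that every output name is the prefix followed directly by "." or "-", as suggested by the extensions in the table above.

```python
from pathlib import Path


def existing_outputs(prefix, directory="."):
    """Return sorted names of files in `directory` that look like output
    of an earlier run with the same prefix."""
    found = []
    for path in Path(directory).iterdir():
        name = path.name
        if path.is_file() and (name.startswith(prefix + ".")
                               or name.startswith(prefix + "-")):
            found.append(name)
    return sorted(found)


clashes = existing_outputs("tetomx1")
if clashes:
    print("These files would be overwritten:", ", ".join(clashes))
else:
    print("No earlier output found for this prefix.")
```

Choosing a fresh prefix per run, for example by appending the random seed, avoids the problem altogether.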
Examples
Concentration series of Tetanus toxin
Three curves are available for tetanus toxin: one corresponds to pure monomer, while the other two are from a monomer-dimer equilibrium with unknown volume fractions. In this gasbormx example all three curves are fitted simultaneously, keeping the volume fraction of the monomer fixed to 1 for the monomer curve and varying the monomer and dimer volume fractions for the other two. Overall P2 symmetry is used and 451 residues per monomer are generated.
*** Ab inito reconstruction of a protein structure ***
*** by a chain-like ensemble of dummy residues (mix) ***
*** Takes into account oligomer-monomer equilibrium ***
*** Please reference: D.I.Svergun, M.V.Petoukhov & ***
*** M.H.J.Koch (2001) Biophys. J. 80, 2946-2953 ***
Type gasbormx --help for batch mode use
=== GASBOR ATSAS 4.0.1 (6378ba7) started on 01-Oct-2023 16:22:34
Computation mode (User or Expert) ...... < User >:
Log file name .......................... < .log >: **tetomx1**
Project identifier ..................................... : tetomx1
Enter project description .............. : **3 curves**
Warning: initialising the random seed when it has already been initialised
Previous seed: 27016039586466748
New seed: 27020856437574588
Initialized random seed as ..................... : 27020856437574588
Total number of curves to fit .......... < 1 >: **3**
Number of knots on the master grid ..................... : 101
Curve # ................................................ : 1
Input data, GNOM output file name ...... < .out >: **hcm_mer**
Data set title
Maximum diameter of the particle ....................... : 10.05
Radius of gyration ..................................... : 2.938
Number of GNOM data points ............................. : 591
Angular units in the input file :
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm ] (2) < 2 >:
Angular units multiplied by ............................ : 0.1000
Maximum diameter divided by ............................ : 0.1000
Maximum s value [1/angstrom] ........................... : 0.2691
Number of Shannon channels ............................. : 8.608
Portion of the curve to be fitted ...... < 1.000 >:
Reduced s maximum ...................................... : 0.2684
Reduced number of Shannon channels ..................... : 8.585
Volume fraction of monomer (if known) .. < -1.000 >: **1**
Curve # ................................................ : 2
Input data, GNOM output file name ...... < .out >: **hcp_a46c**
Data set title
Maximum diameter of the particle ....................... : 13.00
Radius of gyration ..................................... : 3.907
Number of GNOM data points ............................. : 941
Angular units in the input file :
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm ] (2) < 2 >:
Angular units multiplied by ............................ : 0.1000
Maximum diameter divided by ............................ : 0.1000
Maximum s value [1/angstrom] ........................... : 0.3384
Number of Shannon channels ............................. : 14.00
Portion of the curve to be fitted ...... < 1.000 >:
Reduced s maximum ...................................... : 0.3381
Reduced number of Shannon channels ..................... : 13.99
Volume fraction of monomer (if known) .. < -1.000 >:
Curve # ................................................ : 3
Input data, GNOM output file name ...... < .out >: **hcp_a48c**
Data set title
Maximum diameter of the particle ....................... : 15.00
Radius of gyration ..................................... : 4.393
Number of GNOM data points ............................. : 944
Angular units in the input file :
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm ] (2) < 2 >:
Angular units multiplied by ............................ : 0.1000
Maximum diameter divided by ............................ : 0.1000
Maximum s value [1/angstrom] ........................... : 0.3384
Number of Shannon channels ............................. : 16.16
Portion of the curve to be fitted ...... < 1.000 >:
Reduced s maximum ...................................... : 0.3381
Reduced number of Shannon channels ..................... : 16.14
Volume fraction of monomer (if known) .. < -1.000 >:
Initial DRM (CR for random) ............ < .pdb >:
Symmetry: P2...9 or Pn2 (n=2,..,9)
or P23 or P432 or PICO ................. < P2 >:**P2**
Number of equivalent positions ......................... : 2
Number of residues in asymmetric part .. < 1156 >: **451**
Fibonacci grid order ................... < 14 >:
Number of dummy waters ................................. : 611
Excluded volume per residue ............................ : 28.73
Radius of the search volume ............................ : 75.00
Histogram penalty weight ............................... : 1.000E-03
Bond length penalty weight ............................. : 1.000E-02
Discontiguity penalty weight ........................... : 1.000E-02
Peripheral penalty weight .............................. : 5.000E-02
Expected particle shape: <P>rolate, <O>blate,
or <U>nknown .......................... < Unknown >:
Contrast of the hydration layer ........................ : 3.000E-02
Computation of the initial intensity ...
Histogram penalty value ................................ : 32.79
Bond length penalty value .............................. : 0.6398
Initial DRM # of graphs ................................ : 731
Discontiguity value .................................. : 4.725
Peripheral penalty value ............................... : 0.2862
Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
*** Accounting for constant background ***
Constant background subtracted ......................... : -8.012E-04
Initial R^2 factor ..................................... : 0.5078
Initial R factor ..................................... : 0.7126
Volume fraction, monomer ............................... : 1.000
Constant background subtracted ......................... : -0.1363
Initial R^2 factor ..................................... : 7.185
Initial R factor ..................................... : 2.681
Volume fraction, monomer ............................... : 1.000
Constant background subtracted ......................... : -0.5360
Initial R^2 factor ..................................... : 7.029
Initial R factor ..................................... : 2.651
Initial penalty ........................................ : 0.1007
Initial fVal ........................................... : 14.82
Initial annealing temperature .......................... : 0.100
Annealing schedule factor .............................. : 0.9000
# of independent atoms to modify ....................... : 1
Max # of iterations at each T .......................... : 105000
Max # of successes at each T ........................... : 10500
Min # of successes to continue ......................... : 105
Max # of annealing steps ............................... : 100
==== Simulated annealing procedure started ====
j: 1 T: 0.100E+00 Suc: 10500 Eva: 10921 CPU: 0.311E+02 SqF: 0.4654
Rf: 0.35014 His: 36.75 Bnd: 2.707 Dis:2.0684 Per :0.1894
_..._
j: 96 T: 0.450E-05 Suc: 98 Eva: 3542664 CPU: 0.764E+04 SqF: 0.0942
Rf: 0.05956 His: 4.96 Bnd: 0.038 Dis:0.0000 Per :0.1477
Final Chi^2 against raw data ........................... : 1.669
Final Chi^2 against raw data ........................... : 0.9622
Final Chi^2 against raw data ........................... : 1.172
=== GASBOR ATSAS 4.0.1 (6378ba7) started on 01-Oct-2023 18:32:46
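The unit handling and the Shannon channel counts in the log above can be cross-checked by hand. The sketch below assumes the standard definition of the number of Shannon channels, \(N_s = D_{\text{max}} s_{\text{max}} / \pi\), which reproduces the logged values to within rounding; the function names are hypothetical.

```python
import math


def nm_to_angstrom(d_max_nm):
    """Convert a length from nm to angstrom (the log divides by 0.1)."""
    return d_max_nm / 0.1


def shannon_channels(d_max, s_max):
    """Number of Shannon channels for Dmax in angstrom and the maximum
    s value in 1/angstrom."""
    return d_max * s_max / math.pi


# (Dmax in nm, maximum s in 1/angstrom, value printed in the log)
curves = [(10.05, 0.2691, 8.608), (13.00, 0.3384, 14.00), (15.00, 0.3384, 16.16)]
for d_max_nm, s_max, logged in curves:
    n_s = shannon_channels(nm_to_angstrom(d_max_nm), s_max)
    print(f"computed {n_s:.2f}, logged {logged}")
```

Small differences in the last digit are expected, because the log prints the maximum s value with only four significant digits.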