Ensemble Optimization Method

Unlike other manuals, this text does not describe a binary named eom; it describes the Ensemble Optimization Method (EOM) in general. The EOM workflow consists of separate programs, each documented individually.

Introduction

The Ensemble Optimization Method (EOM) is a suite of programs for fitting an averaged theoretical scattering intensity, derived from an ensemble of conformations, to experimental SAXS data. For this, one or more pools of independent models based on sequence and structural information have to be generated.

Once one or more pools have been generated, the user can calculate the theoretical scattering intensities of the models in the pool using FFMAKER. FFMAKER generates the input for the ensemble selection methods: a genetic algorithm (GAJOE) or a non-negative linear least-squares algorithm (NNLSJOE). Either selection algorithm compares the averaged theoretical scattering intensity of independent ensembles of conformations against the experimental scattering data. The ensemble that best describes the experimental SAXS data is selected.
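The comparison between an ensemble and the data can be illustrated with a minimal sketch. It assumes that all curves are sampled on the same angular grid, that a single least-squares scale factor is applied, and that the discrepancy is a reduced chi-square with N - 1 degrees of freedom. These choices, and the function names ensemble_average and scaled_chi2, are illustrative assumptions and not a description of what GAJOE or NNLSJOE do internally.

```python
def ensemble_average(intensities):
    """Point-wise mean of several model curves sampled on the same grid."""
    n_models = len(intensities)
    if n_models == 0:
        raise ValueError("at least one model curve is required")
    return [sum(column) / n_models for column in zip(*intensities)]


def scaled_chi2(i_exp, sigma, i_calc):
    """Reduced chi-square between data and a scaled theoretical curve.

    Assumption: one linear scale factor, determined by weighted least
    squares, and N - 1 degrees of freedom.
    """
    if not (len(i_exp) == len(sigma) == len(i_calc)) or len(i_exp) < 2:
        raise ValueError("curves must have equal length of at least 2")
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum((c / s) ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    residual = sum(((e - scale * c) / s) ** 2
                   for e, c, s in zip(i_exp, i_calc, sigma))
    return residual / (len(i_exp) - 1), scale


# Hypothetical toy data: two model curves and a noise-free "experiment".
models = [[2.0, 4.0, 6.0], [4.0, 6.0, 8.0]]
average = ensemble_average(models)
chi2, scale = scaled_chi2([6.0, 10.0, 14.0], [1.0, 1.0, 1.0], average)
print(average, scale, chi2)
```

A selection algorithm would evaluate such a discrepancy for many candidate ensembles and keep the one with the lowest value.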

Metrics for quantitative assessment of system flexibility

The distributions of \(R_g\) and \(D_{\max}\) generated by EOM (specifically the GAJOE module) can be represented as probability density functions. This allows for a quantitative estimation of the flexibility of the system using the concept of information entropy. For example, an ensemble/pool of structural parameters for a protein showing a broad Gaussian-like distribution (where it is assumed the disordered regions move randomly in solution) can be viewed as a carrier of high uncertainty. Conversely, an ensemble/pool of parameters for a protein with a narrow size distribution (a scenario where the particle exhibits limited flexibility) carries low uncertainty.

Useful metrics for the quantitative description of uncertainty (flexibility) provided by GAJOE are as follows.

Degree of Flexibility

A metric for the degree of flexibility of the selected ensemble and that of the pool. \(R_{\text{flex}} = 100\%\) for a fully flexible system, \(R_{\text{flex}} = 0\%\) for a fully rigid system. Here:

\[R_{\text{flex}} = -H_b(S)\]

where

\[H_b(S)=-\sum_{i=1}^n p(x_i) \log_b \left[p(x_i )\right]\]

with \(p(x_i) \log_b \left[p(x_i )\right] = 0\) if \(p(x_i) = 0\). Please refer to the article for further detail.
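As an illustration of the entropy sum, the following minimal sketch computes the Shannon entropy of a binned distribution. It assumes that the logarithm base b equals the number of bins n, so that the entropy lies between 0 and 1, and that the reported percentage is the magnitude of this normalized entropy, consistent with the 0% to 100% range stated above. The binning, the choice of base and the function names are assumptions made for this example only.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a histogram with log base n = number of bins.

    Empty bins contribute nothing, following the convention
    p * log_b(p) = 0 for p = 0.
    """
    n = len(counts)
    total = sum(counts)
    if n < 2 or total <= 0:
        raise ValueError("need at least two bins and a positive total count")
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p, n)
    return entropy


def flexibility_percent(counts):
    """Normalized entropy expressed as a percentage (assumed convention)."""
    return 100.0 * normalized_entropy(counts)


# Hypothetical histograms of R_g values.
print(flexibility_percent([5, 5, 5, 5]))   # uniform distribution
print(flexibility_percent([20, 0, 0, 0]))  # all models in a single bin
```

A uniform histogram gives the maximum value, and a histogram with all models in one bin gives zero, which matches the fully flexible and fully rigid limits described above.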

Variance of the distributions

A metric for evaluation of the variance of the distributions of the selected ensemble and that of the pool, defined as the ratio of the standard deviations of the selected ensemble and that of the pool.

\[R_\sigma = \frac{\text{standard deviation (ensemble)}}{\text{standard deviation (pool)}}\]

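\(R_\sigma\) approaches 1.0 for a fully flexible system, where the selected ensemble is as broad as the pool. \(R_\sigma < 1.0\) means that the selected ensemble spans a narrower range than the pool, i.e. the system has limited flexibility.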
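The ratio can be computed directly from two lists of a structural parameter such as \(R_g\). The sketch below uses the population standard deviation from the Python standard library. Whether GAJOE uses the population or the sample estimator is not stated here, so this choice is an assumption, as are the function name and the toy values.

```python
import statistics


def r_sigma(ensemble_values, pool_values):
    """Ratio of the standard deviations of the ensemble and the pool.

    Assumption: population standard deviation for both distributions.
    """
    sd_pool = statistics.pstdev(pool_values)
    if sd_pool == 0:
        raise ValueError("the pool distribution has zero spread")
    return statistics.pstdev(ensemble_values) / sd_pool


# Hypothetical R_g values: a broad pool and a narrower selected ensemble.
pool = [20.0, 30.0, 40.0, 50.0, 60.0]
ensemble = [35.0, 40.0, 45.0]
print(round(r_sigma(ensemble, pool), 2))
```

An ensemble that is narrower than the pool yields a value below 1.0, and an ensemble identical to the pool yields exactly 1.0.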

For example, the following output from EOM/GAJOE facilitates assessment of the flexibility of the system:

Rflex (random) / Rsigma: ~ 66.6% (~ 91.2%) / 0.62

\(R_{\text{flex}}\) of the selected ensemble is ~67%, compared to ~91% for the (random) pool, suggesting that this system is significantly less flexible than the pool. \(R_{\sigma}\) = 0.62 is well below 1.0, i.e. the selected ensemble is narrower than the pool, which supports this interpretation.

NOTE: if \(R_{\text{flex}}\) of the ensemble is significantly smaller than that of the pool, but \(R_{\sigma} > 1.0\), this may indicate a problem with the experimental data, and further investigation is required.
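The check described in the note can be automated for a summary line of the kind shown in the example above. This is a minimal sketch that assumes the line has exactly that layout; the regular expression, the function names and the plain "smaller than" comparison (no threshold for "significantly") are assumptions for illustration, not part of GAJOE.

```python
import re

# Matches e.g. "Rflex (random) / Rsigma: ~ 66.6% (~ 91.2%) / 0.62"
_PATTERN = re.compile(
    r"Rflex \(random\) / Rsigma:\s*~?\s*([\d.]+)%"
    r"\s*\(~?\s*([\d.]+)%\)\s*/\s*([\d.]+)"
)


def parse_flexibility_line(line):
    """Return (Rflex of ensemble, Rflex of pool, Rsigma) as floats."""
    match = _PATTERN.search(line)
    if match is None:
        raise ValueError("line does not match the expected layout")
    r_flex, r_flex_pool, r_sigma = (float(g) for g in match.groups())
    return r_flex, r_flex_pool, r_sigma


def looks_inconsistent(r_flex, r_flex_pool, r_sigma):
    """Flag the case from the note: lower Rflex than the pool, Rsigma > 1."""
    return r_flex < r_flex_pool and r_sigma > 1.0


values = parse_flexibility_line(
    "Rflex (random) / Rsigma: ~ 66.6% (~ 91.2%) / 0.62"
)
print(values, looks_inconsistent(*values))
```

For the example line, the values are not flagged, because the ensemble has both a lower \(R_{\text{flex}}\) than the pool and \(R_{\sigma} < 1.0\).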

Workflow

The general workflow for EOM:

  • generate one or more pools of candidate models
  • calculate theoretical scattering of all pool models
  • select a subset of models that fits the experimental data

Command-Line

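In ATSAS, the following applications are intended for these steps:

  • RANCH: generate pools of candidate models
  • FFMAKER: calculate the theoretical scattering of all pool models
  • GAJOE or NNLSJOE: select the ensemble that fits the experimental data

However, other methods, e.g. Molecular Dynamics, may be used to build suitable pools.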

Graphical Interface

Figure 1: first page of the EOM wizard when started from the ATSAS Application Launcher.

As an alternative to usage from the command-line, EOM may also be run as a wizard from the ATSAS Application Launcher.

The wizard can either generate pools with RANCH or import pools generated beforehand with other tools. It can then optionally run the processing with FFMAKER and the analysis with GAJOE or NNLSJOE. After completion of all calculations, the fits can be inspected and the output files saved.