Manual

This manual describes the method implemented in EFAMIX, its configuration, the required input files, and the output files it produces.

Introduction

EFAMIX is used to separate mixed SAXS signals into the profiles of individual components and their concentration changes across frames.

Based on an ordered series of experimental data frames (for example from SEC-SAXS), EFAMIX estimates:

  1. the scattering profile of each component
  2. the concentration profile (volume fraction) of each component over frames

The method is based on SVD and evolving factor analysis (EFA), where the data matrix is analyzed progressively along the frame order.

A key assumption is that components appear and disappear in the same order. In other words, the component that appears first is also expected to disappear first, then the second, and so on. From forward and backward EFA, EFAMIX defines a concentration window for each component. Outside this window, the component concentration is set to zero.
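The ordering assumption and the window rule can be illustrated with a short Python sketch. It is not EFAMIX's implementation; the function names `same_order` and `apply_windows`, and the representation of windows as inclusive (start, end) frame pairs, are assumptions made for illustration only.

```python
def same_order(windows):
    """Check the ordering assumption: window starts and window ends
    are both non-decreasing from component 1 to component N."""
    starts = [start for start, _ in windows]
    ends = [end for _, end in windows]
    return starts == sorted(starts) and ends == sorted(ends)


def apply_windows(conc, windows, first_frame=1):
    """Set each component's concentration to zero outside its window.

    conc        -- one list of concentrations per component
    windows     -- one (start, end) frame pair per component (inclusive; assumed)
    first_frame -- frame number of the first entry in each profile
    """
    masked = []
    for profile, (start, end) in zip(conc, windows):
        masked.append([
            value if start <= first_frame + i <= end else 0.0
            for i, value in enumerate(profile)
        ])
    return masked
```

With the windows (140, 280) and (220, 380), `same_order` returns True: component 1 both appears and disappears before component 2.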

Typical use cases are SEC-SAXS and other ordered SAXS series where this behavior is expected. EFAMIX can be run from the command line or through CHROMIXS.

Running EFAMIX

Usage:

$ efamix [OPTIONS] [FILE(S)]

The OPTIONS known by EFAMIX are described in the next section, the optional argument(s) FILE(S) in the section on input files. If no OPTIONS are given, the configuration is done in full interactive mode.

In general, command-line options can be used to make choices about the parameters of the algorithm, while the interactive configuration is used to govern the data processing.

Command-Line Arguments and Options

EFAMIX accepts the following command line arguments:

Argument  Description
FILES     Optional. An ordered series of experimental SAS data (.dat) files.

Absolute as well as relative paths to data files are accepted.

If --filelist is used, the input files are read from the list file instead of from positional arguments.

Command-Line Options

EFAMIX recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.

Short  Long Option              Description
-m     --mode=                  Mode of data processing: automatic (default) or interactive.
-n     --Ncomp=                 Specify the number of components in the protein mixture (2 to 4).
-b     --buffer=                Buffer frame selection: select the Nth to Mth data files in the input list. Multiple ranges can be separated by commas (e.g., 100-200,250-300).
-s     --sample=                Sample frame selection: select the Nth to Mth data files in the input list (e.g., 1200-1500).
-1     --component1=            Component 1 frame selection. If undefined, it is estimated via EFA plots or the concentration-window file.
-2     --component2=            Component 2 frame selection. If undefined, it is estimated via EFA plots or the concentration-window file.
-3     --component3=            Component 3 frame selection. If undefined, it is estimated via EFA plots or the concentration-window file.
-4     --component4=            Component 4 frame selection. If undefined, it is estimated via EFA plots or the concentration-window file.
       --Nbeg=                  First SAXS data point to process. --smin can be used instead.
       --Nend=                  Last SAXS data point to process. --smax can be used instead.
       --smin=                  Start s-value in the SAXS curve to process. --Nbeg can be used instead.
       --smax=                  End s-value in the SAXS curve to process. --Nend can be used instead.
-p     --prefix=                Prefix prepended to output filenames (default: efamix).
-c     --conc-file=             File containing concentration-window information. This is an alternative to the --componentN options.
       --show-progress          Show run progress information.
-e     --error-weighting=<Y/N>  Apply error weighting of input SEC-SAXS data during decomposition (default: Y).
-w     --write-fit              Write fit files to the output.
       --brief                  Write component/concentration profiles in combined files instead of separate files per component.
-f     --filelist=              File containing the SEC-SAXS data frames to process (one path per line). Useful when command lines become long.
-v     --version                Print version information and exit.
-h     --help                   Print a summary of arguments and options, then exit.

The options --buffer and --sample are required.
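The frame selections passed to --buffer and --sample are ranges such as 100-200, optionally comma-separated. The following Python sketch shows one way such a string could be interpreted. The helper names are hypothetical, and treating both range ends as inclusive is an assumption, not documented EFAMIX behavior.

```python
def parse_frame_ranges(spec):
    """Parse a selection such as "100-200,250-300" into (first, last) pairs."""
    ranges = []
    for part in spec.split(","):
        first_text, last_text = part.strip().split("-")
        first, last = int(first_text), int(last_text)
        if first > last:
            raise ValueError(f"range {part!r} has first > last")
        ranges.append((first, last))
    return ranges


def expand_frames(ranges):
    """List every frame number covered by the ranges (ends assumed inclusive)."""
    return [n for first, last in ranges for n in range(first, last + 1)]
```

For example, `parse_frame_ranges("100-200,250-300")` gives `[(100, 200), (250, 300)]`.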

Interactive Configuration

If options are omitted on the command line, they can be configured interactively, as shown in the table below. Questions for options already given on the command line are skipped.

An interactive answers file (.ans) may be used to record and replay configurations, enabling repeatable runs without re-entering parameters.

Screen Text                                     Default         Description
Enter number of components in the system:       2               Number of components in the system. Allowed range is 2 to 4.
Enter number of start frame to include:         1               First sample frame index to include.
Enter number of last frame to include:          3000            Last sample frame index to include.
Enter number of start buffer frame to average:  1               First buffer frame index to average.
Enter number of last buffer frame to average:   200             Last buffer frame index to average.
Enter number of first SAXS data point - Nbeg:   1               First SAXS data point to process.
Enter number of last SAXS data point - Nend:    max data range  Last SAXS data point to process.
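Nbeg and Nend are point indices, whereas the --smin and --smax options select the same range by s-value. The Python sketch below shows how an s-range could be translated into point indices when comparing the two. The function name is hypothetical, and both the 1-based numbering and the inclusive limits are assumptions.

```python
def s_range_to_points(s_values, smin, smax):
    """Return 1-based (Nbeg, Nend) of the points with smin <= s <= smax.

    s_values -- the s-axis of one data file, in ascending order
    """
    inside = [i for i, s in enumerate(s_values, start=1) if smin <= s <= smax]
    if not inside:
        raise ValueError("no data points inside the requested s-range")
    return inside[0], inside[-1]
```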

EFAMIX Input Files

EFAMIX expects experimental SAS data (.dat) files.

The file paths can be relative or absolute. For SEC-SAXS data sets from P12, the typical naming pattern is root-name_NNNNN.dat, where NNNNN are frame numbers in ascending order from 00001 to the total number of frames.

Instead of enumerating files at the command line, --filelist can be used. The list file must contain one input file path per line. Paths may be relative to the working directory or absolute.
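A list file for --filelist can be generated with a few lines of Python. This is a minimal sketch: the function name, the directory, the glob pattern, and the list filename are placeholders. It relies on zero-padded frame numbers, so that alphabetical order equals frame order.

```python
from pathlib import Path


def write_filelist(data_dir, pattern, list_path):
    """Write all files in data_dir matching pattern to list_path, one path per line.

    Sorting is alphabetical, which matches frame order only when the
    frame numbers are zero-padded (e.g. _00001, _00002, ...).
    """
    frames = sorted(Path(data_dir).glob(pattern))
    if not frames:
        raise FileNotFoundError(f"no files match {pattern!r} in {data_dir!r}")
    Path(list_path).write_text("".join(f"{frame}\n" for frame in frames))
    return frames
```

A call such as `write_filelist("data", "sec_*.dat", "frames.txt")` would produce a file that can then be passed as --filelist=frames.txt.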

Concentration windows can optionally be provided in a configuration file. If no file is specified, the windows are estimated automatically by EFA. The file has the following format:

Comp1
140 280
Comp2
220 380
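The window file shown above can be checked before a run with a small reader. The Python sketch below assumes that each component label is followed by one line holding the first and last frame of its window, as in the example; the function name is hypothetical, and EFAMIX itself may accept variations that this reader does not.

```python
def parse_conc_windows(text):
    """Read "label / start end" line pairs into {label: (start, end)}."""
    windows = {}
    label = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) == 2 and all(field.isdigit() for field in fields):
            if label is None:
                raise ValueError("window line without a preceding component label")
            windows[label] = (int(fields[0]), int(fields[1]))
            label = None
        else:
            label = line  # treat any other line as a component label
    return windows
```

Applied to the example file, this yields the window (140, 280) for Comp1 and (220, 380) for Comp2.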

EFAMIX Output Files

With each successful run, EFAMIX creates a set of output files in the following subfolders:

  • Component_and_Concentration_profiles
  • Individual_frames_subtracted
  • Restored_individual_frames_subtracted
  • Singular_value_EFA_plots

Each output filename starts with the prefix set by --prefix, followed by a fixed suffix. If a prefix has been used before, existing files are overwritten without warning.

Extension                          Description
prefix_component_profiles.dat      Scattering profiles of components restored by EFAMIX. Column 1 is the s-axis; column 2 is component 1; column 3 is component 2; etc. If --brief is not used, separate files are written per component. Subfolder: “Component_and_Concentration_profiles”
prefix_concentration_profiles.dat  Concentration profiles of restored components. Column 1 is frame number (starting from Nframe_start); columns 2..N are component concentrations; the last column is the summed concentration profile. If --brief is not used, separate files are written per component. Subfolder: “Component_and_Concentration_profiles”
prefix.log                         Input parameters and estimated concentration windows for components. Updated during execution. Subfolder: “Component_and_Concentration_profiles”
prefix_NNNNN_sub.dat               Experimental data with subtracted buffer signal. Numbering starts at Nframe_start and ends at Nframe_end. Created only if --write-fit is used. Subfolder: “Individual_frames_subtracted”
prefix_NNNNN_sub_restored.dat      EFAMIX-restored data (fit files). Numbering starts at Nframe_start and ends at Nframe_end. Created only if --write-fit is used. Subfolder: “Restored_individual_frames_subtracted”
prefix_forwards*.dat               Written as prefix-_diag_forwards_N.dat and prefix-_grad_forwards_N.dat (N is the number of components, 2 to 4). Contains evolving singular values and their gradients in forward direction. Created only if --write-fit is used. Subfolder: “Singular_value_EFA_plots”
prefix_backwards*.dat              Written as prefix-_diag_backwards_N.dat and prefix-_grad_backwards_N.dat (N is the number of components, 2 to 4). Contains evolving singular values and their gradients in backward direction. Created only if --write-fit is used. Subfolder: “Singular_value_EFA_plots”
prefix_conc_window*.dat            Written as prefix-_conc_window_N.dat (N is the number of components, 2 to 4). Contains concentration-window ranges estimated by EFA or set interactively. Created only if --write-fit is used. Subfolder: “Singular_value_EFA_plots”
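The column layout of the combined concentration file (frame number, one column per component, summed profile last) can be read as sketched below in Python. The sketch assumes whitespace-separated numeric columns and skips any line that does not parse as numbers; the exact file syntax is not specified here, so treat this as an assumption. The function name is hypothetical.

```python
def read_concentration_profiles(text):
    """Split each data row into frame number, component values and their sum column."""
    frames, components, totals = [], [], []
    for line in text.splitlines():
        try:
            values = [float(field) for field in line.split()]
        except ValueError:
            continue  # skip header or comment lines
        if len(values) < 3:
            continue  # need at least frame, one component, and the sum
        frames.append(int(values[0]))
        components.append(values[1:-1])
        totals.append(values[-1])
    return frames, components, totals
```

The text can be obtained with `Path(...).read_text()` on the file in the “Component_and_Concentration_profiles” subfolder.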

Examples

Please note that the prefixes in the examples may be chosen arbitrarily. The values below are chosen for maximum clarity only.

SEC-SAXS data set

Use EFAMIX in automatic mode to restore the scattering profiles and concentrations of the components in a two-component protein mixture. The concentration windows for the components will be estimated automatically from the EFA analysis:

$ efamix --Ncomp=2 --mode=automatic --sample=1300-1500 --buffer=100-200 --Nbeg=20 --Nend=1000 --prefix=bsa --write-fit bsa/sec*.dat

Time-resolved SAXS data set

Use EFAMIX to process a time-resolved SAXS data set where components appear one after another and disappear in the same order. Prior to running, SVD analysis of the SAXS data set should be performed to ensure that the number of independent components of the system lies between 2 and 4.

$ efamix --Ncomp=3 --mode=automatic --sample=1300-1500 --buffer=100-200 --Nbeg=20 --Nend=1000 --prefix=prion --write-fit --show-progress --component1=1320-1380 --component2=1360-1480 --component3=1400-1490 data/sec*.dat

IEX-SAXS data set

For best results, run EFAMIX in interactive mode, customizing the input parameters as required. With the following command, all fit files are written to the output and the progress of the run is shown on screen:

$ efamix --Ncomp=4 --mode=interactive --sample=1300-1500 --buffer=100-200 --smin=0.1 --smax=4.0 --prefix=fab_fc --write-fit --show-progress --conc-file=conc_window.con ../SAXS/data/sec*.dat