efamix
Manual
This manual describes the method implemented in EFAMIX, its configuration, the required input files, and the output files it produces.
Introduction
EFAMIX is used to separate mixed SAXS signals into the profiles of individual components and their concentration changes across frames.
Based on an ordered series of experimental data frames (for example from SEC-SAXS), EFAMIX estimates:
- the scattering profile of each component
- the concentration profile (volume fraction) of each component over frames
The method is based on SVD and evolving factor analysis (EFA), where the data matrix is analyzed progressively along the frame order.
A key assumption is that components appear and disappear in the same order. In other words, the component that appears first is also expected to disappear first, then the second, and so on. From forward and backward EFA, EFAMIX defines a concentration window for each component. Outside this window, the component concentration is set to zero.
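The first-in, first-out assumption can be illustrated with a minimal Python sketch. It is not EFAMIX's implementation: the function names and frame numbers are hypothetical, and it assumes that the frames where components appear (forward EFA) and disappear (backward EFA) have already been identified.

```python
def fifo_windows(appear_frames, disappear_frames):
    """Pair the i-th appearance with the i-th disappearance (first in, first out)."""
    starts = sorted(appear_frames)
    ends = sorted(disappear_frames)
    if len(starts) != len(ends):
        raise ValueError("need one appearance and one disappearance per component")
    windows = list(zip(starts, ends))
    for start, end in windows:
        if start > end:
            raise ValueError(f"window {start}-{end} ends before it starts")
    return windows


def window_mask(windows, n_frames):
    """One row per component: 1 inside its window, 0 outside (frames are 1-based)."""
    return [
        [1 if start <= frame <= end else 0 for frame in range(1, n_frames + 1)]
        for start, end in windows
    ]


# Hypothetical appearance/disappearance frames for two components.
windows = fifo_windows([220, 140], [380, 280])
print(windows)  # [(140, 280), (220, 380)]
```

Outside its window a component contributes nothing, which is what the zero entries of the mask express.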
Typical use cases are SEC-SAXS and other ordered SAXS series where this behavior is expected. EFAMIX can be run from the command line or through CHROMIXS.
Running EFAMIX
Usage:
$ efamix [OPTIONS] [FILE(S)]
The OPTIONS known by EFAMIX are described in the next section, the optional argument(s) FILE(S) in the section on input files. If no OPTIONS are given, EFAMIX is configured entirely in interactive mode.
In general, command-line options can be used to make choices about the parameters of the algorithm, while the interactive configuration is used to govern the data processing.
Command-Line Arguments and Options
EFAMIX accepts the following command-line arguments:
| Argument | Description |
|---|---|
| FILES | Optional. An ordered series of experimental SAS data (.dat) files |
Absolute as well as relative paths to data files are accepted.
If --filelist is used, the input files are read from the list
file instead of from positional arguments.
Command-Line Options
EFAMIX recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.
| Short Option | Long Option | Description |
|---|---|---|
| -m | --mode= | Mode of data processing: automatic (default) or interactive. |
| -n | --Ncomp= | Specify the number of components in the protein mixture (2 to 4). |
| -b | --buffer= | Buffer frame selection: select the Nth-Mth data file in the input list. Multiple ranges can be separated by commas (e.g., 100-200,250-300). |
| -s | --sample= | Sample frame selection: select the Nth-Mth data file in the input list (e.g., 1200-1500). |
| -1 | --component1= | Component 1 frame selection. If undefined, it is estimated via EFA plots or the concentration-window file. |
| -2 | --component2= | Component 2 frame selection. If undefined, it is estimated via EFA plots or the concentration-window file. |
| -3 | --component3= | Component 3 frame selection. If undefined, it is estimated via EFA plots or the concentration-window file. |
| -4 | --component4= | Component 4 frame selection. If undefined, it is estimated via EFA plots or the concentration-window file. |
| | --Nbeg= | First SAXS data point to process. --smin can be used instead. |
| | --Nend= | Last SAXS data point to process. --smax can be used instead. |
| | --smin= | Start s-value in the SAXS curve to process. --Nbeg can be used instead. |
| | --smax= | End s-value in the SAXS curve to process. --Nend can be used instead. |
| -p | --prefix= | Prefix prepended to output filenames (default: efamix). |
| -c | --conc-file= | File containing concentration-window information. This is an alternative to the --componentN options. |
| | --show-progress | Show run progress information. |
| -e | --error-weighting=<Y/N> | Apply error weighting of input SEC-SAXS data during decomposition (default: Y). |
| -w | --write-fit | Write fit files to the output. |
| | --brief | Write component/concentration profiles in combined files instead of separate files per component. |
| -f | --filelist= | File containing the SEC-SAXS data frames to process (one path per line). Useful when command lines become long. |
| -v | --version | Print version information and exit. |
| -h | --help | Print a summary of arguments and options, then exit. |
The options --buffer and --sample are required.
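As an illustration of the frame-selection syntax, the following Python sketch expands a selection string such as 100-200,250-300 into frame ranges. The function names are hypothetical, and the sketch assumes that ranges are inclusive and that frames are counted from 1; it is not taken from the EFAMIX source.

```python
def parse_frame_ranges(spec):
    """Turn 'N-M[,N-M...]' into a list of (first, last) tuples.

    Assumption: both ends are inclusive and frame numbering starts at 1.
    """
    ranges = []
    for part in spec.split(","):
        first, last = (int(value) for value in part.split("-"))
        if first < 1 or last < first:
            raise ValueError(f"invalid frame range: {part!r}")
        ranges.append((first, last))
    return ranges


def count_frames(ranges):
    """Number of frames covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


buffer_ranges = parse_frame_ranges("100-200,250-300")
print(buffer_ranges)                # [(100, 200), (250, 300)]
print(count_frames(buffer_ranges))  # 152
```

A check of this kind can catch reversed or malformed ranges before a long run is started.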
Interactive Configuration
Options omitted on the command line can be configured interactively, as shown in the table below. Questions for options that were already given on the command line are skipped.
An interactive answers file (.ans) may be used to record and replay configurations, enabling repeatable runs without re-entering parameters.
| Screen Text | Default | Description |
|---|---|---|
| Enter number of components in the system: | 2 | Number of components in the system. Allowed range is 2 to 4. |
| Enter number of start frame to include: | 1 | First sample frame index to include. |
| Enter number of last frame to include: | 3000 | Last sample frame index to include. |
| Enter number of start buffer frame to average: | 1 | First buffer frame index to average. |
| Enter number of last buffer frame to average: | 200 | Last buffer frame index to average. |
| Enter number of first SAXS data point - Nbeg: | 1 | First SAXS data point to process. |
| Enter number of last SAXS data point - Nend: | max data range | Last SAXS data point to process. |
EFAMIX Input Files
EFAMIX expects experimental SAS data (.dat) files.
The file paths can be relative or absolute. For SEC-SAXS data sets from P12,
the typical naming pattern is root-name_NNNNN.dat, where NNNNN are frame
numbers in ascending order from 00001 to the total number of frames.
Instead of enumerating files at the command line, --filelist
can be used. The list file must contain one input file path per line.
Paths may be relative to the working directory or absolute.
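A list file of this kind can be generated with a short script. The Python sketch below is one possible approach, not part of EFAMIX: the function names and the file name frames.txt are hypothetical, and it assumes the root-name_NNNNN.dat naming pattern described above so that frames can be sorted by number.

```python
import glob
import re
from pathlib import Path


def frame_number(path):
    """Extract NNNNN from a name ending in _NNNNN.dat."""
    match = re.search(r"_(\d+)\.dat$", Path(path).name)
    if match is None:
        raise ValueError(f"no frame number in file name: {path}")
    return int(match.group(1))


def write_filelist(pattern, list_path):
    """Write all files matching `pattern` to `list_path`, one path per line,
    sorted by ascending frame number. Returns the sorted paths."""
    paths = sorted(glob.glob(pattern), key=frame_number)
    Path(list_path).write_text("".join(f"{path}\n" for path in paths))
    return paths
```

For example, write_filelist("bsa/sec*.dat", "frames.txt") would produce a file that can then be passed with --filelist=frames.txt.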
Concentration windows can optionally be provided in a configuration file (see --conc-file). If no file is given, the windows are estimated automatically by EFA. The file has the following format:
Comp1
140 280
Comp2
220 380
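The layout above can be written and read back with a few lines of Python. This is a sketch based only on the example shown: the function names are hypothetical, and details such as the exact whitespace or additional keywords accepted by EFAMIX are assumptions.

```python
def format_conc_windows(windows):
    """Render (first, last) frame pairs as alternating 'CompN' / 'first last' lines."""
    lines = []
    for index, (first, last) in enumerate(windows, start=1):
        lines.append(f"Comp{index}")
        lines.append(f"{first} {last}")
    return "\n".join(lines) + "\n"


def parse_conc_windows(text):
    """Read the same layout back into a list of (first, last) tuples."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    windows = []
    # Rows alternate: a label row ('CompN'), then a row with two frame numbers.
    for _label, bounds in zip(rows[0::2], rows[1::2]):
        first, last = (int(value) for value in bounds)
        windows.append((first, last))
    return windows


text = format_conc_windows([(140, 280), (220, 380)])
print(text, end="")
```

Writing the returned text to a file gives a window file in the layout shown above, which can then be supplied with --conc-file.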
EFAMIX Output Files
With each successful run, EFAMIX creates a set of output files in the following subfolders:
- Component_and_Concentration_profiles
- Individual_frames_subtracted
- Restored_individual_frames_subtracted
- Singular_value_EFA_plots
Each filename starts with the prefix set by --prefix, followed by
one of the extensions listed below. If a prefix has been used before,
existing files are overwritten without warning.
| Extension | Description |
|---|---|
| prefix_component_profiles.dat | Scattering profiles of components restored by EFAMIX. Column 1 is the s-axis; column 2 is component 1; column 3 is component 2; etc. If --brief is not used, separate files are written per component. Subfolder: “Component_and_Concentration_profiles” |
| prefix_concentration_profiles.dat | Concentration profiles of restored components. Column 1 is frame number (starting from Nframe_start); columns 2..N are component concentrations; the last column is the summed concentration profile. If --brief is not used, separate files are written per component. Subfolder: “Component_and_Concentration_profiles” |
| prefix.log | Input parameters and estimated concentration windows for components. Updated during execution. Subfolder: “Component_and_Concentration_profiles” |
| prefix_NNNNN_sub.dat | Experimental data with subtracted buffer signal. Numbering starts at Nframe_start and ends at Nframe_end. Created only if --write-fit is used. Subfolder: “Individual_frames_subtracted” |
| prefix_NNNNN_sub_restored.dat | EFAMIX-restored data (fit files). Numbering starts at Nframe_start and ends at Nframe_end. Created only if --write-fit is used. Subfolder: “Restored_individual_frames_subtracted” |
| prefix_forwards*.dat | Written as prefix-_diag_forwards_N.dat and prefix-_grad_forwards_N.dat (N is the number of components, 2 to 4). Contains evolving singular values and their gradients in forward direction. Created only if --write-fit is used. Subfolder: “Singular_value_EFA_plots” |
| prefix_backwards*.dat | Written as prefix-_diag_backwards_N.dat and prefix-_grad_backwards_N.dat (N is the number of components, 2 to 4). Contains evolving singular values and their gradients in backward direction. Created only if --write-fit is used. Subfolder: “Singular_value_EFA_plots” |
| prefix_conc_window*.dat | Written as prefix-_conc_window_N.dat (N is the number of components, 2 to 4). Contains concentration-window ranges estimated by EFA or set interactively. Created only if --write-fit is used. Subfolder: “Singular_value_EFA_plots” |
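For downstream analysis, the combined concentration profile file can be read with a small parser. The Python sketch below follows the column layout described for prefix_concentration_profiles.dat (frame number, component concentrations, summed concentration). It assumes whitespace-separated numeric columns and skips any line that is not purely numeric; the function name and sample values are made up for illustration.

```python
def parse_concentration_profiles(text):
    """Split each numeric row into frame number, per-component values, and total.

    Assumption: columns are whitespace-separated; non-numeric lines
    (for example headers) and rows with fewer than three values are skipped.
    """
    frames, components, totals = [], [], []
    for line in text.splitlines():
        try:
            values = [float(field) for field in line.split()]
        except ValueError:
            continue  # header or comment line
        if len(values) < 3:
            continue  # blank or incomplete row
        frames.append(int(values[0]))
        components.append(values[1:-1])
        totals.append(values[-1])
    return frames, components, totals


# Made-up sample rows: frame, component 1, component 2, sum.
sample = "1300 0.0 0.0 0.0\n1301 0.5 0.1 0.6\n"
frames, components, totals = parse_concentration_profiles(sample)
print(frames, components, totals)
```

With the real file, the text would come from reading prefix_concentration_profiles.dat in the Component_and_Concentration_profiles subfolder.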
Examples
Please note that the prefixes in the examples may be chosen arbitrarily. The values below are chosen for maximum clarity only.
SEC-SAXS data set
Use EFAMIX in automatic mode to restore the scattering profiles and concentrations of the components in a two-component protein mixture. The concentration windows for the components will be estimated automatically from the EFA analysis:
$ efamix --Ncomp=2 --mode=automatic --sample=1300-1500 --buffer=100-200 --Nbeg=20 --Nend=1000 --prefix=bsa --write-fit bsa/sec*.dat
Time-resolved SAXS data set
Use EFAMIX to process a time-resolved SAXS data set where components appear one after another and disappear in the same order. Prior to running, SVD analysis of the SAXS data set should be performed to ensure that the number of independent components of the system lies between 2 and 4.
$ efamix --Ncomp=3 --mode=automatic --sample=1300-1500 --buffer=100-200 --Nbeg=20 --Nend=1000 --prefix=prion --write-fit --show-progress --component1=1320-1380 --component2=1360-1480 --component3=1400-1490 data/sec*.dat
IEX-SAXS data set
For best results, run EFAMIX in interactive mode, customizing the input parameters as required. With the following command, all fit files are written to the output and the progress of the run is shown on screen:
$ efamix --Ncomp=4 --mode=interactive --sample=1300-1500 --buffer=100-200 --smin=0.1 --smax=4.0 --prefix=fab_fc --write-fit --show-progress --conc-file=conc_window.con ../SAXS/data/sec*.dat
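For scripted or repeated runs, command lines like those above can be assembled programmatically. The following Python sketch only builds the argument list from options documented in this manual and does not run anything. The helper name is hypothetical, and the file path is a placeholder.

```python
import shlex


def build_efamix_argv(files, **options):
    """Build an argument list for efamix from keyword options.

    Underscores in keyword names become hyphens (write_fit -> --write-fit);
    True produces a bare flag, False or None omits the option.
    """
    argv = ["efamix"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is not False and value is not None:
            argv.append(f"{flag}={value}")
    return argv + list(files)


argv = build_efamix_argv(
    ["bsa/sec_00001.dat"],  # placeholder path; pass the full ordered frame list
    Ncomp=2,
    sample="1300-1500",
    buffer="100-200",
    prefix="bsa",
    write_fit=True,
)
print(shlex.join(argv))
```

The list can be handed to subprocess.run(argv, check=True). Note that wildcards such as sec*.dat are not expanded when a list is passed without a shell, so the file names need to be expanded beforehand, for example with glob.glob.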