efamix
Manual
Introduction
EFAMIX implements an algorithm for restoring the scattering profiles of the individual components of protein mixtures using evolving factor analysis (EFA). Both the scattering profiles of the individual components and the corresponding concentration (volume fraction) profiles are restored. The method uses the singular value decomposition (SVD) [1] of multiple data sets.
The fundamental idea of EFA [2] is to follow the change, or evolution, of the rank of the data matrix as a function of the ordered variable, which is done by performing SVD on a data matrix of increasing size. In order to correctly associate the appearance of a given component with its disappearance, one assumes that the first substance present in the system will be the first to disappear, the second component will disappear next, and so on. Thus, the region of existence of a component, called the concentration window, is generated for the i-th component from the point where the rank rises to i in the forward EFA calculation to the point where the rank rises to N - i + 1 in the backward calculation [2], where N is the number of components. Outside this concentration window the component is not present, so its concentration is equal to zero. SEC-SAXS data sets are typical examples of this kind of data.
In the next step the algorithm calculates the rotation matrix R, from which the concentration matrix and afterwards the individual scattering profiles are restored. EFAMIX can be run as a standalone application or called from the program CHROMIXS via the menu ‘File->Configure EFAMIX’.
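The pairing rule described above (the first component to appear is the first to disappear) can be sketched in a few lines of Python. This is only an illustration of the rule, not EFAMIX's own code: the function name `concentration_windows` and the two input lists are hypothetical, and the frame numbers are made up for the example.

```python
def concentration_windows(forward_rises, backward_rises):
    """Pair appearance and disappearance points under the first-in, first-out rule.

    forward_rises[k]  : frame where the rank rises to k + 1 in the forward scan
    backward_rises[k] : frame where the rank rises to k + 1 in the backward scan
    Both inputs are hypothetical; EFAMIX derives such points internally.
    """
    n = len(forward_rises)
    if len(backward_rises) != n:
        raise ValueError("both scans must report the same number of components")
    windows = []
    for i in range(1, n + 1):
        start = forward_rises[i - 1]   # rank rises to i (forward)
        end = backward_rises[n - i]    # rank rises to N - i + 1 (backward)
        windows.append((start, end))
    return windows


# Made-up rank-rise frames for a two-component system.
forward = [140, 220]    # component 1 appears at 140, component 2 at 220
backward = [380, 280]   # scanning from the end: rank 1 at 380, rank 2 at 280
print(concentration_windows(forward, backward))  # [(140, 280), (220, 380)]
```

Component 1 is paired with the last rank rise of the backward scan, which is what the assumption of ordered disappearance amounts to.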
Running EFAMIX
Usage:
$ efamix [OPTIONS] FILE(S)
Command-Line Arguments and Options
EFAMIX accepts the following command line arguments:
| Argument | Description | 
|---|---|
| FILE(S) | The filenames of a SEC-SAXS data set, possibly with relative or absolute path components. The files should be in the format of scattering curves (DAT files). Filename masks (the symbols ‘*’ and ‘?’) can be used to select multiple files. | 
If the option ‘--filelist’ is used, only one argument, the file naming all input files, is accepted.
OPTIONS known by EFAMIX are described in this section. In general, command-line options can be used to make choices about the parameters of the algorithm, while the interactive configuration is used to govern the data processing. If no OPTIONS are given, the configuration is done in full interactive mode.
Command-Line Options
EFAMIX recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.
| Short Option | Long Option | Description | 
|---|---|---|
| -m | --mode=<MODE> | Mode of data processing, AUTOMATIC (all calculations will be made automatically using default values of parameters) or INTERACTIVE. Default is ‘INTERACTIVE’. See Examples. | 
| -n | --Ncomp=<NUMBER> | Specify the number of components in the protein mixture | 
| | --Nframes_total=<NUMBER> | Specify the total number of time frames (number of data curves) in the SEC-SAXS data set. | 
| -b | --buffer=<N-M> | Buffer frame selection, select the Nth-Mth data file in the argument list; multiple ranges are separated by ‘,’ as follows: 100-200,250-300 | 
| -s | --sample=<N-M> | Sample frame selection, select the Nth-Mth data file in the argument list, as follows: 1200-1500. | 
| | --component-1=<N-M> | Component-1 frame selection, select the Nth-Mth data file in the argument list. If undefined, it will be assessed via the EFA plot or from the input conc-file. | 
| | --component-2=<N-M> | Component-2 frame selection, select the Nth-Mth data file in the argument list. If undefined, it will be assessed via the EFA plot or from the input conc-file. | 
| | --component-3=<N-M> | Component-3 frame selection, select the Nth-Mth data file in the argument list. If undefined, it will be assessed via the EFA plot or from the input conc-file. | 
| | --component-4=<N-M> | Component-4 frame selection, select the Nth-Mth data file in the argument list. If undefined, it will be assessed via the EFA plot or from the input conc-file. | 
| | --Nbeg=<NUMBER> | Specify the first data point in the SAXS curve to be processed. The alternative option ‘--smin’ can be used instead. | 
| | --Nend=<NUMBER> | Specify the last data point in the SAXS curve to be processed. The alternative option ‘--smax’ can be used instead. | 
| | --smin=<s> | Specify the first s-axis value in the SAXS curve to be processed. The alternative option ‘--Nbeg’ can be used instead. | 
| | --smax=<s> | Specify the last s-axis value in the SAXS curve to be processed. The alternative option ‘--Nend’ can be used instead. | 
| -p | --prefix=<NAME> | Specify the prefix to prepend to any output filenames (default: ‘efamix’). | 
| -c | --conc-file=<NAME> | Specify the name of the file containing information about the concentration windows of the components (default: ‘ ‘). It is an alternative to using the ‘--component-N’ options. | 
| -s | --show-progress=<TRUE/FALSE> | Specify the appearance of the run progress information. (default: FALSE) | 
| -e | --error-weighting=<TRUE/FALSE> | Specify if the error weighting of input SEC-SAXS data is applied during the data decomposition. (default: TRUE) | 
| -w | --write-fit=<TRUE/FALSE> | Specify the writing of fit files to the output. (default: FALSE) | 
| -b | --brief=<TRUE/FALSE> | Write the component/concentration profiles to separate files for each component/concentration (--brief=FALSE), or to common files where all components are stored (--brief=TRUE). (default: FALSE) | 
| | --filelist | Only one argument is allowed, the list of files to process; one file path per line. | 
| -v | --version | Print version information and exit. | 
| -h | --help | Print a summary of arguments, options, and exit. | 
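The N-M range syntax used by the frame selection options (for example ‘100-200,250-300’) can be illustrated with a small Python sketch. How EFAMIX itself parses these values is not documented here; the helper `parse_frame_ranges` below is hypothetical and only shows how such a string maps to inclusive frame ranges.

```python
def parse_frame_ranges(spec):
    """Turn a string such as '100-200,250-300' into a list of (first, last) pairs.

    Hypothetical helper for illustration; it is not part of EFAMIX.
    """
    ranges = []
    for part in spec.split(","):
        first_text, last_text = part.strip().split("-")
        first, last = int(first_text), int(last_text)
        if first > last:
            raise ValueError(f"range {part!r} is reversed")
        ranges.append((first, last))
    return ranges


print(parse_frame_ranges("100-200,250-300"))  # [(100, 200), (250, 300)]
```

Such a check can be useful for validating frame selections in a wrapper script before the values are passed on the command line.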
Interactive Configuration
If some of the options are omitted on the command line, they may be configured interactively as shown in the table below. Questions for options that were already given are skipped.
| Screen Text | Default | Description | 
|---|---|---|
| Enter number of components in the system? | 2 | Number of components in the system. Default value is set to 2. It can be set between 2 and 4. | 
| Enter number of frames for SAXS data set? | 3000 | Total number of time frames (number of data curves) in the SEC-SAXS data set. Default value is set to 3000. | 
| Enter number of SAXS data point - Nbeg? | 1 | The first data point number in SAXS curve to be processed. Default value is set to 1. | 
| Enter number of SAXS data point - Nend? | 3000 | The last data point number in SAXS curve to be processed. Default value is set to 3000. | 
EFAMIX Input Files
EFAMIX uses the experimental files of a SEC-SAXS data set in ASCII format as input. The program needs the location of the files, given as relative or absolute paths. The files of SEC-SAXS data sets collected at P12 normally follow the naming scheme ‘root-name_NNNNN.dat’, where NNNNN is the frame number in ascending order from ‘00001’ to the total number of frames.
As an alternative to enumerating the files on the command line, a file list may be provided. In this case, only one file path is accepted as argument. That file shall list the paths of all actual input files, one per line. Path names must be either relative to the working directory or absolute.
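A file list of this kind can be generated with a few lines of Python. The sketch below is an assumption-laden illustration: the function name `write_filelist`, the file name ‘frames.txt’ and the ‘sec_NNNNN.dat’ pattern are made up, and the demonstration runs on empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path


def write_filelist(data_dir, pattern, list_path):
    """Write all files in data_dir matching pattern to list_path, one path per line."""
    # Zero-padded frame numbers sort correctly as plain strings.
    files = sorted(Path(data_dir).glob(pattern))
    Path(list_path).write_text("".join(f"{path}\n" for path in files))
    return files


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for frame in (3, 1, 2):
        (Path(tmp) / f"sec_{frame:05d}.dat").touch()
    listed = write_filelist(tmp, "sec_*.dat", Path(tmp) / "frames.txt")
    print([path.name for path in listed])
```

The resulting file could then be passed as the single argument together with the ‘--filelist’ option.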
The information about the concentration windows of the components optionally may be provided in a configuration file with the following format (if it is not specified, it will be estimated automatically by EFA analysis):
   Comp1
   140 280
   Comp2
   220 380
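To check such a file before a run, it can be read back with a short Python sketch. The parser below assumes exactly the layout shown above (a component label on one line, then the first and last frame on the next); the function name `parse_conc_windows` is hypothetical and the code is not taken from EFAMIX.

```python
def parse_conc_windows(text):
    """Parse alternating 'label' / 'first last' lines into {label: (first, last)}."""
    windows = {}
    label = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if label is None:
            label = line  # e.g. 'Comp1'
        else:
            first, last = (int(value) for value in line.split())
            windows[label] = (first, last)
            label = None
    if label is not None:
        raise ValueError(f"no frame range given for {label!r}")
    return windows


example = """
   Comp1
   140 280
   Comp2
   220 380
"""
print(parse_conc_windows(example))  # {'Comp1': (140, 280), 'Comp2': (220, 380)}
```

In this example the two windows overlap between frames 220 and 280, which is the region where both components are present.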
EFAMIX Output Files
With each successful run, EFAMIX creates a set of output files in different subfolders (“Component_and_Concentration_profiles”, “Individual_frames_subtracted”, “Restored_individual_frames_subtracted”, “Singular_value_EFA_plots”). Each filename starts with a customizable prefix to which an extension is appended. If a prefix has been used before, existing files will be overwritten without further notice.
| Extension | Description | 
|---|---|
| prefix_component_profiles.dat | The scattering profiles of the components restored by EFAMIX. The first column is the s-axis. The second column is the scattering intensity of component 1, the third column that of component 2, etc. If the option ‘--brief’ is disabled, the information is saved in separate files for each component. Subfolder: “Component_and_Concentration_profiles” | 
| prefix_concentration_profiles.dat | The concentration profiles of the components restored by EFAMIX. The first column is the frame number (starting from Nframe_start). The second column is the concentration profile of component 1, the third column that of component 2, etc. The last column contains the total concentration profile of all components. If the option ‘--brief’ is disabled, the information is saved in separate files for each component. Subfolder: “Component_and_Concentration_profiles” | 
| prefix.log | Contains the information about the input parameters and the estimated concentration window numbers for the components. It is updated during execution of the program. Subfolder: “Component_and_Concentration_profiles” | 
| data_NNNNN_sub.dat | The experimental data with the buffer signal subtracted. The file numbering starts from Nframe_start and ends with Nframe_end. The files are created only if the option ‘write-fit’ is enabled. Subfolder: “Individual_frames_subtracted” | 
| data_NNNNN_sub_restored.dat | The data restored by EFAMIX (fit files). The file numbering starts from Nframe_start and ends with Nframe_end. The files are created only if the option ‘write-fit’ is enabled. Subfolder: “Restored_individual_frames_subtracted” | 
| prefix_forwards*.dat | The information is written to the following files: prefix-‘diag_forwards_N.dat’ and prefix-‘_grad_forwards_N.dat’ (where N is the number of components, it can be set between 2 and 4). The files contain information about the evolving singular values and their first derivatives (gradients) obtained in the forward direction. The files are created only if the option ‘write-fit’ is enabled. Subfolder: “Singular_value_EFA_plots” | 
| prefix_backwards*.dat | The information is written to the following files: prefix-‘diag_backwards_N.dat’ and prefix-‘_grad_backwards_N.dat’ (where N is the number of components, it can be set between 2 and 4). The files contain information about the evolving singular values and their first derivatives (gradients) obtained in the backward direction. The files are created only if the option ‘write-fit’ is enabled. Subfolder: “Singular_value_EFA_plots” | 
| prefix_conc_window*.dat | The information is written to the following files: prefix-‘conc_window_N.dat’ (where N is the number of components, it can be set between 2 and 4). The files contain information about the sizes of the concentration windows of the components estimated by EFA or defined in interactive mode. The files are created only if the option ‘write-fit’ is enabled. Subfolder: “Singular_value_EFA_plots” | 
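The multi-column profile files described above (first column the s-axis or frame number, one further column per component) can be read with a short Python sketch. The exact file layout is an assumption here: the code expects whitespace-separated numbers and skips any line that is not purely numeric. The function name `read_profiles` is hypothetical and the sample values are made up.

```python
def read_profiles(text):
    """Split whitespace-separated numeric rows into an axis and one list per column."""
    axis, columns = [], []
    for line in text.splitlines():
        try:
            numbers = [float(field) for field in line.split()]
        except ValueError:
            continue  # skip header or comment lines
        if len(numbers) < 2:
            continue  # skip blank or incomplete lines
        if not columns:
            columns = [[] for _ in numbers[1:]]
        if len(numbers) - 1 != len(columns):
            raise ValueError("rows have different numbers of columns")
        axis.append(numbers[0])
        for column, value in zip(columns, numbers[1:]):
            column.append(value)
    return axis, columns


# Made-up sample with two components; not real EFAMIX output.
sample = """s comp1 comp2
0.01 100.0 50.0
0.02 90.0 45.0
"""
s_axis, components = read_profiles(sample)
print(s_axis)         # [0.01, 0.02]
print(components[0])  # [100.0, 90.0]
```

For the concentration profiles file, the last returned column would then hold the total concentration profile.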
Examples
Please note that the prefixes in the examples may be chosen arbitrarily. The values below are chosen for maximum clarity only.
SEC-SAXS data set
Use EFAMIX in AUTOMATIC mode to restore the scattering profiles and concentrations of the components in a two-component protein mixture. The concentration windows of the components will be estimated automatically from the EFA analysis:
$ efamix --Ncomp=2 --mode=automatic --Nframes_total=2400 --sample=1300-1500 --buffer=100-200 --Nbeg=20 --Nend=1000 --prefix=bsa --write-fit bsa/sec*.dat
Time-resolved SAXS data set
Use EFAMIX to process a time-resolved SAXS data set in cases where the components appear one after another and disappear in the same order. Prior to the run, an SVD analysis of the SAXS data set should be performed to ensure that the number of independent components of the system lies between 2 and 4.
$ efamix --Ncomp=3 --mode=automatic --Nframes_total=2400 --sample=1300-1500 --buffer=100-200 --Nbeg=20 --Nend=1000 --prefix=prion --write-fit --show-progress --component-1=1320-1380 --component-2=1360-1480 --component-3=1400-1490 data/sec*.dat
IEC-SAXS data set
For best results, run EFAMIX in INTERACTIVE mode, customizing the input parameters as required. With the following command, all fit files will be written to the output and the progress of the program run will be shown on the screen:
$ efamix --Ncomp=4 --mode=automatic --Nframes_total=2400 --sample=1300-1500 --buffer=100-200 --smin=0.1 --smax=4.0 --prefix=fab_fc --write-fit --show-progress --conc-file=conc_window.con ../SAXS/data/sec*.dat
EFAMIX is also integrated into CHROMIXS GUI and can be called from CHROMIXS via menu ‘File->Configure EFAMIX’.
References:
[1] Golub G. H. & Reinsch C. Singular Value Decomposition and Least Squares Solutions // Numer. Math. (1970), V. 14, p. 403-420. https://doi.org/10.1007/BF02163027
[2] Keller H. R. & Massart D. L. Evolving factor analysis // Chemometrics and Intelligent Laboratory Systems (1991), V. 12, Issue 3, p. 209-224. https://doi.org/10.1016/0169-7439(92)80002-L