ranch
Manual
Introduction
RANCH generates a pool of n independent models based upon sequence and structural information.
For multi-domain proteins where high-resolution structures for individual subunits/domains are available, these structures and distance/orientation information derived from them can be used as rigid-bodies and/or constraints in EOM model generation. For proteins expected to be intrinsically unfolded no rigid bodies are required as input, and completely random configurations of the alpha-carbon trace are created based upon the sequence alone.
Crystallographic symmetry, if required, must be defined by the user as an appropriately arranged set of input rigid bodies (CIF or PDB format), with the fixed flag applied to maintain the desired orientation of those bodies. RANCH does not apply symmetry operations itself.
Inter-domain/subunit contacts can be imposed to generate homo/hetero oligomers and complexes by providing distance constraints.
Running RANCH
Command-Line
Usage:
$ ranch [OPTIONS] <ASSIGNMENT> <SEQUENCE> [MODEL(S)]
RANCH accepts absolute as well as relative paths to the input ASSIGNMENT, SEQUENCE and atomic coordinate MODEL file(s).
In all cases the coordinate input may be in either .pdb or .cif format. The OPTIONS known by RANCH are described in the next section.
Arguments and Options
RANCH takes the following command-line arguments:
| Argument | Description |
|---|---|
| SEQUENCE | Required. The amino-acid residue sequence data (.seq) of the protein/peptide(s) in FASTA format, in a single file. |
| ASSIGNMENT | Required. Domain assignments. The assignment of chain ID and residue numbering corresponding to structured and unstructured sequence. Here can be defined sequence regions corresponding to input CIF/PDB files and also user defined stretches of ideal strand and helix. |
| MODEL(S) | Optional. The atomic coordinate files of any input rigid bodies in .pdb or .cif format. |
RANCH recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.
| Short option | Long option | Description |
|---|---|---|
| -p | --prefix=<ARG> | Output filename prefix (default: ranch) |
| | --model-format=<FMT> | Format of 3D models, one of: cif, pdb (default: cif) |
| | --offset=<ARG> | Output file numbering offset (default: 0) |
| | --repetitions=<ARG> | Number of output model files (CIF; default: 10000) |
| | --database=<FILE> | Quasi-Ramachandran database file (dihedral map). NOTE that three designations in the ASSIGNMENT file can be used to define the dihedral angles used: disordered (for intrinsically disordered and unstructured regions), denatured (for chemically denatured proteins/peptides) and compact (for compact structure). |
| | --database-threshold=<ARG> | Probabilities from the Quasi-Ramachandran dihedral map less than this threshold will be set to 0.0 (default: 0.0025) |
| | --distance-constraints=<FILE> | File listing distance constraints between specified sequence positions/amino-acids |
| | --seed=<INT> | Set the seed for the random number generator |
| -v | --version | Print version information and exit. |
| -h | --help | Print a summary of arguments, options, and exit. |
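The effect of --database-threshold can be illustrated with a small sketch. The dihedral map below is a made-up toy dictionary, not the format of the actual database file, and the function name apply_threshold is hypothetical. The text above only states that values below the threshold are set to 0.0; whether the remaining probabilities are renormalized afterwards is not stated, so this sketch leaves them unchanged.

```python
def apply_threshold(dihedral_map, threshold=0.0025):
    """Set every probability strictly below `threshold` to 0.0.

    `dihedral_map` maps a (phi, psi) bin to a probability. This layout is
    an assumption made for illustration only.
    """
    return {
        angles: (p if p >= threshold else 0.0)
        for angles, p in dihedral_map.items()
    }


# Toy values, invented for this example.
toy_map = {
    (-60, -45): 0.4200,
    (-120, 130): 0.0030,
    (60, 45): 0.0020,
    (180, 0): 0.0001,
}

filtered = apply_threshold(toy_map)
for angles, p in filtered.items():
    print(angles, p)
```

With the default threshold, the two rarest bins in the toy map are zeroed and would never be sampled, while the bin at 0.0030 is kept.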
Runtime Output
RANCH does not have any runtime output.
Graphical Interface
As an alternative to usage from the command-line, RANCH may also be run through the EOM wizard from the ATSAS Application Launcher.
This wizard allows convenient generation and selection of pools with RANCH, as well as their processing and analysis. See EOM for more details.
RANCH Input Files
RANCH requires residue sequence data (.seq) and an assignment file, and accepts optional atomic coordinate files in either .pdb or .cif format.
Any file path provided may be either relative or absolute.
Assignment File
Distance Constraints File
RANCH Output Files
RANCH writes atomic coordinate data in .pdb or .cif format on output. By default the coordinate files are written to the current directory, or a directory may be specified as part of the prefix.
Examples
Unstructured peptides
Use RANCH to generate a pool of 10000 models based only on the amino-acid sequence in sequence.fasta and write the models to the directory pool:
$ ranch --repetitions 10000 --prefix pool/pep_ assignment.txt sequence.fasta
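When RANCH is driven from a script, the invocation above can be assembled as an argument list. The following is a minimal sketch: the helper build_ranch_command is hypothetical, it uses only options listed in the options table, and it does not run RANCH itself.

```python
import shlex


def build_ranch_command(assignment, sequence, models=(), prefix=None,
                        model_format=None, repetitions=None, seed=None,
                        distance_constraints=None):
    """Assemble a ranch argument list (hypothetical helper, not part of ATSAS)."""
    if model_format is not None and model_format not in ("cif", "pdb"):
        raise ValueError("model_format must be 'cif' or 'pdb'")

    cmd = ["ranch"]
    if prefix is not None:
        cmd.append(f"--prefix={prefix}")
    if model_format is not None:
        cmd.append(f"--model-format={model_format}")
    if repetitions is not None:
        cmd.append(f"--repetitions={int(repetitions)}")
    if seed is not None:
        cmd.append(f"--seed={int(seed)}")
    if distance_constraints is not None:
        cmd.append(f"--distance-constraints={distance_constraints}")

    # Positional arguments follow the usage line: ASSIGNMENT, SEQUENCE, MODEL(S).
    cmd += [str(assignment), str(sequence)]
    cmd += [str(m) for m in models]
    return cmd


cmd = build_ranch_command("assignment.txt", "sequence.fasta",
                          prefix="pool/pep_", repetitions=10000)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where ranch is installed. Setting seed explicitly is a convenient way to keep a record of how a pool was generated.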
Example of the FASTA sequence file format:
> A
DSHAKRHHGYKRKFHEKHHSHRGYADSHAKRHHGYKRKFHEKHHSHRGYA
AAAAAAAAAAARKFHEKHHSHRGYADSHAKRHHGYKRKFHEKHHSHRGYA
In this case a single chain (A) of 100 residues is defined. Additional chains can be appended to the file following this format.
Example of the assignment file format:
A 1 100 disordered
In this case, coordinates for residues 1 to 100 of a single chain (A) are generated using the disordered Quasi-Ramachandran database for dihedral angles.
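For a fully disordered chain, the last residue number in the assignment line must match the sequence length. A minimal sketch of deriving that line from the FASTA file is shown below. The function names are hypothetical, and the sketch assumes that the text after ">" in each header is the chain ID, as in the example above.

```python
def read_fasta(text):
    """Return {chain_id: sequence}, joining sequence lines that span several rows."""
    chains = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the header text is the chain ID, e.g. "> A" -> "A".
            current = line[1:].strip()
            chains[current] = ""
        elif current is not None:
            chains[current] += line
    return chains


def disordered_assignment(chains):
    """One 'CHAIN FIRST LAST disordered' line per chain, covering every residue."""
    return [f"{chain} 1 {len(seq)} disordered" for chain, seq in chains.items()]


EXAMPLE_FASTA = """\
> A
DSHAKRHHGYKRKFHEKHHSHRGYADSHAKRHHGYKRKFHEKHHSHRGYA
AAAAAAAAAAARKFHEKHHSHRGYADSHAKRHHGYKRKFHEKHHSHRGYA
"""

for line in disordered_assignment(read_fasta(EXAMPLE_FASTA)):
    print(line)
```

For the example sequence this reproduces the assignment line shown above.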
User defined regions of secondary structure
Use RANCH to generate a pool of 10000 models with stretches of ideal secondary structure:
$ ranch --repetitions 10000 --prefix pool/pep_ assignment_ss.txt sequence.fasta
Example of the assignment file format (assignment_ss.txt):
A 1 10 disordered
A 11 22 helix
A 22 26 disordered
A 27 37 strand
A 38 100 disordered
In this case, for a single chain (A), coordinates for the unstructured residues 1-10, 22-26 and 38-100 are generated using the disordered Quasi-Ramachandran database for dihedral angles, while residues 11-22 and 27-37 use dihedral angles from the helical and beta-strand regions of the Quasi-Ramachandran database, respectively.
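Assignment files with many segments are easy to get wrong, so it can help to check that the residue ranges of each chain neither overlap nor leave gaps before starting a long run. The sketch below is one way to do this. The helper names are hypothetical, and treating lines that start with "#" as comments is an assumption based on the listings in this manual. How RANCH itself handles overlapping or missing ranges is not described here.

```python
def parse_assignment(text):
    """Return a list of (chain, first, last, kind, flags) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        chain, first, last, kind, *flags = line.split()
        rows.append((chain, int(first), int(last), kind, flags))
    return rows


def check_ranges(rows, lengths=None):
    """Report residues assigned twice, left unassigned, or missing at the chain end."""
    lengths = lengths or {}
    by_chain = {}
    for chain, first, last, _kind, _flags in rows:
        by_chain.setdefault(chain, []).append((first, last))

    problems = []
    for chain, ranges in by_chain.items():
        expected = 1  # next residue number that should start a segment
        for first, last in sorted(ranges):
            if first < expected:
                problems.append(
                    f"{chain}: residues {first}-{min(last, expected - 1)} assigned twice")
            elif first > expected:
                problems.append(
                    f"{chain}: residues {expected}-{first - 1} unassigned")
            expected = max(expected, last + 1)
        if chain in lengths and expected <= lengths[chain]:
            problems.append(
                f"{chain}: residues {expected}-{lengths[chain]} unassigned")
    return problems


ASSIGNMENT_SS = """\
A 1 10 disordered
A 11 22 helix
A 22 26 disordered
A 27 37 strand
A 38 100 disordered
"""

problems = check_ranges(parse_assignment(ASSIGNMENT_SS), lengths={"A": 100})
print(problems or "ranges are contiguous")
```

Applied to the listing above as printed, the check reports that residue 22 appears both in the helix segment and in the following disordered segment.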
Protein homo-oligomers with user defined coordinates for several domains
Use RANCH to generate a pool of 10000 multi-chain models with an interface defined by input PDB/CIF orientation:
$ ranch --repetitions 10000 --prefix pool/complex_ assignment.txt sequence.seq dom1.pdb dom2a.pdb dom2b.pdb
Example of the assignment file format (assignment.txt):
# assignment.txt
A 1 218 structure fixed
A 219 228 disordered
A 229 387 structure
B 1 218 structure fixed
B 219 228 disordered
B 229 387 structure
In this case a two-domain protein dimerizes via its first domain (dom1.pdb). The two copies of the second domain (dom2a.pdb and dom2b.pdb) are connected to the first domain by unstructured regions (219-228). The interface is defined by the user input coordinate file (dom1.pdb), and this pre-oriented coordinate file is fixed in position. RANCH will allow the unstructured regions to undergo conformational sampling while the interface is maintained.
Instead of using a fixed dimeric interface for the first domain, one may split dom1.pdb into two files and apply distance constraints to define the interface, using the following assignment.txt and distances.txt files:
# assignment.txt
A 1 218 structure
A 219 228 disordered
A 229 387 structure
B 1 218 structure
B 219 228 disordered
B 229 387 structure
# distances.txt
A 140 145 B 140 145 15
In the above case an upper distance limit of 15 angstrom is defined between residues 140-145 of chain A and residues 140-145 of chain B.
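The constraint line above can be read field by field as: chain, first residue, last residue for each of the two regions, followed by the upper limit. The sketch below parses such lines and checks a set of coordinates against them. All names are hypothetical, and the way the distance is measured here (smallest distance between any pair of alpha-carbon positions in the two ranges) is an assumption made for illustration; the exact definition used by RANCH is not given in this section.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceConstraint:
    chain_a: str
    first_a: int
    last_a: int
    chain_b: str
    first_b: int
    last_b: int
    limit: float  # upper distance limit in angstrom


def parse_constraints(text):
    """Parse lines of the form 'A 140 145 B 140 145 15'."""
    constraints = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        ca, fa, la, cb, fb, lb, limit = line.split()
        constraints.append(DistanceConstraint(
            ca, int(fa), int(la), cb, int(fb), int(lb), float(limit)))
    return constraints


def min_distance(constraint, coords):
    """Smallest distance between the two residue ranges.

    `coords` maps (chain, residue_number) -> (x, y, z). Using the minimum
    over alpha-carbon pairs is an assumption, not RANCH's documented rule.
    """
    return min(
        math.dist(coords[(constraint.chain_a, i)], coords[(constraint.chain_b, j)])
        for i in range(constraint.first_a, constraint.last_a + 1)
        for j in range(constraint.first_b, constraint.last_b + 1)
    )


constraints = parse_constraints("# distances.txt\nA 140 145 B 140 145 15\n")

# Toy coordinates: two parallel stretches, 12 angstrom apart.
coords = {}
for res in range(140, 146):
    coords[("A", res)] = (res * 3.8, 0.0, 0.0)
    coords[("B", res)] = (res * 3.8, 12.0, 0.0)

for c in constraints:
    d = min_distance(c, coords)
    print(c, "satisfied" if d <= c.limit else "violated")
```

A check of this kind can be used to confirm that generated models respect the intended interface, provided the distance definition is adjusted to match the one RANCH applies.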