Using correlation profiles of synonymous substitutions to infer recombination rates from large-scale sequencing data in (+)ssRNA viruses
- Install python3 from https://www.python.org/ (we ran into problems when running with the default Python on macOS);
- For basic usage, install mcorr-gene-aln, mcorrViralGenome, and mcorrLDGenome from your terminal:
go install github.com/kussell-lab/viral-mcorr/cmd/mcorr-gene-aln@latest
go install github.com/kussell-lab/viral-mcorr/cmd/mcorrViralGenome@latest
go install github.com/kussell-lab/viral-mcorr/cmd/mcorrLDGenome@latest
cd $HOME/go/src/github.com/kussell-lab/mcorr/cmd/mcorr-viral-fit
pip install $HOME/go/src/github.com/kussell-lab/mcorr/cmd/mcorr-viral-fit
- Install mcorr-viral-fit by cloning this GitHub repository and then using pip to install the program locally:
git clone git@github.com:kussell-lab/viral-mcorr.git
pip install ./
- Add the directories containing the installed programs to your $PATH environment. On Linux, you can do it in your terminal:
On macOS, you can do it as follows:
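Once $PATH has been updated, a quick way to confirm that the shell can find the installed programs is a small Python check like the sketch below. The program names are taken from the installation steps above; the helper name `find_programs` is hypothetical and not part of viral-mcorr.

```python
import shutil

# Program names taken from the installation steps in this README.
PROGRAMS = ["mcorr-gene-aln", "mcorrViralGenome", "mcorrLDGenome", "mcorr-viral-fit"]


def find_programs(names=PROGRAMS):
    """Map each program name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_programs().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Any program reported as not found usually means that the directory holding it has not yet been added to $PATH in the current terminal session.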
We have tested installation on macOS Monterey (with an M1 chip), using Python 3 and Go 1.15 and 1.16.
Basic usage for inferring recombination parameters
The inference of recombination parameters requires two steps:
Calculate Correlation Profile
For multi-fasta alignments of single genes or whole genomes in which there is a single CDS region, use
mcorr-gene-aln <input MFA file> <output prefix>
To calculate correlation profiles across the CDS region of whole-genome alignments (multiple gene alignments), use
mcorrViralGenome <input XMFA file> <output prefix>
The flag --mate-aln allows for the inclusion of a second XMFA file of viral genomes. The flag --between-clades can be used when you have two XMFA files, to calculate correlation profiles exclusively across sequence pairs in which the two sequences come from different XMFA files.
The XMFA files should contain only coding sequences and should not include any redundant CDS regions (i.e., a CDS region that codes for a subregion of another CDS region should be removed from the XMFA file). Gapped regions should be denoted by dashes or Ns. A description of the XMFA format can be found at http://darlinglab.org/mauve/user-guide/files.html. We provide two useful pipelines to generate whole-genome alignments:
- from multiple assemblies: https://github.com/kussell-lab/AssemblyAlignmentGenerator;
- from raw reads: https://github.com/kussell-lab/ReferenceAlignmentGenerator
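Before running mcorrViralGenome, it can help to sanity-check an XMFA file against the requirements above. The following is a minimal sketch, not part of viral-mcorr: it assumes the standard XMFA layout (entries start with a `>` header line, each alignment block ends with a `=` line, and `#` lines are comments), and it assumes that only A, C, G, T, dashes, and Ns are expected and that each coding sequence has a length divisible by three. The function names and the example data are hypothetical.

```python
import io

ALLOWED = set("ACGTN-")  # ASSUMPTION: nucleotides plus the gap symbols named above


def read_xmfa_blocks(handle):
    """Yield one list of (header, sequence) pairs per alignment block."""
    block, header, chunks = [], None, []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(">") or line.startswith("="):
            if header is not None:  # finish the entry that was being read
                block.append((header, "".join(chunks)))
            header, chunks = None, []
            if line.startswith(">"):
                header = line[1:].strip()
            else:  # "=" closes the current block
                if block:
                    yield block
                block = []
        else:
            chunks.append(line)
    if header is not None:  # file ended without a closing "="
        block.append((header, "".join(chunks)))
    if block:
        yield block


def check_block(block):
    """Return a list of human-readable problems found in one block."""
    problems = []
    lengths = {len(seq) for _, seq in block}
    if len(lengths) > 1:
        problems.append(f"unequal sequence lengths: {sorted(lengths)}")
    for header, seq in block:
        bad = set(seq.upper()) - ALLOWED
        if bad:
            problems.append(f"{header}: unexpected characters {sorted(bad)}")
        if len(seq) % 3 != 0:  # ASSUMPTION: CDS alignments are codon-complete
            problems.append(f"{header}: length {len(seq)} is not a multiple of 3")
    return problems


# Hypothetical two-block example; the second block is deliberately malformed.
example = """> 1:1-9 + genomeA.fasta
ATGAAATAA
> 2:1-9 + genomeB.fasta
ATGA--TAA
=
> 1:10-15 + genomeA.fasta
ATGTAA
> 2:10-15 + genomeB.fasta
ATGNNNX
=
"""

for i, blk in enumerate(read_xmfa_blocks(io.StringIO(example)), start=1):
    for problem in check_block(blk):
        print(f"block {i}: {problem}")
```

The check is deliberately strict; if your alignments legitimately contain other IUPAC ambiguity codes, extend `ALLOWED` accordingly.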
All programs will produce two files:
- a .csv file, which stores the calculated Correlation Profile and will be used for fitting in the next step;
- a .json file, which stores the (intermediate) Correlation Profile for each gene.
Fit the Correlation Profile using mcorr-viral-fit
For fitting correlation profiles as described in our paper [link will go here] use
mcorr-viral-fit <.csv file> <output_prefix>
This will produce several files:
- <output_prefix>_zero-recombo_best_fit.svg shows plots of the Correlation Profile, fits, and residuals for the template-switching recombination model and for the zero-recombination case;
- <output_prefix>_comparemodels.csv shows the table of fitted parameters and AIC values for all recombination models (template-switching, fragment-incorporation, and zero-recombination);
- <output_prefix>_zero-recombo_residuals.csv includes residuals for the template-switching model and the zero-recombination case;
- <output_prefix>_template-switch_fit_results.csv shows fit results for the data and bootstrap replicates under the template-switching model (if correlations were analyzed w/
- <output_prefix>_template-switch_fit_report.txt shows fit results and bootstrap CIs if correlations were analyzed w/
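The two steps described above (calculating the correlation profile, then fitting it) can be chained from Python. The sketch below only uses the positional arguments documented in this README. It assumes that the correlation profile is written to `<output prefix>.csv`, which you should verify against your own output; the function names and the `_fit` suffix for the fitting prefix are hypothetical choices made for illustration.

```python
import subprocess


def build_commands(alignment, prefix, whole_genome=True):
    """Return the two command lines: correlation profile first, then fitting."""
    # mcorrViralGenome expects an XMFA file; mcorr-gene-aln expects a multi-FASTA file.
    profiler = "mcorrViralGenome" if whole_genome else "mcorr-gene-aln"
    csv_path = f"{prefix}.csv"  # ASSUMPTION: the profile is saved as <prefix>.csv
    return [
        [profiler, alignment, prefix],
        ["mcorr-viral-fit", csv_path, f"{prefix}_fit"],  # "_fit" is a hypothetical suffix
    ]


def run_pipeline(alignment, prefix, whole_genome=True):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(alignment, prefix, whole_genome):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, `run_pipeline("genomes.xmfa", "my_virus")` (both names hypothetical) would run mcorrViralGenome and then mcorr-viral-fit, provided both programs are on your $PATH.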
Basic usage for measuring correlation coefficients for sites across the genome or genes
To measure correlations at individual codons across the genome, you can use mcorrLDGenome, described in our paper [link will go here]:
mcorrLDGenome <input XMFA file> <output prefix>
XMFA files must be formatted in the same way as described for mcorrViralGenome, above. Alternatively, multi-fasta alignments of single CDS regions can be used.
- How to create alignments of viral genomes for use with viral-mcorr.
- Example workflows for using mcorr-gene-aln, mcorrViralGenome, and mcorrLDGenome