Introduction

SMCSMC, short for Sequential Monte Carlo for the Sequentially Markovian Coalescent, is a method for estimating ancestral population size and migration history from sequence data. It has several advantages over comparable methods, especially when you are interested in analysing complex demographic models.

The method uses a particle filter to sample from the posterior distribution of trees along the sequence, and Variational Bayes to infer epoch-specific demographic parameters over a given number of iterations.
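To give a feel for what a particle filter does, here is a minimal sketch of a generic bootstrap particle filter on a toy one-dimensional random walk. It is not the SMCSMC algorithm, which samples genealogical trees along the sequence; the model, the function name `bootstrap_particle_filter` and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=1.0, obs_sd=1.0, seed=0):
    """Toy bootstrap particle filter for a 1-D Gaussian random walk.

    Illustrative only: returns the weighted-mean estimate of the hidden
    state after each observation.
    """
    rng = random.Random(seed)
    # Start all particles from a broad prior centred on zero.
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the transition model.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight each particle by the likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # Every weight underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        # 3. Estimate the state as the weighted mean of the particles.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


observations = [0.5, 1.0, 1.5, 2.0, 2.5]
estimates = bootstrap_particle_filter(observations)
```

The same propagate, weight, resample cycle underlies particle filters in general; in SMCSMC the "state" carried by each particle is a tree rather than a single number.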

smcsmc takes as input optionally phased sequencing data formatted as segments, and provides utilities for analysing and visualising the inferred ancestral rates.

Installation

We highly recommend installing SMCSMC from conda, as it comes packaged with all necessary dependencies. A separate guide for manual compilation may be found in the developer reference. See here for a helpful guide to installing and using conda to manage programs.

First add both conda-forge and terhorst to your channel list (if they are not there already), then install smcsmc.

conda config --add channels conda-forge
conda config --add channels terhorst

conda install smcsmc

Basic Usage

To use SMCSMC, start a Python session and import the smcsmc module. As part of the installation above, two binaries are installed into the conda bin directory: smcsmc (inference) and scrm (simulation). The Python front end, smcsmc, is a wrapper around these binaries that provides convenient functions for data manipulation, conversion, plotting, and other utilities for the workflow of analysing sequences with SMCSMC. With a test seg file such as this one, the following will run a default session of SMCSMC.

import smcsmc

test_args = {
        'seg': 'test_seg.seg',
        'nsam': 4
}

smcsmc.utils.run_smcsmc(test_args)

Follow the Getting Started guide to become familiar with the basic structure and function of SMCSMC commands, then look at one of the tutorials for analysing simulated or real data. For a more complete guide to arguments, see Input Arguments. Alternatively, the command-line interface can be used with identical results.

smc2 -nsam 4 -seg test_seg.seg
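The two examples above suggest a simple correspondence between the argument dictionary and the command line. The sketch below makes that correspondence explicit with a hypothetical helper, `build_cli_args`, which is not part of smcsmc. It assumes every key maps to a single-dash flag followed by its value, which holds for the two arguments shown here but may not hold for every option; see Input Arguments for the authoritative list.

```python
import shlex


def build_cli_args(args, executable="smc2"):
    """Hypothetical helper: turn an argument dict into command-line tokens.

    Assumes each key becomes a single-dash flag followed by its value,
    as in the example above. Flags outside that pattern are not handled.
    """
    tokens = [executable]
    for key in sorted(args):  # sorted for a reproducible ordering
        tokens.extend([f"-{key}", str(args[key])])
    return tokens


test_args = {"seg": "test_seg.seg", "nsam": 4}
command = build_cli_args(test_args)
print(shlex.join(command))  # prints: smc2 -nsam 4 -seg test_seg.seg
```

Building a token list rather than a single string keeps file names with spaces intact if the list is later passed to something like `subprocess.run`.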

Other Methods

SMCSMC is part of the PopSim consortium, and we are actively involved in building a framework to standardize population genetic analyses. Part of this involves making it easy to run the same analysis with many different methods. We have built smcsmc with this goal in mind. For the latest information about comparisons between different population genetic software, including smc++, stairwayplot, msmc, and dadi/fastcoal, check out the PopSim analysis repository.

Figure (_images/popsim.png): Population history of a European-acting individual inferred from five replicates of the stdpopsim.homo_sapiens.GutenkunstThreePopOutOfAfrica model of human history.

Citation

If you use smcsmc in your work, please cite the following articles:

  1. Henderson, D., Zhu, S. (Joe), & Lunter, G. (2018). Demographic inference using particle filters for continuous Markov jump processes. bioRxiv, 382218. https://doi.org/10.1101/382218
  2. Staab, P. R., Zhu, S., Metzler, D., & Lunter, G. (2015). scrm: efficiently simulating long sequences using the approximated coalescent with recombination. Bioinformatics, 31(10), 1680–1682. https://doi.org/10.1093/bioinformatics/btu861