Input Arguments

This section describes the general structure of smcsmc arguments and provides information on how they are interpreted.

General Structure

smcsmc has a single entry point, smcsmc.run_smcsmc. This function takes a single main argument: a dict of input arguments.

Input arguments are always passed as a dict, and each requires both a key and a value. The key is always the name of the argument, and the value is generally the value of the argument. In some cases (boolean and repeated arguments, described under Processing of Arguments) the value is handled differently, and it is important to understand when this is the case. All values should be given as strings unless otherwise noted. This is simply a convenience that avoids complicated post-processing.

import smcsmc

args = {
   'seg':    'test_seg.seg',
   'nsam':   '4',
}

smcsmc.run_smcsmc(args)

The arguments given entirely determine the smcsmc inference.

Processing of Arguments

There are three main kinds of arguments to smcsmc.

  • Key-value pairings: These are the most common arguments, and require both a key and a value.
    • e.g. {'chunk': '100'}
  • Boolean arguments: To input a boolean, give the name of the argument as the key and pass an empty string as its value. This will be picked up in processing and treated correctly.
    • e.g. {'no-infer-recomb': ''}
  • Multiple arguments of the same name: Some arguments to smcsmc are passed directly to SCRM and act as pseudo-ms code. In this case, pass the arguments that share a name as a single list, and smcsmc will process them into the correct number of identically named arguments.
    • e.g. {'eM': ['0.0092 10', '0 5']}
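
The three kinds of argument can be illustrated with a small sketch of how such a dict might be expanded into a flat, command-line style list of tokens. The helper flatten_args below is hypothetical and is not part of smcsmc; the leading dash on each flag is likewise an assumption made for illustration, and smcsmc's own processing may differ.

```python
def flatten_args(args):
    """Expand an argument dict into a flat list of tokens.

    Hypothetical helper for illustration only; not part of smcsmc.
    """
    tokens = []
    for key, value in args.items():
        if isinstance(value, (list, tuple)):
            # Multiple arguments of the same name: repeat the flag once per entry.
            for item in value:
                tokens.append("-" + key)
                tokens.extend(item.split())
        elif value == "":
            # Boolean argument: the flag alone, with no value.
            tokens.append("-" + key)
        else:
            # Ordinary key-value pairing.
            tokens.append("-" + key)
            tokens.extend(value.split())
    return tokens


example = {
    "chunk": "100",
    "no-infer-recomb": "",
    "eM": ["0.0092 10", "0 5"],
}
print(flatten_args(example))
```

Each entry of the 'eM' list becomes its own flag, which is why repeated arguments are written as a list rather than as duplicate dict keys (a dict cannot hold the same key twice).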

Todo

This is not yet implemented.

smcsmc will also remove entirely any argument passed with None as its value. This is a convenience for reusing input.
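
If None handling is not available in the version you are using, the same effect can be obtained by filtering the dict yourself before passing it on. The following is a minimal sketch; drop_none is a hypothetical helper and not an smcsmc function.

```python
def drop_none(args):
    """Return a copy of args without the entries whose value is None.

    Hypothetical helper for illustration only; not part of smcsmc.
    """
    return {key: value for key, value in args.items() if value is not None}


base = {"seg": "test_seg.seg", "nsam": "4", "no-infer-recomb": ""}

# Reuse the base arguments, switching one option off by setting it to None.
variant = {**base, "no-infer-recomb": None}
cleaned = drop_none(variant)
print(cleaned)
```

The base dict is left untouched, so it can be reused for further variants.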

Arguments

Arguments marked [*] are required. Arguments marked [+] are individually optional, but one from the group is required.

The following are arguments that define properties of the population you are studying. They are fixed and do not change.

Key      Value       Description
nsam     n           [*] Set the total number of samples to n
N0       N           [+] Set (unscaled) initial population size to N
mu       s           [+] Set mutation rate (per nucleotide per generation)
t        th          [+] Set mutation rate (expected number of mutations) for the locus (= 4 N0 mu L)
length   L           [+] Set locus length (nucleotides)
I        n s1..sn    Use an n-population island model with si individuals sampled
ej       t i j       Speciation event at t*4N0; creates population i from population j in forward direction
eI       t s1..sn    Sample s1..sn individuals from their populations at time t*4N0 generations
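
The relationship between the scaled and unscaled quantities in the table above can be checked with a short calculation. The numbers below are illustrative assumptions, not recommended values, and the variable names are chosen only for this sketch.

```python
N0 = 10000        # unscaled initial population size (assumed value)
mu = 1.25e-8      # mutation rate per nucleotide per generation (assumed value)
L = 1_000_000     # locus length in nucleotides (assumed value)

# Scaled mutation rate for the whole locus, as in the 't' row: 4 * N0 * mu * L.
theta = 4 * N0 * mu * L

# Event times are expressed in units of 4*N0 generations, so a time given
# in generations is divided by 4*N0 before being used in e.g. 'ej' or 'eI'.
generations = 20000
t_scaled = generations / (4 * N0)

# Argument values are passed as strings.
theta_str = f"{theta:g}"
t_scaled_str = f"{t_scaled:g}"
print(theta_str, t_scaled_str)
```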

The following arguments describe initial values which will be inferred and updated during runtime.

Key   Value             Description
rho   rho               [+] Set recombination rate (per nucleotide per generation)
r     r L               [+] Set initial per locus recombination rate (= 4 N0 L rho) and locus length (L)
eN    t n               Change the size of all populations to n*N0 at time t*4N0
en    t i n             Change the size of population i to n*N0 at time t*4N0
eM    t m               Change the symmetric backward migration rate to m/(npop-1) at time t*4N0
em    t i j m           Change the backward migration rate from population i to population j to m/(npop-1) at time t*4N0
ema   t s11 s12 …       Set backward migration rate matrix at time t*4N0
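
As a worked example of the scaling used in the table above, the sketch below converts an unscaled recombination rate and a list of population-size changes into the string values that the 'r' and 'eN' keys expect. All numbers are illustrative assumptions, and the two 'eN' entries are passed as a list because they share a name.

```python
N0 = 10000       # unscaled initial population size (assumed value)
rho = 1e-8       # recombination rate per nucleotide per generation (assumed value)
L = 1_000_000    # locus length in nucleotides (assumed value)

# Per-locus recombination rate, as in the 'r' row: 4 * N0 * L * rho.
r = 4 * N0 * L * rho

# Population-size changes as (generations ago, absolute size) pairs (assumed values).
size_changes = [(4000, 5000), (40000, 20000)]

# 'eN t n' takes t in units of 4*N0 generations and n as a multiple of N0.
eN = [f"{gens / (4 * N0):g} {size / N0:g}" for gens, size in size_changes]

args = {
    "r": f"{r:g} {L}",   # rate followed by locus length
    "eN": eN,            # repeated argument, one list entry per event
}
print(args)
```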

The following arguments define inference-related options.

Key               Value        Description
o                 f            [*] Output prefix
seg               f            [+] Input .seg file
segs              f1 f2 …      [+] Input .seg files (will be merged into a single .seg file)
maxgap            n            Split .seg files over gaps larger than maxgap (200 kb)
minseg            n            After splitting ignore segments shorter than minseg (500 kb)
startpos          x            First locus to process (1)
P                 s e p        Divide time interval [s - e] (generations; s>0) equally on log scale using pattern p (e.g. 1*2+8*1)
Np                n            Number of particles
seed              s            Random number seed
calibrate_lag     s            Accumulate inferred events with a lag of s times the survival time (2)
apf               b            Auxiliary particle filter: none (0), singletons (1), cherries (2)
dephase                        Dephase heterozygous sites (but use phasing for -apf)
ancestral_aware                Assume that haplotype 0 is ancestral
bias_heights      t0..tn       Set recombination bias times to t0..tn * 4N0
bias_strengths    s1..sn       Set recombination bias strengths
arg               range        Sample posterior ARG at given epoch or epoch range (0-based closed; e.g. 0-10)
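
To make the P row more concrete, the sketch below computes epoch boundaries for a pattern such as 1*2+8*1. The interpretation used here is an assumption that the text above does not confirm: each term k*m is read as k epochs that each span m elementary intervals, with the elementary intervals dividing [s, e] equally on a log scale. The function epoch_boundaries is hypothetical and not part of smcsmc.

```python
import math


def epoch_boundaries(start, end, pattern):
    """Return epoch boundary times (generations) for a pattern like '1*2+8*1'.

    Assumed semantics: each term 'k*m' stands for k epochs that each span
    m elementary intervals, and the elementary intervals divide
    [start, end] equally on a log scale. Illustration only.
    """
    terms = [tuple(int(x) for x in term.split("*")) for term in pattern.split("+")]
    n_intervals = sum(k * m for k, m in terms)

    # Elementary grid: n_intervals + 1 points, equally spaced in log time.
    log_s, log_e = math.log(start), math.log(end)
    grid = [math.exp(log_s + (log_e - log_s) * i / n_intervals)
            for i in range(n_intervals + 1)]

    # Walk the grid, closing an epoch every m elementary intervals.
    boundaries = [grid[0]]
    index = 0
    for k, m in terms:
        for _ in range(k):
            index += m
            boundaries.append(grid[index])
    return boundaries


bounds = epoch_boundaries(100, 100000, "1*2+8*1")
print([round(b, 1) for b in bounds])
```

Under this reading, the pattern 1*2+8*1 yields nine epochs: one wide epoch covering the first two elementary intervals, followed by eight epochs of one interval each.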

These arguments define the behaviour of the parameter updates via stochastic EM or Variational Bayes.

Key               Value   Description
EM                n       Number of EM (or VB) iterations minus 1 (0)
VB                        Use Variational Bayes rather than EM (uniform prior for all rates)
cap               n       Set (unscaled) upper bound on effective population size
chunks            n       Number of chunks computed in parallel (1)
no_infer_recomb           Do not infer recombination rate
no_m_step                 Do not update parameters (but do infer recombination guide)
alpha             t       Fraction of posterior recombination to mix in to recombination guide (0.0); negative removes files

These are general options.