Input Arguments
This section describes the general structure of smcsmc arguments and explains how they are interpreted.
General Structure
The user has a single entry point into smcsmc, smcsmc.run_smcsmc. This function takes a single main argument.
Input arguments are always formatted into a dict and require both a key and a value. The key is always the name of the argument, and the value is generally the value of the argument. In some cases they differ, and it is important to understand when this is the case. All values should be given as strings unless otherwise noted. This is simply a convenience to avoid complicated post processing.
import smcsmc

# Every value is a string; entries are separated by commas.
args = {
    'seg': 'test_seg.seg',
    'nsam': '4',
}
smcsmc.run_smcsmc(args)
The arguments given entirely determine the smcsmc inference.
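Because values are expected as strings, it can help to convert a dict of native Python values before passing it on. The following is a minimal sketch of such a conversion; `stringify_args` is a hypothetical helper written for this example and is not part of smcsmc, and the `Np` value is chosen only for illustration.

```python
def stringify_args(raw):
    """Return a copy of raw with every value converted to a string.

    Hypothetical helper, not part of smcsmc. Lists are converted element
    by element; None and existing strings are passed through unchanged.
    """
    out = {}
    for key, value in raw.items():
        if value is None or isinstance(value, str):
            out[key] = value
        elif isinstance(value, (list, tuple)):
            out[key] = [str(item) for item in value]
        else:
            out[key] = str(value)
    return out


args = stringify_args({'seg': 'test_seg.seg', 'nsam': 4, 'Np': 1000})
print(args)
```

The helper does not try to handle Python booleans, since flag-style arguments are written as an empty string rather than as True or False.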
Processing of Arguments
There are three main kinds of arguments to smcsmc.
- Key value pairings: These are the most common arguments, and require both a key and a value.
  - e.g. {'chunk': '100'}
- Boolean arguments: To input a boolean, give the name of the argument as the key and pass an empty string as its value. This is picked up in processing and handled as a flag.
  - e.g. {'no-infer-recomb': ''}
- Multiple arguments of the same name: Some arguments to smcsmc are passed directly to SCRM and act as pseudo-ms code. In this case, pass the arguments sharing a name as a list and smcsmc will process them into the correct number of identically named arguments.
  - e.g. {'eM': ['0.0092 10', '0 5']}
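The three kinds of arguments can be pictured as being flattened into a list of command-line style tokens. The sketch below illustrates that idea only; `to_tokens` is a hypothetical function, the single-dash flag prefix is an assumption, and none of this is taken from the smcsmc source.

```python
def to_tokens(args):
    """Flatten an argument dict into command-line style tokens.

    Illustrative only: the function name and the '-' prefix are
    assumptions, not smcsmc's actual processing code.
    """
    tokens = []
    for key, value in args.items():
        flag = '-' + key
        if isinstance(value, (list, tuple)):
            # Repeated argument: emit the flag once per list entry.
            for item in value:
                tokens.append(flag)
                tokens.extend(item.split())
        elif value == '':
            # Boolean argument: the flag alone, with no value.
            tokens.append(flag)
        else:
            # Ordinary key-value pair; a value may hold several fields.
            tokens.append(flag)
            tokens.extend(value.split())
    return tokens


example = {
    'chunk': '100',
    'no-infer-recomb': '',
    'eM': ['0.0092 10', '0 5'],
}
tokens = to_tokens(example)
print(tokens)
```

Here the two `eM` entries become two separate `-eM` options, each followed by its own fields.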
Todo
This is not yet implemented.
smcsmc will also understand that an argument passed with None as a value should be removed entirely. This is a convenience for reusing input.
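The reuse pattern described here can also be reproduced in plain Python before the dict is handed over, which does not depend on how smcsmc itself treats None. This is a minimal sketch; the `base` and `override` dicts and their contents are hypothetical.

```python
# A base configuration shared between runs (values are illustrative).
base = {'seg': 'test_seg.seg', 'nsam': '4', 'no_infer_recomb': ''}

# For one run, drop the flag by setting it to None and add a new option.
override = {'no_infer_recomb': None, 'Np': '1000'}

# Later entries win in the merge; None-valued keys are then removed.
merged = {**base, **override}
cleaned = {key: value for key, value in merged.items() if value is not None}
print(cleaned)
```

The `base` dict is left unchanged, so it can be combined with a different `override` for the next run.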
Arguments
[*] arguments are required. [+] arguments are optional, but one amongst the group is required.
The following are arguments that define properties of the population you are studying. They are fixed and do not change.
Key | Value | Description |
---|---|---|
nsam | n | [*] Set the total number of samples to n |
N0 | N | [+] Set (unscaled) initial population size to N |
mu | s | [+] Set mutation rate (per nucleotide per generation) |
t | th | [+] Set mutation rate (expected number of mutations) for the locus (=4 N0 mu L ) |
length | L | [+] Set locus length (nucleotides) |
I | n s1..sn | Use an n-population island model with si individuals sampled |
ej | t i j | Speciation event at t*4N0; creates population i from population j in forward direction |
eI | t s1..sn | Sample s1..sn individuals from their populations at time t*4N0 generations |
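The table relates the per-locus mutation rate to the other quantities (t = 4 N0 mu L) and expresses event times in units of 4N0 generations. A short worked example of both conversions follows; the numeric values are hypothetical and chosen only to make the arithmetic easy to follow, and `scaled_time_to_generations` is a helper invented for this sketch.

```python
# Hypothetical values, for illustration only.
N0 = 10_000          # unscaled initial population size
mu = 1.25e-8         # mutation rate per nucleotide per generation
length = 1_000_000   # locus length in nucleotides

# Per-locus mutation rate as defined for the 't' argument: 4 * N0 * mu * L.
theta = 4 * N0 * mu * length


def scaled_time_to_generations(t, n0):
    """Convert a time in units of 4*N0 generations into generations."""
    return t * 4 * n0


# A scaled time of 0.1 (as used by ej or eI) expressed in generations.
generations = scaled_time_to_generations(0.1, N0)
print(theta, generations)
```

With these values theta is about 500 and a scaled time of 0.1 corresponds to about 4000 generations.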
The following arguments describe initial values which will be inferred and updated during runtime.
Key | Value | Description |
---|---|---|
rho | rho | [+] Set recombination rate (per nucleotide per generation) |
r | r L | [+] Set initial per locus recombination rate (=4 N0 L rho) and locus length (L) |
eN | t n | Change the size of all populations to n*N0 at time t*4N0 |
en | t i n | Change the size of population i to n*N0 at time t*4N0 |
eM | t m | Change the symmetric backward migration rate to m/(npop-1) at time t*4N0 |
em | t i j m | Change the backward migration rate from population i to population j to m/(npop-1) at time t*4N0 |
ema | t s11 s12 … | Set backward migration rate matrix at time t*4N0 |
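The scaling conventions in this table can be checked with a little arithmetic. The sketch below computes the per-locus recombination rate (r = 4 N0 L rho) and interprets an eN value of the form 't n'. All numbers are hypothetical, and `describe_eN` is a helper made up for this example rather than an smcsmc function.

```python
# Hypothetical values, for illustration only.
N0 = 10_000          # unscaled initial population size
rho = 1e-8           # recombination rate per nucleotide per generation
length = 1_000_000   # locus length in nucleotides

# Per-locus recombination rate as defined for the 'r' argument.
r = 4 * N0 * length * rho

# The 'r' argument takes two fields, the rate and the locus length.
r_value = f"{r:g} {length}"


def describe_eN(value, n0):
    """Split an eN value 't n' into (generations, population size)."""
    t, n = (float(field) for field in value.split())
    return t * 4 * n0, n * n0


when, size = describe_eN('0.1 2', N0)
print(r_value, when, size)
```

Under these assumptions, the eN value '0.1 2' describes a change to roughly 20000 individuals at roughly 4000 generations.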
The following arguments define inference related options.
Key | Value | Description |
---|---|---|
o | f | [*] Output prefix |
seg | f | [+] Input .seg file |
segs | f1 f2 … | [+] Input .seg files (will be merged into a single .seg file) |
maxgap | n | Split .seg files over gaps larger than maxgap (200 kb) |
minseg | n | After splitting ignore segments shorter than minseg (500 kb) |
startpos | x | First locus to process (1) |
P | s e p | Divide time interval [s - e] (generations; s>0) equally on log scale using pattern p (e.g. 1*2+8*1) |
Np | n | Number of particles |
seed | s | Random number seed |
calibrate_lag | s | Accumulate inferred events with a lag of s times the survival time (2) |
apf | b | Auxiliary particle filter: none (0) singletons (1) cherries (2) |
dephase | | Dephase heterozygous sites (but use phasing for -apf) |
ancestral_aware | | Assume that haplotype 0 is ancestral |
bias_heights | t0..tn | Set recombination bias times to h0..hn * 4N0 |
bias_strengths | s1..sn | Set recombination bias strengths |
arg | range | Sample posterior ARG at given epoch or epoch range (0-based closed; e.g. 0-10) |
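The P argument divides a time interval equally on a log scale according to a pattern such as 1*2+8*1. The sketch below shows one plausible reading, in which each term count*width contributes count epochs that each span width equal log-scale steps. That reading is borrowed from MSMC-style pattern strings and is an assumption; the parser in smcsmc may interpret the pattern differently. The function names and the interval [100, 100000] are hypothetical.

```python
import math


def expand_pattern(pattern):
    """Expand 'count*width' terms joined by '+' into a list of epoch widths.

    Assumed reading (not confirmed for smcsmc): '1*2+8*1' means one epoch
    spanning 2 steps followed by eight epochs spanning 1 step each.
    """
    widths = []
    for term in pattern.split('+'):
        count, width = (int(field) for field in term.split('*'))
        widths.extend([width] * count)
    return widths


def epoch_boundaries(start, end, pattern):
    """Return epoch boundaries between start and end, spaced on a log scale."""
    widths = expand_pattern(pattern)
    log_step = (math.log(end) - math.log(start)) / sum(widths)
    boundaries = [start]
    position = 0
    for width in widths:
        position += width
        boundaries.append(start * math.exp(position * log_step))
    return boundaries


widths = expand_pattern('1*2+8*1')
boundaries = epoch_boundaries(100, 100000, '1*2+8*1')
print(widths)
print([round(b) for b in boundaries])
```

Under this reading the example pattern yields nine epochs over ten log-scale steps, with the first epoch twice as wide as the others on the log scale.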
These arguments define the behaviour of the parameter updates via stochastic EM or Variational Bayes.
Key | Value | Description |
---|---|---|
EM | n | Number of EM (or VB) iterations minus 1 (0) |
VB | | Use Variational Bayes rather than EM (uniform prior for all rates) |
cap | n | Set (unscaled) upper bound on effective population size |
chunks | n | Number of chunks computed in parallel (1) |
no_infer_recomb | | Do not infer recombination rate |
no_m_step | | Do not update parameters (but do infer recombination guide) |
alpha | t | Fraction of posterior recombination to mix in to recombination guide (0.0); negative removes files |
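Before starting a long run it can be useful to check an argument dict against the markers used in these tables. The sketch below checks the two [*] keys (nsam and o), treats seg and segs as one [+] group, and confirms that values are strings. `check_args` is a hypothetical helper, the grouping of seg and segs is this example's reading of the tables, and the other [+] groups are left out because their membership is not spelled out here.

```python
REQUIRED = ('nsam', 'o')        # keys marked [*] in the tables
ONE_OF = (('seg', 'segs'),)     # assumed [+] group: at least one is needed


def check_args(args):
    """Return a list of problems found in args (empty if none).

    Hypothetical helper, not part of smcsmc.
    """
    problems = []
    for key in REQUIRED:
        if args.get(key) is None:
            problems.append(f"missing required argument '{key}'")
    for group in ONE_OF:
        if not any(args.get(key) is not None for key in group):
            problems.append("need one of: " + ", ".join(group))
    for key, value in args.items():
        items = value if isinstance(value, (list, tuple)) else [value]
        for item in items:
            if item is not None and not isinstance(item, str):
                problems.append(f"value for '{key}' should be a string")
    return problems


args = {'nsam': '4', 'seg': 'test_seg.seg', 'Np': '1000', 'EM': '5', 'VB': ''}
problems = check_args(args)
print(problems)
```

In this example the only problem reported is the missing output prefix o.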
These are general options.