Simons Global Diversity Panel¶
In this tutorial we will use smcsmc
to analyse the population history of a population from the Simons Global Diversity Panel.
We very arbitrarily chose to analyse the population history of the Mbuti, a South-Central African group with an ancient divergence from the rest of the continent.
Downloading and Converting Data¶
We will only download one chromosome of data for the sake of this tutorial.
First, make a new folder to hold your analysis of the Mbuti:
mkdir smcsmc-mbuti
cd smcsmc-mbuti
We will download a small chromosome from the phased release of the SGDP provided by the Reich Lab and save it as sgdp_phased_chr21.vcf.gz
:
wget https://sharehost.hms.harvard.edu/genetics/reich_lab/sgdp/phased_data/PS2_multisample_public/cteam_extended.v4.PS2_phase.public.chr21.vcf.gz -O sgdp_phased_chr21.vcf.gz
We then need to convert this sample from vcf
file format to the segments which smcsmc
requires. We can use smcsmc.vcf_to_seg()
.
import smcsmc
smcsmc.vcf_to_seg(
[('sgdp_phased_chr21.vcf.gz', 'S_Mbuti-1')],
output = 'mbuti_chr21.seg.gz',
chroms = [21])
Warning
This can take a few minutes.
Now we have a list of all the segments in the Mbuti individual.
Running the Model¶
We are going to follow a very similar procedure as the Getting Started guide:
import os
smcsmcpath = os.path.expandvars('${CONDA_PREFIX}/bin/smcsmc')
args = {
'EM': '15',
'Np': '10000',
# Submission Parameters
'chunks': '100',
'no_infer_recomb': '',
'smcsmcpath': smcsmcpath,
# Other inference parameters
'mu': '1.25e-9',
'N0': '14312',
'rho': '3e-9',
'calibrate_lag': '1.0',
'tmax': '3.5',
'alpha': '0',
'apf': '2',
'P': '133 133016 31*1',
'VB': '',
'nsam': '2',
# Files
'o': 'run',
'seg': 'mbuti_chr21.seg.gz'
}
If you are able, add in the c
flag to run on a cluster compute system.
Then simply run the model.
smcsmc.run_smcsmc(args)
Todo
Run this model and include a plot of the results.