This is an announcement of the new generation of Dodecad ancestry analysis. Compared with the standard K=10 analysis used since the beginning of the Project:
- Participants' data are now used to enrich the set of reference populations and to help define new ancestral components
- Rather than choosing arbitrary reference populations, I employ a very large set of individuals to capture allele frequencies and then create synthetic individuals ("panmictic zombies") that embody these frequencies; more on this below.
- Results for unrelated project participants will be reported in a separate post, using my new technique of converting unsupervised ADMIXTURE runs into supervised ones. Hence, Project participants can expect to receive new K=12 results; moreover, the fact that this will be done in supervised mode means that it is no longer necessary to process samples in small batches of 10 or so. All current unrelated participants will receive their results in one go, and only future submissions will be processed in batches.
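To make the "panmictic zombie" idea a bit more concrete, here is a minimal sketch of one way such synthetic individuals could be generated. It assumes genotypes coded as 0/1/2 copies of a counted allele (with None for missing calls) and draws the two alleles at each SNP independently from the pooled frequency. The function names and the toy data are hypothetical, and this is not necessarily the exact procedure used for the Project.

```python
import random

def allele_frequencies(genotypes):
    """Per-SNP frequency of the counted allele from 0/1/2 genotype rows.

    None marks a missing call and is skipped. A SNP with no calls at all
    gets frequency 0.0 (an arbitrary choice made for this sketch).
    """
    n_snps = len(genotypes[0])
    freqs = []
    for j in range(n_snps):
        calls = [row[j] for row in genotypes if row[j] is not None]
        freqs.append(sum(calls) / (2 * len(calls)) if calls else 0.0)
    return freqs

def make_zombies(freqs, n_individuals, rng):
    """Draw synthetic diploid genotypes, two independent allele draws per SNP.

    Assumption: SNPs are sampled independently of each other, so the
    synthetic individuals carry the pooled frequencies but no linkage.
    """
    return [
        [(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(n_individuals)
    ]

# Hypothetical toy reference panel: 4 individuals x 4 SNPs.
reference = [
    [0, 1, 2, 2],
    [0, 2, 2, 1],
    [0, 1, 2, None],
    [0, 0, 2, 1],
]

freqs = allele_frequencies(reference)
zombies = make_zombies(freqs, n_individuals=5, rng=random.Random(0))
```

Because each zombie is drawn from the pooled frequencies, a handful of them can stand in for a much larger set of real individuals when framing an analysis.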
This analysis uses results from Project participants (populations with _D endings), as well as synthetic individuals summarizing allele frequencies of East Eurasians, Sub-Saharan Africans, and South Indians (populations with _Z endings).
The framing populations (_Z)
The following _Z populations were included:
- Sub_Saharan_Z: Bantu, Yoruba, Mandenka, San, and Pygmies from HGDP-CEPH
- South_Indian_Z: North Kannadi, Sakilli from Behar et al. (2010), AP_Madiga, AP_Mala, TN_Dalit from Xing et al. (2010), Bhil, Chenchu, Kurumba, Satnami, Madiga, Mala, Kamsali, Onge, Great_Andamanese from Reich et al. (2009)
- Sino_Tibetan_Z: Yizu, Naxi, Han, Tujia from HGDP-CEPH
- Altaic_Z: Tu, Xibo, Mongola, Daur, Hezhen, Oroqen, Yakut from HGDP-CEPH, and Evenk, Buryat from Rasmussen et al. (2010)
- Siberian_Other_Z: Selkup, Ket, Yukagir, Nganasan, Koryak, Chuckchi from Rasmussen et al. (2010)
- Southeast_Asian_Z: Dai, Lahu, Miaozu, Cambodians from HGDP-CEPH, Khmer-Cambodian, Thai from Xing et al. (2010), and Singapore Malay from the Singapore Genome Variation Project
Results of the ADMIXTURE analysis defining the new K=12 components of the Project can be seen below:
Raw proportions can be found in a spreadsheet. There are also population portraits in a zip file, showing individual-level variation.
The 12 components are:
The Fst divergences between the 12 components can be seen in the spreadsheet and also below:
A different way of showing them is via a neighbor-joining tree. Note, however, that the tree is not a replacement for the Fst table above, which alone fully preserves the inter-population relationships:
We can also plot the first few MDS dimensions using synthetic individuals from the 12 components; again, these capture variation only partially:
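As a rough illustration of what an Fst divergence between two components measures, the sketch below computes a simple Wright-style Fst from per-SNP allele frequencies, weighting the two populations equally and taking a ratio of sums across SNPs. This is an assumption made for illustration: the estimator behind the reported table may differ, and the component names and frequencies here are hypothetical.

```python
def fst(p1, p2):
    """Wright-style Fst between two populations from per-SNP allele frequencies.

    Fst = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean within-population heterozygosity,
    both summed over SNPs before taking the ratio.
    """
    h_total = 0.0
    h_within = 0.0
    for a, b in zip(p1, p2):
        mean = (a + b) / 2
        h_total += 2 * mean * (1 - mean)
        h_within += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
    # If no SNP varies in the pooled population, the ratio is undefined; return 0.0.
    return (h_total - h_within) / h_total if h_total > 0 else 0.0

def fst_matrix(components):
    """Pairwise Fst for a dict mapping component name to its frequency list."""
    names = list(components)
    return {(x, y): fst(components[x], components[y]) for x in names for y in names}

# Hypothetical allele frequencies for three made-up components.
components = {
    "A": [0.10, 0.80, 0.50],
    "B": [0.15, 0.75, 0.55],
    "C": [0.90, 0.10, 0.20],
}
matrix = fst_matrix(components)
```

In this toy setup, the pair with similar frequencies (A and B) comes out far less diverged than either does from C. A full pairwise table like this carries more information than any tree or low-dimensional plot drawn from it.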
What comes next?
Hopefully quite soon, I will:
- Report new v2 results for all project participants
- Report new v2 proportions for many other populations not included here