My idea of using zombies with ADMIXTURE is the gift that keeps on giving. Remember that "zombies" are synthetic individuals created from ADMIXTURE output, representing the K inferred ancestral components. They can be viewed as hypothetical ancestral individuals representing each of these K components without any admixture from any of the others.
An interesting problem that often comes up is to compare across different ADMIXTURE runs. I can think of at least three different applications of this:
- To compare components across different K; for example, how does a "West Asian"-centered component at K=5 differ from a similarly-centered component at K=12?
- To compare components across different datasets; for example, how does a "West Asian"-centered component inferred from an existing dataset (e.g., the current Dodecad v3) differ from a "West Asian"-centered one from a new dataset (e.g., the upcoming Dodecad v4, which will also be trained on the valuable new populations of Yunusbayev et al. 2011)
- To compare components across different projects; there has been a proliferation of different ancestry projects since the launching of Dodecad nearly a year ago, and since all of them slightly different individuals/SNPs/terminology, it is quite useful to be able to gauge how one component from one project maps onto other components in other projects.
- Scandinavian
- Volga_Region
- Altaic
- Celto_Germanic
- Caucassian_Anatolian_Balkanic
- Balto_Slavic
- North_Atlantic
I then inferred the ancestry of the MDLP zombies using Dodecad v3, and vice versa. Since Dodecad v3 also includes populations (e.g., Africans) not considered by MDLP, I did not try to map those onto MDLP.
I will comment on the MDLP-to-dv3 mapping:
- The MDLP "Scandinavian" component appears to be West/East European with a little Mediterranean and a little Northeast Asian
- The MDLP "Volga_Region" component appears to be East European with some Northeast Asian
- The MDLP "Altaic" component is West Asian+Northeast Asian+Southeast Asian. Note that in Dodecad v3, the Northeast Asian component peaks at Chukchi, Nganasan, and Koryak, and most other east Eurasian populations have much less of it
- The MDLP "Celto-Germanic" component is (surprisingly) Mediterranean-dominated. One possible interpretation is that in the context of MDLP this captures one aspect of the difference between Southwestern and Northeastern Europe -higher Mediterranean in the former-, whereas the...
- ... MDLP "North-Atlantic" component seems to be entirely West European, and is capturing a different aspect of east-west variation in Europe.
- The MDLP "Balto-Slavic" appears the reverse of the "Celto-Germanic" with lower Mediterranean and reversed East/West European
- Finally, the MDLP "Caucassian_Anatolian_Balkanic" component is predictably mainly West Asian, but with a little Mediterranean and Southwest Asian as well
For example, the first couple of dimensions are dominated by the African/Asian components of Dodecad v3 that are not present in MDLP. Notice, however, the position of "Altaic", right where one might expect to find it between West and East Eurasians.
Limiting ourselves to only European populations, we obtain:
It appears that the "North_Atlantic" component may be centered on a small number of related individuals.
I encourage other genome bloggers to try their own hand at comparing their components with those of other projects, or even their own. This process will be made possible if people using ADMIXTURE follow the simple instructions to convert their output for use with DIYDodecad.
Once Dodecad v4 is off the ground, and if I find time to fully automate the process, I will perhaps try to map all my past calculators (i.e., the initial K=10, Dodecad v3, 'bat', 'euro7', 'weac', 'africa9') onto the new golden standard of the Project.
PS: This analysis was done on ~63k SNPs in common between MDLP and Dodecad v3