The design of Dodecad v3

Dodecad v2 was short-lived, as I discovered a way to improve it shortly after I announced it.

The first step was an extensive K=3 ADMIXTURE analysis of about 2,000 individuals from about 130 populations in Europe, Asia, and Africa. From the allele frequencies inferred in this analysis, I created synthetic individuals representing West Eurasians, Asians, and Sub-Saharan Africans, each built from this unusually broad sample.
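
ADMIXTURE numbers its K components arbitrarily. Before building synthetic individuals from a K=3 run, one has to work out which column of the ancestry (.Q) and allele-frequency (.P) outputs corresponds to which group. The sketch below shows one hypothetical way to do this; it is not necessarily the procedure used here. It assigns each component to the reference group whose members carry the most of it on average. The function name `label_components` and the group labels are illustrative assumptions.

```python
import numpy as np


def label_components(q: np.ndarray, groups: list[str]) -> dict[int, str]:
    """Map each ADMIXTURE component (column of q) to the group with the
    highest mean ancestry in that component.

    q: individuals x K matrix of ancestry fractions (rows sum to 1)
    groups: group label for each individual, in the same order as q's rows
    """
    labels = np.array(groups)
    unique = sorted(set(groups))
    # Mean ancestry profile of each group: len(unique) x K
    means = np.array([q[labels == g].mean(axis=0) for g in unique])
    # For every component, pick the group that carries the most of it
    return {k: unique[int(np.argmax(means[:, k]))] for k in range(q.shape[1])}


# Toy ancestry matrix standing in for a real .Q file
q = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.10, 0.80],
              [0.05, 0.90, 0.05]])
print(label_components(q, ["West Eurasian", "Sub-Saharan", "Asian"]))
# {0: 'West Eurasian', 1: 'Asian', 2: 'Sub-Saharan'}
```

With real data, `q` could be read with `np.loadtxt` from the .Q file that ADMIXTURE writes. Note that this mapping does not enforce a one-to-one correspondence, so two components that peak in the same group would need inspecting by hand.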

Next, I analyzed East Eurasian populations using the West Eurasian and Sub-Saharan synthetic individuals as controls. I also analyzed Sub-Saharan populations using the West Eurasian and Asian synthetic individuals as controls.
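
The control design can be pictured as stacking genotype matrices. Real individuals from the target region sit on top, and synthetic individuals standing in for the other ancestral components sit below. The sketch below assumes genotypes coded 0/1/2 with SNPs in the same order in every matrix. It also emits one label per row in the layout of ADMIXTURE's .pop file for its `--supervised` mode, which uses the control name for known-ancestry rows and `-` for individuals of unknown ancestry. The source does not say whether these runs were supervised or simply included the controls in an unsupervised run, so that part is an assumption. `stack_with_controls` is a hypothetical helper.

```python
import numpy as np


def stack_with_controls(target: np.ndarray,
                        controls: dict[str, np.ndarray]) -> tuple[np.ndarray, list[str]]:
    """Stack target genotypes (individuals x SNPs, coded 0/1/2) on top of
    synthetic control genotypes.

    Returns the combined matrix and one label per row: '-' for target
    individuals (ancestry to be inferred) and the control name for each
    synthetic individual.
    """
    blocks = [target]
    labels = ["-"] * target.shape[0]
    for name, geno in controls.items():
        # All matrices must share the same SNP set and order
        if geno.shape[1] != target.shape[1]:
            raise ValueError(f"{name}: {geno.shape[1]} SNPs, expected {target.shape[1]}")
        blocks.append(geno)
        labels.extend([name] * geno.shape[0])
    return np.vstack(blocks), labels


rng = np.random.default_rng(1)
east = rng.integers(0, 3, size=(4, 10))  # stand-in East Eurasian samples
ctrl = {"WestEurasian": rng.integers(0, 3, size=(2, 10)),
        "SubSaharan": rng.integers(0, 3, size=(2, 10))}
geno, pop = stack_with_controls(east, ctrl)
print(geno.shape)      # (8, 10)
print("\n".join(pop))  # lines of a .pop-style label file
```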

In East Eurasia, I inferred two components, one centered in the extreme northeast and the other in the southeast, with many other populations arrayed between these two extremes:

In Sub-Saharan Africa, the primary division was between the San and the Mbuti and Biaka Pygmies (whom I call "Palaeo-Africans") and the rest (Yoruba, Mandenka, and Bantu, whom I call "Neo-Africans"):

I now had four synthetic "framing populations" (Neo-Africans, Palaeo-Africans, Northeast Asians, and Southeast Asians), created from hundreds of individuals belonging to several different populations. This meant that:
  1. I did not have to choose a particular population (e.g., Chinese) to represent East Asia, and
  2. I did not have to aggregate individuals from populations with variable levels of non-East Asian admixture.

Next, I used my South Asian populations, together with Neo-African, West Eurasian, Northeast Asian, and Southeast Asian controls, to extract a South Asian-specific component:

Armed with these five synthetic "framing" populations, I carried out a K=12 analysis of my West Eurasian, South Asian, and North/East African populations (1,247 individuals from 69 populations):

As the last step, I generated 50 synthetic individuals from each of the 12 inferred components, creating a dataset of 600 individuals (12 × 50) that will be the basis of Dodecad v3.
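
A common way to simulate an individual from an inferred component is to draw each SNP genotype independently as Binomial(2, p), where p is the component's allele frequency at that SNP. This assumes Hardy-Weinberg proportions and ignores linkage. The sketch below builds a 12 × 50 = 600 individual dataset this way from a SNPs × K frequency matrix such as ADMIXTURE's .P output. The sampling scheme is an assumption, not a description of the exact procedure used here, and `synthetic_dataset` is a hypothetical name.

```python
import numpy as np


def synthetic_dataset(freqs: np.ndarray, per_component: int = 50,
                      seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Draw `per_component` synthetic genotypes (coded 0/1/2) from each
    column of a SNPs x K allele-frequency matrix.

    Each genotype is sampled independently per SNP as Binomial(2, p).
    Returns (K * per_component) x SNPs genotypes and a component label per row.
    """
    rng = np.random.default_rng(seed)
    n_snps, k = freqs.shape
    blocks, labels = [], []
    for c in range(k):
        # p broadcasts across the per_component rows
        blocks.append(rng.binomial(2, freqs[:, c], size=(per_component, n_snps)))
        labels.extend([c] * per_component)
    return np.vstack(blocks), np.array(labels)


rng = np.random.default_rng(42)
freqs = rng.uniform(0.05, 0.95, size=(1000, 12))  # stand-in for a K=12 .P matrix
geno, comp = synthetic_dataset(freqs)
print(geno.shape)  # (600, 1000)
```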

Below is the table of Fst divergences:
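
ADMIXTURE reports Fst divergences between its estimated populations at the end of a run. To illustrate what such a number measures, the sketch below computes Hudson's Fst between two components directly from their allele frequencies. It averages the numerator and the denominator separately over SNPs (a "ratio of averages"). Because the frequencies are treated as known, it omits sample-size corrections. ADMIXTURE's own estimator may differ in detail, so values from this sketch should not be expected to match the table exactly. The function names are hypothetical.

```python
import numpy as np


def hudson_fst(p1, p2) -> float:
    """Hudson's Fst between two populations with known allele frequencies
    p1, p2 (one entry per SNP), combined across SNPs as a ratio of averages."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2                 # between-population differentiation
    den = p1 * (1 - p2) + p2 * (1 - p1)  # chance that one allele from each population differs
    total = den.sum()
    return float(num.sum() / total) if total > 0 else 0.0


def fst_table(freqs: np.ndarray) -> np.ndarray:
    """Symmetric K x K table of pairwise Fst between the columns of a
    SNPs x K allele-frequency matrix."""
    k = freqs.shape[1]
    table = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            table[i, j] = table[j, i] = hudson_fst(freqs[:, i], freqs[:, j])
    return table


print(round(hudson_fst([0.2], [0.6]), 4))  # 0.2857
```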

The following MDS plots show the first 10 dimensions of variation of these individuals:
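
MDS plots of this kind are typically produced by classical (Torgerson) multidimensional scaling of a pairwise genetic distance matrix. PLINK's `--mds-plot`, for example, works from identity-by-state (IBS) distances. The sketch below assumes an allele-sharing distance on 0/1/2 genotypes and extracts the leading dimensions by double-centering and eigendecomposition. It illustrates the technique only; the source does not state the exact distance or software behind the plots.

```python
import numpy as np


def ibs_distance(genotypes: np.ndarray) -> np.ndarray:
    """Pairwise allele-sharing distance for biallelic 0/1/2 genotypes:
    the mean over SNPs of |g_i - g_j| / 2, i.e. 1 - (alleles shared IBS) / 2."""
    g = genotypes.astype(float)
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        # Row by row keeps memory at O(n * SNPs) instead of O(n^2 * SNPs)
        dist[i] = np.abs(g - g[i]).mean(axis=1) / 2.0
    return dist


def classical_mds(dist: np.ndarray, k: int) -> np.ndarray:
    """Classical MDS: return an n x k coordinate matrix whose Euclidean
    distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering     # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]                # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # clip tiny negative values
    return eigvecs[:, top] * scale


rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(30, 500))  # 30 random stand-in individuals
coords = classical_mds(ibs_distance(geno), k=10)
print(coords.shape)  # (30, 10)
```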

Finally, here is a neighbor-joining tree of the 12 components:
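
Saitou and Nei's neighbor-joining method builds an unrooted tree by repeatedly joining the pair of nodes that minimizes the Q-criterion, (n - 2)·d(i, j) - r(i) - r(j), where r is a node's row sum of distances. The sketch below applies it to a symmetric distance matrix, such as a pairwise Fst table between components. The example matrix, node names, and Newick output are illustrative assumptions, and the source does not state which software drew the tree. On non-additive data, neighbor-joining can produce negative branch lengths.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]):
    """Neighbor-joining on a symmetric distance matrix.

    Returns a Newick string (root position arbitrary) and a list of
    (parent, child, branch_length) edges.
    """
    nodes = list(names)
    d = [row[:] for row in dist]
    newick = {n: n for n in nodes}
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Pick the pair (i, j) minimising the Q-criterion
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node u
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        counter += 1
        u = f"node{counter}"
        edges += [(u, nodes[i], li), (u, nodes[j], lj)]
        newick[u] = f"({newick[nodes[i]]}:{li:.4f},{newick[nodes[j]]}:{lj:.4f})"
        # Distances from u to every remaining node, then shrink the matrix
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, val in zip(d, new_row):
            row.append(val)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [u]
    # Connect the last two nodes; the zero-length branch marks an arbitrary root
    a, b = nodes
    edges.append((a, b, d[0][1]))
    tree = f"({newick[a]}:0.0000,{newick[b]}:{d[0][1]:.4f});"
    return tree, edges


names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
tree, edges = neighbor_joining(names, D)
print(sum(length for _, _, length in edges))  # 17.0
```

Because this example matrix is additive (it comes from a tree), neighbor-joining recovers that tree, so the total branch length is 17 regardless of how ties in the Q-criterion are broken.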
(to be continued)


