Dodecad Ancestry Project: The design of Dodecad v3

Tuesday, June 21, 2011

The design of Dodecad v3

Dodecad v2 was short-lived, as I discovered a way to improve it shortly after I announced it.

The first step was to carry out an extensive K=3 ADMIXTURE analysis of about 130 different populations and about 2,000 individuals from Europe, Asia, and Africa. Using the allele frequency results of this analysis I was able to create the most comprehensive synthetic individuals to represent West Eurasians, Asians, and Sub-Saharan Africans.
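As a rough illustration of how a synthetic individual can be drawn from a set of allele frequencies, the sketch below samples each genotype as two independent allele draws at the component's frequency. The exact procedure is not spelled out in this post, so the independence of SNPs, the Hardy-Weinberg assumption, the function name `synthetic_individual`, and the toy frequencies are all assumptions made for illustration.

```python
import random

def synthetic_individual(freqs, rng):
    """Draw one diploid genotype vector (0, 1 or 2 copies of the counted allele per SNP).

    Assumes SNPs are independent and that both allele copies are drawn
    at the component's frequency (Hardy-Weinberg proportions).
    """
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]

# Hypothetical allele frequencies for one inferred component at five SNPs.
component_freqs = [0.0, 0.1, 0.5, 0.9, 1.0]

rng = random.Random(2011)  # fixed seed so the sketch is reproducible
individuals = [synthetic_individual(component_freqs, rng) for _ in range(50)]
```

Each entry of `individuals` is a list of genotype counts, one per SNP. A real run would use the full frequency vector of a component rather than five made-up values.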

Subsequently, I carried out an analysis of East Eurasian populations using the West Eurasian/Sub-Saharan synthetic individuals as controls, as well as an analysis of Sub-Saharan populations using the West Eurasian/Asian individuals as controls.

In East Eurasia, I was able to infer the existence of two components, one centered in the extreme northeast, another in the southeast, with many other populations arrayed between these two extremes:

In Sub-Saharan Africa, the primary division was between San, Mbuti, and Biaka Pygmies (whom I have called "Palaeo-Africans") and the rest (Yoruba, Mandenka, and Bantu, "Neo-Africans"):

Now, I had four synthetic "framing populations": Neo-Africans, Palaeo-Africans, Northeast Asians and Southeast Asians, created from hundreds of individuals from several different populations. This had two advantages:

I did not have to choose a particular population (e.g., Chinese) to represent East Asia
I did not have to aggregate individuals from populations with variable levels of non-East Asian admixture

I now used my South Asian populations, together with Neo-African, West Eurasian, Northeast and Southeast Asian controls to extract a South Asian specific component:

Armed with these 5 synthetic "framing" populations, I carried out a K=12 analysis with my West Eurasian, South Asian, and North/East African populations (1,247 individuals; 69 populations):

And, finally, I generated 50 synthetic individuals from each of the 12 inferred components to create a dataset of 600 individuals that will be the basis of Dodecad v3.

Below is the table of Fst divergences:
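For readers unfamiliar with the statistic, here is a minimal sketch of one common frequency-based Fst estimate between pairs of components. It uses a Hudson-style ratio of sums over SNPs with no sample-size correction, which is not necessarily the estimator behind the table in this post. The function name `fst`, the component labels, and the frequencies are hypothetical.

```python
def fst(p1, p2):
    """Hudson-style Fst from two lists of allele frequencies (ratio of sums over SNPs).

    No sample-size correction is applied, which is an assumption suited to
    frequencies treated as known rather than estimated from small samples.
    """
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den if den > 0 else 0.0

# Hypothetical frequencies for three components at four SNPs.
freqs = {
    "A": [0.10, 0.50, 0.80, 0.30],
    "B": [0.12, 0.45, 0.85, 0.30],
    "C": [0.90, 0.10, 0.20, 0.70],
}
names = sorted(freqs)
# Pairwise table keyed by (component, component).
table = {(x, y): fst(freqs[x], freqs[y]) for x in names for y in names}
```

In this toy table the A-B value is much smaller than the A-C value, because A and B have nearly identical frequencies while C differs at every SNP.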

The following MDS plots show the first 10 dimensions of variation of these individuals:
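As a hedged illustration of what an MDS dimension is, the sketch below computes only the first classical-MDS coordinate from a small distance matrix, using double centering and power iteration. The plots in this post cover 10 dimensions and were presumably made with standard software, so the function name `classical_mds_1d`, the pure-Python approach, and the toy matrix are assumptions for illustration only.

```python
import math

def classical_mds_1d(dist, iters=200):
    """First classical-MDS coordinate for a symmetric distance matrix (list of lists)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of B
    # (the start vector must not be constant, since constants are in B's null space).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    # Scale the unit eigenvector by sqrt(eigenvalue) to get coordinates.
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]

# Toy distances between three points that lie on a line at 0, 1 and 3.
toy = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
coords = classical_mds_1d(toy)
```

For the toy matrix the recovered coordinates reproduce the original pairwise distances, up to an overall shift and sign, which is all MDS can determine.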

Finally, here is a neighbor-joining tree of the 12 components:
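To show the core idea of neighbor-joining, the sketch below performs only its first step: computing the Q criterion for every pair in a distance matrix and picking the pair to merge. A full tree repeats this step after replacing the merged pair with a new node. The post does not state which distances the tree was built from, so the function name `nj_first_join` and the toy five-taxon matrix are assumptions.

```python
def nj_first_join(dist):
    """Return the pair (i, j) that neighbor-joining would merge first, with its Q value."""
    n = len(dist)
    totals = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small when i and j are close to each other
            # but far from everything else.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q

# Toy symmetric distance matrix for five hypothetical taxa.
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_first_join(toy)
```

Note that the chosen pair is not simply the closest pair: taxa 3 and 4 have the smallest distance in the toy matrix, but taxa 0 and 1 have the lowest Q value and are joined first.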

(to be continued)

13 comments:

Onur Dincer, June 22, 2011 at 1:08 AM
What does O_Italian mean? They have relatively (for Italians and Europeans in general) significant Mongoloid admixture according to the above analysis.
Dodecad Project, June 22, 2011 at 1:34 AM
O_Italian is Other Italian, and that is all due to a single individual that I am waiting to hear from to see whether he/she has any explanation for these results. I will also carry out another data cleanup once I'm done with this, to detect submitted relatives or outliers that likely misreported their ancestry. This is part of the reason why I am not reporting raw averages at this time, as I have not cleaned up all the latest submissions.

Part of the (to be continued) involves visually inspecting the population portraits to catch outliers such as the one contributing the "Northeast Asian" in the O_Italian sample.
Joshua Lipson, June 22, 2011 at 5:22 AM
Where my Ashkenazim at?
Mauri, June 22, 2011 at 9:09 AM
Hi, I wonder if it is possible to see the variation of each admix result in every national group.
princenuadha, June 22, 2011 at 7:13 PM
After looking at the Fst divergence table I suspect that the East European is more "native" than the West European, i.e., the West European component has more ancestry that only recently diverged. I also think a significant part of the West European came from around the Caucasus (just north) relatively recently.

The reasons I think this are that W.A. is close to W.A. and that W.A. is closer to W.E. than E.E., which suggests that there was a migration between the two, since they're further apart geographically. Also, Mediterranean is closer to E.E. than to W.E., which further suggests that E.E. is more "native" to Europe, like Mediterranean, especially since Mediterranean is somewhat of an isolated population (it diverged earlier than other components in the area). And while I think this is a weak indicator, I think the fact that Northeast Eurasian is closer to W.E. than to E.E. AND W.A., even though the geographic distance is bigger, fits with the idea that a significant part of W.E. came from north of the Caucasus.
Anonymous, June 23, 2011 at 9:23 AM
I'd like to see myself on more plots with coordinates listed.
Realist, July 17, 2012 at 6:49 AM
Hi, I was wondering, could you explain what Palaeo- and Neo-African mean? I'm a little confused.
Unknown, June 24, 2013 at 6:45 PM
Regarding the higher S.Asian score in the Iranian population: while some of this is certainly recent, most of it likely reflects admixture acquired pre-LGM, and likely before there was much of the defining variation in Caucasoids. Because West Asians (and especially Iranians) defined the split between S.Asians (and later Europeans), one would expect some residual component. However, global PCA analysis, inclusive of a good number of populations, does not quite suggest admixture approaching what the Dodecad v3 results suggest. For S.Asian admixture, I tend to believe in the level suggested by the distribution of haplogroup L1, which is also more in line with many-to-most other calculators: around 4% in the north, although southern Iran varies between 6 and 12%.
Kevin McElroy, March 26, 2014 at 7:11 AM
Do people still reply here? I have a question about Jewish ancestry.
The Dandelion Kid, October 4, 2014 at 6:13 PM
Wow, way over my head.
Anita, March 9, 2015 at 1:06 AM
Is there any documentation for beginners for GEDmatch results?