Naturally, I wanted to see how well this works in practice, so I went through the ancestry thread to find some test cases.
DOD006 reports half North Italian and half Ashkenazi Jewish ancestry. Using 5 Dodecad Project North Italians and 25 Dodecad Project Ashkenazi Jews, I estimate his/her ancestry as 24.9% North Italian and 75.1% Ashkenazi Jewish. Substituting the HGDP North Italian sample (from Bergamo) for the Dodecad one, I obtain values of 27.1% North Italian and 72.9% Ashkenazi Jewish. Based on these results, I would wager that the North Italian ancestor was half Jewish, or otherwise atypical for that population.
DOD073 reports half German and half Irish ancestry. Using 17 Dodecad Project Irish and 11 Dodecad Project Germans as references, I estimate his/her ancestry as 55.9% Irish and 44.1% German. This seems reasonable, given the limitations of the algorithm and the relative closeness of the two populations.
DOD188 reports half Sicilian, half Polish ancestry. Using 6 Poles and 20 South Italians/Sicilians from the Dodecad Project, I estimate his/her ancestry as 40.6% Polish and 59.4% Sicilian. Is this slightly worse result due to the algorithm's limitations, or, as I suspect, to the smaller Polish sample?
DOD014 is a very interesting case, reporting half Greek and half South Italian/Sicilian ancestry. Given the close relationship between these two populations, I did not know what to expect, and the result of 30.6% Greek and 69.4% South Italian/Sicilian probably indicates the difficulty of obtaining accurate estimates for admixture between related populations.
DOD245 suggests an approximate breakdown of 50% W African, 25% Ashkenazi, and 25% N European/English with some Native American. Using HGDP Yoruba, Dodecad Ashkenazi, and 17 Dodecad British, I estimate 50.5% W African, 24.4% Ashkenazi, and 25.1% British, which seems right on the money, thanks, perhaps, to the large reference samples from well-differentiated populations.
I revisited Joe Pickrell, whose ancestry I do not know fully, except for the following:
- 1 Ashkenazi great-grandparent
- 1 Italian grandparent
- the Ashkenazi component is close to the expected 12.5%, given the randomness of the three generations between a great-grandparent and his descendant
- the Italian component is more than the expected 25%, and this could be explained in many different ways, e.g., part-Italian descent of the non-Italian ancestors, or descent from some non-British, non-Italian white Europeans.
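The expected values of 12.5% and 25% quoted above follow from halving the contribution at each generation. A minimal sketch of that arithmetic is below; the function name is my own, and it gives only the expectation, since the actual share inherited from any one ancestor varies around it.

```python
from fractions import Fraction


def expected_fraction(generations_back: int, n_ancestors: int = 1) -> Fraction:
    """Expected autosomal share contributed by n ancestors who sit
    `generations_back` generations above the individual
    (1 = parent, 2 = grandparent, 3 = great-grandparent)."""
    return n_ancestors * Fraction(1, 2 ** generations_back)


great_grandparent = expected_fraction(3)  # one great-grandparent
grandparent = expected_fraction(2)        # one grandparent

print(f"1 great-grandparent: {float(great_grandparent):.1%}")
print(f"1 grandparent:       {float(grandparent):.1%}")
```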
Zack finds his Dodecad results (DOD128) to be compatible with a quarter Egyptian ancestry, finding his South Asian ancestry to be more similar to Punjabis (although he has no data for Punjabis). Using Pakistani Punjabis from Xing et al. (2010) and Egyptians from Behar et al. (2010) as references requires me to drop the number of markers to ~38k, but the result of the supervised ADMIXTURE analysis is 77.4% Punjabi and 22.6% Egyptian, which seems compatible with what he expected.
Finally, another difficult case is DOD329, who is 3/4 Norwegian and 1/4 Swedish with a little "forest Finn". Judging from the K=10 results for this sample (only 0.4% East Asian), I don't think there is much "forest Finn" in his/her genome. Using 7 Dodecad Swedes and 6 Dodecad Norwegians as references, I obtain 46.8% and 53.2%, which is again appropriately "off" given both the small reference samples and the close relatedness of these two populations.
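To put the cases side by side, here is a small sketch that ranks them by how far the estimate strays from the self-report. The percentages are copied from the paragraphs above. The "largest absolute deviation" summary is my own choice of metric, not part of the ADMIXTURE output. Only the cases with a stated expected split and clearly labeled estimates are included.

```python
# (self-reported %, estimated %) per sample, taken from the text above.
# DOD245's self-report is approximate ("with some Native American").
cases = {
    "DOD006": ({"North Italian": 50, "Ashkenazi": 50},
               {"North Italian": 24.9, "Ashkenazi": 75.1}),
    "DOD073": ({"Irish": 50, "German": 50},
               {"Irish": 55.9, "German": 44.1}),
    "DOD188": ({"Polish": 50, "Sicilian": 50},
               {"Polish": 40.6, "Sicilian": 59.4}),
    "DOD014": ({"Greek": 50, "South Italian/Sicilian": 50},
               {"Greek": 30.6, "South Italian/Sicilian": 69.4}),
    "DOD245": ({"W African": 50, "Ashkenazi": 25, "British": 25},
               {"W African": 50.5, "Ashkenazi": 24.4, "British": 25.1}),
}


def max_abs_deviation(reported, estimated):
    """Largest gap, in percentage points, between report and estimate."""
    return round(max(abs(estimated[pop] - reported[pop]) for pop in reported), 1)


deviations = {sid: max_abs_deviation(rep, est) for sid, (rep, est) in cases.items()}

# Smallest deviation first: the best-agreeing cases come out on top.
ranked = sorted(deviations, key=deviations.get)
for sid in ranked:
    print(f"{sid}: off by up to {deviations[sid]:.1f} points")
```

The ordering matches the pattern noted in the text: the well-differentiated three-way case (DOD245) agrees best, while the pairs of closely related populations, and DOD006, agree worst.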
Concordance between self-reported and genomic ancestry
Consider DOD375 of Spanish origin (from Valencia). I ran supervised ADMIXTURE analysis using Behar et al. (2010) Spaniards and 25 HapMap Mexicans as references. Not surprisingly, this individual turns out to be 100% Spanish using this test.
Now, consider the individual who prompted my recent plea for accurate self-reporting of ancestry. I had hard evidence that this individual, who also claimed full Spanish ancestry, was in fact part Mexican. Nonetheless, I decided to make the case airtight by performing exactly the type of test described in the previous paragraph. The result: 76% Spanish and 24% Mexican, in agreement with a single Mexican grandparent.
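For readers who want to set up a similar test: as far as I know, ADMIXTURE's supervised mode (the `--supervised` flag) reads a `.pop` file with one line per individual, in the same order as the PLINK `.fam` file. The line holds a population label for reference samples and `-` for individuals whose ancestry is to be estimated. The sketch below builds those labels. The sample IDs and the helper name are hypothetical, and this is an illustration of the file layout rather than the exact pipeline used for the results above.

```python
def build_pop_labels(fam_lines, reference_pops):
    """Return one label per .fam line: the reference population if the
    individual ID is a known reference, otherwise "-" (to be estimated)."""
    labels = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        individual_id = fields[1]  # .fam columns: FID IID father mother sex pheno
        labels.append(reference_pops.get(individual_id, "-"))
    return labels


# Hypothetical .fam contents: two reference samples and one test individual.
fam_lines = [
    "F1 ES_ref_01 0 0 0 -9",
    "F2 MX_ref_01 0 0 0 -9",
    "F3 test_individual 0 0 0 -9",
]
reference_pops = {"ES_ref_01": "Spanish", "MX_ref_01": "Mexican"}

labels = build_pop_labels(fam_lines, reference_pops)
print("\n".join(labels))  # contents to write to <prefix>.pop
```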
This type of analysis does seem to work best when good-sized samples of the ancestral populations are available, and when these populations are genetically well differentiated.
From an anthropological viewpoint, it could be useful for populations with well-known admixture histories, such as those of the New World or parts of Central Asia.
It could also be useful as a confirmatory tool to compare self-reported vs. genomic ancestry.