Wednesday, November 17, 2010

ADMIXTURE analysis of Eurasian populations with K=15

Note: color coding in the initially uploaded RAR of individual variation was off; please get the new one. [working link, May 4, 2011]

I have added populations from several sources (HapMap-3, HGDP, Behar et al. (2010), Rasmussen et al. (2010), and the Dodecad Ancestry Project) and run the most ambitious ADMIXTURE analysis of Eurasian variation yet: 69 populations and 1,189 individuals in total.

K=15 ADMIXTURE plot:

Admixture proportions for the 69 populations can be found in the spreadsheet.

Population portraits, showing individual variation within populations, can be found in the RAR.
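As a rough illustration of how population-level proportions relate to the individual-level ones, here is a minimal sketch. It assumes the population figures are simple averages of individual admixture proportions; the function name `population_means` and the toy data are hypothetical and are not taken from the spreadsheet or the RAR.

```python
from collections import defaultdict

def population_means(labels, q_rows):
    """Average individual admixture proportions within each population.

    labels -- one population label per individual (hypothetical input)
    q_rows -- one list of K proportions per individual, each summing to ~1
    """
    sums = {}
    counts = defaultdict(int)
    for label, row in zip(labels, q_rows):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[label][k] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

# Toy data with K=3, invented purely for illustration.
labels = ["PopA", "PopA", "PopB"]
q_rows = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.0, 0.5, 0.5]]
means = population_means(labels, q_rows)
```

Because each individual's proportions sum to one, the averaged proportions for a population also sum to one.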


The table of Fst distances between the 15 components:

MDS representation:
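For readers who want to reproduce this kind of plot from a distance table, the following is a minimal sketch of classical (Torgerson) MDS using only the Python standard library. It is not necessarily the procedure or software used for the figure; the function name `classical_mds` and the toy distances are assumptions made for illustration.

```python
import math

def classical_mds(dist, dims=2, iterations=500):
    """Classical (Torgerson) MDS using power iteration; standard library only.

    dist -- symmetric matrix (list of lists) of pairwise distances
    Returns one list of `dims` coordinates per item.
    """
    n = len(dist)
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def times_b(v):
        return [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Arbitrary deterministic starting vector for the power iteration.
        v = [math.sin(i + 1 + d) for i in range(n)]
        for _ in range(iterations):
            w = times_b(v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract along this axis
            v = [x / norm for x in w]
        vv = sum(x * x for x in v)
        # Rayleigh quotient: eigenvalue estimate for the current vector.
        eigenvalue = sum(x * y for x, y in zip(v, times_b(v))) / vv
        # Non-Euclidean distances can yield negative eigenvalues;
        # this sketch simply leaves such an axis at zero.
        if eigenvalue > 1e-12:
            scale = math.sqrt(eigenvalue / vv)
            for i in range(n):
                coords[i][d] = v[i] * scale
            # Deflate so the next pass finds the next eigenvector.
            for i in range(n):
                for j in range(n):
                    b[i][j] -= eigenvalue * v[i] * v[j] / vv
    return coords

# Toy distances (a 3-4-5 triangle), not values from the Fst table.
dist = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
coords = classical_mds(dist)
```

With distances that are exactly Euclidean, as in the toy triangle, the two-dimensional coordinates reproduce the input distances. Fst values are not guaranteed to be Euclidean, so a real plot is only an approximation of the table.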

Hierarchical clustering with complete linkage (this is not a phylogeny):
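To make "complete linkage" concrete, here is a minimal standard-library sketch of the merging rule: at every step the two clusters whose most distant members are closest get joined. The function name `complete_linkage` and the toy distance matrix are hypothetical; the actual tree was presumably produced with a statistics package, which is not stated here.

```python
def complete_linkage(names, dist):
    """Agglomerative clustering with complete linkage.

    names -- list of component labels
    dist  -- symmetric matrix (list of lists) of pairwise distances
    Returns the merges as (cluster_a, cluster_b, height) tuples,
    where each cluster is a tuple of member labels.
    """
    clusters = [(i,) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between clusters is the
                # largest pairwise distance between their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(names[i] for i in clusters[a]),
                       tuple(names[i] for i in clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

# Toy distances, invented for illustration; not values from the Fst table.
names = ["A", "B", "C", "D"]
dist = [
    [0.00, 0.02, 0.10, 0.12],
    [0.02, 0.00, 0.09, 0.11],
    [0.10, 0.09, 0.00, 0.03],
    [0.12, 0.11, 0.03, 0.00],
]
merges = complete_linkage(names, dist)
```

The merge heights only summarize similarity under this linkage rule, which is why the resulting tree should not be read as a phylogeny.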
What has changed

In comparison to the K=10 analysis, the increased resolution reveals the following:
  1. In the K=10 analysis, South Asians belonged primarily to the South Asian and West Asian components, and the South Asian component spilled over to Iran and Central Asia. Now a new Central-South Asian component is inferred, corresponding to the Ancestral North Indian of a recent study, together with a corresponding South Indian component.
  2. HGDP Bedouins and Behar et al. (2010) Saudis take up their own component, which I labeled Arabian. This appears to be a subset of the Southwest Asian component of the K=10 analysis.
  3. There are several components in Siberian and Central Asian populations, already discovered in my regional analysis. These are Central Siberian, Nganasan, Koryak, Chukchi, and Altaic, which replace the K=10 Northeast Asian component.

A final note:

The K=14 analysis revealed a Palestinian-centered "Levantine" cluster; this folded in my K=15 run, and two additional splits occurred (Koryak/Chukchi and Nganasan).

At this level of resolution, many alternative representations can occur for a given K, and the order in which splits occur can vary; they continue, however, to correlate well with populations. Noise levels seem to be slightly increased, especially for clusters associated with single populations of few individuals, but the broad patterns are quite evident.

Up to now, I have not encountered any nonsensical results, so I will continue this as far as it goes. Regional analyses indicate there is more structure to be discovered, so we'll see how far the data can be pushed.

UPDATE: Razib of Gene Expression worries that the South Indian and Central-South Asian components I have identified may not correspond to ASI/ANI, suggesting that ASI ought to be closer to East Asians.

However, Reich et al. state that:
“Many of the analyses in this study are based on modeling the history of Indo-European and Dravidian speaking groups of the Indian subcontinent in terms of a two-way historical mixture of an “Ancestral North Indian” (ANI) population that is genetically close to Central Asians, Middle Easterners, and Europeans, and an “Ancestral South Indian” (ASI) population that is not close to any large modern group outside the Indian subcontinent.”

Indeed, this is what I discover, with SIN being about Fst=0.085 from Caucasoids and about Fst=0.1 from East Asians. As I couldn't find raw Fst's between the components in Reich et al.'s paper, I can turn to one of their figures (S2 Fig.1):

It is evident that South Asian groups (except a few with East Eurasian admixture) are arrayed along a cline toward Europeans from a third pole (top of the figure). Thus the ASI, which represents this pole, is not particularly related to East Asians; it is about equidistant from Europeans and East Asians.

It is possible, however, that the correspondence is not perfect, as the North Kannada group which forms the South Indian pole in my analysis may not be as "southern" as the tribals Reich et al. had access to.


  1. Something doesn't make sense here. West Asian, which peaks in Arabics, is the closest to Northern Europeans (NEU)?

  2. First of all, West Asian does not peak in Arabics.
    Second, I don't see what your problem is; Arabs are different from Northern Europeans because they are made of components in different proportions, some of them (like the East African) quite divergent.

  3. Well yes, the West Asian peaks in Levantines. But I find it quite strange that NEU are closer to WAS than to SEU.

  4. Superb job as usual Dienekes.

    The split between the Central South Asian and South Asian components is very helpful in the Fertile Crescent analysis I'm doing now.

    That Central South Asian component in Europeans picks up quite a bit here. You've added more Central and South Asian populations, I see. Do you think that is shifting the overall weighting toward Asia?

    One comment on the Arab and Southwest Asian components: It may be helpful to have Yemenis in the mix:

  5. "That Central South Asian component in Europeans picks up quite a bit here. You've added more Central and South Asian populations, I see. Do you think that is shifting the overall weighting toward Asia?"

    Perhaps, but the number of individuals really does not affect the emergence of clusters, their distinctiveness does. Yorubans and Nganasans get their own clusters even though there are few of them in the data.

  6. page 40 of the supplements lays out the model which seems to comport with their results:

    - 4,000 gens ago Split of West African and Eurasian ancestors

    - 2,000 gens ago Split of ANI and ASI ancestors

    - 1,700 gens ago Split of Asian populations (‘proto-East Asia’, ASI, and Onge)

    your quote of reich is correct. my point is that they should be closer to ESA than any of the west eurasian groups, albeit it would be a close thing. ASI is clearly not a linear combination of west eurasians and east asians; it went on its own path. also, remember that the lowbound proportion of ANI is 40%, and highbound is 70%, by population. so the indians should be a little biased toward europeans, especially when considering that ANI is very close to west eurasians, while ASI is only closer to ESA than west eurasians, not close.

    of course, neither your components nor the reich et al. model are anything more than approximate mappings onto real ancestral populations. so no point in belaboring these details, since the reich et al. model itself has some fudge. rather the important thing is to remember that some of your K's may be compounds themselves. the two elements you identified correlate with ANI/ASI, but they're not isomorphic.

  7. @Dienekes,

    Could you please provide the eigenvalues for the PCA. (It's impossible to judge distances without knowing what the eigenvalues are.)

  8. The WAS group is interesting, as it is closer to the other Caucasoid groups, even the North African and Arabian groups, than SEU is. The Fst difference is minor but still interesting. Could the WAS group be ancestral Caucasoid?

    The NEU group indicates it originated east of Europe close to the Caucasus, WAS group, and with closer contacts with the "Indian" groups. Andronovo culture perhaps.

  9. I agree, great work.

    One thing I noticed is that the Arabian cluster is closer to ssa than either of the European clusters, which is different from one of your previous posts. The difference isn't much anyway, and I'd guess that the results are expected to be slightly different since you're considering different populations.

    It is very interesting that west Asian is closer to neu than to seu. This tells me that neu was not created by people moving north from Southern European refugia during the ice age. I believe this to be the case because neu is no closer to seu than it is to west Asian, but also because the neu cluster would not have "absorbed" more west Asian than seu did after the ice age if neu were just northern migrants from the refugia.

    I think this post has given us a better understanding of the origin of the neu component which is something I really want to figure out.

  10. Would it be possible to add Native American samples to the project? That would be even more complete and worldwide, but anyway, good job.

  11. It's too bad there isn't a Swiss group. During the next submission period I'll try to recruit some Swiss (especially German-Swiss) by spreading the message in forums.

    Dienekes, do you have any Swiss in your database?

  12. Dienekes, on your comment:

    "Perhaps, but the number of individuals really does not affect the emergence of clusters, their distinctiveness does."

    I would agree. Just wondering if the weighting is correct.

    In any case, at K=15 with this weight of populations, the CSA components in Western Europe come up out of the noise, which makes them easier to analyze.
