Dodecad Ancestry Project: Siberians

Note: color coding in the initially uploaded RAR of individual variation was off; please get the new one. [working link, May 4, 2011]

I have combined populations from several sources (HapMap-3, HGDP, Behar et al. (2010), Rasmussen et al. (2010), and the Dodecad Ancestry Project) and run the most ambitious ADMIXTURE analysis of Eurasian variation yet: 69 populations and 1,189 individuals in total.

K=15 ADMIXTURE plot:

Admixture proportions for the 69 populations can be found in the spreadsheet.
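For readers who want to derive population-level proportions like these themselves, here is a minimal sketch of the averaging step. ADMIXTURE writes a Q matrix with one row per individual and one column per component. The population labels are not part of that output and are assumed here to be supplied separately. The function name, the labels "PopA"/"PopB", and the numbers are all hypothetical toy values, not project data.

```python
from collections import defaultdict


def population_means(labels, q_rows):
    """Average per-individual admixture fractions within each population."""
    sums = {}
    counts = defaultdict(int)
    for pop, row in zip(labels, q_rows):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[pop][k] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}


# Toy input: 4 hypothetical individuals, K=3 components (not real data).
labels = ["PopA", "PopA", "PopB", "PopB"]
q_rows = [
    [0.80, 0.15, 0.05],
    [0.70, 0.25, 0.05],
    [0.10, 0.30, 0.60],
    [0.20, 0.20, 0.60],
]

means = population_means(labels, q_rows)
for pop, row in means.items():
    print(pop, [round(x, 3) for x in row])
```

Because each individual's row sums to 1, each population average also sums to 1, so the averaged rows can be read directly as population-level proportions.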

Population portraits, showing individual variation within populations, can be found in the RAR.

RELATIONSHIP BETWEEN 15 COMPONENTS

The table of Fst distances between the 15 components:
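As a rough illustration of what an Fst distance between two components measures, the sketch below computes a simple Wright-style Fst from allele frequencies at biallelic markers, using a ratio of sums across loci and equal weight for the two groups. This is a simplified textbook estimator chosen for illustration; it is an assumption that it resembles the one behind the table, and the frequencies are hypothetical toy values.

```python
def fst_two_groups(freqs_a, freqs_b):
    """Simple Fst between two groups from per-locus allele frequencies.

    Fst = sum(Ht - Hs) / sum(Ht), where Ht is the expected heterozygosity
    of the pooled group and Hs is the mean within-group heterozygosity.
    """
    num = 0.0
    den = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2.0
        h_t = 2.0 * p_bar * (1.0 - p_bar)
        h_s = (2.0 * p_a * (1.0 - p_a) + 2.0 * p_b * (1.0 - p_b)) / 2.0
        num += h_t - h_s
        den += h_t
    # If every locus is monomorphic in the pooled group, Fst is undefined;
    # return 0.0 in that case as a convention for this sketch.
    return num / den if den > 0 else 0.0


# Toy allele frequencies at three hypothetical loci (not real data).
freqs_a = [0.1, 0.5, 0.9]
freqs_b = [0.3, 0.5, 0.6]
fst_toy = fst_two_groups(freqs_a, freqs_b)
print(round(fst_toy, 4))
```

Identical frequencies give Fst = 0 and a fixed difference at every locus gives Fst = 1, so values such as 0.085 or 0.1 indicate modest but real differentiation.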

MDS representation:

Hierarchical clustering with complete linkage (this is not a phylogeny):
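For readers unfamiliar with complete linkage, the following is a minimal sketch of the procedure applied to a small distance matrix. At each step the two closest clusters are merged, where the distance between two clusters is the largest pairwise distance between their members. The component names "A" to "D" and the distances are hypothetical toy values, not the Fst table from this analysis.

```python
def complete_linkage(names, dist):
    """Agglomerative clustering with complete linkage.

    Returns a list of merges as (members_a, members_b, height).
    """
    clusters = [(i,) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: use the largest pairwise distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((
            [names[i] for i in clusters[a]],
            [names[i] for i in clusters[b]],
            d,
        ))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Toy symmetric distance matrix for four hypothetical components.
names = ["A", "B", "C", "D"]
dist = [
    [0.00, 0.02, 0.10, 0.12],
    [0.02, 0.00, 0.09, 0.11],
    [0.10, 0.09, 0.00, 0.03],
    [0.12, 0.11, 0.03, 0.00],
]

merges = complete_linkage(names, dist)
for left, right, height in merges:
    print(left, "+", right, "at", height)
```

The resulting tree only summarizes which components are closest under this distance measure, which is why it should not be read as a phylogeny.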

What has changed

In comparison to the K=10 analysis, the increased resolution reveals the following changes:

In the K=10 analysis, South Asians belonged primarily to the South Asian and West Asian components, and the South Asian component spilled over into Iran and Central Asia. Now a new Central-South Asian component is inferred, corresponding to the Ancestral North Indian (ANI) of a recent study, along with a corresponding South Indian component.
HGDP Bedouins and Behar et al. (2010) Saudis take up their own component, which I have labeled Arabian. This appears to be a subset of the Southwest Asian component of the K=10 analysis.
There are several components in Siberian and Central Asian populations, already discovered in my regional analysis. These are Central Siberian, Nganasan, Koryak, Chukchi, and Altaic, which replace the K=10 Northeast Asian component.

A final note:

The K=14 analysis revealed a Palestinian-centered "Levantine" cluster; this folded in my K=15 run, and two additional splits occurred (Koryak/Chukchi and Nganasan).

At this level of resolution, many alternative representations can occur for a given K, and the order in which splits occur can vary; they continue, however, to correlate well with populations. Noise levels seem to be slightly increased, especially for clusters associated with single populations of few individuals, but the broad patterns are quite evident.

Up to now, I have not encountered any nonsensical results, so I will continue this as far as it goes. Regional analyses indicate there is more structure to be discovered, so we'll see how far the data can be pushed.

UPDATE: Razib of Gene Expression worries that the South Indian and Central-South Asian components I have identified may not correspond to ASI/ANI, suggesting that ASI ought to be closer to East Asians.

However, Reich et al. state that:

“Many of the analyses in this study are based on modeling the history of Indo-European and Dravidian speaking groups of the Indian subcontinent in terms of a two-way historical mixture of an “Ancestral North Indian” (ANI) population that is genetically close to Central Asians, Middle Easterners, and Europeans, and an “Ancestral South Indian” (ASI) population that is not close to any large modern group outside the Indian subcontinent.”

Indeed, this is what I find, with the South Indian component (SIN) being about Fst=0.085 from Caucasoids and about Fst=0.1 from East Asians. Since I couldn't find raw Fst values between the components in Reich et al.'s paper, I turn instead to one of their figures (S2 Fig.1):

It is evident that South Asian groups (except a few with East Eurasian admixture) are arrayed along a cline running toward Europeans from a third pole (top of the figure). Thus the ASI, which represents this pole, is not particularly related to East Asians; it is about equidistant from Europeans and East Asians.

It is possible, however, that the correspondence is not perfect, as the North Kannada group which forms the South Indian pole in my analysis may not be as "southern" as the tribals Reich et al. had access to.

Wednesday, November 17, 2010

ADMIXTURE analysis of Eurasian populations with K=15
