Monday, May 23, 2011

Clusters Galore: African edition

I gathered about 900 individuals from the Dodecad Project and the literature and ran Clusters Galore on them. Dodecad Project members were included if any of the following conditions held:
  1. They were in the North_African_D, North_African_Jews_D, or East_African_D population
  2. They had at least 10% "Northwest African" in the standard K=10 analysis
  3. They had at least 10% "West African" in the standard K=10 analysis
  4. They had at least 10% "East African" in the standard K=10 analysis
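The inclusion conditions above amount to a simple "any of" filter. Here is a minimal sketch in Python, assuming a hypothetical record layout (a population label plus a dictionary of K=10 admixture fractions); the field names and the toy participants are illustrative only and are not taken from the Project's data files.

```python
# Hypothetical layout: each participant has a population label and a dict of
# K=10 admixture fractions (0.0 to 1.0). Names here are assumptions.
INCLUDED_POPULATIONS = {"North_African_D", "North_African_Jews_D", "East_African_D"}
AFRICAN_COMPONENTS = ("Northwest African", "West African", "East African")
THRESHOLD = 0.10  # "at least 10%"


def is_included(population, admixture):
    """Return True if a participant meets any of the inclusion conditions."""
    if population in INCLUDED_POPULATIONS:
        return True
    # A missing component is treated as 0%.
    return any(admixture.get(c, 0.0) >= THRESHOLD for c in AFRICAN_COMPONENTS)


# Toy participants, invented for illustration.
participants = [
    {"id": "P1", "population": "East_African_D", "admixture": {}},
    {"id": "P2", "population": "Other_D", "admixture": {"West African": 0.42}},
    {"id": "P3", "population": "Other_D", "admixture": {"East African": 0.03}},
]

selected = [
    p["id"] for p in participants if is_included(p["population"], p["admixture"])
]
print(selected)
```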
Integrating multiple datasets meant that the analysis used only 1,871 SNPs; these were nevertheless sufficient to infer 28 clusters with 9 PCA dimensions retained.
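The SNP count drops when datasets are merged because only markers typed in every dataset can be used. The sketch below shows the idea as a plain set intersection; the dataset names and SNP IDs are made-up toy values, and real merging would also have to deal with strand and allele coding, which is not shown here.

```python
def shared_snps(*datasets):
    """Return the sorted SNP IDs that are present in every dataset."""
    if not datasets:
        return []
    common = set(datasets[0])
    for snps in datasets[1:]:
        common &= set(snps)  # keep only markers seen in all datasets so far
    return sorted(common)


# Toy marker lists, invented for illustration.
dataset_a = ["rs1", "rs2", "rs3", "rs4"]
dataset_b = ["rs2", "rs3", "rs4", "rs5"]
dataset_c = ["rs3", "rs4", "rs6"]

common = shared_snps(dataset_a, dataset_b, dataset_c)
print(len(common), common)
```

Each added dataset can only shrink the shared set, which is why an analysis spanning many sources can end up with far fewer markers than any single source provides.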

PCA plots

Below are PCA plots of the first nine dimensions; each of dimensions 2 to 9 is plotted against the 1st, giving 8 scatter plots:

Galore/PCA results

All the results can be found in the spreadsheet, which contains:
  • Population Galore results: how many individuals from each population are assigned to each cluster
  • Individual Galore results: what is the probability that each ID belongs in each cluster (in %)
  • Population PCA results: mean positions of populations in the first 20 principal components
  • Individual PCA results: position of individuals in the first 20 principal components.
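For readers who want to rebuild the population-level sheets from the individual-level ones, here is a rough sketch. It assumes that each individual is counted in their most probable cluster and that population positions are plain averages of individual coordinates; the first of these is my assumption about how such counts could be derived, and the record layout and toy values are hypothetical.

```python
from collections import Counter, defaultdict


def population_cluster_counts(individuals):
    """Count individuals per population in their most probable cluster."""
    counts = defaultdict(Counter)
    for ind in individuals:
        probs = ind["cluster_probs"]        # cluster number -> probability in %
        best = max(probs, key=probs.get)    # assumption: hard-assign to the top cluster
        counts[ind["population"]][best] += 1
    return {pop: dict(c) for pop, c in counts.items()}


def population_mean_pcs(individuals):
    """Average principal component coordinates within each population."""
    sums = {}
    sizes = Counter()
    for ind in individuals:
        pop, pcs = ind["population"], ind["pcs"]
        if pop not in sums:
            sums[pop] = [0.0] * len(pcs)
        sums[pop] = [s + x for s, x in zip(sums[pop], pcs)]
        sizes[pop] += 1
    return {pop: [s / sizes[pop] for s in total] for pop, total in sums.items()}


# Toy individuals, invented for illustration (two components instead of twenty).
individuals = [
    {"id": "A1", "population": "PopA", "cluster_probs": {1: 90.0, 2: 10.0}, "pcs": [1.0, 4.0]},
    {"id": "A2", "population": "PopA", "cluster_probs": {1: 20.0, 2: 80.0}, "pcs": [3.0, 2.0]},
    {"id": "B1", "population": "PopB", "cluster_probs": {1: 5.0, 2: 95.0}, "pcs": [-2.0, 0.0]},
]

cluster_counts = population_cluster_counts(individuals)
mean_positions = population_mean_pcs(individuals)
print(cluster_counts)
print(mean_positions)
```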

Some observations

It does appear that within Africa two processes explain most of the genetic variation:
  1. Variable affiliation with West Eurasians, with North Africans being most West Eurasian-like, followed by some East Africans such as Ethiopians, and then Maasai
  2. Contrast between farmers and hunter-gatherer groups such as the San and Pygmies
With respect to Project participants:
  1. All members of East_African_D align with Ethiopians and Ethiopian Jews (cluster #2)
  2. All members of North_African_Jews_D align with a major cluster of the Morocco_Jews of Behar et al. (2010) (cluster #3)
  3. Members of North_African_D are split over three clusters: most are in the Algeria/Libya/North Morocco cluster #4; one is in the Egyptian cluster #5; three are in the Mozabite/Saharan/"Berber" cluster #7
  4. The Other_D participants include many African Americans, as well as some Arabs with some African admixture, etc. Clusters #8 and #10 seem to include many African Americans.
Hopefully this will be useful to project participants with an African background.


  1. Interesting how Tuscans cluster with Moroccan Jews but not with people from North Africa. I wonder if Greeks would also cluster with them.

  2. Tuscans don't cluster with Moroccan Jews. This is an "optical" illusion due to the lack of other European populations, of which the French Basques are quite an outlier.
    Going by those maps, you'd say that French Basques are closer to Moroccan Jews as well, while we all know that is not true.

  3. Thanks for the analysis, but the # of SNPs is really, really low. This can get quite noisy. Perhaps this analysis would have been better done with only the Henn+HGDP/Behar data (retaining roughly 50K SNPs).