Sunday, February 27, 2011

Clusters Galore with Dodecad populations

Here is a spreadsheet of the Clusters Galore analysis with Dodecad populations: 692 reference individuals + 261 Dodecad Project participants from 24 different populations with at least 5 members each:
Assyrian, Scandinavian, Greek, Finnish, S_Italian_Sicilian, Ashkenazi, German, Indian, Portuguese, Armenian, Russian, Spanish, British, Irish, Turkish, N_Italian, Balkans, Iranian, North_African, East_African, French, Chinese, Japanese, Polish
As a reminder to new readers, the Clusters Galore technique applies multidimensional scaling (MDS) to genomic data, converting ~152,000 SNPs into a small number of continuous dimensions that capture most of the variation, and then uses MCLUST to cluster individuals along these dimensions.

In total, 65 clusters were obtained when 10 MDS dimensions were retained.
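For readers who want to experiment, here is a minimal Python sketch of the two steps, built on clearly stated assumptions. The genotypes are simulated toy data, not Project data. Distances are plain Euclidean distances between 0/1/2 allele-count vectors, because the post does not say which genetic distance was used. Classical MDS is computed by power iteration. In place of R's MCLUST, which compares many covariance models, the sketch fits a single model by EM: a spherical Gaussian mixture with one shared variance, comparable to mclust's "EII" model. The number of clusters is then chosen by BIC. It keeps 2 dimensions rather than the 10 used here, and all function names are hypothetical.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def classical_mds(dist, dims, iters=300):
    """Classical (Torgerson) MDS: top `dims` eigenvectors of the double-centred matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]  # row means (= column means, D is symmetric)
    grand = sum(row) / n
    # B = -1/2 * J D^2 J is the Gram matrix of the centred coordinates
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for c in range(dims):
        v = [1.0 + 0.01 * i for i in range(n)]  # deterministic starting vector
        for _ in range(iters):  # power iteration towards the leading eigenvector
            w = [sum(bij * vj for bij, vj in zip(b[i], v)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(bij * vj for bij, vj in zip(b[i], v)) for i in range(n))
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so the next pass finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def fit_gmm(x, k, iters=100):
    """EM for a spherical Gaussian mixture with one shared variance; returns (labels, loglik, bic)."""
    n, d = len(x), len(x[0])
    means = [list(x[0])]  # farthest-point initialisation keeps the sketch deterministic
    while len(means) < k:
        far = max(range(n), key=lambda i: min(sqdist(x[i], m) for m in means))
        means.append(list(x[far]))
    var = max(sum(sqdist(p, means[0]) for p in x) / (n * d), 1e-9)
    weights = [1.0 / k] * k

    def e_step():
        resp, ll = [], 0.0
        for p in x:
            logs = [math.log(weights[c]) - 0.5 * d * math.log(2 * math.pi * var)
                    - sqdist(p, means[c]) / (2 * var) for c in range(k)]
            top = max(logs)
            total = top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp
            ll += total
            resp.append([math.exp(l - total) for l in logs])
        return resp, ll

    for _ in range(iters):
        resp = e_step()[0]
        for c in range(k):  # M-step: mixing weights and means
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * p[j] for r, p in zip(resp, x)) / nc for j in range(d)]
        var = max(sum(r[c] * sqdist(p, means[c]) for r, p in zip(resp, x)
                      for c in range(k)) / (n * d), 1e-9)  # M-step: shared variance
    resp, ll = e_step()
    n_params = k * d + 1 + (k - 1)  # means + shared variance + free mixing weights
    bic = -2 * ll + n_params * math.log(n)  # lower is better in this convention
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, ll, bic


def simulate_genotypes(n_pops=3, per_pop=8, n_snps=90, seed=1):
    """Toy 0/1/2 allele counts: SNP j has frequency 0.9 in population j % n_pops, else 0.1."""
    rng = random.Random(seed)
    geno, pops = [], []
    for p in range(n_pops):
        freqs = [0.9 if j % n_pops == p else 0.1 for j in range(n_snps)]
        for _ in range(per_pop):
            geno.append([sum(rng.random() < f for _ in range(2)) for f in freqs])
            pops.append(p)
    return geno, pops


geno, pops = simulate_genotypes()
dist = [[math.sqrt(sqdist(a, b)) for b in geno] for a in geno]
coords = classical_mds(dist, dims=2)  # the post retained 10 dimensions
fits = {k: fit_gmm(coords, k) for k in range(1, 6)}
best_k = min(fits, key=lambda k: fits[k][2])
print("BIC-selected number of clusters:", best_k)
```

Note that mclust itself reports BIC with the opposite sign (larger is better), and it searches over several covariance structures rather than the single one fixed here.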

Some observations:
  • Most Greeks and all South Italians/Sicilians continue to fall in the same cluster, #4. That the latter population remains unsplit and distinctive despite being one of the largest (20 individuals) suggests that it is homogeneous and lacks substantial regional inbreeding.
  • Cluster #2 includes most Germanic individuals, as well as the Irish.
  • Cluster #5 is made up mostly of Central/North Italians.
  • Non-Greek Balkan participants fall mostly in cluster #6, which also includes the admixed, non-Gypsy reference Romanians.
  • Project and reference Iberians (Spaniards and Portuguese) remain undifferentiated and distinctive, falling in cluster #14; my comments on South Italians/Sicilians probably apply to them as well.
  • There is a trace of structure in the Ashkenazi population, which is split into two clusters. This probably underscores the benefits of large samples in the inference of structure, as 25 Ashkenazi Jews have submitted their results to the Project.
  • Project Russians have split affiliations between a circum-Baltic cluster #3 and the Finnish cluster #7.
  • North Africans form two new clusters that do not overlap with either the reference Mozabites or Egyptians. There is great variety in North Africa, and the 8 people who have submitted their samples are a good start to learning about this region of the world.
  • The Chinese are split into two, one part aligning with the "southern" Miaozu and one part aligning with the "northern" Japanese.
Please do not ask me which cluster you fall in, as there will be a separate analysis of Project participants identified by their DOD number but without ethnic identifiers, in compliance with the Project's privacy policy.


  1. Thanks, Dienekes. Is there any way we can tell which clusters are most similar to each other? For example, if we look at a specific cluster (such as #4), which clusters did it share a parent cluster with before breaking up into these more specific clusters? Would a lower K value (fewer clusters) give us this answer?

  2. A Euclidean distance matrix of the clusters could be very interesting too.

    I am quite surprised to see a distinct French cluster appear (one that seems to be more common in the reference sample than in the Dodecad one).

  3. One can fairly conclude that, despite the total lack of information about that French sample from Lyons, the geneticists somehow selected people of mostly the same ethnic background (the Rhône valley, the Alps, Burgundy, ..., which roughly corresponds to the autochthonous population of the town).

    Nevertheless, as I stated earlier, Lyons sits on a remarkable ethnic border, and a very ancient one: Ligurian place names end a few miles south of Lyons, around Annonay, and that is still the border between the half-Oïl Arpitan dialects and the Occitan Vivaro-Alpine ones. It would therefore be much more interesting to know whether people in Drôme or Ardèche cluster with Provençal people rather than with Lyonnais people.

  4. The North African group has 8 members, who are split into two clusters:
    1) DOD168 and DOD169 from Msaken, Tunisia
    2) DOD330 (Tunisia), DOD363 (Algeria), DOD360 (Algeria), DOD348 (Morocco), DOD361 (Algeria), DOD359 (Morocco)

    These samples are among the most admixed analysed so far.
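The Euclidean distance matrix between clusters suggested in the comments, which would also give a rough answer to the question of which clusters are most similar, could be sketched as follows. This assumes each individual's retained MDS coordinates and cluster label are available. It measures distances between cluster centroids, which is only one reasonable choice. The coordinates and labels below are toy values, not Dodecad results, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def centroid_distance_matrix(coords, labels):
    """Pairwise Euclidean distances between cluster centroids in MDS space."""
    members = defaultdict(list)
    for point, lab in zip(coords, labels):
        members[lab].append(point)
    ids = sorted(members)
    # Centroid = per-dimension mean of the cluster's members
    cent = {c: [sum(col) / len(members[c]) for col in zip(*members[c])] for c in ids}
    matrix = {a: {b: math.dist(cent[a], cent[b]) for b in ids} for a in ids}
    return ids, matrix


def nearest_clusters(matrix, cluster):
    """Other clusters, ordered from the closest centroid to the farthest."""
    return sorted((c for c in matrix[cluster] if c != cluster), key=lambda c: matrix[cluster][c])


# Toy 2-D coordinates and hypothetical cluster labels (not real Dodecad output)
coords = [[0, 0], [2, 0], [10, 0], [10, 2], [0, 8], [2, 8]]
labels = ["A", "A", "B", "B", "C", "C"]
ids, matrix = centroid_distance_matrix(coords, labels)
for a in ids:
    print(a, " ".join(f"{matrix[a][b]:6.2f}" for b in ids))
print("closest to A:", nearest_clusters(matrix, "A"))
```

Sorting one row of the matrix, as `nearest_clusters` does, lists the other clusters from the closest to the farthest centroid.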