Saturday, January 14, 2012

fastIBD analysis of Balkans/West Asia

Now that I've discovered a way to boost Clusters Galore analysis even further by using fastIBD, I will start experimenting with different regional populations. This analysis took about 5 hours to complete, so it appears to be quite practical.

For my first experiment, I carry out an analysis of various populations from the Balkans and West Asia.

Clusters Galore

27 different clusters were inferred with 17 MDS dimensions. Some interesting findings:
  • For the first time there emerge a couple of clusters that appear to be quite specific to Armenians (#2 and #3). 
  • Similarly, Assyrians are broken into a few clusters that appear fairly specific to them (#9-11)
  • Georgians are split into three clusters, one of which (#14) is linked with the neighboring Abkhasians, who in turn have their own exclusive cluster (#25)
  • The cluster modal in Greeks (#6) includes 14 of 19 Greek participants, and a few Greeks are also in the Balkan cluster (#8) and an Iranian-Turkish cluster (#4)
  • The Behar Cypriot sample also splits into two, and the few Turkish Cypriot participants link to one of them (#13)
  • The Ossetian project participant links to one of the three North_Ossetian clusters
  • The major Balkan cluster (#8) still defies resolution. MCLUST adapts the cluster size and shape, and for now a "big", inclusive cluster spanning the Balkans appears more parsimonious than smaller clusters centered on the different groups. I am confident, however, that regional structure within this cluster will be uncovered with more participation.
I cannot stress the importance of participation strongly enough. When groups have more participants, it is possible to both:

  1. Discover group-specific clusters, by identifying what is common between members of groups
  2. Discover within-group clusters, by identifying what is different between members of groups
For example, the great participation of Armenians in the Project has now allowed me to discover structure within the Armenian population. It appears that cluster #2 corresponds to a more "western" Armenian group, and #3 to a more "eastern" one, with some overlap between the two.
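
To illustrate how two clusters can overlap, here is a minimal sketch of the soft assignment a Gaussian mixture model produces. It is not the MCLUST code used for this analysis: it assumes spherical Gaussians in two dimensions (MCLUST also considers richer covariance shapes), and the function name, cluster labels, weights, means and variances are all invented for the example.

```python
import math


def assignment_probabilities(point, clusters):
    """Posterior probability of each cluster for one individual.

    point    -- coordinates of the individual (e.g. MDS dimensions)
    clusters -- dict: name -> (weight, mean, variance), spherical Gaussians
                (a simplifying assumption, not MCLUST's full model family)
    """
    d = len(point)
    log_terms = {}
    for name, (weight, mean, var) in clusters.items():
        sq_dist = sum((x - m) ** 2 for x, m in zip(point, mean))
        log_density = -0.5 * (d * math.log(2 * math.pi * var) + sq_dist / var)
        log_terms[name] = math.log(weight) + log_density
    # Normalise in log space for numerical stability.
    top = max(log_terms.values())
    unnorm = {name: math.exp(v - top) for name, v in log_terms.items()}
    total = sum(unnorm.values())
    return {name: v / total for name, v in unnorm.items()}


# Hypothetical parameters: two equally weighted clusters two units apart.
clusters = {
    "cluster_2": (0.5, (0.0, 0.0), 1.0),
    "cluster_3": (0.5, (2.0, 0.0), 1.0),
}
midway = assignment_probabilities((1.0, 0.0), clusters)   # between the two
central = assignment_probabilities((0.0, 0.0), clusters)  # at cluster_2's mean
```

An individual lying midway between the two cluster centers is split evenly between them, while one sitting at the center of a cluster is assigned mostly, but not entirely, to that cluster.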

Inter-population IBD

You can also see a visual representation of inter-population IBD:

I have only included populations with 5+ participants in this representation. Reddish shades express high IBD sharing; bluish ones express low sharing. The heatmap has been scaled by row.

As you might expect, values along the diagonal are "reddish", since individuals within populations tend to have high IBD sharing with each other.
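
One way a population-by-population matrix like this could be assembled is sketched below. This is an assumption about the general approach, not a description of the actual pipeline: it averages pairwise IBD sharing over all pairs of individuals and drops populations with fewer than 5 participants. The function name, the input layout and the toy numbers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

MIN_PARTICIPANTS = 5  # threshold stated in the post


def population_ibd_matrix(labels, pair_ibd, min_size=MIN_PARTICIPANTS):
    """Average pairwise IBD sharing between (and within) populations.

    labels   -- dict: individual id -> population name
    pair_ibd -- dict: frozenset({id_a, id_b}) -> shared IBD (hypothetical units)
    Pairs missing from pair_ibd are treated as sharing 0.
    """
    members = defaultdict(list)
    for individual, population in labels.items():
        members[population].append(individual)
    # Keep only populations with enough participants.
    pops = sorted(p for p, inds in members.items() if len(inds) >= min_size)

    matrix = {}
    for p in pops:
        for q in pops:
            if p == q:
                pairs = list(combinations(members[p], 2))
            else:
                pairs = [(a, b) for a in members[p] for b in members[q]]
            total = sum(pair_ibd.get(frozenset(pair), 0.0) for pair in pairs)
            matrix[p, q] = total / len(pairs) if pairs else 0.0
    return pops, matrix


# Toy data: two populations of five, plus one singleton that is filtered out.
labels = {f"A{i}": "PopA" for i in range(5)}
labels.update({f"B{i}": "PopB" for i in range(5)})
labels["C0"] = "PopC"
pair_ibd = {
    frozenset((a, b)): 10.0 if labels[a] == labels[b] else 2.0
    for a, b in combinations(labels, 2)
}
pops, matrix = population_ibd_matrix(labels, pair_ibd)
```

In this toy input the diagonal entries come out higher than the off-diagonal ones, which is the same pattern that makes the diagonal of the heatmap reddish.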

A few features "pop out" of the screen. Going from top to bottom:
  • Intra-Iranic sharing
  • Intra-Armenian sharing
  • Intra-Balkan sharing
  • Georgian-Abkhaz sharing
You can probably get more out of the figure, but these appear to be the most salient features.

Results for Project Participants

The results can be found in the spreadsheet, and include:
  • Probabilities of assignment in each of the 27 clusters of the Clusters Galore analysis
  • Z-scores of IBD between each individual and each of the 20 populations with 5+ participants. Higher values mean more IBD sharing. Note that Z-scores have been calculated for each row, hence each participant must scan his own row to find populations with an excess (+) or deficiency (-) of IBD sharing, and people should not compare across different rows.
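
The per-row standardisation can be sketched as follows. This is a minimal illustration rather than the code behind the spreadsheet: the function name and the sample values are made up, and the use of the population standard deviation (rather than the sample one) is an assumption.

```python
from statistics import mean, pstdev


def row_z_scores(row):
    """Standardise one participant's IBD values against that same row.

    row -- dict: population name -> IBD sharing for this participant
    Uses the population standard deviation (an assumption).
    """
    values = list(row.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat row carries no excess or deficiency anywhere.
        return {pop: 0.0 for pop in row}
    return {pop: (value - mu) / sigma for pop, value in row.items()}


# Two hypothetical participants; the second shares ten times more IBD overall.
z_first = row_z_scores({"PopA": 2.0, "PopB": 4.0, "PopC": 6.0})
z_second = row_z_scores({"PopA": 20.0, "PopB": 40.0, "PopC": 60.0})
```

Both hypothetical participants end up with the same Z-scores even though their absolute sharing differs tenfold. This is why a score is only meaningful relative to the other entries in the same row.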
Last but not least, I want to remind new project participants to leave a message in the Information about Project samples thread. Your comment will not appear immediately, since comment moderation is on, and also note that there are multiple pages of comments. 

If you haven't joined the Project yet, I encourage you to do so if you are eligible.


  1. very very nice, thank you for this. however, i would have expected to be included, i am DOD772, romanian. did you include me into Romanians_D?

  2. Hi, I've responded via e-mail, since I don't want to discuss participants' ancestry except via the e-mail address used to submit the sample.

  3. Thank you for the wonderful work. I was curious about the 18th cluster, consisting of two Georgians. Could you indicate if those are admixed individuals? (There's at least one obvious case, №19

  4. Gosh! When did the Dodecad Armenians reach 44 participants?

  5. You should have also included Ashkenazi Jews and Sephardi Jews.

  6. Why are the Yunusbayev Kurds so remote from everyone else in the heat map?

  7. @Onur might have something to do with the fact that the samples are taken from Kurds in Kazakhstan otherwise I don't know.

  8. @Onur might have something to do with the fact that the samples are taken from Kurds in Kazakhstan otherwise I don't know.

    Yeah, that was the explanation I had in mind too (Mr. Metspalu had already informed me that the Yunusbayev Kurdish samples are all from the Kurdish minority in Kazakhstan). Thanks for the explanation anyway.

    For those who don't know, Kurds in Kazakhstan are a small and isolated minority and arrived there during the Soviet times from Transcaucasia mostly as a result of Stalin's mass deportations, so they are genetically a subset of the Transcaucasian Kurds with a probable recent population bottleneck added due to their small number and isolation from outside.

  9. Dienekes, in view of all this, what is your opinion on Armenian origins in the Balkans?

  10. I am sure that Armenians originated in the Balkans because:
    1. of Herodotus' account
    2. of the close linguistic relationship with Greek
    3. of the lack of a close linguistic relationship with the Anatolian languages

    To what extent that involved a substantial movement of people is a different issue that is difficult to resolve in the absence of ancient DNA.

  11. Of course, but more specifically, are you seeing any IBD sharing that could point to this movement of people?