For my first experiment, I carry out an analysis of various populations from the Balkans and West Asia.
27 different clusters were inferred with 17 MDS dimensions. Some interesting findings:
- For the first time there emerge a couple of clusters that appear to be quite specific to Armenians (#2 and #3).
- Similarly, Assyrians are broken into a few clusters that appear fairly specific to them (#9-11).
- Georgians are split into three clusters, one of which (#14) is linked with the neighboring Abkhasians, who in turn have their own exclusive cluster (#25)
- The cluster modal in Greeks (#6) includes 14 of 19 Greek participants, and a few Greeks are also in the Balkan cluster (#8) and an Iranian-Turkish cluster (#4)
- The Behar Cypriot sample also splits into two, and the few Turkish Cypriot participants link to one of them (#13)
- The Ossetian project participant links to one of the three North_Ossetian clusters
- The major Balkan cluster (#8) still defies resolution. MCLUST adapts the size and shape of each cluster, and for now a single "big", inclusive cluster spanning the Balkans appears more parsimonious than smaller clusters centered on the different groups. I am certain, however, that regional structure in this cluster will be uncovered with more participation.
- Discover group-specific clusters, by identifying what is common between members of groups
- Discover within-group clusters, by identifying what is different between members of groups
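The parsimony point about the Balkan cluster (#8) can be illustrated with a toy calculation. MCLUST ranks candidate models by BIC, which rewards fit but charges for every extra parameter. The sketch below is not MCLUST itself (which fits Gaussian mixtures by EM in many dimensions); it is a simplified one-dimensional comparison, assuming hypothetical group labels and made-up coordinates, of one pooled Gaussian against one Gaussian per group.

```python
import math

def gaussian_loglik(values):
    """Maximised log-likelihood of a single 1-D Gaussian fitted to `values`."""
    n = len(values)
    mean = sum(values) / n
    # Small floor on the variance avoids log(0) for degenerate groups.
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-9)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def bic_single(groups):
    """BIC (mclust convention, higher is better) for one pooled cluster."""
    pooled = [v for g in groups.values() for v in g]
    n_params = 2  # one mean, one variance
    return 2 * gaussian_loglik(pooled) - n_params * math.log(len(pooled))

def bic_per_group(groups):
    """BIC for one cluster per labelled group, with estimated group weights."""
    n = sum(len(g) for g in groups.values())
    loglik = sum(
        len(g) * math.log(len(g) / n) + gaussian_loglik(g)
        for g in groups.values()
    )
    # Mean and variance per group, plus (number of groups - 1) free weights.
    n_params = 2 * len(groups) + (len(groups) - 1)
    return 2 * loglik - n_params * math.log(n)

# Hypothetical coordinates on one MDS dimension (made-up numbers).
overlapping = {"A": [-1.2, -0.3, 0.2, 0.9], "B": [-0.7, 0.0, 0.5, 1.3]}
separated = {"A": [0.0, 0.1, -0.1, 0.05, -0.05],
             "B": [10.0, 10.1, 9.9, 10.05, 9.95]}

for name, groups in (("overlapping", overlapping), ("separated", separated)):
    print(name, round(bic_single(groups), 2), round(bic_per_group(groups), 2))
```

With few, overlapping individuals the pooled cluster scores higher, because the extra parameters of the per-group model are not paid for by a better fit. Once the groups are clearly distinct, the per-group model wins.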
You can also see a visual representation of inter-population IBD:
As you might expect, values along the diagonal are "reddish", since individuals within populations tend to have high IBD sharing with each other.
A few features "pop out" of the screen. Going from top to bottom:
- Intra-Iranic sharing
- Intra-Armenian sharing
- Intra-Balkan sharing
- Georgian-Abkhaz sharing
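For readers who want to build a similar population-by-population view from their own data, here is a minimal sketch. It assumes that each cell is the mean of the pairwise IBD values between individuals of the two populations; the post does not state exactly how the cells were computed, and the function name, individual IDs, and numbers below are all hypothetical.

```python
from collections import defaultdict

def mean_ibd_by_population(pairwise_ibd, population_of):
    """Average pairwise IBD values into an unordered population-pair table.

    pairwise_ibd: {(individual_a, individual_b): ibd_value}
    population_of: {individual: population_label}
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (a, b), ibd in pairwise_ibd.items():
        # Sort the labels so (X, Y) and (Y, X) land in the same cell.
        key = tuple(sorted((population_of[a], population_of[b])))
        totals[key] += ibd
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Made-up example: two Georgians and two Abkhaz individuals.
population_of = {"g1": "Georgian", "g2": "Georgian",
                 "a1": "Abkhaz", "a2": "Abkhaz"}
pairwise_ibd = {
    ("g1", "g2"): 4.0, ("a1", "a2"): 6.0,
    ("g1", "a1"): 3.0, ("g2", "a1"): 2.0,
    ("g1", "a2"): 1.0, ("g2", "a2"): 2.0,
}

matrix = mean_ibd_by_population(pairwise_ibd, population_of)
for pair, value in sorted(matrix.items()):
    print(pair, value)
```

Cells whose two labels match correspond to the diagonal of the heat map, and the remaining cells to between-population sharing.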
Results for Project Participants
The results can be found in the spreadsheet, and include:
- Probabilities of assignment in each of the 27 clusters of the Clusters Galore analysis
- Z-scores of IBD between each individual and each of the 20 populations with 5+ participants. Higher values mean more IBD sharing. Note that Z-scores have been calculated separately for each row, so each participant should scan his own row to find populations with an excess (+) or deficiency (-) of IBD sharing. Values should not be compared across different rows.
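The row-wise standardization can be sketched as follows. This is only an illustration of why rows are not comparable: the population names and values are made up, and the use of the population standard deviation (rather than the sample one) is an assumption, not something stated in the post.

```python
import statistics

def row_z_scores(row):
    """Standardize one participant's IBD values against that same row.

    row: {population_label: ibd_value} for a single participant.
    """
    values = list(row.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        # A flat row carries no excess or deficiency anywhere.
        return {pop: 0.0 for pop in row}
    return {pop: (value - mean) / sd for pop, value in row.items()}

# Two hypothetical participants; the second shares twice as much IBD overall.
participant_1 = {"Greek": 5.0, "Armenian": 2.0, "Georgian": 2.0}
participant_2 = {pop: 2 * value for pop, value in participant_1.items()}

z1 = row_z_scores(participant_1)
z2 = row_z_scores(participant_2)
print(z1)
print(z2)
```

Both participants end up with the same Z-scores even though their raw sharing differs by a factor of two. A Z-score only says where a population stands relative to the rest of that participant's own row.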
If you haven't joined the Project yet, I encourage you to do so if you are eligible.