Tuesday, February 14, 2012

ChromoPainter/fineSTRUCTURE analysis of Balkans/West Asia

I have carried out a ChromoPainter/fineSTRUCTURE analysis of Balkans/West Asia. This is a slightly different dataset than the one used in the previous fastIBD analysis of the same region. It also took much longer (about a week, with two CPUs dedicated to the task) to complete, so it is not something that can be done routinely.

Technical details (skip if you want)

413 individuals from 33 populations were studied on 258,100 SNPs, after the --geno 0.03 and --maf 0.01 filters were applied. Data were phased with Beagle, using the default 10 iterations. Genetic maps from the HapMap were used. fineSTRUCTURE was run on the ChromoPainter output, with 500,000 burn-in and 500,000 runtime iterations.
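For readers who want to see what the two SNP filters mean in practice, here is a minimal sketch in Python. It is not the code used for this analysis. It assumes genotypes are coded as 0/1/2 copies of one allele with None for a missing call, and the function names are hypothetical. A SNP is kept if its missing-call rate does not exceed 0.03 and its minor allele frequency is not below 0.01, which is how the PLINK-style --geno and --maf thresholds are usually read.

```python
def missing_rate(genotypes):
    """Fraction of missing calls (None) for one SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_freq(genotypes):
    """Minor allele frequency among called genotypes (0/1/2 allele counts)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def passes_filters(genotypes, max_missing=0.03, min_maf=0.01):
    """Keep a SNP only if it is well genotyped and not too rare."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_freq(genotypes) >= min_maf)


# Illustrative, made-up SNPs over 100 individuals.
common_snp = [0, 1, 2, 1] * 25            # no missing calls, frequency 0.5
poorly_typed_snp = [None] * 4 + [1] * 96  # 4% missing calls
monomorphic_snp = [0] * 100               # minor allele never observed
```

With these toy inputs, only common_snp passes both filters.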

25 Inferred Populations

fineSTRUCTURE imposes a tree structure on a number of inferred populations. The following heatmap shows this tree structure; columns represent donor populations and rows represent recipient populations.

There was a total of 25 populations, labeled pop0, pop1, ..., pop24.

The following table summarizes how many individuals from each original population were assigned to each inferred population:

I will limit myself to populations which include Dodecad Project members:

  • pop6 includes a Project North Ossetian, as well as all Yunusbayev et al. North Ossetians
  • pop7 is mainly Armenian
  • pop16 is also mainly Armenian; it would be interesting to see whether this bipartite division of Armenians is in agreement with the one inferred in the previous fastIBD analysis
  • pop8 is mainly Greek, and appears to be "continental Greek"; it also includes some other Balkan individuals
  • pop14 is also Greek, and includes a variety of people with ancestry from Crete, the Aegean, Cyprus, Asia Minor, Cappadocia, and the Pontus, as well as continental Greece. It could be labeled "eastern Greek"
  • pop11 is Cypriot, including the single 100% Greek Cypriot of the Project, all 3 100% Turkish Cypriots, as well as a Turkish individual of partial Turkish_Cypriot ancestry
  • pop10 is Turkish, and includes people with some ancestry from the Balkans, as well as Anatolia. It could be labeled "Balkan Turkish"
  • pop13 is also Turkish, and seems to include people with ancestry exclusively from Anatolia, including almost all the Behar et al. Turks
  • pop15 is Assyrian; some Assyrians also fall in the aforementioned pop16, which includes mainly Armenians
  • pop18 could be labeled "North Balkan"; there is probably structure to be uncovered within this cluster, once more participants from the Balkans join the Project
  • pop20 is "Georgian-Abkhazian"
  • pop21 is "Kurdish-Iranian"
  • pop22 could be labeled "Northeastern Anatolia" or (more classically) "Pontus-Colchis". It appears to unite various individuals from Northeastern Turkey and neighboring Georgia, having Karadeniz Turkish, Armenian, Pontic Greek, and Kartvelian ancestry. I strongly encourage participants from this region to join the Project, especially Pontic Greeks, as there are no 100% Pontic Greeks currently in the Project.
  • pop23 is "Bulgarian-Romanian" mainly, and also includes one Serb. Once again, I emphasize that the power of this approach using haplotypes depends on participation, so I encourage all people from the Balkans to consider joining the Project.

Principal Components Analysis

I have also used the PCA feature of fineSTRUCTURE to carry out principal components analysis. I am plotting the first two dimensions of this PCA, using my own visualization code that places labels in the average position on the plane:
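The label placement described above can be sketched as follows. This is not my actual visualization code, only a minimal illustration of the idea: each population label goes at the mean of its members' coordinates on the first two components. The function name and the input layout (population, PC1, PC2) are assumptions made for the example.

```python
from collections import defaultdict


def label_positions(points):
    """Return {population: (mean_pc1, mean_pc2)} for labeling a PCA plot.

    points: iterable of (population, pc1, pc2) tuples, one per individual.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x sum, y sum, count
    for population, pc1, pc2 in points:
        acc = sums[population]
        acc[0] += pc1
        acc[1] += pc2
        acc[2] += 1
    return {pop: (sx / n, sy / n) for pop, (sx, sy, n) in sums.items()}


# Made-up coordinates, for illustration only.
example_points = [
    ("Greek", 0.0, 1.0),
    ("Greek", 2.0, 3.0),
    ("Armenian", -1.0, 0.5),
]
example_labels = label_positions(example_points)
```

A label placed this way sits at the centre of its population's dots, so a widely scattered population can have its label far from any single member.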


Results for Project participants are included in the spreadsheet.

  • Population matrix: shows how many individuals from each population were assigned to each cluster
  • Z score population matrix: shows the normalized number of "chunks" copied from each donor population (columns) to each recipient population (rows). Do not compare across rows! Read the table one row at a time: within a row, higher values indicate more sharing. For example, the "Cypriots" population has pop11 as its main donor.
  • Individual assignments: the pop number that all Project and reference IDs were assigned to
  • Individual Chunkcounts: the number of chunks copied from each donor population (column) to each individual (row)
  • Individual PCA: your PCA co-ordinates that can help you find your dot on the Principal Components Analysis graphic (see above)

Averaged results were included only for populations with at least 5 members.
The raw chunkcounts for all 413x413 individuals can be found here.
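To make the warning about rows concrete, here is a hedged sketch of how averaged, row-normalized values could be produced from individual chunkcounts. The exact normalization behind the spreadsheet is not spelled out in this post, so this assumes a plain Z score computed within each recipient row (using the population standard deviation), and all function and variable names are hypothetical. Because every row is centered and scaled on its own, a value in one row says nothing about a value in another row.

```python
from collections import defaultdict
from statistics import mean, pstdev


def average_by_population(chunkcounts, labels, min_members=5):
    """Average individual rows per recipient population.

    chunkcounts: {individual: [chunks from donor pop0, pop1, ...]}
    labels: {individual: recipient population name}
    Populations with fewer than min_members individuals are skipped.
    """
    rows = defaultdict(list)
    for individual, row in chunkcounts.items():
        rows[labels[individual]].append(row)
    return {
        pop: [mean(column) for column in zip(*members)]
        for pop, members in rows.items()
        if len(members) >= min_members
    }


def row_z_scores(row):
    """Standardize one recipient row: subtract its mean, divide by its spread."""
    mu = mean(row)
    sd = pstdev(row)
    if sd == 0:
        return [0.0 for _ in row]
    return [(value - mu) / sd for value in row]


# Made-up numbers: five individuals in population "A", one in "B".
toy_counts = {f"a{i}": [10.0, 20.0, 30.0] for i in range(5)}
toy_counts["b0"] = [1.0, 2.0, 3.0]
toy_labels = {name: ("A" if name.startswith("a") else "B") for name in toy_counts}

averaged = average_by_population(toy_counts, toy_labels)
z_matrix = {pop: row_z_scores(row) for pop, row in averaged.items()}
```

In this toy example "B" is dropped for having fewer than 5 members, and the highest Z score in row "A" marks its main donor, which is the only kind of comparison the table supports.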


  1. So, there is no fully Anatolian (including Pontus) Greek project participant, and no fully Balkan Turkish project participant except the single Dodecad Turkish individual assigned to the "continental Greek" cluster (pop8). pop14 is a cluster of "Greeks with both Balkan and Anatolian origins" rather than "eastern Greek", and pop10 is a cluster of "Turks with both Balkan and Anatolian origins" rather than "Balkan Turkish".

  2. Once again, Bulgarians and Romanians show up pretty close to each other. I think their genetic unity stems from pre-Slavic times and points to an ancient east Balkan genetic pool (Thracians?). The West Balkans need much more sampling.

    BTW, Dienekes, where are the Turks, Kurds and Iranians that show up in the "major Armenian" cluster (pop7) from?

  3. In the coming weeks, depending on the success of the 23andMe test that I ordered for my mother, I will certainly provide the raw data. She is Serbian from Dalmatia, from the old Serbian Krajina community.

  4. BTW, Dienekes, where are the Turks, Kurds and Iranians that show up in the "major Armenian" cluster (pop7) from?

    I made a typo here. It should be "where are the Turks, Kurds and Assyrians that show up in the "major Armenian" cluster (pop7) from?"

    I can ask the same question for the Assyrians that show up in the "minor Armenian" cluster (pop16).

  5. Great stuff! I would argue that pop14, the "Eastern Greeks", composed, as it were, of Greek individuals from Crete, the Aegean and Anatolia, is better viewed as an Aegean/West Anatolian cluster. This is supported by pop14's close genetic affinity to pop11, the Cypriot cluster.

  6. It's not Aegean/West Anatolian. There are Project participants with various types of eastern (relative to the Greek mainland) ancestry, including Crete, the Aegean, Cyprus, Asia Minor, Cappadocia and Pontus, most of them having mixes of some of the above and Greek mainland.