Technical details (skip if you want)
413 individuals from 33 populations were studied, on the 258,100 SNPs that remained after the --geno 0.03 --maf 0.01 filters were applied. Data were phased in Beagle with the default 10 iterations. Genetic maps from the HapMap were used. fineSTRUCTURE was run on the ChromoPainter output, with 500,000 burn-in and 500,000 runtime iterations.
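For readers unfamiliar with the two filters, here is a minimal sketch of what they do. The flag names match PLINK's options for per-SNP missingness (--geno) and minor allele frequency (--maf), and I am assuming that is the meaning intended here. The genotype coding, the function names, and the toy data are all hypothetical and only serve the illustration.

```python
# Illustrative sketch only. Genotypes are coded 0/1/2 (copies of one allele),
# with None standing for a missing call. Thresholds mirror --geno 0.03 --maf 0.01.

def missing_rate(genotypes):
    """Fraction of individuals with no call at this SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among the non-missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)

def passes_filters(genotypes, max_missing=0.03, min_maf=0.01):
    """Keep a SNP if missingness is at most max_missing and MAF is at least min_maf."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_frequency(genotypes) >= min_maf)

# Hypothetical toy data: SNP name -> one genotype per individual.
snps = {
    "snp_a": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, fails the MAF filter
    "snp_c": [0, 1, None, None, 2, 1, 0, 1, 1, 0],  # 20% missing, fails the missingness filter
}
kept = [name for name, g in snps.items() if passes_filters(g)]
```

In this toy example only snp_a survives both filters.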
25 Inferred Populations
fineSTRUCTURE imposes a tree structure on the populations it infers. The following heatmap shows this tree structure; columns represent donor populations and rows represent recipient populations.
A total of 25 populations were inferred, labeled pop0, pop1, ..., pop24.
The following table summarizes how many individuals from each original population were assigned to each inferred population:
I will limit myself to populations which include Dodecad Project members:
- pop6 includes a Project North Ossetian, as well as all Yunusbayev et al. North Ossetians
- pop7 is mainly Armenian
- pop16 is also mainly Armenian; it would be interesting to see whether this bipartite division of Armenians is in agreement with the one inferred in the previous fastIBD analysis
- pop8 is mainly Greek, and appears to be "continental Greek"; it also includes some other Balkan individuals
- pop14 is also Greek, and includes a variety of people with ancestry from Crete, the Aegean, Cyprus, Asia Minor, Cappadocia, and the Pontus, as well as continental Greece. It could be labeled "eastern Greek"
- pop11 is Cypriot, including the single 100% Greek Cypriot of the Project, all 3 100% Turkish Cypriots, as well as a Turkish individual of partial Turkish Cypriot ancestry
- pop10 is Turkish, and includes people with some ancestry from the Balkans, as well as Anatolia. It could be labeled "Balkan Turkish"
- pop13 is also Turkish, and seems to include people with ancestry exclusively from Anatolia, including almost all the Behar et al. Turks
- pop15 is Assyrian; some Assyrians also fall in the aforementioned pop16, which includes mainly Armenians
- pop18 could be labeled "North Balkan"; there is probably structure to be uncovered within this cluster, once more participants from the Balkans join the Project
- pop20 is "Georgian-Abkhazian"
- pop21 is "Kurdish-Iranian"
- pop22 could be labeled "Northeastern Anatolia" or (more classically) "Pontus-Colchis". It appears to unite various individuals from Northeastern Turkey and neighboring Georgia, having Karadeniz Turkish, Armenian, Pontic Greek, and Kartvelian ancestry. I strongly encourage participants from this region to join the Project, especially Pontic Greeks, as there are no 100% Pontic Greeks currently in the Project.
- pop23 is "Bulgarian-Romanian" mainly, and also includes one Serb. Once again, I emphasize that the power of this approach using haplotypes depends on participation, so I encourage all people from the Balkans to consider joining the Project.
I have also used the PCA feature of fineSTRUCTURE to carry out principal components analysis. I am plotting the first two dimensions of this PCA, using my own visualization code that places labels in the average position on the plane:
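The label placement can be sketched as follows. This is not my actual visualization code, just a minimal illustration of the idea of averaging the first two PCA coordinates per label; the function name and the coordinates are hypothetical.

```python
from collections import defaultdict

def label_positions(points):
    """Return {label: (mean_pc1, mean_pc2)} for an iterable of (label, pc1, pc2)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running sum of pc1, pc2, and a count
    for label, pc1, pc2 in points:
        acc = sums[label]
        acc[0] += pc1
        acc[1] += pc2
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

# Hypothetical coordinates, one tuple per individual.
points = [
    ("pop7", 0.10, 0.20),
    ("pop7", 0.30, 0.40),
    ("pop8", -0.50, 0.00),
]
positions = label_positions(points)
```

Each label is then drawn at its entry in positions, so a label sits in the middle of the individuals it describes rather than next to any single dot.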
Results for Project participants are included in the spreadsheet.
- Population matrix, shows how many individuals from each population were assigned to each cluster
- Z score population matrix, shows the normalized number of "chunks" from each donor population (columns) to each recipient (row). Do not compare across rows! The way to read this table is the following: for each row, higher values indicate more sharing. For example, the "Cypriots" population has pop11 as its main donor.
- Individual assignments: the pop number to which each Project and reference ID was assigned
- Individual Chunkcounts: the number of chunks copied from each donor population (column) to each individual (row)
- Individual PCA: your PCA co-ordinates that can help you find your dot on the Principal Components Analysis graphic (see above)
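To illustrate why the Z scores should only be read within a row, here is a small sketch. It assumes the normalization is an ordinary z-score computed separately for each recipient row (subtract the row mean, divide by the row standard deviation); the exact normalization behind the spreadsheet may differ. All names and numbers are hypothetical.

```python
from statistics import mean, pstdev

def row_z_scores(row):
    """Standardize one recipient's chunk counts across its donor populations."""
    mu = mean(row)
    sigma = pstdev(row)
    if sigma == 0:
        return [0.0 for _ in row]  # a flat row carries no donor preference
    return [(x - mu) / sigma for x in row]

donors = ["pop10", "pop11", "pop14"]

# Hypothetical chunk counts: one row per recipient, one column per donor.
chunk_counts = {
    "recipient_a": [10.0, 40.0, 10.0],
    "recipient_b": [300.0, 100.0, 200.0],
}

z = {name: row_z_scores(row) for name, row in chunk_counts.items()}

# The main donor of a recipient is the column with the highest value in its own row.
main_donor = {name: donors[scores.index(max(scores))] for name, scores in z.items()}
```

Because each row is centered and scaled on its own, a value in the recipient_a row says nothing about how it ranks against a value in the recipient_b row, even though the raw counts for recipient_b are larger throughout.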
The raw chunkcounts for all 413x413 pairs of individuals can be found here.
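If you want to work with the raw matrix yourself, one natural step is to collapse donors into their inferred populations. The sketch below assumes the matrix has been loaded as a nested dictionary of recipient to donor to chunk count, and that the individual assignments are available as a dictionary; the file format, the IDs, and the function name are hypothetical, and this is not necessarily how the spreadsheet itself was produced.

```python
def chunks_by_donor_population(counts, assignment):
    """Sum each recipient's chunk counts over the inferred population of each donor.

    counts[recipient][donor] is a chunk count; assignment[individual] is a pop label.
    Returns totals[recipient][pop].
    """
    totals = {}
    for recipient, row in counts.items():
        per_pop = {}
        for donor, chunks in row.items():
            pop = assignment[donor]
            per_pop[pop] = per_pop.get(pop, 0.0) + chunks
        totals[recipient] = per_pop
    return totals

# Hypothetical toy data with three individuals and no self-copying entries.
assignment = {"id1": "pop7", "id2": "pop7", "id3": "pop15"}
counts = {
    "id1": {"id2": 12.5, "id3": 4.0},
    "id2": {"id1": 11.0, "id3": 5.5},
    "id3": {"id1": 3.0, "id2": 2.0},
}
totals = chunks_by_donor_population(counts, assignment)
```

Here totals["id3"] adds up the chunks that id3 copied from both pop7 members, which gives one column per donor population instead of one per donor individual.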