Sunday, March 11, 2012

ChromoPainter/fineSTRUCTURE analysis of Italy/Balkans/Anatolia

This was done on the same dataset as the previous fastIBD analysis.

The population assignments:



The heatmap, showing relationship between inferred populations:


The principal components analysis:



The correspondence between inferred populations and K12b components:



Results for Project participants can be found in this spreadsheet. Remember that in the chunkcounts tabs, columns represent donor populations and rows represent recipient populations.
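To make the row/column convention concrete, here is a minimal sketch of reading such a table: each row (recipient) is normalized so its chunk counts become fractions copied from each donor, and the largest entry in a row gives that recipient's top donor. The population names, numbers, and helper functions (`row_fractions`, `top_donor`) are hypothetical illustrations, not part of the spreadsheet or of ChromoPainter/fineSTRUCTURE.

```python
# Hypothetical chunk-count table: outer keys are recipient populations (rows),
# inner keys are donor populations (columns). Numbers are made up.
counts = {
    "PopA": {"PopA": 60.0, "PopB": 30.0, "PopC": 10.0},
    "PopB": {"PopA": 25.0, "PopB": 55.0, "PopC": 20.0},
}


def row_fractions(table):
    """Convert each recipient's chunk counts into fractions that sum to 1."""
    fractions = {}
    for recipient, donors in table.items():
        total = sum(donors.values())
        fractions[recipient] = {donor: c / total for donor, c in donors.items()}
    return fractions


def top_donor(table, recipient):
    """Return the donor population contributing the most chunks to a recipient row."""
    donors = table[recipient]
    return max(donors, key=donors.get)


fr = row_fractions(counts)
print(fr["PopA"]["PopB"])          # share of PopA's chunks copied from PopB
print(top_donor(counts, "PopB"))   # donor with the largest count in PopB's row
```

Reading along a row answers "whom does this population copy from?"; reading down a column answers "to whom does this population donate?", which is why mixing up the two orientations changes the interpretation.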

Monday, March 5, 2012

fastIBD analysis of Italy/Balkans/Anatolia

I have included the new Turkish data from Hodoğlugil & Mahley (2012) in this analysis. Additionally, there are now 5 participants in the Serb_D and Turkish_Cypriot_D sub-populations, as well as a Bosnian Muslim. There are now project participants from many Balkan countries, although Albania, the fYROM, and Croatia remain as "black holes" in the map.

Still, I am hopeful that there will be more project participants from currently under-represented populations. I have already started processing the same dataset with ChromoPainter (which takes much longer), and hopefully that analysis will be posted at the end of this week or the beginning of the next one.

First, the heatmap of inter-population IBD:

Remember that the tree groups similar populations together and that, for each row in the matrix, the red end of the spectrum indicates a lot of IBD sharing and the blue end little IBD sharing. Additionally, I have now calculated the median IBD sharing, which is more robust to the presence of potential relatives in the data.
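Why the median helps can be seen with a toy example: if one cross-population pair happens to be close relatives, their very large IBD sharing drags the mean upward, while the median of all pairwise values barely moves. This is a minimal sketch under stated assumptions: the sharing scores are made-up numbers in arbitrary units, and `pairwise_values` is a hypothetical helper, not part of fastIBD or BEAGLE.

```python
from statistics import mean, median


def pairwise_values(ibd, pop_a, pop_b):
    """Collect the IBD sharing score of every distinct pair (i, j), i in pop_a, j in pop_b.

    ibd maps frozenset({i, j}) -> sharing score; missing pairs count as 0.
    Each unordered pair is counted once, and self-pairs are skipped.
    """
    seen = set()
    values = []
    for i in pop_a:
        for j in pop_b:
            pair = frozenset((i, j))
            if i == j or pair in seen:
                continue
            seen.add(pair)
            values.append(ibd.get(pair, 0.0))
    return values


# Hypothetical data: every cross-population pair shares a little ...
pop_x = ["x1", "x2", "x3"]
pop_y = ["y1", "y2", "y3"]
ibd = {frozenset((a, b)): 0.1 for a in pop_x for b in pop_y}
# ... except one pair of (hypothetical) close relatives.
ibd[frozenset(("x1", "y1"))] = 20.0

values = pairwise_values(ibd, pop_x, pop_y)
print("mean:", mean(values))      # inflated by the single related pair
print("median:", median(values))  # stays at the typical level of sharing
```

With eight unrelated pairs and one related pair, the median remains at the unrelated level, whereas the mean rises more than twentyfold; a single undetected relative pair can therefore distort a mean-based heatmap cell but not a median-based one.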

The results appear fairly reasonable, with the Balkan, Anatolian, and Italian populations of the title forming separate branches, and the mainland Greek sample joining with Central/South Italians and Sicilians.

The Clusters Galore results can be seen below; 28 clusters were inferred using 21 dimensions:



Results for Project participants can be found in the spreadsheet. They include the probability that each ID is assigned to each of the 28 clusters, as well as Z-scores comparing each individual against all populations with 5 or more individuals. Read the Z-scores row by row: high values indicate a high degree of IBD sharing, and low values a low degree of IBD sharing.
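The post does not spell out the exact Z-score formula, so the following is only a minimal sketch of one plausible reading: for a single individual (one row), take that person's mean IBD sharing with each population, keep only populations with at least 5 members, and standardize the values across that row. The threshold constant, function name, and numbers are hypothetical illustrations, not the spreadsheet's actual computation.

```python
from statistics import mean, pstdev

MIN_POP_SIZE = 5  # assumption: only populations with 5+ individuals are scored


def row_zscores(sharing_by_pop, pop_sizes):
    """Standardize one individual's IBD sharing across eligible populations.

    sharing_by_pop: {population: this individual's mean IBD sharing with it}
    pop_sizes: {population: number of individuals in that population}
    Returns {population: Z-score}; higher means more sharing than this
    individual's average across the scored populations.
    """
    eligible = {
        pop: value
        for pop, value in sharing_by_pop.items()
        if pop_sizes.get(pop, 0) >= MIN_POP_SIZE
    }
    values = list(eligible.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # All eligible populations share equally: no population stands out.
        return {pop: 0.0 for pop in eligible}
    return {pop: (value - mu) / sd for pop, value in eligible.items()}


# Hypothetical row for one participant; PopD is too small to be scored.
sharing = {"PopA": 1.0, "PopB": 2.0, "PopC": 3.0, "PopD": 10.0}
sizes = {"PopA": 5, "PopB": 8, "PopC": 12, "PopD": 2}
z = row_zscores(sharing, sizes)
print(z)
```

Standardizing within a row puts every participant on the same scale regardless of their overall level of IBD sharing, which is why the scores are meant to be compared along a row rather than down a column.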

Of course, I encourage Project participants to leave a message in the Information about Project samples thread.