Thursday, January 19, 2012

fastIBD analysis of South Asia

Please refer to the previous analysis on the Balkans/West Asia for more information about the interpretation of this type of analysis.

Clusters Galore

The Clusters Galore analysis can be found in the spreadsheet. 59 clusters were inferred using 47 MDS dimensions. The very fine-scale structure (I considered only the first 50 dimensions, but more of them seemed significant than in any previous experiment) is probably the result of the size of the South Asian population, as well as the practice of endogamy associated with the caste system. High intra-population IBD sharing is also evident in the following (notice how well-defined the diagonal is):

Inter-Population IBD
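
For readers who want to tabulate something similar on their own data, here is a minimal sketch of how a population-by-population IBD matrix could be built from pairwise sharing totals. It is not the code used for this post: the function name, the input layout (a dictionary of per-pair totals plus a dictionary of population labels) and the toy numbers are all assumptions made for illustration. The entries where both labels are the same population correspond to the diagonal.

```python
from collections import defaultdict
from itertools import combinations


def mean_ibd_by_population(pairwise_ibd, population_of):
    """Average pairwise IBD sharing for every pair of populations.

    pairwise_ibd:  dict mapping frozenset({id_a, id_b}) to the total IBD
                   length shared by that pair (units are up to the caller).
    population_of: dict mapping individual id to population label.
    Pairs absent from pairwise_ibd are counted as zero sharing.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for a, b in combinations(sorted(population_of), 2):
        # Order the two labels so (PopA, PopB) and (PopB, PopA) share one key.
        key = tuple(sorted((population_of[a], population_of[b])))
        totals[key] += pairwise_ibd.get(frozenset((a, b)), 0.0)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in counts}


# Hypothetical toy data: two endogamous groups with little sharing between them.
population_of = {"a1": "PopA", "a2": "PopA", "b1": "PopB", "b2": "PopB"}
pairwise_ibd = {
    frozenset(("a1", "a2")): 40.0,
    frozenset(("b1", "b2")): 30.0,
    frozenset(("a1", "b1")): 4.0,
    frozenset(("a2", "b2")): 2.0,
}
matrix = mean_ibd_by_population(pairwise_ibd, population_of)
```

In this toy example the within-population averages (40.0 and 30.0) are far larger than the between-population average (1.5), which is the pattern that shows up as a sharp diagonal.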

Results for Dodecad participants

They can be found in the spreadsheet. Many Project participants belong to a population with only 1 or 2 individuals, so cluster #1 seems to be a generalized catch-all for many such individuals. Individuals from the two sub-populations that I've identified recently, Iyer_D and Jatt_D, all belong to the same cluster. The Iyer_D cluster (#4) also seems to include the Iyengar project participants, as might be expected.

It is also interesting how all Dodecad participants fall in just 7 of the 59 clusters. This goes to show how truly diverse people from the Indian subcontinent are. I fully expect that with more participation further structure will be revealed, since it seems that due to endogamy it only takes a few participants from each ethnic group for a specific cluster pertaining to that group to be identified. So, I invite people from South Asia to join the Project during this submission opportunity.

1 comment:

  1. Thank you for this analysis! I guess the rigid clustering makes the South Asian analysis a little less interesting than others.
    I find the Velama and Piramalai Kallar influence on Iyer_D interesting. All "Iyer_D" members have positive Z-scores with these two groups. I wonder if this points to a common origin of these (or similar) groups.