Friday, May 6, 2011

Ancestral North Indian for South Asian members (with PCA)

I have previously estimated the Ancestral North Indian component in South Asian project members by exploiting the correlation between ADMIXTURE results and the published figures of Reich et al. (2009).

A different way to obtain the same estimates is to project individuals onto the CEU-Onge first principal component and exploit the correlation between PC1 scores and the published ANI figures. This correlation is +0.99, so ANI can be regressed on PC1 to turn each individual's PCA score into an ANI estimate.
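The regression step can be sketched as follows. This is a minimal illustration, not the exact procedure used for the spreadsheet: it assumes an ordinary least-squares line fitted on reference populations that have both a PC1 score and a published ANI figure, and the clamping of predictions to 0-100% is an added assumption. The reference values in the usage example are hypothetical placeholders, not the Reich et al. (2009) figures.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def pearson_r(x, y):
    """Pearson correlation between PC1 scores and ANI figures."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def estimate_ani(pc1, slope, intercept):
    """Predict ANI% from a PC1 score; clamping to [0, 100] is an assumption."""
    return min(100.0, max(0.0, intercept + slope * pc1))


# Hypothetical reference populations: (PC1 score, published ANI %).
ref_pc1 = [-0.02, 0.0, 0.02, 0.04]
ref_ani = [40.0, 50.0, 60.0, 70.0]

slope, intercept = fit_line(ref_pc1, ref_ani)
print(pearson_r(ref_pc1, ref_ani))             # correlation used to justify the fit
print(estimate_ani(0.013, slope, intercept))   # ANI estimate for a new PC1 score
```

Because the sign of a principal component is arbitrary, the fitted slope simply absorbs whichever orientation PC1 happens to have; the estimates only make sense for individuals who actually lie on the CEU-Onge cline.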

Results for all Indian, Bangladeshi, and Pakistani project members can be found in this spreadsheet, ordered by ANI, and interspersed with population averages.


  1. Thanks for this, Dienekes. Could you kindly request the following participants to make a small statement regarding their ancestry in the ancestry thread, if possible:


    Thanks again for your efforts!

  2. DOD585 - Kashmiri
    DOD584 - Punjabi

    Where can I see a list of ethnicities for DOD numbers?

  3. Compared with your previous Admixture-based analysis, this PCA-based one differs by 1-2% for most participants, but for a few the difference is larger.

  4. I didn't go through them one by one, but I would guess that the difference is largest for the few Mongoloid-admixed participants in the Project, as these do not, strictly speaking, fall along the Caucasoid-Onge cline.

  5. Dienekes, is it accurate to assume that the remainder is ASI, or could additional genuine West or East Eurasian components be present? While this is not exactly comparable, Zack estimated my Ancestral South Indian admixture at 40.03% in his latest K=11 Admixture Ref3 run, where the Onge component made up the bulk of the inferred ASI figures for most participants. Unfortunately, I have no ANI-ASI Admixture-based results from Dodecad to compare with these, as I got my 23andMe results on the day you posted that analysis. I would appreciate your comments on all of this. Thanks.

  6. "Dienekes, is it accurate to assume that the remainder is ASI, or is there a possibility of the presence of additional real West or East Eurasian components?"

    The Indian Cline percentages were estimated on individuals who could be modeled as a 2-way admixture. West Eurasian groups are quite similar to each other (compared to the Onge). The Onge, however, are not a very good proxy for East Eurasian elements, so individuals carrying such ancestry are "off-cline", and their East Eurasian component is projected as West Eurasian/Onge with unpredictable results.

    BTW, how much ASI did you get in this?

  7. Yes, I am aware that populations like Bengalis tend to have additional East Asian admixture, which makes them somewhat off-cline. However, I was wondering whether the remainder of my score is ASI. As I said, I don't have Admixture-based results from you (Dienekes) to compare with, and as far as I can see the spreadsheet lists only the following: alphanumeric ID, PC1, ANI%. If we assume the remainder is ASI, my ANI and ASI scores are 56.6 and 43.5, respectively. I am DOD464.

    Also, here are the South Asian clusters based on the Clusters Galore exercise (as of March 31) and the K=12 run (as of April 29). I reckon these will help interpret the results. As far as the ANI figures are concerned, there is a primary regional gradation and a secondary caste-based one. Ethnicities are listed against the IDs; blank entries are people who didn't leave a message in the Ancestry thread.
    - Clusters Galore-based South Asian cluster

    - K=12-based South Asian cluster
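The point made in the reply to comment 5, that East Eurasian ancestry is "projected as West Eurasian/Onge with unpredictable results", can be illustrated with a toy 2-D geometric sketch. The coordinates below are hypothetical and stand in for population centroids in PCA space; the real analysis uses genome-wide PCA, not these three points. Projecting onto the West Eurasian to Onge axis keeps only the position along that axis and discards the perpendicular distance, so an individual with East Eurasian (not Onge) admixture still receives an on-axis score.

```python
from math import sqrt

# Hypothetical centroids in a toy 2-D "PCA space" (illustrative only).
WEST = (0.0, 0.0)
ONGE = (1.0, 0.0)
EAST = (0.5, 1.0)


def mix(a, b, frac_b):
    """Linear mixture of two centroids with proportion frac_b from b."""
    return tuple((1 - frac_b) * x + frac_b * y for x, y in zip(a, b))


def project_onto_axis(point, west, onge):
    """Return (t, off_axis) for point relative to the west -> onge axis.

    t is the scalar position along the axis (0 at west, 1 at onge);
    off_axis is the perpendicular distance that a 1-D projection discards.
    """
    axis = [o - w for w, o in zip(west, onge)]
    rel = [p - w for w, p in zip(west, point)]
    t = sum(r * a for r, a in zip(rel, axis)) / sum(a * a for a in axis)
    foot = [w + t * a for w, a in zip(west, axis)]
    off_axis = sqrt(sum((p - f) ** 2 for p, f in zip(point, foot)))
    return t, off_axis


# On-cline individual: 40% Onge-like ancestry, lies exactly on the axis.
print(project_onto_axis(mix(WEST, ONGE, 0.4), WEST, ONGE))
# Off-cline individual: 50% East Eurasian, no Onge ancestry at all,
# yet it still gets a nonzero position along the West-Onge axis.
print(project_onto_axis(mix(WEST, EAST, 0.5), WEST, ONGE))
```

In this toy setup the off-cline individual is scored as if a quarter of the way toward the Onge, even though it has no Onge ancestry; the large perpendicular distance is the signal that such an individual does not fit the 2-way model.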