Wednesday, October 26, 2011

'eurasia7' calculator

This calculator was made with 196 different populations and 2,659 individuals, including 518 project participants. The following Dodecad populations do not yet have 5 individuals, so they are included in the generic OTHERS_D category:
Algerian_D, North_African_Jews_D, Slovenian_D, Mixed_Scandinavian_D, Danish_D, Moroccan_D, Tunisian_D, Serb_D, Austrian_D, Saudi_D, Pakistani_D, Tatar_Various_D, Palestinian_D, Greek_Italian_D, Romanian_D, Swiss_German_D, Szekler_D, Mandaean_D, Azeri_D, Czech_D, Georgian_D, Belgian_D, Latvian_D, Estonian_D, Bangladesh_D, Yemenese_D, Sri_Lanka_D, Hungarian_D, Basque_D, Udmurt_D, Egyptian_D
As always, I encourage people with 4 grandparents from the same country or ethnic group of Eurasia, North or East Africa to contact me (do not send data!) for possible inclusion in the Project. If I have overlooked any such individuals, drop me a line (my e-mail address is at the bottom of the blog). I usually start a new _D population whenever individuals with 4 grandparents from the same group are submitted, but I may have missed some.

Note that all individuals from the reference populations have been included, outliers among them. Keep this in mind when reading the population averages, and consult the Outliers tab in the v3 spreadsheet for some instances of outliers.
Due to image size restrictions in Picasa, the labels are not clearly visible. A large version of the above plot can be found in the download bundle.

The seven ancestral populations inferred at this level of resolution are:
  • Sub_Saharan
  • West_Asian
  • Atlantic_Baltic
  • East_Asian
  • Southern
  • South_Asian
  • Siberian
As usual, you should take these names as useful labels, and interpret them in conjunction with the components' distribution in different populations, and their Fst distances, both of which can be found in the spreadsheet.

The table of Fst distances:


Below you can see a neighbor-joining tree based on inter-population Fst distances:
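A neighbor-joining tree of this kind can be built from any matrix of pairwise distances, such as the Fst table. The following is a minimal sketch of the standard Saitou-Nei algorithm in plain Python, for illustration only. The post does not say which program drew its tree, the internal-node names (`node1`, `node2`, ...) are hypothetical, and the toy matrix is made-up data chosen so that the true tree is known. It does not contain the eurasia7 Fst values.

```python
def neighbor_joining(labels, matrix):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    labels: taxon names; matrix: list of lists with matrix[i][j] = distance.
    Returns the unrooted tree as (node, node, branch_length) edges; internal
    nodes get the hypothetical names "node1", "node2", ...
    """
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    active = list(labels)
    edges = []
    counter = 0
    while len(active) > 2:
        n = len(active)
        # Net divergence of each taxon from all the others.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = active[x], active[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        counter += 1
        u = f"node{counter}"
        # Branch lengths from the joined pair to their new parent node.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        edges.append((a, u, la))
        edges.append((b, u, d[a][b] - la))
        # Distances from the new node to every remaining taxon.
        d[u] = {}
        for c in active:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        active = [c for c in active if c not in (a, b)] + [u]
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


if __name__ == "__main__":
    # Hypothetical distances that fit the tree ((A:1, B:2):1, (C:3, D:4)).
    labels = ["A", "B", "C", "D"]
    matrix = [[0, 3, 5, 6],
              [3, 0, 6, 7],
              [5, 6, 0, 7],
              [6, 7, 7, 0]]
    for edge in neighbor_joining(labels, matrix):
        print(edge)
```

On distances that fit a tree exactly, as in this toy example, neighbor joining recovers the branch lengths: 1, 2, 3 and 4 for the leaves and 1 for the internal edge.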
The first six dimensions of a multi-dimensional scaling of the same:





Calculator Files:

  • The spreadsheet contains population averages, the table of Fst distances, and individual results for included Project participants.
  • The download RAR file (Google Docs or Sendspace) contains all the files needed to run the calculator. You must download and install DIYDodecad 2.1 first. To run the calculator, follow the instructions in the README file, but type 'eurasia7' instead of 'dv3'.

Terms of use: 'eurasia7', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.

Technical Details:

The calculator is built using allele frequencies of K=7 ancestral components inferred by ADMIXTURE 1.21 analysis of 2,659 individuals. Markers included in the source datasets as well as on the Family Finder and 23andMe (as of Oct 21) platforms were used. Markers with a genotype rate below 99.5% or a minor allele frequency below 0.5% were removed. Linkage-disequilibrium based pruning was then carried out with a window size of 250 SNPs, advanced by 25 SNPs, pruning SNP pairs with an R-squared greater than 0.4. A total of 164,990 SNPs remained after these filtering steps.
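The marker filtering described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline actually used. Genotypes are coded as allele counts (0, 1, 2, or `None` for a missing call), the helper names are hypothetical, r² is taken as the squared correlation of genotype counts, and when a pair exceeds the threshold the later SNP is simply dropped. The post does not name the pruning software; the window/step/threshold triple (250, 25, 0.4) has the same form as the arguments of PLINK's `--indep-pairwise` option, and a real pipeline would also prune each chromosome separately.

```python
def passes_qc(genotypes, min_call_rate=0.995, min_maf=0.005):
    """Marker-level filter: one allele count (0, 1 or 2) per individual,
    None for a missing call. Hypothetical helper for illustration."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    maf = min(p, 1 - p)
    return call_rate >= min_call_rate and maf >= min_maf


def r_squared(x, y):
    """Squared correlation of genotype counts over individuals called at both SNPs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, window=250, step=25, max_r2=0.4):
    """Sliding-window pruning over SNPs in map order; returns kept indices.
    Simplification (assumed): of each pair above max_r2, the later SNP is dropped."""
    removed = set()
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        for i in range(start, end):
            if i in removed:
                continue
            for j in range(i + 1, end):
                if j not in removed and r_squared(snps[i], snps[j]) > max_r2:
                    removed.add(j)
        if end == len(snps):
            break
        start += step
    return [i for i in range(len(snps)) if i not in removed]


if __name__ == "__main__":
    x = [0, 1, 2, 0, 1, 2]
    y = [0, 0, 0, 2, 2, 2]
    print(ld_prune([x, list(x), y]))  # [0, 2]
```

In the usage example SNP 1 duplicates SNP 0 (r² = 1) and is pruned, while SNP 2 is uncorrelated with SNP 0 and is kept.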

All relevant populations available to me and genotyped at a sufficient number of markers were included. Inclusion of the Kalash population resulted in a population-specific component at K=7, and hence their admixture components were inferred a posteriori. Their proportions are consistent with previous results, showing them to be a "West Asian" population (62.4%) with substantial "South Asian" admixture (37.1%), and near-complete absence of any other genetic components.
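One plausible reading of inferring the Kalash proportions a posteriori is that they were estimated against the already-inferred K=7 allele frequencies. That is also what a calculator does for a new genome: it estimates one individual's component proportions while the ancestral frequencies stay fixed. Below is a minimal sketch of that step, assuming maximum likelihood fitted with a standard EM update for this admixture likelihood. It is an illustration only, not the DIYDodecad or ADMIXTURE source code, and the simulated frequencies are hypothetical.

```python
import random


def estimate_admixture(genotypes, freqs, n_iter=500, tol=1e-8):
    """Maximum-likelihood admixture proportions of one individual, with the
    ancestral allele frequencies held fixed, fitted by EM updates.

    genotypes: counts (0, 1 or 2) of the reference allele at each SNP
    freqs: freqs[k][j] = reference-allele frequency at SNP j in component k
    """
    K, J = len(freqs), len(genotypes)
    eps = 1e-6
    # Keep frequencies away from 0 and 1 so no allele has zero likelihood.
    P = [[min(max(p, eps), 1 - eps) for p in row] for row in freqs]
    q = [1.0 / K] * K  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * K
        for j, g in enumerate(genotypes):
            f = sum(q[k] * P[k][j] for k in range(K))  # expected allele frequency
            for k in range(K):
                # Expected number of this SNP's two allele copies from component k.
                new_q[k] += (g * q[k] * P[k][j] / f
                             + (2 - g) * q[k] * (1 - P[k][j]) / (1 - f))
        new_q = [x / (2 * J) for x in new_q]
        done = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if done:
            break
    return q


if __name__ == "__main__":
    # Simulated example: hypothetical frequencies and a 70/30 individual.
    rng = random.Random(1)
    true_q = [0.7, 0.3]
    J = 3000
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(J)] for _ in range(2)]
    genotypes = []
    for j in range(J):
        f = sum(true_q[k] * freqs[k][j] for k in range(2))
        genotypes.append(sum(rng.random() < f for _ in range(2)))
    print([round(x, 3) for x in estimate_admixture(genotypes, freqs)])
```

Each EM update keeps the proportions summing to 1, and the simulated individual's estimate should land near the 70/30 split it was generated from.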

34 comments:

  1. Could you also show the Fst distances between the populations (in an Excel document)?

  2. Perhaps a separate tab on the spreadsheet with the averages of groups with <5 members would be more informative than the generic OTHERS_D category? Of course, it should include a caveat indicating that the results may not be as reliable.

  3. Not posting results with <5 members serves two purposes:

    1. Prevents wrong or shaky conclusions about particular groups from being circulated, and
    2. Encourages participation

  4. Dienekes, thanks for this update. There seem to be a number of new South Asian participants who, unfortunately, didn't leave a message in the ancestry thread. Here are their DOD numbers:

    DOD201
    DOD583
    DOD782
    DOD078
    DOD449
    DOD666
    DOD720
    DOD698
    DOD757
    DOD749
    DOD414
    DOD753
    DOD430
    DOD819
    DOD822

    Could you please prompt these individuals to leave a message in the ancestry thread? Indian_D or South Asian_D is a really uninformative label when trying to investigate intra-community variation in a genetically diverse area like South Asia :-). How about implementing the spreadsheet idea I mentioned to you a few days ago?

  5. The information is there in the ancestry thread, so if anyone wants to organize it in a spreadsheet, they are free to do so. That type of secretarial work is not a good use of my time.

    As for Project participants, the submission rules do not stipulate that they make their ancestry information public. If they choose not to, that's their prerogative.

  6. I noticed the same as Vasishta. The only real Anatolian Kurd we have so far wasn't added to the Kurdish_D samples. You probably missed him.

    Here is his DOD

    DOD 834

  7. It seems to be our fault. He failed to leave a message in the ancestry thread.

  8. Interestingly, the Mozabites almost entirely lack the West Asian component but have much of the South Arabian component, whereas modern Arab groups carry significant levels of the West Asian cluster. It looks like the Mozabites have not been affected by the modern Arab expansions.

  9. The most recent run shows that Turks of Anatolia and Turkmens have the same West Asian scores, about 50% for both. Turkmens are a lot more South Asian due to Tajik influences (Tajiks are over 18% South Asian). Turks are more Mediterranean; however, Turkmens also have significant Mediterranean influences (see Southern).

    Turks have on average 2.7% East Asian compared to Turkmens' 8.4%. This suggests a 32% genetic inflow from Turkmens to Anatolia if we accept Turkmenistan as the source population of Turks in Anatolia (due to good linguistic and historical reasons such as both populations' speaking Oghuz). Furthermore Turks on average have 3.4% Siberian compared to Turkmens' 8.5% suggesting a 40% genetic inflow from Turkmenistan to Anatolia. I do believe the Oghuz Turkic family of Turkmens of Turkmenistan, Azeris of Azerbaijan and Iran and Turks of Anatolia and Balkans should be closely examined as a group and contrasted with Armenians, Greeks, Syrians as the Turkic roots of Anatolian Turks are in Turkmenistan. I do hope Dienekes publishes this post. Thanks for the analysis.

    Component (%)     Turks (all)   Turkmens
    Sub_Saharan           0.4          0.2
    West_Asian           48.8         46.3
    Atlantic_Baltic      19.1         12.8
    East_Asian            2.7          8.4
    Southern             23           10.9
    South_Asian           2.5         13
    Siberian              3.4          8.5
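The inflow figures above come from dividing each component's average in Turks by its average in Turkmens, which implicitly assumes the other (local) parent population carries none of that component. A short sketch of that arithmetic, using the averages from the table (the helper name is hypothetical):

```python
# Component averages (%) for the two populations, copied from the table above.
turks = {"East_Asian": 2.7, "Siberian": 3.4}
turkmens = {"East_Asian": 8.4, "Siberian": 8.5}


def naive_inflow(target, source, component):
    """Share of `target` ancestry attributed to `source` from one component,
    assuming the other (local) parent population has none of that component."""
    return target[component] / source[component]


for comp in ("East_Asian", "Siberian"):
    print(comp, f"{100 * naive_inflow(turks, turkmens, comp):.1f}%")
```

The two components give different figures (about 32% and 40%). Under a clean one-parent model with a local parent lacking both components, the two ratios would coincide, so the gap is worth keeping in mind when reading either number.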

  10. @agit123

    DOD834 is in the spreadsheet. Look again.

    "if we accept Turkmenistan as the source population of Turks in Anatolia (due to good linguistic and historical reasons such as both populations' speaking Oghuz)."

    It does not follow. The fact that A and B speak the same branch of a language family does not imply that A is descended from B. Actually, as I have shown quite conclusively, both Anatolian Turks and Turkmen have been influenced by local substrata and hence the former cannot be derived from the latter.

    http://dienekes.blogspot.com/2011/09/uzbeks-as-nexus-altai-as-source-of.html

    But, since you suggest that I test Turks with Turkmen as a parental population, I decided to do so. I included the Turkmen and Armenians from Yunusbayev et al. (2011) as well as the Turks from Behar et al. (2010), and using the same set of markers as the eurasia7 calculator:

    1 Turks 19 95.9 4.1
    2 Armenians 16 99.8 0.2
    3 Turkmens 15 26.1 73.9

    Turks are:

    (99.8-95.9)/(99.8-26.1) = (4.1-0.2)/(73.9-0.2) = 5.3% of the way between Armenians and Turkmens.
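The 5.3% figure is a linear interpolation: each score of the Turks is placed on the segment between the Armenian and Turkmen scores. A minimal sketch of the same calculation, using the table above (the helper name is hypothetical). Both columns give the same answer because every row sums to 100.

```python
def fraction_between(target, parent_a, parent_b):
    """Position of target on the segment from parent_a (0.0) to parent_b (1.0)."""
    return (target - parent_a) / (parent_b - parent_a)


# Scores from the table above: (first column, second column) per population.
turks, armenians, turkmens = (95.9, 4.1), (99.8, 0.2), (26.1, 73.9)
for col in (0, 1):
    share = fraction_between(turks[col], armenians[col], turkmens[col])
    print(f"column {col + 1}: {100 * share:.1f}%")  # 5.3% for both columns
```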

    This experiment can be easily repeated by anyone who has access to this data. In short, there is absolutely no evidence that Turks can be seen as a population of 32-40% Turkmen origin.

  11. This comment has been removed by the author.

  12. AT,

    Fact 1: The Turkic roots of Anatolian Turks are not in what is now Turkmenistan but directly in what is now Kazakhstan, as Anatolia, present-day Azeri lands and what is now Turkmenistan were invaded by Turkic peoples during the same centuries (the 11th to mid-12th and also mid-13th centuries) from what is now Kazakhstan. Anatolia, present-day Azeri lands and what is now Turkmenistan were completely non-Turkic in population before those centuries.

    Fact 2: Turkic peoples admixed with the native peoples of Anatolia, present-day Azeri lands and what is now Turkmenistan during the centuries that followed their invasion of those lands. Thus they diverged from their pre-invasion genetics in all three of these regions.

    Fact 3: Central Asian Turkic languages were much closer to each other during the era of the Turkic invasion of Anatolia, present-day Azeri lands and what is now Turkmenistan than they are today so much so that they were at most dialects of the same Turkic language and there was much fluidity between them. So inferences of genetic origins based on the Turkic languages of today are misleading.

    The Result: Your inference that present-day Turkmens are a good genetic representative of the Turkic population that invaded Anatolia is certainly wrong.

  13. @Dienekes: I think to be fair you should publish my responses.

    You previously calculated Turks to be 1/7 Uzbek. Clearly Turkmens are a lot closer to Turks than Uzbeks. Now you suggest Turks are 5% Turkmen. How does this work? This is conflicting.

  14. You don't have to tell me to publish your comments. I don't check comments 24x7.

    You previously calculated Turks to be 1/7 Uzbek. Clearly Turkmens are a lot closer to Turks than Uzbeks. Now you suggest Turks are 5% Turkmen. How does this work? This is conflicting.

    A population cannot always be expressed as a weighted sum of two other populations. You can think of it in terms of colors: you can approximate some green hue if you have a blue and a yellow hue. But, you can't approximate a green hue if you have a blue and a magenta hue, no matter how you mix them. Of course ADMIXTURE always comes up with an answer that adds up to 100%, but it is clear that the answer in this case is not that Turks are about 1/3 of the way between Armenians and Turkmens.
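The color analogy can be made concrete. Find the best two-way mix of two parental profiles and measure how far the target remains from it. A small leftover distance means the target really is a mix of the two. A large one means no mixing proportion reproduces it, whatever proportion the fit reports. The sketch below uses hypothetical three-component profiles and a plain least-squares fit. It illustrates the geometry only and is not how ADMIXTURE estimates proportions.

```python
def best_two_way_mix(target, parent_a, parent_b):
    """Least-squares weight w in [0, 1] such that w * parent_a + (1 - w) * parent_b
    is closest to target, plus the Euclidean distance left over (the residual)."""
    d = [a - b for a, b in zip(parent_a, parent_b)]
    r = [t - b for t, b in zip(target, parent_b)]
    denom = sum(x * x for x in d)
    w = sum(x * y for x, y in zip(d, r)) / denom if denom else 0.0
    w = min(max(w, 0.0), 1.0)  # a mixing proportion must lie between 0 and 1
    residual = sum((t - (w * a + (1 - w) * b)) ** 2
                   for t, a, b in zip(target, parent_a, parent_b)) ** 0.5
    return w, residual


# Hypothetical profiles over three components (fractions summing to 1).
A = [1.0, 0.0, 0.0]
B = [0.0, 1.0, 0.0]
on_the_line = [0.5, 0.5, 0.0]   # a genuine mix of A and B
off_the_line = [0.3, 0.3, 0.4]  # carries a component neither parent has

print(best_two_way_mix(on_the_line, A, B))
print(best_two_way_mix(off_the_line, A, B))
```

Both targets get the same 50/50 proportion, but only the first is actually reproduced by it. The second is left about 0.49 away because it carries a component neither parent has.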

    Again, please consult the PCA plot

    http://dienekes.blogspot.com/2011/09/uzbeks-as-nexus-altai-as-source-of.html

    It is very clear that all the Turkic populations living in West Eurasia can be reasonably approximated as 2-way mixes of Uzbeks and their respective native populations.

    In any case, you had asked for me to do this type of analysis even before there were any Turkmen data available, and now I've done it. It is what it is, and anyone can repeat my experiment if they want to.

  15. The averages for Japanese seem to be different from the V3 results.

    The Japanese_D had around 3% South Asian admixture, while this is absent in the K=7 analysis. Also the JPT (Japanese-Tokyo) had less "Northeast Asian" than most East Asians in the V3 study, while in this study, they seem to have a lot more "Siberian".

    I think the Northeast Asian and Siberian components correspond exactly between this admixture run and the V3 study.

  16. This comment has been removed by the author.

  17. The problem with Uzbeks is that what is now Uzbekistan was Turkified only within the last 1000 years (especially from the 13th century onward, when the Mongols pushed Turkic peoples out of what is now Kazakhstan), so the Turkic speakers of what is now Uzbekistan have been admixing with its pre-Turkic locals (who were all Iranic-speaking) throughout those 1000 years. What is now Kazakhstan, on the other hand, was already Turkic-speaking before the Seljuq/original Turkmen migration to Anatolia, the present-day Azeri lands and what is now Turkmenistan, which took place between the 11th and 13th centuries. The migrants first arrived in all three regions during the same century, the 11th, and most of the migration must have happened between the 11th and mid-12th centuries, while the Great Seljuq Empire still existed, rather than between the mid-12th and 13th centuries, after it had dissolved. More importantly, before that migration the original Turkmens lived almost entirely in what is now Kazakhstan. This leaves Kazakhs, Uyghurs and Kyrgyz as possible proxies for the Turkic population that invaded Anatolia. Of these, the Kazakhs are the only ones who inhabit the lands that population occupied right before its invasion of Anatolia.

  18. << [Kalash] proportions are consistent with previous results, showing them to be a "West Asian" population (62.4%) with substantial "South Asian" admixture (37.1%), and near-complete absence of any other genetic components. >>

    And yet the Kalash show strong traces of blondism in hair and eye color. (Search Google Images for "Kalash".) Does this point to a (partial?) Asian origin of Nordics?

    And again we see that the West Asian ancestral component and the Atlantic_Baltic component are much closer to each other (0.028) than they are to the Southern [European?] (0.055/0.058). And they are nearly as close to the South Asian (0.06/0.065) as they are to the Southern. Moreover, they have similar distances to all the other components, whereas the other components apparently do not. The data thus seem to suggest a West Asian origin of the Atlantic_Baltic [North European?] component.

    The present analysis thus seems consistent with an Asian origin of the Indo-Europeans.

  19. @Dodecad Project.

    Yes he is but his results were not included in Kurdish_D samples. That was actually what I meant.

  20. Yes he is but his results were not included in Kurdish_D samples. That was actually what I meant.

    Results were included in the 'eurasia7' average. A new 'Dodecad v3' average for Kurds (and several other populations) was not calculated, and is not, in general, calculated every time there is a new sample. I do that from time to time for a batch of populations. Also, I don't plan to do that anymore, since I am in the process of transitioning to Dodecad v4.

  21. When will you post about the Dodecad V4 design?

  22. Just a detail:

    I think you should have named this Sub_Saharan component plain African, and named the Euro7 component that you called African Sub_Saharan instead. The reason is simply that in Euro7 I was at 0% and here I am at 0.4%, and others have probably seen a similar effect (going up).

    As another example, Mozabites get a substantial amount of Sub-Saharan...and with the other name looks better. But it's just my opinion.

    Well, although it's not incredibly relevant, it's a bit discordant, just this. With all due respect ;)

  23. @Dienekes

    You don't want to add DOD 834 even though he is the only Anatolian Kurd so far? So don't you think it is far-fetched to call Kurdish_D the Kurdish "average"?

  24. @agit123

    What part of "he was included in the eurasia7 average and I am transitioning to v4" don't you understand?

  25. @Acid, you are right about the terminology consistency issue. The Northwest_African and East_African components appear to be old stabilized blends of "African"/"Sub_Saharan" admixed with a Mediterranean and Southwest_Asian element respectively. I think one could name the component that emerges in Africa at lower K (before Northwest_African and East_African) either African or Sub_Saharan. It's probably a good idea to just name it African, and I'll keep that in mind for any future tools released at that level of resolution.

  26. Thanks for your reply, it's good to know you consider this.

    Good luck with Dodecad v4; some of us are eagerly waiting for it ;)

  27. Is there any way or tool to find the closest matches for populations or single individuals? Anyway, thanks.

  28. Dienekes, why is the South Asian score higher for North Europeans compared to Dodecad V3? I also noticed that Kurds have a lower score compared to V3, while that of their Iranian brethren has increased?

  29. >> When will you post about the Dodecad V4 design?

    When it's ready

    >> Dienekes, why is the South Asian score higher for North Europeans compared to Dodecad V3? I also noticed that Kurds have a lower score compared to V3, while that of their Iranian brethren has increased?

    You can't compare scores of similarly named components across different K and different datasets.

  30. I asked this because in eurasia7 it looks like North Europeans do have some Iranic input from Iranic Scythians. So which do you think is more reliable for the Iranic Scythian genetic input into Europe: this or Dodecad V3? I am a novice at this, so I might be asking odd questions.

  31. There is a definite "common link" between Central-South Asia and Western Europe that is becoming apparent. I first discovered it with the so-called "Dagestan component" linking the Caucasus with NW Europe and South Asia, and later with the unexpected "South Asian" that is found in small amounts in Europe. I don't have a good theory of how this was mediated. This will become clearer, hopefully fairly soon.

  32. Two things:
    1/ Neither Turkmens nor Uzbeks can be proxies for the 11th-century invading Turks, because both are merely Turkic-speaking Iranians (still with minor Turkic genetic input, but almost no Turkic cultural input).
    Likewise, Portuguese-speaking Angolans are not Portuguese but simply Portuguese-speaking Bantus.
    In both cases there is no racial or haplogroup match between the two populations concerned, and there could not have been a replacement of millions of natives by a few thousand invading Turks (or Portuguese).

    However, I think the Salars, an Oghuz Turkic population, could be good proxies (in both haplogroups and autosomal DNA), since they do not mix with the local Buddhist Han; racially, they also show a pure Central Asian Turkic phenotype (please see below):
    http://upload.wikimedia.org/wikipedia/commons/5/54/SalarTurkmensXian.jpg
    http://en.wikipedia.org/wiki/Salar_people


    2/ I think it has become clear (from the results of the Mozabites and Moroccans) that the Northwest_African component is in reality a blend of about 80% Arabian and 20% Sub-Saharan. We can also say that the North African input among the Hausa is a legacy of the proto-Arabians who brought the Afrasian languages from Arabia to Africa. Interestingly, all the Afrasian peoples that did not mix much with local Africans (in contrast to the Cushites, Chadians and Ethio-Semites), i.e. Egyptians, Arabians and Northwest Africans, share at least 50% of their autosomal DNA (the Southwest Asian component itself, plus the Southwest Asian input hidden in the Northwest_African component).

  33. Where is the download bundle that has a higher-resolution image of eurasia7_7.png?
