Monday, December 19, 2011

'world9' calculator

I have consistently received requests for an assessment of Amerindian ancestry. While the focus of the Project is, and will remain, the region of Eurasia, I thought it was a good idea to release a tool that could be used by persons of partial Amerindian ancestry.

I have also included the two Australasian populations currently available, namely Bougainville Melanesians (NAN_Melanesian) and Papuans from the HGDP.

The inferred components at K=9 are quite similar to those of 'eurasia7', with the addition of the Australasian and Amerindian components. I have also included the Kalash in this experiment, which caused the 'West_Asian' component to become modal in them. The Kalash do not, however, differ from other populations in this component by enough to make it strongly population-specific. I have called this component 'Caucasus_Gedrosia'; like the 'eurasia7' West Asian component, it ought to be quite similar to the k5 component inferred by Metspalu et al. (2011).

It is unfortunate that only two Australasian populations are currently available as public data. There are many more Amerindian and Mestizo ones, but it should be noted that the Amazonian populations in which the 'Amerindian' component is modal are among the least genetically diverse in my entire database. As a result, Eurasians who lack any Amerindian or Australasian ancestry can expect to see a little of it in their results as noise.

This is a very important caveat for Americans who suspect that they may have an Amerindian ancestor. Small levels of this component may be noise. The component is also found in Siberia, where it may represent either backflow from the Americas or the common ancestry of Siberian and Amerindian populations. If you are interested in detecting Amerindian ancestry, I recommend that you use DIYDodecad's 'byseg', 'bychr', and 'target' modes to drill deeper into your genome.

Download Files

  • The spreadsheet contains admixture proportions, the table of Fst distances, and individual results in the Individual Results tab.
  • The RAR file contains files for use with DIYDodecad. Extract its contents to the working directory of DIYDodecad. To run the calculator, follow the instructions in the README file, but type 'world9' instead of 'dv3'.
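
As a rough illustration of the substitution described above, the following Python sketch builds the command line for a run and checks that the parameter file is actually present. It is not part of DIYDodecad: the function name is hypothetical, the executable name is taken from the error message quoted in the comments below, and the assumption that the parameter file is called 'world9.par' is inferred rather than confirmed.

```python
from pathlib import Path


def diydodecad_command(calculator="world9", workdir=".", executable="DIYDodecadWin"):
    """Build the command line for one calculator run.

    Assumptions (not confirmed by the post): the parameter file is named
    '<calculator>.par', it sits in the DIYDodecad working directory, and
    the executable name matches the DIYDodecad binary for your platform.
    """
    par_file = Path(workdir) / f"{calculator}.par"
    if not par_file.is_file():
        # A mistyped calculator name (for example 'word9') is caught here.
        raise FileNotFoundError(f"{par_file} not found; was the RAR extracted here?")
    return [executable, par_file.name]
```

The returned list could be handed to `subprocess.run(..., cwd=workdir)`; the README remains the authoritative description of the actual steps.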

Terms of use:

'world9', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.

Information


Admixture proportions barplot:



The nine ancestral components are:

  • Amerindian
  • East_Asian
  • African
  • Atlantic_Baltic
  • Australasian
  • Siberian
  • Caucasus_Gedrosia
  • Southern
  • South_Asian
Table of Fst divergences:

Neighbor-joining tree of Fst distances; the long branches of the Australasian (and, to a lesser degree, the Amerindian) component are due to the high level of inbreeding in the populations in which each component is modal.

First 8 dimensions of multi-dimensional scaling (MDS):

Technical Details


A dataset of 3,548 individuals/265,519 SNPs/284 populations was assembled. Pruning for distantly related individuals was performed by iteratively removing a single individual from each pair showing an IBD RATIO greater than the mean plus 2 standard deviations, or greater than 2.5; 3,026 individuals remained. An additional 14 individuals were removed because they had a genotype rate below 97%. The marker set was thinned to remove SNPs with a genotype rate below 97% or a minor allele frequency below 1%. Linkage-disequilibrium-based pruning with a window of 200 SNPs, advanced by 25 SNPs, and an R-squared of 0.4 was performed. A total of 3,012 individuals and 170,822 SNPs survived these filtering steps. PLINK 1.07 and ADMIXTURE 1.21 were used in the analyses.
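
The relatedness-pruning step can be sketched in a few lines of Python. This is only an illustration, not the script used for the analysis. The input format (a mapping from pairs of sample IDs to their IBD ratio), the function name, the choice to compute the mean and standard deviation once over all pairs, and the rule of dropping the individual involved in the most flagged pairs are all assumptions.

```python
from statistics import mean, pstdev


def prune_related(pair_ratios, hard_cutoff=2.5, n_sd=2.0):
    """Return the set of individuals to remove.

    pair_ratios maps (id_a, id_b) -> IBD ratio (hypothetical input format).
    """
    values = list(pair_ratios.values())
    # Assumption: the soft cutoff is computed once, from all pairs.
    soft_cutoff = mean(values) + n_sd * pstdev(values)
    # A pair is flagged if it exceeds either cutoff.
    flagged = {pair for pair, ratio in pair_ratios.items()
               if ratio > soft_cutoff or ratio > hard_cutoff}
    removed = set()
    while flagged:
        # Count how many flagged pairs each individual takes part in.
        counts = {}
        for a, b in flagged:
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
        # Assumption: drop the individual in the most flagged pairs;
        # ties are broken by ID so the result is deterministic.
        worst = max(sorted(counts), key=counts.get)
        removed.add(worst)
        flagged = {pair for pair in flagged if worst not in pair}
    return removed


# Toy data: only the A/B pair exceeds the cutoffs, so one of the two is dropped.
toy = {("A", "B"): 3.0, ("A", "C"): 1.0, ("B", "C"): 1.1,
       ("C", "D"): 0.9, ("A", "D"): 1.0, ("B", "D"): 1.0}
print(prune_related(toy))  # {'A'}
```

Removing the most-connected individual first tends to keep more samples than dropping an arbitrary member of each pair, but the post does not say which member of a pair was actually removed.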

34 comments:

  1. Dienekes, I have a question.

    In this run, my Amerindian and Siberian components, combined, were significantly larger than the Asian component(s) I used to get on other calculators (v3, k12a, euro7, eurasia7, and some of Eurogenes' calculators). Is it possible that the Amerindian component is now being overestimated (say, because some of the Amerindian control samples were admixed), or is it more likely that it was previously underestimated due to a lack of Amerindian control samples in earlier work?

  2. If you have actual Amerindian ancestry, then it is natural for the sum to increase, because the Amerindian component is a better fit for that aspect of your ancestry.

    East Asian, Siberian, and Amerindian all pick up a common ancestral component, and any two of them carry some (but not all) of the information in the third.

  3. Is there any way to separate the admixture inherited from my father from that inherited from my mother, so that I can at least approximate results for each parent? Or is it SF for now?

    thank you

  4. Curious when you say "if you have actual Amerindian ancestry, then it is natural for the sum to increase", what kind of an increase is worth noting?

    I'm having a bit of trouble doing the segment-by-segment runs (I'm not the most technologically competent individual), so I've not looked further than the basic test.

    On most runs I score 0% Amerindian; this time I scored 1.3%, which is higher than other people I typically have a lot in common with genetically.

    I'm unsure how to interpret the increase, given that I have 0% Siberian: I would think that if it were real admixture, the Siberian would increase as well. Still, the increase is a lingering thought.

  5. Can someone tell me what this means? Is this a fatal error? Can anyone tell me what I might have done wrong?


    Warning message:
    running command 'DIYDodecadWin word9.par' had status 2
    >

  6. Can anyone help me understand how to load my Population Finder results from FTDNA into this calculator? My results are from FTDNA, and I will gladly share them, since my mother is from Oceania and my father is from Pakistan.

    Thanks,

  7. See the README included in the download here on how to run DIYDodecad

    http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html

  8. Can anyone tell me what the "Southern" distinction stands for?

  9. Umm, I have read your article about the caution needed with admixture estimates. Are pure Amerindian reference samples used in the world9 calculator? I think it is very important to choose pure, 100% Amerindian samples in order to get an accurate percentage.

  10. The information is right here, so read it.

  11. I'm a first-generation Mexican-American and I'm pleasantly surprised to find out I'm predominantly indigenous (50%+)! Plus some Asian that probably came from the out-of-Asia migration.

    Amerindian 55.53%
    East_Asian 0.98%
    African 6.46%
    Atlantic_Baltic 20.27%
    Australasian 0.51%
    Siberian 2.01%
    Caucasus_Gedrosia 3.07%
    Southern 10.59%
    South_Asian 0.57%

    I say surprising because people think only the poor/southern Mexican populace has higher Amerindian percentages, but I'm neither poor nor from southern Mexico.

    Replies
    1. Interesting, I am from South America and here are my results:
      Amerindian 26.57
      East_Asian 0.58
      African 9.93
      Atlantic_Baltic 35.02
      Australasian -
      Siberian 1.37
      Caucasus_Gedrosia 6.61
      Southern 19.92
      South_Asian -

  12. Do you know above what level the Amerindian component is no longer considered "noise" in world9? Thank you very much.

  13. The admixture proportions barplot isn't legible when you zoom in on it.

  14. I came out as:

    Population
    Amerindian -
    East_Asian -
    African -
    Atlantic_Baltic 54.86%
    Australasian -
    Siberian 0.73%
    Caucasus_Gedrosia 19.40%
    Southern 24.96%
    South_Asian -

    I think this is the best calculator for Europeans out there.
    It's the only calculator that could possibly identify me as half Dutch and not Scottish, Irish, German or Kent.

    Using 2 populations approximation:
    1 50% S_Italiaan_Sicilian +50% Dutch @ 1.083
    2 50% German +50% S_Italian @ 1.126
    3 50% Dutch +50% S_Italian @ 1.148
    4 50% German +50% S_Italian_Sicilian @ 1.223

  15. What is CEU30? My results for the World9 Oracle-x Population Fitting are as follows...

    # Population Percent
    1 Amerindian 0.58
    2 East_Asian 0.00
    3 African 0.00
    4 Atlantic_Baltic 75.43
    5 Australasian 0.00
    6 Siberian 0.09
    7 Caucasus_Gedrosia 10.98
    8 Southern 11.50
    9 South_Asian 1.41


    Pct. Calc. Option 2

    1 CEU30 99.55%
    2 Sardinian 0.21%
    3 Malayan 0.21%
    4 CLM30 0.03%
    5 Colombian 0.00%
    6 MALAYAN 0.00%
    7 MEX30 0.00%
    8 Brazilian 0.00%
    9 AthabaskHD4 0.00%
    10 Castilla_La_Mancha 0.00%

    Total RMSD: 0.249511

    And my results for the World9 4-Ancestors Oracle are below, but I don't understand how to read this and what it means. Could someone please explain? Thanks.

    # Population Percent
    1 Atlantic_Baltic 75.94
    2 Southern 11.58
    3 Caucasus_Gedrosia 11.06
    4 South_Asian 1.42


    --------------------------------

    Least-squares method.

    Using 1 population approximation:
    1 British @ 1.073
    2 CEU30 @ 1.092
    3 Cornwall @ 1.115
    4 Kent @ 1.229
    5 British_Isles @ 1.589
    6 German @ 1.973
    7 Dutch @ 2.204
    8 Irish @ 2.620
    9 Orcadian @ 2.719
    10 Argyll @ 2.876
    250 iterations.



    Using 2 populations approximation:
    1 50% Mixed_Germanic +50% Orkney @ 0.859
    2 50% British_Isles +50% Dutch @ 0.876
    3 50% CEU30 +50% Cornwall @ 0.924
    4 50% Mixed_Germanic +50% Orcadian @ 0.954
    5 50% British_Isles +50% CEU30 @ 0.955
    6 50% British +50% CEU30 @ 0.968
    7 50% CEU30 +50% Kent @ 0.986
    8 50% British +50% Cornwall @ 1.026
    9 50% Irish +50% Mixed_Germanic @ 1.071
    10 50% British +50% British @ 1.073
    31375 iterations.



    Using 3 populations approximation:
    1 50% British_Isles +25% Dutch +25% CEU30 @ 0.809
    2 50% British_Isles +25% Dutch +25% Cornwall @ 0.833
    3 50% Cornwall +25% Dutch +25% Orkney @ 0.843
    4 50% Cornwall +25% Dutch +25% Orcadian @ 0.850
    5 50% British_Isles +25% Mixed_Germanic +25% CEU30 @ 0.853
    6 50% Mixed_Germanic +25% Orkney +25% Orkney @ 0.859
    7 50% Orkney +25% Mixed_Germanic +25% Mixed_Germanic @ 0.859
    8 50% CEU30 +25% British_Isles +25% Cornwall @ 0.860
    9 50% Cornwall +25% Irish +25% Dutch @ 0.869
    10 50% British_Isles +25% British +25% Dutch @ 0.870
    526205 iterations.



    Using 4 populations approximation:
    1 British_Isles + British_Isles + Dutch + CEU30 @ 0.809
    2 British_Isles + British_Isles + Dutch + Cornwall @ 0.833
    3 Mixed_Germanic + CEU30 + Orkney + Cornwall @ 0.839
    4 Dutch + Orkney + Cornwall + Cornwall @ 0.843
    5 Mixed_Germanic + British_Isles + Dutch + Orkney @ 0.847
    6 Dutch + Orcadian + Cornwall + Cornwall @ 0.850
    7 British_Isles + Dutch + CEU30 + Cornwall @ 0.850
    8 Irish + Mixed_Germanic + British_Isles + Cornwall @ 0.852
    9 Mixed_Germanic + British_Isles + Cornwall + Argyll @ 0.852
    10 Mixed_Germanic + British_Isles + British_Isles + CEU30 @ 0.853
    11 British + British_Isles + Dutch + Cornwall @ 0.856
    12 Mixed_Germanic + British_Isles + Orcadian + CEU30 @ 0.857
    13 Mixed_Germanic + Mixed_Germanic + Orkney + Orkney @ 0.859
    14 British_Isles + CEU30 + CEU30 + Cornwall @ 0.860
    15 Irish + Mixed_Germanic + British_Isles + CEU30 @ 0.862
    16 Mixed_Germanic + British_Isles + CEU30 + Orkney @ 0.862
    17 Dutch + CEU30 + Orkney + Cornwall @ 0.864
    18 Mixed_Germanic + British_Isles + Orcadian + Cornwall @ 0.868
    19 Irish + Dutch + Cornwall + Cornwall @ 0.869
    20 British + British_Isles + British_Isles + Dutch @ 0.870

    2346773 iterations.

  16. I am African American and Eastern Indonesian (Maluku). My results are:
    Population
    Amerindian -
    East_Asian 30.02%
    African 31.13%
    Atlantic_Baltic 12.54%
    Australasian 18.16%
    Siberian 0.57%
    Caucasus_Gedrosia 1.37%
    Southern 3.86%
    South_Asian 2.34%

  17. Population
    Amerindian 1.07%
    East_Asian 0.93%
    African 77.84%
    Atlantic_Baltic 13.06%
    Australasian 0.51%
    Siberian -
    Caucasus_Gedrosia 1.27%
    Southern 5.22%
    South_Asian 0.09%

    These are my genetics, I believe. I don't think there is any statistical noise, as people call it, that lets them say what they want and don't want to be.

  18. My dad (who's mostly British/Scottish/Irish, German, and Scandinavian) gets these Admix Results (sorted):

    # Population Percent
    1 Atlantic_Baltic 72.95
    2 Caucasus_Gedrosia 12.19
    3 Southern 11.6
    4 Siberian 1.35
    5 Amerindian 1.33
    6 South_Asian 0.57
    7 African 0.01

    Single Population Sharing:

    # Population (source) Distance
    1 Dutch (Dodecad) 2.06
    2 German (Dodecad) 2.15
    3 Mixed_Germanic (Dodecad) 2.32
    4 CEU30 (1000Genomes) 2.74
    5 Kent (1000 Genomes) 3.31
    6 Cornwall (1000 Genomes) 3.35
    7 British (Dodecad) 3.4
    8 British_Isles (Dodecad) 4.19
    9 Argyll (1000 Genomes) 4.38
    10 Irish (Dodecad) 4.68
    11 Orcadian (HGDP) 4.93
    12 Ukranians (Yunusbayev) 5.28
    13 Orkney (1000 Genomes) 5.33
    14 Hungarians (Behar) 5.4
    15 Polish (Dodecad) 6.22
    16 French (HGDP) 7.58
    17 Belorussian (Behar) 8.02
    18 French (Dodecad) 8.57
    19 Norwegian (Dodecad) 8.71
    20 Swedish (Dodecad) 9

    Mixed Mode Population Sharing:

    # Primary Population (source) Secondary Population (source) Distance
    1 97.6% Dutch (Dodecad) + 2.4% EastGreenland @ 0.38
    2 96.9% Dutch (Dodecad) + 3.1% WestGreenland @ 0.4
    3 93.6% Dutch (Dodecad) + 6.4% Aleut @ 0.67
    4 97.9% Dutch (Dodecad) + 2.1% Athabask @ 0.7
    5 97.6% German (Dodecad) + 2.4% EastGreenland @ 0.79
    6 96.9% German (Dodecad) + 3.1% WestGreenland @ 0.83
    7 97.9% Dutch (Dodecad) + 2.1% Chukchi @ 0.83
    8 97.9% German (Dodecad) + 2.1% Chukchi @ 0.98
    9 97.9% Dutch (Dodecad) + 2.1% Koryak @ 1.06
    10 78.1% Swedish (Dodecad) + 21.9% O_Italian (Dodecad) @ 1.08
    11 79.6% Swedish (Dodecad) + 20.4% C_Italian (Dodecad) @ 1.08
    12 93.9% German (Dodecad) + 6.1% Aleut @ 1.08
    13 97.2% Dutch (Dodecad) + 2.8% MEX30 @ 1.08
    14 76.7% Swedish (Dodecad) + 23.3% Tuscan (HGDP) @ 1.09
    15 98% German (Dodecad) + 2% Athabask @ 1.09
    16 97.6% Dutch (Dodecad) + 2.4% Ecuadorian @ 1.09
    17 93.3% Mixed_Germanic (Dodecad) + 6.7% Aleut @ 1.1
    18 98.6% Dutch (Dodecad) + 1.4% Pima @ 1.1
    19 98.5% Dutch (Dodecad) + 1.5% Maya @ 1.11
    20 98.3% Dutch (Dodecad) + 1.7% PEL30 @ 1.11

  19. I'm not American at all. All my ancestors have lived in East Africa for centuries. Why do I get an Amerindian signal in every calculator?

    Amerindian 1.02%
    East_Asian 1.74%
    African 34.35%
    Atlantic_Baltic 1.65%
    Australasian 0.83%
    Siberian 0.77%
    Caucasus_Gedrosia 16.61%
    Southern 29.47%
    South_Asian 13.56%

    Replies
    1. If you have Malagasy ancestry, that could be a reason, since Native Americans have been found to have Polynesian markers in their DNA, and come from an East Asian background like Polynesian/Austronesian people.

    2. Population
      Amerindian 0.79
      East_Asian 0.55
      African 84.35
      Atlantic_Baltic 9.77
      Australasian -
      Siberian -
      Caucasus_Gedrosia 0.99
      Southern 3.04
      South_Asian 0.51




  20. Using 3 populations approximation:
    1 50% MKK30 +25% Cochin_Jews +25% Samaritians @ 3.258934

    What does MKK30 stand for?

  21. Amerindian 0.76%
    East_Asian -
    African 0.15%
    Atlantic_Baltic 73.22%
    Australasian -
    Siberian 0.90%
    Caucasus_Gedrosia 11.27%
    Southern 13.07%
    South_Asian 0.59%

  22. What does southern mean? Is it Southern Europe or Southwest Asia?

  23. This comment has been removed by the author.

  24. I'm just really confused and trying to find my ethnicity...it's all still confusing

    1 African 63.16
    2 Atlantic_Baltic 27.04
    3 Southern 6.80
    4 Caucasus_Gedrosia 1.29
    5 East_Asian 1.06

  25. I have 1.19% Amerindian according to this tool. Is that percentage considered noise?

  26. Many people have asked what SOUTHERN represents. Can this question not be answered?

  27. I'm an Indonesian of mostly Javanese descent. I have this result on Dodecad World9:
    Amerindian -
    East_Asian 82.80
    African 0.25
    Atlantic_Baltic 1.52
    Australasian 4.20
    Siberian -
    Caucasus_Gedrosia -
    Southern -
    South_Asian 11.23

    Is it typical among South East Asians or does the African/Baltic/South Asian signify something?

  28. Hello,

    I am wondering if anyone can explain the bottom portion of the oracle to me. I am new to genealogy. What do the percentages mean? Thanks in advance.

    World9 Oracle results:



    Admix Results (sorted):

    # Population Percent
    1 African 74.23
    2 Atlantic_Baltic 15.13
    3 Southern 5.04
    4 South_Asian 3.12
    5 Caucasus_Gedrosia 1.38
    6 Australasian 0.75
    7 Amerindian 0.37

    Single Population Sharing:

    # Population (source) Distance
    1 ASW30 (HapMap3) 8.71
    2 San_He 17.05
    3 ACB30 18.37
    4 Hadza_He 19.39
    5 Sandawe_He 20.01
    6 MKK30 (Dodecad) 22.36
    7 Bantu_N.E. (HGDP) 25.63
    8 LWK30 (Behar) 26.39
    9 Mandenka 29.05
    10 Bantu_S.W._Herero (HGDP) 31.64
    11 YRI30 (HGDP) 32.05
    12 San 32.05
    13 Yoruba (HGDP) 32.51
    14 Bantu_S.E._Tswana (HGDP) 32.53
    15 Biaka_Pygmies 33.21
    16 Mbuti_Pygmies 33.21
    17 Dominican 42.82
    18 Somali (Dodecad) 48.15
    19 Ethiopians (Behar) 52.18
    20 Ethiopian_Jews (Behar) 54.96

    Mixed Mode Population Sharing:

    # Primary Population (source) Secondary Population (source) Distance
    1 85.8% San_He + 14.2% French_Basque @ 2.24
    2 85.8% San_He + 14.2% Pais_Vasco (1000 Genomes) @ 2.26
    3 91.9% ASW30 (HapMap3) + 8.1% Romanians (Behar) @ 2.43
    4 91.4% ASW30 (HapMap3) + 8.6% Brazilian (Dodecad) @ 2.59
    5 92.1% ASW30 (HapMap3) + 7.9% N_Italian (Dodecad) @ 2.6
    6 92.1% ASW30 (HapMap3) + 7.9% North_Italian (HGDP) @ 2.61
    7 92.1% ASW30 (HapMap3) + 7.9% Baleares (1000 Genomes) @ 2.62
    8 92% ASW30 (HapMap3) + 8% Extremadura (1000 Genomes) @ 2.63
    9 92.1% ASW30 (HapMap3) + 7.9% Galicia (1000 Genomes) @ 2.63
    10 92% ASW30 (HapMap3) + 8% Portuguese (Dodecad) @ 2.63
    11 92.1% ASW30 (HapMap3) + 7.9% Castilla_La_Mancha (1000 Genomes) @ 2.64
    12 92% ASW30 (HapMap3) + 8% Murcia (1000 Genomes) @ 2.64
    13 84.6% ACB30 + 15.4% French (Dodecad) @ 2.65
    14 92% ASW30 (HapMap3) + 8% Bulgarians (Yunusbayev) @ 2.65
    15 92% ASW30 (HapMap3) + 8% Bulgarian (Dodecad) @ 2.66
    16 92.1% ASW30 (HapMap3) + 7.9% Andalucia (1000 Genomes) @ 2.66
    17 92.1% ASW30 (HapMap3) + 7.9% Castilla_Y_Leon (1000 Genomes) @ 2.66
    18 92.2% ASW30 (HapMap3) + 7.8% Spaniards (Behar) @ 2.67
    19 92.3% ASW30 (HapMap3) + 7.7% Cataluna (1000 Genomes) @ 2.68
    20 91.8% ASW30 (HapMap3) + 8.2% Canarias (1000 Genomes) @ 2.68

  29. Southern? "Southern" what? What does this cluster mean?
