Monday, December 19, 2011

'world9' calculator

I have consistently received requests for an assessment of Amerindian ancestry. While the focus of the Project is, and will remain, the region of Eurasia, I thought it was a good idea to release a tool that could be used by persons of partial Amerindian ancestry.

I have also included the two Australasian populations currently available, namely Bougainville Melanesians (NAN_Melanesian) and Papuans from the HGDP.

The inferred components at K=9 are quite similar to those of 'eurasia7', with the addition of the Australasian and Amerindian components. I have also included the Kalash in this experiment, which caused the 'West_Asian' component to be modal in them, although the Kalash do not differ from other populations in this component strongly enough to make it population-specific. I have called this component 'Caucasus_Gedrosia'; like the 'eurasia7' West Asian component, it ought to be quite similar to the k5 component inferred by Metspalu et al. (2011).

It is unfortunate that only two Australasian populations are currently available as public data. There are many more Amerindian and Mestizo ones, but it should be noted that the Amazonian populations in which the 'Amerindian' component is modal are among the least genetically diverse in my entire database. As a result, Eurasians who lack any Amerindian or Australasian ancestry can expect to see a little of it in their results as noise.

This is a very important caveat for Americans who suspect that they may have an Amerindian ancestor. Small levels of this component may be noise. The component is also found in Siberia, where it may represent either backflow from the Americas or the common ancestry of Siberian and Amerindian populations. If you are interested in detecting Amerindian ancestry, I recommend that you use DIYDodecad's 'byseg', 'bychr', and 'target' modes to drill down deeper into your genome.

Download Files

  • The spreadsheet contains admixture proportions, the table of Fst distances, and individual results in the Individual Results tab.
  • The RAR file contains files for use with DIYDodecad. Extract its contents to the working directory of DIYDodecad. To run the calculator, follow the instructions in the README file, but type 'world9' instead of 'dv3'.

Terms of use:

'world9', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.


Admixture proportions barplot:

The nine ancestral components are:

  • Amerindian
  • East_Asian
  • African
  • Atlantic_Baltic
  • Australasian
  • Siberian
  • Caucasus_Gedrosia
  • Southern
  • South_Asian

Table of Fst divergences:

Neighbor-joining tree of Fst distances; the long branch lengths of the Australasian (and, to a lesser degree, the Amerindian) branches are due to the high level of inbreeding in the populations for which these components are modal.

First 8 dimensions of multi-dimensional scaling (MDS):

Technical Details

A dataset of 3,548 individuals, 265,519 SNPs, and 284 populations was assembled. Pruning for distantly related individuals was performed by iteratively removing a single individual from each pair showing an IBD RATIO greater than the mean plus 2 standard deviations, or greater than 2.5. 3,026 individuals remained. An additional 14 individuals were removed because they had a genotype rate below 97%. The marker set was thinned to remove SNPs with a genotype rate below 97% or a minor allele frequency below 1%. Linkage-disequilibrium-based pruning was performed with a window of 200 SNPs, advanced by 25 SNPs, and an R-squared threshold of 0.4. A total of 3,012 individuals and 170,822 SNPs survived these filtering steps. PLINK 1.07 and ADMIXTURE 1.21 were used in the analyses.
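The relatedness-pruning rule described above can be sketched in R roughly as follows. This is only an illustration under stated assumptions, not the script that was actually used. The `pairs` table, its column names, and the `prune_related` function are hypothetical, and the RATIO values are made up. The sketch computes the mean-plus-2-SD cutoff once from the initial distribution, and it drops the individual involved in the most flagged pairs; the text does not say how either choice was made.

```r
# Toy stand-in for a table of pairwise IBD RATIO values (made-up numbers).
pairs <- data.frame(
  id1   = c("A", "A", "B", "C", "D", "E"),
  id2   = c("B", "C", "C", "D", "E", "F"),
  ratio = c(2.9, 2.7, 2.0, 2.0, 1.9, 2.0),
  stringsAsFactors = FALSE
)

prune_related <- function(pairs, hard_cutoff = 2.5, n_sd = 2) {
  # Assumption: the soft cutoff is computed once, before any removal.
  soft_cutoff <- mean(pairs$ratio) + n_sd * sd(pairs$ratio)
  removed <- character(0)
  repeat {
    # A pair is flagged if it exceeds either cutoff; which() drops any NA.
    hit <- which(pairs$ratio > soft_cutoff | pairs$ratio > hard_cutoff)
    if (length(hit) == 0) break
    flagged <- pairs[hit, ]
    # Assumption: remove the individual that appears in the most flagged pairs.
    counts  <- table(c(flagged$id1, flagged$id2))
    drop_id <- names(counts)[which.max(counts)]
    removed <- c(removed, drop_id)
    pairs   <- pairs[pairs$id1 != drop_id & pairs$id2 != drop_id, ]
  }
  list(removed = removed, remaining_pairs = pairs)
}

result <- prune_related(pairs)
print(result$removed)
```

In this toy table, removing the single individual shared by the two flagged pairs resolves both, which is why a greedy choice of this kind tends to discard fewer samples than dropping one member of every flagged pair independently.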

Sunday, December 11, 2011

Dodecad Oracle (K12a edition)

I have created a new version of the Dodecad Oracle for use with the K12a calculator.

You can refer to the original Dodecad Oracle for detailed usage instructions.

(The only difference in the use of the program is that the number of populations is 204; make sure to use this number if you plan to remove any reference populations, as mentioned in the instructions.)

In short:

  • First, load the file DodecadOracleK12a.RData in R. You can do this by double-clicking on this file in Windows, or by using the File->Load Workspace menu. In Linux, you can use the "load" command, e.g., load('/home/ubuntu/Desktop/DodecadOracleK12a.RData')
  • Then enter commands at the command prompt.

Some examples:

Comparing a population against other populations

[,1] [,2]
[1,] "Somali_D" "0"
[2,] "Ethiopian_Jews" "12.3049"
[3,] "Ethiopians" "12.3309"
[4,] "Sandawe_He" "38.2093"
[5,] "MKK25" "40.7983"
[6,] "Egyptans" "63.2307"
[7,] "Yemenese" "69.1628"
[8,] "Moroccans" "72.6233"
[9,] "Jordanians" "73.1838"
[10,] "Palestinian" "74.2867"

Comparing a population against 2-way population mixes:

[,1] [,2]
[1,] "Pathan" "0"
[2,] "79.5% Sindhi + 20.5% Lezgins" "3.948"
[3,] "82% Sindhi + 18% Chechens_Y" "4.0251"
[4,] "16.7% Adygei + 83.3% Sindhi" "4.5471"
[5,] "83.4% Sindhi + 16.6% Balkars_Y" "4.6487"
[6,] "80.8% Sindhi + 19.2% Kumyks_Y" "4.7067"
[7,] "83.7% Sindhi + 16.3% North_Ossetians_Y" "4.8352"
[8,] "80.9% Sindhi + 19.1% Nogais_Y" "4.8821"
[9,] "66.6% Sindhi + 33.4% Tajiks_Y" "5.6708"
[10,] "86.4% Sindhi + 13.6% Georgians" "6.2927"

Comparing an individual against populations

DodecadOracle(c(8.4, 0, 2.8, 6, 2.2, 0.1, 40.3, 25.9, 0.3, 11.9, 1.5, 0.5))
[,1] [,2]
[1,] "Iranian_D" "2.2405"
[2,] "Kurd_D" "3.8092"
[3,] "Kurds_Y" "5.4945"
[4,] "Iranians" "6.634"
[5,] "Uzbekistan_Jews" "12.8957"
[6,] "Turks" "17.3173"
[7,] "Turkmens_Y" "17.7316"
[8,] "Iranian_Jews" "18.14"
[9,] "Assyrian_D" "18.8968"
[10,] "Azerbaijan_Jews" "18.9444"
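
To give a feel for what such a ranking involves, here is a minimal sketch in R. It is not the Oracle's actual code. It assumes that the match score is the Euclidean distance between the individual's admixture proportions and each population's proportions, which this post does not confirm. The `ref` table and the `nearest_populations` function are hypothetical, and the toy table uses 3 invented components rather than the 12 of K12a.

```r
# Hypothetical reference table: one row per population, one column per component.
# These numbers are invented for illustration and are NOT K12a values.
ref <- rbind(
  Pop_A = c(70, 20, 10),
  Pop_B = c(10, 80, 10),
  Pop_C = c(30, 30, 40)
)

# Assumption: the score is the Euclidean distance between proportion vectors.
nearest_populations <- function(x, ref, k = 10) {
  stopifnot(length(x) == ncol(ref))
  d <- sqrt(rowSums(sweep(ref, 2, x)^2))  # distance from x to every row of ref
  head(sort(round(d, 4)), k)              # smallest distances first
}

nearest <- nearest_populations(c(65, 25, 10), ref, k = 2)
print(nearest)
```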

Comparing an individual against 2-way population mixes

DodecadOracle(c(28, 0.8, 1.6, 49.9, 1.9, 0, 10.6, 4.1, 0, 2.4, 0, 0.6),mixedmode=T)
[,1] [,2]
[1,] "47.7% French_D + 52.3% Mordovians_Y" "2.5849"
[2,] "48.3% French + 51.7% Mordovians_Y" "2.6012"
[3,] "36.3% Spaniards + 63.7% Mordovians_Y" "2.9985"
[4,] "36% Spanish_D + 64% Mordovians_Y" "3.0577"
[5,] "65.9% Russian_D + 34.1% Spaniards" "3.0923"
[6,] "35.9% IBS + 64.1% Mordovians_Y" "3.0943"
[7,] "40% French + 60% Ukranians_Y" "3.1662"
[8,] "66.4% Russian_D + 33.6% IBS" "3.2359"
[9,] "24.5% Swedish_D + 75.5% Hungarians" "3.3021"
[10,] "39.3% French_D + 60.7% Ukranians_Y" "3.4046"

The numbers to the right of each result represent the "goodness" of the match; the lower, the better. If you wanted to list the top-30 results, in any of the above commands, you would enter, e.g.,

DodecadOracle(c(28, 0.8, 1.6, 49.9, 1.9, 0, 10.6, 4.1, 0, 2.4, 0, 0.6),mixedmode=T, k=30)
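
The idea behind a 2-way mix search can be sketched in R as follows. Again, this is an illustration and not the Oracle's own implementation. It assumes that each pair of populations is fitted by the mixing fraction that minimizes the Euclidean distance to the individual, and that the reported score is that minimum distance. The `ref` table and the `best_two_way_mixes` function are hypothetical, and the proportions are invented.

```r
# Hypothetical reference table (invented numbers, 3 components instead of 12).
ref <- rbind(
  Pop_A = c(70, 20, 10),
  Pop_B = c(10, 80, 10),
  Pop_C = c(30, 30, 40)
)

best_two_way_mixes <- function(x, ref, k = 10) {
  idx <- combn(nrow(ref), 2)  # all unordered pairs of populations
  rows <- lapply(seq_len(ncol(idx)), function(p) {
    a <- ref[idx[1, p], ]
    b <- ref[idx[2, p], ]
    # Least-squares weight of a in w*a + (1-w)*b, clipped to [0, 1].
    # (Assumes a and b differ; identical rows would give a 0/0 here.)
    w <- sum((x - b) * (a - b)) / sum((a - b)^2)
    w <- min(max(w, 0), 1)
    fit <- w * a + (1 - w) * b
    data.frame(
      pop1     = rownames(ref)[idx[1, p]],
      pop2     = rownames(ref)[idx[2, p]],
      weight1  = round(w, 3),
      distance = round(sqrt(sum((x - fit)^2)), 4),
      stringsAsFactors = FALSE
    )
  })
  res <- do.call(rbind, rows)
  head(res[order(res$distance), ], k)  # best-fitting mixes first
}

# An individual constructed as an exact 50/50 mix of Pop_A and Pop_B.
mixes <- best_two_way_mixes(c(40, 50, 10), ref, k = 3)
print(mixes)
```

Because the search covers every pair of reference populations, its cost grows quadratically with the number of populations, which is worth keeping in mind when asking for long result lists.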

If you recently joined the Project, please consider leaving a brief comment in the Information about Project Samples thread.

Participant results for 'K12a' calculator

The participant results can be found in the "Individual Results" tab of the K12a spreadsheet.
You can read more about the K12a calculator at my other blog; if you are not a Project participant, you can also find a DIY version of it there, which can be used in conjunction with DIYDodecad 2.1.