Friday, September 30, 2011

'euro7' calculator

I am releasing a new calculator for Europeans, including their immediate neighboring populations around the Black Sea (Caucasus and Anatolia). The calculator can be used with DIYDodecad.

There are additional African and Far-Asian population controls, so, in principle, the calculator could be used by non-Europeans/Anatolians/Caucasians, although I would be less confident of their results. For example, people of South Asian ancestry may obtain a Far-Asian result if they use this calculator, due to the deep affinity of Ancestral South Indians with East Asians. Other West Eurasians and West Eurasian-admixed peoples, not from the studied regions (e.g., Arabians or East Africans) will have their West Eurasian components mapped onto the ones used in this calculator.

'euro7' uses 7 ancestral components:
  • Caucasus
  • Northwestern
  • Northeastern
  • Southeastern
  • African
  • Far_Asian
  • Southwestern
These names represent 7 ancestral populations inferred by ADMIXTURE, and have been chosen based on the geographical regions where each of them achieves its maximum representation. When interpreting your individual results, you should always refer to "A note of caution on admixture estimates" and "Interpretation of ADMIXTURE results: component sharing", as well as the average population values in the spreadsheet.
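As a rough illustration of the naming rule described above, the sketch below finds, for each component, the population in which its average proportion is highest. The population labels, component labels, and numbers are invented for the example and are not taken from the spreadsheet; `peak_population` is a hypothetical helper name.

```python
# Hypothetical average admixture proportions (percent) per population.
# All labels and values are made up for illustration only.
averages = {
    "Pop_A": {"C1": 60.0, "C2": 30.0, "C3": 10.0},
    "Pop_B": {"C1": 20.0, "C2": 70.0, "C3": 10.0},
    "Pop_C": {"C1": 15.0, "C2": 25.0, "C3": 60.0},
}


def peak_population(averages):
    """Return, for each component, the population where its average is highest."""
    peaks = {}
    for pop, comps in averages.items():
        for comp, value in comps.items():
            # Keep the population with the largest average seen so far.
            if comp not in peaks or value > averages[peaks[comp]][comp]:
                peaks[comp] = pop
    return peaks


peaks = peak_population(averages)
print(peaks)
```

A component's name therefore says where it peaks, not that it is exclusive to that region, which is why the population averages matter when reading an individual result.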

The distribution of these 7 components can be seen in the barplot on the top left, and precise admixture proportions can be found in the spreadsheet. Note that additional samples have been used to infer these components, but as these come from Dodecad populations with fewer than 5 participants, I am not reporting average values for them, as per the usual project policy.

Here is the neighbor-joining tree based on the Fst divergences between the 7 ancestral components:

Instructions:

You can download the calculator RAR from here (Google docs; File->Download original), or here (sendspace).

You need to extract the contents of the RAR file to the working directory of DIYDodecad. You use it by following exactly the instructions of the DIYDodecad README, but always type 'euro7' instead of 'dv3' in these instructions.

Terms of use: 'euro7', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.


Sunday, September 25, 2011

Yunusbayev et al. (2011) data assessed with Dodecad v3

I have acquired the data from the recent Yunusbayev et al. (2011) paper on the Caucasus. This includes the following populations:
  • Kurds_Y 6
  • Bulgarians_Y 13
  • Ukranians_Y 20
  • Mordovians_Y 15
  • Armenians_Y 16
  • Abhkasians_Y 20
  • Balkars_Y 19
  • North_Ossetians_Y 15
  • Chechens_Y 20
  • Nogais_Y 16
  • Kumyks_Y 14
  • Turkmens_Y 15
  • Tajiks_Y 15
It is a valuable new addition to the Project, and it is commendable that it has been made publicly and easily available so swiftly after the appearance of the Yunusbayev et al. (2011) paper.

To get the ball rolling on the new Yunusbayev et al. data, I will map the new populations onto the Dodecad v3 components; they will be added to the Dodecad v3 spreadsheet as they are calculated.

I have been laboriously designing a new global (including Amerindians and Australasians) Dodecad X1 experimental calculator with 3,010 individuals for a few weeks now, but I guess I will now have to reboot it with 3,214.

Together with some other new data I recently discovered, I now have 9,799 individuals (some duplicates from different sources) in my global database. My Dodecad dataset of 511 individuals from a single country or ethnic group isn't too shabby either. Let's hope for a new data release that will push the data collection above the magic 10,000.

UPDATE:

I have added the first 7 populations to the spreadsheet; the others are being calculated as we speak. Most of them seem in line with expectations, but the Abkhasian sample has one outlier individual (abh27), who has thus been placed in the "Outliers" tab of the spreadsheet; a new set of admixture proportions, minus that outlier individual, will be calculated.

UPDATE II: The population portraits have been uploaded to Google Docs as a rar file (Sendspace mirror). Average admixture results have all been entered into the spreadsheet.

Wednesday, September 21, 2011

'weac' calculator


This new calculator places individuals on the West Eurasian cline. This cline is the first-order description of variation in West Eurasians, with populations from northern and western Europe falling on one end, and those from the Near East on the other.

On the left, you can see the populations on which the calculator is based, sorted on their average "Atlantic-Baltic" component. The raw data can be found in the spreadsheet.

Note that the main purpose of the calculator is to place European and Near Eastern samples on the West Eurasian cline, and to do so, some African and East Eurasian populations are used as controls. Other types of ancestry (e.g., South Asian or Amerindian) may register as Far-Asian in the context of this test.

You can download the calculator RAR from here (Google docs), or here (sendspace).

You need to extract the contents of the RAR file to the working directory of DIYDodecad. You use it by following exactly the instructions of the DIYDodecad README, but always type 'weac' instead of 'dv3' in these instructions.

Terms of use: 'weac', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.

Sunday, September 18, 2011

Do-It-Yourself Dodecad v 2.1

DIYDodecad v 2.1 allows incomplete genotype files to be used, i.e., genotype
files that do not include all expected SNP markers used in a calculator. This
is useful to individuals having older genotype files from their testing
companies, and allows the tool to be used with any type of genotype data, and
not only the Illumina platforms currently used by 23andMe and FamilyTreeDNA.

There is a minimum requirement of at least 100 usable SNPs, i.e., SNPs that are
in the genotype file and do not have no-calls.
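To check in advance whether a genotype file is likely to meet this minimum, one could count the usable SNPs along the following lines. This is a minimal sketch and not DIYDodecad's own code: it assumes a 23andMe-style tab-separated layout (rsid, chromosome, position, genotype) with `--` marking a no-call, and the function name, the sample lines, and the list of expected SNPs are all hypothetical.

```python
MIN_USABLE_SNPS = 100  # minimum stated in the post


def count_usable_snps(genotype_lines, calculator_snps):
    """Count calculator SNPs present in the genotype data with a called genotype.

    genotype_lines: iterable of tab-separated lines (rsid, chromosome,
        position, genotype); lines starting with '#' are treated as comments.
        This layout is an assumption, not something the post specifies.
    calculator_snps: set of SNP ids the calculator expects (hypothetical input).
    """
    usable = set()
    for line in genotype_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        rsid, genotype = fields[0], fields[3]
        # A '-' anywhere in the genotype is treated as a no-call.
        if rsid in calculator_snps and "-" not in genotype:
            usable.add(rsid)
    return len(usable)


# Tiny made-up example: rs2 is a no-call, rs4 is missing, rs9 is not expected.
sample_lines = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t100\tAG",
    "rs2\t1\t200\t--",
    "rs3\t2\t300\tCC",
    "rs9\t3\t400\tTT",
]
expected_snps = {"rs1", "rs2", "rs3", "rs4"}

n_usable = count_usable_snps(sample_lines, expected_snps)
meets_minimum = n_usable >= MIN_USABLE_SNPS
print(n_usable, meets_minimum)
```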

If you had previously followed the instructions carefully, and got an "end of file reached" error, this was most likely due to your genotype file lacking some of the expected markers used in the calculator. Version 2.1 should work for you.

You can download it from here (Google Docs, File->Download Original), or here (Sendspace). Uncompress DIYDodecad2.1.rar to a local directory on your computer, and follow the instructions in the README.txt file.

Past versions: 2.0, 1.0

Thursday, September 15, 2011

Third-party tools based on the Dodecad Project

Gedmatch.com has made available some tools based on Dodecad v3 as bundled in DIYDodecad 2.0. In addition to the regular admixture analysis, there is a chromosome painting tool, and an option to compare 2 kits. This should be quite useful to Mac users, who can't use DIYDodecad at present. Gedmatch.com requires upload of your data to the server side, which provides the benefit of the other tools of the site, but may not be ideal for people with privacy concerns, for whom the DIY tool was partly built.

Note that because admixture estimation is computationally expensive, the Gedmatch.com tools are slightly less accurate than DIYDodecad because of laxer termination criteria. This should not be a problem for the major components of one's ancestry, but may be for the minor ones. Moreover, convergence is achieved with a different number of iterations for different genotype files, so accuracy may vary.
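To make the effect of termination criteria concrete, here is a small sketch of an EM-style estimate of one individual's admixture proportions against fixed component allele frequencies. It is an illustration under stated assumptions, not the code used by DIYDodecad or Gedmatch.com: the stopping rule (largest change in any proportion below `tol`), the function name, and the toy frequencies and genotypes are all hypothetical, and all frequencies are assumed to lie strictly between 0 and 1.

```python
def estimate_admixture(genotypes, freqs, tol=1e-7, max_iter=10000):
    """Estimate admixture proportions for one individual by EM.

    genotypes: per-SNP counts (0, 1 or 2) of a reference allele.
    freqs: freqs[k][j] is the reference-allele frequency of component k
        at SNP j (held fixed; assumed strictly between 0 and 1).
    Returns (proportions, iterations_used).
    """
    n_comp = len(freqs)
    n_snps = len(genotypes)
    q = [1.0 / n_comp] * n_comp  # start from equal proportions
    for iteration in range(1, max_iter + 1):
        new_q = [0.0] * n_comp
        for j, g in enumerate(genotypes):
            p_ref = sum(q[k] * freqs[k][j] for k in range(n_comp))
            p_alt = 1.0 - p_ref
            for k in range(n_comp):
                # Expected share of each allele copy attributed to component k.
                if g > 0:
                    new_q[k] += g * q[k] * freqs[k][j] / p_ref
                if g < 2:
                    new_q[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / p_alt
        new_q = [v / (2.0 * n_snps) for v in new_q]
        change = max(abs(a - b) for a, b in zip(new_q, q))
        q = new_q
        if change < tol:  # termination criterion
            break
    return q, iteration


# Toy data: two components, six SNPs (values invented for illustration).
toy_freqs = [
    [0.9, 0.8, 0.1, 0.2, 0.7, 0.3],
    [0.1, 0.2, 0.9, 0.8, 0.3, 0.6],
]
toy_genotypes = [2, 2, 0, 0, 1, 1]

strict_q, strict_iters = estimate_admixture(toy_genotypes, toy_freqs, tol=1e-9)
lax_q, lax_iters = estimate_admixture(toy_genotypes, toy_freqs, tol=1e-2)
print(strict_q, strict_iters)
print(lax_q, lax_iters)
```

With the looser tolerance the loop stops no later than with the strict one, so the estimate can be further from the converged values; on real data that gap would matter most for small components.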

Two other genome bloggers have released their own calculators that can be run with DIYDodecad. Magnus Ducatus Lituaniae has released MDLP based on its K=7 analysis. Eurogenes has released a K=14 imaginatively named test calculator for Eurasian data.

I keep a list of calculators for DIYDodecad in the DIYDodecad 2.0 page.

I neither endorse nor am I affiliated with any third-party tools, but I encourage readers to try them out; the more the merrier.

Wednesday, September 14, 2011

'africa9' calculator

I have devised a new calculator targeted specifically for Africans. Admixture proportions in the reference panel, Fst distances between the K=9 components, as well as individual results for Project participants from the North_Africa_D, North_African_Jews_D and East_African_D populations can be seen in the spreadsheet.

The calculator combines data from Henn et al. (2011), HGDP, and Behar et al. (2010). As a result, the number of SNPs is small: there is probably noise in the minor components, but the major components of one's ancestry should be well-defined.

It should be used only by Africans and African-West Eurasian admixed individuals. It is not meant for people with additional admixture (e.g., South/East Asian or Native American).

You can download the calculator RAR from here (Google docs), or here (sendspace).

You need to extract the contents of the RAR file to the working directory of DIYDodecad. You use it by following exactly the instructions of the DIYDodecad README, but always type 'africa9' instead of 'dv3' in these instructions.

Terms of use: 'africa9', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.

NB: The components of 'africa9' do not necessarily have the same meaning as the same-named components you might have seen elsewhere. Refer to the spreadsheet for the admixture proportions and Fst distances between components. For example, the NW African is substantially removed from other West Eurasian components in Dodecad v3 but equidistant from Europe and SW_Asia in 'africa9'. I also advise that you read "Interpretation of ADMIXTURE results: component sharing".

Monday, September 5, 2011

'bat' calculator (Balkans-Anatolia-Turkic)

I have decided to make a new calculator for DIYDodecad that may be useful for individuals from the Balkans and Anatolia. You can download it from here at Google Docs (or here from sendspace). The terms of use are the same as for DIYDodecad v 2.0. To run it, you simply extract the contents of the RAR file in your working directory, and type bat.par whenever you typed dv3.par in the instructions.

The reference populations can be seen below. I have included all available Balkan populations, as well as Turks and Armenians. Moreover, I have included all available Turkic populations.

The marker set is the same as used in Dodecad v3. Three components emerge: one centered in the northern Balkans, one in eastern Anatolia, and one present in various proportions among all Turkic populations (see Turkic cline).


The components have been named accordingly, but please note that they do not necessarily reflect recent ancestors. For example, it is a good hypothesis that the Anatolia component was present in the Balkans even in ancient times, so one need not seek a recent Anatolian ancestor to explain its presence in a Balkan individual. Similarly for the Balkans component in Anatolia, which may reflect the diverse Balkan peoples that have settled in Anatolia since the dawn of history, so a present-day inhabitant of Anatolia need not seek a recent Balkan ancestor.

Likewise, the Turkic component is only part of the genetic makeup of the Turkic speakers who arrived in Anatolia, since those probably also carried West Eurasian population elements picked up en route from Siberia to Anatolia.

The way to interpret your results is to see whether you have an excess or deficiency of any component relative to your ethnic group. For example, an Anatolian Greek may have a higher Anatolia/Balkans ratio than a Balkan Greek and likewise for a Balkan vs. Anatolian Turk; the latter may also have a variable Turkic component which will reflect differential Central Asian input.
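The comparison described above amounts to subtracting your group's average from your own result, component by component. A minimal sketch follows; the component names are the three used by 'bat', but every number is invented for illustration and `component_deviation` is a hypothetical helper, so real averages should be taken from the reference populations.

```python
def component_deviation(individual, group_average):
    """Return individual minus group average for each component (percent points).

    Positive values indicate an excess relative to the group, negative values
    a deficiency. Components missing from the individual count as 0.
    """
    return {
        comp: round(individual.get(comp, 0.0) - avg, 1)
        for comp, avg in group_average.items()
    }


# Hypothetical values, for illustration only.
group_average = {"Balkans": 55.0, "Anatolia": 40.0, "Turkic": 5.0}
individual = {"Balkans": 45.0, "Anatolia": 52.0, "Turkic": 3.0}

deviation = component_deviation(individual, group_average)
print(deviation)
```

In this made-up case the individual shows an excess of the Anatolia component and a deficiency of the Balkans component relative to the group.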

Saturday, September 3, 2011

No affiliation to Gedmatch.com

A thread at the 23andMe forum suggests that the Dodecad project is somehow related to Gedmatch.com. It is not, apart from the fact that the admin of Gedmatch.com, for whatever reason, chose to co-opt the exact names of the 12 ancestral components of the Dodecad project for his own purposes, creating unnecessary confusion between the two projects.

I would like to state that I have absolutely no relation whatsoever to that outfit, and that use of any DIYDodecad files without my permission or without attribution is forbidden, as stated in the DIYDodecad README file.

UPDATE: The admin of Gedmatch.com has kindly asked for permission, and I've replied that he is entirely free to use the DIYDodecad materials, provided that:
  1. It is clearly visible to someone considering using the utility that it is based on Dodecad v3
  2. It is also clear that the results may differ from the same-named components of Dodecad v3 produced by the Dodecad Project
I consider the misunderstanding to be over, and hopefully the tool will be online again soon.