I am releasing two new calculators with K=12 and K=7 components, named 'K12b' and 'K7b'. You can scroll down to the bottom if you are just interested in the downloads, or read on.
New Features
The new 'K12b' calculator is an update of the previous K12a, inferred using all the new samples submitted during the last submission opportunity. The 12 components are still roughly the same, although their allele frequencies may have shifted slightly, so existing participants can expect slightly altered results. New participants in the Project can expect larger changes, since their data are now contributing to the creation of the new tool. Non-participants can, of course, use the new calculator with DIYDodecad.
I have also taken the opportunity to make some minor tweaks. I am releasing population portraits for K12b (which were lacking in K12a), and I have changed my visualization code so that the sample IDs of non-Dodecad populations can now be seen in the barplots. This may be useful for anyone else using these reference populations, since it makes potential outliers in them quick to identify.
I have also decided to use normalized median admixture proportions for the populations. For example, if 5 individuals in a population have 0, 0, 0.2, 0.5, 10.0% of a particular component, then the average is 2.14%, but the median is 0.2%. By using the median, the proportions become less susceptible to the presence of outliers (such as the 10%). However, if the median is calculated over every component separately, it is no longer guaranteed that the components will add up to 100%; this can be addressed by re-normalizing them (scaling them by a constant factor) so that they do. I believe that use of the normalized median will not only give better proportions that are less susceptible to outliers, but will also improve results of the new Dodecad Oracle for K12b.
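The normalized median described above can be illustrated with a minimal sketch. The function name and the three-component population below are made up for illustration and are not part of the Dodecad tools; the first component uses the example values from the text.

```python
from statistics import median

def normalized_median(rows):
    """Per-component median across individuals, rescaled to sum to 100%.

    rows: one list of admixture percentages per individual,
    with the same number of components in every list.
    """
    n_components = len(rows[0])
    # Median of each component, taken separately across individuals.
    medians = [median(row[k] for row in rows) for k in range(n_components)]
    total = sum(medians)
    if total == 0:
        raise ValueError("all component medians are zero; cannot re-normalize")
    # Scale by a constant factor so that the components add up to 100%.
    scale = 100.0 / total
    return [m * scale for m in medians]

# Hypothetical population of five individuals and three components.
# The first column holds the example values 0, 0, 0.2, 0.5, 10.0.
population = [
    [0.0, 60.0, 40.0],
    [0.0, 70.0, 30.0],
    [0.2, 50.0, 49.8],
    [0.5, 80.0, 19.5],
    [10.0, 55.0, 35.0],
]

# The raw medians are 0.2, 60.0 and 35.0, which sum to 95.2 rather than 100.
result = normalized_median(population)
print([round(x, 2) for x in result])
```

Each median is multiplied by the same factor (here 100/95.2), so the relative sizes of the components are preserved while the total returns to 100%.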
At the same time, I am also releasing 'K7b', which is an update of the existing 'eurasia7' calculator and which has been built on exactly the same dataset as 'K12b', but at a lower (K=7) level of detail.
Information on K7b
Information spreadsheet.
Normalized median admixture proportions barplot for all included populations (a high resolution version of this is included in the download bundle):
Table of Fst divergences:
Neighbor-joining tree (based on above):
Information on K12b
Information spreadsheet.
Normalized median admixture proportions barplot for all included populations (a high resolution version of this is included in the download bundle):
Table of Fst divergences:
Neighbor-joining tree (based on above):
Multidimensional Scaling Plots of K12b and K7b
I have created MDS plots using synthetic individuals representing the 12 ancestral components of K12b and the 7 ancestral components of K7b. By including both in the same plot, one gets an idea of how the components relate to each other at different resolutions. The first 10 dimensions can be seen below:
Here is a blowup of the main West Eurasian groups from the plot of the first two dimensions:
Some observations:
- The Atlantic_Med component, which is bi-modal in Basques and Sardinians, occupies the apex of the figure; this makes sense, since Southwest Europe is quite distant (along land routes) from both Asia and Africa.
- The Caucasus component is surrounded by most of the others; this is consistent with my theory elaborated in The womb of nations: how West Eurasians came to be.
- The Atlantic_Baltic component (from K=7) is intermediate between the Atlantic_Med and North_European components.
- Similarly, the West_Asian component (from K=7) is intermediate between the Caucasus and Gedrosia components; the Gedrosia component diverges in the direction of the Asian groups (not shown in this figure), and in particular of South Asians. This divergence can also be seen in the plot of dimension #3.
- The Northwest_African component diverges in the direction of Sub-Saharan Africans.
Technical Details
A dataset of 268 populations/3,115 individuals was assembled. A total of 265,519 SNPs are common to the various source datasets and to the 23andMe v2/v3 and Family Finder platforms. Distant relatives were removed iteratively: one individual was removed from each pair within a population if that pair had a RATIO of 2.5 or greater, or exceeded the mean plus two standard deviations, in an IBD analysis performed in PLINK 1.07. A total of 2,675 individuals remained. 4 individuals were then removed for low genotyping rate (less than 97%). 264,328 SNPs remained after removal of SNPs with less than 97% genotyping rate or less than 1% minor allele frequency. 166,770 SNPs remained after linkage disequilibrium-based pruning (--indep-pairwise 200 25 0.4). The final set thus consisted of 2,671 individuals/268 populations/166,770 SNPs. Ancestral populations (components) were inferred using ADMIXTURE 1.21, with K=7 and K=12 and default parameters.
No individuals were removed from the source datasets, except in the case of the Armenians_Y sample, where one individual (ID: armenia3) was dropped because he/she was the same as a Dodecad Project participant.
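For readers who want to reproduce a similar pipeline, the filtering and inference steps above can be sketched roughly as the following command lines, assembled here in Python. This is only a sketch under assumptions: the file prefix `merged` and the derived output names are hypothetical, the split into separate PLINK invocations is my illustration rather than the exact commands that were run, and the iterative removal of relatives (based on the RATIO column of PLINK's --genome output) is left out because it was applied per population.

```python
import shlex

def build_commands(prefix="merged", ks=(7, 12)):
    """Return PLINK/ADMIXTURE command lines for the QC steps described in the text.

    prefix is a hypothetical PLINK binary fileset name (prefix.bed/.bim/.fam),
    assumed to already have relatives removed.
    """
    cmds = [
        # Remove individuals with less than 97% genotyping rate (missingness > 3%).
        ["plink", "--bfile", prefix, "--mind", "0.03",
         "--make-bed", "--out", prefix + "_mind"],
        # Remove SNPs with less than 97% genotyping rate or less than 1% MAF.
        ["plink", "--bfile", prefix + "_mind", "--geno", "0.03", "--maf", "0.01",
         "--make-bed", "--out", prefix + "_qc"],
        # LD-based pruning with the parameters quoted in the text;
        # PLINK writes the list of retained SNPs to <out>.prune.in.
        ["plink", "--bfile", prefix + "_qc",
         "--indep-pairwise", "200", "25", "0.4", "--out", prefix + "_ld"],
        # Keep only the pruned-in SNPs.
        ["plink", "--bfile", prefix + "_qc", "--extract", prefix + "_ld.prune.in",
         "--make-bed", "--out", prefix + "_pruned"],
    ]
    # One ADMIXTURE run per K, with default parameters.
    for k in ks:
        cmds.append(["admixture", prefix + "_pruned.bed", str(k)])
    return cmds

commands = build_commands()
for cmd in commands:
    # Print each command in a form that could be pasted into a shell.
    print(shlex.join(cmd))
```

The script only prints the commands and does not run them, so the thresholds can be checked against the text before anything is executed.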
Downloads
K7b population portraits, spreadsheet, and DIYDodecad files.
K12b population portraits, spreadsheet, and DIYDodecad files.
Dodecad Oracle (K12b edition) can be downloaded from here. Please read the instructions of the previous Oracle on how to use this tool. Note that the number of populations is now 223.
To use either calculator with DIYDodecad on your 23andMe or Family Finder data, follow the instructions in the README file, but substitute 'K12b' or 'K7b' for 'dv3'.
Project participant results for both K7b and K12b are found in the spreadsheets in the Individual Results tab.
Terms of Use
You are free to use K12b and K7b, including all downloaded files, for any non-commercial purpose, as long as you attribute them to the Dodecad Project and to Dienekes Pontikos as follows:
The [K7b/K12b] admixture calculator is courtesy of Dienekes Pontikos and was developed as part of the Dodecad Ancestry Project; more information here.