tag:blogger.com,1999:blog-65339961273045878652024-02-27T09:09:33.921+02:00Dodecad Ancestry ProjectPersonal anthropology through the power of genomicsDienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.comBlogger218125tag:blogger.com,1999:blog-6533996127304587865.post-82245971888602828392012-12-02T22:16:00.003+02:002012-12-02T22:16:18.919+02:00D-statistics on ADMIXTURE componentsI have <a href="http://dienekes.blogspot.com/2012/12/d-statistics-on-admixture-components.html">implemented</a> the method of D-statistics as an R function. This will allow you to take your raw genotype data and calculate various D-statistics of the form:<br />
<br />
<div style="text-align: center;">
<b>D(Pop1, YOU; Pop3, Outgroup)</b></div>
<br />
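For readers who want to see what a statistic of this form measures, here is a minimal Python sketch. It is not the R function from the original post. It assumes the common allele-frequency form of the D-statistic, and the function name, the sign convention, and the toy frequencies are all made up for illustration. An individual ("YOU") enters as genotype dosage divided by two, i.e., 0, 0.5 or 1 at each SNP.<br />

```python
def d_statistic(p1, p2, p3, p4):
    """D(P1, P2; P3, P4) from per-SNP frequencies of the same reference allele.

    Assumed form (sign conventions differ between implementations):
        sum((p1 - p2) * (p3 - p4))
        / sum((p1 + p2 - 2*p1*p2) * (p3 + p4 - 2*p3*p4))
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        numerator += (a - b) * (c - d)
        denominator += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Hypothetical toy frequencies for four SNPs; real input would be
# hundreds of thousands of markers.
pop1 = [0.10, 0.80, 0.50, 0.30]
you = [0.0, 1.0, 0.5, 0.5]        # genotype dosage / 2
pop3 = [0.20, 0.90, 0.40, 0.60]
outgroup = [0.0, 1.0, 0.1, 0.2]

print(d_statistic(pop1, you, pop3, outgroup))
```

With so few SNPs the value means nothing; in practice significance is judged from a standard error (typically a block jackknife), which this sketch leaves out.<br />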
Please read the <a href="http://dienekes.blogspot.com/2012/12/d-statistics-on-admixture-components.html">original post</a> for details on how to use this tool.Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com0tag:blogger.com,1999:blog-6533996127304587865.post-86982041150111247372012-11-30T13:48:00.004+02:002012-12-31T01:03:47.964+02:00Geno 2.0 patch for DIYDodecad<b>(See important update at the end of this post)</b><br />
<b><br /></b>
People who have tested using the <a href="http://www.genographic.com/">Genographic Project</a>'s Geno 2.0 test can now use the <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a> tool with their data.
The raw data download from this test has a slightly different format than the ones from 23andMe and Family Finder, so it is necessary to convert your data into a format that DIYDodecad can interpret.<br />
<br />
So, after you have downloaded and extracted the DIYDodecad <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">software</a> as per its instructions, you should also download a couple of extra files into your working directory; these files are included in this <a href="https://docs.google.com/open?id=0B7JDEoCgzRKeVV9HS19wN2JxVVU">patch</a>:<br />
<br />
<ul>
<li><b>standardize.r </b>which replaces the standardize.r in the DIYDodecad software bundle, and allows you to convert your Geno 2.0 formatted data</li>
<li><b>hgdp.base.txt </b>which includes additional information about SNP markers that is not found in your Geno 2.0 raw data download, and which is necessary to complete the conversion process.</li>
</ul>
Once these two files have been extracted into your working directory, the process of using DIYDodecad is exactly the same as for any other user of the software.<br />
<br />
The only difference is that at the step where you convert your data using the <i>standardize </i>command (see DIYDodecad README file), you will use the command:<br />
<br />
<br />
<b>standardize('johndoe.csv', company='geno2')</b><br />
<br />
where johndoe.csv is your unzipped raw data download. This will write a genotype.txt file in the working directory, and you can proceed the rest of the way as per the instructions.<br />
<br />
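As the update at the end of this post notes, Geno 2.0 downloads come in two layouts. The following Python sketch is not part of the patch. It only peeks at the first non-empty line of a file to suggest which <i>company</i> value to pass to standardize; the function names are hypothetical, and the only layouts it knows are the two described in that update.<br />

```python
def detect_geno2_layout(lines):
    """Return 'geno2' or 'geno2new' based on the first non-empty line."""
    for raw in lines:
        line = raw.strip().lstrip("\ufeff")  # tolerate a byte-order mark
        if not line:
            continue
        if line.startswith("[Header]"):
            # Older layout: [Header] ... [Data] preamble, 5 values per row.
            return "geno2"
        if line == "SNP,Chr,Allele1,Allele2":
            # Newer layout: 4 values per row.
            return "geno2new"
        break
    raise ValueError("unrecognised layout; inspect the file by hand")


def suggest_command(path):
    """Build the R command to type for a given raw data file (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        company = detect_geno2_layout(handle)
    return f"standardize('{path}', company='{company}')"
```

For example, suggest_command('johndoe.csv') would return the line to type at the R prompt; the conversion itself is still done by standardize.r.<br />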
<div>
You can use all ancestry calculators released by the Project (or indeed other projects); the most recent one is <a href="http://dodecad.blogspot.com/2012/10/globe13-calculator.html">globe13</a>. </div>
<div>
<br /></div>
<div>
You should be aware that, because the Geno 2.0 test includes a smaller number of SNPs, and because globe13 and other calculators were developed using the common SNP set of 23andMe and Family Finder, the analysis using globe13 will only include ~34 thousand SNPs and will be "noisier" than usual. In the future, I might develop new calculators that make use of the SNP set of the Geno 2.0 test itself.</div>
<div>
<br /></div>
<div>
PS: Feel free to post a comment below if you experienced any difficulty converting your data; also thanks to <a href="http://www.yourgeneticgenealogist.com/">CeCe Moore</a> for graciously sharing a raw data file with me, which allowed me to build this converter.<br />
<br />
<b>UPDATE:</b><br />
<b><br /></b>
Apparently, the data format has been changed for some Geno 2.0 data downloads.<br />
If your data includes a [Header] ... [Data] preamble followed by a list of 5 comma-separated values, you can ignore this update and use company='geno2' as described above.<br />
If it includes a header "SNP,Chr,Allele1,Allele2" followed by a list of 4 comma-separated values, you should follow the instructions as above, but use company='geno2new' instead.</div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com11tag:blogger.com,1999:blog-6533996127304587865.post-45622382340650130362012-10-31T01:32:00.000+02:002012-10-31T09:21:43.218+02:00'globe13' participant resultsProject participant results for the <a href="http://dodecad.blogspot.com/2012/10/globe13-calculator.html">globe13</a> calculator can be found in the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadF9CLUJnTUdSbkVJaDR2UkRtUE9kaUE#gid=0">spreadsheet</a>. Population <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadF9CLUJnTUdSbkVJaDR2UkRtUE9kaUE#gid=2">median </a>results and <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadF9CLUJnTUdSbkVJaDR2UkRtUE9kaUE#gid=3">Fst</a> divergences are also included.<br />
<br />
Below, you can see the first two dimensions of an <b>MDS plot </b>of the 13 components:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5ukw3rG12nU0sVF2cwf_lWl6M7_2Ge5v_1VW9dwWqjD5plq1N2HmD7763G14EsyomVqRj_MHC_vFGEc_Ib5_ugfiNUhSHFnmWqkSzzWN8DguT6VTmiqe0n1rjScyn_3ye9Yev9zDPYjQ/s1600/1_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5ukw3rG12nU0sVF2cwf_lWl6M7_2Ge5v_1VW9dwWqjD5plq1N2HmD7763G14EsyomVqRj_MHC_vFGEc_Ib5_ugfiNUhSHFnmWqkSzzWN8DguT6VTmiqe0n1rjScyn_3ye9Yev9zDPYjQ/s640/1_2.png" width="640" /></a></div>
<br />
A <b>neighbor-joining tree</b> of the 13 components based on the Fst divergences:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgY_W8VHOCszZy8JLPLXmjb2zmsHFA2ikM5D-IKdgeWaDP7f5uZYBEXK-xq4-JkPBDYtM7YZnN8MdXjl7eowxSYoQSjCzjt0ZPhufGySjWYgl98tqcO7fQ86-7Fq2xsN-7Qsdhw54p2G0/s1600/nj.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgY_W8VHOCszZy8JLPLXmjb2zmsHFA2ikM5D-IKdgeWaDP7f5uZYBEXK-xq4-JkPBDYtM7YZnN8MdXjl7eowxSYoQSjCzjt0ZPhufGySjWYgl98tqcO7fQ86-7Fq2xsN-7Qsdhw54p2G0/s640/nj.png" width="640" /></a></div>
I have also created a <b>TreeMix plot </b>using Palaeo_African as an outgroup, and allowing as many as 5 migration edges:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaS2D5Af6vxoPQ4y1EBtw53FJNilxKo_xQVIno_0sKq6BWzcKgjMfGgHlP8v_LROSFDosJfKjEPV28U6SSROkrLto5Ghg0_mOMpZySylS8u2M0NvaEWrtVSccypwidj9LhYIwrxYG-EvU/s1600/treemix.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaS2D5Af6vxoPQ4y1EBtw53FJNilxKo_xQVIno_0sKq6BWzcKgjMfGgHlP8v_LROSFDosJfKjEPV28U6SSROkrLto5Ghg0_mOMpZySylS8u2M0NvaEWrtVSccypwidj9LhYIwrxYG-EvU/s640/treemix.png" width="640" /></a></div>
The actual tree is:<br />
<br />
<br />
((West_African:0.00448794,(East_African:0.00506576,(((((East_Asian:0.0173284,Siberian:0.00732773):0.0027852,(Amerindian:0.026174,Arctic:0.0118342):0.00742092):0.0114738,Australasian:0.0488974):0.00266559,South_Asian:0.00734044):0.008089,(Southwest_Asian:0.00541405,((West_Asian:0.00620657,North_European:0.00657599):0.00311587,Mediterranean:0.00798949):0.00650328):0.0118925):0.0299627):0.00597674):0.00671186,Palaeo_African:0.0215931);<br />
0.0640319 NA NA NA Palaeo_African:0.0215931 Australasian:0.0488974<br />
0.270468 NA NA NA Australasian:0.0488974 East_Asian:0.0173284<br />
0.185213 NA NA NA South_Asian:0.00734044 ((West_Asian:0.00620657,North_European:0.00657599):0.00311587,Mediterranean:0.00798949):0.00650328<br />
0.129883 NA NA NA North_European:0.00657599 Amerindian:0.026174<br />
0.138757 NA NA NA Arctic:0.0118342 (West_Asian:0.00620657,North_European:0.00657599):0.00311587<br />
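<br />
The text above can be read programmatically. The Python sketch below lists the components in the tree and the two ends of each migration edge. It assumes that the first column of an edge line is the migration weight and that the last two columns are the source and destination subtrees, in that order; the function names are hypothetical.<br />

```python
import re

# The tree line from above, split only for readability.
TREE = (
    "((West_African:0.00448794,(East_African:0.00506576,"
    "(((((East_Asian:0.0173284,Siberian:0.00732773):0.0027852,"
    "(Amerindian:0.026174,Arctic:0.0118342):0.00742092):0.0114738,"
    "Australasian:0.0488974):0.00266559,South_Asian:0.00734044):0.008089,"
    "(Southwest_Asian:0.00541405,((West_Asian:0.00620657,"
    "North_European:0.00657599):0.00311587,Mediterranean:0.00798949)"
    ":0.00650328):0.0118925):0.0299627):0.00597674):0.00671186,"
    "Palaeo_African:0.0215931);"
)


def leaf_names(newick):
    """Leaf labels: a name that follows '(' or ',' and precedes ':'."""
    return re.findall(r"[(,]([A-Za-z_]+):", newick)


def parse_migration(line):
    """Split an edge line into (weight, source leaves, destination leaves)."""
    fields = line.split()
    weight = float(fields[0])
    # Prefix a comma so that a bare leaf such as 'Arctic:0.01' also matches.
    source = leaf_names("," + fields[4])
    destination = leaf_names("," + fields[5])
    return weight, source, destination


print(leaf_names(TREE))
print(parse_migration(
    "0.270468 NA NA NA Australasian:0.0488974 East_Asian:0.0173284"
))
```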
<div>
<br /></div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com6tag:blogger.com,1999:blog-6533996127304587865.post-56531516798105436442012-10-29T19:58:00.001+02:002012-10-29T19:58:43.566+02:00'globe13' calculatorThe globe13 calculator is based on the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedGR2ZWRoQ0VaWTc0dlV1cHh4ZUNJRUE&pli=1#gid=24">K=13</a> analysis. It includes the following components:<br />
<br />
<br />
<ul>
<li>Siberian</li>
<li>Amerindian</li>
<li>West_African</li>
<li>Palaeo_African</li>
<li>Southwest_Asian</li>
<li>East_Asian</li>
<li>Mediterranean</li>
<li>Australasian</li>
<li>Arctic</li>
<li>West_Asian</li>
<li>North_European</li>
<li>South_Asian</li>
<li>East_African</li>
</ul>
<br />
<div>
Fst divergences between ancestral components can be found <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedGR2ZWRoQ0VaWTc0dlV1cHh4ZUNJRUE&pli=1#gid=26">here</a>.</div>
<div>
<br /></div>
<div>
<div>
You need to extract the contents of the <a href="https://docs.google.com/open?id=0B7JDEoCgzRKeNV9VaW9KOU43NW8">RAR file</a> to the working directory of <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a>. You use it by following exactly the instructions of the DIYDodecad README, but always type 'globe13' instead of 'dv3' in these instructions. You can consult the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedGR2ZWRoQ0VaWTc0dlV1cHh4ZUNJRUE&pli=1#gid=24">spreadsheet</a> for proportions of the 13 components in different world populations.</div>
<div>
<br /></div>
<div>
<b>Terms of use:</b> 'globe13', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.</div>
</div>
Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com2tag:blogger.com,1999:blog-6533996127304587865.post-10581951766420073112012-10-23T15:09:00.001+03:002012-10-23T15:09:05.103+03:00'globe10' calculatorAs part of the on-going analysis of the world dataset, I am releasing the 'globe10' calculator, which is based on the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedGR2ZWRoQ0VaWTc0dlV1cHh4ZUNJRUE#gid=17">K=10</a> analysis. This calculator includes the following ancestral components:<br />
<ul>
<li>Amerindian</li>
<li>West_Asian</li>
<li>Australasian</li>
<li>Palaeo_African</li>
<li>Neo_African</li>
<li>Siberian</li>
<li>Southern</li>
<li>East_Asian</li>
<li>Atlantic_Baltic</li>
<li>South_Asian</li>
</ul>
<div>
The names may be the same as the ones from previous calculators released by the Project, but you should always consult the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedGR2ZWRoQ0VaWTc0dlV1cHh4ZUNJRUE#gid=17">spreadsheet</a> to see how they might differ. In this case, inclusion of Amerindian, Australasian populations, African hunter-gatherers, dealing with the <a href="http://dienekes.blogspot.com/2012/10/relatives-in-admixture.html">Paniya</a> issue, and inclusion of data of <a href="http://dienekes.blogspot.com/2012/09/complex-origins-and-natural-selection.html">Schlebusch et al. (2012)</a>, and <a href="http://dienekes.blogspot.com/2012/06/ethiopian-origins-pagani-et-al-2012.html">Pagani et al. (2012)</a>, have all combined to change components in subtle ways, although their modalities remain largely unchanged, and hence so do the names.<br />
<br /></div>
You need to extract the contents of the <a href="https://docs.google.com/open?id=0B7JDEoCgzRKeSXY2ZnpVUkd6cTg">RAR file</a> to the working directory of <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a>. You use it by following exactly the instructions of the DIYDodecad README, but always type 'globe10' instead of 'dv3' in these instructions. You can consult the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedGR2ZWRoQ0VaWTc0dlV1cHh4ZUNJRUE#gid=17">spreadsheet</a> for proportions of the 10 components in different world populations.<br />
<br />
<b>Terms of use:</b> 'globe10', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com7tag:blogger.com,1999:blog-6533996127304587865.post-67865229568934251842012-10-19T21:50:00.001+03:002012-10-19T21:50:09.763+03:00'globe4' calculator<a href="http://dienekes.blogspot.com/2012/09/estimating-admixture-proportions-and.html">Patterson et al. (2012)</a> recently published evidence for admixture in northern Europeans between a population resembling modern Sardinians (and the Neolithic Tyrolean Iceman, whose genome was published earlier this year), and, surprisingly, Native Americans. The authors attribute the Amerindian-like ancestry element to a North Eurasian population that spawned Native Americans, and which also contributed ancestry to northern Europeans. They propose two possibilities for the origin of this admixture: (i) the Mesolithic Europeans resembled Amerindians, or (ii) there was an influx of Amerindian-like populations from the east during late prehistory. A palimpsest of these two processes may explain parts of the observed signal of admixture.<br />
<br />
In a recent K=4 <a href="http://dienekes.blogspot.com/2012/10/admixture-tracks-amerindian-like.html">admixture experiment</a>, I demonstrated that ADMIXTURE software produces an Amerindian ancestral component that closely tracks the signal of admixture using the D-statistic test. I have decided to make this test available for download and use with <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a>.<br />
<br />
The test has four ancestral populations:<br />
<ul>
<li>European</li>
<li>Asian</li>
<li>African</li>
<li>Amerindian</li>
</ul>
It is important to remember that some of these components track different aspects of ancestry that is better resolved at higher resolution. There are also populations that "don't fit well" in this 4-partite scheme (e.g., certain African or Australasian populations).<br />
<br />
For example, the Amerindian component of this test may indicate (i) real recent Native American ancestry, (ii) East Eurasian ancestry found in Siberia and East Asia, (iii) the common signal of admixture differentiating most European groups from Sardinians and Near Eastern Caucasoid groups. Similarly, the Asian component may indicate Australasian, South Asian, or East Eurasian ancestry. And, the European component tracks the ancestry of individuals from West Eurasia in general, although it reaches its maximum in Sardinians.<br />
<br />
This test may, however, be useful to Old World individuals who want to get an idea about the signal of admixture discovered by Patterson <i>et al.</i>,<i> </i>so I decided to make it available. For individuals who don't suspect recent Amerindian or Siberian/East Asian ancestry, and who don't belong to populations with recent such ancestry, the Amerindian component will most likely represent the aforementioned signal.<br />
<br />
You need to extract the contents of the <a href="https://docs.google.com/open?id=0B7JDEoCgzRKeY0hPZW5yWlAyUEE">RAR file</a> to the working directory of <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a>. You use it by following exactly the instructions of the DIYDodecad README, but always type 'globe4' instead of 'dv3' in these instructions. You can consult the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedGR2ZWRoQ0VaWTc0dlV1cHh4ZUNJRUE#gid=1">spreadsheet</a> for proportions of the 4 components in different world populations.<br />
<br />
<b>Terms of use:</b> 'globe4', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.
Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com17tag:blogger.com,1999:blog-6533996127304587865.post-24898985077600426212012-10-13T18:44:00.000+03:002012-10-13T18:44:04.362+03:00Geno 2.0 data requestIf anyone has received results from the Geno 2.0 test of the <a href="https://genographic.nationalgeographic.com/">Genographic Project</a> and wants to share the data with me, feel free to send it to dodecad@gmail.com. I will not distribute it or share it with anyone. I want to see which SNPs are tested, what format the data is in, and how it intersects with other available datasets. This way, I can update my DIYDodecad software so that Geno 2.0 testees can use the various calculators released by the project to get an alternative ancestry assessment.<br />
<br />
In time, and if there is interest, I may release additional calculators that make use of the particular SNP set tested by Geno 2.0.Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com3tag:blogger.com,1999:blog-6533996127304587865.post-28227651525050028882012-08-12T23:34:00.004+03:002012-08-12T23:34:57.576+03:00fastIBD analysis of Africans and African AmericansIndividuals from the following populations have been included in this analysis:
<br />
<blockquote>
African_American_D Somali_D Moroccan_D
Algerian_D North_African_Jews_D Tunisian_D
East_African_Various_D Yoruba_D Sudan_D
Egyptian_D Chad_D
</blockquote>
These were analyzed in the context of a large set of African populations. CEU European Americans were also added to account for the European admixture present in some African American individuals.<br />
This is the first time I have included African American Dodecad participants in this type of analysis.<br />
<br />
A few quick points:<br />
<ul>
<li>fastIBD was run with default parameters over a dataset of 679 individuals/255020 SNPs</li>
<li>fastIBD identifies segments of relatively recent origin that are shared by individuals. These results should not be construed as measures of overall genetic similarity or origins. Rather, they suggest which populations have exchanged genes in the relatively recent past.</li>
</ul>
With that said, you can get:<br />
<ul>
<li>Spreadsheet of numeric <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadDNERzRyYk8wSS1ldkNmTi1yVG1GZnc">results</a>, showing median sharing (in centi-Morgans, cM)</li>
<li>Population-level graphical <a href="https://docs.google.com/open?id=0B7AJcY18g2GaZjdLR0cxUnM2eVk">results</a>, showing an ordering of other populations based on median IBD sharing.</li>
</ul>
<br />
IBD sharing was assessed only for populations with 5+ individuals.<br />
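The summary step described above (median sharing per pair of populations, restricted to populations with 5+ individuals) can be sketched in Python as follows. This is an illustration, not the script used for the spreadsheet: the function name and the input layout (a dictionary of per-pair totals in cM) are assumptions, and the caller is expected to supply every pair of individuals, including pairs that share 0 cM, since leaving them out would inflate the medians.<br />

```python
from collections import Counter, defaultdict
from statistics import median


def median_sharing(population_of, pair_cm, min_size=5):
    """Median IBD sharing (cM) for each pair of sufficiently large populations.

    population_of: {individual: population label}
    pair_cm:       {(individual_a, individual_b): total shared cM}
    """
    sizes = Counter(population_of.values())
    kept = {pop for pop, n in sizes.items() if n >= min_size}

    by_populations = defaultdict(list)
    for (a, b), cm in pair_cm.items():
        pop_a, pop_b = population_of[a], population_of[b]
        if pop_a in kept and pop_b in kept:
            # Sort the labels so (A, B) and (B, A) land in the same bucket.
            by_populations[tuple(sorted((pop_a, pop_b)))].append(cm)

    return {pops: median(values) for pops, values in by_populations.items()}
```

The key (X, X) holds within-population sharing, which is what stands out for groups such as the Mozabites in the plots further down.<br />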
<div>
<br /></div>
<div>
The following heat map allows for a quick appraisal of populations with an excess of IBD sharing (read row-by-row). The grouping of populations by language group and/or region is clearly manifested. There are some interesting details that jump off the screen (but do consult the spreadsheet for details). For example, notice that: </div>
<div>
<ul>
<li>within the Bantu group (Bantu_NE, LWK/Luhya, and Bantu_S), only the South Bantu have an excess of IBD sharing with San.</li>
<li>Of the North Africans, Egyptians show an excess of IBD sharing with Tigray</li>
<li>Of the Ethiopians/East Africans, it is the Omotic-speaking Wolayta who seem to share IBD especially with the Ari people, who are also Ethiopian Omotic speakers.</li>
</ul>
</div>
<div>
<br /></div>
<div>
<br /></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwTRT0BE50ZcdG6ZDkLn5HYmzf_JDPpTkv9BS12GUCm2o0g5wBWnS9n6kmtapetQSMT60eUj2Gm9aZr9k_XC2Cg5OSK_ilVGvhVB_1Gvwl_6YEp0Gkj-LETsur5W1A2TnzDjqmo0uSi3U/s1600/heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwTRT0BE50ZcdG6ZDkLn5HYmzf_JDPpTkv9BS12GUCm2o0g5wBWnS9n6kmtapetQSMT60eUj2Gm9aZr9k_XC2Cg5OSK_ilVGvhVB_1Gvwl_6YEp0Gkj-LETsur5W1A2TnzDjqmo0uSi3U/s640/heatmap.png" width="640" /></a></div>
Some visualizations (see graphical results above for full set):<br />
<br />
Mozabites showing a high degree of within-population IBD sharing, and secondarily with other NW African groups.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk0OS7TQ1ZVOlyjgN-YjSArKIWqNNI8pdxM4KltzYNKK_UWrzURFWRZHOfZDOhBt4s42359Pt6BBen70YWFSpFj0pHXOw5E23tK5PTs40CRcv5R1taC-rUYflQXP4d_vgn3Pc4eiCaHM8/s1600/Mozabite.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk0OS7TQ1ZVOlyjgN-YjSArKIWqNNI8pdxM4KltzYNKK_UWrzURFWRZHOfZDOhBt4s42359Pt6BBen70YWFSpFj0pHXOw5E23tK5PTs40CRcv5R1taC-rUYflQXP4d_vgn3Pc4eiCaHM8/s640/Mozabite.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
The Dodecad Project Somali sample shows a high degree of sharing within itself and also with the Pagani et al. Somali and Ethiopian Somali samples, and then with various other East African groups.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2FrPj8dUPBTnJlZqEtFa8n618SGtsgE5-p4ZPrOW8e_zt7C0TONmjCBQ_PA4PZDsdTeC7WW8qKBQw1wtHciiJZ5qfor0Dqjh1JVKRHKJztj8r2NWCcEhxAhjGKF_XRNxcpm6O1dNgLU8/s1600/Somali_D.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2FrPj8dUPBTnJlZqEtFa8n618SGtsgE5-p4ZPrOW8e_zt7C0TONmjCBQ_PA4PZDsdTeC7WW8qKBQw1wtHciiJZ5qfor0Dqjh1JVKRHKJztj8r2NWCcEhxAhjGKF_XRNxcpm6O1dNgLU8/s640/Somali_D.png" width="640" /></a></div>
Sources of data are listed at the bottom left of this blog.Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com1tag:blogger.com,1999:blog-6533996127304587865.post-49193785131100618222012-08-11T14:00:00.000+03:002012-08-11T14:08:02.911+03:00On the so-called "Calculator Effect"<div>
The genome blogger Polako recently announced a <a href="http://bga101.blogspot.com/2012/05/beware-calculator-effect.html" rel="nofollow">calculator effect</a> (May 2012) affecting admixture estimates:</div>
<blockquote>
However, many people are getting skewed results, despite doing everything right. For instance, users from the UK often come out much more continental European than they should. Some of them actually believe that this is because they're genetically more Norman or Saxon than the average Brit. Nope, the real reason is what I call the "calculator effect". <b>This is when the algorithm produces different results for people who are part of the original ADMIXTURE runs that set up the allele frequencies used by the calculators, than those who aren't, even though both sets of users are of exactly the same origin, and should expect basically identical results.</b></blockquote>
<div>
This, however, was <a href="http://dienekes.blogspot.com/2011/10/further-caution-on-admixture-estimates.html">described by me</a> many months prior, in November 2011, following up on observations made during my first analysis of <a href="http://dodecad.blogspot.com/2011/09/yunusbayev-et-al-2011-data-assessed.html">Yunusbayev et al.</a> Armenians in September 2011. It has been listed in the Technical Stuff at the bottom of this blog ever since.<br />
<br />
I had observed at the time that the newly available Yunusbayev et al. Armenian sample appeared more "European" using the Dodecad v3 calculator tool, which had been built using the Project Armenians (Armenian_D) as well as the Armenian sample of Behar et al.<br />
<br />
I then explained why this was happening, and released new versions of the Dodecad tools, such as K12a, and K12b, and more recently <a href="http://dodecad.blogspot.com/2012/06/k10a-calculator.html">K10a</a> as new scientific and project participant samples became available.</div>
<div>
<br /></div>
<div>
Polako also proposes a "solution" to the problem:</div>
<div>
<blockquote class="tr_bq">
<b>I actually designed my Eurogenes ancestry tests for Gedmatch with this problem in mind, by only using academic references to source the allele frequencies. </b>This means that test results for Eurogenes project members and non-members are directly comparable. Perhaps other genome bloggers can eventually do the same?</blockquote>
</div>
<div>
<b>The only effect of this "solution" is to ensure that there is a "calculator effect" for <i>everyone</i> using his tools.</b> For example, if he uses only published Finns and Lithuanians to build his calculator, then <i>every</i> Finn and Lithuanian who takes his test will wonder why he is "different" from the published Finns and Lithuanians, because they will <i>all</i> suffer a "calculator effect" with respect to the reference populations. <b>So, perhaps they will all be on equal footing with respect to each other, but their results will <i>all </i>be biased because of the issue I had identified.</b></div>
<div>
<br />
Moreover, <b>their results will never improve as more people join his Project</b>, because these new people will not be included in newer versions of calculators: all users of DIY Eurogenes tools will continue to receive sub-par results. Well, small consolation, at least they'll all receive comparable sub-par results.<br />
<br /></div>
<div>
The solution to this problem was also described in my original post, and it's not an unimaginative quick fix of biasing everyone's results with respect to the reference populations:</div>
<div>
<blockquote class="tr_bq">
What can we do to solve this problem? <b>Sample, sample, sample. </b>There is no shortcut. The gross details of the genetic landscape (such as the relationship between major continental groups) are easy to infer, but the details will always have room for improvement.</blockquote>
</div>
<div>
<b>It is only by adequate sampling, that is by <i>including</i> more and more people, rather than <i>excluding </i>even the ones we have, that ever more accurate admixture estimators can be devised.</b> As sample sizes grow (= more scientists publish their data, and more people join projects such as this one), allele frequencies of the different components will become ever more secure, and deviations of individuals who did not contribute to the inference of the genetic components will converge to zero.<br />
<br />
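One ingredient of this convergence can be shown with simple arithmetic. The Python sketch below is not the ADMIXTURE model and not a measurement of any real calculator; it only shows how much a single diploid individual moves a sample allele frequency when added to a reference sample of a given size. The function name and the numbers are made up for illustration.<br />

```python
def frequency_shift(n_others, alt_alleles_others, own_dosage):
    """Change in a sample allele frequency when one diploid individual is added.

    n_others:           reference individuals already in the sample
    alt_alleles_others: count of the alternate allele among them
    own_dosage:         0, 1 or 2 copies carried by the added individual
    """
    without = alt_alleles_others / (2 * n_others)
    with_individual = (alt_alleles_others + own_dosage) / (2 * (n_others + 1))
    return with_individual - without


# A homozygous individual joining samples in which the allele is at 50%:
for n in (9, 99, 999):
    print(n, frequency_shift(n, n, 2))
```

The shift equals (own_dosage / 2 - frequency) / (n_others + 1), so each tenfold increase in sample size cuts a single participant's pull on the estimate roughly tenfold.<br />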
I am already quite confident that inclusion biases amount to only a few percent for Dodecad Project tools and only for the closely related components (e.g., West Asian vs. North European); as mentioned in my original post, these biases are trivial for more distantly related components (e.g., European vs. East Asian).<br />
<br />
<b>And, the way to further reduce biases that do persist is to foster participation, rather than consign everyone to a sort of fossilized mediocrity</b>, excluding whole populations of active direct-to-consumer customers (e.g., Norwegians, or Assyrians, or Iraqis, or Germans, or Koreans, or, ...) on the basis that no "academic reference" has made dense genotype data on them freely and publicly accessible.</div>Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com12tag:blogger.com,1999:blog-6533996127304587865.post-79370625357179682402012-08-10T22:39:00.000+03:002012-08-10T22:39:58.720+03:00fastIBD analysis of East/Central Eurasians and select West Eurasians<br />
Individuals from the following populations have been included in this analysis:<br />
<blockquote class="tr_bq">
Philippines_D Turkish_D Iranian_D Russian_D Finnish_D Turkish_Cypriot_D Ukrainian_D Belorussian_D Chinese_D Korean_D Japanese_D Tatar_Various_D Kazakh_D Szekler_D Hungarian_D Estonian_D Azeri_D Udmurt_D Mixed_Turkic_D </blockquote>
<div>
These were analyzed in the context of a complete set of Central/East Eurasian populations; West Eurasian populations included were mostly Uralic and Turkic speaking groups, and a few others (such as East Slavs or Iranians).</div>
<div>
<br /></div>
<div>
A few quick points:</div>
<div>
<ul>
<li>fastIBD was run with default parameters over a dataset of 627 individuals/255020 SNPs</li>
<li>fastIBD identifies segments of relatively recent origin that are shared by individuals. These results should not be construed as measures of overall genetic similarity or origins. Rather, they suggest which populations have exchanged genes in the relatively recent past.</li>
</ul>
<div>
With that said, you can get:</div>
<div>
<ul>
<li>Spreadsheet of numeric <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadDZtMllScUMzblBkYi1DYnhEVUFvSEE">results</a>, showing sharing (in centi-Morgans, cM)</li>
<li>Population-level graphical <a href="https://docs.google.com/open?id=0B7AJcY18g2GaX3RGS3RCdXJNYms">results</a>, showing an ordering of other populations based on mean IBD sharing.</li>
</ul>
IBD sharing was assessed only for populations with 5+ individuals.</div>
</div>
<div>
<br /></div>
<div>
<div>
The following heat map allows for a quick appraisal of populations with an excess of IBD sharing (read row-by-row).</div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8rtSaHt6FN0VNjBSY3I4bCdF2weLPqEwW6IsbUGaMfXwE-eFyGeqw9rXchXI7mzwsW99m0IQFZEHupxL1Vb2PNjIzWTTJVcWxwjggHAUnNUo64UntFIAVBe-aaldbyg3CbCnSr3dS4Tc/s1600/heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8rtSaHt6FN0VNjBSY3I4bCdF2weLPqEwW6IsbUGaMfXwE-eFyGeqw9rXchXI7mzwsW99m0IQFZEHupxL1Vb2PNjIzWTTJVcWxwjggHAUnNUo64UntFIAVBe-aaldbyg3CbCnSr3dS4Tc/s640/heatmap.png" width="640" /></a></div>
<div>
And, a few visualizations of mean IBD sharing:</div>
<div>
<br /></div>
<div>
Notice high levels of within-population IBD sharing for Finns, consistent with a population that experienced expansion from a small number of founders (small ancestral population size).</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjox313SfCMD9gaIBi8asyPHalqOVMU2u80Bk-5_Nfup6mr8BhYudXs6RX9YR1vgQ6z-AQS9Ze9giDIJ10M2H6argTf8lN1swumtgPNPYXuD1pJMbmBUxHYgCLY0ogtyeQSern_vcWCuv4/s1600/FIN30.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjox313SfCMD9gaIBi8asyPHalqOVMU2u80Bk-5_Nfup6mr8BhYudXs6RX9YR1vgQ6z-AQS9Ze9giDIJ10M2H6argTf8lN1swumtgPNPYXuD1pJMbmBUxHYgCLY0ogtyeQSern_vcWCuv4/s640/FIN30.png" width="640" /></a></div>
<div>
Compare with Turks, who are a much more diverse population.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiMKAN1whKnltFvxZHW-4NIIc5Ktl50Sz5FrBRIAYHYuCwymkjnlKx6mkVIY_CN2p-xudU2C1Jt4anXS4x-FtjEjq0ZnlV1DJ03tV45tdQRmxjRHrR9M2j-5F1bpMpIWCJLhBGOSXLsQI/s1600/Turkish_D.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiMKAN1whKnltFvxZHW-4NIIc5Ktl50Sz5FrBRIAYHYuCwymkjnlKx6mkVIY_CN2p-xudU2C1Jt4anXS4x-FtjEjq0ZnlV1DJ03tV45tdQRmxjRHrR9M2j-5F1bpMpIWCJLhBGOSXLsQI/s640/Turkish_D.png" width="640" /></a></div>
<div>
These two plots (you can check the spreadsheet for exact numbers) indicate different sources for the East Eurasian element in Turks and Finns. </div>
<div>
<br /></div>
<div>
The top eastern populations for Turks are Turkmen, Chuvash, Uzbek, and Uygur, all of which are Turkic speakers, followed by Hazara, Yukagir, and Selkup. For Finns, there is a high degree of sharing with various Siberian groups speaking different languages, including the Uralic-speaking Selkups (16.4cM) and Nganassan (9.6cM). Turks share less with these Uralic speakers (6.4 and 2.8cM respectively). These are strong hints of shared ancestry within each of the Turkic and Uralic language families.</div>
<div>
<br /></div>
<div>
The Chuvash population is also quite interesting, as it shares more with Selkup and Nganassan than other Turkic speakers do. This makes excellent sense, and is in agreement with other recent <a href="http://dienekes.blogspot.com/2012/01/aapa-2012-abstracts-part-1.html">findings</a>:</div>
<blockquote class="tr_bq">
Results from this study maintain that the Chuvash are not related to Altaic or Mongolian populations along their maternal line, thus supporting the “Elite” hypothesis that their language was imposed by a conquering group —leaving Chuvash mtDNA largely of Eurasian origin. <b>Their maternal markers appear to most closely resemble Finno-Ugric speakers rather than Turkic speakers.</b></blockquote>
Sources of data are listed at the bottom left of this blog.<br />Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com8tag:blogger.com,1999:blog-6533996127304587865.post-38365306774021573812012-08-08T21:17:00.004+03:002012-08-08T21:17:48.685+03:00fastIBD analysis of Jewish and some non-Jewish populationsIt can be found <a href="http://dienekes.blogspot.com/2012/08/fastibd-analysis-of-several-jewish-and.html">here</a>.Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com0tag:blogger.com,1999:blog-6533996127304587865.post-15829499106686471292012-06-24T14:30:00.002+03:002012-06-24T14:30:39.692+03:00Clusters Galore analysis of East African participantsSee <a href="http://dienekes.blogspot.com/2012/06/clusters-galore-analysis-of-east.html">here</a>.Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com0tag:blogger.com,1999:blog-6533996127304587865.post-73381924521350230542012-06-12T14:35:00.003+03:002012-06-12T14:36:47.844+03:00'K10a' calculatorThe 'K10a' calculator represents an intermediate stage between the <a href="http://dodecad.blogspot.com/2012/01/k12b-and-k7b-calculators.html">K7 and K12</a> analyses released so far from the Project. The following components have been inferred:<br />
<ul>
<li>Palaeoafrican </li>
<li>South_Asian </li>
<li>West_Asian </li>
<li>Southeast_Asian </li>
<li>Sub_Saharan </li>
<li>Atlantic_Baltic</li>
<li>Red_Sea </li>
<li>East_Asian </li>
<li>Mediterranean </li>
<li>Siberian </li>
</ul>
There are a couple of points of interest. <b>First</b>, the Red_Sea component relates Arabians to East Africans. At a higher level of resolution the "Southwest_Asian" and "East_African" (K12) components emerge. The "Red_Sea" component is not very closely related to any other components, but is somewhat related to the "Mediterranean" and "Atlantic_Baltic" components.<br />
<div>
<br /></div>
<div>
So, using the different calculators of the Dodecad Project, we first have (K7) a contrast between Africa and West Eurasia, then a signal of the shared ancestry between Arabia and East Africa (K10), and finally, strong signals of local ancestry in the two regions.</div>
<div>
<br /></div>
<div>
<b>Second</b>, the Mediterranean component here is modal in Sardinians as usual, but also projects into North Africa. Again, this is intermediate between K7, which shows a predominance of West Eurasian ancestry in North Africa plus an African component, and K12, in which there are "Atlantic_Med" and "Northwest_African" regional components.</div>
<div>
<br /></div>
<div>
These are strong hints that the West Eurasian element in Africa differs between NW and E Africa. In the former region, it is most related to Sardinians, and in the latter it is most related to Arabians. Of course, ultimately the two elements are related to each other.</div>
<div>
<br /></div>
<div>
Table of Fst distances between components:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQwN2R2D2FaPP6u9fQgLVsECDdAl5RgPPq9shAd669zVEJ8WRFLTKMoT7WCZxV49ZaMvZLCoIKgF8sPT4MPKUHQzLHCuwv2vn8Bd4Y3BI69zHU1xrQOkAbwy_fq5aelO5priMMjQqdONM/s1600/fst.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="140" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQwN2R2D2FaPP6u9fQgLVsECDdAl5RgPPq9shAd669zVEJ8WRFLTKMoT7WCZxV49ZaMvZLCoIKgF8sPT4MPKUHQzLHCuwv2vn8Bd4Y3BI69zHU1xrQOkAbwy_fq5aelO5priMMjQqdONM/s640/fst.png" width="640" /></a></div>
<div>
<br /></div>
<div>
MDS plots of the first few dimensions:</div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYqRzvzUzQ16uSgltVReIRpVYhpFnxpq1yxt-j75H2BtqG1uuW-j43ZAjHYiASlFr1kREoi5ThxQOEgaZDmI52INrvtyMQCRov-8_v9Oe4TronVH0cG8wSal8DhBBxrNPLtEid_rdEH_4/s1600/1_2.png" imageanchor="1"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYqRzvzUzQ16uSgltVReIRpVYhpFnxpq1yxt-j75H2BtqG1uuW-j43ZAjHYiASlFr1kREoi5ThxQOEgaZDmI52INrvtyMQCRov-8_v9Oe4TronVH0cG8wSal8DhBBxrNPLtEid_rdEH_4/s200/1_2.png" width="200" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9z7p0g2zKirGDCa3pLti7QPnz4xx3L-Ub1eec4T01tzWfUkcQiUUcWtZuXy2UfbYFFOeHW-YzygVUa58Ckq5U0uAroABOpgMaW7e0yPhmBAduS8uP5HhfKX-lfSslLw3gEREyt3Tl5TU/s1600/3_4.png" imageanchor="1"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9z7p0g2zKirGDCa3pLti7QPnz4xx3L-Ub1eec4T01tzWfUkcQiUUcWtZuXy2UfbYFFOeHW-YzygVUa58Ckq5U0uAroABOpgMaW7e0yPhmBAduS8uP5HhfKX-lfSslLw3gEREyt3Tl5TU/s200/3_4.png" width="200" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg07URwvzBVreMfP5RLCmrr_lFCAqa4wMnzTOdRd9iWT83tIPisoejxxqn7DWfDEEa4anVSLIhA78a8F2mZRwTA6U1gikIH_5yBnPOFqsM5U2DcyArqT2vtd5QVxrSH7XUuoHVbrkbPl_8/s1600/5_6.png" imageanchor="1"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg07URwvzBVreMfP5RLCmrr_lFCAqa4wMnzTOdRd9iWT83tIPisoejxxqn7DWfDEEa4anVSLIhA78a8F2mZRwTA6U1gikIH_5yBnPOFqsM5U2DcyArqT2vtd5QVxrSH7XUuoHVbrkbPl_8/s200/5_6.png" width="200" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhelhR2BZHr8I2xfm2w6NG5YtFrHud2HFQORwwUwvIReg8iDiqMB6mK3e32VWQDqIE6voT_ECz3MM2ilAi_rBPN5TQXQN1f8qSPAS8gdWRyWLky1i-7dgzDyMBhacaa5FUfETlj8-e0X2g/s1600/7_8.png" imageanchor="1"><img border="0" height="200" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhelhR2BZHr8I2xfm2w6NG5YtFrHud2HFQORwwUwvIReg8iDiqMB6mK3e32VWQDqIE6voT_ECz3MM2ilAi_rBPN5TQXQN1f8qSPAS8gdWRyWLky1i-7dgzDyMBhacaa5FUfETlj8-e0X2g/s200/7_8.png" width="200" /></a><br />
<br />
<b>Downloads: </b><br />
<ul>
<li>Population <a href="https://docs.google.com/open?id=0B7AJcY18g2Gab1ZFN2ZPV29DUTQ">portraits</a> </li>
<li>Population and Individual results <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadC1kRjhxcHNfSGhPYlUxbEI0VVZPR0E">spreadsheet</a> </li>
<li><a href="https://docs.google.com/open?id=0B7AJcY18g2Gab05oMjJlQmpjb0U">Files</a> for <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a></li>
</ul>
<div>
Project participants can find their results in the spreadsheet. Non-participants can use DIYDodecad to calculate their results, but they should place all the calculator files in the same directory as the DIYDodecad software, and replace 'dv3' with 'K10a' in all the instructions of the README file.
<br />
<br />
Component labels are indicative, and you should compare your results against the normalized median results for different populations included in the spreadsheet.
<br />
<br />
<b>Terms of Use
</b>
<br />
<br />
You are free to use 'K10a', including all downloaded files, for any non-commercial purpose, as long as you attribute them to the Dodecad Project and to Dienekes Pontikos as follows:
<br />
<br />
The 'K10a' admixture calculator is courtesy of <a href="http://dienekes.blogspot.com/">Dienekes Pontikos</a> and was developed as part of the <a href="http://dodecad.blogspot.com/">Dodecad Ancestry Project</a>; more information <a href="http://dodecad.blogspot.com/2012/06/k10a-calculator.html">here</a>.
</div>
<div>
<br /></div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com7tag:blogger.com,1999:blog-6533996127304587865.post-81779544054856532142012-06-09T00:16:00.003+03:002012-06-09T20:47:35.398+03:00'weac2' calculator<div dir="ltr" style="text-align: left;" trbidi="on">
I have made a new version of the '<a href="http://dodecad.blogspot.gr/2011/09/weac-calculator.html">weac</a>' calculator (West Eurasian cline). This is based on a large Old World dataset at K=7 and includes the following ancestral components:<br />
<ul>
<li>Palaeoafrican </li>
<li>Atlantic_Baltic </li>
<li>Northeast_Asian </li>
<li>Near_East </li>
<li>Sub_Saharan </li>
<li>South_Asian </li>
<li>Southeast_Asian </li>
</ul>
<br />
<div>
The West Eurasian cline is formed between the Near_East and Atlantic_Baltic components.</div>
<div>
<br /></div>
<div>
Here is the table of Fst distances between components:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVWj9R4feUYlraWPTjlHx-xZ6c-BJnEw9Jxs5fNQTztx-j-JoAJRX6fTMF-b2aIB4ByE1-I0VbsEKIw_lB7DtRtdVCCEY10epKIVw18PjPQDL083uI95nWWWONcCPJVXeHdK2ZyTBy3ic/s1600/fst.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="142" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVWj9R4feUYlraWPTjlHx-xZ6c-BJnEw9Jxs5fNQTztx-j-JoAJRX6fTMF-b2aIB4ByE1-I0VbsEKIw_lB7DtRtdVCCEY10epKIVw18PjPQDL083uI95nWWWONcCPJVXeHdK2ZyTBy3ic/s640/fst.png" width="640" /></a></div>
<div>
<br /></div>
<div>
MDS plots of the first few dimensions:<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimK6Z3k1zKjwV-kERYZRk5EzKfpE7ILJQsmvhWV5vRNThBKeOmm-5jHkVzilXrNPUR_BtpZvoxy4VqT413fvx_efLW6UqtcoeSVuUqfBzpE_wAkm5b2IMlZPiVUQ3CWuBifx0ByqSalYg/s1600/1_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimK6Z3k1zKjwV-kERYZRk5EzKfpE7ILJQsmvhWV5vRNThBKeOmm-5jHkVzilXrNPUR_BtpZvoxy4VqT413fvx_efLW6UqtcoeSVuUqfBzpE_wAkm5b2IMlZPiVUQ3CWuBifx0ByqSalYg/s200/1_2.png" width="200" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiikt0pLYLvfZt3nGJCmKs9fDOQUCRinycAW22Mlf1mEZn3eNvjbuJlreCwVokXiGpGQiDLP_Op8DHbZEtIUxtZue3jDWzmJkE6lKVqsfu4XkkQ8LbCWZYkuu8jt4H0waz73VJyBtZZG7w/s1600/3_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiikt0pLYLvfZt3nGJCmKs9fDOQUCRinycAW22Mlf1mEZn3eNvjbuJlreCwVokXiGpGQiDLP_Op8DHbZEtIUxtZue3jDWzmJkE6lKVqsfu4XkkQ8LbCWZYkuu8jt4H0waz73VJyBtZZG7w/s200/3_4.png" width="200" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqjHvamilJGdNVBetoPZ-tKieI-ulAqggeX_mKhQh9fAsEwBjPQp3WNvTDbgCo3yZyS18OuEBX3Focc3UFzRIHHO9NJectjraX7_FBuUrArIwJKeb1RHHVaED0JQ6Slm3XdC1jAHzOGe8/s1600/5_6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqjHvamilJGdNVBetoPZ-tKieI-ulAqggeX_mKhQh9fAsEwBjPQp3WNvTDbgCo3yZyS18OuEBX3Focc3UFzRIHHO9NJectjraX7_FBuUrArIwJKeb1RHHVaED0JQ6Slm3XdC1jAHzOGe8/s200/5_6.png" width="200" /></a></div>
<br />
<br />
<div>
<b>Downloads:</b></div>
<div>
<ul>
<li>Population <a href="https://docs.google.com/open?id=0B7AJcY18g2GadklRbnhVaDI5V1E">portraits</a></li>
<li>Population and Individual results <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadFRGaHlLRFBxVzJLNzVOekVEaG1OelE">spreadsheet</a></li>
<li><a href="https://docs.google.com/open?id=0B7AJcY18g2GaZWZYbC1zVEF4RlE">Files</a> for <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a></li>
</ul>
Project participants can find their results in the spreadsheet. Non-participants can use DIYDodecad to calculate their results, but they should place all the calculator files in the same directory as the DIYDodecad software, and replace 'dv3' with 'weac2' in all the instructions of the README file.<br />
<br />
<b>(NOTE: </b>Some IDs may have wrong results in the spreadsheet because of a misalignment of IDs with results; I'll fix this and update this notice. <b>UPDATE: </b>Results should be correct in spreadsheet now - 9 Jun 2012<b>)</b><br />
<b><br /></b><br />
Component labels are indicative, and you should compare your results against the normalized median results for different populations included in the spreadsheet.<br />
<br />
<b>Terms of Use</b></div>
<br />
You are free to use 'weac2', including all downloaded files, for any non-commercial purpose, as long as you attribute them to the Dodecad Project and to Dienekes Pontikos as follows:<br />
<br />
The 'weac2' admixture calculator is courtesy of <a href="http://dienekes.blogspot.com/">Dienekes Pontikos</a> and was developed as part of the <a href="http://dodecad.blogspot.com/">Dodecad Ancestry Project</a>; more information <a href="http://dodecad.blogspot.gr/2012/06/weac2-calculator.html">here</a>.</div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com4tag:blogger.com,1999:blog-6533996127304587865.post-31624403676600728022012-04-27T13:33:00.001+03:002012-04-27T13:33:42.779+03:00Estimating your Gök4-related ancestry<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIwFm-ALLz08yxadLW-zkZM51MzAgAKZJ6DDrezot3IJ2wIPnHLwjg5NspupUPbM_Bo-fWNXyfUd8EF_PDnuzMK6omoFjQpmLiF34i_e2uoNR42ifEgF1EjX8xLmEORQFBbz2mSIJkcG4/s1600/proportions.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="208" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIwFm-ALLz08yxadLW-zkZM51MzAgAKZJ6DDrezot3IJ2wIPnHLwjg5NspupUPbM_Bo-fWNXyfUd8EF_PDnuzMK6omoFjQpmLiF34i_e2uoNR42ifEgF1EjX8xLmEORQFBbz2mSIJkcG4/s320/proportions.png" width="320" /></a></div>
I have taken Table S15 from <a href="http://dienekes.blogspot.com/2012/04/ancient-dna-from-neolithic-sweden.html">Skoglund et al. (2012)</a>, and the <a href="http://dodecad.blogspot.com/">Dodecad Project</a> <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadHZ6SHpiLTNTa3lsUmZJY2pQblVRR2c#gid=0">K7b</a> admixture proportions in order to investigate possible relationships.<br />
<br />
In Table S15 the authors estimate the Neolithic farmer ancestry in several populations on the basis of a single Neolithic individual from the Funnel Beaker (TRB) culture which was found in a megalithic burial in Gökhem parish.<br />
<br />
Most of these populations are already part of the Dodecad Ancestry Project, except the three Swedish samples; given the intermediacy of the Central_Sweden sample, I have decided to use my Swedish_D sample of Project participants as a stand-in for it.<br />
<br />
Below, you can see a scatterplot relating Gök4-related ancestry to the K7b "Southern" component:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJb17L6Ths9HrSqEdVY-4lGhPhMefxiWkg-QyZ-yYDBJbxxFS-jCKg6-Mq8xeaksQtK0q8wpYxWjyLH28Z6bT1xq54TJ9Muno2l29YqbU5Ws30dah_a30VEKPtXr2cUXCFd7KN3jqO17E/s1600/gok4_southern.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="410" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJb17L6Ths9HrSqEdVY-4lGhPhMefxiWkg-QyZ-yYDBJbxxFS-jCKg6-Mq8xeaksQtK0q8wpYxWjyLH28Z6bT1xq54TJ9Muno2l29YqbU5Ws30dah_a30VEKPtXr2cUXCFd7KN3jqO17E/s640/gok4_southern.png" width="640" /></a></div>
<br />
<br />
The correlation between the two variables is very strong (R-squared = 0.93).<br />
<br />
Dodecad Project participants who already have K7b <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadHZ6SHpiLTNTa3lsUmZJY2pQblVRR2c#gid=2">results</a>, as well as customers of DTC testing companies who can use <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a> together with the <a href="http://dodecad.blogspot.com/2012/01/k12b-and-k7b-calculators.html">K7b calculator</a>, can approximately estimate their Gök4-related ancestry by plugging their "Southern" value (in %) into the following equation:<br />
<br />
<div style="text-align: center;">
<b>Gök4-related ancestry = 1.721*Southern+19.736</b></div>
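<br />
The equation above can be applied with a few lines of Python. This is only a minimal sketch: the function name and the example input of 20% are made up for illustration, and the coefficients are simply those of the regression line given above. Since that line was fitted to the populations in the scatterplot, values far outside their range should not be trusted (a "Southern" value above roughly 46.6% would yield more than 100%).<br />

```python
def gok4_related_ancestry(southern_pct: float) -> float:
    """Approximate Gök4-related ancestry (in %) from the K7b "Southern" value (in %).

    Uses the regression line quoted in the post:
        Gök4-related ancestry = 1.721 * Southern + 19.736
    """
    if not 0.0 <= southern_pct <= 100.0:
        raise ValueError("Southern must be a percentage between 0 and 100")
    return 1.721 * southern_pct + 19.736


# Hypothetical example: a K7b "Southern" result of 20%.
estimate = gok4_related_ancestry(20.0)
print(f"Estimated Gök4-related ancestry: {estimate:.1f}%")
```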
<br />
I anticipate that when I am able to study the Neolithic Swedish genomes directly, the Neolithic farmer from Sweden will turn up "Southern" in a K=7 resolution experiment.<br />Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com4tag:blogger.com,1999:blog-6533996127304587865.post-52717984388903871272012-03-11T13:58:00.000+02:002012-03-11T13:58:14.466+02:00ChromoPainter/fineSTRUCTURE analysis of Italy/Balkans/AnatoliaThis was done on the same dataset as the previous <a href="http://dodecad.blogspot.com/2012/03/fastibd-analysis-of-italybalkansanatoli.html">fastIBD analysis</a>.<br />
<br />
The population assignments:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8DZ0pV1I82KLyieX4tL_dG3pL1-D2hulTBThMDvN9l8Z0RJ1JmC395l8vscYk-V8_M6rK6lOAb0qPSXiKJU8rL40QjAYdKrjmMXnuxYEzdZEtsT6-tpkI1CEtSz_dq-pQFxD4Mu64v_E/s1600/mappops.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8DZ0pV1I82KLyieX4tL_dG3pL1-D2hulTBThMDvN9l8Z0RJ1JmC395l8vscYk-V8_M6rK6lOAb0qPSXiKJU8rL40QjAYdKrjmMXnuxYEzdZEtsT6-tpkI1CEtSz_dq-pQFxD4Mu64v_E/s400/mappops.jpg" width="400" /></a></div>
<br />
<br />
The heatmap, showing relationship between inferred populations:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj11v0afD7KLw4kMo-nPb4NtqHL3uLY1LpAVt5HG9S677R5H05ewb7Oqcii1T268LzDUCyxL9z3qm3mdXuEkXmto1biYr1hrugtpdvs1wMPLt6_6WHEr68ZztKwKrQglGhsYAjmdstFeSQ/s1600/heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj11v0afD7KLw4kMo-nPb4NtqHL3uLY1LpAVt5HG9S677R5H05ewb7Oqcii1T268LzDUCyxL9z3qm3mdXuEkXmto1biYr1hrugtpdvs1wMPLt6_6WHEr68ZztKwKrQglGhsYAjmdstFeSQ/s400/heatmap.png" width="400" /></a></div>
<br />
The principal components analysis:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0olGCyr7blSP6ozo_-dESad1NTJD55BXynktMyZK0wh2gQbi74gXY9C-JgaGbJNC4Xe3xLzd3fB4ZDFYNM07riARVTW1xxwb5dOy5MlNC1EJIcDSfp30OYEs08hX5YRowKY1i4zZnbk0/s1600/1_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0olGCyr7blSP6ozo_-dESad1NTJD55BXynktMyZK0wh2gQbi74gXY9C-JgaGbJNC4Xe3xLzd3fB4ZDFYNM07riARVTW1xxwb5dOy5MlNC1EJIcDSfp30OYEs08hX5YRowKY1i4zZnbk0/s400/1_2.png" width="400" /></a></div>
<b><br /></b><br />
The correspondence between inferred populations and <a href="http://dodecad.blogspot.com/2012/01/k12b-and-k7b-calculators.html">K12b</a> components:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5PGbjh1ipjbz3hZ2SdDAvPQGlBuWdmpZugchkdLWX9YrNWXJe44j2bANy2OpP14f6M4Mpg0r3Co0aa61B_llijIV3VWcePSw_2RZXNbo9FE3P2BJYcU3z2-own1IcridNi_TbHiDrBwM/s1600/correspondence.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="128" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5PGbjh1ipjbz3hZ2SdDAvPQGlBuWdmpZugchkdLWX9YrNWXJe44j2bANy2OpP14f6M4Mpg0r3Co0aa61B_llijIV3VWcePSw_2RZXNbo9FE3P2BJYcU3z2-own1IcridNi_TbHiDrBwM/s400/correspondence.jpg" width="400" /></a></div>
<br />
<br />
<b>Results for Project participants can be found in this <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadGFhb01DbmVyZkZpWWhHS1hheWk2Q0E">spreadsheet</a></b>; remember that in the chunkcounts tabs, columns represent donor populations and rows represent recipient populations.Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com2tag:blogger.com,1999:blog-6533996127304587865.post-73407482923944360832012-03-05T20:59:00.000+02:002012-03-05T21:00:35.344+02:00fastIBD analysis of Italy/Balkans/Anatolia<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
I have included the new Turkish data from <a href="http://dienekes.blogspot.com/2012/02/first-look-at-turkish-and-kyrgyz-data.html">Hodoğlugil & Mahley (2012)</a> in this analysis. Additionally, there are now 5 participants in the Serb_D and Turkish_Cypriot_D sub-populations, as well as a Bosnian Muslim. There are now project participants from many Balkan countries, although Albania, the fYROM, and Croatia remain as "black holes" in the map.<br />
<div>
<br /></div>
<div>
Still, I am hopeful that there will be more project participants from currently under-represented populations. I have already started processing the same dataset with ChromoPainter (which takes much longer), and hopefully that analysis will be posted at the end of this week or the beginning of the next one.</div>
<div>
<br /></div>
<div>
First, the <b>heatmap of inter-population IBD</b>:</div>
<div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghUxic4BYSMsbvBOQmrqRYFAHK5f44gc8AJSDX-i__jfN2PdIVIOuHLfvk1CeAAdO6Wf-rG2S9QgoZrSDarLNRL9jfKsUt2ABHRr9VtG9rFcpmlwN2nhP09mf4tIsugAFpbHR2V4VxbPg/s1600/heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghUxic4BYSMsbvBOQmrqRYFAHK5f44gc8AJSDX-i__jfN2PdIVIOuHLfvk1CeAAdO6Wf-rG2S9QgoZrSDarLNRL9jfKsUt2ABHRr9VtG9rFcpmlwN2nhP09mf4tIsugAFpbHR2V4VxbPg/s400/heatmap.png" width="400" /></a></div>
</div>
</div>
Remember that the tree groups similar populations together, and for each row in the matrix, the red end of the spectrum indicates lots of IBD sharing, and the blue end low IBD sharing. Additionally, I have now calculated the <i>median </i>IBD sharing, which is more robust to the presence of potential relatives in the data.<br />
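To see why the median is the safer summary when relatives may be present, here is a small Python sketch. The numbers are invented for illustration and are not taken from the spreadsheet: five ordinary pairwise sharing values plus one very large value of the kind a close relative would produce.<br />

```python
from statistics import mean, median

# Hypothetical IBD sharing (in cM) between one individual and six members
# of a population. The last value mimics a close relative in the data.
sharing_cm = [4.1, 3.8, 5.0, 4.4, 4.7, 62.0]

mean_sharing = mean(sharing_cm)      # pulled upward by the single relative
median_sharing = median(sharing_cm)  # barely affected by the single relative

print(f"mean:   {mean_sharing:.2f} cM")
print(f"median: {median_sharing:.2f} cM")
```

In this toy case the single large value more than triples the mean, while the median stays close to the typical pairwise value.<br />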
<br />
The results appear fairly reasonable, with the Balkan, Anatolian, and Italian populations of the title forming separate branches, and the mainland Greek sample joining with Central/South Italians and Sicilians.<br />
<br />
The <b>Clusters Galore</b> results can be seen below; 28 clusters were inferred using 21 dimensions:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO5ELNJ1svmJ5lEzffTPAq1f1vw43-r0GI4y9BAbYJegPJIsBOo9yN9rWXQHzgCkiTMP_x9aQa1Va-J36-x1XE0DqviLVSAOdym5-vQbYu0RMrqwRffxao8mUJHeo6idz9ye3xnPyEwTo/s1600/mappops.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="305" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO5ELNJ1svmJ5lEzffTPAq1f1vw43-r0GI4y9BAbYJegPJIsBOo9yN9rWXQHzgCkiTMP_x9aQa1Va-J36-x1XE0DqviLVSAOdym5-vQbYu0RMrqwRffxao8mUJHeo6idz9ye3xnPyEwTo/s400/mappops.jpg" width="400" /></a></div>
</div>
<br />
<b>Results for Project participants</b> can be found in the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadGs5RGlVa2RsampWZ2RZa0VKYUJXYkE">spreadsheet</a>, and include the probabilities that each ID is assigned to each of the 28 clusters, as well as the Z-scores comparing each individual against all populations with 5+ individuals. The Z-score should be read as follows: for each row, high values indicate a high degree of IBD sharing, while low values indicate a low degree of IBD sharing.<br />
<br />
Of course, I encourage Project participants to leave a message in the <a href="http://dodecad.blogspot.com/2010/11/information-about-project-samples.html">Information about Project samples</a> thread.</div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com27tag:blogger.com,1999:blog-6533996127304587865.post-62055889215996619302012-02-15T14:59:00.000+02:002012-02-15T15:03:07.121+02:00Correspondence between ChromoPainter clusters and ADMIXTURE components in Balkans/West AsiaI took the 25 different inferred clusters from my recent <a href="http://dodecad.blogspot.com/2012/02/chromopainterfinestructure-analysis-of.html">ChromoPainter analysis</a>, and calculated their normalized median components in terms of the <a href="http://dodecad.blogspot.com/2012/01/k12b-and-k7b-calculators.html">K12b calculator</a>. This is a quite useful exercise, since it can show in what sense clusters are different from each other.<br />
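The "normalized median" calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact procedure used here: it assumes that "normalized" means the per-component medians are rescaled to sum to 100%, and the function name and the three individuals in the example are hypothetical (only three of the twelve K12b components are shown to keep it short).<br />

```python
from statistics import median


def normalized_median(rows):
    """Per-component medians over individuals, rescaled to sum to 100.

    rows: list of dicts, one per individual in a cluster, each mapping
    component name -> admixture percentage.
    """
    if not rows:
        raise ValueError("need at least one individual")
    components = list(rows[0].keys())
    medians = {c: median(r[c] for r in rows) for c in components}
    total = sum(medians.values())
    if total == 0:
        raise ValueError("all medians are zero; cannot normalize")
    return {c: 100.0 * m / total for c, m in medians.items()}


# Hypothetical cluster of three individuals.
cluster = [
    {"Caucasus": 50.0, "Atlantic_Med": 30.0, "North_European": 20.0},
    {"Caucasus": 40.0, "Atlantic_Med": 35.0, "North_European": 25.0},
    {"Caucasus": 44.0, "Atlantic_Med": 24.0, "North_European": 32.0},
]

result = normalized_median(cluster)
for component, value in result.items():
    print(f"{component}: {value:.1f}")
```

The raw medians (44, 30, 25) sum to 99 rather than 100, which is why the rescaling step is needed.<br />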
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEXB1ICsQp4bFiDGDDCCe9yDweLET9R8MShV4ACcVpAPh_GiA1CMrLC0tD2YQ5OgpV05pboUMTWQvQlKtKmWnEK6EnGBJk2treyBWlGmnnTrY4Q3k-USr6T7WwK2D5Qdl2qy20pV5lH1Li/s1600/K12b.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEXB1ICsQp4bFiDGDDCCe9yDweLET9R8MShV4ACcVpAPh_GiA1CMrLC0tD2YQ5OgpV05pboUMTWQvQlKtKmWnEK6EnGBJk2treyBWlGmnnTrY4Q3k-USr6T7WwK2D5Qdl2qy20pV5lH1Li/s400/K12b.png" width="400" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDmB3beOe87g8WZmmGO_fh-3yoe-Cp-yWT8lkOr00CIpJU_Eo577D784dHeRksSZbX95VjHui1UPN-8JsydBJCHfRpZyv55qWtgmJ_V-8whzsg4CjFtAWE46_GAYzm1RMiJaViGaz3T7Hl/s1600/K12b_medianprops.png.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="172" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDmB3beOe87g8WZmmGO_fh-3yoe-Cp-yWT8lkOr00CIpJU_Eo577D784dHeRksSZbX95VjHui1UPN-8JsydBJCHfRpZyv55qWtgmJ_V-8whzsg4CjFtAWE46_GAYzm1RMiJaViGaz3T7Hl/s400/K12b_medianprops.png.jpg" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Here are two ways in which you may use this correspondence.
<br />
<br />
<b>1. Different clusters of a single population</b>
<br />
<br />
For example, the Turks with partial Balkan ancestry <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihGOetgiwziyLHflL6wqWgxPb-lWu7hp9OtBmck15PVBmrc4rpe11RVEKCEwPWwg_Y7nBfPzYpU97c3KXzf9ctx884IuYbkTHmQeaKca0Kw6eX57l6GbJFHIZzIC0EITxFCU2zPAcnj9I9/s1600/mappops.jpg">tend to belong</a> to pop10, whereas those of Anatolian ancestry to pop13, and those from northeastern Anatolia to pop22. If we compare the admixture proportions of these three groups, we notice e.g.,
<br />
<br />
<ul>
<li>An excess of Atlantic_Baltic and North_European in pop10</li>
<li>An excess of Caucasus in pop22</li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxe1G7FEXnhkx4OshijcSWaUjGwf1ojEFyjsL6EFNLut4dMkrqF9DUvwjBYLk-iWTq-bwT5vxiX3EfZ1Yuw0phiDEUDp4eMgjYxVoq7ykPJ0wmOudHkB5pB-Vznu2Y7izCG9uJfPKxuMvP/s1600/ADMIXTURE+Iranians_12.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxe1G7FEXnhkx4OshijcSWaUjGwf1ojEFyjsL6EFNLut4dMkrqF9DUvwjBYLk-iWTq-bwT5vxiX3EfZ1Yuw0phiDEUDp4eMgjYxVoq7ykPJ0wmOudHkB5pB-Vznu2Y7izCG9uJfPKxuMvP/s320/ADMIXTURE+Iranians_12.jpg" width="320" /></a></div>
As another example, a group of 5 Iranians belongs to pop12, whereas the overwhelming majority of Iranians and Kurds belong to pop21. Strikingly, pop12 differs from all other populations in having substantial levels of East_African and Sub_Saharan. So, it seems that fineSTRUCTURE was able to infer that some Iranian individuals had this feature in common. These individuals were already evident in the Iranian population portrait (right), but fineSTRUCTURE was able to group them even though there were no African populations in the ChromoPainter analysis; presumably, the software was able to detect that these individuals shared a set of chunks quite different from the norm for the Balkan/West Asian area.<br />
<br />
<b>2. Related clusters</b><br />
<b><br /></b><br />
fineSTRUCTURE grouped the different populations in a <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUS1_9uR4oziuNh2e6N2LogLGP8EtoxCDw3P9Z3VEIpyw5LoTSdOj8Olz4G1vUEvS9NM7TnmIr36x9bbTteDOmnidKiBlClyZN_xS2eQH8SSKf4wBy7mCWNJVMpw29sBcd0RcdXP64R5ZP/s1600/heatmap.png">tree structure</a>. For example, it grouped pop18, the "North Balkan" cluster with pop23, the "Bulgarian-Romanian" one.<br />
<br />
Looking at the admixture proportions, we can tell that the two clusters do indeed seem quite similar, but there are some differences, e.g., an excess of North_European in pop18, and an excess of Caucasus in pop23. This makes sense given the geographical origin of individuals belonging to the two clusters.Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com1tag:blogger.com,1999:blog-6533996127304587865.post-83483642996096189662012-02-14T13:24:00.000+02:002012-02-15T04:09:18.773+02:00ChromoPainter/fineSTRUCTURE analysis of Balkans/West AsiaI have carried out a <a href="http://dienekes.blogspot.com/2012/01/finestructure-paper-lawson-et-al-2012.html">ChromoPainter/fineSTRUCTURE</a> analysis of Balkans/West Asia. This is a slightly different dataset than the one used in the previous fastIBD <a href="http://dodecad.blogspot.com/2012/01/fastibd-analysis-of-balkanswest-asia.html">analysis</a> of the same region. It also took much longer (about a week, with two CPUs dedicated to the task) to complete, so it is not something that can be done routinely.<br />
<br />
<b>Technical details (skip if you want)</b><br />
<b><br /></b><br />
413 individuals from 33 populations were studied, on 258,100 SNPs, after --geno 0.03 --maf 0.01 filters were applied. Data were phased in Beagle with the default 10 iterations. Genetic maps from the HapMap were used. fineSTRUCTURE was used on ChromoPainter output, with 500,000 burnin/runtime iterations each.<br />
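For readers unfamiliar with the two filters, the following Python sketch shows what they do to a single SNP: --geno 0.03 drops SNPs with more than 3% missing genotype calls, and --maf 0.01 drops SNPs whose minor allele frequency is below 1%. The function name and the genotype encoding (allele counts 0/1/2, None for a missing call) are assumptions made for this illustration; the actual filtering was done with the flags quoted above, not with this code.<br />

```python
def snp_passes(genotypes, max_missing=0.03, min_maf=0.01):
    """Return True if a SNP survives the missingness and MAF filters.

    genotypes: one entry per individual, giving the count (0, 1 or 2) of a
    chosen allele, or None when the genotype call is missing.
    """
    if not genotypes:
        raise ValueError("no genotypes given")
    called = [g for g in genotypes if g is not None]
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    if not called or missing_rate > max_missing:
        return False
    # Frequency of the counted allele among called chromosomes.
    freq = sum(called) / (2 * len(called))
    maf = min(freq, 1.0 - freq)
    return maf >= min_maf


# Hypothetical SNPs in 100 individuals.
common_snp = [0] * 97 + [1] * 3       # allele frequency 1.5%, nothing missing
monomorphic_snp = [0] * 100           # no variation at all
poorly_typed_snp = [None] * 4 + [1] * 96  # 4% missing calls

print(snp_passes(common_snp), snp_passes(monomorphic_snp), snp_passes(poorly_typed_snp))
```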
<br />
<b>25 Inferred Populations</b><br />
<b><br /></b><br />
fineSTRUCTURE imposes a tree structure on a number of inferred populations. The following heatmap shows this tree structure; columns represent donor populations, rows, recipient ones.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUS1_9uR4oziuNh2e6N2LogLGP8EtoxCDw3P9Z3VEIpyw5LoTSdOj8Olz4G1vUEvS9NM7TnmIr36x9bbTteDOmnidKiBlClyZN_xS2eQH8SSKf4wBy7mCWNJVMpw29sBcd0RcdXP64R5ZP/s1600/heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="288" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUS1_9uR4oziuNh2e6N2LogLGP8EtoxCDw3P9Z3VEIpyw5LoTSdOj8Olz4G1vUEvS9NM7TnmIr36x9bbTteDOmnidKiBlClyZN_xS2eQH8SSKf4wBy7mCWNJVMpw29sBcd0RcdXP64R5ZP/s320/heatmap.png" width="320" /></a></div>
<br />
There was a total of 25 populations, labeled pop0, pop1, ..., pop24.<br />
<br />
The following table summarizes how many individuals from each original population were assigned to each inferred population:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihGOetgiwziyLHflL6wqWgxPb-lWu7hp9OtBmck15PVBmrc4rpe11RVEKCEwPWwg_Y7nBfPzYpU97c3KXzf9ctx884IuYbkTHmQeaKca0Kw6eX57l6GbJFHIZzIC0EITxFCU2zPAcnj9I9/s1600/mappops.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="148" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihGOetgiwziyLHflL6wqWgxPb-lWu7hp9OtBmck15PVBmrc4rpe11RVEKCEwPWwg_Y7nBfPzYpU97c3KXzf9ctx884IuYbkTHmQeaKca0Kw6eX57l6GbJFHIZzIC0EITxFCU2zPAcnj9I9/s320/mappops.jpg" width="320" /></a></div>
<br />
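A table like the one above is simply a cross-tabulation of original population labels against inferred populations. A minimal Python sketch of that counting step follows; the function name and the toy assignments are hypothetical and are not taken from the actual run.<br />

```python
from collections import Counter


def population_matrix(assignments):
    """Count individuals per (original population, inferred population) pair.

    assignments: iterable of (original_population, inferred_population) tuples.
    Returns {original_population: Counter({inferred_population: count})}.
    """
    table = {}
    for original, inferred in assignments:
        table.setdefault(original, Counter())[inferred] += 1
    return table


# Hypothetical assignments for illustration only.
toy = [
    ("Greek", "pop8"), ("Greek", "pop8"), ("Greek", "pop14"),
    ("Armenian", "pop7"), ("Armenian", "pop16"),
]
matrix = population_matrix(toy)
for original in sorted(matrix):
    print(original, dict(matrix[original]))
```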
I will limit myself to populations which include Dodecad Project members:<br />
<br />
<ul>
<li>pop6 includes a Project North Ossetian, as well as all Yunusbayev et al. North Ossetians</li>
<li>pop7 is mainly Armenian</li>
<li>pop16 is also mainly Armenian; it would be interesting to see whether this bipartite division of Armenians is in agreement with the one inferred in the previous fastIBD analysis</li>
<li>pop8 is mainly Greek, and appears to be "continental Greek"; it also includes some other Balkan individuals</li>
<li>pop14 is also Greek, and includes a variety of people with ancestry from Crete, the Aegean, Cyprus, Asia Minor, Cappadocia, and the Pontus, as well as continental Greece. It could be labeled "eastern Greek"</li>
<li>pop11 is Cypriot, including the single 100% Greek Cypriot of the Project, all 3 100% Turkish Cypriots, as well as a Turkish individual of partial Turkish_Cypriot ancestry</li>
<li>pop10 is Turkish, and includes people with some ancestry from the Balkans, as well as Anatolia. It could be labeled "Balkan Turkish"</li>
<li>pop13 is also Turkish, and seems to include people with ancestry exclusively from Anatolia, including almost all the Behar et al. Turks</li>
<li>pop15 is Assyrian; some Assyrians also fall in the aforementioned pop16, which includes mainly Armenians</li>
<li>pop18 could be labeled "North Balkan"; there is probably structure to be uncovered within this cluster, once more participants from the Balkans join the Project</li>
<li>pop20 is "Georgian-Abkhazian"</li>
<li>pop21 is "Kurdish-Iranian"</li>
<li>pop22 could be labeled "Northeastern Anatolia" or (more classically) "Pontus-Colchis". It appears to unite various individuals from Northeastern Turkey and neighboring Georgia, having Karadeniz Turkish, Armenian, Pontic Greek, and Kartvelian ancestry. I strongly encourage participants from this region to join the Project, especially Pontic Greeks, as there are no 100% Pontic Greeks currently in the Project.</li>
<li>pop23 is "Bulgarian-Romanian" mainly, and also includes one Serb. Once again, I emphasize that the power of this approach using haplotypes depends on participation, so I encourage all people from the Balkans to consider joining the Project.</li>
</ul>
<b>Principal Components Analysis</b><br />
<b><br /></b><br />
I have also used the PCA feature of fineSTRUCTURE to carry out principal components analysis. I am plotting the first two dimensions of this PCA, using my own visualization code that places labels in the average position on the plane:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVPJ9fRjCNyvTJgV9xW-2J07yp6-p7WXsVQWsH5gbRb5_Vrp8d1UTDW79qrgP_W3eaEnxpQu7d6KS0IsH_P4riPrdYe_-jh_Nf_DzOGZkl70OKZoRB8eJR7fiDZ0vJO-1tKiX7HZ1t01fG/s1600/1_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVPJ9fRjCNyvTJgV9xW-2J07yp6-p7WXsVQWsH5gbRb5_Vrp8d1UTDW79qrgP_W3eaEnxpQu7d6KS0IsH_P4riPrdYe_-jh_Nf_DzOGZkl70OKZoRB8eJR7fiDZ0vJO-1tKiX7HZ1t01fG/s320/1_2.png" width="320" /></a></div>
<br />
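Placing a label at the average position of a group amounts to computing a centroid per population. The Python sketch below shows that step only; it is an assumption about the general idea, not the actual visualization code, and the names and coordinates are hypothetical.<br />

```python
from collections import defaultdict


def label_positions(points):
    """Average the (x, y) coordinates of each population's individuals.

    points: iterable of (population, x, y) tuples, e.g. the first two PCA dimensions.
    Returns {population: (mean_x, mean_y)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for population, x, y in points:
        acc = sums[population]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {pop: (sx / n, sy / n) for pop, (sx, sy, n) in sums.items()}


# Hypothetical coordinates for illustration only.
toy = [("A", 0.0, 1.0), ("A", 2.0, 3.0), ("B", -1.0, 0.5)]
positions = label_positions(toy)
print(positions)
```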
<b>Results</b><br />
<b><br /></b><br />
Results for Project participants are included in the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedG4wM2VFMGtiN3Y3QmFDd1phVTRNUEE">spreadsheet</a>.<br />
<br />
<ul>
<li>Population matrix: shows how many individuals from each population were assigned to each cluster</li>
<li>Z score population matrix: shows the normalized number of "chunks" from each donor population (column) to each recipient population (row). Do not compare across rows! The way to read this table is the following: for each row, higher values indicate more sharing. For example, the "Cypriots" population has pop11 as its main donor.</li>
<li>Individual assignments: the pop number that each Project and reference ID was assigned to</li>
<li>Individual Chunkcounts: the number of chunks copied from each donor population (column) to each individual (row)</li>
<li>Individual PCA: your PCA co-ordinates that can help you find your dot on the Principal Components Analysis graphic (see above)</li>
</ul>
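To see why the Z score matrix should be read one row at a time, here is a minimal Python sketch. It assumes the normalization is a z-score computed within each recipient row, which is not spelled out above; the function names and toy chunk counts are hypothetical.<br />

```python
from statistics import mean, pstdev


def row_z_scores(row):
    """Standardize one recipient row: (value - row mean) / row standard deviation.

    row: dict mapping donor population -> chunk count for a single recipient.
    """
    mu = mean(row.values())
    sd = pstdev(row.values())
    if sd == 0:
        return {donor: 0.0 for donor in row}  # flat row: no donor stands out
    return {donor: (count - mu) / sd for donor, count in row.items()}


def main_donor(row):
    """Return the donor with the highest z-score within this row."""
    z = row_z_scores(row)
    return max(z, key=z.get)


# Hypothetical chunk counts for one recipient population.
toy_row = {"pop10": 30.0, "pop11": 90.0, "pop13": 30.0}
z = row_z_scores(toy_row)
print(main_donor(toy_row), z)
```

Because each row is centered and scaled by its own mean and spread, a value in one row says nothing about how it compares to a value in another row.<br />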
Averaged results were included only for populations with at least 5 members.<br />
The raw chunkcounts for all 413x413 individuals can be found <a href="https://docs.google.com/open?id=0B7JDEoCgzRKeM2EzNjdmNzItYjMzOS00YTQ0LTljYzEtMmUzOWFjYmMzYTM2">here</a>.Dienekeshttp://www.blogger.com/profile/02082684850093948970noreply@blogger.com6tag:blogger.com,1999:blog-6533996127304587865.post-7651755943046747762012-02-06T12:18:00.001+02:002012-02-06T12:18:30.533+02:00Other testing companies<div dir="ltr" style="text-align: left;" trbidi="on">
The Dodecad Project is not affiliated with any genetic testing companies. Until now, I have included Project participants from 23andMe and FamilyTreeDNA "Family Finder" tests, but it has come to my attention that there are new players in the field, such as Ancestry.com (see post on <a href="http://www.yourgeneticgenealogist.com/2012/01/update-on-new-autosomal-dna-test-from.html">Your Genetic Genealogist</a>) and Lumigenix (see post on <a href="http://www.genomesunzipped.org/2012/02/review-of-the-lumigenix-comprehensive-personal-genome-service.php">GenomesUnzipped</a>).<br />
<br />
If you have data from any company entering this field, please contact me at dodecad@gmail.com (do not send data right away!). That way, I can find out how many markers are in common between the new tests and my existing datasets, and figure out how easy it will be to convert them for use in the Project and in DIYDodecad.</div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com1tag:blogger.com,1999:blog-6533996127304587865.post-80939576174401641952012-01-31T22:27:00.000+02:002012-01-31T22:27:45.448+02:00'K12b' and 'K7b' calculators<div dir="ltr" style="text-align: left;" trbidi="on">
I am releasing two new calculators with K=12 and K=7 components, named 'K12b' and 'K7b'. You can scroll down to the bottom if you are just interested in the downloads, or read on.<br />
<b><br /></b><br />
<b>New Features</b><br />
<br />
The new <b>'K12b'</b> calculator is an update of the previous <a href="http://dienekes.blogspot.com/2011/12/first-analysis-of-metspalu-et-al-2011.html">K12a</a> one, inferred using all the new samples submitted during the last submission opportunity. The 12 components are still roughly the same, although their allele frequencies may have changed a bit, so existing participants can expect slightly altered results. New participants in the Project can expect somewhat larger changes, since their data now contribute to the creation of the new tool. Non-participants can, of course, use the new calculator with <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a>.<br />
<br />
I have also taken the opportunity to do some minor tweaks. I am releasing <b>population portraits</b> for K12b (which were lacking in K12a); I've changed my visualization code so that the sample IDs of non-Dodecad populations can now be seen in the barplots. This may be useful for anyone else using these reference populations, by quickly identifying potential outliers in them.<br />
<br />
I have also decided to use <b>normalized median </b>admixture proportions for the populations. For example, if 5 individuals in a population have 0, 0, 0.2, 0.5, 10.0% of a particular component, then the average is 2.14%, but the median is 0.2%. By using the median, the proportions become less susceptible to the presence of outliers (such as the 10%). However, if the median is calculated over every component separately, it is no longer guaranteed that the components will add up to 100%; this can be addressed by re-normalizing them (scaling them by a constant factor) so that they do. I believe that use of the normalized median will not only give better proportions that are less susceptible to outliers, but will also improve results of the new Dodecad Oracle for K12b.<br />
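The normalized median described above can be sketched in a few lines of Python. This is an illustration rather than the Project's own code: the component names X, Y and Z and the values for Y and Z are hypothetical, while the X values are the five from the example above.<br />

```python
from statistics import mean, median


def normalized_median(individuals):
    """Per-component median across individuals, rescaled to sum to 100%.

    individuals: list of dicts mapping component name -> percentage.
    """
    components = individuals[0].keys()
    medians = {c: median(ind[c] for ind in individuals) for c in components}
    total = sum(medians.values())
    if total == 0:
        raise ValueError("all component medians are zero")
    # Scale every median by the same constant so the result adds up to 100%.
    return {c: 100.0 * m / total for c, m in medians.items()}


# Component X uses the five values from the example; Y and Z are made up
# so that each individual's proportions add up to 100%.
population = [
    {"X": 0.0, "Y": 60.0, "Z": 40.0},
    {"X": 0.0, "Y": 50.0, "Z": 50.0},
    {"X": 0.2, "Y": 70.0, "Z": 29.8},
    {"X": 0.5, "Y": 40.0, "Z": 59.5},
    {"X": 10.0, "Y": 45.0, "Z": 45.0},
]
x_values = [ind["X"] for ind in population]
print("mean of X:", mean(x_values), "median of X:", median(x_values))
print(normalized_median(population))
```

In this toy case the raw medians add up to less than 100%, which is exactly the situation the re-normalization step is meant to correct.<br />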
<br />
At the same time I am also releasing <b>'K7b' </b>which is an update of the existing '<a href="http://dodecad.blogspot.com/2011/10/eurasia7-calculator.html">eurasia7</a>' calculator and which has been built on exactly the same dataset as 'K12b' but at a lower (K=7) level of detail.<br />
<br />
<b>Information on K7b</b><br />
<b><br /></b><br />
Information <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadHZ6SHpiLTNTa3lsUmZJY2pQblVRR2c">spreadsheet</a>.<br />
<br />
Normalized median admixture proportions barplot for all included populations (a high resolution version of this is included in the download bundle):
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSO_fFxNDncgUpnr5SWahTyMeNQ1vlsxkVVcTU-O-0c1ztqp7_wA-MOG3_DXdcijEJx0O2aZ3c6DIIs0j5LwMFRMChlwnGDtdwFdm7a3SgtSL-rDkCVy8GWyPPObx8_Rl58hX20aWOZMQL/s1600/_7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="64" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSO_fFxNDncgUpnr5SWahTyMeNQ1vlsxkVVcTU-O-0c1ztqp7_wA-MOG3_DXdcijEJx0O2aZ3c6DIIs0j5LwMFRMChlwnGDtdwFdm7a3SgtSL-rDkCVy8GWyPPObx8_Rl58hX20aWOZMQL/s320/_7.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
Table of Fst divergences:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwIPDQ6AgXFQpFBCN7UA7wJkrYfZ5dG52-BXfJfYgYit4M7O7rgVXntUB6n8ZnGfMp99tMqagA5wineiC0Pt3UIZSwvd8S1M7nFKJpIIcJjR-YCvnRKwx7tixqqIXjzO9-cHCwIpM8WVVB/s1600/fst.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="78" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwIPDQ6AgXFQpFBCN7UA7wJkrYfZ5dG52-BXfJfYgYit4M7O7rgVXntUB6n8ZnGfMp99tMqagA5wineiC0Pt3UIZSwvd8S1M7nFKJpIIcJjR-YCvnRKwx7tixqqIXjzO9-cHCwIpM8WVVB/s320/fst.png" width="320" /></a></div>
<br />
Neighbor-joining tree (based on above):<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiWbGsXeo35xsHQH68tVnz2F3elqaTP_2ZO-3QA1IUNcdUr0U-rDNvy_R83Zyo18WK9M-BMvAO4zLkwHyATSKAiGzi1AGIz05h-1WyfsGz_ygAcV5SEaek5Au0m-DEcMcJDvR0z1y6kXNR/s1600/nj.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiWbGsXeo35xsHQH68tVnz2F3elqaTP_2ZO-3QA1IUNcdUr0U-rDNvy_R83Zyo18WK9M-BMvAO4zLkwHyATSKAiGzi1AGIz05h-1WyfsGz_ygAcV5SEaek5Au0m-DEcMcJDvR0z1y6kXNR/s320/nj.png" width="320" /></a>
<br />
<b>Information on K12b</b><br />
<b><br /></b><br />
Information <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedEY4Y3lTUVBaaFp0bC1zZlBDcTZEYlE">spreadsheet</a>.<br />
<b><br /></b><br />
Normalized median admixture proportions barplot for all included populations (a high resolution version of this is included in the download bundle):<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimKHylVIYmksNC5mIXZBM7jvck41nOeEC7WtZH9xfPCR6xammGV_YKIH_gXhip_k6TN52WdeFaApFLnCXwCh_uq9LOoxpPl8sK5deEuHbPGbfHD3d2pszLndEuWd0OFbhyawc_IcnjfZSd/s1600/_12.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="64" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimKHylVIYmksNC5mIXZBM7jvck41nOeEC7WtZH9xfPCR6xammGV_YKIH_gXhip_k6TN52WdeFaApFLnCXwCh_uq9LOoxpPl8sK5deEuHbPGbfHD3d2pszLndEuWd0OFbhyawc_IcnjfZSd/s320/_12.png" width="320" /></a></div>
<br />
Table of Fst divergences:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq2H1g5vAKGlko6DAnwQ4mboe0OyoqtpCZv4roG4JHFAasggVHqavCt-oUvfBZJa7pvEw8nP1VBLuRfU8tFXLY3YYRY_fXhsodsVxol9PVzRBZenIbkHh8Y2xFhtP_-ap1FVnpLqCwHIQc/s1600/fst.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="62" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq2H1g5vAKGlko6DAnwQ4mboe0OyoqtpCZv4roG4JHFAasggVHqavCt-oUvfBZJa7pvEw8nP1VBLuRfU8tFXLY3YYRY_fXhsodsVxol9PVzRBZenIbkHh8Y2xFhtP_-ap1FVnpLqCwHIQc/s320/fst.png" width="320" /></a></div>
<br />
Neighbor-joining tree (based on above):<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxaUVTUpKDLdGFRdvghgs9r5vxakn-0X25Sv66QV1E8Z10IeN-x2s_2xliXw56TQxZGtauuYSnW5-5xdSlCPRmMm_Aw_5a8_y7daliuzpJVeVY_hD9xdqCQ4IU6BZ8ULBP7K8_8A1pGdPk/s1600/nj.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxaUVTUpKDLdGFRdvghgs9r5vxakn-0X25Sv66QV1E8Z10IeN-x2s_2xliXw56TQxZGtauuYSnW5-5xdSlCPRmMm_Aw_5a8_y7daliuzpJVeVY_hD9xdqCQ4IU6BZ8ULBP7K8_8A1pGdPk/s320/nj.png" width="320" /></a></div>
<b>Multidimensional Scaling Plots of K12b and K7b</b><br />
<b><br /></b><br />
I have created MDS plots using synthetic individuals representing the 12 ancestral components of K12b and the 7 ancestral components of K7b. By including both in the same plot, one gets an idea of the relationship of the components at different resolutions. The first 10 dimensions can be seen below:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDq2OcbTdb9EIUW_Hup3QNlXE7JF6u-LFXTB40KDVS1QMI2adsZrG9XNS2IShyCZ1muTxNdTF5oATvqk6A1rtbPlSRIDj3a42aTw_yWuxsgG0Gvok2QHGU1CZTLP41zQ8S72tTz1YpZNZC/s1600/1_2.png" imageanchor="1"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDq2OcbTdb9EIUW_Hup3QNlXE7JF6u-LFXTB40KDVS1QMI2adsZrG9XNS2IShyCZ1muTxNdTF5oATvqk6A1rtbPlSRIDj3a42aTw_yWuxsgG0Gvok2QHGU1CZTLP41zQ8S72tTz1YpZNZC/s200/1_2.png" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZXv1Fos4h2Tgl75H_rn7Vi4BXGOwgPwHpE5y8M2FD8cBpa304v2X5o-yCRhDqz-AJTYLt0ZtKfPqQ5lqbSFtqxivTYPaiH2nOTqGNt6NYBKkAiubD0AS47H7qc-t6_o-gh9j5GWoxsOrn/s1600/3_4.png" imageanchor="1"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZXv1Fos4h2Tgl75H_rn7Vi4BXGOwgPwHpE5y8M2FD8cBpa304v2X5o-yCRhDqz-AJTYLt0ZtKfPqQ5lqbSFtqxivTYPaiH2nOTqGNt6NYBKkAiubD0AS47H7qc-t6_o-gh9j5GWoxsOrn/s200/3_4.png" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6L5_Th-O9-lAj_7OlkL3h3Xvqxcd7bPeBaS0emLEeKItfXg5UfcV5wZdrMkVOtgD5_eGVon-72xnYHutB4BIDoCmeTP3MVV92hklXa9D4V23ZZTmWx3A6er9zzerfDpH-3Y8rgj5IFDEI/s1600/5_6.png" imageanchor="1"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6L5_Th-O9-lAj_7OlkL3h3Xvqxcd7bPeBaS0emLEeKItfXg5UfcV5wZdrMkVOtgD5_eGVon-72xnYHutB4BIDoCmeTP3MVV92hklXa9D4V23ZZTmWx3A6er9zzerfDpH-3Y8rgj5IFDEI/s200/5_6.png" width="200" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhU8l6Vt2DfjVON_N4ffQI6wXNkL9rf3i-eriklIMAIWx9aQhf-gNEO76TVN7fyvcFg9UeR0_hsGO_gY5QXdYfPKHPDX-5RAmRGlCzfu8oFaTPAPHFbNdBxQQ2Uwejru7g9IKnWDbNM-X9a/s1600/7_8.png" imageanchor="1"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhU8l6Vt2DfjVON_N4ffQI6wXNkL9rf3i-eriklIMAIWx9aQhf-gNEO76TVN7fyvcFg9UeR0_hsGO_gY5QXdYfPKHPDX-5RAmRGlCzfu8oFaTPAPHFbNdBxQQ2Uwejru7g9IKnWDbNM-X9a/s200/7_8.png" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWTsMM3yqT6LepnHbr04fYSxCtONiIKgPoL2shvDS4POh6B8eGIyYNnyaM8oppVxe4DaF0oRgJCStgjCbrkrC-nEoP-zgdytbeN92BuujmukqyVmYRmL9VBIFFYzF7sbk0yE280n6YnjjR/s1600/9_10.png" imageanchor="1"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWTsMM3yqT6LepnHbr04fYSxCtONiIKgPoL2shvDS4POh6B8eGIyYNnyaM8oppVxe4DaF0oRgJCStgjCbrkrC-nEoP-zgdytbeN92BuujmukqyVmYRmL9VBIFFYzF7sbk0yE280n6YnjjR/s200/9_10.png" width="200" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Here is a blowup of the main West Eurasian groups from the plot of the first two dimensions:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEir8lIc2_XoO6YrMUQK5IbwbtA3ZzqnhGCVIyOb_4JORVj-fektwxpzd-CoNTHpVTOPseVb6filpKRnlcCM_qACtO_ZMy1x76-EQ6ifdr6-krC2m0S_L7RmR0g1NQZkaEvB57HFVPppjSeD/s1600/thesix_global.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="380" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEir8lIc2_XoO6YrMUQK5IbwbtA3ZzqnhGCVIyOb_4JORVj-fektwxpzd-CoNTHpVTOPseVb6filpKRnlcCM_qACtO_ZMy1x76-EQ6ifdr6-krC2m0S_L7RmR0g1NQZkaEvB57HFVPppjSeD/s400/thesix_global.png" width="400" /></a></div>
<br />
Some observations:<br />
<br />
<ul>
<li>The Atlantic_Med component, which is bi-modal in Basques and Sardinians, occupies the apex of the figure; this makes sense, since Southwest Europe is quite distant (along land routes) from both Asia and Africa.</li>
<li>The Caucasus component is surrounded by most of the others; this is consistent with my theory elaborated in <a href="http://dienekes.blogspot.com/2011/12/womb-of-nations-how-west-eurasians-came.html">The womb of nations: how West Eurasians came to be</a>.</li>
<li>The Atlantic_Baltic component (from K=7) is intermediate between the Atlantic_Med and North_European components.</li>
<li>Similarly, the West_Asian component (from K=7) is intermediate between the Caucasus and Gedrosia components; the Gedrosia component diverges in the direction of the Asian groups (not shown in this figure), and in particular of South Asians. This divergence can also be seen in the plot of dimension #3.</li>
<li>The Northwest_African component diverges in the direction of Sub-Saharan Africans.</li>
</ul>
<br />
<b>Technical Details</b><br />
<b><br /></b><br />
A dataset of 268 populations/3,115 individuals was assembled. A total of 265,519 SNPs are shared by the various source datasets and the 23andMe v2/v3 and Family Finder platforms. Iterative removal of distant relatives was performed by removing one individual from each pair within a population if that pair had a RATIO of 2.5 or greater, or exceeded the mean plus two standard deviations in the IBD analysis performed in PLINK 1.07. A total of 2,675 individuals remained. 4 individuals were then removed for low genotyping rate (less than 97%), leaving 2,671. 264,328 SNPs remained after removal of SNPs with a genotyping rate below 97% or a minor allele frequency below 1%. 166,770 SNPs remained after linkage disequilibrium-based pruning (--indep-pairwise 200 25 0.4). The final set thus consisted of 2,671 individuals/268 populations/166,770 SNPs. Ancestral populations (components) were inferred using ADMIXTURE 1.21, with K=7 and K=12 and default parameters.<br />
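The relative-removal rule can be sketched roughly as follows in Python. This is a simplified single pass, not the iterative procedure actually used. It assumes each pair carries a RATIO value and one IBD estimate, that the mean and standard deviation are taken over the pairs of one population, and that the second member of a flagged pair is the one dropped. All names and numbers are hypothetical.<br />

```python
from statistics import mean, stdev


def remove_relatives(pairs, ratio_cutoff=2.5, n_sd=2.0):
    """Greedy single-pass sketch: drop one member of each flagged pair.

    pairs: list of (id_a, id_b, ratio, ibd) tuples for one population.
    A pair is flagged if ratio >= ratio_cutoff or ibd > mean + n_sd * SD.
    Returns the set of removed individual IDs.
    """
    ibd_values = [ibd for _, _, _, ibd in pairs]
    if len(ibd_values) > 1:
        ibd_cutoff = mean(ibd_values) + n_sd * stdev(ibd_values)
    else:
        ibd_cutoff = float("inf")  # too few pairs to estimate a spread
    removed = set()
    for id_a, id_b, ratio, ibd in pairs:
        if id_a in removed or id_b in removed:
            continue  # this pair has already been broken up
        if ratio >= ratio_cutoff or ibd > ibd_cutoff:
            removed.add(id_b)  # arbitrary choice: drop the second member
    return removed


# Hypothetical pairwise values for one small population.
toy_pairs = [
    ("i1", "i2", 3.0, 0.25),
    ("i1", "i3", 1.0, 0.01),
    ("i2", "i3", 1.1, 0.02),
    ("i3", "i4", 1.0, 0.01),
]
removed = remove_relatives(toy_pairs)
print(removed)
```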
<br />
No individuals were removed from the source datasets, except in the case of the Armenians_Y sample, where one individual (ID: armenia3) was dropped because he/she was the same as a Dodecad Project participant.<br />
<br />
<b>Downloads</b><br />
<b><br /></b><br />
K7b population <a href="https://docs.google.com/open?id=0B7JDEoCgzRKeYTA0ZGE1N2ItZTE0ZC00YjdmLWE1NWItYTk0NjdjMjc1OGZm">portraits</a>, <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadHZ6SHpiLTNTa3lsUmZJY2pQblVRR2c">spreadsheet</a>, and DIYDodecad <a href="https://docs.google.com/open?id=0B7AJcY18g2GaZGVlMDJiNWItMDdmMy00YWYxLTljNTAtMzcyNzRkMzc1MTRj">files</a>.<br />
K12b population <a href="https://docs.google.com/open?id=0B7JDEoCgzRKeMzgzOWVhNjUtZWIxYy00MjI0LTlkYTMtNGNkZjhmMmI3NjQz">portraits</a>, <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedEY4Y3lTUVBaaFp0bC1zZlBDcTZEYlE&hl=en_US#gid=0">spreadsheet</a>, and DIYDodecad <a href="https://docs.google.com/open?id=0B7AJcY18g2GaZWFhZTYxOTUtNDI2Yi00M2VlLWEzZGYtODIyNzUxNWJlZTdl">files</a>.<br />
<br />
Dodecad Oracle (K12b edition) can be downloaded from <a href="https://docs.google.com/open?id=0B7AJcY18g2GaYWNiZjI3ZGYtM2EwYy00OTdjLTgwNjUtMWZkODFhNDQ5NjFi">here</a>. Please read the instructions of the <a href="http://dodecad.blogspot.com/2011/12/dodecad-oracle-k12a-edition.html">previous</a> Oracle on how to use this tool. Note that the number of populations is now 223.<br />
<br />
To use either calculator with <a href="http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html">DIYDodecad</a>, with your 23andMe or Family Finder data, follow the instructions in the README file, but substitute 'K12b' or 'K7b' for 'dv3'.<br />
<br />
Project participant results for both K7b and K12b are found in the spreadsheets in the Individual Results tab.<br />
<br />
<b>Terms of Use</b><br />
<b><br /></b><br />
You are free to use K12b and K7b, including all downloaded files, for any non-commercial purpose, as long as you attribute them to the Dodecad Project and to Dienekes Pontikos as follows:<br />
<br />
The [K7b/K12b] admixture calculator is courtesy of <a href="http://dienekes.blogspot.com/">Dienekes Pontikos</a> and was developed as part of the <a href="http://dodecad.blogspot.com/">Dodecad Ancestry Project</a>; more information <a href="http://dodecad.blogspot.com/2012/01/k12b-and-k7b-calculators.html">here</a>.</div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com41tag:blogger.com,1999:blog-6533996127304587865.post-74154339216854426702012-01-24T04:10:00.000+02:002012-01-24T04:26:47.845+02:00Submission Opportunity is OVER<div dir="ltr" style="text-align: left;" trbidi="on">
Thank you, everyone, for submitting your data. I will not accept any more data at this time. A couple of submissions came in at the last second, so I accepted one more than I promised; that participant got the brand new DPD001 ID.<br />
<br />
Those who submitted in time will get their IDs and their results will be posted in the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArJDEoCgzRKedGdRbkxKMDdlZkJWc21tdkpldWxwVmc">K12a spreadsheet</a>.<br />
Additionally, I will run all participants over <a href="http://dodecad.blogspot.com/2011/12/world9-calculator.html">world9</a>, so that <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadGlpc3JQaVdQbS1QTWF3SzNjTVdfZEE">spreadsheet</a> will also include everybody.<br />
<br />
From now on, I will be reworking some of the Project tools to make use of newer samples submitted during this submission opportunity.<br />
<br />
If you wish to submit your data during this off period, note that you must contact me at dodecad@gmail.com. <b>Do not send data at this time, unless I indicate that I can accept it!</b> I will let you know if I can process it, and note that I will normally only consider those who matched the <a href="http://dodecad.blogspot.com/2012/01/submission-opportunity-january-2012.html">eligibility criteria</a> of the most recent submission period.</div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com2tag:blogger.com,1999:blog-6533996127304587865.post-36505020826852120002012-01-23T22:23:00.000+02:002012-01-24T04:27:42.981+02:00Open submission for everybody until DOD999<div dir="ltr" style="text-align: left;" trbidi="on">
<b>SUBMISSION OPPORTUNITY IS NOW <a href="http://dodecad.blogspot.com/2012/01/submission-opportunity-is-over.html">OVER</a></b><br />
<br />
<b>Everyone on the planet is invited to submit their data, regardless of their ancestry</b>.<br />
<br />
All other <a href="http://dodecad.blogspot.com/2012/01/submission-opportunity-january-2012.html">rules</a> apply, especially the <b>no relatives</b> clause. Additionally, I will accept <b>a single submission from each submitter</b>, so don't submit all your friends. Moreover, regardless of your ancestry, you should <b>let me know the origin of your four grandparents.</b><br />
<div>
<br /></div>
<div>
There are 35 spots open, so hurry, since last time I had a free-for-all I had to close it down after about 12 hours due to overwhelming demand. I will close project submission after I assign DOD999.</div>
<div>
<br /></div>
<div>
All submissions after I post the end-of-submission announcement on the blog will be ignored. If you post this in any forums or mailing lists, include this post link so that people will know whether the opportunity is over.</div>
</div>Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com1tag:blogger.com,1999:blog-6533996127304587865.post-15071630151823123242012-01-21T22:27:00.000+02:002012-01-21T22:31:25.461+02:00fastIBD analysis of Afroasiatic groups (Jews, Arabs, Assyrians, Berbers, Somalis, Amharas, etc.)Please refer to the previous analysis on the <a href="http://dodecad.blogspot.com/2012/01/fastibd-analysis-of-balkanswest-asia.html">Balkans/West Asia</a> for more information about the interpretation of this type of analysis.<br />
<br />
I am very pleased with the way this analysis of Afroasiatic groups has turned out, revealing an exceptional degree of resolution. I invite individuals from the Near East and Africa who are <a href="http://dodecad.blogspot.com/2012/01/submission-opportunity-january-2012.html">eligible</a> to submit their data, so that they can be included in future runs of this kind.<br />
<br />
<b>Clusters Galore</b><br />
<b><br /></b><br />
45 clusters were inferred with 29 dimensions.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibiR4cp84Fz0TcvJtj2V_wf38AAesOja_SInA5ctEaIYAM4sF6Vmpg0HfjDBKXIIAGmjZx7gsVOOFIw0D2YKbRgn7ox3pjAa-EucUEDz_jPKazrgidaCBGGfm7ZIPwRDxh4h8sdO9175Y/s1600/galore_afroasiatic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="197" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibiR4cp84Fz0TcvJtj2V_wf38AAesOja_SInA5ctEaIYAM4sF6Vmpg0HfjDBKXIIAGmjZx7gsVOOFIw0D2YKbRgn7ox3pjAa-EucUEDz_jPKazrgidaCBGGfm7ZIPwRDxh4h8sdO9175Y/s320/galore_afroasiatic.png" width="320" /></a></div>
<br />
I can't comment on all 45 clusters, so I'll just limit myself to the ones that are significantly represented among Project participants: <b>1.</b> Ashkenazi, <b>4.</b> Assyrian/Mandaean, <b>6.</b> Somali, <b>7.</b> Moroccan, <b>8.</b> Algerian/Tunisian, <b>9.</b> Sephardic, <b>10.</b> Morocco Jews, <b>11.</b> Iran/Iraq Jews, <b>12.</b> Non-Jewish Ethiopians, <b>13.</b> Saudi, <b>14.</b> Arab #1, <b>15.</b> Arab #2, <b>16.</b> Egyptian<br />
<br />
<b>Inter-Population IBD</b><br />
<b><br /></b><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirwRNOK-X3TFOss3V-9t6rLPH4jP_rJaWh8Rh149HoIBlN2Ta7gA-zBtTB_6r-b01kGTTyx4oWrWrn7E-DeDK1DOxsk2yJcOlztsvRrH_2DhcNVPdbUKiZjh6ZWGm7t16lwSR7UpUnklo/s1600/heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirwRNOK-X3TFOss3V-9t6rLPH4jP_rJaWh8Rh149HoIBlN2Ta7gA-zBtTB_6r-b01kGTTyx4oWrWrn7E-DeDK1DOxsk2yJcOlztsvRrH_2DhcNVPdbUKiZjh6ZWGm7t16lwSR7UpUnklo/s320/heatmap.png" width="320" /></a></div>
<b>Results for Project Participants</b><br />
<b><br /></b><br />
The results can be found in the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadGIxMmNvMEVJYlR2ZWdGYzJtaV9HV1E">spreadsheet</a>.<br />
<br />
I have also added the full <a href="https://docs.google.com/open?id=0B7AJcY18g2GaMzVhY2M5ZDgtMzk5NC00MDJlLWI0ZjUtOWQ4YzMyMzY5MDE2">IBD sharing matrix</a> which lists how many Morgans of sequence are estimated to be IBD with probability greater than 10^-6 between all pairs of individuals.<br />
<br />
You can google any non-Project sample IDs to get some more information about their origin. For example, <a href="https://www.google.com/search?sourceid=chrome&ie=UTF-8&q=GSM536710">GSM536710</a> is an Iraqi Jew who shares about half his genome with <a href="https://www.google.com/search?sourceid=chrome&ie=UTF-8&q=GSM536714">GSM536714</a>, also an Iraqi Jew. These two samples are almost certainly first-degree relatives. Similarly, GSM537032, a Samaritan, shares 740-1,480 cM with the other 2 Samaritans, an exceptional amount in this small and probably highly inbred population.<br />
<br />
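The matrix is expressed in Morgans, while the figures quoted above are in centimorgans (1 Morgan = 100 cM), so a unit conversion is handy when ranking matches. A minimal Python sketch follows, for readers who prefer it to R. The function names, sample names and values are hypothetical.<br />

```python
CM_PER_MORGAN = 100.0


def to_centimorgans(morgans):
    """Convert a genetic length from Morgans to centimorgans."""
    return morgans * CM_PER_MORGAN


def rank_sharing(row_morgans):
    """Sort one individual's IBD sharing from most to least, reported in cM.

    row_morgans: dict mapping the other sample's ID -> shared length in Morgans.
    """
    ranked = sorted(row_morgans.items(), key=lambda item: item[1], reverse=True)
    return [(other, to_centimorgans(m)) for other, m in ranked]


# Hypothetical row of the sharing matrix for one individual.
toy_row = {"sample_a": 7.4, "sample_b": 14.8, "sample_c": 0.05}
ranked = rank_sharing(toy_row)
for other, cm in ranked:
    print(other, round(cm, 1))
```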
You can manipulate this matrix in R. After you download it and unzip it, you can load it into R as follows:<br />
<br />
X <- read.table('afroasiatic_ibd_sharing.txt', row.names=1, header=TRUE)<br />
<br />
Then, you can, for example, sort the IBD sharing for a particular individual, as follows:<br />
<br />
sort(unlist(X['DOD026',]))Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com16tag:blogger.com,1999:blog-6533996127304587865.post-10139301572080722512012-01-21T11:49:00.000+02:002012-01-22T02:53:35.050+02:00fastIBD analysis of Central/Eastern EuropePlease refer to the previous analysis on the <a href="http://dodecad.blogspot.com/2012/01/fastibd-analysis-of-balkanswest-asia.html">Balkans/West Asia</a> for more information about the interpretation of this type of analysis.<br />
<br />
<b>Clusters Galore</b><br />
<b><br /></b><br />
The Clusters Galore can be found in the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadDVVQnlkS2xKTnhPZE42R0toN3JGQ3c">spreadsheet</a>. After inspection of the 23 clusters inferred with 21 dimensions, they could be described as:<br />
<br />
<ol>
<li>Mordvin</li>
<li>East Slavic</li>
<li>Polish-Ukrainian</li>
<li>East Balkan</li>
<li>Vologda Russians</li>
<li>Lithuanian</li>
<li>Central European (combining many groups with small sample sizes)</li>
<li>A couple of related (?) individuals</li>
<li>Anatolian</li>
<li>Greek</li>
<li>Chuvash</li>
<li>Ossetian</li>
<li>A couple of related individuals</li>
<li>A couple of related individuals</li>
<li>Balkar</li>
<li>A couple of related individuals</li>
<li>Chechen</li>
<li>Kumyk</li>
<li>A couple of related individuals</li>
<li>Adygei</li>
<li>Lezgin #1 (main)</li>
<li>Lezgin #2</li>
<li>Lezgin #3</li>
</ol>
If you belong to a population with few other participants, you might end up latching onto a cluster dominated by a bigger group. This does not mean that your population is not distinctive, only that there are not enough samples to reveal its distinctiveness if it exists.<br />
<br />
<b>Inter-Population IBD</b><br />
<b><br /></b><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiweUb-nLAncTAIuiY18mc5NcvT1uqUFwV7cns9cfLVGON7I6ekKGlkb2TPrg2TViNeERG8AW7e11-ANw_cdctTmsvsa4zk8FLpnKf46IJ4HIJiwEp_FGJWFLna4coDxf4kNo7f1aiZmY0/s1600/heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiweUb-nLAncTAIuiY18mc5NcvT1uqUFwV7cns9cfLVGON7I6ekKGlkb2TPrg2TViNeERG8AW7e11-ANw_cdctTmsvsa4zk8FLpnKf46IJ4HIJiwEp_FGJWFLna4coDxf4kNo7f1aiZmY0/s320/heatmap.png" width="320" /></a></div>
<b>Results for Dodecad Participants</b><br />
<br />
Results can be found in the <a href="https://docs.google.com/spreadsheet/ccc?key=0ArAJcY18g2GadDVVQnlkS2xKTnhPZE42R0toN3JGQ3c">spreadsheet</a>.<br />
<br />
If you have joined the Project, please consider leaving a comment in the <a href="http://dodecad.blogspot.com/2010/11/information-about-project-samples.html">Information about Project samples</a> thread. That will help others make better sense of their results, e.g., if you find that you belong in the same cluster as some other individual, you might want to know something about their origins.<br />
<br />
<b>UPDATE: </b>I have added the <a href="https://docs.google.com/open?id=0B7AJcY18g2GaMTc1YzQ5NzMtZDkxNi00MWJiLWJlODQtZjRjNDkzNzVlYzc5">IBD sharing matrix</a>. See <a href="http://dodecad.blogspot.com/2012/01/fastibd-analysis-of-afroasiatic-groups.html">here</a> on how to use it.Dodecad Projecthttp://www.blogger.com/profile/10447516703222698752noreply@blogger.com11