Tuesday, January 25, 2011

23andMe v3 data

Second, I had the opportunity to look at the v3 data as reported by 23andMe. By my count, the v3 platform contains 936,845 markers, 168,554 of which are shared with the ~177k subset I normally use for 23andMe v2 data. Thus, while most of the SNPs in that v2 subset are part of the v3 platform, some are not. I do not anticipate any problems integrating v3 data into the Project's existing database.
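Counting the markers shared between two platforms is a set intersection on marker IDs. The sketch below assumes the usual 23andMe raw-data layout (tab-separated rsid, chromosome, position, genotype, with `#` comment lines). The function names and the toy data are hypothetical and only illustrate the idea.

```python
def read_rsids(lines):
    """Collect marker IDs from 23andMe-style raw data lines.

    Assumes tab-separated columns: rsid, chromosome, position, genotype,
    with header/comment lines starting with '#'.
    """
    rsids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        rsids.add(line.split("\t")[0])
    return rsids


def platform_overlap(v3_lines, v2_subset_ids):
    """Return (number of markers on the new platform, number shared with the v2 subset)."""
    v3_ids = read_rsids(v3_lines)
    shared = v3_ids & set(v2_subset_ids)
    return len(v3_ids), len(shared)


# Toy example: three markers, two of which appear in a hypothetical v2 subset.
v3_example = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t100\tAA",
    "rs2\t1\t200\tAG",
    "rs3\t2\t300\tCC",
]
print(platform_overlap(v3_example, {"rs2", "rs3", "rs9"}))
```

Applied to real files, the same two numbers correspond to the marker count and overlap quoted above.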

Monday, January 17, 2011

Clusters Galore results, K=64 for Dodecad Project members (up to DOD307)

The results are found in the spreadsheet. Sixty-four clusters were inferred, with 12 MDS dimensions retained. Check the previous K=56 analysis and the links therein to familiarize yourself with this ancestry inference technique.

The results table consists of 271 rows, one for each unrelated Project member with 23andMe data, giving the probability that the individual belongs to each of the 64 clusters. This is followed by a table showing how many individuals from each reference population are assigned to each cluster.
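To show how the two tables relate, here is a minimal sketch that assigns each individual to their most probable cluster and then counts assignments per reference population. The hard "most probable cluster" rule, the function names, and the population labels `Pop_A`/`Pop_B` are assumptions for illustration; the post does not specify exactly how the counts table is derived.

```python
from collections import Counter, defaultdict


def assign_clusters(probabilities):
    """Map each individual to the cluster with the highest membership probability.

    probabilities: dict of individual ID -> list of per-cluster probabilities.
    Clusters are numbered from 1, matching the results table's #1, #2, ...
    """
    return {
        ind: max(range(len(p)), key=lambda k: p[k]) + 1
        for ind, p in probabilities.items()
    }


def counts_by_population(assignments, population_of):
    """Count how many individuals of each population fall into each cluster."""
    table = defaultdict(Counter)
    for ind, cluster in assignments.items():
        table[population_of[ind]][cluster] += 1
    return table


# Hypothetical three-cluster example.
probs = {
    "A1": [0.90, 0.10, 0.00],
    "A2": [0.20, 0.70, 0.10],
    "B1": [0.05, 0.15, 0.80],
}
pops = {"A1": "Pop_A", "A2": "Pop_A", "B1": "Pop_B"}
assignments = assign_clusters(probs)
table = counts_by_population(assignments, pops)
print(assignments)
print(dict(table))
```

A population split across several columns of the counts table is one whose members are distributed over more than one cluster.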

The increase in the number of individuals compared to the previous analysis (up to DOD236) has allowed a greater number of clusters to be inferred. Once again, I recommend that you leave a comment in the ancestry thread, since that is the only way to find out where the other people in your cluster are from.

I did not fully inspect all the clusters to see what each of them corresponds to. Some observations:
  • #8 seems to be the mainly Greek/South Italian cluster previously detected
  • #3 seems to be a Finno-Ugrian cluster
  • Most Dodecad Project members belong to cluster #12 which corresponds to White Utahns in the reference populations
  • #12 seems to be the main Iberian cluster
The following IDs appear as outliers:
DOD004 DOD029 DOD030 DOD032 DOD033 DOD034 DOD036 DOD047 DOD060 DOD072 DOD088 DOD119 DOD126 DOD128 DOD132 DOD156 DOD157 DOD168 DOD169 DOD175 DOD186 DOD196 DOD240 DOD245 DOD251
You can also see where you fall on the first 2 MDS dimensions. Your co-ordinates can be found in this spreadsheet.
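For readers curious where MDS co-ordinates come from, below is a sketch of classical (Torgerson) multidimensional scaling. It assumes a precomputed matrix of pairwise genetic distances between individuals; which distance measure is used, and the name `classical_mds`, are assumptions here rather than a description of the exact pipeline behind the spreadsheet.

```python
import numpy as np


def classical_mds(distances, dims=2):
    """Classical (Torgerson) MDS: embed points so that Euclidean distances
    between the returned co-ordinates approximate the input distances."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:dims]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Three points lying on a line at positions 0, 3 and 7.
dist = [[0.0, 3.0, 7.0],
        [3.0, 0.0, 4.0],
        [7.0, 4.0, 0.0]]
coords = classical_mds(dist, dims=1)
print(coords)
```

The first dimension captures the most variation in the distances and the second the next most, which is why the first 2 dimensions are the natural ones to plot.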

Saturday, January 15, 2011

Results for DOD299 to DOD307 posted

These are the final results from the recent open-ended submission opportunity, which has now closed. Feel free to add some information about your ancestry in the relevant thread.

NOTE: The images are now fixed.

Admixture proportions can be found in the spreadsheet

All populations:

Individual bars:

Friday, January 14, 2011

Tuesday, January 11, 2011

Results for FFD056 to FFD061 posted

Note that I am not accepting Family Finder data at this time; these samples have accumulated since the last submission opportunity. Feel free to follow the blog to be alerted to new submission opportunities, and if you have received your results, please take the time to leave a comment in the ancestry thread.

Admixture proportions can be found in the spreadsheet

All populations:

Individual bars:

Saturday, January 8, 2011

ADMIXTURE analysis with Dodecad Populations (update #2)

Thanks to all the participants of the Project, the number of populations has increased, and so have sample sizes within pre-existing populations in the Project. There are now 17 populations with at least 5 individuals in the Project:
Assyrian, Scandinavian, Greek, Finnish, S_Italian_Sicilian, Ashkenazi, German, Indian, Portuguese, Armenian, Russian, Spanish, British, Irish, Turkish, N_Italian, Balkans
Below are the K=10 ADMIXTURE results with these populations:

Admixture proportions can be found in the spreadsheet.

The fact that adding 17 populations and 143 individuals to the core set of 36 populations and 692 individuals yields the same 10 ancestral components testifies to the stability of this solution. Hopefully, during 2011 I will develop an even better comparison set to work with.

Another test of the validity of the analysis is to compare independent samples of the same populations:
Ashkenazi, Armenian, Spanish, Turkish, N_Italian
For each of the above, I have both a sample of Dodecad Project members and a published population sample. One way to measure the concordance between the two is to calculate the correlation coefficient (rounded to three decimal places):
  • Ashkenazi Jews: 0.999
  • Armenians: 0.988
  • Spanish: 0.998
  • Turkish: 0.995
  • N_Italian: 0.996
The concordance is remarkable.
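As a sketch of how such a figure can be computed, the code below implements the Pearson correlation coefficient. It assumes that each sample is summarized by its average proportion of each ancestral component, which is an assumption about what was correlated; the three-component numbers are made up purely for illustration and are not Project data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative average component proportions for two samples of one population.
sample_a = [0.50, 0.30, 0.20]
sample_b = [0.48, 0.33, 0.19]
r = round(pearson(sample_a, sample_b), 3)  # rounded to three decimal places
print(r)
```

A value close to 1 means the two samples have nearly the same profile across components.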

I have also made a RAR archive of "population portraits". These are important for determining whether minor ancestral components represent population-wide phenomena or are limited to a few individuals.

For example, here are the Turks of the Dodecad project:
The sample is a bit more varied than the one included in Behar et al.:
This probably underscores the importance of broad coverage of large countries and ethnic groups, as I recently discovered in my analysis of 9 different populations of Pakistan.

Another new population is the Irish, which presents a picture of remarkable homogeneity:
Here is the population portrait for the Balkans, which consists of non-Greek, non-Roma inhabitants of the Balkans:
This appears quite varied; hopefully more Balkan project participants will allow me to split this into additional sample populations.

Finally, here is a portrait of the Ashkenazi population, which appears quite similar to that of Behar et al.:
A very interesting feature of this population is the presence of small slices of "East Asian" and "Northeast Asian" components, totalling about 1.5%, in almost all individuals. In my opinion, this testifies to some type of old, minor absorption, as the components are fairly evenly spread across the population.
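A population-wide component and one carried by just a few people can have the same sample average, so the average alone does not tell them apart. The sketch below reports both the mean and the share of individuals carrying a component above a threshold. The 0.5% threshold, the function name, and the toy numbers are hypothetical choices for illustration.

```python
def component_spread(proportions, threshold=0.005):
    """Summarize how one ancestral component is distributed within a sample.

    proportions: that component's fraction (0..1) for each individual.
    threshold: hypothetical minimum fraction for counting someone as a carrier.
    Returns (mean proportion, share of individuals above the threshold).
    """
    n = len(proportions)
    mean = sum(proportions) / n
    carriers = sum(1 for p in proportions if p > threshold) / n
    return mean, carriers


# Same ~1.5% mean, very different patterns.
even = component_spread([0.015] * 10)                # small slice in everyone
concentrated = component_spread([0.15] + [0.0] * 9)  # one person carries it all
print(even, concentrated)
```

In the first case every individual carries the component, consistent with an old event diluted across the population; in the second, a single individual accounts for the whole average.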

Friday, January 7, 2011

Results for DOD288 to DOD298 posted

Admixture proportions can be found in the spreadsheet

All populations:

Individual bars: