Thursday, March 31, 2011

Clusters Galore results, K=73 for Dodecad Project members (up to DOD581)

The results can be found in the spreadsheet. There are 73 clusters, with 13 MDS dimensions retained. The spreadsheet contains 558 rows for unrelated project participants, each giving the probability that the individual belongs to each of the 73 clusters. These are followed by 36 rows for the reference populations, showing how many individuals from each population are assigned to each cluster.
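
Membership probabilities of this kind are what a model-based (mixture) clustering produces. As a toy illustration only, the sketch below assumes a one-dimensional Gaussian mixture with made-up means, a shared standard deviation and equal weights. This is a simplification, not the actual Clusters Galore model, which works in 13 MDS dimensions. It applies Bayes' rule to get each component's probability, and it shows why an individual can be split between two clusters. When two components are close together, a point lying between them gets split probabilities. The same relative position between two well-separated components gives a near-certain assignment.

```python
import math
from typing import List


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of the normal distribution N(mean, sd**2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior(x: float, means: List[float], sd: float,
              weights: List[float]) -> List[float]:
    """Membership probabilities of x under a 1-D Gaussian mixture with a shared sd.

    Bayes' rule: P(k | x) = w_k * f_k(x) / sum_j w_j * f_j(x).
    """
    joint = [w * normal_pdf(x, m, sd) for m, w in zip(means, weights)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical components; x sits 40% of the way from the first mean to the second.
close = posterior(0.4, means=[0.0, 1.0], sd=1.0, weights=[0.5, 0.5])
far = posterior(4.0, means=[0.0, 10.0], sd=1.0, weights=[0.5, 0.5])
print([round(p, 3) for p in close])  # [0.525, 0.475]
print(far[0] > 0.9999)               # True
```

The only difference between the two calls is how far apart the component means are, which is the situation that produces mixed probabilities for closely related clusters.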

In order to interpret your results, first search for your DOD number and see which columns you have non-zero probabilities in. Most of you will be uniquely assigned (100%) to one of the 73 clusters. Then you can visit the ancestry thread to see who else is assigned to the same cluster as yourself, and also look at the reference populations to see how they are represented in the different clusters.
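
Once the participant rows are exported, both lookups (your own non-zero clusters, and who else has probability in a given cluster) take only a few lines of Python. This is a minimal sketch that assumes a hypothetical layout: a dictionary mapping each ID to its list of per-cluster percentages. Loading the spreadsheet is omitted, and the four-cluster toy table uses made-up IDs and values.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: participant ID -> membership probabilities (%), where
# list index 0 is cluster #1. Toy IDs and values, not taken from the spreadsheet.
Table = Dict[str, List[float]]


def nonzero_clusters(table: Table, pid: str) -> Dict[int, float]:
    """Clusters (1-based) in which `pid` has a non-zero probability."""
    return {k + 1: p for k, p in enumerate(table[pid]) if p > 0}


def cluster_members(table: Table, cluster: int) -> List[Tuple[str, float]]:
    """Everyone with non-zero probability in `cluster`, highest probability first."""
    members = [(pid, probs[cluster - 1]) for pid, probs in table.items()
               if probs[cluster - 1] > 0]
    # sorted() is stable, so ties keep their original table order.
    return sorted(members, key=lambda m: m[1], reverse=True)


toy: Table = {
    "P1": [0, 100, 0, 0],
    "P2": [0, 66, 34, 0],
    "P3": [0, 0, 100, 0],
    "P4": [0, 100, 0, 0],
}

print(nonzero_clusters(toy, "P2"))  # {2: 66, 3: 34}
print(cluster_members(toy, 2))      # [('P1', 100), ('P4', 100), ('P2', 66)]
```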

The following 67 IDs were characterized as outliers:
DOD002 DOD004 DOD006 DOD020 DOD029 DOD030 DOD034 DOD036 DOD060 DOD063 DOD072 DOD075 DOD081 DOD107 DOD126 DOD128 DOD132 DOD133 DOD156 DOD157 DOD168 DOD169 DOD175 DOD183 DOD201 DOD224 DOD234 DOD245 DOD252 DOD303 DOD309 DOD316 DOD326 DOD328 DOD339 DOD348 DOD349 DOD359 DOD363 DOD380 DOD382 DOD385 DOD388 DOD392 DOD393 DOD422 DOD425 DOD430 DOD435 DOD437 DOD489 DOD492 DOD495 DOD500 DOD502 DOD511 DOD521 DOD523 DOD531 DOD533 DOD536 DOD548 DOD571 DOD572 DOD573 DOD574 DOD577

As previously explained, outliers may either be mixed individuals or individuals from particular populations not well represented in the Project. In both cases they appear to be more "distant" from other individuals and from their respective clusters.
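
To make the idea of "distance" concrete, here is one simple way such outliers could be flagged in the retained MDS space. This is a hypothetical illustration, not necessarily the procedure Clusters Galore uses. It assumes each individual has coordinates (13 dimensions in the real analysis, 2 in the toy data) and a cluster label. It then flags anyone whose distance to their cluster's centroid exceeds a chosen multiple of that cluster's median distance. The threshold `factor` is an arbitrary assumption.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Set


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def flag_outliers(coords: Dict[str, Sequence[float]],
                  labels: Dict[str, int],
                  factor: float = 3.0) -> Set[str]:
    """Flag IDs farther than `factor` times their cluster's median distance
    from the cluster centroid. `factor` is an arbitrary illustrative threshold."""
    members: Dict[int, List[str]] = {}
    for pid, cluster in labels.items():
        members.setdefault(cluster, []).append(pid)

    flagged: Set[str] = set()
    for ids in members.values():
        center = centroid([coords[i] for i in ids])
        dist = {i: math.dist(coords[i], center) for i in ids}
        typical = median(dist.values())
        for i, d in dist.items():
            if typical > 0 and d > factor * typical:
                flagged.add(i)
    return flagged


# Toy 2-D coordinates (hypothetical); "E" sits far from the rest of cluster 1.
coords = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1), "E": (10, 10),
          "F": (20, 0), "G": (21, 0), "H": (20, 1)}
labels = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1, "F": 2, "G": 2, "H": 2}
print(flag_outliers(coords, labels))  # {'E'}
```

The median is used as the "typical" distance because, unlike the mean, it is not dragged upward by the very outlier being tested.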

Getting back to the 73 inferred clusters:
  • Cluster #2 is by far the largest, consisting mainly of "British Isles"/American White types of people; it grew substantially after the recent open call for submissions, when many people with this type of ancestry joined.
  • Cluster #3 is essentially Ashkenazi Jewish, another big group in the Project.
  • Cluster #5 is not represented in the reference populations except for a single Utah White. This is largely German.
  • Cluster #6 is mostly (but not exclusively) French.
  • Cluster #9 is largely Finnish and also includes some East Slavs.
  • Cluster #12 is essentially South Italian/Sicilian/Greek.
  • Cluster #14 is mostly Assyrian/Armenian.
  • Cluster #16 is mostly Balkan.
  • Cluster #21 is mostly Scandinavian.
  • Cluster #27 is mainly Balto-Slavic.
  • Cluster #29 is essentially Iberian.
I covered most of the largest clusters, but there are also plenty of smaller ones. So, make sure you contribute to and read the ancestry thread to get a feel for the kind of people who share your cluster. Many mixed-race participants (e.g., African Americans) are split across multiple clusters; I recently observed that this is the case for highly variable populations with intercontinental admixture.

Don't forget, also, that sharing a cluster does not imply very strong genetic similarity, as clusters may be either very tight or very loose. This analysis is better at identifying differences than at confirming strong similarities.
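
A toy calculation shows why shared membership can mean little in a loose cluster. This sketch uses made-up 2-D coordinates in place of the 13 MDS dimensions and measures each cluster's spread as the mean pairwise distance between its members. The cluster names and coordinates are hypothetical. Two people in the same loose cluster can end up farther apart than two people assigned to different tight clusters.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def mean_pairwise_distance(points: List[Sequence[float]]) -> float:
    """Average Euclidean distance over all pairs of points (0.0 for fewer than 2)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy clusters: "tight_a" and "tight_b" are compact, "loose" is spread out.
clusters: Dict[str, List[Sequence[float]]] = {
    "tight_a": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "tight_b": [(1.0, 0.0), (1.1, 0.0), (1.0, 0.1)],
    "loose": [(0.0, 5.0), (3.0, 5.0), (0.0, 9.0)],
}

for name, pts in clusters.items():
    print(name, round(mean_pairwise_distance(pts), 3))

# Two members of the same loose cluster vs. members of two different tight clusters.
within_loose = math.dist(clusters["loose"][0], clusters["loose"][2])
across_tight = math.dist(clusters["tight_a"][0], clusters["tight_b"][0])
print(within_loose > across_tight)  # True
```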

Readers of the blog will be aware that many of these clusters can be subdivided further if a regional analysis is carried out (e.g., the Assyrian-Armenian one), while others have proven difficult to split meaningfully (e.g., the Iberian one into Spanish vs. Portuguese).

This time around, I included all the Genomes Unzipped people as well as Lily and Greg Mendel (LIL001, GRM001).

I will be exploring these clusters further, and any further regional structure that I may discover will be posted on this blog. So, do subscribe to the feed, as there may be additional results for your sample ID.

24 comments:

  1. Ok. I know that I am an outlier (DOD349), but I still wonder: why do I cluster with South Indians, and not with, for example, Turks or even Uyghurs? Or do these results not mean anything when you are an outlier?

  2. Oops, I am sorry. I just noticed that I looked wrong. I do not cluster with South Indians/Tamils. :)

  3. Cluster 42 (4 individuals) looks like it might be Southeast Asian. Two of the individuals in it are confirmed Filipinas (300, 326); the other two individuals (448, 429) aren't listed in the ancestry thread. All 4 have some South Asian component, but this ranges from 0.8% to 10.1%.

  4. Hi Dienekes,

    Could you please urge the following users to make a short comment about their ancestry in the appropriate thread? I couldn't find the ethnic backgrounds of DOD395, 414 and 449 in the ancestry thread.

    Thanks.

  5. For reference: Cluster 44 seems to consist mostly of South Indian Brahmins, and the 15% individual is a Bengali Brahmin.

    DOD327 - Iyer Brahmin from Tamil Nadu.

    DOD331 - Iyengar Brahmin from Karnataka (a real life 7th cousin of mine/distant relative - predicted as a third to fourth cousin by 23andMe's Relative Finder).

    DOD395 - Goan Catholic of Brahmin (likely Saraswat Brahmin) descent.

    DOD414 - ?

    DOD430 - [15%] Bengali Brahmin.

    DOD449 - ?

    DOD451 - Tamil speaking Iyengar Brahmin from Madurai, Tamil Nadu

    DOD527 - Namboothiri Brahmin from Kerala, South India

    So, this is a shout out to 414 and 449 - please identify yourselves :-)!

  6. This comment has been removed by the author.

  7. DOD575 has a 66% probability of being in cluster #4 and a 33% probability of being in cluster #19.

    #19 is mostly North Italian
    #4 seems like an aggregate of different types, some Italians, some mixed Europeans. It is not as distinct. Here are the people that have some probability of belonging to it:


    JKP001 12
    DOD575 66
    DOD307 86
    DOD456 88
    DOD082 96
    DOD290 98
    DOD024 100
    DOD108 100
    DOD139 100
    DOD454 100
    DOD557 100

    You should look at the ancestry thread to see if they've posted.

  8. This comment has been removed by the author.

  9. Great work! Is a supervised analysis of the Balkans in the pipeline by any chance? Pretty please:)

  10. Great work! Is a supervised analysis of the Balkans in the pipeline by any chance? Pretty please:)

    Supervised analysis makes sense if you know which populations admixed to produce which other populations or individuals.

    In the case of the Balkans, there is no such information, and I'd wager that the Balkans have been a net population source rather than a sink since their original Neolithization, when there was a major influx from West Asia.

    So, a supervised analysis would convey the wrong impression.

  11. I get 2% probability for cluster 4. It seems to be an alpine cluster, for people who are in between north/central Europe and southern Europe (like Italy), and probably have ancestry on both sides of this divide.

  12. Might you consider a K=15 ADMIXTURE run on this enlarged set of Dodecad individuals, as you did for the populations a while back?

  13. I could identify 3 people in cluster #19 (where I belong):

    - one 100% Tuscan,
    - one 50% Central Italian, 50% Northern Italian (myself),
    - one 25% Southern Italian, 75% Northern Italian.

    The other 2 I could not identify...

  14. Dienekes, cluster 5 looks more Eastern German. Many are Germans or Brits with some Slavic ancestry. Some are partly Silesian, Russian, Polish, Slovakian, Czech, or Finnish. Two were fully Slovenian.

  15. @Karl:
    I thought something similar some months ago.

    But I recently saw a table of genetic distances, with a claim that a minimum of 100 persons per population had been used.

    That table showed pretty strong eastward connections for both North and South Germans.

    Both North and South Germans had a smaller genetic distance to Hungarians or Czechs than to the French.

    There is, of course, the question of WHEN this happened.

    That's the question.

    There are things that many ignore or don't know, such as that Germany was strongly repopulated from the Balkan peninsula after the Thirty Years' War.

  16. I notice that for many individuals, including me, the British and Scandinavian components add up to 100%. I guess this could be coming from having "new world" people of mixed ancestry, but in that case I would wonder why these two components specifically are always together.

  17. @Fanty

    "Both North and South Germans had a smaller genetic distance to Hungarians or Czechs than to the French.

    There is, of course, the question of WHEN this happened."

    I guess the answer is that Hungarians are "Half-Germans".

    In 1241-42 the Mongols reduced Hungary's towns and villages to ashes and slaughtered half the population.[7] Béla IV of Hungary repopulated the country with a wave of immigrants, transforming royal castles into towns and populating them with Germans, Italians, and Jews.[7] Hungarian kings were keen to settle Germans in the country's uninhabited territories.
    http://en.wikipedia.org/wiki/Germany%E2%80%93Hungary_relations

  18. Dienekes,

    My parents and I are forming our own cluster. You forgot to take me out for this. I'm #81, and as you can see, my parents and I are the only ones in #31.

  19. My parents and I are forming our own cluster. You forgot to take me out for this. I'm #81, and as you can see, my parents and I are the only ones in #31.

    Thanks for the tip. I reorganized my spreadsheet and forgot to mark you as a relative. You won't be forgotten next time I do this.

  20. I notice that for many individuals, including me, the British and Scandinavian components add up to 100%. I guess this could be coming from having "new world" people of mixed ancestry, but in that case I would wonder why these two components specifically are always together.

    If people get mixed probabilities in 2 components, it's usually because the two components are fairly close to each other and difficult to distinguish. This also happens in Italians, for example.

  21. @Daro:
    Yeah, maybe.

    There is also the question of how "complete" the replacement was during the "great migrations".

    Hungary was Celtic, then Germanic, then Slavic, then Magyar.

    With the exception of the Magyars, those are the same components that can be found in the Germans.

    In the table I saw, Hungary's genetic distance to Germany, Poland and the Balkans was almost equal.

    The populations most similar to Hungarians were the Austrians, followed by the Czechs.

  22. Do you think you will ever separate the South Italians/Sicilians from the Greeks?

    The two peoples, the Italians and the Greeks, have been separated for a long time.

  23. Cluster #31 (DOD392, DOD393, DOD081) is essentially Galician, a nationality with its own language and culture, so it's interesting how finely this analysis can discriminate, down to the level of specific peoples in Iberia.
    I have a question for you, Dienekes: why is this population not in the Iberian cluster? Which components differ? Maybe Basque? North European? North African?
    Thanks!
