Monday, January 17, 2011

Clusters Galore results, K=64 for Dodecad Project members (up to DOD307)

The results are found in the spreadsheet. 64 clusters were inferred with 12 MDS dimensions retained. Check the previous K=56 analysis and links therein to familiarize yourself with this ancestry inference technique.

The results table consists of 271 rows for non-related Project members with 23andMe data, each of which contains the probabilities that they belong to one of the 64 clusters. This is followed by a table in which the number of individuals from each reference population assigned to each cluster is shown.

The increase in the number of individuals compared to the previous analysis (up to DOD236) has resulted in the inference of a greater number of clusters. Once again, I recommend that people leave a comment in the ancestry thread, because that is the only way in which you will find out where people in your cluster are from.

I did not fully inspect all the clusters to see what each of them corresponds to. Some observations:
  • #8 seems to be the mainly Greek/South Italian cluster previously detected
  • #3 seems to be a Finno-Ugrian cluster
  • Most Dodecad Project members belong to cluster #12 which corresponds to White Utahns in the reference populations
  • #12 seems to be the main Iberian cluster
The following IDs appear as outliers:
DOD004 DOD029 DOD030 DOD032 DOD033 DOD034 DOD036 DOD047 DOD060 DOD072 DOD088 DOD119 DOD126 DOD128 DOD132 DOD156 DOD157 DOD168 DOD169 DOD175 DOD186 DOD196 DOD240 DOD245 DOD251
You can also see where you fall on the first 2 MDS dimensions. Your co-ordinates can be found in this spreadsheet.


  1. Despite having the coordinates for MDS Dimensions 1 and 2, I cannot find myself on the graph. If the x and y axes had more ticks, it would be easier. Any tips on how to find myself on the graph?

  2. You can print it and grab your ruler.
    Or you can use an image viewer and count pixels.
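Counting pixels amounts to linear interpolation between two labelled ticks on each axis. A minimal sketch, assuming you have read off the pixel positions of two ticks per axis in an image viewer; the tick values and pixel positions below are made up for illustration and are not taken from the actual plot:

```python
def data_to_pixel(value, tick_a, tick_b):
    """Linearly map a data value to a pixel position.

    tick_a and tick_b are (data_value, pixel_position) pairs read off
    two labelled ticks on the same axis.
    """
    (d0, p0), (d1, p1) = tick_a, tick_b
    return p0 + (value - d0) * (p1 - p0) / (d1 - d0)


# Hypothetical tick readings; replace them with the ones from your copy of the plot.
x_ticks = ((-0.05, 100.0), (0.05, 700.0))  # x axis: data value -> pixel column
y_ticks = ((-0.05, 500.0), (0.05, 100.0))  # y axis: pixel rows grow downward


def locate(x, y):
    """Return the (column, row) pixel position for MDS coordinates (x, y)."""
    return data_to_pixel(x, *x_ticks), data_to_pixel(y, *y_ticks)
```

Calling `locate` with your two coordinates from the spreadsheet gives the pixel column and row to look at. Note that the y mapping is inverted because image rows are usually counted from the top.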

  3. Strangely I do not seem to be included in either analysis? (DD301)

  4. You're right, you should have been included; I keep a separate spreadsheet of unrelated individuals and I forgot to add you to it. Given your background I'd be very surprised if you weren't in the "White Utahn" cluster, however. I guess you'll have to wait for the next iteration...

  5. What population group is #7 in the spreadsheet?

  6. I can see the limitations on using so many clusters: this result admits admixture between up to two clusters, not more. Thus, a person who is .5 South Italian .25 Polish and .25 North Italian winds up with a 100% assignment to the Romanian cluster.

    This must be a limitation of computational power. When K=10, we got broken out into as many clusters as were needed, to a precision level of .1%.

    I was already amazed that you could compute Maximum Likelihood Admixture Estimates with 8 components! Too bad you can't rent out one of the petascale computing facilities...yet!

  7. Just an observation on the results for the reference populations: some reference populations, despite small sample sizes (fewer than 26 individuals), break up into quite a few clusters, whereas some reference groups not considered to be isolated, like the Tuscan Italians, form a single cluster. The Saudi Arabians form five clusters with just 20 individuals in the reference sample.

    I would assume that the more clusters per reference group, the more recently admixed that reference group happens to be and the less homogeneous overall. The Tuscan Italians, 25 in number, form one cluster and indicate that the Tuscan Italians are a homogeneous group and any admixture within the Tuscan Italians is ancient. Colonial Utah Mormons form three clusters showing their ancestry in North America is from a number of sources and recent, which of course tallies with history.

  8. "What population group is #7 in the spreadsheet? "

    I am in that cluster too.

    The ones who typed in their heritage in the Ancestry thread:

    5 Germans (with a tendency towards East Germans or Prussians from the now-lost "Eastern Territories" (Silesia etc.))

    1 German/Polish/Belorussian
    1 German/Serbian/Slovak
    1 American with mainly German and English heritage.

    2 Slovenians
    2 Norse

    1 Irish

    Some people who belong to 2 or more clusters are in this cluster combined with the White Utahns cluster.

    1 Norse, 1 British and one German or so are in Clusters 12 and 7 at the same time.

    No other multiclustering of Cluster 7.

  9. Judging from what you have written down cluster 7 seems to represent Germans. The Slovenians, and I don't mean any disrespect, are essentially Slavic speaking Austrians. Norway has had German trade colonies. The Irish person is the odd one out. Interesting for that person as far as genealogy goes, and ancestry.

  10. "Judging from what you have written down cluster 7 seems to represent Germans."

    It's also the second largest cluster. (32 of 271 Dodecad members, so it's quite a large phenomenon)

    But I wonder about:

    1. How many Germans are in the project atm? How many are in Cluster 7?

    2. Is it chance or does it have a meaning that so many of those 5 Germans have a connection to the east of the Reich?


    The 5 Germans:

    1. North German, with a pinch of Danish. And a Thuringian great-grandparent.

    2. German with ancestry in Thuringia, Saxony and Silesia.

    3. German with ancestry in Silesia.

    4. German with ancestry in German-speaking minorities in Eastern Europe who originated from Prussia.

    5. German with ancestry in Westphalia, Silesia and East Prussia and a pinch of Lithuanian.

    That's 3 out of 5 that mention "Silesia". 2 mention "Thuringia".

    What I wonder is, does all of Germany fit into cluster 7 or does Germany split up?


    "The Slovenians, and I don't mean any disrespect, are essentially Slavic speaking Austrians."

    I have no doubt that Austrians are in cluster 7.
    Austria has a similar background: Germanic tribes in antiquity, Slavic tribes in the early Middle Ages. German conquest and colonisation in the High Middle Ages.
    Heartland of the two most powerful German-speaking empires (Prussia and Austria) in the 17th and 19th centuries in their struggle over who dominates central Europe.

    I have no doubt about Austria being Cluster 7. But maybe even Czechia and Hungary?

    Is Cluster 7 really "German" at the moment or, is it "Central European"?

  11. I have been looking at the reference groups main results and comparing it to the project participants, and any ancestry information.

    It appears that those people with suspected Jewish grandparents have not materialized in the results. Two mixed project members with Jewish ancestry, one 50%, the other more complex, have ended up in one of the minor Ashkenazim Jewish columns, column 6 at 100%. One South Italian_Sicilian project member, my assumption due to membership of column 8, has minor Sephardi Jewish ancestry at 3% in column 17. Some project members end up in inappropriate columns when they are mixed. In one case, Sicilian and North Euro American is with the Romanians, and an English European and Sephardi Jewish mix is with the Tuscans. It appears that the results obtained for MCLUST work well provided the person is essentially of one type of ethnic group origin.

  12. "One South Italian_Sicilian project member, my assumption due to membership of column 8, has minor Sephardi Jewish ancestry at 3% in column 17."

    As I said before, these are probabilities not admixture proportions.
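To illustrate the difference with a toy example: the sketch below is not the actual MCLUST fit, and the cluster centres, common spread, and equal cluster weights are all made-up assumptions. It places three clusters along a single dimension and computes membership probabilities for an individual whose position is the average of the two outer clusters.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, means, sd=0.3):
    """Posterior probability of each cluster for position x (equal priors)."""
    densities = [gaussian_pdf(x, m, sd) for m in means]
    total = sum(densities)
    return [d / total for d in densities]


# Hypothetical cluster centres along a single MDS dimension.
means = [0.0, 1.0, 2.0]

# An individual with half their ancestry from cluster 0 and half from
# cluster 2 sits roughly midway between them, i.e. on top of cluster 1.
admixed_position = 0.5 * means[0] + 0.5 * means[2]
probs = membership_probabilities(admixed_position, means)
```

Here `probs` puts almost all of the probability on the middle cluster, even though the individual has no ancestry from it. The numbers say how likely it is that the individual belongs to each cluster, not what fraction of their ancestry comes from it.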

  13. Fanty,
    I notice that other Cluster 7 Germans mention Saxon and Transylvanian Saxon (DOD250), and that both Thuringia and Silesia border Saxony (Sachsen), so maybe this is Saxon - and that's why it shows up partly in some British and Irish in small portions??

  14. Update:

    I missed a Swede in Cluster 7.

    It's rather annoying that quite a lot of people did not participate in the Ancestry thread.

    Of those who did, 2 out of 3 Norse are Cluster 7 and one is Cluster 12.

    2 Swedes in the Ancestry thread. 1 is Cluster 5 and one is Cluster 7.

    ALL Germans of the Ancestry thread are Cluster 7.

    At the moment one gets the impression of Cluster 7 being something like "Non-British Germanic".

    Cluster 5 (only a few in the Ancestry thread) is the Belorussians, Lithuanians and the DOD-Poles. As well as 1 (of 2) Swedes.

    Hmm, alright, did a map of Europe with some of those clusters as they appear atm:

  15. "Saxony (Sachsen), so maybe this is Saxon - and that's why it shows up partly in some British and Irish in small portions??"

    Modern-day "Saxony" is not the uhm "real" Saxony.

    The medieval "Saxony" is in northern Germany. The territories of what are now called "Lower Saxony" and "Westphalia" are the territory of the Saxon tribe.

    During the Middle Ages, the Saxons expanded their territory into eastern Germany, and the dukes of Saxony became quite powerful and frequently rebelled against the "King of the Germans".

    Saxony was then broken up into smaller parts to break its power.

    Here we have an old map of 10th century Germany:

    Here, East Germany is already a "Mark". It was settled by Slavic tribes who frequently raided the Empire, and to fix this, it was annexed.

    Basically by the duchy that had all the raiding trouble: Saxony.
    It's known that Saxon dukes built lots of cities in that territory. But it was long debated how many Slavs had been assimilated and how many Saxon settlers had come.

    Actually, these annexed territories have the largest R1a share in Germany: 25-30%
    On the other hand, North Germany (Lower Saxony ---> original Saxony!) is second with 20% R1a (compare: Bavaria 15%, Württemberg 10%, Rhineland only 5%).

    But if you look at my last post, I now believe Cluster 7 to be more of a "Germanic" Cluster than a "German".

    With 2 out of 3 Ancestry thread Norse and 1 out of 2 Ancestry thread Swedes in Cluster 7, I now imagine that cluster more like in the picture I have made.

  16. Fanty,

    Great maps - I agree with you, Cluster 7 appears to be Germanic.
    So its slight presence in Ireland could be the result of the Palatine Germans, who had a number of settlements in Ireland:

  17. Interestingly enough, I define my own cluster at K=22 (I'm DOD133, of full Gascon ancestry): to be more precise, I'm joined by another Frenchman from the Lyons sample, who must be that French dot with Basque tendencies appearing in most plots using that free-to-use French sample from Lyons (amongst them 23andme). One can fairly deduce that this man from Lyons is either Gascon (a rather peripheral one) or a recent mix of Basque/Gascon ancestry with a more mainstream French one (which might give similar results). This shows that the Lyons sample should not be used without further data about who the sampled individuals are: the geneticists who sampled these people in Lyons did not select locals (note that they acknowledge it), and that's quite unscientific of them.

    These results show that Gascon people can be differentiated from Basque people (that's not surprising), at least from these sampled French Basques (about whom not much is known: from which valley do they originate? I mean, there's a whole world between the deep valleys of Lower Navarre and the plain of Soule). I'm really thinking of doing something to do justice to French genetic diversity; geneticists seem to be unable to realize that selecting random people in France's 2nd biggest town is not fair and is rather uninteresting. I'm in contact with people interested in doing genealogical research in order to sample fully autochthonous people from one area.

    Samples originate from this foundation : I contacted them so that they tell me more of the geographic origin of sampled people, they did not give an answer (for Lyons, it'll be difficult but they could at least indicate from where the French Basques originate).

  18. It seems clear on the map that, even if Jews are descended from ancient Hebrews, which is not proven yet, they now have so much non-Middle-Eastern admixture that they can no longer be considered as Jews. Only individuals of North African Jewish ancestry, perhaps (there seem to be 3 individuals, 004, 053 and 216, on the maps), are quite close to current Middle Easterners.

  19. How does the DOD project differ from PCA analysis? Do both use tagged SNPs? Interestingly, my dad, who is 100% southern Italian, falls in the expected column 8 in the Dodecad Project analysis; however, in many PCA analyses, he clusters with the Sephardic Jews, away from other southern Italians/Greeks. Why the discrepancy?

    Would Dodecad results with K > 64 help give more insight?

  20. PCA analyses of the kind you probably refer to use only 2-dimensional projections of the data that can be plotted on a surface. The "Clusters Galore" approach detects clusters at multiple dimensions which cannot be directly visualized (as humans can only look at 3 dimensions).
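A toy illustration of why this matters, using made-up coordinates for two hypothetical individuals (not real project data): they nearly coincide on the first two dimensions, so a 2-D plot shows them as neighbours, yet they are far apart once a third dimension is taken into account.

```python
import math


def distance(a, b, dims=None):
    """Euclidean distance using the first `dims` coordinates (all if None)."""
    if dims is not None:
        a, b = a[:dims], b[:dims]
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical MDS coordinates: the two individuals overlap on dimensions
# 1 and 2 and differ only on dimension 3.
person_a = (0.010, -0.020, 0.030)
person_b = (0.011, -0.019, -0.030)

d2 = distance(person_a, person_b, dims=2)  # what a 2-D plot shows
d3 = distance(person_a, person_b)          # using all three dimensions
```

In this sketch `d3` is many times larger than `d2`. A clustering method that works on all retained dimensions can therefore separate individuals who look identical on a two-dimensional plot.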

  21. Wow, swissgirl, there is finally a Swiss person! What part and language is your heritage from? What were your results like?

    I'm hoping that eventually there will be enough Swiss (Ger. especially) for enough fineness to "type" them. I wonder what they'll be.

    Anyways, how did you hear about the project, and do you know any more Swiss who have done the DNA tests?

  22. I am DOD 307 (see Ancestry Thread)

  23. I see, you're 99% in group 32 (all alone, lol) and 1% in group 21 where there are two people. Weird how you're part of two groups with practically nobody in each. Maybe the eastern Swiss were isolated, or maybe there aren't enough southern Germans, western Austrians, etc.

    When I looked at the ancestry thread dod60 and dod175, both in 21, weren't even shown so I have nothing to compare to.

    I can't use the Google link to see where you plot (-.0242, .0119). Could you tell me what it looks like?

  24. The coordinates place me in the region of "French" and "White Utahns".

    Yes, it would be interesting to know more about the ancestry of DOD60, DOD175, and DOD220. Not everyone posts that info in the Ancestry Thread. Too bad.

    I learned about the Dodecad project through the 23andme Spittoon blog where Dienekes placed a specific call for 23andme data from Swiss people.

  25. "I learned about the Dodecad project through the 23andme Spittoon blog where Dienekes placed a specific call for 23andme data from Swiss people."

    Hahaha, you mean "Ted"? That was actually me. I can't believe it actually worked. I only did it at one other place but then got lazy :P

    "The coordinates place me in the region of "French" and "White Utahns"."

    Oh, that's kinda disappointing. I was hoping it would be German/central European as opposed to (French, English, German)/NW European.

    We should spread the word so that we can get a Swiss group on Dienekes' chart.

  26. I'm in #11. Looks like a Romanian cluster if I read the sample sizes correctly.
    (ancestry thread -- DOD236).

  27. "I'm in #11. Looks like a Romanian cluster if I read the sample sizes correctly. "

    No. It's more like an "All Balkans Cluster", since not only the Romanians cluster in 11, but also the Bulgars, Serbs and Croats among the Dodecad members.

    I made this rough map after looking everyone up in the Ancestry thread:

    (Posted the map above already)