Tuesday, January 17, 2012

fastIBD analysis of Iberia, France, Italy, Balkans, Anatolia and European Jews

Following the previous analysis of Balkans/West Asia, here is a new experiment on a different set of populations. Please refer to the earlier post for some thoughts and explanations about this type of analysis; I'll stick to "just the data" for this post.

Clusters Galore




24 clusters inferred with 17 MDS dimensions.

The Galore analysis provides increased resolution within Iberia (#6-9, 11), Italy, and the Ashkenazi Jewish group (#14-16).

The Iberian results are particularly interesting, showing the power of this approach compared to the one with unlinked data. There appear to be:

  • a Spanish Basque cluster (#6), 
  • a French Basque cluster (#11), 
  • a Portuguese/Galician/Castilla y León cluster (#9), 
  • a complementary Castilla-La Mancha/Cantabria/Andalucía/Murcia cluster (#7), and 
  • a smaller Aragón/Cataluña cluster (#8). 
There is overlap between these clusters, but the geographical contrasts are quite evident. I did not go through the results of Spanish Project participants (all the Portuguese fall in the Galician cluster, and our Basque member in the Basque cluster, as expected), so it would be interesting to hear whether they fall in the cluster(s) found in their regions of origin.

Inter-Population IBD




Results for Project Participants


The results can be found in the spreadsheet.

35 comments:

  1. A good correlation of Iberian clusters with languages:

    Western: Galician-Portuguese, Astur-Leonese
    Central: Castilian, the Spanish core
    Eastern: Catalan-Aragonese

    http://en.wikipedia.org/wiki/File:Sprachen_auf_der_Iberischen_Halbinsel.jpg

  2. I have yet to understand why Armenian_D (though it is not clear if they are Anatolian) were included, while two Anatolian Kurdish samples of Kurd_D were not.

  3. Hello Dienekes, I'm DOD074 - northwestern Portugal - and I have a question.


    How should we read the personal IBD scores? Cantabria is my top score, but I'm wondering why not Portugal, Galicia or even Asturias.


    Here's the top list:

    Cantabria 1.76
    French Basques 1.48
    Castilla y León 1.09
    Portuguese 1.05
    Cataluña 0.95
    Extremadura 0.93
    French_D 0.92
    País_Vasco 0.9
    Murcia 0.87
    Valencia 0.76

  4. >> I have yet to understand why Armenian_D (though it is not clear if they are Anatolian) were included, while two Anatolian Kurdish samples of Kurd_D were not.

    Dodecad populations are included (or not) in full in the different experiments. The criterion for the Armenian_D and Kurd_D populations is ethnic, not geographical. Not all populations will be included in all experiments, and the purpose of this experiment was to study Iberians, French, and Italians, but I threw a few more populations in there. The previous experiment focused on the Balkans and West Asia, and hence Kurds and a variety of other West Asian populations were included.

  5. >> How should we read the personal IBD scores. Cantabria is my top score but I'm wondering why not Portugal, Galicia or even Asturias.

    Your highest IBD scores will not necessarily be with the Portuguese. If that were the case (all Portuguese always having their highest IBD scores with other Portuguese), then the Portuguese would form their own distinctive cluster, but this is clearly not the case. Rather, the Portuguese, Galicians, etc. form a "West Iberian" (#9) cluster to which you also belong.

    If you look at the heatmap and the Portuguese, you will notice a fairly medium red in the Portuguese/Portuguese square along the diagonal, which indicates that the Portuguese do not share an incredible amount of IBD with each other. On average they do share a lot of IBD with each other (and hence the square is red), but not as much as, e.g., the Sardinians, whose square is dark red (hence a lot of intra-Sardinian IBD), and who as a result also form their own unique clusters in the Clusters Galore analysis.

  6. Yes, I already knew that the top scores in IBD and IBS analyses are usually with ethnicities other than your own, e.g. French Basques, Sardinians, British, etc., although in this fastIBD analysis I was expecting something more groundbreaking: patterns emerging, or something that could tell us more about the peopling of Iberia than a west/east divide.


    I'm sorry if I sound disappointed, which I'm not, I'm just eager to connect - or disconnect - these genetic conclusions with historical processes.

  7. I would like to see, at some point, a detailed analysis of the Greek sample, if that is possible. It would be interesting to see the differences between regions, e.g. Asia Minor vs. mainland Greece, or north vs. south, if of course the sample is appropriate.

  8. Quite a few of the Project participants have ancestry from multiple places, so it's not easy to do a regional analysis at the moment.

  9. Dienekes, this looks fine. I would like to see a fastIBD analysis for the whole of Europe.

  10. Very interesting experiment, D. What time frame are we dealing with in regards to IBD segment sharing? Is it possible to estimate the time and age of the shared ancestry?

  11. The time frame depends on the length of the shared segments. My software currently keeps no statistics about that, but in general with as many SNPs as are currently used, it should be good for a few thousand years.
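
As a rough illustration of why segment length tracks time depth: under the standard approximation that a segment inherited from a common ancestor g generations back has passed through about 2g meioses, its expected length is roughly 100/(2g) cM. The sketch below only tabulates that approximation. The 28-year generation time and the function names are assumptions made for illustration; they are not part of the fastIBD software or of this analysis.

```python
# Back-of-the-envelope relation between IBD segment length and time depth.
# Assumption: expected segment length ~ 100 / (2 * g) cM for a common
# ancestor g generations back (about 2 * g meioses separate two descendants).

YEARS_PER_GENERATION = 28  # assumed value, for illustration only


def expected_segment_cm(generations: float) -> float:
    """Expected IBD segment length in cM for an ancestor `generations` back."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)


def generations_for_length(segment_cm: float) -> float:
    """Invert the approximation: typical time depth for a segment of this length."""
    if segment_cm <= 0:
        raise ValueError("segment_cm must be positive")
    return 100.0 / (2.0 * segment_cm)


if __name__ == "__main__":
    for cm in (10.0, 5.0, 2.0, 1.0, 0.5):
        g = generations_for_length(cm)
        years = g * YEARS_PER_GENERATION
        print(f"{cm:>4} cM -> about {g:.0f} generations (about {years:.0f} years)")
```

Individual segments scatter widely around this expectation, so the table is only a guide to the order of magnitude. Shorter detectable segments reach further back, which is why SNP density matters for the time frame.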

  12. Is the West Iberian cluster (Galician-Portuguese) more akin to the French cluster?
    This is not congruent with the geography, but the dendrogram seems to suggest it. A signal of Celtic or Germanic migrations?

  13. "Dodecad populations are included (or not) in full in the different experiments. The criterion for the Armenian_D and Kurd_D populations is ethnic"

    Geography is important for ethnicity; otherwise Turkish samples should not be included either, because their ethnic origin is far away from Anatolia.
    I can't remember having come across anything that would make the Armenians ethnically more Anatolian than the Kurds. More than half of the Armenian samples are from the Caucasus and not Anatolia, so I don't see what could make the Armenians geographically or ethnically "more" Anatolian. And as I have mentioned multiple times, unfortunately your samples are strongly biased towards Eastern Kurds: 1 sample is fully Anatolian Kurd and the other from the East or the Iraq-Iranian border. I would have participated, but 23andMe only accepts credit cards.

    Don't take my words as criticism of your work on genetics; I just fear there might be some politics in it too.

  14. @Dienekes

    You forgot to include Andalusians in the Individual IBD Z-Score

  15. @Kurti, this is not a problem of who is more or less Anatolian. As I said, populations are primarily based on ethnicity and not geography, and in this experiment, a couple of populations (Turks and Armenians) who are primarily Anatolian (although some Turks are from the Balkans and some Armenians from beyond Anatolia) were included. It is not an insult to the "Anatolicity" of Anatolian Kurds, Assyrians, Jews, and anyone else if they are not included in every single experiment.

    >> You forgot to include Andalusians in the Individual IBD Z-Score

    See previous post. Only populations with 5+ individuals are included in the heatmap and columns of the IBD Z-score table.

  16. Can it be said that a positive score confirms and a negative score potentially refutes common ancestry in the last few thousand years?

  17. No. These scores are averages, so there may be IBD sharing (and hence common ancestry) even with a population that has a negative column score.

    They do mean that higher IBD Z-scores indicate more IBD sharing (on average across all individuals of that population, and in cM).
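
To illustrate the point about averages with made-up numbers (none of these values come from the spreadsheet): an individual can share a long segment with one member of a population while the mean over all members of that population stays lower than the mean for another population. The population names and cM values below are hypothetical.

```python
from statistics import mean

# Hypothetical cM of IBD shared between one participant and each member
# of two reference populations. These numbers are invented for illustration.
sharing_cm = {
    "pop_A": [0.0, 0.0, 12.0, 0.0, 0.0, 0.0],  # one strong match, nothing else
    "pop_B": [3.0, 2.5, 3.5, 3.0, 2.0, 3.0],   # modest sharing with everyone
}


def summarize(values):
    """Return the mean, the maximum, and the number of members with any sharing."""
    return {
        "mean": mean(values),
        "max": max(values),
        "n_sharing": sum(1 for v in values if v > 0),
    }


summary = {pop: summarize(values) for pop, values in sharing_cm.items()}

if __name__ == "__main__":
    for pop, stats in summary.items():
        print(pop, stats)
```

Here pop_A has the lower average even though it contains the single longest shared segment, so a low or negative population-level score does not rule out common ancestry with particular individuals.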

  18. Thanks! Is there, by any chance, a way to report the results in terms of segment size (cM)?

  19. I'm included in the Balkans/West Asia test and I noticed some rather big differences in some of the IBD Z-scores. My IBD Z-score for Greek_D in this test is 0.65, while in the Balkans/West Asia test it is 1.22. Is that normal?

  20. It's normal as these are z scores, not raw numbers.
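
A toy example of why the same raw sharing can yield different Z-scores in different runs. It assumes a Z-score is the raw value minus the mean of the values in that run, divided by their standard deviation; the exact standardization used for the spreadsheet may differ, and all numbers below are invented.

```python
from statistics import mean, pstdev


def z_score(value, reference):
    """Standardize `value` against the mean and population SD of `reference`."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        raise ValueError("reference values have zero variance")
    return (value - mu) / sigma


# Hypothetical raw average sharing (cM) of one participant with one population.
raw_value = 4.0

# Hypothetical raw values for the populations included in two different runs.
# Both runs contain the same raw_value, but the other entries differ.
run_one = [4.0, 5.0, 3.5, 3.0, 2.5]
run_two = [4.0, 2.0, 1.5, 2.5, 1.0]

z_one = z_score(raw_value, run_one)
z_two = z_score(raw_value, run_two)

if __name__ == "__main__":
    print(f"run one: z = {z_one:.2f}")
    print(f"run two: z = {z_two:.2f}")
```

The raw value is identical in both runs, but it sits closer to the average of the first set than of the second, so its Z-score is smaller there. Changing the set of populations in an experiment shifts every Z-score in the same way.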

  21. Would it be possible for you to convert the fastIBD Z-scores into a PCA plot? I've seen it done once on an IBS analysis run by Davidski and it came out great.

  22. Do we have any idea of the source of structure within the Ashkenazi population? As a participant (DOD335), I don't recall including details of my ancestry.

  23. I don't know enough about Ashkenazi history to correlate what participants told me of their ancestry with the clusters that appeared, and some of them did not provide any more specific information.

    Replies
    1. DOD179 and I (DOD215) have a theory:
      There are a set of interrelated Ashkenazi rabbinical families in Lithuania and Belarus ("Lita") and it may be that the individuals in cluster #15 are "Litvaks" with a shared set of common rabbinical ancestors who lived in Prague and Frankfurt in the 17th century whose descendants went back and forth to "Lithuania".

      Ashkenazi Rabbinical Family Tree

      The above tree has to be treated with caution since some of the links are unproven (or wrong). Also, it doesn't show the many lineages related to these going forward in time, and is missing quite a number of important links.

      Some members, based on surnames, seem to be related to the large set of Ashkenazi rabbinical families shown above, and have heard that they descend from a "long line of rabbis" on one side or another.

      It is interesting that I myself have "multiregional" Ashkenazi ancestry, two lines from Belarus, one from Western Ukraine ("Galicia") and one from Romania, yet I'm 100% in cluster #15 just like some others with strictly "Litvak" ancestry. However, DOD178 has 100% Belarus "Litvak" ancestry and falls in cluster #14.

      A possible clue is that DOD173, who has paternal lines from Western Ukraine and Lithuania, and maternal lines from Germany and Hungary falls in both clusters, #14 and #15.

      We really don't know enough about who in fact descends from these interrelated rabbinical families, so we can't quite prove it (yet). One could describe it as a kind of "cryptic relatedness", but perhaps it isn't ...

      BTW, the Y DNA along with documented history and surname seems to indicate that the male-line descendants of Rabbi Mattityahu Treves b. 1323 in Marseilles spread throughout the Jewish world, from Germany to Lithuania to Italy, Greece, Bulgaria, and Western Turkey, to Aleppo and Baghdad.

      These families are often named some variant of "Ashkenazi-Treves". What is fascinating is that they also seem to be close matches of the Martinez family of N. Mexico, the De Cubilla family of Boqueron, Panama, and the Cardenas family of Medellin, Colombia. They also match the Sephardic Toledano family of Morocco. It may also be that Lithuanian Jewish matches were originally named "Ximenes" (Jimenez). Because "Ashkenazi" means "German", it may be that this is the "Alemano" family of N. Mexico, with the same surname, in Spanish translation.

      This may be a documented case of IBD that extends across the world, from Ashkenazi and Sephardi Jews to Latin America. These individuals may not cluster together, but they should have elevated IBD values.

      You might want to try running FastIBD on a set of Latin Americans against Ashkenazi and Sephardi Jews to see what comes out. We already see these shared segments in Gedmatch but often we need a greater SNP density with combined RF v.3 and FF data to see it.

      I have a feeling that with full sequences these shared segments > 3 cM will suddenly become clear ...

  24. Understood. But what seems to be differentiating the clusters? The smallest seems to be sort of French-related (not surprising given early Ashkenazi history), at least.

  25. It is clear that the Western Iberian or Portuguese-Galician cluster includes all Iberians with significant North-African ancestry (the 2 Canarians fall into this cluster as well).

    Interestingly, the 2 French individuals from the HGDP sample who also have small North African ancestry fall into this cluster as well.

  26. Regarding column 12 in the European fastIBD, my dad and his cousins are the only Italians in that column. In the Middle Eastern fastIBD, there is another column 12 that includes an Iranian and a Kurd. Are these 2 column 12's the same cluster or different clusters?

  27. >> Regarding column 12 in the European fastIBD, my dad and his cousins are the only Italians in that column. In the Middle Eastern fastIBD, there is another column 12 that includes an Iranian and a Kurd. Are these 2 column 12's the same cluster or different clusters?


    Column 12 is not the same in different runs.
    Also, if there are known relatives in the Project, you should let me know immediately what their relationship is, because relatives are not allowed.

    Replies
    1. My dad is DOD 165. He is 4th cousins with DOD 177 and I suspect close cousins with DOD 176, as they are from the same small village of 500 people from southern Italy and share 155 centimorgans, as shown by 23andMe. I think it's a good idea to just include my dad in future analysis, to not alter the results. thanks.

  28. @ Antoine1706

    That is not true. There are Iberians with very little North African ancestry who are also in the Galaico-Portuguese cluster. And one of the French there has 0%.

  29. " There are iberians with very small north-african that are also in the Galaico-Portuguese cluster. And one of the french there has 0%"

    Sure. I think that the "individuality" of the Western Iberian cluster may be related to population movements from continental Europe. In the Iron Age the language of these western territories was a proto-Celtic one, or Gallaecian-Lusitanian, different from Celtiberian.
    A pan-European MDS needs to be made.

  30. My results are rather interesting: I suppose I'm the only "ethnic" Gascon individual in the project (DOD133), more precisely a Béarnais (SW France - Pyrenees).

    The Galore analysis assigns me to cluster #8 alongside Aragonese and Catalan people. The inter-population IBD analysis is further evidence that Gascon people (at least a South Gascon individual like me) are linked to the French Basques. In decreasing order: French Basque: 4.93 (pretty high?); Pais_Vasco_1KG: 1.84; Aragon: 1.66; [...]; French: 0.09

    I really lament the lack of proper regional French samples ... I'll try to convince friends of mine to take one of those commercial tests.

  31. Hi Dienekes,
    I am DOD804 and I am Sicilian, from the village of Alcara Li Fusi. I am confused by my results because they are not very similar to previous analyses. I usually do not seem to be very similar to Iberians and French people. How would you explain this?

    Murcia_1KG 1.79
    Sicilian_D 1.39
    French_D 1.3
    C_Italian_D 1.1
    Baleares_1KG 0.86
    Galicia_1KG 0.7
    Extremadura_1KG 0.61
    Portuguese_D 0.46
    S_Italian_Sicilian_D 0.44
    Castilla_La_Mancha_1KG 0.4
    Sephardic_Jews 0.36
    North_Italian 0.28
    N_Italian_D 0.21
    Ashkenazy_Jews 0.21
    French 0.05
    Spaniards 0.04
    Sardinian -0.09

  32. You fall in cluster #2 which is dominated by Sicilians and South Italians, so I am not sure what you are confused about.

  33. I realize that now but I still am confused about the order of Z-Scores (Murcia, Sicilian, French) etc. How do you explain this? I guess I am a little confused about what a Z-score means. Also would it be possible for you to include me in the West Asia/Balkans Spreadsheet as well? I wonder how a Sicilian such as I would fit within these populations.
