Friday, September 30, 2011

'euro7' calculator

I am releasing a new calculator for Europeans, including their immediate neighboring populations around the Black Sea (Caucasus and Anatolia). The calculator can be used with DIYDodecad.

There are additional African and Far-Asian population controls, so, in principle, the calculator could be used by non-Europeans/Anatolians/Caucasians, although I would be less confident of their results. For example, people of South Asian ancestry may obtain a Far-Asian result if they use this calculator, due to the deep affinity of Ancestral South Indians with East Asians. Other West Eurasians and West Eurasian-admixed peoples, not from the studied regions (e.g., Arabians or East Africans) will have their West Eurasian components mapped onto the ones used in this calculator.

'euro7' uses 7 ancestral components:
  • Caucasus
  • Northwestern
  • Northeastern
  • Southeastern
  • African
  • Far_Asian
  • Southwestern
These names represent 7 ancestral populations inferred by ADMIXTURE, and have been chosen based on the geographical regions where each of them achieves its maximum representation. You should always refer to "A note of caution on admixture estimates" and "Interpretation of ADMIXTURE results: component sharing", as well as the average population values in the spreadsheet, when interpreting your individual results.

The distribution of these 7 components can be seen in the barplot on the top left, and precise admixture proportions can be found in the spreadsheet. Note that additional samples have been used to infer these components, but as these come from Dodecad populations with fewer than 5 participants, I am not reporting average values for them, as per the usual project policy.

Here is the neighbor-joining tree based on the Fst divergences between the 7 ancestral components:

Instructions:

You can download the calculator RAR from here (Google docs; File->Download original), or here (sendspace).

You need to extract the contents of the RAR file to the working directory of DIYDodecad. You use it by following exactly the instructions of the DIYDodecad README, but always type 'euro7' instead of 'dv3' in these instructions.
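
Before launching the calculator, it can help to confirm that the files it expects are really in the working directory. The following is a minimal sketch in Python, not part of DIYDodecad itself. The helper name `missing_files` is hypothetical; the two file names are the ones that appear in the error messages reported in the comments below, and the list is an assumption rather than the full set of files in the RAR.

```python
from pathlib import Path


def missing_files(workdir, required):
    """Return the names in `required` that are not regular files in `workdir`."""
    base = Path(workdir)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumption: 'euro7.par' is the parameter file passed on the command line
    # and 'genotype.txt' is the genotype input; neither list nor names are
    # taken from the DIYDodecad README.
    required = ["euro7.par", "genotype.txt"]
    missing = missing_files(".", required)
    if missing:
        print("Missing from the working directory:", ", ".join(missing))
    else:
        print("All listed files are present.")
```

If `euro7.par` is reported missing, the RAR was most likely extracted somewhere other than the DIYDodecad working directory.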

Terms of use: 'euro7', including all files in the downloaded RAR file, is free for non-commercial personal use. Commercial uses are forbidden. Contact me for non-personal uses of the calculator.

Calculators released by the Dodecad Project:

46 comments:

  1. I think Eurasian genetic analyses are incomplete when there is no "South Asian" component. In this analysis the "South Asian" component of the standard ADMIXTURE analysis is haphazardly eaten by various Eurasian components from the east and west of Eurasia causing some of the information to be lost in a black hole.

  2. I'm sensing there are some serious issues with translating the results of these reference samples using the DIY calculator, but only for groups and people who weren't part of your analysis.

    For instance, Poles who weren't included in your run here score around 8% more membership in your Northwestern cluster using the calculator. I can see the same pattern with Russians.

    Have you also noticed this trend? If so, is it possible to fix this issue?

    This doesn't mean that the calculator approach is totally useless. However, I think for now it would be prudent to inform people who aren't part of your project, only to compare results with each other, and not with the population averages on your reference sheet, nor with individuals who you used in your analysis.

  3. I'm sensing there are some serious issues with translating the results of these reference samples using the DIY calculator, but only for groups and people who weren't part of your analysis.

    I am interested in _evidence_, not your _feelings_.

    For instance, Poles who weren't included in your run here score around 8% more membership in your Northwestern cluster using the calculator. I can see the same pattern with Russians.

    I have no use for anecdotal evidence.

    The Northeastern component encapsulates allele frequencies from 152 individual equivalents, with 112 individuals scoring at least 50% in it. So, the addition of an individual, even if he had allele frequency 100% at every locus compared to 0% of the Northeastern average, would only change allele frequencies by less than 1%. Of course such an individual would not belong in the Northeastern component to begin with.
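
    The arithmetic can be sketched as follows. This is a simplified pooled-average model, assumed here only for illustration (it is not how ADMIXTURE itself updates frequencies), and the function name is hypothetical. It takes the extreme case described above: a component built from 152 individual equivalents at frequency 0%, joined by one individual at 100%.

```python
def frequency_after_adding(n_equivalents, component_freq, newcomer_freq):
    """Pooled allele frequency after adding one more individual equivalent."""
    return (n_equivalents * component_freq + newcomer_freq) / (n_equivalents + 1)


before = 0.0  # component average at this locus (worst case)
after = frequency_after_adding(152, before, 1.0)  # newcomer at 100%
shift = after - before
print(f"shift in allele frequency: {shift:.4%}")
```

    Under these assumptions the shift is 1/153, about 0.65%, which is below the 1% bound stated above.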

    In short: you are wrong, but you can always test your pet hypothesis with actual _data_.

    Have you also noticed this trend? If so, is it possible to fix this issue?

    There is no such trend.

    However, I think for now it would be prudent to inform people who aren't part of your project, only to compare results with each other, and not with the population averages on your reference sheet, nor with individuals who you used in your analysis.

    I'm sorry you are disappointed by your personal score, but I will not turn your disappointment into project policy.

  4. I'm not disappointed in my personal score. I just think that my well above average Northwestern score, and below average Northeastern, doesn't really gel with all my previous results. It pushes me into far western Poland, when I'm usually in the north or northeast of the country.

    I'll look into it, and if I find a definite trend, I'll post some data on my blog.

  5. Dienekes, what happened to the Balkans_D group??

  6. Dienekes, what happened to the Balkans_D group??

    I've used all my population groups (e.g., Slovenian_D, Danish_D etc.) individually in order to have the maximum number of samples.

  7. A few interesting facts...


    - The northwestern cluster is closer to the southeastern and Caucasus clusters than to the southwestern one

    - The African cluster is closer to the northwestern cluster than to the southwestern one.

    It looks as if there is some kind of genetic drift in the southwest.
    What are your thoughts on this, Dienekes?

  8. Hello Dienekes, I'm hoping you'd be able to help me.

    I am typing in: system('DIYDodecadWin euro7.par')
    from within the R environment as I've done successfully for the basic admixture results. However, I get the following error message:

    At line 78 of file DIYDodecad.f90 file: "euro7.par"
    Traceback: not available, compile with -ftrace=frame or -ftrace=full
    Fortran runtime error: The system cannot find the file specified.

    Warning message:
    running command 'DIYDodecadWin euro7.par' had status 2

    Do you know what I'm doing wrong?

  9. What I find most interesting is that the SW and SE components are almost NOT dominant anywhere.

    SE component is dominant (>50%) in Armenians and has strong presence (<50%) in Cyprus, Greece and Central Italy but otherwise it's always minor in Europe.

    The SW component only seems important (but not overwhelmingly dominant at all) in Iberians (<50%), showing some presence in Italy (North, Sardinia).

    How would this structure be at K=4?

    Also, why have West Asians (other than Caucasians and Cypriots) and North Africans been removed from the comparison (when they should be most informative)?

  10. I can assure you that the Southwestern component is really dominant in the northeastern side of Iberia, mainly in Catalunya. I am (mostly) ethnic Catalan and I got almost 60% of it, and less than 5% Southeastern, which is very low. The rest is all Northwest + Northeast Europe (around 35%), with African, Caucasian and Far Asian absent.

    Southwestern seems like a representation of isolation emanating from the Pyrenees, taking the end of the last glacial age as reference. In my opinion, it could well be reflecting the oldest European allele frequencies. Who knows...

  11. "Southwestern seems like a representation of isolation emanating from the Pyrenees"...

    In that case it'd be dominant among Basques, and it is not. It compares better with the Iberian-specific component found in other studies. If so, it could be more (IMO) the product of the distinctiveness of the Iberian Paleolithic province (distinct from the Franco-Cantabrian province, even if somewhat related).

    But, of course, it'd be interesting to know all the various K levels and not just one, chosen more or less arbitrarily.

  12. Modern Basques are most likely primarily descended from people who lived in Aquitania in Classical times: http://www.google.es/imgres?imgurl=http://www2.luventicus.org/mapas/franciaregiones/aquitania.gif&imgrefurl=http://www.luventicus.org/mapas/franciaregiones/aquitania.html&h=678&w=596&sz=15&tbnid=_zff7gI3n9q8NM:&tbnh=90&tbnw=79&prev=/search%3Fq%3DAquitania%26tbm%3Disch%26tbo%3Du&zoom=1&q=Aquitania&docid=W5_4VQ3yBEKreM&sa=X&ei=m2aITqSZL8uW0QW2u9D3Dw&ved=0CG8Q9QEwCQ&dur=436

    That explains how quite a lot of the Southwestern autosomes were replaced due to Northern influence, while still remaining to a substantial degree (35% is also significant). Also, 10% Southeastern among Basques seems to indicate the influence I mention, since the French show nearly 12% of this.

    Ethnic Catalans have much less French influence; that's the main reason why you see a very high Southwestern and very low Southeastern, because they have retained the original isolation from the Pyrenees much better. That isn't contrary to the fact that Basques are very distinctive genetically speaking, but obviously Northwestern, Northeastern, and Southeastern were not native to the Pyrenees. The only cluster that fits in the category is the Southwestern, all the more so given that it peaks in Catalans, who are next to the Pyrenees too.

  13. Whatever. Aquitania is by the Pyrenees as well. I make a likely distinction between the Franco-Cantabrian region (south of modern France, Cantabrian strip in modern Spain) and Iberian region (all the rest of Iberia, except maybe the interior and NW which show little sign of habitation yet).

    Anyhow, Catalans normally show up a bit closer than most other Iberians to Basques and Gascons (Aquitanians), so your case may be a bit exceptional. Unsure but it's not enough to say, as you do, that it "peaks in Catalans".

  14. Aquitania includes a substantial territory northward into France. For the moment, the increased Northwestern and appreciable Southeastern make me think the mentioned influence has something to do with this.

    I also have interesting connections with Basques, but these results seem to hide them. Several of my distant cousins at 23andme are Basques or have some Basque ancestry, so I think that, depending on the analysis, it would be clearer how close I am.

    Your observation about the exceptionality of my results is understandable. I said it peaks in Catalans; well, perhaps I'm wrong, but keep in mind that not many people with substantial Catalan ancestry have been tested. I know my family background, and I can assure you I have 3 grandparents from a small Catalan town (fewer than 1000 inhabitants), which probably not many people who claim Catalan ancestry could say. Needless to say, all the surnames you find there are 100% Catalan. For this reason I think it's really true that the component peaks in ethnic Catalans; probably there are others who get even higher Southwestern and lower Southeastern than me. The problem is that not many of them will get tested, so OK, I understand why you think this way.

  15. One thing I notice when I compare the analysis of Bauchet 2007 and that of Dienekes is that the latter fails to detect a Basque cluster (probably because of a lack of sufficient depth of analysis and a small Basque sample). My hypothesis is that, lacking their "true" cluster, distinct populations are drawn to minor admixture components, which they show instead, causing confusion.

    And well, your understanding of Aquitaine and even France is too messy for me to fix. There is more than 2000 years of history that you need to understand before issuing the judgments you sport so lightly (also for France).

  16. that of Dienekes is that the latter fails to detect a Basque cluster (probably because of lack sufficient depth of analysis and small Basque sample).

    That is not a failure, it's a feature. The Basque cluster, as well as the Kalash cluster, Ashkenazi cluster, etc. are population-specific, due to a high level of homogeneity and inbreeding, and hence to be avoided in studying the larger patterns of population structure in Eurasia.

  17. Dienekes, is it possible to use the paint_byseg.r commands for the euro7 calculator as for the dv3?

  18. "That is not a failure, it's a feature. The Basque cluster, as well as the Kalash cluster, Ashkenazi cluster, etc. are population-specific, due to a high level of homogeneity and inbreeding"...

    That's an ideological option you take, because:

    1) Basques are not "inbred" in comparison with other European populations (there are studies on that).

    2) Basques, Gascons and surely other "South French" can well represent a distinct Franco-Cantabrian gene pool, which should NOT be ignored in genetic studies of Europeans.

  19. "The MAF spectra (Supplementary Figure 6), although highly distorted because of SNP ascertainment, also show the Sardinians and Basques to have a noticeable excess of monomorphic SNPs. This excess suggests that some SNPs that are polymorphic in Europe may have been driven to extinction/fixation at a higher rate or never existed at all in these populations, consistent with genetic isolation. They were also both clear outliers with regard to the number of homozygous segments detected (Figure 3c)."

    http://www.nature.com/ejhg/journal/v19/n9/abs/ejhg201165a.html

  20. The similarity shown by the Basques with the French in both K=12 v3 and the recent calculator does not seem coincidental. I think the fact that they are quite isolated and homogeneous doesn't change that (IMO); at some point, they received the mentioned influences, pulling down the original Southwestern native to the Pyrenees (but not drastically, since 35% is not exactly low).

    Of course there's no definitive proof of this, but it's a very likely option if the results are really accurate. Possibly another K=12 including the clusters of the Euro7 calculator would make things clearer.

  21. It is strange that Luis still questions the homogeneity and inbredness of Basques compared to the overwhelming majority of other European populations after all those studies.

  22. Just a quick reference: Young et al. 2011, which deals with Basque autosomal genetics precisely.

    While the authors state in the intro that "Heterozygosity levels in the Basque provinces were on the low end of the European distribution (0.805-0.812)", this is only barely correct.

    When you look at table 4 it happens that the difference is very small and could be easily removed excluding the sample from Araba which totally tilts the average.

    In any case Basque average diversity is 809 (810.3 after removing Araba), while the average within-populations gene diversity (H sub-T) among all samples is 807, which is lower. While there are regions (notably Scotland, very odd, Andalusia, Austria, Hungary, Turkey) that are quite a bit higher than the Basque diversity average, there are others (Morocco, Catalonia, Murcia, Georgia, Tuscany, Bosnia) that are under the Basque diversity levels.

    It is therefore unfair and an error to make the claims you do, especially considering that Basque genetic samples have often been subject to extremely stringent "pedigree" conditions (not applied elsewhere) that necessarily reduce apparent genetic diversity artificially.

  23. Luis, in the study you quoted very few autosomal markers were examined, while in the study Dienekes quoted over 30 000 autosomal markers were examined.

  24. I do not have access to the paper, so I can't tell how accurate is the claim professed. I'm not surprised that Sardinians are isolated but I am surprised that it's claimed that Basques are so extremely isolated, living as we live open to the trade (and pilgrimage) routes of Western Europe.

    So:

    1) I'd like to see the paper in detail before I can issue judgment.

    2) I suspect that less common "immigrant" alleles (which did not reach Basques as much as other populations) are distorting the picture.

    However, I agree that Basques (and Sardinians) are distinct populations in the context of Europe (but Basques together with some neighbors, especially 'South French', and Sardinians together with Corsicans and to some extent mainland Italians maybe), and that's a reason I find it frustrating that their distinctiveness is hidden in most of Dienekes' Dodecad analyses. I think this does not add (as he claims) but in fact detracts from truthful info and confuses people like Acid, who can't really know what their exact relation (if any) with Basques or Sardinians is.

    I understand that from the viewpoint of Greece it looks "less important" but from the viewpoint of Catalonia, France, Italy, Britain... it should be at least as important as any other component.

    By choosing a K level of his preference (and by removing important "border populations" like North Africans or most West Asians), Dienekes is making an arbitrary and at least questionable choice. It is an unnecessary choice (several analyses could be produced in parallel), which I lament, because while this may be somewhat informative, it could be of much greater interest if all the relevant info were offered instead.

  25. Maju, I think you should read this article carefully, to understand Dienekes reasons to exclude Basques and Sardinians from the initial run

    http://bga101.blogspot.com/2011/05/french-basques-and-sardinians-are.html

  26. Luis, do you have evidence that South French are distinct in the European context and form a cluster together with Basques?

  27. There was another K=12 using Sardinian and Basque components, but it wasn't possible for me to participate (I was late). I am curious about my scores in both clusters, but I think Dienekes wants to find the composition of populations, rather than remark on their known isolation or see how they connect with other people.

    If I understood correctly, for you a Basque or Sardinian cluster is necessary. Well, perhaps not. It's possible that both clusters include the influences detected by the other analyses, for example the K=12 v3 or the latest calculator (which, in fact, would be more accurate using a K=12 "style" including Southwestern, Southeastern, etc.)

    I don't see why not, although I'd like to see the other results as I said.

  28. @Eduardo: I fail to see how that article justifies anything. It's just Polako's opinion and, like Dienekes, he has a non-Western point of view and focus.

    @Onur: I do not have much evidence because we lack studies on the Hexagon, the second largest state of Europe and a region with high population levels since antiquity (and the Paleolithic for all we know), which must hold huge genetic diversity and be crucial in understanding European genetics.

    I know that Lyon French formed a distinct cluster in a previous analysis by Dienekes (maybe they are also 'inbred'), I know that R1b1a2a1a1b (P312/S116) shows apparent high basal diversity in South France, but otherwise I know too little because the country has been studied very patchily. Just including a 'French' sample from who-knows-where is not useful; France should be sampled and researched thoroughly if we are to unveil European genetics, especially those of Western Europe.

    Iberia also needs some extra attention but the case of France is truly painful.

    @Acid:

    In autosomal studies very specially, having diverse informative viewpoints is important because such wealth of information can barely be discerned in any single statistical study or angle. Different ones should be combined in order to gain some depth of understanding.

    What I mean is that even if this approach can be informative, is only one of several possible approaches, all of which have something to say.

  29. I do not have much evidence because we lack studies on the Hexagon, the second largest state of Europe and a region with high population levels since antiquity (and the Paleolithic for all we know), which must hold huge genetic diversity and be crucial in understanding European genetics.

    I know that Lyon French formed a distinct cluster in a previous analysis by Dienekes (maybe they are also 'inbred'), I know that R1b1a2a1a1b (P312/S116) shows apparent high basal diversity in South France, but otherwise I know too little because the country has been studied very patchily. Just including a 'French' sample from who-knows-where is not useful; France should be sampled and researched thoroughly if we are to unveil European genetics, especially those of Western Europe.

    Iberia also needs some extra attention but the case of France is truly painful.


    Are there any region-based genetic studies on France you know?

  30. Almost nothing: a 2004 mtDNA study of a handful of regions (Brittany, Normandy, Northeast, Poitou-Limousin and Provence) is the closest I know of.

    Also Heraus echoed in the first post of his blog Anthrofrance some unpublished (conference) HLA differences (which would seem to vindicate the importance of Basques and Corsicans as an important reference). If we are to judge by phenotype, which Heraus has explored a lot in these last years, there's something more than just affinity to either Basques, Iberians or Central-NW Europeans: there is a host of French-specific variability, specially towards the South (so more like Occitan maybe).

    Finally in the Myres 2010 R1b paper (which I discussed here), some Occitan and French locations were included and the Occitans specially appear to have high levels of basal diversity in R1b-L11 and specially S116 ("Southern haplogroup" probably radiating from around the Pyrenees).

    So I have a lot of reasons to be intrigued about "French" and specially "South French" (Occitan, Gascon...) genetics by regions. But the data we have is extremely limited, almost oblique.

  31. Luis, with this paucity of data we can't say anything about the nature of the genetic relationship between South French and Basques. We need more research.

  32. It says,

    At line 142 of file DIYDodecad.f90 (Unit 50 "genotype.txt")
    Traceback: not available, compile with -ftrace=frame or -ftrace=full
    Fortran runtime error: End of file

  33. @Inextrinsia

    1. Were you able to run dv3 correctly?
    2. Are you using version 2.1?

  34. Ah ha! No, I was not using version 2.1, though I actually have it, I'd forgotten to throw out the old folder.

  35. Dienekes, can you place the Euro7 population averages (or any of the others weac, etc.) into Oracle so that we may recompute national and binational matches and distances based on these components?

  36. Hi, Dienekes. You don't seem to have released a calculator in a while. I was wondering whether you could take this opportunity to release a calculator based on your initial K=10 Dodecad analysis some time.

  37. I don't plan to release a calculator based on K=10 analysis.

  38. "I don't plan to release a calculator based on K=10 analysis".

    It's a pity because it is a much more reliable reference for the true structure of Europeans. Or do you really believe that Russians and Irish are identical?

  39. Or do you really believe that Russians and Irish are identical?

    Who said they were?

  40. LOL

    K=10 showed Northern Euros more similar to each other than K=12 or the recent calculator.

    How can it be much more reliable if the "identical" condition is more evident there than elsewhere? I don't see the point, sorry.

  41. BTW Dienekes, is the 24th of October the big day?

  42. BTW Dienekes, is the 24th of October the big day?

    I have something in the works, whether it'll be ready for the 24th remains to be seen.

  43. Hi, I'm from Spain and so are my four grandparents but I've a percentage that doesn't fit with the data in the spreadsheet. I've been checking but I can't find any European country with roughly similar percentages to mine.

    These are the results:
    ----------------------------

    2.90% East_European
    39.66% West_European
    37.35% Mediterranean
    0.58% Neo_African
    10.35% West_Asian
    0.72% South_Asian
    0.00% Northeast_Asian
    0.00% Southeast_Asian
    0.31% East_African
    4.09% Southwest_Asian
    4.03% Northwest_African
    0.00% Palaeo_African

    -------------


    64.48% Atlantic-Baltic
    34.46% Near-East
    0.01% Far-East
    1.05% Africa

    --------------------
    5.37% Caucasus
    39.31% Northwestern
    5.47% Northeastern
    21.23% Southeastern
    1.27% African
    0.00% Far_Asian
    27.35% Southwestern

    -----------------
    I must be a mix of something but I'm not aware of any ancestors from abroad. Is this sort of Southeastern European, Near East and West Asian component rare in Spain?
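
    Results pasted in this form can be checked mechanically before comparing them with the spreadsheet. The sketch below is only an illustration: it parses the seven euro7 lines exactly as posted above, confirms that they sum to roughly 100%, and reports the largest component. The helper name `parse_components` is hypothetical, and the "NN.NN% Name" layout is assumed from this comment rather than from any DIYDodecad documentation.

```python
import re

# The seven euro7 lines exactly as posted in the comment above.
EURO7_OUTPUT = """\
5.37% Caucasus
39.31% Northwestern
5.47% Northeastern
21.23% Southeastern
1.27% African
0.00% Far_Asian
27.35% Southwestern
"""

# One percentage followed by one component name per line (assumed layout).
LINE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s*$")


def parse_components(text):
    """Return {component name: percentage} for every 'NN.NN% Name' line."""
    result = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            result[match.group(2)] = float(match.group(1))
    return result


components = parse_components(EURO7_OUTPUT)
total = sum(components.values())
largest = max(components, key=components.get)
print(f"{len(components)} components, total {total:.2f}%, largest: {largest}")
```

    A total far from 100% would suggest a copy-and-paste problem rather than anything about ancestry.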

  44. I'm getting strange results with the calculator. Could it be that the values for Southeastern and Southwestern are mixed up?

  45. Why did you make the model with Northwestern closer to Northeastern and not to Southwestern?
