Saturday, April 16, 2011

Your nearest IBS neighbors (up to DOD603)

I have calculated your nearest identity-by-state (IBS) neighbors based on the same set of ~146K markers used for the standard K=10 analysis.
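The post does not spell out how the IBS score is computed. A common definition, and the one PLINK reports as its IBS distance (DST), is the average fraction of alleles two people share per marker. With genotypes coded as 0, 1 or 2 copies of one allele, two people share 2 - |g1 - g2| alleles at a marker, out of a possible 2. The Python sketch below assumes that coding and skips markers missing in either person. Python is used only for illustration, since the project distributes its results as an R object. The sketch also shows why your own row in the lists has an IBS of exactly 1.

```python
from typing import Optional, Sequence

def ibs_similarity(g1: Sequence[Optional[int]],
                   g2: Sequence[Optional[int]]) -> float:
    """Mean fraction of alleles shared identical-by-state (IBS) per marker.

    Genotypes are coded 0/1/2 (copies of one allele); None marks a missing
    call. Markers missing in either person are skipped (an assumption).
    """
    shared = used = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles in common at this marker
        used += 1
    if used == 0:
        raise ValueError("no marker is called in both individuals")
    return shared / (2 * used)

# Toy genotypes over six markers (made up for illustration).
me = [0, 1, 2, 1, 0, 2]
other = [0, 2, 2, 0, None, 1]
print(ibs_similarity(me, me))     # 1.0
print(ibs_similarity(me, other))  # 0.7 (7 of 10 alleles shared over 5 usable markers)
```

As the lists in this post show, values between unrelated people cluster tightly around 0.75. Differences in the third or fourth decimal place are therefore what separate one neighbor from the next.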

As explained here, it is not always the case that your nearest neighbors will belong to the same ethnic group as you. For closely related groups in the global context (e.g., Europeans), it's quite possible for a member of a different group to be more similar to you than a member of your own.

I am distributing the data as an R object. You must first install R, and then you can open this object by double-clicking on it (in Windows), or by using the File->Load Workspace menu within R. Then, you simply enter the following command at the prompt:
closest("DBV001")

Replace "DBV001" with your own project ID. If that ID is not included in the data, or you mistyped it, you will get an error message:
closest("qwerty")
[1] "This ID is not included"

Otherwise, you will get your results:
closest("DBV001")
[1] "Your nearest neighbor is 0.05 standard deviations more distant to you than for the average project participant"
RANK ID IBS
V3 "1" "DBV001" "1"
V133 "2" "DOD151" "0.749907"
V313 "3" "DOD344" "0.749559"
V943 "4" "Ashkenazy_Jews" "0.749298"
V50 "5" "DOD051" "0.74926"
V935 "6" "Ashkenazy_Jews" "0.749082"
V944 "7" "Ashkenazy_Jews" "0.749018"
V243 "8" "DOD272" "0.748985"
V25 "9" "DOD022" "0.748982"
V157 "10" "DOD179" "0.748904"
V942 "11" "Ashkenazy_Jews" "0.748822"
V939 "12" "Ashkenazy_Jews" "0.748821"
V954 "13" "Ashkenazy_Jews" "0.748767"
V251 "14" "DOD280" "0.748607"
V936 "15" "Ashkenazy_Jews" "0.748582"
V940 "16" "Ashkenazy_Jews" "0.748529"
V950 "17" "Ashkenazy_Jews" "0.748515"
V949 "18" "Ashkenazy_Jews" "0.748398"
V201 "19" "DOD228" "0.748382"
V308 "20" "DOD338" "0.748344"

By default, this lists your 20 closest IBS matches. You can change the number with the k argument, for example:
closest("DBV001", k=50)
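The R function itself is not shown here. Because the IBS values are precomputed, all it has to do is look up your row and sort it. The Python sketch below is a hypothetical analogue, not the project's code. It assumes the precomputed scores form a square matrix `ibs` whose rows and columns follow a list of `labels`. It reproduces the behavior described above. An unknown ID yields the "This ID is not included" message. Otherwise, the top `k` matches (20 by default) come back, with you at rank 1 and an IBS of 1. The numbers in the usage lines are toy values.

```python
from typing import List, Sequence, Tuple, Union

Match = Tuple[int, str, float]  # (rank, label, IBS)

def closest(target: str, labels: Sequence[str], ibs: Sequence[Sequence[float]],
            k: int = 20) -> Union[str, List[Match]]:
    """Hypothetical analogue of closest(): top-k IBS matches for one ID.

    labels[i] names row/column i of the precomputed matrix ibs. Project IDs
    are unique, so looking up the first matching label is enough for them.
    """
    if target not in labels:
        return "This ID is not included"
    row = ibs[labels.index(target)]
    # Rank everyone, the target included (self-IBS is 1), by decreasing IBS.
    order = sorted(range(len(labels)), key=lambda j: row[j], reverse=True)
    return [(rank, labels[j], row[j]) for rank, j in enumerate(order[:k], start=1)]

# Toy 4 x 4 matrix; the values are made up for illustration.
labels = ["DBV001", "DOD_x", "Ref_y", "DOD_z"]
ibs = [
    [1.000, 0.749, 0.751, 0.748],
    [0.749, 1.000, 0.747, 0.746],
    [0.751, 0.747, 1.000, 0.745],
    [0.748, 0.746, 0.745, 1.000],
]
print(closest("qwerty", labels, ibs))  # This ID is not included
print(closest("DBV001", labels, ibs, k=3))
# [(1, 'DBV001', 1.0), (2, 'Ref_y', 0.751), (3, 'DOD_x', 0.749)]
```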

Note that the first line of the output ("Your nearest neighbor is 0.05 standard deviations more distant to you than for the average project participant") gives you an idea of how close your nearest neighbor is to you, compared with the nearest neighbors of other project members.

People from well-represented groups are likely to have a nearest neighbor that is closer to them than average.
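The post does not give the formula behind the standard-deviation sentence. One plausible reading is a z-score, sketched below in Python as an assumption rather than the project's method. First, take each participant's nearest-neighbor IBS, meaning the highest IBS with anyone other than themselves. Then measure how far yours falls below the average of those values, in units of their standard deviation. A positive result means your nearest neighbor is more distant than average. The "closer" wording for negative values is inferred from a commenter's "0.31 standard deviations closer". Two further details are also assumptions: whether reference individuals belong to the `pool`, and whether a sample or a population standard deviation is used.

```python
import statistics
from typing import Sequence

def nearest_ibs(ibs: Sequence[Sequence[float]], i: int) -> float:
    """Highest IBS between individual i and anyone other than i."""
    return max(v for j, v in enumerate(ibs[i]) if j != i)

def nearest_neighbor_z(ibs: Sequence[Sequence[float]], i: int,
                       pool: Sequence[int]) -> float:
    """How many SDs i's nearest neighbor is more distant than the pool average.

    Assumption: a population-SD z-score over the nearest-neighbor IBS of every
    individual in pool. Positive = more distant (lower IBS) than average.
    """
    nn = [nearest_ibs(ibs, p) for p in pool]
    return (statistics.mean(nn) - nearest_ibs(ibs, i)) / statistics.pstdev(nn)

def describe(z: float) -> str:
    """Render z in the style of the R message ("closer" wording is assumed)."""
    direction = "more distant to you" if z >= 0 else "closer to you"
    return (f"Your nearest neighbor is {round(abs(z), 2)} standard deviations "
            f"{direction} than for the average project participant")

# Toy 3 x 3 matrix with made-up values.
ibs = [
    [1.00, 0.75, 0.74],
    [0.75, 1.00, 0.73],
    [0.74, 0.73, 1.00],
]
everyone = [0, 1, 2]
print(describe(nearest_neighbor_z(ibs, 2, everyone)))
# Your nearest neighbor is 1.41 standard deviations more distant to you than ...
print(describe(nearest_neighbor_z(ibs, 0, everyone)))
# Your nearest neighbor is 0.71 standard deviations closer to you than ...
```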

I have also included the 692 reference individuals from the standard K=10 analysis set, so your list of closest neighbors will include both DOD-labeled project participants and reference individuals.
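Project members appear under their DOD IDs and reference individuals under their population labels, so a quick tally shows how your list splits between the two. This Python sketch assumes that every project ID starts with "DOD". Under that rule, other IDs would be counted as population labels, such as the GRM001 that appears in one commenter's list. The usage lines use the DBV001 example list from this post. They leave out rank 1, which is DBV001 itself.

```python
from collections import Counter
from typing import Iterable, Tuple

def summarize_neighbors(ids: Iterable[str]) -> Tuple[int, Counter]:
    """Count project participants and tally reference-population labels.

    Assumption: project participants are exactly the IDs starting with "DOD";
    everything else is treated as a reference population label.
    """
    participants = 0
    populations: Counter = Counter()
    for label in ids:
        if label.startswith("DOD"):
            participants += 1
        else:
            populations[label] += 1
    return participants, populations

# Ranks 2-20 of the DBV001 example (rank 1 is DBV001 itself).
neighbors = [
    "DOD151", "DOD344", "Ashkenazy_Jews", "DOD051", "Ashkenazy_Jews",
    "Ashkenazy_Jews", "DOD272", "DOD022", "DOD179", "Ashkenazy_Jews",
    "Ashkenazy_Jews", "Ashkenazy_Jews", "DOD280", "Ashkenazy_Jews",
    "Ashkenazy_Jews", "Ashkenazy_Jews", "Ashkenazy_Jews", "DOD228", "DOD338",
]
print(summarize_neighbors(neighbors))
# (9, Counter({'Ashkenazy_Jews': 10}))
```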

25 comments:

  1. Worked perfectly for me; most of the DOD matches share my ancestry.

    Excellent job!

    I still don't know of which sample the CEU25 consists, in detail. I have it 5 times in my top 50.

    What do you say about 0.31 standard deviations closer in the European context?

  2. Very good job. I'm DOD486, 100% Italian, and these are my closest matches:

    V449 "1" "DOD486" "1"
    V723 "2" "North_Italian" "0.750471"
    V657 "3" "Sardinian" "0.750373"
    V661 "4" "Sardinian" "0.749847"
    V703 "5" "TSI25" "0.749761"
    V52 "6" "DOD054" "0.749656"
    V652 "7" "Sardinian" "0.749581"
    V680 "8" "French_Basque" "0.749456"
    V647 "9" "Sardinian" "0.749449"
    V708 "10" "TSI25" "0.74937"
    V645 "11" "Sardinian" "0.749282"
    V711 "12" "TSI25" "0.74928"
    V745 "13" "Spaniards" "0.749213"
    V649 "14" "Sardinian" "0.749205"
    V563 "15" "DOD603" "0.749152"
    V44 "16" "DOD042" "0.749141"
    V343 "17" "DOD375" "0.749132"
    V200 "18" "DOD227" "0.749093"
    V389 "19" "DOD423" "0.749044"
    V710 "20" "TSI25" "0.749021"

  3. I'm not the sharpest tool in the shed :) In basic layman's terms, does the "R" program take 146K SNPs and match them with similar IBS SNPs from a standard K=10 set? In another program, I plot with the Orcadians; in this "R" program, my nearest neighbors are in the predominantly Belorussian/Lithuanian group. If I were to combine the results, does this suggest that the populations of Orkney and Lithuania/Belarus/Russia share common SNPs?

  4. Can't download the data file. Any advice?

  5. Here are my results (DOD548). On 23andMe, my maternal ancestors cluster with the Burusho. On my paternal side, they cluster in Western Europe. So I seem to have averaged out somewhere in between.

    [1] "Your nearest neighbor is 0.7 standard deviations more distant to you than for the average project participant"
    RANK ID IBS
    V508 "1" "DOD548" "1"
    V564 "2" "GRM001" "0.746494"
    V474 "3" "DOD513" "0.746339"
    V96 "4" "DOD103" "0.746221"
    V604 "5" "Lithuanians" "0.746134"
    V295 "6" "DOD325" "0.746056"
    V541 "7" "DOD581" "0.745993"
    V609 "8" "Belorussian" "0.745978"
    V307 "9" "DOD337" "0.745907"
    V433 "10" "DOD470" "0.745863"
    V580 "11" "CEU25" "0.745859"
    V228 "12" "DOD257" "0.745837"
    V623 "13" "French" "0.745772"
    V245 "14" "DOD274" "0.745725"
    V631 "15" "French" "0.745698"
    V266 "16" "DOD295" "0.745608"
    V518 "17" "DOD558" "0.745584"
    V441 "18" "DOD478" "0.745565"
    V510 "19" "DOD550" "0.745564"
    V95 "20" "DOD102" "0.745541"

  6. Thanks, it worked well for me once I remembered to put quotes around my ID number!

  7. My standard deviations are 8%. I see 46 Dodecad IDs are missing. What’s the significance of that?

  8. Well, my guess is that CEU25 refers to the Americans from Utah used in many studies as a proxy for white Caucasians, 25 being the number of Utah Americans in the study.

    I dislike downloading more software, but my curiosity got the better of me. My closest IBS matches are similar to what I have seen with Davidski: mostly Tuscans and Sardinians, with the odd (probably outlier) Syrian, Cypriot and Armenian. I had a laugh because the closest DOD member to me is my Relative Finder (RF) cousin from 23andMe.

  9. Thanks, it worked well for me once I remembered to put in the quotation marks.

  10. I still don't know of which sample the CEU25 consists, in detail. I have it 5 times in my top 50.

    This consists of 25 White Utahns from the CEU HapMap sample.

    In basic layman's terms, does the "R" program take 146K SNPs and match them with similar IBS SNPs from a standard K=10 set?

    The matching doesn't happen in R; it's precomputed, and the R object is simply a way to present the results. The standard K=10 set is the set of 36 populations (692 individuals) used in the standard K=10 results that everyone gets. Those results have been calculated over slightly different SNP sets since the beginning of the project, each of roughly 150K SNPs. Currently they are calculated over 145,743 SNPs, the same SNPs used in this IBS analysis.

    I see 46 Dodecad IDs are missing. What’s the significance of that?

    Two classes of people are missing:

    1. Relatives of other IDs
    2. Fraudsters

    It's also possible that I've excluded someone by my own error, so feel free to e-mail me if that's the case.

  11. Why does V609, a Belorussian, show up as a nearest neighbor for so many Europeans? It's more pronounced for Northern Europeans.

  12. Is there a Linux version of this tool? I don't have a Windows machine, and my wife will not let me touch her Windows laptop for fear of too many geeky tools.

  13. Thanks for all your hard work in setting up the project :) BTW, V609 (Belorussian) is also my nearest neighbor? 0.54 SD.

  14. @bergep

    http://cran.ms.unimelb.edu.au/

  15. @bergep, when you go to download R there is a Linux option.

  16. I cannot find DOD192. Why?

    Because DOD192 is a relative of another project member.

  17. Haha, the mysterious Belorussian V609 is 4th closest for a Persian too.

    Ah well, I have him also as my closest.

    Something is strange with this person.

  18. D'oh! I saw "object" and thought of a binary object, which would be OS-dependent. It's an R object, so it's OS-independent. Thanks.

  19. Belorussian V609 is 2nd closest to me, and I'm almost completely British/German. What's up with that guy?

  20. My own results are pretty coherent and show that the Dodecad Ancestry Project is a very solid project for proving historical hypotheses. I'm DOD133 (of full Pyrenean Gascon ancestry), and unsurprisingly, my best matches are my French Basque neighbours. This is further proof that Gascons are indeed Romanized Basco-Aquitanians.

    V117 "1" "DOD133" "1"
    V681 "2" "French_Basque" "0.752704"
    V692 "3" "French_Basque" "0.751609"
    V688 "4" "French_Basque" "0.751574"
    V694 "5" "French_Basque" "0.751557"
    V675 "6" "French_Basque" "0.751373"
    V685 "7" "French_Basque" "0.75121"
    V687 "8" "French_Basque" "0.750783"
    V682 "9" "French_Basque" "0.750733"
    V677 "10" "French_Basque" "0.750649"
    V684 "11" "French_Basque" "0.750642"
    V676 "12" "French_Basque" "0.750633"
    V696 "13" "French_Basque" "0.750633"
    V734 "14" "Spaniards" "0.750599"
    V691 "15" "French_Basque" "0.750352"
    V678 "16" "French_Basque" "0.750288"
    V291 "17" "DOD321" "0.750156"
    V683 "18" "French_Basque" "0.750045"
    V625 "19" "French" "0.75001"
    V640 "20" "French" "0.749995"

  21. The Belorussians/Lithuanians just show the general direction the barbarian invaders took on their way to Germania and Britannia during the fall of the Roman Empire. He is the quintessential barbarian, the one that took over Europe during the Dark Ages and dominated all the subsequent royal houses of Europe...

  22. Can someone rehost the IBS data file? It seems to be down...

  23. How can I retrieve my DOD ID number? I think I confused it with my Eurogene ID number. Any help would be appreciated.

  24. Send e-mail to the project from the same e-mail address you used to send your data.
