Monday, July 25, 2011

Do-It-Yourself Dodecad v 1.0

(UPDATE: There is a new 2.0 version of the software)

I have decided to release a Do-It-Yourself calculator (and mirror), for several reasons:
  1. So that people who don't want to send me their data can still get their results
  2. So that people can estimate admixture proportions in all their relatives, as relatives can't be accepted in the Project
  3. So that people of mixed ancestry can get their results, as there have been limited opportunities for them to submit their data to the Project so far
  4. Most importantly, so that I won't have to do-it-myself ;-)
You need a 32-bit or 64-bit Windows or Linux machine to run DIYDodecad. The instructions should be easy to follow, but if you encounter any bugs or have any problems, feel free to leave a comment or write to me (dodecad@gmail.com).

Of course, I will continue to ask for people to send me their data in the future: the calculator is made possible in part because of their contributions. Project participants have added benefits, such as the more specialized Clusters Galore or regional analyses.

If you are a project participant, you can still try DIYDodecad; you will get slightly different results than the ones you already have, because DIYDodecad does not use the same "random seed" as ADMIXTURE, and has a different default convergence criterion (maximum log-likelihood change of 1e-6 between successive EM iterations). You will also need the program in the future, as more "calculators" will be disseminated for it.
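
To make the convergence criterion concrete, here is a minimal sketch of an EM loop that stops once the log-likelihood changes by less than a goal of 1e-6 between successive iterations. It is not the DIYDodecad source: the update rule is the textbook EM step for one individual's admixture proportions given fixed ancestral allele frequencies, and the names `em_admixture`, `log_likelihood`, and the simulated three-population data are all assumptions made for illustration.

```python
import math
import random


def log_likelihood(genotypes, freqs, q):
    """Binomial log-likelihood of one individual's genotypes given proportions q."""
    total = 0.0
    for g, f_row in zip(genotypes, freqs):
        # Expected allele frequency for this individual at this marker.
        p = sum(qk * fk for qk, fk in zip(q, f_row))
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def em_admixture(genotypes, freqs, goal=1e-6, max_iter=10000):
    """Estimate admixture proportions by EM; stop when the log-likelihood
    changes by less than `goal` between successive iterations."""
    k = len(freqs[0])
    q = [1.0 / k] * k  # start from equal proportions
    loglik = log_likelihood(genotypes, freqs, q)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        counts = [0.0] * k
        for g, f_row in zip(genotypes, freqs):
            p = sum(qk * fk for qk, fk in zip(q, f_row))
            for i in range(k):
                # Expected number of allele copies attributed to population i.
                counts[i] += g * q[i] * f_row[i] / p
                counts[i] += (2 - g) * q[i] * (1.0 - f_row[i]) / (1.0 - p)
        q = [c / (2.0 * len(genotypes)) for c in counts]
        new_loglik = log_likelihood(genotypes, freqs, q)
        delta = new_loglik - loglik
        loglik = new_loglik
        if abs(delta) < goal:
            break
    return q, loglik, iteration


# Hypothetical toy data: 500 markers, 3 made-up ancestral populations.
random.seed(1)
true_q = [0.7, 0.2, 0.1]
freqs = [[random.uniform(0.05, 0.95) for _ in true_q] for _ in range(500)]
genotypes = []
for f_row in freqs:
    p = sum(qk * fk for qk, fk in zip(true_q, f_row))
    genotypes.append(sum(random.random() < p for _ in range(2)))

q, loglik, n_iter = em_admixture(genotypes, freqs)
print(n_iter, "iterations, log-likelihood =", round(loglik, 4))
for name, value in zip(["pop_A", "pop_B", "pop_C"], q):
    print(f"{name} {100 * value:.2f}%")
```

A looser goal stops the loop earlier, so two runs with different goals or starting points can end at slightly different proportions.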

So, if you don't want to/can't join the Project, you can still get your Dodecad v3 results; you can also try the Dodecad Oracle with them. Also, feel free to leave a comment in this post with your results.

Related: some background on the creation of DIYDodecad

43 comments:

  1. Hello Dienekes,

    This sounds great, as I missed the last submission deadline: I received my 23andMe results a few days after the last call expired. But I think you forgot to include a file in the .rar archive, specifically 'standardize.r'. Please correct me if I'm wrong, but when I call 'standardize.r', R should look for the file in the working directory in order to load its instructions, right? How can it load the file when it's not in the DIYDodecad package? Your instructions in the 'readme' text file were clear and straightforward, and I have previously run R for other analyses, such as Dodecad Oracle and the dated EuroDNACalc, which both worked... Can you help me? Thanks again!

  2. You are right, I will update the RAR

  3. Are you sure this is not number 1? LOL: "Most importantly, so that I won't have to do-it-myself ;-)"

  4. I have R installed, can I run it on MAC OS X?

  5. Would you update the main blog link when you add the missing file to the RAR?

  6. Hi, how about Mac OS X users? Do you think the Linux version can do it? Best,

  7. Dienekes,
    Thanks a bunch! The program works without flaws and the instructions are straightforward. I am sure this will take a lot of the burden off you :-).

  8. All links have been updated. I don't have plans for a Mac version at this time.

  9. I recommend that Mac OS X users try VirtualBox

    http://www.virtualbox.org/wiki/Downloads

    You can install a Linux or Windows guest system under that. You could also boot Linux from a Live CD or Live USB, which would do the trick as well. There are also Windows/DOS emulators for Mac that might work, since the program is command-line only.

  10. All I get when trying to download your DIY files is:

    404 file not found error

  11. I can't do it, I probably don't understand the last step since the rest goes okay:

    6. At your operating system command prompt, go to the working directory (using the 'cd' command)
    and then enter:

    In Windows:

    DIYDodecadWin dv3.par

    I use Windows, but I don't know what to do there. I tried different things, but every time it gives an error. Is there a simpler explanation of this step? Perhaps this looks silly, I'm sorry xd

  12. "All I get when trying to download your DIY files is: 404 file not found error"

    Not sure why, others have already downloaded and used it.

    "I can't do it, I probably don't understand the last step since the rest goes okay: 6. At your operating system command prompt, go to the working directory (using the 'cd' command) and then enter:"

    If you don't know how to use command line in Windows (like the old DOS commands), you can run the program from within R as follows:

    system('DIYDodecadWin dv3.par')

    R will start working furiously for about a quarter of an hour (depending on your hardware), but in the end it will print out the results.

  13. Very good, Dienekes, it works perfectly. For future runs, I think other people will appreciate an explanation like this.

    No significant changes in the results.

    Thanks ;)

  14. You get the 404 error if you try to download the contents of the .rar without the whole archive. The link posted leads to the contents of the RAR archive, so in the top-left corner of the linked document page, select the option to download the original document, which should be a .rar file.

  15. Awesome! I ran my Dad's 23andme file and got almost EXACTLY the same results! Thanks so much for creating this.

  16. Dear Dienekes,
    When I started R it didn't work. The following reaction came:

    > source('standardize.r')
    Error in file(file, "r", encoding = encoding) :
    cannot open the connection
    In addition: Warning message:
    In file(file, "r", encoding = encoding) :
    cannot open file 'standardize.r': No such file or directory

    May I ask you to help me?

  17. "When I started R it didn't work. The following reaction came:"

    Did you change the directory to the working directory?

  18. Changing the directory failed.
    I did:

    > setwd('C:\\Gebruikers\\My name\\Mijn documenten\\A - GENEALOGIE\\A - DNA\\A - Dodecad')
    Error in setwd("C:\\Gebruikers\\My name\\Mijn documenten\\A - GENEALOGIE\\A - DNA\\A - Dodecad") :
    cannot change working directory

  19. Just use the File menu to change your directory.

  20. I get to the command prompt, go into the working directory, and type in dodecad.rar dv3.par, and Notepad opens with the dodecad rar. I believe I need the Dodecadwin.exe file, but when I try to download that it gives me error 404. When I use the download option on the left, it gives me a RAR file and not the EXE file I need.

  21. All the needed files are included in the RAR file, which you can open with WinRar or other archiving software.

  22. Is it possible for it to take as long as 45 minutes to get the results? After about 40 minutes I gave up because it was late. I will try again this evening.

    Thanks

  23. Running time depends on the genotype file and on the computer running the program. You can try running it overnight if it conflicts with other stuff you do on your computer.

    Alternatively you can edit the dv3.par file to set the goal at a lower level, e.g., 1D-4 rather than the default 1D-6. I don't recommend that, however, as it will render the results not directly comparable with those of other users of the software. I purposefully set the threshold to 1D-6 because I figured most people can devote a fairly long time to get this as right as possible, since it only needs to be done once.
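
For readers unfamiliar with the notation, 1D-6 and 1D-4 are Fortran-style double-precision literals, where D plays the role of E in scientific notation. The sketch below only converts such a literal to a number; the helper name `parse_fortran_real` is hypothetical, and nothing here is taken from the layout of dv3.par.

```python
def parse_fortran_real(text):
    """Convert a Fortran-style literal such as '1D-6' to a Python float."""
    return float(text.strip().upper().replace("D", "E"))


default_goal = parse_fortran_real("1D-6")  # the default threshold
looser_goal = parse_fortran_real("1D-4")   # the faster, less precise alternative

print(default_goal, looser_goal)
# A larger goal is reached sooner, so the run stops after fewer iterations.
print(looser_goal > default_goal)
```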

  24. 23andMe updates the raw data files from time to time. I ran the calculator with the data I sent you in February and the most recent data from 23andMe. The results came out the same.

    ADMIXTURE

    11.4% East_European
    47.0% West_European
    31.5% Mediterranean
    9.3% West_Asian
    0.8% South_Asian

    DIYDodecad

    11.56% East_European
    47.14% West_European
    31.23% Mediterranean
    9.14% West_Asian
    0.93% South_Asian
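
As a quick check of how close the two runs above are, the following sketch subtracts the ADMIXTURE percentages from the DIYDodecad ones, component by component. The numbers are copied from the comment; the variable names are only illustrative.

```python
# Percentages reported in the comment above.
admixture = {
    "East_European": 11.4,
    "West_European": 47.0,
    "Mediterranean": 31.5,
    "West_Asian": 9.3,
    "South_Asian": 0.8,
}
diy = {
    "East_European": 11.56,
    "West_European": 47.14,
    "Mediterranean": 31.23,
    "West_Asian": 9.14,
    "South_Asian": 0.93,
}

# DIYDodecad minus ADMIXTURE, in percentage points.
differences = {name: round(diy[name] - admixture[name], 2) for name in admixture}
largest = max(differences, key=lambda name: abs(differences[name]))

for name, diff in differences.items():
    print(f"{name:15s} {diff:+.2f}")
print("Largest gap:", largest, f"{abs(differences[largest]):.2f} points")
```

Every component differs by well under one percentage point, in line with the "slightly different results" described in the post.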

  25. I'm stuck on step 5 of the readme file.

    This command:
    standardize('brianmurphy.txt', company='23andme')

    Gets this error:
    Error in inherits(x, "data.frame") : object 'X' not found

    I've changed the working directory and getwd() confirms it. The filename is correct and it's sitting in the working directory. Any suggestions?

  26. See difference between these two lines:

    standardize('brianmurphy.txt', company='23andme')
    standardize('brianmurphy.txt', company='23andMe')

  27. Many thanks. Caps are important.

  28. I just had to wait a little longer and I had the results. I've also done EURO-DNA-CALC and DodecadOracle, and I am excited about the results.

    Thanks man!

  29. Download for DIYDodecad1.0.rar has NO file.
    Error?

  30. You have to download the whole archive. The link posted leads to the contents of the RAR archive, so in the top-left corner of the linked document page, select the option to download the original document, which should be a .rar file.

  31. Like Larry, I got a 404 when I tried to download the 32bit Linux version, but downloaded from the mirror instead, and all was well. I think it took longer to download and compile R than it did to run my results.

    I'm an adoptee. My biological mother (I've found that part of my family) is from Appalachia, with family in the US for many generations. The genealogy I've done suggests British Isles ancestry with Swiss German about 5 generations back. There is also family lore that we are part Native American, and my mtDNA haplogroup--A4a1--seems to confirm that, but tracing the maternal line gets difficult in the early 1800s. 23andme says no NA ancestry in the last 5 generations.

    My father is unknown, but I have a picture of him and a possible surname: Franks. That suggests German, but so many USA-ans have "Ellis Island" names that it doesn't mean much. Here are my results:


    ------------------------------------------------------
    -- DIY Dodecad v 1.0 ---------------------------------
    ..
    166462 markers
    12 ancestral populations
    Genotype rate is .99847
    Beginning EM iterations:
    # 8350 loglik: -1.5859421236E+05 delta: 1.008E-06 goal: 1.000E-06
    -----------------------------
    FINAL ADMIXTURE PROPORTIONS:
    8352 iterations
    Log Likelihood = -1.5859421235E+05
    -----------------------------
    East_European 10.53%
    West_European 49.63%
    Mediterranean 26.30%
    Neo_African 0.08%
    West_Asian 10.34%
    South_Asian 0.40%
    Northeast_Asian 0.00%
    Southeast_Asian 0.00%
    East_African 0.00%
    Southwest_Asian 1.67%
    Northwest_African 1.04%
    Palaeo_African 0.00%

    And Dodecad Oracle:

    > DodecadOracle(c(10.5,49.6,26.3,0.1,10.3,0.4,0.0,0.0,0.0,1.7,1.0,0.0))
    [,1] [,2]
    [1,] "CEU" "5.4635"
    [2,] "N._European" "6.4288"
    [3,] "Argyll_1KG" "7.3457"
    [4,] "Orcadian" "7.6851"
    [5,] "Orkney_1KG" "8.0759"
    [6,] "German_D" "9.1499"
    [7,] "French" "10.8453"
    [8,] "French_D" "11.3824"
    [9,] "Mixed_Germanic_D" "12.3325"
    [10,] "Dutch_D" "13.8798"
    > DodecadOracle(c(10.5,49.6,26.3,0.1,10.3,0.4,0.0,0.0,0.0,1.7,1.0,0.0),mixedmode=T)
    [,1] [,2]
    [1,] "12.9% Sephardic_Jews + 87.1% Argyll_1KG" "0.8293"
    [2,] "14.5% S_Italian_Sicilian_D + 85.5% Argyll_1KG" "0.8452"
    [3,] "77.7% Argyll_1KG + 22.3% Tuscan_X" "0.888"
    [4,] "14% Sicilian_D + 86% Argyll_1KG" "0.9205"
    [5,] "13.5% S_Italian_D + 86.5% Argyll_1KG" "0.9258"
    [6,] "63.1% British_Isles_D + 36.9% Romanians_14" "1.0156"
    [7,] "77.5% Argyll_1KG + 22.5% Tuscan_H" "1.061"
    [8,] "17.6% Ashkenazy_Jews + 82.4% Orcadian" "1.0748"
    [9,] "91.8% CEU + 8.2% Druze" "1.0755"
    [10,] "90.9% CEU + 9.1% Turkish_D" "1.082"
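
The vector passed to DodecadOracle in the comment above is the calculator output rounded to one decimal place, in the same component order. A small sketch of that step follows; the parsing is an assumption about the output layout (one "Name percent%" pair per line), not part of either tool.

```python
# Final admixture proportions as printed in the comment above.
output = """\
East_European 10.53%
West_European 49.63%
Mediterranean 26.30%
Neo_African 0.08%
West_Asian 10.34%
South_Asian 0.40%
Northeast_Asian 0.00%
Southeast_Asian 0.00%
East_African 0.00%
Southwest_Asian 1.67%
Northwest_African 1.04%
Palaeo_African 0.00%
"""

proportions = []
for line in output.splitlines():
    name, percent = line.split()
    # Drop the trailing '%' and keep one decimal place.
    proportions.append(round(float(percent.rstrip("%")), 1))

# Build the R call as text, keeping the component order of the output.
oracle_call = "DodecadOracle(c(" + ",".join(str(v) for v in proportions) + "))"
print(oracle_call)
```

The printed line can be pasted into R once the Oracle has been loaded.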

  32. Dienekes,

    I downloaded and ran it on 9 family members - it works like a dream - bravo!!!

    Here's one question, for me it took 16,500 iterations, while for my wife only about 8,000 iterations, and for others around 14,000 - any reason for this??

  33. Run it in "progress" mode to see how the admixture proportions change during the course of the computation.

  34. How to run it in "progress" mode? Is there a command? Thanks

  35. Dienekes,

    Yeah, "Progress" mode is very cool!

  36. Is there a table that shows what groups comprise each population? For example, which groups = Western European, Eastern European, etc. I've searched the blog, but don't seem to have found a post explicitly describing each population set.

  37. DIY DODECAD IS NOT WORKING FOR ME

    I'm new here, and I have attempted to run DIY Dodecad on my own (using that R console thing, if that's what it's called). To keep it short, I did not get the percentages to come up.

    Here's how far I got:

    source('standardize.r')

    standardize('me.txt', company='23andMe')

    These two worked and no error messages, but when you get to here:

    system('DIYDodecadWin dv3.par')

    I enter the above command and nothing happens. Aren't the percentages supposed to come up, or is there more to it that needs to be done?

  38. First, you should use the latest version of the software

    http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html

    Second, the computation takes a while, the percentages do not show up right away.

    You might also be in the wrong directory, so issue a

    dir()

    command after you use standardize to make sure that the genotype.txt file and all dv3 related files are in your working directory.

  39. Ok, I entered dir() and here is what I have:

    [1] "7z920 (1).exe"
    [2] "anna fuchs.txt - Copy.txt"
    [3] "anna fuchs.txt - Copy.zip"
    [4] "anna fuchs.txt.txt"
    [5] "anna fuchs.txt.zip"
    [6] "anna.txt"
    [7] "anna.txt - Copy (2).txt"
    [8] "anna.txt - Copy (3).txt"
    [9] "anna.txt - Copy (4).txt"
    [10] "anna.txt - Copy.txt"
    [11] "annafuchs.txt.zip"
    [12] "DIYDodecad2.0 (1).rar"
    [13] "DIYDodecad2.0 (2).rar"
    [14] "DIYDodecad2.0.rar"
    [15] "DodecadOracleK12a.RData"
    [16] "DodecadOracleV1 (1).RData"
    [17] "DodecadOracleV1.RData"
    [18] "dv3 (1).txt"
    [19] "dv3 (2).alleles"
    [20] "dv3.12.F"
    [21] "dv3.par"
    [22] "genome_Anna_Fuchs_Full_20110929213516.txt"
    [23] "genotype.txt"
    [24] "K12a (1).rar"
    [25] "MDLP.zip"
    [26] "me.txt - Copy (2).txt"
    [27] "me.txt - Copy (3).txt"
    [28] "me.txt - Copy.txt"
    [29] "me.txt.r - Copy (2).txt"
    [30] "me.txt.r - Copy.txt"
    [31] "me.txt.r.txt"
    [32] "standardize.r"


    Please let me know if there's anything I'm missing. Thank you.

  40. You should extract the contents of the RAR file to your working directory. You have a whole bunch of stuff that has been renamed (1) (2) and so on. I'd make a fresh start if I were you and just extract the files of the RAR file to an empty directory.

  41. Thank you, I finally got it to work, and the final admixture proportions came up.

    However (and I'll try my best to explain this properly), why is it that the by-chromosome average gives a different calculation from the final admixture proportions you receive from Dodecad DIY? For example, when I calculate the average of each component across chromosomes (multiply the number of SNPs on each chromosome by the percentage of the component, add up the totals for the chromosomes, then divide by the total number of SNPs), I come up with different percentages than the final admixture proportions the calculator gives.


    For example, here is what the program gave me for the final numbers, without the by-chromosome


    11.18% East_European
    50.47% West_European
    24.36% Mediterranean
    0.00% Neo_African
    10.01% West_Asian
    0.36% South_Asian
    1.25% Northeast_Asian
    0.04% Southeast_Asian
    0.00% East_African
    2.24% Southwest_Asian
    0.00% Northwest_African
    0.08% Palaeo_African

    I'll just use West European as an example: as you can see, that came out at 50.47%, but when I try to do the calculations myself (multiplying percentages by allele count on the chromosomes), the total for that one came up as 49.99%. Why is this happening?
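
The calculation described in this comment is a SNP-weighted average of per-chromosome percentages. The sketch below shows that arithmetic on two hypothetical chromosomes; the SNP counts and percentages are made up, not taken from any real run. Note that an average of proportions estimated separately on each chromosome need not reproduce a single estimate fitted to all markers at once, so a small gap between the two is not by itself a sign of a mistake.

```python
# Hypothetical (SNP count, West_European %) pairs, one per chromosome.
per_chromosome = [
    (1000, 52.0),
    (3000, 48.0),
]

total_snps = sum(n for n, _ in per_chromosome)

# Weight each chromosome's percentage by its number of SNPs.
weighted = sum(n * pct for n, pct in per_chromosome) / total_snps

# A plain mean treats every chromosome equally, whatever its size.
unweighted = sum(pct for _, pct in per_chromosome) / len(per_chromosome)

print(f"SNP-weighted average: {weighted:.2f}%")
print(f"Unweighted average:   {unweighted:.2f}%")
```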
