Dodecad Ancestry Project: Do-It-Yourself Dodecad v 1.0

Monday, July 25, 2011

Do-It-Yourself Dodecad v 1.0

(UPDATE: There is a new 2.0 version of the software)

I have decided to release a Do-It-Yourself calculator (and mirror), for several reasons:

So that people who don't want to send me their data can still get their results
So that people can estimate admixture proportions in all their relatives, as relatives can't be accepted in the Project
So that people of mixed ancestry can get their results, as there have been limited opportunities for them to submit their data to the Project so far
Most importantly, so that I won't have to do-it-myself ;-)

You need a Windows or Linux machine (32-bit or 64-bit) to run DIYDodecad. The instructions should be easy to follow, but if you encounter any bugs or have any problems, feel free to leave a comment or write to me (dodecad@gmail.com).

Of course, I will continue to ask for people to send me their data in the future: the calculator is made possible in part because of their contributions. Project participants have added benefits, such as the more specialized Clusters Galore or regional analyses.

If you are a project participant, you can still try DIYDodecad; you will get slightly different results than the ones you already have, because DIYDodecad does not use the same "random seed" as ADMIXTURE, and has a different default convergence criterion (maximum log-likelihood change of 1e-6 between successive EM iterations). You will also need the program in the future, as more "calculators" will be disseminated for it.
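To make the convergence criterion concrete, here is a minimal sketch of a generic EM loop that estimates one individual's admixture proportions and stops once the log-likelihood changes by less than a tolerance between successive iterations. It is written in Python for illustration and is not DIYDodecad's actual code: the function names, the simulated data, and the assumption that the ancestral allele frequencies are known and fixed are all hypothetical choices made for this example.

```python
import math
import random


def estimate_admixture(genotypes, freqs, tol=1e-6, max_iter=100000):
    """Generic EM estimate of admixture proportions for one individual.

    genotypes: allele counts (0, 1 or 2) per marker; None marks a no-call.
    freqs:     freqs[j][k] is the frequency of the counted allele at marker j
               in ancestral population k (assumed fixed, strictly in (0, 1)).
    Returns (proportions, log-likelihood of those proportions, iterations).
    """
    k = len(freqs[0])
    q = [1.0 / k] * k              # start from equal proportions
    prev = -math.inf
    loglik = prev
    for iteration in range(1, max_iter + 1):
        counts = [0.0] * k         # expected allele copies from each population
        loglik = 0.0
        n_alleles = 0
        for g, f in zip(genotypes, freqs):
            if g is None:
                continue
            p1 = sum(q[m] * f[m] for m in range(k))  # P(counted allele)
            p0 = 1.0 - p1                            # P(other allele)
            if g > 0:
                loglik += g * math.log(p1)
            if g < 2:
                loglik += (2 - g) * math.log(p0)
            for m in range(k):
                if g > 0:
                    counts[m] += g * q[m] * f[m] / p1
                if g < 2:
                    counts[m] += (2 - g) * q[m] * (1.0 - f[m]) / p0
            n_alleles += 2
        # Stop when the log-likelihood gain falls below the goal.
        if loglik - prev < tol:
            return q, loglik, iteration
        prev = loglik
        q = [c / n_alleles for c in counts]
    return q, loglik, max_iter


def simulate_individual(true_q, n_markers, seed=0):
    """Simulate hypothetical genotypes and allele frequencies for testing."""
    rng = random.Random(seed)
    k = len(true_q)
    genotypes, freqs = [], []
    for _ in range(n_markers):
        f = [rng.uniform(0.05, 0.95) for _ in range(k)]
        g = 0
        for _ in range(2):         # two allele copies per marker
            pop = rng.choices(range(k), weights=true_q)[0]
            g += 1 if rng.random() < f[pop] else 0
        genotypes.append(g)
        freqs.append(f)
    return genotypes, freqs


# Simulated example: 70% / 30% mixture of two made-up populations.
genotypes, freqs = simulate_individual([0.7, 0.3], n_markers=2000)
for tol in (1e-4, 1e-6):
    q, loglik, n_iter = estimate_admixture(genotypes, freqs, tol=tol)
    print(tol, n_iter, [round(100 * x, 2) for x in q])
```

A stricter tolerance costs more iterations. Results obtained with different tolerances, or from a different starting point, can differ slightly in the last digits, which is consistent with the small differences described above.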

So, if you don't want to/can't join the Project, you can still get your Dodecad v3 results; you can also try the Dodecad Oracle with them. Also, feel free to leave a comment in this post with your results.

Related: some background on the creation of DIYDodecad

43 comments:

Der Frost, July 26, 2011 at 2:00 PM
Hello Dienekes,

This sounds great, as I missed the last submission deadline: I received my 23andMe results a few days after the last call expired. But I think you forgot to include a file in the .rar archive, more specifically the 'standardize.r' file. Please correct me if I'm wrong, but when calling for 'standardize.r', R should look in the working directory for this file in order to load its instructions, right? How can it load it when it's not in the DIYDodecad package? Your instructions in the 'readme' text file were pretty clear and straightforward, and I have previously run R for other analyses, such as Dodecad Oracle and the dated EuroDNACalc, which both worked... Can you help me? Thanks again!
Dodecad Project, July 26, 2011 at 3:38 PM
You are right, I will update the RAR
a, July 26, 2011 at 4:31 PM
Are you sure this is not number 1, LOL: "Most importantly, so that I won't have to do-it-myself ;-)"
Eduardo Pinto, July 26, 2011 at 8:00 PM
I have R installed, can I run it on MAC OS X?
anastas, July 26, 2011 at 8:01 PM
Would you update the main blog link when you add the missing file to the RAR?
Antonio Pedro, July 26, 2011 at 8:04 PM
Hi, how about Mac OS X users? Do you think the linux version can do it? Best,
SB, July 26, 2011 at 8:29 PM
Dienekes,
Thanks a bunch! The program works without flaws and the instructions are straightforward. I am sure this will remove a lot of the burden at your end :-).
Dienekes, July 26, 2011 at 8:31 PM
All links have been updated. I don't have plans for a Mac version at this time.
Dienekes, July 26, 2011 at 8:35 PM
I recommend that Mac OS X users try VirtualBox

http://www.virtualbox.org/wiki/Downloads

You can install a Linux or Windows guest system under that. I'm also sure there are ways to have Linux Live CD or Live USB, which would also do the trick. There are also Windows/DOS emulators for Mac that might also work, as the program is command-line only so it ought to work with something like that.
Larry, July 26, 2011 at 9:22 PM
All I get when trying to download your DIY files is:

404 file not found error
Acid, July 26, 2011 at 10:16 PM
I can't do it, I probably don't understand the last step since the rest goes okay:

6. At your operating system command prompt, go to the working directory (using the 'cd' command)
and then enter:

In Windows:

DIYDodecadWin dv3.par

I use Windows, but I don't know what to do there. I tried different things, but each time it gives an error. Is there a clearer explanation that would make this easier? Perhaps this looks silly, I'm sorry xd
Dienekes, July 26, 2011 at 10:30 PM
"All I get when trying to download your DIY files is: 404 file not found error"

Not sure why; others have already downloaded and used it.

"I can't do it, I probably don't understand the last step since the rest goes okay: 6. At your operating system command prompt, go to the working directory (using the 'cd' command) and then enter:"

If you don't know how to use command line in Windows (like the old DOS commands), you can run the program from within R as follows:

system('DIYDodecadWin dv3.par')

R will start working furiously for about a quarter of an hour (depending on your hardware), but in the end it will print out the results.
Acid, July 26, 2011 at 11:36 PM
Very good, Dienekes, it works perfectly. I think other people will appreciate an explanation like this in future runs.

No significant changes in the results.

Thanks ;)
SB, July 27, 2011 at 12:36 AM
You get the 404 error if you try to download the contents of the .rar without downloading the whole archive. The link posted leads to the contents of the rar archive, so in the top left corner of the linked document page, select the option to download the original document, which should be a .rar file.
Lisa, July 27, 2011 at 9:39 AM
Awesome! I ran my Dad's 23andme file and got almost EXACTLY the same results! Thanks so much for creating this.
B.M., July 27, 2011 at 11:22 AM
Dear Dienekes,
When I started R it didn't work. The following reaction came:

> source('standardize.r')
Error in file(file, "r", encoding = encoding) :
cannot open the connection
In addition: Warning message:
In file(file, "r", encoding = encoding) :
cannot open file 'standardize.r': No such file or directory

May I ask you to help me?
Dienekes, July 27, 2011 at 12:01 PM
"When I started R it didn't work. The following reaction came:"

Did you change the directory to the working directory?
B.M., July 27, 2011 at 2:43 PM
Changing the directory failed. I did:

> setwd('C:\\Gebruikers\\My name\\Mijn documenten\\A - GENEALOGIE\\A - DNA\\A - Dodecad')
Error in setwd("C:\\Gebruikers\\My name\\Mijn documenten\\A - GENEALOGIE\\A - DNA\\A - Dodecad") :
cannot change working directory
Dienekes, July 27, 2011 at 3:13 PM
Just use the File menu to change your directory.
Charles, July 27, 2011 at 5:47 PM
At the command prompt I change to the working directory and type in dodecad.rar dv3.par, and the dodecad RAR opens in Notepad. I believe I need the Dodecadwin.exe file, but when I try to download that it gives me a 404 error. When I use the option on the left to download the file, it gives me a RAR file and not the EXE file that I need.
Dienekes, July 27, 2011 at 6:42 PM
All the needed files are included in the RAR file, which you can open with WinRar or other archiving software.
Alareiks Gadrauhts, July 27, 2011 at 7:26 PM
Is it possible for it to take as long as 45 minutes to get the results? After about 40 minutes I gave up because it was late. I will try again this evening.

thanks
Dienekes, July 27, 2011 at 7:49 PM
Running time depends on the genotype file and on the computer running the program. You can try running it overnight if it conflicts with other stuff you do on your computer.

Alternatively, you can edit the dv3.par file to set the goal to a looser threshold, e.g., 1D-4 (that is, 10^-4) rather than the default 1D-6 (10^-6). I don't recommend that, however, as it will render the results not directly comparable with those of other users of the software. I purposely set the threshold to 1D-6 because I figured most people can devote a fairly long time to getting this as right as possible, since it only needs to be done once.
Anonymous, July 28, 2011 at 5:31 AM
23andMe updates the raw data files from time to time. I ran the calculator with the data I sent you in February and the most recent data from 23andMe. The results came out the same.

ADMIXTURE

11.4% East_European
47.0% West_European
31.5% Mediterranean
9.3% West_Asian
0.8% South_Asian

DIYDodecad

11.56% East_European
47.14% West_European
31.23% Mediterranean
9.14% West_Asian
0.93% South_Asian
drouhin, July 29, 2011 at 2:27 PM
Huge thanks!!
drouhin, July 29, 2011 at 6:00 PM
I'm stuck on step 5 of the readme file.

This command:
standardize('brianmurphy.txt', company='23andme')

Gets this error:
Error in inherits(x, "data.frame") : object 'X' not found

I've changed the working directory and getwd() confirms it. The filename is correct and it's sitting in the working directory. Any suggestions?
Dienekes, July 29, 2011 at 6:06 PM
See difference between these two lines:

standardize('brianmurphy.txt', company='23andme')
standardize('brianmurphy.txt', company='23andMe')
drouhin, July 29, 2011 at 6:29 PM
Many thanks. Caps are important.
Alareiks Gadrauhts, July 29, 2011 at 9:34 PM
I just had to wait a little longer and then had the results. I've also done EURO-DNA-CALC and DodecadOracle and I am excited with the results.

thanks man!
Anonymous, August 1, 2011 at 9:39 AM
Download for DIYDodecad1.0.rar has NO file.
Error?
Rolf Berlin, August 1, 2011 at 10:43 PM
You have to download the whole archive. The link posted leads to the contents of the rar archive, so in the top left corner of the linked document page, select the option to download the original document, which should be a .rar file.
LS, August 2, 2011 at 4:27 PM
Like Larry, I got a 404 when I tried to download the 32bit Linux version, but downloaded from the mirror instead, and all was well. I think it took longer to download and compile R than it did to run my results.

I'm an adoptee. My biological mother (I've found that part of my family) is from Appalachia, with family in the US for many generations. The genealogy I've done suggests British Isles ancestry with Swiss German about 5 generations back. There is also family lore that we are part Native American, and my mtDNA haplogroup--A4a1--seems to confirm that, but tracing the maternal line gets difficult in the early 1800s. 23andme says no NA ancestry in the last 5 generations.

My father is unknown, but I have a picture of him and a possible surname: Franks. That suggests German, but so many USA-ans have "Ellis Island" names that it doesn't mean much. Here are my results:

------------------------------------------------------
-- DIY Dodecad v 1.0 ---------------------------------
..
166462 markers
12 ancestral populations
Genotype rate is .99847
Beginning EM iterations:
# 8350 loglik: -1.5859421236E+05 delta: 1.008E-06 goal: 1.000E-06
-----------------------------
FINAL ADMIXTURE PROPORTIONS:
8352 iterations
Log Likelihood = -1.5859421235E+05
-----------------------------
East_European 10.53%
West_European 49.63%
Mediterranean 26.30%
Neo_African 0.08%
West_Asian 10.34%
South_Asian 0.40%
Northeast_Asian 0.00%
Southeast_Asian 0.00%
East_African 0.00%
Southwest_Asian 1.67%
Northwest_African 1.04%
Palaeo_African 0.00%

And Dodecad Oracle:

> DodecadOracle(c(10.5,49.6,26.3,0.1,10.3,0.4,0.0,0.0,0.0,1.7,1.0,0.0))
[,1] [,2]
[1,] "CEU" "5.4635"
[2,] "N._European" "6.4288"
[3,] "Argyll_1KG" "7.3457"
[4,] "Orcadian" "7.6851"
[5,] "Orkney_1KG" "8.0759"
[6,] "German_D" "9.1499"
[7,] "French" "10.8453"
[8,] "French_D" "11.3824"
[9,] "Mixed_Germanic_D" "12.3325"
[10,] "Dutch_D" "13.8798"
> DodecadOracle(c(10.5,49.6,26.3,0.1,10.3,0.4,0.0,0.0,0.0,1.7,1.0,0.0),mixedmode=T)
[,1] [,2]
[1,] "12.9% Sephardic_Jews + 87.1% Argyll_1KG" "0.8293"
[2,] "14.5% S_Italian_Sicilian_D + 85.5% Argyll_1KG" "0.8452"
[3,] "77.7% Argyll_1KG + 22.3% Tuscan_X" "0.888"
[4,] "14% Sicilian_D + 86% Argyll_1KG" "0.9205"
[5,] "13.5% S_Italian_D + 86.5% Argyll_1KG" "0.9258"
[6,] "63.1% British_Isles_D + 36.9% Romanians_14" "1.0156"
[7,] "77.5% Argyll_1KG + 22.5% Tuscan_H" "1.061"
[8,] "17.6% Ashkenazy_Jews + 82.4% Orcadian" "1.0748"
[9,] "91.8% CEU + 8.2% Druze" "1.0755"
[10,] "90.9% CEU + 9.1% Turkish_D" "1.082"
pconroy, August 2, 2011 at 8:26 PM
Dienekes,

I downloaded and ran it on 9 family members - it works like a dream - bravo!!!

Here's one question, for me it took 16,500 iterations, while for my wife only about 8,000 iterations, and for others around 14,000 - any reason for this??
Dienekes, August 3, 2011 at 12:55 AM
Run it in "progress" mode to see how the admixture proportions change during the course of the computation.
Anonymous, August 3, 2011 at 10:22 AM
How to run it in "progress" mode? Is there a command? Thanks
pconroy, August 3, 2011 at 8:58 PM
Dienekes,

Yeah, "Progress" mode is very cool!
Tom, August 25, 2011 at 4:45 AM
Is there a table that shows what groups comprise each population? For example, which groups = Western European, Eastern European, etc. I've searched the blog, but don't seem to have found a post explicitly describing each population set.
Dodecad Project, August 25, 2011 at 12:04 PM
Read the README.txt
nutmeg, December 30, 2011 at 4:22 AM
DIY DODECAD IS NOT WORKING FOR ME

I'm new here, and I have attempted to run the DIY Dodecad on my own (using that R console thing, if that's what it's called), and to make a long story short, I did not get the percentages to come up.

Here's how far I got:

source('standardize.r')

standardize('me.txt', company='23andMe')

These two worked with no error messages, but when you get to here:

system('DIYDodecadWin dv3.par')

I enter the above command and nothing happens. Aren't the percentages supposed to come up, or is there more to it that needs to be done?
Dienekes, December 30, 2011 at 1:46 PM
First, you should use the latest version of the software

http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html

Second, the computation takes a while, the percentages do not show up right away.

You might also be in the wrong directory, so issue a

dir()

command after you use standardize to make sure that the genotype.txt file and all dv3 related files are in your working directory.
nutmeg, December 30, 2011 at 6:33 PM
Ok, I entered dir() and here is what I have:

[1] "7z920 (1).exe"
[2] "anna fuchs.txt - Copy.txt"
[3] "anna fuchs.txt - Copy.zip"
[4] "anna fuchs.txt.txt"
[5] "anna fuchs.txt.zip"
[6] "anna.txt"
[7] "anna.txt - Copy (2).txt"
[8] "anna.txt - Copy (3).txt"
[9] "anna.txt - Copy (4).txt"
[10] "anna.txt - Copy.txt"
[11] "annafuchs.txt.zip"
[12] "DIYDodecad2.0 (1).rar"
[13] "DIYDodecad2.0 (2).rar"
[14] "DIYDodecad2.0.rar"
[15] "DodecadOracleK12a.RData"
[16] "DodecadOracleV1 (1).RData"
[17] "DodecadOracleV1.RData"
[18] "dv3 (1).txt"
[19] "dv3 (2).alleles"
[20] "dv3.12.F"
[21] "dv3.par"
[22] "genome_Anna_Fuchs_Full_20110929213516.txt"
[23] "genotype.txt"
[24] "K12a (1).rar"
[25] "MDLP.zip"
[26] "me.txt - Copy (2).txt"
[27] "me.txt - Copy (3).txt"
[28] "me.txt - Copy.txt"
[29] "me.txt.r - Copy (2).txt"
[30] "me.txt.r - Copy.txt"
[31] "me.txt.r.txt"
[32] "standardize.r"

Please let me know if there's anything I'm missing. Thank you.
Dienekes, December 30, 2011 at 7:32 PM
You should extract the contents of the RAR file to your working directory. You have a whole bunch of stuff that has been renamed (1) (2) and so on. I'd make a fresh start if I were you and just extract the files of the RAR file to an empty directory.
nutmeg, December 30, 2011 at 10:40 PM
Thank you, I finally got it to work, and the final admixture proportions came up.

However (and I'll try my best to explain this properly), when I do a bychromosome analysis, why is it that the by-chromosome average gives a different result from the final admixture proportions you receive from DIY Dodecad? For example, I've noticed that when I calculate the average of each component across the chromosomes (multiply the number of SNPs on each chromosome by the percentage of the component, add up the totals for the chromosomes, then divide by the total number of SNPs), I come up with different percentages than the calculator's final admixture proportions.

For example, here is what the program gave me for the final numbers, without the by-chromosome

11.18% East_European
50.47% West_European
24.36% Mediterranean
0.00% Neo_African
10.01% West_Asian
0.36% South_Asian
1.25% Northeast_Asian
0.04% Southeast_Asian
0.00% East_African
2.24% Southwest_Asian
0.00% Northwest_African
0.08% Palaeo_African

I'll just use West European as an example: as you can see, that came out at 50.47%, but when I try to do the calculations myself (multiplying the percentages by the allele counts on the chromosomes), the total for that one came out as 49.99%. Why is this happening?
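The hand calculation described in this comment can be sketched as follows (in Python, for illustration). The per-chromosome SNP counts and percentages below are hypothetical placeholders, not values from any real run, and the function name is made up for this example. A gap such as 49.99% versus 50.47% is plausible because each chromosome's estimate is a separate fit to that chromosome's markers, so a SNP-weighted average of those fits need not equal a single fit to all markers at once. That is an inference, though, not something stated in the DIYDodecad documentation.

```python
def snp_weighted_average(per_chromosome):
    """SNP-weighted average of one component across chromosomes.

    per_chromosome: list of (snp_count, percent) pairs, one per chromosome.
    """
    total_snps = sum(n for n, _ in per_chromosome)
    weighted_sum = sum(n * pct for n, pct in per_chromosome)
    return weighted_sum / total_snps


# Hypothetical values for three chromosomes (illustration only).
west_european = [(14000, 52.0), (13000, 47.5), (9000, 51.0)]
print(snp_weighted_average(west_european))
```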