Dodecad Ancestry Project: Geno 2.0 patch for DIYDodecad

Friday, November 30, 2012

Geno 2.0 patch for DIYDodecad

(See important update at the end of this post)

People who have tested using the Genographic Project's Geno 2.0 test can now use the DIYDodecad tool with their data. The raw data download from this test has a slightly different format from those of 23andMe and Family Finder, so it is necessary to convert your data into a format that DIYDodecad can interpret.

So, after you have downloaded and extracted the DIYDodecad software as per its instructions, you should also download a couple of extra files into your working directory; these files are included in this patch:

standardize.r, which replaces the standardize.r in the DIYDodecad software bundle and allows you to convert your Geno 2.0 formatted data.
hgdp.base.txt, which includes additional information about SNP markers that is not found in your Geno 2.0 raw data download, and which is necessary to complete the conversion process.

Once these two files have been extracted into your working directory, the process of using DIYDodecad is exactly the same as for any other user of the software.

The only difference is at the step where you convert your data using the standardize command (see the DIYDodecad README file). At that step, you will use the command:

standardize('johndoe.csv', company='geno2')

where johndoe.csv is your unzipped raw data download. This will write a genotype.txt file in the working directory, and you can proceed the rest of the way as per the instructions.
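
For readers who prefer to unpack the download with a script rather than opening it in another program first, here is a minimal sketch in Python (not part of the DIYDodecad bundle). It assumes the raw data arrives as a gzip-compressed file such as johndoe.csv.gz; the function name gunzip_raw_data and the file names are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_raw_data(gz_path, out_path=None):
    """Decompress a gzip-compressed raw data file byte for byte.

    gz_path is assumed to be something like 'johndoe.csv.gz' (hypothetical name).
    Returns the path of the decompressed file.
    """
    gz_path = Path(gz_path)
    if out_path is None:
        # Drop the final '.gz' suffix: 'johndoe.csv.gz' -> 'johndoe.csv'
        out_path = gz_path.with_suffix("")
    out_path = Path(out_path)
    # Copy the decompressed bytes unchanged, so the file is never re-saved
    # by a spreadsheet program or text editor along the way.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Calling gunzip_raw_data('johndoe.csv.gz') from the working directory would leave a johndoe.csv next to it, which can then be passed to standardize in R.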

You can use all ancestry calculators released by the Project (or indeed other projects); the most recent one is globe13.

You should be aware that, because the Geno 2.0 test includes a smaller number of SNPs, and because globe13 and other calculators were developed using the common SNP set of 23andMe and Family Finder, the analysis using globe13 will only include ~34 thousand SNPs and will be "noisier" than usual. In the future, I might develop new calculators that make use of the SNP set of the Geno 2.0 test itself.

PS: Feel free to post a comment below if you experience any difficulty converting your data; also, thanks to CeCe Moore for graciously sharing a raw data file with me, which allowed me to build this converter.

UPDATE:

Apparently, the data format has changed for some Geno 2.0 data downloads.
If your data begins with a [Header] ... [Data] preamble followed by a list of 5 comma-separated values, you can ignore this update and follow the instructions above unchanged.
If it begins with the header "SNP,Chr,Allele1,Allele2" followed by a list of 4 comma-separated values, you should follow the instructions above, but use company='geno2new' instead of company='geno2'.
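
If you are unsure which of the two layouts you have, the first line of the unzipped file is enough to tell them apart. The following is a rough sketch in Python (not part of the patch) that looks at the first non-empty line and suggests a value for the company argument. The function name guess_company is hypothetical, and the check relies only on the two markers described in this update.

```python
def guess_company(csv_path):
    """Suggest 'geno2', 'geno2new', or None from the first non-empty line.

    Assumes the two layouts described in the post: a '[Header]' preamble
    (older layout) or a 'SNP,Chr,Allele1,Allele2' header (newer layout).
    """
    # 'utf-8-sig' silently drops a byte-order mark if one is present.
    with open(csv_path, "r", encoding="utf-8-sig", errors="replace") as fh:
        for line in fh:
            first = line.strip()
            if first:
                break
        else:
            # Empty file, or nothing but blank lines.
            return None
    if first.startswith("[Header]"):
        return "geno2"
    if first.replace(" ", "").lower() == "snp,chr,allele1,allele2":
        return "geno2new"
    # Unrecognized layout: better to inspect the file by hand than to guess.
    return None
```

The returned string can be used as the company argument of standardize in R; a result of None suggests the file is not in either layout, for example because it is still compressed or was re-saved by another program.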

11 comments:

Anonymous, December 10, 2012 at 2:42 AM
Found you via Cece Moore's blog... Just got my Geno 2.0 results, so downloaded your program and the patch for Geno 2.0 with no problem, but my program has been running for 3+ hours now and still not done.

How long should it take? The R program has returned 1 line ([Header],,,,,) and that's it. No file called genotype.txt in my directory yet.

Thanks.

Dienekes, December 10, 2012 at 3:32 PM
Send me your data if you want to dodecad@gmail.com and I'll take a look at it.

Sgt, December 11, 2012 at 10:18 PM
The same with me; only it did not take long

Dienekes, December 11, 2012 at 10:35 PM
I have run the patch on 3 different files sent to me and it produces a genotype.txt file just fine.

I can think of a few reasons why it might not work for you:

(1) you did not download hgdp.base.txt
(2) you did not uncompress your .csv.gz download file
(3) you did not setwd to the working directory

Sgt, December 11, 2012 at 10:35 PM
Disregard; have proceeded successfully. Thank you.

Stephen50, December 19, 2012 at 12:54 AM
This worked for me, even though I am new to R. You should change this line at my brackets []: "The only difference is that [change AT to AFTER] the step where you convert your data using the standardize command (see DIYDodecad README file), you will use the command - - " I kept trying to run the patch instead of standardize instead of following it.

Anonymous, December 27, 2012 at 5:29 AM
Figured out what I did wrong... The first time I unzipped my .csv.gz file I took a look at it in Excel. That must've corrupted the file somehow. Unzipped a copy tonight, and the size was about 180 bytes smaller. Using this new, smaller *.csv file from Geno2, I got a genotype.txt file within seconds.

Thank you.

mregdna, March 5, 2013 at 5:05 PM
Hello,

Is there a Linux version (I'm on Ubuntu)? If not, would it maybe work with Wine?

Thanks

mregdna, March 19, 2013 at 12:52 PM
Hello again,
If I understand correctly, it should also work on Ubuntu.
I tried, but when I run:
standardize('xxx.csv', company='geno2new')
R answers:
unexpected symbol on 'xxx.csv'

My genofile includes a header "SNP,Chr,Allele1,Allele2"

But just after I have the mtDNA data like "6248,Mt,T,T"
Does this correspond to what you call the list of 4 comma-separated values?

Thanks

FMR, June 2, 2013 at 3:35 AM
I have the new GENO 2 file type. I've changed the csv file to one more user friendly and, having set R's working directory to the folder where all the files are, I type the following at the prompt:

standardize('fredmr.csv', company='geno2new')

I get the following error message:

Error in is.data.frame(x) : object 'X' not found

What am I doing wrong?

Thanks