Sunday, November 28, 2010

Clusters galore: less is more, or, pushing the limits of ancestry inference

I had already hinted in my previous post on my new technique that retaining all MDS dimensions might add noise to the analysis, and I was hopeful that even finer resolution could be achieved with fewer dimensions.

In the spreadsheet you can see the optimal solution when only 10 dimensions are retained, in which case 45 clusters are inferred. Previously, I retained 47 dimensions and got only 35 clusters in the optimal solution: less is more.
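To make "retaining only some MDS dimensions" concrete, here is a minimal sketch in Python. It is not the code used for this analysis: it assumes classical (Torgerson) MDS on a distance matrix, uses random toy data rather than genotypes, and the function name `classical_mds` is made up for illustration. The clustering step itself is left out; the point is only that the retained coordinates are the leading columns of the full embedding, and those are what get passed on to clustering.

```python
import numpy as np

def classical_mds(dist, n_dims):
    """Classical MDS: embed points from a distance matrix, keeping n_dims coordinates."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; pick the n_dims largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues caused by rounding or non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy data (an assumption for illustration): 30 "individuals" in 5 dimensions.
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 5))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords_full = classical_mds(dist, 5)  # all informative dimensions
coords_few = classical_mds(dist, 2)   # only the two leading dimensions
```

In this sketch `coords_few` is simply the first two columns of `coords_full`; the later columns carry progressively less of the variation, which is where noise is most likely to dominate.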

In comparison to the previous analysis, I can detect some interesting changes:
  1. Spaniards and Portuguese are split from Tuscans and are joined by some French and North Italians; the rest of the French stay with White Utahns, and the rest of the North Italians stay with Tuscans.
  2. Romanians, too, get their own cluster.
  3. Turks are split from Assyrians/Armenians.
  4. Germans are split from Scandinavians, with one sample from each population falling in the other's cluster.
  5. The relationship between Cypriots and South Italians is retained, but most of the Greeks (many of whom were borderline between the Tuscan and South Italian clusters) go the Tuscan way.
I am now studying how to choose the optimal number of MDS dimensions to retain, so I will not report individual results from this analysis to project participants. I just wanted to let everyone share in the excitement.
