- Populations that are separated by clustering analysis have enough genetic differences between them to allow such separation
- Populations that are not separated may or may not have enough genetic differences between them to allow separation:
  - The clustering algorithm may have its limitations
  - The number of markers may be insufficient
  - The number of individuals may be insufficient
What constitutes a cluster? MCLUST can divide a dataset into as many clusters as you want, but it can also choose the number of clusters that optimizes the Bayesian Information Criterion (BIC). It's important to note, however, that the BIC is not some god-given arbiter of what a cluster is; it is best viewed as a guide for choosing a good number of clusters, not as a guarantee that this is the true number of clusters into which a population can be subdivided.
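To make the role of the BIC concrete, here is a minimal Python sketch of choosing the number of clusters by BIC. It assumes the "larger is better" sign convention, 2*logL - p*ln(n), and counts parameters for a Gaussian mixture with unconstrained covariances, which is only one of the model shapes a tool like MCLUST may consider. The function names and the log-likelihood values are hypothetical, chosen purely for illustration.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2*logL - n_params*ln(n_obs)."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)

def n_params_full_gmm(k, d):
    """Free parameters of a k-component Gaussian mixture in d dimensions,
    assuming each component has its own unconstrained covariance matrix."""
    per_component = d + d * (d + 1) // 2  # mean vector + symmetric covariance
    return k * per_component + (k - 1)    # plus k-1 free mixing proportions

def best_k(logliks, n_obs, d):
    """logliks maps K -> maximized log-likelihood; returns (best K, all BICs)."""
    scores = {k: bic(ll, n_params_full_gmm(k, d), n_obs)
              for k, ll in logliks.items()}
    return max(scores, key=scores.get), scores

# Hypothetical log-likelihoods for K = 1, 2, 3 (made-up numbers, not real data),
# for 15 individuals described by 2 MDS dimensions.
logliks = {1: -40.0, 2: -35.0, 3: -33.0}
k, scores = best_k(logliks, n_obs=15, d=2)
print(k, scores)
```

With these made-up numbers the likelihood improves with every added cluster, yet the penalty grows faster, so K=1 ends up with the highest BIC. That is the sense in which the BIC is a guide rather than a verdict.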
Clustering Assyrians and Armenians, assuming 2 clusters
To that end, I decided to cluster Assyrians and Armenians, forcing MCLUST to infer 2 clusters. I only retained 2 MDS dimensions, as these are enough to distinguish between 2 groups. Here is the MDS plot:
We can observe that Assyrians can be distinguished from Armenians along Dimension 1, and there is also some structure in Assyrians, with 3 of them forming a mini-cluster on top, 4 of them forming another mini-cluster at the bottom, and 1 of them being closer to the Armenians.
By applying MCLUST with K=2, all 7 Armenians and 1 Assyrian are assigned to one cluster, and the remaining 7 Assyrians to the other. Thus, the two groups can be separated from each other, although their differences (due to the factors I mentioned) are not large enough to lead to an improvement of the BIC.
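For readers who want to see what "forcing two clusters" on 2-dimensional coordinates looks like, here is a small Python sketch. It uses plain k-means with K fixed at 2 as a simpler stand-in; it is not MCLUST's model-based method, and the coordinates are hypothetical, not the actual MDS output.

```python
import math

def two_means(points, n_iter=100):
    """Split 2-D points into exactly two clusters with Lloyd's k-means (K=2)."""
    # Deterministic start: the two points at the extremes of dimension 1.
    centers = [min(points, key=lambda p: p[0]), max(points, key=lambda p: p[0])]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical (dimension 1, dimension 2) coordinates for six individuals.
coords = [(-1.0, 0.1), (-0.9, -0.2), (-1.1, 0.0),
          (1.0, 0.2), (0.9, -0.1), (1.1, 0.0)]
labels, centers = two_means(coords)
print(labels)
```

Because K is fixed in advance, the procedure always returns two groups; whether that split is meaningful is a separate question, which is what the BIC comparison addresses.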
With more individuals, it's possible that the BIC too will be able to track the improvement in likelihood that adding a second cluster will produce.
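A rough way to see why sample size matters: the BIC penalty for the extra parameters grows only with ln(n), whereas the log-likelihood gain from a real second cluster tends to grow with the number of individuals. The Python sketch below works under two loudly stated assumptions: that a second cluster in 2 dimensions costs 6 extra parameters (a mean, an unconstrained covariance, and one mixing proportion), and that the log-likelihood gain is a constant amount per individual. Both the function names and the gain values are hypothetical.

```python
import math

def extra_penalty(n_obs, extra_params=6):
    """Extra BIC penalty for the second cluster: extra_params * ln(n_obs)."""
    return extra_params * math.log(n_obs)

def smallest_n_favouring_two(gain_per_individual, extra_params=6, n_max=10_000):
    """Smallest n at which 2 * (n * gain) exceeds the extra penalty.

    Assumes the log-likelihood gain of K=2 over K=1 is n * gain_per_individual,
    a simplification for illustration. Returns None if no n <= n_max qualifies.
    """
    for n in range(2, n_max + 1):
        if 2.0 * n * gain_per_individual > extra_penalty(n, extra_params):
            return n
    return None

# Hypothetical per-individual gains in log-likelihood (not estimated from data).
for gain in (0.25, 0.5, 1.0):
    print(gain, smallest_n_favouring_two(gain))
```

The weaker the per-individual gain, the more individuals are needed before the BIC favours two clusters, which is consistent with closely related populations needing larger samples to be told apart.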
And, indeed, it may be possible that some of these populations will be further subdivided. If more Assyrians join the project, for example, the two apparent Assyrian clusters may emerge as distinct, or the space between them may be filled in by currently unsampled individuals.
As more individuals join the Project, ever-finer distinctions can be uncovered.
NB: I make the case for Assyrians and Armenians, but I have also tried this for other closely related populations who were assigned to the same cluster by Galore analysis, namely Spaniards and Portuguese. In that case, however, I was not able to find a clean solution with K=2. Once again, this may mean that Spaniards and Portuguese are not genetically that different, or it may mean that their difference will be revealed with more individuals joining the project.