Now, it is time to assess the results and see what improvements can be made. I see a few avenues for improvement:
Clusters, by definition, are composed of at least 2 individuals. Individuals who are the only representatives of their populations (e.g., a lone Pygmy, or someone of mixed Icelandic+Armenian ancestry) will, by necessity, attach themselves to the closest cluster (e.g., to Yoruba, or to some Central European population), even though they are not necessarily close to that population.
Outlier detection is a difficult problem, but I will try some ideas on how to tackle it.
mclust is not immune to phantom clusters, i.e., clusters of "misfits" who don't belong in any other population but are banded together erroneously by the algorithm. That is inevitable in an automated procedure, especially one that is pushing the limits of ancestry inference. Phantom clusters are, by their nature, transient, and there are some ideas on how to avoid them by focusing on very robust and repeatable clusters.
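One possible way to make "robust and repeatable" concrete is sketched below in Python. It is an illustration under my own assumptions, not a procedure used in this analysis: the clustering is assumed to have been run several times, each run is given as a list of cluster labels (one per individual), and a cluster's repeatability is scored by its best Jaccard overlap with any cluster in each of the other runs. The function names and the toy labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of individuals."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def clusters_from_labels(labels):
    """Group individual indices by their cluster label."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    return groups


def cluster_stability(reference, other_runs):
    """For each cluster in the reference run, average its best Jaccard
    match over the other runs. Labels are arbitrary across runs, so
    clusters are matched by membership, not by label."""
    scores = {}
    for lab, members in clusters_from_labels(reference).items():
        best = []
        for run in other_runs:
            candidates = clusters_from_labels(run).values()
            best.append(max(jaccard(members, c) for c in candidates))
        scores[lab] = sum(best) / len(best)
    return scores


# Hypothetical toy data: 8 individuals, three runs of the clustering.
reference = ["A", "A", "A", "B", "B", "B", "C", "C"]
run2 = [0, 0, 0, 1, 1, 1, 0, 1]   # "C" dissolves into the other two clusters
run3 = [0, 0, 0, 1, 1, 1, 2, 0]   # "C" splits again
scores = cluster_stability(reference, [run2, run3])
for lab in sorted(scores):
    print(lab, round(scores[lab], 3))
```

In this toy example the cluster labelled "C" scores much lower than "A" and "B", which is the pattern one would expect from a transient grouping. Where to put the cutoff is a judgment call that this sketch does not settle.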
Being part of a cluster tells you nothing about how "typical" a member of the cluster you are, i.e., how close to the average. This problem is exacerbated by the fact that the clusters inferred by mclust may have varying shape, size, and orientation.
Nonetheless, there are ideas on how to quantify members' typicality, and I will explore them. Please note that typicality is not necessarily the same as "purity". For example, an elongated cluster of African Americans will have typical members with 20% European admixture, but the "purest" African Americans will have 0% European admixture and be very atypical of their group as a whole. Similarly, typical Turks have 5-6% East Eurasian admixture; people with 10% East Eurasian admixture are less typical of the group, yet more likely to be descended from Central Asian Turkic people.
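As an illustration of why cluster shape matters for typicality, here is a minimal Python sketch that scores an individual by Mahalanobis distance from the cluster mean. This is one standard measure, chosen by me for the example; it is not necessarily the measure that will be adopted here. The sketch assumes two dimensions for brevity, and the coordinates are made up.

```python
import math


def mean_and_cov(points):
    """Mean and sample covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Distance from the cluster mean, scaled by the cluster's own spread
    along each direction (uses the closed-form inverse of a 2x2 matrix)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical elongated cluster: wide along x, very narrow along y.
cluster = [(-4, 0.1), (-2, -0.2), (0, 0.0), (2, 0.2), (4, -0.1)]
mean, cov = mean_and_cov(cluster)

along_axis = (3, 0)   # 3 units from the mean, along the long axis
off_axis = (0, 3)     # 3 units from the mean, across the narrow axis
print(mahalanobis(along_axis, mean, cov))
print(mahalanobis(off_axis, mean, cov))
```

Both test points are the same Euclidean distance from the cluster mean, but the off-axis one comes out far less typical because the cluster barely varies in that direction. A plain distance to the centroid would miss this.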
Any new technique will have its birth pains, and hopefully others and I will be able to identify and resolve them.