from the after-getting-my-ancestry-results-I-bought-all-that-lederhosen-for-nothing? dept.
Study reveals flaws in popular genetic method:
The most common analytical method within population genetics is deeply flawed, according to a new study from Lund University in Sweden. This may have led to incorrect results and misconceptions about ethnicity and genetic relationships. The method has been used in hundreds of thousands of studies, affecting results within medical genetics and even commercial ancestry tests. The study is published in Scientific Reports.
"It is expected that this method will give correct results because it is so frequently used. But it is neither a guarantee of reliability nor produces statistically robust conclusions," says Dr. Eran Elhaik, Associate Professor in molecular cell biology at Lund University.
According to Elhaik, the method helped create old perceptions about race and ethnicity. It plays a role in manufacturing historical tales of who and where people come from, not only by the scientific community but also by commercial ancestry companies. [...]
The field of paleogenomics, where we want to learn about ancient peoples and individuals such as Copper Age Europeans, relies heavily on principal component analysis (PCA). PCA is used to create a genetic map that positions the unknown sample alongside known reference samples. Thus far, the unknown samples have been assumed to be related to whichever reference population they overlap or lie closest to on the map.
However, Elhaik discovered that the unknown sample could be made to lie close to virtually any reference population just by changing the numbers and types of the reference samples (see illustration), generating practically endless historical versions, all mathematically "correct," but only one may be biologically correct.
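The effect described above can be reproduced with a deliberately artificial toy model. The sketch below is not the study's data or code: the six reference populations, their coordinates, the panel sizes and the unknown sample are all made-up assumptions, and the PCA is a small dependency-free power-iteration version written only for this illustration. It fits a two-component PCA to two reference panels that differ only in how many samples each population contributes, then reports which population the same unknown sample lies closest to on each map.

```python
import math

# Hypothetical reference populations at the ends of three orthogonal axes.
CENTROIDS = {
    "X+": (1, 0, 0), "X-": (-1, 0, 0),
    "Y+": (0, 1, 0), "Y-": (0, -1, 0),
    "Z+": (0, 0, 1), "Z-": (0, 0, -1),
}

def mean_and_covariance(rows):
    """Column means and sample covariance matrix of a list of points."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    return mean, cov

def top_components(cov, k, iters=500):
    """Leading k eigenvectors of cov, by power iteration with deflation."""
    dim = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(dim)]  # arbitrary starting vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j]
                  for i in range(dim) for j in range(dim))
        comps.append(v)
        # Remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return comps

def nearest_population(sizes, unknown, n_components=2):
    """Fit PCA on the reference panel only, then place the unknown on the map."""
    labels = [name for name, n in sizes.items() for _ in range(n)]
    rows = [CENTROIDS[name] for name in labels]
    mean, cov = mean_and_covariance(rows)
    comps = top_components(cov, n_components)

    def project(p):
        return [sum((p[i] - mean[i]) * c[i] for i in range(len(p)))
                for c in comps]

    target = project(unknown)
    distances = {}
    for name in sizes:
        pts = [project(r) for r, lab in zip(rows, labels) if lab == name]
        centroid = [sum(p[d] for p in pts) / len(pts)
                    for d in range(n_components)]
        distances[name] = math.dist(target, centroid)
    return min(distances, key=distances.get), distances

# Same populations, same unknown sample; only the sample counts differ.
PANEL_A = {"X+": 50, "X-": 50, "Y+": 40, "Y-": 40, "Z+": 5, "Z-": 5}
PANEL_B = {"X+": 50, "X-": 50, "Y+": 5, "Y-": 5, "Z+": 40, "Z-": 40}
UNKNOWN = (0.1, 0.7, 0.9)  # in the full 3-D space this lies closest to Z+

for label, panel in (("A", PANEL_A), ("B", PANEL_B)):
    name, _ = nearest_population(panel, UNKNOWN)
    print(f"panel {label}: unknown sample lies closest to {name}")
```

In this toy setup the unknown sample lands next to Y+ under panel A and next to Z+ under panel B, even though nothing about the sample itself changed. The two-dimensional map keeps only the axes along which the reference panel varies most, so under-sampling a population effectively removes its axis from the map.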
[...] Between 32,000 and 216,000 scientific articles in genetics alone have employed PCA for exploring and visualizing similarities and differences between individuals and populations and based their conclusions on these results.
"I believe these results must be re-evaluated," says Elhaik.
[...] "Techniques that offer such flexibility encourage bad science and are particularly dangerous in a world where there is intense pressure to publish. If a researcher runs PCA several times, the temptation will always be to select the output that makes the best story", adds Prof. William Amos, from the University of Cambridge, who was not involved in the study.
Journal Reference:
Elhaik, E. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated [open]. Sci Rep 12, 14683 (2022). DOI: 10.1038/s41598-022-14395-4
(Score: 4, Interesting) by shrewdsheep on Thursday September 15 2022, @08:29AM
PCA is a dimension reduction technique that, when applied to matrices of genetic data (individuals/markers), tends to reveal clusters that are interpreted as underlying populations. The author claims that distances in the output space (the PCs) where these clusters are visible should not be interpreted as they currently are.
It is well known that PCs are highly sensitive to small changes in the data; in this sense the article is correct that interpretation is difficult. However, distances can be re-calibrated to achieve desired interpretations in analogous methods such as multidimensional scaling (MDS), and stability can be improved by regularization techniques.
The criticism, however, does not pertain to so-called genome-wide association studies (GWAS), where the same technique is used to correct for bias in statistical tests. The author probably counts these studies among the 32,000-216,000 he claims are biased. IMO the criticism is valid only for a couple of hundred studies where genetic ancestry is interpreted. It should also be added that bias per se is not a problem as long as it does not increase false positive rates above pre-specified levels.