Identity inference of genomic data using long-range familial searches

09 November 2018, Vol. 362, Issue 6415

By Yaniv Erlich, Tal Shor, Itsik Pe’er, Shai Carmi
Science, 09 Nov 2018: 690-694
Genetic privacy is difficult to maintain in light of forensic searches of genetic genealogical databases.
Detecting familial matches
Recent advances in DNA technology and companies that provide array-based testing have led to services that collect, share, and analyze volunteered genomic information. Privacy concerns have been raised, especially in light of the use of these services by law enforcement to identify suspects in criminal cases. Testing models of relatedness, Erlich et al. show that many individuals of European ancestry in the United States, even those who have not undergone genetic testing, can be identified on the basis of available genetic information. These results indicate a need for procedures to help maintain genetic privacy for individuals.
Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement authorities have exploited some of these databases to identify suspects via distant familial relatives. Using genomic data of 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that about 60% of the searches for individuals of European descent will result in a third-cousin or closer match, which theoretically allows their identification using demographic identifiers. Moreover, the technique could implicate nearly any U.S. individual of European descent in the near future. We demonstrate that the technique can also identify research participants of a public sequencing project. On the basis of these results, we propose a potential mitigation strategy and discuss policy implications for human subject research.
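The central projection above links the fraction of a population held in a database to the chance that a search turns up a third-cousin or closer relative. The toy sketch below illustrates that intuition only and is not the authors' model. It assumes, hypothetically, an idealized pedigree in which every couple has exactly `children_per_couple` children. It also assumes that each relative is in the database independently with probability `db_fraction` and, if present, is detected with probability `detect_prob`. It counts only siblings and first- to third-degree cousins, ignoring parents, aunts, uncles, and other relatives. The function names and parameters are illustrative.

```python
def expected_relatives(degree: int, children_per_couple: float = 2.0) -> float:
    """Toy count of relatives of a given cousin degree (0 = siblings).

    Hypothetical idealized pedigree: an individual has 2**degree ancestral
    couples `degree + 1` generations up; each couple has (k - 1) children
    besides the individual's own ancestor, and each of those children founds
    a line of k**degree descendants in the individual's generation.
    """
    k = children_per_couple
    return (2 ** degree) * (k - 1) * (k ** degree)


def match_probability(db_fraction: float, max_degree: int = 3,
                      children_per_couple: float = 2.0,
                      detect_prob: float = 1.0) -> float:
    """Probability that at least one sibling or cousin up to `max_degree`
    is in the database and detected (hypothetical toy model).

    Each relative is assumed to be in the database independently with
    probability `db_fraction` and, if present, detectable with
    probability `detect_prob`.
    """
    p_hit = db_fraction * detect_prob
    p_miss_all = 1.0
    for degree in range(max_degree + 1):
        n = expected_relatives(degree, children_per_couple)
        # (1 - p_hit) ** n: chance that none of the n relatives of this degree is found
        p_miss_all *= (1.0 - p_hit) ** n
    return 1.0 - p_miss_all


if __name__ == "__main__":
    # Hypothetical coverage values, not figures from the paper.
    for frac in (0.005, 0.01, 0.02):
        print(f"coverage {frac:.1%}: P(match) = {match_probability(frac):.2f}")
```

With two children per couple the number of c-th cousins grows as 4**c (64 third cousins), so even a small coverage fraction makes at least one hit likely. Uneven database composition, real family sizes, and distant cousins who share no detectable DNA all pull the figure down, so the sketch is not meant to reproduce the 60% projection.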