
Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets

  1. Noah A. Rosenberg (a, 1)
  1. (a) Department of Biology, Stanford University, Stanford, CA 94305;
  2. (b) Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada R3E 0J9;
  3. (c) Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
  1. Edited by Andrew G. Clark, Cornell University, Ithaca, NY, and approved April 10, 2017 (received for review December 6, 2016)

Significance

We describe a method for identifying observations in distinct genetic datasets that represent the same person. By using correlations among genetic markers that lie close to one another in the genome, the method can succeed even if the datasets contain no overlapping markers. We show that the method can link a dataset similar to those used in genomic studies with another dataset containing markers used in forensics. Our approach can help maintain backward compatibility with databases of existing forensic genetic profiles as systems move to new marker types. At the same time, it illustrates that the privacy risks arising from the cross-linking of databases are inherent even when only a small number of markers is stored.
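To make the linkage-disequilibrium idea concrete, here is a toy Python sketch of how one might score whether an STR record and a SNP record come from the same person. It is an illustration under stated assumptions, not the authors' pipeline: it assumes, hypothetically, that a reference panel has already supplied P(STR allele | flanking SNP haplotype) and population STR allele frequencies (the tables `COND` and `FREQ` and the locus name `LOCUS1` are invented), that SNP records are phased into two haplotypes, and that STR loci are independent. The score is a log-likelihood ratio of the observed STR genotype under "same person" versus "unrelated person".

```python
import math

EPS = 1e-12  # floor so that log() never sees an exact zero

# Hypothetical reference-panel estimates for ONE STR locus:
# P(STR allele, in repeat units | flanking SNP haplotype). Invented numbers.
COND = {
    "AC": {12: 0.80, 13: 0.15, 14: 0.05},
    "GT": {12: 0.10, 13: 0.70, 14: 0.20},
}
# Population STR allele frequencies, consistent with COND if the two
# haplotypes are equally common (e.g., 0.45 = 0.5 * 0.80 + 0.5 * 0.10).
FREQ = {12: 0.45, 13: 0.425, 14: 0.125}


def prob_given_haplotypes(a, b, h1, h2, cond):
    """P(unordered STR genotype {a, b} | phased SNP haplotypes h1, h2)."""
    p1, p2 = cond[h1], cond[h2]
    if a == b:
        return p1.get(a, 0.0) * p2.get(a, 0.0)
    # Either haplotype could carry either STR allele.
    return p1.get(a, 0.0) * p2.get(b, 0.0) + p1.get(b, 0.0) * p2.get(a, 0.0)


def prob_population(a, b, freq):
    """Hardy-Weinberg frequency of unordered STR genotype {a, b}."""
    if a == b:
        return freq.get(a, 0.0) ** 2
    return 2 * freq.get(a, 0.0) * freq.get(b, 0.0)


def locus_score(str_genotype, snp_haplotypes, cond, freq):
    """Log-likelihood ratio at one locus: same person vs. unrelated person."""
    a, b = str_genotype
    h1, h2 = snp_haplotypes
    p_same = max(prob_given_haplotypes(a, b, h1, h2, cond), EPS)
    p_unrelated = max(prob_population(a, b, freq), EPS)
    return math.log(p_same / p_unrelated)


def record_score(str_record, snp_record, panels):
    """Sum locus scores over STR loci, assuming the loci are independent.

    str_record maps locus -> (allele, allele); snp_record maps locus ->
    (haplotype, haplotype) for the SNPs flanking that locus; panels maps
    locus -> (cond, freq).
    """
    return sum(
        locus_score(str_record[locus], snp_record[locus], *panels[locus])
        for locus in str_record
    )


panels = {"LOCUS1": (COND, FREQ)}  # hypothetical locus name
print(record_score({"LOCUS1": (12, 13)}, {"LOCUS1": ("AC", "GT")}, panels))
```

A positive score means the SNP haplotypes make the observed STR genotype more likely than its population frequency alone would. In the example, the STR genotype (12, 13) paired with haplotypes ("AC", "GT") scores log(0.575 / 0.3825), roughly 0.41, whereas the genotype (14, 14) paired with ("AC", "AC") scores below zero.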

Abstract

Combining genotypes across datasets is central to advances in genetics. Data aggregation efforts often face the challenge of record matching: identifying dataset entries that represent the same individual. We show that records can be matched across genotype datasets that share no markers, based on linkage disequilibrium between loci appearing in different datasets. Using two datasets for the same 872 people, one with 642,563 genome-wide SNPs and the other with 13 short tandem repeats (STRs) used in forensic applications, we find that 90–98% of forensic STR records can be connected to their corresponding SNP records, and vice versa. Accuracy increases to 99–100% when ~30 STRs are used. Our method expands the potential of data aggregation, but it also suggests privacy risks intrinsic to maintaining databases that contain even small numbers of markers, including databases of forensic significance.
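Once every STR record has been scored against every SNP record, the records still have to be paired up. The Python sketch below is hypothetical and compares two simple decision rules on an invented 3-by-3 score matrix (`SCORES`, with true pairing `TRUTH`): taking each STR record's highest-scoring SNP record independently, and choosing the one-to-one pairing with the largest total score. Whether either rule matches the procedure behind the accuracies reported above is an assumption. The brute-force search over permutations is only practical for a handful of records; for a dataset of hundreds of people, a Hungarian-algorithm solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would be the usual substitute.

```python
from itertools import permutations

# SCORES[i][j]: hypothetical match score for STR record i vs. SNP record j
# (e.g., a summed log-likelihood ratio). The true pairing here is i <-> i.
SCORES = [
    [2.0, 0.1, -0.5],
    [0.3, 1.5, 0.8],
    [-0.4, 1.6, 1.2],
]
TRUTH = [0, 1, 2]


def best_match_per_row(scores):
    """Pick each STR record's top-scoring SNP record independently (may repeat)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def best_one_to_one_matching(scores):
    """Exhaustively find the permutation with the largest total score.

    perm[i] is the SNP record assigned to STR record i. Runtime is O(n!),
    so this is only for tiny illustrative examples.
    """
    n = len(scores)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm)


def matching_accuracy(perm, truth):
    """Fraction of STR records assigned to their true SNP record."""
    return sum(p == t for p, t in zip(perm, truth)) / len(truth)


row_wise = best_match_per_row(SCORES)
joint = best_one_to_one_matching(SCORES)
print(row_wise, matching_accuracy(row_wise, TRUTH))
print(joint, matching_accuracy(joint, TRUTH))
```

On this matrix the row-wise rule sends the second and third STR records to the same SNP record (accuracy 2/3), while the one-to-one constraint resolves the conflict and recovers all three true pairs. The trade-off is that the joint rule needs every record in both datasets to have a partner, which the row-wise rule does not assume.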

Footnotes

  • 1 To whom correspondence should be addressed. Email: noahr{at}stanford.edu.
  • Author contributions: M.D.E., B.F.B.A.-H., J.Z.L., and N.A.R. designed research; M.D.E., T.J.P., and N.A.R. performed research; M.D.E. and N.A.R. analyzed data; and M.D.E., B.F.B.A.-H., T.J.P., J.Z.L., and N.A.R. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.danielhellerman.com/lookup/suppl/doi:10.1073/pnas.1619944114/-/DCSupplemental.

Freely available online through the PNAS open access option.
