Twitter | Pretraživanje | |
Doc Edge
(Thread) Here’s my new preprint, with , on genetic privacy in genealogy databases that allow user uploads. (1/n)
Direct-to-consumer (DTC) genetics services are increasingly popular for genetic genealogy, with tens of millions of customers as of 2019. Several DTC genealogy services allow users to upload their...
bioRxiv bioRxiv @biorxivpreprint
Reply Retweet Označi sa "sviđa mi se" More
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
We also wrote an explainer that’s posted on Graham’s blog (2/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @FamilyTreeDNA @MyHeritage @LivingDNA
In this paper, we’re thinking about direct-to-consumer (DTC) genetic genealogy services that allow users to upload their own genetic datasets. Examples include GEDmatch, , , and . (3/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
There are advantages to these upload features: users find more biological relatives without having to be re-genotyped (or pay for new genotyping), and the services grow their databases, which lets them connect more people. (4/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
But there are always risks to sharing genetic information. For example, if a relative matches with you in a particular chromosomal location, then your relative learns something about your genotypes in that location, as pointed out by Leah Larkin (5/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
Our point in this paper is that if a user can upload many datasets, there are ways to learn genotypes that are supposed to be private. Depending on how the DTC service is configured, the majority of genotypes in the database can be revealed with a few hundred uploads. (6/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
We explore three ways of doing this. We call the first “tiling,” and to do it, an adversary would just upload many genomes (e.g. publicly available ones) and keep track of their chromosomal matches with genomes in the database. (7/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
We find that tiling works pretty well in a set of publicly available European genomes. (8/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
The second approach is “probing,” and you could use it to find genotypes at a specific genomic location even if the DTC service doesn’t have a chromosome browser. (9/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
The idea is to fill in most of the uploaded genome with fake genotypes designed to match no one, so any people returned as matches will match the uploaded dataset at the site of interest. (10/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
The third approach is called “baiting,” and it takes advantage of methods for identifying relatives that rely on unphased genetic data. (That is, genetic data where we don’t know which alleles go together on the same copy of a chromosome.) (11/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
You can identify relatives in unphased data by looking for long regions where two people are never homozygous for opposite alleles. (12/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
Though no longer cutting-edge, this method scales to big datasets. One big problem is that it’s easy to trick---upload a fake dataset that’s heterozygous at every site, and it will appear to match everyone everywhere, as if it were the parent of everyone in the database. (13/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
You can use this to reveal genotypes at specified locations. Upload a dataset that’s mostly het, but put in homozygous genotypes at the sites you want to know. If a person in the database has the opposite homozygous genotype, the inferred related segment will break. (14/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
Under the simplest version of this kind of relative-finding algorithm, we estimate that you could reveal enough genotypes to impute genome-wide genotypes of everybody in the database with 97%+ accuracy after about a hundred uploads. (15/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
And if you just care about one SNP---such as one that reveals APOE4 alleles and thus a lot of information about Alzheimer's risk, for example---you could get the genotypes of everyone in the database w/ 2 uploads of fake data. (16/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
Luckily, all the attacks we describe are preventable, or at least can be made inefficient, if DTC services use a subset of the countermeasures we describe (17/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @erlichya @itsikp @ShaiCarmi
Some countermeasures are easy, like only returning information about long chromosomal matches. Others, like requiring cryptographic signatures suggested by last year, are harder to implement but very effective (18/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
We wrote to all the DTC genealogy services we know of that allow uploads 90 days ago to share these methods and the countermeasures we recommend. They all wrote back to us, and some of them told us that some of these countermeasures are already in place. (19/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
We’d encourage all services that might potentially be affected by these kinds of attacks to share the countermeasures they have in place publicly. (20/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @DocEdge85
Genetic genealogy can be an amazing, empowering thing for people who want to find their biological relatives, including folks who wouldn’t be able to find them otherwise. Our goal (shared w/ the companies offering these services) is that people be able to do this safely. (21/n)
Reply Retweet Označi sa "sviđa mi se"
Doc Edge 22. lis
Odgovor korisniku/ci @ExtendGina
And of course, this is all part of a larger conversation about how we as a society want our genetic information used and want our genetic privacy protected. (Add it to a list of reasons to ) (end)
Reply Retweet Označi sa "sviđa mi se"