Autosomal DNA segment matching is a complex issue. Testing and observation make it
clear that some segment matches are false positives. Matching algorithms will report any matching
allele without knowing whether that allele is of paternal or maternal origin.
If we said that the left columns are from the father’s side and the right from the mother’s,
we would see that none of the columns match.
Obviously, we can’t just draw a line down the middle and say one side is
the mother’s DNA. To determine which DNA
came from mom and which came from dad, the autosomal results would need to be
phased. To phase an autosomal sample, it must be compared to the result of at least
one parent. By difference, the child’s result can then be split
into its paternal and maternal contributions.
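The "by difference" step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: genotypes are unphased two-letter strings, exactly one parent is available, and the function names (`phase_site`, `phase`) and the sample genotypes are hypothetical rather than taken from any testing company's tools.

```python
def phase_site(child, parent):
    """Split a child's unphased genotype at one SNP into
    (allele from the tested parent, allele from the other parent).

    Returns None when the site cannot be resolved from one parent alone
    (both child alleles could have come from that parent) or when the
    genotypes are inconsistent with a parent-child relationship.
    """
    a, b = child
    if a == b:
        # Homozygous child: each parent contributed the same allele.
        return (a, b) if a in parent else None
    a_ok, b_ok = a in parent, b in parent
    if a_ok and not b_ok:
        return (a, b)
    if b_ok and not a_ok:
        return (b, a)
    return None  # ambiguous (both fit) or inconsistent (neither fits)


def phase(child_genotypes, parent_genotypes):
    """Phase a list of sites; unresolved sites come back as None."""
    return [phase_site(c, p) for c, p in zip(child_genotypes, parent_genotypes)]


# Hypothetical genotypes at four SNPs.
child = ["AG", "CC", "CT", "AG"]
father = ["AA", "CT", "TT", "AG"]
phased = phase(child, father)
print(phased)
```

The last site stays unresolved because child and father are both AG, which is one reason a second parent or other relatives help when phasing real data.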
If it were possible to phase every sample to
be matched, false positives by computer algorithm would be eliminated. Unfortunately, phasing every sample is not
always possible. A person’s parents may
be deceased or even unknown.
Another
method of reducing or eliminating false positives is to triangulate each
matching segment. If a segment from autosomal
sample A matches the corresponding segment from sample B and sample B
matches sample C and sample C matches the original sample A, then the segment
is considered triangulated and identical by descent. How confident are we that the triangulated matches
aren’t just a circular series of false positives?
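As a rough sketch of the triangulation test described above, the following Python checks that every pair among a group of samples has a reported match and that those matches share a common region. The data layout (a dictionary keyed by sample pairs, segments as `(chromosome, start, end)` tuples) and the positions in the example are assumptions made for illustration only.

```python
from itertools import combinations


def overlap(seg1, seg2):
    """Return the overlapping (chromosome, start, end) of two segments, or None."""
    if seg1 is None or seg2 is None or seg1[0] != seg2[0]:
        return None
    start, end = max(seg1[1], seg2[1]), min(seg1[2], seg2[2])
    return (seg1[0], start, end) if start < end else None


def triangulated(samples, matches):
    """Return the region shared by every pairwise match among `samples`,
    or None if any pair is missing or the segments do not overlap.

    `matches` maps frozenset({x, y}) -> (chromosome, start, end).
    """
    shared = None
    for i, pair in enumerate(combinations(samples, 2)):
        seg = matches.get(frozenset(pair))
        if seg is None:
            return None
        shared = seg if i == 0 else overlap(shared, seg)
        if shared is None:
            return None
    return shared


# Hypothetical pairwise matches on chromosome 3 (positions are made up).
matches = {
    frozenset({"A", "B"}): (3, 1000, 5000),
    frozenset({"B", "C"}): (3, 2000, 6000),
    frozenset({"A", "C"}): (3, 1500, 4500),
}
region = triangulated(["A", "B", "C"], matches)
print(region)  # (3, 2000, 4500)
```

The same function accepts four or more sample names, in which case every pair among them must match on the shared region.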
Let’s look at a segment on chromosome 3 that
starts at rs6796502 and spans 2.5 cM and 946 SNPs.
For this exercise, any chromosome segment could be used.
Table 1. Allele frequencies of 20 loci on chromosome 3.
On that segment, there are 20 locations
with published allele frequencies (NCBI). Table 1
shows how often a given allele combination (AA, AC, AG, etc.) appears in
a European population. Based on allele
frequency, the most common combination of alleles in this section of chromosome
3 for a population of European descent is listed in Table 2. I have deliberately selected the most common
combination to simulate a large portion of the population of European
descent. About 1 in 3,400, or about 300,000 people,
should have this combination.
Table 2. Predicted allele combination.
Imagine
for a moment that you roll six dice. The
first die comes up with a one, the second with a two, and so on. The probability of rolling a one on the first
die is 1/6 (one side up on a six-sided die).
The probability of rolling a one and then a two is 1/6 times 1/6, or
1/36; on average, it will happen once every 36 rolls. The full one-through-six combination on six dice would
happen once in every 46,656 rolls. Now
imagine that is your DNA and we are looking for a match. The other person would need one through six
in the same order. To calculate that
probability we multiply 46,656 by 46,656 and get 1 in 2,176,782,336. DNA matching actually has a better probability of
matching.
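The dice arithmetic can be checked with exact fractions. The sketch below only reproduces the numbers in the paragraph above; the variable names are illustrative.

```python
from fractions import Fraction

p_face = Fraction(1, 6)          # one specific face on one die
p_sequence = p_face ** 6         # one specific ordered sequence on six dice
p_two_people = p_sequence ** 2   # two independent rolls both showing that sequence

print(1 / p_sequence)    # 46656
print(1 / p_two_people)  # 2176782336
```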
Table 3 lists the most common alleles again
along with potential alleles that would generate a half match and the
corresponding summed frequency. The
probability of the set of 20 potential combinations existing is equal to the
product of the frequencies, 0.759. This
probability has to be extrapolated from 20 loci to 946, giving us 2.45×10⁻⁶,
or 1 in 400,000. There is a 1 in 400,000
chance of a completely random match on this section of chromosome 3 for the
alleles with the highest frequency. It
is well within reason to expect false positives for this one-to-one match.
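One plausible way to carry out the extrapolation from 20 loci to 946 is to raise the 20-locus product to the power 946/20. The text does not spell out the exact method, so the sketch below is an assumption: it treats the loci as independent and the 20 sampled loci as representative of the rest. The function name `extrapolate` is hypothetical.

```python
def extrapolate(p_sampled, n_sampled, n_total):
    """Scale a joint probability from n_sampled loci to n_total loci,
    assuming independent loci with the same average per-locus probability."""
    return p_sampled ** (n_total / n_sampled)


p_segment = extrapolate(0.759, 20, 946)
print(f"{p_segment:.2e}  (about 1 in {1 / p_segment:,.0f})")
```

With the rounded input 0.759, this gives a value of a few parts per million, the same order of magnitude as the figure quoted above; the exact result depends on the unrounded product of the 20 frequencies.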
Table 3. Probability of a half match within a European population.
In the
event of a three-way match (triangulation), we multiply by 2.45×10⁻⁶
again, giving us a probability of 1 in 167 billion. Now we are outside of what is statistically
reasonable.
The
most common set of European alleles doesn't produce the highest probability of
a random match. When the two alleles at a locus are not
the same (AC, AG, CT, etc.), there is a higher chance of an autosomal half
match, because a match on either allele counts. Table 4 shows an actual set of
alleles and the corresponding set of alleles that would generate a half match.
Table 4. Probability of a half match within a European population using actual sample.
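To see why a mixed genotype such as AG half-matches more easily than AA, the summed-frequency idea can be written out as a short Python sketch. The genotype frequencies below are hypothetical and are not taken from the tables; the function names are likewise illustrative.

```python
import math


def half_match_probability(genotype, genotype_freqs):
    """Probability that a random person shares at least one allele with
    `genotype` at a single locus, given population genotype frequencies."""
    return sum(freq for g, freq in genotype_freqs.items()
               if set(g) & set(genotype))


def segment_probability(genotypes, freqs_per_locus):
    """Product of per-locus half-match probabilities, assuming independent loci."""
    return math.prod(half_match_probability(g, f)
                     for g, f in zip(genotypes, freqs_per_locus))


# Hypothetical genotype frequencies for one A/G SNP.
freqs = {"AA": 0.49, "AG": 0.42, "GG": 0.09}

p_homozygous = half_match_probability("AA", freqs)    # AA and AG qualify
p_heterozygous = half_match_probability("AG", freqs)  # every genotype qualifies
print(p_homozygous, p_heterozygous)
```

A person who is AG at this SNP half-matches everyone in the population at that position, so a run of mixed genotypes pushes the segment-level product toward 1.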
This
actual sample takes us from a false positive probability of 1 in 400,000 to 1
in 5,900 (0.000169). A probability of 1
in 5,900 indicates that we should be seeing completely random matches that have
no genetic relationship on a regular basis.
Considering a population of about 1.6 million autosomal tests taken,
each of us would have about 270 false positive matches on a segment similar to the
one shown.
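The figure of about 270 is simply the per-comparison probability multiplied by the number of available tests, as this short check shows (variable names are illustrative):

```python
p_false_positive = 0.000169   # per-comparison probability quoted above
database_size = 1_600_000     # approximate number of autosomal tests

expected_false_positives = p_false_positive * database_size
print(round(expected_false_positives))  # 270
```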
Triangulated
matches exist for this segment of chromosome 3.
For the probability of this triangulated segment, we multiply by 0.000169
again, giving us 2.87×10⁻⁸ or about 1 in 35 million. Considering the number of results available
for matching (about 1.6 million), it is not realistic that we are matching randomly. In fact, most triangulated matches involve
more than three test results. If four
test results are triangulated, the probability goes to 1 in 205 billion. These probabilities indicate that triangulated
results cannot be random and are matching due to common genetic descent.
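The triangulation arithmetic follows one pattern: each sample beyond the first multiplies in another factor of the pairwise false positive probability. The sketch below mirrors that approach as the text applies it; `triangulated_probability` is a hypothetical helper name, and the independence of the comparisons is an assumption carried over from the text.

```python
def triangulated_probability(p_pair, n_samples):
    """Chance that n_samples all match at random, following the text's
    approach of multiplying by the pairwise probability once per added sample."""
    return p_pair ** (n_samples - 1)


p_pair = 1 / 5900  # one-to-one false positive probability for the actual sample
for n in (2, 3, 4):
    prob = triangulated_probability(p_pair, n)
    print(f"{n} samples: about 1 in {1 / prob:,.0f}")
```

For three and four samples this lands at roughly 1 in 35 million and 1 in 205 billion, matching the figures above.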
I have
intentionally used two examples that have a higher probability of having false
positive matches. As soon as we look at
matches that don’t have the higher frequency European alleles, the probability
of a false positive diminishes.
Table 5. Probability of a half match within a European population with a Mediterranean sub-component.
Table 5
shows a typical set of alleles. There
are two alleles, at rs7630053 and rs4558783, that are not typically European and may
indicate a Mediterranean ethnicity. The
probability of a one-to-one match on this segment being a false positive
works out to 1 in 7 quadrillion.
Currently,
we cannot examine the allele frequency for every SNP in every match we
attempt. When looking for autosomal
matches, consider phasing or triangulation.
Phasing the data is very valuable, yet the resources are not always
available. I’ve shown that triangulation
eliminates false positives and those matches are statistically identical by
descent. Triangulated small segment
matching is very valuable in our research.
References:
Maglio, MR (2015) Autosomal DNA and the Triangulation of Small Segments: A Statistical Approach
© 2015 Michael Maglio and OriginsDNA. All Rights Reserved.