OriginsDNA - Genetic Genealogy: identical by descent

Wednesday, April 15, 2015

Getting More From Your Autosomal DNA: Genetic Family Trees

For years, genealogists have been able to use Y-DNA to validate paternal pedigrees and sort surnames into family groups. This has been a great advantage for the world of genealogy, but it has been restricted to men and paternal lines. Autosomal DNA is more inclusive. Both women and men can take this test, and it illuminates the entire family tree as opposed to just the male line. For those of us who have taken an autosomal test, there are a number of tools that help find cousin matches. When we find multiple cousins matching the same chunk of DNA, we reach out to our new cousins and attempt to find a common ancestor in our trees. Many times this is unsuccessful due to incomplete trees. This is what is called a bottom-up approach.

What if we used a top-down approach? What if we started with your 10th great-grandmother? You’d say autosomal DNA can’t go back that far. That’s 12 generations ago and the DNA would be diluted to less than 1% of the original amount. If autosomal DNA behaved mathematically, you’d be correct. Autosomal DNA behaves more like Legos. When we inherit DNA from our parents, it’s true that we get 50% from mom and 50% from dad. That’s where the fairness ends. When we look at what we inherit from our grandparents (through our parents), it is never 50/50.

Instead, what we get from our grandparents is a random split. In the case of the illustration above, this grandchild received a 54/46 split. This is not uncommon. See this Slate article.

Our chromosomes behave like building blocks. There is a tendency for genes located closely on a chromosome to be inherited together in a block. This is called gene linkage. There is no set size for these blocks; size is completely based on the genes that tend to stay together. Segments around the 2 cM (centiMorgan) size have been found consistently (American Journal of Human Genetics). The DNA we get from our grandparents comes to us in large contiguous sections of hundreds of these blocks. From generation to generation, the large sections are inherited randomly and unfairly, but the building blocks have a tendency to stay intact and not recombine. With each generation, there is a 50% chance of inheriting or not inheriting a specific block.
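As a rough illustration of that 50% rule, here is a minimal Python sketch. It assumes each generation is an independent coin flip for a given block, which is a simplification, and the function name is made up for this example.

```python
def block_survival_chance(generations):
    """Chance that one specific block travels down a single line of descent,
    assuming an independent 50% chance of inheritance at every generation."""
    return 0.5 ** generations


# From a 10th great-grandparent to you is 12 generations.
chance_12 = block_survival_chance(12)
print(f"Chance a given block reaches one descendant: {chance_12:.4%}")
```

The figure for a single descendant is tiny, which is why the approach depends on pooling many descendants rather than relying on any one test.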

It’s possible that these 2 cM building blocks are about 25 generations old. So, when we start with our 10th great-grandparents, they have lots of these blocks that they inherited from their parents and gave to their children. What we can expect is that their descendants will have an assortment of these blocks from them and other ancestors. When we examine the autosomal DNA for two dozen of their descendants, we find a set of genetic blocks in common. No one descendant will have all the available genetic blocks an ancestor has left in the gene pool. We may find five descendants sharing a block on chromosome 1 and seven descendants sharing a block on chromosome 12. With DNA samples from two dozen descendants, about 15 ancestral genetic blocks can be identified. All of the ancestral genetic blocks taken together uniquely identify your 10th great-grandparents as a couple. Only their descendants would have this genetic block combination. (Except in the situation where one set of siblings marries another set of siblings from a different family.)
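The idea of collecting the blocks that several descendants share can be sketched in a few lines of Python. This is only an illustration: the descendant labels and block coordinates are invented, and it assumes shared blocks have already been reduced to identical (chromosome, start, end) tuples, whereas real segment data would need overlap handling.

```python
from collections import Counter

# Hypothetical data: blocks carried by each tested descendant,
# written as (chromosome, start, end). Coordinates are invented.
descendant_blocks = {
    "descendant_1": {(1, 1_000_000, 3_000_000), (12, 5_000_000, 7_000_000)},
    "descendant_2": {(1, 1_000_000, 3_000_000)},
    "descendant_3": {(12, 5_000_000, 7_000_000), (4, 2_000_000, 4_000_000)},
}


def shared_blocks(blocks_by_person, min_carriers=2):
    """Return each block carried by at least min_carriers people, with its count."""
    counts = Counter(
        block for blocks in blocks_by_person.values() for block in blocks
    )
    return {block: n for block, n in counts.items() if n >= min_carriers}


# The blocks seen in more than one descendant form the candidate ancestral set.
ancestral_haplotype = shared_blocks(descendant_blocks)
```

The block on chromosome 4 is dropped because only one descendant carries it, so nothing ties it to the common ancestors.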

When we take the process a step further and analyze the next generation, we start to build a genetic family tree.

The table above shows the genetic blocks identified for Stephen Hopkins and each of his children that had descendants. For simplicity, only one individual is listed for each column. Remember that each column of genetic blocks actually represents a married couple: Constance Hopkins and Nicholas Snow, Deborah Hopkins and Andrew Ring, etc. Each genetic block has a chromosome number and start and end locations. Blocks in green represent inherited blocks from Stephen to his children. As we build a genetic family tree, it now becomes possible to take a DNA sample from a living individual and match with Stephen Hopkins. Once a match with Stephen is found, matches to his children can be checked to see which child the sample descends from. Generations can be added to the genetic tree until known descendant DNA data has been exhausted. In the Hopkins family, I was able to extend Constance’s line by a generation to Mary Snow and then to Mary’s daughter, Mary Paine, before the data ran out.
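Placing a living person's sample on the genetic tree can be sketched as a walk from the top down. In the Python sketch below the family names come from the table, but every block coordinate is invented and the data structure and function name are assumptions for illustration, not the procedure used in the paper.

```python
# Hypothetical block sets; each entry stands for a couple, as in the table.
genetic_tree = {
    "Stephen Hopkins": {
        "blocks": {(1, 100, 200), (3, 500, 600), (7, 50, 90)},
        "children": ["Constance Hopkins", "Deborah Hopkins"],
    },
    "Constance Hopkins": {
        "blocks": {(1, 100, 200), (9, 300, 400)},
        "children": ["Mary Snow"],
    },
    "Deborah Hopkins": {
        "blocks": {(3, 500, 600), (11, 10, 40)},
        "children": [],
    },
    "Mary Snow": {
        "blocks": {(9, 300, 400), (14, 700, 800)},
        "children": [],
    },
}


def trace_line(sample_blocks, tree, root):
    """Follow block matches downward from root and return the matched names."""
    path = []
    candidates = [root]
    while candidates:
        # Simplification: take the first candidate sharing any block with the sample.
        match = next(
            (name for name in candidates if tree[name]["blocks"] & sample_blocks),
            None,
        )
        if match is None:
            break
        path.append(match)
        candidates = tree[match]["children"]
    return path


sample = {(1, 100, 200), (9, 300, 400)}
line_of_descent = trace_line(sample, genetic_tree, "Stephen Hopkins")
```

A sample with no block in common with Stephen returns an empty path, which is the "no relation" outcome.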

Similar to Y-DNA, these sets of genetic blocks (autosomal haplotype) can be used to identify genealogical relationships and sometimes the lack of relationships. John Hopkins of Connecticut has often been connected as a son of Stephen Hopkins. When we generate the autosomal haplotype for John and compare it to Stephen, we can see that there is no relation across the board.

The red blocks indicate John’s DNA segments that have no corresponding segments with Stephen. The yellow blocks indicate a similar chromosome location, but no genetic match. Y-DNA gives us the ability to use DNA to see how brothers are potentially connected. Now autosomal DNA gives us the ability to see how brothers and sisters are potentially connected.

The autosomal haplotyping process is not a silver bullet that will solve all of our genealogy problems. It will add to our toolkit as we validate family trees, work through brick walls and attempt to solve genealogy mysteries.

Reference:

Maglio, MR (2015) Autosomal Haplotypes and the Genetic Reconstruction of Family Trees (Link)

© 2015 Michael Maglio and OriginsDNA. All Rights Reserved.

Thursday, March 19, 2015

Triangulated Small Segments are Identical by Descent

Autosomal DNA segment matching is a complex issue. Through testing and observation, it is obvious that some segment matches are false positives. Computer algorithms will detect any matching allele with no knowledge of whether the allele is of paternal or maternal origin.

If we said that the left columns are from the father’s side and the right from the mother’s, we would see that none of the columns match. Obviously, we can’t just draw a line down the middle and say one side is the mother’s DNA. To determine which DNA came from mom and which came from dad, the autosomal results would need to be phased. To phase the results of an autosomal sample it must be compared to at least one parent result. By difference, the child result can be split into its paternal and maternal contributions.

If it were possible to phase every sample to be matched, false positives by computer algorithm would be eliminated. Unfortunately, phasing every sample is not always possible. A person’s parents may be deceased or even unknown.
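Phasing "by difference" against one parent can be sketched as follows. This is a minimal Python illustration that assumes genotypes are two-letter strings such as "AG" with no missing calls; the function name is made up for this example.

```python
def phase_site(child, parent):
    """Split a child genotype into (allele from this parent, allele from the other parent).

    Returns None when one parent alone cannot resolve the site, or when the
    two genotypes are inconsistent with each other.
    """
    first, second = child
    if first == second:
        # Homozygous child: both parents gave the same allele.
        return (first, second) if first in parent else None
    possible = {first, second} & set(parent)
    if len(possible) != 1:
        # Two candidates means ambiguous; zero means inconsistent data.
        return None
    given = possible.pop()
    other = second if given == first else first
    return (given, other)


child_genotypes = ["AG", "CT", "AG", "CC"]
mother_genotypes = ["AA", "TT", "AG", "CT"]
phased = [phase_site(c, m) for c, m in zip(child_genotypes, mother_genotypes)]
```

The third site stays unresolved because mother and child are both AG, which shows why a single parent does not always phase every position.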

Another method of reducing or eliminating false positives is to triangulate each matching segment. If a segment from autosomal sample A matches the corresponding segment from sample B and sample B matches sample C and sample C matches the original sample A, then the segment is considered triangulated and identical by descent. How confident are we that the triangulated matches aren’t just a circular series of false positives?
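The triangulation test itself is simple to express. The Python sketch below assumes a "half match" means sharing at least one allele at every position of the segment; the genotypes are invented, and real matching tools also apply cM and SNP-count thresholds that are ignored here.

```python
from itertools import combinations


def half_match(genotype_a, genotype_b):
    """True when two unphased genotypes share at least one allele."""
    return bool(set(genotype_a) & set(genotype_b))


def segment_half_match(sample_a, sample_b):
    """True when every position in the segment is a half match."""
    return len(sample_a) == len(sample_b) and all(
        half_match(a, b) for a, b in zip(sample_a, sample_b)
    )


def is_triangulated(samples):
    """True when every pair of samples matches on the segment."""
    return all(segment_half_match(a, b) for a, b in combinations(samples, 2))


# Invented genotypes for three tests over the same three positions.
sample_a = ["AG", "CT", "AA"]
sample_b = ["AA", "CC", "AG"]
sample_c = ["GG", "TT", "AA"]

a_matches_b = segment_half_match(sample_a, sample_b)
a_matches_c = segment_half_match(sample_a, sample_c)
all_three = is_triangulated([sample_a, sample_b, sample_c])
```

Here A matches B and A matches C, yet B and C do not match each other, so the segment fails triangulation.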

Let’s look at a segment on chromosome 3 that starts at rs6796502 and is 2.5 cM and 946 SNPs. For this exercise, any chromosome segment could be used.

Table 1. Allele frequencies of 20 loci on chromosome 3.

On that segment, there are 20 published locations with allele frequencies (NCBI). Table 1 shows how often a certain allele combination (AA, AC, AG etc.) appears for a European population. Based on allele frequency, the most common combination of alleles in this section of chromosome 3 for a population of European descent is listed in Table 2. I have artificially selected the most common combination to simulate a large portion of the population with European descent. About 1 in 3,400, or about 300,000 people, should have this combination.

Table 2. Predicted allele combination.

Imagine for a moment that you roll six dice. The first die comes up with a one and the second is a two and so on. The probability of rolling a one on the first die is 1/6 (one side up on a six-sided die). The probability of rolling a one and then a two is 1/6 times 1/6 or 1/36. It will happen once every 36 rolls. The combination illustrated on six dice would happen once in every 46,656 rolls. Now imagine that is your DNA and we are looking for a match. The other person would need one through six in the same order. To calculate that probability we multiply 46,656 by 46,656 and get 2,176,782,336. DNA matching actually has a better probability of matching.
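The dice arithmetic can be checked with exact fractions. This short Python sketch only restates the numbers above; the variable names are arbitrary.

```python
from fractions import Fraction

# One specific ordered result on six dice: (1/6) multiplied six times.
one_sequence = Fraction(1, 6) ** 6

# Two people independently showing that same ordered result.
two_sequences = one_sequence ** 2

print(f"One sequence: 1 in {one_sequence.denominator:,}")
print(f"Two matching sequences: 1 in {two_sequences.denominator:,}")
```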

Table 3 lists the most common alleles again along with potential alleles that would generate a half match and the corresponding summed frequency. The probability of the set of 20 potential combinations existing is equal to the product of the frequencies - 0.759. This probability has to be extrapolated from 20 loci to 946, giving us 2.45x10^-6 or 1 in 400,000. There is a 1 in 400,000 chance of a completely random match on this section of chromosome 3 for the alleles with the highest frequency. It is well within reason to expect false positives for this one-to-one match.

Table 3. Probability of a half match within a European population.
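One way to picture the extrapolation from 20 loci to 946 is to assume the remaining SNPs behave, on average, like the 20 sampled ones, so the product scales geometrically. That scaling rule is an assumption for this Python sketch; the post does not spell out the exact method, and the function name is made up.

```python
def extrapolate_probability(prob_sampled, loci_sampled, loci_total):
    """Scale a probability measured over loci_sampled loci up to loci_total loci,
    assuming every locus contributes the same average factor."""
    return prob_sampled ** (loci_total / loci_sampled)


p_segment = extrapolate_probability(0.759, 20, 946)
print(f"Estimated chance of a random half match: {p_segment:.2e}")
```

With the rounded input of 0.759 this lands a little above 2x10^-6, the same order of magnitude as the 2.45x10^-6 quoted above. The small gap would be expected if the published figure used an unrounded product or a slightly different extrapolation.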

In the event of a three-way match (triangulation), we multiply by 2.45x10^-6 again, giving us a probability of 1 in 167 billion. Now we are outside of what is statistically reasonable.

The most common set of European alleles doesn't produce the highest probability of a random match. When the alleles are not the same (AC, AG, CT etc.), there is a higher chance of an autosomal half match. Table 4 shows an actual set of alleles and the corresponding set of alleles to generate a half match.

Table 4. Probability of a half match within a European population using actual sample.

This actual sample takes us from a false positive probability of 1 in 400,000 to 1 in 5,900 (0.000169). A probability of 1 in 5,900 indicates that we should be seeing completely random matches that have no genetic relationship on a regular basis. Considering a population of about 1.6 million autosomal tests taken, each of us would have 270 false positive matches on a segment similar to the one shown.

Triangulated matches exist for this segment of chromosome 3. For the probability of this triangulated segment, we multiply by 0.000169 again, giving us 2.87x10^-8 or about 1 in 35 million. Considering the number of results available for matching (about 1.6 million), it is not realistic that we are matching randomly. In fact, most triangulated matches involve more than three test results. If four test results are triangulated, the probability goes to 1 in 205 billion. These probabilities indicate that triangulated results cannot be random and are matching due to common genetic descent.
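The multiplication used for triangulated matches can be reproduced in Python. The sketch follows the arithmetic in this post, where each additional test in the triangle multiplies the probability by the one-to-one figure once more; the function and variable names are made up for illustration.

```python
def triangulated_false_positive(p_pair, n_samples):
    """Probability that n_samples tests all match by chance, following the
    post's arithmetic: the one-to-one probability multiplied n_samples - 1 times."""
    return p_pair ** (n_samples - 1)


TESTS_AVAILABLE = 1_600_000
p_common = 2.45e-6   # one-to-one figure for the most common allele set
p_actual = 1 / 5900  # one-to-one figure for the actual sample

expected_random_matches = TESTS_AVAILABLE * p_actual
three_way_common = triangulated_false_positive(p_common, 3)
three_way_actual = triangulated_false_positive(p_actual, 3)
four_way_actual = triangulated_false_positive(p_actual, 4)

print(f"Expected random one-to-one matches: about {expected_random_matches:.0f}")
print(f"Three-way, common alleles: 1 in {1 / three_way_common:,.0f}")
print(f"Three-way, actual sample: 1 in {1 / three_way_actual:,.0f}")
print(f"Four-way, actual sample: 1 in {1 / four_way_actual:,.0f}")
```

Small differences from the rounded figures in the text (270 versus roughly 271, for example) come from using 1/5900 rather than 0.000169 as the input.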

I have intentionally used two examples that have a higher probability of having false positive matches. As soon as we look at matches that don’t have the higher frequency European alleles, the probability of a false positive diminishes.

Table 5. Probability of a half match within a European population with a Mediterranean sub-component.

Table 5 shows a typical set of alleles. There are two alleles at rs7630053 and rs4558783 that are not typically European and may indicate a Mediterranean ethnicity. The probability of a one-to-one match on this segment being a false positive calculates to be 1 in 7 quadrillion.

Currently, we cannot examine the allele frequency for every SNP in every match we attempt. When looking for autosomal matches consider phasing or triangulation. Phasing the data is very valuable, yet the resources are not always available. I’ve shown that triangulation eliminates false positives and those matches are statistically identical by descent. Triangulated small segment matching is very valuable in our research.

References:

Maglio, MR (2015) Autosomal DNA and the Triangulation of Small Segments: A Statistical Approach (Link)
