OriginsDNA - Genetic Genealogy: chromosomes

Wednesday, April 15, 2015

Getting More From Your Autosomal DNA: Genetic Family Trees

For years, genealogists have been able to use Y-DNA to validate paternal pedigrees and sort surnames into family groups. This has been a great advantage for the world of genealogy, but it has been restricted to men and paternal lines. Autosomal DNA is more inclusive. Both women and men can take this test, and it illuminates the entire family tree as opposed to just the male line. For those of us who have taken an autosomal test, there are a number of tools that help find cousin matches. When we find multiple cousins matching the same chunk of DNA, we reach out to our new cousins and attempt to find a common ancestor in our trees. Many times this is unsuccessful due to incomplete trees. This is what is called a bottom-up approach.

What if we used a top-down approach? What if we started with your 10th great-grandmother? You’d say autosomal DNA can’t go back that far. That’s 12 generations ago and the DNA would be diluted to less than 1% of the original amount. If autosomal DNA behaved mathematically, you’d be correct. Autosomal DNA behaves more like Legos. When we inherit DNA from our parents, it’s true that we get 50% from mom and 50% from dad. That’s where the fairness ends. When we look at what we inherit from our grandparents (through our parents), it is almost never an even split.

Instead, what we get from our grandparents is a random split. In the case of the illustration above, this grandchild received a 54/46 split. This is not uncommon. See this Slate article.
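To see where the "less than 1%" figure comes from, here is a minimal Python sketch of the purely mathematical view, in which an ancestor's contribution is halved at every generation. The function name is invented for illustration, and the model deliberately ignores the block-like inheritance this post describes.

```python
from fractions import Fraction


def expected_fraction(generations_back: int) -> Fraction:
    """Average autosomal share from one ancestor if DNA simply halved each generation."""
    return Fraction(1, 2 ** generations_back)


# A 10th great-grandmother sits 12 generations back.
for g in (1, 2, 3, 12):
    share = expected_fraction(g)
    print(f"{g:>2} generations back: {share} ({float(share):.4%})")
```

At 12 generations the average works out to 1/4096, roughly 0.024%, which is the "diluted" number the mathematical view predicts.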

Our chromosomes behave like building blocks. There is a tendency for genes located closely on a chromosome to be inherited together in a block. This is called gene linkage. There is no set size for these blocks; size is completely based on the genes that tend to stay together. Segments around the 2 cM (centiMorgan) size have been found consistently (American Journal of Human Genetics). The DNA we get from our grandparents comes to us in large contiguous sections of hundreds of these blocks. From generation to generation, the large sections are inherited randomly and unfairly, but the building blocks have a tendency to stay intact and not recombine. With each generation, there is a 50% chance of inheriting or not inheriting a specific block.

It’s possible that these 2 cM building blocks are about 25 generations old. So, when we start with our 10th great-grandparents, they have lots of these blocks that they inherited from their parents and gave to their children. What we can expect is that their descendants will have an assortment of these blocks from them and other ancestors. When we examine the autosomal DNA for two dozen of their descendants, we find a set of genetic blocks in common. No one descendant will have all the available genetic blocks an ancestor has left in the gene pool. We may find five descendants sharing a block on chromosome one and seven descendants sharing a block on chromosome 12. With DNA samples from two dozen descendants, about 15 ancestral genetic blocks can be identified. All of the ancestral genetic blocks taken together uniquely identify your 10th great-grandparents as a couple. Only their descendants would have this genetic block combination. (Except in the situation where one set of siblings marries another set of siblings from a different family.)

When we take the process a step further and analyze the next generation, we start to build a genetic family tree.

The table above shows the genetic blocks identified for Stephen Hopkins and each of his children who had descendants. For simplicity, only one individual is listed for each column. Remember that each column of genetic blocks actually represents a married couple: Constance Hopkins and Nicholas Snow, Deborah Hopkins and Andrew Ring, etc. Each genetic block has a chromosome number and start and end locations. Blocks in green represent inherited blocks from Stephen to his children. As we build a genetic family tree, it now becomes possible to take a DNA sample from a living individual and match with Stephen Hopkins. Once a match with Stephen is found, matches to his children can be checked to see which child the sample descends from. Generations can be added to the genetic tree until known descendant DNA data has been exhausted. In the Hopkins family, I was able to extend Constance’s line by a generation to Mary Snow and then to Mary’s daughter, Mary Paine, before the data ran out.
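As a rough sketch of that lookup, the Python below walks a sample down a genetic tree: first check for a match with the ancestor, then follow whichever child column shares the most blocks. Everything here is an assumption made for illustration: the block coordinates are invented, the names `HAPLOTYPES`, `CHILDREN` and `trace_line` are hypothetical, and treating a match as exact equality of (chromosome, start, end) is a simplification of real segment comparison.

```python
from typing import Dict, FrozenSet, List, Tuple

Block = Tuple[int, int, int]  # (chromosome, start position, end position)

# Invented example data, NOT the real Hopkins haplotypes.
HAPLOTYPES: Dict[str, FrozenSet[Block]] = {
    "Stephen": frozenset({(1, 100, 200), (5, 300, 400), (12, 50, 150)}),
    "Constance": frozenset({(1, 100, 200), (7, 10, 90)}),
    "Deborah": frozenset({(5, 300, 400), (9, 500, 600)}),
}
CHILDREN: Dict[str, List[str]] = {"Stephen": ["Constance", "Deborah"]}


def trace_line(sample: FrozenSet[Block], root: str) -> List[str]:
    """Return the chain of ancestors, starting at root, whose blocks the sample shares."""
    path: List[str] = []
    current = root
    while sample & HAPLOTYPES[current]:
        path.append(current)
        # Score each child column by the number of blocks shared with the sample.
        scored = [(len(sample & HAPLOTYPES[c]), c) for c in CHILDREN.get(current, [])]
        scored = [pair for pair in scored if pair[0] > 0]
        if not scored:
            break  # no deeper generation matches, or the data ran out
        current = max(scored)[1]
    return path


living_sample = frozenset({(1, 100, 200), (7, 10, 90)})
print(trace_line(living_sample, "Stephen"))
```

Adding a generation to the tree means adding one more entry to each dictionary, so the walk stops naturally where the known descendant data ends.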

Similar to Y-DNA, these sets of genetic blocks (autosomal haplotype) can be used to identify genealogical relationships and sometimes the lack of relationships. John Hopkins of Connecticut has often been connected as a son of Stephen Hopkins. When we generate the autosomal haplotype for John and compare it to Stephen, we can see that there is no relation across the board.

The red blocks indicate John’s DNA segments that have no corresponding segments with Stephen. The yellow blocks indicate a similar chromosome location, but no genetic match. Y-DNA gives us the ability to use DNA to see how brothers are potentially connected. Now autosomal DNA gives us the ability to see how brothers and sisters are potentially connected.

The autosomal haplotyping process is not a silver bullet that will solve all of our genealogy problems. It will add to our toolkit as we validate family trees, work through brick-walls and attempt to solve genealogy mysteries.

Reference:

Maglio, MR (2015) Autosomal Haplotypes and the Genetic Reconstruction of Family Trees (Link)

© 2015 Michael Maglio and OriginsDNA. All Rights Reserved.

Thursday, March 5, 2015

Breaking Through the Autosomal DNA Generation Barrier: Connecting to Distant Ancestors

There has been much debate over the use of small autosomal DNA segments. It is important to understand where they come from and how they can be used for genetic genealogy. Small segments are often considered noise and false matches. There are too many small matches to make sense out of, but they are not necessarily false matches. These segments have been in the population for longer than we thought. When I match someone at 2 cM it is very likely that they are a 12th cousin, not a 5th cousin. There is no reason for us to look for small segment matches until we understand where these segments originated.

When we talk about autosomal DNA, we often oversimplify the process of genetic inheritance. The simple answer is that we inherit half of our DNA from dad and half from mom. The common message is that with every generation the DNA contribution from an ancestor is randomized and reduced until it is insignificant. Genetic inheritance is actually much more complex than that. Complex in a great way. There is a tremendous amount of ancestral information that we are just beginning to tap into.

We inherit DNA from our parents and their ancestors in large sections. Take a look at the graphic below. Each example is the comparison of a grandchild to a set of paternal grandparents. You can see in the first example that the grandchild inherited over two-thirds of their grandfather’s first chromosome intact (blue bars). The remaining section of the first chromosome is from their grandmother. In the third example, the grandchild has inherited the entire chromosome 14 from their grandmother. It is physically possible that this grandchild could someday give one of their children the grandmother’s complete chromosome 14.

In an effort not to oversimplify, this is just half the story. That grandchild has an equal contribution from their maternal grandparents.

In the examples above, we can visualize what happens when DNA recombines. The first example shows where one section of the grandfather’s DNA swapped places with the grandmother’s DNA before it was inherited by the grandchild. This is called crossover. In the examples, a) is a single crossover, b) is a double crossover and c) has no crossover. On average, each of our chromosomes experienced 2 or 3 crossovers before we inherited them.
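The three cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `recombine` is hypothetical, positions are in arbitrary units, and the crossover points are passed in by hand rather than drawn from any real recombination map.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end, grandparent the stretch came from)


def recombine(length: int, crossovers: Sequence[int],
              first: str = "grandfather") -> List[Segment]:
    """Build the chromosome a parent passes on, switching source at each crossover."""
    other = {"grandfather": "grandmother", "grandmother": "grandfather"}
    segments: List[Segment] = []
    start, source = 0, first
    for point in sorted(crossovers):
        segments.append((start, point, source))
        start, source = point, other[source]
    segments.append((start, length, source))
    return segments


print(recombine(100, [70]))                     # a) single crossover
print(recombine(100, [30, 60]))                 # b) double crossover
print(recombine(100, [], first="grandmother"))  # c) no crossover
```

With a single crossover at position 70 out of 100, the grandfather contributes the first 70 units, which mirrors the "over two-thirds" case in the first example.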

Where DNA crossover takes place on a chromosome is not random. There are approximate locations where the chromosome is more likely to split. These locations are cleavage sites.

These locations exist because there are groups of genes along a chromosome that have a tendency to stay together. These groups are part of gene linkage. These linked genes only allow for chromosome splits at either end of their linked section. In my research, the minimum size for one of these gene-linked sections is about 2.5 cM. These small segments then travel in larger groups.

In the graphic above, the blue bar represents about a 60 cM match. The intersection between the black and orange ovals is about 2.5 cM and represents a minimum segment. In this crossover recombination, the large segment actually split to the right of the minimum segment. In a future crossover, the chromosome could split on the left side of the minimum segment, giving a large segment bound by the orange oval.

Why are these minimum segments important? My research shows that these segments stay in the gene pool for dozens of generations. Over time, naturally occurring SNP mutations take place. These minimum inherited segments (MIS) can be differentiated into family groups.

In my research, I started with 28 well known US colonial surnames and 393 autosomal kits. For each surname, the associated kits were triangulated. If three or more kits match on the same segment, you can deduce that it came from a common ancestor. Each of the surnames investigated had 6 to 13 distinct triangulated segments. Taken together, these triangulated ancestral segments represent an autosomal haplotype that can be used to identify a descendant’s genetic connection to an ancestor. Across all of the surnames, these distinct segments appear at recurring locations on each chromosome. I have listed 21 of these ancestral loci in my paper.
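For readers who want to experiment, here is a minimal Python sketch of the "three or more kits on the same segment" rule. It is not the procedure from the paper: the function `triangulated_regions`, the kit IDs and the coordinates are all made up, and it only checks that the kits' matched segments overlap at the same location. A full triangulation would also confirm that the kits match one another.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

KitSegment = Tuple[str, int, int, int]  # (kit id, chromosome, start, end)


def triangulated_regions(segments: List[KitSegment],
                         min_kits: int = 3) -> List[Tuple[int, int, int]]:
    """Return (chromosome, start, end) regions covered by at least min_kits kits."""
    by_chrom: Dict[int, List[Tuple[int, int, str]]] = defaultdict(list)
    for kit, chrom, start, end in segments:
        by_chrom[chrom].append((start, end, kit))

    regions: List[Tuple[int, int, int]] = []
    for chrom in sorted(by_chrom):
        segs = by_chrom[chrom]
        # Every segment boundary is a place where the set of covering kits can change.
        points = sorted({p for s, e, _ in segs for p in (s, e)})
        current: Optional[List[int]] = None
        for left, right in zip(points, points[1:]):
            kits = {k for s, e, k in segs if s <= left and e >= right}
            if len(kits) >= min_kits:
                if current is None:
                    current = [left, right]
                else:
                    current[1] = right  # extend the running region
            elif current is not None:
                regions.append((chrom, current[0], current[1]))
                current = None
        if current is not None:
            regions.append((chrom, current[0], current[1]))
    return regions


example = [
    ("A", 2, 100, 300), ("B", 2, 150, 350), ("C", 2, 200, 400),
    ("D", 6, 10, 50), ("E", 6, 20, 60),
]
print(triangulated_regions(example))
```

In the example data only chromosome 2 has three kits overlapping, so the two-kit overlap on chromosome 6 is left out.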

Not all ancestral segments are the same type. The segments can be categorized into three groups. The first category is Common to All. The surnames in this study are predominantly European. One segment has been identified on chromosome 2 that triangulates across all surnames. This segment correlates to a Western Atlantic ethnicity and I call it the Western Atlantic Autosomal Haplotype (WAAH). The Western Atlantic Autosomal Haplotype should not be confused with ancestry informative markers (AIMs). The WAAH is composed of about 800 SNPs and there are only about 100 AIM SNPs in that same stretch of chromosome 2.

The next category is Shared. Some segments can be attributed to two or more surnames. There was considerable intermarriage between US colonial families. That period was a bottleneck genealogically and genetically. As two major families married, their combined DNA segments entered the gene pool and were reinforced as their descendants intermarried.

The third category is Unique. These shared segments cannot be attributed to intermarriage of families. Yet the resulting familial autosomal haplotypes are not composed of a single surname. In the case of Benjamin Franklin, the genetic proximity to his wife, Deborah Read and his mother, Abiah Folger, may make it impossible to distinguish between Folger, Franklin and Read DNA. Therefore, the haplotype represents the combined inheritance.

Here is one of my case studies. Augustine Bearse was born in England in 1618 and died in Barnstable, MA before 1697. The Bearse family was chosen due to my familiarity with the genealogy and the debate surrounding Augustine’s wife. His wife Mary was supposedly the granddaughter of the Chief of the Cape Cod Native American tribes. The goal was twofold: to identify the autosomal haplotype for the Bearse family and to determine whether any of the ancestral segments had Native American ethnicity.

The Bearse study was composed of 48 autosomal samples. These samples were collected based on claimed genealogical connections. The triangulated samples generated 8 ancestral loci and indicated an additional 5 loci that had the potential to triangulate with more samples. The resulting Bearse autosomal haplotype is found below.

Bearse Autosomal Haplotype

The Bearse haplotype contains the Western Atlantic Autosomal Haplotype (chromosome 2) which is common to all haplotypes in the study. The other 12 loci are more valuable for genealogical validation. One of the Bearse descendants triangulates on six of the ancestral segments. It is highly unlikely that a descendant would match on all of the segments. Although ancestral segments survive over the generations, the randomness of their distribution makes it difficult for any one person to have received them all. Yet, triangulating on just one segment unique to Bearse is enough to indicate and validate a relationship. Lack of a match could mean that an ancestral segment was not inherited or that a non-familial event (adoption, infidelity, etc.) has occurred and the individual’s family tree is incorrect.

In order to investigate the origins of Augustine’s wife Mary, each ancestral segment from the haplotype was evaluated for ethnicity. Only the segment on chromosome six at location 55850885 had any Native American ethnicity. This ancestral segment had not fully triangulated, yet a few of the samples match exactly on Native American SNPs. With additional samples, the segment could triangulate. Once validated, the segment might be shared across multiple surnames or unique to Bearse, indicating Native American genes in the Bearse descendants.

While the amount of autosomal DNA received by each successive generation is only half from each parent, that does not mean that given enough generations a distant ancestor’s genetic contribution will become negligible. Through genetic linkage, portions of DNA are inherited intact. Naturally occurring cleavage sites allow for ancestral segments averaging 2.5 cM to be passed from generation to generation as a minimum inherited segment (MIS).

Ancestral segment analysis is invaluable for the identification of distant ancestors. All of the triangulated ancestral locations combine to become a Familial Autosomal Haplotype (FAH) that can be used to validate family history.

Since finishing my initial research, I have gone on to identify over 50 ancestral loci and over 700 autosomal haplotypes for US colonial ancestors. Stay tuned for further advances in autosomal research.

References:

Maglio, MR (2015) Minimum Inherited DNA Segment Size and the Introduction of Familial Autosomal Haplotypes (Link)

Website:

www.OriginsConnector.com

Wednesday, May 2, 2012

The Autosomal Match Game

Don’t get me wrong. Autosomal DNA testing is a very valuable tool. A match has the possibility of breaking through some very significant genealogical brick walls. It’s important to understand what a match means or doesn’t mean.

In a nutshell, we all have 46 chromosomes, 23 from mom and 23 from dad. Two of those chromosomes are the sexy kind, X and Y. We’ll ignore those for now. In an autosomal test, the DNA sequences in your chromosomes are compared against everyone in the testing company’s database. The goal is to find long matching sequences. Depending on how long the sequences are and the total number of matching sequences, a calculation predicts the cousin relationship.

Now here is where things get dicey…

Take two full siblings (not twins). At first glance, you might think that genetically they are a 100% match. Dad gives these two siblings 23 chromosomes each, half of his DNA. It’s not necessarily the same 23 chromosomes. Mom does the same. Let’s look at the two extremes.

Imagine mom’s DNA as two chunks of 23 chromosomes each – A & B. Dad has two chunks also – C & D. Mom gives each child chunk A and dad gives each chunk D. Both children will have A & D and will be exact genetic matches.

What if mom gave one child A and the other child B, and dad gave one child C and the other child D? The full siblings would be A & C and B & D, showing no match at all. The truth is that a full sibling match will exist on a continuum somewhere in between.
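To put rough numbers on that continuum, here is a small Python sketch under a deliberately simplified model: each of the 22 autosome pairs is passed down whole (no crossover), and for each pair a parent hands a second child the same copy as the first child with a coin-flip probability. The names `TRANSMISSIONS` and `prob_matching` are invented for this example.

```python
from math import comb

AUTOSOME_PAIRS = 22                  # the sex chromosomes are ignored here
TRANSMISSIONS = 2 * AUTOSOME_PAIRS   # one chromosome per pair from each parent


def prob_matching(k: int) -> float:
    """Chance that two siblings received the same copy in exactly k of the 44 transmissions."""
    return comb(TRANSMISSIONS, k) / 2 ** TRANSMISSIONS


print(f"no chromosomes in common:  {prob_matching(0):.2e}")
print(f"all chromosomes in common: {prob_matching(TRANSMISSIONS):.2e}")
print(f"exactly half in common:    {prob_matching(AUTOSOME_PAIRS):.3f}")
```

Even in this toy model the two extremes each come out to 1 chance in 2^44, so nearly all sibling pairs land somewhere in the middle.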

The probability that a sibling match would be 0% or 100% is extremely low. Cousin matches are a different story. In a perfect world, two 1st cousins would share 1/8 (12.5%) of their DNA. Each further degree of cousinship adds a generation on both sides and cuts the expected share to a quarter: 2nd cousins 1/32, 3rd cousins 1/128, 4th cousins 1/512 and 5th cousins 1/2048, well under a tenth of 1% shared DNA. As the relationship gets more distant, the possibility of two cousins not sharing DNA, or not sharing a long enough sequence to make a match, gets higher.

In my family, two Scottish brothers married two German cousins. I am the grandson from one of these unions. I have a cousin who is the grandson from the other marriage. We are both 2nd cousins and 3rd cousins. It is possible that we share 1/32 plus 1/128 for a total of 5/128, or about 3.9%. That much shared DNA could be reported on a test as a closer relationship than 2nd cousins.

The autosomal match game is not a perfect world. If you don’t get a match and you think you should have, then test different cousins. Adding more DNA samples could give a new set of results. If you do get matches, the degree of the relationship can help set a starting point in looking for that common ancestor.

DNA is just one of many tools we have as genealogists. In the case of autosomal testing, DNA is just the beginning. It will take traditional genealogy to get you to the prize.

#gDNA
