Diana, Goddess
of the Hunt for Ancestors!
TMRCA: An Alternate View
The TMRCA calculation [Time to Most Recent Common Ancestor] is mathematically based on an average mutation rate derived from large samples of tested individuals. The problem is that the frequency of mutations is not constant; it's random, and it only appears to even out in large samples and deep time frames. Therefore, and in my opinion, TMRCA is nearly useless when applied to small samples (individual families) in genealogical time (under 25 generations). I do not dispute its usefulness with large samples in paleoanthropological time frames.
The "average" is a descriptive statistic, one of the statistics used as shorthand for describing the characteristics of a group. For example, if you have a classroom of 4th grade students, you can describe their height by giving a list of their heights, which would be a valid, though clumsy, way to describe them. We would probably find the average height a more efficient way to describe the group's height. But please note (and this is the point): by calculating the average, you have not discovered some underlying "law of the universe" that governs the height of school children. It is simply a description of this group, not a predictor of other groups, unless the other group is a carefully selected 4th grade class and the sample sizes are valid. In other words, a statistical average is not useful as a predictor unless the samples are comparable and, especially, unless the sample sizes are valid. Oddly enough, people who clearly understand this basic requirement seem to abandon it when it comes to genealogical DNA research.
Mutations are random, which is the direct opposite of even. You can take any sample of STR test data and calculate an "average" mutation rate, but this average is merely a description of that particular data set. It is not a natural constant; that is, it is not some underlying law of the universe, such as c, the speed of light. It only appears to be a constant when applied to large data sets because only in large data sets does the randomness begin to even out. And I say "begin" to even out because many different mutation rates have been derived, and the reason they don't agree is that the sample sizes still aren't large enough to produce a consistent result. When applied to small data sets, the randomness of mutations becomes highly evident, and TMRCAs based on a delusory "constant" mutation rate become useless.
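A small simulation can illustrate why an average only looks stable in large samples. The sketch below is not part of any TMRCA calculator. It assumes, purely for illustration, a rate of one mutation per seven generations across 67 markers (the rough empirical figure cited later in this article), at most one mutation per generation, and lines 10 generations deep. The function names are hypothetical.

```python
import random
import statistics

# Assumptions for illustration only (not measured constants):
RATE = 1 / 7        # chance of a mutation per generation across 67 markers
GENERATIONS = 10    # depth of each simulated father-to-son line

def mutations_in_line(rng):
    """Count mutations along one line, allowing at most one per generation."""
    return sum(1 for _ in range(GENERATIONS) if rng.random() < RATE)

def sample_averages(sample_size, trials, rng):
    """Average mutation count for each of `trials` samples of `sample_size` lines."""
    return [
        statistics.mean([mutations_in_line(rng) for _ in range(sample_size)])
        for _ in range(trials)
    ]

rng = random.Random(2004)  # fixed seed so the run is repeatable
small = sample_averages(sample_size=5, trials=100, rng=rng)     # family-sized samples
large = sample_averages(sample_size=1000, trials=100, rng=rng)  # database-sized samples

print("expected average:", round(RATE * GENERATIONS, 2))
print("small samples range from", min(small), "to", max(small))
print("large samples range from", round(min(large), 2), "to", round(max(large), 2))
```

In runs of this kind, the averages of the family-sized samples scatter widely around the expected value, while the large samples cluster tightly, even though every line was generated with the same rate.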
My favorite example is that of my first cousin, who tested my maternal grandfather's STRAUB line for me. To simplify the example (we actually have two dozen members of this family tested), my cousin, who is modal for the family, matches person A at 67/67 and person B at 66/67. The FTDNA-TiP calculator will tell you my cousin is more closely related to A than he is to B:
The probability that my cousin shares a common ancestor with A reaches 100% at 18 generations; with B, it reaches 100% at 22 generations. In other words, my cousin has a 100% chance of being related to both of them within genealogical time. Because they have a paper connection to the same progenitor, we have just proven something we already knew, or at least believed, which is not to belittle having proved it; it's the main reason for being tested. The problem is that the TiP calculations show my cousin more closely related to A than to B, and it's a problem because B is his brother and A is his 6th cousin (once removed). His brother just happens to bear a new mutation, one not found in any other member of the family. (Thankfully, he has a good sense of humor, because the family has now dubbed him "the mutant.")
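The arithmetic behind this example can be sketched with a simple Poisson model. The assumptions are illustrative and are not TiP's method: mutations arrive independently at about one per seven father-to-son transmissions across 67 markers (the rough figure given later in this article), back mutations are ignored, brothers are separated by 2 transmissions, and 6th cousins once removed by 15 (7 generations down one line and 8 down the other).

```python
import math

RATE = 1 / 7  # assumed mutations per transmission across 67 markers

def prob_no_mutation(transmissions, rate=RATE):
    """Poisson probability that no mutation occurs over the given transmissions."""
    return math.exp(-rate * transmissions)

BROTHERS = 2               # father to each of two sons
SIXTH_COUSINS_1R = 7 + 8   # common ancestor 7 generations up one line, 8 up the other

p_brothers_differ = 1 - prob_no_mutation(BROTHERS)
p_cousins_identical = prob_no_mutation(SIXTH_COUSINS_1R)

print(f"brothers differ on at least one marker: {p_brothers_differ:.1%}")
print(f"6th cousins once removed match 67/67:  {p_cousins_identical:.1%}")
```

Under these assumptions, brothers differ roughly a quarter of the time and 6th cousins once removed match exactly about one time in eight or nine. Neither outcome is rare, so a brother at 66/67 alongside a distant cousin at 67/67 is not surprising.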
So, what does the above example tell us? It tells us not to use TMRCA as a precise measure of the degree of relationship between people in genealogical time frames. It is useful to genealogists only to the extent that it tells you when something is totally inside or outside the realm of possibility. It cannot help you reconstruct the family tree. DNA test results should be used to support or debunk a paper pedigree, not to create a pedigree.
I manage six DNA projects, and I don't use the TiP calculator, except to answer my project members' questions relating to it (and to prepare this article). When I began my first project in 2004, I relied on these two guides, compiled by FTDNA:
Interpreting Genetic Distance within Surname Projects: 25 Markers
When FTDNA started offering 37 markers, I began using this guide:
And when FTDNA started offering 67 markers, I began using this guide:
Yes, I do realize these guides are based on TiP calculations, but the advantage of the guides is that they convey the imprecision of genetic distance as a measure of relatedness, rather than giving the false sense of precision conveyed by the TiP calculator. However, I seldom use even these guides anymore.
Most people doing their genealogy and being STR tested for genealogical purposes are descendants of Europeans who emigrated in the last four centuries, mostly to the United States, some to Australia, southern Africa, and elsewhere. Hopefully, now that the United States has a President of both European and African ancestry (he's a descendant of my STRAUB ancestor, by the way), we will see more Africans seeking their roots through DNA testing, too. My point is that we "colonials," we "displaced persons," are the ones who've become disconnected from our roots and now seek them. Europeans still living in Europe are not on this quest; they are largely still living where their roots lie, which is undoubtedly one reason they are so difficult to find and recruit for DNA testing. Likewise, people who descend from recent immigrants usually know where they are from, so tend not to be tested, though the rest of us wish they would, to help those of us who don't know. Thus, the sample of people being Y-DNA STR tested is composed largely of persons displaced from their origins 100 to 400 years ago, whose initial quest is to connect to their immigrant and, secondly, to "cross the pond."
What the above means for Y-DNA surname projects is that most of our members connect to an immigrant between 5 and 15 generations removed and that most of us have yet to connect to our Old World ancestry, which will not go back more than another 5 or 10 generations, at most. It further means 1) that TiP calculations cannot help you, except in the broadest way, and 2) that experience (empirical evidence) will.
My experience with my surname projects has been that a typical member testing 67 markers will have accumulated from zero to three mutations away from the modal haplotype of his immigrant ancestor's descendants. Ranked from most to least frequent, the mutation counts are 1, 0, 2, 3. These frequencies are in keeping with the observation that, at 67 markers, you can expect roughly one mutation every seven generations. The range of variation is large, however. For example, I'm aware of one case where two mutations happened in a single generation, that is, two brothers with a GD (genetic distance) of 2. That is the only such case I'm aware of, but two mutations within three or four generations is not uncommon.
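These observed frequencies can be compared with what a simple Poisson model would predict. This is only a sketch under stated assumptions: the rate of one mutation per seven generations is taken from the observation above, the figure of 8 generations from the immigrant is a hypothetical value chosen from within the 5 to 15 generation range mentioned earlier, and the modal haplotype is assumed to equal the immigrant's own haplotype.

```python
import math

RATE = 1 / 7       # one mutation per seven generations at 67 markers (from the text)
GENERATIONS = 8    # hypothetical distance from the immigrant ancestor

def poisson_pmf(k, mean):
    """Probability of exactly k mutations when `mean` are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean = RATE * GENERATIONS
probabilities = {k: poisson_pmf(k, mean) for k in range(4)}

# Mutation counts ordered from most to least likely
ranked = sorted(probabilities, key=probabilities.get, reverse=True)

for k in ranked:
    print(f"{k} mutations: {probabilities[k]:.1%}")
```

Under these assumptions the model gives the same ordering (1, 0, 2, 3). The ordering shifts with the number of generations, so the agreement should be read as a plausibility check, not a confirmation of the rate.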
Note that these distances (0 to 3) are from the modal haplotype for the family. The distance between two individual descendants can be twice that and still constitute a good match. Compare these empirically derived figures to the 67-marker guide, and they are reasonably congruent, with the family's modal haplotype serving as the "in-betweener" connecting persons who would otherwise appear less related than they really are. The two best examples from my projects are the J-M67 CARRICOs and the I1-AS5 STRAUBs, though we have, in fact, crossed the pond with the STRAUBs.
While I'm gaining confidence in what constitutes a "good match" between descendants of the same immigrant, I still do not have the empirical data I need to be confident of what constitutes a match "across the pond" or between families who descend from multiple immigrations of a family with deep roots. In the case of the above CARRICOs, the paper evidence supports that they all descend from a single immigrant to Maryland in 1674 because, among other things, no other CARRICO immigrant is found before the 1900 census. The STRAUBs also appear to descend from one Württemberg family, though possibly from more than just the one 1733 immigrant to Philadelphia. However, it is equally clear that other, more common surnames represent multiple immigrations from the same family and families whose histories go deeper than the typical surname adoption period in the 16th Century, going back as far as the 13th Century. Those GDs range, so far, up to 6 from the modal, meaning as much as 12 between descendants. However, as I said, I do not as yet have the empirical evidence I need to feel confident of these limits. It would be worthy of publication for someone to compile the data we do have bearing on this matter.
In other words, instead of basing the meaning of genetic distance on a calculation that depends on a constant mutation rate that isn't constant, base it directly on the data. Both the pedigrees and the DNA test results are real, so use them to tell you what genetic distances really mean. That, of course, leads us to the issue of why having your members' lineages is so important and why I double-check, as best I can, the genealogy of my surname project members.
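As a starting point for working directly from the data, the sketch below computes a modal haplotype and each member's genetic distance from it. The repeat counts are invented for illustration (five markers instead of 67), the member names are hypothetical, and genetic distance is taken here as the simple sum of step differences, which is an assumption and may differ from how FTDNA scores certain markers.

```python
from collections import Counter
from itertools import combinations

# Invented repeat counts at five markers; a real project would use 67.
haplotypes = {
    "member_1": [13, 24, 14, 11, 12],
    "member_2": [13, 24, 14, 11, 12],
    "member_3": [13, 25, 14, 11, 12],
    "member_4": [13, 24, 14, 10, 12],
    "member_5": [13, 24, 15, 11, 12],
}

def modal_haplotype(haps):
    """Most common repeat count at each marker position."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haps)]

def genetic_distance(a, b):
    """Sum of step differences across markers (simplified scoring)."""
    return sum(abs(x - y) for x, y in zip(a, b))

modal = modal_haplotype(haplotypes.values())
from_modal = {name: genetic_distance(h, modal) for name, h in haplotypes.items()}
max_pairwise = max(
    genetic_distance(haplotypes[a], haplotypes[b])
    for a, b in combinations(haplotypes, 2)
)

print("modal haplotype:", modal)
print("distance from modal:", from_modal)
print("largest distance between two members:", max_pairwise)
```

In this toy data set no member is more than 1 step from the modal, yet two members differ from each other by 2, which is the "twice the distance" effect described above. Pairing such distances with documented pedigrees is what would turn them into empirical limits.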