I'm opening this thread as a place to discuss and catalog information on using Y STR and Y SNP information to try to calculate aging within R haplogroups.
Although I think the Law of Large Numbers can outweigh problems with individual STRs it makes sense to realize that NOT all STRs have similar behavior patterns. We are generally interested in those that can help us estimate time to a most recent common ancestor (TMRCA).
If some are not good at that and we have enough alternatives it makes sense to me to use the alternatives.
Steve Bird at Texas State Univ. wrote this paper: "Towards Improvements in the Estimation of the Coalescent: Implications for the Most Effective Use of Y Chromosome Short Tandem Repeat Mutation Rates", 2012.
http://www.plosone.org/article/info%...l.pone.0048638
He evaluates Y STRs for their fitness to having a linear variance relationship with time.
A discussion is under way on the Yahoo L21 board, under the topic "AlexWilliams 111 marker SNP based Haplotype PhyloTree", about the fact that the mutation rates Anatole Klyosov uses are calibrated for 25 years per generation. One person converted AK's mutation rates to a 30 years per generation basis. The choice between 25 and 30 years per generation was also discussed. One poster believed 30 years should be used going back 1000 years and 25 before that.
My understanding is that a mutation rate is based on the number of transmissions that occur before an STR mutation happens. For example, it is estimated that a mutation will occur only once every 500 transmissions (birth events) on a single Y-DNA STR marker, which is an overall mutation rate of roughly 0.2% per marker per transmission. This rate of the genetic mutation clock is debated.
Anatole Klyosov uses several methods to produce ages based on mutation rates calibrated for 25 years per generation.
http://www.jogg.info/52/files/Klyosov1.pdf
Chandler has posted his own set of calculated mutation rates. His paper is found at: http://www.jogg.info/22/Chandler.pdf
Marko Heinila produced his own, more recent mutation rates in May 2012 using Chandler's methods. He has a link to his 111 marker rates near the bottom of this web page.
https://dl.dropboxusercontent.com/u/...svg_trees.html
Marko Heinila's results are based on about 4,000 111 level samples. In his estimation process each haplotype pair was considered an independent random draw from a model distribution. The model distribution gives the expected ratio of mismatches to matches at a given marker among pairs that have a given number of matching markers overall. The pair data was then used to solve for the mutation rates. He said that this is the same idea as in Chandler's paper on mutation rate estimation.
Ken Nordtvedt chose to use Heinila's 2012 mutation rates in his 111t Generations spreadsheet, and I have kept them in my TMRCA Estimator spreadsheet as well.
It is estimated that a mutation will occur only once every 500 transmissions (birth events) on a single Y-DNA STR marker, roughly an overall mutation rate of 0.2%, a debated rate for the genetic mutation clock. We have more recent calculations that show more realistic transmission rates.
Recalculated using Marko Heinila 2012 Mutation Rates (MJost)

#Markers  Transmissions  BirthEvents  GenYrs=25.0  GenYrs=30.0
12        495            41.3         1,031.3      1,237.5
25        413            16.5         413.0        495.6
37        280            7.6          189.2        227.0
67        388            5.8          144.8        173.7
111       382            3.4          86.0         103.2
12-mcm    556            61.8         1,544.4      1,853.3
25-mcm    428            26.8         668.8        802.5
37-mcm    319            13.3         332.3        398.8
67-mcm    452            9.0          226.0        271.2
111-mcm   411            4.4          109.3        131.2

#Mkrs     MarkoHCumlRate  perMarkerRate
12        0.0242          0.0020
25        0.0605          0.0024
37        0.1323          0.0036
67        0.1728          0.0026
111       0.2907          0.0026
12-mcm    0.0162          0.0018
25-mcm    0.0374          0.0023
37-mcm    0.0747          0.0031
67-mcm    0.1107          0.0022
111-mcm   0.2285          0.0024
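As far as I can tell, the first table follows from the cumulative rates in the second: the per-marker rate is the cumulative rate divided by the number of markers, Transmissions is 1 divided by the per-marker rate, and BirthEvents is 1 divided by the cumulative rate. Here is a rough Python sketch of that arithmetic. The function name panel_stats is my own invention, and the -mcm rows are left out because their marker counts are not stated. The results agree with the table only to within rounding.

```python
# Cumulative (summed) mutation rates per panel, copied from the table above.
CUMULATIVE_RATES = {12: 0.0242, 25: 0.0605, 37: 0.1323, 67: 0.1728, 111: 0.2907}


def panel_stats(markers, cumulative_rate, years_per_generation=25.0):
    """Derive the per-marker and per-panel figures for one marker panel.

    Hypothetical helper, not taken from any of the spreadsheets discussed.
    """
    per_marker_rate = cumulative_rate / markers
    transmissions = 1.0 / per_marker_rate  # transmissions per mutation on one marker
    generations = 1.0 / cumulative_rate    # birth events per mutation anywhere in the panel
    return {
        "per_marker_rate": per_marker_rate,
        "transmissions": transmissions,
        "generations": generations,
        "years": generations * years_per_generation,
    }


for markers, rate in CUMULATIVE_RATES.items():
    stats = panel_stats(markers, rate)
    print(f"{markers:>3} markers: {stats['transmissions']:.0f} transmissions, "
          f"{stats['generations']:.1f} birth events, "
          f"{stats['years']:.0f} years at 25 yrs/gen")
```

Passing years_per_generation=30.0 gives the 30 year column instead.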
148326, FGC-0FW1R, YSID6 & YF3272 R-DF13>FGC5494>*7448>*5496>*5521>*5511>*5539>*5538>* 5508>*5524
Concepts overview: calculating TMRCAs for a group of haplotypes in my TMRCA Estimator spreadsheet.
Intraclade means 'within a clade'. A clade descends from a common ancestor and sits within a higher level grouping of a genetic haplogroup, such as M222; it includes those haplotypes that are known to have positive test results.
Technically, two things are calculated from a clade (haplotype) dataset: the population variance and the sample variance, which are used in calculating the Coalescence and Founder's Modal intraclade generation ages respectively. The sum of each type of variance is then divided by the sum of the mutation rates to give a generation (MRCA) age.
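To make the "sum of variances divided by sum of mutation rates" step concrete, here is a minimal Python sketch. It is only an illustration of that one step, not the actual formulas in my spreadsheet or in Ken's. The function name, the four three-marker haplotypes and the three mutation rates are all made up for the example.

```python
from statistics import pvariance, variance


def intraclade_generations(haplotypes, rates):
    """Return (age using n, age using n-1) in generations.

    haplotypes: list of equal-length lists of STR repeat counts (one list per person)
    rates: per-marker mutation rates per generation, in the same marker order
    Hypothetical helper for illustration only.
    """
    columns = list(zip(*haplotypes))                      # one tuple of repeat counts per marker
    pop_var_sum = sum(pvariance(col) for col in columns)  # divides by n
    sample_var_sum = sum(variance(col) for col in columns)  # divides by n - 1
    rate_sum = sum(rates)
    return pop_var_sum / rate_sum, sample_var_sum / rate_sum


# Made-up data: four haplotypes, three markers, and made-up mutation rates.
haplotypes = [
    [13, 24, 14],
    [13, 25, 14],
    [14, 24, 14],
    [13, 24, 15],
]
rates = [0.002, 0.0025, 0.003]

age_n, age_n_minus_1 = intraclade_generations(haplotypes, rates)
print(f"variance with n:   {age_n:.1f} generations")
print(f"variance with n-1: {age_n_minus_1:.1f} generations")
```

With this made-up data the two ages come out at about 75 and 100 generations, which shows how much the choice of divisor matters when the group is small.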
Further, when estimating the variance, the dataset used is technically a sample of the population space. Coalescence treats the data as a small population that is assumed to be close to the actual population, whereas the Founder's Modal section is an adjusted sample that represents the entire population.
The Coalescence whole population (n) generation age is biased. The Coalescence sample population (n-1) generation age is a corrected generation age that gives a 'true' unbiased result.
To explain the bias: this method of Coalescence estimation is close to optimal, with the caveat that it underestimates the variance by a factor of (n-1)/n. (For example, when n = 1 the variance of a single observation is obviously zero regardless of the true variance.) This bias should be corrected for when n is small by multiplying by n/(n-1). This is why the Coalescence whole population age is less than the Coalescence sample population age.
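The n/(n-1) correction is easy to check with Python's standard statistics module, where pvariance divides by n and variance divides by n-1. This is just a small sketch; the repeat counts are made up and corrected_variance is a name I chose for the example.

```python
from statistics import pvariance, variance


def corrected_variance(values):
    """Scale the whole-population variance (divide by n) by n / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to correct the bias")
    return pvariance(values) * n / (n - 1)


repeats = [13, 13, 14, 13, 15]  # made-up repeat counts at one marker

print("whole population (n):", pvariance(repeats))
print("sample (n-1):        ", variance(repeats))
print("corrected (n) value: ", corrected_variance(repeats))
```

The corrected whole-population value matches the sample variance, and the uncorrected one is smaller by the factor (n-1)/n.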
My TMRCA spreadsheet can produce individual statistical variances, which should point to a generation where all haplotypes meet their common ancestor (think of the first two Coalescence ages, each of which is a variance; think of variance as, sort of, counting fractional mutations).
I report three intraclade variance results to produce an estimated Most Recent Common Ancestor (MRCA) age:
•Coalescence Age = Variance of Whole Population (n) (near to KenN's original Coalescence age, using Varp functions)
•Coalescence Age = Variance of Sample Population (n-1) (sampled Var)
•Founder's Modal Age Variance (using Ken's formula for the Modal Method)
Use Coalescence(n) for close families where all family members descending from the MRCA node are known.
Use Coalescence(n-1) for groups with unknown or missing lineages below the MRCA node of the applied set of haplotypes (most runs).
Use Modal for the Founder's Age. The Founder's Age will be older than the Coalescence(n-1) age, since there are usually missing lineage branches and/or generations without mutations, and the haplotype markers are not 100 percent represented.
An interclade MRCA age is calculated from the last two results above [(n-1) and Modal] for the two clades studied, pointing to an MRCA age from each clade's node point using a statistical pooled standard deviation method.
MJost
I posted some TMRCAs on the Yahoo 1113 Combo forum, and poster Daryl raised some questions and skepticism about TMRCAs. I will reply here in this thread, as suggested by MikeW.
Daryl,
As I have always stated, I am not a math expert. But yes, I agreed with you when you said in a previous post that "TMRCA calculations are mostly speculative", and I said the results are all about their relevance. These are not error rates, as you pointed out, but only statistical probabilities. Let's review.
In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. For a normal probability distribution, 1 sigma (68.27%) is the probability that an outcome falls within one standard deviation of the mean.
Look at this chart of the normal distribution curve, which illustrates standard deviations. Each band is 1 standard deviation wide.
https://en.wikipedia.org/wiki/File:S...on_diagram.svg
The standard deviation is an important reference, because we can say that any generation value calculated is:
•likely to be within 1 standard deviation (68 out of 100 will be)
•very likely to be within 2 standard deviations (95 out of 100 will be)
•almost certainly within 3 standard deviations (997 out of 1000 will be)
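Those three figures come straight from the normal distribution and can be reproduced with the error function in Python's math module. A minimal sketch, assuming the estimate really is normally distributed (the function name coverage is my own):

```python
import math


def coverage(k):
    """Probability that a normally distributed value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k):.4f}")
```

This prints 0.6827, 0.9545 and 0.9973 for 1, 2 and 3 standard deviations.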
I have an option in my spreadsheet to set the confidence level to any value and see what range of generations it takes to produce an MRCA point at that confidence. In other words, at a 99.73% probability (3 standard deviations) the MRCA of the sample falls between x and y generations. A confidence interval (CI) indicates the reliability of an estimate: it is a range of values (interval) that acts as a good estimate of the unknown population parameter.
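Going the other way, from a chosen confidence level to a range of generations, can be sketched in Python with statistics.NormalDist (Python 3.8 or later). This is not how my spreadsheet is built internally; it is only an illustration that assumes a normal distribution. The function name and the example numbers (100 generations with a standard deviation of 15) are made up.

```python
from statistics import NormalDist


def generation_interval(estimate, sigma, confidence):
    """Return (low, high) generations for a two-sided interval at the given confidence.

    estimate: central MRCA estimate in generations
    sigma: standard deviation of that estimate in generations
    confidence: e.g. 0.6827, 0.95 or 0.9973
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Number of standard deviations that encloses the requested probability.
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    return estimate - z * sigma, estimate + z * sigma


# Made-up example: 100 generations with a standard deviation of 15 generations.
low, high = generation_interval(100.0, 15.0, 0.9973)
print(f"99.73% interval: {low:.1f} to {high:.1f} generations")
```

At 99.73% the multiplier is very close to 3, so the made-up example spans roughly 55 to 145 generations.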
The “Variance Method” (Slatkin, 1995; Stumpf, 2001) assumes that the variance (average-squared-distance from ancestral value) of each STR marker in a large population, is proportional to the TMRCA of that population.
Ken Nordtvedt has implemented variance in his Generations spreadsheets. The spreadsheets' generation calculations are very close to each other.
Please note that Ken explains Variance Sigma (Standard Deviation) Concepts on his website.
http://knordtvedt.home.bresnan.net/S...0Variance.pptx
Yes, Statistically Relevant.
MJost
I'm copying this over from another thread so we don't bog that one down. For some people this might be interesting, so I'll continue the conversation on estimating ages and using Y STRs, and some of the vagaries and benefits thereof.
Originally Posted by Mikewww:
I wasn't intending to slight anyone's understanding of the situation, so sorry if I sounded condescending. For people just catching up or tuning in, I just wanted to point out that Klyosov's methodology has nothing uniquely wrong with it, although it suffers the same maladies as any Y STR based age estimation technique.
Probably the best and most fun initiation into this is Dienekes's blog entry here.
http://dienekes.blogspot.com/2011/08...t-al-2011.html
Here is the kick-off of the fun part. You have to scroll down to the comments.
Originally Posted by Dienekes
Originally Posted by Klyosov
Last edited by Mikewww; 05-14-2013 at 04:49 PM.
This may seem a little off track, but bear with me. This is about understanding MRCAs....
What's the value of a haplogroup?
What's the value of an SNP?
You might be surprised to hear me say this but I think there is very little value in haplogroups and SNPs
... at least in and of themselves.
A haplogroup is just a group of people with a common ancestor.
An SNP is just a single nucleotide polymorphism, a mutation, that marks the group of people with a common ancestor. It is just a signpost on a branch of the human family tree. The true nature of the haplogroup of people, any commonality in culture, location, etc., may not align with the SNPs that have marked the lineages. The SNP could mark either a subset or superset of the true group of people we care about.
This gets into some notions about value and philosophical concerns, but these are the points I'm getting at.
1) I do not care too much about all of the extinct lineages of mankind. There are many, many extinct lineages. On the Y chromosome/paternal side probably there are many, many more lineages that have gone extinct than those who survive.
2) I do care about how we got here and how, where, when and why they did what they did to get us to where we are today.
I think these notions are just conveying that what many hobbyists may care about most is the connection to genealogy and deeper ancestry.... and specifically our ancestry.
The net is that the most recent common ancestors (MRCAs) of the various branches remaining today (and in recent history) are critical people to try to understand. The more MRCAs we can understand, at more layers and branches in the tree, the better chance we have of understanding our ancestry.
I am not saying that all of the old extinct lineages were not important people or that SNPs are useless. I'm just trying to say they are most important in how they help us understand who we are and how we got here. They are just bread crumbs from an old trail.
Superconducting supercolliders smash atoms and look at the residue of the accident to try to get more detail on the characteristics of the atom. In the case of genetics; the accidents, bottlenecks, growth spurts, etc. have already taken place but, likewise, we are looking at the residue to try to ascertain what happened.
I don't care when an SNP first occurred. I care about the expansion and movements of my ancestry. The SNP marked haplogroup ages may help put a maximum age in place for my ancestry. That's good, but it's not really the haplogroup I'm after.
P.S. Science may be interested in who the genetic Adam was or wasn't and some other things. That's fine with me but I'm really after understanding how we, the survivors, got here.
Last edited by Mikewww; 05-16-2013 at 04:21 AM.
Last edited by Mikewww; 05-23-2013 at 05:37 PM.
This can be mitigated by use of intraclade age estimates within known related groups, as defined by SNPs, and then comparing those estimates across a known tree of SNP based subclades. This is what Ken Nordtvedt's interclade TMRCA estimates are all about.
We also see other non STR methods coming on-line. The 2008 Karafet study used a scientific sampling of Y chromosome SNPs to estimate ages. They estimated the R1 TMRCA, which is ancestral to R1b and R1a, as 18.5K ybp. This fits nicely with what the common (hobbyist and FTDNA) TMRCA estimation methods are getting for R1b subgroups, so there is some apparent corroboration of STR based methods from this "novel" (Karafet's word) SNP method.
"New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree" by Karafet et al., 2008. The et al. in this case includes Michael Hammer, FTDNA's Chief Scientist.
Last edited by Mikewww; 05-23-2013 at 08:00 PM.