Results 1 to 10 of 302

Thread: STR Wars, GDs, TMRCA estimates, Variance, Mutation Rates & SNP counting

  1. #1
    Registered Users
    Posts
    3,492
    Sex
    Y-DNA
    R1b
    mtDNA
    H

    STR Wars, GDs, TMRCA estimates, Variance, Mutation Rates & SNP counting

    I'm opening this thread as a place to discuss and catalog information on using Y STR and Y SNP data to estimate ages within R haplogroups.

  2. The Following User Says Thank You to Mikewww For This Useful Post:

     NiloSaharan (01-31-2017)

  3. #2
    Registered Users
    Posts
    3,492
    Sex
    Y-DNA
    R1b
    mtDNA
    H

    Although I think the Law of Large Numbers can outweigh problems with individual STRs, it makes sense to recognize that NOT all STRs behave alike. We are generally interested in those that can help us estimate the time to the most recent common ancestor (TMRCA).

    If some are not good at that and we have enough alternatives, it makes sense to me to use the alternatives.
    Steve Bird at Texas State Univ. wrote this paper: "Towards Improvements in the Estimation of the Coalescent: Implications for the Most Effective Use of Y Chromosome Short Tandem Repeat Mutation Rates", 2012.
    http://www.plosone.org/article/info%...l.pone.0048638

    He evaluates Y STRs on how well their variance follows a linear relationship with time.

  4. The Following 2 Users Say Thank You to Mikewww For This Useful Post:

     Clinton P (04-21-2013), Silesian (04-21-2013)

  5. #3
    Registered Users
    Posts
    1,048
    Sex
    Location
    Central United States
    Ethnicity
    Celtic, Goidelic
    Y-DNA
    DF13>FGC5494>FGC5496
    mtDNA
    H11a2a>C12014T

    United States of America Netherlands Isle of Man Scotland Poland Northern Ireland
    A discussion is under way on the Yahoo L21 board, under AlexWilliams' 111 marker SNP based Haplotype PhyloTree, about the mutation rates that Anatole Klyosov uses, which are calibrated for 25 years per generation. One person converted AK's mutation rate to a 30 years per generation figure. Also discussed was whether to use 25 or 30 years per generation. One person believed 30 years should be used back to 1000 years ago and 25 prior to that.

    My understanding is that a mutation rate is based on the number of transmissions that occur before an STR mutation happens. For example, it is estimated that a mutation will occur only once every 500 transmissions (birth events) at a single Y-DNA STR marker, or roughly an overall mutation rate of 0.2%, a debated rate for the genetic mutation clock.
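
    As a rough illustration of the arithmetic above (a sketch, not anyone's published method): one mutation per 500 transmissions is a per-marker rate of 1/500 = 0.002. The rescaling function assumes the rate per calendar year stays fixed when the generation length changes; that is an assumption made here for illustration, and the function names are hypothetical.

```python
def rate_from_transmissions(transmissions_per_mutation):
    """Per-marker mutation rate per transmission (birth event)."""
    return 1.0 / transmissions_per_mutation


def rescale_rate(rate_per_gen, old_gen_years, new_gen_years):
    """Re-express a per-generation rate for a different generation length.

    Assumption: the rate per calendar year is held constant.
    """
    return rate_per_gen * new_gen_years / old_gen_years


rate_25 = rate_from_transmissions(500)   # 0.002, i.e. 0.2% per marker
rate_30 = rescale_rate(rate_25, 25, 30)  # same clock, longer generations
print(rate_25, rate_30)
```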

    Anatole Klyosov uses several methods to produce ages based on a 25 years per generation mutation rate.
    http://www.jogg.info/52/files/Klyosov1.pdf

    Chandler has posted his own set of calculated mutation rates. His paper is found at: http://www.jogg.info/22/Chandler.pdf

    Marko Heinila produced his own more recent mutation rates in May 2012 using Chandler's methods. He has a link to his 111 marker rates near the bottom of this web page.
    https://dl.dropboxusercontent.com/u/...svg_trees.html

    Marko Heinila's results are based on about 4,000 samples at the 111 marker level. He used an estimation process in which each haplotype pair was considered an independent random draw from a model distribution. The model distribution gives the expected ratio of mismatches to matches at a given marker among pairs with a given overall number of matching markers. The pair data was then used to solve for the mutation rates. He said that this is the same idea as in Chandler's paper on mutation rate estimation.

    Ken Nordtvedt chose to use Heinila's 2012 mutation rates in his 111t Generations spreadsheet, and I have kept them in my TMRCA Estimator spreadsheet as well.

    The once-per-500-transmissions figure (roughly 0.2% per marker) is a debated rate for the genetic mutation clock. We have more recent calculations that show more realistic transmission rates.

    Recalculated using Marko Heinila's 2012 mutation rates

    #Markers   Transmissions   BirthEvents   GenYrs=25.0   GenYrs=30.0

    12         495             41.3          1,031.3       1,237.5
    25         413             16.5            413.0         495.6
    37         280              7.6            189.2         227.0
    67         388              5.8            144.8         173.7
    111        382              3.4             86.0         103.2

    12-mcm     556             61.8          1,544.4       1,853.3
    25-mcm     428             26.8            668.8         802.5
    37-mcm     319             13.3            332.3         398.8
    67-mcm     452              9.0            226.0         271.2
    111-mcm    411              4.4            109.3         131.2


    #Mkrs      MarkoHCumlRate   perMarkerRate
    12         0.0242           0.0020
    25         0.0605           0.0024
    37         0.1323           0.0036
    67         0.1728           0.0026
    111        0.2907           0.0026

    12-mcm     0.0162           0.0018
    25-mcm     0.0374           0.0023
    37-mcm     0.0747           0.0031
    67-mcm     0.1107           0.0022
    111-mcm    0.2285           0.0024
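
    The columns in the two tables above appear to be related as follows (this is inferred from the numbers, not a statement of how the spreadsheet is built): BirthEvents is 1 divided by the cumulative rate, Transmissions is 1 divided by the per-marker rate, and the GenYrs columns are BirthEvents times the generation length. A minimal Python sketch for the full (non-mcm) panels; small differences from the table come from the rounded rates used here.

```python
# Cumulative rates copied from the table above (rounded to 4 decimals).
CUML_RATE = {12: 0.0242, 25: 0.0605, 37: 0.1323, 67: 0.1728, 111: 0.2907}


def panel_row(n_markers, cuml_rate, gen_years=(25.0, 30.0)):
    per_marker = cuml_rate / n_markers
    transmissions = 1.0 / per_marker      # per single marker
    birth_events = 1.0 / cuml_rate        # across the whole panel
    years = [birth_events * g for g in gen_years]
    return per_marker, transmissions, birth_events, years


rows = {n: panel_row(n, r) for n, r in CUML_RATE.items()}
for n, (pm, tr, be, yrs) in rows.items():
    print(f"{n:>4} {pm:.4f} {tr:6.0f} {be:5.1f} {yrs[0]:8.1f} {yrs[1]:8.1f}")
```
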
    MJost
    148326, FGC-0FW1R, YSID6 & YF3272 R-DF13>FGC5494>*7448>*5496>*5521>*5511>*5539>*5538>* 5508>*5524
     
    Watterson USA GD1/67 & GD3/111, *5508+. GD1’s father’s sister-23andme pred. 3rd Cous w/ 0.91% DNA shared-3 seg. Largest on Chr1 w/non-Euro admix affirms my NPE paternal Watterson line via aDNA & YDNA. A 2nd pred. 4th cous has same DKA b. 1840's Georgia and MDKA d 1703 IOM. 3rd Cousin FtDNA FF is from the Watterson Ala. *5538+ b. IOM w/ GD6/67 & GD8/111 -SGD3. FGC5539+ a Scot-Ross GD13/111 -SGD8

  6. The Following User Says Thank You to MJost For This Useful Post:

     NiloSaharan (01-31-2017)

  7. #4
    Registered Users
    Posts
    1,048
    Sex
    Location
    Central United States
    Ethnicity
    Celtic, Goidelic
    Y-DNA
    DF13>FGC5494>FGC5496
    mtDNA
    H11a2a>C12014T

    United States of America Netherlands Isle of Man Scotland Poland Northern Ireland
    Concepts overview: calculating TMRCAs for a group of haplotypes in my TMRCA Estimator spreadsheet.


    Intraclade means 'within a clade'. A clade descends from a common ancestor and sits within a higher level grouping of a genetic haplogroup, such as M222; it includes those haplotypes that are known to have positive test results.


    Technically two things are being calculated from a clade (haplotype) dataset: population variance and sample variance, which are used in calculating the Coalescence and Founder's Modal intraclade generation ages respectively. Next, the sum of each type of variance is divided by the sum of the mutation rates to get a generation (MRCA) age.
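
    The "sum of variances divided by sum of mutation rates" step can be sketched in Python as below. The haplotypes and rates are made-up illustration values and the function name is hypothetical; this is not the spreadsheet's actual code.

```python
from statistics import pvariance, variance

# Hypothetical repeat counts: rows are haplotypes, columns are STR markers.
haplotypes = [
    [13, 24, 14, 11],
    [13, 25, 14, 11],
    [13, 24, 15, 10],
    [14, 24, 14, 11],
    [13, 24, 14, 12],
]
# Hypothetical per-marker mutation rates (per generation).
rates = [0.002, 0.003, 0.002, 0.004]


def variance_age(haps, mu, var_fn):
    """Generations = sum of per-marker variances / sum of mutation rates."""
    columns = list(zip(*haps))           # one tuple of values per marker
    total_var = sum(var_fn(col) for col in columns)
    return total_var / sum(mu)


age_n = variance_age(haplotypes, rates, pvariance)   # whole population (n)
age_n1 = variance_age(haplotypes, rates, variance)   # sample (n-1)
print(round(age_n, 1), round(age_n1, 1))
```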


    Further, when estimating the variance, the dataset used is technically a sample of the population space. Coalescence looks at just the data as a small population which is assumed to be close to the actual population, whereas the Founder's Modal section is an adjusted sample that represents the entire population.


    The Coalescence whole-population (n) generation age is biased. The Coalescence sample (n-1) generation age is a corrected generation age that gives a 'true' unbiased result.


    To explain the bias: this method of Coalescence estimation is close to optimal, with the caveat that it underestimates the variance by a factor of (n - 1)/n. (For example, when n = 1 the variance of a single observation is obviously zero regardless of the true variance.) This bias should be corrected for when n is small by multiplying by n/(n - 1). This is why the Coalescence whole-population age is less than the Coalescence sample age.
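
    The n/(n - 1) correction can be checked with Python's standard statistics module, where pvariance divides by n and variance divides by n - 1. The repeat counts are made-up values for a single marker.

```python
from statistics import pvariance, variance

values = [11, 11, 10, 11, 12]      # hypothetical repeat counts at one marker
n = len(values)

biased = pvariance(values)          # divides by n   (Excel VARP)
unbiased = variance(values)         # divides by n-1 (Excel VAR)
corrected = biased * n / (n - 1)    # the n/(n-1) correction described above

print(biased, unbiased, corrected)
print(pvariance([11]))              # a single observation: variance is 0
```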


    My TMRCA spreadsheet can produce individual statistical variances, which should indicate the generational point where all haplotypes meet their common ancestor. Think of the first two Coalescence ages as variance-based, a sort of count of fractional mutations.

    I report three intraclade variance results to produce an estimated Most Recent Common Ancestor (MRCA) age:

    Coalescence Age = Variance of Whole Population (n) (close to KenN's original Coalescence age, which uses VARP functions)

    Coalescence Age = Variance of Sample Population (n-1) (sampled variance)

    Founder's Modal Age Variance (using Ken's formula for the Modal Method)

    Use Coalescence(n) for close families where all family members under the MRCA node are known.

    Use Coalescence(n-1) for groups with unknown or missing lineages below the MRCA node of the applied set of haplotypes (most runs).

    Use Modal for the Founder's Age. The Founder's Age will be older than the Coalescence (n-1) age, since there are usually missing lineage branches and/or generations without mutations, given that haplotype markers are not 100 percent represented.


    An interclade MRCA age is calculated from the last two results above [(n-1) and Modal] for the two clades studied, pointing to an MRCA age back from each clade's node using a statistical pooled standard deviation method.
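
    For reference, the textbook pooled standard deviation for two groups is shown below in Python. How exactly the spreadsheet applies it to the two clades' ages is not spelled out here, so treat the inputs (per-clade standard deviations and haplotype counts) as assumptions for illustration only.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Textbook pooled standard deviation of two groups."""
    num = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    return math.sqrt(num / (n1 + n2 - 2))


# Hypothetical per-clade spreads (in generations) and sample sizes.
print(pooled_sd(sd1=12.0, n1=40, sd2=18.0, n2=25))
```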


    MJost

  8. The Following 2 Users Say Thank You to MJost For This Useful Post:

     Silesian (05-02-2013), Telfermagne (07-16-2014)

  9. #5
    Registered Users
    Posts
    1,048
    Sex
    Location
    Central United States
    Ethnicity
    Celtic, Goidelic
    Y-DNA
    DF13>FGC5494>FGC5496
    mtDNA
    H11a2a>C12014T

    United States of America Netherlands Isle of Man Scotland Poland Northern Ireland
    I posted some TMRCAs on the Yahoo 1113 Combo forum, and poster Daryl posed some questions and expressed skepticism about TMRCAs. So I will reply here under this thread, as suggested by MikeW.

    Daryl,

    As I have always stated, I am not a math expert. But yes, I agreed with you when you said in a previous post that "TMRCA calculations are mostly speculative", and I said the results are all about their relevance. These are not error rates, as you pointed out, but only statistical probabilities. Let's review.

    In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. For a theoretical normal distribution, about 68.27% of outcomes fall within 1 sigma (one standard deviation) of the mean.


    Look at this chart, which shows the normal distribution curve with the standard deviations marked. Each band is 1 standard deviation wide.

    https://en.wikipedia.org/wiki/File:S...on_diagram.svg

    The standard deviation is an important reference, because we can say that any generation value calculated is:
    •likely to be within 1 standard deviation (68 out of 100 will be)
    •very likely to be within 2 standard deviations (95 out of 100 will be)
    •almost certainly within 3 standard deviations (997 out of 1000 will be)
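
    These rounded figures follow from the normal distribution: the chance of landing within k standard deviations of the mean is erf(k / sqrt(2)), which works out to about 68.27%, 95.45% and 99.73% for k = 1, 2, 3. A short Python check using only the standard library:

```python
import math


def coverage(k_sigma):
    """Probability that a normal value lies within k standard deviations."""
    return math.erf(k_sigma / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(coverage(k) * 100, 2))
```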


    I have an option in my spreadsheet to set the confidence level to any value and see what range of generations the MRCA point falls in at that confidence. In other words, at 99.73% probability (3 sigma) the estimate for the sample falls between x and y generations. A confidence interval (CI) indicates the reliability of an estimate: it is a range of values that acts as a good estimate of the unknown population parameter.
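
    A minimal sketch of turning a confidence level into a generation range, assuming the estimate is normally distributed. The estimate of 100 generations and the sigma of 10 are hypothetical values, and the function name is made up for illustration; the spreadsheet's own calculation may differ.

```python
from statistics import NormalDist


def generation_interval(gens, sigma, confidence):
    """Two-sided interval around an estimate, assuming a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return gens - z * sigma, gens + z * sigma


# Hypothetical estimate: 100 generations with a sigma of 10 generations.
low, high = generation_interval(100.0, 10.0, 0.9973)
print(round(low, 1), round(high, 1))
```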

    The “Variance Method” (Slatkin, 1995; Stumpf, 2001) assumes that the variance (average-squared-distance from ancestral value) of each STR marker in a large population, is proportional to the TMRCA of that population.
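
    A minimal Python sketch of that idea, using the modal value at each marker as a stand-in for the ancestral value (an assumption; the true ancestral haplotype is generally unknown). The data, rates, and function name are hypothetical, and this is not claimed to be Ken's Modal Method formula.

```python
from statistics import mode

haplotypes = [
    [13, 24, 14, 11],
    [13, 25, 14, 11],
    [13, 24, 15, 10],
    [14, 24, 14, 11],
    [13, 24, 14, 12],
]
rates = [0.002, 0.003, 0.002, 0.004]   # hypothetical per-marker rates


def asd_generations(haps, mu):
    """Average squared distance from the modal value, summed over markers,
    divided by the summed mutation rates."""
    total = 0.0
    for col in zip(*haps):
        ancestral = mode(col)            # modal value stands in for ancestor
        total += sum((v - ancestral) ** 2 for v in col) / len(col)
    return total / sum(mu)


print(round(asd_generations(haplotypes, rates), 1))
```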

    Ken Nordtvedt has implemented variance into his Generations spreadsheet calculations. The generation calculations are very close to each other.

    Please note that Ken explains Variance Sigma (Standard Deviation) Concepts on his website.

    http://knordtvedt.home.bresnan.net/S...0Variance.pptx

    Yes, Statistically Relevant.

    MJost

  10. #6
    Registered Users
    Posts
    3,492
    Sex
    Y-DNA
    R1b
    mtDNA
    H

    I'm copying this over from another thread so we don't bog that one down. For some people this might be interesting, so I'll continue the conversation on estimating ages using Y STRs and some of the vagaries and benefits thereof.

    Quote Originally Posted by Richard A. Rocca
    Just a reminder - Anatole Klyosov calculated the age of E-V13 at 1000 BC and yet E-V13 has been found in a 5000 BC sample from Spain. Therefore, I don't know that we can take his dating techniques all that seriously. I do think that there is a small value however in showing that subclade A is older than subclade B, with the understanding that founder effects could be in play.
    Quote Originally Posted by Mikewww
    Not always, but Klyosov's TMRCA calculations for R1b are usually in line with others for R1b so I don't think we need to distrust them just because they are from Klyosov. On the other hand, his error ranges are quite narrow compared to most alternative methods. I just ignore his error ranges.

    I'm not sure the E-V13 ancient DNA is a good example to assess the Klyosov methodology's technical correctness. TMRCA intraclade estimates like Klyosov's only represent the MRCA (Most Recent Common Ancestor) for the remnant population. The E-V13 MRCA (single man) for those surviving may have very little to do with the ancient DNA E-V13 man found other than some very, very ancient connection. Of course, all of this is what drove Dienekes nuts, particularly as it relates to geographical differentiation.

    However, problems with intraclade calculations can be mitigated by doing multiple phylogenetically comparable calculations and using interclade methods like what Ken Nordtvedt developed.

    If we want to go deeper into the topic of methodologies we should probably go over to this thread designed for that purpose.
    Quote Originally Posted by Richard A. Rocca
    I am well aware of the limitations and will pass on going deeper into the topic, but I only mention it because I want people to be aware that Anatoly's dates only represent the successful living ancestors and are probably fraught with the reduced age of successful founder effects. It in no way means that we will not find R1b hundreds or even thousands of years older than his numbers.
    I wasn't intending to slight anyone's understanding of the situation, so sorry if I sounded condescending. For people just catching up or tuning in, I just wanted to point out that Klyosov's methodology has nothing uniquely wrong with it although it suffers the same maladies as any Y STR based age estimation technique.

    Probably the best and most fun initiation into this is Dienekes's blog entry here.
    http://dienekes.blogspot.com/2011/08...t-al-2011.html

    Here is the kick-off of the fun part. You have to scroll down to the comments.
    Quote Originally Posted by Dienekes
    From now on I am going on a Y-STR boycott on this blog. Y-STRs still have their obvious uses, for recent genealogy, or forensics. They may also convey some information about human prehistory in the broadest time scales.
    Quote Originally Posted by Klyosov
    Excellent, Dienekes. I truly appreciate your boycott. It means that one more person who understands nothing in the area, is out.
    Last edited by Mikewww; 05-14-2013 at 04:49 PM.

  11. #7
    Registered Users
    Posts
    1,048
    Sex
    Location
    Central United States
    Ethnicity
    Celtic, Goidelic
    Y-DNA
    DF13>FGC5494>FGC5496
    mtDNA
    H11a2a>C12014T

    United States of America Netherlands Isle of Man Scotland Poland Northern Ireland
    Quote Originally Posted by MJost View Post

    The “Variance Method” (Slatkin, 1995; Stumpf, 2001) assumes that the variance (average-squared-distance from ancestral value) of each STR marker in a large population, is proportional to the TMRCA of that population.

    Ken Nordtvedt has implemented variance into his Generations spreadsheet calculations. Please note that Ken explains Variance Sigma (Standard Deviation) Concepts on his website.

    http://knordtvedt.home.bresnan.net/S...0Variance.pptx
    Edit repost

  12. #8
    Registered Users
    Posts
    3,492
    Sex
    Y-DNA
    R1b
    mtDNA
    H

    This may seem a little off track, but bear with me. This is about understanding MRCAs....

    What's the value of a haplogroup?

    What's the value of an SNP?

    You might be surprised to hear me say this but I think there is very little value in haplogroups and SNPs
    ... at least in and of themselves.

    A haplogroup is just a group of people with a common ancestor.

    An SNP is just a single nucleotide polymorphism, a mutation, that marks the group of people with a common ancestor. It is just a signpost on a branch of the human family tree. The true nature of the haplogroup of people, any commonality in culture, location, etc., may not align with the SNPs that have marked the lineages. The SNP could mark either a subset or superset of the true group of people we care about.

    This gets into some notions about value and philosophical concerns, but these are the points I'm getting at.

    1) I do not care too much about all of the extinct lineages of mankind. There are many, many extinct lineages. On the Y chromosome/paternal side probably there are many, many more lineages that have gone extinct than those who survive.

    2) I do care about how we got here and how, where, when and why they did what they did to get us to where we are today.

    I think these notions are just conveying that what many hobbyists may care about most is the connection to genealogy and deeper ancestry.... and specifically our ancestry.

    The net is that the most recent common ancestors (MRCAs) of the various branches remaining today (and in recent history) are critical people to try to understand. The better we understand more MRCAs at more layers and branches in the tree, the better chance we have of understanding our ancestry.

    I am not saying that all of the old extinct lineages were not important people or that SNPs are useless. I'm just trying to say they are most important in how they help us understand who we are and how we got here. They are just bread crumbs from an old trail.

    Superconducting supercolliders smash atoms and look at the residue of the accident to try to get more detail on the characteristics of the atom. In the case of genetics; the accidents, bottlenecks, growth spurts, etc. have already taken place but, likewise, we are looking at the residue to try to ascertain what happened.

    I don't care when an SNP first occurred. I care about the expansion and movements of my ancestry. The SNP marked haplogroup ages may help put a maximum age in place for my ancestry. That's good, but it's not really the haplogroup I'm after.


    P.S. Science may be interested in who the genetic Adam was or wasn't and some other things. That's fine with me but I'm really after understanding how we, the survivors, got here.
    Last edited by Mikewww; 05-16-2013 at 04:21 AM.

  13. The Following User Says Thank You to Mikewww For This Useful Post:

     NiloSaharan (01-31-2017)

  14. #9
    Registered Users
    Posts
    3,492
    Sex
    Y-DNA
    R1b
    mtDNA
    H

    I'm moving the following post here (into "STR Wars...") as it is on the generic topic of TMRCA and STR issues and concerns. This type of discussion can bog down any thread, so let's keep it here and refer to it as applicable in other discussions.

    Quote Originally Posted by Mikewww View Post
    TMRCA estimates are subject to many vagaries. However, for R1b we now have several studies' worth of data, a lot of long haplotypes and interclade TMRCA methodologies. I think we have robust enough data broken up by the SNP defined phylogeny that we can have robust estimates. The issues are the methodologies themselves, or actually, more the STR mutation rates. The SNP methods need maturity in the coverage of the Y chromosome. Posted on the R1b Early Subclades subcategory phylogeny thread are several TMRCA estimates from different folks and methodologies that find essential agreement.

    I would never say they are precise. Still the relative nature of the timing within the phylogeny along with the geographical distribution is robust.
    Quote Originally Posted by mcg11 View Post
    TMRCA estimates are precisely what the definition says: "time to the most recent common ancestor". These estimates are based on the use of STR values and have nothing intrinsically to do with SNP's and when they occur. e.g. Clan gregor had a founder c. 1350 with an apparent mutation from 11 to 10 at 385a. It turns out that a SNP mutation, L1065, defines the clan along with many other clans some 300 to 500 years before. Now there may be a more defining SNP mutation but it hasn't been found yet. So, right now, we infer SNP sequences, in part from STR data. This may be flawed because some subset of a group may have had more advantageous condition for procreation and out-produced other lines? Relative timing based on the apparent lines success may not be robust?
    Last edited by Mikewww; 05-23-2013 at 05:37 PM.

  15. #10
    Registered Users
    Posts
    3,492
    Sex
    Y-DNA
    R1b
    mtDNA
    H

    Quote Originally Posted by mcg11 View Post
    TMRCA estimates are precisely what the definition says: "time to the most recent common ancestor". These estimates are based on the use of STR values and have nothing intrinsically to do with SNP's and when they occur. e.g. Clan gregor had a founder c. 1350 with an apparent mutation from 11 to 10 at 385a. It turns out that a SNP mutation, L1065, defines the clan along with many other clans some 300 to 500 years before. Now there may be a more defining SNP mutation but it hasn't been found yet. So, right now, we infer SNP sequences, in part from STR data. This may be flawed because some subset of a group may have had more advantageous condition for procreation and out-produced other lines? Relative timing based on the apparent lines success may not be robust?
    This can be mitigated by use of intraclade age estimates within known related groups, as defined by SNPs, and then comparing those estimates across a known tree of SNP based subclades. This is what Ken Nordtvedt's interclade TMRCA estimates are all about.

    We also see other non STR methods coming on-line. The 2008 Karafet study used a scientific sampling of Y chromosome SNPs to estimate ages. They estimated the R1 TMRCA, which is ancestral to R1b and R1a, as 18.5K ybp. This fits nicely with what the common (hobbyist and FTDNA) TMRCA estimation methods are getting for R1b subgroups, so there is some apparent corroboration of STR based methods from this "novel" (Karafet's word) SNP method.
    "New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree" by Karafet et al., 2008. The et al. in this case includes Michael Hammer, FTDNA's Chief Scientist.
    Last edited by Mikewww; 05-23-2013 at 08:00 PM.
