Let's look at calculating a TMRCA using the Scots (Sc) clade, which is a very bushy group. The new daddy, L1335, is a small Wales-based clade that is parent to the L1065 Scots. Reviewing that set of 111-marker haplotypes, a count of the mutations between the two at 111 markers gives 27. That count is based on today's haplotypes and is taken between the two clades' modals, so in essence it needs to be divided by two to get back to the MRCA. Recall that increasing the number of markers used brings more confidence.
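As a rough illustration of the counting step, here is a minimal Python sketch. It assumes mutations are counted stepwise (the absolute allele difference at each marker) and uses made-up six-marker modals; the function names and toy values are mine, not taken from the spreadsheet.

```python
def genetic_distance(modal_a, modal_b):
    """Stepwise genetic distance: sum of absolute allele differences per marker."""
    if len(modal_a) != len(modal_b):
        raise ValueError("both modals must cover the same markers")
    return sum(abs(a - b) for a, b in zip(modal_a, modal_b))


def mutations_per_lineage(gd_between_modals):
    """Halve the modal-to-modal count, since both lines descend from the MRCA."""
    return gd_between_modals / 2


# Hypothetical toy modals (six markers only, for illustration).
modal_wales = [13, 24, 14, 11, 11, 14]
modal_scots = [13, 25, 14, 10, 11, 16]

print(genetic_distance(modal_wales, modal_scots))  # 0 + 1 + 0 + 1 + 0 + 2 = 4
print(mutations_per_lineage(27))                   # the 27 counted above -> 13.5
```

Multi-copy markers and any infinite-alleles style counting would need different handling than this simple sum.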

For a more detailed examination, using my TMRCA spreadsheet, I look at several criteria. I check for possible bi-modal STRs and for outlier haplotypes that show an allele more than +/- two steps from the modal. This is more critical on smaller sets of HTs than on larger groups, and on smaller panels than on larger ones. You can treat the outliers by excluding them. One example: I had one haplotype out of 20 with an odd DYS456 value of 18 (all others had a 23), which threw my calculation at 25 markers off by a large amount, 10-plus generations. I first use my Filter Clade(x) for STR GDs of plus or minus two from the modal in the displayed Dashboard report to find those odd allele values, and then try to determine how they will affect the TMRCA results across different panel sizes, etc.
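The outlier check can be sketched roughly like this in Python. This is only an approximation of the idea, not the spreadsheet's Filter Clade(x) logic: the modal is assumed to be the most common allele at each marker, and the function names and toy haplotypes are hypothetical.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common allele at each marker position (assumed definition of the modal)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def flag_outliers(haplotypes, max_step=2):
    """List (haplotype index, marker index, allele, modal allele) where the
    allele sits more than max_step repeats away from the modal."""
    modal = modal_haplotype(haplotypes)
    flagged = []
    for ht_index, haplotype in enumerate(haplotypes):
        for marker_index, (allele, modal_allele) in enumerate(zip(haplotype, modal)):
            if abs(allele - modal_allele) > max_step:
                flagged.append((ht_index, marker_index, allele, modal_allele))
    return flagged


# Hypothetical toy set: five haplotypes, three markers each.
haplotypes = [
    [13, 24, 23],
    [13, 24, 23],
    [13, 25, 23],
    [14, 24, 18],  # third marker is five steps off the modal
    [13, 24, 23],
]

print(flag_outliers(haplotypes))  # [(3, 2, 18, 23)]
```

Whether a flagged haplotype is then excluded or kept is still a judgment call, ideally made after seeing how it moves the result at each panel size.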

Next in order is to choose which panel size best represents the overall closeness of the haplotypes to the modal, using a Bruce Walsh designed tool called the k/n ratio, which he describes as follows:

Http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1461668/
"We assume that n markers are scored [on the] nonrecombining chromosome of interest (either the Y or mtDNA) and that we observe matches in allelic state at k of these. We start by assuming the infinite alleles model, where each mutation is assumed to be unique. We then modify our results by assuming a (symmetric single-step) stepwise mutational model, which is a more [appropriate] descriptor for microsatellite markers. As we show, when k/n is close to one and n moderate to large, the two different mutational models give essentially the [same] distribution for t (time)."

My method is that the panel whose k/n is closest to one gives essentially the best distribution for time. As you can see, the 37-marker panel shows the worst k/n and should not be used for TMRCAs. I consider this a parallel check to see how well every HT in the group, including the outliers, fits.

Match k/n scored markers - Filtered (goal is closest to 1) for CladeA: Scots 1335-Sc and CladeB: L1335 All

Ratio    111 (k/n)  67 (k/n)  37 (k/n)  25 (k/n)
CladeA   0.90319    0.90294   0.85399   0.89988
CladeB   0.89890    0.89739   0.84751   0.89409
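For anyone wanting to try a k/n comparison on their own data, here is a minimal Python sketch. It rests on assumptions of mine that the post does not spell out: k is counted as exact matches to the clade modal, the clade figure is the mean over all haplotypes, and a panel is simply the first N markers. All names and toy values are hypothetical.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common allele at each marker position."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def kn_ratio(haplotype, reference):
    """k/n: markers matching the reference (k) over markers scored (n)."""
    n = len(reference)
    k = sum(1 for allele, ref in zip(haplotype, reference) if allele == ref)
    return k / n


def clade_kn(haplotypes, panel_size):
    """Mean k/n against the clade modal, using only the first panel_size markers."""
    trimmed = [haplotype[:panel_size] for haplotype in haplotypes]
    modal = modal_haplotype(trimmed)
    return sum(kn_ratio(haplotype, modal) for haplotype in trimmed) / len(trimmed)


# Hypothetical toy clade: four haplotypes, four markers each.
clade = [
    [13, 24, 14, 11],
    [13, 24, 14, 11],
    [13, 25, 14, 11],
    [13, 24, 15, 10],
]

ratios = {size: clade_kn(clade, size) for size in (2, 4)}
best_panel = max(ratios, key=ratios.get)
print(ratios)      # {2: 0.875, 4: 0.8125}
print(best_panel)  # 2
```

On real data the panel sizes would be 25, 37, 67 and 111, and the panel with the ratio closest to one is the candidate for the TMRCA run.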

I next look at the Confidence Index numbers to help decide which is the best set of markers to use within the panel size determined above. (This tool is also used to set a specific CI for standardization when aging multiple clades.) I usually just set the Confidence Level to 1 sigma (68.27%), which, for an assumed normal distribution, is the probability that a measurement falls within one standard deviation of the mean. When performing TMRCAs on data sets of different sizes and/or different clades, using the same CI helps standardize the results and simplifies comparing multiple subclades.
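The 68.27% figure for 1 sigma can be checked with a few lines of Python using the normal distribution's error function. This is only a sketch of the standard textbook relationship, not of how the spreadsheet applies its Confidence Index; the function names and the example mean and SD are hypothetical.

```python
import math


def normal_coverage(z):
    """Probability that a normal variate lies within z standard deviations of its mean."""
    return math.erf(z / math.sqrt(2))


def sigma_interval(mean, sd, z=1.0):
    """Lower and upper bounds of mean +/- z standard deviations."""
    return (mean - z * sd, mean + z * sd)


print(f"{normal_coverage(1.0):.2%}")  # 68.27%
print(sigma_interval(50.0, 5.0))      # (45.0, 55.0), hypothetical values in generations
```

Holding z fixed at 1.0 across every clade is what keeps the resulting ranges comparable from one run to the next.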

Combining ancestral subclades sometimes requires mixing a clade with its subclade(s) for an IntraClade Coalescence, a Founder's Modal, and an Interclade age for both. In the case of the Wales guys, their TMRCA by itself is much younger, because the very small number of HTs gives a modal of about 475 ybp; we have to assume there are missing lineages that possibly daughtered out or have yet to be found and tested. But when they are combined with their subclades, the HTs create a wider modal footprint in which the number of generations appears to separate nicely. If a TMRCA is significantly outside the average of the others, it probably should not be considered, as with the IAM-MCM TMRCA ages shown below. Combined, there is zero GD between them.

111 Markers            Bird's 48                      94 NMCM                        IAM-MCM                        111 ALL

IntraClade Coalescence (n-1) Age
                       YBP      CI-SD-Gen-Coal(n-1)   YBP      CI-SD-Gen-Coal(n-1)   YBP      CI-SD-Gen-Coal(n-1)   YBP      CI-SD-Gen-Coal(n-1)
Clade A: 1335-Sc's     1,373.5  10.06                 1,326.9  10.86                 1,278.7  11.57                 1,397.7  13.23
Clade B: 1335 All      1,461.9  10.82                 1,419.5  11.77                 1,355.9  12.38                 1,477.4  14.08
Diff =                 88.4     0.76                  92.7     0.91                  77.2     0.81                  79.7     0.86

Intraclade Founder's Modal Age
                       YBP      CI-SD-GenModal        YBP      CI-SD-GenModal        YBP      CI-SD-GenModal        YBP      CI-SD-GenModal
Clade A: 1335-Sc's     1,512.6  11.59                 1,444.1  12.29                 1,152.1  9.87                  1,525.9  15.04
Clade B: 1335 All      1,601.7  12.37                 1,538.6  13.25                 1,226.2  10.62                 1,607.7  15.94

The CI concept is to choose the most stable type of markers within the best overall panel, as shown by the CI results. In this calculation the Interclade Founder's Modal age is the same as the Intraclade Founder's age, as it should be, and it is the high end of Clade A's TMRCA.

The result shows that L1335* appears to be only about three generations older than the L1065s, at a TMRCA of 1,601.7 vs. 1,512.6 ybp. Only more L1335 haplotypes will confirm this or not. There was a quick SNP mutation between the Wales and the Scots guys. We need to look for a 1335 Sc variety that did not get L1065 to prove that the L1065 SNP was farther apart from L1335 than about three generations.
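The arithmetic behind "about three generations" can be written out in Python as below. The 30 years per generation is an assumption of mine for illustration; the post does not state the generation length the spreadsheet uses, and the function name is hypothetical.

```python
# Assumed generation length in years (not stated in the post).
YEARS_PER_GENERATION = 30.0


def generations_between(older_ybp, younger_ybp, years_per_generation=YEARS_PER_GENERATION):
    """Gap between two TMRCA estimates, converted from years to generations."""
    return (older_ybp - younger_ybp) / years_per_generation


# Founder's Modal ages from the Bird's 48 column: Clade B vs. Clade A.
gap = generations_between(1601.7, 1512.6)
print(round(gap, 2))  # 2.97
```

A shorter assumed generation, say 25 years, would put the same 89.1-year gap at about 3.6 generations.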

Soon I will show how these same methods produced a number of generations very close to that of a known paper trail. The known ages remained hidden until after the TMRCA results were produced, and only then were the results compared to the actual known data, in a TMRCA Methods Study which should be out shortly.

MJost