
Thread: STR Wars, GDs, TMRCA estimates, Variance, Mutation Rates & SNP counting

  1. #81
    Gold Class Member
    Posts
    8,479

    Quote Originally Posted by Michał View Post
    Below please find my SNP-based TMRCA estimations (in ky) for different R1b clades (and some upstream haplogroups) present in Sardinia. These estimations are based on an assumed mutation rate of 0.7 x 10^-9 per nucleotide per year (chosen for reasons mentioned in another thread), while the numbers given in parentheses correspond to the TMRCA values calculated using the mutation rates 0.82 and 0.53, as proposed by Poznik and Francalacci, respectively.

    63.0 (53.7-82.8) haplogroup F
    61.9 (52.8-81.3) haplogroup IJK
    58.8 (50.1-77.3) haplogroup K
    40.2 (34.3-52.9) haplogroup P
    36.6 (31.2-48.2) haplogroup I
    33.5 (28.6-44.1) haplogroup R
    27.6 (23.5-36.3) haplogroup R1
    22.9 (19.5-30.1) R1b-P25
    14.9 (12.5-19.6) R1b-V88
    8.6 (7.3-11.3) R1b-M269
    8.3 (7.1-10.9) R1b-L23
    7.6 (6.5-10.0) R1b-L51
    7.4 (6.3-9.7) R1b-Z2105
    7.2 (6.1-9.5) R1b-M269(xL23)
    6.6 (5.6-8.6) R1b-L11
    6.2 (5.3-8.2) R1b-P312
    6.1 (5.2-8.0) R1b-U152

    Assuming that the number of downstream mutations found in members of some poorly represented subclades is not a reliable source of data (due to some technical reasons associated with using the low pass sequencing method), I have instead used the distance (i.e. the number of SNPs) between a parent clade and a common ancestor of a given subclade as a basis for calculating the age (TMRCA) of every subclade. For estimating the age of haplogroup F, I have used the average number of mutations downstream of haplogroup F in members of the well-represented (in Sardinia) clade I2a-M26, which was about 404 mutations. It is worth noting that the average number of mutations downstream of haplogroup F in clade R1b-U152 (a clade that is also frequent in Sardinia, but not as common as I2a-M26) was close to the above number but, nevertheless, evidently lower (392). Thus, when basing similar calculations on this slightly reduced number of SNPs found in members of R1b-U152, we get lower TMRCA values, as shown below.

    61.3 (52.3-80.6) haplogroup F
    60.2 (51.3-79.1) haplogroup IJK
    57.1 (48.7-75.0) haplogroup K
    38.5 (32.9-50.6) haplogroup P
    34.9 (30.0-45.9) haplogroup I
    31.8 (27.1-41.8) haplogroup R
    25.9 (22.1-34.0) haplogroup R1
    21.2 (18.1-27.9) R1b-P25
    13.3 (11.3-17.4) R1b-V88
    6.9 (5.9-9.0) R1b-M269
    6.6 (5.6-8.6) R1b-L23
    5.9 (5.1-7.8) R1b-L51
    5.6 (4.8-7.4) R1b-Z2105
    5.5 (4.7-7.2) R1b-M269(xL23)
    4.8 (4.1-6.4) R1b-L11
    4.5 (3.9-5.9) R1b-P312
    4.4 (3.7-5.7) R1b-U152

    Neither of the above sets of TMRCAs can be considered secure, but I think the first approach is slightly more likely to give correct values when using those Sardinian data alone.
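
For readers who want to check the arithmetic behind the two lists quoted above, here is a minimal Python sketch. It assumes that, for a fixed SNP count, a TMRCA scales inversely with the mutation rate, and that a node's age is the root age minus the branch length above it, which is one reading of the distance-based approach described in the quote. The function names and the 202-SNP branch length are illustrative assumptions, not Michał's own code or data.

```python
def rescale_tmrca(tmrca_ky, rate_used, rate_new):
    """Rescale a TMRCA to a different mutation rate (same SNP count).

    Rates are in units of 10^-9 per nucleotide per year; only their ratio matters.
    """
    return tmrca_ky * rate_used / rate_new


def node_age_ky(root_age_ky, root_snps, snps_root_to_node):
    """Age of a downstream node, assuming a constant rate along the lineage.

    root_snps: average SNPs from the root down to present-day samples.
    snps_root_to_node: SNPs separating the root from the node (hypothetical input).
    """
    ky_per_snp = root_age_ky / root_snps
    return root_age_ky - snps_root_to_node * ky_per_snp


if __name__ == "__main__":
    # Haplogroup F at 63.0 ky under 0.7, rescaled to the Poznik and Francalacci rates.
    print(round(rescale_tmrca(63.0, 0.7, 0.82), 1))
    print(round(rescale_tmrca(63.0, 0.7, 0.53), 1))
    # Hypothetical node halfway down a 404-SNP lineage.
    print(round(node_age_ky(63.0, 404, 202), 1))
```

The rescaled values land close to, though not exactly on, the ranges given in parentheses, so the original figures were presumably rounded differently or computed from the SNP counts directly.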
    Michał,

    What would you estimate the age of M479 to be? There is an R2 sample in Francalacci et al. It seems to be far older than R1, counting the number of mutations downstream from M207 to, say, HG03727 (ITU). http://www.yfull.com/tree/R/

    From the Mal'ta sample:

  2. The Following User Says Thank You to parasar For This Useful Post:

     palamede (01-23-2014)

  3. #82
    Registered Users
    Posts
    1,813
    Sex
    Location
    Warsaw, Poland
    Y-DNA (P)
    R1a-L1280>FGC41205
    mtDNA (M)
    H2a2(b)
    Y-DNA (M)
    R1a-L1029>YP517
    mtDNA (P)
    H5a2

    Poland European Union
    Quote Originally Posted by parasar View Post
    Michał,
    What would you estimate the age of M479 to be? There is an R2 Francalacci et al sample.
    I don’t have enough data to do so. This would require comparing the data for at least two fully sequenced people representing different sublineages of M479 (i.e. R2a and R2b).

    Quote Originally Posted by parasar View Post
    It seems to be far older than R1 counting the number of mutations downstream from M207 say to HG03727 ITU. http://www.yfull.com/tree/R/
    I am not sure what you mean by mutations downstream from M207 to HG03727 ITU. Are you saying that the number of mutations downstream of M207 in sample HG03727 is significantly larger than the number of mutations downstream of M420 in any R1a member? Have you seen such data? I am not aware of any R1a* member who has been fully sequenced, and as long as we lack this kind of data, it is impossible to provide any SNP-based calculations for the TMRCA of R1a.

  4. #83
    Is there such a thing as R2b?

  5. #84
    Quote Originally Posted by newtoboard View Post
    Is there such a thing as R2b?
    Not yet, so I should indeed have written R2* and R2a instead.

  6. #85
    Registered Users
    Posts
    7,873
    Sex
    Omitted
    Y-DNA (P)
    L21
    mtDNA (M)
    H

    It certainly looks as though SNP counting on modern subjects, plus just a modest amount done on radiocarbon-dated ancient subjects, could really resolve the dating issues for yDNA in the near future. It looks like the method is in place and it's just a case of doing it now.

  7. #86
    Using the full Y-DNA sequencing data from the recent Ashkenazi-Levites paper by Rootsi et al., I've done some calculations that should allow us to compare the age of some major subclades of R1a and R1b. I have used a slightly modified mutation rate (0.66 x 10^-9 per nucleotide per year, instead of 0.7 x 10^-9 per nucleotide per year), mostly because this new value is not only the average (to two decimal places) of the rates calculated by Francalacci (0.53), Poznik (0.82) and Mendez (0.62), but was also suggested by another forum member, James Dow Allen, to be consistent with the data provided by both Raghavan (the Mal'ta paper) and Francalacci (the Sardinian paper). This rate of 0.66 corresponds to 165 years per mutation found in the 8.97 Mb Y-DNA sequence.

    Below please find my new TMRCA estimates (in ky). The values shown in the parentheses correspond to my two previous sets of estimates that were based on the Sardinian data. For all clades below the R1a-Z645 and R1b-L23 levels (where only some relatively small numbers of individuals were available), the estimates were based on taking into account both the average number of mutations downstream of a given branching point and the number of mutations separating this particular branching point from an upstream node.

    38.8 (40.2, 38.5) haplogroup P
    32.2 (33.5, 31.8) haplogroup R
    26.9 (27.6, 25.9) haplogroup R1
    8.0 (8.3, 6.6) R1b-L23
    6.7 (7.4, 5.6) R1b-Z2103
    6.1 (6.6, 4.8) R1b-L11
    6.0 R1a-Z645
    5.7 R1a-Z93
    5.4 R1a-Z94
    5.3 R1a-Z282
    5.2 R1a-Z2123
    5.2 R1a-Z2122
    4.3 R1a-L657
    3.9 R1a-M582
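
The rate arithmetic in this post can be checked with a few lines of Python. This is only a sketch: it assumes the simple relation years per SNP = 1 / (rate x callable length), with the 8.97 Mb figure taken from the text above. The function names are illustrative.

```python
from statistics import mean

# Published rates cited in the post, in units of 10^-9 per nucleotide per year.
RATES = {"Francalacci": 0.53, "Poznik": 0.82, "Mendez": 0.62}


def years_per_snp(rate_e9, callable_bp):
    """Expected years between SNPs across callable_bp bases at the given rate."""
    return 1.0 / (rate_e9 * 1e-9 * callable_bp)


def tmrca_ky(snp_count, rate_e9, callable_bp):
    """TMRCA in thousands of years for an average downstream SNP count."""
    return snp_count * years_per_snp(rate_e9, callable_bp) / 1000.0


if __name__ == "__main__":
    avg = mean(RATES.values())
    print(f"mean rate: {avg:.3f}")  # about 0.657, i.e. 0.66 when rounded
    print(f"years per SNP: {years_per_snp(0.66, 8.97e6):.0f}")
```

With these inputs the interval comes out at roughly 169 years per SNP, in the same range as the 165 years quoted; the small gap presumably reflects rounding or a slightly different effective sequence length.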

  8. The Following 3 Users Say Thank You to Michał For This Useful Post:

     Belgae (02-28-2014),  jamesdowallen (03-26-2014),  Jean M (01-07-2014)

  9. #87

    I have to say that P, R and R1 admirably fit the dates of the three main divisions of the Upper Palaeolithic in Siberia, with P to R matching the apparent first intrusion of modern humans (apparently from the Levant via Iran and central Asia), R to R1 matching the middle Upper Palaeolithic and R1 matching the late Upper Palaeolithic.



  10. #88

    Those calculations would suggest that:

    1. R1 arose almost exactly at the time of the best estimates for the start of the LGM.

    2. The age of c. 4700 BC for R1b's main eastern European/SW Asian clade is still far too late to associate with an expansion linked to early farmers.

    3. In steppe terms, the R1b expansion would be in a Sredny Stog sort of timeframe.

    4. The R1a clades seem to centre on what would be Yamnaya times.

    In general it's suggestive of a two-step model of an R1b expansion c. 4500 BC and a little later, and an R1a expansion c. 3300 BC and a little later. This fits a two-wave steppe model quite well, with R1b associated with the Suvorovo type groups and R1a mainly linked to Yamnaya. There are of course other options for both, but it has a nice fit with the timing of steppe waves into east-central Europe and the Balkans. When you add the apparent association of R1a and R1b with different IE branches, with the R1b-rich groups tending to be linked to most of the earliest branching on the linguistic tree, then it does feel like a pretty good fit, all things considered.

    Another aspect of this is that if R1b is linked to the pre-Yamnaya Suvorovo type waves west, derived from Sredny Stog groups around the Dnieper, then it also makes some kind of sense in terms of the later environments they spread into. I say this because these Suvorovo groups seem to be descended from groups who had a lot more of a farming aspect to their economy and had longstanding links to the Balkan copper trade. Craniology shows that this was not just influence but almost certainly also involved real mixing. So you could say that this wave had a bit of preparation for its future home in the farming world and would more easily integrate after a little time.

    Conversely, if R1a was, as seems likely, associated with Yamnaya and its predecessors east of the Don, then it really was peripheral to farming and not prepared in any way to easily integrate into the farming world. It was probably the first steppe culture, other than maybe some steppe Maykop elements a century before, to take up the wagon. That created an incredible expansion opportunity within the interfluvial areas of the steppes and steppe-like land in east-central Europe, but it also made them a further degree alien to the farming world in terms of lifestyle. This may have primed it for the later pattern that, once it invaded the Balkans, it remained somewhat aloof and confined to steppe-like land separate from the farmers.

    In such a scenario of chronological and socio-economic differences the contrast in distribution between R1a and b seems understandable and the differences are foreshadowed in their earlier history on the steppes.


  11. #89
    Registered Users
    Posts
    1,248
    Sex
    Location
    Florida, USA
    Ethnicity
    Iberian+Canary Islander
    Nationality
    American
    Y-DNA (P)
    R1b-Z279(xM153)
    mtDNA (M)
    L2a1a3c

    United States of America Spain Basque Galicia Portugal Cuba
    Here is a relatively recent study which shows pretty much what I've been saying for a while: that the mutation rate isn't linear, contrary to what Klyosov and the like claim, and that it varies as a function of the repeat number.

    Empirical Evaluation Reveals Best Fit of a Logistic Mutation Model for Human Y-Chromosomal Microsatellites

    Quote Originally Posted by Jochens-et-al-2011

    The rate of microsatellite mutation is dependent upon both the allele length and the repeat motif, but the exact nature of this relationship is still unknown. We analyzed data on the inheritance of human Y-chromosomal microsatellites in father–son duos, taken from 24 published reports and comprising 15,285 directly observable meioses. At the six microsatellites analyzed (DYS19, DYS389I, DYS390, DYS391, DYS392, and DYS393), a total of 162 mutations were observed. For each locus, we employed a maximum-likelihood approach to evaluate one of several single-step mutation models on the basis of the data. For five of the six loci considered, a novel logistic mutation model was found to provide the best fit according to Akaike’s information criterion. This implies that the mutation probability at the loci increases (nonlinearly) with allele length at a rate that differs between upward and downward mutations. For DYS392, the best fit was provided by a linear model in which upward and downward mutation probabilities increase equally with allele length. This is the first study to empirically compare different microsatellite mutation models in a locus-specific fashion.
    Here is more from the study:

    It is well known that the pattern of microsatellite mutation varies across loci (Kelkar et al. 2008). To our knowledge, however, the present study is the first to systematically compare novel as well as previously described microsatellite mutation models for Y-STRs in a locus-specific fashion. This comparison was made possible by the accumulation of suitable genotype data from 15,285 father–son duos. The major advantage of this type of data is that all father–son relationships had been confirmed by independent genotyping of other markers. In studies using deep pedigree data for mutational analyses, this is typically not the case (Heyer et al. 1997), which renders discrimination between false paternity records and genuine mutations notoriously difficult. Furthermore, by basing our analysis on directly observed mutations (i.e., Y-STR mutations in father–son duos), we avoided the need for additional assumptions about the underlying population dynamics, mating behavior, or selective pressure. This is a clear advantage over studies that sought to investigate microsatellite mutation processes by comparing distantly related genomes (Dieringer and Schlötterer 2003).

    In our analysis, we also avoided the complicating effects of recombination through choosing loci from the male-specific region of the Y chromosome. Although this restriction may at first glance seem to limit the general applicability of our results, it may be surmised that Y-chromosomal and autosomal microsatellite loci obey similar mutation models because they have similar biochemical properties and because replication slippage is responsible for STR mutations in both instances (Heyer et al. 1997; Kayser et al. 2000). This contrasts with minisatellite mutations, where recombination plays a significant role (Buard et al. 2000).

    One caveat of our study is that the loci considered were originally selected for forensic applications because of their high variability. Therefore, we cannot exclude that our parameter estimates are biased toward higher mutation probabilities, but this seems unlikely to affect the general conclusion as to which models are most appropriate for microsatellites.

    As was mentioned before, many statistical models have been proposed for the microsatellite mutation process (Calabrese and Sainudiin 2005). However, only a few of these turned out to be applicable to our data. For example, the model proposed by Kruglyak et al. (1998), which includes point mutations, was not deemed relevant to our study because, with the genotyping systems used in forensics, point mutations are not altering repeat counts (Gusmão et al. 2006). Moreover, we chose to restrict ourselves to one-step models owing to the scarcity of data on multistep mutations (Table 3). The three instances of mutations resulting in a change by 2 repeat units in our data were counted as single-step changes for model-fitting purposes. This concerned two of the six loci considered, for which the models should therefore be interpreted as dichotomizing all possible mutation events into up- and downward mutations, regardless of step size. However, since multistep mutations are very rare, this dichotomization should not affect our conclusions substantially. Nevertheless, should more data on multistep mutations become available in the future, a study of more sophisticated models may become worthwhile.

    Our study was in part inspired by Whittaker et al. (2003), who were the first to suggest the use of maximum likelihood to fit microsatellite mutation models. However, as explained above, their exponential mutation model was not well defined, resulting in unbounded mutation probabilities. This problem does not occur with a logistic model, which to our knowledge has not been investigated before, because in the logistic model mutation probabilities are always bounded by parameter γ. Notably, for small repeat numbers, Whittaker’s model and the logistic model are similar in that both entail an exponential increase in mutation probabilities with increasing repeat number. Qualitatively, the logistic model is also similar to the best models emerging from genome comparisons, e.g., the PL1 model in Sainudiin et al. (2004). The main features in both instances are an allele-length–dependent mutation rate and a confinement to single-step mutations.
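
To make the contrast drawn above concrete, here is a small Python sketch of a generic logistic mutation probability next to an unbounded exponential one. The functional forms and all parameter values (gamma, alpha, beta, mu0) are illustrative assumptions, not the parametrisation or the estimates from Jochens et al.

```python
import math


def logistic_mutation_prob(repeats, gamma, alpha, beta):
    """Logistic model: rises with allele length but never exceeds gamma."""
    return gamma / (1.0 + math.exp(-alpha * (repeats - beta)))


def exponential_mutation_prob(repeats, mu0, alpha):
    """Exponential model: grows without bound as repeats increase."""
    return mu0 * math.exp(alpha * repeats)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curves.
    gamma, alpha, beta = 0.01, 0.8, 25.0
    # mu0 is set so both curves agree for short alleles.
    mu0 = gamma * math.exp(-alpha * beta)
    for n in (8, 12, 16, 20, 24, 28, 32):
        p_log = logistic_mutation_prob(n, gamma, alpha, beta)
        p_exp = exponential_mutation_prob(n, mu0, alpha)
        print(f"{n:2d} repeats  logistic={p_log:.2e}  exponential={p_exp:.2e}")
```

For short alleles the two curves are nearly identical; for long alleles the exponential form eventually exceeds 1, which is not a valid probability, while the logistic form levels off at gamma.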

    In principle, it would be possible to combine mutation models to obtain better fits. For example, a model with linearly increasing upward mutation probabilities but with downward mutation probabilities according to the logistic model fits our data for DYS390 and DYS391 somewhat better than the logistic model alone (ΔAIC = −21.9 and ΔAIC = −37.7, respectively, cf. Table 5). However, we decided to focus on pure models here to not exacerbate the multiple-testing problem.
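
The ΔAIC comparison mentioned above follows from the standard definition AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximised likelihood. A minimal Python sketch, with made-up log-likelihoods and parameter counts standing in for the paper's Table 5:

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(candidate, reference):
    """AIC difference; a negative value means the candidate model is preferred."""
    return aic(*candidate) - aic(*reference)


if __name__ == "__main__":
    # (log-likelihood, number of parameters); hypothetical values for illustration.
    logistic = (-310.0, 5)
    combined = (-298.0, 6)
    print(delta_aic(combined, logistic))
```

A negative difference, as in the values quoted for DYS390 and DYS391, means the extra parameter of the combined model is more than paid for by the gain in likelihood.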

    Practical applications of our results are vast because many uses of microsatellite data require estimates of the respective mutation probabilities. The logistic model, which was shown here to provide the best fit to empirical mutation data, is readily applicable to likelihood-based kinship analysis, phylogenetic analysis, and coalescence methods used in population genetics. Our statistical evaluation of mutation models may also contribute to a better understanding of the underlying biological mutation mechanisms. In particular, the fact that the combined models fit the data better than the original ones suggests possible differences between the mechanisms of upward and downward mutation.

    With these applications in mind, gathering of further mutation data, for example, in an international STR mutation database, seems to be warranted. With a growing database, it will become possible to further refine parameter estimates as well as the models themselves.

  12. The Following 2 Users Say Thank You to jeanL For This Useful Post:

     parasar (03-21-2014),  Rathna (03-21-2014)

  13. #90

    Another set of estimates by Michał

    Z280 - 5121 years
    CTS1211 - 4770 years
    CTS3402 - 4224 years
    Y33 - 3784 years
    CTS8816 - 3696 years
    Y2900 - 2171 years
    Z93 - 5456 years
    Z94 - 5280 years
    L657 - 4752 years
    Y57 - 2904 years
    http://eng.molgen.org/viewtopic.php?...=1300&start=60

  14. The Following User Says Thank You to parasar For This Useful Post:

     jeanL (04-03-2014)
