Thread: STR Wars, GDs, TMRCA estimates, Variance, Mutation Rates & SNP counting

  1. #101
    Registered Users
    Posts
    1,813
    Sex
    Location
    Warsaw, Poland
    Y-DNA (P)
    R1a-L1280>FGC41205
    mtDNA (M)
    H2a2(b)
    Y-DNA (M)
    R1a-L1029>YP517
    mtDNA (P)
    H5a2

    Poland European Union
    Quote Originally Posted by seferhabahir View Post
    I'm just using Mark's recent quote "The Experts have deemed from studies that the range is 70-90 years per SNP mutation." I think Michal uses 88 years per SNP. Some would say it's three generations per SNP. Perhaps Mark or Michal or Warwick can elaborate on these ranges or point to specific studies. I'm assuming these ranges are for FGC high reliability SNPs, not Big Y or YFULL.
    I have already provided the basis for my calculations on numerous occasions, but since there are some new data available, let me summarize it again.

    As for the Y-DNA mutation rate (or, more precisely, for the Y-DNA SNP rate), there have been four major papers that have provided some reasonable estimates for this rate. They have been briefly described in some previous posts of mine:
    http://www.anthrogenica.com/showthre...0824#post10824
    http://eng.molgen.org/viewtopic.php?...=1300&start=02

    Based on those four independent estimates, we can safely assume that the true mutation rate is roughly 0.7 (or between 0.6 and 0.8) x 10^-9 per bp per year. Using this well-justified assumption, I have produced a series of estimates for some selected haplogroups, including some major subclades of R1b (based on the SNP data provided by the Sardinian paper by Francalacci et al., 2013):
    http://www.anthrogenica.com/showthre...5936#post15936

    Shortly thereafter, it turned out that my estimates are in perfect agreement with the more recently published SNP data for the Siberian radiocarbon dated Mal’ta boy (24 kya, R*):
    http://www.anthrogenica.com/showthre...0838#post20838

    In the meantime, I have switched to a slightly modified mutation rate by replacing the 0.7 rate with 0.66 x 10^-9 per bp per year (though this does not mean that I am strongly convinced that using the 0.7 rate would be inappropriate):
    http://www.anthrogenica.com/showthre...6002#post26002

    More recently, my SNP-based estimates have also been confirmed by Underhill et al., 2014, who dated the R1a/R1b split to about 25 ky and calculated the R1a-Z645 clade to be 5.8 ky old (though the authors were, quite surprisingly, not aware that this is a TMRCA value for R1a-Z645 and not R1a-M417, as they claimed in their paper):
    http://eng.molgen.org/viewtopic.php?...=1300&start=72

    Using the above well-supported (and positively verified) mutation rate of 0.66 x 10^-9 per bp per year, I have estimated that each mutation in the so-called “gold region” (of about 10 Mb) covered by the FTDNA Big Y test should correspond to about 151 years:
    http://eng.molgen.org/viewtopic.php?...=1300&start=60
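
    The arithmetic behind the 151-year figure can be sketched as follows. This is only a minimal illustration, assuming the region is exactly 10 Mb and that SNPs accumulate at a constant rate; the function name `years_per_snp` is made up for this example and is not taken from the linked posts.

```python
def years_per_snp(rate_per_bp_per_year: float, covered_bp: float) -> float:
    """Expected number of years between SNPs in a region of covered_bp base pairs."""
    # Mutations per year in the whole region = rate per bp * number of bp,
    # so the waiting time per mutation is the reciprocal.
    return 1.0 / (rate_per_bp_per_year * covered_bp)


GOLD_REGION_BP = 10e6  # "about 10 Mb", as stated in the post (assumed exact here)

for rate in (0.66e-9, 0.70e-9):  # the two rates discussed in the post
    years = years_per_snp(rate, GOLD_REGION_BP)
    print(f"{rate:.2e} per bp per year -> {years:.1f} years/SNP")
```

    With the 0.66 rate this gives about 151.5 years per SNP, and with the 0.7 rate about 143. A covered region slightly larger than 10 Mb would lower both numbers.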

    At the same time, I suspected that the number of years per BigY-tested mutation should be slightly lower than 151, because the range of Big Y is actually slightly larger than 10 Mb. On the other hand, a higher number of years per mutation (about 180) should be used whenever the SNP count is based only on the reliable HQ variants reported by FTDNA, rather than on an SNP count produced by additional analysis of the vcf files or by analysis of the Big Y BAM files at YFull or FGC. Unfortunately, this has been wrongly interpreted as a suggestion that every BigY-tested SNP corresponds to about 180 years:
    http://eng.molgen.org/viewtopic.php?...=1494&start=03

    The Y-DNA region covered by the Full Genome Corp (FGC) test is much harder to define than in the case of Big Y, and it seems certain that some less reliable regions additionally included in the FGC test do not produce reliable SNPs at the same rate as the “standard” 8-10 Mb region used in most research studies. One way to estimate the number of years per FGC-tested SNP is therefore to calculate the percentage of the FGC-tested SNPs that are covered by Big Y (see the link shown above). I initially assumed (based on some R1a-based calculations) that the Big Y test covers approximately 58% of all SNPs tested by FGC, which would correspond to 88 years per FGC-tested SNP (assuming 151 years/SNP for Big Y), although now it seems that 60% would be more appropriate (which would then correspond to 91 years/SNP for FGC).

    Using the above SNP rate and the huge collection of Big Y results from the R1b-U106 project, I have calculated a series of provisional SNP-based estimates for different subclades of R1b-U106. These estimates strongly indicated that R1b-U106 is much older than suggested based on some previous STR-based estimates:
    http://www.anthrogenica.com/showthre...-and-subclades

    Most recently, Iain McDonald from the R1b-U106 FTDNA project has used the Big Y results for four relatively large and deeply rooted families (three R1b and one R1a) with known genealogies to estimate that each reliable BigY-tested SNP (i.e. an SNP mutation verified by the analysis of the vcf files) corresponds to about 140 years, with a 95.5% confidence interval of 104-197 years/SNP. Although the margin of error is still very large, it is easy to notice that this is fully consistent with the Y-DNA SNP mutation rates I have been using so far (be it 0.66 or 0.7 x 10^-9 per bp per year), while arguing against all much lower (<0.5) or much higher (>0.9) mutation rates that were sometimes suggested.
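
    To see why the 104-197 interval is compatible with the 0.66 and 0.7 rates, the years-per-SNP figures can be converted back into a per-bp rate. The sketch below assumes a covered region of exactly 10 Mb, which is only an approximation (the post notes that the real Big Y range is somewhat larger, which would lower the implied rates); `implied_rate` is a hypothetical helper name.

```python
def implied_rate(years_per_snp: float, covered_bp: float) -> float:
    """Mutation rate per bp per year implied by a years-per-SNP figure."""
    return 1.0 / (years_per_snp * covered_bp)


ASSUMED_BP = 10e6  # assumption: "about 10 Mb" taken as exact

# 140 is the central estimate; 104 and 197 are the ends of the quoted interval.
for label, years in (("central", 140), ("104-year end", 104), ("197-year end", 197)):
    rate = implied_rate(years, ASSUMED_BP)
    print(f"{label}: {years} years/SNP -> {rate * 1e9:.2f} x 10^-9 per bp per year")
```

    Under that assumption the central value corresponds to roughly 0.71 x 10^-9, and the ends of the interval to roughly 0.96 and 0.51 x 10^-9, so both 0.66 and 0.7 sit comfortably inside the range.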

    Assuming that the above 140 years per BigY-tested mutation is more or less correct (which should be considered very likely in view of all the data discussed above), this would also indicate that each FGC-tested mutation corresponds to about 84 years (or 81 years when assuming that Big Y covers only 58%, and not 60%, of the FGC-tested SNPs).
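
    The FGC figures quoted in this post (88, 91, 84 and 81 years) all follow from one scaling step: if Big Y sees only a fraction of the SNPs that FGC sees over the same time span, each FGC SNP represents that fraction of the Big Y interval. A minimal sketch, with `years_per_fgc_snp` as a hypothetical helper name:

```python
def years_per_fgc_snp(years_per_bigy_snp: float, bigy_share_of_fgc: float) -> float:
    """Scale the Big Y interval by the share of FGC SNPs that Big Y also covers."""
    # Same elapsed time, more SNPs counted by FGC, so fewer years per FGC SNP.
    return years_per_bigy_snp * bigy_share_of_fgc


for bigy_years in (151, 140):      # the two Big Y figures used in the post
    for share in (0.58, 0.60):     # the two coverage shares used in the post
        fgc_years = years_per_fgc_snp(bigy_years, share)
        print(f"Big Y {bigy_years} years/SNP, share {share:.0%} -> FGC {fgc_years:.1f} years/SNP")
```

    The result is only as good as the assumed share, which was derived from a limited number of R1a samples.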

    To summarize, until the above estimates are further refined by investigating more families with known genealogies, or by some new radiocarbon-dated archaeological remains, I would recommend using the above number of 84 years (or the 81-91 range) for each reliable FGC-tested SNP, and 140-150 years for each reliable BigY-tested SNP. And since I know that people frequently use such estimates to calculate the age of a single lineage, I would like to remind all of you that only by testing multiple independent lineages descending from a common ancestor (and calculating the average number of SNPs) can one get a fairly reliable TMRCA estimate. Also, when calculating the age of a specific clade, it is always good to compare it with the age of some sister clades, as it is always possible that a substantially decreased or increased number of mutations at the root of a given clade (due to random fluctuations) may significantly affect such a TMRCA calculation.
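
    The recommendation to average over several lineages can be illustrated with a small sketch. The SNP counts below are invented purely for illustration and do not come from any real kit; `tmrca_years` is a hypothetical helper name.

```python
from statistics import mean


def tmrca_years(snp_counts: list[int], years_per_snp: float) -> float:
    """TMRCA estimate from the average SNP count of independent lineages."""
    if not snp_counts:
        raise ValueError("need at least one lineage")
    return mean(snp_counts) * years_per_snp


# Hypothetical counts of reliable BigY-tested SNPs below the common ancestor,
# one value per independent lineage (made-up numbers).
counts = [28, 35, 31, 24]

for rate in (140, 150):  # the recommended Big Y range from the post
    print(f"{rate} years/SNP: averaged TMRCA = {tmrca_years(counts, rate):.0f} years")

# A single lineage on its own can land far from the averaged value.
singles = [tmrca_years([c], 140) for c in counts]
print(f"single-lineage estimates at 140 years/SNP: {min(singles):.0f} to {max(singles):.0f} years")
```

    Even in this toy example the single-lineage estimates differ by about 1,500 years, which is why a TMRCA based on one kit should be treated with caution.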

  2. The Following 11 Users Say Thank You to Michał For This Useful Post:

     alan (05-24-2014),  Brent.B (08-19-2014),  Heber (05-25-2014),  jdean (05-25-2014),  lgmayka (05-24-2014),  mcg11 (05-25-2014),  MitchellSince1893 (05-25-2014),  MJost (05-25-2014),  RGM (06-20-2016),  seferhabahir (05-26-2014),  VinceT (05-26-2014)

  3. #102
    Registered Users
    Posts
    7,873
    Sex
    Omitted
    Y-DNA (P)
    L21
    mtDNA (M)
    H

    Thanks that is a very useful post for numerically challenged people like myself.


  4. #103
    Registered Users
    Posts
    2,138
    Sex
    Location
    UK
    Nationality
    Welsh
    Y-DNA (P)
    R-DF49
    mtDNA (M)
    J1c2e

    European Union
    Quote Originally Posted by alan View Post
    Thanks that is a very useful post for numerically challenged people like myself.
    I second that !!

    Using the 180 yr no. with my results (taking care to remove anything that could be contentious) correlated very well with TMRCA calculations I've done for my cluster using Ken Nordtvedt's 111 loci interclade spreadsheet.

    Also the age of P312 came out at 3600 BC and DF13 2500 BC, which I think falls in line with Alan's thoughts ?

    One query, though, is how many years per generation were used in this figure. I have trouble with the 25 yrs commonly quoted.

  5. #104
    Registered Users
    Posts
    1,813
    Sex
    Location
    Warsaw, Poland
    Y-DNA (P)
    R1a-L1280>FGC41205
    mtDNA (M)
    H2a2(b)
    Y-DNA (M)
    R1a-L1029>YP517
    mtDNA (P)
    H5a2

    Poland European Union
    Quote Originally Posted by jdean View Post
    Using the 180 yr no. with my results (taking care to remove anything that could be contentious) correlated very well with TMRCA calculations I've done for my cluster using Ken Nordtvedt's 111 loci interclade spreadsheet.
    Also the age of P312 came out at 3600 BC and DF13 2500 BC, which I think falls in line with Alan's thoughts ?
    As I mentioned in my post above, I would use the 180 years/SNP rate only for the BigY-tested private SNPs that were “manually” extracted from the list of novel variants reported by FTDNA. If you have used any additional analysis of the vcf and/or BAM files, I would rather use the lower number of years per SNP (i.e. 150 or 140). Also, it seems to me that when the known SNPs upstream of the private ones are considered, the number of years per SNP should also be lower than 180, mostly because we are usually more willing to accept a low-quality SNP that is shared by other members of our clade (which is not possible for singletons/private SNPs) and because we frequently assume that we are positive for such a shared SNP (even if we get a no-call for it).


    Quote Originally Posted by jdean View Post
    One query, though, is how many years per generation were used in this figure. I have trouble with the 25 yrs commonly quoted.
    Frankly speaking, there was no need to assume any specific generation time in my calculations. This is because the initial estimates by Poznik and Francalacci did not require assuming a specific number of years per generation. Also, radiocarbon dating provides only the number of years (and not the number of generations).

    Of course, it would be interesting to know which generation time is more appropriate for a given population or for a given time period, but since we will be unable to determine this for any prehistoric period, I wouldn’t pay too much attention to this question when producing the estimates for R1b-P312 or R1b-DF13.
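
    Since the years-per-SNP figures do not depend on a generation time, a generation length only enters if someone wants to restate them in generations. A minimal sketch of that conversion follows; the 25-year value is the one quoted in this thread, while 30 and 35 are purely illustrative assumptions, and `generations_per_snp` is a hypothetical helper name.

```python
def generations_per_snp(years_per_snp: float, years_per_generation: float) -> float:
    """Restate a years-per-SNP figure as generations per SNP."""
    return years_per_snp / years_per_generation


YEARS_PER_FGC_SNP = 84  # recommended FGC figure from the summary post

for gen_len in (25, 30, 35):  # 25 is quoted in the thread; 30 and 35 are assumptions
    gens = generations_per_snp(YEARS_PER_FGC_SNP, gen_len)
    print(f"{gen_len} years/generation -> {gens:.2f} generations per FGC SNP")
```

    The choice of generation length changes the number of generations per SNP, but not the age in years.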

  6. The Following User Says Thank You to Michał For This Useful Post:

     jdean (05-25-2014)

  7. #105
    Registered Users
    Posts
    2,138
    Sex
    Location
    UK
    Nationality
    Welsh
    Y-DNA (P)
    R-DF49
    mtDNA (M)
    J1c2e

    European Union
    Thanks Michal, the SNPs were extracted from my Big Y variance file.

    Really I ought to do this for a few more DF49 kits, but the process of removing the stragglers not picked up by my kill list is still quite time consuming and I'm feeling lazy today : )

  8. #106
    Registered Users
    Posts
    376
    Sex
    Location
    USA
    Ethnicity
    Northern Europe
    Nationality
    USA
    Y-DNA (P)
    R-FGC5301 or R-A197
    mtDNA (M)
    T1a1

    United States of America Scotland England North of England Norway England
    Thanks! I am learning a lot from your summary. One thing though: for myself (R-U152-L2*), having tested both at FGC BGI and FTDNA Big Y and having NGC [Edit: this should be FGC] analyze both of them, Big Y only covered 15/44 or 34% of valid Private SNPs. BIG Y did not report INDELs but they could be found in the VCF file, and BIG Y found 1/6 or 17% of INDELs. I verified this was due to coverage differences, as the NGC BGI mutations found were all outside of the FTDNA BED file. This coverage difference may depend on haplogroup or person, as the BIG Y coverage may be biased towards previously tested regions with known SNPs.
    Last edited by haleaton; 05-25-2014 at 11:45 PM.
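
    The two coverage ratios reported in this post can be reproduced and set against the 58-60% share assumed earlier in the thread. This is only a sketch for a single sample; `coverage_percent` is a hypothetical helper name.

```python
def coverage_percent(found_by_bigy: int, found_by_fgc: int) -> float:
    """Share of FGC-reported variants that Big Y also covered, in percent."""
    return 100.0 * found_by_bigy / found_by_fgc


private_snps = coverage_percent(15, 44)  # private SNPs, counts from the post
indels = coverage_percent(1, 6)          # INDELs, counts from the post

print(f"private SNPs: {private_snps:.1f}%")
print(f"INDELs:       {indels:.1f}%")
print(f"gap to the assumed 58% share: {58 - private_snps:.1f} percentage points")
```

    A single L2* sample cannot show whether the lower ratio is typical, but it does show how far an individual result can sit from the assumed share.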

  9. The Following 2 Users Say Thank You to haleaton For This Useful Post:

     Kwheaton (07-02-2014),  Michał (05-25-2014)

  10. #107
    Registered Users
    Posts
    1,813
    Sex
    Location
    Warsaw, Poland
    Y-DNA (P)
    R1a-L1280>FGC41205
    mtDNA (M)
    H2a2(b)
    Y-DNA (M)
    R1a-L1029>YP517
    mtDNA (P)
    H5a2

    Poland European Union
    Quote Originally Posted by haleaton View Post
    One thing though for myself (R-U152-L2*) having tested both at FGC BGI and FTDNA Big Y and having NGC analyze both of them
    What is NGC? (Did you mean FGC?)

    Quote Originally Posted by haleaton View Post
    Big Y only covered 15/44 or 34% of valid Private SNPs.
    What about the appropriate numbers for all your SNPs downstream of P312?

    Also, are you sure that all of those FGC-tested "private" SNPs are indeed downstream of all non-private SNPs detected by Big Y (and not at the same level as some of the BigY-tested SNPs from the level just upstream of the "private level")? I am not saying that this is indeed your case, but such a situation may happen when the number of BigY-tested people is much larger than the number of FGC-tested people from a given subclade.

    Quote Originally Posted by haleaton View Post
    BIG Y did not report INDELs but they could be found in the VCF file and BIG Y found 1/6 or 17% of INDELs. I verified this was due to coverage differences as the NGC BGI mutations found were all outside of the FTDNA BED file.
    On the other hand, many of the "high quality" (i.e. no asterisk or one asterisk) INDELS reported by FGC should be considered as not reliable (i.e. they have no phylogenetic value), at least this is suggested by the YFull analysis of the FGC BAM files I've seen, while many of the BigY-tested INDELs that are not included in the VCF file can be extracted from the Big Y BAM files (for example at YFull).

    The major disadvantage of Big Y is not the lower percentage of the chromosome Y covered by the test (which is of course compensated by the lower price) but the fact that FTDNA does not provide any appropriate tool to interpret the raw data.


    Quote Originally Posted by haleaton View Post
    This coverage difference may depend on haplogroup or person as the BIG Y coverage may be biased towards previously tested with known SNP regions.
    Do you know any data that would strongly indicate that the BigY/FGC ratio (for all reliable SNPs, including the private and non-private ones) is higher for most well studied haplogroups, like R1b and G2a, than for the less studied ones, like T or L?

    I can imagine that the well-studied haplogroups would show a much lower (than average) BigY/FGC ratio for "private" SNPs, but this should be compensated by a much higher (than average) BigY/FGC ratio for the non-private SNPs (and we should keep in mind that the ratio of private to non-private SNPs should be much lower in the well-studied haplogroups).

  11. The Following 5 Users Say Thank You to Michał For This Useful Post:

     AJL (05-25-2014),  haleaton (05-26-2014),  jdean (05-25-2014),  leonardo (05-25-2014),  lgmayka (05-26-2014)

  12. #108
    Gold Class Member
    Posts
    347
    Sex
    Location
    Ohio
    Ethnicity
    German
    Nationality
    Galactic Empire
    Y-DNA (P)
    R-Z324, Z5055, L188+
    mtDNA (M)
    H5

    Quote Originally Posted by Michał View Post
    What is NGC? (Did you mean FGC?)

    On the other hand, many of the "high quality" (i.e. no asterisk or one asterisk) INDELS reported by FGC should be considered as not reliable (i.e. they have no phylogenetic value), at least this is suggested by the YFull analysis of the FGC BAM files I've seen, while many of the BigY-tested INDELs that are not included in the VCF file can be extracted from the Big Y BAM files (for example at YFull).
    The lack of phylogenetic value does not imply that an INDEL or SNP is "not reliable." Are all equivalent SNPs considered to be "unreliable?" Reliability should be associated with whether a mutation is highly recurrent. A more meaningful statement would be that with the current limited test results one cannot establish whether a particular INDEL has phylogenetic value. Under R1b-U106 we have a new INDEL that is present in 3 Big-Y and 2 Full-Y samples yet is not present in 4 other Big-Y results. This novel INDEL currently has no equivalent SNPs and seems to define a new intermediate level haplogroup.

  13. The Following User Says Thank You to Cofgene For This Useful Post:

     haleaton (05-26-2014)

  14. #109
    Registered Users
    Posts
    376
    Sex
    Location
    USA
    Ethnicity
    Northern Europe
    Nationality
    USA
    Y-DNA (P)
    R-FGC5301 or R-A197
    mtDNA (M)
    T1a1

    United States of America Scotland England North of England Norway England
    Quote Originally Posted by Michał View Post
    What is NGC? (Did you mean FGC?)
    I corrected my typo, I meant FGC - Full Genomes Corporation.

    Quote Originally Posted by Michał View Post
    What about the appropriate numbers for all your SNPs downstream of P312?

    Also, are you sure that all of those FGC-tested "private" SNPs are indeed downstream of all non-private SNPs detected by Big Y (and not at the same level as some of the BigY-tested SNPs from the level just upstream of the "private level")? I am not saying that this is indeed your case, but such a situation may happen when the number of BigY-tested people is much larger than the number of FGC-tested people from a given subclade.
    Myself, I have not done a study between P312 and U152/L2, and so far others (FGC, YFull) have not reported anything new, but I should check the very few "no calls" in the FGC BGI data. There were 83 no calls for YFull's known SNPs. Big Y does have a problem of not covering (per the BED file) many important subclade-defining SNPs, such as L2 itself and Z49, but I have relied on multiple other tests and checked every relevant SNP that I could find in the public record. I had a previous post on just how bad Big Y was in my particular L2* case. In some cases, such as L2, data sufficient for FGC or YFull exists in the raw FTDNA BAM data, but gets excluded by the BED file and is not reported by FTDNA Big Y as positive.

    FGC analysis finds everything that FTDNA does, but the Big Y NGS data is a subset and the quality valuations can differ. FTDNA does not compare against public data sets from other labs such as 1K Genomes, but FGC and YFull do. Novel Variants called out by FTDNA Big Y that are shared by these samples were not considered to be private to me. Multiple comparisons between U152+ and L2+ data samples have been made, and I have tested negative for all known branches below L2+, including all those new ones on the FTDNA tree that come from GENO 2.0 results but have not been found in public data sets and for which no corroborating proof from GENO 2.0 has been provided.

    Quote Originally Posted by Michał View Post
    On the other hand, many of the "high quality" (i.e. no asterisk or one asterisk) INDELS reported by FGC should be considered as not reliable (i.e. they have no phylogenetic value), at least this is suggested by the YFull analysis of the FGC BAM files I've seen, while many of the BigY-tested INDELs that are not included in the VCF file can be extracted from the Big Y BAM files (for example at YFull).

    The major disadvantage of Big Y is not the lower percentage of the chromosome Y covered by the test (which is of course compensated by the lower price) but the fact that FTDNA does not provide any appropriate tool to interpret the raw data.
    I did not find that YFull did much reporting on the complex INDELs found in either my FGC BGI or FTDNA Big Y BAMs. The single INDEL (1 of 6) that was within Big Y's coverage was marked as "PASS" in the VCF file but was not reported as a Novel Variant.

    Since I am a heavily tested L2*, I cannot say anything about the phylogenetic value of INDELs. I am interested to learn about mutation rate models for INDELs, as well as the chemical mechanism, and whether cosmic rays play any role for folks who migrated to high altitude during warm periods in the Alps (just to veer a bit off-topic).

    The issue with Big Y is the time and expense wasted on individual orders for what is not covered, just to be able to compare with others. The advantage is that they have more people to compare with today. I still think FTDNA delivered everything they said they would with Big Y, and more. Raw data was the key.

    YFull is a great product, but currently they cannot handle two BAM files from the same person, so I was not able to compare.

    The advantage of FGC BGI is that they are much, much closer to full coverage. I don't see how Big Y can be that precise just by counting total private SNPs; it seems to add a millennium or two of error. I am not an expert in any of this and am learning from you all.
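
    To put rough numbers on the coverage point, here is a minimal sketch of SNP-count dating. It assumes the 0.7 x 10^-9 per bp per year rate quoted earlier in this thread and treats the SNP count as Poisson distributed, so the one-sigma error is about sqrt(count) SNPs. The function names and the coverage figures are illustrative assumptions only, not the actual coverage of any vendor's test.

```python
import math

# Central rate quoted earlier in this thread (range given there: 0.6 to 0.8e-9).
RATE_PER_BP_PER_YEAR = 0.7e-9


def years_per_snp(covered_bp, rate=RATE_PER_BP_PER_YEAR):
    """Average number of years between SNPs within the covered region."""
    return 1.0 / (rate * covered_bp)


def age_estimate(snp_count, covered_bp, rate=RATE_PER_BP_PER_YEAR):
    """Return (age, one_sigma) in years for a lineage with snp_count SNPs.

    Assumes a Poisson process, so the standard deviation of the count
    is sqrt(snp_count); this ignores uncertainty in the rate itself.
    """
    step = years_per_snp(covered_bp, rate)
    return snp_count * step, math.sqrt(snp_count) * step


if __name__ == "__main__":
    true_age = 4000  # hypothetical age of a clade, in years
    # Hypothetical coverage values chosen only to show the trend.
    for label, covered_bp in (("narrower test", 8_000_000),
                              ("wider test", 16_000_000)):
        expected_snps = true_age * RATE_PER_BP_PER_YEAR * covered_bp
        _, sigma = age_estimate(expected_snps, covered_bp)
        print(f"{label}: {years_per_snp(covered_bp):.0f} years/SNP, "
              f"about {expected_snps:.1f} SNPs expected, "
              f"one-sigma error about {sigma:.0f} years")
```

    Under these assumptions, doubling the covered region halves the years per SNP and shrinks the counting error by a factor of sqrt(2), which is the sense in which a test closer to full coverage dates a lineage more precisely.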

    Quote Originally Posted by Michał View Post

    Do you know any data that would strongly indicate that the BigY/FGC ratio (for all reliable SNPs, including the private and non-private ones) is higher for most well studied haplogroups, like R1b and G2a, than for the less studied ones, like T or L?

    I can imagine that the well studied haplogroups would show much lower (than average) BigY/FGC ratio for "private" SNPs but this should be compensated by the much higher (than average) BigY/FGC ratio for the non-private SNPs (and we should keep in mind that the ratio of private to non-private SNPs should be much lower in the well studied haplogroups).
    No, this is just speculation based on my L2* case, on my new private SNPs being found in regions that neither Big Y nor Walk Through the Y covers, and on the talk that Big Y could replace Deep Clade. GENO 2.0 replaces Deep Clade, but only for the FTDNA haplotree, by self-reference.

    Looking through YBrowse, it is interesting how FGC-named SNPs are filling in across the Y. Within R1b, I speculate that those who truly have their "*" at an early terminal SNP after multiple tests, even following Big Y, have it because of coverage and should consider Full Genomes Corporation testing.

  15. The Following 2 Users Say Thank You to haleaton For This Useful Post:

     Kwheaton (07-02-2014),  Michał (05-26-2014)

  16. #110
    Registered Users
    Posts
    1,813
    Sex
    Location
    Warsaw, Poland
    Y-DNA (P)
    R1a-L1280>FGC41205
    mtDNA (M)
    H2a2(b)
    Y-DNA (M)
    R1a-L1029>YP517
    mtDNA (P)
    H5a2

    Poland European Union
    Quote Originally Posted by Cofgene View Post
    The lack of phylogenetic value does not imply that an INDEL or SNP is "not reliable." Are all equivalent SNPs considered to be "unreliable?"
    Firstly, you seem to use the term "phylogenetic value" in a very narrow sense (that was never suggested in my post), by excluding the equivalent mutations (both SNPs and INDELs) from a group of phylogenetically useful mutations. I can only say that I consider all equivalent mutations (as long as they are relatively reliable/stable) to be very useful from the phylogenetic point of view.

    Secondly, even when using this very narrow definition of a phylogenetic value, I have not suggested that the mutations with no phylogenetic value are not reliable but rather that the unreliable mutations have no phylogenetic value, which in this particular case is quite a difference.



    Quote Originally Posted by Cofgene View Post
    Reliability should be associated with whether a mutation is highly recurrent. A more meaningful statement would be that with the current limited test results one cannot establish if a particular INDEL has phylogenetic value. Under R1b-U106 we have a new INDEL that is present in 3 Big-Y and 2 Full-Y samples yet is not present in 4 other Big-Y results. This novel INDEL currently has no equivalent SNPs and seems to define a new intermediate level haplogroup.
    Agreed. I have never stated that all INDELs should be considered not reliable from the phylogenetic point of view, only that surprisingly many “good quality” INDELs from the FGC reports seem to be considered unreliable by the YFull specialists (and I must admit that I trust their opinion in this respect, after being corrected myself on several occasions).
