
View Full Version : YFull's New Rate Constant for Y-Chromosome SNPs based on Full Sequence Data



seferhabahir
03-21-2015, 01:36 AM
I didn't see this link posted here yet. If it was, sorry for the duplication.

http://rjgg.molgen.org/index.php/RJGGRE/article/view/151/175

in the Russian Journal of Genetic Genealogy

Abstract

Two important advances: 1) the accumulation of BigY and FGC test data, and 2) the publication of Y-chromosome sequences for three ancient samples (Anzick-1, Ust-Ishim, and K14), have made it possible to estimate the average rate of base substitutions (SNPs). The authors of this study have developed a new method of selecting true mutations in modern and ancient samples, and have defined with high accuracy the rate constant of SNP mutations...

[Edit: I see the link was already posted (sorry) with discussion at http://www.anthrogenica.com/showthread.php?2573-New-DNA-Papers-General-Discussion-Thread/page87]

MJost
03-28-2015, 09:41 PM
I just checked out this poster who created a nice, no it's an "Outstanding" spreadsheet that:

"3. The worksheet will strip out the number portion into Column C and count which SNPs are within YFull's defined CombBED region (Adamov et al, 2015). It will also calculate the age of this block of SNPs using the coefficients from the Adamov paper."


http://www.anthrogenica.com/showthread.php?4067-Age-of-Z251-Based-on-YFull-SNP-Rate-Constants&p=76468&viewfull=1#post76468

I ran my list of 27 Sanger-sequence-confirmed Full Genomes DF13>FGC5494 SNPs, which resulted in 24 in the CombBED regions. It reports 160.5 years per SNP for my SNPs under DF13.

Interestingly, I just recounted YFull's newest Experimental YTree v3.7 (3/28/2015) and then recalibrated Mal'ta boy to BB I0806 calibrated dates. A span of 19794 years and 126 SNPs from MA1 (24 kya) to the end of P312 (4206 ybp) equates to 157.1 years per SNP. That is only about three years' difference from the figure for my 23 CombBED SNPs. MJost



Coefficients for Age Calculations

Rate Constant of SNP Mutations: 8.2E-10
95% CI Min: 7.00E-10
95% CI Max: 9.40E-10
Tested CombBED Length: 7.60E+06 bp (or enter exact value from YFull BAM Report)
Resulting SNP Mutation Rate: 160.46 years per SNP mutation (High of 187.97 years, Low of 139.98 years)

Phylogenetic Block Age Estimates:

Number of SNPs in CombBED Region: 23
Calculated Age of Phylogenetic Block: 3690.63 ybp = 1741 BC
95% CI High = 4323.31 ybp = 2373 BC
95% CI Low = 3219.48 ybp = 1269 BC
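
For anyone who wants to check the spreadsheet figures by hand, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet itself: the function names are made up, the rate constant is taken here to be in substitutions per bp per year, and the "present" year of 1950 is an assumption inferred from the ybp/BC pairs in the table.

```python
# Illustrative sketch of the spreadsheet arithmetic (names are hypothetical).
RATE = 8.2e-10                # rate constant, assumed substitutions per bp per year
RATE_CI = (7.0e-10, 9.4e-10)  # 95% CI bounds on the rate constant
COMBBED_LENGTH_BP = 7.6e6     # average tested CombBED length for Big Y
PRESENT_YEAR = 1950           # assumption: inferred from 3690.63 ybp = 1741 BC


def years_per_snp(rate, length_bp):
    """Expected number of years between SNPs in a region of length_bp."""
    return 1.0 / (rate * length_bp)


def block_age_ybp(n_snps, rate=RATE, length_bp=COMBBED_LENGTH_BP):
    """Age of a block of n_snps SNPs, in years before present."""
    return n_snps * years_per_snp(rate, length_bp)


def ybp_to_bc(ybp, present=PRESENT_YEAR):
    """Convert years before present to a rounded BC year (no year-zero fix)."""
    return round(ybp - present)


if __name__ == "__main__":
    n = 23
    central = block_age_ybp(n)
    oldest = block_age_ybp(n, rate=RATE_CI[0])    # slower rate gives an older age
    youngest = block_age_ybp(n, rate=RATE_CI[1])  # faster rate gives a younger age
    print(f"{years_per_snp(RATE, COMBBED_LENGTH_BP):.2f} years per SNP")
    for label, age in (("central", central), ("high", oldest), ("low", youngest)):
        print(f"{label}: {age:.2f} ybp = {ybp_to_bc(age)} BC")
```

Note that the lower bound of the rate constant gives the upper bound of the age, and vice versa.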

Dave-V
03-29-2015, 03:43 AM
I just checked out this poster...

Mark, thanks for the shout-out.

Bear in mind that the spreadsheet follows the YFull paper and only uses the Rate Constant of SNP Mutation and Tested CombBED length to calculate the years per SNP mutation, so the 160.46 will remain the same unless you change one of those coefficients.

The relevant section of the paper (pg. 76) includes this: "For a more effective selection of actual mutations, we recommend that any research area be within the boundaries of the combBED area. The size of the combBED area in individual BigY samples varies and the average is about 7.6 Mbp. The appropriate conversion factor, therefore, is 160 years per base substitution."

So what the spreadsheet is doing is restricting the SNP list to match their criteria (i.e. only ones in the CombBED region) and then applying their years-per-SNP factor to get an age estimate that presumably matches the method YFull uses for their SNP tree.

YFull has also just started reporting the actual coverage of the CombBED region for individual tests in their BAM reports (as confirmed on the YFull FB page, it's the "Length Coverage for Age" result under Statistics) and you can replace the 7.6M Tested CombBED Length in the spreadsheet with an individual test's value to adjust the 160.46 "average" to an individual. For instance when I do it for my own Big Y results I get 150 years per SNP.

I'm not saying it's better or worse than any other method at this point... just interesting to have a new tool.

Dave

MitchellSince1893
03-29-2015, 05:25 AM
...YFull has also just started reporting the actual coverage of the CombBED region for individual tests in their BAM reports (as confirmed on the YFull FB page, it's the "Length Coverage for Age" result under Statistics) and you can replace the 7.6M Tested CombBED Length in the spreadsheet with an individual test's value to adjust the 160.46 "average" to an individual. For instance when I do it for my own Big Y results I get 150 years per SNP...Dave

Thanks for the info Dave!

I had asked about this on FB a couple of weeks ago and they mentioned it was coming but wasn't aware it had been added with 3.7.

My specific "Length by coverage age" is 7269144. I replaced the 7600000 with 7269144 and my "Resulting Mutation rate" changed to 167.77 years per SNP mutation.

Very cool

Petr
03-29-2015, 02:59 PM
Just for reference, these are the values of YFull "Length by coverage age" for kits that I manage:
BigY: 7886681, 7558859, 7379769, 7871503, 7788009, 8133687
FGC Elite: 8440802

MJost
03-29-2015, 05:06 PM
Just for reference, these are the values of YFull "Length by coverage age" for kits that I manage:
BigY: 7886681, 7558859, 7379769, 7871503, 7788009, 8133687
FGC Elite: 8440802


Looking at the Tested CombBED Length averages

N=6 Average:7769751
N=5 w/o 8133687 average: 7696964

MJost
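
The per-kit adjustment discussed in the last few posts can be sketched as follows. This is a rough illustration only: the function name is made up, the rate constant is the 8.2E-10 value from the spreadsheet, and the coverage lengths are the ones posted in this thread.

```python
from statistics import mean

RATE = 8.2e-10  # rate constant from the spreadsheet, assumed per bp per year

# "Length coverage for age" values posted in this thread (bp).
BIGY_LENGTHS = [7886681, 7558859, 7379769, 7871503, 7788009, 8133687]
FGC_ELITE_LENGTHS = [8440802]
MITCHELL_BIGY = 7269144


def years_per_snp(length_bp, rate=RATE):
    """Years per SNP for a kit with the given CombBED coverage (hypothetical helper)."""
    return 1.0 / (rate * length_bp)


if __name__ == "__main__":
    for length in BIGY_LENGTHS + FGC_ELITE_LENGTHS + [MITCHELL_BIGY]:
        print(f"{length:>8} bp -> {years_per_snp(length):.2f} years per SNP")

    # Averages over the six Big Y kits, with and without the largest value.
    print(f"N=6 mean: {mean(BIGY_LENGTHS):.0f} bp")
    without_largest = sorted(BIGY_LENGTHS)[:-1]
    print(f"N=5 mean: {mean(without_largest):.0f} bp")
```

More coverage means more base pairs in which a SNP can be observed, so the years-per-SNP figure drops as the covered length goes up.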

Dave-V
03-29-2015, 06:34 PM
Just for reference, these are the values of YFull "Length by coverage age" for kits that I manage:
BigY: 7886681, 7558859, 7379769, 7871503, 7788009, 8133687
FGC Elite: 8440802

Makes sense. YFull says the average coverage of their combBED region for Big Y is 7.6M bp, but they don't give an average for FGC Elite. But you'd certainly expect more coverage from FGC Elite and the highest possible is the total length of the CombBED area - 8,473,821 bp. So this one FGC Elite example covers nearly all of that.

VinceT
03-29-2015, 11:07 PM
My stats from the FGC phase 2 pilot, according to YFull:



ChrY BAM file size: 3.04 Gb
Reads (all): 1123134
Mapped reads: 1101618 (98.08%)
Unmapped reads: 21516 (1.92%)
Length coverage: 25069674 bp (97.72%)
Min depth coverage: 1X
Max depth coverage: 7964X
Mean depth coverage: 109.85X
Median depth coverage: 66X
Length coverage for age: 8434148 bp
No call: 583892 bp

Petr
03-30-2015, 08:01 AM
Interesting, my FGC Elite (Batch 12) results are significantly worse:



ChrY BAM file size: 2.26 Gb
Reads (all): 22008729
Mapped reads: 21130290 (96.01%)
Unmapped reads: 878439 (3.99%)
Length coverage: 22959696 bp (89.50%)
Min depth coverage: 1X
Max depth coverage: 7996X
Mean depth coverage: 76.85X
Median depth coverage: 37X
Length coverage for age: 8440802 bp
No call: 2693870 bp



And my latest BigY results (finished on January 28th):



ChrY BAM file size: 0.57 Gb
Reads (all): 9323452
Mapped reads: 9323452 (100.00%)
Unmapped reads: 0
Length coverage: 14722302 bp (57.39%)
Min depth coverage: 1X
Max depth coverage: 7999X
Mean depth coverage: 73.97X
Median depth coverage: 54X
Length coverage for age: 8133687 bp
No call: 10931264 bp

MitchellSince1893
03-30-2015, 01:01 PM
Here are my BigY Yfull stats



ChrY BAM file size: 0.51 Gb
Reads (all): 9962312
Mapped reads: 9962312 (100.00%)
Unmapped reads: 0
Length coverage: 13179112 bp (51.37%)
Min depth coverage: 1X
Max depth coverage: 7999X
Mean depth coverage: 66.20X
Median depth coverage: 41X
Length coverage for age: 7269144 bp
No call: 12474454 bp

jbarry6899
03-30-2015, 02:13 PM
Here are mine:


ChrY BAM file size: 0.70 Gb
Reads (all): 12738845
Mapped reads: 12738845 (100.00%)
Unmapped reads: 0
Length coverage: 14399239 bp (56.13%)
Min depth coverage: 1X
Max depth coverage: 7999X
Mean depth coverage: 93.14X
Median depth coverage: 72X
Length coverage for age: 7930475 bp
No call: 11254327 bp

I ran the formula with myself and my two closest matches in the S8183>Y11178 subclade who are separated by a single SNP and got an average of 150 years per SNP. That's roughly equivalent to the difference as calculated using STRs.

R.Rocca
03-30-2015, 03:27 PM
Does anyone know how to get the length of coverage solely by looking at the Full Genomes Elite data...in other words, for non-YFull customers?

Dave-V
03-30-2015, 03:51 PM
Does anyone know how to get the length of coverage solely by looking at the Full Genomes Elite data...in other words, for non-YFull customers?

Outside of the YFull BAM reporting it won't be easy to get the exact number for the CombBED coverage. In theory you could go through any BAM file counting the exact coverage of those 857 regions; it might be possible (although painful) to do it manually with IGV or certainly someone who knows the structure of a BAM file better than I do could automate it.

But that seems to be only a concern for Big Y tests. So far at least the variability of FGC Elite coverage of these regions seems very small and they're all reporting 8.43-8.44M bp. Granted it's a small number of data points so far, but it appears for FGC Elite data you should be within <1% margin of error by using about 8,435,000 as the "Tested CombBED Length" figure in the spreadsheet.
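
As a rough idea of what "counting the exact coverage of those 857 regions" would involve once the covered intervals have been pulled out of a BAM file, here is a minimal sketch in plain Python. It assumes both the CombBED regions and the covered stretches are available as half-open (start, end) pairs on the same chromosome; the function names and the toy numbers are invented for illustration, and extracting the covered intervals from the BAM is not shown.

```python
def merge_intervals(intervals):
    """Sort half-open (start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_length(regions, covered):
    """Total number of bases in `regions` that also fall inside `covered`."""
    a, b = merge_intervals(regions), merge_intervals(covered)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


if __name__ == "__main__":
    # Toy data only, not real CombBED coordinates.
    toy_regions = [(100, 200), (300, 400)]
    toy_covered = [(150, 350), (390, 500)]
    print(covered_length(toy_regions, toy_covered))
```

The same intersection step is what the paper describes for building the combBED area itself, by overlapping two BED files.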

MitchellSince1893
03-30-2015, 04:35 PM
Does anyone know how to get the length of coverage solely by looking at the Full Genomes Elite data...in other words, for non-YFull customers?

Rich, we'd love to have you in the YFull R-U152 group. The addition of your Full Genomes test would also flesh out your section of the YTree.

razyn
03-30-2015, 06:19 PM
Rich, we'd love to have you in the Yfull R-U152 group.

Speaking of U152, has anybody gotten around to informing the YFull guys about Alex W's ZZ SNPs? They place DF27 and U152 together on a branch that diverges from the rest of P312. (Or, one of them does -- ZZ11_1.) I suspect that's going to prove geographically significant, but I haven't had time to mull it over, much -- mainly because new SNPs keep getting found, in the part of the tree I have to work on.

This observation is only pertinent to the discussion (of YFull's mutation rate constant) if it turns out that their rate is valid; then ZZ11_1 will need to be factored in, and dated, like the other SNPs.

And doing that might show us that the brothers DF27 and U152 share their paternal origin about 2,000 miles farther east than most people have been suggesting, and mapping, for the past few years. Or, of course, it might not. Anyway, a little more discussion, for those who missed it and might care, is here: http://www.anthrogenica.com/showthread.php?2375-Updated-U152-Tree&p=66255&viewfull=1#post66255

Webb
03-30-2015, 06:39 PM
Speaking of U152, has anybody gotten around to informing the YFull guys about Alex W's ZZ SNPs? They place DF27 and U152 together on a branch that diverges from the rest of P312. (Or, one of them does -- ZZ11_1.) I suspect that's going to prove geographically significant, but I haven't had time to mull it over, much -- mainly because new SNPs keep getting found, in the part of the tree I have to work on.

This observation is only pertinent to the discussion (of YFull's mutation rate constant) if it turns out that their rate is valid; then ZZ11_1 will need to be factored in, and dated, like the other SNPs.

And doing that might show us that the brothers DF27 and U152 share their paternal origin about 2,000 miles farther east than most people have been suggesting, and mapping, for the past few years. Or, of course, it might not. Anyway, a little more discussion, for those who missed it and might care, is here: http://www.anthrogenica.com/showthread.php?2375-Updated-U152-Tree&p=66255&viewfull=1#post66255

I did not want ZZ11 to fall out of the conversation and it got caught up in the link you posted above but I quickly lost it. So that is why I started a new thread about it in the general P312 area.

MitchellSince1893
03-30-2015, 07:26 PM
...then ZZ11_1 will need to be factored in, and dated, like the other SNPs...

FYI, because ZZ11_1 is located outside the combBED area, it won't be counted by YFull for dating purposes.


Our SNP mutation rate calibration was carried out in what we call the “combBED” area (combined BED), which contains start and end coordinates in the hg19 system of the Y-chromosome segments, in which we expect our samples to have SNP...Table 1 of the Supplement to this article shows the location of 857 “good” regions of the Y-chromosome (total length of 8,473,821 bp). SNP mutation rate calibration was carried out for these areas, which will be further referred to as "combBED area”...For a more effective selection of actual mutations, we recommend that any research area be within the boundaries of the combBED area. The size of the combBED area in individual BigY samples varies and the average is about 7.6 Mbp. The appropriate conversion factor, therefore, is 160 years per base substitution.

Selection criteria for using a SNP

We developed a selection method which effectively excluded from consideration false options with derived alleles (“false positives”). We used the following filtration criteria:
1. “Reg” criterion. There are derived variants (i.e. alleles different from the reference sequence) revealed in the BAM files. The nucleotide sequences under investigation had a total length between 13-15 Mbp for BigY, and about 23 Mbp for FGC. Single base read coverage varied from 1X to 8000X. The average coverage of commercial samples is about 60X. From this set of variants, we selected only those coordinates that fell into the combBED regions. As it was mentioned above, the combBED area was designed by the authors to select X-degenerate segments. The combBED area borders were formed by mutual overlapping BED file taken from the work of Poznik et al. (2013) (total length of 10.45 Mbp) and by the generalized BigY BED file (11.38 Mbp long), published in the BigY White Paper (2014). The result was 857 continuous segments of the Y-chromosome with a total length of 8,473,821 bp. The coordinates of the beginning and the end of these regions are contained in Table 1 of Supplement.
2. “Indel” criterion. We excluded insertions and deletions (indels), as well as multiple nucleotide polymorphism (more than one base position in derived alleles, MNP) variants.
3. “Locs” criterion. We excluded variants which were detected in more than five different localizations. (Note: “localization” is defined as a group of samples from the YFull database [2,900 samples at February, 2015] belonging to the same subclade and having derived allele nomination that have been studied). In some cases, the same derived variants were revealed in samples from different subclades or haplogroups. One of the reasons consists of the fact that standard reference sequence is based on haplogroup R1b data and also to a lesser extent on haplogroup G data. Thus, some variants in some haplogroups are ancestral allele, not derived. Another reason is mapping errors. We found limit of five localization empirically. This criterion is soft but effective.
4. “Reads” criterion. We excluded from consideration any one or two read variants.
5. “Qual” criterion. We excluded variants with a read quality less than 90%. Quality is defined as weighted average of the quality index where correct values are taken with the positive and error values, with the negative.
6. “Post mortem” criterion. It’s applied only to the ancient samples. Postmortem damages of DNA, lead to the replacement of these basepairs: C→T and G→A (Briggs et al., 2007) were excluded.
7. “Single SNP” criterion. We excluded variants with Double Nucleotide Polymorphisms (DNP). Our program interpreted DNP as a base-substitution in two adjacent positions and therefore were not excluded by our Indel criterion. This secondary criterion allowed us to reject both options.
8. “Trash” criterion. We excluded suspicious variants which have alignment error or reading error. In general, these are variants in palindromic segments and segments with repetitive copies at other Y-chromosome segments.
http://rjgg.molgen.org/index.php/RJGGRE/article/view/151/175
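
To make the more mechanical criteria above concrete, here is a loose sketch of how a filter along those lines might look. It is not YFull's code: the `Variant` fields, the function names and the coordinate convention (half-open regions, positions in the same coordinate system) are all assumptions for illustration. The "Single SNP" and "Trash" criteria are left out because the excerpt does not give enough detail to model them.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one candidate variant."""
    pos: int             # position on the Y chromosome
    ref: str             # reference allele
    alt: str             # derived allele
    reads: int           # number of reads showing the derived allele
    quality: float       # read quality in percent
    localizations: int   # number of localizations where the variant was seen


def in_regions(pos, regions):
    """True if pos lies inside any half-open (start, end) region."""
    return any(start <= pos < end for start, end in regions)


def passes_filters(v, regions, ancient=False):
    if not in_regions(v.pos, regions):        # "Reg": must be inside combBED
        return False
    if len(v.ref) != 1 or len(v.alt) != 1:    # "Indel": drop indels and MNPs
        return False
    if v.localizations > 5:                   # "Locs": more than five localizations
        return False
    if v.reads <= 2:                          # "Reads": one- or two-read variants
        return False
    if v.quality < 90:                        # "Qual": read quality below 90%
        return False
    if ancient and (v.ref, v.alt) in {("C", "T"), ("G", "A")}:
        return False                          # "Post mortem": ancient samples only
    return True


if __name__ == "__main__":
    toy_regions = [(1000, 2000)]  # toy region, not a real combBED segment
    candidates = [
        Variant(1500, "A", "G", 12, 99.0, 1),
        Variant(1500, "C", "T", 12, 99.0, 1),
        Variant(2500, "A", "G", 12, 99.0, 1),
    ]
    kept = [v for v in candidates if passes_filters(v, toy_regions, ancient=True)]
    print(len(kept))
```

The order of the checks does not matter for the result, since a variant has to pass all of them.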

Muircheartaigh
03-30-2015, 07:58 PM
My FGC Yfull Statistics

ChrY BAM file size: 1.55 Gb
Reads (all): 13445632
Mapped reads: 13241685 (98.48%)
Unmapped reads: 203947 (1.52%)
Length coverage: 22932844 bp (89.39%)
Min depth coverage: 1X
Max depth coverage: 7998X
Mean depth coverage: 57.64X
Median depth coverage: 30X
Length coverage for age: 8434406 bp
No call: 2720722 bp

razyn
03-30-2015, 08:09 PM
[Replying to post #17.] So, they will ignore part of the dating evidence. It's nice to be rigorous, but that doesn't mean that messy evidence is lack of evidence. I did btw read their paper, and applaud the effort. If SNP counting is a valid approach (which I doubt, because I don't believe the evidence we can see suggests they occur sequentially and at a random but constant rate), only counting some while ignoring others is more like ironing out wrinkles than establishing the inherent grounds for the hypothetical mutation-rate constant. It may, however, sort of work -- as Kepler's Music of the Spheres sort of predicted the orbit of Uranus; and that's better than not trying.

lgmayka
03-30-2015, 08:21 PM
Speaking of U152, has anybody gotten around to informing the YFull guys about Alex W's ZZ SNPs?
The description (http://www.anthrogenica.com/showthread.php?4037-ZZ11&p=76877&viewfull=1#post76877) makes it sound utterly atrocious. Frankly, I can't imagine YFull including such a thing on their official tree.

Here is what Thomas Krahn wrote to a public mailing list last year (http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2014-04/1398354257) about that region:
---
The region around 22.2 .. 22.6 Mb (hg19) is simply a problematic section of the Y chromosome.
I've made a dot-plot of a (randomly chosen ChrY:22505200..22505700) 500 bp section taken from the reference sequence with itself.
http://i.imgur.com/Q2ccVh5.png
Do you see the diagonal lines at 125 bases distance? That's what I keep referring to as the "125bp repeat region".

You can imagine this like a huge STR with hundreds of repeats at several thousand bases length. There are about 125 bases in each repeat unit that are almost exact matches to itself. This is for sure a recombination hotspot and some of those repeats aren't even sequenced (they are marked with NNNN...). Whenever you see a hg19 ChrY position starting with 222 you should be aware of this 125bp repeat region. Essentially you can scratch this from further phylogenetic consideration (unless you know everything about recombination events, which I don't).
---

razyn
03-30-2015, 11:33 PM
The description (http://www.anthrogenica.com/showthread.php?4037-ZZ11&p=76877&viewfull=1#post76877) makes it sound utterly atrocious. Frankly, I can't imagine YFull including such a thing on their official tree.
[snip]
Essentially you can scratch this from further phylogenetic consideration (unless you know everything about recombination events, which I don't).

I don't think inclusion on a tree prepared by YFull or ISOGG demands such consideration, nor that exclusion precludes it. Iff all of the U152+ people and all the DF27+ people have this genetic indicator -- and nobody else does -- it's probably defining that part of the phylogeny. I don't know that Alex is right; but I think he gave a rational argument for it, last October, that is neither trumped nor refuted by arguments Thomas had presented earlier last year, against testing in that sector of the Y chromosome. Alex's more current statement about that is:


Some of these mutations have ZZ series names. These mutations are generally marked as REJECTED in the FTDNA .vcf files, and have multiple variants. They are typically from regions of the Y for which similar regions exist, such as the palindromic region. The most important property of these mutations I'm introducing is that they are phylogenetically consistent. They don't occur in any other men, and they demonstrate a sensible relationship to the other known mutations.

He will be glad to argue with you about what he means. I'm not; but nor do I see much need to cite a description authored by our resident commentator MitchellSince1893 as proof of Alex's error. His light is not under a bushel; he's right here: http://www.littlescottishcluster.com/RL21/NGS/Tree.html

lgmayka
03-30-2015, 11:51 PM
Iff all of the U152+ people and all the DF27+ people have this genetic indicator -- and nobody else does -- it's probably defining that part of the phylogeny.
But they don't. I have posted my cousin's results in the ZZ11 thread.

David Wilson
03-31-2015, 02:33 AM
In case this is of interest:

ChrY BAM file size: 0.88 Gb
Reads (all): 12185490
Mapped reads: 11944923 (98.03%)
Unmapped reads: 240567 (1.97%)
Length coverage: 22891119 bp (89.23%)
Min depth coverage: 1X
Max depth coverage: 8014X
Mean depth coverage: 49.98X
Median depth coverage: 19X
Length coverage for age: 8329499 bp
No call: 2762447 bp

So at 8.33M I am a little under the average for FGC customers who submitted results to YFull, but not ruinously so.

vettor
03-31-2015, 06:10 AM
Will YFull create another branch in their tree if someone is tested and not all of the SNPs match a current line? Or will these negative SNPs be ignored?
To explain: the T line I belong to in YFull appears to be this

T-CTS8489 CTS8862 * CTS9984 * CTS10538... 1 SNPs

My SNPs confirmed so far from the above block:
CTS8489 NEGATIVE
CTS8862 POSITIVE
CTS9984 POSITIVE
CTS10538 NEGATIVE


Maybe I need to find the cheapest test somewhere

Dave-V
03-31-2015, 03:13 PM
will Yfull create another branch in their tree if one is tested and all SNP's do not match a current line. ? or will these negative SNP's be ignored...


If it works the way they've added kits elsewhere on their tree, YFull should create a new T-CTS8862 (or perhaps T-CTS9984 for some reason) under T-CTS54 with your kit there (probably shown as T-CTS8862*) and then T-CTS8489 would branch off below T-CTS8862 with the two kits they have there already.

I don't think they would put Formed/TMRCA estimates for any of those until more kits are added although since yours breaks up a block on their current tree it might give them enough for estimating T-CTS54.

vettor
03-31-2015, 05:48 PM
If it works the way they've added kits elsewhere on their tree, YFull should create a new T-CTS8862 (or perhaps T-CTS9984 for some reason) under T-CTS54 with your kit there (probably shown as T-CTS8862*) and then T-CTS8489 would branch off below T-CTS8862 with the two kits they have there already.

I don't think they would put Formed/TMRCA estimates for any of those until more kits are added although since yours breaks up a block on their current tree it might give them enough for estimating T-CTS54.

Thanks, seems like as you say, this is the way it works

Of the CTS54 line, I have 16 untested SNPs, and these 2 below tested as positive:
CTS11984
CTS1774

So, I will end up with a new branch on my own

MJost
03-31-2015, 08:55 PM
FYI, I went ahead and submitted my full Y BAM file to YFull. It shows I have a ChrY BAM file size: 1.62 Gb and has an Expected date: 04/21/2015 with STRs about two months later.

MJost

VinceT
04-01-2015, 06:43 AM
The description (http://www.anthrogenica.com/showthread.php?4037-ZZ11&p=76877&viewfull=1#post76877) makes it sound utterly atrocious. Frankly, I can't imagine YFull including such a thing on their official tree.

Here is what Thomas Krahn wrote to a public mailing list last year (http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2014-04/1398354257) about that region:
---
The region around 22.2 .. 22.6 Mb (hg19) is simply a problematic section of the Y chromosome.
I've made a dot-plot of a (randomly chosen ChrY:22505200..22505700) 500 bp section taken from the reference sequence with itself.
http://i.imgur.com/Q2ccVh5.png
Do you see the diagonal lines at 125 bases distance? That's what I keep referring to as the "125bp repeat region".

You can imagine this like a huge STR with hundreds of repeats at several thousand bases length. There are about 125 bases in each repeat unit that are almost exact matches to itself. This is for sure a recombination hotspot and some of those repeats aren't even sequenced (they are marked with NNNN...). Whenever you see a hg19 ChrY position starting with 222 you should be aware of this 125bp repeat region. Essentially you can scratch this from further phylogenetic consideration (unless you know everything about recombination events, which I don't).
---

I had colorized a section of this region (a.k.a. DYZ19) a few months ago while investigating some reported Big-Y anomalies in my clade (R-FGC396). Attempting to assert phylogenetic reliability to any of the bases within this absolutely massive 125 bp repeated motif is simply crazy, IMO.

The image below is 250 bases wide, so 2 repeats per line.

[Attached image: colorized section of the DYZ19 125 bp repeat region, 250 bases per line]

VinceT
04-01-2015, 06:54 AM
Does anyone know how to get the length of coverage solely by looking at the Full Genomes Elite data...in other words, for non-YFull customers?

If I recall correctly, the "CallableLoci" function in the Genome Analysis Toolkit (GATK) available from the Broad Institute does the job, in a roundabout way: https://www.broadinstitute.org/gatk/

Documentation at https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_coverage_CallableLoci.php

[edit]

I think it took around 40 to 50 minutes to process the BAM. Also, the summary file reports on the entire BAM, so you'd have to split the Y out into its own BAM beforehand, or summarize only the Y regions from the resulting bed file.

Actually, I had pulled out the Y-only reads for myself some time ago, and am now walking through that with CallableLoci.

[some 22 minutes later...]

Which gives me:


state nBases
REF_N 237019519
CALLABLE 13813155
NO_COVERAGE 2842268087
LOW_COVERAGE 796452
EXCESSIVE_COVERAGE 0
POOR_MAPPING_QUALITY 7907526


(This doesn't quite agree with YFull, but not all tools, and not all analyses are equal.)

Note that if you try running this on a BigY file, you need to add the flag:
-rf BadCigar
because the guys who wrote the Arpeggi aligner used by FTDNA botched the encoding of CIGAR strings relative to the official spec.
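
For the "summarize only the Y regions from the resulting bed file" route, something like the following could do the tallying. It is a sketch under assumptions: it expects four whitespace-separated columns per line (contig, start, end, state) with half-open BED coordinates, and the contig name `chrY` may need to be `Y` depending on the reference used. The sample lines are made up, not real output.

```python
def callable_bases(lines, contig="chrY", state="CALLABLE"):
    """Sum the lengths of intervals on `contig` that carry the given state."""
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, label = fields[:4]
        if chrom == contig and label == state:
            total += int(end) - int(start)
    return total


if __name__ == "__main__":
    # Made-up example lines in the assumed four-column layout.
    sample = [
        "chrY 2649520 2650020 CALLABLE",
        "chrY 2650020 2650100 LOW_COVERAGE",
        "chrY 2650100 2651100 CALLABLE",
        "chr1 10000 20000 CALLABLE",
    ]
    print(callable_bases(sample))
    # For a real file, pass the open file handle instead of a list:
    # with open("callable_status.bed") as handle:  # hypothetical file name
    #     print(callable_bases(handle))
```

Restricting the same tally to the CombBED regions would still need an interval intersection on top of this.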

MJost
04-09-2015, 09:09 PM
FYI, I went ahead and submitted my full Y BAM file to YFull. It shows I have a ChrY BAM file size: 1.62 Gb and has an Expected date: 04/21/2015 with STRs about two months later.

MJost

To date, my data reports:

ChrY BAM file size: 1.62 Gb
Reads (all): 13514863
Mapped reads: 13323583 (98.58%)
Unmapped reads: 191280 (1.42%)

Mapping refers to the process of aligning short reads to a reference sequence. The goal of mapping is to create an alignment file, also known as a Sequence Alignment/Map (SAM) file, for each sample. This SAM file contains one line for each read in your sample, giving the reference sequence (genes, contigs, or gene regions) to which it maps, the position in the reference sequence, and a Phred-scaled quality score of the mapping, among other details. The aim is to obtain the highest number of high-quality mapped reads.

Phred quality scores are logarithmically linked to error probabilities.

Phred Quality Score / Probability of incorrect base call / Base call accuracy

10 / 1 in 10 / 90%
20 / 1 in 100 / 99%
30 / 1 in 1000 / 99.9%
40 / 1 in 10,000 / 99.99%
50 / 1 in 100,000 / 99.999%
60 / 1 in 1,000,000 / 99.9999%

For example, if Phred assigns a quality score of 30 to a base, the chances that this base is called incorrectly are 1 in 1000. The most commonly used method is to count the bases with a quality score of 20 and above. The high accuracy of Phred quality scores makes them an ideal tool to assess the quality of sequences.

MJost
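
The logarithmic link in the table above is the standard Phred relation Q = -10 * log10(P), where P is the probability of an incorrect call. A small sketch, with helper names chosen just for this example:

```python
import math


def phred_to_error_prob(q):
    """Probability of an incorrect call for Phred score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40, 50, 60):
        p = phred_to_error_prob(q)
        print(f"Q{q}: 1 in {round(1 / p):,} ({(1 - p) * 100:.4f}% accuracy)")
```

Each step of 10 in the score corresponds to a tenfold drop in the error probability.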

MJost
04-10-2015, 05:30 PM
Getting some result by Expected date: 04/19/2015, a couple of days earlier.

MJost

razyn
04-15-2015, 01:35 PM
Iff all of the U152+ people and all the DF27+ people have this genetic indicator -- and nobody else does -- it's probably defining that part of the phylogeny.


But they don't. I have posted my cousin's results in the ZZ11 thread.

Well, that's why I spelled "iff" with two fs (meaning "if and only if"). But I ran this by Alex, and his comment Monday was:


I think the mutation remains valid. It's in a bad spot, so it's very reasonable that it won't last forever in all lines. I assume there must be some branch downstream of Z49 which should all be negative.

That's an allusion to lgmayka's cousin, mentioned on the ZZ11 thread, who is Z49*: http://www.anthrogenica.com/showthread.php?4037-ZZ11&p=76927&viewfull=1#post76927

We know of one example (a Plant family) in which the ZZ12_1 mutation (that essentially defines the Z195- subclades of DF27) has disappeared, presumably via back-mutation at the same locus, since other families with the same SNP ancestry still have it. On the other hand there is a cluster of Ashkenazi families in which that same mutation (on a palindrome arm) has copied itself, so they are just plain ZZ12 -- can be read on either arm. Also, my subgroup Bbbb in the DF27 project -- first identified by one L. G. Mayka, who announced it four years ago on DNAForums after he noticed its quirky STR patterns in one family of his Polish project -- has been traveling on a suspect L484+ passport for most of those four years. Sure enough, recent BigY testing has spotted a new member of the group who has the STR pattern, but tests L484-. This appears to be another back-mutation, of a SNP that's known to be "recurrent." Luckily my FGC Elite test found several more stable SNPs to substitute for L484, and four of us have confirmed them with matching BigY results.

In time the Rockies may crumble, Gibraltar may tumble -- they're only made of clay. But. ZZ11_1 really only needed to be stable prior to the birth of U152 and DF27; and I think it most likely was.

MJost
04-16-2015, 08:57 PM
Getting some result by Expected date: 04/19/2015, a couple of days earlier.

MJost
My YFULL FGC BAM file data results that just showed up today.

ChrY BAM file size: 1.62 Gb
Reads (all): 13514863
Mapped reads: 13323583 (98.58%)
Unmapped reads: 191280 (1.42%)
Length coverage: 22938415 bp (89.42%)
Min depth coverage: 1X
Max depth coverage: 7998X
Mean depth coverage: 52.78X
Median depth coverage: 27X
Length coverage for age: 8402506 bp
No call: 2715151 bp

MJost