Methodology of determining new Full Genomes YSNPs - P312/L21



RobertCasey
11-29-2013, 04:49 AM
Below is the very preliminary analysis of three Full Genomes tests (excluding novel FGC YSNPs):

Analysis | P312/DF99 | Z253/L226 | Z253/PF825
equiv - DF13 | Z2542- | Z2542+ | Z2542+
equiv - L21 | L459- | L459+ | L459+
equiv - L21 | Z245- | Z245+ | Z245+
equiv - L21 | Z260- | Z260+ | Z260+
equiv - L21 | Z290- | Z290+ | Z290+
same - Z2542 | CTS8221- | CTS8221+ | CTS8221+
new - L21 | M2693- | M2693+ | M2693+
new - L21 | PF5501- | PF5501+ | PF5501+
same - Z252 | S471- | S471+ | S471+
new - L21 | Z252- | Z252+ | Z252+
new - P312 | Z6427+ | Z6427- | Z6427-
new - P312 | CTS11707+ | CTS11707- | CTS11707-
new - Z253 | CTS3524- | CTS3524+ | CTS3524-
new - Z253 | CTS3525- | CTS3525+ | CTS3525-
new - Z253 | K289- | K289+ | K289-
new - Z253 | M6006- | M6006+ | M6006-
new - Z253 | CTS10596- | CTS10596- | CTS10596+
new - Z253 | CTS12273- | CTS12273- | CTS12273+
new - Z253 | L1495- | L1495- | L1495+
new - Z253 | L430- | L430- | L430+
new - Z253 | PF2916- | PF2916- | PF2916+
same - L1495 | PR2573- | PR2573- | PR2573+
recurrent | PF2595+ | PF2595- | PF2595+
recurrent | M9194+ | M9194+ | M9194-
same - DF13 | S521- | S521+ | S521+
same - DF13 | CTS241- | CTS241+ | CTS241+
same - L21 | M529- | M529+ | M529+
same - L21 | S145- | S145+ | S145+
same - L226 | S168- | S168+ | S168-
same - Z253 | S218- | S218+ | S218+
son - DF13 | Z253- | Z253+ | Z253+
son - L21 | DF13- | DF13+ | DF13+
son - P312 | DF99+ | DF99- | DF99-
son - P312 | L21- | L21+ | L21+
son - Z253 | Z2534- | Z2534+ | Z2534-
son - Z2534 | L226- | L226+ | L226-
unstable | M684- | M684+ | M684x
unstable | M4128+ | M4128x | M4128-



I am looking for input to improve this preliminary analysis. I only used the ISOGG haplotree and Mike W's descendant charts for it, and I am looking for other sources that identify known equivalents or YSNPs that have multiple names for the same YSNP. Of course, "new P312" is the absolute highest level at which that YSNP could sit (it could become "new L21" with more information). Likewise, "new L21" could become "new Z253" if a third Z253 submission reports a negative, etc.
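The placement reasoning above can be sketched in code. This is only an illustration, assuming each cell in the table ends in `+`, `-` or something else (such as `x`) for a no-call; the function names and the wording of the descriptions are hypothetical and are not part of any Full Genomes report.

```python
KITS = ("P312/DF99", "Z253/L226", "Z253/PF825")

def parse_call(cell):
    """Return True for '+', False for '-', None for a no-call such as 'x' or 'Missing'."""
    cell = cell.strip()
    if cell.endswith("+"):
        return True
    if cell.endswith("-"):
        return False
    return None

def pattern(calls):
    """Summarize the three calls as a string such as '-++' ('?' marks a no-call)."""
    return "".join("?" if c is None else "+" if c else "-" for c in calls)

def describe(calls):
    """Describe a call pattern in the order given by KITS."""
    p = pattern(calls)
    if "?" in p:
        return "unresolved (at least one no-call)"
    if p == "-++":
        return "shared by both Z253 kits, absent in the DF99 kit"
    if p in ("-+-", "--+"):
        return "seen in only one Z253 kit"
    if p == "+--":
        return "seen only in the DF99 kit"
    if p in ("+++", "---"):
        return "uninformative (same call in all kits)"
    return "inconsistent with a simple tree (possibly recurrent)"

# A few rows copied from the table above.
rows = {
    "Z2542": ("Z2542-", "Z2542+", "Z2542+"),
    "CTS3524": ("CTS3524-", "CTS3524+", "CTS3524-"),
    "Z6427": ("Z6427+", "Z6427-", "Z6427-"),
    "PF2595": ("PF2595+", "PF2595-", "PF2595+"),
    "M684": ("M684-", "M684+", "M684x"),
}

for name, cells in rows.items():
    calls = [parse_call(c) for c in cells]
    print(name, pattern(calls), describe(calls))
```

With only three kits the description is an upper bound on where a YSNP sits, which is why more L21** and DF13** submissions would tighten the placement.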

If you have other L21 submissions that you want added, please send your original Full Genomes email to the address below:

http://www.rcasey.net/DNA/R_L21/R_L21_Contact_Project.html

George Chandler
11-29-2013, 04:29 PM
Hi Robert,

Thanks for posting that. I don't mind sharing my results with you or Mark Jost when they eventually come in (which is the whole point of this). Has anyone come up with a centralized location yet, as David R has done for the Geno 2.0 L21 results? I know this is still at the disorganized puzzle phase in terms of SNP placement, but it would be nice to see all the incoming results patterned out as you've done (especially for the newly discovered SNPs). I know this is a monumental task, but it would be nice to start seeing the new SNP comparison (as you've done here) for other results.

Thanks
George

RobertCasey
11-29-2013, 08:15 PM
Our original thought was that David Reynolds would continue to be the focal point for collecting the data (as he did with the WTY and Nat Geo 2.0 raw files). But David has been very inactive lately. Mark has stated that he would coordinate the Sanger sequencing of the test results and is collecting results in order to do that (I assume his source is the Krahns' new company). I am currently just collecting several test results to understand how to analyze them. I do have a MySQL database up and running (probably less functional than Mike W's but more comprehensive). I am attempting to understand how to review 54,000 YSNPs per test and extract the L21 discoveries. The first three tests that I have looked at each have around 40 novel private YSNPs (none overlap) but also have around 40 YSNPs that are different (part of the 10,000 new YSNPs being reported). So these three tests appear to have at least 120 new private L21 YSNPs plus around 20 or so other new YSNPs, which appear to be the most interesting at this point in time.

By reviewing the FG test results I just learned that CTS8221 is the same as Z2542 and S471 is the same as Z252. So the challenge at this point in time is to determine which ones are actually new YSNPs vs. just new names for the same YSNPs (or yet more equivalent YSNPs). I will post my findings as they progress. Really need L21** and DF13** submissions next as well as other Z253 submissions.
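One way to keep duplicate names from inflating the count of "new" YSNPs is to map every alias to a single name before comparing kits. The sketch below assumes the equivalences stated in this thread (CTS8221 = Z2542, S471 = Z252, and the "same" rows of the first table); which name is treated as canonical is an arbitrary choice made here for illustration.

```python
# Alias -> canonical name, taken from the equivalences reported in this thread.
ALIASES = {
    "CTS8221": "Z2542",
    "S471": "Z252",
    "S145": "L21",
    "M529": "L21",
    "S521": "DF13",
    "CTS241": "DF13",
    "S218": "Z253",
    "S168": "L226",
}

def canonical(name):
    """Return the canonical name for a YSNP label, or the cleaned label if no alias is known."""
    key = name.strip().upper()
    return ALIASES.get(key, key)

def group_by_canonical(names):
    """Group reported labels under their canonical name."""
    groups = {}
    for name in names:
        groups.setdefault(canonical(name), set()).add(name.strip().upper())
    return groups

reported = ["Z2542", "CTS8221", "S471", "Z252", "M529", "L459"]
for canon, labels in sorted(group_by_canonical(reported).items()):
    print(canon, sorted(labels))
```

A label with no entry in the map, such as L459 here, simply maps to itself, so unknown names are never silently merged.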

At this point in time, I am only in analysis mode and attempting to figure out how the test results should be summarized and filtered. I will eventually expand my database to include L21 specific YSNPs. Also, Chromo 2 and the Krahns' testing results need to be tracked as well - it would be too much to expect David to continue to expand his scope. No telling how the Big Y results will come in as well. I think we need to split up the different testing companies and then share results between those who actually track this kind of data.

I cannot understand why my Z253/L226 results and another Z253/PF825 submission share no common FGC YSNPs. I would also love to get the results for the person who also tested FGC498 positive (which I tested positive for as well). The P312/DF99 test did not get any FGC YSNPs assigned for its highly reliable private YSNPs, so we may have to track some by YChr position. Also, the Z253/PF825 results did not get any FGC YSNPs assigned to their high quality indel mutations like mine did. It looks like the 10,000 or so new YSNPs show the most promise when using only three results. These have no YChr position listed, so I will have to find a source for that information so that the Krahns will know how to build the primers for individual testing or submit forms to FTDNA (but that is a future issue - we need to analyze which YSNPs are important to L21 first).

BTW - the above list only analyzes the file with all 54,000 YSNPs summarized. I still need to add the FGC YSNPs as well as other YSNPs listed in the compare report - so more YSNPs will be added over time.

George Chandler
11-29-2013, 08:52 PM
Ok thanks Robert. I hope everything is ok with David R. It's quite the task trying to figure this out. I started making a spreadsheet using David R's Geno 2.0 data and included the newly discovered high quality SNPs, but I know it's only scratching the surface. It does seem strange that you wouldn't have any FGC SNPs in common. You would think you would find a bit of overlap on a few?

George

RobertCasey
11-30-2013, 03:01 AM
I do have one FGC YSNP that overlaps with some other FG test - FGC498. However, my Z253/L226 results and the other Z253/PF825 results have no overlapping new novel YSNPs labeled FGC. The P312/DF99 submission has no FGC YSNPs assigned at all - even for the high quality YSNPs. So we may have to track those new novel YSNPs by YChr position. Also, the Z253/PF825 test did not get FGC YSNPs assigned to its high quality indel YSNPs. The best approach may be to request that Full Genomes assign FGC SNPs to these high quality novel private YSNPs in a manner consistent with more recent tests (like mine).

hoxgi
11-30-2013, 04:47 AM
The issue of SNPs having more than one name is likely to cause significant problems in comparing results from different companies. There is a list of duplicate names relevant to Chromo 2.0 results at http://www.yourgeneticgenealogist.com/ (post of Nov 20). It would be helpful if other companies released similar lists relevant to the SNPs they test.

Also, as has been pointed out on another thread, it may be that the FG and FGC designations for new SNPs in Full Genomes results are not equivalent, in which case the two Z253+ Full Genome results may share some new SNPs after all.

Perhaps an independent body, such as ISOGG, needs to take control of SNP nomenclature to ensure that a uniform approach is followed.

Greg H

hoxgi
11-30-2013, 07:34 AM
A recent update: the owner of the PF825+ kit whose results Robert has analyzed has now received an email from FGC renaming his new private SNPs as FGC3221-FGC3268, and his three private indels as FGC3269, FGC3270 and FGC3217 (hopefully not a typo).

So it appears that he and Robert do not have one shared SNP in their total of around 90 new SNPs, meaning all of these must be downstream from Z253.

Greg H

Muircheartaigh
11-30-2013, 08:37 AM
Does that include the low reliability ** and *** private variants? Robert has a total of 239 variants in his list of private SNPs. Do they share any of these variants?


Ray

hoxgi
11-30-2013, 10:09 AM
Ray, only the no asterisk (99% reliability) and * (95% reliability) private SNPs have been given FGC designations, so they are the only ones I've looked at so far. It is certainly possible that some of the lower reliability SNPs might be shared by both Z253+ kits, but surely it would be strange, statistically speaking, if all the new SNPs above Z253 happened to be of low reliability.

Greg H

Muircheartaigh
11-30-2013, 11:29 AM
Greg,

I was thinking of possible common SNPs downstream of Z253. The two threads either divided downstream of Z253 in which case there are likely to be common SNPs, or they are individual sons of Z253 in which case there will be no common SNPs.

The simplest way to compare them is probably to copy each set of unfiltered private variants separately to new Excel Sheets, sort them by b37 location and copy them to a common Excel sheet and look for common locations/transitions between the two sets.
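The same comparison can be sketched outside Excel. This is a minimal illustration, assuming each private variant is reduced to its b37 position plus reference and derived allele; the `Variant` type and the positions in the example are made up and do not come from any kit discussed here.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    pos: int   # b37 Y-chromosome position
    ref: str   # reference allele
    alt: str   # derived allele

def shared_variants(a, b):
    """Variants present in both kits at the same position with the same change, sorted by position."""
    return sorted(set(a) & set(b))

def same_site_different_change(a, b):
    """Positions hit in both kits where the reported change differs."""
    common_positions = {v.pos for v in a} & {v.pos for v in b}
    agreed_positions = {v.pos for v in set(a) & set(b)}
    return sorted(common_positions - agreed_positions)

# Hypothetical positions, for illustration only.
kit_a = [Variant(1000100, "C", "T"), Variant(2000200, "G", "A"), Variant(3000300, "A", "G")]
kit_b = [Variant(2000200, "G", "A"), Variant(3000300, "A", "C"), Variant(4000400, "T", "C")]

print(shared_variants(kit_a, kit_b))
print(same_site_different_change(kit_a, kit_b))
```

Sites that match on position but not on the change are reported separately, since they may point to a recurrent mutation or a calling problem rather than shared ancestry.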

hoxgi
11-30-2013, 11:53 AM
Ray

Thanks. I agree, but I still would have thought that we may have turned up SNPs above Z253 as well.

I only have the raw data for the PF825+ kit and so have compared its SNPs with the information Robert Casey has posted online about his own results, rather than his raw data. The information with the FG results cautions that only SNP results with at least 95% reliability should be regarded as significant.

I have forwarded a copy of the PF825+ kit raw SNP data to Robert Casey, with the permission of the kit's owner. I could check if he is happy for me to send you a copy as well if you like. Hopefully it won't be too long before you receive your own FG results, and I think the more people working on comparisons the better.

We also have a Chromo 2.0 result (positive SNPs only, not raw data) for a different PF825 kit online in the L21 Yahoo Group, so adding those results into the comparison may clarify things further. The kit is Buchanan 200481.

Greg H

Muircheartaigh
11-30-2013, 12:40 PM
Greg,

That would be great. PF825 and my cluster (253-1716-11) share some common off-modal STR mutations, and it is possible that we share some common SNPs downstream of Z253. It would be good to do some comparisons to confirm where they branched when I eventually receive my results (batch #5).

Ray

RobertCasey
12-03-2013, 02:25 PM
The Z253/PF825 sponsor received updated results. We now share 20 or 30 high quality YSNPs; these did not show up before because his original file had no FGC assignments for shared YSNPs or indels like mine did. For now, I think we need to use the YChr position to find L21 relevant YSNPs, since there are many high quality YSNPs with no FGC designations. As we get FGC mapping to YChr positions, we can replace the YChr designation with the FGC assignments. Also, I only look at high quality mutations (those rated blank or *). With at least 50 of these per test, I think we should concentrate only on high quality YSNPs for now. I do plan to review all high quality indels as well.
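A rough sketch of that approach: keep only variants rated blank or `*`, index them by YChr position, and show the FGC name only where one has been assigned. The record layout, the function names and the example values (including the label `FGC_EXAMPLE_1`) are hypothetical.

```python
HIGH_QUALITY_RATINGS = {"", "*"}  # blank and * ratings, as described in this thread

def index_high_quality(variants):
    """Map YChr position -> FGC name (or None) for variants rated blank or '*'."""
    return {
        v["pos"]: v.get("fgc")
        for v in variants
        if v["rating"].strip() in HIGH_QUALITY_RATINGS
    }

def label(pos, *names):
    """Prefer an assigned FGC name; fall back to a position-based label."""
    for name in names:
        if name:
            return name
    return f"chrY:{pos}"

# Hypothetical records, for illustration only.
kit_a = [
    {"pos": 1111111, "rating": "", "fgc": "FGC_EXAMPLE_1"},
    {"pos": 2222222, "rating": "*", "fgc": None},
    {"pos": 3333333, "rating": "**", "fgc": None},
]
kit_b = [
    {"pos": 1111111, "rating": "", "fgc": None},
    {"pos": 2222222, "rating": "", "fgc": None},
    {"pos": 3333333, "rating": "", "fgc": None},
]

a = index_high_quality(kit_a)
b = index_high_quality(kit_b)
shared = {pos: label(pos, a[pos], b[pos]) for pos in sorted(a.keys() & b.keys())}
print(shared)
```

Keying on position means two kits still match when only one of them has received its FGC names, which is the situation described above.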

Peter M
12-03-2013, 08:55 PM
RobertCasey wrote: "At this point in time, I am only in analysis mode and attempting to figure out how the test results should be summarized and filtered. I will eventually expand my database to include L21 specific YSNPs. Also, Chromo 2 and the Krahns' testing results need to be tracked as well - it would be too much to expect David to continue to expand his scope. No telling how the Big Y results will come in as well. I think we need to split up the different testing companies and then share results between those who actually track this kind of data."


If you want my opinion: do *NOT* split up your activities on the basis of the testing companies the results were obtained from. Focus on a branch of the tree with a width you can handle and then combine ALL results from ALL testing companies for people in that branch. Otherwise I'm afraid you'll find yourself in a big mess shortly.

GoldenHind
12-04-2013, 01:41 AM
Can anyone tell me anything about CTS11707 and Z6427? I don't see them listed in the ISOGG index of SNPs.

RobertCasey
12-04-2013, 03:34 AM
Here is the analysis of the Compare files of the P312/DF99, Z253/L226 and Z253/PF825 submissions:

Here are some observations:

1) There are a lot of private high quality YSNPs for each submission.
2) There seems to be too few YSNPs between Z253 and P312 (labeled Approx Z253).
3) FGC164 appears to be the best find - between Z253 and P312.
4) Need to research other approx Z253 candidates, PF5888 and Z252.
5) All other YSNPs were either all three positive or all three negative.
6) The PF825 submission also has FGC28 and FGC187 (which are probably among the five unknown PF825 YSNPs).
7) I have not received the updated PF825 results - had to cross reference them by YChr positions with L226 results.
8) YSNPs labeled FSS are high quality FGC Shared SNPs with no known assigned FGC numbers.

Look forward to comments - will update as I research interesting YSNPs on the list.

Part 1 of 3 (table was too big for one post):



Analysis | P312/DF99 | Z253/L226 | Z253/PF825
Approx DF99 | FGC846+ | Missing | Missing
Approx DF99 | FGC847+ | Missing | Missing
Approx DF99 | FGC848+ | Missing | Missing
Approx DF99 | FGC849+ | Missing | Missing
Approx DF99 | FGC850+ | Missing | Missing
Approx DF99 | FGC851+ | Missing | Missing
Approx DF99 | FGC852+ | Missing | Missing
Approx DF99 | FGC853+ | Missing | Missing
Approx DF99 | FGC854+ | Missing | Missing
Approx DF99 | FGC855+ | Missing | Missing
Approx DF99 | FGC856+ | Missing | Missing
Approx DF99 | FGC857+ | Missing | Missing
Approx DF99 | FGC858+ | Missing | Missing
Approx DF99 | FGC859+ | Missing | Missing
Approx DF99 | FGC860+ | Missing | Missing
Approx DF99 | FGC861+ | Missing | Missing
Approx DF99 | FGC862+ | Missing | Missing
Approx DF99 | FGC863+ | Missing | Missing
Approx DF99 | FGC864+ | Missing | Missing
Approx DF99 | FGC865+ | Missing | Missing
Approx DF99 | FGC866+ | Missing | Missing
Approx DF99 | FGC867+ | Missing | Missing
Approx DF99 | FGC868+ | Missing | Missing
Approx DF99 | FGC869+ | Missing | Missing
Approx DF99 | FGC870+ | Missing | Missing
Approx DF99 | FGC871+ | Missing | Missing
Approx DF99 | FGC872+ | Missing | Missing
Approx DF99 | FGC873+ | Missing | Missing
Approx DF99 | FGC874+ | Missing | Missing
Approx DF99 | FGC875+ | Missing | Missing
Approx DF99 | FGC876+ | Missing | Missing
Approx DF99 | FGC877+ | Missing | Missing
Approx DF99 | FGC878+ | Missing | Missing
Approx DF99 | FGC879+ | Missing | Missing
Approx DF99 | FGC880+ | Missing | Missing
Approx DF99 | FGC881+ | Missing | Missing
Approx DF99 | FGC882+; site of CTS9213 | Missing | Missing
Approx DF99 | FGC883+ | Missing | Missing
Approx DF99 | FGC884+ | Missing | Missing
Approx DF99 | FGC885+ | Missing | Missing
Approx DF99 | FGC886+ | Missing | Missing
Approx DF99 | FGC887+ | Missing | Missing
Approx DF99 | FGC888+ | Missing | Missing
Approx DF99 | FGC889+ | Missing | Missing
Approx DF99 | FGC890+ | Missing | Missing
Approx DF99 | FGC891+ | Missing | Missing
Approx DF99 | FGC892+ | Missing | Missing
Approx DF99 | FGC893+ | Missing | Missing
Approx DF99 | FGC894+ | Missing | Missing
Approx DF99 | FGC895+ | Missing | Missing
Approx DF99 | FGC896+ | Missing | Missing
Approx DF99 | FGC897+ | Missing | Missing
Approx DF99 | FGC898+ | Missing | Missing
Approx DF99 | FGC899+ | Missing | Missing

RobertCasey
12-04-2013, 03:35 AM
Part 2 of 3:


Analysis | P312/DF99 | Z253/L226 | Z253/PF825
Approx L226 | Missing | FGC1556+ | Missing
Approx L226 | Missing | FGC1557+ | Missing
Approx L226 | Missing | FGC267+ | Missing
Approx L226 | Missing | FGC271+ | Missing
Approx L226 | Missing | FGC385+ | Missing
Approx L226 | Missing | FGC498+ | Missing
Approx L226 | Missing | FGC5618+ | Missing
Approx L226 | Missing | FGC5619+ | Missing
Approx L226 | Missing | FGC5620+ | Missing
Approx L226 | Missing | FGC5621+ | Missing
Approx L226 | Missing | FGC5622+ | Missing
Approx L226 | Missing | FGC5623+ | Missing
Approx L226 | Missing | FGC5624+ | Missing
Approx L226 | Missing | FGC5625+ | Missing
Approx L226 | Missing | FGC5626+ | Missing
Approx L226 | Missing | FGC5627+ | Missing
Approx L226 | Missing | FGC5628+ | Missing
Approx L226 | Missing | FGC5629+ | Missing
Approx L226 | Missing | FGC5630+ | Missing
Approx L226 | Missing | FGC5631+ | Missing
Approx L226 | Missing | FGC5632+ | Missing
Approx L226 | Missing | FGC5633+ | Missing
Approx L226 | Missing | FGC5634+ | Missing
Approx L226 | Missing | FGC5635+ | Missing
Approx L226 | Missing | FGC5636+ | Missing
Approx L226 | Missing | FGC5637+ | Missing
Approx L226 | Missing | FGC5638+ | Missing
Approx L226 | Missing | FGC5639+ | Missing
Approx L226 | Missing | FGC5640+ | Missing
Approx L226 | Missing | FGC5641+ | Missing
Approx L226 | Missing | FGC5642+ | Missing
Approx L226 | Missing | FGC5643+ | Missing
Approx L226 | Missing | FGC5644+ | Missing
Approx L226 | Missing | FGC5645+ | Missing
Approx L226 | Missing | FGC5646+ | Missing
Approx L226 | Missing | FGC5647+ | Missing
Approx L226 | Missing | FGC5648+ | Missing
Approx L226 | Missing | FGC5649+ | Missing
Approx L226 | Missing | FGC5650+ | Missing
Approx L226 | Missing | FGC5651+ | Missing
Approx L226 | Missing | FGC5652+ | Missing
Approx L226 | Missing | FGC5653+ | Missing
Approx L226 | Missing | FGC5654+ | Missing
Approx L226 | Missing | FGC5655+ | Missing
Approx L226 | Missing | FGC5656+ | Missing
Approx L226 | Missing | FGC5657+ | Missing
Approx L226 | Missing | FGC5658+ | Missing
Approx L226 | Missing | FGC5659+ | Missing
Approx L226 | Missing | FGC5660+ | Missing

RobertCasey
12-04-2013, 03:37 AM
Part 3 of 3:



Analysis | P312/DF99 | Z253/L226 | Z253/PF825
Approx PF825 | Missing | Missing | FGC3221+
Approx PF825 | Missing | Missing | FGC3222+
Approx PF825 | Missing | Missing | FGC3223+
Approx PF825 | Missing | Missing | FGC3224+
Approx PF825 | Missing | Missing | FGC3225+
Approx PF825 | Missing | Missing | FGC3226+
Approx PF825 | Missing | Missing | FGC3227+
Approx PF825 | Missing | Missing | FGC3228+
Approx PF825 | Missing | Missing | FGC3229+
Approx PF825 | Missing | Missing | FGC3230+
Approx PF825 | Missing | Missing | FGC3231+
Approx PF825 | Missing | Missing | FGC3232+
Approx PF825 | Missing | Missing | FGC3233+
Approx PF825 | Missing | Missing | FGC3234+
Approx PF825 | Missing | Missing | FGC3235+
Approx PF825 | Missing | Missing | FGC3236+
Approx PF825 | Missing | Missing | FGC3237+
Approx PF825 | Missing | Missing | FGC3238+
Approx PF825 | Missing | Missing | FGC3239+
Approx PF825 | Missing | Missing | FGC3240+
Approx PF825 | Missing | Missing | FGC3241+
Approx PF825 | Missing | Missing | FGC3242+
Approx PF825 | Missing | Missing | FGC3243+
Approx PF825 | Missing | Missing | FGC3244+
Approx PF825 | Missing | Missing | FGC3245+
Approx PF825 | Missing | Missing | FGC3246+
Approx PF825 | Missing | Missing | FGC3247+
Approx PF825 | Missing | Missing | FGC3248+
Approx PF825 | Missing | Missing | FGC3249+
Approx PF825 | Missing | Missing | FGC3250+
Approx PF825 | Missing | Missing | FGC3251+
Approx PF825 | Missing | Missing | FGC3252+
Approx PF825 | Missing | Missing | FGC3253+
Approx PF825 | Missing | Missing | FGC3254+
Approx PF825 | Missing | Missing | FGC3255+
Approx PF825 | Missing | Missing | FGC3256+
Approx PF825 | Missing | Missing | FGC3257+
Approx PF825 | Missing | Missing | FGC3258+
Approx PF825 | Missing | Missing | FGC3259+
Approx PF825 | Missing | Missing | FGC3260+
Approx PF825 | Missing | Missing | FGC3261+
Approx PF825 | Missing | Missing | FGC3262+
Approx PF825 | Missing | Missing | FGC3263+
Approx PF825 | Missing | Missing | FGC3264+
Approx PF825 | Missing | Missing | FGC3265+
Approx PF825 | Missing | Missing | FGC3266+
Approx PF825 | Missing | Missing | FGC3267+
Approx PF825 | Missing | Missing | FGC3268+
Approx PF825 | Missing | Missing | FGC3269+
Approx PF825 | Missing | Missing | FGC3270+
Approx PF825 | Missing | Missing | FGC3271+
Approx PF825 | Missing | Missing | FGC3219+
Approx PF825 | Missing | Missing | FGC3220+
Approx PF825 | Missing | Missing | FGC187+
Approx PF825 | Missing | Missing | FGC28+
Approx PF825 | Missing | Missing | FGC3218+
Approx Z253 | Missing | FGC164+ | FGC164+
Approx Z253 | Missing | PF5888+ | PF5888+
Approx Z253 | Missing | Z252+ (also known as S471) | Z252+ (also known as S471)
Equiv DF13 | Missing | Z2542+ (also known as CTS8221) | Z2542+ (also known as CTS8221)
Equiv L21 | Missing | L459+ | Missing
Equiv L21 | Missing | Z245+ | Z245+
Equiv L21 | Missing | Z260+ | Z260+
Equiv L21 | Missing | Z290+ | Z290+
Inconclusive | Missing | Missing | L116- (also known as S284, PF2955)
Inconclusive | L20- (also known as S144) | Missing | L20- (also known as S144)
Inconclusive | M2693- (also known as PF5501) | Missing | Missing
Inconclusive | Missing | Missing | M9194-
Inconclusive | U2- (also known as S314, PF2952) | U2- (also known as S314, PF2952) | Missing
Pre P312 | FGC173+ | FGC173+ | Missing
Pre P312 | CTS3075+ | Missing | CTS3075+
Pre P312 | FGC148+ | FGC148+ | Missing
Pre P312 | M1206+ (also known as PF5898, CTS3135) | Missing | M1206+ (also known as PF5898, CTS3135)
Same L226 | Missing | S168+ (also known as L226) | Missing
Same Z253 | Missing | S218+ (also known as Z253) | S218+ (also known as Z253)
Son L21 | Missing | DF13+ (also known as S521, CTS241) | DF13+ (also known as S521, CTS241)
Son P312 | Missing | L21+ (also known as S145, M529) | L21+ (also known as S145, M529)
Son Z253 | Missing | Z2534+ | Missing

RobertCasey
12-04-2013, 05:04 AM
I went back and added information from the gtype files and found that the Compare file is just not very reliable for non-FGC YSNPs. I think that the Compare file is the only source for FGC YSNPs and the gtype file is the most reliable source for non-FGC YSNPs.



Analysis | P312/DF99 | Z253/L226 | Z253/PF825
All Negative | L116- (gtype) | L116- (gtype) | L116- (also known as S284, PF2955)
All Negative | L20- (also known as S144) | L20- (gtype) | L20- (also known as S144)
All Negative | U2- (also known as S314, PF2952) | U2- (also known as S314, PF2952) | U2- (gtype)
All Positive | PF5888+ (gtype) | PF5888+ | PF5888+
All Positive | CTS3075+ | CTS3075+ (gtype) | CTS3075+
All Positive | M1206+ (also known as PF5898, CTS3135) | M1206+ (gtype) | M1206+ (also known as PF5898, CTS3135)
Approx Z253 | Missing | FGC164+ | FGC164+
Approx Z253 | Z252- (gtype) | Z252+ (also known as S471) | Z252+ (also known as S471)
Approx Z253 | M2693- (also known as PF5501) | M2693+ (gtype) | M2693+ (gtype)
Equiv DF13 | Z2542- (gtype) | Z2542+ (also known as CTS8221) | Z2542+ (also known as CTS8221)
Equiv L21 | L459- (gtype) | L459+ | L459+ (gtype)
Equiv L21 | Z245- (gtype) | Z245+ | Z245+
Equiv L21 | Z260- (gtype) | Z260+ | Z260+
Equiv L21 | Z290- (gtype) | Z290+ | Z290+
Pre P312 | M9194+ (gtype) | M9194+ (gtype) | M9194-
Pre P312 | FGC173+ | FGC173+ | Missing
Pre P312 | FGC148+ | FGC148+ | Missing
Same L226 | S168- (gtype) | S168+ (also known as L226) | S168- (gtype)
Same Z253 | S218- (gtype) | S218+ (also known as Z253) | S218+ (also known as Z253)
Son L21 | DF13- (gtype) | DF13+ (also known as S521, CTS241) | DF13+ (also known as S521, CTS241)
Son P312 | L21- (gtype) | L21+ (also known as S145, M529) | L21+ (also known as S145, M529)
Son Z253 | Z2534- (gtype) | Z2534+ | Z2534- (gtype)

MJost
12-04-2013, 12:58 PM
Robert,

I have an Excel VB script that preps a Haplogroup compare file so that it can be merged into the combined list I currently have, with some manual steps before and after the run. With four kits merged, I have over 2600 positions now. I am trying to get all the tested L21 subclades included.

Does DaveR have a complete HaplotypeCompare Out file available?

MJost

RobertCasey
12-04-2013, 02:14 PM
Mark, I only post the YSNPs that have interesting results and filter out any YSNP that shows positive or negative for all test results. I have determined that the Compare file is only good for FGC novel YSNPs and can not be trusted for non-FGC YSNPs. The gtype file is much more reliable for non-FGC YSNPs. Other than the 159 high quality FG YSNPs in the three submissions compared to date, I found only one FGC YSNP shared in common for the two Z253 submissions (FGC164) and not by the P312/DF99 submission. However, the gtype file reveals nine interesting YSNPs that are not shared by the two Z253 submissions and three more that are shared by both but not by the P312/DF99 submission. Unfortunately, there are no YChr positions for these non-FGC YSNPs in the files (researching these YSNPs elsewhere).
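The filtering rule described above can be sketched as follows, assuming each kit contributes a Compare call and a gtype call per YSNP (`True` for positive, `False` for negative, `None` when the file has no entry). The function names are hypothetical; the three example rows are taken from the tables earlier in this thread.

```python
def pick_call(name, compare_call, gtype_call):
    """Use the Compare file for FGC-named YSNPs; otherwise prefer gtype and fall back to Compare."""
    if name.upper().startswith("FGC"):
        return compare_call
    return gtype_call if gtype_call is not None else compare_call

def same_in_all_kits(calls):
    """True only when every kit has an explicit call and all calls agree."""
    return None not in calls and len(set(calls)) == 1

# (compare_call, gtype_call) per kit, in the order P312/DF99, Z253/L226, Z253/PF825.
observations = {
    "PF5888": [(None, True), (True, None), (True, None)],
    "Z252": [(None, False), (True, None), (True, None)],
    "FGC164": [(None, None), (True, None), (True, None)],
}

kept = []
for name, per_kit in observations.items():
    calls = [pick_call(name, compare, gtype) for compare, gtype in per_kit]
    if not same_in_all_kits(calls):
        kept.append(name)

print(kept)
```

A missing entry is kept as unknown rather than counted as negative, so FGC164 stays on the list without being asserted negative for the P312/DF99 kit.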

Since DavidR has been quiet lately, I do not know the status of his collection of FG, Chromo 2 or Krahn tested YSNPs (and soon to be Big Y). I have seen posts that he has not updated the Nat Geo 2.0 files for some time as well. For now, you and I appear to be the only ones collecting FG files. Also, mine was the only submission of the three that had FGC YSNPs assigned. I got an updated version of the P312/DF99 results and an email describing the Z253/PF825 submission, which I had to cross-reference via YChr position (and I could not assign two PF825 unique YSNPs, since I have no YChr position for these two and the updated file has not been sent to date). Plus I found three more high quality YSNPs in the PF825 results that are still not assigned any FGC labels to date. I suggest that we exchange our two files next so that we continue to have more to compare. We probably need to ask the sponsors if it is OK to send the others to each other. If you agree to exchange our source files, send your original (and any updated FG results files) to my email address:

http://www.rcasey.net/DNA/R_L21/R_L21_Contact_Project.html

Please post your email address (as an image) and I will send you my file (no updates are required, since my original results came much later than the others and have FGC numbers assigned to all high quality novel FG YSNPs).

I will still manually manipulate via EXCEL macros for a while until I get comfortable on how to analyze the data. Once I get comfortable with the analysis methodology, I will upload the files to my MySQL database where I can just create queries to analyze the data. I also intend to eventually upload the YSTR results as well - but you need a lot more submissions before YSTRs are very useful.

RobertCasey
12-04-2013, 08:40 PM
After more investigation of the relevant YSNPs, I determined that PF5888 is not a good YSNP for L21 testing. It is currently part of the Nat Geo 2.0 test and has extensively tested positive for (SEMARGL): U152+, U106+, L21+ and DF27+. It has also tested positive for DF99+ (P312 FTDNA project) as well. Since we have a P312/DF99 submission where PF5888 is not found, we obviously can not assume that a missing result is negative. This means Z252 is probably at risk as well. The DF99+/PF5888+ submission is found in the P312 FTDNA project (FTDNA ID 272715). Hopefully, missing FGC YSNPs can be safely assumed to be negative. Also, several PF5888+ submissions have tested P312-, so PF5888 appears to be pre-P312 (SEMARGL).

Williamson
12-04-2013, 11:06 PM
RobertCasey wrote: "Hopefully, missing FGC YSNPs can be safely assumed to be negative."

Hi Robert,

No, missing FGC numbers cannot be assumed to be negative. I recently received my updated FGC results (with the FG number renaming) and they did not indicate that I was positive for FGC3218. However, after manually checking the BAM file for my results, I am in fact positive for this SNP. FGC3218 looks to be equivalent to L21 at the moment.

Alex

warwick
12-04-2013, 11:12 PM
If the SNP was reported as ** or greater by the software then it is not labeled a high quality SNP and named. Do you know the confidence level reported in your file for this SNP? You might want to confer with Greg M. on this question since he developed the software.

Greg has specific confidence criteria that he uses that may be more cautious, i.e. favor not reporting data with at least 95% confidence as a private SNP.

Williamson
12-04-2013, 11:40 PM
There was no confidence level reported for the SNP as it simply wasn't included at all, named or not. I will confer with Greg about this. To me, my result looks clean enough that I am positive for the SNP, but more expert eyes may see something different.

Alex

warwick
12-04-2013, 11:51 PM
We see it in your file.

Williamson
12-04-2013, 11:56 PM
Well, that's embarrassing.

I can't seem to find it in either my haplogroupCompare file or my variantCompare file. If you could let me know which file to look in, I'd really like to see the result.

Thanks,
Alex

warwick
12-04-2013, 11:58 PM
rs150868296
haplogroupCompare

Williamson
12-05-2013, 12:01 AM
rs150868296
haplogroupCompare

My haplogroupCompare file is dated 20131111. What is the date on your file? Can you please e-mail to me?

Thanks,
Alex

warwick
12-05-2013, 12:37 AM
My haplogroupCompare file is dated 20131111. What is the date on your file? Can you please e-mail to me?

Thanks,
Alex

We'll take a look at this. It could be an omission from the report.
Greg will take another look tomorrow.

warwick
12-05-2013, 02:03 PM
My haplogroupCompare file is dated 20131111. What is the date on your file? Can you please e-mail to me?

Thanks,
Alex

That SNP call in your data did not meet our quality threshold, which is why it was not reported by us. Given what is in the BAM file, our criteria do not allow us to report the result.

RobertCasey
12-10-2013, 05:06 AM
For non-FGC YSNPs, I still think that the gtype file has many more YSNPs than the Compare files, which have fewer than 100 CTS YSNPs. There appears to be another quality rating found in the gtype file - Q13 and Q0. Does anyone know the following:

1) High level definitions of Q13 and Q0 (Q0 seems to have a slightly higher error rate in my small sample of relevant YSNPs).


2) It seems like the quantity of calls made is probably very important. It varies from 1 call to 458 calls, so I assume YSNPs with only 5 to 10 calls are less reliable than those YSNPs in the 20 to 50 range. I am not sure how you get 458 calls with only 50x coverage. Calls with very low numbers appear to be unreliable. What are the equivalent blank, *, ** and *** assignments for lower numbered calls?

3) The error rates all seem to be very low for most of the relevant YSNPs that I am looking at. All are zero false hits except for two. Here are few samples:
L430 Q13 errors = 0; Q0 errors = 1, 51 other calls which are the same (assumed to be correct). Error rate 2 % for Q0.
PF2916 Q13 errors = 0; Q0 errors = 4; Q13 - 3 are the same & Q0 - 4 are the same and 4 are others (50 % error rate for Q0).
Z252 Q13 errors = 3; Q0 errors = 5; Q13 correct calls = 452; Q0 correct calls = 458 (less than 1 % error rate).
You can simply divide the number of minority calls by the number of majority calls, and this yields an error rate.
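The error rate arithmetic above can be sketched in a few lines of Python. This is only an illustration using the call counts quoted in this post. The function name `error_rate` is made up for the example, and it assumes the rate is minority calls divided by all calls, which is what gives 50 % for the PF2916 Q0 case (4 of 8); for small minority counts this is nearly the same as dividing by the majority calls.

```python
def error_rate(minority_calls, majority_calls):
    """Fraction of calls that disagree with the majority call.

    Assumption: rate = minority / (minority + majority).
    """
    total = minority_calls + majority_calls
    if total == 0:
        raise ValueError("no calls reported for this YSNP")
    return minority_calls / total


# (minority calls, majority calls) as quoted in the post
examples = {
    "L430 Q0": (1, 51),
    "PF2916 Q0": (4, 4),
    "Z252 Q13": (3, 452),
}

for name, (minority, majority) in examples.items():
    print(f"{name}: {error_rate(minority, majority):.1%}")
```

With so few calls for PF2916, a single extra read either way would move its rate a lot, which is one more reason to treat low counts with caution.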

Between the three submissions compared, I found around a dozen non-FGC YSNPs that did not track all positive or all negative. CTS10596, CTS12273, CTS3524, CTS3525, K289 and PF2916 all have fewer than ten results reported (Q0 or Q13). These appear to be unreliable due to low counts for 50x coverage.

L1495, L430, M6006 and M2693 all have 22 to 86 results reported and error rates below 2 %. These appear to be reliable.

PF2916 not only has low result counts, 3 (Q13) and 4 (Q0), but Q0 also has a 50 % error rate, with 4 of 8 calls that have to be incorrect.
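As a rough sketch of the rule of thumb used in the last few paragraphs, the Python below sorts a YSNP into "unreliable", "reliable" or "uncertain" from its call counts. The cut-offs (fewer than 10 calls, at least 22 calls, error rate below 2 %) are simply the numbers observed in my small sample, not anything published by Full Genomes, and the function name `classify` is made up for the example.

```python
def classify(total_calls, minority_calls,
             min_calls=10, reliable_min_calls=22, max_error=0.02):
    """Rough reliability label for one YSNP in the gtype file.

    Thresholds are assumptions taken from the sample in this post.
    """
    if total_calls < min_calls:
        return "unreliable"  # too few reads to trust the call
    rate = minority_calls / total_calls
    if total_calls >= reliable_min_calls and rate < max_error:
        return "reliable"
    return "uncertain"  # between the two groups seen so far


print(classify(8, 4))    # PF2916 Q0: 8 calls, 4 disagree
print(classify(52, 1))   # L430 Q0: 52 calls, 1 disagrees
```

Anything that lands in "uncertain" would need a look at the BAM file or a confirming test before being used.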

Since only M2693 is found in the Compare files, many of these YSNP mutations in the gtype file may be very relevant.

L1495 and L430 have only tested positive in non-L21 submissions to date (ISOGG browser), so these may be new recurrent YSNPs under L21.

For non-FGC YSNP mutations, should the gtype file be the primary source to compare mutations? If so, what are the rules of thumb for judging the reliability of Q13 and Q0 reported results? If we only trust non-FGC YSNPs in the Compare files, there are so few there that very little information would be available other than the novel FGC YSNPs. Thanks in advance.

RobertCasey
12-10-2013, 02:01 PM
Another addition to the above. I cannot find the YChr position for these relevant YSNPs in any FG files. I have looked at many other sources and found a few, but for most of these YSNPs I could not find any information about the YChr position. If this gtype file is to be useful, we also need the YChr position added into some file (or a reliable link where the YChr position can be found). Without the YChr position, neither YSEQ nor FTDNA will accept YSNP testing requests.

warwick
12-10-2013, 02:46 PM
Another addition to the above. I cannot find the YChr position for these relevant YSNPs in any FG files. I have looked at many other sources and found a few, but for most of these YSNPs I could not find any information about the YChr position. If this gtype file is to be useful, we also need the YChr position added into some file (or a reliable link where the YChr position can be found). Without the YChr position, neither YSEQ nor FTDNA will accept YSNP testing requests.

YSEQ (Thomas Krahn's company) is already offering FGC SNPs. The positions are in the data released in the FGC files to customers, who can then choose with whom to test. [That should not be taken as an endorsement by FGC]

RobertCasey
12-10-2013, 04:07 PM
YSEQ (Thomas Krahn's company) is already offering FGC SNPs. The positions are in the data released in the FGC files to customers, who can then choose with whom to test. [That should not be taken as an endorsement by FGC]

I was referring only to non-FGC YSNPs (CTS, PF, L, etc.) from the gtype file. How do we request that these YSNPs be added for testing when no YChr position is found? Also, any input on how to determine the reliability of non-FGC YSNPs (CTS, Z, M, etc.) that are found in the gtype file?

razyn
12-10-2013, 10:53 PM
YSEQ (Thomas Krahn's company) is already offering FGC SNPs. The positions are in the data released in the FGC files to customers, who can then choose with whom to test. [That should not be taken as an endorsement by FGC]

It's encouraging to see (mainly from a post by Thomas Krahn on the ISOGG Facebook page this evening) that several companies are sort of going with the flow and letting customers get the services they want from the providers willing to gear up and provide them. Stored samples transferred from FTDNA to YSeq and tested for new SNPs discovered at FGC -- looks very helpful, to me.

RobertCasey
12-12-2013, 03:34 PM
I just got an email from Thomas Krahn that he has added four potential Approx Z253/L226 YSNPs - FGC498, FGC5618, FGC5626 and FGC5658.

For those interested in testing out novel Full Genomes YSNPs that are below Z253, there are now three YSNPs available for testing (FGC56XX). These are labeled as high quality YSNPs (reliability greater than 99%). The second FG tester for Z253 is PF825 positive and does not have these YSNPs, and none of the 1000 Genomes testers tested positive for these. Also, three DF21, one DF41 and one P312/DF99 tests did not test positive for these four YSNPs. These are brand new YSNPs, so they could be equivalents of known Z253 YSNPs (Z253, Z2534 or L226) and could eventually be found to be recurrent as well (mutations could eventually be found in other parts of the haplotree). I randomly selected three YSNPs out of my 40 "private" YSNPs that were not shared by the PF825 test (none of our private YSNPs were shared). I submitted requests to both FTDNA and YSEQ (Krahn's new testing company), and Thomas Krahn just informed me that these three YSNPs were ready to order.

These YSNPs should be somewhere between Z253 and my South Carolina Casey cluster which is extremely genetically isolated under L226. In order to save testing costs, we really need to determine exactly where these YSNPs are located before we extensively test. I am looking for a couple Z253* and Z2534* submissions in case these YSNPs are above L226. I am also looking for two or three L226 submissions to test as well (the more isolated from the South Carolina cluster the higher the odds of being a broader YSNP including more L226 submissions).

My Casey South Carolina fingerprint (67 markers) is 393 <= 12, 458 <= 16, 449 >= 30, 464b >= 14, 460 >= 12, 534 <= 14 and 481 >= 23. If you are L226 and any of these YSNPs are very private, you are more likely to test positive if you match part of this fingerprint. Therefore, we need one or two testers that match part of this fingerprint in case the YSNP is more recent.

The L226 fingerprint (67 markers) is 439 <= 11, 459a <= 8, 459b <= 9, 449 <= 29, 464a <= 13, 464b <= 13, 464c <= 15, 456 <= 15 and 557 <= 15. The closer you are to this fingerprint and are Z2534**, the more likely that you will test positive if these YSNPs are pre-L226.
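For anyone who wants to check their own markers against a fingerprint, here is a minimal Python sketch. The Casey South Carolina thresholds are copied from above; the haplotype values in `sample` are invented for illustration only, and the function name `fingerprint_matches` is just a label for this example. A marker that is missing from the haplotype is counted as not matching.

```python
import operator

# Casey South Carolina fingerprint as listed above: marker -> (comparison, value)
CASEY_SC = {
    "393": ("<=", 12),
    "458": ("<=", 16),
    "449": (">=", 30),
    "464b": (">=", 14),
    "460": (">=", 12),
    "534": ("<=", 14),
    "481": (">=", 23),
}

COMPARE = {"<=": operator.le, ">=": operator.ge}


def fingerprint_matches(haplotype, fingerprint):
    """Return the list of fingerprint markers that the haplotype satisfies."""
    matched = []
    for marker, (op, threshold) in fingerprint.items():
        value = haplotype.get(marker)
        if value is not None and COMPARE[op](value, threshold):
            matched.append(marker)
    return matched


# Hypothetical marker values, not a real submission
sample = {"393": 12, "458": 16, "449": 29, "464b": 14,
          "460": 12, "534": 14, "481": 23}
hits = fingerprint_matches(sample, CASEY_SC)
print(f"{len(hits)} of {len(CASEY_SC)} markers match: {hits}")
```

The same function works for the L226 fingerprint by building a second dictionary from the thresholds listed above.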

Please submit your orders to YSEQ directly and let everyone know that you are testing, so testing can be coordinated (I will track and post Z253 testing results for YSEQ). We really need to test all three FGC56XX YSNPs each time (at $35 each). FGC498 is more questionable but more interesting. It is the only low-numbered FGC YSNP of the four; low-numbered FGC YSNPs are usually broadly shared YSNPs or ones where an earlier tester also tested positive. It was listed as private since no 1000 Genomes tester tested positive (the Z253/PF825 FG test did not test positive for FGC498 either):

http://shop.yseq.net/index.php?cPath=1

Once we test these YSNPs on several submissions and see how these work out, we can then investigate adding three more YSNPs. I will go ahead and determine the next three testing candidates and request FTDNA and YSEQ to add those. FGC164 was also added, but we now know that this must be much earlier than L21 since so many 1000 Genomes testers have tested positive for this YSNP. All seven known L21 FG testers (that I have access to) and one P312 tester have tested positive for FGC164, so this YSNP is no longer of interest to L21 research - but it could be interesting to other researchers who are pre-P312.

RobertCasey
12-13-2013, 09:58 PM
Using Mark J's MRCAs for Z253 and L226 (Yahoo L21 link - Big Picture tab - Early Expansion rows), Z253 is around 2,200 years old and L226 is around 1,200 years old. I have 43 "private" high quality mutations, and since we have another Z253/PF825 submission that does not share any of these YSNPs, all of these "private" YSNPs have to be post Z253. This means 55 % should be, on average, post L226 (1,200/2,200) and 45 % should be between L226 and Z253. Mike W's current L21 graphic has around 1/3 of the YSNPs assigned to his descendant chart, 1/3 private (under research) and another 1/3 being duplicates, unstable and recurrent mutations (found in the penalty box of Dave R's summaries). Therefore, out of the 43 novel YSNPs, probably one third will be found not to be branch defining and should be filtered out. This means around 29 YSNPs of my L226 FG test should be good solid branch defining YSNPs.

This means that 16 (29 x 55 %) YSNPs should be good branch defining ISOGG YSNPs under L226. This is one mutation per 75 years, which is close to the 90 years per mutation figure floating around. This means each L226 tester has a 37 % chance (16/43) per YSNP tested of testing positive for a good branch defining YSNP under L226 (both ISOGG qualifying and genealogical). By testing the four FGC YSNPs available via YSEQ, any L226 tester that tests all four should, on average, test positive for 1.5 (37 % x 4) good branch defining YSNPs under L226, test positive for 1.2 ((29 x 45 % / 43) x 4) branch defining YSNPs between Z253 and L226, and should expect 1.3 YSNPs to be duplicates, multiple recurrent, unstable, X cross-over, etc. So for only $140, an average L226 researcher will find a new YSNP under L226 (this requires Z253* and Z2534* tests as well to find the positives that are pre-L226).
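For anyone who wants to re-run these estimates with different ages or counts, the arithmetic can be sketched in Python as below. All inputs are the figures quoted in this post (43 novel YSNPs, one third filtered out, ages of 2,200 and 1,200 years, four YSNPs at $35 each); the variable names are made up for the example and rounding to whole YSNPs follows what I did by hand.

```python
novel_ysnps = 43          # "private" high quality YSNPs in my FG test
z253_age = 2200           # years, Mark J's estimate
l226_age = 1200           # years, Mark J's estimate
tests_ordered = 4         # FGC YSNPs available at YSEQ
price_per_test = 35       # dollars

# About one third are expected to be duplicates, unstable or recurrent
branch_defining = round(novel_ysnps * 2 / 3)

# Split the branch defining YSNPs by time: post L226 vs. Z253-to-L226
under_l226 = round(branch_defining * l226_age / z253_age)
between = branch_defining - under_l226
years_per_mutation = l226_age / under_l226

# Expected outcome when testing four of the novel YSNPs
expected_under = under_l226 / novel_ysnps * tests_ordered
expected_between = between / novel_ysnps * tests_ordered
expected_filtered = tests_ordered - expected_under - expected_between
total_cost = tests_ordered * price_per_test

print(f"branch defining: {branch_defining}, under L226: {under_l226}")
print(f"years per mutation under L226: {years_per_mutation:.0f}")
print(f"expected under L226: {expected_under:.1f}")
print(f"expected between Z253 and L226: {expected_between:.1f}")
print(f"expected filtered out: {expected_filtered:.1f}")
print(f"total cost: ${total_cost}")
```

These are averages only; any one tester could see anywhere from zero to four positives.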

Z253* and Z2534* testers will have slightly lower odds due to being 45 % of the time frame vs. 55 % of the time frame. This is only a $140 commitment to discover a new L226 YSNP or a new YSNP in between Z253 and L226. Unfortunately, more testing will be required to get it ISOGG qualified - business as usual there - but nowhere near the pain that Mark J will endure for all the DF13 sons. What a bargain to get a new terminal YSNP. L226 people, Z2534* and Z253*, now is the time to test. We also need to add four PF825 YSNPs to discover more branches under PF825. PF825 has a very clean testing candidate curve and looks to be 1,000 to 1,500 years old since there is no evidence of convergence. So testing any four YSNPs should find another YSNP below PF825 or one between PF825 and Z253. WTYs were last priced at $950 for discovering less than one YSNP. This testing can be done now - no need to wait for three or four future L226 Big Y test results. Let's get the ball rolling. Mark J is the only one to date racing for the first FGC qualified ISOGG YSNP under L21. L226 and Z2534 have many fewer sons to test, so let's make L226, PF825, Z2534* and Z253* deliver some new YSNPs as soon as possible. Go Z253 team !!

RobertCasey
12-13-2013, 10:04 PM
Update on FGC498. Greg M posted in the L21 Yahoo forum and confirmed that FGC498 must be recurrent, as another non-L21 submission has tested positive in the 1000 Genomes project. But only one other person out of over one thousand has tested positive, so he thinks this would be a very stable YSNP to test (we should test first so that we can get FGC498.1 on the ISOGG haplotree first). So everyone should test all four FGC YSNPs available for testing from YSEQ. We could have many more, but I do not want to waste Thomas K's time if nobody is going to order the ones that we already have ready to test.