
Thread: Big Y - different reads for hg19 and hg38

  1. #1
    Registered Users
    Posts
    282
    Sex
    Location
    Praha, Czech Republic
    Ethnicity
    Czech
    Nationality
    Czech
    Y-DNA
    R-Y14088
    mtDNA
    J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia

    Question Big Y - different reads for hg19 and hg38

    Does anybody have an idea what is behind the big differences in read counts between hg19 and hg38 at some positions?

    One example: position 22463781 (hg19) = 20301895 (hg38)

    Kit # 195364 hg19: 24T, 97G - hg38: 12G
    Kit # 348782 hg19: 25T, 70G - hg38: 13G

    Where did those 109 and 82 reads, respectively, disappear to?

    And where does this difference originate? In the FASTQ -> BAM process, or in the processing of the BAM file?

    If the hg19 BAM were converted directly to an hg38 BAM, would the result be the same?

    FGC Y Elite 1.0 shows 3A 116T 553G at the same position in hg19, and WGS 15x shows 5T 6G in hg19. It would be interesting to see the result in hg38; I have already ordered the analysis, but it will take some time.
    Y-DNA: R-Y14088 (ISOGG: R1b1a1a2a1a2b1c2b1a1a)
    mtDNA: J1c1i (J1c1 + 7735G and 8848C) Extras: 198T 12007A 16422C 16431A

  2. The Following 3 Users Say Thank You to Petr For This Useful Post:

     Celt_?? (11-05-2017), gotten (11-04-2017), Robert1 (11-04-2017)

  3. #2
    Registered Users
    Posts
    87
    Sex
    Omitted

    The position you cite is probably in a palindromic region (most positions starting with 224 are). In theory both arms of the palindrome are identical, so reads can get mapped to two positions (one on the forward strand, one on the reverse). If you BLAST the sequence around the position, you should find the corresponding position on the second arm of the palindrome. But sometimes a SNP occurs on only one of the arms. Because palindrome arms can recombine, such a SNP can be either erased or copied to the second arm, but generally it will remain as it is. The hg19 (GRCh37) results reflect this: reads from both arms get mapped to the same position, hence both T and G, with a preference for G (this might be due to strand bias).
    What is surprising, however, is that hg38 shows just G, and with far fewer reads. While it may well be that G is the correct call, I do not understand how the switch to hg38 would make for more accurate mapping in the palindromic area. Alternatively, it's possible that they also upgraded their mapping tools, producing better results. Of course, there is still the problem of the disappearing G reads.
    I tried to BLAST the sequence at that position to find the corresponding position on the other arm of the palindrome (if it is in fact in a palindrome), but NCBI's BLAST tool doesn't find anything (it should at least find the original position). At first glance the region doesn't seem too repetitive. I'll try to find the second position later. It would be interesting if FTDNA now handles palindromes better.
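As a rough illustration of the palindrome idea above: one arm should read as the reverse complement of the other, so an arm-specific SNP shows up as a single mismatch when the two are compared. This is only a sketch on made-up toy sequences (not real Y-chromosome data), and the function names are hypothetical.

```python
# Toy sketch: compare one palindrome arm with the reverse complement of the other.
# The sequences below are invented for illustration only.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def arm_mismatches(arm_a, arm_b):
    """Return 0-based offsets where arm_a differs from the reverse complement of arm_b."""
    rc_b = reverse_complement(arm_b)
    if len(arm_a) != len(rc_b):
        raise ValueError("arms must have the same length")
    return [i for i, (x, y) in enumerate(zip(arm_a, rc_b)) if x != y]


original_arm = "GATTACAGG"
other_arm = reverse_complement(original_arm)  # a perfect second arm
mutated_arm = "GATGACAGG"                     # same arm with a T->G change at offset 3

print(arm_mismatches(original_arm, other_arm))  # no differences between perfect arms
print(arm_mismatches(mutated_arm, other_arm))   # the arm-specific change stands out
```

Reads from both arms piling up on one reference position would then show both alleles there, which is the mixed T/G pattern described above.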

  4. The Following 3 Users Say Thank You to rafc For This Useful Post:

     Celt_?? (11-05-2017), Petr (11-04-2017), Robert1 (11-04-2017)

  5. #3
    Registered Users
    Posts
    102
    Sex
    Location
    Wisconsin, USA
    Nationality
    American
    Y-DNA
    R1b-FGC29071
    mtDNA
    U5a1b1g*

    Ireland England Netherlands Germany France
    Reads are placed onto a reference using edit distances. Modifications to the reference can cause reads to shift around substantially, if there are new areas where the 100 bases fit better.

    To see what is going on spot check some of the individual read sequences through a BLAT search using GRCh38/hg38: http://genome.ucsc.edu/cgi-bin/hgBlat
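To make the edit-distance point concrete, here is a minimal sketch of how a read's best placement can move when the reference gains a better-fitting segment. It is not how any production aligner works internally (those use indexes and scoring schemes); the sequences are toy data and the helper names are hypothetical.

```python
# Toy sketch: place a read at the reference window with the lowest edit distance,
# then show the placement moving once the reference contains a better match.

def edit_distance(a, b):
    """Classic Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def best_placement(read, reference):
    """Return (distance, offset) of the best read-length window; ties go to the lowest offset."""
    n = len(read)
    return min((edit_distance(read, reference[i:i + n]), i)
               for i in range(len(reference) - n + 1))


read = "ACGTACGGTC"
old_ref = "TTTT" + "ACGTACGATC" + "TTTT"   # closest window has one mismatch
new_ref = old_ref + "ACGTACGGTC" + "TT"    # revised reference adds an exact match

print(best_placement(read, old_ref))
print(best_placement(read, new_ref))
```

In the old toy reference the read lands at offset 4 with one mismatch; in the revised one it moves to the new exact match at offset 18, leaving the old position with one read fewer.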

  6. #4
    Registered Users
    Posts
    282
    Sex
    Location
    Praha, Czech Republic
    Ethnicity
    Czech
    Nationality
    Czech
    Y-DNA
    R-Y14088
    mtDNA
    J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia
    Thank you. Unfortunately I have no idea how to use BLAT. How can I obtain the sequence to test?

  7. #5
    Registered Users
    Posts
    102
    Sex
    Location
    Wisconsin, USA
    Nationality
    American
    Y-DNA
    R1b-FGC29071
    mtDNA
    U5a1b1g*

    Ireland England Netherlands Germany France
    I use IGV when actually looking at BAMs, so these instructions are specific to that tool.

    1) Right click a read to reveal the context menu
    2) Select 'copy read sequence'
    3) Visit the URL in my previous post and paste the string into the text box
    4) Ensure you have GRCh38/hg38 selected and submit the search.

    The results will tell you which segments are most similar to the read, and aligners will typically concur with the top result from BLAT. You want to see whether the read is being placed somewhere completely different, or whether there are several regions of high similarity.

    I should note IGV actually has a menu item that makes that process one step when your reads are aligned to the same reference. It won't help here since the BAM you have access to is hg19.
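If you have many reads to check, the copy-and-paste steps above could be turned into a link per read. The sketch below only builds the URL and does not contact the server. The query parameter names (`userSeq`, `type`, `db`, `output`) are my assumption of what UCSC documents for programmatic BLAT access, so verify them against the UCSC help pages before relying on this; the function name is hypothetical.

```python
# Sketch: build a BLAT search link for a read sequence copied from IGV.
# Assumption: hgBlat accepts userSeq/type/db/output as query parameters.
from urllib.parse import urlencode

BLAT_URL = "https://genome.ucsc.edu/cgi-bin/hgBlat"


def blat_query_url(read_sequence, db="hg38"):
    """Return a BLAT URL for the given read; whitespace is stripped from the sequence."""
    seq = "".join(read_sequence.split()).upper()
    if not seq or set(seq) - set("ACGTN"):
        raise ValueError("expected a non-empty DNA sequence")
    params = {"userSeq": seq, "type": "DNA", "db": db, "output": "json"}
    return BLAT_URL + "?" + urlencode(params)


# Placeholder sequence; paste a real read from IGV instead.
print(blat_query_url("acgt acgt\n"))
```

UCSC limits automated use of its servers, so this is meant for spot checks of a handful of reads, not bulk queries.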

  8. The Following 4 Users Say Thank You to JamesKane For This Useful Post:

     gotten (11-05-2017), Muircheartaigh (11-04-2017), Petr (11-05-2017), Robert1 (11-05-2017)

  9. #6
    Registered Users
    Posts
    87
    Sex
    Omitted

    Nevermind
    Last edited by rafc; 11-05-2017 at 08:51 AM.

  10. #7
    Registered Users
    Posts
    87
    Sex
    Omitted

    On second look, the location you cite is in the DYZ19 area, which is known to be a very complex and highly variable region. It seems there are some differences in DYZ19 between hg19 (GRCh37) and hg38, so that might explain the new results. Let's hope it's an improvement.

  11. #8
    Registered Users
    Posts
    282
    Sex
    Location
    Praha, Czech Republic
    Ethnicity
    Czech
    Nationality
    Czech
    Y-DNA
    R-Y14088
    mtDNA
    J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia
    I have checked 8 kits I manage and FTDNA found the following new variants in the DYZ19 area:

    hg38 position: hg19 reads -- hg38 reads
    20067651: 199A 17T -- 13T
    20067680: 181A 20C -- 9C
    20067682: 182A 21T -- 9T
    20067692: 175C 20G -- 9G
    20076829: 45A 3T -- 23A
    20101608: 26G -- 21G
    20101641: 29T -- 12T
    20101641: 34T -- 16T
    20111610: 23A 1T -- 16A
    20125974: 49C 57G -- 48C 8G
    20136397: 94T 71G -- 61T 3G
    20136526: 86A 44G -- 25G
    20136543: 71A 28C -- 10C
    20136544: 70A 28T -- 10T
    20136545: 28T 70G -- 10T
    20136552: 62A 28G -- 10G
    20147942: 63A 83C -- 34A 45C
    20278796: 1A 1C 79G -- 1C 34G
    20284034: 126A 8G -- 71A
    20284466: 49T 122C -- 37T 4C
    20296178: 83A 38C -- 11A
    20296405: 54C -- 54C
    20296736: 7A 115T -- 73T
    20296965: 145T 2C 122G -- 139T 7G
    20300432: 86A 19T -- 82A
    20301875: 43A 76T -- 35T
    20301895: 25T 70G -- 13G
    20301895: 24T 97G -- 12G
    20302968: 70G -- 40G
    20305178: 100A 119C -- 64A 1C
    20308040: 22A 33G -- 12A
    20308041: 22T 33C -- 12T
    20308044: 22T 33C -- 12T
    20308053: 32A 31G -- 16A
    20311965: 8A 83T -- 74T
    20314557: 25A 48T -- 12A
    20314561: 48T 25G -- 12G
    20314765: 66A 1T 94C 6G -- 5A 72C
    20314848: 69G -- 9G
    20316024: 134A 26C -- 133A
    20324088: 51T 167C -- 20C
    20324129: 109A 45G -- 9A
    20325095: 6A 15T 126C 434G 3DEL -- 16C
    20344583: 11A 1T -- 10A
    20345643: 59T 10G -- 18T
    20345643: 77T 11G -- 35T 1G
    20346468: 8A 101C 386G -- 52C
    20348281: 96A 595G -- 32A 2G
    20348281: 118A 1T 589G -- 33A
    20348281: 84A 434G -- 16A 1G
    20349897: 155T 217G -- 59T 2G

    So the difference is really big.
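To put numbers on the drop, the rows of the list above can be parsed and the share of reads that survive in hg38 computed. This is a small sketch that assumes the exact row layout used in this post (`position: hg19 counts -- hg38 counts`); the function names are hypothetical.

```python
# Sketch: parse rows like "20301895: 24T 97G -- 12G" and compute read retention.
import re

ROW = re.compile(r"^(\d+):\s*(.+?)\s*--\s*(.+)$")
COUNT = re.compile(r"(\d+)(DEL|[ACGT])")


def parse_counts(text):
    """Turn '24T 97G' into {'T': 24, 'G': 97}."""
    return {allele: int(n) for n, allele in COUNT.findall(text)}


def retention(row):
    """Return (position, hg19 total, hg38 total, hg38/hg19 ratio) for one row."""
    match = ROW.match(row.strip())
    if match is None:
        raise ValueError("unrecognised row: " + row)
    pos, hg19, hg38 = match.groups()
    before = sum(parse_counts(hg19).values())
    after = sum(parse_counts(hg38).values())
    return int(pos), before, after, after / before


rows = [
    "20301895: 24T 97G -- 12G",
    "20301895: 25T 70G -- 13G",
    "20296405: 54C -- 54C",
    "20325095: 6A 15T 126C 434G 3DEL -- 16C",
]
for row in rows:
    pos, before, after, ratio = retention(row)
    print(f"{pos}: {before} -> {after} reads ({ratio:.0%} kept, {before - after} gone)")
```

For the two kits from the first post this gives 121 -> 12 and 95 -> 13 reads, i.e. the 109 and 82 missing reads asked about there.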

    Now the question is what causes this change: hg38 itself, or a different alignment process used by FTDNA?

    It looks like some variants are specific to certain haplogroups: 20348281 G->A for R1a, 20345643 G->T for R-Z280.

  12. #9
    Registered Users
    Posts
    38
    Sex

    They are probably getting rid of reads with really bad mapping quality.

    Look in the VCF for the total number of reads and the number with mapping quality zero.

    Discarding ALL reads of mapping quality zero results in enormous increases in reliability.
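A toy sketch of what such a filter does to the allele counts at one position. The reads and their mapping qualities below are invented purely for illustration; nothing in this thread shows the actual mapping qualities of the Big Y reads.

```python
# Toy sketch: count alleles at one position before and after dropping MAPQ 0 reads.
# Each read is (base at the position, mapping quality); all values are made up.
from collections import Counter


def allele_counts(reads, min_mapq=0):
    """Count bases over reads whose mapping quality is at least min_mapq."""
    return Counter(base for base, mapq in reads if mapq >= min_mapq)


reads = [("G", 60)] * 3 + [("G", 0)] * 5 + [("T", 0)] * 2

print(dict(allele_counts(reads)))              # every read counted
print(dict(allele_counts(reads, min_mapq=1)))  # MAPQ 0 reads discarded
```

A mapping quality of zero usually means the aligner found the read fits equally well in more than one place, which is exactly the situation in repetitive regions.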

  13. The Following 2 Users Say Thank You to dtvmcdonald For This Useful Post:

     Mikewww (11-14-2017), Petr (11-14-2017)

  14. #10
    Registered Users
    Posts
    282
    Sex
    Location
    Praha, Czech Republic
    Ethnicity
    Czech
    Nationality
    Czech
    Y-DNA
    R-Y14088
    mtDNA
    J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia
    Do you mean
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
    ?


    20301895: 24T 97G -- 12G -- DP=12 -- MQ0=0

    chrY 20301895 . T G 69.9194 PASS BQ=25.343;GC=0.458725;HL=2;HR=2;IndelCnt=0;MQ=46.4013;MQ0=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 1/1:0,12:12:19:194,194,0:1:0.25:25:0,4:0,1:0,95.3333:0,9:0,11.9645
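For anyone who wants to pull these fields out of many records, here is a minimal sketch that splits the record above into its INFO and per-sample values. It assumes the stray spaces inside long tokens (as in the MQ value) were inserted by the forum software and removes them; it handles only a single-sample record like this one and is not a general VCF parser. The function name is hypothetical.

```python
# Sketch: extract MQ0, DP and allelic depths (AD) from the single-sample record above.
# Assumption: spaces inside long tokens are forum line-wrapping damage.

RECORD = (
    "chrY 20301895 . T G 69.9194 PASS "
    "BQ=25.343;GC=0.458725;HL=2;HR=2;IndelCnt=0;MQ=46.4013;MQ0=0;MismatchCnt=0 "
    "GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS "
    "1/1:0,12:12:19:194,194,0:1:0.25:25:0,4:0,1:0,95.3333:0,9:0,11.9645"
)


def parse_vcf_record(line):
    """Split one whitespace-delimited VCF data line into a small dictionary."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 columns")
    chrom, pos, _id, ref, alt, _qual, _filter, info, fmt, sample = fields[:10]
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "info": info_map, "sample": sample_map}


rec = parse_vcf_record(RECORD)
ref_depth, alt_depth = (int(n) for n in rec["sample"]["AD"].split(","))
print(rec["pos"], "MQ0 =", rec["info"]["MQ0"], "DP =", rec["sample"]["DP"],
      "ref reads =", ref_depth, "alt reads =", alt_depth)
```

Note that these fields describe only the reads that survived into the hg38 call, so on their own they cannot show what happened to the reads seen in hg19.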
