Program to find matches Y-DNA



SSlava
08-09-2016, 07:15 PM
Would a program like this be useful to you?)) I have been learning programming and developed one))

It can build trees from a given set of haplotypes, which can also be downloaded very quickly from the FTDNA website.

http://s018.radikal.ru/i504/1511/c8/eb8da8269872t.jpg (http://radikal.ru/fp/6a899f760f55457789b734f1152da708)

http://s017.radikal.ru/i402/1512/61/9695d35600cct.jpg (http://radikal.ru/fp/c1162c2103ad457da9524a60fb3e5079)

SSlava
08-09-2016, 08:56 PM
http://s014.radikal.ru/i329/1512/35/443fa0cc6e3a.jpg

SSlava
08-09-2016, 08:57 PM
The program can build a tree from several thousand haplotypes. It takes about five minutes))

SSlava
08-09-2016, 09:02 PM
Matches can be searched, for example, in the database I have collected. It holds about 50 thousand haplotypes or more; I don't remember exactly)) The process of adding haplotypes is automated))

AJL
08-13-2016, 09:48 PM
Thread moved from General to yDNA-Other subforum with a one-week expiring redirect.

ChrisR
10-14-2016, 11:53 PM
Genosearch 0.1 Beta
Do I understand correctly that this tool searches a database of STR haplotypes for the kits nearest by GD (genetic distance) to a given haplotype?
Is this program available in English and for public beta testing?
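
The kind of search described in this question can be sketched in a few lines of Python. This is only an illustration under assumptions, not the program's actual method: genetic distance is taken here as the sum of absolute repeat differences over the markers two haplotypes share (a plain stepwise model), and the kit numbers and values are made up.

```python
def genetic_distance(a, b):
    """Sum of absolute repeat differences over the markers both lists cover.

    Assumption: a plain stepwise model. Real tools may treat multi-copy or
    fast-mutating markers differently.
    """
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_kits(query, database, max_gd):
    """Return (kit_id, gd) pairs with gd <= max_gd, closest first."""
    hits = [(kit, genetic_distance(query, hap)) for kit, hap in database.items()]
    return sorted((h for h in hits if h[1] <= max_gd), key=lambda h: h[1])


# Hypothetical 12-marker haplotypes keyed by made-up kit numbers.
database = {
    "K001": [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 29],
    "K002": [13, 24, 14, 10, 11, 14, 12, 12, 12, 13, 13, 31],
    "K003": [14, 22, 15, 10, 13, 14, 11, 14, 11, 12, 11, 28],
}
query = [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 30]

print(nearest_kits(query, database, max_gd=3))
```

With a limit of 3, only the two close kits are returned; the third kit differs at almost every marker and is filtered out.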

RobertCasey
10-15-2016, 05:08 AM
Note: AVG warned me that the Russian pop-up advertisement on the image links contains viruses. Be careful if you do not have a good antivirus installed.

Anything that tries to chart everything just does not work with YSTRs and YSNPs. There is simply not enough genetic data to create accurate charts. Anything based on mathematical models from academic circles does not work either. What does work is charting based on YSTR signature recognition of submissions that have been YSNP tested. Chart building on older branches will not work either, as YSTR markers are not reliable beyond a timeframe of 1,200 to 2,500 years. What can be done is partial charting of predictable YSNPs, and this can chart two or three times more than what is fully YSNP tested. But the last ten or twenty percent will be very difficult, since I am finding that 10 to 20 percent of the testers under a predicted YSNP can only be charted with extensive YSNP testing, due to a lack of divergence among them.

For example, I am finding 90% errors for YSTR matches at 1 to 3 GD (omitting CDY markers). These are cases where the YSNPs are over 1,000 years old, yet the testers still match at genetic distances of 1 to 3 at 67 markers, and this holds for a significant part of R-L226. Here is my latest manual analysis of R-L226:

http://www.rcasey.net/DNA/R_L226/Haplotrees/L226_Home.pdf

This process could easily be automated via coding. There is another tool, SAPP, that I beta tested. It is still useful, but it uses neighbor joining algorithms, which just do not represent how YDNA actually works. It gets pretty close when creating clusters but does not join the clusters in any reliable manner. It also has significant performance issues and only allows around 100 to 150 submissions before the tool times out.

http://www.jdvtools.com/SAPP/

https://www.dropbox.com/s/jkenv7o2nh5h1ak/SAPP%20-%20Vector%20Angle%20Calculations.pdf?dl=0

ChrisR
10-16-2016, 07:46 PM
Thanks RobertCasey. See also my post in "Cladogram from STR Values" (http://www.anthrogenica.com/showthread.php?4590-Cladogram-from-STR-Values&p=192829&viewfull=1#post192829).

SSlava
10-18-2016, 11:58 AM
Sorry, I forgot about the program))

I can put together a version in a few days and post it.

To be honest, I do not know how good the resulting trees are, but the result did not seem so bad.

SSlava
10-19-2016, 11:36 PM
I have built a version.

SSlava
10-19-2016, 11:44 PM
https://drive.google.com/file/d/0BwsW0x6OzYExbjNXRFE2V3hZV0k/view

Here is roughly how it works. For example, enter the following line:

11 24 14 10 11-14 12 12 12 13 13 30 18 9-10 11 11 25 15 19 30 15-15-17-17 11 12 19-24 15 15 18 17 36-37 12 12 12 9 15-16 8 10 10 8 10 10 12 22-23 17 10 12 12 16 8 11 22 19 13 12 11 13 11 11 12 12

Choose a distance of 10 mutations.

This gives the following result:
http://www.picshare.ru/uploads/161020/493bR3hQOO_thumb.jpg (http://www.picshare.ru/view/7708734/)
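
In the input line above, values are separated by spaces and some values hold several numbers joined by dashes. A minimal Python sketch of how such a line could be split up follows. It is an assumption about the format based only on the example in this post, not the program's own parser, and it assumes no values are missing.

```python
def parse_haplotype(line):
    """Split a pasted haplotype line into one list of integers per value.

    Dash-joined values (e.g. "11-14") become lists with several entries;
    plain values become one-element lists. Assumes no missing values.
    """
    markers = []
    for token in line.split():
        markers.append([int(v) for v in token.split("-")])
    return markers


# The first 11 values of the example line above.
line = "11 24 14 10 11-14 12 12 12 13 13 30"
markers = parse_haplotype(line)
alleles = [v for m in markers for v in m]  # flattened, one entry per number

print(markers[4])    # the dash-joined value
print(len(alleles))  # count of individual numbers
```

Keeping the dash-joined groups together at this stage makes it possible to decide later whether to compare them as a unit or number by number.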

SSlava
10-20-2016, 12:50 PM
Here is roughly how it works:

https://drive.google.com/file/d/0BwsW0x6OzYExbjNXRFE2V3hZV0k/view

For example, enter the following line (you can copy it in this format directly from the FTDNA site):

11 24 14 10 11-14 12 12 12 13 13 30 18 9-10 11 11 25 15 19 30 15-15-17-17 11 12 19-24 15 15 18 17 36-37 12 12 12 9 15-16 8 10 10 8 10 10 12 22-23 17 10 12 12 16 8 11 22 19 13 12 11 13 11 11 12 12

http://www.picshare.ru/uploads/161020/493bR3hQOO_thumb.jpg (http://www.picshare.ru/view/7708734/)

Searching with a distance of 10 mutations gives the result shown above.

You can then build a tree from this result, for example:

http://www.picshare.ru/uploads/161020/DP12GnLR7Z_thumb.jpg (http://www.picshare.ru/view/7710148/)

Right-click and select MRCA.

http://www.picshare.ru/uploads/161020/G1QSazcJ7h_thumb.jpg (http://www.picshare.ru/view/7710182/)
With roughly these settings, you get two files: a file with the distance matrix and the tree file.
The tree file should be opened in the MEGA program:

http://www.megasoftware.net/

This is the result if you select a pie chart and save to PDF:

https://drive.google.com/open?id=0BwsW0x6OzYExQmR2OUpIemk3T1k

ChrisR
12-08-2016, 05:41 PM
...
Sorry for the late reply. Compliments: YSTRSearcher 0.7.5 Beta already works very well and is useful! :thumb:

Hoping this is useful, here is some feedback and a few improvement wishes:
Date and sources of the included database: it would be very helpful to know when the newest data in the database was added and what the sources are (projects, maybe as an additional hidden row).
Import function: this could maybe be optimized by allowing the import of FTDNA GAP Download File Y-DNA Results Classic CSV etc. Even better would be a second "personal-local" database which can be linked, so that for duplicated IDs (Kit_Number) only the haplotype from the "personal-local" database is listed. This would work very well combined with a continuously updated online database.
When entering Y-STR haplotypes with some values missing (like old Ancestry Y-STR transfers or study results converted to the FTDNA format), the missing values are not filled in correctly as empty columns.
Dist. markers: it would be nice if the software had a percentage preset (like 15%) and allowed changing the preset to either a numerical value or a percentage.
Kit number filter and searching for matches: simply entering a kit number and selecting it as the base for matches would speed up usage considerably.
Automatic "down-filtering" of a haplotype from Y111 to Y67 to Y37 etc.: it would be very nice if, after entering a Y111 haplotype, it were possible to use the same haplotype with only Y67, Y37, etc.
Quick copy & paste of selected rows/columns to a spreadsheet, or export to file: selecting whole sets of rows and copying them is possible, but not certain subsets (not all rows, for example without the marker values).
When entering a Y25 FTDNA haplotype, the DYS464 values (last column) seem to cause the error "An error in parsing a query". Leaving these values out results in a working query.
If the GD is too high for the selected markers, the software is blocked for a long time. Maybe a confirmation window when the GD is above 30% would help to avoid doing this by mistake?
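
Two of the wishes above, the percentage preset for the distance and the down-filtering from Y111 to smaller panels, could look roughly like this in Python. This is a sketch, not YSTRSearcher code. It assumes the haplotype is already a flat list of values in FTDNA order and that the smaller panels are simply the first 67, 37, 25, or 12 values of that list; the function names are hypothetical.

```python
import math

PANEL_SIZES = (111, 67, 37, 25, 12)  # assumed FTDNA panel sizes


def down_filter(alleles, panel):
    """Keep only the first `panel` values of a longer haplotype."""
    if panel not in PANEL_SIZES:
        raise ValueError(f"unknown panel size: {panel}")
    if len(alleles) < panel:
        raise ValueError("haplotype is shorter than the requested panel")
    return alleles[:panel]


def max_distance(n_markers, limit):
    """Turn a preset into an absolute distance.

    `limit` is either a percentage string such as "15%" or a plain number
    of mutations such as 10.
    """
    if isinstance(limit, str) and limit.endswith("%"):
        return math.floor(n_markers * float(limit[:-1]) / 100)
    return int(limit)


y67 = list(range(67))  # placeholder values, not real data
y37 = down_filter(y67, 37)
print(len(y37), max_distance(len(y37), "15%"), max_distance(len(y37), 10))
```

A percentage scales the limit with the panel size, so the same preset stays sensible when switching between Y37 and Y67.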

I'm happy to help in any way I can to improve this tool and maybe make it a core instrument for quick haplotype match searches (not possible with FTDNA project GAP, ysearch, semargl.me, etc.).

RobertCasey
12-08-2016, 08:14 PM
Chris - for L21 and P312, you can use Mike W's spreadsheets that are posted. For U106, I think that group also keeps a pretty good copy of their YSTR submissions. Across the rest of the genome, I am not sure but it probably varies in coverage and accuracy. You should always go to your particular haplogroup project to see what is being collected by those project admins. I have recently expanded my coverage to all of haplogroup R - but it takes a lot of time to keep this information up to date and there is a lot of redundant overlap of collecting this data and major variations of completeness as well.

SSlava
01-04-2017, 04:03 PM
Thank you very much for the reply! Yes, I need to think about how to improve the program.


when entering Y-STR haplotypes with some values missing (like old Ancestry Y-STR transfers or study results converted to the FTDNA format) the missing values are not filled in correctly as empty columns.

Which haplotypes, exactly? Could you please give some examples?
I did try to support entering additional values; they are turned off in the options. I also tried to support searching in other formats, or letting the user customize the marker order, but this feature is not fully implemented yet.


Import function: this could maybe be optimized by allowing the import of FTDNA GAP Download File Y-DNA Results Classic CSV etc. Even better would be a second "personal-local" database which can be linked, so that for duplicated IDs (Kit_Number) only the haplotype from the "personal-local" database is listed. This would work very well combined with a continuously updated online database.


I did have this feature. But I am not sure; I may have accidentally put a filter on it)). I need to check.
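
The "personal-local" database idea quoted above amounts to a merge in which the local entry wins whenever a kit number appears in both sources. A minimal Python sketch, assuming both databases are dictionaries keyed by kit number (the names and values are made up, and the real program may store its data quite differently):

```python
def merge_databases(main_db, local_db):
    """Combine two {kit_number: haplotype} mappings.

    For a kit number present in both, only the haplotype from local_db
    is kept. Neither input is modified.
    """
    merged = dict(main_db)
    merged.update(local_db)  # local entries overwrite duplicates
    return merged


main_db = {"K001": [13, 24, 14], "K002": [13, 23, 14]}
local_db = {"K002": [13, 23, 15], "K900": [12, 22, 14]}

merged = merge_databases(main_db, local_db)
print(merged["K002"])  # the local version
print(sorted(merged))
```

Because the merge builds a new mapping, the online database can be refreshed independently without touching the local corrections.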

ChrisR
01-04-2017, 09:12 PM
Happy to hear you are interested in the improvement.
I have attached a TXT and a PDF file which are hopefully useful for you: 13450, 13451

SSlava
01-04-2017, 09:18 PM
So it should be possible to choose which values to include in the search?
A table where markers can be switched on or off? Although, to be honest, multi-allelic markers could not be separated.

SSlava
01-04-2017, 09:24 PM
There are simply many different ways of laying out the values besides the FTDNA one. The markers come in a different order, and different markers are missing.
It seems the problem can only be solved with such a table.

To be honest, I still need to think about how to make the import more convenient.

MfA
01-05-2017, 02:43 PM
Importing FTDNA haplotypes from a file would be beneficial to me. I'd like to create MRCA trees based on haplotypes I enter.

ChrisR
01-05-2017, 03:26 PM
The PDF output was created with Y-Utility by Dean McGee (2014) (http://www.mymcgee.com/tools/yutility111.html). See also the derived MODIFIED Y-Utility by Colin Ferguson (2015) (http://www.dna.cfsna.net/HAP/Modified_yUtility.htm).
It is very user friendly that Y-Utility accepts copy and paste from the FTDNA classic chart, even with different numbers of non-STR data columns.
I usually also convert other (non-FTDNA) Y-STR results to the FTDNA format and use them (like in the TXT file).
YSTRSearcher seems not to recognize multiple consecutive tab separators, so pasting haplotypes like those in the TXT file puts STR values into the wrong columns.
I think it is not necessary to support other STR marker formats for now; it would be enough to make YSTRSearcher recognize "empty" columns in pasted haplotypes.
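
The difference between collapsing and preserving tab separators is easy to show. A small Python sketch with a made-up row follows. YSTRSearcher's own parsing code is not shown in this thread, so this only illustrates the general pitfall.

```python
row = "13\t24\t\t11\t11-14"  # third value missing: two tabs in a row

# Splitting on any whitespace collapses the double tab, so every value
# after the gap shifts one column to the left.
collapsed = row.split()

# Splitting on the tab character keeps the gap as an empty string.
preserved = row.split("\t")
values = [v if v else None for v in preserved]  # None marks a missing value

print(collapsed)
print(values)
```

The second form keeps five columns, so each value stays aligned with its marker even when one is missing.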

SSlava
01-05-2017, 06:28 PM
It is not necessary to include other STR-Marker format styles for now I think, just to make YSTRSearcher able to see "empty" columns in pasted haplotypes.

Sorry, my English is poor)).
As I understand it, when adding a haplotype you want to see empty columns in place of missing values?
This feature is implemented in the program, so you can add them that way.
Without it, it would be impossible even to import haplotypes from the FTDNA website.

I will write a bit later which delimiters I used. I have already forgotten myself))

SSlava
01-05-2017, 06:33 PM
I'd like to create MRCA trees based on haplotypes I enter.

The program can create trees by means of the Neighbor Joining method.

But yes, unfortunately there is no way yet to add a haplotype directly in the table or to edit one.
The program is of course not finished, and some things in it may work incorrectly or be inconvenient to use. I will probably work on the program in the next few days.
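
For readers unfamiliar with the Neighbor Joining method mentioned above, here is a compact Python sketch of the textbook algorithm. It is not the program's code. The input is assumed to be a full pairwise distance table (the five labels and distances below are an arbitrary toy example, not genetic data), and the output is a Newick string, a common text format for trees.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return a Newick string for the tree built by Neighbor Joining.

    names: list of labels; dist: {frozenset({a, b}): distance} for all pairs.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all other current nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dab = d[frozenset((a, b))]
        la = dab / 2 + (r[a] - r[b]) / (2 * (n - 2))  # branch length to a
        lb = dab - la                                 # branch length to b
        new = f"({a}:{la:g},{b}:{lb:g})"
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    a, b = nodes
    half = d[frozenset((a, b))] / 2
    # The tree is unrooted; the last branch is split in half only for display.
    return f"({a}:{half:g},{b}:{half:g});"


raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(k): v for k, v in raw.items()}
tree = neighbor_joining(["a", "b", "c", "d", "e"], dist)
print(tree)
```

Ties in the Q criterion are broken here by taking the first pair found, so other implementations may draw the same unrooted tree with the branches listed in a different order.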

SSlava
01-05-2017, 06:34 PM
I need to write a user manual for the program in English.

SSlava
01-05-2017, 07:32 PM
Splitting by tabs should work, it seems. Where did you copy the haplotypes from?

For example:
444916 Yadger Shomon, b.~1840 in Separghan, Urmia, Persia Iran

Just do not forget that, if you add haplotypes from other sources, all multi-allelic markers and any additional non-standard marker values must be joined with dashes.
Here, for example, the Name column is missing.

http://www.picshare.ru/uploads/170105/p9mu0yW9J3_thumb.jpg (http://www.picshare.ru/view/7840078/)

13466


https://www.familytreedna.com/public/J2-M172?iframe=yresults

With the standard way of adding, all columns must be separated by a tab character.
Tab characters must also be present for empty columns, so that the missing values are accounted for.

Or have I misunderstood something else?))
And don't forget about the skipped first columns.

SSlava
01-05-2017, 07:36 PM
But I have found another bug. When this haplotype is entered in the search, the program crashes. I need to fix it.

ChrisR
01-07-2017, 02:03 AM
Do I understand correctly that you mean importing haplotypes? As far as I understand, copying and pasting them into the YSTRSearcher haplotype field is only possible without the descriptive columns.
When I copy and paste the STR value data of kit 444916 from the Classic Chart (https://www.familytreedna.com/public/J2-M172?iframe=yresults) on my Win10x64 system using Firefox or Chromium, YSTRSearcher 0.7.5 Beta does not seem to recognize anything:
copy and paste data (tabs in this quote may not be included):

12 22 14 10 14-17 11 15 11 13 11 29 16 11 11 27 20 30 13-13-15-15-16-17 10 10 19-22 21 12 13
YSTRSearcher 0.7.5 Beta after click on "Search" (just empty, no error or other message appears)
13506