July 8, 2016 at 1:57 pm
Very nice work, Alan. I think you explained the concept really well and used some very good examples. I have to tell you that it reminds me of a roll-your-own index to support leading wildcards in a LIKE operator.
I'm definitely looking forward to the rest of the series.
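For readers who have not seen the roll-your-own index idea, here is a minimal sketch of it. It is written in Python rather than T-SQL purely for illustration, and the names `ngrams`, `build_index`, `search_contains`, and the sample rows are all hypothetical, not taken from the article. The assumption is that every value is split into 3-character grams once, so a `LIKE '%term%'` style search can narrow the candidates through the gram lookup instead of scanning every row.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return every contiguous substring of length n, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(rows, n=3):
    """Map each n-gram to the set of row ids whose value contains it."""
    index = defaultdict(set)
    for row_id, value in rows.items():
        for gram in ngrams(value.lower(), n):
            index[gram].add(row_id)
    return index


def search_contains(rows, index, term, n=3):
    """Emulate LIKE '%term%', using the index to narrow candidates first."""
    term = term.lower()
    grams = ngrams(term, n)
    if not grams:
        # Term is shorter than n, so the index cannot help: scan everything.
        candidates = set(rows)
    else:
        # A matching row must contain every gram of the search term.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing all grams does not guarantee a match, so confirm each candidate.
    return sorted(r for r in candidates if term in rows[r].lower())


# Hypothetical sample data.
rows = {1: "wildcard", 2: "cardinality", 3: "discard", 4: "index"}
index = build_index(rows)
print(search_contains(rows, index, "card"))
```

The final confirmation step matters: the gram lookup only filters, and the real substring test runs on the much smaller candidate set.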
July 8, 2016 at 2:07 pm
Ed Wagner (7/8/2016)
Very nice work, Alan. I think you explained the concept really well and used some very good examples. I have to tell you that it reminds me of a roll-your-own index to support leading wildcards in a LIKE operator.
I'm definitely looking forward to the rest of the series.
Thanks a lot Ed!
-- Itzik Ben-Gan 2001
November 6, 2017 at 11:46 am
Alan - I know I'm a little late to the party (can't believe I'm finding this just now), excellent work sir!
I haven't had the opportunity to give it the attention it deserves... I plan on doing that (and reading the full series) this evening.
November 7, 2017 at 8:51 pm
Jason A. Long - Monday, November 6, 2017 11:46 AM
Alan - I know I'm a little late to the party (can't believe I'm finding this just now), excellent work sir!
I haven't had the opportunity to give it the attention it deserves... I plan on doing that (and reading the full series) this evening.
Thanks a lot Jason!
If you read Part 1 you're all caught up. Being a new dad has dug into my writing time but I'm trying my best to get the next couple installments out by the end of the year.
-- Itzik Ben-Gan 2001
November 7, 2017 at 9:47 pm
Alan.B - Tuesday, November 7, 2017 8:51 PM
Jason A. Long - Monday, November 6, 2017 11:46 AM
Alan - I know I'm a little late to the party (can't believe I'm finding this just now), excellent work sir!
I haven't had the opportunity to give it the attention it deserves... I plan on doing that (and reading the full series) this evening.
Thanks a lot Jason!
If you read Part 1 you're all caught up. Being a new dad has dug into my writing time but I'm trying my best to get the next couple installments out by the end of the year.
I had no idea... Congratulations on the new member of the family!
The kudos are well deserved my friend. Of course you also gave me a kick in the backside to go back and re-investigate a previously abandoned idea for the table-less working days function.
I figured out how to compress a full date (at least the ones in my working range) all the way down to BINARY(2) and bring it back w/o loss of information...
Then I'm doing something similar to the NGRAMs approach: using a cte_tally to slide along the concatenated binary and find the min & max row_number() values between the begin and end dates.
Of course, the tally has me back to square one with the Cartesian product with the outer table, but I'm still beating it down, one step at a time...
I need to reread this to see what, if anything, you're doing to deal with the tally/outer table thing...
In any case, great work!
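A rough sketch of how an approach like this could look, in Python rather than T-SQL. The post does not say how the date is compressed, so the encoding here is purely an assumption: each date is stored as a 2-byte day offset from a hypothetical `BASE_DATE`, which covers roughly 179 years. The function names and the Monday-to-Friday calendar (holidays ignored) are also made up for the example.

```python
from datetime import date, timedelta

# Assumed epoch; any date at or before the start of the working range would do.
BASE_DATE = date(2000, 1, 1)


def pack_date(d):
    """Encode a date as a 2-byte big-endian day offset from BASE_DATE."""
    offset = (d - BASE_DATE).days
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("date is outside the range that fits in 2 bytes")
    return offset.to_bytes(2, "big")


def unpack_date(b):
    """Decode 2 bytes back into the original date, with no loss."""
    return BASE_DATE + timedelta(days=int.from_bytes(b, "big"))


def working_days(start, end):
    """Yield the Monday-Friday dates in [start, end]; holidays are ignored."""
    d = start
    while d <= end:
        if d.weekday() < 5:
            yield d
        d += timedelta(days=1)


# One concatenated binary string holding every working day in the range.
calendar_blob = b"".join(
    pack_date(d) for d in working_days(date(2017, 11, 1), date(2017, 11, 30))
)


def count_working_days(blob, begin, end):
    """Slide along blob in 2-byte steps and count entries in [begin, end]."""
    lo, hi = pack_date(begin), pack_date(end)
    # Big-endian bytes sort in the same order as the dates they encode.
    positions = [
        i // 2 + 1  # 1-based position, playing the role of a row number
        for i in range(0, len(blob), 2)
        if lo <= blob[i:i + 2] <= hi
    ]
    return max(positions) - min(positions) + 1 if positions else 0


print(count_working_days(calendar_blob, date(2017, 11, 6), date(2017, 11, 10)))
```

The big-endian byte order is what lets the packed values be compared directly, without unpacking each one first.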
November 22, 2018 at 1:37 am
Hi,
There has, in fact, been some research on fast q-grams with SQL; see for instance
http://www.cs.columbia.edu/~gravano/Papers/2001/deb01.pdf
Lauri
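To give a flavour of the general idea, here is a minimal sketch of q-gram matching done with a plain SQL join. It is not the method from the linked paper. It uses Python's built-in sqlite3 module only so that it runs anywhere; the `grams` table, the `#`/`$` padding characters, the sample names, and the threshold of 5 shared q-grams are all assumptions made for this example.

```python
import sqlite3


def qgrams(s, q=3):
    """Pad the string on both sides, then return its distinct q-grams."""
    padded = "#" * (q - 1) + s.lower() + "$" * (q - 1)
    return sorted({padded[i:i + q] for i in range(len(padded) - q + 1)})


def candidate_pairs(names, q=3, min_shared=5):
    """Return (id_a, id_b, shared) for pairs sharing at least min_shared q-grams."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE grams (id INTEGER, gram TEXT)")
    con.executemany(
        "INSERT INTO grams VALUES (?, ?)",
        [(i, g) for i, s in names.items() for g in qgrams(s, q)],
    )
    # Self-join on equal q-grams; similar strings share many of them.
    rows = con.execute(
        """
        SELECT a.id, b.id, COUNT(*) AS shared
        FROM grams AS a
        JOIN grams AS b ON a.gram = b.gram AND a.id < b.id
        GROUP BY a.id, b.id
        HAVING COUNT(*) >= ?
        ORDER BY a.id, b.id
        """,
        (min_shared,),
    ).fetchall()
    con.close()
    return rows


# Hypothetical sample data with two near-duplicate pairs.
names = {1: "Johnson", 2: "Jonson", 3: "Miller", 4: "Millar"}
print(candidate_pairs(names))
```

The join only produces candidates; a real fuzzy match would still verify each pair with a proper distance function afterwards.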
February 11, 2019 at 12:32 pm
lauri.pietarinen - Thursday, November 22, 2018 1:47 AM
Thanks for posting this Lauri. I just left you a longer reply and my browser crashed. I just read both of these, very interesting stuff!
-- Itzik Ben-Gan 2001
February 14, 2019 at 3:19 pm
Hi Alan,
Thanks for reading them. It would have been nice to see your longer answer.
Microsoft has (had?) a research group on data cleaning https://www.microsoft.com/en-us/research/project/data-cleaning/
which resulted in the fuzzy-matching function in SSIS; it is also available as an add-on for Excel. If you look at the research papers, they reference the papers I mentioned in my post.
The SSIS function actually works quite well; I used it in a project to match customers by name and address. Used iteratively and interactively, it gives good results.
Lauri