January 29, 2014 at 4:12 am
Abu Dina (1/29/2014)
ChrisM@Work (1/28/2014)
Gail's suggestion using DENSE_RANK() is well worth a try when you get time.
Is it?
So what happens if I have two records with the same Organisationname but the address1 of the first record matches the address2 of the second?
Is it possible to have multiple PARTITION BYs using the same rank?
It's the method which is significant. As Gail pointed out, with no expected result set it's up to you to figure out the correct list of columns, and even the most appropriate function. You've mentioned PARTITION BY and I too think that ROW_NUMBER() could meet your requirements here. It's a completely different method to the self-join and will swap 20 gazillion index seeks for a huge spilling sort. One method will be cheaper than the other. I'm surprised you haven't investigated which might be the most appropriate in this case.
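To make the comparison concrete, here is a minimal sketch of the ROW_NUMBER() approach, assuming a hypothetical table dbo.Organisations and a shortened column list (extend the PARTITION BY to whichever columns define a duplicate in your data):

```sql
-- Hypothetical table and columns; adjust to your schema.
-- Rows are numbered within each group of identical name/address values;
-- anything with rn > 1 is a duplicate of the first row in its group.
WITH Numbered AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY Organisationname, Address1, Postcode
               ORDER BY Organisationname   -- tie-break order is arbitrary here
           ) AS rn
    FROM dbo.Organisations
)
SELECT *
FROM Numbered
WHERE rn > 1;   -- the candidate duplicate rows
```

Unlike the self-join, this reads the table once and sorts it, which is why one method can be much cheaper than the other depending on indexing and data volume.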
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 30, 2014 at 8:43 am
GilaMonster (1/27/2014)
Abu Dina (1/27/2014)
It's really about finding duplicate entries within the record set. A duplicate in this case is at site/business level.
Same company, same address = duplicate
No company name, but same address = duplicate
Order the result set by company name and the duplicates will seem obvious.
In that case it should be as simple as:
DENSE_RANK() OVER (Order By Organisationname, Address1, Address2, Address3, Address5, Town, County, Postcode, Country) AS SiteID
Same number assigned for companies with the same name and address
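A minimal sketch of how that plays out, again assuming a hypothetical dbo.Organisations table and a shortened column list for brevity: both rows for the same company at the same address get the same SiteID, so duplicates line up next to each other.

```sql
-- DENSE_RANK() assigns one SiteID per distinct name/address combination,
-- so exact duplicates share a SiteID and sort together.
SELECT Organisationname, Address1, Postcode,
       DENSE_RANK() OVER (
           ORDER BY Organisationname, Address1, Postcode
       ) AS SiteID
FROM dbo.Organisations
ORDER BY SiteID;
```

Note this only catches exact matches; a row where Address1 holds what another row keeps in Address2 will still get a different SiteID.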
Is Organisationname the same as Company?
Or is Company the Site that sent part of the data?
You have a lot of things that appear similar, which could be the same, but maybe not.
Fuzzy Lookup with a confidence rating is a different thing from an exact match.
And if this is information being sent by multiple sites daily, weekly, monthly, etc., what you intend to do with the duplicates might also be a question.
Is your intent to have them clean up what they are sending?
Or invent your own cleansing routine?