May 16, 2009 at 9:58 am
Thats for that. have run it and it does not seem to have helped. Without searching for name variations the query takes 17 seconds. I have let it run for three hours so far and there is no sign of any results being returned.
I had to make a small change to add source-name into the functions so now it looks like this
WITH cteSurname95 as (
Select Source_Name,Name_Variation
From Names_Database..Surname
Where Score >= 95
), cteForeName95 as (
Select Source_Name,Name_Variation
From Names_Database..Forename
Where Score >= 95
With out this there was a syntax error.
Is there any other information that would help?
May 16, 2009 at 10:26 am
sperry (5/16/2009)
Thats for that. have run it and it does not seem to have helped. Without searching for name variations the query takes 17 seconds. I have let it run for three hours so far and there is no sign of any results being returned....
Is there any other information that would help?
Thanks for the feedback Sperry, and sorry that it isn't fixed so far. Logically it is not much different that before, but oftentimes just cleaning up the syntax will help the optimizer, plus I had hoped that getting rid of the duplicate self-joins might fix it.
I am continuing to work on it. One thing that would help, would it be possible to get a copy of the execution plan? thnx.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 16, 2009 at 10:29 am
Oops, thought of something else...
sperry (5/16/2009)
Thats for that. have run it and it does not seem to have helped. Without searching for name variations the query takes 17 seconds. I have let it run for three hours so far and there is no sign of any results being returned.
How long did these two queries take for the original code? And what are you looking for as a "reasonable" time?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 16, 2009 at 10:41 am
sperry (5/16/2009)
...I had to make a small change to add source-name into the functions so now it looks like this
WITH cteSurname95 as (
Select Source_Name,Name_Variation
From Names_Database..Surname
Where Score >= 95
), cteForeName95 as (
Select Source_Name,Name_Variation
From Names_Database..Forename
Where Score >= 95
With out this there was a syntax error.
Is there any other information that would help?
Yeah, sorry about that. I really need to mock-up some dummy tables so that I can test for all of the errors ahead of time. Could you post the CREAT table DDL for the PERSON and PERSON_SUBSET tables, with keys and indexes? Thanks.
Oh also, can you tell me about what percentage of the Forname & Surname records have a [Score] of 95% or greater?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 16, 2009 at 5:06 pm
OK, I made my own table definitions and generated random data to make a more real performance test. After that, it became apparent that the problem was coming from the OR's that include the PERSON Sur/Fore names in with the name_variants for matching to PERSON_SUBSET. The optimiser seems to be doing really bad things because of that and though I could reduce its effect, I could not find a way to make it go away in a single query.
Therefore, I had to make some temp tables to hold the combined PERSON Sur/Fore names with the name_variants. This now performs extremely well on my system:
create proc spSurnameMatching_TempTAnswer
AS
--====== Create the 95% Forename table + all Person names
Create table #Forename95_U_P(
Source_Name varchar(20)
, Name_Variation varchar(20)
, Primary Key (Source_name, Name_Variation)
)
INSERT into #Forename95_U_P
Select Source_name, Name_variation
--From Names_Database..Forename
From Forename
Where Score >= 95
INSERT into #Forename95_U_P
Select distinct Forename, Forename
From PERSON P
Left Join #Forename95_U_P U
ON P.Forename = U.Source_name
And P.Forename = U.Name_variation
Where U.Source_name IS NULL
----====== Create the 95% Surname table + all Person names
Create table #Surname95_U_P(
Source_Name varchar(20)
, Name_Variation varchar(20)
, Primary Key (Source_name, Name_Variation)
)
INSERT into #Surname95_U_P
Select Source_name, Name_variation
--From Names_Database..Surname
From Surname
Where Score >= 95
INSERT into #Surname95_U_P
Select distinct Surname, Surname
From PERSON P
Left Join #Surname95_U_P U
ON P.Surname = U.Source_name
And P.Surname = U.Name_variation
Where U.Source_name IS NULL
--====== Find all Person-subsets that 95% match persons
;WITH ctePersToPersSubs as (
Select *
FROM PERSON P
INNER JOIN PERSON_SUBSET PER
ON PER.EnglishForename IN (
Select fn.Name_Variation
From #Forename95_U_P fn
Where fn.Source_Name = P.Forename
)
AND PER.EnglishSurname IN (
Select fn.Name_Variation
From #Surname95_U_P fn
Where fn.Source_Name = P.Surname
)
)
SELECT Distinct
P.ID,
P.PersonID,
1,
1
From ctePersToPersSubs P
WHERE P.Forename ''
AND P.Surname ''
--====== clean-up temp tables
drop table #Forename95_U_P
drop table #Surname95_U_P
--==================
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 16, 2009 at 5:09 pm
Here is the best result that I could get with a single query:
--create proc spSurnameMatching_CTEAnswer AS
--====== Find all Person-subsets that 95% match persons
;WITH cteSurname95 as (
Select Source_name, Name_Variation
--From Names_Database..Surname
From Surname
Where Score >= 95
UNION
Select Surname, Surname
From PERSON P
), cteForeName95 as (
Select Source_name, Name_Variation
--From Names_Database..Forename
From Forename
Where Score >= 95
UNION
Select Forename, Forename
From PERSON P
), ctePersToPersSubs as (
Select *
FROM PERSON P
--INNER JOIN C6..PERSON_SUBSET PER
INNER JOIN PERSON_SUBSET PER
ON PER.EnglishForename IN (
Select fn.Name_Variation
From cteForeName95 fn
Where fn.Source_Name = P.Forename
)
AND PER.EnglishSurname IN (
Select fn.Name_Variation
From cteSurname95 fn
Where fn.Source_Name = P.Surname
)
)
SELECT Distinct
P.ID,
--PER.PersonID,
P.PersonID,
1,
1
From ctePersToPersSubs P
WHERE P.Forename ''
AND P.Surname ''
Although it is almost 80x faster than both the original query and my first attempted fix here, it is still about 50x slower than the temp table version above.
Let me know how these work for you.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 17, 2009 at 7:36 am
Thank you SO MUCH for all of your hard work and over a weekend too. I am away from my computer until monday but will start testing it first thing and get back to you with answers to all of your questions if they are still needed.
Once again, thank you so much. Now go enjoy the rest of your weekend!!!
May 17, 2009 at 9:01 am
Thanks, sperry, be sure to let us know how it works out.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 18, 2009 at 3:17 am
Hi RBarryYoung.
Thanks so much for your hard work. I ran the script (the one that uses tmp tables) this morning and results were returned in just over 20mins. Since this will run as an overnight job, 20 mins is fine.
I just want to clarify one part of the syntax to make sure i understand it all correctly.
;WITH ctePersToPersSubs as
I have not see the above used in this context before. Does it create a tmp function?
Thanks again
Steve
May 18, 2009 at 7:03 am
sperry (5/18/2009)
Hi RBarryYoung.Thanks so much for your hard work. I ran the script (the one that uses tmp tables) this morning and results were returned in just over 20mins. Since this will run as an overnight job, 20 mins is fine.
Great thanks for the feedback! Though I can't help feeling that this should still be able to run faster...
I just want to clarify one part of the syntax to make sure i understand it all correctly.
;WITH ctePersToPersSubs as
I have not see the above used in this context before. Does it create a tmp function?
No, this is the syntax for creating Common-Table Expressions or CTE's. CTEs are really just named table expressions that can be re-used (because they are named). Except for the naming/reuse with a single query and the ability to define recursive queries (not used here, see BOL) they are really no different than a subquery. However most people agree that they make it easier to organize a complex query and the naming makes them more readable.
If it helps, think of them as temporary Views that only exist within a single query (normal (non-indexed) Views are also table expressions).
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 18, 2009 at 7:07 am
I forgot to mention, that because I was using my own made up data, I can be sure that it is correct, so you should definitely check it for validity before relying on it in production.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 18, 2009 at 7:20 am
Thanks for the explination. Always good to learn something new.
I am in the process of checking the data now.
Thanks again
May 18, 2009 at 7:24 am
Forgot to mention, i have also made a post about the same issue on experts exchane. No one has come up with a solution as good as yours, so would you mind if i post your solution there to (Giving you full credit of course).
May 18, 2009 at 7:45 am
Yes, that's OK.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
Viewing 14 posts - 16 through 28 (of 28 total)
You must be logged in to reply to this topic. Login to reply