January 2, 2019 at 6:49 am
Hello All,
Wish you a very happy new year 2019!!
I have been asked to review the performance of a procedure and am reproducing part of it here. There are two similar queries in this procedure and, combined, they take up to 12-15 seconds on a big server (a DR of production with hardly any resource consumption).
Since both queries are similar, I am putting one of them here.
Although it is a nested query, even when I evaluated it piece by piece using temp tables for better visualization, there was hardly any performance improvement. I tried modifying the query a bit but did not have much success.
Can one of the gurus see if there is a way to rewrite this so that performance improves? I am including the table schema along with indexes and the execution plan. Please let me know if I can frame my question better or supply more information.
Thanks in advance!!
declare @maxResults INT = 25,
@companyName varchar(100)=N'Northwestern Corp Discount Commercial Paper 4/2 Yrs 3&4',
@issuerCode nvarchar(6)=''
DECLARE @searchText NVARCHAR(300)
DECLARE @issuerCodeType int
DECLARE @searchTextType int
SELECT @searchText = dbautil.dbo.stripAutoSuggestCompanyName_fn(@companyName)
--SELECT @searchText = @companyName
-- Get company by QGrams
DECLARE @minLen int
DECLARE @Q int
SET @Q = 3
SET @minLen = 4
IF LEN(@searchText) < 6
BEGIN
SET @minLen = 2
END
Create table #qgram
(
gram NCHAR(3) NOT NULL PRIMARY KEY CLUSTERED
, [count] INT NOT NULL
)
DECLARE @SearchTextLenght INT
SET @SearchTextLenght = LEN(@searchText)
INSERT INTO #qgram
SELECT
ISNULL(gram, '') as gram
, ISNULL(COUNT(*), 1) AS [count]
FROM
(
SELECT CONVERT(NCHAR(3), SUBSTRING(SUBSTRING('###', 1, @Q-1) + @searchText + SUBSTRING('%%%', 1, @Q-1), n.n, @Q)) AS gram
FROM dbautil.dbo.numbers_tbl n (NOLOCK)
WHERE n.n <= @SearchTextLenght + @Q - 2
) sub
GROUP BY gram
--option(recompile)
DECLARE @maxMatch INT
SELECT @maxMatch = SUM(count) FROM #qgram
SELECT sub2.fundOrComp, sub2.nameCacheId, sub2.ratio ,row_number() over(partition by sub2.companyid order by ratio desc) as rn
FROM
(
SELECT TOP(@maxResults) 1 as fundOrComp,sub.nameCacheId,
((sub.count * 1.0 /(qn.strippedNameLength + @Q - 2)) * (sub.count * 1.0/ @maxMatch)) AS ratio,qn.companyId
FROM
(
SELECT TOP 1500 g.nameCacheId,SUM (CASE WHEN g.count > q.count THEN q.count ELSE g.count END ) AS count
FROM #qgram q INNER JOIN SearchSuggest_QGram_tbl g (NOLOCK) ON g.gram = q.gram GROUP BY g.nameCacheId HAVING
SUM (CASE WHEN g.count > q.count THEN q.count ELSE g.count END ) >= @minLen ORDER BY 2 DESC, 1 ASC
) sub
INNER JOIN SearchSuggest_QGramNameCache_tbl qn (NOLOCK) ON qn.nameCacheId = sub.nameCacheId
ORDER BY ratio DESC
) sub2
drop table #qgram
January 2, 2019 at 8:21 am
chandan_jha18 - Wednesday, January 2, 2019 6:49 AM
Start by getting rid of the nolock hints, the implicit conversions and the parallel execution, run the query again and post the execution plan.
January 2, 2019 at 8:26 am
Cluster the [SearchSuggest_QGram_tbl] on [nameCacheId] rather than on gram.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 2, 2019 at 8:48 am
Eirikur Eiriksson - Wednesday, January 2, 2019 8:21 AM
The implicit conversions are happening on very small tables. I can try removing the NOLOCK hints and adding MAXDOP 1 to remove parallelism, if that's what you mean by parallel execution.
January 2, 2019 at 8:48 am
ScottPletcher - Wednesday, January 2, 2019 8:26 AM
Cluster the [SearchSuggest_QGram_tbl] on [nameCacheId] rather than on gram.
Let me make this change and get back to you with the performance.
Thanks
January 2, 2019 at 8:57 am
And calculations like this are best done once, into a variable:
@SearchTextLenght + @Q - 2
Having the calculation in the filter criteria leads to problems.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 2, 2019 at 9:46 am
ScottPletcher - Wednesday, January 2, 2019 8:26 AM
Cluster the [SearchSuggest_QGram_tbl] on [nameCacheId] rather than on gram.
While I agree that will (possibly... no guarantee without knowing more about the data) cure the problem of doing 1,500 individual seeks by changing them to a seek followed by a range scan, it will only solve about 5% or less of the problem.
The real problem is that the table (apparently) has almost a quarter million rows in it and the query at hand is returning more than 90 million of them in 51 seek/scans. Apparently (after a quick glance at the code, the schema, and the execution plan) there are 51 items in the temp table to be checked, and trying to do this in a single query is absolutely the worst way to do it. Since there is no early disqualification of data, you end up with what amounts to an accidental Cross Join.
The design of the tables also has a problem... there are distinctly (pun intended) no unique indexes.
Just changing one index (or any index, for that matter) just isn't going to solve the performance issues with this query.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2019 at 9:52 am
chandan_jha18 - Wednesday, January 2, 2019 6:49 AM
It would really have helped if you had identified what the actual purpose of the query is. So, for better understanding by all, let me see if I get it right.
In this particular bit of code, it would appear that you're retrieving 51 "key items" that uniquely identify a company and then storing that information in a temp table.
After that, you're trying to figure out which rows in a much larger table provide a match of those 51 "grams" so you can return only those nameCacheId that are actually a match for the given company.
Is that correct? If not, please expound.
January 2, 2019 at 9:59 am
Jeff Moden - Wednesday, January 2, 2019 9:52 AM
For a quick & temporary small improvement I'd try something like this:
CREATE INDEX ix_JustTesting ON SearchSuggest_QGram_tbl (gram, nameCacheId) INCLUDE ([count])
But it's never going to perform as well as a redesign.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 2, 2019 at 11:21 am
Jeff Moden - Wednesday, January 2, 2019 9:46 AM
Quite right. I didn't look as fully at the query as I should have (I'm just too busy at work right now, but I look at SQL questions as kind of a quick break from work).
January 2, 2019 at 4:39 pm
The OP hasn't gotten back on my questions yet. If the problem definition turns out to be what I posted in my questions, then it will be like what I've done on similar things in the past (one being just 2 weeks ago), and we should be able to have the code complete in about a quarter of a second. Maybe less.
January 3, 2019 at 6:30 am
Jeff Moden - Wednesday, January 2, 2019 4:39 PM
Sir, I apologize for the delay in response, which is mainly due to our different time zones; I only got a chance to talk to the dev today. I have the document explaining the logic and will post the information shortly.
January 3, 2019 at 6:54 am
January 3, 2019 at 6:55 am
ScottPletcher - Wednesday, January 2, 2019 8:26 AM
Cluster the [SearchSuggest_QGram_tbl] on [nameCacheId] rather than on gram.
Sir, no change in performance even after this.
Thanks
January 3, 2019 at 7:02 am
Jeff Moden - Wednesday, January 2, 2019 9:52 AM
Sir, here is what I got from the developer as per his document:
1) When the search text is entered, its length is calculated and the string is broken into multiple small strings of 3 characters each, which are stored in a temp table called #qgram.
Example: If I pass the search text as 'Legg Mason Partners', we will get the following results:
Results:
gram count
Ma 1
Pa 1
##L 1
#Le 1
etc.
2) The #qgram table and SearchSuggest_QGram_tbl are joined on the gram column.
3) After #qgram and SearchSuggest_QGram_tbl are joined, the rows are grouped by nameCacheId, the count values are summed, and the top 1500 rows are selected.
Since both tables have a count column, the smaller of the two values is used when summing the counts for each nameCacheId.
Example
nameCacheId count
5359401 9
512010 6
2545081 8
2875285 11
1934139 10
4) The result fetched in step 3 is joined to SearchSuggest_QGramNameCache_tbl on nameCacheId.
5) From the result set obtained in step 4, the top 500 records are fetched, which contain companyId, ratio, nameCacheId, and strippedNameLength (of the company or fund name).
6) The ratio for each companyId is calculated as:
(count (obtained in step 3) * 1.0 / (strippedNameLength + 1)) * (count (obtained in step 3) * 1.0 / the sum of the count values obtained earlier)
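The "smaller count wins" rule in step 3 (the CASE expression in the posted query) can be sketched outside T-SQL; the gram counters below are hypothetical stand-ins for rows of #qgram and SearchSuggest_QGram_tbl, not real data from the tables:

```python
from collections import Counter

def match_score(search_grams: Counter, candidate_grams: Counter) -> int:
    # SUM(CASE WHEN g.count > q.count THEN q.count ELSE g.count END):
    # for every gram both sides share, add the smaller of the two counts.
    return sum(min(count, candidate_grams[gram])
               for gram, count in search_grams.items()
               if gram in candidate_grams)

# Hypothetical gram counts for one search text and one cached name
search = Counter({"Leg": 1, "egg": 1, "gg ": 1, " Ma": 2})
candidate = Counter({"Leg": 1, "egg": 2, " Ma": 1})
score = match_score(search, candidate)  # min(1,1) + min(1,2) + min(2,1) = 3
```

In the query this score then has to clear @minLen (the HAVING clause) before a nameCacheId survives.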
Can you see if this matches your observation?
Regards
Chandan Jha
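For reference, the whole pipeline those steps describe can be sketched end-to-end in Python. This is only a sketch: the helper names are mine, strippedNameLength is approximated by the candidate name's length, and Q = 3 as in the procedure:

```python
from collections import Counter

def qgrams(text: str, q: int = 3) -> Counter:
    # Mirror of the #qgram build: pad with q-1 '#' on the left and
    # q-1 '%' on the right, then take len(text) + q - 2 windows of q
    # characters (n runs 1 .. LEN(@searchText) + @Q - 2 in the query).
    padded = "#" * (q - 1) + text + "%" * (q - 1)
    return Counter(padded[n:n + q] for n in range(len(text) + q - 2))

def ratio(search_text: str, candidate_name: str, q: int = 3) -> float:
    search, cand = qgrams(search_text, q), qgrams(candidate_name, q)
    # Step 3: sum of the smaller per-gram count on each side
    overlap = sum(min(c, cand[g]) for g, c in search.items() if g in cand)
    max_match = sum(search.values())    # SELECT @maxMatch = SUM(count) FROM #qgram
    stripped_len = len(candidate_name)  # stands in for qn.strippedNameLength
    # Step 6: both factors use 1.0, as in the query
    return (overlap / (stripped_len + q - 2)) * (overlap / max_match)
```

An exact match scores 1.0 and the score falls off as the names diverge. The grams shown in the example earlier come straight out of qgrams('Legg Mason Partners'): '##L', '#Le', and ' Ma' / ' Pa' (the forum display has stripped their leading spaces).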