April 5, 2016 at 7:35 am
I have inherited a DB, and need to make some changes for a search proc.
This app and DB are still in dev, and have not yet been published to production, so schema changes are still possible - not easy, but possible.
I have a table with an integer field that needs to be searched for partial matches. The partial match can be anywhere within the number.
In the physical world, the number that is visible is a 6-character number, left-padded with zeros (eg: 000123).
The table will contain anywhere from 300,000 to 5,000,000 records.
Currently, most instances, will have only 1 CategoryId in the table, but I have one instance that will have 3.
The way I see it, I have 3 options...
* Am I missing any obvious improvement options?
* Am I at least on the right track?
The new query requirements
DECLARE @PartialPartNumber VARCHAR(6) = '234';
SELECT CategoryId, PartNumber
FROM #PartsList AS pl
WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';
Below is a representation of the current structure
-- This is the original table design
IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL
BEGIN
DROP TABLE #PartsList;
END;
GO
CREATE TABLE #PartsList
(
PartId INT IDENTITY(1,1) NOT NULL
, CategoryId INT NOT NULL
, PartNumber INT NOT NULL CHECK (PartNumber > 0 AND PartNumber < 1000000)
, UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())
, PRIMARY KEY CLUSTERED (PartId)
);
GO
CREATE UNIQUE NONCLUSTERED INDEX uxParts
ON #PartsList (CategoryId, PartNumber);
GO
Code to load some sample data
;WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+8 or 100,000,000 rows max
cteTally(N) AS (SELECT TOP (999999) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8 )
, cteCat1 AS (
SELECT TOP (5000) CategoryId = 1, PartNumber = N
FROM cteTally
ORDER BY NEWID()
)
, cteCat2 AS (
SELECT TOP (5000) CategoryId = 2, PartNumber = N
FROM cteTally
ORDER BY NEWID()
)
, cteCat3 AS (
SELECT TOP (5000) CategoryId = 3, PartNumber = N
FROM cteTally
ORDER BY NEWID()
)
INSERT INTO #PartsList ( CategoryId, PartNumber )
SELECT CategoryId, RIGHT('000000' + CONVERT(VARCHAR(6), PartNumber, 0), 6) FROM cteCat1 UNION ALL
SELECT CategoryId, RIGHT('000000' + CONVERT(VARCHAR(6), PartNumber, 0), 6) FROM cteCat2 UNION ALL
SELECT CategoryId, RIGHT('000000' + CONVERT(VARCHAR(6), PartNumber, 0), 6) FROM cteCat3;
GO
OPTION 2 - Add a persisted computed column and index
-- This is a minimal change, adding a persisted computed column and index
IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL
BEGIN
DROP TABLE #PartsList;
END;
GO
CREATE TABLE #PartsList
(
PartId INT IDENTITY(1,1) NOT NULL
, CategoryId INT NOT NULL
, PartNumber INT NOT NULL CHECK (PartNumber > 0 AND PartNumber < 1000000)
, UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())
, PartNumberSearch AS RIGHT('000000' + CONVERT(VARCHAR(6), PartNumber, 0), 6) PERSISTED
, PRIMARY KEY CLUSTERED (PartId)
);
GO
CREATE UNIQUE NONCLUSTERED INDEX uxParts
ON #PartsList (CategoryId, PartNumber);
GO
CREATE NONCLUSTERED INDEX ixPartSearch
ON #PartsList (PartNumberSearch, CategoryId)
INCLUDE (PartNumber);
GO
OPTION 3 - Fix the data type and clustered key
-- This change is the big one. It requires FK changes on other tables, and related proc changes.
IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL
BEGIN
DROP TABLE #PartsList;
END;
GO
CREATE TABLE #PartsList
(
CategoryId INT NOT NULL
, PartNumber CHAR(6) NOT NULL CHECK (LEN(LTRIM(PartNumber)) = 6 AND CAST(PartNumber AS INT) > 0 AND CAST(PartNumber AS INT) < 1000000)
, UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())
, PRIMARY KEY CLUSTERED (PartNumber, CategoryId)
);
GO
April 5, 2016 at 8:51 am
Interesting dilemma and great work with the sample data/ddl!
Quick question what are the Partnumber column always going to be 6 characters long or less?
Also (and I expect the answer to be "No" but it's worth asking because I have a great solution if the answer is "yes").
Is @PartialPartNumber always going to be the same length? Is it possible that it could always be one of 2 or 3 different lengths?
Update:
After playing around with your solutions it appears that Option 3 is pretty brilliant and certainly the way to go!
-- Itzik Ben-Gan 2001
April 5, 2016 at 9:02 am
Alan.B (4/5/2016)
Interesting dilemma and great work with the sample data/ddl!Quick question what are the Partnumber column always going to be 6 characters long or less?
Also (and I expect the answer to be "No" but it's worth asking because I have a great solution if the answer is "yes").
Is @PartialPartNumber always going to be the same length? Is it possible that it could always be one of 2 or 3 different lengths?
The PartNumber is a 6-character numeric field. It is actually a subset of a larger number.
Real word examples are .... [other data-]123456, or [other data-]009865, or [other data-]000037
The @PartialPartNumber search could be 1 to 6 characters. Although, at this point, we are considering forcing the user to enter at least 3 digits. This will result in a smaller resultset being returned to the application, and hopefully an easier search for SQL.
April 5, 2016 at 9:33 am
Alan.B (4/5/2016)
Interesting dilemma and great work with the sample data/ddl!Quick question what are the Partnumber column always going to be 6 characters long or less?
Also (and I expect the answer to be "No" but it's worth asking because I have a great solution if the answer is "yes").
Is @PartialPartNumber always going to be the same length? Is it possible that it could always be one of 2 or 3 different lengths?
Let's take a look at your solution for the case where the length is always 4, 5, or 6 characters, as this is where we are trying to push the user.
I have also made a slight change to the CHECK parameter in OPTION 3
IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL
BEGIN
DROP TABLE #PartsList;
END;
CREATE TABLE #PartsList
(
CategoryId INT NOT NULL
, PartNumber CHAR(6) NOT NULL CHECK ( (LEN(LTRIM(PartNumber)) = 6) AND (PartNumber NOT LIKE '%[^0-9]%') )
, UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())
, PRIMARY KEY CLUSTERED (PartNumber, CategoryId)
);
April 5, 2016 at 3:35 pm
Another possibility is to leave the main table alone, and create a separate table to hold the partial partids to match on. Use a trigger on the main table to load the partial table. I did trigger code only for the 3-digit part of it, but it's easy enough to add 4 and 5 digit values if you want to (or even to use the 3-digit code to relatively quickly get to the 4 and 5 digit matches(?)). I switched both tables from #names to "real" tables names since temp objects cannot have a trigger.
Note the very nice query plan for the last query: a merge join on the main table.
USE tempdb;
CREATE TABLE tempdb.dbo.PartsListPartials (
PartIdPartialLength tinyint NOT NULL,
PartIdPartial smallint NOT NULL,
PartIdPartialStart tinyint NOT NULL,
PartId int NOT NULL
)
CREATE CLUSTERED INDEX PartsListPartials__CL ON tempdb.dbo.PartsListPartials ( PartIdPartialLength, PartIdPartial ) WITH ( FILLFACTOR = 85 );
GO
CREATE TRIGGER PartsList__TR_Insert_Partial
ON tempdb.dbo.PartsList
AFTER INSERT
AS
SET NOCOUNT ON;
INSERT INTO tempdb.dbo.PartsListPartials ( PartId, PartIdPartial, PartIdPartialLength, PartIdPartialStart )
SELECT i.PartId, ca1.PartIdPartial, ca1.PartIdPartialLength, ca1.PartIdPartialStart
FROM inserted i
CROSS APPLY (
SELECT PartId / 1000 AS PartIdPartial, 3 AS PartIdPartialLength, 1 AS PartIdPartialStart
WHERE PartId / 1000 > 0
UNION ALL
SELECT PartId % 100000 / 100, 3, 2
WHERE PartId % 100000 / 100 > 0
UNION ALL
SELECT PartId % 10000 / 10, 3, 3
WHERE PartId % 10000 / 10 > 0
UNION ALL
SELECT PartId % 1000, 3, 4
WHERE PartId % 1000 > 0
) AS ca1
GO
DECLARE @PartIdPartial smallint
SET @PartIdPartial = 123
SELECT pl.*
FROM tempdb.dbo.PartsList pl
INNER JOIN (
SELECT DISTINCT PartId
FROM PartsListPartials
WHERE PartIdPartialLength = LEN(@PartIdPartial) AND
PartIdPartial = @PartIdPartial
) AS plp ON plp.PartId = pl.PartId
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 5, 2016 at 4:46 pm
DesNorton (4/5/2016)
I have inherited a DB, and need to make some changes for a search proc.This app and DB are still in dev, and have not yet been published to production, so schema changes are still possible - not easy, but possible.
I have a table with an integer field that needs to be searched for partial matches. The partial match can be anywhere within the number.
Hold on a sec.
What do you mean "anywhere within the number"?
If you search for a number "23" you'll find "000023", "230000", "000123", "052321", "123456", etc.
Is it what do you really want?
_____________
Code for TallyGenerator
April 5, 2016 at 8:49 pm
DesNorton (4/5/2016)
Alan.B (4/5/2016)
Interesting dilemma and great work with the sample data/ddl!Quick question what are the Partnumber column always going to be 6 characters long or less?
Also (and I expect the answer to be "No" but it's worth asking because I have a great solution if the answer is "yes").
Is @PartialPartNumber always going to be the same length? Is it possible that it could always be one of 2 or 3 different lengths?
Let's take a look at your solution for the case where the length is always 4, 5, or 6 characters, as this is where we are trying to push the user.
I have also made a slight change to the CHECK parameter in OPTION 3
IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL
BEGIN
DROP TABLE #PartsList;
END;
CREATE TABLE #PartsList
(
CategoryId INT NOT NULL
, PartNumber CHAR(6) NOT NULL CHECK ( (LEN(LTRIM(PartNumber)) = 6) AND (PartNumber NOT LIKE '%[^0-9]%') )
, UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())
, PRIMARY KEY CLUSTERED (PartNumber, CategoryId)
);
Ok, never mind the solution I was talking about - I can't come up with something better than your third option. I can show you how to make it a little better however. I'm still blown away at how Option 3 gets an index seek considering how you have wild cards in the front of the WHERE clause.
Anyhow, here's how you can make your solution a little better. Below is 1,050,000 rows of sample data. I made some slight changes including removing the extra zeros logic.
SET NOCOUNT ON;
USE tempdb
GO
-- This change is the big one. It requires FK changes on other tables, and related proc changes.
IF OBJECT_ID('tempdb.dbo.PartsList', N'U') IS NOT NULL DROP TABLE dbo.PartsList;
CREATE TABLE dbo.PartsList
(
CategoryId INT NOT NULL
, PartNumber CHAR(6) NOT NULL
, UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())
, CONSTRAINT pk_pl PRIMARY KEY CLUSTERED (PartNumber, CategoryId)
, CONSTRAINT ck_pl1 CHECK (PartNumber > 0 AND PartNumber < 1000000)
);
GO
;WITH
E1(N) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
cteTally(N) AS
(
SELECT TOP (999999) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f -- 1,000,0000 Rows
),
cteCat1 AS (SELECT TOP(350000) CategoryId=1, PartNumber=N FROM cteTally ORDER BY NEWID()),
cteCat2 AS (SELECT TOP(350000) CategoryId=2, PartNumber=N FROM cteTally ORDER BY NEWID()),
cteCat3 AS (SELECT TOP(350000) CategoryId=3, PartNumber=N FROM cteTally ORDER BY NEWID())
INSERT INTO dbo.PartsList (CategoryId, PartNumber)
SELECT CategoryId, PartNumber FROM cteCat1 UNION ALL
SELECT CategoryId, PartNumber FROM cteCat2 UNION ALL
SELECT CategoryId, PartNumber FROM cteCat3;
GO
Now let's compare your query when the optimizer uses a clustered index to when a UNIQUE nonclustered index is available (if you run this with "include actual execution plan" on you'll see that the optimizer picks the UNIQUE index when available):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'Clustered Index:'+CHAR(10)+REPLICATE('-',50);
DECLARE @PartialPartNumber VARCHAR(6) = '234';
SELECT CategoryId, PartNumber
FROM dbo.PartsList AS pl
WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
CREATE UNIQUE NONCLUSTERED INDEX uq_vPL ON dbo.PartsList(PartNumber, CategoryId);
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT CHAR(10)+'Unique Index:'+CHAR(10)+REPLICATE('-',50);
DECLARE @PartialPartNumber VARCHAR(6) = '234';
SELECT CategoryId, PartNumber
FROM dbo.PartsList AS pl
WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
Results:
Clustered Index:
--------------------------------------------------
Table 'PartsList'. Scan count 1, logical reads 3532, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 235 ms, elapsed time = 245 ms.
Unique Index:
--------------------------------------------------
Table 'PartsList'. Scan count 1, logical reads 2083, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 150 ms.
Notice that a UNIQUE index reduces the number of reads by 40%+ and the CPU by almost 1/2. That's the best I can offer.
-- Itzik Ben-Gan 2001
April 5, 2016 at 9:20 pm
Sergiy (4/5/2016)
DesNorton (4/5/2016)
I have inherited a DB, and need to make some changes for a search proc.This app and DB are still in dev, and have not yet been published to production, so schema changes are still possible - not easy, but possible.
I have a table with an integer field that needs to be searched for partial matches. The partial match can be anywhere within the number.
Hold on a sec.
What do you mean "anywhere within the number"?
If you search for a number "23" you'll find "000023", "230000", "000123", "052321", "123456", etc.
Is it what do you really want?
Hi Sergiy
YES ... I do mean "anywhere within the number".
If I search for "23" I need to get "000023", "230000", "000123", "052321", "123456", etc.
This is a requirement from the Business Analysts. The dev team are pushing back, and we are trying to ensure that we get at least 3 digits (trying hard for 4) from the user.
So: search for "234" to get back "000234", "234000", "052341", "123456", etc.
Or: search for "2345" to get back "002345", "234500", "023457", "123456", etc.
April 5, 2016 at 9:26 pm
Scott and Alan
I have just walked into a #$@%storm that needs my full attention. I will take a closer look at your suggestions as soon as I can.
April 6, 2016 at 8:08 am
Alan.B (4/5/2016)
Now let's compare your query when the optimizer uses a clustered index to when a UNIQUE nonclustered index is available (if you run this with "include actual execution plan" on you'll see that the optimizer picks the UNIQUE index when available):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'Clustered Index:'+CHAR(10)+REPLICATE('-',50);
DECLARE @PartialPartNumber VARCHAR(6) = '234';
SELECT CategoryId, PartNumber
FROM dbo.PartsList AS pl
WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
CREATE UNIQUE NONCLUSTERED INDEX uq_vPL ON dbo.PartsList(PartNumber, CategoryId);
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT CHAR(10)+'Unique Index:'+CHAR(10)+REPLICATE('-',50);
DECLARE @PartialPartNumber VARCHAR(6) = '234';
SELECT CategoryId, PartNumber
FROM dbo.PartsList AS pl
WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
Results:
Clustered Index:
--------------------------------------------------
Table 'PartsList'. Scan count 1, logical reads 3532, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 235 ms, elapsed time = 245 ms.
Unique Index:
--------------------------------------------------
Table 'PartsList'. Scan count 1, logical reads 2083, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 150 ms.
Notice that a UNIQUE index reduces the number of reads by 40%+ and the CPU by almost 1/2. That's the best I can offer.
Hi Alan. I fear that your numbers are using cached data. On my system, I ran your sample, and had slightly different results.
Results of your code: The I/O is well reduced, but the CPU usage is only marginally reduced
Clustered Index:
--------------------------------------------------
Table 'PartsList'. Scan count 1, logical reads 3532, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 266 ms, elapsed time = 262 ms.
Unique Index:
--------------------------------------------------
Table 'PartsList'. Scan count 1, logical reads 2088, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 234 ms, elapsed time = 238 ms.
I then added sql to ensure that both statements run with clear cache.
-----------------------------------
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
-----------------------------------
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'Clustered Index:'+CHAR(10)+REPLICATE('-',50);
DECLARE @PartialPartNumber VARCHAR(6) = '234';
SELECT CategoryId, PartNumber
FROM dbo.PartsList AS pl
WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
CREATE UNIQUE NONCLUSTERED INDEX uq_vPL ON dbo.PartsList(PartNumber, CategoryId);
GO
-----------------------------------
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
-----------------------------------
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT CHAR(10)+'Unique Index:'+CHAR(10)+REPLICATE('-',50);
DECLARE @PartialPartNumber VARCHAR(6) = '234';
SELECT CategoryId, PartNumber
FROM dbo.PartsList AS pl
WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
Results of code with cleared cache: The I/O is still well reduced, but the CPU usage is marginally increased
Clustered Index:
--------------------------------------------------
Table 'PartsList'. Scan count 1, logical reads 3532, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 297 ms, elapsed time = 291 ms.
Unique Index:
--------------------------------------------------
Table 'PartsList'. Scan count 1, logical reads 2088, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 313 ms, elapsed time = 309 ms.
I get the same results, whether the second index is unique or not.
It seems that SQL uses the second index, as it has everything that it needs.
If I add UTCDateCreated to the SELECT list, then SQL reverts back to the PK. So, due to fact that my real table and select statements have additional columns (omitted here for berevity), I will exclude the additional index.
April 6, 2016 at 9:58 am
ScottPletcher (4/5/2016)
Another possibility is to leave the main table alone, and create a separate table to hold the partial partids to match on. Use a trigger on the main table to load the partial table. I did trigger code only for the 3-digit part of it, but it's easy enough to add 4 and 5 digit values if you want to (or even to use the 3-digit code to relatively quickly get to the 4 and 5 digit matches(?)). I switched both tables from #names to "real" tables names since temp objects cannot have a trigger.Note the very nice query plan for the last query: a merge join on the main table.
USE tempdb;
CREATE TABLE tempdb.dbo.PartsListPartials (
PartIdPartialLength tinyint NOT NULL,
PartIdPartial smallint NOT NULL,
PartIdPartialStart tinyint NOT NULL,
PartId int NOT NULL
)
CREATE CLUSTERED INDEX PartsListPartials__CL ON tempdb.dbo.PartsListPartials ( PartIdPartialLength, PartIdPartial ) WITH ( FILLFACTOR = 85 );
GO
CREATE TRIGGER PartsList__TR_Insert_Partial
ON tempdb.dbo.PartsList
AFTER INSERT
AS
SET NOCOUNT ON;
INSERT INTO tempdb.dbo.PartsListPartials ( PartId, PartIdPartial, PartIdPartialLength, PartIdPartialStart )
SELECT i.PartId, ca1.PartIdPartial, ca1.PartIdPartialLength, ca1.PartIdPartialStart
FROM inserted i
CROSS APPLY (
SELECT PartId / 1000 AS PartIdPartial, 3 AS PartIdPartialLength, 1 AS PartIdPartialStart
WHERE PartId / 1000 > 0
UNION ALL
SELECT PartId % 100000 / 100, 3, 2
WHERE PartId % 100000 / 100 > 0
UNION ALL
SELECT PartId % 10000 / 10, 3, 3
WHERE PartId % 10000 / 10 > 0
UNION ALL
SELECT PartId % 1000, 3, 4
WHERE PartId % 1000 > 0
) AS ca1
GO
DECLARE @PartIdPartial smallint
SET @PartIdPartial = 123
SELECT pl.*
FROM tempdb.dbo.PartsList pl
INNER JOIN (
SELECT DISTINCT PartId
FROM PartsListPartials
WHERE PartIdPartialLength = LEN(@PartIdPartial) AND
PartIdPartial = @PartIdPartial
) AS plp ON plp.PartId = pl.PartId
Hi Scott
This is certainly fast, with similar I/O to the other solutions, but very low CPU numbers. However, I am unable to wrap my head around the math in the trigger. Could you please explain the logic behind the math.
April 6, 2016 at 12:25 pm
DesNorton (4/6/2016)
Hi Scott
This is certainly fast, with similar I/O to the other solutions, but very low CPU numbers. However, I am unable to wrap my head around the math in the trigger. Could you please explain the logic behind the math.
The math is just an efficient (I hope) way to extract all non-zero 3-digit parts of the number. Let's walk thru it for value 123456:
123456 / 1000 = 123 (for integer arithmetic, SQL drops all fractions).
123456 % 100000 = 23456 (strip off first digit) / 100 = 234 (again, fraction is dropped)
123456 % 10000 = 3456 (strip off first two digits) / 10 = 345
123456 % 1000 = 456
So the math just returns each non-zero three digit number:
123, 234, 345 and 456.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 6, 2016 at 12:42 pm
ScottPletcher (4/6/2016)
DesNorton (4/6/2016)
Hi Scott
This is certainly fast, with similar I/O to the other solutions, but very low CPU numbers. However, I am unable to wrap my head around the math in the trigger. Could you please explain the logic behind the math.
The math is just an efficient (I hope) way to extract all non-zero 3-digit parts of the number. Let's walk thru it for value 123456:
123456 / 1000 = 123 (for integer arithmetic, SQL drops all fractions).
123456 % 100000 = 23456 (strip off first digit) / 100 = 234 (again, fraction is dropped)
123456 % 10000 = 3456 (strip off first two digits) / 10 = 345
123456 % 1000 = 456
So the math just returns each non-zero three digit number:
123, 234, 345 and 456.
Thank you. Now it seems so simple. I'll apply my mind to it when I'm back at work tomorrow.
April 7, 2016 at 2:07 am
DesNorton (4/6/2016)
ScottPletcher (4/6/2016)
DesNorton (4/6/2016)
Hi Scott
This is certainly fast, with similar I/O to the other solutions, but very low CPU numbers. However, I am unable to wrap my head around the math in the trigger. Could you please explain the logic behind the math.
The math is just an efficient (I hope) way to extract all non-zero 3-digit parts of the number. Let's walk thru it for value 123456:
123456 / 1000 = 123 (for integer arithmetic, SQL drops all fractions).
123456 % 100000 = 23456 (strip off first digit) / 100 = 234 (again, fraction is dropped)
123456 % 10000 = 3456 (strip off first two digits) / 10 = 345
123456 % 1000 = 456
So the math just returns each non-zero three digit number:
123, 234, 345 and 456.
Thank you. Now it seems so simple. I'll apply my mind to it when I'm back at work tomorrow.
I am thinking of taking this method to the next level. Rather than a lookup table, I want to try and CROSS APPLY this logic to a table. So I have come up with something like this. It returns data, but also raises a Divide by zero error.
DECLARE @BaseNumber INT = 123456;
DECLARE @PartNum INT = 23;
;WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
, cteTally(N) AS (SELECT TOP (LEN(@BaseNumber)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E1 )
, cteStart(N) AS (SELECT N FROM cteTally WHERE N <= (LEN(@BaseNumber) - LEN(@PartNum) +1) )
, cteParts AS (
SELECT BaseNumber = @BaseNumber
, RequiredPart = @BaseNumber % POWER(10, LEN(@BaseNumber) -s.N +1) / POWER(10, LEN(@BaseNumber) -LEN(@PartNum) -s.N +1 )
FROM cteStart AS s
)
-- This could probably all be moved into the cteParts, but need to get it working first.
SELECT p.*
FROM cteParts AS p
WHERE @PartNum = p.RequiredPart;
April 7, 2016 at 10:40 am
OK. So I have tweaked Scott's math and massaged it into a CROSS APPLY against the table to search for any partial PartNumber, regardless of length.
The performance is a bit slower, but somehow the other developers are able to understand it a bit better than the separate table.
Create the table and data:
-- Drop and recreate the table
IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL
BEGIN
DROP TABLE #PartsList;
END;
GO
CREATE TABLE #PartsList
(
PartId INT IDENTITY(1,1) NOT NULL
, CategoryId INT NOT NULL
, PartNumber INT NOT NULL CHECK (PartNumber > 0 AND PartNumber < 1000000)
, UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())
--, PRIMARY KEY CLUSTERED (PartId)
, PRIMARY KEY CLUSTERED (CategoryId, PartNumber)
);
--CREATE UNIQUE NONCLUSTERED INDEX uxParts
--ON #PartsList (CategoryId, PartNumber);
CREATE UNIQUE NONCLUSTERED INDEX uxPartId
ON #PartsList (PartId); -- Needed for FK
GO
-- Populate the table
;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
cteTally(N) AS (
SELECT TOP (999999) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f -- 1,000,0000 Rows
),
cteCat1 AS (SELECT TOP(350000) CategoryId=1, PartNumber=N FROM cteTally ORDER BY NEWID()),
cteCat2 AS (SELECT TOP(350000) CategoryId=2, PartNumber=N FROM cteTally ORDER BY NEWID()),
cteCat3 AS (SELECT TOP(350000) CategoryId=3, PartNumber=N FROM cteTally ORDER BY NEWID())
INSERT INTO #PartsList (CategoryId, PartNumber)
SELECT CategoryId, PartNumber FROM cteCat1 UNION ALL
SELECT CategoryId, PartNumber FROM cteCat2 UNION ALL
SELECT CategoryId, PartNumber FROM cteCat3;
GO
Query the data:
SET NOCOUNT ON;
SET STATISTICS TIME, IO ON;
DECLARE @PartialPartNumber INT = 123;
DECLARE @CategoryId INT = 2;
SELECT pl.PartId, pl.CategoryId, pl.PartNumber --, s.*
FROM #PartsList AS pl
CROSS APPLY ( -- TOP(LEN(pl.PartNumber)) and WHERE n.N <= (LEN(pl.PartNumber) - LEN(@PartialPartNumber) +1)
-- required to prevent DivideByZero errors
SELECT TOP (LEN(pl.PartNumber))
MatchThis = pl.PartNumber % POWER(10, LEN(pl.PartNumber) -n.N +1) / POWER(10, LEN(pl.PartNumber) -LEN(@PartialPartNumber) -n.N +1 )
--, MatchThisLength = LEN(@PartialPartNumber)
--, MatchThisStart = n.N
FROM (
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) AS n(N)
WHERE n.N <= (LEN(pl.PartNumber) - LEN(@PartialPartNumber) +1)
AND pl.PartNumber % POWER(10, LEN(pl.PartNumber) -n.N +1) / POWER(10, LEN(pl.PartNumber) -LEN(@PartialPartNumber) -n.N +1 ) = @PartialPartNumber
) AS s
WHERE pl.CategoryId = @CategoryId;
SET STATISTICS TIME, IO OFF;
The Execution Plan shows parallelism as well, which is good.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply