April 11, 2015 at 12:04 am
Hi
I have problem that I can't get my head around. I'm sure there's a simple solution but I can't figure it out.
I have two tables (simplified below).
Prefix
AAA
AAB
AAC
AAAA
Data
AAAFFFF 20
AAAFFFG 25
AACGGG 25
AAAAHHH 10
I need to sum the data according to the prefixes in the Lookup table.
Using something like this almost works:
SELECT Prefix, sum(value)
FROM Data d inner join Prefix p on p.Prefix = left(d.name, len(p.Prefix))
As you can see, the AAA and AAAA cause duplicates.
Is there a simple (and fast) way to only use the longest matching prefix?
April 11, 2015 at 12:33 am
Quick suggestion with a CTE and ROW_NUMBER ordered by the length of the prefix
😎
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.TBL_PREFIX') IS NOT NULL DROP TABLE dbo.TBL_PREFIX;
CREATE TABLE dbo.TBL_PREFIX
(
PREFIX_VALUE VARCHAR(10) NOT NULL CONSTRAINT PK_DBO_TBL_PREFIX_PREFIX_VALUE PRIMARY KEY CLUSTERED
,PREFIX_LEN AS (LEN(PREFIX_VALUE)) PERSISTED
);
INSERT INTO dbo.TBL_PREFIX (PREFIX_VALUE)
VALUES
('AAA' )
,('AAB' )
,('AAC' )
,('AAAA');
IF OBJECT_ID(N'dbo.TBL_DATA') IS NOT NULL DROP TABLE dbo.TBL_DATA;
CREATE TABLE dbo.TBL_DATA
(
DATA_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_DATA_DATA_ID PRIMARY KEY CLUSTERED
,DATA_STR VARCHAR(10) NOT NULL
,DATA_VALUE INT NOT NULL
);
INSERT INTO dbo.TBL_DATA(DATA_STR,DATA_VALUE)
VALUES
('AAAFFFF', 20)
,('AAAFFFG', 25)
,('AACGGG', 25)
,('AAAAHHH', 10);
;WITH MATCHED_DATA AS
(
SELECT
TD.DATA_ID
,ROW_NUMBER() OVER
(
PARTITION BY TD.DATA_ID
ORDER BY TP.PREFIX_LEN DESC
) AS DATA_RID
,TD.DATA_STR
,TD.DATA_VALUE
,TP.PREFIX_VALUE
,TP.PREFIX_LEN
FROM dbo.TBL_DATA TD
CROSS APPLY dbo.TBL_PREFIX TP
WHERE CHARINDEX(TP.PREFIX_VALUE,TD.DATA_STR,1) > 0
)
SELECT
MD.DATA_ID
,MD.DATA_STR
,MD.DATA_VALUE
,MD.PREFIX_VALUE
,MD.PREFIX_LEN
FROM MATCHED_DATA MD
WHERE MD.DATA_RID = 1;
Results
DATA_ID DATA_STR DATA_VALUE PREFIX_VALUE PREFIX_LEN
----------- ---------- ----------- ------------ -----------
1 AAAFFFF 20 AAA 3
2 AAAFFFG 25 AAA 3
3 AACGGG 25 AAC 3
4 AAAAHHH 10 AAAA 4
April 13, 2015 at 2:54 pm
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 13, 2015 at 3:29 pm
ScottPletcher (4/13/2015)
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE
Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.
😎
April 13, 2015 at 3:33 pm
Eirikur Eiriksson (4/13/2015)
ScottPletcher (4/13/2015)
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE
Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.
😎
But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 13, 2015 at 3:50 pm
Thank you for the replies.
Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.
I implemented something like Scotts example. The sub-select worked well for me.
Thanks for the help.
April 13, 2015 at 4:33 pm
jgale 70688 (4/13/2015)
Thank you for the replies.Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.
I implemented something like Scotts example. The sub-select worked well for me.
Thanks for the help.
Glad it helped. For max benefit, make sure the prefix table is clustered on the lookup value.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 13, 2015 at 10:34 pm
jgale 70688 (4/13/2015)
Thank you for the replies.Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.
I implemented something like Scotts example. The sub-select worked well for me.
Thanks for the help.
Quick questions,
1. What is the cardinality of the prefix table?
2. Is this a one off query?
3. How wide (Len) are the string values?
😎
April 13, 2015 at 10:52 pm
ScottPletcher (4/13/2015)
Eirikur Eiriksson (4/13/2015)
ScottPletcher (4/13/2015)
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE
Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.
😎
But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.
It will sort every row in the output for the group by aggregation.
😎
On the other hand, I must admit that my test/analysis was somewhat flawed, less than 10K rows / 100 prefixes, without the right indexing it is almost certain that the cross apply will outperform the window function method on larger sets.
April 14, 2015 at 8:09 am
Eirikur Eiriksson (4/13/2015)
ScottPletcher (4/13/2015)
Eirikur Eiriksson (4/13/2015)
ScottPletcher (4/13/2015)
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE
Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.
😎
But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.
It will sort every row in the output for the group by aggregation.
😎
On the other hand, I must admit that my test/analysis was somewhat flawed, less than 10K rows / 100 prefixes, without the right indexing it is almost certain that the cross apply will outperform the window function method on larger sets.
The final sort for the aggregation is to produce the total that the original post specified. You stopped your code before that point. Presumably to get that summary total you'd have to GROUP BY too ;-).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 14, 2015 at 12:18 pm
Removed... Faster solution has already been posed,
Jason
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply