Stuck with LIKE criteria

Question

Post reply

Stuck with LIKE criteria

jgale 70688

Grasshopper

Points: 12
More actions
April 11, 2015 at 12:04 am

#296873

Hi
I have problem that I can't get my head around. I'm sure there's a simple solution but I can't figure it out.
I have two tables (simplified below).
Prefix
AAA
AAB
AAC
AAAA
Data
AAAFFFF 20
AAAFFFG 25
AACGGG 25
AAAAHHH 10
I need to sum the data according to the prefixes in the Lookup table.
Using something like this almost works:
SELECT Prefix, sum(value)
FROM Data d inner join Prefix p on p.Prefix = left(d.name, len(p.Prefix))
As you can see, the AAA and AAAA cause duplicates.
Is there a simple (and fast) way to only use the longest matching prefix?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182919 More actions · Answer 1

Quick suggestion with a CTE and ROW_NUMBER ordered by the length of the prefix

😎

USE tempdb;

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.TBL_PREFIX') IS NOT NULL DROP TABLE dbo.TBL_PREFIX;

CREATE TABLE dbo.TBL_PREFIX

(

PREFIX_VALUE VARCHAR(10) NOT NULL CONSTRAINT PK_DBO_TBL_PREFIX_PREFIX_VALUE PRIMARY KEY CLUSTERED

,PREFIX_LEN AS (LEN(PREFIX_VALUE)) PERSISTED

);

INSERT INTO dbo.TBL_PREFIX (PREFIX_VALUE)

VALUES

('AAA' )

,('AAB' )

,('AAC' )

,('AAAA');

IF OBJECT_ID(N'dbo.TBL_DATA') IS NOT NULL DROP TABLE dbo.TBL_DATA;

CREATE TABLE dbo.TBL_DATA

(

DATA_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_DATA_DATA_ID PRIMARY KEY CLUSTERED

,DATA_STR VARCHAR(10) NOT NULL

,DATA_VALUE INT NOT NULL

);

INSERT INTO dbo.TBL_DATA(DATA_STR,DATA_VALUE)

VALUES

('AAAFFFF', 20)

,('AAAFFFG', 25)

,('AACGGG', 25)

,('AAAAHHH', 10);

;WITH MATCHED_DATA AS

(

SELECT

TD.DATA_ID

,ROW_NUMBER() OVER

(

PARTITION BY TD.DATA_ID

ORDER BY TP.PREFIX_LEN DESC

) AS DATA_RID

,TD.DATA_STR

,TD.DATA_VALUE

,TP.PREFIX_VALUE

,TP.PREFIX_LEN

FROM dbo.TBL_DATA TD

CROSS APPLY dbo.TBL_PREFIX TP

WHERE CHARINDEX(TP.PREFIX_VALUE,TD.DATA_STR,1) > 0

)

SELECT

MD.DATA_ID

,MD.DATA_STR

,MD.DATA_VALUE

,MD.PREFIX_VALUE

,MD.PREFIX_LEN

FROM MATCHED_DATA MD

WHERE MD.DATA_RID = 1;

Results

DATA_ID DATA_STR DATA_VALUE PREFIX_VALUE PREFIX_LEN

----------- ---------- ----------- ------------ -----------

1 AAAFFFF 20 AAA 3

2 AAAFFFG 25 AAA 3

3 AACGGG 25 AAC 3

4 AAAAHHH 10 AAAA 4

ScottPletcher SSC Guru Points: 101130 More actions · Answer 2

Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.

SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE

FROM dbo.TBL_DATA d

CROSS APPLY (

SELECT TOP (1) p.PREFIX_VALUE

FROM dbo.TBL_PREFIX p

WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'

ORDER BY p.PREFIX_LEN DESC

) AS p_cross_apply

GROUP BY p_cross_apply.PREFIX_VALUE

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Eirikur Eiriksson SSC Guru Points: 182919 More actions · Answer 3

ScottPletcher (4/13/2015)
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE

Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.

😎

ScottPletcher SSC Guru Points: 101130 More actions · Answer 4

Eirikur Eiriksson (4/13/2015)
ScottPletcher (4/13/2015)
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE
Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.
😎

But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

jgale 70688 Grasshopper Points: 12 More actions · Answer 5

Thank you for the replies.

Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.

I implemented something like Scotts example. The sub-select worked well for me.

Thanks for the help.

ScottPletcher SSC Guru Points: 101130 More actions · Answer 6

jgale 70688 (4/13/2015)
Thank you for the replies.
Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.
I implemented something like Scotts example. The sub-select worked well for me.
Thanks for the help.

Glad it helped. For max benefit, make sure the prefix table is clustered on the lookup value.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Eirikur Eiriksson SSC Guru Points: 182919 More actions · Answer 7

jgale 70688 (4/13/2015)
Thank you for the replies.
Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.
I implemented something like Scotts example. The sub-select worked well for me.
Thanks for the help.

Quick questions,

1. What is the cardinality of the prefix table?

2. Is this a one off query?

3. How wide (Len) are the string values?

😎

Eirikur Eiriksson SSC Guru Points: 182919 More actions · Answer 8

ScottPletcher (4/13/2015)
Eirikur Eiriksson (4/13/2015)
ScottPletcher (4/13/2015)
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE
Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.
😎
But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.

It will sort every row in the output for the group by aggregation.

😎

On the other hand, I must admit that my test/analysis was somewhat flawed, less than 10K rows / 100 prefixes, without the right indexing it is almost certain that the cross apply will outperform the window function method on larger sets.

ScottPletcher SSC Guru Points: 101130 More actions · Answer 9

Eirikur Eiriksson (4/13/2015)
ScottPletcher (4/13/2015)
Eirikur Eiriksson (4/13/2015)
ScottPletcher (4/13/2015)
Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.
SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE
FROM dbo.TBL_DATA d
CROSS APPLY (
SELECT TOP (1) p.PREFIX_VALUE
FROM dbo.TBL_PREFIX p
WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'
ORDER BY p.PREFIX_LEN DESC
) AS p_cross_apply
GROUP BY p_cross_apply.PREFIX_VALUE
Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.
😎
But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.
It will sort every row in the output for the group by aggregation.
😎
On the other hand, I must admit that my test/analysis was somewhat flawed, less than 10K rows / 100 prefixes, without the right indexing it is almost certain that the cross apply will outperform the window function method on larger sets.

The final sort for the aggregation is to produce the total that the original post specified. You stopped your code before that point. Presumably to get that summary total you'd have to GROUP BY too ;-).

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 10

Removed... Faster solution has already been posed,

Jason