January 4, 2011 at 7:20 am
Happy New Year!
I tried to find a solution for an unusual problem. I have following data:
1cata
2catb
3catc
4cata
5catc
1doga
2dogc
3dogb
4dogd
4doga
5dogc
6dogc
1panthera
2panthera
3pantherb
4pantherc
5panthera
6pantherc
7pantherb
8panthera
9pantherc
10pantherb
11pantherd
I have to find last record of a consecutive pattern. Pattern can be any length from 1 - n.
Assuming I'm looking for a pattern like:
a c b
As an answer I need to get following records:
3dogb
7pantherb
10pantherb
Assuming I'm looking for a pattern like:
a c b d
As an answer I need to get:
4dogd
11pantherd
I spent some time looking for solutions. Honestly - I can't find one for now. Maybe it is just post Christmas / New Year brake slow down 🙂
Help me please,
Sebastian
January 4, 2011 at 7:31 am
First you need to identify the prior n rows for each set (dog, cat, etc.) for each entry. That can be done by using row_number and partitioning by the dog/cat/panther column, ordering by the number column.
Then, use the FOR XML PATH trick to concatenate the abc column into a string for each row.
Then compare that to the pattern.
The first step could be done in a CTE or temp table, then the second step would be done by cross applying that to itself using the length of the pattern in a TOP statement.
However, this kind of thing is something that won't perform well on any sizeable amount of data.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 4, 2011 at 8:51 am
Thanks,
Although that is the problem... Amount of data is huge. Not in gigabytes but in records. I tried doing something like this with a recursive stored procedure.
Due to number of records I've been wondering about as simple solution as it could possibly be.
What if... I’d add additional column in source data. By default it would be NULL. In stored procedure I would add very simple loop going from the beginning to the end of the table. That loop will store concatenated n consecutive values in a string. In every row I will add current value and remove first one. So I will have n-length pattern in new column. Resetting the string every time cat changes to dog or panther etc.
Than – all I have to do is just filter for rows with column 4 equal to the pattern.
1cataa
2catbab
3catcabc
4catabca
5catccac
1dogaa
2dogcac
3dogbacb
4dogdcbd
4dogabda
5dogcdac
6dogcacc
1pantheraa
2pantheraaa
3pantherbaab
4panthercabc
5pantherabca
6pantherccac
7pantherbacb
8pantheracba
9panthercbac
10pantherbacb
11pantherdcbd
Do you think it will be time / resources consuming task? I’m afraid it may generate huge mess in db…
Thanks,
Sebastian
January 4, 2011 at 8:53 am
That would work. Do it in a temp table instead of the real table, and query the temp table. That way, you don't mess up the database.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 4, 2011 at 8:58 am
Yep, you are right about the temp table.
Do you think it will not be performance killer?
January 4, 2011 at 9:46 am
This works for me on the sample data. Performance might be a problem on large datasets.
CREATE TABLE #tbl1(number INT , animal VARCHAR(10) , grade CHAR(1) )
INSERT INTO #tbl1
select 1,'cat','a'
UNION ALL select 2,'cat','b'
UNION ALL select 3,'cat','c'
UNION ALL select 4,'cat','a'
UNION ALL select 5,'cat','c'
UNION ALL select 1,'dog','a'
UNION ALL select 2,'dog','c'
UNION ALL select 3,'dog','b'
UNION ALL select 4,'dog','d'
UNION ALL select 4,'dog','a'
UNION ALL select 5,'dog','c'
UNION ALL select 6,'dog','c'
UNION ALL select 1,'panther','a'
UNION ALL select 2,'panther','a'
UNION ALL select 3,'panther','b'
UNION ALL select 4,'panther','c'
UNION ALL select 5,'panther','a'
UNION ALL select 6,'panther','c'
UNION ALL select 7,'panther','b'
UNION ALL select 8,'panther','a'
UNION ALL select 9,'panther','c'
UNION ALL select 10,'panther','b'
UNION ALL select 11,'panther','d'
DECLARE @SEARCH VARCHAR(100)
SELECT @SEARCH = --'a c b'
'a c b d'
;WITH
CTE_ROWID
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY animal ORDER BY animal) as RW
, *
FROM #tbl1
)
,CTE_CONCAT
AS
(
SELECT
CURR.RWAS RW
,CURR.NumberAS ORIGNUMBER
,CURR.gradeAS ORIGGRADE
,CURR.animal
,CAST(CURR.grade AS NVARCHAR(200)) as grade
FROM
CTE_ROWID CURR
UNION ALL
SELECT
PREV.RW
,CURR.ORIGNUMBERAS ORIGNUMBER
,CURR.ORIGGRADEAS ORIGGRADE
,CURR.animal
,CAST(PREV.grade + ' ' + CURR.grade AS NVARCHAR(200)) as grade
FROM
CTE_CONCAT CURR
INNER JOIN CTE_ROWID PREVON CURR.animal = prev.animal
and CURR.RW = PREV.RW + 1
)
SELECT
CTE_CONCAT.ORIGNUMBERas number
,CTE_CONCAT.animal
,CTE_CONCAT.ORIGGRADEas grade
FROM CTE_CONCAT
WHERE
grade = @SEARCH
drop table #tbl1
If you are using a temp table you might be able to trim the results to just the length of the pattern you are searching for. You can also add the level of recursion to the CTE based query and add it to the where clause.
January 5, 2011 at 2:31 am
Thanks guys for your help.
Yesterday I tried to solve that problem. With your suggestions I was moving forward. But... I got additional requirement now. The requirement is:
pattern may contain ranges e.g.
a [a-c] b d
Since I have all alphabet in the use it can be even:
[a-c] [b-c] [g-l] [y-z]
Argh...
But it is not a problem... It is a challenge :w00t:
January 5, 2011 at 2:44 am
Sebastian,
In the sample data you included this:
1 dog a
2 dog c
3 dog b
4 dog d
4 dog a
5 dog c
6 dog c
Note, the 4 for dog d and a is repeated. Is this a typo, or is does this row numbering actually repeat? The reason I'm asking is order control, and if it repeats, you can't be sure by any ordering if "4 dog d" or "4 dog a" comes first except dumb luck and it'll need another external control.
I may have a reasonably fast method to deal with this otherwise, however, I'll need to know that first.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 5, 2011 at 3:02 am
Good point,
It is an error. Sorry for that. I made the sample data manually.
January 5, 2011 at 3:27 am
First, tertiusdp needs to be thanked for making consumable sample data.
Now, for the code. I modified the sample data to remove the typo:
IF OBJECT_ID('tempdb..#tbl1') IS NOT NULL
DROP TABLE #tbl1
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp
IF OBJECT_ID('tempdb..#tmp2') IS NOT NULL
DROP TABLE #tmp2
CREATE TABLE #tbl1(number INT , animal VARCHAR(10) , grade CHAR(1) )
INSERT INTO #tbl1
select 1,'cat','a'
UNION ALL select 2,'cat','b'
UNION ALL select 3,'cat','c'
UNION ALL select 4,'cat','a'
UNION ALL select 5,'cat','c'
UNION ALL select 1,'dog','a'
UNION ALL select 2,'dog','c'
UNION ALL select 3,'dog','b'
UNION ALL select 4,'dog','d'
UNION ALL select 5,'dog','a'
UNION ALL select 6,'dog','c'
UNION ALL select 7,'dog','c'
UNION ALL select 1,'panther','a'
UNION ALL select 2,'panther','a'
UNION ALL select 3,'panther','b'
UNION ALL select 4,'panther','c'
UNION ALL select 5,'panther','a'
UNION ALL select 6,'panther','c'
UNION ALL select 7,'panther','b'
UNION ALL select 8,'panther','a'
UNION ALL select 9,'panther','c'
UNION ALL select 10,'panther','b'
UNION ALL select 11,'panther','d'
This first component simply pivots the pattern field sideways.
This article will help you understand what it's doing if you require it:
http://www.sqlservercentral.com/articles/FOR+XML+PATH/70203/
SELECTDISTINCT animal,
REPLACE((SELECTgrade AS 'data()'
FROM#tbl1 AS t2
WHEREt1.animal = t2.animal
ORDER BY number
FOR XML PATH('')
), ' ', '') AS Pattern
INTO
#tmp
FROM
#tbl1 AS t1
SELECT * FROM #tmp
Next, we need to locate the positions where your pattern occurs. You'll have to use @PatLen as an input instead of deriving it. This uses a Tally table. You can find an explanation of that in my signature, but it's basically just a stored list of Numbers you can use instead of a counter loop, and runs much faster.
What we're doing is locating when the pattern exists in each subset of data (in this case, 4 characters, starting from each position), finding that pattern, then subtracting 1. This is the row position of the END of the pattern chain.
DECLARE @pattern VARCHAR(100),
@PatLen INT
SET @pattern = 'a[a-c]bd'
--SET @PatLen = LEN( @pattern)
SET @PatLen = 4
SELECT
t.Animal,
tl.N + @PatLen -1 AS RowNumToFind
INTO
#tmp2
FROM
#tmp AS t,
tempdb..Tally AS tl
WHERE
tl.N < LEN( t.Pattern)
AND SUBSTRING( t.Pattern, tl.N, @PatLen) like @Pattern
Finally, apply rownumbers to each partition of animal, and reference our previously located pattern ends.
SELECT
drv.*
FROM
(SELECT
*,
ROW_NUMBER() OVER ( PARTITION BY tb.Animal ORDER BY tb.number) AS RowNum
FROM
#tbl1 AS tb
) AS drv
JOIN
#tmp2 AS t2
ONdrv.Animal = t2.Animal
AND drv.RowNum = t2.RowNumToFind
I broke it up to make it easier to see each component. You could subquery the entire lot into a single query, however, I would recommend splitting at the #tmp2 portion either way, to save yourself extra overhead.
Let me know if that does what you need.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 5, 2011 at 4:30 am
Craig,
It works. I have to make some tweaks. E.g. pattern comes from SELECT from another table and I have to add some more conditions but it does what it should. I'm going to test it on larger amount of data. It has to be available almost instantly.
It is part of a process that selects n following rows and then gives a kind of histogram.
Fun, fun, fun 🙂
Thank all of you guys. It was and still is interesting challenge. Senores, I owe you cerveza. 😉
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply