April 7, 2009 at 4:39 pm
This is a sample for a larger problem I'm trying to solve.
I am looking for a list of users who have three consecutive blank comments.
I have tried using Row_Number and Jeff's "Quirky Update" but I just can't seem to figure it out.
Here's some sample DDL & DATA to work with:
-- sample table
CREATE TABLE #CheckIn
(
CheckInID INT IDENTITY(1,1)
,UserIDINT
,Comment VARCHAR(50)
)
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
SELECT *
FROM #CheckIn
-- clean up
DROP TABLE #CheckIn
I would expect to see UserID 1 as a result.
Thanks!
April 7, 2009 at 5:14 pm
I think this may have something to do with the amount of data being manipulated.
The table we are talking about contains 4 million to 10 million rows.
The update method seems to work on a subset of the data, but it seems to act weird when there are millions of rows.
Has anyone experienced this?
April 7, 2009 at 5:26 pm
Hi. What is your current update statement?
Bevan
April 7, 2009 at 5:36 pm
Here's what I have now. It seems like it's working correctly.
However, when I try it on a real data table with 5 million rows it returns inconsistent results.
-- create a temp table for running total because we can't alter existing production table
CREATE TABLE #OrderedCheckIn
(
CheckInIDINTNOT NULL
,UserIDINTNOT NULL
,CommentVARCHAR(50)
,GroupCounterINT
)
INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)
SELECT CheckInID
,UserID
,Comment
FROM #CheckIn
--===== MUST have a clustered primary key to GUARANTEE this will work
ALTER TABLE #OrderedCheckIn
ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)
--===== Create the required local variables
DECLARE @user-id INT
,@GroupCounter INT
,@CheckInID INT
SET @GroupCounter = 0
UPDATE #OrderedCheckIn
SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)
THEN @GroupCounter
ELSE @GroupCounter + 1 END
,@UserID = UserID
,@CheckInID = CheckInID
FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!
SELECT UserID
FROM #OrderedCheckIn
GROUP BY GroupCounter
,UserID
HAVING COUNT(*) > 3
-- clean up
DROP TABLE #OrderedCheckIn
DROP TABLE #CheckIn
April 7, 2009 at 5:47 pm
hi Goldie,
Before you posted that code I was trying to solve this another way - by concatenating all the comments and then searching for three blanks in a row
with cte(UserID, Comments)
as
(select distinct a.userID ,(select ',' + Comment from #CheckIn b where a.userID = b.userID for XML path('')) as comments
from #CheckIn a
group by a.UserID)
select * from cte where Comments like '%,,,%'
Does that work for you? I am still looking at the other code you posted.
Bevan
April 8, 2009 at 9:10 am
Now THAT's one really smart idea!
With the addition of an ORDER BY and a trailing comma it works perfectly.
Here's the final code:
;WITH OrderedCheckIn AS
(
SELECT UserID
, (SELECT ',' + ISNULL(Comment,'')
FROM #CheckIn B
WHERE B.UserID = A.UserID
ORDER BY B.CheckInID
FOR XML PATH('')) + ',' AS AllComments
FROM #CheckIn A
GROUP BY UserID
)
SELECT UserID
FROM OrderedCheckIn
WHERE AllComments LIKE '%,,,,%'
April 8, 2009 at 9:21 am
You can also check this out:
--- sample table
CREATE TABLE #CheckIn
(
CheckInID INT IDENTITY(1,1)
,UserID INT
,Comment VARCHAR(50)
)
--- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C';
WITH BlankComments (
RowNum,
CheckInID,
UserID,
Comment
) as (
SELECT
row_number() over (partition by UserID, Comment order by UserID, Comment) as RowNum,
CheckInID,
UserID,
Comment
FROM #CheckIn
)
select
UserID,
Comment
from
BlankComments
where
RowNum = 3 and
Comment = ''
--- clean up
DROP TABLE #CheckIn
April 8, 2009 at 9:55 am
That won't work, Lynn. It doesn't take into account consecutive blanks.
For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
UNION ALL SELECT 2, ''
April 8, 2009 at 10:01 am
Goldie Graber (4/8/2009)
That won't work, Lynn. It doesn't take into account consecutive blanks.For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
UNION ALL SELECT 2, ''
I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.
April 8, 2009 at 10:05 am
Lynn Pettis (4/8/2009)
I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.
There are not 3 consecutive blanks for UserID 2.
His Comments are 'B','','','C',''.
UserID 1 has 3 consecutive blanks
His Comments are 'A','','',''
April 8, 2009 at 10:18 am
I missed th 2 'C'. Sorry. Still looking at an alternative solution.
April 20, 2009 at 5:19 pm
As a post script, I ended up using Bevan's method.
It works well on tables with up to a few million records. Takes a minute or two.
When I had to run it on a table of 22 million it took over an hour!!
We will need to run this code in our production environment every once in a while, so I would appreciate if someone could help tweak this code or come up with a different approach.
On a different note, has anyone had trouble using Jeff's running total method before?
It seems to fail on tables with a few million records.
April 20, 2009 at 8:04 pm
Goldie Graber (4/20/2009)
As a post script, I ended up using Bevan's method.It works well on tables with up to a few million records. Takes a minute or two.
When I had to run it on a table of 22 million it took over an hour!!
We will need to run this code in our production environment every once in a while, so I would appreciate if someone could help tweak this code or come up with a different approach.
On a different note, has anyone had trouble using Jeff's running total method before?
It seems to fail on tables with a few million records.
I have not had any trouble with the "quirky update" method and I have tested it on a test table of up to 10 million records with out any problems. The only problem that I am aware of with the "quirky update" method is on partitioned tables. I wrote an article that addressed that issue, and you can find a link to that article below in my signature block.
I haven't had an opportunity to revisit this yet, but I will try to this evening.
April 20, 2009 at 8:08 pm
Is there more to the real world data that would support the sequence of data entered? is there a date field or other form of identification that signifies the order of entry into the table?
April 20, 2009 at 8:22 pm
Lynn Pettis (4/8/2009)
Goldie Graber (4/8/2009)
That won't work, Lynn. It doesn't take into account consecutive blanks.For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
UNION ALL SELECT 2, ''
I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.
I see how you have defined this as 3 consecutive rows - but, how is the system supposed to do that? What columns define the order such that the row 2 'B' comes before row 2 'C' and the rows with 2 and a blank somehow fall between?
I think this solution by Lynn would work if you created the ordering and partitioning.
On another note, I think the problem with your 'quirky update' method is the final select statement. You are looking for rows where you have a count(*) > 3 - but I don't see where the count is adjusting when you find a blank.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
Viewing 15 posts - 1 through 15 (of 38 total)
You must be logged in to reply to this topic. Login to reply