April 20, 2009 at 9:18 pm
Lynn Pettis (4/20/2009)
Is there more to the real world data that would support the sequence of data entered? is there a date field or other form of identification that signifies the order of entry into the table?
Yes, there is a DateTime field as well as a Primary Key field. (Like CheckInID in this example)
I have been using the Primary Key field for the sequence because it is indexed and the DateTime field is not.
April 20, 2009 at 9:21 pm
Lynn Pettis (4/20/2009)
I have not had any trouble with the "quirky update" method and I have tested it on a test table of up to 10 million records with out any problems. The only problem that I am aware of with the "quirky update" method is on partitioned tables. I wrote an article that addressed that issue, and you can find a link to that article below in my signature block.
Yes, I read your article. 🙂 In fact, it was the first place I looked when I began to have some difficulties with this method. I don't think that table is partitioned. Is there a way to check?
I haven't had an opportunity to revisit this yet, but I will try to this evening.
April 20, 2009 at 9:28 pm
Sounds like you need to redefine your problem somewhat. How about providing something closer to the actual problem we are working on. Can you give me something closer to what you really have to work with?
April 20, 2009 at 9:34 pm
Jeffrey Williams (4/20/2009)
I see how you have defined this as 3 consecutive rows - but, how is the system supposed to do that? What columns define the order such that the row 2 'B' comes before row 2 'C' and the rows with 2 and a blank somehow fall between?
By using the primary key field. That's how the order is determined. There is a DateTime field as well, but I prefer using the primary key because it is indexed.
I think this solution by Lynn would work if you created the ordering and partitioning.
I think I might be missing something. Is there a way to know if a table is partitioned?
On another note, I think the problem with your 'quirky update' method is the final select statement. You are looking for rows where you have a count(*) > 3 - but I don't see where the count is adjusting when you find a blank.
Here's what I'm trying to do. First I am doing a "quirky" update with a temp table. I am creating "groups" of data. The group counter increments when we start a new group. In this case a new user or a non-blank field. After that I am counting the rows to see if 3 blanks exist.
Not sure if that makes sense.
April 20, 2009 at 9:40 pm
I believe that this will do the job:
SELECT Distinct UserID
From (
SELECT *
, ROW_NUMBER() Over(partition by UserId, Comment, CmtTypeGrp order by CheckInID) as GrpCnt
From (
SELECT CheckInID
, UserID
, comment
, ROW_NUMBER() Over (partition by UserID order by CheckInID) -
ROW_NUMBER() Over (partition by UserID, Comment order by CheckInID) as CmtTypeGrp
FROM #CheckIn
) g
Where Comment = ''
) U
Where GrpCnt = 3
(I am ecstatic that I got to use this trick before Jeff Moden 😀 ).
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 20, 2009 at 9:41 pm
Lynn Pettis (4/20/2009)
Sounds like you need to redefine your problem somewhat. How about providing something closer to the actual problem we are working on. Can you give me something closer to what you really have to work with?
As odd as this may sound, we are really pretty close to the actual problem.
I have a table which is updated from a web application which is being used by thousands of users at once. Every time the user navigates a record is added to my table.
My requirement is to first filter out certain data, and then find all users who have 3 blank records in a row. The only parts I've skipped out here is having extra filter conditions, and the actual nature of the data.
April 20, 2009 at 9:45 pm
Based upon your sample data - you don't have a primary key. The examples given here don't include a primary key definition.
For either solution to work - you need to identify the ordering for each group, and how each partition is defined.
Once you have that defined, we can modify Lynn's row_number() so it works correctly.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
April 20, 2009 at 9:56 pm
Well, I am a little confused. Why isn't CheckInID sufficient as a primary key & ordering principle?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 20, 2009 at 9:57 pm
Jeffrey Williams (4/20/2009)
Based upon your sample data - you don't have a primary key. The examples given here don't include a primary key definition.
My bad. Here's the same table with a primary key.
-- sample table
CREATE TABLE #CheckIn
(
CheckInID INT IDENTITY(1,1) PRIMARY KEY
,UserID INT
,Comment VARCHAR(50)
,CheckInDate DATETIME DEFAULT(GetUtcDate())
)
For either solution to work - you need to identify the ordering for each group, and how each partition is defined.
The order for each group is the UserID and then CheckInID (or CheckInDate).
I don't know what you mean by partition here.
April 20, 2009 at 10:10 pm
RBarryYoung (4/20/2009)
I believe that this will do the job:
SELECT Distinct UserID
From (
SELECT *
, ROW_NUMBER() Over(partition by UserId, Comment, CmtTypeGrp order by CheckInID) as GrpCnt
From (
SELECT CheckInID
, UserID
, comment
, ROW_NUMBER() Over (partition by UserID order by CheckInID) -
ROW_NUMBER() Over (partition by UserID, Comment order by CheckInID) as CmtTypeGrp
FROM #CheckIn
) g
Where Comment = ''
) U
Where GrpCnt = 3
(I am ecstatic that I got to use this trick before Jeff Moden 😀 ).
Very cool! It works perfectly. Although I don't really get how it works. Would you explain?
I just tested it on real data -- 1.6 million records.
Your solution took 23 seconds, Bevan's solution took 3. (Not official times, just SSMS time from query window)
I'm going to try to test it tommorrow on a larger data set.
April 20, 2009 at 10:19 pm
Here is another attempt at solving your problem:
CREATE TABLE #CheckIn
(
CheckInID INT IDENTITY(1,1)
,UserID INT
,Comment VARCHAR(50)
)
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
UNION ALL SELECT 2, ''
-- create a temp table for running total because we can't alter existing production table
CREATE TABLE #OrderedCheckIn
(
CheckInIDINTNOT NULL
,UserIDINTNOT NULL
,CommentVARCHAR(50)
,GroupCounterINT
)
INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)
SELECT CheckInID
,UserID
,Comment
FROM #CheckIn
--===== MUST have a clustered primary key to GUARANTEE this will work
ALTER TABLE #OrderedCheckIn
ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)
--===== Create the required local variables
DECLARE @user-id INT
,@GroupCounter INT
,@CheckInID INT
SET @GroupCounter = 0;
SET @user-id = -1;
UPDATE #OrderedCheckIn
SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)
THEN @GroupCounter + 1
ELSE 1 END
,@UserID = UserID
,@CheckInID = CheckInID
FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!
SELECT * --UserID
FROM #OrderedCheckIn
select
UserID
from
#OrderedCheckIn
group by
UserID, GroupCounter
having
GroupCounter > 3;
-- clean up
DROP TABLE #OrderedCheckIn
DROP TABLE #CheckIn
Take a look and let me know.
April 21, 2009 at 3:49 pm
Lynn: I can't get your solution to work correctly. GroupCounter is always 1 on my live data. I can't understand it.
Barry: As I suspected, it looks like your solution out-performs Bevan's as the volume increases.
On a table of 22+ million records, your solution took only 6 minutes.
Last time I ran Bevan's on that same table it took more than one hour!
I tried to run it now and it's been going for 15 minutes. I'm going to cancel it before I get a karate chop from my DBA...
Also, Barry, would you be able to explain your query?
April 21, 2009 at 4:03 pm
Hi Goldie,
Do you have an index on UserID?
April 21, 2009 at 7:08 pm
Goldie Graber (4/21/2009)
Also, Barry, would you be able to explain your query?
Sorry Goldie, not trying to be evasive but my brain isn't at 100% right now and this technique is hard to explain.
Here is an article where it is used and explained by Itzik Ben-Gan: http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html
Here's another thread where it is used and Jeff and I talk about it. http://www.sqlservercentral.com/Forums/Topic646593-338-1.aspx
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 21, 2009 at 8:19 pm
Lynn Pettis (4/20/2009)
Here is another attempt at solving your problem:
CREATE TABLE #CheckIn
(
CheckInID INT IDENTITY(1,1)
,UserID INT
,Comment VARCHAR(50)
)
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
UNION ALL SELECT 2, ''
-- create a temp table for running total because we can't alter existing production table
CREATE TABLE #OrderedCheckIn
(
CheckInIDINTNOT NULL
,UserIDINTNOT NULL
,CommentVARCHAR(50)
,GroupCounterINT
)
INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)
SELECT CheckInID
,UserID
,Comment
FROM #CheckIn
--===== MUST have a clustered primary key to GUARANTEE this will work
ALTER TABLE #OrderedCheckIn
ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)
--===== Create the required local variables
DECLARE @user-id INT
,@GroupCounter INT
,@CheckInID INT
SET @GroupCounter = 0;
SET @user-id = -1;
UPDATE #OrderedCheckIn
SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)
THEN @GroupCounter + 1
ELSE 1 END
,@UserID = UserID
,@CheckInID = CheckInID
FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!
SELECT * --UserID
FROM #OrderedCheckIn
select
UserID
from
#OrderedCheckIn
group by
UserID, GroupCounter
having
GroupCounter > 3;
-- clean up
DROP TABLE #OrderedCheckIn
DROP TABLE #CheckIn
Take a look and let me know.
I'm not sure why it isn't working for you. I run the above code as is and it works correctly with the test data.
Can you post the DDL and code you are trying to run?
Viewing 15 posts - 16 through 30 (of 38 total)
You must be logged in to reply to this topic. Login to reply