April 30, 2009 at 10:49 am
Of course!
Actually I thought some of you might have a better solution than I might come up with.
Anyone got a better performing solution than Gail?
April 30, 2009 at 10:59 am
Nope, just would have written it with a CTE instead of a derived table.
April 30, 2009 at 11:13 am
Steve Jones - Editor (4/30/2009)
Anyone got a better performing solution than Gail?
Heh. Now you know we'll need to see the execution plan to better answer that. 😀
Seriously though, we do need to see it. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 30, 2009 at 11:31 am
Hmm, depending on the duplicates density, this might be faster:
;WITH cteUsers as (
SELECT UserID
, UserName
, ROW_NUMBER() OVER (Partition By UserName Order By LastLoggedIn DESC) incr
FROM Users
)
UPDATE cteUsers
SET UserName = UserName + CAST(incr-1 AS VARCHAR(3))
Where incr > 1
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 30, 2009 at 11:40 am
RBarryYoung (4/30/2009)
Steve Jones - Editor (4/30/2009)
Anyone got a better performing solution than Gail?Heh. Now you know we'll need to see the execution plan to better answer that. 😀
Use this to populate rather than a few hardcoded rows
Insert Into Users (UserName, LastLoggedIn)
SELECT TOP 40000 CHAR(65+FLOOR(RAND(a.column_id*801020 + a.column_id*705020)*12)) + CHAR(65+FLOOR(RAND(a.column_id*6010 + b.object_id)*5)), DATEADD(dd,RAND(a.column_id*600 + b.object_id/1.5)*500,'2008/01/01')
FROM master.sys.columns a CROSS JOIN master.sys.columns b
It gives lots of dups, expand the varchar in the update to VARCHAR(5).
Exec plan attached.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Users'. Scan count 2, logical reads 80538, physical reads 0
Table 'Worktable'. Scan count 1, logical reads 80842, physical reads 0
SQL Server Execution Times:
CPU time = 422 ms, elapsed time = 445 ms.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 30, 2009 at 11:44 am
RBarryYoung (4/30/2009)
Hmm, depending on the duplicates density, this might be faster:
Better on IOs, worse on time (at least on my PC)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Users'. Scan count 1, logical reads 81083
SQL Server Execution Times:
CPU time = 1250 ms, elapsed time = 1291 ms.
Take the where clause out and it's way better than the update with join
;WITH cteUsers as (
SELECT UserID
, UserName
, ROW_NUMBER() OVER (Partition By UserName Order By LastLoggedIn DESC) incr
FROM Users
)
UPDATE cteUsers
SET UserName = UserName + CAST(incr-1 AS VARCHAR(5))
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 22 ms.
Table 'Users'. Scan count 1, logical reads 81199.
SQL Server Execution Times:
CPU time = 438 ms, elapsed time = 458 ms.
Since there's no way to index that incr column, the where's not helping much. Might help with writes. I've got way too many dupes in the test data to tell.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 30, 2009 at 12:16 pm
GilaMonster (4/30/2009)
RBarryYoung (4/30/2009)
Hmm, depending on the duplicates density, this might be faster:Better on IOs, worse on time (at least on my PC)
Yeah, I'm pretty sure that there's a whole "seek vs scan" threshold in there, even on the output side. My SWAG would be less than 5%, seek (use the WHERE clause), more than 20%, definitely scan (no WHERE clause), in-between, "it depends".
Take the where clause out and it's way better than the update with join
;WITH cteUsers as (
SELECT UserID
, UserName
, ROW_NUMBER() OVER (Partition By UserName Order By LastLoggedIn DESC) incr
FROM Users
)
UPDATE cteUsers
SET UserName = UserName + CAST(incr-1 AS VARCHAR(5))
Don't forget to put the CASE back in if you do.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 30, 2009 at 12:37 pm
Ha! I knew someone would take up the challenge.
I'll run a couple execution plans on these at night, see which one works better in my 800k row table.
April 30, 2009 at 12:44 pm
RBarryYoung (4/30/2009)
Don't forget to put the CASE back in if you do.
True. That's what I get for posting while playing computer games....
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 30, 2009 at 1:17 pm
OK, I used to following to generate test data:
delete from Users
Insert into Users
Select TOP 1000000
Left(c1.name, 19)
+ right(cast(newid() as varchar(36)), 2)
+ Left(c2.name, 19)
, Cast( abs(c1.object_id+c2.column_id)/1000.0 as datetime) as LastLoggedIn
From master.sys.system_columns c1, master.sys.system_columns c2
Which gives about 50,000 duplicates which is 5% (it varies though and this technique does not produce any really long chains, though I don't think that that should matter much).
At that density the WHERE version takes about 5 sec on my system and the No-WHERE version takes about 20. So 10% is probably a good dividing line.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 30, 2009 at 1:25 pm
The other thing to beware of is that there are issues with names whose length hits the max length (40 chars). I didn't see it until I used my last generating method, because it tends to produce really long names.
The upshot is that if you have 40-char names that are duplicates, you cannot just add a digit on to the end because then the name will be too long and it will fail.
I doubt that you have very many of these, Steve and you may not have any, so it's questionable how much work you want to do to cover this case. I would suggest just ignoring them by putting a "LEFT(... , 40)" around the final value of the UserName assignment.
Then if you have any of them and still need to fix them, use a different query or just do it by hand. Seriously, there cannot be that many, who uses a 40-character name?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 30, 2009 at 1:36 pm
ALTER TABLE Users ALTER COLUMN UserName VARCHAR(42)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 30, 2009 at 3:19 pm
Here's another approach (using a large hammer):
DECLARE@sql VARCHAR(500);
SET@sql = 'ALTER DATABASE ' + DB_NAME() + ' SET RECOVERY BULK_LOGGED;'
EXECUTE(@sql)
IF@@error != 0 RETURN;
;WITHCTE (UserID, SeqNo, UserName, LastLoggedIn) AS
(
SELECTUserID,
ROW_NUMBER() OVER (PARTITION BY UserName ORDER BY LastLoggedIn DESC) AS SeqNo,
UserName + CONVERT(VARCHAR(10), ROW_NUMBER() OVER (PARTITION BY UserName ORDER BY LastLoggedIn DESC)), LastLoggedIn
FROMdbo.Users
)
SELECTUserId, UserName, LastLoggedIn
INTOdbo.[9D2213FB-3FAB-40BA-9F7C-7C3B9CB8A7F7]
FROMCTE
WHERESeqNo > 1
BEGIN TRANSACTION
BEGIN TRY
DROP TABLE dbo.Users;
EXECUTE sp_rename N'dbo.[9D2213FB-3FAB-40BA-9F7C-7C3B9CB8A7F7]', N'Users', 'OBJECT'
END TRY
BEGIN CATCH
IF XACT_STATE() != 0 ROLLBACK TRANSACTION;
RAISERROR('Carp!', 16, 1);
END CATCH
COMMIT TRANSACTION;
144 logical reads, < 400ms elapsed on my laptop.
Paul
April 30, 2009 at 3:30 pm
Excluding, of course, creation of any constraints or indexes
With the same table that I used earlier, I make it
Table 'Users'. Scan count 1, logical reads 269, physical reads 0
SQL Server Execution Times:
CPU time = 109 ms, elapsed time = 373 ms.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 30, 2009 at 3:40 pm
I used:
Insert Into dbo.Users (UserName, LastLoggedIn)
SELECT TOP 40000 CHAR(65+FLOOR(RAND(a.column_id*801020 + a.column_id*705020)*12)) + CHAR(65+FLOOR(RAND(a.column_id*6010 + b.object_id)*5)), DATEADD(dd,RAND(a.column_id*600 + b.object_id/1.5)*500,'2008/01/01')
FROM master.sys.columns a CROSS JOIN master.sys.columns b
from a previous post...was that yours? If so, the royalty cheque (check) is in the post 🙂
Yes I forgot to re-create the primary key - silly boy. And yes, other stuff may well need to be added. It was just an alternative idea.
ALTER TABLE dbo.Users ADD CONSTRAINT [PK dbo.Users UserID] PRIMARY KEY CLUSTERED (UserID) WITH (ONLINE=OFF,MAXDOP=4,FILLFACTOR=100);
[font="Courier New"]Table 'Users'. Scan count 1, logical reads 152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 62 ms, elapsed time = 225 ms.[/font]
Viewing 15 posts - 16 through 30 (of 36 total)
You must be logged in to reply to this topic. Login to reply