August 1, 2011 at 3:23 pm
I have to manage various kinds of "data" and maintain "versions" of that data. To simplify the problem I have created a dummy data table and a dummy version table that has foreign key to the data id. I could just type the rest in English but the SQL code speaks for itself. My questions is at the bottom in SQL comments. Need to figure out which is faster because I'm getting sporadic results. Sometimes one is faster and sometimes the other.
One can simply copy / paste the SQL and run it
Thanks in advance.
---------------------------------
CREATE TABLE SomeData (
SomeDataIdBIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
SomeTextVARCHAR(64),
CreateDtDATETIME NOT NULL DEFAULT (GETDATE()),
)
CREATE TABLE SomeDataVersion (
SomeDataVersionIdBIGINT IDENTITY(1, 1) NOT NULL,
SomeDataIdBIGINT NOT NULL REFERENCES SomeData(SomeDataId),
CreateDtDATETIME NOT NULL DEFAULT (GETDATE())
)
DECLARE @Id BIGINT
INSERT INTO SomeData (SomeText) VALUES ('DontCare'); SELECT @Id = SCOPE_IDENTITY();
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeData (SomeText) VALUES ('DontCare'); SELECT @Id = SCOPE_IDENTITY();
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeData (SomeText) VALUES ('DontCare'); SELECT @Id = SCOPE_IDENTITY();
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
SELECT * FROM SomeData
SELECT * FROM SomeDataVersion
-- Which is FASTER?
-- THIS?
SELECT SomeDataId, MAX(SomeDataVersionId)
FROM SomeDataVersion t
GROUP BY SomeDataId
-- OR THIS?
;WITH cte AS (
SELECT
t.SomeDataVersionId,
t.SomeDataId,
ROW_NUMBER() OVER (PARTITION BY t.SomeDataId ORDER BY t.SomeDataVersionId DESC) AS RowNum
FROM SomeDataVersion t
)
SELECT cte.SomeDataId, cte.SomeDataVersionId FROM cte WHERE RowNum = 1
August 1, 2011 at 4:06 pm
Row_Number() is rarely faster, and usually only due to bad indexing/heaps, or because the data is incredibly limited that it has to number before moving forward. On large data tests it has failed rather badly on a number of benchmarks I've built.
Use MAX() or CrossApply as your default go-tos for logging table pulls. Use MAX() as your primary if you can't add supporting indexes for the Cross Apply (in particular because you need one in a DESC order for the TOP 1 clause).
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
August 1, 2011 at 4:10 pm
August 1, 2011 at 7:40 pm
Pagan DBA (8/1/2011)
-- Which is FASTER?
Do what any of us would do... TEST IT and see! 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 2, 2011 at 6:19 am
You guys actually test your code? More power to you :hehe:.
August 2, 2011 at 6:36 am
Performance is dependant on the amount of data and indexing. Even with indexes on the columns you are using for the ROW_NUMBER(), you still need to pull all results and then rank them based on your partiioning and ordering. If you just need the top 1, as suggested before, I have found that a subquery with MAX and GROUP BY is much quicker. I have also used CROSS APPLY as someone else suggested and it was faster than the ROW_NUMBER() method. I would limit ROW_NUMBER() and it's siblings to when you actually need to rank something vs pulling top 1 type operations.
August 2, 2011 at 7:29 am
Seriously, I did say I was getting sporadic results. How would I know that if I hadn't tested?
The idea behind the question was to get some insight into SQL and SQL Server.
Thanks to the gents who actually gave useful answer regarding ROW_NUMBER() rarely being faster on larger datasets and setting me straight on MAX()/GROUP BY
As an aside, I'm sorry but BIT(1) = 1/0 is my preferred choice to VARCHAR(1) = Y/N. This is because in spite of my efforts, I have always found a bozo who will circumvent whatever protection I offer using database constraints to but thinks like B (both?!?!?!) or U (unknown?!?!?!?) and mess up my application. Can't put say 9 in a BIT(1) can we now?
I really don't want to have to TEST my code to find this out. Prevention is the best cure. Y/N is for report writers, and they can write the logic for display.
Apologies for hijacking my own thread.
August 2, 2011 at 7:42 am
I assure you nobody wanted to insult you.
The bottom line is that each server / app is different and that I rarely if ever see something really standard accross the board.
The only rule I've seen stay true all those years is the less work the server does the better. As annoying as this explaination is it always hold true.
With that in mind, you can read this to understand a little better how the server works and how to see what's going on with each of your queries :
August 2, 2011 at 9:29 am
Pagan DBA (8/2/2011)
...
As an aside, I'm sorry but BIT(1) = 1/0 is my preferred choice to VARCHAR(1) = Y/N. This is because in spite of my efforts, I have always found a bozo who will circumvent whatever protection I offer using database constraints to but thinks like B (both?!?!?!) or U (unknown?!?!?!?) and mess up my application. Can't put say 9 in a BIT(1) can we now?
...
First of all you can put 9 in a BIT(1) , it will just turn to 1 😀
The second: it's not only 0 and 1 can be there... There is a third possible value: NULL
So, you can always find a hardcore bozo who will circumvent whatever protection you can setup and mess up your application. You just need to look in right places :hehe:
August 2, 2011 at 2:44 pm
Pagan DBA (8/2/2011)
As an aside, I'm sorry but BIT(1) = 1/0 is my preferred choice to VARCHAR(1) = Y/N. This is because in spite of my efforts, I have always found a bozo who will circumvent whatever protection I offer using database constraints to but thinks like B (both?!?!?!) or U (unknown?!?!?!?) and mess up my application. Can't put say 9 in a BIT(1) can we now?
If you're discussing this line from my signature:
Space is cheap! nVARCHAR(1) all your BITs today!
It's a joke. 😀 8 bits takes up one byte, 8 nVARCHAR(1)'s (Nullable) takes up 32 bytes. Per record. 😉
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
August 2, 2011 at 11:10 pm
Pagan DBA (8/2/2011)
Seriously, I did say I was getting sporadic results. How would I know that if I hadn't tested?The idea behind the question was to get some insight into SQL and SQL Server.
Seriously... the 9 rows you posted isn't a performance test. The reason why you're getting sporadic results is because both run so fast with that small bit of data that you tested with that even the heartbeat of SQL Server will interfer with such a paltry "test".
It turns out that the gents gave the correct answer [font="Arial Black"]but YOU still don't really know [/font]because YOU haven't learned how to test and they provided no evidence, not even a link, that they're correct in what they say. What would you have done, as so often happens on forums, if they gave the wrong answer? Would you still believe them just because there were two of them? Of course you would because you just did that very same thing. 😉
Since you still don't really know if they're right or not, do like I first suggested... test it.
Here's your first lesson on how to make enough test data for this particular comparison. 😉 I'll let you play with indexes.
USE TempDB
;
IF OBJECT_ID('TempDB.dbo.SomeDataVersion','U') IS NOT NULL
DROP TABLE TempDB.dbo.SomeDataVersion
;
SELECT TOP (1000000)
SomeDataVersionID = IDENTITY(INT,1,1),
SomeDataID = ABS(CHECKSUM(NEWID()))%250000+1,
CreateDt = (RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2010','2020'))+CAST('2010' AS DATETIME)
INTO dbo.SomeDataVersion
FROM sys.all_columns ac1,
sys.all_columns ac2
;
Protect yourself. Believe no one when it comes to questions like "Which is faster?"... not even me. 😉 Always do your own performance testing and learn what performance testing entails. I assure you that it doesn't have anything to do with 9 rows like you posted or even 10,000 rows.
Performance testing also has nothing to do with returning the data to the screen. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 3, 2011 at 4:50 am
Just as a minor point you should never use varchar(1) anyway - it takes up more storage than a char(1), regardless of if they both have a value in them or not.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 3, 2011 at 9:01 am
Err...the code sample I provided was to serve to explain my question. The sporadic results were on my REAL problem where I have several pieces of data EACH versioned and unpredictable data distributions in terms of how many versions each piece of data has. I couldn't possibly post sample code for all scenarios here. As has been explained to me in multiple replies, my problem is specific and I get that. Unfortunately I guess my code sample was perceived in a way one would think I'm asking a general question, which I wasn't.
As to what I really don't know yet, it is true I'm not as educated as most people here as far as SQL is concerned. The reason I joined this forum - to learn.
Finally, one gent DID provide a link on how to write a test. "It Depends" - where performance of TOP(1), MAX()/GROUP BY and ROW_NUMBER() is being compared. The single most succinct, useful and unpatronizing reply to my original question and which is what I plan to use by experimenting with various indexes on my tables.
Thanks all.
August 3, 2011 at 6:49 pm
Pagan DBA (8/3/2011)
The reason I joined this forum - to learn.
Not sure it will help at this point, but I recognized that. Looking back at what I posted, I can see why you might have thought so, but I wasn't trying to be patronizing.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2011 at 5:20 am
I published a series of tests on this subject for retrieving versioned data over here at Simple-Talk[/url]. It might prove useful. The answer, in a nutshell, it depends.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply