February 24, 2009 at 6:41 pm
If memory serves correctly, the outer cylinders of a disk actually have more sectors of data than the inner cylinders simply because there's more real estate. Even the old "Winchester" disks were formatted that way. The "new" read/write heads are smaller, more sensitive, and much more accurate, so they can keep up with the "edge speeds".
Part of the reason why the "middle cylinders" used to be the sweet spot is the physical size of the read head and the angle at which it was held relative to the cylinder paths themselves... when hard disks were larger in diameter, the head skew was much greater, and that affected the "bit spacing" on both the outer tracks and the inner cylinders. That's not so much of a problem anymore.
Of course, I haven't taken one apart in years to measure such things so my information isn't first hand anymore and I could definitely be wrong about these super fast disks they have now.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 24, 2009 at 7:04 pm
You're a lot more current than I am, so I'll take your word for it. Thanks 🙂
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 24, 2009 at 8:45 pm
Bob Hovious (2/23/2009)
Also, I think you missed the point about compression. If I return 1000 rows that read
John|Smith|1,2,3 (assume that's the average number of values)
instead of 3000 rows that read
John|Smith|1
John|Smith|2
John|Smith|3
it looks to me like I'm saving (John|Smith)*2 in exchange for a couple of commas. That seems like a pretty good trade.
I agree... for that short of a concatenation and that long of a "base", I most definitely agree.
But I've been running into folks (mostly out in the real world) who want to concatenate 30 or more columns to a 6 or 7 digit ID... the number of commas then outweighs the byte savings of your good example, and the server is punished twice: once to do the concatenation and once to send the extra bytes formed by the commas.
Ordinarily I'd say "It Depends", as it obviously does, but some folks just don't know to do an analysis the way someone like you would, and they end up just beating the crap out of the server with string aggregates. Although your fine example is on the "worth it" side, I've found myself taking the high road and just telling folks, "Don't do it in the database, do it in the GUI (if you have one)". That way, no one in a pinch gets the idea to do it on 30 or more columns because they saw an example with 3 columns.
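Just to illustrate the shape of it (the table and column names below are strictly made up), that anti-pattern looks something like this... every value still travels once, the commas are pure added bytes, and the server pays the CPU to glue it all together:
--===== Hypothetical example only: delimiting 30 columns against a short ID
     -- adds 29 comma bytes per row and saves nothing, because each value
     -- was only going to be sent once anyway.
SELECT CAST(EmpID AS VARCHAR(7))
       + ',' + Col01
       + ',' + Col02
       + ',' + Col03 --...and so on through Col30
  FROM dbo.SomeWideTable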
--Jeff Moden
Change is inevitable... Change for the better is not.
February 24, 2009 at 8:49 pm
Oh yeah... almost forgot... the big reason why I wanted to know what the OP was going to do with the string aggregates is that a lot of people think that a lot of rows are bad and will summarize a table by storing a string aggregate and then getting rid of the original table... a "Bozo-No-No" for sure.
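For anyone wondering why that's so bad: once the detail rows are gone, every future query has to parse the stored string just to do basic work. A quick sketch (the summary table is made up for illustration):
--===== Hypothetical: the detail table is gone and only the string aggregate
     -- remains. Even counting the items per group now means string-parsing
     -- on every single query.
SELECT SomeInt,
       ItemCount = LEN(SomeCSV) - LEN(REPLACE(SomeCSV,',','')) + 1
  FROM dbo.SummaryOnly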
--Jeff Moden
Change is inevitable... Change for the better is not.
February 24, 2009 at 9:01 pm
My first thought was, "You're kidding.....", but I know you are not.
O geez.
By the way, I experimented with a larger number of departments (20), and the concatenation started running a good bit faster than the simple display. The output size to disk of the concatenated query was around 100K, and the output size of the simple display was around a meg. Easy to see why:
more rows concatenated per employee = greater "compression ratio"
What I'm carrying away from this is that using FOR XML to concatenate values from a number of rows is a special case. Performance is a function of the size of the values being concatenated, the average number of rows in a group, and the average size of the values in the columns that are grouped upon.
For things like firstname|middleInitial|lastname|suffix, I will always follow the general rule and send them as discrete columns for the UI to do with as it will.
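For anyone following along, here's roughly how I could have compared the output sizes in T-SQL instead of writing files... the table and column names are made up to mimic my employee test, and the idea is just to total the bytes of each result shape with DATALENGTH:
SELECT DiscreteBytes = SUM(DATALENGTH(Department) + DATALENGTH(LastName))
  FROM dbo.Employees

SELECT ConcatBytes = SUM(DATALENGTH(d.Department) + DATALENGTH(d.Names))
  FROM (SELECT e1.Department,
               Names = STUFF((SELECT ',' + e2.LastName
                                FROM dbo.Employees e2
                               WHERE e2.Department = e1.Department
                                 FOR XML PATH('')),1,1,'')
          FROM dbo.Employees e1
         GROUP BY e1.Department) d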
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 24, 2009 at 9:33 pm
Test data...
--===== Conditionally drop the test table to make reruns easy
IF OBJECT_ID('dbo.JBMTest','U') IS NOT NULL
        DROP TABLE dbo.JBMTest
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
Test code...
SET STATISTICS TIME ON
--===== Query 1: the string aggregate... one comma-delimited row per SomeInt
SELECT t1.SomeInt,
       STUFF((SELECT ',' + t2.SomeLetters2
                FROM dbo.JBMTest t2
               WHERE t1.SomeInt = t2.SomeInt
                 FOR XML PATH('')),1,1,'') AS Nodes
  FROM dbo.JBMTest t1
 GROUP BY t1.SomeInt
--===== Query 2: the simple display... the same data as discrete rows
SELECT SomeInt, SomeLetters2
  FROM dbo.JBMTest
SET STATISTICS TIME OFF
Test results...
(50000 row(s) affected)
SQL Server Execution Times:
CPU time = 19281 ms, elapsed time = 29566 ms.
(1000000 row(s) affected)
SQL Server Execution Times:
CPU time = 703 ms, elapsed time = 25840 ms.
The concatenation used 27.42 times more CPU and, if you consider the display as the I/O, the concatenation took longer in duration as well.
Even though the concatenation sent about half the number of bytes (because the concatenated operands were so short), which do you suppose is tougher on the server?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 25, 2009 at 9:02 am
Jeff, this is a tremendous learning experience for me, so please hang in with me a little while longer.
I notice that you are grouping on an int column, while I was grouping on two varchar columns with a combined value length of 15-20 bytes. That costs a lot of the "compression" advantage (5-to-1 in this case). But I'm not, and never will be, surprised at the CPU difference.
However, I just had my eyes opened when I wrote the query results to disk to see how big the difference was in output size. I've been testing the queries by displaying the results, because SQL Management Studio is a UI and I figured that was the fairest test. When I display the results I get the same CPU numbers, but the elapsed time for the concatenation is half that of the simple display. Literally, I have all the concatenated results displayed in 11 seconds, while the simple display takes 24 seconds.
I had been assuming that the bottleneck was on the SQL Server, and that the difference in elapsed time was due to the work (CPU and I/O) of buffering and transporting a million rows instead of 50,000. This isn't reflected in the CPU measurement, and I wasn't prepared just to ignore that work on the server. My thinking had been that I would accept a short CPU spike to get a job over with and out of the system. Most of the CPU cycles on our SQL servers are used for thumb-twiddling, and it seemed like common sense that the query that ran in half the time was the more efficient query.
But when I wrote the results to files (on my laptop), I saw elapsed times similar to yours. (I'm running the code on a shared development server and viewing the results on my laptop, connected by our internal LAN.)
Since writing to disk produces different results than displaying the data, even though I'm writing to disk on my laptop, the difference in elapsed time must be due to having to wait for the laptop to display. I'm assuming that the wait is a minimal amount of work for the server, but assuming has gotten me into trouble before. Am I on the right track with my analysis so far?
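If it helps, here's a sketch of how I might take the display out of the measurement entirely next time... assigning the results to throwaway variables (using Jeff's dbo.JBMTest from above) so the rows never leave the server:
--===== The variable assignments force the full query to run, but no rows
     -- are sent to the client, so STATISTICS TIME reflects only server work.
DECLARE @BitBucket1 INT,
        @BitBucket2 VARCHAR(2)
SET STATISTICS TIME ON
SELECT @BitBucket1 = SomeInt,
       @BitBucket2 = SomeLetters2
  FROM dbo.JBMTest
SET STATISTICS TIME OFF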
-----------
By the way, thanks for taking the time to work me through this. I have learned a great deal from reading your articles and your forum posts, but perhaps the greatest lesson you reminded me of is to test and to understand rather than just blindly follow what someone says. Unfortunately, I'm applying that lesson to you as well, even though everyone tells me "If Jeff says it, just go with it."
There's an old Zen saying: "If you meet the Buddha on the road, kill him." Don't worry, I'm not out to get you... I'm just seeking my own enlightenment 😉 When the light bulb goes on, I can change my entire way of thinking and approaching problems in a heartbeat. (I was quite prepared to keep arguing, until I saw the elapsed time results for writing to disk.) So thanks again.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 25, 2009 at 7:55 pm
Bob Hovious (2/25/2009)
Unfortunately, I'm applying that lesson to you as well, even though everyone tells me "If Jeff says it, just go with it."
I absolutely agree with and support that thought. Too many people have become SQL Clones and it's becoming a bloody epidemic. Very happy that you're testing right along here.
(I was quite prepared to keep arguing, until I saw the elapsed time results for writing to disk.)
Every once in a while, especially when I'm tired, I'll let my guard down and argue... and it never works because "A man, forced against his will, is of the same opinion still." So, I normally just let the code talk for me. Nothing else matters... just the code. 😉
So thanks again.
You bet... thank you for being a gentleman about it all. I've seen many a heated battle on this forum about all sorts of stuff. None of it mattered until someone posted code and others tried it.
Shifting gears back to string aggregates...
You mention that I used an ID instead of a name... that's what I've seen lots of folks do, and you can see the results even when using just 2 letters in the concatenation. What scenario would you like to see? You post the table definition and the distribution of data that you'd like to see, and I'll write some code to test it. Just don't go nuts on me; I don't have a huge amount of extra time because work is an hour-and-ten-minute drive away... on good days.
The hard part for me is that I usually don't mix business with pleasure when it comes to company machines. But, the best way to test this is to come closer to real life where the database server isn't the machine that's issuing the query. If you have a particular scenario that you'd like to test, like I said, post the table definition and the distribution of data that you'd like to see and I'll see if I can give it a whirl at work after work.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 26, 2009 at 7:55 am
Establishing that the elapsed time I was seeing was due to display issues, not a load on the SQL Server, pretty much convinced me. I'm not tossing the FOR XML trick out of my toolbox, but I will definitely be more cautious about when and if I use it.
My only point about the INT was that it makes for less of a "compression" by concatenation. When I've been asked for this kind of thing, it has usually been a request from the reports group for reports of some sort with a full name and some personal information included on each row. I will post up one more example for you later today, but I think I can surrender with honor from this test war.
Can my men keep their horses, for the spring plowing? 😉
------------------------------------------------------------
P.S.
thank you for being a gentleman about it all
I left IBM due to a mental illness that compelled me to go to law school and practice briefly as a litigation attorney before deciding I just HATED it. I've spent years dealing with people who acted like s.o.b.'s because they felt it was a professional obligation. So, I've had my fill of pointless anger and posturing. I just want to get better at what I do and hopefully not stop learning until I die... and maybe not even then.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
January 9, 2011 at 5:57 am
Check out this link
Display column of different rows as column of a single row - http://itdeveloperzone.blogspot.com/2010/12/display-column-of-different-rows-as.html
January 9, 2011 at 6:25 am
sandeepmittal11 (1/9/2011)
Check out this link: Display column of different rows as column of a single row - http://itdeveloperzone.blogspot.com/2010/12/display-column-of-different-rows-as.html
Did you realize this thread is almost two years old?
And the link you posted shows the same solution as Jeff did in his latest post.