May 8, 2009 at 6:55 pm
Comments posted to this topic are about the item A SQL Server Issue? (Database Weekly May 11, 2009)
May 9, 2009 at 11:47 am
According to this:
http://sqlcat.com/faq/archive/2009/05/08/windows2008-r2-beta-download-runs-smoothly-now.aspx
...this had nothing to do with SQL Server, CPU spikes, page splitting or GUIDs. The site was simply configured for an expected 20% increase in traffic (and was prepared to handle a 100% increase) but in the event load went up by 500%...!
Given that information, direct from the SQL Server team, it's no wonder things slowed down a bit.
As always, making bold statements before the facts are known is liable to make the participants look a little silly...:-P
Cheers,
Paul
May 10, 2009 at 12:35 am
Uh oh, the dreaded clustered vs. non-clustered argument. Well, in this specific case, it might have been better to go with the non-clustered option. But it's easy to say that after the fact, isn't it?
To quote from one of my favorite TV shows EVER - "This has all happened before, and it will all happen again."
(Battlestar Galactica 2004, btw)
James Stover, McDBA
May 10, 2009 at 1:50 am
James,
Have you read the article at the link I posted?
Paul
May 10, 2009 at 4:09 am
Paul White (5/10/2009)
James,
Have you read the article at the link I posted?
Paul
Er, no sorry. Read it just now. OK, so it was a capacity issue unrelated to SQL. Well, that's good news. Unfortunately, thanks to the lightning speed at which misinformation spreads on the internet, this issue will probably be "sticky" for SQL Server for a while.
Regarding the ZDNet blog, I quote:
"Ed Bott is an award-winning technology writer with more than two decades' experience writing for mainstream media outlets and online publications."
Well Ed, the SQLCAT team has posted an official explanation. Where is your official correction on the topic? Award-winning journalists publicly correct their misinformation.
James Stover, McDBA
May 10, 2009 at 4:17 am
Hey James,
Yes it is spreading, but there is some fire-control too:
http://sqlblog.com/blogs/andrew_kelly/archive/2009/05/09/so-the-real-story-is.aspx
It will be interesting to see if Ed does post a correction.
Paul
May 11, 2009 at 9:37 am
Paul,
Excellent link and thanks for posting.
I don't think it changes the point of the editorial: Microsoft should make clear that the blame, or the explanation, lies with the architecture and planning, not the platform.
May 11, 2009 at 12:25 pm
Yes, it was a capacity planning issue because they hadn't tested the schema for that load. As soon as they rebuilt the clustered index to remove the fragmentation, things sped up. Not a SQL problem, a design and planning problem.
And the CAT team article didn't say anything about what it *wasn't* - only that it was a capacity planning issue. That's a nice catch-all for all kinds of performance problems...
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog: www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
May 11, 2009 at 12:27 pm
I pinged the ZDNet writer. He is looking into it and will likely post an update.
May 11, 2009 at 12:36 pm
And also, if it turns out that what was reported originally was BS, I did caveat my blog post by saying "Now, this is slight conjecture, as I don't know the exact schema, but it's the only thing that explains what's been divulged so far".
I'll be happy to hear a definitive statement that it had nothing to do with the unanticipated load on the schema and lack of frequent defrags.
Thanks
Paul Randal
May 11, 2009 at 2:51 pm
Latest I've heard from the SQL team today is that the absolute root-cause is still being investigated.
Paul Randal
May 11, 2009 at 3:50 pm
Steve, Paul
Hey guys. My main concern here is that Steve's editorial, and Paul's article (even after the edit) may well give people the wrong impression.
If it is the intent of the editorial to berate Microsoft for not providing correct information, and for handling the whole issue badly, then I agree whole-heartedly. I am not sure that the editorial makes this entirely clear, however. The link to Paul's article and the surrounding text very much gives the impression that something fundamental about SQL Server was responsible.
Of course that could still turn out to be the case, but it seems much more likely that the 500% increase in traffic simply exceeded the designed capacity of either the hardware or the database design. I guess we will see.
For my money, both the editorial and Paul's article have way too confident an aura about the cause. Remember that many people seeing Paul's name and an SSC editorial will take what is read as gospel.
Even if both are revised when (if) the true cause is known, the damage will have been done.
Cheers,
Paul
May 11, 2009 at 4:03 pm
Fair enough, Paul White. I'm not sure what I could do here now other than watch for more info and revise things when it's available.
If it's not clear to you, perhaps I need to go through it again and build a better message for this event and the future.
Any comments on what I could say better or differently?
May 11, 2009 at 4:14 pm
I know where you're coming from - no offense taken!
Interesting - my article is specifically about how it *wasn't* a SQL Server problem - but how bad schema/design/testing led to SQL Server being overloaded because of what it was being told to do. SQL Server was (I'm speculating) being asked to perform under a high load with a bad, non-scalable schema - not a problem with the design of SQL Server itself.
I thought I'd made that clear several times in the article... SQL Server couldn't handle the load put on the bad schema. Do you think that wasn't clear enough in the article?
Cheers
Paul Randal
May 11, 2009 at 4:20 pm
Heh - another good discussion on SSC - win!
Guys,
I don't quite have the ego to assume that the world at large will read the editorial and article the way I did. Not quite 🙂
I'm no writer or editor, so I won't presume to advise you guys on what to write. I would just say that, for me, a greater emphasis could be placed on Microsoft not handling the PR very well, the problems probably being caused by the hardware or design simply being overwhelmed by 5x the planned traffic, and that investigations are continuing.
I don't know - it sounds like everyone pretty much agrees what should be said anyway - maybe it's just me that is reading the wrong things into it? Nah! :laugh:
Paul