I've been doing a lot of reading over the past few months trying to get my
SQL 2005 skills in shape and thought I'd share a few things about the books as
well as things I learned from them. I know book reviews aren't the most exciting
reading, so I'm going to combine a few in hopes of holding your interest. The
things listed aren't the only things I learned by far, but are just things I
tagged when I had an aha moment. Maybe if you find this format useful you'll let
me know so I can do another set, or even better, maybe you can write up some of
your own and send in.
Inside Microsoft (r) SQL Server (tm) 2005: The Storage Engine (Paperback) by
Kalen Delaney, $33 at Amazon.
This is the essentially the third edition,
following the editions dedicated to SQL 7 & SQL 2000. For SQL 2005 the book has
been split into four parts, covering the storage engine, query optimization,
TSQL programming, and TSQL querying. There's no doubt that SQL 2005 is a
significant release and has a lot of new features, but I kind of miss having one
book (for ease of use and buying one book instead of four). But as far as the
content goes, first class information as always. It contains a lot of low level
information (byte level schemes for pages for example) that will be of most use
to the very advanced DBA, but great reading for someone on the road to advancing
their skills.
Some things I learned:
- You can no longer pin a table in SQL 2005. DBCC Pintable is still here
and you can run it without error, but it doesn't do anything. For those not
familiar with it, the idea was that you force SQL to maintain a table in
memory, overriding it's memory management. Take a look at it in Books Online
and I swear you have to read it twice to realize it. I never had found a
reason to pin, but it's interesting - if arcane - to know it doesn't do
anything any longer
- A nice bit on page splitting that describes it's possible for SQL to
need to split a page more than once. Imagine a scenario where you have a
full page and need to insert a row on that page, SQL physically splits the
page approximately in half, leaving about 4000 bytes available on each page.
Now we try to insert our new row on either page, but the row is large - say
5000 characters. The row won't fit and the page has to be split again. Not
sure how often it happens or if it matters too much, but it is interesting
- That I need to spend more time on the new isolation options. I get the
concepts, but I haven't mastered the details as much as I should have
- Set Deadlock_Priority has changed. Before the options were LOW and
NORMAL, now there is a new setting of HIGH, can you call also set it to an
integer in the range -10 to +10, where LOW=-5, NORMAL=0, and HIGH=5.
Paid for this one myself and it will stay on the shelf, but it's not the
first book I'd recommend you buy if you're a beginner to intermediate user.
Professional SQL Server 2005 Integration Services by Brian Knight & many others,
$35 at Amazon
Disclosure: Brian is my long time business partner, so I'm not an unbiased
reviewer, but I hope you'll still find my comments interesting.
This is a hard book to read cover to cover. Not because it's poorly written,
but rather it has a lot of detail and covers a wide range of subjects, ranging
from converting DTS packages to SSIS all the way through building custom tasks
for SSIS. I find it to be a good reference book and it's where I start when I
try something new in SSIS.
Things I learned:
- A great paragraph covers getting temp tables to work. SSIS seems to use
connections freely and then close them (good), but the downside is that if
you create a temp table at one step and then want to clean it up later (or
more likely take some action based on it) the table will be gone because
you're now on two different connections. Setting the retain same connection
property of the connection manager to true makes the problem go away.
Simple, but I wouldn't say intuitive
- Not a new lesson, but as I was working through a couple of the
early chapters I remembered how much I hate portions of SSIS. Paths have to
be escaped (C:\\blah\\etc), equal is ==,
and in some places you address variables by ordinal position! End of rant
- There's an example in Chapter 8 that imports the SSC rss feed to
demonstrate working with SSIS and one of the comments was that even though
it's importing a parent/child relationship, there is no guarantee that the
children wouldn't get inserted before the parents, causing you grief if you
have defined foreign key constraints. This seems strange to me, but falls
into the category of nice to know up front I guess
- The most interesting point was about the move away from staging data in
SSIS because it can keep most of what it needs in memory, increasing
performance and decreasing the work to manage staging. It went on to discuss
that there was still merit to staging - one example being to stage right
after a data retrieval from a slow/remote location, and another was that by
breaking it up into a couple of steps with staging you get better
restartability (checkpoints only work on the control flow). The last bit was
pretty insightful, that most of us use the tools we know - temp tables for
staging - but that the SSIS raw files were better suited for the job. It's
easy to get used to doing things one way and not realize that the tools &
rules have changed
Yep, paid for this one too and it also stays on the shelf. As a
beginner/intermediate user I find it to be a good reference, seems like it
addresses some more advanced scenarios that would be helpful to a real power
user.
SQL Server Query Performance Tuning Distilled by Sajal Dam, $35 at Amazon
This book covers SQL 2000 and was published in 2004, somehow I missed it then
and only got around to looking at a copy late last year. I found the book to be
very approachable and my only complaint was that it didn't pay enough attention
(in my opinion of course) to data partitioning strategies. It's got very good
coverage of query tuning and index design.
Things I learned:
- I've been advocated for owner (schema) qualifying objects using the two
part syntax for years, for the slight boost in performance saved due to less
work figuring out which object to access as well as the greater clarity I
think the two part conveys. I didn't realize that not using the two part
syntax can result in separate execution plans being created - maybe because
I've been using the two part syntax! Might be something worth looking at if
you're using NT authentication, it could drive some churn in the procedure
cache
- A great example of how transactions can affect performance compared
doing 256,000 inserts in a loop with and without an explicit transaction.
The explicit transaction was almost 10 times faster and the log flush rate
dropped from 1100/sec to 14/sec. This an extreme example, but it does make
it clear that transactions cause log flushes, so it's worth looking at
whether you can reduce them some by using explicit transactions (please
don't start adding transactions everywhere, remember they can affect
concurrency).
I like the book and find it useful, so much so that we're now providing all
the students that attend our performance tuning class with a copy. I hope the
author publishes a new version for SQL 2005.