Best practices in querying

  • This is just a general question, but let's say I was going to create a stored procedure and reference a table called "Students" that contained 1000 student records. Of those students, let's say I want only those who are graduating, who took at least one math class, and who received a scholarship in 2009. The resulting number may end up being 200.

    These are all fictitious tables, but I would normally join each of them together and then filter in my WHERE clause... nothing special.

    In a stored procedure, does it make sense to join table upon table and stack WHERE criteria upon WHERE criteria, or would it make more sense (and be more efficient?) to load everything into a temp table and then delete the unwanted groups one at a time?

    Something like:

    SELECT *
    INTO #students
    FROM students

    -- keep only graduating students
    DELETE FROM #students
    WHERE studentid NOT IN (SELECT studentid FROM graduates)

    -- keep only students who took at least one math class
    DELETE FROM #students
    WHERE studentid NOT IN (SELECT studentid FROM classes WHERE subject = 'Math')

    -- keep only students who received a 2009 scholarship
    DELETE FROM #students
    WHERE studentid NOT IN (SELECT studentid FROM scholarships WHERE scholarship_yr = '2009')

    SELECT * FROM #students -- 200 records

    I'm sure many of you have a quick answer one way or the other, but I never really thought about doing things the second way and wonder if there's any merit to it. In one sense, breaking things down into smaller parts is sometimes easier to follow. Of course, I can see that the latter method requires several separate statements against the database, which is probably not the most efficient way of doing it.

    Thoughts?

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • In general it's more efficient to query for just the rows you want.
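
    For the example above, that would be a single SELECT with the filters expressed directly. A sketch, reusing the fictitious tables from the question:

    SELECT s.*
    FROM students s
    WHERE EXISTS (SELECT 1 FROM graduates g
                  WHERE g.studentid = s.studentid)
      AND EXISTS (SELECT 1 FROM classes c
                  WHERE c.studentid = s.studentid
                    AND c.subject = 'Math')
      AND EXISTS (SELECT 1 FROM scholarships sch
                  WHERE sch.studentid = s.studentid
                    AND sch.scholarship_yr = '2009')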

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail is correct; deleting in this manner will incur far more reads, which will have an impact on resources. It's best to insert only what you need.

    MCITP SQL 2005, MCSA SQL 2012

  • A crude analogy: you own a fruit orchard but are only interested in selling apples today. Is it less work to pick all the apples and all the pears, load them into a wheelbarrow, move them to your fruit stand, and then sort out the pears? Or is it less work to pick only the apples, load only apples into your wheelbarrow, and then move the lighter load?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • It is generally more efficient to use a single query and let SQL Server do the work. However, there are times when the query or process becomes so complex that it is better to divide & conquer.

    The approach that you outlined, though, is not one that I would ever recommend. It can end up building huge tables, and then adds a lot of overhead to delete the data you do not want in the final results.

    When faced with a situation where I need to divide & conquer, I will still filter down to the minimum amount of data needed for that step. Instead of getting everything and deleting what I don't want, I just select what I do want into a temp table.

    For example, let's say I need to find all of the invoices for the past year where a particular transaction exists on that invoice. The end result needs data from the invoice header, invoice detail, transaction detail, order detail, and several other tables. When putting this all together into a single query, it just doesn't perform well enough.

    Instead, I might first run a query that creates a temp table from the invoice detail, identifying just those invoices I need. Then I can join to that temp table to get all of the other details.
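
    In sketch form (the table and column names here are hypothetical, just to show the shape of it):

    -- Step 1: identify only the invoices that carry the transaction of interest
    SELECT DISTINCT d.invoice_id
    INTO #invoices
    FROM invoice_detail d
    WHERE d.transaction_code = 'ABC'                      -- hypothetical filter
      AND d.invoice_date >= DATEADD(YEAR, -1, GETDATE())

    -- Step 2: join the small temp table back to the other tables for the rest of the columns
    SELECT h.invoice_number, d.line_amount, t.transaction_date
    FROM #invoices i
    INNER JOIN invoice_header h ON h.invoice_id = i.invoice_id
    INNER JOIN invoice_detail d ON d.invoice_id = i.invoice_id
    INNER JOIN transaction_detail t ON t.invoice_id = i.invoice_id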

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • There are select times when adding more data than is needed and then deleting makes more sense. I had to do that in one place because the query that populated the table was complicated enough that SQL Server didn't handle the extra filtering well, and the plan was less efficient than doing the delete afterward. That isn't something I would expect to need to do often, though.

  • cfradenburg (5/27/2011)

    There are select times when adding more data than is needed and then deleting makes more sense. I had to do that in one place because the query that populated the table was complicated enough that SQL Server didn't handle the extra filtering well, and the plan was less efficient than doing the delete afterward. That isn't something I would expect to need to do often, though.

    I have always found that when this situation occurs, people tend to forget that the next step in the process (it is always a step in a process) can just use a filter in the WHERE clause to exclude those extra rows.

    In other words, I have never found a good reason for performing a delete on a temp table that I just created. Nor have I found any good reasons to perform updates on a temp table that was just created.

    I have found that in most cases I can eliminate the non-required data up front, or eliminate it in the following steps by using a filter to exclude those rows (or to include only those I really need).
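
    A trivial sketch of what I mean (the enrollment_status and lastname columns are hypothetical):

    -- Instead of deleting rows from the temp table you just built...
    -- DELETE FROM #students WHERE enrollment_status = 'Withdrawn'

    -- ...just exclude them in the next step's WHERE clause:
    SELECT s.studentid, s.lastname
    FROM #students s
    WHERE s.enrollment_status <> 'Withdrawn'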

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs
