Database Query Optimization: How to Improve Query Performance in SQL

  • I have a SQL database that stores a large amount of data, and I'm experiencing slow query performance when retrieving information. What are some strategies and best practices for optimizing database queries in SQL to achieve better performance?

    Here's an example query I'm working with:

    SELECT first_name, last_name, email
    FROM users
    WHERE registration_date >= '2023-01-01'
    ORDER BY last_name ASC;

    While this query works, it's becoming slow as the dataset grows. Could you provide guidance on how to optimize this query and potentially use indexes or other techniques to speed up the retrieval of data? Additionally, are there any common pitfalls I should be aware of when optimizing SQL queries? Your insights, including code examples if applicable, would be greatly appreciated. Thank you!

  • Step 1 would be to get into the habit of using the 2 part naming convention.  That doesn't have so much to do with performance in this case (although there is some) but it is possible to have more than 1 schema with each having a "users" table.  Bulletproof you code by using the 2 point naming convention.  For example...

     SELECT first_name, last_name, email
    FROM dbo.users
    WHERE registration_date >= '2023-01-01'
    ORDER BY last_name ASC
    ;

    You should try adding a non-clustered index on the registration_date column.  If you want it to really fly, turn it into a "covering" index by including the first_name, last_name, email column as an INCLUDE in the index.

    There will be some that suggest there should be a clustered index on the registration date column instead of the non-clustered index suggested above.  I assume that you have a "UserID" (or similar) column in this table and that its both the PK and the clustered index.  In this case and based on what I think the actual usage for this table is, I would NOT move the clustered index to the registration_date column.

    You can find the official MS documentation about clustered and non-clustered indexes at the following URL  Be sure to click on the embedded links, like "Create indexes with included columns" in the text. Scroll down to the bottom of that brief introduction to the "Next Steps" section for links for each type of index.

    https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described

    I'd post the actual code but your profile says you want to learn.  Reading the official MS documentation should always be included (probably first) because people sometimes miss things and aren't always right.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It depends, but, yes, if this query is vital for you, then change the clustering key to ( registration_date, UserID /*assuming a unique UserID*/ ) (the PK can still be whatever you want it to).

    Since you have a lot of users, if this query is frequently issued, you should also consider changing the clus key to ( registration_date, last_name, UserID /*assuming a  unique UserID*/).  This will prevent repeated sorts of the data, which are very expensive to do.

    I prefer the clus key to match the needs of that table rather be based on some arbitrary general "rule" about "all" PKs/clus keys.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • You could add an index:

    CREATE INDEX IX_users_1 ON users(registration_date) INCLUDE (first_name, last_name, email);

    If you're not sure what to do just put the query in a script and run it in Database Engine Tuning Advisor. This will give recommendations for improving the performance of the query.

    Screenshot 2023-09-12 180920

  • If you are going to do that -- and then keep modifying that index endlessly as the base query add columns and otherwise changes -- you might as well do something like this, again, to prevent the overhead of a sort for every query:

    CREATE INDEX users__IX_registration_date_last_name ON dbo.users ( registration_date, last_name ) 
    INCLUDE ( email, first_name) /*I list the INCLUDEd columns in alpha order: might as well, right?*/ WITH ( DATA_COMPRESSION=xx, FILLFACTOR=nn, SORT_IN_TEMPDB = ON ) ON [PRIMARY] /*other fg name*/;

    The fillfactor won't really matter until/unless the index is rebuilt, so naturally you can leave that off if you want.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    If you are going to do that -- and then keep modifying that index endlessly as the base query add columns and otherwise changes -- you might as well do something like this, again, to prevent the overhead of a sort for every query:

    CREATE INDEX users__IX_registration_date_last_name ON dbo.users ( registration_date, last_name ) 
    INCLUDE ( email, first_name) /*I list the INCLUDEd columns in alpha order: might as well, right?*/ WITH ( DATA_COMPRESSION=xx, FILLFACTOR=nn, SORT_IN_TEMPDB = ON ) ON [PRIMARY] /*other fg name*/;

    The fillfactor won't really matter until/unless the index is rebuilt, so naturally you can leave that off if you want.

    With ORDER BY lastname in the query it would still have to sort by lastname as the primary order of the index is on registration_date

  • Jonathan AC Roberts wrote:

    ScottPletcher wrote:

    If you are going to do that -- and then keep modifying that index endlessly as the base query add columns and otherwise changes -- you might as well do something like this, again, to prevent the overhead of a sort for every query:

    CREATE INDEX users__IX_registration_date_last_name ON dbo.users ( registration_date, last_name ) 
    INCLUDE ( email, first_name) /*I list the INCLUDEd columns in alpha order: might as well, right?*/ WITH ( DATA_COMPRESSION=xx, FILLFACTOR=nn, SORT_IN_TEMPDB = ON ) ON [PRIMARY] /*other fg name*/;

    The fillfactor won't really matter until/unless the index is rebuilt, so naturally you can leave that off if you want.

    With ORDER BY lastname in the query it would still have to sort by lastname as the primary order of the index is on registration_date

    Oops, quite right.  For some reason I was thinking it was a specific date selection rather than a generic.  It will still be less overhead overall since many of the sort work strings will already be in order.  Also, if the optimizer can do it, the results can be merged rather than fully sorted.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply