Are the posted questions getting worse?

  • Michael L John wrote:

    x wrote:

    x wrote:

    Brandie Tarvin wrote:

    Michael L John wrote:

    Here's the query.  There's not any understanding on how the the language works, index usage (or lack of usage), and what these things do.

    It works in dev, why is is slow in production???

    select   distinct    U.ClientID from (select userID from TableA where DateColumn > @StartDate 
    union select userID from TableB where Success = 1 and DateColumn> @StartDate) l
    inner join UserTable u on u.userID = l.userID

    The aggravating part is that I worked very closely with the original development team on the architecture and code.  The WORST proc/orm code that runs against the database was averaging 4 MILLISECONDS.

    Now, with this new crew, we have queries taking many minutes.

    Brain bleach! WHERE IS THE BRAIN BLEACH???

    I could totally make that code worse, but I could also write better code in my sleep. Good grief.

    Well lets see the query then! Please include an explanation why yours will use indexes while the original one won't. I'm going to admit that I'm probably like the poor 6 row guy and don't understand why indexes won't help that query!

    Also, would a query that targets a 6 row table EVER use an index? I would hazard a guess that it wouldn't but I'm n0t a wiz like you guys !

    And another thing, 6 rows for development in the first place, well that is a really low effort setup. Who made the decision for this developer to write efficient queries then toss him 6 rows?

    I'm sure I'm the dumb one here but that's why I'm here, to learn!

    Six rows was an exaggeration.   There are a 200-300k rows in the dev environment for each of the tables.  In prod, there are ~60 million in each of the tables.

    Lynn, as for what this is trying to do, it's attempting to get a list of clients ID's who have had people log in over the past 12 months.  Where the design goes haywire is that the last login date may be in two places, one in the "user token" table, the second in a "login attempt" table.   A record may exist in one or both tables, so they need to check both places.  Figuring out why there are two places is something I am working on.

     

    Okay, so TableA and TableB in the derived table are those two source tables where the login info may exist. So why the distinct in the outer query when there are no duplicates because of the UNION in the derived table? I won't bother to throw out any ideas on the this as I am sure you may have some good ideas of your own.

     

  • Six rows was an exaggeration.

    I suspect that's not the only exaggeration LOL

    Now is someone going to rewrite that query or not?

     

     

  • What percentage of the rows of each table are less than 12 months ago?

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Okay, so TableA and TableB in the derived table are those two source tables where the login info may exist. So why the distinct in the outer query when there are no duplicates because of the UNION in the derived table? I won't bother to throw out any ideas on the this as I am sure you may have some good ideas of your own.

    Why would this disallow using indexes though? I would go in the entire other direction, and expect that SQL might detect and drop the superfulous "distinct" but that's of course dependent on whether the query optimizer would have code to detect this.

     

  • Jeff Moden wrote:

    What percentage of the rows of each table are less than 12 months ago?

    All of them.  The system is only 9 months old.

    Lynn, there are a lot of index changes for this database that are in various states of being deployed.  There are 3 that will directly help this query, as well as quite a few other queries.  These are in QA.

    X, the re-write goes like this:

    WITH CTE (Pull all the user ids who have logged in for last 12 months)

    SELECT Client ID's WHERE Exists (select From CTE)

    You're not getting the actual code, because I do not feel like obfuscating it, and the original discussion was not about how to tune a query, but more a vent about developers who don't think past their fingertips.

    With the indexes, and the rewrite, there are 3 index seeks as opposed to scans, and in dev the execution time improvement went from ~40 seconds to less than 1 second.

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Why even use a CTE? Why not just join to those tables?

    -------------------------------------------------------------------------------------------------------------------------------------
    Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses

  • Michael L John wrote:

    Jeff Moden wrote:

    What percentage of the rows of each table are less than 12 months ago?

    All of them.  The system is only 9 months old.

    Lynn, there are a lot of index changes for this database that are in various states of being deployed.  There are 3 that will directly help this query, as well as quite a few other queries.  These are in QA.

    X, the re-write goes like this:

    WITH CTE (Pull all the user ids who have logged in for last 12 months)

    SELECT Client ID's WHERE Exists (select From CTE)

    You're not getting the actual code, because I do not feel like obfuscating it, and the original discussion was not about how to tune a query, but more a vent about developers who don't think past their fingertips.

    With the indexes, and the rewrite, there are 3 index seeks as opposed to scans, and in dev the execution time improvement went from ~40 seconds to less than 1 second.

    Hey thats fine, but being accurate doesn't really need the exact code, I try to tell folks to act as if they're logging a service call and try to duplicate the issue. You'll notice that I'm not asking "what the code is for," because all I was interested in was why this snippet won't use indexes.

     

  • x wrote:

    Michael L John wrote:

    Jeff Moden wrote:

    What percentage of the rows of each table are less than 12 months ago?

    All of them.  The system is only 9 months old.

    Lynn, there are a lot of index changes for this database that are in various states of being deployed.  There are 3 that will directly help this query, as well as quite a few other queries.  These are in QA.

    X, the re-write goes like this:

    WITH CTE (Pull all the user ids who have logged in for last 12 months)

    SELECT Client ID's WHERE Exists (select From CTE)

    You're not getting the actual code, because I do not feel like obfuscating it, and the original discussion was not about how to tune a query, but more a vent about developers who don't think past their fingertips.

    With the indexes, and the rewrite, there are 3 index seeks as opposed to scans, and in dev the execution time improvement went from ~40 seconds to less than 1 second.

    Hey thats fine, but being accurate doesn't really need the exact code, I try to tell folks to act as if they're logging a service call and try to duplicate the issue. You'll notice that I'm not asking "what the code is for," because all I was interested in was why this snippet won't use indexes.

    Hi Patrick. I don't know the exact code or tables, but I'd think 6 rows would qualify as a trivial plan. Just read the (likely 1) page and call it quits.  If that's really the scenario, the QO should call that a "good enough" plan and skip the rest.

    However, if Mike wrote this code, then never mind. Right, Mike? 😉

  • Okay, so TableA and TableB in the derived table are those two source tables where the login info may exist. So why the distinct in the outer query when there are no duplicates because of the UNION in the derived table? I won't bother to throw out any ideas on the this as I am sure you may have some good ideas of your own.

    Well in one trivial example I tried, the number of sorts are the same whether the "distinct" clause is used or not. I suspect it depends on the query complexity tho.

     

  • I think the most obvious way to rewrite that query would be:

    SELECT u.UserId
    FROM UserTable u
    WHERE EXISTS(SELECT *
    from TableA a
    where DateColumn > @StartDate
    and a.UserId = u.UserId)
    OR EXISTS(select *
    from TableB b
    where Success = 1
    and DateColumn > @StartDate
    and b.UserId = u.UserId)

    which I'm sure would perform better.

     

  • Ed Wagner wrote:

    x wrote:

    Michael L John wrote:

    Jeff Moden wrote:

    What percentage of the rows of each table are less than 12 months ago?

    All of them.  The system is only 9 months old.

    Lynn, there are a lot of index changes for this database that are in various states of being deployed.  There are 3 that will directly help this query, as well as quite a few other queries.  These are in QA.

    X, the re-write goes like this:

    WITH CTE (Pull all the user ids who have logged in for last 12 months)

    SELECT Client ID's WHERE Exists (select From CTE)

    You're not getting the actual code, because I do not feel like obfuscating it, and the original discussion was not about how to tune a query, but more a vent about developers who don't think past their fingertips.

    With the indexes, and the rewrite, there are 3 index seeks as opposed to scans, and in dev the execution time improvement went from ~40 seconds to less than 1 second.

    Hey thats fine, but being accurate doesn't really need the exact code, I try to tell folks to act as if they're logging a service call and try to duplicate the issue. You'll notice that I'm not asking "what the code is for," because all I was interested in was why this snippet won't use indexes.

    Hi Patrick. I don't know the exact code or tables, but I'd think 6 rows would qualify as a trivial plan. Just read the (likely 1) page and call it quits.  If that's really the scenario, the QO should call that a "good enough" plan and skip the rest.

    However, if Mike wrote this code, then never mind. Right, Mike? 😉

    I now suspect I have zero insight into the situation especially once Mike said he was "paraphrasing / exaggerating / venting" but thats what I get for poking my nose into someone elses thread LOL

    However it was cool to see SQL ditch the extra sort when it sees a superfluous "distinct," so I learned something anyways!

     

  • X, the re-write goes like this:

    WITH CTE (Pull all the user ids who have logged in for last 12 months)

    SELECT Client ID's WHERE Exists (select From CTE)

    This would not work, it could not progress the service call because in your first post there was a union.

     

  • x wrote:

    X, the re-write goes like this:

    WITH CTE (Pull all the user ids who have logged in for last 12 months)

    SELECT Client ID's WHERE Exists (select From CTE)

    This would not work, it could not progress the service call because in your first post there was a union.

    He left out the details of the code in the CTE.

  • I feel like I'm missing something. An inner join is necessarily an existence check, why not:

    SELECT u.UserId
    FROM UserTable u
    JOIN TableA a ON a.UserId = u.UserId
    JOIN TableB b ON b.UserId = u.UserId
    WHERE a.DateColumn > @StartDate
    AND b.Success = 1
    AND b.DateColumn > @StartDate

    -------------------------------------------------------------------------------------------------------------------------------------
    Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses

  • jonathan.crawford wrote:

    I feel like I'm missing something. An inner join is necessarily an existence check, why not:

    SELECT u.UserId
    FROM UserTable u
    JOIN TableA a ON a.UserId = u.UserId
    JOIN TableB b ON b.UserId = u.UserId
    WHERE a.DateColumn > @StartDate
    AND b.Success = 1
    AND b.DateColumn > @StartDate

    more data would be consumed on average. "if exists" makes the determination after the first row fufills the predicate whereas just joining looks to read more from tablea and tableb than needed, also I get "cartesianish" vibes somehow.

    also, you're requiring a match in both tablea and tableb but I'm a bit hazy whether that matches the requirements.

Viewing 15 posts - 64,876 through 64,890 (of 66,738 total)

You must be logged in to reply to this topic. Login to reply