May 12, 2020 at 6:37 pm
x wrote:x wrote:Brandie Tarvin wrote:Michael L John wrote:Here's the query. There's not any understanding on how the the language works, index usage (or lack of usage), and what these things do.
It works in dev, why is is slow in production???
select distinct U.ClientID from (select userID from TableA where DateColumn > @StartDate
union select userID from TableB where Success = 1 and DateColumn> @StartDate) l
inner join UserTable u on u.userID = l.userIDThe aggravating part is that I worked very closely with the original development team on the architecture and code. The WORST proc/orm code that runs against the database was averaging 4 MILLISECONDS.
Now, with this new crew, we have queries taking many minutes.
Brain bleach! WHERE IS THE BRAIN BLEACH???
I could totally make that code worse, but I could also write better code in my sleep. Good grief.
Well lets see the query then! Please include an explanation why yours will use indexes while the original one won't. I'm going to admit that I'm probably like the poor 6 row guy and don't understand why indexes won't help that query!
Also, would a query that targets a 6 row table EVER use an index? I would hazard a guess that it wouldn't but I'm n0t a wiz like you guys !
And another thing, 6 rows for development in the first place, well that is a really low effort setup. Who made the decision for this developer to write efficient queries then toss him 6 rows?
I'm sure I'm the dumb one here but that's why I'm here, to learn!
Six rows was an exaggeration. There are a 200-300k rows in the dev environment for each of the tables. In prod, there are ~60 million in each of the tables.
Lynn, as for what this is trying to do, it's attempting to get a list of clients ID's who have had people log in over the past 12 months. Where the design goes haywire is that the last login date may be in two places, one in the "user token" table, the second in a "login attempt" table. A record may exist in one or both tables, so they need to check both places. Figuring out why there are two places is something I am working on.
Okay, so TableA and TableB in the derived table are those two source tables where the login info may exist. So why the distinct in the outer query when there are no duplicates because of the UNION in the derived table? I won't bother to throw out any ideas on the this as I am sure you may have some good ideas of your own.
May 12, 2020 at 6:37 pm
Six rows was an exaggeration.
I suspect that's not the only exaggeration LOL
Now is someone going to rewrite that query or not?
May 12, 2020 at 6:39 pm
What percentage of the rows of each table are less than 12 months ago?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2020 at 6:45 pm
Okay, so TableA and TableB in the derived table are those two source tables where the login info may exist. So why the distinct in the outer query when there are no duplicates because of the UNION in the derived table? I won't bother to throw out any ideas on the this as I am sure you may have some good ideas of your own.
Why would this disallow using indexes though? I would go in the entire other direction, and expect that SQL might detect and drop the superfulous "distinct" but that's of course dependent on whether the query optimizer would have code to detect this.
May 12, 2020 at 6:51 pm
What percentage of the rows of each table are less than 12 months ago?
All of them. The system is only 9 months old.
Lynn, there are a lot of index changes for this database that are in various states of being deployed. There are 3 that will directly help this query, as well as quite a few other queries. These are in QA.
X, the re-write goes like this:
WITH CTE (Pull all the user ids who have logged in for last 12 months)
SELECT Client ID's WHERE Exists (select From CTE)
You're not getting the actual code, because I do not feel like obfuscating it, and the original discussion was not about how to tune a query, but more a vent about developers who don't think past their fingertips.
With the indexes, and the rewrite, there are 3 index seeks as opposed to scans, and in dev the execution time improvement went from ~40 seconds to less than 1 second.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
May 12, 2020 at 6:57 pm
Why even use a CTE? Why not just join to those tables?
-------------------------------------------------------------------------------------------------------------------------------------
Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
May 12, 2020 at 7:17 pm
Jeff Moden wrote:What percentage of the rows of each table are less than 12 months ago?
All of them. The system is only 9 months old.
Lynn, there are a lot of index changes for this database that are in various states of being deployed. There are 3 that will directly help this query, as well as quite a few other queries. These are in QA.
X, the re-write goes like this:
WITH CTE (Pull all the user ids who have logged in for last 12 months)
SELECT Client ID's WHERE Exists (select From CTE)
You're not getting the actual code, because I do not feel like obfuscating it, and the original discussion was not about how to tune a query, but more a vent about developers who don't think past their fingertips.
With the indexes, and the rewrite, there are 3 index seeks as opposed to scans, and in dev the execution time improvement went from ~40 seconds to less than 1 second.
Hey thats fine, but being accurate doesn't really need the exact code, I try to tell folks to act as if they're logging a service call and try to duplicate the issue. You'll notice that I'm not asking "what the code is for," because all I was interested in was why this snippet won't use indexes.
May 12, 2020 at 7:20 pm
Michael L John wrote:Jeff Moden wrote:What percentage of the rows of each table are less than 12 months ago?
All of them. The system is only 9 months old.
Lynn, there are a lot of index changes for this database that are in various states of being deployed. There are 3 that will directly help this query, as well as quite a few other queries. These are in QA.
X, the re-write goes like this:
WITH CTE (Pull all the user ids who have logged in for last 12 months)
SELECT Client ID's WHERE Exists (select From CTE)
You're not getting the actual code, because I do not feel like obfuscating it, and the original discussion was not about how to tune a query, but more a vent about developers who don't think past their fingertips.
With the indexes, and the rewrite, there are 3 index seeks as opposed to scans, and in dev the execution time improvement went from ~40 seconds to less than 1 second.
Hey thats fine, but being accurate doesn't really need the exact code, I try to tell folks to act as if they're logging a service call and try to duplicate the issue. You'll notice that I'm not asking "what the code is for," because all I was interested in was why this snippet won't use indexes.
Hi Patrick. I don't know the exact code or tables, but I'd think 6 rows would qualify as a trivial plan. Just read the (likely 1) page and call it quits. If that's really the scenario, the QO should call that a "good enough" plan and skip the rest.
However, if Mike wrote this code, then never mind. Right, Mike? 😉
May 12, 2020 at 7:22 pm
Okay, so TableA and TableB in the derived table are those two source tables where the login info may exist. So why the distinct in the outer query when there are no duplicates because of the UNION in the derived table? I won't bother to throw out any ideas on the this as I am sure you may have some good ideas of your own.
Well in one trivial example I tried, the number of sorts are the same whether the "distinct" clause is used or not. I suspect it depends on the query complexity tho.
May 12, 2020 at 7:29 pm
I think the most obvious way to rewrite that query would be:
SELECT u.UserId
FROM UserTable u
WHERE EXISTS(SELECT *
from TableA a
where DateColumn > @StartDate
and a.UserId = u.UserId)
OR EXISTS(select *
from TableB b
where Success = 1
and DateColumn > @StartDate
and b.UserId = u.UserId)
which I'm sure would perform better.
May 12, 2020 at 7:30 pm
x wrote:Michael L John wrote:Jeff Moden wrote:What percentage of the rows of each table are less than 12 months ago?
All of them. The system is only 9 months old.
Lynn, there are a lot of index changes for this database that are in various states of being deployed. There are 3 that will directly help this query, as well as quite a few other queries. These are in QA.
X, the re-write goes like this:
WITH CTE (Pull all the user ids who have logged in for last 12 months)
SELECT Client ID's WHERE Exists (select From CTE)
You're not getting the actual code, because I do not feel like obfuscating it, and the original discussion was not about how to tune a query, but more a vent about developers who don't think past their fingertips.
With the indexes, and the rewrite, there are 3 index seeks as opposed to scans, and in dev the execution time improvement went from ~40 seconds to less than 1 second.
Hey thats fine, but being accurate doesn't really need the exact code, I try to tell folks to act as if they're logging a service call and try to duplicate the issue. You'll notice that I'm not asking "what the code is for," because all I was interested in was why this snippet won't use indexes.
Hi Patrick. I don't know the exact code or tables, but I'd think 6 rows would qualify as a trivial plan. Just read the (likely 1) page and call it quits. If that's really the scenario, the QO should call that a "good enough" plan and skip the rest.
However, if Mike wrote this code, then never mind. Right, Mike? 😉
I now suspect I have zero insight into the situation especially once Mike said he was "paraphrasing / exaggerating / venting" but thats what I get for poking my nose into someone elses thread LOL
However it was cool to see SQL ditch the extra sort when it sees a superfluous "distinct," so I learned something anyways!
May 12, 2020 at 7:33 pm
X, the re-write goes like this:
WITH CTE (Pull all the user ids who have logged in for last 12 months)
SELECT Client ID's WHERE Exists (select From CTE)
This would not work, it could not progress the service call because in your first post there was a union.
May 12, 2020 at 7:54 pm
X, the re-write goes like this:
WITH CTE (Pull all the user ids who have logged in for last 12 months)
SELECT Client ID's WHERE Exists (select From CTE)
This would not work, it could not progress the service call because in your first post there was a union.
He left out the details of the code in the CTE.
May 12, 2020 at 10:21 pm
I feel like I'm missing something. An inner join is necessarily an existence check, why not:
SELECT u.UserId
FROM UserTable u
JOIN TableA a ON a.UserId = u.UserId
JOIN TableB b ON b.UserId = u.UserId
WHERE a.DateColumn > @StartDate
AND b.Success = 1
AND b.DateColumn > @StartDate
-------------------------------------------------------------------------------------------------------------------------------------
Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
May 12, 2020 at 10:38 pm
I feel like I'm missing something. An inner join is necessarily an existence check, why not:
SELECT u.UserId
FROM UserTable u
JOIN TableA a ON a.UserId = u.UserId
JOIN TableB b ON b.UserId = u.UserId
WHERE a.DateColumn > @StartDate
AND b.Success = 1
AND b.DateColumn > @StartDate
more data would be consumed on average. "if exists" makes the determination after the first row fufills the predicate whereas just joining looks to read more from tablea and tableb than needed, also I get "cartesianish" vibes somehow.
also, you're requiring a match in both tablea and tableb but I'm a bit hazy whether that matches the requirements.
Viewing 15 posts - 64,876 through 64,890 (of 66,738 total)
You must be logged in to reply to this topic. Login to reply