March 6, 2023 at 8:43 pm
From a huge and ever-growing table (tbl_Values) I need to select all records which conform to rules defined in another table (tbl_Rules). Each record in tbl_Values has a t_stamp (datetime) and an fvalue (float); the rules defined in tbl_Rules have a WeekDay (int), an HourOfDay (int), a MinValue (float), and a MaxValue (float). Records from tbl_Values which will go into the final aggregations are selected with the following logic:
SELECT *
FROM tbl_Values
LEFT JOIN tbl_Rules
       ON DATEPART(weekday, tbl_Values.t_stamp) = tbl_Rules.WeekDay
      AND DATEPART(hour, tbl_Values.t_stamp) = tbl_Rules.HourOfDay
      AND tbl_Values.fvalue BETWEEN tbl_Rules.MinValue AND tbl_Rules.MaxValue
WHERE tbl_Values.t_stamp BETWEEN @StartDate AND @EndDate
The WHERE condition selects almost 1,000,000 records from tbl_Values, but that part is very quick, thanks to an index on tbl_Values(t_stamp).
Reading through the query plan, what is slowing down the whole logic are the weekday and hour calculations: in the end it takes almost 18 seconds to get a result.
As a workaround, I added a WeekDay (int) and an Hour (int) column to tbl_Values: each time a record is INSERTED into tbl_Values, an AFTER INSERT trigger fires, which calculates WeekDay and Hour and stores them in the corresponding columns. Using those columns instead of the DATEPART calculations, the whole logic runs in 2 seconds.
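For reference, a minimal sketch of the trigger described above; the table and column names match the post, but the trigger name and the key column (ValueID) are assumptions, since the post does not give the primary key of tbl_Values:

```sql
-- Sketch of the AFTER INSERT workaround described above.
-- ValueID is a hypothetical primary key column; substitute the real one.
CREATE TRIGGER trg_tblValues_SetDateParts
ON tbl_Values
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Populate the two precomputed columns for the newly inserted rows only.
    UPDATE v
    SET    v.WeekDay = DATEPART(weekday, v.t_stamp),
           v.[Hour]  = DATEPART(hour, v.t_stamp)
    FROM   tbl_Values AS v
           INNER JOIN inserted AS i
                   ON i.ValueID = v.ValueID;
END;
```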
Computed columns are no solution either. Indexing a view is not possible because of non-determinism: DATEPART(weekday, ...) depends on the session's DATEFIRST setting.
Is there a more elegant solution?
Thanks for your help, regards
March 6, 2023 at 9:09 pm
I think adding the WeekDay and Hour columns to the main table makes the most sense. Computed columns will be more efficient than a trigger, but they do have some complexities.
For absolute best performance, those columns should be "tinyint" not "int" (in the main table and the rule table, btw).
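One way around the determinism problem is to avoid DATEPART(weekday, ...) altogether: DATEDIFF from a fixed anchor date is deterministic, so a computed column built on it can be PERSISTED and indexed. A sketch under that assumption (the column and index names are made up, and the rule table's WeekDay values would have to use the same 0 = Monday numbering for the join to match):

```sql
-- DATEPART(weekday, ...) is nondeterministic (it depends on SET DATEFIRST),
-- but DATEDIFF(day, 0, t_stamp) % 7 is deterministic: day 0 is 1900-01-01,
-- a Monday, so the expression yields 0 = Monday ... 6 = Sunday.
-- DATEPART(hour, ...) is deterministic for datetime, so it can be persisted as-is.
ALTER TABLE tbl_Values
    ADD WeekDayNo AS CAST(DATEDIFF(day, 0, t_stamp) % 7 AS tinyint) PERSISTED,
        HourOfDay AS CAST(DATEPART(hour, t_stamp) AS tinyint) PERSISTED;

-- Persisted deterministic columns can be indexed.
CREATE INDEX IX_tbl_Values_WeekDayNo_HourOfDay
    ON tbl_Values (WeekDayNo, HourOfDay)
    INCLUDE (t_stamp, fvalue);
```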
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 7, 2023 at 5:17 am
I think adding the WeekDay and Hour columns to the main table makes the most sense. Computed columns will be more efficient than a trigger, but they do have some complexities.
For absolute best performance, those columns should be "tinyint" not "int" (in the main table and the rule table, btw).
If the computed column contains a user-defined scalar function, queries against the table will go single-threaded even if they don't address the computed column. Here's the article; I verified the findings in it and in the article that led up to it.
https://www.brentozar.com/archive/2020/10/using-triggers-to-replace-scalar-udfs-on-computed-columns/
Not sure that I agree with the "TinyInt" thing... for example, I rewrote the horrible sys.fn_PhysLocCracker function that MS created. For the two-byte values, I initially converted them to SMALLINT and measured the performance multiple times with recompiles, etc., and whatnot. Then I changed those two SMALLINT conversions to INT and did exactly the same test. The INT conversions were significantly faster even though I was only converting 2-byte binaries.
And, no... I'm not saying that you're wrong. I'm saying that using the smaller datatype is not a guarantee. As with a lot of other things, "It Depends" and, even though it flies in the face of common sense and good logic (it shocked the hell out of me... it just doesn't make sense), you have to test it "in situ".
For disk space and memory space... yeah... no arguments there unless you have row compression going and that can present a substantially more insidious problem concerning "ExpAnsive Updates" but, again, "It Depends" and "Must look eye"! 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
March 7, 2023 at 7:46 am
Ok, thanks to both! One more question, if you have time: if I enclose the above SELECT logic in a CTE and then use the CTE for my SUM aggregation, that arrangement performs much more slowly than if I SELECT the above logic INTO a #tempTable and then use that #tempTable for my aggregation.
Any reason for that?
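For clarity, here is a sketch of the two arrangements being compared, using the precomputed WeekDay/Hour columns from the workaround; the GROUP BY shape is an assumption, since the post only says the aggregation is a SUM:

```sql
-- Arrangement 1: CTE. The join logic is inlined into the aggregation's plan;
-- a CTE is not materialized in SQL Server.
WITH matched AS
(
    SELECT v.fvalue, r.WeekDay, r.HourOfDay
    FROM   tbl_Values AS v
           JOIN tbl_Rules AS r
             ON r.WeekDay   = v.WeekDay
            AND r.HourOfDay = v.[Hour]
            AND v.fvalue BETWEEN r.MinValue AND r.MaxValue
    WHERE  v.t_stamp BETWEEN @StartDate AND @EndDate
)
SELECT WeekDay, HourOfDay, SUM(fvalue) AS total_value
FROM   matched
GROUP  BY WeekDay, HourOfDay;

-- Arrangement 2: temp table. The intermediate result is materialized once,
-- with its own statistics, before the aggregation runs against it.
SELECT v.fvalue, r.WeekDay, r.HourOfDay
INTO   #matched
FROM   tbl_Values AS v
       JOIN tbl_Rules AS r
         ON r.WeekDay   = v.WeekDay
        AND r.HourOfDay = v.[Hour]
        AND v.fvalue BETWEEN r.MinValue AND r.MaxValue
WHERE  v.t_stamp BETWEEN @StartDate AND @EndDate;

SELECT WeekDay, HourOfDay, SUM(fvalue) AS total_value
FROM   #matched
GROUP  BY WeekDay, HourOfDay;
```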
Thanks, regards