September 26, 2017 at 11:04 am
Hi all,
I've been working on a performance issue that occurs from time to time with a reporting stored procedure. I have solutions readily available to me, but I'm wondering about methodology that might lead to better overall results, as the time to test some things can be prohibitive, and if it's generally not a good idea to do things a given way, I can't justify a massive time expenditure just to prove it. Here are the basics of what I'm working with:
I have a table with 21 fields, 8 of which are FKs and one of which is the PK. One of the non-key fields is a SortOrder field that is divided by 10 using the modulo operator (%) in a stored procedure's WHERE clause. The query in the stored procedure accesses this table via a view in which this field is also divided by 10 with the modulo operator (%), and the result of that computation is a field within the view. The sproc does NOT reference that field in the view, though; it does its own calculation.
The thinking is this: add a computed, persisted field with the same modulo calculation to the table; remove the calculation from the view and have the view reference the computed persisted field under the same name as the unused view field; then replace the sproc's references to the view's SortOrder field with the SortOrderMod10 field (the unused one referred to earlier). I can then sparse index the table on its joined fields in both the view and the sproc, now including the SortOrderMod10 field as well as the original SortOrder field, and get an index seek on that table. That seek does indeed occur, but it appears to be driving up the reads significantly, as well as the execution time, and now I wonder if I've just "overdone it" or missed something.
I have Profiler available to trace things, but I see a total read count that is significantly higher than the sum of all the individual statement-completion values, and I need to know where that comes from. It may be that I haven't selected enough of the events to trace to be able to see it all. At the moment, I have the following:
I'm currently using that sparse index, as well as a number of others, but some previous measurements without this index appear to show just over half the reads I'm seeing at the moment. I can't disclose the sproc at this time, but I can provide additional details as needed. If I can't determine where all the additional reads are coming from, it will be hard to solve the problem. At the moment, the worst case is a duration of 1.7 seconds and some 17,000 reads; until now I thought I had gotten down to just over 7,500, but I'm back up over 12,600, although the duration is around 1.4 seconds. Any and all guidance appreciated.
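For reference, the shape of the change, with made-up table and column names since I can't post the real schema, is roughly this:
-- Hypothetical names; the real table, columns, and filter predicate differ.
ALTER TABLE dbo.ReportData
    ADD SortOrderMod10 AS (SortOrder % 10) PERSISTED;
-- Index on the joined columns plus the new computed column so the seek can happen.
-- (The actual index is the "sparse" one mentioned above; its filter predicate is omitted here.)
CREATE NONCLUSTERED INDEX IX_ReportData_SortOrderMod10
    ON dbo.ReportData (ParentID, SortOrderMod10, SortOrder);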
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 26, 2017 at 2:28 pm
I think you've got the computed column backwards here.
That is, since the data is supposed to be in 1NF, the "divided by 10" value should be in a separate column in the table already. The computed / virtual column would be the combined value, which you currently have as an actual column.
Btw, the term "fields" is anachronistic for a relational dbms; you really should call them "columns".
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 26, 2017 at 2:53 pm
ScottPletcher - Tuesday, September 26, 2017 2:28 PM: I think you've got the computed column backwards here. That is, since the data is supposed to be in 1NF, the "divided by 10" value should be in a separate column in the table already. The computed / virtual column would be the combined value, which you currently have as an actual column.
Btw, the term "fields" is anachronistic for a relational dbms; you really should call them "columns".
I get the COBOL reference, but the rest of what you were saying isn't quite registering; maybe what I wrote and meant and what you read and thought you understood aren't quite the same thing. I'm trying to decide whether I can get better performance by placing a computed persisted column in the base table instead of in a view where the column wasn't even getting used. I won't actually end up with any schema changes in the view; the only question is whether the column in the view references the computed persisted column that I'll create, or just does that calculation as part of the view. The calculation is just ColumnName % 10, so it's not complex math.
I'm struggling to understand where all the extra reads are coming from in the execution plan that aren't represented by statement-completion lines in the .SQLPlan file I generate using Profiler. The specific portion of the query I'm dealing with might use around 4,500 reads in two pieces, but the whole shootin' match is up over 12,600. If I knew what else to account for, I could probably solve this more quickly, but I'm hitting a wall that doesn't make sense. I figured that pushing that computation into the base table would relieve the view of the job, and that the index updates that now include that computed persisted column would do their magic. That part of the query did get better, but I'm at a loss as to how to affect the missing two thirds of the reads that Profiler isn't directly accounting for, based on what I'm gathering anyway. Maybe I'm not gathering all I need?
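To make the comparison concrete (the names here are stand-ins, not the real ones), the only difference between the two options is how the view column gets its value:
-- Stand-in names; SortOrderMod10 is the persisted computed column option described above.
CREATE VIEW dbo.vReportData
AS
SELECT rd.ReportDataID,
       rd.ParentID,
       rd.SortOrder,
       rd.SortOrderMod10    -- the view-does-the-math option would have rd.SortOrder % 10 AS SortOrderMod10 here
FROM dbo.ReportData AS rd;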
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 26, 2017 at 3:15 pm
sgmunson - Tuesday, September 26, 2017 2:53 PM: I get the COBOL reference, but the rest of what you were saying isn't quite registering... That part of the query did get better, but I'm at a loss as to how to affect the missing two thirds of the reads that Profiler isn't directly accounting for. Maybe I'm not gathering all I need?
Sorry, I wasn't clear enough. My point is that if you are often using "ColumnName % 10", then that column should have originally been stored as 2 (or perhaps more) separate columns. This is in keeping with the principle that each column should hold a single ("atomic") piece of data (aka first normal form / 1NF).
Since any current code no doubt references the existing combined "ColumnName" column, you could create a computed column to reconstruct the original combined value on the fly.
For example, existing definition:
ColumnName int
New definition along these lines:
ColumnNameA int --would be populated as ColumnName % 10
ColumnNameB int --would be populated as ColumnName / 10
ColumnName AS CAST(ColumnNameB * 10 + ColumnNameA AS int) --reconstructs the original combined value
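Roughly, the conversion for existing data would go like this (the table name is a stand-in, and you'd need to adjust for any indexes or constraints that reference the original column):
-- Sketch only, using the same made-up column names.
ALTER TABLE dbo.YourTable ADD ColumnNameA int, ColumnNameB int;
GO
UPDATE dbo.YourTable
SET ColumnNameA = ColumnName % 10,
    ColumnNameB = ColumnName / 10;
GO
ALTER TABLE dbo.YourTable DROP COLUMN ColumnName;
GO
ALTER TABLE dbo.YourTable
    ADD ColumnName AS CAST(ColumnNameB * 10 + ColumnNameA AS int);
GO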
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 26, 2017 at 4:06 pm
Unfortunately, that's not a viable option. What should have been done is to have the application designed to keep that value computed and placed in a second field. I'm finding, however, that no matter how well I index the computed persisted field, I end up playing whack-a-mole with some other element of the query, due to complications well beyond the scope of this exercise. Given that my objective was just to understand the performance difference between adding the computed, persisted column to the base table and letting the view do the computation, I'm starting to suspect that the view is the place for it, and I'll index the view and see if that gets me where I need to be. Apparently, pushing that computation down to the table makes it happen too darned often; I'm seeing estimates of over 1 million rows when no rows will actually pass through. So I'm down to a "nothing burger," when I was hoping I'd just found an inexpensive grass-fed bison burger, medium rare, with the ketchup that does NOT have HFCS (high fructose corn syrup) in it, done to perfection with soft fries (Yummmmmmmmmm!). Oh well...
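In case it helps anyone following along, the indexed-view route I'm about to try would look something like this (names are stand-ins, and the usual SET option requirements for indexed views apply):
-- Schema binding is required before a view can be indexed.
CREATE VIEW dbo.vReportDataMod10
WITH SCHEMABINDING
AS
SELECT rd.ReportDataID,
       rd.ParentID,
       rd.SortOrder,
       rd.SortOrder % 10 AS SortOrderMod10
FROM dbo.ReportData AS rd;
GO
-- The first index on a view must be unique and clustered.
CREATE UNIQUE CLUSTERED INDEX IX_vReportDataMod10
    ON dbo.vReportDataMod10 (ReportDataID);
GO
-- Then a nonclustered index on the computed value, for the seek.
CREATE NONCLUSTERED INDEX IX_vReportDataMod10_Mod10
    ON dbo.vReportDataMod10 (SortOrderMod10, SortOrder);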
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 28, 2017 at 7:09 am
I still believe that what should have been done, and what could still be done, is to put the value in its own column.
You can always add columns to split the existing column, drop the original column, and recreate it as a computed value. If necessary you could even rename the original table and create a view to simulate the original table for existing code, while new code could use the new table name.
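Sketched out with stand-in names (and ignoring permissions and constraints for brevity), the rename-plus-view approach would be:
-- Rename the real table; a view with the old name keeps existing code working.
EXEC sp_rename 'dbo.YourTable', 'YourTable_New';
GO
CREATE VIEW dbo.YourTable
AS
SELECT YourTableID,          -- ...plus whatever other columns the table has
       ColumnNameA,
       ColumnNameB,
       ColumnNameB * 10 + ColumnNameA AS ColumnName   -- the reconstructed combined value
FROM dbo.YourTable_New;
GO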
Failure to properly model data, in particular not following 1NF, will always cause issues such as this.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 28, 2017 at 7:43 am
ScottPletcher - Thursday, September 28, 2017 7:09 AM: I still believe that what should have been done, and what could still be done, is to put the value in its own column. You can always add columns to split the existing column, drop the original column, and recreate it as a computed value. If necessary you could even rename the original table and create a view to simulate the original table for existing code, while new code could use the new table name. Failure to properly model data, in particular not following 1NF, will always cause issues such as this.
I hear you. I just can't do something like that because of the quantity of regression testing it would require, and the level of involvement and overall effort it would take to deal with all the pieces surrounding it. A lot of other things put me in the position of playing "whack-a-mole": with all the other pieces of the query, pushing that computation down into the table let me index it excellently, but in the long run everything else in the query started breaking worse than this particular part. I realized it quickly, though; just like in the movie "War Games," the only winning strategy is not to play.
I ended up removing the use of the view from the query and computing the value for a small subset of records into a temp table early in the sproc, for later reference by the rest of it, and that worked like a charm in terms of performance. A cold-cache run (after DBCC FREEPROCCACHE) was then around 1 second, down significantly from the 1.7 seconds it started at, and subsequent runs, even with different parameters, were consistently in the range of 13 to 35 milliseconds. Given that my prior attempts had only managed a 1.4-second cold-cache start and then 250-millisecond repeats, this was a vast improvement. There are just too many other moving pieces for me to make that extensive a change. At least now it only calculates that modulo 10 value for the relevant records, not for the entire table, which is probably a good part of the reason I had driven the reads higher in my earlier efforts.
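For anyone who runs into something similar, the shape of the fix (heavily simplified, with stand-in names) was along these lines:
-- Early in the sproc: compute the modulo once, for the relevant subset only.
-- @ParentID stands in for whatever parameters narrow the set down.
SELECT rd.ReportDataID,
       rd.SortOrder,
       rd.SortOrder % 10 AS SortOrderMod10
INTO #SortSubset
FROM dbo.ReportData AS rd
WHERE rd.ParentID = @ParentID;

CREATE CLUSTERED INDEX IX_SortSubset ON #SortSubset (ReportDataID);

-- The rest of the sproc then joins to #SortSubset instead of going through the view.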
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)