August 18, 2016 at 7:07 am
Any other takers on showing how this problem (return the same output as the T-SQL I provided) could be done in SSRS and attaching a package for folks to learn from?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 18, 2016 at 7:11 am
I know you must be busy but any feedback on the questions below?
Jeff Moden (8/16/2016)
If you don't mind and if you have the time, I'd like to ask a couple of "usage" questions so that I might be able to help others in the future by setting up more realistic test data.
1. Obviously, this is a narrow test table and your real table may be quite a bit wider. How many columns are actually in this table and what's the average width of the rows in bytes? sys.dm_db_index_physical_stats would be the tool to use for that number.
2. How many rows do you have in the real table?
3. How many products do you have in the real table?
4. How many months do you have in the table?
5. What's the ratio of active to inactive rows?
Thanks again for both the interesting problem and the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 18, 2016 at 8:13 am
Sorry Jeff, no, I don't have the spare time at work that you seem to have. What I posted was intended to show anyone with a basic knowledge of SSRS how to calculate stuff etc. within a tablix.
I did specifically take a bit of time out (that I shouldn't have) to prepare the example I posted, so I'm pipped that you are disappointed by it not being runnable code. I just don't have the time to prepare anything more than I have posted, nor did I think this was some kind of efficiency contest, or in fact a contest of any kind, between you and me.
If that's the reaction I get when I put myself out trying to help people, then I think it's best that I resign from this forum.
August 18, 2016 at 9:09 am
Jeff Moden (8/18/2016)
@faulknerwilliam2, I know you must be busy but any feedback on the questions below?
Jeff Moden (8/16/2016)
If you don't mind and if you have the time, I'd like to ask a couple of "usage" questions so that I might be able to help others in the future by setting up more realistic test data.
1. Obviously, this is a narrow test table and your real table may be quite a bit wider. How many columns are actually in this table and what's the average width of the rows in bytes? sys.dm_db_index_physical_stats would be the tool to use for that number.
2. How many rows do you have in the real table?
3. How many products do you have in the real table?
4. How many months do you have in the table?
5. What's the ratio of active to inactive rows?
Thanks again for both the interesting problem and the feedback.
Sorry, very busy indeed. I will try to get back to you tomorrow, or Monday at the latest. Apologies.
August 18, 2016 at 3:38 pm
cad.delworth (8/18/2016)
Sorry Jeff, no, I don't have the spare time at work that you seem to have. What I posted was intended to show anyone with a basic knowledge of SSRS how to calculate stuff etc. within a tablix. I did specifically take a bit of time out (that I shouldn't have) to prepare the example I posted, so I'm pipped that you are disappointed by it not being runnable code. I just don't have the time to prepare anything more than I have posted, nor did I think this was some kind of efficiency contest, or in fact a contest of any kind, between you and me.
If that's the reaction I get when I put myself out trying to help people, then I think it's best that I resign from this forum.
Please don't resign! I had a look at your tablix screenshots and notes, and they were helpful, good pointers. I had tried to do as you suggested (a simple query followed by SSRS manipulation) but got frustrated to the point of near-tears. That prompted me to post here, which led to that terrific scripting example. If you look at the script Jeff provided, it does follow very logically; I guess, though, that I had the advantage of using it to drive real data.
August 18, 2016 at 4:14 pm
cad.delworth (8/18/2016)
Sorry Jeff, no, I don't have the spare time at work that you seem to have. What I posted was intended to show anyone with a basic knowledge of SSRS how to calculate stuff etc. within a tablix. I did specifically take a bit of time out (that I shouldn't have) to prepare the example I posted, so I'm pipped that you are disappointed by it not being runnable code. I just don't have the time to prepare anything more than I have posted, nor did I think this was some kind of efficiency contest, or in fact a contest of any kind, between you and me.
If that's the reaction I get when I put myself out trying to help people, then I think it's best that I resign from this forum.
You're mistaken. It's not a contest. I'm just interested in possible differences in performance and I'm interested in seeing the actual package because most people don't post packages. I apologize if it all came across the wrong way.
And, no... I don't have time for this stuff at work either. Besides, it wouldn't be ethical to spend company time on such a thing. I did it after hours.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 19, 2016 at 2:27 am
Jeff Moden (8/18/2016)
You're mistaken. It's not a contest. I'm just interested in possible differences in performance and I'm interested in seeing the actual package because most people don't post packages.
With my reports I try to strike a balance, using whichever approach performs best. With a few exceptions, I tend to summarise the data (if required) in SQL down to a detail level and use totalling in the report.
With this volume of data, SQL will win every time, and just to prove it I have attached a very simple matrix report which produces a count and percentage per month (column), per Name (row), and per Year (page).
I gave up after 10 minutes!
Far away is close at hand in the images of elsewhere.
Anon.
August 19, 2016 at 7:35 am
David Burrows (8/19/2016)
Jeff Moden (8/18/2016)
You're mistaken. It's not a contest. I'm just interested in possible differences in performance and I'm interested in seeing the actual package because most people don't post packages. With my reports I try to strike a balance, using whichever approach performs best. With a few exceptions, I tend to summarise the data (if required) in SQL down to a detail level and use totalling in the report.
With this volume of data, SQL will win every time, and just to prove it I have attached a very simple matrix report which produces a count and percentage per month (column), per Name (row), and per Year (page).
I gave up after 10 minutes!
Awesome, David. Thanks a million. I'll give that bad boy a try. I really appreciate your time.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2016 at 4:27 am
Jeff Moden (8/16/2016)
faulknerwilliam2 (8/16/2016)
Dear Jeff
A thousand thanks for your help and your taking the time and trouble to help with my query. Your solution has worked perfectly. I love the ranking element.
I really appreciate the logical way your scripting evolves; and your notes make such a complex piece of scripting easy to follow.
Inspirational at many levels and I will be deconstructing / studying the script for my own development.
Many thanks again.
Absolutely my pleasure and apologies for the delay in returning to your post. Thank you very much for the feedback.
What I'm really happy about is this part of your response...
I will be deconstructing / studying the script for my own development.
There are a lot of people who would take the code and apply it to solve their problem but never take the time to understand it, not only so that they can actually maintain it, but also so that they can solve similar problems on their own in the future. My hat is off to you, good Sir!
If you don't mind and if you have the time, I'd like to ask a couple of "usage" questions so that I might be able to help others in the future by setting up more realistic test data.
1. Obviously, this is a narrow test table and your real table may be quite a bit wider. How many columns are actually in this table and what's the average width of the rows in bytes? sys.dm_db_index_physical_stats would be the tool to use for that number.
2. How many rows do you have in the real table?
3. How many products do you have in the real table?
4. How many months do you have in the table?
5. What's the ratio of active to inactive rows?
Thanks again for both the interesting problem and the feedback.
Hello Jeff, to answer some of your questions:
We are only focusing on four products, and from April this year, so 5 months, 4 products = 20 rows so far. There are many other products, which can be added if necessary and similarly there may be a requirement to go back further in time.
Where you ask, "What's the ratio of active to inactive rows?", I'm not sure what you mean. All the rows have active data returned; there are none with null or 0% values, if that's what you mean?
And is sys.dm_db_index_physical_stats something I can get to in Management Studio? I've not come across it nor run it before. It sounds helpful.
Sorry for the slightly vague answers; I hope they help. The report runs fast in Management Studio and in SSRS, where I deployed it (with charts etc.).
Thanks again for your help.
August 22, 2016 at 7:26 am
Heh... my test bed contained 26 products over 7 years with a million rows per year so I think you're probably all set for scalability. 😛
My question on the "ratio" was to find out how many active vs inactive rows you had so that I could use that to guide me in the creation of test data for other folks that may have a similar request. In the absence of such information, I had assumed 10% of the rows would carry an "inactive" status. For 20 rows, it won't matter one way or the other.
sys.dm_db_index_physical_stats is, indeed, available from SSMS and comes with any SQL Server installation. It's one of the more popular system functions that people use for various index maintenance steps and is considered to be "essential knowledge" for DBAs.
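For example, something along these lines will return the average record size and row count for a table (dbo.YourTable is just a placeholder for your real table name):

-- Average row width, row count, and page count per index for one table.
-- Note: avg_record_size_in_bytes is only populated in SAMPLED or DETAILED mode.
SELECT  ips.index_id,
        ips.index_level,
        ips.record_count,
        ips.avg_record_size_in_bytes,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(),                      -- current database
            OBJECT_ID(N'dbo.YourTable'),  -- the table in question
            NULL,                         -- all indexes
            NULL,                         -- all partitions
            'SAMPLED') AS ips;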
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2016 at 8:48 am
I've worked with many different reporting tools including SSRS, and they all try to hide the SQL away and encourage you to do grouping and calculations within the report. The idea is good: for users who understand Excel-like formulas and who want to focus on design and layout, it makes sense. Some tools (e.g. MicroStrategy) even generate SQL for you, sending the grouping and calcs back to the database, which is probably the only way you would get any semblance of performance by working entirely report-side.
This is a fine way to operate on small datasets, but it absolutely doesn't scale. The main issues with connecting your report to a huge dataset are:
1) the volume of data that is transferred over the wire from the database to the reporting tool
2) the difficulty with rendering huge numbers of rows in a browser
This applies to every reporting tool: Tableau, Qlik, SSRS, Power BI, whatever.
As Jeff pointed out, optimizing the query on the database side means, firstly, that you are taking advantage of the faster database query engine and, secondly, that you are sending fewer rows over the wire. That equals a faster report render and solves issue (1) above. Issue (2) might stay hidden if you always start designing your report with groups and filters before running it for the first time. I find people don't do that. They connect to the million rows (even 5,000 is too many), try to run the details section (i.e. no groups), and wonder why it falls over and the browser hangs. It's why Tableau will throw a LIMIT or a TOP into the query while in design mode, and why SSRS has pagination on by default.
There is, of course, the concept of drilling: drill down, drill across, or slicing and dicing. Many specs for dashboards call for flexible parameters and the ability for users to drill down to more granular data without waiting for the report to reload. If you've done all your grouping database-side, then you sacrifice the flexibility of being able to change that grouping on the fly without doing another round trip to the database and potentially waiting for the report to reload or respond.
If you think of a ratio calculation, or a percent-of-total calculation, in report syntax as being equivalent to using a variation of a SUM() OVER window function in SQL, then what the report calc gives you is actually more flexible. The PARTITION BY bit is not hard-coded; it's relative to the current context of the report. You can change some drop-down or other control on the report and effectively change the grouping, and the percent-of-total calc will, in SQL terms, adjust itself to be OVER the appropriate partition.
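For anyone who hasn't tried it, the SQL side of that looks roughly like this (the table and column names here are made up purely for illustration):

-- Percent of total per product within each month, done database-side.
-- dbo.Sales, SaleMonth, ProductName and Amount are illustrative names only.
SELECT  SaleMonth,
        ProductName,
        SUM(Amount) AS ProductTotal,
        100.0 * SUM(Amount)
              / SUM(SUM(Amount)) OVER (PARTITION BY SaleMonth) AS PctOfMonth
FROM    dbo.Sales
GROUP BY SaleMonth, ProductName;

Change the PARTITION BY and the percentage becomes relative to a different grouping; the report-side calculation gets that "for free" from whatever its current grouping context is.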
The downside is that having an "AJAX-like" (i.e. no page reloading) response while keeping full parameter flexibility implies returning a huge dataset to the report. I think that should be discouraged as much as possible, as you are back to problem (1).
There are things that reports can do which an optimized query returning a single aggregated dataset doesn't cater for; it restricts flexibility. On the other hand, total flexibility is going to lead to extremely slow performance. A report needs to be a compromise between those extremes. User experience is king.
There are a few good strategies.
Take the user on a journey through the data, start out fast and aggregated and high level, and then let the user go to the slow places if they need to. They will understand why it's slower.
Take, for example, measuring CPU usage on a server. You don't really need a second-by-second breakdown as an initial report view. It's too much data (86,400 data points in a day) and it's too noisy. You might want to see the max per hour in a day; that's down to 24 data points. From there, if you spot an hour with high usage, zoom into (drill down on) that hour and see if there is an issue. It's a simple example, and flawed, but it explains my point.
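In SQL terms, that hourly rollup is nothing more than a GROUP BY on the truncated timestamp; a rough sketch with invented names:

-- Collapse second-by-second samples to one max value per hour.
-- dbo.CpuSample (SampleTime, CpuPercent) is an illustrative table.
SELECT  DATEADD(HOUR, DATEDIFF(HOUR, 0, SampleTime), 0) AS SampleHour,
        MAX(CpuPercent) AS MaxCpuPercent
FROM    dbo.CpuSample
WHERE   SampleTime >= CAST(GETDATE() AS date)   -- today only
GROUP BY DATEADD(HOUR, DATEDIFF(HOUR, 0, SampleTime), 0)
ORDER BY SampleHour;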
Detect the user, and prefilter the granular data to what is relevant to them if possible. For example, if they are the sales manager for a single region, filter to that region. Reporting Services has a system variable that you can interrogate to determine who is currently running the report. Assuming Windows authentication, you can do some magic right there. Personalized, custom reports are pretty impressive.
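A sketch of the database side of that idea (the @CurrentUser value would normally be fed by the report, e.g. from SSRS's built-in User!UserID; the table and column names are invented):

-- Filter the dataset to the region the current user manages.
-- dbo.SalesDetail, dbo.Region and RegionManagerLogin are illustrative names.
DECLARE @CurrentUser sysname = SUSER_SNAME();   -- or supplied as a report parameter

SELECT  s.SaleDate,
        s.ProductName,
        s.Amount
FROM    dbo.SalesDetail AS s
        INNER JOIN dbo.Region AS r
            ON r.RegionID = s.RegionID
WHERE   r.RegionManagerLogin = @CurrentUser;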
One of my favourites is the exception report. Query ahead of time for the top 10, the bottom 10, the statistically significant, the anomalies, the failures, and present those in the report. After all, that's often what people are doing with the report: looking for weirdness. Make their jobs easier. If they are an analyst, or they just want to dump the data, then why even create the report? Of course there are legitimate reasons for that: financial reports, legally defensible documents, etc. But for the most part, if users just want to dump out the data and import it into Excel or a different reporting tool, then what they need is a stored proc or a web service, not a report.
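A trivial version of that idea, again with invented names:

-- Pre-compute the "weirdness": the ten worst-performing products this month.
-- dbo.Sales and its columns are illustrative names only.
SELECT TOP (10)
        ProductName,
        SUM(Amount) AS MonthTotal
FROM    dbo.Sales
WHERE   SaleMonth = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)
GROUP BY ProductName
ORDER BY MonthTotal ASC;   -- bottom 10; flip to DESC for the top 10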
Finally, if you're a report writer, take some time with SQL; don't be put off by syntax that is new to you or looks complicated. Play around with it. CTEs are just an upfront way of writing subqueries, with a few advantages (apart from recursive ones, which don't have many advantages!). A "cascading CTE" just means something like a bunch of nested subqueries, but much more clearly laid out. Window functions are just ways of getting to data that's not part of the current row being returned; you're already doing that same thing in the report calculations. OVER is amazing, really.
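To make that concrete, here's a small sketch of a cascading CTE plus a window function, using invented names:

-- Each CTE reads from the one before it instead of nesting subqueries.
-- dbo.Sales and its columns are illustrative names only.
WITH MonthlyTotals AS
(
    SELECT  SaleMonth, ProductName, SUM(Amount) AS ProductTotal
    FROM    dbo.Sales
    GROUP BY SaleMonth, ProductName
),
Ranked AS
(
    SELECT  SaleMonth, ProductName, ProductTotal,
            RANK() OVER (PARTITION BY SaleMonth ORDER BY ProductTotal DESC) AS MonthRank
    FROM    MonthlyTotals
)
SELECT  SaleMonth, ProductName, ProductTotal, MonthRank
FROM    Ranked
WHERE   MonthRank <= 3          -- top three products per month
ORDER BY SaleMonth, MonthRank;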