Question about proper practices on Hierarchical data

  • Greetings everyone 😀

    I am responsible for maintaining and further developing the reporting data warehouse for my company (and many days I wish I was in charge of building it from the get go!). We have a Reporting Services person who likes to do things a bit odd in my opinion. Luckily I get the final say as to what goes into or comes out of our Data warehouse, or we would have a lot of issues 😀

    One prime example, so you can understand what I mean...a stored proc he asked me to implement, was taking on average of 20 seconds to run. I checked the output of the stored proc, and then where the data was coming from and couldn't fathom why it would take so long. After opening up the stored proc my jaw dropped. About 10 variables were defined (albeit poorly) and set across 7 statements that used the same table. The main query itself was a gigantic combination of 2 common table expressions that were combined together and pivoted. After about 4 hours, I reduced the amount of variables down to 7 and set them in one statement. I then took the remainder of CTE's and pivot statement into one select statement that yielded the exact same results. At the end it now takes less than a second to run. Note; no indexes were changed, added, or dropped to achieve that performance gain either. There were also several joined tables to add columns that already existed on the table he was using (which is a denormalized table I created for his reporting needs).

    So suffice to say his programming practices are a bit odd...he is smart no doubt and once you get him working in the right direction, the reports he produces are pretty cool.

    Having said all that, he asked me yesterday to deploy several new functions that he created, which are all Case statements that create custom categories for him to report on. We never really got a whole lot of requirements from our end users or management, so this project has been kind of 'winging it' mostly.

    My biggest contention with this is it seems like creating all these custom functions seems an improper way to handle this. Some of the case statements are rather large and I fear it would cause large amounts of overhead if used improperly (which given my example is inevitable).

    So I can think of 2 solutions; 1 would be to add these columns to our current dimension tables, and spend the time modifying and testing load scripts to make sure this doesn't case anything to blow up or, my second solution would be to add all these columns to a view, since some of them may change what is in them over time?

    Any thoughts on the situation are appreciated, I know I am not perfect myself either but I would prefer to do this the right way 😀

    Link to my blog http://notyelf.com/

  • I'd be more inclined towards a dimension. Views don't do anything but hide code.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Yeah I tend to agree with that. I just wanted to make sure my thought process on that was right, or possibly if there were other solutions I wasn't thinking of 😀

    Link to my blog http://notyelf.com/

  • Sounds like you're heading the right direction.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply