Is this a good idea or a really, really dumb one?

  • We have dumped quite a bit of data into a data warehouse and starting to create reports/views/dashboards/etc.
    In creating particular views, I am seem to be using the same metrics over and over.
    For instance, almost everyone wants to see "Current Month Sales" somewhere on their dashboard.
    Instead of writing the same query over and over, I thought about creating functions for these commonly used metrics.
    Only thing is, I can already think of about 30 functions that would help me and I bet there will be many more.

    So my question is....should I use functions for this or just keep re-using my queries?
    Are there limitations as to how many UDF's I can have?
    Are there any drawbacks to using functions?

    Many thanks in advance for the input!

  • You have to be careful with functions, because there are a lot of ways that you can write a function that will kill your performance. One of the biggest ones is that anything that uses certain kinds of functions (like a multi-statement function) will make SQL Server unable to use a parallel plan. That's a big killer, especially in a DW scenario where you might be doing a lot of data processing.
    So before you start down the path of making functions, make sure you read up on what you have to do to make SQL Server be able to work well with them.

  • Can you explain a bit about how you have architected the current solution? In particular, what is it about the way that you have put things together which means that you have to repeatedly cut/paste function definitions?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • I'm not sure what you're asking, but if what you're saying is that there is one database view created for each application dashboard, meaning that each new application dashboard requires creation of a new database view, then you're going about this the wrong way. There should be one database view for each distinct logical view (ie: vCurrentMonthSales and vSalesByAssociate). Then you have a small manageable number of database views from which any number of  application dashboards can combine in an unlimited number of ways.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • While I'm all for not copy and pasting code unnecessarily what exactly are you thinking about doing?  If all you end up doing is changing copying and pasting queries around to copy and pasting queries around that use functions what's the point?

  • stevenb 14609 - Tuesday, October 3, 2017 1:21 PM

    You have to be careful with functions, because there are a lot of ways that you can write a function that will kill your performance. One of the biggest ones is that anything that uses certain kinds of functions (like a multi-statement function) will make SQL Server unable to use a parallel plan. That's a big killer, especially in a DW scenario where you might be doing a lot of data processing.
    So before you start down the path of making functions, make sure you read up on what you have to do to make SQL Server be able to work well with them.

    I don't believe I would be using multi statement function but that is a great piece of advice.  Thank you!!!

  • Phil Parkin - Tuesday, October 3, 2017 1:24 PM

    Can you explain a bit about how you have architected the current solution? In particular, what is it about the way that you have put things together which means that you have to repeatedly cut/paste function definitions?

    The major source data is from a Salesforce instance with multiple bolt-ons....i.e. about 200 individual tables.
    If I want to get sales for a particular location and display "friendly" values then I have to perform joins.
    If I want to get detailed data then the number of joins goes up and up.

    I have considered creating a table to just populate the metrics but our process allows for parts of transactions in the past to be modified so I would have to re-evaluate the entire table on a daily basis.
    Therefore, I just re-query data from the past to ensure that all modifications have been accounted for.

  • Eric M Russell - Tuesday, October 3, 2017 1:35 PM

    I'm not sure what you're asking, but if what you're saying is that there is one database view created for each application dashboard, meaning that each new application dashboard requires creation of a new database view, then you're going about this the wrong way. There should be one database view for each distinct logical view (ie: vCurrentMonthSales and vSalesByAssociate). Then you have a small manageable number of database views from which any number of  application dashboards can combine in an unlimited number of ways.

    I strive for "all purpose" views but our org isn't as cut & dry as that.
    If I accumulate "sales by associate" and compare it to "sales by store" it is different.
    The reason is that associates can sell in multiple locations.
    Now multiply that example by the 60 other transaction types (which can be split by store or associate) and 4 major job types and then we have a large array of logical views.
    Maybe that is the right way to go but the biz isn't willing to wait on me that long...

  • ZZartin - Tuesday, October 3, 2017 1:39 PM

    While I'm all for not copy and pasting code unnecessarily what exactly are you thinking about doing?  If all you end up doing is changing copying and pasting queries around to copy and pasting queries around that use functions what's the point?

    Clarity of code and re-usability is probably my biggest goal.
    I've done some testing and performance seems to be the same between the two.
    I know I don't know everything so I'm wondering if there were inherent issues of using too many functions.

  • Ozymandias_EBON - Wednesday, October 4, 2017 8:31 AM

    Phil Parkin - Tuesday, October 3, 2017 1:24 PM

    Can you explain a bit about how you have architected the current solution? In particular, what is it about the way that you have put things together which means that you have to repeatedly cut/paste function definitions?

    The major source data is from a Salesforce instance with multiple bolt-ons....i.e. about 200 individual tables.
    If I want to get sales for a particular location and display "friendly" values then I have to perform joins.
    If I want to get detailed data then the number of joins goes up and up.

    I have considered creating a table to just populate the metrics but our process allows for parts of transactions in the past to be modified so I would have to re-evaluate the entire table on a daily basis.
    Therefore, I just re-query data from the past to ensure that all modifications have been accounted for.

    Is your DW load process doing any of this work for you?
    It should be doing the joins, calculating the results (or at least putting the data into a friendly format from which the results may be easily derived), picking up data modifcations since it was last run and more.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Wednesday, October 4, 2017 9:09 AM

    Ozymandias_EBON - Wednesday, October 4, 2017 8:31 AM

    Phil Parkin - Tuesday, October 3, 2017 1:24 PM

    Can you explain a bit about how you have architected the current solution? In particular, what is it about the way that you have put things together which means that you have to repeatedly cut/paste function definitions?

    The major source data is from a Salesforce instance with multiple bolt-ons....i.e. about 200 individual tables.
    If I want to get sales for a particular location and display "friendly" values then I have to perform joins.
    If I want to get detailed data then the number of joins goes up and up.

    I have considered creating a table to just populate the metrics but our process allows for parts of transactions in the past to be modified so I would have to re-evaluate the entire table on a daily basis.
    Therefore, I just re-query data from the past to ensure that all modifications have been accounted for.

    Is your DW load process doing any of this work for you?
    It should be doing the joins, calculating the results (or at least putting the data into a friendly format from which the results may be easily derived), picking up data modifcations since it was last run and more.

    Nope. 
    We use a third party service to aggregate the data and they do not perform and transforms/calculations/etc.....just a straight copy.
    We had to pick this vendor due to some "exotic" connections that they support (and most do not).

  • Ozymandias_EBON - Wednesday, October 4, 2017 9:20 AM

    Phil Parkin - Wednesday, October 4, 2017 9:09 AM

    Is your DW load process doing any of this work for you?
    It should be doing the joins, calculating the results (or at least putting the data into a friendly format from which the results may be easily derived), picking up data modifcations since it was last run and more.

    Nope. 
    We use a third party service to aggregate the data and they do not perform and transforms/calculations/etc.....just a straight copy.
    We had to pick this vendor due to some "exotic" connections that they support (and most do not).

    OK, but once the data is in your DW, are you able to run additional processes to augment the results produced by the third party? Should the results produced by the third party be considered as 'staging' results, from which you build your proper DW, perhaps?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Wednesday, October 4, 2017 9:57 AM

    Ozymandias_EBON - Wednesday, October 4, 2017 9:20 AM

    Phil Parkin - Wednesday, October 4, 2017 9:09 AM

    Is your DW load process doing any of this work for you?
    It should be doing the joins, calculating the results (or at least putting the data into a friendly format from which the results may be easily derived), picking up data modifcations since it was last run and more.

    Nope. 
    We use a third party service to aggregate the data and they do not perform and transforms/calculations/etc.....just a straight copy.
    We had to pick this vendor due to some "exotic" connections that they support (and most do not).

    OK, but once the data is in your DW, are you able to run additional processes to augment the results produced by the third party? Should the results produced by the third party be considered as 'staging' results, from which you build your proper DW, perhaps?

    That is what I am working on but until I get there I have to produce 🙂

  • Ozymandias_EBON - Wednesday, October 4, 2017 8:44 AM

    ZZartin - Tuesday, October 3, 2017 1:39 PM

    While I'm all for not copy and pasting code unnecessarily what exactly are you thinking about doing?  If all you end up doing is changing copying and pasting queries around to copy and pasting queries around that use functions what's the point?

    Clarity of code and re-usability is probably my biggest goal.
    I've done some testing and performance seems to be the same between the two.
    I know I don't know everything so I'm wondering if there were inherent issues of using too many functions.

    Functions are just a tool, it's not so much about how many you have but more about how they are being used.  How exactly are you planning on using functions that will improve reusability vs just moving what objects you are copying and pasting?

  • If is the database isn't building a dimensional model, a star-schema model, or computed summary tables, then it's not really a "data warehouse".

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply