November 4, 2008 at 5:00 am
Hi,
I am working on a db with a structure similar to that in the picture. I tried to make this as simple as possible while still mimicking the problem I am trying to solve.
The Food table contains common attributes for any food. Fruit and Nut contain attributes specific to those food types.
Food can exist in a location. I want to have two main stored procedures for retrieving food.
1) GetFoodByID - Should retrieve the relevant row in the Food table and by checking the FoodTypeID, the retrieve the relevant row in either the Fruit or Nut table.
2) GetFoodByLocationID - Should use a set of FoodIDs gotten from FoodLocation by locationID to retrieve all relevant rows in the Food table and by checking the FoodTypeIDs for the FoodIDs, it will then retrieve the relevant rows from the Fruit and Nut tables. If there are none of a particular type, that table is not queried (i.e. If there is no Fruit in a Location, the Fruit table is not queried)
Each procedure will use helper functions/procs to retrieve the info from the Food, Fruit and Nut tables respectively. (e.g. GetFood, GetFruit, GetNut)
QUESTION:
Case 1) If I implement the two main procs exclusively in T-SQL, the only way I know is to check the FoodType and then query the appropriate tables. Essentially, a large IF statement.
Case 2) If I use CLR functions/procs I can wrap the T-SQL procedure calls in them and then use Delegates to call the appropriate function/proc (GetFruit or GetNut) in the appropriate instance. (no big IF).
I have not used CLR in SQL before and I do not know if this is a good use of it (i.e. worth the hassle). In terms of extending this system in the future, in case 1 I would have to modifiy the sproc with the large IF. In case two I would need to add additional T-SQL procedures and new CLR wrappers for them but I would not change code in the execution path.
Is there another solution I am missing? Any thoughts would be appreciated. Thanks in advance.
Peter
November 4, 2008 at 8:00 am
Peter (11/4/2008)
Case 1) If I implement the two main procs exclusively in T-SQL, the only way I know is to check the FoodType and then query the appropriate tables. Essentially, a large IF statement.
Yes. If the specific types (fruit, nut, etc.) return distinct row definitions, then this is the most practical approach within SQL. As an implementation tip, I would suggest having your IF statement call two different subordinate stored procedures, that each then execute the appropiate query to return their specific food type (this actually performs better in SQL).
(It is important to note that in SQL you cannot return two different row-types in the same rowset. If you need to do this, then you will need to employ either column mapping or column concatenation tricks to get around this. Essentially creating a new row-type that can hold both of the original row-types).
Case 2) If I use CLR functions/procs I can wrap the T-SQL procedure calls in them and then use Delegates to call the appropriate function/proc (GetFruit or GetNut) in the appropriate instance. (no big IF).
Right. This just pushes the IF into SQLCLR which has no advantage whatsoever over (1). It will be measurably slower and much more complicated.
What you do not mention is the possibility of pushing the IF into either your client or middleware code. That is, first get the Food row(s), then check the Type and get either the Fruit or Nut row as appropiate. I do not recommend this either, however, it does have the advantage that your calling code would know ahead of time what row-type was going to be returned.
I have not used CLR in SQL before and I do not know if this is a good use of it (i.e. worth the hassle).
No it is not. It is more work, more complex, harder to maintain and runs slower. No upside that I know of.
In terms of extending this system in the future, in case 1 I would have to modifiy the sproc with the large IF. In case two I would need to add additional T-SQL procedures and new CLR wrappers for them but I would not change code in the execution path.
If you implement Case 1 as I have suggested, then extending it becomes "extending the IF statement and add additional T-SQL procedures". If you really want to avoid "changing code in the execution path" for maintenance, there is a little-used trick that can allow you to change your IF into a dynamic switch-type statement (which is implicitly what delegation does in CLR). It is a bit "clever" though, so most people just stick with the IF statement.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
November 4, 2008 at 8:25 am
Hi,
Thank you for your reply. I agree with many of your ideas.
Your understanding of my Case 1 is correct. The specific types do return different row types but I do this in different rowsets so it is not an issue. I also use subordinate stored procs as you recommend.
As for Case 2, I can create a structure in which I can do away with any IF by using delegates and creating a Dictionary of these delegates using the FoodTypeID as a key for the dictionary. I agree that this is more complicated.
You mention that you would not recommend pushing the delegates or IF into the middleware. What is your objection to this shape?
Finally, you mention a "dynamic switch-type statement". Could you please give me a link or an example of this? I would be interested to see it. The main reason I got into this delegate CLR stuff was this reason of not wanting to change the code in the execution path. If I can accomplish this using SQL, that would be the preferred option.
Thanks again.
November 4, 2008 at 12:42 pm
Peter (11/4/2008)
As for Case 2, I can create a structure in which I can do away with any IF by using delegates and creating a Dictionary of these delegates using the FoodTypeID as a key for the dictionary. I agree that this is more complicated.
That is not the primary issue though: SQLCLR is. SQLCLR adds some maintenance and complexity overhead, but very rarely is a good as the best available solution for a problem. For instance, in this case, it is unlikely that the SQLCLR solution offers anything that the Client/Middleware implementation of the same thing does not, and it is more difficult to implement.
You mention that you would not recommend pushing the delegates or IF into the middleware. What is your objection to this shape?
"Not recommended" is probably too strong on my part, "probably not preferred" is closer to what I intend. Doing it that way involves and extra round-trip/stored procedure call, that's all. It's a little extra performance overhead.
Finally, you mention a "dynamic switch-type statement". Could you please give me a link or an example of this? I would be interested to see it. The main reason I got into this delegate CLR stuff was this reason of not wanting to change the code in the execution path. If I can accomplish this using SQL, that would be the preferred option.
I have never seen an example of it on-line. It is documented in BOL (under the "EXECUTE statement"), but it is subtle and most people miss both the implication and the application. I will see if I can come up with a Good example
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
November 4, 2008 at 2:03 pm
OK, here is an example that demonstrates calling subordinate stored procedures as though they were Delegates:
---- GetFoodByID EXEC Switch example ---
----------------------------------------
--====== First Create the subOrdinate procedures:
CREATE Proc GetFoodById_AsFruit( @ID as int)
As
Select Food.FoodID--first, the common food columns
, Food.FoodTypeId
, Food.ProductionRegion
--add other Food columns here ...
, Fruit.NumberOfSeeds
--add other Fruit columns here ...
From Fruit
Inner Join Food on Food.FoodID = Fruit.FoodID
Where Fruit.FoodID = @ID
GO
CREATE Proc GetFoodById_AsNut( @ID as int)
As
Select Food.FoodID--first, the common food columns
, Food.FoodTypeId
, Food.ProductionRegion
--add other Food columns here ...
, Nut.ShellStrength
--add other Nut columns here ...
From Nut
Inner Join Food on Food.FoodID = Nut.FoodID
Where Nut.FoodID = @ID
GO
--====== the Main Procedure to call all FoodType subordinate procedures:
CREATE Proc GetFoodByID( @ID as int )
As
-- get the FoodType
Declare @FoodType as NVarChar(255)
Select @FoodType = FoodType.FoodType
From FoodType
Inner Join Food On Food.FoodTypeId = FoodType.FoodTypeID
Where Food.FoodID = @ID
-- check it:
If @FoodType IS NULL
BEGIN
RaisError('Food has no Foodtype', 15, 1)
Return-- exit with error, if no food type found
END
-- build the name of the proper subordinate procedure:
Declare @proc as SYSNAME-- (SYSNAME is a type of NVarChar)
Set @proc = 'GetFoodByID_As' + @FoodType
-- check to see if it exists:
IF OBJECT_ID(@proc, 'P') IS NULL
BEGIN
RaisError('FoodType has no procedure', 15, 1)
Return-- exit with error, if no food type found
END
-- Call it as though it were a Delegate or dynamic Switch statement:
EXEC @proc
GO
Note that the FoodType name in the FoodType table determines the actual name of the subordinate stored procedure. Also, the name-space conditioning is important here for security reasons.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
November 5, 2008 at 5:36 am
Thank you for your replies and sorry for my delayed response.
Your idea of the dynamic switch type statement is an interesting one. I will look into this further. Thank you for taking the time to make an example.
To bring this a little closer to my actual environment. I have several different types of things (food in the example) many of which are sub typed to various levels of depth (e.g. Fruit-> Apple -> Gala - Granny Smith, etc...) In total I will probably have 60 or so distinct types.
That being said, the size of the tables containing this data will never be very large. I would be surprised if any of them ever reached over 5000 rows.
With that in mind, I would like to look at the new cases that seem relevant:
Case 1 - BIG IF) With roughly 60 types, this will be one large (and potentially confusing) IF statement. As we have mentioned every type will have a 'helper' stored procedure to retrieve the data for that type. Essentially I'll have a big logic procedure with the IF and 60 supplementary procedures to do the work.
Case 2 - Several Client Calls) If I use delegates in the client code I will need to make multiple calls for a given item. There is a performance cost here but due to the relatively small size of the tables the cost will likely be in establishing the connections and not in query execution.
Case 3 - Dynamic Switch Type SP) This is a good combination of the two. I need to look into our access/permissions model to know if dynamic sql is something we can use.
If it turns out I cannot use case 3, here is my question:
In case 2, there will be a small drop in performance with slightly more complicated code but adding new types will not require code changes in the future. In case 1, faster and less complicated code in which code modifications will need to take place in order to add new types. *(by code mods I mean modifying existing code, not adding new code (e.g. classes, stored procs))
What are your thoughts?
Thanks in advance.
November 5, 2008 at 6:35 am
Peter (11/5/2008)
Case 3 - Dynamic Switch Type SP) This is a good combination of the two. I need to look into our access/permissions model to know if dynamic sql is something we can use.
Before you do, you should realize a very important fact: this is not Dynamic SQL. Rather this is a "Variable EXECUTE" statement and it does not have any of the problems and security issues of Dynamic SQL.
Here's the difference, a Variable Execute looks like this:
Declare @proc NVarchar(255)
Set @proc = N'spYourStandardPrefix_' + @TypeToCall
EXECUTE @proc, @arg1, @arg2
Whereas Dynamic SQL looks like this:
Declare @sql Varchar(MAX)
Set @proc = 'Select some standard commands;
' + @AnySQLCommandsPassedIn
EXECUTE (@sql)
Notice the difference? With Dynamic SQL it is possible to execute ANY SQL commands that are passed in: this is generally dangerous (without strict discipline and precautions).
However, with Variable Execute, you can ONLY Execute a stored procedure. In fact, with prefixes (and/or suffixes) you can even strictly control what procs can be executed. So for instance, in my example, no matter what is in the @FoodType string, the Variable Execute can only execute stored procedures in your database, in the 'dbo' schema that start with 'GetFoodByID_As'. As long as you insure that only subordinate FoodType procs start with that prefix, then no security breach is possible through this mechanism.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
November 6, 2008 at 4:22 am
Ah yes! I do see the difference. Thanks for the idea. I really appreciate your replies.
I am debating between this and the BIG IF at the moment. Another related issue has come up. Sticking with our example, let's say the types look like this:
food
------/ \---------
/ fruit nut
/ \ / apple orange almond cashew
QUESTION 1:
Food Types that are created on the client are a subset of the table names (if you will). For example, I may have an abstract class fruit and a class apple which inherits from it. So the user can create a new apple, but not a new 'fruit'.
If I was to use the dynamic switch-type, in the database I would need to know the entire food hierarchy through data. I would be tempted to use the types but these exclude nodes that are not children. For example, if I wanted to query all the appropriate tables for an apple, I would need to get a string from somewhere that represented 'Food', 'Fruit' and 'Apple'. I could store a tree with this hierarchy for this purpose.
What are your thoughts on that approach? Is there another way of handling this?
QUESTION 2:
When I request getting data by location I can get different shaped datasets back. Say I make a call to find food in location 1, my dataset could be three tables: food, fruit and orange. Whereas, a call to location 2 may yield: food, fruit, nut, apple, cashew
Since the shape of the dataset will be different depending on the parameter used to call the stored proc, how do I have a stable way of determining which tables are being returned and in what order.
Case 1) Return all tables regardless of if there is any data in it or not. That way on the client I will always know where a table is in the dataset (i.e. cashew table is always the fifth)
Case 2) Return the names of the tables being returned in the first table of the dataset then return the tables with the data. Now on the client I can check the first table and determine how to handle the rest of the tables.
Case 3) Something else.
Any thoughts on this?
November 6, 2008 at 12:47 pm
What you are getting into here is the notorious "Impedance Mismatch" between Object Oriented design and the Relational Model of databases. The issues that you are raising are extremely interesting to me and I wish I could spend the next two days just on your questions. Unfortunately, work and my life are intruding and I cannot devote nearly as much time as I would like to this until tomorrow evening. So if I do not respond very quickly (or completely) that is why.
Let me address question two first:
Peter (11/6/2008)
QUESTION 2:...
Since the shape of the dataset will be different depending on the parameter used to call the stored proc, how do I have a stable way of determining which tables are being returned and in what order.
As I understand how this works in ADO.net, all of the tables should be returned in the same DataSet, as DataTables in the "Tables" DataTableCollection property. You can just browse all of the DataTables by enumerating the DataTableCollection and then for each DataTable, check its "TableName" property to find out what table it is.
Case 1) Return all tables regardless of if there is any data in it or not. That way on the client I will always know where a table is in the dataset (i.e. cashew table is always the fifth)
This is definitely the easiest way. Not for the client (which the above method should address) but for the SQL Server: this is much easier to write than the other approaches. The problem that you will run into is that it is not very efficient if you have a large number of FoodTypes/tables, with only a few that have the relevant data. However, initially, this is how I would do it (knowing that I could change it later on if I needed to).
Case 2) Return the names of the tables being returned in the first table of the dataset then return the tables with the data. Now on the client I can check the first table and determine how to handle the rest of the tables.
Actually, you should return the subordinate table names in the first (root) table, regardless of whether all of the other tables are included or not. It is much easier for the client and for debugging issues.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply