updating CLR UDFs / Procedures strategy: contributions needed.

  • Scenario

    You develop a SQL Server 2005/2008 CLR library containing several user defined functions (UDFs) or Procedures.

    During development, you need to update the Library according to modifications/bug fixes; when you have new releases of the library, you need to update the library

    ISSUE: when you try to update the assmbly, you are given the error that assembly/function/procedure is in use by table or column

    SOLUTION (traditional): you have to drop all checks,computed columns, ... that reference the function

    The reason for this is logical, but makes it very tricky to use CLR UDFs: if your reference 100 times the function, you have to drop it 100 times and then recreate it after updating the assembly.

    WORKAROUNDS

    1) SCRIPT DROP / RECREATE operations: fine, traditional approach

    I do not explain this approach in details since there are several samples on the net

    2) BRIDGE FUNCTIONs/PROCEDUREs: more interesting/useful (some limitations)

    Let's suppose you have a CLR function named CLR_Add(int i1, int i2)

    You just create a T-SQL function named TSQL_Add(int i1, int i2) which invokes CLR_Add(int i1, i2)

    You then reference from all points in SQL (computed columns, triggers, ..) the TSQL_Add(int i1, i2) which acts a bridge to CLR_Add (which does the dirty job...)

    SO, for instance: Computed Column > TSQL_Add > CLR_Add

    Provided that the T-SQL function is created without schemabind, you can drop/alter/... the CLR assembly without having to drop and recreate the referencing constraints/triggers/... every time.

    The only limitation I found up to now is that if the function is used for a computed column and the column is persisted, it requires the function to be deterministic; provided that the original CLR function is deterministic (and satisfies the deterministic requirements), to make a bridge TSQL function deterministic it must schemabound, which the poses the drop/recreate issue back in place.

    You cannot use this approach for UDTs (types); this approac does not for for persisted computed columns (see note before on determinism)

    Please let me have your constructive considerations/suggestion/approaches.

    Provided that the functions/procedures definitions' method signature remains the same, you still have an

  • I always use your method #2, which we call "Wrapper" functions/procs, etc.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I'd have to ask what you're doing with CLR's that you think you can't do in T-SQL.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well,

    I know about the ever going discussion on pros/cons, dos/donts between CLR UDTs/UDs and T-SQL ones.

    I must admit I am not a fan of religion wars, so I generally follow the following common sense principles:

    - Look at the objective

    - See the technologies that can help targeting the objective

    - Look at performance/roi: If I loose something in performance but I gain much in ROI (development/maintenance/..) there are cases where I can pay the little price for performance... (of course there are others where I can't)

    In the specific case, with UDFs where are doing some functions that derive from a common shared c# library:

    - Document extension retrieval: Db table is used to store CMS documents, we need to retrieve document extension and store it in a persisted computed column. Doing that with T-SQL is much more tricky that doing it with c# (Split string and take rightmost part) and is much more maintanable (2 lines of code)

    - Binary Permission management: we have a permission system which is made up of permissions represented bit bitflags; each permission is represented by an integer which in turn get represented with a zero values bit array with the nth bit represented by the integer set to 1.

    We can the xor permissions in a query simply saying check a set of existing permissions against a target permission and return true if permission is there and false if not (i am summarizing in a few words a quite complex scenario)

    - Filed validation via Regular Expression: if you have a check constraint on a text field (i.e. email, ), you have a UDF where you pass as a parameter the field value and the regex and you can get the field validated against any complexity regex.

    These are just some samples.

    Any constructive feedback is welcome.

  • Giovanni (3/21/2009)


    Well,

    I know about the ever going discussion on pros/cons, dos/donts between CLR UDTs/UDs and T-SQL ones.

    I must admit I am not a fan of religion wars, so I generally follow the following common sense principles:

    - Look at the objective

    - See the technologies that can help targeting the objective

    - Look at performance/roi: If I loose something in performance but I gain much in ROI (development/maintenance/..) there are cases where I can pay the little price for performance... (of course there are others where I can't)

    In the specific case, with UDFs where are doing some functions that derive from a common shared c# library:

    - Document extension retrieval: Db table is used to store CMS documents, we need to retrieve document extension and store it in a persisted computed column. Doing that with T-SQL is much more tricky that doing it with c# (Split string and take rightmost part) and is much more maintanable (2 lines of code)

    - Binary Permission management: we have a permission system which is made up of permissions represented bit bitflags; each permission is represented by an integer which in turn get represented with a zero values bit array with the nth bit represented by the integer set to 1.

    We can the xor permissions in a query simply saying check a set of existing permissions against a target permission and return true if permission is there and false if not (i am summarizing in a few words a quite complex scenario)

    - Filed validation via Regular Expression: if you have a check constraint on a text field (i.e. email, ), you have a UDF where you pass as a parameter the field value and the regex and you can get the field validated against any complexity regex.

    These are just some samples.

    Any constructive feedback is welcome.

    Hopfully, you'll consider this to actually be constructive feedback rather than the holy war that most people perceive it to be... especially on my part. :hehe:

    I definitely and absolutely agree with the idea of using a well formed and safe CLR for RegEx validation. Matt Miller and I did some pretty extreme testing (I can't find those posts anymore) comparing the performance and relative simplicity of using a RegEx CLR to various T-SQL methods to do the same thing. I could get pretty darned close and I could sometimes beat it so far as performance went, but it also gave me an "ice cream headache" on some of the more complex validations. It's definitely a case of reinventing the wheel when it comes to doing RegEx-like validations in T-SQL and RegEx is one of the places where I immediately drop my sword and shield on the subject of CLR deployment.

    Of course, it would be real handy if MS stopped spending time deprecating useful features and concentrated on adding things like RegEx as a native part of T-SQL. One can only wish...

    On the others, I'm not so sure. As you've probably seen, my ideas of the ROI between CLR's and T-SQL are quite a bit different than most simply because I can normally beat the tar our of CLR's for both performance and ease of maintenance. I've found that it's (please, no insult or irony implied here) usually just a case of someone not knowing what's fully available in T-SQL. For example, T-SQL does have an XOR operator that would probably handle your binary permissions task quite nicely without the use of a CLR.

    For the persisted column containing the extension of a file, I'm assuming (we know what happens when you do that, so please clarify if I'm wrong) that you mean the characters that follow the right-most period in a file name. Rather than going through the Strains'n'Pains of doing a split or building a CLR, that's easily handled by a single formula in T-SQL and, of course, will be very fast because it is in a persisted calculated column. Here's an example using a UNC name...

    DECLARE @SomeUNC VARCHAR(512)

    SELECT @SomeUNC = '\\Document-Vault\Some Main Section\Some Sub-Section\Category.SubCategory.FileName.date.Extension'

    --

    SELECT RIGHT(@SomeUNC,CHARINDEX('.',REVERSE(@SomeUNC))-1)

    Of course, for the persisted calculated column formula, you wouldn't need the declaration of a variable nor the first SELECT (or SET) and the variable would be replaced by the column name in final SELECT. To wit, the formula for the persisted calculated column would look more like this...

    RIGHT(FileNameCol,CHARINDEX('.',REVERSE(FileNameCol))-1)

    The key here is that you sure don't need a CLR or a split for such a simple thing and, so far a maintenance goes, what maintenance? It's not going to break once you've written the forumula for this very specific task.

    Anyway, thanks for taking the time to write your good feedback. I really hope you take this as my trying to teach rather than perceiving it to be a holy war, as so many do. I just find that it's just a whole lot easier to maintain code in a single language and, since T-SQL isn't a compiled language, I find that maintenance using T-SQL is a whole lot easier than writing source code, compiling it to a DLL, making the DLL safe, deploying it, and having to deploy that on each server that will use it especially when a super simple formula will handle the problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ... and, I apologize for not being able to help you with your original question.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I appreciated your contribution very much: you are perfectly right in your statements, including the fact that often there is a shift to CLR due to ignorance on T-SQL (and I put myself on this area as well, since i am not a SQL guru).

    As for the binary/bitwise permission, you are perfectly right.

    The only issue with that is that SQL does not accept/support to XOR a binary value against another binary value (one of the two must be an int, smallint, ...).

    This poses issues in the scenario when you want to manage a large (or customizable) set of permissions, since an int lets you manage 32 permissions at most (issues with sign bit apart), which could be limiting in several scenarios when you wanna have developers adding very specific permission sets (not just the traditional read, write, ...).

    Our approach, till this morning and due to "inheritance" from a previous application, was to develop a custom UDT: this is fine, because we can store the BitArray as a Byte[] and allows "simply" to manage inheritance.

    As a sample, take in consideration a data structure where you have folders and files, that you could represent through 2 tables (TBL_Folders and TBL_Files).

    Now, TBL_Folders is a hierarchical structure that can be made with traditional approach (ParentID) or SQL 2008 approach (hierarchid).

    Let's suppose that you want to manage "NTFS like" permissions: you have users and groups, with users that can be members of none, one or more groups.

    Users and groups are assigned permissions to folders through a couple of tables (TBL_UsersFoldersPermissions and TBL_GroupsFoldersPermissions) as you would do in NTFS; as in NTFS, we require that if you set permission on a "Father" folder, it is "inherited" to the descendant folders/files (inherited "virtually", we do not want to duplicate data) unless a folder in the descendant chain has a flag stating "block inheritance"; besides any folder down the chain can add further permissions.

    From one point of view, a UDT representing a permission set of a user/group on a folder makes things clean and does not potentially limit the number of bits/permissions; the XOR, AND, OR, .. operators can be overridden and the c# BitArray operators can be used to determine the resultant set of permissions down the chain.

    On the other point of view, this approach does not exploit SQL indexing and requires each single value to be XORed, ANDed, ... at runtime, which can have a discrete computational cost.

    Another approach would be to use pure SQL structure, representing each single permission in a relationship.

    Now, it think this could be a better design approach from the point of view of SQL and Indexes, but it seems to me messy/complex to implement.

    At the end of the day, we need to have a Function/SP that says: has user X permission Y on item Z ?

    With the SQL approach, we would substitute Bitwise operations (calculations at runtime) with indexes, so indeed it should be much more performing; nevertheless we should say (in words): find all the permissions for User X and for the groups where X is member which are of type Y (if any) and are related to the requested item in table TBL_Files or to any folder which is direct ancestor of file up to root or up to the point where ancestor has block inheritance set to true.

    The steps should be:

    1. Get the inheritance chain PK (Leaf>Parent>..) up to the point where block inheritance is set to true (if any)

    2 Find all permission of type P which are related to any of the PK chain of point 1 and if count>0 return true.

    I must say I am a little bit puzzled on how to achive point 1 effectively (without cursors or temp tables); i imagine I could do with CTE, but I did not come to a reasonable solution yet.

    At the end, the issue would be on how to get the PK of all the items in a hieararchy (based on hieararchid) up to the point where a StopInheritance column has a value of true.

    Any help, suggestion, consideration will be very much appreciated, both on the general issue/performance and on the specific hierarchy doubts.

    Thank you in advance

    Giovanni

  • You've managed to hit the other area that I drop the sword and shield on for CLR deployment... file management. 😉

    As you know, hierarchical data in SQL Server is a bit of a pain. Sure, there're a lot of fast and effective methods to interogate a hierarchical chain either on the fly (adjacency models) or by preprocessing (nested set models) and even hibrids between the two but, either way, it's a pain in T-SQL and I don't know enough about the hierarchical methods in OLAP to be able to make any claims there. I have a couple of ideas on how it could be done in T-SQL, but I'd need a heck of a lot more information on the requirements before I could be of much service there.

    So far as the bit-wise operators, you can also use BIGINT on both sides of the bit-wise operator.

    SELECT CAST(6000000000 AS BIGINT) & CAST(4000000000 AS BIGINT)

    Even though the POWER function makes bit identification quite easy, I suspect, however, that 63 bits of flags may not suffice for your particular application. With that in mind, you could do some logical "byte splicing", but I'll admit that that gets a bit messy in T-SQL.

    Except for simple file name reading and, perhaps, the simple movement of read/processed files to an archival area for ETL, you're correct that the type of file handling and persmissions settings that you're working with are probably best done in a CLR (although I wouldn't mind taking a crack at it someday).

    Tell me though... was I correct in my definition of what you expect as a file extension?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    you were perfectly correct with the file extension issue. I was missing the "reverse" approach with T-SQL and that lead us to think of using CLR to be efficient.

    With the permission approach, I will let you know which approach we'll go: of course, any idea will be very welcome.

    I confirm that even bigint is a sort of a issue: the number of manageable permissions would be fairly limited.

    I am investigating the plain T-SQL approach with joins and indexes: I will post here if we come to any good design result.

    As per the hierarchy, sql 2008 hierarchyid adds some help/features, but - on the other end - is much more complex to manage compared to the traditional ParentID approach (for instance you need to "manually" cascade delete things and cannot use referetial integrity rules since it "tolerates" orphano nodes).

    You can achieve same results, but need to write triggers and checkconstraints and - possibly - some instead of. This is not rocket science, but it is always code to maintain,test, debug in all the use cases (and you know that multiple updates/inserts/deletes need to be carefully managed in this case).

    As per permissions, I am still "mumbling" the best way to go.

    Any contribuition will be very welcome.

  • Giovanni (3/21/2009)


    Jeff,

    you were perfectly correct with the file extension issue. I was missing the "reverse" approach with T-SQL and that lead us to think of using CLR to be efficient.

    Heh... I don't get it... since the formula I gave you is so very simple and fast, why not just use it in the persisted calculated column definition instead of creating a CLR?

    On the permissions thing... I'm not sure I have a deep enough understand of all the business requirements surrounding that, but if you have a question on it related to T-SQL, I'll sure be happy to help and I'm sure others will climb aboard, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply