Data modeling processing rules?

  • Hi SQL community -- I'm trying to model some data processing rules (relationships?) that involve prioritization. The whole thing feels a bit like a hierarchy, I suppose, so I'll do my best to explain.

    I work with healthcare data which can be quite complex, and depending on the needs of reporting, querying the data can feel like a brain-teaser at times. Specifically, I produce a report that clinical pharmacists oversee wherein the goal is to identify patients taking specific expensive drug prescriptions and reporting out lower-cost alternatives.

    Up till now, the effort has been straight forward: e.g., For a single given expensive (brand) drug, list the lower-cost substitutions (i.e., one-to-many relationship).

    But, now the clinical pharmacists have come up with some complex scenarios that are tricky to model. Below is a sample of the rules:

    1. If patient has been prescribed drug A & drug B within 30 days, then recommend drug D as the lower cost option.
    2. If patient has been prescribed drug A & drug C within 30 days, then recommend drug D as the lower cost option.
    3. If patient has been prescribed drug A, only, then recommend drug E as the lower cost option.
    4. Do not execute #3 unless #1 & #2 are not true (i.e., Patient only filled drug A, and not B and not C.)

    I've been handed a six-page document that contains 100+ of these types of composite rules, where certain rules take priority (precedence) over other rules. So, the notion that I can use a Case Statement to hash through rules #1-4 wont scale by the time I get to rule #100 (that would be a ridiculous Case Statement).

    So, my initial thought on how to model the rules is the following table, I think:

    RuleID  ExpensiveDrug   CheapDrug SuperRuleID
    ------ ------------- --------- -------
    1 A D 0
    1 B D 0
    -------------------------------------------
    2 A D 0
    2 C D 0
    -------------------------------------------
    3 A E 1
    3 A E 2

    The idea behind the table is that the records for RuleID #3 shouldn't execute unless RuleIDs #1 & #2 are false (i.e., patient did not fill two drugs, just Drug A.)  In essence, the table is a parent-child -- I'm not thrilled with this approach, however, so I'm wondering if someone else has better ideas or perhaps some other model they've used that's similar in nature to my problem?

    (Also, these composite rules are going to be insurance plan specific. So, RuleID #1 may apply to insurance plans X & Y, RuleID #2 applies only to insurance plan Y, and RuleID #3 applies to both insurance plans X & Y.... And don't get me started on which chronic conditions a patient may(not) have -- heart disease, diabetes, asthma, etc. -- that can also impact whether a given rule applies at all.)

    As a final note, given the variety of types of rules, I started pondering whether an EAV approach would be necessary (yikes).

    Thanks in advance for your thoughts & suggestions,

    Pete

  • One thing I will tell you that may help is to create a truth table using the rules.  Working through that should help answer your questions regarding the interaction between the different rules where there are dependencies between the different rules.

    This could mean numerous tables where groups of rules are independent of others.  The idea here is to see how they may tie together.  I have found creating truth tables extremely helpful when working with complex criteria.

     

  • Lynn! I appreciate your replying to my post

    I'll be honest, I'm not familiar with "truth tables" -- I can guess what they are, but could you provide an example or point me to an article that demonstrates their use?

    Thanks!

  • Here is the place to start:

    https://www.google.com/search?q=truth+tables&rlz=1C1EJFC_enUS816US816&oq=truth+tables&aqs=chrome..69i57j0l5.2671j0j7&sourceid=chrome&ie=UTF-8

    I learned about them back in the late 70's while taking a micro electronics class in college.

     

    • This reply was modified 5 years, 5 months ago by  Lynn Pettis.
  • Thanks -- I was right in the middle of "googling" truth tables when I received your email.  Just a brief glance, and I see where this is headed. Much appreciated!

  • Without creating a truth table I can describe your first three rules in this manner:

    For drugs A, B, C if prescribed and filled within 30 days of each other:

    IF A and B THEN D

    IF A and C THEN D

    These two rules could be combined as IF A and (B or C) THEN D

    IF A and NOT B and NOT C THEN E

    Of course, there may be a rule that actually says something different if A and B and C are all prescribed within 30 days of each other, then that needs to be accounted for as well.

     

  • I like Lynn's idea. I was going to suggest some other items in the rule table, like the time or cost that might influence this. I might also include a priority, on the chance that someone has inconsistent rules that meet the same criteria.

  • Steve -- yes, your mentioning of timing and cost are also factors, in addition to patient's age, chronic conditions, insurance plan formularies... lots to track for Go/No-Go drug substitution reporting. And, yes, maintaining a priority flag is something I've also discussed with a colleague, too.

    Setting all of the many variables aside for the moment, and focusing on the basic concept of prioritized rules as I'd outlined in my original post, I'm still not quite formulating how to translate "IF A and (B or C) THEN D" into a table(s) that clearly defines the rule/relationship. Google searches for "SQL truth tables" aren't quite returning useful examples; many of them display SQL code that create T/F outcomes that look like permutations (e.g., total possible outcomes of flipping a coin three times) rather than a demonstration of how the logic is translated into rules that actually contain the data IDs/values. I understand the usefulness of logic tables for comprehending/formulating the rules, but not seeing how to operationalize these logical proofs into tangible reference tables.

    So, instead of thinking about drug substitutions, how would someone setup a table(s) that recommends the next college course to take based on prior completed courses:  "IF course A completed and (courses B or C completed) THEN course D, else course E"? So, in this light, we're talking about prerequisites and co-requisites, which seems comparable to the drug substitution issue I'm trying to model. I even find myself reflecting on complicated board games I used to play as a kid, where you'd roll dice and refer to an "events table" that would then refer to a secondary/tertiary outcome table-- But the game example is turn-by-turn (i.e. row-by-row), rather than being set-based (i.e., For all patients prescribed drugs in the past three months, report only those patients who filled expensive drugs along with the appropriate lower-cost substitutions?)

    I keep looking at the table mock-up from my original post, and feel as though it's somewhat defining the relationship between an expensive drug and substitute drug, while also specifying that the third rule is subordinate to the first two rules. Unfortunately, the table seems somewhat clunky, and yes, it would need additional columns of information such as priority, cost threshold, insurance plan ID, etc.... so, is the mockup table headed in the right direction, or terribly wrong?

     

  • I may have something that you can look at but I don't have it available at work.  When I get home I will see if I still have what I am looking for at home, and if so, I will provide that as an attachment to a subsequent post.

     

  • I did the Prerequisite/Co-requisite course thing a long time ago, and asked about it there. So if you look for it, Jeff Moden and someone else answered and came up with a good answer, but I'm not sure that one answers this question... unless you're looking at scenarios where "student 1 took courses {a, c}, which courses is he qualified to take next? In a nutshell, you remove all the "passed" courses from the list of "next" courses, and then filter for those where he is not missing any prerequisites.

  • I'll let Lynn respond, but I think you're on the right track here. To me, the table will return a prioritized list of rows that your application (or a process) will then walk down, looking for the next item to recommend.

  • I'm not sure at the moment.  I was unexpectedly busy this weekend and did not have a chance to look for the code I think may help in this situation even though it was for a different problem.  The concepts used in the code, however, I do believe to be relevant.  I should be able to look for it tonight as my daughter and her fiance are on their way back home in Virginia and looking (and buying) a wedding dress and brides maid dresses, and checking out possible venues for their wedding in November.

     

  • Actually, I found the code I was looking for, it was in my Azure database.  The code itself is not very usefully on its own as you need to see it run against the test data provided by the original presenter of the Killing Cursors presentation at the 2018 Colorado Springs SQL Saturday.  I will see about getting the data down or at least come up with a new set of test data using code that will allow you to see how it works.  Until then, here is the code:

    /****** Object:  StoredProcedure [dbo].[EliminateCursorsAndWhileLoops]    Script Date: 7/1/2019 8:07:39 AM ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE PROCEDURE [dbo].[EliminateCursorsAndWhileLoops]
    AS BEGIN
    /**********************************************************************************************************************************

    Procedure Name: [dbo].[EliminateCursorsAndWhileLoops]
    Author: Mr. Lynn A Pettis
    Date: 2018-03-27
    Site: Colorado Springs, CO

    Scope:
    This procedure was developed to show how cursors or while loops (basically still a cursor) could be eliminated. This example
    makes use of the environment provided by Cynthia Johnson and was the basis for her presentation at SQLSaturday 700 entitled
    "Killing Cursors."

    In my opinion, however, replacing cursors with WHILE loops isn't killing cursors. The way some of the code was written I actually
    believe well written fire-hose cursors would perform as well if not some what better than the code using WHILE loops. My intent,
    however, was not to see if fire-hose cursors would perform better but to see if I could find a way to actually remove both cursors
    and WHILE loops.

    During this process I first explored using dynamic SQL but writing it so that it was a single query. This code worked, reducing
    the number of reads slightly but also ran a bit longer than stored procedure [dbo].[CTE_Normal_Dynamic]. It also did not make use
    of any temporary tables writing directly to the results table created for storing the output so it could be compared to the other
    routines.

    After some thought I realized that there could be another way, partially based on how I developed the dynamic SQL for my first
    attempt at eliminating cursors/WHILE loops.

    First, it makes use of a table variable. When originally starting this procedure it was going to be a table valued parameter
    passed to dynamic SQL. As I began writing the code I realized that I would not even need to write dynamic SQL which simplified
    the code.

    The data written to the table variable is based off the information in the table [dbo].[Worklist]. The columns of interest in this
    table are: [NameFilterBeg], [NameFilterEnd], [ProviderFiltList], [InsCarrierFiltList], [LocationID], and [WorklistID].

    The first four columns are mnnipulated to provide the needed information for data comparisions.
    The columns [NameFilterBeg] and [NameFilterEnd] are combined into a single column, [PatientFilter]. The data, stored as comma
    delimited data, in the columns [ProviderFiltList] and [InsCarrierFiltList] are split into individual rows of data. This also
    resulted in my adding the function [dbo].[DelimitedSplitN4K] to the database. The data from the columns [ProviderFiltList] and
    [InsCarrierFiltList] is written to [ProviderID] and [InsCarrierID] respectively.

    Logic used:
    For determining if a patient name was in the defined range in the Worklist table, instead of doing a between comparison with the
    name I thought it would be easier to use the LIKE operator with T-SQL regular expression. To insure that the data was properly
    compared I only needed to determine if the name started with a letter in the range defined by the first letter of the data in the
    [NameFilterBeg] and [NameFilterEnd] columns. This resulted in a string value, for example, of '[A-J]%' in the @WorklistLookup
    table. If either or both of these columns contained a NULL value then a single '%' wildcard was inserted.

    For the [ProviderFiltList] and [InsCarrierFiltList] columns, I determined that it would actually facilitate the comparison of
    these values by spliting them into individual rows of data allowing for a direct comparison. Leaving them as comma separated
    data would have required spliting the data eventually and probably using dynamic SQL to facilitate the process. Doing this spilt
    upfront simplified the process.

    As each row of data from the table [dbo].[PatientCharge] could potentially match multiple rows of data from the @WorklistLookup
    table, it was also necessary to select the top one (1) row of data based on the priority of the match. This is why the lookup
    was done in a select query inside of a CROSS APPLY clause in the FROM clause of the query.


    This code is the intellectual property of the author but may be used with attribution (and retaining all comments) in future
    presentations or even with modifications in a production environment.
    **********************************************************************************************************************************/
    SET NOCOUNT ON;

    --~ For testing purposes insure that the landing place for the output data is empty.
    TRUNCATE TABLE [dbo].[PatientChargeResults_lap];

    DECLARE @WorklistLookup TABLE (
    [LocationID] INT NOT NULL
    , [PatientFilter] NVARCHAR(100) NOT NULL
    , [ProviderID] INT NULL
    , [InsCarrierID] INT NULL
    , [Priority] INT NOT NULL
    , [WorklistID] INT NOT NULL
    , [LookupID] INT NOT NULL IDENTITY(1,1)
    --~ Create the table variable with a unique clustered index to facilitate faster comparisons (hopefully, not tested without it).
    --~ Used a unique clustered index as [ProviderID] and [InsCarrierID] are nullable columns.
    , UNIQUE CLUSTERED ([LocationID],[PatientFilter],[ProviderID],[InsCarrierID],[Priority],[WorklistID],[LookupID]));

    INSERT INTO @WorklistLookup([LocationID],[PatientFilter],[ProviderID],[InsCarrierID],[Priority],[WorklistID])
    SELECT
    [w].[LocationID]
    --~ Convert AAA and JZZ to [A-J]%, or if one or both are null to %, to allow filtering on all names that start with the specified letter range.
    , CASE WHEN [w].[NameFilterBeg] IS NOT NULL
    AND [w].[NameFilterEnd] IS NOT NULL
    THEN N'[' + LEFT([w].[NameFilterBeg], 1) + N'-' + LEFT([w].[NameFilterEnd], 1) + N']%'
    ELSE N'%'
    END AS PatientFilter
    , [ca1].[ProviderID]
    , [ca2].[InsCarrierID]
    , [w].[Priority]
    , [w].[WorklistID]
    FROM
    [dbo].[Worklist] AS [w]
    --~ Split the data in the column [ProviderFiltList] into individual rows of data for easier comparisons.
    CROSS APPLY ( SELECT
    CAST([dsnk].[Item] AS int)
    FROM
    [dbo].[DelimitedSplitN4K]([w].[ProviderFiltList],',') AS [dsnk])ca1(ProviderID)
    --~ Split the data in the column [InsCarrierFiltList] into individual rows of data for easier comparisons.
    CROSS APPLY ( SELECT
    CAST([dsnk1].[Item] AS int)
    FROM
    [dbo].[DelimitedSplitN4K]([w].[InsCarrierFiltList],',') AS [dsnk1])ca2(InsCarrierID)
    ORDER BY
    [w].[LocationID]
    ,[w].[Priority];

    INSERT INTO dbo.PatientChargeResults_lap(
    [LocationID]
    ,[PatientID]
    ,[PatientChargeID]
    ,[WorklistID]
    ,[BalanceAmt]
    ,[ProviderID]
    ,[InsCarrierID])
    SELECT
    [pc].[LocationID]
    ,[pc].[PatientID]
    ,[pc].[PatientChargeID]
    ,[ca1].[WorklistID]
    ,[pc].[BalanceAmt]
    ,[pc].[ProviderID]
    ,[pc].[InsCarrierID]
    FROM
    [dbo].[PatientCharge] AS [pc]
    --~ Join to the Patient table to obtain the patient's last name for determining worklist assignment.
    INNER JOIN [dbo].[Patient] AS [p]
    ON [p].[PatientID] = [pc].[PatientID]
    --~ Using cross apply determine the appropriate worklist to assign based on the valid join conditions and priority.
    CROSS APPLY (SELECT TOP (1)
    [wl].[WorklistID]
    FROM
    @WorklistLookup AS [wl]
    WHERE
    [pc].[LocationID] = [wl].[LocationID]
    AND [p].[PatientNameLast] LIKE [wl].[PatientFilter]
    AND ([pc].[ProviderID] = [wl].[ProviderID] OR [wl].[ProviderID] IS NULL)
    AND ([pc].[InsCarrierID] = [wl].[InsCarrierID] OR [wl].[InsCarrierID] IS NULL)
    ORDER BY
    [wl].[Priority] ASC
    ) AS [ca1]
    WHERE
    --~ Only interested in the data that has a balance greater than zero (0).
    [pc].[BalanceAmt] > 0;

    END -- End of procedure [dbo].[EliminateCursorsAndWhileLoops]
    GO

     

  • Lynn/Steve/pietlinden -- Thanks for your replies and willingness to provide assistance. Sorry my appreciation was delayed -- life and work priorities shifted on me last week.

    Lynn -- Thanks for sharing your SQL code -- I understand how it's operating. The "select top 1... order by" within the cross apply statement is clever. I'd been pondering using a ranking function against a "priority" field to ensure that the rules which qualify certain criteria are properly selected per the ranking; similar concept to your "select top 1".

    pietlinden: I found your forum post/discussion regarding pre-/co-requisite courses. Interesting solution provided in the thread. The solution almost translates to my drug-substitution requirements, but not quite. The courses solution assumes that a student needs to take all courses in the "courses" table which is somewhat off-point from the drug-substitution challenge I'm facing. Nonetheless, the courses solution provides good food-for-thought.

    Overall, the feedback from all of you has led me to examine articles on decision tables and business rules engines -- lots of information within these topics.  Came across one basic article: http://www.brcommunity.com/articles.php?id=b516 . I think where I'm headed with all of this is that the data model may end up providing a definition of how drugs can be substituted, and how the priority of the substitutions should be executed, but the data model/relationships won't necessarily offer up the actual explicit processing logic. That's where code logic similar to Lynn's stored proc comes into play, it seems.

    Thanks

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply