Join using Charindex -- there's got to be a better way?

  • peterzeke (10/20/2011)


    Makes sense to create a mapping table that can be leveraged for performance when doing reporting, but... doesn't the act of building such a mapping/crosswalk table return us back to my initial inquiry (query) of how to join the tables to get the correct mapping in the first place without performance totally dragging even for adding to the mapping table? At the very least, limiting the rows to evaluate being based on the latest insert_date would trim down the pool of data to be considered for appends to the mapping table, but the join itself would still be "flimsy" if charindex() or LIKE% is employed.

    For example, if the mapping table has three variations for "Citalopram", and a nightly run identifies a fourth variation of this drug, what technique/join is employed to identify this new drug description in the first place? It's simple enough to use an outer join to identify the new drug names, but somehow those new drug names need to map back to a key/primary drug description.

    Lastly, before I forget to say it, thanks to all of your input and participation on this topic.

    As Phil shared earlier your mapping table would have 2 columns, one for the incoming name and one for the common name. So your process would do a "lookup" on the mapping table and if a match is not found you'd insert that row with a null for the common name. Unfortunately, this is where the manual intervention comes in, you'd have to look at the mapping table for NULL in the common name column and manually update them. Then you'd have to re-run the process for the ones that you didn't map.

  • This should give you some ideas, Peter.

    SELECT

    DrugNameString = c.DrugName,

    DrugName = x.Item

    FROM (

    SELECT ClaimID = 123, DrugName = 'Oral Solution Citalopram HBR' UNION ALL

    SELECT 124, 'CITALOPRAM TABLETS' UNION ALL

    SELECT 125, 'Paracetamol TABLETS' UNION ALL

    SELECT 126, 'Paracetamol solution' UNION ALL

    SELECT 127, 'Paracetamol' UNION ALL

    SELECT 128, 'Tab, Citalopram'

    ) c

    CROSS APPLY(SELECT DrugString = REPLACE(c.DrugName, ' ', ',')) cl -- for clarity - avoids multiple REPLACE in below APPLY

    CROSS APPLY (

    SELECT

    Item = CAST(LEFT(cl.DrugString, ISNULL(NULLIF(CHARINDEX(',', cl.DrugString, 1),0)-1,8000)) AS VARCHAR(8000))

    UNION ALL

    SELECT

    Item = SUBSTRING(cl.DrugString,n,ISNULL(NULLIF(CHARINDEX(',',cl.DrugString,n),0)-n,8000))

    FROM (

    SELECT n = n+1

    FROM (

    SELECT TOP (1+ISNULL(DATALENGTH(cl.DrugString),0))

    n = (n1 + n2 + n3 + n4)

    FROM (((VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t3 (n3)

    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t1 (n1))

    CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t2 (n2))

    CROSS JOIN (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4 (n4) --

    ) n

    WHERE SUBSTRING(cl.DrugString,n,1) = ','

    ) d

    ) x (Item)

    INNER JOIN (SELECT DrugName = 'CITALOPRAM' UNION ALL SELECT 'Paracetamol') rft_DrugXrefDrugClass

    ON rft_DrugXrefDrugClass.DrugName = x.Item

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris -- interesting query. Looks like it's for SQL 2008?

    So, if I understand it, the CROSS APPLY components transform a drug description into a comma-delimited array, which is then pivoted vertically to inner join the parsed values against the rft table values. The deeply nested derived table of "n" values is another form of a tally table, it seems -- clever.

    I'll need to tweek the query to work in SQL 2005, but certainly promising.

    Without turning this thread into a SQL 2008 inquiry, I have a couple quick questions:

    1) what exactly does "t3 (n3)" and "t1 (n1)" ... and so on doing? Not familiar with parens () after an alias. Does the "t1" alias the "VALUES" as a derived table, and (n1) as a field name within the "VALUES"?

    2) Similar to the first question, what's "x (Item)" at the end of the second CROSS APPLY all about?

    Thanks

  • Hi Peter, I'm running out of time today but here's some notes.

    The entire cross apply bit can - and should - be replaced by a string splitter function. This is the bit I mean:

    CROSS APPLY (

    SELECT

    Item = CAST(LEFT(cl.DrugString, ISNULL(NULLIF(CHARINDEX(',', cl.DrugString, 1),0)-1,8000)) AS VARCHAR(8000))

    UNION ALL

    SELECT

    Item = SUBSTRING(cl.DrugString,n,ISNULL(NULLIF(CHARINDEX(',',cl.DrugString,n),0)-n,8000))

    FROM (

    SELECT n = n+1

    FROM (

    SELECT TOP (1+ISNULL(DATALENGTH(cl.DrugString),0))

    n = (n1 + n2 + n3 + n4)

    FROM (((VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t3 (n3)

    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t1 (n1))

    CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t2 (n2))

    CROSS JOIN (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4 (n4) --

    ) n

    WHERE SUBSTRING(cl.DrugString,n,1) = ','

    ) d

    ) x (Item)

    And the reason? Firstly, it will drastically reduce the complexity of your code. Secondly, I'm not even sure if Row Constructor - VALUES () - works with 2k5 😉

    t3 is the table alias, n3 is the column name/alias.

    x is the alias of the cross apply, (item) is the column alias

    The cross apply takes a delimited string and chops it into elements, a new row for each.

    Use REPLACE to replace spaces with a designated character, ',' is good. ' ' doesn't always work well as the delimiter.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • peterzeke (10/20/2011)


    Chris -- interesting query. Looks like it's for SQL 2008?

    So, if I understand it, the CROSS APPLY components transform a drug description into a comma-delimited array, which is then pivoted vertically to inner join the parsed values against the rft table values. The deeply nested derived table of "n" values is another form of a tally table, it seems -- clever.

    I'll need to tweek the query to work in SQL 2005, but certainly promising.

    Without turning this thread into a SQL 2008 inquiry, I have a couple quick questions:

    1) what exactly does "t3 (n3)" and "t1 (n1)" ... and so on doing? Not familiar with parens () after an alias. Does the "t1" alias the "VALUES" as a derived table, and (n1) as a field name within the "VALUES"?

    2) Similar to the first question, what's "x (Item)" at the end of the second CROSS APPLY all about?

    Thanks

    For 1, that basically builds an inline numbers table (derived). I've never seen it done that way before but it works.

  • OK -- great tips, clever tricks (and new fangled features I might get to play with if I ever get beyond sql 2k5)!

    Plenty to go on from here.

    thanks!

  • Phil Parkin (10/20/2011)


    Surely another possibility is building a mapping table, if this is likely to be a repeated process?

    Something like this:

    select 'CITALOPRAM TABLETS' Pseudonym,

    'Citalopram' GenericName

    union

    select 'Tab, Citalopram',

    'Citalopram'

    union

    select 'Oral Solution Citalopram HBR',

    'Citalopram'

    where 'Pseudonym' is the PK.

    Then joining on that to get to the Generic name (apologies - pharmaceutical nomenclature is not my strong point)

    Agreed. There are also formularies that at least used to be public info (Medicare's formulary, some blue shields have formularies) which hold this very info (i.e. every specific "market" name and formulation mapped to the generic/pharmaceutical name). Leveraging one of those should substantially cut down on the amount you need to map (excluding typos, etc...)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply