Compare date fields in SQL or determine which is older !

  • Hi all,

    I do apologize that the question is long...

    I've a flat file that I cleaned the data out using SSIS, the output looks like that :

    MEDICAL ADMIT PATIENT PATIENT DATE OF DX

    REC NO DATE NUMBER NAME DISCHARGE Code DRG #

    Editor: Removed data

    and the report (final results) attached in the screen shot from the final excel report.

    so what's happening is IF the same name or same account number is duplicate, that means the patient has entered the hospital again and needs to be included in the report.

    what I need to do is...

    Eliminate any rows that is NOT duplicate (not everybody in this file has been admitted again)

    and compare the dates to get the ReAdmitdate and ReDischargedate

    I dumped the data into a SQL table and trying to compare the dates to figure out "ReAdmitdate" and "ReDischargedate"

    but still how to tell which date is before the other and what's the "ReAdmitdate" and "ReDischargedate" from the duplicate rows and add it to the report? :w00t:

    any help is appreciated.

    Thanks

  • First, may I suggest that you read the first article I reference below in my signature block regarding asking for assistance. Please follow the instructions in the article for posting the information we can use to help you solve your problem. In addition to what the article states, please include the expected results based on the sample data, and the code you have written so far to try and solve your problem.

    Doing this will get you better answers faster, and with the additional benefit of tested code.

  • :exclamationmark::exclamationmark::exclamationmark::exclamationmark::exclamationmark:

    WARNING! WARNING! WARNING!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    NEVER, EVER, EVER, EVER, EVER POST LIVE PATIENT DATA!!!!!!!!!!!!!!!!!!!!!

    You might have just commited a violation of HIPAA laws.

    If you're going to post sample data like this, DO NOT use real names or data!!!

    Change it to bogus (but a valid representation) of your data -- for example, change names to "John Doe," change ID numbers to "XXXXXX" or "999999," etc.

    This is for the protection of your clients, as well as covering your own butt!!!

    I VERY STRONGLY SUGGEST that you go back and edit your original posting so that actual names and ID numbers DO NOT APPEAR!!!

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • Ray K (4/7/2010)


    :exclamationmark::exclamationmark::exclamationmark::exclamationmark::exclamationmark:

    WARNING! WARNING! WARNING!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    NEVER, EVER, EVER, EVER, EVER POST LIVE PATIENT DATA!!!!!!!!!!!!!!!!!!!!!

    You might have just commited a violation of HIPAA laws.

    If you're going to post sample data like this, DO NOT use real names or data!!!

    Change it to bogus (but a valid representation) of your data -- for example, change names to "John Doe," change ID numbers to "XXXXXX" or "999999," etc.

    This is for the protection of your clients, as well as covering your own butt!!!

    Actually second that. Should have specified that in my post. I believe this to be true REGARDLESS of environment you are working in. Sample data should not be live data, but be representative of your data and the problem domain.

  • I have removed the data for safety. If this was made up data, I apologize, but please make sure that you are not posting live data on the Internet.

  • I'm guessing that there is a puddle of pee under huslayer's desk right now... :hehe:

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Hi Lynn,

    Thank you for your reply...

    I've attached , the DATA, TABLE query, and the final results..

    What I've done actually for the last 2 days is NOTHING, running 10000 or queries to try to compare and figure it out.

    I can get the duplicates , i can run datediff queries, but still how to tell which date is before the other and what's the "ReAdmitdate" and "ReDischargedate" from the duplicate rows and add it to the report?

    I know it might be simple but I just can't figure out what's the query to compare the dates, especially when some records has been admitted for 3 and 4 times !

    Thanks for your help and god bless whoever shares the knowledge

  • No Actually not, it wasn't real live data, I just mixed the names and the numbers 🙂

  • huslayer (4/7/2010)


    No Actually not, it wasn't real live data, I just mixed the names and the numbers 🙂

    You can't even use real names and numbers, even if they're mixed up. Truly deidentified PHI goes to the level of not even being able to use dates less than the year.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • No problem, I'm working on a JOHN DOE example 🙂 but i'm trying to match my sample data with the final report... working on it.

    Thanks

  • huslayer (4/7/2010)


    No problem, I'm working on a JOHN DOE example 🙂 but i'm trying to match my sample data with the final report... working on it.

    Thanks

    Thanks for doing that. When dealing with customer data or person data, one needs to be certain that no identifying ties can be made back to the real person when using it for test. Thus the best thing to do is really create a bunch of sample fake data that closely matches the results seen with the live data.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Readmits within what time frame? We have been looking within 30 days typically, but some reports want any readmits within a rolling 12 months with a 3 month lag for claims runout, etc.

    comparing dates should be done using datediff(), but I think you're saying that you don't know which records come first?

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • In case this helps, here's what I use when I need to look for help, just realized that the date shifting isn't working, but if you mock those up to dummy data, it'll take care of the rest.

    replace the parts that will pull out your actual data, then copy the SQL and the INSERTs for the group. Save your copy of the actual data, because the sample data will match the rows, so you can work back to real people on your end.

    /*

    Title: 'Generate De-identified Test Data.sql'

    Description:HIPAA legislation requires that unless you 'need to know' data to perform your job,

    you should not have access to it.

    But what if you need help on querying your data?

    The purpose of this script is to demonstrate how to generate test data

    so that help can be sought enterprise-wide, from users who may not have

    access to the actual data source.

    Values entered into the Upper and Lower Bound variables will change dates randomly,

    although the same values for both can be used to retain date information if necessary

    Here are the steps to use this script (marked as such below):

    STEP ONE - initialize variables as needed to name tables, limit results and move dates around

    STEP TWO - enter your query, adding 'INTO ##myTemp' as noted and fully naming tables

    STEP THREE - Execute this script and copy/paste first the Message tab,

    then the Results tab into your post/email

    STEP FOUR - Save a copy of the results in case you want to tie the new values back to the original

    The 'iRow' in the generated table matches the row in your data table

    Known flaws:n/a

    Author: Jon Crawford

    Healthcare Data Analyst

    Revision History:

    DateChanges

    -------------

    12/12/2008'initial implementation'

    */

    SET NOCOUNT ON

    USE TempDB -- DO NOT CHANGE THIS

    -- =================================

    -- DECLARE variables

    -- =================================

    DECLARE @shiftYearsLower int,-- lower bound of random years added to dates

    @shiftYearsUpper int,-- upper bound of random years added to dates

    @shiftMonthsLower int,-- lower bound of random months added to dates

    @shiftMonthsUpper int,-- upper bound of random months added to dates

    @shiftDaysLower int,-- lower bound of random days added to dates

    @shiftDaysUpper int,-- upper bound of random days added to dates

    @sq char(1), -- holds the value of a single quote to simplify our dynamic SQL

    @limit varchar(25), -- holds the number of rows you want to generate

    @columns varchar(8000), -- variable to hold the columns that we want

    @createTableSQL varchar(8000), -- variable to hold the dynamic SQL that will generate our table

    @mySelectList varchar(8000), -- variable to hold the dynamic SQL that represents your SELECT in the real world

    @myInsertList varchar(8000), -- @myInsertList holds the dynamic SQL for the INSERT statements you need

    -- if your inserts are too long for this,

    -- then my advice is to rethink your inserts, you're passing test data. Make it fit.

    @myTableData varchar(8000), -- holds the data that you will be copy/pasting

    @myTempTableName varchar(255) -- holds the name of your current output table (set to #temp)

    -- ===============================================

    -- Initialize variables - DON'T CHANGE THESE ONES

    -- ===============================================

    -- we have to set all to Empty String ('') because we're adding their value to themselves,

    --and if we don't, the first value is undefined, which wrecks the whole thing

    SET @columns = ''

    SET @createTableSQL = ''

    SET @mySelectList = ''

    SET @myInsertList = ''

    SET @myTableData = ''

    SET @sq = ''''

    -- ======================================================================================

    -- STEP ONE: Initialize variables - CHANGE THESE AS NEEDED

    -- ======================================================================================

    -- change @myTempTableName to whatever you want your table name to be that you will be posting

    -- *** NOTE *** you'll have to run this script once for each temp table that you want to generate

    SET @myTempTableName = '#temp'

    -- set @limit to 'TOP x' where x is the number of rows you want to generate

    -- or set to '' if you don't want to limit the results

    -- (but remember your statements need to fit within an 8000 character variable)

    SET @limit = 'TOP 10'

    -- the following shift days/months/years in all date fields either a set number or a random number in a boundary

    SET @shiftYearsLower = 10 -- lower bound of random years added to dates

    SET @shiftYearsUpper = 10 -- upper bound of random years added to dates

    SET @shiftMonthsLower = 1 -- lower bound of random months added to dates

    SET @shiftMonthsUpper = 1 -- upper bound of random months added to dates

    SET @shiftDaysLower = 0 -- lower bound of random days added to dates

    SET @shiftDaysUpper = 0 -- upper bound of random days added to dates

    -- =========================

    -- END OF STEP ONE

    -- =========================

    IF OBJECT_ID('Tempdb..##myTemp') IS NOT NULL BEGIN DROP TABLE ##myTemp END

    -- ======================================================================================

    -- STEP TWO:

    -- Add your SELECT statement below, this is a sample based on member and claim startdate

    -- that shows both the original and modified data

    -- SELECT INTO a global temp table (allows for complex joins, calculated fields, etc),

    -- so you'll want to alias all columns that are modified in any way (e.g. column AS alias)

    -- ======================================================================================

    SELECT TOP 10 rtrim(member.fullname) AS member,

    rtrim(provider.fullname) AS provider,

    rtrim(claim.claimid) AS claimid,

    claim.startdate

    -- ======================

    -- Add this to your query after your SELECT clause

    INTO ##myTemp

    -- FULLY NAME YOUR TABLES below along with aliases

    -- e.g. JOIN myDB.dbo.tablename tablename ON blah blah blah

    -- ^database author^ ^table ^alias ^JOIN criteria

    -- ======================

    FROM myDB.dbo.member member

    JOIN myDB.dbo.claim claim on member.memid = claim.memid

    JOIN myDB.dbo.provider provider ON claim.provid = provider.provid

    WHERE provider.fullname = 'myProvider'

    ORDER BY member.memid

    -- =========================

    -- END OF STEP TWO

    -- =========================

    -- ====================================================================

    -- STEP THREE: Now Execute the query for each temp table and

    --copy/paste from the Messages tab first and then

    -- the Results tab into your post/email

    -- ====================================================================

    -- =======================================================================================================

    -- STEP FOUR: Save the data from this as a reference in case you want to tie back to the original

    -- =======================================================================================================

    -- GENERATE INSERT STATEMENTS TO HELP OTHERS POPULATE #temp WITH TEST DATA

    SELECT @columns = @columns + char(9) + cols.name + ' ' + systypes.name

    + CASE WHEN systypes.name IN ('char','varchar') THEN '('+convert(varchar(4),cols.length)+')' ELSE '' END

    + ',' + char(13),

    @mySelectList = @mySelectList + cols.name + ', ',

    @myInsertList = @myInsertList + @sq + ',' + @sq + char(13)

    +

    CASE

    WHEN systypes.name IN ('char','varchar','datetime','smalldatetime')

    THEN '+' +@sq+@sq+@sq+@sq+ ' + '

    +CASE

    WHEN cols.name IN ('memid','carriermemid','fullname','enrollid','member')

    THEN

    @sq+'member'+@sq+'+convert(varchar(25),abs(checksum(ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')))) + '

    WHEN cols.name = 'ssn'

    THEN 'convert(varchar(9),100000000+abs(checksum('+objs.name+'.'+cols.name+','+@sq+@sq+'))) + '

    WHEN cols.name IN ('provid','affiliateid','affiliationid','fedid','provider')

    THEN

    @sq+'provider'+@sq+'+convert(varchar(25),abs(checksum(ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')))) + '

    WHEN cols.name IN ('claimid','claim','encounter')

    THEN 'convert(varchar(11),10000000000+abs(checksum('+objs.name+'.'+cols.name+','+@sq+@sq+'))) + '

    WHEN systypes.name IN ('datetime','smalldatetime')

    THEN 'convert(varchar,'+objs.name+'.'+cols.name+',20) + '

    ELSE

    'convert(varchar(25),ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')) + '

    END

    + @sq+@sq+@sq+@sq

    ELSE '+ CONVERT(varchar(255),ISNULL(' + objs.name+'.'+cols.name+',0))'

    END

    + ' + '

    FROM dbo.sysobjects AS objs

    INNER JOIN dbo.syscolumns AS cols ON cols.id = objs.id

    INNER JOIN dbo.systypes systypes ON cols.xtype = systypes.xusertype

    WHERE objs.name+'.'+cols.name IN (

    SELECT objs.name + '.'+ cols.name

    FROM sysobjects AS objs

    INNER JOIN syscolumns AS cols

    ON cols.id = objs.id

    WHERE objs.name = '##myTemp'

    )

    -- Create temp table generation code

    SET @createTableSQL = '

    IF object_id(''Tempdb..'+@myTempTableName+''') IS NOT NULL

    BEGIN DROP TABLE '+@myTempTableName+' END

    CREATE TABLE '+@myTempTableName+'

    (iRow int identity(1,1), -- identity column for primary key

    ' + substring(@columns,1,len(@columns)-2) + ')

    --===== Add a Primary Key to maximize performance

    IF OBJECT_ID(''Tempdb..'+@myTempTableName+''') IS NULL

    BEGIN

    ALTER TABLE '+@myTempTableName+'

    ADD CONSTRAINT PK_'+@myTempTableName+'_iRow

    PRIMARY KEY CLUSTERED (iRow)

    WITH FILLFACTOR = 100

    END'

    PRINT @createTableSQL -- Created code will show up in Messages tab of QA, copy/paste into your email/posted question

    PRINT '-- Insert test data into table' + char(13) + '-------------------------------------------------------' + char(13)

    -- Created code will show up in Results tab of QA, copy/paste into your email/posted question

    SET @myTableData = 'SELECT ' + @limit + ' ' + @sq + 'INSERT INTO '+@myTempTableName+' (' + substring(@mySelectList,1,len(@mySelectList)-1) + ') '

    + 'VALUES (' + @sq + substring(@myInsertList,4,len(@myInsertList)) + @sq +')' + @sq

    + ' AS [Copy and Paste these AFTER the text from the Messages tab]'

    + char(13)+ ' FROM ##myTemp'

    --PRINT @myTableData -- uncommenting this will show you what the dynamic SQL generated from the above looks like

    EXEC(@myTableData) -- execute the dynamic SQL we just generated, to create the INSERT statements

    SELECT @myTempTableName + ': SAVE THIS RESULT, ' AS [Example of how the data is changed],

    'the iRow of the ' + @myTempTableName + ' table' AS [.],

    ' corresponds to the row of this table' AS [..]

    SELECT * FROM ##myTemp

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Thanks anyway Steve, but it was made up and mixed up data..anyway I hope someone can direct me of how to do it using a query or using SSIS.

    Thanks

  • jcrawf02 (4/7/2010)


    Readmits within what time frame? We have been looking within 30 days typically, but some reports want any readmits within a rolling 12 months with a 3 month lag for claims runout, etc.

    comparing dates should be done using datediff(), but I think you're saying that you don't know which records come first?

    It's a monthly report, so it's within 30 days.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply