July 6, 2016 at 1:46 pm
The Dixie Flatline (7/6/2016)
Hey JLS, I think I have a solution, but for some strange reason the website won't let me post or upload the code if I spell out TRY_CONVERT more than once.Do a search_replace on the code below and let me know.
Key points:
1. Expanded rules for what can be considered a date string.
2. Used PATINDEX to determine if admissions came before discharges in CTE ([AdmissionFirst])
3. Used ROW_NUMBER() in CTE2 to make sure that DateX was always the last row for a given RowID and that the order of integers in [textstring] was preserved.
4. Used CASE statements in the final query to decide which integers were used for admissions and which for discharges, based on the value of the AdmissionFirst flag
---------------------------------------------------------------------------------------------------------------------------------------------------------
-- Search/replace "try_C(" to the correct function
---------------------------------------------------------------------------------------------------------------------------------------------------------
with cte as (select rowID, itemnumber,convert(varchar(50),item) as item
, case when PATINDEX('%Admission%Discharge%',TextString) > 0 then 'Y' else '' end as AdmissionFirst
, case when item like '%/%/%' or (try_C(date,Item) is not null and len(item) > 5) then item end as DateX
from #input i
cross apply DelimitedSplit8K(TextString, ' ')
)
,cte2 as (select *, ROW_NUMBER() over(Partition by RowID order by rowID,datex, ItemNumber) as RowNum
from cte
where item > ''
and (try_C(int,Item) is not null
or Item like '%/%/%'
or try_C(date,Item) is not null ))
select RowID
,max(case when AdmissionFirst = 'Y' and RowNum = 1 then Item
when AdmissionFirst <> 'Y' and RowNum = 2 then Item
end) as Admissions
,max(case when AdmissionFirst = 'Y' and RowNum = 2 then Item
when AdmissionFirst <> 'Y' and RowNum = 1 then Item
end) as Discharges
,max(DateX) as DateX
from cte2
group by RowID
Hi Dixie....
is this how you wanted to post your code?
with cte as (select rowID, itemnumber,convert(varchar(50),item) as item
, case when PATINDEX('%Admission%Discharge%',TextString) > 0 then 'Y' else '' end as AdmissionFirst
, case when item like '%/%/%' or (TRY_CONVERT(date,Item) is not null and len(item) > 5) then item end as DateX
from #input i
cross apply DelimitedSplit8K(TextString, ' ')
)
,cte2 as (select *, ROW_NUMBER() over(Partition by RowID order by rowID,datex, ItemNumber) as RowNum
from cte
where item > ''
and (TRY_CONVERT(int,Item) is not null
or Item like '%/%/%'
or TRY_CONVERT(date,Item) is not null ))
select RowID
,max(case when AdmissionFirst = 'Y' and RowNum = 1 then Item
when AdmissionFirst <> 'Y' and RowNum = 2 then Item
end) as Admissions
,max(case when AdmissionFirst = 'Y' and RowNum = 2 then Item
when AdmissionFirst <> 'Y' and RowNum = 1 then Item
end) as Discharges
,max(DateX) as DateX
from cte2
group by RowID
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 6, 2016 at 2:10 pm
Yes. I'm not sure why it wasn't letting me post that.
More importantly: Does it work for you?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 6, 2016 at 2:24 pm
The Dixie Flatline (7/6/2016)
Yes. I'm not sure why it wasn't letting me post that.More importantly: Does it work for you?
yeah.. I think it works...sorry but havent had time to fully investigate.
hopefully will hav esome more time tomorrow.
fancy buidling a test harness of 1 miilions rows to see perfomeance?
<grin>
best JLS
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 6, 2016 at 3:16 pm
Already did that. I created a table with an identity column and put your text strings into it, then just kept ramping it up with the following loop.
while (select count(*) from #input) < 1000000
begin
insert into #input
select top (10000) TextString
from #input;
end
The extraction query itself takes about 15 seconds for just over 1 million rows on my box. Execution plan says 30% of that is just writing the data out to a #results table.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 6, 2016 at 3:19 pm
The Dixie Flatline (7/6/2016)
Already did that. The extraction query itself takes about 15 seconds on my box. Execution plan says 30% of that is just writing the data out to a #results table.
while (select count(*) from #input) < 1000000
begin
insert into #input
select top (10000) TextString
from #input;
end
edit ...doh
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 6, 2016 at 5:40 pm
Couldn't help myself but have a little play with this myself.
Basically this determines the order of the words admission and discharge, grabs and remove the date (if possible) from the string, grabs the first and second number from the string and returns them in the the appropriate column determined by the word order.
This is all done in a bunch of nested queries.
Performance should be really good. Will fail if there isn't two numbers. If it can't determine a date it will return an empty string ... I'm looking at you row 14 🙂
WITH TestValues AS(
SELECT *
FROM (values
( 1, 'There were 5 admissions and 3 discharge since reporting 22/04/2016'),
( 2, 'today here were 6 discharges and 3 admission since reporting 21/04/2016'),
( 3, 'since reporting 30/04/2016 the discharges are 8 and the admissions are 10' ),
( 4, ' the discharges are 20 and the admissions are 30 on 10/05/2016 '),
( 5, ' the admission was 100 on 15/05/2016 and the discharge was 30 '),
( 6, ' on 15/08/2016 5 admissions and discharge 10' ),
( 7, ' 100 admission 200 discharges 25/12/2018' ),
( 8, ' discharge 500 on 01/01/2020 and 100 admission '),
( 9, ' discharge 1 2 admission 01/01/2020' ),
( 10, ' 20180101 100 admission 200 discharges' ),
( 11, '20160101 admission 1 2 discharges '),
( 12, 'here are the results up to 12/12/12 Admission 300 400 Discharge ' ) ,
( 13, ' Discharges 9000 20/May/2015 8000 Admissions ' ) ,
( 14, ' Discharges 99999 88888 Admissions 161216' ),
( 15, ' Discharges I think were 99 and maybe 88 possibly less Admissions on 2016-05-15' )
)
dt (rowID,TextString)
)
SELECT ReportDate
,CASE NumOrder WHEN -1 THEN FirstNum ELSE SecondNum END Admissions
,CASE NumOrder WHEN -1 THEN SecondNum ELSE FirstNum END Discharges
FROM (
SELECT NumOrder, ReportDate, FirstNum
,CAST(LEFT(STUFF(LeftOver,1,PATINDEX('%[0-9]%',LeftOver)-1,''),CHARINDEX(' ',STUFF(LeftOver,1,PATINDEX('%[0-9]%',LeftOver)-1,''))-1) AS INT) SecondNum
FROM (
SELECT NumOrder, ReportDate
,CAST(LEFT(STUFF(DateGone,1,PATINDEX('%[0-9]%',DateGone)-1,''),CHARINDEX(' ',STUFF(DateGone,1,PATINDEX('%[0-9]%',DateGone)-1,''))-1) AS INT) FirstNum
,STUFF(STUFF(DateGone,1,PATINDEX('%[0-9]%',DateGone)-1,''),1,CHARINDEX(' ',STUFF(DateGone,1,PATINDEX('%[0-9]%',DateGone)-1,''))-1,'') LeftOver
FROM (
SELECT SIGN( PATINDEX( '%admission%',TextString ) - PATINDEX( '%discharge%',TextString )) NumOrder
,CASE
WHEN PATINDEX('%[0-9][0-9][/-]%[/-]%[0-9][0-9]%',TextString) > 0 THEN
SUBSTRING(TextString,PATINDEX('%[0-9][0-9][/-]%[/-]%[0-9][0-9]%',TextString),10)
ELSE
SUBSTRING(TextString,PATINDEX('%[1-2][8,9,0,1][0-9][0-9][0-1][0-9][0-3][0-9]%',TextString),8 * SIGN(PATINDEX('%[1-2][8,9,0,1][0-9][0-9][0-1][0-9][0-3][0-9]%',TextString)))
END ReportDate
,CASE
WHEN PATINDEX('%[0-9][0-9][/-]%[/-]%[0-9][0-9]%',TextString) > 0 THEN
REPLACE(TextString,SUBSTRING(TextString,PATINDEX('%[0-9][0-9][/-]%[/-]%[0-9][0-9]%',TextString),10 ),'')+' '
ELSE
REPLACE(TextString,SUBSTRING(TextString,PATINDEX('%[1-2][8,9,0,1][0-9][0-9][0-1][0-9][0-3][0-9]%',TextString),8 * SIGN(PATINDEX('%[1-2][8,9,0,1][0-9][0-9][0-1][0-9][0-3][0-9]%',TextString))),'')+' '
END
DateGone
FROM TestValues
) A
) B
) C
July 6, 2016 at 8:01 pm
My customized solution would be to hunt down the idiot(s) that are creating the mess in the files and take them to a nice, sit down, pork chop dinner. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
July 7, 2016 at 7:42 am
You always cut to the heart of the pork chop... errrrr matter..... Jeff 😀
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
Viewing 8 posts - 31 through 37 (of 37 total)
You must be logged in to reply to this topic. Login to reply