January 19, 2007 at 3:44 pm
What is the best way to take 30,000 records with a delimited Outcome field like the following….
PatientId Outcome
180460 “Observation initiated/increased; Laboratory tests performed; Drug therapy initiated/changed”
And insert multiple recs into another table, one rec for each of the semi-colon delimited values, such as:
PatientId Outcome
180460 Observation initiated/increased
180460 Laboratory tests performed
180460 Drug therapy initiated/changed
January 19, 2007 at 4:16 pm
Can I always split on the semicolon? What is the maximum number of semicolons in a field? Is this something that you want to do once and then not again or is this part of a query that you will be running multiple times and need it have reasonable performance?
Russel Loski, MCSE Business Intelligence, Data Platform
January 19, 2007 at 9:01 pm
Just posted the function here:
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=338231
_____________
Code for TallyGenerator
January 19, 2007 at 10:10 pm
Stef,
These types of problems are incredibly simple if you have the right kind of tools. Fortunately, there is one tool with tremendous utility for exactly this kind of problem. It is called a "Tally" table, consists of a single well-indexed column of sequential numbers, and has many, many uses, including solving problems like this one. If you don't already have one, now is the time to make one... here's how...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates when needed
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
You only need to make it once... it should be a permanent table in your arsenal of SQL tools.
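As a quick illustration of the "30 years of dates" comment in the code above, here is a minimal example of one common use of the Tally table (the start date and the 30-day limit are arbitrary choices for the sketch):

```sql
--===== Example use: turn the Tally numbers into a series of consecutive dates
 SELECT DATEADD(dd, N-1, '2007-01-01') AS TheDate
   FROM dbo.Tally
  WHERE N <= 30
  ORDER BY N
```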
Because I always like to test these things, here's how I made 30,000 rows of test data. Granted, the data is highly duplicated, but it will suffice for this test... you have real data so you won't need to create this table...
--===== Conditionally drop and recreate a test table to simulate Stef's original table
--      (creates 30,000 rows)
     IF OBJECT_ID('OriginalTable') IS NOT NULL
        DROP TABLE OriginalTable

 SELECT TOP 30000
        ParentID = IDENTITY(INT,180000,1),
        Outcome  = CAST('Observation initiated/increased; Laboratory tests performed; Drug therapy initiated/changed' AS VARCHAR(400))
   INTO OriginalTable
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2
ALTER TABLE OriginalTable ADD PRIMARY KEY (ParentID)
Ok... here's the code that does the trick. I do it only as a SELECT but you could very easily change it to an INSERT/SELECT to get the table you asked for...
SELECT ParentID,
LTRIM(SUBSTRING(ot.Outcome,t.N+1,CHARINDEX(';',ot.Outcome,t.N+1)-t.N-1)) AS OutCome
FROM (SELECT ParentID,';'+Outcome+';' AS OutCome FROM OriginalTable) ot,
dbo.Tally t
WHERE t.N < LEN(ot.Outcome)
AND SUBSTRING(ot.Outcome,t.n,1)=';'
The above code should take about 5 seconds to run. Obviously, you need to change "OriginalTable" to the name of your actual table. Write back if you have any questions...
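The INSERT/SELECT variant mentioned above might look like the following minimal sketch. The target table name and its column definitions are assumptions for illustration, not from the original post:

```sql
--===== Hypothetical target table for the split-out rows
 CREATE TABLE dbo.SplitOutcome
        (PatientID INT          NOT NULL,
         Outcome   VARCHAR(400) NOT NULL)

--===== Same Tally-table split as above, written as an INSERT/SELECT
 INSERT INTO dbo.SplitOutcome (PatientID, Outcome)
 SELECT ot.ParentID,
        LTRIM(SUBSTRING(ot.Outcome,t.N+1,CHARINDEX(';',ot.Outcome,t.N+1)-t.N-1))
   FROM (SELECT ParentID,';'+Outcome+';' AS Outcome FROM OriginalTable) ot,
        dbo.Tally t
  WHERE t.N < LEN(ot.Outcome)
    AND SUBSTRING(ot.Outcome,t.N,1) = ';'
```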
--Jeff Moden
Change is inevitable... Change for the better is not.