April 1, 2008 at 6:39 pm
I need help to create a script that would perform text parsing from a table. I have a text (words separated by the symbol '&' (symbol ampersand). I need to extract all words from the text as columns.
-The column name is 'Parameters' ---> [Source Table].[Parameters]
-The Parameter field contains text like this all togther: (Below is 2 sample rows)
Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008
Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674
There is a symbol '&' (symbol ampersand) between each category, all in 1 column.
My goal: To extract just the TEXT after the '=' like this:
Result should look like:
Subject Category Status IPPDM_Count Well_Count
---------------------------------------------------------------
Drill Drill Header NO 94356 84353
Zone Zone Header YES 94356 94356
THANKS!
April 1, 2008 at 6:44 pm
Just checking to be sure... do all the rows have exactly the same number of "fields" and all the rows are always in the same basic format?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 1, 2008 at 6:47 pm
Jeff Moden (4/1/2008)
Just checking to be sure... do all the rows have exactly the same number of "fields" and all the rows are always in the same basic format?
since that looks like an URL fragment, I'll bet no. but, i'll also bet that an xml parsing solution can do this quickly and (relatively) simply.;)
April 1, 2008 at 6:59 pm
Thanks, Antonio...
While I agree that the XML concatenation method is very fast, I've found that the XLM parsing method is a bit slow compared to a Tally Table or cteTally split.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 1, 2008 at 7:15 pm
sounds like a challenge to me! winner buys a round of beer for everyone?
April 1, 2008 at 7:36 pm
lalonacademy (4/1/2008)
I need help to create a script that would perform text parsing from a table.I have a text (words separated by the symbol '&' (symbol ampersand). I need to extract all words from the text as columns.
-The column name is 'Parameters' ---> [Source Table].[Parameters]
-The Parameter field contains text like this all togther: (Below is 2 sample rows)
Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008
Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674
There is a symbol '&' (symbol ampersand) between each category, all in 1 column.
My goal: To extract just the TEXT after the '=' like this:
Result should look like:
Subject Category Status IPPDM_Count Well_Count
---------------------------------------------------------------
Drill Drill Header NO 94356 84353
Zone Zone Header YES 94356 94356
Ok... first step... you have to add one of the most valuable tools available to your data base... a Tally table.
Sure, you can make a CTE to do something similar, but even that won't be as fast as a Tally table. Please visit the
following URL to learn what it is and how to make one...
http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/
Once you have that bad boy in place, then you can start doing some very high speed magic... like splitting your data
into columns... here's some test data (please see the URL in my signature for how we'd like to see example data) and
the code to split it... lemme know how it works for you...
--===== Create a test table to hold the test data. This is NOT part of the solution
CREATE TABLE #SourceTable (Parameters VARCHAR(8000))
INSERT INTO #SourceTable
(Parameters)
SELECT 'Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008' UNION ALL
SELECT 'Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674'
--===== Split the data into an EAV (Entity and Value) cte and reassemble as rows containing the desired columns
;WITH
cteFirstSplit AS
(--==== Splits data on the "&" character
SELECT RowNum = DENSE_RANK() OVER (ORDER BY Parameters),
ColInfo = SUBSTRING('&'+h.Parameters+'&', t.N+1, CHARINDEX('&', '&'+h.Parameters+'&', t.N+1)-t.N-1)
FROM dbo.Tally t
RIGHT OUTER JOIN --Necessary in case Parameters is NULL
#SourceTable h
ON SUBSTRING('&'+h.Parameters, t.N, 1) = '&'
AND t.N < LEN('&'+h.Parameters)
)
,
cteSecondSplit AS
(--==== This splits the split-out ColInfo at the "=" sign
SELECT RowNum,
ColName = LEFT(ColInfo,CHARINDEX('=',ColInfo)-1),
ColVal = SUBSTRING(Colinfo,CHARINDEX('=',ColInfo)+1,400)
FROM cteFirstSplit
)
SELECT MAX(CASE WHEN ColName = 'Subject' THEN ColVal ELSE NULL END) AS Subject,
MAX(CASE WHEN ColName = 'Category' THEN ColVal ELSE NULL END) AS Category,
MAX(CASE WHEN ColName = 'Status' THEN ColVal ELSE NULL END) AS Status,
MAX(CASE WHEN ColName = 'IPPDM_Count' THEN ColVal ELSE NULL END) AS IPPDM_Count,
MAX(CASE WHEN ColName = 'Well_Count' THEN ColVal ELSE NULL END) AS Well_Count
FROM cteSecondSplit
GROUP BY RowNum
ORDER BY RowNum
Of course, you'll need to change the "#SourceTable " table name in the first CTE to the actual name of your table.
Again, please visit the URL in my signature line... help us help you in the future.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 1, 2008 at 7:37 pm
antonio.collins (4/1/2008)
sounds like a challenge to me! winner buys a round of beer for everyone?
Sure, I'm good for a virtual beer and pretzels... Million rows ok with you? 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
April 1, 2008 at 9:56 pm
Ok... here's my entry for the virtual beer...
1 million rows... half are the first row and half are the second row that the Op posted. My goal was to split the first 5 columns out just like the Op asked for and then cross-tab them into rows... like the OP asked for. To keep differences in video cards from coming into play, the final result is inserted into a temp table. Times do not include the time to make the data. Since the OP did not post a PK, I took the liberty of making an IDENTITY column and put the PK on that.
My run times and box description is at the end of the code...
--===== Create and populate a million row test table to hold the test data. This is NOT part of the solution
drop table #SourceTable
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
u.Parameters
INTO #SourceTable
FROM (--===== Same two lines of data repeated
SELECT CAST('Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008' AS VARCHAR(8000)) AS Parameters
UNION ALL
SELECT 'Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674') u,
Master.sys.All_Columns ac1,
Master.sys.All_Columns ac2
ALTER TABLE #SourceTable
ADD PRIMARY KEY CLUSTERED (RowNum)
drop table #Results
SET ANSI_WARNINGS OFF
SET NOCOUNT ON
SET STATISTICS TIME ON
--===== Split the data into an EAV (Entity and Value) cte and reassemble as rows containing the desired columns
;WITH
cteFirstSplit AS
(--==== Splits data on the "&" character
SELECT h.RowNum,
ColNum = (ROW_NUMBER() OVER (ORDER BY Parameters)-1)%5,
ColVal = SUBSTRING(h.Parameters, t.N+1, CHARINDEX('&', h.Parameters, t.N+1)-t.N-1)
FROM dbo.Tally t WITH (NOLOCK)
RIGHT OUTER JOIN --Necessary in case Parameters is NULL
#SourceTable h
ON SUBSTRING(h.Parameters, t.N, 1) = '='
AND t.N < CHARINDEX('Zone_',h.Parameters)
)
SELECT RowNum,
MAX(CASE WHEN ColNum = 0 THEN ColVal ELSE NULL END) AS Subject,
MAX(CASE WHEN ColNum = 1 THEN ColVal ELSE NULL END) AS Category,
MAX(CASE WHEN ColNum = 2 THEN ColVal ELSE NULL END) AS Status,
MAX(CASE WHEN ColNum = 3 THEN ColVal ELSE NULL END) AS IPPDM_Count,
MAX(CASE WHEN ColNum = 4 THEN ColVal ELSE NULL END) AS Well_Count
INTO #Results
FROM cteFirstSplit
GROUP BY RowNum
SET STATISTICS TIME OFF
/**************************************************************
6 year old single P5 1.8 GHz, IDI Hard Drive, 1GB ram
SQL Server 2005 Developer's Edition sp2 with no cume updates.
SQL Server Execution Times:
1,000 rows: CPU time = 172 ms, elapsed time = 225 ms.
10,000 rows: CPU time = 1531 ms, elapsed time = 1619 ms.
100,000 rows: CPU time = 16875 ms, elapsed time = 21959 ms.
1,000,000 rows: CPU time = 185921 ms, elapsed time = 268232 ms.
***************************************************************/
Looking forward to your entry 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
April 1, 2008 at 10:00 pm
lalonacademy,
I'm not sure how you got the data into the table, but consider this... BULK INSERT with a format file to handle your strange delimiters will load 5.1 MILLION 20 column rows in 60 seconds flat... That means that 1 million rows will take about 12 seconds to import instead of going through all this in-column split stuff.
You might want to consider changing your load procedure. :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
April 1, 2008 at 10:29 pm
Wow! I don't know where to start. Thanks for taking the time. Actually this is one of the field in a table. It's from the [reportserver].[ExecutionLog table].[Parameters]. The PK would be the key in that table. The parameters column contains the text in that format. Basically, I'm trying to query the [ExecutionLog] table and split the TEXT in the [Parameter] colum into several fields. This isn't from an URL. There will be thousands of rows in that table, some rows might have NULL values. I'm only looking to EXTRACT first 5 values. See the table in the REPORTSERVER database.
Result should look like:
Subject Category Status IPPDM_Count Well_Count
---------------------------------------------------------------
Drill Drill Header NO 94356 84353
Zone Zone Header YES 94356 94356
Cheers!!
April 1, 2008 at 10:33 pm
Where you want to start is the last code I posted... it will do the job.
The whole thing is that I wonder how the data got into the parameters column like it is... did it come from a file?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 1, 2008 at 10:38 pm
Jeff,
I'm NOT loading this into a table. I'm reporting on the executionLog table in reportserver database. I'll be quering the table to CREATE a DATASET in that format. I'm only parsing the [parameter] field. I'll be using the PK from the [ExecutionLog] table. Thanks!
Query Result should look like:
Subject Category Status IPPDM_Count Well_Count
---------------------------------------------------------------
Drill Drill Header NO 94356 84353
Zone Zone Header YES 94356 94356
April 1, 2008 at 10:43 pm
Would you please read what I said more carefully? The code I wrote does your split the way you want it... the load stuff was just a suggestion to do things better.
And read the comments in the code... the first part is just to build test data... that's not part of the solution...
--Jeff Moden
Change is inevitable... Change for the better is not.
April 1, 2008 at 10:44 pm
Jeff,
The Parameter is coming from the .RDL file SSRS 2005. All the parameters from the .rdl file logged into this field. Thanks I'll work on it in the morning.
April 1, 2008 at 10:46 pm
Thanks...
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 28 total)
You must be logged in to reply to this topic. Login to reply