January 2, 2015 at 2:57 pm
I need some help if someone is willing. I'm not a professional DBA, but with Google as your friend anything is possible. My issue is that I have a table with, say, 300 rows (it could grow to 10,000 later on), and my deliverable is to take one column of string data and parse it into 6 other columns on the same row of the same table. I have worked out the custom split I need using one variable and Jeff's Tally OH! code. My problem is that I haven't figured out the best way (or, for that matter, any way) to feed the splitter one field at a time from the current row and then parse and insert the 6 other columns on that same row. To try the code, first create #temp2 using the commented-out definition at the bottom, then execute the rest.
Current custom parse code:
Declare @pDelimiter char(1),
@pString varchar(8000)
Set @pDelimiter='_'
--Set @pString=(SELECT [TagName] FROM [dbo].[TagReporting])
Set @pString='CS12_PrbTmp_L_DegF_1M'
;
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT
RN=ROW_NUMBER() OVER(ORDER BY N1)
,N1
,L1
--Original columns for tally table. Shows actual split
--ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
--Item = SUBSTRING(@pString, l.N1, l.L1)
INTO #temp1
FROM cteLen l
Declare
@N1SN int
,@L1SN int
,@N1PM int
,@L1PM int
,@N1PQ int
,@L1PQ int
,@N1QL int
,@L1QL int
,@N1UN int
,@L1UN int
,@N1AP int
,@L1AP int
,@RN int
SELECT @RN = MAX(rn) FROM #temp1
SELECT
@N1SN=N1
,@L1SN=L1
FROM #temp1
WHERE rn=1
SELECT
@N1PM=N1
,@L1PM=L1
FROM #temp1
WHERE rn=2
SELECT
@N1UN=N1
,@L1UN=L1
FROM #temp1
WHERE rn=@RN-1
SELECT
@N1AP=N1
,@L1AP=L1
FROM #temp1
WHERE rn=@RN
SELECT
@N1PQ=(SELECT N1 FROM #temp1 WHERE rn=2)
,@L1PQ=@N1UN-(SELECT N1 FROM #temp1 WHERE rn=2)-1
SELECT
@N1QL=(SELECT N1 FROM #temp1 WHERE rn=3)
,@L1QL=@N1UN-(SELECT N1 FROM #temp1 WHERE rn=3)-1
SET @N1QL=
CASE
WHEN @RN > 4 AND @L1QL < 7 THEN
@N1QL
ELSE NULL
END
SET @L1QL=
CASE
WHEN @RN > 4 AND @L1QL < 7 THEN
@L1QL
ELSE NULL
END
INSERT INTO #temp2
([STD_SiteName]
,[STD_Parameter]
,[STD_PrmQual]
,[STD_Qualifier]
,[STD_Units]
,[STD_AvgPer]
)
VALUES
( substring(@pString, @N1SN, @L1SN)
,substring(@pString, @N1PM, @L1PM)
,substring(@pString, @N1PQ, @L1PQ)
,substring(@pString, @N1QL, @L1QL)
,substring(@pString, @N1UN, @L1UN)
,substring(@pString, @N1AP, @L1AP)
)
select * from #temp2
drop table #temp1
--drop table #temp2
--create table #temp2
-- (
--[STD_SiteName] [varchar](20) NULL,
--[STD_Parameter] [varchar](16) NULL,
--[STD_Qualifier] [varchar](6) NULL,
--[STD_PrmQual] [varchar](22) NULL,
--[STD_Units] [varchar](12) NULL,
--[STD_AvgPer] [varchar](6) NULL
--)
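For reference, here is what lands in #temp1: the 1-based start (N1) and length (L1) of each delimited element. Below is a minimal Python sketch of the same cteStart/cteLen logic for anyone tracing the splitter outside T-SQL (the helper name start_len_pairs is mine, not part of the original code):

```python
def start_len_pairs(s, delim="_"):
    """Mirror cteStart/cteLen: for each element, return its 1-based
    start position (N1) and length (L1), as stored in #temp1."""
    # Element starts: position 1, plus the character after each delimiter
    starts = [0] + [i + 1 for i, ch in enumerate(s) if ch == delim]
    pairs = []
    for n1 in starts:
        nxt = s.find(delim, n1)                    # next delimiter, or -1 at the end
        length = (nxt if nxt != -1 else len(s)) - n1
        pairs.append((n1 + 1, length))             # +1 to match T-SQL's 1-based SUBSTRING
    return pairs

print(start_len_pairs("CS12_PrbTmp_L_DegF_1M"))
# [(1, 4), (6, 6), (13, 1), (15, 4), (20, 2)]
```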
The sample I am using to develop the custom split is already in the code: Set @pString='CS12_PrbTmp_L_DegF_1M'.
Results:
STD_SiteName STD_Parameter STD_Qualifier STD_PrmQual STD_Units STD_AvgPer
------------ ------------- ------------- ----------- --------- ----------
CS12         PrbTmp        L             PrbTmp_L    DegF      1M
I need this custom split because of the non-standard string construction.
The code for column STD_PrmQual is there to handle extra delimiters. Another database table with additional data will eventually replace the #temp2 table, and I do have a PK/ID column in the final table.
Thanks for any help.
Rich
January 2, 2015 at 3:25 pm
If I understood correctly, this should help you:
CREATE TABLE #Source(
SomeString varchar(8000))
CREATE TABLE #Test(
[STD_SiteName] [varchar](20) NULL,
[STD_Parameter] [varchar](16) NULL,
[STD_Qualifier] [varchar](6) NULL,
[STD_PQ] AS (([STD_Parameter]+'_')+[STD_Qualifier]) ,
[STD_PrmQual] [varchar](22) NULL,
[STD_Units] [varchar](12) NULL,
[STD_AvgPer] [varchar](6) NULL)
INSERT INTO #Source (SomeString) VALUES( 'CS12_PrbTmp_L_DegF_1M')
INSERT INTO #Test(
[STD_SiteName],
[STD_Parameter],
[STD_Qualifier],
[STD_PrmQual],
[STD_Units],
[STD_AvgPer])
SELECT MAX( CASE WHEN ItemNumber = 1 THEN Item END) AS STD_SiteName,
MAX( CASE WHEN ItemNumber = 2 THEN Item END) AS [STD_Parameter],
MAX( CASE WHEN ItemNumber = 3 THEN Item END) AS [STD_Qualifier],
MAX( CASE WHEN ItemNumber = 2 THEN Item END) + '_'
+ MAX( CASE WHEN ItemNumber = 3 THEN Item END) AS [STD_PrmQual],
MAX( CASE WHEN ItemNumber = 4 THEN Item END) AS [STD_Units],
MAX( CASE WHEN ItemNumber = 5 THEN Item END) AS [STD_AvgPer]
FROM #Source t
CROSS APPLY dbo.DelimitedSplit8K( SomeString, '_') s
GROUP BY SomeString
SELECT *
FROM #Test
GO
DROP TABLE #Source
DROP TABLE #Test
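For anyone reading along, the MAX(CASE WHEN ItemNumber = N ...) pattern above is a cross tab: the splitter returns one (ItemNumber, Item) row per element, and the aggregate folds those rows back into one row of columns. A rough Python sketch of that pivot (split_to_columns is a hypothetical stand-in for DelimitedSplit8K plus the cross tab, not part of the posted code):

```python
def split_to_columns(s, delim="_"):
    """Pivot the (ItemNumber, Item) pairs of a split string into one
    row of named columns, like the MAX(CASE ...) cross tab above."""
    # DelimitedSplit8K numbers items from 1; enumerate(..., start=1) does the same
    items = dict(enumerate(s.split(delim), start=1))
    row = {
        "STD_SiteName":  items.get(1),
        "STD_Parameter": items.get(2),
        "STD_Qualifier": items.get(3),
        "STD_Units":     items.get(4),
        "STD_AvgPer":    items.get(5),
    }
    # STD_PrmQual = Parameter + '_' + Qualifier, as in the INSERT above
    if row["STD_Parameter"] and row["STD_Qualifier"]:
        row["STD_PrmQual"] = row["STD_Parameter"] + delim + row["STD_Qualifier"]
    else:
        row["STD_PrmQual"] = None
    return row
```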
January 2, 2015 at 3:34 pm
Since the STD_PrmQual column isn't atomic (it's a concatenation of two other columns), I would drop it from the table definition. You can add it back as an expression in a stored procedure or view.
January 3, 2015 at 7:34 am
I strongly recommend following Luis's suggestion; it's a simple and well-tested approach.
😎
For completeness, below is a sample solution that replicates the logic in a query instead of using the splitter function.
USE tempdb;
SET NOCOUNT ON;
GO
/* Drop the sample data table if it already exists in the database */
IF OBJECT_ID(N'dbo.TBL_SAMPLE_STRING') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_STRING;
/* Create the table for the sample data */
CREATE TABLE dbo.TBL_SAMPLE_STRING
(
SS_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
,SS_STRING VARCHAR(50) NOT NULL
);
/* Insert a few rows */
INSERT INTO dbo.TBL_SAMPLE_STRING (SS_STRING)
VALUES ('CS12_PrBTmp_L_DegF_1M')
,('CS13_PrCTmp_M_DegC_2M')
,('CS14_PrDTmp_N_DegK_3M')
;
/* Variable for delimiter character/symbol */
DECLARE @DELIM CHAR(1) = '_';
/* A CTE to seed the inlined tally table. In this case the tally table is
implemented as a sub-query as the number of characters in the raw string
controls the number of rows generated in the tally table for each row
in the set */
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
/* Using a set based (tally table driven) iteration to locate all delimiter
character instances in the set
*/
,DELIM_SET AS
(
SELECT
ST.SS_ID
,ROW_NUMBER() OVER
(
PARTITION BY ST.SS_ID
ORDER BY NM.N
) AS FLD_RID
,ST.SS_STRING
,NM.N
FROM dbo.TBL_SAMPLE_STRING ST
/* Inline tally table implemented as a sub-query */
CROSS APPLY
(
SELECT TOP(LEN(ST.SS_STRING)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM T T1,T T2,T T4,T T5
) AS NM
WHERE SUBSTRING(ST.SS_STRING,NM.N,1) = @DELIM
)
/* String splitting CTE, uses a combination of CASE and LAG window function to
split/substring the character string.
*/
,STRING_PARTS AS
(
SELECT
DS.SS_ID
,DS.SS_STRING
,CASE WHEN FLD_RID = 1 THEN SUBSTRING(DS.SS_STRING,1,DS.N - 1) END AS FLD_01
,CASE WHEN FLD_RID = 2 THEN SUBSTRING(DS.SS_STRING,LAG(DS.N,1) OVER (PARTITION BY DS.SS_ID ORDER BY DS.N) + 1 ,(DS.N - LAG(DS.N,1) OVER (PARTITION BY DS.SS_ID ORDER BY DS.N) - 1)) END AS FLD_02
,CASE WHEN FLD_RID = 3 THEN SUBSTRING(DS.SS_STRING,LAG(DS.N,1) OVER (PARTITION BY DS.SS_ID ORDER BY DS.N) + 1 ,(DS.N - LAG(DS.N,1) OVER (PARTITION BY DS.SS_ID ORDER BY DS.N) - 1)) END AS FLD_03
,CASE WHEN FLD_RID = 4 THEN SUBSTRING(DS.SS_STRING,LAG(DS.N,1) OVER (PARTITION BY DS.SS_ID ORDER BY DS.N) + 1 ,(DS.N - LAG(DS.N,1) OVER (PARTITION BY DS.SS_ID ORDER BY DS.N) - 1)) END AS FLD_04
,CASE WHEN FLD_RID = 4 THEN SUBSTRING(DS.SS_STRING,(DS.N + 1),LEN(DS.SS_STRING)) END AS FLD_05
FROM DELIM_SET DS
)
/* Simple aggregation to compress the values into rows, the group by aggregation
is used to eliminate NULLs from the set
*/
,FINAL_SET AS
(
SELECT
SP.SS_ID
,SP.SS_STRING
,MAX(SP.FLD_01) AS STD_SiteName
,MAX(SP.FLD_02) AS STD_Parameter
,MAX(SP.FLD_03) AS STD_Qualifier
,CONCAT(MAX(SP.FLD_02),@DELIM,MAX(SP.FLD_03)) AS STD_PrmQual
,MAX(SP.FLD_04) AS STD_Units
,MAX(SP.FLD_05) AS STD_AvgPer
FROM STRING_PARTS SP
GROUP BY SP.SS_ID
,SP.SS_STRING
)
/* Final output */
SELECT
FS.SS_ID
,FS.SS_STRING
,FS.STD_SiteName
,FS.STD_Parameter
,FS.STD_Qualifier
,FS.STD_PrmQual
,FS.STD_Units
,FS.STD_AvgPer
FROM FINAL_SET FS;
Results
SS_ID SS_STRING STD_SiteName STD_Parameter STD_Qualifier STD_PrmQual STD_Units STD_AvgPer
----------- ---------------------- ------------- -------------- -------------- ------------ ---------- -----------
1 CS12_PrBTmp_L_DegF_1M CS12 PrBTmp L PrBTmp_L DegF 1M
2 CS13_PrCTmp_M_DegC_2M CS13 PrCTmp M PrCTmp_M DegC 2M
3 CS14_PrDTmp_N_DegK_3M CS14 PrDTmp N PrDTmp_N DegK 3M
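The DELIM_SET/LAG mechanics can be sketched outside T-SQL as well: find every delimiter position, then use the previous position (what LAG(N) supplies in the query above) to cut each field. A minimal Python sketch (split_by_delimiter_positions is an illustrative name, not from the code above):

```python
def split_by_delimiter_positions(s, delim="_"):
    """Split a string by first locating every delimiter position,
    then cutting between the previous position and the current one."""
    # DELIM_SET: the position of every delimiter in the string
    positions = [i for i, ch in enumerate(s) if ch == delim]
    fields, prev = [], -1                  # prev plays the role of LAG(N)
    for pos in positions:
        fields.append(s[prev + 1:pos])     # text between two delimiters
        prev = pos
    fields.append(s[prev + 1:])            # FLD_05: text after the last delimiter
    return fields

print(split_by_delimiter_positions("CS12_PrBTmp_L_DegF_1M"))
# ['CS12', 'PrBTmp', 'L', 'DegF', '1M']
```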
January 3, 2015 at 7:37 am
Thank you Luis and Mark for replying. Your advice was great; however, I have an unusual issue, so I thought I would explain it so you can see it better. I have provided an example to help.
We are replicating 40+ servers using the central subscriber model for reporting purposes. These servers run an app that uses a very non-normalized schema, and I think at least a dozen different people were involved with the configuration of the server app. It is legacy; costs, along with a lack of database experience, keep us from normalizing. In addition, it provides near-real-time monitoring of the process. We have set up individual DBs for each server at central and will pull data from there into a central DB for reporting. Our biggest issue is normalizing the data to make reporting easier for the business units involved. This is where I'm at now.
We have the original replicated table, called Tag, and we have set up a 2nd table called Tag_Reporting, which is synced to the Tag table and contains extra columns for the standards. We are using the 2nd table for standardizing tag names. Our plan, after splitting the tag names and putting the pieces into the other columns, is to standardize by referencing a standards table and replacing any split value that is not standard. That's next; I need to do this first.
My example shows the column names involved, with the averaging period added to the tag name by the app. As you can see, my issue is that even though line 1 and line 2 have the same tag name, they were configured differently. Design-wise, if there had been a standard to follow during the original install of the app, maybe I wouldn't have this issue now; with a dozen different people involved, though, I doubt it. There should be 5 sections to a tag name, as follows:
[MonitoringSiteName] [varchar](20) NOT NULL,
[Parameter] [varchar](16) NOT NULL,
[Qualifier] [varchar](6) NULL,
[Units] [varchar](12) NOT NULL,
An averaging period added by the app.
Best case, I will have 4 sections, since Qualifier can be null, or 5 if it is used (the app adds the delimiter "_"). Worst case would be 24 sections, based on maximum use of the delimiter with at least one character between delimiters. I don't have any like that, but the possibility exists. I don't have a way to check, but from reviewing a few of the tables I think 6 or 7 is where I stand.
I had thought about the same design Luis proposed; however, with a 7-section tag (extra delimiters added during configuration), how could I tell what each section was? After studying the construct of the tag I had an idea. I know that the first section of the tag, "MonitoringSiteName", is a maximum of 20 characters, and that the Parameter always starts after the first delimiter. The last section is always the averaging period, whose length doesn't matter. From the design I knew that the next-to-last section is the Units, at up to 12 characters. What is left are the characters making up the Parameter and Qualifier sections. I decided to take the 2nd delimited section as the Parameter; the 3rd-from-last section as the Qualifier if it was 6 characters or less; and I added a column, STD_PrmQual, holding all characters from the start of the 2nd delimited section up to one character before the next-to-last section. I felt this would take care of more than 95% of the tags for standardization, and if we had problems the Business Unit could help us during UAT.
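To restate that rule as runnable logic, here is a minimal Python sketch of the positional heuristic, following the posted T-SQL (section 1 = site, section 2 = parameter, last = averaging period, next-to-last = units, qualifier = everything from section 3 up to the units if under 7 characters). parse_tag is a hypothetical helper name, and the sketch assumes at least four sections per tag:

```python
def parse_tag(tag, delim="_"):
    """Apply the positional tag-name rule: fixed meanings for the first,
    second, last, and next-to-last sections; the middle is the qualifier."""
    parts = tag.split(delim)
    n = len(parts)
    # Candidate qualifier: section 3 through the section before the units
    qual = delim.join(parts[2:-2])
    return {
        "STD_SiteName":  parts[0],
        "STD_Parameter": parts[1],
        # Only trust the qualifier with 5+ sections and 6 chars or less
        "STD_Qualifier": qual if n > 4 and len(qual) < 7 else None,
        # Parameter through qualifier, re-joined: covers extra delimiters
        "STD_PrmQual":   delim.join(parts[1:-2]),
        "STD_Units":     parts[-2],
        "STD_AvgPer":    parts[-1],
    }

print(parse_tag("CS12_PrbTmp_L_DegF_1M")["STD_PrmQual"])  # PrbTmp_L
```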
Long story short:
How do I handle excess delimiters (more sections than the design)? I think I have a handle on that, but I'm not certain.
How do I scan the Tag_Reporting table, select TagName, split it, and put the sections back into the columns on the same row? A cursor, a loop?
I have since found out that the maximum row count for the worst case is ~800 rows. I'm thinking a query, but perhaps a function or another method?
Thanks for taking the time to help again. I know this was long but sometimes more information helps.
Rich
January 3, 2015 at 8:55 am
Thank you, Eirikur, for your reply. I am currently looking at both methods. You have given me insight into techniques I have not had a chance to study yet, and I thank you for that; real-world scenarios like this help me understand, and they make me study.
Rich