December 27, 2016 at 12:35 pm
Hello,
I have data from two different sources - Source1 and Source2. We have a .dtsx package that loads data from Source1 into Source2 (a temp table), and the last step of the same package executes a stored procedure that updates the columns:
UPDATE Emp
SET Emp.emptyp= temp.studcode
FROM dbo.Employee Emp
INNER JOIN student temp
ON Emp.ID = temp.ID
Now I have to add logic to this stored procedure.
After the update statement, I need to add another step: a SELECT query that returns all rows from Employee whose ID does not start with 'ASJ' (basically it excludes the rows that come from SAP). Then I have to open a cursor over that result set and, within the cursor loop, check each row and parse it to get the individual columns and update them.
10001:4|xxx (the format to be parsed)
Split it on the pipe to get 2 values, then split the first value on the colon to get the emp number and the type number - we use these to do the joins.
The same columns (emp number and type number) exist in the temp table too.
xxx is the description field; there's no need to check it.
December 27, 2016 at 12:38 pm
mcfarlandparkway (12/27/2016)
Hello, I have data from two different sources - Source1 and Source2. We have a .dtsx package that loads data from Source1 into Source2 (a temp table), and the last step of the same package executes a stored procedure that updates the columns:
UPDATE Emp
SET Emp.emptyp= temp.studcode
FROM dbo.Employee Emp
INNER JOIN student temp
ON Emp.ID = temp.ID
Now I have to add logic to this stored procedure.
After the update statement, I need to add another step: a SELECT query that returns all rows from Employee whose ID does not start with 'ASJ' (basically it excludes the rows that come from SAP). Then I have to open a cursor over that result set and, within the cursor loop, check each row and parse it to get the individual columns and update them.
10001:4|xxx (the format to be parsed)
Split it on the pipe to get 2 values, then split the first value on the colon to get the emp number and the type number - we use these to do the joins.
The same columns (emp number and type number) exist in the temp table too.
xxx is the description field; there's no need to check it.
I can't see a question here ...
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
December 27, 2016 at 12:51 pm
In the stored procedure, I need to implement loop-based logic after the update statement.
Now I have to add logic to this stored procedure.
After the update statement, I need to add another step: a SELECT query that returns all rows from Employee whose ID does not start with 'ASJ' (basically it excludes the rows that come from SAP). Then I have to open a cursor over that result set and, within the cursor loop, check each row and parse it to get the individual columns and update them.
Parsing logic for the fields... from a coding perspective, I need to parse on the colon and the pipe:
10001:4|xxx (the format to be parsed)
Split it on the pipe to get 2 values, then split the first value on the colon to get the emp number and the type number - we use these to do the joins.
The same columns (emp number and type number) exist in the temp table too.
xxx is the description field; there's no need to check it.
How do I implement a loop-based cursor after the update statement?
December 27, 2016 at 12:59 pm
mcfarlandparkway (12/27/2016)
In the stored procedure, I need to implement loop-based logic after the update statement. Now I have to add logic to this stored procedure.
After the update statement, I need to add another step: a SELECT query that returns all rows from Employee whose ID does not start with 'ASJ' (basically it excludes the rows that come from SAP). Then I have to open a cursor over that result set and, within the cursor loop, check each row and parse it to get the individual columns and update them.
Parsing logic for the fields... from a coding perspective, I need to parse on the colon and the pipe:
10001:4|xxx (the format to be parsed)
Split it on the pipe to get 2 values, then split the first value on the colon to get the emp number and the type number - we use these to do the joins.
The same columns (emp number and type number) exist in the temp table too.
xxx is the description field; there's no need to check it.
How do I implement a loop-based cursor after the update statement?
What makes you think you need a cursor? Why not just split the data as a set-based operation?
Based on the information you've provided, it's impossible to give you a code-based solution. If you want that, please take the time to write out sample DDL, INSERT statements to populate the tables whose DDL you have provided and details of the results which you expect, based on the sample data provided.
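For example, a minimal version of what that looks like (purely illustrative; the table and column names below are guessed from your description, so adjust them to match your actual schema):

```sql
-- Hypothetical DDL and sample data, guessed from the description above.
CREATE TABLE dbo.Employee
(
    ID     VARCHAR(20) NOT NULL PRIMARY KEY
  , emptyp VARCHAR(10) NULL
);

CREATE TABLE dbo.student
(
    ID       VARCHAR(20) NOT NULL PRIMARY KEY
  , studcode VARCHAR(10) NULL
);

INSERT dbo.Employee (ID, emptyp) VALUES ('10001', NULL), ('ASJ001', NULL);
INSERT dbo.student  (ID, studcode) VALUES ('10001', '4');

-- Plus a statement of the expected result, e.g. "after the update,
-- Employee '10001' should have emptyp = '4'".
```

With something like that to work from, people here can paste it into SSMS and test their answers before posting them.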
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
December 27, 2016 at 1:18 pm
A set-based approach? How would I do that?
How do I parse (split) this: 9000430944:7|Information
9000430944 - emp number
7 - stud number
Information - description
How do I split on the pipe, and how do I split on the colon?
December 27, 2016 at 5:34 pm
mcfarlandparkway (12/27/2016)
A set-based approach? How would I do that? How do I parse (split) this: 9000430944:7|Information
9000430944 - emp number
7 - stud number
Information - description
How do I split on the pipe, and how do I split on the colon?
Totally possible. Here is an example; it relies on you having created the DelimitedSplit8K function somewhere in one of your databases.
IF OBJECT_ID('tempdb..#tmp', 'U') IS NOT NULL
    DROP TABLE #tmp;

CREATE TABLE #tmp
(
    TempId INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED
  , Jumble VARCHAR(50)
);

INSERT #tmp (Jumble)
VALUES
    ('10001:4|xxx')
  , ('10002:5|yyy')
  , ('100933:222|asdml');

WITH UnJumbled AS
(
    SELECT
        t.TempId
      , x.ItemNumber
      , x.Item
    FROM #tmp t
    CROSS APPLY dbo.DelimitedSplit8K(REPLACE(t.Jumble, ':', '|'), '|') x
)
SELECT
    UnJumbled.TempId
  , EmpNo     = MAX(IIF(UnJumbled.ItemNumber = 1, UnJumbled.Item, NULL))
  , StudentNo = MAX(IIF(UnJumbled.ItemNumber = 2, UnJumbled.Item, NULL))
  , Dsc       = MAX(IIF(UnJumbled.ItemNumber = 3, UnJumbled.Item, NULL))
FROM UnJumbled
GROUP BY UnJumbled.TempId;
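And if the end goal is updating the Employee table from the parsed values, the same split can feed a set-based UPDATE directly, with no cursor at all. A sketch, reusing the names from your earlier posts (Emp.emptyp, Emp.ID, the 'ASJ' exclusion) - treat the column mapping as an assumption to be corrected against your real schema:

```sql
-- Sketch only: feeds the parsed values straight into a set-based UPDATE.
-- Which parsed column updates which Employee column is an assumption here.
WITH UnJumbled AS
(
    SELECT t.TempId, x.ItemNumber, x.Item
    FROM #tmp t
    CROSS APPLY dbo.DelimitedSplit8K(REPLACE(t.Jumble, ':', '|'), '|') x
)
, Parsed AS
(
    SELECT
        TempId
      , EmpNo     = MAX(IIF(ItemNumber = 1, Item, NULL))
      , StudentNo = MAX(IIF(ItemNumber = 2, Item, NULL))
    FROM UnJumbled
    GROUP BY TempId
)
UPDATE Emp
SET Emp.emptyp = p.StudentNo
FROM dbo.Employee Emp
INNER JOIN Parsed p
    ON Emp.ID = p.EmpNo
WHERE Emp.ID NOT LIKE 'ASJ%';  -- excludes the rows described as coming from SAP
```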
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
December 27, 2016 at 6:19 pm
mcfarlandparkway (12/27/2016)
A set-based approach? How would I do that? How do I parse (split) this: 9000430944:7|Information
9000430944 - emp number
7 - stud number
Information - description
How do I split on the pipe, and how do I split on the colon?
Another approach that does not require a splitter:
WITH pos(TempID, Jumble, d1, d2) AS
(
    SELECT TempID, Jumble, CHARINDEX(':', Jumble), CHARINDEX('|', Jumble)
    FROM #tmp
)
SELECT
    TempID
  , EmpNo     = SUBSTRING(Jumble, 1, d1 - 1)
  , StudentNo = SUBSTRING(Jumble, d1 + 1, d2 - d1 - 1)
  , Dsc       = SUBSTRING(Jumble, d2 + 1, 50)
FROM pos;
-- Itzik Ben-Gan 2001
December 27, 2016 at 6:43 pm
Nice, Alan. I saw the word 'split' and I was off like one of Pavlov's four-legged friends. Did not even consider alternatives.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
December 27, 2016 at 7:42 pm
Phil Parkin (12/27/2016)
Nice, Alan. I saw the word 'split' and I was off like one of Pavlov's four-legged friends. Did not even consider alternatives.
😛
I did make the assumption that there would always be a colon, then a pipe. If only it were really like that in the real world.
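For what it's worth, that assumption can be guarded fairly cheaply: wrapping each CHARINDEX in NULLIF makes rows with a missing colon or pipe come back as NULL instead of returning garbage. A sketch against the same #tmp table:

```sql
-- Sketch: NULLIF(CHARINDEX(...), 0) turns "delimiter not found" into NULL,
-- so the SUBSTRING calls yield NULL for malformed rows instead of nonsense.
WITH pos(TempID, Jumble, d1, d2) AS
(
    SELECT TempId, Jumble
         , NULLIF(CHARINDEX(':', Jumble), 0)
         , NULLIF(CHARINDEX('|', Jumble), 0)
    FROM #tmp
)
SELECT
    TempID
  , EmpNo     = SUBSTRING(Jumble, 1, d1 - 1)
  , StudentNo = SUBSTRING(Jumble, d1 + 1, d2 - d1 - 1)
  , Dsc       = SUBSTRING(Jumble, d2 + 1, 50)
FROM pos;
```

Rows where either delimiter is missing simply come out with NULL in the affected columns, which is easy to filter or flag afterwards.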
-- Itzik Ben-Gan 2001
December 28, 2016 at 4:12 pm
IIRC, PARSENAME limits each part it returns to 128 characters (it returns sysname, i.e. NVARCHAR(128)) and is heavily dependent on the consistent position of items, but the following is another way to skin the same cat.
WITH cteFixDelimiter AS
(SELECT Jumbled = REPLACE(REPLACE(Jumble, ':', '.'), '|', '.') FROM #tmp)
SELECT EmpNo     = PARSENAME(Jumbled, 3)
     , StudentNo = PARSENAME(Jumbled, 2)
     , Dsc       = PARSENAME(Jumbled, 1)
FROM cteFixDelimiter;
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2017 at 7:15 am
Jeff Moden (12/28/2016)
IIRC, PARSENAME limits each part it returns to 128 characters (it returns sysname, i.e. NVARCHAR(128)) and is heavily dependent on the consistent position of items, but the following is another way to skin the same cat.
WITH cteFixDelimiter AS
(SELECT Jumbled = REPLACE(REPLACE(Jumble, ':', '.'), '|', '.') FROM #tmp)
SELECT EmpNo     = PARSENAME(Jumbled, 3)
     , StudentNo = PARSENAME(Jumbled, 2)
     , Dsc       = PARSENAME(Jumbled, 1)
FROM cteFixDelimiter;
Just an FYI, but the use of PARSENAME would require that none of the data elements ever contain a period / decimal point. I'd prefer to keep the existing delimiters and just use the DelimitedSplit8K function to do the splitting. The REPLACE function is also surprisingly costly in terms of runtime, so if the data quantities are at all sizable, the split function is probably going to beat the pants off most other methods. That said, chances are PARSENAME itself is going to be faster than REPLACE.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
January 2, 2017 at 8:35 am
sgmunson (1/2/2017)
Jeff Moden (12/28/2016)
IIRC, PARSENAME limits each part it returns to 128 characters (it returns sysname, i.e. NVARCHAR(128)) and is heavily dependent on the consistent position of items, but the following is another way to skin the same cat.
WITH cteFixDelimiter AS
(SELECT Jumbled = REPLACE(REPLACE(Jumble, ':', '.'), '|', '.') FROM #tmp)
SELECT EmpNo     = PARSENAME(Jumbled, 3)
     , StudentNo = PARSENAME(Jumbled, 2)
     , Dsc       = PARSENAME(Jumbled, 1)
FROM cteFixDelimiter;
Just an FYI, but the use of PARSENAME would require that none of the data elements ever contain a period / decimal point. I'd prefer to keep the existing delimiters and just use the DelimitedSplit8K function to do the splitting. The REPLACE function is also surprisingly costly in terms of runtime, so if the data quantities are at all sizable, the split function is probably going to beat the pants off most other methods. That said, chances are PARSENAME itself is going to be faster than REPLACE.
I have seen the REPLACE performance issue and agree 100%.
Why any splitter? You can just use CHARINDEX/SUBSTRING (note my earlier example), which will beat the pants off any splitter-based solution.
edit: meant "CHARINDEX" but typed "APPLY"
-- Itzik Ben-Gan 2001
January 2, 2017 at 1:17 pm
Alan.B (1/2/2017)
I have seen the REPLACE performance issue and agree 100%.
Why any splitter? You can just use CHARINDEX/SUBSTRING (note my earlier example) which will beat the pants off any splitter based solution.
edit: meant "CHARINDEX" but typed "APPLY"
The last time I saw consistent data in a string that might need splitting, I fell off my dinosaur... Not that it doesn't happen... it just doesn't happen often enough in my world for me to see it...
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
January 2, 2017 at 2:59 pm
Alan.B (1/2/2017)
sgmunson (1/2/2017)
Jeff Moden (12/28/2016)
IIRC, PARSENAME limits each part it returns to 128 characters (it returns sysname, i.e. NVARCHAR(128)) and is heavily dependent on the consistent position of items, but the following is another way to skin the same cat.
WITH cteFixDelimiter AS
(SELECT Jumbled = REPLACE(REPLACE(Jumble, ':', '.'), '|', '.') FROM #tmp)
SELECT EmpNo     = PARSENAME(Jumbled, 3)
     , StudentNo = PARSENAME(Jumbled, 2)
     , Dsc       = PARSENAME(Jumbled, 1)
FROM cteFixDelimiter;
Just an FYI, but the use of PARSENAME would require that none of the data elements ever contain a period / decimal point. I'd prefer to keep the existing delimiters and just use the DelimitedSplit8K function to do the splitting. The REPLACE function is also surprisingly costly in terms of runtime, so if the data quantities are at all sizable, the split function is probably going to beat the pants off most other methods. That said, chances are PARSENAME itself is going to be faster than REPLACE.
I have seen the REPLACE performance issue and agree 100%.
Why any splitter? You can just use CHARINDEX/SUBSTRING (note my earlier example) which will beat the pants off any splitter based solution.
edit: meant "CHARINDEX" but typed "APPLY"
It's kind of a given that your elements shouldn't have periods in them if you use this method. I thought everyone would understand that and that's why I didn't bring it up. It would be kind of like stating that any elements that contain a colon or pipe could be a problem in this case.
REPLACE isn't actually expensive at all, especially for 1:1 replacements... unless you're using a collation where it's slow. You can easily overcome that by using the COLLATE clause with a binary collation. And remember that CHARINDEX, although to a lesser extent, is affected by the same collation problems.
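As a sketch of what that looks like in practice (the specific binary collation named here is just one common choice; pick whatever suits your data):

```sql
-- Forcing a binary collation for the string functions.
-- Latin1_General_BIN2 is an example; substitute as appropriate.
SELECT REPLACE(Jumble COLLATE Latin1_General_BIN2, ':', '|')
FROM #tmp;
```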
As for why to use any kind of a splitter at all, consider that's exactly what was done using the two CHARINDEX functions and a calculated substring for each element 😉 and, as I stated in my post, it's just another way to skin the same cat.
Shifting gears a bit and since no one has yet brought it up... let's talk about the real problems with all of this...
1. Why is this highly denormalized data being stored in a table to begin with? If it's only stored in a staging table used to parse such bad data before it gets to a permanent table, that's OK. If it's stored in a permanent table this way, that's a really bad idea and needs to be permanently fixed, because it's a flagrant violation of what a column is supposed to contain and the value will need to be split every time you want to query an element from it.
2. Why would anyone use two different delimiters for this when there's only ever going to be a single set of elements in the data?
3. Why would anyone use printable characters (two of them, in this case) as delimiters?
Using REPLACE or a splitter are minor problems compared to the larger problems posed by this type of data. If it were all done properly to begin with, this thread wouldn't be necessary and no one would have to worry about any of the unnecessary performance problems caused by this kind of data. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2017 at 4:12 pm
Jeff Moden (1/2/2017)
Alan.B (1/2/2017)
sgmunson (1/2/2017)
Jeff Moden (12/28/2016)
IIRC, PARSENAME limits each part it returns to 128 characters (it returns sysname, i.e. NVARCHAR(128)) and is heavily dependent on the consistent position of items, but the following is another way to skin the same cat.
WITH cteFixDelimiter AS
(SELECT Jumbled = REPLACE(REPLACE(Jumble, ':', '.'), '|', '.') FROM #tmp)
SELECT EmpNo     = PARSENAME(Jumbled, 3)
     , StudentNo = PARSENAME(Jumbled, 2)
     , Dsc       = PARSENAME(Jumbled, 1)
FROM cteFixDelimiter;
Just an FYI, but the use of PARSENAME would require that none of the data elements ever contain a period / decimal point. I'd prefer to keep the existing delimiters and just use the DelimitedSplit8K function to do the splitting. The REPLACE function is also surprisingly costly in terms of runtime, so if the data quantities are at all sizable, the split function is probably going to beat the pants off most other methods. That said, chances are PARSENAME itself is going to be faster than REPLACE.
I have seen the REPLACE performance issue and agree 100%.
Why any splitter? You can just use CHARINDEX/SUBSTRING (note my earlier example) which will beat the pants off any splitter based solution.
edit: meant "CHARINDEX" but typed "APPLY"
It's kind of a given that your elements shouldn't have periods in them if you use this method. I thought everyone would understand that and that's why I didn't bring it up. It would be kind of like stating that any elements that contain a colon or pipe could be a problem in this case.
REPLACE isn't actually expensive at all especially for 1:1 replacements... unless you're using a collation where it's slow. You can easily overcome that by using the COLLATE clause with a binary collation. And remember that CHARINDEX, although to a lesser extent, is affected by the same collation problems.
As for why to use any kind of a splitter at all, consider that's exactly what was done using the two CHARINDEX functions and a calculated substring for each element 😉 and, as I stated in my post, it's just another way to skin the same cat.
Shifting gears a bit and since no one has yet brought it up... let's talk about the real problems with all of this...
1. Why is this highly denormalized data being stored in a table to begin with? If it's only stored in a staging table used to parse such bad data before it gets to a permanent table, that's OK. If it's stored in a permanent table this way, that's a really bad idea and needs to be permanently fixed, because it's a flagrant violation of what a column is supposed to contain and the value will need to be split every time you want to query an element from it.
2. Why would anyone use two different delimiters for this when there's only ever going to be a single set of elements in the data?
3. Why would anyone use printable characters (two of them, in this case) as delimiters?
Using REPLACE or a splitter are minor problems compared to the larger problems posed by this type of data. If it were all done properly to begin with, this thread wouldn't be necessary and no one would have to worry about any of the unnecessary performance problems caused by this kind of data. 😉
I agree ... all those things would be wonderful to have as the norm. Unfortunately, folks like you and I have to deal with the "real world", where ideals often don't count for much, bad is normal, and ugly isn't sufficiently uncommon. I sometimes wonder whether I'd have work if these kinds of "problems" didn't exist.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)