December 6, 2011 at 8:11 am
Good morning Everybody.
I know that it might be Not a best solution to use cursor....however here is my question:
using below USP to 'scramble' some test data...
it's a SP with cursor inside that replaces one symbol with the other...
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
Create PROCEDURE [dbo].[mask_data]
AS
SET NOCOUNT ON
DECLARE @newVARCHAR(11)
DECLARE @oldVARCHAR(11)
DECLARE @col varchar(10)
DECLARE @get cursor
Set @get = CURSOR FOR
SELECT col1, col2
FROM table
order by col1
OPEN @get
FETCH NEXT FROM @get INTO @icol,@old
WHILE (@@FETCH_STATUS <> -1)
BEGIN
select something
from something
UPDATE table
SET col2 = @new
where col1 = @col
FETCH NEXT FROM @get INTO @col, @old
-- END
END
CLOSE @get
DEALLOCATE @get
I would like to may be optimize it, if possible
To see what is going on there i am running this-
SELECT execution_count, st.text
FROM sys.dm_exec_query_stats as qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st
WHERE st.text like '%FETCH NEXT FROM @get INTO%'
It does shows me that it looped 90,000 times ..which is total number of records for that clmn
BUT WHY, WHY in the output 'text' column it shows me that statement that was executed 90,000 times was 'CREATE PROCEDURE [dbo].[mask_data] AS SET NOCOUNT ON DECLARE @new VARCHAR(11) ......
Why 'CREATE' ?? why not 'execute'? DOes that means that, for example, execution plan has not been reused after it was created and executed first time?
Am i missing something? 🙂
December 6, 2011 at 8:33 am
I'm skipping the question completely, and going to what i think was the core issue...scrambling data.
I stopped as soonas I saw the cursor, because I know that unless you are passing specific values to a procedure ,or doing something with the metadata, like DDL operations, anything that is using a curosr can be replaced with a set based command instead.
It looks like you have a varchar(11) column you need masked, and it can probably be done with a CTE and an update in a single super quick set based operation instead.
what are the rules for the new value? any old combination of stirngs and numbers? an arbitrary value from a dictionary?
I'm rubbing my hands with glee, because if you can describe the expected results, I can throw an example out there to update your table in a signel shot with random data.
This is one of my favorite style of forum posts to help with.
Lowell
December 6, 2011 at 8:39 am
I knew, i will step on 'cursor killers' 🙂 🙂
Sure, thank you for the offer.
Need some time to modify script ( for security purposes ) before i can 'publish ' it.
December 6, 2011 at 8:41 am
SD1999 (12/6/2011)
I knew, i will step on 'cursor killers' 🙂 🙂Sure, thank you for the offer.
Need some time to modify script ( for security purposes ) before i can 'publish ' it.
yeah guilty as charged, we are almost as bad as a spelling nazi, I know 😀
Lowell
December 6, 2011 at 9:12 am
wait..it gets even worse 😀 , i have udf that being used in that cursor
so heres 2 queries to scramble users phone #
first is a function
second is that cursor's loop that utilizes that function
Oooookk, hit me ..show me please how BAD it is 🙂
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create function [dbo].[fn_phn_mask]
(
@Origtring as varchar(10),
@charstobereplaced as varchar(10),
@charsreplacingwith as varchar(10)
)
returns varchar(10)
as
begin
declare @output as varchar(10)
-- declare the return variable here
declare @pos as int
declare @index as int
declare @numtoreplace as varchar(1)
declare @replacementnum as varchar(1)
declare @rnums as varchar(10)
set @pos = 1
set @output = @Origtring
set @rnums = ''
while @pos<= len(@output)
begin
set @numtoreplace = substring(@output, @pos, 1)
-- look up the number to se if it shoul be replaced
set @index = Charindex(@numtoreplace, @charstobereplaced)
if @index=0
begin
set @replacementnum= @numtoreplace -- leave the number as it is
end
else
begin
-- get the number that will replace the original
set @replacementnum = substring(@charsreplacingwith, @index, 1)
end
select @output = stuff(@output, @pos, 1, @replacementnum)
set @pos = @pos+1
end
return @output
end
----------------------------------------
--------------------------------------
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
Create PROCEDURE [dbo].[P_mask_phn]
AS
SET NOCOUNT ON
DECLARE @new_phnVARCHAR(11)
DECLARE @old_phnVARCHAR(11)
DECLARE @icustomer varchar(10)
DECLARE @get_phn cursor
Set @get_phn = CURSOR FOR
SELECT icustomer,phn
FROM dbo.customer
order by icustomer
OPEN @get_phn
FETCH NEXT FROM @get_phn INTO @icustomer,@old_phn
WHILE (@@FETCH_STATUS <> -1)
BEGIN
SELECT @new_phn = SUBSTRING(@old_phn, 1, 1)+
DBO.fn_phn_mask(SUBSTRING(@old_phn, 2, 1), '5496321708', '9876543210')+
DBO.fn_phn_mask(SUBSTRING(@old_phn, 3, 1), '6985743021', '9876543210')+
DBO.fn_phn_mask(SUBSTRING(@old_phn, 4, 1), '6745832912', '9876543210')+
DBO.fn_phn_mask(SUBSTRING(@old_phn, 5, 1), '7854930216', '9876543210')+
DBO.fn_phn_mask(SUBSTRING(@old_phn, 6, 1), '1896423057', '9876543210')+
DBO.fn_phn_mask(SUBSTRING(@old_phn, 7, 1), '8756932014', '9876543210')+
DBO.fn_phn_mask(SUBSTRING(@old_phn, 8, 1), '2814603975', '9876543210')+
DBO.fn_phn_mask(SUBSTRING(@old_phn, 9, 1), '7854930216', '9876543210')
UPDATE dbo.customer
SET phn = @new_phn
where icustomer = @icustomer
FETCH NEXT FROM @get_phn INTO @icustomer, @old_phn
END
CLOSE @get_phn
DEALLOCATE @get_phn
December 6, 2011 at 9:55 am
well, here's how I do it;
My assumption is the phone number string in question is all numeric with no dashes, but it's not hard to add logic to fiddle with that as well.
instead of the proc calling a function, it's just a straight update: Performance wise, this updated a bit more than a million rows in 16 seconds on my decent dev machine.
(1100011 row(s) affected)
CREATE TABLE #MyContactTable (ID int,name varchar(30),phone varchar(11))
INSERT INTO #MyContactTable
SELECT 8,'Dee','3333333333' UNION ALL
SELECT 9,'Monkey','3333333333' UNION ALL
SELECT 10,'Robot','9999999999' UNION ALL
SELECT 11,'Lizard','3333333333' UNION ALL
SELECT 12,'Lauray','9542179597' UNION ALL
SELECT 14,'Roberts','9548878712' UNION ALL
SELECT 15,'Maluroney','3333333333' UNION ALL
SELECT 18,'Jordan','9544548899' UNION ALL
SELECT 21,'Seinfield','3333333333' UNION ALL
SELECT 22,'Jackson','3333333333' UNION ALL
SELECT 23,'Mourning ','1111111111'
--now the randmization update.
UPDATE #MyContactTable
SET #MyContactTable.phone = RANDDATA.Random_String
FROM
(select #MyContactTable.ID,
--dashes are commented out if needed later.
Random_String =
substring(x,(abs(checksum(newid()))%10)+1,1) +
substring(x,(abs(checksum(newid()))%10)+1,1) +
substring(x,(abs(checksum(newid()))%10)+1,1) +
-- '-' +
substring(x,(abs(checksum(newid()))%10)+1,1) +
substring(x,(abs(checksum(newid()))%10)+1,1) +
substring(x,(abs(checksum(newid()))%10)+1,1) +
-- '-' +
substring(x,(abs(checksum(newid()))%10)+1,1) +
substring(x,(abs(checksum(newid()))%10)+1,1) +
substring(x,(abs(checksum(newid()))%10)+1,1) +
substring(x,(abs(checksum(newid()))%10)+1,1)
from (select x='0123456789') a
CROSS APPLY #MyContactTable --this gets us a random phone number for each primary key in the target table
) RANDDATA
WHERE #MyContactTable.ID = RANDDATA.ID
Lowell
December 6, 2011 at 11:55 am
One more condition here i forgot to mention...
the logic(algorithm) of number replacements should be followed as per below
we cant use some random number
the reason is appl talking to oracle as well and even though numbers are scrambled, they should
match in sql and oracle
December 6, 2011 at 11:58 am
What's the point of having obfuscated yet, 100% accurate data??? :blink:
December 6, 2011 at 12:25 pm
SD1999 (12/6/2011)
One more condition here i forgot to mention...the logic(algorithm) of number replacements should be followed as per below
we cant use some random number
the reason is appl talking to oracle as well and even though numbers are scrambled, they should
match in sql and oracle
still easy to fix; since an area code or exchange should not start with zero or one, you just tweak those two rand elements to return only 2-9 instead of 0-9; other than that, what is left to need a specific biz rule?
Lowell
December 6, 2011 at 1:00 pm
Thank you, i'll play with it.
those are not really phone numbers, it was just for example
no more rules, this is it
what and where do you think would be a performance or level of complexity gain be??
December 6, 2011 at 1:07 pm
SD1999 (12/6/2011)
Thank you, i'll play with it.those are not really phone numbers, it was just for example
no more rules, this is it
what and where do you think would be a performance or level of complexity gain be??
Good question;
I've got a script that has to obfuscate a specific version of a client database; while even the biggest tables only have 40K rows, i have to obfuscate hundreds of columns.
In that case, I pretty much have a script per table that needs to be obfuscated; all those scripts might take 3 minutes to run, tops. now, change that to be a hundred cursors, and it might take an hour or more to run instead.
So I have the ability to restore and obfuscate in a couple of minutes, vs the much slower cursor examples.
My snippets do stuff like replace all last names with HTML colors, so people become Jeff Brown, Jack Green, etc; replaicng urls, birthdays, other sensitive data.
Lowell
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply