June 30, 2010 at 8:37 am
I've never used a CTE for an update before. I'm hoping I can use it in this situation.
We have a database to be restored down to a dev environment and during the restore, we'll need to scrub PII (Personally Identifiable Information). Here's the set up (all data is faked).
Create Table Store (StoreID int NOT NULL, FedTaxID char(9) NOT NULL);
GO
Create Table Franchise (FranchiseID int NOT NULL, StoreID int NOT NULL);
GO
Insert into Store (StoreID, FedTaxID)
(Select 1234, 115588779
UNION
Select 2564, 998822546
UNION
Select 3258, 851234567
UNION
Select 5582, 641122553);
Insert into Franchise (FranchiseID, StoreID)
(Select 1, 1234
UNION
Select 2, 2564
UNION
Select 3, 3258
UNION
Select 3, 5582);
In this case, the goal is to randomize the FedTaxID column. In other cases, I'd be randomizing addresses, etc. But the caveat is that all Stores within the same Franchise MUST have the same randomized FedTaxID. So if Store 3258 gets a randomized number of 123456789, then Store 5582 must have 123456789 as it's FedTaxID, but neither 1234 nor 2564 should have that number.
Before I found this information out, I was just randomizing every number with the below cursor:
Declare @EntityID int;
DECLARE TaxIDWrite CURSOR FOR SELECT StoreID FROM dbo.Store;
OPEN TaxIDWrite;
FETCH NEXT FROM TaxIDWrite INTO @EntityID;
WHILE @@fetch_status = 0
BEGIN
UPDATE dbo.Store
SET TaxID = '11' + convert(varchar(7),(left(convert(bigint,RAND()*100000000000),7)))
WHERE StoreID = @EntityID;
FETCH NEXT FROM TaxIDWrite INTO @EntityID;
END
CLOSE TaxIDWrite;
DEALLOCATE TaxIDWrite;
GO
Now, I'm trying to figure out if I can use a CTE for matching, but my brain falls apart when writing the CTE.
WITH TestCode (StoreID, FranchiseID) AS
(Select s.StoreID, f.FranchiseID
From dbo.Store s
join dbo.Franchise f
on s.StoreID = f.StoreID
join (Select FranchiseID
from dbo.Franchise
Group By FranchiseID
Having Count(FranchiseID) > 1) f1
on f.FranchiseID = f1.FranchiseID
UNION
Select s.StoreID, f.FranchiseID
From dbo.Store s
join dbo.Franchise f
on s.StoreID = f.StoreID
join (Select FranchiseID
from dbo.Franchise
Group By FranchiseID
Having Count(FranchiseID) > 1) f1
on f.FranchiseID = f1.FranchiseID
join TestCode tc
on f.FranchiseID = tc.FranchiseID)
--This is where I'm stuck.
I know there should be a UNION statement and a recursion back to the FranchiseID, but I'm not sure if I'm doing it correctly. And then I have no idea how to write the UPDATE statement after the CTE. I'm hoping there's a way to do this so I can have SET based code in this update instead of RBAR.
Am I fighting a losing battle? Should I just stick to the cursor?
June 30, 2010 at 8:52 am
Actually, recursive CTE is not far from RBAR (can be even worse in performance)...
What you need can be achieved by a single update:
UPDATE Store SET FedTaxID = CAST(RAND(F.FranchiseID) * 1000000000 as int)
FROM Store S
JOIN Franchise F ON F.StoreID = S.StoreID
June 30, 2010 at 8:57 am
Thanks, but your solution won't work. Even though my supplied data set only has one paired team, my actual database has multiple FranchiseIDs with multiple StoreIDs. Your update will make all of them the same. I need each FranchiseID to have a different TaxID from the others.
Example:
FranchiseID StoreID
1 1234
2 2564
3 3258
3 5582
4 9985
4 8672
4 3421
5 6666
5 5555
6 7777
Your solution would force all of these to have the same randomized number. 1,2 and 6 need to have different numbers. All the storeIDs under 3 have to be the same, but different from 1,2,4,5 & 6. Etc. [EDIT: I already tried the simple update statement, sans cursor. That's how I know this will happen.]
June 30, 2010 at 9:21 am
Perhaps you could do this by breaking it into steps. First generate the Tax ID's per Franchise and then run an update.
How about something along the lines of the following pseudo code:
CREATE TABLE #TIDS(FranchiseID int PRIMARY KEY CLUSTERED,TaxID int)
INSERT INTO #TIDS
SELECT DISTINCT FranchiseID,CAST(RAND(FranchiseID) * 1000000000 as int)
UPDATE Store SET FedTaxID = TaxID
FROM Store S
JOIN Franchise F ON F.StoreID = S.StoreID
JOIN #TIDS t ON f.FranchiseID = t.FranchiseID
I'm not at my SQL machine so apologies for syntax errors.
June 30, 2010 at 9:22 am
Brandie Tarvin (6/30/2010)
Thanks, but your solution won't work. Even though my supplied data set only has one paired team, my actual database has multiple FranchiseIDs with multiple StoreIDs. Your update will make all of them the same. I need each FranchiseID to have a different TaxID from the others.Example:
FranchiseID StoreID
1 1234
2 2564
3 3258
3 5582
4 9985
4 8672
4 3421
5 6666
5 5555
6 7777
Your solution would force all of these to have the same randomized number. 1,2 and 6 need to have different numbers. All the storeIDs under 3 have to be the same, but different from 1,2,4,5 & 6. Etc. [EDIT: I already tried the simple update statement, sans cursor. That's how I know this will happen.]
1. You better try my update before saying it will not work. I have setup additional test data as:
Insert into Store (StoreID, FedTaxID)
(Select 1234, 115588779
UNION
Select 2564, 998822546
UNION
Select 3258, 851234567
UNION
Select 5582, 641122553
UNION
Select 9985, 641122553
UNION
Select 8672, 641122553
UNION
Select 3421, 641122553
UNION
Select 7777, 641122553
);
Insert into Franchise (FranchiseID, StoreID)
(Select 1, 1234
UNION
Select 2, 2564
UNION
Select 3, 3258
UNION
Select 3, 5582
UNION
Select 4, 9985
UNION
Select 4, 8672
UNION
Select 4, 3421
UNION
Select 6, 7777
);
Update I gave you does work fine.
Now, the following update will make the numbers look even more random:
UPDATE Store SET FedTaxID = FI.NewFTID
FROM Store S
JOIN Franchise F
ON F.StoreID = S.StoreID
JOIN (SELECT FranchiseID, RIGHT(CAST(CAST(RAND() * FranchiseID AS DECIMAL(28,9)) as VARCHAR(30)),9) AS NewFTID
FROM Franchise) FI
ON FI.FranchiseID = F.FranchiseID
And the last thing.
In order to guarantee that the generated random numer is unique, you will need to check if this number was not already used. I don't think you can do it in set-based update (with CTE or without)
What I would suggest in this case, after the first unconditional update, do update in a loop untill dupplicate FedTaxID are ellimnated (if any). Something like:
WHILE EXISTS(
SELECT NULL
FROM Store S
JOIN Franchise F
ON F.StoreID = S.StoreID
GROUP BY S.FedTaxID
HAVING COUNT(DISTINCT F.FranchiseID) > 1
)
BEGIN
UPDATE Store SET FedTaxID = FI.NewFTID
FROM Store S
JOIN Franchise F
ON F.StoreID = S.StoreID
JOIN (SELECT FranchiseID, RIGHT(CAST(CAST(RAND() * FranchiseID AS DECIMAL(28,9)) as VARCHAR(30)),9) AS NewFTID
FROM Franchise) FI
ON FI.FranchiseID = F.FranchiseID
WHERE EXISTS (SELECT NULL
FROM Store S1
JOIN Franchise F1
ON F1.StoreID = S1.StoreID
WHERE S1.FedTaxID = S.FedTaxID
GROUP BY S1.FedTaxID
HAVING COUNT(DISTINCT F1.FranchiseID) > 1
)
END
June 30, 2010 at 9:52 am
Thinking about this a bit more, you do not need a truly random number for your purpose. As this is for test data, could you not just use StoreID * 1000 as your tax reference? It would make checking your data simpler. From an educational point of view it may be a worthy exercise to generate the random numbers in a set based manner, but for your purpose, I would go for the simpler approach. The resulting Update statement would also be simpler.
June 30, 2010 at 10:12 am
Eugene Elutin (6/30/2010)
1. You better try my update before saying it will not work.
You're correct. I didn't catch that you were using FranchiseID in that number. I skimmed over the random calc and assumed it was set up like my own. My apologies.
I don't think I can use FranchiseID or StoreID in my newly generated TaxID. And since I'll be using this technique for other PII (address information, etc), I'd like a technique that doesn't rely on a specific column in the randomization. For this particular situation, it's something to look into though.
Still, as an exercise in educating myself on advanced T-SQL coding, I'd like to figure out all the options that are available, so if anyone else has any suggestions, please let me know.
June 30, 2010 at 10:26 am
Brandie Tarvin (6/30/2010)
...I don't think I can use FranchiseID or StoreID in my newly generated TaxID. And since I'll be using this technique for other PII (address information, etc), I'd like a technique that doesn't rely on a specific column in the randomization. For this particular situation, it's something to look into though.
...
You can use NEWID() function:
-- replace in places where RAND() was used
...,
RIGHT(CAST(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT) AS VARCHAR(30)),9) AS NewFTID
July 1, 2010 at 10:17 am
Thanks, everyone. I appreciate your assistance. Looks like I'm stuck with cursors still, but I'm well on my way to a solution.
Soon as I verify it, I'll write up an article for the site.
July 1, 2010 at 11:02 am
Brandie Tarvin (7/1/2010)
Looks like I'm stuck with cursors still, but I'm well on my way to a solution.
Why are you still stuck with cursors? What else needs to be done here?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 1, 2010 at 11:27 am
Brandie Tarvin (7/1/2010)
Thanks, everyone. I appreciate your assistance. Looks like I'm stuck with cursors still, but I'm well on my way to a solution.Soon as I verify it, I'll write up an article for the site.
Actually - why a cursor? Sounds like you need to extract a unique list of FranchiseID's, assign a random FedtaxID to each, then update the original table with the randomized ones.
select distinct franchiseID, cast(rand(checksum(newID()))*1000000 as varchar(10)) RandFedTaxID
into scrubtable
from franchise
update franchise
set oldFFedTaxID=RandFedTaxID
from franchise join scrubtable on franchise.franchiseID=scrubtable .franchiseID
drop scrub table
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 1, 2010 at 12:51 pm
I've been advising the same all the way long...
But, for some reason, he loves a cursor 😀
July 6, 2010 at 11:33 am
First of all, "she", not he. @=)
Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.
Lastly, I can't use StoreID or FranchiseID in the TaxID.
If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.
July 6, 2010 at 12:23 pm
Brandie Tarvin (7/6/2010)
First of all, "she", not he. @=)Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.
Lastly, I can't use StoreID or FranchiseID in the TaxID.
If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.
Sorry Brandie,
Could you please check the post #945596 again (one of my prev. replies on this thread), you will find the bit where it describes how to avoid duplicates whithout a cursor...
Post #945638 explains how you can generate random number using NEWID() function without using FranchiseID column to randomize.
It doesn't uses FranchiseID in the TaxID also.
Combine hints form two posts together and you will get a solution wihtout using a cursor. What is wrong with it?
July 7, 2010 at 7:51 am
Brandie Tarvin (7/6/2010)
First of all, "she", not he. @=)Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.
Lastly, I can't use StoreID or FranchiseID in the TaxID.
If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.
1) You can't use StoreID or FranchiseID in the TaxID, but can you use them to seed a HASHBYTES call which will get a repeating value that is 'obfuscated'?
Off-Topic:
2) Reviewing your website links you are not just a "she" but a very attractive (and interesting!) one. 😉 I loved the Life, Love, and Sex article - very thought provoking!
3) What is the precon you submitted for Orlando? I hope to get picked to present there and get to meet you in October!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply