CTE - Avoiding the cursor

Question

Post reply

CTE - Avoiding the cursor

Brandie Tarvin

SSC Guru

Points: 173081
More actions
June 30, 2010 at 8:37 am

#226280

I've never used a CTE for an update before. I'm hoping I can use it in this situation.
We have a database to be restored down to a dev environment and during the restore, we'll need to scrub PII (Personally Identifiable Information). Here's the set up (all data is faked).
Create Table Store (StoreID int NOT NULL, FedTaxID char(9) NOT NULL);
GO
Create Table Franchise (FranchiseID int NOT NULL, StoreID int NOT NULL);
GO
Insert into Store (StoreID, FedTaxID)
(Select 1234, 115588779
UNION
Select 2564, 998822546
UNION
Select 3258, 851234567
UNION
Select 5582, 641122553);
Insert into Franchise (FranchiseID, StoreID)
(Select 1, 1234
UNION
Select 2, 2564
UNION
Select 3, 3258
UNION
Select 3, 5582);
In this case, the goal is to randomize the FedTaxID column. In other cases, I'd be randomizing addresses, etc. But the caveat is that all Stores within the same Franchise MUST have the same randomized FedTaxID. So if Store 3258 gets a randomized number of 123456789, then Store 5582 must have 123456789 as it's FedTaxID, but neither 1234 nor 2564 should have that number.
Before I found this information out, I was just randomizing every number with the below cursor:
Declare @EntityID int;
DECLARE TaxIDWrite CURSOR FOR SELECT StoreID FROM dbo.Store;
OPEN TaxIDWrite;
FETCH NEXT FROM TaxIDWrite INTO @EntityID;
WHILE @@fetch_status = 0
BEGIN
UPDATE dbo.Store
SET TaxID = '11' + convert(varchar(7),(left(convert(bigint,RAND()*100000000000),7)))
WHERE StoreID = @EntityID;
FETCH NEXT FROM TaxIDWrite INTO @EntityID;
END
CLOSE TaxIDWrite;
DEALLOCATE TaxIDWrite;
GO
Now, I'm trying to figure out if I can use a CTE for matching, but my brain falls apart when writing the CTE.
WITH TestCode (StoreID, FranchiseID) AS
(Select s.StoreID, f.FranchiseID
From dbo.Store s
join dbo.Franchise f
on s.StoreID = f.StoreID
join (Select FranchiseID
from dbo.Franchise
Group By FranchiseID
Having Count(FranchiseID) > 1) f1
on f.FranchiseID = f1.FranchiseID
UNION
Select s.StoreID, f.FranchiseID
From dbo.Store s
join dbo.Franchise f
on s.StoreID = f.StoreID
join (Select FranchiseID
from dbo.Franchise
Group By FranchiseID
Having Count(FranchiseID) > 1) f1
on f.FranchiseID = f1.FranchiseID
join TestCode tc
on f.FranchiseID = tc.FranchiseID)
--This is where I'm stuck.
I know there should be a UNION statement and a recursion back to the FranchiseID, but I'm not sure if I'm doing it correctly. And then I have no idea how to write the UPDATE statement after the CTE. I'm hoping there's a way to do this so I can have SET based code in this update instead of RBAR.
Am I fighting a losing battle? Should I just stick to the cursor?
Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 1

Actually, recursive CTE is not far from RBAR (can be even worse in performance)...

What you need can be achieved by a single update:

UPDATE Store SET FedTaxID = CAST(RAND(F.FranchiseID) * 1000000000 as int)

FROM Store S

JOIN Franchise F ON F.StoreID = S.StoreID

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Brandie Tarvin SSC Guru Points: 173081 More actions · Answer 2

Thanks, but your solution won't work. Even though my supplied data set only has one paired team, my actual database has multiple FranchiseIDs with multiple StoreIDs. Your update will make all of them the same. I need each FranchiseID to have a different TaxID from the others.

Example:

FranchiseID StoreID

1 1234

2 2564

3 3258

3 5582

4 9985

4 8672

4 3421

5 6666

5 5555

6 7777

Your solution would force all of these to have the same randomized number. 1,2 and 6 need to have different numbers. All the storeIDs under 3 have to be the same, but different from 1,2,4,5 & 6. Etc. [EDIT: I already tried the simple update statement, sans cursor. That's how I know this will happen.]

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Dwayne Dibley SSCrazy Points: 2350 More actions · Answer 3

Perhaps you could do this by breaking it into steps. First generate the Tax ID's per Franchise and then run an update.

How about something along the lines of the following pseudo code:

CREATE TABLE #TIDS(FranchiseID int PRIMARY KEY CLUSTERED,TaxID int)

INSERT INTO #TIDS

SELECT DISTINCT FranchiseID,CAST(RAND(FranchiseID) * 1000000000 as int)

UPDATE Store SET FedTaxID = TaxID

FROM Store S

JOIN Franchise F ON F.StoreID = S.StoreID

JOIN #TIDS t ON f.FranchiseID = t.FranchiseID

I'm not at my SQL machine so apologies for syntax errors.

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 4

Brandie Tarvin (6/30/2010)
Thanks, but your solution won't work. Even though my supplied data set only has one paired team, my actual database has multiple FranchiseIDs with multiple StoreIDs. Your update will make all of them the same. I need each FranchiseID to have a different TaxID from the others.
Example:
FranchiseID StoreID
1 1234
2 2564
3 3258
3 5582
4 9985
4 8672
4 3421
5 6666
5 5555
6 7777
Your solution would force all of these to have the same randomized number. 1,2 and 6 need to have different numbers. All the storeIDs under 3 have to be the same, but different from 1,2,4,5 & 6. Etc. [EDIT: I already tried the simple update statement, sans cursor. That's how I know this will happen.]

1. You better try my update before saying it will not work. I have setup additional test data as:

Insert into Store (StoreID, FedTaxID)

(Select 1234, 115588779

UNION

Select 2564, 998822546

UNION

Select 3258, 851234567

UNION

Select 5582, 641122553

UNION

Select 9985, 641122553

UNION

Select 8672, 641122553

UNION

Select 3421, 641122553

UNION

Select 7777, 641122553

);

Insert into Franchise (FranchiseID, StoreID)

(Select 1, 1234

UNION

Select 2, 2564

UNION

Select 3, 3258

UNION

Select 3, 5582

UNION

Select 4, 9985

UNION

Select 4, 8672

UNION

Select 4, 3421

UNION

Select 6, 7777

);

Update I gave you does work fine.

Now, the following update will make the numbers look even more random:

UPDATE Store SET FedTaxID = FI.NewFTID

FROM Store S

JOIN Franchise F

ON F.StoreID = S.StoreID

JOIN (SELECT FranchiseID, RIGHT(CAST(CAST(RAND() * FranchiseID AS DECIMAL(28,9)) as VARCHAR(30)),9) AS NewFTID

FROM Franchise) FI

ON FI.FranchiseID = F.FranchiseID

And the last thing.

In order to guarantee that the generated random numer is unique, you will need to check if this number was not already used. I don't think you can do it in set-based update (with CTE or without)

What I would suggest in this case, after the first unconditional update, do update in a loop untill dupplicate FedTaxID are ellimnated (if any). Something like:

WHILE EXISTS(

SELECT NULL

FROM Store S

JOIN Franchise F

ON F.StoreID = S.StoreID

GROUP BY S.FedTaxID

HAVING COUNT(DISTINCT F.FranchiseID) > 1

)

BEGIN

UPDATE Store SET FedTaxID = FI.NewFTID

FROM Store S

JOIN Franchise F

ON F.StoreID = S.StoreID

JOIN (SELECT FranchiseID, RIGHT(CAST(CAST(RAND() * FranchiseID AS DECIMAL(28,9)) as VARCHAR(30)),9) AS NewFTID

FROM Franchise) FI

ON FI.FranchiseID = F.FranchiseID

WHERE EXISTS (SELECT NULL

FROM Store S1

JOIN Franchise F1

ON F1.StoreID = S1.StoreID

WHERE S1.FedTaxID = S.FedTaxID

GROUP BY S1.FedTaxID

HAVING COUNT(DISTINCT F1.FranchiseID) > 1

)

END

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Dwayne Dibley SSCrazy Points: 2350 More actions · Answer 5

Thinking about this a bit more, you do not need a truly random number for your purpose. As this is for test data, could you not just use StoreID * 1000 as your tax reference? It would make checking your data simpler. From an educational point of view it may be a worthy exercise to generate the random numbers in a set based manner, but for your purpose, I would go for the simpler approach. The resulting Update statement would also be simpler.

Brandie Tarvin SSC Guru Points: 173081 More actions · Answer 6

Eugene Elutin (6/30/2010)
1. You better try my update before saying it will not work.

You're correct. I didn't catch that you were using FranchiseID in that number. I skimmed over the random calc and assumed it was set up like my own. My apologies.

I don't think I can use FranchiseID or StoreID in my newly generated TaxID. And since I'll be using this technique for other PII (address information, etc), I'd like a technique that doesn't rely on a specific column in the randomization. For this particular situation, it's something to look into though.

Still, as an exercise in educating myself on advanced T-SQL coding, I'd like to figure out all the options that are available, so if anyone else has any suggestions, please let me know.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 7

Brandie Tarvin (6/30/2010)
...
I don't think I can use FranchiseID or StoreID in my newly generated TaxID. And since I'll be using this technique for other PII (address information, etc), I'd like a technique that doesn't rely on a specific column in the randomization. For this particular situation, it's something to look into though.
...

You can use NEWID() function:

-- replace in places where RAND() was used

...,

RIGHT(CAST(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT) AS VARCHAR(30)),9) AS NewFTID

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Brandie Tarvin SSC Guru Points: 173081 More actions · Answer 8

Thanks, everyone. I appreciate your assistance. Looks like I'm stuck with cursors still, but I'm well on my way to a solution.

Soon as I verify it, I'll write up an article for the site.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

WayneS SSC Guru Points: 95461 More actions · Answer 9

Brandie Tarvin (7/1/2010)
Looks like I'm stuck with cursors still, but I'm well on my way to a solution.

Why are you still stuck with cursors? What else needs to be done here?

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 10

Brandie Tarvin (7/1/2010)
Thanks, everyone. I appreciate your assistance. Looks like I'm stuck with cursors still, but I'm well on my way to a solution.
Soon as I verify it, I'll write up an article for the site.

Actually - why a cursor? Sounds like you need to extract a unique list of FranchiseID's, assign a random FedtaxID to each, then update the original table with the randomized ones.

select distinct franchiseID, cast(rand(checksum(newID()))*1000000 as varchar(10)) RandFedTaxID

into scrubtable

from franchise

update franchise

set oldFFedTaxID=RandFedTaxID

from franchise join scrubtable on franchise.franchiseID=scrubtable .franchiseID

drop scrub table

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 11

I've been advising the same all the way long...

But, for some reason, he loves a cursor 😀

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Brandie Tarvin SSC Guru Points: 173081 More actions · Answer 12

First of all, "she", not he. @=)

Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.

Lastly, I can't use StoreID or FranchiseID in the TaxID.

If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 13

Brandie Tarvin (7/6/2010)
First of all, "she", not he. @=)
Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.
Lastly, I can't use StoreID or FranchiseID in the TaxID.
If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.

Sorry Brandie,

Could you please check the post #945596 again (one of my prev. replies on this thread), you will find the bit where it describes how to avoid duplicates whithout a cursor...

Post #945638 explains how you can generate random number using NEWID() function without using FranchiseID column to randomize.

It doesn't uses FranchiseID in the TaxID also.

Combine hints form two posts together and you will get a solution wihtout using a cursor. What is wrong with it?

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 14

Brandie Tarvin (7/6/2010)
First of all, "she", not he. @=)
Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.
Lastly, I can't use StoreID or FranchiseID in the TaxID.
If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.

1) You can't use StoreID or FranchiseID in the TaxID, but can you use them to seed a HASHBYTES call which will get a repeating value that is 'obfuscated'?

Off-Topic:

2) Reviewing your website links you are not just a "she" but a very attractive (and interesting!) one. 😉 I loved the Life, Love, and Sex article - very thought provoking!

3) What is the precon you submitted for Orlando? I hope to get picked to present there and get to meet you in October!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service