CTE - Avoiding the cursor

  • I've never used a CTE for an update before. I'm hoping I can use it in this situation.

    We have a database to be restored down to a dev environment and during the restore, we'll need to scrub PII (Personally Identifiable Information). Here's the set up (all data is faked).

    Create Table Store (StoreID int NOT NULL, FedTaxID char(9) NOT NULL);

    GO

    Create Table Franchise (FranchiseID int NOT NULL, StoreID int NOT NULL);

    GO

    Insert into Store (StoreID, FedTaxID)

    (Select 1234, 115588779

    UNION

    Select 2564, 998822546

    UNION

    Select 3258, 851234567

    UNION

    Select 5582, 641122553);

    Insert into Franchise (FranchiseID, StoreID)

    (Select 1, 1234

    UNION

    Select 2, 2564

    UNION

    Select 3, 3258

    UNION

    Select 3, 5582);

    In this case, the goal is to randomize the FedTaxID column. In other cases, I'd be randomizing addresses, etc. But the caveat is that all Stores within the same Franchise MUST have the same randomized FedTaxID. So if Store 3258 gets a randomized number of 123456789, then Store 5582 must have 123456789 as it's FedTaxID, but neither 1234 nor 2564 should have that number.

    Before I found this information out, I was just randomizing every number with the below cursor:

    Declare @EntityID int;

    DECLARE TaxIDWrite CURSOR FOR SELECT StoreID FROM dbo.Store;

    OPEN TaxIDWrite;

    FETCH NEXT FROM TaxIDWrite INTO @EntityID;

    WHILE @@fetch_status = 0

    BEGIN

    UPDATE dbo.Store

    SET TaxID = '11' + convert(varchar(7),(left(convert(bigint,RAND()*100000000000),7)))

    WHERE StoreID = @EntityID;

    FETCH NEXT FROM TaxIDWrite INTO @EntityID;

    END

    CLOSE TaxIDWrite;

    DEALLOCATE TaxIDWrite;

    GO

    Now, I'm trying to figure out if I can use a CTE for matching, but my brain falls apart when writing the CTE.

    WITH TestCode (StoreID, FranchiseID) AS

    (Select s.StoreID, f.FranchiseID

    From dbo.Store s

    join dbo.Franchise f

    on s.StoreID = f.StoreID

    join (Select FranchiseID

    from dbo.Franchise

    Group By FranchiseID

    Having Count(FranchiseID) > 1) f1

    on f.FranchiseID = f1.FranchiseID

    UNION

    Select s.StoreID, f.FranchiseID

    From dbo.Store s

    join dbo.Franchise f

    on s.StoreID = f.StoreID

    join (Select FranchiseID

    from dbo.Franchise

    Group By FranchiseID

    Having Count(FranchiseID) > 1) f1

    on f.FranchiseID = f1.FranchiseID

    join TestCode tc

    on f.FranchiseID = tc.FranchiseID)

    --This is where I'm stuck.

    I know there should be a UNION statement and a recursion back to the FranchiseID, but I'm not sure if I'm doing it correctly. And then I have no idea how to write the UPDATE statement after the CTE. I'm hoping there's a way to do this so I can have SET based code in this update instead of RBAR.

    Am I fighting a losing battle? Should I just stick to the cursor?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Actually, recursive CTE is not far from RBAR (can be even worse in performance)...

    What you need can be achieved by a single update:

    UPDATE Store SET FedTaxID = CAST(RAND(F.FranchiseID) * 1000000000 as int)

    FROM Store S

    JOIN Franchise F ON F.StoreID = S.StoreID

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thanks, but your solution won't work. Even though my supplied data set only has one paired team, my actual database has multiple FranchiseIDs with multiple StoreIDs. Your update will make all of them the same. I need each FranchiseID to have a different TaxID from the others.

    Example:

    FranchiseID StoreID

    1 1234

    2 2564

    3 3258

    3 5582

    4 9985

    4 8672

    4 3421

    5 6666

    5 5555

    6 7777

    Your solution would force all of these to have the same randomized number. 1,2 and 6 need to have different numbers. All the storeIDs under 3 have to be the same, but different from 1,2,4,5 & 6. Etc. [EDIT: I already tried the simple update statement, sans cursor. That's how I know this will happen.]

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Perhaps you could do this by breaking it into steps. First generate the Tax ID's per Franchise and then run an update.

    How about something along the lines of the following pseudo code:

    CREATE TABLE #TIDS(FranchiseID int PRIMARY KEY CLUSTERED,TaxID int)

    INSERT INTO #TIDS

    SELECT DISTINCT FranchiseID,CAST(RAND(FranchiseID) * 1000000000 as int)

    UPDATE Store SET FedTaxID = TaxID

    FROM Store S

    JOIN Franchise F ON F.StoreID = S.StoreID

    JOIN #TIDS t ON f.FranchiseID = t.FranchiseID

    I'm not at my SQL machine so apologies for syntax errors.

  • Brandie Tarvin (6/30/2010)


    Thanks, but your solution won't work. Even though my supplied data set only has one paired team, my actual database has multiple FranchiseIDs with multiple StoreIDs. Your update will make all of them the same. I need each FranchiseID to have a different TaxID from the others.

    Example:

    FranchiseID StoreID

    1 1234

    2 2564

    3 3258

    3 5582

    4 9985

    4 8672

    4 3421

    5 6666

    5 5555

    6 7777

    Your solution would force all of these to have the same randomized number. 1,2 and 6 need to have different numbers. All the storeIDs under 3 have to be the same, but different from 1,2,4,5 & 6. Etc. [EDIT: I already tried the simple update statement, sans cursor. That's how I know this will happen.]

    1. You better try my update before saying it will not work. I have setup additional test data as:

    Insert into Store (StoreID, FedTaxID)

    (Select 1234, 115588779

    UNION

    Select 2564, 998822546

    UNION

    Select 3258, 851234567

    UNION

    Select 5582, 641122553

    UNION

    Select 9985, 641122553

    UNION

    Select 8672, 641122553

    UNION

    Select 3421, 641122553

    UNION

    Select 7777, 641122553

    );

    Insert into Franchise (FranchiseID, StoreID)

    (Select 1, 1234

    UNION

    Select 2, 2564

    UNION

    Select 3, 3258

    UNION

    Select 3, 5582

    UNION

    Select 4, 9985

    UNION

    Select 4, 8672

    UNION

    Select 4, 3421

    UNION

    Select 6, 7777

    );

    Update I gave you does work fine.

    Now, the following update will make the numbers look even more random:

    UPDATE Store SET FedTaxID = FI.NewFTID

    FROM Store S

    JOIN Franchise F

    ON F.StoreID = S.StoreID

    JOIN (SELECT FranchiseID, RIGHT(CAST(CAST(RAND() * FranchiseID AS DECIMAL(28,9)) as VARCHAR(30)),9) AS NewFTID

    FROM Franchise) FI

    ON FI.FranchiseID = F.FranchiseID

    And the last thing.

    In order to guarantee that the generated random numer is unique, you will need to check if this number was not already used. I don't think you can do it in set-based update (with CTE or without)

    What I would suggest in this case, after the first unconditional update, do update in a loop untill dupplicate FedTaxID are ellimnated (if any). Something like:

    WHILE EXISTS(

    SELECT NULL

    FROM Store S

    JOIN Franchise F

    ON F.StoreID = S.StoreID

    GROUP BY S.FedTaxID

    HAVING COUNT(DISTINCT F.FranchiseID) > 1

    )

    BEGIN

    UPDATE Store SET FedTaxID = FI.NewFTID

    FROM Store S

    JOIN Franchise F

    ON F.StoreID = S.StoreID

    JOIN (SELECT FranchiseID, RIGHT(CAST(CAST(RAND() * FranchiseID AS DECIMAL(28,9)) as VARCHAR(30)),9) AS NewFTID

    FROM Franchise) FI

    ON FI.FranchiseID = F.FranchiseID

    WHERE EXISTS (SELECT NULL

    FROM Store S1

    JOIN Franchise F1

    ON F1.StoreID = S1.StoreID

    WHERE S1.FedTaxID = S.FedTaxID

    GROUP BY S1.FedTaxID

    HAVING COUNT(DISTINCT F1.FranchiseID) > 1

    )

    END

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thinking about this a bit more, you do not need a truly random number for your purpose. As this is for test data, could you not just use StoreID * 1000 as your tax reference? It would make checking your data simpler. From an educational point of view it may be a worthy exercise to generate the random numbers in a set based manner, but for your purpose, I would go for the simpler approach. The resulting Update statement would also be simpler.

  • Eugene Elutin (6/30/2010)


    1. You better try my update before saying it will not work.

    You're correct. I didn't catch that you were using FranchiseID in that number. I skimmed over the random calc and assumed it was set up like my own. My apologies.

    I don't think I can use FranchiseID or StoreID in my newly generated TaxID. And since I'll be using this technique for other PII (address information, etc), I'd like a technique that doesn't rely on a specific column in the randomization. For this particular situation, it's something to look into though.

    Still, as an exercise in educating myself on advanced T-SQL coding, I'd like to figure out all the options that are available, so if anyone else has any suggestions, please let me know.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (6/30/2010)


    ...

    I don't think I can use FranchiseID or StoreID in my newly generated TaxID. And since I'll be using this technique for other PII (address information, etc), I'd like a technique that doesn't rely on a specific column in the randomization. For this particular situation, it's something to look into though.

    ...

    You can use NEWID() function:

    -- replace in places where RAND() was used

    ...,

    RIGHT(CAST(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT) AS VARCHAR(30)),9) AS NewFTID

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thanks, everyone. I appreciate your assistance. Looks like I'm stuck with cursors still, but I'm well on my way to a solution.

    Soon as I verify it, I'll write up an article for the site.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (7/1/2010)


    Looks like I'm stuck with cursors still, but I'm well on my way to a solution.

    Why are you still stuck with cursors? What else needs to be done here?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Brandie Tarvin (7/1/2010)


    Thanks, everyone. I appreciate your assistance. Looks like I'm stuck with cursors still, but I'm well on my way to a solution.

    Soon as I verify it, I'll write up an article for the site.

    Actually - why a cursor? Sounds like you need to extract a unique list of FranchiseID's, assign a random FedtaxID to each, then update the original table with the randomized ones.

    select distinct franchiseID, cast(rand(checksum(newID()))*1000000 as varchar(10)) RandFedTaxID

    into scrubtable

    from franchise

    update franchise

    set oldFFedTaxID=RandFedTaxID

    from franchise join scrubtable on franchise.franchiseID=scrubtable .franchiseID

    drop scrub table

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I've been advising the same all the way long...

    But, for some reason, he loves a cursor 😀

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • First of all, "she", not he. @=)

    Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.

    Lastly, I can't use StoreID or FranchiseID in the TaxID.

    If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (7/6/2010)


    First of all, "she", not he. @=)

    Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.

    Lastly, I can't use StoreID or FranchiseID in the TaxID.

    If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.

    Sorry Brandie,

    Could you please check the post #945596 again (one of my prev. replies on this thread), you will find the bit where it describes how to avoid duplicates whithout a cursor...

    Post #945638 explains how you can generate random number using NEWID() function without using FranchiseID column to randomize.

    It doesn't uses FranchiseID in the TaxID also.

    Combine hints form two posts together and you will get a solution wihtout using a cursor. What is wrong with it?

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Brandie Tarvin (7/6/2010)


    First of all, "she", not he. @=)

    Secondly, I need random numbers that do not repeat. I'm not limited to a range, but I need to make sure they only repeat for those Stores with the same FranchiseID because our application will croak (even in Dev/Test) if I do repeat them. (EDIT: Application will also croak if I have stores under the same franchise that DON'T have the same TaxID). It's a business rules validation thing.

    Lastly, I can't use StoreID or FranchiseID in the TaxID.

    If you have a solution that meets these requirements, I would be happy to entertain it. I don't like using cursors when I can avoid it. I just can't seem to avoid it given my restrictions.

    1) You can't use StoreID or FranchiseID in the TaxID, but can you use them to seed a HASHBYTES call which will get a repeating value that is 'obfuscated'?

    Off-Topic:

    2) Reviewing your website links you are not just a "she" but a very attractive (and interesting!) one. 😉 I loved the Life, Love, and Sex article - very thought provoking!

    3) What is the precon you submitted for Orlando? I hope to get picked to present there and get to meet you in October!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply