July 7, 2010 at 8:41 am
I just realized I'm going about this all wrong. I don't need *random* numbers specifically. I just need numbers that are NOT the TaxID. Which means I can use a Tally table. And if I can use a Tally table, I don't need to re-create this silly table every single time I restore the database. I can just store it (or my random numbers table) in another DB after the initial generation. DOH!
BTW: For any who are interested, this is my code for generating non-repetitive random numbers. I noticed that there weren't a whole lot of solutions for this problem when I googled it.
DECLARE @StoreCnt int, @CntUp int, @NewTaxID varchar(10);
SELECT @StoreCnt = Count(StoreID) FROM dbo.tblStore;
SET @CntUp = 0;
CREATE TABLE dbo.#RandomNumbers (TaxID varchar(10));
WHILE @CntUp < (@StoreCnt + 1000)
BEGIN
SET @NewTaxID = '11-' + convert(varchar(7),(left(convert(bigint,RAND()*100000000000),7)));
IF (SELECT TaxID FROM dbo.#RandomNumbers WHERE TaxID = @NewTaxID) IS NULL
BEGIN
INSERT INTO dbo.#RandomNumbers (TaxID)
SELECT @NewTaxID;
END;
SET @CntUp = @CntUp + 1;
END;
Yeah, I'm making this harder than it needs to be, but I want to explore all options which is why I'm doing this to myself.
As far as the precon class I submitted for Orlando, it's "T-SQL 101" (for raw beginners who know nothing about SQL coding). I'm really good at teaching the intro stuff even if sometimes the advanced stuff slips by me.
And now that my focus has changed, I'm going back to look once again at everyone's suggestions to see how I can fit that into what I'm doing.
July 7, 2010 at 8:44 am
Eugene Elutin (7/6/2010)
Post #945638 explains how you can generate random number using NEWID() function without using FranchiseID column to randomize.It doesn't uses FranchiseID in the TaxID also.
Combine hints form two posts together and you will get a solution wihtout using a cursor. What is wrong with it?
My issue with NEWID() is the resulting datatype (uniqueidentifier) and the fact that it includes letters in it. It doesn't fit with what should be an int result.
And yes, I know TaxID as stated is a varchar(), but it's only that way for the hyphens that are usually included. There shouldn't be any letters.
July 12, 2010 at 5:31 am
Hi, I was on holiday, so couldn't reply earlier.
Brandie Tarvin (7/7/2010)
Eugene Elutin (7/6/2010)
Post #945638 explains how you can generate random number using NEWID() function without using FranchiseID column to randomize. ...My issue with NEWID() is the resulting datatype (uniqueidentifier) and the fact that it includes letters in it. It doesn't fit with what should be an int result...
Sorry, but your real issue is not reading answers carefully and not trying to run given samples of code.
NEWID() function indeed returns uniqueidentifier and it does include letters, however post #945638 explains how it can be used to generate quite unique value which only contain digits. I will show it again here:
select RIGHT(CAST(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT) AS VARCHAR(30)),9)
Now you can see how NEWID() function can be used to generate random numbers. Unlike RAND(), it will generate unique random numbers for each raw affected by select query.
July 16, 2010 at 5:38 am
Eugene Elutin (7/12/2010)
Sorry, but your real issue is not reading answers carefully and not trying to run given samples of code.
And perhaps you should test your own code too?
Yes, I admit I missed a few things in this thread. I'm a speed reader, which tends to bite me in the posterior when reading technical stuff. I didn't see the BigInt conversion because I was busy looking at the NewID() function.
But I have tested your solution both using a normal update and using a Tally Table. In both methods, it still repeats numbers. So, I'm back where I was with needing unique, non-repeating numbers.
I appear to have a solution to the problem of repetition, however, that I'm working on right now and it does appear avoid the issue of using the Loop, Cursor, and CTE.
July 16, 2010 at 6:19 am
Brandie Tarvin (7/16/2010)
...But I have tested your solution both using a normal update and using a Tally Table. In both methods, it still repeats numbers. So, I'm back where I was with needing unique, non-repeating numbers.
...
Dear Brandie,
Solution provided in Post #945594 does work fine it does generate unique random numbers, then looks for duplicates and regenerate FedTaxID for them. However it did used RAND() function in expression toghether with FranchiseID. You said you don't want to use FranchiseID when generating number, which is a strange requirement, as it used as kind of "seed" for RAND() function and it doesn't mean that result random number will look like it. Ok, I have advised to use NEWID(), here was my mistake, as use of RAND() together with its "seed" could not be just replaced with NEWID(). If done so, generated numbers will be unique for each record in Franchise table, but you need them to be the same for a FranchiseID. Therefore using NEWID() function will require some additonal grouping...
here is a complete query (please notice how NEWID() gets converted to digits...):
UPDATE Store SET FedTaxID = FI.NewFTID
FROM Store S
JOIN Franchise F
ON F.StoreID = S.StoreID
JOIN (SELECT FranchiseID, MAX(LEFT(ABS(CHECKSUM(NEWiD())), 9)) AS NewFTID
FROM Franchise
GROUP BY FranchiseID) FI
ON FI.FranchiseID = F.FranchiseID
-- CHECK IF DUPLICATE FedTaxID EXISTS FOR DIFFERENT FranchiseID
WHILE EXISTS(
SELECT S.FedTaxID
FROM Store S
JOIN Franchise F
ON F.StoreID = S.StoreID
GROUP BY S.FedTaxID
HAVING COUNT(DISTINCT F.FranchiseID) > 1
)
BEGIN
-- REGENERATE FedTaxID WHERE DUPLICATES FOUND
UPDATE Store SET FedTaxID = FI.NewFTID
FROM Store S
JOIN Franchise F
ON F.StoreID = S.StoreID
JOIN (SELECT FranchiseID, MAX(LEFT(ABS(CHECKSUM(NEWID())), 9)) AS NewFTID
FROM Franchise
GROUP BY FranchiseID) FI
ON FI.FranchiseID = F.FranchiseID
WHERE EXISTS (SELECT NULL
FROM Store S1
JOIN Franchise F1
ON F1.StoreID = S1.StoreID
WHERE S1.FedTaxID = S.FedTaxID
GROUP BY S1.FedTaxID
HAVING COUNT(DISTINCT F1.FranchiseID) > 1
)
END
July 16, 2010 at 6:54 am
Just an FYI:
Earlier, I was seeing 3 duplicate numbers when running your original NewID() solution. Repeated runnings seems to generate between 0 - 5 duplicate numbers with no discernible pattern to the repeats or non-repeats.
I haven't run the code more than 20 times, but I have to wonder what the max number of duplicates will end up being after more repeated runnings. During investigating this whole issue, I've found a number of interesting issues that have absolutely no bearing on my current situation but gives me fodder for future research. @=)
July 16, 2010 at 7:22 am
Eugene Elutin (6/30/2010)
Actually, recursive CTE is not far from RBAR (can be even worse in performance)...
Not strictly true. A good example of RBAR would be performing a running totals update using a cursor. What makes the scalable and much faster "Quirky Update" method "set-based" other than the simple fact it can be written (and processed) as a single UPDATE statement? How fast does a single-statement running totals rCTE have to be before it slips out of the RBAR camp? Run the following code and prepare to be surprised.
DROP TABLE #Numbers
SELECT TOP 100000
n = ROW_NUMBER() OVER (ORDER BY a.name),
CalcValue = CAST(NULL AS BIGINT)
INTO #Numbers
FROM master.dbo.syscolumns a, master.dbo.syscolumns b
CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- 'Quirky' update
DECLARE @Lastval INT = 0, @CalcValue BIGINT
UPDATE #Numbers SET
@CalcValue = CalcValue = (@Lastval + n),
@Lastval = n
-- (100000 row(s) affected) / CPU time = 454 ms, elapsed time = 526 ms.
-- Table #Numbers... Scan count 1, logical reads 314, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Recursive CTE
;WITH Calculator (n, CalcValue) AS (
SELECT n.n,
CalcValue = CAST(n.n AS BIGINT)
FROM #Numbers n
WHERE n.n = 1
UNION ALL
SELECT n.n,
CalcValue = n.n + c.n
FROM #Numbers n
INNER JOIN Calculator c ON c.n + 1 = n.n -- nice
)
SELECT n, CalcValue
FROM Calculator
OPTION (MAXRECURSION 0)
-- (100000 row(s) affected) / CPU time = 3203 ms, elapsed time = 3483 ms.
-- Table 'Worktable'. Scan count 2, logical reads 600001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SET STATISTICS IO Off
SET STATISTICS TIME Off
@Brandie:-
Why do the new values have to be randomised?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 16, 2010 at 8:06 am
Chris Morris-439714 (7/16/2010)
Eugene Elutin (6/30/2010)
Actually, recursive CTE is not far from RBAR (can be even worse in performance)...Not strictly true. A good example of RBAR would be performing a running totals update using a cursor. What makes the scalable and much faster "Quirky Update" method "set-based" other than the simple fact it can be written (and processed) as a single UPDATE statement?
That's what I thought. And that's what I was originally going for. Thank you for verifying that fact for me, Chris.
Chris Morris-439714 (7/16/2010)
@Brandie:-Why do the new values have to be randomised?
1) There is a school of thought that thinks random numbers are less likely to be converted to "real" TaxIDs. Yeah, there's a debate behind this that we could get into, but allow me to now refer you to reason # 2.
2) The boss said so.
Personally, if I'm going to argue a point with someone about the workability of solutions, the random / not random argument is at the bottom of my list. I've got bigger fish to fry, if you know what I mean. Boss wants random? Boss gets random.
The upside of this whole thread is I got a brand new article for site publication out of it. @=) Eugene gets mentioned for providing me with the seed of my final solution. And so does Jeff Moden because, well, it's Jeff. @=)
Viewing 8 posts - 16 through 22 (of 22 total)
You must be logged in to reply to this topic. Login to reply