October 8, 2008 at 5:32 pm
Howdy
I need to shuffle data within our production database so we can provide it as a test database to a customer purchasing our software.
I was hoping a statement such as below would help, but I just end up with the every row containing the exact same data.
update customers
set
firstname = (select top 1 firstname from customers order by newid()),
middlename = (select top 1 middlename from customers order by newid()),
lastname = (select top 1 lastname from customers order by newid()),
birthdate = (select top 1 birthdate from customers order by newid())
Any help would be greatly appreciated. I can solve it using cursors but we have far too much data for this to be a feasible solution.
Cheers,
Craig
October 8, 2008 at 7:34 pm
What is the primary key of the table?
Also, since this is your first post, you might want to take a look at the link in my signature below... folks will be more likey to help if you help them a bit.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 8, 2008 at 9:54 pm
Ok... I borrowed some data and assume that you don't actually have a primary key... this should give you and idea...
--drop table #shuffle
DECLARE @NameCountPlus1 INT
SELECT @NameCountPlus1 = COUNT(*)+1 FROM NorthWind.dbo.Employees
SELECT IDENTITY(INT,1,1) AS OriginalOrder,
LastName, FirstName, BirthDate,
LastName AS NewLastName,
FirstName AS NewFirstName,
BirthDate AS NewBirthDate,
ABS(CHECKSUM(NEWID()))%@NameCountPlus1 AS Order1,
ABS(CHECKSUM(NEWID()))%@NameCountPlus1 AS Order2,
ABS(CHECKSUM(NEWID()))%@NameCountPlus1 AS Order3
INTO #Shuffle
FROM NorthWind.dbo.Employees
UPDATE s2
SET NewLastName = s1.LastName
FROM #Shuffle s1, #Shuffle s2
WHERE s1.OriginalOrder = s2.Order1
UPDATE s2
SET NewFirstName = s1.FirstName
FROM #Shuffle s1, #Shuffle s2
WHERE s1.OriginalOrder = s2.Order2
UPDATE s2
SET NewBirthDate = s1.BirthDate
FROM #Shuffle s1, #Shuffle s2
WHERE s1.OriginalOrder = s2.Order3
select * from #shuffle
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2008 at 5:29 pm
Thanks Jeff
I've created a test script below.
IF OBJECT_ID('TempDB..#customers', 'U') IS NOT NULL
DROP TABLE #customers
CREATE TABLE #customers
(
CustomerID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Title VARCHAR(10) NOT NULL,
FirstName VARCHAR(30) NOT NULL,
MiddleName VARCHAR(30) NULL,
LastName VARCHAR(30) NOT NULL,
BirthDate SMALLDATETIME NULL
)
INSERT INTO #customers VALUES ('MR.', 'CHRISTOPHER', NULL, 'PETERSON', '1967-01-25')
INSERT INTO #customers VALUES ('MS.', 'RENAE', NULL, 'DAMIANI', '1984-09-21')
INSERT INTO #customers VALUES ('MR.', 'GARY', NULL, 'MAY', '1974-05-12')
INSERT INTO #customers VALUES ('MISS', 'ELIZABETH', 'JANE', 'FLAHERTY', '1956-12-02')
INSERT INTO #customers VALUES ('MR.', 'SAMUEL', 'NATHAN', 'HILL', '1968-07-14')
INSERT INTO #customers VALUES ('MR.', 'MATTHEW', 'J', 'HORNE', '1971-03-30')
INSERT INTO #customers VALUES ('MR.', 'ADAM', 'MAX', 'REID', '1979-09-16')
INSERT INTO #customers VALUES ('MR.', 'JOSHUA', 'DAVID', 'ROUTLEY', '1982-02-02')
INSERT INTO #customers VALUES ('MR.', 'TUTONE', NULL, 'KALONIHEA', '1958-11-15')
INSERT INTO #customers VALUES ('MRS.', 'MUNA', NULL, 'NAJJAR', '1963-08-18')
October 9, 2008 at 5:54 pm
Heh... I was hoping you had an IDENTITY PK... I won't have to make many changes to the script I just wrote above...
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2008 at 7:36 pm
Thanks a million Jeff
Using your script as a basis, you turned a 26+ hour process into 32 second process.
This is the code I'm now using
-- Add Column for Sequential Order
ALTER TABLE Customers
ADD tmpOrder1INT
GO
-- Add Sequential Order
UPDATE c1
SET tmpOrder1 = RowNum
FROM Customers c1
INNER JOIN
(
SELECT CustomerID, ROW_NUMBER() OVER (ORDER BY CustomerID) 'RowNum' FROM Customers
) c2 ON c2.CustomerID = c1.CustomerID
GO
-- Add Column for Random Order
ALTER TABLE Customers
ADD tmpOrder2INT
GO
-- Add Random Order
UPDATE Customers
SET tmpOrder2 = ABS(CHECKSUM(NEWID())) % (SELECT COUNT(*) FROM Customers)
GO
-- Update FirstName
UPDATE c1
SET FirstName = c2.FirstName
FROM Customers c1
INNER JOIN Customers c2 ON c2.tmpOrder1 = c1.tmpOrder2
GO
-- Re-randomise
UPDATE Customers
SET tmpOrder2 = ABS(CHECKSUM(NEWID()))%(SELECT COUNT(*) FROM Customers)
GO
-- Update MiddleName
UPDATE c1
SET MiddleName = c2.MiddleName
FROM Customers c1
INNER JOIN Customers c2 ON c2.tmpOrder1 = c1.tmpOrder2
GO
-- Re-randomise
UPDATE Customers
SET tmpOrder2 = ABS(CHECKSUM(NEWID()))%(SELECT COUNT(*) FROM Customers)
GO
-- Update LastName
UPDATE c1
SET LastName = c2.LastName
FROM Customers c1
INNER JOIN Customers c2 ON c2.tmpOrder1 = c1.tmpOrder2
GO
-- Re-randomise
UPDATE Customers
SET tmpOrder2 = ABS(CHECKSUM(NEWID()))%(SELECT COUNT(*) FROM Customers)
GO
-- Update BirthDate
UPDATE c1
SET BirthDate = c2.BirthDate
FROM Customers c1
INNER JOIN Customers c2 ON c2.tmpOrder1 = c1.tmpOrder2
GO
ALTER TABLE Customers
DROP COLUMN tmpOrder1
GO
ALTER TABLE Customers
DROP COLUMN tmpOrder2
GO
October 9, 2008 at 9:56 pm
craig (10/9/2008)
Thanks a million JeffUsing your script as a basis, you turned a 26+ hour process into 32 second process.
This is the code I'm now using
-- Add Column for Sequential Order
ALTER TABLE Customers
ADD tmpOrder1INT
GO
-- Add Sequential Order
UPDATE c1
SET tmpOrder1 = RowNum
FROM Customers c1
INNER JOIN
(
SELECT CustomerID, ROW_NUMBER() OVER (ORDER BY CustomerID) 'RowNum' FROM Customers
) c2 ON c2.CustomerID = c1.CustomerID
GO
-- Add Column for Random Order
ALTER TABLE Customers
ADD tmpOrder2INT
GO
-- Add Random Order
UPDATE Customers
SET tmpOrder2 = ABS(CHECKSUM(NEWID())) % (SELECT COUNT(*) FROM Customers)
GO
-- Update FirstName
UPDATE c1
SET FirstName = c2.FirstName
FROM Customers c1
INNER JOIN Customers c2 ON c2.tmpOrder1 = c1.tmpOrder2
GO
-- Re-randomise
UPDATE Customers
SET tmpOrder2 = ABS(CHECKSUM(NEWID()))%(SELECT COUNT(*) FROM Customers)
GO
-- Update MiddleName
UPDATE c1
SET MiddleName = c2.MiddleName
FROM Customers c1
INNER JOIN Customers c2 ON c2.tmpOrder1 = c1.tmpOrder2
GO
-- Re-randomise
UPDATE Customers
SET tmpOrder2 = ABS(CHECKSUM(NEWID()))%(SELECT COUNT(*) FROM Customers)
GO
-- Update LastName
UPDATE c1
SET LastName = c2.LastName
FROM Customers c1
INNER JOIN Customers c2 ON c2.tmpOrder1 = c1.tmpOrder2
GO
-- Re-randomise
UPDATE Customers
SET tmpOrder2 = ABS(CHECKSUM(NEWID()))%(SELECT COUNT(*) FROM Customers)
GO
-- Update BirthDate
UPDATE c1
SET BirthDate = c2.BirthDate
FROM Customers c1
INNER JOIN Customers c2 ON c2.tmpOrder1 = c1.tmpOrder2
GO
ALTER TABLE Customers
DROP COLUMN tmpOrder1
GO
ALTER TABLE Customers
DROP COLUMN tmpOrder2
GO
Very cool... thanks for posting your solution. I was just getting to the rework on this and you saved me a bit of time. 🙂 And thanks for the feedback on the performance!
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2008 at 9:56 pm
Craig, BTW... how many rows was this for?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2008 at 10:23 pm
Oh... one more thing... since you added and then deleted columns from the "real" table instead of using a temp table as a driver, you should probably run a DBCC DBReindex to repack some of the page splits you may have caused on the Clustered Index.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2008 at 10:32 pm
There's only 154,000 rows. That's how bad my row by row cursor solution was. :crazy:
This is it below (with minimal tidy up)
SET NOCOUNT ON
DECLARE @customerID INT
DECLARE @gender CHAR(1)
DECLARE @dob DATETIME
DECLARE @firstName VARCHAR(50)
DECLARE @middleName VARCHAR(50)
DECLARE @lastName VARCHAR(50)
DECLARE @birthDate DATETIME
DECLARE myCursor CURSOR FOR
SELECT CustomerID, Gender, BirthDate
FROM Customers
WHERE CustomerID > -1
ORDER BY CustomerID
OPEN myCursor
FETCH NEXT FROM myCursor INTO @customerID, @gender, @dob
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT
@firstName = (SELECT TOP 1 firstname FROM Customers WHERE gender = @gender ORDER BY NEWID()),
@middleName = (SELECT TOP 1 middlename FROM Customers WHERE gender = @gender ORDER BY NEWID()),
@lastName = (SELECT TOP 1 lastname FROM Customers WHERE gender = @gender ORDER BY NEWID()),
@birthDate = (SELECT TOP 1 birthdate FROM Customers WHERE (BirthDate IS NULL AND @dob IS NULL) OR
(BirthDate IS NOT NULL AND @dob IS NOT NULL) ORDER BY NEWID())
UPDATE Customers
SET FirstName = @firstName,
MiddleName = @middleName,
LastName = @lastName,
BirthDate = @birthDate
WHERE CustomerID = @customerID
FETCH NEXT FROM myCursor INTO @customerID, @gender, @dob
END
CLOSE myCursor
DEALLOCATE myCursor
October 9, 2008 at 11:18 pm
Holy moly! Thanks for posting that. It's always good to see the before'n'after.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply