October 20, 2007 at 7:01 am
SQLServer 2005
I have tables with live data and would like to create new test tables with dummy data.
What is the best approach?
Is there a way to select from the live tables, generate random data and save to a 'dump' that can be used to create new tables with the dummy data? I would like to also have a recordlimit because I wouldn't want too many records.
Thanks in advance.
October 20, 2007 at 9:16 am
Here's a way to make lots of constrained test data of many types... so far as not making "too many" rows, I agree, partially... when you are experimenting to make something work, you don't need many because you need to be able to work quickly... however, if you don't test against a million or so rows, you WILL run into scalability problems with your data in the future. It's a real bugger when that happens because it ALWAYS happens when you least expect it and when you least need it to happen.
DROP TABLE JBMTest
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeNumber has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Takes about 77 seconds to execute.
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
Of course, you can change the column names, the table name, the number of rows produced, remove columns you don't need, or create new columns. Remember that unless RAND has a different seed in any given select, RAND will always return the same number for all rows in a given SELECT... hence the need for the NEWID() conversion... NEWID() is the only truly unique number produced by SQL Server... makes a great seed for RAND.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 22, 2007 at 1:46 pm
If you want an exact replica of your real data, just do this
SELECT *
INTO YourTable_Test (Or whatevery you want to call it)
FROM YourTable
I am fairly certain the new table will take on all the attributes of the existing table.
Greg
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
October 23, 2007 at 8:41 am
My favorite technique for generating test data is cross joining temp tables. Cross joins return results where every member of one table is combined with every member of another table.
By cross joining a few temporary tables of names, streets, cities, etc. you can quickly generate hundreds (if not thousands of rows of test data). Just be aware that combinations multiply rapidly so use TOP 1000 (or whatever) as needed. Or, cross join first/last names in one update query and streets/cities in another.
The example below generates random first, middle and last names where first and middle names are the guaranteed to be the same gender. To mix things up, it returns them in random order.
[font="Courier New"]/* generate random, gender-specific first, middle and last names */
CREATE TABLE #fnames
(
name VARCHAR(32),
gender CHAR(1)
)
CREATE TABLE #lnames
(
name VARCHAR(32),
)
INSERT #fnames ( name, gender ) VALUES ( 'Joe', 'M' )
INSERT #fnames ( name, gender ) VALUES ( 'Frank', 'M' )
INSERT #fnames ( name, gender ) VALUES ( 'Bob', 'M' )
INSERT #fnames ( name, gender ) VALUES ( 'Mark', 'M' )
INSERT #fnames ( name, gender ) VALUES ( 'Ted', 'M' )
INSERT #fnames ( name, gender ) VALUES ( 'Larry', 'M' )
INSERT #fnames ( name, gender ) VALUES ( 'James', 'M' )
INSERT #fnames ( name, gender ) VALUES ( 'Ann', 'F' )
INSERT #fnames ( name, gender ) VALUES ( 'Mary', 'F' )
INSERT #fnames ( name, gender ) VALUES ( 'Carol', 'F' )
INSERT #fnames ( name, gender ) VALUES ( 'Cindy', 'F' )
INSERT #fnames ( name, gender ) VALUES ( 'Alice', 'F' )
INSERT #fnames ( name, gender ) VALUES ( 'Joan', 'F' )
INSERT #fnames ( name, gender ) VALUES ( 'Laura', 'F' )
INSERT #fnames ( name, gender ) VALUES ( 'Lisa', 'F' )
INSERT #fnames ( name, gender ) VALUES ( 'Kim', 'F' )
INSERT #lnames ( name ) VALUES ( 'Brown' )
INSERT #lnames ( name ) VALUES ( 'Smith' )
INSERT #lnames ( name ) VALUES ( 'Edwards' )
INSERT #lnames ( name ) VALUES ( 'Jackson' )
INSERT #lnames ( name ) VALUES ( 'Jones' )
INSERT #lnames ( name ) VALUES ( 'Smith' )
INSERT #lnames ( name ) VALUES ( 'Allen' )
INSERT #lnames ( name ) VALUES ( 'Hobbs' )
INSERT #lnames ( name ) VALUES ( 'Townsend' )
-- get every combo of first, middle, last name where
-- first and middle are same gender but aren't the same
SELECT ABS(CHECKSUM(NEWID())) % 1000, f.name + ' ' + m.name + ' ' + l.name As [Full Name]
FROM #fnames f
CROSS JOIN #fnames m
CROSS JOIN #lnames l
-- make sure first and middle are not the same and their gender's match
WHERE f.name <> m.name AND f.gender = m.gender
ORDER BY 1 -- random order
DROP TABLE #fnames
DROP TABLE #lnames[/font]
November 3, 2007 at 12:27 pm
Thank you all for your responses. They are very helpful.
November 4, 2007 at 8:15 pm
Be careful about using live data to generate test data without some obfuscation. If you have personal data (names, phones, SSNs, etc.) you should change these around somehow. Make some random substituions in your joins so that you aren't unnecessarily exposing user's data.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply