Can You Give Me Tips To Creating Test Data

  • SQLServer 2005

    I have tables with live data and would like to create new test tables with dummy data.

    What is the best approach?

    Is there a way to select from the live tables, generate random data and save to a 'dump' that can be used to create new tables with the dummy data? I would like to also have a recordlimit because I wouldn't want too many records.

    Thanks in advance.

  • Here's a way to make lots of constrained test data of many types... so far as not making "too many" rows, I agree, partially... when you are experimenting to make something work, you don't need many because you need to be able to work quickly... however, if you don't test against a million or so rows, you WILL run into scalability problems with your data in the future. It's a real bugger when that happens because it ALWAYS happens when you least expect it and when you least need it to happen.

    DROP TABLE JBMTest

    GO

    --===== Create and populate a 1,000,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeNumber has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Takes about 77 seconds to execute.

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

    SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

    + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),

    SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum)

    Of course, you can change the column names, the table name, the number of rows produced, remove columns you don't need, or create new columns. Remember that unless RAND has a different seed in any given select, RAND will always return the same number for all rows in a given SELECT... hence the need for the NEWID() conversion... NEWID() is the only truly unique number produced by SQL Server... makes a great seed for RAND.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you want an exact replica of your real data, just do this

    SELECT *

    INTO YourTable_Test (Or whatevery you want to call it)

    FROM YourTable

    I am fairly certain the new table will take on all the attributes of the existing table.

    Greg

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • My favorite technique for generating test data is cross joining temp tables. Cross joins return results where every member of one table is combined with every member of another table.

    By cross joining a few temporary tables of names, streets, cities, etc. you can quickly generate hundreds (if not thousands of rows of test data). Just be aware that combinations multiply rapidly so use TOP 1000 (or whatever) as needed. Or, cross join first/last names in one update query and streets/cities in another.

    The example below generates random first, middle and last names where first and middle names are the guaranteed to be the same gender. To mix things up, it returns them in random order.

    [font="Courier New"]/* generate random, gender-specific first, middle and last names */

    CREATE TABLE #fnames

    (

    name VARCHAR(32),

    gender CHAR(1)

    )

    CREATE TABLE #lnames

    (

    name VARCHAR(32),

    )

    INSERT #fnames ( name, gender ) VALUES ( 'Joe', 'M' )

    INSERT #fnames ( name, gender ) VALUES ( 'Frank', 'M' )

    INSERT #fnames ( name, gender ) VALUES ( 'Bob', 'M' )

    INSERT #fnames ( name, gender ) VALUES ( 'Mark', 'M' )

    INSERT #fnames ( name, gender ) VALUES ( 'Ted', 'M' )

    INSERT #fnames ( name, gender ) VALUES ( 'Larry', 'M' )

    INSERT #fnames ( name, gender ) VALUES ( 'James', 'M' )

    INSERT #fnames ( name, gender ) VALUES ( 'Ann', 'F' )

    INSERT #fnames ( name, gender ) VALUES ( 'Mary', 'F' )

    INSERT #fnames ( name, gender ) VALUES ( 'Carol', 'F' )

    INSERT #fnames ( name, gender ) VALUES ( 'Cindy', 'F' )

    INSERT #fnames ( name, gender ) VALUES ( 'Alice', 'F' )

    INSERT #fnames ( name, gender ) VALUES ( 'Joan', 'F' )

    INSERT #fnames ( name, gender ) VALUES ( 'Laura', 'F' )

    INSERT #fnames ( name, gender ) VALUES ( 'Lisa', 'F' )

    INSERT #fnames ( name, gender ) VALUES ( 'Kim', 'F' )

    INSERT #lnames ( name ) VALUES ( 'Brown' )

    INSERT #lnames ( name ) VALUES ( 'Smith' )

    INSERT #lnames ( name ) VALUES ( 'Edwards' )

    INSERT #lnames ( name ) VALUES ( 'Jackson' )

    INSERT #lnames ( name ) VALUES ( 'Jones' )

    INSERT #lnames ( name ) VALUES ( 'Smith' )

    INSERT #lnames ( name ) VALUES ( 'Allen' )

    INSERT #lnames ( name ) VALUES ( 'Hobbs' )

    INSERT #lnames ( name ) VALUES ( 'Townsend' )

    -- get every combo of first, middle, last name where

    -- first and middle are same gender but aren't the same

    SELECT ABS(CHECKSUM(NEWID())) % 1000, f.name + ' ' + m.name + ' ' + l.name As [Full Name]

    FROM #fnames f

    CROSS JOIN #fnames m

    CROSS JOIN #lnames l

    -- make sure first and middle are not the same and their gender's match

    WHERE f.name <> m.name AND f.gender = m.gender

    ORDER BY 1 -- random order

    DROP TABLE #fnames

    DROP TABLE #lnames[/font]

  • Thank you all for your responses. They are very helpful.

  • Be careful about using live data to generate test data without some obfuscation. If you have personal data (names, phones, SSNs, etc.) you should change these around somehow. Make some random substituions in your joins so that you aren't unnecessarily exposing user's data.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply