Need a T-SQL Script to create millions of rows

  • Hi all,

    Can anyone provide a customizable T-SQL script to create millions of rows in a table?

    My understanding is that I can create a C# SP to do that? that would give me flexibility. I'd use an array of 1000 first names, 1000 middle name and 1000 last names, then make sure that the combination of these names are unique then insert a new row into the database.

    I want generate that table to practice performance related techniques.

    Any ideas?

  • a set based operation will be orders of magnitude faster than c#.

    you need the three base tables, say FNAME,MNAME and LNAME.

    the secret is to cross join the three agasint themselves, and then also cross join them agaisnt sys.columns or some other big table a couple of times to get massive quantities.

    then its pretty simple:

    INSERT INTO DESTINATION(FirstName,MiddleName,LastName)

    SELECT TOP 2000000 --two million?

    FNAME.Firstname,

    MNAME.MiddleName,

    LNAME.LastName

    FROM FNAME

    CROSS JOIN MNAME

    CROSS JOIN LNAME

    CROSS JOIN sys.columns C1

    CROSS JOIN sys.columns C2

    ORDER BY NEWID() --slows things down, but randomized the insert so the first zillion people are not all named "ALex" for example)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thank you but would that create unique rows? I'm sorry I'm still very new to the whole thing.

  • it depedns on your source.

    if you only have 3 firstnames,3 middlenames and 3 lastnames that is 3x3x3 = 81 unique combinations.in two million rows, that will repeat a lot.

    if this is fake data, does it REALLY need to be unique? you have a constraint that sais no two people can be named John Smith, for example?

    If they REALLY need to be unique, you need to wrap it once more with rownumber, and only get the first(unique) sets

    INSERT INTO DESTINATION(FirstName,MiddleName,LastName)

    SELECT TOP 2000000 --two million?

    Firstname,

    MiddleName,

    LastName

    FROM (

    SELECT TOP 2000000 --two million?

    ROW_NUMBER() OVER(PARTITION BY FNAME.Firstname, MNAME.MiddleName,LNAME.LastName ORDER BY FNAME.Firstname) AS RW,

    FNAME.Firstname,

    MNAME.MiddleName,

    LNAME.LastName

    FROM FNAME

    CROSS JOIN MNAME

    CROSS JOIN LNAME

    --no need to cross join sys.columns now, as we don't want duplicates!

    )MyAlias

    WHERE RW = 1

    ORDER BY NEWID()

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • You can use this:

    create table numbers(n int)

    insert into numbers(n)

    select 10*t2.y + t1.x

    from (select 1 as x

    union all

    select 2

    union all

    select 3

    union all

    select 4

    union all

    select 5

    union all

    select 6

    union all

    select 7

    union all

    select 8

    union all

    select 9

    union all

    select 0) t1

    cross join

    (select 0 as y

    union all

    select 1

    union all

    select 2

    union all

    select 3

    union all

    select 4

    union all

    select 5

    union all

    select 6

    union all

    select 7

    union all

    select 8

    union all

    select 9

    ) t2

    select identity(int,1,1) as id

    into #tmp

    from numbers u cross join numbers v

    cross join numbers w

    insert into numbers

    select id + 99

    from #tmp

    drop table #tmp

  • Thank you so much

    I want to create as much real data as possible, so I guess duplicate names are fine given that I have 1,000 fname, 1,000 mname and another 1,000 lname.

    what if I want add an address column and an email? they can be generated out of a concatenation script???

  • sure just add something simple like Firstname + '.' + Lastname + '@fake.com' to generate John.Smith@fake.com

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thank you so much 🙂

  • I need to create a primary key on that new table, should I add something like an identity(1,1) to the script?

  • beginnew2012 (3/14/2012)


    I need to create a primary key on that new table, should I add something like an identity(1,1) to the script?

    yes, because of possibility of duplicates, you need the surrogate key - Identity or something like (for example, GUID or Sequence).

  • This will create 1,000,000 rows of pseudo-random unique firstName, secondName, e-mail address and mail address.

    SET NOCOUNT ON;

    IF object_id('tempdb..#testNames') IS NOT NULL

    BEGIN

    DROP TABLE #testNames;

    END;

    WITH t1(N) AS (SELECT 1 UNION ALL SELECT 1),

    t2(N) AS (SELECT 1 FROM t1 x, t1 y),

    t3(N) AS (SELECT 1 FROM t2 x, t2 y),

    Tally(N) AS (SELECT TOP 98 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),

    Tally2(N) AS (SELECT TOP 5 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),

    Combinations(N) AS (SELECT DISTINCT LTRIM(RTRIM(RTRIM(SUBSTRING(poss,a.N,2)) + SUBSTRING(vowels,b.N,1)))

    FROM Tally a

    CROSS JOIN Tally2 b

    CROSS APPLY (SELECT 'B C D F G H J K L M N P R S T V W Z SCSKKNSNSPSTBLCLFLGLPLSLBRCRDRFRGRPRTRVRSHSMGHCHPHRHWHBWCWSWTW') d(poss)

    CROSS APPLY (SELECT 'AEIOU') e(vowels))

    SELECT IDENTITY(INT,1,1) AS ID, a.N + b.N AS N

    INTO #testNames

    FROM Combinations a

    CROSS JOIN Combinations b;

    IF object_id('tempdb..#testNames2') IS NOT NULL

    BEGIN

    DROP TABLE #testNames2;

    END;

    SELECT IDENTITY(INT,1,1) AS ID, firstName, secondName

    INTO #testNames2

    FROM (SELECT firstName, secondName

    FROM (SELECT TOP 1000 --1000 * 1000 = 1,000,000 rows

    N AS firstName

    FROM #testNames

    ORDER BY NEWID()) a

    CROSS JOIN (SELECT TOP 1000 --1000 * 1000 = 1,000,000 rows

    N AS secondName

    FROM #testNames

    ORDER BY NEWID()) b) innerQ;

    IF object_id('tempdb..#testNames3') IS NOT NULL

    BEGIN

    DROP TABLE #testNames3;

    END;

    SELECT firstName, secondName,

    firstName + '.' + secondName + '@fake.com' AS eMail,

    CAST((ABS(CHECKSUM(NEWID())) % 250) + 1 AS VARCHAR(3)) + ' ' AS mailAddress,

    (ABS(CHECKSUM(NEWID())) % 152100) + 1 AS jID,

    IDENTITY(INT,1,1) AS ID

    INTO #testNames3

    FROM #testNames2

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    SELECT IDENTITY(INT,1,1) AS ID, firstName, secondName, eMail,

    mailAddress + b.N + b.N AS mailAddress

    INTO #testEnvironment

    FROM #testNames3 a

    INNER JOIN #testNames b ON a.jID = b.ID;

    --CLEAN UP USELESS TABLES

    DROP TABLE #testNames;

    DROP TABLE #testNames2;

    DROP TABLE #testNames3;

    Here are a few sample rows: -

    SELECT TOP 10 * FROM #testEnvironment ORDER BY NEWID();

    --ID firstName secondName eMail mailAddress

    ------------- --------- ---------- ---------------------- ----------------

    --993085 KOFRI SLULA KOFRI.SLULA@fake.com 104 GEFRAGEFRA

    --557819 CHEWTE NSEBWO CHEWTE.NSEBWO@fake.com 144 HROSLAHROSLA

    --854678 HRULBA LPEWCU HRULBA.LPEWCU@fake.com 161 KKIBAKKIBA

    --848807 FUSCI LPEPLA FUSCI.LPEPLA@fake.com 84 GROCHOGROCHO

    --40825 HWEBLA RCOSKE HWEBLA.RCOSKE@fake.com 197 WSISHOWSISHO

    --138785 CWUSPI HETBE CWUSPI.HETBE@fake.com 120 LEPHELEPHE

    --592768 CLOSI WIHCE CLOSI.WIHCE@fake.com 36 PROGROPROGRO

    --475296 LPIVI HWIHSE LPIVI.HWIHSE@fake.com 1 GLOLBEGLOLBE

    --872973 RPOSWE GREDE RPOSWE.GREDE@fake.com 128 SCOHESCOHE

    --520195 CLEVA MAPLU CLEVA.MAPLU@fake.com 77 WSAGRIWSAG


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre (3/14/2012)


    This will create 1,000,000 rows of pseudo-random unique firstName, secondName, e-mail address and mail address.

    I like your example Cadavre, but there's a possbility of creating names that contain unintenteded bad or offensive words when you use that technique.

    That's something to consider if you have a target audience for the data; if it's just for internal use, as a development database, I personally don't have a problem with it.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (3/14/2012)


    Cadavre (3/14/2012)


    This will create 1,000,000 rows of pseudo-random unique firstName, secondName, e-mail address and mail address.

    I like your example Cadavre, but there's a possbility of creating names that contain unintenteded bad or offensive words when you use that technique.

    That's something to consider if you have a target audience for the data; if it's just for internal use, as a development database, I personally don't have a problem with it.

    Indeed. I actually intended to mention that, as it's a technique based entirely on a Phil Factor article in which he warned of offensive names going out to customers.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Lowell, the code you gave me didn't work 🙁

    it consumed 20 GBs in the temp DB and the destination table was empty and it took like 2 hours to run. I'm not sure what happened. I'm going to try Cadavre code and give it a try 🙂

    Thanks alot anyway but Lowell have you run that code yourself? maybe I'm doing something wrong!

  • Cadavre, I don't want the table to be a temporary table. I want an actual table, what changes should I apply?

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply