March 13, 2012 at 10:01 am
Hi all,
Can anyone provide a customizable T-SQL script to create millions of rows in a table?
My understanding is that I can create a C# SP to do that? that would give me flexibility. I'd use an array of 1000 first names, 1000 middle name and 1000 last names, then make sure that the combination of these names are unique then insert a new row into the database.
I want generate that table to practice performance related techniques.
Any ideas?
March 13, 2012 at 10:11 am
a set based operation will be orders of magnitude faster than c#.
you need the three base tables, say FNAME,MNAME and LNAME.
the secret is to cross join the three agasint themselves, and then also cross join them agaisnt sys.columns or some other big table a couple of times to get massive quantities.
then its pretty simple:
INSERT INTO DESTINATION(FirstName,MiddleName,LastName)
SELECT TOP 2000000 --two million?
FNAME.Firstname,
MNAME.MiddleName,
LNAME.LastName
FROM FNAME
CROSS JOIN MNAME
CROSS JOIN LNAME
CROSS JOIN sys.columns C1
CROSS JOIN sys.columns C2
ORDER BY NEWID() --slows things down, but randomized the insert so the first zillion people are not all named "ALex" for example)
Lowell
March 13, 2012 at 10:19 am
Thank you but would that create unique rows? I'm sorry I'm still very new to the whole thing.
March 13, 2012 at 10:28 am
it depedns on your source.
if you only have 3 firstnames,3 middlenames and 3 lastnames that is 3x3x3 = 81 unique combinations.in two million rows, that will repeat a lot.
if this is fake data, does it REALLY need to be unique? you have a constraint that sais no two people can be named John Smith, for example?
If they REALLY need to be unique, you need to wrap it once more with rownumber, and only get the first(unique) sets
INSERT INTO DESTINATION(FirstName,MiddleName,LastName)
SELECT TOP 2000000 --two million?
Firstname,
MiddleName,
LastName
FROM (
SELECT TOP 2000000 --two million?
ROW_NUMBER() OVER(PARTITION BY FNAME.Firstname, MNAME.MiddleName,LNAME.LastName ORDER BY FNAME.Firstname) AS RW,
FNAME.Firstname,
MNAME.MiddleName,
LNAME.LastName
FROM FNAME
CROSS JOIN MNAME
CROSS JOIN LNAME
--no need to cross join sys.columns now, as we don't want duplicates!
)MyAlias
WHERE RW = 1
ORDER BY NEWID()
Lowell
March 13, 2012 at 10:33 am
You can use this:
create table numbers(n int)
insert into numbers(n)
select 10*t2.y + t1.x
from (select 1 as x
union all
select 2
union all
select 3
union all
select 4
union all
select 5
union all
select 6
union all
select 7
union all
select 8
union all
select 9
union all
select 0) t1
cross join
(select 0 as y
union all
select 1
union all
select 2
union all
select 3
union all
select 4
union all
select 5
union all
select 6
union all
select 7
union all
select 8
union all
select 9
) t2
select identity(int,1,1) as id
into #tmp
from numbers u cross join numbers v
cross join numbers w
insert into numbers
select id + 99
from #tmp
drop table #tmp
March 13, 2012 at 10:58 am
Thank you so much
I want to create as much real data as possible, so I guess duplicate names are fine given that I have 1,000 fname, 1,000 mname and another 1,000 lname.
what if I want add an address column and an email? they can be generated out of a concatenation script???
March 13, 2012 at 11:02 am
sure just add something simple like Firstname + '.' + Lastname + '@fake.com' to generate John.Smith@fake.com
Lowell
March 13, 2012 at 11:17 am
Thank you so much 🙂
March 14, 2012 at 2:53 am
I need to create a primary key on that new table, should I add something like an identity(1,1) to the script?
March 14, 2012 at 3:44 am
beginnew2012 (3/14/2012)
I need to create a primary key on that new table, should I add something like an identity(1,1) to the script?
yes, because of possibility of duplicates, you need the surrogate key - Identity or something like (for example, GUID or Sequence).
March 14, 2012 at 4:59 am
This will create 1,000,000 rows of pseudo-random unique firstName, secondName, e-mail address and mail address.
SET NOCOUNT ON;
IF object_id('tempdb..#testNames') IS NOT NULL
BEGIN
DROP TABLE #testNames;
END;
WITH t1(N) AS (SELECT 1 UNION ALL SELECT 1),
t2(N) AS (SELECT 1 FROM t1 x, t1 y),
t3(N) AS (SELECT 1 FROM t2 x, t2 y),
Tally(N) AS (SELECT TOP 98 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),
Tally2(N) AS (SELECT TOP 5 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),
Combinations(N) AS (SELECT DISTINCT LTRIM(RTRIM(RTRIM(SUBSTRING(poss,a.N,2)) + SUBSTRING(vowels,b.N,1)))
FROM Tally a
CROSS JOIN Tally2 b
CROSS APPLY (SELECT 'B C D F G H J K L M N P R S T V W Z SCSKKNSNSPSTBLCLFLGLPLSLBRCRDRFRGRPRTRVRSHSMGHCHPHRHWHBWCWSWTW') d(poss)
CROSS APPLY (SELECT 'AEIOU') e(vowels))
SELECT IDENTITY(INT,1,1) AS ID, a.N + b.N AS N
INTO #testNames
FROM Combinations a
CROSS JOIN Combinations b;
IF object_id('tempdb..#testNames2') IS NOT NULL
BEGIN
DROP TABLE #testNames2;
END;
SELECT IDENTITY(INT,1,1) AS ID, firstName, secondName
INTO #testNames2
FROM (SELECT firstName, secondName
FROM (SELECT TOP 1000 --1000 * 1000 = 1,000,000 rows
N AS firstName
FROM #testNames
ORDER BY NEWID()) a
CROSS JOIN (SELECT TOP 1000 --1000 * 1000 = 1,000,000 rows
N AS secondName
FROM #testNames
ORDER BY NEWID()) b) innerQ;
IF object_id('tempdb..#testNames3') IS NOT NULL
BEGIN
DROP TABLE #testNames3;
END;
SELECT firstName, secondName,
firstName + '.' + secondName + '@fake.com' AS eMail,
CAST((ABS(CHECKSUM(NEWID())) % 250) + 1 AS VARCHAR(3)) + ' ' AS mailAddress,
(ABS(CHECKSUM(NEWID())) % 152100) + 1 AS jID,
IDENTITY(INT,1,1) AS ID
INTO #testNames3
FROM #testNames2
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment;
END;
SELECT IDENTITY(INT,1,1) AS ID, firstName, secondName, eMail,
mailAddress + b.N + b.N AS mailAddress
INTO #testEnvironment
FROM #testNames3 a
INNER JOIN #testNames b ON a.jID = b.ID;
--CLEAN UP USELESS TABLES
DROP TABLE #testNames;
DROP TABLE #testNames2;
DROP TABLE #testNames3;
Here are a few sample rows: -
SELECT TOP 10 * FROM #testEnvironment ORDER BY NEWID();
--ID firstName secondName eMail mailAddress
------------- --------- ---------- ---------------------- ----------------
--993085 KOFRI SLULA KOFRI.SLULA@fake.com 104 GEFRAGEFRA
--557819 CHEWTE NSEBWO CHEWTE.NSEBWO@fake.com 144 HROSLAHROSLA
--854678 HRULBA LPEWCU HRULBA.LPEWCU@fake.com 161 KKIBAKKIBA
--848807 FUSCI LPEPLA FUSCI.LPEPLA@fake.com 84 GROCHOGROCHO
--40825 HWEBLA RCOSKE HWEBLA.RCOSKE@fake.com 197 WSISHOWSISHO
--138785 CWUSPI HETBE CWUSPI.HETBE@fake.com 120 LEPHELEPHE
--592768 CLOSI WIHCE CLOSI.WIHCE@fake.com 36 PROGROPROGRO
--475296 LPIVI HWIHSE LPIVI.HWIHSE@fake.com 1 GLOLBEGLOLBE
--872973 RPOSWE GREDE RPOSWE.GREDE@fake.com 128 SCOHESCOHE
--520195 CLEVA MAPLU CLEVA.MAPLU@fake.com 77 WSAGRIWSAG
March 14, 2012 at 6:23 am
Cadavre (3/14/2012)
This will create 1,000,000 rows of pseudo-random unique firstName, secondName, e-mail address and mail address.
I like your example Cadavre, but there's a possbility of creating names that contain unintenteded bad or offensive words when you use that technique.
That's something to consider if you have a target audience for the data; if it's just for internal use, as a development database, I personally don't have a problem with it.
Lowell
March 14, 2012 at 6:33 am
Lowell (3/14/2012)
Cadavre (3/14/2012)
This will create 1,000,000 rows of pseudo-random unique firstName, secondName, e-mail address and mail address.I like your example Cadavre, but there's a possbility of creating names that contain unintenteded bad or offensive words when you use that technique.
That's something to consider if you have a target audience for the data; if it's just for internal use, as a development database, I personally don't have a problem with it.
Indeed. I actually intended to mention that, as it's a technique based entirely on a Phil Factor article in which he warned of offensive names going out to customers.
March 14, 2012 at 7:42 am
Lowell, the code you gave me didn't work 🙁
it consumed 20 GBs in the temp DB and the destination table was empty and it took like 2 hours to run. I'm not sure what happened. I'm going to try Cadavre code and give it a try 🙂
Thanks alot anyway but Lowell have you run that code yourself? maybe I'm doing something wrong!
March 14, 2012 at 7:46 am
Cadavre, I don't want the table to be a temporary table. I want an actual table, what changes should I apply?
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply