May 13, 2011 at 2:15 am
If I want to populate a table with dummy data to test procs and then add indexes we needed, do I have to add unique values in each row or will it be the same if the there was the same row 1 000 000 times. Basically is 1 000 000 unique rows processed differently from 1 000 000 unique rows.
The Fastest Methods aren't always the Quickest Methods
May 13, 2011 at 3:25 am
you need to try to generate different values for each row - or at least in the columns which are important. At best you need data representative of actual data if you want to test - I've seen many a failure due to not having "typical" data sets.
There's lots of ways to do this but usually a loop does the trick - in fact I'm just doing some testing of that type ( well generating data for testing of a specific kind ) I populate tables with 1 million rows each for this.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
May 13, 2011 at 3:32 am
do you have any known methods to generate a lot of random rows, I could do it through a C# app but is there a good way SQL
The Fastest Methods aren't always the Quickest Methods
May 13, 2011 at 4:03 am
well I wrote up my benchmark tests so you can see how I attempted to generate data. I use this type of approach.
http://www.grumpyolddba.co.uk/infrastructure/sqlbenchmark.htm
I used this script to generate character data ( 17,500 rows unique ) which was used for testing foreign key lookups performance. ( I needed a reasonable amount of unique values for a lookup table ) you can also cast a guid to varchar - from an indexing point of view you just need the minimum uniqueness.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
May 13, 2011 at 4:04 am
sorry forgot to paste!
set nocount on
declare @scount tinyint=65,@scount2 tinyint=70,@scount3 tinyint=68
declare @loop int =1
while @loop<17577
begin
--insert into dbo.LookupCharOne(Squirrel)
select CHAR(@scount)+CHAR(@scount2)+CHAR(@scount3)+'sqlsquirrel'
set @scount+=1;
if @scount>90
BEGIN
set @scount=65
set @scount2+=1
END
if @scount2>90
BEGIN
set @scount2=65
set @scount3+=1
END
if @scount3>90 set @scount3=65
end
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
May 13, 2011 at 4:50 am
thank you, so it doesn't have to be completely different values just that they are unique?
The Fastest Methods aren't always the Quickest Methods
May 13, 2011 at 5:06 am
INSERT INTO TAB1 values('Jahanzaib','Pakistan')
GO 400000
INSERT INTO TAB1 values('Colin Roberts','UK')
GO 400000
INSERT INTO TAB1 values('Binary','South Africa')
GO 100000
INSERT INTO TAB1 values('SuperMan','USA')
GO 100000
that is an example to load the dummy data just execute it and then apply queries to find the indexes requirement.Note It ,I assumed that there is an Identity column ID in this table,you can do same as I define with your table structure as well,create 5 to 10 different records and then iterate it with your requirement with GO operator
Regards,
Syed Jahanzaib Bin Hassan
BSCS | MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA
My Blog
www.aureus-salah.com
May 13, 2011 at 5:31 am
BinaryDigit (5/13/2011)
thank you, so it doesn't have to be completely different values just that they are unique?
One or the other will do just fine 😉
Jeff Moden has a sample data generator which has been used for years on this forum. Here's a very small adjustment to his procedure which generates 100,000 rows of mixed-datatype data:
USE [Testing]
GO
/****** Object: StoredProcedure [dbo].[CreateSampleData] Script Date: 05/13/2011 12:25:45 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:Jeff Moden
-- Create date: <Create Date,,>
-- Description:<Description,,>
-- =============================================
ALTER PROCEDURE [dbo].[CreateSampleData]
AS
--===== Preset the environment for appearance and speed
SET NOCOUNT ON
--===== If the test table already exists, drop it in case we need to rerun.
-- The 3 part naming is overkill, but prevents accidents on real tables.
IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL
DROP TABLE TempDB.dbo.TransactionDetail
--===== Create the test table (TransactionDetail) with a NON clustered PK
CREATE TABLE TempDB.dbo.TransactionDetail (
TransactionDetailID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
[Date] DATETIME,
AccountID INT,
Amount MONEY,
SentenceVARCHAR(200),
AccountRunningTotal MONEY, --Running total across each account
AccountRunningCount INT, --Like "Rank" across each account
NCID INT)
INSERT INTO TempDB.dbo.TransactionDetail (Date, AccountID, Amount, Sentence)
SELECT TOP 100000
--10 years worth of dates with times from 1/1/2000 to 12/31/2009
CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,
--100 different account numbers
ABS(CHECKSUM(NEWID()))%100+1,
--Dollar amounts from -99.99 to + 99.99
CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY),
--Randomised number of words
iTVF.Sentence
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2
CROSS APPLY (
SELECT Word AS 'data()'
FROM (
SELECT TOP ((sc1.colorder*sc2.colorder)%9+1) word
FROM (
SELECT word = CAST('the' AS VARCHAR(200)) UNION ALL
SELECT 'quick' UNION ALL
SELECT 'brown' UNION ALL
SELECT 'fox' UNION ALL
SELECT 'jumped' UNION ALL
SELECT 'over' UNION ALL
SELECT 'the' UNION ALL
SELECT 'lazy' UNION ALL
SELECT 'dog'
) Words ORDER BY NEWID()) u2
FOR XML PATH('')
) iTVF(Sentence)
RETURN 0
SELECT * FROM TempDB.dbo.TransactionDetail
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
May 13, 2011 at 5:36 am
thanks a lot this will come in handy 🙂
The Fastest Methods aren't always the Quickest Methods
May 13, 2011 at 5:53 am
what I did as a random test was the below 🙂
INSERT INTO randomTest
(Name, Surname, Age)
values
('John'+ DATENAME(millisecond,getdate()),'Smith' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate())),
('Paul'+ DATENAME(millisecond,getdate()) ,'Foster' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate())),
('Lee' + DATENAME(millisecond,getdate()),'Carey' + DATENAME(millisecond,getdate()), DATENAME(millisecond,getdate()))
GO 9999999999
add about 10 people and then each row should be unique 😀
The Fastest Methods aren't always the Quickest Methods
May 13, 2011 at 6:00 am
Is 29,999,999,997 rows representative of the data you wish to model? Or, with 10 names to start, 99,999,999,990 rows? Is the data itself representative?
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply