March 16, 2010 at 9:55 am
Hey guys.
I'm trying to select all rows of data that have the same ID. Basically I have a table where the same unique person could've arrived at the park at different times. I'm looking at eventually displaying a flag to show that someone has re-entered.
For now I'm simply having trouble displaying the records I want. I currently have ~650,000 rows to sort through, so my statement is very inefficient.
Below I'll create some random data similar to what I'm using: -
IF EXISTS (SELECT *
           FROM sys.tables t
           JOIN sys.schemas s
             ON t.schema_id = s.schema_id
           WHERE t.[Name] = 'Test_Data'
             AND s.[Name] = 'dbo')
    DROP TABLE dbo.test_data;

CREATE TABLE dbo.test_data (
    [ID]        VARCHAR(128) NOT NULL,
    [address]   VARCHAR(250) NOT NULL,
    [StartDate] DATETIME     NOT NULL)
ON [PRIMARY];
DECLARE @maxRandomValue TINYINT,
        @minRandomValue TINYINT,
        @cnt INT,
        @string VARCHAR(256),
        @stringlen INT,
        @index INT,
        @fakeLength INT,
        @fakeaddress VARCHAR(8000)

SET @maxRandomValue = 100
SET @minRandomValue = 1

/* CAREFUL! CURRENTLY CREATING 650,000 ROWS OF FAKE DATA */
SET @cnt = 650000

SELECT @string = 'abcdefghijklmnopqrstuvwxyz'
SELECT @stringlen = Len(@string)

WHILE @cnt > 0
BEGIN
    SET @cnt = @cnt - 1

    -- Pick a random length for the fake street name
    SET @fakeLength = Cast(((@maxRandomValue + 20) - (@minRandomValue + 5)) * Rand() + (@minRandomValue + 5) AS TINYINT)

    -- Build a random lowercase string of that length
    SELECT @fakeaddress = ''
    WHILE (@fakeLength > 0)
    BEGIN
        SELECT @index = (Abs(Checksum(Newid())) % @stringlen) + 1
        SELECT @fakeaddress = @fakeaddress + Substring(@string, @index, 1)
        SELECT @fakeLength = @fakeLength - 1
    END

    -- Random numeric ID, a "house number + street" address, and a StartDate
    -- somewhere between roughly 18 and 65 years ago
    INSERT INTO dbo.test_data
        ([ID],
         [address],
         [StartDate])
    SELECT Cast(((@maxRandomValue + 1) - @minRandomValue) * Rand() + @minRandomValue AS TINYINT),
           Cast(Cast(((@maxRandomValue + 91) - @minRandomValue) * Rand() + @minRandomValue AS TINYINT) AS VARCHAR) + ' ' + @fakeaddress,
           Getdate() - ((18 * 365) + Rand() * (47 * 365))
END
GO
/* The above just creates some random data. */
Now, the statement I'm using to select what I need from this data worked fine on small amounts of data, but on a table as large as the one above it takes forever.
SELECT *
FROM [dbo].[test_data] AS a,
[dbo].[test_data] AS b
WHERE a.[ID] = b.[ID]
AND a.[StartDate] <> b.[StartDate];
Would love a pointer or two in the correct direction for this! 🙂
March 16, 2010 at 10:02 am
My apologies, I meant to give you some sample output.
I'd want something like: -
ID | Address     | Start Date              | ID | Address     | Start Date
---+-------------+-------------------------+----+-------------+-------------------------
14 | 15 iojwepoh | 1976-02-26 18:35:20.500 | 14 | 15 iojwepoh | 1977-02-26 18:35:20.500
14 | 15 iojwepoh | 1977-02-26 18:35:20.500 | 14 | 15 iojwepoh | 1979-02-26 18:35:20.500
March 16, 2010 at 10:06 am
I would consider creating an index - something on ID and StartDate, as sketched below.
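Something like this, for example (the index name is just a placeholder, use whatever fits your conventions): -
-- Suggested covering index for the ID/StartDate lookup; name is just an example
CREATE NONCLUSTERED INDEX IX_test_data_ID_StartDate
    ON dbo.test_data ([ID], [StartDate]);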
Also, I've reworked the last SELECT to use an explicit join:
SELECT *
FROM [dbo].[test_data] AS a
INNER JOIN [dbo].[test_data] AS b
ON a.[ID] = b.[ID]
WHERE a.[StartDate] <> b.[StartDate];
-- Cory
March 16, 2010 at 10:20 am
I would add an index as Cory suggests.
WITH DupCheck AS (
    SELECT ID, COUNT(*) AS [COUNT]
    FROM dbo.test_data
    GROUP BY ID
    HAVING COUNT(*) > 1
)
SELECT DupCheck.ID, Address, StartDate
FROM DupCheck
LEFT JOIN dbo.test_data
    ON DupCheck.ID = test_data.ID
ORDER BY DupCheck.ID
March 17, 2010 at 3:27 am
Thanks for that. I've added an ORDER BY StartDate so I can keep things orderly, then added a quick script to test how long the query takes to run. Suffice it to say, it's a massive improvement: a 7-second execution time compared to the previous several minutes.
DECLARE @time DATETIME
SET @time = Getdate();

WITH [DupCheck]
AS (SELECT [ID],
           Count(*) AS [COUNT]
    FROM dbo.test_data
    GROUP BY [ID]
    HAVING Count(*) > 1)
SELECT [DupCheck].[ID],
       [Address],
       [StartDate]
FROM [DupCheck]
LEFT JOIN dbo.test_data
    ON [DupCheck].[ID] = [test_data].[ID]
ORDER BY [DupCheck].[ID], [StartDate];

SELECT Datediff(ms, @time, Getdate()) AS "time taken for grabbing data"
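As for the re-entry flag I mentioned in my first post, I'm thinking of something along these lines (untested sketch - the ROW_NUMBER approach and the ReEntryFlag alias are just my own idea): -
-- Untested sketch: mark every visit after a person's first recorded StartDate as a re-entry
SELECT [ID],
       [address],
       [StartDate],
       CASE
           WHEN Row_number() OVER (PARTITION BY [ID] ORDER BY [StartDate]) > 1 THEN 1
           ELSE 0
       END AS [ReEntryFlag]
FROM dbo.test_data;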