February 19, 2013 at 7:01 am
I'm at a new place, trying to find my way around. There's an import/update process that runs very slowly and I don't see an obvious cause.
The process bring in records of people that are connected, similar to Facebook friends ie: Joe became friends with Mary on Jan 1. And Mary became friends with Joe on Feb 1. So there are 2 records initially, but I want to merge them into 1 record making them friends with each other and putting the Jan 1 date as a created date, and Feb 1 as Modified date.
I have the merged records already populated in table FriendExport_NoDup, so just the update of FriendExport_NoDup from the duplicate data in FriendExport_Temp is the problem.
Here is the code. I think it does the right thing, but takes a VERY long time to process. There is one query that updates the CreateDate, and a second that updates a status code and ModifiedDate. I'm sure it could be rolled into 1, but that;s how it is for now.
I notice the ID numbers are NVARCHAR, would that hurt performance for joining ? I could change them to int.
EDIT: I changed the 2 ID columns in the tables to INT, and it seems to run faster, but still much slower than I would expect.
-- Table1 Original File containing duplicates
CREATE TABLE dbo.FriendExport_Temp(
Mem_ID nvarchar(50) NULL,
ConnectionID nvarchar(50) NULL,
Status nvarchar(50) NULL,
DateUpdated datetime NULL
) ON PRIMARY
CREATE NONCLUSTERED INDEX IX_ConnectionID ON dbo.FriendExport_Temp
(ConnectionID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
CREATE NONCLUSTERED INDEX IX_DateUpdated ON dbo.FriendExport_Temp
(DateUpdated ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
CREATE NONCLUSTERED INDEX IX_MEM_ID ON dbo.FriendExport_Temp
(Mem_ID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
-- Table 2 Duplicates removed
CREATE TABLE dbo.FriendExport_NoDup(
Mem_ID nvarchar(50) NULL,
ConnectionID nvarchar(50) NULL,
Status nvarchar(50) NULL,
DateUpdated datetime NULL,
CreatedDateTime datetime NULL,
ModifiedDateTime datetime NULL ) ON PRIMARY
CREATE NONCLUSTERED INDEX IX_ConnectionID ON dbo.FriendExport_NoDup
(ConnectionID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
CREATE NONCLUSTERED INDEX IX_CreatedDate ON dbo.FriendExport_NoDup
(CreatedDateTime ASC,
Mem_ID ASC,
ConnectionID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
CREATE NONCLUSTERED INDEX IX_MEM_ID ON dbo.FriendExport_NoDup
(Mem_ID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY
truncate table FriendExport_Temp
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100671869,100667739,1,'2011-01-04 09:47:18.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100671909,100667739,3,'2011-01-04 04:41:21.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100672019,100667739,1,'2011-01-04 10:07:57.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100672409,100667739,1,'2011-01-03 15:14:25.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100673659,100667739,1,'2011-01-03 15:14:27.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100675249,100667739,1,'2011-01-03 15:14:29.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 101654749,100667739,1,'2011-01-04 09:52:41.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 101670839,100667739,1,'2011-01-04 10:14:47.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 101678529,100667739,1,'2011-01-04 10:26:17.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 101714309,100667739,1,'2011-01-04 11:15:56.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 101783579,100667739,1,'2011-01-04 12:58:32.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 101813709,100667739,1,'2011-01-04 13:48:52.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 101830369,100667739,1,'2011-01-04 14:01:22.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,100671869,1,'2011-01-04 09:47:18.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,100671909,3,'2011-01-04 04:41:21.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,100672019,1,'2011-01-04 10:07:57.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,100672409,1,'2011-01-03 15:14:25.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,100673659,1,'2011-01-03 15:14:27.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,100675249,1,'2011-01-03 15:14:29.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,101654749,1,'2011-01-04 09:52:41.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,101670839,1,'2011-01-04 10:14:47.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,101678529,1,'2011-01-04 10:26:17.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,101714309,1,'2011-01-04 11:15:56.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,101783579,1,'2011-01-04 12:58:32.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,101813709,1,'2011-01-04 13:48:52.000'
insert into FriendExport_Temp (Mem_ID,ConnectionID,Status,DateUpdated) select 100667739,101830369,1,'2011-01-04 14:01:22.000'
select * from FriendExport_Temp
truncate table FriendExport_NoDup
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 100671869,100667739,1,'2011-01-04 09:47:18.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 100671909,100667739,1, '2011-01-04 04:41:21.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 100672019,100667739,1,'2011-01-04 10:07:57.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 100672409,100667739,1,'2011-01-03 15:14:25.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 100673659,100667739,1,'2011-01-03 15:14:27.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 100675249,100667739,1,'2011-01-03 15:14:29.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 101654749,100667739,1,'2011-01-04 09:52:41.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 101670839,100667739,1,'2011-01-04 10:14:47.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 101678529,100667739,1,'2011-01-04 10:26:17.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 101714309,100667739,1,'2011-01-04 11:15:56.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 101783579,100667739,1,'2011-01-04 12:58:32.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 101813709,100667739,1,'2011-01-04 13:48:52.000',NULL,NULL
insert into FriendExport_NoDup (Mem_ID,ConnectionID,Status,DateUpdated, CreatedDateTime, ModifiedDateTime) select 101830369,100667739,1,'2011-01-04 14:01:22.000',NULL,NULL
select * from FriendExport_NoDup
/* Update 1 record in FriendExport_NoDup with information from 2 records in FriendExport_Temp
(1) update "ModifiedDateTime" with the most recent "DateUpdated"
(2) Update "STATUS" with the Status from the most recent "DateUpdated"
(3) Update "CreatedDateTime" with the earliest "DateUpdated"
Processing batches so I can track how fast it's updating
*/
UpdateMore:
WAITFOR DELAY '00:00:01'
-- update modified data & status
update top (500) FriendExport_NoDup
set ModifiedDateTime = (select top 1 dateUpdated
from FriendExport_Temp FE
where (mem_id = FE.MEM_ID and FND.ConnectionID = FE.ConnectionID)
or (mem_id = FE.ConnectionID and FE.MEM_ID = FND.ConnectionID)
order by dateUpdated desc)
, status = (select top 1 status from FriendExport_Temp FE
where (mem_id = FE.MEM_ID and FND.ConnectionID = FE.ConnectionID)
or (mem_id = FE.ConnectionID and FE.MEM_ID = FND.ConnectionID)
order by dateUpdated desc)
from FriendExport_NoDup FND
join FriendExport_Temp TMP on (TMP.mem_id = FND.MEM_ID and TMP.ConnectionID = FND.ConnectionID)
or (TMP.mem_id = FND.ConnectionID and TMP.ConnectionID = FND.MEM_ID)
where ModifiedDateTime is null
if @@rowcount > 0 goto UpdateMore
------
-- update created date
update top (500) FriendExport_NoDup
set createdDateTime = (select top 1 dateUpdated --min(dateUpdated)
from FriendExport_Temp FE
where (FND.mem_id = FE.MEM_ID and FND.ConnectionID = FE.ConnectionID)
or (FND.mem_id = FE.ConnectionID and FE.MEM_ID = FND.ConnectionID)
order by FE.dateUpdated )
from FriendExport_NoDup FND
join FriendExport_Temp TMP on (TMP.mem_id = FND.MEM_ID and TMP.ConnectionID = FND.ConnectionID)
or (TMP.mem_id = FND.ConnectionID and TMP.ConnectionID = FND.MEM_ID)
where createdDateTime is null
February 19, 2013 at 8:59 am
homebrew i think this is equivilent to your first update query, but should theoretically perform better.
compare the SELECT to teh UPDATE , and tell me if it looks like it woudl update the same number of rows you are expecting:
--visualizing the matching rows
SELECT FND.MEM_ID,FND.ConnectionID,FND.ModifiedDateTime,FND.[status],FE.RW,FE.MEM_ID,FE.ConnectionID,FE.ModifiedDateTime, FE.[status]
FROM FriendExport_NoDup FND
INNER JOIN (SELECT
row_number() over (PARTITION BY mem_id ORDER BY mem_id,dateUpdated DESC) AS RW,
MEM_ID,
ConnectionID,
dateUpdated,
[status]
FROM FriendExport_Temp) FE
on (FE.mem_id = FND.MEM_ID and FE.ConnectionID = FND.ConnectionID)
or (FE.mem_id = FND.ConnectionID and FE.ConnectionID = FND.MEM_ID)
where FE.ModifiedDateTime is null
AND FE.RW = 1
--the update candidate
update FND
set FND.ModifiedDateTime = FE.dateUpdated,
status = FE.status
--SELECT FND.MEM_ID,FND.ConnectionID,FND.ModifiedDateTime,FND.[status],FE.RW,FE.MEM_ID,FE.ConnectionID,FE.ModifiedDateTime, FE.[status]
FROM FriendExport_NoDup FND
INNER JOIN (SELECT
row_number() over (PARTITION BY mem_id ORDER BY mem_id,dateUpdated DESC) AS RW,
MEM_ID,
ConnectionID,
dateUpdated,
[status]
FROM FriendExport_Temp) FE
on (FE.mem_id = FND.MEM_ID and FE.ConnectionID = FND.ConnectionID)
or (FE.mem_id = FND.ConnectionID and FE.ConnectionID = FND.MEM_ID)
where FE.ModifiedDateTime is null
AND FE.RW = 1
Lowell
February 21, 2013 at 10:28 am
Thanks, I will give that a try.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply