November 4, 2008 at 3:50 am
Hi,
I wrote the store procedure to fix the one of my EMPLOYEE Table which contained 30,000,000 records. It tooks about 5 hours :w00t: to do a process when I run the store procedure below. Anyway can I make it quicker?
Thanks.
** I got 4000 records when I distinct it, then I search the currentmonth_1 which is about 2,000,00 records to update.
--***********
CREATE PROCEDURE [dbo].[Emp_Fix]
(@CurrentMonth_1char(6),
@SearchMonth_2char(6))
AS
DECLARE @EmpNo nvarchar(50)
DECLARE @WorkSQLString nvarchar(500)
DECLARE @Dept nvarchar(50)
CREATE TABLE #TEMP_EMP_TABLE (RowID int IDENTITY(1, 1),
EmpNonvarchar(50),
Deptnvarchar(50))
DECLARE @tbl_TEMP_EMP_TABLETABLE( RowID int IDENTITY(1, 1),
EmpNonvarchar(50),
Deptnvarchar(50))
DECLARE @ctr_TEMP_EMP_TABLE_CURRENTCOUNTER int , @ctr_TEMP_EMP_TABLE_MAXCOUNTER int
--------------------------------------------------------------------------------------------------------
SELECT @WorkSQLString = 'SELECT DISTINCT '
SELECT @WorkSQLString = @WorkSQLString + 'EmpNo, Dept '
SELECT @WorkSQLString = @WorkSQLString + 'FROM dbo.EMPLOYEE WITH (NOLOCK) '
SELECT @WorkSQLString = @WorkSQLString + 'WHERE (DEPT like ' + '''' + '%BACK%' + ''') AND '
SELECT @WorkSQLString = @WorkSQLString + '(EmpMonth = ' + '''' + @SearchMonth_2 + ''' ' + ') '
SELECT @WorkSQLString = @WorkSQLString + 'ORDER BY EmpNo '
INSERT #TEMP_EMP_TABLE EXECUTE sp_executesql @WorkSQLString
DELETE FROM @tbl_TEMP_EMP_TABLE
INSERT @tbl_TEMP_EMP_TABLE
SELECT#TEMP_EMP_TABLE.EmpNo,
#TEMP_EMP_TABLE.Dept
FROM#TEMP_EMP_TABLE
DROP TABLE #TEMP_EMP_TABLE
SELECT@ctr_TEMP_EMP_TABLE_CURRENTCOUNTER = MIN(RowID),
@ctr_TEMP_EMP_TABLE_MAXCOUNTER = MAX(RowID)
FROM@tbl_TEMP_EMP_TABLE
WHILE (Not @ctr_TEMP_EMP_TABLE_CURRENTCOUNTER Is Null) And (Not @ctr_TEMP_EMP_TABLE_CURRENTCOUNTER = 0) And (@ctr_TEMP_EMP_TABLE_CURRENTCOUNTER <= @ctr_TEMP_EMP_TABLE_MAXCOUNTER)
Begin
SELECT@EmpNo= EmpNo,
@DEPT = Dept
FROM@tbl_TEMP_EMP_TABLE
WHERERowID = @ctr_TEMP_EMP_TABLE_CURRENTCOUNTER
--@@@@@@@@@@@@@@@@
SELECT @EmpNo = LTrim(RTrim(@EmpNo))
SELECT @Dept = LTrim(RTrim(@Dept))
--Update
UPDATEdbo.EMPLOYEE
SETDept = @Dept
WHERE(EmpMonth = @CurrentMonth_1) AND
(EmpNo = @EmpNo) AND
(DEPT like '%BACK%')
--Output
SELECT @WorkSQLString = @Dept + " , " + @EmpNo
PRINT @WorkSQLString
--@@@@@@@@@@@@@@@@
SELECT @ctr_TEMP_EMP_TABLE_CURRENTCOUNTER = @ctr_TEMP_EMP_TABLE_CURRENTCOUNTER + 1
End
GO
--***********
November 4, 2008 at 4:21 am
maybe I'm missing something, but i do not see any reason for the looping construction to be in your procedure(which is the reason it is slow)
it looks like you load a table where a date matches @SearchMonth_2 , but then forces all the dates to be @SearchMonth_1
i think you could do it want in an UPDATE FROM statement and be done with it single operation:
UPDATE dbo.EMPLOYEE
SET EMPLOYEE.Dept = X.Dept
FROM (
SELECT DISTINCT
EmpNo, Dept '
FROM dbo.EMPLOYEE WITH (NOLOCK)
WHERE (DEPT like '%BACK%')
AND (EmpMonth = @SearchMonth_2 )
) X
WHERE (EMPLOYEE.EmpMonth = @CurrentMonth_1)
AND (EMPLOYEE.EmpNo = X.EmpNo) AND
AND (EMPLOYEE.DEPT like '%BACK%')
Lowell
November 4, 2008 at 4:37 am
I'm not surprised it takes so long. You've effectively implemented a cursor that goes through each row one by one. And at first glance you don't need a cursor and you can probably do this with a single statement.
Firstly, you don't need to use dynamic SQL to build your temp table.
INSERT INTO #TEMP_EMP_TABLE (EmpNo, Dept)
SELECT DISTINCT
EmpNo, Dept
FROM dbo.EMPLOYEE WITH (NOLOCK)
WHERE
DEPT like '%BACK%'
AND
EmpMonth = @SearchMonth_2
ORDER BY EmpNo
And then you'd have a single update statement, like this:
UPDATE dbo.EMPLOYEE
SET Dept = LTRIM(RTrim(e.Dept))
FROM Employee e
JOIN #TEMP_EMP_TABLE te on te.EmpNo = e.EmpNo
WHERE (EmpMonth = @CurrentMonth_1)
Now, you can easily ignore the creation of the temp table and do the whole thing in one single update statement by using a derived table but I'll let you figure that out. Actually, it looks like Lowell's being kind enough to do it for you;)
And by the way, don't just take my code as being right as I can't guarantee I haven't made a mistake. It's just to give you an idea of where to get started. Check it out and test it out for yourself to make sure it does what you want it to do.
November 4, 2008 at 5:01 am
Hi Lowell and Karl,
Thank you very much. They reason I am loop through single rows is I want the output for the record which has been updated.
But that is Much Quicker.
November 4, 2008 at 7:30 am
Hi,
Code is fine but it fill up my tempdb.
I am getting 'Could not allocate new page for database TEMPDB' ....message and stop the process.
Any idea?
Thanks.
November 4, 2008 at 7:43 am
Is tempDB restricted in size? Is it on a drive that's full?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 4, 2008 at 7:57 am
Drive Size is full.
anyway to reset the tempdb back to the normal size rather then restart the sql server?
Thanks.
November 4, 2008 at 8:08 am
I'd suggest move it to a larger drive. If it's filling up with a simple update and a temp table, then the drive it's on is probably too small.
How big is TempDB?
You can shrink it, but it will grow again.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 4, 2008 at 8:10 am
It was 104MB now 59GB so far but still going.
Update thing is still running.
November 4, 2008 at 8:16 am
Which one are you running? What indexes are there on the Emp table?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 4, 2008 at 8:20 am
This is Test Table so that no indexes.
I will going to do it on proper Table with indexes on when that Update process is okay.
November 5, 2008 at 8:25 am
Er...
What kind of an organization has an Employee table with 30 million records in the first place ?
As for obtaining a list of which records were updated, you do not need a loop to do this. Just select all the records with a "changed date" of today.
If you do not have such a date field, it might be difficult to add it to such a large table though.
November 5, 2008 at 9:54 am
Leo (11/4/2008)
This is Test Table so that no indexes.I will going to do it on proper Table with indexes on when that Update process is okay.
You should put indexes on your test table so that you you know how long the update will take in prod, and so that it will run in a reasonable amount of time.
Performance testing should not be done on the production system.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 5, 2008 at 2:19 pm
J (11/5/2008)
Er...What kind of an organization has an Employee table with 30 million records in the first place ?
As for obtaining a list of which records were updated, you do not need a loop to do this. Just select all the records with a "changed date" of today.
If you do not have such a date field, it might be difficult to add it to such a large table though.
Yes, Table has a date field. But the person who create a table, hasn't think of future growth of the database. :w00t: Now millions of rows are sitting in a table and taking too long when trying to read the records or update the records.
By the way, this table is containing 2 years of data with Employee Clock In/Out information.
November 5, 2008 at 2:41 pm
OK, makes more sense now.
To me table [font="Courier New"]Employee[/font] meant a list of employees with basic data:
[font="Courier New"]CREATE TABLE Employee
....(
........EmployeeID int NOT NULL PRIMARY KEY IDENTITY,
........EmployeeCode varchar(8),
........Family_Name varchar(30),
........Given_Name varchar(30),
........Other_Name varchar(30),
........Date_Hired datetime,
........Phone_number varchar(15)
....)[/font]
You get the picture. (There are about 30 million people in Canada)
I would have expected to a second table for the employee check-in, check-out times.
[font="Courier New"]CREATE TABLE Employee_Time_check
....(
........pk int NOT NULL PRIMARY KEY IDENTITY,
........fk_EmployeeID int ,
........Work_Day_Date datetime,
........CheckIn bit,
........CheckTime datetime
....)[/font]
This would thus be a Master-Detail configuration.
This not being your case, you have to define the selection criteria when you query your 30 million strong table.
For instance, do you query by
-EmployeeName ?
-workday ?
-workday range ?
You should set up indexes on the fields you query by.
As a stopgap measure, you might just create an archive table and dump in there any record created before a certain date.
(Do you still need to know what was the attendance record of each employee for each day since the last 10 years ?)
Note that this would get you out a fix but that it does not address design shortcomings of your database.
And then delete the records transferred from the original table. You would have to do this ONCE.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply