April 6, 2005 at 6:15 pm
Hello, wondering if any have encountered this problem:
I am looping through about 50,000 records. importing them from one table to another. For each record, selecting the data i want into local variables, then passing those variables to a stored procedure that does 2 things. 1.) checks to see if data already exists 2.) inserts the data if not.
The script runs for several minutes until it has imported about 32,700 records then stops importing...but continues running (processor at 100% the whole time, memory = 600mb/1000mb).
I have let it run for 11 hours...and it doesn't get past about 32K of 50k records. If i comment out the stored procedure and print out the number of times the loop executed, it runs in 2 seconds without fail.
I will insert both my script (loop), and my stored procedure if it helps any.
thanks for any help.
Script:
--## Import Addresses to Households from Member table to Address table
Declare @HouseHoldID as int
Declare @mailaddress1 as varchar(100), @mailaddress2 as varchar(100), @mailcity as varchar(100), @mailstate as varchar(100), @mailzip as varchar(100), @mailcountry as varchar(100)
DBCC CHECKIDENT (Address, RESEED, 0) -- reset the identity
Declare @MaxID as int, @CurrentID as int
Set @CurrentID = 1
SELECT @MaxID = Max(MVINumber) From mvi.mvi.Member
While @CurrentID <= @MaxID
BEGIN
/* do work here */
-- clear variables
Set @HouseHoldID = null
Set @mailaddress1 = null
Set @mailaddress2 = null
Set @mailcity = null
Set @mailstate = null
Set @mailzip = null
Set @mailcountry = null
-- get the source info from member table.
Select @mailaddress1=mailaddress1, @mailaddress2=mailaddress2, @mailcity=mailcity, @mailstate=mailstate, @mailzip=mailzip, @mailcountry=mailcountry
from mvi.mvi.member
where mvinumber = @CurrentID
-- insert the info into the new schema.
--address is to be linked to householdid. Get householdID
Select @HouseHoldID = ContentGroupID From Person_View Where EntityID = @CurrentID
--insert the addresses
IF @mailaddress1 is not null
begin
Exec Address_Insert @HouseHoldID=@HouseHoldID, @address1=@mailaddress1, @address2=@mailaddress2, @city=@mailcity, @state=@mailstate, @zip=@mailzip, @country=@mailcountry, @addresstypeid=0
end
/* end work here */
Set @CurrentID = @CurrentID + 1
END
Stored procedure Address_Insert:
ALTER PROCEDURE Address_Insert
(
@HouseHoldID as int,
@AddressTypeID as int,
@Address1 as varchar(100),
@Address2 as varchar(100),
@City as varchar(100),
@State as varchar(100),
@Zip as varchar(100),
@Country as varchar(100),
@ContentGroupID as int = -1
)
AS
Declare @Count as int
Select @Count = Count(AddressID) From Address_View Where HouseHoldID=@HouseHoldID And ContentTypeID=@AddressTypeID
If @Count < 1 begin -- Insert into ContentSequence system...retrieve ContentSequenceID Declare @ContentSequenceID as int Exec @ContentSequenceID = ContentSequence_Insert @ContentGroupID=@ContentGroupID, @ContentTypeID=@AddressTypeID
-- Insert the address record Insert Into Address( HouseHoldID, Address1, Address2, City, State, Zip, Country, ContentSequenceID ) Values( @HouseHoldID, @Address1, @Address2, @City, @State, @Zip, @Country, @ContentSequenceID ) end
Stored Procedure: ContentSequence_Insert:
ALTER PROCEDURE ContentSequence_Insert
(
@ContentGroupID as int = -1,
@ContentTypeID as int = null
)
AS
IF @ContentGroupID < 1 --if no group exists...Get new groupid
begin
Exec @ContentGroupID = ContentGroup_Insert
end
--get the max(ContentSequence) for inserting the record.
Declare @ContentSequence as int
Select @ContentSequence = Max(ContentSequence) +1
From ContentSequence
Where ContentGroupID = @ContentGroupID
IF (@ContentSequence IS NULL) -- check for null value...increment to 0 if null.
begin
Set @ContentSequence = 0
end
--insert the record.
Insert into ContentSequence(ContentGroupID, ContentTypeID, ContentSequence)
Values( @ContentGroupID, @ContentTypeID, @ContentSequence )
RETURN @@Identity --return ContentSequenceID
Stored Procedure: ContentGroup_Insert:
ALTER PROCEDURE ContentGroup_Insert
AS
Declare @ContentGroupID as int
Insert Into ContentGroup(ContentGroupDate)
Values(GetDate())
Set @ContentGroupID = @@Identity
-- Return the ContentGroup Identity so that a record may participate in the ContentSequence System.
-- all participating records must be part of a ContentGroup.
Return @ContentGroupID
.
April 6, 2005 at 9:02 pm
As to why it stops inserting after 32700 rows, it's possible that data being inserted to the Address table or Address_View view contains duplicate records on the HouseholdID and ContentTypeID.
To give us a complete view of the problem on why the loop won't terminate, please insert the ContentSequence_Insert stored procedure.
April 6, 2005 at 9:31 pm
I have included 2 more stored procedures to complete the code.
During my testing, i had commented out the execution of the stored procedure:
Exec @ContentSequenceID = ContentSequence_Insert @ContentGroupID=@ContentGroupID, @ContentTypeID=@AddressTypeID
in Address_Insert.
It showed no change in it's behavior.
.
April 7, 2005 at 1:57 am
Why do you not use a cursor for this job?
Anders Dæmroen
epsilon.no
April 7, 2005 at 2:01 am
Did you verify the log and data size ? Maybe the log or data file is full, and they can not grow ( not automatically grow, Restricted file growth or disk full ) ?
To see what is happening, you can print the CurrentID in the beginning of the loop. So you can see where exactly it goes wrong.
I would suggest not to use @@identity. See BOL and IDENT_CURRENT ...
Also, check for null for @HouseHoldID and @AddressTypeID in Address_Insert ...
Bert
April 8, 2005 at 7:31 am
Hi,
You declare variables as int, which can hold 32768 as maximum number.
Declare maxID and CurentID as bigint (or numeric) and you're fine!
JP
April 8, 2005 at 7:47 am
Int can hold up to 2147483647. Smallint is limited to 32767 .
From BOL:
bigint
Integer (whole number) data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807). Storage size is 8 bytes.
int
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is integer.
smallint
Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is 2 bytes.
tinyint
Integer data from 0 through 255. Storage size is 1 byte.
April 8, 2005 at 7:50 am
you're right! Got confused with vb & vbscript
JP
April 8, 2005 at 2:32 pm
You might try breaking the job into smaller batches to see what happens.
Also, I don't see any kind of error checking so you have no way of knowing if any of these statements failed and why?
Here's a good article on error handling:
http://www.sommarskog.se/error-handling-II.html#presumptions
April 10, 2005 at 12:31 am
Thanks for all the suggestions, i was able to complete the process without any problems. Here are my conclusions and questions:
The process completed when I stopped using @@Rowcount in Address_Insert and replaced it with @Count. Is there a problem with using system variables like @@Rowcount and @@Identity? Does this compromise my database?
Bert, you recommended against using @@Identity, but i am using it because i only want the identity of a table that was inserted in the current scope. i noticed that using IDENT_CURRENT could give me the Identity from another scope. here's what BOL says:
Bert: I had no problem with unrestricted growth of datbase and log files.
Grasshopper: I avoided cursors, because i seem to get much better performance with a counter rather than a cursor.
Enthusiast: is a procedural approach to SQL always wrong? It seems that my approach was the best for me since i wanted to take advantage of data logic already written and stored in other stored procedures. How are you supposed to reuse your stored procedures if you are going to only use t-sql everytime you write a complicated routine?
Example: i don't want to write an Insert statement as in your example above, i want to let my stored procedure handle it, so i don't have to rethink the process. I appreciate any more information you can give me on where my approach has gone wrong.
.
April 11, 2005 at 4:23 pm
So how do you recommend writing reusable code?
Do you recommend the use of Stored Procedures? If so, how am i supposed to run tests on a value before executing a stored procedure? I can run those tests from another language, but that may not be as practical.
Scenario:
lets say you group addresses together with GroupID and link them to customers with customerID.
A client application attempts to add an address to a Customer. You need to perform the following logic:
1. check to see if customer exists.
2. check to see if customer already has any addresses
a.) if addresses exist, get the groupID that they belong to.
b.) if addresses dont' exist, obtain a new groupID.
3. Insert the address with the CustomerID and GroupID
Where do you recomment placing this logic? And how would you write "Good Code" that would accomplish this task?
.
April 11, 2005 at 9:40 pm
Hi John just a couple of observations, your database allows alternate address fields (2) but only one field for city, state and county. What if the alternate address is in another state? Secondly if you check the first post by Joe Celko he gave you the code for moving the information from one table to another.
From Joe Celko’s first post.
“INSERT INTO Foo( ...)
SELECT ..
FROM Bar
WHERE NOT EXISTS
(SELECT *
FROM Foo
WHERE Foo.key_col = Bar.key_col);
Updating the address for an existing customer (your last post) is a different question. What are your business rules for validating your data? Are you relating Customer Id to First and Last Names, or Customer ID to address? You can write an UPDATE statement to change the value in an existing record.HTH
Mike
April 11, 2005 at 10:14 pm
Michael,
Address 1 and Address 2 are for line 1 and line 2 of an address respectively. AddressTypeID differentiates between mailing, billing, shipping addresses...etc...
so one Customer to Address is a one to many relationship.
You are correct in stating that Joe Celko's code moves data from one table to another. That isn't my problem. My question is about using procedural code. Is it a bad idea to use procedural code? Joe calls it "Bad Code". I'm more interested in why he calls it bad code. It is obvious that performance is lost, but i've found it to be very practical to check conditions of data before Updating or Inserting. I've also found it valuable to call other stored procedures which makes my code reusable...and saves large amounts of time in development and also simplifies the amount of code to manage in a project.
thanks for the input
.
April 12, 2005 at 3:36 am
John,
I agree that it is a bad habit to use procedural code, unless it is necessary for some reason. I don't see any such reason in this example, however, I don't understand much what you are doing with the ContentGroup and ContentSequence tables. Could you explain this in more detail, please? Maybe we are missing something important here...
I couldn't help thinking, that a simple SQL code with a branch (in short pseudocode IF EXISTS -> UPDATE, ELSE INSERT) would accomplish the same, while being simpler, more reliable, easier to maintain and without risk of infinite looping. Quite a lot of reasons to qualify it as better . Even these thing with Content should be manageable the same way, though this is just a guess until I know more about what it is, what are the rules and how it is used.
HTH, Vladan
April 12, 2005 at 4:46 am
"It is obvious that performance is lost, but i've found it to be very practical to check conditions of data before Updating or Inserting."
John you have answered your own question . Performance is improved. The code is also much simpler and easer to maintain.
As to checking conditions of data before Updating or Inserting it is not only practical but imperative that you filter data before updating or adding records.
Valadan's suggestion of a branching statement checks to see if the customer exists and then either Inserts a new record or updates the existing record. A carefully constructed WHERE statement will validate your data. If any of the conditions are not met then the data will not be added or updated.
But it could be that I am missing something here. It might be helpful if you could provide the business rules and what you are trying to accomplish.
HTH
Mike
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply