September 14, 2010 at 11:13 am
Hi all,
I have a table A which contains 50 million records. Each day I pour data from 10 other DailyTransaction tables into table A simultaneously. Each DailyTransaction table can contain 200,000 records.
My problem is that table A is locked for a long time, so timeouts happen.
My example:
Create table A
(
     RecID int identity not null
    ,ReportDate datetime not null
    ,ReportTime int
    ,FileSource int
    ,MerchantNumber int not null      -- data type not given in the original; int assumed
    ,TransactionCount int
    ,TransactionAmount money          -- original had "amount"; money assumed
)
Create Clustered Index I On A (RecID)
Create Nonclustered Index IX_A_ReportDate_MerchantNumber On A (ReportDate, MerchantNumber)
Create table DailyTransaction_1
(
     ReportDate datetime not null
    ,ReportTime int
    ,FileSource int
    ,MerchantNumber int not null
    ,TransactionCount int
    ,TransactionAmount money
)
The other DailyTransaction tables have the same structure.
Here is the process used when a DailyTransaction table is processed to insert its data into table A:
-- Delete data in case the process is run multiple times
DELETE A
WHERE ReportDate = (SELECT TOP 1 ReportDate FROM DailyTransaction_1)
  AND ReportTime = (SELECT TOP 1 ReportTime FROM DailyTransaction_1)
  AND FileSource = (SELECT TOP 1 FileSource FROM DailyTransaction_1)
-- Insert data
INSERT INTO A (ReportDate, ReportTime, FileSource, MerchantNumber, TransactionCount, TransactionAmount)
SELECT ReportDate, ReportTime, FileSource, MerchantNumber, TransactionCount, TransactionAmount
FROM DailyTransaction_1
Some conditions here:
Table A is not a partitioned table and it is in a production environment, so I can't stop the client's servers to implement partitioning. I also can't use views, because I would have to modify my process and stored procedures.
Does anyone have any ideas that can help me?
September 14, 2010 at 2:11 pm
A couple of questions:
Have you made any attempt to determine the throttle point: whether it's reads from the DailyTransaction DBs, network throughput, the local writes, or CPU/memory pressure? If so, what have you determined is (or isn't) the problem so far? (One way to check is sketched just after these questions.)
Are the databases on the same server, or is this a non-local transfer? If the same server, are they using separate raid spindles or is everything lumped up into one drive? Is this in SSIS or are you using linked servers to move the data over if separate servers?
What's the local setup on the server? Are your log files in the same drives as your data? If not, are they different logical drives or different physical spindles in your RAID? Do you have any access to the raid controller setups, even to just look and see what's there?
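If it helps, and assuming you're on SQL Server 2005 or later, here's a rough sketch of the kind of check I mean using the built-in DMVs (the session_id of 123 is just a placeholder for the loading connection's spid):
-- While the load is running, see what that session is waiting on
SELECT r.session_id,
       r.status,
       r.wait_type,              -- e.g. PAGEIOLATCH_* = data reads, WRITELOG = log writes, LCK_M_* = blocking
       r.wait_time,
       r.blocking_session_id,
       r.cpu_time,
       r.logical_reads
FROM sys.dm_exec_requests AS r
WHERE r.session_id = 123          -- placeholder: the spid of the load process
-- Cumulative wait picture for the whole instance since the last restart
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC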
And a few ideas to look into as well:
BULK INSERT
The clustered index on table A: is it scattering these records throughout the index, or is it able to append them to the same section to avoid churn?
Are there foreign keys this table relies on to check the data, or any other internal constraints/checks/calculated fields/etc on the table?
How about the secondary non-clustered indexes? If there are a lot of them and you find them under-utilized, each one you can remove will help with overall performance. Worst case scenario, you can disable or drop them for a while (a sketch of that follows below).
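Not something from your post, but to illustrate that last idea: on SQL Server 2005 and up you can disable the secondary index before the load and rebuild it afterwards rather than dropping and recreating it. The index and table names below are the ones from your sample DDL:
-- Stop maintaining the non-clustered index during the load
ALTER INDEX IX_A_ReportDate_MerchantNumber ON A DISABLE
-- ... run the DELETE / INSERT load here ...
-- Rebuild when the load finishes; this re-enables the index
ALTER INDEX IX_A_ReportDate_MerchantNumber ON A REBUILD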
Without some more information I fear that's the best I've got for the moment, but those are the places I'd start.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
September 14, 2010 at 8:37 pm
Thanks,
The DailyTransaction tables and table A are on the same server, and I only get data from those tables and insert it into A. I only use script(s) to transfer the data.
The server runs Windows Server 2003 with 2 GB of RAM, and the hard disks are not in a RAID.
There are no constraints/checks/calculated fields on table A.
The secondary non-clustered index helps improve my application's performance when I access table A.
September 14, 2010 at 9:48 pm
Dung Dinh (9/14/2010)
I only use script(s) to transfer data.
We'll probably need to see those and a couple of other things. Take a look at the article found at the second link in my signature line below.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2010 at 4:09 am
In addition to trying everything else posted you could try this.
Having been through a similar problem years ago on a poorly specified machine, I had to resort to using cursors (ARGHHHHH!!!). Create a table and put all ten sets of data into it. Loop through the table, merge the rows one by one, and commit the transaction every few rows. People will quake with terror at such a thing, but it stopped the table deadlocking and, bizarrely, it ended up running quite quickly.
September 16, 2010 at 5:34 am
Paul Swinfield (9/16/2010)
In addition to trying everything else posted you could try this. Having been through a similar problem years ago on a poorly specified machine, I had to resort to using cursors (ARGHHHHH!!!). Create a table and put all ten sets of data into it. Loop through the table, merge the rows one by one, and commit the transaction every few rows. People will quake with terror at such a thing, but it stopped the table deadlocking and, bizarrely, it ended up running quite quickly.
Actually, that's not such a strange recommendation for such a thing. A lot of folks on this forum (including me) recommend something similar when deleting a huge number of rows from a huge table.
Heh... people will still "quake with terror" when they first read your recommendation but it is a tried and true method. What the hell... it's how BCP and Bulk Insert both work.
When you do such a thing, it's also sometimes quite helpful to give the system a chance to breathe by adding an appropriate WAITFOR DELAY command.
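To make that concrete, here's a rough sketch of the batched approach for a single DailyTransaction table. The batch size, the 2 second delay, the #Stage temp table, and the assumption of SQL Server 2005 or later (for DELETE TOP) are all mine, not something from this thread; adjust them to whatever your box can stand:
DECLARE @BatchSize int
SET @BatchSize = 10000
-- Delete any previously loaded rows for this file, a batch at a time
WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) A
    FROM A
    JOIN DailyTransaction_1 AS d
        ON  d.ReportDate = A.ReportDate
        AND d.ReportTime = A.ReportTime
        AND d.FileSource = A.FileSource
    IF @@ROWCOUNT < @BatchSize BREAK
    WAITFOR DELAY '00:00:02'   -- let other sessions get at the table between batches
END
-- Stage the new rows with a sequential key so they can be inserted in batches
SELECT IDENTITY(int, 1, 1) AS RowNum, *
INTO #Stage
FROM DailyTransaction_1
DECLARE @From int
SET @From = 1
WHILE EXISTS (SELECT 1 FROM #Stage WHERE RowNum >= @From)
BEGIN
    INSERT INTO A (ReportDate, ReportTime, FileSource, MerchantNumber, TransactionCount, TransactionAmount)
    SELECT ReportDate, ReportTime, FileSource, MerchantNumber, TransactionCount, TransactionAmount
    FROM #Stage
    WHERE RowNum >= @From AND RowNum < @From + @BatchSize
    SET @From = @From + @BatchSize
    WAITFOR DELAY '00:00:02'
END
DROP TABLE #Stage
The staging table is only there because DailyTransaction_1 has no key of its own to batch on; if it did, you could batch the INSERT directly against it.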
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2010 at 2:52 am
Thanks, guys.
I will try it this way.