July 11, 2005 at 8:19 am
I just read an article by Clinton Herring titled "Replacing Cursors and While Loops". It got me thinking on if I could save some time in a process that I run on a daily basis. I have a love-hate relationship with cursors and would rather not have them if I could. But there is one place where I cannot get past using a cursor. So maybe you guys here can help me out. Here's the situation: We get a daily file with merchant activity information for the past day. Due to some internal business rules and for faster data retrieval we have to store each merchant's data in its own table. I have a SP that loops thru the daily file and inserts data in each indicidual merchant table. If a table does not exist for the mechant, it first creates a table before the Insert.
Can this be converted to a non-cursor based solution or am I stuck here? the import job currently takes 45 minutes and the daily file contains 400,000 records in each daily file.
Here are some details of the tables and process(not real production table and data)
Thanks for all your help!!!
DDL for Table with daily data:
CREATE TABLE [dbo].[DailyData] (
[DailyDataID] [int] IDENTITY (1, 1) NOT NULL ,
[MerchID] [varchar] (16) NOT NULL ,
[FileDate] [datetime] NOT NULL ,
[TranCode] [varchar] (4) NOT NULL ,
[TranDetail] [varchar] (100) NOT NULL ,
)
--SP Code:
CREATE PROCEDURE dbo.insDailyData
@FileDate varchar(10)
AS
----DECLARE variables to be used in this section
DECLARE
@MerchID varchar(16)
, @RecCount int
, @sql varchar(4000)
-----SET Initial value for @RecCount
SELECT
@RecCount = 0
-----DECLARE and Open Cursor for processing
DECLARE DailyDataCursor CURSOR
FOR
SELECT DISTINCT
T.MerchID
FROM
DailyData AS T
ORDER BY
T.MerchID
OPEN DailyDataCursor
FETCH NEXT FROM DailyDataCursor
INTO @MerchID
WHILE @@FETCH_STATUS = 0
BEGIN
-- Table Name = Tran+MerchID
-----Check to see if table exists
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE [ID] = object_id(N'[dbo].[Tran' + @MerchID + ']') AND OBJECTPROPERTY([ID], N'IsUserTable') = 1)
BEGIN
-----Step 1: DELETE data for @FileDate if it exists
SELECT
@sql = '
DELETE tbl
FROM
dbo.Tran' + @MerchID + ' AS tbl
WHERE
tbl.FileDate = CONVERT(varchar(10), ''' + @FileDate + ''', 101)'
EXEC(@SQL)
END
ELSE
BEGIN
-----Step 1: CREATE table for MerchID
SELECT
@sql = '
CREATE TABLE [dbo].[Tran' + @MerchID +'] (
[TranID] [int] IDENTITY (1, 1) NOT NULL ,
[FileDate] [datetime] NOT NULL ,
[MerchID] [varchar] (16) NOT NULL ,
[TranCode] [varchar] (4) NOT NULL ,
[TranDetail] [varchar] (100) NOT NULL ,
[DateRecAdded] [datetime] NOT NULL
)'
EXEC(@SQL)
SELECT
@sql = '
ALTER TABLE [dbo].[Tran' + @MerchID +'] WITH NOCHECK ADD
CONSTRAINT [PK_Tran' + @MerchID +'_TranID] PRIMARY KEY CLUSTERED
(
[TranID]
)'
EXEC(@SQL)
SELECT
@sql = '
ALTER TABLE [dbo].[Tran' + @MerchID +'] ADD
CONSTRAINT [DF_Tran' + @MerchID +'_DateRecAdded_1] DEFAULT (GETDATE()) FOR [DateRecAdded]'
EXEC(@SQL)
END
-----Step 2: INSERT daily data in Merchant table
SELECT
@sql = '
INSERT INTO dbo.Tran' + @MerchID + ' (FileDate, MerchID, TranCode, TranDetail, DateRecAdded)
SELECT DISTINCT TOP 100 PERCENT
D.FileDate
, D.MerchID
, D.TranCode
, D.TranDetail
, GETDATE() AS DateRecAdded
FROM
dbo.DailyDetail AS D
WHERE
D.MerchID = ''' + @MerchID + '''
ORDER BY
D.MerchID'
EXEC(@SQL)
FETCH NEXT FROM DailyDataCursor
INTO @MerchID
END
CLOSE DailyDataCursor
DEALLOCATE DailyDataCursor
July 11, 2005 at 8:31 am
There's your problem :
"Due to some internal business rules and for faster data retrieval we have to store each merchant's data in its own table. I have a SP that loops thru the daily file and inserts data in each indicidual merchant table"
There should only be one table for that. (unless you have 100M rows in that table, then you can start thinking about splitting it up ).
Here's a few pointers that might shed 1 minute on the whole process (hey it's a start ).
SELECT
@sql = '
CREATE TABLE [dbo].[Tran' + @MerchID +'] (
[TranID] [int] IDENTITY (1, 1) NOT NULL ,
[FileDate] [datetime] NOT NULL ,
[MerchID] [varchar] (16) NOT NULL ,
[TranCode] [varchar] (4) NOT NULL ,
[TranDetail] [varchar] (100) NOT NULL ,
[DateRecAdded] [datetime] NOT NULL
)'
EXEC(@SQL)
SELECT
@sql = '
ALTER TABLE [dbo].[Tran' + @MerchID +'] WITH NOCHECK ADD
CONSTRAINT [PK_Tran' + @MerchID +'_TranID] PRIMARY KEY CLUSTERED
(
[TranID]
EXEC(@SQL)
SELECT
@sql = '
ALTER TABLE [dbo].[Tran' + @MerchID +'] ADD
CONSTRAINT [DF_Tran' + @MerchID +'_DateRecAdded_1] DEFAULT (GETDATE()) FOR [DateRecAdded]')
EXEC(@SQL)
This code can be executed in a single step. The default can be included in the original create, so can the primary key.
SELECT DISTINCT TOP 100 PERCENT
D.FileDate
, D.MerchID
, D.TranCode
, D.TranDetail
, GETDATE() AS DateRecAdded
FROM
dbo.DailyDetail AS D
WHERE
D.MerchID = ''' + @MerchID + '''
ORDER BY
D.MerchID'
Remove the order by (unless it performs the insert faster... depends on the clustered index). Also since you have the default on the column, you don't need to select the getdate() (might save a little time here).
July 11, 2005 at 9:37 am
Thanks for your reply Remi.
You are right about the suggestion that we should have one table for all the data. Reason we cannot do that is for compliance issues and also majority of the merchant tables have close to 50M records in each table - we have to provide reporting data for 18-month periods.
Also I can execute the table create process in one SQL statement but we have more columns in the table with different clustered indexes that what has been displayed in my original post.
You are right too in your suggestion that I should take out “, GETDATE() AS DateRecAdded” in insert statement.
July 11, 2005 at 9:46 am
I can understand the split if you have 50M rows/merchant .
The only thing that could speed this up is to make sure that the clustered index of the Details table is on the MerchID column and to remove the order by MerchId in the select since all the columns have the same values, this sort is useless.
I fear you are stuck with this procedural approach. The final tiny upgrade you could make would be to convert the cursor to a loop
--don't forget to initialize this variable
while @Myid not is null
select @Myid = max(merchid) from @DistinctMerchids where merchid > @MyId order by Merchid
This will remove the overhead of the cursor from the proc. It's not gonna be much be it's the sum of those upgrades that make the server go fast...
July 11, 2005 at 9:54 am
I think that if you need that MANY tables you'll need a different approach or your data will become unmanageable.
I will try any of the following 2 options.
1. Create a Partitioned view which you will refresh if the customerID does not exists in it and then execute an insert comand against the the Partitioned view
2. Another option is to try Parallel loading with DTS or BCP+osql
Just my $0.02
* Noel
July 12, 2005 at 1:16 am
DECLARE @MerchID varchar(16)
, @LastMerchID varchar(16)
SELECT TOP 1 @MerchID = MerchID
FROM DailyData
WHERE MerchID > @LastMerchID
ORDER BY MerchID
WHILE LEN(@MerchID) > 0
BEGIN
...
-- Loop
SET @LastMerchID = @MerchID
SELECT TOP 1 @MerchID = MerchID
FROM DailyData
WHERE MerchID > @LastMerchID
ORDER BY MerchID
IF @LastMerchID = @MerchID
BREAK
END
Andy
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply