Can this Query be tuned? God knows!!!

  • quote: "Thanks for your reply Jeff. With regards to naming files daily, the problem is that we need to know what transactions came in on a daily basis; that's the reason why the tables are named daily. Also, w..."

    That's what Celko and I have been talking about... that's what datetime columns are for.  But, forget about that... as you said, that would be a major rewrite.

    Anyway, the short fix is as I said previously... make a unique index with the ignore-dupes option on whatever columns in your temp table make a CDR unique.  As you BCP the data in, duplicate rows will be treated as errors which, if you want, can actually be squirreled off to an errata file.  Check Books Online for the necessary switches...

    OR... If you don't want to do it on the temp table, you can add the unique/ignore dupe index on the daily feed table... it'll just reject duplicate records without causing the whole batch to abort.  It'll let you know if dupes were ignored for each batch but it won't abort.

    Then, change your query to do a joinless INSERT/SELECT from one table to the other.  I tried it with a table narrower than yours, but it inserted 10.1 million records, 100,000 of which were dupes (and properly rejected), in just over 7.5 minutes.  It'll do 10,000 records in the blink of an eye.  And that's on my desktop single-processor box with IDE drives... imagine what it'll do on a properly configured server... it'll fly.
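    The whole approach can be sketched like this (table and column names here are placeholders for illustration, not the real CDR schema; SQL 2000-era syntax, where IGNORE_DUP_KEY goes on the index):

    ```sql
    -- Sketch only... names below are made up, not the poster's real tables.
    -- A unique index WITH IGNORE_DUP_KEY silently discards duplicate rows
    -- (with a warning) instead of aborting the batch.
    CREATE UNIQUE INDEX UIX_DailyFeed_IgnoreDupes
        ON dbo.DailyFeedTable (KeyCol1, KeyCol2, KeyCol3)
      WITH IGNORE_DUP_KEY

    -- The "joinless" copy... no EXISTS, no LEFT JOIN dupe check...
    -- the index does the dedupe work during the insert.
    INSERT INTO dbo.DailyFeedTable (KeyCol1, KeyCol2, KeyCol3, OtherCol)
     SELECT KeyCol1, KeyCol2, KeyCol3, OtherCol
       FROM #TempLoadTable
    ```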

    That's all you need to do...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

     

    Thanks for your reply.

     

    If you look at the code I initially posted, you will see that the insert happens in chunks.  The original code filters on

    p.tbMRFileStatus_id = 974618

    This value changes all the time, so p.tbMRFileStatus_id = 974618 could mean that 2,000 records get inserted into the daily table; on the next iteration of the cursor, p.tbMRFileStatus_id could be 974619, which in turn could mean that 1,000 records get inserted into the daily table from the temporary table.

    Also, this is how the temp table is created.  This is the code behind it:

    BULK INSERT #temPLOADERTABLE
      FROM '\\FileLocation'
      WITH (FIELDTERMINATOR = '~', DATAFILETYPE = 'CHAR', ROWTERMINATOR = '\n', KEEPIDENTITY, TABLOCK)

    ALTER TABLE #temPLOADERTABLE
      ADD CONSTRAINT Indexx_temptable PRIMARY KEY CLUSTERED (SeizeDate, EventType, CDRFingerPrint)

    Having seen this code, please let me know where your suggestions fit in.  If you could also provide code samples, that would be really helpful, as I find it easier to decode code than explanations!  Weird, I know.

    Thanks


    Kindest Regards,

    John Burchel (Trainee Developer)

  • That's why I did a test on 10 million records... your chunks will be easy.

    Your temp table load is fine...

    I've suggested it 3 times now, John... for the temp-table-to-daily-feed-table copy, remove the clustered primary key... it's just slowing you down... add a non-clustered unique index with the ignore-dupes option on the columns that make a CDR unique.  Change your query to a straight insert with no joins.

    If you can't write the query, lemme know...

     

    --Jeff Moden



  • Hi Jeff,

    The CDR is unique on 3 columns, so I can have a constraint on those 3 columns.

    I have also sent you a PM.

    Thanks


    Kindest Regards,

    John Burchel (Trainee Developer)

  • Ok... here we go... let's NOT do it on the BULK INSERT side... let's do it where you tried to insert into the daily feed file from the temp table as in your first post...

    I don't know what you have that creates the daily feed file(s), but the index you want on them is as follows... and, whatever you do... DO NOT PUT A CLUSTERED INDEX OF ANY TYPE ON THE DAILY FEED FILES!!!!  IT WILL KILL THE PERFORMANCE ON THESE HIGHLY TRANSACTIONAL / FRAGMENTED TABLES DURING INSERTS!!!!!  If you add a Primary Key, make sure that you don't add it until you are done loading ALL data from ALL files into a daily feed table.  ONLY THEN SHOULD YOU ADD A CLUSTERED KEY!!!

    Here's what the NON-CLUSTERED UNIQUE WITH IGNORE DUPES index looks like...

     CREATE UNIQUE INDEX UIX_IgnoreDupes
         ON dbo.insertdailyfeedtablenamehere (DAILYFingerPrint, EventType, Termination_Date)
       WITH IGNORE_DUP_KEY
    GO

    With a little forethought on your part, you might even be able to make it dynamic in a manner similar to what I've done below...

    Here's a stored proc to do your insert into the daily feed table, whatever it is... please notice that there are two parameters you must pass... one for the value of tbMRFileStatus_id and one for the date part of the date-named daily feed table.  If you leave the second parameter off, it will default to whatever today is.

    Note that dynamic SQL is required (no offense... well, maybe to the guy who designed such a woeful system) because your daily feed table name will change all the time, on a whim and at least once a day.

     CREATE PROCEDURE dbo.AddToDailyFeed
            @tbMRFileStatus_id INT,            -- This first parameter is required.
            @DesiredDate       DATETIME = NULL -- Optional parameter defaults to now if it is not passed
         AS
    --===== Declare local variables
    DECLARE @DateYYYYMMDD VARCHAR(8)    --Date for daily feed table name
    DECLARE @DateYYMMDD   VARCHAR(6)    --Date to compare against Termination_Date
    DECLARE @DynSQL       VARCHAR(8000) --Holds the dynamic SQL required because of the date-named table (ack!)
    DECLARE @MyRowCount   INT           --Holds resulting rowcount for display feedback
    DECLARE @MyError      INT           --Holds resulting error value for display feedback and error checking by main routine
    --===== Set the date variables (a function call can't be a parameter default, so resolve NULL here)
        SET @DesiredDate  = ISNULL(@DesiredDate,GETDATE())
        SET @DateYYYYMMDD = CONVERT(VARCHAR(8),@DesiredDate,112) --For the table name
        SET @DateYYMMDD   = CONVERT(VARCHAR(8),@DesiredDate, 12) --For the termination date comparison
    --===== Identify the start of the proc to the output device... whatever it may be
      PRINT REPLICATE('=',78)
      PRINT 'AddToDailyFeed started at ' + CONVERT(VARCHAR(30),GETDATE(),109)
      PRINT 'tbMRFileStatus_id: ' + CAST(@tbMRFileStatus_id AS VARCHAR(10))
      PRINT 'Daily feed name:   DailyFeed' + @DateYYYYMMDD
      PRINT REPLICATE('-',78)
    --===== Create the dynamic SQL to support date-named tables, etc.
        SET @DynSQL = '
     INSERT INTO DailyFeed' + @DateYYYYMMDD + '
            (
            tbMRFileStatus_id, CDRFingerPrint, EventType, StartDate, StartTime,
            UserProvCPN, RouteIg, RouteOg, CalledPN, CallingPN,
            ElapsedTime, ContCalledNum, CallRedirInd, CalledPNType, TelServInd,
            SwitchName, TypeOfCall, Termination_Date, Location_id, TelephoneIndex_id,
            Switch_id, ClientRecognitionMethod, Client_id, Rate_id, ClientService_id,
            Overhead, OverheadType, Overhead_id, Charge, ChargeType,
            ChargePrice_id, Invoice_id, Status, MinimumCost, CallAddition,
            Export_id, ATelephoneIndex_id, ALocation_id,
            Rate1Duration, Rate1Charge, Rate2Duration, Rate2Charge, Rate3Duration, Rate3Charge
            )
     SELECT tbMRFileStatus_id, CDRFingerPrint, EventType, StartDate, StartTime,
            UserProvCPN, RouteIg, RouteOg, CalledPN, CallingPN,
            ElapsedTime, ContCalledNum, IDACode, CalledPNType, TelServInd,
            SwitchName, TypeOfCall, CONVERT(DATETIME,Termination_Date) AS Termination_Date,
            CULocation_id, TelephoneIndex_id,
            Switch_id, CUClientRecognitionMethod, CUClient_id, CURate_id, CUClientService_id,
            CUOverhead, CUOverheadType, CUOverhead_id, CUCharge, CUChargeType,
            CUChargePrice_id, CUInvoice_id, CUStatus, CUMinimumCost, CUCallAddition,
            CUExport_id, CUATelephoneIndex_id, CUALocation_id,
            CURate1Duration, CURate1Charge, CURate2Duration, CURate2Charge, CURate3Duration, CURate3Charge
       FROM #CDRDataProcess  --table must already exist before you call this proc!!!!!!
      WHERE EventType & 4 <> 0
        AND tbMRFileStatus_id = ' + CAST(@tbMRFileStatus_id AS VARCHAR(10)) + '
        AND Termination_Date = ''' + @DateYYMMDD + '''' --Quotes required around string date here...
    --===== All set, execute the dynamic SQL with the calculated table name and all...
       EXEC (@DynSQL)
    --===== Capture both the rowcount and the error value, if any
     SELECT @MyRowCount = @@ROWCOUNT,
            @MyError    = @@ERROR
    --===== Identify the end of the proc to the output device... whatever it may be
      PRINT REPLICATE('-',78)
      PRINT 'AddToDailyFeed ended at ' + CONVERT(VARCHAR(30),GETDATE(),109)
      PRINT 'Rowcount: '    + CONVERT(VARCHAR(10),@MyRowCount)
      PRINT 'Error value: ' + CONVERT(VARCHAR(10),@MyError)
      PRINT REPLICATE('=',78)
    --===== Exit the proc using the error value (good or bad) as the return code
     RETURN @MyError
    GO
    --===== Grant execute permissions on the new proc to everybody
      GRANT EXECUTE ON dbo.AddToDailyFeed TO PUBLIC
    GO

    If you need more than this, I'm going to have to start charging for production code ... after all, YOU are the "Senior Developer" here...

    I gotta say it again just so I can get the bad taste of the vomit code I just wrote for you out of my mouth... the overall design of your system is terrible.  Any system that requires daily date-named tables is just wrong, and the designers of the system need to be made to suffer a public execution.  THAT's how bad it is...

    --Jeff Moden



  • Hello  Jeff,

    Thanks very much for your help.  I wondered how much you would charge a Trainee Developer, as you can tell from my code...

    I want to ask one other question: with the IGNORE_DUP_KEY option, how can I write the duplicate entries to a side table somewhere?

    Looking forward to hearing from you.

    Thanks

    John

     


    Kindest Regards,

    John Burchel (Trainee Developer)

  • Your humility is surpassed only by your good nature, John.  Well done.

    The only time you can do something like what you ask automatically is during the BCP process where you can log all errors (dupes, in this case) to a separate file and then import that file to a separate table.  Although it's quite effective, I think it would be a bit much for this project with all the date-named files you have and all...

    You could do it with a separate INSERT into a separate table, either before or after the insert from the temp table to the daily feed table (an INSERT won't do any logging the way BCP can in the presence of an IGNORE_DUP_KEY index).  The following SELECT will find the dupes in the temp table that you load the daily feed tables from... it works best if you have a decent index on the temp table that covers the columns in the WHERE clause... just wrap it in an INSERT to do what you ask...

     SELECT DISTINCT t.*, d.DupeCount
       FROM #CDRDataProcess t,
            (--==== Derived table "d" finds dupes by key and provides count, as well
             SELECT CDRFingerPrint, EventType, Termination_Date, COUNT(*) AS DupeCount
               FROM #CDRDataProcess
              GROUP BY CDRFingerPrint, EventType, Termination_Date
             HAVING COUNT(*) > 1
            ) d --End derived table "d"
      WHERE t.CDRFingerPrint   = d.CDRFingerPrint
        AND t.EventType        = d.EventType
        AND t.Termination_Date = d.Termination_Date

    Just in case you didn't know... a "derived table" is nothing more than a correctly formed sub-query (NOT a correlated sub-query, which can be quite slow) whose result is used as if it were a table.  They're pretty easy to troubleshoot compared to correlated sub-queries, as well.

    To make one, you write a query that returns what you want correctly, wrap it in parentheses, give it a table alias ("d" in my example), and put it in the FROM clause of another query as if it were a table.  Some folks also call these "inline views", and they can be quite useful in dividing a larger problem into more manageable pieces.  They can also be nasty fast: the sum of the individual run times usually exceeds the conglomerate run time by several times.  In other words, the whole usually runs faster than the sum of the parts.

    Considering the width of your CDR records, you may want to remove the "DISTINCT" in the code above for the sake of speed.  Try it both ways to see which way you want to go...
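    Wrapped in an INSERT, that might look something like the following (the #CDRDupes side table is a made-up name for illustration; SELECT...INTO creates it on the fly, or you could INSERT into a permanent table you've already built):

    ```sql
    -- Sketch: squirrel the dupes off to a side table before (or after)
    -- the daily feed insert.  #CDRDupes is a hypothetical name.
    SELECT DISTINCT t.*, d.DupeCount
      INTO #CDRDupes
      FROM #CDRDataProcess t,
           (--==== Derived table "d" finds dupes by key and provides a count
            SELECT CDRFingerPrint, EventType, Termination_Date, COUNT(*) AS DupeCount
              FROM #CDRDataProcess
             GROUP BY CDRFingerPrint, EventType, Termination_Date
            HAVING COUNT(*) > 1
           ) d
     WHERE t.CDRFingerPrint   = d.CDRFingerPrint
       AND t.EventType        = d.EventType
       AND t.Termination_Date = d.Termination_Date
    ```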

     

    --Jeff Moden



  • Hello Jeff / Hello All

    Thanks for your last post.  I have another quick question with regards to the WITH IGNORE_DUP_KEY option: I just realized that, should duplicate values be inserted into a table, the statement terminates.  Is there anything one can do to ignore these duplicate values and carry on working?

    My concern is that if one was loading data in the middle of the night and an insert failure happened, it would get stuck there until someone intervened.

    I look forward to hearing from you.

    Thanks


    Kindest Regards,

    John Burchel (Trainee Developer)

  • Not sure why you think the statement terminates...
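    A quick sanity check anyone can run on a throwaway table... with an IGNORE_DUP_KEY index in place, the duplicate is discarded with a "Duplicate key was ignored." warning, the remaining rows go in, and the statement completes rather than aborting:

    ```sql
    -- Throwaway demo table (names are illustrative only)
    CREATE TABLE #DupeTest (SomeKey INT NOT NULL)
    CREATE UNIQUE INDEX UIX_DupeTest ON #DupeTest (SomeKey) WITH IGNORE_DUP_KEY

    INSERT INTO #DupeTest (SomeKey) VALUES (1)

    INSERT INTO #DupeTest (SomeKey)
     SELECT 1 UNION ALL    -- the 1 is a dupe... ignored with a warning
     SELECT 2              -- the 2 still gets inserted

    SELECT * FROM #DupeTest -- the batch keeps running; table holds 1 and 2
    ```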

    --Jeff Moden



  • Hi All,

    I have been running this for some time now and have found that sometimes the inserts are a bit slow.

    INSERT INTO DailyFeed' + @DateYYYYMMDD + '
           (
           tbMRFileStatus_id, CDRFingerPrint, EventType, StartDate, StartTime,
           UserProvCPN, RouteIg, RouteOg, CalledPN, CallingPN,
           ElapsedTime, ContCalledNum, CallRedirInd, CalledPNType, TelServInd,
           SwitchName, TypeOfCall, Termination_Date, Location_id, TelephoneIndex_id,
           Switch_id, ClientRecognitionMethod, Client_id, Rate_id, ClientService_id,
           Overhead, OverheadType, Overhead_id, Charge, ChargeType,
           ChargePrice_id, Invoice_id, Status, MinimumCost, CallAddition,
           Export_id, ATelephoneIndex_id, ALocation_id,
           Rate1Duration, Rate1Charge, Rate2Duration, Rate2Charge, Rate3Duration, Rate3Charge
           )
    SELECT tbMRFileStatus_id, CDRFingerPrint, EventType, StartDate, StartTime,
           UserProvCPN, RouteIg, RouteOg, CalledPN, CallingPN,
           ElapsedTime, ContCalledNum, IDACode, CalledPNType, TelServInd,
           SwitchName, TypeOfCall, CONVERT(DATETIME,Termination_Date) AS Termination_Date,
           CULocation_id, TelephoneIndex_id,
           Switch_id, CUClientRecognitionMethod, CUClient_id, CURate_id, CUClientService_id,
           CUOverhead, CUOverheadType, CUOverhead_id, CUCharge, CUChargeType,
           CUChargePrice_id, CUInvoice_id, CUStatus, CUMinimumCost, CUCallAddition,
           CUExport_id, CUATelephoneIndex_id, CUALocation_id,
           CURate1Duration, CURate1Charge, CURate2Duration, CURate2Charge, CURate3Duration, CURate3Charge
      FROM #CDRDataProcess  --table must already exist before you call this proc!!!!!!
     WHERE EventType & 4 <> 0
       AND tbMRFileStatus_id = ' + CAST(@tbMRFileStatus_id AS VARCHAR(10)) + '
       AND Termination_Date = ''' + @DateYYMMDD + '''' --Quotes required around string date here...

     

    I was wondering whether it might be worth having an index on #CDRDataProcess; if so, which columns and what kind of index would suit here?


    Kindest Regards,

    John Burchel (Trainee Developer)

    Try running the SELECT portion by itself... if the performance is similar to the performance of the whole INSERT/SELECT, then you might try using the INDEX TUNING WIZARD (and a couple of "swags") to see if the temp table should get an index (you WILL have to temporarily convert the temp table to a permanent table to use the tuning wizard on the SELECT portion).

    The other thing that could be slowing things down is indexes on the Daily_Feed (target of the insert) table.  If you have a clustered index on anything other than an Identity column, that would cause performance issues.  Having lots of indexes would also cause performance issues because they all need to be updated for inserts.
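    Since the SELECT filters on tbMRFileStatus_id and Termination_Date (and tests EventType), one candidate worth testing on the temp table, as a guess rather than gospel, is a nonclustered index covering the WHERE-clause columns:

    ```sql
    -- Sketch: index the temp table on the columns the INSERT/SELECT filters by.
    -- Whether it actually helps depends on the data... compare timings both ways.
    CREATE INDEX IX_CDRDataProcess_Filter
        ON #CDRDataProcess (tbMRFileStatus_id, Termination_Date, EventType)
    ```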

    Other problems with performance might be some of the negative logic you've included... for example, this...

    WHERE EventType & 4 <> 0

    ...should be written as this...

    WHERE EventType & 4 = 4

    ...to eliminate the need for the negative logic.  (Sorry I missed a couple of these in past postings... was concentrating on a different problem).

     

    --Jeff Moden



    I changed this and even ordered the way the insert was done from the temp table.  Still, some inserts take almost an hour, and we are talking about fewer than 30k records here.

    Not sure what to do next; any help would be appreciated.

     

     


    Kindest Regards,

    John Burchel (Trainee Developer)

    I also forgot to mention: the table that it's inserting into only has the ignore-duplicates key; no clustered index or any other type of index.

    The ORDER BY on the insert was also done to avoid page splits.

     


    Kindest Regards,

    John Burchel (Trainee Developer)

    An hour for 30k records?  Obviously, that's not good... I'm thinking that the ORDER BY is what is really slowing things down, but I could be wrong... the UNIQUE IGNORE_DUP_KEY index isn't clustered, so I'd be a bit surprised if it were that.  Still, a different tack may be in order... would you post your latest insert code and I'll take a look at it sometime this weekend?  Thanks...

    --Jeff Moden



    INSERT INTO DailyFeed' + @DateYYYYMMDD + '
           (
           tbMRFileStatus_id, CDRFingerPrint, EventType, StartDate, StartTime,
           UserProvCPN, RouteIg, RouteOg, CalledPN, CallingPN,
           ElapsedTime, ContCalledNum, CallRedirInd, CalledPNType, TelServInd,
           SwitchName, TypeOfCall, Termination_Date, Location_id, TelephoneIndex_id,
           Switch_id, ClientRecognitionMethod, Client_id, Rate_id, ClientService_id,
           Overhead, OverheadType, Overhead_id, Charge, ChargeType,
           ChargePrice_id, Invoice_id, Status, MinimumCost, CallAddition,
           Export_id, ATelephoneIndex_id, ALocation_id,
           Rate1Duration, Rate1Charge, Rate2Duration, Rate2Charge, Rate3Duration, Rate3Charge
           )
    SELECT tbMRFileStatus_id, CDRFingerPrint, EventType, StartDate, StartTime,
           UserProvCPN, RouteIg, RouteOg, CalledPN, CallingPN,
           ElapsedTime, ContCalledNum, IDACode, CalledPNType, TelServInd,
           SwitchName, TypeOfCall, CONVERT(DATETIME,Termination_Date) AS Termination_Date,
           CULocation_id, TelephoneIndex_id,
           Switch_id, CUClientRecognitionMethod, CUClient_id, CURate_id, CUClientService_id,
           CUOverhead, CUOverheadType, CUOverhead_id, CUCharge, CUChargeType,
           CUChargePrice_id, CUInvoice_id, CUStatus, CUMinimumCost, CUCallAddition,
           CUExport_id, CUATelephoneIndex_id, CUALocation_id,
           CURate1Duration, CURate1Charge, CURate2Duration, CURate2Charge, CURate3Duration, CURate3Charge
      FROM #CDRDataProcess  --table must already exist before you call this proc!!!!!!
     WHERE EventType & 4 = 4
       AND tbMRFileStatus_id = ' + CAST(@tbMRFileStatus_id AS VARCHAR(10)) + '
       AND Termination_Date = ''' + @DateYYMMDD + ''''

     

    Non-clustered ignore-duplicates key on the 3 key fields (CDRFingerPrint, EventType, StartDate)

    Clustered index on #CDRDataProcess (CDRFingerPrint, EventType, StartDate)

     


    Kindest Regards,

    John Burchel (Trainee Developer)
