Batching and index question

  • I am trying to compare data from two quarters that I have.

    Table structure is the same between both tables that I am comparing with the PK being the same between each table as well.

    The following is the batch "procedure" that I have "created" based on other threads from the forums.

    declare @t int

    set @t=1

    while @t>0

    begin

    begin transaction t1

    set rowcount 1000

    update q2204 set q22004.batch = q12004.batch, q22004.updated = 1

    from q2204 inner join q12004

    on q22004.phone = q12004.phone

    and updated = 0 and q2204.batch is null

    set @t=@t-1

    commit transaction t1

    waitfor delay '00:00:0.2'

    end

    When I run through this, even for a rowcount of 1, I'm getting no results back for an inordinate amount of time.  (a single record being updated via the above took over 11 minutes to accomplish).

    The column "updated" has an index and is a character field (currently housing only 0 and 1 based on a yes/no type scenario.

    Any insight as to what I am doing incorrect here?

    Thanks

  • Hello,

    first thing I can see is that you have typos there.. I'm not sure whether it is just in the post or also in the real query, but one of the tables is sometimes referenced as q2204 and sometimes as q22004. In certain cases (if both tables exist), this can cause such huge delays... though here I'd suppose that it would rather crash than run indefinitely... but maybe? Otherwise, I'm at a loss what precisely should  the procedure do - could you elaborate on that, please? A description in words, what do you want to achieve by the code, would be fine.

    cheers, Vladan

  • Hah, sorry for the typo's I was in a bit of a hurry.

    What I get is quarterly data that we call off of.  Each time I send a bunch of leads out I assign them a batch so that we can determine ROI's from the lists.

    What I have in my procedure is Quarter 1 data and Quarter 2 data.  The quarter 1 data has had data sent out from it already, and when I send data out from the Quarter 2 database, I want to make sure I don't end up sending out the same information.  (hopefully this makes sense).

    Therefore what I am attempting to accomplish is to compare the Primary keys (Phone numbers in each table) to each other and if the record hasn't been updated already, then update it.  I also put in a "check" field so that I can know whether or not the record has at least been processed.

    For whatever reason though what I am running into is a massive slow-down on my system with this proc.  A single update for 1 record takes 17 minutes roughly and I would assume that if the tables were indexed correctly and the procedure was written correctly that time would be infinitely less, although I could always be wrong.

  • Try changing

    and updated = 0 and q2204.batch is null

    to

    where q1204.updated = 0 and q2204.batch is null

    also why q2204.batch is null?

  • Just curious... are any of the joined "tables" really views?  That's what happened to me at work... just did a SELECT * to see what I got back from a "table" someone told me to use... it took 9 HOURS (yup, I'm patient!) because the view was written so poorly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No, they are just basic tables (no FK's,etc in them).

    I had debated on trying to see if an indexed view would have halped speed things along, but am still stumped on why the problem still occurs.

    Granted I am having to "compare" a 22 Million row table with a 40 Million row table, but even so the comparisons in my understanding shouldn't take as long as they are taking.

    But of those rows that I am comparing both columns are primary keys of each table with indexes built on the other columns that are involved in the query.  I'd have thought that these would enable the query to run more efficiently than what I have seen so far.

  • Thanks for the explanation... I still don't understand everything, but I'm beginning to :-). In short, you want to find all records in table q22004, that are already present (with the same PK, phone) in table q12004, and for these copy the value of "batch" field from q12004 to q22004 + set updated = 1. Reason for this is that you only want to process ("send out") the new phone numbers... Fine, but what do you plan to do next quarter? Will you compare just q22004 and q32004, or do you need to compare with q12004 as well? Maybe the best solution would be to do something absolutely different... but I can't tell that, since I don't know the data structure and processes.

    Well, about the SQL... I suggest to rewrite the comparison part as a select first - not update - and check the execution plan in QA. Does it use indexes? If not, then you can either force usage of an index with a hint, or create a new index. What about trying a composite index on the columns batch + updated, instead of the index on Updated only? If the column has only 2 or 3 possible values, index doesn't help much with millions of rows. I also second bdloving's suggestion to rephrase the query so that the "q22004.updated = 0 AND q22004.batch is null" appears in the WHERE clause. Also, since both tables have the same structure, you need to qualify "updated" with a table name/alias (but that could be a typo in your post).

    HTH, Vladan

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply