How to handle a very large dataset

  • I need to find the most recent post date for all the invoices in my table. There are millions of records. Any thoughts?

    select INV_NUM,
           POST_DT
    from (
        select INV_NUM,
               POST_DT,
               row_number() over(partition by INV_NUM order by POST_DT desc) as RowNum
        from IDX_INCOME
        where GRP__2='7'
    ) b
    where b.RowNum=1
    and INV_NUM='0'

  • Without a CREATE TABLE statement and sample data in the form of INSERTs, I can only answer very generically:

    select non_aggregated_column, max(date_column) MaxDate
    from tbl
    group by non_aggregated_column;
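
For anyone who wants to check that the generic GROUP BY form returns the same rows as the ROW_NUMBER() query from the first post, here is a minimal sketch. It is not SQL Server: it assumes Python's built-in sqlite3 module as a stand-in (SQLite 3.25 or later for window functions), a cut-down IDX_INCOME with invented rows, and ISO-formatted date strings so that text comparison matches date order. The INV_NUM='0' filter from the first post is left out so that more than one invoice appears.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here; every row is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IDX_INCOME (INV_NUM TEXT, POST_DT TEXT, GRP__2 TEXT)")
conn.executemany(
    "INSERT INTO IDX_INCOME VALUES (?, ?, ?)",
    [
        ("A1", "2018-01-05", "7"),
        ("A1", "2018-02-20", "7"),
        ("B2", "2018-01-17", "7"),
        ("B2", "2017-12-31", "7"),
        ("C3", "2018-02-01", "9"),  # different group, filtered out below
    ],
)

# Aggregate form: one row per invoice with its latest post date.
group_by_sql = """
    SELECT INV_NUM, MAX(POST_DT) AS MaxDate
    FROM IDX_INCOME
    WHERE GRP__2 = '7'
    GROUP BY INV_NUM
    ORDER BY INV_NUM
"""

# Window-function form, shaped like the query in the first post.
row_number_sql = """
    SELECT INV_NUM, POST_DT
    FROM (
        SELECT INV_NUM, POST_DT,
               ROW_NUMBER() OVER (PARTITION BY INV_NUM ORDER BY POST_DT DESC) AS RowNum
        FROM IDX_INCOME
        WHERE GRP__2 = '7'
    ) b
    WHERE b.RowNum = 1
    ORDER BY INV_NUM
"""

latest_by_group = conn.execute(group_by_sql).fetchall()
latest_by_rownum = conn.execute(row_number_sql).fetchall()
print(latest_by_group)
print(latest_by_group == latest_by_rownum)
```

On this toy data both forms return one row per invoice. That says nothing about speed on millions of rows, which depends on the indexes and data types of the real table.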

  • I know how to find the most recent records, see my query, but it's wicked slow because of the number of records. I would have to send a few million rows of sample data in order to get the same impact.

  • NineIron - Monday, February 26, 2018 10:30 AM

    I know how to find the most recent records, see my query, but it's wicked slow because of the number of records. I would have to send a few million rows of sample data in order to get the same impact.

    Provide an actual execution plan, please.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Pardon my ignorance but, how do I copy then paste the execution plan?

  • NineIron - Monday, February 26, 2018 11:24 AM

    Pardon my ignorance but, how do I copy then paste the execution plan?

    Right-click the plan, choose Save Execution Plan As..., pick a filename, and then attach the file.

  • The optimiser will do this anyway, but if you simplify your query, the index requirements become clearer:

    SELECT MAX(POST_DT)
    FROM IDX_INCOME
    WHERE GRP__2 = '7'
      AND INV_NUM = '0'

    If you don't already have an index on GRP__2 and INV_NUM which also includes POST_DT in the KEY or INCLUDE part, then you might need one.

      

     


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.

    Understanding and using APPLY, (I) and (II) Paul White

    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • See attached.

  • NineIron - Monday, February 26, 2018 12:00 PM

    See attached.

    The query in that execution plan is quite a bit different than what you posted.

    In the execution plan, you're doing a convert on the invoice date.  What is the datatype of that date column?

    You're also searching for an invoice balance of '0', which is a string rather than a numeric so please identify the datatype of the invoice column, as well.

    The only thing that may make this faster is an index on the WHERE criteria and, even then, it may result in an index scan simply because it needs to do a scan to enumerate the rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • NineIron - Monday, February 26, 2018 12:00 PM

    See attached.

    "idx_income" is an odd name for a heap! Why don't you have a clustered index? What's the purpose of this table? What's its daily/weekly cycle of changes?
    And what Jeff said too - this is wildly different from your trivial original query.


  • I apologize for the confusion. I'm trying to get some financial data to tie out and I can't get out of this rat hole. The data is indexed on MRN, INV_NUM, and POST_DT. The data type of every column is nvarchar(255). It's a pain to work with this table, but that's what I'm stuck with.
    Thanx for your help. I'm going to schedule this stuff to run off hours so the time it takes won't impact the user.

  • I'm a bit of a noob, but couldn't he just create computed columns that convert the data to proper types, then create an index on those computed columns and query them?

    Technet article:  https://technet.microsoft.com/en-us/library/ms191250(v=sql.105).aspx

  • Any chance that in your script you can insert the contents of that generic table into a temp table of your own creation, with proper data types and indexes that you create? If your data is not huge and you do it on the same machine, then that may be a good approach.

  • I got some more information from the owner of the table and was able to reduce the number of records and stick them in a temporary table. Then, join the temp table to the other stuff. Now, seconds instead of 3 minutes. Thanx for all the input.

  • NineIron - Thursday, March 1, 2018 4:07 AM

    I got some more information from the owner of the table and was able to reduce the number of records and stick them in a temporary table. Then, join the temp table to the other stuff. Now, seconds instead of 3 minutes. Thanx for all the input.

    Thanks for the feedback.  "Pre-aggregation" and "Divide'n'Conquer" are frequently all that's needed to make monsters behave.  People make the mistake of thinking that "Set Based" means "All in one query" and nothing could be further from the truth.

    --Jeff Moden


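
The pre-aggregation idea from the last few posts can be sketched as follows. Again this assumes Python's built-in sqlite3 as a stand-in for SQL Server. The INVOICE_DETAIL table, the latest_post temp table, the index name, and all rows are hypothetical, since the thread never shows the real tables or the "other stuff" being joined.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; tables and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IDX_INCOME (INV_NUM TEXT, POST_DT TEXT, GRP__2 TEXT);
    CREATE TABLE INVOICE_DETAIL (INV_NUM TEXT, AMOUNT REAL);  -- hypothetical

    INSERT INTO IDX_INCOME VALUES
        ('A1', '2018-01-05', '7'),
        ('A1', '2018-02-20', '7'),
        ('B2', '2018-01-17', '7'),
        ('B2', '2017-12-31', '7'),
        ('C3', '2018-02-01', '9');

    INSERT INTO INVOICE_DETAIL VALUES
        ('A1', 100.0),
        ('B2', 250.5),
        ('C3', 75.0);
""")

# Step 1: reduce the big table once, into a small temp table with an index.
conn.executescript("""
    CREATE TEMP TABLE latest_post AS
        SELECT INV_NUM, MAX(POST_DT) AS MaxDate
        FROM IDX_INCOME
        WHERE GRP__2 = '7'
        GROUP BY INV_NUM;

    CREATE INDEX temp.ix_latest_post ON latest_post (INV_NUM);
""")

# Step 2: join the small pre-aggregated result to the rest of the data.
rows = conn.execute("""
    SELECT d.INV_NUM, d.AMOUNT, l.MaxDate
    FROM INVOICE_DETAIL d
    JOIN latest_post l ON l.INV_NUM = d.INV_NUM
    ORDER BY d.INV_NUM
""").fetchall()

for row in rows:
    print(row)
```

Splitting the work this way keeps each statement simple: the expensive scan of the large table happens once, and the later join only touches the reduced set.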
