Spill Levels?

  • Hi, I am reviewing some code that runs in 26 seconds, but I think it's poorly written.  I have provided it below and am asking for advice on a better way to do what we're doing, which is a multi-threaded select.  I personally feel the data from the inner loops should be stored in temp tables.  I have attached the actual query plan. Am I on the right track?  Any advice is appreciated, and please no bashing, I've inherited this code. 🙂

    DECLARE @Today date = GETDATE()
    DECLARE @EndDueDate date = DATEADD(mm,1,@Today)
    DECLARE @StartDueDate date = DATEADD(yy,-1,@EndDueDate)

    SELECT
    InvoiceNumber
    ,BillCo
    ,OpCo
    ,GroupNum
    ,GroupName
    ,GroupRep
    ,GroupBillMode
    ,BillDueDate
    ,Age
    ,BillStatus
    ,OnHold
    ,DedFreq
    ,OrigPolCount
    ,ActualPolCount
    ,OrigBillAmt
    ,ActualBillAmt
    ,FirstFileDate
    ,LastFileDate
    ,LastFileUser
    ,LastRelDate
    ,LastRelUser
    ,InvoicePrintDate
    ,LockboxFundDate
    ,LockboxFunds
    ,PayCount
    ,PayMethods
    ,AppliedPaymentDate
    ,TotalAppliedAmt
    ,PaymentExtractDate
    ,Processor
    ,@Today AS ReportDate

    FROM
    --Bill dataset; aggregates contract details to bill-level
    (SELECT b.invoice_number AS InvoiceNumber
    ,b.bill_pk --for JOIN, not needed for output
    ,(SELECT branding_line_1 FROM BRANDING WHERE branding_pk = g.branding_fk) AS BillCo
    ,STUFF((SELECT DISTINCT ', ' + c2.operating_company
    FROM BILL_RECORD br2 JOIN [CONTRACT] c2 ON br2.bill_record_pk = c2.bill_record_fk
    WHERE br2.bill_fk = b.bill_pk AND c2.update_status_type_cd <> 'DELETED' AND c2.is_deleted = 0
    FOR XML PATH('') --STRING_AGG() may be another solution, but is unavailable in this version of SQL Server
    ), 1, 2, '') AS OpCo --aggregate of all operating companies on the bill
    ,g.group_number AS GroupNum
    ,g.group_name AS GroupName
    ,r.name AS GroupRep
    ,(SELECT billing_mode_type_description
    FROM BILLING_MODE_TYPE
    WHERE billing_mode_type_cd = g.billing_mode_type_cd) AS GroupBillMode
    ,CONVERT(date, b.bill_due_date, 101) AS BillDueDate
    ,DATEDIFF(day,b.bill_due_date,@Today) AS Age --number of days past DueDate; restrict to unpaid, released?
    ,b.bill_status_type_cd AS BillStatus
    ,b.hold_for_review AS OnHold
    ,(SELECT REPLACE(df.deduction_frequency_short_desc, ' deductions','')
    FROM DEDUCTION_FREQUENCY_TYPE df
    WHERE df.deduction_frequency_cd = g.deduction_frequency_cd) AS DedFreq
    ,SUM(CASE WHEN c.update_status_type_cd <> 'ADDED' THEN 1 ELSE 0 END) AS OrigPolCount
    ,SUM(CASE WHEN c.update_status_type_cd <> 'DELETED' THEN 1 ELSE 0 END) AS ActualPolCount
    ,SUM(c.original_bill_amount) AS OrigBillAmt
    ,SUM(c.actual_bill_amount) AS ActualBillAmt
    ,CONVERT(date, (SELECT MIN(add_date) FROM FILE_UPLOAD
    WHERE bill_fk = b.bill_pk AND status_at_time_of_upload <> 'UNRELEASED'), 101) AS FirstFileDate
    ,CONVERT(date, (SELECT MAX(add_date) FROM FILE_UPLOAD
    WHERE bill_fk = b.bill_pk AND status_at_time_of_upload <> 'UNRELEASED'), 101) AS LastFileDate
    ,REPLACE((SELECT add_user FROM FILE_UPLOAD
    WHERE bill_fk = b.bill_pk AND add_date = (SELECT MAX(add_date) FROM FILE_UPLOAD WHERE bill_fk = b.bill_pk)),
    '@selmanco.com','') AS LastFileUser
    ,CONVERT(date, (SELECT MAX(add_date) FROM BILL_RELEASE
    WHERE bill_fk = b.bill_pk), 101) AS LastRelDate
    ,CASE
    WHEN (SELECT release_user FROM BILL_RELEASE
    WHERE bill_fk = b.bill_pk AND add_date = (SELECT MAX(add_date) FROM BILL_RELEASE WHERE bill_fk = b.bill_pk))
    NOT LIKE '%selman%' THEN 'Autorelease'
    ELSE REPLACE((SELECT release_user FROM BILL_RELEASE
    WHERE bill_fk = b.bill_pk AND add_date = (SELECT MAX(add_date) FROM BILL_RELEASE WHERE bill_fk = b.bill_pk)),
    '@selmanco.com','')
    END AS LastRelUser
    ,CONVERT(date, (CASE WHEN b.bill_due_date < '10/01/2019'
    THEN NULL
    ELSE b.invoice_print_date
    END), 101) AS InvoicePrintDate --date bills are sent to mainframe for printing

    FROM BILL b
    LEFT JOIN BILL_RECORD br ON b.bill_pk = br.bill_fk
    LEFT JOIN [CONTRACT] c ON br.bill_record_pk = c.bill_record_fk
    LEFT JOIN [GROUP] g ON g.group_pk = b.group_fk
    LEFT JOIN REP r ON r.rep_pk = g.rep_fk

    WHERE b.is_deleted = 0
    AND br.is_deleted = 0
    AND c.is_deleted = 0
    AND g.is_deleted = 0
    AND g.admin_location_cd = 'KEENE'

    GROUP BY b.invoice_number
    ,b.hold_for_review
    ,b.bill_pk
    ,g.branding_fk
    ,g.group_number
    ,g.group_name
    ,g.billing_mode_type_cd
    ,r.name
    ,b.bill_due_date
    ,b.file_upload_date
    ,b.invoice_print_date
    ,b.bill_status_type_cd
    ,g.deduction_frequency_cd

    ) AS bds --end bill dataset

    LEFT JOIN

    /*Payment dataset; not 100% clear on the role of some of the fields in p and pd;
    some may be redundant and/or not what they appear to be? */
    (SELECT pd.bill_fk --for JOIN, not needed for output
    ,COUNT(p.payment_pk) AS PayCount --number of payment sources applied to bill
    ,STUFF((SELECT DISTINCT ', ' + p2.payment_method_cd
    FROM PAYMENT_DETAIL pd2 JOIN PAYMENT p2 ON p2.payment_pk = pd2.payment_fk
    WHERE pd2.bill_fk = pd.bill_fk
    FOR XML PATH(''))
    , 1, 2, '') AS PayMethods
    ,CONVERT(date, MAX(pd.applied_payment_date), 101) AS AppliedPaymentDate
    ,SUM(CASE WHEN p.payment_method_cd = 'Credit Suspense'
    THEN -1*pd.applied_amount
    ELSE pd.applied_amount
    END) AS TotalAppliedAmt
    ,CONVERT(date, MAX(p.payment_extract_datetime), 101) AS PaymentExtractDate
    ,STUFF((SELECT DISTINCT ', ' + REPLACE(p2.processor_user,'@selmanco.com','')
    FROM PAYMENT_DETAIL pd2 JOIN PAYMENT p2 ON p2.payment_pk = pd2.payment_fk
    WHERE pd2.bill_fk = pd.bill_fk
    FOR XML PATH(''))
    , 1, 2, '') AS Processor

    FROM PAYMENT p
    JOIN PAYMENT_DETAIL pd ON p.payment_pk = pd.payment_fk

    WHERE pd.is_deleted = 0
    AND p.is_deleted = 0
    AND p.payment_status_cd = 'PROCESSED'

    GROUP BY pd.bill_fk

    ) AS pds --end payment dataset

    ON bds.bill_pk = pds.bill_fk --should be 1-to-1

    LEFT JOIN

    /* Lockbox dataset */
    (SELECT x.invoice, MAX(x.deposit_date) AS LockboxFundDate, SUM(x.paid_amount) AS LockboxFunds
    FROM (SELECT CASE WHEN r.lockbox_reconcile_pk IS NULL
    THEN a.generated_invoice_no
    ELSE r.generated_invoice_no
    END AS invoice
    ,CONVERT(date, a.deposit_date, 101) AS deposit_date
    ,CASE WHEN r.is_deleted = 0
    THEN r.paid_amount
    ELSE NULL
    END AS paid_amount
    FROM LOCKBOX_IN_ARCHIVE a
    LEFT JOIN LOCKBOX_RECONCILE r ON a.lockbox_in_archive_pk = r.lockbox_in_archive_fk) AS x
    GROUP BY invoice
    ) AS lbds --end lockbox dataset

    ON bds.InvoiceNumber = lbds.invoice

    --WHERE
    --(OpCo LIKE '%VC%' OR OpCo LIKE '%CB%')
    --AND InvoiceNumber = '368_20190601'
    --AND GroupNum IN ('DYQ')
    --AND BillStatus NOT IN ('UNRELEASED', 'PAID')
    --AND BillDueDate BETWEEN @StartDueDate AND @EndDueDate

    ORDER BY InvoiceNumber
    ;
  • I think you forgot the execution plan.  Without that, it is incredibly hard to say with any certainty what would help.

    Indexes may help, or you may already have the ones you need.  If you don't have any PK/FK constraints in place, adding them may help.  Rewriting the nested selects may help.  It is hard to say from the query alone, especially with no data.
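
    For illustration, declaring one of the relationships the query implies might look like the sketch below.  The table and column names come from the posted query, but whether the relationship actually holds in your data is an assumption; verify it before creating anything.

        -- Hypothetical sketch: declare the BILL_RECORD -> BILL relationship
        -- implied by the query's join, so the optimizer can trust it.
        -- Confirm the data is clean before adding the constraint.
        ALTER TABLE dbo.BILL_RECORD
            ADD CONSTRAINT FK_BILL_RECORD_BILL
            FOREIGN KEY (bill_fk) REFERENCES dbo.BILL (bill_pk);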

    Converting the nested selects MAY make things faster, or it might even make things slower.  If the tables in the nested selects are already finely tuned for these queries (good indexes and constraints), you may only slow things down by pulling the data into temp tables or table variables.
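
    If you do want to test the temp table approach, a minimal sketch for the payment dataset is below, assuming the table and column names from the posted query.  Only testing against your real data will tell you whether it is actually faster.

        -- Sketch: materialize the payment aggregates once, then LEFT JOIN
        -- the main query to #PaymentDataset instead of the derived table.
        IF OBJECT_ID('tempdb..#PaymentDataset') IS NOT NULL
            DROP TABLE #PaymentDataset;

        SELECT pd.bill_fk
            ,COUNT(p.payment_pk) AS PayCount
            ,CONVERT(date, MAX(pd.applied_payment_date)) AS AppliedPaymentDate
            ,SUM(CASE WHEN p.payment_method_cd = 'Credit Suspense'
                      THEN -1 * pd.applied_amount
                      ELSE pd.applied_amount
                 END) AS TotalAppliedAmt
        INTO #PaymentDataset
        FROM PAYMENT p
        JOIN PAYMENT_DETAIL pd ON p.payment_pk = pd.payment_fk
        WHERE pd.is_deleted = 0
          AND p.is_deleted = 0
          AND p.payment_status_cd = 'PROCESSED'
        GROUP BY pd.bill_fk;

        -- A clustered index on the join key supports the later LEFT JOIN.
        CREATE CLUSTERED INDEX CX_PaymentDataset ON #PaymentDataset (bill_fk);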

    It would also be helpful if you qualified the columns you are using with the alias they correspond to.  This makes it easier to tell if you are pulling all of the data you require OR if you might be pulling data that is simply discarded.
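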

    Depending on how much data you are pulling, that performance MAY be the best that you can get.  For example, let's say you are pulling back 10 million rows of data at 1 KB per row.  That is 10,000,000 KB of data, OR 10,000 MB of data, OR 10 GB of data (approximately... I am rounding 1024 to 1000 to make the math easier to visualize).

    There are a lot of factors in play that could be causing it to be "slow", and maybe 26 seconds is as good as you can get.  Maybe there are some indexes and constraints you can add to get better performance.  Maybe your statistics are out of date and should be updated.  Maybe your indexes could use some include columns.  There are a lot of unknowns on our side.
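
    Hypothetical examples of those last two points follow.  The object names are placeholders based on the posted query, and the key/include choice is illustrative, not a recommendation; let the actual plan justify any index you add.

        -- Refresh potentially stale statistics on a heavily hit table:
        UPDATE STATISTICS dbo.FILE_UPLOAD WITH FULLSCAN;

        -- An index with INCLUDE columns can cover the FILE_UPLOAD lookups
        -- and avoid key lookups; adjust to what the plan shows you.
        CREATE NONCLUSTERED INDEX IX_FILE_UPLOAD_bill_fk
            ON dbo.FILE_UPLOAD (bill_fk, add_date)
            INCLUDE (add_user, status_at_time_of_upload);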

    One thing that MAY help is to change your data pulls from the FILE_UPLOAD table into a JOIN, rather than pulling data from that table multiple times per row.  A sketch of the pattern is below.
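
    This is a sketch only, built from the table and column names in the posted query: the grouped derived table replaces the MIN/MAX subqueries, and the OUTER APPLY replaces the latest-uploader subquery.  The filters mirror the original (the MIN/MAX exclude 'UNRELEASED' uploads; the latest-user lookup does not), but verify that against the real requirements.

        -- Aggregate FILE_UPLOAD once per bill instead of running four
        -- correlated subqueries for every output row.
        SELECT b.bill_pk
            ,fud.FirstFileDate
            ,fud.LastFileDate
            ,REPLACE(lastfu.add_user, '@selmanco.com', '') AS LastFileUser
        FROM BILL b
        LEFT JOIN
        (
            SELECT bill_fk
                ,MIN(add_date) AS FirstFileDate
                ,MAX(add_date) AS LastFileDate
            FROM FILE_UPLOAD
            WHERE status_at_time_of_upload <> 'UNRELEASED'
            GROUP BY bill_fk
        ) AS fud ON fud.bill_fk = b.bill_pk
        OUTER APPLY
        (
            -- Most recent upload per bill, regardless of status,
            -- matching the original LastFileUser subquery.
            SELECT TOP (1) fu.add_user
            FROM FILE_UPLOAD fu
            WHERE fu.bill_fk = b.bill_pk
            ORDER BY fu.add_date DESC
        ) AS lastfu;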

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
