Fill down empty cells with values from a previous non-empty row

  • Hi, I have written code to fill empty rows with values from the previous non-empty row; the code is attached below. It works without any issues, but it is taking a long time because the data volume is fairly high, around 700k rows. I have created an index on ID, but it still runs slow. Any help on this is much appreciated. Thanks!

     

    CREATE TABLE #sales_data
    (
        ID BIGINT NOT NULL,
        CustomerID VARCHAR(5) NULL,
        CompanyName VARCHAR(40) NULL,
        MonthYear VARCHAR(10) NULL,
        Sales FLOAT
    )

    insert into #sales_data
    Select 1,'QUICK','QUICK Cozinha','Oct-2017',1208.32 UNION ALL
    Select 2,'EASTC','Eastern Connection','Jan-2018',320.99 UNION ALL
    Select 3,'ISLAT','Island Trading','Nov-2016',403.20 UNION ALL
    Select 4,'','','Apr-2016',749.06 UNION ALL
    Select 5,'','','Feb-2018',660.56 UNION ALL
    Select 6,'CONSH','Consolidated Holdings','Oct-2017',1640.11 UNION ALL
    Select 7,'','','Nov-2016',480.37 UNION ALL
    Select 8,'','','Feb-2018',547.20 UNION ALL
    Select 9,'','','Dec-2016',599.22 UNION ALL
    Select 10,'','','May-2017',1668.87


    select * from #sales_data where CustomerID <> ''
    union
    select y.ID, x.CustomerID, x.CompanyName, y.MonthYear, y.Sales
    from #sales_data as x
    join
    (
        select t1.ID, max(t2.ID) as MaxID, t1.MonthYear, t1.Sales
        from (select * from #sales_data where CustomerID = '') as t1
        join (select * from #sales_data where CustomerID <> '') as t2 on t1.ID > t2.ID
        group by t1.ID, t1.MonthYear, t1.Sales
    ) as y on x.ID = y.MaxID
    order by ID;
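
    For reference, with the sample data the fill-down makes IDs 4 and 5 pick up ISLAT / Island Trading (from ID 3), and IDs 7 through 10 pick up CONSH / Consolidated Holdings (from ID 6).
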
  • The best way to troubleshoot performance issues is to look at the execution plan.  Could you post the execution plan for your query?

    When I run the above, it is pretty darn quick, but it gets a touch faster after adding a clustered index on the ID column.  A nonclustered index would also work here.
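
    A minimal sketch of that index, assuming ID is unique as it is in the sample data (the index name is just illustrative):

    -- Assumes ID is unique; drop UNIQUE if it is not
    CREATE UNIQUE CLUSTERED INDEX CIX_sales_data_ID ON #sales_data (ID);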

    One performance boost you may get is by changing your UNION to a UNION ALL.  With the sample data it works out to the same result, though I am not 100% certain it will with your real data.
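
    The only change is the set operator between the two branches; this is your query with just that one keyword swapped.  It should be safe here because the first branch returns only non-blank rows and the second returns only filled-in blank rows, so the branches cannot overlap:

    select * from #sales_data where CustomerID <> ''
    union all
    select y.ID, x.CustomerID, x.CompanyName, y.MonthYear, y.Sales
    from #sales_data as x
    join
    (
        select t1.ID, max(t2.ID) as MaxID, t1.MonthYear, t1.Sales
        from (select * from #sales_data where CustomerID = '') as t1
        join (select * from #sales_data where CustomerID <> '') as t2 on t1.ID > t2.ID
        group by t1.ID, t1.MonthYear, t1.Sales
    ) as y on x.ID = y.MaxID
    order by ID;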

    I'm sure other experts will offer more advice, but the above is a quick tweak that I see works with the sample data and improves performance.  Another thing that would improve performance is reducing the number of times you need to look into the table (i.e. reducing the number of FROMs and JOINs).  I have not looked hard enough at the data to find a better set of JOINs or FROMs to use yet.

     

    EDIT - additional suggestion - do you need the ORDER BY?  Is the final order of the data important?  If not, removing that ORDER BY will help performance a bit IF you can do UNION ALL.  Doing UNION ALL removes the distinct sort, and removing the ORDER BY removes the sort operation completely, which will make things a bit faster.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Hello, thanks for the response. I updated the query to select only the columns I need and to use UNION ALL instead of UNION, but it is still very slow! I have attached the execution plan; it looks like the aggregation or the inner join is the major cost.

  • Looking at the execution plan, I am thinking that adding an index on SubscriberID may help, as you are using it in the WHERE clause.

    Your estimated row counts are also incredibly high on some of those operators.  Are your statistics up to date?
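
    Something along these lines; the table and index names here are hypothetical, since the real table isn't posted in this thread:

    -- Hypothetical names: substitute your actual table and filter column
    CREATE NONCLUSTERED INDEX IX_SubscriberID ON dbo.YourTable (SubscriberID);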


  • Yes, statistics are up to date. I pulled the inner queries out into #temp tables and those load in a fraction of a second! I even added an index on the Sub ID, but it is still dead slow. It's due to the greater-than join; I'm thinking the logic needs to be rewritten, but I'm not sure of any other way. Thanks!

  • One approach that MAY work for you would be to break it up into multiple steps.

    Since pulling the inner queries into a temp table improves performance, why not just use that instead of your inner query?

    What I am assuming you mean is that you are essentially doing:

    CREATE TABLE #temp (ID BIGINT, MaxID BIGINT, MonthYear VARCHAR(10), Sales FLOAT)
    INSERT INTO #temp
    select t1.ID,max(t2.ID) as MaxID, t1.MonthYear, t1.Sales
    from (select * from #sales_data where CustomerID = '') as t1
    join (select * from #sales_data where CustomerID <> '') as t2 on t1.ID > t2.ID
    group by t1.ID,t1.MonthYear, t1.Sales

    If so, I would try doing that, toss an index on MaxID, and then change your final query to:

    SELECT * FROM #sales_data where CustomerID <> ''
    UNION ALL
    SELECT y.ID, x.CustomerID, x.CompanyName, y.MonthYear, y.Sales
    FROM #sales_data AS x
    JOIN #temp AS y ON x.ID = y.MaxID;
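
    For the index on MaxID, a minimal sketch to run right after populating #temp (the index name is just illustrative):

    -- Speeds up the ON x.ID = y.MaxID join into the temp table
    CREATE NONCLUSTERED INDEX IX_temp_MaxID ON #temp (MaxID);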

    I've found that SOMETIMES you get better performance by tossing things into temp tables rather than pulling from the live tables.  Not always, and I've noticed the best performance boost when working with linked servers or cross-database queries, but it may help in your case too.

    Also, it looks like you only attached the estimated execution plan, not the actual one.  I imagine you are cancelling the query and can't wait for it to finish, but looking at the estimated plan, it thinks your JOIN is going to bring back 47.2 billion rows, while you said your table only has 700 thousand rows.  Fixing that LAZY SPOOL is likely going to be your best bet, and to fix it, adding indexes will be your friend.  The above two-step approach may work for you, but if you absolutely need it in a single query, you will likely need to add an index on CustomerID.
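
    Against the sample table, that CustomerID index might look like this (name illustrative; on your real table, index the real filter column).  Including ID in the key lets the MAX(ID) aggregate be answered from the index alone:

    -- Covers the CustomerID filter; ID in the key supports MAX(ID)
    CREATE NONCLUSTERED INDEX IX_sales_data_CustomerID ON #sales_data (CustomerID, ID);

    And if rewriting the logic is an option, a running MAX window function avoids the triangular t1.ID > t2.ID join entirely by reading the data in a single ordered pass.  This is only a sketch against the sample data, assuming ID is unique and a blank CustomerID marks the rows to fill:

    ;WITH filled AS
    (
        SELECT ID, MonthYear, Sales,
               -- NULL on blank rows, so the running MAX carries the last non-blank ID forward
               MAX(CASE WHEN CustomerID <> '' THEN ID END)
                   OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING) AS SrcID
        FROM #sales_data
    )
    SELECT f.ID, s.CustomerID, s.CompanyName, f.MonthYear, f.Sales
    FROM filled AS f
    JOIN #sales_data AS s ON s.ID = f.SrcID  -- rows before the first non-blank row would be dropped here
    ORDER BY f.ID;

    Non-blank rows join back to themselves (their own ID is the running MAX at that point), so this returns the full result set in one statement with no UNION needed.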


  • Reading the responses already answered my questions.
