How to speed up Running Average View

  • Hi, I was wondering if someone could help me figure out why my code has all of a sudden started executing so slowly. I have two tables of interest for this average, 'methods' and 'readings'. They are linked by a one-to-many relationship; that is, one method can have many readings. There is an index on the reading sample date. The column I want to average is called 'value' (before I get flamed, I did not name this column). In general, the table structure is as follows:

    Methods
        MethodID INT (PK)
        MethodCode NVARCHAR

    Readings
        ReadingID INT (PK)
        SamplingDate DATETIME (indexed)
        Value FLOAT
        MethodID INT (FK, indexed)
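
    In script form, a stripped-down version of that structure looks roughly like this (exact lengths, nullability, and the MethodID index name are approximate):

    -- Sketch of the two tables described above; types/lengths are best guesses.
    CREATE TABLE dbo.Methods
    (
        MethodID   INT           NOT NULL PRIMARY KEY,
        MethodCode NVARCHAR(255) NOT NULL
    );
    CREATE TABLE dbo.Readings
    (
        ReadingID    INT      NOT NULL PRIMARY KEY,
        SamplingDate DATETIME NOT NULL,
        Value        FLOAT    NOT NULL,
        MethodID     INT      NOT NULL REFERENCES dbo.Methods (MethodID)
    );
    CREATE NONCLUSTERED INDEX IDX_Readings_SamplingDate ON dbo.Readings (SamplingDate);
    CREATE NONCLUSTERED INDEX IDX_Readings_MethodID     ON dbo.Readings (MethodID);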

    I also made a view called View_MethodReadings, defined as follows:

    CREATE VIEW [dbo].[View_MethodReadings] AS
    SELECT
        R.Value,
        R.SamplingDate,
        M.MethodCode
    FROM
        Readings R
        JOIN Methods M
            ON R.MethodID = M.MethodID
    GO

    Anyhow, that's the base data. I can provide a script to generate these objects if you wish, but I will have to strip down their structure for demonstration purposes. Here is my running-average view, which returns the running average for each date in the readings table for methods of type 'MF_UCONC':

    CREATE VIEW [dbo].[View_RunningAverage_UConc] AS
    SELECT
        a.SamplingDate,
        AverageUConc = AVG(b.value)
    FROM
        View_MethodReadings a
        LEFT JOIN View_MethodReadings b
            -- Currently displays the running 1 hour average
            ON b.SamplingDate BETWEEN DATEADD(hh,-1,a.SamplingDate) AND a.SamplingDate
    WHERE
        a.MethodCode = 'MF_UCONC' AND b.MethodCode = 'MF_UCONC'
    GROUP BY
        a.SamplingDate
    GO

    When I first started querying this view it would run fairly quickly; now it's slowing down. I always query for about one day's worth of data, which is about 1,400 records. The readings table has about 200,000 readings and is growing rapidly.

    Any ideas what is going wrong here? I will probably move this to a stored procedure, since I could get better performance gains from it.

  • Check the execution plan... is it taking advantage of the index on the SamplingDate column?

    Are the statistics on the base tables up to date? If performance is decreasing as time goes by, chances are the query is doing a full table scan.

    Last but not least, adding a view is not going to help with performance.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Statistics on the base table are up to date; that didn't fix the issue.

    The view is just so I don't have to type that join all of the time; I can't see a simple view like this hindering performance. I also rewrote the view's join as a CTE and tried it that way. Same thing.

    I also thought maybe it was not using the index. Is there a way to force a view to use an index? I remember something about a HINT keyword but have never had to use it.

    I attached the execution plan. There is one particular operation that takes up 47%, and it seems to be the MethodID seek, which is indexed. It also seeks the SamplingDate index, which takes up 26%. The sampling dates are created every minute by the measuring device, so I kind of assumed an index wouldn't do much on this column. Maybe I should change the fill factor of the index? I don't know; I have never had to do this. I have just created the indexes with basic CREATE statements like the one below:

    CREATE NONCLUSTERED INDEX [IDX_Readings_SamplingDate] ON [dbo].[Readings] ([SamplingDate] ASC)

    Is there maybe a better way to create the index, given some knowledge of the spacing of the sampling dates?

  • Try adding the VALUE column to your index as an included column. As it stands, if the query uses the nonclustered index at all, it still has to make a bookmark lookup to go find the VALUE. If VALUE were included in the index, you'd save yourself that rather expensive (and often time-consuming) operation.

    I'm assuming it's not using the index, since it would have to keep doing these lookups. It's likely "cheaper" (from an execution-plan perspective) to load the whole table and manually sort it than to use the index and do the lookups.
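
    For example, something like this (a sketch using the index name posted above):

    -- Rebuild the SamplingDate index so Value travels in the leaf level,
    -- which avoids the bookmark lookup back into the base table.
    DROP INDEX IDX_Readings_SamplingDate ON dbo.Readings;
    CREATE NONCLUSTERED INDEX IDX_Readings_SamplingDate
        ON dbo.Readings (SamplingDate)
        INCLUDE (Value);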

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • loki1049 (2/24/2012)


    I also thought maybe it was not using the index. Is there a way to force a view to use an index? I remember something about a HINT keyword but have never had to use it.

    For testing purposes I would write the query against the base tables and use an index hint on that particular one... here is the syntax:

    FROM table_name WITH (INDEX (index_name))
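
    With the names from this thread, a test query against the base tables might look something like this (the index name is the one posted earlier; treat it as a sketch):

    SELECT R.Value, R.SamplingDate, M.MethodCode
      FROM dbo.Readings R WITH (INDEX (IDX_Readings_SamplingDate)) -- force the SamplingDate index
      JOIN dbo.Methods M
        ON R.MethodID = M.MethodID;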

    _____________________________________
    Pablo (Paul) Berzukov

  • {EDIT} Please see my last coded post on this thread for a better answer.

    The real problem is that you don't have the correct indexes on the table. Since the current indexes don't support the query, it's forced to do a Clustered Index Scan on "b" for every row in "a", which is physically materialized in a Table Spool. That's the long-winded way of saying you have an "Accidental Cross Join" in the code.

    Let's check it out. Let's build 200,000 rows across a year of dates (about 548 rows per day) across 5 different MethodID's and only select where the MethodID of "a" and "b" = "3".

    Soooooo... let's build the test table from the description given. As always, the details are in the comments in the code. Please read them. This code includes the 3 indexes indicated in the original problem definition.

    --===== Identify a nice, safe place to do this that everyone has
    USE tempdb
    ;
    --=====================================================================================================================
    --      Create the test table and populate it with data on the fly.
    --      This takes only several seconds to run.
    --=====================================================================================================================
    --===== Conditionally drop the test table to make reruns easier in SSMS.
         -- Don't change this or you take the chance of dropping your real table.
         -- We need to do this as a real table because a VIEW cannot be made on a Temp Table.
    IF OBJECT_ID('tempdb.dbo.Readings','U') IS NOT NULL DROP TABLE tempdb.dbo.Readings;
    GO
    --===== Create the data and build the table on the fly.
    WITH
    cteGenData AS
    (
     SELECT TOP 200000
            SamplingDate = RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2011','2012') + CAST('2011' AS DATETIME),
            Value        = RAND(CHECKSUM(NEWID())) * 100,
            MethodID     = ABS(CHECKSUM(NEWID())) % 5 + 1
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    )
    SELECT ReadingID    = IDENTITY(INT,1,1),
           SamplingDate = ISNULL(SamplingDate,0), --The ISNULL here makes a NOT NULL column during the SELECT/INTO.
           Value,
           MethodID
      INTO dbo.Readings
      FROM cteGenData
     ORDER BY SamplingDate
    ;
    --===== Create the PK (which is also an index in this case)
         -- and the two indexes given by the OP.
    ALTER TABLE dbo.Readings
      ADD CONSTRAINT PK_Readings
          PRIMARY KEY CLUSTERED (ReadingID)
    ;
    CREATE INDEX IX_Readings_SamplingDate
        ON dbo.Readings (SamplingDate)
    ;
    CREATE INDEX IX_Readings_MethodID
        ON dbo.Readings (MethodID)
    ;

    Now, let's build the view. Read the comments!

    GO
    --===== I don't reference the Methods table here because
         -- the Methods table isn't part of the problem.
         -- I just want to limit this all to the problem.
    CREATE VIEW dbo.View_MethodReadings
    AS
    SELECT
        a.SamplingDate,
        AverageUConc = AVG(b.value)
    FROM
        dbo.Readings a
        LEFT JOIN dbo.Readings b
            -- Currently displays the running 1 hour average
            ON b.SamplingDate BETWEEN DATEADD(hh,-1,a.SamplingDate) AND a.SamplingDate
    WHERE
        a.MethodID = '3' AND b.MethodID = '3'
    GROUP BY
        a.SamplingDate
    ;
    GO

    Now, let's lookup just one day of data from the view and see what happens.

    SELECT *
      FROM dbo.View_MethodReadings
     WHERE SamplingDate >= '1 Jun 2011'
       AND SamplingDate <  '2 Jun 2011'
    ;

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.

    (115 row(s) affected)
    Table 'Worktable'. Scan count 40262, logical reads 80755, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Readings'. Scan count 3, logical reads 856, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 2032 ms, elapsed time = 2183 ms.

    If we take a look at the execution plan, we can see two real nice index seeks feeding a nicely sorted Merge Join, but it does a Table Spool against the clustered index that looks at a total of almost 3,000,000 internal rows. That's 15 times the original number of rows in the table. Any guesses why the code is slow? 😉

    The problem is that we've wasted a perfectly good Clustered Index as a PK. We really need a Clustered Index on the SamplingDate column to keep this from being an "Accidental Cross Join". Following best practices for a Clustered Index, it needs to be "ever increasing", which the SamplingDate will do nicely for us, but it also needs to be UNIQUE. Since there's no guarantee the SamplingDate will be unique, we have to add the IDENTITY column to the Clustered Index.

    With all of that in mind, let's drop the two indexes on the columns we just spoke of and replace them with a single Clustered Index that follows the rules of being "ever increasing" and "unique". Like this...

    --===== Let's drop two indexes and change them out for a better index
    ALTER TABLE dbo.Readings
     DROP CONSTRAINT PK_Readings
    ;
    DROP INDEX dbo.Readings.IX_Readings_SamplingDate
    ;
    --===== Now, let's build a better mouse trap which combines
         -- the two indexes we just dropped.
    CREATE UNIQUE CLUSTERED INDEX IX_Readings_SamplingDate_ReadingID
        ON dbo.Readings (SamplingDate,ReadingID)
    ;

    If you absolutely insist on having a PK even though this table doesn't actually need one, there's only one candidate for the PK...

    --===== If you insist, you can also have a PK even though
         -- it's not necessary for this table.
    ALTER TABLE dbo.Readings
      ADD CONSTRAINT PK_Readings
          PRIMARY KEY NONCLUSTERED (ReadingID)
    ;

    Now when we run the same code as we did before, we get lightning performance...

    SELECT *
      FROM dbo.View_MethodReadings
     WHERE SamplingDate >= '1 Jun 2011'
       AND SamplingDate <  '2 Jun 2011'
    ;

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.

    (115 row(s) affected)
    Table 'Readings'. Scan count 116, logical reads 519, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 4 ms.

    If we take a look at the execution plan, we see why. The Table Spool that was created (as a Cross Join) and the Clustered Index Scan are gone! In fact, the "fat" arrows in the previous execution plan have ALL been reduced in size.

    Be advised that this still has an "Accidental Cross Join" in it, formed by the Nested Loop in this case, but it's a heck of a lot smaller than what we had before. That notwithstanding, even this method will slow to a crawl when you start to reach rowcounts like 2 million, because the Clustered Index Scan will come back, as will the Table Spool. :sick: I'm still working on trying to keep that from happening.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • {EDIT} Please see my last coded post on this thread for a better answer.

    Ah... got it. Let's rebuild the test using 2,000,000 rows and I'll show you. Read the comments about the indexes in case INSERTs start to time out.

    --===== Identify a nice, safe place to do this that everyone has
    USE tempdb
    ;
    --=====================================================================================================================
    --      Create the test table and populate it with data on the fly.
    --      This takes only several seconds to run.
    --=====================================================================================================================
    --===== Conditionally drop the test table to make reruns easier in SSMS.
         -- Don't change this or you take the chance of dropping your real table.
         -- We need to do this as a real table because a VIEW cannot be made on a Temp Table.
    IF OBJECT_ID('tempdb.dbo.Readings','U') IS NOT NULL DROP TABLE tempdb.dbo.Readings;
    GO
    --===== Create the data and build the table on the fly.
    WITH
    cteGenData AS
    (
     SELECT TOP 2000000
            SamplingDate = RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2011','2012') + CAST('2011' AS DATETIME),
            Value        = RAND(CHECKSUM(NEWID())) * 100,
            MethodID     = ABS(CHECKSUM(NEWID())) % 5 + 1
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    )
    SELECT ReadingID    = IDENTITY(INT,1,1),
           SamplingDate = ISNULL(SamplingDate,0), --The ISNULL here makes a NOT NULL column during the SELECT/INTO.
           Value,
           MethodID
      INTO dbo.Readings
      FROM cteGenData
     ORDER BY SamplingDate
    ;
    --===== Here's the better clustered index
    CREATE UNIQUE CLUSTERED INDEX IX_Readings_Composite01
        ON dbo.Readings (SamplingDate,ReadingID)
    ;
    --===== Here's the index for the MethodID. It INCLUDEs
         -- the Value column, which keeps the query from
         -- returning to a Clustered Index Scan to find the Value.
         -- As a sidebar, if you ever run into TIMEOUTs on
         -- inserts, it will be because of Extent Splits on
         -- this index. Try rebuilding it with a FILLFACTOR
         -- of 70. That might help a bit.
    CREATE INDEX IX_Readings_MethodID
        ON dbo.Readings (MethodID)
       INCLUDE (Value) --WITH FILLFACTOR = 70
    ;
    --===== If you insist, you can also have a PK even though
         -- it's not necessary for this table.
    ALTER TABLE dbo.Readings
      ADD CONSTRAINT PK_Readings
          PRIMARY KEY NONCLUSTERED (ReadingID)
    ;

    If we run the original code against that...

    SELECT *
      FROM dbo.View_MethodReadings
     WHERE SamplingDate >= '1 Jun 2011'
       AND SamplingDate <  '2 Jun 2011'
    ;

    ... this is the kind of response time we get. Lightning fast.

    SQL Server parse and compile time:
       CPU time = 47 ms, elapsed time = 80 ms.
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.

    (1103 row(s) affected)
    Table 'Readings'. Scan count 1104, logical reads 4881, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 94 ms, elapsed time = 389 ms.

    Make no mistake, the Nested Loop still creates a bit of a cross join, as it did before, except the INCLUDE on the Value column of the "MethodID" index keeps the query from going back to a Clustered Index Scan to find "Value".

    Yeah, I know what you're going to say next. "But, Jeff! The Clustered Index wasn't even used! Why do we need it?"

    This is a classic example of an index that won't ever be used by this type of query but is still absolutely necessary. What you have to remember is that 1) it keeps the table from being a heap, 2) it makes the table self-maintaining, because the clustered index is ever increasing, which prevents page fragmentation, and 3) EVERY non-clustered index includes the clustered index columns. Because of the INCLUDE on Value, the nonclustered index will have key columns of MethodID, SamplingDate, and ReadingID, and all of those will be present in the leaf level of the index along with the Value column.

    Basically, we've just built a full blown covering index for this query. 😉

    --Jeff Moden



  • Heh... I've been thinking about this and have decided that I probably shouldn't post solutions late at night because all the coffee has worn off. I believe that if I make one small change to the clustered index, there probably won't be the need for any non-clustered indexes for this problem and the clustered index will still be "self maintaining".

    I'll be back as soon as I'm sure one way or the other. Still testing.

    --Jeff Moden



  • If you don't mind changing the view and the code that calls it to be a bit more flexible, we can make it so the table needs only one index, and that index will never cause a page or extent split. The advantage there, of course, is that the index will never need to be maintained on this rapidly growing table. About the only thing you may need to do is update statistics now and then.

    Here's the new view. The big change here is that it no longer looks at only a single MethodID. Instead, it can be reused to look at ANY MethodID.

    CREATE VIEW dbo.MeterReadings1HourRollingAverage
    /******************************************************************************
     Purpose:
     Returns 1 hour rolling averages by MethodID.

     WARNING! THIS CODE STILL USES THE HIDDEN RBAR OF A CROSS JOIN AND SHOULD ONLY
     BE USED FOR A DAY OR TWO OF LOOKUPS.

     Recommended Usage Example:
     SELECT SamplingDate, AverageUConc
       FROM dbo.MeterReadings1HourRollingAverage
      WHERE SamplingDate >= '1 Jun 2011'
        AND SamplingDate <  '2 Jun 2011'
        AND MethodID = 3
     ;
    ******************************************************************************/
    AS
    SELECT a.SamplingDate,
           a.MethodID,
           AverageUConc = AVG(b.value)
      FROM dbo.Readings a
      LEFT JOIN dbo.Readings b
        ON b.SamplingDate BETWEEN DATEADD(hh,-1,a.SamplingDate) AND a.SamplingDate
       AND a.MethodID = b.MethodID
     GROUP BY a.SamplingDate, a.MethodID
    ;

    I also made the assumption that the meter readings will actually be unique by the datetime of SampleDate and MethodID (and they really should be or you have bad data in the table). So, I changed the test data to reflect that realistic fact. Details are, of course, in the comments in the code below. Do notice the changes I made for the indexes.

    --===== Identify a nice, safe place to do this that everyone has
    USE tempdb
    ;
    --=====================================================================================================================
    --      Create the test table and populate it with unique test data.
    --      Making the randomized test data unique does take a bit of time.
    --=====================================================================================================================
    --===== Conditionally drop the test table to make reruns easier in SSMS.
         -- Don't change this or you take the chance of dropping your real table.
         -- We need to do this as a real table because a VIEW cannot be made on a Temp Table.
    IF OBJECT_ID('tempdb.dbo.Readings','U') IS NOT NULL DROP TABLE tempdb.dbo.Readings;
    GO
    --=====================================================================================================================
    --      This builds a "sample" for every minute of every day for a year's worth of dates for 5 different meter methods
    --      for a total of 2,628,000 rows. Every row is unique by SamplingDate and MethodID.
    --      This takes about 90 seconds on my 10 year old machine.
    --=====================================================================================================================
    WITH
    cteCreateDates AS
    ( --=== Builds a "day" for every day in 2011
     SELECT TOP (DATEDIFF(dd,'2011','2012'))
            Date = DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,'2011')
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    ),
    cteCreateTimes AS
    ( --=== Builds a "time" for every minute in a day
     SELECT TOP (1440)
            Time = DATEADD(mi,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,0)
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    ) --=== This puts all of the data together and makes NOT NULL columns in the
      --    table it builds because of the ISNULLs (even though the data is never
      --    null in this case)
    SELECT ReadingID    = IDENTITY(INT,1,1),
           SamplingDate = ISNULL(d.Date + t.Time,0),
           Value        = ISNULL(RAND(CHECKSUM(NEWID())) * 100,0),
           MethodID     = ISNULL(m.MethodID,0)
      INTO dbo.Readings
      FROM cteCreateDates d
     CROSS JOIN cteCreateTimes t
     CROSS JOIN ( --===== Builds 5 MethodID's
                 SELECT 1 UNION ALL
                 SELECT 2 UNION ALL
                 SELECT 3 UNION ALL
                 SELECT 4 UNION ALL
                 SELECT 5
                ) m (MethodID)
     ORDER BY SamplingDate
    ;
    --===== Here's the clustered index.
         -- I'm making the assumption that there won't be any duplicate readings in the table.
         -- If they're not unique, then Celko will probably pay you a visit in the night because
         -- they should be! ;-)
         -- To Joe's point, this index should actually be the PK.
         -- The ReadingID isn't actually required in the table but I understand the reasons
         -- why some folks might want to have it.
         -- This takes about 90 seconds on my old machine, as well.
    CREATE UNIQUE CLUSTERED INDEX IX_Readings_Composite01
        ON dbo.Readings (SamplingDate,MethodID)
    ;
    --===== If you insist, you can also have a PK even though it's not necessary for this table.
         -- Because it's ever increasing, this particular index won't need maintenance either.
    ALTER TABLE dbo.Readings
      ADD CONSTRAINT PK_Readings
          PRIMARY KEY NONCLUSTERED (ReadingID)
    ;

    Again, both indexes are "ever increasing" and should never require maintenance due to fragmentation.

    If we run the following code...

    SELECT SamplingDate, AverageUConc
      FROM dbo.MeterReadings1HourRollingAverage
     WHERE SamplingDate >= '1 Jun 2011'
       AND SamplingDate <  '2 Jun 2011'
       AND MethodID = 3
    ;

    ... we still get some pretty good performance out of it.

    However, as I pointed out in the new view code, we're still using a hidden Cross Join. Although the new indexes and the new view have greatly increased the performance compared to the original code posted by the OP, this is still heavy hidden RBAR and it will crush the server if you try to use it for a period of, say, a year. There are much better methods for doing this, but they won't work in a view or an Inline Table Valued Function.

    The new methods in SQL Server 2012 will make easy work of this problem using AVG() OVER. That's not really an option for today.
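
    For what it's worth, the 2012 flavor would look something like the sketch below. Note that SQL Server's RANGE frame doesn't accept time-interval offsets, so with one reading per minute a 60-row frame is the usual approximation of a one-hour window; that assumption is baked into the example.

    -- SQL Server 2012+ sketch (not an option on earlier versions):
    -- rolling average over the current row plus the 60 preceding rows, which
    -- approximates "the last hour" ONLY if readings really arrive once per minute.
    SELECT SamplingDate,
           MethodID,
           AverageUConc = AVG(Value) OVER (PARTITION BY MethodID
                                               ORDER BY SamplingDate
                                                ROWS BETWEEN 60 PRECEDING AND CURRENT ROW)
      FROM dbo.Readings;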

    If you really need blazing speed (that will even beat the methods in 2012), there are other methods. First, a well-written cursor will beat the code above, especially for larger periods like a year. The quickest method I've seen uses a "Quirky Update" and can be found at the following URL.

    http://www.sqlservercentral.com/articles/Moving+Average/69389/

    The problem, of course, with all of those high speed methods is that they require fixed and reasonably accurate time intervals but that's probably not so difficult to pull off.

    Shifting gears a bit, I have a question and this is why you should post data IAW the first link in my signature line below. This table only contains 1 meter? Don't you have a "MeterID" in the table to identify more than one meter??? I've only optimized the indexes for the data you've given. If you have a "MeterID" (or some such identification for multiple meters), THAT could change the whole personality of the indexes required to support other queries especially the clustered index. It won't be difficult to do but it will be a change.

    Ok... I'm all done. Let me know how all of this works out for you and, next time, please post IAW the article at the first link in my signature line below.

    --Jeff Moden



  • BWAA-HAAA!!! Maybe I'm not done yet! I just had an epiphany that I'll have to explore. It seems to me that a Recursive CTE might be a pretty fast method here if the samples are taken on that "regular" time basis I was talking about, using a method similar to the "Quirky Update" method I previously mentioned.

    --Jeff Moden



  • Thanks for the insight, everyone, and thanks again, Jeff, for going above and beyond to help find those roadblock cross joins. I will try the composite index and see how performance is. As for your questions:

    - There is only one meter, but yes, each reading does have a meterID. There will only ever be one meter, so we don't need to worry about it.

    - Readings go into the readings table at about one reading every minute; sometimes that interval can stretch to two minutes.

    - I wasn't sure how to generate test data; I was actually looking through your posts (Jeff) to find one that included a way to make a span of dates, so thanks, that is helpful for future posts.

    - The ReadingID is auto-generated and hence will always be unique, so the ReadingID and SamplingDate combination will always be unique and ascending, as you have stated.

    - Rebuilding indexes and updating stats will be handled by IS on a scheduled maintenance plan, so I am not too worried about that.

    - I forgot to mention (not sure how) that there is an additional column in the readings table called 'LocationID', which is a link to the locations table identifying where the meter is currently pulling data from. I don't believe this changes the issue all that much, though it will have to be added to the GROUP BY of the view.

    On a side note, the stored procedure that inserts these readings is quite a beast as well. It inserts 8 separate readings into the readings table each minute. This database is going to grow, and fast. This is why I am concerned about the performance of this view.

    I had the luxury of changing this view into a stored procedure, which I believe should work much faster after I add this composite index. I'm not sure I want to include the 'Value' column if it's going to slow down inserts; I also forgot to mention that I already have a UNIQUE constraint on the column set LocationID, MethodID, SamplingDate, and Value. The procedure lets me parameterize things a lot better, so I can change the averaging window, date range, and reading type. Anyhow, here is the procedure (which I will have to modify to include LocationID):

    CREATE PROCEDURE [dbo].[Usp_RunningAverage]
    (
        @Start      DATETIME,
        @End        DATETIME,
        @MethodCode NVARCHAR(255),
        @Minutes    FLOAT
    )
    AS
    /*
    Stored Procedure: Usp_RunningAverage
    Date: 2012-02-24
    Description: This stored procedure is used to calculate running averages for
                 various method types in the readings table.
    */
    BEGIN
        SET NOCOUNT ON
        ---------------------------------------------------------------------------------------------------------------------------------
        -- Return the result set
        ---------------------------------------------------------------------------------------------------------------------------------
        ;WITH CTE (SamplingDate,Reading) AS
        (
            SELECT SamplingDate,[Value]
              FROM View_MethodReadings
             WHERE MethodCode = @MethodCode
               AND SamplingDate BETWEEN DATEADD(mi,-@Minutes,@Start) AND @End
        )
        SELECT
            a.SamplingDate,
            AverageReading = AVG(b.Reading)
        FROM
            CTE a LEFT JOIN CTE b
                ON b.SamplingDate BETWEEN DATEADD(mi,-@Minutes,a.SamplingDate) AND a.SamplingDate
        GROUP BY
            a.SamplingDate
        HAVING
            a.SamplingDate BETWEEN @Start AND @End
        -- Requires a sort so that the graphing program draws the line right
        ORDER BY
            a.SamplingDate ASC
        ---------------------------------------------------------------------------------------------------------------------------------
        -- End code
        ---------------------------------------------------------------------------------------------------------------------------------
        SET NOCOUNT OFF
    END
    GO
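
    For reference, a call reproducing the one-hour running average for a day of 'MF_UCONC' data would look something like this (the dates are illustrative):

    EXEC dbo.Usp_RunningAverage
         @Start      = '2012-02-23',
         @End        = '2012-02-24',
         @MethodCode = 'MF_UCONC',
         @Minutes    = 60;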

    I will have to look over this some more and see what I can take from this thread to get this issue running top speed. Thanks for all of the help so far, much appreciated.

  • Jeff Moden (2/25/2012)


    I also made the assumption that the meter readings will actually be unique by the datetime of SampleDate and MethodID (and they really should be or you have bad data in the table). So, I changed the test data to reflect that realistic fact. Details are, of course, in the comments in the code below. Do notice the changes I made for the indexes.

    Upon looking this over more, I noticed this is in fact not a true statement. A single sample coming from the meter consists of 8 different reading types, which may or may not be assigned to the same SamplingDate by the client software. This is the part of the project I have always been against. We are forced to re-use an old database structure (god knows why) that isn't made specifically for this project. Some BS about re-using code because that will somehow save us time. Very frustrating, actually. I originally wanted to make a separate table that holds a time and the associated ReadingID, LocationID, and SampleID (which would be the primary key of that table), but this was not an option.

    Edit: OK, no, I was wrong. They should always be unique; I just had some bad data from early testing that was causing the constraint to fail when it was being created.

    So this is a rather ugly table structure for the problem at hand.
