Using Lag/Lead and OVER() to get growth pattern

  • Hi Guys,

    I've got a bit of a query problem that I am not quite getting where to start, to be honest. I think I know what bits I need to use just not sure how to bash all the parts together:

    I have a table that contains snapshots of disk space once a day, for a bunch of servers:

    CREATE TABLE [dbo].[mgmt_disk_free_space]

    (

    [rowguid] [uniqueidentifier] NOT NULL DEFAULT (newsequentialid()),

    [rowdate_utc] [date] NOT NULL DEFAULT (getutcdate()),

    [dayid] [smallint] NULL DEFAULT (datepart(dayofyear,getutcdate())),

    [server_name] [sysname] NOT NULL,

    [volume_tag] [sysname] NOT NULL,

    [capacity] [decimal](18, 0) NOT NULL,

    [free] [decimal](18, 0) NOT NULL,

    [used] AS ([capacity]-[free]),

    [usage_free_percent] AS (([free]/[capacity])*(100)),

    [usage_used_perecnt] AS ((([capacity]-[free])/[capacity])*(100))

    )

    From this data, I want to get a result set that can tell me:

    ... the change in disk space per server, per drive in the last 28 days (though if 28 days can be @interval tinyint that would be excellent too)

    ... a guesstimate/projection of days left until a threshold is reached - that threshold would be either something like 10GB free space or a percentage instead (@threshold bigint)

    The two queries do not need to be in one statement, but if they are then awesome.

    So I partly do not know how to bolt this together as I said earlier but also I Am not quite sure how to think about the problem even!

    I have this rather basic query that gives me AVG(free) broken down by day/drive/server:

    select server_name, volume_tag, dayid, avg(free)

    from [mgmt_disk_free_space]

    group by server_name, volume_tag, dayid

    But just not getting where to put in Lead and Lag, if indeed I need to use that. I'd rather use them over a self join. But even with a self join I am still not sure of the mechanics in the query I'm aiming! I managed to get this far:

    select

    server_name,

    volume_tag,

    avg(free),

    change_in_free=(free-lag(isnull(free,0)) over (order by dayid))

    from [mgmt_disk_free_space]

    group by server_name, volume_tag, dayid, free

    order by server_name, volume_tag

    Before declaring I've no idea what I am doing now!

    So any help would be much appreciated

    Thanks!

    Alex

  • Alex

    Not sure whether you want the average of all the drives for a particular day, or the average for a particular drive over all the days. But here's a guess.SELECT

    server_name

    ,volume_tag

    ,day_id

    ,AVG(free) OVER (PARTITION BY server_name, volume_tag)

    ,change_in_free=(free-LAG(ISNULL(free,0)) OVER (ORDER BY dayid PARTITION BY server_name, volume_tag))

    FROM mgmt_disk_free_space

    ORDER BY

    server_name

    ,volume_tag

    ,day_id

    John

  • John, that's excellent, most appreciated!

    As it turns out I didn't need the AVG() in there, that was left over from my desperate editing to make it work! I just wanted to see 'free' in there, so I could check the numbers were right.

    If I wanted to compare today's number (or the most recent day number) to one that was 28 days ago, would I pass 28 as the interval parameters to LAG()?

    SELECT

    server_name

    ,volume_tag

    ,dayid

    , free

    ,change_in_free=(free-LAG(ISNULL(free,0), 28) OVER (PARTITION BY server_name, volume_tag ORDER BY dayid ))

    FROM mgmt_disk_free_space

    ORDER BY

    server_name

    ,volume_tag

    ,dayid

    I don't quite have 28 days of data to test this on yet but i could generate some.

    Cheers!

  • Alex

    That would work so long as you have a row for every day for every volume. If there's any chance you won't, you ought to join to a calendar table to fill in the gaps.

    John

  • Hope your Capacity value never comes in as 0! :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi John,

    I am not sure what you mean by join to a calendar table?

    John Mitchell-245523 (11/2/2016)


    Alex

    ... If there's any chance you won't, you ought to join to a calendar table to fill in the gaps.

    John

    Cheers

  • TheSQLGuru (11/2/2016)


    Hope your Capacity value never comes in as 0! :hehe:

    Ohh. Me too!

  • alex.sqldba (11/2/2016)


    I am not sure what you mean by join to a calendar table?

    Calendar Tables[/url]

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Yes, although since you're working on day IDs rather than actual dates, you only need a Numbers table. You also need to cross join to your main table to make sure there's a row for every day for every volume. I haven't tested this query, and there may be ways of getting it to perform better, but it'll set you going in the right direction.

    As an aside, defaulting your day_id column to the day of the year isn't a good idea, since things will start to go wrong on 1st January next year!

    WITH N8(n) AS (

    SELECT n FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8)) N8(n)

    )

    , N64(n) AS (

    SELECT n1.n FROM N8 n1 CROSS JOIN N8 n2

    )

    , N512(n) AS (

    SELECT ROW_NUMBER() OVER (ORDER BY n1.n)

    FROM N64 CROSS JOIN N8

    )

    , DaysVolumes AS ( -- gives you one row for each number for each volume on each server

    SELECT DISTINCT

    N512.n

    ,m.server_name

    ,m.volume_tag

    FROM N512

    CROSS JOIN mgmt_disk_free_space

    )

    SELECT

    m.server_name

    ,m.volume_tag

    ,d.n

    ,m.free

    ,change_in_free=(m.free-LAG(ISNULL(m.free,0), 28) OVER (PARTITION BY m.server_name, m.volume_tag ORDER BY d.n))

    FROM DaysVolumes d

    LEFT JOIN mgmt_disk_free_space m ON d.n = m.day_id AND d.server_name = m.server_name and d.volume_tag = m.volume_tag

    ORDER BY

    m.server_name

    ,m.volume_tag

    ,d.n

    John

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply