Speed up following query (Second opinion needed)

  • Hello all,

    Following scenario

    Database Global Storage

    There is a daily import of servers and their diskspace info,so total diskspace,total used,total free. Also included

    is the domain of the server & date of import. this table is called Diskspace_Global.

    Servers that are new are added to the table Diskspace_Region_Country, which contains data about the region where the server is located and whether the server needs to be in the capacity report or not (Some servers will be clusters,while other the nodes for those clusters).

    We also have the table Reporting_Servers_AD_Global which contains more detailed information about the servers,for these we have a slowly changing dimension.

    Now here comes the clinger,out of all this data we need to get a report that is grouped by region and month,for each server we need either the most biggest value for diskspace,total used,total free in a month.

    The file that reads in daily can contain data for months besides the current one.

    Tables

    USE [Global_Storage]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[Diskspace_Global](

    [Server] [nvarchar](255) NULL,

    [Total_Diskspace] [int] NULL,

    [Total_Diskspace_Free] [int] NULL,

    [Total_Diskspace_Used] [int] NULL,

    [Domain] [nvarchar](255) NULL,

    [Date_] [date] NULL

    ) ON [PRIMARY]

    GO

    USE [Global_Storage]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[Reporting_Servers_AD_Global](

    [Server] [nvarchar](255) NULL,

    [Status] [nvarchar](255) NULL,

    [Domain] [nvarchar](255) NULL,

    [OU] [nvarchar](255) NULL,

    [OS] [nvarchar](255) NULL,

    [Service_Pack] [nvarchar](255) NULL,

    [Version] [nvarchar](255) NULL,

    [In_Server_OU] [bit] NOT NULL,

    [Date_Add] [datetime] NULL,

    [Date_Changed] [datetime] NULL,

    127.0.0.1 [nvarchar](255) NULL,

    [Date_] [date] NULL

    ) ON [PRIMARY]

    GO

    USE [Global_Storage]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[Diskspace_Region_Country](

    [Server] [nvarchar](255) NULL,

    [DOMAIN] [nvarchar](255) NULL,

    [COUNTRY] [nvarchar](255) NULL,

    [REGION] [nvarchar](255) NULL,

    [SITE] [nvarchar](255) NULL,

    [Report] [nvarchar](255) NULL

    ) ON [PRIMARY]

    GO

    USE [Global_Storage]

    GO

    CREATE NONCLUSTERED INDEX [Missing_IXNC_Reporting_Servers_AD_Global_Status_EB170] ON [dbo].[Reporting_Servers_AD_Global]

    (

    [Status] ASC

    )

    INCLUDE ( [Server],

    [Domain]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    So with that all said I created a query that works very well but takes 30-40 minutes to complete.(Note this query is only meant for the max per month)

    I'm already putting the date in a staging table,that gets dropped & recreated each time,before filling the data in the table the report uses.

    The query

    USE [Global_Storage]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[stg].[Report_Global_Storage_Capacity]') AND type in (N'U'))

    DROP TABLE [stg].[Report_Global_Storage_Capacity]

    GO

    with Dates_ as

    (select distinct Date_ from dbo.Diskspace_Global)

    ,DatesPresent as

    (select

    distinct Date_,

    ROW_NUMBER() OVER(PARTITION BY YEAR([Date_]), MONTH([Date_]) ORDER BY DAY([Date_]) DESC) AS rn

    from Dates_

    )

    select distinct

    YEAR(dp.[Date_]) As Year_,

    MONTH(dp.[Date_]) As Month_,

    rsag.Server,

    rsag.Domain,

    drc.COUNTRY,

    drc.Region,

    drc.SITE,

    (

    select top 1 rsag2.OU from dbo.Reporting_Servers_AD_Global rsag2

    where rsag2.Server = rsag.Server

    and rsag2.Domain = rsag.Domain

    and rsag2.Date_ = dp.Date_

    ) As OU,

    (

    select MAX(Total_Diskspace) from dbo.Diskspace_Global dg

    where dg.Server = rsag.Server

    and dg.Domain = rsag.Domain

    and DATEPART(year,dg.Date_) = DATEPART(year,dp.Date_)

    and DATEPART(month,dg.Date_) = DATEPART(month,dp.Date_)

    ) As TotalDiskspace,

    (

    select MAX(Total_Diskspace_Used) from dbo.Diskspace_Global dg

    where dg.Server = rsag.Server

    and dg.Domain = rsag.Domain

    and DATEPART(year,dg.Date_) = DATEPART(year,dp.Date_)

    and DATEPART(month,dg.Date_) = DATEPART(month,dp.Date_)

    ) As TotalDiskspaceUsed,

    (

    select MAX(Total_Diskspace_Free) from dbo.Diskspace_Global dg

    where dg.Server = rsag.Server

    and dg.Domain = rsag.Domain

    and DATEPART(year,dg.Date_) = DATEPART(year,dp.Date_)

    and DATEPART(month,dg.Date_) = DATEPART(month,dp.Date_)

    ) As TotalDiskspaceFree

    into [stg].[Report_Global_Storage_Capacity]

    from DatesPresent dp,dbo.Reporting_Servers_AD_Global rsag

    inner join dbo.Diskspace_Region_Country drc

    on rsag.Server = drc.Server

    and rsag.Domain = drc.Domain

    and drc.Report = 'YES'

    where rn = 1

    order by YEAR(dp.[Date_]),

    MONTH(dp.[Date_]),

    rsag.Server,

    rsag.Domain,

    drc.COUNTRY,

    drc.Region,

    drc.SITE,

    OU,

    TotalDiskspace,

    TotalDiskspaceUsed

    truncate table [dbo].[Report_Global_Storage_Capacity]

    insert into [dbo].[Report_Global_Storage_Capacity](Year_, Month_, [Server], [Domain], [COUNTRY], [Region], [SITE], [OU], TotalDiskspace, TotalDiskspaceUsed, TotalDiskspaceFree)

    select Year_, Month_, [Server], [Domain], [COUNTRY], [Region], [SITE], [OU], TotalDiskspace, TotalDiskspaceUsed, TotalDiskspaceFree

    from [stg].[Report_Global_Storage_Capacity]

    Based on this information does anyone got any suggestion how to improve the performance of the query.

  • Please will you post the execution plan?

    Do you have any control over the table structure? Is it really necessary to use nvarchar (instead of varchar) everywhere?

    Why are you dumping the results into a staging table and then copying them straight to the destination table?

    Possibly the worst performance killer here is the correlated subqueries. Have you considered using table valued functions instead?

    You can simplify the lines with the DATEPART functions something like this (although you could probably improve it further with table valued functions):

    AND dg.Date_ >= SELECT DATEADD(month,DATEDIFF(month,'19000101',dp.Date_),'19000101')

    AND dg.Date_ < DATEADD(month,DATEDIFF(month,'19000101',dp.Date_)+1,'19000101')

    Similarly, in your ORDER BY, just order by dp.Date_, instead of doing those two lines of date arithmetic.

    John

  • You can start by getting rid of some of those DISTINCTs. It's unnecessary to distinct at every level, the duplicates aren't going to creep back.

    with Dates_ as

    (select distinct Date_ from dbo.Diskspace_Global)

    ,DatesPresent as

    (select

    distinct Date_,

    ROW_NUMBER() OVER(PARTITION BY YEAR([Date_]), MONTH([Date_]) ORDER BY DAY([Date_]) DESC) AS rn

    from Dates_

    )

    Here for example, with the distinct in the first CTE, there are no duplicate values of Date_ for the DISTINCT in the second to remove. Hence it's just overhead with no benefit. Same thing further down the query.

    The functions on your date columns prevent index usage.

    Also try to move the correlated subqueries in your SELECT to derived tables in the FROM. It may not make them faster, but it'll be easier to work with, and as it is you have TOP(1) without an order by which means you get any one row from that subquery.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • John Mitchell-245523 (10/19/2015)


    Please will you post the execution plan?

    I'll see what I can do in the immediate future.

    John Mitchell-245523 (10/19/2015)


    Please will you post the execution plan?

    Do you have any control over the table structure? Is it really necessary to use nvarchar (instead of varchar) everywhere?

    Yes, I got control over the tables, so far I haven't encountered non-roman characters but I've been told it's a possibility.

    John Mitchell-245523 (10/19/2015)


    Why are you dumping the results into a staging table and then copying them straight to the destination table?

    To keep the report functional

  • USE [Global_Storage]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[stg].[Report_Global_Storage_Capacity]') AND type in (N'U'))

    DROP TABLE [stg].[Report_Global_Storage_Capacity]

    GO

    with DatesPresent as (

    SELECT

    dp_year = YEAR(Date_),

    dp_Month = MONTH(Date_),

    FirstOfThisMonth = CAST(YEAR(Date_) AS VARCHAR(4)) + RIGHT('0'+CAST(MONTH(Date_) AS VARCHAR(2)),2)+'01',

    FirstOfNextMonth = CAST(YEAR(Date_) AS VARCHAR(4)) + RIGHT('0'+CAST(MONTH(Date_)+1 AS VARCHAR(2)),2)+'01'

    FROM dbo.Diskspace_Global

    GROUP BY YEAR(Date_), MONTH(Date_)

    )

    select distinct -- is the DISTINCT really necessary?

    dp_year As Year_,

    dp_Month As Month_,

    rsag.Server,

    rsag.Domain,

    drc.COUNTRY,

    drc.Region,

    drc.SITE,

    y.OU,

    x.TotalDiskspace,

    x.TotalDiskspaceUsed,

    x.TotalDiskspaceFree

    into [stg].[Report_Global_Storage_Capacity]

    FROM dbo.Reporting_Servers_AD_Global rsag

    INNER JOIN dbo.Diskspace_Region_Country drc

    ON rsag.Server = drc.Server

    and rsag.Domain = drc.Domain

    and drc.Report = 'YES'

    CROSS JOIN DatesPresent dp

    CROSS APPLY (

    SELECT

    TotalDiskspace = MAX(Total_Diskspace),

    TotalDiskspaceUsed = MAX(Total_Diskspace_Used),

    TotalDiskspaceFree = MAX(Total_Diskspace_Free)

    FROM dbo.Diskspace_Global dg

    WHERE dg.Server = rsag.Server

    and dg.Domain = rsag.Domain

    and dg.Date_ >= dp.FirstOfThisMonth

    and dg.Date_ < dp.FirstOfNextMonth

    ) x

    OUTER APPLY (

    SELECT TOP 1 rsag2.OU -- TOP without ORDER BY?

    FROM dbo.Reporting_Servers_AD_Global rsag2

    WHERE rsag2.Server = rsag.Server

    and rsag2.Domain = rsag.Domain

    and rsag2.Date_ = dp.FirstOfThisMonth

    ) y

    --where dp.rn = 1 no longer required

    -- Not worth bothering with the ORDER BY. Use ORDER BY when querying from Report_Global_Storage_Capacity

    --ORDER BY dp_year,

    -- dp_Month,

    -- rsag.Server,

    -- rsag.Domain,

    -- drc.COUNTRY,

    -- drc.Region,

    -- drc.SITE,

    -- OU,

    -- TotalDiskspace,

    -- TotalDiskspaceUsed

    truncate table [dbo].[Report_Global_Storage_Capacity]

    insert into [dbo].[Report_Global_Storage_Capacity](Year_, Month_, [Server], [Domain], [COUNTRY], [Region], [SITE], [OU], TotalDiskspace, TotalDiskspaceUsed, TotalDiskspaceFree)

    select Year_, Month_, [Server], [Domain], [COUNTRY], [Region], [SITE], [OU], TotalDiskspace, TotalDiskspaceUsed, TotalDiskspaceFree

    from [stg].[Report_Global_Storage_Capacity]

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (10/19/2015)


    USE [Global_Storage]

    select distinct -- is the DISTINCT really necessary?

    ...

    yes it's quite necessairy,but also this solution brings the excution from 35 minutes to 22

    1 Change

    FirstOfNextMonth = CAST(YEAR(Date_) AS VARCHAR(4)) + REPLACE(RIGHT('0'+CAST(MONTH(Date_)+1 AS VARCHAR(2)),2),'13','01')+'01' FROM dbo.Diskspace_Global

    Otherwise it created the 13th month

    Reason why distinct is needed (records slightly altered)

    YearMonth ServerDomainCountryRegionTotalDiskspaceTotalDiskspaceUsedTotalDiskspaceFree

    20141210...12XYZUSNorth AmericaNULLNULLNULL

    20141210...12XYZUSNorth AmericaNULLNULLNULL

    We have several of these where the diskspaces or other values have no changes

  • So you have some improvement. Can you post the actual execution plan please (not a picture of it, a .sqlplan file attachment)? Cheers.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Ok,the execution plan of my query & your solution ChrisM@Work

  • Resender (10/19/2015)


    Ok,the execution plan of my query & your solution ChrisM@Work

    Thanks. Your tables are heaps. Clustered indexes are strongly recommended.

    Does the query really return 60 odd million rows? It can't, the dates would be outside SQL Server date range.

    Have a look at getting clustered indexes on all of those tables then post back with the actual plan for either of the queries - that's the actual rather than estimated plan. Cheers.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Ok sorry for the late answer I did not have time yesterday to pursue this.

    Here are the actual execution plans for both versions.

    I have created 1 clustered index in the diskspace_region_country table

    These also do not include the insert of the data into staging & then destination as actually I don't have a window today to let any of the data be absent.

    Respectively the execution times were 25 minutes for my original query 16 minutes for the alternative

  • Resender (10/21/2015)


    Ok sorry for the late answer I did not have time yesterday to pursue this.

    Here are the actual execution plans for both versions.

    I have created 1 clustered index in the diskspace_region_country table

    These also do not include the insert of the data into staging & then destination as actually I don't have a window today to let any of the data be absent.

    Respectively the execution times were 25 minutes for my original query 16 minutes for the alternative

    Ah, interesting. Try this as an alternative, with and without DISTINCT. If the result set from the query without DISTINCT returns correct results, please post up the actual execution plan. Cheers.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Execution results

    --distinct

    13:12

    108054

    --no distinct

    17:19

    4919706

    Rapid glamps shows same data but a lot of duplicates which I suspected.

    I've included the actual execution plans, so far I'm happy the time is more then halved vs the current query.

  • Resender (10/21/2015)


    Execution results

    --distinct

    13:12

    108054

    --no distinct

    17:19

    4919706

    Rapid glamps shows same data but a lot of duplicates which I suspected.

    I've included the actual execution plans, so far I'm happy the time is more then halved vs the current query.

    Gosh, humble apologies - I missed the query mod from my last post. Here it is:

    SELECT distinct -- is the DISTINCT really necessary?

    Year_ = x.dp_year,

    Month_ = x.dp_Month,

    x.[Server],

    x.Domain,

    drc.COUNTRY,

    drc.Region,

    drc.[SITE],

    y.OU,

    x.TotalDiskspace,

    x.TotalDiskspaceUsed,

    x.TotalDiskspaceFree

    FROM (

    SELECT

    [Server],

    Domain,

    dp_year = YEAR(Date_),

    dp_Month = MONTH(Date_),

    FirstOfThisMonth = MAX(DATEADD(MONTH,DATEDIFF(MONTH,0,_date),0)),

    TotalDiskspace = MAX(Total_Diskspace),

    TotalDiskspaceUsed = MAX(Total_Diskspace_Used),

    TotalDiskspaceFree = MAX(Total_Diskspace_Free)

    FROM dbo.Diskspace_Global

    GROUP BY [Server], Domain, YEAR(Date_), MONTH(Date_)

    ) x

    INNER JOIN dbo.Diskspace_Region_Country drc

    ON drc.[Server] = x.[Server]

    and drc.Domain = x.Domain

    and drc.Report = 'YES'

    OUTER APPLY (

    SELECT TOP 1 rsag2.OU -- TOP without ORDER BY?

    FROM dbo.Reporting_Servers_AD_Global rsag2

    WHERE rsag2.[Server] = x.[Server]

    and rsag2.Domain = x.Domain

    and rsag2.Date_ = x.FirstOfThisMonth

    ) y

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • no problem,making errors is in human nature,I'm however at the end of my workday and I carpool to work so I don't have time to start the new query I'll test it tomorrow and post the results.

  • Resender (10/21/2015)


    no problem,making errors is in human nature,I'm however at the end of my workday and I carpool to work so I don't have time to start the new query I'll test it tomorrow and post the results.

    Your pool car for today should be a DeLorean.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply