October 19, 2015 at 2:11 am
Hello all,
Following scenario
Database Global Storage
There is a daily import of servers and their diskspace info,so total diskspace,total used,total free. Also included
is the domain of the server & date of import. this table is called Diskspace_Global.
Servers that are new are added to the table Diskspace_Region_Country, which contains data about the region where the server is located and whether the server needs to be in the capacity report or not (Some servers will be clusters,while other the nodes for those clusters).
We also have the table Reporting_Servers_AD_Global which contains more detailed information about the servers,for these we have a slowly changing dimension.
Now here comes the clinger,out of all this data we need to get a report that is grouped by region and month,for each server we need either the most biggest value for diskspace,total used,total free in a month.
The file that reads in daily can contain data for months besides the current one.
Tables
USE [Global_Storage]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Diskspace_Global](
[Server] [nvarchar](255) NULL,
[Total_Diskspace] [int] NULL,
[Total_Diskspace_Free] [int] NULL,
[Total_Diskspace_Used] [int] NULL,
[Domain] [nvarchar](255) NULL,
[Date_] [date] NULL
) ON [PRIMARY]
GO
USE [Global_Storage]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Reporting_Servers_AD_Global](
[Server] [nvarchar](255) NULL,
[Status] [nvarchar](255) NULL,
[Domain] [nvarchar](255) NULL,
[OU] [nvarchar](255) NULL,
[OS] [nvarchar](255) NULL,
[Service_Pack] [nvarchar](255) NULL,
[Version] [nvarchar](255) NULL,
[In_Server_OU] [bit] NOT NULL,
[Date_Add] [datetime] NULL,
[Date_Changed] [datetime] NULL,
127.0.0.1 [nvarchar](255) NULL,
[Date_] [date] NULL
) ON [PRIMARY]
GO
USE [Global_Storage]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Diskspace_Region_Country](
[Server] [nvarchar](255) NULL,
[DOMAIN] [nvarchar](255) NULL,
[COUNTRY] [nvarchar](255) NULL,
[REGION] [nvarchar](255) NULL,
[SITE] [nvarchar](255) NULL,
[Report] [nvarchar](255) NULL
) ON [PRIMARY]
GO
USE [Global_Storage]
GO
CREATE NONCLUSTERED INDEX [Missing_IXNC_Reporting_Servers_AD_Global_Status_EB170] ON [dbo].[Reporting_Servers_AD_Global]
(
[Status] ASC
)
INCLUDE ( [Server],
[Domain]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
So with that all said I created a query that works very well but takes 30-40 minutes to complete.(Note this query is only meant for the max per month)
I'm already putting the date in a staging table,that gets dropped & recreated each time,before filling the data in the table the report uses.
The query
USE [Global_Storage]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[stg].[Report_Global_Storage_Capacity]') AND type in (N'U'))
DROP TABLE [stg].[Report_Global_Storage_Capacity]
GO
with Dates_ as
(select distinct Date_ from dbo.Diskspace_Global)
,DatesPresent as
(select
distinct Date_,
ROW_NUMBER() OVER(PARTITION BY YEAR([Date_]), MONTH([Date_]) ORDER BY DAY([Date_]) DESC) AS rn
from Dates_
)
select distinct
YEAR(dp.[Date_]) As Year_,
MONTH(dp.[Date_]) As Month_,
rsag.Server,
rsag.Domain,
drc.COUNTRY,
drc.Region,
drc.SITE,
(
select top 1 rsag2.OU from dbo.Reporting_Servers_AD_Global rsag2
where rsag2.Server = rsag.Server
and rsag2.Domain = rsag.Domain
and rsag2.Date_ = dp.Date_
) As OU,
(
select MAX(Total_Diskspace) from dbo.Diskspace_Global dg
where dg.Server = rsag.Server
and dg.Domain = rsag.Domain
and DATEPART(year,dg.Date_) = DATEPART(year,dp.Date_)
and DATEPART(month,dg.Date_) = DATEPART(month,dp.Date_)
) As TotalDiskspace,
(
select MAX(Total_Diskspace_Used) from dbo.Diskspace_Global dg
where dg.Server = rsag.Server
and dg.Domain = rsag.Domain
and DATEPART(year,dg.Date_) = DATEPART(year,dp.Date_)
and DATEPART(month,dg.Date_) = DATEPART(month,dp.Date_)
) As TotalDiskspaceUsed,
(
select MAX(Total_Diskspace_Free) from dbo.Diskspace_Global dg
where dg.Server = rsag.Server
and dg.Domain = rsag.Domain
and DATEPART(year,dg.Date_) = DATEPART(year,dp.Date_)
and DATEPART(month,dg.Date_) = DATEPART(month,dp.Date_)
) As TotalDiskspaceFree
into [stg].[Report_Global_Storage_Capacity]
from DatesPresent dp,dbo.Reporting_Servers_AD_Global rsag
inner join dbo.Diskspace_Region_Country drc
on rsag.Server = drc.Server
and rsag.Domain = drc.Domain
and drc.Report = 'YES'
where rn = 1
order by YEAR(dp.[Date_]),
MONTH(dp.[Date_]),
rsag.Server,
rsag.Domain,
drc.COUNTRY,
drc.Region,
drc.SITE,
OU,
TotalDiskspace,
TotalDiskspaceUsed
truncate table [dbo].[Report_Global_Storage_Capacity]
insert into [dbo].[Report_Global_Storage_Capacity](Year_, Month_, [Server], [Domain], [COUNTRY], [Region], [SITE], [OU], TotalDiskspace, TotalDiskspaceUsed, TotalDiskspaceFree)
select Year_, Month_, [Server], [Domain], [COUNTRY], [Region], [SITE], [OU], TotalDiskspace, TotalDiskspaceUsed, TotalDiskspaceFree
from [stg].[Report_Global_Storage_Capacity]
Based on this information does anyone got any suggestion how to improve the performance of the query.
October 19, 2015 at 2:57 am
Please will you post the execution plan?
Do you have any control over the table structure? Is it really necessary to use nvarchar (instead of varchar) everywhere?
Why are you dumping the results into a staging table and then copying them straight to the destination table?
Possibly the worst performance killer here is the correlated subqueries. Have you considered using table valued functions instead?
You can simplify the lines with the DATEPART functions something like this (although you could probably improve it further with table valued functions):
AND dg.Date_ >= SELECT DATEADD(month,DATEDIFF(month,'19000101',dp.Date_),'19000101')
AND dg.Date_ < DATEADD(month,DATEDIFF(month,'19000101',dp.Date_)+1,'19000101')
Similarly, in your ORDER BY, just order by dp.Date_, instead of doing those two lines of date arithmetic.
John
October 19, 2015 at 3:02 am
You can start by getting rid of some of those DISTINCTs. It's unnecessary to distinct at every level, the duplicates aren't going to creep back.
with Dates_ as
(select distinct Date_ from dbo.Diskspace_Global)
,DatesPresent as
(select
distinct Date_,
ROW_NUMBER() OVER(PARTITION BY YEAR([Date_]), MONTH([Date_]) ORDER BY DAY([Date_]) DESC) AS rn
from Dates_
)
Here for example, with the distinct in the first CTE, there are no duplicate values of Date_ for the DISTINCT in the second to remove. Hence it's just overhead with no benefit. Same thing further down the query.
The functions on your date columns prevent index usage.
Also try to move the correlated subqueries in your SELECT to derived tables in the FROM. It may not make them faster, but it'll be easier to work with, and as it is you have TOP(1) without an order by which means you get any one row from that subquery.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 19, 2015 at 3:12 am
John Mitchell-245523 (10/19/2015)
Please will you post the execution plan?
I'll see what I can do in the immediate future.
John Mitchell-245523 (10/19/2015)
Please will you post the execution plan?Do you have any control over the table structure? Is it really necessary to use nvarchar (instead of varchar) everywhere?
Yes, I got control over the tables, so far I haven't encountered non-roman characters but I've been told it's a possibility.
John Mitchell-245523 (10/19/2015)
Why are you dumping the results into a staging table and then copying them straight to the destination table?
To keep the report functional
October 19, 2015 at 5:36 am
USE [Global_Storage]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[stg].[Report_Global_Storage_Capacity]') AND type in (N'U'))
DROP TABLE [stg].[Report_Global_Storage_Capacity]
GO
with DatesPresent as (
SELECT
dp_year = YEAR(Date_),
dp_Month = MONTH(Date_),
FirstOfThisMonth = CAST(YEAR(Date_) AS VARCHAR(4)) + RIGHT('0'+CAST(MONTH(Date_) AS VARCHAR(2)),2)+'01',
FirstOfNextMonth = CAST(YEAR(Date_) AS VARCHAR(4)) + RIGHT('0'+CAST(MONTH(Date_)+1 AS VARCHAR(2)),2)+'01'
FROM dbo.Diskspace_Global
GROUP BY YEAR(Date_), MONTH(Date_)
)
select distinct -- is the DISTINCT really necessary?
dp_year As Year_,
dp_Month As Month_,
rsag.Server,
rsag.Domain,
drc.COUNTRY,
drc.Region,
drc.SITE,
y.OU,
x.TotalDiskspace,
x.TotalDiskspaceUsed,
x.TotalDiskspaceFree
into [stg].[Report_Global_Storage_Capacity]
FROM dbo.Reporting_Servers_AD_Global rsag
INNER JOIN dbo.Diskspace_Region_Country drc
ON rsag.Server = drc.Server
and rsag.Domain = drc.Domain
and drc.Report = 'YES'
CROSS JOIN DatesPresent dp
CROSS APPLY (
SELECT
TotalDiskspace = MAX(Total_Diskspace),
TotalDiskspaceUsed = MAX(Total_Diskspace_Used),
TotalDiskspaceFree = MAX(Total_Diskspace_Free)
FROM dbo.Diskspace_Global dg
WHERE dg.Server = rsag.Server
and dg.Domain = rsag.Domain
and dg.Date_ >= dp.FirstOfThisMonth
and dg.Date_ < dp.FirstOfNextMonth
) x
OUTER APPLY (
SELECT TOP 1 rsag2.OU -- TOP without ORDER BY?
FROM dbo.Reporting_Servers_AD_Global rsag2
WHERE rsag2.Server = rsag.Server
and rsag2.Domain = rsag.Domain
and rsag2.Date_ = dp.FirstOfThisMonth
) y
--where dp.rn = 1 no longer required
-- Not worth bothering with the ORDER BY. Use ORDER BY when querying from Report_Global_Storage_Capacity
--ORDER BY dp_year,
-- dp_Month,
-- rsag.Server,
-- rsag.Domain,
-- drc.COUNTRY,
-- drc.Region,
-- drc.SITE,
-- OU,
-- TotalDiskspace,
-- TotalDiskspaceUsed
truncate table [dbo].[Report_Global_Storage_Capacity]
insert into [dbo].[Report_Global_Storage_Capacity](Year_, Month_, [Server], [Domain], [COUNTRY], [Region], [SITE], [OU], TotalDiskspace, TotalDiskspaceUsed, TotalDiskspaceFree)
select Year_, Month_, [Server], [Domain], [COUNTRY], [Region], [SITE], [OU], TotalDiskspace, TotalDiskspaceUsed, TotalDiskspaceFree
from [stg].[Report_Global_Storage_Capacity]
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 19, 2015 at 7:27 am
ChrisM@Work (10/19/2015)
USE [Global_Storage]
select distinct -- is the DISTINCT really necessary?
...
yes it's quite necessairy,but also this solution brings the excution from 35 minutes to 22
1 Change
FirstOfNextMonth = CAST(YEAR(Date_) AS VARCHAR(4)) + REPLACE(RIGHT('0'+CAST(MONTH(Date_)+1 AS VARCHAR(2)),2),'13','01')+'01' FROM dbo.Diskspace_Global
Otherwise it created the 13th month
Reason why distinct is needed (records slightly altered)
YearMonth ServerDomainCountryRegionTotalDiskspaceTotalDiskspaceUsedTotalDiskspaceFree
20141210...12XYZUSNorth AmericaNULLNULLNULL
20141210...12XYZUSNorth AmericaNULLNULLNULL
We have several of these where the diskspaces or other values have no changes
October 19, 2015 at 7:42 am
So you have some improvement. Can you post the actual execution plan please (not a picture of it, a .sqlplan file attachment)? Cheers.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 19, 2015 at 7:48 am
Ok,the execution plan of my query & your solution ChrisM@Work
October 19, 2015 at 7:58 am
Resender (10/19/2015)
Ok,the execution plan of my query & your solution ChrisM@Work
Thanks. Your tables are heaps. Clustered indexes are strongly recommended.
Does the query really return 60 odd million rows? It can't, the dates would be outside SQL Server date range.
Have a look at getting clustered indexes on all of those tables then post back with the actual plan for either of the queries - that's the actual rather than estimated plan. Cheers.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 21, 2015 at 5:48 am
Ok sorry for the late answer I did not have time yesterday to pursue this.
Here are the actual execution plans for both versions.
I have created 1 clustered index in the diskspace_region_country table
These also do not include the insert of the data into staging & then destination as actually I don't have a window today to let any of the data be absent.
Respectively the execution times were 25 minutes for my original query 16 minutes for the alternative
October 21, 2015 at 6:21 am
Resender (10/21/2015)
Ok sorry for the late answer I did not have time yesterday to pursue this.Here are the actual execution plans for both versions.
I have created 1 clustered index in the diskspace_region_country table
These also do not include the insert of the data into staging & then destination as actually I don't have a window today to let any of the data be absent.
Respectively the execution times were 25 minutes for my original query 16 minutes for the alternative
Ah, interesting. Try this as an alternative, with and without DISTINCT. If the result set from the query without DISTINCT returns correct results, please post up the actual execution plan. Cheers.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 21, 2015 at 7:48 am
Execution results
--distinct
13:12
108054
--no distinct
17:19
4919706
Rapid glamps shows same data but a lot of duplicates which I suspected.
I've included the actual execution plans, so far I'm happy the time is more then halved vs the current query.
October 21, 2015 at 7:50 am
Resender (10/21/2015)
Execution results--distinct
13:12
108054
--no distinct
17:19
4919706
Rapid glamps shows same data but a lot of duplicates which I suspected.
I've included the actual execution plans, so far I'm happy the time is more then halved vs the current query.
Gosh, humble apologies - I missed the query mod from my last post. Here it is:
SELECT distinct -- is the DISTINCT really necessary?
Year_ = x.dp_year,
Month_ = x.dp_Month,
x.[Server],
x.Domain,
drc.COUNTRY,
drc.Region,
drc.[SITE],
y.OU,
x.TotalDiskspace,
x.TotalDiskspaceUsed,
x.TotalDiskspaceFree
FROM (
SELECT
[Server],
Domain,
dp_year = YEAR(Date_),
dp_Month = MONTH(Date_),
FirstOfThisMonth = MAX(DATEADD(MONTH,DATEDIFF(MONTH,0,_date),0)),
TotalDiskspace = MAX(Total_Diskspace),
TotalDiskspaceUsed = MAX(Total_Diskspace_Used),
TotalDiskspaceFree = MAX(Total_Diskspace_Free)
FROM dbo.Diskspace_Global
GROUP BY [Server], Domain, YEAR(Date_), MONTH(Date_)
) x
INNER JOIN dbo.Diskspace_Region_Country drc
ON drc.[Server] = x.[Server]
and drc.Domain = x.Domain
and drc.Report = 'YES'
OUTER APPLY (
SELECT TOP 1 rsag2.OU -- TOP without ORDER BY?
FROM dbo.Reporting_Servers_AD_Global rsag2
WHERE rsag2.[Server] = x.[Server]
and rsag2.Domain = x.Domain
and rsag2.Date_ = x.FirstOfThisMonth
) y
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 21, 2015 at 8:09 am
no problem,making errors is in human nature,I'm however at the end of my workday and I carpool to work so I don't have time to start the new query I'll test it tomorrow and post the results.
October 21, 2015 at 8:13 am
Resender (10/21/2015)
no problem,making errors is in human nature,I'm however at the end of my workday and I carpool to work so I don't have time to start the new query I'll test it tomorrow and post the results.
Your pool car for today should be a DeLorean.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply