Tuesday is upon us. It also happens to be the second Tuesday of the month and you know what that means. It is time for TSQL Tuesday. This month, Robert Pearl, a friend, is hosting. Read his invitation here.
As the invitation suggests, this topic is wide open. There are many ways to present data. Some of those methods may be better than others. There are a few things of note with Data Presentation that will impact how much better your Data Presentation will be.
The items I will discuss that will help your data be better presented are: Performance, Accuracy, Display, and Business Requirements. I will use a few scripts to progress through each of these topics. Let’s start with Performance.
Performance
Why is performance included in this discussion? How does performance relate to data presentation? Well, have you ever had an end-user complain that a report was utterly useless because it was too slow? If the report is too slow, it won’t matter if the data is accurate in the eyes of some. They needed to have the data yesterday and it simply took too long.
Here is an example of a query that could be optimized a bit more (admittedly this query does not perform soooo slowly that a user would give up on it – by the end you will see that it could perform better).
DECLARE @StartDate DATETIME = '2011-09-08'
,@EndDate DATETIME = '2013-11-09'
,@numMonths TINYINT
DECLARE @calendarTable TABLE (monthNum TINYINT, monthDays TINYINT)
SET @numMonths = (SELECT DATEDIFF(m,@StartDate,@EndDate) + 1)
WHILE @numMonths >= 1
BEGIN
INSERT INTO @calendarTable SELECT MONTH(@StartDate), DATEDIFF(d, @startDate, DATEADD(m, 1, DATEADD(d, 1 - DAY(@StartDate), @StartDate)))
SET @numMonths = @numMonths - 1
SET @StartDate = (SELECT DATEADD(m, 1, DATEADD(d, 1 - DAY(@StartDate), @StartDate)))
--SELECT @StartDate
END
SELECT * FROM @calendarTable
Go
The requirements for this script are simple. Provide the number of days in a month in a given date range. If the starting date provided is not the first of the month, then we must only provide the number of days from that date to the end of the month.
As you can see, this script utilizes a looping mechanism to provide this information. The loop inserts into a table variable one record at a time. When examining the execution plan and the execution time on this query, one would see that the insert is the most expensive part of the query. One would also find that this query does take a fair amount of time to run – despite its’ simplicity. On my machine, it takes about 1.3 seconds to execute.
Certainly, if this were a more complicated query, one would see that this type of query could cause some delays in data rendering and subsequently cause grief for the end-user and you.
There is one more issue with the provided query in that it doesn’t meet all requirements. I neglected to mention that the date format needs to accept date/month/year format. Yes it is nitpicking, but it was a requirement and an invalid date is far too easy to submit with this query.
Accuracy
There should be no question on how accuracy affects data presentation. No matter how pretty the rest of the data/report may look, wrong data will render the report useless in the eyes of many business users.
Again, this example is not extreme – but it does create sufficient concern that the query should not be used – unless fixed.
SET DATEFORMAT DMY
DECLARE @StartDate DATE = '05/06/2011'
,@EndDate DATE = '31/08/2012'
;
WITH getmonths AS (
SELECT CAST(CONVERT(VARCHAR,'01/' + CAST(MonthNum AS VARCHAR) + '/' + CAST(yr.yr AS VARCHAR)) AS DATE) AS Moy
FROM ( VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) Mnths (MonthNum)
Cross Apply (SELECT YEAR(@StartDate) AS yr
UNION
SELECT YEAR(@EndDate) AS yr
) yr
), inputdates AS (
SELECT dates FROM (VALUES (@StartDate),(@EndDate)) D (Dates)
)
SELECT DATENAME(m,g.moy) AS MonthInRange,YEAR(g.moy) AS YrInRange
,DATEPART(d,DATEADD(mm, DATEDIFF(mm, 0, g.moy)+1, 0)-1)
- CASE WHEN DATEPART(d,id.Dates) = DATEPART(d,DATEADD(mm, DATEDIFF(mm, 0, g.moy)+1, 0)-1)
THEN 0
WHEN DATEPART(d,g.moy) < DATEPART(d,id.Dates)
THEN DATEPART(d,id.Dates)
ELSE Isnull(DATEPART(d,id.Dates),DATEPART(d,g.moy)-1)
END AS DaysInMonthInRange
FROM getmonths g
LEFT Outer Join inputdates Id
ON DATEPART(m,g.moy) = DATEPART(m,id.Dates)
And YEAR(g.moy) = YEAR(id.Dates)
WHERE g.moy between @StartDate and @EndDate
Or id.Dates between @StartDate and @EndDate
Go
Though this query works faster than the first query, it is not entirely accurate. This query only supports a max of 12 months. Also, this query is overly complex. There is value in keeping this particular query more simple.
Display
This topic could be deemed to be largely a matter of preference. That said, there is great value in how you display the data to the end user. Displaying the data involves such things as useful column headings, and meaningful data. For instance, naming a column “Month” but displaying a year value in it – is not very useful. This could also overlap with the prior topic of accurate data.
Personally, when I am displaying the month, I like to see month names. In my experience, displaying the name of the month is more easily recognizable than the month number for many users.
SET STATISTICS TIME ON
SET DATEFORMAT DMY
DECLARE @StartDate DATETIME = '05/06/2011',
@EndDate DATETIME = '31/08/2013'
BEGIN
WITH Nbrs_2( n ) AS (SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 0),
Nbrs_3( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 CROSS JOIN Nbrs_2 n3)
, tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n)
FROM Nbrs_3
)
,cteMonthEnd AS
(
SELECT t.N,
MonthEnd = DATEADD(mm,DATEDIFF(mm,0,@StartDate) + t.N, 0) - 1
FROM Tally t
WHERE t.N BETWEEN 1 AND DATEDIFF(mm,@StartDate,@EndDate) + 1
)
SELECT MonthEnd
,NumDays = DATEPART(dd,MonthEnd) - CASE WHEN N > 1 THEN 0 ELSE DATEPART(dd,@StartDate) END
FROM cteMonthEnd
ORDER BY MonthEnd ASC
END
In this case, the Column Heading matches the data and works. However, a little change could make this query and output more useful to an end user who is quickly scanning over the results.
SET STATISTICS TIME ON
SET DATEFORMAT DMY
DECLARE @StartDate DATETIME = '05/06/2011',
@EndDate DATETIME = '31/08/2013'
BEGIN
WITH Nbrs_2( n ) AS (SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 1 UNION all
SELECT 0),
Nbrs_3( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 CROSS JOIN Nbrs_2 n3)
, tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n)
FROM Nbrs_3
)
,cteMonthEnd AS
(
SELECT t.N,
MonthEnd = DATEADD(mm,DATEDIFF(mm,0,@StartDate) + t.N, 0) - 1
FROM Tally t
WHERE t.N BETWEEN 1 AND DATEDIFF(mm,@StartDate,@EndDate) + 1
)
SELECT DATENAME(m,MonthEnd) AS [MONTH],YEAR(MonthEnd) AS [YEAR]
,NumDays = DATEPART(dd,MonthEnd) - CASE WHEN N > 1 THEN 0 ELSE DATEPART(dd,@StartDate) END
FROM cteMonthEnd
ORDER BY MonthEnd ASC
END
To this point, each query has progressively increased in performance. This last query (both versions) also matches the need to accurately display the data.
The display of data is closely coupled with the next topic – Business Requirements.
Business Requirements
When looking at the results of the last two queries, which one is more accurate? Well, that entirely depends on what the business has specified as acceptable output for this query. In this case, both could be correct or both could be wrong.
It is essential to have the requirements written down and understood by all parties so as to avoid any misconception or misunderstanding of what the query should display. Matching your output to the specified business requirements will help to provide an overall impression of accuracy and usefulness. It will also help to achieve faster sign-off by the business.
Conclusion
Data Presentation is not just about the look of a report. Data Presentation involves performance, accuracy, display and the business requirements. By giving proper attention to each of these facets, the Data presented will be both remarkable and acceptable to the end-user.