Usually you leave your office at 4:05 PM so that at 4:45 PM you can pick up your kids and
drive them to music class (your oldest daughter), to dance class (your youngest daughter),
and basketball practice (your son). Usually that's your routine but today at 4:00 PM you get a request
to generate an urgent ad hoc report.
According to WikiAnswers.com - "Ad Hoc is Latin and means 'to this'. It is used to indicate something that is done at the time without planning ahead of time. An Ad Hoc report is one that is created when someone asks for it, not at a specific point".
You call home and inform your kids that you will be late, but how late you do not know.
This is the request you: you need a report that shows all contractor names, their latest expense and date the expense was done.
First of all, you run a script to find Contractor and Expense tables across all databases on you production server. You open Expense table and review the data and table structure. What a mess!!! Your manager stops by and wants to know when he can have the report. You explain to him that you need at least two hours to generate this report, because of bad table design (no constraints, no indexes, no expense date column).
Your manager asks you to not use a cursor, not to use recursion, and send the report to a business analyst
and cc him. You know the query that will generate the requested report is just another query. You know the business analyst most likely on his way home. You also know it's fun to write this query and get correct results.
You decide to analyze data in table (table you see first time). You create test tables and populate them with a subset of data from real tables. Here is the code to set up the tables and populate them with some data.
--drop table dbo.tmpBusinessExpense Create table dbo.tmpBusinessExpense ( ContractorIdint not null ,FiscalYear int not null ,Quaterint not null ,ReportMonthint not null ,ReportWeekint not null ,ReportDayint not null ,Expensemoney not null ,CreateDate datetime not null ) create index UI_tmpBusinessExpense_ContractorId on dbo.tmpBusinessExpense(ContractorId) --truncate table dbo.tmpBusinessExpense insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (68022, 2009, 1, 2, 3, 4, 927428, '20060204') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2006, 1, 2, 3, 4, 278324, '20060204') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2007, 2, 3, 4, 6, 358974, '20060304') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2008, 3, 4, 2, 10, 438928, '20060404') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2009, 1, 5, 22, 18, 273954, '20090304') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2009, 2, 9, 33, 25, 373974, '20090304') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2009, 2, 10, 43, 7, 478464, '20090304') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2009, 3, 2, 6, 7, 768834, '20090304') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2009, 3, 2, 6, 12, 1268834, '20090304') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (238, 2009, 3, 2, 6, 29, 29568834, '20090304') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (608257, 2009, 2, 10, 43, 7, 478464, '20090708') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (608257, 2009, 4, 12, 51, 7, 768834, '20090405') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (608257, 2009, 4, 2, 51, 24, 1268834, '20090324') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (608257, 2009, 4, 12, 51, 14, 34, '20090812') --drop table dbo.tmpContractor Create table dbo.tmpContractor ( ContractorIdint not null ,ContractorName varchar(100) not null ) -- drop index dbo.tmpContractor.UI_tmpContractor_ContractorId create unique clustered index UI_tmpContractor_ContractorId on dbo.tmpContractor(ContractorId) insert into dbo.tmpContractor (ContractorId, ContractorName) values (238, 'R & P suns') insert into dbo.tmpContractor (ContractorId, ContractorName) values (68022, 'ALS Revices') insert into dbo.tmpContractor (ContractorId, ContractorName) values (608257, 'Entrance Manufacturing')
You analyze the data for few a random contractors with this query:
declare @ContractorId int set @ContractorId = 608257 --set @ContractorId = 238 --set @ContractorId = 68022 select * from dbo.tmpBusinessExpense where ContractorId = @ContractorId order by ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay
The task of selecting the latest expense for each contractor seems to be easy. You decide to work only with one contractor data to simplify the query writing process. Using SQL Server 2005 new features like Common Table Expression (CTE) and the function ROW_NUMBER(), you wrote a query that returns all entries added by one contractor: the ContractorId = 608257, and the Contractor Name is "Entrance Manufacturing".
declare @ContractorId int set @ContractorId = 608257; with RowsForOnlyEntranceManufacturing ( RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense) as ( select ROW_NUMBER() OVER (order by ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay) AS RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense from dbo.tmpBusinessExpense BE where BE.ContractorId = @ContractorId) select RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense from RowsForOnlyEntranceManufacturing
For Entrance Manufacturing, the last entered row (RowId = 4) is the result row, the only row you want to see in the report. How do you filter all the rows in the table to get only this row?
The first thought was - what if the sum of FiscalYear, ReportMonth, ReportWeek, ReportDay can uniquely identify the result row for a contractor?
select BE.FiscalYear + BE.ReportMonth + BE.ReportDay as SumOfThreeColumns, BE.* from dbo.tmpBusinessExpense BE where BE.ContractorId = 608257
Looking at rows 3 and 4 you got the answer - NO, this sum does not uniquely identify each row in the table. This is because SumOfThreeColumns = 2035 for both rows.
What if combining these three columns you can re engineer the Original ExpenseDate - the date a contractor filed his expense? You want to have the ExpenseDate in DateTime format. The right format! If you have the ExpenseDate in DateTime format you can derive FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay without storing them in table and get the results you need quickly.
Having ExpenseDate for each record in dbo.tmpBusinessExpense table you can use correlated subquery to select a row with the latest date! Simple? But what if a contractor has multiple expenses for a day? You checked it and you did not find duplicates.
select ContractorId, FiscalYear, Quater, ReportMonth, ReportDay, count(*) from dbo.tmpBusinessExpense group by ContractorId, FiscalYear, Quater, ReportMonth, ReportDay having count(*) > 1
You convert integer to string and concatenate FiscalYear, ReportMonth and ReportDay to get ExpenseDate
select convert(char(4), BE.FiscalYear) + convert(varchar(2), BE.ReportMonth) + convert(varchar(2), BE.ReportDay) as ExpenseDate, BE.* from dbo.tmpBusinessExpense BE where BE.ContractorId = 608257
You want to convert varchar data to ExpenseDate column with DateTime datatype.
declare @ContractorId int set @ContractorId = 608257; select convert(DateTime, convert(char(4), BE.FiscalYear) + convert(varchar(2), BE.ReportMonth) + convert(varchar(2), BE.ReportDay)) ,BE.* from dbo.tmpBusinessExpense BE where BE.ContractorId = @ContractorId
You got this message:
Msg 241, Level 16, State 1, Line 1 Conversion failed when converting datetime from character string.
Why it does not work? You try it for ReportDay = 14 and it works.
declare @ContractorId int set @ContractorId = 608257; select convert(DateTime, convert(char(4), BE.FiscalYear) + convert(varchar(2), BE.ReportMonth) + convert(varchar(2), BE.ReportDay) ) as ExpenseDate ,BE.* from dbo.tmpBusinessExpense BE where BE.ContractorId = @ContractorId and ReportDay = 14
It does not work for records 1,2,3 where leading zero for Month or Day is missing. You realized Month and Day should have two digits. If they do not - attach leading zero.
declare @ContractorId int set @ContractorId = 608257; select convert (DateTime, ( convert(char(4), BE.FiscalYear) + case when len(convert(varchar(2), BE.ReportMonth)) = 1 then '0' + convert(varchar(2), BE.ReportMonth) else convert(varchar(2), BE.ReportMonth) end + case when len(convert(varchar(2), BE.ReportDay)) = 1 then '0' + convert(varchar(2), BE.ReportDay) else convert(varchar(2), BE.ReportDay) end ) ) as ExpenseDate , BE.* from dbo.tmpBusinessExpense BE where BE.ContractorId = @ContractorId
Now, you noticed that you do not need to use convert when identifying the length of ReportMonth,
and ReportDay and you removed it.
declare @ContractorId int set @ContractorId = 608257; select convert (DateTime, ( convert(char(4), BE.FiscalYear) + case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) else convert(varchar(2), BE.ReportMonth) end + case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2), BE.ReportDay) else convert(varchar(2), BE.ReportDay) end ) ) as ExpenseDate , BE.* from dbo.tmpBusinessExpense BE where BE.ContractorId = @ContractorId
or using Common table expression (CTE)
declare @ContractorId int set @ContractorId = 608257; with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate, ExpenseDate) as ( select BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, BE.ReportDay, BE.Expense, BE.CreateDate, convert (DateTime, ( convert(char(4), BE.FiscalYear) + case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) else convert(varchar(2), BE.ReportMonth) end + case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2), BE.ReportDay) else convert(varchar(2), BE.ReportDay) end ) ) as ExpenseDate from dbo.tmpBusinessExpense BE where BE.ContractorId = @ContractorId ) select * from FixedExpenseDate
You use a CTE for simplicity, and you want to separate preparation of ExpenseDate and the main logic where you will determine the result set. You use correlated query to get in the result set only latest row entered by contractor Entrance Manufacturing. For each contractor query should return only one record with max ExpenseDate
declare @ContractorId int set @ContractorId = 608257; with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate, ExpenseDate) as ( select BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, BE.ReportDay, BE.Expense, BE.CreateDate, convert (DateTime, ( convert(char(4), BE.FiscalYear) + case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) else convert(varchar(2), BE.ReportMonth) end + case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2), BE.ReportDay) else convert(varchar(2), BE.ReportDay) end ) ) as ExpenseDate from dbo.tmpBusinessExpense BE where BE.ContractorId = @ContractorId ) select * from FixedExpenseDate A where A.ExpenseDate = ( select Max(B.ExpenseDate) from FixedExpenseDate B where A.ContractorId = B.ContractorId) Here for each outer row, for each contractor subquery determines the max ExpenseDate. This query returns rows that have ExpenseDate equal to max ExpenseDate.
You check your query against the entire population
with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate, ExpenseDate) as ( select BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, BE.ReportDay, BE.Expense, BE.CreateDate, convert (DateTime, ( convert(char(4), BE.FiscalYear) + case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) else convert(varchar(2), BE.ReportMonth) end + case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2), BE.ReportDay) else convert(varchar(2), BE.ReportDay) end ) ) as ExpenseDate from dbo.tmpBusinessExpense BE ) select * from FixedExpenseDate A where A.ExpenseDate = ( select Max(B.ExpenseDate) from FixedExpenseDate B where A.ContractorId = B.ContractorId)
You got an error message:
Msg 242, Level 16, State 3, Line 1 The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.
How to find a row with "out-of-range datetime value"? You use isdate function:
select top 3 BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, BE.ReportDay, BE.Expense, BE.CreateDate, isdate( ( convert(varchar(4), BE.FiscalYear) + case when len(convert(varchar(2), BE.ReportMonth)) = 1 then '0' + convert(varchar(2), BE.ReportMonth) else convert(varchar(2), BE.ReportMonth) end + case when len(rtrim(convert(char(2), BE.ReportDay))) = 1 then '0' + rtrim(convert(char(2), BE.ReportDay)) else rtrim(convert(char(2), BE.ReportDay)) end ) ) as IsDateIndicator from dbo.tmpBusinessExpense BE order by IsDateIndicator
The ReportDay = 29 and in 2009 there were only 28 days in February. You fixed this value:
update dbo.tmpBusinessExpense set ReportDay = 28 where ContractorId = 238 and FiscalYear = 2009 and Quater = 3 and ReportMonth = 2 and ReportWeek = 6 and ReportDay = 29
You check your query second time:
with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate, ExpenseDate) as ( select BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, BE.ReportDay, BE.Expense, BE.CreateDate, convert (DateTime, ( convert(char(4), BE.FiscalYear) + case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) else convert(varchar(2), BE.ReportMonth) end + case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2), BE.ReportDay) else convert(varchar(2), BE.ReportDay) end ) ) as ExpenseDate from dbo.tmpBusinessExpense BE ) select * from FixedExpenseDate A where A.ExpenseDate = ( select Max(B.ExpenseDate) from FixedExpenseDate B where A.ContractorId = B.ContractorId)
Now you get rid of the extra columns and format your values in the report:
with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate, ExpenseDate) as ( select BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, BE.ReportDay, BE.Expense, BE.CreateDate, convert (DateTime, ( convert(char(4), BE.FiscalYear) + case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) else convert(varchar(2), BE.ReportMonth) end + case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2), BE.ReportDay) else convert(varchar(2), BE.ReportDay) end ) ) as ExpenseDate from dbo.tmpBusinessExpense BE ) select C.ContractorName, convert(varchar, A.ExpenseDate, 101) as LastExpenseDate, '$'+ convert(varchar, Expense, 1) as Expense from FixedExpenseDate A inner join dbo.tmpContractor C on A.ContractorId = C.ContractorId where A.ExpenseDate = ( select Max(B.ExpenseDate) from FixedExpenseDate B where A.ContractorId = B.ContractorId) order by C.ContractorName
The result is:
ContractorNameLastExpenseDateExpense ALS Revices02/04/2009$927,428.00 Entrance Manufacturing12/14/2009$34.00 R & P suns10/07/2009$478,464.00
You ran your query against production tables and you got slightly different result:
ContractorNameLastExpenseDateExpense ALS Revices02/04/2009$927,428.00 Entrance Manufacturing12/14/2009$34.00 Entrance Manufacturing12/14/2009$834.00 Entrance Manufacturing12/14/2009$3,834.00 Entrance Manufacturing12/14/2009$29,568,834.00 R & P suns10/07/2009$478,464.00
Obviously, while you were working on your report at least three records were entered for Contractor Entrance Manufacturing. To replicate this problem in your test tables you entered these rows:
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (608257, 2009, 4, 12, 51, 14, 834, '20090812') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (608257, 2009, 4, 12, 51, 14, 3834, '20090812') insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense, CreateDate) values (608257, 2009, 4, 12, 51, 14, 29568834, '20090812')
You decide to use RowId that uniquely identifies each row in table as part of ExpenseDate column to make ExpenseDate unique. You check the convert function:
SELECT [Datetime] = CONVERT(datetime,'2010-02-28T01:07:34.444', 126) union all SELECT [Datetime] = CONVERT(datetime,'2010-02-28T01:07:34.443', 126) union all SELECT [Datetime] = CONVERT(datetime,'2010-02-28 12:07:34.442', 121) union all SELECT [Datetime] = CAST('2010-02-28 12:07:34.442' AS datetime) union all SELECT [Datetime] = CAST('2010-02-28 12:07:34.443' AS datetime)
To your surprise all five rows return the same result '2010-02-28 01:07:34.443'. You think that you can use identity column with seed = 3 to avoid this issue. You check it, but convert function is not reliable:
SELECT [Datetime] = CONVERT(datetime,'2010-02-28T01:07:34.991', 126) union all SELECT [Datetime] = CONVERT(datetime,'2010-02-28T01:07:34.994', 126) union all SELECT [Datetime] = CONVERT(datetime,'2010-02-28T12:07:34.997', 126) union all SELECT [Datetime] = CAST('2010-02-28 12:07:34.988' AS datetime) union all SELECT [Datetime] = CAST('2010-02-28 12:07:34.984' AS datetime)
And minutes and seconds part of date can not exceed 59.
You review your query and decide that if multiple expenses where entered the same date (Tiebreaker) the max expense becomes the last one and it has max(RowId) for each contractor. You modify your query and it produce a correct result.
with FixedExpenseDate(RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense) as ( select ROW_NUMBER() OVER (order by ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay) AS RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense from dbo.tmpBusinessExpense ) select C.ContractorName, convert(varchar, convert (DateTime, ( convert(char(4), A.FiscalYear) + '-' + case when len(A.ReportMonth) = 1 then '0' + convert(char(1), A.ReportMonth) else convert(char(2), A.ReportMonth) end + '-' + case when len(A.ReportDay) = 1 then '0' + convert(char(1), A.ReportDay) else convert(char(2), A.ReportDay) end ) ), 101) as LastExpenseDate, '$'+ convert(varchar, Expense, 1) as Expense from FixedExpenseDate A inner join dbo.tmpContractor C on A.ContractorId = C.ContractorId where A.RowId = ( select Max(B.RowId) from FixedExpenseDate B where A.ContractorId = B.ContractorId) order by C.ContractorName
You know that in correlated subquery the inner query should execute for each row from outer query. How do you get the same result without correlated subquery? You find a tip "Get Unique from Duplicates". It was written for Oracle, and you can not use this idea in SQL Server 2000, but it should work for SQL Server 2005.
with FixedExpenseDate(RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense) as ( select ROW_NUMBER() OVER (PARTITION BY ContractorId order by FiscalYear desc, Quater desc, ReportMonth desc, ReportWeek desc, ReportDay desc, Expense desc ) AS RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense from dbo.tmpBusinessExpense ) select C.ContractorName, convert(varchar, convert (DateTime, ( convert(char(4), A.FiscalYear) + '-' + case when len(A.ReportMonth) = 1 then '0' + convert(char(1), A.ReportMonth) else convert(char(2), A.ReportMonth) end + '-' + case when len(A.ReportDay) = 1 then '0' + convert(char(1), A.ReportDay) else convert(char(2), A.ReportDay) end ) ), 101) as LastExpenseDate, '$'+ convert(varchar, Expense, 1) as Expense from FixedExpenseDate A inner join dbo.tmpContractor C on A.ContractorId = C.ContractorId where A.RowId = 1 order by C.ContractorName
You send the report and you go home.
References:
http://www.sqlservercentral.com/
Jeff Smith's SQL Server Blog at http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx
Credits:
my wife, my kids, my teacher Ilya Petrenko