July 28, 2011 at 2:11 pm
I have a request to allow the user to search on a converted datetime column(i.e. convert(varchar,column_name,101) mm/dd/yyyy). Is there
any trick to indexing this column. I guess if it were in a view I could create an index on this column in the view.
July 28, 2011 at 2:27 pm
I think an index on that column would be fairly worthless, since it is in MM/DD/YYYY format. The only thing it would be any use for is a direct lookup on a single date. A date range search would not work, since you have month and day before year.
You would be much better off saving the data as a datetime value, and creating the index on that.
July 28, 2011 at 4:07 pm
I agree with Michael, any indexes on the datetime column will be negated by surrounding the column with a CONVERT function in a search predicate.
I guess if it were in a view I could create an index on this column in the view.
Not 100% sure what you meant, but just in case you are misinformed, you cannot add an index to a [regular] View. This would require an Indexed View. I latched onto the statement because Indexed Views have many restrictions and are rarely a good choice for this type of thing.
You could make use of a computed column, index that and then have your user query it instead of the regular datetime column:
CREATE TABLE dbo.a
(
dt DATETIME,
dt_mmddyyyy AS CONVERT(CHAR(10), dt, 101)
)
CREATE INDEX [ix_dbo.a.dt_mmddyyyy] ON dbo.a (dt_mmddyyyy) ;
GO
INSERT INTO dbo.a
(dt)
VALUES ('2011-07-28 22:05:50')
GO
SELECT *
FROM dbo.a
GO
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 29, 2011 at 4:30 am
Thanks, I will try and used the computed column method.
July 29, 2011 at 4:35 am
a clustered index on this column would result in slow inserts and updates coz the day comes first in the var char column and every time a day is repeated your looking at index being rearranged. Stick with a date time column and add the index clustered or non clustered to it , the impact of conversion is minimal compared to the possibility off a poorly performing insert to update later on.
July 29, 2011 at 10:25 am
Jayanth_Kurup (7/29/2011)
a clustered index on this column would result in slow inserts and updates coz the day comes first in the var char column and every time a day is repeated your looking at index being rearranged. Stick with a date time column and add the index clustered or non clustered to it , the impact of conversion is minimal compared to the possibility off a poorly performing insert to update later on.
Who said anything about a clustered index?
Too bad we're not talking SQL 2008...this would be a nice use for a DATE type.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 29, 2011 at 10:47 am
ericwenger1 (7/28/2011)
I have a request to allow the user to search on a converted datetime column(i.e. convert(varchar,column_name,101) mm/dd/yyyy). Is thereany trick to indexing this column. I guess if it were in a view I could create an index on this column in the view.
It's a total waste of time and index space. You can't use it for anything other than single date lookups. Because of the format, it's not possible to use it for a range of dates.
I'm not picking on you, Eric. I've seen this happen several hundered times so I have to say it again, and again... storing formatted data is absolutely one of the very worst things you can do with data especially for things like dates. Even putting an index on a computed column that formats dates is a futile thing to do because, unless the ISO format was used, formatted dates just aren't sortable.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 29, 2011 at 10:51 am
Jeff Moden (7/29/2011)
ericwenger1 (7/28/2011)
I have a request to allow the user to search on a converted datetime column(i.e. convert(varchar,column_name,101) mm/dd/yyyy). Is thereany trick to indexing this column. I guess if it were in a view I could create an index on this column in the view.
It's a total waste of time and index space. You can't use it for anything other than single date lookups. Because of the format, it's not possible to use it for a range of dates.
I'm not picking on you, Eric. I've seen this happen several hundered times so I have to say it again, and again... storing formatted data is absolutely one of the very worst things you can do with data especially for things like dates. Even putting an index on a computed column that formats dates is a futile thing to do because, unless the ISO format was used, formatted dates just aren't sortable.
Agreed...unless you really do want to allow folks to search on one day, and the format matters, which is how I understood the problem case. That said, I concede I could have misread it, it happens all the time 😀
I would like it if SQL Server supported "Expression Indexes". A computed column that is indexed is only stored in the index, but it shows up in the DDL for the table which leads folks to a negative view of them.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 29, 2011 at 6:26 pm
Orlando, can you honestly say that the format of a date stored in a database should actually matter for a user that's apparently set to writing queries against something that shouldn't be allowed in a database to begin with?
And, to be sure, no... I wasn't taking a pot shot at you just because of the computed column idea (which is actually a decent idea if all else fails) . I am, however, taking direct aim squarely between the eyes of the so-called-designer that called for a VARCHAR date column and, depending on circumstances, possibly the DBA that allowed it to happen on his/her watch. There's no excuse for this problem. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
July 29, 2011 at 8:38 pm
Jeff Moden (7/29/2011)
Orlando, can you honestly say that the format of a date stored in a database should actually matter for a user that's apparently set to writing queries against something that shouldn't be allowed in a database to begin with?
In this case, sure, I am up for defending a decision like this. It may be unappealing, but after thinking it through I don't have a problem with it. If we had the DATE data type in 2005 we would not even be having this conversation. For discussion sake though, and because I respect your views and will take you up on any time you are willing to put towards something I am looking at, I'll point out that you have seen (or can look up if you have not seen) many of my posts on this site calling for a re-design/re-factor where possible. However, if the application layer won't allow for such a thing, or the application layer is actually SSMS (which we have seen too many times before and may be the case here) then why not store a formatted date in an index?
I am only trying to answer the question/solve the problem at hand. This approach is not going to endanger anyone's database...and on 2005 we do not have as many options (data type) so what is one to do? Tell the customer no when there is a simple, safe and workable way forward?
Would a trigger that stripped the time-component off the DATETIME column value and stored that in an additional column of type SMALL/DATETIME that was indexed make it easier to accept? That would use more resources than a computed column, would involve adding a trigger and would place the onus on the developer to get the trigger 'right' so the extra column never got out of sync. A computed column to a SMALLDATETIME will not work here, something about determinism 🙂 I also tried the 'CAST to INT' trick but that introduces rounding so it would involve a bit of date-math and would affect the query required to get us to our row.
And, to be sure, no... I wasn't taking a pot shot at you just because of the computed column idea (which is actually a decent idea if all else fails).
No worries if you were. I can take it 🙂 This is a public forum and I am here to learn too. Please, feel free to provide an alternative solution without changing the problem case, or to compel the OP to go back to their customer with an objection to get the problem case changed. "Computed column if all else fails" is where I landed given the situation. Did I test it exhaustively? No. Would I put/allow something like this in my own database? Not without trying to change the problem case. But would I put/allow something like this in my own database? If I had no other choice, sure, I could think of worse things...like a trigger :hehe:
I am, however, taking direct aim squarely between the eyes of the so-called-designer that called for a VARCHAR date column and, depending on circumstances, possibly the DBA that allowed it to happen on his/her watch. There's no excuse for this problem. 😉
Allow? How about proposed? It's conceivable that there is a better solution with no change in the problem case, an objection to the requestor that could change the problem case or a combination of factors that inevitably change the problem case, but for now, I would sleep at night moving forward with an index on a non-persisted computed column to satisfy a customer request and work around the platform limitation. It would go on the "list of things to update when we migrate to 2008". This works splendidly in 2008 and would require no changes to the queries sent to the database provided they were not messing with the default DATEFORMAT option:
CREATE TABLE dbo.a
(
dt DATETIME,
dt_mmddyyyy AS cast(dt AS DATE)
)
CREATE INDEX [ix_dbo.a.dt_mmddyyyy] ON dbo.a (dt_mmddyyyy) ;
GO
INSERT INTO dbo.a
(dt)
VALUES ('2011-07-28 22:05:50')
GO
SELECT *
FROM dbo.a
GO
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 30, 2011 at 12:48 pm
opc.three (7/29/2011)
In this case, sure, I am up for defending a decision like this. It may be unappealing, but after thinking it through I don't have a problem with it.
Understood. Please understand that I can't ever justify it and I can’t even rationalize it even if some 3rd party vendor is the cause of having formatted dates or time in the database. IMHO, storing formatted dates and/or time is 100% wrong unless it's in a staging table waiting for conversion and insertion into a properly typed column (and that will usually be DATETIME, not DATE, even if whole dates are supposedly "guaranteed") or for output to a fixed/formatted field in a file. Please bear with me… I’ll explain all of that in a minute.
If we had the DATE data type in 2005 we would not even be having this conversation.
It depends. As you may guess from the paragraph previous to the one above, you and I could probably have a really good discussion on the DATE data-type. I'll skip that, though, because we'd scare people. 😀 I will agree that using a non-persisted computed DATE column would probably be a safe and easy way to do this IF the original formatted date column could not be changed.
For discussion sake though, and because I respect your views and will take you up on any time you are willing to put towards something I am looking at, I'll point out that you have seen (or can look up if you have not seen) many of my posts on this site calling for a re-design/re-factor where possible.
Agreed on all points and the respect is mutual. You’re definitely one of the “good guys” and that’s why I’m going to take the time to explain why such “help” as using a formatted date column is so very dangerous.
However, if the application layer won't allow for such a thing, or the application layer is actually SSMS (which we have seen too many times before and may be the case here) then [font="Arial Black"]why not store a formatted date in an index?[/font]
[font="Arial Black"]I am only trying to answer the question/solve the problem at hand. [/font] This approach is not going to endanger anyone's database...and on 2005 we do not have as many options (data type) so what is one to do? Tell the customer no when there is a simple, [font="Arial Black"]safe [/font]and workable way [font="Arial Black"]forward[/font]?
Outstanding questions, Orlando. You're correct... it won't endanger the database. At this point, it's not the database I'm worried about. Let's talk about the words "safe" and "forward" in this case and maybe you'll understand my horror of the idea of storing formatted dates for the sake of ease-of-use for a user's queries.
Let's first ask ourselves "why" we're even considering formatted date storage to begin with. The answer is simple and seemingly logical... the user is largely "ignorant" of SQL (the term "ignorant" is not meant as a slam here... it's meant to identify a "lack of knowledge" in the classic sense of the word) and we want to make it easy for the user(s) to do what they need to do. So, we take the "easy" way out and give them exactly what they want never considering what an "SQL ignorant" user may do in the future.
Further down the line, the "SQL ignorant" user (with all respect to the "Bobs" in the world, we'll call him "Bob") is tasked by his boss to do something else. Bob is "SQL ignorant" and doesn't know what to do, so he asks a coworker that he knows (actually, the two of them are considered to be a “crack team”) knows something more of SQL but not much more. Here's how the conversation goes.
{Bob} Hi Sally. Say... the boss gave me a new task to do in SQL. He want's me to do a SUM on the "Amount" column for the last 30 days for all “Travel” expenditures. The DBA's are pretty busy and I hate to bother them. Can you show me how to do it?
{Sally, not knowing the date column is a formatted date column} Sure, Bob. No problem. You just take the SUM of the "Amount" column and use the start and end dates you want in the WHERE clause using a BETWEEN and include a filter for the “Travel” account code.
{Bob} Awesome. But I've never done such a thing before. Can you send me an email with an example?
{Sally} Sure. No problem. What's the name of the table and the date column you're working with?
{Bob} The table is called "Expenses" and the date column is called "dt_mmddyyyy".
{Sally, who doesn’t realize the potential problem from the name of the date column} Coming right up. The email will be waiting for you by the time you get back to your desk.
And then, Sally sends an email to Bob with the following code:
SELECT SUM(Amount) AS Total_Amount
FROM Expenses
WHERE dt_mmddyyyy BETWEEN '04/01/2011' AND '04/30/2011'
AND Account_Code = 'Travel'
Bob, who is “SQL ignorant”, is delighted. He didn’t know it could be that easy. “Isn’t SQL wonderful?”, he thinks. Of course, being “SQL ignorant”, he doesn’t check to see what happens because he doesn’t know how to. Besides, that seems so simple, why would he need to check? He gets his sum and gives it to the boss.
Of course, after a week of such high numbers, the boss goes through the roof and an expensive “travel witch hunt” begins. After a day or two of that, they find out that Sally unknowingly wrote the wrong code and Bob used it without verifying what it actually did. Neither is truly at fault but both their reputations have been tarnished and valuable/expensive time was spent by a bunch of people trying to figure out why travel expenses where about 25 times higher than expected. Possibly worse than any of that, Bob and Sally used to be great friends and were considered to be a “crack team”. Bob feels like Sally threw him under the bus as does Sally feel about Bob. They no longer speak with each other never mind help each other. The “crack team” has been destroyed and the company suffers for it in the future.
Who’s at fault here? It’s not Bob and it’s certainly not Sally even though they both took the fall and were made to look absolutely stupid instead of just ignorant. Whose fault is it, really?
It’s actually the fault of the very well intentioned but totally ignorant-of-human-behavior DBA who remained (I use past-tense here… you’ll see why soon) totally oblivious to the problem he or she had caused by making a totally different, very narrow problem easy for the user.
And, no… this isn’t the stretch of an old man’s wild imagination. This is a sage old fart telling people that I’ve actually seen this happen (names changed, of course) at two different companies I've worked for and that YOU MUST protect the users from themselves and their own lack of knowledge. To coin a phrase, “You can NOT simply giver a user what they WANT… you MUST, instead, give them what they NEED.”
This whole problem could have been averted if the DBA gave Bob what he needed instead of what he wanted. What Bob needed was knowledge of the problem and the code for the correct solution yet the DBA simply gave Bob what he wanted because it did what Bob wanted, it was easier to do, and it didn’t take much time.
As a bit of a sidebar, the incident I portrayed above happened before I joined the company (in both cases). As it is with the interaction of people you work with, the incident eventually came up and peaked my interest. The reason why I know how it all happened is that I necessarily became friends with the “Bob” and “Sally” persons at both companies because I needed their help and they were good people. Of course, as time passed, the story came out and the code in the emails was shared with me. “Bob”, “Sally”, the DBA, and the “boss” at each company were all quite thankful when they found out what had happened but I couldn’t get them to reestablish the wonderful working relationship they all shared prior to the incident because “Bob” and “Sally” had inadvertantly thrown each other under the bus during the incident and the "boss" was driving the bus.
So… going “forward”, how “safe” was the solution of using a formatted date column? Do you still think that the sentence “I am only trying to answer the question/solve the problem at hand.” is appropriate?
Don’t just solve the “problem at hand”, Orlando. Carbon-based life-forms are involved. Be the “exceptional DBA”. Expect the unexpected at all times and protect your users from themselves because they deserve to be and need to be protected. They’re not stupid… just ignorant. Help them be less ignorant. Make sure you give them the right code (yes, mind-reading skills do sometimes help) and make sure they know what it does before they use it. That extra 30 to 60 minute you spend doing it right and teaching the users will be paid back 100 fold.
Would a trigger that stripped the time-component off the DATETIME column value and stored that in an additional column of type SMALL/DATETIME that was indexed make it easier to accept?
Heh... gosh... I thought you knew me better than that, Orlando. My response to that question would be "NFW". :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
July 31, 2011 at 8:47 pm
Jeff, you're entire post is predicated on the idea that the problem case can be changed. Mine is predicated on the idea that it cannot. If you are 100% against the idea and will not budge then I do not expect you to provide an alternative solution...I get it...not doing it is your alternative solution...mine is a non-persisted indexed computed column. If I were to assume what you are assuming then our conversation would be about something other than this issue, because on this one I agree with you 100% !!!! :w00t: I have fought the fight to keep dates stored as character data types many many many times before, and won all of them with arguments a lot like (but not as elegant) as the one you just presented. Staging tables are an exception for me as well, however I avoid those as well (more of a conversation about ETL for later, Bartender!). I have not personally witnessed the downsides you have with what your predecessors left behind, however I see it on these forums quite often.
I do appreciate you taking the time, and I appreciate the nod, I am one of the good guys. Your post re-enforces my stance on the fundamental issue at hand and surely will for others on this thread too. Who knows, if we're lucky, it will maybe even change some hearts and minds.
That said, before we indulge in the debate about whether it is "safe" (ignorants, lemmings and idiots not withstanding) to store the DATE-portion of a DATETIME as a formatted string in an index and possibly stir things up to the point of philosophical uprising, let's see if we can get some more input from the OP. I made a blind assumption that we have a situation where the problem case cannot be changed...and save for you alluding to it, I see that we all have skipped the step of actually asking.
Is it possible for your "user" to supply a different query to get their data? Not "would they be willing initially", but do you think it would be possible to have them supply a different query, something like this:
SELECT *
FROM table_name
WHERE date_time_column BETWEEN '07/25/2011' AND '07/26/2011' ;
Instead of like this:
SELECT *
FROM table_name
WHERE formatted_date_column = '07/25/2011' ;
If you think it is possible then it is imperative that you fight the good fight and educate your user as to why they need to supply a BETWEEN instead of an equality check in the WHERE clause. You can point directly to Jeff's previous post as an example of why, and how, such a move can go wrong. Then, if they still refuse because it's too much code to write, or if they tell you it's because a third party app has some artificial limitation, then raise your voice! Shout loudly! Enlist your superiors! I like to start off shouting louder than I think the person is ever willing to shout back while also keeping some energy in reserve so I am prepared to shout even louder if the first shock and awe blast attempt doesn't get the job done 😀 I also like to do this shouting in a location where everyone in the department, or even the whole office floor if possible, can hear us. Tell them that you will not support their request if they continue to provide objections...just don't do anything that would endanger your job 🙂
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 31, 2011 at 9:15 pm
opc.three (7/31/2011)
Jeff, you're entire post is predicated on the idea that the problem case can be changed.
Heh... not quite. My entire post is predicated on the idea that I would [font="Arial Black"]MAKE [/font]the problem case change because it's not right the way it is. 😉
Even the OP unwittingly suggested an alternative...
I guess if it were in a view I could create an index on this column in the view.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 31, 2011 at 9:47 pm
Eric, tell the folks who made the request that you need to make an alternate suggestion because of the dangers of users having access to formatted dates.
Consider the following alternative solution, please...
Let's say you're table looks like the following... it only takes a couple of seconds to build if you want to try this. I suggest you do this testing in TempDB.
/**********************************************************************************************************************
Purpose:
Create a voluminous test table with various types of highly randomized data.
--Jeff Moden
**********************************************************************************************************************/
--===== Conditionally drop the test table to make reruns easier
IF OBJECT_ID('dbo.JBMTest','U') IS NOT NULL
DROP TABLE dbo.JBMTest
;
--===== Create and populate a 1,000,000 row test table.
-- "SomeID" has a range of 1 to 1,000,000 unique numbers
-- "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- "SomeMoney has a range of 0.00 to 100.00 non-unique numbers
-- "SomeDateTime" has a range of >=01/01/2000 and <01/01/2020 non-unique date/times
-- "SomeDate" has a range of >=01/01/2000 and <01/01/2020 non-unique "whole dates" as date format 101
-- "SomeName" contains random characters at random lengths from 2 to 20 characters
SELECT TOP 1000000
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID())) % 50000 + 1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A')),
SomeMoney = CAST(RAND(CHECKSUM(NEWID())) * 100 AS DECIMAL(9,2)), --Note rounding
SomeDateTime = RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2000','2020') + CAST('2000' AS DATETIME),
SomeDate = CONVERT(CHAR(10), ABS(CHECKSUM(NEWID())) % DATEDIFF(dd,'2000','2020') + CAST('2000' AS DATETIME), 101),
SomeName = RIGHT(NEWID(),ABS(CHECKSUM(NEWID())) % 19 + 2)
INTO dbo.JBMTest
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
ALTER TABLE dbo.JBMTest
ADD CONSTRAINT PK_JBMTest PRIMARY KEY CLUSTERED (SomeID) WITH FILLFACTOR = 90
;
Now, ask the user which columns they're going to want to return along with the searchable date and make a deterministic view of those columns following the rules for indexable views. Here's where I chose 3 columns (you can chose more). The index needs to be unique and clustered so I put the date first for the clustering and the SomeID column in for uniqueness... the SET commands follow all the rules for making an indexed view, as well.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
--DROP VIEW dbo.JBMTest01
GO
CREATE VIEW dbo.JBMTest01 WITH SCHEMABINDING AS
SELECT TheDate = CONVERT(DATETIME, SomeDate, 101), SomeID, SomeMoney
FROM dbo.JBMTest;
GO
CREATE UNIQUE CLUSTERED INDEX IXV_JBMTest01_TheDate
ON dbo.JBMTest01 (TheDate, SomeID);
Then, write a query to use the view and teach the user how to use it. Don't forget to GRANT SELECT privs on the view to the group the user is in (or the user if you don't use groups but you really should be using groups)
The side benefit of using an indexed view like this is that the user is never exposed to the dangers of that bloody formatted date column because, if your security is setup correctly, the user can't currently see the table.
Since the view get's you out of the woods on the request, you'll be tempted to forget the rest of what needs to be done. Don't. The next thing to do is to find out what the impact of changing the datatype on that formatted date column would be and see if you can change it. You'll get some resistance because that's human nature but, if it were me, there'd be a full-court press to get it changed because it'll also cause problems down the road. It's what DBA's do. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 1, 2011 at 4:42 am
Jeff Moden (7/31/2011)
opc.three (7/31/2011)
Jeff, you're entire post is predicated on the idea that the problem case can be changed.Heh... not quite. My entire post is predicated on the idea that I would [font="Arial Black"]MAKE [/font]the problem case change because it's not right the way it is. 😉
Nice twist Jeff, but you're indulging in semantics which takes away from the root issue. You and I both know that sometimes peoples hands are tied. Philosophy aside, let's help the folks whose hands are tied.
Even the OP unwittingly suggested an alternative...
I guess if it were in a view I could create an index on this column in the view.
True, but not constituted as it was alluded too. An indexed view is extraneous and, further to that, we may not need a VIEW at all.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply