April 9, 2020 at 11:14 am
Jonathan AC Roberts wrote:Ralph Kimball has a method of not having any nulls in dimension tables. Instead, on the dimension table there are rows with negative keys: -1 for unknown, -2 for not applicable and -3 for not available. Using nulls might negatively affect the performance of your queries so it's best to avoid them if possible. Then any (fact) table that has a foreign key to a dimension table will not have any nulls but instead point to one of the default values in the dimension table if no value is found or it is not applicable.
I guess I'll not understand such a thing. Unknown and Not Available seem to be the same thing (IMHO) to me in most cases and Not Applicable usually means you've not correctly normalized the table. Of course, that's the nature of DWs, which I mostly dislike anyway. Denormalization and the resulting duplication of data without the benefit of pre-aggregation (and almost no one I've seen does pre-aggregation) for DWs just doesn't make good sense to me.
They are really just keys to text identifiers, sometimes there is no related record, so "not applicable". Or if there are inconsistencies in the source data the related key might be missing so "not available", the key might be undecided at the point you get the data so "unknown", you don't have to use all the available default values and you can add more if you need.
My main point was that using this method you can avoid using nulls and store even more information about the nature of the value that would be null. It also makes query processing faster.
April 9, 2020 at 1:11 pm
Jeff Moden wrote:Jonathan AC Roberts wrote:Ralph Kimball has a method of not having any nulls in dimension tables. Instead, on the dimension table there are rows with negative keys: -1 for unknown, -2 for not applicable and -3 for not available. Using nulls might negatively affect the performance of your queries so it's best to avoid them if possible. Then any (fact) table that has a foreign key to a dimension table will not have any nulls but instead point to one of the default values in the dimension table if no value is found or it is not applicable.
I guess I'll not understand such a thing. Unknown and Not Available seem to be the same thing (IMHO) to me in most cases and Not Applicable usually means you've not correctly normalized the table. Of course, that's the nature of DWs, which I mostly dislike anyway. Denormalization and the resulting duplication of data without the benefit of pre-aggregation (and almost no one I've seen does pre-aggregation) for DWs just doesn't make good sense to me.
They are really just keys to text identifiers, sometimes there is no related record, so "not applicable". Or if there are inconsistencies in the source data the related key might be missing so "not available", the key might be undecided at the point you get the data so "unknown", you don't have to use all the available default values and you can add more if you need.
My main point was that using this method you can avoid using nulls and store even more information about the nature of the value that would be null. It also makes query processing faster.
Yes, agreed. I was still thinking EmployeeID in an Employee table.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 9, 2020 at 1:15 pm
I'm curious as to why any row in an Employee table would ever have a NULL (or a -1 alternative or an "empty" alternative) for the Employee number for any row in the table.
I am talking about our main financial table which contains a lot of employee related cost, as salary, staff insurance, but also non workforce related stuff, such as depreciation (not sure if this is the right term, I am (Swiss) German native tongue, so I try to eyplain what I mean: the cost of something important losing value, such as ageing of Grand Pianos).
Understood. I guess I'd still give such non-workforce related people an "EmployeeID" if you're storing them in the Employee table. If nothing else, it would provide a good surrogate key to reference them in other tables.
And, yes, depreciation was the correct term.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 9, 2020 at 1:27 pm
Try normalize the Employee table, to move out rows that may me NULL. Good normalization also gives more understanding of the model. Sometimes NULL, may be acceptable, e.x. a "start date/end date" for a "thing" in a table.
If the data is for migration to another system, then a fat table structure with null, may be more efficient.
April 9, 2020 at 2:05 pm
Try normalize the Employee table, to move out rows that may me NULL. Good normalization also gives more understanding of the model. Sometimes NULL, may be acceptable, e.x. a "start date/end date" for a "thing" in a table.
If the data is for migration to another system, then a fat table structure with null, may be more efficient.
As I previously gave examples for, I've found that NULL EndDates normally lead to more complex and less efficient code. I'm also not sure why fat table structures might be more efficient for migrations because it means that you must first make the fat structure and the break it apart when it gets to the target system if you want it to be normalized at the destination. Sounds like a lot of unnecessary extra work to me. Of course, I could be wrong because I've never done a migration using fat tables as you suggest.
With that, could you expound on the efficiencies that might be had compared to the effort of creating such fat tables? Not trying to be contrary here... it's something I've not considered before and it deserves a look.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 9, 2020 at 2:10 pm
Jonathan AC Roberts wrote:Jeff Moden wrote:Jonathan AC Roberts wrote:Ralph Kimball has a method of not having any nulls in dimension tables. Instead, on the dimension table there are rows with negative keys: -1 for unknown, -2 for not applicable and -3 for not available. Using nulls might negatively affect the performance of your queries so it's best to avoid them if possible. Then any (fact) table that has a foreign key to a dimension table will not have any nulls but instead point to one of the default values in the dimension table if no value is found or it is not applicable.
I guess I'll not understand such a thing. Unknown and Not Available seem to be the same thing (IMHO) to me in most cases and Not Applicable usually means you've not correctly normalized the table. Of course, that's the nature of DWs, which I mostly dislike anyway. Denormalization and the resulting duplication of data without the benefit of pre-aggregation (and almost no one I've seen does pre-aggregation) for DWs just doesn't make good sense to me.
They are really just keys to text identifiers, sometimes there is no related record, so "not applicable". Or if there are inconsistencies in the source data the related key might be missing so "not available", the key might be undecided at the point you get the data so "unknown", you don't have to use all the available default values and you can add more if you need.
My main point was that using this method you can avoid using nulls and store even more information about the nature of the value that would be null. It also makes query processing faster.
Yes, agreed. I was still thinking EmployeeID in an Employee table.
Yes, you could also use the method above for that. Just create some default employees in the employee table with negative employeeIDs so then there would be no nulls in the table with the foreign key.
April 9, 2020 at 2:15 pm
Jeff Moden wrote:Jonathan AC Roberts wrote:Jeff Moden wrote:Jonathan AC Roberts wrote:Ralph Kimball has a method of not having any nulls in dimension tables. Instead, on the dimension table there are rows with negative keys: -1 for unknown, -2 for not applicable and -3 for not available. Using nulls might negatively affect the performance of your queries so it's best to avoid them if possible. Then any (fact) table that has a foreign key to a dimension table will not have any nulls but instead point to one of the default values in the dimension table if no value is found or it is not applicable.
I guess I'll not understand such a thing. Unknown and Not Available seem to be the same thing (IMHO) to me in most cases and Not Applicable usually means you've not correctly normalized the table. Of course, that's the nature of DWs, which I mostly dislike anyway. Denormalization and the resulting duplication of data without the benefit of pre-aggregation (and almost no one I've seen does pre-aggregation) for DWs just doesn't make good sense to me.
They are really just keys to text identifiers, sometimes there is no related record, so "not applicable". Or if there are inconsistencies in the source data the related key might be missing so "not available", the key might be undecided at the point you get the data so "unknown", you don't have to use all the available default values and you can add more if you need.
My main point was that using this method you can avoid using nulls and store even more information about the nature of the value that would be null. It also makes query processing faster.
Yes, agreed. I was still thinking EmployeeID in an Employee table.
Yes, you could also use the method above for that. Just create some default employees in the employee table with negative employeeIDs so then there would be no nulls in the table with the foreign key.
Heh.... NNNNNNOOOOOOOOOO!!!!!!!!. There is no way on this good Green Earth that I'd allow such a thing to occur in an Employee table because it would break uniqueness of the PK, which would be the EmployeeID column. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
April 9, 2020 at 2:33 pm
Jonathan AC Roberts wrote:Jeff Moden wrote:Jonathan AC Roberts wrote:Jeff Moden wrote:Jonathan AC Roberts wrote:Ralph Kimball has a method of not having any nulls in dimension tables. Instead, on the dimension table there are rows with negative keys: -1 for unknown, -2 for not applicable and -3 for not available. Using nulls might negatively affect the performance of your queries so it's best to avoid them if possible. Then any (fact) table that has a foreign key to a dimension table will not have any nulls but instead point to one of the default values in the dimension table if no value is found or it is not applicable.
I guess I'll not understand such a thing. Unknown and Not Available seem to be the same thing (IMHO) to me in most cases and Not Applicable usually means you've not correctly normalized the table. Of course, that's the nature of DWs, which I mostly dislike anyway. Denormalization and the resulting duplication of data without the benefit of pre-aggregation (and almost no one I've seen does pre-aggregation) for DWs just doesn't make good sense to me.
They are really just keys to text identifiers, sometimes there is no related record, so "not applicable". Or if there are inconsistencies in the source data the related key might be missing so "not available", the key might be undecided at the point you get the data so "unknown", you don't have to use all the available default values and you can add more if you need.
My main point was that using this method you can avoid using nulls and store even more information about the nature of the value that would be null. It also makes query processing faster.
Yes, agreed. I was still thinking EmployeeID in an Employee table.
Yes, you could also use the method above for that. Just create some default employees in the employee table with negative employeeIDs so then there would be no nulls in the table with the foreign key.
Heh.... NNNNNNOOOOOOOOOO!!!!!!!!. There is no way on this good Green Earth that I'd allow such a thing to occur in an Employee table because it would break uniqueness of the PK, which would be the EmployeeID column. 😉
I'm not sure I made it clear what I meant. I was assuming that employee's had non-negative employeeId's so adding some additional default employees with negative IDs would not break any uniqueness of the PK.
Then if you have another table with an FK to the employees a table, but sometimes there is no related value you could relate it to one of the default employees. That way no null values in the table with the FK.
April 9, 2020 at 6:32 pm
Jeff, I don't see the problem with a noble end date. I don't get makes anything complicated.Given a table that needs to show an ongoing event that has not yet terminated, we can use a NULL-able end_date, thus:
CREATE TABLE Foobar
( ...
start_date DATE NOT NULL DEFAULT CURRENT_TIMESTAMP,
end_date DATE,
CHECK (start_date <= end_date),
..);
Then you use expressions like (COALESCE (end_date, CURRENT_TIMESTAMP) in your code. This will give you temporal interval up until today, which you know is correct. If you want to make this a computed column or put it in a view, and there's no problems.
Please post DDL and follow ANSI/ISO standards when asking for help.
April 9, 2020 at 7:07 pm
Joe, if the indexing is setup correctly, wrapping the ENDDATE in a COALESCE can cause a range scan (at best) instead of a SEEK because it makes that part of the query NON-SARGable. It also requires you to use COALESCE (or an OR), which is more complicated than not needing to use it.
Using computed columns in a non-indexed view and then using that column for criteria is a form of slow death in SQL Server because the entire view will need to materialize before the criteria can be applied.
A persisted computed column in a table IS INDEXABLE and IS SARGable but it's totally unnecessary to do so if your end date defaults to a standard "future date" such as 99990101.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 13, 2020 at 12:38 am
>>Joe, if the indexing is setup correctly, wrapping the ENDDATE in a COALESCE() can cause a range scan (at best) instead of a SEEK because it makes that part of the query NON-SARGable. It also requires you to use COALESCE() (or an OR), which is more complicated than not needing to use it. <<
Why don't you have an index on the start date for an interval? Or index. The (start_date, end_date) pair? You're going to have to do a COALESCE() on the NULL end_date and the current_timestamp in any query where you want to know the elapsed time.
>> ... if your end date defaults to a standard "future date" such as '9999-01-01' <<
I guess if you don't care about having a corrected data model you could do this. The problem is that your dummy date is a real date, and not a token marking an unknown future time. I've seen leases in which the lease and hundreds of years in the future and old COBOL systems. However, the preferred format would be to use the clause "in perpetuity" in the contract. That's what a NULL end_date does. In order to work with your dummy future eternity token, nephews are pretty ugly CASE expression I find is far more complicated than a simple COALESCE().
The floating-point standards have both a positive and negative infinity NaN (not a number) tokens. Unfortunately, temporal values don't have this in SQL..
Please post DDL and follow ANSI/ISO standards when asking for help.
April 13, 2020 at 3:55 pm
Actually, Joe, I care very much about having a "corrected [sic] data model" and that's why I'm suggesting that having a NULL EndDate is the wrong way to do things. And, yes, I do have an index on the StartDate and EndDate pair but actually does little to help queries because of the way most people create such an index. I'm working right now and have night time obligations until Wednesday. I'll try to remember this and get back to you with demonstrable code that demonstrates that NULLs and COALESCE on the EndDate column in the WHERE clause is the wrong thing to do.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 12 posts - 16 through 26 (of 26 total)
You must be logged in to reply to this topic. Login to reply