March 10, 2010 at 8:40 am
Hi All,
Hopefully an easy one for somebody in "the know".
I have a table with some 75 fields, that acts as a staging table for incoming data feeds.
I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:
Update table
set field1 = Null where field1 = '' or field1 = ' '
Update table
set field2 = Null where field2 = '' or field2 = ' '
Update table
set field3 = Null where field3 = '' or field3 = ' '
Update table
set field4 = Null where field4 = '' or field4 = ' '
Update table
set field5 = Null where field5 = '' or field5 = ' '
Update table
set field6 = Null where field6 = '' or field6 = ' '
Update table
set field7 = Null where field7 = '' or field7 = ' '
etc etc
I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?
Paul
March 10, 2010 at 8:59 am
Paul_Harvey (3/10/2010)
Hi All,Hopefully an easy one for somebody in "the know".
I have a table with some 75 fields, that acts as a staging table for incoming data feeds.
I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:
Update table
set field1 = Null where field1 = '' or field1 = ' '
Update table
set field2 = Null where field2 = '' or field2 = ' '
Update table
set field3 = Null where field3 = '' or field3 = ' '
Update table
set field4 = Null where field4 = '' or field4 = ' '
Update table
set field5 = Null where field5 = '' or field5 = ' '
Update table
set field6 = Null where field6 = '' or field6 = ' '
Update table
set field7 = Null where field7 = '' or field7 = ' '
etc etc
I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?
Paul
update dbo.table set
field1 = case when field1 is null or field1 = '' or field1 = ' ' then null else field1 end,
field2 = case when field2 is null or field2 = '' or field2 = ' ' then null else field2 end,
...,
field75 = case when field75 is null or field75 = '' or field75 = ' ' then null else field75 end
Not sure this will be any faster. I'd look at temporarily creating indexes on each of the columns before the updates and dropping the indexes after the update. It really depends on the number of records and the distribution of data in each column.
March 10, 2010 at 9:01 am
Thanks for the advice Lynn. I will try the first suggestion.
March 10, 2010 at 9:14 am
Let us know. I was going to post what Lynn did, but too slow. I think that's the best way to do it.
March 10, 2010 at 4:51 pm
Paul_Harvey (3/10/2010)
Hi All,Hopefully an easy one for somebody in "the know".
I have a table with some 75 fields, that acts as a staging table for incoming data feeds.
I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:
Update table
set field1 = Null where field1 = '' or field1 = ' '
Update table
set field2 = Null where field2 = '' or field2 = ' '
Update table
set field3 = Null where field3 = '' or field3 = ' '
Update table
set field4 = Null where field4 = '' or field4 = ' '
Update table
set field5 = Null where field5 = '' or field5 = ' '
Update table
set field6 = Null where field6 = '' or field6 = ' '
Update table
set field7 = Null where field7 = '' or field7 = ' '
etc etc
I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?
Paul
If you're going to transfer the data from the staging table to a final table, don't waste your time changing the blanks to NULLs in the staging table. Lookup NULLIF and do it when you copy the data to the final table. It's very fast.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2010 at 8:49 pm
To illustrate Jeff's point:
DECLARE @Example
TABLE (
field1 VARCHAR(30) NULL,
field2 VARCHAR(30) NULL,
field3 VARCHAR(30) NULL,
field4 VARCHAR(30) NULL,
field5 VARCHAR(30) NULL,
field6 VARCHAR(30) NULL
);
INSERT @Example
(field1, field2, field3, field4, field5, field6)
VALUES (NULL, SPACE(0), SPACE(1), SPACE(2), SPACE(3), 'data');
SELECT field1 = NULLIF(field1, SPACE(0)),
field2 = NULLIF(field2, SPACE(0)),
field3 = NULLIF(field3, SPACE(0)),
field4 = NULLIF(field4, SPACE(0)),
field5 = NULLIF(field5, SPACE(0)),
field6 = NULLIF(field6, SPACE(0))
FROM @Example;
March 10, 2010 at 9:34 pm
Well done!
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2010 at 11:21 pm
Jeff Moden (3/10/2010)
Well done!
Thanks. I like to be helpful 😉
March 11, 2010 at 2:56 am
Thanks everyone for your swift responses.
After a little testing I have taken the advice of Jeff Moden (with the vital example given by Paul White!). I had not realised how effective the NULLIF statement is, and the difference in performance is staggering. The data flow is part of a SSIS package, and so instead of using a table as a datasource I am now using a SQL Query containing NULLIFs. I then removed the Execute SQL Task which removed the spaces. I haven't done any formal performance timing of the new and old methods, but I would conservatively estimate that the total running time of the package is now one tenth of what it was previously. The Execute SQL task was taking as long as some of the Fuzzy Group tasks, which are clearly more intensive.
Thanks very much to all who chipped in.
March 11, 2010 at 6:56 am
Very cool. Thanks for the feedback, Paul.
As a sidebar, I sometimes don't give an example because I really want people to take the time to look it up in Books Online and read about it. It's kind of like learning a new word each day. There's some very important information about which datatype is returned, etc.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2010 at 7:01 am
Good point Jeff. My approach though, is to always try to provide code if I have time. My thinking is that although my script might be copy-and-pasted, anyone interested in learning will always 'hit F1'. They sure will the first time it fails to work as expected. That, or we get another question!
March 11, 2010 at 7:02 am
Oh I totally understand that point Jeff. And I agree with it. The lazy side of me is just always grateful for being spoon-fed of course.
Nonetheless this exercise has taught me about NULLIF and also SPACE(n).
March 11, 2010 at 7:29 am
Heh... I agree... spoonfeeding is good. One picture IS worth a thousand words. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2010 at 7:32 am
and I actually DID press F1 and read a little more. Because it worked I took more interest. I suspect if it took me too long to work it out myslef I may have lost interest.. particularly as it was such a "wow" moment when I saw the performance advantage!
March 30, 2010 at 8:37 am
Hi there,
I am now going to demonstrate my lack of understanding on this NULLIF function but....
Today I discovered to my surprise (see!?!) that NULLIF(field,space(0)) not only removes spaces from my data, but also Zeros. Unfortunately 0 is a valid value in my table, and if difference from NULL.
Is there any way to limit NULLIF to removeing spaces?
Paul
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply