Removing Blank Fields

Question

Removing Blank Fields

Paul_Harvey

Hall of Fame

Points: 3822
More actions
March 10, 2010 at 8:40 am

#221419

Hi All,
Hopefully an easy one for somebody in "the know".
I have a table with some 75 fields, that acts as a staging table for incoming data feeds.
I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:
Update table
set field1 = Null where field1 = '' or field1 = ' '
Update table
set field2 = Null where field2 = '' or field2 = ' '
Update table
set field3 = Null where field3 = '' or field3 = ' '
Update table
set field4 = Null where field4 = '' or field4 = ' '
Update table
set field5 = Null where field5 = '' or field5 = ' '
Update table
set field6 = Null where field6 = '' or field6 = ' '
Update table
set field7 = Null where field7 = '' or field7 = ' '
etc etc
I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?
Paul

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

Paul_Harvey (3/10/2010)
Hi All,
Hopefully an easy one for somebody in "the know".
I have a table with some 75 fields, that acts as a staging table for incoming data feeds.
I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:
Update table
set field1 = Null where field1 = '' or field1 = ' '
Update table
set field2 = Null where field2 = '' or field2 = ' '
Update table
set field3 = Null where field3 = '' or field3 = ' '
Update table
set field4 = Null where field4 = '' or field4 = ' '
Update table
set field5 = Null where field5 = '' or field5 = ' '
Update table
set field6 = Null where field6 = '' or field6 = ' '
Update table
set field7 = Null where field7 = '' or field7 = ' '
etc etc
I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?
Paul

update dbo.table set

field1 = case when field1 is null or field1 = '' or field1 = ' ' then null else field1 end,

field2 = case when field2 is null or field2 = '' or field2 = ' ' then null else field2 end,

...,

field75 = case when field75 is null or field75 = '' or field75 = ' ' then null else field75 end

Not sure this will be any faster. I'd look at temporarily creating indexes on each of the columns before the updates and dropping the indexes after the update. It really depends on the number of records and the distribution of data in each column.

Paul_Harvey Hall of Fame Points: 3822 More actions · Answer 2

Thanks for the advice Lynn. I will try the first suggestion.

Steve Jones - SSC Editor SSC Guru Points: 736479 More actions · Answer 3

Let us know. I was going to post what Lynn did, but too slow. I think that's the best way to do it.

Jeff Moden SSC Guru Points: 1004407 More actions · Answer 4

Paul_Harvey (3/10/2010)
Hi All,
Hopefully an easy one for somebody in "the know".
I have a table with some 75 fields, that acts as a staging table for incoming data feeds.
I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:
Update table
set field1 = Null where field1 = '' or field1 = ' '
Update table
set field2 = Null where field2 = '' or field2 = ' '
Update table
set field3 = Null where field3 = '' or field3 = ' '
Update table
set field4 = Null where field4 = '' or field4 = ' '
Update table
set field5 = Null where field5 = '' or field5 = ' '
Update table
set field6 = Null where field6 = '' or field6 = ' '
Update table
set field7 = Null where field7 = '' or field7 = ' '
etc etc
I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?
Paul

If you're going to transfer the data from the staging table to a final table, don't waste your time changing the blanks to NULLs in the staging table. Lookup NULLIF and do it when you copy the data to the final table. It's very fast.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 5

To illustrate Jeff's point:

DECLARE @Example

TABLE (

field1 VARCHAR(30) NULL,

field2 VARCHAR(30) NULL,

field3 VARCHAR(30) NULL,

field4 VARCHAR(30) NULL,

field5 VARCHAR(30) NULL,

field6 VARCHAR(30) NULL

);

INSERT @Example

(field1, field2, field3, field4, field5, field6)

VALUES (NULL, SPACE(0), SPACE(1), SPACE(2), SPACE(3), 'data');

SELECT field1 = NULLIF(field1, SPACE(0)),

field2 = NULLIF(field2, SPACE(0)),

field3 = NULLIF(field3, SPACE(0)),

field4 = NULLIF(field4, SPACE(0)),

field5 = NULLIF(field5, SPACE(0)),

field6 = NULLIF(field6, SPACE(0))

FROM @Example;

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004407 More actions · Answer 6

Well done!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 7

Jeff Moden (3/10/2010)
Well done!

Thanks. I like to be helpful 😉

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul_Harvey Hall of Fame Points: 3822 More actions · Answer 8

Thanks everyone for your swift responses.

After a little testing I have taken the advice of Jeff Moden (with the vital example given by Paul White!). I had not realised how effective the NULLIF statement is, and the difference in performance is staggering. The data flow is part of a SSIS package, and so instead of using a table as a datasource I am now using a SQL Query containing NULLIFs. I then removed the Execute SQL Task which removed the spaces. I haven't done any formal performance timing of the new and old methods, but I would conservatively estimate that the total running time of the package is now one tenth of what it was previously. The Execute SQL task was taking as long as some of the Fuzzy Group tasks, which are clearly more intensive.

Thanks very much to all who chipped in.

Jeff Moden SSC Guru Points: 1004407 More actions · Answer 9

Very cool. Thanks for the feedback, Paul.

As a sidebar, I sometimes don't give an example because I really want people to take the time to look it up in Books Online and read about it. It's kind of like learning a new word each day. There's some very important information about which datatype is returned, etc.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 10

Good point Jeff. My approach though, is to always try to provide code if I have time. My thinking is that although my script might be copy-and-pasted, anyone interested in learning will always 'hit F1'. They sure will the first time it fails to work as expected. That, or we get another question!

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul_Harvey Hall of Fame Points: 3822 More actions · Answer 11

Oh I totally understand that point Jeff. And I agree with it. The lazy side of me is just always grateful for being spoon-fed of course.

Nonetheless this exercise has taught me about NULLIF and also SPACE(n).

Jeff Moden SSC Guru Points: 1004407 More actions · Answer 12

Heh... I agree... spoonfeeding is good. One picture IS worth a thousand words. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul_Harvey Hall of Fame Points: 3822 More actions · Answer 13

and I actually DID press F1 and read a little more. Because it worked I took more interest. I suspect if it took me too long to work it out myslef I may have lost interest.. particularly as it was such a "wow" moment when I saw the performance advantage!

Paul_Harvey Hall of Fame Points: 3822 More actions · Answer 14

Hi there,

I am now going to demonstrate my lack of understanding on this NULLIF function but....

Today I discovered to my surprise (see!?!) that NULLIF(field,space(0)) not only removes spaces from my data, but also Zeros. Unfortunately 0 is a valid value in my table, and if difference from NULL.

Is there any way to limit NULLIF to removeing spaces?

Paul