December 11, 2008 at 8:31 am
Hello, all:
I'm trying to clean up a dataset that's a complete mess to make it semi-usable, and I wondered if I could get a bit of help. I'm trying to extract usable zip code data from an attendance table, but the (non-SQL) database into which the cashiers enter this data does not have any restrictions set on what can be entered in the zip code field. Therefore, I have a zip code field that has both alphanumeric and non-alphanumeric data in it. I've figured out how to filter out the non-alphanumeric data, but here's where it gets tricky.
The entry protocol dictates that if a customer is international, the cashier should enter their country code in the zip code field (I know, I know), but sometimes the cashiers have entered full country names instead of the codes. So basically, I need to keep and manually scrub 🙁 the alphabetical entries instead of getting rid of everything that's not actually a zip code. The only other filter I can apply would be one that removes any records in which the zip code field contains both alphabetical and numeric characters--it should either be one or the other. I'm not sure how to do this, since this field has an nvarchar datatype.
Sorry to be so long-winded--can you tell I'm frustrated? :crazy: Any ideas?
December 11, 2008 at 9:48 am
this task is tough and has to involve regular expression techniques. Please visit
http://regexadvice.com/forums/
and ask the qustion ther . Don't forget to include the real examples og the originla text to clean. If you don't it'll dalsy the answers.
Good luck
Sergei Z
December 11, 2008 at 9:50 am
these are the once which are both alpha and have numbers: you can decide whether to delete or manually edit from this subset, i guess
select postalcode
from yourtable
--alphanumeric ANDALSO numeric
where postalcode like [A-Z,a-z]
AND postalcode like [0-9]
Lowell
December 11, 2008 at 10:02 am
also, if you have a table with all allowable postal codes, you could simply join against it where the postalcodes not found, right:
--all invalid postalcodes
select * from yourtable
left outer join AllowedPostalcodes on yourtable.postalcode = AllowedPostalcodes.postalcode
where AllowedPostalcodes.ID IS NULL
Lowell
December 11, 2008 at 11:14 am
Lowell (12/11/2008)
these are the once which are both alpha and have numbers: you can decide whether to delete or manually edit from this subset, i guessselect postalcode
from yourtable
--alphanumeric ANDALSO numeric
where postalcode like [A-Z,a-z]
AND postalcode like [0-9]
A slight syntax correction:
select postalcode
from yourtable
--alphanumeric ANDALSO numeric
where postalcode like '%[A-Z,a-z]%'
AND postalcode like '%[0-9]%'
December 11, 2008 at 11:25 am
doh! thanks Seth;
I should have seen that, but wanted to let the OP know that he didn't need any fancy regular expressions, because it is supported to a degree in the like statement.
excellent and thanks!
Lowell
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply