Standardizing a Field

  • I have a data issue with one column of data in a table with 400 million+ rows. I would like to standardize the format of the data to make searching it easier. Here is what I need to do:

    The format I want is a letter followed by nine numbers so that it looks like this Z123456789.

    I need a stored procedure that checks for several things and corrects them. Here are the things I need to check:

    First. Check the field (znumber) to see if it is between 7 and 9 characters. Anything less or if it is a null value or it is 10 characters, the SP should skip to the next row.

    Second, check to see if the first position has a letter character. If not, insert a "Z", and move on.

    Third. Check to see how many numbers follow the letter Z. It should be nine numbers. If it is eight numbers, a zero should be inserted after the Z. If it is 7 numbers, two zeros should be inserted after the Z.

    I need to do this on the entire data set and all new in coming data. Any direction would be much appreciated. Thanks

  • This can be done. Here are a couple of tips that should help. Check out the STUFF() function in BOL and lookup Tally table here on SSC. I think both of these will help you with the solution.

    Could you post a table definition and some sample rows (some for each of the scenarios)? See the first link in my signature for how to post this. This makes it easier to provide an accurate and tested solution.

  • The thing is that SQL doesn't process row by row. It works on the entire table or result set at a time.

    So what you can do is write conditions to check things. I'd look at the SUBSTRING function to parse out sections of your string. Then you can use IN or LEN to check for some of your conditions and then filter out those rows. The thing you haven't provided is what should happen to those rows that don't match.

  • Is something like this what you are looking for? I just used a temp table but check out the update statement. I think it is close to what you are asking.

    CREATE TABLE #temp (test VARCHAR(10))

    INSERT INTO #temp (test) values ('Z123456')

    INSERT INTO #temp (test) values ('123456789')

    INSERT INTO #temp (test) values ('Z123456789')

    INSERT INTO #temp (test) VALUES (NULL)

    INSERT INTO #temp (test) VALUES ('123')

    SELECT * FROM #temp

    UPDATE #temp SET test = CASE WHEN ((LEN(test) BETWEEN 7 AND 9) OR test IS NULL) THEN

    CASE WHEN LEFT(test,1) = 'Z' then

    LEFT(test,1) + REPLICATE('0',10-LEN(test))+SUBSTRING(test,2,LEN(test))

    ELSE

    'Z' + REPLICATE('0',9-LEN(test))+test

    END

    ELSE

    test

    END

    SELECT * FROM #temp

    drop table #temp

    As for the "future data", you could try a constraint on the table.

  • Matt, that was perfect. I hate how easy you guys all make it seem. I screwed around with Case statement for an entire day. Thanks again.

  • I'd make one change to Matt's script.

    I'd put something like this in the WHERE clause to avoid touching rows that you are not going to update:

    UPDATE #temp

    SET test = CASE WHEN LEFT(test, 1) = 'Z'

    then LEFT(test, 1) + REPLICATE('0',

    10 - LEN(test))

    + SUBSTRING(test, 2, LEN(test))

    ELSE 'Z' + REPLICATE('0', 9 - LEN(test)) + test

    END

    WHERE

    LEN(test) BETWEEN 7 AND 9 OR test IS NOT NULL

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply