How to Normalize many columns of CSV (mostly) values. Example xls included

  • If you check out my example for the real data that I am trying to normalize. This is truncated, there are actually 60 rows and I included 16. Data types should be clear when you look, most fields are varchar(max) but there are a few int.

    What I am currently doing is:

    For the 'SALNUM' column, I have a table with every number that is in use. I have tried two different methods to do a comparrison with a cross join

    select * from this_table tt cross join all_salnum_table ast cross join all_PC_table apt

    where charindex(ast.salnum,tt.SALNUM) > 0 --(method 1) very very slow

    where tt.SALNUM like '%' + ast.salnum + '%' --(method 2) slow but not as slow as method 1

    I have to do this for every line, every evaluated column.

    additionally, some of these can be blank which I handle in one of two ways. If the column name has an 'E' at the end, then it is an exclusion and it means exclude none, if it has an 'I' it is an include and a blank means 'ALL'

    so I add to me where clause | tt.PCI /*when 'E' then NOT*/ like '%' + apt.PCI + '%' OR tt.PCI = ''

    The problem that I run into here is that the SALNUM column can have 200 items in the list. so for the first 4 rows of counter/Counter1 (which has 109 items) I get 4*109.

    Thats isnt too bad.

    now the PCI column usually has 1 or 2, so 4*109*2. THe PCE column frequently has 8-16 so 4*109*2*16

    GASFLGI and GASFLGE are ints so 4*109*2*16*2*2 and so on.

    There are two things that I think are killing me. Either the massive line count I get from running this without the where statements, or that the were statements are really slow. At this point I'm not sure if using UNPIVOT (and the XML trick for csv items in the column value) will help and if so how to actually pull it off. or maybe there is a better solution.

    Thanks for any help in advance

  • not sure this will help, or if its too lightweight, or if i completey missed the point, but i think this may be worth a look for you.

    http://www.amazon.com/Excel-2007-PivotTables-Recipes-Problem-Solution/dp/1590599209/ref=sr_1_7?s=books&ie=UTF8&qid=1300460093&sr=1-7#reader_1590599209

  • I actually need to UNPIVOT the CSV values instead of PIVOT them. The real difficulty I am having is the where clauses right now

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply