Selecting a range from mixed alpha-numeric column

  • Hello all, My company has me working on a system to look up parts between a range of numbers. The problem is that some of the part numbers have alpha and special characters in them. An example is:

    123-1204a

    Convert will not work in the case. I wrote a proc that removes all the non numeric characters but it runs way to slowly with nearly 150000 parts to be searched. I need them to be able to put in a range of say 1000 - 2000 and get all of the parts that fall within that range including the parts with alpha and special characters in them.

    Thanks for any help.

  • You should be able to do range searches on strings.

    create table RangeProof (

    ID varchar(25) primary key)

    go

    insert into dbo.rangeproof(id)

    select '123-1234a' union all

    select '124-1234a' union all

    select '125-1234a' union all

    select '123-1234b' union all

    select '123-1234c' union all

    select '123-1234d' union all

    select '123-1234e' union all

    select '123-1234f' union all

    select '122-1234a' union all

    select 'a23-1234a' union all

    select 'b23-1234a' union all

    select '12a-1234a' union all

    select '12a-1234b' union all

    select '12b-1234a' union all

    select '12b-1234b' union all

    select '12b-1234c'

    go

    select *

    from dbo.rangeproof

    where id between '123' and '123-1234b'

    I tested it with a few ranges of codes, and it worked.

    Or is there some other part of the problem that I'm not seeing?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the reply... The problem is that say you have a part number of 10001b and you select a range from 1000-2000 it will include the 10001b in the range and it should not be included because the first part of the number is greater than 2000...

    any other ideas?

  • biscuit (3/14/2008)


    An example is:

    123-1204a

    I'm curious, for the part number example you give above, what range would you input to check for this number, theoretically speaking? (I'm asking to understand your problem).

    (edit...fixed tags)

    If it was easy, everybody would be doing it!;)

  • That number would be between 100 and 200. basically they want to disregard anything after the first non numeric character and consider that to be the part number. If I use the between clause then the number would be returned but a lot of other numbers such as 110000b would also be returned.

    Here's an example

    I did a query with a range of 1000 - 2000. It returned the records between 1000 and 2000 but it also returned 200-13726-2 which if you disregard the numbers after the dash is only 200.

  • I'm working on a solution using 'like' in the where...I don't use that much so it is trial and error. Perhaps someone else can chime in here.

    where partnumber like '[1][0-9][0-9][_]%'

    ...I'm messing with this now...but feel free to give it a try too! 😀

    If it was easy, everybody would be doing it!;)

  • Using G's data...

    select *

    from rangeproof

    where id like '[1][0-9][0-9][-]%'

    ...this query pulled back all the ids that were of the format '1##-'

    If you want to check for a character in a position use '[a-z]'

    You can adjust the 'like' for whatever format you need to find. Give it a try against your data and see if this has potential.

    Hope this helped!

    If it was easy, everybody would be doing it!;)

  • Thanks for the reply. Unfortunately, the data is not uniform enough. one part number might be 100-1023a and another would be 1000000-10b and still another on might be 1030. I can't think of a like clause that could cover all of these circumstances.

  • If I had this issue, the first avenue I would explore is whether I could order the partcodes in the order I thought was correct. If so, your best bet is to get a row number on there, figure out the row numbers of the lower and upper bounds of your search, and filter that way. Possibly you have other data (like the division the parts are used by, where they're created, something) to help you order them.

    If you don't, are there reasonable limits on how many places the initial number can be? I.e. I'm assuming it's at least 1 to 5 places (1a-123 -> 12345a-123), but any more than that? Is the part code generator reasonably static?

    Here's where I'm going:

    Let's say the range above is 1-5, and it's pretty static:

    create table #PartCodeTable ( partcode varchar(10) )

    --all the single-number partcodes:

    select

    case

    when isnumeric(left(partcode, 5)) = 1 then 5

    when isnumeric(left(partcode, 4)) = 1 then 4

    when isnumeric(left(partcode, 3)) = 1 then 3

    when isnumeric(left(partcode, 2)) = 1 then 2

    when isnumeric(left(partcode, 1)) = 1 then 1

    end as code_numlength,

    partcode

    from #PartCodeTable

    This would segment the parts into different buckets. You could then generate row numbers, ordering by code_numlength, then partcode. I confess, I don't know how fast isnumeric() is, but it's worked well for me in the past. This is more of a brute-force method than I'd like, but there you are...

  • thanks guys for all of the suggestions. Since the part numbers are created by humans, there is no set format or any kind of normalization. I've pasted the proc that I wrote below in hopes that maybe someone can improve it to speed things up.

    ALTER PROCEDURE [dbo].[PC_ViewRange]

    (

    @start int ,

    @end int

    )

    AS

    CREATE TABLE #output (idINT,

    partidVARCHAR(20),

    partdescVARCHAR(250),

    reservedBIT,

    reservedby VARCHAR(20),

    reserveddate VARCHAR(10),

    insystemBIT,

    locationVARCHAR(20),

    econumberVARCHAR(10)

    )

    DECLARE CUR CURSOR FOR

    SELECT a.id, a.partid, a.partdesc, b.reserved, b.reservedby, b.reserveddate, b.insystem, b.location, b.econumber

    FROM systemparts as a left outer join systempartsdetails as b

    on a.id = b.partid

    ORDER BY a.partid

    OPEN CUR

    DECLARE @idINT,

    @partidVARCHAR(20),

    @partdescVARCHAR(250),

    @reservedBIT,

    @reservedbyVARCHAR(20),

    @reservedDateVARCHAR(10),

    @insystemBIT,

    @locationVARCHAR(20),

    @econumberVARCHAR(10),

    @convertVARCHAR(100),

    @convert2VARCHAR(100),

    @posINT

    FETCH NEXT FROM CUR INTO @id,@partid, @partdesc, @reserved, @reservedBy, @reserveddate, @insystem, @location, @ecoNumber

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @convert2 = ''

    if (CHARINDEX('-', @partid) > 0)

    SET @convert = substring(@partID,0,CHARINDEX('-',@partid))

    ELSE

    SET @convert = @partid

    SET @pos = 1

    WHILE (@pos <= LEN(@convert))

    BEGIN

    IF ASCII(SUBSTRING(@convert,@pos,1)) BETWEEN 48 AND 57

    SET @convert2 = @convert2 + SUBSTRING(@convert,@pos,1)

    ELSE

    BREAK

    SET @pos=@pos+1

    END

    IF (CONVERT(FLOAT,@convert2) BETWEEN @start AND @end)

    INSERT INTO #output VALUES (@id, @partid, @partdesc, @reserved, @reservedby,@reserveddate, @insystem, @location, @econumber)

    FETCH NEXT FROM CUR INTO @id,@partid, @partdesc, @reserved, @reservedBy, @reserveddate, @insystem, @location, @ecoNumber

    END

    CLOSE CUR

    DEALLOCATE CUR

    SELEct * FROM #output

    DROP TABLE #output

  • biscuit (3/14/2008)


    Thanks for the reply. Unfortunately, the data is not uniform enough. one part number might be 100-1023a and another would be 1000000-10b and still another on might be 1030. I can't think of a like clause that could cover all of these circumstances.

    That is why I asked what range your sample number would be in.

    So you want your range to be 100 - 1000000 (or whatever is needed), and include all the other characters too? That is different than what you implied. :hehe:

    If it was easy, everybody would be doing it!;)

  • The range would not generally be that big but the maximum number for a part is 20 characters. I just want to disregard anything beyond the numeric portion of the left of the string and use that as a comparison to see if it is in range. I hope that makes sense.

    Thanks

  • OK, I see what you are trying to do now...(that helps, right?)

    I have some ideas using string functions...I'll cobble some things together as I have time to see what I can come up with.

    By the way, I can see why your current process is so slow...that is definitely doing it 'THE HARD WAY'.

    If it was easy, everybody would be doing it!;)

  • One other thing I'm thinking of, you can always add a new column to the table that contains only the numeric portion of the part number. You will have to populate it for all the existing records once, then have your 'insert' routine do the same kind of parsing so that you populate it for each new part number as it is added.

    That way you can do your searches on that column...much faster, especially if you do these part number searches often (and it sounds like you will be).

    If it was easy, everybody would be doing it!;)

  • That's a good idea. I need to check all the procs that access that table to see if they will be broken by a new column (I didn't develop them)...

    Thanks again.

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply