Interesting pattern-matching problem

  • I'm creating a billing system for an international carrier's telephone switch. Each call that goes through the switch is logged in SQL server, in a table called DSC_Archive that includes the following fields, among others:

    - OrigGrp (the ID of the customer initiating the call - i.e. BellSouth, Qwest, etc.)

    - DialedNo (the number dialed)

    - OrigDate (the date the call started)

    - CallDur (the length of the call in minutes)

    - RateID (the ID of the rate at which the call should be billed)

    I also have a table called Rates, which holds the per-minute price charged each customer for each dialing area.

    - RateID

    - OrigGrp (the ID of the customer)

    - Prefix (the first few digits of the dialed number, which identifies the dialing area)

    - RegionName (the name of the dialing area)

    What makes this problem challenging is that the dialing areas aren't mutually exclusive. For instance, you have these rates for a given customer:

    
    
    RateID Prefix RegionName Rate
    ------ ------ ------------ ------
    1727 62 Indonesia .17250
    1728 6221 Indonesia - Jakarta .02870
    1729 628 Indonesia - Mobile .12420

    So if the number dialed is 628149234, the call is billed as "Indonesia - Mobile", while if the number dialed is 62595923, the call is billed as "Indonesia". The rule is that the _longest_ matching prefix determines the rate.

    As the data is added, each call is tagged with the correct RateID. However, sometimes the Rates table is updated after the calls have been logged, and I need to go through the entire DSC_Archive table (which currently has 12 million rows, for three months' worth of practice data) and make sure the rates are correct.

    Here's the code I use right now to go through DSC_Archive and assign the correct RateID to each call:

     
    
    UPDATE DSC_Archive
    SET
    RateID=
    (
    SELECT TOP 1 r.RateID
    FROM Rates r
    WHERE d.DialedNo LIKE r.Prefix+'%' // match prefix
    ANDd.OrigGrp = r.OrigGrp // match customer
    ORDER BY LEN(r.Prefix) DESC // longest matching prefix wins
    )
    FROM DSC_Archive d

    This works, but it takes hours with only three month's worth of data. Is there a more efficient way to do this?

    Thanks

    Herb

    Edited by - hcaudill on 07/12/2002 12:28:46 PM

    Edited by - hcaudill on 07/12/2002 12:30:39 PM

  • I couldn't get the query to work so I don't have proof but my guess is that the Order By in your correlated subselect is killing you.

    You don't state it but from your example my guess is that there are only 3 possible lengths for the prefixes. How about running the following queries:

    UPDATE DSC_Archive

    SET RateID = r.RateID

    FROM Rates r INNER JOIN DSC_Archive d

    ON r.Prefix = SUBSTRING(d.DialedNo,1,2)

    UPDATE DSC_Archive

    SET RateID = r.RateID

    FROM Rates r INNER JOIN DSC_Archive d

    ON r.Prefix = SUBSTRING(d.DialedNo,1,3)

    UPDATE DSC_Archive

    SET RateID = r.RateID

    FROM Rates r INNER JOIN DSC_Archive d

    ON r.Prefix = SUBSTRING(d.DialedNo,1,4)

    The order is important and I would run them as one transaction. The solution isn't elegant and I haven't tried it with a large dataset but the joins are simple enough.

  • I believe I helped with this same thing previously. Can you run the following so I can get an idea of what the Query Manager is doing. Run

    SET SHOWPLAN_TEXT ON

    GO

    YourQueryRunsHere

    GO

    And paste the resulting output here. May can see a way to speed things along.

    Also it may help to play with the Rates table by adding a length column and put a trigger on it for inserts and updates to keep the length accurate. Then place a clustered index key on that column in desc order so you can reduce the order time. (This may not help as I cannot test for sure, but it might).

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • First the short term fix:

    A helper view to make things easier

    CREATE VIEW HelperView

    AS

    SELECT

    Prefix,

    DialedNo,

    RateID,

    CHARINDEX(REVERSE(Prefix), REVERSE(DialedNo)) AS PrefixPositionFromRight

    FROM DSC_Archive a, Rates b

    WHERE a.DialedNo LIKE b.Prefix + '%'

    Finally, to get the data we want:

    SELECT *

    FROM HelperView a

    WHERE

    PrefixPositionFromRight = (SELECT MIN(PrefixPositionFromRight)

    FROM HelperView b

    WHERE a.DialedNo = b.DialedNo)

    Since all the data you need is already in this query, I suggest removing/not using the RateID column in the DSC_Archive table. You can wrap this SELECT statement as another view so that other tools and users can use it without knowing how its implemented

    The medium term fix:

    Assign the value of RateID in DSC_Archive as calls are placed. Its likely that the rate should be determined when the call is placed and not some time after. You can "precompute" the rate so that your queries will effectively handle only a small subset of data at a time avoiding massive, long running UPDATE queries every month or so.

  • I rewrote the SELECT statement I originally posted as it uses a correlated subquery and we all know correlated subqueries are almost always slower than an equivalent query implemented thru joins. Although the queries are equivalent, the query optimizer is not smart enough to know and rewrite the query itself.

    The first version takes a little more than a minute. Any interactive scenario that takes more than a minute is too slow for me =). This on the computer Im working on as I browse the web in the foreground with the query running in the background for 1,000,000 rows on the table with the DialedNo column.

    This is the rewritten version:

    SELECT a.DialedNo, a.Prefix, a.PrefixPositionFromRight

    FROM HelperView a, HelperView b

    WHERE

    a.DialedNo = b.DialedNo

    GROUP BY

    a.DialedNo, a.Prefix, a.PrefixPositionFromRight

    HAVING

    a.PrefixPositionFromRight = MIN(b.PrefixPositionFromRight)

    This version takes only 15 seconds!!!!! I guess that should be fast enough for most purposes =)

  • I rewrote the SELECT statement I originally posted as it uses a correlated subquery and we all know correlated subqueries are almost always slower than an equivalent query implemented thru joins. Although the queries are equivalent, the query optimizer is not smart enough to know and rewrite the query itself.

    The first version takes a little more than a minute. Any interactive scenario that takes more than a minute is too slow for me =). This on the computer Im working on as I browse the web in the foreground with the query running in the background for 1,000,000 rows on the table with the DialedNo column.

    This is the rewritten version:

    SELECT a.DialedNo, a.Prefix, a.PrefixPositionFromRight

    FROM HelperView a, HelperView b

    WHERE

    a.DialedNo = b.DialedNo

    GROUP BY

    a.DialedNo, a.Prefix, a.PrefixPositionFromRight

    HAVING

    a.PrefixPositionFromRight = MIN(b.PrefixPositionFromRight)

    This version takes only 15 seconds!!!!! I guess that should be fast enough for most purposes =)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply