Interesting pattern-matching problem

Question

Interesting pattern-matching problem

hcaudill

Newbie

Points: 7
More actions
July 12, 2002 at 12:25 pm

#101195
I'm creating a billing system for an international carrier's telephone switch. Each call that goes through the switch is logged in SQL server, in a table called DSC_Archive that includes the following fields, among others:
- OrigGrp (the ID of the customer initiating the call - i.e. BellSouth, Qwest, etc.)
- DialedNo (the number dialed)
- OrigDate (the date the call started)
- CallDur (the length of the call in minutes)
- RateID (the ID of the rate at which the call should be billed)
I also have a table called Rates, which holds the per-minute price charged each customer for each dialing area.
- RateID
- OrigGrp (the ID of the customer)
- Prefix (the first few digits of the dialed number, which identifies the dialing area)
- RegionName (the name of the dialing area)
What makes this problem challenging is that the dialing areas aren't mutually exclusive. For instance, you have these rates for a given customer:
```
RateID   Prefix    RegionName                  Rate

------   ------    ------------                ------

1727     62        Indonesia                   .17250

1728     6221      Indonesia - Jakarta         .02870

1729     628       Indonesia - Mobile          .12420
```
So if the number dialed is 628149234, the call is billed as "Indonesia - Mobile", while if the number dialed is 62595923, the call is billed as "Indonesia". The rule is that the _longest_ matching prefix determines the rate.
As the data is added, each call is tagged with the correct RateID. However, sometimes the Rates table is updated after the calls have been logged, and I need to go through the entire DSC_Archive table (which currently has 12 million rows, for three months' worth of practice data) and make sure the rates are correct.
Here's the code I use right now to go through DSC_Archive and assign the correct RateID to each call:
```
 

UPDATE DSC_Archive 

SET

RateID=

(

SELECT TOP 1 r.RateID

FROM Rates r

WHERE d.DialedNo LIKE r.Prefix+'%'  // match prefix

ANDd.OrigGrp = r.OrigGrp        // match customer

ORDER BY LEN(r.Prefix) DESC         // longest matching prefix wins

)

FROM DSC_Archive d
```
This works, but it takes hours with only three month's worth of data. Is there a more efficient way to do this?
Thanks
Herb
Edited by - hcaudill on 07/12/2002 12:28:46 PM
Edited by - hcaudill on 07/12/2002 12:30:39 PM

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

siegfried.the@gmail.com SSC Journeyman Points: 75 More actions · Answer 1

I couldn't get the query to work so I don't have proof but my guess is that the Order By in your correlated subselect is killing you.

You don't state it but from your example my guess is that there are only 3 possible lengths for the prefixes. How about running the following queries:

UPDATE DSC_Archive

SET RateID = r.RateID

FROM Rates r INNER JOIN DSC_Archive d

ON r.Prefix = SUBSTRING(d.DialedNo,1,2)

UPDATE DSC_Archive

SET RateID = r.RateID

FROM Rates r INNER JOIN DSC_Archive d

ON r.Prefix = SUBSTRING(d.DialedNo,1,3)

UPDATE DSC_Archive

SET RateID = r.RateID

FROM Rates r INNER JOIN DSC_Archive d

ON r.Prefix = SUBSTRING(d.DialedNo,1,4)

The order is important and I would run them as one transaction. The solution isn't elegant and I haven't tried it with a large dataset but the joins are simple enough.

Antares686 SSC Guru Points: 125444 More actions · Answer 2

I believe I helped with this same thing previously. Can you run the following so I can get an idea of what the Query Manager is doing. Run

SET SHOWPLAN_TEXT ON

GO

YourQueryRunsHere

GO

And paste the resulting output here. May can see a way to speed things along.

Also it may help to play with the Rates table by adding a length column and put a trigger on it for inserts and updates to keep the length accurate. Then place a clustered index key on that column in desc order so you can reduce the order time. (This may not help as I cannot test for sure, but it might).

"Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

omen SSC Enthusiast Points: 198 More actions · Answer 3

First the short term fix:

A helper view to make things easier

CREATE VIEW HelperView

AS

SELECT

Prefix,

DialedNo,

RateID,

CHARINDEX(REVERSE(Prefix), REVERSE(DialedNo)) AS PrefixPositionFromRight

FROM DSC_Archive a, Rates b

WHERE a.DialedNo LIKE b.Prefix + '%'

Finally, to get the data we want:

SELECT *

FROM HelperView a

WHERE

PrefixPositionFromRight = (SELECT MIN(PrefixPositionFromRight)

FROM HelperView b

WHERE a.DialedNo = b.DialedNo)

Since all the data you need is already in this query, I suggest removing/not using the RateID column in the DSC_Archive table. You can wrap this SELECT statement as another view so that other tools and users can use it without knowing how its implemented

The medium term fix:

Assign the value of RateID in DSC_Archive as calls are placed. Its likely that the rate should be determined when the call is placed and not some time after. You can "precompute" the rate so that your queries will effectively handle only a small subset of data at a time avoiding massive, long running UPDATE queries every month or so.

omen SSC Enthusiast Points: 198 More actions · Answer 4

I rewrote the SELECT statement I originally posted as it uses a correlated subquery and we all know correlated subqueries are almost always slower than an equivalent query implemented thru joins. Although the queries are equivalent, the query optimizer is not smart enough to know and rewrite the query itself.

The first version takes a little more than a minute. Any interactive scenario that takes more than a minute is too slow for me =). This on the computer Im working on as I browse the web in the foreground with the query running in the background for 1,000,000 rows on the table with the DialedNo column.

This is the rewritten version:

SELECT a.DialedNo, a.Prefix, a.PrefixPositionFromRight

FROM HelperView a, HelperView b

WHERE

a.DialedNo = b.DialedNo

GROUP BY

a.DialedNo, a.Prefix, a.PrefixPositionFromRight

HAVING

a.PrefixPositionFromRight = MIN(b.PrefixPositionFromRight)

This version takes only 15 seconds!!!!! I guess that should be fast enough for most purposes =)

omen SSC Enthusiast Points: 198 More actions · Answer 5

I rewrote the SELECT statement I originally posted as it uses a correlated subquery and we all know correlated subqueries are almost always slower than an equivalent query implemented thru joins. Although the queries are equivalent, the query optimizer is not smart enough to know and rewrite the query itself.

The first version takes a little more than a minute. Any interactive scenario that takes more than a minute is too slow for me =). This on the computer Im working on as I browse the web in the foreground with the query running in the background for 1,000,000 rows on the table with the DialedNo column.

This is the rewritten version:

SELECT a.DialedNo, a.Prefix, a.PrefixPositionFromRight

FROM HelperView a, HelperView b

WHERE

a.DialedNo = b.DialedNo

GROUP BY

a.DialedNo, a.Prefix, a.PrefixPositionFromRight

HAVING

a.PrefixPositionFromRight = MIN(b.PrefixPositionFromRight)

This version takes only 15 seconds!!!!! I guess that should be fast enough for most purposes =)