Extracting Numeric values from addresses

Question

Extracting Numeric values from addresses

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Rudyx - the Doctor SSC-Forever Points: 43695 More actions · Answer 1

Oh well ... I guess that I'm one of the few surviving 'dinosaurs' left ... this tired mind remembers seeing first run epsiodes weekly in black and white !

RegardsRudy KomacsarSenior Database Administrator"Ave Caesar! - Morituri te salutamus."

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 2

Kenneth,

Does your Tally table have a Primary Key?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 3

The reason I ask is because I DO have a Clustered Primary Key on the N column of my Tally table and here's the results I got from the 3 queries you posted on a million row test on my desktop box... the Tally table came out the clear winner over repeated tests when the Clustered Primary Key was available... after removing the CPK, the Tally table was clearly the looser and I suspect that's the cause of your surprising results.

Parse1 - 01:02 - Tally or Nums table (WITH Clustered Primary Key)

Parse1 - 10:00+- Tally or Nums table (WITHOUT any key or index and stopped after 10 minutes)

Parse2 - 01:24 - While loop with replace

Parse3 - 01:28 - While loop with stuff

That notwithstanding, I still gotta say I admire the very clever code Ryan came up with especially since it does not require a Tally table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 4

Hmm... yes, there is. Updated stats on it just for kicks, but still the same results.

However, there may very well be some skewing going on, since I now noticed that all three variants spike the cpu to 100% when executed. This is all done on a box with very scarce resources..

Though the trend is still the same, the tally variation takes 4 seconds on the smaller set, while both the others finish in just 1 second.

Perhaps columnwidths play some role.. For my runs, I have a source table with a single varchar(35) null column containing adresslines, in total 3.7 million rows, with a clustered index.

The destination table is also a single column varchar(35) not null with a clustered index.

The tally table also single column int not null a clusterd PK and containing 8000 rows.

Apart from that, tests are done in a 'suspect' environment, since my desktop has all kinds of other stuff opened (Excel, Notes - the works..)

Probably a more fair test would be on some sort of 'server-ish' hardware...

Nonetheless, it's interesting

/Kenneth

RyanRandall SSChampion Points: 13623 More actions · Answer 5

Ah - thanks Jeff!

So it's looking like the answer to which is fastest is 'it depends'. So maybe I do have to test it in my environment after all

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 6

Yep... I'd test it... "it depends" and the age old adage of "one measurement is worth a thousand speculations" still holds true. It's especially true in the world of multi-processors...

Kenneth,

Thank you for the feedback. Yeah, I noticed that all of the examples pin the CPU right to the ceiling on single processor boxes and not much else can occur when that happens. I'll have to give the 3 methods a whirl on an 8-by server and see what happens...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 7

Sorry Kenneth... didn't exactly return the favor.

My Tally table is 9999 rows, single "n" column is INT with a Clustered PK.

The million row test table is made up of 6 different columns including an indexed date column, an identity column (nonclustered primary key), and the column under test is VARCHAR(35) and populated consistently with '1300 $t,e.s-e+ter 234'.

The box is a 1.8 Ghz single processor box with 2gb ram and IDE drives. Only QA was open during the test and the box is not networked.

This could be one of those things that is more data sensitive than box sensitive...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)