Indexing question

Question

Post reply

Indexing question

mrpolecat

SSCertifiable

Points: 7014
More actions
October 2, 2007 at 1:43 pm

#182125

I have a single column table with 140 Million rows. The single column is the Primary Key. This table is joined to others to do updates. Would the updates benefit from applying an index to this table?

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110208 More actions · Answer 1

Hard to believe, but PK is an index.

_____________
Code for TallyGenerator

mrpolecat SSCertifiable Points: 7014 More actions · Answer 2

I thought that was the case but originally the table had no index or PK. I added the PK and it did nothing to speed up my query. The table is truncated and bulkloaded each week from a text file and the PK slowed the process and was removed. Bad form, I know, but the benefit was there.

Sergiy SSC Guru Points: 110208 More actions · Answer 3

Index is not a magic wand.

Sometimes quality of design and queries is important as well.

😎

_____________
Code for TallyGenerator

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 4

One more thing to consider here, what is the cost of creating the index and what is the cost of not having in for the period between reloads.

As Sergiy said, there's something to be said for having the right design. But that is a call you'll have to make because we don't have enough info to make that call for ourselves.

Sergiy SSC Guru Points: 110208 More actions · Answer 5

One thing I can say for sure:

if the system requires 140 million rows downloads every week just to check if there are any changes then it was designed mindlessly.

_____________
Code for TallyGenerator

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

Agreed, but we don't have enough details to call bullshit... yet.

mrpolecat SSCertifiable Points: 7014 More actions · Answer 7

The cost to have it in during the import is about an hour of extra load time I am told. I haven't seen this for myself. I will test it when I can. The cost to build the PK was about 40 Minutes. The update queries that use it take about 12 minutes regardless of the PK. I have tested writing the query few different ways.

1----------------------

select uid2,invalidflag from tb_leadbox

WHERE (HomePhone IN

(SELECT dbo.tb_LeadBox.Homephone

FROM dbo.tb_DNCNAT INNER JOIN

dbo.tb_LeadBox ON dbo.tb_DNCNAT.DNC_PHONE = dbo.tb_LeadBox.HomePhone and invalidflag = 0))

and invalidflag = 0 and sourcecode not in (SELECT SourceCode

FROM tb_LB_SourceCode

WHERE (Nat_DNC = 1))

2------------------------

select uid2,invalidflag from tb_leadbox

WHERE (HomePhone IN

(SELECT DNC_PHONE

FROM dbo.tb_DNCNAT ))

and invalidflag in (0,7) and sourcecode not in (SELECT SourceCode

FROM tb_LB_SourceCode

WHERE (Nat_DNC = 1))

3-----------------------

select uid2,invalidflag from tb_leadbox l

join tb_dncnat on homephone = dnc_phone

join tb_lb_sourcecode s on l.sourcecode = s.sourcecode

where (nat_dnc <> 1 or nat_dnc is null) and invalidflag in (0,7)

tables------------------------

tb_leadbox has indexes on uid2(PK), invalidflag, and homephone. 2M records

lb_sourcecode has indexes on sourccode(PK)

150 Records

mrpolecat SSCertifiable Points: 7014 More actions · Answer 8

Ninja's_RGR'us (10/2/2007)
Agreed, but we don't have enough details to call bullshit... yet.

It is the National Do Not Call List designeed by a Federal Agency that we are forced to use and keep current. Call it what you will.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 9

Indexes are not really usefull when you need to scan the whole table like that. Since you report no performance inprovement, I'd just drop the matter.

The real question is can you get a list of only modifications? (new numbers on the list and numbers removed from the list). That would considerably cut down on the time it takes to run this process...

Maybe you could even write your own... maybe it's less expansive to extract the changes from 2 different imports and then update the rest of the data than doing the 3 queries you run now. I'm just throwing this idea out there, I've never had to deal with that much data in a scenario like that.

mrpolecat SSCertifiable Points: 7014 More actions · Answer 10

Unfortunately the list only comes one way, in bulk. Since there are updates and adds in both the DNC and the customer table it was easier to just compare the two against each other rather than only look for adds and modifies on both sides. Note I said easier, not necessarily smarter. I had a huge breakthrough from an earlier post so now I am reviewing every step of the process to see what else can be done. Also it pains me to have a table with no PK.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 11

I've never seen this file, can you clear something up for me?

You have a DNC list, hence (in my head anyways), you have a list of # to NOT call. So the numbers can only be added and removed from the list (even if they are updated, how the heck can you track that with only 1 column).

Am I missing something obvious from the info you gave me, or is there something else I don't know.

If I'm not missing anything, it should be pretty easy to only track differences :

List of deletes = Exists in old but not in new

List of inserts = Exists in new but not in old

This could easily be done with a full join and track what's missing. You'd scan each table only once. I don't see how much faster than this you can get!!!

Now I'm assuming there's a catch for the matching to your list, but I can only presume at this point.

noeld SSC Guru Points: 96590 More actions · Answer 12

Another trick for the large import is to set the DB in BULK_LOGGED mode before the import and if possible run bcp/bulkinsert in parallel (if at all possible)

I am also confused by a "single column" table "design" 😉

* Noel

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 13

How does this bulk load work? Is this only a parameter you set or do you have to manually split the data file and call imports in parallel?

noeld SSC Guru Points: 96590 More actions · Answer 14

Ninja's_RGR'us (10/2/2007)
How does this bulk load work? Is this only a parameter you set or do you have to manually split the data file and call imports in parallel?

instead of giving you the "short" answer here you can have the full story

* Noel