October 2, 2007 at 1:43 pm
October 2, 2007 at 2:08 pm
Hard to believe, but PK is an index.
_____________
Code for TallyGenerator
October 2, 2007 at 2:23 pm
October 2, 2007 at 2:39 pm
Index is not a magic wand.
Sometimes quality of design and queries is important as well.
😎
_____________
Code for TallyGenerator
October 2, 2007 at 2:48 pm
One more thing to consider here, what is the cost of creating the index and what is the cost of not having in for the period between reloads.
As Sergiy said, there's something to be said for having the right design. But that is a call you'll have to make because we don't have enough info to make that call for ourselves.
October 2, 2007 at 2:55 pm
One thing I can say for sure:
if the system requires 140 million rows downloads every week just to check if there are any changes then it was designed mindlessly.
_____________
Code for TallyGenerator
October 2, 2007 at 3:00 pm
Agreed, but we don't have enough details to call bullshit... yet.
October 2, 2007 at 3:05 pm
The cost to have it in during the import is about an hour of extra load time I am told. I haven't seen this for myself. I will test it when I can. The cost to build the PK was about 40 Minutes. The update queries that use it take about 12 minutes regardless of the PK. I have tested writing the query few different ways.
1----------------------
select uid2,invalidflag from tb_leadbox
WHERE (HomePhone IN
(SELECT dbo.tb_LeadBox.Homephone
FROM dbo.tb_DNCNAT INNER JOIN
dbo.tb_LeadBox ON dbo.tb_DNCNAT.DNC_PHONE = dbo.tb_LeadBox.HomePhone and invalidflag = 0))
and invalidflag = 0 and sourcecode not in (SELECT SourceCode
FROM tb_LB_SourceCode
WHERE (Nat_DNC = 1))
2------------------------
select uid2,invalidflag from tb_leadbox
WHERE (HomePhone IN
(SELECT DNC_PHONE
FROM dbo.tb_DNCNAT ))
and invalidflag in (0,7) and sourcecode not in (SELECT SourceCode
FROM tb_LB_SourceCode
WHERE (Nat_DNC = 1))
3-----------------------
select uid2,invalidflag from tb_leadbox l
join tb_dncnat on homephone = dnc_phone
join tb_lb_sourcecode s on l.sourcecode = s.sourcecode
where (nat_dnc <> 1 or nat_dnc is null) and invalidflag in (0,7)
tables------------------------
tb_leadbox has indexes on uid2(PK), invalidflag, and homephone. 2M records
lb_sourcecode has indexes on sourccode(PK)
150 Records
October 2, 2007 at 3:06 pm
October 2, 2007 at 3:20 pm
Indexes are not really usefull when you need to scan the whole table like that. Since you report no performance inprovement, I'd just drop the matter.
The real question is can you get a list of only modifications? (new numbers on the list and numbers removed from the list). That would considerably cut down on the time it takes to run this process...
Maybe you could even write your own... maybe it's less expansive to extract the changes from 2 different imports and then update the rest of the data than doing the 3 queries you run now. I'm just throwing this idea out there, I've never had to deal with that much data in a scenario like that.
October 2, 2007 at 3:38 pm
Unfortunately the list only comes one way, in bulk. Since there are updates and adds in both the DNC and the customer table it was easier to just compare the two against each other rather than only look for adds and modifies on both sides. Note I said easier, not necessarily smarter. I had a huge breakthrough from an earlier post so now I am reviewing every step of the process to see what else can be done. Also it pains me to have a table with no PK.
October 2, 2007 at 3:50 pm
I've never seen this file, can you clear something up for me?
You have a DNC list, hence (in my head anyways), you have a list of # to NOT call. So the numbers can only be added and removed from the list (even if they are updated, how the heck can you track that with only 1 column).
Am I missing something obvious from the info you gave me, or is there something else I don't know.
If I'm not missing anything, it should be pretty easy to only track differences :
List of deletes = Exists in old but not in new
List of inserts = Exists in new but not in old
This could easily be done with a full join and track what's missing. You'd scan each table only once. I don't see how much faster than this you can get!!!
Now I'm assuming there's a catch for the matching to your list, but I can only presume at this point.
October 2, 2007 at 4:07 pm
Another trick for the large import is to set the DB in BULK_LOGGED mode before the import and if possible run bcp/bulkinsert in parallel (if at all possible)
I am also confused by a "single column" table "design" 😉
* Noel
October 2, 2007 at 4:21 pm
How does this bulk load work? Is this only a parameter you set or do you have to manually split the data file and call imports in parallel?
Viewing 15 posts - 1 through 15 (of 27 total)
You must be logged in to reply to this topic. Login to reply