Keeping a distinct list of inserted values ....

Question

Keeping a distinct list of inserted values ....

jackimo

SSCommitted

Points: 1885
More actions
March 18, 2008 at 6:29 am

#184394

I am trying to keep a distinct list of inserted values from a table. For example I have a table that lists 'customers' on any given day (around 7000 each day). I have a web app the lists distinct customer last names to query on. I am thinking that maybe an insert trigger that fires during an insert into the customer table and checks for that customer name in the distinct table and if its not found add it? Does this sound reasonable? Will the trigger slow down the insert into the master table? Does anyone have a sample create trigger script that would help me along?
Thanks everyone for their comments and suggestions!:D

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Kent Waldrop Hall of Fame Points: 3064 More actions · Answer 1

7000 is not a large number; it seems to me that it might be more appropriate to have your select list just fetch the distinct records rather than take the overhead of a trigger to maintain this. What are you reasons for wanting to materialize this data?

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 2

Take a look at the OUTPUT clause as a mechanism to capture the data that is inserted as it gets inserted. This is much more effecient, as well as easier to maintain and more transparent, than using a trigger.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

jackimo SSCommitted Points: 1885 More actions · Answer 3

That's 7000 records added daily - over time I would prefer not to query hundreds of thousands of records to get the distinct list.

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 4

Crud. I completely misread your question. Disregard my last post. I'm an idiot.

Why can't you simply do a join against your table? Especially since we're talking about a distinct list, so it should have a very high selectivity, you should be able to write a query that runs just fine. You'd only be looking at two million rows a year. That's not that big. 20 million in ten years? And that assumes that every name entered is unique. What happens when you start getting duplicate hits? Less data. So you don't need a trigger, you just need a join on the insert such that it doesn't allow an insert where the name exists. Doing a join (not an individual look up) will make it set based... Should be fine.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

jackimo SSCommitted Points: 1885 More actions · Answer 5

Thanks for the reply Grant,

I guess I need some clarification. I am using a stored procedure insert to insert records into a table - the primary key is not the customer name but another identifier not related to the customer. I call the stored procedure each time I read a record out of a log file. Could you give me an example? I could simply query the distinct table and insert the customer name if it does not exist but I think that would be too slow.

Here is my stored procedure (if it helps any) - I cut it down to make it easily to read.

CREATE proc [dbo].[MessageContentInsert]

@messageid varchar(50),

@customer varchar(40),

@city varchar(40),

@state varchar(40),

as

insert into MessageContent(messageid,customer,city,state)

values (@messageid,@customer,@city,@state)

Thanks again!

Kent Waldrop Hall of Fame Points: 3064 More actions · Answer 6

Just make sure you have an index that has customer Name first and insert using a WHERE NOT EXISTS clause on customer name -- should be very fast.

Edit:

Something else you might want to think about is the row-by-row process used for inserting this data. Transact SQL works far more efficiently with sets of data than it does with row-by-row processes. It might be worthwhile to change your process to take advantage of set-based processing rather than record-based processing.

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 7

I tried it two ways. I set up a table with a million rows for the tests.

This way works very well, with a nice index seek and only four reads and runs in 1ms, but only works off of one name at a time.

DECLARE @test-2 NVARCHAR(75)

SET @test-2 = 'aacfjipbavrosganqinf'

IF NOT EXISTS ( SELECT Customername

FROM namelist

WHERE customername = @test-2 )

INSERT INTO namelog ( LogName )

VALUES ( @test-2 )

Here's an example where you pass in all the list of names as a set, using XML but you could supply the set in some other fashion, and then uses a join to eliminate the list. This one takes a lot longer, 12ms. It still does a very nice index seek and nested loop on the XML data. Parsing the XML data takes the most time in the query.

(substitute an angled bracked for the square brackets)

DECLARE @test-2 XML

SET @test-2 = '[root]

[customername name="aacfjipbavrosganqinf"][/customername]

[customername name="54321" ][/customername]

[/root]'

INSERT INTO namelog ( LogName )

SELECT l.customername

FROM ( SELECT Ord.value('(@name)', 'nvarchar(75)') AS customername

FROM @test.nodes('/root/customername') AS x ( Ord )

) AS l

LEFT JOIN namelist n

ON n.customername = l.customername

WHERE n.customername IS NULL

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

jackimo SSCommitted Points: 1885 More actions · Answer 8

Thanks Grant!

Your solution is great. Thanks for sharing!

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 9

Having this in the insert trigger shouldn't be too bad.

Create trigger trg_Name

on Customers after insert as

...

insert into DistinctTable(Name)

select i.Name

from inserted i

left join DistinctTable d

on i.Name = d.Name

where d.Name is null;

...

Only Name values not already existing in DistinctTable are inserted. Assuming the Name column in DistinctTable is indexed, the performance should be good. You would only use a Distinct query against the large Customers table after first creating DistinctTable. From then on, the trigger will maintain it with very little performance hit.

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 10

Tomm Carr (3/19/2008)
Only Name values not already existing in DistinctTable are inserted. Assuming the Name column in DistinctTable is indexed, the performance should be good. You would only use a Distinct query against the large Customers table after first creating DistinctTable. From then on, the trigger will maintain it with very little performance hit.

Sorry to disagree - but there's no guarantee that the batch isn't carrrying duplicate names, so the trigger would have to insert DISTINCT names.

Another way would be to use EXISTS

Create trigger trg_Name

on Customers after insert as

...

insert into DistinctTable(Name)

select distinct i.Name

from inserted i

where not EXISTS (

select null

from DistinctTable d

where d.name=i.name

);

...

Same idea - just seems to run a little bit faster than the IN clause.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 11

Oops. One of those oversights that could go a really long time before manifesting itself. :blush:

However, I still like the join better. It's simple and easy to maintain -- and the optimizer should turn both ways into the same ex-plan.

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms