December 7, 2006 at 1:35 am
We have 3 tables of customer transactions: donations, communications and sales, from which I want to find the earliest transaction. Generally these all have different fields though the following fields appear in all 3:
id (primary key)
transaction_date
communication_id
I need a neat way of getting the communication_id of the row with the earliest date, no matter which table it came from. Two of the tables have over a million records. Any two tables could have a transaction on the same day and each table could have more than one transaction on a day, but an arbitrary choice between these is acceptable.
Any ideas?
December 7, 2006 at 1:36 am
Sorry, this should have been entitled: 'Picking earliest transaction ...', but then it's still early in the day in London...
December 7, 2006 at 6:08 am
---DDL--
if
object_id('test_donations') is not null drop table test_donations
if
object_id('test_communications') is not null drop table test_communications
if
object_id('test_sales') is not null drop table test_sales
create table test_donations (id int identity, transaction_date smalldatetime, communication_id int)
create
table test_communications (id int identity, transaction_date smalldatetime, communication_id int)
create
table test_sales (id int identity, transaction_date smalldatetime, communication_id int)
---Populate Test Data--
insert test_donations
select
getdate()-1 , 15 union all
select
getdate()-2 , 14 union all
select
getdate()-3 , 13 union all
select
getdate()-4 , 12 union all
select
getdate()-520 , 11 ----Earliest date so query should return 11
insert
test_communications
select
getdate()-6 , 10 union all
select
getdate()-7 , 9 union all
select
getdate()-8 , 8 union all
select
getdate()-9 , 7 union all
select
getdate()-10 ,6
insert test_sales
select getdate()-11 , 5 union all
select
getdate()-12 , 4 union all
select
getdate()-13 , 3 union all
select
getdate()-14 , 2 union all
select
getdate()-15 , 1
--query returning communication id with earliest transaction date--
select
top 1 communication_id
from
(
select
transaction_date, communication_id
from
test_donations
union
all
select
transaction_date, communication_id
from
test_communications
union
all
select
transaction_date, communication_id
from
test_sales
)
x
order
by transaction_date asc
December 7, 2006 at 6:23 am
Yes, that certainly works. Thanks.
I now have to find the earliest communication_id for each customer: each table has a customer_id field. Will your code work with suitable modification? i.e. the output should contain two fields: customer_id and communication_id, one row per customer.
December 7, 2006 at 9:07 am
select min(transaction_date), customer_id
from
(
select
transaction_date, communication_id, customer_id
from
test_donations
union
all
select
transaction_date, communication_id, customer_id
from
test_communications
union
all
select
transaction_date, communication_id, customer_id
from
test_sales
)
x
group by customer_id
December 7, 2006 at 9:13 am
... but I need to have the communication_id of the earliest transaction for each customer!
December 7, 2006 at 9:20 am
Then how does Customer_ID relate to these tables. You told us he only common field names are
id (primary key)
transaction_date
communication_id
December 7, 2006 at 9:30 am
sorry, it IS a common field. I didn't originally think it was key, but in fact it is what makes the problem so tricky.
I'm sure this can be done using a cursor and stepping through customer by customer, e.g. using Jules' code. But can it be done on a dataset with just SQL?
December 7, 2006 at 10:07 am
---DDL--
if
object_id('test_donations') is not null drop table test_donations
if
object_id('test_communications') is not null drop table test_communications
if
object_id('test_sales') is not null drop table test_sales
create
table test_donations (id int identity, transaction_date smalldatetime, communication_id int, Customer_ID int )
create
table test_communications (id int identity, transaction_date smalldatetime, communication_id int, Customer_ID int)
create
table test_sales (id int identity, transaction_date smalldatetime, communication_id int, Customer_ID int)
---Populate Test Data--
insert
test_donations
select
getdate()-1 , 15,1 union all
select
getdate()-2 , 14,1 union all
select
getdate()-3 , 13,1 union all
select
getdate()-4 , 12,2 union all
select
getdate()-520 , 11,2 ----Earliest date so query should return 11
insert
test_communications
select
getdate()-6 , 10,3 union all
select
getdate()-123 , 9,3 union all
select
getdate()-8 , 8,3 union all
select
getdate()-9 , 7,4 union all
select
getdate()-10 ,6,4
insert
test_sales
select
getdate()-11 , 5,4 union all
select
getdate()-12 , 4,4 union all
select
getdate()-13 , 3,5 union all
select
getdate()-14 , 2,5 union all
select
getdate()-15 , 1,5
--query returning communication id with earliest transaction date--
select
stuff
(min(convert(varchar(20),transaction_date, 102)+ cast(communication_id as varchar(10))),1,10,'')EarliestCommunicationID,
max
(Customer_ID) CustomerID
from
(
select
transaction_date, communication_id, Customer_ID
from
test_donations
union
all
select
transaction_date, communication_id, Customer_ID
from
test_communications
union
all
select
transaction_date, communication_id, Customer_ID
from
test_sales
)
x
group
by Customer_ID
December 7, 2006 at 10:11 am
Version above only distinguishes between transaction on different days. Version below will do it down to milli seconds which is probably what you want.
---DDL--
if
object_id('test_donations') is not null drop table test_donations
if
object_id('test_communications') is not null drop table test_communications
if
object_id('test_sales') is not null drop table test_sales
create
table test_donations (id int identity, transaction_date smalldatetime, communication_id int, Customer_ID int )
create
table test_communications (id int identity, transaction_date smalldatetime, communication_id int, Customer_ID int)
create
table test_sales (id int identity, transaction_date smalldatetime, communication_id int, Customer_ID int)
---Populate Test Data--
insert
test_donations
select
getdate()-1 , 15,1 union all
select
getdate()-2 , 14,1 union all
select
getdate()-3 , 13,1 union all
select
getdate()-4 , 12,2 union all
select
getdate()-520 , 11,2 ----Earliest date so query should return 11
insert
test_communications
select
getdate()-6 , 10,3 union all
select
getdate()-123 , 9,3 union all
select
getdate()-8 , 8,3 union all
select
getdate()-9 , 7,4 union all
select
getdate()-10 ,6,4
insert
test_sales
select
getdate()-11 , 5,4 union all
select
getdate()-12 , 4,4 union all
select
getdate()-13 , 3,5 union all
select
getdate()-14 , 2,5 union all
select
getdate()-15 , 1,5
--query returning communication id with earliest transaction date--
select
stuff
(min(convert(varchar(20),transaction_date, 20)+ cast(communication_id as varchar(10))),1,19,'')EarliestCommunicationID,
max
(Customer_ID) CustomerID
from
(
select
transaction_date, communication_id, Customer_ID
from
test_donations
union
all
select
transaction_date, communication_id, Customer_ID
from
test_communications
union
all
select
transaction_date, communication_id, Customer_ID
from
test_sales
)
x
group
by Customer_ID
December 8, 2006 at 1:50 am
Thanks, I think this is the way to do it. I can also add in a code for the tables and so prioritise those in a certain order.
I'll have to see how long it takes to run, though I was intending to run it overnight and refresh a table with the results.
It's a pity the 'top 1' construct could be used because that was (more) neat!
Anyway, thanks a lot.
December 8, 2006 at 3:46 am
December 8, 2006 at 8:08 am
it takes about 50 seconds to run which isn't too bad.
what I meant was that I can prioritise one table over another id tey both have entries for the same date and time (the time is often 00:00:00).
Cheers.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply