July 30, 2009 at 6:35 pm
Hi.
I have a situation where I need to find the missing records exclusively from a table.
Here is my first table. This contains a set of account IDs that can have multiple expense headers:
create table #foo (
accountID int
, headerID int
, headerText varchar(40)
)
insert into #foo(accountID, headerID, headerText)
select 1, 1, 'abc' union all
select 1, 2, 'efg' union all
select 1, 3, 'iol' union all
select 2, 4, 'thy' union all
select 3, 1, 'abc' union all
select 3, 1, 'def'
Here are a set of mandatory headers. There can be other headers too, but they are not mandatory:
create table #headers(headerid int, headerText varchar(40))
insert into #headers(headerid, headerText)
select 1, 'abc' union all
select 1, 'def' union all
select 2, 'efg' union all
select 3, 'iol'
Now my requirement is to retrieve EXACTLY those rows in the account table that DO NOT have the mandatory
headers. I just need the account id and the missing header info.
After some attempts, here was some code that I came up with, but its neither fast, not does it return exactly what I want.
select distinct f.accountid, h.headerid, h.headertext
from #headers h, #foo f
where f.headerid not in (h.headerid)
and f.headertext not in (h.headertext)
Any ideas, gurus?
It seems like I am just not cut for T-SQL. I am an otherwise decent coder, who just can't think in T-SQL'ese'... :blink:
July 30, 2009 at 9:30 pm
I'm only missing one thing to really help you, what is the expected results based on the sample data provided above?
July 30, 2009 at 9:40 pm
But, I also think this is what you are looking for:
create table #foo (
accountID int
, headerID int
, headerText varchar(40)
);
insert into #foo(accountID, headerID, headerText)
select 1, 1, 'abc' union all
select 1, 2, 'efg' union all
select 1, 3, 'iol' union all
select 2, 4, 'thy' union all
select 3, 1, 'abc' union all
select 3, 1, 'def';
create table #headers(headerID int, headerText varchar(40));
insert into #headers(headerID, headerText)
select 1, 'abc' union all
select 1, 'def' union all
select 2, 'efg' union all
select 3, 'iol';
select distinct
f.accountID,
f.headerID,
f.headerText
from
#foo f
where
f.headerID not in (select h.headerID from #headers h)
or f.headerText not in (select h.headerText from #headers h);
drop table #foo;
drop table #headers;
July 30, 2009 at 10:24 pm
First, why does your #header table have a duplicate ID? The purpose of an ID field is to uniquely identify a particular record. Your headerID with a value one does not uniquely identify a record.
Second, it is good coding practice to have a single field as a primary key. Since your headerID does not uniquely specify a record, you are forced to have a compound key.
Third, because your #header table has a compound key, you are required to have both of those fields in your #foo table to uniquely identify a row. If you had a single field as a primary key, you would only need one field.
All that being said, the reason that you are having problems is that no single row in your #foo table contains enough information to make a decision about the required headers. You will need to do something to gather all of the necessary information in one place. Here is a start to a solution with a PIVOT, but this may not be practical with a large number of required headers. NOTE: I used the headerText as the "unique" identifier.
SELECT AccountID, [abc], [def], [efg], [iol]
FROM (
SELECT #foo.AccountID, #headers.headerID, #headers.headerText
FROM #foo
FULL OUTER JOIN #headers
ON #foo.headerID = #headers.headerID
AND #foo.headerText = #headers.headerText
) p
PIVOT (
Count(HeaderID)
FOR headerText
IN ( [abc], [def], [efg], [iol] )
) AS Pvt
and here is the beginning of one using a CTE. Again I used the headerText as a unique identifier.
WITH ReqHeaders AS (
SELECT Row_Number() OVER ( ORDER BY headerText ) AS RowNum, headerText
FROM #headers
)
SELECT accountID, #foo.headerText, RowNum
, Row_Number() OVER (PARTITION BY accountID ORDER BY RowNum ) As AcctHeaderNum
FROM ReqHeaders
FULL OUTER JOIN #foo
ON #foo.HeaderText = ReqHeaders.HeaderText
Notice that the row number from the header field does not match the row number for the account id when there is a required header missing. The problem is that it won't show the last required header if that is the only one missing, but at least this gives you an idea about a different way to think about the problem.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
July 31, 2009 at 12:46 am
I realized what I was missing when I first approached this problem. You have to get a result for every single combination of account ID and required header. This either means a pivot (as in my first approach) or a cross join. Here is what I came up with.
WITH Accounts AS (
SELECT DISTINCT accountID
FROM #foo
)
, ReqHeaders AS (
SELECT #headers.headerid, #headers.headerText, Accounts.accountID
FROM #headers
CROSS JOIN Accounts
)
SELECT ReqHeaders.accountID, ReqHeaders.headerid, ReqHeaders.HeaderText
FROM ReqHeaders
LEFT OUTER JOIN #foo
ON ReqHeaders.headerid = #foo.headerid
AND ReqHeaders.headerText = #foo.headerText
AND ReqHeaders.accountID = #foo.accountID
WHERE #foo.accountID is Null -- Only display the missing headers
ORDER BY ReqHeaders.AccountID, ReqHeaders.headerid, ReqHeaders.HeaderText
This will work for any number of required headers.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
July 31, 2009 at 1:30 pm
Ouch .... I thought it was apparent.... but I should have been clearer about what I am expecting.
I am expecting just the combinations that are missing from the #foo table.
So I am expecting the following:
AccountID : headerID: headerText
1: 1: def < -- There is only 1 mandatory header missing for this account id
2: 1: abc < -- Account id 2 has none of the mandatory headers
2: 1: def
2: 2: efg
2: 3: iol
3: 2: efg < -- Account id 3 has 2 missing mandatory headers
3: 3: iol
July 31, 2009 at 1:35 pm
Thanks for the reply Drew.
The data set I have given is artificial - so that we can attack the core quickly. Trust me, I have 'heavily' simplified the
structure of the tables that i am dealing with.
You might want to reformulate your query based on the expected result set that I added to this thread.
I am extremely sorry for not adding it to the original post ....
July 31, 2009 at 1:37 pm
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply