September 13, 2017 at 11:06 am
Its been a while since I have worked with SQL. So much that I am struggling to ask this appropriately so I will try best to spell out what I am If I have two such tables like:
create table inv (
inv_id3 int,
inv_desc varchar(200),
column1 char(6),
column2 char(200)
);
create table item_stores (
item_id int,
item_size_id varchar(200),
item_size2_id varchar(200),
column1 char(6),
column2 char(6),
column3 char(6)
);
where their contents look like this:
insert into inv values
(208798,'PANTS1'),
(108829,'PANTS2'),
(108828,'PANTS3');
insert into item_stores values
(208798,'34','34'),
(108829,'34','34'),
(108829,'34','i34'),
(108829,'34','i32'),
(108829,'34','i30'),
(108829,'34','i36'),
(108829,'34','36'),
(108828,'34','34');
What I am wanting to do is expose a list like "inv.inv_id3 , inv.inv_desc" of those records where situations like this exist
(108829,'34','34'),
(108829,'34','i34'),
Where someone has entered a size incorrectly. The letter I at the beginning is the correct format. The trick is that it isn't always the correct format. In other cases within the same data set the lack of the letter I is correct depending on how the item was first entered into the system. So, I am left to spot where this double occurrence occurs is that makes sense and call out that item_id.
My questions is short of ssomeone being over generous and providing a sample. Could you in layman's terms provide some guidance on the best way to proceed. I have to accomplish this in the query without the use of any other tooling. A select statement of sorts,
September 13, 2017 at 11:21 am
Well done on taking the time to post all of your set-up scripts ... this is a great start.
But I am having trouble understanding what is so special about the 'situations' which you wish to see. Can you please have another go at explaining the logic required to identify a 'situation'?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 13, 2017 at 11:32 am
I need to identify all the IDs that that have a letter "I" in one of their size combinations. Then determine if for that size combination containing the letter "I" if there is a like size combination without the letter I. Then return a list of bad IDs so that I can do later work moving inventory out of the bad size combinations into the correct ones and then purge the DB of all the mistakes.
The thing I worry about is if there are a couple million combinations, how to grab all of the size combinations for a given ID and then act upon that subset rather than comparing every record against the sum of all records
Does this make sense?
September 13, 2017 at 11:55 am
CaseyP 58642 - Wednesday, September 13, 2017 11:32 AMI need to identify all the IDs that that have a letter "I" in one of their size combinations. Then determine if for that size combination containing the letter "I" if there is a like size combination without the letter I. Then return a list of bad IDs so that I can do later work moving inventory out of the bad size combinations into the correct ones and then purge the DB of all the mistakes.The thing I worry about is if there are a couple million combinations, how to grab all of the size combinations for a given ID and then act upon that subset rather than comparing every record against the sum of all records
Does this make sense?
Makes some sense 🙂
So can you tell me why you would be interested in this pair
(108829,'34','34'),
(108829,'34','i34')
but not in this pair
(108829,'34','i36'),
(108829,'34','36'),
?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 13, 2017 at 12:02 pm
You are correct. I am interested in any pairing that occurs. With all of the possible sizes and combinations I wanted to keep the post as brief as possible but as far as the example data is concerned, the output would be 108829, PANTS2 because there is at a minimum one pair. In this example there are 2 matches.
It would be cool to output results that showed the ID and the conflicting sizes so that cleanup would be easier and the scope of the cleanup better known but I am having this mental block on how just to identify which IDs are the problem to begin with
September 13, 2017 at 12:58 pm
Try running this and see what you think. I'm not surprised that you had trouble coding it. There must be a better way than what I've come up with.CREATE TABLE #item_stores
(
item_id INT
, item_size_id VARCHAR(200)
, item_size2_id VARCHAR(200)
);
INSERT #item_stores
(
item_id
, item_size_id
, item_size2_id
)
VALUES
(
208798, '34', '34'
)
,(
108829, '34', '34'
)
,(
108829, '34', 'i34'
)
,(
108829, '34', 'i32'
)
,(
108829, '34', 'i30'
)
,(
108829, '34', 'i36'
)
,(
108829, '34', '36'
)
,(
108828, '34', '34'
);
SELECT *
FROM #item_stores ist;
WITH PrevValues
AS
(
SELECT
ist.item_id
, ist.item_size_id
, Prev_item_size_id = LAG(ist.item_size_id) OVER (PARTITION BY ist.item_id
ORDER BY
ist.item_id
, Cleaned.Clean_item_size_id
, Cleaned.Clean_item_size2_id
)
, ist.item_size2_id
, Prev_item_size2_id = LAG(ist.item_size2_id) OVER (PARTITION BY ist.item_id
ORDER BY
ist.item_id
, Cleaned.Clean_item_size_id
, Cleaned.Clean_item_size2_id
)
, Cleaned.Clean_item_size_id
, Cleaned.Clean_item_size2_id
FROM
#item_stores ist
CROSS APPLY
(
SELECT
Clean_item_size_id = REPLACE(ist.item_size_id, 'i', '')
, Clean_item_size2_id = REPLACE(ist.item_size2_id, 'i', '')
) Cleaned
)
SELECT
pv.item_id
, pv.item_size_id
, pv.item_size2_id
FROM PrevValues pv
WHERE
pv.Prev_item_size_id IS NOT NULL
AND (
(
pv.item_size_id = pv.Clean_item_size_id
AND pv.item_size_id <> pv.Prev_item_size_id
)
OR (
pv.item_size2_id = pv.Clean_item_size2_id
AND pv.item_size2_id <> pv.Prev_item_size2_id
)
);
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 13, 2017 at 1:10 pm
Phil Parkin - Wednesday, September 13, 2017 12:58 PMTry running this and see what you think. I'm not surprised that you had trouble coding it. There must be a better way than what I've come up with.CREATE TABLE #item_stores
(
item_id INT
, item_size_id VARCHAR(200)
, item_size2_id VARCHAR(200)
);INSERT #item_stores
(
item_id
, item_size_id
, item_size2_id
)
VALUES
(
208798, '34', '34'
)
,(
108829, '34', '34'
)
,(
108829, '34', 'i34'
)
,(
108829, '34', 'i32'
)
,(
108829, '34', 'i30'
)
,(
108829, '34', 'i36'
)
,(
108829, '34', '36'
)
,(
108828, '34', '34'
);SELECT *
FROM #item_stores ist;WITH PrevValues
AS
(
SELECT
ist.item_id
, ist.item_size_id
, Prev_item_size_id = LAG(ist.item_size_id) OVER (PARTITION BY ist.item_id
ORDER BY
ist.item_id
, Cleaned.Clean_item_size_id
, Cleaned.Clean_item_size2_id
)
, ist.item_size2_id
, Prev_item_size2_id = LAG(ist.item_size2_id) OVER (PARTITION BY ist.item_id
ORDER BY
ist.item_id
, Cleaned.Clean_item_size_id
, Cleaned.Clean_item_size2_id
)
, Cleaned.Clean_item_size_id
, Cleaned.Clean_item_size2_id
FROM
#item_stores ist
CROSS APPLY
(
SELECT
Clean_item_size_id = REPLACE(ist.item_size_id, 'i', '')
, Clean_item_size2_id = REPLACE(ist.item_size2_id, 'i', '')
) Cleaned
)
SELECT
pv.item_id
, pv.item_size_id
, pv.item_size2_id
FROM PrevValues pv
WHERE
pv.Prev_item_size_id IS NOT NULL
AND (
(
pv.item_size_id = pv.Clean_item_size_id
AND pv.item_size_id <> pv.Prev_item_size_id
)
OR (
pv.item_size2_id = pv.Clean_item_size2_id
AND pv.item_size2_id <> pv.Prev_item_size2_id
)
);
I would just do a count by item number and find any that have more than 1 value.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 13, 2017 at 1:16 pm
First, mind blowing. That someone would help out in such a way is phenomenal. Truly appreciate it. I have been testing this on:
http://sqlfiddle.com/#!6/d6f3d6/2
because the source system is heavily used and I am loathe to run ad hock queries against it until I am ready and sure of the results. I'm not getting the same results you are. That may just be the site and I may just have to wait until after hours to test what you have provided. I am super stoked though at the potential. I'm really going to have to work through what you have done. Thanks so much.
September 13, 2017 at 1:19 pm
drew.allen - Wednesday, September 13, 2017 1:10 PMI would just do a count by item number and find any that have more than 1 value.Drew
Wouldn't that find too many rows? Eg, that would identify
(1,1,32)
and
(1,1,36)
which (as far as I can see) should not be considered invalid.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 13, 2017 at 1:23 pm
Oh snap, I caught my problem with sqlfiddle. It didn't like the # sign. It works awesome !!! Again, simply awesome.
September 13, 2017 at 1:25 pm
CaseyP 58642 - Wednesday, September 13, 2017 1:16 PMFirst, mind blowing. That someone would help out in such a way is phenomenal. Truly appreciate it. I have been testing this on:http://sqlfiddle.com/#!6/d6f3d6/2
because the source system is heavily used and I am loathe to run ad hock queries against it until I am ready and sure of the results. I'm not getting the same results you are. That may just be the site and I may just have to wait until after hours to test what you have provided. I am super stoked though at the potential. I'm really going to have to work through what you have done. Thanks so much.
No problem, it was a fun diversion.
Here are the results I see, by the way:
Which appears to be what you are looking for, more or less.
But I've just noticed that we're in a 2005 forum, so that is an issue (I apologise). The LAG() function was introduced in 2012 and without it, the query will get even more complex.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 13, 2017 at 1:32 pm
Sadly I have to either export out the data anyway to SQL which was what I was thinking to do as the system is an older INFORMIX release or work it into INFORMIX's limited scope of SQL.The logic is what matters. The key thing was a starting point in logic which you have amply provided. I get the same results and you are absolutely on target and Ive an express instance where I think I can copy the data set to. But that made my head hurt so you idea of an interesting diversion and mine differ just slightly, LMAO.
September 13, 2017 at 1:51 pm
CaseyP 58642 - Wednesday, September 13, 2017 1:32 PMSadly I have to either export out the data anyway to SQL which was what I was thinking to do as the system is an older INFORMIX release or work it into INFORMIX's limited scope of SQL.The logic is what matters. The key thing was a starting point in logic which you have amply provided. I get the same results and you are absolutely on target and Ive an express instance where I think I can copy the data set to. But that made my head hurt so you idea of an interesting diversion and mine differ just slightly, LMAO.
I'm glad you found it helpful and thanks for posting back.
Just in case you don't know, SQL Server 2016 Developer Edition is free, so you can easily download and install it somewhere to test this stuff out on a 'proper' SQL instance local to you.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply