March 24, 2019 at 12:38 pm
I have a table with records. Each set of records is numbered 1 thru 5. I need to iterate thru each set and find records that do not have all five values
For example I need to delete the records in red since they are not a complete set. How can I do this in sql. Thanks
March 24, 2019 at 3:42 pm
So create a small table (or use VALUES as a temporary table constructor) with the numbers 1-5 in it and do an outer join to it with the table you're checking. Anything that returns a null means that it's missing values. If you expose all 5 values in the 1-5 table (or table constructor), you'll even be able to determine which of five values are missing.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 24, 2019 at 3:47 pm
Jeff Moden - Sunday, March 24, 2019 3:42 PMSo create a small table (or use VALUES as a temporary table constructor) with the numbers 1-5 in it and do an outer join to it with the table you're checking. Anything that returns a null means that it's missing values. If you expose all 5 values in the 1-5 table (or table constructor), you'll even be able to determine which of five values are missing.
The problem that I see with the table you posted is that there is duplication in both column 1 and 2 and there is no column apparent to "group the dupes" with, which also means that this is pretty much an impossible task to resolve. Is there some other column that delineates one group of 5 from another or are you simply relying on the unreliable thought of having 5 contiguous rows in a group according to the row position delineated by the ID column (which is a very bad and unreliable way to do such a thing, BTW)?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 24, 2019 at 5:35 pm
rpatlan - Sunday, March 24, 2019 12:38 PMI have a table with records. Each set of records is numbered 1 thru 5. I need to iterate thru each set and find records that do not have all five valuesFor example I need to delete the records in red since they are not a complete set. How can I do this in sql. Thanks
This should give you what you're looking for...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Id INT NOT NULL PRIMARY KEY,
Col2 TINYINT NOT NULL
);
INSERT #TestData (Id, Col2) VALUES
(6, 1),(7, 2),(8, 3),(9, 4),(10, 5),(11, 1),(12, 2),
(13, 3),(14, 1),(15, 2),(16, 3),(17, 4),(18, 5);
-- SELECT * FROM #TestData td;
--=============================
/****** Warning: The following solution relies on a lack of gaps in the Id column, within a group and the proper insertion order of Col2. Any variation in that pattern could result in the deletion of valid data.
Before DELETEing any data, you are advised to execute this as a SELECT to make sure you aren’t going to delete data valid data. ******/
WITH
cte_add_cnt_group AS (
SELECT
td.Id,
td.Col2,
cnt = COUNT(1) OVER (PARTITION BY Id - td.Col2)
FROM
#TestData td
)
SELECT *
--DELETE acg
FROM
cte_add_cnt_group acg
WHERE
acg.cnt < 5;
--=============================
-- check data after delete
SELECT * FROM #TestData td;
March 24, 2019 at 7:22 pm
Great idea, Jason. Just a caution though...
While that will work for the given data, it won't always work... all you need is some gaps in the ID column and it won't work correctly any more.
However, if you created a computed column as a sequential row number in the same order as the ID column and played the count formula against that instead of the ID column, then that would most likely work all of the time even if there were gaps in the ID column.
As a bit of a sidebar, if things are supposed to be in complete sets of 5, why is this problem centering around a set of 3? Seems to me that something went haywire and I'd be loath to just up and delete the evidence.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 24, 2019 at 8:24 pm
Jeff Moden - Sunday, March 24, 2019 7:22 PMGreat idea, Jason. Just a caution though...While that will work for the given data, it won't always work... all you need is some gaps in the ID column and it won't work correctly any more.
However, if you created a computed column as a sequential row number in the same order as the ID column and played the count formula against that instead of the ID column, then that would most likely work all of the time even if there were gaps in the ID column.
As a bit of a sidebar, if things are supposed to be in complete sets of 5, why is this problem centering around a set of 3? Seems to me that something went haywire and I'd be loath to just up and delete the evidence.
Thank you sir. 🙂
Yes, I did think about that and also the possibility that the Col2 values could be inserted out of order. That's an inherent problem with data designs that that rely on insert order to create meaning. Given the numerous ways in which this type of schema can go south, I figured it was best to simply solve for the data that was supplied and let the OP identify other anomalous data patters if they exist.
Good point about researching, rather than deleting... Which reminds me that I should have issued a warning with my previous solution...
It will delete potentially valid data if it doesn't follow the established pattern. I would suggest executing it as a SELECT, verify that valid data is not going to be deleted, BEFORE actually deleting it.
March 24, 2019 at 9:39 pm
Thanks for all your comments. This data is coming from text files. It is being parsed and cleaned as much as possible. So that is why some sets of 5 are missing. Although the records will be deleted this is only to allow for the import into a table. This data will be validated against production data. The sql works as I had wanted. Again thanks.
March 24, 2019 at 9:46 pm
The id column is an identity field with numbers generated auto so it should be o.k.
March 25, 2019 at 7:10 am
Thank you for the feedback and clarification. I'm glad the solution will work for you. 🙂
March 25, 2019 at 7:25 am
rpatlan - Sunday, March 24, 2019 9:39 PMThanks for all your comments. This data is coming from text files. It is being parsed and cleaned as much as possible. So that is why some sets of 5 are missing. Although the records will be deleted this is only to allow for the import into a table. This data will be validated against production data. The sql works as I had wanted. Again thanks.
So you've researched why some of the sets have less than 5 items and know 100% for sure that there isn't a problem either on your end or the provider of the data end and that you're not actually deleting meaningful data that could be critical to whatever you're doing?
I found that a lot of people that say "Yes" to the above question end up in deep Kimchi somewhere down the road. Be careful and don't assume. Get something in writing, if for no other reason, than to protect yourself.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 25, 2019 at 7:31 am
Jason A. Long - Sunday, March 24, 2019 8:24 PMJeff Moden - Sunday, March 24, 2019 7:22 PMGreat idea, Jason. Just a caution though...While that will work for the given data, it won't always work... all you need is some gaps in the ID column and it won't work correctly any more.
However, if you created a computed column as a sequential row number in the same order as the ID column and played the count formula against that instead of the ID column, then that would most likely work all of the time even if there were gaps in the ID column.
As a bit of a sidebar, if things are supposed to be in complete sets of 5, why is this problem centering around a set of 3? Seems to me that something went haywire and I'd be loath to just up and delete the evidence.
Thank you sir. 🙂
Yes, I did think about that and also the possibility that the Col2 values could be inserted out of order. That's an inherent problem with data designs that that rely on insert order to create meaning. Given the numerous ways in which this type of schema can go south, I figured it was best to simply solve for the data that was supplied and let the OP identify other anomalous data patters if they exist.
Good point about researching, rather than deleting... Which reminds me that I should have issued a warning with my previous solution...
It will delete potentially valid data if it doesn't follow the established pattern. I would suggest executing it as a SELECT, verify that valid data is not going to be deleted, BEFORE actually deleting it.
Totally agreed on all of that and thank you for taking my comments the right way. Concerning the bold underlined stuff, let's hope the OP truly understands why the anomalous data is there and that it's not actually an indication of a problem that is getting ready to bite the company.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 25, 2019 at 7:36 am
Yes, it is not a problem. We already have the data so we can cross reference with imported data to determine which records are missing. We can then go and get complete record if needed.
Thanks again.
March 25, 2019 at 7:51 am
rpatlan - Monday, March 25, 2019 7:36 AMYes, it is not a problem. We already have the data so we can cross reference with imported data to determine which records are missing. We can then go and get complete record if needed.Thanks again.
I guess I don't understand that. You already have the data but you're importing it again? And it comes across with missing data? Why not have an up-close and personal conversation with the data provider and get them to straighten out their act? 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
March 25, 2019 at 8:28 am
This data is exported from another older dos system to text file and has additional information that is needed. We have the matching Client ID. If data is missing we will be able to determine by doing left join. The missing data can be retrieved at a later date. Hard to explain process and why it id done this way. Be assured that the Client is aware of the issues.
March 25, 2019 at 9:35 am
rpatlan - Monday, March 25, 2019 8:28 AMThis data is exported from another older dos system to text file and has additional information that is needed. We have the matching Client ID. If data is missing we will be able to determine by doing left join. The missing data can be retrieved at a later date. Hard to explain process and why it id done this way. Be assured that the Client is aware of the issues.
K. Just trying to watch out for you. Thank you for the feedback and totally understood on "Hard to explain" processes.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply