December 12, 2014 at 8:01 am
Hello all,
I need some help putting together a query.
I have some data in the following format;
MYTABLE
DOC_NO // REV_NO // FILE_NAME
ABC123 // A // abc123.pdf
ABC123 // B // abc123_2.docx
ABC124 // A // abc124.xlsx
ABC124 // A // -
ABC125 // A // abc125.docx
ABC125 // C // abc125.jpg
ABC125 // C // abc125.docx
ABC125 // C // -
ABC126 // 0 // -
ABC127 // A1 // abc127.xlsx
ABC127 // A1 // abc127.pdf
I'm looking to select all rows where the DOC_NO and REV_NO appear only once.
(i.e. the combination of the two values together, not any distinct value in a column)
I have written the sub query to filter the correct results;
SELECT DOC_NO, REV_NO FROM [MYTABLE]
GROUP BY DOC_NO, REV_NO
HAVING COUNT(*) =1
I now need to strip out the records which have no file (represented as "-" in the FILE_NAME field) and select the other fields (same table - for example, lets just say "ADD1", "ADD2" and "ADD3")
I was looking to put together a query like;
SELECT DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3 FROM [MYTABLE]
WHERE FILE_NAME NOT LIKE '-' AND DOC_NO IN
(SELECT DOC_NO, REV_NO FROM [MYTABLE]
GROUP BY DOC_NO, REV_NO
HAVING COUNT(*) =1)
But of course, DOC_NO alone being in the subquery select is not sufficient, as (ABC125 /A) is a unique combination, but (ABC125 /C) is not, but these results would be pulled in.
I also cannot simply add an additional "AND" clause on its own to make sure the REV_NO value appears in the subquery, because it is highly repetitive and would have to specifically match the DOC_NO)
What is the easiest way of ensuring that I only pull in the records where both the DOC_NO and REV_NO (combination) are unique, or is there a better way of putting this select together altogether?
Many thanks in advance for any assistance.
December 12, 2014 at 8:30 am
I have about 4 ideas of what you might want from this. Can you put together some ddl and sample data in a consumable format? Also, if you can demonstrate what the desired output is based on your sample data it would greatly help. Please take a few minutes and read the first link in my signature for best practices when posting questions.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 12, 2014 at 9:27 am
CREATE TABLE MYTABLE
(
DOC_NO NVARCHAR (10),
REV_NO NVARCHAR (5),
FILE_NAME NVARCHAR (20),
ADD1 NVARCHAR (10),
ADD2 NVARCHAR (10),
ADD3 INT
)
INSERT INTO MYTABLE (DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3)
SELECT 'ABC123', 'A', 'abc123.pdf', 'CON1', 'LOC1', 1 UNION ALL
SELECT 'ABC123', 'B', 'abc123_2.docx', 'CON1', 'LOC2', 2 UNION ALL
SELECT 'ABC124', 'A', 'abc124.xlsx', 'CON2', 'LOC1', 9 UNION ALL
SELECT 'ABC124', 'A', '-', 'CON2', 'LOC3', 4 UNION ALL
SELECT 'ABC125', 'A', 'abc125.docx', 'CON1', 'LOC1', 6 UNION ALL
SELECT 'ABC125', 'C', 'abc125.jpg', 'CON1', 'LOC4', 1 UNION ALL
SELECT 'ABC125', 'C', 'abc125.docx', 'CON3', 'LOC7', 2 UNION ALL
SELECT 'ABC125', 'C', '-', 'CON3', 'LOC1', 4 UNION ALL
SELECT 'ABC126', '0', '-', 'CON7', 'LOC2', 5 UNION ALL
SELECT 'ABC127', 'A1', 'abc127.xlsx', 'CON1', 'LOC1', 7 UNION ALL
SELECT 'ABC127', 'A1', 'abc127.pdf', 'CON1', 'LOC1', 3
And the output:
DOC_NOREV_NOFILE_NAMEADD1ADD2ADD3
ABC123Aabc123.pdfCON1LOC11 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)
ABC123Babc123_2.docxCON1LOC2 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)
ABC124Aabc124.xlsxCON2LOC19 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)
ABC124A-CON2LOC34 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO // ALSO NO FILE)
ABC125Aabc125.docxCON1LOC16 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)
ABC125Cabc125.jpgCON1LOC41 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)
ABC125Cabc125.docxCON3LOC72 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)
ABC125C-CON3LOC14 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO // ALSO NO FILE)
ABC1260-CON7LOC25 ---> FILTERED OUT (NO FILE)
ABC127A1abc127.xlsxCON1LOC17 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)
ABC127A1abc127.pdfCON1LOC13
December 12, 2014 at 9:38 am
McSQL (12/12/2014)
CREATE TABLE MYTABLE(
DOC_NO NVARCHAR (10),
REV_NO NVARCHAR (5),
FILE_NAME NVARCHAR (20),
ADD1 NVARCHAR (10),
ADD2 NVARCHAR (10),
ADD3 INT
)
INSERT INTO MYTABLE (DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3)
SELECT 'ABC123', 'A', 'abc123.pdf', 'CON1', 'LOC1', 1 UNION ALL
SELECT 'ABC123', 'B', 'abc123_2.docx', 'CON1', 'LOC2', 2 UNION ALL
SELECT 'ABC124', 'A', 'abc124.xlsx', 'CON2', 'LOC1', 9 UNION ALL
SELECT 'ABC124', 'A', '-', 'CON2', 'LOC3', 4 UNION ALL
SELECT 'ABC125', 'A', 'abc125.docx', 'CON1', 'LOC1', 6 UNION ALL
SELECT 'ABC125', 'C', 'abc125.jpg', 'CON1', 'LOC4', 1 UNION ALL
SELECT 'ABC125', 'C', 'abc125.docx', 'CON3', 'LOC7', 2 UNION ALL
SELECT 'ABC125', 'C', '-', 'CON3', 'LOC1', 4 UNION ALL
SELECT 'ABC126', '0', '-', 'CON7', 'LOC2', 5 UNION ALL
SELECT 'ABC127', 'A1', 'abc127.xlsx', 'CON1', 'LOC1', 7 UNION ALL
SELECT 'ABC127', 'A1', 'abc127.pdf', 'CON1', 'LOC1', 3
And the output:
DOC_NOREV_NOFILE_NAMEADD1ADD2ADD3
ABC123Aabc123.pdfCON1LOC11 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)
ABC123Babc123_2.docxCON1LOC2 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)
ABC124Aabc124.xlsxCON2LOC19 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)
ABC124A-CON2LOC34 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO // ALSO NO FILE)
ABC125Aabc125.docxCON1LOC16 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)
ABC125Cabc125.jpgCON1LOC41 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)
ABC125Cabc125.docxCON3LOC72 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)
ABC125C-CON3LOC14 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO // ALSO NO FILE)
ABC1260-CON7LOC25 ---> FILTERED OUT (NO FILE)
ABC127A1abc127.xlsxCON1LOC17 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)
ABC127A1abc127.pdfCON1LOC13
It is not really clear what you want as the output you posted is really not legible.
Something like this maybe???
select *
from MYTABLE t
join
(
select DOC_NO, REV_NO
from MYTABLE mt
group by DOC_NO, REV_NO
having nullif(min(FILE_NAME), '-') > ''
) x on x.DOC_NO = t.DOC_NO
and x.REV_NO = t.REV_NO
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 12, 2014 at 9:43 am
That looks about right, Sean. I was just about to suggest that the last query in the original post was nearly there - it just needs the IN changed to a JOIN on DOC_NO and REV_NO.
John
December 12, 2014 at 11:52 am
Thanks for the query.
Unfortunately it is filtering out the documents with no file, but isn't filtering out duplicate DOC_NO + REV_NO, so seems to be acting the same as a SELECT * FROM MYTABLE WHERE FILE_NAME NOT LIKE '-'
December 12, 2014 at 11:54 am
Just to add clarity if there is any confusion as to what i'm struggling to select;
There are several DOC_NO duplicates
There are many more REV_NO duplicates
I want to select all records, where the combination of DOC_NO and REV_NO in a row, only appear once.
There is a secondary requirement to remove records with no file (represented with a value of '-') but if I can get the first part working, that should be fairly simple to implement.
Many thanks.
December 12, 2014 at 11:57 am
McSQL (12/12/2014)
Just to add clarity if there is any confusion as to what i'm struggling to select;There are several DOC_NO duplicates
There are many more REV_NO duplicates
I want to select all records, where the combination of DOC_NO and REV_NO in a row, only appear once.
There is a secondary requirement to remove records with no file (represented with a value of '-') but if I can get the first part working, that should be fairly simple to implement.
Many thanks.
Still kind of a guess because the desired output is totally unclear. If you could repost that in a temp table or some format that includes ONLY the rows you want it would be helpful.
Is this getting you closer?
select *
from MYTABLE t
join
(
select DOC_NO, REV_NO
from MYTABLE mt
group by DOC_NO, REV_NO
having nullif(min(FILE_NAME), '-') > ''
and COUNT(DOC_NO + '_' + REV_NO) = 1
) x on x.DOC_NO = t.DOC_NO
and x.REV_NO = t.REV_NO
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 12, 2014 at 7:28 pm
use COUNT() OVER... to select the ones you want.
with uniq as
(
select DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3, count(*) over(partition by DOC_NO, REV_NO) as cnt
from MYTABLE
where FILE_NAME <> '-'
)
select DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3
from uniq
where cnt=1;
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 12, 2014 at 7:50 pm
Many thanks, that works perfectly, is exactly what I was looking for.
Very much appreciated 🙂
December 12, 2014 at 7:59 pm
I tried the second query you added Sean also, and following it through it looks spot on in the logic too.
It's somehow filtering out 59 extra records ( in a 50,000 record set).
Not really sure why, as it looks perfect to me.
I'll take a look at what records make up the difference and will report back if I get to the bottom of it.
Might just be something else in my data.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply