November 25, 2011 at 6:16 am
I have 2 excel spreadsheets, which have a common column, "drawing number". One file has 51 columns and over 100,000 records and the other has just one column and 68,000 rows. The single-column file has all unique records, whereas some drawing numbers are repeated in the large file.
I would like to generate a third file which is filtered using the single column, which has 51 colums and 68,000 rows.
How can I filter the large file with the smaller, so that only the first occurrance of the drawing number is output to the new file. (All subsequent ones can be ignored)?
November 25, 2011 at 6:26 am
Import both files into staging tables in SQL Server, perform a join with MIN and GROUP BY, then export to your new file. Alternatively, if you use SSIS, you can do the comparisons on the fly without importing into SQL Server. You will want to try both methods to see which performs better.
John
November 25, 2011 at 7:44 am
John, I have imported the tables into SSMS. Could you tell me what SQL query to run?
November 25, 2011 at 8:07 am
Unfortunately I don't have the structure of your tables, nor any of the data in them, so there's not much I can do. As I suggested above, your query is likely to look something like this:
SELECT x, MIN(y)
FROM a JOIN b ON a.n = b.n
GROUP BY x
Have a go at writing something, and post back if you're still struggling. We'll need table DDL in the form of CREATE TABLE statements, sample data in the form of INSERT statement, your expected output, and what you've already tried.
John
November 25, 2011 at 9:08 am
This is as far as I've got:
select min(dbo.Largefile."Drawing Number")
from dbo.Largefile join dbo.Smallfile on dbo.Largefile."Drawing Number" = dbo.Smallfile."Drawing Number"
Here are simplified samples of the tables:
dbo.Largefile:
Size of drawingISO CodeDrawing Number
2 1 2
4 A4 429
3 A3 2836
4 A4 2836
3 A4 5017
4 A4 5017
4 A4 5017
4 A4 4566
dbo.Smallfile:
Drawing Number
2
429
2836
5017
4566
November 28, 2011 at 1:17 am
Still need DDL, sample data in the form of INSERT statements and expected output. Here's a hint: use GROUP BY in the way that I suggested. If it doesn't work, please tell us how the output differs from what you expect.
John
November 28, 2011 at 4:28 am
This is what I've tried:
SELECT [Drawing Number], MIN([Desc])
FROM dbo.Largeone JOIN dbo.Smallone ON dbo.Largeone.[Drawing Number] = dbo.Smallone.[Drawing Number]
GROUP BY [Drawing Number]
but I get an error "Ambiguous column name - Drawing Number" on line 3
November 28, 2011 at 4:33 am
You just need to qualify your Drawing Number column in the GROUP BY clause by putting the table name in front of it, in the same way as you did in the line above. Since you're doing an INNER JOIN on an equality, it doesn't make any difference which one you choose.
John
November 28, 2011 at 7:13 am
Success! Filter is working great now, thanks John
Tom
November 28, 2011 at 9:33 am
Here is the command that I'm using which gives me 2 columns, Drawing Number and Project Number.
SELECT dbo.Largefile.[Drawing Number], MIN([Project Number])
FROM dbo.Largefile
JOIN dbo.Smallfile ON dbo.Largefile.[Drawing Number] = dbo.Smallfile.[Drawing Number]
GROUP BY dbo.Largefile.[Drawing Number]
How can I now add the other columns from "Largefile" to the result?
November 29, 2011 at 1:45 am
Now I'm confused, because your latest query contains a column (Project Number) that wasn't in the sample data you posted. This is why it's important for you to provide CREATE TABLE and INSERT statements at the very beginning - so that we don't have to guess what you're working with.
What you need to do now is join your result set back to the original table. Do Drawing Number and Project Number together uniquely identify the rows in the table? If they do, your task is easy:
SELECT <column list>
FROM LargeTable l
JOIN (<Your latest query here - be sure to alias the MIN column>) q
ON l.DrawingNumber = q.DrawingNumber
AND l.ProjectNumber = q.ProjectNumber
If they don't, you'll need to decide how to choose which row to return for each Drawing Number and then put another GROUP BY in the outer query.
John
November 30, 2011 at 1:57 am
John Mitchell-245523 (11/29/2011)
Now I'm confused, because your latest query contains a column (Project Number) that wasn't in the sample data you posted. This is why it's important for you to provide CREATE TABLE and INSERT statements at the very beginning - so that we don't have to guess what you're working with.What you need to do now is join your result set back to the original table. Do Drawing Number and Project Number together uniquely identify the rows in the table? If they do, your task is easy:
SELECT <column list>
FROM LargeTable l
JOIN (<Your latest query here - be sure to alias the MIN column>) q
ON l.DrawingNumber = q.DrawingNumber
AND l.ProjectNumber = q.ProjectNumber
If they don't, you'll need to decide how to choose which row to return for each Drawing Number and then put another GROUP BY in the outer query.
John
A couple of things I'm not sure about:
-When you say "your latest query here" - do I pop in the whole query here, in brackets?
-The "q" character at the end - what is the purpose of this?
-"Alias the MIN" - please explain
Tom
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply