February 29, 2012 at 5:08 pm
CREATE Table #Fred(OrganisationID int, LocationID int)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,2)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,2)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,7)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,7)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,7)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,5)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,6)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(2,18)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(2,19)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(2,19)
SELECT * FROM #Fred
DROP Table #Fred
The code above produces data that looks like:
1,2
1,2
1,7
1,7
1,7
1,5
1,6
2,18
2,19
2,19
In the second column - for any distinct value in the first column - how can I find out which value in the second column occurs most often - and update all the values in Column 2 to that value which occurs most often.
So, in the example above - how do I establish that, for value 1 in the first column, 7 occurs more often than 2, 5 and 6 - and update all rows where the value in the first column is 1 so that the value in the second column is 7.
And ... that, for value 2 in the first column, 19 occurs more often than 18 - and update all rows where the value in the first column is 2 so that the value in the second column is 19.
Etc.
I haven't shown it above, but each row has a Primary Key.
Thanks for any help.
February 29, 2012 at 6:52 pm
THis?
SELECT * FROM #Fred;
; WITH CTE AS
(
SELECT OrganisationID , LocationID ,CT = COUNT(*) --, Cnt = COUNT(*) OVER ( PARTITION BY OrganisationID , LocationID)
FROM #Fred
GROUP BY OrganisationID , LocationID
)
, Rankd AS
(
SELECT OrganisationID , LocationID
, RN = ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY CT DESC)
FROM CTE
)
, TopPerformer AS
(
SELECT OrganisationID , LocationID
FROM Rankd
WHERE RN = 1
)
UPDATE F
SET F.LocationID = TP.LocationID
FROM #Fred F
INNER JOIN TopPerformer TP
ON TP.OrganisationID = F.OrganisationID
SELECT * FROM #Fred;
March 1, 2012 at 2:15 am
Wow, thanks very much. Yes, that's exactly what I need.
Seeing your code makes me realise what a novice I am at this lark.
Can I ask ...
Using that 'With' statement, you seem to be creating several temporary tables but you don't have to drop them - is that right?
And you start that statement with a semi-colon - and use commas to separate the statements that create the temporary tables?
Cheers
March 1, 2012 at 2:30 am
sku370870 (3/1/2012)
Wow, thanks very much. Yes, that's exactly what I need.Seeing your code makes me realise what a novice I am at this lark.
Can I ask ...
Using that 'With' statement, you seem to be creating several temporary tables but you don't have to drop them - is that right?
And you start that statement with a semi-colon - and use commas to separate the statements that create the temporary tables?
Cheers
Does it make it easier for you to understand if I rewrite his query like this?
UPDATE F
SET F.LocationID = updateValues.LocationID
FROM #Fred F
INNER JOIN (SELECT OrganisationID, LocationID
FROM (SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)
FROM (SELECT OrganisationID, LocationID, COUNT(*)
FROM #Fred
GROUP BY OrganisationID, LocationID) originalDataCount(OrganisationID, LocationID, Cnt)
) dataRanked(OrganisationID, LocationID, Rn)
WHERE RN = 1) updateValues ON updateValues.OrganisationID = F.OrganisationID;
Basically, a CTE is a derived table that can be referenced in your query. ColdCoffee has created 3 CTEs (named CTE, Rankd and TopPerformer) then used each one to build up the data he needed to update your table.
The semi-colon at the start of his query is actually bad when demonstrating the method used for creating a CTE. It's there to terminate the previous statement but is often written next to the "WITH" instead.
March 1, 2012 at 3:52 am
Thanks for your help in trying to get me to understand this.
I realise in your code you are doing up update query joining #fred to a table created by this statement.
SELECT OrganisationID, LocationID
FROM (SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)
FROM (SELECT OrganisationID, LocationID, COUNT(*) FROM #Fred
GROUP BY OrganisationID, LocationID)
originalDataCount(OrganisationID, LocationID, Cnt))
dataRanked(OrganisationID, LocationID, Rn)
WHERE RN = 1
Although I get the idea ... I don't understand the syntax at all ... there seem to be three consecutive 'FROMS'
I have only ever written
SELECT FieldA FROM TableB
I don't get .. SELECT something FROM something FROM something FROM #Fred
I'd be grateful for any further explanation - I'd really like to understand this stuff. Cheers.
Edit: Just as an aside - I spend most of time writing in C# in a .net environment ... where every line ends with a semi-colon or it won't compile.
In SQL Server I never use semi-colons to finish lines ... but you do?
March 1, 2012 at 4:09 am
OK, I'll give it a go but I've been told before that my teaching method comes across as abrasive.
First, we want to figure out the counts for each organisationID and locationID. We do this by doing a simple count:
SELECT OrganisationID, LocationID, COUNT(*)
FROM #Fred
GROUP BY OrganisationID, LocationID
Which returns:
OrganisationID LocationID
-------------- ----------- -----------
1 2 2
1 5 1
1 6 1
1 7 3
2 18 1
2 19 2
We need to know which is the highest count for each orgnanisationID, so we use a ranking function, but we want to base it from the data above so we use a derived table.
SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)
FROM (
--This is your derived table
SELECT OrganisationID, LocationID, COUNT(*)
FROM #Fred
GROUP BY OrganisationID, LocationID
) derivedTable(OrganisationID, LocationID, Cnt) --alias for derived table and column names
Which returns:
OrganisationID LocationID
-------------- ----------- --------------------
1 7 1
1 2 2
1 5 3
1 6 4
2 19 1
2 18 2
Now, those with a ROW_NUMBER of 1 are the highest counts for each organisationID, so we want to use another derived table to grab the above data and filter where ROW_NUMBER is 1.
SELECT OrganisationID, LocationID
FROM (
--This is your derived table
SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)
FROM (
--This is your derived table
SELECT OrganisationID, LocationID, COUNT(*)
FROM #Fred
GROUP BY OrganisationID, LocationID
) derivedTable(OrganisationID, LocationID, Cnt) --alias for derived table and column names
) derivedTable2(OrganisationID, LocationID, Rnk) --alias for derived table and column names
WHERE Rnk = 1
Which returns:
OrganisationID LocationID
-------------- -----------
1 7
2 19
Now we can join this back to #Fred based on the organisationID, allowing us to use the highest locationID.
SELECT f.OrganisationID AS originalOrganisationID, f.LocationID AS originalLocationID,
derivedTable3.OrganisationID AS derivedOrganisationID, derivedTable3.LocationID AS derivedLocationID
FROM #Fred f
INNER JOIN (--This is your derived table
SELECT OrganisationID, LocationID
FROM (
--This is your derived table
SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)
FROM (
--This is your derived table
SELECT OrganisationID, LocationID, COUNT(*)
FROM #Fred
GROUP BY OrganisationID, LocationID
) derivedTable(OrganisationID, LocationID, Cnt) --alias for derived table and column names
) derivedTable2(OrganisationID, LocationID, Rnk) --alias for derived table and column names
WHERE Rnk = 1
) derivedTable3 ON f.OrganisationID = derivedTable3.OrganisationID
Which results in:
originalOrganisationID originalLocationID derivedOrganisationID derivedLocationID
---------------------- ------------------ --------------------- -----------------
1 2 1 7
1 2 1 7
1 7 1 7
1 7 1 7
1 7 1 7
1 5 1 7
1 6 1 7
2 18 2 19
2 19 2 19
2 19 2 19
Does that make it clearer? Derived tables are sort of essential to almost all SQL development. . .
sku370870 (3/1/2012)
Edit: Just as an aside - I spend most of time writing in C# in a .net environment ... where every line ends with a semi-colon or it won't compile.In SQL Server I never use semi-colons to finish lines ... but you do?
No, I don't. But I should 😀
CTE is one of the first things in SQL that require the previous statement to have been terminated correctly.
March 1, 2012 at 9:35 am
Cadavre (3/1/2012)
OK, I'll give it a go but I've been told before that my teaching method comes across as abrasive.
Not at all Cadavre! Awesome explanation! Thanks for that. Hope the OP sees the code clearer now.
March 1, 2012 at 9:37 am
I must also ask a question here. What should happen if there are 2 LocationIDs with same number of appearances for a particular OrgID ?
March 1, 2012 at 9:54 am
The semi-colon at the start of his query is actually bad when demonstrating the method used for creating a CTE. It's there to terminate the previous statement but is often written next to the "WITH" instead.
To the OP, although the semi-colon is used to terminate a preceding statement... a CTE must ALWAYS begin with one. This is why many people put it before the WITH. If there is no preceding statement, it still requires the semi-colon. So it has become practice to just place it before the WITH as demonstrated.
Jared
CE - Microsoft
March 1, 2012 at 10:11 am
SQLKnowItAll (3/1/2012)
The semi-colon at the start of his query is actually bad when demonstrating the method used for creating a CTE. It's there to terminate the previous statement but is often written next to the "WITH" instead.
To the OP, although the semi-colon is used to terminate a preceding statement... a CTE must ALWAYS begin with one. This is why many people put it before the WITH. If there is no preceding statement, it still requires the semi-colon. So it has become practice to just place it before the WITH as demonstrated.
That's not fully accurate.
A CTE in a View definition not only doesn't require a semicolon, it won't work with one.
CREATE VIEW dbo.MyCTEView
AS
WITH CTE
AS (SELECT *
FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))
SELECT *
FROM CTE ;
GO
SELECT *
FROM dbo.MyCTEView ;
A CTE at the beginning of a batch doesn't require a preceding semicolon.
WITH CTE
AS (SELECT *
FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))
SELECT *
FROM CTE
A CTE as the first item in a stored procedure doesn't require one.
CREATE PROC dbo.MyCTEProc
AS
WITH CTE
AS (SELECT *
FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))
SELECT *
FROM CTE
GO
EXEC dbo.MyCTEProc
A CTE in the definition of an inline-select UDF doesn't require one and won't work with one (just like a View).
They're only required when there's a prior DML or DDL statement in a script/batch/procedure/function, and the use is to terminate that item, not to start the CTE. The reason for this is that With is an overloaded word, which can be used to add options to some commands, or to define a CTE.
select *
from MyTable
with CTEName as (CTE definition)
Is that a syntax error in defining a table hint, or is it a CTE? The dev can tell, but the SQL compiler can't. Hence, a semicolon is needed to terminate the prior statement.
If it's the first item in a batch, use one or not, at your option. I generally do, out of habit, but not out of necessity.
Edit: Note, all scripts except the last one work. The last one deliberately does NOT work, and will generate an error, which is the point for it.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 1, 2012 at 10:15 am
GSquared (3/1/2012)
SQLKnowItAll (3/1/2012)
The semi-colon at the start of his query is actually bad when demonstrating the method used for creating a CTE. It's there to terminate the previous statement but is often written next to the "WITH" instead.
To the OP, although the semi-colon is used to terminate a preceding statement... a CTE must ALWAYS begin with one. This is why many people put it before the WITH. If there is no preceding statement, it still requires the semi-colon. So it has become practice to just place it before the WITH as demonstrated.
That's not fully accurate.
A CTE in a View definition not only doesn't require a semicolon, it won't work with one.
CREATE VIEW dbo.MyCTEView
AS
WITH CTE
AS (SELECT *
FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))
SELECT *
FROM CTE ;
GO
SELECT *
FROM dbo.MyCTEView ;
A CTE at the beginning of a batch doesn't require a preceding semicolon.
WITH CTE
AS (SELECT *
FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))
SELECT *
FROM CTE
A CTE as the first item in a stored procedure doesn't require one.
CREATE PROC dbo.MyCTEProc
AS
WITH CTE
AS (SELECT *
FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))
SELECT *
FROM CTE
GO
EXEC dbo.MyCTEProc
A CTE in the definition of an inline-select UDF doesn't require one and won't work with one (just like a View).
They're only required when there's a prior DML or DDL statement in a script/batch/procedure/function, and the use is to terminate that item, not to start the CTE. The reason for this is that With is an overloaded word, which can be used to add options to some commands, or to define a CTE.
select *
from MyTable
with CTEName as (CTE definition)
Is that a syntax error in defining a table hint, or is it a CTE? The dev can tell, but the SQL compiler can't. Hence, a semicolon is needed to terminate the prior statement.
If it's the first item in a batch, use one or not, at your option. I generally do, out of habit, but not out of necessity.
Edit: Note, all scripts except the last one work. The last one deliberately does NOT work, and will generate an error, which is the point for it.
Well heck! That's good to know! I was "taught" that a CTE must always have the semi-colon. Those who teach... 🙂 Thanks G!
Jared
CE - Microsoft
March 1, 2012 at 10:18 am
To the question about the nested From statements:
Those are what are called "derived tables".
You can do those in the body of the "outer query":
select *
from
(select *
from dbo.MyTable) as DerivedTable ;
or in a "Common Table Expression" (CTE):
with CTE (ColumnNames) as
(select *
from dbo.MyTable)
select *
from CTE ;
The advantage to CTEs is that they can be recursive. That means a CTE can query itself. This is used to query hierarchies, where each level in the hierarchy needs to query each level below it, till there are no more levels left, for example. There are other uses, but that's the most common good one. (There are a LOT of bad ways to use recursive CTEs, mainly because they can become serious performance killers.)
The other advantage to CTEs is that many people find them easier to read than the inline ones in the From clause. Not everyone benefits from this, but it is common.
Anything you can do in a regular query, you can do in a derived table. This can be very advantageous in breaking code up to make it more readable, without having to do things like nest views inside each other (which does the same thing as a derived table, but hides it, also often hiding performance-inhibiting code at the same time).
Does that help?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 1, 2012 at 12:08 pm
Thank you very much for taking the time to explain that - and - I understand it!
I will read, re-read and write if for myself with different tables and columns so that I can learn how to apply it without getting a pen and paper out and drawing tables all over it.
I've been using SQL Server for years now - and although I often join to what I call 'tables created on the fly with a select statement' - which, I think, is a 'derived table'(?), I had no idea you could use derived tables as you have - that reference the one above etc.
I rarely use cursors in stored procedures - seem to manage set based for most things. But, sometimes, when needing to manipulate data on a one off basis - I would usually create a temp table - bung it in cursor, loop through, create a second temp table, loop through etc. - gradually working my way down to get the result set I need - so I can do an update or whatever.
But your code has opened my eyes - I really must learn this stuff properly. It's a lot more interesting than boring .net front end stuff.
Edit: this reply was supposed to be a reply to Cadavre for his explanation ... and I want to thank Cold Coffee again for his original response and further explanations .. and SQLKnowItAll and GSQuared (wow, beginning to sound like an acceptance speech at the Oscars!) - anyway many thanks to all.
My real world example is a table with about 40,000 contacts belonging to about 3000 organisations where each organisation has between 1 and 10 locations - with the organisations / locations / contacts belong to one of about 200 projects. In the table (imported from another database) many of the contacts are assigned to lots of different locations for the same organisation. I need to update the data so that each contact is only assigned to one location for each organisation - and I want to assign them to the one that appears most often for that organisation.
Now, I can do it without a severe headache and about 4 nested cursors and temp tables.
March 1, 2012 at 8:09 pm
ColdCoffee (3/1/2012)
I must also ask a question here. What should happen if there are 2 LocationIDs with same number of appearances for a particular OrgID ?
Hey there folks. ColdCoffee asked a really valuable question here. If you think not, try using the following data and tell me whether or not the "1"s should be updated to 1,5 or 1,7.
CREATE Table #Fred(OrganisationID int, LocationID int)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,2)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,2)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,7)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,7)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,7)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,5)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,5)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,5)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(1,6)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(2,18)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(2,19)
INSERT INTO #Fred(OrganisationID, LocationID)
VALUES(2,19)
--Jeff Moden
Change is inevitable... Change for the better is not.
March 2, 2012 at 1:36 am
SQLKnowItAll (3/1/2012)
To the OP, although the semi-colon is used to terminate a preceding statement... a CTE must ALWAYS begin with one. This is why many people put it before the WITH. If there is no preceding statement, it still requires the semi-colon. So it has become practice to just place it before the WITH as demonstrated.
That's incorrect. If there is no preceding statement the CTE does not need a semi-colon. The reason it has become common practise is because people are misinformed.
Jeff Moden (3/1/2012)
ColdCoffee (3/1/2012)
I must also ask a question here. What should happen if there are 2 LocationIDs with same number of appearances for a particular OrgID ?Hey there folks. ColdCoffee asked a really valuable question here. If you think not, try using the following data and tell me whether or not the "1"s should be updated to 1,5 or 1,7.
Most definitely, I guess the OP missed it due to the multitude of posts.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply