December 23, 2011 at 6:40 am
Hello all,
i am new to these topics
can any body breifly explain about merge statement and cte.
i am confusing while learning cte.can someone differ temp table, table variable and cte
thank you in advance.
December 23, 2011 at 6:47 am
I don’t want to discourage or disappoint you but Google is the right place to start. Find BOL & books of your own interest and start learning it.
December 23, 2011 at 10:35 am
i m having big trouble understanding them.... I thought of getting some explanation here..but..
thanks for your suggestion
December 23, 2011 at 3:06 pm
You are really asking two questions: one about the MERGE statement and one about CTE's. I'll try to address your question about CTE's.
Common Table Expressions (aka CTE's) are just sub-queries. Let me show you an example of a sub-query.
SELECT ProductSales.* /*Don't use SELECT * in Production Code!*/
FROM ProductSales
JOIN ( SELECT CustomerID
FROM ProductSales
WHERE ProductID = 'xyz'
GROUP BY CustomerID /*Returns just one record per customer*/
) AS [ProductOwners]
ON ProductSales.CustomerID = ProductOwners.CustomerID;
This is a query that I wrote for another forum post. The inner query returns a list of Customer ID's for all customers who purchased product XYZ. The outer query returns all orders for those customers. (Amazon might use a query like this. Of the customers who purchased the product you are looking at right now, what else did they order?)
Here is the same query written with the sub-query as a CTE:
WITH [ProductOwners]
AS
( SELECT CustomerID
FROM ProductSales
WHERE ProductID = 'xyz'
GROUP BY CustomerID /*Returns just one record per customer*/
)
SELECT ProductSales.*
FROM ProductSales
JOIN ProductOwners
ON ProductSales.CustomerID = ProductOwners.CustomerID;
All we've done is take the sub-query out of the middle of the main query and put it at the beginning. Then we reference it in the main query with the alias we gave it, just as if it were a table. I like using CTE's because I feel they make your code easier to read.
OK, let's do that same query again, but this time we'll pull the sub-query out of the statement altogether and put it into a Table Variable.
DECLARE @ProductOwners AS TABLE
( CustomerID INT PRIMARY KEY );
INSERT @ProductOwners
SELECT CustomerID
FROM ProductSales
WHERE ProductID = 'xyz'
GROUP BY CustomerID; /*Returns just one record per customer*/
SELECT ProductSales.*
FROM ProductSales
JOIN @ProductOwners AS [ProductOwners]
ON ProductSales.CustomerID = ProductOwners.CustomerID;
OK. This time we declared a variable, @ProductOwners, of the type Table. Then we have to define the columns for that table variable, just like a regular table. Once we INSERT records into the table variable, we can reference it in other queries almost as if it were a table. The data in the CTE, on the other hand, ceases to exist outside the query statement in which it was defined - just like a sub-qery.
A temporary table is almost exactly like a table variable, syntax-wise.
CREATE TABLE #ProductOwners
( CustomerID INT PRIMARY KEY );
INSERT #ProductOwners
SELECT CustomerID
FROM ProductSales
WHERE ProductID = 'xyz'
GROUP BY CustomerID; /*Returns just one record per customer*/
SELECT ProductSales.*
FROM ProductSales
JOIN #ProductOwners AS [ProductOwners]
ON ProductSales.CustomerID = ProductOwners.CustomerID;
As you can see, using a temp table looks almost exactly like using a table variable. So, what's the difference?
Well there are some interesting differences between the two. For one thing, table variables can be passed as parameters to stored procedures, just like other variables, while temp tables cannot. On the other hand, table variables have the same "scope" as other variables. If you have a long script divided into multiple batches, a table variable ceases to exist outside the batch in which it was defined. (The keyword, GO, is the batch divider in T-SQL.)
Temp tables persist until the session ends. For example, when you open SQL Server Management Studio, each query tab that you open is a separate session or connection with SQL Server. When you close a query tab, the session ends, and any temp tables you defined in that session go away. Until you close that tab, though, any temp tables defined and the data stored in them persist. Table variables "go out of scope" and are dropped just as soon as the batch finishes executing, just like regular variables. (This can make temp tables useful for testing. You can define a temp table, load data into it, and then run all the queries on it that you like. The data doesn't go away until you close that tab.)
There are other differences between temp tables and table variables. In essence, Table Variables are simpler objects than Temp Table. Temp Tables are actually just like regular, permanent tables, except that they reside in the TempDB system database and they are automatically dropped when the session that defined them is closed. You can define indexes on temp tables, and temp tables have statistics just like regular tables. In all respects - except persistence - SQL processes temp tables just like regular, permanent tables.
Table Variables, on the other hand, are simpler objects. No statistics are defined for table variables, and you can only define one index on a table variable: the default, clustered index on the primary key. Also, table variables are single-threaded. This means that queries involving table variables can only be assigned to a single CPU core. (There are pros and cons to parallel vs. single-threaded query processing, but some complex queries on large datasets - in a data warehouse for example - can benefit from parallel processing.)
The fact that table variables don't have statistics like regular tables can occasionally create serious performance problems. Since there are no statistics, the query optimizer does not know how many records are stored in a table variable, and it doesn't know anything about the distribution of the data. So, it has to make some very general assumptions (and perhaps incorrect) assumptions about the data in a table variable, and this can result in a less-than-optimal query plan. Most of the time, this won't make any difference, but occasionally table variables will perform dramatically worse than temp tables. (If you are interested, here is a fascinating article about how SQL processes table variables and temp tables and the potential differences in performance between them: http://www.simple-talk.com/community/blogs/philfactor/archive/2011/10/27/104040.aspx.)
So, when should you use a CTE or a table variable or a temp table. Well, here is my general recommendation. If you have a sub-query embedded in a larger query, use a CTE to pull it out of the middle of that query and put it at the beginning of the query. In my opinion, this makes your code easier to read.
If you are referencing the results of that sub-query more than once and if the data is relatively simple (or if you need to pass it as a parameter to a stored procedure) use a table variable.
If the data is more complex use a temp table. For example, : perhaps the data distribution is (or could be) skewed or you want to define a non-clustered index on the data.
As you can see, this is a large topic, and I've just hit the high points, really. (Plus, I haven't even gotten to your question about the MERGE statement!) You can see why the other posters would want a more narrowly defined question. :w00t:
I hope that this information helps you, and I wish you the best of luck in your SQL journey! 🙂
December 23, 2011 at 3:46 pm
pramany (12/23/2011)
Hello all,i am new to these topics
can any body breifly explain about merge statement and cte.
i am confusing while learning cte.can someone differ temp table, table variable and cte
thank you in advance.
A CTE is nothing more than what you might find in a FROM clause. In other words, it's a SUB-QUERY. There are 4 "advantages"... 1) It allows for "top down" programming which is usually easier to read, 2) they can be called more than once in a given query (with the understanding that they will be executed more than once, just like a View), 3) they can be recursive (call themselves) in nature, and 4) it's very easy to cascade (make one call another).
Item 2 isn't necessarily an advantage. Sometimes it's better to populate a Temp Table and call on that more than once.
Item 3 isn't necessarily an advantage if the recursive CTe (rCTE) is a "counting" fCTE. Please see the following article for why "counting" rCTEs aren't such a good thing.
http://www.sqlservercentral.com/articles/T-SQL/74118/
--Jeff Moden
Change is inevitable... Change for the better is not.
December 23, 2011 at 5:22 pm
OK, now let me take a stab at answering your MERGE question. The example that I am going to use is taken from a blog post by Pinal Dave, which you can read here: http://blog.sqlauthority.com/2008/08/28/sql-server-2008-introduction-to-merge-statement-one-statement-for-insert-update-delete/. Pinal Dave is a veritable fountain of information about SQL, and I always enjoy reading his blog. 🙂
The MERGE statement is a new feature in SQL 2008 which allows you to perform INSERTs, UPDATEs and DELETEs on a table in a single statement. It can improve performance (if the table indexes are properly defined) because it can make all three changes in a single pass on the table where previously you would have needed one statement for the INSERTS, a second statement for the UPDATEs and a third statement for the DELETEs. For your further study, here is a Technet article on improving performance with MERGE: http://technet.microsoft.com/en-us/library/cc879317.aspx.
I should add here that the MERGE statement is designed for a specific niche in SQL. If you don't have a specific need to perform multiple DML operations on a table in a single pass and if MERGE is not giving you a performance benefit, then you shouldn't use it, because it is more complicated to read. At least, I think so.
That said, let's proceed with the example. To set up the example, let's create to tables: one for Student Details and one for Student Total Marks. I'm going to create these as temp tables, but you can make them permanent if you prefer. (Pinal Dave added them to the Adventure Works database.)
IF OBJECT_ID('TempDB..#StudentDetails') IS NOT NULL
DROP TABLE #StudentDetails;
CREATE TABLE #StudentDetails
( StudentID INT PRIMARY KEY
,StudentName VARCHAR(15) NOT NULL
);
INSERT #StudentDetails
VALUES(1,'SMITH');
INSERT #StudentDetails
VALUES(2,'ALLEN');
INSERT #StudentDetails
VALUES(3,'JONES');
INSERT #StudentDetails
VALUES(4,'MARTIN');
INSERT #StudentDetails
VALUES(5,'JAMES');
GO
IF OBJECT_ID('TempDB..#StudentTotalMarks') IS NOT NULL
DROP TABLE #StudentTotalMarks;
CREATE TABLE #StudentTotalMarks
( StudentID INT PRIMARY KEY
,StudentMarks INT NOT NULL
);
INSERT INTO #StudentTotalMarks
VALUES(1,230);
INSERT INTO #StudentTotalMarks
VALUES(2,255);
INSERT INTO #StudentTotalMarks
VALUES(3,200);
OK, now we have our test tables. Here is the test problem.
1. We want to delete any records from the [StudentTotalMarks] table where [StudentMarks] > 250.
2. Add 25 to [StudentMarks].
3. If a [StudentID] from [StudentDetails] does not exist in [StudentTotalMarks], then add it with an initial value of 25 for [StudentMarks].
Here is how we would have done this in SQL 2005 (and earlier):
DELETE #StudentTotalMarks
WHERE StudentMarks > 250;
UPDATE #StudentTotalMarks
SET StudentMarks = StudentMarks + 25
FROM #StudentDetails
JOIN #StudentTotalMarks
ON #StudentDetails.StudentID = #StudentTotalMarks.StudentID;
INSERT #StudentTotalMarks
( StudentID
,StudentMarks
)
SELECT #StudentDetails.StudentID
,25
FROM #StudentDetails
LEFT JOIN #StudentTotalMarks
ON #StudentDetails.StudentID = #StudentTotalMarks.StudentID
WHERE #StudentTotalMarks.StudentID IS NULL;
Here is #StudentTotalMarks before the updates:
StudentID StudentMarks
----------- ------------
1 230
2 255
3 200
And here is #StudentTotalMarks after the updates:
StudentID StudentMarks
----------- ------------
1 255
2 25
3 225
4 25
5 25
The basic syntax of the MERGE statement is as follows:
MERGE [Target_Table]
USING [Source]
ON Join_Condition
WHEN MATCHED
THEN [Action];
This is the basic syntax, but if you check Books Online (and you should), you will see more optional clauses that you can use with MERGE, as well as additional examples.
OK, so combining our three updates into a single MERGE statement, we would have:
MERGE #StudentTotalMarks AS StudentTotalMarks
USING ( SELECT StudentID
,StudentName
FROM #StudentDetails
) AS StudentDetails
ON StudentTotalMarks.StudentID = StudentDetails.StudentID
WHEN MATCHED AND StudentTotalMarks.StudentMarks > 250
THEN DELETE
WHEN MATCHED
THEN UPDATE
SET StudentTotalMarks.StudentMarks = StudentTotalMarks.StudentMarks + 25
WHEN NOT MATCHED
THEN INSERT(StudentID,StudentMarks)
VALUES(StudentDetails.StudentID,25);
The [Source] table in this example is specified as a sub-query. You could also use a CTE with this statement. (I think a CTE would make it easier to read.) Next we specify a series of updates to make, phrased as WHEN [Condition] THEN [Action].
There are a couple of special syntax quirks you need to note. First, you must end a MERGE statement with a semicolon. (Actually, you should get used to ending all of your SQL statements with a semicolon. Eventually Microsoft intends to require the semicolon as a statement terminator for every SQL statement.)
Second, if you specify an optional AND condition as part of the WHEN MATCHED clause, then that clause has to preceed the other WHEN MATCHED clauses.
Here are our results from the MERGE statement:
StudentID StudentMarks
----------- ------------
1 255
3 225
4 25
5 25
You will note that this result set is different than the result set from our first set of statements. In the original set, Student ID 2 was deleted by the DELETE statement, and then it was added back in by the INSERT statement. When we used the MERGE statement, Student ID 2 was simply deleted.
The WHEN - THEN clauses work much like a CASE statement: the first clause that satisfies the conditions for a given record will be executed, and the remaining clauses will be ignored (for that particular record).
I hope that this helps clarify the MERGE statement a little bit. (A big "Thank You" goes out to Pinal Dave! :-))
December 25, 2011 at 5:28 am
Nice Job David! But it would be good as an article. Many learners could be redirected there.
Looking at your post I am assure you can write good articles :-). Do you have plans / blog for it?
December 25, 2011 at 9:12 am
:blush: It's one of those things I keep telling myself I should do.
I'll make it a New Year's resolution. 🙂
December 25, 2011 at 4:46 pm
Dang... I didn't even see that David posted a full answer before I did. I wouldn't have even posted had I seen it. I guess I had the post open a wee bit too long.
Dev is right, David... you should turn one or both of these posts into a "spackle" article. That's where one of my first articles came from... one of my posts... almost word for word.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 8, 2012 at 4:35 pm
David Moutray (12/25/2011)
:blush: It's one of those things I keep telling myself I should do.I'll make it a New Year's resolution. 🙂
So! How's that New Year's resolution coming? 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
January 8, 2012 at 11:11 pm
David Moutray (12/23/2011)
...you can only define one index on a table variable: the default, clustered index on the primary key.
Table variables support unique non-clustered and clustered indexes (including non-clustered and clustered primary keys) defined as part of the table variable declaration (that is, you can't currently add indexes to an existing table variable). For example, these table variable definitions are both legal:
DECLARE @Example TABLE
(
row_id integer PRIMARY KEY CLUSTERED,
column1 integer NULL UNIQUE NONCLUSTERED,
column2 integer NOT NULL,
column3 integer NULL,
UNIQUE NONCLUSTERED (column2, column3)
)
GO
DECLARE @Example TABLE
(
row_id integer PRIMARY KEY NONCLUSTERED,
column1 integer NULL UNIQUE NONCLUSTERED,
column2 integer NOT NULL,
column3 integer NULL,
UNIQUE CLUSTERED (column2, column3)
)
Also, table variables are single-threaded. This means that queries involving table variables can only be assigned to a single CPU core.
A query that changes the contents of a table variable will always generate a serial (single-thread) execution plan, but queries that only read from table variables can use parallelism, for example:
See Wayne Sheffield's article on SSC (http://www.sqlservercentral.com/articles/Temporary+Tables/66720/) for more practical differences between table variables and temporary tables.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 11, 2012 at 4:17 pm
The following scenario is an example of using CTEs with a MERGE statement.
Let's say you would like to update the status (attended, no show, canceled, rescheduled, etc.) of appointment records in database table Scheduled_Appointment with scheduling system transaction records in database table Daily_Scheduled_Appointment_Transaction. These transactions could be for past, present, or future appointments.
You would also like your SSIS package to process the transactions one appointment month at a time.
The following stored procedure could be executed in a SSIS package Execute SQL task that is inside a Foreach Loop container. Your enumerator for this container would be the list of distinct appointment years and months in the daily transaction table.
The CTEs (Source_CTE, Target_CTE) are just subsets of the records in the larger database tables.
Note that the target table for the MERGE is a CTE (Target_CTE), which is a subset of the records in the Scheduled_Appointment database table.
The UPDATE and INSERT statements in the MERGE statement are, therefore, changing the contents of the Scheduled_Appointment database table.
CREATE PROCEDURE [dbo].[usp_Merge_Scheduled_Appointment_Transactions]
(
@AppointmentDateYear INT
, @AppointmentDateMonth INT
)
AS
SET NOCOUNT ON;
WITH Source_CTE ([Med_Rec_No]
,[Appt_Start_Date_Time]
,[Appt_Stop_Date_Time]
,[Appt_Link_Code]
,[Appt_Status]
)
AS
(
SELECT [Med_Rec_No]
,[Appt_Start_Date_Time]
,[Appt_Stop_Date_Time]
,[Appt_Link_Code]
,[Appt_Status]
FROM [dbo].[Daily_Scheduled_Appointment_Transaction]
WHERE ((DATEPART(YEAR,[Appt_Start_Date_Time]) = @AppointmentDateYear) AND
(DATEPART(MONTH,[Appt_Start_Date_Time]) = @AppointmentDateMonth))
),
Target_CTE ([Med_Rec_No]
,[Appt_Start_Date_Time]
,[Appt_Stop_Date_Time]
,[Appt_Link_Code]
,[Appt_Status]
)
AS
(
SELECT [Med_Rec_No]
,[Appt_Start_Date_Time]
,[Appt_Stop_Date_Time]
,[Appt_Link_Code]
,[Appt_Status]
FROM [dbo].[Scheduled_Appointment]
WHERE ((DATEPART(YEAR,[Appt_Start_Date_Time]) = @AppointmentDateYear) AND
(DATEPART(MONTH,[Appt_Start_Date_Time]) = @AppointmentDateMonth)))
MERGE Target_CTE AS Target
USING Source_CTE AS Source
ON ((Target.Appt_Link_Code = Source.Appt_Link_Code) AND
(Target.Med_Rec_No = Source.Med_Rec_No) AND
((DATEPART(YEAR,Target.[Appt_Start_Date_Time]) = @AppointmentDateYear) AND
(DATEPART(MONTH,Target.[Appt_Start_Date_Time]) = @AppointmentDateMonth)))
WHEN MATCHED
THEN UPDATE SET Target.[Appt_Status] = Source.[Appt_Status]
WHEN NOT MATCHED BY TARGET
THEN INSERT ([Med_Rec_No]
,[Appt_Start_Date_Time]
,[Appt_Stop_Date_Time]
,[Appt_Link_Code]
,[Appt_Status])
VALUES
(Source.[Med_Rec_No]
,Source.[Appt_Start_Date_Time]
,Source.[Appt_Stop_Date_Time]
,Source.[Appt_Link_Code]
,Source.[Appt_Status])
;
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply