merge statement and cte(common table expression)

  • Hello all,

    i am new to these topics

    can any body breifly explain about merge statement and cte.

    i am confusing while learning cte.can someone differ temp table, table variable and cte

    thank you in advance.

  • I don’t want to discourage or disappoint you but Google is the right place to start. Find BOL & books of your own interest and start learning it.

  • i m having big trouble understanding them.... I thought of getting some explanation here..but..

    thanks for your suggestion

  • If so, it would be a good starting point for you to present a code sample, and tell us what you don't understand.



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • You are really asking two questions: one about the MERGE statement and one about CTE's. I'll try to address your question about CTE's.

    Common Table Expressions (aka CTE's) are just sub-queries. Let me show you an example of a sub-query.

    SELECT ProductSales.* /*Don't use SELECT * in Production Code!*/

    FROM ProductSales

    JOIN ( SELECT CustomerID

    FROM ProductSales

    WHERE ProductID = 'xyz'

    GROUP BY CustomerID /*Returns just one record per customer*/

    ) AS [ProductOwners]

    ON ProductSales.CustomerID = ProductOwners.CustomerID;

    This is a query that I wrote for another forum post. The inner query returns a list of Customer ID's for all customers who purchased product XYZ. The outer query returns all orders for those customers. (Amazon might use a query like this. Of the customers who purchased the product you are looking at right now, what else did they order?)

    Here is the same query written with the sub-query as a CTE:

    WITH [ProductOwners]

    AS

    ( SELECT CustomerID

    FROM ProductSales

    WHERE ProductID = 'xyz'

    GROUP BY CustomerID /*Returns just one record per customer*/

    )

    SELECT ProductSales.*

    FROM ProductSales

    JOIN ProductOwners

    ON ProductSales.CustomerID = ProductOwners.CustomerID;

    All we've done is take the sub-query out of the middle of the main query and put it at the beginning. Then we reference it in the main query with the alias we gave it, just as if it were a table. I like using CTE's because I feel they make your code easier to read.

    OK, let's do that same query again, but this time we'll pull the sub-query out of the statement altogether and put it into a Table Variable.

    DECLARE @ProductOwners AS TABLE

    ( CustomerID INT PRIMARY KEY );

    INSERT @ProductOwners

    SELECT CustomerID

    FROM ProductSales

    WHERE ProductID = 'xyz'

    GROUP BY CustomerID; /*Returns just one record per customer*/

    SELECT ProductSales.*

    FROM ProductSales

    JOIN @ProductOwners AS [ProductOwners]

    ON ProductSales.CustomerID = ProductOwners.CustomerID;

    OK. This time we declared a variable, @ProductOwners, of the type Table. Then we have to define the columns for that table variable, just like a regular table. Once we INSERT records into the table variable, we can reference it in other queries almost as if it were a table. The data in the CTE, on the other hand, ceases to exist outside the query statement in which it was defined - just like a sub-qery.

    A temporary table is almost exactly like a table variable, syntax-wise.

    CREATE TABLE #ProductOwners

    ( CustomerID INT PRIMARY KEY );

    INSERT #ProductOwners

    SELECT CustomerID

    FROM ProductSales

    WHERE ProductID = 'xyz'

    GROUP BY CustomerID; /*Returns just one record per customer*/

    SELECT ProductSales.*

    FROM ProductSales

    JOIN #ProductOwners AS [ProductOwners]

    ON ProductSales.CustomerID = ProductOwners.CustomerID;

    As you can see, using a temp table looks almost exactly like using a table variable. So, what's the difference?

    Well there are some interesting differences between the two. For one thing, table variables can be passed as parameters to stored procedures, just like other variables, while temp tables cannot. On the other hand, table variables have the same "scope" as other variables. If you have a long script divided into multiple batches, a table variable ceases to exist outside the batch in which it was defined. (The keyword, GO, is the batch divider in T-SQL.)

    Temp tables persist until the session ends. For example, when you open SQL Server Management Studio, each query tab that you open is a separate session or connection with SQL Server. When you close a query tab, the session ends, and any temp tables you defined in that session go away. Until you close that tab, though, any temp tables defined and the data stored in them persist. Table variables "go out of scope" and are dropped just as soon as the batch finishes executing, just like regular variables. (This can make temp tables useful for testing. You can define a temp table, load data into it, and then run all the queries on it that you like. The data doesn't go away until you close that tab.)

    There are other differences between temp tables and table variables. In essence, Table Variables are simpler objects than Temp Table. Temp Tables are actually just like regular, permanent tables, except that they reside in the TempDB system database and they are automatically dropped when the session that defined them is closed. You can define indexes on temp tables, and temp tables have statistics just like regular tables. In all respects - except persistence - SQL processes temp tables just like regular, permanent tables.

    Table Variables, on the other hand, are simpler objects. No statistics are defined for table variables, and you can only define one index on a table variable: the default, clustered index on the primary key. Also, table variables are single-threaded. This means that queries involving table variables can only be assigned to a single CPU core. (There are pros and cons to parallel vs. single-threaded query processing, but some complex queries on large datasets - in a data warehouse for example - can benefit from parallel processing.)

    The fact that table variables don't have statistics like regular tables can occasionally create serious performance problems. Since there are no statistics, the query optimizer does not know how many records are stored in a table variable, and it doesn't know anything about the distribution of the data. So, it has to make some very general assumptions (and perhaps incorrect) assumptions about the data in a table variable, and this can result in a less-than-optimal query plan. Most of the time, this won't make any difference, but occasionally table variables will perform dramatically worse than temp tables. (If you are interested, here is a fascinating article about how SQL processes table variables and temp tables and the potential differences in performance between them: http://www.simple-talk.com/community/blogs/philfactor/archive/2011/10/27/104040.aspx.)

    So, when should you use a CTE or a table variable or a temp table. Well, here is my general recommendation. If you have a sub-query embedded in a larger query, use a CTE to pull it out of the middle of that query and put it at the beginning of the query. In my opinion, this makes your code easier to read.

    If you are referencing the results of that sub-query more than once and if the data is relatively simple (or if you need to pass it as a parameter to a stored procedure) use a table variable.

    If the data is more complex use a temp table. For example, : perhaps the data distribution is (or could be) skewed or you want to define a non-clustered index on the data.

    As you can see, this is a large topic, and I've just hit the high points, really. (Plus, I haven't even gotten to your question about the MERGE statement!) You can see why the other posters would want a more narrowly defined question. :w00t:

    I hope that this information helps you, and I wish you the best of luck in your SQL journey! 🙂

  • pramany (12/23/2011)


    Hello all,

    i am new to these topics

    can any body breifly explain about merge statement and cte.

    i am confusing while learning cte.can someone differ temp table, table variable and cte

    thank you in advance.

    A CTE is nothing more than what you might find in a FROM clause. In other words, it's a SUB-QUERY. There are 4 "advantages"... 1) It allows for "top down" programming which is usually easier to read, 2) they can be called more than once in a given query (with the understanding that they will be executed more than once, just like a View), 3) they can be recursive (call themselves) in nature, and 4) it's very easy to cascade (make one call another).

    Item 2 isn't necessarily an advantage. Sometimes it's better to populate a Temp Table and call on that more than once.

    Item 3 isn't necessarily an advantage if the recursive CTe (rCTE) is a "counting" fCTE. Please see the following article for why "counting" rCTEs aren't such a good thing.

    http://www.sqlservercentral.com/articles/T-SQL/74118/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK, now let me take a stab at answering your MERGE question. The example that I am going to use is taken from a blog post by Pinal Dave, which you can read here: http://blog.sqlauthority.com/2008/08/28/sql-server-2008-introduction-to-merge-statement-one-statement-for-insert-update-delete/. Pinal Dave is a veritable fountain of information about SQL, and I always enjoy reading his blog. 🙂

    The MERGE statement is a new feature in SQL 2008 which allows you to perform INSERTs, UPDATEs and DELETEs on a table in a single statement. It can improve performance (if the table indexes are properly defined) because it can make all three changes in a single pass on the table where previously you would have needed one statement for the INSERTS, a second statement for the UPDATEs and a third statement for the DELETEs. For your further study, here is a Technet article on improving performance with MERGE: http://technet.microsoft.com/en-us/library/cc879317.aspx.

    I should add here that the MERGE statement is designed for a specific niche in SQL. If you don't have a specific need to perform multiple DML operations on a table in a single pass and if MERGE is not giving you a performance benefit, then you shouldn't use it, because it is more complicated to read. At least, I think so.

    That said, let's proceed with the example. To set up the example, let's create to tables: one for Student Details and one for Student Total Marks. I'm going to create these as temp tables, but you can make them permanent if you prefer. (Pinal Dave added them to the Adventure Works database.)

    IF OBJECT_ID('TempDB..#StudentDetails') IS NOT NULL

    DROP TABLE #StudentDetails;

    CREATE TABLE #StudentDetails

    ( StudentID INT PRIMARY KEY

    ,StudentName VARCHAR(15) NOT NULL

    );

    INSERT #StudentDetails

    VALUES(1,'SMITH');

    INSERT #StudentDetails

    VALUES(2,'ALLEN');

    INSERT #StudentDetails

    VALUES(3,'JONES');

    INSERT #StudentDetails

    VALUES(4,'MARTIN');

    INSERT #StudentDetails

    VALUES(5,'JAMES');

    GO

    IF OBJECT_ID('TempDB..#StudentTotalMarks') IS NOT NULL

    DROP TABLE #StudentTotalMarks;

    CREATE TABLE #StudentTotalMarks

    ( StudentID INT PRIMARY KEY

    ,StudentMarks INT NOT NULL

    );

    INSERT INTO #StudentTotalMarks

    VALUES(1,230);

    INSERT INTO #StudentTotalMarks

    VALUES(2,255);

    INSERT INTO #StudentTotalMarks

    VALUES(3,200);

    OK, now we have our test tables. Here is the test problem.

    1. We want to delete any records from the [StudentTotalMarks] table where [StudentMarks] > 250.

    2. Add 25 to [StudentMarks].

    3. If a [StudentID] from [StudentDetails] does not exist in [StudentTotalMarks], then add it with an initial value of 25 for [StudentMarks].

    Here is how we would have done this in SQL 2005 (and earlier):

    DELETE #StudentTotalMarks

    WHERE StudentMarks > 250;

    UPDATE #StudentTotalMarks

    SET StudentMarks = StudentMarks + 25

    FROM #StudentDetails

    JOIN #StudentTotalMarks

    ON #StudentDetails.StudentID = #StudentTotalMarks.StudentID;

    INSERT #StudentTotalMarks

    ( StudentID

    ,StudentMarks

    )

    SELECT #StudentDetails.StudentID

    ,25

    FROM #StudentDetails

    LEFT JOIN #StudentTotalMarks

    ON #StudentDetails.StudentID = #StudentTotalMarks.StudentID

    WHERE #StudentTotalMarks.StudentID IS NULL;

    Here is #StudentTotalMarks before the updates:

    StudentID StudentMarks

    ----------- ------------

    1 230

    2 255

    3 200

    And here is #StudentTotalMarks after the updates:

    StudentID StudentMarks

    ----------- ------------

    1 255

    2 25

    3 225

    4 25

    5 25

    The basic syntax of the MERGE statement is as follows:

    MERGE [Target_Table]

    USING [Source]

    ON Join_Condition

    WHEN MATCHED

    THEN [Action];

    This is the basic syntax, but if you check Books Online (and you should), you will see more optional clauses that you can use with MERGE, as well as additional examples.

    OK, so combining our three updates into a single MERGE statement, we would have:

    MERGE #StudentTotalMarks AS StudentTotalMarks

    USING ( SELECT StudentID

    ,StudentName

    FROM #StudentDetails

    ) AS StudentDetails

    ON StudentTotalMarks.StudentID = StudentDetails.StudentID

    WHEN MATCHED AND StudentTotalMarks.StudentMarks > 250

    THEN DELETE

    WHEN MATCHED

    THEN UPDATE

    SET StudentTotalMarks.StudentMarks = StudentTotalMarks.StudentMarks + 25

    WHEN NOT MATCHED

    THEN INSERT(StudentID,StudentMarks)

    VALUES(StudentDetails.StudentID,25);

    The [Source] table in this example is specified as a sub-query. You could also use a CTE with this statement. (I think a CTE would make it easier to read.) Next we specify a series of updates to make, phrased as WHEN [Condition] THEN [Action].

    There are a couple of special syntax quirks you need to note. First, you must end a MERGE statement with a semicolon. (Actually, you should get used to ending all of your SQL statements with a semicolon. Eventually Microsoft intends to require the semicolon as a statement terminator for every SQL statement.)

    Second, if you specify an optional AND condition as part of the WHEN MATCHED clause, then that clause has to preceed the other WHEN MATCHED clauses.

    Here are our results from the MERGE statement:

    StudentID StudentMarks

    ----------- ------------

    1 255

    3 225

    4 25

    5 25

    You will note that this result set is different than the result set from our first set of statements. In the original set, Student ID 2 was deleted by the DELETE statement, and then it was added back in by the INSERT statement. When we used the MERGE statement, Student ID 2 was simply deleted.

    The WHEN - THEN clauses work much like a CASE statement: the first clause that satisfies the conditions for a given record will be executed, and the remaining clauses will be ignored (for that particular record).

    I hope that this helps clarify the MERGE statement a little bit. (A big "Thank You" goes out to Pinal Dave! :-))

  • Nice Job David! But it would be good as an article. Many learners could be redirected there.

    Looking at your post I am assure you can write good articles :-). Do you have plans / blog for it?

  • :blush: It's one of those things I keep telling myself I should do.

    I'll make it a New Year's resolution. 🙂

  • Dang... I didn't even see that David posted a full answer before I did. I wouldn't have even posted had I seen it. I guess I had the post open a wee bit too long.

    Dev is right, David... you should turn one or both of these posts into a "spackle" article. That's where one of my first articles came from... one of my posts... almost word for word.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David Moutray (12/25/2011)


    :blush: It's one of those things I keep telling myself I should do.

    I'll make it a New Year's resolution. 🙂

    So! How's that New Year's resolution coming? 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David Moutray (12/23/2011)


    ...you can only define one index on a table variable: the default, clustered index on the primary key.

    Table variables support unique non-clustered and clustered indexes (including non-clustered and clustered primary keys) defined as part of the table variable declaration (that is, you can't currently add indexes to an existing table variable). For example, these table variable definitions are both legal:

    DECLARE @Example TABLE

    (

    row_id integer PRIMARY KEY CLUSTERED,

    column1 integer NULL UNIQUE NONCLUSTERED,

    column2 integer NOT NULL,

    column3 integer NULL,

    UNIQUE NONCLUSTERED (column2, column3)

    )

    GO

    DECLARE @Example TABLE

    (

    row_id integer PRIMARY KEY NONCLUSTERED,

    column1 integer NULL UNIQUE NONCLUSTERED,

    column2 integer NOT NULL,

    column3 integer NULL,

    UNIQUE CLUSTERED (column2, column3)

    )

    Also, table variables are single-threaded. This means that queries involving table variables can only be assigned to a single CPU core.

    A query that changes the contents of a table variable will always generate a serial (single-thread) execution plan, but queries that only read from table variables can use parallelism, for example:

    See Wayne Sheffield's article on SSC (http://www.sqlservercentral.com/articles/Temporary+Tables/66720/) for more practical differences between table variables and temporary tables.

  • The following scenario is an example of using CTEs with a MERGE statement.

    Let's say you would like to update the status (attended, no show, canceled, rescheduled, etc.) of appointment records in database table Scheduled_Appointment with scheduling system transaction records in database table Daily_Scheduled_Appointment_Transaction. These transactions could be for past, present, or future appointments.

    You would also like your SSIS package to process the transactions one appointment month at a time.

    The following stored procedure could be executed in a SSIS package Execute SQL task that is inside a Foreach Loop container. Your enumerator for this container would be the list of distinct appointment years and months in the daily transaction table.

    The CTEs (Source_CTE, Target_CTE) are just subsets of the records in the larger database tables.

    Note that the target table for the MERGE is a CTE (Target_CTE), which is a subset of the records in the Scheduled_Appointment database table.

    The UPDATE and INSERT statements in the MERGE statement are, therefore, changing the contents of the Scheduled_Appointment database table.

    CREATE PROCEDURE [dbo].[usp_Merge_Scheduled_Appointment_Transactions]

    (

    @AppointmentDateYear INT

    , @AppointmentDateMonth INT

    )

    AS

    SET NOCOUNT ON;

    WITH Source_CTE ([Med_Rec_No]

    ,[Appt_Start_Date_Time]

    ,[Appt_Stop_Date_Time]

    ,[Appt_Link_Code]

    ,[Appt_Status]

    )

    AS

    (

    SELECT [Med_Rec_No]

    ,[Appt_Start_Date_Time]

    ,[Appt_Stop_Date_Time]

    ,[Appt_Link_Code]

    ,[Appt_Status]

    FROM [dbo].[Daily_Scheduled_Appointment_Transaction]

    WHERE ((DATEPART(YEAR,[Appt_Start_Date_Time]) = @AppointmentDateYear) AND

    (DATEPART(MONTH,[Appt_Start_Date_Time]) = @AppointmentDateMonth))

    ),

    Target_CTE ([Med_Rec_No]

    ,[Appt_Start_Date_Time]

    ,[Appt_Stop_Date_Time]

    ,[Appt_Link_Code]

    ,[Appt_Status]

    )

    AS

    (

    SELECT [Med_Rec_No]

    ,[Appt_Start_Date_Time]

    ,[Appt_Stop_Date_Time]

    ,[Appt_Link_Code]

    ,[Appt_Status]

    FROM [dbo].[Scheduled_Appointment]

    WHERE ((DATEPART(YEAR,[Appt_Start_Date_Time]) = @AppointmentDateYear) AND

    (DATEPART(MONTH,[Appt_Start_Date_Time]) = @AppointmentDateMonth)))

    MERGE Target_CTE AS Target

    USING Source_CTE AS Source

    ON ((Target.Appt_Link_Code = Source.Appt_Link_Code) AND

    (Target.Med_Rec_No = Source.Med_Rec_No) AND

    ((DATEPART(YEAR,Target.[Appt_Start_Date_Time]) = @AppointmentDateYear) AND

    (DATEPART(MONTH,Target.[Appt_Start_Date_Time]) = @AppointmentDateMonth)))

    WHEN MATCHED

    THEN UPDATE SET Target.[Appt_Status] = Source.[Appt_Status]

    WHEN NOT MATCHED BY TARGET

    THEN INSERT ([Med_Rec_No]

    ,[Appt_Start_Date_Time]

    ,[Appt_Stop_Date_Time]

    ,[Appt_Link_Code]

    ,[Appt_Status])

    VALUES

    (Source.[Med_Rec_No]

    ,Source.[Appt_Start_Date_Time]

    ,Source.[Appt_Stop_Date_Time]

    ,Source.[Appt_Link_Code]

    ,Source.[Appt_Status])

    ;

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply