June 5, 2015 at 7:19 pm
I'm a bit late to the party but ...
Just because Microsoft does it it or does not do it is no excuse for me to do it or not do it. Set based is better in lots of ways as others have pointed out.
We should avoid EAV tables as well.. Then go look at sys.extended_properties view some time. There is an EAV like design for you. Just because Microsoft has this is no reason that I'm going to include a monster like this in my design.
ATBCharles Kincaid
June 5, 2015 at 9:27 pm
Charles Kincaid (6/5/2015)
I'm a bit late to the party but ...Just because Microsoft does it it or does not do it is no excuse for me to do it or not do it. Set based is better in lots of ways as others have pointed out.
We should avoid EAV tables as well.. Then go look at sys.extended_properties view some time. There is an EAV like design for you. Just because Microsoft has this is no reason that I'm going to include a monster like this in my design.
Heh... I have to disagree, Charles. Even the horrific EAV and NVP tables have their good uses. In fact, I'll even suggest that certain normalized tables are really, at the very least, NVP tables and some fully qualify as EAV tables.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 7, 2015 at 10:44 pm
Nothing wrong with cursors they have their place and are very useful.
Oracle use both set based and cursor base queries - both are very fast - not sure why Microsoft have made them so inefficient - may the optimiser in Oracle is better.
July 3, 2015 at 1:51 pm
andrew.norris 90251 (6/7/2015)
Nothing wrong with cursors they have their place and are very useful.
Do you have examples for something other hierarchies and bin stacking problems?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 3, 2015 at 2:31 pm
[deleted]
July 3, 2015 at 6:35 pm
patrickmcginnis59 10839 (7/3/2015)
Jeff Moden (7/3/2015)
andrew.norris 90251 (6/7/2015)
Nothing wrong with cursors they have their place and are very useful.Do you have examples for something other hierarchies and bin stacking problems?
doing inserts to a table on a linked server comes to mind!
Since it was Andrew that posted, I was after him to provide an example of where he thought a cursor might be useful.
You're kind of doing the same thing. You've identified a use but not the reason. Why would you use a cursor to do "inserts to a table on a linked server" instead of something set-based?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 3, 2015 at 7:10 pm
Simple example. You have a parent child relationship in your data that has a ragged child hierarchy. You need to use different logic on different types of parent records and then on different types of child records that are related to that parent, and even based on the number of child records you find the logic to be executed needs to be different. The non set based logic of a cursor makes debugging easier as you can very easily step through the logic, rather than simply looking at the result of a set operation (or more likely a number of set operations)
While it will be possible to do with set based operations and numerous temp tables - in some circumstances cursors provide a better mechanism, not sure on speed. But clearly if a cursor is perfoming in a 'performant' manner from a user perspective and resources are not constrained on the server, then what is the issue. I have also seen set operations which perform poorly, so let's not necessarily blame the construct.
Again I am not sure why Oracle can make their PL/SQL using cursors as fast as set based operations and MS cannot. Cursors are not evil, and not bad, just a different method of processing data.
Just my opinion, and I will use them where I think there is an advantage in doing so.
Cheers
July 3, 2015 at 10:24 pm
andrew.norris 90251 (7/3/2015)
Simple example. You have a parent child relationship in your data that has a ragged child hierarchy. You need to use different logic on different types of parent records and then on different types of child records that are related to that parent, and even based on the number of child records you find the logic to be executed needs to be different. The non set based logic of a cursor makes debugging easier as you can very easily step through the logic, rather than simply looking at the result of a set operation (or more likely a number of set operations)While it will be possible to do with set based operations and numerous temp tables - in some circumstances cursors provide a better mechanism, not sure on speed. But clearly if a cursor is perfoming in a 'performant' manner from a user perspective and resources are not constrained on the server, then what is the issue. I have also seen set operations which perform poorly, so let's not necessarily blame the construct.
Again I am not sure why Oracle can make their PL/SQL using cursors as fast as set based operations and MS cannot. Cursors are not evil, and not bad, just a different method of processing data.
Just my opinion, and I will use them where I think there is an advantage in doing so.
Cheers
First, I agree that MS could have done a better job in its rendition of cursors. Since you've said it twice, I also agree that Oracle did cursors better but good set-based code still crushes equivalent code even there. So let's move on because we can't change what MS did and what Oracle does has nothing to do with what MS did. 😀
I also specifically asked for something other than a hierarchical example for where cursor usage is valuable because it's been proven many times that hierarchies are one place where most people over-look set based methods and think the only way to solve such things quickly and easily is through the use of cursors and other forms of RBAR. Here's a quote of that post.
Jeff Moden (7/3/2015)
andrew.norris 90251 (6/7/2015)
Nothing wrong with cursors they have their place and are very useful.Do you have examples for [font="Arial Black"]something other hierarchies [/font]and bin stacking problems?
Since you brought up hierarchies, though, let's use that as an example. The stored procedure near the end of this post will build a "perfect" Adjacency List hierarchy (meaning "no cycles" in the hierarchy). Your challenge, if you decide to accept it, is to create a result set from that Adjacency list that does the following.
For each EmployeeID...
1. List the EmployeeID
2. Produce a grand total row for each employee consisting of the following:
2.1. The hierarchical Level of the employee compared to the root node.
2.2. The hierarchical Level of the employee compared to their self.
2.3. The total number of employees in the "downline" of that employee including that employee.
2.4. The total amount of sales in the "downline" of that employee including that employee.
3. Produce a sub-total row for each level in that employee's "downline" consisting of the following:
3.1. The hierarchical Level of that level compared to the root node.
3.2. The hierarchical Level of that level compared to the employee that is at the top (the "top" being the EmployeeID from #1 above) of that particular "downline".
3.3. The total number of employees in that level of that particular "downline"
3.4. The total sales of all employees in that level of that particular "downline".
In other words, give a shorter example of an Adjacency List like this (the Tan area being just one employee {Bob} and his downline)...
... produce a result set that looks like the following (the colored rows match the colored nodes in the org chart above)...
You use a cursor. I'll do something else. We'll test against the smaller org chart to ensure the code works correctly and then well run it against the table the attached proc builds and see if we actually need a cursor to do this or not and which might be faster. I'll also admit that I've never tried something like this with a cursor, so it's VERY possible that you might be teaching me a great lesson here.
Are you up for it?
Here's the code to build the small table according to the org chart above...
--===== Do this in a nice, safe place that everyone has
USE tempdb;
GO
CREATE PROCEDURE dbo.BuildSmallEmployeeTable AS
/****************************************************************************
Purpose:
Create a standard "well formed" Adjacency List hierarchy with indexes.
This procedure takes no parameters.
Usage:
EXEC dbo.BuildSmallEmployeeTable;
Revision History:
Initial creation - Circa 2009 - Jeff Moden
Rev 01 - 15 May 2010 - Jeff Moden
- Abort if current DB isn't "tempdb" to protect users.
****************************************************************************/
--===========================================================================
-- Presets
--===========================================================================
--===== Supresss the autodisplay of rowcounts to cleanup the display and to
-- prevent false error returns if called from a GUI.
SET NOCOUNT ON;
--===== Make sure that we're in a safe place to run this...
IF DB_NAME() <> N'tempdb'
BEGIN
RAISERROR('Current DB is NOT tempdb. Run aborted.',11,1);
RETURN;
END;
--===== Conditionaly drop the test table so we can do reruns more easily
IF OBJECT_ID('tempdb.dbo.Employee','U') IS NOT NULL
DROP TABLE tempdb.dbo.Employee;
--===========================================================================
RAISERROR('Building the hierarchy table...',0,1) WITH NOWAIT;
--===========================================================================
--===== Create the test table with a clustered PK and an FK to itself to make
-- sure that a ManagerID is also an EmployeeID.
CREATE TABLE dbo.Employee
(
EmployeeID INT NOT NULL,
ManagerID INT NULL,
EmployeeName VARCHAR(10) NOT NULL,
CONSTRAINT PK_Employee
PRIMARY KEY CLUSTERED (EmployeeID),
CONSTRAINT FK_Employee_Employee
FOREIGN KEY (ManagerID)
REFERENCES dbo.Employee (EmployeeID)
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
;
--===========================================================================
RAISERROR('Populate the hierarchy table...',0,1) WITH NOWAIT;
--===========================================================================
--===== Populate the test table with test data. Each child ID has a parent ID
-- adjacent to it on the same row which is why it's called an "Adjacency
-- List".
INSERT INTO dbo.Employee
(EmployeeID, ManagerID, EmployeeName)
SELECT 26,NULL,'Jim' UNION ALL
SELECT 2, 26,'Lynne' UNION ALL
SELECT 3, 26,'Bob' UNION ALL
SELECT 6, 17,'Eric' UNION ALL
SELECT 8, 3,'Bill' UNION ALL
SELECT 7, 3,'Vivian' UNION ALL
SELECT 12, 8,'Megan' UNION ALL
SELECT 14, 8,'Kim' UNION ALL
SELECT 17, 2,'Butch' UNION ALL
SELECT 18, 39,'Lisa' UNION ALL
SELECT 20, 3,'Natalie' UNION ALL
SELECT 21, 39,'Homer' UNION ALL
SELECT 39, 26,'Ken' UNION ALL
SELECT 40, 26,'Marge'
;
RAISERROR('There are %u rows in the hierarchy.',0,1,@@ROWCOUNT) WITH NOWAIT;
--===========================================================================
RAISERROR('Adding an additional index ...',0,1) WITH NOWAIT;
--===========================================================================
--===== Create an additional index to speed things up
CREATE UNIQUE INDEX By_ManagerID_EmployeeID
ON TempDB.dbo.Employee (ManagerID,EmployeeID)
;
--===========================================================================
-- Exit
--===========================================================================
RAISERROR('===============================================',0,1) WITH NOWAIT;
RAISERROR('RUN COMPLETE',0,1) WITH NOWAIT;
RAISERROR('===============================================',0,1) WITH NOWAIT;
GO
... and here's the code to build the large one I spoke of....
--===== Do this in a nice, safe place that everyone has
USE tempdb;
GO
CREATE PROCEDURE dbo.BuildLargeEmployeeTable
/****************************************************************************
Purpose:
Create a randomized "well formed" Adjacency List hierarchy with indexes.
Progammer's Notes:
1. Each EmployeeID (except for the Root Node, of course) is assigned a
random ManagerID number which is initially always less than the current
EmployeeID to ensure that no cycles occur in the hierarcy.
2. The second parameter used to call this stored procedure will optionally
randomize the EmployeeIDss to make the hierarchy truly random as it would
likely be in real life. This, of course, takes a small amounnt of extra
time.
3. This code runs nasty fast and is great for testing hierarchical
processing code. Including the index builds, this code will build a
million node Adjacency List on a 4 processor (i5) laptop with 6GB of RAM
in just several seconds. The optional randomization adds just several
more seconds.
Usage:
--===== Create the hierarchy where all the ManagerIDs are less than the
-- EmployeeIDs. This is the fastest option and will build a million node
-- hierarchy in just about 7 seconds on a modern machine.
EXEC dbo.BuildLargeEmployeeTable 1000000;
--===== Making the second parameter a non-zero value will further randomize
-- the IDs in the hierarchy. This, of course, takes extra time and will
-- build a million row hierarchy in about 17 seconds on a modern
-- machine.
EXEC dbo.BuildLargeEmployeeTable 1000000,1;
Revision History:
Initial concept and creation - Circa 2009 - Jeff Moden
Rev 01 - 15 May 2010 - Jeff Moden
- Abort if current DB isn't "tempdb" to protect users.
Rev 02 - 13 Oct 2012 - Jeff Moden
- Add a randomization stop to make the hierarchy more like real life.
****************************************************************************/
--===== Declare the I/O parameters
@pRowsToBuild INT,
@pRandomize TINYINT = 0
AS
--===========================================================================
-- Presets
--===========================================================================
--===== Supresss the autodisplay of rowcounts to cleanup the display and to
-- prevent false error returns if called from a GUI.
SET NOCOUNT ON;
--===== Make sure that we're in a safe place to run this...
IF DB_NAME() <> N'tempdb'
BEGIN
RAISERROR('Current DB is NOT tempdb. Run aborted.',11,1);
RETURN;
END;
--===== Conditionaly drop the test table so we can do reruns more easily
IF OBJECT_ID('tempdb.dbo.Employee','U') IS NOT NULL
DROP TABLE tempdb.dbo.Employee;
--===========================================================================
RAISERROR('Building the hierarchy...',0,1) WITH NOWAIT;
--===========================================================================
--===== Build the test table and populate it on the fly.
-- Everything except ManagerID is populated here. The code uses a
-- technique called a "Psuedo-Cursor" (kudos to R. Barry Young for the
-- term) to very quickly and easily build large numbers of rows.
SELECT TOP (@pRowsToBuild)
EmployeeID = ISNULL(CAST(
ROW_NUMBER() OVER (ORDER BY (SELECT 1))
AS INT),0),
ManagerID = CAST(NULL AS INT),
EmployeeName = CAST(NEWID() AS VARCHAR(36))
INTO TempDB.dbo.Employee
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
CROSS JOIN master.sys.all_columns ac3
;
RAISERROR('There are %u rows in the hierarchy.',0,1,@@ROWCOUNT) WITH NOWAIT;
--===== Update the test table with ManagerID's. The ManagerID is some random
-- value which is always less than the current EmployeeID to keep the
-- hierarchy "clean" and free from "loop backs".
UPDATE TempDB.dbo.Employee
SET ManagerID = CASE
WHEN EmployeeID > 1
THEN ABS(CHECKSUM(NEWID())) % (EmployeeID-1) +1
ELSE NULL
END
;
--===========================================================================
-- Conditionally randomize the hierarchy to be more like real life
--===========================================================================
IF @pRandomize <> 0
BEGIN
--===== Alert the operator
RAISERROR('Randomizing the hierarchy...',0,1) WITH NOWAIT;
--===== Create a randomized cross reference list to randomize the
-- EmployeeIDs with.
SELECT RandomEmployeeID = IDENTITY(INT,1,1),
EmployeeID
INTO #RandomXRef
FROM dbo.Employee
ORDER BY NEWID()
;
--===== Update the ManagerIDs in the Employee table with the new
-- randomized IDs
UPDATE emp
SET emp.ManagerID = RandomEmployeeID
FROM dbo.Employee emp
JOIN #RandomXRef xref ON emp.ManagerID = xref.EmployeeID
;
--===== Update the EmployeeIDs in the Employee table with the new
--randomized IDs
UPDATE emp
SET emp.EmployeeID = RandomEmployeeID
FROM dbo.Employee emp
JOIN #RandomXRef xref ON emp.EmployeeID = xref.EmployeeID
;
END
ELSE
BEGIN
--===== Alert the operator
RAISERROR('The hierarchy is not randomized',0,1) WITH NOWAIT;
END
;
--===========================================================================
-- Build the indexes necessary for performance.
--===========================================================================
--===== Alert the operator
RAISERROR('Building the keys and indexes...',0,1) WITH NOWAIT;
--===== Add some indexes that most folks would likely have on such a table
ALTER TABLE TempDB.dbo.Employee
ADD CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (EmployeeID)
;
CREATE UNIQUE INDEX By_ManagerID_EmployeeID
ON TempDB.dbo.Employee (ManagerID,EmployeeID)
;
ALTER TABLE dbo.Employee
ADD CONSTRAINT FK_Employee_Employee FOREIGN KEY
(ManagerID) REFERENCES dbo.Employee (EmployeeID)
ON UPDATE NO ACTION
ON DELETE NO ACTION
;
--===========================================================================
-- Exit
--===========================================================================
RAISERROR('===============================================',0,1) WITH NOWAIT;
RAISERROR('RUN COMPLETE',0,1) WITH NOWAIT;
RAISERROR('===============================================',0,1) WITH NOWAIT;
GO
--Jeff Moden
Change is inevitable... Change for the better is not.
July 4, 2015 at 1:51 am
Jeff Moden (7/3/2015)
andrew.norris 90251 (6/7/2015)
Nothing wrong with cursors they have their place and are very useful.Do you have examples for something other hierarchies and bin stacking problems?
A few examples I have run into during the last few years:
A DB application that is designed to interact with an external system, using an interface to get information (e.g. current usage of an account) and change information (new accounts, usage caps, etc). The interface is API based - a single call for a single account. I needed a lot of cursors for that customer!
A different DB application that had a list of procedure execution rules in a configuration table. At set times, it would analyse what events had taken place (setbased), determine which procedures had to run based on that (setbased), determine the proper execution order (setbased), and then call each of the procedures that, again in setbased fashion, would process the appropriate changes. The bit where the procedures were called was cursor-based.
And then there are of course the many utility procedures used by DAs. Ever tried to create an index maintenance script that does not use a cursor or other form of iteration?
July 4, 2015 at 8:34 am
Hugo Kornelis (7/4/2015)
Jeff Moden (7/3/2015)
andrew.norris 90251 (6/7/2015)
Nothing wrong with cursors they have their place and are very useful.Do you have examples for something other hierarchies and bin stacking problems?
A few examples I have run into during the last few years:
A DB application that is designed to interact with an external system, using an interface to get information (e.g. current usage of an account) and change information (new accounts, usage caps, etc). The interface is API based - a single call for a single account. I needed a lot of cursors for that customer!
A different DB application that had a list of procedure execution rules in a configuration table. At set times, it would analyse what events had taken place (setbased), determine which procedures had to run based on that (setbased), determine the proper execution order (setbased), and then call each of the procedures that, again in setbased fashion, would process the appropriate changes. The bit where the procedures were called was cursor-based.
And then there are of course the many utility procedures used by DAs. Ever tried to create an index maintenance script that does not use a cursor or other form of iteration?
Now all 3 of those are what I think cursors where brought into being for and are excellent examples of one great aspect of what cursors sometimes need to be used for... flow control and forced RBAR. Yes, things like index maintenance can be done without cursors or the nearly equivalent Temp Table/While Loop methods but they just make separate commands and concatenate them together. The end result is the same... one command per command row.
To emphasize a bit on Hugo's good examples...
Other examples might be when you need to do the same thing to multiple tables or multiple databases. An example of this can be found in the sys.sp_MSforeachdb and sys.sp_MSforeachtable system stored procedures that people sometimes use.
Shifting gears a bit, there are two points that a lot of people bring up during discussions of cursors....
One of the points that I'm slowly trying to build up to is like the hierarchy challenge I posted. The "best practice" (notice the quotes) is to do things set-based unless you absolutely have to use a cursor (which is what I also call While Loops, single-row-per-iteration Recursive CTEs, and certain types of UDFs). Unfortunately, a lot of people think that they absolutely have to use a cursor way too early in the game.
That's a segue into the next problem that a lot of people have with cursors and other forms of RBAR. They'll frequently say that cursors are easier to write, understand, read, and troubleshoot and that set-based methods take more code and it's complex to troubleshoot. Yes, writing set-based code does require a different paradigm (see my signature line below) and you do actually need to know what you're doing, but that's not a justification for writing cursors or other forms of RBAR. Rather, it should be motivation to learn more about how to write effective database code. In most cases (there are exceptions to all rules and observations, "It Depends"), not only is set-based code frequently tens to thousands of times faster and more resource efficient, it's usually shorter and easier to understand once you've trained yourself to understand such things. People saying that set-based is confusing just means they haven't spent enough time learning it. It's the same problem that a lot of people had when OOPs appeared or using Knuth's famous quote to justify what is actually poor programming rather than what its original purpose was.
The bottom line is that if people don't spend the time to learn how to effectively program against databases, then set-based code will never be in their comfort zone and they'll continue to fall back into their personal comfort zone and they'll continue to justify their use of cursors with the "best practice" of using a cursor if it "can't" be done using set-based code. So, learn some of the math (it's not that hard and I'm not talking about deep relational database concepts such as being able to define things like "relational division", although that will never hurt) that makes for really good set-based code and practice finding a set-based solution every chance you get and you'll be a much more valuable "hybrid" programmer. Yep... I understand getting stuff out the door but if you don't go back to find the set-based alternate for a RBAR solution, then you're not doing yourself or the company you work for any favors. It's part of your job to get better at what you do.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 5, 2015 at 8:50 am
[deleted]
July 6, 2015 at 9:31 am
patrickmcginnis59 10839 (7/5/2015)
Jeff Moden (7/3/2015)
patrickmcginnis59 10839 (7/3/2015)
Jeff Moden (7/3/2015)
andrew.norris 90251 (6/7/2015)
Nothing wrong with cursors they have their place and are very useful.Do you have examples for something other hierarchies and bin stacking problems?
doing inserts to a table on a linked server comes to mind!
Since it was Andrew that posted, I was after him to provide an example of where he thought a cursor might be useful.
You're kind of doing the same thing. You've identified a use but not the reason. Why would you use a cursor to do "inserts to a table on a linked server" instead of something set-based?
Ask Microsoft! Its not like I didn't write the insert as a set based op.
Thanks for the link. I'll have to do some testing there but the use of a cursor wouldn't speed anything up there.
Shifting gears, I don't use OPENQUERY for such things. I use OPENROWSET and I don't use "*". I always use column names. I wonder if the use of "SELECT *" could be throwing a monkey wrench into the works.
Shifting gears one more time, I also find it interesting the people continue to use rCTEs to build test data in their performance related posts. Seems to be a contradiction there.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2015 at 12:14 pm
Someone earlier asked for an example of something that works best when using cursors and here is a script I have that I run when I need to make sure that all the objects have the right permissions set:
DECLARE @ObjectName VARCHAR(100)
,@PermissionName VARCHAR(10)
,@SQL NVARCHAR(4000)
,@RoleName VARCHAR(25) = 'valuation_exec';
DECLARE TablePermissions CURSOR LOCAL FAST_FORWARD
FOR
--all tables and views should have 'SELECT' and stored
--proc should have 'EXECUTE to make sure that the new login would work correctly for everything.
WITH WhatWeExpect
AS (SELECT name
,object_id
,PO.permission_name
,S.type
FROM sys.objects S
CROSS APPLY ( VALUES
( 'SELECT', 'U'),
( 'SELECT', 'V'),
( 'EXECUTE', 'P') ) PO (permission_name, type)
WHERE S.type = PO.type
),
PermissionList
AS (SELECT major_id
,permission_name
FROM sys.database_permissions
WHERE permission_name IN ('SELECT', 'EXECUTE')
AND USER_NAME(grantee_principal_id) = 'valuation_exec'
)
SELECT WWE.name
,WWE.permission_name
FROM WhatWeExpect WWE
LEFT OUTER JOIN PermissionList P
ON WWE.object_id = P.major_id
AND WWE.permission_name = P.permission_name
WHERE P.permission_name IS NULL
ORDER BY WWE.name
,WWE.permission_name
OPEN TablePermissions
FETCH NEXT FROM TablePermissions INTO @ObjectName, @PermissionName
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = N'GRANT ' + @PermissionName + ' ON [dbo].[' + @ObjectName + '] TO [' + @RoleName + ']'
PRINT @sql;
EXECUTE sp_executesql @sql;
FETCH NEXT FROM TablePermissions INTO @ObjectName, @PermissionName
END
CLOSE TablePermissions
DEALLOCATE TablePermissions
July 6, 2015 at 2:02 pm
Maybe not. I guess I didn't know that one. But the overall point, as far as cursors go, is if you have a situation where you need to execute a system stored procedure for something, like if you are renaming tables that maybe have a specific string in the name and you would like to replace what string in the name with a different string. I can see using a cursor (unless there is a way to do that too that I don't know about).
Viewing 15 posts - 136 through 150 (of 215 total)
You must be logged in to reply to this topic. Login to reply