September 28, 2006 at 12:39 am
Actually it sounds like you might need to rethink your table design.
If you are storing a single row of data for a single object and only updating the values on that row, then why are you using a relational database? Sounds like Excel might be more useful.
In a basic relational model, the seats and engines would be fields in the primary 'aircraft' table, since they describe the object itself, are unlikely to change, and are not shared amongst different objects (I am not going to go into object polymorphism here...). The landings and hours would be derived from something like a 'flights' table, where you would store a row for every trip the aircraft took, recording things such as hours, fuel used, beverages consumed, etc. The number of rows SHOULD equal the number of landings (or, for a more accurate method, add a date field called Date_Landed and count the rows where that field is not null).
As for not being able to add more columns: that is untrue. You can add a column whenever you feel like it, and if you don't wish to allow null values, simply set the column's default value to 0 or whatever is applicable for the data type.
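A minimal sketch of that layout (the table and column names here are my own illustration, not Bruce's actual schema):

```sql
-- Static properties of the object live on the aircraft row itself
CREATE TABLE dbo.Aircraft
(
    AircraftID  INT          NOT NULL PRIMARY KEY,
    TailNumber  VARCHAR(10)  NOT NULL,
    Seats       INT          NOT NULL,
    Engines     INT          NOT NULL
)

-- One row per trip; landings and hours fall out of the data
CREATE TABLE dbo.Flights
(
    FlightID    INT          IDENTITY(1,1) PRIMARY KEY,
    AircraftID  INT          NOT NULL REFERENCES dbo.Aircraft (AircraftID),
    Hours       DECIMAL(6,1) NOT NULL,
    FuelUsed    DECIMAL(8,1) NULL,
    DateLanded  DATETIME     NULL   -- NULL until the trip completes
)

-- Landings and total hours are then simple aggregates
SELECT a.AircraftID,
       COUNT(f.DateLanded) AS Landings,   -- COUNT ignores NULLs
       SUM(f.Hours)        AS TotalHours
  FROM dbo.Aircraft a
  LEFT JOIN dbo.Flights f ON f.AircraftID = a.AircraftID
 GROUP BY a.AircraftID
```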
FYI, there is a blog entry on The Rozenshtein Method, which uses logical arithmetic to perform pivot operations. It is a good example and explains what each part of the logic is doing. I have used it for an aged-debt report (on an SQL 2000 server, thus, no PIVOT) and it worked nicely.
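For readers who haven't seen it, the core of Rozenshtein's approach is a "characteristic function" built from SIGN and ABS that evaluates to 1 when a row matches a pivot column and 0 otherwise, so a plain SUM does the pivoting with no CASE or PIVOT support required. A minimal sketch (the Sales table and its columns are invented for illustration):

```sql
-- Hypothetical long, narrow source table
CREATE TABLE dbo.Sales
(
    Yr     INT NOT NULL,
    Qtr    INT NOT NULL,   -- 1..4
    Amount INT NOT NULL
)

INSERT INTO dbo.Sales (Yr, Qtr, Amount)
SELECT 2006, 1, 100 UNION ALL
SELECT 2006, 2, 250 UNION ALL
SELECT 2006, 4,  75

-- 1-ABS(SIGN(Qtr-n)) is 1 when Qtr = n and 0 otherwise,
-- so each SUM picks up only the rows for its own quarter.
SELECT Yr,
       SUM(Amount * (1 - ABS(SIGN(Qtr - 1)))) AS Q1,
       SUM(Amount * (1 - ABS(SIGN(Qtr - 2)))) AS Q2,
       SUM(Amount * (1 - ABS(SIGN(Qtr - 3)))) AS Q3,
       SUM(Amount * (1 - ABS(SIGN(Qtr - 4)))) AS Q4
  FROM dbo.Sales
 GROUP BY Yr
```

For the sample rows above this returns one row: Yr = 2006, Q1 = 100, Q2 = 250, Q3 = 0, Q4 = 75.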
September 28, 2006 at 7:16 am
Thank you for your input. The database structure is very unorthodox, but it has to be that way to satisfy the requirements of the application we are writing. Actually, several "sister" applications use the same database but with different table columns, hence the "pivoted" table structure: we add rows instead of columns. As you know, there are many reasons not to use Excel in place of SQL Server, regardless of the table structure. Someone suggested Optimizing Transact-SQL by David Rozenshtein, so I tracked down a used copy (it is no longer in print), and the first chapter is about, apparently, the method to which you refer. I haven't tried it yet, but will soon.
September 28, 2006 at 2:27 pm
Application has nothing to do with table structure.
It must use ONLY SP calls.
If you run ad-hoc queries it's probably too late to optimise performance of your application(s).
_____________
Code for TallyGenerator
September 28, 2006 at 5:23 pm
Yeah, not knowing what it is that your application needs to do makes commenting difficult, and if the structure works for your app, then it works for your app.
Application has nothing to do with table structure? Actually, I would say you should design your application first and then create your table structure to suit.
There are obviously going to be exceptions to this, but you need to work out the application's data flow before you can realistically create tables that will support that data flow.
And not everything has to be a stored proc. I'm sure you will find a bazillion articles and white papers on the pros and cons of when and where to use them.
E.g. a software developer may wish to embed the SQL within their compiled code so that people can't just wander in, see how the software does what it does in the database, and thus be replaced by some other suspect software company trying to steal business.
In short, it's a philosophical debate with the odd fact here and there.
September 28, 2006 at 5:53 pm
David, you definitely were never involved in any big project. Otherwise you would not say such b... sh...
Sorry.
Table structure must support data entities and their relationships, not data flow.
Data flow must be supported by the application, and SPs must be a part of it.
If you are worried about someone stealing your code, use encrypted SPs and secure your database.
There is nothing philosophical to debate. If your application uses ad-hoc queries addressing tables in the DB, it just cannot pass any security audit.
It's that simple.
Not to mention performance issues.
_____________
Code for TallyGenerator
September 29, 2006 at 7:41 am
Sergiy - That "table structure must support data entities and their relationships" is the rub. Traditional relational databases do not do that as well as we'd like for objects (as opposed to recordsets). In the old days we'd just query, use a join, get a recordset, and populate a form with the data from the rs. Nowadays we have to assign values from the database tables to object properties, hence the stirring of interest in object-oriented database engines. I have thought about writing procs in a CLR language and returning datasets, but I don't know how bad the performance hit would be.

For the short term, at least, we did what we did because we have to deal with several entities in each table that have very different properties. We could have created a separate table for every entity type, and some argued for that, but the requirement we were given was to construct a database that would NOT require ongoing structural updates. I am happy to say that the result works great; performance is more than acceptable and will be even better when we implement Rozenshtein's method of pivoting data in procs (it is now done in the data access layer). I am confident that it will pass rigorous security testing - required since it is software for the military.
David - Sergiy was not entirely fair. Contrary to what he appears to assume, it is not always the case that security and/or performance are the top priorities. I have worked on applications that, while not huge, were not small, where we used no stored procedures, largely because the future database was unknown. That is, it might have changed at some later date to Oracle or even MySQL, and the budget did not allow for coding for that contingency. SPs are always the preferred way to go for performance and security, but there are times when it is justified, depending on the client's requirements, to do otherwise. Suppose, for example, that you knew you would not be able to modify procs in the db after deployment, but would be able to replace the exe - what do you do? Everything is a tradeoff.

Having said that, however, you should avoid ad-hoc queries like the plague! Incidentally, some of the uninitiated think it is smart to pass into an SP strings that include column names to be used to construct the SELECT clause, WHERE clause, or whatever, probably to reduce the number of required SPs. Bad idea - if the db doesn't "know" what the query is (because the text keeps changing), it has to parse it every time it runs, and there goes the performance advantage of using stored procedures. I had to argue against that in the current project.
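One way to keep plan reuse even when some SQL must be assembled at runtime is to pass the values as parameters via sp_executesql rather than concatenating them into an ever-changing string. A minimal sketch (the table name dbo.tbl is illustrative):

```sql
-- The query text is identical on every call, so SQL Server can
-- cache and reuse a single execution plan for it.
DECLARE @SQL NVARCHAR(4000),
        @Id  INT

SET @Id  = 42
SET @SQL = N'SELECT * FROM dbo.tbl WHERE id = @pId'

EXEC sp_executesql @SQL, N'@pId INT', @pId = @Id

-- By contrast, concatenating the value into the string produces a
-- different query text for every value, forcing a fresh parse each time:
-- EXEC ('SELECT * FROM dbo.tbl WHERE id = ' + CAST(@Id AS VARCHAR(10)))
```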
September 29, 2006 at 7:48 am
Unorthodox? If you think about it, the table structure is similar to what XML offers (I thought everyone but me loved XML ) but without the parent/child relationships. Every data element has a "tag" to identify what the element is. The table is not unlike those working tables spawned by SQL Server when you resolve an XML document.
There are dynamic and non-dynamic solutions to this but, as some have suggested, the "cross-tab"-like code using either MIN or MAX is very quick in either dynamic or static form. And, though shorter code is not always better or faster code, in this particular instance the cross-tab method will make this very short and easy-to-understand code. The dynamic method would certainly make it "self healing" when new attributes are added to the table.
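For reference, the static form of that cross-tab is just one aggregate per desired output column (the table and property names below are invented stand-ins for the actual schema):

```sql
-- Each MAX(CASE ...) lifts one property out of the long, narrow
-- table into its own column; GROUP BY collapses to one row per id.
-- CASE without an ELSE yields NULL, which MAX simply ignores.
SELECT id,
       MAX(CASE WHEN property = 'HOURS'  THEN value END) AS Hours,
       MAX(CASE WHEN property = 'WHEELS' THEN value END) AS Wheels,
       MAX(CASE WHEN property = 'SEATS'  THEN value END) AS Seats
  FROM dbo.tbl
 GROUP BY id
```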
Bruce, are you all set or do you need more help with this one?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2006 at 9:20 am
Jeff - not sure yet if we are set. I purchased Optimizing Transact-SQL by David Rozenshtein (and others), and the first chapter is "Single Statement SQL Solutions to the Table Pivoting and Folding Problems." When I get the time I'll see if his solution is what we need; if not, I think we'll go with the new PIVOT function unless there is a better solution. I'm certainly open to suggestions. Rozenshtein writes, "Simply stated, table pivoting problem is that of turning - or pivoting - long, narrow tables into short, wide ones." That is precisely what we are doing. Pulling the records we need is easy because every record has the requisite foreign key value, so it is simply SELECT * FROM tbl WHERE id = @id. The nature of the records returned varies depending on what kind of entity is represented by @id. The actual properties of the entity are defined in code, so any "excess" records are simply ignored and any missing records result in an empty or null value for the corresponding object property. If you have ideas on this I am all "ears." Changing the table structure is not an option, however.
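For comparison, the SQL Server 2005 PIVOT form of that query would look something like this (column and property names are illustrative; note that PIVOT requires the IN list of output columns to be fixed at compile time unless you build the statement dynamically):

```sql
DECLARE @id INT
SET @id = 42   -- illustrative entity id

-- PIVOT groups implicitly on the columns not mentioned in the
-- aggregate or the FOR clause (here, id).
SELECT id, [HOURS], [WHEELS], [SEATS]
  FROM (SELECT id, property, value
          FROM dbo.tbl) AS src
 PIVOT (MAX(value) FOR property IN ([HOURS], [WHEELS], [SEATS])) AS p
 WHERE id = @id
```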
Initially I used reflection to match values from the tables to the object properties, having named the object properties exactly the same as the values from the tables. For example, a record could be id, property, value. Property could be "HOURS" or "WHEELS" or whatever.
Using reflection I could get each object property name and match it up to the value in the property column, and get the value from the value column. It works fine, and because this application never has more than a handful of users at any given time, performance is ok, but I want to maximize performance by doing away with the reflection stuff and pivoting the data in stored procedures...or adopting any other better idea that might be suggested.
Thanks,
Bruce
September 30, 2006 at 9:30 am
OK, Bruce... here you go...
You were worried about the performance of a cross-tab (aggregate functions)... as some have suggested (especially Sergiy), that really shouldn't be a concern. I know... you've heard lots of bad things about aggregates and dynamic SQL, and you'd probably like to do a test just to make sure, but you don't have enough data to really make sure... That's no longer true, my friend!!!
The following code makes two tables (not named the same as yours but close) using the same methods I use to make Tally or Numbers tables... one simulates your long skinny property table (tblPropertiesTest) and the other table (tblPropertiesNameTest) contains distinct names of properties from the first table. This second table is necessary for two reasons...
Here's the code to build both tables... it makes 1,000,000 rows of test data and it takes about 50 seconds to execute... If you look at the two tables AFTER they are created, you will also find what I believe you have/need for PKs and FKs...
--========================================================================================
-- Create the temporary test table and populate with random test data.
--========================================================================================
IF OBJECT_ID('dbo.tblPropertiesTEST') IS NOT NULL
   DROP TABLE dbo.tblPropertiesTEST

CREATE TABLE dbo.tblPropertiesTEST
(
    Aircraft     INT         NOT NULL,
    PropertyName VARCHAR(25) NOT NULL,
    Value        VARCHAR(25) NOT NULL
)

CREATE UNIQUE NONCLUSTERED INDEX IXU_tblPropertiesTEST_Aircraft_Value
    ON dbo.tblPropertiesTEST (Aircraft,PropertyName) WITH IGNORE_DUP_KEY

INSERT INTO dbo.tblPropertiesTEST (Aircraft,PropertyName,Value)
SELECT TOP 1000000
       Aircraft     = CAST(RAND(CAST(NEWID() AS VARBINARY))*100000+1 AS INT),
       PropertyName = 'PropertyName_'+CHAR(RAND(CAST(NEWID() AS VARBINARY))*26+65),
       Value        = CAST(RAND(CAST(NEWID() AS VARBINARY))*2000+1 AS INT)
  FROM Master.dbo.SysColumns sc1,
       Master.dbo.SysColumns sc2

DROP INDEX dbo.tblPropertiesTEST.IXU_tblPropertiesTEST_Aircraft_Value

ALTER TABLE dbo.tblPropertiesTEST
  ADD CONSTRAINT PK_tblPropertiesTEST__Aircraft_Value
      PRIMARY KEY NONCLUSTERED (Aircraft,PropertyName)

--========================================================================================
-- Create the temporary property name test table and populate with existing property
-- names.
--========================================================================================
IF OBJECT_ID('dbo.tblPropertiesNameTEST') IS NOT NULL
   DROP TABLE dbo.tblPropertiesNameTEST

CREATE TABLE dbo.tblPropertiesNameTEST
(
    PropertyNameID INT         IDENTITY(1,1) NOT NULL,
    PropertyName   VARCHAR(25) NOT NULL PRIMARY KEY CLUSTERED
)

INSERT INTO tblPropertiesNameTEST (PropertyName)
SELECT DISTINCT PropertyName
  FROM dbo.tblPropertiesTEST
 ORDER BY PropertyName
If you do a check, you'll find that there are about 100,000 different Aircraft in the table each having an average of about 10 out of 26 different properties. I figure that's more than you'll EVER have... but who knows?
Now... here's the code that you should probably turn into a stored procedure. If you change the assigned value of @pAircraft, you can see that it's not only fast as hell (about 50 milliseconds for 1 aircraft, only 42 seconds for ALL 100,000 aircraft in grid mode), it's also "programmable". The comments in the code explain most everything. You could certainly modify the code by adding parameters that would become different criteria, etc., in the dynamic SQL. Do be careful to test each parameter for "odd" characters to really reduce the chances of SQL injection attacks, eh?
--===== Declare a variable to hold the Aircraft we want to find
     -- If you turn this into a stored procedure (and you should),
     -- this will be an input parameter to find the info for a
     -- given aircraft.  If it's NULL, it will return ALL aircraft.
DECLARE @pAircraft VARCHAR(12)
    SET @pAircraft = NULL

--===== Declare a timer variable so we can measure how long this takes
DECLARE @Start DATETIME
    SET @Start = GETDATE()

--===== Suppress the auto-display of rowcounts for speed and appearance
    SET NOCOUNT ON

--===== Declare local variables to hold dynamic SQL
DECLARE @SQLCmd        VARCHAR(8000)
DECLARE @SQLSelectList VARCHAR(8000)
DECLARE @SQLFrom       VARCHAR(8000)
DECLARE @SQLCriteria   VARCHAR(8000)
DECLARE @SQLGroupBy    VARCHAR(8000)

--===== Preset the static portions of the dynamic SQL
 SELECT @SQLCmd     = 'SELECT Aircraft,'
 SELECT @SQLFrom    = CHAR(13)+'FROM dbo.tblPropertiesTEST'
 SELECT @SQLGroupBy = CHAR(13)+'GROUP BY Aircraft'

--===== Preset the criteria portion of the dynamic SQL
 SELECT @SQLCriteria = ISNULL(CHAR(13)+' WHERE Aircraft = '+@pAircraft,'')

--===== Calculate the SELECT LIST portion of the dynamic SQL
 SELECT @SQLSelectList = COALESCE(@SQLSelectList+',',' ')+CHAR(13)
      + 'MIN(CASE WHEN PropertyName = '''+PropertyName+''' THEN Value ELSE NULL END) AS '
      + PropertyName
   FROM dbo.tblPropertiesNameTEST
  ORDER BY PropertyNameID

--===== Assemble and execute the dynamic SQL command
   EXEC (@SQLCmd+@SQLSelectList+@SQLFrom+@SQLCriteria+@SQLGroupBy)

--===== Print the duration
  PRINT DATEDIFF(ms,@Start,GETDATE())
Oh yeah... I'm thinking you might not need that book anymore ... If you have any questions on this, don't hesitate to post back but do read the comments in the code and play with it for a while...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2006 at 11:08 am
Forgot something else... if you parameterize the table name for the small names table, you can pass a parameter to use different names tables... that will allow you extreme flexibility in what the result columns should be and the left-to-right order they should appear in. Of course, each of those names tables should have a foreign key to the "master" names table that holds all of the allowable PropertyNames.
--Jeff Moden
Change is inevitable... Change for the better is not.