October 19, 2006 at 10:12 am
Hi,
We have a batch process that is written entirely as a set of stored procedures.
We have been asked to add some additonal calculations into the database which have been specified in Excel. The analyst has used the Excel Solver to find the lowest value of a each number in a set of 3 numbers that feed into several other calculations. We need to duplicate this process in a stored proc. Does any one know of any information that might be able to help us? There will be several thousand numbers that need to be calculated in this way and performance is a major issue.
I would be interested to know if anyone has hit this problem before and how you got round it.
Thanks.
October 19, 2006 at 11:16 am
Would 10000 rows in 16 milliseconds be fast enough? Here's the test code and the demo of the formula...
--===== If the test table exists, drop it
SET NOCOUNT ON
IF OBJECT_ID('TempDB..#MyHead') IS NOT NULL
DROP TABLE #MyHead
GO
--===== Create the test table with a primary key
CREATE TABLE #MyHead
(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Val01 INT,
Val02 INT,
Val03 INT
)
--===== Populate the test table with random test data including some negative numbers
INSERT INTO #MyHead
(Val01,Val02,Val03)
SELECT TOP 10000
Val01 = CAST(RAND(CAST(NEWID() AS VARBINARY))*10000-1000 AS INT),
Val02 = CAST(RAND(CAST(NEWID() AS VARBINARY))*10000-1000 AS INT),
Val03 = CAST(RAND(CAST(NEWID() AS VARBINARY))*10000-1000 AS INT)
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Start a timer and find the min of the 3 columns on each row
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
SELECT ID,
(((Val01+Val02-ABS(Val01-Val02))/2) + Val03 - ABS(((Val01+Val02- ABS(Val01-Val02))/2) - Val03))/2
FROM #MyHead
PRINT DATEDIFF(ms,@StartTime,GETDATE())
--Jeff Moden
Change is inevitable... Change for the better is not.
October 19, 2006 at 11:27 am
How about we use the good old case statment??
--===== If the test table exists, drop it
SET NOCOUNT ON
IF OBJECT_ID('TempDB..#MyHead') IS NOT NULL
DROP TABLE #MyHead
GO
--===== Create the test table with a primary key
CREATE TABLE #MyHead
(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Val01 INT,
Val02 INT,
Val03 INT
)
--===== Populate the test table with random test data including some negative numbers
INSERT INTO #MyHead
(Val01,Val02,Val03)
SELECT TOP 500000
Val01 = CAST(RAND(CAST(NEWID() AS VARBINARY))*10000-1000 AS INT),
Val02 = CAST(RAND(CAST(NEWID() AS VARBINARY))*10000-1000 AS INT),
Val03 = CAST(RAND(CAST(NEWID() AS VARBINARY))*10000-1000 AS INT)
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Start a timer and find the min of the 3 columns on each row
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
SELECT ID,
(((Val01+Val02-ABS(Val01-Val02))/2) + Val03 - ABS(((Val01+Val02- ABS(Val01-Val02))/2) - Val03))/2 AS MinValue
FROM #MyHead
PRINT DATEDIFF(ms,@StartTime,GETDATE())
SET @StartTime = GETDATE()
SELECT ID, CASE WHEN Val01 < Val02 AND Val01 < Val03 THEN Val01
WHEN Val02 < Val03 AND Val02 < Val01 THEN Val02
ELSE Val03
END AS MinValue
FROM #MyHead
PRINT DATEDIFF(ms,@StartTime,GETDATE())
DROP TABLE #MyHead
My version seens to outrun yours consistently on a 500K rows table (3 sec on a very slow machine). They were neck to neck at only 10K so I changed the test .
October 19, 2006 at 2:47 pm
Nice tries, my friends.
But I would fire a database designer who puts same meaning values in different columns of the same table. Immediately. Without any discussions.
You should not compare values in different columns. Under any circumstances.
How about that?
Or we don't do relational databases anymore?
When did we switch to Excel?
_____________
Code for TallyGenerator
October 19, 2006 at 3:00 pm
I would tend to agree but there may be some strange business logic in there so I don't want to say something I'll later regret .
October 19, 2006 at 4:23 pm
Me too, Serqiy... however, sometimes you just gotta play the cards you've been dealt. Do you have a code solution for this that you would like to share?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 19, 2006 at 6:53 pm
Hey Jeff, care to explain how you can expand your formula to any number of columns to compare?
October 19, 2006 at 7:05 pm
Jeff, I see nothing was dealt here.
Key words:
We have been asked to add some additonal calculations into the database which have been specified in Excel.
Guys just copied Excel tables to SQL Server and now want to copy functionality as well.
Just to minimise thinking efforts. And avoid learning some RDMS basic technics.
Wanna be part of this?
_____________
Code for TallyGenerator
October 19, 2006 at 10:12 pm
"Just to minimise thinking efforts"... heck, Serqiy, what effort? I don't think they (the analyst's) thought about it at all and the users that are asking these guys to do this stuff certainly don't have a clue, either. Remember, the key words were...
We have been asked to add some additonal calculations... The analyst has used ...
Translation... "we've been told" by the users/analysts and our manager doesn't know any better so he/she is holding an anvil over our heads to get it done.
Makes the old "T"-shirt saying ring true...
SELECT *
FROM Users
WHERE Clue > 0
(0 row(s) affected)
But I do remember being the poor slob who had to fix or even perpetuate everybody's mistakes and I had to do it as they designed it or they'd find someone else... since jobs don't grow on trees (although I'd certainly be looking for one right after I was given this task), I thought I'd lend them a hand with solving their immediate short-term problem. I normally don't even mention anything about their bad design because, usually, it's too far gone to help remotely and the guy asking the question isn't the one who did the design.
Out for Justice, some friendly advice...
Even though I help out on things like this, I sometimes have to marvel at the requests and suppress the urge to say "yep, I can do that but do these folks really have any idea how much trouble they're in or what?" Serqiy is actually very correct albeit a bit brusk. So, I'll say it with a slightly different tone... You have a much bigger problem because anytime you have to do something like what you requested (min of 3 column values at the row level), it smacks of a denormalized table(s) and a poorly designed database. Don't take that personally... it happens with a lot of databases that have been "designed by users/analysts" that are "familiar with SQL"... they don't really understand that a database is much more than just a dumping ground for poorly formed spreadsheet-like data. And the people that are driving you guys through this project aren't helping much, either.
I probably don't have the time to help you do a full redesign of whatever project you're working on, but if you're worried about the performance on a mere "several thousand records", then a redesign is probably the best advise anyone could give you. Now, most folks will claim they don't have the time for a redesign... I'd suggest, and I'm certain my old friend Serqiy and others would agree, you don't have the time to not to redesign... this will come back to bite you over and over and over.
Consider doing the redesign... if you come up with the same answer as before, get a second opinion because, you've gotta trust us on this one, the basic design you currently have is just not right. If you insist that it's right, then just stick with the spreadsheet... it will serve you better because you don't need the functionality of a database.
I hope you don't think we're just dumping on you... we're just trying to motivate you into redefining the problem in terms of a database instead of a spreadsheet. Of course, I could just be preaching to the choir and you poor buggers are just trying to do what you're told.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 20, 2006 at 5:00 am
Hi everyone and thanks for your comments.
Unfortunately I clearly didn't define my problem clearly enough because we seem to have gone on to a side issue questioning the database design, rather than answering the question I originally posed.
I can promise you that the problem does not lie in the database design; the three fields in question are most definitely not "same meaning values in different columns of the same table". I'm not sure where this idea came from, I clearly didn't explain myself very well at the outset. These numbers are distinct data items that feed into a long set of interrelated calculations, which eventually result in a single number. This single number is not a "correct answer" as such, merely a approximation. What the Excel Solver is able to do is to give us the best approximation (in our case the lowest set of numbers), given the set of constraints we impose. If anyone is interested (and there is no reason why you should be) the set of calculations are based on the Nelson Siegel Approximation, (but go off in several different directions). We have been asked to duplicate this functionality in our system.
The example using the random numbers is fun, but in no way guarantees that we will ever get the best approximation. I believe that the way the solver works is to make an initial guess to the numbers and then modify them up and down to try to find the best set.
I have already spoken to the analyst to explain that what he is asking will be difficult to build. I'm sure these calculations were performed before excel was invented, so maybe there is a manual process that we can copy in code. It may be that T-SQL is not up to the task and we will need to build some kind of add-on, but I was hoping that someone may have already come across this problem before and that there might be a product already out there we could use.
Thanks again for taking the time to think about this, any further comments or thoughts will always be greatly appreciated!
October 20, 2006 at 5:48 am
Sure we'll keep helping.
I suggest you post some sample data from the table and the required results. That way we will be able to give you the correct query.
Also why did jeff or my solution didn't work in this case?
October 20, 2006 at 5:58 am
One thing is clear enough:
you cannot compare apples to oranges.
If you compare some values then they all are either apples or oranges.
According to Relational Data model all apples must be stored in one and only one column. Same for oranges.
You are putting them into different columns. That's a root of all your problems.
If you asked to move solution from Excel to SQL then you need to move from spreadsheet to relational data storage.
Start with placing data correctly - and it will return you a favour.
_____________
Code for TallyGenerator
October 20, 2006 at 6:24 am
Justice,
You're ok... The "example using the random numbers" is just a setup for test data to show the code works (as explained in the comments) and should not be included in your final code. The code that solves your 3 number min is boiled down to one of the two code snippets (extracted from the answers above)...
SELECT ID,
(((Val01+Val02-ABS(Val01-Val02))/2) + Val03 - ABS(((Val01+Val02- ABS(Val01-Val02))/2) - Val03))/2 AS MinValue
FROM #MyHead
... or ...
SELECT ID, CASE WHEN Val01 < Val02 AND Val01 < Val03 THEN Val01
WHEN Val02 < Val03 AND Val02 < Val01 THEN Val02
ELSE Val03
END AS MinValue
FROM #MyHead
Either will do... Of course, you will need to change Val01, Val02, and Val03 to whatever table column names you actually have and you will need to change the #MyHead table name to the table name or names (with a join) that you actually have.
Both snippets run well although the second one (with CASE) seems to have an advantage in performance.
If you need the min of more than 3 numbers, we'll need to do something a bit different.
Sorry for all the rhetoric about bad design, etc... we just worry about those things, alot.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 20, 2006 at 6:29 am
Jeff, my question still stands... how the heck do I change your formula to apply it on 4 or more columns ?
October 20, 2006 at 7:16 am
Ninja,
Thanks, but unfortunately, I can't post any specific examples of our calculations. The data itself would not be meaningful without the calculations that produced them, and I can't post these calculations as they are covered by a confidentiality agreement. The final number we are looking for is calculated using around 30 other variables, some of which are fixed data items and others which are calculated fields.
Sergiy:
I'm not sure I follow what you are saying. My 3 data items have to be different fields, they have different meanings and are only related by formulae. If for example, you wanted to store an order to buy 10 widgets at 50 each plus 20 for postage, these are 3 separate fields in the database (price, quantity, postage) which are related by a formula : (10 * 50) + 20. You wouldn't put those in the same field would you? This is the same with the 3 numbers I have, they do not represent the same data item. The question is about how to calculate particular values rather than where to put them.
Jeff,
Thanks for your example, what I am trying to achieve has many interdependancies on other data items. The Excel Solver is able to do this in a generic way. It's a bit like solving simultaneous equations.
Viewing 15 posts - 1 through 15 (of 47 total)
You must be logged in to reply to this topic. Login to reply