December 13, 2010 at 5:46 am
In a previous topic I asked whether it was possible to put a limit/cap on a dataset. From the answers on that topic I not only learned that it is, but also picked up major improvements to my logic. Now I want to separate out one problem to get more specific feedback on it.
See the original topic here.
Say you have the following scenario:
Table dbo.Machine with the fields:
MachineId int,
CompanyId int,
UtcTime datetime,
Value int,
Description1 nvarchar(20)
Say you want to know the size of the result set of the following SQL:
Select TOP 10 MachineId,CompanyId,Description1 From dbo.Machine where MachineId <> 1 And CompanyId <=100
How would you do it? Take into account that the solution must work for other possible selects as well; in the end it should be able to work with dynamic SQL.
Anyone that has any ideas please contribute.
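For anyone who wants to try this out, the scenario above could be set up with something like the following sketch (only the column list was given in the original post; no keys or constraints are specified):

```sql
-- Sketch of the table described above; column types are as listed.
-- No keys or constraints were given, so none are assumed here.
CREATE TABLE dbo.Machine
(
    MachineId    int,
    CompanyId    int,
    UtcTime      datetime,
    Value        int,
    Description1 nvarchar(20)
);
```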
December 14, 2010 at 2:22 am
In order to get at least some feedback, I'm including some sample data (note that this data is complete nonsense, made up just for this scenario):
insert into dbo.Machine values
(123,88,GETDATE(),0,'LOW GROUND'),
(1,100,GETDATE(),0,'SIERR'),
(1,170,GETDATE(),0,'ABE8'),
(2,101,GETDATE(),0,'NORWAY'),
(2,100,GETDATE(),0,'NORWAY 2'),
(556,22,GETDATE(),0,'NOW OR NEVER'),
(1,677,GETDATE(),0,'NOT VALID DESCRIPTI'),
(2,44,GETDATE(),0,'VALID DESCRIPTION'),
(7,7,GETDATE(),7,'7'),
(123,123,GETDATE(),123,'123'),
(0,0,GETDATE(),0,'0'),
(1,1,GETDATE(),1,'SPLIT PEA SOUP'),
(2,88,GETDATE(),0,'FOWARD'),
(130,5,GETDATE(),999,'HIGH GROUND')
The result of the query
Select TOP 10 MachineId,CompanyId,Description1 From dbo.Machine where MachineId <> 1 And CompanyId <=100
is as follows
MachineId  CompanyId  Description1
---------  ---------  -----------------
123        88         LOW GROUND
2          100        NORWAY 2
556        22         NOW OR NEVER
2          44         VALID DESCRIPTION
7          7          7
0          0          0
2          88         FOWARD
130        5          HIGH GROUND
Now the idea is to calculate the size of this result set without using DATALENGTH.
December 14, 2010 at 2:54 am
Have you created the table of data types and their widths like I suggested in the other thread? And why don't you want to use DATALENGTH?
I'll tell you how I'd do it. I'd have a table enumerating all the possible varchar and nvarchar columns. Once a day at a quiet time (or possibly more often, depending on the volatility of your data) I'd run a job that runs something like the following for each column and stores the results in another table:
SELECT AVG(DATALENGTH(MyColumn)) FROM MyTable (NOLOCK)
You can then use those results to estimate the size of the result set based on the columns contained in it and the average size of the data in those columns.
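As a sketch of that approach (the table, column, and value names below are made up for illustration, not from the original posts):

```sql
-- Hypothetical metadata table holding the daily averages.
CREATE TABLE dbo.ColumnAvgLength
(
    TableName  sysname,
    ColumnName sysname,
    AvgBytes   int,
    CapturedAt datetime DEFAULT GETDATE()
);

-- Run once a day per variable-length column, e.g. for dbo.Machine:
INSERT INTO dbo.ColumnAvgLength (TableName, ColumnName, AvgBytes)
SELECT 'dbo.Machine', 'Description1', AVG(DATALENGTH(Description1))
FROM dbo.Machine WITH (NOLOCK);
```

The report code can then estimate bytes per row by adding the fixed widths of the int columns (4 bytes each) to the stored average for each variable-length column in the select list, and multiplying by the expected row count.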
Note on the use of the NOLOCK hint: I would normally never advocate the use of this in production code because of the danger of dirty reads. However, in this instance, we are looking for an estimate of the size of the data and therefore it doesn't matter if the occasional dirty read means we get slightly wrong results once in a while. It's more important that this process doesn't interfere with the normal OLTP activities in your database.
John
December 15, 2010 at 1:37 am
OK, I didn't answer yesterday since I had a ton of other stuff to do.
No, I haven't created a table yet to hold the sizes for each data type;
I was hoping for some more feedback before doing anything drastic.
I don't want to use DATALENGTH because these requests will indeed go over a network and it might affect network traffic too much.
December 15, 2010 at 1:47 am
Sorry, you've lost me. Why would using DATALENGTH to determine row size affect the amount of traffic that goes across the network?
John
December 15, 2010 at 2:30 am
This will all ultimately serve to build a semi-dynamic report, so there might be many requests (I read somewhere that the use of DATALENGTH could affect network performance and throughput in such a scenario).
December 15, 2010 at 2:34 am
I'd like to see the link for that if you still have it, please. I can't see how it would affect network performance, although I admit that it is likely to slow down whatever routine you use to calculate the size of the data. That's why I recommended doing it just once a day and storing the results in a table.
John
December 15, 2010 at 5:51 am
I don't know where I read it any more and don't have the link. As for calculating it daily, I don't think that's an option, as some databases on our server contain tables with over a million records.
December 15, 2010 at 7:35 am
Resender (12/15/2010)
As for calculating it daily, I don't think that's an option, as some databases on our server contain tables with over a million records.
Well, the alternative is to calculate it every time you run the stored procedure. I know which I'd rather do.
John
December 16, 2010 at 7:04 am
I've been thinking: is there (a) a way to calculate the size a temp table takes up,
and (b) a way to cap the size of a temp table?
Of course, the new problem that arises would be creating a temp table with dynamic columns.
December 16, 2010 at 7:14 am
You can do:
EXEC tempdb.dbo.sp_spaceused #MyTempTable
I suppose once you have the size of the temp table you could estimate how many rows would add up to your size cutoff. The only drawback to this is that you may be extracting a lot of rows from the database that you will ultimately just throw away.
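For example, the sp_spaceused output could be captured into a table so the estimate can be computed in code. This is a sketch: #Results is a made-up temp table name, and note that sp_spaceused returns its sizes as strings such as '16 KB', which need parsing:

```sql
-- Capture sp_spaceused output for a temp table (#Results is illustrative).
-- The tempdb.dbo. prefix makes the system procedure resolve temp tables.
CREATE TABLE #SpaceUsed
(
    name       sysname,
    rows       varchar(20),
    reserved   varchar(20),
    data       varchar(20),
    index_size varchar(20),
    unused     varchar(20)
);

INSERT INTO #SpaceUsed
EXEC tempdb.dbo.sp_spaceused '#Results';

-- 'data' comes back as e.g. '16 KB'; strip the suffix to get bytes.
SELECT CAST(rows AS int)                             AS RowsInTable,
       CAST(REPLACE(data, ' KB', '') AS int) * 1024  AS DataBytes
FROM #SpaceUsed;
```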
John
December 16, 2010 at 8:06 am
Yeah, I more or less came to the same conclusion.
A way to solve my issue would be to let the stored procedure run, put the result in a temp table (dynamically tailored to fit my end use), and limit the number of rows or the size of this temp table.
Isn't that always a drawback, though? You can calculate the size of the recordset, but you can't really cap it.
A very ugly solution (and I mean ugly in every sense of the word) is to use a temp table to calculate the total size of the recordset, divide that by the number of rows, and then calculate the number of rows that fit before the cutoff. (Of course you could then just select TOP X * from #TempTable.)
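Sketched out, that calculation might look like this. All variable names and values here are illustrative; in practice the total size and row count would come from measuring the temp table (e.g. via sp_spaceused):

```sql
-- Sketch of the "divide and cut off" approach described above.
DECLARE @TotalBytes int = 58000;  -- measured size of #TempTable's data
DECLARE @RowsInTemp int = 1450;   -- rows in #TempTable
DECLARE @CapBytes   int = 20000;  -- desired cap on the result size

-- Average bytes per row, then how many whole rows fit under the cap.
DECLARE @AvgRowBytes int = @TotalBytes / NULLIF(@RowsInTemp, 0);
DECLARE @TopN        int = @CapBytes / NULLIF(@AvgRowBytes, 0);

SELECT TOP (@TopN) * FROM #TempTable;
```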
December 17, 2010 at 1:17 am
This little project has been terminated since we'll be getting new projects in on Monday.
I'll be posting the entire code and files later today, so that those who are interested can work on it.