How best to calculate the size of a result set (in KB) without using DATALENGTH

  • In a previous topic I asked whether it was possible to put a limit/cap on a dataset. From the answers on that topic I not only learned that it is possible, but also picked up major improvements to my logic. Now I want to split out one problem to get more feedback on it specifically.

    See the original topic here.

    Let's say you have the following scenario:

    Table dbo.Machine with the fields:

    MachineId int,
    CompanyId int,
    UtcTime datetime,
    Value int,
    Description1 nvarchar(20)
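    For reference, the fields above as a minimal table definition (names and types are taken from the post; no keys or constraints are mentioned in the original, so none are assumed):

    ```sql
    CREATE TABLE dbo.Machine (
        MachineId    int,
        CompanyId    int,
        UtcTime      datetime,
        Value        int,
        Description1 nvarchar(20)
    );
    ```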

    Let's say you want the size of the result set of the following SQL:

    SELECT TOP 10 MachineId, CompanyId, Description1 FROM dbo.Machine WHERE MachineId <> 1 AND CompanyId <= 100

    How would you do it? Take into account that the solution must work for other possible selects as well.

    In the end the solution should be able to work with dynamic SQL.

    Anyone who has any ideas, please contribute.

  • In order to get at least some feedback I'm including some data (note that this data is complete nonsense, just made up for this scenario):

    insert into dbo.Machine values (123,88,GETDATE(),0,'LOW GROUND')
    insert into dbo.Machine values (1,100,GETDATE(),0,'SIERR')
    insert into dbo.Machine values (1,170,GETDATE(),0,'ABE8')
    insert into dbo.Machine values (2,101,GETDATE(),0,'NORWAY')
    insert into dbo.Machine values (2,100,GETDATE(),0,'NORWAY 2')
    insert into dbo.Machine values (556,22,GETDATE(),0,'NOW OR NEVER')
    insert into dbo.Machine values (1,677,GETDATE(),0,'NOT VALID DESCRIPTI')
    insert into dbo.Machine values (2,44,GETDATE(),0,'VALID DESCRIPTION')
    insert into dbo.Machine values (7,7,GETDATE(),7,'7')
    insert into dbo.Machine values (123,123,GETDATE(),123,'123')
    insert into dbo.Machine values (0,0,GETDATE(),0,'0')
    insert into dbo.Machine values (1,1,GETDATE(),1,'SPLIT PEA SOUP')
    insert into dbo.Machine values (2,88,GETDATE(),0,'FOWARD')
    insert into dbo.Machine values (130,5,GETDATE(),999,'HIGH GROUND')

    The result of the query

    SELECT TOP 10 MachineId, CompanyId, Description1 FROM dbo.Machine WHERE MachineId <> 1 AND CompanyId <= 100

    is as follows:

    MachineId | CompanyId | Description1
    ----------|-----------|------------------
    123       | 88        | LOW GROUND
    2         | 100       | NORWAY 2
    556       | 22        | NOW OR NEVER
    2         | 44        | VALID DESCRIPTION
    7         | 7         | 7
    0         | 0         | 0
    2         | 88        | FOWARD
    130       | 5         | HIGH GROUND

    Now the idea is to calculate the size of that result without using DATALENGTH.

  • Have you created the table of data types and their widths like I suggested in the other thread? And why don't you want to use DATALENGTH?

    I'll tell you how I'd do it. I'd have a table enumerating all the possible varchar and nvarchar columns. Once a day at a quiet time (or possibly more often, depending on the volatility of your data) I'd run a job that runs something like the following for each column and stores the results in another table:

    SELECT AVG(DATALENGTH(MyColumn)) FROM MyTable (NOLOCK)

    You can then use those results to estimate the size of the result set based on the columns contained in it and the average size of the data in those columns.
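    A rough sketch of that approach (the stats table, its column names, and the 4-byte figure for int are illustrative assumptions, not something from the thread):

    ```sql
    -- Hypothetical stats table, populated once a day by the job
    CREATE TABLE dbo.ColumnSizeStats (
        TableName     sysname,
        ColumnName    sysname,
        AvgDataLength int        -- average DATALENGTH in bytes
    );

    -- The job runs one such statement per (n)varchar column
    INSERT INTO dbo.ColumnSizeStats (TableName, ColumnName, AvgDataLength)
    SELECT 'Machine', 'Description1', AVG(DATALENGTH(Description1))
    FROM dbo.Machine (NOLOCK);

    -- Estimating a row of the example query: fixed-width columns contribute
    -- their declared size (int = 4 bytes), variable-width columns their
    -- stored average
    SELECT 4 + 4 + AvgDataLength AS EstimatedBytesPerRow
    FROM dbo.ColumnSizeStats
    WHERE TableName = 'Machine' AND ColumnName = 'Description1';
    ```

    Multiplying the estimated bytes per row by the expected row count then gives the estimated size of the result set.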

    Note on the use of the NOLOCK hint: I would normally never advocate the use of this in production code because of the danger of dirty reads. However, in this instance, we are looking for an estimate of the size of the data and therefore it doesn't matter if the occasional dirty read means we get slightly wrong results once in a while. It's more important that this process doesn't interfere with the normal OLTP activities in your database.

    John

  • OK, I didn't answer yesterday since I had a ton of other stuff to do.

    No, I haven't created a table yet to hold the sizes for each datatype;

    I was hoping for some more feedback before doing anything drastic.

    I don't want to use DATALENGTH because these requests will indeed go over a network and it might affect network traffic too much.

  • Sorry, you've lost me. Why would using DATALENGTH to determine row size affect the amount of traffic that goes across the network?

    John

  • Since this will in the end all serve to build a semi-dynamic report, there might be many requests (I read somewhere that the use of DATALENGTH could affect network performance and throughput in such a scenario).

  • I'd like to see the link for that if you still have it, please. I can't see how it would affect network performance, although I admit that it is likely to slow down whatever routine you use to calculate the size of the data. That's why I recommended doing it just once a day and storing the results in a table.

    John

  • I don't know where I read it any more and don't have the link. As for calculating it daily, I don't think that's an option, as some databases on our server contain tables with over a million records.

  • Resender (12/15/2010)


    as for calculating it daily i don't think that's an option as some databases on our server contain tables with over a million records.

    Well, the alternative is to calculate it every time you run the stored procedure. I know which I'd rather do.

    John

  • I've been thinking: is there a) a way to calculate the size a temp table has,

    and b) a way to cap the size of a temp table?

    Of course, the new problem that arises would be creating a temp table with dynamic columns.

  • You can do:

    EXEC tempdb.dbo.sp_spaceused #MyTempTable

    I suppose once you have the size of the temp table you could estimate how many rows would add up to your size cutoff. The only drawback to this is that you may be extracting a lot of rows from the database that you will ultimately just throw away.

    John

  • Yeah, I more or less came to the same conclusion.

    A way to solve my issue would be to let the stored procedure run, put the result in a temp table (dynamically tailored to fit my end use), and limit the number of rows or the size of this temp table.

    Isn't it always a drawback then? You can calculate the size of the recordset, but you can't really cap it.

    A very ugly solution (and I mean ugly in every sense of the word) is to use a temp table to calculate the total size of the recordset, divide that by the number of rows, and then calculate the number of rows before the cut-off. (Of course you could then just select TOP X * from #TempTable.)
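    That "ugly" approach can be sketched roughly like this (the 50 KB cap and the temp table names are made up for illustration; note that sp_spaceused reports sizes as strings such as '16 KB', so the value has to be parsed):

    ```sql
    -- 1. Materialise the full result in a temp table
    SELECT MachineId, CompanyId, Description1
    INTO #Result
    FROM dbo.Machine
    WHERE MachineId <> 1 AND CompanyId <= 100;

    -- 2. Capture the output of sp_spaceused for the temp table
    CREATE TABLE #SpaceUsed (
        name nvarchar(128), rows char(11), reserved varchar(18),
        data varchar(18), index_size varchar(18), unused varchar(18));
    INSERT INTO #SpaceUsed
    EXEC tempdb.dbo.sp_spaceused '#Result';

    DECLARE @rows int, @dataKb int;
    SELECT @rows   = CAST(rows AS int),
           @dataKb = CAST(REPLACE(data, ' KB', '') AS int)
    FROM #SpaceUsed;

    -- 3. Average size per row, then the number of rows under the cap
    DECLARE @capKb int = 50;
    DECLARE @maxRows int = @capKb * @rows / NULLIF(@dataKb, 0);

    -- 4. Return only the rows that fit
    SELECT TOP (@maxRows) MachineId, CompanyId, Description1
    FROM #Result;
    ```

    As noted above, the drawback remains: the full result set is extracted into tempdb first, and the rows over the cap are simply thrown away.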

  • This little project has been terminated, since we'll be getting new projects in on Monday.

    I'll be posting the entire code and files later today, so that those who are interested can work on it.
