Split Table while keeping only unique records

  • i have a table with the following fields

    [GUID] [uniqueidentifier] ROWGUIDCOL NOT NULL,

    [VehicleGUID] [uniqueidentifier] NOT NULL,

    [TimeStamp] [datetime] NOT NULL,

    [X] [float] NULL,

    [Y] [float] NULL,

    [Accuracy] [tinyint] NOT NULL,

    [Speed] [float] NOT NULL,

    [Heading] [smallint] NOT NULL,

    [CreationDate] [datetime] NULL,

    [Battery] [tinyint] NULL

    the key of the table is (VehicleGUID,TimeStamp)

    which holds GPS data of vehicles. the table holds over 50 million records and takes a lot of disk space (this table with all it's indexes and a few other fields not listed take over 40GB)

    i have discovered that the Battery field changes roughly for every vehicle once an hour while the table itself has data every minute from the GPS for that vehicle. and i have also discovered that the battery field on some vehicles is null as the GPS does not send battery info.

    to save disk space, i want to split the battery field to another table and i also want to keep only the value changes so i will have a table structure like this

    GPS Data Table:

    [GUID] [uniqueidentifier] ROWGUIDCOL NOT NULL,

    [VehicleGUID] [uniqueidentifier] NOT NULL,

    [TimeStamp] [datetime] NOT NULL,

    [X] [float] NULL,

    [Y] [float] NULL,

    [Accuracy] [tinyint] NOT NULL,

    [Speed] [float] NOT NULL,

    [Heading] [smallint] NOT NULL,

    [CreationDate] [datetime] NULL

    Battery Changes Table:

    [VehicleGUID] [uniqueidentifier] NOT NULL,

    [TimeStamp] [datetime] NOT NULL,

    [Battery] [tinyint] NULL,

    the Battery Changes Table will hold only dates where the battery value actually changed.

    obviously, this design will save a lot of disk space but i encountered a problem:

    how do i run a query on the tables that will return data as it was before the table split (i.e. for every record in the GPS data table, find a record with the same vehicle Guid and a the closest earlier row in the Battery Changes Table and use the battery from that row)

    i know that in this specific case the disk space saved is small but i am talking about the concept. i am going to implement this idea over fields which will save much more data per row

    Dani Avni
    OfficeCore

  • Can you supply some sample data (both what you have and how you want it to look)? I think I understand what you're looking for, but I don't have time to go through your table setup. If we can figure out from a quick glance what you want, we can help you out a lot easier.

    Thanks!

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • for simplicity assume i have this data

    IDDateXYData

    11/1/2000 00:00:00121

    21/1/2000 00:01:00341

    31/1/2000 00:02:00561

    41/1/2000 01:00:00562

    51/1/2000 01:01:00782

    61/1/2000 01:02:009102

    as you can see the Data field only changes twice (00:00 and 01:00). so i want to create two tables:

    the actual data

    IDDateXY

    11/1/2000 00:00:0012

    21/1/2000 00:01:0034

    31/1/2000 00:02:0056

    41/1/2000 01:00:0056

    51/1/2000 01:01:0078

    61/1/2000 01:02:00910

    And another table for the changes in the Data field

    DateData

    1/1/2000 00:00:001

    1/1/2000 01:00:002

    my question is assuming i have these two tables, how do i write a query that will return results like the original table (i.e. Data field in every row)

    Dani Avni
    OfficeCore

  • I think what you are looking for is an INNER JOIN,

    Try this query:

    Select Table1.VehicleGUID, Table2.TimeStamp As [Battry Changed On] From GPSDataTable

    Inner Join BattryChangeTable Table2 ON Table2.VehicleGUID = Table1.VehicleGUID

    Vishal Gajjar
    http://SqlAndMe.com

  • Its purely inner join with "case when" condition.

    column should be fetched from primary table when there is no record available for that vehicleGUID in battery table, otherwise you just need to get it from the battery table.

    do you need query for it?

  • yes please. a sample would be much appriciated.

    thanks.

    Dani Avni
    OfficeCore

  • declare @positons table(

    vehicle_ID smallint,

    date datetime,

    x smallint,

    y smallint)

    insert into @positons

    select 1, '1/1/2000 00:00:00', 1, 2 union all

    select 1, '1/1/2000 00:01:00', 3, 4 union all

    select 1, '1/1/2000 00:02:00', 5, 6 union all

    select 1, '1/1/2000 01:00:00', 5, 6 union all

    select 1, '1/1/2000 01:01:00', 7, 8 union all

    select 1, '1/1/2000 01:02:00', 9, 10

    declare @battery_change table(

    vehicle_ID smallint,

    date datetime,

    change smallint)

    insert into @battery_change

    select 1,'1/1/2000 00:00:00', 1 union all

    select 1,'1/1/2000 01:00:00', 0

    select p.*,(select change from @battery_change where (vehicle_id = p.vehicle_id) and date =(select max(date) from @battery_change bc where bc.date <= p.date and bc.vehicle_id = p.vehicle_id)) battery_change from @positons p

  • Regarding the table size itself:

    I noticed you're using FLOAT quite often (e.g. for [X], [Y], and [Speed]). Float will use 8 byte.

    You could change the Speed column to smallint (2 byte). I'm not sure if X and Y are integer values or not and what the scale is. But it might be possible to adjust those data types as well.

    Regarding your indexes: Assuming SS2K5 or above, you might benefit from moving some columns to the INCLUDE part of the index. This will not only reduce the required space but will also improve performance (if done correctly) since the index will be more narrow.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi Lutz,

    thanks for the message. these tables where built by me about 6-7 years ago when i didn't know too much SQL. when the tables where small and held thousands of records the bytes wasted weren't too much a deal. now i have tens of millions of records and suddenly every byte counts. i am in the process of complete rewrite of this table looking to remove every byte possible.

    for example the speed can be even reduced to 1 byte (after all cars do not travel at speeds > 255 Km/h) and even if they do i can still store it in 1 byte which will hold the speed divided by 2 (permitting speeds of up to 510 Km/h)

    Dani Avni

    Dani Avni
    OfficeCore

  • danavni (12/30/2010)


    Hi Lutz,

    thanks for the message. these tables where built by me about 6-7 years ago when i didn't know too much SQL. when the tables where small and held thousands of records the bytes wasted weren't too much a deal. now i have tens of millions of records and suddenly every byte counts. i am in the process of complete rewrite of this table looking to remove every byte possible.

    for example the speed can be even reduced to 1 byte (after all cars do not travel at speeds > 255 Km/h) and even if they do i can still store it in 1 byte which will hold the speed divided by 2 (permitting speeds of up to 510 Km/h)

    Dani Avni

    I hope you'll never have to track data for a PORSCHE on the German Autobahn... :-D:-D

    (Max. speed >300km/h on the road. Trust me. Personal experience.)

    I'd strongly vote against storing data in a non-natural format just to save one byte. If you forget to re-calculate it a single sproc or view you'll display plain wrong data.

    I'm sure there's noone around without the experience you're going through right now... But the number of people actually rewriting their own code is much smaller... Good luck!!



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply