Calculations in query

  • Hi,

    I have the following schema and sample data.

    CREATE TABLE [dbo].[sscSignalTypes](

    [signalTypeId] [varchar](3) NOT NULL,

    [description] [varchar](20) NOT NULL,

    [entrySignal] [bit] NOT NULL,

    CONSTRAINT [PK_sscSignalTypes] PRIMARY KEY CLUSTERED

    (

    [signalTypeId] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    INSERT INTO [sscSignalTypes] ([signalTypeId],[description],[entrySignal])VALUES('IE','Initial Entry',1)

    INSERT INTO [sscSignalTypes] ([signalTypeId],[description],[entrySignal])VALUES('MX','Minor Exit',0)

    INSERT INTO [sscSignalTypes] ([signalTypeId],[description],[entrySignal])VALUES('SC','Scale in',1)

    INSERT INTO [sscSignalTypes] ([signalTypeId],[description],[entrySignal])VALUES('XXX','Final Exit',0)

    GO

    CREATE TABLE [dbo].[sscOrders](

    [orderId] [int] NOT NULL,

    [tradeId] [int] NOT NULL,

    [units] [tinyint] NOT NULL,

    [side] [char](1) NOT NULL,

    [price] [decimal](18, 6) NOT NULL,

    [signalTypeId] [varchar](3) NOT NULL,

    [orderDateTime] [datetime] NOT NULL,

    CONSTRAINT [PK_sscOrders] PRIMARY KEY CLUSTERED

    (

    [orderId] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    go

    ALTER TABLE [dbo].[sscOrders] WITH CHECK ADD CONSTRAINT [FK_sscOrders_signalTypes] FOREIGN KEY([signalTypeId])

    REFERENCES [dbo].[sscSignalTypes] ([signalTypeId])

    GO

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(666,180,2,'S',1.410600,'IE','Jun 3 2010 3:00:00:000PM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(667,180,1,'B',1.397800,'MX','Jun 4 2010 3:00:00:000PM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(668,180,2,'S',1.389600,'SC','Jun 4 2010 7:00:00:000PM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(669,180,1,'B',1.388400,'MX','Jun 7 2010 3:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(670,180,1,'B',1.388800,'MX','Jun 8 2010 3:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(671,180,1,'B',1.379700,'XXX','Jun 8 2010 7:00:00:000PM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(672,181,2,'S',1.383400,'IE','Jun 17 2010 7:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(673,181,1,'B',1.377300,'MX','Jun 17 2010 7:00:00:000PM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(674,181,2,'S',1.371600,'SC','Jun 18 2010 11:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(675,181,1,'B',1.373300,'MX','Jun 21 2010 7:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(676,181,2,'S',1.367500,'SC','Jun 21 2010 11:00:00:000PM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(677,181,1,'B',1.359800,'MX','Jun 23 2010 7:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(678,181,1,'B',1.346400,'MX','Jun 25 2010 3:00:00:000PM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(679,181,2,'B',1.325100,'XXX','Jun 30 2010 7:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(680,182,2,'B',1.371200,'IE','Aug 3 2010 7:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(681,182,1,'S',1.382500,'MX','Aug 5 2010 7:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(682,182,2,'B',1.388500,'SC','Aug 10 2010 11:00:00:000AM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(683,182,1,'S',1.383400,'MX','Aug 10 2010 3:00:00:000PM')

    INSERT INTO [sscOrders] ([orderId],[tradeId],[units],[side],[price],[signalTypeId],[orderDateTime])VALUES(684,182,2,'S',1.378500,'XXX','Aug 11 2010 3:00:00:000AM')

    GO

    I need a query that will return all columns from the sscOrders table, but I want to add an extra column called profit. The profit needs to be calculated when the entrySignal column in sscSignalTypes is 0. sscOrders joins to sscSignalTypes on signalTypeId, for each tradeId in sscOrders. When the entrySignal column is 1 leave the data as NULL.

    Profit is calculated as (price when signalTypeId = 0) - (earliest price when signalTypeId = 1) for each tradeId. The problem lies that for the second profit calculation (i.e. if there are more than one signalTypes for a tradeId) I need to use the price from the second entry. This is further complicated by the fact that entries and exits can be taken in units.

    For example tradeId 182

    Initial entry of BUY 2 units at 1.3712

    MX of SELL 1 unit at 1.3825 (profit of 1.3825-1.3712 = 113)

    SC of BUY 2 units at 1.3885

    MX of SELL 1 unit at 1.3834 (profit of 1.3834 - 1.3712 = 122)

    XXX of SELL 2 units at 1.3785 (profit of 2 * (1.3785 - 1.3885)) = -200

    Notice that the XXX needs to use the price from the SC for the calc because the MXs were used for the IE entry.

    The column called 'side' means S=SELL, B=BUY.

  • Wow CELKO, what a great reply, thanks! I learnt a lot from this. You asked for an example, well I gave one with trade_id 182. I can walk through it again to make it clear.

    A trade order with signal type 'IE' always has 2 units and in this example the price was 1.3712. When an order with signal type MX is executed, only 1 unit is removed from the market at price 1.3825. So, there is still one unit "open" in the market. ANOTHER entry is then made with a further 2 units at price 1.3885, so at this point in time there are 3 units open in the market. Then, an MX signal type order is executed and 1 unit is removed at price 1.3834, leaving 2 units open in the market. Finally, an XXX order is executed and the final 2 units are removed at price 1.3785.

    I need to know the profit when a unit is removed from the trade - from my data these are MX and XXX signal types. Does that clear it up? Let me know if you need further info, and thanks very much for your time and insight.

    Your point about whether to use a CHECK constraint over lookup tables is something I struggle with a lot.

    CELKO (6/6/2011)


    Thank yoyu for the DDL; unfortunately, you don't know the ISO-11170 rules and have vague or improper element names; There is NOT such thing as a “type_id” ; do yoyu have a “bloodf_type” or a “blood_type_id” – see how silly that is?

    Also, most encoding are fixed length. We do NOT use BIT flags in SQL; that was Assembly language. SQL is a “predicate language” that discovers a fact, instead of marking it on a punch card.

    You might also want to look at the bad effects of camelCase and why everyone dropped it. Not as bad a Hungarian notation, but still awful. The ISO convention is underscores that separate the attribute, attribute properties and roles.

    Also, please use ISO-8601 temporal formats and not dialect. Is units the unit of measure or the number of something in the trade? I gather that “trade_side” is short for “buy/sell”? The last such databases I did used plus and minus on the share count – it made the math easier and the SQL much easier to write. This kind of flag is usual the old accountant's (DB/CR) mindset that came from not having negative numbers (well, actually, this started when they used Roman Numerals).

    Do you know about the new temporal types in SQL-2008? Do you really keep data to the fractional second?

    Why not let the exit codes begin with 'X' and the in's begin with 'I' and skip that silly bit flag? An alphabetic Dewey decimal encoding! This design flaw is called attribute splitting; the meaning of an atomic value is split into two scalars.

    CREATE TABLE SSC_Signal_Types

    (signal_type CHAR(2) NOT NULL PRIMARY KEY

    ssc_type_description VARCHAR(20) NOT NULL)

    INSERT INTO SSC_Signal_Types

    VALUES('IE', 'Initial Entry'), ('IS', 'Scale in'),

    ('XM', 'Minor Exit'), ('XX', 'Final Exit');

    Frankly, with so few codes and their static nature, I would use a CHECK() instead of a REFERENCES to save a table and implied join. But you decide ..

    CREATE TABLE SSC_Orders

    (order_id INTEGER NOT NULL PRIMARY KEY,

    trade_id INTEGER NOT NULL,

    trade_unit_cnt SMALLINT NOT NULL,

    unit_price DECIMAL (18,6) NOT NULL,

    signal_type VARCHAR(3) NOT NULL

    REFERENCES SSC_SignalTypes (signal_type),

    order_timestamp DATETIME2(0) NOT NULL);

    INSERT INTO SSC_Orders

    VALUES(666, 180, 2, -1.410600, 'IE', '2010-06-03 15:00:00'),

    (667, 180, 1, +1.397800, 'XM', '2010-06-04 15:00:00'),

    (668, 180, 2, -1.389600, 'IS', '2010-06-04 19:00:00'),

    (669, 180, 1, +1.388400, 'XM', '2010-06-07 03:00:00'),

    (670, 180, 1, +1.388800, 'XM', '2010-06-08 03:00:00'),

    (671, 180, 1, +1.379700, 'XX', '2010-06-08 19:00:00'),

    (672, 181, 2, -1.383400, 'IE', '2010-06-17 07:00:00'),

    (673, 181, 1, +1.377300, 'XM', '2010-06-17 19:00:00'),

    (674, 181, 2, -1.371600, 'IS', '2010-06-18 11:00:00'),

    (675, 181, 1, +1.373300, 'XM', '2010-06-21 07:00:00'),

    (676, 181, 2, -1.367500, 'IS', '2010-06-21 23:00:00'),

    (677, 181, 1, +1.359800, 'XM', '2010-06-23 7:00:00'),

    (678, 181, 1, +1.346400, 'XM', '2010-06-25 15:00:00'),

    (679, 181, 2, +1.325100, 'XX', '2010-06-30 07:00:00'),

    (680, 182, 2, +1.371200, 'IE', '2010-08-030 07:00:00'),

    (681, 182, 1, -1.382500, 'XM', '2010-08-05 07:00:00'),

    (682, 182, 2, +1.388500, 'IS', '2010-08-10 11:00:00'),

    (683, 182, 1, -1.383400, 'XM', '2010-08-10 15:00:00'),

    (684, 182, 2, -1.378500, 'XX', '2010-08-11 03:00:00');

    >> I need a query that will return all columns from the SSC_Orders table, but I want to add an extra column called profit. The profit needs to be calculated when the entry_signal column in SSC_Signal_Types is 0. [signal_type LIKE 'X_'] <<

    SSC_Orders joins to SSC_SignalTypes on signal_type, for each trade_id in SSC_Orders. When the entry_signal column is 1 [signal_type LIKE 'I_'] leave the data as NULL. <<

    No need with an improved encoding to do an extra join back to the encoding table:

    CASE WHEN signal_type LIKE 'I_' THEN NULL

    WHEN signal_type LIKE 'X_' THEN ???

    ELSE NULL END

    AS profit

    >> Profit is calculated as (price when signal_type = 0 [signal_type LIKE 'X_') - (earliest price when signal_type = 1 [signal_type LIKE 'I_']) for each trade_id. <<

    Let's find the first exit order in each trade:

    WITH Trade_Exit(trade_id, unit_price, order_timestamp)

    AS

    (SELECT trade_id, unit_price, order_timestamp

    FROM SSC_Orders

    WHERE signal_type LIKE 'X_'),

    First_Trade_Exit(trade_id, unit_price)

    AS

    (SELECT X1.trade_id, X1.unit_price

    FROM Trade_Exit AS X1

    WHERE X1.order_timestamp

    = (SELECT MIN(X2.order_timestamp)

    FROM Trade_Exit AS X2

    WHERE X1.trade_id = X2.trade_id))

    << main query here >>;

    >> The problem lies that for the second profit calculation (i.e. if there are more than one signal_types for a trade_id) I need to use the price from the second entry. This is further complicated by the fact that entries and exits can be taken in units. <<

    I am not sure what that means; can you give an example ?

  • Joe, this is essentially a running totals problem further complicated by the need to associate each sale with a buying price. It's like FIFO inventory. The problem is that some "lots" of purchases will be split between two or more sales and some sales will cover inventory from two or more purchases with different prices.

    Keymoo, I have an approach I'd like to try but it will have to wait until I finish with my day job. I make no guarantees about how fast it will run if it works.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks, I'm wondering if I should be doing this in the application layer in C#. :-/ However it makes sense to me to do calculations on sets of data close to the data, in the database itself.

    The Dixie Flatline (6/6/2011)


    Joe, this is essentially a running totals problem further complicated by the need to associate each sale with a buying price. It's like FIFO inventory. The problem is that some "lots" of purchases will be split between two or more sales and some sales will cover inventory from two or more purchases with different prices.

    Keymoo, I have an approach I'd like to try but it will have to wait until I finish with my day job. I make no guarantees about how fast it will run if it works.

  • Frankly, this strikes me as the sort of problem that might be better done on the client side, but you can test this logic out.

    Basically, the query below uses CTEs to divide sscOrders into Sales and Buys, ordering each by orderDateTime within tradeID, and most importantly giving a sequential number to each unit bought or sold. It then joins the first unit bought to the first unit sold, the second unit bought to the second unit sold, etc. With the units joined in this fashion, profit is simple to calculate (unit sell price - unit buy price). Grouping units back together by tradeID and orderID gives the total profit for each order ID.

    Let us know if this gives the correct results and then let us know if it performs at your production volumes.

    This may be a set based solution, but it does a LOT of cranking to join up the units.

    With proper indexing, it might be acceptable for handling a singe tradeID at a time. But if you are wanting to run it against a few million rows it will drag. Also, the volume of units will result in a huge number of rows being generated and I suspect you have trades ranging from thousands to millions of units. When I increased your buys and sales from units to blocks of 100,000 to 200,000, it took 30 seconds for my system to produce results.

    ; with Buys as (

    select tradeID,orderID,price,ROW_NUMBER() over(partition by tradeID order by orderDateTime,N) as BuyUnit

    from dbo.SSCOrders Ord

    join dbo.SSCSignalTypes ST on ST.signalTypeID = ord.signalTypeID

    cross join tally

    where entrySignal = 1 and N <= units

    )

    ,Sales as (

    select tradeID,orderID,price,ROW_NUMBER() over(partition by tradeID order by orderDateTime,N) as SellUnit

    from dbo.SSCOrders Ord

    join dbo.SSCSignalTypes ST on ST.signalTypeID = ord.signalTypeID

    cross join tally

    where entrySignal = 0 and N <= units

    )

    ,Profit as (

    select s.*,b.price as buying_price, s.price-b.price as profit

    from Sales s

    join Buys b on b.tradeID = s.tradeID and b.BuyUnit = s.SellUnit

    )

    select tradeID,orderID,count(*) as units, SUM(profit) as profit

    from Profit

    group by tradeid,orderID

    order by tradeID,orderID

    It's not perfect. No logic has been included to deal with Sales that exceed Buys for instance. But, if the results so far are correct, the rest can be fleshed out. Hope this helps. If it's too slow, maybe some speed phreaks can give you a quicker solution.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • That's great thanks a lot, I have amended it a little to cope with short trades by including a CASE statement. Thanks for your time.

    The Dixie Flatline (6/6/2011)


    It's not perfect. No logic has been included to deal with Sales that exceed Buys for instance. But, if the results so far are correct, the rest can be fleshed out. Hope this helps. If it's too slow, maybe some speed phreaks can give you a quicker solution.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply