January 14, 2012 at 9:16 am
Hi,
i'm new to t-sql and i have a problem now with row by row comparison in sql.
I already done this things in my C# code,but it has performance issue, so i decide to do processing in database.
This is my data
IF OBJECT_ID('TempDB..#trading','U') IS NOT NULL
DROP TABLE #trading
--===== Create the test table with
CREATE TABLE #trading
(
ID INT,
[Date] DATETIME,
[Time] DATETIME,
[Price] varchar(20),
[AskClient] varchar(8),
[BidClient] varchar(8)
)
--===== Setup any special required conditions especially where dates are concerned
SET DATEFORMAT DMY
INSERT INTO #trading (ID,[Date],[Time],[Price],[AskClient],[BidClient])
SELECT'1','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'2','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100007' Union All
SELECT'3','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100008' Union All
SELECT'4','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100009' Union All
SELECT'5','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100010' Union All
SELECT'6','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100002' Union All
SELECT'7','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100007' Union All
SELECT'8','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100008' Union All
SELECT'9','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100009' Union All
SELECT'10','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100010' Union All
SELECT'11','Jun 26 2011 12:00AM','Jan 14 2012 11:01AM','650','100001','100002' Union All
SELECT'12','Jun 26 2011 12:00AM','Jan 14 2012 11:01AM','650','100001','100007' Union All
SELECT'13','Jun 26 2011 12:00AM','Jan 14 2012 11:01AM','650','100001','100008' Union All
SELECT'14','Jun 26 2011 12:00AM','Jan 14 2012 11:01AM','650','100001','100009' Union All
SELECT'15','Jun 26 2011 12:00AM','Jan 14 2012 11:01AM','650','100001','100010' Union All
SELECT'16','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'17','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100007' Union All
SELECT'18','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100008' Union All
SELECT'19','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100009' Union All
SELECT'20','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100010' Union All
SELECT'21','Jun 28 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100002' Union All
SELECT'22','Jun 28 2011 12:00AM','Jan 14 2012 11:00AM','650','100003','100002' Union All
SELECT'23','Jun 28 2011 12:00AM','Jan 14 2012 11:00AM','650','100004','100002' Union All
SELECT'24','Jun 28 2011 12:00AM','Jan 14 2012 11:00AM','650','100005','100002' Union All
SELECT'25','Jun 28 2011 12:00AM','Jan 14 2012 11:00AM','650','100006','100002' Union All
SELECT'26','Jun 29 2011 12:00AM','Jan 14 2012 11:01AM','650','100001','100002' Union All
SELECT'27','Jun 29 2011 12:00AM','Jan 14 2012 11:01AM','650','100003','100002' Union All
SELECT'28','Jun 29 2011 12:00AM','Jan 14 2012 11:01AM','650','100004','100002' Union All
SELECT'29','Jun 29 2011 12:00AM','Jan 14 2012 11:01AM','650','100005','100002' Union All
SELECT'30','Jun 29 2011 12:00AM','Jan 14 2012 11:01AM','650','100006','100002' Union All
SELECT'31','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'32','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'33','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'34','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'35','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'36','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'37','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100003','100002' Union All
SELECT'38','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100004','100002' Union All
SELECT'39','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100005','100002' Union All
SELECT'40','Jun 30 2011 12:00AM','Jan 14 2012 10:50AM','650','100006','100002' Union All
SELECT'41','Jul 1 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'42','Jul 1 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'43','Jul 1 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'44','Jul 1 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'45','Jul 1 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'46','Jul 3 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'47','Jul 3 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'48','Jul 3 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'49','Jul 3 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'50','Jul 3 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'51','Jul 4 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'52','Jul 4 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'53','Jul 4 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'54','Jul 4 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'55','Jul 4 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002'
My requirement is :
i have to find out what askclient that meet this requirement :
has same Date,Price and AskClient that doing transaction five or more times sequentially.
for example (from above data)
i have transaction
SELECT'6','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100002'
SELECT'7','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100007'
SELECT'8','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100008'
SELECT'9','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100009'
SELECT'10','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100010'
From that transaction i get AskClient 100001 as my suspect.
Could anyone help me with this problem...?
January 14, 2012 at 9:33 am
Blimey! First post, and you've set up a ton of sample data! Top work!
I think this is what you're looking for:
SELECT
Date,
Price,
AskClient,
Dupes = COUNT(*)
FROM #trading
GROUP BY Date, Price, AskClient
HAVING COUNT(*) > 4
There's a lot of duplication in the sample data set, removing some of the dupes may help to verify the code.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 14, 2012 at 11:32 am
ChrisM@home (1/14/2012)
Blimey! First post, and you've set up a ton of sample data! Top work!I think this is what you're looking for:
SELECT
Date,
Price,
AskClient,
Dupes = COUNT(*)
FROM #trading
GROUP BY Date, Price, AskClient
HAVING COUNT(*) > 4
There's a lot of duplication in the sample data set, removing some of the dupes may help to verify the code.
You might want to drop the count to > 1 as there may only be two duplicate rows. The example provided in the original post had more.
Edit: Never mind, missed part of the spec on first read.
January 14, 2012 at 11:37 am
Lynn Pettis (1/14/2012)
You might want to drop the count to > 1 as there may only be two duplicate rows. The example provided in the original post had more.
The original post stated: "has same Date,Price and AskClient that doing transaction five or more times sequentially."
The only improvement that needs to be made is to accommodate the sequentially part.
I assume sequentially means in order of ID/Time.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 14, 2012 at 11:52 am
SQL Kiwi (1/14/2012)
Lynn Pettis (1/14/2012)
You might want to drop the count to > 1 as there may only be two duplicate rows. The example provided in the original post had more.The original post stated: "has same Date,Price and AskClient that doing transaction five or more times sequentially."
The only improvement that needs to be made is to accommodate the sequentially part.
I assume sequentially means in order of ID/Time.
Thanks Paul, I missed that part in my first read. Guess I should read it several times before posting when I first get up in the morning.
January 14, 2012 at 1:14 pm
Ok, I had to create some new sample data to make this a bit clearer to me, but I think this is right:
DECLARE @Trading TABLE
(
ID integer NOT NULL PRIMARY KEY,
[Date] date NOT NULL,
[Time] time NOT NULL,
[Price] smallmoney NOT NULL,
[AskClient] integer NOT NULL,
[BidClient] integer NOT NULL
)
INSERT @Trading
(ID, Date, Time, Price, AskClient, BidClient)
VALUES
(01, '2000-01-01', '08:00:00', $6.50, 1, 1),
(02, '2000-01-01', '09:00:00', $6.50, 1, 2),
(03, '2000-01-01', '10:00:00', $6.50, 2, 3), -- different Ask
(04, '2000-01-01', '11:00:00', $6.50, 1, 4),
(05, '2000-01-01', '12:00:00', $6.50, 1, 5),
(06, '2000-01-01', '13:00:00', $6.50, 1, 1),
(07, '2000-01-01', '14:00:00', $6.70, 1, 2), -- different Price
(08, '2000-01-01', '15:00:00', $6.50, 1, 3),
(09, '2000-01-01', '16:00:00', $6.50, 1, 4),
(10, '2000-01-01', '17:00:00', $6.50, 1, 5),
(11, '2000-01-02', '08:00:00', $6.50, 1, 1), -- different Date
(12, '2000-01-02', '09:00:00', $6.50, 1, 2),
(13, '2000-01-02', '10:00:00', $6.50, 1, 3),
(14, '2000-01-02', '11:00:00', $6.50, 1, 4),
(15, '2000-01-02', '12:00:00', $6.50, 1, 5)
-- Incorrectly reports 2000-01-01
SELECT
Date,
Price,
AskClient,
Dupes = COUNT(*),
MIN(ID),
MAX(ID)
FROM @Trading AS t
GROUP BY Date, Price, AskClient
HAVING COUNT(*) > 4;
-- Correct results
WITH Numbered AS
(
SELECT
GroupID =
ROW_NUMBER() OVER (ORDER BY t.Date, t.Time, t.ID) -
ROW_NUMBER() OVER (PARTITION BY t.AskClient, t.Date, t.Price ORDER BY t.Date, t.Time, t.ID),
*
FROM @Trading AS t
), Ranges AS
(
SELECT
MIN(N.ID) AS range_start_id,
MAX(N.ID) AS range_end_id,
COUNT_BIG(*) AS range_count
FROM Numbered AS N
GROUP BY N.GroupID
HAVING COUNT_BIG(*) >= 5
)
SELECT
t.Date,
t.Price,
t.AskClient,
r.range_start_id,
r.range_end_id,
r.range_count
FROM Ranges AS r
JOIN @Trading AS t ON
t.ID = r.range_start_id
edit: corrected last code line to range_start_id (from 'range_start')
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 14, 2012 at 1:46 pm
Paul, have you compared the performance of this method against a rCTE yet? I must admit I should have but haven't taken the time out to do it. If I have time tomorrow, I think it would be a worthwhile exercise.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 14, 2012 at 1:58 pm
ChrisM@home (1/14/2012)
Paul, have you compared the performance of this method against a rCTE yet? I must admit I should have but haven't taken the time out to do it. If I have time tomorrow, I think it would be a worthwhile exercise.
I didn't no. I was so sure it was a classic gaps-n-islands problem I just set about modifying that technique to work here. The islands method will go a bit faster if the ID column order is *absolutely guaranteed* to follow date & time order, but I'd still be nervous about making that change. Overall though, I expect it to suck quite badly on very large inputs due to the full-set sort(s).
The recursive CTE is an excellent idea to try. I might explore a CLR solution...
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 14, 2012 at 2:40 pm
SQL Kiwi (1/14/2012)
ChrisM@home (1/14/2012)
Paul, have you compared the performance of this method against a rCTE yet? I must admit I should have but haven't taken the time out to do it. If I have time tomorrow, I think it would be a worthwhile exercise.I didn't no. I was so sure it was a classic gaps-n-islands problem I just set about modifying that technique to work here. The islands method will go a bit faster if the ID column order is *absolutely guaranteed* to follow date & time order, but I'd still be nervous about making that change. Overall though, I expect it to suck quite badly on very large inputs due to the full-set sort(s).
The recursive CTE is an excellent idea to try. I might explore a CLR solution...
We need to know from OP what order to sequence the gaps-n-islands - it could even be [bidclient]. I'll take the rCTE.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 14, 2012 at 10:03 pm
ChrisM@home (1/14/2012)
Blimey! First post, and you've set up a ton of sample data! Top work!I think this is what you're looking for:
SELECT
Date,
Price,
AskClient,
Dupes = COUNT(*)
FROM #trading
GROUP BY Date, Price, AskClient
HAVING COUNT(*) > 4
There's a lot of duplication in the sample data set, removing some of the dupes may help to verify the code.
That's the point,my raw data will has many duplicates on Date,Price and AskClient (also bidClient) and i want to find AskClient who has sequence in data >= 5.
For example if i have this in my #trading table
SELECT'1','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'2','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100007' Union All
SELECT'3','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100002','100008' Union All
SELECT'4','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100009' Union All
SELECT'5','Jun 24 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100010' Union All
SELECT'6','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100002' Union All
SELECT'7','Jun 25 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'8','Jun 25 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100007' Union All
SELECT'9','Jun 25 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100008' Union All
SELECT'10','Jun 25 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100009' Union All
SELECT'11','Jun 26 2011 12:00AM','Jan 14 2012 10:50AM','650','100002','100010' Union All
SELECT'12','Jun 26 2011 12:00AM','Jan 14 2012 11:00AM','650','100002','100002' Union All
SELECT'13','Jun 26 2011 12:00AM','Jan 14 2012 10:50AM','650','100002','100002' Union All
SELECT'14','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100007' Union All
SELECT'15','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100008' Union All
SELECT'16','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100009' Union All
SELECT'17','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100010' Union All
SELECT'18','Jun 27 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100002'
my expected result is this
--My expected Result from above data
SELECT'6','Jun 25 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100002' Union All
SELECT'7','Jun 25 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100002' Union All
SELECT'8','Jun 25 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100007' Union All
SELECT'9','Jun 25 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100008' Union All
SELECT'10','Jun 25 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100009' Union All
SELECT'14','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100007' Union All
SELECT'15','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100008' Union All
SELECT'16','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100009' Union All
SELECT'17','Jun 27 2011 12:00AM','Jan 14 2012 10:50AM','650','100001','100010' Union All
SELECT'18','Jun 27 2011 12:00AM','Jan 14 2012 11:00AM','650','100001','100002'
i read from many resource about CTE,and i think this problem will be solved using CTE but i'm learn about that.
SQL Kiwi (1/14/2012)
Ok, I had to create some new sample data to make this a bit clearer to me, but I think this is right:
DECLARE @Trading TABLE
(
ID integer NOT NULL PRIMARY KEY,
[Date] date NOT NULL,
[Time] time NOT NULL,
[Price] smallmoney NOT NULL,
[AskClient] integer NOT NULL,
[BidClient] integer NOT NULL
)
INSERT @Trading
(ID, Date, Time, Price, AskClient, BidClient)
VALUES
(01, '2000-01-01', '08:00:00', $6.50, 1, 1),
(02, '2000-01-01', '09:00:00', $6.50, 1, 2),
(03, '2000-01-01', '10:00:00', $6.50, 2, 3), -- different Ask
(04, '2000-01-01', '11:00:00', $6.50, 1, 4),
(05, '2000-01-01', '12:00:00', $6.50, 1, 5),
(06, '2000-01-01', '13:00:00', $6.50, 1, 1),
(07, '2000-01-01', '14:00:00', $6.70, 1, 2), -- different Price
(08, '2000-01-01', '15:00:00', $6.50, 1, 3),
(09, '2000-01-01', '16:00:00', $6.50, 1, 4),
(10, '2000-01-01', '17:00:00', $6.50, 1, 5),
(11, '2000-01-02', '08:00:00', $6.50, 1, 1), -- different Date
(12, '2000-01-02', '09:00:00', $6.50, 1, 2),
(13, '2000-01-02', '10:00:00', $6.50, 1, 3),
(14, '2000-01-02', '11:00:00', $6.50, 1, 4),
(15, '2000-01-02', '12:00:00', $6.50, 1, 5)
-- Incorrectly reports 2000-01-01
SELECT
Date,
Price,
AskClient,
Dupes = COUNT(*),
MIN(ID),
MAX(ID)
FROM @Trading AS t
GROUP BY Date, Price, AskClient
HAVING COUNT(*) > 4;
-- Correct results
WITH Numbered AS
(
SELECT
GroupID =
ROW_NUMBER() OVER (ORDER BY t.Date, t.Time, t.ID) -
ROW_NUMBER() OVER (PARTITION BY t.AskClient, t.Date, t.Price ORDER BY t.Date, t.Time, t.ID),
*
FROM @Trading AS t
), Ranges AS
(
SELECT
MIN(N.ID) AS range_start_id,
MAX(N.ID) AS range_end_id,
COUNT_BIG(*) AS range_count
FROM Numbered AS N
GROUP BY N.GroupID
HAVING COUNT_BIG(*) >= 5
)
SELECT
t.Date,
t.Price,
t.AskClient,
r.range_start_id,
r.range_end_id,
r.range_count
FROM Ranges AS r
JOIN @Trading AS t ON
t.ID = r.range_start
thanks for the help,i'm still learn your solution for now...but from the result this query has made,i think it's nearly solve my problem.
NB : the important field for comparison in my data is Date,Price and AskClient. There is nothing to do with other field,it's only use for result in my application later.
Sorry for my bad english...i really appreciate all of your help..
January 15, 2012 at 1:24 am
Hi Martin
By "sequence in data" - do you mean sequence by ID?
Cheers
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 15, 2012 at 1:47 am
ChrisM@home (1/15/2012)
Hi MartinBy "sequence in data" - do you mean sequence by ID?
Cheers
Hi,
i'm sorry, maybe i'm using wrong word for my problem..
sequence that i mean is same transaction that occur more than 5 times with same Date,Price and AskClient.
so at the first time from my raw data we have to order by Date.
January 15, 2012 at 5:45 am
martin.david (1/15/2012)
ChrisM@home (1/15/2012)
Hi MartinBy "sequence in data" - do you mean sequence by ID?
Cheers
Hi,
i'm sorry, maybe i'm using wrong word for my problem..
sequence that i mean is same transaction that occur more than 5 times with same Date,Price and AskClient.
so at the first time from my raw data we have to order by Date.
So if you have 5 or more rows of data with the same values for Date,Price and AskClient, does the order matter?
Or can it be 5 or more rows in any order - as long as Date,Price and AskClient are the same?
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 15, 2012 at 5:48 am
ChrisM@home (1/15/2012)
martin.david (1/15/2012)
ChrisM@home (1/15/2012)
Hi MartinBy "sequence in data" - do you mean sequence by ID?
Cheers
Hi,
i'm sorry, maybe i'm using wrong word for my problem..
sequence that i mean is same transaction that occur more than 5 times with same Date,Price and AskClient.
so at the first time from my raw data we have to order by Date.
So if you have 5 or more rows of data with the same values for Date,Price and AskClient, does the order matter?
Or can it be 5 or more rows in any order - as long as Date,Price and AskClient are the same?
it has to be order by [Date] before do the comparison...
January 15, 2012 at 8:54 am
martin.david (1/15/2012)
it has to be order by [Date] before do the comparison...
I am going to continue to assume this means ordered by date and time, Price, and Ask Client - otherwise the original HAVING COUNT query is all that is needed. Short of time this weekend, but I will post an explanation of the method I posted soon.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply