January 21, 2016 at 6:49 pm
nm.rajesh (1/17/2016)
Thanks a lot 🙂
See my post above. Unless the guarantees that I stated above are enforced by constraints and then the a_m column is brought into play as an "enforced indication of grouping" in the code, you're going to be in for an awful surprise sometime in the very near future. There's nothing that guarantees the order otherwise.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 21, 2016 at 6:59 pm
Jeff Moden (1/21/2016)
nm.rajesh (1/17/2016)
Thanks a lot 🙂
See my post above. Unless the guarantees that I stated above are enforced by constraints and then the a_m column is brought into play as an "enforced indication of grouping" in the code, you're going to be in for an awful surprise sometime in the very near future. There's nothing that guarantees the order otherwise.
No constraints can guarantee correct output for a query using ORDER BY (SELECT NULL).
I just selected the top 10 rows from an 11M-row table with a unique clustered index on it.
Those 10 rows were somewhere from the middle of the range.
Constraints can enforce any kind of order only within a single data page.
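For illustration only, here is a minimal sketch of the kind of test described above; dbo.SomeLargeTable and SomeKeyColumn are made-up names, not the actual table:
SELECT TOP (10) *
FROM dbo.SomeLargeTable -- hypothetical large table with a unique clustered index
ORDER BY (SELECT NULL); -- "any order will do": which 10 rows come back is arbitrary
SELECT TOP (10) *
FROM dbo.SomeLargeTable
ORDER BY SomeKeyColumn; -- explicit sort on the clustering key: deterministic
Only the second form guarantees which ten rows are returned and in what order; the clustered index by itself does not.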
_____________
Code for TallyGenerator
January 21, 2016 at 9:30 pm
Sometimes it's better to say so than to offer any code help.
He may be after picking some random rows from the intermediate result of a UNION ALL for whatever reason. As I said, we don't know. To know if a problem was solved, we have to define the problem. The OP defined a simple situation then and marked an answer later.
Whether that is the correct problem to solve is an unknown. As I have implied, it is good that Sergiy followed up with the "why are you doing it this way" approach, since things going astray can be foreseen here. No qualms there.
Now on to the next matter 🙂
To be honest, I don't see any bashing occurring ....
and will quite possibly fail the very next week after you cashed the check for your valuable service.
Maybe I should get my fees upfront and do sloppy work at the last minute, since reputation doesn't really matter. 🙂 Just kidding, though I can see how someone could take that remark, which doesn't address the topic (it's personal, actually), and think of it as a little bashing.
----------------------------------------------------
January 22, 2016 at 12:49 am
Jeff Moden (1/21/2016)
To be honest, I don't see any bashing occurring except maybe from you ;-).
You might want to reread Sergiy's second post in this topic. That's the one that prompted my comment.
Sometimes it's better to say so than to offer any code help.
No. Sometimes it's better to ask clarifying questions and/or to suggest rearchitecting than to offer any code help.
January 22, 2016 at 1:22 am
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is.
😎
USE tempdb;
GO
SET NOCOUNT ON;
;WITH SAMPLE_DATA(ID,code,recoID,Post,mts,a_m,DesiredFlag) AS
(SELECT * FROM (VALUES
(1,'999010',42207,912,369,17.2,1)
,(2,'000010',42207,912,369,17.2,1)
,(3,'433010',42207,912,370,17.2,1)
,(4,'543010',42207,912,370,17.2,1)
,(1,'656010',42207,912,373,22.6,0)
,(2,'661010',42207,912,373,22.6,0)
,(3,'661010',42207,912,374,22.6,0)
,(1,'661010',42207,912,705, 0.4,0)
,(2,'661010',42207,912,706, 0.4,0)
,(1,'999010',42207,912,708,-0.9,0)
) AS X(ID,code,recoID,Post,mts,a_m,DesiredFlag)
)
,BASE_DATA AS
(
SELECT
SD.ID
,SD.code
,SD.recoID
,SD.Post
,SD.mts
,SD.a_m
,SD.DesiredFlag
,ROW_NUMBER() OVER
(
ORDER BY SD.mts
,SD.ID
) - SD.ID AS SD_GRP
FROM SAMPLE_DATA SD
)
SELECT
BD.ID
,BD.code
,BD.recoID
,BD.Post
,BD.mts
,BD.a_m
,BD.DesiredFlag
,CASE
WHEN COUNT(*) OVER
(
PARTITION BY BD.SD_GRP
) = 4 THEN 1
ELSE 0
END AS CALC_FLAG
FROM BASE_DATA BD;
Output
ID code recoID Post mts a_m DesiredFlag CALC_FLAG
---- ------ ------- ----- ---- ------ ----------- -----------
1 999010 42207 912 369 17.2 1 1
2 000010 42207 912 369 17.2 1 1
3 433010 42207 912 370 17.2 1 1
4 543010 42207 912 370 17.2 1 1
1 656010 42207 912 373 22.6 0 0
2 661010 42207 912 373 22.6 0 0
3 661010 42207 912 374 22.6 0 0
1 661010 42207 912 705 0.4 0 0
2 661010 42207 912 706 0.4 0 0
1 999010 42207 912 708 -0.9 0 0
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
January 22, 2016 at 2:23 pm
Sergiy (1/21/2016)
Jeff Moden (1/21/2016)
nm.rajesh (1/17/2016)
Thanks a lot 🙂
See my post above. Unless the guarantees that I stated above are enforced by constraints and then the a_m column is brought into play as an "enforced indication of grouping" in the code, you're going to be in for an awful surprise sometime in the very near future. There's nothing that guarantees the order otherwise.
No constraints can guarantee correct output for a query using ORDER BY (SELECT NULL).
I just selected the top 10 rows from an 11M-row table with a unique clustered index on it.
Those 10 rows were somewhere from the middle of the range.
Constraints can enforce any kind of order only within a single data page.
Absolutely agreed about the ORDER BY (SELECT NULL). That's why I said that the a_m column must be brought into play and only if it's a guaranteed grouping number that will never be used for any other grouping in the table.
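For what it's worth, a minimal sketch of what bringing a_m into play could look like, reusing a trimmed copy of the SAMPLE_DATA CTE from the earlier post and assuming a_m really is a dedicated group number; the count-of-four rule is carried over from that post purely as an illustration:
;WITH SAMPLE_DATA(ID,code,recoID,Post,mts,a_m,DesiredFlag) AS
(SELECT * FROM (VALUES
 (1,'999010',42207,912,369,17.2,1)
,(2,'000010',42207,912,369,17.2,1)
,(3,'433010',42207,912,370,17.2,1)
,(4,'543010',42207,912,370,17.2,1)
,(1,'656010',42207,912,373,22.6,0)
,(2,'661010',42207,912,373,22.6,0)
) AS X(ID,code,recoID,Post,mts,a_m,DesiredFlag)
)
SELECT
 SD.ID
,SD.a_m
,CASE
 WHEN COUNT(*) OVER (PARTITION BY SD.a_m) = 4 -- grouping keyed on a_m itself,
 THEN 1 ELSE 0                                -- not on physical row order
 END AS CALC_FLAG
FROM SAMPLE_DATA SD;
If a_m can repeat across unrelated groups, it would need to be combined with whatever other columns make the group unique.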
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2016 at 2:31 pm
Eirikur Eiriksson (1/22/2016)
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is....{snip}...
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
But it's not a simple solution if it's wrong, Eirikur, and, make no doubt about it, the solutions have been incorrect up 'til now and may still be incorrect because we've not heard squat from the OP. Unfortunately, the OP "OK'd" one of those early "simple" solutions and it's wrong. It's never a waste to get to the truth, even after the fact. Even the inclusion of the a_m column is only correct if it's guaranteed to be a proper group number of sorts. Heh... and remember... it's usually the "simple" problems that are the biggest killers in code.
To be sure, I'm not pointing a finger at anyone for trying. I'm just saying that folks missed a nuance that will cause undetectable errors (they won't fail... they'll just present wrong answers) in the future and we have a chance to help fix that.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 23, 2016 at 9:39 am
Jeff Moden (1/22/2016)
Eirikur Eiriksson (1/22/2016)
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is....{snip}...
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
But it's not a simple solution if it's wrong, Eirikur, and, make no doubt about it, the solutions have been incorrect up 'til now and may still be incorrect because we've not heard squat from the OP. Unfortunately, the OP "OK'd" one of those early "simple" solutions and it's wrong. It's never a waste to get to the truth, even after the fact. Even the inclusion of the a_m column is only correct if it's guaranteed to be a proper group number of sorts. Heh... and remember... it's usually the "simple" problems that are the biggest killers in code.
To be sure, I'm not pointing a finger at anyone for trying. I'm just saying that folks missed a nuance that will cause undetectable errors (they won't fail... they'll just present wrong answers) in the future and we have a chance to help fix that.
Jeff, I very much appreciate being corrected when I'm wrong, and of course this is no exception. My intention with the initial suggestion was to demonstrate the method of subtracting a broken sequence from an unbroken one in order to create a group identifier, but I literally had less than two minutes to construct the answer, which unfortunately deprived me of the opportunity to complete it. I posted it incomplete, which was my bad, but I must say that I did somewhat count on the likes of you and others to pitch in, as you did, to complete the solution. What I don't like is when posters troll around the threads and bash other solution posters without any constructive input.
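For anyone following along, here is a stand-alone sketch of that broken-minus-unbroken idea (the SEQ table and values below are made up purely for illustration): subtracting a per-value row number from an unbroken row number yields a value that is constant within each island of consecutive identical values.
;WITH SEQ(ID, Val) AS
(SELECT * FROM (VALUES (1,10),(2,10),(3,10),(4,20),(5,20),(6,10)) AS X(ID, Val))
SELECT
 S.ID
,S.Val
,ROW_NUMBER() OVER (ORDER BY S.ID)                      -- unbroken sequence
-ROW_NUMBER() OVER (PARTITION BY S.Val ORDER BY S.ID)   -- sequence broken per value
 AS GRP                                                 -- (Val, GRP) together identify each island
FROM SEQ S;
Rows 1-3 (Val 10) get GRP 0, rows 4-5 (Val 20) get GRP 3, and row 6 (Val 10 again) gets GRP 2, so grouping on Val and GRP together isolates each island of repeats.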
😎
Edit: Typo
January 23, 2016 at 9:59 am
Eirikur Eiriksson (1/23/2016)
Jeff Moden (1/22/2016)
Eirikur Eiriksson (1/22/2016)
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is....{snip}...
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
But it's not a simple solution if it's wrong, Eirikur, and, make no doubt about it, the solutions have been incorrect up 'til now and may still be incorrect because we've not heard squat from the OP. Unfortunately, the OP "OK'd" one of those early "simple" solutions and it's wrong. It's never a waste to get to the truth, even after the fact. Even the inclusion of the a_m column is only correct if it's guaranteed to be a proper group number of sorts. Heh... and remember... it's usually the "simple" problems that are the biggest killers in code.
To be sure, I'm not pointing a finger at anyone for trying. I'm just saying that folks missed a nuance that will cause undetectable errors (they won't fail... they'll just present wrong answers) in the future and we have a chance to help fix that.
Jeff, I very much appreciate being corrected when I'm wrong, and of course this is no exception. My intention with the initial suggestion was to demonstrate the method of subtracting a broken sequence from an unbroken one in order to create a group identifier, but I literally had less than two minutes to construct the answer, which unfortunately deprived me of the opportunity to complete it. I posted it incomplete, which was my bad, but I must say that I did somewhat count on the likes of you and others to pitch in, as you did, to complete the solution. What I don't like is when posters troll around the threads and bash other solution posters without any constructive input.
😎
Edit: Typo
I couldn't agree more with your last statement, Eirikur. Unfortunately, it's become more and more common lately.
January 24, 2016 at 9:42 pm
Hi Hugo Kornelis,
Apologies for the delay in getting back to you. Please find the details of the business context and the implemented solution below, along with the table and query used:
Source: CSV file (which is pre-sorted based on Metres)
Business Logic:
• If there are 7 or more consecutive rows repeated with the same value, we need to flag those records, which will be filtered in reports. (Please note all the source data is from a CSV file which is pre-sorted.)
• If there are 7 or more consecutive rows repeated where one or 2 values in between are different, flag all of these records. Example: (1,1,1,1,42,34,1,1,1,3,4,1,1,3,3,3,3). Here, all the values up to the last 1 should be flagged.
(Please note I have only implemented Business Logic 1 and I am working on the 2nd one.) Any help would be greatly appreciated.
Table:
CREATE TABLE [dbo].[SampleTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[RCAcode] [float] NULL,
[Post] [float] NULL,
[Metres] [float] NULL,
[Measure] [float] NULL,
) ON [PRIMARY]
Sample Data:
INSERT INTO [dbo].[Source] (RCAcode, Post, Metres, Measure1, Measure2)
VALUES
(661010,912,369,17.2,22.9)
, (661010,912,369,17.2,22.9)
, (661010,912,370,17.2,22.9)
,(661010,912,370,17.2,22.9)
,(661010,912,371,17.2,22.9)
,(661010,912,371,17.2,22.9)
,(661010,912,373,22.6,27.6)
,(661010,912,373,22.6,27.6)
,(661010,912,374,22.6,27.6)
,(661010,912,374,22.6,27.6)
,(661010,912,375,22.6,27.6)
,(661010,912,375,22.6,27.6)
,(661010,912,376,22.6,27.6)
,(661010,912,376,22.6,27.6)
,(661010,912,377,13.6,34.4)
,(661010,912,377,18.2,25.6)
,(661010,912,378,10.9,34.4)
,(661010,912,378,10.4,34.4)
,(661010,912,379,21.7,34.4)
,(661010,912,379,15.2,34.4)
,(661010,912,380,13.5,34.4)
,(661010,912,380,13.5,34.4)
,(661010,912,381,13.5,30.8)
,(661010,912,381,13.5,34.4)
,(661010,912,382,17,33.9)
,(661010,912,382,13.5,33.9)
,(661010,912,383,17,33.9)
,(661010,912,383,17,33.9)
,(661010,912,384,17,33.9)
,(661010,912,384,17,33.9)
,(661010,912,385,17,33.9)
,(661010,912,385,17,33.9)
,(661010,912,386,15.9,29.5)
,(661010,912,386,15.9,33.9)
Query:
WITH FlagCount
AS (
SELECT
CASE
WHEN LEAD(Measure1, 1) OVER (ORDER BY (SELECT ID)) = Measure1
OR LAG(Measure1, 1) OVER (ORDER BY (SELECT ID)) = Measure1
THEN ROW_NUMBER() OVER (Partition BY Measure1
ORDER BY (
(SELECT ID)
)
) - ROW_NUMBER() OVER (
ORDER BY (
SELECT ID
)
)
END Measure1_Flag,
CASE
WHEN LEAD(Measure2, 1) OVER (ORDER BY (SELECT ID)) = Measure2
OR LAG(Measure2, 1) OVER (ORDER BY (SELECT ID)) = Measure2
THEN ROW_NUMBER() OVER (Partition BY Measure2
ORDER BY (
(SELECT ID)
)
) - ROW_NUMBER() OVER (
ORDER BY (
SELECT ID
)
)
END Measure2_Flag,
LEAD(Measure2, 1) OVER (ORDER BY ID) LEAD2,LAG(Measure2, 1) OVER (ORDER BY ID) LAG2,LEAD(Measure1, 1) OVER (ORDER BY ID) LEAD1,LAG(Measure1, 1) OVER (ORDER BY ID) LAG1,*
FROM source
)
, FLAG
AS
(
SELECT CASE WHEN COUNT(*) OVER (Partition BY Measure1_FLAG) >= 7 and Measure1_FLAG is NOT NULL THEN 1 ELSE 0 END Measure1_FLag_COUNT,
CASE WHEN COUNT(*) OVER (Partition BY Measure2_FLAG) >= 7 and Measure2_FLAG is NOT NULL THEN 1 ELSE 0 END Measure2_FLag_Count
,*
FROM FlagCount
)
SELECT * FROM FLAG
ORDER BY METRES
January 25, 2016 at 2:06 pm
Thanks for posting CREATE TABLE and INSERT statements, Rajesh!
Your explanation about the order of the data is also valuable: the data comes from an external source, you load it in a way that preserves order, and because of that the IDENTITY column can be used to check the order of the data. Perfect, we can work with that.
I also like that you post the query work you have done so far. A small tip - you can make it a lot shorter and a lot easier to read by replacing all the "ORDER BY (SELECT ID)" with just "ORDER BY ID". That is exactly the same, but shorter.
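For example, on a throwaway three-row table (not the real data), the window functions from the posted query can simply be written like this:
;WITH T(ID, Measure1) AS
(SELECT * FROM (VALUES (1, 17.2),(2, 17.2),(3, 22.6)) AS X(ID, Measure1))
SELECT
 ID
,Measure1
,ROW_NUMBER() OVER (PARTITION BY Measure1 ORDER BY ID) AS RN -- instead of ORDER BY (SELECT ID)
FROM T;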
Unfortunately, I must still ask for more clarification before I can try to solve your problem.
First, I could only get the code to run after making some modifications. The table changes name between the CREATE TABLE and the INSERT statement and the columns don't match. I can probably guess how the table should have looked, but I ask you to post this information to prevent guesswork. Please correct and repost. Also, after writing the code you plan to post, create an empty database (preferably with a case sensitive collation) on your test system, copy and paste the code in a query window and hit execute to see if it runs. That ensures that the code will execute for us as well, without us having to guess at how to correct it.
Also, please post the expected end result for the data you posted. That helps me (and others) verify our understanding of the posted requirements, and verify our attempted solutions.
Second, I need more clarification on the logic. You mention in the explanation that you look for sets of consecutive rows with the same value, but you don't specify in which of the five columns the value has to be the same. From looking at your query, I assume that this is the column Measure1, and that you then also want to apply the same logic for Measure2. Is that a correct assumption? Do the RCAcode, Post, and Metres columns matter in any way? If not, then you can help us focus better on the actual problem by removing them from the CREATE TABLE and INSERT statements. I do see that you have an ORDER BY Metres at the end of the query, which is a bit strange because the sequencing for the consecutive-row determination is based on the ID column.
The requirement for the second flag is a bit fuzzy. Apparently a series of consecutive identical values can have streaks of up to two different values in between and can still be considered a flaggable series. So the range 1, 1, 1, 8, 8, 1, 1, 1, 1 is a series of 7 1's. But what about a series such as 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2 - is that a series of 8 1's with three breaks in between, or a series of 8 2's with three breaks in between?
January 25, 2016 at 4:49 pm
Hi Hugo Kornelis,
Apologies for wasting your time. I had given the wrong table and insert statements; I have now corrected them in the queries below:
As I do not have SQL Server 2012 or above at home, I am unable to run the query, but I have given a mock result which works as expected. Now I want to implement the second logic; the current process identifies the similar records visually, which is a bit fuzzy, and my goal is to automate the process with a defined set of rules.
Rules:
Rule 1: Identify and flag rows where 8 or more consecutive rows have the same value for Measure1 or Measure2 (this is working with the query below; I'm not sure whether this mechanism is correct, so please suggest a better approach if there is one).
Rule 2: Identify and flag rows where, within 8 or more consecutive rows for Measure1 or Measure2, 6 out of 8 records are similar. Example: 1, 8, 1, 8, 1, 1, 8, 1, 1, 1, 8, 8, 1, 1, 1
The above example should be flagged for all records; here I need to identify where 6 out of 8 records are the same and then flag all of those records. (I am not sure whether I am using the correct example for the problem; if anything is unclear, please get back to me.)
CREATE TABLE [dbo].[SampleTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Measure1] [float] NULL,
[Measure2] [float] NULL
) ON [PRIMARY]
Sample Data:
INSERT INTO [dbo].[SampleTable] (Measure1, Measure2)
VALUES
(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(13.6,34.4)
,(18.2,25.6)
,(10.9,34.4)
,(10.4,34.4)
,(21.7,34.4)
,(15.2,34.4)
,(13.5,34.4)
,(13.5,34.4)
,(13.5,30.8)
,(13.5,34.4)
,(17,33.9)
,(13.5,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(15.9,29.5)
,(15.9,33.9)
SELECT * FROM [SampleTable]
Query:
WITH FlagCount
AS (
SELECT CASE
WHEN LEAD(Measure1, 1) OVER (
ORDER BY ID
) = Measure1
OR LAG(Measure1, 1) OVER (
ORDER BY ID
) = Measure1
THEN ROW_NUMBER() OVER (
PARTITION BY Measure1 ORDER BY ID
) - ROW_NUMBER() OVER (
ORDER BY (
ID
)
)
END Measure1_Flag
,CASE
WHEN LEAD(Measure2, 1) OVER (
ORDER BY ID
) = Measure2
OR LAG(Measure2, 1) OVER (
ORDER BY ID
) = Measure2
THEN ROW_NUMBER() OVER (
PARTITION BY Measure2 ORDER BY ID
) - ROW_NUMBER() OVER (
ORDER BY (
ID
)
)
END Measure2_Flag
,LEAD(Measure2, 1) OVER (
ORDER BY ID
) LEAD2
,LAG(Measure2, 1) OVER (
ORDER BY ID
) LAG2
,LEAD(Measure1, 1) OVER (
ORDER BY ID
) LEAD1
,LAG(Measure1, 1) OVER (
ORDER BY ID
) LAG1
,*
FROM SampleTable
)
,FLAG
AS (
SELECT CASE
WHEN COUNT(*) OVER (PARTITION BY Measure1_FLAG) >= 7
AND Measure1_FLAG IS NOT NULL
THEN 1
ELSE 0
END Measure1_FLag_COUNT
,CASE
WHEN COUNT(*) OVER (PARTITION BY Measure2_FLAG) >= 7
AND Measure2_FLAG IS NOT NULL
THEN 1
ELSE 0
END Measure2_FLag_Count
,*
FROM FlagCount
)
SELECT ID, Measure1, Measure2, CASE
WHEN Measure1_FLag_COUNT = 1
OR Measure2_FLag_Count = 1
THEN 1
ELSE 0
END FINAL_FLAG
FROM FLAG
ORDER BY ID
--RESULT For Rule 1:
ID   Measure1   Measure2   Flag
---  ---------  ---------  ----
1    17.2       22.9       1
2    17.2       22.9       1
3    17.2       22.9       1
4    17.2       22.9       1
5    17.2       22.9       1
6    17.2       22.9       1
7    22.6       27.6       1
8    22.6       27.6       1
9    22.6       27.6       1
10   22.6       27.6       1
11   22.6       27.6       1
12   22.6       27.6       1
13   22.6       27.6       1
14   22.6       27.6       1
15   13.6       34.4       0
16   18.2       25.6       0
17   10.9       34.4       0
18   10.4       34.4       0
19   21.7       34.4       0
20   15.2       34.4       0
21   13.5       34.4       0
22   13.5       34.4       0
23   13.5       30.8       0
24   13.5       34.4       0
25   17         33.9       1
26   13.5       33.9       1
27   17         33.9       1
28   17         33.9       1
29   17         33.9       1
30   17         33.9       1
31   17         33.9       1
32   17         33.9       1
33   15.9       29.5       0
34   15.9       33.9       0
January 25, 2016 at 10:25 pm
Quick suggestion, requires SQL Server 2012 or later
😎
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.SampleTable') IS NOT NULL DROP TABLE dbo.SampleTable;
CREATE TABLE [dbo].[SampleTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Measure1] [float] NULL,
[Measure2] [float] NULL
);
--Sample Data:
INSERT INTO [dbo].[SampleTable] (Measure1, Measure2)
VALUES
(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(13.6,34.4)
,(18.2,25.6)
,(10.9,34.4)
,(10.4,34.4)
,(21.7,34.4)
,(15.2,34.4)
,(13.5,34.4)
,(13.5,34.4)
,(13.5,30.8)
,(13.5,34.4)
,(17,33.9)
,(13.5,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(15.9,29.5)
,(15.9,33.9);
;WITH BASE_DATA AS
(
SELECT
ST.ID
,ST.Measure1
,ST.Measure2
,ROW_NUMBER() OVER
(
ORDER BY ST.ID
) - COUNT(*) OVER
(
PARTITION BY ST.Measure1
ORDER BY ST.ID
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS M1_RCNT
,ROW_NUMBER() OVER
(
ORDER BY ST.ID
) - COUNT(*) OVER
(
PARTITION BY ST.Measure2
ORDER BY ST.ID
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS M2_RCNT
FROM dbo.SampleTable ST
)
SELECT
BD.ID
,BD.Measure1
,BD.Measure2
,SIGN(CASE WHEN COUNT(*) OVER
(
PARTITION BY BD.M1_RCNT
) > 7 THEN 1 ELSE 0 END
+ CASE WHEN COUNT(*) OVER
(
PARTITION BY BD.M2_RCNT
) > 7 THEN 1 ELSE 0 END) AS FLAG
FROM BASE_DATA BD
ORDER BY BD.ID;
Results
ID Measure1 Measure2 FLAG
----------- ---------------------- ---------------------- -----------
1 17.2 22.9 0
2 17.2 22.9 0
3 17.2 22.9 0
4 17.2 22.9 0
5 17.2 22.9 0
6 17.2 22.9 0
7 22.6 27.6 1
8 22.6 27.6 1
9 22.6 27.6 1
10 22.6 27.6 1
11 22.6 27.6 1
12 22.6 27.6 1
13 22.6 27.6 1
14 22.6 27.6 1
15 13.6 34.4 0
16 18.2 25.6 0
17 10.9 34.4 0
18 10.4 34.4 0
19 21.7 34.4 0
20 15.2 34.4 0
21 13.5 34.4 0
22 13.5 34.4 0
23 13.5 30.8 0
24 13.5 34.4 0
25 17 33.9 1
26 13.5 33.9 1
27 17 33.9 1
28 17 33.9 1
29 17 33.9 1
30 17 33.9 1
31 17 33.9 1
32 17 33.9 1
33 15.9 29.5 0
34 15.9 33.9 0
January 26, 2016 at 5:09 am
nm.rajesh (1/25/2016)
Apologies for wasting your time. I had given the wrong table and insert statements; I have now corrected them in the queries below:
No apologies needed, everyone can make a mistake. The only one who can waste my time is me, and in this case I posted back with my extra questions just to avoid wasting time.
As I do not have SQL Server 2012 or above at home, I am unable to run the query
http://www.microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx
Rules:
Rule 1: Identify and flag rows where 8 or more consecutive rows have the same value for Measure1 or Measure2 (this is working with the query below; I'm not sure whether this mechanism is correct, so please suggest a better approach if there is one).
Rule 2: Identify and flag rows where, within 8 or more consecutive rows for Measure1 or Measure2, 6 out of 8 records are similar. Example: 1, 8, 1, 8, 1, 1, 8, 1, 1, 1, 8, 8, 1, 1, 1
The above example should be flagged for all records; here I need to identify where 6 out of 8 records are the same and then flag all of those records. (I am not sure whether I am using the correct example for the problem; if anything is unclear, please get back to me.)
Okay, now it gets really confusing. You have posted three times in this topic now, and each post has completely different requirements. First you look for consecutive series of 1, 2, 3, 4. Then you look for consecutive series of 7 identical values and consecutive series of 7 identical with one or two different in between. And now I see that you look for consecutive series of 8 or more identical and consecutive series of 8 or more with at least 6 identical. These constantly changing requirements give me the impression that the actual requirement is not completely clear yet - either because you are still struggling to understand what your customer/employer wants, or because the customer/employer is still in the decision-making phase. That is all completely okay, it is a normal phase in IT projects, but it does mean that it is simply too early to try to code. First the requirements have to be crystal clear, and everyone needs to sign off on them; only then is it time to start to code.
Now even though I am afraid it's a moving target, I will try to fire a few shots at it. Since you already have rule 1, I will concentrate on rule 2. I can interpret this in two ways. The first way is to mark a row if it is a member of at least one sequence of eight consecutive rows that have six identical values. In this interpretation, I do not understand why you say that all rows should be marked - in the first eight rows, there are five values of 1 and three values of 8, so why should rows 1 and 2 be marked?
The second possible interpretation is to mark all rows that are part of a sequence of eight or more rows where the same value appears at least six times. Problem with this interpretation is that I don't think you really want it. If I expand the sample size to, for instance, 100 million rows, then the combination of all those rows together is definitely a sequence of eight or more rows. And in such a large collection of data, the chance of finding at least 6 rows that have the same value is quite high, even if all values are random. This interpretation would mean that in that case all 100 million rows are marked, and I don't think that you actually want this.
Based on what you write, I get that you are looking for sequences of "mostly" unchanged measurements, where the "mostly" means that there is a fuzzy part to the equation where a limited amount of different values should not interrupt the sequence. But for SQL Server, "mostly" simply does not cut it - SQL Server needs an exact definition, and so we (and you) need an exact definition. My suggestion is that you go back to the employer/customer to seek clarification. Ask them to provide examples of series that should and should not be marked, and ask for explanation why. If they come with clear-cut cases only (and they probably will), then take their examples, modify them to make the differences smaller and keep asking questions until you know what the rule is. Then try to find weird edge cases (like I did with the 1 1 2 2 1 1 2 2 sequence in my earlier post and with the 100 million rows in this post) to try to challenge the rules you found. Keep doing this until you finally arrive at a definition for the flag2 logic that all agree on, and that you can then build the code for.
--RESULT For Rule 1:
When I asked in the previous post for the expected results, I meant the expected results for the part of the logic you need help with. Not the part of the logic that you already figured out.
Now I see that Eirikur did post some code. I still don't fully understand your requirements yet, so I have no idea if his code helps you or not. If not, then please follow my suggestion above to first make sure that you get the actual requirements fully clarified, then post again - as before with CREATE TABLE and INSERT, with an explanation of the desired logic, and with the expected results for that logic. I am still willing to help you, but I will not spend time shooting at a moving target - so I will not even start on code until I am confident that the real requirements are clear and stable.
January 26, 2016 at 8:16 pm
Ed Wagner (1/23/2016)
Eirikur Eiriksson (1/23/2016)
Jeff Moden (1/22/2016)
Eirikur Eiriksson (1/22/2016)
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is....{snip}...
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
But it's not a simple solution if it's wrong, Eirikur, and, make no doubt about it, the solutions have been incorrect up 'til now and may still be incorrect because we've not heard squat from the OP. Unfortunately, the OP "OK'd" one of those early "simple" solutions and it's wrong. It's never a waste to get to the truth, even after the fact. Even the inclusion of the a_m column is only correct if it's guaranteed to be a proper group number of sorts. Heh... and remember... it's usually the "simple" problems that are the biggest killers in code.
To be sure, I'm not pointing a finger at anyone for trying. I'm just saying that folks missed a nuance that will cause undetectable errors (they won't fail... they'll just present wrong answers) in the future and we have a chance to help fix that.
Jeff, I very much appreciate being corrected when I'm wrong, and of course this is no exception. My intention with the initial suggestion was to demonstrate the method of subtracting a broken sequence from an unbroken one in order to create a group identifier, but I literally had less than two minutes to construct the answer, which unfortunately deprived me of the opportunity to complete it. I posted it incomplete, which was my bad, but I must say that I did somewhat count on the likes of you and others to pitch in, as you did, to complete the solution. What I don't like is when posters troll around the threads and bash other solution posters without any constructive input.
😎
Edit: Typo
I couldn't agree more with your last statement, Eirikur. Unfortunately, it's become more and more common lately.
I'd take some heart in what was said, as mean as it sounded. We have quick posts that don't survive the future and people saying they solved what was given at the time instead of asking the right questions before posting. Are you sure you'd want a future employer to find this post that started off by ignoring the fact that there was nothing in the original post to reliably sort on? Think about it a little more. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.