January 21, 2016 at 6:49 pm
nm.rajesh (1/17/2016)
Thanks a lot 🙂
See my post above. Unless the guarantees that I stated above are enforced by constraints and then the a_m column is brought into play as an "enforced indication of grouping" in the code, you're going to be in for an awful surprise sometime in the very near future. There's nothing that guarantees the order otherwise.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 21, 2016 at 6:59 pm
Jeff Moden (1/21/2016)
nm.rajesh (1/17/2016)
Thanks a lot 🙂
See my post above. Unless the guarantees that I stated above are enforced by constraints and then the a_m column is brought into play as an "enforced indication of grouping" in the code, you're going to be in for an awful surprise sometime in the very near future. There's nothing that guarantees the order otherwise.
No constraints can guarantee correct output for a query using ORDER BY (SELECT NULL).
I just selected the top 10 rows from an 11M-row table with a unique clustered index on it.
Those 10 rows were somewhere from the middle of the range.
Constraints can enforce any kind of order only within a single data page.
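For illustration only, here is a minimal sketch of the kind of test described above; dbo.SomeLargeTable and SomeKeyColumn are made-up names, not the actual table:
SELECT TOP (10) *
FROM dbo.SomeLargeTable -- hypothetical large table with a unique clustered index
ORDER BY (SELECT NULL); -- "any order will do": which 10 rows come back is arbitrary
SELECT TOP (10) *
FROM dbo.SomeLargeTable
ORDER BY SomeKeyColumn; -- explicit sort on the clustering key: deterministic
Only the second form guarantees which ten rows are returned and in what order; the clustered index by itself does not.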
_____________
Code for TallyGenerator
January 21, 2016 at 9:30 pm
Sometimes it's better to say so than to offer any code help.
He may be after picking some random rows from the intermediate result of a UNION ALL for whatever reason. As I said, we don't know. To know if a problem was solved, we have to define the problem. The OP defined a simple situation then and marked an answer later.
Whether that is the correct problem to solve is an unknown. As I have implied, it is good that Sergiy followed up with the "why are you doing it this way" approach, since things going astray can be foreseen here. No qualms there.
Now on to the next matter 🙂
To be honest, I don't see any bashing occurring ....
and will quite possibly fail the very next week after you cashed the check for your valuable service.
Maybe I should get my fees upfront and do sloppy work at the last minute, since reputation doesn't really matter. 🙂 Just kidding, though I can see how someone could take that remark, which doesn't address the topic (it's personal, actually), and think of it as a little bashing.
----------------------------------------------------
January 22, 2016 at 12:49 am
Jeff Moden (1/21/2016)
To be honest, I don't see any bashing occurring except maybe from you ;-).
You might want to reread Sergiy's second post in this topic. That's the one that prompted my comment.
Sometimes it's better to say so than to offer any code help.
No. Sometimes it's better to ask clarifying questions and/or to suggest rearchitecting than to offer any code help.
January 22, 2016 at 1:22 am
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is.
😎
USE tempdb;
GO
SET NOCOUNT ON;
;WITH SAMPLE_DATA(ID,code,recoID,Post,mts,a_m,DesiredFlag) AS
(SELECT * FROM (VALUES
(1,'999010',42207,912,369,17.2,1)
,(2,'000010',42207,912,369,17.2,1)
,(3,'433010',42207,912,370,17.2,1)
,(4,'543010',42207,912,370,17.2,1)
,(1,'656010',42207,912,373,22.6,0)
,(2,'661010',42207,912,373,22.6,0)
,(3,'661010',42207,912,374,22.6,0)
,(1,'661010',42207,912,705, 0.4,0)
,(2,'661010',42207,912,706, 0.4,0)
,(1,'999010',42207,912,708,-0.9,0)
) AS X(ID,code,recoID,Post,mts,a_m,DesiredFlag)
)
,BASE_DATA AS
(
SELECT
SD.ID
,SD.code
,SD.recoID
,SD.Post
,SD.mts
,SD.a_m
,SD.DesiredFlag
,ROW_NUMBER() OVER
(
ORDER BY SD.mts
,SD.ID
) - SD.ID AS SD_GRP
FROM SAMPLE_DATA SD
)
SELECT
BD.ID
,BD.code
,BD.recoID
,BD.Post
,BD.mts
,BD.a_m
,BD.DesiredFlag
,CASE
WHEN COUNT(*) OVER
(
PARTITION BY BD.SD_GRP
) = 4 THEN 1
ELSE 0
END AS CALC_FLAG
FROM BASE_DATA BD;
Output
ID code recoID Post mts a_m DesiredFlag CALC_FLAG
---- ------ ------- ----- ---- ------ ----------- -----------
1 999010 42207 912 369 17.2 1 1
2 000010 42207 912 369 17.2 1 1
3 433010 42207 912 370 17.2 1 1
4 543010 42207 912 370 17.2 1 1
1 656010 42207 912 373 22.6 0 0
2 661010 42207 912 373 22.6 0 0
3 661010 42207 912 374 22.6 0 0
1 661010 42207 912 705 0.4 0 0
2 661010 42207 912 706 0.4 0 0
1 999010 42207 912 708 -0.9 0 0
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
January 22, 2016 at 2:23 pm
Sergiy (1/21/2016)
Jeff Moden (1/21/2016)
nm.rajesh (1/17/2016)
Thanks a lot 🙂
See my post above. Unless the guarantees that I stated above are enforced by constraints and then the a_m column is brought into play as an "enforced indication of grouping" in the code, you're going to be in for an awful surprise sometime in the very near future. There's nothing that guarantees the order otherwise.
No constraints can guarantee correct output for a query using ORDER BY (SELECT NULL).
I just selected the top 10 rows from an 11M-row table with a unique clustered index on it.
Those 10 rows were somewhere from the middle of the range.
Constraints can enforce any kind of order only within a single data page.
Absolutely agreed about the ORDER BY (SELECT NULL). That's why I said that the a_m column must be brought into play and only if it's a guaranteed grouping number that will never be used for any other grouping in the table.
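For what it's worth, a minimal sketch of what bringing a_m into play could look like, reusing a trimmed copy of the SAMPLE_DATA CTE from the earlier post and assuming a_m really is a dedicated group number; the count-of-four rule is carried over from that post purely as an illustration:
;WITH SAMPLE_DATA(ID,code,recoID,Post,mts,a_m,DesiredFlag) AS
(SELECT * FROM (VALUES
 (1,'999010',42207,912,369,17.2,1)
,(2,'000010',42207,912,369,17.2,1)
,(3,'433010',42207,912,370,17.2,1)
,(4,'543010',42207,912,370,17.2,1)
,(1,'656010',42207,912,373,22.6,0)
,(2,'661010',42207,912,373,22.6,0)
) AS X(ID,code,recoID,Post,mts,a_m,DesiredFlag)
)
SELECT
 SD.ID
,SD.a_m
,CASE
 WHEN COUNT(*) OVER (PARTITION BY SD.a_m) = 4 -- grouping keyed on a_m itself,
 THEN 1 ELSE 0                                -- not on physical row order
 END AS CALC_FLAG
FROM SAMPLE_DATA SD;
If a_m can repeat across unrelated groups, it would need to be combined with whatever other columns make the group unique.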
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2016 at 2:31 pm
Eirikur Eiriksson (1/22/2016)
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is....{snip}...
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
But it's not a simple solution if it's wrong, Eirikur, and, make no doubt about it, the solutions have been incorrect up 'til now and may still be incorrect because we've not heard squat from the OP. Unfortunately, the OP "OK'd" one of those early "simple" solutions and it's wrong. It's never a waste to get to the truth, even after the fact. Even the inclusion of the a_m column is only correct if it's guaranteed to be a proper group number of sorts. Heh... and remember... it's usually the "simple" problems that are the biggest killers in code.
To be sure, I'm not pointing a finger at anyone for trying. I'm just saying that folks missed a nuance that will cause undetectable errors (they won't fail... they'll just present wrong answers) in the future and we have a chance to help fix that.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 23, 2016 at 9:39 am
Jeff Moden (1/22/2016)
Eirikur Eiriksson (1/22/2016)
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is....{snip}...
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
But it's not a simple solution if it's wrong, Eirikur, and, make no doubt about it, the solutions have been incorrect up 'til now and may still be incorrect because we've not heard squat from the OP. Unfortunately, the OP "OK'd" one of those early "simple" solutions and it's wrong. It's never a waste to get to the truth, even after the fact. Even the inclusion of the a_m column is only correct if it's guaranteed to be a proper group number of sorts. Heh... and remember... it's usually the "simple" problems that are the biggest killers in code.
To be sure, I'm not pointing a finger at anyone for trying. I'm just saying that folks missed a nuance that will cause undetectable errors (they won't fail... they'll just present wrong answers) in the future and we have a chance to help fix that.
Jeff, I very much appreciate being corrected when I'm wrong, and of course this is no exception. My intention with the initial suggestion was to demonstrate the method of subtracting a broken sequence from an unbroken one in order to create a group identifier, but I literally had less than two minutes to construct the answer, which unfortunately deprived me of the opportunity to complete it. I posted it incomplete, which was my bad, but I must say that I did somewhat count on the likes of you and others to pitch in, as you did, to complete the solution. What I don't like is when posters troll around the threads and bash other solution posters without any constructive input.
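For anyone following along, here is a stand-alone sketch of that broken-minus-unbroken idea (the SEQ table and values below are made up purely for illustration): subtracting a per-value row number from an unbroken row number yields a value that is constant within each island of consecutive identical values.
;WITH SEQ(ID, Val) AS
(SELECT * FROM (VALUES (1,10),(2,10),(3,10),(4,20),(5,20),(6,10)) AS X(ID, Val))
SELECT
 S.ID
,S.Val
,ROW_NUMBER() OVER (ORDER BY S.ID)                      -- unbroken sequence
-ROW_NUMBER() OVER (PARTITION BY S.Val ORDER BY S.ID)   -- sequence broken per value
 AS GRP                                                 -- (Val, GRP) together identify each island
FROM SEQ S;
Rows 1-3 (Val 10) get GRP 0, rows 4-5 (Val 20) get GRP 3, and row 6 (Val 10 again) gets GRP 2, so grouping on Val and GRP together isolates each island of repeats.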
😎
Edit: Typo
January 23, 2016 at 9:59 am
Eirikur Eiriksson (1/23/2016)
Jeff Moden (1/22/2016)
Eirikur Eiriksson (1/22/2016)
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is....{snip}...
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
But it's not a simple solution if it's wrong, Eirikur, and, make no doubt about it, the solutions have been incorrect up 'til now and may still be incorrect because we've not heard squat from the OP. Unfortunately, the OP "OK'd" one of those early "simple" solutions and it's wrong. It's never a waste to get to the truth, even after the fact. Even the inclusion of the a_m column is only correct if it's guaranteed to be a proper group number of sorts. Heh... and remember... it's usually the "simple" problems that are the biggest killers in code.
To be sure, I'm not pointing a finger at anyone for trying. I'm just saying that folks missed a nuance that will cause undetectable errors (they won't fail... they'll just present wrong answers) in the future and we have a chance to help fix that.
Jeff, I very much appreciate being corrected when I'm wrong, and of course this is no exception. My intention with the initial suggestion was to demonstrate the method of subtracting a broken sequence from an unbroken one in order to create a group identifier, but I literally had less than two minutes to construct the answer, which unfortunately deprived me of the opportunity to complete it. I posted it incomplete, which was my bad, but I must say that I did somewhat count on the likes of you and others to pitch in, as you did, to complete the solution. What I don't like is when posters troll around the threads and bash other solution posters without any constructive input.
😎
Edit: Typo
I couldn't agree more with your last statement, Eirikur. Unfortunately, it's become more and more common lately.
January 24, 2016 at 9:42 pm
Hi Hugo Kornelis,
Apologies for the delay in getting back to you. Please find the details of the business context and the implemented solution below, along with the table and query used:
Source: CSV file (which is pre-sorted based on Metres)
Business Logic:
• If there are 7 or more consecutive rows repeated with the same value, we need to flag those records, which will be filtered in reports. (Please note all the source data is from a CSV file which is pre-sorted.)
• If there are 7 or more consecutive rows repeated where one or 2 values in between are different, flag all of these records. Example: (1,1,1,1,42,34,1,1,1,3,4,1,1,3,3,3,3). Here, all the values up to the last 1 should be flagged.
(Please note I have only implemented Business Logic 1 and I am working on the 2nd one.) Any help would be greatly appreciated.
Table:
CREATE TABLE [dbo].[SampleTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[RCAcode] [float] NULL,
[Post] [float] NULL,
[Metres] [float] NULL,
[Measure] [float] NULL,
) ON [PRIMARY]
Sample Data:
INSERT INTO [dbo].[Source] (RCAcode, Post, Metres, Measure1, Measure2)
VALUES
(661010,912,369,17.2,22.9)
, (661010,912,369,17.2,22.9)
, (661010,912,370,17.2,22.9)
,(661010,912,370,17.2,22.9)
,(661010,912,371,17.2,22.9)
,(661010,912,371,17.2,22.9)
,(661010,912,373,22.6,27.6)
,(661010,912,373,22.6,27.6)
,(661010,912,374,22.6,27.6)
,(661010,912,374,22.6,27.6)
,(661010,912,375,22.6,27.6)
,(661010,912,375,22.6,27.6)
,(661010,912,376,22.6,27.6)
,(661010,912,376,22.6,27.6)
,(661010,912,377,13.6,34.4)
,(661010,912,377,18.2,25.6)
,(661010,912,378,10.9,34.4)
,(661010,912,378,10.4,34.4)
,(661010,912,379,21.7,34.4)
,(661010,912,379,15.2,34.4)
,(661010,912,380,13.5,34.4)
,(661010,912,380,13.5,34.4)
,(661010,912,381,13.5,30.8)
,(661010,912,381,13.5,34.4)
,(661010,912,382,17,33.9)
,(661010,912,382,13.5,33.9)
,(661010,912,383,17,33.9)
,(661010,912,383,17,33.9)
,(661010,912,384,17,33.9)
,(661010,912,384,17,33.9)
,(661010,912,385,17,33.9)
,(661010,912,385,17,33.9)
,(661010,912,386,15.9,29.5)
,(661010,912,386,15.9,33.9)
Query:
WITH FlagCount
AS (
SELECT
CASE
WHEN LEAD(Measure1, 1) OVER (ORDER BY (SELECT ID)) = Measure1
OR LAG(Measure1, 1) OVER (ORDER BY (SELECT ID)) = Measure1
THEN ROW_NUMBER() OVER (Partition BY Measure1
ORDER BY (
(SELECT ID)
)
) - ROW_NUMBER() OVER (
ORDER BY (
SELECT ID
)
)
END Measure1_Flag,
CASE
WHEN LEAD(Measure2, 1) OVER (ORDER BY (SELECT ID)) = Measure2
OR LAG(Measure2, 1) OVER (ORDER BY (SELECT ID)) = Measure2
THEN ROW_NUMBER() OVER (Partition BY Measure2
ORDER BY (
(SELECT ID)
)
) - ROW_NUMBER() OVER (
ORDER BY (
SELECT ID
)
)
END Measure2_Flag,
LEAD(Measure2, 1) OVER (ORDER BY ID) LEAD2,LAG(Measure2, 1) OVER (ORDER BY ID) LAG2,LEAD(Measure1, 1) OVER (ORDER BY ID) LEAD1,LAG(Measure1, 1) OVER (ORDER BY ID) LAG1,*
FROM source
)
, FLAG
AS
(
SELECT CASE WHEN COUNT(*) OVER (Partition BY Measure1_FLAG) >= 7 and Measure1_FLAG is NOT NULL THEN 1 ELSE 0 END Measure1_FLag_COUNT,
CASE WHEN COUNT(*) OVER (Partition BY Measure2_FLAG) >= 7 and Measure2_FLAG is NOT NULL THEN 1 ELSE 0 END Measure2_FLag_Count
,*
FROM FlagCount
)
SELECT * FROM FLAG
ORDER BY METRES
January 25, 2016 at 2:06 pm
Thanks for posting CREATE TABLE and INSERT statements, Rajesh!
Your explanation about the order of the data is also valuable: the data comes from an external source, you load it in a way that preserves order, and because of that the IDENTITY column can be used to check the order of the data. Perfect, we can work with that.
I also like that you post the query work you have done so far. A small tip - you can make it a lot shorter and a lot easier to read by replacing all the "ORDER BY (SELECT ID)" with just "ORDER BY ID". That is exactly the same, but shorter.
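For example, on a throwaway three-row table (not the real data), the window functions from the posted query can simply be written like this:
;WITH T(ID, Measure1) AS
(SELECT * FROM (VALUES (1, 17.2),(2, 17.2),(3, 22.6)) AS X(ID, Measure1))
SELECT
 ID
,Measure1
,ROW_NUMBER() OVER (PARTITION BY Measure1 ORDER BY ID) AS RN -- instead of ORDER BY (SELECT ID)
FROM T;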
Unfortunately, I must still ask for more clarification before I can try to solve your problem.
First, I could only get the code to run after making some modifications. The table changes name between the CREATE TABLE and the INSERT statement and the columns don't match. I can probably guess how the table should have looked, but I ask you to post this information to prevent guesswork. Please correct and repost. Also, after writing the code you plan to post, create an empty database (preferably with a case sensitive collation) on your test system, copy and paste the code in a query window and hit execute to see if it runs. That ensures that the code will execute for us as well, without us having to guess at how to correct it.
Also, please post the expected end result for the data you posted. That helps me (and others) verify our understanding of the posted requirements, and verify our attempted solutions.
Second, I need more clarification on the logic. You mention in the explanation that you look for sets of consecutive rows with the same value, but you don't specify in which of the five columns the value has to be the same. From looking at your query, I assume that this is the column Measure1, and that you then also want to apply the same logic for Measure2. Is that a correct assumption? Do the RCAcode, Post, and Metres columns matter in any way? If not, then you can help us focus better on the actual problem by removing them from the CREATE TABLE and INSERT statements. I do see that you have an ORDER BY Metres at the end of the query, which is a bit strange because the sequencing for the consecutive-row determination is based on the ID column.
The requirement for the second flag is a bit fuzzy. Apparently a series of consecutive identical values can have streaks of up to two different values in between and can still be considered a flaggable series. So the range 1, 1, 1, 8, 8, 1, 1, 1, 1 is a series of 7 1's. But what about a series such as 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2 - is that a series of 8 1's with three breaks in between, or a series of 8 2's with three breaks in between?
January 25, 2016 at 4:49 pm
Hi Hugo Kornelis,
Apologies for wasting your time. I had given the wrong table and insert statements; I have now corrected them in the queries below:
As I do not have SQL Server 2012 or above at home, I am unable to run the query, but I have given a mock result which works as expected. Now I want to implement the second logic; the current process identifies the similar records visually, which is a bit fuzzy, and my goal is to automate the process with a defined set of rules.
Rules:
Rule 1: Identify and flag rows where 8 or more consecutive rows have the same value for Measure1 or Measure2 (this is working with the query below; I'm not sure whether this mechanism is correct, so please suggest a better approach if there is one).
Rule 2: Identify and flag rows where, within 8 or more consecutive rows for Measure1 or Measure2, 6 out of 8 records are similar. Example: 1, 8, 1, 8, 1, 1, 8, 1, 1, 1, 8, 8, 1, 1, 1
The above example should be flagged for all records; here I need to identify where 6 out of 8 records are the same and then flag all of those records. (I am not sure whether I am using the correct example for the problem; if anything is unclear, please get back to me.)
CREATE TABLE [dbo].[SampleTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Measure1] [float] NULL,
[Measure2] [float] NULL
) ON [PRIMARY]
Sample Data:
INSERT INTO [dbo].[SampleTable] (Measure1, Measure2)
VALUES
(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(13.6,34.4)
,(18.2,25.6)
,(10.9,34.4)
,(10.4,34.4)
,(21.7,34.4)
,(15.2,34.4)
,(13.5,34.4)
,(13.5,34.4)
,(13.5,30.8)
,(13.5,34.4)
,(17,33.9)
,(13.5,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(15.9,29.5)
,(15.9,33.9)
SELECT * FROM [SampleTable]
Query:
WITH FlagCount
AS (
SELECT CASE
WHEN LEAD(Measure1, 1) OVER (
ORDER BY ID
) = Measure1
OR LAG(Measure1, 1) OVER (
ORDER BY ID
) = Measure1
THEN ROW_NUMBER() OVER (
PARTITION BY Measure1 ORDER BY ID
) - ROW_NUMBER() OVER (
ORDER BY (
ID
)
)
END Measure1_Flag
,CASE
WHEN LEAD(Measure2, 1) OVER (
ORDER BY ID
) = Measure2
OR LAG(Measure2, 1) OVER (
ORDER BY ID
) = Measure2
THEN ROW_NUMBER() OVER (
PARTITION BY Measure2 ORDER BY ID
) - ROW_NUMBER() OVER (
ORDER BY (
ID
)
)
END Measure2_Flag
,LEAD(Measure2, 1) OVER (
ORDER BY ID
) LEAD2
,LAG(Measure2, 1) OVER (
ORDER BY ID
) LAG2
,LEAD(Measure1, 1) OVER (
ORDER BY ID
) LEAD1
,LAG(Measure1, 1) OVER (
ORDER BY ID
) LAG1
,*
FROM SampleTable
)
,FLAG
AS (
SELECT CASE
WHEN COUNT(*) OVER (PARTITION BY Measure1_FLAG) >= 7
AND Measure1_FLAG IS NOT NULL
THEN 1
ELSE 0
END Measure1_FLag_COUNT
,CASE
WHEN COUNT(*) OVER (PARTITION BY Measure2_FLAG) >= 7
AND Measure2_FLAG IS NOT NULL
THEN 1
ELSE 0
END Measure2_FLag_Count
,*
FROM FlagCount
)
SELECT ID, Measure1, Measure2, CASE
WHEN Measure1_FLag_COUNT = 1
OR Measure2_FLag_Count = 1
THEN 1
ELSE 0
END FINAL_FLAG
FROM FLAG
ORDER BY ID
--RESULT For Rule 1:
ID   Measure1   Measure2   Flag
---  ---------  ---------  ----
1    17.2       22.9       1
2    17.2       22.9       1
3    17.2       22.9       1
4    17.2       22.9       1
5    17.2       22.9       1
6    17.2       22.9       1
7    22.6       27.6       1
8    22.6       27.6       1
9    22.6       27.6       1
10   22.6       27.6       1
11   22.6       27.6       1
12   22.6       27.6       1
13   22.6       27.6       1
14   22.6       27.6       1
15   13.6       34.4       0
16   18.2       25.6       0
17   10.9       34.4       0
18   10.4       34.4       0
19   21.7       34.4       0
20   15.2       34.4       0
21   13.5       34.4       0
22   13.5       34.4       0
23   13.5       30.8       0
24   13.5       34.4       0
25   17         33.9       1
26   13.5       33.9       1
27   17         33.9       1
28   17         33.9       1
29   17         33.9       1
30   17         33.9       1
31   17         33.9       1
32   17         33.9       1
33   15.9       29.5       0
34   15.9       33.9       0
January 25, 2016 at 10:25 pm
Quick suggestion, requires SQL Server 2012 or later
😎
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.SampleTable') IS NOT NULL DROP TABLE dbo.SampleTable;
CREATE TABLE [dbo].[SampleTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Measure1] [float] NULL,
[Measure2] [float] NULL
);
--Sample Data:
INSERT INTO [dbo].[SampleTable] (Measure1, Measure2)
VALUES
(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(17.2,22.9)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(22.6,27.6)
,(13.6,34.4)
,(18.2,25.6)
,(10.9,34.4)
,(10.4,34.4)
,(21.7,34.4)
,(15.2,34.4)
,(13.5,34.4)
,(13.5,34.4)
,(13.5,30.8)
,(13.5,34.4)
,(17,33.9)
,(13.5,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(17,33.9)
,(15.9,29.5)
,(15.9,33.9);
;WITH BASE_DATA AS
(
SELECT
ST.ID
,ST.Measure1
,ST.Measure2
,ROW_NUMBER() OVER
(
ORDER BY ST.ID
) - COUNT(*) OVER
(
PARTITION BY ST.Measure1
ORDER BY ST.ID
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS M1_RCNT
,ROW_NUMBER() OVER
(
ORDER BY ST.ID
) - COUNT(*) OVER
(
PARTITION BY ST.Measure2
ORDER BY ST.ID
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS M2_RCNT
FROM dbo.SampleTable ST
)
SELECT
BD.ID
,BD.Measure1
,BD.Measure2
,SIGN(CASE WHEN COUNT(*) OVER
(
PARTITION BY BD.M1_RCNT
) > 7 THEN 1 ELSE 0 END
+ CASE WHEN COUNT(*) OVER
(
PARTITION BY BD.M2_RCNT
) > 7 THEN 1 ELSE 0 END) AS FLAG
FROM BASE_DATA BD
ORDER BY BD.ID;
Results
ID Measure1 Measure2 FLAG
----------- ---------------------- ---------------------- -----------
1 17.2 22.9 0
2 17.2 22.9 0
3 17.2 22.9 0
4 17.2 22.9 0
5 17.2 22.9 0
6 17.2 22.9 0
7 22.6 27.6 1
8 22.6 27.6 1
9 22.6 27.6 1
10 22.6 27.6 1
11 22.6 27.6 1
12 22.6 27.6 1
13 22.6 27.6 1
14 22.6 27.6 1
15 13.6 34.4 0
16 18.2 25.6 0
17 10.9 34.4 0
18 10.4 34.4 0
19 21.7 34.4 0
20 15.2 34.4 0
21 13.5 34.4 0
22 13.5 34.4 0
23 13.5 30.8 0
24 13.5 34.4 0
25 17 33.9 1
26 13.5 33.9 1
27 17 33.9 1
28 17 33.9 1
29 17 33.9 1
30 17 33.9 1
31 17 33.9 1
32 17 33.9 1
33 15.9 29.5 0
34 15.9 33.9 0
January 26, 2016 at 5:09 am
nm.rajesh (1/25/2016)
Apologies for wasting your time. I had given the wrong table and insert statements; I have now corrected them in the queries below:
No apologies needed, everyone can make a mistake. The only one who can waste my time is me, and in this case I posted back with my extra questions just to avoid wasting time.
As I do not have SQL Server 2012 or above at home, I am unable to run the query
http://www.microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx
Rules:
Rule 1: Identify and flag rows where 8 or more consecutive rows have the same value for Measure1 or Measure2 (this is working with the query below; I'm not sure whether this mechanism is correct, so please suggest a better approach if there is one).
Rule 2: Identify and flag rows where, within 8 or more consecutive rows for Measure1 or Measure2, 6 out of 8 records are similar. Example: 1, 8, 1, 8, 1, 1, 8, 1, 1, 1, 8, 8, 1, 1, 1
The above example should be flagged for all records; here I need to identify where 6 out of 8 records are the same and then flag all of those records. (I am not sure whether I am using the correct example for the problem; if anything is unclear, please get back to me.)
Okay, now it gets really confusing. You have posted three times in this topic now, and each post has completely different requirements. First you look for consecutive series of 1, 2, 3, 4. Then you look for consecutive series of 7 identical values and consecutive series of 7 identical with one or two different in between. And now I see that you look for consecutive series of 8 or more identical and consecutive series of 8 or more with at least 6 identical. These constantly changing requirements give me the impression that the actual requirement is not completely clear yet - either because you are still struggling to understand what your customer/employer wants, or because the customer/employer is still in the decision-making phase. That is all completely okay, it is a normal phase in IT projects, but it does mean that it is simply too early to try to code. First the requirements have to be crystal clear, and everyone needs to sign off on them; only then is it time to start to code.
Now even though I am afraid it's a moving target, I will try to fire a few shots at it. Since you already have rule 1, I will concentrate on rule 2. I can interpret this in two ways. The first way is to mark a row if it is a member of at least one sequence of eight consecutive rows that have six identical values. In this interpretation, I do not understand why you say that all rows should be marked - in the first eight rows, there are five values of 1 and three values of 8, so why should rows 1 and 2 be marked?
The second possible interpretation is to mark all rows that are part of a sequence of eight or more rows where the same value appears at least six times. Problem with this interpretation is that I don't think you really want it. If I expand the sample size to, for instance, 100 million rows, then the combination of all those rows together is definitely a sequence of eight or more rows. And in such a large collection of data, the chance of finding at least 6 rows that have the same value is quite high, even if all values are random. This interpretation would mean that in that case all 100 million rows are marked, and I don't think that you actually want this.
Based on what you write, I get that you are looking for sequences of "mostly" unchanged measurements, where the "mostly" means that there is a fuzzy part to the equation where a limited amount of different values should not interrupt the sequence. But for SQL Server, "mostly" simply does not cut it - SQL Server needs an exact definition, and so we (and you) need an exact definition. My suggestion is that you go back to the employer/customer to seek clarification. Ask them to provide examples of series that should and should not be marked, and ask for explanation why. If they come with clear-cut cases only (and they probably will), then take their examples, modify them to make the differences smaller and keep asking questions until you know what the rule is. Then try to find weird edge cases (like I did with the 1 1 2 2 1 1 2 2 sequence in my earlier post and with the 100 million rows in this post) to try to challenge the rules you found. Keep doing this until you finally arrive at a definition for the flag2 logic that all agree on, and that you can then build the code for.
--RESULT For Rule 1:
When I asked in the previous post for the expected results, I meant the expected results for the part of the logic you need help with. Not the part of the logic that you already figured out.
Now I see that Eirikur did post some code. I still don't fully understand your requirements yet, so I have no idea if his code helps you or not. If not, then please follow my suggestion above to first make sure that you get the actual requirements fully clarified, then post again - as before with CREATE TABLE and INSERT, with an explanation of the desired logic, and with the expected results for that logic. I am still willing to help you, but I will not spend time shooting at a moving target - so I will not even start on code until I am confident that the real requirements are clear and stable.
January 26, 2016 at 8:16 pm
Ed Wagner (1/23/2016)
Eirikur Eiriksson (1/23/2016)
Jeff Moden (1/22/2016)
Eirikur Eiriksson (1/22/2016)
The quick suggestion towards a solution posted earlier was just that: less than two minutes of coding to demonstrate a method for identifying groups in sequences. Now, with a further 5 minutes to piece together a complete solution, here it is....{snip}...
This is as far as one can go on this problem without further input from the O/P, but this is at least a solution. It's not often that one sees such a simple problem stretch to more than a handful of posts without one; some definitely have more time to waste than I do ;-).
But it's not a simple solution if it's wrong, Eirikur, and, make no doubt about it, the solutions have been incorrect up 'til now and may still be incorrect because we've not heard squat from the OP. Unfortunately, the OP "OK'd" one of those early "simple" solutions and it's wrong. It's never a waste to get to the truth, even after the fact. Even the inclusion of the a_m column is only correct if it's guaranteed to be a proper group number of sorts. Heh... and remember... it's usually the "simple" problems that are the biggest killers in code.
To be sure, I'm not pointing a finger at anyone for trying. I'm just saying that folks missed a nuance that will cause undetectable errors (they won't fail... they'll just present wrong answers) in the future and we have a chance to help fix that.
Jeff, I very much appreciate being corrected when I'm wrong, and of course this is no exception. My intention with the initial suggestion was to demonstrate the method of subtracting a broken sequence from an unbroken one in order to create a group identifier, but I literally had less than two minutes to construct the answer, which unfortunately deprived me of the opportunity to complete it. I posted it incomplete, which was my bad, but I must say that I did somewhat count on the likes of you and others to pitch in, as you did, to complete the solution. What I don't like is when posters troll around the threads and bash other solution posters without any constructive input.
😎
Edit: Typo
I couldn't agree more with your last statement, Eirikur. Unfortunately, it's become more and more common lately.
I'd take some heart in what was said, as mean as it sounded. We have quick posts that don't survive the future and people saying they solved what was given at the time instead of asking the right questions before posting. Are you sure you'd want a future employer to find this post that started off by ignoring the fact that there was nothing in the original post to reliably sort on? Think about it a little more. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.