Need help building a query.

Question

Post reply

Need help building a query.

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004453 More actions · Answer 1

Remember the old ways before CROSS APPLY and ROW_NUMBER?

In days of old,

When Knights were bold,

And Windowing was not invented,

He wrapped his code

In a familiar mode

And RBAR was prevented. 😀

To wit...

CREATE INDEX dodah ON auData (Emp_ID, dt_End, dt_Start)

SELECT auID,

Emp_ID,

dt_Start,

dt_End = (SELECT MIN(dt_End) FROM auData csq WHERE csq.Emp_ID = d.Emp_ID AND csq.dt_End > d.dt_Start),

Emp_Cat

FROM auData d

WHERE Emp_ID > 0

AND d.dt_Start >= CAST('1753' AS DATETIME)

ORDER BY d.Emp_ID, d.dt_Start

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004453 More actions · Answer 2

CELKO (11/24/2010)
Can I re-write your DDL to ISO-11179 rules, add keys. I hope you know why NUMERIC(s,p) is not like DECIMAL(s,p). T-SQL has a DATE data type now and can use row constructors in a VALUES clause. Is this what you meant?
CREATE TABLE Timesheets
(emp_id NUMERIC (1, 0) NOT NULL,
employment_start_date DATE,
-- PRIMARY KEY (emp_id, employment_start_date),
employment_end_date DATE, -- null means current
emp_cat INTEGER NOT NULL);
GO
INSERT INTO Timesheets
VALUES(1, '2010-11-18', NULL, 1),
(1, NULL, '2010-11-18', 1),
(1, '2010-11-18', NULL, 1),
(1, NULL, '2010-11-23', 1),
(1, '2010-11-23', NULL, 90);
Opps! You screwed up the the DDL, so that the natrual primary key cannot be declared. That makes for complicaetd code:
WITH X
AS
(SELECT *,
(ROW_NUMBER() OVER (PARTITION BY emp_id
ORDER BY COALESCE (employment_start_date, employment_end_date))
+1) /2 AS pairs
FROM Timesheets)
SELECT X.emp_id, MAX(employment_start_date) AS employment_start_date,
MAX(employment_end_date) AS employment_end_date,
MAX(emp_cat)
FROM X
GROUP BY emp_id, pairs;

"WITH X"???? :blink: BWWAAA-HAAA-HAAAA-HAAAAA!!! ROFLMAO!!! And you have the nerve to bitch at people about their naming conventions? HOOOOOO-EEEEEE!!! :Whistling: You having Fortran flashbacks or what?

And what's with this ROW_NUMBER garbage? When are you going to come out of the 50's and stop treating tables like mag tape with sequence numbers on each record? Please read a book on basic RDBMS, so you can stop writing COBOL and assembly language in SQL and start using SQL for RDBMS.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

WayneS SSC Guru Points: 95461 More actions · Answer 3

Jeff Moden (11/24/2010)
In days of old,
When Knights were bold,
And Windowing was not invented,
He wrapped his code
In a familiar mode
And RBAR was prevented.

You ought to put this into your signature block.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Jeff Moden SSC Guru Points: 1004453 More actions · Answer 4

{edit} wrong post.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

GSquared SSC Guru Points: 260824 More actions · Answer 5

Wayne, just because a table has an ordered clustered index, doesn't mean an identity column will honor that order. Usually it will, but you can't guarantee it.

Jeff's solution works if there aren't any "skip steps", where you have two start and then one end, or anything like that, in date sequences. If you do have skips in the data (and timeclocks sometimes do), you'd need to use some variation on a cursor. In SQL 2000, that means either an actual cursor or a hidden cursor (quirky update). In SQL 2005 and beyond, you can use row_number() (which is another hidden cursor).

But don't use identity columns for that. They will usually work, but you can't guarantee it, especially if the data in the table is volatile, or the server has more than one query or more than one CPU core/thread available to it.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

WayneS SSC Guru Points: 95461 More actions · Answer 6

GSquared (11/29/2010)
Wayne, just because a table has an ordered clustered index, doesn't mean an identity column will honor that order. Usually it will, but you can't guarantee it.
Jeff's solution works if there aren't any "skip steps", where you have two start and then one end, or anything like that, in date sequences. If you do have skips in the data (and timeclocks sometimes do), you'd need to use some variation on a cursor. In SQL 2000, that means either an actual cursor or a hidden cursor (quirky update). In SQL 2005 and beyond, you can use row_number() (which is another hidden cursor).
But don't use identity columns for that. They will usually work, but you can't guarantee it, especially if the data in the table is volatile, or the server has more than one query or more than one CPU core/thread available to it.

GSquared - I wasn't depending on the clustered index... I was depending on the ORDER BY of the SELECT statement that was inserting the data into the table. Since I specified an ORDER BY, the results should be inserted in the proper order, which will assign the identity column in the proper sequence. Won't the identity column work in this case?

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Jeff Moden SSC Guru Points: 1004453 More actions · Answer 7

GSquared (11/29/2010)
Wayne, just because a table has an ordered clustered index, doesn't mean an identity column will honor that order. Usually it will, but you can't guarantee it.
Jeff's solution works if there aren't any "skip steps", where you have two start and then one end, or anything like that, in date sequences. If you do have skips in the data (and timeclocks sometimes do), you'd need to use some variation on a cursor. In SQL 2000, that means either an actual cursor or a hidden cursor (quirky update). In SQL 2005 and beyond, you can use row_number() (which is another hidden cursor).
But don't use identity columns for that. They will usually work, but you can't guarantee it, especially if the data in the table is volatile, or the server has more than one query or more than one CPU core/thread available to it.

The only time I've ever seen it actually fail is when you use the IDENTITY function and either you've used an indeterminate source to sort on (sorting by GETDATE() or the like) or parallelism occurs. I'm pretty sure that everyone knows what MAXDOP 1 does to fix that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

GSquared SSC Guru Points: 260824 More actions · Answer 8

WayneS (11/29/2010)
GSquared (11/29/2010)
Wayne, just because a table has an ordered clustered index, doesn't mean an identity column will honor that order. Usually it will, but you can't guarantee it.
Jeff's solution works if there aren't any "skip steps", where you have two start and then one end, or anything like that, in date sequences. If you do have skips in the data (and timeclocks sometimes do), you'd need to use some variation on a cursor. In SQL 2000, that means either an actual cursor or a hidden cursor (quirky update). In SQL 2005 and beyond, you can use row_number() (which is another hidden cursor).
But don't use identity columns for that. They will usually work, but you can't guarantee it, especially if the data in the table is volatile, or the server has more than one query or more than one CPU core/thread available to it.
GSquared - I wasn't depending on the clustered index... I was depending on the ORDER BY of the SELECT statement that was inserting the data into the table. Since I specified an ORDER BY, the results should be inserted in the proper order, which will assign the identity column in the proper sequence. Won't the identity column work in this case?

Ah, I misread what you wrote there.

Yes, that should work. Details here: http://support.microsoft.com/kb/273586

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

michaelwelcome SSC-Addicted Points: 471 More actions · Answer 9

you could do a correlated sub query.

select

d1.auID,

d1.Emp_ID,

d1.dt_Start,

(

select top 1

d2.dt_End

from auData d2

where d2.auID > d1.auID

and d2.Emp_ID = d2.Emp_ID

order by d2.auID

) as dt_end

Emp_Cat

)

from auData d1

where d1.dt_Start is not null

This will select only records with a start date, and then do a sub query to find the next record with the same employee id. When querying your data is this hard it may mean that you have not designed your tables correctly. What about filling in your end dates as the data is received? Just find the most recent record for an employee and update the end date. Then each record would have a full time interval.

Jeff Moden SSC Guru Points: 1004453 More actions · Answer 10

michaelwelcome (12/1/2010)
you could do a correlated sub query.
select
d1.auID,
d1.Emp_ID,
d1.dt_Start,
(
select top 1
d2.dt_End
from auData d2
where d2.auID > d1.auID
and d2.Emp_ID = d2.Emp_ID
order by d2.auID
) as dt_end
Emp_Cat
)
from auData d1
where d1.dt_Start is not null
This will select only records with a start date, and then do a sub query to find the next record with the same employee id. When querying your data is this hard it may mean that you have not designed your tables correctly. What about filling in your end dates as the data is received? Just find the most recent record for an employee and update the end date. Then each record would have a full time interval.

That's pretty much what I did... correlated subquery with the ability to find the "min" end date. As some of the other folks pointed out, neither of ours works exactly correctly if there are "skips" in the data where there are things like 2 end dates for one start date.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sri76 Say Hey Kid Points: 666 More actions · Answer 11

Hi,

Try this query

Select Distinct x.auID, x.Emp_ID, x.dt_Start, (Select top 1 dt_End From auData Where dt_End > x.dt_Start) as dt_End

From

(Select auID, Emp_ID, dt_Start, Emp_Cat From auData Where dt_Start is not null) x,

(Select auID, Emp_ID, dt_End, Emp_Cat From auData Where dt_End is not null) y

Jeff Moden SSC Guru Points: 1004453 More actions · Answer 12

sri76 (12/6/2010)
Hi,
Try this query
Select Distinct x.auID, x.Emp_ID, x.dt_Start, (Select top 1 dt_End From auData Where dt_End > x.dt_Start) as dt_End
From
(Select auID, Emp_ID, dt_Start, Emp_Cat From auData Where dt_Start is not null) x,
(Select auID, Emp_ID, dt_End, Emp_Cat From auData Where dt_End is not null) y

Try it on, say, 10,000 rows and watch what SQL Server does with that partial cross join on the Execution Plan... 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)