July 26, 2004 at 7:21 am
I'm sure one of you gurus can help me figure this out. I just can't get it.
I've got a table that looks like the following:
recnbr | mynbr | mydate | empid |
1 | 0 | 1/1/2001 | bob |
2 | NULL | 1/2/2001 | bob |
3 | NULL | 1/7/2001 | bob |
4 | NULL | 1/8/2001 | bob |
5 | NULL | 1/21/2001 | bob |
6 | 0 | 8/4/2003 | cindy |
7 | NULL | 8/5/2003 | cindy |
8 | NULL | 9/1/2003 | cindy |
9 | NULL | 9/8/2003 | cindy |
10 | NULL | 9/15/2003 | cindy |
11 | 0 | 3/1/2004 | terry |
12 | NULL | 3/3/2004 | terry |
13 | NULL | 3/4/2004 | terry |
14 | NULL | 3/4/2004 | terry |
15 | NULL | 3/20/2004 | terry |
I want to update the mynbr field to equal the difference in the number of days between the current mydate and the previous record's mydate plus the previous record's mynbr value. I want these numbers to be specific to each empid. The resulting table would look like the following:
recnbr | mynbr | mydate | empid |
1 | 0 | 1/1/2001 | bob |
2 | 1 | 1/2/2001 | bob |
3 | 6 | 1/7/2001 | bob |
4 | 7 | 1/8/2001 | bob |
5 | 20 | 1/21/2001 | bob |
6 | 0 | 8/4/2003 | cindy |
7 | 1 | 8/5/2003 | cindy |
8 | 28 | 9/1/2003 | cindy |
9 | 35 | 9/8/2003 | cindy |
10 | 42 | 9/15/2003 | cindy |
11 | 0 | 3/1/2004 | terry |
12 | 2 | 3/3/2004 | terry |
13 | 3 | 3/4/2004 | terry |
14 | 4 | 3/5/2004 | terry |
15 | 19 | 3/20/2004 | terry |
I'm trying to find the fastest way to do this because I may have 100,000 records or more. This is not a lot of records but it becomes a lot if you have to iterate through all of the records. I'm trying to find a way to do this with one query, without any loops.
Anybody got any ideas?
Thanks,
hawg
----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011
July 26, 2004 at 9:43 am
Someone may know better than I, but I think you are looking at a stored procedure for that one. You might be surprised how fast it could run.
You would need a cursor to get the prior record (based off recnbr I assume) to be sure no records had been dropped leaving you with holes in your sequence.
You could also consider writing the stored procedure to do one row and execute as a trigger whenever a row is inserted, updated, or deleted in your table. You would need to operate on all those events since any change in sequence would affect your whole outcome. This seems true even if your sequence was based on the date instead of recnumber.
However, I'm guessing there might be a better way to accomplish your end goal whatever it is.
July 26, 2004 at 10:21 am
I don't usually recommend this, but I'd use a cursor for this. Assuming that recno is an actual field in your table, use that and run a loop. Actually you don't need to do that, but here's a couple things to consider.
1. Don't try this in raw sql as one batch. Doing this many records causes a lot of transactions, locking, and may take a long time to complete, or rollback. http://www.sqlservercentral.com/columnists/sjones/batching.asp. Instead, use batches of rows or a cursor and multiple transactions.
2. I appears you can cursor on the empid. I might run a
select * from table order by empid, mydate
then keep track of the empid and as long as it's the same, calculate the time as
select datediff( d, a.mydate, b.mydate)
from mytable a
inner join mytable b
where a.empid = b.empid
and b.mydate = ( select max( c.mydate) from mytable c where c.empid = a.empid and c.mydate < a.mydate)
use that for the update.
3. Test on small batches, not the whole table or a copy of it. And be sure to test the boundary conidtions, only one row for an emp, etc.
Good luck
July 26, 2004 at 6:35 pm
Am I missing something here?
what is wrong with this:
Update o Set mynbr = datediff(d,(select min(mydate) from testt t where t.empid = o.Empid ),o.mydate)
from testt o
Tests:
create table testt (recnbr int , mynbr int, mydate datetime, empid varchar(50))
go
insert into testt (recnbr, mynbr,mydate,empid) Values (1, 0 , '1/1/2001', 'bob')
insert into testt (recnbr, mynbr,mydate,empid) Values (2, NULL , '1/2/2001', 'bob')
insert into testt (recnbr, mynbr,mydate,empid) Values (3, NULL , '1/7/2001', 'bob')
insert into testt (recnbr, mynbr,mydate,empid) Values (4, NULL , '1/8/2001', 'bob')
insert into testt (recnbr, mynbr,mydate,empid) Values (5, NULL , '1/21/2001', 'bob')
insert into testt (recnbr, mynbr,mydate,empid) Values (6, 0 , '8/4/2003', 'cindy')
insert into testt (recnbr, mynbr,mydate,empid) Values (7, NULL , '8/5/2003', 'cindy')
insert into testt (recnbr, mynbr,mydate,empid) Values (8, NULL , '9/1/2003', 'cindy')
insert into testt (recnbr, mynbr,mydate,empid) Values (9, NULL , '9/8/2003', 'cindy')
insert into testt (recnbr, mynbr,mydate,empid) Values (10, NULL, '9/15/2003', 'cindy')
insert into testt (recnbr, mynbr,mydate,empid) Values (11, 0, '3/1/2004', 'terry')
insert into testt (recnbr, mynbr,mydate,empid) Values (12, NULL, ' 3/3/2004', 'terry')
insert into testt (recnbr, mynbr,mydate,empid) Values (13, NULL, ' 3/4/2004', 'terry')
insert into testt (recnbr, mynbr,mydate,empid) Values (14, NULL, ' 3/4/2004', 'terry')
insert into testt (recnbr, mynbr,mydate,empid) Values (15, NULL, '3/20/2004', 'terry')
Update o Set mynbr = datediff(d,(select min(mydate) from testt t where t.empid = o.Empid ),o.mydate)
from testt o
select * from testt
recnbr mynbr mydate empid
------ ----- ----------------------- ------
1 0 2001-01-01 00:00:00.000 bob
2 1 2001-01-02 00:00:00.000 bob
3 6 2001-01-07 00:00:00.000 bob
4 7 2001-01-08 00:00:00.000 bob
5 20 2001-01-21 00:00:00.000 bob
6 0 2003-08-04 00:00:00.000 cindy
7 1 2003-08-05 00:00:00.000 cindy
8 28 2003-09-01 00:00:00.000 cindy
9 35 2003-09-08 00:00:00.000 cindy
10 42 2003-09-15 00:00:00.000 cindy
11 0 2004-03-01 00:00:00.000 terry
12 2 2004-03-03 00:00:00.000 terry
13 3 2004-03-04 00:00:00.000 terry
14 3 2004-03-04 00:00:00.000 terry
15 19 2004-03-20 00:00:00.000 terry
(15 row(s) affected)
* Noel
July 27, 2004 at 3:10 am
noeld, your solution has produced the correct answer this time but is assuming properties of the data that we don't know for sure - with the sample data shown it is true that mynbr ends up as the number of days from the employee's first date to mydate, but that is not the rule we are asked to follow.
(Original poster might want to clarify here! - if noeld's simpler statement about your data is true then his solution will be fine)
A query is possible that implements the rule as stated (update the mynbr field to equal the difference in the number of days between the current mydate and the previous record's mydate plus the previous record's mynbr value), it uses a correlated subquery to find, for each row, the previous (in date order) row for that emp:
update mytable T1
set mynbr =
-- number of days since previous mydate for this emp
-- previous mydate = highest mydate that is strictly less than this mydate
datediff(d, (select max(mydate) from mytable T2 where T2.empid = T1.empid and T2.mydate < T1.mydate), mydate)
+
-- add on mynbr from previous record (in mydate order)
(select mynbr from mytable T3 where T3.empid = T1.empid
and T3.mydate = (select max(mydate) from mytable T4 where T4.empid = T1.empid and T4.mydate < T1.mydate) )
(I like to alias each reference to the same table differnetly, it's a personal foible, don't know how necessary it is ) Notice how we identify the 'previous' record twice, in the same way each time - first to get its mydate, then to get its mynbr.
Oh, and I leave as an exercise for the reader to work out what happens in boundary cases...
July 27, 2004 at 11:42 am
AKM,
What the poster said was NOT what the numbers example was showing. I followed the numbers example and the post should have been the difference between the FIRST date of the employee entry and the current date So my example Still holds. If you run your query you are NOT going to get the same results than what the Numbers example showed, therefore is NOT a coincidence either
Hope is clear.
* Noel
July 28, 2004 at 4:09 am
Noeld's solution seems to be fine - there is just one question about the word "previous" in original post.
In case the records are not always ordered by date ascending, and the "previous" means "that with nearest lower value of recnbr", Noeld's query wouldn't work. But I can hardly imagine why anyone should want to (obviously) count a number of days from first occurrence, and then mess it all up by adding some value from some "previous" record - so it is 99% OK. I hope hawg willl confirm that we understood everything well, or elaborate a bit on what the query means and how it should work.
cheers, Vladan
July 28, 2004 at 7:33 am
Thank you all for your help and my apologies for not being clear. Thank you especially, noeld. Your answer does work in my case.
I had just been so close to the problem for so long that I got tunnel vision and could not see outside of the solution I was trying to create.
In my data, it is a fact that the data will come in by empid and in mydate, ascending, order. The whole purpose was to get the number of days from the first date for each particular employee. I wasn't thinking in that way and I messed everyone up by giving an insight into how I was trying to solve the problem.
The recnbr field is really of no consequence now. It was there from an earlier attempt to solve this problem.
Again, my apologies for the confusion but I appreciate everyone's help and time. I know I can always get great help from this community and I'll continue to do my best to provide help when I can.
Thanks again!
----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply