How to Return a recordset while looping with FETCH

Question

Post reply

How to Return a recordset while looping with FETCH

Matthew K. Lindley

Grasshopper

Points: 10

July 22, 2005 at 3:56 pm

#90130

If this is not the correct forum, my apologies.

I have a table that stores log data of file downloads. Each record includes a user IP and a date/time stamp. I want to eliminate all duplicate records that occur within a 5 minute period. So far, I have figured out how to declare a cursor and cycle through the records (using FETCH and a WHILE loop), storing the fields I need to compare and successfully performing the comparison.

My question is this: I can use the print function to return results, but how do I return the data as a recordset to the calling script?

Here is my code - be warned, I am a relative newbie & may be doing something stupid.

Use encor5
Go
declare @OldTime smalldatetime
declare @OldIP nvarchar(50)
declare @OldSong nvarchar(255)
declare @CurrTime smalldatetime
declare @CurrIP nvarchar(50)
declare @CurrSong nvarchar(255)
declare @CurrFile nvarchar(255)
declare @CurrComposer nvarchar(100)
declare @CurrArtist nvarchar(100)
declare @CurrEntertainer nvarchar(100)

declare SongLogs cursor for
SELECT     tblSongLogs.Request_Date_Time, tblSongLogs.User_IP, tblSongs.name, 
                      tblSongs.[file], tblSongs.Composer, tblSongs.[Original Artist], tblEntertainers.name
FROM         tblEntertainers RIGHT OUTER JOIN
                      tblSongs ON tblEntertainers.id = tblSongs.entertainer_id RIGHT OUTER JOIN
                      tblSongLogs ON tblSongs.id = tblSongLogs.Song_id
ORDER BY tblSongLogs.User_IP, tblSongLogs.Request_Date_Time

Open SongLogs
FETCH NEXT FROM SongLogs
INTO @CurrTime,@CurrIP,@CurrSong,@CurrFile,@CurrComposer,@CurrArtist,@CurrEntertainer
WHILE @@FETCH_STATUS = 0
BEGIN
If ((@OldSong=@CurrSong) and (@OldIP=@CurrIP) and DateDiff(mi,@OldTime, @CurrTime)<5)
BEGIN
print '***** Duplicate Found of ' + @CurrSong + '. *****'
Set @OldSong = @CurrSong
Set @OldIP = @CurrIP
Set @OldTime = @CurrTime
END
Else
BEGIN
print @CurrSong + ' , ' + CAST(DateDiff(mi,@OldTime, @CurrTime) AS nvarchar(100))
Set @OldSong = @CurrSong
Set @OldIP = @CurrIP
Set @OldTime = @CurrTime
END
FETCH NEXT FROM SongLogs
INTO @CurrTime,@CurrIP,@CurrSong,@CurrFile,@CurrComposer,@CurrArtist,@CurrEntertainer
END
CLOSE SongLogs
DEALLOCATE SongLogs

Am I even on the right track?

Kindest Regards,

Matthew K. Lindley

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Ian Yates SSCoach Points: 19738 More actions · Answer 1

To make things efficient, try rewriting to not use a cursor. I'll have a quick shot at it now...

But. When you say "eliminate all duplicate records that occur within a 5 minute period" it appears that a duplicate record is one where the same IP address has used the same song within a 5 minute period. Do you want these to be

purged from the table,
returned to you as a recordset or
return all, except the duplicates to you as a recordset?

if you want the the third option (forgive me - i'm tired but on review of this msg I think that's what you want), then, from your SQL

SELECT tblSongLogs.Request_Date_Time, tblSongLogs.User_IP, tblSongs.name,

tblSongs.[file], tblSongs.Composer, tblSongs.[Original Artist], tblEntertainers.name

FROM tblEntertainers RIGHT OUTER JOIN

tblSongs ON tblEntertainers.id = tblSongs.entertainer_id RIGHT OUTER JOIN

tblSongLogs ON tblSongs.id = tblSongLogs.Song_id

ORDER BY tblSongLogs.User_IP, tblSongLogs.Request_Date_Time

we replace the tblSongsLog table with a derived table (some SQL that it will treat as a table reference)..

select log.request_date_time, log.user_ip, log.song_id
from tblSongsLog log
left join tblSongsLog logFuture
  on log.user_ip = logFuture.user_ip
  and log.song_id = logFuture.song_id
  and logFuture.request_date_time between log.request_date_time and DateAdd(minute, 5, log.request_date_time)
where logFuture.request_date_time is null

This essentially takes the log table and joins it to itself, with the join condition being to join with records containing the same user, same song within a period between the log's time and the log's time plus 5 minutes.

The where clause then limits the rows returned to those that did not have a join (ie, no songs in the future 5 minute period).

This is probably the most efficient way that I can think of the perform the filtering you require. Make sure that you have an index on your songlog table that does the user_ip, song_id and request_date_time (in that order).

You could recode this using "not exists" or even "not in" but past experience has usually shown that the outer join (here it is a left join) tends to look a bit stranger due to the need for the where [fieldname] = null clause but tends to perform faster.

In any case, this is the entire SQL statement that you should run...

SELECT tblSongLogs.Request_Date_Time, tblSongLogs.User_IP, tblSongs.name,

tblSongs.[file], tblSongs.Composer, tblSongs.[Original Artist], tblEntertainers.name

FROM tblEntertainers RIGHT OUTER JOIN

tblSongs ON tblEntertainers.id = tblSongs.entertainer_id RIGHT OUTER JOIN

(

select log.request_date_time, log.user_ip, log.song_id

from tblSongsLog log

left join tblSongsLog logFuture

on log.user_ip = logFuture.user_ip

and log.song_id = logFuture.song_id

and logFuture.request_date_time between log.request_date_time and DateAdd(minute, 5, log.request_date_time)

where logFuture.request_date_time is null

&nbsp tblSongLogs ON tblSongs.id = tblSongLogs.Song_id

ORDER BY tblSongLogs.User_IP, tblSongLogs.Request_Date_Time

The winking face you see is actually a closing bracket... I am not sure how to get rid of it!?

Then you needn't worry about cursors, etc. By the way, to return the data using your cursor, you would need to insert the desired rows one by one into a temporary table and then select all data from that table...

Hope this helps!!

Matthew K. Lindley Grasshopper Points: 10 More actions · Answer 2

Ok, I am trying this again. Yesterday, I tried to make a post, but took so long that when I went to preview it, I lost everything I had done. No, I was not pleased.

Anyway, thank you for your help. Unfortunatly, it did not work - it returns an empty set. However, it seems that you were close, so I started working with the portion of the query that will find all of the duplicate entries.

Also, to simplify things, I created a view that combines the data from the relevant tables. Just for reference, here is the view vw_SongViews:

SELECT     tblSongLogs.Request_Date_Time, tblSongLogs.User_IP, tblSongs.name AS [Song Name], 
                      tblSongs.[file] AS [Song File], tblSongs.Composer, tblSongs.[Original Artist], tblEntertainers.name AS Entertainer, 
                      tblSongLogs.Song_id, tblSongLogs.id AS [Log ID]
FROM         tblEntertainers RIGHT OUTER JOIN
                      tblSongs ON tblEntertainers.id = tblSongs.entertainer_id RIGHT OUTER JOIN
                      tblSongLogs ON tblSongs.id = tblSongLogs.Song_id
ORDER BY tblSongLogs.User_IP, tblSongLogs.Request_Date_Time

Ok, then I tried adapting the "duplicates" portion of the query to use this view. What I came up with is this:

SELECT logCurrent.* FROM vw_SongViews logCurrent
LEFT JOIN vw_SongViews logFuture
ON logCurrent.User_IP = logFuture.User_IP
AND logCurrent.Song_id=logFuture.Song_id
AND logFuture.Request_Date_Time BETWEEN logCurrent.Request_Date_Time AND DateAdd(minute,5,logCurrent.Request_Date_Time)

My expectation was that if I could get a set of repeat song request from the same IP returned, I would be a little closer to my goal. Removing the "WHERE" clause gave me some search results. The problem appeared to be that this query returned every row of the log file, whether or not it had any duplicate entries within the specified time span. Looking a little closer, I found that this query did produce duplicate records for certain log records. By that I mean that some Log ID's would appear once, some twice, some ten times, etc.

By cross checking this with the raw log data, I found that it was producing duplicate log entries for all log files that had duplicates within the 5 minute time span. Or so it seemed.

For example, if I have log data for the following records:

Request_Date_Time    User_IP         Song Name    Song File    Composer      Original Artist  Entertainer   Song_id  Log ID
2005-06-14 13:17:00  12.119.150.134  City Lights  cityligh.rm  Emerald City  Emerlad City     Emerald City  261      950
2005-06-14 13:17:00  12.119.150.134  City Lights  cityligh.rm  Emerald City  Emerlad City     Emerald City  261      951
2005-06-14 13:20:00  12.119.150.134  City Lights  cityligh.rm  Emerald City  Emerlad City     Emerald City  261      952
2005-06-14 13:21:00  12.119.150.134  Love Shack   rrshack.mp3  NULL          NULL             Roof Raisers  323      953

Then the query would return records (Log ID's) 950 and 951 three times each, and would return record 953 only one time. This is because for each of the two log records (950-951) there are three matches for the query - each record matches itself, and the other two are also within the specified time range. Record 953 was not a duplicate request for the City Lights song, so it only appears one time, as you would expect. The oddball here is record 952. This record was returned only one time.

Try as I might, I cannot puzzle that out. I had though that if the query is returning these duplicates in a predictable manner, I may be able to use that somehow. But this behavior does not seem to follow the logic of the query. I would have expected to see all three records duplicated 3 times, since each of them occurs within a 5 minute period. Indeed, in other places, where there are three requests by the same IP for the same song, all in less than 1 minute, each log ID is returned 3 times. The strange behavior only pops up when there is at least 1 minute separating the log entries.

I would not have thought that 1 minute would make a difference, since the criteria specifies a 5 minute time span. If I remove the time portion of the query, then I get predictable results - well, more predictable to me. Each duplicate request log is returned as many times as there are duplicates (i.e. 5 requests from the same IP for same song will return 25 records, 5 duplicates of each of the 5 log records). Since I have to count repeated downloads, if they were not within a 5 minute period, this does not help me.

I think I was closer with the cursor solution. I hate that because it seems clumsy. Maybe I would be better off just comparing the records in my ASP. I'll research creating a temporary table while using the cursor. Since I have that logic working (I think) it might be the easiest solution for me at this point.

Thank you for your help, anyway. I knew this was a tough question before I even started working on it. Now that I am thinking about it, I wonder why I thought it would be easier to do these computations on the SQL server, rather than in my own ASP. It would not necessarily be faster, would it?

Kindest Regards,

Matthew K. Lindley