How to Return a recordset while looping with FETCH

  • If this is not the correct forum, my apologies.

    I have a table that stores log data of file downloads. Each record includes a user IP and a date/time stamp. I want to eliminate all duplicate records that occur within a 5 minute period. So far, I have figured out how to declare a cursor and cycle through the records (using FETCH and a WHILE loop), storing the fields I need to compare and successfully performing the comparison.

    My question is this: I can use the print function to return results, but how do I return the data as a recordset to the calling script?

    Here is my code - be warned, I am a relative newbie & may be doing something stupid.

    Use encor5
    Go
    declare @OldTime smalldatetime
    declare @OldIP nvarchar(50)
    declare @OldSong nvarchar(255)
    declare @CurrTime smalldatetime
    declare @CurrIP nvarchar(50)
    declare @CurrSong nvarchar(255)
    declare @CurrFile nvarchar(255)
    declare @CurrComposer nvarchar(100)
    declare @CurrArtist nvarchar(100)
    declare @CurrEntertainer nvarchar(100)
    
    declare SongLogs cursor for
    SELECT     tblSongLogs.Request_Date_Time, tblSongLogs.User_IP, tblSongs.name, 
                          tblSongs.[file], tblSongs.Composer, tblSongs.[Original Artist], tblEntertainers.name
    FROM         tblEntertainers RIGHT OUTER JOIN
                          tblSongs ON tblEntertainers.id = tblSongs.entertainer_id RIGHT OUTER JOIN
                          tblSongLogs ON tblSongs.id = tblSongLogs.Song_id
    ORDER BY tblSongLogs.User_IP, tblSongLogs.Request_Date_Time
    
    Open SongLogs
    FETCH NEXT FROM SongLogs
    INTO @CurrTime,@CurrIP,@CurrSong,@CurrFile,@CurrComposer,@CurrArtist,@CurrEntertainer
    WHILE @@FETCH_STATUS = 0
    BEGIN
    If ((@OldSong=@CurrSong) and (@OldIP=@CurrIP) and DateDiff(mi,@OldTime, @CurrTime)<5)
    BEGIN
    print '***** Duplicate Found of ' + @CurrSong + '. *****'
    Set @OldSong = @CurrSong
    Set @OldIP = @CurrIP
    Set @OldTime = @CurrTime
    END
    Else
    BEGIN
    print @CurrSong + ' , ' + CAST(DateDiff(mi,@OldTime, @CurrTime) AS nvarchar(100))
    Set @OldSong = @CurrSong
    Set @OldIP = @CurrIP
    Set @OldTime = @CurrTime
    END
    FETCH NEXT FROM SongLogs
    INTO @CurrTime,@CurrIP,@CurrSong,@CurrFile,@CurrComposer,@CurrArtist,@CurrEntertainer
    END
    CLOSE SongLogs
    DEALLOCATE SongLogs
    

    Am I even on the right track?


    Kindest Regards,

    Matthew K. Lindley

  • To make things efficient, try rewriting to not use a cursor.  I'll have a quick shot at it now...

    But.  When you say "eliminate all duplicate records that occur within a 5 minute period" it appears that a duplicate record is one where the same IP address has used the same song within  a 5 minute period.  Do you want these to be

    • purged from the table,
    • returned to you as a recordset or
    • return all, except the duplicates to you as a recordset?

    if you want the the third option (forgive me - i'm tired but on review of this msg I think that's what you want), then, from your SQL

    SELECT     tblSongLogs.Request_Date_Time, tblSongLogs.User_IP, tblSongs.name,

                          tblSongs.[file], tblSongs.Composer, tblSongs.[Original Artist], tblEntertainers.name

    FROM         tblEntertainers RIGHT OUTER JOIN

                          tblSongs ON tblEntertainers.id = tblSongs.entertainer_id RIGHT OUTER JOIN

                          tblSongLogs ON tblSongs.id = tblSongLogs.Song_id

    ORDER BY tblSongLogs.User_IP, tblSongLogs.Request_Date_Time

    we replace the tblSongsLog table with a derived table (some SQL that it will treat as a table reference).. 

    select log.request_date_time, log.user_ip, log.song_id
    from tblSongsLog log
    left join tblSongsLog logFuture
      on log.user_ip = logFuture.user_ip
      and log.song_id = logFuture.song_id
      and logFuture.request_date_time between log.request_date_time and DateAdd(minute, 5, log.request_date_time)
    where logFuture.request_date_time is null

    This essentially takes the log table and joins it to itself, with the join condition being to join with records containing the same user, same song within a period between the log's time and the log's time plus 5 minutes.

    The where clause then limits the rows returned to those that did not have a join (ie, no songs in the future 5 minute period).

    This is probably the most efficient way that I can think of the perform the filtering you require.  Make sure that you have an index on your songlog table that does the user_ip, song_id and request_date_time (in that order).

    You could recode this using "not exists" or even "not in" but past experience has usually shown that the outer join (here it is a left join) tends to look a bit stranger due to the need for the where [fieldname] = null clause but tends to perform faster.

     

    In any case, this is the entire SQL statement that you should run...

    SELECT     tblSongLogs.Request_Date_Time, tblSongLogs.User_IP, tblSongs.name,

                          tblSongs.[file], tblSongs.Composer, tblSongs.[Original Artist], tblEntertainers.name

    FROM         tblEntertainers RIGHT OUTER JOIN

                          tblSongs ON tblEntertainers.id = tblSongs.entertainer_id RIGHT OUTER JOIN

                          (

    select log.request_date_time, log.user_ip, log.song_id

    from tblSongsLog log

    left join tblSongsLog logFuture

      on log.user_ip = logFuture.user_ip

      and log.song_id = logFuture.song_id

      and logFuture.request_date_time between log.request_date_time and DateAdd(minute, 5, log.request_date_time)

    where logFuture.request_date_time is null   

                         &nbsp tblSongLogs ON tblSongs.id = tblSongLogs.Song_id

    ORDER BY tblSongLogs.User_IP, tblSongLogs.Request_Date_Time

    The winking face you see is actually a closing bracket... I am not sure how to get rid of it!?

    Then you needn't worry about cursors, etc.  By the way, to return the data using your cursor, you would need to insert the desired rows one by one into a temporary table and then select all data from that table...

     

    Hope this helps!!

  • Ok, I am trying this again. Yesterday, I tried to make a post, but took so long that when I went to preview it, I lost everything I had done. No, I was not pleased.

    Anyway, thank you for your help. Unfortunatly, it did not work - it returns an empty set. However, it seems that you were close, so I started working with the portion of the query that will find all of the duplicate entries.

    Also, to simplify things, I created a view that combines the data from the relevant tables. Just for reference, here is the view vw_SongViews:

    SELECT     tblSongLogs.Request_Date_Time, tblSongLogs.User_IP, tblSongs.name AS [Song Name], 
                          tblSongs.[file] AS [Song File], tblSongs.Composer, tblSongs.[Original Artist], tblEntertainers.name AS Entertainer, 
                          tblSongLogs.Song_id, tblSongLogs.id AS [Log ID]
    FROM         tblEntertainers RIGHT OUTER JOIN
                          tblSongs ON tblEntertainers.id = tblSongs.entertainer_id RIGHT OUTER JOIN
                          tblSongLogs ON tblSongs.id = tblSongLogs.Song_id
    ORDER BY tblSongLogs.User_IP, tblSongLogs.Request_Date_Time

    Ok, then I tried adapting the "duplicates" portion of the query to use this view. What I came up with is this:

    SELECT logCurrent.* FROM vw_SongViews logCurrent
    LEFT JOIN vw_SongViews logFuture
    ON logCurrent.User_IP = logFuture.User_IP
    AND logCurrent.Song_id=logFuture.Song_id
    AND logFuture.Request_Date_Time BETWEEN logCurrent.Request_Date_Time AND DateAdd(minute,5,logCurrent.Request_Date_Time)

    My expectation was that if I could get a set of repeat song request from the same IP returned, I would be a little closer to my goal. Removing the "WHERE" clause gave me some search results. The problem appeared to be that this query returned every row of the log file, whether or not it had any duplicate entries within the specified time span. Looking a little closer, I found that this query did produce duplicate records for certain log records. By that I mean that some Log ID's would appear once, some twice, some ten times, etc.

    By cross checking this with the raw log data, I found that it was producing duplicate log entries for all log files that had duplicates within the 5 minute time span. Or so it seemed.

    For example, if I have log data for the following records:

    Request_Date_Time    User_IP         Song Name    Song File    Composer      Original Artist  Entertainer   Song_id  Log ID
    2005-06-14 13:17:00  12.119.150.134  City Lights  cityligh.rm  Emerald City  Emerlad City     Emerald City  261      950
    2005-06-14 13:17:00  12.119.150.134  City Lights  cityligh.rm  Emerald City  Emerlad City     Emerald City  261      951
    2005-06-14 13:20:00  12.119.150.134  City Lights  cityligh.rm  Emerald City  Emerlad City     Emerald City  261      952
    2005-06-14 13:21:00  12.119.150.134  Love Shack   rrshack.mp3  NULL          NULL             Roof Raisers  323      953
    

    Then the query would return records (Log ID's) 950 and 951 three times each, and would return record 953 only one time. This is because for each of the two log records (950-951) there are three matches for the query - each record matches itself, and the other two are also within the specified time range. Record 953 was not a duplicate request for the City Lights song, so it only appears one time, as you would expect. The oddball here is record 952. This record was returned only one time.

    Try as I might, I cannot puzzle that out. I had though that if the query is returning these duplicates in a predictable manner, I may be able to use that somehow. But this behavior does not seem to follow the logic of the query. I would have expected to see all three records duplicated 3 times, since each of them occurs within a 5 minute period. Indeed, in other places, where there are three requests by the same IP for the same song, all in less than 1 minute, each log ID is returned 3 times. The strange behavior only pops up when there is at least 1 minute separating the log entries.

    I would not have thought that 1 minute would make a difference, since the criteria specifies a 5 minute time span. If I remove the time portion of the query, then I get predictable results - well, more predictable to me. Each duplicate request log is returned as many times as there are duplicates (i.e. 5 requests from the same IP for same song will return 25 records, 5 duplicates of each of the 5 log records). Since I have to count repeated downloads, if they were not within a 5 minute period, this does not help me.

    I think I was closer with the cursor solution. I hate that because it seems clumsy. Maybe I would be better off just comparing the records in my ASP. I'll research creating a temporary table while using the cursor. Since I have that logic working (I think) it might be the easiest solution for me at this point.

    Thank you for your help, anyway. I knew this was a tough question before I even started working on it. Now that I am thinking about it, I wonder why I thought it would be easier to do these computations on the SQL server, rather than in my own ASP. It would not necessarily be faster, would it?


    Kindest Regards,

    Matthew K. Lindley

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply