How to catch a screen-scraper?

  • I'll try to keep it short and sweet. The company I work for gets the bulk of its revenue from doing financial analysis and distributing it via a subscription-based web site. While running a trace this morning, I happened to notice a set of stored procs being called repeatedly with different parameters (in increasing order). All legit calls, no alarm there, but they were coming in way too fast and too uniformly for somebody sitting at a PC. I tracked the calls back to a user, and it seems they've been screen-scraping this one page for a while: same time and same number of calls every day. The user agreement forbids this, so they've been notified to knock it off or have the account revoked.

    We have an audit table where we track which users are hitting which pages, so we can make some inferences after the fact. But is anybody else dealing with a similar issue, and/or does anybody have a creative way to proactively identify something like this?

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • It seems like the service you provide doesn't meet the requirements of some users.

    It sounds like some people out there are looking for a way to get lists rather than one result per request (I don't have a better wording).

    What you could do is offer a range-based report as a service.

    I like to say: "The crappier a system's design is, the beastlier the methods people use to keep working with it." (This especially applies to a Software where you hAve to Pray after you've submitted the data.)

    I'm not saying your software falls into the same category, but it seems like there is an extended need for information not yet covered by your application.



    Lutz
    A pessimist is an optimist with experience.


  • I agree that each consumer has different consumption needs, and in the past we've handled this via web services and/or custom reports. As it turns out, the sales team had already been working with this user's company to make something like this happen. I guess either it wasn't moving fast enough or this guy was unaware of the path forward and decided to take it upon himself to solve the problem.

    At the end of the day, regardless of the reason, I need to be able to identify when this type of thing is happening.

    - Nate

  • Why not just add a "hit" log to the stored procedure? You found the information by using Profiler... seems like a user hit log would do well. Then, write another proc to go through the "manual" process of determining such abuse and have it send you a morning report.
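
    As a rough illustration of the "morning report" idea, here is a minimal Python sketch. It assumes the hit log can be read as (user, proc name, timestamp) rows; the row layout, the `morning_report` name, and the 200-hits-per-day threshold are all made up for the example and would need tuning against real traffic. On SQL Server the same aggregation would more naturally live in a stored procedure, as suggested above.

```python
from collections import Counter

def morning_report(hits, report_date, max_hits_per_day=200):
    """Summarize one day's hit log and list the heavy callers.

    hits: iterable of (user_id, proc_name, timestamp) rows, where timestamp
          is a datetime. This layout is an assumption, not a real schema.
    Returns (user_id, proc_name, hit_count) rows above the threshold,
    busiest first.
    """
    counts = Counter(
        (user, proc)
        for user, proc, ts in hits
        if ts.date() == report_date
    )
    flagged = [
        (user, proc, n)
        for (user, proc), n in counts.items()
        if n > max_hits_per_day
    ]
    # Sort so the worst offenders appear at the top of the report.
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

    A scheduled job could call this once a day for the previous date and mail the result if it is not empty.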

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can offer a solution that will do this and also much more: complete recording with drill-down to screenshots of every activity, keystroke logging, which IM and chat sites were used, what was printed, copied, or emailed, who uses the most bandwidth, which websites were visited, what a user was really doing when he crashed the system, what the error message was, and so on. It installs silently, does not show in Task Manager, and cannot be uninstalled by the end user.

    stephen.jones@synergy-software.com

  • Ultimately I'm not sure you can do much other than revoke their account. You could make it difficult, for example by putting a limit on the number of calls per hour. But once the data hits their screen, they can get it.
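
    For what a calls-per-hour limiter might look like, here is a small sliding-window sketch in Python. The class name, the default of 100 calls, and the idea of passing the current time in as seconds are all assumptions for the example; a real site would more likely enforce this in the web tier or the database.

```python
from collections import defaultdict, deque

class HourlyLimiter:
    """Allow at most max_calls per user within a sliding time window."""

    def __init__(self, max_calls=100, window_seconds=3600):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = defaultdict(deque)  # user_id -> times of accepted calls

    def allow(self, user_id, now):
        """Return True and record the call if the user is under the limit.

        now: current time in seconds (for example, from time.time()).
        """
        recent = self.calls[user_id]
        # Forget calls that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True
```

    Rejected calls are not recorded, so a blocked user regains access as soon as their oldest accepted call leaves the window.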

    Years ago Intel spent millions to prevent engineers from stealing designs while working remotely. Couldn't print, save, etc. from the app. One of the engineers was caught selling designs to AMD. He used a camera to take pictures of the screen.

  • Thanks all for your input.

    Taking Jeff's advice, and since we're already logging site activity, I'll create a script that aggregates hits and timing and then manually scan it for trends. This guy was easy: 600+ hits starting at the same time every day.
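
    The pattern described here (many hits, evenly spaced, starting at the same time each day) can be checked mechanically. The Python sketch below is one way to do it for a single user's timestamps; the function names and every threshold are assumptions to be tuned, and it ignores runs that straddle midnight. The sample data at the bottom is invented.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

def gap_cv(times):
    """Coefficient of variation of gaps between sorted hits (low = uniform)."""
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if not gaps:
        return 0.0
    avg = mean(gaps)
    return pstdev(gaps) / avg if avg else 0.0

def looks_scripted(timestamps, min_hits=100, min_days=3,
                   max_start_spread=300, max_gap_cv=0.25):
    """True if enough days show a big, evenly spaced burst at a similar time."""
    by_day = defaultdict(list)
    for ts in sorted(timestamps):
        by_day[ts.date()].append(ts)

    starts = []  # first-hit time (seconds after midnight) on each suspicious day
    for times in by_day.values():
        if len(times) >= min_hits and gap_cv(times) <= max_gap_cv:
            first = times[0]
            starts.append(first.hour * 3600 + first.minute * 60 + first.second)

    if not starts or len(starts) < min_days:
        return False
    # Bursts must begin within max_start_spread seconds of each other.
    return max(starts) - min(starts) <= max_start_spread

# Invented sample: 600 hits, 2 seconds apart, from 06:00 on three days.
base = datetime(2024, 1, 1, 6, 0, 0)
bot_hits = [base + timedelta(days=d, seconds=2 * i)
            for d in range(3) for i in range(600)]
flagged = looks_scripted(bot_hits)
```

    A person browsing tends to produce far fewer hits with irregular gaps, so they would normally fail the `min_hits` or `max_gap_cv` test.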

    - Nate

  • That's how I caught the "bots" at my last job. Just simple hit counters that capture both the user info and the IP address because bots are capable of rotating one or the other quite easily.

    Simple is good.
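
    A bare-bones version of such hit counters, sketched in Python. The input shape of (user, IP) pairs, the function names, and the 500-hit threshold are assumptions for illustration only. Counting by user and by IP separately is what catches a bot that rotates one of the two.

```python
from collections import Counter

def count_hits(hits):
    """Count hits per user and per IP address.

    hits: iterable of (user_id, ip_address) pairs (an assumed log shape).
    """
    per_user, per_ip = Counter(), Counter()
    for user, ip in hits:
        per_user[user] += 1
        per_ip[ip] += 1
    return per_user, per_ip

def suspects(hits, max_hits=500):
    """Return (users, ips) whose hit counts exceed max_hits."""
    per_user, per_ip = count_hits(hits)
    heavy_users = {u for u, n in per_user.items() if n > max_hits}
    heavy_ips = {ip for ip, n in per_ip.items() if n > max_hits}
    return heavy_users, heavy_ips
```

    A bot that spreads its calls over many accounts from one address shows up in the IP set, and one that hops between addresses under a single login shows up in the user set.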

    Thanks for the feedback, RP_DBA.

    --Jeff Moden


