Where do I get useful datasets?

  • I remember that Steve Jones posted an article, maybe 2 years ago, in SSC about getting some data to use as test data. I've tried looking through my saved emails, etc., but I can't find it. This would be data like weather patterns in a certain area, or presidential elections, or baseball scores, etc. Can anyone point me to where I can find some, please?

    Kindest Regards, Rod

  • Rod at work - Tuesday, March 13, 2018 3:54 PM

    I remember that Steve Jones posted an article, maybe 2 years ago, in SSC about getting some data to use as test data. I've tried looking through my saved emails, etc., but I can't find it. This would be data like weather patterns in a certain area, or presidential elections, or baseball scores, etc. Can anyone point me to where I can find some, please?

    If you're just looking for large volumes of data to test things with, I've found that it's usually easier to just make your own so that you can customize the data any way you want or need it and you can do so very quickly.  Please see the following articles.  The first two contain techniques for creating random but constrained data over a fairly flat range and also explain the principle of very high performance "pseudo cursors" to act as a row source instead of using While Loops, Recursive CTEs, and other forms of RBAR to generate the data.  The third is for non-uniform random generation by Dwain Camps, a dearly departed friend.

    http://www.sqlservercentral.com/articles/Data+Generation/87901/
    http://www.sqlservercentral.com/articles/Test+Data/88964/
    http://www.sqlservercentral.com/articles/SQL+Uniform+Random+Numbers/91103/
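A minimal sketch of the row-source idea described above, not code taken from the linked articles: a small seed table is cross joined with itself so that a single statement produces all the rows, with no loop, and each row gets a random integer constrained to a range. It assumes Python's built-in sqlite3 module rather than SQL Server, and the names digits, test_data, some_int, and make_test_rows are hypothetical.

```python
import sqlite3


def make_test_rows(row_count=1000, low=1, high=100):
    """Fill test_data with row_count random integers in [low, high].

    The ten-row digits table is cross joined four times, giving up to
    10,000 candidate rows, so row_count must not exceed 10,000.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE digits (d INTEGER)")
    conn.execute(
        "INSERT INTO digits (d) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)"
    )
    conn.execute(
        "CREATE TABLE test_data (id INTEGER PRIMARY KEY, some_int INTEGER)"
    )
    # One set-based INSERT: the cross join supplies the rows, and the
    # double modulo maps SQLite's signed random() into 0..span-1.
    conn.execute(
        """
        INSERT INTO test_data (id, some_int)
        SELECT a.d + 10 * b.d + 100 * c.d + 1000 * e.d + 1,
               ((random() % :span) + :span) % :span + :low
        FROM digits a
        CROSS JOIN digits b
        CROSS JOIN digits c
        CROSS JOIN digits e
        WHERE a.d + 10 * b.d + 100 * c.d + 1000 * e.d < :row_count
        """,
        {"span": high - low + 1, "low": low, "row_count": row_count},
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_test_rows(row_count=5000, low=50, high=60)
    print(conn.execute(
        "SELECT COUNT(*), MIN(some_int), MAX(some_int) FROM test_data"
    ).fetchone())
```

The range arithmetic (span = high - low + 1, then add low) is the part that carries over to other engines; the random function and the seed table would differ on SQL Server.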

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Those are good, Jeff, but I was thinking more like the baseball scores during a particular time span, or something along those lines. Something that might be interesting rather than totally random. We've got to start testing something new (to us) and we've already got a database full of generated test data, but it's hard to get along with first names of "Ndh@gsvb" and so on.

    Kindest Regards, Rod

  • Rod at work - Tuesday, March 13, 2018 3:54 PM

    I remember that Steve Jones posted an article, maybe 2 years ago, in SSC about getting some data to use as test data. I've tried looking through my saved emails, etc., but I can't find it. This would be data like weather patterns in a certain area, or presidential elections, or baseball scores, etc. Can anyone point me to where I can find some, please?

    US Government has a lot of files available in different formats - I've used this as a starting point:
    Datasets - Data.gov

    Sue
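Once a file has been downloaded from a site like that, getting it into a table is usually the next step. The following is a rough sketch only, assuming the download is a CSV with a header row and using Python's built-in csv and sqlite3 modules instead of SQL Server. The function name load_csv, the table name weather, and the sample rows are all made-up placeholders, not real data from any source.

```python
import csv
import io
import sqlite3


def load_csv(conn, table_name, csv_file):
    """Create a table from the CSV header row and insert the remaining rows.

    Every row is assumed to have the same number of fields as the header.
    Returns the number of columns created.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so header names containing spaces still work.
    quoted_table = '"{}"'.format(table_name.replace('"', '""'))
    columns = ", ".join(
        '"{}"'.format(name.replace('"', '""')) for name in header
    )
    placeholders = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders),
        reader,
    )
    conn.commit()
    return len(header)


if __name__ == "__main__":
    # Placeholder rows standing in for a downloaded file.
    sample = io.StringIO(
        "station,obs_date,high_temp_f\n"
        "STATION_A,2018-03-13,51\n"
        "STATION_B,2018-03-13,47\n"
    )
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "weather", sample)
    print(conn.execute('SELECT * FROM "weather"').fetchall())
```

For a real file, the io.StringIO object would be replaced by something like open(path, newline=""), where path is wherever the download was saved. All columns come in untyped here, so dates and numbers would still need converting before serious testing.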

  • Ah... got it.  I've not had the need, especially for names, because I can normally glean and randomize the data from "local sources".  I did do a search for "Large downloadable data sets" and got a whole lot of hits.  It's been a while since I've been on the Kaggle site but they did seem to have a bunch.  The only time I've had to find such things was for different studies (mostly solar insolation related) or requirements (list of used NPA/NXX's for telephony).

    --Jeff Moden


  • Rod at work - Wednesday, March 14, 2018 8:23 AM

    Those are good, Jeff, but I was thinking more like the baseball scores during a particular time span, or something along those lines. Something that might be interesting rather than totally random. We've got to start testing something new (to us) and we've already got a database full of generated test data, but it's hard to get along with first names of "Ndh@gsvb" and so on.

    Brent Ozar is using the Stack Overflow database most of the time.

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • Sue_H - Wednesday, March 14, 2018 8:42 AM

    Rod at work - Tuesday, March 13, 2018 3:54 PM

    I remember that Steve Jones posted an article, maybe 2 years ago, in SSC about getting some data to use as test data. I've tried looking through my saved emails, etc., but I can't find it. This would be data like weather patterns in a certain area, or presidential elections, or baseball scores, etc. Can anyone point me to where I can find some, please?

    US Government has a lot of files available in different formats - I've used this as a starting point:
    Datasets - Data.gov

    Sue

    That's what I'm looking for! Thanks, Sue.

    Kindest Regards, Rod
