Where to find a lot of data to build a database to practice on?

  • I don't much like practicing on production data, or even QA copies of it, so I'm always looking for data sources. Articles usually include scripts to create a table or two to practice on, but I would like to find better data sources for building halfway-realistic databases that I can practice on and try to build toward ideal conditions.

    So far I have found stock market data that you can download from Yahoo as CSVs. There is also some Florida data that the Wrox SSIS book links to in its tutorial.

    Does anyone else have a good source of a large amount of data that people can download freely off the internet to use as a learning resource?

    P.S. Just found http://www.freedb.org. Haven't downloaded it yet though.

  • Here's one suggestion:

    http://geonames.usgs.gov/fips55.html

    That has some files containing every geographic place name in the US (158K rows).

    You can also find similar data for free by searching for "zipcode database"; those have around 42K zip code records.

    It's good practice for importing into SQL Server tables, then cleaning up the data and pulling out "normalized" data so you have a zipcode table, city table, county table, and state table. I fiddle with that when I get bored.
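
    The normalization exercise described above could be sketched roughly as follows. This is only an illustration, not anyone's actual script: it uses Python's built-in sqlite3 as a stand-in for SQL Server, the sample rows are illustrative rather than verified data, and the table and column names (raw_zip, state, county, city, zipcode) are hypothetical. It also assumes each zip code maps to exactly one city, which real zip code files do not guarantee.

```python
import sqlite3

# Illustrative rows standing in for an imported flat file:
# (zipcode, city, county, state). Not verified data.
RAW_ROWS = [
    ("33101", "Miami", "Miami-Dade", "FL"),
    ("33109", "Miami Beach", "Miami-Dade", "FL"),
    ("33130", "Miami", "Miami-Dade", "FL"),
    ("32801", "Orlando", "Orange", "FL"),
    ("30301", "Atlanta", "Fulton", "GA"),
]

def build_normalized(rows):
    """Load flat rows into a staging table, then split them into lookup tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_zip (zipcode TEXT, city TEXT, county TEXT, state TEXT);
        CREATE TABLE state   (state_id INTEGER PRIMARY KEY, code TEXT UNIQUE);
        CREATE TABLE county  (county_id INTEGER PRIMARY KEY, name TEXT,
                              state_id INTEGER REFERENCES state(state_id),
                              UNIQUE (name, state_id));
        CREATE TABLE city    (city_id INTEGER PRIMARY KEY, name TEXT,
                              county_id INTEGER REFERENCES county(county_id),
                              UNIQUE (name, county_id));
        CREATE TABLE zipcode (zipcode TEXT PRIMARY KEY,
                              city_id INTEGER REFERENCES city(city_id));
    """)
    conn.executemany("INSERT INTO raw_zip VALUES (?, ?, ?, ?)", rows)
    # Fill each lookup table from the distinct values in the staging table,
    # working from the widest level (state) down to the narrowest (zipcode).
    conn.executescript("""
        INSERT INTO state (code) SELECT DISTINCT state FROM raw_zip;
        INSERT INTO county (name, state_id)
            SELECT DISTINCT r.county, s.state_id
            FROM raw_zip r JOIN state s ON s.code = r.state;
        INSERT INTO city (name, county_id)
            SELECT DISTINCT r.city, c.county_id
            FROM raw_zip r
            JOIN state s  ON s.code = r.state
            JOIN county c ON c.name = r.county AND c.state_id = s.state_id;
        INSERT INTO zipcode (zipcode, city_id)
            SELECT r.zipcode, ci.city_id
            FROM raw_zip r
            JOIN state s  ON s.code = r.state
            JOIN county c ON c.name = r.county AND c.state_id = s.state_id
            JOIN city ci  ON ci.name = r.city AND ci.county_id = c.county_id;
    """)
    conn.commit()
    return conn

def table_counts(conn):
    """Return the row count of each normalized table."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in ("state", "county", "city", "zipcode")}
```

    Calling table_counts(build_normalized(RAW_ROWS)) shows how the repeated city, county, and state values collapse into fewer rows than the flat file had. On SQL Server the same INSERT ... SELECT DISTINCT pattern should carry over, though the data types and identity columns would be declared differently.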

    There is other data there as well, such as the "official" codes for every country in the world.

    I've got a database with every email address in the world, but I only use that to send out my "Full of health? then don't click!" viagra spam.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Another similar source of data is nationalfile.zip, here:

    http://geonames.usgs.gov/domestic/download_data.htm

    It has 1.8+ million records in the text file: feature name, city, state, and county, along with the latitude and longitude of different places and features in the US.

    Because it is a much bigger file, when you SELECT * from it immediately after importing it, you'll learn that you want to index the table. It's good real-life practice to put an index on it, select something based on the index, then select something NOT in the index, and see the difference in SQL performance. It teaches you to plan your indexes accordingly.
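
    A small sketch of that indexed versus non-indexed comparison, for anyone who wants to try the idea before downloading the real file. It is an illustration under stated assumptions: Python's built-in sqlite3 stands in for SQL Server, the rows are generated rather than taken from nationalfile.zip, and the table, column, and index names (places, feature_name, ix_places_feature_name) are hypothetical. Instead of timing the queries, it asks the engine for its query plan, which shows whether the index is used.

```python
import sqlite3

def make_places(n=20000):
    """Build an in-memory table of generated rows with one index on feature_name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (feature_name TEXT, county TEXT, state TEXT)"
    )
    states = ["FL", "GA", "TX", "NY"]
    conn.executemany(
        "INSERT INTO places VALUES (?, ?, ?)",
        ((f"Feature {i}", f"County {i % 50}", states[i % 4]) for i in range(n)),
    )
    # Only feature_name is indexed; county and state are deliberately left alone.
    conn.execute("CREATE INDEX ix_places_feature_name ON places (feature_name)")
    conn.commit()
    return conn

def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

if __name__ == "__main__":
    conn = make_places()
    # Filter on the indexed column: the plan should report an index search.
    print(query_plan(conn, "SELECT * FROM places WHERE feature_name = ?",
                     ("Feature 42",)))
    # Filter on a column with no index: the plan should report a full scan.
    print(query_plan(conn, "SELECT * FROM places WHERE county = ?",
                     ("County 7",)))
```

    The exact plan wording varies between SQLite versions, but the first query should mention the index and the second should be a scan of the whole table. On SQL Server, the rough equivalent would be comparing the execution plans (index seek versus table or index scan) for the two queries.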

     

     

    Lowell



  • DTS the production tables to a development database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
