November 1, 2010 at 5:52 pm
I was reading the discussion:
http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx
about this article:
http://www.sqlservercentral.com/articles/BETWEEN/71395/
having to do with a faster BETWEEN for dates.
One of the issues that came up was problems when using fake/unrealistic sample data.
I had already bookmarked many places that offer sample data. Much of it is "real", meaning it is data people actually use, so you should be able to build more realistic examples, work with very large data sets, better optimize your queries and plans, etc. And once you get familiar with some of these data sets, using them is probably faster than building a loop to generate fake data.
So here are some sources. Do you know of any more?
MASSIVE: http://www.datawrangling.com/some-datasets-available-on-the-web
http://www.guardian.co.uk/data-store
http://infochimps.com/datasets
http://stackoverflow.com/questions/57068/good-databases-with-sample-data
http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php
http://www.ferc.gov/docs-filing/eqr/soft-tools/sample-csv.asp
http://www.findingdulcinea.com/guides.html?pg=00&topic=/categories/sports/football
http://www.postneo.com/2007/09/09/accidental-apis-nfl-edition
http://www.livefantasyscoring.com/2008.shtml
http://www.red-gate.com/products/SQL_Data_Generator/index.htm
http://www.mauvais.com/Publish/ZD-Northwind.htm
http://www.codeplex.com/Wikipage?ProjectName=SqlServerSamples#databases
November 1, 2010 at 9:12 pm
I, for one, really appreciate the lists of data you've published. My problem, though, has frequently been that it takes way too much time to find a database among those types of samples that resembles what I need closely enough. For example, let's say I need to simulate something that needs 1 million rows of random but constrained dates and amounts across 2 years and 200 accounts. How long would it take someone to find just the right data? Chances are they could never find it quite the way they want it, so they may have to find two or three databases and write some code to glean the data into the format they want.
Why not just write code to make the data meet the requirements to begin with? It's really not that hard. I know, I know... what about some random names? Do they have to be real names, or just variable lengths of characters that we could derive from a GUID? What about addresses? Again, do they have to be real street addresses, or can we generate a random number and concatenate it with parts of a GUID? Sure, there will be times when the addresses actually have to be valid for testing... then maybe one of those databases you posted will come in handy. But for that date problem you cited? It's easier to build the data than to find it in one of the databases you were kind enough to post.
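One way to sketch that kind of generator, assuming Python rather than T-SQL: the function names, column layout, 2009-2010 date window, and amount ceiling below are all illustrative assumptions, not something taken from this thread.

```python
import random
import uuid
from datetime import date, timedelta


def make_rows(n_rows, n_accounts=200, start=date(2009, 1, 1),
              end=date(2011, 1, 1), max_amount=1000.00, seed=None):
    """Yield (account_id, txn_date, amount) tuples with constrained random values.

    The defaults (200 accounts, a two-year window) mirror the scenario in the
    post; the specific dates and the amount ceiling are assumptions.
    """
    rng = random.Random(seed)          # seeded so a test run can be repeated
    span_days = (end - start).days     # dates fall in [start, end)
    for _ in range(n_rows):
        account_id = rng.randint(1, n_accounts)
        txn_date = start + timedelta(days=rng.randrange(span_days))
        amount = round(rng.uniform(0.01, max_amount), 2)
        yield account_id, txn_date, amount


def fake_name(rng, min_len=3, max_len=12):
    """Return a variable-length 'name' cut from the hex digits of a random GUID."""
    guid = uuid.UUID(int=rng.getrandbits(128), version=4)
    return guid.hex[:rng.randint(min_len, max_len)]
```

Calling `make_rows(1_000_000)` would produce the million-row case lazily; the rows could then be written out with `csv.writer` and loaded with something like BULK INSERT or bcp.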
--Jeff Moden
Change is inevitable... Change for the better is not.
November 2, 2010 at 8:30 am
Data sets with addresses, etc:
http://aws.amazon.com/publicdatasets/
http://www.manifold.net/updates/product_downloads.shtml
http://www.gsd.harvard.edu/gis/manual/realestate/index.htm
http://www.ibm.com/developerworks/xml/library/x-geomap2/index.html
http://www.gearthblog.com/blog/archives/2006/06/huge_database_u.html
Free data set generator, including names, addresses, phone numbers, emails, etc. Not real-world data, but probably pretty close and much easier than creating your own script...
Demo: http://www.generatedata.com/#generator
Download: http://www.generatedata.com/#download
November 2, 2010 at 12:27 pm
jpSQLDude (11/2/2010)
Data sets with addresses, etc:
http://aws.amazon.com/publicdatasets/
http://www.manifold.net/updates/product_downloads.shtml
http://www.gsd.harvard.edu/gis/manual/realestate/index.htm
http://www.ibm.com/developerworks/xml/library/x-geomap2/index.html
http://www.gearthblog.com/blog/archives/2006/06/huge_database_u.html
Free data set generator, including names, addresses, phone numbers, emails, etc. Not real-world data, but probably pretty close and much easier than creating your own script...
Demo: http://www.generatedata.com/#generator
Download: http://www.generatedata.com/#download
From the data generator site...
Requirements
MySQL 4+
PHP 4+
Any modern, JS-enabled browser
Have you tried it with T-SQL?
--Jeff Moden