April 22, 2008 at 9:36 am
Hello,
I would like to learn how to use SQL Server 2005 to build a horse racing database.
I am 100% unfamilar on how to automate the importing of the data files I download everyday for every track in N.America.
I was hoping to find others who use SQL Server 2005 with their horse racing databases, or at least any members who might be able to help me out to the best of their ability.
Things I need to learn how to do are:
Be able to unzip the original files and then be able to import the .csv format files.
I need to be able to split up the original files into more user friendly tables, the race data files contain nearly 1500 columns so that is not very user friendly. Some of the information I think would also need to be put into arrays.
I then need to make sure I eliminate any duplicate data that is already in the database.
I'm using Procaps data files and Exotic Results files from TSN http://www.tsnhorse.com. Here is a link to the structure of the ProCaps files http://www.tsnhorse.com/library/procaps.txt. Here is a link to the structure of the Exotic Results files http://www.tsnhorse.com/downloads/xrd.txt. Maybe being able to look at how the files are structured it might help in understanding how I would need to split these files up.
Here is a link to a traditional Past Performance layout http://www.tsnhorse.com/library/f10.pdf. I would best like to try to structure my tables the way they sort of have it in the traditional past performances. You can clearly see how they have it basically broken down into areas and those areas I guess could translate into tables.
I just wanted to provide some links so anyone not familar with this would get an idea of what I am trying to accomplish.
I am looking for the best and easiest way to import this data, manipulate it in a more user friendly fashion and eliminate any duplicate information so when queries are done it won't interfere with the results I am trying to achieve. This database will be used to asses races and information needs to be as accurate as possible.
If there is anybody out there willing to help me out it would be greatly appreciated. I'm not familar at all with building databases out of files like this so I need to crawl before I can run here. Manual entry is out of the question as there are simply too many tracks and too many races each day to even try to accomplish that. I think once I understand the basics, commands and how to build scripts or what not to automate the proccess I will be able to grow by leaps and bounds. To the experienced user my requests may be easy as pie, and that's what I am hoping for.
Also at http://www.tsnhorse.com they do provide sample files as well, just incase anybody simply just wanted to toy around with them a little. For the ProCaps here is the link http://www.tsnhorse.com/cgi-bin/HTML/products.html#pcs just click on the ProCaps data files and it will pop up a zip file to download, this is the procaps sample file. I can't seem to find any samples for the Exotic Results files though. I can though on request provide a few files to sample as well.
I hope this request isn't too far out there, and I hope I might be able to find the help I need to get this database started on the right foot. Any help you guys can provide for me will be greatly appreciated as well.
I'm not asking anyone to build my database, I'm only asking to help point me in the proper direction so I can learn and build a reliable database myself. Granted anyone who is willing to help out beyond the call of duty is always welcome since I'm not an expert and any help is most welcomed.
Once again thanks for anybody who takes the time to read this and chimes in.
April 22, 2008 at 9:47 am
haven't built a horse-racing one, but did build a rugby-league one a few years ago. didnt help to much with the betting though.
SSIS (Integration Studio) will be the tool best suited to unzipping a file and importing it into your system, it will be especially useful if the files are in a standard format as you will be able to create a package to auto-load the data every time it is updated. I will have a look at the data and see if i can come up with some useful tips from it.
read up on database design and normalisation before you start this task as you want to have a good design. Arrays are not used in SQL server at all, you would want to create look-up tables and data tables with primary and foreign keys.
Learn some basic db design for relational databases and you will save yourself headaches in the future.
April 22, 2008 at 3:01 pm
First, I recommending looking into getting SQL Server Integrations Services (SSIS) to import the files into a flat table. Same columns as the files you're importing.
If you're dealing with .zip files, the "command line utility" from WinZip can automate that. Download the files, have SSIS step through them with the "ForEachNext" collection. It can be used to issue the unzip command to each file. Other applications almost certainly have "command line" utilities for unzipping files, WinZip is the one I happen to use.
Once you have it unzipping the files for you, work out a "DataFlow" task that imports each file one-by-one. The ForEachNext structure you built to unzip them can be used to control this. (I'm writing this as if this were a lesson plan. What you do in each step will be helpful to the next step.)
Once you've worked out how to get it into the database in a flat table, you'll need to work out what final structure you want it to go into. I recommend looking up "normalization" with regard to databases. There are good articles on the subject on this site, in Wikipedia, and in almost any book on SQL.
Most likely, you'll want a table of horses, a table of riders, a table of tracks, a table of races, and a "many-to-many" table that hooks all of those together and includes the finish time. I'm sure there's a lot more to it, but I would think that would be the core. I'm not familiar with horse racing, except in the movies, but those would be a good place to start.
Once you can get SSIS to put the data into those tables, you're on to your analysis and reporting actions. By the time you've done all the prior steps, you'll be starting to get an idea of how this can work and how to look up the things that you don't know.
All in all, it shouldn't be too difficult a project. Once you get the basic ideas on how to look up the details in Books Online and MSDN and Google, you'll be fine.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 24, 2008 at 7:11 am
Thanks steveb and GSquared for the suggestions, I will look into them.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply