November 30, 2004 at 5:52 pm
Hello,
I m writing a stored procedure to query a table Population that has the following fields - CityId, CityName, Plus0, Plus10, Plus20, Plus30, Plus40, Plus50, Plus60, Plus70, Plus80. The field Plus0 contains the number of people of age > 0 living in the city, Plus10 contains the number of people of age > 10 living in the city and so on. Given the city id and age groups like 20To40, 50To60, 40Plus, etc., I should be able to query the number of people in the city corresponding to the requested age group. Note that if the requested age group was 20To60, I need to make use of only 2 fields Plus20 and Plus60 in the table to compute this value. And if the requested age group was 40Plus, then I need only the value in the field Plus40. The problem is that a wide variety of age groups can be requested like 0Plus, 10Plus, ... , 80Plus, 0To10, 0To20, 0To30, .... 70To80.
Which is the most effecient way to handle this ?
1. Have a stored procedure that returns all the fields even though only 1 or 2 of them would be actually used ?
In this case, if I returned data for a large number of cities then there would be a lot of unnecessary information that was returned by the query. Lots of data would be passed through the network though most of it would not be used.
2. Have a stored procedure that takes in parameters @Plus0, @Plus10, @Plus20, .. @Plus80 that are bits indicating whether the field was required or not and then using a CASE statement to return values for a field only if the corresponding bit parameter was set, and returning NULL if the corresponding bit paramter was not set ?
In this case, I would be returning NULL for all those fields that were not required for a particular age group. This would save some network bandwidth, wouldn't it ?
3. Pass in the age group itself (ex: 0To20) as a parameter to the stored procedure and have lots of IF statements, one for each age group, that return only the fields that are needed for that age group.
This leads to a lot of code repitition.
4. Use a similar approach as above but use dynamic SQL queries to avoid code repitition.
But using dynamic SQL queries can affect the performance of the stored procedure as they need to be compiled each time.
5. Any other possible approaches ??
Looking forward to your responses,
Thanks much,
bmgun.
December 1, 2004 at 4:30 am
I'd take option 3. If the code looks too nasty, you could balance this by calling other stored procs - though then you end up with lots of nearly-identical SPs. Otherwise, good commenting will assist anyone unlucky enough to have to read the code.
Option 1 is nasty, and becomes worse when some bright spark asks if the app can be made mobile or accessed through a dial-up or something.
Option 2 would save some b/w I think, but if all you really want back is a single number, your calling routine is going to have to work out what to do with multiple fields.
Option 4: I dislike dynamic SQL on principle, though in practice, I think many application users wouldn't notice the compilation time compared to the time taken to run the query and push the data back over the network. Only testing and understanding the likely load will help.
HTH
Bill.
December 1, 2004 at 11:04 am
Thanks a lot for your response, Bill. I agree with you that option 3 would be best as far as effeciency of the code is concerned. But we might end up needing two or three tables similar to the Population table, and having two or three kinds of stored procedures for each table would lead to a huge number of stored procedures that might lead to difficulty in maintaining the code. For option 2, I can write a small function that deals with which fields to process for which age group, so that is also an option I'm looking at. Regarding option 1, you made a good point regarding the app being accessed from a dial-up connection (I don't think it'll go mobile in the near future). Are there any other concerns with option 1 other than the speed of transfer of a large amount of data ? If I knew that my clients all had high-speed connections, would you consider it ?
December 2, 2004 at 2:38 am
My objection is really on principal - it's one reason why I reject code with "SELECT *" in it. However, all of this is a compromise - if all your users are well connected, and your data volumes are small (a single number per age range), then let the network take the strain.
A further compromise always is that you're trying to get the best solution for your business. Less complex code is a worthwhile trade for less efficient operation sometimes, because it speeds up the development and eases support.
Good luck,
Bill.
December 2, 2004 at 10:33 am
normalize, you must! solve problems, it will!
A true jedi will not store data in columns in that manner.
Consider Yoda's design:
Table: CityPopulation (primary key of City/AgeGroup)
Columns:
City
AgeGroup
PopulationCount
Query it you will! Choose age group and return results!
select City, PopulationCount
from CityPopulation
where AgeGroup = @AgeGroup
Unnormalized tables lead to the dark side! Fix it you must -- and queries will be much shorter, quicker, more efficient. These are the goals of the Jedi.
A table of Age groups you will also add, for true integrity!
AgeGroups
"0-10"
"11-30"
..etc...
Unlearn how Excel has trained you -- a true jedi will normalize his or her tables! The same data in multiple columns, you will not store! it leads quickly to the dark side.
December 2, 2004 at 12:52 pm
I accept that normalizing the tables would solve many problems. But I have this other table ShopPopulation that contains the number of people of different age groups that are present in a shop every hour. The columns of this table are :
ShopId
CityId
CountDate
Plus0
Plus10
...
Plus70
Plus80
CountDate is a datetime that contains the date and time when a sample was taken for a particular ShopId. The CountDate varies but generally there are about 24 samples taken every day (approx. 1 every hour) for each ShopId. So I already have a large ShopPopulation table if I store data for about 500 shops for a year (say), that is about 24 * 365 * 500 = 4.3 million rows. If I were to normalize this table by adding an age group column, this would multiply the number of rows by about 20 (the number of age groups). Also it would lead to (ShopId, CityId, CountDate) being repeated for each age group, leading to a lot of redundant information. Is there a better way to normalize the ShopPopulation table ?
bmgun.
December 2, 2004 at 2:50 pm
more rows. less columns. more efficient and more scaleable database design. Most importantly, more indexable data! The tools of a true jedi!
Redundant it is not, as it is the nature of the relational database system.
If normalize it you cannot, an old jedi mind trick might work. A view you will create, to allow for a normalized look:
select shopID, CityID, CountDate, 0 as PlusValue, Plus0 as Value
union all
select shopID, CityID, CountDate, 10 as PlusValue, Plus10 as Value
union all
select shopID, CityID, CountDate, 20 as PlusValue, Plus20 as Value
union all
select shopID, CityID, CountDate, 30 as PlusValue, Plus30 as Value
union all
...
select shopID, CityID, CountDate, 80 as PlusValue, Plus80 as Value
Understand this, hm? Not for the weak-minded, indeed.
create it once, you shall. Save as a view, and query from this you may. Add more columns as needed, accomodate all of your requests, you shall.
Should performance suffer greatly, a cross join trick Yoda may show ... but beware of the most powerful and dangerous of jedi techniques!
December 6, 2004 at 1:25 pm
Thanks to both of you for your valuable suggestions. We are going to carry out some tests based on them and see what we get. Shall post it here if we find out some interesting results.
bmgun.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply