January 26, 2003 at 1:58 pm
The application I'm designing the database for tracks hits for individual users, who can then mine statistics from them. If I have 5000 web site owners each tracking 30,000 hits a month....well, you get the picture. I end up with a VLDB.
I'm wondering if I should store all the hits in a single table or create a separate table for each individual user [to speed up lookup times]?
My concern is that with the amount of records that will need to be looked up for any given user, lookup times may get very long.
Would SQL Server have a problem with a database with 5000 tables?
Is there a way to really tweak out a table's indexes for high speed lookups?
Is there an ideal approach to this situation?
Any insight into this would be very greatly appreciated. TIA
Jeffrey V. Lemire
January 27, 2003 at 6:30 am
quote:
Is there a way to really tweak out a table's indexes for high speed lookups?
Is there an ideal approach to this situation?
Generally, you should try a normalized solution before attempting to denormalize a db design. Possibly, you could create a table housing the raw data...then a separate table keyed on site that stores some calculated fields (total hits, etc...) A well-indexed table with narrow primary keys should provide the speed you require. How is hit data stored? Meaning, how are you planning to key the data...site id and what else?
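A narrow raw table like that might look something like this (a minimal sketch for SQL Server 2000; the names siteHits, siteId, hitTime and pageId are assumptions, not from the original post):

```sql
/* Raw hit data: one narrow row per hit (sketch; all names assumed) */
CREATE TABLE siteHits
(siteId INT NOT NULL,       -- key of the owning web site
 hitTime DATETIME NOT NULL, -- when the hit occurred
 pageId INT NOT NULL        -- which page was hit
)
go
/* Cluster on (siteId, hitTime) so one owner's hits for a period are
   physically contiguous and a per-user, per-month lookup is a cheap range scan */
CREATE CLUSTERED INDEX siteHitsIdx ON siteHits(siteId, hitTime)
go
```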
January 30, 2003 at 11:42 am
Perhaps instead of creating one table per web site owner you could create one table per month and let all the owners search the same table.
Views might help you get over the multiple table problem...
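In SQL Server 2000 that would be a partitioned view: a UNION ALL over the monthly tables, with a CHECK constraint on each member table so the optimizer can skip the months a query doesn't touch. A minimal sketch (all table and column names are assumptions):

```sql
/* Two monthly member tables; the CHECK constraint on the partitioning
   column is what lets the optimizer eliminate untouched months */
CREATE TABLE hits200301
(siteId INT NOT NULL,
 hitMonth CHAR(6) NOT NULL CHECK (hitMonth = '200301'),
 hitTime DATETIME NOT NULL
)
CREATE TABLE hits200302
(siteId INT NOT NULL,
 hitMonth CHAR(6) NOT NULL CHECK (hitMonth = '200302'),
 hitTime DATETIME NOT NULL
)
go
/* One logical table for all the owners to search */
CREATE VIEW allHits
AS
SELECT siteId, hitMonth, hitTime FROM hits200301
UNION ALL
SELECT siteId, hitMonth, hitTime FROM hits200302
go
```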
January 30, 2003 at 2:23 pm
Totally agree: you should build it right, then if performance is an issue look for solutions. Keep the table as narrow as possible. You might consider summarizing the data daily or monthly into a different table unless they truly need detail for the entire period.
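Such a roll-up can be as simple as a nightly INSERT...SELECT into a summary table. A sketch, assuming a raw siteHits detail table keyed on siteId and hitTime (all names are made up):

```sql
/* Daily summary: one row per site per day (sketch; names assumed) */
CREATE TABLE siteHitsDaily
(siteId INT NOT NULL,
 hitDate CHAR(10) NOT NULL, -- 'yyyy-mm-dd'
 hitCount INT NOT NULL
)
go
/* Nightly job: roll yesterday's detail rows up into one row per site */
INSERT INTO siteHitsDaily (siteId, hitDate, hitCount)
SELECT siteId,
       CONVERT(CHAR(10), hitTime, 120), -- style 120 truncates to 'yyyy-mm-dd'
       COUNT(*)
FROM siteHits
WHERE hitTime >= '20030129' AND hitTime < '20030130'
GROUP BY siteId, CONVERT(CHAR(10), hitTime, 120)
go
```

Detail rows older than the retention period can then be deleted or archived.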
Andy
February 6, 2003 at 3:05 am
Yes, do it right, then tune. In fact you can summarize with indexed views and still keep your table 'correct', but watch out for blocking in the indexed view when doing inserts or updates.
If blocking becomes an issue you could (perhaps) be inspired by the example below, which is somewhat theoretical, difficult to program, and was made mostly for illustrative purposes, but nevertheless works perfectly.
Just copy the text below into Query Analyzer and read carefully. (Hope it is not too long; I haven't figured out how to attach SQL scripts.)
/* 2002/11/26/JKJ The script below illustrates the use of SQL Server indexed views,
with no blocking on insertion of new non-posted rows.
Generally, any creation of indexed views/materialized views will introduce heavy blocking,
because updating data included in the precalculated aggregate needs to block. But often
we have a usage pattern in which some data are very static and some data are very
dynamic. If we know the pattern, we can design the aggregate to contain only the data
generally NOT updated, whereas heavily updated data are not included in the aggregate.
This allows us to use a pre-aggregated value for static data, summed with an
'on the fly' generated aggregate for dynamic data. The latter is simply to avoid blocking on
the update of the pre-aggregate itself.
In the example below this is implemented using an isPosted column (which is kind of speculative).
Net result: this simple implementation will NOT block inserts of non-posted transactions,
but will block when transactions are posted, because then the aggregated sum for
posted transactions needs to be updated.
So tons of inserts of non-posted transactions can be handled with no blocking, but when
transactions are eventually posted, they will block (which is kind of by nature). So
until the transactions are reflected in the pre-aggregated sum they have high concurrency.
If the posting is delayed until the very end of all posting transactions, this is still
a high-concurrency solution.
*/
/* Clean up */
if exists
(SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'ledgerAccountBalanceView'
AND table_type = 'VIEW')
DROP VIEW ledgerAccountBalanceView
go
if exists
(SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'ledgerBalanceTransactions'
AND table_type = 'BASE TABLE')
DROP TABLE ledgerBalanceTransactions
go
/* Create a silly table, which includes both historic (mostly posted) transactions
and a few non-posted transactions (the nature of accounting systems means that,
in time, most transactions are posted)
*/
CREATE TABLE ledgerBalanceTransactions
(ledgerAccount VARCHAR(10),
amount DECIMAL(20,4) NOT NULL,
datePosted DATETIME,
isPosted SMALLINT DEFAULT 0
)
go
/* Make a lot of posted transactions */
DECLARE @i INT
DECLARE @j INT
SET @i = 1000
SET NOCOUNT ON
WHILE (@i > 0)
BEGIN
SET @j = 100
WHILE (@j > 0)
BEGIN
INSERT INTO ledgerBalanceTransactions
(ledgerAccount, amount, datePosted, isPosted)
VALUES
(CAST( @i%10 AS VARCHAR(10)), @i % 10, {ts '2001-01-01 00:00:00.0'}, 1)
SET @j = @j - 1
END
SET @i = @i - 1
END
SET NOCOUNT OFF
go
/* Make a few not posted transactions */
DECLARE @i INT
SET @i = 100
SET NOCOUNT ON
WHILE (@i > 0)
BEGIN
INSERT INTO ledgerBalanceTransactions
(ledgerAccount, amount, datePosted, isPosted)
VALUES
(CAST( @i%10 AS VARCHAR(10)), @i % 10, NULL, 0)
SET @i = @i -1
END
SET NOCOUNT OFF
go
/* Let's index the table */
-- DROP INDEX ledgerBalanceTransactions.ledgerBalanceTransactionsIdx
CREATE CLUSTERED INDEX ledgerBalanceTransactionsIdx ON
ledgerBalanceTransactions(ledgerAccount)
CREATE NONCLUSTERED INDEX ledgerBalanceTransactionsPostIdx ON
ledgerBalanceTransactions(isPosted)
go
/* Let's find the sum of all posted transactions per ledgerAccount */
select sum(amount), ledgerAccount
from ledgerbalancetransactions
where isPosted = 1
group by ledgerAccount
go
Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 535, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 250 ms, elapsed time = 254 ms.
/* Let's find the sum of all non-posted transactions per ledgerAccount */
select sum(amount), ledgerAccount
from ledgerbalancetransactions
where isPosted = 0
group by ledgerAccount
Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 1 ms, elapsed time = 1 ms.
/* Let's find the grand sum of all transactions per ledgerAccount */
select sum(amount), ledgerAccount
from ledgerbalancetransactions
group by ledgerAccount
go
Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 535, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 101 ms, elapsed time = 101 ms.
/* Let's make a view for the SUM(amount) on all posted transactions */
CREATE VIEW ledgerAccountBalanceView
WITH SCHEMABINDING
AS
SELECT ledgerAccount, SUM(amount) AS amountSumPosted, COUNT_BIG(*) AS ledgerCount, isPosted
FROM dbo.ledgerBalanceTransactions
WHERE isPosted = 1
GROUP BY ledgerAccount, isPosted
go
/* Let's materialize the view into a clustered index */
CREATE UNIQUE CLUSTERED INDEX accountBalanceViewIndex ON
ledgerAccountBalanceView( ledgerAccount )
go
/* Select sum(amount) per ledgerAccount from the view for all posted transactions */
SELECT amountSumPosted, LedgerAccount, ledgerCount FROM ledgerAccountBalanceView
go
Table 'ledgerAccountBalanceView'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
/* Run the same base-table query for posted transactions again; note from the
statistics below that the optimizer now answers it from the indexed view */
select sum(amount), ledgerAccount
from ledgerbalancetransactions
where isPosted = 1
group by ledgerAccount
go
Table 'ledgerAccountBalanceView'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
/* Select sum(amount) per ledgerAccount for all non-posted transactions */
SELECT sum(amount) as amountSumNotPosted, ledgerAccount FROM ledgerBalanceTransactions
WHERE isPosted = 0
GROUP BY ledgerAccount
go
Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
/* Let's use the aggregated data to get the sum of posted amounts from the aggregated
view plus the sum of non-posted amounts from the transaction table itself. We need the
derived table openT, because we want to GROUP before the join; otherwise the view sum
would be summed per duplicate ledgerAccount row in transactions */
SELECT SUM(openT.amount+closedT.amountSumPosted), closedT.ledgerAccount
FROM ledgerAccountBalanceView AS closedT
JOIN
(SELECT SUM(amount) AS amount, ledgerAccount, isPosted
FROM ledgerBalanceTransactions
WHERE isPosted = 0
GROUP BY ledgerAccount, isPosted) AS openT
ON closedT.ledgerAccount = openT.ledgerAccount
AND closedT.isPosted = 1 AND openT.isPosted = 0
GROUP BY closedT.ledgerAccount
go
Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0.
Table 'ledgerAccountBalanceView'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
/* As not all accounts will have both a historic and a current transaction, the query
must UNION ledgerAccounts in both view and table + ledgerAccounts only in the view +
ledgerAccounts only in the table */
-- Below is the sum of open and closed transactions in both View and Table
SELECT SUM(openT.amount+closedT.amountSumPosted), closedT.ledgerAccount
FROM ledgerAccountBalanceView AS closedT
JOIN
(SELECT SUM(amount) AS amount, ledgerAccount, isPosted
FROM ledgerBalanceTransactions
WHERE isPosted = 0
GROUP BY ledgerAccount, isPosted) AS openT
ON closedT.ledgerAccount = openT.ledgerAccount
AND closedT.isPosted = 1 AND openT.isPosted = 0
GROUP BY closedT.ledgerAccount
UNION ALL
-- Below is the sum of (closed) transactions only in the View
SELECT SUM( closedT.amountSumPosted ), closedT.ledgerAccount
FROM ledgerAccountBalanceView AS closedT
RIGHT OUTER JOIN ledgerBalanceTransactions AS openT
ON closedT.ledgerAccount = openT.ledgerAccount
AND closedT.isPosted = 1 AND openT.ledgerAccount IS NULL
GROUP BY closedT.ledgerAccount
HAVING closedT.ledgerAccount IS NOT NULL
UNION ALL
-- Below is the sum of (open) transactions only in the Table
SELECT SUM( openT.amount ), openT.ledgerAccount
FROM ledgerBalanceTransactions AS openT
RIGHT OUTER JOIN ledgerAccountBalanceView AS closedT
ON closedT.ledgerAccount = openT.ledgerAccount
AND openT.isPosted = 0 AND closedT.ledgerAccount IS NULL
GROUP BY openT.ledgerAccount
HAVING openT.ledgerAccount IS NOT NULL
go
Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0.
Table 'ledgerAccountBalanceView'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 2 ms.
/* To compare with the standard query */
SELECT SUM(amount), ledgerAccount, count(*)
FROM ledgerBalanceTransactions
GROUP BY ledgerAccount
go
Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 536, physical reads 0, read-ahead reads 0.
SQL Server Execution Times: CPU time = 93 ms, elapsed time = 93 ms.
/* Notice that this query does NOT block */
BEGIN TRAN
DECLARE @i INT
SET @i = 10
SET NOCOUNT ON
WHILE (@i > 0)
BEGIN
INSERT INTO ledgerBalanceTransactions
(ledgerAccount, amount, datePosted, isPosted)
VALUES
(CAST( @i%10 AS VARCHAR(10)), @i % 10, NULL, 0)
SET @i = @i -1
END
SET NOCOUNT OFF
go
COMMIT TRAN
/* Whereas this query DOES block */
BEGIN TRAN
DECLARE @i INT
SET @i = 10
SET NOCOUNT ON
WHILE (@i > 0)
BEGIN
INSERT INTO ledgerBalanceTransactions
(ledgerAccount, amount, datePosted, isPosted)
VALUES
(CAST( @i%10 AS VARCHAR(10)), @i % 10, NULL, 1)
SET @i = @i -1
END
SET NOCOUNT OFF
go
COMMIT TRAN
regards
jensk