Retrieve records with (more than one row of same type)

Question

Post reply

Retrieve records with (more than one row of same type)

smallmoney

SSCarpal Tunnel

Points: 4378
More actions
August 14, 2012 at 3:50 pm

#280263

For example I create some fake data about database files and retrieve only the databases with more than one transaction log file.
I was trying to test my T-SQL skills and see what was the best way of doing this. Below is the first two that came to mind. Any help or advise would be appreciated!
CREATE TABLE #stuff (
[databaseid]TINYINT,
[databasename]VARCHAR(50) NOT NULL,
[filename]VARCHAR(50) NOT NULL UNIQUE,
[filetype]VARCHAR(50) NOT NULL
)
INSERT INTO #stuff VALUES (1, 'games', 'game.mdf', 'data')
INSERT INTO #stuff VALUES (1, 'games', 'game.ldf', 'log')
INSERT INTO #stuff VALUES (2, 'movies', 'movies.mdf', 'data')
INSERT INTO #stuff VALUES (2, 'movies', 'movies_1.ldf', 'log')
INSERT INTO #stuff VALUES (2, 'movies', 'movies_2.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n.mdf', 'data')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log1.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log2.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log3.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log4.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log5.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log6.ldf', 'log')
INSERT INTO #stuff VALUES (4, 'music', 'music.mdf', 'data')
INSERT INTO #stuff VALUES (4, 'music', 'music.ndf', 'data')
INSERT INTO #stuff VALUES (4, 'music', 'music.ldf', 'log')
SELECTs.[databaseid]
,COUNT(*) AS qty
INTO#stuffaggregate
FROM#stuff s
WHEREs.[filetype] = 'log'
GROUP BY s.[databaseid]
HAVING COUNT(*) > 1
ORDER BY s.[databaseid]
SELECTs1.[databaseid]
,s1.[databasename]
,s1.[filename]
,s1.[filetype]
FROM#stuff s1
WHEREs1.[databaseid] IN (SELECT s2.[databaseid] FROM #stuffaggregate s2)
ORDER BY s1.[databasename] ASC
SELECTs1.[databaseid]
,s1.[databasename]
,s1.[filetype]
FROM#stuff s1
WHEREs1.[databaseid] IN (SELECT s2.[databaseid] FROM #stuff s2 WHERE s2.[filetype]= 'log' GROUP BY s2.[databaseid] HAVING COUNT(*) > 1)
TRUNCATE TABLE #stuffaggregate
DROP TABLE #stuffaggregate
TRUNCATE TABLE #stuff
DROP TABLE #stuff

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 1

Here is another query:

CREATE TABLE #stuff (

[databaseid]TINYINT,

[databasename]VARCHAR(50) NOT NULL,

[filename]VARCHAR(50) NOT NULL UNIQUE,

[filetype]VARCHAR(50) NOT NULL

)

INSERT INTO #stuff VALUES (1, 'games', 'game.mdf', 'data')

INSERT INTO #stuff VALUES (1, 'games', 'game.ldf', 'log')

INSERT INTO #stuff VALUES (2, 'movies', 'movies.mdf', 'data')

INSERT INTO #stuff VALUES (2, 'movies', 'movies_1.ldf', 'log')

INSERT INTO #stuff VALUES (2, 'movies', 'movies_2.ldf', 'log')

INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n.mdf', 'data')

INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log.ldf', 'log')

INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log1.ldf', 'log')

INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log2.ldf', 'log')

INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log3.ldf', 'log')

INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log4.ldf', 'log')

INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log5.ldf', 'log')

INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log6.ldf', 'log')

INSERT INTO #stuff VALUES (4, 'music', 'music.mdf', 'data')

INSERT INTO #stuff VALUES (4, 'music', 'music.ndf', 'data')

INSERT INTO #stuff VALUES (4, 'music', 'music.ldf', 'log')

GO

WITH AggData AS (

SELECT

COUNT(*) OVER (PARTITION BY databaseid, filetype) FileCnt,

databaseid,

filetype

FROM

#stuff

)

SELECT

databaseid,

databasename,

filename,

filetype

FROM

#stuff s

WHERE

EXISTS(SELECT 1 FROM AggData ag WHERE ag.databaseid = s.databaseid AND ag.filetype = 'log' AND ag.FileCnt > 1)

;

GO

DROP TABLE #stuff;

GO

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 2

I personally usually just do them in a single build, like so:

SELECT

s.[databaseid]

,s.[databasename]

,s.[filename]

,s.[filetype]

FROM

#stuff AS s

JOIN

(SELECT DatabaseID FROM #stuff GROUP BY DatabaseID HAVING COUNT(*) > 1) AS drv

ONs.DatabaseID = drv.DatabaseID

That said, it'll depend on how large your dataset is and if you have indexes optimized to DatabaseID or whatever you're linking on for quick-counts. A temp table with alternative indexing can help if the data-set's large enough.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

smallmoney SSCarpal Tunnel Points: 4378 More actions · Answer 3

Beautiful! I have no experience with partition. Does it have a better execution plan too? I could find out myself but I'm on an iPhone right now.

smallmoney (8/14/2012)
For example I create some fake data about database files and retrieve only the databases with more than one transaction log file.
I was trying to test my T-SQL skills and see what was the best way of doing this. Below is the first two that came to mind. Any help or advise would be appreciated!
CREATE TABLE #stuff (
[databaseid]TINYINT,
[databasename]VARCHAR(50) NOT NULL,
[filename]VARCHAR(50) NOT NULL UNIQUE,
[filetype]VARCHAR(50) NOT NULL
)
INSERT INTO #stuff VALUES (1, 'games', 'game.mdf', 'data')
INSERT INTO #stuff VALUES (1, 'games', 'game.ldf', 'log')
INSERT INTO #stuff VALUES (2, 'movies', 'movies.mdf', 'data')
INSERT INTO #stuff VALUES (2, 'movies', 'movies_1.ldf', 'log')
INSERT INTO #stuff VALUES (2, 'movies', 'movies_2.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n.mdf', 'data')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log1.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log2.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log3.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log4.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log5.ldf', 'log')
INSERT INTO #stuff VALUES (3, 'pr0n', 'pr0n_log6.ldf', 'log')
INSERT INTO #stuff VALUES (4, 'music', 'music.mdf', 'data')
INSERT INTO #stuff VALUES (4, 'music', 'music.ndf', 'data')
INSERT INTO #stuff VALUES (4, 'music', 'music.ldf', 'log')
SELECTs.[databaseid]
,COUNT(*) AS qty
INTO#stuffaggregate
FROM#stuff s
WHEREs.[filetype] = 'log'
GROUP BY s.[databaseid]
HAVING COUNT(*) > 1
ORDER BY s.[databaseid]
SELECTs1.[databaseid]
,s1.[databasename]
,s1.[filename]
,s1.[filetype]
FROM#stuff s1
WHEREs1.[databaseid] IN (SELECT s2.[databaseid] FROM #stuffaggregate s2)
ORDER BY s1.[databasename] ASC
SELECTs1.[databaseid]
,s1.[databasename]
,s1.[filetype]
FROM#stuff s1
WHEREs1.[databaseid] IN (SELECT s2.[databaseid] FROM #stuff s2 WHERE s2.[filetype]= 'log' GROUP BY s2.[databaseid] HAVING COUNT(*) > 1)
TRUNCATE TABLE #stuffaggregate
DROP TABLE #stuffaggregate
TRUNCATE TABLE #stuff
DROP TABLE #stuff

smallmoney SSCarpal Tunnel Points: 4378 More actions · Answer 4

Ah man I forgot we can replace subqueries with joins which is faster than subqueries

Evil Kraig F (8/14/2012)
I personally usually just do them in a single build, like so:
SELECT
s.[databaseid]
,s.[databasename]
,s.[filename]
,s.[filetype]
FROM
#stuff AS s
JOIN
(SELECT DatabaseID FROM #stuff GROUP BY DatabaseID HAVING COUNT(*) > 1) AS drv
ONs.DatabaseID = drv.DatabaseID
That said, it'll depend on how large your dataset is and if you have indexes optimized to DatabaseID or whatever you're linking on for quick-counts. A temp table with alternative indexing can help if the data-set's large enough.

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 5

Evil Kraig F (8/14/2012)
I personally usually just do them in a single build, like so:
SELECT
s.[databaseid]
,s.[databasename]
,s.[filename]
,s.[filetype]
FROM
#stuff AS s
JOIN
(SELECT DatabaseID FROM #stuff GROUP BY DatabaseID HAVING COUNT(*) > 1) AS drv
ONs.DatabaseID = drv.DatabaseID
That said, it'll depend on how large your dataset is and if you have indexes optimized to DatabaseID or whatever you're linking on for quick-counts. A temp table with alternative indexing can help if the data-set's large enough.

Not quite right. Should only return values where there are more than 1 log files.

smallmoney SSCarpal Tunnel Points: 4378 More actions · Answer 6

you're right. i couldn't test this because i'm on a iphone right now but if he added WHERE filetype = 'log' in his subquery on the join then it should be correct results?

JOIN

(SELECT DatabaseID FROM #stuff WHERE filetype = 'log' GROUP BY DatabaseID HAVING COUNT(*) > 1) AS drv

ONs.DatabaseID = drv.DatabaseID

Lynn Pettis (8/14/2012)
Evil Kraig F (8/14/2012)
I personally usually just do them in a single build, like so:
SELECT
s.[databaseid]
,s.[databasename]
,s.[filename]
,s.[filetype]
FROM
#stuff AS s
JOIN
(SELECT DatabaseID FROM #stuff GROUP BY DatabaseID HAVING COUNT(*) > 1) AS drv
ONs.DatabaseID = drv.DatabaseID
That said, it'll depend on how large your dataset is and if you have indexes optimized to DatabaseID or whatever you're linking on for quick-counts. A temp table with alternative indexing can help if the data-set's large enough.
Not quite right. Should only return values where there are more than 1 log files.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 7

smallmoney (8/14/2012)
you're right. i couldn't test this because i'm on a iphone right now but if he added WHERE filetype = 'log' in his subquery on the join then it should be correct results?

DERP! :blush:

Yeaaaah, just put your where clause in. 🙂

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA