May 9, 2018 at 10:22 pm
Comments posted to this topic are about the item One Million Tables
May 10, 2018 at 1:28 am
Part of the extra time is taken by poor indexing of SQL Server's metadata. If you do an existence check on a table in the one-million-table database, it will take significantly longer than the same check in the 1,000-table database. I reported this to Microsoft in the past and they came back with the stock answer that it is by design. I was unaware that it also takes longer to insert data in a database with a million tables than in one with a thousand, so thanks for this; very interesting.
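A minimal sketch of the kind of existence check I mean, assuming tables named mytableN as in the article's setup; the table name and timing wrapper are just for illustration:

-- Hypothetical example: time a metadata existence check for a single table.
DECLARE @start datetime2 = SYSDATETIME();

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = N'mytable500000')
    PRINT 'table exists';

SELECT DATEDIFF(MICROSECOND, @start, SYSDATETIME()) AS elapsed_microseconds;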
May 10, 2018 at 5:03 am
Heh... I advise people against trying to display a million rows even in the Grid and it has no meta-data to resolve. I wouldn't expect the Explorer window to deal with a million of anything, which not only displays graphically, but also has to resolve the hierarchical meta-data, as well. 😀
Interesting article, though. Thanks, Steve.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 10, 2018 at 5:47 am
Jeff Moden - Thursday, May 10, 2018 5:03 AM
Heh... I advise people against trying to display a million rows even in the Grid and it has no meta-data to resolve. I wouldn't expect the Explorer window to deal with a million of anything, which not only displays graphically, but also has to resolve the hierarchical meta-data, as well. 😀
Interesting article, though. Thanks, Steve.
Try doing the same when you have tables with billions of records; it's often hard. The biggest threat is running out of client memory and crashing your system, though. I've had cases where machines completely locked up, forcing hard restarts. 😉
I will say that in the MPP world, with Azure SQL Data Warehouse, the chance of ending up with a higher table count than you would want is greater, because MPP often forces you to duplicate fact tables on different hashing keys: only one column per table can be used as the hash to distribute data across the hidden 60 distributions. I can see reaching a high number of tables in those cases. Hopefully that will change when they add support for more than one key.
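A hedged sketch of what I mean, in Azure SQL Data Warehouse syntax; the table and column names are made up, but the point is that each table can be hash-distributed on only one column:

-- Same fact data duplicated under two different distribution keys,
-- because DISTRIBUTION = HASH accepts only a single column per table.
CREATE TABLE dbo.FactSales_ByCustomer
(
    CustomerKey int NOT NULL,
    ProductKey  int NOT NULL,
    SalesAmount money NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);

CREATE TABLE dbo.FactSales_ByProduct
(
    CustomerKey int NOT NULL,
    ProductKey  int NOT NULL,
    SalesAmount money NOT NULL
)
WITH (DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX);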
May 10, 2018 at 6:38 am
So I made a variant using T-SQL running in Management Studio for the insert and only tracked the milliseconds taken by the insert itself. I got bored waiting for the table creation, so my one-million database has about 340,000 tables.
-- Insert one row into a randomly chosen table per loop and accumulate
-- only the time spent on the insert itself.
BEGIN TRANSACTION;

DECLARE @loops int = 1000;
DECLARE @i int = 1;
DECLARE @start datetime;
DECLARE @end datetime;
DECLARE @taken int = 0;        -- accumulated insert time in milliseconds
DECLARE @number int;
DECLARE @query nvarchar(200);

WHILE (@i < @loops)
BEGIN
    -- Pick a random table number between 1 and 1000.
    SET @number = ABS(CHECKSUM(NEWID())) % 1000 + 1;

    SET @query = N'insert mytable' + CAST(@number AS nvarchar(10))
               + N' select ' + CAST(@number AS nvarchar(10))
               + N', replicate(CHAR(65), 1000)';

    -- Time only the insert, not the query construction.
    SET @start = GETDATE();
    EXECUTE sp_executesql @query;
    SET @end = GETDATE();

    SET @taken = @taken + DATEDIFF(ms, @start, @end);
    SET @i = @i + 1;
END;

SELECT @taken AS total_insert_ms;

ROLLBACK;
Like you, I didn't run this as a true performance benchmark, just to get a vague idea of any differences. I ran each version a few times against each database; these are rough numbers, in milliseconds:
Database   Min   Max   Most common
1KDB        20   600   25-45
340KDB     650   900   750-800
I'm thinking a lot of the inserts in the 1KDB took under 1 ms, which skews the result a lot.
EDIT:
A quick switch to datetime2 and nanosecond checks had the 340K database returning roughly 1,000,000,000 ns and the 1K database returning anything between 59,999,900 and 653,333,300 ns.
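The datetime2 variant only changes the timing pieces; here's a rough sketch of a single timed insert, with the table name and row values assumed to match the script above:

-- Hedged sketch of the datetime2 / nanosecond measurement mentioned in the edit.
DECLARE @start datetime2 = SYSDATETIME();
DECLARE @query nvarchar(200) = N'insert mytable1 select 1, replicate(CHAR(65), 1000)';

EXECUTE sp_executesql @query;

-- datetime2 has 100 ns precision, so results come back in multiples of 100.
SELECT DATEDIFF(NANOSECOND, @start, SYSDATETIME()) AS elapsed_ns;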
May 10, 2018 at 7:10 am
But can SQL Server support a single table spread across 1 million partitions and files?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
May 10, 2018 at 7:41 am
You asked about PowerShell: "Note: If I'm testing poorly with PoSh, let me know. I'd be curious why there would be a difference if PoSh is slowed with Get-Random somehow. "
I think your method for measuring elapsed time and the Write-Host cmdlet are having a much larger impact on your performance than Get-Random. I ran four tests, which I'll attach, but first the summaries.
Run 1 - Removed SQL insert, wrapped original test in Measure-Command statement. TotalSeconds : 4.4825437
Run 2 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement. TotalSeconds : 3.6304238
Run 3 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement, commented out write-host. TotalSeconds : 0.392731
Run 4 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement, commented out write-host, increased loops to 1 million. TotalSeconds : 38.7010968
May 10, 2018 at 10:44 am
I'm no expert in the SQL PoSH module, but does the Invoke-Sqlcmd cmdlet open and then close a connection to the database on each call?
Is that how you wish to test the inserts?
How about creating an SqlConnection object, opening a connection, and only closing it after all the inserts are completed?
No time to do this myself right now :(.
May 10, 2018 at 10:45 am
I'm guessing query plan caching will have had some impact. In the case where you run 10,000 iterations across the one thousand tables, the SQL you generate on each iteration will be found in the plan cache, because only a maximum of 1,000 different queries is possible.
In the case where you have one million tables, each of the 10,000 iterations will likely be generating a sql query that has never been executed before and is not in the plan cache.
It would be interesting to see the results if you leave the max random value at 1000 for the one million table test...
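One way to sanity-check that theory (a sketch of my own, not from the article) would be to count how many of the generated insert statements end up in the plan cache after a run:

-- Hedged sketch: count cached plans for the generated single-table inserts.
SELECT COUNT(*) AS cached_insert_plans
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'insert mytable%';

With 1,000 tables that number should plateau around 1,000; with a million tables nearly every iteration should add a new single-use plan.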
May 10, 2018 at 2:19 pm
That 2 billion covers all objects including SPs, views, tables, functions, etc.
sys.objects has a foreign key on schema_id to sys.schemas, so the 2 billion limit covers all schemas combined, not 2 billion per schema.
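If anyone wants to see how their own database is spread across schemas, a quick sketch:

-- Hedged sketch: object counts per schema; the ~2 billion object limit
-- applies to the database as a whole, not per schema.
SELECT s.name AS schema_name, COUNT(*) AS object_count
FROM sys.objects AS o
JOIN sys.schemas AS s ON s.schema_id = o.schema_id
GROUP BY s.name
ORDER BY object_count DESC;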
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
May 10, 2018 at 3:29 pm
This was removed by the editor as SPAM
May 10, 2018 at 4:18 pm
qbrt - Thursday, May 10, 2018 10:44 AM
I'm no expert in the SQL PoSH module, but does the Invoke-Sqlcmd cmdlet open and then close a connection to the database on each call?
Is that how you wish to test the inserts?
How about creating an SqlConnection object, opening a connection, and only closing it after all the inserts are completed?
No time to do this myself right now :(.
This might do that, but part of what I was interested in was the load from multiple clients. I'm not trying to be efficient with the inserts here, but to simulate different clients and see if the number of tables makes a difference.
May 10, 2018 at 4:19 pm
christianb - Thursday, May 10, 2018 3:29 PM
You have too much time on your hands 🙂
Perhaps, but this took me almost 2 months to fit in around other stuff. It was only slightly interesting.
May 10, 2018 at 4:20 pm
keith.gilbert-1064677 - Thursday, May 10, 2018 1:28 AM
Part of the extra time is taken by poor indexing of SQL Server's metadata. If you do an existence check on a table in the one-million-table database, it will take significantly longer than the same check in the 1,000-table database. I reported this to Microsoft in the past and they came back with the stock answer that it is by design. I was unaware that it also takes longer to insert data in a database with a million tables than in one with a thousand, so thanks for this; very interesting.
That's interesting. I hadn't considered indexing the metadata.
May 10, 2018 at 4:21 pm
andrew 13724 - Thursday, May 10, 2018 6:38 AM
I'm thinking a lot of the inserts in the 1KDB took under 1 ms, which skews the result a lot.
EDIT:
A quick switch to datetime2 and nanosecond checks had the 340K database returning roughly 1,000,000,000 ns and the 1K database returning anything between 59,999,900 and 653,333,300 ns.
Interesting.