May 9, 2018 at 10:22 pm
Comments posted to this topic are about the item One Million Tables
May 10, 2018 at 1:28 am
Part of the extra time is taken by poor indexing of SQL Server's metadata. If you do an existence check on a table in the one-million-table database, it will take significantly longer than the same check in the 1,000-table database. I reported this to Microsoft in the past and they came back with the stock answer that it is by design. I was unaware that it also takes longer to insert data in a database with a million tables than in one with a thousand, so thanks for this; very interesting.
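A minimal sketch of the kind of existence check I mean, assuming tables named mytableN as in the article's setup; the table name and timing wrapper are just for illustration:

-- Hypothetical example: time a metadata existence check for a single table.
DECLARE @start datetime2 = SYSDATETIME();

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = N'mytable500000')
    PRINT 'table exists';

SELECT DATEDIFF(MICROSECOND, @start, SYSDATETIME()) AS elapsed_microseconds;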
May 10, 2018 at 5:03 am
Heh... I advise people against trying to display a million rows even in the Grid and it has no meta-data to resolve. I wouldn't expect the Explorer window to deal with a million of anything, which not only displays graphically, but also has to resolve the hierarchical meta-data, as well. 😀
Interesting article, though. Thanks, Steve.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 10, 2018 at 5:47 am
Jeff Moden - Thursday, May 10, 2018 5:03 AM
Heh... I advise people against trying to display a million rows even in the Grid and it has no meta-data to resolve. I wouldn't expect the Explorer window to deal with a million of anything, which not only displays graphically, but also has to resolve the hierarchical meta-data, as well. 😀
Interesting article, though. Thanks, Steve.
Try doing the same when you have tables with billions of records; it's often hard. The biggest threat is running out of client memory and crashing your system, though. I've had cases where machines completely locked up, forcing hard restarts. 😉
I will say that in the MPP world, with Azure SQL Data Warehouse, the chance of ending up with a higher table count than you would want is greater, because MPP often forces you to duplicate fact tables on different hashing keys: only one column per table can be used as the hash to distribute data across the hidden 60 distributions. I can see reaching a high number of tables in those cases. Hopefully that will change when they add support for more than one key.
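A hedged sketch of what I mean, in Azure SQL Data Warehouse syntax; the table and column names are made up, but the point is that each table can be hash-distributed on only one column:

-- Same fact data duplicated under two different distribution keys,
-- because DISTRIBUTION = HASH accepts only a single column per table.
CREATE TABLE dbo.FactSales_ByCustomer
(
    CustomerKey int NOT NULL,
    ProductKey  int NOT NULL,
    SalesAmount money NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);

CREATE TABLE dbo.FactSales_ByProduct
(
    CustomerKey int NOT NULL,
    ProductKey  int NOT NULL,
    SalesAmount money NOT NULL
)
WITH (DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX);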
May 10, 2018 at 6:38 am
So I made a variant using T-SQL running in Management Studio for the insert and only tracked the milliseconds taken by the insert itself. I got bored waiting for the table creation, so my one-million database has about 340,000 tables.
-- Insert one row into a randomly chosen table per loop and accumulate
-- only the time spent on the insert itself.
BEGIN TRANSACTION;

DECLARE @loops int = 1000;
DECLARE @i int = 1;
DECLARE @start datetime;
DECLARE @end datetime;
DECLARE @taken int = 0;        -- accumulated insert time in milliseconds
DECLARE @number int;
DECLARE @query nvarchar(200);

WHILE (@i < @loops)
BEGIN
    -- Pick a random table number between 1 and 1000.
    SET @number = ABS(CHECKSUM(NEWID())) % 1000 + 1;

    SET @query = N'insert mytable' + CAST(@number AS nvarchar(10))
               + N' select ' + CAST(@number AS nvarchar(10))
               + N', replicate(CHAR(65), 1000)';

    -- Time only the insert, not the query construction.
    SET @start = GETDATE();
    EXECUTE sp_executesql @query;
    SET @end = GETDATE();

    SET @taken = @taken + DATEDIFF(ms, @start, @end);
    SET @i = @i + 1;
END;

SELECT @taken AS total_insert_ms;

ROLLBACK;
Like you, I didn't run this as a true performance benchmark, just to get a vague idea of any differences. I ran each version a few times against each database; these are rough numbers, in milliseconds:
Database   Min   Max   Most common
1KDB        20   600   25-45
340KDB     650   900   750-800
I'm thinking a lot of the inserts in the 1KDB took under 1 ms, which skews the result a lot.
EDIT:
A quick switch to datetime2 and nanosecond checks had the 340K database returning roughly 1,000,000,000 ns and the 1K database returning anything between 59,999,900 and 653,333,300 ns.
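The datetime2 variant only changes the timing pieces; here's a rough sketch of a single timed insert, with the table name and row values assumed to match the script above:

-- Hedged sketch of the datetime2 / nanosecond measurement mentioned in the edit.
DECLARE @start datetime2 = SYSDATETIME();
DECLARE @query nvarchar(200) = N'insert mytable1 select 1, replicate(CHAR(65), 1000)';

EXECUTE sp_executesql @query;

-- datetime2 has 100 ns precision, so results come back in multiples of 100.
SELECT DATEDIFF(NANOSECOND, @start, SYSDATETIME()) AS elapsed_ns;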
May 10, 2018 at 7:10 am
But can SQL Server support a single table spread across 1 million partitions and files?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
May 10, 2018 at 7:41 am
You asked about PowerShell: "Note: If I'm testing poorly with PoSh, let me know. I'd be curious why there would be a difference if PoSh is slowed with Get-Random somehow. "
I think your method for measuring elapsed time and the Write-Host cmdlet are having a much larger impact on your performance than Get-Random. I ran four tests, which I'll attach, but first the summaries.
Run 1 - Removed SQL insert, wrapped original test in Measure-Command statement. TotalSeconds : 4.4825437
Run 2 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement. TotalSeconds : 3.6304238
Run 3 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement, commented out write-host. TotalSeconds : 0.392731
Run 4 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement, commented out write-host, increased loops to 1 million. TotalSeconds : 38.7010968
May 10, 2018 at 10:44 am
I'm no expert in the SQL PoSH module, but does the Invoke-Sqlcmd cmdlet open and then close a connection to the database on each call?
Is that how you wish to test the inserts?
How about creating an SqlConnection object, opening a connection, and only closing it after all the inserts are completed?
No time to do this myself right now :(.
May 10, 2018 at 10:45 am
I'm guessing query plan caching will have had some impact. In the case where you run 10,000 iterations across the one thousand tables, the SQL you generate on each iteration will be found in the plan cache, because only a maximum of 1,000 different queries is possible.
In the case where you have one million tables, each of the 10,000 iterations will likely be generating a sql query that has never been executed before and is not in the plan cache.
It would be interesting to see the results if you leave the max random value at 1000 for the one million table test...
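One way to sanity-check that theory (a sketch of my own, not from the article) would be to count how many of the generated insert statements end up in the plan cache after a run:

-- Hedged sketch: count cached plans for the generated single-table inserts.
SELECT COUNT(*) AS cached_insert_plans
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'insert mytable%';

With 1,000 tables that number should plateau around 1,000; with a million tables nearly every iteration should add a new single-use plan.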
May 10, 2018 at 2:19 pm
That 2 billion covers all objects including SPs, views, tables, functions, etc.
sys.objects has a foreign key on schema_id to sys.schemas, so the 2 billion limit covers all schemas combined, not 2 billion per schema.
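If anyone wants to see how their own database is spread across schemas, a quick sketch:

-- Hedged sketch: object counts per schema; the ~2 billion object limit
-- applies to the database as a whole, not per schema.
SELECT s.name AS schema_name, COUNT(*) AS object_count
FROM sys.objects AS o
JOIN sys.schemas AS s ON s.schema_id = o.schema_id
GROUP BY s.name
ORDER BY object_count DESC;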
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
May 10, 2018 at 3:29 pm
This was removed by the editor as SPAM
May 10, 2018 at 4:18 pm
qbrt - Thursday, May 10, 2018 10:44 AM
I'm no expert in the SQL PoSH module, but does the Invoke-Sqlcmd cmdlet open and then close a connection to the database on each call?
Is that how you wish to test the inserts?
How about creating an SqlConnection object, opening a connection, and only closing it after all the inserts are completed?
No time to do this myself right now :(.
This might do that, but part of what I was interested in was the load from multiple clients. I'm not trying to be efficient with the inserts here, but to simulate different clients and see if the number of tables makes a difference.
May 10, 2018 at 4:19 pm
christianb - Thursday, May 10, 2018 3:29 PM
You have too much time on your hands 🙂
Perhaps, but this took me almost 2 months to fit in around other stuff. It was only slightly interesting.
May 10, 2018 at 4:20 pm
keith.gilbert-1064677 - Thursday, May 10, 2018 1:28 AM
Part of the extra time is taken by poor indexing of SQL Server's metadata. If you do an existence check on a table in the one-million-table database, it will take significantly longer than the same check in the 1,000-table database. I reported this to Microsoft in the past and they came back with the stock answer that it is by design. I was unaware that it also takes longer to insert data in a database with a million tables than in one with a thousand, so thanks for this; very interesting.
That's interesting. I hadn't considered indexing the metadata.
May 10, 2018 at 4:21 pm
andrew 13724 - Thursday, May 10, 2018 6:38 AM
I'm thinking a lot of the inserts in the 1KDB took under 1 ms, which skews the result a lot.
EDIT:
A quick switch to datetime2 and nanosecond checks had the 340K database returning roughly 1,000,000,000 ns and the 1K database returning anything between 59,999,900 and 653,333,300 ns.
Interesting.