November 3, 2015 at 11:13 am
Resender (11/3/2015)
ChrisM@Work (11/3/2015)
Resender (11/3/2015)
Total number of machine tables (3/11/2015 13:44 CET): 123,851. Those tables are all the same:
[MachineRegisterId] [int] NOT NULL,
[UtcTime] [datetime] NOT NULL,
[Value] [float] NOT NULL
With primary key on the combination of MachineRegisterId & UtcTime
Your plan is to merge these 123,851 tables into one partitioned table (with two additional columns derived from the table names), with a partition on [UtcTime]?
What is the total rowcount?
Oh, not at all. The idea is to partition all 123,851 tables over 2 partitions so that each table has an archive/active section. My apologies if my explanation was bad.
It doesn't make sense to partition in such a manner because it does nothing for performance or for the ease of moving data to the ARCHIVE. I'm running through the rest of this post but wanted to say that two partitions for this is almost useless. We'll need more information. I'll see if more has been posted on the next page.
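For anyone following along, here is a rough sketch of one of these per-machine tables based on the column list quoted above; the table and constraint names are made up, since each machine gets its own table.

-- Sketch only: one per-machine table as described above; the name Machine_000001 is hypothetical.
CREATE TABLE dbo.Machine_000001
(
    [MachineRegisterId] [int]      NOT NULL,
    [UtcTime]           [datetime] NOT NULL,
    [Value]             [float]    NOT NULL,
    CONSTRAINT PK_Machine_000001
        PRIMARY KEY CLUSTERED ([MachineRegisterId], [UtcTime])
);

And just to illustrate the two-partition archive/active split being proposed (not to endorse it), it boils down to something like this; the boundary date and all names are hypothetical.

-- Sketch only: a two-partition (archive/active) split on [UtcTime].
CREATE PARTITION FUNCTION pf_ActiveArchive (datetime)
AS RANGE RIGHT FOR VALUES ('2015-01-01T00:00:00');  -- rows before this date fall into the "archive" partition

CREATE PARTITION SCHEME ps_ActiveArchive
AS PARTITION pf_ActiveArchive
ALL TO ([PRIMARY]);  -- both partitions land on the same filegroup, so nothing is gained for storage or archiving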
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2015 at 11:14 am
Resender (11/3/2015)
In the meantime there are 123,883 tables (yes, 30 new tables got added); the total record count is 89,372,968,620.
ChrisM@Work (11/3/2015) and how is this lot typically interrogated?
I wish I knew but I'm frequently being told that the reports are beyond my paygrade
Is the problem that the code is proprietary or that they think you're asking for information you shouldn't have? Because I hope they realize that if you can see the tables to alter them, you can probably query off them too.
November 3, 2015 at 11:18 am
ChrisM@Work (11/3/2015)
Resender (11/3/2015)
In the meantime there are 123,883 tables (yes, 30 new tables got added); the total record count is 89,372,968,620.
ChrisM@Work (11/3/2015) and how is this lot typically interrogated?
I wish I knew but I'm frequently being told that the reports are beyond my paygrade
Until you posted up that number, I was thinking that the best way to handle this lot would be a single table, but the largest single table I've worked with was a little over INT rows, about 2,800,000,000 - and it took some effort to get it to play nicely. Here, our largest table is a little under 1,000,000,000 rows and we get results back in milliseconds. 89,372,968,620 is out of my experience / comfort zone but I'd still lean towards one or two tables, one to hold individual devices, the other to hold activity. Let's see what alternatives folks come up with.
Ok... I've reached the end of what's been posted so far and, it may just be coffee deprivation, but considering the performance you mentioned above, I have no clue what your problem is nor what you want to do. Could you summarize a bit, please?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2015 at 11:54 am
Resender (11/3/2015)
ChrisM@Work (11/3/2015)
Resender (11/3/2015)
In the meantime there are 123,883 tables (yes, 30 new tables got added); the total record count is 89,372,968,620.
ChrisM@Work (11/3/2015) and how is this lot typically interrogated?
I wish I knew but I'm frequently being told that the reports are beyond my paygrade
Until you posted up that number, I was thinking that the best way to handle this lot would be a single table, but the largest single table I've worked with was a little over INT rows, about 2,800,000,000 - and it took some effort to get it to play nicely. Here, our largest table is a little under 1,000,000,000 rows and we get results back in milliseconds. 89,372,968,620 is out of my experience / comfort zone but I'd still lean towards one or two tables, one to hold individual devices, the other to hold activity. Let's see what alternatives folks come up with.
Let's up that discomfort a bit: from the figures I have, a minute of unexpected downtime in which the input buffers for the data don't hold would cost us 4,160 euro.
Or to put it better: this is an application that resulted from the merger of several dozen applications instead of being developed from the ground up, and we've reached a point where we can neither maintain it nor get proper feedback from it.
Quick thought: the problem is not insurmountable. Having dealt with far greater sizes, I do know what it entails; you had better get a good consultant to advise you on this. 200-300 euro an hour is nothing in comparison if you can prevent an outage of a few hours. There are too many unknowns for anyone here to make proper suggestions; my guess is that this would be a couple of days' or a week's worth of work, either on site or with full remote access.
November 4, 2015 at 6:36 am
OK I'll try to summarise
The company I work for produces a certain type of machinery used everywhere from factories to common households around the world.
The database is where the data coming from the sensors on these machines is kept.
Once a machine leaves the assembly line it's put in a warehouse, where its internal sensors are hooked up to the system; the data we're talking about are the sensor readings.
New tables are created automatically when a machine is hooked up.
Each machine has its own table, which contains its own internal register id, a timestamp and the value the sensors give off at that timestamp.
Once a machine leaves the warehouse for the customer, its lifecycle is halted: we keep the data for the machine but no new data is added.
But if a machine is broken or depreciated it is sent back to us; it re-enters the warehouse, its monitoring systems are hooked up and we receive new data, so a table can have a gap of several years in its data.
It's up to the engineers to determine, on the basis of this data, whether a machine is fit to be sold, repaired or scrapped.
If a machine is defunct or obsolete the table gets removed, although this happens maybe a dozen times a year (and it hasn't happened since I've been working on it).
The application that is running was put in place somewhere between 2009 and 2011.
Now the reports the engineers use are getting sluggish compared to what they were back then, and there is also an issue with disk space.
The idea would be to do the following: keep all the recent data, which is what the reports usually use, on a "faster to access" disk and the historical data on a more rugged storage device.
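To make that idea a bit more concrete, and assuming the data could be consolidated or at least partition-aligned (which is the open question), the fast/slow split could in principle be expressed as partitions mapped to different filegroups. This is only a sketch; the database name, filegroup names and boundary date below are made up.

-- Sketch only: recent data on a filegroup backed by fast disks, history on slower/larger storage.
-- Database name, filegroup names and the boundary date are hypothetical.
ALTER DATABASE SensorData ADD FILEGROUP FG_HISTORY;   -- slower, larger disks
ALTER DATABASE SensorData ADD FILEGROUP FG_RECENT;    -- faster disks
-- (data files would be added to each filegroup on the appropriate drives here)

CREATE PARTITION FUNCTION pf_RecentHistory (datetime)
AS RANGE RIGHT FOR VALUES ('2015-01-01T00:00:00');

CREATE PARTITION SCHEME ps_RecentHistory
AS PARTITION pf_RecentHistory
TO (FG_HISTORY, FG_RECENT);   -- rows before the boundary go to history, the rest to the fast filegroup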
November 4, 2015 at 7:27 am
Resender (11/4/2015)
OK I'll try to summarise
The company I work for produces a certain type of machinery used everywhere from factories to common households around the world.
The database is where the data coming from the sensors on these machines is kept.
Once a machine leaves the assembly line it's put in a warehouse, where its internal sensors are hooked up to the system; the data we're talking about are the sensor readings.
New tables are created automatically when a machine is hooked up.
Each machine has its own table, which contains its own internal register id, a timestamp and the value the sensors give off at that timestamp.
Once a machine leaves the warehouse for the customer, its lifecycle is halted: we keep the data for the machine but no new data is added.
But if a machine is broken or depreciated it is sent back to us; it re-enters the warehouse, its monitoring systems are hooked up and we receive new data, so a table can have a gap of several years in its data.
It's up to the engineers to determine, on the basis of this data, whether a machine is fit to be sold, repaired or scrapped.
If a machine is defunct or obsolete the table gets removed, although this happens maybe a dozen times a year (and it hasn't happened since I've been working on it).
The application that is running was put in place somewhere between 2009 and 2011.
Now the reports the engineers use are getting sluggish compared to what they were back then, and there is also an issue with disk space.
The idea would be to do the following: keep all the recent data, which is what the reports usually use, on a "faster to access" disk and the historical data on a more rugged storage device.
Echoing what others have said - rent a specialist for this work, not necessarily because it's technically difficult, but because of the high cost of downtime should you trip up. Mitigate the risk.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 4, 2015 at 7:40 am
You need to tell the engineers that "the reports are getting sluggish" isn't really very useful diagnostic information.
If each machine has its own table, then the problem is less likely the actual structure of the database and more likely a report issue (in my opinion). If reports are above your pay grade, then your boss should be aware that this "someone above your pay grade" needs to diagnose and resolve the issue.
The issue could also be a hardware issue. You might not have a page file that is properly sized to your RAM needs. You might have lost some RAM or have a NIC going out. It could be that your database and the indexes are overly fragmented, or the statistics are completely out of date. If they can't give you a test DB to check out your theories, then the risk of troubleshooting is outweighed by your company's uptime needs. Meaning, it's almost impossible to troubleshoot without costing your company some money.
Best scenario, the systems are down for an hour. Worst scenario, the systems are just down. So if I were in your shoes, I would put together a cost analysis of leaving the system "as is" compared to the cost analysis of troubleshooting in production (case for the test environment) compared to the cost analysis of the system going down during troubleshooting compared to the cost analysis of hiring a consultant to solve your issues.
Then let the boss decide. And if the boss decides to let you or one of your teammates troubleshoot in Production, it's his or her call.
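If you do get a test copy to work with, a quick first pass at the fragmentation and statistics questions could look something like the sketch below; run it in the database in question and treat the 1,000-page filter as an arbitrary cut-off.

-- Sketch: check index fragmentation (LIMITED mode keeps the scan cheap).
SELECT  OBJECT_NAME(ips.object_id)          AS table_name,
        i.name                              AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id
WHERE   ips.page_count > 1000               -- ignore tiny tables
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Sketch: see how stale the statistics are.
SELECT  OBJECT_NAME(s.object_id)            AS table_name,
        s.name                              AS stats_name,
        STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM    sys.stats AS s
WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY last_updated;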
November 4, 2015 at 7:51 am
You should change the key:
[UtcTime] [datetime] NOT NULL, -- clustered not unique
--not part of index
[MachineRegisterId] [int] NOT NULL,
[Value] [float] NOT NULL
and add a unique index on UtcTime and MachineRegisterId, so it does what it originally did.
Try it on your biggest table and see how you like it. I'm sure a script could be made to automate changing them. It doesn't address the 'moving data' problem, but I think what they really want is for it to go fast, which this should do.
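For a single table, the change would look roughly like the sketch below; the table, constraint and index names are made up, so adjust them to whatever your script finds, and test on a copy first.

-- Sketch only: recluster one per-machine table on [UtcTime] (clustered, not unique)
-- and re-add a unique index on (UtcTime, MachineRegisterId) to keep the original uniqueness.
ALTER TABLE dbo.Machine_000001
    DROP CONSTRAINT PK_Machine_000001;      -- the old clustered PK on (MachineRegisterId, UtcTime)

CREATE CLUSTERED INDEX CIX_Machine_000001_UtcTime
    ON dbo.Machine_000001 ([UtcTime]);      -- clustered, not unique

CREATE UNIQUE NONCLUSTERED INDEX UQ_Machine_000001_UtcTime_MachineRegisterId
    ON dbo.Machine_000001 ([UtcTime], [MachineRegisterId]);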
November 4, 2015 at 9:32 am
Piling on further: now 23 posts into this thread, there is still nowhere near enough information to even begin to guess, let alone advise on the problem. A few facts, though:
1. Partitioning is not going to help if the data is kept in a table-per-subject structure.
2. Both hardware and software/SQL changes are needed; there is no SQL-only solution here.
3. Querying 8*10^9 rows requires IOPS capacity beyond "fast disks".
4. Alternative technologies, such as Big Data in the Cloud, could provide a better-fitting solution.
November 6, 2015 at 2:53 am
OK people, thank you all for the answers.
Sorry I couldn't be more specific, but yeah, this is all internal workings and I'm already getting by by the skin of my teeth.
The subject can be closed.