Thoughts on AI Data Infrastructure Field Day #1

On October 2nd and 3rd, I took part in an amazing event called AI Data Infrastructure Field Day as a delegate. If you’ve never heard of this event before, it’s a series of tech events from the folks over at Tech Field Day where different tech companies present on cutting-edge technologies. The delegates get to ask the hard questions during the presentations to try to understand the tech and determine whether it really lives up to the marketing. It’s a blast, and the presentations are on some serious next-level gear.

The companies presenting at this event were Google Cloud, HPE, Infinidat, MinIO, Pure Storage, and Solidigm. The scale is just incredible with all of these groups. The petabyte is the new gigabyte, and exabytes were thrown around like I drink glasses of tea. The price of storage is secondary to the price of the GPU, so the goal is to make the most of the GPU by maximizing everything underneath it.

Now, what is ‘AI data infrastructure’ you might ask? I had to ask that question a few times, as my background is largely relational data. OLTP databases require ultra-low latency, high IOPS, and lots of concurrency, but with AI data, the needs are different.

From some companies’ perspectives, it’s delivering the fastest scale-out storage that money can buy, so you can read at scale faster than anyone and checkpoint your AI training state to disk to keep the GPUs busy. The faster you can read, the more data you can feed the GPUs. The faster you can checkpoint, the sooner training can return to the GPU. The more you lean on the GPU, the more efficient the platform is, and the business gains a competitive edge.

So, we need massive scale-out capabilities. We need ultra-high throughput since these data objects can be quite large. We also need insanely low latency so we can checkpoint to disk – and NOW.
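To make that last point concrete, here’s a minimal sketch of my own (not from any of the presentations) of a PyTorch training loop that periodically checkpoints to disk. The model, interval, and path are hypothetical placeholders; the point is simply that every second spent writing the checkpoint is a second the GPU isn’t training, which is exactly the window fast storage is meant to shrink.

```python
# Minimal sketch, not from the presentations: a training loop where checkpoint
# time directly steals time from the GPU. Model, interval, and path are
# hypothetical placeholders.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)      # stand-in for a real training model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

CHECKPOINT_EVERY = 100                        # steps between checkpoints (made up)
CHECKPOINT_PATH = "ckpt.pt"                   # assume this lands on the fast storage tier

for step in range(1, 501):
    inputs = torch.randn(32, 1024, device=device)
    loss = model(inputs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % CHECKPOINT_EVERY == 0:
        start = time.perf_counter()
        torch.save({"step": step,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, CHECKPOINT_PATH)
        # Every second spent here is a second the GPU sits idle -- the exact
        # window that faster checkpoint storage shrinks.
        print(f"step {step}: checkpoint took {time.perf_counter() - start:.3f}s")
```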

Most of the presentations fell into this category, but each took a different twist on the offerings.

MinIO makes software that delivers object storage as containers on both public and private clouds: deploy container-based, scale-out, S3-compatible object storage endpoints quickly and easily. They’ve become a standard for object storage, even though the basic offering is open source and free, and they did a great job demonstrating how effectively they provide this level of storage no matter the underlying platform. I love the flexibility their offerings deliver, mostly because it’s a standardized storage protocol (S3) on one platform that can scale as low or as high as required.
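For a sense of what ‘S3-compatible’ buys you in practice, here’s a minimal sketch assuming a MinIO container already running locally on port 9000 with its default dev credentials. The endpoint, credentials, bucket, and file names are all my assumptions for illustration, not anything MinIO showed; the takeaway is that the ordinary boto3 S3 client works unchanged, with only the endpoint URL pointing somewhere new.

```python
# Minimal sketch assuming a local MinIO container on port 9000 with default
# dev credentials; endpoint, keys, bucket, and file names are all assumptions.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",    # hypothetical local MinIO endpoint
    aws_access_key_id="minioadmin",          # default dev credentials (assumption)
    aws_secret_access_key="minioadmin",
)

s3.create_bucket(Bucket="training-data")                        # hypothetical bucket
s3.upload_file("sample.parquet", "training-data", "sample.parquet")
for obj in s3.list_objects_v2(Bucket="training-data").get("Contents", []):
    print(obj["Key"], obj["Size"])
```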

Google Cloud then presented on their object storage offerings. Google shouldn’t be a stranger at this point. I find it interesting that they have numerous distinct offerings based on your needs – Cloud Storage, Parallelstore, Storage Scale, Filestore, and Hyperdisk ML – and it also felt like there was significant overlap between them. They have so many offerings that they provided a decision-tree matrix, and even then it felt a bit complex. The platform would certainly scale and work very well if your AI workload lives on the Google public cloud; native offerings on a given cloud provider should just work. I would just need to study carefully to determine which platform to start with and, as a workload scaled, the best way to move the data to the next tier.
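As a point of reference, the simplest entry point in that lineup is plain Cloud Storage. The sketch below uses the official google-cloud-storage Python client with a hypothetical bucket and object name of my own choosing, and assumes credentials are already configured in the environment (e.g. application default credentials).

```python
# Minimal sketch using the official google-cloud-storage client; bucket and
# object names are hypothetical, and credentials are assumed to come from the
# environment (application default credentials or a service account).
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-ai-training-data")          # hypothetical bucket name
blob = bucket.blob("datasets/shard-0001.tar")          # hypothetical object path

blob.upload_from_filename("shard-0001.tar")            # push a training shard up
blob.download_to_filename("/tmp/shard-0001.tar")       # pull it back for a training job
```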

Infinidat came next to discuss their on-premises massive storage arrays. I’ve been a big fan of their arrays for years for relational database workloads, as we have many clients on their storage with great success stories. Their presentation was largely built on the premise of ‘you need fast storage for AI, so here’s what we can do.’ Storage companies are always focused on faster, larger, and easier, and they hammered on these points hard. I just didn’t see any features within the array that could contribute to the AI initiative, other than being great at scaling out and being fast.

Solidigm took a different approach that I found refreshing. Solidigm makes NVMe storage rather than arrays or protocols, and they focused on Total Cost of Ownership (TCO) rather than the more operational pitch of faster, bigger, etc. The presentation focused on how their drives provide more efficient storage, which saves costs by reducing factors such as power, rack space, cooling, maintenance, and disposal. It was quite refreshing, and I could have spent the rest of the day just drilling into the actual calculations on the cost drivers and price optimization that businesses can leverage by utilizing their storage.
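I can’t reproduce their numbers here, but the shape of the argument is easy to sketch. The little calculator below uses entirely hypothetical figures (drive capacities, watts, power rates, rack costs), not anything from the talk, just to show how denser, more efficient drives ripple through drive count, rack space, and power/cooling cost.

```python
# Back-of-the-envelope TCO sketch in the spirit of the argument.
# Every number below is a hypothetical placeholder, not a figure from the talk.
import math

def storage_tco(capacity_tb, drive_tb, watts_per_drive, drives_per_ru,
                power_cost_kwh=0.12, cooling_overhead=0.5,
                ru_cost_per_year=300, years=5):
    drives = math.ceil(capacity_tb / drive_tb)         # drives needed for usable capacity
    rack_units = math.ceil(drives / drives_per_ru)     # rack space consumed
    kwh = drives * watts_per_drive / 1000 * 24 * 365 * years
    power_and_cooling = kwh * power_cost_kwh * (1 + cooling_overhead)
    space = rack_units * ru_cost_per_year * years
    return {"drives": drives, "rack_units": rack_units,
            "power_and_cooling": round(power_and_cooling),
            "space": round(space)}

# Hypothetical comparison: 10 PB usable on 15.36 TB drives vs. 61.44 TB drives.
print(storage_tco(10_000, 15.36, 20, 24))
print(storage_tco(10_000, 61.44, 25, 24))
```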

Pure Storage also discussed their scale-out FlashBlade storage platforms. I’ve been a huge fan of Pure since the beginning, have numerous clients happily using their arrays, and have done significant work with their internal teams on the SQL Server side of the house. The presentation felt similar to Infinidat’s in that it focused on ‘we are insanely fast and can scale out.’ Both did a great job presenting their storage, but could have done a better job of getting into specifics on optimizing AI workloads rather than focusing on ‘faster means better AI.’

From a different perspective, rather than simply being a foundation for GPU efficiency, other offerings want to be more than just storage. To me, HPE stood out in this area, and I feel their presentation was the most impressive of the entire event. HPE focused not just on storage but on providing the end-to-end foundation for enabling businesses to get started with AI. Organizations may have storage or compute ready to go, but they still need tooling to feed their existing data into AI platforms: they have to wire in each data source manually, determine the data model, figure out a way to get the data into a central location (and store it), then figure out how best to utilize it. Too many organizations, IMHO, are too busy putting out fires each day to put energy into these layers of the AI story, but these items are required before the data can even be considered for AI modeling.

They have released a platform that they call “HPE Private Cloud AI”. It’s an end-to-end solution that contains the storage, compute, GPUs, AND the software components. The included software allows organizations to integrate and connect to their various data sources, and it bundles the AI-specific packages so you can quickly get up and running. The platform packages up numerous open-source components to provide a ready-to-go endpoint that acts as the cornerstone, allowing businesses to just ‘use’ the platform to start the journey into AI.

I think this is one of the best ‘private cloud’-like stories that I’ve ever seen. Many companies treat private cloud as a virtual platform with administrators on the other end of a support ticket manually performing tasks like deploying and integrating, so the ‘automation’ aspect just means more bodies. Few have what I’d consider good automation in-house, and fewer have good integration of data sources. This platform allows the organization to skip the lengthy journey into automation and integration and just start using it. Save the stress, time, and re-work, and just get going.

Check out the sessions on YouTube here. I’m proud to have been a part of this event, even if I was a bit quieter than I’d have liked. It was a lot of great data to absorb, and I’m still digesting much of the content weeks later. I can’t wait for the next one!
