MinIO is one of the world's most popular open source S3-compatible object storage systems. Due to its combination of performance and simplicity, it has been widely used to store data for a variety of applications. But with the emergence of generative artificial intelligence (GenAI), MinIO recognized the opportunity to provide artificial intelligence-centered object storage, and now MinIO has launched AIStor.
MinIO founder and CEO AB Periasamy is known for his reluctance to add features to the object store. "We try very hard not to add new features," he told the outlet in 2017. "We removed a lot of code last year. We've sincerely tried to keep it to a minimum."
This minimalist approach has worked very well for MinIO since it launched the object store in November 2014. Two years ago, the company reported that the project was being downloaded more than 1 million times per day, or about 330 million times per year. At that rate, MinIO has now been downloaded more than 1.5 billion times, making it one of the most popular pieces of open source software in the world.
But that was before ChatGPT came out in November 2022, and GenAI took off like a rocket. Jonathan Symonds, Chief Marketing Officer of MinIO, said that the GenAI revolution has greatly enhanced the company's need for big data.
"In terms of data being stored on MinIO, we have multiple clients that exceed an exabyte, and they are running completely different types of workloads than they were in the past," Symonds said. "So, if you're a national laboratory and all the data is in archives, most of it on tape, you might get exabytes of data. But that's not what we're talking about here. We're talking about exabytes of data for artificial intelligence and machine learning workloads."
Organizations are collecting and storing large amounts of unstructured data on MinIO's object storage for the specific purpose of building and training AI models. The data can be video, log files, and telemetry from cars. It can be log files for network threat detection or media for a streaming service. To serve this emerging storage market, MinIO launched the DataPod reference architecture earlier this year.
AI use cases became so popular and so important to MinIO's business that they forced Periasamy to reevaluate his natural reluctance to add new features and to expose his lean object store to the twin risks of feature creep and product bloat. Rather than continuing to build its (non-open source) enterprise object storage as a horizontal product that excels at a broad range of use cases, MinIO decided to double down on AI and redesign its enterprise product specifically around the emerging need for AI to store and access data.
MinIO's new promptObject API allows users to query unstructured data such as restaurant receipts.
"Enterprise object storage... is a complete data infrastructure stack, but it's still a general product. It's a horizontal product," Periasamy said. "But given our current rate of success in terms of customer base and the new channels being built, more and more people are moving towards artificial intelligence and scale."
Organizations that once felt the pain of managing big data at around 100TB now easily exceed 100PB, and the number of companies approaching the 1EB threshold is increasing every day. This is a major change in the storage market, and it prompted the creation of AIStor, the artificial intelligence-focused version of MinIO's flagship product.
The new AIStor adds AI-specific features to object storage, including promptObject, a new S3-API-compatible call that allows users to “talk” to unstructured data, and a private repository for AI models that serves as a substitute for Hugging Face. AIStor also adds features to support emerging AI data workloads, such as support for the S3 API over RDMA connections, and a new global console to make management easier.
The new promptObject API will enable users to interact directly and efficiently with data using natural language prompts, without the need for extensive development work around data preparation, vector databases, retrieval-augmented generation (RAG) and other GenAI tools and techniques.
For example, say a customer has a picture of a restaurant menu in their object store. Using the promptObject API, developers can ask for the physical address to be extracted from the image of the menu and returned as output. MinIO engineer Dil Radhakrishnan said the API also supports prompt linking, which enables users or applications to interact with multiple objects at the same time. The API currently supports unstructured data such as text, PDFs and images, and will soon support video, he added.
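The interaction described above can be pictured with a toy sketch. This is not the MinIO SDK: the `PromptableStore` class, its `put_object` and `prompt_object` methods, and the `toy_model` function are all hypothetical names invented for illustration, and the "model" is a plain string search over a text menu rather than a real AI model. The sketch only shows the shape of the idea, which is that a prompt is sent against a stored object and an answer comes back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical model interface: takes (object bytes, prompt) and returns text.
ModelFn = Callable[[bytes, str], str]


@dataclass
class PromptableStore:
    """Toy in-memory object store (an assumption, not MinIO's implementation)."""
    model: ModelFn
    objects: Dict[Tuple[str, str], bytes] = field(default_factory=dict)

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # Store the raw bytes under (bucket, key).
        self.objects[(bucket, key)] = data

    def prompt_object(self, bucket: str, key: str, prompt: str) -> str:
        # Look up the object and hand it, with the prompt, to the model.
        data = self.objects[(bucket, key)]
        return self.model(data, prompt)


def toy_model(data: bytes, prompt: str) -> str:
    # Placeholder for a real model: ignores the prompt and returns the text
    # after "Address:" if such a line exists, otherwise an empty string.
    for line in data.decode("utf-8").splitlines():
        if line.startswith("Address:"):
            return line[len("Address:"):].strip()
    return ""


store = PromptableStore(model=toy_model)
store.put_object("menus", "lunch.txt", b"Cafe Example\nAddress: 12 Main St\nSoup 5.00\n")
answer = store.prompt_object("menus", "lunch.txt", "What is the restaurant's address?")
print(answer)
```

In a real deployment the model call would presumably run next to the storage layer, so the application would not need to download the object and build its own pipeline first.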
Periasamy said it's a new way to query unstructured data.
"In the previous generation, when enterprises were dominated by structured data, you would type SQL queries or something similar to SQL. In the modern world, most enterprise data is unstructured. What do you do with this data? ...You basically treat unstructured data like a database."
AIStor also introduces a new GUI console for administrators.
Support for high-speed remote direct memory access (RDMA) over 400Gb and 800Gb Ethernet networks is also important to help resolve network bottlenecks that arise in large-scale storage clusters used to power GPUs.
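To get a rough sense of why link speed matters at this scale, the short calculation below estimates how long it would take to move one petabyte over a single link at 100, 400 and 800 gigabits per second. It is a back-of-the-envelope sketch under stated assumptions: ideal line rate, no protocol overhead, a single link, and a decimal petabyte (10^15 bytes). The function name `transfer_seconds` is made up for illustration, and the 100Gb figure is included only as a comparison point.

```python
PETABYTE = 10**15  # decimal petabyte, in bytes (assumption)


def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Ideal transfer time in seconds at the given line rate (no overhead)."""
    bits = num_bytes * 8
    return bits / (link_gbps * 1e9)


for gbps in (100, 400, 800):
    hours = transfer_seconds(PETABYTE, gbps) / 3600
    print(f"{gbps} Gb/s: {hours:.1f} hours per PB")
```

Under these assumptions, a petabyte takes roughly 22 hours at 100Gb, under 6 hours at 400Gb, and under 3 hours at 800Gb. Real clusters spread traffic across many links and nodes, so these numbers only illustrate the ratio between link speeds.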
"The reason RDMA is important is that now when you bring GPUs to the client, 100Gb is considered slow," Periasamy said. "If you are launching a GPU infrastructure today, you should consider 400Gb as your starting point."
Periasamy said MinIO is working with Nvidia, AMD and Intel to ensure that the RoCE (RDMA over Converged Ethernet) version 2 standard is a solid, industry-neutral interface, which he considers very important for encouraging enterprise adoption.
"We're working closely with Nvidia, AMD and Intel to make this happen in a way that's compatible with all three architectures, and the S3 API is still the S3 API," he said. "The control channel is over HTTP, but when the data is pushed, whether it's from CPU to memory or GPU to memory, we set it to S3. We didn't create a new API specification and kept the S3 API underneath. RDMA is transparent, so you can leverage RDMA without understanding its complexities."
Meanwhile, the new AIHub gives MinIO customers a way to securely store their AI models in their own environment. It is a replacement for Hugging Face, a very popular repository for artificial intelligence models that is, by definition, open to the public.
This is just the beginning of the artificial intelligence capabilities MinIO has planned for its enterprise object storage. The company believes there will be significant growth in enabling customers to store and process AI data, and is eager to build these capabilities into its products to achieve this goal.
"We're doing this because we're evolving enterprise object storage into AIStor to narrow its use cases," Periasamy said. "Instead of winning hundreds of use cases, win one use case, the AI use case, and make it bigger. This use case is big enough that we don't care about anything else."