DURHAM, N.C. — IBM announced that it planned to donate to the Cloud Native Computing Foundation (CNCF) three open source projects aimed at helping AI developers prepare data to train language models and work more easily with AI agents.
Sriram Raghavan, vice president of IBM Research AI, made the announcement Tuesday at the first-ever All Things Open AI conference.
The three projects are:
Data Prep Kit: This tool, released in 2024, speeds up unstructured data preparation for large language model (LLM) application developers by cleaning and enriching data for pre-training, fine-tuning and retrieval-augmented generation (RAG).
Docling: This project simplifies document processing by parsing a variety of formats, such as PDFs, into files that LLMs and other foundation models can easily digest. Docling was open sourced a year ago.
BeeAI: This platform helps developers discover, run and build AI agents (in Python or TypeScript) from any framework, such as CrewAI, LangGraph and AutoGen.
The decision to donate the three projects, Raghavan said, aligns with IBM’s goal of making AI as accessible as possible for everyone.
Also, he said, “IBM as a company, we’ve been committed to open in many ways for a long time, and we’re starting to contribute a lot to this ecosystem.
“We’ve been in the journey of AI, actually, from that time the word ‘AI’ was coined in 1955… We have gone all the way from chess-playing machines to an LLM, because we’ve seen that journey.”
The Speed of Innovation
The three donations build on IBM’s earlier open AI releases. In September 2023, IBM introduced its Granite family of foundation models, and in May 2024, IBM Research and Red Hat introduced InstructLab, an open source framework for fine-tuning and collaborating on open source LLMs.
At All Things Open AI, Raghavan told the keynote audience on Tuesday that he marvels at how quickly AI innovation is moving.
“We have now seen more and more frameworks emerge to start to help you put this together,” he said. “We are early in the journey. But what’s exciting is how quickly the narrative went from, ‘AI is going to be dominated by closed, proprietary model,’ to, in 24 months, usable models, usable frameworks, and now the emergence of the world of agents.”
Raghavan was bullish about the future of open source AI.
“I look forward to seeing what my community will do,” he told the keynote audience. “I want to end with just going back to where we started. It’s very clear to us that a transformative technology like AI isn’t going to have the economic output and impact that we all wanted to have without the coming together, open approaches to collaboration.”