by Dr. Darío Gil, IBM SVP and Director of Research
I remember, about two years ago, talking about the future of AI being foundation models. At the time, maybe that sounded a little abstract and theoretical (this was in the pre-GPT days), and boy, has it happened. As that technology has evolved, we have seen a proliferation of foundation models of every possible size and capability. It can feel, to some degree, as if the task is to continuously keep an eye on them, evaluate them, and figure out which use cases to apply them to. But today, I’m going to challenge all of you to go on a slightly different mission. Before I give you that mission, I want to focus on what is truly going on, and what is essential, in this modern-day AI revolution: the power of data representations, and the power of encoding incredible amounts of information of every possible form inside this new, incredibly capable representation that is the foundation model.
To really understand how profound this is, I would like to go back briefly to the origin of our digital world, an origin that was conceptualized almost 350 years ago by Leibniz. Even then, Leibniz understood that you could take the information available around us, whether in the form of language or mathematics or anything else, and encode it in a binary representation to create everything. “One thing is sufficient,” he said. He already knew the value and the power of representing information differently. In fact, the last several decades have seen a tremendous amount of value creation and business transformation driven by the evolution of data representations.
As an example, we could encode data in a relational database. When relational databases were invented, they allowed a different way to organize and connect data than we had before. They had a very profound impact, not just on technology providers like IBM, Oracle, and many others, but on enterprises: all of a sudden, we could do payroll, transaction processing, and so many other core processes differently. You could instead encode the data as a graph, with nodes and edges that you can traverse, and that representation turns out to be very important if you are in the business of internet search or social media, where the graph connects people and groups. For the more technically inclined among you, you could take temporal data, say the signal from an EKG, and move it through a Fourier transform into a frequency representation, as in the sketch below. All of a sudden, you can do signal processing.
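To make that last example concrete, here is a minimal Python sketch, using a synthetic signal rather than real EKG data, of moving a temporal signal into a frequency representation with a Fourier transform. The sampling rate and frequencies are illustrative assumptions.

```python
import numpy as np

# A minimal sketch: take a synthetic "EKG-like" temporal signal and move it
# into a frequency representation with a Fourier transform. The 1.2 Hz
# component stands in for a ~72 beats-per-minute rhythm; values are illustrative.
sample_rate = 250                      # samples per second
t = np.arange(0, 10, 1 / sample_rate)  # 10 seconds of signal
signal = (np.sin(2 * np.pi * 1.2 * t)          # dominant rhythm
          + 0.3 * np.sin(2 * np.pi * 50 * t)   # power-line interference
          + 0.1 * np.random.randn(t.size))     # measurement noise

# Frequency representation: same information, a different encoding that makes
# signal-processing tasks (filtering, peak detection) straightforward.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)
dominant = freqs[np.argmax(np.abs(spectrum[1:])) + 1]  # skip the DC bin
print(f"Dominant frequency: {dominant:.2f} Hz")
```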
What is going on right now is the ability of foundation models to take training data and represent it inside the model itself. When we create a model, we take the training data and break it down into smaller chunks that we call tokens. A token can be a word or a fragment of a word, and this process creates trillions of tokens. We convert each token into a vector, a collection of numbers, to represent the tokens in a form a neural network can understand. Here, we’re talking about words, but we could do the same thing for code, images, or any other kind of data. After the tokens are converted to vectors, they pass through the layers of the neural network, where we apply a series of mathematical operations, mostly matrix multiplications and a few other simple operations, done at massive scale. As we progress through the network, we combine and recombine information across the sequence of tokens. We can even combine information from different modalities in the same model. During training, we adjust the network parameters so that it gets better and better at representing the sequences of tokens. As it goes through this process, it learns more and more of the structure of the data, its nuances, and the knowledge contained in it.
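As a rough illustration of that pipeline, here is a toy Python sketch: a naive whitespace tokenizer, a random embedding table, and two random layers stand in for the learned subword tokenizer and transformer layers of a real model. Nothing here is trained; it only shows the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration: tokens -> vectors -> layers of matrix multiplications.
text = "enterprise data becomes vectors"
tokens = text.split()                       # stand-in for subword tokenization
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

d_model = 8
embedding_table = rng.normal(size=(len(vocab), d_model))
vectors = embedding_table[[vocab[tok] for tok in tokens]]   # one vector per token

# Two "layers": each is mostly a matrix multiplication plus a simple nonlinearity.
w1 = rng.normal(size=(d_model, d_model))
w2 = rng.normal(size=(d_model, d_model))
hidden = np.maximum(vectors @ w1, 0)        # ReLU after the first layer
output = hidden @ w2                        # representation of the token sequence

print(output.shape)                         # (4 tokens, 8 dimensions)
```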
It’s not really magic; it’s just math, human ingenuity, and a lot of computing power. The power of this new representation within foundation models comes from its scale (the sheer amount of data we can bring in), from its connectivity, and from its multimodality. The connectivity is very important: we are taking widely disparate kinds of data, and once that data is expressed inside the structure of the neural network, we are establishing semantic connections across it.
I will make an observation now and remark on a contrast. Over the last couple of years we have witnessed that we can take essentially all the public data available in the world and put it inside a foundation model; for the sake of argument, let’s say 100% of that kind of data can make its way in. Let me contrast that with the percentage of enterprise data that is inside foundation models today. I would say it is tiny, not even 1%. That is an interesting contrast: all the public data has made its way there, but almost none of the enterprise data has. So, I want to give you all a mission, a different way to look at AI. This mission is about going together on a journey to represent enterprise data with foundation models to unlock its value. How many of you would say that your data is one of your most valuable assets inside your business? I bet all of you. So, the task is not about evaluating models; the task is to figure out how to progressively, safely, securely, and cost-effectively bring more and more of your enterprise data inside this new representation to create a massive amount of value.
You’ll recall that last year I was talking about not being an AI user but being an AI value creator. This is the story of being an AI value creator with data. So, the question is how—how should you do it? We’re going to walk through three steps to go from 0% to a large percentage of your enterprise data represented.
The first step is to start from a trusted base model because we’re going to add the data to it, so we need to know what is in it and how it works. Once we have a base, we need a process to represent and encode enterprise data in a systematic way, and we’re going to show you how to do that. After that, we have to deploy, scale, and create value with your AI.
Why should you choose a trusted base model? Let me give you an analogy. Imagine that I give you a vessel. The vessel matters because it is where we are going to add the data. But in this analogy, the vessel starts out opaque, and it already has some liquid inside: somebody else’s data, some public data. Now we are going to add our own data, our own liquid, and it is going to get mixed, and you still don’t know what is inside. Are you going to shake it and drink it? Probably not; that doesn’t feel good. So, in this world, the vessel needs to be glass. You need to be able to see inside and know whether it holds water or ice or whatever the right ingredients are, so that when you add your own ingredients, you know what happens. You need a base model with transparency, so you know its contents, the data used, and the methodology. Then, when you add and mix in your own data, you can do it safely and securely.
These base models need to have indisputable performance, transparency, and broad commercial rights. Remember, this story is not about model providers; it is about your data. You need the rights so that when you encode your information in the model, you have full freedom of action to do what your business needs. And because you are consuming something built on data and capabilities from the outside world, it should be indemnified, so you can feel safe operating that way.
This step of starting from a trusted base model is why we have built Granite, and it is why we have open-sourced the Granite family of models. This is really important, and we’re very proud of this work. I want to show you what we have created and why it matters so much for the world of enterprises today. We have already released 18 models from the Granite series, with more capabilities to come. There are models for coding, software assistance, time series, language, and geospatial data.
Let me begin with the code models. Code is becoming more and more the lingua franca of business, so the ability to deal with code, to write it, debug it, and enhance the productivity of developers, is tremendously powerful and important. We have released a 3 billion parameter model, an 8 billion parameter model, a 20 billion parameter model, and a 34 billion parameter model. I’ll pick the 8 billion parameter model because it is increasingly becoming the industry’s workhorse, with the right balance of scale, performance, and cost. We trained this model on 116 programming languages and 4.5 trillion tokens, and it does all the things you would expect: generating, translating, fixing, documenting, and explaining code. Interestingly, it is also very good at reasoning.
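For readers who want to try a model of this class, here is a hedged sketch of prompting it through the Hugging Face transformers library. The model identifier below is an assumption (check the IBM Granite collection on Hugging Face for the exact name), and an 8 billion parameter model needs a suitably large GPU.

```python
# A hedged sketch of prompting an 8B-class Granite code model for a
# generation task with Hugging Face transformers. The model ID is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-8b-code-instruct"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a Python function that validates an IBAN checksum."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```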
Comparing our Granite 8 billion parameter model against Google’s Gemini, Meta’s LLaMA, and Mistral, ours is the highest-performing model of this size in the world for writing code. These models are also special in that they are released under an Apache 2.0 license, giving you maximum freedom of action and rights. The team has published a detailed paper on the Granite code models, which includes benchmarks, comparisons, training methodology, and more.
We also created Granite time series models; as a demonstration, we used one to predict energy demand in Spain. The Granite zero-shot prediction comes closer to the actual energy demand than the statistical benchmark, and after fine-tuning with weather correlations, the model performs even better. Our time series models outperform several state-of-the-art models while being dramatically smaller and reducing average error.
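To make the zero-shot comparison concrete, here is an illustrative Python sketch that scores a forecast against a simple seasonal-naive statistical baseline. The demand numbers are synthetic, and `zero_shot_forecast` is a placeholder for whatever pretrained time series model you load, not a real API.

```python
import numpy as np

# Illustrative only: comparing a pretrained forecaster's zero-shot output
# against a simple statistical baseline on hourly energy demand.

def mape(actual, predicted):
    """Mean absolute percentage error, lower is better."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

def seasonal_naive(history, horizon, season=24):
    """Statistical baseline: repeat the last full day of hourly demand."""
    last_season = np.asarray(history[-season:])
    return np.tile(last_season, int(np.ceil(horizon / season)))[:horizon]

# Two weeks of synthetic hourly demand (MW) and the next day's actuals.
history = 20_000 + 5_000 * np.sin(np.arange(24 * 14) * 2 * np.pi / 24)
actual_next_day = 20_000 + 5_200 * np.sin(np.arange(24) * 2 * np.pi / 24)

baseline = seasonal_naive(history, horizon=24)
print("baseline MAPE:", round(mape(actual_next_day, baseline), 2))
# model_forecast = zero_shot_forecast(history, horizon=24)   # placeholder call
# print("model MAPE:", round(mape(actual_next_day, model_forecast), 2))
```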
In collaboration with NASA, we created a geospatial foundation model called Prithvi, which we fine-tuned for higher-resolution climate projections. The model captures regional and local extremes and their impacts, and it scales from global down to regional areas. These geospatial models will soon be available on Hugging Face as part of the IBM Granite family.
So, step one is having the vessel with the right properties—performance, transparency, freedom of action, and indemnification. The next step is how to take enterprise data and add it to the vessel, creating a new representation of the data.
Today, we mostly use large language models and foundation models to interact with data through methods like RAG (retrieval-augmented generation). RAG involves vectorizing documents and retrieving the relevant pieces to ground the model’s answers, but it does not improve the model itself or add long-term value to it. Another method is fine-tuning, where we alter the model’s weights with specific data to create specialized models for particular use cases; this approach leads to a proliferation of models and sacrifices generality.
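As a minimal sketch of the RAG pattern just described, the following Python example embeds a few document chunks, retrieves the best match for a question, and builds a grounded prompt. The embedding model choice is an assumption, and note that the generator’s weights never change.

```python
# Minimal RAG sketch: embed chunks, retrieve the most relevant one, ground the prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Enterprise support contracts are renewed annually in January.",
    "The data retention period for audit logs is seven years.",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

question = "How long do we keep audit logs?"
q_vector = embedder.encode([question], normalize_embeddings=True)[0]

scores = chunk_vectors @ q_vector                 # cosine similarity (normalized vectors)
top_chunk = chunks[int(np.argmax(scores))]

prompt = f"Answer using only this context:\n{top_chunk}\n\nQuestion: {question}"
print(prompt)   # this grounded prompt would then be sent to the language model
```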
Our research team has invented a new methodology called InstructLab, which enables foundation models to learn incrementally, the way humans do. It allows a model’s capabilities to improve progressively and steadily by teaching it skills and knowledge one increment at a time.
InstructLab begins with a taxonomy that represents the model’s foundational skills. For example, if you want your model to write emails, the taxonomy tree contains nodes for compositional skills such as writing in prose, each with a handful of examples. This taxonomy provides the base capability of the model; a sketch of what one node might look like follows.
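Here is an illustrative Python structure for such a taxonomy node. InstructLab itself stores these nodes as YAML files in a taxonomy repository, so treat this as a mirror of the idea rather than the exact on-disk format.

```python
# Illustrative sketch of a taxonomy node with a handful of seed examples.
email_writing_skill = {
    "path": "compositional_skills/writing/email",   # position in the taxonomy tree
    "task_description": "Write clear, professional business emails.",
    "seed_examples": [
        {
            "question": "Write a short email asking a colleague to review a report by Friday.",
            "answer": (
                "Hi Jordan,\n\nCould you review the attached Q3 report and share "
                "your comments by Friday? Thank you!\n\nBest,\nAlex"
            ),
        },
        {
            "question": "Write an email politely declining a meeting invitation.",
            "answer": (
                "Hi Sam,\n\nThank you for the invitation. Unfortunately I have a "
                "conflict at that time; could we find another slot next week?\n\nBest,\nAlex"
            ),
        },
    ],
}
```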
When adding new skills or knowledge, you start from the taxonomy and create new nodes with examples. These are reviewed, approved, and sent to InstructLab, which uses a teacher model to generate a large dataset of synthetic examples. This incremental skill and knowledge addition improves the model without losing generality.
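A hedged sketch of that synthetic-data step is below; `teacher_generate` is a placeholder for a call to the teacher model, not a real InstructLab API, and the numbers are illustrative.

```python
# Sketch of the synthetic-data step: a teacher model takes the approved seed
# examples from a taxonomy node and generates many more in the same style.
from typing import Callable

def expand_seed_examples(
    seed_examples: list[dict],
    teacher_generate: Callable[[str], str],
    per_seed: int = 100,
) -> list[dict]:
    """Generate synthetic question/answer pairs patterned on each seed example."""
    synthetic = []
    for seed in seed_examples:
        instruction = (
            "Write a new question and answer in the same style as this example.\n"
            f"Question: {seed['question']}\nAnswer: {seed['answer']}"
        )
        for _ in range(per_seed):
            synthetic.append({"generated": teacher_generate(instruction)})
    return synthetic

# The synthetic set is then filtered and used to tune the base model, which is
# how a few dozen curated examples can become hundreds of thousands of samples.
```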
As an example, we applied this methodology to our 20 billion parameter code model, which lacked knowledge of COBOL. Using InstructLab, we added foundational knowledge from COBOL books and programming manuals, along with examples of COBOL to Java conversion. This process generated 200,000 synthetic examples, significantly improving the model’s performance in one week compared to months of traditional fine-tuning.
InstructLab enhances value creation, making RAG and fine-tuning more effective. This methodology allows you to incrementally encode more enterprise data, creating a powerful path for continued evolution.
We have productized this capability as a core platform. It starts with Red Hat Enterprise Linux AI, which provides a bootable model runtime together with InstructLab for individual developers. As you scale, Red Hat OpenShift AI optimizes inference and deployment across a cluster, and watsonx adds application integration, governance, and deployment flexibility.
In conclusion, we live in an incredibly exciting time for computing, with rapid advancements in AI, semiconductors, and quantum technologies. The future of AI is undoubtedly open, driven by collaborative innovation. By embracing openness and collective progress, businesses can harness the full potential of their data, driving innovation and creating AI solutions that meet societal needs. This open innovation strategy is not only the right approach for business success but also essential for supporting the diverse needs of our global society.