From Frontier Models to SLMs: The Shift Toward Predictable AI

  • Writer: Prathamesh Kulkarni
  • Jan 11
  • 3 min read

From what I'm seeing, people have finally started to realize that the hype around LLM applications was overblown, and they are feeling the real downsides of jumping on the hype train. It mostly comes down to ROI. Maintaining and running these systems in production is costly, and most of the time we have to shrug our shoulders and say, "It is what it is" when something stops working. Maybe the LLM just doesn't comply, the system behaves weirdly for reasons you can't pin down, or your pipeline simply won't run the way it did a few months ago.


Mostly, this is because we have been building systems by directly consuming frontier models and constructing solutions around them. I won't go so far as to call them "wrappers"; real solutions are being built. It is just that everything revolves around a specific model and the way it works at this exact moment. If the model starts behaving differently, we are essentially at the mercy of the AI overlords.


Now that the hype is dying out, people are asking actual engineering questions. This is leading to what I call "falling back to ML pipelines." It is a loose connection, but it conveys the point: just as there are robust traditional ML pipelines, with observability, monitoring, model retraining, and data pipelines, the same logic is starting to apply to LLMs.
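To make "falling back to ML pipelines" concrete, here is a minimal sketch of one such practice applied to an LLM system: gate every deploy on a fixed golden test set, so a regression fails the pipeline instead of surfacing in production. All names and examples here are illustrative, not a specific tool's API.

```python
# Gate a model release on a fixed golden test set, the way a traditional
# ML pipeline would. The golden set, threshold, and prompts are made-up
# examples for illustration.

GOLDEN_SET = [
    {"prompt": "Is 'refund my order' a support request?", "expected": "yes"},
    {"prompt": "Is 'great weather today' a support request?", "expected": "no"},
]

ACCURACY_THRESHOLD = 0.9


def evaluate(model_fn, golden_set):
    """Score a model callable (prompt -> answer) against the golden set."""
    hits = sum(
        1 for case in golden_set
        if model_fn(case["prompt"]).strip().lower() == case["expected"]
    )
    return hits / len(golden_set)


def gate_release(model_fn, golden_set, threshold=ACCURACY_THRESHOLD):
    """Return True only if the model clears the bar; otherwise block the deploy."""
    return evaluate(model_fn, golden_set) >= threshold
```

The same hook works whether `model_fn` wraps a frontier API or a self-hosted SLM; the point is that the check runs on every change, not when a user complains.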


When we want tight control over what our model should or should not answer, and we want to train it on specific data to guide accuracy, we turn to SLMs (Small Language Models). You can train them on your own data, they are small, and you have significantly more control over the output. You can keep optimizing until you are satisfied; it is no longer "in the hands of the universe." Because the model is small, you don't need massive compute for a simple task: deployment stays in your control, it is cheaper to run, and you can continuously improve it on newer data.
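The "continuously improve it on newer data" loop is the key difference from an API you can't touch. As a toy stand-in, here is a tiny classifier you own end to end and retrain whenever new labeled data arrives; in practice this slot would be filled by a small transformer fine-tuned with a library such as Hugging Face `transformers`, but the perceptron keeps the control loop visible in a few lines.

```python
# A toy stand-in for an SLM you own: a bag-of-words perceptron that is
# fully inspectable and cheap to retrain on new data. This illustrates
# the ownership loop, not a production model.

from collections import defaultdict


class TinyClassifier:
    """Bag-of-words perceptron over whitespace tokens."""

    def __init__(self):
        self.weights = defaultdict(float)

    def predict(self, text):
        score = sum(self.weights[w] for w in text.lower().split())
        return "positive" if score >= 0 else "negative"

    def train(self, examples, epochs=5):
        """Perceptron updates; call again whenever new labeled data lands."""
        for _ in range(epochs):
            for text, label in examples:
                if self.predict(text) != label:
                    delta = 1.0 if label == "positive" else -1.0
                    for w in text.lower().split():
                        self.weights[w] += delta
```

Because the weights live with you, "the model started answering differently" becomes a bug you can diagnose and retrain away, not a provider change you wait out.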


Yes, the initial cost of building an SLM system is higher, but think of it as an investment that gets cheaper over time. Current API-based systems are cheaper to build initially, but the cost and technical debt increase over time. While SLMs still require active maintenance, you can technically automate the entire lifecycle, much like traditional ML pipelines, so the system takes care of itself without you having to manually mess with models every time a provider updates an API.
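The trade-off above is just break-even arithmetic: an API system is cheap up front but charges you every month, while an SLM costs more to build and less to run. A back-of-the-envelope sketch (all dollar figures are made-up assumptions, not benchmarks):

```python
# Compare cumulative spend of an upfront-heavy SLM build against a
# pay-as-you-go API system. Figures passed in are assumptions, not data.

def months_to_break_even(slm_build_cost, slm_monthly_cost, api_monthly_cost):
    """First month at which cumulative SLM spend is no longer higher
    than cumulative API spend; None if the API always stays cheaper."""
    if api_monthly_cost <= slm_monthly_cost:
        return None  # the SLM never catches up
    month = 0
    slm_total, api_total = float(slm_build_cost), 0.0
    while slm_total > api_total:
        month += 1
        slm_total += slm_monthly_cost
        api_total += api_monthly_cost
    return month
```

For example, a $30k build with $1k/month of hosting breaks even against a $4k/month API bill in 10 months; after that, every month is savings.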


For AI engineers, it was fine until now to know literally nothing about fine-tuning. But as we fall back on traditional ML-style systems, we have to know the full stack. The expectations for newer engineers are shifting: you need to be a "deep generalist." You should know software engineering, AI/ML, DevOps, cloud, networking, security, customer handling, and pre-sales.


Is that too much to ask? I think so. But the harsh reality, which even I fear sometimes, is that you will be thrown out if you can't show these capabilities. Do I know these fields? Yes, I am deep into them. You can do it as well, but it comes down to: how deep do you want to go? We can talk about that in detail in another blog.


To conclude, you should start looking into SLMs. Understand the drawbacks and the advantages. If you can't build the infrastructure yet or if it's too expensive, at least watch a few videos on fine-tuning and related topics.


© 2026 by Prathamesh Kulkarni.
