
Thoughts on Constrained Intelligence

In my career I've focused mostly on applying what is now called 'traditional machine learning': regression, classification, time series, anomaly detection, and clustering algorithms. You could frame machine learning as applying an algorithmic 'constrained intelligence' to a specific business problem. The challenge has always been to 'unconstrain the intelligence' (e.g. by tuning hyperparameters) and to further specify the business problem (proper target definition, clean data, proper cross-validation schemes). The advent of large language models is starting to flip the equation: from 'unconstraining' intelligence to 'constraining' it instead.

Large language models as unconstrained intelligence

Large language models can be seen as having 'world knowledge'. They are generic models that have been trained on 'everything' (high-quality text data). I like how François Chollet (creator of Keras) puts it:

So a (very) large language model is just a huge repository of many millions of vector programs containing generic world knowledge. A prompt selects one of these vector programs. Prompt engineering can thus be seen as the effort to constrain all that 'intelligence'.

The overlap between machine learning and LLMs is growing. You can use an LLM to determine whether an email is 'spam' or 'not spam' (classification), decide which department should handle an incoming email (multi-class classification), or measure the quality of a CV (ordinal classification or regression). So for a given business problem, is it easier to 'constrain the intelligence' of a large language model, or to 'unconstrain the intelligence' of a machine learning model?
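To make the 'constrain the LLM' route concrete, here is a minimal sketch of spam classification via prompting. It assumes the OpenAI Python client; the model name and label set are arbitrary placeholders:

```python
# Minimal sketch: using an LLM as a binary spam classifier.
# Assumes the OpenAI Python client; model name and labels are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = {"spam", "not spam"}

def classify_email(body: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # make the output as deterministic as possible
        messages=[
            {"role": "system",
             "content": "You are a spam filter. Answer with exactly one of: "
                        "'spam' or 'not spam'. No other words."},
            {"role": "user", "content": body},
        ],
    )
    answer = response.choices[0].message.content.strip().lower()
    # The prompt 'constrains' the model, but nothing enforces it:
    # fall back to a safe default if the model strays from the label set.
    return answer if answer in LABELS else "not spam"

print(classify_email("You WON a FREE cruise!!! Click here to claim."))
```

Note that the constraint lives entirely in the prompt plus a fallback check; there is no hard guarantee the model stays within the two labels.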

The limits of prompt engineering

You would think that making LLMs more stupid (constraining their intelligence) would be a simple matter. It's not. A couple of arguments:

A funny example of an LLM failing to be just a helpful chatbot and starting to give away cars: a Chevrolet dealership's customer-service bot was talked into agreeing to sell a brand-new car for a single dollar.
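Part of the difficulty is that a system prompt is a request, not a guarantee; the only hard constraint is what your own code enforces around the model. A minimal sketch of such an output guardrail follows (the intent labels and the `ask_llm` helper are hypothetical stand-ins):

```python
# Minimal sketch of constraining an LLM from the outside:
# the prompt asks for one of a few intents, and plain code enforces it.
# `ask_llm` is a hypothetical stand-in for any chat-completion call.

ALLOWED_INTENTS = {"answer_question", "schedule_test_drive", "handoff_to_human"}

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # call your LLM provider of choice here

def route_customer_message(message: str) -> str:
    prompt = (
        "Classify the customer message into exactly one intent: "
        f"{', '.join(sorted(ALLOWED_INTENTS))}.\n"
        f"Message: {message}\nIntent:"
    )
    for _ in range(3):  # retry a few times; the model may not comply
        intent = ask_llm(prompt).strip().lower()
        if intent in ALLOWED_INTENTS:
            return intent
    # The model never produced a valid intent: fail safe instead of
    # letting free-form output (e.g. 'sure, the car is yours for $1')
    # reach the customer.
    return "handoff_to_human"
```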

All this goes to show that constraining these 'intelligent', generic large language models is challenging, just as relaxing the constraints on the limited intelligence of traditional machine-learning models is challenging. Can we learn something from both approaches? Is there something in the middle?

Where LLMs meet traditional ML

It's well established that traditional ML algorithms are still the undisputed kings of tabular data when compared to deep-learning-based approaches (source). They are likely to capture most of the signal present in the data (see my other post Is XGBoost all we need?). Many real-world ML problems involve some level of predicting human behaviour and/or randomness, where adding world knowledge won't help much; we won't be seeing LLM-based classifiers and regressors very soon.
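For reference, the traditional route on tabular data is only a few lines. A minimal sketch using scikit-learn's synthetic data helpers and xgboost (the dataset and hyperparameters are arbitrary):

```python
# Minimal sketch of the 'traditional ML' route on tabular data.
# The synthetic dataset and hyperparameters are arbitrary placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# A stand-in for a typical tabular business dataset.
X, y = make_classification(n_samples=5_000, n_features=20,
                           n_informative=8, random_state=42)

model = XGBClassifier(n_estimators=300, max_depth=4,
                      learning_rate=0.1, eval_metric="logloss")

# 'Unconstraining the intelligence' here means tuning these
# hyperparameters against a proper cross-validation scheme.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```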

The breakthrough with LLMs was driven by the scale of the training data. Trained on generic data (many different types of text), LLMs turned out to be able to solve domain-specific problems (questions in prompts). This lesson, that generic models outperform specific models, seems to apply to traditional machine learning as well. In businesses, the same model is often rebuilt separately for each region, product, or customer group, yet a single, larger, generic model often outperforms the more specific ones. The series of experiments in the blog post The Unreasonable Effectiveness of General Models points in the same direction.
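As an illustration of that setup, here is a minimal sketch comparing per-group models against one pooled model that receives the group as an extra feature; the synthetic data and choice of model are arbitrary:

```python
# Minimal sketch: per-region models vs. one generic pooled model.
# The synthetic data and the choice of model are arbitrary placeholders.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n, n_regions = 6_000, 3
region = rng.integers(0, n_regions, size=n)
X = rng.normal(size=(n, 5))
# Same underlying signal everywhere, with a small per-region offset.
y = X[:, 0] * 2 + X[:, 1] + region * 0.5 + rng.normal(scale=1.0, size=n)

# Specific route: one model per region, each seeing only its own slice.
specific_scores = []
for r in range(n_regions):
    mask = region == r
    model = HistGradientBoostingRegressor(random_state=0)
    specific_scores.append(
        cross_val_score(model, X[mask], y[mask], cv=5, scoring="r2").mean()
    )

# Generic route: one pooled model with the region as an extra feature.
X_pooled = np.column_stack([X, region])
generic_score = cross_val_score(
    HistGradientBoostingRegressor(random_state=0),
    X_pooled, y, cv=5, scoring="r2"
).mean()

print(f"per-region R^2 (mean): {np.mean(specific_scores):.3f}")
print(f"pooled R^2:            {generic_score:.3f}")
```

The pooled model sees three times the data and can share whatever structure the regions have in common, which is the intuition behind 'generic beats specific'.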

Conclusions & thoughts

So an LLM is just a specific type of generic model, for text. And perhaps we can't properly constrain LLMs because they are not intelligent at all. Perhaps compression is all there is.