Why you should consider using Mixture Models in AI

AGI is so old-school

Devansh
TechFlows


Single Expert models are so 2019. If you’re looking to build powerful AI, consider using mixture models.

A good illustration comes from the excellent publication “Modelling heterogeneous distributions with an Uncountable Mixture of Asymmetric Laplacians”. The authors generated synthetic data drawn from multiple distributions. Such distributions are very difficult for a single AI model to fit, no matter how good that model is. However, a mixture of different models is able to model this perfectly.

To quote-

“In regression tasks, aleatoric uncertainty is commonly addressed by considering a parametric distribution of the output variable, which is based on strong assumptions such as symmetry, unimodality or by supposing a restricted shape. These assumptions are too limited in scenarios where complex shapes, strong skews or multiple modes are present…. We demonstrate that UMAL produces proper distributions, which allows us to extract richer insights and to sharpen decision-making.”
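To make that concrete, here is a minimal sketch of the building block that both UMAL and CMAL share: the log-density of a single asymmetric Laplacian, written in PyTorch. The function name `ald_log_prob` and the exact parameterization (location `mu`, scale `b`, asymmetry `tau`) are my own choices for illustration, not code from the paper:

```python
import torch

def ald_log_prob(y, mu, b, tau):
    # Log-density of an asymmetric Laplacian with location mu,
    # scale b > 0, and asymmetry tau in (0, 1).
    u = (y - mu) / b
    rho = u * (tau - (u < 0).float())  # the "pinball" / quantile loss
    return torch.log(tau) + torch.log1p(-tau) - torch.log(b) - rho

# Two well-separated components yield a clearly bimodal density,
# something no single symmetric, unimodal distribution can represent.
y = torch.linspace(-6.0, 6.0, steps=200)
half = torch.tensor(0.5)
lp_left = ald_log_prob(y, mu=torch.tensor(-2.0), b=half, tau=half)
lp_right = ald_log_prob(y, mu=torch.tensor(2.0), b=half, tau=half)
density = 0.5 * lp_left.exp() + 0.5 * lp_right.exp()  # equal-weight mixture
```

Summing a handful of these components with different locations and asymmetries immediately produces the skewed and multimodal shapes the quote above is talking about.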

This has utility in very important real-world use cases. For example, Google AI leveraged a mixture of asymmetric Laplacian distributions, combined with LSTMs, to significantly improve flood forecasts for 460 million people.

Here is how the researchers describe their architecture-

Our river forecast model uses two LSTMs applied sequentially: (1) a “hindcast” LSTM ingests historical weather data (dynamic hindcast features) up to the present time (or rather, the issue time of a forecast), and (2) a “forecast” LSTM ingests states from the hindcast LSTM along with forecasted weather data (dynamic forecast features) to make future predictions. One year of historical weather data are input into the hindcast LSTM, and seven days of forecasted weather data are input into the forecast LSTM. Static features include geographical and geophysical characteristics of watersheds that are input into both the hindcast and forecast LSTMs and allow the model to learn different hydrological behaviors and responses in various types of watersheds.

Output from the forecast LSTM is fed into a “head” layer that uses mixture density networks to produce a probabilistic forecast (i.e., predicted parameters of a probability distribution over streamflow). Specifically, the model predicts the parameters of a mixture of heavy-tailed probability density functions, called asymmetric Laplacian distributions, at each forecast time step. The result is a mixture density function, called a Countable Mixture of Asymmetric Laplacians (CMAL) distribution, which represents a probabilistic prediction of the volumetric flow rate in a particular river at a particular time.
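In code, that two-stage LSTM setup might look something like the following PyTorch sketch. The class name, feature dimensions, and hidden size are all my assumptions for illustration; the production Google model is considerably more elaborate:

```python
import torch
import torch.nn as nn

class HindcastForecastModel(nn.Module):
    def __init__(self, hindcast_dim, forecast_dim, static_dim, hidden_size=128):
        super().__init__()
        # The hindcast LSTM reads historical weather; the forecast LSTM is
        # initialized from its final states and reads forecasted weather.
        self.hindcast_lstm = nn.LSTM(hindcast_dim + static_dim,
                                     hidden_size, batch_first=True)
        self.forecast_lstm = nn.LSTM(forecast_dim + static_dim,
                                     hidden_size, batch_first=True)

    def forward(self, hindcast_x, forecast_x, static_x):
        # Static watershed attributes are repeated along the time axis
        # and concatenated onto both dynamic input sequences.
        s_hind = static_x.unsqueeze(1).expand(-1, hindcast_x.size(1), -1)
        s_fore = static_x.unsqueeze(1).expand(-1, forecast_x.size(1), -1)
        _, state = self.hindcast_lstm(torch.cat([hindcast_x, s_hind], dim=-1))
        out, _ = self.forecast_lstm(torch.cat([forecast_x, s_fore], dim=-1), state)
        return out  # one hidden vector per forecast step, fed to the head layer
```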

Here, the Countable Mixture of Asymmetric Laplacians is used to better model the uncertainty in the streamflow predictions. To read more about how Google uses CMAL, check out Google's blog post on AI-based flood forecasting.
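A minimal sketch of such a “head” layer, reusing `ald_log_prob` from the earlier snippet, could look like this. Again, the class name, component count, and parameterization are my assumptions rather than Google's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CMALHead(nn.Module):
    # Maps each forecast-step hidden vector to the parameters of a
    # K-component mixture of asymmetric Laplacians.
    def __init__(self, hidden_size, n_components=3):
        super().__init__()
        # One linear layer emits mu, b, tau, and a weight logit per component.
        self.proj = nn.Linear(hidden_size, 4 * n_components)

    def forward(self, h):
        mu, raw_b, raw_tau, logits = self.proj(h).chunk(4, dim=-1)
        b = F.softplus(raw_b) + 1e-6           # scale must be positive
        tau = torch.sigmoid(raw_tau)           # asymmetry constrained to (0, 1)
        log_w = F.log_softmax(logits, dim=-1)  # log mixture weights
        return mu, b, tau, log_w

def cmal_nll(y, mu, b, tau, log_w):
    # Negative log-likelihood of observed streamflow y under the mixture,
    # computed per component and combined with logsumexp for stability.
    log_p = ald_log_prob(y.unsqueeze(-1), mu, b, tau)
    return -torch.logsumexp(log_w + log_p, dim=-1).mean()
```

Training against a loss like `cmal_nll` pushes the network to spread probability mass across components, so the final forecast is a full distribution over flow rates rather than a single point estimate.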

