Balancing priors and learning biases to improve Bayesian Neural Networks

News

July 21-25, 2025: Rio de Janeiro, Brazil

Our paper "Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations" was accepted to the 41st Conference on Uncertainty in Artificial Intelligence (UAI 2025). Meet us in Rio!

December 14, 2024: Vancouver, British Columbia, Canada

Our paper "Hi-fi functional priors by learning activations" was accepted to the NeurIPS workshop on Bayesian Decision-making and Uncertainty. An extended version can be found on [arXiv] and the code at [github].

November 15, 2024: Cracow, Poland

Mateusz Pyla joined our team as a scholarship recipient!

July 22-26, 2024: Vienna, Austria

Meet us at ICML 2024 in Vienna!

April 18-19, 2024: Mons, Belgium

We attended the Marie Skłodowska-Curie Actions (MSCA) conference, organised by the Belgian Presidency of the Council of the European Union with the support of the European Commission.

April 01, 2024: Cracow, Poland

We are excited to announce the start of the project!

About

Title: Balancing priors and learning biases to improve Bayesian Neural Networks
Acronym: PLBNN
Principal Investigator: Tomasz Kuśmierczyk, Ph.D.
Mentor: prof. Jacek Tabor, D.Sc.

Hosting Entity: WMII @ UJ

The project is being executed at the Faculty of Mathematics and Computer Science of the Jagiellonian University in Cracow.

Description for the general public

Deep Neural Networks (DNNs) are currently the most popular approach in most applications of AI. They are structures composed of many complex processing layers (hence "deep"), each of which consists of multiple processing units, so-called neurons (hence "neural"). DNNs are used for making predictions, classifying objects, controlling robots, and many other tasks. They can be viewed as processing machines that, when given an input, produce a relevant output. The input can be a set of features, for example an image of a cat, and the output can be a label, for example "cat". The predictions of standard DNNs are, however, far from perfect, and they can be catastrophically incorrect. Being incorrect is not in itself the major issue; more importantly, DNNs do not know when they are incorrect. This is because DNNs do not manage uncertainty well: they are overly confident about their predictions, for example when they are shown an input unlike anything seen during training.

A more reliable counterpart of DNNs are Bayesian Neural Networks (BNNs). BNNs are DNNs in the sense that they follow the same design pattern: they consist of multiple complex processing layers. However, they learn and make predictions differently and, most importantly, they know when they do not know. This is thanks to the Bayesian inference framework, in which all information (including predictions) inherently carries information about uncertainty, because it is represented by distributions over possible values. This also applies to a BNN's parameters: when training a BNN, we learn their values only up to some level of certainty.
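To make "knowing when you do not know" concrete, here is a minimal sketch using a one-parameter Bayesian linear model rather than an actual BNN. The model, the data, and the names `posterior_over_weight` and `predictive` are illustrative assumptions and are not taken from the project. The sketch shows that the predictive uncertainty grows for inputs far from the training data.

```python
import math

def posterior_over_weight(xs, ys, prior_var=1.0, noise_var=0.1):
    """Closed-form Gaussian posterior for w in y = w * x + noise.

    Assumes a prior w ~ N(0, prior_var) and Gaussian noise with variance noise_var.
    Returns the posterior mean and variance of w.
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision

def predictive(x_new, w_mean, w_var, noise_var=0.1):
    """Predictive mean and standard deviation of y at input x_new."""
    mean = w_mean * x_new
    std = math.sqrt(x_new * x_new * w_var + noise_var)
    return mean, std

# Toy training data, all located close to x = 0 (illustrative values).
xs = [0.1, 0.2, 0.3, 0.4]
ys = [0.21, 0.39, 0.62, 0.79]

w_mean, w_var = posterior_over_weight(xs, ys)
_, std_near = predictive(0.25, w_mean, w_var)  # input similar to the training data
_, std_far = predictive(10.0, w_mean, w_var)   # input far from anything seen
print(f"std near data: {std_near:.3f}, std far from data: {std_far:.3f}")
```

Because the weight is known only up to a distribution, its remaining uncertainty is amplified for distant inputs, which a single point estimate of the weight could not express.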

In Bayesian learning, a model is assumed to encode certain a priori knowledge even before seeing any data. In particular, by priors we mean distributions over a model's (e.g. a BNN's) parameters (and sometimes also over the model structure) before actual training or inference is performed. Then, when data is presented to the model, it updates these distributions into the so-called posteriors. For example, before ever seeing a cat we may know that it has four legs and a tail, but after seeing some pictures a cat detection model can update itself so that it also checks for fur and pointy ears. Nevertheless, if the original beliefs were completely wrong, it would take a long time to adjust them appropriately. Additionally, if they were also too strong, adjusting them may be entirely impossible.
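The effect of a wrong prior can be sketched with the simplest conjugate model, a Beta prior on the success rate of a coin. This is an illustrative assumption, not a model used in the project, and the function name `posterior_mean` is hypothetical. Both priors below wrongly expect a rate of 0.2, while the data suggests 0.8; they differ only in strength.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a Bernoulli rate under a Beta(prior_a, prior_b) prior."""
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b)

# Observed data: 40 successes in 50 trials (empirical rate 0.8).
successes, failures = 40, 10

# Weak but wrong prior: Beta(2, 8), prior mean 0.2, worth 10 pseudo-observations.
weak = posterior_mean(2, 8, successes, failures)

# Strong and wrong prior: Beta(200, 800), prior mean 0.2, worth 1000 pseudo-observations.
strong = posterior_mean(200, 800, successes, failures)

print(f"weak prior -> {weak:.3f}, strong prior -> {strong:.3f}")
```

The weak prior is largely overridden by 50 observations, whereas the strong one barely moves. The limiting case, a prior that assigns zero probability to the true value, cannot be corrected by any amount of data.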

As illustrated above, setting the priors right is an important part of the model building process. However, for complex models such as BNNs it is also a nontrivial task, since we lack intuition about how a particular parameter's value translates to beliefs expressed by model outputs. Furthermore, previous evidence shows that setting priors naively may have significant consequences for the performance of the model. Another problem with BNNs is that finding the posteriors is also challenging. This is due to the large number of parameters these models have, as well as the large size of the data used for training them (that is, for finding posteriors). In practice, posteriors are found only approximately, which often results in suboptimal performance and further complicates the understanding of the priors' impact. Taken together, these limitations cause BNNs to perform far below expectations, and standard DNNs are still used more often in practice.

Our goal is to address some of the above challenges, make BNNs more competitive, and unlock some of their potential. We hypothesize that, to achieve better performance on various tasks, the network structure, learning approach, and priors need to be decided jointly. In particular, priors need to be given more attention, and we plan to investigate how better priors can improve the performance of BNNs. We want to make these priors smarter so that they adapt themselves to a given task. This raises two questions. First, how can we create priors that are smart and flexible enough? Second, how can we learn them with approximate methods so that BNNs are competitive in terms of both training time and effectiveness?

To sum up our objectives in this project, we will start by investigating how learning priors can improve the performance of BNNs and when it is optimal to learn priors. We will begin by identifying methods for selecting and learning priors in Bayesian neural networks that are optimal for various objectives. Furthermore, we will search for optimal architectures of flexible priors and posteriors for the parameters of BNNs. We plan to look into structured, hierarchical, and heterogeneous priors. Finally, from a theoretical point of view, we will study the correspondence between BNNs and other Bayesian models, in particular how BNNs relate to so-called Gaussian Processes.

We will evaluate our approaches on standard benchmarks in several interesting and important settings. We aim to improve on the effectiveness of state-of-the-art approaches to uncertainty quantification.

Funding

This research is part of the project No. 2022/45/P/ST6/02969 co-funded by the National Science Centre and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 945339.

Project

Deep Neural Networks are nowadays a dominant tool used in most applications of AI. They suffer from the inability to handle epistemic uncertainty, the requirement for large processed (e.g., labeled) datasets, and finally, the dependence on implicit inductive biases. The difficulties of classic deep learning approaches can be alleviated by taking the Bayesian perspective and incorporating Bayesian components into the modeling framework, which altogether gives rise to modern Bayesian deep learning. Despite great expectations from BNNs as an approach that bridges two very successful methodologies, namely Deep and Bayesian learning, results so far have been unsatisfactory. The theoretical superiority of this class of models has not yet been successfully confirmed by applications.

Some challenges are related to modeling and model specification. In particular, we lack intuition about priors, and it is unclear how they should be specified. For example, it is an open question how priors should depend on model structure or even what their role should be (in Bayesian learning, priors can not only carry some kind of upfront information about the data, but for example, shrinkage or clustering priors can also bias learning towards certain outcomes). Another challenging aspect is learning. Existing inference methods are unable to recover posteriors for models with hundreds of parameters, even with multiple simplifications. Additionally, the learning limitations reflect on modeling capabilities. For example, by assuming a Gaussian posterior with a diagonal covariance matrix learned using variational inference, we may also implicitly filter out some potential impacts or benefits of priors.
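The cost of the diagonal-covariance assumption can be illustrated on a toy target. The sketch below assumes the true posterior is a correlated two-dimensional Gaussian, which is a simplification since real BNN posteriors are not Gaussian. It uses the known closed-form result that the diagonal Gaussian minimizing KL(q || p) has variances equal to the reciprocals of the diagonal entries of the target's precision matrix. The function names are hypothetical.

```python
def invert_2x2(matrix):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mean_field_variances(cov):
    """Variances of the diagonal Gaussian q minimizing KL(q || p) for p = N(0, cov).

    Setting the gradient of the KL divergence to zero gives q_var[i] = 1 / precision[i][i].
    """
    precision = invert_2x2(cov)
    return [1.0 / precision[0][0], 1.0 / precision[1][1]]

rho = 0.9  # correlation between the two parameters in the toy posterior
cov = [[1.0, rho], [rho, 1.0]]

mf_var = mean_field_variances(cov)
print(f"true marginal variance: {cov[0][0]:.2f}")
print(f"mean-field variance:    {mf_var[0]:.2f}")
```

In this toy case the approximation drops the correlation entirely and shrinks each variance from 1 to 1 - rho^2, so any structure that a prior induces between parameters would not survive into the approximate posterior.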

Better uncertainty quantification, OOD detection, and model calibration are possible by addressing the above challenges jointly and accounting for priors, model structure, and learning methods at the same time.

Main Hypothesis

Priors, architecture and learning method decided together lead to better performance.

Research Questions

RQ1: How can better priors improve the performance of BNNs?
RQ2: What architectures of priors and posteriors are optimal?
RQ3: How to select priors for BNNs?
RQ4: Should models with advanced priors be handled in a more or less Bayesian way?

Preceding publications

E. de Souza da Silva, T. Kuśmierczyk, M. Hartmann, A. Klami: Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching. Journal of Machine Learning Research (JMLR) 24 (2023) 1-51.

The behavior of many Bayesian models used in machine learning critically depends on the choice of prior distributions, controlled by some hyperparameters typically selected through Bayesian optimization or cross-validation. This requires repeated, costly, posterior inference. We provide an alternative for selecting good priors without carrying out posterior inference, building on the prior predictive distribution that marginalizes the model parameters. We estimate virtual statistics for data generated by the prior predictive distribution and then optimize over the hyperparameters to learn those for which the virtual statistics match the target values provided by the user or estimated from (a subset of) the observed data. (...)

T. Kuśmierczyk, J. Sakaya, A. Klami: Correcting Predictions for Approximate Bayesian Inference. AAAI 2020.

Bayesian models quantify uncertainty and facilitate optimal decision-making in downstream applications. For most models, however, practitioners are forced to use approximate inference techniques that lead to sub-optimal decisions due to incorrect posterior predictive distributions. We present a novel approach that corrects for inaccuracies in posterior inference by altering the decision-making process. We train a separate model to make optimal decisions under the approximate posterior, combining interpretable Bayesian modeling with optimization of direct predictive accuracy in a principled fashion. The solution is generally applicable as a plug-in module for predictive decision-making for arbitrary probabilistic programs, irrespective of the posterior inference strategy. We demonstrate the approach empirically in several problems, confirming its potential.

T. Kuśmierczyk, J. Sakaya, A. Klami: Variational Bayesian Decision-making for Continuous Utilities. NeurIPS 2019.

Bayesian decision theory outlines a rigorous framework for making optimal decisions based on maximizing expected utility over a model posterior. However, practitioners often do not have access to the full posterior and resort to approximate inference strategies. In such cases, taking the eventual decision-making task into account while performing the inference allows for calibrating the posterior approximation to maximize the utility. We present an automatic pipeline that co-opts continuous utilities into variational inference algorithms to account for decision-making. We provide practical strategies for approximating and maximizing the gain, and empirically demonstrate consistent improvement when calibrating approximations for specific utilities.

P. Marszałek, K. Bałazy, J. Tabor, T. Kuśmierczyk: Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA. [arXiv]

Abstract: Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of large language models by decomposing weight updates into low-rank matrices, significantly reducing storage and computational overhead. While effective, standard LoRA lacks mechanisms for uncertainty quantification, leading to overconfident and poorly calibrated models. Bayesian variants of LoRA address this limitation, but at the cost of a significantly increased number of trainable parameters, partially offsetting the original efficiency gains. Additionally, these models are harder to train and may suffer from unstable convergence. In this work, we propose a novel parameter-efficient Bayesian LoRA, demonstrating that effective uncertainty quantification can be achieved in very low-dimensional parameter spaces. The proposed method achieves strong performance with improved calibration and generalization while maintaining computational efficiency. Our empirical findings show that, with the appropriate projection of the weight space: (1) uncertainty can be effectively modeled in a low-dimensional space, and (2) weight covariances exhibit low ranks.

Code for the paper is available at [github].
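As a rough illustration of the low-rank decomposition mentioned in the abstract above, the sketch below compares the number of trainable parameters in a dense weight update with that of a rank-r factorization Delta W = B A, as used by standard LoRA. It is not the paper's Bayesian method, and the layer sizes and function names are illustrative assumptions.

```python
def full_update_params(d_out, d_in):
    """Trainable parameters in a dense update of a d_out x d_in weight matrix."""
    return d_out * d_in

def lora_update_params(d_out, d_in, rank):
    """Trainable parameters when the update is factorized as B (d_out x r) times A (r x d_in)."""
    return rank * (d_out + d_in)

def low_rank_update(B, A):
    """Dense update Delta W = B A, with matrices given as nested lists."""
    rank = len(A)
    return [
        [sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(len(A[0]))]
        for i in range(len(B))
    ]

# Parameter counts for a hypothetical 4096 x 4096 layer adapted with rank 8.
full = full_update_params(4096, 4096)
lora = lora_update_params(4096, 4096, rank=8)
print(f"dense update: {full} parameters, rank-8 update: {lora} parameters")

# A tiny rank-1 example: 5 stored numbers expand into a 2 x 3 update.
delta = low_rank_update([[1.0], [2.0]], [[3.0, 4.0, 5.0]])
print(delta)
```

A Bayesian treatment has to place distributions over these factors, which is where the additional parameters discussed in the abstract come from.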

M. Sendera, A. Sorkhei, T. Kuśmierczyk: Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations. Accepted to UAI 2025. [arXiv]

Abstract: Gaussian Processes (GPs) provide a convenient framework for specifying function-space priors, making them a natural choice for modeling uncertainty. In contrast, Bayesian Neural Networks (BNNs) offer greater scalability and extendability but lack the advantageous properties of GPs. This motivates the development of BNNs capable of replicating GP-like behavior. However, existing solutions are either limited to specific GP kernels or rely on heuristics. We demonstrate that trainable activations are crucial for effective mapping of GP priors to wide BNNs. Specifically, we leverage the closed-form 2-Wasserstein distance for efficient gradient-based optimization of reparameterized priors and activations. Beyond learned activations, we also introduce trainable periodic activations that ensure global stationarity by design, and functional priors conditioned on GP hyperparameters to allow efficient model selection. Empirically, our method consistently outperforms existing approaches or matches performance of the heuristic methods, while offering stronger theoretical foundations.
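For readers unfamiliar with the closed-form 2-Wasserstein distance mentioned in the abstract, here is a minimal sketch for the simplest special case of two univariate Gaussians, where the squared distance is the sum of the squared difference of means and the squared difference of standard deviations. The multivariate version involves matrix square roots of the covariances and is not shown. How the paper applies the distance to priors and activations is not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def w2_gaussian_1d(mean_a, std_a, mean_b, std_b):
    """Closed-form 2-Wasserstein distance between two univariate Gaussians."""
    return math.sqrt((mean_a - mean_b) ** 2 + (std_a - std_b) ** 2)

def w2_numeric_1d(dist_a, dist_b, n=10_000):
    """Numerical check: integrate the squared gap between the two quantile functions."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n  # midpoint grid on the open interval (0, 1)
        total += (dist_a.inv_cdf(u) - dist_b.inv_cdf(u)) ** 2
    return math.sqrt(total / n)

closed = w2_gaussian_1d(0.0, 1.0, 1.0, 2.0)
numeric = w2_numeric_1d(NormalDist(0.0, 1.0), NormalDist(1.0, 2.0))
print(f"closed form: {closed:.4f}, numerical: {numeric:.4f}")
```

The closed form is differentiable in the means and standard deviations, which is what makes gradient-based optimization of a distance of this kind practical.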

M. Sendera, A. Sorkhei, T. Kuśmierczyk: Hi-fi functional priors by learning activations. NeurIPS workshop on Bayesian Decision-making and Uncertainty 2024. [OpenReview][Online]

Abstract: Function-space priors in Bayesian Neural Networks (BNNs) provide a more intuitive approach to embedding beliefs directly into the model’s output, thereby enhancing regularization, uncertainty quantification, and risk-aware decision-making. However, imposing function-space priors on BNNs is challenging. We address this task through optimization techniques that explore how trainable activations can accommodate higher-complexity priors and match intricate target function distributions. We investigate flexible activation models, including Pade functions and piecewise linear functions, and discuss the learning challenges related to identifiability, loss construction, and symmetries. Our empirical findings indicate that even BNNs with a single wide hidden layer when equipped with flexible trainable activation, can effectively achieve desired function-space priors.

Presentation delivered at the GMUM Tea Seminar on 2024-11-08 and at the Statistical Analysis and Modeling Group (IPI PAN) seminar on 2024-12-03.

Code for the paper is available at [github].

NeurIPS workshop on Bayesian Decision-making and Uncertainty: Workshop poster.

Remaining Publications

Patryk Marszałek, Tomasz Kuśmierczyk, Witold Wydmański, Jacek Tabor, Marek Śmieja: ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data. [arXiv]

Abstract: Clustering tabular data remains a significant open challenge in data analysis and machine learning. Unlike for image data, similarity between tabular records often varies across datasets, making the definition of clusters highly dataset-dependent. Furthermore, the absence of supervised signals complicates hyperparameter tuning in deep learning clustering methods, frequently resulting in unstable performance. To address these issues and reduce the need for per-dataset tuning, we adopt an emerging approach in deep learning: zero-shot learning. We propose ZEUS, a self-contained model capable of clustering new datasets without any additional training or fine-tuning. It operates by decomposing complex datasets into meaningful components that can then be clustered effectively. Thanks to pre-training on synthetic datasets generated from a latent-variable prior, it generalizes across various datasets without requiring user intervention. To the best of our knowledge, ZEUS is the first zero-shot method capable of generating embeddings for tabular data in a fully unsupervised manner. Experimental results demonstrate that it performs on par with or better than traditional clustering algorithms and recent deep learning-based methods, while being significantly faster and more user-friendly.

Borycki, P., Kubacki, P., Przewięźlikowski, M., Kuśmierczyk, T., Tabor, J., & Spurek, P. (2025). Hypernetwork Approach to Bayesian MAML (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29325-29327. [Online]

Abstract: The main goal of Few-Shot learning algorithms is to enable learning from small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). In this paper, we propose a novel framework for Bayesian MAML called BH-MAML, which employs Hypernetworks for weight updates. It learns the universal weights point-wise, but a probabilistic structure is added when adapted for specific tasks. In such a framework, we can use simple Gaussian distributions or more complicated posteriors induced by Continuous Normalizing Flows.

Work in progress

Care to join our work and make a mark on advancements in Bayesian Neural Networks? Don't hesitate to drop us an email at t.kusmierczyk@uj.edu.pl

Remaining Presentations

Beyond BBB: Practical Alternatives to Posterior Approximation in Bayesian Neural Networks [Slides]

Introduction to SWAG and Laplace for approximating posteriors. Delivered at the GMUM Tea Seminar on 2025-01-24.

Introduction [Slides]

Lectures introducing Bayesian methods and Bayesian Neural Networks presented to 5th-year students at UJ on May 29, 2024, and June 12, 2024.
[Shorter version presented to 3rd-year students on June 14, 2024]

Remaining Code Contributions

Reparameterized PyTorch

Library code is available at [github].

Remaining Posters

[Teaser]

Project teaser highlighting key aspects of Bayesian Neural Networks and research goals of the project.

Team

Tomasz Kuśmierczyk, PhD

Principal Investigator

Bayesian methods
variational inference
approximate learning

t.kusmierczyk[@]uj.edu.pl

Prof. Jacek Tabor

Mentor

deep learning
clustering methods
information theory
entropy based classification

jacek.tabor[@]uj.edu.pl

Mateusz Pyla

PhD Student, fellow

optimization
generative models
uncertainty quantification

mateusz.pyla[@]doctoral.uj.edu.pl

Contact

ul. Profesora Stanisława Łojasiewicza 6/2061

30-348 Kraków

Email: t.kusmierczyk@uj.edu.pl