Function Encoders:
A Principled Approach to Transfer Learning in Hilbert Spaces

The University of Texas at Austin
ICML 2025

Abstract

A central challenge in transfer learning is designing algorithms that can quickly adapt and generalize to new tasks without retraining. Yet, the conditions under which algorithms can effectively transfer to new tasks are poorly characterized. We introduce a geometric characterization of transfer in Hilbert spaces and define three types of inductive transfer: interpolation within the convex hull, extrapolation to the linear span, and extrapolation outside the span. We propose a method grounded in the theory of function encoders to achieve all three types of transfer. Specifically, we introduce a novel training scheme for function encoders using least-squares optimization, prove a universal approximation theorem for function encoders, and provide a comprehensive comparison with existing approaches such as transformers and meta-learning on four diverse benchmarks. Our experiments demonstrate that the function encoder outperforms state-of-the-art methods on all four benchmarks and on all three types of transfer.

A Geometric Characterization of Inductive Transfer

An illustration of the three types of transfer. Type 1 transfer is within the convex hull of the training functions. Type 2 transfer is extrapolation to the linear span. Type 3 transfer is extrapolation to the rest of the Hilbert space.

We consider an inductive transfer setting and present a geometric characterization of inductive transfer using principles from functional analysis. Inductive transfer involves transferring knowledge to new, unseen tasks while keeping the data distribution the same; for instance, labeling images according to a new, previously unseen class given only a few examples after training. While prior works have studied transfer learning, gaps remain in identifying when learned models will succeed and when they will fail. We introduce a characterization of inductive transfer, based on Hilbert spaces, which provides intuition about the difficulty of a given transfer learning problem.

Specifically, we characterize transfer using three types:

(Type 1) Interpolation within the convex hull. Tasks that can be represented as a convex combination of observed source tasks.


(Type 2) Extrapolation to the linear span. Tasks that are in the linear span of source tasks, which may lie far from observed data but share meaningful features.


(Type 3) Extrapolation to the Hilbert space. Tasks that are outside the linear span of the source predictors in an infinite-dimensional function space. Type 3 transfer is the most important and challenging form of transfer.
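To make this characterization concrete, the following sketch (our own illustration in Python, not code from the paper) represents functions by their values on a shared input grid and uses least-squares residuals as a rough numerical test of the three types: a target in the convex hull has near-zero residual under nonnegative, sum-to-one weights (type 1); a target in the linear span has near-zero unconstrained least-squares residual but a nonzero convex-hull residual (type 2); and a target outside the span has nonzero residual in both cases (type 3). The hypothetical training functions here are quadratics, mirroring the polynomial example later on this page.

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
xs = np.linspace(-1, 1, 50)
# Ten hypothetical training functions: quadratics with coefficients in [-1, 1].
train_funcs = np.stack([a * xs**2 + b * xs + c
                        for a, b, c in rng.uniform(-1, 1, (10, 3))])   # (10, 50)

def span_residual(target):
    """Distance from the target to the linear span of the training functions."""
    coeffs, *_ = np.linalg.lstsq(train_funcs.T, target, rcond=None)
    return np.linalg.norm(train_funcs.T @ coeffs - target)

def hull_residual(target, rho=1e3):
    """Distance to the convex hull: nonnegative weights that (approximately) sum to one,
    enforced by a heavily weighted sum-to-one row appended to the least-squares system."""
    A = np.vstack([train_funcs.T, rho * np.ones((1, len(train_funcs)))])
    b = np.concatenate([target, [rho]])
    weights, _ = nnls(A, b)
    return np.linalg.norm(train_funcs.T @ weights - target)

examples = {
    "mixture of training functions (type 1)": 0.5 * train_funcs[0] + 0.5 * train_funcs[1],
    "large-magnitude quadratic (type 2)": 5.0 * xs**2,
    "cubic (type 3)": xs**3,
}
for name, f in examples.items():
    print(f"{name}: span residual {span_residual(f):.2e}, hull residual {hull_residual(f):.2e}")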

The Function Encoder


Basis functions are a natural solution to inductive transfer learning in a Hilbert space. We build upon the function encoder algorithm, a method for learning neural network basis functions that span an arbitrary function space. While analytical approaches such as Fourier series scale poorly with the dimensionality of the input and output spaces, function encoders scale well to high-dimensional problems because the basis functions are neural networks.

We make several improvements to the function encoder algorithm. First, we generalize all definitions to use inner products only, allowing the function encoder to work on any Hilbert space. This generalization allows us to tackle new problems, such as few-shot classification. Second, we introduce a novel training scheme for function encoders using least-squares optimization. This training scheme greatly improves convergence rate and accuracy. Lastly, we prove a universal function space approximation theorem for function encoders, showing that they can approximate any function in a separable Hilbert space to any desired accuracy.
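A minimal PyTorch sketch of this recipe, assuming the standard L2 inner product estimated from example data (names, architecture, and hyperparameters are our own illustrative choices, not the authors' reference implementation): a single network outputs the k basis functions, the coefficients of an unseen function are computed in closed form by least squares over its example data, and the basis functions are trained by backpropagating the resulting prediction error.

import torch
import torch.nn as nn

class FunctionEncoder(nn.Module):
    def __init__(self, in_dim=1, out_dim=1, n_basis=100, hidden=128):
        super().__init__()
        self.n_basis, self.out_dim = n_basis, out_dim
        # One network with n_basis * out_dim outputs plays the role of k basis functions.
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_basis * out_dim),
        )

    def basis(self, x):
        # (N, in_dim) -> (N, n_basis, out_dim)
        return self.net(x).view(-1, self.n_basis, self.out_dim)

    def coefficients(self, x_example, y_example, reg=1e-3):
        """Least-squares coefficients of a function given example data {(x_i, f(x_i))}."""
        G = self.basis(x_example)                                    # (N, k, out_dim)
        # Gram matrix of the basis and inner products with the target function,
        # both estimated as averages over the example points (an L2 inner product).
        gram = torch.einsum('nko,nlo->kl', G, G) / x_example.shape[0]
        rhs = torch.einsum('nko,no->k', G, y_example) / x_example.shape[0]
        ridge = reg * torch.eye(self.n_basis)
        return torch.linalg.solve(gram + ridge, rhs)                 # (k,)

    def forward(self, x, coeffs):
        # Predict f(x) as a linear combination of the basis functions.
        return torch.einsum('nko,k->no', self.basis(x), coeffs)

model = FunctionEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x_example, y_example, x_query, y_query):
    """One training step on a single function sampled from the training set."""
    coeffs = model.coefficients(x_example, y_example)
    loss = nn.functional.mse_loss(model(x_query, coeffs), y_query)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()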

Experimental Results

We compare function encoders against state-of-the-art transfer learning algorithms on four different datasets. The first is simple polynomial regression, designed to highlight the failure modes of other approaches. The other three datasets are more challenging: few-shot image classification on CIFAR, pose estimation on the 7-Scenes dataset, and dynamics prediction for a MuJoCo Ant. For each dataset, we also separately evaluate each type of transfer, and we show that the function encoder outperforms prior works, such as transformers, meta-learning, and auto-encoders, on all three types of transfer.

An Illustrative Polynomial Example

In this example, the algorithms estimate a polynomial function from a few data points. The training set consists of quadratic functions. Type 1 transfer uses functions sampled from the same distribution as the training set. Type 2 transfer uses functions that are much larger in magnitude than anything in the training set, but still quadratic. Type 3 transfer uses cubic functions. To illustrate this, we provide a simple qualitative example comparing the function encoder against auto-encoders.
An illustration of a function encoder with 100 basis functions approximating a type 3 transfer function.
We observe that both approaches achieve reasonable performance for type 1 transfer. For type 2 transfer, the target function is much larger in magnitude than any function in the training set. The auto-encoder fails on this function because it has only learned to output functions from the training function space. In contrast, the function encoder generalizes to the entire span of the training function space by design. For type 3 transfer, the target function is a cubic function. The auto-encoder nonetheless outputs a function similar to the ones seen during training. When using a function encoder with only three basis functions, the basis functions span only the three-dimensional space of quadratics, so its approximation is the best quadratic fit to the data. When using 100 basis functions, the basis functions span the space of quadratics but additionally have 97 unconstrained dimensions. Due to the use of least squares, the function encoder with 100 basis functions optimally uses these extra 97 dimensions to fit the new function. Therefore, it is able to reasonably approximate this function as well, despite having never seen a cubic function during training. The quantitative results are shown here:
The training curves for all baselines on the polynomial dataset. The function encoder outperforms all baselines by orders of magnitude.
All prior works, such as meta-learning and transformers, fail on type 2 and type 3 transfer, even for a simple one-dimensional polynomial regression problem.
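To make the three-versus-one-hundred basis argument above concrete, the toy numpy example below (our own illustration; random tanh features stand in for whatever the 97 unconstrained dimensions happen to learn) fits a cubic target by least squares over a quadratic-only basis and over a 100-dimensional basis.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
target = x**3                                                   # a type 3 target

quad_basis = np.stack([np.ones_like(x), x, x**2], axis=1)       # (200, 3)
# 97 random tanh features as a stand-in for the extra learned basis dimensions.
extra = np.tanh(rng.normal(size=(97, 2)) @ np.stack([x, np.ones_like(x)]))
big_basis = np.concatenate([quad_basis, extra.T], axis=1)       # (200, 100)

for name, B in [("3 basis functions", quad_basis), ("100 basis functions", big_basis)]:
    coeffs, *_ = np.linalg.lstsq(B, target, rcond=None)
    rel_err = np.linalg.norm(B @ coeffs - target) / np.linalg.norm(target)
    print(f"{name}: relative error {rel_err:.3f}")
# The quadratic-only basis is stuck at the best quadratic approximation of the cubic;
# the larger basis uses its extra dimensions to drive the error close to zero.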

Few-Shot Image Classification

Due to the broad applicability of Hilbert spaces, we can apply the function encoder to many settings. To highlight this, we apply it to the few-shot image classification problem on the CIFAR dataset. In this setting, the model is given some positive examples showing what the class looks like. It is also given some negative examples of images belonging to other classes. Then, for any new class, the model should predict if the specified class is present in the image.
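One plausible way to cast this setting into the regression sketch from earlier on this page (the paper's exact classification formulation may differ) is to treat each class as a function mapping an image, or an embedding of it, to +1 if the class is present and -1 otherwise; the coefficients are computed from the labeled support examples, and new images are classified by thresholding the prediction.

import torch

# Hypothetical shapes: 512-dimensional image embeddings, 10 positive and 40 negative
# support examples, 5 query images. Random tensors stand in for actual image features.
support_x = torch.randn(50, 512)
support_y = torch.cat([torch.ones(10, 1), -torch.ones(40, 1)])
query_x = torch.randn(5, 512)

# Reuses the FunctionEncoder sketch defined above.
classifier = FunctionEncoder(in_dim=512, out_dim=1, n_basis=100)
coeffs = classifier.coefficients(support_x, support_y)        # least squares over the support set
class_present = classifier(query_x, coeffs).squeeze(-1) > 0   # True where the class is predicted present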
A visualization of the few-shot image classification problem. Some positive examples of a desired class are provided, along with some negative examples of other classes. The model should be able to identify the class for any new image.
We use 90 of the 100 classes for training. Type 1 transfer consists of unseen images from these 90 classes. Type 2 transfer is not easily testable, but would consist of images that belong to two classes at once. Type 3 transfer consists of the 10 unseen classes. The training results are shown below:
The training curves for all baselines on the CIFAR dataset. The function encoder outperforms all baselines, including ad-hoc approaches such as Siamese networks.
In addition to the typical baselines, we also compare against Siamese networks. The results show that function encoders perform even better than Siamese networks on this problem, despite the fact that Siamese networks are designed explicitly for this setting. All other approaches perform poorly, which highlights the difficulty of this problem.

Pose Estimation

Another interesting problem that can be expressed as a Hilbert space is pose estimation. The model is provided with a set of images and the location of the camera when these images were taken. Then, for any new image, the model should predict the location of the camera.
The model is provided with a set of images and their locations. For any new image, it should estimate the location of the camera.
We use the 7-Scenes dataset, which consists of seven different scenes. The model is trained on six of these scenes. Unseen images from these six scenes are used for type 1 transfer. Type 2 transfer would consist of shifting the origin or scaling the units. The seventh scene is used for type 3 transfer.
The training curves for all baselines on the 7 Scenes dataset. The function encoder outperforms all baselines.
Many approaches converge during training. As expected, all approaches perform much worse at type 1 transfer than on the training data, indicating a degree of over-fitting. The function encoder performs best at both type 1 and type 3 transfer, indicating its ability to optimally use the learned features for unseen data.

Hidden-Parameter Dynamics Estimation for MuJoCo

Lastly, we run experiments on a hidden-parameter version of the MuJoCo Ant. The algorithm is given data from the beginning of a trajectory, and it must estimate the dynamics going forward. The training dataset consists of small robots. Type 1 transfer consists of robots sampled from the same hidden-parameter distribution as training. Type 2 transfer is evaluated via synthetically generated data consisting of linear combinations of the dynamics present in type 1. Type 3 transfer consists of robots that are much larger than those in the training set. We visualize these robots below:
Type 1 transfer consists of small, four-legged robots. Type 3 transfer consists of very large, four-legged robots.
The training curves for all baselines on the Ant dataset. The function encoder outperforms all baselines.
All algorithms converge during training, albeit to varying levels. However, many algorithms perform much worse for type 1 transfer. The function encoder performs best, although several approaches, such as the transformer and MAML (n=5), are comparable. Furthermore, the function encoder is clearly best for type 2 transfer. Type 3 transfer tells an interesting story. The function encoder has the best stable performance, although approaches such as the transformer and the auto-encoder are not far behind. At the beginning of training, the auto-encoder shows the best overall performance, although it degrades as training continues. This is because training does not optimize for type 3 transfer, and the best model parameters for the training dataset are not the best model parameters for type 3 transfer. Thus, its performance is unstable.
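As an aside on the type 2 evaluation, least-squares coefficients are linear in the target function, so a linear combination of two robots' dynamics has exactly the same linear combination of their coefficient vectors; this is why such synthetic mixtures fall within the span of the learned basis by construction. A small illustrative check (not the paper's evaluation code):

import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(200, 8))        # stand-in for basis function values at 200 sampled states
f1 = B @ rng.normal(size=8)          # dynamics of robot 1, represented in the span of the basis
f2 = B @ rng.normal(size=8)          # dynamics of robot 2
alpha = 0.3
mixed = alpha * f1 + (1 - alpha) * f2

solve = lambda y: np.linalg.lstsq(B, y, rcond=None)[0]
print(np.allclose(solve(mixed), alpha * solve(f1) + (1 - alpha) * solve(f2)))   # True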

BibTeX

@article{ingebrand_2025_fe_transfer,
  author       = {Tyler Ingebrand and
                  Adam J. Thorpe and
                  Ufuk Topcu},
  title        = {Function Encoders: A Principled Approach to Transfer Learning in Hilbert Spaces},
  year         = {2025},
  journal      = {International Conference on Machine Learning (ICML)},
}