2021 IEEE International Symposium on Information Theory

12-20 July 2021 • Melbourne, Victoria, Australia

IEEE Information Theory Society

Institute of Electrical and Electronics Engineers (IEEE)

All Dates/Times are Australian Eastern Standard Time (AEST)

Technical Program

Paper Detail

Paper ID	D4-S1-T3.1
Paper Title	An information-theoretic analysis of the impact of task similarity on meta-learning
Authors	Sharu Theresa Jose, Osvaldo Simeone, King's College London, United Kingdom
Session	D4-S1-T3: Online & Meta-Learning
Chaired Session	Thursday, 15 July, 22:00 - 22:20
Engagement Session	Thursday, 15 July, 22:20 - 22:40
Abstract	Meta-learning aims to optimize the hyperparameters of a model class or training algorithm by observing data from a number of related tasks. Following the setting of Baxter [1], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The statistical properties of the task environment thus dictate the similarity of the tasks. The goal of the meta-learner is to ensure that the hyperparameters yield a small loss when used to train on a new task sampled from the task environment. The difference between the resulting average loss, known as the meta-population loss, and the corresponding empirical loss measured on the available data from related tasks is known as the meta-generalization gap, and it measures the generalization capability of the meta-learner. In this paper, we present novel information-theoretic bounds on the average absolute value of the meta-generalization gap. Unlike prior work [2], our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap. Task similarity is gauged via the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate the proposed bounds using the example of ridge regression with meta-learned bias.
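
The two ingredients named at the end of the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the restriction to discrete distributions, and the scalar form of the regression objective are all assumptions made here for illustration. The first two functions compute the KL and JS divergences between two discrete distributions, as one way to quantify how similar two tasks' data distributions are. The third solves a one-dimensional ridge regression whose regularizer pulls the weight toward a bias value, which stands in for the meta-learned hyperparameter.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions (equal-length lists)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """JS divergence: average KL of p and q to their midpoint mixture."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def biased_ridge_1d(xs, ys, bias, lam):
    """Minimize (1/m) * sum((w*x - y)^2) + lam * (w - bias)^2 over scalar w.

    Assumed illustrative objective; `bias` plays the role of the
    hyperparameter a meta-learner would tune across tasks.
    """
    m = len(xs)
    sxx = sum(x * x for x in xs) / m
    sxy = sum(x * y for x, y in zip(xs, ys)) / m
    # Setting the derivative to zero gives a closed-form solution.
    return (sxy + lam * bias) / (sxx + lam)
```

In this sketch, identical distributions give a JS divergence of zero, while distributions with disjoint support give the maximum value of log 2 nats. In `biased_ridge_1d`, a bias close to the true weight of a new task leaves the solution near that weight even for large `lam`, which is the intuition for why a well-chosen bias helps when tasks are similar.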