Doubly Perturbed Task-Free Continual Learning

AAAI 2024

1Department of Electrical and Computer Engineering, Seoul National University
2Graduate School of Data Science, Seoul National University
3INMC & IPAI, Seoul National University
† Corresponding author

Abstract

Task-free online continual learning (TF-CL) is a challenging problem in which the model learns tasks incrementally without explicit task information. Although training with the entire data from the past, present, and future is considered the gold standard, naive training in TF-CL on only the current samples may conflict with learning from future samples, leading to catastrophic forgetting and poor plasticity. Thus, proactively accounting for unseen future samples in TF-CL becomes imperative. Motivated by this intuition, we propose a novel TF-CL framework that considers future samples and show that injecting adversarial perturbations on both the input data and the decision-making process is effective. We then propose a novel method named Doubly Perturbed Continual Learning (DPCL) to efficiently implement these input and decision-making perturbations. Specifically, for input perturbation, we propose an approximate perturbation method that injects noise into both the input data and the feature vector and then interpolates the two perturbed samples. For decision-making perturbation, we devise multiple stochastic classifiers. We also investigate a memory management scheme and a learning rate schedule that reflect the proposed double perturbations. We demonstrate that our method outperforms state-of-the-art baselines by large margins on various TF-CL benchmarks.

Motivation of Considering Future Sample

{"overview"}

In the TF-CL paradigm, the gold standard is to train with the entire data from the past, present, and future. Therefore, we proactively consider an unknown future sample, which conventional TF-CL approaches have not considered before.

Analysis of Considering a Future Sample in TF-CL

{"overview"}

Intuition of the Proposed Framework

{"overview"}

Intuitively, the adversarial loss flattens both the input and the weight loss landscapes. For the input, it is desirable to achieve low losses for both past and future samples with the current network weights, and a flatter input loss landscape is more conducive to achieving this goal. If the loss is flat with respect to the weights, then one would expect only a minor increase in loss, compared to a sharper weight landscape, when the weights shift as the network is trained on new samples.
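To make this intuition concrete, here is a minimal PyTorch-style sketch of a training step that perturbs both the input (FGSM-style ascent) and the weights (SAM-style ascent) before descending on the resulting loss. The function name, the radii eps_x and eps_w, and the use of cross-entropy are illustrative assumptions; this is the costly adversarial formulation that motivates DPCL, not the efficient update proposed in the paper.

    import torch
    import torch.nn.functional as F

    def doubly_perturbed_step(model, optimizer, x, y, eps_x=0.01, eps_w=0.05):
        # Illustrative worst-case update over small input and weight perturbations.
        params = [p for p in model.parameters() if p.requires_grad]

        # 1) Input perturbation: one FGSM-style ascent step on the input.
        x_adv = x.clone().detach().requires_grad_(True)
        grad_x, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)
        x_adv = (x_adv + eps_x * grad_x.sign()).detach()

        # 2) Weight perturbation: SAM-style ascent step on the weights.
        grads_w = torch.autograd.grad(F.cross_entropy(model(x), y), params)
        scale = eps_w / (torch.norm(torch.stack([g.norm() for g in grads_w])) + 1e-12)
        with torch.no_grad():
            for p, g in zip(params, grads_w):
                p.add_(g, alpha=scale.item())

        # 3) Descend on the loss evaluated at the perturbed input and weights.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()

        # 4) Undo the weight perturbation before applying the optimizer step.
        with torch.no_grad():
            for p, g in zip(params, grads_w):
                p.sub_(g, alpha=scale.item())
        optimizer.step()
        return loss.item()

Computing these adversarial perturbations explicitly at every step is expensive, which is why DPCL replaces them with the efficient approximations described below.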

Proposed Method (DPCL)

{"overview"}

Sketch of our proposed method, Doubly Perturbed Continual Learning (DPCL). It mainly consists of two components: Perturbed Function Interpolation (left) and Branched Stochastic Classifiers (right). They efficiently flatten the input and weight loss landscapes, respectively. We also suggest an effective memory management and adaptive learning rate scheme, called PIMA.
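As a rough reading of these two components, the sketch below mixes a noisy-input forward path with a noisy-feature forward path via random interpolation, and averages logits over several classifier branches whose weights are sampled around learned means. The class name, noise scales, number of branches, and Gaussian weight sampling are our assumptions for illustration, not the exact DPCL formulation.

    import torch
    import torch.nn as nn

    class DPCLSketch(nn.Module):
        # Illustrative sketch: Perturbed Function Interpolation + Branched Stochastic Classifiers.
        def __init__(self, encoder, feat_dim, num_classes, num_branches=3,
                     sigma_x=0.05, sigma_f=0.05):
            super().__init__()
            self.encoder = encoder                # any feature extractor, e.g. a ResNet trunk
            self.sigma_x, self.sigma_f = sigma_x, sigma_f
            # Each branch keeps a mean and a log-std per weight and samples its
            # weights at every forward pass (stochastic classifiers).
            self.w_mu = nn.Parameter(0.01 * torch.randn(num_branches, num_classes, feat_dim))
            self.w_logstd = nn.Parameter(torch.full((num_branches, num_classes, feat_dim), -3.0))

        def forward(self, x):
            # Perturbed Function Interpolation: perturb the input, perturb the
            # feature, then interpolate the two perturbed forward paths.
            feat_noisy_input = self.encoder(x + self.sigma_x * torch.randn_like(x))
            feat_noisy_feat = self.encoder(x)
            feat_noisy_feat = feat_noisy_feat + self.sigma_f * torch.randn_like(feat_noisy_feat)
            lam = torch.rand(x.size(0), 1, device=x.device)   # per-sample mixing weight
            feat = lam * feat_noisy_input + (1 - lam) * feat_noisy_feat

            # Branched stochastic classifiers: sample each branch's weights and
            # average the branch logits.
            w = self.w_mu + torch.exp(self.w_logstd) * torch.randn_like(self.w_mu)
            logits = torch.einsum('bd,kcd->bkc', feat, w)     # (batch, branch, class)
            return logits.mean(dim=1)

In training, such a module could simply be plugged into a standard cross-entropy objective (e.g., together with a replay buffer), with the noise in the input, feature, and classifier weights supplying the double perturbation without explicit adversarial optimization.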

Results on Various Datasets on Disjoint Setup

{"overview"} {"overview"}

Results on Various Setups and Complexity Analysis

{"overview"}

Qualitative Analysis on Input Loss Landscape

{"overview"}

Qualitative Analysis on Weight Loss Landscape

{"overview"}

BibTeX


      @article{lee2023hle,
          author    = {Lee, Byung Hyun and Oh, Min-hwan and Chun, Se Young},
          title     = {Doubly Perturbed Task-Free Continual Learning},
          journal   = {arXiv preprint arXiv:2312.13027},
          year      = {2023},
      }