brulee 0.7.0
New models for tabular data:
Regularization Learning Networks (brulee_rln()) use a conventional MLP architecture, but each weight learns its own adaptive regularization coefficient.
ResNet (brulee_resnet()) can fit a multilayer neural network with skip (i.e., residual) connections and batch normalization.
AutoInt (brulee_auto_int()) uses residual connections and columnwise attention mechanisms to create embeddings that encourage in-context learning of features.
Saint (brulee_saint()) uses column and/or row attention mechanisms.
All modeling functions now support GPU acceleration via the device parameter. Users can specify device = "cpu", device = "cuda", or device = "mps" (Apple Silicon). When device = NULL (the default), the package automatically selects CUDA if available and otherwise falls back to CPU. Note: MPS is not auto-selected because it doesn’t support the float64 dtype required by brulee. See ?training_efficiency for some related notes.
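The default device selection described above can be sketched in a few lines of R. This is an illustration of the documented rule, not brulee's internal code: pick_device() and its cuda_available argument are hypothetical names, and the only real package call is torch::cuda_is_available().

```r
# Hypothetical helper mirroring the documented rule: an explicit choice wins;
# otherwise use CUDA when it is available and fall back to CPU. MPS is never
# chosen automatically.
pick_device <- function(device = NULL,
                        cuda_available = torch::cuda_is_available()) {
  if (!is.null(device)) {
    return(match.arg(device, c("cpu", "cuda", "mps")))
  }
  if (isTRUE(cuda_available)) "cuda" else "cpu"
}

# The availability flag is passed explicitly here, so these calls run even
# when torch is not installed (the default argument is never evaluated).
auto_with_gpu    <- pick_device(cuda_available = TRUE)
auto_without_gpu <- pick_device(cuda_available = FALSE)
explicit_mps     <- pick_device("mps", cuda_available = TRUE)
```

Under this rule, Apple Silicon users who want MPS have to request it explicitly with device = "mps".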
Breaking Changes
Float tensors were changed from 64-bit floats to 32-bit. This is to enable GPU usage on MPS devices.
Parameters are initialized on CPU devices and then moved to the chosen device. In some cases, the RNG initialization code is independent of the seed.
For classification, the softmax was moved out of every model’s forward pass so the loss can use torch::nnf_cross_entropy() (which applies the log-sum-exp trick internally) instead of nll_loss(log(softmax(x))). This avoids the log(0) underflow that produced NaN losses and “numerical overflow” early stopping on overspecified brulee_saint()/brulee_auto_int() fits. Affects brulee_mlp(), brulee_logistic_reg(), brulee_multinomial_reg(), brulee_resnet(), brulee_auto_int(), and brulee_saint(). New fits carry output_type = "logits" so the predict path applies softmax; serialized fits from earlier versions of brulee continue to predict correctly.
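The underflow behind this change can be reproduced in base R without torch. The sketch below compares log(softmax(x)) with a log-sum-exp formulation; the function names and logit values are made up for illustration, and R doubles underflow much later than the 32-bit floats brulee now uses, so the logits are deliberately extreme.

```r
# Naive route: exp() of a very negative logit underflows to 0, so the
# class probability is exactly 0 and its log is -Inf.
naive_log_softmax <- function(logits) {
  p <- exp(logits) / sum(exp(logits))
  log(p)
}

# Stable route: subtract the maximum, then use the log-sum-exp identity
# log(sum(exp(x))) = m + log(sum(exp(x - m))).
stable_log_softmax <- function(logits) {
  m <- max(logits)
  logits - (m + log(sum(exp(logits - m))))
}

logits     <- c(0, -800)  # hypothetical, badly overconfident two-class logits
true_class <- 2           # the observed class is the one the model rules out

naive_loss  <- -naive_log_softmax(logits)[true_class]   # infinite
stable_loss <- -stable_log_softmax(logits)[true_class]  # finite
```

The naive loss is infinite, which is the kind of value that turns into NaN once gradients are taken, while the stable version yields a large but finite penalty.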
brulee 0.6.0
CRAN release: 2025-09-02
Transition from the magrittr pipe to the base R pipe.
To help avoid numeric overflow in the loss functions:
Tensors are stored as 64-bit floats instead of 32-bit.
Starting values now use a Gaussian distribution (instead of uniform) with a smaller standard deviation.
The results always contain the initial results to use as a fallback if there is overflow during the first epoch.
brulee_mlp() has two additional parameters, grad_value_clip and grad_norm_clip, that clip gradients to help prevent these issues.
The warning was changed to “Early stopping occurred at epoch {X} due to numerical overflow of the loss function.”
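Gradient clipping usually takes one of two forms: limiting each gradient element to a fixed range, or rescaling the whole gradient when its norm exceeds a threshold. A minimal base R sketch of both ideas follows. The function names are hypothetical, and this is not brulee's implementation, which works on torch tensors.

```r
# Clip by value: force every element into [-limit, limit].
clip_by_value <- function(grad, limit) {
  pmin(pmax(grad, -limit), limit)
}

# Clip by norm: shrink the whole vector so its Euclidean norm is at most
# max_norm, preserving its direction.
clip_by_norm <- function(grad, max_norm) {
  grad_norm <- sqrt(sum(grad^2))
  if (grad_norm > max_norm) grad * (max_norm / grad_norm) else grad
}

grad <- c(3, -4)                         # toy gradient with norm 5

value_clipped <- clip_by_value(grad, 2)  # each element limited to [-2, 2]
norm_clipped  <- clip_by_norm(grad, 1)   # rescaled to unit norm
```

Value clipping can change the direction of the update, while norm clipping only changes its length.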
Several new SGD optimizers were added: "ADAMw", "Adadelta", "Adagrad", and "RMSprop".
Mixture parameter values other than zero cannot be used with several optimizers since they require L2 penalties.
brulee 0.5.0
CRAN release: 2025-04-07
- Removed a unit test for numerical overflow since it occurs less frequently and has become increasingly challenging to reproduce.
brulee 0.4.0
CRAN release: 2025-01-30
Added a convenience function, brulee_mlp_two_layer(), to more easily fit two-layer networks with parsnip.
Various changes and improvements to error and warning messages.
Fixed a bug that occurred when linear activation was used for neural networks (#68).
brulee 0.3.0
CRAN release: 2024-02-14
Fixed a bug where coef() would error if used on a brulee_logistic_reg() that was trained with a recipe (#66).
Fixed a bug where SGD was always being used as the optimizer (#61).
Additional activation functions were added (#74).
brulee 0.2.0
CRAN release: 2022-09-19
Several learning rate schedulers were added to the modeling functions (#12).
An optimizer argument was added to brulee_mlp(), with the new default being LBFGS instead of stochastic gradient descent.
