242 changes: 190 additions & 52 deletions capabilities/fine-tuning.mdx
---
title: "Fine-Tuning"
description: "Adapt TabPFN's pretrained foundation model to your data with gradient-based fine-tuning."
---

Fine-tuning updates TabPFN's pretrained transformer parameters using gradient descent on your dataset. This retains TabPFN's learned priors while aligning the model more closely with your target data distribution.

You can fine-tune both:

- [`FinetunedTabPFNClassifier`](https://github.com/PriorLabs/tabpfn/blob/main/src/tabpfn/finetuning/finetuned_classifier.py) — for classification tasks
- [`FinetunedTabPFNRegressor`](https://github.com/PriorLabs/tabpfn/blob/main/src/tabpfn/finetuning/finetuned_regressor.py) — for regression tasks

## When to Fine-Tune

Fine-tuning is not always necessary. TabPFN's in-context learning already adapts to your data at inference time. Fine-tuning adds value in specific scenarios:

### Good Candidates for Fine-Tuning

<CardGroup cols={2}>
<Card title="Niche or specialized domains" icon="microscope">
Your data represents a distribution not well-covered by TabPFN's pretraining priors — e.g., molecular properties, specialized sensor data, or domain-specific financial instruments.
</Card>
<Card title="Consistent data schema" icon="table">
You have a stable schema that you'll predict on repeatedly. Fine-tuning amortizes the upfront cost across many future predictions.
</Card>
<Card title="Large training sets (10k+ rows)" icon="database">
With more data, fine-tuning can learn meaningful adaptations without overfitting.
</Card>
<Card title="Multiple related tables" icon="layer-group">
You have a family of related datasets (e.g., multiple experiments, regional variants) and want to fine-tune a single model across them.
</Card>
</CardGroup>

### When Fine-Tuning Is Less Likely to Help

- On very small datasets (< 1000 rows), the risk of overfitting outweighs adaptation benefits. Try [feature engineering](/tips-and-tricks#feature-engineering) or [AutoTabPFN ensembles](/extensions/post-hoc-ensembles) instead.
- If baseline TabPFN is already within a few percent of your target metric, the simpler approaches in [Tips & Tricks](/tips-and-tricks) often close the gap with less effort.
- On datasets with gradual temporal distribution shifts and many features, fine-tuning can be less stable. Make sure your train/validation split respects the time ordering.

### Decision Flowchart

<Steps>
<Step title="Run baseline TabPFN">
Evaluate default `TabPFNClassifier` or `TabPFNRegressor` on your task.
</Step>
<Step title="Try quick wins first">
Apply [feature engineering, metric tuning, and preprocessing tuning](/tips-and-tricks) — these are faster to iterate on.
</Step>
<Step title="Try AutoTabPFN or HPO">
If you need more, try [AutoTabPFN ensembles](/extensions/post-hoc-ensembles) or [hyperparameter optimization](/extensions/hpo).
</Step>
<Step title="Fine-tune when performance plateaus">
If performance has plateaued and you have sufficient data (1000+ rows), fine-tuning can push past the ceiling by adapting the model's internal representations.
</Step>
</Steps>

## Getting Started

`FinetunedTabPFNClassifier` and `FinetunedTabPFNRegressor` share the same interface as the standard `TabPFNClassifier` and `TabPFNRegressor`.

### 1. Prepare Your Dataset

Load and split your data into train and test sets. Use a proper validation strategy: for time-dependent data, use temporal splits rather than random splits.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```
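For time-dependent data, the simplest temporal split trains on the earliest rows and tests on the latest. A minimal sketch — the `temporal_split` helper here is illustrative, not part of TabPFN:

```python
def temporal_split(X, y, test_size=0.2):
    """Hold out the most recent rows, assuming X and y are sorted by time."""
    n_test = int(len(X) * test_size)
    cut = len(X) - n_test
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Stand-in data: 100 rows already ordered by time.
X = [[i] for i in range(100)]
y = [i % 2 for i in range(100)]
X_train, X_test, y_train, y_test = temporal_split(X, y)
```

With scikit-learn, `train_test_split(X, y, test_size=0.2, shuffle=False)` produces the same ordered split.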

### 2. Configure and Train

<CodeGroup>

```python Classifier
from tabpfn.finetuning import FinetunedTabPFNClassifier

finetuned_clf = FinetunedTabPFNClassifier(
    device="cuda",
    epochs=30,
    learning_rate=1e-5,
)

finetuned_clf.fit(X_train, y_train)
```

```python Regressor
from tabpfn.finetuning import FinetunedTabPFNRegressor

finetuned_reg = FinetunedTabPFNRegressor(
    device="cuda",
    epochs=30,
    learning_rate=1e-5,
)

finetuned_reg.fit(X_train, y_train)
```

</CodeGroup>

By default, fine-tuning splits off 10% of the training data for validation and uses early stopping (patience of 8 epochs). You can also provide your own validation set, which is useful for temporal data or other cases where a random split isn't appropriate:

```python
finetuned_clf.fit(X_train, y_train, X_val=X_val, y_val=y_val)
```
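The early-stopping behavior described above can be sketched as a patience counter over per-epoch validation losses — a simplified illustration, not the library's actual implementation:

```python
def early_stop_epoch(val_losses, patience=8):
    """Return the epoch index at which training stops, or None if it never does.

    Training stops once validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return None  # ran all epochs without triggering early stopping

# Loss improves for three epochs, then plateaus: stopping triggers
# `patience` epochs after the best value.
losses = [1.0, 0.8, 0.7] + [0.9] * 10
```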

### 3. Predict

<CodeGroup>

```python Classifier
y_pred = finetuned_clf.predict(X_test)
y_pred_proba = finetuned_clf.predict_proba(X_test)
```

```python Regressor
y_pred = finetuned_reg.predict(X_test)
```

</CodeGroup>

## Hyperparameters

### Core Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `epochs` | `30` | Number of fine-tuning epochs. More epochs allow deeper adaptation but risk overfitting. |
| `learning_rate` | `1e-5` | Step size for gradient updates. Lower values are safer but slower to converge. |
| `device` | `"cuda"` | GPU is strongly recommended. Fine-tuning on CPU is very slow. |

### Tuning Guidelines

**Learning rate:**
- Start with `1e-5` (the default). This is conservative and preserves pretrained knowledge.
- For larger datasets (10k+ rows), you can try `3e-5` to `1e-4` for faster convergence.
- If you see training loss spike or diverge, reduce the learning rate.

**Epochs:**
- `10–30` epochs is a good starting range for most datasets.
- For high-accuracy tasks where you're fine-tuning carefully, use more epochs (50–100) with a lower learning rate to allow gradual adaptation without destroying pretrained representations.
- Monitor validation loss to detect overfitting — stop if validation performance degrades.
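The learning-rate guidance above can be condensed into a small helper. This is purely illustrative — the 10k-row threshold and `3e-5` value are the rules of thumb stated here, not library defaults:

```python
def suggested_learning_rate(n_rows: int) -> float:
    """Rule-of-thumb starting learning rate based on dataset size."""
    if n_rows >= 10_000:
        return 3e-5   # larger datasets tolerate faster adaptation
    return 1e-5       # conservative default preserves pretrained knowledge
```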

<Warning>
Fine-tuning requires GPU acceleration. While it will run on CPU, training times will be impractical for most use cases.
</Warning>

## Multi-GPU Fine-Tuning

Fine-tuning supports multi-GPU training via PyTorch DDP (Distributed Data Parallel). This is auto-detected when launched with `torchrun`:

```bash
torchrun --nproc-per-node=4 your_finetuning_script.py
```

No code changes are needed. The DDP setup is handled internally based on the `LOCAL_RANK` environment variable that `torchrun` sets. Note that `.fit()` should only be called once per `torchrun` session.
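Under `torchrun`, each worker process sees a distinct `LOCAL_RANK`; in a plain single-process run the variable is unset. A quick way to check which rank a process is running as, for example to restrict logging to one process:

```python
import os

# torchrun sets LOCAL_RANK per worker; a plain `python` run has no such
# variable, so treating rank 0 as the default matches single-process use.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
is_main_process = local_rank == 0  # e.g., only rank 0 should log or save
```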

## How It Works

TabPFN performs in-context learning: during inference, it processes both training data and test samples in a single forward pass, using attention to identify relevant patterns. Fine-tuning adapts the transformer's weights so that the attention mechanism more accurately reflects the similarity structure of your specific data.

Concretely, after fine-tuning:

- The query representations of test samples and key representations of training samples produce dot products that better reflect their target similarity.
- This allows the fine-tuned model to more appropriately weight relevant in-context samples when making predictions.

During fine-tuning, the preprocessing pipeline is decoupled to generate transformed tensors that mirror the preprocessing configurations used at inference time, so the model is optimized on the exact same data variations it encounters when making predictions.
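The dot-product weighting described above can be illustrated with plain softmax attention. This is a generic sketch of scaled dot-product attention, not TabPFN's exact architecture:

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot products: how much each in-context
    (training) sample contributes to one test sample's prediction."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# A query aligned with the first key receives most of the weight.
w = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```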

## Best Practices

<AccordionGroup>
<Accordion title="Always compare against baseline">
Before fine-tuning, establish a baseline with the default `TabPFNClassifier` or `TabPFNRegressor`. Fine-tuning should measurably improve on this baseline — if it doesn't, the simpler model is preferable.
</Accordion>

<Accordion title="Use proper validation">
Split a held-out validation set and monitor performance across epochs. For time-series or temporal data, use a temporal split rather than random cross-validation.
</Accordion>

<Accordion title="Start conservative, then adjust">
Begin with the defaults (`epochs=30`, `learning_rate=1e-5`). Only increase aggressiveness if you see clear room for improvement without signs of overfitting.
</Accordion>

<Accordion title="Combine with feature engineering">
Fine-tuning and [feature engineering](/tips-and-tricks#feature-engineering) are complementary. Good features make fine-tuning more effective by giving the model better signal to adapt to.
</Accordion>
<Accordion title="Watch for overfitting on small data">
With fewer than ~1000 rows, fine-tuning can overfit quickly. Use fewer epochs, a lower learning rate, or consider whether [AutoTabPFN ensembles](/extensions/post-hoc-ensembles) might be more appropriate.
</Accordion>
</AccordionGroup>
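One way to make the baseline comparison concrete is to score both models on the same held-out set and keep the fine-tuned model only if it clearly wins. A small illustrative helper (`baseline_score` and `finetuned_score` would come from, e.g., each model's `score(X_test, y_test)` on a higher-is-better metric):

```python
def pick_model(baseline_score, finetuned_score, min_gain=0.0):
    """Prefer the fine-tuned model only when it beats the baseline by
    more than `min_gain`; ties and regressions fall back to the simpler
    default model."""
    if finetuned_score - baseline_score > min_gain:
        return "finetuned"
    return "baseline"
```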

## Enterprise Fine-Tuning

For organizations with proprietary datasets, Prior Labs offers an enterprise fine-tuning program that includes:

- Fine-tuning on your organization's data corpus for a customized, high-performance model
- Support for fine-tuning across collections of related datasets
- Optimized training infrastructure

<Card title="Enterprise Fine-Tuning" icon="building" horizontal href="mailto:sales@priorlabs.ai">
Learn more about fine-tuning TabPFN for your organization.
</Card>

## Related
<CardGroup cols={3}>
<Card title="Tips & Tricks" icon="lightbulb" href="/tips-and-tricks">
Quick wins to try before fine-tuning.
</Card>
<Card title="AutoTabPFN Ensembles" icon="brain" href="/extensions/post-hoc-ensembles">
Automated ensembling as an alternative to fine-tuning.
</Card>
<Card title="Hyperparameter Optimization" icon="cog" href="/extensions/hpo">
Automated search over TabPFN's hyperparameter space.
</Card>
</CardGroup>

<Card title="GitHub Examples" icon="book" horizontal href="https://github.com/PriorLabs/TabPFN/blob/main/examples/finetune_classifier.py">
See more examples and fine-tuning utilities in our TabPFN GitHub repository.
</Card>
1 change: 1 addition & 0 deletions docs.json
"quickstart",
"models",
"best-practices",
"tips-and-tricks",
"faq"
]
},