The latest release introduces major architectural improvements designed for **scalability**, **robustness on imbalanced data**, and **training speed**.

### Gradient Boosting Mode

LinearBoost now supports **gradient boosting** in addition to AdaBoost via the `boosting_type` parameter:

- **`boosting_type='adaboost'`** (default): Classic AdaBoost (SAMME or SAMME.R) that reweights samples by classification error.
- **`boosting_type='gradient'`**: Fits each base estimator to pseudo-residuals (the negative gradient of the log-loss). Often better for highly non-linear or XOR-like patterns and smoother decision boundaries.

```python
# Gradient boosting for complex non-linear patterns
clf = LinearBoostClassifier(
    boosting_type='gradient',
    n_estimators=200,
    kernel='rbf'
)
```
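
In `'gradient'` mode, the pseudo-residuals for the binary log-loss reduce to `y - sigmoid(F)`, where `F` holds the ensemble's current scores. A minimal numpy sketch of that idea (illustrative only, not LinearBoost's internals):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Current additive-model scores F(x) and binary labels y in {0, 1}
F = np.array([-1.2, 0.3, 2.0, -0.5])
y = np.array([0, 1, 1, 1])

# The negative gradient of the log-loss w.r.t. F is y - p, where
# p = sigmoid(F); the next base estimator is fitted to these values.
p = sigmoid(F)
pseudo_residuals = y - p
```

Samples the current ensemble gets badly wrong (here the last one, with p ≈ 0.38 for a positive label) produce the largest residuals, so the next estimator concentrates on them.
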

### Class Weighting & Custom Loss

- **`class_weight`**: Use `'balanced'` or a dict of class weights for imbalanced data. Weights are applied in the boosting loop.

```python
clf = LinearBoostClassifier(
    class_weight='balanced',  # Adjust for imbalanced classes
    n_estimators=200
)
```
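
Assuming `'balanced'` follows the usual scikit-learn convention, `n_samples / (n_classes * count(class))`, its effect on a skewed label vector looks like this (a standalone sketch, not library code):

```python
import numpy as np

y = np.array([0] * 90 + [1] * 10)  # 9:1 class imbalance

classes, counts = np.unique(y, return_counts=True)
# 'balanced' heuristic: n_samples / (n_classes * count_per_class)
weights = len(y) / (len(classes) * counts)

balanced = dict(zip(classes.tolist(), weights.tolist()))
# minority class 1 gets weight 5.0, majority class 0 gets ~0.56
```

The minority class is upweighted in inverse proportion to its frequency, so misclassifying a rare-class sample costs the booster roughly nine times as much here.
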

### Default Algorithm

The default **`algorithm`** is now **`'SAMME.R'`** for faster convergence and typically lower test error with fewer iterations (when using `boosting_type='adaboost'`).

---

## 🚀 New in Version 0.1.5

Version 0.1.5 introduced major architectural improvements designed for **scalability**, **robustness on imbalanced data**, and **training speed**.

### ⚡ Scalable Kernel Approximation

Version 0.1.2 of **LinearBoost Classifier** is released. Here are the changes:

- Improved Scikit-learn compatibility.

## Get Started and Documentation

The documentation is available at https://linearboost.readthedocs.io/.

The following parameters yielded optimal results during testing.

- **`learning_rate`**: Values between 0.01 and 1 typically perform well. Adjust based on the dataset's complexity and noise.

- **`algorithm`**: Use either `SAMME` or `SAMME.R` (default). SAMME.R typically converges faster with lower test error.
  - `SAMME`: May be better for datasets with clearer separations between classes.
  - `SAMME.R`: Uses class probabilities; often better for nuanced boundaries.

**Note:** As of scikit-learn v1.6, the `algorithm` parameter is deprecated and will be removed in v1.8. LinearBoostClassifier will only implement the 'SAMME' algorithm in newer versions.
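
When choosing between the two, it can help to see the discrete SAMME update itself; one boosting round, sketched in plain numpy (the textbook algorithm, not LinearBoost's internal code):

```python
import numpy as np

def samme_alpha(err, K):
    # estimator weight: log-odds of the weighted error, plus a
    # multi-class correction term log(K - 1) (zero when K = 2)
    return np.log((1.0 - err) / err) + np.log(K - 1)

w = np.full(6, 1 / 6)                # uniform sample weights
miss = np.array([1, 0, 0, 1, 0, 0])  # 1 where the base estimator errs
err = np.sum(w * miss)               # weighted error = 1/3
alpha = samme_alpha(err, K=2)        # = log(2) in the binary case

w = w * np.exp(alpha * miss)         # boost the misclassified samples
w = w / w.sum()                      # renormalize to a distribution
```

Misclassified samples end up with weight 0.25 each versus 0.125 for correct ones, so the next estimator focuses on past mistakes; SAMME.R replaces this discrete vote with probability-weighted updates.
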

- **`boosting_type`** *(new in v0.1.7)*:
  - `'adaboost'`: Classic AdaBoost (default).
  - `'gradient'`: Gradient boosting on pseudo-residuals; try for highly non-linear or XOR-like data.

- **`class_weight`** *(new in v0.1.7)*: Use `'balanced'` for imbalanced datasets so class weights are adjusted automatically.

- **`scaler`**: The following scaling methods are recommended based on dataset characteristics:
  - `minmax`: Best for datasets where features are on different scales but bounded.

- **`kernel`**:
  - `poly`: For polynomial relationships.
  - `sigmoid`: For sigmoid-like decision boundaries.

- **`kernel_approx`** *(new in v0.1.5)*: For large datasets with non-linear kernels:
  - `None`: Use the full kernel matrix (default; exact, but \(O(n^2)\) memory).
  - `'rff'`: Random Fourier Features (only with `kernel='rbf'`).
  - `'nystrom'`: Nyström approximation (works with any kernel).

- **`subsample`** *(new in v0.1.5)*: Values in (0, 1] control stochastic boosting. Use `0.8` for variance reduction while maintaining speed.

- **`shrinkage`** *(new in v0.1.5)*: Values in (0, 1] scale each estimator's contribution. Use `0.8-0.95` to improve generalization.

- **`early_stopping`** *(new in v0.1.5)*: Set to `True` with `n_iter_no_change=5` and `tol=1e-4` to automatically stop training when validation performance plateaus.

These parameters should serve as a solid starting point for most datasets. For fine-tuning, consider using hyperparameter optimization tools like [Optuna](https://optuna.org/).
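
To illustrate what `'rff'` buys: Random Fourier Features replace the \(O(n^2)\) kernel matrix with an explicit feature map whose inner products approximate the RBF kernel. A self-contained numpy sketch of the Rahimi–Rechtfeld-style construction (an illustration under that assumption, not LinearBoost's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D, gamma = 200, 5, 2000, 0.5   # D random features per sample

X = rng.normal(size=(n, d))

# For k(x, y) = exp(-gamma * ||x - y||^2), sample frequencies from
# N(0, 2*gamma) and random phases, then map x -> sqrt(2/D) cos(Wx + b).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Z @ Z.T approximates the exact kernel matrix using O(n * D) memory
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)
max_err = np.abs(Z @ Z.T - K_exact).max()
```

With `D = 2000` features the entry-wise approximation error stays small while memory grows linearly in the number of samples instead of quadratically.
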
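
The early-stopping rule above (stop once the validation loss fails to improve by at least `tol` for `n_iter_no_change` consecutive rounds) can be sketched as follows; the parameter names come from the docs, while the exact criterion LinearBoost monitors is an assumption here:

```python
def early_stop_round(val_losses, n_iter_no_change=5, tol=1e-4):
    """Return the boosting round at which training would stop."""
    best, since_best = float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best - tol:          # genuine improvement
            best, since_best = loss, 0
        else:                          # plateau (or worse)
            since_best += 1
            if since_best >= n_iter_no_change:
                return i
    return len(val_losses) - 1

# validation loss improves, then plateaus from round 4 onward
losses = [0.9, 0.7, 0.55, 0.50, 0.4999, 0.4999, 0.4999, 0.4999, 0.4999]
stop = early_stop_round(losses)  # stops at round 8
```
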

## Results
All results are reported based on 10-fold cross-validation. The weighted F1 score is reported, i.e., `f1_score(y_valid, y_pred, average='weighted')`.
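
For clarity, the weighted F1 averages per-class F1 scores using class support as the weights, which is what `average='weighted'` computes. A hand computation on a small hypothetical prediction vector:

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 2])

f1s, supports = [], []
for c in np.unique(y_true):
    tp = np.sum((y_pred == c) & (y_true == c))
    fp = np.sum((y_pred == c) & (y_true != c))
    fn = np.sum((y_pred != c) & (y_true == c))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    f1s.append(f1)
    supports.append(np.sum(y_true == c))   # class frequency = its weight

weighted_f1 = np.average(f1s, weights=supports)  # -> 0.8
```
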
*Discusses how LinearBoost outperforms traditional boosting frameworks in terms of speed while maintaining accuracy.*

## Future Developments

These features are not yet supported in the current version, but are planned:
- Supporting categorical variables natively
- Adding regression support (`LinearBoostRegressor`)
- Multi-output classification

## Reference Paper

The paper is written by Hamidreza Keshavarz (Independent Researcher based in Berlin, Germany) and Reza Rawassizadeh (Department of Computer Science, Metropolitan College, Boston University, United States). It will be available soon.

## License
This project is licensed under the terms of the MIT license. See [LICENSE](https://github.com/LinearBoost/linearboost-classifier/blob/main/LICENSE) for additional details.