Commit cbfafac

Merge pull request #20 from LinearBoost/v0.1.7
V0.1.7
2 parents: 5a227ec + 88d090e

5 files changed: 836 additions, 149 deletions

README.md

Lines changed: 62 additions & 20 deletions
````diff
@@ -1,6 +1,6 @@
 # LinearBoost Classifier
 
-![Lastest Release](https://img.shields.io/badge/release-v0.1.6-green)
+![Latest Release](https://img.shields.io/badge/release-v0.1.7-green)
 [![PyPI Version](https://img.shields.io/pypi/v/linearboost)](https://pypi.org/project/linearboost/)
 ![Python Versions](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)
 [![PyPI Downloads](https://static.pepy.tech/badge/linearboost)](https://pepy.tech/projects/linearboost)
````
````diff
@@ -32,9 +32,45 @@ Key Features:
 
 ---
 
-## 🚀 New in Version 0.1.6
+## 🚀 New in Version 0.1.7
 
-The latest release introduces major architectural improvements designed for **scalability**, **robustness on imbalanced data**, and **training speed**.
+### Gradient Boosting Mode
+
+LinearBoost now supports **gradient boosting** in addition to AdaBoost via the `boosting_type` parameter:
+
+- **`boosting_type='adaboost'`** (default): Classic AdaBoost (SAMME or SAMME.R) that reweights samples by classification error.
+- **`boosting_type='gradient'`**: Fits each base estimator to pseudo-residuals (negative gradient of log-loss). Often better for highly non-linear or XOR-like patterns and smoother decision boundaries.
+
+```python
+# Gradient boosting for complex non-linear patterns
+clf = LinearBoostClassifier(
+    boosting_type='gradient',
+    n_estimators=200,
+    kernel='rbf'
+)
+```
+
+### Class Weighting & Custom Loss
+
+- **`class_weight`**: Use `'balanced'` or a dict of class weights for imbalanced data. Weights are applied in the boosting loop.
+- **`loss_function`**: Optional callable `(y_true, y_pred, sample_weight) -> float` for custom optimization objectives.
+
+```python
+clf = LinearBoostClassifier(
+    class_weight='balanced',  # Adjust for imbalanced classes
+    n_estimators=200
+)
+```
+
+### Default Algorithm
+
+The default **`algorithm`** is now **`'SAMME.R'`** for faster convergence and typically lower test error with fewer iterations (when using `boosting_type='adaboost'`).
+
+---
+
+## 🚀 New in Version 0.1.5
+
+Version 0.1.5 introduced major architectural improvements designed for **scalability**, **robustness on imbalanced data**, and **training speed**.
 
 ### ⚡ Scalable Kernel Approximation
 
````
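For context on the gradient-boosting hunk above: it fits each base estimator to pseudo-residuals, the negative gradient of the log-loss. For binary labels y ∈ {0, 1} with margin F(x) and p = sigmoid(F(x)), that gradient works out to y − p. A stdlib-only sketch of the standard formula (an illustration, not LinearBoost's internal code):

```python
import math

def sigmoid(f: float) -> float:
    """Logistic link: maps a raw boosting margin to a probability."""
    return 1.0 / (1.0 + math.exp(-f))

def pseudo_residuals(y_true, margins):
    """Negative gradient of log-loss w.r.t. the margin: y - sigmoid(F).

    In gradient-boosting mode, each new base estimator is fit to these
    residuals rather than to reweighted samples as in AdaBoost.
    """
    return [y - sigmoid(f) for y, f in zip(y_true, margins)]

# With an untrained model (all margins 0, so p = 0.5), positives get
# residual +0.5 and negatives -0.5 — each new estimator is pushed to
# raise the margin on positives and lower it on negatives.
residuals = pseudo_residuals([1, 0], [0.0, 0.0])
```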
````diff
@@ -157,8 +193,7 @@ Version 0.1.2 of **LinearBoost Classifier** is released. Here are the changes:
 - Improved Scikit-learn compatibility.
 
 
-Get Started and Documentation
------------------------------
+## Get Started and Documentation
 
 The documentation is available at https://linearboost.readthedocs.io/.
 
````
````diff
@@ -172,13 +207,20 @@ The following parameters yielded optimal results during testing. All results are
 - **`learning_rate`**:
   Values between 0.01 and 1 typically perform well. Adjust based on the dataset's complexity and noise.
 
-- **`algorithm`**:
-  Use either `SAMME` or `SAMME.R`. The choice depends on the specific problem:
+- **`algorithm`** (when `boosting_type='adaboost'`):
+  Use either `SAMME` or `SAMME.R` (default). SAMME.R typically converges faster with lower test error.
   - `SAMME`: May be better for datasets with clearer separations between classes.
-  - `SAMME.R`: Can handle more nuanced class probabilities.
+  - `SAMME.R`: Uses class probabilities; often better for nuanced boundaries.
 
   **Note:** As of scikit-learn v1.6, the `algorithm` parameter is deprecated and will be removed in v1.8. LinearBoostClassifier will only implement the 'SAMME' algorithm in newer versions.
 
+- **`boosting_type`** *(new in v0.1.7)*:
+  - `'adaboost'`: Classic AdaBoost (default).
+  - `'gradient'`: Gradient boosting on pseudo-residuals; try for highly non-linear or XOR-like data.
+
+- **`class_weight`** *(new in v0.1.7)*:
+  Use `'balanced'` for imbalanced datasets so class weights are adjusted automatically.
+
 - **`scaler`**:
   The following scaling methods are recommended based on dataset characteristics:
   - `minmax`: Best for datasets where features are on different scales but bounded.
````
````diff
@@ -193,25 +235,24 @@ The following parameters yielded optimal results during testing. All results are
   - `poly`: For polynomial relationships.
   - `sigmoid`: For sigmoid-like decision boundaries.
 
-- **`kernel_approx`** *(new in v0.1.6)*:
+- **`kernel_approx`** *(new in v0.1.5)*:
   For large datasets with non-linear kernels:
   - `None`: Use full kernel matrix (default, exact but \(O(n^2)\) memory).
   - `'rff'`: Random Fourier Features (only with `kernel='rbf'`).
   - `'nystrom'`: Nyström approximation (works with any kernel).
 
-- **`subsample`** *(new in v0.1.6)*:
+- **`subsample`** *(new in v0.1.5)*:
   Values in (0, 1] control stochastic boosting. Use `0.8` for variance reduction while maintaining speed.
 
-- **`shrinkage`** *(new in v0.1.6)*:
+- **`shrinkage`** *(new in v0.1.5)*:
   Values in (0, 1] scale each estimator's contribution. Use `0.8-0.95` to improve generalization.
 
-- **`early_stopping`** *(new in v0.1.6)*:
+- **`early_stopping`** *(new in v0.1.5)*:
   Set to `True` with `n_iter_no_change=5` and `tol=1e-4` to automatically stop training when validation performance plateaus.
 
 These parameters should serve as a solid starting point for most datasets. For fine-tuning, consider using hyperparameter optimization tools like [Optuna](https://optuna.org/).
 
-Results
--------
+## Results
 
 All of the results are reported based on 10-fold Cross-Validation. The weighted F1 score is reported, i.e. f1_score(y_valid, y_pred, average = 'weighted').
 
````
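The starting values recommended in the parameter list above can be collected into a single configuration sketch. Parameter names are taken from the README text in this diff; the values are only the suggested starting points, not tuned settings:

```python
# Starting configuration assembled from the README's recommendations.
# Treat every value here as a first guess to refine with Optuna, not a
# tuned result.
recommended = {
    "n_estimators": 200,
    "boosting_type": "adaboost",   # or "gradient" for XOR-like data
    "algorithm": "SAMME.R",        # default when boosting_type="adaboost"
    "class_weight": "balanced",    # for imbalanced datasets
    "scaler": "minmax",            # pick per dataset characteristics
    "kernel": "rbf",
    "kernel_approx": "nystrom",    # scalable alternative to the exact kernel
    "subsample": 0.8,              # stochastic boosting, variance reduction
    "shrinkage": 0.9,              # within the suggested 0.8-0.95 range
    "early_stopping": True,
    "n_iter_no_change": 5,
    "tol": 1e-4,
}

# Usage, assuming the package is installed:
# from linearboost import LinearBoostClassifier
# clf = LinearBoostClassifier(**recommended)
```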
````diff
@@ -337,6 +378,8 @@ params = {
     'algorithm': trial.suggest_categorical('algorithm', ['SAMME', 'SAMME.R']),
     'scaler': trial.suggest_categorical('scaler', ['minmax', 'robust', 'quantile-uniform', 'quantile-normal']),
     'kernel': trial.suggest_categorical('kernel', ['linear', 'rbf', 'poly']),
+    'boosting_type': trial.suggest_categorical('boosting_type', ['adaboost', 'gradient']),
+    'class_weight': trial.suggest_categorical('class_weight', [None, 'balanced']),
     'subsample': trial.suggest_float('subsample', 0.6, 1.0),
     'shrinkage': trial.suggest_float('shrinkage', 0.7, 1.0),
     'early_stopping': True,
````
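One of the two new search dimensions above is `class_weight`, with `'balanced'` as an option. Assuming LinearBoost follows the usual scikit-learn convention for `'balanced'` weighting (the README only says weights are "adjusted automatically"), each class weight is `n_samples / (n_classes * count(c))`; a stdlib-only illustration of that formula, not the library's own code:

```python
from collections import Counter

def balanced_class_weights(y):
    """scikit-learn-style 'balanced' heuristic:
    weight(c) = n_samples / (n_classes * count(c)).

    Rare classes receive proportionally larger weights, so boosting
    does not ignore the minority class on imbalanced data.
    """
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# 8 negatives vs 2 positives: the minority class is weighted 4x heavier.
y = [0] * 8 + [1] * 2
weights = balanced_class_weights(y)
# weights == {0: 0.625, 1: 2.5}
```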
````diff
@@ -353,19 +396,18 @@ LinearBoost's combination of **runtime efficiency** and **high accuracy** makes
 *Discusses how LinearBoost outperforms traditional boosting frameworks in terms of speed while maintaining accuracy.*
 
 
-Future Developments
------------------------------
+## Future Developments
+
 These are not yet supported in this current version, but are in the future plans:
 - Supporting categorical variables natively
 - Adding regression support (`LinearBoostRegressor`)
 - Multi-output classification
 
-Reference Paper
------------------------------
+## Reference Paper
+
 The paper is written by Hamidreza Keshavarz (Independent Researcher based in Berlin, Germany) and Reza Rawassizadeh (Department of Computer Science, Metropolitan college, Boston University, United States). It will be available soon.
 
-License
--------
+## License
 
 This project is licensed under the terms of the MIT license. See [LICENSE](https://github.com/LinearBoost/linearboost-classifier/blob/main/LICENSE) for additional details.
````

pyproject.toml

Lines changed: 0 additions & 1 deletion
````diff
@@ -11,7 +11,6 @@ authors = [
 ]
 description = "LinearBoost Classifier is a rapid and accurate classification algorithm that builds upon a very fast, linear classifier."
 readme = "README.md"
-readme-content-type = "text/markdown"
 keywords = [
     "classification", "classifier", "linear", "adaboost", "boosting", "boost"
 ]
````
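A note on this deletion: `readme-content-type` is not a field defined by PEP 621 `[project]` metadata, which is presumably why it was dropped. Under PEP 621, the content type is either inferred from the file suffix or given explicitly via the table form of `readme`; a sketch:

```toml
[project]
# content type is inferred from the .md suffix
readme = "README.md"

# or, to set it explicitly, use the table form instead:
# readme = { file = "README.md", content-type = "text/markdown" }
```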

src/linearboost/__init__.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-__version__ = "0.1.6"
+__version__ = "0.1.7"
 
 from .linear_boost import LinearBoostClassifier
 from .sefr import SEFR
````
