[ENH] Add TextFeatures transformer for text feature extraction #880
ankitlade12 wants to merge 28 commits into feature-engine:main
ankitlade12 commented on Jan 8, 2026
- Add TextFeatures class to extract features from text columns
- Support for features: char_count, word_count, digit_count, uppercase_count, etc.
- Add comprehensive tests with pytest parametrize
- Add user guide documentation
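The bullets above list the kinds of counts TextFeatures extracts. Below is a minimal sketch of how such counts could be computed with vectorized pandas string methods instead of row-wise `apply`. The function name `extract_text_features`, the `<variable>_<feature>` output naming, and the exact regex behind each count are hypothetical choices for illustration. They are not the TextFeatures API from this PR.

```python
import pandas as pd


def extract_text_features(df: pd.DataFrame, variables: list) -> pd.DataFrame:
    """Hypothetical helper: add simple count features for each text column.

    Not the TextFeatures API; output names follow an assumed
    '<variable>_<feature>' pattern.
    """
    out = df.copy()
    for var in variables:
        # Treat missing values as empty strings so every count is defined.
        text = out[var].fillna("").astype(str)
        # Total number of characters, including spaces and punctuation.
        out[f"{var}_char_count"] = text.str.len()
        # Number of whitespace-separated tokens.
        out[f"{var}_word_count"] = text.str.split().str.len()
        # Number of ASCII digit characters 0-9.
        out[f"{var}_digit_count"] = text.str.count(r"[0-9]")
        # Number of ASCII uppercase letters A-Z.
        out[f"{var}_uppercase_count"] = text.str.count(r"[A-Z]")
    return out


df = pd.DataFrame({"review": ["Great product 10/10", "BAD", None]})
features = extract_text_features(df, ["review"])
```

Each `.str` method runs over the whole column at once, which is usually simpler and faster than calling a Python function per row.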
solegalli left a comment
Hi @ankitlade12
Thanks a lot!
This transformer, function-wise, I'd say it's ready. I made a few suggestions regarding how to optimize the feature creation functions. Let me know if they make sense.
Other than that, we need the various docs files and we'll be good to go :)
Thanks again!
- Add ArcSinhTransformer class with loc and scale parameters
- Support for positive and negative values (unlike LogTransformer)
- Includes inverse_transform method
- Add comprehensive tests with pytest parametrize
- Add user guide documentation
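For reference, here is a sketch of an inverse hyperbolic sine transform with location and scale. It assumes the form y = arcsinh((x - loc) / scale) with inverse x = sinh(y) * scale + loc. The commit message does not state the exact formula, so treat this parameterization and the function names as assumptions rather than the ArcSinhTransformer implementation. Unlike a log, arcsinh is defined for zero and negative inputs.

```python
import numpy as np


def arcsinh_transform(x, loc=0.0, scale=1.0):
    """Hypothetical forward transform: arcsinh((x - loc) / scale)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return np.arcsinh((np.asarray(x, dtype=float) - loc) / scale)


def arcsinh_inverse(y, loc=0.0, scale=1.0):
    """Hypothetical inverse of arcsinh_transform: sinh(y) * scale + loc."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return np.sinh(np.asarray(y, dtype=float)) * scale + loc


# Negative, zero and positive values are all valid inputs.
x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
y = arcsinh_transform(x, loc=0.0, scale=2.0)
x_back = arcsinh_inverse(y, loc=0.0, scale=2.0)
```

Because sinh is the exact inverse of arcsinh, the round trip recovers the original values up to floating-point error.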
…ide with comparison and references
We need to rebase on main so that the 2 remaining tests pass.
solegalli left a comment
Hi @ankitlade12
I am very sorry for the delayed review. I am travelling till end of April, so I am a bit slower than usual.
I think, for the first version of the transformer, we should require the user to pass the names of the text variables. They can pass one or more variable names if there is more than one text column.
Other than that, we need to add the transformer to the docs/index file, the README, and docs/api, and adjust the tests and the demo to the new functionality. Then it is good to merge.
Thank you very much for this great addition.
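The review asks that users always pass the text variable names explicitly, either one name or several. A minimal sketch of the input check this implies is below: normalize a single name to a list and fail early when nothing is passed, a name is missing, or a column is not text-typed. The function name `check_text_variables` and the rule "object or pandas string dtype counts as text" are hypothetical and do not reproduce Feature-engine's own validation utilities.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def check_text_variables(X: pd.DataFrame, variables) -> list:
    """Hypothetical validator: normalize `variables` to a list and verify it.

    Raises if no variable is given, a name is missing from X, or a column
    is not object or pandas string dtype.
    """
    # Allow a single column name as a convenience.
    if isinstance(variables, str):
        variables = [variables]
    if variables is None or len(variables) == 0:
        raise ValueError("Pass at least one text variable name.")
    variables = list(variables)
    missing = [v for v in variables if v not in X.columns]
    if missing:
        raise KeyError(f"Variables not found in the DataFrame: {missing}")
    # Checking the dtype (not the values) accepts object and string dtypes.
    not_text = [v for v in variables if not is_string_dtype(X[v].dtype)]
    if not_text:
        raise TypeError(f"Variables are not text columns: {not_text}")
    return variables


df = pd.DataFrame({"review": ["good", "bad"], "title": ["A", "B"], "price": [1, 2]})
single = check_text_variables(df, "review")
several = check_text_variables(df, ["review", "title"])
```

Requiring explicit names avoids guessing which object-dtype columns hold free text, and that guess is exactly the kind of behavior that can shift between pandas versions.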
Force-pushed from ba25768 to ba69e23
Hey @solegalli, I tracked down the cause of the CI failures. They come from breaking changes in pandas 2.2/3.0 in the CI environment (specifically, datetime string formatting and select_dtypes behavior). Because these break the entire library (280 failures), they need to be fixed in the main branch first. My
Force-pushed from ba69e23 to 41f9528
Hi @ankitlade12, something went wrong here: for some reason, the ArcSinhTransformer files are in this PR. We need to remove all commits starting from Could you please take a look? Thanks a lot!
I made passing the tests optional, because there is a lot to fix on the pandas 3 side. If you go ahead and remove the unrelated files, we can merge this transformer while we work on the maintenance fixes.