Skip to content

Ignore binaries during ScanCode scan#265

Merged
soimkim merged 1 commit intomainfrom
test
Mar 31, 2026
Merged

Ignore binaries during ScanCode scan#265
soimkim merged 1 commit intomainfrom
test

Conversation

@soimkim
Copy link
Copy Markdown
Contributor

@soimkim soimkim commented Mar 31, 2026

Description

Enhanced binary file detection and exclusion during source code scans for improved accuracy

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

The dependency constraint for fosslight_binary was upgraded from >=5.0.0 to >=5.1.22. Additionally, binary file detection and exclusion logic was modified in the scanner to include a new pre-processing step that walks the target path, identifies binary files, and adds them to the exclusion list before scanning.

Changes

Cohort / File(s) Summary
Dependency Update
pyproject.toml
Upgraded fosslight_binary minimum version constraint from >=5.0.0 to >=5.1.22.
Binary Exclusion Logic
src/fosslight_source/run_scancode.py
Introduced pre-processing step using os.walk() to discover and exclude binary files via check_binary() calls before scanning begins; moved abs_path_to_scan initialization earlier; updated per-file binary detection to use explicit check_binary(..., True) parameter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main objective of the changeset: to ignore binary files during ScanCode scanning, which aligns with the core logic changes in run_scancode.py.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@soimkim soimkim changed the title feat(scancode): ignore binaries during ScanCode Ignore binaries during ScanCode scan Mar 31, 2026
@soimkim soimkim self-assigned this Mar 31, 2026
@soimkim soimkim added the enhancement [PR/Issue] New feature or request label Mar 31, 2026
@soimkim
Copy link
Copy Markdown
Contributor Author

soimkim commented Mar 31, 2026

Ignore

  • mac은 Python 3.14 에서 테스트하도록 하여 아직 FL Binary가 Python 3.14를 지원하지 않아 에러 발생. -> local test로 대체.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Line 36: The dependency line "fosslight_binary>=5.1.22" is unsatisfiable
because 5.1.22 does not exist; change the pyproject dependency to either (a)
gate the requirement with an environment marker like 'fosslight_binary>=5.1.21;
python_version < "3.13"' so only Python <3.13 installs fosslight_binary, (b) pin
to the latest available release 'fosslight_binary==5.1.21' combined with an
environment marker to avoid breaking advertised Python 3.13/3.14 support, or (c)
update the project's python_requires to <3.13 to match the dependency—apply the
chosen fix by editing the dependency string in the pyproject.toml where
"fosslight_binary>=5.1.22" appears.

In `@src/fosslight_source/run_scancode.py`:
- Around line 159-170: The os.walk over path_to_scan still traverses directories
that were previously marked excluded, causing unnecessary check_binary() calls;
modify the loop in run_scancode.py where for root, _, files in
os.walk(path_to_scan): by skipping any root that falls under the existing
excluded paths (path_to_exclude) before iterating files — e.g., normalize root
and compare against abs_path_to_scan + each path_to_exclude entry (or maintain a
set of excluded absolute prefixes) and continue when
root.startswith(excluded_prefix); keep the rest of the logic (check_binary,
building rel_path/rel_norm, appending to excluded_files, logger.debug) unchanged
so binary checks are only done for non-excluded directories.
- Around line 159-169: The parameter excluded_files currently uses a mutable
default (excluded_files: list = [])—change the function signature to use
excluded_files: Optional[list] = None (or excluded_files=None) and inside the
function initialize it with excluded_files = [] if excluded_files is None before
the os.walk loop; ensure all existing append usages (e.g., where rel_norm is
appended to excluded_files in the loop) operate on this freshly allocated list
so callers get a new list each invocation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6c9d1424-f927-4027-a622-ce473f8d3f8f

📥 Commits

Reviewing files that changed from the base of the PR and between 0e2bbb2 and 2042af6.

📒 Files selected for processing (2)
  • pyproject.toml
  • src/fosslight_source/run_scancode.py

Comment on lines +159 to +170
for root, _, files in os.walk(path_to_scan):
for name in files:
full_path = os.path.join(root, name)
try:
if not check_binary(full_path, True):
continue
except Exception:
continue
rel_path = os.path.relpath(full_path, abs_path_to_scan)
rel_norm = os.path.normpath(rel_path).replace("\\", "/")
excluded_files.append(rel_norm)
logger.debug(f"Excluded binary from scancode: {rel_norm}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Skip path_to_exclude during the binary pre-pass.

The new os.walk(path_to_scan) still descends into trees that were already excluded above, so large ignored directories still pay the full check_binary() cost. On repos that exclude build/vendor output, this can easily become the dominant runtime of the scan.

🧰 Tools
🪛 Ruff (0.15.7)

[error] 165-166: try-except-continue detected, consider logging the exception

(S112)


[warning] 165-165: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fosslight_source/run_scancode.py` around lines 159 - 170, The os.walk
over path_to_scan still traverses directories that were previously marked
excluded, causing unnecessary check_binary() calls; modify the loop in
run_scancode.py where for root, _, files in os.walk(path_to_scan): by skipping
any root that falls under the existing excluded paths (path_to_exclude) before
iterating files — e.g., normalize root and compare against abs_path_to_scan +
each path_to_exclude entry (or maintain a set of excluded absolute prefixes) and
continue when root.startswith(excluded_prefix); keep the rest of the logic
(check_binary, building rel_path/rel_norm, appending to excluded_files,
logger.debug) unchanged so binary checks are only done for non-excluded
directories.

Comment on lines +159 to +169
for root, _, files in os.walk(path_to_scan):
for name in files:
full_path = os.path.join(root, name)
try:
if not check_binary(full_path, True):
continue
except Exception:
continue
rel_path = os.path.relpath(full_path, abs_path_to_scan)
rel_norm = os.path.normpath(rel_path).replace("\\", "/")
excluded_files.append(rel_norm)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

head -70 src/fosslight_source/run_scancode.py | tail -20

Repository: fosslight/fosslight_source_scanner

Length of output: 832


🏁 Script executed:

sed -n '50,85p' src/fosslight_source/run_scancode.py

Repository: fosslight/fosslight_source_scanner

Length of output: 1436


🏁 Script executed:

sed -n '159,175p' src/fosslight_source/run_scancode.py

Repository: fosslight/fosslight_source_scanner

Length of output: 977


🏁 Script executed:

sed -n '1,30p' src/fosslight_source/run_scancode.py

Repository: fosslight/fosslight_source_scanner

Length of output: 1022


🏁 Script executed:

rg "run_scan\(" -A 2 -B 2 --type py

Repository: fosslight/fosslight_source_scanner

Length of output: 4170


🏁 Script executed:

fd -e py -type f | xargs grep -l "run_scan" | head -10

Repository: fosslight/fosslight_source_scanner

Length of output: 248


🏁 Script executed:

rg "run_scan\(" src/ tests/ -A 3 --type py | grep -A 3 "run_scan("

Repository: fosslight/fosslight_source_scanner

Length of output: 3763


🏁 Script executed:

sed -n '1,300p' src/fosslight_source/cli.py | grep -A 10 "run_scan"

Repository: fosslight/fosslight_source_scanner

Length of output: 1101


🏁 Script executed:

grep -n "excluded_files" src/fosslight_source/run_scancode.py

Repository: fosslight/fosslight_source_scanner

Length of output: 344


🏁 Script executed:

sed -n '100,160p' src/fosslight_source/cli.py | grep -A 15 "run_scan"

Repository: fosslight/fosslight_source_scanner

Length of output: 769


🏁 Script executed:

sed -n '135,155p' src/fosslight_source/cli.py

Repository: fosslight/fosslight_source_scanner

Length of output: 940


🏁 Script executed:

grep -n "def run_scanners" src/fosslight_source/*.py

Repository: fosslight/fosslight_source_scanner

Length of output: 127


🏁 Script executed:

sed -n '90,150p' src/fosslight_source/cli.py

Repository: fosslight/fosslight_source_scanner

Length of output: 2356


🏁 Script executed:

sed -n '389,500p' src/fosslight_source/cli.py

Repository: fosslight/fosslight_source_scanner

Length of output: 5896


🏁 Script executed:

sed -n '389,600p' src/fosslight_source/cli.py | head -100

Repository: fosslight/fosslight_source_scanner

Length of output: 5506


Fix mutable default argument pattern for excluded_files.

The function signature defines excluded_files: list = [] as a default parameter. While all current callers explicitly provide this argument, the mutable default pattern is a Python anti-pattern that could cause issues if calling conventions change. Replace the default with None and allocate a fresh list per invocation before appending (line 169).

🧰 Tools
🪛 Ruff (0.15.7)

[error] 165-166: try-except-continue detected, consider logging the exception

(S112)


[warning] 165-165: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fosslight_source/run_scancode.py` around lines 159 - 169, The parameter
excluded_files currently uses a mutable default (excluded_files: list =
[])—change the function signature to use excluded_files: Optional[list] = None
(or excluded_files=None) and inside the function initialize it with
excluded_files = [] if excluded_files is None before the os.walk loop; ensure
all existing append usages (e.g., where rel_norm is appended to excluded_files
in the loop) operate on this freshly allocated list so callers get a new list
each invocation.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

Caution

Review failed

An error occurred during the review process. Please try again later.

📝 Walkthrough

Walkthrough

The dependency constraint for fosslight_binary was upgraded from >=5.0.0 to >=5.1.22. Additionally, binary file detection and exclusion logic was modified in the scanner to include a new pre-processing step that walks the target path, identifies binary files, and adds them to the exclusion list before scanning.

Changes

Cohort / File(s) Summary
Dependency Update
pyproject.toml
Upgraded fosslight_binary minimum version constraint from >=5.0.0 to >=5.1.22.
Binary Exclusion Logic
src/fosslight_source/run_scancode.py
Introduced pre-processing step using os.walk() to discover and exclude binary files via check_binary() calls before scanning begins; moved abs_path_to_scan initialization earlier; updated per-file binary detection to use explicit check_binary(..., True) parameter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main objective of the changeset: to ignore binary files during ScanCode scanning, which aligns with the core logic changes in run_scancode.py.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@soimkim soimkim merged commit 02adf69 into main Mar 31, 2026
6 of 7 checks passed
@soimkim soimkim deleted the test branch March 31, 2026 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement [PR/Issue] New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant