2 changes: 1 addition & 1 deletion pyproject.toml
@@ -33,7 +33,7 @@ dependencies = [
"PyYAML",
"wheel>=0.38.1",
"intbitset",
"fosslight_binary>=5.0.0",
"fosslight_binary>=5.1.22",
Comment thread
soimkim marked this conversation as resolved.
"scancode-toolkit>=32.0.2",
"fingerprints==1.2.3",
"normality==2.6.1",
17 changes: 15 additions & 2 deletions src/fosslight_source/run_scancode.py
@@ -113,8 +113,8 @@ def run_scan(
pretty_params["path_to_exclude"] = path_to_exclude
pretty_params["output_file"] = output_file_name
total_files_to_excluded = []
abs_path_to_scan = os.path.abspath(path_to_scan)
if path_to_exclude:
abs_path_to_scan = os.path.abspath(path_to_scan)
for path in path_to_exclude:
if os.path.isabs(path):
exclude_path = os.path.relpath(path, abs_path_to_scan)
@@ -156,6 +156,19 @@ def run_scan(
else:
total_files_to_excluded.append(exclude_path_normalized)

for root, _, files in os.walk(path_to_scan):
for name in files:
full_path = os.path.join(root, name)
try:
if not check_binary(full_path, True):
continue
except Exception:
continue
rel_path = os.path.relpath(full_path, abs_path_to_scan)
rel_norm = os.path.normpath(rel_path).replace("\\", "/")
excluded_files.append(rel_norm)
Comment on lines +159 to +169

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

head -70 src/fosslight_source/run_scancode.py | tail -20

Repository: fosslight/fosslight_source_scanner

Length of output: 832


🏁 Script executed:

sed -n '50,85p' src/fosslight_source/run_scancode.py

Repository: fosslight/fosslight_source_scanner

Length of output: 1436


🏁 Script executed:

sed -n '159,175p' src/fosslight_source/run_scancode.py

Repository: fosslight/fosslight_source_scanner

Length of output: 977


🏁 Script executed:

sed -n '1,30p' src/fosslight_source/run_scancode.py

Repository: fosslight/fosslight_source_scanner

Length of output: 1022


🏁 Script executed:

rg "run_scan\(" -A 2 -B 2 --type py

Repository: fosslight/fosslight_source_scanner

Length of output: 4170


🏁 Script executed:

fd -e py -type f | xargs grep -l "run_scan" | head -10

Repository: fosslight/fosslight_source_scanner

Length of output: 248


🏁 Script executed:

rg "run_scan\(" src/ tests/ -A 3 --type py | grep -A 3 "run_scan("

Repository: fosslight/fosslight_source_scanner

Length of output: 3763


🏁 Script executed:

sed -n '1,300p' src/fosslight_source/cli.py | grep -A 10 "run_scan"

Repository: fosslight/fosslight_source_scanner

Length of output: 1101


🏁 Script executed:

grep -n "excluded_files" src/fosslight_source/run_scancode.py

Repository: fosslight/fosslight_source_scanner

Length of output: 344


🏁 Script executed:

sed -n '100,160p' src/fosslight_source/cli.py | grep -A 15 "run_scan"

Repository: fosslight/fosslight_source_scanner

Length of output: 769


🏁 Script executed:

sed -n '135,155p' src/fosslight_source/cli.py

Repository: fosslight/fosslight_source_scanner

Length of output: 940


🏁 Script executed:

grep -n "def run_scanners" src/fosslight_source/*.py

Repository: fosslight/fosslight_source_scanner

Length of output: 127


🏁 Script executed:

sed -n '90,150p' src/fosslight_source/cli.py

Repository: fosslight/fosslight_source_scanner

Length of output: 2356


🏁 Script executed:

sed -n '389,500p' src/fosslight_source/cli.py

Repository: fosslight/fosslight_source_scanner

Length of output: 5896


🏁 Script executed:

sed -n '389,600p' src/fosslight_source/cli.py | head -100

Repository: fosslight/fosslight_source_scanner

Length of output: 5506


Fix mutable default argument pattern for excluded_files.

The function signature defines excluded_files: list = [] as a default parameter. All current callers pass this argument explicitly, but a mutable default is a Python anti-pattern: the default list is created once, when the function is defined, and is shared by every call that omits the argument, so entries appended in one call would leak into the next. Replace the default with None and allocate a fresh list per invocation before appending (line 169).
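
A minimal sketch of the None-default pattern described above. The helper name `collect_excluded` and its `new_items` parameter are hypothetical stand-ins for the part of `run_scan()` that appends to `excluded_files`; this is not the project's actual signature.

```python
from typing import List, Optional


def collect_excluded(
    new_items: List[str],
    excluded_files: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical stand-in for the code in run_scan() that fills excluded_files."""
    # Allocate a fresh list per call when the caller passes nothing,
    # instead of sharing one default list across every call.
    if excluded_files is None:
        excluded_files = []
    excluded_files.extend(new_items)
    return excluded_files
```

With this shape, two calls that omit `excluded_files` no longer see each other's entries, while a list passed in by the caller is still updated in place.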

🧰 Tools
🪛 Ruff (0.15.7)

[error] 165-166: try-except-continue detected, consider logging the exception

(S112)


[warning] 165-165: Do not catch blind exception: Exception

(BLE001)
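
One way to address both Ruff findings is to catch a narrower exception and log it before skipping the file. The sketch below is illustrative only: `find_binaries` and its `is_binary` callback are hypothetical stand-ins for the loop around `check_binary()`, and treating `OSError` as the failure type is an assumption, since the exceptions `check_binary()` can raise are not shown in this diff.

```python
import logging
import os
from typing import Callable, List

logger = logging.getLogger(__name__)


def find_binaries(scan_root: str, is_binary: Callable[[str], bool]) -> List[str]:
    """Return '/'-separated paths, relative to scan_root, of files flagged as binary."""
    found = []
    for root, _, files in os.walk(scan_root):
        for name in files:
            full_path = os.path.join(root, name)
            try:
                if not is_binary(full_path):
                    continue
            except OSError as ex:  # assumed failure type, narrower than Exception
                # Log instead of silently continuing, so skipped files are traceable.
                logger.debug("Skipping binary check for %s: %s", full_path, ex)
                continue
            rel_path = os.path.relpath(full_path, scan_root)
            found.append(os.path.normpath(rel_path).replace("\\", "/"))
    return found
```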

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fosslight_source/run_scancode.py` around lines 159 - 169, the parameter
excluded_files currently uses a mutable default (excluded_files: list = []).
Change the function signature to excluded_files: Optional[list] = None (or
excluded_files=None) and, before the os.walk loop, initialize it with
`if excluded_files is None: excluded_files = []`. Existing append usages (e.g.,
where rel_norm is appended to excluded_files in the loop) then operate on a
list allocated per invocation whenever the caller passes nothing, while a list
passed in by the caller is still updated in place.

logger.debug(f"Excluded binary from scancode: {rel_norm}")
Comment on lines +159 to +170

⚠️ Potential issue | 🟠 Major

Skip path_to_exclude during the binary pre-pass.

The new os.walk(path_to_scan) still descends into trees that were already excluded above, so large ignored directories still pay the full check_binary() cost. On repos that exclude build/vendor output, this can easily become the dominant runtime of the scan.
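
A sketch of one way to keep the pre-pass out of excluded trees: prune the `dirs` list in place so `os.walk` never descends into them. The helper `walk_skipping_excluded` is hypothetical, and it assumes the excluded entries are plain directory paths relative to the scan root; if `path_to_exclude` can hold glob patterns or file paths, the matching would need to be extended.

```python
import os
from typing import Iterable, Iterator


def walk_skipping_excluded(scan_root: str, excluded_dirs: Iterable[str]) -> Iterator[str]:
    """Yield file paths under scan_root without descending into excluded_dirs.

    Assumption: excluded_dirs are directory paths relative to scan_root.
    """
    abs_root = os.path.abspath(scan_root)
    excluded = {os.path.normpath(os.path.join(abs_root, d)) for d in excluded_dirs}
    for root, dirs, files in os.walk(abs_root):
        # In top-down mode, editing dirs in place prunes the walk, so excluded
        # trees are never listed. Whole-path comparison keeps "build2" when
        # only "build" is excluded.
        dirs[:] = [d for d in dirs if os.path.join(root, d) not in excluded]
        for name in files:
            yield os.path.join(root, name)
```

The binary check would then run only on the yielded paths, so excluded build or vendor directories cost nothing beyond the set lookup.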

🧰 Tools
🪛 Ruff (0.15.7)

[error] 165-166: try-except-continue detected, consider logging the exception

(S112)


[warning] 165-165: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fosslight_source/run_scancode.py` around lines 159 - 170, the os.walk
over path_to_scan still traverses directories that were already excluded via
path_to_exclude, causing unnecessary check_binary() calls. Modify the loop
`for root, _, files in os.walk(path_to_scan):` to skip any root that falls
under an excluded path before iterating files: normalize root and compare it
against the absolute form of each path_to_exclude entry (for example, a set of
excluded absolute paths built from abs_path_to_scan), matching on whole path
components rather than a raw root.startswith(excluded_prefix), which would also
match sibling names such as build2 for build. Keep the rest of the logic
(check_binary, building rel_path/rel_norm, appending to excluded_files,
logger.debug) unchanged so binary checks run only for non-excluded
directories.


if excluded_files:
total_files_to_excluded.extend(f"**/{file_path}" for file_path in excluded_files)

@@ -207,7 +220,7 @@ def run_scan(
for scan_item in result_list:
if os.path.isdir(scan_item.source_name_or_path):
continue
if check_binary(os.path.join(path_to_scan, scan_item.source_name_or_path)):
if check_binary(os.path.join(path_to_scan, scan_item.source_name_or_path), True):
scan_item.exclude = True
except Exception as ex:
success = False