Processing

Turn collected logs into CSV summaries.

Processors

Outputs

  • summary.csv is written to 00_DATA/00_PROCESSED/RUN_FOLDER/
  • Individual per-file CSVs are also generated alongside summary.csv

CSV schemas

  • v4/log.py: columns = [IP Address, Access Date, Module Viewed, Status Code, Data Saved (GB), Device Used, Browser Used]
  • v5/logv2.py: columns = [IP Address, Access Date, Module Viewed, Status Code, Data Saved (GB), Device Used, Browser Used]
  • v5/castle.py: columns = [IP Address, Access Date, Access Time, Module Viewed, Location Viewed, Status Code, Data Saved (GB), Device Used, Browser Used]
  • v3/dhub.py: columns = [IP Address, Access Date, Module Viewed, Status Code, Data Saved (GB), Device Used, Browser Used]
  • v6/log-v6.py: columns = [IP Address, Access Date, Module Viewed, Status Code, Data Saved (GB), Device Used, Browser Used]
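As a rough illustration of the schemas above, the sketch below writes one row with Python's standard csv module. The column names are copied from the list; the write_rows helper and the sample values are hypothetical and are not taken from the processors themselves.

```python
import csv
import io

# Column names copied from the schema list above (shared by log.py,
# logv2.py, dhub.py and log-v6.py).
BASE_COLUMNS = [
    "IP Address", "Access Date", "Module Viewed", "Status Code",
    "Data Saved (GB)", "Device Used", "Browser Used",
]

# castle.py adds "Access Time" and "Location Viewed".
CASTLE_COLUMNS = [
    "IP Address", "Access Date", "Access Time", "Module Viewed",
    "Location Viewed", "Status Code", "Data Saved (GB)",
    "Device Used", "Browser Used",
]


def write_rows(rows, columns, out):
    """Hypothetical helper: write a header plus rows in the given column order."""
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample row, for illustration only.
sample = {
    "IP Address": "10.0.0.5",
    "Access Date": "2024-03-12",
    "Module Viewed": "khan-academy",
    "Status Code": "200",
    "Data Saved (GB)": "1.0",
    "Device Used": "Android",
    "Browser Used": "Chrome",
}

buffer = io.StringIO()
write_rows([sample], BASE_COLUMNS, buffer)
print(buffer.getvalue())
```

csv.DictWriter raises ValueError when a row contains a key that is not in the column list, which makes schema drift between processors easy to notice.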

Notes & edge cases

  • logv2.py expects each line to be a JSON object with a message field containing a combined-log-like string
  • dhub.py is based on logv2.py but extracts module names from extended D-Hub paths: /uploads/modules/[id]/[module-name], /modules/[id]/[module-name], or /uploads/other-modules/[module-name]
  • log-v6.py is similar to dhub.py but reads logs stored in the /var/log/oc4d folder that match the v6-*.log filename pattern
  • castle.py parses a more structured message; it logs regex and timestamp errors to error_log.txt in the processed folder and normalizes the IPv6 ::ffff: prefix
  • All processors normalize sizes to gigabytes and parse user agents into OS family and browser family
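The notes above can be sketched roughly as follows. This is a minimal illustration, not the processors' actual code, and it makes several assumptions: the message field follows the standard Apache/Nginx combined layout, "normalizing" the ::ffff: prefix means stripping it, and a gigabyte is 1024**3 bytes. All function names, the regexes, and the sample line are hypothetical.

```python
import json
import re

# Assumption: the JSON "message" field uses the standard combined log layout.
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The three D-Hub path shapes listed in the notes above.
MODULE_RES = [
    re.compile(r"^/uploads/modules/[^/]+/(?P<module>[^/?]+)"),
    re.compile(r"^/modules/[^/]+/(?P<module>[^/?]+)"),
    re.compile(r"^/uploads/other-modules/(?P<module>[^/?]+)"),
]


def extract_module(path):
    """Return the module name for a known path shape, else None."""
    for pattern in MODULE_RES:
        match = pattern.match(path)
        if match:
            return match.group("module")
    return None


def normalize_ip(ip):
    """Assumption: drop the IPv4-mapped prefix, ::ffff:10.0.0.5 -> 10.0.0.5."""
    prefix = "::ffff:"
    return ip[len(prefix):] if ip.lower().startswith(prefix) else ip


def bytes_to_gb(size):
    """Assumption: binary gigabytes; '-' means no response body was sent."""
    return 0.0 if size == "-" else int(size) / 1024**3


def parse_line(line):
    """Parse one JSON log line; return a dict, or None if it does not match."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    message = record.get("message") if isinstance(record, dict) else None
    if not isinstance(message, str):
        return None
    match = COMBINED_RE.match(message)
    if match is None:
        return None
    return {
        "ip": normalize_ip(match.group("ip")),
        "module": extract_module(match.group("path")),
        "status": match.group("status"),
        "gb": bytes_to_gb(match.group("size")),
        "agent": match.group("agent"),
    }


# Made-up sample line, for illustration only.
sample_line = json.dumps({
    "message": '::ffff:10.0.0.5 - - [12/Mar/2024:08:15:42 +0000] '
               '"GET /uploads/modules/42/khan-academy/index.html HTTP/1.1" '
               '200 1073741824 "-" "Mozilla/5.0"'
})
print(parse_line(sample_line))
```

Returning None for lines that are not valid JSON or do not match the pattern lets the caller decide whether to skip them or record them in an error log, as castle.py is described as doing. User-agent parsing is left out here because it depends on whichever library the processors use.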

Usage