Updated cpp #26

Open
ksherlock wants to merge 7 commits into drh:master from ksherlock:updated_cpp

@ksherlock

This patch backports some updates from the Plan 9 cpp to the lcc cpp.

  • variadic macros
  • fix to prevent ## concatenation when it's not appropriate
  • #warning directive
  • long character constants (L'\x80' won't warn that it's > 127)
  • \a support in strings
  • 'elsif' typo in some errors
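
For readers unfamiliar with the first two items, here is a minimal sketch of what a C99 variadic macro and the `##` paste operator look like in use. It is not taken from the patch: `format_msg`, `FORMAT_MSG`, `COUNTER`, and `counter_rx` are hypothetical names chosen for illustration, and the sketch does not try to reproduce the specific `##` case the patch fixes.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: formats into buf and returns the length written. */
int format_msg(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && size > 0);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* C99 variadic macro: __VA_ARGS__ stands for everything matched by "...". */
#define FORMAT_MSG(buf, ...) format_msg((buf), sizeof (buf), __VA_ARGS__)

/* ## pastes two tokens into one identifier: COUNTER(rx) names counter_rx. */
#define COUNTER(name) counter_##name

int COUNTER(rx) = 3;
```

With a preprocessor that supports variadic macros, `FORMAT_MSG(buf, "%s=%d", "rx", COUNTER(rx))` expands to an ordinary `format_msg` call with the buffer size filled in. `buf` must be a real array for `sizeof` to give its capacity.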

AJBats added a commit to AJBats/saturncc that referenced this pull request Apr 18, 2026
Three compiler changes that close the remaining gap between our
output and Hitachi SHC's, plus the first byte-identical corpus
match in project history.

1. sh_elim_redundant_ext: extend the sign-extension tracker to
   recognize .w and .b displacement-mode (`@(disp,rN)`) and
   indexed-mode (`@(r0,rN)`) loads as sources. Previously only the
   indirect (`@rN`), GBR-relative, and pool-load forms were
   recognized; .w disp/indexed loads from Gap 0/Gap 2 work were
   leaking redundant exts.w through the pass.

2. sh_fuse_mov_into_add: new peephole. Collapses `mov rA,rB;
   add rB,rC` into `add rA,rC` when rB is dead after the add
   (forward liveness scan, bails at labels/branches). Common
   shape when a .w/.b load's r0 result is copied to a temp and
   then added to an accumulator — the copy is pure waste.

3. sh_elim_dead_labels: new peephole. Deletes label-only lines
   (sh_is_label_line == 1) whose label is never referenced by any
   other line. Token-boundary reference check via strstr with
   alphanumeric/underscore lookahead (so `L1` in `L10` isn't a
   false positive). LCC emits a function-exit label before the
   epilogue for multi-return functions; single-return functions
   leave it as dead code.
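
The token-boundary reference check described in item 3 can be sketched roughly as follows. This is an illustration under assumptions, not the commit's code: `line_references_label` and `is_token_char` are hypothetical names, and the left-boundary check is an extra guard that the message does not mention (it describes only the lookahead).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns nonzero if c can be part of an identifier-like token. */
static int is_token_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Hypothetical helper: returns 1 if `line` mentions `label` as a whole
 * token, 0 otherwise. Each strstr hit is accepted only when the next
 * character is not alphanumeric or '_', so "L1" does not match in "L10".
 */
int line_references_label(const char *line, const char *label)
{
    size_t len;
    const char *p;

    assert(line != NULL && label != NULL);
    len = strlen(label);
    if (len == 0)
        return 0;

    for (p = strstr(line, label); p != NULL; p = strstr(p + 1, label)) {
        /* Assumed extra guard: also reject hits such as "XL1". */
        int left_ok = (p == line) || !is_token_char(p[-1]);
        /* Lookahead from the commit message; p[len] may be the NUL. */
        int right_ok = !is_token_char(p[len]);
        if (left_ok && right_ok)
            return 1;
    }
    return 0;
}
```

A dead-label pass built on this would call the helper for every other line of the function and delete the label line only if no call returns 1.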

Impact vs prod (tier-1 byte-match):

    FUN_06044834:   20 ->   10 ->    0    <-- byte-identical
    FUN_00280710:    9 ->    9 ->    6
    FUN_06000AF8:   17 ->   17 ->   16
    FUN_06047748:   28 ->   28 ->   26
    FUN_0604025C:   30 ->   30 ->   29
    FUN_0602A664:  139 ->  139 ->  130
    FUN_06040EA0:  261 ->  261 ->  260
    FUN_06044BCC:  459 ->  445 ->  438
    FUN_06037E28: 1074 -> 1062 -> 1053

9 of 10 corpus functions improved, 0 regressed. Broad corpus
smoke 168/168 pass (unchanged). All 12 stage-4 regression tests
pass. Subagent independently graded all three sampled functions
IMPROVED with no semantic hazards.

FUN_06044834 is the project's first byte-identical corpus match —
closes the proof-of-thesis milestone tracked in
methodology_remediation.md.

Our output matches prod's 10 instructions exactly:
    mov.w @(14,r4),r0    mov r0,r1
    mov #26,r0           mov.w @(r0,r4),r0    add r0,r1
    mov #30,r0           mov.w @(r0,r4),r0    add r0,r1
    rts                  neg r1,r0

The path: Gap 0 (int→char* refactor, 385eafb), Gap 2 (#pragma
sh_weird_rule_1, b67fe7e), Gap 9 extension + mov-fuse + dead-label
elim (this commit). Four separate layers — C source, custom
pragma, peephole pass, sign-extension tracker — composed cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>