Skip to content

feat: MySQL X Protocol plugin (dynamically loaded)#5593

Open
renecannao wants to merge 130 commits intov3.0from
ProtocolX
Open

feat: MySQL X Protocol plugin (dynamically loaded)#5593
renecannao wants to merge 130 commits intov3.0from
ProtocolX

Conversation

@renecannao
Copy link
Copy Markdown
Contributor

@renecannao renecannao commented Apr 10, 2026

Summary

Adds a MySQL X Protocol plugin to ProxySQL, enabling it to accept, authenticate, and route connections using MySQL's X Protocol (port 33060). The plugin is built as a dynamically loaded .so — no core engine modifications required for the plugin itself.

This is the first protocol plugin for ProxySQL, establishing the plugin infrastructure (ProxySQL_PluginManager) that future plugins can reuse.

61 commits, 80 files changed, ~34,500 lines added.


What's Included

Plugin Infrastructure (Core Changes)

  • ProxySQL_PluginManager — dlopen/dlsym-based plugin loader with lifecycle management (load → start → stop)
  • ProxySQL_Plugin C ABI — stable interface header for third-party plugin development
  • Admin command dispatch — dispatch_plugin_admin_command routes plugin-specific SQL commands
  • Config parsing — plugin_modules field in ProxySQL config for declaring plugins
  • Lifecycle hooks in main.cpp — LoadConfiguredPlugins / StartConfiguredPlugins / StopConfiguredPlugins

MySQL X Protocol Plugin (plugins/mysqlx/)

  • 8 source files + 8 headers implementing the full plugin
  • Protobuf bindings — pre-compiled .pb.cc/.pb.h from MySQL X Protocol .proto definitions (committed, not generated at build time)
  • Config Store — Runtime/memory configuration for users, routes, listeners, variables
  • Worker threads — libev-based event loop for accepting and managing X Protocol connections
  • Frontend sessions — X Protocol handshake, MYSQL41 and PLAIN authentication
  • Backend sessions — Hostgroup-based routing to MySQL backend X Protocol ports
  • Stats — Connection counters, latency histograms, routing metrics
  • Admin schema — 7 SQLite tables (mysqlx_users, mysqlx_routes, mysqlx_listeners, mysqlx_variables, runtime_mysqlx_*, stats_mysqlx_*)

Admin Commands

Standard ProxySQL syntax, completely separate from MySQL commands:

LOAD MYSQLX USERS TO RUNTIME
LOAD MYSQLX ROUTES TO RUNTIME
LOAD MYSQLX LISTENERS TO RUNTIME
LOAD MYSQLX VARIABLES TO RUNTIME
SAVE MYSQLX USERS FROM MEMORY
SAVE MYSQLX ROUTES FROM MEMORY
-- etc. (with FROM/TO alias support)

Testing (292+ assertions across 16 test files)

  • 11 unit tests — Protocol parsing, config store, admin schema, stats, plugin manager, plugin registry, plugin config, listener smoke test
  • 2 integration tests — Admin schema + admin commands end-to-end
  • 3 E2E tests — Handshake, routing, and full flow against real MySQL 8.x X Protocol
  • Test harnessfake_plugin.cpp + test_globals.h + test_init.h for unit testing plugin code

CI

  • .github/workflows/CI-mysqlx.yml — GitHub Actions workflow with unit-tests and e2e-tests jobs
  • Uses dbdeployer (not Docker Compose) for backend provisioning
  • E2E test group in test/tap/groups/mysqlx-e2e/

Documentation

  • doc/mysqlx/README.md — Reference manual (tables, commands, auth, routing, quick start)
  • doc/mysqlx/ARCHITECTURE.md — Architecture with ASCII diagrams, data flows, thread model, ABI contract
  • doc/mysqlx/TESTING.md — Testing documentation with coverage map
  • docs/superpowers/specs/ — Design spec + canonical commands design

Security Hardening (from Code Review)

  • Constant-time auth comparison via CRYPTO_memcmp
  • DNS resolution via getaddrinfo (no gethostbyname)
  • IPv6 support in both listener and backend sockets
  • TCP_NODELAY on all X Protocol sockets
  • EINTR handling on read/write syscalls
  • PLAIN auth hardened against timing attacks

Not in Scope (Phase 2)

  • TLS/SSL for X Protocol connections
  • Cluster sync for mysqlx_* admin tables
  • SHA256 authentication method
  • X Plugin CRAM authentication

How to Test Locally

# Build
make clean && make debug

# Build plugin
cd plugins/mysqlx && make

# Run unit tests
cd test/tap && make build_tap_test_debug
cd tests/unit && make mysqlx_*_unit-t plugin_*_unit-t

# Run with a real MySQL 8.x backend
# (see test/tap/groups/mysqlx-e2e/env.sh for setup)

Summary by CodeRabbit

  • New Features

    • MySQL X Protocol plugin: X Protocol listeners, routing, backend endpoint selection, session/connection pooling, auth (MYSQL41/PLAIN), TLS groundwork, and runtime stats.
  • New Admin Commands

    • Canonical LOAD/SAVE MYSQLX commands for users/routes/backend endpoints (new aliases supported).
  • Documentation

    • Comprehensive architecture, design, testing, API, plans, and parity/feature-gap docs for the plugin and plugin API.
  • Infrastructure

    • CI workflow added for unit/integration/E2E tests and protobuf-backed artifacts.
  • Chores

    • Build/clean and VCS ignore updates to include plugin outputs.

Plugin start now opens TCP listeners for each active route in
runtime_mysqlx_routes.  An accept thread dispatches incoming
connections to worker threads round-robin.  Phase 1 workers
accept-then-close; Tasks 7-8 will add X Protocol handshake
and backend relay.
Compile mysqlx*.proto files from MySQL Connector-C 8.4.0 into
C++ sources.  The plugin .so now links -lprotobuf for X Protocol
frame/message encoding and decoding needed by Tasks 7-8.
Add mysqlx_protocol.h/cpp with frame encode/decode, MYSQL41 scramble
auth using EVP SHA1, and protobuf message helpers for X Protocol
Error/Ok/Capabilities/Auth frames.

Add mysqlx_frontend_session.h/cpp implementing the full handshake
state machine: CapabilitiesGet→Capabilities, CapabilitiesSet→Ok,
AuthenticateStart→challenge→AuthenticateContinue→AuthenticateOk.

Supports MYSQL41 and PLAIN auth methods.  Enforces x_enabled flag,
rejects pass_through backend auth mode.  Workers now run the
frontend session instead of immediately closing connections.
Add MysqlxBackendSession with TCP connect, X Protocol MYSQL41 auth
to backend, and bidirectional byte-level relay via select().

Implement round_robin and round_robin_with_fallback endpoint
selection strategies in MysqlxConfigStore::pick_endpoint().

Workers now connect to the backend after frontend auth and relay
traffic until close.  Phase 1 is one frontend = one backend, no
pooling.
Add MysqlxStatsStore with atomic route counters (conn_ok, conn_err,
bytes_sent, bytes_recv) and flush_to_sqlite() for stats_mysqlx_routes.

Register stats_mysqlx_routes and stats_mysqlx_processlist tables
in the plugin admin schema.

Wire stats recording into the worker: record_conn_ok on successful
backend connect, record_conn_err on failure.

Add topology generation capture at session bind time as the Phase 2
seam for GR notification-driven route invalidation.
1. SQL injection in stats flush: escape route names with sqlite_escape()
2. Timing attack in PLAIN auth: use CRYPTO_memcmp for constant-time
   password comparison
3. OOM via oversized frame: add MYSQLX_MAX_PAYLOAD_SIZE (16MB) check
   in mysqlx_read_frame before allocating payload buffer
4. Thread safety: add shared_mutex to MysqlxConfigStore, shared_lock
   for readers (resolve_identity, pick_endpoint), unique_lock for
   writer (load_from_runtime), separate rr_mutex_ for round-robin
   counter mutation
1. Replace select() with poll() in accept loop and relay to avoid
   FD_SETSIZE undefined behavior with high file descriptors
2. Make g_rr_index atomic for safe concurrent access
3. Add SO_RCVTIMEO/SO_SNDTIMEO (10s backend, 30s frontend) to
   prevent indefinite blocking from slow peers
4. Reject TLS CapabilitiesSet requests explicitly in Phase 1
5. Enforce require_tls per user — reject if set since Phase 1
   does not implement TLS negotiation
6. Enforce allowed_auth_methods per user — check the comma-separated
   list before proceeding with auth
The session's identity-lookup callback now returns the full
MysqlxResolvedIdentity struct from the config store rather than a
stripped-down MysqlxCredentials. Password-hash derivation moves into
the session via a private derive_stored_hash helper (handles both
"*HEX40" and cleartext forms). No behavior change; the existing
credential-lookup guard semantics are preserved so sessions without a
lookup still follow the open-proxy path exercised by the robustness
tests. mysqlx_robustness_unit-t passes unchanged (36/36).
Introduces MysqlxSession::resolve_backend_target(), a private method that
translates the authenticated user's identity_->default_route into the
concrete (target_hostgroup_, target_address_, target_port_) triple that
handler_connecting_server needs to reach the backend. The method reads
identity_->default_route, looks up the hostgroup + endpoint via the
thread's MysqlxConfigStore, and returns 0 on success; on failure it
returns 4000/4001/4002 (empty default_route / unknown route / no-backend
respectively), emits an X-Protocol Error frame, records a stats miss via
mysqlx_stats().record_conn_err(), and marks the session unhealthy.

Why: fixes the design gap where sessions populated target_* fields only
from a pool cache lookup and otherwise connected to "" on port 0 (see
docs/superpowers/specs/2026-04-17-mysqlx-route-identity-design.md). The
pre-Ok timing is deliberate — once the X-Protocol Ok frame is on the
wire, a routing failure cannot be cleanly reported, so resolution must
run before send_auth_ok().

Invariants preserved: resolve_backend_target() is not yet wired into the
auth flow. Task 4 owns that step. This commit is therefore a pure
addition — no production call site touches the new method; only the
new unit tests exercise it, via test-only public accessors
(inject_identity_for_test / resolve_backend_target_for_test /
target_*_for_test). Mysqlx_Thread gains get_config_store() so the session
can reach the store; MysqlxStatsStore gains reset_for_test() and
get_last_conn_err_for_test() so the stats-recording contract can be
asserted without a live SQLite stats DB.

Out of scope: wiring resolve_backend_target() into handle_auth_plain and
handler_auth_challenge_response (Task 4); removing the dormant worker
path (Task 5); reconciling error codes 4000/4001/4002 with the
project-wide ProxySQL error-code policy (follow-up commit).

Test coverage (plan bumped +7, now 43 assertions):
  - test_routing_happy_path        — rc=0, target fields populated
  - test_routing_no_default_route  — rc=4000
  - test_routing_unknown_route     — rc=4001
  - test_routing_no_backend        — rc=4002 (route exists, 0 endpoints)
  - test_routing_stats_on_failure  — (route,hg) tuples for all 3 modes
  - test_routing_unknown_user      — identity_lookup nullopt => unhealthy

Also updates mysqlx_session_unit-t / mysqlx_thread_unit-t /
mysqlx_message_dispatch_unit-t / mysqlx_concurrent_unit-t Makefile rules
to link mysqlx_config_store.cpp + mysqlx_stats.cpp, which were already
required transitively by mysqlx_thread.cpp::resolve_identity() since
Task 2 but had not been wired into the unit-test link set.
Both auth paths (PLAIN and MYSQL41) now call resolve_backend_target()
after credential verification and BEFORE send_auth_ok(). On a routing
failure (empty default_route -> 4000, unknown route -> 4001, no
backend -> 4002) the session emits an X-Protocol Error frame,
transitions to X_SESSION_CLOSING, and never puts an Ok frame on the
wire. The MYSQL41 branch also keeps to_process=true on failure so the
state machine drives the session to closed on the next handler tick.

Fixes the reported bug where sessions would complete auth with
target_address_ == "" and target_port_ == 0 on cache miss, then
attempt to connect to "":0 on the first client query.

Testing:
  - New integration test test_plain_auth_empty_default_route_closes_session
    drives PLAIN auth (TLS-marked client_ds) with an identity whose
    default_route is empty; asserts session reaches X_SESSION_CLOSING,
    is_healthy() is false, and the client sees a Mysqlx ERROR frame
    (not an Ok).
  - setup_authenticated_session now stands up a process-global
    MysqlxConfigStore + Mysqlx_Thread with a "default_test_route"
    fixture and pre-seeds the session's identity_ via the existing
    inject_identity_for_test hook. Pre-seeding identity_ directly
    (instead of set_identity_lookup) preserves the helper's existing
    all-zero-scramble contract by keeping identity_lookup_ unset, so
    the auth handler's credential verification is still skipped --
    matching test_mysql41_no_credential_lookup_accepts_any's
    open-proxy invariant while still supplying a resolvable route.
  - test_mysql41_no_credential_lookup_accepts_any is updated the same
    way: attached to the default test thread + pre-seeded identity.
    Its "without identity_lookup, any scramble accepted" assertion is
    unchanged.
  - Plan bumped 43 -> 46; all 46 assertions pass.

Out of scope (future tasks): the dormant MysqlxWorker path, cache
key/resolver-key semantics, and listener wiring.
What:
- resolve_backend_target()'s !identity_ guard now calls
  mysqlx_stats().record_conn_err("", 0) so the missing-identity path does
  not silently drop observability.
- Stale "Not yet wired" comment in mysqlx_session.h replaced with a
  current-state doc reflecting that resolve_backend_target() is called
  from the auth handlers before send_auth_ok().

Why:
- Final review flagged that the spec's "all failure modes record stats"
  contract wasn't held on the programming-error guard. Tuple ("", 0)
  matches the empty-route 4000 case, which is acceptable aliasing since
  both legitimately lack route context.
- Comment update prevents future confusion about whether resolve is
  integrated (Task 4 is merged).

Out of scope: no behavior change, no test change. Assertion count
remains 46.
Adds two reference documents alongside the code commits so reviewers
and future maintainers have design context next to implementation.

- docs/superpowers/specs/2026-04-17-mysqlx-route-identity-design.md
  captures the problem statement, why user-driven routing was picked
  over listener-driven or hybrid, the three failure modes with their
  error codes, the architectural decision to widen the identity ABI
  to the existing MysqlxResolvedIdentity struct, and the explicit
  non-goals (listener-to-route mapping, worker-path removal, error
  code reconciliation, CI infrastructure).

- docs/superpowers/plans/2026-04-17-mysqlx-route-identity.md
  captures the task breakdown (route_exists predicate, ABI refactor,
  resolve_backend_target method, auth-flow wiring) with exact file
  paths and step-by-step TDD guidance. Used to drive the five
  implementation commits on this branch.

Out of scope: nothing — these are descriptive artifacts, not code.
What: removed `if (cnt == 0) continue;` in `sync_disk_to_memory` and
`copy_to_runtime` in `plugins/mysqlx/src/mysqlx_plugin.cpp`. Also removed
the now-dead `SELECT COUNT(*)` + result unpacking that those skip
conditions depended on.

Why: the original skip was meant as an optimization for "nothing to
copy" but it was a correctness bug. If a user emptied a mysqlx table in
`main` and saved to disk, on next restart `sync_disk_to_memory` would
see disk count == 0, skip the replace, and leave stale rows in `main.*`.
Same gap for main -> runtime. The BEGIN/DELETE/INSERT/COMMIT sequence is
already atomic and correctly no-ops when the source is empty (DELETE
dest, INSERT zero rows).

Testing: added a unit test at
`test/tap/tests/unit/mysqlx_robustness_unit-t.cpp` exercising the
"empty source overwrites stale dest" invariant. Also added
`mysqlx_config_store.cpp` to the `mysqlx_robustness_unit-t` link line so
the test binary resolves `MysqlxConfigStore::resolve_identity` (a
latent build gap uncovered while validating the fix). Assertion count
33 -> 35.

Out of scope: no refactor of the inline atomic-replace sequence (its
atomicity is pre-existing). No changes to the admin-schema `copy_table`
path, which was already unconditional.
Deletes the mysqlx worker implementation and the TAP smoke test that
was its only caller. Also removes the corresponding entries from the
plugin Makefile, the unit-tests Makefile, and the CI groups manifest.

Files removed:
- plugins/mysqlx/src/mysqlx_worker.cpp
- plugins/mysqlx/include/mysqlx_worker.h
- test/tap/tests/test_mysqlx_listener_smoke-t.cpp

References cleaned:
- plugins/mysqlx/Makefile: dropped mysqlx_worker.cpp from SRCS
- test/tap/tests/unit/Makefile: dropped test_mysqlx_listener_smoke-t
  from UNIT_TESTS and deleted its build rule
- test/tap/groups/groups.json: removed the smoke-test entry from the
  unit-tests-g1 group

Why: the worker was an earlier parallel implementation of the mysqlx
session path (separate accept thread + worker queue + per-listener
route tracking + identity.default_route -> pick_endpoint). It never
reached any call site in the production proxysql binary. The only
consumers of its public API were the smoke test and the worker code
itself. The route-identity fix in PR #5641 used the worker's design
as a reference and implemented the fix in the active session path
(Mysqlx_Thread + MysqlxSession + MysqlxConfigStore::pick_endpoint).
With that landed, the dormant code is pure tech debt — reviewers
repeatedly mistook its logic for the active path.

Why retire the smoke test at the same time: the test exercised
mysqlx_start_listeners_from_runtime_routes() which was the worker's
only entry function. With the worker gone, the smoke test cannot
link; keeping it would produce a broken CI target.

Out of scope: no changes to the active session path, no changes to
the MysqlxConfigStore or admin-schema tables, no changes to any
other tests. Pure deletion + Makefile/groups cleanup.
What:
  Stop rewrapping the raw backend fd in a fresh session-owned
  MysqlxDataStream after backend auth completes. Route every
  data-plane read/write (forward_to_backend, handler_waiting_server_msg,
  handler_session_reset_waiting, and the cached-conn attach path)
  through MysqlxConnection::backend_ds() instead.

Why:
  The optional backend TLS handshake runs against the connection's
  backend_ds_ (see mysqlx_connection.cpp init_ssl_connect) and the
  resulting SSL* lives there. The prior session code discarded that
  SSL* by init()-ing a new plain MysqlxDataStream around the same
  raw fd, then used it for cleartext I/O on a socket where the
  server expects TLS frames. Flagged in the ProtocolX code review
  (item #2).

Testing:
  mysqlx_robustness_unit-t picks up five new assertions that pin
  the invariant: MysqlxSession::server_ds() aliases
  backend_conn_->backend_ds() whenever a backend is attached, and
  falls back to a fd == -1 placeholder otherwise. Baseline 33,
  post-fix 38, all pass.

Out of scope:
  - Connection cache semantics: cached connections keep their
    SSL state; MysqlxConnection::reset() already leaves
    backend_ds_ untouched so the invariant extends to pooled reuse.
  - MysqlxDataStream internals (untouched).
  - The dormant worker path (different PR).

Pre-existing ripple:
  test/tap/tests/unit/Makefile mysqlx_robustness_unit-t link line
  was missing mysqlx_config_store.cpp, same gap other mysqlx tests
  have been closing as they get touched. Added it here so the
  test links.
…TIME

Startup previously capped listener creation at `pool_size` routes via
`ti < ctx.threads.size()` in mysqlx_plugin.cpp's startup loop, so any
routes beyond the thread-pool size had no listener. Runtime route
changes via `LOAD MYSQLX ROUTES TO RUNTIME` never touched the listener
topology at all — adding, removing, or toggling a route's `active` flag
had no effect until a full restart. Flagged in the ProtocolX code
review as item #4.

This commit:

- Drops the `ti < pool_size` cap in the startup loop.
- Distributes routes across threads round-robin (the original intent)
  using a shared plugin-scope `route_to_thread` map guarded by a
  mutex, with a `next_rr_index` cursor.
- Adds route-name tracking to Mysqlx_Thread: a new parallel
  `listener_route_names_` vector alongside `listener_fds_` /
  `listener_addrs_`, plus a new `remove_listener_for_route(name)`
  method that closes the fd and prunes all three vectors. Returns
  true/false so callers can use it idempotently.
- Adds a `mysqlx_reconcile_listeners(admindb)` desired-state
  reconciler. Reads active routes from `runtime_mysqlx_routes`,
  binds missing listeners round-robin, and removes listeners for
  routes that are gone or deactivated. Startup and
  `LOAD MYSQLX ROUTES TO RUNTIME` both go through this single path,
  so both agree. The reconciler is idempotent: re-running with the
  same desired set is a no-op.
- The reconciliation core lives in a new file
  `plugins/mysqlx/src/mysqlx_listener_reconcile.cpp` as a pure
  helper `mysqlx_reconcile_listeners_impl(...)` taking state by
  parameter, so unit tests can drive it against a minimal fake
  context. The convenience wrapper that reads
  `mysqlx_context()` lives in plugin.cpp and is declared weak so
  tests of admin_schema.cpp that don't link plugin.cpp still link
  cleanly (admin_schema null-checks the weak pointer).

Tests: mysqlx_robustness_unit-t grows from 33 to 38 assertions —
four thread-API tests (add_listener with route name on two threads,
remove_listener_for_route removes and is idempotent) plus one
integration test that builds an in-memory admin DB with one route
and asserts `mysqlx_reconcile_listeners_impl` binds exactly one
listener and records the mapping.

Also fixes the pre-existing unit-test Makefile link gap: four
tests that include `mysqlx_thread.cpp` (which references
`MysqlxConfigStore::resolve_identity`) now correctly link
`mysqlx_config_store.cpp`, and the robustness test link line picks
up `mysqlx_config_store.cpp` + `mysqlx_listener_reconcile.cpp` for
the new coverage.

Out of scope: switching the distribution strategy to SO_REUSEPORT /
all-threads-on-all-routes (future design iteration); treating an
in-place `active` flag toggle as anything other than remove + add.
…read assumption

The listener reconciler built its desired snapshot keyed only by route name,
so editing a route's `bind` column (e.g. from `:33061` to `:33062`) and running
LOAD MYSQLX ROUTES TO RUNTIME left the old listener running and never opened
the new port — the removal pass saw the name still in the desired set and
skipped the route entirely.

The removal pass now also compares the currently-bound `host:port` on the
owning thread against the desired bind and treats any mismatch as a removal;
the subsequent addition pass rebinds under the new spec.

To support that comparison, `Mysqlx_Thread` now stores bind ports in a
parallel `listener_ports_` vector alongside the existing `listener_addrs_`
and exposes `get_listener_addr_for_route()` returning the canonical
`"host:port"` form.

Also added a comment documenting the pre-existing single-admin-thread
invariant that justifies snapshotting the DB outside of `route_to_thread_mutex`
(noted in code review, not a behavioral change).

Testing: new unit-test assertions exercise a full bind-change reconcile
end-to-end — pick two free ports, reconcile at port1, update the runtime
row to port2, reconcile again, and assert the listener is now bound at
port2 with total listener count still 1. 42 assertions pass (was 38).

Out of scope: changing the concurrency model. The single-admin-thread
assumption is pre-existing throughout ProxySQL admin execution.
…d connection

The session's backend-connection setup site in handler_connecting_server
now falls back to identity_->backend_username when that field is
non-empty, and only reuses the frontend username_ when it is empty
(the mapped-mode / default case).

coderabbitai flagged on PR #5645 (outside-diff; the code originates in
PR #5641): the pre-fix setup call always passed the frontend username_
to set_backend_user() while pairing it with the resolved password from
identity_->backend_password. For mysqlx_users rows whose
backend_auth_mode is `service_account` and whose backend_username
differs from the frontend username, this mismatch paired userA's
password with userB's name — backend auth failed with access-denied
even though both columns were internally consistent.

The route-identity refactor (earlier commits on this branch) widened
the user lookup to return the full MysqlxResolvedIdentity struct, which
already carries backend_username. This commit wires that existing field
through at the backend-setup site; no store-side changes are required.
See MysqlxBackendAuthMode in mysqlx_config_store.h for the full set of
modes and their semantics.

Testing: two new unit assertions covering both branches of the
selector — mapped mode (empty backend_username falls back to frontend
username_) and service_account mode (distinct backend_username is used
verbatim). Assertion count moves from 46 to 48. A test-only getter
MysqlxConnection::get_backend_user_for_test() was added so the test can
observe the string that was ultimately passed to set_backend_user.

Out of scope: enforcement of MysqlxBackendAuthMode semantics (for
example, rejecting service_account rows that omit backend_username at
load time). That validation belongs with the config-store loader and
is tracked separately.
sync_disk_to_memory() and copy_to_runtime() in mysqlx_plugin.cpp now check
every admindb.execute() return value. On BEGIN / DELETE / INSERT / COMMIT
failure the code issues a best-effort ROLLBACK, logs via
services->log_message, and skips the current table. Both functions return
false if any table's transaction failed, true if every replace succeeded.

coderabbitai flagged this on #5642, #5643, and #5644 (outside-diff because
the file appears in each PR's surrounding context). Without the return
checks, a failed INSERT landing between a successful DELETE and the
unconditional COMMIT silently wiped routes / users / endpoints /
backend_endpoints — the transaction wrap advertised atomicity it did not
deliver. The fix restores that guarantee.

The shared BEGIN/DELETE/INSERT/COMMIT body is now a local helper
replace_table_atomically() that takes a ProxySQL_PluginServices* so it
can log via the existing plugin log_message callback; the two callers
reduce to four-line loops over their table lists. mysqlx_start passes
ctx.services through; its return values remain advisory (log-and-continue
on a false return), intentionally — plugins shouldn't block proxysql
startup on one table's sync failure. Upgrading to fail-fast is a future
decision.

Testing: added test_insert_failure_rolls_back in
test/tap/tests/unit/mysqlx_robustness_unit-t.cpp. It seeds a dst table
with a CHECK (id %% 2 = 0) constraint plus pre-existing rows, points src
at odd ids, runs the atomic replace, and asserts (a) the function
returns false, (b) dst still holds its original rows, (c) the row sum
matches the pre-transaction state. plan(35) -> plan(39). All 39
assertions pass.

Out of scope: the Makefile link-line fix from 82fe27f still applies;
no further ripple.
Addresses the coderabbitai outside-diff finding originally surfaced on
PR #5641: the previous check_connect() conflated poll() errors with
timeouts, never retried on EINTR, did not check getsockopt()'s return,
and accepted POLLNVAL / POLLERR / POLLHUP as though they were normal
"not ready" states.

The practical failure mode was slow and misleading: a fatal poll error
or a bad descriptor was masked as "not ready yet" until the outer
connect_timeout_ms_ (default 10s) elapsed, at which point the session
reported a generic "connect failed" reason rather than the actual
cause. A concurrent signal (EINTR) would extend that masked window
further. Meanwhile, a failing getsockopt() with an uninspected return
would leave `err` at its initializer value of 0 and optimistically
transition the state machine to AUTHENTICATING, so the very next read
or write on the invalid fd would fail in a less diagnostic path.

What changed:
- poll() is now retried on EINTR.
- poll() returning -1 sets ERROR_STATE and returns -1 immediately.
- POLLNVAL / POLLERR / POLLHUP in revents are treated as hard errors
  even if POLLOUT is also set, since the underlying socket is already
  in a terminal async condition.
- getsockopt() return is checked; a nonzero return is now a hard error
  rather than a silent state advance.

Testing: three new assertions in mysqlx_robustness_unit-t.cpp exercise
the bad-fd error path, the happy-path connect, and the not-ready
timeout path. Plan count 33 → 42.

Out of scope: other poll() sites in the mysqlx plugin (the unit test
helpers at the top of mysqlx_robustness_unit-t.cpp have their own
poll() loops but they aren't on the backend-connect critical path);
the connect_timeout_ms_ check itself, which is unchanged.

Pre-existing ripple: same Makefile link-gap fix as earlier PRs — adds
mysqlx_config_store.cpp to the mysqlx_robustness_unit-t link line so
MysqlxConfigStore::resolve_identity() resolves when the binary links.
The removed test case assumed that calling poll(POLLOUT) on a fresh
non-blocking socket (one that never had connect() called) would return
0 (no POLLOUT) and drive check_connect() through the "not ready yet"
branch. On Linux, a fresh unconnected socket has an empty send buffer
so POLLOUT fires immediately; getsockopt() then returns SO_ERROR=0 and
check_connect() legitimately transitions to AUTHENTICATING.

The production code is correct — the test premise was wrong. A
deterministic "not ready" scenario would require an in-progress
connect() to a blackhole or filtered endpoint, which is either flaky
or slow to drive in a unit test. The "not ready" path (pr == 0 branch
returning 1) is trivial by inspection and exercised in practice by
real backend connects where the TCP handshake is pending.

Kept the two genuine tests: bad-fd error path and happy-path connect.

Plan count 42 -> 39.
chore(mysqlx): retire dormant MysqlxWorker path and its smoke test
fix(mysqlx): sync empty source tables to overwrite stale rows
fix(mysqlx): harden check_connect() poll and getsockopt handling
…' into HEAD

# Conflicts:
#	test/tap/tests/unit/mysqlx_robustness_unit-t.cpp
…nto HEAD

# Conflicts:
#	plugins/mysqlx/Makefile
#	plugins/mysqlx/src/mysqlx_plugin.cpp
#	test/tap/tests/unit/Makefile
#	test/tap/tests/unit/mysqlx_robustness_unit-t.cpp
Add g_plugin_lifecycle_mutex to serialize load/start/stop transitions
so two reload paths cannot race on g_registry_target and the
registration-failure globals. g_active_plugin_manager_mutex remains the
pointer-read guard for the dispatch path; the new mutex is held only
for the duration of a lifecycle transition.

In proxysql_stop_configured_plugins, always call manager.reset() so the
.so is unmapped and no stale function pointers remain reachable, even
when stop_all() reports a failure. stop_all() is idempotent across
failure (each plugin is marked stopped after one attempt) so the
destructor's stop_all() will be a no-op.

Also fix the log-level table in PLUGIN_API.md: numeric levels are 3
(Error), 4 (Warning), and any-other (Info), matching the internal
proxy_* severity scheme — not 1/2/3 as previously documented.
Detect the installed libprotobuf via pkg-config at configure time and
fail fast unless it is 3.x. The vendored .pb.cc/.pb.h were generated
with protoc 3.21.12, and the plugin links dynamically against the
system libprotobuf — which is ABI-compatible only within the 3.x major
version (4.x released with an incompatible ABI and a changed SONAME).
Without this check, a .so that linked cleanly would crash the first
time a virtual dispatched into the proto runtime.

Add -fvisibility=hidden and -fvisibility-inlines-hidden so only the
explicitly extern "C"-declared proxysql_plugin_descriptor_v1 entry
point is exported. Prevents ODR collisions with the proxysql core when
the .so is dlopen'd, and stops template instantiations from leaking
across the boundary.

Add -fstack-protector-strong unconditionally, and -D_FORTIFY_SOURCE=2
when not building under ASAN and when OPTZ is not -O0 (both conditions
are incompatible with FORTIFY_SOURCE).

Install the built ProxySQL_MySQLX_Plugin.so to
/usr/lib/proxysql/plugins/ from the top-level Makefile install target,
with matching cleanup in uninstall. Previously the plugin was built
but never staged into a system location, so `make install` produced a
proxysql binary that couldn't find it.
mysqlx_connection.cpp:
Drain leading NOTICE frames in read_auth_frame() instead of returning
nullopt on the first NOTICE. MySQL backends commonly emit a
session-state-change notice before AuthenticateContinue or Ok, and
returning nullopt caused the auth state machine to spin on try-read
for the full 10s handshake timeout before completing. The two callers
(step_auth_capabilities_get_sent and step_auth_capabilities_set_sent)
now use the shared helper and drop their duplicated NOTICE checks.
Also added a frame-size guard before reading the message-type byte.

mysqlx_data_stream.{h,cpp}:
Add close_and_reset() which tears down SSL/BIO state and clears every
read/write buffer and parse flag without close()ing the fd. Required
by mysqlx_session.cpp's return_backend_to_pool(), where the fd is
owned by the pooled MysqlxConnection and must stay open after the
data stream is wiped. Fix SSL_read return handling: a 0-return is a
clean TLS shutdown (close_notify) and must surface as a connection
close, not as a WANT_IO/retry. The previous code treated 0 and <0
identically and would loop forever on a cleanly-closed TLS peer.

mysqlx_protocol.cpp:
mysqlx_build_frame now rejects serialized payloads at the uint32
boundary so the +1 for the message-type byte cannot wrap to 0. This
mirrors the X_MAX_PAYLOAD_SIZE clamp already applied by the inbound
parser in MysqlxDataStream.

mysqlx_stats.cpp:
Rewrite the stats_mysqlx_routes INSERT builder to use std::string
concatenation instead of a fixed 1024-byte snprintf buffer. Long
route names plus escaping could overflow the buffer and the row was
silently dropped without reaching the statsdb.
Session auth previously accepted any client when credential_lookup_
was unset — an open-proxy fallback that turned a wiring bug into a
silent security hole. Both PLAIN (handle_auth_plain) and MYSQL41
(handler_auth_challenge_response) now hard-reject with 1045 when no
lookup is installed. A credential_lookup is always wired from
mysqlx_thread.cpp via the config store, so there is no legitimate
code path that reaches auth without one.

Also tightened the credential check: password_hash must be exactly
20 bytes (MYSQL41 hash length) rather than simply non-empty, which
could previously let a short or garbage hash pass CRYPTO_memcmp's
min()-clamped comparison.

In return_backend_to_pool(), replace the fresh-construct move-assign
over server_ds_ with server_ds_.close_and_reset(). The old code
constructed a new MysqlxDataStream (fd=-1, no SSL) and moved it in,
whose destructor then close()d the fd that had just been returned to
the pooled MysqlxConnection — silently corrupting the pool entry.
close_and_reset() releases SSL/buffers without touching the fd.

Unit test mysqlx_robustness_unit-t.cpp:
setup_authenticated_session now installs a real credential_lookup
with a known password, drives the server's AuthenticateContinue
frame to extract the challenge, and computes the MYSQL41 scramble
reply. This matches the new reality that auth requires a valid
lookup. The previously-misnamed test_mysql41_no_credential_lookup_
accepts_any is renamed _rejects and asserts that without a lookup
the session goes unhealthy and does not reach WAITING_CLIENT_XMSG.
process_all_sessions previously forced sess->to_process=true on every
tick and unconditionally called sess->handler(), burning CPU at large
idle session counts (one full state-machine traversal per session per
loop iteration, regardless of whether anything had changed).

Now only call handler() when at least one of these is true:
  - a poll event landed on the client or server data stream
  - the session self-flagged to_process (handler wants to re-run)
  - a complete frame is already buffered on either stream

Also make Mysqlx_Thread::sessions_mutex_ mutable and take it in
get_session_count() const. Previously const accessors that needed to
lock the mutex couldn't — and the session count was read without the
lock at all, racing the writer that appends/removes sessions.
CI-mysqlx.yml previously only ran the test_mysqlx_e2e_* binaries and
hard-coded the MYSQLX_E2E_* variables inline. That missed the
integration smoke tests (test_mysqlx_admin_tables,
test_mysqlx_listener_smoke, test_mysqlx_plugin_load) and meant the
group's env.sh was never consulted, so MYSQLX_E2E_PROXYSQL_PORT was
unset and test_mysqlx_e2e_routing-t silently skipped.

Now the step sources test/tap/groups/mysqlx-e2e/env.sh and iterates
every test_mysqlx_*-t binary. Errors out if the glob matches nothing,
so a future rename that quietly elides every mysqlx test doesn't pass
a green CI.

env.sh: use the mysqlx_test / mysqlx_test credentials provisioned by
setup-infras.bash, not root with empty password.

setup-infras.bash: return exit 1 when X Protocol isn't reachable on
33060 rather than exit 0, so CI actually fails instead of running
every test against a missing backend.

pre-cleanup.bash: derive the sandbox version from the msb_*/ directory
name rather than hardcoding 8.4.8. setup-infras.bash uses --newest so
the patch level drifts with upstream MySQL releases.
The Makefile had `.DEFAULT: default` intending to make the `default` target
(which builds the project) the fallback when `make` is invoked with no
arguments. `.DEFAULT` is a different special target — it provides a fallback
*recipe* for targets with no rule, and has no effect on default goal
selection.

GNU make picks the first real target as the default goal unless
`.DEFAULT_GOAL` is set. The first real target in this Makefile is
`lint-generate-cdb` (line 19), so `make` with no arguments ran
`scripts/lint/generate-compile-commands.sh`, which does
`bear -- make -j\$(nproc)` — a full rebuild of the project under Bear's
compile-intercept wrappers. Each C++ compile under Bear spawned a wrapper
process, producing hundreds of bear processes per plain `make` invocation
and burying the intended default behaviour.

Replace `.DEFAULT: default` with `.DEFAULT_GOAL := default` so `make` with
no arguments runs `default: build_src` as originally intended.
generate-compile-commands.sh wrapped the top-level build with
  bear -- make -j\$(nproc)
with no explicit target, so the inner make resolved its default goal
dynamically. Combined with the previous Makefile bug that left
lint-generate-cdb as the default goal, invoking `make` with no arguments
produced an exponential fork bomb: lint-generate-cdb → this script →
bear -- make -jN → lint-generate-cdb → this script → …, with each level
forking up to \$(nproc) concurrent sub-makes under Bear's LD_PRELOAD
exec intercept. Spawned hundreds of `bear -- make -jN` processes before
grinding the system to a halt.

The Makefile default-goal fix in 7a1d639 closes the trigger, but the
script itself should also be robust against a future re-introduction of
the same hazard. Pass `build_src` explicitly as the target so the
inner make never re-enters lint-generate-cdb regardless of what the
default goal happens to be.
MysqlxConfigStore::load_from_runtime issues five fetch_result queries
in sequence and returns false (leaving the store empty) if any one
fails. One of those queries — SELECT FROM runtime_mysqlx_variables —
hit a table that create_test_db() never created, so every scenario
that depended on data actually being loaded silently short-circuited
before the swap.

Effect: eight assertions in mysqlx_config_store_pure_unit-t.cpp were
quietly failing under the pre-existing plan(25) — none of the tests
that exercised a loaded identity, route, or endpoint were actually
verifying the production code, because identities_/routes_/
hostgroup_endpoints_ were never populated.

Adding the DDL fixes all eight. No production changes; test harness
only.
…nto HEAD

# Conflicts:
#	plugins/mysqlx/src/mysqlx_session.cpp
#	test/tap/tests/unit/Makefile
#	test/tap/tests/unit/mysqlx_robustness_unit-t.cpp
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
9 Security Hotspots
E Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant