feat: MySQL X Protocol plugin (dynamically loaded)#5593
Open
renecannao wants to merge 130 commits intov3.0from
Open
feat: MySQL X Protocol plugin (dynamically loaded)#5593renecannao wants to merge 130 commits intov3.0from
renecannao wants to merge 130 commits intov3.0from
Conversation
Plugin start now opens TCP listeners for each active route in runtime_mysqlx_routes. An accept thread dispatches incoming connections to worker threads round-robin. Phase 1 workers accept-then-close; Tasks 7-8 will add X Protocol handshake and backend relay.
Compile mysqlx*.proto files from MySQL Connector-C 8.4.0 into C++ sources. The plugin .so now links -lprotobuf for X Protocol frame/message encoding and decoding needed by Tasks 7-8.
Add mysqlx_protocol.h/cpp with frame encode/decode, MYSQL41 scramble auth using EVP SHA1, and protobuf message helpers for X Protocol Error/Ok/Capabilities/Auth frames. Add mysqlx_frontend_session.h/cpp implementing the full handshake state machine: CapabilitiesGet→Capabilities, CapabilitiesSet→Ok, AuthenticateStart→challenge→AuthenticateContinue→AuthenticateOk. Supports MYSQL41 and PLAIN auth methods. Enforces x_enabled flag, rejects pass_through backend auth mode. Workers now run the frontend session instead of immediately closing connections.
Add MysqlxBackendSession with TCP connect, X Protocol MYSQL41 auth to backend, and bidirectional byte-level relay via select(). Implement round_robin and round_robin_with_fallback endpoint selection strategies in MysqlxConfigStore::pick_endpoint(). Workers now connect to the backend after frontend auth and relay traffic until close. Phase 1 is one frontend = one backend, no pooling.
Add MysqlxStatsStore with atomic route counters (conn_ok, conn_err, bytes_sent, bytes_recv) and flush_to_sqlite() for stats_mysqlx_routes. Register stats_mysqlx_routes and stats_mysqlx_processlist tables in the plugin admin schema. Wire stats recording into the worker: record_conn_ok on successful backend connect, record_conn_err on failure. Add topology generation capture at session bind time as the Phase 2 seam for GR notification-driven route invalidation.
1. SQL injection in stats flush: escape route names with sqlite_escape() 2. Timing attack in PLAIN auth: use CRYPTO_memcmp for constant-time password comparison 3. OOM via oversized frame: add MYSQLX_MAX_PAYLOAD_SIZE (16MB) check in mysqlx_read_frame before allocating payload buffer 4. Thread safety: add shared_mutex to MysqlxConfigStore, shared_lock for readers (resolve_identity, pick_endpoint), unique_lock for writer (load_from_runtime), separate rr_mutex_ for round-robin counter mutation
1. Replace select() with poll() in accept loop and relay to avoid FD_SETSIZE undefined behavior with high file descriptors 2. Make g_rr_index atomic for safe concurrent access 3. Add SO_RCVTIMEO/SO_SNDTIMEO (10s backend, 30s frontend) to prevent indefinite blocking from slow peers 4. Reject TLS CapabilitiesSet requests explicitly in Phase 1 5. Enforce require_tls per user — reject if set since Phase 1 does not implement TLS negotiation 6. Enforce allowed_auth_methods per user — check the comma-separated list before proceeding with auth
The session's identity-lookup callback now returns the full MysqlxResolvedIdentity struct from the config store rather than a stripped-down MysqlxCredentials. Password-hash derivation moves into the session via a private derive_stored_hash helper (handles both "*HEX40" and cleartext forms). No behavior change; the existing credential-lookup guard semantics are preserved so sessions without a lookup still follow the open-proxy path exercised by the robustness tests. mysqlx_robustness_unit-t passes unchanged (36/36).
Introduces MysqlxSession::resolve_backend_target(), a private method that translates the authenticated user's identity_->default_route into the concrete (target_hostgroup_, target_address_, target_port_) triple that handler_connecting_server needs to reach the backend. The method reads identity_->default_route, looks up the hostgroup + endpoint via the thread's MysqlxConfigStore, and returns 0 on success; on failure it returns 4000/4001/4002 (empty default_route / unknown route / no-backend respectively), emits an X-Protocol Error frame, records a stats miss via mysqlx_stats().record_conn_err(), and marks the session unhealthy. Why: fixes the design gap where sessions populated target_* fields only from a pool cache lookup and otherwise connected to "" on port 0 (see docs/superpowers/specs/2026-04-17-mysqlx-route-identity-design.md). The pre-Ok timing is deliberate — once the X-Protocol Ok frame is on the wire, a routing failure cannot be cleanly reported, so resolution must run before send_auth_ok(). Invariants preserved: resolve_backend_target() is not yet wired into the auth flow. Task 4 owns that step. This commit is therefore a pure addition — no production call site touches the new method; only the new unit tests exercise it, via test-only public accessors (inject_identity_for_test / resolve_backend_target_for_test / target_*_for_test). Mysqlx_Thread gains get_config_store() so the session can reach the store; MysqlxStatsStore gains reset_for_test() and get_last_conn_err_for_test() so the stats-recording contract can be asserted without a live SQLite stats DB. Out of scope: wiring resolve_backend_target() into handle_auth_plain and handler_auth_challenge_response (Task 4); removing the dormant worker path (Task 5); reconciling error codes 4000/4001/4002 with the project-wide ProxySQL error-code policy (follow-up commit). Test coverage (plan bumped +7, now 43 assertions): - test_routing_happy_path — rc=0, target fields populated - test_routing_no_default_route — rc=4000 - test_routing_unknown_route — rc=4001 - test_routing_no_backend — rc=4002 (route exists, 0 endpoints) - test_routing_stats_on_failure — (route,hg) tuples for all 3 modes - test_routing_unknown_user — identity_lookup nullopt => unhealthy Also updates mysqlx_session_unit-t / mysqlx_thread_unit-t / mysqlx_message_dispatch_unit-t / mysqlx_concurrent_unit-t Makefile rules to link mysqlx_config_store.cpp + mysqlx_stats.cpp, which were already required transitively by mysqlx_thread.cpp::resolve_identity() since Task 2 but had not been wired into the unit-test link set.
Both auth paths (PLAIN and MYSQL41) now call resolve_backend_target()
after credential verification and BEFORE send_auth_ok(). On a routing
failure (empty default_route -> 4000, unknown route -> 4001, no
backend -> 4002) the session emits an X-Protocol Error frame,
transitions to X_SESSION_CLOSING, and never puts an Ok frame on the
wire. The MYSQL41 branch also keeps to_process=true on failure so the
state machine drives the session to closed on the next handler tick.
Fixes the reported bug where sessions would complete auth with
target_address_ == "" and target_port_ == 0 on cache miss, then
attempt to connect to "":0 on the first client query.
Testing:
- New integration test test_plain_auth_empty_default_route_closes_session
drives PLAIN auth (TLS-marked client_ds) with an identity whose
default_route is empty; asserts session reaches X_SESSION_CLOSING,
is_healthy() is false, and the client sees a Mysqlx ERROR frame
(not an Ok).
- setup_authenticated_session now stands up a process-global
MysqlxConfigStore + Mysqlx_Thread with a "default_test_route"
fixture and pre-seeds the session's identity_ via the existing
inject_identity_for_test hook. Pre-seeding identity_ directly
(instead of set_identity_lookup) preserves the helper's existing
all-zero-scramble contract by keeping identity_lookup_ unset, so
the auth handler's credential verification is still skipped --
matching test_mysql41_no_credential_lookup_accepts_any's
open-proxy invariant while still supplying a resolvable route.
- test_mysql41_no_credential_lookup_accepts_any is updated the same
way: attached to the default test thread + pre-seeded identity.
Its "without identity_lookup, any scramble accepted" assertion is
unchanged.
- Plan bumped 43 -> 46; all 46 assertions pass.
Out of scope (future tasks): the dormant MysqlxWorker path, cache
key/resolver-key semantics, and listener wiring.
What:
- resolve_backend_target()'s !identity_ guard now calls
mysqlx_stats().record_conn_err("", 0) so the missing-identity path does
not silently drop observability.
- Stale "Not yet wired" comment in mysqlx_session.h replaced with a
current-state doc reflecting that resolve_backend_target() is called
from the auth handlers before send_auth_ok().
Why:
- Final review flagged that the spec's "all failure modes record stats"
contract wasn't held on the programming-error guard. Tuple ("", 0)
matches the empty-route 4000 case, which is acceptable aliasing since
both legitimately lack route context.
- Comment update prevents future confusion about whether resolve is
integrated (Task 4 is merged).
Out of scope: no behavior change, no test change. Assertion count
remains 46.
Adds two reference documents alongside the code commits so reviewers and future maintainers have design context next to implementation. - docs/superpowers/specs/2026-04-17-mysqlx-route-identity-design.md captures the problem statement, why user-driven routing was picked over listener-driven or hybrid, the three failure modes with their error codes, the architectural decision to widen the identity ABI to the existing MysqlxResolvedIdentity struct, and the explicit non-goals (listener-to-route mapping, worker-path removal, error code reconciliation, CI infrastructure). - docs/superpowers/plans/2026-04-17-mysqlx-route-identity.md captures the task breakdown (route_exists predicate, ABI refactor, resolve_backend_target method, auth-flow wiring) with exact file paths and step-by-step TDD guidance. Used to drive the five implementation commits on this branch. Out of scope: nothing — these are descriptive artifacts, not code.
What: removed `if (cnt == 0) continue;` in `sync_disk_to_memory` and `copy_to_runtime` in `plugins/mysqlx/src/mysqlx_plugin.cpp`. Also removed the now-dead `SELECT COUNT(*)` + result unpacking that those skip conditions depended on. Why: the original skip was meant as an optimization for "nothing to copy" but it was a correctness bug. If a user emptied a mysqlx table in `main` and saved to disk, on next restart `sync_disk_to_memory` would see disk count == 0, skip the replace, and leave stale rows in `main.*`. Same gap for main -> runtime. The BEGIN/DELETE/INSERT/COMMIT sequence is already atomic and correctly no-ops when the source is empty (DELETE dest, INSERT zero rows). Testing: added a unit test at `test/tap/tests/unit/mysqlx_robustness_unit-t.cpp` exercising the "empty source overwrites stale dest" invariant. Also added `mysqlx_config_store.cpp` to the `mysqlx_robustness_unit-t` link line so the test binary resolves `MysqlxConfigStore::resolve_identity` (a latent build gap uncovered while validating the fix). Assertion count 33 -> 35. Out of scope: no refactor of the inline atomic-replace sequence (its atomicity is pre-existing). No changes to the admin-schema `copy_table` path, which was already unconditional.
Deletes the mysqlx worker implementation and the TAP smoke test that was its only caller. Also removes the corresponding entries from the plugin Makefile, the unit-tests Makefile, and the CI groups manifest. Files removed: - plugins/mysqlx/src/mysqlx_worker.cpp - plugins/mysqlx/include/mysqlx_worker.h - test/tap/tests/test_mysqlx_listener_smoke-t.cpp References cleaned: - plugins/mysqlx/Makefile: dropped mysqlx_worker.cpp from SRCS - test/tap/tests/unit/Makefile: dropped test_mysqlx_listener_smoke-t from UNIT_TESTS and deleted its build rule - test/tap/groups/groups.json: removed the smoke-test entry from the unit-tests-g1 group Why: the worker was an earlier parallel implementation of the mysqlx session path (separate accept thread + worker queue + per-listener route tracking + identity.default_route -> pick_endpoint). It never reached any call site in the production proxysql binary. The only consumers of its public API were the smoke test and the worker code itself. The route-identity fix in PR #5641 used the worker's design as a reference and implemented the fix in the active session path (Mysqlx_Thread + MysqlxSession + MysqlxConfigStore::pick_endpoint). With that landed, the dormant code is pure tech debt — reviewers repeatedly mistook its logic for the active path. Why retire the smoke test at the same time: the test exercised mysqlx_start_listeners_from_runtime_routes() which was the worker's only entry function. With the worker gone, the smoke test cannot link; keeping it would produce a broken CI target. Out of scope: no changes to the active session path, no changes to the MysqlxConfigStore or admin-schema tables, no changes to any other tests. Pure deletion + Makefile/groups cleanup.
What: Stop rewrapping the raw backend fd in a fresh session-owned MysqlxDataStream after backend auth completes. Route every data-plane read/write (forward_to_backend, handler_waiting_server_msg, handler_session_reset_waiting, and the cached-conn attach path) through MysqlxConnection::backend_ds() instead. Why: The optional backend TLS handshake runs against the connection's backend_ds_ (see mysqlx_connection.cpp init_ssl_connect) and the resulting SSL* lives there. The prior session code discarded that SSL* by init()-ing a new plain MysqlxDataStream around the same raw fd, then used it for cleartext I/O on a socket where the server expects TLS frames. Flagged in the ProtocolX code review (item #2). Testing: mysqlx_robustness_unit-t picks up five new assertions that pin the invariant: MysqlxSession::server_ds() aliases backend_conn_->backend_ds() whenever a backend is attached, and falls back to a fd == -1 placeholder otherwise. Baseline 33, post-fix 38, all pass. Out of scope: - Connection cache semantics: cached connections keep their SSL state; MysqlxConnection::reset() already leaves backend_ds_ untouched so the invariant extends to pooled reuse. - MysqlxDataStream internals (untouched). - The dormant worker path (different PR). Pre-existing ripple: test/tap/tests/unit/Makefile mysqlx_robustness_unit-t link line was missing mysqlx_config_store.cpp, same gap other mysqlx tests have been closing as they get touched. Added it here so the test links.
…TIME Startup previously capped listener creation at `pool_size` routes via `ti < ctx.threads.size()` in mysqlx_plugin.cpp's startup loop, so any routes beyond the thread-pool size had no listener. Runtime route changes via `LOAD MYSQLX ROUTES TO RUNTIME` never touched the listener topology at all — adding, removing, or toggling a route's `active` flag had no effect until a full restart. Flagged in the ProtocolX code review as item #4. This commit: - Drops the `ti < pool_size` cap in the startup loop. - Distributes routes across threads round-robin (the original intent) using a shared plugin-scope `route_to_thread` map guarded by a mutex, with a `next_rr_index` cursor. - Adds route-name tracking to Mysqlx_Thread: a new parallel `listener_route_names_` vector alongside `listener_fds_` / `listener_addrs_`, plus a new `remove_listener_for_route(name)` method that closes the fd and prunes all three vectors. Returns true/false so callers can use it idempotently. - Adds a `mysqlx_reconcile_listeners(admindb)` desired-state reconciler. Reads active routes from `runtime_mysqlx_routes`, binds missing listeners round-robin, and removes listeners for routes that are gone or deactivated. Startup and `LOAD MYSQLX ROUTES TO RUNTIME` both go through this single path, so both agree. The reconciler is idempotent: re-running with the same desired set is a no-op. - The reconciliation core lives in a new file `plugins/mysqlx/src/mysqlx_listener_reconcile.cpp` as a pure helper `mysqlx_reconcile_listeners_impl(...)` taking state by parameter, so unit tests can drive it against a minimal fake context. The convenience wrapper that reads `mysqlx_context()` lives in plugin.cpp and is declared weak so tests of admin_schema.cpp that don't link plugin.cpp still link cleanly (admin_schema null-checks the weak pointer). Tests: mysqlx_robustness_unit-t grows from 33 to 38 assertions — four thread-API tests (add_listener with route name on two threads, remove_listener_for_route removes and is idempotent) plus one integration test that builds an in-memory admin DB with one route and asserts `mysqlx_reconcile_listeners_impl` binds exactly one listener and records the mapping. Also fixes the pre-existing unit-test Makefile link gap: four tests that include `mysqlx_thread.cpp` (which references `MysqlxConfigStore::resolve_identity`) now correctly link `mysqlx_config_store.cpp`, and the robustness test link line picks up `mysqlx_config_store.cpp` + `mysqlx_listener_reconcile.cpp` for the new coverage. Out of scope: switching the distribution strategy to SO_REUSEPORT / all-threads-on-all-routes (future design iteration); treating an in-place `active` flag toggle as anything other than remove + add.
…read assumption The listener reconciler built its desired snapshot keyed only by route name, so editing a route's `bind` column (e.g. from `:33061` to `:33062`) and running LOAD MYSQLX ROUTES TO RUNTIME left the old listener running and never opened the new port — the removal pass saw the name still in the desired set and skipped the route entirely. The removal pass now also compares the currently-bound `host:port` on the owning thread against the desired bind and treats any mismatch as a removal; the subsequent addition pass rebinds under the new spec. To support that comparison, `Mysqlx_Thread` now stores bind ports in a parallel `listener_ports_` vector alongside the existing `listener_addrs_` and exposes `get_listener_addr_for_route()` returning the canonical `"host:port"` form. Also added a comment documenting the pre-existing single-admin-thread invariant that justifies snapshotting the DB outside of `route_to_thread_mutex` (noted in code review, not a behavioral change). Testing: new unit-test assertions exercise a full bind-change reconcile end-to-end — pick two free ports, reconcile at port1, update the runtime row to port2, reconcile again, and assert the listener is now bound at port2 with total listener count still 1. 42 assertions pass (was 38). Out of scope: changing the concurrency model. The single-admin-thread assumption is pre-existing throughout ProxySQL admin execution.
…d connection The session's backend-connection setup site in handler_connecting_server now falls back to identity_->backend_username when that field is non-empty, and only reuses the frontend username_ when it is empty (the mapped-mode / default case). coderabbitai flagged on PR #5645 (outside-diff; the code originates in PR #5641): the pre-fix setup call always passed the frontend username_ to set_backend_user() while pairing it with the resolved password from identity_->backend_password. For mysqlx_users rows whose backend_auth_mode is `service_account` and whose backend_username differs from the frontend username, this mismatch paired userA's password with userB's name — backend auth failed with access-denied even though both columns were internally consistent. The route-identity refactor (earlier commits on this branch) widened the user lookup to return the full MysqlxResolvedIdentity struct, which already carries backend_username. This commit wires that existing field through at the backend-setup site; no store-side changes are required. See MysqlxBackendAuthMode in mysqlx_config_store.h for the full set of modes and their semantics. Testing: two new unit assertions covering both branches of the selector — mapped mode (empty backend_username falls back to frontend username_) and service_account mode (distinct backend_username is used verbatim). Assertion count moves from 46 to 48. A test-only getter MysqlxConnection::get_backend_user_for_test() was added so the test can observe the string that was ultimately passed to set_backend_user. Out of scope: enforcement of MysqlxBackendAuthMode semantics (for example, rejecting service_account rows that omit backend_username at load time). That validation belongs with the config-store loader and is tracked separately.
sync_disk_to_memory() and copy_to_runtime() in mysqlx_plugin.cpp now check every admindb.execute() return value. On BEGIN / DELETE / INSERT / COMMIT failure the code issues a best-effort ROLLBACK, logs via services->log_message, and skips the current table. Both functions return false if any table's transaction failed, true if every replace succeeded. coderabbitai flagged this on #5642, #5643, and #5644 (outside-diff because the file appears in each PR's surrounding context). Without the return checks, a failed INSERT landing between a successful DELETE and the unconditional COMMIT silently wiped routes / users / endpoints / backend_endpoints — the transaction wrap advertised atomicity it did not deliver. The fix restores that guarantee. The shared BEGIN/DELETE/INSERT/COMMIT body is now a local helper replace_table_atomically() that takes a ProxySQL_PluginServices* so it can log via the existing plugin log_message callback; the two callers reduce to four-line loops over their table lists. mysqlx_start passes ctx.services through; its return values remain advisory (log-and-continue on a false return), intentionally — plugins shouldn't block proxysql startup on one table's sync failure. Upgrading to fail-fast is a future decision. Testing: added test_insert_failure_rolls_back in test/tap/tests/unit/mysqlx_robustness_unit-t.cpp. It seeds a dst table with a CHECK (id %% 2 = 0) constraint plus pre-existing rows, points src at odd ids, runs the atomic replace, and asserts (a) the function returns false, (b) dst still holds its original rows, (c) the row sum matches the pre-transaction state. plan(35) -> plan(39). All 39 assertions pass. Out of scope: the Makefile link-line fix from 82fe27f still applies; no further ripple.
Addresses the coderabbitai outside-diff finding originally surfaced on PR #5641: the previous check_connect() conflated poll() errors with timeouts, never retried on EINTR, did not check getsockopt()'s return, and accepted POLLNVAL / POLLERR / POLLHUP as though they were normal "not ready" states. The practical failure mode was slow and misleading: a fatal poll error or a bad descriptor was masked as "not ready yet" until the outer connect_timeout_ms_ (default 10s) elapsed, at which point the session reported a generic "connect failed" reason rather than the actual cause. A concurrent signal (EINTR) would extend that masked window further. Meanwhile, a failing getsockopt() with an uninspected return would leave `err` at its initializer value of 0 and optimistically transition the state machine to AUTHENTICATING, so the very next read or write on the invalid fd would fail in a less diagnostic path. What changed: - poll() is now retried on EINTR. - poll() returning -1 sets ERROR_STATE and returns -1 immediately. - POLLNVAL / POLLERR / POLLHUP in revents are treated as hard errors even if POLLOUT is also set, since the underlying socket is already in a terminal async condition. - getsockopt() return is checked; a nonzero return is now a hard error rather than a silent state advance. Testing: three new assertions in mysqlx_robustness_unit-t.cpp exercise the bad-fd error path, the happy-path connect, and the not-ready timeout path. Plan count 33 → 42. Out of scope: other poll() sites in the mysqlx plugin (the unit test helpers at the top of mysqlx_robustness_unit-t.cpp have their own poll() loops but they aren't on the backend-connect critical path); the connect_timeout_ms_ check itself, which is unchanged. Pre-existing ripple: same Makefile link-gap fix as earlier PRs — adds mysqlx_config_store.cpp to the mysqlx_robustness_unit-t link line so MysqlxConfigStore::resolve_identity() resolves when the binary links.
The removed test case assumed that calling poll(POLLOUT) on a fresh non-blocking socket (one that never had connect() called) would return 0 (no POLLOUT) and drive check_connect() through the "not ready yet" branch. On Linux, a fresh unconnected socket has an empty send buffer so POLLOUT fires immediately; getsockopt() then returns SO_ERROR=0 and check_connect() legitimately transitions to AUTHENTICATING. The production code is correct — the test premise was wrong. A deterministic "not ready" scenario would require an in-progress connect() to a blackhole or filtered endpoint, which is either flaky or slow to drive in a unit test. The "not ready" path (pr == 0 branch returning 1) is trivial by inspection and exercised in practice by real backend connects where the TCP handshake is pending. Kept the two genuine tests: bad-fd error path and happy-path connect. Plan count 42 -> 39.
chore(mysqlx): retire dormant MysqlxWorker path and its smoke test
fix(mysqlx): sync empty source tables to overwrite stale rows
fix(mysqlx): harden check_connect() poll and getsockopt handling
…' into HEAD # Conflicts: # test/tap/tests/unit/mysqlx_robustness_unit-t.cpp
…nto HEAD # Conflicts: # plugins/mysqlx/Makefile # plugins/mysqlx/src/mysqlx_plugin.cpp # test/tap/tests/unit/Makefile # test/tap/tests/unit/mysqlx_robustness_unit-t.cpp
Add g_plugin_lifecycle_mutex to serialize load/start/stop transitions so two reload paths cannot race on g_registry_target and the registration-failure globals. g_active_plugin_manager_mutex remains the pointer-read guard for the dispatch path; the new mutex is held only for the duration of a lifecycle transition. In proxysql_stop_configured_plugins, always call manager.reset() so the .so is unmapped and no stale function pointers remain reachable, even when stop_all() reports a failure. stop_all() is idempotent across failure (each plugin is marked stopped after one attempt) so the destructor's stop_all() will be a no-op. Also fix the log-level table in PLUGIN_API.md: numeric levels are 3 (Error), 4 (Warning), and any-other (Info), matching the internal proxy_* severity scheme — not 1/2/3 as previously documented.
Detect the installed libprotobuf via pkg-config at configure time and fail fast unless it is 3.x. The vendored .pb.cc/.pb.h were generated with protoc 3.21.12, and the plugin links dynamically against the system libprotobuf — which is ABI-compatible only within the 3.x major version (4.x released with an incompatible ABI and a changed SONAME). Without this check, a .so that linked cleanly would crash the first time a virtual dispatched into the proto runtime. Add -fvisibility=hidden and -fvisibility-inlines-hidden so only the explicitly extern "C"-declared proxysql_plugin_descriptor_v1 entry point is exported. Prevents ODR collisions with the proxysql core when the .so is dlopen'd, and stops template instantiations from leaking across the boundary. Add -fstack-protector-strong unconditionally, and -D_FORTIFY_SOURCE=2 when not building under ASAN and when OPTZ is not -O0 (both conditions are incompatible with FORTIFY_SOURCE). Install the built ProxySQL_MySQLX_Plugin.so to /usr/lib/proxysql/plugins/ from the top-level Makefile install target, with matching cleanup in uninstall. Previously the plugin was built but never staged into a system location, so `make install` produced a proxysql binary that couldn't find it.
mysqlx_connection.cpp:
Drain leading NOTICE frames in read_auth_frame() instead of returning
nullopt on the first NOTICE. MySQL backends commonly emit a
session-state-change notice before AuthenticateContinue or Ok, and
returning nullopt caused the auth state machine to spin on try-read
for the full 10s handshake timeout before completing. The two callers
(step_auth_capabilities_get_sent and step_auth_capabilities_set_sent)
now use the shared helper and drop their duplicated NOTICE checks.
Also added a frame-size guard before reading the message-type byte.
mysqlx_data_stream.{h,cpp}:
Add close_and_reset() which tears down SSL/BIO state and clears every
read/write buffer and parse flag without close()ing the fd. Required
by mysqlx_session.cpp's return_backend_to_pool(), where the fd is
owned by the pooled MysqlxConnection and must stay open after the
data stream is wiped. Fix SSL_read return handling: a 0-return is a
clean TLS shutdown (close_notify) and must surface as a connection
close, not as a WANT_IO/retry. The previous code treated 0 and <0
identically and would loop forever on a cleanly-closed TLS peer.
mysqlx_protocol.cpp:
mysqlx_build_frame now rejects serialized payloads at the uint32
boundary so the +1 for the message-type byte cannot wrap to 0. This
mirrors the X_MAX_PAYLOAD_SIZE clamp already applied by the inbound
parser in MysqlxDataStream.
mysqlx_stats.cpp:
Rewrite the stats_mysqlx_routes INSERT builder to use std::string
concatenation instead of a fixed 1024-byte snprintf buffer. Long
route names plus escaping could overflow the buffer and the row was
silently dropped without reaching the statsdb.
Session auth previously accepted any client when credential_lookup_ was unset — an open-proxy fallback that turned a wiring bug into a silent security hole. Both PLAIN (handle_auth_plain) and MYSQL41 (handler_auth_challenge_response) now hard-reject with 1045 when no lookup is installed. A credential_lookup is always wired from mysqlx_thread.cpp via the config store, so there is no legitimate code path that reaches auth without one. Also tightened the credential check: password_hash must be exactly 20 bytes (MYSQL41 hash length) rather than simply non-empty, which could previously let a short or garbage hash pass CRYPTO_memcmp's min()-clamped comparison. In return_backend_to_pool(), replace the fresh-construct move-assign over server_ds_ with server_ds_.close_and_reset(). The old code constructed a new MysqlxDataStream (fd=-1, no SSL) and moved it in, whose destructor then close()d the fd that had just been returned to the pooled MysqlxConnection — silently corrupting the pool entry. close_and_reset() releases SSL/buffers without touching the fd. Unit test mysqlx_robustness_unit-t.cpp: setup_authenticated_session now installs a real credential_lookup with a known password, drives the server's AuthenticateContinue frame to extract the challenge, and computes the MYSQL41 scramble reply. This matches the new reality that auth requires a valid lookup. The previously-misnamed test_mysql41_no_credential_lookup_ accepts_any is renamed _rejects and asserts that without a lookup the session goes unhealthy and does not reach WAITING_CLIENT_XMSG.
process_all_sessions previously forced sess->to_process=true on every tick and unconditionally called sess->handler(), burning CPU at large idle session counts (one full state-machine traversal per session per loop iteration, regardless of whether anything had changed). Now only call handler() when at least one of these is true: - a poll event landed on the client or server data stream - the session self-flagged to_process (handler wants to re-run) - a complete frame is already buffered on either stream Also make Mysqlx_Thread::sessions_mutex_ mutable and take it in get_session_count() const. Previously const accessors that needed to lock the mutex couldn't — and the session count was read without the lock at all, racing the writer that appends/removes sessions.
CI-mysqlx.yml previously only ran the test_mysqlx_e2e_* binaries and hard-coded the MYSQLX_E2E_* variables inline. That missed the integration smoke tests (test_mysqlx_admin_tables, test_mysqlx_listener_smoke, test_mysqlx_plugin_load) and meant the group's env.sh was never consulted, so MYSQLX_E2E_PROXYSQL_PORT was unset and test_mysqlx_e2e_routing-t silently skipped. Now the step sources test/tap/groups/mysqlx-e2e/env.sh and iterates every test_mysqlx_*-t binary. Errors out if the glob matches nothing, so a future rename that quietly elides every mysqlx test doesn't pass a green CI. env.sh: use the mysqlx_test / mysqlx_test credentials provisioned by setup-infras.bash, not root with empty password. setup-infras.bash: return exit 1 when X Protocol isn't reachable on 33060 rather than exit 0, so CI actually fails instead of running every test against a missing backend. pre-cleanup.bash: derive the sandbox version from the msb_*/ directory name rather than hardcoding 8.4.8. setup-infras.bash uses --newest so the patch level drifts with upstream MySQL releases.
The Makefile had `.DEFAULT: default` intending to make the `default` target (which builds the project) the fallback when `make` is invoked with no arguments. `.DEFAULT` is a different special target — it provides a fallback *recipe* for targets with no rule, and has no effect on default goal selection. GNU make picks the first real target as the default goal unless `.DEFAULT_GOAL` is set. The first real target in this Makefile is `lint-generate-cdb` (line 19), so `make` with no arguments ran `scripts/lint/generate-compile-commands.sh`, which does `bear -- make -j\$(nproc)` — a full rebuild of the project under Bear's compile-intercept wrappers. Each C++ compile under Bear spawned a wrapper process, producing hundreds of bear processes per plain `make` invocation and burying the intended default behaviour. Replace `.DEFAULT: default` with `.DEFAULT_GOAL := default` so `make` with no arguments runs `default: build_src` as originally intended.
generate-compile-commands.sh wrapped the top-level build with bear -- make -j\$(nproc) with no explicit target, so the inner make resolved its default goal dynamically. Combined with the previous Makefile bug that left lint-generate-cdb as the default goal, invoking `make` with no arguments produced an exponential fork bomb: lint-generate-cdb → this script → bear -- make -jN → lint-generate-cdb → this script → …, with each level forking up to \$(nproc) concurrent sub-makes under Bear's LD_PRELOAD exec intercept. Spawned hundreds of `bear -- make -jN` processes before grinding the system to a halt. The Makefile default-goal fix in 7a1d639 closes the trigger, but the script itself should also be robust against a future re-introduction of the same hazard. Pass `build_src` explicitly as the target so the inner make never re-enters lint-generate-cdb regardless of what the default goal happens to be.
MysqlxConfigStore::load_from_runtime issues five fetch_result queries in sequence and returns false (leaving the store empty) if any one fails. One of those queries — SELECT FROM runtime_mysqlx_variables — hit a table that create_test_db() never created, so every scenario that depended on data actually being loaded silently short-circuited before the swap. Effect: eight assertions in mysqlx_config_store_pure_unit-t.cpp were quietly failing under the pre-existing plan(25) — none of the tests that exercised a loaded identity, route, or endpoint were actually verifying the production code, because identities_/routes_/ hostgroup_endpoints_ were never populated. Adding the DDL fixes all eight. No production changes; test harness only.
…nto HEAD # Conflicts: # plugins/mysqlx/src/mysqlx_session.cpp # test/tap/tests/unit/Makefile # test/tap/tests/unit/mysqlx_robustness_unit-t.cpp
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.




Summary
Adds a MySQL X Protocol plugin to ProxySQL, enabling it to accept, authenticate, and route connections using MySQL's X Protocol (port 33060). The plugin is built as a dynamically loaded
.so— no core engine modifications required for the plugin itself.This is the first protocol plugin for ProxySQL, establishing the plugin infrastructure (
ProxySQL_PluginManager) that future plugins can reuse.61 commits, 80 files changed, ~34,500 lines added.
What's Included
Plugin Infrastructure (Core Changes)
ProxySQL_PluginManager— dlopen/dlsym-based plugin loader with lifecycle management (load → start → stop)ProxySQL_PluginC ABI — stable interface header for third-party plugin developmentdispatch_plugin_admin_commandroutes plugin-specific SQL commandsplugin_modulesfield in ProxySQL config for declaring pluginsmain.cpp— LoadConfiguredPlugins / StartConfiguredPlugins / StopConfiguredPluginsMySQL X Protocol Plugin (
plugins/mysqlx/).pb.cc/.pb.hfrom MySQL X Protocol.protodefinitions (committed, not generated at build time)MYSQL41andPLAINauthenticationmysqlx_users,mysqlx_routes,mysqlx_listeners,mysqlx_variables,runtime_mysqlx_*,stats_mysqlx_*)Admin Commands
Standard ProxySQL syntax, completely separate from MySQL commands:
Testing (292+ assertions across 16 test files)
fake_plugin.cpp+test_globals.h+test_init.hfor unit testing plugin codeCI
.github/workflows/CI-mysqlx.yml— GitHub Actions workflow with unit-tests and e2e-tests jobstest/tap/groups/mysqlx-e2e/Documentation
doc/mysqlx/README.md— Reference manual (tables, commands, auth, routing, quick start)doc/mysqlx/ARCHITECTURE.md— Architecture with ASCII diagrams, data flows, thread model, ABI contractdoc/mysqlx/TESTING.md— Testing documentation with coverage mapdocs/superpowers/specs/— Design spec + canonical commands designSecurity Hardening (from Code Review)
CRYPTO_memcmpgetaddrinfo(nogethostbyname)TCP_NODELAYon all X Protocol socketsNot in Scope (Phase 2)
mysqlx_*admin tablesHow to Test Locally
Summary by CodeRabbit
New Features
New Admin Commands
Documentation
Infrastructure
Chores