Skip to content

Codegen Bistream Optimization#20

Merged
jr1221 merged 33 commits intomainfrom
test-jinja-bitmasking-stuff
Feb 24, 2026
Merged

Codegen Bistream Optimization#20
jr1221 merged 33 commits intomainfrom
test-jinja-bitmasking-stuff

Conversation

@bjackson312006
Copy link
Copy Markdown
Contributor

@bjackson312006 bjackson312006 commented Jan 20, 2026

This PR replaces the codegen systsem's use of the bitstream API with direct bitmasking/bitshifting operations that are optimized to the structure of the message.

Things that were changed:

  • Use of the bitstream API was removed and replaced with direct bitmasking/bitshifting operations
  • For bitstream messages, endian_swap() was replaced with __builtin_bswap16, __builtin_bswap32, and __builtin_bswap64 (jinja automatically determines which ones to use based on message size).

Things that were not changed:

  • Messages with use_struct = true shouldn't be modified at all in this PR (i.e., that side of the codegen should work exactly the same)
  • endian_swap() is still used for struct messages (instead of the builtins).
  • The rules of use_struct haven't changed (all messages default to using bitstream, but if a single point in a message has endianness = 'little', that message will use struct instead of bitstream.

More in-depth explanation of PR

TLDR: Since we have codegen now, we don't really need to use the bitstream API for formatting non-byte-aligned messages. We can just have jinja generate the manual bitshifting/bitmasking code to format the messages. This is probably more performant:

  • The bitstream API uses a for loop to add each bit to the buffer individually
  • But w/ this new method, all the masks and bitshift amounts are constant at compile-time.

Note: I don't actually know if this is more performant (it technically should be, but the compiler might be smart enough to optimize away the bitstream for loops or something). Also it might not even matter/be noticeable if the bitstream stuff was already pretty fast.

The newly-generated functions are definitely a bit uglier to look at, but I think it would be easy enough to add _handle_unsigned_overflow() and _handle_signed_overflow() functions to the codegen to remove redundancy a bit.

Testing

I haven't tested this on a board yet, but I have on my laptop (since I can still check how the message buffers look). I've basically just been testing the newly-generated message functions against the old bitstream-based message functions. So far, they've produced messages in the same formats. I'll comment with the functions I've tested so far on my laptop (since this PR will be too long if I add them here).

UPDATE as of 2/22/2026:
I've now tested on the board. I'm testing with the VCU test message to make sure there aren't any differences between this system and the current system.

  • The message has this structure: uint8_t send_vcu_test_message (uint8_t three_bits,float float_value,uint8_t five_bits,uint16_t sixteen_bits,int8_t signed_8_bits).
  • I'm passing in these parameters for all tests: send_vcu_test_message(7, 19.342, 30, 13942, -122);

Here's my MQTT output with the current system:
image
^^ Consistent with what's been passed into the message.

Here's my MQTT output with this new system (i.e., what's been added in this PR):
image
^^ Consistent with what's been passed into the message (also, consistent with what's being returned by the current system)

@bjackson312006 bjackson312006 self-assigned this Jan 20, 2026
@bjackson312006
Copy link
Copy Markdown
Contributor Author

Test: VCU Shutdown Pins Message

/* Newly-generated one */
ret_t send_shutdown_pins(bool bms_gpio, bool bots_gpio, bool spare_gpio, bool bspd_gpio, bool hv_c, bool hvd_gpio, bool imd_gpio, bool ckpt_gpio, bool inertia_sw_gpio, bool tsms_gpio, uint8_t UNUSED)
{
    ret_t ret = { 0 };

    uint16_t data = 0;
    uint32_t bms_gpio_i = (uint32_t)(bms_gpio);
    if (bms_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point bms_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        bms_gpio_i = 1;
    }
    data |= ((bms_gpio_i) & 0x1ULL) << 15;

    uint32_t bots_gpio_i = (uint32_t)(bots_gpio);
    if (bots_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point bots_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        bots_gpio_i = 1;
    }
    data |= ((bots_gpio_i) & 0x1ULL) << 14;

    uint32_t spare_gpio_i = (uint32_t)(spare_gpio);
    if (spare_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point spare_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        spare_gpio_i = 1;
    }
    data |= ((spare_gpio_i) & 0x1ULL) << 13;

    uint32_t bspd_gpio_i = (uint32_t)(bspd_gpio);
    if (bspd_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point bspd_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        bspd_gpio_i = 1;
    }
    data |= ((bspd_gpio_i) & 0x1ULL) << 12;

    uint32_t hv_c_i = (uint32_t)(hv_c);
    if (hv_c_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point hv_c! Capping point to its max value (point is 1 bits, so max_value=1).");
        hv_c_i = 1;
    }
    data |= ((hv_c_i) & 0x1ULL) << 11;

    uint32_t hvd_gpio_i = (uint32_t)(hvd_gpio);
    if (hvd_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point hvd_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        hvd_gpio_i = 1;
    }
    data |= ((hvd_gpio_i) & 0x1ULL) << 10;

    uint32_t imd_gpio_i = (uint32_t)(imd_gpio);
    if (imd_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point imd_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        imd_gpio_i = 1;
    }
    data |= ((imd_gpio_i) & 0x1ULL) << 9;

    uint32_t ckpt_gpio_i = (uint32_t)(ckpt_gpio);
    if (ckpt_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point ckpt_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        ckpt_gpio_i = 1;
    }
    data |= ((ckpt_gpio_i) & 0x1ULL) << 8;

    uint32_t inertia_sw_gpio_i = (uint32_t)(inertia_sw_gpio);
    if (inertia_sw_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point inertia_sw_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        inertia_sw_gpio_i = 1;
    }
    data |= ((inertia_sw_gpio_i) & 0x1ULL) << 7;

    uint32_t tsms_gpio_i = (uint32_t)(tsms_gpio);
    if (tsms_gpio_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point tsms_gpio! Capping point to its max value (point is 1 bits, so max_value=1).");
        tsms_gpio_i = 1;
    }
    data |= ((tsms_gpio_i) & 0x1ULL) << 6;

    uint32_t UNUSED_i = (uint32_t)(UNUSED);
    if (UNUSED_i > 63ULL)
    {
        PRINTLN_WARNING("Overflow occured for point UNUSED! Capping point to its max value (point is 6 bits, so max_value=63).");
        UNUSED_i = 63;
    }
    data |= ((UNUSED_i) & 0x3FULL) << 0;

    uint16_t data_bigendian = __builtin_bswap16(data);
    memcpy(ret.data, &data_bigendian, 2);

    return ret;
}

/* Old bitstream-based one */
ret_t send_shutdown_pins_2
(bool bms_gpio,bool bots_gpio,bool spare_gpio,bool bspd_gpio,bool hv_c,bool hvd_gpio,bool imd_gpio,bool ckpt_gpio,bool inertia_sw_gpio,bool tsms_gpio,uint8_t UNUSED)
{   
    ret_t ret = { 0 };

    bitstream_t shutdown_pins_msg;
	uint8_t bitstream_data[2];
	bitstream_init(&shutdown_pins_msg, bitstream_data, 2);
	
    bitstream_add(&shutdown_pins_msg, bms_gpio, 1);
    bitstream_add(&shutdown_pins_msg, bots_gpio, 1);
    bitstream_add(&shutdown_pins_msg, spare_gpio, 1);
    bitstream_add(&shutdown_pins_msg, bspd_gpio, 1);
    bitstream_add(&shutdown_pins_msg, hv_c, 1);
    bitstream_add(&shutdown_pins_msg, hvd_gpio, 1);
    bitstream_add(&shutdown_pins_msg, imd_gpio, 1);
    bitstream_add(&shutdown_pins_msg, ckpt_gpio, 1);
    bitstream_add(&shutdown_pins_msg, inertia_sw_gpio, 1);
    bitstream_add(&shutdown_pins_msg, tsms_gpio, 1);
    bitstream_add(&shutdown_pins_msg, UNUSED, 6);

    //handle_bitstream_overflow(&shutdown_pins_msg, msg.id);
    
    memcpy(ret.data, &bitstream_data, sizeof(bitstream_data));

    return ret;
}

int main(void) {
    const bool bms_gpio = true;
    const bool bots_gpio = false;
    const bool spare_gpio = true;
    const bool bspd_gpio = true;
    const bool hv_c = true;
    const bool hvd_gpio = true;
    const bool imd_gpio = false;
    const bool ckpt_gpio = true;
    const bool inertia_sw_gpio = true;
    const bool tsms_gpio = false;
    uint8_t UNUSED = 0;

    ret_t ret1 = send_shutdown_pins(bms_gpio, bots_gpio, spare_gpio, bspd_gpio, hv_c, hvd_gpio, imd_gpio, ckpt_gpio, inertia_sw_gpio, tsms_gpio, UNUSED);
    ret_t ret2 = send_shutdown_pins_2(bms_gpio, bots_gpio, spare_gpio, bspd_gpio, hv_c, hvd_gpio, imd_gpio, ckpt_gpio, inertia_sw_gpio, tsms_gpio, UNUSED);

    print_ret_binary(&ret1);
    print_ret_binary(&ret2);

    return 0;
}

Output:

Binary: 10111101 10000000 00000000 00000000 00000000 00000000 00000000 00000000 
Binary: 10111101 10000000 00000000 00000000 00000000 00000000 00000000 00000000 

(so they're the same)

@bjackson312006
Copy link
Copy Markdown
Contributor Author

Test: VCU Car State Message

/* Old bitstream one */
ret_t send_car_state
(bool home_mode,uint8_t nero_index,int32_t car_speed,bool tsms,uint32_t torque_limit_percentage,bool reverse,uint16_t regen_limit,bool launch_control)
{
    ret_t ret = { 0 };

    bitstream_t car_state_msg;
	uint8_t bitstream_data[6];
	bitstream_init(&car_state_msg, bitstream_data, 6);
	
    bitstream_add(&car_state_msg, home_mode, 4);
    bitstream_add(&car_state_msg, nero_index, 4);
    bitstream_add_signed(&car_state_msg, car_speed*10, 16);
    bitstream_add(&car_state_msg, tsms, 1);
    bitstream_add(&car_state_msg, torque_limit_percentage*100, 7);
    bitstream_add(&car_state_msg, reverse, 1);
    bitstream_add(&car_state_msg, regen_limit, 10);
    bitstream_add(&car_state_msg, launch_control, 1);
    
    memcpy(ret.data, &bitstream_data, 6);

    return ret;
}

/* New one */
ret_t send_car_state_2(bool home_mode, uint8_t nero_index, int32_t car_speed, bool tsms, uint32_t torque_limit_percentage, bool reverse, uint16_t regen_limit, bool launch_control)
{
    ret_t ret = { 0 };

    uint64_t data = 0;
    uint32_t home_mode_i = (uint32_t)(home_mode);
    if (home_mode_i > 15ULL)
    {
        PRINTLN_WARNING("Overflow occured for point home_mode! Capping point to its max value (point is 4 bits, so max_value=15).");
        home_mode_i = 15;
    }
    data |= ((home_mode_i) & 0xFULL) << 60;

    uint32_t nero_index_i = (uint32_t)(nero_index);
    if (nero_index_i > 15ULL)
    {
        PRINTLN_WARNING("Overflow occured for point nero_index! Capping point to its max value (point is 4 bits, so max_value=15).");
        nero_index_i = 15;
    }
    data |= ((nero_index_i) & 0xFULL) << 56;

    int32_t car_speed_i = (int32_t)(car_speed * 10);
    if (car_speed_i > 32767)
    {
        PRINTLN_WARNING("Overflow occured for point car_speed! Capping point to its max value (point is 16 bits signed, so max_value=32767).");
        car_speed_i = 32767;
    }
    else if (car_speed_i < -32768)
    {
        PRINTLN_WARNING("Underflow occured for point car_speed! Capping point to its min value (point is 16 bits signed, so min_value=-32768).");
        car_speed_i = -32768;
    }
    data |= ((uint32_t)(car_speed_i) & 0xFFFFULL) << 40;

    uint32_t tsms_i = (uint32_t)(tsms);
    if (tsms_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point tsms! Capping point to its max value (point is 1 bits, so max_value=1).");
        tsms_i = 1;
    }
    data |= ((tsms_i) & 0x1ULL) << 39;

    uint32_t torque_limit_percentage_i = (uint32_t)(torque_limit_percentage * 100);
    if (torque_limit_percentage_i > 127ULL)
    {
        PRINTLN_WARNING("Overflow occured for point torque_limit_percentage! Capping point to its max value (point is 7 bits, so max_value=127).");
        torque_limit_percentage_i = 127;
    }
    data |= ((torque_limit_percentage_i) & 0x7FULL) << 32;

    uint32_t reverse_i = (uint32_t)(reverse);
    if (reverse_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point reverse! Capping point to its max value (point is 1 bits, so max_value=1).");
        reverse_i = 1;
    }
    data |= ((reverse_i) & 0x1ULL) << 31;

    uint32_t regen_limit_i = (uint32_t)(regen_limit);
    if (regen_limit_i > 1023ULL)
    {
        PRINTLN_WARNING("Overflow occured for point regen_limit! Capping point to its max value (point is 10 bits, so max_value=1023).");
        regen_limit_i = 1023;
    }
    data |= ((regen_limit_i) & 0x3FFULL) << 21;

    uint32_t launch_control_i = (uint32_t)(launch_control);
    if (launch_control_i > 1ULL)
    {
        PRINTLN_WARNING("Overflow occured for point launch_control! Capping point to its max value (point is 1 bits, so max_value=1).");
        launch_control_i = 1;
    }
    data |= ((launch_control_i) & 0x1ULL) << 20;

    uint64_t data_bigendian = __builtin_bswap64(data);
    memcpy(ret.data, &data_bigendian, 8);

    return ret;
}

int main(void) {
    const bool home_mode = false;
    const uint8_t nero_index = 1;
    int32_t car_speed = 14;
    bool tsms = false;
    uint32_t torque_limit_percentage = 13;
    bool reverse = false;
    uint16_t regen_limit = 1423;
    bool launch_control = false;

    ret_t ret1 = send_car_state(home_mode, nero_index, car_speed, tsms, torque_limit_percentage, reverse, regen_limit, launch_control);
    ret_t ret2 = send_car_state_2(home_mode, nero_index, car_speed, tsms, torque_limit_percentage, reverse, regen_limit, launch_control);

    print_ret_binary(&ret1);
    print_ret_binary(&ret2);

    return 0;
}

Output:

Binary: 00000001 00000000 10001100 01111111 01111111 11100000 00000000 00000000 
Binary: 00000001 00000000 10001100 01111111 01111111 11100000 00000000 00000000 

(they're the same, and they both handled the overflow and signed points the same)

@bjackson312006
Copy link
Copy Markdown
Contributor Author

Test: VCU Pedal Voltages Message

/* New one */
ret_t send_pedal_sensor_voltages(float accel1_volts, float accel2_volts, float brake1_volts, float brake2_volts)
{
    ret_t ret = { 0 };

    uint64_t data = 0;
    uint32_t accel1_volts_i = (uint32_t)(accel1_volts * 100);
    if (accel1_volts_i > 65535ULL)
    {
        PRINTLN_WARNING("Overflow occured for point accel1_volts! Capping point to its max value (point is 16 bits, so max_value=65535).");
        accel1_volts_i = 65535;
    }
    data |= ((accel1_volts_i) & 0xFFFFULL) << 48;

    uint32_t accel2_volts_i = (uint32_t)(accel2_volts * 100);
    if (accel2_volts_i > 65535ULL)
    {
        PRINTLN_WARNING("Overflow occured for point accel2_volts! Capping point to its max value (point is 16 bits, so max_value=65535).");
        accel2_volts_i = 65535;
    }
    data |= ((accel2_volts_i) & 0xFFFFULL) << 32;

    uint32_t brake1_volts_i = (uint32_t)(brake1_volts * 100);
    if (brake1_volts_i > 65535ULL)
    {
        PRINTLN_WARNING("Overflow occured for point brake1_volts! Capping point to its max value (point is 16 bits, so max_value=65535).");
        brake1_volts_i = 65535;
    }
    data |= ((brake1_volts_i) & 0xFFFFULL) << 16;

    uint32_t brake2_volts_i = (uint32_t)(brake2_volts * 100);
    if (brake2_volts_i > 65535ULL)
    {
        PRINTLN_WARNING("Overflow occured for point brake2_volts! Capping point to its max value (point is 16 bits, so max_value=65535).");
        brake2_volts_i = 65535;
    }
    data |= ((brake2_volts_i) & 0xFFFFULL) << 0;

    uint64_t data_bigendian = __builtin_bswap64(data);
    memcpy(ret.data, &data_bigendian, 8);

    return ret;
}

/* Old bitstream one */
ret_t send_pedal_sensor_voltages_2
(float accel1_volts,float accel2_volts,float brake1_volts,float brake2_volts)
{
    ret_t ret = { 0 };

    bitstream_t pedal_sensor_voltages_msg;
	uint8_t bitstream_data[8];
	bitstream_init(&pedal_sensor_voltages_msg, bitstream_data, 8);
	
    bitstream_add(&pedal_sensor_voltages_msg, accel1_volts*100, 16);
    bitstream_add(&pedal_sensor_voltages_msg, accel2_volts*100, 16);
    bitstream_add(&pedal_sensor_voltages_msg, brake1_volts*100, 16);
    bitstream_add(&pedal_sensor_voltages_msg, brake2_volts*100, 16);
    
    memcpy(ret.data, &bitstream_data, 8);

    return ret;
}

int main(void) {
    const float accel1_volts = 2.4f;
    const float accel2_volts = 2.5f;
    const float brake1_volts = 1.1f;
    const float brake2_volts = 1.3f;

    ret_t ret1 = send_pedal_sensor_voltages(accel1_volts, accel2_volts, brake1_volts, brake2_volts);
    ret_t ret2 = send_pedal_sensor_voltages_2(accel1_volts, accel2_volts, brake1_volts, brake2_volts);

    print_ret_binary(&ret1);
    print_ret_binary(&ret2);

    return 0;
}

Output:

Binary: 00000000 11110000 00000000 11111010 00000000 01101110 00000000 10000010 
Binary: 00000000 11110000 00000000 11111010 00000000 01101110 00000000 10000010 

(both the same)

@bjackson312006 bjackson312006 marked this pull request as ready for review January 21, 2026 00:13
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modernizes the CAN message encoder code generation by replacing the generic bitstream API with optimized direct bit manipulation operations. The change aims to improve performance by using compile-time constant masks and bit shifts instead of runtime loops.

Changes:

  • Replaced bitstream API with direct bitmasking/bitshifting for non-byte-aligned messages
  • Replaced endian_swap() with GCC built-in byte swap functions (__builtin_bswap16/32/64)
  • Added new msg_size_bits macro to compute total message size in bits

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File Description
code-gen/templates/macros.j2 Added u_tx_debug.h import and new msg_size_bits macro for bit-level size calculation
code-gen/templates/encoders_old.c.j2 Preserved old bitstream-based implementation as reference
code-gen/templates/encoders.c.j2 Replaced bitstream API with direct bit operations and __builtin_bswap functions for both bitstream and struct encoding paths

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread code-gen/templates/encoders.c.j2 Outdated
Comment thread code-gen/templates/encoders_old.c.j2 Outdated
Comment thread code-gen/templates/encoders.c.j2 Outdated
Comment thread code-gen/templates/encoders.c.j2
Comment thread code-gen/templates/encoders.c.j2 Outdated
Comment thread code-gen/templates/encoders.c.j2 Outdated
Comment thread code-gen/templates/encoders_old.c.j2 Outdated
Comment thread code-gen/templates/encoders_old.c.j2 Outdated
Comment thread code-gen/templates/encoders.c.j2
Comment thread code-gen/templates/encoders.c.j2
@bjackson312006 bjackson312006 changed the title Codegen Bistream Experimenting Codegen Bistream Optimization Jan 21, 2026
Copy link
Copy Markdown
Contributor

@jr1221 jr1221 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only big issue imo is the prints. however I dont really get why we want to do this. if the compiler inlines the functions overhead will be extremely minimal.

Comment thread code-gen/templates/encoders.c.j2 Outdated
Copy link
Copy Markdown
Contributor

@jr1221 jr1221 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two quick things. thanks for doing this.

Comment thread code-gen/templates/macros.j2
Comment thread code-gen/templates/encoders_old.c.j2 Outdated
@jr1221 jr1221 merged commit 1fe76d2 into main Feb 24, 2026
@jr1221 jr1221 deleted the test-jinja-bitmasking-stuff branch February 24, 2026 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants