Merged
14 changes: 7 additions & 7 deletions .github/workflows/rust.yml
@@ -19,22 +19,22 @@ jobs:
- name: Install cargo-hack
run: cargo install cargo-hack
- name: Build
run: cargo hack build --manifest-path cold-string/Cargo.toml --verbose --feature-powerset --version-range 1.60..
run: cargo hack build --manifest-path cold-string/Cargo.toml --verbose --feature-powerset --version-range 1.60.. --exclude-features rkyv
- name: Check
run: cargo hack check --manifest-path cold-string/Cargo.toml --verbose --feature-powerset --version-range 1.60..
run: cargo hack check --manifest-path cold-string/Cargo.toml --verbose --feature-powerset --version-range 1.60.. --exclude-features rkyv
- name: Test No Exposed Provenance
run: cargo +1.74 hack test --manifest-path cold-string/Cargo.toml --verbose --feature-powerset
run: cargo +1.74 hack test --manifest-path cold-string/Cargo.toml --verbose --feature-powerset --exclude-features rkyv
- name: Tests
run: cargo hack test --manifest-path cold-string/Cargo.toml --verbose --feature-powerset
- name: Install nightly + Miri
run: |
rustup toolchain install nightly
rustup component add miri --toolchain nightly
- name: Miri 64 bit LE
run: cargo +nightly miri test --manifest-path cold-string/Cargo.toml
run: cargo +nightly miri test --all-features --manifest-path cold-string/Cargo.toml
- name: Miri 64 bit BE
run: cargo +nightly miri test --manifest-path cold-string/Cargo.toml --target powerpc64-unknown-linux-gnu
run: cargo +nightly miri test --all-features --manifest-path cold-string/Cargo.toml --target powerpc64-unknown-linux-gnu
- name: Miri 32 bit LE
run: cargo +nightly miri test --target i686-unknown-linux-gnu
run: cargo +nightly miri test --all-features --target i686-unknown-linux-gnu
- name: Miri 32 bit BE
run: cargo +nightly miri test --manifest-path cold-string/Cargo.toml --target mips-unknown-linux-gnu
run: cargo +nightly miri test --all-features --manifest-path cold-string/Cargo.toml --target mips-unknown-linux-gnu
5 changes: 3 additions & 2 deletions bench/benches/bench.rs
@@ -69,10 +69,11 @@ fn bench_as_str_inner<T: FromStr + AsRef<str>>(
indices: &[usize], // Pass pre-shuffled indices
) {
// Pre-convert to the target type
let strings: Vec<_> = strings.iter()
let strings: Vec<_> = strings
.iter()
.map(|s| T::from_str(s).map_err(|_| ()).unwrap())
.collect();

let strings = black_box(strings);
let label = format!("{}-len={}-{}", name, min, max);

8 changes: 2 additions & 6 deletions bench/tests/memory.rs
@@ -155,9 +155,9 @@ fn system_memory(name: &str, workload: impl Fn(usize, usize)) {
/// cargo test test_system_memory --release -- --no-capture --include-ignored
/// ```
#[test]
#[rustfmt::skip]
#[ignore]
fn test_system_memory() {
// Print table header
print!("{:<NAME_WIDTH$} ", "Crate");
for &size in SIZES {
print!(" | {:>CELL_WIDTH$}", format!("{}..={}", 0, size));
@@ -170,13 +170,9 @@ fn test_system_memory() {
}
println!();


system_memory("cold-string", hash_map_workload::<cold_string::ColdString>);
system_memory("compact_str", hash_map_workload::<compact_str::CompactString>);
system_memory(
"compact_string",
hash_map_workload::<compact_string::CompactString>,
);
system_memory("compact_string", hash_map_workload::<compact_string::CompactString>);
system_memory("smallstr", hash_map_workload::<smallstr::SmallString<[u8; 8]>>);
system_memory("smartstring", hash_map_workload::<smartstring::alias::String>);
system_memory("smol_str", hash_map_workload::<smol_str::SmolStr>);
138 changes: 137 additions & 1 deletion cold-string/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions cold-string/Cargo.toml
@@ -18,9 +18,11 @@ maintenance = { status = "actively-developed" }
[features]
default = []
serde = ["dep:serde", "serde/alloc"]
rkyv = ["dep:rkyv", "rkyv/alloc", "rkyv/bytecheck"]

[dependencies]
serde = { version = "1.0.228", optional = true, default-features = false }
rkyv = { version = "0.8.15", optional = true, default-features = false }
sptr = { version = "0.3.2", default-features = false }
rustversion = "1.0.22"

42 changes: 35 additions & 7 deletions cold-string/README.md
@@ -5,7 +5,33 @@
![MSRV](https://img.shields.io/crates/msrv/cold-string?style=for-the-badge)
![Downloads](https://img.shields.io/crates/d/cold-string?style=for-the-badge)

A 1-word sized representation of immutable UTF-8 strings. In-lines up to 1 word bytes. Optimized for memory usage and struct packing.
A 1-word (8-byte) sized representation of immutable UTF-8 strings that in-lines up to 8 bytes. Optimized for memory usage and struct packing.

# Overview

`ColdString` is optimized for memory efficiency for **large** and **short** strings:
- 0..=8 bytes: always 8 bytes total (fully inlined).
- 9..=128 bytes: 8-byte pointer + 1-byte length encoding
- 129..=16384 bytes: 8-byte pointer + 2-byte length encoding
- Continues logarithmically up to 18 bytes overhead for sizes up to `isize::MAX`.

Compared to `String`, which stores capacity and length inline (3 machine words), `ColdString` avoids storing length inline for heap strings and compresses metadata into tagged pointer space. This leads to substantial memory savings in benchmarks (see [Memory Comparison (System RSS)](#memory-comparison-system-rss)):
- **36% – 68%** smaller than `String` in `HashMap`
- **28% – 65%** smaller than other short-string crates in `HashMap`
- **30% – 75%** smaller than `String` in `BTreeSet`
- **13% – 63%** smaller than other short-string crates in `BTreeSet`
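
The inline-size difference behind these savings can be checked directly. The sketch below is illustrative only: `OneWord` is a hypothetical stand-in for an 8-byte handle, not the crate's actual type, and the three-word size of `String` reflects the current standard library layout rather than a language guarantee.

```rust
use core::mem::size_of;

/// Hypothetical stand-in for an 8-byte string handle (NOT the crate's type).
#[allow(dead_code)]
pub struct OneWord([u8; 8]);

/// Number of machine words `String` occupies inline (pointer, capacity, length).
pub fn string_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

/// Inline size in bytes of the stand-in handle.
pub fn one_word_bytes() -> usize {
    size_of::<OneWord>()
}
```

On a 64-bit target this works out to 24 bytes for `String` against 8 for the stand-in, before any heap allocation is counted.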

`ColdString` has an MSRV of 1.60, is `no_std` compatible, and is a drop-in replacement for immutable `String`s.

### Safety
`ColdString` is written using [Rust's strict provenance API](https://doc.rust-lang.org/beta/std/ptr/index.html#strict-provenance), carefully handles unaligned access internally, and is validated with property testing and MIRI.

### Why "Cold"?

The heap representation stores the length on the heap, not inline in the struct. This saves memory in the struct itself but *slightly* increases the cost of `len()` since it requires a heap read. In practice, the `len()` cost is only marginally slower than inline storage and is typically negligible compared to:
- Memory savings
- Cache density improvements
- Faster collection operations due to reduced footprint

# Usage

@@ -45,25 +71,27 @@ pub struct ColdString {
```
`encoded` acts as either a pointer to the heap for strings longer than 8 bytes or is the inlined data itself. The first/"tag" byte indicates one of 3 encodings:

## Inline Mode (0 to 7 Bytes)
### Inline Mode (0 to 7 Bytes)
The tag byte has bits 11111xxx, where xxx is the length. `self.0[1]` to `self.0[7]` store the bytes of the string.

## Inline Mode (8 Bytes)
### Inline Mode (8 Bytes)
The tag byte is any valid UTF-8 leading byte. `self.0` stores the bytes of the string. Since the string is UTF-8, the tag byte is guaranteed not to be 10xxxxxx or 11111xxx.

## Heap Mode
### Heap Mode
`self.0` encodes the pointer to heap, where tag byte is 10xxxxxx. 10xxxxxx is chosen because it's a UTF-8 continuation byte and therefore an impossible tag byte for inline mode. Since a heap-alignment of 4 is chosen, the pointer's least significant 2 bits are guaranteed to be 0 ([See more](https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html#method.from_size_align)). These bits are swapped with the 10 "tag" bits when de/coding between `self.0` and the address value.
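
The three tag-byte patterns described above can be told apart with two mask checks. The following is a minimal sketch of that classification only; `Mode`, `classify`, and `short_tag` are hypothetical names introduced for illustration, and the crate's real decoding (including the pointer bit swap and endianness handling) is not shown.

```rust
/// The three encodings, identified from the tag byte alone.
/// (Hypothetical type for illustration; not part of the crate's API.)
#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    /// Tag is 11111xxx: inline string of `len` (0..=7) bytes.
    InlineShort { len: usize },
    /// Tag is 10xxxxxx: the word encodes a heap pointer.
    Heap,
    /// Any other tag: all 8 bytes are string data.
    InlineFull,
}

/// Classify a tag byte according to the bit patterns in the README.
pub fn classify(tag: u8) -> Mode {
    if tag & 0b1111_1000 == 0b1111_1000 {
        // The low three bits carry the length.
        Mode::InlineShort { len: (tag & 0b0000_0111) as usize }
    } else if tag & 0b1100_0000 == 0b1000_0000 {
        // A UTF-8 continuation byte can never start a valid string.
        Mode::Heap
    } else {
        Mode::InlineFull
    }
}

/// Build the tag byte for an inline string of `len` bytes (0..=7).
pub fn short_tag(len: usize) -> u8 {
    assert!(len <= 7, "short inline mode holds at most 7 bytes");
    0b1111_1000 | len as u8
}
```

The checks are unambiguous because neither 11111xxx (0xF8..=0xFF) nor 10xxxxxx (0x80..=0xBF) can be the first byte of valid UTF-8, so every remaining byte value is free to be string data.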

On the heap, the data starts with a variable length integer encoding of the length, followed by the bytes.
```text,ignore
ptr --> <var int length> <data>
```
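
As a rough illustration of a length-prefixed heap layout, the sketch below uses a plain LEB128-style varint (7 payload bits per byte, high bit as a continuation flag). This is an assumption: the README does not name the exact scheme, and its size table (1 length byte up to 128 bytes) suggests the crate's encoding differs in detail. `encode_len`, `decode_len`, and `heap_image` are hypothetical helpers.

```rust
/// Append `len` to `out` as a LEB128-style varint (assumed scheme, see lead-in).
pub fn encode_len(mut len: usize, out: &mut Vec<u8>) {
    loop {
        let byte = (len & 0x7F) as u8;
        len >>= 7;
        if len == 0 {
            out.push(byte); // final byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Decode a varint from the front of `buf`.
/// Returns `(length, bytes_consumed)`, or `None` if the input is truncated
/// or longer than a `usize` can hold.
pub fn decode_len(buf: &[u8]) -> Option<(usize, usize)> {
    let mut value = 0usize;
    for (i, &b) in buf.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= usize::BITS {
            return None;
        }
        value |= ((b & 0x7F) as usize) << shift;
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Build the `<var int length> <data>` byte image for a string.
pub fn heap_image(s: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(s.len() + 2);
    encode_len(s.len(), &mut out);
    out.extend_from_slice(s.as_bytes());
    out
}
```

Reading the string back means decoding the prefix first and then slicing the data that follows it, which is the extra heap read that `len()` pays for in this design.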

# Memory Comparisons
# Memory Comparisons (Allocator)

Memory usage per string, measured by tracking the memory requested by the allocator:

![string_memory](https://github.com/user-attachments/assets/6644ae40-1da7-42e2-9ae6-0596e77e953e)
![string_memory](https://github.com/user-attachments/assets/adf09756-9910-4618-a97f-b5ab91a2515a)

## Memory Usage Comparison
## Memory Comparison (System RSS)

RSS per insertion of various collections containing strings of random lengths 0..=N:
