D-MemFS

An in-process virtual filesystem with hard quota enforcement for Python.

Why MFS?

MemoryFileSystem gives you a fully isolated filesystem-like workspace inside a Python process.

Hard quota (MFSQuotaExceededError) to reject oversized writes before OOM
Hierarchical directories and multi-file operations (import_tree, copy_tree, move)
File-level RW locking + global structure lock for thread-safe operations
Free-threaded Python compatible (PYTHON_GIL=0) — stress-tested under 50-thread contention
Async wrapper (AsyncMemoryFileSystem) powered by asyncio.to_thread
Zero runtime dependencies (standard library only)

This is useful when io.BytesIO is too primitive (single buffer), and OS-level RAM disks/tmpfs are impractical (permissions, container policy, Windows driver friction).

Installation

pip install D-MemFS

Requirements: Python 3.11+

Quick Start

from dmemfs import MemoryFileSystem, MFSQuotaExceededError

mfs = MemoryFileSystem(max_quota=64 * 1024 * 1024)

mfs.mkdir("/data")
with mfs.open("/data/hello.bin", "wb") as f:
    f.write(b"hello")

with mfs.open("/data/hello.bin", "rb") as f:
    print(f.read())  # b"hello"

print(mfs.listdir("/data"))
print(mfs.is_file("/data/hello.bin"))  # True

try:
    with mfs.open("/huge.bin", "wb") as f:
        f.write(bytes(512 * 1024 * 1024))
except MFSQuotaExceededError as e:
    print(e)

API Highlights

`MemoryFileSystem`

open(path, mode, *, preallocate=0, lock_timeout=None)
mkdir, remove, rmtree, rename, move, copy, copy_tree
listdir, exists, is_dir, is_file, walk, glob
stat, stats, get_size
export_as_bytesio, export_tree, iter_export_tree, import_tree

Constructor parameters:

max_quota (default 256 MiB): byte quota for file data
max_nodes (default None): optional cap on total node count (files + directories). Raises MFSNodeLimitExceededError when exceeded.
default_storage (default "auto"): storage backend for new files — "auto" / "sequential" / "random_access"
promotion_hard_limit (default None): byte threshold above which Sequential→RandomAccess auto-promotion is suppressed (None uses the built-in 512 MiB limit)
chunk_overhead_override (default None): override the per-chunk overhead estimate used for quota accounting

Note: The BytesIO returned by export_as_bytesio() is outside quota management. Exporting large files may consume significant process memory beyond the configured quota limit.

Supported binary modes: rb, wb, ab, r+b, xb

`MemoryFileHandle`

read, write, seek, tell, truncate, flush, close
file-like capability checks: readable, writable, seekable

flush() is intentionally a no-op (compatibility API for file-like integrations).

`stat()` return (`MFSStatResult`)

size, created_at, modified_at, generation, is_dir

Supports both files and directories
For directories: size=0, generation=0, is_dir=True

Text Mode

D-MemFS natively operates in binary mode. For text I/O, use MFSTextHandle:

from dmemfs import MemoryFileSystem, MFSTextHandle

mfs = MemoryFileSystem()
mfs.mkdir("/data")

# Write text
with mfs.open("/data/hello.bin", "wb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("こんにちは世界\n")
    th.write("Hello, World!\n")

# Read text line by line
with mfs.open("/data/hello.bin", "rb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    for line in th:
        print(line, end="")

MFSTextHandle is a thin, bufferless wrapper. It encodes on write() and decodes on read() / readline(). Unlike io.TextIOWrapper, it introduces no buffering issues when used with MemoryFileHandle.

Use Case Tutorials

ETL Staging

Stage data through raw → processed → output directories:

from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem(max_quota=16 * 1024 * 1024)
mfs.mkdir("/raw")
mfs.mkdir("/processed")

raw_data = b"id,name,value\n1,foo,100\n2,bar,200\n"
with mfs.open("/raw/data.csv", "wb") as f:
    f.write(raw_data)

with mfs.open("/raw/data.csv", "rb") as f:
    data = f.read()

with mfs.open("/processed/data.csv", "wb") as f:
    f.write(data.upper())

mfs.rmtree("/raw")  # cleanup staging

Archive-like Operations

Store, list, and export multiple files as a tree:

from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem()
mfs.import_tree({
    "/archive/doc1.bin": b"Document 1",
    "/archive/doc2.bin": b"Document 2",
    "/archive/sub/doc3.bin": b"Document 3",
})

print(mfs.listdir("/archive"))  # ['doc1.bin', 'doc2.bin', 'sub']

snapshot = mfs.export_tree(prefix="/archive")  # dict of {path: bytes}

SQLite Snapshot

Serialize an in-memory SQLite DB into MFS and restore it later:

import sqlite3
from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'hello')")
conn.commit()

with mfs.open("/snapshot.db", "wb") as f:
    f.write(conn.serialize())
conn.close()

with mfs.open("/snapshot.db", "rb") as f:
    raw = f.read()
restored = sqlite3.connect(":memory:")
restored.deserialize(raw)
rows = restored.execute("SELECT * FROM t").fetchall()  # [(1, 'hello')]

Concurrency and Locking Notes

Path/tree operations are guarded by _global_lock.
File access is guarded by per-file ReadWriteLock.
lock_timeout behavior:
- None: block indefinitely
- 0.0: try-lock (fail immediately with BlockingIOError)
- > 0: timeout in seconds, then BlockingIOError
Current ReadWriteLock is non-fair: under sustained read load, writers can starve.

Operational guidance:

Keep lock hold duration short
Set an explicit lock_timeout in latency-sensitive code paths
walk() and glob() provide weak consistency: each directory level is snapshotted under _global_lock, but the overall traversal is NOT atomic. Concurrent structural changes may produce inconsistent results.

Async Usage

from dmemfs import AsyncMemoryFileSystem

async def run() -> None:
    mfs = AsyncMemoryFileSystem(max_quota=64 * 1024 * 1024)
    await mfs.mkdir("/a")
    async with await mfs.open("/a/f.bin", "wb") as f:
        await f.write(b"data")
    async with await mfs.open("/a/f.bin", "rb") as f:
        print(await f.read())

Benchmarks

Minimal benchmark tooling is included:

MFS vs io.BytesIO vs PyFilesystem2 (MemoryFS) vs tempfile
Cases: many-small-files and stream write/read
Optional report output to benchmarks/results/

Note: As of setuptools 82 (February 2026), pyfilesystem2 fails to import due to a known upstream issue (#597). Benchmark results including PyFilesystem2 were measured with setuptools ≤ 81 and are valid as historical comparison data.

Run:

uvx --with-requirements requirements.txt --with-editable . python benchmarks/compare_backends.py --save-md auto --save-json auto

See BENCHMARK.md for details.

Latest benchmark snapshot:

benchmark_current_result.md

Testing and Coverage

Test execution and dev flow are documented in TESTING.md.

Typical local run:

uv pip compile requirements.in -o requirements.txt
uvx --with-requirements requirements.txt --with-editable . pytest tests/ -v --timeout=30 --cov=dmemfs --cov-report=xml --cov-report=term-missing

CI (.github/workflows/test.yml) runs tests with coverage XML generation.

API Docs Generation

API docs can be generated as Markdown (viewable on GitHub) using pydoc-markdown:

uvx --with pydoc-markdown --with-editable . pydoc-markdown '{
  loaders: [{type: python, search_path: [.]}],
  processors: [{type: filter, expression: "default()"}],
  renderer: {type: markdown, filename: docs/api_md/index.md}
}'

Or as HTML using pdoc (local browsing only):

uvx --with-requirements requirements.txt pdoc dmemfs -o docs/api

API Reference (Markdown)

Compatibility and Non-Goals

Core open() is binary-only (rb, wb, ab, r+b, xb). Text I/O is available via the MFSTextHandle wrapper.
No symlink/hardlink support — intentionally omitted to eliminate path traversal loops and structural complexity (same rationale as pathlib.PurePath).
No direct pathlib.Path / os.PathLike API — MFS paths are virtual and must not be confused with host filesystem paths. Accepting os.PathLike would allow third-party libraries or a plain open() call to silently treat an MFS virtual path as a real OS path, potentially issuing unintended syscalls against the host filesystem. All paths must be plain str with POSIX-style absolute notation (e.g. "/data/file.txt").
No kernel filesystem integration (intentionally in-process only)

Auto-promotion behavior:

By default (default_storage="auto"), new files start as SequentialMemoryFile and auto-promote to RandomAccessMemoryFile when random writes are detected.
Promotion is one-way (no downgrade back to sequential).
Use default_storage="sequential" or "random_access" to fix the backend at construction; use promotion_hard_limit to suppress auto-promotion above a byte threshold.
Storage promotion temporarily doubles memory usage for the promoted file. The quota system accounts for this, but process-level memory may spike briefly.

Security note: In-memory data may be written to physical disk via OS swap or core dumps. MFS does not provide memory-locking (e.g., mlock) or secure erasure. Do not rely on MFS alone for sensitive data isolation.

Exception Reference

Exception	Typical cause
`MFSQuotaExceededError`	write/import/copy would exceed quota
`MFSNodeLimitExceededError`	node count would exceed `max_nodes` (subclass of `MFSQuotaExceededError`)
`FileNotFoundError`	path missing
`FileExistsError`	creation target already exists
`IsADirectoryError`	file operation on directory
`NotADirectoryError`	directory operation on file
`BlockingIOError`	lock timeout or open-file conflict
`io.UnsupportedOperation`	mode mismatch / unsupported operation
`ValueError`	invalid mode/path/seek/truncate arguments

Testing with pytest

D-MemFS ships a pytest plugin that provides an mfs fixture:

# conftest.py — register the plugin explicitly
pytest_plugins = ["dmemfs._pytest_plugin"]

Note: The plugin is not auto-discovered. Users must declare it in conftest.py to opt in.

# test_example.py
def test_write_read(mfs):
    mfs.mkdir("/tmp")
    with mfs.open("/tmp/hello.txt", "wb") as f:
        f.write(b"hello")
    with mfs.open("/tmp/hello.txt", "rb") as f:
        assert f.read() == b"hello"

Development Notes

Design documents (Japanese):

Architecture Spec v13 — API design, internal structure, CI matrix
Detailed Design Spec — component-level design and rationale
Test Design Spec — test case table and pseudocode

These documents are written in Japanese and serve as internal design references.

Performance Summary

Key results from the included benchmark (300 small files × 4 KiB, 16 MiB stream, 2 GiB large stream):

Case	MFS (ms)	BytesIO (ms)	tempfile (ms)
small_files_rw	34	5	164
stream_write_read	64	51	17
random_access_rw	24	53	27
large_stream_write_read	1 438	7 594	1 931
many_files_random_read	777	163	4 745

MFS incurs a small overhead on tiny-file workloads but delivers significantly better performance on large streams and random-access patterns compared with BytesIO. See BENCHMARK.md and benchmark_current_result.md for full data.

Note: tempfile results above were measured with the system temp directory on a RAM disk. On a physical SSD/HDD, tempfile performance will be substantially slower.

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
dmemfs		dmemfs
docs		docs
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
BENCHMARK.md		BENCHMARK.md
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
README_ja.md		README_ja.md
RELEASE_CHECKLIST.md		RELEASE_CHECKLIST.md
TESTING.md		TESTING.md
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements.in		requirements.in
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

D-MemFS

Why MFS?

Installation

Quick Start

API Highlights

`MemoryFileSystem`

`MemoryFileHandle`

`stat()` return (`MFSStatResult`)

Text Mode

Use Case Tutorials

ETL Staging

Archive-like Operations

SQLite Snapshot

Concurrency and Locking Notes

Async Usage

Benchmarks

Testing and Coverage

API Docs Generation

Compatibility and Non-Goals

Exception Reference

Testing with pytest

Development Notes

Performance Summary

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

License

nightmarewalker/D-MemFS

Folders and files

Latest commit

History

Repository files navigation

D-MemFS

Why MFS?

Installation

Quick Start

API Highlights

MemoryFileSystem

MemoryFileHandle

stat() return (MFSStatResult)

Text Mode

Use Case Tutorials

ETL Staging

Archive-like Operations

SQLite Snapshot

Concurrency and Locking Notes

Async Usage

Benchmarks

Testing and Coverage

API Docs Generation

Compatibility and Non-Goals

Exception Reference

Testing with pytest

Development Notes

Performance Summary

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`MemoryFileSystem`

`MemoryFileHandle`

`stat()` return (`MFSStatResult`)

Packages