
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Performance Optimization

This page describes techniques and strategies for optimizing throughput, latency, and memory usage in rust-muxio applications. It covers serialization efficiency, chunking strategies, memory allocation patterns, and profiling approaches. For general architectural patterns, see Core Concepts. For transport-specific tuning, see Transport Implementations.


Binary Serialization Efficiency

The system uses bitcode for binary serialization of RPC method parameters and responses. This provides compact encoding with minimal overhead compared to text-based formats like JSON.
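
To illustrate why a compact binary layout beats text formats, here is a minimal std-only sketch of the kind of fixed-header encoding a binary serializer produces. This is not the actual bitcode or muxio code; `encode_request` and `decode_request` are hypothetical stand-ins, and bitcode derives this layout automatically via `#[derive(bitcode::Encode, bitcode::Decode)]`:

```rust
// Hypothetical wire layout: method_id (u32 LE) + payload length (u32 LE) + payload.
// A binary codec spends 8 bytes of overhead here; a JSON envelope with field
// names and quoting would spend several times that.
fn encode_request(method_id: u32, payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8 + payload.len());
    buf.extend_from_slice(&method_id.to_le_bytes());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

fn decode_request(buf: &[u8]) -> Option<(u32, &[u8])> {
    let method_id = u32::from_le_bytes(buf.get(0..4)?.try_into().ok()?);
    let len = u32::from_le_bytes(buf.get(4..8)?.try_into().ok()?) as usize;
    Some((method_id, buf.get(8..8 + len)?))
}

fn main() {
    let encoded = encode_request(42, b"hello");
    assert_eq!(encoded.len(), 13); // 8 bytes of header + 5 bytes of payload
    let (id, payload) = decode_request(&encoded).unwrap();
    assert_eq!(id, 42);
    assert_eq!(payload, b"hello");
    println!("ok");
}
```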

```mermaid
graph LR
    A["Application\nRust Types"] --> B["encode_request\nbitcode::encode"]
    B --> C["Vec<u8>\nBinary Buffer"]
    C --> D["RpcHeader\nrpc_metadata_bytes"]
    D --> E["Frame Protocol\nLow-Level Transport"]
    E --> F["decode_request\nbitcode::decode"]
    F --> A
```

Serialization Strategy

Optimization Guidelines

| Technique | Impact | Implementation |
|---|---|---|
| Use `#[derive(bitcode::Encode, bitcode::Decode)]` | Automatic optimal encoding | Applied in service definitions |
| Avoid nested `Option<Option<T>>` | Reduces byte overhead | Flatten data structures |
| Prefer fixed-size types over variable-length | Predictable buffer sizes | Use `[u8; N]` instead of `Vec<u8>` when size is known |
| Use `u32` instead of `u64` when range allows | Halves integer encoding size | RPC method IDs use `u32` |

Sources:

  • Service definition patterns in example-muxio-rpc-service-definition
  • Cargo.lock:158-168 (bitcode dependencies)

Chunking Strategy and Throughput

The max_chunk_size parameter controls how large payloads are split into multiple frames. Optimal chunk size balances latency, memory usage, and transport efficiency.
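
The split itself can be sketched with the standard library's `chunks` iterator. This is a simplified illustration of the framing step, not the actual muxio implementation; `chunk_payload` is a hypothetical helper:

```rust
// Split a payload into frames of at most `max_chunk_size` bytes.
// Every frame is full-sized except possibly the last one.
fn chunk_payload(payload: &[u8], max_chunk_size: usize) -> Vec<&[u8]> {
    payload.chunks(max_chunk_size).collect()
}

fn main() {
    let payload = vec![0u8; 20_000];
    // With 8 KB chunks, a 20 KB payload becomes three frames: 8K + 8K + ~3.5K.
    let frames = chunk_payload(&payload, 8 * 1024);
    assert_eq!(frames.len(), 3);
    assert_eq!(frames[2].len(), 20_000 - 2 * 8 * 1024);
    println!("{} frames", frames.len());
}
```

Each frame carries fixed per-frame protocol overhead, which is why halving the chunk size roughly doubles the framing cost for the same payload.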

```mermaid
graph TB
    subgraph "Small Chunks (e.g., 1KB)"
        A1["Lower Memory\nPer Request"] --> A2["More Frames"]
        A2 --> A3["Higher CPU\nFraming Overhead"]
    end
    subgraph "Large Chunks (e.g., 64KB)"
        B1["Higher Memory\nPer Request"] --> B2["Fewer Frames"]
        B2 --> B3["Lower CPU\nFraming Overhead"]
    end
    subgraph "Optimal Range"
        C1["8KB - 16KB"] --> C2["Balance of\nMemory & CPU"]
    end
```

Chunk Size Selection

Performance Characteristics:

| Chunk Size | Latency | Memory | CPU | Best For |
|---|---|---|---|---|
| 1-2 KB | Excellent | Minimal | High overhead | Real-time, WASM |
| 4-8 KB | Very Good | Low | Moderate | Standard RPC |
| 16-32 KB | Good | Moderate | Low | Large payloads |
| 64+ KB | Fair | High | Minimal | Bulk transfers |


Prebuffering vs Streaming

The system supports two payload transmission modes with different performance trade-offs.

```mermaid
graph TB
    subgraph "Prebuffered Mode"
        PB1["RpcRequest\nis_finalized=true"]
        PB2["Single write_bytes"]
        PB3["Immediate end_stream"]
        PB4["Low Latency\nHigh Memory"]
        PB1 --> PB2 --> PB3 --> PB4
    end
    subgraph "Streaming Mode"
        ST1["RpcRequest\nis_finalized=false"]
        ST2["Multiple write_bytes\ncalls"]
        ST3["Delayed end_stream"]
        ST4["Higher Latency\nLow Memory"]
        ST1 --> ST2 --> ST3 --> ST4
    end
```

Mode Comparison

Prebuffered Response Handling

The prebuffer_response flag controls whether response payloads are accumulated before delivery:

| Mode | Memory Usage | Latency | Use Case |
|---|---|---|---|
| `prebuffer_response=true` | Accumulates entire payload | Delivers complete response | Small responses, simpler logic |
| `prebuffer_response=false` | Streams chunks as received | Minimal per-chunk latency | Large responses, progress tracking |

Implementation Details:

  • Prebuffering accumulates chunks in prebuffered_responses HashMap
  • Buffer is stored until RpcStreamEvent::End is received
  • Handler is invoked once with complete payload
  • Buffer is immediately cleared after handler invocation
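
The accumulate-then-deliver pattern above can be sketched as follows. `Prebuffer` and `StreamEvent` are simplified stand-ins for the real dispatcher state and `RpcStreamEvent` variants, shown only to make the buffering lifecycle concrete:

```rust
use std::collections::HashMap;

// Simplified stand-in for RpcStreamEvent: (request id, chunk) or end-of-stream.
enum StreamEvent {
    PayloadChunk(u32, Vec<u8>),
    End(u32),
}

struct Prebuffer {
    prebuffered_responses: HashMap<u32, Vec<u8>>,
}

impl Prebuffer {
    // Accumulate chunks per request id; on End, invoke the handler exactly
    // once with the complete payload and clear the buffer immediately.
    fn on_event(&mut self, ev: StreamEvent, handler: &mut impl FnMut(Vec<u8>)) {
        match ev {
            StreamEvent::PayloadChunk(id, chunk) => {
                self.prebuffered_responses.entry(id).or_default().extend(chunk);
            }
            StreamEvent::End(id) => {
                if let Some(buf) = self.prebuffered_responses.remove(&id) {
                    handler(buf);
                }
            }
        }
    }
}

fn main() {
    let mut pb = Prebuffer { prebuffered_responses: HashMap::new() };
    let mut delivered = Vec::new();
    let mut handler = |buf: Vec<u8>| delivered.push(buf);
    pb.on_event(StreamEvent::PayloadChunk(7, b"he".to_vec()), &mut handler);
    pb.on_event(StreamEvent::PayloadChunk(7, b"llo".to_vec()), &mut handler);
    pb.on_event(StreamEvent::End(7), &mut handler);
    assert_eq!(delivered, vec![b"hello".to_vec()]);
    assert!(pb.prebuffered_responses.is_empty()); // buffer freed on delivery
    println!("ok");
}
```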


Memory Management Patterns

```mermaid
graph LR
    A["Inbound Frames"] --> B["RpcDispatcher\nread_bytes"]
    B --> C["Mutex Lock"]
    C --> D["VecDeque\nrpc_request_queue"]
    D --> E["Push/Update/Delete\nOperations"]
    E --> F["Mutex Unlock"]
    G["Application"] --> H["get_rpc_request"]
    H --> C
```

Request Queue Design

The RpcDispatcher maintains an internal request queue using Arc<Mutex<VecDeque<(u32, RpcRequest)>>>. This design has specific performance implications:

Lock Contention Considerations:

| Operation | Lock Duration | Frequency | Optimization |
|---|---|---|---|
| `read_bytes` | Per-frame decode | High | Minimize work under lock |
| `get_rpc_request` | Read access only | Medium | Returns guard, caller controls lock |
| `delete_rpc_request` | Single element removal | Low | Uses `VecDeque::remove` |

Memory Overhead:

  • Each in-flight request: ~100-200 bytes base + payload size
  • VecDeque capacity grows as needed
  • Payload bytes accumulated until is_finalized=true

Preventing Memory Leaks

The dispatcher must explicitly clean up completed or failed requests:

Critical Pattern:

  1. Request added to queue on Header event
  2. Payload accumulated on PayloadChunk events
  3. Finalized on End event
  4. Application must call delete_rpc_request() to free memory

Failure to delete finalized requests causes unbounded memory growth.
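
The deletion step can be sketched against a queue of the shape described above. `Request` and `delete_rpc_request` here are simplified stand-ins for the real types, shown to make the cleanup obligation concrete:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Simplified stand-in for RpcRequest.
struct Request {
    payload: Vec<u8>,
    is_finalized: bool,
}

type Queue = Arc<Mutex<VecDeque<(u32, Request)>>>;

// Remove a request by id, returning it so the caller can consume the payload.
// Skipping this step for finalized requests leaves them queued forever.
fn delete_rpc_request(queue: &Queue, id: u32) -> Option<Request> {
    let mut q = queue.lock().unwrap();
    let pos = q.iter().position(|(rid, _)| *rid == id)?;
    q.remove(pos).map(|(_, req)| req)
}

fn main() {
    let queue: Queue = Arc::new(Mutex::new(VecDeque::new()));
    queue.lock().unwrap().push_back((
        1,
        Request { payload: b"done".to_vec(), is_finalized: true },
    ));
    let req = delete_rpc_request(&queue, 1).expect("request present");
    assert!(req.is_finalized);
    assert_eq!(req.payload, b"done");
    assert!(queue.lock().unwrap().is_empty()); // memory reclaimed
    println!("ok");
}
```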


```mermaid
graph LR
    A["next_rpc_request_id\nu32 counter"] --> B["increment_u32_id()"]
    B --> C["Assign to\nRpcHeader"]
    C --> D["Store in\nresponse_handlers"]
    D --> E["Match on\ninbound response"]
```

Request Correlation Overhead

Each outbound request is assigned a unique u32 ID for response correlation. The system uses monotonic ID generation with wraparound.

ID Generation Strategy

Performance Characteristics:

| Aspect | Cost | Justification |
|---|---|---|
| ID generation | Minimal (single addition) | `u32::wrapping_add(1)` |
| HashMap insertion | O(1) average | `response_handlers.insert()` |
| Response lookup | O(1) average | `response_handlers.get_mut()` |
| Memory per handler | ~24 bytes + closure size | `Box<dyn FnMut>` overhead |
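
The monotonic-with-wraparound scheme can be sketched in a few lines. This is a minimal illustration of `wrapping_add`-based ID generation, not the actual dispatcher code:

```rust
// Return the current ID and advance the counter, wrapping at u32::MAX
// instead of panicking on overflow.
fn increment_u32_id(counter: &mut u32) -> u32 {
    let id = *counter;
    *counter = counter.wrapping_add(1);
    id
}

fn main() {
    let mut next_rpc_request_id: u32 = u32::MAX - 1;
    assert_eq!(increment_u32_id(&mut next_rpc_request_id), u32::MAX - 1);
    assert_eq!(increment_u32_id(&mut next_rpc_request_id), u32::MAX);
    assert_eq!(increment_u32_id(&mut next_rpc_request_id), 0); // wraps around
    println!("ok");
}
```

Wraparound is safe in practice because a response correlates against the small set of currently in-flight IDs, not the full history.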

Concurrency Considerations:

  • next_rpc_request_id is NOT thread-safe
  • Each client connection should have its own RpcDispatcher
  • Sharing a dispatcher across threads requires external synchronization


```mermaid
graph TB
    A["Connection\nClosed"] --> B["fail_all_pending_requests"]
    B --> C["std::mem::take\nresponse_handlers"]
    C --> D["For each handler"]
    D --> E["Create\nRpcStreamEvent::Error"]
    E --> F["Invoke handler\nwith error"]
    F --> G["Drop handler\nboxed closure"]
```

Handler Cleanup and Backpressure

Failed Request Handling

When a transport connection drops, all pending response handlers must be notified to prevent resource leaks and hung futures:

Implementation:

The fail_all_pending_requests() method takes ownership of all handlers and invokes them with an error event. This ensures:

  • Awaiting futures are woken with error result
  • Callback memory is freed immediately
  • No handlers remain registered after connection failure

Performance Impact:

  • Invocation cost: O(n) where n = number of pending requests
  • Each handler invocation is synchronous
  • Memory freed immediately after iteration
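
The take-and-drain pattern can be sketched as below. `Event` and the handler map are simplified stand-ins for the real `RpcStreamEvent` and `response_handlers` types; the point is that `std::mem::take` moves every boxed closure out in one step, so they are invoked once and then dropped:

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

// Simplified stand-in for RpcStreamEvent::Error.
enum Event {
    Error(String),
}

type Handler = Box<dyn FnMut(Event)>;

// Take ownership of all pending handlers, invoke each once with an error
// event, then drop them; the map is left empty.
fn fail_all_pending_requests(handlers: &mut HashMap<u32, Handler>) {
    for (_, mut h) in std::mem::take(handlers) {
        h(Event::Error("connection closed".into()));
    }
}

fn main() {
    let mut handlers: HashMap<u32, Handler> = HashMap::new();
    let errors = Rc::new(Cell::new(0));
    for id in 0..3 {
        let errors = Rc::clone(&errors);
        handlers.insert(
            id,
            Box::new(move |ev| {
                if let Event::Error(_) = ev {
                    errors.set(errors.get() + 1);
                }
            }),
        );
    }
    fail_all_pending_requests(&mut handlers);
    assert_eq!(errors.get(), 3); // every pending request saw the error
    assert!(handlers.is_empty()); // no handlers remain registered
    println!("ok");
}
```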


Benchmarking with Criterion

The codebase uses criterion for performance benchmarking; run the benchmark suites with `cargo bench`.

Benchmark Structure

Key Metrics to Track:

| Metric | What It Measures | Target |
|---|---|---|
| Throughput | Bytes/sec processed | Maximize |
| Latency | Time per operation | Minimize |
| Allocation Rate | Heap allocations | Minimize |
| Frame Overhead | Protocol bytes vs payload | < 5% |
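
Criterion adds statistical rigor (warm-up, outlier detection, confidence intervals); the underlying shape of a throughput measurement can be sketched with only `std::time`. The `encode` function here is a hypothetical workload, not a muxio API:

```rust
use std::time::Instant;

// Hypothetical workload: length-prefix a payload, as a serializer might.
fn encode(payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(4 + payload.len());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

fn main() {
    let payload = vec![0u8; 64 * 1024];
    let iterations = 1_000;
    let start = Instant::now();
    let mut total = 0usize;
    for _ in 0..iterations {
        // Consuming the result prevents the work from being optimized away.
        total += encode(&payload).len();
    }
    let elapsed = start.elapsed();
    assert_eq!(total, iterations * (payload.len() + 4));
    println!("processed {} bytes in {:?}", total, elapsed);
}
```

For real measurements prefer criterion's `bench_function` harness, which repeats this loop enough times to produce stable statistics.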


Platform-Specific Optimizations

Native (Tokio) vs WASM

Platform Tuning:

| Platform | Chunk Size | Buffer Strategy | Concurrency |
|---|---|---|---|
| Native (Tokio) | 8-16 KB | Reuse buffers | Multiple connections |
| WASM (Browser) | 2-4 KB | Small allocations | Single connection |
| Native (Server) | 16-32 KB | Pre-allocated pools | Connection pooling |

WASM-Specific Considerations:

  • JavaScript boundary crossings have cost (~1-5 μs per call)
  • Minimize calls to wasm-bindgen functions
  • Use larger RPC payloads to amortize overhead
  • Prebuffer responses when possible to reduce event callbacks

Sources:

  • muxio-tokio-rpc-client vs muxio-wasm-rpc-client crate comparison
  • Cargo.lock:935-953 (WASM dependencies)

Profiling and Diagnostics

Tracing Integration

The system uses tracing for instrumentation. Enable a subscriber (for example, tracing-subscriber filtered via the `RUST_LOG` environment variable) to surface the trace points below and identify bottlenecks.

Key Trace Points:

| Location | Event | Performance Insight |
|---|---|---|
| `RpcDispatcher::call` | Request initiation | Call frequency, payload sizes |
| `read_bytes` | Frame processing | Decode latency, lock contention |
| Handler callbacks | Response processing | Handler execution time |

Detecting Performance Issues

Tooling:

  • cargo flamegraph — CPU flame graphs to locate hot code paths
  • criterion — detect throughput and latency regressions between changes
  • tracing spans — attribute latency to specific dispatch and handler stages


Best Practices Summary

| Optimization | Technique | Impact |
|---|---|---|
| Minimize allocations | Reuse buffers, use `Vec::with_capacity` | High |
| Choose optimal chunk size | 8-16 KB for typical RPC | Medium |
| Prebuffer small responses | Enable `prebuffer_response` for responses under 64 KB | Medium |
| Clean up completed requests | Call `delete_rpc_request()` promptly | High |
| Use fixed-size types | Prefer `[u8; N]` over `Vec<u8>` in hot paths | Low |
| Profile before optimizing | Use criterion + flamegraph | Critical |
