This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Error Handling

Purpose and Scope

This document describes the error handling architecture in the rust-muxio system. It covers error types at each layer (framing, RPC, transport), error propagation mechanisms, and patterns for handling failures in distributed RPC calls. For information about RPC service definitions and method dispatch, see RPC Framework. For transport-specific connection management, see Transport State Management.


Error Type Hierarchy

The rust-muxio system defines errors at three distinct layers, each with its own error type. These types compose to provide rich error context while maintaining clean layer separation.

graph TB
    subgraph "Application Layer Errors"
        RpcServiceError["RpcServiceError\n(muxio-rpc-service)"]
        RpcVariant["Rpc(RpcServiceErrorPayload)"]
        TransportVariant["Transport(io::Error)"]
        CancelledVariant["Cancelled"]
    end

    subgraph "RPC Error Codes"
        NotFound["NotFound\nMETHOD_ID not registered"]
        Fail["Fail\nHandler returned error"]
        System["System\nInternal failure"]
        Busy["Busy\nResource unavailable"]
    end

    subgraph "Framing Layer Errors"
        FrameEncodeError["FrameEncodeError\n(muxio core)"]
        FrameDecodeError["FrameDecodeError\n(muxio core)"]
        CorruptFrame["CorruptFrame"]
    end

    RpcServiceError --> RpcVariant
    RpcServiceError --> TransportVariant
    RpcServiceError --> CancelledVariant

    RpcVariant --> NotFound
    RpcVariant --> Fail
    RpcVariant --> System
    RpcVariant --> Busy

    TransportVariant --> FrameEncodeError
    TransportVariant --> FrameDecodeError

    FrameEncodeError --> CorruptFrame
    FrameDecodeError --> CorruptFrame

RpcServiceError

RpcServiceError is the primary error type exposed to application code. It represents failures that can occur during RPC method invocation, from method lookup through response decoding.

Error Variants

| Variant | Description | Typical Cause |
|---|---|---|
| Rpc(RpcServiceErrorPayload) | Remote method handler returned an error | Handler logic failure, validation error, or internal server error |
| Transport(io::Error) | Low-level transport or encoding failure | Network disconnection, frame corruption, or serialization failure |
| Cancelled | Request was cancelled before completion | Connection dropped while request was pending |

RpcServiceErrorPayload

The RpcServiceErrorPayload structure carries the details of a server-side error, including an RpcServiceErrorCode that classifies the failure.

RpcServiceErrorCode

| Code | Usage | Example Scenario |
|---|---|---|
| NotFound | Method ID not registered on server | Client calls a method that the server doesn't implement |
| Fail | Handler explicitly returned an error | Business logic validation failed (e.g., "item does not exist") |
| System | Internal server panic or system failure | Handler panicked or server encountered a critical error |
| Busy | Server cannot accept request | Resource exhaustion or rate limiting |
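
As a rough illustration of how these pieces fit together, the variants and codes from the two tables can be modelled as plain Rust types. This is a minimal sketch, not the crate's actual definitions: the `code` and `message` field names, the derives, and the `Display` wording are all assumptions made for the example.

```rust
use std::fmt;
use std::io;

/// Error codes carried in an RPC error payload (names from the table above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RpcServiceErrorCode {
    NotFound,
    Fail,
    System,
    Busy,
}

/// Assumed payload layout: a code plus a human-readable message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RpcServiceErrorPayload {
    pub code: RpcServiceErrorCode,
    pub message: String,
}

/// The three variants described in the "Error Variants" table.
#[derive(Debug)]
pub enum RpcServiceError {
    Rpc(RpcServiceErrorPayload),
    Transport(io::Error),
    Cancelled,
}

// Display wording is illustrative only.
impl fmt::Display for RpcServiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RpcServiceError::Rpc(p) => write!(f, "RPC error ({:?}): {}", p.code, p.message),
            RpcServiceError::Transport(e) => write!(f, "transport error: {e}"),
            RpcServiceError::Cancelled => write!(f, "request cancelled"),
        }
    }
}

impl std::error::Error for RpcServiceError {}
```

Keeping the code as a small `Copy` enum lets callers compare it directly (`payload.code == RpcServiceErrorCode::NotFound`) instead of parsing the message text.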


Framing Layer Errors

The framing layer defines two error types for low-level binary protocol operations. These errors are typically wrapped in RpcServiceError::Transport before reaching application code.

FrameEncodeError

Occurs when encoding RPC requests or responses into binary frames fails. Common causes:

  • Frame data exceeds maximum allowed size
  • Corrupt internal state during encoding
  • Memory allocation failure

FrameDecodeError

Occurs when decoding binary frames into RPC messages fails. Common causes:

  • Malformed or truncated frame data
  • Protocol version mismatch
  • Corrupt frame headers
  • Mutex poisoning in shared state
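
The wrapping of framing errors into RpcServiceError::Transport can be sketched with a `From` conversion. Everything here is a stand-in defined for the example: the local `FrameDecodeError` has only the `CorruptFrame` case, the one-byte length prefix is a toy wire format (not muxio's), and choosing `io::ErrorKind::InvalidData` is an assumption.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for the framing layer's decode error.
#[derive(Debug, PartialEq, Eq)]
pub enum FrameDecodeError {
    CorruptFrame,
}

impl fmt::Display for FrameDecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FrameDecodeError::CorruptFrame => write!(f, "corrupt frame"),
        }
    }
}

impl std::error::Error for FrameDecodeError {}

/// Only the variant relevant to this sketch.
#[derive(Debug)]
pub enum RpcServiceError {
    Transport(io::Error),
}

/// Wrap the framing error in an io::Error so it fits the Transport variant.
impl From<FrameDecodeError> for RpcServiceError {
    fn from(e: FrameDecodeError) -> Self {
        RpcServiceError::Transport(io::Error::new(io::ErrorKind::InvalidData, e))
    }
}

/// Toy decoder (assumed format): one length byte followed by the payload.
pub fn decode_frame(bytes: &[u8]) -> Result<&[u8], FrameDecodeError> {
    let (&len, rest) = bytes.split_first().ok_or(FrameDecodeError::CorruptFrame)?;
    if rest.len() != len as usize {
        return Err(FrameDecodeError::CorruptFrame);
    }
    Ok(rest)
}

/// The `?` operator applies the From conversion at the layer boundary.
pub fn read_message(bytes: &[u8]) -> Result<Vec<u8>, RpcServiceError> {
    Ok(decode_frame(bytes)?.to_vec())
}
```

With a conversion like this, application code only ever sees the Transport variant, while the original framing error stays reachable through the wrapped `io::Error`.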


Error Propagation Through RPC Layers

The following diagram shows how errors flow from their point of origin through the system layers to the application code:

sequenceDiagram
    participant App as "Application Code"
    participant Caller as "RpcCallPrebuffered\ntrait"
    participant Client as "RpcServiceCallerInterface\nimplementation"
    participant Dispatcher as "RpcDispatcher"
    participant Transport as "WebSocket or\nother transport"
    participant Server as "RpcServiceEndpointInterface"
    participant Handler as "Method Handler"
    
    Note over App,Handler: Success Path (for context)
    App->>Caller: method::call(args)
    Caller->>Client: call_rpc_buffered()
    Client->>Dispatcher: call()
    Dispatcher->>Transport: emit frames
    Transport->>Server: receive frames
    Server->>Handler: invoke handler
    Handler->>Server: Ok(response_bytes)
    Server->>Transport: emit response
    Transport->>Dispatcher: read_bytes()
    Dispatcher->>Client: decode response
    Client->>Caller: Ok(result)
    Caller->>App: Ok(output)
    
    Note over App,Handler: Error Path 1: Handler Failure
    App->>Caller: method::call(args)
    Caller->>Client: call_rpc_buffered()
    Client->>Dispatcher: call()
    Dispatcher->>Transport: emit frames
    Transport->>Server: receive frames
    Server->>Handler: invoke handler
    Handler->>Server: Err("validation failed")
    Server->>Transport: response with Fail code
    Transport->>Dispatcher: read_bytes()
    Dispatcher->>Client: decode RpcServiceError::Rpc
    Client->>Caller: Err(RpcServiceError::Rpc)
    Caller->>App: Err(RpcServiceError::Rpc)
    
    Note over App,Handler: Error Path 2: Transport Failure
    App->>Caller: method::call(args)
    Caller->>Client: call_rpc_buffered()
    Client->>Dispatcher: call()
    Dispatcher->>Transport: emit frames
    Transport--xClient: connection dropped
    Dispatcher->>Dispatcher: fail_all_pending_requests()
    Dispatcher->>Client: RpcStreamEvent::Error
    Client->>Caller: Err(RpcServiceError::Cancelled)
    Caller->>App: Err(RpcServiceError::Cancelled)

Dispatcher Error Handling

The RpcDispatcher is responsible for correlating requests with responses and managing error conditions during stream processing.

Mutex Poisoning Policy

The dispatcher uses a Mutex to protect the shared request queue. If this mutex becomes poisoned (i.e., a thread panicked while holding the lock), the dispatcher follows a "fail-fast" policy: it treats the poisoned lock as unrecoverable instead of continuing to use the queue.

This design choice prioritizes safety over graceful degradation. A poisoned queue indicates partial state mutation, and continuing could lead to:

  • Incorrect request/response correlation
  • Data loss or duplication
  • Undefined behavior in dependent code
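
One common way to express a fail-fast policy in Rust is to call `expect` on the lock result, so that a poisoned mutex panics at the point of use. The sketch below assumes that approach; the `Dispatcher` struct, its `queue` field, and the helper methods are hypothetical and exist only to make the behaviour observable.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the dispatcher's shared request queue.
pub struct Dispatcher {
    queue: Arc<Mutex<VecDeque<u32>>>,
}

impl Dispatcher {
    pub fn new() -> Self {
        Dispatcher { queue: Arc::new(Mutex::new(VecDeque::new())) }
    }

    /// Fail fast: a poisoned lock panics here instead of being recovered.
    pub fn enqueue(&self, request_id: u32) {
        let mut queue = self.queue.lock().expect("request queue mutex poisoned");
        queue.push_back(request_id);
    }

    /// Demo helper: poison the mutex by panicking while the lock is held.
    pub fn poison_for_demo(&self) {
        let queue = Arc::clone(&self.queue);
        let _ = thread::spawn(move || {
            let _guard = queue.lock().unwrap();
            panic!("simulated panic while holding the lock");
        })
        .join();
    }

    pub fn is_poisoned(&self) -> bool {
        self.queue.is_poisoned()
    }
}
```

The alternative, `lock().unwrap_or_else(|e| e.into_inner())`, would recover the guard and keep running on a queue that may have been left half-updated, which is exactly the risk listed above.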

Stream Error Events

When decoding errors occur during stream processing, the dispatcher generates RpcStreamEvent::Error events.

These events are delivered to registered response handlers, allowing them to detect and react to mid-stream failures.

graph TB
    A["fail_all_pending_requests()"] --> B["Take ownership of\nresponse_handlers map"]
    B --> C["Iterate over all\npending request IDs"]
    C --> D["Create synthetic\nRpcStreamEvent::Error"]
    D --> E["Invoke handler\nwith error event"]
    E --> F["Handler wakes\nawaiting Future"]
    F --> G["Future resolves to\nRpcServiceError::Cancelled"]

Connection Failure Cleanup

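When a transport connection is dropped, all pending requests must be notified to prevent indefinite waiting. The fail_all_pending_requests() method handles this by taking ownership of the response_handlers map and invoking each pending handler with a synthetic RpcStreamEvent::Error event.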

This ensures that all client code waiting for responses receives a timely error indication rather than hanging indefinitely.
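
The cleanup steps can be sketched as follows. This is a simplified model, not the dispatcher's real code: the `RpcStreamEvent::Error` fields, the `ResponseHandler` alias, the `u32` request IDs, and the `register`/`pending` helpers are assumptions introduced for the example.

```rust
use std::collections::HashMap;

/// Hypothetical event type; only the error case is modelled here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RpcStreamEvent {
    Error { request_id: u32, message: String },
}

/// Assumed handler shape: a boxed callback per pending request.
pub type ResponseHandler = Box<dyn FnMut(RpcStreamEvent) + Send>;

#[derive(Default)]
pub struct Dispatcher {
    response_handlers: HashMap<u32, ResponseHandler>,
}

impl Dispatcher {
    pub fn register(&mut self, request_id: u32, handler: ResponseHandler) {
        self.response_handlers.insert(request_id, handler);
    }

    pub fn pending(&self) -> usize {
        self.response_handlers.len()
    }

    /// Notify every pending request that the connection is gone.
    pub fn fail_all_pending_requests(&mut self, reason: &str) {
        // Taking the map leaves an empty one behind, so each handler
        // is invoked at most once even if this method is called again.
        let handlers = std::mem::take(&mut self.response_handlers);
        for (request_id, mut handler) in handlers {
            handler(RpcStreamEvent::Error {
                request_id,
                message: reason.to_string(),
            });
        }
    }
}
```

Moving the handlers out of the map before invoking them also means a handler cannot observe a partially drained map if it ends up calling back into the dispatcher.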


graph TB
    Start["call(client, input)"] --> Encode["encode_request(input)"]
    Encode -->|io::Error| EncodeErr["Return RpcServiceError::Transport"]
    Encode -->|Success| CreateReq["Create RpcRequest"]
    CreateReq --> CallBuffered["call_rpc_buffered()"]
    CallBuffered -->|RpcServiceError| ReturnErr1["Return error directly"]
    CallBuffered -->|Success| Unwrap["Unwrap nested result"]
    Unwrap --> CheckInner["Check inner Result"]
    CheckInner -->|Ok bytes| Decode["decode_response(bytes)"]
    CheckInner -->|Err RpcServiceError| ReturnErr2["Return RpcServiceError"]
    Decode -->|io::Error| DecodeErr["Wrap as Transport error"]
    Decode -->|Success| Success["Return decoded output"]

Error Handling in RpcCallPrebuffered

The RpcCallPrebuffered trait implementation demonstrates the complete error handling flow from the client's perspective.

The nested result structure (Result<Result<T, io::Error>, RpcServiceError>) separates transport-level errors from decoding errors:

  • Outer Result: Transport or RPC-level errors (RpcServiceError)
  • Inner Result: Decoding errors after successful transport (io::Error from deserialization)
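
A minimal sketch of collapsing the nested result into a single `Result`, assuming (as the flow diagram indicates) that a decoding `io::Error` is wrapped in the Transport variant. The `flatten` helper and the reduced two-variant `RpcServiceError` are defined only for this example.

```rust
use std::io;

/// Reduced error type with just the variants this sketch needs.
#[derive(Debug)]
pub enum RpcServiceError {
    Transport(io::Error),
    Cancelled,
}

/// Collapse Result<Result<T, io::Error>, RpcServiceError> into one Result.
pub fn flatten<T>(
    nested: Result<Result<T, io::Error>, RpcServiceError>,
) -> Result<T, RpcServiceError> {
    match nested {
        // Outer failure: transport or RPC-level error, passed through as-is.
        Err(outer) => Err(outer),
        // Inner failure: bytes arrived but could not be decoded.
        Ok(Err(decode_err)) => Err(RpcServiceError::Transport(decode_err)),
        // Both layers succeeded.
        Ok(Ok(value)) => Ok(value),
    }
}
```

The same logic can be written as `nested?.map_err(RpcServiceError::Transport)`; the explicit match is used here to keep the two layers visible.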


Testing Error Conditions

Handler Failures

Integration tests verify that handler errors propagate correctly to clients.

Method Not Found

Tests verify that calling unregistered methods returns NotFound errors.

Mock Client Error Injection

Unit tests use mock clients to inject specific error conditions.
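
The injection pattern can be sketched with a small synchronous mock. Note the assumptions: the real caller interface is not shown in this document, so the `Caller` trait, the `MockCaller` struct, and the `fetch_greeting` function under test are all hypothetical, and the error types are simplified local copies.

```rust
use std::io;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RpcServiceErrorCode {
    NotFound,
    Fail,
    System,
    Busy,
}

#[derive(Debug)]
pub struct RpcServiceErrorPayload {
    pub code: RpcServiceErrorCode,
    pub message: String,
}

#[derive(Debug)]
pub enum RpcServiceError {
    Rpc(RpcServiceErrorPayload),
    Transport(io::Error),
    Cancelled,
}

/// Hypothetical, synchronous caller interface used only for this sketch.
pub trait Caller {
    fn call(&self, method_id: u64, request: &[u8]) -> Result<Vec<u8>, RpcServiceError>;
}

/// Mock client: the test decides what the next call returns.
pub struct MockCaller {
    pub next: fn() -> Result<Vec<u8>, RpcServiceError>,
}

impl Caller for MockCaller {
    fn call(&self, _method_id: u64, _request: &[u8]) -> Result<Vec<u8>, RpcServiceError> {
        (self.next)()
    }
}

/// Example code under test: calls method 1 and decodes a UTF-8 reply.
pub fn fetch_greeting(client: &impl Caller) -> Result<String, RpcServiceError> {
    let bytes = client.call(1, b"hello")?;
    String::from_utf8(bytes)
        .map_err(|e| RpcServiceError::Transport(io::Error::new(io::ErrorKind::InvalidData, e)))
}
```

A test can then build `MockCaller { next: || Err(RpcServiceError::Cancelled) }` and assert that `fetch_greeting` surfaces the Cancelled variant, without opening a real connection.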


Error Code Mapping

The following table shows how different failure scenarios map to RpcServiceErrorCode values:

| Scenario | Code | Typical Message | Detected By |
|---|---|---|---|
| Method ID not in registry | NotFound | "Method not found" | Server endpoint |
| Handler returns Err(String) | System | Handler error message | Server endpoint |
| Handler panics | System | "Method has panicked" | Server endpoint (catch_unwind) |
| Business logic failure | Fail | Custom validation message | Handler implementation |
| Transport disconnection | N/A (Cancelled variant) | N/A | Dispatcher on connection drop |
| Frame decode error | N/A (Transport variant) | Varies | Framing layer |
| Serialization failure | N/A (Transport variant) | "Failed to encode/decode" | Bitcode layer |


Best Practices

For Service Implementations

  1. Use Fail for Expected Errors: Return Err("descriptive message") from handlers for expected failure cases like validation errors or missing resources.

  2. Let System Handle Panics: If a handler panics, the server automatically converts it to a System error. No explicit panic handling is needed.

  3. Provide Descriptive Messages: Error messages are transmitted to clients and should contain enough context for debugging without exposing sensitive information.
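
To make the panic-to-System conversion concrete, here is a minimal sketch built on `std::panic::catch_unwind`, which the error code mapping names as the detection mechanism. The `invoke_guarded` function and the simplified handler signature (a closure returning bytes) are assumptions for illustration; the endpoint's real handler signature is not shown in this document.

```rust
use std::panic::{self, UnwindSafe};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RpcServiceErrorCode {
    NotFound,
    Fail,
    System,
    Busy,
}

#[derive(Debug, PartialEq, Eq)]
pub struct RpcServiceErrorPayload {
    pub code: RpcServiceErrorCode,
    pub message: String,
}

/// Run a handler and turn a panic into a System error payload.
/// Hypothetical helper; the message text follows the mapping table.
pub fn invoke_guarded<F>(handler: F) -> Result<Vec<u8>, RpcServiceErrorPayload>
where
    F: FnOnce() -> Vec<u8> + UnwindSafe,
{
    panic::catch_unwind(handler).map_err(|_panic| RpcServiceErrorPayload {
        code: RpcServiceErrorCode::System,
        message: "Method has panicked".to_string(),
    })
}
```

This only works when the binary is built with unwinding panics (the default); with `panic = "abort"` the process terminates before `catch_unwind` can intercept anything.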

For Client Code

  1. Match on Error Variants: Distinguish between recoverable errors (Rpc with Fail code) and fatal errors (Cancelled, Transport).

  2. Handle Connection Loss: Be prepared for Cancelled errors and implement appropriate reconnection logic.

  3. Don't Swallow Transport Errors: Transport errors indicate serious issues like protocol corruption and should be logged or escalated.
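
The three client-side guidelines can be combined into a single classification step. The sketch below is one possible policy under stated assumptions: the `Action` enum and `classify` function are hypothetical, the error types are simplified local copies, and treating non-Fail RPC codes as escalation cases is a choice made for the example rather than something this document prescribes.

```rust
use std::io;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RpcServiceErrorCode {
    NotFound,
    Fail,
    System,
    Busy,
}

#[derive(Debug)]
pub struct RpcServiceErrorPayload {
    pub code: RpcServiceErrorCode,
    pub message: String,
}

#[derive(Debug)]
pub enum RpcServiceError {
    Rpc(RpcServiceErrorPayload),
    Transport(io::Error),
    Cancelled,
}

/// Hypothetical follow-up actions a client might take.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    ReportToUser,
    Reconnect,
    Escalate,
}

pub fn classify(err: &RpcServiceError) -> Action {
    match err {
        // Expected, recoverable failure reported by the handler.
        RpcServiceError::Rpc(p) if p.code == RpcServiceErrorCode::Fail => Action::ReportToUser,
        // Other RPC codes: escalated here (an assumption of this sketch).
        RpcServiceError::Rpc(_) => Action::Escalate,
        // The connection dropped while the request was pending.
        RpcServiceError::Cancelled => Action::Reconnect,
        // Possible protocol corruption: never swallow silently.
        RpcServiceError::Transport(_) => Action::Escalate,
    }
}
```

Because the match has no wildcard arm over the variants, adding a new `RpcServiceError` variant later forces this function to be revisited at compile time.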

For Testing

  1. Test Both Success and Failure Paths: Every RPC method should have tests for successful calls and expected error conditions.

  2. Verify Error Codes: Match on specific RpcServiceErrorCode values rather than just checking is_err().

  3. Test Connection Failures: Simulate transport disconnection to ensure proper cleanup and error propagation.
