p.enthalabs

GitHub - GenshIv/hft-ipc

**hft-ipc** is an ultra-low latency, zero-syscall Inter-Process Communication (IPC) library for Go. It is designed specifically for High-Frequency Trading (HFT) systems, game servers, and other performance-critical applications where microsecond or nanosecond latency is required.

Features

[](https://github.com/GenshIv/hft-ipc#features)

- **Zero-Syscall Message Passing**: After the initial memory mapping, the OS kernel is completely bypassed. Data is transferred directly through shared physical/virtual memory.

- **Lock-Free Ring Buffer**: Uses pure Go `sync/atomic` operations. No mutexes, no channels, no blocking.

- **Cache-Line Padding**: Core structures are padded to 64 bytes (`CacheLineSize`) to prevent False Sharing across CPU cores.

- **Extreme Throughput**: Capable of processing up to **~18,000,000 transactions per second** on standard hardware (as measured in local benchmarks).

- **Cross-Platform**: Works natively on Linux, Windows, and macOS. Develop locally on Windows/macOS and deploy to Linux production with zero code changes.

Installation

[](https://github.com/GenshIv/hft-ipc#installation)

go get github.com/GenshIv/hft-ipc

Architecture

[](https://github.com/GenshIv/hft-ipc#architecture) The project is built on two primary components:

1. **`shm` (Shared Memory)**: Utilizes `mmap` to project a single file (e.g., `hft_shared_memory.bin`) into the virtual memory space of multiple independent Go processes. 2. **`ringbuf` (Ring Buffer)**: A lock-free circular buffer mapped directly onto the shared memory region. Processes spin-poll the buffer using atomic `Head` and `Tail` pointers.

Quick Start

[](https://github.com/GenshIv/hft-ipc#quick-start)

1. Initialize the Ring Buffer

[](https://github.com/GenshIv/hft-ipc#1-initialize-the-ring-buffer) Both processes (e.g., Reader and Writer) must map the same file and initialize the buffer.

package main

import ( "github.com/GenshIv/hft-ipc/ringbuf" "github.com/GenshIv/hft-ipc/shm" )

func main() { capacity := uint64(1000 * 1000) size := int(ringbuf.DataOffset) + int(capacity*ringbuf.PayloadSize)

// Map memory mapped, file, _ := shm.OpenOrCreateMmap("hft_shared_memory.bin", size) defer file.Close() defer mapped.Unmap()

// Initialize Lock-Free Buffer rb := ringbuf.Init(mapped, capacity) }

2. Producer (Writer)

[](https://github.com/GenshIv/hft-ipc#2-producer-writer) Push data into the ring buffer. This operation is non-blocking. If the buffer is full, it returns `false`.

payload := make([]byte, ringbuf.PayloadSize) // ... fill payload with data (e.g., using binary.LittleEndian)

for { if rb.Push(mapped, payload) { // Successfully sent to another process! } else { // Buffer is full, handle backpressure } }

3. Consumer (Reader)

[](https://github.com/GenshIv/hft-ipc#3-consumer-reader) Pop data from the ring buffer. Uses a spin-lock strategy for minimum latency.

payload := make([]byte, ringbuf.PayloadSize)

for { if rb.Pop(mapped, payload) { // Successfully received! Process the payload. } else { // Buffer empty. Yield to scheduler or spin. runtime.Gosched() } }

Running the Examples

[](https://github.com/GenshIv/hft-ipc#running-the-examples) The repository includes a basic benchmark/demo via the `cmd` package.

**Terminal 1 (Reader):**

go run ./cmd/reader/main.go

**Terminal 2 (Writer):**

go run ./cmd/writer/main.go

Advanced Samples

[](https://github.com/GenshIv/hft-ipc#advanced-samples) The `samples/` directory contains realistic usage patterns:

1. **Market Data Feed** (`samples/marketdata`): Classic low-latency binary data transfer. 2. **High-Throughput Logger** (`samples/logger`): Offloading I/O operations from critical code paths. 3. **Hot-Swappable Plugin System** (`samples/plugin_system`): Two-way, process-level modularity using dual SPSC buffers.

4. **Dynamic Orchestrator** (`samples/price_parser`): A Multi-Producer, Single-Consumer (MPSC) architecture where multiple independent parsers (`csv_parser`, `json_parser`) write to their own channels, and a central Orchestrator dynamically discovers and polls them without Mutex locks or restarts.

``` [Server 1] [Server 2] [Server 3] +------------+ +------------+ +------------+ | Parser 1 | | Parser 3 | | Parser 5 | | Parser 2 | | Parser 4 | | Parser 6 | | | | | | | | | | | v | | v | | v | |Orchestrator| |Orchestrator| |Orchestrator| +------------+ +------------+ +------------+ | | | \---------------------+---------------------/ | v [Database] ```

See `samples/README.md` for run instructions.

Tests & Benchmarks

[](https://github.com/GenshIv/hft-ipc#tests--benchmarks) The library is completely cross-platform. You can run tests and benchmarks natively on any supported OS.

**To run unit tests:**

go test ./...

**To run benchmarks natively on your current OS:**

go test -bench . -benchmem ./benchmarks

**To run benchmarks for a specific target OS (Cross-Compilation):**

For Linux / macOS

GOOS=linux go test -bench . -benchmem ./benchmarks GOOS=darwin go test -bench . -benchmem ./benchmarks

For Windows (PowerShell)

$env:GOOS="windows"; go test -bench . -benchmem ./benchmarks

**Results (AMD Ryzen 9 7950X3D):**

- **Data Packing (CSV/JSON):** ~7.2 ns/op

- **Delivery 1-to-1:** ~56.3 ns/op (~17.5 million TPS)

- **Delivery 3-to-1 (Orchestrator):** ~51.8 ns/message (155.6 ns per 3-source cycle)

_Note: The orchestrator pattern achieves higher efficiency (43ns vs 54ns) because the fast-polling loop multiplexes data sources, practically eliminating CPU spin-wait starvation._

Use Cases

[](https://github.com/GenshIv/hft-ipc#use-cases)

- **HFT Trading Engines**: Web or TCP gateway processes handling JSON/FIX protocols can write directly to the core matching engine, decoupling I/O from computation.

- **Gateway/Engine Architecture**: Decouple slow, blocking I/O (WebSockets, HTTP) into a separate process, isolating your core business logic engine from network failures, DDoS, or GC pauses of the web server.

- **Hot-Reloading Modules**: Update components of your system on the fly by spawning a new process and redirecting the IPC ring buffer to it without stopping the main application.

Crash Resilience & Guaranteed Delivery

[](https://github.com/GenshIv/hft-ipc#crash-resilience--guaranteed-delivery)

`hft-ipc` provides robust protection against process crashes. Because the `Head`, `Tail`, and data payload are stored directly in the `mmap` file (outside the process heap), a crashing Consumer (e.g., an OOM kill) does **not** corrupt the buffer or lose unread data. When the Consumer restarts, it re-maps the file, automatically picks up the old `Tail` pointer, and resumes reading exactly where it left off.

You have two choices for reading data, depending on your strictness requirements:

1. Maximum Speed (At-Most-Once)

[](https://github.com/GenshIv/hft-ipc#1-maximum-speed-at-most-once) Using `Pop()` reads the data and _immediately_ advances the `Tail` pointer. If your process crashes immediately after `Pop()` but before processing the data, that single message is lost.

if rb.Pop(mapped, payload) { process(payload) }

2. Guaranteed Delivery (At-Least-Once)

[](https://github.com/GenshIv/hft-ipc#2-guaranteed-delivery-at-least-once)

To guarantee zero message loss, use the `Peek()` and `Ack()` pattern. `Peek()` reads the data without moving the `Tail`. Only after your business logic successfully processes the data (e.g., saves to a DB) do you call `Ack()` to mark it as consumed.

if rb.Peek(mapped, payload) { // 1. Read data and execute complex logic err := saveToDatabase(payload)

// 2. Only advance Tail if successful if err == nil { rb.Ack() } }

Kubernetes Deployment (Sidecar Pattern)

[](https://github.com/GenshIv/hft-ipc#kubernetes-deployment-sidecar-pattern) Using `hft-ipc` in Kubernetes is highly effective when applying the **Sidecar Pattern**. Since `mmap` requires shared physical/virtual memory, you must run the communicating processes within the same **Pod** and share an in-memory volume.

To achieve maximum HFT-level speed and avoid disk I/O bottlenecks, mount an `emptyDir` volume with `medium: Memory` (which maps to Linux `tmpfs` / `/dev/shm`).

Example YAML Manifest

[](https://github.com/GenshIv/hft-ipc#example-yaml-manifest)

apiVersion: v1 kind: Pod metadata: name: hft-trading-node spec:

Create a shared memory volume (RAM-backed tmpfs)

volumes:

- name: shared-memory-vol

emptyDir: medium: Memory # CRITICAL: Ensures files are kept in RAM, not on disk sizeLimit: 1Gi # Optional: limit memory usage

containers:

1. Main Orchestrator / Trading Engine

- name: orchestrator

image: my-registry/orchestrator:v1.0.0 volumeMounts:

- name: shared-memory-vol

mountPath: /app/channels # Directory where ring buffers are stored

2. Sidecar Parser (e.g. JSON WebSockets)

- name: json-parser

image: my-registry/json-parser:v1.0.0 volumeMounts:

- name: shared-memory-vol

mountPath: /app/channels # It will create /app/channels/json_parser.bin here

**Benefits of this architecture in K8s:**

1. **Zero Network Overhead:** Communication happens at tens of millions of TPS without ever touching the Kubernetes network stack (CNI, iptables, kube-proxy). 2. **Fault Isolation:** If the `json-parser` sidecar crashes or OOMs, the `orchestrator` continues to run uninterrupted. Kubelet will simply restart the parser container, and it will instantly reconnect via the shared memory file. 3. **Security:** No need to open ports or use privileged `hostIPC` flags. Everything is safely encapsulated within the Pod.

Security Architecture

[](https://github.com/GenshIv/hft-ipc#security-architecture) IPC via shared memory (`mmap`) provides an incredibly strong security posture compared to traditional network-based microservices:

1. **Zero Network Attack Surface**: `hft-ipc` opens absolutely NO network ports (TCP/UDP). It is immune to port scanning, network DDoS, packet sniffing, Man-in-the-Middle (MitM), and SSRF attacks.

2. **Strict OS-Level Permissions**: By default, `hft-ipc` creates memory-mapped files using `0600` permissions. This means that only the OS user (Owner) who started the process can read or write the data. If an attacker breaches another service on the same server under a different user (e.g., `www-data`), they will get a `Permission Denied` error when trying to access the ring buffer.

3. **Container Isolation (Kubernetes)**: When using the Sidecar pattern with an `emptyDir` volume, the shared memory is physically accessible _only_ to the containers within that specific Pod. Other Pods in the same cluster or even on the same Node cannot access it.