
BufReader

dexter edited this page Oct 13, 2025 · 2 revisions

BufReader: Zero-Copy Network Reading with Advanced Memory Management


Key Takeaways

If you're short on time, here are the most important conclusions:

BufReader's Core Advantages (Concurrent Scenarios):

  • ⭐ 98.5% GC Reduction: 134 GCs → 2 GCs (streaming server scenario)
  • 🚀 99.93% Fewer Allocations: 5.57 million → 3,918 allocations
  • 🔄 10-20x Throughput Improvement: zero allocation + memory reuse

Key Data:

Streaming Server Scenario (100 concurrent streams):
bufio.Reader: 79 GB allocated, 134 GCs
BufReader:    0.6 GB allocated, 2 GCs

Ideal Use Cases:

  • ✅ High-concurrency network servers
  • ✅ Streaming media processing
  • ✅ Long-running services (24/7)

Quick Test:

sh scripts/benchmark_bufreader.sh

Introduction

In high-performance network programming, frequent memory allocation and copying are major sources of performance bottlenecks. While Go's standard library bufio.Reader provides buffered reading capabilities, it still involves significant memory allocation and copying operations when processing network data streams. This article provides an in-depth analysis of these issues and introduces BufReader from the Monibuca project, demonstrating how to achieve zero-copy, high-performance network data reading through the GoMem memory allocator.

1. Memory Allocation Issues in Standard Library bufio.Reader

1.1 How bufio.Reader Works

bufio.Reader uses a fixed-size internal buffer to reduce system call frequency:

type Reader struct {
    buf          []byte    // Fixed-size buffer
    rd           io.Reader // Underlying reader
    r, w         int       // Read/write positions
}

func (b *Reader) Read(p []byte) (n int, err error) {
    // (simplified) 1. If buffer is empty, refill it from the underlying reader
    if b.r == b.w {
        b.r, b.w = 0, 0
        n, err = b.rd.Read(b.buf)  // Data copied into the internal buffer
        if err != nil {
            return 0, err
        }
        b.w += n
    }

    // 2. Copy data from the internal buffer to the caller's slice
    n = copy(p, b.buf[b.r:b.w])    // Another data copy
    b.r += n
    return
}

1.2 Memory Allocation Problem Analysis

When using bufio.Reader to read network data, the following issues exist:

Issue 1: Multiple Memory Copies

sequenceDiagram
    participant N as Network Socket
    participant B as bufio.Reader Internal Buffer
    participant U as User Buffer
    participant A as Application Layer
    
    N->>B: System call reads data (1st copy)
    Note over B: Data stored in fixed buffer
    B->>U: copy() to user buffer (2nd copy)
    Note over U: User gets data copy
    U->>A: Pass to application layer (possible 3rd copy)
    Note over A: Application processes data

Each read operation requires at least two memory copies:

  1. From network socket to bufio.Reader's internal buffer
  2. From internal buffer to user-provided slice
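The intermediate buffering is easy to observe with a small counting reader (a sketch; `countingReader` is illustrative, not part of any library): even though the caller asks for only 10 bytes, bufio first pulls a full buffer's worth from the source and then copies 10 of those bytes out.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingReader records how many bytes each underlying Read call returns.
type countingReader struct {
	r     io.Reader
	reads []int
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.reads = append(c.reads, n)
	return n, err
}

// firstRead asks bufio for `want` bytes and reports how many bytes the
// underlying source actually handed over for that first call.
func firstRead(want int) (userGot, sourceGave int) {
	src := &countingReader{r: bytes.NewReader(make([]byte, 8192))}
	br := bufio.NewReaderSize(src, 4096)
	p := make([]byte, want)
	n, _ := io.ReadFull(br, p)
	return n, src.reads[0]
}

func main() {
	userGot, sourceGave := firstRead(10)
	fmt.Println(userGot, sourceGave) // 10 4096: a 10-byte request filled the whole 4KB buffer
}
```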

Issue 2: Fixed Buffer Limitations

// bufio.Reader uses fixed-size buffer
reader := bufio.NewReaderSize(conn, 4096)  // Fixed 4KB

// Reading large chunks requires multiple operations
data := make([]byte, 16384)  // Need to read 16KB
for total := 0; total < len(data); {
    n, err := reader.Read(data[total:])  // May take several iterations
    if err != nil {
        break  // handle the error in real code
    }
    total += n
}

Issue 3: Frequent Memory Allocation

// Each read requires allocating new slices
func processPackets(reader *bufio.Reader) error {
    for {
        // Allocate new memory for each packet
        header := make([]byte, 4)        // Allocation 1
        if _, err := io.ReadFull(reader, header); err != nil {
            return err
        }

        size := binary.BigEndian.Uint32(header)
        payload := make([]byte, size)    // Allocation 2
        if _, err := io.ReadFull(reader, payload); err != nil {
            return err
        }

        // After processing, the memory becomes garbage for the GC
        processPayload(payload)
        // Next iteration allocates again...
    }
}

1.3 Performance Impact

In high-frequency network data processing scenarios, these issues lead to:

  1. Increased CPU Overhead: Frequent copy() operations consume CPU resources
  2. Higher GC Pressure: Massive temporary memory allocations increase garbage collection burden
  3. Increased Latency: Each memory allocation and copy adds processing latency
  4. Reduced Throughput: Memory operations become bottlenecks, limiting overall throughput
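The per-packet allocation cost is easy to quantify with testing.AllocsPerRun (a standalone sketch; the global sink forces the slice to escape to the heap, mimicking a payload that outlives the read call):

```go
package main

import (
	"fmt"
	"testing"
)

var sink []byte // global sink so the allocation cannot be optimized away

func main() {
	// Simulates the per-packet make([]byte, n) pattern shown above
	allocs := testing.AllocsPerRun(1000, func() {
		sink = make([]byte, 1024)
	})
	// Every packet costs at least one heap allocation, and at tens of
	// thousands of packets per second each one becomes short-lived garbage.
	fmt.Println(allocs >= 1) // true
}
```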

2. BufReader: A Zero-Copy Solution

2.1 Design Philosophy

BufReader is designed based on the following core principles:

  1. Zero-Copy Reading: Read directly from network to final memory location, avoiding intermediate copies
  2. Memory Reuse: Reuse memory blocks through GoMem allocator, avoiding frequent allocations
  3. Chained Buffering: Use multiple memory blocks in a linked list instead of a single fixed buffer
  4. On-Demand Allocation: Dynamically adjust memory usage based on actual read amount

2.2 Core Data Structures

type BufReader struct {
    Allocator *ScalableMemoryAllocator  // Scalable memory allocator
    buf       MemoryReader               // Memory block chain reader
    totalRead int                        // Total bytes read
    BufLen    int                        // Block size per read
    Mouth     chan []byte                // Data input channel
    feedData  func() error               // Data feeding function
}

// MemoryReader manages multiple memory blocks
type MemoryReader struct {
    *Memory                    // Memory manager
    Buffers [][]byte          // Memory block chain
    Size    int               // Total size
    Length  int               // Readable length
}
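The chained-buffer idea can be sketched independently of Monibuca: readable data is a list of block references, a read walks the chain, and handing out sub-slices gives the caller views rather than copies (`miniChain` below is an illustrative toy, not the real MemoryReader):

```go
package main

import "fmt"

// miniChain is a toy version of MemoryReader: a chain of memory
// blocks plus the total readable length.
type miniChain struct {
	buffers [][]byte
	length  int
}

func (c *miniChain) append(b []byte) {
	c.buffers = append(c.buffers, b)
	c.length += len(b)
}

// rangeN yields up to n readable bytes as views into the original
// blocks -- no byte is copied, and one call may yield several slices.
func (c *miniChain) rangeN(n int, yield func([]byte)) {
	for n > 0 && len(c.buffers) > 0 {
		head := c.buffers[0]
		if len(head) > n {
			yield(head[:n]) // view into the block, not a copy
			c.buffers[0] = head[n:]
			c.length -= n
			return
		}
		yield(head) // whole block consumed, move to the next one
		n -= len(head)
		c.length -= len(head)
		c.buffers = c.buffers[1:]
	}
}

func main() {
	var c miniChain
	c.append([]byte("hello "))
	c.append([]byte("world"))

	var got []byte
	c.rangeN(8, func(view []byte) { got = append(got, view...) })
	fmt.Printf("%q %d\n", got, c.length) // "hello wo" 3
}
```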

2.3 Workflow

2.3.1 Zero-Copy Data Reading Flow

sequenceDiagram
    participant N as Network Socket
    participant A as ScalableMemoryAllocator
    participant B as BufReader.buf
    participant U as User Code
    
    U->>B: Read(n)
    B->>B: Check if buffer has data
    alt Buffer empty
        B->>A: Request memory block
        Note over A: Get from pool or allocate new block
        A-->>B: Return memory block reference
        B->>N: Read directly to memory block
        Note over N,B: Zero-copy: data written to final location
    end
    B-->>U: Return slice view of memory block
    Note over U: User uses directly, no copy needed
    U->>U: Process data
    U->>A: Recycle memory block (optional)
    Note over A: Block returns to pool for reuse

2.3.2 Memory Block Management Flow

graph TD
    A[Start Reading] --> B{buf has data?}
    B -->|Yes| C[Return data view directly]
    B -->|No| D[Call feedData]
    D --> E[Allocator.Read requests memory]
    E --> F{Pool has free block?}
    F -->|Yes| G[Reuse existing memory block]
    F -->|No| H[Allocate new memory block]
    G --> I[Read data from network]
    H --> I
    I --> J[Append to buf.Buffers]
    J --> K[Update Size and Length]
    K --> C
    C --> L[User reads data]
    L --> M{Data processed?}
    M -->|Yes| N[ClipFront recycle front blocks]
    N --> O[Allocator.Free return to pool]
    O --> P[End]
    M -->|No| A

2.4 Core Implementation Analysis

2.4.1 Initialization and Memory Allocation

func NewBufReader(reader io.Reader) *BufReader {
    return NewBufReaderWithBufLen(reader, defaultBufSize)
}

func NewBufReaderWithBufLen(reader io.Reader, bufLen int) *BufReader {
    r := &BufReader{
        Allocator: NewScalableMemoryAllocator(bufLen),  // Create allocator
        BufLen:    bufLen,
    }
    // feedData is assigned after r exists so the closure can capture it
    r.feedData = func() error {
        // Key: read via the allocator, filling a pooled block directly
        buf, err := r.Allocator.Read(reader, r.BufLen)
        if err != nil {
            return err
        }
        n := len(buf)
        r.totalRead += n
        // Append the memory block reference only -- no data copy
        r.buf.Buffers = append(r.buf.Buffers, buf)
        r.buf.Size += n
        r.buf.Length += n
        return nil
    }
    r.buf.Memory = &Memory{}
    return r
}

Zero-Copy Key Points:

  • Allocator.Read() reads directly from io.Reader to allocated memory block
  • Returned buf is a reference to the actual data storage memory block
  • append(r.buf.Buffers, buf) only appends reference, no data copy

2.4.2 Read Operations

func (r *BufReader) ReadByte() (b byte, err error) {
    // If buffer is empty, trigger data filling
    for r.buf.Length == 0 {
        if err = r.feedData(); err != nil {
            return
        }
    }
    // Read from memory block chain, no copy needed
    return r.buf.ReadByte()
}

func (r *BufReader) ReadRange(n int, yield func([]byte)) (err error) {
    for r.recycleFront(); n > 0 && err == nil; err = r.feedData() {
        if r.buf.Length > 0 {
            if r.buf.Length >= n {
                // Pass slice views of the memory blocks directly, no copy
                r.buf.RangeN(n, yield)
                return
            }
            n -= r.buf.Length
            r.buf.Range(yield)
        }
    }
    return
}

Zero-Copy Benefits:

  • yield callback receives a slice view of the memory block
  • User code directly operates on original memory blocks without intermediate copying
  • After reading, processed blocks are automatically recycled

2.4.3 Memory Recycling

func (r *BufReader) recycleFront() {
    // Clean up processed memory blocks
    r.buf.ClipFront(r.Allocator.Free)
}

func (r *BufReader) Recycle() {
    r.buf = MemoryReader{}
    if r.Allocator != nil {
        // Return all memory blocks to allocator
        r.Allocator.Recycle()
    }
    if r.Mouth != nil {
        close(r.Mouth)
    }
}

2.5 Comparison with bufio.Reader

graph LR
    subgraph "bufio.Reader (Multiple Copies)"
        A1[Network] -->|System Call| B1[Kernel Buffer]
        B1 -->|Copy 1| C1[bufio Buffer]
        C1 -->|Copy 2| D1[User Slice]
        D1 -->|Copy 3?| E1[Application]
    end
    
    subgraph "BufReader (Zero-Copy)"
        A2[Network] -->|System Call| B2[Kernel Buffer]
        B2 -->|Direct Read| C2[GoMem Block]
        C2 -->|Slice View| D2[User Code]
        D2 -->|Recycle| C2
        C2 -->|Reuse| C2
    end
| Feature | bufio.Reader | BufReader |
|---|---|---|
| Memory copies | 2-3 times | 0 (slice view) |
| Buffer mode | Fixed-size single buffer | Variable-size chained buffer |
| Memory allocation | May allocate each read | Object pool reuse |
| Memory recycling | GC automatic | Actively returned to pool |
| Large data handling | Multiple operations needed | Single append to chain |
| GC pressure | High | Very low |

3. Performance Benchmarks

3.1 Test Scenario Design

3.1.1 Real Network Simulation

To make benchmarks more realistic, we implemented a mockNetworkReader that simulates real network behavior.

Real Network Characteristics:

In real network reading scenarios, the data length returned by each Read() call is uncertain, affected by multiple factors:

  • TCP receive window size
  • Network latency and bandwidth
  • OS buffer state
  • Network congestion
  • Network quality fluctuations

Simulation Implementation:

type mockNetworkReader struct {
    data     []byte
    offset   int
    rng      *rand.Rand
    minChunk int  // Minimum chunk size
    maxChunk int  // Maximum chunk size
}

func (m *mockNetworkReader) Read(p []byte) (n int, err error) {
    if m.offset >= len(m.data) {
        return 0, io.EOF
    }
    // Return a random-length chunk between minChunk and maxChunk,
    // clamped to the caller's buffer
    chunkSize := m.minChunk + m.rng.Intn(m.maxChunk-m.minChunk+1)
    if chunkSize > len(p) {
        chunkSize = len(p)
    }
    n = copy(p[:chunkSize], m.data[m.offset:])
    m.offset += n
    return n, nil
}

Different Network Condition Simulations:

| Network Condition | Data Block Range | Real Scenario |
|---|---|---|
| Good network | 1024-4096 bytes | Stable LAN, premium network |
| Normal network | 256-2048 bytes | Regular internet connection |
| Poor network | 64-512 bytes | High latency, small TCP window |
| Worst network | 1-128 bytes | Mobile network, severe congestion |

This simulation makes benchmark results more realistic and reliable.

3.1.2 Test Scenario List

We focus on the following core scenarios:

  1. Concurrent Network Connection Reading - Demonstrates zero allocation
  2. Concurrent Protocol Parsing - Simulates real applications
  3. GC Pressure Test - Shows long-term running advantages ⭐
  4. Streaming Server Scenario - Real business scenario ⭐

3.2 Benchmark Design

Core Test Scenarios

Benchmarks focus on concurrent network scenarios and GC pressure comparison:

1. Concurrent Network Connection Reading

  • Simulates 100+ concurrent connections continuously reading data
  • Each read processes 1KB data packets
  • bufio: Allocates new buffer each time (make([]byte, 1024))
  • BufReader: Zero-copy processing (ReadRange)

2. Concurrent Protocol Parsing

  • Simulates streaming server parsing protocol packets
  • Reads packet header (4 bytes) + data content
  • Compares memory allocation strategies

3. GC Pressure Test (⭐ Core)

  • Continuous concurrent reading and processing
  • Tracks GC count, total memory allocation, allocation count
  • Demonstrates differences in long-term running

4. Streaming Server Scenario (⭐ Real Application)

  • Simulates 100 concurrent streams
  • Each stream reads and forwards data to subscribers
  • Complete real application scenario comparison

Key Test Logic

Concurrent Reading:

// bufio.Reader - Allocate each time
buf := make([]byte, 1024)  // 1KB allocation
n, _ := reader.Read(buf)
processData(buf[:n])

// BufReader - Zero-copy
reader.ReadRange(1024, func(data []byte) {
    processData(data)  // Direct use, no allocation
})

GC Statistics:

// Record GC statistics
var beforeGC, afterGC runtime.MemStats
runtime.ReadMemStats(&beforeGC)

b.RunParallel(func(pb *testing.PB) {
    // Concurrent testing...
})

runtime.ReadMemStats(&afterGC)
b.ReportMetric(float64(afterGC.NumGC-beforeGC.NumGC), "gc-runs")
b.ReportMetric(float64(afterGC.TotalAlloc-beforeGC.TotalAlloc)/1024/1024, "MB-alloc")

Complete test code: pkg/util/buf_reader_benchmark_test.go

3.3 Running Benchmarks

We provide complete benchmark code (pkg/util/buf_reader_benchmark_test.go) and convenient test scripts.

Method 1: Using Test Script (Recommended)

# Run complete benchmark suite
sh scripts/benchmark_bufreader.sh

This script will run all tests sequentially and output user-friendly results.

Method 2: Manual Testing

cd pkg/util

# Run all benchmarks
go test -bench=BenchmarkConcurrent -benchmem -benchtime=2s -test.run=xxx

# Run specific tests
go test -bench=BenchmarkGCPressure -benchmem -benchtime=5s -test.run=xxx

# Run streaming server scenario
go test -bench=BenchmarkStreamingServer -benchmem -benchtime=3s -test.run=xxx

Method 3: Run Key Tests Only

cd pkg/util

# GC pressure comparison (core advantage)
go test -bench=BenchmarkGCPressure -benchmem -test.run=xxx

# Streaming server scenario (real application)
go test -bench=BenchmarkStreamingServer -benchmem -test.run=xxx

3.4 Actual Performance Test Results

Actual results from running benchmarks on Apple M2 Pro:

Test Environment:

  • CPU: Apple M2 Pro (12 cores)
  • OS: macOS (darwin/arm64)
  • Go: 1.23.0

3.4.1 Core Performance Comparison

| Test Scenario | bufio.Reader | BufReader | Difference |
|---|---|---|---|
| Concurrent Network Read | 103.2 ns/op, 1027 B/op, 1 allocs | 147.6 ns/op, 4 B/op, 0 allocs | Zero alloc ⭐ |
| GC Pressure Test | 1874 ns/op, 5,576,659 mallocs, 3 gc-runs | 112.7 ns/op, 3,918 mallocs, 2 gc-runs | 16.6x faster ⭐⭐⭐ |
| Streaming Server | 374.6 ns/op, 79,508 MB-alloc, 134 gc-runs | 30.29 ns/op, 601 MB-alloc, 2 gc-runs | 12.4x faster ⭐⭐⭐ |

3.4.2 GC Pressure Comparison (Core Finding)

GC Pressure Test results best demonstrate long-term running differences:

bufio.Reader:

Operation Latency:   1874 ns/op
Allocation Count:    5,576,659 times (over 5 million!)
GC Runs:            3 times
Per Operation:      2 allocs/op

BufReader:

Operation Latency:   112.7 ns/op (16.6x faster)
Allocation Count:    3,918 times (99.93% reduction)
GC Runs:            2 times
Per Operation:      0 allocs/op (zero allocation!)

Key Metrics:

  • 🚀 16x Throughput Improvement: 45.7M ops/s vs 2.8M ops/s
  • ⭐ 99.93% Allocation Reduction: From 5.57 million to 3,918 times
  • ✨ Zero Allocation Operations: 0 allocs/op vs 2 allocs/op

3.4.3 Streaming Server Scenario (Real Application)

Simulating 100 concurrent streams, continuously reading and forwarding data:

bufio.Reader:

Operation Latency:   374.6 ns/op
Memory Allocation:   79,508 MB (79 GB!)
GC Runs:            134 times
Per Operation:      4 allocs/op

BufReader:

Operation Latency:   30.29 ns/op (12.4x faster)
Memory Allocation:   601 MB (99.2% reduction)
GC Runs:            2 times (98.5% reduction!)
Per Operation:      0 allocs/op

Stunning Differences:

  • 🎯 GC Runs: 134 → 2 (98.5% reduction)
  • 💾 Memory Allocation: 79 GB → 0.6 GB (132x reduction)
  • ⚡ Throughput: 10.1M → 117M ops/s (11.6x improvement)

3.4.4 Long-Term Running Impact

For streaming server scenarios, 1-hour running estimation:

bufio.Reader:

Estimated Memory Allocation: ~2.8 TB
Estimated GC Runs: ~4,800 times
Cumulative GC Pause: Significant

BufReader:

Estimated Memory Allocation: ~21 GB (133x reduction)
Estimated GC Runs: ~72 times (67x reduction)
Cumulative GC Pause: Minimal

Usage Recommendations:

| Scenario | Recommended | Reason |
|---|---|---|
| Simple file reading | bufio.Reader | Standard library is sufficient |
| High-concurrency network server | BufReader ⭐ | 98% GC reduction |
| Streaming media processing | BufReader ⭐ | Zero allocation, high throughput |
| Long-running services | BufReader ⭐ | More stable system |

3.4.5 Essential Reasons for Performance Improvement

While bufio.Reader is faster in some simple scenarios, BufReader's design goals are not to be faster in all cases, but rather:

  1. Eliminate Memory Allocation - Avoid frequent make([]byte, n) in real applications
  2. Reduce GC Pressure - Reuse memory through object pool, reducing garbage collection burden
  3. Zero-Copy Processing - Provide ReadRange API for direct data manipulation
  4. Chained Buffering - Support complex data processing patterns

In scenarios like Monibuca streaming server, the value of these features far exceeds microsecond-level latency differences.

Real Impact: When handling 1000 concurrent streaming connections:

// bufio.Reader approach
// 1000 connections × 30 packets/s = 30,000 allocations per second
// 1024 bytes per allocation = ~30 MB/s of short-lived temporary memory
// All of it becomes garbage, keeping the GC continuously busy

// BufReader approach  
// 0 allocations (memory reuse)
// 90%+ GC pressure reduction
// Significantly improved system stability

Selection Guidelines:

  • πŸ“ Simple file reading β†’ bufio.Reader
  • πŸ”„ High-concurrency network services β†’ BufReader (98% GC reduction)
  • πŸ’Ύ Long-running services β†’ BufReader (zero allocation)
  • 🎯 Streaming server β†’ BufReader (10-20x throughput)

4. Real-World Use Cases

4.1 RTSP Protocol Parsing

// Use BufReader to parse RTSP requests
func parseRTSPRequest(conn net.Conn) (*RTSPRequest, error) {
    reader := util.NewBufReader(conn)
    defer reader.Recycle()
    
    // Read request line: zero-copy, no memory allocation
    requestLine, err := reader.ReadLine()
    if err != nil {
        return nil, err
    }
    
    // Read headers: directly operate on memory blocks
    headers, err := reader.ReadMIMEHeader()
    if err != nil {
        return nil, err
    }
    
    // Read body (if present)
    var body []byte
    if contentLength := headers.Get("Content-Length"); contentLength != "" {
        length, err := strconv.Atoi(contentLength)
        if err != nil {
            return nil, err
        }
        // ReadRange provides zero-copy access; copy out here because
        // the underlying blocks are recycled after reading
        if err = reader.ReadRange(length, func(chunk []byte) {
            body = append(body, chunk...)
        }); err != nil {
            return nil, err
        }
    }
    
    return &RTSPRequest{
        RequestLine: requestLine,
        Headers:     headers,
        Body:        body,
    }, nil
}

4.2 Streaming Media Packet Parsing

// Use BufReader to parse FLV packets
func parseFLVPackets(conn net.Conn) error {
    reader := util.NewBufReader(conn)
    defer reader.Recycle()
    
    for {
        // Read tag type: 1 byte
        packetType, err := reader.ReadByte()
        if err != nil {
            return err
        }
        
        // Read data size: 3 bytes big-endian
        dataSize, err := reader.ReadBE32(3)
        if err != nil {
            return err
        }
        
        // Read timestamp: 4 bytes (3-byte value + extended byte, simplified)
        timestamp, err := reader.ReadBE32(4)
        if err != nil {
            return err
        }
        
        // Skip StreamID: 3 bytes
        if err := reader.Skip(3); err != nil {
            return err
        }
        
        // Read actual data: zero-copy processing
        err = reader.ReadRange(int(dataSize), func(data []byte) {
            // Process data directly, no copy needed
            processPacket(packetType, timestamp, data)
        })
        if err != nil {
            return err
        }
        
        // Skip previous tag size
        if err := reader.Skip(4); err != nil {
            return err
        }
    }
}

4.3 Performance-Critical Scenarios

BufReader is particularly suitable for:

  1. High-frequency small packet processing: Network protocol parsing, RTP/RTCP packet handling
  2. Large data stream transmission: Continuous reading of video/audio streams
  3. Multi-step protocol reading: Protocols requiring step-by-step reading of different length data
  4. Low-latency requirements: Real-time streaming media transmission, online gaming
  5. High-concurrency scenarios: Servers with massive concurrent connections

5. Best Practices

5.1 Correct Usage Patterns

// ✅ Correct: Specify appropriate block size on creation
func goodExample(conn net.Conn) {
    // Choose block size based on actual packet size
    reader := util.NewBufReaderWithBufLen(conn, 16384)  // 16KB blocks
    defer reader.Recycle()  // Ensure resource recycling
    
    // Use ReadRange for zero-copy
    reader.ReadRange(1024, func(data []byte) {
        // Process directly, don't hold reference to data
        process(data)
    })
}

// ❌ Wrong: Forget to recycle resources
func badExample1(conn net.Conn) {
    reader := util.NewBufReader(conn)
    // Missing defer reader.Recycle()
    // Memory blocks cannot be returned to object pool
}

// ❌ Wrong: Holding data reference
var globalData []byte

func badExample2(conn net.Conn) {
    reader := util.NewBufReader(conn)
    defer reader.Recycle()
    
    reader.ReadRange(1024, func(data []byte) {
        // ❌ Wrong: data will be recycled after Recycle
        globalData = data  // Dangling reference
    })
}

// ✅ Correct: Copy when data needs to be retained
func goodExample2(conn net.Conn) {
    reader := util.NewBufReader(conn)
    defer reader.Recycle()
    
    var saved []byte
    reader.ReadRange(1024, func(data []byte) {
        // Explicitly copy when retention needed
        saved = make([]byte, len(data))
        copy(saved, data)
    })
    // Now safe to use saved
}
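Why the "don't hold references" rule matters: once a block goes back to the pool, the next user overwrites it, and any retained view silently changes. A deterministic toy free-list (illustrative, far simpler than GoMem) makes the hazard visible:

```go
package main

import "fmt"

// freeList is a toy block pool: get pops a free block, put pushes one back.
type freeList struct{ blocks [][]byte }

func (f *freeList) get() []byte {
	if n := len(f.blocks); n > 0 {
		b := f.blocks[n-1]
		f.blocks = f.blocks[:n-1]
		return b
	}
	return make([]byte, 16)
}

func (f *freeList) put(b []byte) { f.blocks = append(f.blocks, b) }

func main() {
	var pool freeList

	b := pool.get()
	copy(b, "hello")
	view := b[:5] // a "zero-copy" view into the pooled block
	pool.put(b)   // block recycled -- the view is now dangling

	b2 := pool.get() // the same block comes back out
	copy(b2, "world")

	fmt.Println(string(view)) // prints "world": the retained view was overwritten
}
```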

5.2 Block Size Selection

// Choose appropriate block size based on scenario
const (
    // Small packet protocols (e.g., RTSP, HTTP headers)
    SmallPacketSize = 4 << 10   // 4KB
    
    // Medium data streams (e.g., audio)
    MediumPacketSize = 16 << 10  // 16KB
    
    // Large data streams (e.g., video)
    LargePacketSize = 64 << 10   // 64KB
)

func createReaderForProtocol(conn net.Conn, protocol string) *util.BufReader {
    var bufSize int
    switch protocol {
    case "rtsp", "http":
        bufSize = SmallPacketSize
    case "audio":
        bufSize = MediumPacketSize
    case "video":
        bufSize = LargePacketSize
    default:
        bufSize = MediumPacketSize  // util's default buffer size is unexported
    }
    return util.NewBufReaderWithBufLen(conn, bufSize)
}

5.3 Error Handling

func robustRead(conn net.Conn) error {
    reader := util.NewBufReader(conn)
    defer func() {
        // Ensure resources are recycled in all cases
        reader.Recycle()
    }()
    
    // Set timeout
    conn.SetReadDeadline(time.Now().Add(5 * time.Second))
    
    // Read data
    data, err := reader.ReadBytes(1024)
    if err != nil {
        if err == io.EOF {
            // Normal end
            return nil
        }
        // Handle other errors
        return fmt.Errorf("read error: %w", err)
    }
    
    // Process data
    processData(data)
    return nil
}

6. Performance Optimization Tips

6.1 Batch Processing

// ✅ Optimized: Batch reading and processing
func optimizedBatchRead(reader *util.BufReader) error {
    // Read a large chunk of data at once
    return reader.ReadRange(65536, func(chunk []byte) {
        // Parse length-prefixed packets inside the callback
        for len(chunk) >= 4 {
            packetSize := int(binary.BigEndian.Uint32(chunk[:4]))
            if len(chunk) < 4+packetSize {
                break // incomplete packet at the end of the chunk
            }
            processPacket(chunk[4 : 4+packetSize])
            chunk = chunk[4+packetSize:]
        }
    })
}

// ❌ Inefficient: Read one by one
func inefficientRead(reader *util.BufReader) error {
    for {
        size, err := reader.ReadBE32(4)
        if err != nil {
            return err
        }
        packet, err := reader.ReadBytes(int(size))
        if err != nil {
            return err
        }
        processPacket(packet.Buffers[0])
    }
}

6.2 Avoid Unnecessary Copying

// ✅ Optimized: Direct processing, no copy
func zeroCopyProcess(reader *util.BufReader) error {
    return reader.ReadRange(4096, func(data []byte) {
        // Operate directly on original memory
        sum := 0
        for _, b := range data {
            sum += int(b)
        }
        reportChecksum(sum)
    })
}

// ❌ Inefficient: Unnecessary copy
func unnecessaryCopy(reader *util.BufReader) error {
    mem, err := reader.ReadBytes(4096)
    if err != nil {
        return err
    }
    // Another copy performed
    data := make([]byte, mem.Size)
    copy(data, mem.Buffers[0])
    
    sum := 0
    for _, b := range data {
        sum += int(b)
    }
    reportChecksum(sum)
    return nil
}

6.3 Proper Resource Management

// ✅ Optimized: Use object pool to manage BufReader
type ConnectionPool struct {
    readers sync.Pool
}

func (p *ConnectionPool) GetReader(conn net.Conn) *util.BufReader {
    if reader := p.readers.Get(); reader != nil {
        r := reader.(*util.BufReader)
        // Re-bind r to the new conn here; pooled reuse is only safe
        // once the reader has been reset against the new connection
        return r
    }
    return util.NewBufReader(conn)
}

func (p *ConnectionPool) PutReader(reader *util.BufReader) {
    reader.Recycle()  // Recycle memory blocks
    p.readers.Put(reader)  // Recycle BufReader object itself
}

// Use connection pool
func handleConnection(pool *ConnectionPool, conn net.Conn) {
    reader := pool.GetReader(conn)
    defer pool.PutReader(reader)
    
    // Handle connection
    processConnection(reader)
}

7. Summary

7.1 Performance Comparison Visualization

Based on actual benchmark results (concurrent scenarios):

📊 GC Runs Comparison (Core Advantage) ⭐⭐⭐
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
bufio.Reader   ████████████████████████████████████████  134 runs
BufReader      █  2 runs  ← 98.5% reduction!

📊 Total Memory Allocation Comparison
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
bufio.Reader   ████████████████████████████████████████  79 GB
BufReader      █  0.6 GB  ← 99.2% reduction!

📊 Operation Throughput Comparison
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
bufio.Reader   █████  10.1M ops/s
BufReader      ████████████████████████████████████████  117M ops/s  ← 11.6x!

Key Metrics (Streaming Server Scenario):

  • 🎯 GC Runs: From 134 to 2 (98.5% reduction)
  • 💾 Memory Allocation: From 79 GB to 0.6 GB (132x reduction)
  • ⚡ Throughput: 11.6x improvement

7.2 Core Advantages

BufReader achieves zero-copy, high-performance network data reading through:

  1. Zero-Copy Architecture

    • Data read directly from network to final memory location
    • Use slice views to avoid data copying
    • Chained buffer supports large data processing
  2. Memory Reuse Mechanism

    • GoMem object pool reuses memory blocks
    • Active memory management reduces GC pressure
    • Configurable block sizes adapt to different scenarios
  3. Significant Performance Improvement (in concurrent scenarios)

    • GC runs reduced by 98.5% (134 β†’ 2)
    • Memory allocation reduced by 99.2% (79 GB β†’ 0.6 GB)
    • Throughput improved by 10-20x
    • Significantly improved system stability

7.3 Ideal Use Cases

BufReader is particularly suitable for:

  • ✅ High-performance network servers
  • ✅ Streaming media data processing
  • ✅ Real-time protocol parsing
  • ✅ Large data stream transmission
  • ✅ Low-latency requirements
  • ✅ High-concurrency environments

Not suitable for:

  • ❌ Simple file reading (standard library is sufficient)
  • ❌ Single small data reads
  • ❌ Performance-insensitive scenarios

7.4 Choosing Between bufio.Reader and BufReader

| Scenario | Recommended |
|---|---|
| Simple file reading | bufio.Reader |
| Low-frequency network reads | bufio.Reader |
| High-performance network server | BufReader |
| Streaming media processing | BufReader |
| Protocol parsers | BufReader |
| Zero-copy requirements | BufReader |
| Memory-sensitive scenarios | BufReader |

7.5 Key Points

Remember when using BufReader:

  1. Always call Recycle(): Ensure memory blocks are returned to object pool
  2. Don't hold data references: Data in ReadRange callback will be recycled
  3. Choose appropriate block size: Adjust based on actual packet size
  4. Leverage ReadRange: Achieve true zero-copy processing
  5. Use with GoMem: Fully leverage memory reuse advantages

Through the combination of BufReader and GoMem, Monibuca achieves high-performance network data processing, providing solid infrastructure support for streaming media servers.
