# BitCache

A simple, fast, persistent cache library for Go, inspired by the Bitcask design.

## Features

- **Fast**: O(1) average time complexity for lookups with an in-memory partitioned hash map index
- **Memory Cache Layer**: Optional high-performance read-through cache with bounded memory
- **Reliable**: CRC32 checksums for data integrity
- **Crash-safe**: Recovers from corrupted files by skipping bad entries detected during CRC validation
- **Compaction**: Reclaims disk space by removing old segments based on age and disk usage limits
- **Concurrent**: Concurrent reads and writes with partitioned locks to minimize contention
- **Low GC Pressure**: Memory cache uses object pooling to reduce allocations
- **Simple**: Clean API with minimal dependencies
- **Write-once**: Keys can be written multiple times, but old values are never updated in place

## Design

BitCache uses a log-structured storage approach inspired by Bitcask:

- **Append-only log**: All writes go sequentially to an active segment file. When it reaches the size threshold, it is rotated and a new segment is created.
- **In-memory index**: A partitioned hash map (a slice of maps keyed by segment ID, then by key) enables O(1) average-time lookups and concurrent access without a global lock. Keys are hashed to determine which partition they belong to.
- **Compaction**: Removes old segments based on age (MaxAge) and/or disk usage limits (MaxDiskUsage). When a segment is removed, all of its keys are evicted from the in-memory index by deleting the entire partition map for that segment.
- **Memory cache layer**: Optional read-through cache using sharded hash maps with LRU/LFU eviction and object pooling to reduce GC pressure.
- **Crash recovery**: On startup, the key directory is rebuilt by scanning log files. Corrupted entries are detected via CRC validation and skipped, recovering as much data as possible.

## Installation

```bash
go get github.com/jwilder/bitcache
```

## Usage

### Library API

```go
package main

import (
	"fmt"
	"log"

	"github.com/jwilder/bitcache"
)

func main() {
	// Open or create a database
	db, err := bitcache.NewDiskCache("/path/to/data")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Set a key-value pair
	err = db.Set([]byte("hello"), []byte("world"))
	if err != nil {
		log.Fatal(err)
	}

	// Get a value
	value, err := db.Get([]byte("hello"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Value: %s\n", value)

	// Check if a key exists
	if db.Has([]byte("hello")) {
		fmt.Println("Key exists!")
	}

	// Get statistics
	stats := db.Stats()
	fmt.Printf("Keys: %d\n", stats.Keys)
	fmt.Printf("Data size: %d bytes\n", stats.DataSize)
	fmt.Printf("Reads: %d, Writes: %d\n", stats.Reads, stats.Writes)

	// Compact the database to reclaim space
	err = db.Compact()
	if err != nil {
		log.Fatal(err)
	}
}
```

### Using MemCache Layer

MemCache provides a high-performance read-through cache on top of DiskCache with bounded memory usage and low GC pressure through object pooling.
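The read-through behavior can be sketched as follows. This is a minimal standalone illustration of the pattern, not the library's internals: `get`, `mem`, and `disk` are hypothetical names, with plain maps standing in for the sharded memory layer and the backing DiskCache.

```go
package main

import (
	"errors"
	"fmt"
)

var errKeyNotFound = errors.New("key not found")

// mem stands in for the in-memory layer; disk for the backing DiskCache.
var mem = map[string][]byte{}
var disk = map[string][]byte{"hello": []byte("world")}

// get implements the read-through pattern: serve from memory when possible,
// otherwise fall back to the backing store and populate the memory layer.
func get(key []byte) ([]byte, error) {
	if v, ok := mem[string(key)]; ok {
		return v, nil // fast path: served from memory
	}
	v, ok := disk[string(key)]
	if !ok {
		return nil, errKeyNotFound
	}
	mem[string(key)] = v // populate the cache for the next read
	return v, nil
}

func main() {
	v, _ := get([]byte("hello")) // first read misses memory and is filled from "disk"
	fmt.Printf("%s\n", v)        // prints "world"

	_, cached := mem["hello"]
	fmt.Println(cached) // prints "true": subsequent reads are served from memory
}
```

In the real library, eviction and the cache policy decide whether the populate step actually keeps the value in memory.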
#### Basic MemCache Setup

```go
package main

import (
	"log"

	"github.com/jwilder/bitcache"
)

func main() {
	// Create backing disk cache
	diskCache, err := bitcache.NewDiskCache("/path/to/data")
	if err != nil {
		log.Fatal(err)
	}
	defer diskCache.Close()

	// Configure memory cache
	config := bitcache.MemCacheConfig{
		MaxMemoryBytes: 100 * 1024 * 1024,    // 100 MB memory limit
		EvictionPolicy: bitcache.EvictionLRU, // LRU or LFU
		ShardCount:     256,                  // Reduces lock contention
	}

	// Create memory cache layer
	memCache, err := bitcache.NewMemCache(diskCache, config)
	if err != nil {
		log.Fatal(err)
	}
	defer memCache.Close()

	// Use the same Cache interface - transparently fast!
	memCache.Set([]byte("key"), []byte("value"))
	value, _ := memCache.Get([]byte("key")) // Served from memory
	_ = value
}
```

#### MemCache Features

**LRU vs LFU Eviction**

```go
// LRU: Evicts least recently accessed items (faster, O(1))
config := bitcache.MemCacheConfig{
	MaxMemoryBytes: 100 * 1024 * 1024,
	EvictionPolicy: bitcache.EvictionLRU,
}

// LFU: Evicts least frequently accessed items (good for hot keys)
config = bitcache.MemCacheConfig{
	MaxMemoryBytes: 100 * 1024 * 1024,
	EvictionPolicy: bitcache.EvictionLFU,
}
```

**Custom Cache Policy**

Control which entries are cached in memory:

```go
// Only cache entries with keys starting with "hot-"
policy := func(key []byte, value []byte) bool {
	return len(key) >= 4 && string(key[:4]) == "hot-"
}

config := bitcache.MemCacheConfig{
	MaxMemoryBytes: 100 * 1024 * 1024,
	EvictionPolicy: bitcache.EvictionLRU,
	CachePolicy:    policy, // Custom policy function
}

memCache, _ := bitcache.NewMemCache(diskCache, config)

// "hot-data" will be cached in memory
memCache.Set([]byte("hot-data"), []byte("value1"))

// "cold-data" will only be stored in the backing DiskCache
memCache.Set([]byte("cold-data"), []byte("value2"))

// Both are retrievable, but only hot-data is in memory
```

**Cache Warming**

Pre-load frequently accessed keys on startup:

```go
keys := [][]byte{
	[]byte("user:1234"),
	[]byte("user:5678"),
	[]byte("config:main"),
}

err := memCache.Warmup(keys)
if err != nil {
	log.Printf("Warmup failed: %v", err)
}
```

**Cache Invalidation**

Remove entries from the memory cache without touching the backing store:

```go
// Remove from memory cache only
memCache.Invalidate([]byte("stale-key"))

// Data is still in the backing store and will be read through if accessed
value, _ := memCache.Get([]byte("stale-key"))
```

**Memory Compaction**

MemCache automatically compacts memory arenas to reclaim fragmented space:

```go
// Configure compaction thresholds
config := bitcache.MemCacheConfig{
	MaxMemoryBytes:      100 * 1024 * 1024, // 100 MB
	EvictionPolicy:      bitcache.EvictionLRU,
	CompactionThreshold: 0.4, // Compact when fragmentation exceeds 40% (default)
	CompactionInterval:  300, // Check every 5 minutes (default)
}

// Manually trigger compaction
memCache.Compact()

// Check fragmentation before compacting
stats := memCache.MemStats()
fmt.Printf("Fragmentation: %.2f%%\n", stats.Fragmentation*100)
if stats.Fragmentation > 0.5 {
	memCache.Compact()
	fmt.Println("Compaction completed")
}
```

When entries are deleted or evicted, memory becomes fragmented. Compaction:

1. Detects fragmentation via a background monitoring worker
2. Rebuilds the arena by creating a new compact arena and copying active entries
3. Reclaims memory by making old fragmented arenas eligible for GC
4. Keeps data accessible during the entire process (thread-safe)

**Benefits:** Reduces memory footprint by 80-90% in fragmented scenarios and improves cache locality and performance.
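The compaction steps above can be sketched with a toy arena. This is an illustrative model only, not the library's arena code: `arena`, `fragmentation`, and `compact` are hypothetical names, and fragmentation is assumed to be the dead fraction of allocated entries. The 0.4 threshold mirrors the documented CompactionThreshold default.

```go
package main

import "fmt"

// arena is a toy append-only entry store; live marks which slots are still referenced.
type arena struct {
	buf  [][]byte
	live []bool
}

// fragmentation returns the fraction of allocated entries that are dead.
func fragmentation(a *arena) float64 {
	if len(a.buf) == 0 {
		return 0
	}
	dead := 0
	for _, ok := range a.live {
		if !ok {
			dead++
		}
	}
	return float64(dead) / float64(len(a.buf))
}

// compact rebuilds the arena by copying only live entries into a new one,
// leaving the old fragmented backing storage eligible for GC.
func compact(a *arena) *arena {
	na := &arena{}
	for i, v := range a.buf {
		if a.live[i] {
			na.buf = append(na.buf, v)
			na.live = append(na.live, true)
		}
	}
	return na
}

func main() {
	a := &arena{
		buf:  [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d"), []byte("e")},
		live: []bool{true, false, false, true, false}, // 3 of 5 entries are dead
	}

	const threshold = 0.4 // mirrors the CompactionThreshold default
	if fragmentation(a) > threshold {
		a = compact(a)
	}
	fmt.Println(len(a.buf)) // prints 2: only the live entries remain
}
```

The real implementation additionally holds per-shard locks while copying so that entries stay readable throughout, which is what makes step 4 thread-safe.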
**Cache Management**

```go
// Clear all memory cache entries (preserves the backing store)
err := memCache.Clear()
```

**Monitoring**

Get detailed memory cache statistics:

```go
stats := memCache.MemStats()
fmt.Printf("Entries: %d\n", stats.Entries)
fmt.Printf("Memory Used: %d bytes\n", stats.MemoryUsed)
fmt.Printf("Memory Allocated: %d bytes\n", stats.MemoryAllocated)
fmt.Printf("Memory Limit: %d bytes\n", stats.MemoryLimit)
fmt.Printf("Shards: %d\n", stats.Shards)
fmt.Printf("Fragmentation: %.2f%%\n", stats.Fragmentation*100)

// Memory usage percentage
usagePercent := float64(stats.MemoryUsed) / float64(stats.MemoryLimit) * 100
fmt.Printf("Usage: %.2f%%\n", usagePercent)

// Memory efficiency (how much allocated space is actually used)
efficiency := float64(stats.MemoryUsed) / float64(stats.MemoryAllocated) * 100
fmt.Printf("Efficiency: %.2f%%\n", efficiency)
```

## Performance

Benchmarks run on a MacBook Pro with SSD:

### MemCache Performance

With MemCache enabled, read performance increases dramatically:

```
BenchmarkMemCache/cached_get-10    28,685,608      41.68 ns/op      0 B/op    0 allocs/op
BenchmarkMemCache/miss_get-10         624,360   1,914.00 ns/op    144 B/op    3 allocs/op
BenchmarkMemCache/set-10              584,257   2,052.00 ns/op    144 B/op    3 allocs/op
```

- **Cached reads**: ~42 ns/op with zero allocations
- **Cache misses**: Fall back to disk with automatic caching
- **Low GC pressure**: Object pooling reduces allocations

## API Reference

### Core Interface

```go
type Cache interface {
	// Get retrieves the value for the given key.
	// Returns ErrKeyNotFound if the key doesn't exist.
	Get(key []byte) ([]byte, error)

	// Set stores a key-value pair in the cache
	Set(key []byte, value []byte) error

	// Has checks if a key exists in the cache
	Has(key []byte) bool

	// Stats returns cache statistics
	Stats() Stats
}
```

### Creating a Database

```go
// NewDiskCache creates or opens a BitCache database
func NewDiskCache(dataDir string) (*DiskCache, error)

// NewMemCache creates a memory cache layer on top of a backing cache
func NewMemCache(backing Cache, config MemCacheConfig) (*MemCache, error)
```

### MemCache Configuration

```go
type MemCacheConfig struct {
	// MaxMemoryBytes is the maximum memory to use for cached entries
	MaxMemoryBytes int64

	// EvictionPolicy determines which entries to evict (EvictionLRU or EvictionLFU)
	EvictionPolicy EvictionPolicy

	// CachePolicy determines which entries should be cached (nil = cache all)
	CachePolicy CachePolicyFunc

	// ShardCount is the number of shards to use (default: 256)
	ShardCount int

	// CompactionThreshold is the fragmentation ratio that triggers compaction (default: 0.4)
	CompactionThreshold float64

	// CompactionInterval is how often to check for compaction, in seconds (default: 300)
	CompactionInterval int
}

// CachePolicyFunc decides whether a key-value pair should be cached
type CachePolicyFunc func(key []byte, value []byte) bool

// EvictionPolicy types
const (
	EvictionLRU EvictionPolicy = iota // Least Recently Used
	EvictionLFU                       // Least Frequently Used
)
```

### MemCache Methods

```go
// Invalidate removes a key from the memory cache (not from the backing store)
func (mc *MemCache) Invalidate(key []byte) error

// Clear removes all entries from the memory cache
func (mc *MemCache) Clear() error

// Warmup pre-loads keys from the backing store into memory
func (mc *MemCache) Warmup(keys [][]byte) error

// Compact manually triggers memory compaction to reclaim fragmented space
func (mc *MemCache) Compact() error
```

### Additional Methods

```go
// Close cleanly shuts down the database
func (d *DiskCache) Close() error

// Compact removes old segments based on configured age and disk usage limits
func (d *DiskCache) Compact() error

// Sync forces an fsync on the active segment
func (d *DiskCache) Sync() error
```

### Statistics

```go
type Stats struct {
	Keys      int64 // Number of keys in the cache
	DataSize  int64 // Total size of data on disk in bytes
	IndexSize int64 // Size of the in-memory index in bytes
	Segments  int   // Number of segment files
	Reads     int64 // Total number of read operations
	Writes    int64 // Total number of write operations
}
```

## Error Handling

```go
import "github.com/jwilder/bitcache"

// Check for key not found
value, err := db.Get([]byte("key"))
if err == bitcache.ErrKeyNotFound {
	// Key doesn't exist
} else if err != nil {
	// Other error
}

// Check for closed database
err = db.Set([]byte("key"), []byte("value"))
if err == bitcache.ErrCacheClosed {
	// Database is closed
}
```

## Configuration

BitCache uses sensible defaults but can be configured:

```go
config := bitcache.DiskCacheConfig{
	MaxSegmentSize:      16 * 1024 * 1024,   // 16 MB (default)
	MaxDiskUsage:        1024 * 1024 * 1024, // 1 GB limit (0 = no limit)
	MinSegmentAge:       time.Hour,          // Don't remove segments newer than 1 hour
	AutoCompactEnabled:  true,               // Enable automatic compaction
	AutoCompactInterval: 5 * time.Minute,    // Check every 5 minutes
	AutoCompactMaxAge:   7 * 24 * time.Hour, // Remove segments older than 7 days
}

db, err := bitcache.NewDiskCacheWithConfig("/path/to/data", config)
```

**Configuration Options:**

- **MaxSegmentSize**: Maximum size before rotating to a new segment file (default: 16 MB)
- **MaxDiskUsage**: Maximum total disk usage before compaction removes old segments (default: 0 = no limit)
- **MinSegmentAge**: Minimum age before a segment can be removed (default: 1 hour)
- **AutoCompactEnabled**: Enable automatic background compaction (default: false)
- **AutoCompactInterval**: Time between automatic compaction checks (default: 5 minutes)
- **AutoCompactMaxAge**: Maximum age for segments; older segments are removed (default: 0 = age-based removal disabled)

### MemCache Architecture

MemCache keeps GC pressure low through object pooling and careful memory management.

**How Object Pooling Works:**

1. **Entry Pooling**: Uses `sync.Pool` to reuse cache entry objects, avoiding repeated allocations
2. **Node Pooling**: LRU list nodes are pooled and reused when entries are evicted
3. **Bounded Memory**: A strict memory limit with automatic eviction prevents unbounded growth
4. **Sharded Maps**: Distributes entries across multiple shards to reduce lock contention

**Memory Accounting:**

Each cached entry consumes:

- Key size (bytes)
- Value size (bytes)
- ~32 bytes of overhead (entry metadata)

The memory limit is strictly enforced: entries are evicted before the limit would be exceeded.

**Thread Safety:**

MemCache is fully thread-safe with:

- Sharded locking to reduce contention (default: 256 shards)
- An RWMutex per shard for concurrent reads
- Atomic operations for access counting

### Use Cases

BitCache is primarily designed for **persistent caching** scenarios:

- **Application caching**: Fast local cache with persistence across restarts
- **Computed results**: Caching expensive computations or API responses
- **Temporary data**: Short-lived data that benefits from persistence
- **Prototyping**: Quick storage solution for proof-of-concepts
- **Embedded databases**: Simple key-value storage within applications

As a persistent cache rather than a full storage engine, BitCache makes intentional trade-offs:

- Single writer at a time (concurrent reads are fully supported)
- No delete operations (write-once keys with automatic age-based eviction)
- No query language or secondary indexes
- No transactions or batch operations

These limitations make it unsuitable as a primary database for most applications, but ideal for caching scenarios.

## MemCache Best Practices

1. **Size Memory Appropriately**: Set MaxMemoryBytes to 10-20% of available RAM for hot data
2. **Choose the Right Eviction Policy**: Use LRU for recency-based workloads, LFU for frequency-based ones
3. **Use the Cache Policy Wisely**: Filter out large or rarely accessed entries to maximize hit rate
4. **Monitor Stats**: Regularly check MemStats to ensure efficient memory usage
5. **Warm Critical Keys**: Pre-load frequently accessed keys on startup
6. **Shard Count**: Use the default of 256 shards, or increase it for very high concurrency

## Limitations

- All keys must fit in memory (values are on disk)
- Single writer (concurrent reads are supported)
- No delete operations (use age-based or size-based compaction instead)
- No transactions or batch operations
- No built-in replication or clustering

## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Acknowledgments

- Inspired by the [Bitcask](https://riak.com/assets/bitcask-intro.pdf) paper