DuckDb Database

Cortex.States.DuckDb is a state store implementation for the Cortex Data Framework that uses DuckDB as the underlying storage engine. DuckDB is an in-process analytical database management system designed for fast analytical queries, making it an excellent choice for scenarios requiring both transactional state management and analytical capabilities.

Key features:

  • High-Performance Analytics: Leverages DuckDB’s columnar storage and vectorized query execution
  • In-Memory & Persistent Storage: Supports both in-memory databases for fast processing and file-based persistence
  • Native Export Capabilities: Export data directly to Parquet or CSV formats
  • Batch Operations: Efficient bulk insert and delete operations with transaction support
  • Thread-Safe: Built-in thread safety for concurrent access
  • Flexible Serialization: Customizable key and value serialization
  • Fluent Builder API: Easy configuration through a builder pattern

Install the package with the .NET CLI:

dotnet add package Cortex.States.DuckDb

Or with the Package Manager Console:

Install-Package Cortex.States.DuckDb

using Cortex.States.DuckDb;

// Create a persistent DuckDB state store
var stateStore = new DuckDbKeyValueStateStore<string, int>(
    name: "MyStateStore",
    databasePath: "./data/mystore.duckdb",
    tableName: "KeyValueStore"
);

// Store values
stateStore.Put("counter", 42);
stateStore.Put("total", 100);

// Retrieve values
var counter = stateStore.Get("counter"); // Returns 42

// Check if key exists
if (stateStore.ContainsKey("counter"))
{
    Console.WriteLine("Counter exists!");
}

// Remove a value
stateStore.Remove("counter");

// Get all keys
foreach (var key in stateStore.GetKeys())
{
    Console.WriteLine($"Key: {key}");
}

// Don't forget to dispose
stateStore.Dispose();

using Cortex.States.DuckDb;

// Create store using fluent builder
var stateStore = DuckDbKeyValueStateStoreBuilder<string, OrderSummary>
    .Create("OrderStore")
    .WithDatabasePath("./data/orders.duckdb")
    .WithTableName("Orders")
    .WithIndex(true)
    .WithMaxMemory("2GB")
    .WithThreads(4)
    .Build();

// Use the store
stateStore.Put("ORD-001", new OrderSummary { Total = 99.99m, Status = "Completed" });

using Cortex.States.DuckDb;

// Create an in-memory store for fast processing
var inMemoryStore = DuckDbKeyValueStateStoreBuilder<string, decimal>
    .Create("TemporaryStore")
    .UseInMemory()
    .WithTableName("TempData")
    .Build();

// Perfect for temporary computations
inMemoryStore.Put("sum", 1234.56m);

using Cortex.States.DuckDb;

// Create options for fine-grained control
var options = new DuckDbKeyValueStateStoreOptions
{
    DatabasePath = "./data/analytics.duckdb",
    TableName = "AnalyticsState",
    CreateIndex = true,
    MaxMemory = "4GB",
    Threads = 8,
    AccessMode = DuckDbAccessMode.ReadWrite
};

var stateStore = new DuckDbKeyValueStateStore<string, AnalyticsData>(
    name: "AnalyticsStore",
    options: options
);

using Cortex.States.DuckDb;

// Quick creation methods
var persistentStore = DuckDbStateStoreExtensions
    .CreatePersistentDuckDbStore<string, Product>("ProductStore", "./data/products.duckdb", "Products");

var inMemoryStore = DuckDbStateStoreExtensions
    .CreateInMemoryDuckDbStore<string, Session>("SessionStore", "Sessions");

// Efficient bulk insert
var items = new List<KeyValuePair<string, decimal>>
{
    new("price-1", 10.99m),
    new("price-2", 20.99m),
    new("price-3", 30.99m)
};

stateStore.PutMany(items);

// Bulk delete
stateStore.RemoveMany(new[] { "price-1", "price-2" });

DuckDB has native support for Parquet and CSV formats, making data export seamless:

// Export to Parquet (ideal for analytics)
stateStore.ExportToParquet("./exports/state-backup.parquet");

// Export to CSV (ideal for data sharing)
stateStore.ExportToCsv("./exports/state-backup.csv");

// Get total count
var count = stateStore.Count();
Console.WriteLine($"Total items: {count}");

// Clear all items
stateStore.Clear();

For persistent databases, you can force a checkpoint to ensure all data is written to disk:

stateStore.Checkpoint();

Use the DuckDB state store with Cortex Streams for stateful stream processing:

using Cortex.Streams;
using Cortex.States.DuckDb;

// Create the state store
var stateStore = new DuckDbKeyValueStateStore<string, int>(
    name: "WordCountStore",
    databasePath: "./data/wordcount.duckdb",
    tableName: "WordCounts"
);

// Use in a stream pipeline
var stream = StreamBuilder<string>.CreateNewStream("WordCountStream")
    .Stream()
    .FlatMap(line => line.Split(' '))
    .GroupBy(word => word)
    .Aggregate(
        stateStore,
        (count, word) => count + 1,
        initialValue: 0)
    .Sink(result => Console.WriteLine($"{result.Key}: {result.Value}"))
    .Build();

stream.Start();

You can provide custom serializers for complex types:

using System.Text.Json;

var stateStore = new DuckDbKeyValueStateStore<Guid, ComplexObject>(
    name: "ComplexStore",
    databasePath: "./data/complex.duckdb",
    tableName: "ComplexData",
    keySerializer: key => key.ToString(),
    keyDeserializer: str => Guid.Parse(str),
    valueSerializer: value => JsonSerializer.Serialize(value),
    valueDeserializer: str => JsonSerializer.Deserialize<ComplexObject>(str)!
);

The following options are available when configuring the store:

  • DatabasePath: Path to the DuckDB database file; use :memory: for an in-memory database (required)
  • TableName: Name of the table for key-value storage (required)
  • UseInMemory: Use an in-memory database instead of a file (default: false)
  • CreateIndex: Create an index on the key column for faster lookups (default: true)
  • MaxMemory: Maximum memory limit, e.g. "1GB" or "512MB" (default: auto)
  • Threads: Number of threads; 0 means automatic (default: 0)
  • AccessMode: Database access mode: Automatic, ReadWrite, or ReadOnly (default: Automatic)
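
As an illustration of the AccessMode option, the sketch below reuses the options type and constructor from the earlier example to open an existing database file for read-only access; the path, store name, and key are placeholders, and the exact read-only semantics are an assumption:

using Cortex.States.DuckDb;

// Open an existing database file without allowing writes
var readOnlyOptions = new DuckDbKeyValueStateStoreOptions
{
    DatabasePath = "./data/analytics.duckdb",   // placeholder path to an existing file
    TableName = "AnalyticsState",
    AccessMode = DuckDbAccessMode.ReadOnly      // assumption: ReadOnly rejects write operations
};

using var readOnlyStore = new DuckDbKeyValueStateStore<string, AnalyticsData>(
    name: "AnalyticsReadOnlyStore",
    options: readOnlyOptions);

var snapshot = readOnlyStore.Get("daily-summary"); // hypothetical key; reads work as usual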

DuckDB is particularly well-suited for:

  • Analytical workloads: When you need to run analytical queries on your state (see the query sketch after this list)
  • Large datasets: Efficient columnar storage for large amounts of data
  • Data export requirements: Native Parquet/CSV export capabilities
  • Embedded analytics: In-process database without external dependencies
  • Temporary processing: Fast in-memory mode for intermediate computations
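
For example, a state snapshot written with ExportToParquet can be queried directly through the DuckDB.NET.Data ADO.NET provider that this package already depends on. This is a minimal sketch that assumes the export path from the earlier example; because the exported file's schema is whatever ExportToParquet produces, it only counts rows rather than referencing specific columns:

using DuckDB.NET.Data;

// Open a throwaway in-memory DuckDB connection for ad-hoc analytics
using var connection = new DuckDBConnection("Data Source=:memory:");
connection.Open();

using var command = connection.CreateCommand();
command.CommandText =
    "SELECT COUNT(*) FROM read_parquet('./exports/state-backup.parquet')";

var rowCount = command.ExecuteScalar();
Console.WriteLine($"Rows in exported state: {rowCount}");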

Consider other state stores when:

  • You need distributed state across multiple nodes (use Cassandra, MongoDB)
  • You require extreme write throughput (use RocksDB)
  • You need full ACID transactions across multiple operations (use PostgreSQL, SQL Server)

The DuckDbKeyValueStateStore is thread-safe and can be used concurrently from multiple threads. For in-memory databases, a persistent connection is maintained to ensure data consistency.
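
For example, concurrent writers can share a single store instance. This minimal sketch only reuses the Put and Count calls shown above; the store name, path, and key pattern are arbitrary:

using System.Threading.Tasks;
using Cortex.States.DuckDb;

using var concurrentStore = new DuckDbKeyValueStateStore<string, int>(
    name: "ConcurrentStore",
    databasePath: "./data/concurrent.duckdb",
    tableName: "Counters");

// Writes from many threads hit the same store instance
Parallel.For(0, 1_000, i => concurrentStore.Put($"key-{i}", i));

Console.WriteLine($"Stored {concurrentStore.Count()} items");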

try
{
    var value = stateStore.Get("non-existent-key");
    if (value == null)
    {
        Console.WriteLine("Key not found");
    }
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"Store not initialized: {ex.Message}");
}

Keep these best practices in mind:

  1. Dispose properly: Always dispose of the state store when done to release resources (see the sketch after this list)
  2. Use batch operations: For bulk inserts/deletes, use PutMany and RemoveMany
  3. Choose appropriate storage: Use in-memory for temporary data, file-based for persistence
  4. Set memory limits: Configure MaxMemory for large datasets to prevent excessive memory usage
  5. Regular checkpoints: Call Checkpoint() periodically for critical data in persistent mode
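
A short sketch combining practices 1, 2, and 5 above; the store name, path, and key format are placeholders:

using System.Collections.Generic;
using System.Linq;
using Cortex.States.DuckDb;

// 1. A using declaration disposes the store when it goes out of scope
using var batchStore = new DuckDbKeyValueStateStore<string, int>(
    name: "BatchStore",
    databasePath: "./data/batch.duckdb",
    tableName: "BatchData");

// 2. Bulk-load with PutMany instead of many individual Put calls
batchStore.PutMany(Enumerable.Range(0, 10_000)
    .Select(i => new KeyValuePair<string, int>($"key-{i}", i))
    .ToList());

// 5. Checkpoint after the batch so the data is flushed to disk
batchStore.Checkpoint();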

Requirements:

  • .NET 7.0 or later
  • DuckDB.NET.Data package (automatically included)

MIT License - see the license file for details.