ddn.data.csv

The ddn.data.csv module provides a high-performance, RFC 4180-compliant CSV reader and writer with configurable dialect options and performance features such as buffered I/O and zero-copy parsing.

Overview

This module offers a comprehensive solution for reading and writing CSV (Comma-Separated Values) files. It follows the RFC 4180 specification by default while providing flexibility through configurable dialect options to handle the variety of CSV formats encountered in the real world.

The implementation is designed for high throughput, making it suitable for processing large datasets efficiently.

Key Features

  • RFC 4180 Compliance: Full compliance with the CSV standard
  • Configurable Dialects: Customize delimiter, quote character, newline handling, and more
  • High Performance: Optimized for throughput with buffered I/O and memory mapping
  • Zero-Copy Parsing: Minimize allocations on the hot path
  • Error Handling Modes: Choose between permissive and fail-fast error handling
  • Header Row Support: Optional first-row-as-header interpretation
  • Embedded Newlines: Support for newlines within quoted fields
  • UTF-8 BOM Handling: Optionally accept and skip UTF-8 byte order marks

RFC 4180 Compliance

The module implements RFC 4180 with the following behaviors:

  • Record Delimiters: CRLF per RFC; reader detects CRLF, LF, and legacy CR
  • Header Row: Optional; same field count as data rows when enabled
  • Field Delimiter: Comma by default; configurable to other single-byte delimiters
  • Quoted Fields: Fields containing delimiter, quote, or newline are quoted
  • Quote Escaping: Doubled quotes within quoted fields
  • Embedded Newlines: Supported inside quoted fields
  • Whitespace: Spaces are data; optional trimming for unquoted fields
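The quoting rules above can be seen in a short illustrative data sample (hypothetical records, not from the module's test suite):

```csv
name,comment
Alice,"Said ""hello"", then left"
Bob,"line one
line two"
```

Per RFC 4180, the second record's comment field decodes to Said "hello", then left (the doubled quote becomes a single quote, and the comma stays inside the field), while the third record's comment field spans two physical lines because the newline is inside a quoted field.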

Basic Usage

Reading CSV Data

import ddn.data.csv;
import std.stdio : writeln;

void main() {
    const csv = "name,age,city\nAlice,30,New York\nBob,25,Los Angeles\n";

    long totalAge = 0;
    auto reader = byRows(csv);

    // Skip header row
    reader.popFront();

    // Process data rows
    while (!reader.empty) {
        auto row = reader.front;
        auto age = fromCsv!long(row[1]);
        if (age.isOk) {
            totalAge += age.value;
        }
        reader.popFront();
    }

    writeln("Total age: ", totalAge);  // 55
}

Writing CSV Data

import ddn.data.csv;

void main() {
    auto writer = csvWriter();

    // Write header
    writer.putRow(["name", "age", "city"]);

    // Write data rows
    writer.putRow(["Alice", "30", "New York"]);
    writer.putRow(["Bob", "25", "Los Angeles"]);

    string result = writer.data;
}

CsvDialect Configuration

The CsvDialect struct allows you to customize CSV parsing and writing behavior:

import ddn.data.csv;

// Create a custom dialect
auto dialect = CsvDialect(
    ';',                      // delimiter (semicolon)
    '"',                      // quote character
    true,                     // doubleQuote (escape quotes by doubling)
    false,                    // trimWhitespace
    NewlinePolicy.DETECT,     // newline detection
    EscapeStyle.NONE,         // RFC 4180 escaping only
    true                      // header row present
);

Dialect Options

Option            Default      Description
------            -------      -----------
delimiter         ,            Field delimiter character
quote             "            Quote character for fields
doubleQuote       true         Escape quotes by doubling them
trimWhitespace    false        Trim whitespace in unquoted fields
newlinePolicy     DETECT       How to handle line endings
escapeStyle       NONE         Escape style (RFC 4180 or backslash)
header            false        First row is header
strictFieldCount  false        Enforce consistent field count
errorMode         PERMISSIVE   Error handling mode

Newline Policies

// Detect CRLF and LF automatically (default)
dialect.newlinePolicy = NewlinePolicy.DETECT;

// Force CRLF (\r\n) handling
dialect.newlinePolicy = NewlinePolicy.FORCE_CRLF;

// Force LF (\n) handling
dialect.newlinePolicy = NewlinePolicy.FORCE_LF;

Escape Styles

// RFC 4180 only (double quotes inside quoted fields)
dialect.escapeStyle = EscapeStyle.NONE;

// Allow backslash escaping (non-RFC extension)
dialect.escapeStyle = EscapeStyle.BACKSLASH;
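As a sketch of the difference, the same field value — She said "hi" — would typically be encoded as follows (the BACKSLASH rendering is an assumption about how this non-RFC extension behaves, based on common backslash-escaping conventions):

```text
RFC 4180 (EscapeStyle.NONE):      "She said ""hi"""
Backslash (EscapeStyle.BACKSLASH): "She said \"hi\""
```

With EscapeStyle.NONE, a backslash in the input is treated as ordinary data, which is what strict RFC 4180 parsers expect.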

Error Handling

The module provides two error handling modes:

Permissive Mode (Default)

Malformed rows are skipped and errors are counted; iteration continues:

auto dialect = CsvDialect.init;
dialect.errorMode = ErrorMode.PERMISSIVE;
dialect.collectDiagnostics = true;  // Optionally collect error details

auto reader = CsvReader!string(csvData, dialect);
// Process rows...

// Check statistics after processing
auto stats = reader.stats;
writeln("Errors: ", stats.errorCount);

Fail-Fast Mode

Stop iteration at the first error:

auto dialect = CsvDialect.init;
dialect.errorMode = ErrorMode.FAIL_FAST;

auto reader = CsvReader!string(csvData, dialect);
// Will stop at first malformed row

Working with Rows and Fields

Accessing Fields

import ddn.data.csv;

auto reader = byRows("a,b,c\n1,2,3\n");

foreach (row; reader) {
    // Access by index
    auto first = row[0];   // FieldView
    auto second = row[1];

    // Get field as string
    string value = first.toString();

    // Get field count
    size_t count = row.length;
}

Type Conversion

Convert field values to typed data:

import ddn.data.csv;

auto reader = byRows("42,3.14,true\n");
auto row = reader.front;

// Convert to specific types
auto intResult = fromCsv!int(row[0]);
auto floatResult = fromCsv!double(row[1]);
auto boolResult = fromCsv!bool(row[2]);

if (intResult.isOk) {
    int value = intResult.value;
}

Type Aliases

The module provides convenient type aliases:

// Field and row views
CsvField f;          // alias for FieldView
CsvRow row;          // alias for RowView

// Result type
CsvResultT!int ri;   // alias for CsvResult!int

// Reader/writer types
alias Reader = CsvReaderOf!(const(char)[]);
alias Writer = CsvWriterTo!OutputRange;

Performance Considerations

For optimal performance:

  • Memory Mapping: Large files are automatically memory-mapped on 64-bit systems
  • Zero-Copy Parsing: Field views reference the original buffer without copying
  • Lazy Iteration: Rows are parsed on-demand during iteration
  • Buffered I/O: Efficient buffering for file-based reading

Memory Mapping Limits

On 32-bit platforms, files larger than ~1.5 GB fall back to buffered reading to avoid address space exhaustion.

Common Use Cases

Tab-Separated Values (TSV)

auto dialect = CsvDialect('\t');  // Tab delimiter
auto reader = CsvReader!string(tsvData, dialect);

Semicolon-Separated (European Format)

auto dialect = CsvDialect(';');
auto reader = CsvReader!string(csvData, dialect);

Strict Validation

auto dialect = CsvDialect.init;
dialect.strictFieldCount = true;
dialect.errorMode = ErrorMode.FAIL_FAST;
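To illustrate what this configuration rejects, consider a file where one record is short (hypothetical sample data):

```csv
a,b,c
1,2,3
4,5
```

With strictFieldCount enabled, the third record (two fields instead of three) is an error; in FAIL_FAST mode iteration stops there, whereas in PERMISSIVE mode the row would be skipped and reflected in the reader's error count.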

When to Use

Use this module when you need to:

  • Parse CSV files with high performance requirements
  • Handle various CSV dialects and formats
  • Process large CSV datasets efficiently
  • Write RFC 4180-compliant CSV output
  • Work with CSV data that contains special characters or embedded newlines
  • Validate CSV structure with configurable strictness