ddn.data.csv

The ddn.data.csv module provides a high-performance, RFC 4180-compliant CSV reader and writer with configurable dialect options and performance features such as buffered I/O and zero-copy parsing.

Overview

This module offers a comprehensive solution for reading and writing CSV (Comma-Separated Values) files. It follows the RFC 4180 specification by default while providing flexibility through configurable dialect options to handle the variety of CSV formats encountered in the real world.

The implementation is designed for high throughput, making it suitable for processing large datasets efficiently.

Key Features

  • RFC 4180 Compliance: Full compliance with the CSV standard
  • Configurable Dialects: Customize delimiter, quote character, newline handling, and more
  • High Performance: Optimized for throughput with buffered I/O and memory mapping
  • Zero-Copy Parsing: Minimize allocations on the hot path
  • Error Handling Modes: Choose between permissive and fail-fast error handling
  • Header Row Support: Optional first-row-as-header interpretation
  • Embedded Newlines: Support for newlines within quoted fields
  • UTF-8 BOM Handling: Optionally accept and skip UTF-8 byte order marks

RFC 4180 Compliance

The module implements RFC 4180 with the following behaviors:

  • Record Delimiters: CRLF per RFC; reader detects CRLF, LF, and legacy CR
  • Header Row: Optional; same field count as data rows when enabled
  • Field Delimiter: Comma by default; configurable to other single-byte delimiters
  • Quoted Fields: Fields containing delimiter, quote, or newline are quoted
  • Quote Escaping: Doubled quotes within quoted fields
  • Embedded Newlines: Supported inside quoted fields
  • Whitespace: Spaces are data; optional trimming for unquoted fields
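The quoting rules above can be seen in a short illustrative data sample (hypothetical records, not from the module's test suite):

```csv
name,comment
Alice,"Said ""hello"", then left"
Bob,"line one
line two"
```

Per RFC 4180, the second record's comment field decodes to Said "hello", then left (the doubled quote becomes a single quote, and the comma stays inside the field), while the third record's comment field spans two physical lines because the newline is inside a quoted field.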

Basic Usage

Reading CSV Data

import ddn.data.csv;
import std.stdio : writeln;

void main() {
    const csv = "name,age,city\nAlice,30,New York\nBob,25,Los Angeles\n";

    long totalAge = 0;
    auto reader = byRows(csv);

    // Skip header row
    reader.popFront();

    // Process data rows
    while (!reader.empty) {
        auto row = reader.front;
        auto age = fromCsv!long(row[1]);
        if (age.isOk) {
            totalAge += age.value;
        }
        reader.popFront();
    }

    writeln("Total age: ", totalAge);  // 55
}

Writing CSV Data

import ddn.data.csv;

void main() {
    auto writer = csvWriter();

    // Write header
    writer.putRow(["name", "age", "city"]);

    // Write data rows
    writer.putRow(["Alice", "30", "New York"]);
    writer.putRow(["Bob", "25", "Los Angeles"]);

    string result = writer.data;
}

CsvDialect Configuration

The CsvDialect struct allows you to customize CSV parsing and writing behavior:

import ddn.data.csv;

// Create a custom dialect
auto dialect = CsvDialect(
    ';',                      // delimiter (semicolon)
    '"',                      // quote character
    true,                     // doubleQuote (escape quotes by doubling)
    false,                    // trimWhitespace
    NewlinePolicy.DETECT,     // newline detection
    EscapeStyle.NONE,         // RFC 4180 escaping only
    true                      // header row present
);

Dialect Options

Option            Default      Description
------            -------      -----------
delimiter         ,            Field delimiter character
quote             "            Quote character for fields
doubleQuote       true         Escape quotes by doubling them
trimWhitespace    false        Trim whitespace in unquoted fields
newlinePolicy     DETECT       How to handle line endings
escapeStyle       NONE         Escape style (RFC 4180 or backslash)
header            false        First row is header
strictFieldCount  false        Enforce consistent field count
errorMode         PERMISSIVE   Error handling mode

Newline Policies

// Detect CRLF and LF automatically (default)
dialect.newlinePolicy = NewlinePolicy.DETECT;

// Force CRLF (\r\n) handling
dialect.newlinePolicy = NewlinePolicy.FORCE_CRLF;

// Force LF (\n) handling
dialect.newlinePolicy = NewlinePolicy.FORCE_LF;

Escape Styles

// RFC 4180 only (double quotes inside quoted fields)
dialect.escapeStyle = EscapeStyle.NONE;

// Allow backslash escaping (non-RFC extension)
dialect.escapeStyle = EscapeStyle.BACKSLASH;
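As a sketch of the difference, the same field value — She said "hi" — would typically be encoded as follows (the BACKSLASH rendering is an assumption about how this non-RFC extension behaves, based on common backslash-escaping conventions):

```text
RFC 4180 (EscapeStyle.NONE):      "She said ""hi"""
Backslash (EscapeStyle.BACKSLASH): "She said \"hi\""
```

With EscapeStyle.NONE, a backslash in the input is treated as ordinary data, which is what strict RFC 4180 parsers expect.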

Error Handling

The module provides two error handling modes:

Permissive Mode (Default)

Malformed rows are skipped and errors are counted; iteration continues:

auto dialect = CsvDialect.init;
dialect.errorMode = ErrorMode.PERMISSIVE;
dialect.collectDiagnostics = true;  // Optionally collect error details

auto reader = CsvReader!string(csvData, dialect);
// Process rows...

// Check statistics after processing
auto stats = reader.stats;
writeln("Errors: ", stats.errorCount);

Fail-Fast Mode

Stop iteration at the first error:

auto dialect = CsvDialect.init;
dialect.errorMode = ErrorMode.FAIL_FAST;

auto reader = CsvReader!string(csvData, dialect);
// Will stop at first malformed row

Working with Rows and Fields

Accessing Fields

import ddn.data.csv;

auto reader = byRows("a,b,c\n1,2,3\n");

foreach (row; reader) {
    // Access by index
    auto first = row[0];   // FieldView
    auto second = row[1];

    // Get field as string
    string value = first.toString();

    // Get field count
    size_t count = row.length;
}

Type Conversion

Convert field values to typed data:

import ddn.data.csv;

auto reader = byRows("42,3.14,true\n");
auto row = reader.front;

// Convert to specific types
auto intResult = fromCsv!int(row[0]);
auto floatResult = fromCsv!double(row[1]);
auto boolResult = fromCsv!bool(row[2]);

if (intResult.isOk) {
    int value = intResult.value;
}

Type Aliases

The module provides convenient type aliases:

// Field and row views
CsvField f;          // alias for FieldView
CsvRow row;          // alias for RowView

// Result type
CsvResultT!int ri;   // alias for CsvResult!int

// Reader/writer types
alias Reader = CsvReaderOf!(const(char)[]);
alias Writer = CsvWriterTo!OutputRange;

Performance Considerations

For optimal performance:

  • Memory Mapping: Large files are automatically memory-mapped on 64-bit systems
  • Zero-Copy Parsing: Field views reference the original buffer without copying
  • Lazy Iteration: Rows are parsed on-demand during iteration
  • Buffered I/O: Efficient buffering for file-based reading

Memory Mapping Limits

On 32-bit platforms, files larger than ~1.5 GB fall back to buffered reading to avoid address space exhaustion.

Common Use Cases

Tab-Separated Values (TSV)

auto dialect = CsvDialect('\t');  // Tab delimiter
auto reader = CsvReader!string(tsvData, dialect);

Semicolon-Separated (European Format)

auto dialect = CsvDialect(';');
auto reader = CsvReader!string(csvData, dialect);

Strict Validation

auto dialect = CsvDialect.init;
dialect.strictFieldCount = true;
dialect.errorMode = ErrorMode.FAIL_FAST;
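To illustrate what this configuration rejects, consider a file where one record is short (hypothetical sample data):

```csv
a,b,c
1,2,3
4,5
```

With strictFieldCount enabled, the third record (two fields instead of three) is an error; in FAIL_FAST mode iteration stops there, whereas in PERMISSIVE mode the row would be skipped and reflected in the reader's error count.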

When to Use

Use this module when you need to:

  • Parse CSV files with high performance requirements
  • Handle various CSV dialects and formats
  • Process large CSV datasets efficiently
  • Write RFC 4180-compliant CSV output
  • Work with CSV data that contains special characters or embedded newlines
  • Validate CSV structure with configurable strictness