ddn.data.csv

Overview

The ddn.data.csv module provides a high‑performance, RFC 4180–oriented CSV reader and writer for D. It focuses on:

  • Zero‑copy parsing on hot paths (views over the original buffer instead of copying strings)
  • Configurable CSV dialects (delimiter, quoting, newline handling, whitespace trimming, escapes)
  • Streaming I/O over generic ranges and sinks
  • Structured, non‑throwing error handling with optional exceptions

It is suitable for both small utilities and large, throughput‑sensitive data pipelines.


Installation

To use ddn.data.csv, your project must depend on the DDN library. If you use DUB, add the following line to your dub.sdl (in dub.json, the equivalent is the entry "ddn": ">=1.0.0" inside the "dependencies" object):

dependency "ddn" version=">=1.0.0"

Then, import the module in your D code:

import ddn.data.csv;

Usage

Typical usage involves:

  • Defining a CsvDialect (or using the defaults) to describe how your CSV data is formatted
  • Reading rows via byRows / CsvReader and converting fields using fromCsv
  • Optionally using headers and name‑based access via HeaderIndex and RowView.byName
  • Writing CSV data using CsvWriter or the higher‑level CsvFile helper

Reading CSV from a string

import ddn.data.csv;

void main() {
    const csv = "id,value\n1,10\n2,32\n";

    auto rows = byRows(csv); // uses CsvDialect.init by default

    // Optional: handle header row
    auto header = rows.front;
    rows.popFront();

    long total = 0;
    while (!rows.empty) {
        auto row = rows.front;
        auto id  = fromCsv!long(row[0]);
        auto val = fromCsv!long(row[1]);

        if (id.isOk && val.isOk) {
            total += val.value;
        }

        rows.popFront();
    }

    assert(total == 42); // 10 + 32
}

Reading with a custom dialect (semicolon delimiter)

import ddn.data.csv;

void main() {
    const csv = "id;value\n1;10\n2;32\n";

    auto dialect = CsvDialect(';'); // semicolon delimiter, other options default
    auto rows = byRows(csv, dialect);

    // ... iterate rows as above
}

Writing CSV

import ddn.data.csv;
import std.array : appender;

void main() {
    // Any sink type with a `put(const(char)[])` method can be used.
    auto buffer = appender!string();

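    // With NewlinePolicy.DETECT (the default) the writer emits CRLF,
    // so request LF explicitly to match the expected output below.
    auto dialect = CsvDialect.init;
    dialect.newlinePolicy = NewlinePolicy.FORCE_LF;

    auto writer = CsvWriter!(typeof(buffer))(buffer, dialect);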

    writer.writeRow(["id", "name"]);
    writer.writeRow(["1", "Ada"]);
    writer.writeRow(["2", "Grace"]);

    // Flush buffered data into the sink
    writer.flush();

    assert(buffer.data == "id,name\n1,Ada\n2,Grace\n");
}

Using headers and name‑based access

import ddn.data.csv;

void main() {
    const csv = "id,name\n1,Ada\n";

    auto it = byRows(csv);

    // First row is the header
    auto headerRow = it.front;
    it.popFront();

    auto hidx = makeHeaderIndex(headerRow);

    // Next row is data
    auto row = it.front;
    row.attachHeader(&hidx);

    auto nameField = row.byName("name");
    assert(nameField.isOk);
    assert(nameField.value.toString() == "Ada");
}

API Reference

Module

module ddn.data.csv;

Enums

NewlinePolicy

Controls how record boundaries are detected and how newlines are emitted.

  • DETECT — Detect CRLF and LF while reading; writer uses CRLF by default.
  • FORCE_CRLF — Treat "\r\n" as the record terminator; writer always emits CRLF.
  • FORCE_LF — Treat "\n" as the record terminator; writer always emits LF.

EscapeStyle

Escape handling for non‑RFC data.

  • NONE — Only RFC 4180 double‑quote escaping inside quoted fields (default).
  • BACKSLASH — Treat backslash as an escape for delimiter/quote/newline in unquoted fields.

ErrorMode

Reader error‑handling mode.

  • PERMISSIVE — Malformed rows are skipped, errors counted; iteration continues.
  • FAIL_FAST — Stop at the first error and surface it via CsvReadStats.

CsvErrorCode

Error codes describing common CSV parsing and configuration issues.

  • NONE — No error
  • UNEXPECTED_EOF — Incomplete row or unterminated quoted field at end of input
  • INVALID_QUOTE — Misplaced or malformed quote
  • INVALID_ESCAPE — Invalid escape sequence (for EscapeStyle.BACKSLASH)
  • INCONSISTENT_FIELD_COUNT — Row has a different number of fields than expected
  • INVALID_DIALECT — Invalid dialect configuration or missing header
  • IO_FAILURE — Underlying I/O failure
  • INVALID_CONVERSION — Failed to convert a FieldView to the requested type
  • INVALID_COLUMN — Column name not found in the header index

Structs and Aliases

FieldView

Lightweight non‑owning view over a CSV field.

Key properties and methods:

  • const(char)[] data — Underlying slice into the source buffer
  • bool wasQuoted — Whether the original field was quoted
  • bool needsUnescape — Whether doubled quotes need unescaping
  • const(char)[] toString() const — Returns the raw slice
  • string unescaped() const — Returns the logical text with doubled quotes collapsed
  • size_t length() const — Length in bytes

Alias:

  • alias CsvField = FieldView;
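
To make the difference between toString() and unescaped() concrete, here is a minimal stand-alone sketch of the unescaping step, written against Phobos only. collapseDoubledQuotes is a hypothetical helper for illustration; it is not part of ddn.data.csv, and the library may implement this step differently.

```d
import std.array : replace;

/// Hypothetical helper (not part of ddn.data.csv): collapses each doubled
/// quote in the raw content of a quoted field into a single literal quote.
string collapseDoubledQuotes(const(char)[] raw) {
    return raw.replace(`""`, `"`).idup;
}

void main() {
    // A field stored in the file as "say ""hi""" has this raw content
    // between its outer quotes: say ""hi""
    assert(collapseDoubledQuotes(`say ""hi""`) == `say "hi"`);

    // Nothing to collapse: the text comes back unchanged.
    assert(collapseDoubledQuotes("plain") == "plain");
}
```

The raw slice costs nothing to obtain, while the unescaped form needs a new string, which is presumably why FieldView exposes needsUnescape as a cheap check before calling unescaped().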

HeaderIndex

Header index providing fast name‑to‑column lookup.

Typically built from the first row when CsvDialect.header == true and attached to subsequent RowView values.

Key aspects:

  • Constructed from an array of FieldView values
  • Stores original names and a normalized lookup map
  • Detects duplicate header names

Helper:

  • HeaderIndex makeHeaderIndex(RowView headerRow, bool caseSensitive = true)

RowView

Lightweight, non‑owning view over a parsed CSV row.

Key properties and methods:

  • FieldView[] fields — Fields in this row
  • size_t length() const — Number of fields
  • inout(FieldView) opIndex(size_t i) — Indexing access
  • void attachHeader(const(HeaderIndex)* header) — Attach a header index
  • CsvResult!FieldView byName(scope const(char)[] name) const — Lookup field by column name

Alias:

  • alias CsvRow = RowView;
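
The following sketch shows name-based lookup on a row, including the failure case. It uses only the signatures listed on this page. Two points are assumptions: that caseSensitive = false makes lookups ignore case, and that a missing column is reported with CsvErrorCode.INVALID_COLUMN as the error code list suggests.

```d
import ddn.data.csv;

void main() {
    const csv = "Id,Name\n1,Ada\n";

    auto rows = byRows(csv);
    auto headerRow = rows.front;
    rows.popFront();

    // Assumption: passing false requests case-insensitive name matching.
    auto hidx = makeHeaderIndex(headerRow, false);

    auto row = rows.front;
    row.attachHeader(&hidx);

    // Found column: read the field through the result value.
    auto name = row.byName("name");
    if (name.isOk) {
        assert(name.value.toString() == "Ada");
    }

    // Unknown column: no exception, the error travels in the result.
    auto missing = row.byName("email");
    assert(!missing.isOk);
    assert(missing.err.code == CsvErrorCode.INVALID_COLUMN);
}
```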

CsvError

Structured error information returned via CsvResult!T.

Fields:

  • CsvErrorCode code — Error code
  • size_t line — 1‑based line, if known (0 when unknown)
  • size_t column — 1‑based column, if known (0 when unknown)
  • string message — Optional human‑readable message

CsvResult(T)

Result container for error‑aware APIs without throwing.

Fields and helpers:

  • bool isOk — true when the operation succeeded
  • T value — Value when isOk == true
  • CsvError err — Error information when isOk == false
  • static CsvResult ok(T v) — Construct a success result
  • static CsvResult error(CsvError e) — Construct an error result
  • T valueOrThrow() — Return value or throw CsvException on error

Aliases:

  • alias CsvResultT(T) = CsvResult!T;
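
As a sketch of how these pieces fit together, the example below builds one success and one error result by hand and consumes both. It relies only on the members listed above; setting the CsvError fields individually avoids assuming anything about the struct's constructor.

```d
import ddn.data.csv;

void main() {
    // Success: the value is available directly or via valueOrThrow().
    auto good = CsvResult!int.ok(42);
    assert(good.isOk);
    assert(good.valueOrThrow() == 42);

    // Failure: fill in a CsvError field by field and wrap it.
    CsvError e;
    e.code = CsvErrorCode.INVALID_CONVERSION;
    e.message = "not a number";
    auto bad = CsvResult!int.error(e);
    assert(!bad.isOk);
    assert(bad.err.code == CsvErrorCode.INVALID_CONVERSION);

    // valueOrThrow() turns the stored error into a CsvException.
    bool thrown = false;
    try {
        auto unused = bad.valueOrThrow();
    } catch (CsvException ex) {
        thrown = true;
        assert(ex.error.code == CsvErrorCode.INVALID_CONVERSION);
    }
    assert(thrown);
}
```

Checking isOk keeps hot loops free of exceptions, while valueOrThrow() suits call sites where an error should abort the operation.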

CsvDialect

Describes how CSV data is formatted: delimiter, quote rules, whitespace handling, newline policy, header usage, and error policy.

Key fields (with defaults):

  • char delimiter — Field delimiter, default ','
  • char quote — Quote character, default '"'
  • bool doubleQuote — Whether doubled quotes represent a literal quote, default true
  • bool trimWhitespace — Trim leading/trailing whitespace in unquoted fields, default false
  • NewlinePolicy newlinePolicy — Newline detection/emission policy, default NewlinePolicy.DETECT
  • EscapeStyle escapeStyle — Escape policy, default EscapeStyle.NONE
  • bool header — Interpret first record as header row, default false
  • ErrorMode errorMode — Reader error mode, default ErrorMode.PERMISSIVE
  • bool strictFieldCount — Enforce consistent number of fields per row, default false
  • bool collectDiagnostics — Collect per‑error diagnostics, default false

Constructor:

this(
    char delimiter,
    char quote = DEFAULT_QUOTE,
    bool doubleQuote = DEFAULT_DOUBLE_QUOTE,
    bool trimWhitespace = DEFAULT_TRIM_WHITESPACE,
    NewlinePolicy newlinePolicy = DEFAULT_NEWLINE_POLICY,
    EscapeStyle escapeStyle = DEFAULT_ESCAPE_STYLE,
    bool header = DEFAULT_HEADER
);

Methods:

  • bool isValid() const — Returns true when options are self‑consistent (e.g. delimiter != quote).
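
A short sketch of configuring a dialect field by field and validating it before use. The tab-separated setup is an illustrative choice, and the second check assumes isValid() rejects a delimiter equal to the quote character, as the description above indicates.

```d
import ddn.data.csv;

void main() {
    // Tab-separated data with LF line endings; other options keep their defaults.
    auto tsv = CsvDialect.init;
    tsv.delimiter = '\t';
    tsv.newlinePolicy = NewlinePolicy.FORCE_LF;
    assert(tsv.isValid());

    // Delimiter and quote character must differ.
    auto broken = CsvDialect.init;
    broken.delimiter = '"';
    assert(!broken.isValid());
}
```

Calling isValid() once, before handing the dialect to byRows or CsvWriter, catches configuration mistakes before any data is read or written.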

Related public defaults:

  • enum char DEFAULT_DELIMITER = ',';
  • enum char DEFAULT_QUOTE = '"';
  • enum bool DEFAULT_DOUBLE_QUOTE = true;
  • enum bool DEFAULT_TRIM_WHITESPACE = false;
  • enum NewlinePolicy DEFAULT_NEWLINE_POLICY = NewlinePolicy.DETECT;
  • enum EscapeStyle DEFAULT_ESCAPE_STYLE = EscapeStyle.NONE;
  • enum bool DEFAULT_HEADER = false;

CsvReadStats

Reading statistics and optional diagnostics collected by CsvReader.

Fields:

  • size_t rows — Number of successfully yielded rows
  • size_t badRows — Number of malformed rows
  • size_t errors — Total error count (currently equal to badRows)
  • CsvError lastError — Most recent error

Accessors:

  • size_t diagnosticsCount() const — Number of stored diagnostics
  • inout(CsvError)[] diagnostics() — View over collected diagnostics

CsvReader(Range)

High‑throughput CSV reader over an input range of bytes. Models a forward range of RowView values.

Construction is usually done via byRows, but the type itself can be used directly for advanced scenarios.

Alias:

  • alias CsvReaderOf(R) = CsvReader!R;

CsvWriter(OutputRange)

Streaming CSV writer to an output range OutputRange supporting put(const(char)[]).

Use it directly or via convenience wrapper types such as CsvFile.

Alias:

  • alias CsvWriterTo(S) = CsvWriter!S;

CsvFile

Higher‑level helper that combines CsvReader/CsvWriter with filesystem I/O.

It encapsulates opening a file, choosing memory‑mapped vs buffered I/O, and provides convenient iteration over rows.

Fields and methods follow the same semantics as CsvReader/CsvWriter, but are tied to a specific file path and open mode.


Functions

byRows

public auto byRows(Source)(Source source, CsvDialect dialect = CsvDialect.init)
    @safe nothrow @nogc;

Creates a lazy CsvReader over source, which can be any suitable byte range or I/O wrapper (for example, a string or ubyte[]).

fromCsv

public CsvResult!T fromCsv(T)(FieldView f) @safe nothrow @nogc;

Converts a FieldView to the requested type T, returning CsvResult!T. On failure, isOk is false and err.code == CsvErrorCode.INVALID_CONVERSION.

Supported conversions include:

  • Integral types (e.g. int, long, uint, ulong)
  • Floating‑point types (e.g. float, double)
  • bool (typical textual and 0/1 forms)
  • Enums (by name)
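
The sketch below converts the fields of a single row to different types and handles a failed conversion without exceptions. It uses only byRows, fromCsv and the CsvResult members documented on this page; the input line is made up for illustration.

```d
import ddn.data.csv;

void main() {
    // One record, no header (CsvDialect.init has header == false).
    auto rows = byRows("42,3.5,abc\n");
    auto row = rows.front;

    auto n = fromCsv!int(row[0]);
    assert(n.isOk && n.value == 42);

    auto x = fromCsv!double(row[1]);
    assert(x.isOk && x.value == 3.5);

    // "abc" is not an integer: the failure is reported in the result.
    auto bad = fromCsv!int(row[2]);
    assert(!bad.isOk);
    assert(bad.err.code == CsvErrorCode.INVALID_CONVERSION);

    // A common pattern: fall back to a default when conversion fails.
    int quantity = bad.isOk ? bad.value : 0;
    assert(quantity == 0);
}
```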

Exceptions

CsvException

Thrown by helpers such as CsvResult!T.valueOrThrow() when converting a result‑style error into an exception.

class CsvException : Exception {
    CsvError error;
}

Example:

auto res = fromCsv!int(field);
try {
    auto value = res.valueOrThrow();
    // use value
} catch (CsvException e) {
    // handle CSV error (e.error)
}

Examples

Basic parsing loop with stats and diagnostics

import ddn.data.csv;

void main() {
    const text = "id,value\n1,10\n2,not-a-number\n3,20\n";

    auto dialect = CsvDialect.init;
    dialect.header = true;
    dialect.strictFieldCount = true;
    dialect.collectDiagnostics = true;

    auto reader = byRows(text, dialect);

    // Skip header row
    if (!reader.empty) {
        reader.popFront();
    }

    long sum = 0;
    while (!reader.empty) {
        auto row = reader.front;
        auto id  = fromCsv!long(row[0]);
        auto val = fromCsv!long(row[1]);

        if (id.isOk && val.isOk) {
            sum += val.value;
        }

        reader.popFront();
    }

    auto stats = reader.stats; // CsvReadStats
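    // fromCsv failures (such as "not-a-number" above) are reported in the returned
    // CsvResult, not by the reader; stats.badRows and stats.diagnostics() report
    // structurally malformed rows.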
}

License

SPDX-License-Identifier: BSD-3-Clause