ddn.data.csv¶
The ddn.data.csv module provides a high-performance, RFC 4180-compliant CSV reader and writer with configurable dialect options and performance features such as buffered I/O and zero-copy parsing.
Overview¶
This module offers a comprehensive solution for reading and writing CSV (Comma-Separated Values) files. It strictly follows the RFC 4180 specification while providing flexibility through configurable dialect options to handle various CSV formats encountered in the real world.
The implementation is designed for high throughput, making it suitable for processing large datasets efficiently.
Key Features¶
- RFC 4180 Compliance: Full compliance with the CSV standard
- Configurable Dialects: Customize delimiter, quote character, newline handling, and more
- High Performance: Optimized for throughput with buffered I/O and memory mapping
- Zero-Copy Parsing: Minimize allocations on the hot path
- Error Handling Modes: Choose between permissive and fail-fast error handling
- Header Row Support: Optional first-row-as-header interpretation
- Embedded Newlines: Support for newlines within quoted fields
- UTF-8 BOM Handling: Optionally accept and skip UTF-8 byte order marks
RFC 4180 Compliance¶
The module implements RFC 4180 with the following behaviors:
- Record Delimiters: CRLF per RFC; reader detects CRLF, LF, and legacy CR
- Header Row: Optional; same field count as data rows when enabled
- Field Delimiter: Comma by default; configurable to other single-byte delimiters
- Quoted Fields: Fields containing delimiter, quote, or newline are quoted
- Quote Escaping: Doubled quotes within quoted fields
- Embedded Newlines: Supported inside quoted fields
- Whitespace: Spaces are data; optional trimming for unquoted fields
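Taken together, these rules mean that a record like the following is a single two-field row: the comma in the first field, the doubled quote, and the line break in the second field are all data because they sit inside quoted fields (the sample data is illustrative):

```csv
name,comment
"Smith, John","She said ""hello""
on two lines"
```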
Basic Usage¶
Reading CSV Data¶
```d
import ddn.data.csv;
import std.stdio : writeln;

void main() {
    const csv = "name,age,city\nAlice,30,New York\nBob,25,Los Angeles\n";
    long totalAge = 0;
    auto reader = byRows(csv);

    // Skip header row
    reader.popFront();

    // Process data rows
    while (!reader.empty) {
        auto row = reader.front;
        auto age = fromCsv!long(row[1]);
        if (age.isOk) {
            totalAge += age.value;
        }
        reader.popFront();
    }

    writeln("Total age: ", totalAge); // 55
}
```
Writing CSV Data¶
```d
import ddn.data.csv;

void main() {
    auto writer = csvWriter();

    // Write header
    writer.putRow(["name", "age", "city"]);

    // Write data rows
    writer.putRow(["Alice", "30", "New York"]);
    writer.putRow(["Bob", "25", "Los Angeles"]);

    string result = writer.data;
}
```
CsvDialect Configuration¶
The CsvDialect struct allows you to customize CSV parsing and writing behavior:
```d
import ddn.data.csv;

// Create a custom dialect
auto dialect = CsvDialect(
    ';',                  // delimiter (semicolon)
    '"',                  // quote character
    true,                 // doubleQuote (escape quotes by doubling)
    false,                // trimWhitespace
    NewlinePolicy.DETECT, // newline detection
    EscapeStyle.NONE,     // RFC 4180 escaping only
    true                  // header row present
);
```
Dialect Options¶
| Option | Default | Description |
|---|---|---|
| `delimiter` | `,` | Field delimiter character |
| `quote` | `"` | Quote character for fields |
| `doubleQuote` | `true` | Escape quotes by doubling them |
| `trimWhitespace` | `false` | Trim whitespace in unquoted fields |
| `newlinePolicy` | `DETECT` | How to handle line endings |
| `escapeStyle` | `NONE` | Escape style (RFC or backslash) |
| `header` | `false` | First row is header |
| `strictFieldCount` | `false` | Enforce consistent field count |
| `errorMode` | `PERMISSIVE` | Error handling mode |
Newline Policies¶
```d
// Detect CRLF and LF automatically (default)
dialect.newlinePolicy = NewlinePolicy.DETECT;

// Force CRLF (\r\n) handling
dialect.newlinePolicy = NewlinePolicy.FORCE_CRLF;

// Force LF (\n) handling
dialect.newlinePolicy = NewlinePolicy.FORCE_LF;
```
Escape Styles¶
```d
// RFC 4180 only (double quotes inside quoted fields)
dialect.escapeStyle = EscapeStyle.NONE;

// Allow backslash escaping (non-RFC extension)
dialect.escapeStyle = EscapeStyle.BACKSLASH;
```
Error Handling¶
The module provides two error handling modes:
Permissive Mode (Default)¶
Malformed rows are skipped and errors are counted; iteration continues:
```d
auto dialect = CsvDialect.init;
dialect.errorMode = ErrorMode.PERMISSIVE;
dialect.collectDiagnostics = true; // Optionally collect error details

auto reader = CsvReader!string(csvData, dialect);
// Process rows...

// Check statistics after processing
auto stats = reader.stats;
writeln("Errors: ", stats.errorCount);
```
Fail-Fast Mode¶
Stop iteration at the first error:
```d
auto dialect = CsvDialect.init;
dialect.errorMode = ErrorMode.FAIL_FAST;

auto reader = CsvReader!string(csvData, dialect);
// Will stop at first malformed row
```
Working with Rows and Fields¶
Accessing Fields¶
```d
import ddn.data.csv;

auto reader = byRows("a,b,c\n1,2,3\n");
foreach (row; reader) {
    // Access by index
    auto first = row[0]; // FieldView
    auto second = row[1];

    // Get field as string
    string value = first.toString();

    // Get field count
    size_t count = row.length;
}
```
Type Conversion¶
Convert field values to typed data:
```d
import ddn.data.csv;

auto row = reader.front;

// Convert to specific types
auto intResult = fromCsv!int(row[0]);
auto floatResult = fromCsv!double(row[1]);
auto boolResult = fromCsv!bool(row[2]);

if (intResult.isOk) {
    int value = intResult.value;
}
```
Type Aliases¶
The module provides convenient type aliases:
```d
// Field and row views
CsvField f;   // alias for FieldView
CsvRow row;   // alias for RowView

// Result type
CsvResultT!int ri; // alias for CsvResult!int

// Reader/writer types
alias Reader = CsvReaderOf!(const(char)[]);
alias Writer = CsvWriterTo!OutputRange;
```
Performance Considerations¶
For optimal performance:
- Memory Mapping: Large files are automatically memory-mapped on 64-bit systems
- Zero-Copy Parsing: Field views reference the original buffer without copying
- Lazy Iteration: Rows are parsed on-demand during iteration
- Buffered I/O: Efficient buffering for file-based reading
Memory Mapping Limits¶
On 32-bit platforms, files larger than ~1.5 GB fall back to buffered reading to avoid address space exhaustion.
Common Use Cases¶
Tab-Separated Values (TSV)¶
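Only the delimiter needs to change for tab-separated input; a minimal sketch (the sample data is illustrative):

```d
import ddn.data.csv;

// Tab-delimited dialect; all other options keep their defaults
auto dialect = CsvDialect.init;
dialect.delimiter = '\t';

auto reader = CsvReader!string("name\tage\nAlice\t30\n", dialect);
```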
Semicolon-Separated (European Format)¶
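Semicolon-separated files, common in locales where the comma is the decimal separator, are handled the same way; a minimal sketch (the sample data is illustrative):

```d
import ddn.data.csv;

// Semicolon-delimited dialect for European-style CSV
auto dialect = CsvDialect.init;
dialect.delimiter = ';';

auto reader = CsvReader!string("name;price\nWidget;3,50\n", dialect);
```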
Strict Validation¶
```d
auto dialect = CsvDialect.init;
dialect.strictFieldCount = true;
dialect.errorMode = ErrorMode.FAIL_FAST;
```
When to Use¶
Use this module when you need to:
- Parse CSV files with high performance requirements
- Handle various CSV dialects and formats
- Process large CSV datasets efficiently
- Write RFC 4180-compliant CSV output
- Work with CSV data that contains special characters or embedded newlines
- Validate CSV structure with configurable strictness