Asynchronous CSV Parser in C

by Tadas Vilkeliskis, October 26, 2013

I just finished writing the initial version of an asynchronous CSV parser. It is intended for high-performance applications, possibly inside an event loop, to process streaming CSV data. It uses callbacks to notify the caller when field data is available. The library itself is very bare-bones, and it’s up to the user to choose the right data structure to store the data. The library can be found on GitHub.

Example

This example shows how to load a CSV file into a linked list.

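#include <fcntl.h>   /* open */
#include <stdio.h>   /* printf, perror */
#include <stdlib.h>  /* free */
#include <string.h>  /* strncat */
#include <unistd.h>  /* read, close */

/* The parser header and the csv_t, field_t and new_field() definitions
 * are part of the full example in the repo. */

/* Called for each piece of field data; one field may arrive in several pieces. */
int field_cb(csv_parser_t *parser, const char *data, size_t length, int row, int col)
{
    csv_t *csv = parser->data;
    field_t *field;

    if (!csv->tail) {
        /* First field: it becomes both head and tail of the list. */
        field = new_field(data, length, row, col);
        csv->tail = field;
        csv->head = field;
    } else {
        if (csv->tail->row == row && csv->tail->col == col) {
            /* Same field as the previous call: append the new chunk.
             * strncat does not grow the buffer, so this relies on
             * new_field() reserving enough space for the whole field. */
            strncat(csv->tail->data, data, length);
        } else {
            /* A new field: link it after the current tail. */
            field = new_field(data, length, row, col);
            csv->tail->next = field;
            csv->tail = field;
        }
    }

    return 0;
}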

The function above is our callback. It can be called multiple times for the same row and column combination because data can arrive in chunks. In this example we append the data to the previous field if the row and column are the same; otherwise a new node is added to the list.
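The strncat call in the callback only works if the destination buffer already has room for the extra bytes. As a minimal sketch of one way to make appending safe, the node below tracks its own capacity and grows with realloc. The names field_new, field_append and field_free, and the layout of this field_t, are assumptions made for illustration; they are not taken from the library or from the full example in the repo.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node; not the library's type, only an illustration. */
typedef struct field {
    char *data;          /* NUL-terminated contents accumulated so far */
    size_t length;       /* bytes stored, excluding the terminator */
    size_t capacity;     /* bytes allocated for data */
    int row, col;
    struct field *next;
} field_t;

/* Append one chunk, doubling the buffer when it is too small.
 * Returns 0 on success, -1 if allocation fails (field left unchanged). */
static int field_append(field_t *f, const char *chunk, size_t n)
{
    assert(f != NULL);
    assert(chunk != NULL || n == 0);

    size_t needed = f->length + n + 1;   /* +1 for the terminator */
    if (needed > f->capacity) {
        size_t cap = f->capacity ? f->capacity : 16;
        while (cap < needed)
            cap *= 2;
        char *p = realloc(f->data, cap); /* realloc(NULL, n) acts like malloc */
        if (!p)
            return -1;
        f->data = p;
        f->capacity = cap;
    }
    if (n > 0)
        memcpy(f->data + f->length, chunk, n);
    f->length += n;
    f->data[f->length] = '\0';
    return 0;
}

/* Allocate a node holding the first chunk of a field; NULL on failure. */
static field_t *field_new(const char *chunk, size_t n, int row, int col)
{
    field_t *f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;
    f->row = row;
    f->col = col;
    if (field_append(f, chunk, n) != 0) {
        free(f);
        return NULL;
    }
    return f;
}

/* Release a node together with its data buffer. */
static void field_free(field_t *f)
{
    if (f) {
        free(f->data);
        free(f);
    }
}
```

With a node like this, the "same row and column" branch of the callback could call field_append(csv->tail, data, length) instead of strncat, and the cleanup loop would call field_free so the data buffer is released along with the node.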

int main(int argc, const char *argv[])
{
    csv_t csv;
    field_t *field;
    char buffer[64];
    ssize_t nread;
    csv_parser_t parser;
    csv_parser_settings_t settings;

    csv.head = NULL;
    csv.tail = NULL;

    settings.delimiter = ',';
    settings.field_cb = field_cb;

    csv_parser_init(&parser);
    parser.data = &csv;

    int fd = open("iris.csv", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* Feed the file to the parser one chunk at a time. */
    while ((nread = read(fd, buffer, sizeof(buffer))) > 0) {
        csv_parser_execute(&parser, &settings, buffer, nread);
    }
    if (nread == -1) {
        perror("read");
    }

    close(fd);

    field = csv.head;
    while (field) {
        printf("row: %3d, col: %3d, data: %s\n", field->row, field->col, field->data);
        field_t *next = field->next;
        free(field);
        field = next;
    }

    return 0;
}

The main function just reads the file in 64-byte chunks and processes each chunk with the CSV parser. The full example is in the repo.
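To see why fixed-size chunks lead to several callbacks for one field, here is a toy chunk feeder. It is not the library's implementation: toy_parser_t, toy_execute, toy_field_cb, toy_stats_t and toy_count_cb are hypothetical names, and the sketch ignores quoting and reports nothing for empty fields. The only state it keeps between calls is the current row and column, so a field cut by a chunk boundary is reported in pieces that share the same position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, modeled on field_cb from the example. */
typedef int (*toy_field_cb)(void *user, const char *data, size_t length,
                            int row, int col);

/* Position state that must survive between chunks. */
typedef struct {
    int row;
    int col;
} toy_parser_t;

/* Feed one chunk. Every run of field bytes in the chunk is passed to cb.
 * Returns 0, or the first nonzero value returned by cb. */
static int toy_execute(toy_parser_t *p, toy_field_cb cb, void *user,
                       const char *buf, size_t len)
{
    assert(p != NULL && cb != NULL);

    size_t start = 0;                    /* start of the current run */
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (c != ',' && c != '\n')
            continue;
        if (i > start) {                 /* report bytes before the separator */
            int rc = cb(user, buf + start, i - start, p->row, p->col);
            if (rc != 0)
                return rc;
        }
        if (c == ',') {
            p->col++;                    /* next field in the same row */
        } else {
            p->row++;                    /* next row, first column */
            p->col = 0;
        }
        start = i + 1;
    }
    /* Trailing bytes belong to a field that may continue in the next chunk. */
    if (len > start)
        return cb(user, buf + start, len - start, p->row, p->col);
    return 0;
}

/* Example callback: counts calls and remembers the last position seen. */
typedef struct {
    int calls;
    size_t bytes;
    int last_row;
    int last_col;
} toy_stats_t;

static int toy_count_cb(void *user, const char *data, size_t length,
                        int row, int col)
{
    toy_stats_t *s = user;
    (void)data;
    s->calls++;
    s->bytes += length;
    s->last_row = row;
    s->last_col = col;
    return 0;
}
```

Feeding "5.1,Iris-" and then "setosa\n4.9" makes the callback fire twice for row 0, column 1: once with "Iris-" and once with "setosa". That is the case the linked-list callback handles by appending to the tail node.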