How to Configure the CSV Collector (version 25)
The CSV Collector consumes time-series data from CSV files and stores it in the Historian. With over 40 parameters to choose from, the Collector can parse files of varying formats, whether row-based (where tag names are listed on individual rows) or column-based (where each tag name is given its own column). A simple example of each is presented below and attached to the top of the article.
Row-Based CSV
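The example file is attached to the article; in case the attachment is unavailable, a minimal row-based file consistent with the row-based configuration shown later in this article might look like the following (the tag names and values are invented for illustration):

```
TagName,Timestamp,Value,DataType,Quality,Description,EngUnits,Unused,Region,Facility
Boiler_Temp,2024-01-15 08:00:00,212.4,double,192,Boiler outlet temperature,degF,,North,Plant1
Boiler_Temp,2024-01-15 08:01:00,212.9,double,192,Boiler outlet temperature,degF,,North,Plant1
```

Each row carries one sample: the tag name is in column 0, the combined timestamp in column 1, the value in column 2, and so on, matching the offsets in the configuration.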
Column-Based CSV
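Likewise, a minimal column-based file consistent with the column-based configuration shown later in this article might look like this (again, the tag names and values are invented for illustration):

```
Date,Time,Tag1,Tag2,Tag3,Tag4,Tag5,Tag6,Tag7
2024-01-15,08:00:00,1.1,2.2,3.3,4.4,5.5,6.6,7.7
2024-01-15,08:01:00,1.2,2.3,3.4,4.5,5.6,6.7,7.8
```

Here the date and time are split across columns 0 and 1, and columns 2 through 8 each hold one tag's values.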
How It Works
The Collector monitors one or more directories for inbound CSV files to be processed. A .json config file is needed for each directory. These files must be located in %ProgramData%\Canary\CsvCollector\Directories. Within each file, the user specifies the directory to be monitored, how to parse and process the CSV files, and where the data is to be sent.
Each configuration file is divided into three sections with an optional fourth:
- Processing: Defines how files are handled and their relationships during and after processing.
- CsvFileParsing: Specifies the format of the file’s data and how it’s interpreted.
- Historian: Configures how data is transmitted to the Historian.
- TagMapping: (optional) Changes tag names before they are stored in the Historian.
Any changes made to the configuration file(s) require the CSV Collector service to be restarted.
The .json config files used to parse the corresponding row- and column-based examples shown above are presented below and attached to the article.
Row-Based
{
"Processing": {
"DirectoryPath": "C:\\CSV Import\\",
"ProcessedDirectory": "C:\\CSV Import\\Completed"
},
"CsvFileParsing": {
"InputDelimiters": [
","
],
"TagNameOffset": 0,
"ValueOffset": 2,
"CombinedDateTimeFieldTimestamp": {
"FieldOffset": 1
},
"QualityOffset": 4,
"DataTypeOffset": 3,
"NoDataValues": [
"#VALUE!",
""
],
"HeaderCount": 1,
"PropertyOffsets": [
{ "PropertyPosition": 5, "PropertyName": "Description" },
{ "PropertyPosition": 6, "PropertyName": "EngUnits" },
{ "PropertyPosition": 8, "PropertyName": "Region" },
{ "PropertyPosition": 9, "PropertyName": "Facility" }
]
},
"Historian": {
"DatasetName": "CSV_Data",
"DestinationHistorian": "HistorianServerName",
"AutoCreateDataset": true,
"AllowInsertedData": true
},
"TagMapping": {
"TagMaps": {
"tag1": "mappedTag1",
"tag2": "mappedTag2"
}
}
}
Column-Based
{
"Processing": {
"DirectoryPath": "C:\\CSV Import\\",
"ProcessedDirectory": "C:\\CSV Import\\Completed"
"File Dependency": true,
"ProcessingIntervalInSeconds":30
},
"CsvFileParsing": {
"InputDelimiters": [
","
],
"TagPositions": "2-8",
"SplitDateTimeFieldTimestamp": {
"DateFieldOffset": 0,
"TimeFieldOffset": 1
},
"HeaderCount": 0,
},
"Historian": {
"DestinationHistorian": "HistorianServerName",
"AutoCreateDataset": true,
"AllowInsertedData": true
}
}
Configuration
Below are the available fields for each section listed above. Once the .json file has been created and placed in %ProgramData%\Canary\CsvCollector\Directories, the service can be restarted and the Collector can begin consuming CSV data as it arrives. For troubleshooting purposes, users can reference the Messages tile for any relevant information.
1. Processing
- DirectoryPath - Directory location that the service will monitor for files being placed. This is required.
- FileDependency - When set to true, this requires the Collector to process files one at a time. It implies that there is some dependency between the files. Default value is false.
- ProcessingIntervalInSeconds - Frequency, in seconds, at which files are processed. Default value is 10.
- ProcessedDirectory - Directory to copy files to when processing is complete. If unset or empty, files will be removed after processing but not copied. [Previously required KeepProcessedFiles]
- MaxParallelFiles - Controls the number of concurrent files processed at one time when FileDependency is set to false. Default is 12.
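Putting these fields together, a Processing section using mostly default values might look like the following sketch (the directory paths are placeholders):

```
{
  "Processing": {
    "DirectoryPath": "C:\\CSV Import\\",
    "ProcessedDirectory": "C:\\CSV Import\\Completed",
    "FileDependency": false,
    "ProcessingIntervalInSeconds": 10,
    "MaxParallelFiles": 12
  }
}
```

FileDependency, ProcessingIntervalInSeconds, and MaxParallelFiles are shown at their default values here and could be omitted.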
2. CsvFileParsing
- *Only one timestamp type should be configured: CombinedDateTimeFieldTimestamp, SplitDateTimeFieldTimestamp, FieldIntervalDeltaTimestamp, LineNumberIntervalDeltaTimestamp, or UnixFieldTimestamp.
- When counting column positions from left to right in the CSV file, positions start at 0.
- InputDelimiters - Characters used to parse each line of data. Use the word “tab” to indicate a tab delimiter. Default value is [“,”].
- DatasetOffset - Column position to be used as the DataSet name when building a tag name. Either Historian.DatasetName or DatasetOffset is required if the DataSet name is not already prepended to the tag name being imported (e.g. Dataset.Boiler_Temp).
- BranchOffset – An array of column positions to be used as a branch name while building a tag name.
- BranchPosition - Row/Column offset of the name to be used as a branch name while building a tag name.
- UseFilenameAsBranch - When set to true, the file name (without extension) is used as a branch name in the Historian. Default value is false.
- PropertyOffsets - Define a list of column offsets and their corresponding names for tag property values. (previously PropertyOffsets and PropertyNames)
- TagPositions - Used in table-based processing to define a list of column offsets for tag names (e.g. "2, 4, 7-9, 12-*") (previously TagOffsets)
- TagNameOffset - Column position to be used as a tag name.
- ValueOffset - Column position to be used for the value.
- *CombinedDateTimeFieldTimestamp - Configuration for combined date and time field timestamp.
- FieldOffset - Column position to be used for the date and time. Used if date and time are in the same column.
- Format - Format of datetime being imported if a non-standard format is used that .NET cannot convert automatically.
- IsUtc - Determines if the datetime provided is in UTC or a local time format. Default value is false.
- *SplitDateTimeFieldTimestamp - Configuration for split date and time field timestamp
- DateFieldOffset - Column position to be used for the date.
- DateFieldFormat - Format of date being imported if a non-standard format is used that .NET can’t convert automatically.
- TimeFieldOffset - Column position to be used for the time.
- TimeFieldFormat - Format of time being imported if a non-standard format is used that .NET can’t convert automatically.
- IsUtc - Determines if the datetime provided is in UTC or a local time format. Default value is false.
- *FieldIntervalDeltaTimestamp - Configuration for field interval delta timestamp.
- DeltaOffset - For interval-based file formats, the column position of the interval (delta) between each row.
- Units - The units (seconds or minutes) that are used for the delta.
- StartDatePosition - Initial date to use to calculate the timestamp of each value. The row position of the start date must come before or on the same row as the StartTimePosition on the file being imported.
- StartTimePosition - Initial time to use to calculate the timestamp of each value.
- *LineNumberIntervalDeltaTimestamp - Configuration for line number interval delta timestamp.
- TimeInterval - Default interval between each row for file formats that are interval-based but do not specify the interval as a column on each row. The time between each sample is assumed to be consistent.
- Units - The units (seconds or minutes) that are used for the interval between each row.
- StartDatePosition - Initial date to use to calculate the timestamp of each value. The row position of the start date must come before or on the same row as the StartTimePosition on the file being imported.
- StartTimePosition - Initial time to use to calculate the timestamp of each value.
- *UnixFieldTimestamp - Configuration for Unix field timestamp
- FieldOffset - Column position to be used for the date and time held in Unix timestamp format. The value can be 10 digits (seconds), 13 digits (seconds + milliseconds), 16 digits (seconds + microseconds), or 19 digits (seconds + nanoseconds). Note that nanoseconds are truncated to the nearest microsecond.
- QualityOffset - Column position to be used for the quality. If a quality column is unavailable, all values will assume a Good quality unless using the NoDataValues field. Only decimal quality codes are supported (e.g. 192 (Good), 0 (Bad), 64 (Uncertain)).
- NoDataValues - A list of values for which Store & Forward will insert a NoData quality.
- DataTypeOffset - Column position to be used if the data type is included with the value. Supported data types include: bool, boolean, I1, Int8, sbyte, UI1, UInt8, byte, I2, Int16, short, UI2, UInt16, ushort, I4, Int32, int, UI4, UInt32, uint, I8, Int64, long, UI8, UInt64, ulong, R4, single, R8, double.
- HeaderCount - Number of rows to skip when reading a file to either get to the beginning of the data for row-based processing or to the record layout for table-based processing. Default value is 0.
- SubHeaderCount - The number of rows between the header and the actual data in the file. Default value is 0.
- PostProcessRegex - (Used in conjunction with TagNameMatch) Determines whether the TagNameMatch regular expression is run against the tag name before or after the dataset and branch have been prepended. Default value is false.
- TagNameMatch - (Used in conjunction with TagNameReplace) Allows the user to define a regular expression in order to parse column headers.
- TagNameReplace - (Used in conjunction with TagNameMatch) Allows the user to define which capture group to use as the tag name.
- EngUnitReplace - (Used in conjunction with TagNameMatch) Allows the user to define which capture group to use as the Eng Units property.
- UseSystemCultureForValueParsing - A Boolean value that tells the Collector to use the machine’s culture instead of the invariant culture. Default value is false.
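To make the offset fields above concrete, the following Python sketch (not Collector code; the parsing logic and the timestamp format string are assumptions for illustration) shows how the offsets from the row-based sample configuration would map one CSV line to a stored sample:

```python
# Illustrative sketch only: maps one CSV line to a sample using the
# 0-based column offsets from the row-based sample configuration.
from datetime import datetime

CONFIG = {
    "InputDelimiters": [","],
    "TagNameOffset": 0,
    "CombinedDateTimeFieldTimestamp": {"FieldOffset": 1},
    "ValueOffset": 2,
    "QualityOffset": 4,
    "NoDataValues": ["#VALUE!", ""],
}

def parse_line(line: str, cfg=CONFIG) -> dict:
    """Split one CSV line and pull out fields by column offset."""
    fields = line.split(cfg["InputDelimiters"][0])
    value = fields[cfg["ValueOffset"]]
    return {
        "tag": fields[cfg["TagNameOffset"]],
        "timestamp": datetime.strptime(
            fields[cfg["CombinedDateTimeFieldTimestamp"]["FieldOffset"]],
            "%Y-%m-%d %H:%M:%S",  # assumed format; real files may differ
        ),
        # NoDataValues entries become NoData (None here) instead of a value
        "value": None if value in cfg["NoDataValues"] else float(value),
        "quality": int(fields[cfg["QualityOffset"]]),  # decimal code, e.g. 192 = Good
    }

print(parse_line("Boiler_Temp,2024-01-15 08:00:00,212.4,double,192"))
```

Note that every offset counts from 0, left to right, as described above.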
3. Historian
- DatasetName - Name of DataSet to route data to in the Historian. DatasetName or DatasetOffset are required if the DataSet is not prepended to the tag name being imported (e.g. Dataset.Boiler_Temp)
- DestinationHistorian - Name of Historian to route data to. Default value is localhost.
- DatasetFileDateTime - Allows the user to define the date of the HDB file within the Historian. [Previously was FileDate]
- AutoCreateDataset - Determines if Store & Forward should create a DataSet in the Historian if the DataSet does not currently exist. Default value is true.
- AllowInsertedData - Determines if Store & Forward will insert out of order data or if data must be received in chronological order. Default value is true.
- AutoWriteNoData - Determines if Store & Forward will insert a NoData quality for each tag when a session disconnects. Default value is false.
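As a reference, a Historian section exercising these fields might look like the following sketch (the DataSet and server names are placeholders; AutoCreateDataset, AllowInsertedData, and AutoWriteNoData are shown at their defaults):

```
"Historian": {
  "DestinationHistorian": "localhost",
  "DatasetName": "CSV_Data",
  "AutoCreateDataset": true,
  "AllowInsertedData": true,
  "AutoWriteNoData": false
}
```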
4. TagMapping (optional)
- TagMaps - Mapping of tags from their original name as they appear in the CSV file to their new name as they will be stored in the Historian. If tag maps are set, only successfully mapped tags will be processed; all other tags will be bypassed.
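The bypass behavior can be illustrated with a short Python sketch (illustrative only; the function name and data shapes are not part of the Collector):

```python
# Hedged sketch of the TagMapping behavior described above: when TagMaps
# is set, mapped tags are renamed and kept; unmapped tags are bypassed.
TAG_MAPS = {"tag1": "mappedTag1", "tag2": "mappedTag2"}

def apply_tag_maps(samples, tag_maps=TAG_MAPS):
    """Rename mapped tags; drop (bypass) any tag without a mapping."""
    kept = []
    for tag, value in samples:
        if tag in tag_maps:
            kept.append((tag_maps[tag], value))
    return kept

print(apply_tag_maps([("tag1", 1.0), ("tag3", 2.0)]))  # → [('mappedTag1', 1.0)]
```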