Data Collector Overview and Methodology (version 23)


Data Collectors are the components of the Canary System responsible for transferring the process data from a client's data source to the Canary Historian. Each Data Collector has an interface which allows the administrator to customize how data is logged as well as indicate to which Canary Historian(s) the data is sent.

The OPC UA Collector

  • Built to follow the standards of the OPC Foundation for connecting to OPC UA servers.
  • Install with Canary Installer. From within the application 'Canary Admin', users can use the OPC Collector module to create, modify, and oversee the status of all OPC UA logging sessions. 
  • Configuration settings include the application of deadbands, data transformation, adjusting the update rate (Sample Interval), adding custom metadata, and adjusting the decimal precision of timestamps using timestamp normalization. 

The OPC DA Collector

  • Built to follow the standards of the OPC Foundation for connecting to OPC DA servers.
  • Install with Canary Installer. From the Logger Administrator application, users can create, modify, and oversee the status of local and remote OPC DA logging sessions.
  • Configuration settings include the application of deadbands, adjusting update rates, configuring pre-defined metadata, creating data transformations, adjusting timestamp decimal precision, and triggering logging based on the value of another tag. 

The MQTT Collector

  • Built to follow the standards of the MQTT SparkplugB specification.
  • Install with Canary Installer. From the Canary Admin, users can use the MQTT Collector tile to configure, enable/disable, or monitor multiple sessions.
  • Configuration settings include the ability to subscribe to multiple brokers and topics, the assignment of a primary host to leverage store and forward, using birth and death certificates for state awareness, as well as enabling TLS security.

The SQL Collector

  • Supports Microsoft SQL Server and MySQL databases.
  • Install with Canary Installer. From within the application 'Canary Admin', users can use the SQL Collector module to create, modify, and oversee the status of all SQL logging sessions. 
  • Functionality includes the ability to convert existing historical SQL databases to Canary Historian archives and/or read new ongoing updates. The SQL Collector requires a specific queue table to stage records; it imports them at a configurable interval and then discards them from the queue table. This approach performs dramatically better than reading production tables directly.
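The queue-table pattern above can be sketched in a few lines. This is a conceptual model only, using SQLite as a stand-in for SQL Server or MySQL; the table name `canary_queue` and its columns are hypothetical, not the actual schema the SQL Collector requires.

```python
import sqlite3

def drain_queue(conn):
    """Read all staged records, hand them off, then discard them from the queue."""
    cur = conn.cursor()
    rows = cur.execute(
        "SELECT id, tag, ts, value, quality FROM canary_queue ORDER BY id"
    ).fetchall()
    imported = [(tag, ts, value, quality) for _, tag, ts, value, quality in rows]
    # In the real collector these records would be handed to the Sender;
    # here we simply return them after clearing the queue table.
    cur.execute("DELETE FROM canary_queue")
    conn.commit()
    return imported

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE canary_queue "
    "(id INTEGER PRIMARY KEY, tag TEXT, ts TEXT, value REAL, quality INTEGER)"
)
conn.executemany(
    "INSERT INTO canary_queue (tag, ts, value, quality) VALUES (?, ?, ?, ?)",
    [("Pump1.Flow", "2024-01-01T00:00:00", 42.5, 192),
     ("Pump1.Flow", "2024-01-01T00:00:10", 43.1, 192)],
)
conn.commit()

batch = drain_queue(conn)
remaining = conn.execute("SELECT COUNT(*) FROM canary_queue").fetchone()[0]
```

Because the collector only ever touches the small, append-only queue table, production tables are never scanned, which is where the performance benefit comes from.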

The Cygnet Collector

  • Built to utilize the CygNet .NET API and interface with the Canary Sender.
  • Install with Canary Installer. From the C:\ProgramData\Canary Labs\Logger\StoreAndForward\CygNetCollector\ folder, edit the SQLite database provided by the Canary team to create or modify CygNet logging sessions. From within the application 'Canary Admin', users can use the CygNet Collector module to enable, monitor, or disable existing sessions.
  • Configuration includes facility and UDC combinations, the mapping of CygNet attributes to Canary tag properties, and the auto-discovery of new CygNet facilities.

The CSV Collector

  • Created to import flat files in the .CSV format as they arrive, providing a simple solution for logging data from external systems without using APIs.
  • Install with Canary Installer. Requires the modification and creation of two config files: 'SAF_ImportService.config' and 'SAF_Import.config'.
  • Very little configuration is required; simply determine whether the CSV file should be archived or discarded after upload. Flexibility exists in the file formatting, giving you the ability to read CSV files with vertical or horizontal data structures.
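The two layouts mentioned above can be illustrated with a short sketch. The column conventions here (a `tag,timestamp,value` vertical layout and a timestamp-plus-one-column-per-tag horizontal layout) are assumptions for illustration, not the exact formats the CSV Collector expects.

```python
import csv
import io

def parse_vertical(text):
    """Vertical layout: one record per row (tag, timestamp, value)."""
    return [(row["tag"], row["timestamp"], float(row["value"]))
            for row in csv.DictReader(io.StringIO(text))]

def parse_horizontal(text):
    """Horizontal layout: one timestamp per row, one column per tag."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)              # e.g. ["timestamp", "TagA", "TagB"]
    records = []
    for row in reader:
        ts, values = row[0], row[1:]
        for tag, value in zip(header[1:], values):
            records.append((tag, ts, float(value)))
    return records

vertical = (
    "tag,timestamp,value\n"
    "TagA,2024-01-01T00:00:00,1.0\n"
    "TagB,2024-01-01T00:00:00,2.0\n"
)
horizontal = "timestamp,TagA,TagB\n2024-01-01T00:00:00,1.0,2.0\n"
```

Both layouts carry the same information; the collector simply normalizes them into the same per-tag record stream.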

Canary's Module for Ignition

  • Approved by Inductive Automation and available on the Inductive Automation Third-Party Module Showcase.
  • From the Ignition Gateway, users can access the 'Config' menu and the 'Modules' submenu to install, configure, and edit the Canary Historian Module.
  • Configuration provides the native historian functionality available within the Ignition platform, including the ability to log to multiple Canary Historian instances as well as the flexibility to query data from Canary Historians in Ignition controls.

API Collector

  • Offered to allow for custom .NET or Web API collector development.
  • Documentation can be found at https://writeapi.canarylabs.com/.
  • The API exposes the same calls and settings used by the other Canary Collectors, empowering custom collector development.

The Canary Data Collectors are not individually licensed; all Collectors are included with the purchase of the Canary System. However, Collectors do need to be installed individually using the Canary Installer. By not licensing individual data collectors, Canary makes it easier to follow best-practice architectures and ensure robust solutions.

The Canary Sender and the Receiver are the main components that work together to provide store and forward (SaF) technology, accomplishing the following:

  • Communication is built on the WCF framework, providing secure and encrypted data transmission.
  • Network / historian machine state awareness by monitoring the connection between the Sender (data collection side) and the Receiver (historian archive side).
  • Data caching to local disk should the Sender lose connection to the Receiver. Caching will continue for as long as disk space permits. By default, the Sender will stop writing new values to disk when only 1 GB of space remains. This can be configured using the same registry entry that the Historian uses (see Configuring Disk Space Limits).
  • Notification of system administrators when data logging has not updated in a set period of time (customized by DataSet and managed from the Canary Admin > Historian > Configuration settings).
  • Upon the re-establishment of the network or connection to the historian machine, the Sender and Receiver will automatically backfill the Canary Historian archive with all buffered values and clear the local disk cache.
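The buffering and backfill behavior above can be modeled in a few lines. This is a simplified in-memory sketch, not the Canary Sender implementation: the real Sender caches to local disk and enforces the free-space limit described above.

```python
from collections import deque

class Receiver:
    """Stands in for the Receiver on the historian side."""
    def __init__(self):
        self.online = True
        self.archive = []

class Sender:
    """Stands in for the Sender on the data collection side."""
    def __init__(self, receiver):
        self.receiver = receiver
        self.cache = deque()   # models the on-disk store-and-forward buffer

    def send(self, value):
        if self.receiver.online:
            self.receiver.archive.append(value)
        else:
            self.cache.append(value)   # buffer locally while disconnected

    def reconnect(self):
        """Backfill buffered values once the connection is restored."""
        while self.cache:
            self.receiver.archive.append(self.cache.popleft())

rx = Receiver()
tx = Sender(rx)
tx.send(1)
rx.online = False        # simulate a network outage
tx.send(2)
tx.send(3)               # cached, not lost
rx.online = True
tx.reconnect()           # backfill and clear the local cache
```

The key property is that nothing is dropped during the outage: the archive ends up complete and in order, and the cache is empty once backfill finishes.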

The Sender is used by all Canary Data Collectors automatically as part of the store and forward process.  Data Collectors communicate with the Sender service via a public API.  This means that, with some programming knowledge, a user can build custom Data Collectors using the Sender API.

The Sender supports user authentication and requires user credentials before a connection is accepted. The Canary System administrator can determine, from within the Sender service, which users are granted access. The advantage of user authentication is that unauthorized clients cannot write data into your historian.

The Sender API supports three session settings for conserving network resources: packet size, zip, and delay.  'Packet Size' prevents data from being sent in huge bursts or batches by limiting the number of bytes that can be sent at once. 'Packet Delay' determines how often data packets are sent across the network to the historian. The packet size and delay settings work in tandem to form the 'data throttle' that controls total network resource consumption. Finally, 'Packet Zip' determines whether data packets are compressed before transmission; enabling compression reduces network load at the cost of additional processing.
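The packet-size half of the throttle can be sketched as a simple chunking routine. This is a conceptual model; the byte accounting and the setting names are simplifications, not the actual Sender API behavior.

```python
def packetize(records, max_bytes):
    """Split a record stream into packets of at most max_bytes each."""
    packets, current, size = [], [], 0
    for rec in records:
        rec_size = len(repr(rec).encode())   # crude stand-in for wire size
        if current and size + rec_size > max_bytes:
            packets.append(current)          # packet full: flush it
            current, size = [], 0
        current.append(rec)
        size += rec_size
    if current:
        packets.append(current)
    return packets

records = [("Tag%d" % i, i) for i in range(10)]
packets = packetize(records, max_bytes=60)
# Between each packet the Sender would wait for the configured packet delay,
# and could optionally compress ('zip') each packet before transmission.
```

Capping the packet size smooths out bursts, while the inter-packet delay caps the overall send rate; together they form the data throttle described above.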

Packet settings are not configurable within all Data Collectors but are available in the Sender API SDK.

The fastest way to write data is chronologically, which is the default method of the Canary historian. However, sometimes custom-made data collectors send data in batches, with timestamps arriving out of order. A tag with a timestamp of 4:05 PM may log before one with a 4:02 PM timestamp. Normally, these timestamps would be rejected by the historian and labeled 'backwards timestamps'. However, the 'InsertReplaceData' session setting can be activated using the Sender API. This setting instructs the historian to accept 'backwards timestamps' and can even be configured to replace existing values at those timestamps with the incoming data.

Inserting data is supported by the CSV, MQTT, and SQL Data Collectors and is also available in the Sender API SDK. For best performance, send data to the historian in chronological order.
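The difference between the default chronological behavior and insert/replace mode can be sketched as follows. This models the historian's behavior conceptually; the `insert_replace` flag is illustrative and is not the actual Sender API call.

```python
def write(archive, ts, value, insert_replace=False):
    """archive is a list of (ts, value) pairs kept in chronological order."""
    if not archive or ts > archive[-1][0]:
        archive.append((ts, value))        # fast path: in-order append
        return True
    if not insert_replace:
        return False                       # rejected as a 'backwards timestamp'
    for i, (t, _) in enumerate(archive):
        if t == ts:
            archive[i] = (ts, value)       # replace the value at that timestamp
            return True
        if t > ts:
            archive.insert(i, (ts, value)) # insert the out-of-order value
            return True

archive = []
write(archive, "16:02", 1.0)
write(archive, "16:05", 2.0)
rejected = write(archive, "16:03", 1.5)                       # default: rejected
accepted = write(archive, "16:03", 1.5, insert_replace=True)  # now inserted
```

The append path is cheap, while the insert path must search the archive, which is why chronological writes remain the fastest option.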

The Sender is capable of supporting dual logging. The advantage of dual logging is that data from a single Logging Session can be sent simultaneously to multiple historians.

Typically this is configured by adding a second or third historian to the Logging Session, using a comma and space as separators (e.g., HistorianOne, HistorianTwo).
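Parsing that field and fanning one session out to several historians can be sketched as below. The comma-separated 'Historian' field mirrors the convention above; the fan-out logic itself is an illustrative model, not the Sender implementation.

```python
def parse_historians(field):
    """Split a comma-separated 'Historian' field into individual names."""
    return [h.strip() for h in field.split(",") if h.strip()]

def fan_out(value, historians, archives):
    """Send the same value to every configured historian."""
    for h in historians:
        archives.setdefault(h, []).append(value)

historians = parse_historians("HistorianOne, HistorianTwo")
archives = {}
fan_out(42.0, historians, archives)
```

Each historian receives an identical copy of the stream, so every configured instance ends up with the same record.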

The Canary System is composed of components that are pre-integrated to assist in the collection, storage, and reporting of data. 

In its simplest form, the Canary System can be broken down into the following components: Data Collectors, Sender, Receiver, Canary Historian, and Views.

The Canary Data Collectors and Sender work in tandem to log data from the data source and push that data to the Receiver.  Each Collector/Sender can be configured to push data to multiple Receivers.  The connection between Sender and Receiver is equipped with Store and Forward enabling local data buffering as necessary.

The Receiver is always installed local to the Canary Historian and writes received values into the archive. 

The Views component serves as the gatekeeper for client data queries, retrieving necessary data from the Canary Historian, performing any aggregated data requests, and then publishing that data to the client tool, whether a Canary tool, API, or third-party solution.

Most components can be installed individually and provide for flexible system architectures.  However, a set of best practices should be followed whenever possible.

Whenever possible, keep the Canary Data Collector and Sender components local to the data source. 

Install both the appropriate Data Collector as well as the Canary Sender on the same machine as the OPC server, MQTT broker, SCADA server, or other data collection source.

The benefits of this architecture are based on the Sender's ability to buffer data to local disk should connection to the Receiver and Historian be unavailable.  This protects against data loss during network outages as well as during historian server and Canary System patches and version upgrades.

Both the Sender and Receiver have security settings that can be configured with Active Directory to require user authentication and authorization prior to accepting and transmitting data, and they require only a single port to be open in the firewall.

Should the organization's tolerance for data loss be higher, the Data Collector and Sender could be installed remote from the data source, either on an independent machine or on the same server as the Receiver and Canary Historian itself.  This form of 'remote' data collection will still work without issue except during a network outage or any other event that makes the historian unavailable, since the option of buffering data locally will be lost.

Data from multiple data sources can be sent to the same Canary Historian.  To accomplish this create unique Data Collector and Sender pairings as local to each data source as possible.  The type of data source can vary (OPC DA or UA, MQTT, SQL, CSV, etc) but the architecture will stay the same.

The Receiver moderates which data is logged to the Canary Historian using the tag name, timestamp, value, and quality, all of which are communicated from the Sender. In doing so, it monitors each individual tag, allowing only a single unique value and quality per timestamp. 

This feature provides simple redundant logging, as the Receiver follows a 'first in wins' methodology.  Simply put, redundantly logging the same information from multiple Sender sessions results in the Canary Receiver keeping the first entry while discarding all duplicate entries.
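The 'first in wins' rule can be sketched in a few lines. The record shape (tag, timestamp, value, quality) follows the text above; the code itself is a simplified model, not the Canary Receiver.

```python
def receive(store, tag, ts, value, quality):
    """Keep only the first value/quality seen for each (tag, timestamp)."""
    key = (tag, ts)
    if key in store:
        return False                 # duplicate from a redundant Sender: discarded
    store[key] = (value, quality)    # first arrival wins
    return True

store = {}
first = receive(store, "Pump1.Flow", "16:00", 42.0, 192)   # from Sender A
dup = receive(store, "Pump1.Flow", "16:00", 42.0, 192)     # from Sender B
```

Because the key is the (tag, timestamp) pair, two Senders logging the same source produce exactly one archived record per timestamp, no matter which arrives first.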

To use this feature for redundant data logging, simply create two separate Canary Data Collector and Sender sessions. Architectures can vary: you may choose to duplicate the data server (OPC server, SCADA instance, MQTT broker, etc.), shifting the point of failure to the device level, or you can create a separate Data Collector and Sender session that is remote from the data source, in which case the data source or server becomes an additional failure point.

Each Sender can be configured to push data in real time to multiple Receiver and Canary Historian locations, enabling redundancy.  This is configured within the Data Collector, with the configuration varying slightly depending on the collector type.  Generally, however, this is accomplished by listing multiple historians in the 'Historian' field, separated with a comma.

For example, the Sender would send a data stream across the network to the Receiver installed on the server named 'HistorianPrimary' while also simultaneously sending an identical stream to the server named 'HistorianRedundant'.

This dual logging approach is recommended when redundant historical records are desired and ensures that a real-time record is provided to both historian instances.  Multiple data sources can be used as demonstrated in previous architectures.  Each data source would need to have the Data Collector configured to push data to all desired Canary Historian instances.

A Canary Historian instance operates independently of other Canary Historian instances.  This isolation ensures that the records of each historian are secure and not vulnerable to data synchronization that could propagate bad data.

A Canary Proxy Server may also be implemented in logging architectures.  This feature pairs a Receiver with another Sender and is installed to serve as a 'data stream repeater', often useful in DMZ or strictly unidirectional data-flow architectures.

Sitting between two firewalls, the Proxy Server consists of a Receiver that manages the incoming data stream from the remote Sender and Data Collector sessions.  That Receiver is paired with an outbound Sender service that relays the data stream to another Receiver, in this case a level above the Proxy Server and through an additional firewall.  Like all Sender and Receiver configurations, this requires only a single open firewall port for all data transfer and ensures no 'top-down' communication can occur.
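The repeater role can be illustrated with a minimal sketch: an inbound receive step paired with an outbound relay one level up. This is a conceptual model of the data flow, not the Canary Proxy Server implementation.

```python
class Relay:
    """Models the proxy: receives a packet below, forwards it above."""
    def __init__(self, upstream):
        self.upstream = upstream     # the next Receiver, above the firewall

    def receive(self, packet):
        # Data only flows upward: the relay initiates the outbound connection,
        # so there is no 'top-down' communication path through the proxy.
        self.upstream.append(packet)

top_level_archive = []
proxy = Relay(top_level_archive)
for packet in [("TagA", "16:00", 1.0), ("TagA", "16:01", 2.0)]:
    proxy.receive(packet)
```

Each hop only needs its single outbound connection, which is why chaining a proxy between firewalls still requires just one open port per firewall.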

Data logging can easily be accomplished to 'cloud' or 'hosted' Canary Historian instances.  Again, this is accomplished by simply configuring the Data Collector and Sender sessions to include the IP address and security configurations necessary to reach the remote historian.

Should a local historian also need to be added to the architecture, add an additional historian instance to the Data Collector configuration and you will begin pushing data in real time to both a local Canary Historian as well as the cloud installation.
