Canary Historian Storage Methodology (version 23)
Familiarizing yourself with the structure and options available within the Canary Historian is the key to understanding and architecting the best possible system for your organization. The historian uses several strategies to store data with lossless compression while maintaining fast data retrieval. Equal emphasis is placed on storing data as compactly as possible without any data loss or trimming.
Tags, DataSets, and Views
At the most basic level, the historian is a collection of tags. Tags, sometimes called points or channels, represent a sensor or other data point. A tag consists of a name, its TVQs (timestamp, value, quality score), and any additional properties, or metadata, associated with it.
Tags are grouped into DataSets which function like folders. When clients request data from the Historian, they are ultimately asking for a collection of TVQs (timestamps, values, and qualities) for a group of tags that live within a DataSet or DataSets.
How clients see those tags grouped, or how they browse for them in a UI, is called a View. The Canary Historian presents itself to clients as a View with exactly the same order and structure in which the data is logged. System administrators can build Virtual Views on top of the raw historian View that let clients interact with tags differently: perhaps organized by asset type, with aliased tag names, or grouped in different hierarchies.
Tag Names
Tag names are determined when Logging Sessions are created. Once a tag name has been established, it is difficult to change within the historian archive View, but easy to change in a Virtual View. Tag names can be composed of many alphanumeric characters and symbols, and a single tag name can represent multiple levels of hierarchy.
When clients browse for tags, Canary recognizes any dot ('.') within the tag name as a subnode separator. For example, consider the following tags from the {Diagnostics} DataSet, which is included in the historian View of every Canary Historian.
DemoHistorian.{Diagnostics}.AdminRequests/sec |
DemoHistorian.{Diagnostics}.Reading.NumClients |
DemoHistorian.{Diagnostics}.Reading.TagHandles |
DemoHistorian.{Diagnostics}.Reading.TVQs/sec |
DemoHistorian.{Diagnostics}.Sys.CPU Usage Historian |
DemoHistorian.{Diagnostics}.Sys.CPU Usage Total |
DemoHistorian.{Diagnostics}.Sys.Memory Physical |
DemoHistorian.{Diagnostics}.Sys.Memory Virtual |
DemoHistorian.{Diagnostics}.Views.AxiomTotalTVQS |
DemoHistorian.{Diagnostics}.Views.CPU Usage |
DemoHistorian.{Diagnostics}.Views.TotalConnections/min |
DemoHistorian.{Diagnostics}.Views.TotalTVQs/min |
DemoHistorian.{Diagnostics}.Views.Working Set Memory |
DemoHistorian.{Diagnostics}.Writing.NumClients |
DemoHistorian.{Diagnostics}.Writing.TagHandles |
DemoHistorian.{Diagnostics}.Writing.TVQs/sec |
When browsing this View, we see tags as they appear in the Canary Historian archive, so each tag name is prefixed with the Canary Historian machine name, in this case 'DemoHistorian'. A dot ('.') then marks the next subnode, here the DataSet name {Diagnostics}. Another dot separates the DataSet from the rest of the tag name.
Beyond the DataSet, some tags need no further structure, while others continue to represent additional subnodes using the dot. In the above example, 'AdminRequests/sec' has no additional subnodes, while all other tags are grouped within one of the following subnodes: 'Reading', 'Sys', 'Views', or 'Writing'.
The usefulness of the dot and tag subnodes is clearest when you consider how a client browses tags. In Axiom, we can illustrate how the above tags are presented: the {Diagnostics} DataSet has been selected and expanded at left, revealing the subnodes below it, and all tags are shown in the list at right with their subnodes still prepended to the tag name.
When the 'Reading' subnode is selected, the tag list filters to show only the tags within 'Reading', and the displayed tag names are shortened again.
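The browsing behavior described above amounts to splitting tag names on the dot. The Python sketch below is a hypothetical illustration (the helper names `tags_under` and `subnodes` are ours, not a Canary API). It assumes every dot is a hierarchy separator and that no dots appear inside a node name.

```python
from typing import List

TAGS = [
    "DemoHistorian.{Diagnostics}.AdminRequests/sec",
    "DemoHistorian.{Diagnostics}.Reading.NumClients",
    "DemoHistorian.{Diagnostics}.Reading.TagHandles",
    "DemoHistorian.{Diagnostics}.Reading.TVQs/sec",
    "DemoHistorian.{Diagnostics}.Sys.CPU Usage Total",
    "DemoHistorian.{Diagnostics}.Writing.NumClients",
]


def tags_under(tags: List[str], node: str) -> List[str]:
    """Tags below `node`, with the node path stripped (like the tag list at right)."""
    prefix = node + "."
    return [t[len(prefix):] for t in tags if t.startswith(prefix)]


def subnodes(tags: List[str], node: str) -> List[str]:
    """Immediate child subnodes of `node` (like the tree at left), in first-seen order."""
    children: List[str] = []
    for relative in tags_under(tags, node):
        head, dot, _ = relative.partition(".")
        if dot and head not in children:  # a remaining dot means `head` is a subnode
            children.append(head)
    return children


print(subnodes(TAGS, "DemoHistorian.{Diagnostics}"))
# ['Reading', 'Sys', 'Writing']
print(tags_under(TAGS, "DemoHistorian.{Diagnostics}.Reading"))
# ['NumClients', 'TagHandles', 'TVQs/sec']
```

'AdminRequests/sec' has no remaining dot, so it appears as a tag directly under {Diagnostics} rather than as a subnode.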
Timestamps
For the historian to store a record, a timestamp is required. The data source (e.g., OPC server, MQTT broker, SCADA server, PLC) determines the timestamp, including its time zone and its resolution.
Timestamp resolution refers to rounding the incoming timestamp from the data source up to a specified number of milliseconds. When possible, use a coarser timestamp resolution (a larger number of milliseconds) within Canary Collector Logging Sessions: tags with timestamp resolution set to 10 milliseconds will be compressed less efficiently than tags with timestamp resolution set to 1000 milliseconds.
Use the 'Normalization Time' feature of the OPC UA and OPC DA Collectors to synchronize entire Logging Groups to standardized timestamps, rounding them up to the indicated millisecond.
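The sketch below illustrates why a coarser resolution helps. It assumes 'rounding up' means rounding to the next multiple of the resolution (a ceiling), and it uses the gaps between consecutive timestamps only as a proxy, since Canary's actual compression format is not described here. At 1000 ms resolution, jittery source timestamps collapse onto a regular grid, and regular gaps are what delta-style encoders generally compress well. `normalize_group` is a hypothetical stand-in for Normalization Time applied to a whole group.

```python
from typing import Dict, List


def round_up(ts_ms: int, resolution_ms: int) -> int:
    """Round a millisecond timestamp up to the next multiple of resolution_ms."""
    return -(-ts_ms // resolution_ms) * resolution_ms


def deltas(timestamps: List[int]) -> List[int]:
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def normalize_group(latest: Dict[str, int], resolution_ms: int) -> Dict[str, int]:
    """Put every tag in a (hypothetical) logging group on the same rounded timestamp."""
    return {tag: round_up(ts, resolution_ms) for tag, ts in latest.items()}


# A ~1 s scan whose source timestamps jitter by tens of milliseconds.
raw = [1_000_004, 1_001_023, 1_002_047, 1_003_012]
print(deltas([round_up(t, 10) for t in raw]))    # [1020, 1020, 970]
print(deltas([round_up(t, 1000) for t in raw]))  # [1000, 1000, 1000]
print(normalize_group({"Flow": 1_000_004, "Temp": 1_000_391}, 1000))
# {'Flow': 1001000, 'Temp': 1001000}
```

A resolution coarser than the scan interval would round two samples onto the same timestamp, so the resolution should stay at or below the scan rate.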
Timestamps are converted to UTC and stored in UTC in the historian archive. When clients request or query data, timestamps are converted to the client machine's time zone for presentation.
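The round trip from source time zone to UTC archive to client time zone can be shown with Python's standard `datetime` module. Fixed UTC offsets stand in for real time zones so the example does not depend on a time-zone database. Real clients would apply their zone rules (for example via `zoneinfo`). The helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def to_archive(source_ts: datetime) -> datetime:
    """Convert a timezone-aware source timestamp to UTC for storage."""
    if source_ts.tzinfo is None:
        raise ValueError("source timestamp must carry a time zone")
    return source_ts.astimezone(timezone.utc)


def to_client(archived_utc: datetime, client_tz: timezone) -> datetime:
    """Present an archived UTC timestamp in the client's time zone."""
    return archived_utc.astimezone(client_tz)


source = datetime(2024, 1, 15, 8, 0, tzinfo=timezone(timedelta(hours=-5)))  # UTC-05:00
stored = to_archive(source)
shown = to_client(stored, timezone(timedelta(hours=1)))  # UTC+01:00
print(stored.isoformat())  # 2024-01-15T13:00:00+00:00
print(shown.isoformat())   # 2024-01-15T14:00:00+01:00
```

All three values refer to the same instant; only the presentation changes.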
Timestamps are also important for redundant data logging. The historian allows redundant data feeds for the same tags by applying a First In Wins storage principle: it archives the first value and quality received for a timestamp and ignores any later value or quality with an identical timestamp.
This type of redundancy is only possible when using Collectors that cannot insert historical timestamps (i.e., OPC DA and OPC UA).
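Below is a minimal sketch of First In Wins. It assumes records are keyed by tag and timestamp. `FirstInWinsArchive` is a hypothetical toy class, not Canary's storage engine. Two redundant feeds write the same timestamp, and only the first write is kept.

```python
from typing import Any, Dict, Optional, Tuple


class FirstInWinsArchive:
    """Toy archive that keeps the first value/quality seen for each tag and timestamp."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, int], Tuple[Any, int]] = {}

    def write(self, tag: str, timestamp: int, value: Any, quality: int) -> bool:
        key = (tag, timestamp)
        if key in self._records:
            return False  # a redundant feed already supplied this timestamp; ignore it
        self._records[key] = (value, quality)
        return True

    def read(self, tag: str, timestamp: int) -> Optional[Tuple[Any, int]]:
        return self._records.get((tag, timestamp))


archive = FirstInWinsArchive()
print(archive.write("Flow", 1000, 42.0, 0xC0))  # True: primary feed arrives first
print(archive.write("Flow", 1000, 41.9, 0xC0))  # False: backup feed is ignored
print(archive.read("Flow", 1000))               # (42.0, 192)
```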
The historian can handle large batches of timestamps, but for best performance they should be sent in chronological order. The historian stores data most efficiently when timestamps move forward; storage slows significantly when the historian must determine where to insert a historical timestamp in its archive.
Data inserts (adding or editing historical timestamps, values, and quality scores) are possible, but only certain collectors support them: the MQTT Collector, CSV Collector, Ignition Module, SQL Collector, and custom collectors built with the Sender API. The OPC DA and OPC UA Collectors cannot insert historical records.
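A sender could follow the ordering advice above by sorting each batch chronologically. It could then separate TVQs that extend the archive from those that fall at or before the last archived timestamp, which would need the slower insert path. The `prepare_batch` helper and the 'at or before' cutoff are assumptions for illustration, not part of the Sender API.

```python
from typing import Any, List, Tuple

TVQ = Tuple[int, Any, int]  # (timestamp in ms, value, quality)


def prepare_batch(batch: List[TVQ], last_archived_ms: int) -> Tuple[List[TVQ], List[TVQ]]:
    """Sort a batch chronologically and split it into appends and historical inserts."""
    ordered = sorted(batch, key=lambda tvq: tvq[0])
    appends = [tvq for tvq in ordered if tvq[0] > last_archived_ms]
    inserts = [tvq for tvq in ordered if tvq[0] <= last_archived_ms]
    return appends, inserts


appends, inserts = prepare_batch(
    [(3000, 7.1, 0xC0), (1000, 6.8, 0xC0), (2500, 7.0, 0xC0)], last_archived_ms=2000
)
print(appends)  # [(2500, 7.0, 192), (3000, 7.1, 192)]
print(inserts)  # [(1000, 6.8, 192)]
```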
Because the historian always knows whether it is actively connected to the Logging Sessions, it automatically extends timestamps once per minute for tags scanned less often than once per minute. These automatic Timestamp Extensions are held virtually as the 'last known timestamp'. They let client tools see that, although the tag's value has not been updated, the tag is still being actively logged.
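The virtual 'last known timestamp' can be modeled as follows. This is a hypothetical sketch. It assumes the extension lands on whole-minute boundaries and stops as soon as the Logging Session disconnects. The document does not specify either detail.

```python
MINUTE_MS = 60_000


def last_known_timestamp(last_update_ms: int, now_ms: int, connected: bool) -> int:
    """Virtual 'last known timestamp' for a tag whose value has not changed."""
    if not connected:
        return last_update_ms  # no extension without an active Logging Session
    latest_minute = (now_ms // MINUTE_MS) * MINUTE_MS  # most recent whole-minute boundary
    return max(last_update_ms, latest_minute)


print(last_known_timestamp(10_000, 185_000, connected=True))   # 180000
print(last_known_timestamp(10_000, 185_000, connected=False))  # 10000
```

Nothing here is written to the archive: the extended timestamp is computed on demand, matching the description of extensions being held virtually.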
Values
The Canary Historian is structured for the following data types:
- Integers
- Booleans
- Floating Points (Singles and Doubles)
- Strings
To maximize historian efficiency, be cautious about storing a wide variety of distinct string values. Strings work best when the same small set of values repeats, for instance 'Off' and 'On'.
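The document does not describe how Canary encodes strings internally. Dictionary encoding, a common technique in columnar and time-series stores, illustrates why repetition is cheap: each distinct string is stored once, and every sample becomes a small integer index. Treat this as an analogy, not Canary's format.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct string once and replace samples with integer indexes."""
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(index)  # first occurrence gets the next index
        codes.append(index[value])
    return list(index), codes


print(dictionary_encode(["Off", "On", "On", "Off", "On"]))
# (['Off', 'On'], [0, 1, 1, 0, 1])
print(len(dictionary_encode([f"Batch {n} complete" for n in range(1000)])[0]))
# 1000: every distinct string must be stored
```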
This table indicates the range of values that can be written per data type.
Data Type | Min | Max |
I1 | -128 | 127 |
UI1 | 0 | 255 |
I2 | -32,768 | 32,767 |
UI2 | 0 | 65,535 |
I4 | -2,147,483,648 | 2,147,483,647 |
UI4 | 0 | 4,294,967,295 |
I8 | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807 |
UI8 | 0 | 18,446,744,073,709,551,615 |
R4 | -3.4E+38 (7 digits) | 3.4E+38 (7 digits) |
R8 | -1.7E+308 (15 digits) | 1.7E+308 (15 digits) |
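The integer limits follow from two's-complement bit widths. The float limits are the largest finite IEEE 754 single- and double-precision values, assuming R4 and R8 are IEEE 754 types as their VARIANT-style names suggest. The table's R8 figure, 1.7E+308, is 1.797...E+308 truncated. The quick check below is written in Python. The `fits` helper is illustrative only; the document does not say how Canary handles out-of-range writes.

```python
import struct
import sys
from typing import Dict, Tuple

# (bit width, signed?) for the integer types in the table above
INT_TYPES: Dict[str, Tuple[int, bool]] = {
    "I1": (8, True), "UI1": (8, False),
    "I2": (16, True), "UI2": (16, False),
    "I4": (32, True), "UI4": (32, False),
    "I8": (64, True), "UI8": (64, False),
}


def int_range(type_name: str) -> Tuple[int, int]:
    bits, signed = INT_TYPES[type_name]
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # two's complement
    return 0, 2 ** bits - 1


def fits(type_name: str, value: int) -> bool:
    low, high = int_range(type_name)
    return low <= value <= high


# Largest finite IEEE 754 single (bit pattern 0x7F7FFFFF) and double.
R4_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
R8_MAX = sys.float_info.max

for name in INT_TYPES:
    print(name, *int_range(name))
print("R4", R4_MAX)  # 3.4028234663852886e+38
print("R8", R8_MAX)  # 1.7976931348623157e+308
```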
The Canary Historian also captures properties, or metadata, for each tag. These can include:
- Descriptions
- Limits
- Engineering Units
- Scales
- Annotations
- Custom Defined Properties
Properties are written to the HDB file that contains the tag. They can be viewed by selecting the tag in the Historian tile of the Canary Admin application.
Whether metadata can be written depends on the logging protocol. For example, OPC UA allows custom metadata properties, whereas OPC DA does not.
Tags are polled, or scanned, according to the Logging Session configuration. The Canary Historian then applies Update by Exception rules, so only changes in a tag's value or quality score are recorded as separate archive entries. This lets slow-changing tags be scanned more frequently without increasing the historian's storage footprint. If a tag's value and quality score do not change, the last known timestamp is updated to the current time without duplicating the entire entry.
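Below is a minimal Update by Exception sketch. It assumes each scan yields a (timestamp, value, quality) tuple; `update_by_exception` is a hypothetical helper. Unchanged scans only advance the last known timestamp, while a change in value or quality produces a new archive entry.

```python
from typing import Any, List, Optional, Tuple

Scan = Tuple[int, Any, int]  # (timestamp in ms, value, quality)


def update_by_exception(scans: List[Scan]) -> Tuple[List[Scan], Optional[int]]:
    """Archive a scan only when value or quality changes; track the last known timestamp."""
    archived: List[Scan] = []
    last_known: Optional[int] = None
    for ts, value, quality in scans:
        if not archived or (value, quality) != archived[-1][1:]:
            archived.append((ts, value, quality))
        last_known = ts  # every scan advances the last known timestamp
    return archived, last_known


scans = [(0, 5.0, 0xC0), (1000, 5.0, 0xC0), (2000, 5.0, 0xC0), (3000, 6.0, 0xC0), (4000, 6.0, 0x00)]
entries, last_ts = update_by_exception(scans)
print(entries)  # [(0, 5.0, 192), (3000, 6.0, 192), (4000, 6.0, 0)]
print(last_ts)  # 4000
```

Five scans produce three entries: the value change at 3000 and the quality change at 4000 each count as an exception.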
Quality
Tag qualities are important for understanding the reliability of the data within the historian as well as for instructing client applications how to process or display the data.
The Canary Historian logs four categories of data quality. Three of those qualities follow the OPC Foundation standard and include:
- Good - the data source returns good quality scores if the operation completed normally and the result is always valid.
- Bad - the data source returns results with a bad/failed quality score to indicate that the data is flawed.
- Uncertain - the data source returns uncertain/warning quality scores if it could not complete the operation in the manner requested by the client application but the operation did not fail entirely.
Additionally, Canary uses a fourth data quality:
- No Data - indicates that the historian is no longer connected to the tag.
Each main data quality is supplemented with additional context, represented by a Status Code, which is helpful to identify more specific errors or conditions. These codes are found right beside the Quality score in the Historian tag record. Notice that the highlighted TVQ below shows a quality score of 'No Data (0x8000)', with 'No Data' representing the quality and '0x8000' representing the status code.
When a query is made for data values and the historian encounters 'No Data', it continues to look backwards through up to ten HDB files or to the limit of the query's time range, whichever comes first. If only 'No Data' values are found across ten HDB files, the historian ends the query by default. This limit can be configured using the following steps:
- Open the 'Registry Editor'
- Browse to the following location: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Canary Labs\Historian
- Add a new DWORD value by right-clicking the 'Historian' folder and selecting 'New', then 'DWORD (32-bit) Value'
- Rename the new DWORD value to SequentialFileSearchLimit and set the 'Value data' field to the number of HDB files you want the historian to search through
- Leave the 'Base' set to 'Decimal'
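The same registry change can be scripted with Python's standard `winreg` module. This sketch rests on several assumptions. It must run on Windows with administrator rights. It targets the 64-bit registry view, which is the one Regedit shows at the path above on 64-bit Windows. The requirement that the limit be positive is our own guard. The document does not say whether the historian must be restarted to pick up the new value.

```python
import sys

# Registry location and value name taken from the steps above.
KEY_PATH = r"SOFTWARE\Canary Labs\Historian"
VALUE_NAME = "SequentialFileSearchLimit"


def validate_limit(limit: int) -> int:
    """Check that the limit fits in a 32-bit DWORD (assumed: it must be positive)."""
    if not isinstance(limit, int) or not 0 < limit <= 0xFFFFFFFF:
        raise ValueError(f"limit must be an integer in 1..{0xFFFFFFFF}, got {limit!r}")
    return limit


def set_search_limit(limit: int) -> None:
    """Create or overwrite the DWORD value; requires Windows and administrator rights."""
    if sys.platform != "win32":
        raise OSError("The Windows registry is only available on Windows")
    import winreg  # standard library, Windows only

    access = winreg.KEY_SET_VALUE | winreg.KEY_WOW64_64KEY  # write to the 64-bit view
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, access) as key:
        # REG_DWORD stores the integer itself; 'Decimal' in Regedit is only a display base.
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, validate_limit(limit))
```

For example, `set_search_limit(20)` would let the historian search up to 20 HDB files.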
NOTE: Increasing the SequentialFileSearchLimit can impact historian performance, because the historian may have to query more files when retrieving data.
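The backward search can be sketched as follows. `TVQ` and `find_last_known` are hypothetical names, and HDB files are modeled as in-memory lists ordered newest first. We read 'the limit of the query's time range' as the start of the range, since the search runs backwards; that reading is an assumption. Each extra file the limit allows is another file the loop may scan, which is the performance cost the note describes.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Any, Iterable, List, Optional


@dataclass
class TVQ:
    timestamp: float
    value: Any
    no_data: bool  # True when the quality is 'No Data'


def find_last_known(files: Iterable[List[TVQ]], start_time: float, limit: int = 10) -> Optional[TVQ]:
    """Walk HDB files newest first, looking for a TVQ that is not 'No Data'.

    Stops after `limit` files or once TVQs fall before `start_time`, whichever comes first.
    """
    for tvqs in islice(files, limit):
        for tvq in reversed(tvqs):  # each file's TVQs are stored oldest -> newest
            if tvq.timestamp < start_time:
                return None  # left the query's time range
            if not tvq.no_data:
                return tvq
    return None  # search limit reached without finding data
```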
Common TVQ Status Codes
The table below contains commonly used status codes; however, many more status codes can be attributed to a TVQ. To find out what another code means, convert it from hexadecimal to binary and check which of the bits listed below are set.
Status Code | Description |
(0xC0) | This code indicates that the data source does not support any quality information. It is typically seen with the 'Good' quality score to indicate a normal TVQ. |
(0x8000) | No Data |
(0xE000) | No Data - Edge of the Data (i.e., there is no data prior to this) |
(0x4000) | Inserted TVQ |
(0x2000) | Modified TVQ |
(0x1000) | Deleted TVQ |
(0x800) | Reserved - currently unused and not expected |
(0x400) | Reserved - currently unused and not expected |
(0x200) | When this bit is set, the TVQ was inserted or replaced without audit information. This Quality can be passed. |
(0x100) | When this bit is set, it signifies that there is 'extra data' associated with the TVQ (i.e., audit info on inserts, replaces, or deletes; such as 'when', 'who', 'what it changed from', and 'why' information). |
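To follow the advice to read unlisted codes in binary, the sketch below decodes a status code bit by bit. It assumes the high byte carries the flags in the table and the low byte carries the OPC-style quality (0xC0). It also assumes 0xE000, which has all three of 0x8000, 0x4000, and 0x2000 set, should be reported as Edge of the Data rather than as three separate flags. `decode_status` is a hypothetical helper, not a Canary API.

```python
from typing import List

# Bit flags from the status code table above, highest bit first.
FLAGS = [
    (0x8000, "No Data"),
    (0x4000, "Inserted TVQ"),
    (0x2000, "Modified TVQ"),
    (0x1000, "Deleted TVQ"),
    (0x0800, "Reserved"),
    (0x0400, "Reserved"),
    (0x0200, "Inserted or replaced without audit information"),
    (0x0100, "Extra data (audit information) attached"),
]
EDGE_OF_DATA = 0xE000  # listed as a code of its own in the table


def to_binary(code: int) -> str:
    """Show a status code as 16 binary digits."""
    return f"{code:016b}"


def decode_status(code: int) -> List[str]:
    meanings: List[str] = []
    flags = code & 0xFF00
    if (flags & EDGE_OF_DATA) == EDGE_OF_DATA:
        meanings.append("No Data - Edge of the Data")
        flags &= ~EDGE_OF_DATA  # don't also report the three component bits
    for bit, label in FLAGS:
        if flags & bit:
            meanings.append(label)
    low_byte = code & 0x00FF
    if low_byte == 0xC0:
        meanings.append("Good, no quality information from the source")
    elif low_byte:
        meanings.append(f"Low byte 0x{low_byte:02X}: not in the table above")
    return meanings


print(to_binary(0x4100), decode_status(0x4100))
# 0100000100000000 ['Inserted TVQ', 'Extra data (audit information) attached']
```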