
Estimating Historian Disk Space (version 22)

Appropriately accounting for the future growth of a data historian can be very complicated and requires the consideration of several variables, including:

  1. Number of Tags
  2. Sample Rate
  3. Data Change Rate
  4. Data Type
  5. Timestamp Resolution
  6. Tag Metadata Properties
  7. Length of Tag Names

Attached to this document is a helpful Excel file for roughly calculating the size of the Canary Historian based on the first three variables above.  Note that this is a tool for estimation purposes only.


Perhaps the most obvious of the seven variables, the number of tags directly affects how much disk space is consumed. A tag, sometimes referred to as a data point, channel, or trend, is a uniquely identified point of data collection from a field device such as a sensor. Each stored sample consists of a timestamp, a value, and a quality score, together called a TVQ.
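For illustration, a stored sample can be modeled as a simple TVQ structure. The Python sketch below is a hypothetical representation for reasoning about size, not Canary's internal storage format:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Union

@dataclass
class TVQ:
    """One stored sample: timestamp, value, and quality.

    Hypothetical illustration only -- not Canary's internal format.
    """
    timestamp: datetime             # when the value was sampled
    value: Union[bool, int, float]  # the tag's value at that time
    quality: int                    # quality score, e.g. OPC code 192 = Good

sample = TVQ(datetime(2023, 5, 1, 12, 0, 0), 72.4, 192)
```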

The rate at which the Canary Collectors poll the data source is referred to as the sample rate (also called the update or scan rate).  The faster you sample a tag, the more values you are likely to record.

Using the deadbanding features of a Logging Session can help avoid archiving values that do not change significantly.

Due to the 'Update by Exception' mechanism of the historian, Canary does not log a tag value that has not changed from the previous value. Instead, the timestamp of the last known value is overwritten, saving disk space. Understanding how often your tags change helps predict how much storage space is needed. In most real-world environments, the majority of tags do not change as often as they are scanned.
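Here is a minimal sketch of the deadbanding and update-by-exception behaviors just described. The class and threshold are illustrative assumptions, not Canary's implementation:

```python
class ExceptionFilter:
    """Illustrative deadband + update-by-exception logic.

    Not Canary's implementation -- just the general idea: values inside
    the deadband are treated as unchanged, and unchanged values only
    refresh the timestamp of the last stored record.
    """
    def __init__(self, deadband=0.0):
        self.deadband = deadband  # minimum change worth archiving
        self.last = None          # last archived (timestamp, value)

    def log(self, timestamp, value):
        if self.last is not None and abs(value - self.last[1]) <= self.deadband:
            # No significant change: overwrite the timestamp of the
            # last known value instead of writing a new record.
            self.last = (timestamp, self.last[1])
            return False  # nothing new archived
        self.last = (timestamp, value)
        return True       # new record archived

f = ExceptionFilter(deadband=0.5)
print(f.log(0, 10.0))  # True  -> first value is archived
print(f.log(1, 10.2))  # False -> within deadband, timestamp refreshed
print(f.log(2, 11.0))  # True  -> significant change archived
```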

From Canary's experience, most tags have a change rate below 40%.

The type of data stored must also be considered when calculating disk space. The Historian supports data types ranging from one byte Booleans (I1) to eight byte floats (R8). Being efficient in the type of data you write is one of the most important measures in maintaining a small storage footprint.  For instance, per the tables at the end of this document, 100 tags sampled at one-second intervals in an R4 format use approximately 40% less storage than the same 100 tags stored in an R8 format.
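As a quick sanity check, that savings figure can be derived directly from the per-type daily file sizes listed at the end of this document:

```python
# Daily file sizes (MB) from the per-type tables below:
# 1,000 tags, 1-second scan, 50% change rate.
r4_daily_mb = 241   # four byte float (R4)
r8_daily_mb = 402   # eight byte float (R8)

savings = (r8_daily_mb - r4_daily_mb) / r8_daily_mb
print(f"R4 uses {savings:.0%} less storage than R8")  # ~40% less
```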

Choosing the appropriate resolution of timestamps can also save on storage.  If sub-millisecond precision is not necessary, use the 'timestamp normalization' settings within the Logging Session to round timestamps to the nearest 1,000 milliseconds (one second).
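Timestamp normalization is simply rounding. The small sketch below shows the arithmetic; the function name and millisecond representation are assumptions for illustration, not the Logging Session's actual code:

```python
def normalize_ms(timestamp_ms, interval_ms=1000):
    """Round a millisecond timestamp to the nearest interval (default 1 second)."""
    return round(timestamp_ms / interval_ms) * interval_ms

print(normalize_ms(1_684_932_000_437))  # -> 1_684_932_000_000
print(normalize_ms(1_684_932_000_625))  # -> 1_684_932_001_000
```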

Metadata, or properties, can be recorded along with tag values and quality scores. As with tag names, the length of the strings used will impact overall HDB file size. While these metadata values are handled like tag names and written to the HDB file only once, they still add to the overall storage footprint, albeit minimally.

Last is the number of characters in a tag name. Long tag names add to the storage requirements, even though they are only written once per HDB file.  For DataSets with slow-changing tags (e.g., alarm tags that only change once or twice per day), it is not uncommon for the tag names to require more storage than the TVQs themselves.
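A back-of-the-envelope comparison makes this concrete. The tag name and byte counts below are illustrative assumptions, not exact HDB figures:

```python
# Illustrative assumptions: 1 byte per character for the name, and roughly
# 10 bytes per stored TVQ (timestamp + value + quality, before compression).
tag_name = "Site01.Area3.Compressor7.HighDischargeTempAlarm"
name_bytes = len(tag_name)        # written once per daily HDB file
tvq_bytes = 10                    # rough uncompressed size of one TVQ
changes_per_day = 2               # e.g., an alarm that trips twice a day

daily_tvq_bytes = changes_per_day * tvq_bytes
print(f"name: {name_bytes} B/day, values: {daily_tvq_bytes} B/day")
# name: 47 B/day, values: 20 B/day -> the name costs more than the data
```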

When an HDB file (the daily file that stores all tag data) is rolled over and validated, Canary uses a lossless compression algorithm to minimize the storage footprint.
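Canary's specific algorithm is not documented here, but the effect of lossless compression on repetitive process data can be illustrated with a generic compressor such as Python's zlib:

```python
import struct
import zlib

# Simulate a day's worth of slowly changing 4-byte float samples.
samples = [20.0 + (i % 10) * 0.1 for i in range(86_400)]
raw = b"".join(struct.pack("<f", v) for v in samples)

packed = zlib.compress(raw, 9)          # lossless compression
assert zlib.decompress(packed) == raw   # decompresses byte-for-byte
print(f"{len(raw)} B -> {len(packed)} B")
```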

To estimate the potential hard drive sizing for your project, consider the following three scenarios, each representative of a typical complex system.  Tag count is held at 1,000 so you can easily scale storage estimates from the values below.

1 Year Archive - 10.4 GB / 7 Year Archive - 72.8 GB

  • 1,000 tags
  • 50% Boolean (I1), 10% integer (I4), 30% float (R4), 10% double (R8)
  • Average sample rate - 5 seconds
  • Average change rate - 30%

1 Year Archive - 6.5 GB / 7 Year Archive - 45.5 GB

  • 1,000 tags
  • 70% Boolean (I1), 10% integer (I4), 20% float (R4)
  • Average sample rate - 5 seconds
  • Average change rate - 30%

1 Year Archive - 28.8 GB / 7 Year Archive - 202.2 GB

  • 1,000 tags
  • 30% Boolean (I1), 20% integer (I4), 20% float (R4), 30% double (R8)
  • Average sample rate - 3 seconds
  • Average change rate - 50%

As the three examples above show, the tag count is constant, but the differing mix of data types, sample rates, and change rates has a significant impact on the storage requirements.

To further help you estimate your needs, here are additional storage figures based solely on data type.  Again, 1,000 tags were used for easy scalability, and all change rates were held at 50% to simplify calculations.

1,000 one byte Boolean tags (I1):

  Scan rate     Change rate     Daily file size     Per year
  1 second      50%             120 MB              44.0 GB
  5 seconds     50%             24 MB               8.8 GB
  30 seconds    50%             4 MB                1.5 GB
  1 minute      50%             2 MB                0.7 GB


1,000 four byte integer tags (I4):

  Scan rate     Change rate     Daily file size     Per year
  1 second      50%             241 MB              88.1 GB
  5 seconds     50%             48 MB               17.6 GB
  30 seconds    50%             8 MB                2.9 GB
  1 minute      50%             4 MB                1.5 GB


1,000 four byte float tags (R4):

  Scan rate     Change rate     Daily file size     Per year
  1 second      50%             241 MB              88.1 GB
  5 seconds     50%             48 MB               17.6 GB
  30 seconds    50%             8 MB                2.9 GB
  1 minute      50%             4 MB                1.5 GB


1,000 eight byte float tags (R8):

  Scan rate     Change rate     Daily file size     Per year
  1 second      50%             402 MB              146.9 GB
  5 seconds     50%             80 MB               29.4 GB
  30 seconds    50%             13 MB               4.9 GB
  1 minute      50%             7 MB                2.5 GB
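Finally, the per-type figures above can be scaled to other tag counts, scan rates, and change rates. The linear scaling below is a rough approximation only; storage does not scale perfectly linearly with change rate, so it will not exactly reproduce the scenario figures published earlier:

```python
# Per-year GB for 1,000 tags at a 5-second scan and 50% change rate,
# taken from the tables above.
GB_PER_YEAR_5S_50PCT = {"I1": 8.8, "I4": 17.6, "R4": 17.6, "R8": 29.4}

def estimate_gb_per_year(tag_count, scan_seconds, change_rate, mix):
    """Rough linear scaling of the published per-type figures.

    mix: fraction of tags per data type, e.g. {"I1": 0.5, "R4": 0.5}.
    Approximation only -- real storage does not scale perfectly linearly.
    """
    total = 0.0
    for dtype, fraction in mix.items():
        base = GB_PER_YEAR_5S_50PCT[dtype]
        total += ((tag_count / 1_000) * fraction * base
                  * (5 / scan_seconds) * (change_rate / 0.5))
    return total

# Scenario 1 from above: ~8.6 GB/year by linear scaling
# (Canary's published estimate is 10.4 GB/year).
mix = {"I1": 0.5, "I4": 0.1, "R4": 0.3, "R8": 0.1}
print(f"{estimate_gb_per_year(1_000, 5, 0.30, mix):.1f} GB/year")
```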
