
Issues with Aggregation & Resolution

We're having some issues with null-handling and data aggregation.

We automatically aggregate the data to control the number of samples. For example, we try to ensure that we are asking for around 1000 samples of data regardless of the date range the user requests.
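For context, this is roughly how we pick the aggregation interval from the requested date range (a simplified sketch of our client logic; the function and variable names are ours, not part of Canary's API):

```python
from datetime import datetime, timedelta

TARGET_SAMPLES = 1000  # roughly how many points we want back per request

def aggregation_interval(start: datetime, end: datetime) -> timedelta:
    """Pick an aggregate window size so the request returns ~TARGET_SAMPLES points."""
    return (end - start) / TARGET_SAMPLES

# 8 hours -> ~28.8 s windows, comfortably larger than the 1 s raw interval
print(aggregation_interval(datetime(2024, 1, 1, 0, 0, 0),
                           datetime(2024, 1, 1, 8, 0, 0)))

# 30 seconds -> 30 ms windows, far smaller than the raw interval,
# so most windows contain no raw sample at all
print(aggregation_interval(datetime(2024, 1, 1, 0, 0, 0),
                           datetime(2024, 1, 1, 0, 0, 30)))
```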

In Canary, this is causing some issues at different resolutions.

Let's say we have a point of data that is recorded every second.

If we use the Average aggregate, everything looks fine when requesting 1000 points of data across 8 hours.

When we zoom in on 30s of data, a 1000-point Average aggregation returns 'null' for many of the points, since we are effectively asking for points between the actual samples (30 seconds split into 1000 windows gives 30 ms windows, most of which contain no raw sample). We understand this, and we know that we can use TimeAverage or TimeAverage2 instead.

However, both TimeAverage and TimeAverage2 have a disadvantage: they ignore all null values, even when the nulls are legitimate. So if we zoom out to a full 24 hours on a day when equipment or collectors were switched off for several hours, we get a smooth, unbroken line that hides those outages. We really don't want that behaviour.

Is there an aggregation we can use that would help here?

What we ideally need is an aggregation function that can tell the difference between a time window that is empty (no raw sample falls inside it) and a time window whose value was actually recorded as null. If it filled in empty windows but did not fill in hard-written nulls, outages would stay visible.

One potential way of achieving this: when getting bounding values, a window containing no raw values could inherit the quality of the earliest bounding data point, not the earliest non-null data point. That way the result would be almost identical to the existing TimeAverage functions, but we would be able to see that the quality of the data was poor. A rough sketch of the idea follows.
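To make the behaviour we are asking for concrete, here is a sketch written as client-side post-processing. The data shapes and names are ours (not Canary's), and it only approximates the bounding-value idea: an empty window inherits the most recent raw sample including nulls, so written nulls (outages) are carried forward rather than skipped over.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Sample:
    timestamp: float        # seconds since epoch, for simplicity
    value: Optional[float]  # None represents a hard-written null
    good: bool              # quality flag

def fill_empty_windows(raw: List[Sample], window_starts: List[float]) -> List[Sample]:
    """For each aggregation window start with no raw sample of its own, inherit the
    most recent preceding raw sample *including* nulls. Genuinely empty windows get
    bridged with the last known value; windows following a written null stay null.
    Assumes both lists are sorted by time."""
    out: List[Sample] = []
    i = 0
    last: Optional[Sample] = None
    for start in window_starts:
        # advance to the last raw sample at or before this window's start
        while i < len(raw) and raw[i].timestamp <= start:
            last = raw[i]
            i += 1
        if last is None:
            out.append(Sample(start, None, good=False))       # nothing known yet
        elif last.value is None:
            out.append(Sample(start, None, good=False))       # carry the outage forward
        else:
            out.append(Sample(start, last.value, good=True))  # bridge an empty window
    return out
```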
