Module metrics


Defines the metrics system for pmem devices.

§Metrics format

The metrics are flushed in JSON when requested by vmm::logger::metrics::METRICS.write().

§JSON example with metrics:

{
 "pmem_drv0": {
    "activate_fails": "SharedIncMetric",
    "cfg_fails": "SharedIncMetric",
    "no_avail_buffer": "SharedIncMetric",
    "event_fails": "SharedIncMetric",
    "execute_fails": "SharedIncMetric",
    ...
 },
 "pmem_drv1": {
    "activate_fails": "SharedIncMetric",
    "cfg_fails": "SharedIncMetric",
    "no_avail_buffer": "SharedIncMetric",
    "event_fails": "SharedIncMetric",
    "execute_fails": "SharedIncMetric",
    ...
 },
 ...
 "pmem_drive_id": {
    "activate_fails": "SharedIncMetric",
    "cfg_fails": "SharedIncMetric",
    "no_avail_buffer": "SharedIncMetric",
    "event_fails": "SharedIncMetric",
    "execute_fails": "SharedIncMetric",
    ...
 },
 "pmem": {
    "activate_fails": "SharedIncMetric",
    "cfg_fails": "SharedIncMetric",
    "no_avail_buffer": "SharedIncMetric",
    "event_fails": "SharedIncMetric",
    "execute_fails": "SharedIncMetric",
    ...
 }
}

Each field in the example above is a serializable PmemDeviceMetrics structure that collects metrics such as activate_fails and cfg_fails for a pmem device. pmem_drv0 represents the metrics for the endpoint “/pmem/drv0”, pmem_drv1 represents the metrics for the endpoint “/pmem/drv1”, and pmem_drive_id represents the metrics for the endpoint “/pmem/{drive_id}”. pmem is the aggregate of all the per-device metrics.

§Limitations

Pmem devices currently do not have vmm::logger::metrics::StoreMetrics, so the aggregate doesn’t consider them.

§Design

The main design goals of this system are:

  • To improve pmem device metrics by logging them at per device granularity.

  • Continue to provide aggregate pmem metrics to maintain backward compatibility.

  • Move PmemDeviceMetrics out of the logger and decouple it.

  • Rely on serde to provide the actual serialization for writing the metrics.

  • Since all metrics start at 0, we implement the Default trait via derive for all of them, to avoid having to initialize everything by hand.

  • Devices can be created in any order, i.e. the first device created could be either drv0 or drv1. If we used a vector for PmemDeviceMetrics and called the first device pmem0, then pmem0 would sometimes point to drv0 and sometimes to drv1, which doesn’t help with analysing the metrics. So, use a Map instead of a Vec to make clear which drive the metrics actually belong to.
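The last two goals can be illustrated with a minimal sketch. It is not the module's actual code: `DeviceCounters`, `metrics_key` and `register` are hypothetical names, plain `u64` fields stand in for the real metric type, and only two of the fields are shown. It shows how a derived `Default` gives zeroed counters for free and how keying a map by drive id keeps each entry tied to its drive regardless of creation order.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down per-device counters.
/// Deriving `Default` starts every field at 0.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DeviceCounters {
    pub activate_fails: u64,
    pub cfg_fails: u64,
}

/// Builds the key used in the flushed output, e.g. "pmem_drv0".
pub fn metrics_key(drive_id: &str) -> String {
    format!("pmem_{}", drive_id)
}

/// Returns the counters for `drive_id`, creating a zeroed entry on first use.
/// Because the key is derived from the drive id, the order in which devices
/// are created has no effect on which entry belongs to which drive.
pub fn register<'a>(
    map: &'a mut BTreeMap<String, DeviceCounters>,
    drive_id: &str,
) -> &'a mut DeviceCounters {
    map.entry(metrics_key(drive_id)).or_default()
}
```

With a `Vec`, the same information would depend on the index at which a device happened to be pushed; with the map, registering drv1 before drv0 still yields the entries "pmem_drv0" and "pmem_drv1".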

The system implements 1 type of metrics:

  • Shared Incremental Metrics (SharedIncMetrics) - dedicated to metrics that need a counter (e.g. the number of times an API request failed). These metrics are reset upon flush.
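The "counter that is reset upon flush" behaviour can be sketched as follows. This is a simplified stand-in, not the real metric type: `IncCounter` and its methods are hypothetical names, and the sketch resets by swapping the counter to zero, which may differ from how the actual implementation tracks flushed values.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical shared incremental counter.
/// All methods take `&self`, so it can be shared between threads.
#[derive(Debug, Default)]
pub struct IncCounter {
    count: AtomicU64,
}

impl IncCounter {
    /// Records one occurrence of the event.
    pub fn inc(&self) {
        self.add(1);
    }

    /// Records `n` occurrences of the event.
    pub fn add(&self, n: u64) {
        self.count.fetch_add(n, Ordering::Relaxed);
    }

    /// Returns the value accumulated since the previous flush and
    /// atomically resets the counter to 0.
    pub fn flush(&self) -> u64 {
        self.count.swap(0, Ordering::Relaxed)
    }
}
```

After `inc()` followed by `add(2)`, the first `flush()` returns 3 and an immediate second `flush()` returns 0.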

Each Pmem device holds a PmemDeviceMetrics entry that is allocated in pmem::metrics::METRICS, rather than owning a separate PmemDeviceMetrics of its own. The reason is that a Pmem device is not accessible from signal handlers that need to flush metrics, whereas pmem::metrics::METRICS is.
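A minimal sketch of this ownership arrangement, under stated assumptions: `REGISTRY`, `DeviceMetrics`, `alloc` and `read_event_fails` are hypothetical names, and the use of `RwLock` plus `Arc` is one plausible way to share entries, not necessarily what the module does. The device keeps the `Arc` returned by `alloc`, while any other code can reach the same counters through the global map without a handle to the device.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical per-device metrics with a single counter.
#[derive(Debug, Default)]
pub struct DeviceMetrics {
    pub event_fails: AtomicU64,
}

/// Hypothetical global registry: drive id -> shared metrics entry.
/// The lock protects the map itself; the counters are atomics.
static REGISTRY: RwLock<BTreeMap<String, Arc<DeviceMetrics>>> =
    RwLock::new(BTreeMap::new());

/// Called when a device is created. The device stores the returned `Arc`
/// and updates its counters through it.
pub fn alloc(drive_id: &str) -> Arc<DeviceMetrics> {
    let mut map = REGISTRY.write().unwrap();
    Arc::clone(map.entry(drive_id.to_string()).or_default())
}

/// Reads a counter through the registry only, as flushing code would,
/// without any reference to the device object.
pub fn read_event_fails(drive_id: &str) -> Option<u64> {
    let map = REGISTRY.read().unwrap();
    map.get(drive_id)
        .map(|m| m.event_fails.load(Ordering::Relaxed))
}
```

Updates made through the device's `Arc` are visible to `read_event_fails` because both paths point at the same allocation.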

Structs§

PmemMetrics
Pmem Device associated metrics.
PmemMetricsPerDevice
Map of pmem drive id to metrics. It should be protected by a lock before it is accessed.

Functions§

flush_metrics
This function facilitates aggregation and serialization of per pmem device metrics.
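The aggregation and serialization step can be sketched as below. This is an illustration only and not the body of flush_metrics: `Snapshot`, `with_aggregate` and `render_json` are hypothetical names, only two fields are shown, and the JSON is formatted by hand to keep the example free of external crates, whereas the module relies on serde. The sketch also assumes drive ids contain no characters that need JSON escaping.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot of one device's counters at flush time.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Snapshot {
    pub activate_fails: u64,
    pub cfg_fails: u64,
}

impl Snapshot {
    /// Adds another device's counters into this one.
    fn merge(&mut self, other: &Snapshot) {
        self.activate_fails += other.activate_fails;
        self.cfg_fails += other.cfg_fails;
    }
}

/// Produces one "pmem_{drive_id}" entry per device plus the
/// aggregate "pmem" entry that sums all of them.
pub fn with_aggregate(
    per_device: &BTreeMap<String, Snapshot>,
) -> BTreeMap<String, Snapshot> {
    let mut out = BTreeMap::new();
    let mut total = Snapshot::default();
    for (drive_id, snap) in per_device {
        total.merge(snap);
        out.insert(format!("pmem_{}", drive_id), *snap);
    }
    out.insert("pmem".to_string(), total);
    out
}

/// Renders the entries as a compact JSON object (hand-rolled; see lead-in).
pub fn render_json(entries: &BTreeMap<String, Snapshot>) -> String {
    let body: Vec<String> = entries
        .iter()
        .map(|(name, s)| {
            format!(
                "\"{}\":{{\"activate_fails\":{},\"cfg_fails\":{}}}",
                name, s.activate_fails, s.cfg_fails
            )
        })
        .collect();
    format!("{{{}}}", body.join(","))
}
```

Keeping the aggregate as just another map entry means the per-device entries and the backward-compatible "pmem" entry go through the same serialization path.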