Access built-in metrics
Applies to: IoT Edge 1.1 IoT Edge 1.2
The IoT Edge runtime components, IoT Edge hub and IoT Edge agent, produce built-in metrics in the Prometheus exposition format. Access these metrics remotely to monitor and understand the health of an IoT Edge device.
As of release 1.0.10, metrics are automatically exposed by default on port 9600 of the edgeHub and edgeAgent modules (http://edgeHub:9600/metrics
and http://edgeAgent:9600/metrics
). They aren't port mapped to the host by default.
Access metrics from the host by exposing and mapping the metrics port from the module's createOptions
. The example below maps the default metrics port to port 9601 on the host:
{
"ExposedPorts": {
"9600/tcp": {}
},
"HostConfig": {
"PortBindings": {
"9600/tcp": [
{
"HostPort": "9601"
}
]
}
}
}
Choose different and unique host port numbers if you are mapping both the edgeHub and edgeAgent's metrics endpoints.
Note
The environment variable httpSettings__enabled
should not be set to false
for built-in metrics to be available for collection.
Environment variables that can be used to disable metrics are listed in the azure/iotedge repo doc.
Available metrics
Metrics contain tags to help identify the nature of the metric being collected. All metrics contain the following tags:
Tag | Description |
---|---|
iothub | The hub the device is talking to |
edge_device | The ID of the current device |
instance_number | A GUID representing the current runtime. On restart, all metrics are reset. This GUID makes it easier to reconcile restarts. |
In the Prometheus exposition format, there are four core metric types: counter, gauge, histogram, and summary. For more information about the different metric types, see the Prometheus metric types documentation.
The quantiles provided for the built-in histogram and summary metrics are 0.1, 0.5, 0.9 and 0.99.
The edgeHub module produces the following metrics:
Name | Dimensions | Description |
---|---|---|
edgehub_gettwin_total |
source (operation source)id (module ID) |
Type: counter Total number of GetTwin calls |
edgehub_messages_received_total |
route_output (output that sent message)id |
Type: counter Total number of messages received from clients |
edgehub_messages_sent_total |
from (message source)to (message destination)from_route_output to_route_input (message destination input)priority (message priority to destination) |
Type: counter Total number of messages sent to clients or upstream to_route_input is empty when to is $upstream |
edgehub_reported_properties_total |
target (update target)id |
Type: counter Total reported property updates calls |
edgehub_message_size_bytes |
id |
Type: summary Message size from clients Values may be reported as NaN if no new measurements are reported for a certain period of time (currently 10 minutes); for summary type, corresponding _count and _sum counters are emitted. |
edgehub_gettwin_duration_seconds |
source id |
Type: summary Time taken for get twin operations |
edgehub_message_send_duration_seconds |
from to from_route_output to_route_input |
Type: summary Time taken to send a message |
edgehub_message_process_duration_seconds |
from to priority |
Type: summary Time taken to process a message from the queue |
edgehub_reported_properties_update_duration_seconds |
target id |
Type: summary Time taken to update reported properties |
edgehub_direct_method_duration_seconds |
from (caller)to (receiver) |
Type: summary Time taken to resolve a direct message |
edgehub_direct_methods_total |
from to |
Type: counter Total number of direct messages sent |
edgehub_queue_length |
endpoint (message source)priority (queue priority) |
Type: gauge Current length of edgeHub's queue for a given priority |
edgehub_messages_dropped_total |
reason (no_route, ttl_expiry)from from_route_output |
Type: counter Total number of messages removed because of reason |
edgehub_messages_unack_total |
reason (storage_failure)from from_route_output |
Type: counter Total number of messages unacknowledged because storage failure |
edgehub_offline_count_total |
id |
Type: counter Total number of times edgeHub went offline |
edgehub_offline_duration_seconds |
id |
Type: summary Time edge hub was offline |
edgehub_operation_retry_total |
id operation (operation name) |
Type: counter Total number of times edgeHub operations were retried |
edgehub_client_connect_failed_total |
id reason (not authenticated) |
Type: counter Total number of times clients failed to connect to edgeHub |
The edgeAgent module produces the following metrics:
Name | Dimensions | Description |
---|---|---|
edgeAgent_total_time_running_correctly_seconds |
module_name |
Type: gauge The amount of time the module was specified in the deployment and was in the running state |
edgeAgent_total_time_expected_running_seconds |
module_name |
Type: gauge The amount of time the module was specified in the deployment |
edgeAgent_module_start_total |
module_name , module_version |
Type: counter Number of times edgeAgent asked docker to start the module |
edgeAgent_module_stop_total |
module_name , module_version |
Type: counter Number of times edgeAgent asked docker to stop the module |
edgeAgent_command_latency_seconds |
command |
Type: gauge How long it took docker to execute the given command. Possible commands are: create, update, remove, start, stop, and restart |
edgeAgent_iothub_syncs_total |
Type: counter Number of times edgeAgent attempted to sync its twin with iotHub, both successful and unsuccessful. This number includes both Agent requesting a twin and Hub notifying of a twin update |
|
edgeAgent_unsuccessful_iothub_syncs_total |
Type: counter Number of times edgeAgent failed to sync its twin with iotHub. |
|
edgeAgent_deployment_time_seconds |
Type: counter The amount of time it took to complete a new deployment after receiving a change. |
|
edgeagent_direct_method_invocations_count |
method_name |
Type: counter Number of times a built-in edgeAgent direct method is called, such as Ping or Restart. |
edgeAgent_host_uptime_seconds |
Type: gauge How long the host has been on |
|
edgeAgent_iotedged_uptime_seconds |
Type: gauge How long iotedged has been running |
|
edgeAgent_available_disk_space_bytes |
disk_name , disk_filesystem , disk_filetype |
Type: gauge Amount of space left on the disk |
edgeAgent_total_disk_space_bytes |
disk_name , disk_filesystem , disk_filetype |
Type: gauge Size of the disk |
edgeAgent_used_memory_bytes |
module_name |
Type: gauge Amount of RAM used by all processes |
edgeAgent_total_memory_bytes |
module_name |
Type: gauge RAM available |
edgeAgent_used_cpu_percent |
module_name |
Type: histogram Percent of cpu used by all processes |
edgeAgent_created_pids_total |
module_name |
Type: gauge The number of processes or threads the container has created |
edgeAgent_total_network_in_bytes |
module_name |
Type: gauge The number of bytes received from the network |
edgeAgent_total_network_out_bytes |
module_name |
Type: gauge The number of bytes sent to network |
edgeAgent_total_disk_read_bytes |
module_name |
Type: gauge The number of bytes read from the disk |
edgeAgent_total_disk_write_bytes |
module_name |
Type: gauge The number of bytes written to disk |
edgeAgent_metadata |
edge_agent_version , experimental_features , host_information |
Type: gauge General metadata about the device. The value is always 0, information is encoded in the tags. Note experimental_features and host_information are json objects. host_information looks like {"OperatingSystemType": "linux", "Architecture": "x86_64", "Version": "1.2.7", "Provisioning": {"Type": "dps.tpm", "DynamicReprovisioning": false, "AlwaysReprovisionOnStartup": false}, "ServerVersion": "20.10.11+azure-3", "KernelVersion": "5.11.0-1027-azure", "OperatingSystem": "Ubuntu 20.04.4 LTS", "NumCpus": 2, "Virtualized": "yes"} . Note ServerVersion is the Docker version and Version is the IoT Edge security daemon version. |