Important Node Metrics
When you visit the metrics endpoint (see Node Inspection Service), you will notice that there are a large number of metrics and counters being produced by your node. Most of these metrics and counters are useful only for blockchain development and diagnosing hard-to-find issues. As a result, we recommend that node operators ignore most metrics and pay attention to only the key metrics presented below:
If you are running a validator node, the following consensus metrics are important:
endless_consensus_proposals_count
: Counts the number of times the node sent a block proposal to the network. The count will increase only when the validator is chosen to be a proposer, which depends on the node's stake and leader election reputation. You should expect this metric to increase at least once per hour.
endless_consensus_last_committed_round
: Reports the last committed round of the node. During consensus, this value should increase once per consensus round, which should happen multiple times per second. If it does not, the node is likely not participating in consensus.
endless_consensus_timeout_count
: Counts the number of times the node locally timed out while trying to participate in consensus. If this counter increases, it is likely the node is not participating in consensus and may be having issues, e.g., network difficulties.
endless_state_sync_executing_component_counters{label="consensus"}
: This counter increases a few times per second as long as the node is participating in consensus. When this counter stops increasing, it means the node is not participating in consensus, and has likely fallen back to state synchronization (e.g., because it fell behind the rest of the validators and needs to catch up).
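The consensus checks above can be automated by scraping the metrics endpoint twice and comparing the values. The following is a minimal sketch, not a definitive implementation. It assumes the endpoint serves the Prometheus text format without trailing timestamps, that the two consensus metrics are exposed without labels, and that the endpoint is reachable at the placeholder URL shown (METRICS_URL, fetch_metrics, parse_metrics, and consensus_health are hypothetical names introduced for illustration).

```python
import urllib.request

# Assumption: placeholder address. Use the host, port, and path that your
# node's inspection service is actually configured with.
METRICS_URL = "http://localhost:9101/metrics"


def fetch_metrics(url=METRICS_URL):
    """Download the raw metrics text from the node."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Map each sample (name plus label set, as written) to its value.

    Simplification: assumes each line is "<name>{labels} <value>" with no
    trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def consensus_health(before, after):
    """Compare two parsed scrapes taken some seconds apart."""
    round_key = "endless_consensus_last_committed_round"
    timeout_key = "endless_consensus_timeout_count"
    return {
        # The committed round should advance between scrapes.
        "round_advancing": after.get(round_key, 0.0) > before.get(round_key, 0.0),
        # Any growth in the timeout counter is worth investigating.
        "new_timeouts": after.get(timeout_key, 0.0) - before.get(timeout_key, 0.0),
    }
```

In practice you would call parse_metrics(fetch_metrics()) twice, a few seconds apart, and pass both results to consensus_health. Looking samples up by their exact text is fragile if the node adds or reorders labels, so a production monitor would more likely rely on a Prometheus server and its query language.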
If you are running a fullnode (or a validator that still needs to synchronize to the latest blockchain state), the following state sync metrics are important:
endless_state_sync_version{type="synced"}
: This metric displays the current synced version of the node, i.e., the number of transactions the node has processed. If this metric stops increasing, the node is not syncing. Likewise, if this metric doesn't increase faster than the rate at which new transactions are committed to the blockchain, the node is unlikely to catch up and stay up-to-date with the rest of the network. Note: if you have chosen fast sync, this metric won't increase until all states have been downloaded, which may take some time. See endless_state_sync_version{type="synced_states"} below.
endless_data_client_highest_advertised_data{data_type="transactions"}
: This metric displays the highest version synced and advertised by the peers that your node is connected to. As a result, when this metric is higher than endless_state_sync_version{type="synced"} (above), it means your node can see new blockchain data and will sync the data from its peers.
endless_state_sync_version{type="synced_states"}
: This metric counts the number of states that have been downloaded while a node is fast syncing. If this metric doesn't increase, and endless_state_sync_version{type="synced"} (from above) doesn't increase either, it means the node is not syncing at all and an issue has likely occurred.
endless_state_sync_bootstrapper_errors
and endless_state_sync_continuous_syncer_errors
: If your node is facing issues syncing (or is seeing transient failures), these metrics will increase each time an error occurs. The error_label inside these metrics will display the error type.
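To make the "is my node catching up?" reasoning concrete, the sketch below estimates how long a node needs to reach the highest advertised version, given two readings of the synced and advertised versions. It is illustrative only. The function name is hypothetical, and it assumes that the growth of endless_data_client_highest_advertised_data{data_type="transactions"} between readings is a fair proxy for the rate at which the network commits new transactions.

```python
def estimate_catch_up_seconds(synced_before, synced_after,
                              advertised_before, advertised_after,
                              interval_secs):
    """Estimate the seconds until the node reaches the advertised version.

    Returns 0.0 if the node is already caught up, and None if the node is
    not syncing faster than the network is producing new versions.
    """
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")

    lag = max(0, advertised_after - synced_after)
    if lag == 0:
        return 0.0

    node_rate = (synced_after - synced_before) / interval_secs
    # Assumption: advertised growth approximates the network commit rate.
    network_rate = (advertised_after - advertised_before) / interval_secs

    gain = node_rate - network_rate
    if gain <= 0:
        return None  # the gap is not closing
    return lag / gain
```

For example, a node that moves from version 1000 to 1600 in 60 seconds while its peers advance from 5000 to 5300 closes the gap at 5 versions per second, so a remaining lag of 3700 versions takes roughly 740 seconds. A None result corresponds to the case described above where the synced version does not grow faster than the network.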
The following network metrics are important, for both validators and fullnodes:
endless_connections{direction="inbound"}
and endless_connections{direction="outbound"}
: These metrics count the number of peers your node is connected to, as well as the direction of the network connection. An inbound connection means that a peer (e.g., another fullnode) has connected to you. An outbound connection means that your node has connected to another node (e.g., connected to a validator fullnode).
If your node is a validator, the sum of inbound and outbound connections should equal the number of other validators in the network. Only the sum matters: it makes no difference whether the connections are all inbound, all outbound, or a mix of both.
If your node is a fullnode, the number of outbound connections should be > 0. This ensures your node is able to synchronize. The number of inbound connections matters only if you want to act as a seed in the network and allow other nodes to connect to you, as discussed in Fullnode Network Connections.
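The connection rules above are simple enough to encode directly. The following sketch takes the two endless_connections values as plain numbers and returns a list of warnings. The function name, the role strings, and the other_validators parameter (which you would need to obtain from the current validator set yourself) are assumptions made for illustration.

```python
def check_connections(role, inbound, outbound, other_validators=None):
    """Return a list of human-readable warnings (empty if all looks fine)."""
    warnings = []
    if role == "validator":
        if other_validators is None:
            raise ValueError("other_validators is required for validators")
        total = inbound + outbound
        # Only the sum matters, not the split between directions.
        if total != other_validators:
            warnings.append(
                f"connected to {total} peers, expected {other_validators}"
            )
    elif role == "fullnode":
        # A fullnode needs at least one outbound connection to synchronize.
        if outbound <= 0:
            warnings.append("no outbound connections, node cannot synchronize")
    else:
        raise ValueError(f"unknown role: {role!r}")
    return warnings
```

For instance, check_connections("validator", 60, 40, other_validators=100) returns an empty list, while check_connections("fullnode", 5, 0) returns one warning because inbound connections alone do not let a fullnode synchronize.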
The following mempool metrics are important:
core_mempool_index_size{index="system_ttl"}
: This metric displays the number of transactions currently sitting in the mempool of the node and waiting to be committed to the blockchain:
If your node is a fullnode, it's highly unlikely that this metric will be > 0, unless transactions are actively being sent to your node via the REST API and/or other fullnodes that have connected to you. Most fullnode operators should ignore this metric.
If your node is a validator, you can use this metric to see if transactions from your node's mempool are being included in the blockchain (e.g., if the count decreases). Likewise, if this metric only increases, it means that either: (i) your node is unable to forward transactions to other validators to be included in the blockchain; or (ii) that the entire blockchain is under heavy load and may soon become congested.
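One way to detect the "only increases" case for a validator is to keep a short history of mempool size readings and check whether the size ever drops. The sketch below is a hypothetical helper and not part of the node software. It treats a series as "only increasing" when it never decreases and ends higher than it started.

```python
def mempool_only_increases(samples):
    """Return True if the readings never decrease and grow overall.

    `samples` is a list of mempool size readings in chronological order.
    """
    if len(samples) < 2:
        return False  # not enough data to judge a trend
    never_decreases = all(
        later >= earlier for earlier, later in zip(samples, samples[1:])
    )
    return never_decreases and samples[-1] > samples[0]
```

A True result over a sufficiently long window suggests one of the two situations described above. A single decrease in the window indicates that transactions are leaving the mempool, for example because they were committed.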
The following REST API metrics are important:
endless_api_requests_count{method="GET"}
and endless_api_requests_count{method="POST"}
: These metrics count the number of REST API GET and POST requests that have been received via the node's REST API. This allows you to monitor and track the amount of REST API traffic on your node. You can also use the operation_id in the metric to monitor the types of operations the requests are performing.
endless_api_response_status_count
: This metric counts the responses sent by the REST API, grouped by status code. For example, endless_api_response_status_count{status="200"} counts the number of requests that were successfully handled with a 200 response code. You can use this metric to track the success and failure rate of the REST API traffic.
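As an illustration of tracking the success rate, the sketch below computes the share of 2xx responses over an interval from two readings of the per-status counts. Because counters are cumulative, it works on the difference between readings. The function names and the dictionary shape (status code string mapped to count) are assumptions made for this example.

```python
def status_deltas(before, after):
    """Per-status increase between two cumulative readings."""
    return {
        status: after[status] - before.get(status, 0)
        for status in after
    }


def success_rate(status_counts):
    """Fraction of responses with a 2xx status, or None if there were none."""
    total = sum(status_counts.values())
    if total == 0:
        return None
    successful = sum(
        count
        for status, count in status_counts.items()
        if 200 <= int(status) < 300
    )
    return successful / total
```

For example, if the interval saw 90 responses with status 200, 8 with 404, and 2 with 500, success_rate returns 0.9. Whether 4xx responses should count as failures depends on your traffic, since they often reflect client mistakes and not a node problem.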
The following metrics are important: