This guide outlines how to integrate TrustGate with Prometheus for metrics scraping and Grafana for observability dashboards.

Overview

TrustGate exposes a Prometheus-compatible /metrics endpoint. Prometheus scrapes this endpoint to collect metrics such as request throughput, latency distributions, and service health indicators, which you can then visualize in Grafana dashboards.


Prometheus Configuration

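To enable metrics collection from TrustGate, add the following job under scrape_configs in your Prometheus configuration file (prometheus.yml). If the file already contains a scrape_configs section, append the job to it rather than declaring the key a second time: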

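scrape_configs:
  - job_name: 'trustgate'            # becomes the job="trustgate" label on every scraped series
    static_configs:
      - targets: ['localhost:8080']  # host:port where TrustGate serves its metrics
    metrics_path: '/metrics'         # Prometheus default, stated explicitly for clarity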
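
The job above inherits its scrape interval and timeout from the global defaults. As a rough sketch, the same job with explicit timing might look like the following. The 15s and 10s values and the env label are assumptions chosen for illustration, not TrustGate recommendations.

```yaml
# prometheus.yml (sketch; interval values and the env label are assumptions)
global:
  scrape_interval: 15s        # default for every job unless overridden
  evaluation_interval: 15s    # how often rule expressions are evaluated

scrape_configs:
  - job_name: 'trustgate'
    scrape_interval: 15s      # per-job override of the global default
    scrape_timeout: 10s       # must not exceed scrape_interval
    metrics_path: '/metrics'
    static_configs:
      - targets: ['localhost:8080']
        labels:
          env: 'dev'          # hypothetical label attached to all series from this target
```

A shorter interval gives finer-grained graphs at the cost of more stored samples. Retention is not set in this file; it is controlled by the Prometheus server flag --storage.tsdb.retention.time.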

Grafana Dashboard

Create a Grafana dashboard to visualize key metrics:

  1. Request Overview
     • Total requests by status code
     • Request rate over time
     • Active connections
  2. Latency Metrics
     • Overall request latency (p50, p90, p99)
     • Service-specific latency
     • Upstream latency distribution
  3. Service Health
     • Success rate by service
     • Error rate by route
     • Connection status
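
Before any of these panels can be built, Grafana needs Prometheus registered as a data source. One way to do that is a provisioning file, sketched below. The data source name and the Prometheus URL (http://localhost:9090) are assumptions; this guide does not state where Prometheus runs.

```yaml
# Grafana data source provisioning file, placed in Grafana's
# provisioning/datasources directory (file name is up to you).
apiVersion: 1

datasources:
  - name: Prometheus            # assumed display name, referenced by dashboard panels
    type: prometheus
    access: proxy               # Grafana's backend issues the queries
    url: http://localhost:9090  # assumed Prometheus address
    isDefault: true
```

Grafana reads provisioning files at startup, so a restart is typically needed after adding or changing this file. The same data source can also be added by hand in the Grafana UI.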

Example PromQL Queries

Request Rate (successful responses)

rate(trustgate_requests_total{status="2xx"}[5m])

95th Percentile Latency

histogram_quantile(0.95, sum(rate(trustgate_detailed_latency_ms_bucket[5m])) by (le, service))

Error Rate

sum(rate(trustgate_requests_total{status=~"4xx|5xx"}[5m])) by (status)

Active Connections

trustgate_connections{state="active"}
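
Queries like the ones above are re-evaluated on every dashboard refresh. If they become slow, one option is to precompute them with Prometheus recording rules. The following is a minimal sketch using the metric names from this guide; the file name, group name, rule names, and 30s interval are all hypothetical.

```yaml
# trustgate_recording_rules.yml (hypothetical file name)
groups:
  - name: trustgate_recording        # hypothetical group name
    interval: 30s                    # assumed evaluation interval
    rules:
      # Per-service 95th percentile latency over a 5-minute window.
      - record: service:trustgate_detailed_latency_ms:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum(rate(trustgate_detailed_latency_ms_bucket[5m])) by (le, service)
          )
      # Fraction of all requests that returned a 4xx or 5xx status class.
      - record: trustgate:requests_error_ratio:rate5m
        expr: |
          sum(rate(trustgate_requests_total{status=~"4xx|5xx"}[5m]))
          /
          sum(rate(trustgate_requests_total[5m]))
```

For Prometheus to load the file, list its path under rule_files in prometheus.yml. Dashboards can then query the recorded series by name instead of repeating the full expression.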

Best Practices

  1. Alert Configuration
     • Set up alerts for high error rates
     • Monitor latency thresholds
     • Track connection limits
     • Watch for request spikes
  2. Dashboard Organization
     • Group related metrics
     • Use appropriate time ranges
     • Include service-level views
     • Add error tracking panels
  3. Metric Collection
     • Set appropriate scrape intervals
     • Configure retention periods
     • Monitor metric cardinality
     • Use label aggregation
  4. Performance Monitoring
     • Track latency trends
     • Monitor resource usage
     • Watch for bottlenecks
     • Analyze traffic patterns
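
The alerting practices above could be expressed as Prometheus alerting rules along these lines. This is a sketch only: the thresholds (5% errors, 500 ms, 1000 connections), the for durations, the severity labels, and the alert names are all assumptions to be tuned for your traffic.

```yaml
# trustgate_alerts.yml (hypothetical file name)
groups:
  - name: trustgate_alerts
    rules:
      # High error rate: share of 4xx/5xx responses among all requests.
      - alert: TrustGateHighErrorRatio
        expr: |
          sum(rate(trustgate_requests_total{status=~"4xx|5xx"}[5m]))
          /
          sum(rate(trustgate_requests_total[5m]))
          > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of TrustGate requests are failing"

      # Latency threshold: per-service p99 (the metric name indicates milliseconds).
      - alert: TrustGateHighLatency
        expr: |
          histogram_quantile(
            0.99,
            sum(rate(trustgate_detailed_latency_ms_bucket[5m])) by (le, service)
          ) > 500
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency above 500 ms for service {{ $labels.service }}"

      # Connection limit: the value 1000 is a placeholder for your configured limit.
      - alert: TrustGateConnectionsNearLimit
        expr: trustgate_connections{state="active"} > 1000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Active TrustGate connections are approaching the limit"
```

The for clause keeps an alert pending until the condition has held for the whole duration, which filters out short spikes. Sending notifications additionally requires an Alertmanager instance, which this guide does not cover.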

Troubleshooting

Common monitoring issues and solutions:

  1. High Latency
     • Check upstream service latency
     • Review connection pooling
     • Monitor resource usage
     • Analyze request patterns
  2. Error Spikes
     • Check service health
     • Review error logs
     • Monitor rate limits
     • Verify configurations
  3. Connection Issues
     • Check network connectivity
     • Review connection limits
     • Monitor timeout settings
     • Verify DNS resolution
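
When working through the checks above, it can help to first rule out the monitoring pipeline itself, since a failed scrape looks like missing data rather than a TrustGate fault. The sketch below relies on up and scrape_duration_seconds, which Prometheus generates automatically for every target. The job label matches the scrape job defined earlier; the durations, the 5 second threshold, and the alert names are assumptions.

```yaml
# trustgate_scrape_health.yml (hypothetical file name)
groups:
  - name: trustgate_scrape_health
    rules:
      # up is 0 when the most recent scrape of the target failed.
      - alert: TrustGateTargetDown
        expr: up{job="trustgate"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus cannot scrape {{ $labels.instance }}"

      # Slow scrapes often point at network problems or an overloaded target.
      - alert: TrustGateSlowScrape
        expr: scrape_duration_seconds{job="trustgate"} > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Scraping {{ $labels.instance }} takes longer than 5 seconds"
```

If up is 0, the Targets page in the Prometheus UI shows the last scrape error for the target, which usually separates DNS, connectivity, and timeout problems.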

Next Steps

  • Set up Prometheus and Grafana
  • Configure alerting rules
  • Create custom dashboards
  • Implement logging integration