This week I worked on building a real-time diagnostics panel using Prometheus and Grafana to visualize CPU, GPU, memory, and disk health across my MCP network.
I spent the past couple of days stitching together the core observability stack of my MCP project. Prometheus is set up and scraping system metrics. Grafana dashboards are working and beautiful—but I’m hung up on the next step: embedding those dashboards directly into my UI via iframe. It’s a small wall but one I’ll climb.
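For when I do get there: Grafana refuses to render inside an iframe unless it's told to allow framing. Here's a minimal sketch of the settings involved, assuming Grafana runs from a docker-compose file; the service name and the anonymous-access lines are assumptions for illustration, not what I've actually shipped:

services:
  grafana:
    image: grafana/grafana
    environment:
      # Lets dashboards render inside an iframe instead of refusing to be framed
      - GF_SECURITY_ALLOW_EMBEDDING=true
      # Optional (and an assumption): anonymous viewer access, so the embedded
      # panel doesn't show a login screen inside my UI
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer

On the UI side, the iframe then just points at the dashboard or panel URL Grafana hands out under Share, with kiosk mode in the query string to hide the surrounding chrome.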
Prometheus is a time-series database that scrapes metrics from exporters, such as node-exporter for CPU and memory, or dcgm-exporter for NVIDIA GPU stats. These metrics are then visualized through Grafana, which acts as the front-end brain of the entire monitoring system.
A basic Prometheus configuration file looks like this:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['forge-node-exporter:9100']

  - job_name: 'gpu'
    static_configs:
      - targets: ['dcgm-exporter:9400']
The file is mounted inside the Prometheus container:
volumes:
  - ./prometheus.yml:/etc/prometheus/prometheus.yml
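For context, here's a sketch of what the scrape side of a docker-compose.yml for this stack can look like. Treat it as a sketch: image tags, the dcgm-exporter image path, and the GPU wiring are assumptions, but the service names line up with the targets in prometheus.yml above.

services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  forge-node-exporter:
    image: prom/node-exporter
    # Serves host CPU, memory, and disk metrics on :9100 inside the compose network;
    # in practice it also wants the host filesystem mounted read-only to report real host stats

  dcgm-exporter:
    image: nvidia/dcgm-exporter   # assumption: NVIDIA's published exporter image
    # Needs GPU access: the NVIDIA container toolkit plus either `gpus: all` or the
    # nvidia runtime, depending on the Docker version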
Once up and running, Prometheus listens on port 9090. You can hit localhost:9090 and query live metrics in the expression browser (something like node_memory_MemAvailable_bytes), or check that every scrape target is up on the /targets page. Prometheus also exposes its own internals at /metrics.
Grafana is running and I've built out a dashboard for CPU, GPU, memory, and disk health across the network.
My next step is building a full diagnostic UI that monitors the whole network from one place.
I’ll unify this with a Python script that stitches together live feeds from 8 LLM bots, each with its own role. It’ll act like a systems crew: talking, checking logs, responding to performance changes. For now, the whole monitoring stack lives in a single directory:
/forge/docker/monitoring/prometheus/
├── docker-compose.yml
├── prometheus.yml
└── dashboards/
└── system-health.json
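That system-health.json doesn't load itself; one way to have Grafana pick it up on startup is file-based dashboard provisioning. A sketch, assuming the dashboards/ folder gets mounted at /var/lib/grafana/dashboards inside the Grafana container:

apiVersion: 1

providers:
  - name: 'forge-dashboards'   # assumption: the provider name is arbitrary
    type: file
    options:
      path: /var/lib/grafana/dashboards

This provisioning file itself sits under Grafana's provisioning directory (/etc/grafana/provisioning/dashboards/), which also needs to be mounted into the container.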
All services are running in Docker. Each container logs to a volume or pipe, and I’m scripting auto-healing logic for anything that crashes. It’s early, but it’s forming the backbone of my local-first command center.
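The auto-healing scripts are still taking shape, but Docker gives a baseline for free, and it's worth sketching what I mean. restart: unless-stopped brings a crashed process back up on its own, and a healthcheck marks the container unhealthy so an outside script can decide what to do next; the numbers below are guesses, not tuned values.

services:
  prometheus:
    restart: unless-stopped
    healthcheck:
      # /-/healthy is Prometheus's own liveness endpoint; this assumes the image's
      # busybox wget is available, and the interval/timeout values are placeholders
      test: ["CMD", "wget", "-qO-", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 5s
      retries: 3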
This stack is what I’ll eventually run across multiple machines. It's minimal, sharp, and entirely sovereign.
— Lorelei Noble