So… I've spent the last couple of days at my day job figuring out how to tier our various environments and set up some kind of tiered Prometheus.

Federation was never going to work out well for me: you can't scrape 10-20k time series from another Prometheus server, because the scrape will just time out trying to consume that much data.

Turns out there is a fairly new feature called “Agent mode” (introduced in Prometheus v2.32.0) that lets you run a Prometheus server in such a way that it does three very important things:

  • Sends all the time series it scrapes locally to another Prometheus server via Remote Write,
  • keeps a WAL (write-ahead log) so that samples aren't lost while the remote is unreachable, and
  • only deletes WAL entries after a successful write to the remote Prometheus.
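
For illustration, here is a minimal sketch of what an agent's config might look like. The filename, job name, scrape target and central hostname are all placeholders I made up, and it assumes a 2.x release where Agent mode is switched on with the `--enable-feature=agent` flag:

```yaml
# prometheus-agent.yml (hypothetical filename)
# Assumed start command on a 2.x release:
#   prometheus --enable-feature=agent --config.file=prometheus-agent.yml

scrape_configs:
  # Whatever this agent scrapes locally; job and target are placeholders.
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]

remote_write:
  # Everything scraped above is forwarded here. The hostname is a placeholder;
  # /api/v1/write is the path a Prometheus remote-write receiver listens on.
  - url: "http://central-prometheus.example:9090/api/v1/write"
```

In Agent mode there is no local querying, alerting or rule evaluation, so the config really only needs scrape and remote-write sections.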

With appropriate external labels, this makes for a very nice tiered Prometheus setup: you effectively have one central Prometheus server (with no scrape configuration, except for Prometheus itself) that acts as the Remote Write receiver for all the Prometheus agents.
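
A rough sketch of the two halves, assuming made-up label names and values (`env`, `region`) and hypothetical filenames. It also assumes the central server runs a version where the receiver is enabled with `--web.enable-remote-write-receiver`:

```yaml
# --- On each agent (prometheus-agent.yml, hypothetical filename) ---
# External labels are attached to every sample the agent sends, so the
# central server can tell environments apart. Names and values are placeholders.
global:
  external_labels:
    env: staging
    region: eu-west-1
---
# --- On the central server (prometheus-central.yml, hypothetical filename) ---
# Assumed start command:
#   prometheus --web.enable-remote-write-receiver --config.file=prometheus-central.yml
# The only scrape job is Prometheus itself; everything else arrives via Remote Write.
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
```

Giving each agent a distinct set of external labels keeps series from different environments from colliding on the central server.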

#SRE #DevOps #Prometheus #Monitoring
