I don't normally enjoy much of the content that comes out of the CNCF, but this guest blog post on how DevOps is "misunderstood" is actually a great read. I empathize and agree with a lot of the points made here. It also makes me want to write a follow-up blog post in response, because there's another aspect of "misunderstanding" when it comes to the "Platform Operators", as the author puts it, that I'd like to discuss. #DevOps #SRE
Using time series as alert thresholds - Robust Perception | Prometheus Monitoring Experts
This is neat, I must try this out one day to let our dev teams define their own alerting thresholds per service. #sre #alerting #prometheus
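A rough sketch of what that could look like as a Prometheus rules file, assuming each team publishes its threshold as a recording rule. Every metric, label, and alert name here (`service:request_latency_seconds:mean5m`, `service:latency_threshold_seconds`, `checkout`) is hypothetical and not taken from the linked post; the fallback to a default threshold uses the `or on (...)` pattern.

```yaml
groups:
  - name: per-service-thresholds
    rules:
      # A team sets its own threshold by publishing a constant time series.
      - record: service:latency_threshold_seconds
        expr: vector(0.2)
        labels:
          service: checkout   # hypothetical service name

      # Compare each service's latency against its own threshold series.
      # Services without a threshold series fall back to a default of 0.5s.
      - alert: ServiceLatencyHigh
        expr: |
          service:request_latency_seconds:mean5m
            > on (service) group_left()
          (
              service:latency_threshold_seconds
            or on (service)
              count by (service) (service:request_latency_seconds:mean5m) * 0 + 0.5
          )
        for: 10m
        labels:
          severity: page
```

The nice part of this shape is that the alert expression never changes; teams only add or edit their own threshold rule.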
So… I've spent the last couple of days at my day job figuring out how to tier our various environments and set up some kind of tiered Prometheus.
Federation was never going to work out well for me, because you can't scrape 10-20k time series from another Prometheus server: the scrape will just time out trying to consume that much data.
Turns out Prometheus has a feature called "Agent mode" (introduced in v2.32.0) that lets you run a Prometheus server in such a way that it does two very important things:
- Sends all the time series it scrapes locally to another Prometheus server via Remote Write.
- Keeps a WAL (write-ahead log) so that samples are not lost if the remote is temporarily unreachable, and only deletes WAL entries after a successful write to the remote Prometheus.
This setup, along with appropriate external labels, gives you a very nice tiered Prometheus: a central Prometheus server (with no scrape configuration, except for Prometheus itself) acts as the Remote Write receiver for the Prometheus agents.
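A minimal sketch of the agent side, assuming a Prometheus 2.x binary started with `--enable-feature=agent` and a central server started with `--web.enable-remote-write-receiver`. The hostname, label values, and scrape target below are placeholders, not my actual setup.

```yaml
# prometheus-agent.yml (hypothetical file name)
# Run with: prometheus --enable-feature=agent --config.file=prometheus-agent.yml
global:
  scrape_interval: 30s
  # External labels are attached to every sample sent via Remote Write,
  # so the central server can tell the tiers/environments apart.
  external_labels:
    environment: staging   # placeholder value
    region: eu-west        # placeholder value

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]   # placeholder target

remote_write:
  # The central Prometheus must be started with
  # --web.enable-remote-write-receiver to accept writes on this endpoint.
  - url: "http://prometheus-central.example.internal:9090/api/v1/write"
```

The central server then only needs its own self-scrape job; everything else arrives through the receiver endpoint.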
Book Release: Go For DevOps #go #golang #sre #devops #terraform #kubernetes
In the book you will find:
- The basics of Go (including the 1.18 generics addition)
- Using Go with various encoding formats
- Building basic REST and gRPC services
- Applying Go to automate local system tasks
- Utilizing Go to automate those same tasks on thousands of machines
- Building a …
TIL: Google™ actually has a .google TLD, and there is sre.google
=> Google - Site Reliability Engineering 😳 #Google #SRE