Linux Site Reliability Engineer (SRE) Jobs
Site reliability engineers apply software engineering principles to infrastructure and operations problems, ensuring systems run reliably at scale. Linux is the de facto SRE platform. Browse SRE openings at cloud-native companies, unicorns, and established enterprises.
Frequently Asked Questions
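SRE (Site Reliability Engineering) is a discipline that originated at Google and is often described as a specific implementation of DevOps principles. SREs apply software engineering to infrastructure and operations. They define service level indicators (SLIs) and service level objectives (SLOs), manage error budgets against those objectives, and inform the service level agreements (SLAs) offered to customers. DevOps is a broader cultural and organisational approach. In practice the two roles often overlap, but SRE tends to place a stronger emphasis on reliability metrics and on automating toil.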
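To make the error-budget idea concrete: an availability SLO of 99.9% over a 30-day window (43,200 minutes) tolerates 0.1% of that window, about 43.2 minutes, of unavailability. The Python sketch below computes a time-based budget and a request-based variant. The function names `error_budget` and `budget_remaining` are illustrative assumptions, not part of any standard SRE tool.

```python
from datetime import timedelta


def _check_slo(slo: float) -> None:
    # An SLO target is a fraction strictly between 0 and 1, e.g. 0.999 ("three nines").
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Time-based error budget: how much 'bad' time the SLO tolerates in the window."""
    _check_slo(slo)
    return window * (1.0 - slo)


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Request-based budget: fraction still unspent (negative once overspent)."""
    _check_slo(slo)
    if total_requests <= 0:
        raise ValueError("need at least one request to compute a budget")
    allowed_failures = total_requests * (1.0 - slo)
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    print(error_budget(0.999, timedelta(days=30)))  # 0:43:12
    print(budget_remaining(0.999, 1_000_000, 250))  # roughly 0.75 of the budget left
```

A team would typically freeze risky releases once `budget_remaining` approaches zero. That policy is a common SRE practice, but the exact threshold is a per-team decision.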
SREs need deep Linux systems knowledge. This includes performance analysis and tracing (perf, flame graphs, strace, tcpdump), service management with systemd, Linux networking (iptables, eBPF, tc), and container internals (cgroups, namespaces). Programming skills in Python or Go and familiarity with observability tooling (Prometheus, Grafana, OpenTelemetry) are also essential.
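As a small illustration of container internals, the sketch below reads a process's memory usage against its cgroup limit. It assumes a cgroup v2 (unified) hierarchy mounted at `/sys/fs/cgroup`. In that hierarchy, `/proc/self/cgroup` contains a line such as `0::/system.slice/example.service`, and `memory.max` holds the literal string `max` when no limit is set. The function names are hypothetical. The root cgroup has no `memory.max` file, so `own_memory_utilisation` only works for processes in a child cgroup, such as a container or systemd service.

```python
from pathlib import Path
from typing import Optional


def parse_memory_utilisation(current_text: str, max_text: str) -> Optional[float]:
    """Fraction of the cgroup v2 memory limit in use, or None if unlimited.

    current_text and max_text are the contents of memory.current and memory.max.
    """
    current = int(current_text.strip())
    limit = max_text.strip()
    if limit == "max":
        return None  # no memory limit configured for this cgroup
    return current / int(limit)


def cgroup_path_from_proc(proc_cgroup_text: str, mount: str = "/sys/fs/cgroup") -> Path:
    """Map /proc/<pid>/cgroup contents to the process's cgroup v2 directory."""
    for line in proc_cgroup_text.splitlines():
        if not line:
            continue
        hierarchy_id, _controllers, path = line.split(":", 2)
        if hierarchy_id == "0":  # hierarchy ID 0 is the cgroup v2 unified hierarchy
            return Path(mount) / path.lstrip("/")
    raise ValueError("no cgroup v2 entry found")


def own_memory_utilisation() -> Optional[float]:
    """Read this process's cgroup v2 memory usage (Linux with cgroup v2 only)."""
    cg = cgroup_path_from_proc(Path("/proc/self/cgroup").read_text())
    return parse_memory_utilisation(
        (cg / "memory.current").read_text(),
        (cg / "memory.max").read_text(),
    )
```

The parsing is separated from file I/O so the logic can be tested without a live cgroup hierarchy. cgroup v1 systems use different files (for example under `/sys/fs/cgroup/memory/`) and are not handled here.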
SRE salaries are among the highest in engineering. In the US, SRE total compensation typically ranges from $150,000 to $300,000+ at top tech companies, with base salaries commonly between $120,000 and $180,000. Demand significantly outstrips supply, keeping compensation elevated.
Common SRE tools on Linux include Prometheus and Grafana for observability, the ELK stack (Elasticsearch, Logstash, Kibana) for logging, Terraform and Ansible for infrastructure as code, Kubernetes for container orchestration, and chaos engineering tools like Chaos Monkey or LitmusChaos for reliability testing.