4 Software Engineers
Location: Mountain View, CA or Remote
Looking for degreed Site Reliability Engineers (SREs). Must be a Linux administration expert with experience in Python, public cloud (AWS, Azure, GCP), infrastructure and application monitoring, and Docker.
Nice to have: Kafka, RabbitMQ, Databricks, TeamCity, DataDog, CI/CD for cloud applications, Terraform, Chef, Puppet.
Responsibilities:
- Implement the tools and processes necessary to achieve the required SLOs for the Omnicell Platform.
- Define and implement CI/CD pipelines.
- Automate delivery of platform services using infrastructure as code (IaC). Build self-service playbooks for the platform that can be consumed by globally distributed teams at Omnicell.
- Define and implement the incident response management process and deploy the necessary tools.
- Resolve support and escalation issues.
- Conduct post-incident reviews.
- Collaborate with application and business stakeholders to ensure that a high-quality product is developed and deployed to production. Work diligently with other engineering teams to ratify the release processes necessary to meet business goals.
- Drive the continuous improvement process.
Required Knowledge and Skills:
- Expert knowledge of one of the major public cloud platforms (Azure, AWS, GCP).
- Hands-on programming experience in Python or other object-oriented programming languages.
- Expert knowledge of infrastructure and application monitoring tools: Prometheus, Grafana, DataDog, etc.
- Experience implementing IaC concepts using Terraform, Chef, or Puppet.
- Experience with Elasticsearch and Kibana.
- Experience administering databases.
- Expert in Linux administration.
- Expert knowledge of Docker and Helm.
- Experience implementing CI/CD for cloud-native applications.
- Experience deploying applications that use a service mesh.
- Experience administering Kubernetes clusters.
- Experience defining and implementing incident response management processes.
Basic Requirements:
- Bachelor's degree
- 8+ years of experience in software engineering