Description

Company Overview:

Float is the #1 resource management software, trusted by more than 4,500 customers worldwide. Since 2012, we’ve grown every year, independently and profitably, with a commitment to positive impact as a certified B Corporation. Our 50+ team members work 100% remotely across multiple countries, empowering people to live their “Best Work Life.”

Job Summary:

We are seeking a motivated Site Reliability Engineer (SRE) to join our fast-scaling global team. In this high-impact role, you will shape infrastructure, automate systems, and drive reliability at scale. Whether you’re an entry-level engineer looking to grow quickly or an experienced SRE ready for demanding challenges, this position offers autonomy, collaboration, and career growth in a truly remote-first environment.

Key Responsibilities:

  • Maintain, upgrade, and optimise Kubernetes infrastructure for smooth and secure operations.

  • Improve system hygiene by eliminating noisy alerts and boosting monitoring accuracy.

  • Collaborate with engineers to integrate services and support service migrations.

  • Optimise Kubernetes cluster usage and scale node specifications.

  • Lead service mesh exploration and implement secure ingress layers.

  • Create incident response playbooks to ensure faster recovery during production issues.

  • Support the next-gen data layer (CDC) for high-performance engineering teams.

  • Coach teams on reliability goals, SLAs, and best practices for production quality.

Required Qualifications & Skills:

  • Proficiency with Bash scripting and at least one programming language (PHP, Python, or NodeJS).

  • Strong production experience managing and optimising Kubernetes clusters.

  • Solid knowledge of Terraform (Infrastructure as Code).

  • Familiarity with Google Cloud Platform (GCP) or willingness to upskill fast.

  • Strong written communication skills for documentation and team collaboration.

  • Remote-first mindset: comfortable with asynchronous communication via Slack, Loom, and similar tools.

Preferred Qualifications:

  • Experience in scaling infrastructure for SaaS platforms.

  • Prior experience in a global, remote-first team environment.

Preferred Experience:

  • 2–5+ years in site reliability, DevOps, or infrastructure engineering.

Benefits & Perks:

  • Competitive salary of USD $133,000 per year (Level 2 role).

  • 100% remote global team – work from anywhere.

  • Flexible schedules with deep focus time and minimal meetings.

  • Professional development opportunities, paid take-home assignment, and transparent salary framework.

  • Supportive, diverse, and collaborative culture with a strong focus on work-life balance.

Remote (Global Team – Work from Anywhere)

How to Apply:

Submit your updated CV and cover letter via our careers portal. If you’re excited about scaling infrastructure and want to thrive in a remote-first, high-growth environment, we’d love to hear from you.