Job details

Principal Site Reliability Engineer

About Kandji


Kandji is the Apple device management and security platform that empowers secure and productive global work. With Kandji, Apple devices transform themselves into enterprise-ready endpoints, with all the right apps, settings, and security systems in place. Through advanced automation and thoughtful experiences, we’re bringing much-needed harmony to the way IT, InfoSec, and Apple device users work today and tomorrow.


Some of the smartest money in tech has partnered with Kandji to realize our vision, including Tiger Global, Felicis, Greycroft, First Round Capital, and Okta Ventures. In July 2024, Kandji raised $100 million from General Catalyst, bringing Kandji’s valuation to $850 million.


Since Kandji’s Series C in 2021, the company has seen a 600%+ increase in annual recurring revenue, and its customer base has grown nearly 4X across 40+ industries. Notable customers include Allbirds, Canva, and Notion, and the company has partnerships with such industry giants as ServiceNow, AWS, and Okta.


Kandji was also named to Forbes’ Next Billion Dollar Startup List 2023 and recognized as a top venture-backed startup with the potential to reach unicorn status.


As a Principal Site Reliability Engineer at Kandji, you will play a critical role in ensuring the reliability, scalability, and performance of our platform. In this strategic position, you’ll work cross-functionally to build and evolve the systems, tools, and processes that keep our services resilient and performant—especially as we scale to meet the demands of a growing customer base.


You’ll bring a deep understanding of distributed systems, incident management, observability, and automation. Your experience with AWS, Kubernetes, and Infrastructure-as-Code (Terraform preferred) will help drive efforts to proactively identify and eliminate reliability risks, reduce toil through automation, and establish engineering best practices across teams.


We’re looking for a seasoned engineer with both technical depth and a strategic mindset—someone who can guide long-term reliability efforts, lead postmortems and systemic remediation, and mentor others in SRE principles. This role provides the opportunity to shape the culture and architecture of reliability at Kandji, partnering closely with engineering, infrastructure, and product teams to build systems that are not only functional, but fault-tolerant and maintainable.


How You Will Make a Difference Day to Day:
  • Reliability Strategy & Resilience Engineering: Design and implement fault-tolerant, scalable, and highly available systems across our AWS-hosted platform to ensure reliability under load and failure conditions.
  • Service Ownership & Runbook Maturity: Partner with engineering teams to define and uphold SLIs/SLOs, perform root cause analyses, and drive post-incident reviews with a focus on long-term systemic improvements. Run recurring reliability reviews, and mature incident response practices including alert quality, runbooks, and failure simulations.
  • Automation & Tooling: Build and maintain automation for deployment, incident response, and remediation workflows to reduce manual toil and increase operational efficiency.
  • Secure Systems Design: Implement DevSecOps practices hands-on, including secure IaC, policy-as-code, and controls embedded in pipelines or platform abstractions.
  • Observability & Monitoring: Champion the development of comprehensive observability solutions—including metrics, logging, tracing, and alerting—to enable proactive detection and resolution of issues.
  • Infrastructure as Code: Contribute to and improve our Terraform-based infrastructure management, enabling consistent, auditable, and repeatable infrastructure deployments.
  • Capacity Planning, FinOps & Performance: Lead system tuning, load testing, and capacity forecasting to support our scaling platform and prevent bottlenecks before they occur. Monitor and optimize cloud costs across environments. Design and advocate for architectural trade-offs that balance cost, performance, and reliability.
  • Cross-Functional Reliability Coaching: Embed reliability thinking into engineering and product workflows. Run architecture reviews, failure simulations, and training to elevate operational discipline.
  • Mentorship & Leadership: Mentor engineers across the organization in SRE best practices, incident response, and reliability design patterns, helping build a culture of ownership and operational excellence across the company.


We’d love to hear from you if you have:
  • Experience: 10+ years in Site Reliability Engineering, DevOps, Infrastructure or related roles, with a proven track record of improving system reliability and scaling distributed systems in cloud environments (preferably AWS).
  • Technical Proficiency: Deep expertise in Infrastructure as Code (Terraform strongly preferred), Kubernetes, and container orchestration at scale; strong background in automation, scripting (e.g., Python, Go, or Bash), and CI/CD pipelines.
  • Reliability Engineering Mindset: Experience defining and maintaining SLOs/SLIs, leading incident response and postmortems, and applying SRE principles to reduce toil and improve system reliability. Deep familiarity with chaos engineering, failure mode analysis, and designing systems for graceful degradation under partial failure.
  • Observability & Performance: Strong understanding of modern observability stacks (e.g., Datadog, Prometheus, Grafana, OpenTelemetry) and performance tuning for distributed systems.
  • Security & Compliance Awareness: Solid understanding of security and compliance in cloud environments, with experience implementing secure-by-default infrastructure patterns. Familiar with secure infrastructure design, cloud compliance requirements (SOC 2, ISO 27001, ISO 42001), and embedding DevSecOps into delivery workflows.
  • Problem Solving: Skilled in diagnosing complex, multi-layered production issues and implementing pragmatic, long-term solutions.
  • Influence & Communication: Excellent written and verbal communication skills with the ability to clearly articulate reliability trade-offs and influence engineering teams toward better operational outcomes. Trusted collaborator with product, infra, security, and GTM leaders.
  • Location: Required to work on-site five days a week in our Miami office (Coral Gables).


Benefits & Perks


 • Competitive salary

 • 100% individual and dependent medical + dental + vision coverage

 • 401(k) with a 4% company match

 • 20 days PTO

 • Kandji Wellness Week the first week in July

 • Equity for full-time employees

 • Up to 16 weeks of paid leave for new parents

 • Paid Family and Medical Leave

 • Modern Health - Mental Health Benefits - Individual and Dependents

 • Fertility Benefits

 • Working Advantage Employee Discounts

 • Free onsite fitness center

 • Free parking

 • Lunch 5 days/week

 • Exciting opportunities for career growth

 • An outstanding, inclusive culture


We are excited to be serving a significant need for a fast-growing market, and are proud of the high-performing team we have brought together so far. If you’re someone who wants to engage in new, exciting projects that will challenge your skills in the best way possible, we would love to connect with you.


At Kandji we believe in fostering an inclusive environment in which employees feel encouraged to share their unique perspectives, leverage their strengths, and act authentically. We know that diverse teams are strong teams, and welcome those from all backgrounds and varying experiences.


Kandji is proud to be an equal opportunity employer committed to diversity and inclusion in the workplace. Qualified applicants will be considered for employment without regard to race, color, religion, national origin, age, sex, sexual orientation, gender identity, physical or mental disability, protected veteran or military status or any other status protected by applicable law.

Kandji Glassdoor Company Review: 3.4 out of 5 stars
Kandji DE&I Review: 3.5 out of 5 stars
CEO of Kandji: Adam Pettit

Average salary estimate

$180,000 / year (est.)
Min: $150,000
Max: $210,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.


Drawing on decades of experience in Apple IT, we saw a dire need for a device management platform that could accommodate growing businesses and increasing regulatory demands. Existing solutions were either overly simplistic or mind-numbingly compl...

BENEFITS & PERKS
Dental Insurance
Disability Insurance
Flexible Spending Account (FSA)
Vision Insurance
Paid Holidays
EMPLOYMENT TYPE
Full-time, onsite
DATE POSTED
July 9, 2025