Senior Site Reliability Engineer
Location: Pleasanton, CA
At 10x Genomics, accelerating our understanding of biology is more than a mission for us. It’s a commitment. This is the century of biology, and the breakthroughs we make now have the potential to change the world.
Our software enables our scientists to better understand human health, for example by pinpointing the differences between cancer cells and normal cells, or identifying the genomic sequences the body produces in response to infection. We’ve built an ecosystem of powerful software, hardware, microfluidics, and chemistry to create products that are used by researchers around the world, including 96 of the top 100 global research institutions.
Our teams are encouraged to follow their passions and pursue new ideas in an inclusive and dynamic environment. The discoveries we enable together will lead to better technologies, better treatments, and a better future. Find out how you can make a 10x difference.
About the role:
We are looking for an exceptional site reliability engineer with a solid understanding of Linux and distributed computing to join our team. Our multidisciplinary team, spanning microfluidics, biochemistry, mechanical engineering, computational biology, and software, has a proven track record of delivering successful commercial products built on deep technological innovation. If you are a self-starter who is passionate about building and operating reliable, scalable, and performant systems, and you are excited to work every day in a highly collaborative environment alongside a diverse team of experts, join us at 10x Genomics.
What you will be doing:
- Build, deploy, and maintain resilient and scalable High Performance Computing (HPC) systems and services on-premises and in the cloud.
- Scale systems and improve operational efficiency through extensive automation.
- Collaborate with the software engineering team on continuous delivery and deployment.
- Monitor infrastructure and applications for uptime and resource utilization, identify performance bottlenecks, troubleshoot and mitigate system issues, and develop solutions to improve reliability and performance.
- Maintain detailed documentation of system builds and operational procedures.
- Participate in on-call rotations.
To be successful you will need:
- Bachelor’s degree in Computer Science or a related field, or equivalent work experience.
- 7+ years of Linux systems engineering or development experience in a large-scale environment.
- 3+ years of experience in an SRE-type role.
- Extensive experience with infrastructure-as-code and configuration management tools (e.g. Terraform, CloudFormation, Ansible, Puppet, Chef).
- Knowledge of Linux kernel tuning, networking, and performance optimization.
- Proficiency in shell scripting and at least one other programming language, such as Python.
- Experience with AWS services and infrastructure design.
- Strong desire to learn and implement new technologies.
- Excellent written and verbal communication skills.
Nice-to-have skills and background:
- Experience managing multi-petabyte-scale network-attached storage (NAS) and operational knowledge of the NFS protocol.
- Familiarity with HPC workload managers such as SGE or Slurm.
All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or disability.