Staff Site Reliability Engineer
Location: Pleasanton, CA
10x Genomics is building tools for scientific discovery that reveal and address the true complexities of biology and disease. Through a combination of novel microfluidics, chemistry and bioinformatics, our award-winning Chromium™ System is enabling researchers around the world to more fully understand the fundamentals of biology at unprecedented resolution and scale. Learn more at 10xGenomics.com.
Fueled by equal parts scientific vision and determined passion, we are delivering unprecedented innovation to short-read sequencing technologies and transforming how genomic information is accessed. You will feel the 10x difference the moment you enter our offices and labs. There’s a dynamic energy here, and we’re looking for the best of the best to be a part of it. We are seeking talented professionals excited to build new technology that advances scientific research while growing their career within a dynamic, supportive environment.
We are looking for an exceptional site reliability engineer with a solid understanding of Linux and distributed computing to join our team. Our multidisciplinary team, spanning microfluidics, biochemistry, mechanical engineering, computational biology, and software, has a proven track record of delivering successful commercial products built on deep technological innovation. If you are a self-starter who is passionate about building and operating reliable, scalable, and performant systems, and who is excited to work every day in a highly collaborative environment alongside a diverse team of experts, join us at 10x Genomics.
Responsibilities
- Lead a team of SREs to design, build, and maintain resilient, scalable Linux high-performance computing (HPC) systems and storage, both on-premises and in the cloud.
- Automate the deployment, operations, and monitoring of infrastructure.
- Monitor infrastructure and applications for uptime and resource utilization, identify performance bottlenecks, troubleshoot system issues, and develop solutions to improve reliability and performance.
- Scale systems and improve operational efficiency.
- Maintain detailed documentation of system build and operational procedures.
- Off-hours support may occasionally be required.
Required Skills and Background
- Bachelor’s degree in Computer Science or a related field, or equivalent work experience.
- 10+ years of Linux systems engineering experience in a large-scale environment.
- 5+ years of experience in an SRE-type role.
- Extensive experience with automation, provisioning and configuration management tools (e.g. Ansible, Puppet, Chef).
- Knowledge of Linux kernel tuning, networking, and performance optimization, with the ability to dive deep into code.
- Software engineering experience, with proficiency in one or more of the following: Go, Python, or shell scripting.
- Experience with IaaS platforms such as AWS.
- Strong desire to learn and implement new technologies.
- Excellent written and verbal communication skills.
Desired Skills and Background
- Experience managing multi-petabyte-scale network-attached storage (NAS) and operational knowledge of the NFS protocol.
- Familiarity with HPC workload managers such as SGE or Slurm.
- Working knowledge of LDAP and Active Directory.
All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or on the basis of disability.