Site Reliability Engineer

Location: Cambridge

We are seeking a Site Reliability Engineer to maintain and develop our cloud infrastructure and monitoring systems.

Key features

Fantastic opportunity to help the business develop and thrive
Full-time, hybrid working

The opportunity

We are seeking a Site Reliability Engineer to maintain and develop our cloud infrastructure and monitoring systems and processes, helping to ensure the reliability and security of the service we provide to our customers. This role will report into our Head of Platform. 

Alchemite provides a web browser interface and a RESTful API for training and using machine learning models, and for uploading, retrieving and analysing datasets. Our applications are containerised and deployed to Kubernetes, either on our own Google Cloud infrastructure or in customer-managed environments on Google Cloud, Azure or AWS.

Main duties and responsibilities

You will be responsible for the continued development of our monitoring systems and will use them to proactively identify and communicate performance, reliability, security and cost issues. You will assist in responding to incidents and remediating vulnerabilities in our platform. You will also identify, plan and implement improvements to our cloud infrastructure and deployment processes in a secure and robust way, working alongside other engineers to support our product roadmap. As part of the wider product engineering team, you will advocate for effective monitoring throughout the design process to ensure the performance, stability and security of our products, in line with our commitment to ISO 27001 compliance.

What makes you our next Site Reliability Engineer?

Essential requirements

  • Minimum 2:1 Bachelor's degree in computer science or a related field
  • 2+ years' experience in a professional DevOps, SRE, Platform Engineering or similar role
  • Self-motivated with strong problem-solving and analytical skills
  • Experience using and configuring monitoring tools, ideally Grafana and Prometheus, to identify insights and alert to potential issues
  • Experience using and configuring cloud infrastructure (ideally GCP; Azure also desirable)
  • Experience with IaC tools (ideally Terraform)
  • Experience with Docker, Kubernetes and Helm
  • Knowledge of security and reliability best practices for cloud infrastructure and for application deployments to Kubernetes
  • Experience using Python and Bash for scripting or small CLI applications
  • Experience using Git for professional software development

Desirable skills and experience

  • Experience responding to and investigating security or reliability incidents in a distributed cloud environment
  • The ability to communicate technical challenges and opportunities to people outside your area of expertise
  • Some familiarity with the applications in our tech stack: NGINX, Flask (Python), React (TypeScript), PostgreSQL, OpenSearch, Valkey, Keycloak
  • Experience administering Linux-based systems
  • Experience using CI tools, ideally CircleCI, to manage application deployments
  • Experience applying and monitoring compliance with information security policies
  • Experience applying Agile methodologies and working in sprints

The above is not an exhaustive list. You will need to be flexible in your approach to your duties, which may change from time to time to reflect business needs or the company's continuous improvement.

What can we offer you?

  • A competitive financial package – salary and share options
  • 5 weeks' annual leave (pro rata) and a flexible leave policy
  • Salary sacrifice pension, with the company's savings paid into the scheme
  • A collaborative work environment with neither red tape nor bureaucracy
  • Scope for career development as an early team member
  • Support and resources to develop your skills and succeed in the role
  • Hybrid working arrangements and a great team culture
  • Access to an Employee Assistance Programme (EAP), a wellbeing champion and financial advice
  • Enhanced sickness policy
  • Regular social and team building events
  • Cycle scheme

About Intellegens

With Intellegens, you can apply advanced machine learning to find new product and process solutions, get products to market faster, and break through R&D bottlenecks. The Alchemite™ Suite offers a series of easy-to-use apps focused on key challenges for scientists, experimentalists, data scientists, or their managers. Unlock insights hidden in your data and guide decision-making to reduce experimental workloads by 50-80%. Share results and collaborate across R&D teams. 

Originally developed at the University of Cambridge, the Alchemite™ algorithm offers unique capabilities for handling real experimental and process data. Since 2017, Intellegens has further advanced the method, implemented software solutions that enable it to be applied to solve practical problems in industrial R&D, and developed deep experience of such applications through hundreds of projects. The strengths of Alchemite™ include its ability to train machine learning models on sparse and noisy data, where other machine learning methods typically fail, and accurate uncertainty quantification, enabling effective decisions.

The Intellegens team of expert scientists, software engineers, and commercial staff is based in its new Cambridge Headquarters at Chesterton Mill. Its record of innovation and customer success led to it being awarded ‘AI Company of the Year’ in the Cambridge Independent Science & Technology Awards, 2023, selected from a very strong peer group of scientific software and research businesses in the Cambridge ‘Silicon Fen’ region. Successful applications of Intellegens technology have included alloys, specialty chemicals, plastics, paints, coatings, cosmetics, food and beverages, drugs, biopharmaceuticals, batteries, and optimising manufacturing processes. 

How to apply

If you are interested in this role, please email your CV, or otherwise show your suitability, to hr@intellegens.com.

No recruitment agencies, please.
