




Summary: Join EPAM's Application Production Support team as a Site Reliability Engineer to enhance a world-class cash equities execution platform, ensuring reliability and efficiency of trading systems.

Highlights:

1. Support and enhance a world-class cash equities execution platform
2. Work with innovative, low-latency technology in financial markets
3. Develop advanced technical skills and impact financial market technologies

EPAM is looking for a **Site Reliability Engineer** to join our Application Production Support (APS) team. In this role, you will support and enhance a world-class cash equities execution platform as part of a global team. This position offers a hybrid setup, requiring regular office visits two days per week in Lisbon.

Your primary focus will be on ensuring the reliability, scalability and efficiency of trading systems by applying site reliability engineering best practices. This position offers the opportunity to work with innovative, low-latency technology and gain hands-on experience with modern financial markets, stock exchange organizations and algorithmic high-frequency trading.

You will collaborate with software engineering teams, automate processes, manage incidents and contribute to the continuous improvement of the production environment. This is an excellent opportunity to develop advanced technical skills and make a direct impact on the technologies that drive today’s financial markets.
**Responsibilities**

* Develop and implement monitoring, alerting and incident response strategies
* Automate routine tasks and processes to reduce manual intervention and improve efficiency
* Collaborate with software engineering teams to design and deploy reliable, scalable and efficient systems
* Deploy production changes with precision, ensuring minimal disruption to services and maintaining platform integrity; rotation-based weekend work may be required
* Manage incidents including detailed analysis and reporting to maintain high service levels
* Participate in on-call rotations to provide support for critical systems and services

**Requirements**

* Degree in computer science, information technology or a related field
* Proven experience in a similar technical or engineering role
* Strong knowledge of Unix/Linux systems and networking
* Proficiency in programming and scripting (Unix/Linux shell, Python, Perl for scripting, any other programming language such as C, C++ or Java)
* Experience with monitoring and observability tools such as ITRS Geneos, Dynatrace, Prometheus or Grafana
* Strong problem-solving skills and ability to troubleshoot complex systems

**Nice to have**

* Knowledge of financial markets and electronic trading
* Experience with log management tools such as Splunk, ELK, Graylog or Loki
* Experience with network monitoring tools such as Corvil
* Familiarity with databases such as Oracle, PostgreSQL, MySQL/MariaDB or KDB/q
* Experience with messaging systems such as Tibco, Solace, IBM MQ, LBM or Kafka
* Experience with Infrastructure as Code (IaC) tools such as Ansible, Terraform or similar
* Prior experience in a high-availability, high-traffic environment

**We offer**

* Competitive compensation depending on experience and skills
* Variety of projects within one company
* Being a part of a project following engineering excellence standards
* Individual career path and professional growth opportunities
* Internal events and communities
* Flexible work hours


