




We are looking for a **Lead Site Reliability Engineer** to enhance a global execution platform, delivering robust solutions to trading desks and clients. You will collaborate with expert teams, advancing your expertise in system administration, monitoring, and low-latency technologies. Join us to contribute to cutting-edge financial technology innovations.

**Note that working on-site at the client's Lisbon office for 2-3 days per week is required.**

**Responsibilities**

* Design and enforce monitoring, alerting, and incident management strategies
* Automate repetitive tasks and workflows to increase operational efficiency
* Work alongside software engineering teams to build and launch scalable, dependable systems
* Execute production deployments carefully to preserve platform stability
* Handle incident management with thorough analysis and reporting to maintain service quality
* Engage in on-call duties to support essential systems and services
* Communicate clearly with colleagues to swiftly resolve technical problems
* Maintain up-to-date documentation for operational workflows and system settings
* Drive continuous improvements in system reliability and efficiency through proactive initiatives

**Requirements**

* Deep understanding of Unix/Linux operating systems and networking, with over 5 years of experience
* Proficiency in Unix/Linux shell scripting and in programming languages such as Python, Perl, C, C++, or Java
* Experience with monitoring and observability solutions such as ITRS Geneos, Dynatrace, Prometheus, and Grafana
* Strong troubleshooting skills for complex system issues
* Experience in environments with high availability and heavy traffic
* Bachelor’s or Master’s degree in IT engineering or a related discipline
* Ability to collaborate effectively within a team and adapt to evolving environments
* Self-driven with excellent problem-solving capabilities and thorough issue tracking
* Excellent written and verbal communication abilities with English proficiency at B2+ level

**Nice to have**

* Familiarity with log analysis tools like Splunk, ELK, Graylog, or Loki
* Knowledge of network monitoring solutions such as Corvil
* Experience with relational databases including Oracle, PostgreSQL, MySQL/MariaDB, or KDB/q
* Understanding of messaging platforms like IBM MQ, Tibco, Solace, LBM, or Kafka
* Experience with Infrastructure as Code tools such as Ansible or Terraform


