Senior Site Reliability Engineer
About Us
Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories, dedicated to uplifting everyone, everywhere by being the best way to pay and be paid. At Visa, you'll have the opportunity to create impact at scale — tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world. Join Visa and do work that matters – to you, to your community, and to the world. Progress starts with you.
Job Description
The Sr. Site Reliability Engineer is responsible for supporting the deployment and configuration of monitoring and logging tools, automating routine operational tasks, and maintaining observability tools such as Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, Elasticsearch, OpenSearch, and CloudWatch. This role works closely with team members to implement and maintain monitoring solutions across development, staging, and production environments, and contributes to the setup and maintenance of CI/CD pipelines to support automated build, test, and deployment processes. The engineer provides support in managing cloud infrastructure (AWS, GCP) to ensure availability and security, learns and applies DevOps and SRE best practices, and assists with the implementation and management of containerization technologies like Docker and Kubernetes. Responsibilities include monitoring system performance, identifying and escalating issues, participating in troubleshooting and root cause analysis for production incidents, and creating and updating documentation for infrastructure and operational procedures. All roles require digital fluency, including the ability to work with emerging technologies such as Generative AI tools (e.g. ChatGPT, Microsoft Copilot) to support everyday work.
Key Responsibilities:
- Support deployment and configuration of monitoring and logging tools.
- Automate routine operational tasks to improve efficiency and support system integration.
- Assist with maintenance and management of observability tools (Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, Elasticsearch, OpenSearch, CloudWatch).
- Implement and maintain monitoring solutions in development, staging, and production environments.
- Contribute to setup and maintenance of CI/CD pipelines for automated build, test, and deployment.
- Provide support in managing cloud infrastructure (AWS, GCP) for availability and security.
- Use infrastructure as code tools (Terraform, Ansible, CloudFormation) for environment configuration.
- Monitor system performance and assist in identifying and escalating issues.
- Support implementation and management of containerization technologies (Docker, Kubernetes).
- Participate in troubleshooting and root cause analysis for production incidents.
- Create and update documentation for infrastructure, processes, and operational procedures.
- Provide first-level support for routine infrastructure and deployment issues, escalating complex problems as needed.
- Seek opportunities to automate repetitive tasks and suggest workflow improvements.
This is a remote position. A remote position does not require job duties to be performed within proximity of a Visa office location. Remote positions may be required to be present at a Visa office with scheduled notice.
Qualifications
Basic Qualifications:
- 2+ years of relevant work experience and a Bachelor's degree, OR 5+ years of relevant work experience.
Preferred Qualifications:
- 3 or more years of work experience with a Bachelor's degree, or more than 2 years of work experience with an advanced degree (e.g. Master's, MBA, JD, MD).
- Experience in supporting deployment and configuration of monitoring and logging tools.
- Experience in automating routine operational tasks for system integration.
- Experience with observability tools such as Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, Elasticsearch, OpenSearch, and CloudWatch.
- Experience in implementing and maintaining monitoring solutions across environments.
- Experience in setting up and maintaining CI/CD pipelines for automated processes.
- Experience in managing cloud infrastructure (AWS, GCP) for availability and security.
- Experience with infrastructure as code tools (Terraform, Ansible, CloudFormation).
- Experience in monitoring system performance and escalating issues.
- Experience with containerization technologies (Docker, Kubernetes).
- Experience in troubleshooting and root cause analysis for production incidents.
- Experience in creating and updating documentation for infrastructure and operational procedures.
- Experience in providing first-level support for infrastructure and deployment issues.
- Experience in automating repetitive tasks and suggesting workflow improvements.
- Experience in learning and applying DevOps and SRE best practices.
- Experience in supporting implementation and management of containerization technologies.
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status.
Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.