Senior Site Reliability Engineer

Deadline: 26 May 2024

Contract terms: Permanent

Category: Programming

Job type: Full-time

Location: Yerevan

Job description

OneMarketData is continuously searching for bright talent with the skills to make an impact. From developers to data scientists, at OneTick you will have the opportunity to develop and enhance your problem-solving skills using a combination of analytics, imagination, and talent.

Overview

Our DevOps team develops the infrastructure behind the hosted solutions and our software and data delivery lifecycle.

Prior to advancing with your application, we kindly request that you review the CONSENT NOTICE FOR HR AND RECRUITING provided by OneMarketData. Your attention to this matter is greatly appreciated.

Our stack:

  • AWS (some of the services we use are: EKS, EC2, S3, SGW, ASG, ELB, Lambda, etc.);
  • Terraform and Ansible as an IaC approach;
  • GitLab and GitLab CI/CD;
  • Python as the main programming language for automation;
  • Kubernetes (mostly EKS, but GKE and other Kubernetes engines are also used) for orchestration, and Helm for its management;
  • Prometheus/VictoriaMetrics, Grafana, Loki, AWS CloudWatch, and CloudTrail for monitoring, logging, and some statistics collection;
  • OneTick (our platform for market data).

We also use other tools for various purposes, e.g., Packer, HashiCorp Vault, OpenVPN, Slack, Confluence, and other popular and well-known tools :)


More information about the projects

In the Cloud Project, we have a multi-account AWS infrastructure managed through AWS Organizations. Separate AWS accounts are necessary to host customer-facing environments. We provide our customers with different setups of our application. We use most of the common AWS resources, such as EC2, EKS, S3, VPC, and ELB, and our overall stack of AWS services is quite broad. Most of our AWS infrastructure is covered by IaC. CI/CD runs on GitLab.

We have more than 4 petabytes of data in S3 and EFS. We expose part of the data in S3 to the file system using Storage Gateways. Currently, we are migrating from a setup on EC2 instances to Kubernetes, integrating centralized logging and monitoring solutions, migrating data loading processes to Airflow, and optimizing infrastructure costs while planning to improve performance at the same time.

We are looking for an experienced Site Reliability Engineer (SRE) to join our team. Your primary responsibility will be to guarantee the reliability, scalability, and performance of our applications and systems. Working closely with both our software engineers and product teams, you will dive deep into troubleshooting production issues, ensuring seamless operation. Additionally, you will collaborate on designing and implementing solutions to enhance our monitoring and alerting systems, aiming to optimize our overall efficiency and reliability. Your expertise in automation will play a crucial role in reducing manual toil and streamlining processes, ultimately contributing to the success of our operations.

Responsibilities

  • Monitor and maintain the health and reliability of our production systems
  • Investigate and resolve production issues and outages
  • Develop and maintain monitoring, alerting, and incident response systems
  • Design and implement automation to reduce manual toil and improve system reliability
  • Collaborate with software engineers to design and implement highly scalable and resilient systems
  • Participate in on-call rotation and respond to incidents promptly
  • Continuously improve our systems and processes to ensure the highest level of reliability and availability
  • Document processes and procedures for maintaining and troubleshooting production systems

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 3+ years of experience as a Site Reliability Engineer or related role
  • Strong knowledge of Linux/Unix systems and administration
  • Proficiency in at least one programming language (e.g., Python, Java, C++)
  • Experience with automation and configuration management tools (e.g., Ansible, Terraform)
  • Experience with AWS and Kubernetes

General requirements:

  • English: Upper-Intermediate or higher.
  • Good communication skills, including the ability to explain complicated things in simple terms.
  • Eagerness to learn new technologies (including area-specific ones).
  • Strong analytical and problem-solving skills.
  • Attentiveness, a hard-working and goal-oriented mindset (getting tasks done), and the ability to work both in a team and independently.
  • Readiness to gain a comprehensive understanding of the product and to delve deeply into its functionality, because it is closely connected to how things work.

Required candidate level: Senior

Additional information

Apply via staff.am and track the whole process online.

Professional skills

Python

DevOps

AWS

Soft skills

Problem solving

Analytical skills

Employee benefits

Annual salary review
Free language courses
Medical insurance
Team building and corporate events
Free tea, coffee, and beverages
Flexible schedule
Family medical insurance
Free parking
Employee referral program

Contacts

Website: https://www.onetick.com/

Phone: +37460460479

Address: Yeraz Business Center, bldg 2 (Adontsi 2), Yerevan, Armenia