Monitoring System Engineer
Join SOFTSWISS as a Monitoring System Engineer and be part of our exciting journey! We are searching for someone who not only has the expertise but also resonates with our culture and values.
Overview:
SOFTSWISS continues to expand its team and is looking for a Monitoring System Engineer.
If you're passionate about delivering top-notch service and consider yourself a proactive, positive thinker, we'd love to hear from you! We're eager for you to contribute to our team's success. If you're looking for a challenging and rewarding career opportunity, this could be the perfect fit.
Key responsibilities:
The two main pillars of our workflow are:
Responding to Events/Monitoring Alerts (L1/L2 tasks for certain parts of the system):
Providing on-duty coverage, including day and night shifts.
Handling incidents by troubleshooting and resolving issues, escalating to third-party or vendor support when necessary.
Directing issues or queries to the relevant department as needed.
Keeping detailed records and documentation of current infrastructure challenges and Root Cause Analyses (RCAs).
Contributing to safe and effective internal practices for AI usage in monitoring and incident response workflows.
Maintaining and Enhancing the Monitoring Systems:
Collaborating with other teams to understand and define their monitoring needs, then implementing the right solutions.
Setting up and adjusting the monitoring/observability systems for various teams.
Designing and tweaking alerts and dashboards to suit specific needs.
Refining alerts to reduce noise and make each notification more meaningful.
Improving dashboards for clarity and a more comprehensive view.
Building and maintaining integrations between the monitoring systems and other platforms, such as Jira and Opsgenie, when required.
Establishing and updating a Knowledge Base, covering system configurations, alert processes, troubleshooting guidelines, and user manuals.
Staying updated with the newest trends and best practices to continuously uplift our organization's monitoring capabilities.
Identifying opportunities to automate repetitive monitoring and support tasks, including with AI-assisted approaches where suitable.
Required Experience:
At least 3 years of experience as a Systems Engineer, SRE, DevOps Engineer, or Monitoring Support Engineer (L2+).
Good understanding of Linux operating systems (Debian-based distributions).
Experience with containerization, virtualization, and orchestration (LXC/LXD, Docker, Kubernetes).
Development experience in any scripting or programming language (Bash, Python, Go, etc.) and familiarity with REST APIs.
Knowledge of basic database concepts, including transactions and write-ahead logging (WAL); experience with PostgreSQL is preferred.
English proficiency at an Intermediate (B1) level or higher. It's crucial to understand technical terminology related to our specific tech stack and to be able to interpret technical documentation.
Practical interest in using AI-assisted tools for troubleshooting, automation, documentation, and operational efficiency:
- Ability to critically evaluate AI-generated output and validate it before using it in production environments.
- Understanding of the risks and limitations of AI usage in infrastructure and production operations.
Skills & Experience:
Monitoring/observability tools (experience with at least two of the following):
Zabbix (familiarity with concepts such as LLD, prototypes, dependencies, and preprocessing)
Grafana (knowledge of data sources, dashboard creation, and query usage)
Prometheus/VictoriaMetrics/etc. (understanding of metrics collection and alerting)
ELK/Splunk/etc. (ability to use queries and filters for log analysis)
Site24x7/Pingdom/etc. (experience with web monitoring and performance metrics)
Linux-like operating systems
Strong understanding of key concepts, including:
File systems
Process management
Built-in monitoring tools
Networks
Scripting
Troubleshooting
Familiarity with:
Kafka
RabbitMQ
GitLab
Nginx/Puma
ClickHouse
PostgreSQL
MongoDB
HashiCorp Vault
Microservices and orchestration (Kubernetes)
Any IaC/infrastructure automation tools:
- Provisioning tools (Terraform);
- Configuration management (Ansible, Salt, Puppet).
Any AI-assisted/AIOps tools
Our Benefits:
Full-time remote work opportunities and flexible working hours
Private insurance
One additional day off per calendar year
Sports program compensation
Comprehensive Mental Health Program
Free online English lessons with a native speaker
Generous referral program
Training, internal workshops, and participation in international professional conferences and corporate events
Department: DevOps
Role: Monitoring System Engineer
Locations: Tbilisi, Georgia; Brasília, Brazil; Cyprus; Malaysia
Remote status: Fully Remote
Employment type: Full-time