Summary:
We seek a skilled Database Reliability Engineer (DBRE) to ensure the reliability, scalability, and performance of our Azure-hosted PostgreSQL databases, which form the backbone of our microservices architecture. The ideal candidate will collaborate with developers, system administrators, and cross-functional teams to optimize and manage database environments. You will oversee availability, backup, disaster recovery, security, and performance tuning while applying modern SRE and DevOps principles to drive continuous improvement.
This role is ideal for someone who excels at solving complex problems, thinks critically, and thrives in a fast-paced, high-accountability environment.
Key Responsibilities:
Database Management & Reliability:
Ensure high availability and reliability of PostgreSQL databases hosted on Azure.
Monitor database health, performance, and security using Azure-native and third-party tools.
Automate routine database management tasks, including backups, scaling, and failover.
Manage disaster recovery and backup strategies, ensuring the integrity of stored data.
Performance Optimization:
- Diagnose and resolve database performance issues, such as slow queries, locking, or inefficient resource utilization.
- Tune PostgreSQL databases for optimal performance based on the workload and usage patterns.
- Implement monitoring tools and dashboards for database performance metrics.
Infrastructure & Automation:
- Leverage Infrastructure-as-Code (IaC) tools (e.g., Terraform, Bicep, ARM templates) to automate the deployment and configuration of Azure database resources.
- Collaborate with development and platform engineering teams to integrate database performance monitoring, scaling, and other reliability measures into CI/CD pipelines.
Security & Compliance:
- Implement and enforce best practices for database security, including encryption, access controls, authentication, and auditability.
- Ensure compliance with relevant industry standards and regulations (e.g., SOC 2, ISO 27001) for data security and retention.
Incident Response & Troubleshooting:
- Respond to database-related incidents, ensuring timely resolution.
- Investigate and troubleshoot issues related to database performance, connectivity, and Azure cloud infrastructure.
Continuous Improvement:
- Identify and resolve potential bottlenecks in database reliability and performance.
- Contribute to the improvement of database architecture, schema design, and indexing strategies.
- Mentor developers and other team members on best practices for database reliability, generally, and PostgreSQL in particular.
Knowledge, Skills, & Competencies:
- Strong knowledge of PostgreSQL internals, query optimization, replication, and high availability techniques.
- Proficiency in using Azure tools such as Azure Monitor, Azure SQL Insights, and Azure Security Center.
- Proficiency in shell scripting, Python, or PowerShell for automating database tasks.
- Strong problem-solving abilities, especially under pressure during incident response.
- Excellent communication skills to work collaboratively with cross-functional teams.
- Ability to work in a fast-paced environment and handle multiple priorities.
#LI-Remote Pay Range: - , General Benefits:
You can also use your social account to sign in. First you need to:
Accept Terms & Conditions and Privacy Policy