Minma, Inc. Tokyo, JP
Tech Lead / Site Reliability Engineer May 2017 - Aug 2022
Competencies
- Data-Driven Decision Making
- Automation & Efficiency
- Cloud Architecture & Scalability
- System Reliability & Incident Management
- Leadership
Achievements
- Doubled platform throughput by designing a high-availability AWS infrastructure, mirroring the demands of deploying scalable machine learning models.
- Reduced incident response times by 60% by implementing a comprehensive monitoring solution (ELK stack, CloudWatch) for a 100+ node distributed system, critical for the continuous operation of ML applications.
- Reduced deployment cycles by 50% (25 minutes saved) by developing an automation framework with Ansible & Terraform that streamlined configurations and deployments, essential for agile development and machine learning workflows.
- Enabled dynamic scaling of core services by spearheading the transition to containerized applications using Docker and AWS ECS, crucial for agile machine learning model development and testing.
- Designed and implemented a scalable microservices architecture from zero to one, demonstrating an understanding of scalability essential for data-intensive machine learning operations.
- Reduced search latency from 6 seconds to under 1 second through strategic caching and a switch from Postgres to Elasticsearch, optimizing query performance and user experience.