Hands-on MLOps / ML Platform Engineer with 5+ years of experience taking machine-learning systems from experiment to reproducible production. I build and operate MLflow tracking, Argo Workflows training pipelines, and Git-centric MLOps for data-science teams on GPU-ready Kubernetes, with deep experience in air-gapped delivery, observability, and developer enablement. Strong background across AWS, Terraform, and multi-tenant SaaS platforms, blending in-house leadership at Afiniti with global consulting experience. Certified Kubernetes Administrator (CKA).
My career journey began unexpectedly when the pandemic delayed my graduation, allowing me to start as a System Administrator before completing my degree. That early grounding in IT support and Linux administration gave me a strong foundation to build on as I moved into DevOps and platform engineering.
Over the past few years I've focused on MLOps and ML platform work — rolling out MLflow and Argo Workflows so data-science teams get reproducible experiment tracking, model registries, and automated training pipelines on GPU-ready Kubernetes. I've authored an offline-capable Python deployment CLI that packages complex ML stacks into single-command, air-gapped bundles, and built Claude agents, skills, and a Bitbucket MCP server to compress repo-to-deployment cycles from hours to minutes. Alongside this I design multi-tenant SaaS foundations on AWS, fully codified in Terraform.
I'm a Certified Kubernetes Administrator (CKA) and recently completed my MS in Computer Science at NUST. Outside of work, I enjoy baking, walking, and exploring nature — activities that help me stay creatively inspired and recharge.
Dec 2022 - Present
Rolled out an MLflow + Argo Workflows stack to standardize experiment tracking, model/artifact storage, and automated training pipelines, onboarding data-science teams onto a Git-centric MLOps workflow on GPU-ready Kubernetes.
Authored an offline-capable Python deployment CLI that vendors dependencies, pre-pulls images, and packages complex ML stacks into single-command, air-gapped bundles for regulated customers.
Built Claude agents, skills, and a Bitbucket MCP server to auto-generate CI/CD pipelines and Kubernetes manifests, compressing repo-to-deployment cycles from hours to ~10 minutes.
Designed multi-tenant SaaS foundations (VPC, EKS, EC2, PrivateLink, S3, IAM) codified in Terraform, with AWS guardrails and knowledge bases for on-prem and cloud AI agents.
Operated Rancher-powered Kubernetes clusters, ported legacy shared-memory apps to containers, and established SonarQube-backed code-scanning pipelines.
May 2022 - Nov 2022
Built scalable REST APIs in Node.js/Express, integrating open-source and third-party tools.
Developed parallel task APIs to make better use of multi-core CPUs, and ran Artillery stress tests to find and eliminate performance bottlenecks.
Implemented JWT authentication to secure API access.
Exported log-derived metrics to Prometheus via Promtail and integrated Grafana dashboards for performance monitoring.
Automated monitoring-stack configuration with Ansible for consistent, repeatable setups.
Feb 2021 - Apr 2022
Managed Rancher-based Kubernetes clusters for development and production, using GitLab CI/CD pipelines for automated builds, tests, and deployments.
Deployed and managed EKS clusters on AWS with eksctl, using GitOps via GitHub for client projects.
Administered infrastructure with VMware ESXi, Ansible, and Terraform, and configured production databases with read replicas and backups.
Migrated company infrastructure from AWS to on-prem servers, minimizing downtime and resolving migration issues.
Consolidated a client's multi-cloud infrastructure (DigitalOcean, AWS, Azure) into a unified AWS solution.
May 2020 - Sep 2020
Set up CentOS-based VMs to deploy PHP applications and host MySQL databases, and secured the servers running PHP and Laravel apps with SSL certificates and hardening measures.
Implemented robust database backup policies using open-source tools.
High Availability and Disaster Recovery
Implemented high-availability PostgreSQL clusters with repmgr in a primary-standby (master-slave) architecture. Automated backups and disaster recovery using AWS Backup and custom scripts, ensuring robust data protection and rapid recovery across various use cases.
API Development
Developed a Node.js API to manage PostgreSQL database access, providing endpoints for CRUD operations and enabling parallel processing for improved performance.
Infrastructure Migration
Led infrastructure migrations from AWS to GCP, using Terraform and Ansible to automate the process and keep downtime to a minimum.
Security and Development
Configured OpenVPN within VPCs to give in-house developers secure access to internal resources.
Kubernetes Cluster Repair
Recovered a critical Kubernetes cluster whose components had broken after its internal certificates expired, repairing those components to safeguard crucial data and keep the system running.
Cloud Platform Utilization
Optimized cloud infrastructure using DigitalOcean managed databases with backups, DigitalOcean Kubernetes clusters for container orchestration, and GitHub CI/CD with the DigitalOcean App Platform for application delivery.
AWS Infrastructure Utilization
Used AWS services including VPCs, CloudFront, block storage, and WAF to host a production website with high availability, scalability, and security, with CloudFront caching to improve performance.
Website Performance Optimization
Reduced frontend load times for a WordPress site from 30 seconds to 2-3 seconds by serving its content through AWS CloudFront.
When I'm not immersed in MLOps and platform engineering, I love to
unwind by baking, taking long walks, and exploring the beauty of
nature. These activities help me recharge and keep my creativity
flowing. Whether it's trying out new recipes or simply enjoying the
peace of a nature trail, I find that a little time away from the
screen is just as important as the work itself.
If you're interested in connecting or discussing new opportunities,
feel free to reach out. I’m always open to sharing ideas,
collaborating, and learning from others in the tech community.
contact@szeeshan.me
zs.zeeshansaeed22@gmail.com
Peshawar, Pakistan