About
i started in trading. not the kind where you have opinions about the market -- the kind where your system processes 25TB before the bell rings or you lose actual money. sub-millisecond latency. real consequences. that is where i learned what production means. not the conference-talk version. the version where something breaks at 2am and the P&L moves in the wrong direction.
i have been a founding engineer, an ML platform builder, and the person who gets called when a production system has to be rearchitected without downtime. every role had the same job -- make the system reliable enough that nobody has to think about it.
the layers between the model and the user -- inference runtimes, GPU scheduling, observability, cost attribution -- those are the parts i care about. not because they are glamorous. because they are the parts that determine whether an AI product is viable at scale... or just a prototype with good funding.
i am currently a founding engineer at a generative AI startup building the ML platform from zero. training pipelines, inference serving, custom CUDA kernels, multi-region Kubernetes. the whole stack.
Technologies
ML Inference & Optimization
PyTorch, TensorRT, TensorRT-LLM, vLLM, ONNX Runtime, Transformer Engine, torch.compile, FlashAttention, PagedAttention, Triton, Custom CUDA Kernels, Quantization (INT8/FP16/FP4), Pruning, Model Compilation, Nsight Compute, Nsight Systems, Diffusion Model Serving
GPU & Systems Programming
CUDA, CUDA C++, Tensor Cores, NVLink, GPUDirect, Rust, C++, Assembly, Go, Python, GPU Memory Management, Warp-Level Primitives, Kernel Fusion, Occupancy Optimization, Bare Metal GPU Programming
Infrastructure & Orchestration
Kubernetes, GPU Scheduling, Helm, Kustomize, Terraform, ArgoCD, Docker, Ray Serve, Apache Airflow, GitOps, Multi-Region Clusters, Cold-Start Optimization, CI/CD Pipelines, Ingress Controllers, Load Balancing
ML Platforms & Serving
MLflow, SageMaker, Model Serving Architectures, Distributed Training (DDP/FSDP), Mixed Precision, Gradient Checkpointing, Feature Stores, Batch vs Real-Time Pipelines, A/B Inference, Transformer Architecture Optimization
Observability & Networking
Prometheus, OpenTelemetry, Grafana, Distributed Tracing, Structured Logging, Profiling, TLS, DNS, Load Balancing, Ingress, Cross-Region Connectivity, gRPC, Low-Latency Networking
Cloud & Data
AWS, GCP, Multi-Cloud, CoreWeave, PostgreSQL, Redis, Kafka, Pinecone, Vector Databases, GPU FinOps, Cost Attribution, IAM, Secrets Management