Hi, I’m Saurav Shrivastav

Software Engineer | Distributed Systems & Infrastructure Orchestration

I engineer the software that powers large-scale infrastructure. Currently, I am a Software Engineer at LinkedIn in the Reliability Infra organization, where I build distributed orchestration engines to manage a fleet of 80,000+ nodes. My work focuses on transforming complex operational challenges into scalable, “Infra-as-Code” solutions using Python, Go, and Temporal.

Read more about me on the About page, or check out my latest posts on the blog.

What I’m Building

Currently, I work on Hadoop/YARN/HDFS infrastructure at scale, with a growing focus on agentic automation:

  • Distributed Orchestration: Designed and implemented state-machine-driven remediation workflows using Temporal to manage host lifecycles across multi-datacenter deployments.
  • Infrastructure-as-Code: Automated cluster expansion and host provisioning systems for a massive Hadoop fleet, recovering significant underutilized hardware capacity.
  • Resource Management: Built heuristic-based allocation engines to proactively manage build pool hosts, reducing idle time by 88%.
  • Platform Modernization: Leading core service upgrades to modern API frameworks to improve data correctness and system resilience.

The Current Sprint: AI & Systems

I believe the next frontier of infrastructure is Autonomous Reliability. I am currently documenting my journey in building:

  • Agentic Workflows: Orchestrating LLMs via LangGraph to reason about and fix complex system faults.
  • Reliable AI: Integrating agentic loops with Temporal to ensure AI-driven remediations are consistent, durable, and safe for production.
  • System Internals: Deep-diving into Consensus Protocols (Raft/Paxos) and high-performance networking with gRPC.

What You’ll Find Here

  • Blog Posts: Technical deep-dives on distributed systems, race conditions, and building internal developer platforms.
  • Papershelf: Analysis of foundational research papers, from storage engines like LSM-Trees to the latest in AI orchestration.
  • Learning Journey: A public log of my builds, from gRPC log-intelligence services to self-healing grid agents.

Recent Notes & Articles

A comprehensive look at Large Language Models - from the Transformer architecture to the mechanics of inference and fine-tuning.

Tags: LLM, Transformers, Architecture, Inference, Fine-tuning

Demystifying the relationships between Artificial Intelligence, Machine Learning, and Deep Learning.

Tags: AI Fundamentals, Machine Learning, Deep Learning

Bridging the gap between research and production: A practical look at AI Engineering.

Tags: AI Engineering, MLOps, System Design, Introduction

View all notes →