I am a researcher in the Cloud Reliability group at Microsoft Research in Redmond, WA.
Previously, I was tenure-track faculty at the Max Planck Institute for Software Systems (2018-2022), where I led the Cloud Software Systems group and held a dual appointment at Saarland University.
I received my PhD from Brown University (2011-2018) under the supervision of Rodrigo Fonseca, supported in part by a Facebook PhD Fellowship.
Research
My research focuses on designing and building reliable, observable, self-managing cloud systems. A central goal for me is to make it easier to operate large, complicated software systems and to understand their behavior at runtime. Currently, I am working at the intersection of observability, semantic modeling, and agentic AI.
Select Projects
Telemeta extracts and indexes semantic models from large-scale observability data, enabling accurate and reliable AI agents for cloud operations. This is an ongoing project I lead at Microsoft Research, so get in touch if you're interested in internships or collaborations!
Blueprint is an extensible compiler and benchmark suite for microservice applications. It simplifies prototyping by making it easy to reconfigure infrastructure choices without rewriting application code. Check out the project on GitHub.
Hindsight is a distributed tracing framework for edge-case tracing, i.e., capturing detailed traces for rare and outlier requests without the data loss of sampling-based systems. It combines per-node telemetry history, programmatic symptom detection, and rapid distributed retrieval. Hindsight appeared at NSDI 2023; code is on GitLab.
Clockwork is a DNN serving system designed for predictable performance. By eliminating sources of variability and centralizing scheduling and admission control, Clockwork achieves extremely tight tail latency. It received the Distinguished Artifact Award at OSDI 2020; code is on GitLab.
Pivot Tracing is a cross-component monitoring framework for distributed systems. Troubleshooting cross-component problems often requires information that is inaccessible due to a lack of end-to-end visibility. Pivot Tracing addresses this by combining causal metadata propagation with dynamic instrumentation, enabling operators to define, measure, and aggregate metrics across component boundaries using a simple SQL-like interface. It received the Best Paper Award at SOSP 2015; code is on GitHub.
Software
Research projects and code are scattered across a few locations:
- github.com/JonathanMace (personal projects)
- gitlab.mpi-sws.org/cld (MPI-SWS projects)
- github.com/brownsys (Brown University projects)
- github.com/tracingplane (Brown University projects)

Supervised Theses
I have been fortunate to work with many talented students over the years. In particular, I advised or co-advised the following theses: