About Traversal
Traversal is the AI Site Reliability Engineer (SRE) for the enterprise—already trusted by some of the largest companies in the world to troubleshoot, remediate, and even prevent the most complex production incidents. Our mission is to free engineers from endless firefighting and enable them to focus on creative, high-impact work.
Our roots remain deeply embedded in AI research, and we’re channeling that scientific rigor and creativity into building the premier AI agent lab for the enterprise. What we’re proudest of is assembling the most talented yet nicest group of individuals, from researchers at MIT, Harvard, and Berkeley to world-class engineers from industry (Citadel Securities, Cockroach Labs, Datadog, DE Shaw, ServiceNow, Glean, Perplexity, Pinecone, and more), to take on one of the hardest problems for AI to solve. Without the entire team, none of this would be possible.
The Role
As an AI Agents Engineer at Traversal, you’ll be responsible for designing and implementing intelligent agents that operate at the heart of our observability platform. You will work on deploying frontier LLMs in production, optimizing multi-step reasoning workflows, and improving the performance and accuracy of agentic architectures across distributed systems.
This is a highly technical role at the intersection of AI, infrastructure, and systems engineering. You’ll work across teams to ensure our AI agents interact seamlessly with large volumes of observability data and scale reliably across environments.
Responsibilities
- AI Agent Architecture: Design and build multi-step reasoning workflows that use LLMs and other components to analyze large-scale observability data.
- Prompting & Tooling: Develop tooling for prompt engineering, function calling, and agentic orchestration that optimizes latency, reliability, and performance.
- Production Deployment: Integrate AI components into a robust, scalable production environment with low latency and high uptime requirements.
- Experimentation: Prototype and test new LLMs, prompting strategies, and fine-tuning approaches to improve the quality and coverage of agentic responses.
- Collaboration: Work closely with backend, infra, and product teams to ensure AI agents are well-integrated into the broader system.
- Observability & Debugging: Build evaluation and monitoring tools for agent performance and reliability across production workloads.
Requirements
- 3+ years of software engineering experience.
- Strong Python skills, including experience with performance tuning and ML/AI libraries.
- Experience building and deploying distributed systems and/or large-scale data processing pipelines.
- Familiarity with LLMs, prompt engineering, or agentic frameworks.
- Experience with observability data (logs, metrics, traces) and live incident debugging.
- Proven ability to deliver projects end-to-end: from experimentation to production deployment and iteration.
- Strong debugging skills and ability to work across layers (data, infra, model, and application).
Nice to Have
- Experience with observability data (logs, metrics, traces) and production infrastructure.
- Familiarity with model fine-tuning, evaluation tooling, or reinforcement learning from human feedback (RLHF).
- Background in AI agent research or open-source contributions to agent frameworks.
- Experience working with Terraform, Kubernetes, or ML orchestration platforms.
Compensation
The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus startup equity, health insurance, and additional benefits. Our salary ranges are based on location, level, and role; individual compensation is determined by experience, skills, and job-related knowledge.
Why You Should Join Us
We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages, and we give careful consideration to every hire on our small, high-impact team.
Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.
Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.