Justin Huang

About

Coding Agents · AI Infra · AI-native Builder

I am Justin Huang / 黄振庭. I am not trying to be a generic “AI engineering student”, nor a single-topic researcher in RAG, multimodal models, or visual localization. A more accurate description: I have built production agent infrastructure and AI-native products, and I am now bringing that systems experience into coding agents and LLM post-training research.

Positioning

The point is not that I know a bit of everything. The thread is clearer than that: I have worked on runtime, tools, gateways, memory, observability, and prompt evaluation in real agent products. I now draw on that experience to think about coding agents, LLM post-training, long-horizon task synthesis, and executable evaluation.

An agent is not just a model call. It is an engineering system made of task state, tool protocols, execution environments, memory, evaluation, billing, tracing, rollout, and failure recovery. Many problems do not appear in demos, but become very concrete once the system has multiple users, long chains, asynchronous tool calls, and online accounting.

Systems I Care About

  • Agent Runtime
    From in-memory tasks to governable runtime state: lifecycle, tool calls, async execution, failure recovery, and traces need to be system-level concerns.
  • MCP Tool / Sandbox
    Agent-ready tools are not just APIs with wrappers. They need output protocols, execution environments, permission boundaries, reproducibility, and diagnosable failures.
  • OpenAPI Gateway
    Authentication, product tiers, billing, rate limiting, tool routing, logs, traces, rollout, fallback, and prompt evaluation for internal agent products.
  • Memory Service
    Multi-tenant memory is about where chat history ends and governable memory begins, and about permissions, updates, cleanup, and retrieval, not simply putting text into a vector store.
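
To make "governable runtime state" from the Agent Runtime item a little more concrete, here is a minimal sketch in Python. It is not the runtime I actually built; the names `TaskState`, `AgentTask`, `ALLOWED`, and `run_demo` are hypothetical and exist only for illustration. The idea it shows is that lifecycle transitions are declared explicitly, illegal ones are rejected, and every change leaves a trace entry that can be inspected after a failure.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    WAITING_TOOL = "waiting_tool"  # an async tool call is in flight
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed lifecycle transitions (an illustrative assumption, not a standard).
ALLOWED = {
    TaskState.PENDING: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.WAITING_TOOL, TaskState.SUCCEEDED, TaskState.FAILED},
    TaskState.WAITING_TOOL: {TaskState.RUNNING, TaskState.FAILED},
    TaskState.FAILED: {TaskState.PENDING},  # retry as one possible recovery path
    TaskState.SUCCEEDED: set(),
}


@dataclass
class AgentTask:
    task_id: str
    state: TaskState = TaskState.PENDING
    trace: list = field(default_factory=list)

    def transition(self, new_state: TaskState, reason: str = "") -> None:
        """Move to new_state if allowed, recording the change in the trace."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.trace.append((self.state.value, new_state.value, reason))
        self.state = new_state


def run_demo() -> AgentTask:
    """Walk one task through a tool timeout and a retry."""
    task = AgentTask("demo-1")
    task.transition(TaskState.RUNNING, "scheduled")
    task.transition(TaskState.WAITING_TOOL, "async tool call issued")
    task.transition(TaskState.FAILED, "tool timeout")
    task.transition(TaskState.PENDING, "retry")
    return task
```

In a real system the state and trace would live in durable storage rather than in memory, which is exactly the gap between a demo and a runtime that survives multiple users and long chains.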

Research Turn

I now focus more on coding agents, Terminal-Bench, data synthesis, sandbox construction, verifiers, SFT/RL data quality, and credit assignment in long-horizon tasks. The questions I care about are shaped by production experience: what kind of data trains long-horizon coding agents, what kind of environments and verifiers support reliable RL, and how should we diagnose failures in long agentic trajectories?

Writing

I do not want this blog to become a tutorial site or an AI-flavored brochure. I want to record why I was confused, why a problem appears in real engineering, how I initially misunderstood it, how I debugged it, which designs were trade-offs, and which parts I still have not fully figured out.

One sentence

This is not a site about a student with many scattered experiences. It is a site about a gradually forming thread: production Agent Infra in the past; coding agents, LLM post-training, long-horizon data synthesis, and executable evaluation now; and, long term, AI-native systems that can become both papers and products.

Outside the Terminal

Outside of coding and research, I still play piano and cello, take photos, and maintain my own knowledge base. They are not the main story of this site, but they help me step back from over-engineering and return to a more human rhythm.

About This Site

This site keeps the terminal, blog, and hacker-ish feel of Astro Theme Pure. I am not trying to redesign the whole visual system right now. The first priority is to make the content actually mine: real, restrained, technically grounded, and reflective.