Vibe Coding · Intermediate · 24 min read

Cursor SDK Tutorial: Build Programmatic Coding Agents

Learn what Cursor SDK is, how it works, local vs cloud setup, key use cases, pricing considerations, and production guardrails for TypeScript coding agents.

Cursor · AI Coding · Developer Tools · TypeScript

Cursor SDK landed at the exact moment coding agents stopped feeling like editor features and started looking like infrastructure. The important shift is not that Cursor can write code; it is that teams can now call Cursor agents from their own software.

That changes the shape of automation. A prompt that used to live in a chat panel can become a CI job, a repository maintenance bot, a ticket workflow, or an internal developer platform feature.

This guide explains how Cursor SDK works, where it fits beside the Cursor AI tutorial, and how teams can use it without giving an unchecked agent the keys to production. The practical goal is simple: start small, keep review gates, and only automate workflows that deserve to be repeated.

What Is the Cursor SDK and Who Is It For?

The Cursor SDK is a TypeScript API for running Cursor’s coding agent from applications, scripts, CI/CD jobs, and internal tools. In Cursor’s SDK announcement, the company described it as access to the same runtime, harness, and models that power Cursor’s desktop app, CLI, and web app.

The short version: Cursor SDK lets software start and manage Cursor agent runs. Instead of opening Cursor, typing a task into Agent, and waiting in the editor, a program can create an agent, point it at a repository, send a prompt, stream progress, and wait for the result.

That sounds small until the second or third use case appears. A release manager can trigger an agent to update migration notes. A CI pipeline can ask an agent to inspect failing tests and propose a fix. A developer portal can expose a button that creates a branch, applies a safe scaffold, and opens a draft pull request.

The SDK is best understood as a bridge between interactive vibe coding and repeatable engineering systems. Cursor already had the editor. Cursor CLI already made terminal-based agent work possible. The SDK adds the missing piece: a programmable surface for teams that want to embed coding agents into their own workflows.

As of May 2, 2026, the public examples center on @cursor/sdk, a TypeScript package. Cursor’s own quickstart examples use Node.js, create an Agent, send a prompt, and stream events from the run. The package is in public beta, so production teams should treat API details as moving parts and pin versions when building anything important.

The people who should pay attention first are not necessarily solo developers chasing a new toy. The highest-value early adopters are usually:

  • Platform engineers building internal developer tooling.
  • DevOps and CI owners tired of hand-triaging repetitive failures.
  • Engineering managers trying to reduce small maintenance tasks that never become roadmap work.
  • Product teams that want code-aware agents inside their own app.
  • Security-conscious teams evaluating whether agent execution should stay local, cloud-hosted, or self-hosted.

The common thread is repeatability. If a task happens once, opening Cursor or Cursor CLI is usually faster. If the same class of task appears every week, the SDK starts to make sense.

5 Things the Cursor SDK Adds Beyond an LLM API

The easiest mistake is to treat Cursor SDK like another wrapper around a model endpoint. That misses the point. A raw LLM API can generate code-shaped text; Cursor SDK gives an agent access to the surrounding machinery that makes code changes viable inside a real repository.

Here are the five additions that matter.

1. Codebase Understanding and Retrieval

Coding agents fail when they operate from a shallow view of the project. A model that sees only a prompt and a few pasted files will make confident guesses about naming conventions, dependencies, routing patterns, test layout, and error handling.

Cursor’s agent harness is designed around codebase understanding. The SDK inherits that advantage. Cursor says agents launched through the SDK can use codebase indexing, semantic search, and fast grep-style retrieval so the agent can find relevant files before making changes.

That matters more on large projects than small ones. In a 500-line prototype, a prompt can include enough context. In a mature application, the hard part is not writing code; it is finding the right place to write code and avoiding subtle conflicts with local conventions.

2. Tool Execution Around Real Repositories

The Cursor SDK can run agents against a local working directory or a cloud environment with a cloned repository. This is different from asking an LLM to “write tests” in the abstract. The agent can inspect files, propose edits, and work in the same kind of project structure engineers already use.

That is why SDK-based coding agents belong in the same mental category as AI agent frameworks, but with an important specialization: Cursor is optimized around software repositories. The agent’s world is source code, tests, terminal commands, branches, and pull requests.

3. MCP, Skills, Hooks, and Subagents

Cursor’s announcement highlights a full harness: MCP servers, skills, hooks, and subagents. These are not decorative features. They are how teams turn a generic coding agent into a workflow-aware agent.

MCP servers let agents connect to external tools and data sources. Skills give the agent reusable, repo-scoped procedures. Hooks let teams observe, control, and extend the agent loop. Subagents let a main agent delegate specialized subtasks to named agents with their own prompts and models.

For production use, this is where the SDK becomes interesting. A team can encode release rules, security review steps, deployment constraints, or documentation patterns once, then reuse them across multiple automated agent runs.

4. Streaming Runs and Durable Work

Programmatic agents need status, not just final answers. A long-running coding task can search the repository, edit files, run tests, hit failures, retry, and finally produce a branch. If the only output is “done” or “failed,” the system is hard to operate.

Cursor’s changelog for the SDK release describes Cloud Agents API updates around durable agents and per-prompt runs. It also mentions run-scoped follow-ups, status, cancellation, server-sent event streaming with reconnect support, lifecycle controls, and standardized response and error shapes.

Those details sound like API plumbing, but they decide whether an agent can be used inside real systems. CI dashboards, ticket workflows, and internal apps all need progress events, stable identifiers, cancellation, and understandable failure states.
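To make the operational point concrete, here is a minimal sketch of how a consumer might reduce a stream of run events into an operable summary. The event names and shapes below are assumptions for illustration, not Cursor's documented schema, which is still in beta.

```typescript
// Illustrative only: these event names are assumptions, not Cursor's documented schema.
type RunEvent =
  | { type: "status"; state: "queued" | "running" | "finished" | "failed" }
  | { type: "assistant"; text: string }
  | { type: "error"; message: string };

// Fold a stream of events into the three things an operator needs:
// a terminal state, the accumulated output, and any errors.
function summarizeRun(events: RunEvent[]): {
  finalState: string;
  output: string;
  errors: string[];
} {
  let finalState = "unknown";
  const chunks: string[] = [];
  const errors: string[] = [];
  for (const event of events) {
    switch (event.type) {
      case "status":
        finalState = event.state; // last status wins
        break;
      case "assistant":
        chunks.push(event.text);
        break;
      case "error":
        errors.push(event.message);
        break;
    }
  }
  return { finalState, output: chunks.join(""), errors };
}
```

The same reduction works whether events arrive over server-sent events, a polled API, or a replayed log, which is exactly why stable event shapes matter for dashboards and CI integrations.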

5. Model Access Through Cursor

The SDK can route work to models supported by Cursor. Cursor’s own examples include composer-2 and gpt-5.5 style model IDs, while its product pages highlight access to frontier models from Cursor, OpenAI, Anthropic, Google, and others.

That does not mean every model is right for every task. The operational question is cost versus reliability. Routine documentation cleanup may not need the most expensive model. A cross-repository refactor that touches build tooling probably does.

| Capability | Raw LLM API | Cursor SDK |
| --- | --- | --- |
| Repository awareness | Must be manually supplied | Built into Cursor harness |
| Code edits | Text output unless tooling is added | Agent can work against repo context |
| Workflow integration | Custom orchestration required | Agent/run API for scripts and apps |
| MCP and repo skills | Must be built separately | Supported through Cursor configuration |
| Best fit | Custom AI app logic | Programmatic software engineering tasks |

The practical takeaway is that Cursor SDK is a software-engineering harness first and a model interface second. Teams buying it only for “another way to call a model” will miss most of the value.

How Cursor SDK Agents Run Locally, in Cloud, or Self-Hosted

The runtime choice is the first architectural decision. Cursor SDK is useful partly because teams can start locally, move durable work into Cursor Cloud, and consider self-hosted workers when security or network access requires it.

Local Runtime

Local mode runs the agent against a working directory on the developer’s machine or on a controlled runner. It is the right starting point for experiments because setup is simple and feedback is fast.

Use local mode when:

  • The task is exploratory.
  • The repository is already checked out.
  • The agent should use the current branch and working tree.
  • A developer wants to inspect changes immediately.
  • The workflow is not yet worth running in the background.

Local mode also makes risk visible. If the agent edits too much, chooses the wrong files, or ignores tests, the problem appears before the workflow is embedded in CI or an internal app.

The tradeoff is durability. If the process stops, the machine sleeps, dependencies are missing, or local credentials are misconfigured, the run can fail. That is fine for early testing and annoying for repeated automation.

Cloud Runtime

Cloud mode is for background work. Cursor says cloud sessions started through the SDK run on the same optimized runtime used for Cloud Agents. Each agent gets a dedicated VM, a clone of the repository, sandboxing, and a configured development environment.

That makes cloud mode a better fit for jobs that should continue when a laptop closes or a developer changes context. A cloud agent can work on a branch, stream progress, and finish by opening a pull request or pushing a branch.

Use cloud mode when:

  • A task may take several minutes or longer.
  • Work should continue independently of a developer’s machine.
  • The output should be a branch, pull request, artifact, screenshot, or demo.
  • Multiple agents should run in parallel.
  • The workflow belongs in CI, a dashboard, or an internal tool.

Cloud mode also changes review expectations. Because the agent works away from the local editor, the output should land in a reviewable place. Draft pull requests are usually the right boundary. Direct merges are usually the wrong starting point.

Self-Hosted Workers

Self-hosted mode is for teams that want Cursor’s agent experience while keeping code and tool execution inside their own infrastructure. In Cursor’s self-hosted cloud agents write-up, a worker connects outbound to Cursor over HTTPS, and each agent session gets its own dedicated worker.

This matters for companies with private package registries, internal services, strict compliance rules, sensitive build artifacts, or network dependencies that cannot be exposed to a third-party cloud environment.

Use self-hosted mode when:

  • The agent needs access to internal networks or protected services.
  • Build artifacts or source code must remain inside the company’s environment.
  • Existing security controls require owned infrastructure.
  • The team already operates Kubernetes or worker fleets.
  • Compliance review will block a vendor-hosted execution environment.

Self-hosting is not automatically safer. It gives more control, but it also gives the team more operational responsibility. Worker lifecycle, credentials, logs, network policy, and update management still need engineering discipline.

| Runtime | Best For | Main Risk | First Sensible Task |
| --- | --- | --- | --- |
| Local | Prototypes and developer-driven experiments | Agent can touch the local working tree | Repository summary or docs cleanup |
| Cursor Cloud | Durable background tasks and PR automation | Over-broad prompts can create noisy branches | Failing test triage on a draft PR |
| Self-hosted | Regulated teams and internal-network access | More infrastructure to operate | Dependency or build fix using internal services |

Most teams should not begin with the most complex runtime. Start locally to prove the prompt and repo rules. Move to cloud when the workflow is valuable. Reach for self-hosted when security, compliance, or internal connectivity makes it necessary.

Teams still building their agent habits should treat this runtime choice as part of broader vibe coding best practices: keep changes inspectable, make context explicit, and prefer small reversible steps over dramatic automation leaps.

Cursor SDK Setup: Install, Authenticate, and Stream a Run

The first Cursor SDK project should be boring. A repository summary agent, documentation checker, or test-output explainer is better than a fully autonomous “fix everything” agent. Boring examples reveal the API shape without creating a large cleanup job afterward.

Teams that have never built agents before may want to pair this setup with a simpler foundation such as building your first AI agent before giving an SDK workflow real repository write access.

The official quickstart in the Cursor Cookbook repository currently uses Node.js 22 or newer and the @cursor/sdk package. The exact SDK shape can change during public beta, so pin dependencies and re-check the docs before building production workflows.

Step 1: Create a Small Node Project

Use a clean package or add a small script directory inside an existing repo. The quickstart examples use TypeScript, so a practical starter stack is Node.js 22+, TypeScript, and either tsx or a compile step.

Install the SDK:

npm install @cursor/sdk

If the team uses pnpm, the equivalent is:

pnpm add @cursor/sdk

Step 2: Create and Store a Cursor API Key

Cursor’s examples expect a CURSOR_API_KEY environment variable. Store the key like any other production credential. Do not commit it into .env files that are tracked by Git, paste it into CI logs, or expose it to untrusted pull request jobs.

For local testing:

export CURSOR_API_KEY="crsr_..."

For CI, use the platform’s secret manager. GitHub Actions secrets, GitLab CI variables, Buildkite secrets, and similar tools are designed for this. The agent should receive only the credentials it needs for the specific workflow.

Step 3: Create a Local Agent

This example starts a local agent, points it at the current working directory, sends a prompt, streams assistant text, and waits for the run to finish. It follows the public quickstart pattern while keeping the task read-oriented.

import { Agent } from "@cursor/sdk";

// Create a local agent bound to the current working directory.
const agent = await Agent.create({
  apiKey: process.env.CURSOR_API_KEY!,
  name: "Repo summary agent",
  model: { id: process.env.CURSOR_MODEL ?? "composer-2" },
  local: { cwd: process.cwd() },
});

// Send a read-only prompt; the agent inspects files but edits nothing.
const run = await agent.send(
  "Summarize this repository and list the most important entry points."
);

// Stream events and print assistant text as it arrives.
for await (const event of run.stream()) {
  if (event.type !== "assistant") continue;

  for (const block of event.message.content) {
    if (block.type === "text") {
      process.stdout.write(block.text);
    }
  }
}

// Block until the run reaches a terminal state.
await run.wait();

The prompt is intentionally narrow. It asks for a summary and entry points, not edits. Once the team understands streaming and run completion, the next step can be a controlled edit such as “update this README section based on these files” or “add missing tests for this one module.”

Step 4: Add Repository Instructions

The SDK becomes more reliable when the repository teaches the agent how to behave. Cursor supports rules, MCP configuration, skills, hooks, and subagents. For an SDK workflow, those files become part of the product surface.

Useful repo instructions include:

  • Which package manager to use.
  • Which tests must run before a change is acceptable.
  • Which directories are generated and should not be edited.
  • Which files require human review.
  • Which branch naming pattern to use.
  • What kind of output the automation expects.

This is where teams often discover that agent automation exposes missing engineering hygiene. If no one knows the correct test command, the agent will not magically know either. If the repo has six competing setup paths, the agent may pick the wrong one.
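As an illustration, a repo-level instruction file might encode the rules above. The exact filename, location, and format are assumptions here; Cursor's rules feature defines its own conventions, so check current docs before adopting this shape.

```markdown
<!-- Hypothetical example of repo instructions for an SDK-driven agent -->
- Use pnpm for all installs; never run npm install.
- Run `pnpm test` before declaring a change complete.
- Do not edit anything under `src/generated/` or `migrations/`.
- Changes under `billing/` require a human reviewer; stop and summarize instead.
- Create branches as `agent/<task-slug>`.
- End every run with a short diff summary and a list of remaining risks.
```

Writing these rules down once also benefits human contributors, which is a useful test: if a rule is too vague for a new hire, it is too vague for an agent.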

Step 5: Move From Read-Only to Draft Changes

After the first read-only task works, move to a draft-change workflow. A good second prompt might be:

Find one outdated code example in docs/sdk.md. Update only that example,
run the documentation lint command, and summarize the diff.

That prompt has a limited file target, a concrete success check, and a summary requirement. It is much safer than “clean up the docs,” which can create broad, subjective edits.

Production agent prompts should be written like small engineering tickets. They need a scope, a definition of done, allowed files, test expectations, and a fallback if the task is ambiguous.
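One way to enforce that discipline is to build prompts from a structured ticket instead of free text. The helper below is a hypothetical convention of our own, not part of the Cursor SDK; the field names are assumptions.

```typescript
// Hypothetical helper: render a scoped "ticket" into a prompt string.
// The AgentTicket shape is our own convention, not a Cursor SDK type.
interface AgentTicket {
  goal: string;
  allowedFiles: string[];
  doneWhen: string;
  testCommand: string;
  onAmbiguity: string;
}

function renderTicketPrompt(t: AgentTicket): string {
  return [
    `Task: ${t.goal}`,
    `Only edit these files: ${t.allowedFiles.join(", ")}`,
    `Definition of done: ${t.doneWhen}`,
    `Run \`${t.testCommand}\` before finishing.`,
    `If the task is ambiguous: ${t.onAmbiguity}`,
  ].join("\n");
}
```

A workflow that refuses to start a run without a filled-in ticket catches vague tasks before they burn tokens, because the missing field is visible at build time instead of in a noisy diff.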

What Can Teams Build With Cursor SDK Agents?

The strongest Cursor SDK workflows share one property: a human can describe the task, but would rather not repeat it manually for the hundredth time. The SDK is especially interesting for repetitive engineering maintenance that is valuable but rarely glamorous.

CI Failure Triage

A CI triage agent can receive failed job logs, inspect the repository, identify likely root causes, and propose a fix. The safest version opens a draft pull request with a summary, test output, and uncertainty notes.

This works best for common failure classes: snapshot updates, type errors, lint failures, dependency drift, flaky test quarantine suggestions, and straightforward build configuration mistakes. It works poorly when the underlying issue requires product judgment or coordination across teams.
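A practical way to respect that boundary is to classify the failure before invoking an agent at all, and escalate anything unrecognized to a human. The patterns below are illustrative assumptions about common log shapes, not a complete taxonomy.

```typescript
// Sketch: route CI log text to a known failure class, or "unknown" so a
// human handles it. The regexes are illustrative assumptions, not a spec.
const FAILURE_PATTERNS: Array<[string, RegExp]> = [
  ["snapshot", /snapshot.*(mismatch|obsolete|failed)/i],
  ["type-error", /error TS\d+:/],
  ["lint", /\beslint\b|\blint error\b/i],
  ["dependency", /Cannot find module|ERESOLVE/],
];

function classifyFailure(log: string): string {
  for (const [label, pattern] of FAILURE_PATTERNS) {
    if (pattern.test(log)) return label;
  }
  return "unknown"; // unknown classes escalate to a human, not an agent
}
```

The triage job then only starts an agent run for labels it has a tested prompt for, which keeps the automation inside the failure classes it actually handles well.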

Documentation Refresh

Documentation often falls behind code because it is not part of the critical path. An SDK agent can compare exported APIs, route definitions, CLI flags, or configuration schemas against docs and open small updates.

This is a good early workflow because the blast radius is lower than production code. It also teaches the team how to review agent output without risking runtime behavior.

Dependency Upgrade Assistants

Dependency updates are repetitive but detail-heavy. An agent can bump a package, inspect breaking changes, update imports, run tests, and summarize remaining failures.

The review gate matters here. Dependency upgrades can introduce security and runtime changes that tests do not catch. The agent can do the mechanical work; maintainers still need to evaluate release notes and risk.

Ticket-to-Draft-PR Workflows

Some teams want a workflow where a Linear, Jira, or GitHub issue can trigger an agent to create a draft branch. Cursor SDK is a natural fit because the agent can be started from code and can operate in a cloud runtime.

The ticket must be written well. “Fix onboarding” is not a task. “Add empty-state copy to the billing step when there are no saved payment methods” is closer. Agents amplify the quality of the input they receive.

Repository Health Bots

Repository health work is a strong fit for AI coding agents because it can be scoped and measured. Examples include unused dependency reports, missing test detection, stale TODO aggregation, broken internal links, generated type drift, or migration checklist updates.

The useful pattern is not one giant cleanup agent. It is a set of narrow agents, each responsible for one category of maintenance.
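That pattern can be as simple as a registry of one narrow prompt per maintenance category. The task names and prompt text below are illustrative assumptions, not a prescribed structure.

```typescript
// Sketch: one narrow, read-oriented prompt per maintenance category,
// instead of one broad "clean everything" agent. Names are illustrative.
const HEALTH_TASKS: Record<string, string> = {
  "unused-deps":
    "List dependencies in package.json that no source file imports. Do not edit anything.",
  "stale-todos":
    "Collect TODO comments older than the last release tag and group them by file.",
  "broken-links":
    "Check relative links in docs/ and report any that point to missing files.",
};

function taskPrompt(name: string): string {
  const prompt = HEALTH_TASKS[name];
  if (!prompt) throw new Error(`Unknown health task: ${name}`);
  return prompt;
}
```

A scheduler can then run each task on its own cadence and with its own review path, which keeps failures isolated to one category instead of one sprawling cleanup branch.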

Internal Product Features

The most ambitious use case is embedding a Cursor-powered agent inside an internal or customer-facing product. For example, a developer platform might let users request a service scaffold, a test harness, or a safe migration branch from a UI.

This is where the SDK is more compelling than the CLI. A product needs API-level control over prompts, models, runs, streaming, cancellation, artifacts, and state. A terminal command is not enough.

The deciding question is simple: would this workflow benefit from being a button, scheduled job, webhook, or product feature? If yes, the SDK is worth evaluating. If no, the editor or CLI may be enough.

Cursor SDK vs CLI, Claude Agent SDK, and Codex

Cursor SDK sits in a crowded but fast-maturing category. The right comparison is not “which agent is best?” The better question is “which surface fits the workflow?”

Cursor CLI is best when a developer wants to use Cursor Agent from the terminal. It is direct, interactive when needed, and useful for scripts or headless prompts. The SDK is better when another application needs to create, track, and manage agents as part of a larger system.

Claude Agent SDK is broader in language support and deeply tied to Anthropic’s Claude Code/Agent SDK ecosystem. Anthropic’s Claude Agent SDK overview emphasizes TypeScript and Python options, tool permissions, MCP, context management, and building custom agents on Claude’s harness.

OpenAI’s agent ecosystem splits across Codex and the OpenAI Agents SDK. Codex focuses on software engineering tasks in local, IDE, app, and cloud surfaces. OpenAI’s Agents SDK update focuses on building agentic applications with tools, handoffs, tracing, and controlled sandbox execution.

| Tool | Best Fit | Developer Surface | Main Strength |
| --- | --- | --- | --- |
| Cursor Editor Agent | Interactive coding inside Cursor | GUI/editor | Tight developer feedback loop |
| Cursor CLI | Terminal-based agent work | CLI | Fast prompts, scripts, and local automation |
| Cursor SDK | Programmatic coding agents | TypeScript API | Embedding Cursor agents in workflows and products |
| Claude Agent SDK | Claude-powered custom agents | TypeScript and Python SDKs | Claude-specific agent harness and tool controls |
| OpenAI Codex | Delegated software engineering | CLI, IDE, app, cloud | Parallel coding tasks and reviewable outputs |
| OpenAI Agents SDK | General agent applications | Python and TypeScript SDKs | Tools, handoffs, tracing, and sandboxed agent apps |

For teams already standardized on Cursor, the SDK has an obvious advantage: it uses the same agent family, repository conventions, and mental model developers already know. That reduces adoption friction.

For teams building model-provider-neutral platforms, the decision is more nuanced. Cursor SDK may be the best way to run Cursor agents, but a broader agent platform may still need Claude, Codex, and custom OpenAI agents for different tasks.

There is no permanent winner in this category. The agent stack is still changing quickly. Teams should optimize for clear interfaces, reversible choices, and output that can be reviewed through normal engineering workflows.

Security, Pricing, and Review Gates for Production Use

A coding agent with repository access is operationally powerful. A coding agent with terminal access, credentials, branch permissions, and cloud runtime access is more powerful still. The SDK deserves the same control mindset teams apply to CI systems and deployment bots.

Scope the Agent’s Workspace

The first production rule is file scope. Agents should know which directories they may edit and which ones they must leave alone. Generated files, migrations, infrastructure code, secrets, lockfiles, and payment flows often need stricter controls.

Local mode should run in disposable branches when possible. Cloud mode should create draft pull requests. Self-hosted mode should run under service accounts with minimum necessary access.
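File scope is also easy to verify mechanically before a branch is pushed. A minimal sketch, assuming the workflow can list the changed files from the diff; prefix matching keeps the example simple, where a real gate might use globs.

```typescript
// Sketch: reject agent runs whose diff touches files outside an allowlist.
// Returns the offending paths; an empty result means the diff is in scope.
function outOfScope(changedFiles: string[], allowedPrefixes: string[]): string[] {
  return changedFiles.filter(
    (file) => !allowedPrefixes.some((prefix) => file.startsWith(prefix))
  );
}
```

Running this check before opening the pull request turns "the agent should not touch billing code" from a hope in the prompt into a hard failure in the pipeline.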

Protect Secrets

Never give an agent broad credentials because “it might need them.” Most coding tasks do not require production secrets. If a workflow needs package registry access, provide a scoped token. If it needs cloud access, use a limited service account. If it needs database access, point it at a disposable environment.

The uncomfortable truth is that agent safety depends heavily on old-fashioned security basics: least privilege, short-lived credentials, isolated environments, clear logs, and human review.

Require Tests and Evidence

Every editing workflow should define the required checks before the run starts. The agent should know which command to run and what to do if it fails.

For example:

Run npm test -- --runInBand after editing. If tests fail, attempt one fix.
If tests still fail, stop and summarize the failure without making more changes.

That instruction prevents infinite wandering. It also produces a reviewable trail: task, edits, test command, result, remaining risk.

Use Pull Requests as the Safety Boundary

The safest default output is a draft pull request. It gives engineers a familiar review surface, preserves diffs, runs normal CI, and supports discussion.

Direct commits to protected branches should be off the table for early workflows. Auto-merge can come later, but only for narrow tasks with strong tests and low blast radius.

Watch Spend and Model Selection

Cursor says the SDK is billed through standard token-based consumption pricing, and Cursor’s pricing pages explain that model selection affects usage consumption. That means SDK workflows need cost visibility.

Teams should log run counts, model choices, average duration, retry rates, and failure rates. A cheap model that fails repeatedly can be more expensive than a stronger model that finishes once. A cloud agent that works for 20 minutes on an ambiguous prompt can be more costly than a human rewriting the ticket.

The best cost control is prompt quality. Narrow tasks, clear file targets, and explicit stop conditions reduce both runtime and review burden.
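Cost visibility does not need heavy tooling to start. A minimal sketch of per-workflow aggregation follows; the RunRecord fields are our own logging convention, not an SDK type, and real records would also carry token counts once billing data is available.

```typescript
// Sketch: aggregate per-workflow run metrics so cost and reliability
// conversations use data. RunRecord is our own convention, not an SDK type.
interface RunRecord {
  workflow: string;
  model: string;
  durationSec: number;
  succeeded: boolean;
}

function workflowStats(records: RunRecord[], workflow: string) {
  const runs = records.filter((r) => r.workflow === workflow);
  const successes = runs.filter((r) => r.succeeded).length;
  const totalSec = runs.reduce((sum, r) => sum + r.durationSec, 0);
  return {
    runs: runs.length,
    successRate: runs.length ? successes / runs.length : 0,
    avgDurationSec: runs.length ? totalSec / runs.length : 0,
  };
}
```

A workflow with a 40 percent success rate and long average duration is a prompt problem or a model-selection problem, and this kind of summary surfaces it before the invoice does.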

Build a Human Override

Every production agent system needs cancellation and escalation. Cursor’s updated Cloud Agents API references cancellation and lifecycle controls, which are exactly the kind of controls operators need.

Human override is not a sign the agent failed. It is part of the system design. The agent handles routine work; humans handle ambiguity, product judgment, security risk, and final accountability.

Cursor SDK FAQ: Practical Answers Before You Build

What is the Cursor SDK?

Cursor SDK is a TypeScript package that lets developers run Cursor’s coding agent programmatically from scripts, applications, CI/CD workflows, and internal tools. It exposes the agent runtime behind Cursor’s editor, CLI, and web surfaces so software can create agents, send prompts, stream run events, and wait for results.

Is Cursor SDK the same as Cursor CLI?

No. Cursor CLI is a terminal interface for using Cursor Agent directly. Cursor SDK is an API for embedding Cursor agents into code. Use the CLI when a developer wants to run a prompt from a shell. Use the SDK when a product, pipeline, dashboard, webhook, or scheduled job needs to create and manage agent runs.

Is Cursor SDK available to everyone?

Cursor announced the SDK as a public beta for all users on April 29, 2026. Because it is a beta, teams should expect changes, pin package versions, and check official docs before relying on specific event shapes or API behavior in production.

What language does Cursor SDK use?

The public SDK surface is TypeScript through the @cursor/sdk package. Cursor’s starter examples use Node.js and TypeScript. Teams using Python, Go, or another backend can still call a small TypeScript service or script, but they should not assume first-party SDKs for other languages exist unless Cursor announces them.

Does Cursor SDK support MCP servers?

Yes. Cursor says SDK-launched agents can use MCP servers as part of the same harness that powers Cursor’s other agent surfaces. MCP is useful when the agent needs structured access to tools such as issue trackers, documentation systems, internal APIs, or other data sources.

Can Cursor SDK agents open pull requests?

Cursor’s announcement describes cloud agents that can open pull requests, push branches, or attach demos and screenshots when they finish. The right production pattern is usually a draft pull request with clear run notes, test output, and a human reviewer.

How much does Cursor SDK cost?

Cursor says the SDK uses standard token-based consumption pricing. Exact cost depends on the selected model, prompt size, repository context, output length, retries, and runtime behavior. Teams should monitor usage from the start instead of waiting for monthly billing surprises.

Should teams use local, cloud, or self-hosted mode first?

Most teams should start locally, because local mode exposes behavior quickly and keeps the first experiment simple. Move to cloud mode for durable background tasks and pull request automation. Consider self-hosted workers when code execution must stay inside the organization’s own network or when the agent needs internal dependencies and services.

How is Cursor SDK different from Claude Agent SDK?

Cursor SDK is built around Cursor’s coding-agent runtime and Cursor-supported models. Claude Agent SDK is built around Anthropic’s Claude agent ecosystem and supports TypeScript and Python. Teams already using Cursor heavily may prefer Cursor SDK for repository automation. Teams building Claude-specific agents or Python-native workflows may prefer Claude’s SDK.

What should a first Cursor SDK project be?

The best first project is a low-risk, read-heavy workflow: summarize repository structure, find stale docs, inspect failing test logs, or suggest missing tests for one module. Avoid broad write access until the team has prompt patterns, rules, logging, and review gates in place.

Final Takeaway for Teams Evaluating Cursor SDK

Cursor SDK is a sign that coding agents are moving from “ask the editor” to “wire agents into the engineering system.” That is a real shift, but it does not remove the need for engineering judgment.

The best early implementation is narrow: one repo, one repeated task, one runtime, one review path, and one clear definition of done. A team that proves value with a documentation refresh agent or CI triage assistant will learn more than a team that starts with an all-purpose autonomous developer.

Treat Cursor SDK as infrastructure for repeatable coding workflows, not a magic layer over software complexity. Pair it with explicit repo rules, scoped credentials, tests, draft pull requests, and normal code review. Used that way, it can become one of the more practical additions to the best vibe coding tools rather than just another launch-week SDK to experiment with and forget.



Vibe Coder

AI Engineer & Technical Writer
5+ years experience

AI Engineer with 5+ years of experience building production AI systems. Specialized in AI agents, LLMs, and developer tools. Previously built AI solutions processing millions of requests daily. Passionate about making AI accessible to every developer.

AI Agents LLMs Prompt Engineering Python TypeScript