Contributing to the LangGraph Chain

This guide covers working on backend/chain/, the 8-node LangGraph AI pipeline. Repository: aharbii/movie-finder-chain

For cross-cutting conventions (branching, commits, PRs, releases), see the Contributing Overview.

Pipeline overview

```text
classify → search_rag → enrich_imdb → reason → route
                                                  │
                    ┌─────────────────────────────┤
                    │                             │
                 refine (≤3×)               confirm → qa_agent
                    │                             │
                  dead_end ◄────────────────── (exhausted)
```
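The sketch below shows one plausible reading of the route step: a LangGraph-style routing function that returns the name of the next node. It is not the chain's actual code. The state fields `confidence` and `refinement_count` are hypothetical. The two constants are placeholder stand-ins for the CONFIDENCE_THRESHOLD and MAX_REFINEMENTS settings, which the real chain reads from config.py.

```python
from typing import Literal, TypedDict

# Placeholder stand-ins for the CONFIDENCE_THRESHOLD and MAX_REFINEMENTS settings.
CONFIDENCE_THRESHOLD = 0.7
MAX_REFINEMENTS = 3


class RouteState(TypedDict, total=False):
    """Hypothetical subset of MovieFinderState used by the router."""

    confidence: float
    refinement_count: int


def route(state: RouteState) -> Literal["confirm", "refine", "dead_end"]:
    """Pick the node that follows `reason` (sketch, not the chain's real logic)."""
    if state.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD:
        return "confirm"  # confident enough: ask the user to confirm the match
    if state.get("refinement_count", 0) < MAX_REFINEMENTS:
        return "refine"  # retry with a refined query, at most MAX_REFINEMENTS times
    return "dead_end"  # refinements exhausted


print(route({"confidence": 0.9}))  # confirm
print(route({"confidence": 0.2, "refinement_count": 1}))  # refine
print(route({"confidence": 0.2, "refinement_count": 3}))  # dead_end
```

In LangGraph, a function like this would typically be attached to the `reason` node with `add_conditional_edges`.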

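All state is carried in MovieFinderState, a TypedDict defined in src/chain/state.py. Nodes are pure functions. Each node receives the full state and returns a partial update: a dict containing only the fields it changed. LangGraph then merges that update into the state.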
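The sketch below illustrates the partial-update contract using a hypothetical three-field state. The field names are invented and do not come from the real MovieFinderState. The `apply_update` helper is also hypothetical. It mimics LangGraph's default behaviour for fields without a reducer, where a returned value simply overwrites the previous one.

```python
import asyncio
from typing import Any, TypedDict, cast


class DemoState(TypedDict, total=False):
    """Hypothetical stand-in for MovieFinderState (field names are invented)."""

    query: str
    intent: str
    candidates: list[str]


async def classify(state: DemoState) -> dict[str, Any]:
    """Toy node: reads the query and returns only the field it sets."""
    query = state.get("query", "")
    return {"intent": "search" if query else "unknown"}


def apply_update(state: DemoState, update: dict[str, Any]) -> DemoState:
    """Hypothetical helper: overwrite keys, like LangGraph's default (no-reducer) fields."""
    return cast(DemoState, {**state, **update})  # builds a new dict; input is untouched


state: DemoState = {"query": "heist movie with a twist ending", "candidates": []}
update = asyncio.run(classify(state))
print(update)  # {'intent': 'search'}
new_state = apply_update(state, update)
print(new_state["intent"], state.get("intent"))  # search None
```

The node never mutates its input. The merge produces a new state, so the original dict still lacks `intent`.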

Development setup

The chain runs inside the backend Docker stack. From backend/chain/:

```shell
make dev         # build + start chain container with volume mount
make shell       # attach a shell to the running container
make test        # run pytest inside Docker
make lint        # ruff check + format check
make typecheck   # mypy --strict
make pre-commit  # all hooks
```

You can also start the chain from the backend root:

```shell
cd backend/
make up          # starts the full stack, including the chain
```

Adding a new node

Create src/chain/nodes/<name>.py and implement the node as an async function that returns a partial state update:

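```python
from typing import Any

from chain.state import MovieFinderState  # defined in src/chain/state.py


async def my_node(state: MovieFinderState) -> dict[str, Any]:
    """One-line summary of what the node does (Google-style docstring)."""
    value = state.get("some_field", "")  # placeholder field; always .get() with a default
    result = value  # ... node logic goes here
    return {"result_field": result}  # return only the fields this node updates
```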

Register the node in src/chain/graph.py by adding it to the graph builder and connecting its edges.
If the node reads or writes new fields, add them to MovieFinderState in state.py.
Write tests in tests/nodes/test_<name>.py.
Update the pipeline diagrams docs/architecture/plantuml/04-langgraph-pipeline.puml and 05-langgraph-statemachine.puml.

State rule: MovieFinderState has total=False (issue #15), so any field may be missing. Always read fields with .get() and a safe default. Never index directly.
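The sketch below shows why the rule matters. It uses a hypothetical single-field TypedDict in place of MovieFinderState. With total=False, an empty dict is a valid state. Direct indexing raises KeyError, while .get() falls back to the default.

```python
from typing import TypedDict


class PartialState(TypedDict, total=False):
    """Hypothetical subset of MovieFinderState; every key is optional."""

    refinement_count: int


def count_unsafe(state: PartialState) -> int:
    return state["refinement_count"]  # raises KeyError if the field was never set


def count_safe(state: PartialState) -> int:
    return state.get("refinement_count", 0)  # falls back to a safe default


empty: PartialState = {}
print(count_safe(empty))  # 0
try:
    count_unsafe(empty)
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'refinement_count'
```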

Code standards

mypy --strict must pass on every node.
Nodes are pure functions: no shared mutable state between calls.
No os.getenv() in node files; read settings from config.py (Pydantic BaseSettings).
No print(); use logging.getLogger(__name__).
Async all the way: no blocking I/O inside async functions.
Line length: 100 characters.
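The sketch below applies the logging and async rules to a toy node. It assumes a hypothetical blocking helper, `load_stopwords`, which stands in for any synchronous library call or file read. The node logs through a module logger instead of print(). It also moves the blocking call onto a worker thread with asyncio.to_thread (Python 3.9+), which keeps the event loop responsive. If a library offers an async client, awaiting that client directly is usually the better choice. To keep the block self-contained, the state is typed as a plain dict rather than MovieFinderState.

```python
import asyncio
import logging
import time
from typing import Any

logger = logging.getLogger(__name__)  # module logger instead of print()


def load_stopwords() -> frozenset[str]:
    """Hypothetical blocking helper (simulates slow synchronous I/O)."""
    time.sleep(0.05)
    return frozenset({"a", "an", "the"})


async def normalize_query(state: dict[str, Any]) -> dict[str, Any]:
    """Toy node: strips stopwords without blocking the event loop."""
    stopwords = await asyncio.to_thread(load_stopwords)  # runs in a worker thread
    words = [w for w in state.get("query", "").lower().split() if w not in stopwords]
    logger.debug("normalized query to %d words", len(words))
    return {"normalized_query": " ".join(words)}


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(asyncio.run(normalize_query({"query": "The Matrix"})))  # {'normalized_query': 'matrix'}
```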

Testing

```shell
make test           # all tests
make test-coverage  # with coverage report
```

Mock external services (Qdrant, OpenAI, Anthropic, imdbapi); unit tests must not make real API calls.
pytest runs with --asyncio-mode=auto, so async def test_* functions work without @pytest.mark.asyncio.
Coverage must not regress.
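The sketch below is a node test that follows these rules. It assumes the node reaches Qdrant through a hypothetical module-level `vector_store` object; the real chain may wire its clients differently. unittest.mock.AsyncMock replaces the async search call, so no network request is made. Under --asyncio-mode=auto, pytest would collect the async test function as-is. The `__main__` guard only lets the file run standalone.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock, patch


class VectorStore:
    """Hypothetical wrapper around the Qdrant client."""

    async def search(self, query: str, limit: int) -> list[dict[str, Any]]:
        raise RuntimeError("real network call: must be mocked in unit tests")


vector_store = VectorStore()  # hypothetical module-level client used by the node


async def search_rag(state: dict[str, Any]) -> dict[str, Any]:
    """Toy node: queries the vector store and returns the hits."""
    hits = await vector_store.search(state.get("query", ""), limit=5)
    return {"rag_results": hits}


async def test_search_rag_returns_hits() -> None:
    fake_hits = [{"title": "Heat", "score": 0.91}]
    with patch.object(vector_store, "search", new=AsyncMock(return_value=fake_hits)) as mock:
        result = await search_rag({"query": "heist thriller"})
    assert result == {"rag_results": fake_hits}
    mock.assert_awaited_once_with("heist thriller", limit=5)


if __name__ == "__main__":
    # Under pytest --asyncio-mode=auto the test is collected directly; here we run it by hand.
    asyncio.run(test_search_rag_returns_hits())
    print("ok")
```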

Environment variables

Copy backend/chain/.env.example to backend/chain/.env and fill in:

ANTHROPIC_API_KEY, OPENAI_API_KEY
QDRANT_URL, QDRANT_API_KEY_RO, QDRANT_COLLECTION_NAME
CLASSIFIER_MODEL, REASONING_MODEL
RAG_TOP_K, MAX_REFINEMENTS, IMDB_SEARCH_LIMIT, CONFIDENCE_THRESHOLD
LANGSMITH_TRACING, LANGSMITH_API_KEY, LANGSMITH_PROJECT  (optional)