BlueprintParser — Construction Blueprint Intelligence
An open-source pipeline that turns construction PDFs into structured, LLM-queryable data — with a human-in-the-loop viewer, automated takeoff, and on-demand YOLO object detection on top.
Your First Project in Five Minutes
If you have never seen BlueprintParser before, read this section first. It ignores the code, the AWS stack, and the tool-registry plumbing. It shows you the happy path a working estimator takes: upload a PDF, wait a minute, open the viewer, let BP find the things on the page, and export numbers for a bid.
1. Upload a PDF
From the dashboard at /home, drag a drawing set onto the upload card. BP accepts a normal multi-page PDF; pages are rasterized at 300 DPI, and oversized sheets are automatically re-rasterized at a lower DPI for OCR, so the common 24×36 and 30×42 sheet sizes work fine. You get a progress bar; when it finishes, the project appears in the project list.
Behind the scenes, the file is uploaded to S3 and a processing job is kicked off. You do not have to wait on the page; you can close the tab and come back later.
2. BP reads the pages
For each page, BP runs OCR, detects CSI MasterFormat codes (the industry-standard classification scheme — "08 14 00 = Wood Doors"), extracts drawing numbers from title blocks, detects schedules and keynotes, and classifies what's on the sheet. Expect on the order of a minute per ten pages; pages are processed in parallel (default concurrency of 8), and a 200-page set is usually ready in five to ten minutes on the default Fargate tier.
You don't have to do anything during this step. When the project card shows Ready, click in.
3. Open the viewer
The viewer lives at /project/[id]. It looks like a drawing review tool: a page sidebar on the left, a big canvas in the middle, a toolbar across the top, and a stack of panels you can flip open from the right edge. Pan with V, select with A, scroll to zoom (hold ⌘/Ctrl if you're on a trackpad). The panels on the right — Text, CSI, LLM Chat, QTO, Schedules, Keynotes — are the feature surface. You only open the ones you need.
4. Run detection and tag a schedule
BP's text pipeline already knows where the schedules and keynotes are. What it doesn't know, until you ask, is where every door and window physically is on the floor plans. That's a YOLO run (an admin kicks it off — see Section 5). Once it's done, open the Schedules/Tables panel, point at the door schedule, and click Auto Parse. Then pick which YOLO class the tags are drawn inside (usually circle) and run Map Tags. BP binds each schedule row to every matching shape in the drawings and gives you a count per row.
Auto-QTO (the QTO → Auto tab) does all of that on autopilot: pick a material type, confirm the schedule, run the mapping, review the counts, export.
5. Export
Everything in the QTO panel exports to CSV or Excel through the Export CSV button at the bottom of the panel. One row per tag or area item, with counts, pages, annotations, and notes. Paste it into the bid spreadsheet and you're done.
When something looks off
Most of the complexity in these docs is the answer to one question: what if the happy path doesn't work? If bucket fill leaks through an open doorway, Section 8 covers barriers and the four tuning knobs. If Auto-QTO blocks you at the start, Section 7 covers the YOLO class requirements and how to fix them from Admin. If chat runs out of context room on a big project, Section 9 explains the budget and the presets that trade structure for OCR. And if you want to know how the whole thing runs on AWS, Section 11 is the tour.
What BlueprintParser Is
BlueprintParser (BP) is an open-source, self-hostable platform that turns construction PDFs into structured, LLM-queryable data. You upload a multi-page drawing set; BP rasterizes each page, runs OCR, detects CSI MasterFormat codes, extracts structured text annotations, parses tables and schedules, classifies drawing regions, and produces a per-project projectIntelligence bundle — a compact description of the project that is small enough to fit inside an LLM context window but rich enough to answer detailed questions about quantities, trades, cross-references, and specifications.
On top of that structured layer, BP ships a full blueprint viewer with markup, takeoff, tag-mapping, and chat — plus an admin dashboard that runs on-demand YOLO object detection via SageMaker when you want the project to become spatially aware as well as textually.
Feature map: engines + viewer
BP is organized as a set of engines that produce structured data and a single Viewer that consumes it, with a Graph/Output layer that feeds everything back to the LLM and the Admin dashboard. Data flows upload → Preprocessing Engine → (optional) YOLO Post-Pipeline → Viewer surfaces (display + user parsing) → ParsedRegions → Graph/Output → LLM chat and downstream tools. Every stage persists to pageIntelligence or projectIntelligence; nothing is ephemeral.
- Preprocessing Engine (upload-time, always runs per page): rasterize at 300 DPI → Textract OCR (LAYOUT + TABLES) → drawing-number extraction → CSI code detection (3-tier matching) → text annotations (phones, equipment tags, abbreviations, 37+ types) → shape parse (keynote symbols via Python/OpenCV) → page intelligence analyze (classification, cross-refs, noteBlocks) → text-region classify (6-stage composite: LINE consumption, column-aware proposal, whitespace-rect discovery, Union-Find merge, per-region analysis, decision tree) → heuristic engine (9 rules, text-only mode) → table classifier → CSI spatial map (9×9 grid with title-block + right-margin zones).
- YOLO Post-Pipeline (admin-triggered, optional): SageMaker Processing job on g4dn.xlarge → YOLO annotations ingested → re-run heuristic engine with YOLO data → re-classify tables → composite region classifier (classifiedRegions) → YOLO density heatmap (text_box + vertical_area + horizontal_area aggregated on a 16×16 grid) → ensemble reducer (cross-signal agreement, suppresses keyword-only false positives) → auto-table-detector (emits AutoTableProposal[], read-only until user commits).
- Viewer (user surface, /project/[id]): canvas with pdf.js rasterizer + nine overlay layers, a dense toolbar, three mutually-exclusive modes (pointer/move/markup), a stack of toggleable right-side panels (Text, CSI, LLM Chat, QTO, Schedules/Tables, Keynotes, Specs/Notes, Page Intelligence, View All), a bottom Annotation Panel, and user-driven parsing tools (Table Parse, Keynote Parse, Notes Parse, Spec Parse [planned], Symbol Search, Bucket Fill, Shape Parse, Split Area, Scale Calibration). Section 02 enumerates the full tree.
- Graph / Output Layer (downstream consumers): every user-committed ParsedRegion promotes via /api/regions/promote into pageIntelligence.parsedRegions; CSI tags merge into pages.csiCodes via the idempotent mergeCsiCodes; computeProjectSummaries rebuilds projectIntelligence.summaries (schedules, notesRegions, specRegions, parsedTables, yoloTags). The context-builder assembles a budget-allocated LLM payload from all of the above; the CSI network graph + hub pages are derived once per project and surfaced in chat and the View All panel.
- Admin Dashboard: Pipeline config (toggle stages, concurrency, per-company heuristic overrides), Heuristics tab (DSL editor for rules), AI Models tab (register YOLO models, trigger runs), LLM Config (provider + context-budget allocations across 19 sections), Overview (reprocess controls + Lambda CV job status). Every viewer feature has a corresponding admin tuning surface.
How they connect: Preprocessing runs once per upload and populates JSONB blobs on pages. YOLO Post-Pipeline augments those blobs on admin trigger. The Viewer reads them into a Zustand store (17 slice hooks) and renders overlays; user parsing tools write back to the same blobs via the generic /api/regions/promote commit route. The Graph/Output layer re-derives summaries on every commit and serves them to Chat and the Admin dashboard. The whole stack is one database shape with one write path per mutation, which is why every number in the UI traces back to a pixel on a page.
The two data models
Everything in BP ultimately fits into two axes. Horizontally, a project is a list of pages; each page carries OCR text, a classification, detected text annotations, detected tables, CSI codes, and (optionally) YOLO detections. Vertically, a project is a bundle of cross-cutting data: annotations (user markups + YOLO + takeoff), pageIntelligence (per-page structured analysis), and projectIntelligence (a project-wide summary including the CSI network graph, hub pages, and discipline breakdown).
The preprocessing pipeline, the LLM context builder, the takeoff engine, and the viewer all read and write through those two shapes. Section 03 walks through how the data actually arrives; Section 11 walks through where it is stored and why.
What runs locally vs. what needs AWS
BP is the same codebase in every deployment tier — the difference is purely which external services are configured. A development machine with Docker and a Groq free-tier API key can run the full viewer against a locally-hosted PostgreSQL instance, parse tables with img2table and Camelot, and chat with an LLM, all without a single AWS credential. Add an S3 bucket and page images become durable; add the full Terraform stack and you get CloudFront, Textract, Step Functions, and Label Studio; add a SageMaker Processing role and a YOLO ECR image and YOLO object detection becomes available on-demand.
| Tier | Requires | Works | Does not work |
|---|---|---|---|
| Local Docker | Docker Compose, postgres:16, no AWS | Upload, viewer, CSI detect, table parse (img2table/Camelot/TATR), LLM chat via Groq free tier, heuristics, QTO (manual), Bucket Fill | Textract (falls back to Tesseract), SageMaker YOLO, CloudFront, S3 durability |
| Local + S3 | AWS creds for S3, S3_BUCKET, rest local | Everything Local Docker does, plus durable page/thumbnail storage and cross-device viewer load | Textract, SageMaker YOLO, CloudFront |
| Full AWS (CPU-only) | Terraform stack: ECS, RDS, S3, CloudFront, Step Functions, Textract, Secrets Manager | Production pipeline with Step Functions orchestration, Textract OCR, cached page CDN, multi-user auth, Label Studio | YOLO inference (no GPU) |
| Full AWS + SageMaker | Add SageMaker Processing role, a YOLO ECR image, sagemakerEnabled toggle | All of the above plus on-demand YOLO object detection on ml.g4dn.xlarge for Auto-QTO, tag mapping, symbol search | (nothing — this is the full stack) |
The /demo route hosts a read-only view of a seeded demo project, including YOLO detections, parsed schedules, and chat. It's the fastest way to kick the tires without installing anything.

Tech stack snapshot
BP is a single Next.js 16 application (App Router, React 19, TypeScript) backed by PostgreSQL 16 via drizzle-orm. State in the viewer lives in a single zustand store with slice selectors. LLM access goes through a thin adapter layer over the Anthropic, OpenAI, and Groq SDKs, plus a generic OpenAI-compatible endpoint for Ollama and self-hosted models. The CSI network graph is rendered with d3-force. Python sidecars (pdfplumber, Camelot, img2table, TATR, OpenCV, Tesseract, and the YOLO inference container) are spawned from TypeScript via stdin/stdout JSON. AWS deployment is codified in infrastructure/terraform/ — 13 files covering the full stack: ECS, RDS, S3, Step Functions, IAM, Secrets Manager, and CloudFront.
Inside the Viewer
The viewer lives at /project/[id] and is the primary surface for every user-facing feature in BP. It is a single React tree driven by a Zustand store with 17 slice selectors, backed by a client-side pdf.js rasterizer for the canvas and a series of overlay layers for annotations, markups, YOLO detections, keynotes, parse regions, and search highlights. Everything you do inside a project flows through this view.
Feature tree — brute-force inventory
Every feature under the Viewer, nested by the DOM/panel hierarchy it renders into. One-line description under each. If a feature has sub-modes or tabs, those are indented under the parent. This is deliberately exhaustive; skim for the shape, read for the specifics.
- Canvas core
  - PDFPage.tsx (pdf.js rasterizer) — Renders the current page as a bitmap at the user's zoom scale; caches the last 8 rendered pages as ImageBitmaps for instant tab-back.
  - Zoom / Fit / Pan controls — +/− buttons, Fit-to-window, wheel-zoom in Move mode, drag-to-pan in Move mode.
  - Thumbnail sidebar — Collapsible left-side page list with page-name + drawing-number labels; click to jump, scrolls synchronously with the main canvas.
- Modes (mutually exclusive, keyboard-bound)
  - Pointer (A) — Click-to-select overlays; double-click opens edit dialogs for markups, annotations, parsed regions.
  - Move / Pan (V) — Click-drag pans the canvas, wheel zooms. No overlay interaction.
  - Markup — Draw rectangle, polygon, or freehand stroke. Opens MarkupDialog on finish for name + note + color pick from 20-color palette.
  - Group / multi-select — Shift-click-add plus empty-canvas lasso; applies bulk ops (delete, recolor, category change) across selection.
- Canvas overlay layers (stable z-order, normalized 0–1 coordinates)
  - SearchHighlightOverlay — Yellow boxes around tsvector search hits from the toolbar text-search.
  - TextAnnotationOverlay — Boxes around detected phones, equipment tags, room names, abbreviations (37+ annotation types).
  - KeynoteOverlay — Keynote shape detections (circles, hexagons, diamonds) with inner-text OCR; gated by the showKeynotes toggle.
  - AnnotationOverlay — The master layer: YOLO detections, user markups, takeoff items, shape-parse output, symbol-search results. Click-to-select, drag-to-move, vertex-edit on polygons. Also hosts the draw-rect state machine for Parse flows.
  - ParseRegionLayer — Saved ParsedRegion outlines + grid preview, color-coded by type (keynote amber, notes blue, spec violet, schedule pink). Also renders the shared parseDraftRegion dashed preview while a user is actively parsing.
  - GuidedParseOverlay — Draggable row + column boundary lines rendered during a Guided Parse (keynote and notes share this via a prop-based API).
  - FastManualParseOverlay — Stage 4 Notes primitive: double-click snaps to Textract LINE, derives columns from line margins. Pending rework into the ParagraphOverlay primitive (paragraph-level hit-test + adjustable BB + Cmd+C/V template paste).
  - DrawingPreviewLayer — Rubber-band preview while the user is dragging to draw a new markup or bbox.
  - ParsedTableCellOverlay — TATR cell-structure overlay for parsed tables; click a cell to search by its text, double-click to toggle highlight.
- Toolbar
  - Back-to-dashboard + click-to-rename project name — Inline edit on the project name, persists to projects.name.
  - Zoom controls (− / % / +) + Fit — Symmetric bracketed zoom; Fit recalculates for the current page dimensions.
  - Mode toggle (Pointer / Pan / Markup) — 3-state button; keyboard shortcuts A, V.
  - Symbol Search button — Draw a template bbox to find all instances; exposes Lite / Power / Custom presets for confidence thresholds.
  - Menu dropdown — Labeling wizard (YOLO training export), Settings, Page Intelligence toggle, Admin link, Help tips toggle, Export PDF (disabled placeholder).
  - Text search — Full-text search over OCR via Postgres tsvector; highlights on canvas + lists hits in Text panel.
  - Trade filter / CSI code filter — Filter the CSI Network Graph, View All, and QTO lists by trade or specific CSI division.
  - YOLO toggle (+ per-model confidence sliders) — Shows when any YOLO annotation is loaded. Dropdown chevron opens per-model sliders (yolo_medium / yolo_primitive / yolo_precise).
  - Six panel toggles — Text / CSI / LLM Chat / QTO / Schedules/Tables / Keynotes (+ Specs/Notes in the D2 panel orchestrator).
- Right-side panels (toggleable, stackable)
  - Text Panel (TextPanel.tsx) — OCR text viewer, searchable, per-word Textract confidence, click-to-jump-to-canvas-position.
  - CSI Panel (CsiPanel.tsx) — Detected CSI MasterFormat codes grouped by division; toggle between page-scope and project-scope; click a code to highlight triggers on canvas.
  - LLM Chat Panel (ChatPanel.tsx) — Project- or page-scoped chat, streams via SSE; has 20 tools (search, read-page, highlight, zoom, list-schedules, count-takeoff, etc.).
  - QTO Panel (TakeoffPanel.tsx) — quantity takeoff
    - Count tab — Click-to-count with color-coded markers; auto-deduplicates via YOLO tag bindings where available.
    - Area tab (+ Scale Calibration + Bucket Fill + Split Area) — Polygon draw or bucket-fill flood with text-as-wall barrier detection; Scale Calibration is a 2-point known-dimension flow; Split Area slices a saved polygon.
    - Linear tab — Polyline length; same scale-calibration model as Area.
    - Auto-QTO tab — Suggests pages with likely schedules (ensemble-driven after Stage 2b); “Find & Parse Doors Schedule” style shortcuts auto-trigger Table Parse.
    - All tab — Flat list of every committed takeoff item, exportable to CSV.
  - Schedules/Tables Panel (TableParsePanel.tsx)
    - Auto Parse — Multi-method merger: OCR-positions, Textract TABLES, OpenCV lines, img2table. Returns a consolidated grid + confidence.
    - Guided Parse — User draws region, server proposes row/col boundaries, user drags to adjust, client extracts cells.
    - Manual Parse — User draws column BBs + row BBs; grid extraction runs client-side via word-center hit-test.
    - Compare / Edit — Side-by-side method outputs; edit cell text and re-save.
    - Map Tags section — Bind the tag column of a parsed table to YOLO tag instances; auto-infers scope + pattern.
  - Specs/Notes Panel (SpecsNotesPanel.tsx) — D2 orchestrator
    - Spec Parse tab — Stage 5 scope, currently stubbed. Will target full-page vertical-column spec layouts (PART / SECTION / GENERAL NOTES dense prose).
    - Notes Parse tab (NotesPanel.tsx)
      - Index — Project-wide table of detected note regions from summaries.notesRegions; row click jumps to page and opens Parser pre-filled with the region bbox.
      - Classifier — Per-page Accept / Edit / Reject cards for Layer-1 classified textRegions (notes-numbered + notes-key-value). Accept one-click-promotes via /api/regions/promote; Reject persists to rejectedTextRegionIds with stale-ID cleanup.
      - Parser, Auto sub-mode — Server runs parseNotesFromRegion (numbered-first, K:V fallback) + CSI detection; client shows dashed preview on canvas.
      - Parser, Guided sub-mode — Propose row/col boundaries, user drags on GuidedParseOverlay, client extracts grid.
      - Parser, Fast-manual sub-mode (pending rework) — Double-click Textract LINE to snap columns. Known-broken on dense multi-line paragraphs; scheduled for redesign as the ParagraphOverlay primitive.
      - Parser, Manual sub-mode — Draw column BBs + row BBs; grid extracted client-side via word-center hit-test. The always-works fallback.
    - Keynotes tab (KeynotePanel.tsx)
      - All Keynotes — Flat list of every parsed keynote table across the project; CSV export.
      - Auto / Guided / Manual / Compare — Same sub-mode taxonomy as Table Parse but scoped to bubble-keyed keynote grids.
  - Page Intelligence Panel (PageIntelligencePanel.tsx) — Read-only dump of pageIntelligence for the current page: classification, crossRefs, textRegions, noteBlocks, heuristicInferences, ensembleRegions. Debug/inspection surface.
  - View All Panel (ViewAllPanel.tsx) — Project-wide list with per-entity eye toggles (master-eye memento); surfaces schedules, parsed tables, keynotes, notes, specs, YOLO tags, CSI codes. Clickable-graph substrate for future LLM-side reasoning.
- Bottom Annotation Panel — Horizontal summary row grouping Markups, YOLO detections, and takeoff items by source; filter chips per category.
- Dialogs / modals
  - Markup dialog — Name + note + color on markup save.
  - Bucket Fill Assign dialog — Assign a filled region to an Area item + color; surfaces HTTP errors inline.
  - Scale Calibration dialog — Two-point calibration with known real-world distance + unit selector (ft, in, m, mm).
  - Symbol Search config — Confidence presets (Lite / Power / Custom) + per-project defaults.
  - Export CSV modal — Keynote / Schedule / Notes export with column selection.
- Standalone tools (trigger from toolbar or panels)
  - Symbol Search — Draw a bbox around any symbol on the page; CV matcher finds every other instance across the project.
  - Bucket Fill — Flood-fill area computation from a click point; text-as-wall paradigm with 1k/2k/3k/4k resolution slider; assigns to Area item with error surfacing.
  - Split Area — Slice a saved polygon with a user-drawn line into two children.
  - Shape Parse — Python/OpenCV keynote-shape detector (circles, hexagons, diamonds, pills, squares). Runs at upload; results live in pages.keynotes.
  - Scale Calibration — Per-page; stored in scaleCalibrations[pageNumber]. Required before any Area/Linear takeoff produces real-world units.
- ParsedRegion outputs (write path from Viewer into the graph)
type: "schedule"Tabular grid from Schedules/Tables panel.type: "keynote"Key → Description grid from the Keynotes tab.type: "notes"Notes-numbered or notes-key-value grid from Notes Parse.type: "spec"(Stage 5 planned)Section-header → body list from Spec Parse.type: "legend"Symbol legend variant; shares NotesData shape.
All types commit through the genericPOST /api/regions/promoteroute. Server merges CSI tags intopages.csiCodesviamergeCsiCodesand refreshesprojectIntelligence.summariesafter the transaction commits.
Anatomy
Top: the toolbar. Left: a collapsible page sidebar with thumbnails. Center: the canvas, which renders the current page and its overlays. Right: a stack of toggleable panels — Text, CSI, LLM Chat, QTO, Schedules/Tables, Keynotes, Page Intelligence — which fly in from the right edge when activated. Bottom: the Annotation Panel, a summary row grouping markups, YOLO detections, and takeoff items by source.
The toolbar
The toolbar is dense by design — a working estimator needs every mode and every panel within one click of the canvas.
From left to right: the back arrow returns to the project dashboard; the project name is click-to-rename; the - and + buttons bracket the current zoom percentage and the Fit button auto-fits the page; the 3-state mode toggle selects Pointer / Pan / Markup; the Symbol button opens a draw-a-bbox-to-find-all-instances workflow; the Menu button opens the dropdown (shown below). The right half of the toolbar carries the text search, the trade filter, the CSI code filter, the YOLO toggle (with per-model dropdown when multiple models are loaded), and the six panel toggles.
Modes: pointer, move, markup
The canvas has three mutually-exclusive modes, controlled by setMode() in the viewer store. The internal mode values are "pointer", "move", and "markup". Keyboard shortcuts are A (pointer) and V (pan/move); switching to Markup mode activates the drawing tools. Pointer mode clicks on overlays to select them; move mode click-drags the canvas to pan and mouse-wheel zooms; markup mode lets you draw rectangles, polygons, or freehand strokes.
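As a sketch of how that state is wired (assuming zustand's standard create API; the slice shape here is illustrative, while the mode values and the setMode() name come from the store described above):

import { create } from "zustand";

type ViewerMode = "pointer" | "move" | "markup";

interface ModeSlice {
  mode: ViewerMode;
  setMode: (mode: ViewerMode) => void;
}

// Illustrative store: the real viewerStore.ts holds many slices, not one.
export const useViewerStore = create<ModeSlice>((set) => ({
  mode: "pointer",
  setMode: (mode) => set({ mode }),
}));

// Keyboard bindings: A selects pointer, V selects move; markup is toolbar-only.
window.addEventListener("keydown", (e) => {
  if (e.key === "a") useViewerStore.getState().setMode("pointer");
  if (e.key === "v") useViewerStore.getState().setMode("move");
});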
Markup mode
Markup annotations are user-authored overlays: a rectangle, polygon, or freehand stroke with an associated name and optional multi-line note. Each markup gets one of twenty colors drawn from the TWENTY_COLORS palette (src/types/index.ts), and the markup dialog captures a name and note on save. Markups show up in the Annotation Panel at the bottom of the viewer under the MARKUPS category, and they're persisted to the annotations table with source = "user".
Menu dropdown
The menu collects operations that don't belong on the main toolbar: a labeling wizard for building YOLO training sets, a settings modal, a toggle for the Page Intelligence panel, a link to the admin dashboard, and a help tips toggle that reveals contextual tooltips across the UI. Export PDF is present but disabled — it's the obvious future feature.
YOLO controls in the toolbar
When a project has any YOLO annotations loaded, the purple YOLO button appears. It toggles the canvas overlay and opens the Detection Panel. When multiple models are loaded, the dropdown chevron reveals a per-model panel with independent confidence sliders — useful for tuning the output on a project where yolo_medium is noisy but yolo_precise is conservative.
To actually run inference, go to Admin → AI Models and start a SageMaker Processing job. Section 05 explains the full pipeline.

Right-side panel toggles
The right half of the toolbar holds six panel toggles. Panels slide in from the right edge and can be stacked. Each is independently toggleable and each keeps its own internal state.
| Panel | Purpose | Lives in |
|---|---|---|
| Text | OCR text viewer, searchable, shows per-word confidence from Textract. | TextPanel.tsx |
| CSI | Detected CSI MasterFormat codes grouped by division. Page / project scope. | CsiPanel.tsx |
| LLM Chat | Project- or page-scoped chat with 20 tools. Streams via SSE. | ChatPanel.tsx |
| QTO | Quantity takeoff: Count, Area, Linear, Auto-QTO, and All tabs. | TakeoffPanel.tsx |
| Schedules/Tables | Parsed tables with Auto / Guided / Manual / Compare tabs and Map Tags. | TableParsePanel.tsx |
| Keynotes | Detected keynote symbols (circles, hexagons) with per-shape summaries. | KeynotePanel.tsx |
Canvas overlays
The canvas mounts several overlay layers on top of the rendered page. They stack in a stable z-order and each can be toggled or filtered independently. All overlays operate in normalized 0–1 page coordinates so they stay aligned when the user zooms or the page dimensions change across pages.
- SearchHighlightOverlay — yellow boxes around tsvector search hits.
- TextAnnotationOverlay — boxes around detected phone numbers, equipment tags, room names, and other text-annotation matches.
- KeynoteOverlay — detected keynote shapes (circles, hexagons, diamonds) with their inner text.
- AnnotationOverlay — the main YOLO + user markup layer. Click-to-select, click-to-edit.
- ParseRegionLayer — saved table parse regions, click to jump to the parsed data.
- GuidedParseOverlay — the live grid lines rendered while tuning a Guided Parse.
- DrawingPreviewLayer — the rubber-band preview while the user is drawing a new markup.
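A minimal sketch of why the normalized convention keeps overlays aligned. The helper below is illustrative, not BP's actual code; only the 0–1 coordinate convention comes from the overlays above:

function toScreen(
  p: { x: number; y: number },  // stored overlay point in 0–1 page space
  renderedWidth: number,        // page bitmap width at the current zoom, px
  renderedHeight: number,       // page bitmap height at the current zoom, px
): { left: number; top: number } {
  return { left: p.x * renderedWidth, top: p.y * renderedHeight };
}

// Zooming only changes renderedWidth/renderedHeight; the stored point never
// changes, so every overlay stays aligned for free.
toScreen({ x: 0.42, y: 0.17 }, 3000, 2000); // → { left: 1260, top: 340 }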
State management: the 17 slice hooks
The viewer's state lives in a single Zustand store at src/stores/viewerStore.ts (1,986 lines). The store is large but access is scoped through seventeen slice hooks — each hook returns a narrow set of fields memoized by useShallow, so components only re-render on changes to their own slice.
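The pattern looks roughly like this (a sketch assuming zustand's useShallow helper; the field names and hook name are illustrative):

import { useShallow } from "zustand/react/shallow";
import { useViewerStore } from "@/stores/viewerStore";

// One narrow, shallowly-memoized selector per feature area.
export function useModeSlice() {
  return useViewerStore(
    useShallow((s) => ({ mode: s.mode, setMode: s.setMode })),
  );
}

// A component calling useModeSlice() re-renders only when its own fields
// change, not when unrelated store fields mutate.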
The canvas render gate (drift hazard)
src/components/viewer/AnnotationOverlay.tsx is the center of the drawing logic — 2,581 lines that handle every canvas mode, hit testing, bucket fill commit, split-area, vertex edit, polygon drawing, symbol search, markup, calibration, and keynote/table parse region selection. The file has one structural trap that bit the group-tool fix on 2026-04-19 and keeps coming back: adding a new mode requires touching four places.
The companion architecture doc at featureRoadMap/BPArchitecture_422.md contains the full mode table and exact line numbers if you're about to add a new tool.
Scale calibration and measurement units
Before any area or linear takeoff will produce real-world numbers, the user has to calibrate the page scale. You click Set Scale in the Area tab, click two points on a known dimension (a grid line, a labeled wall), and enter the real-world distance plus a unit. Calibration is stored per page in scaleCalibrations[pageNumber] — measuring on a new page requires recalibrating unless the pages share the same scale.
From PDF to Structured Data
The preprocessing pipeline is the load-bearing part of BP. Everything the viewer, the LLM, and the takeoff engine depend on — CSI codes, page classifications, text annotations, detected tables, cross-references, note blocks, the CSI spatial heatmap, and the CSI network graph — is computed during preprocessing and then read back on demand. This section walks through what actually happens between POST /api/projects and the moment the viewer loads its first page.
Entry point and orchestration
The pipeline is triggered when the projects route creates a project row. On local development, it's invoked inline via processProject(projectId) in src/lib/processing.ts. On AWS, the same function runs inside the cpu-pipeline ECS task, launched by an AWS Step Functions state machine (infrastructure/terraform/stepfunctions.tf). In both cases the public-ID lookup, the processing body, and the post-processing project analysis are identical — the state machine just gives you durable retries, CloudWatch logging, and isolation from the web task.
export async function processProject(projectId: number): Promise<{
pagesProcessed: number;
pageErrors: number;
processingTime: number;
}> {
// ... fetch project, download PDF, count pages ...
// ... mapConcurrent(pageNums, pageConcurrency, processOnePage) ...
// ... analyzeProject + computeProjectSummaries + warmCloudFrontCache
}

The 14 per-page stages
The list below shows the exact order each page moves through. Every stage is individually wrapped in a try/catch: a failure in Textract doesn't prevent text annotation detection from running on the (possibly empty) output, a failure in CSI detection doesn't prevent heuristics from firing, and so on. Per-page errors are written to pages.error so you can spot partial results in the admin dashboard without the whole project being marked as failed.
1. Rasterize at 300 DPI (rasterizePage()) — the full-resolution PNG for display. This is what the viewer's canvas eventually renders.
2. Upload PNG + 72 DPI thumbnail to S3 — both get Cache-Control: public, max-age=31536000, immutable so CloudFront can cache forever. The thumbnail backs the sidebar.
3. Re-rasterize at a safe DPI if the 300 DPI image exceeds 9500 px in either dimension — Textract rejects images above 10000 px. A 24×36" sheet at 300 DPI is 10800 px; the pipeline re-rasterizes at roughly 263 DPI in that case (see the sketch after this list). The re-rasterized buffer is only used for OCR; the display image stays at 300 DPI.
4. OCR via Textract with Tesseract fallback (analyzePageImageWithFallback()) — produces a structured TextractPageData with per-word bounding boxes and confidence scores. If Textract is unreachable or credentials are missing, it falls through to Tesseract.
5. Flatten OCR into raw text (extractRawText()) — the concatenation used by the PostgreSQL search_vector column for /api/search.
6. Extract the drawing number from the title block (extractDrawingNumber()) — this becomes pages.name, e.g. "A-101".
7. Detect CSI codes (detectCsiCodes()) — the 3-tier matcher. Output is written to pages.csi_codes. Section 04 explains the algorithm.
8. Detect text annotations (detectTextAnnotations()) — runs the 10 detector modules from src/lib/detectors/registry.ts: contact, codes, dimensions, equipment, references, trade, abbreviations, notes, rooms, csi-annotations. Produces a grouped annotation list with sub-categories.
9. Analyze page intelligence (analyzePageIntelligence()) — discipline and drawing-type classification, cross-references to other sheets, note blocks. This is the first place the pipeline produces a structured summary of the page.
10. Classify text regions (classifyTextRegions()) — OCR-based identification of where the tables, schedules, legends, and note blocks live on the page. Produces textRegions[] with confidence scores.
11. Run the heuristic engine in text-only mode (runHeuristicEngine()) — rules that do not require YOLO classes fire here. Section 05 explains how YOLO-augmented heuristics re-run later, after a YOLO job completes.
12. Classify tables (classifyTables()) — combines text regions and heuristic inferences into classified table candidates (door schedule, finish schedule, keynote table, etc.).
13. Compute the CSI spatial heatmap (computeCsiSpatialMap()) — divides the page into a 9×9 grid plus title-block and right-margin special zones and tallies CSI instances per zone. Initial pass is OCR-only; a YOLO pass can refresh later.
14. Upsert the pages row and rebuild the search_vector via a raw SQL to_tsvector('english', rawText) — the single write-point for the whole per-page pipeline.
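The safe-DPI arithmetic in stage 3 is simple enough to sketch (the 9500 px guard and the example numbers are from the stage description; the helper name is an assumption):

const TEXTRACT_SAFE_PX = 9500; // Textract rejects images above 10000 px

function safeOcrDpi(widthIn: number, heightIn: number, targetDpi = 300): number {
  const longestIn = Math.max(widthIn, heightIn);
  if (longestIn * targetDpi <= TEXTRACT_SAFE_PX) return targetDpi;
  return Math.floor(TEXTRACT_SAFE_PX / longestIn); // OCR-only buffer
}

safeOcrDpi(24, 36); // 36" × 300 DPI = 10800 px > 9500, so re-rasterize at 263 DPI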
Project-level analysis (after all pages complete)
Once every page has finished (or errored), the pipeline switches gears. It reads all processed pages back, passes them to analyzeProject() which computes the discipline breakdown, hub pages, cross-reference graph, and the CSI network graph via buildCsiGraph(). The result — a structured projectIntelligence blob and a short text projectSummary — is written back to the projects row. A separate computeProjectSummaries() pass then builds the per-index lookup tables (CSI → pages, trade → pages, keynote → pages, text-annotation → pages) that lookupPagesByIndex() reads at O(1) from LLM tool calls.
The final step is a best-effort CloudFront cache warm: each page PNG gets a HEAD request so CloudFront edge locations pull it ahead of the first viewer hit. Failures are logged and ignored.
Concurrency and tuning
Pages run in parallel via a small mapConcurrent() helper with a default limit of 8. The limit is per-company configurable through companies.pipelineConfig.pipeline.pageConcurrency and the Admin → Pipeline tab — raise it on a beefy Fargate task, lower it if Textract throttles you. The spatial grid size is also configurable via pipelineConfig.pipeline.csiSpatialGrid.
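mapConcurrent() is a classic bounded-parallelism helper. A minimal sketch with the semantics described above (BP's implementation may differ in details):

async function mapConcurrent<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker()),
  );
  return results;
}

// In processProject: await mapConcurrent(pageNums, pageConcurrency, processOnePage)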
If a page already has textract_data stored, the per-page body is skipped. Re-triggering processing on an existing project (via /api/admin/reprocess) will reuse completed pages and only work on missing or errored ones. To force a full re-run, delete the project rows in the DB first or zero out the textract_data column for the pages you want redone.

CSI as a Token-Efficient Blueprint Encoding
The hardest thing about putting a construction project in front of an LLM is the raw token cost. A typical 200-page drawing set runs to ~2 million characters of OCR text — roughly 500,000 tokens — and the useful content is scattered across specifications, notes, schedules, dimensions, legends, and title blocks. Dumping all of that into a context window is both expensive and counterproductive: the model gets lost in the noise.
BP's answer is the CSI engine: a three-layer encoding that turns a page into a structured, compact tag set, turns a project into a navigable graph, and lets the LLM zoom from project-level structure down to individual pages through tool calls rather than by paging through OCR. CSI codes are the primary key — they're a shared vocabulary across all construction documents and map directly to how estimators and specifiers think.
Why CSI and not raw keywords
CSI MasterFormat is an industry-standard classification system maintained by the Construction Specifications Institute. Every specification section has a code like 08 14 00 — Division 08 (Openings), section 14 (Wood Doors), subsection 00 (general). Division is the most useful unit: it maps to trade, it's stable across projects, and it's dense enough that 25 divisions can meaningfully describe any project while being small enough to fit the whole project's division breakdown into a single paragraph of LLM context.
Because CSI is a closed vocabulary, BP can turn a full page of OCR (easily 4–10k characters) into a single short tag list like [22 00 00, 23 05 00, 26 05 00] plus confidence scores and then look up detailed division data on demand through tool calls. That pattern is what lets BP scale LLM chat to 200-page projects without ever exceeding a Sonnet-sized context budget.
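Concretely, the compact per-page payload looks something like this (field names are illustrative; the codes-plus-confidence pattern comes from the detection layer described next):

const pageTags = {
  page: 14,
  csiCodes: [
    { code: "22 00 00", confidence: 0.95 }, // Plumbing
    { code: "23 05 00", confidence: 0.75 }, // HVAC common work results
    { code: "26 05 00", confidence: 0.5 },  // Electrical common work results
  ],
}; // a few dozen tokens standing in for 4–10k characters of OCR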
Layer 1: per-page detection (3-tier algorithm)
src/lib/csi-detect.ts implements a rule-based matcher against a MasterFormat database. The matcher runs three tiers in order of specificity; a code can be tagged by any tier it passes, and the tier with the highest confidence wins. Defaults:
| Tier | What it matches | Confidence | Why it exists |
|---|---|---|---|
| Tier 1 | Exact consecutive-word subphrase from the MasterFormat description (e.g. 'cast-in-place concrete' anywhere in the OCR). | 0.95 | High-signal: the literal phrase is in the text, which essentially never happens by accident. |
| Tier 2 | Bag-of-words overlap — at least tier2MinWords significant words from the description appear anywhere on the page (stop words excluded). | ≤ 0.75 (tier2Weight) | Catches rephrased matches: 'acoustical ceiling panel' matches 'Acoustical Panel Ceilings' without insisting on word order. |
| Tier 3 | Keyword-anchor — at least tier3MinWords high-signal anchor words match a description. | ≤ 0.50 (tier3Weight) | Fallback: rescues obvious trades (plumbing, electrical, HVAC) when neither subphrase nor bag-of-words hits. |
The matcher keeps only codes whose final score beats matchingConfidenceThreshold (default 0.40). All defaults are overridable per-company through companies.pipelineConfig.csi and the Admin → CSI tab, which also lets admins upload a custom CSI database TSV (useful for trades like fire alarm that benefit from an expanded vocabulary).
const DEFAULT_CONFIG: CsiDetectConfig = {
matchingConfidenceThreshold: 0.4,
tier2MinWords: 3,
tier3MinWords: 5,
tier2Weight: 0.75,
tier3Weight: 0.50,
};

Layer 2: per-page spatial heatmap
After detection, computeCsiSpatialMap() bins every CSI-tagged text annotation (and, after a YOLO pass, every YOLO-inferred region) into a 9×9 grid plus two special zones: title-block (y > 0.85) and right-margin (x > 0.75, y < 0.85). The output is a list of zones with per-division counts, which is what the LLM sees when it calls getCsiSpatialMap(pageNumber).
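A sketch of the zone assignment those cutoffs imply (function name and zone labels are illustrative; the thresholds are from the text above):

function zoneFor(x: number, y: number, grid = 9): string {
  if (y > 0.85) return "title-block";   // bottom strip wins first
  if (x > 0.75) return "right-margin";  // remaining y ≤ 0.85 band
  const col = Math.min(grid - 1, Math.floor(x * grid));
  const row = Math.min(grid - 1, Math.floor(y * grid));
  return `r${row}c${col}`;              // one of the 9×9 cells
}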
The spatial map is how the LLM answers questions like "what's in the top-right of this sheet?" or "where are the MEP systems concentrated?" without having to scan every word box on the page.
Layer 3: the CSI network graph
At the project level, buildCsiGraph() converts the per-page CSI tags into a graph: nodes are CSI divisions, edges are co-occurrence relationships between divisions (with three types: co-occurrence, cross-reference, and containment), and clusters are pre-defined groupings: MEP (22, 23, 26, 27, 28), Architectural (08, 09, 12), Structural (03, 05), and Site (31, 32, 33). The graph carries a fingerprint that BP uses as a cache key so it can avoid re-computing the graph when nothing on the project has changed.
The graph is what makes LLM-driven navigation tractable. Tools like getCrossReferences return hub pages ranked by incoming reference count; lookupPagesByIndex({ index: "csi" }) answers "which pages have Division 22?" in O(1). When the LLM wants to find plumbing plans, it doesn't scan 200 pages — it queries the graph once and gets page numbers back.
How the LLM uses all three layers
All three layers surface to the LLM as tool calls. Section 09 walks through the full tool set, but the CSI-specific story is:
- getProjectOverview() returns the project-level CSI divisions and cluster membership — the coarse first look.
- getCsiSpatialMap(pageNumber) returns the per-page heatmap — the "zoom in" query.
- getCrossReferences(pageNumber?) returns the cross-reference edges and hub pages — navigation.
- lookupPagesByIndex({ index: "csi", key: "22" }) is the O(1) "give me every page tagged with Division 22" query.
- detectCsiFromText(text) lets the LLM run the 3-tier matcher on arbitrary input strings (e.g. a user's question).
The context builder (src/lib/context-builder.ts) feeds the CSI network graph into the LLM's system context at priority 1.0 — near the top, right after the project report. That means the model sees the division clusters and their edges before it sees raw OCR, so its first tool call is almost always a graph query rather than a full-text search. This is how a chat session starts "hot" even on a 200-page project.

YOLO Object Detection — Run, Load, Display
The YOLO toolbar button only shows and hides already-loaded detections. It does not kick off inference. To actually run YOLO, you go to Admin → AI Models, pick a model and a project, and click Run. The backend launches a SageMaker Processing job and webhook-ingests the results when the job finishes. Running YOLO costs money (GPU instance hours) and is gated behind a per-company feature toggle and an admin-only permission.

YOLO in BP is the layer that turns blueprints from textual documents into spatially-aware ones. The text pipeline (Section 03) already extracts OCR, CSI codes, classifications, and tables. What it doesn't know is where the doors, windows, grid lines, tables, and title blocks physically are on each page. YOLO solves that. Once YOLO has identified tables, title_block, drawings, door_single, circle, and so on, every downstream feature in BP — Auto-QTO, Map Tags, the spatial heatmap, the heuristic engine, and LLM spatial queries — becomes significantly sharper.
Where the run actually happens
The run path is: admin opens Admin → AI Models, a tab rendered by src/app/admin/tabs/AiModelsTab.tsx, picks a model from the models table and a project, confirms the cost warning, and clicks Run. That fires POST /api/yolo/run, which writes a new processingJobs row and calls startYoloJob() in src/lib/yolo.ts. That function creates an AWS SageMaker Processing job pointing at the YOLO ECR container, mounts the project's pages/ prefix in S3 as input, and sets yolo-output/ as the output destination.
While the job runs (usually a few minutes per project on an ml.g4dn.xlarge), the admin UI polls GET /api/yolo/status every ~5 seconds and shows live status, execution ID, and CloudWatch logs. When the container finishes, it writes per-page detection JSONs to S3; a webhook hits POST /api/yolo/load, which reads the JSONs, normalizes them into the annotations table (with source = "yolo"), and triggers a refresh of the CSI spatial heatmap + heuristic engine in YOLO-augmented mode.
Safety toggles
Because a SageMaker Processing job can cost real money if mis-triggered, BP has several layers of safety:
- Company-level sagemakerEnabled toggle. Flipped off by default. Flipping it on requires the admin password stored in app_settings. When off, the entire YOLO run path returns an error immediately without touching AWS.
- Quota enforcement. Per-company concurrent-job caps check against the processingJobs table before starting a new job. Toggleable in the same admin panel.
- Per-user canRunModels flag. Regular members can view YOLO results but cannot initiate a run. Admins get the flag by default; root admins can grant it selectively.
- Root-admin-only model sharing. A YOLO model uploaded by one company is not automatically visible to others. The root admin has to grant model access per-company via the modelAccess table.
The Detection Panel
Once a YOLO run completes and results are loaded, they show up in the viewer's Detection Panel (DetectionPanel.tsx, ~780 lines of React). The panel has three sub-tabs:
| Sub-tab | What it shows | How it's built |
|---|---|---|
| Models | Every YOLO annotation grouped by model → class → individual detection. Per-class and per-annotation visibility toggles, a global confidence slider, and a search filter. | Primary view. Reads from annotations where source === "yolo". |
| Tags | YoloTags — user-created tags that bind OCR text (like 'D-01') to specific YOLO shape instances. Created by the Map Tags step (Section 06) or by scan-ins. Each tag shows its instance count, pages, and CSI codes. | Powered by the yolo_tags table. The Tags sub-tab is the main input into Auto-QTO. |
| Shape | Detected primitive shapes on the current page — circles, hexagons, diamonds, etc. Built for keynote tagging and tag-shape discovery. Run on-demand via /api/shape-parse. | Shape-parse is OCR + OpenCV — it does not require a YOLO model run and can be triggered for free. |
Confidence thresholds and filters
Each YOLO model in BP carries a confidence threshold (default 0.25). The threshold applies both to storage (low-confidence detections can be filtered at ingest by the admin config) and to display — the toolbar's per-model slider in the YOLO dropdown filters the overlay live without mutating the underlying data.
On top of confidence, the toolbar exposes a trade filter (dropdown populated from the distinct trades inferred from CSI codes) and a CSI code filter (searchable dropdown). Both apply to the canvas overlay independently of confidence; they let estimators zero in on a single scope without fighting with confidence sliders.
Sample YOLO classes
BP ships reference models trained on construction drawings. The specific classes available depend on which models are registered in the models table for your company. Classes referenced throughout these docs include tables, title_block, drawings, grid, vertical_area, horizontal_area, door_single, and the tag shapes (circle, hexagon, diamond).
The tables, title_block, and drawings classes are special: Auto-QTO (Section 07) strictly requires them. The drawings class marks the content region of a sheet, and tables + title_block mark regions to exclude from counts (so you don't double-count tags that appear inside a schedule).
How YOLO stacks with heuristics
The heuristic engine (src/lib/heuristic-engine.ts) runs in two modes. Text-only mode fires during the initial processing pass; YOLO-augmented mode re-runs after YOLO data loads. Each rule has optional yoloRequired and yoloBoosters fields. A rule like "if the page contains the word 'concrete' AND a tables class was detected, infer schedule_present with CSI division 03" will skip silently during text-only mode and fire when YOLO runs later.
{
id: "concrete-schedule",
outputLabel: "schedule_present",
outputCsiCode: "03",
minConfidence: 0.6,
textKeywords: ["concrete", "mix design"],
yoloRequired: ["tables"], // will skip until YOLO runs
yoloBoosters: ["title_block"], // adds confidence if present
spatialConditions: [
{ type: "contains", region: "tables", textRegion: "header" },
],
}

This is the stacking story the rest of the docs will refer back to: YOLO models are not a replacement for heuristics, they're an additional signal that heuristics can chain on top of. A new YOLO class becomes a new input for existing rules, a new input for Auto-QTO, and a new input for the CSI spatial heatmap — without anyone having to touch the rules. Tools stack.
For example, a newly detected water_heater class immediately contributes to Division 22 on the CSI heatmap and graph. That's another place where a new model or a tagged class automatically flows into every other feature without code changes.

Auto-QTO: Schedule-Driven Takeoff
Auto-QTO is the pipeline from "I parsed a schedule and mapped its tags to YOLO shapes" to "I have a line-item quantity takeoff ready to export to CSV." It is the feature that most directly translates BP's structured preprocessing into a deliverable an estimator actually sends to a bid. It lives in src/components/viewer/AutoQtoTab.tsx and is the Auto QTO sub-tab of the QTO panel.
What Auto-QTO actually does
Given a material type (doors, finishes, equipment, plumbing, or electrical), Auto-QTO:
- Finds or asks you to parse the relevant schedule page.
- Reads the parsed schedule's tag column.
- Asks you which YOLO tag-shape class the tags are drawn inside.
- Runs Map Tags (Section 06) over the entire project, binding every unique tag value to its YOLO shape instances, while excluding the schedule region itself and the title block so it doesn't double-count.
- Produces a line-item list with counts, pages, and an editable review surface. Estimators can hand-adjust before exporting.
- Exports to CSV / Excel for the bid package.
The thing to understand: Auto-QTO does not invent quantities. It simply counts tag occurrences identified by YOLO + OCR, and the fidelity of the count is a function of the fidelity of the YOLO model and the schedule parse. If the model missed a door, Auto-QTO will miss that count; that's why the review step matters and why the user always has an override.
Preflight — the strict YOLO class requirement
Auto-QTO hard-blocks the material picker unless the project's YOLO run includes three specific classes: tables, title_block, and drawings. These are exclusion / inclusion markers for the counting logic — without them, Auto-QTO can't cleanly differentiate "tags inside the schedule" from "tags out on the drawings."
const QTO_STRICT_EXCLUSION_CLASSES = ["tables", "title_block", "drawings"] as const;
const QTO_RECOMMENDED_CLASSES = ["grid", "vertical_area", "horizontal_area"] as const;

If a strict class is missing, Auto-QTO shows a blocker callout with a link to Admin → AI Models: you need to run a YOLO model that has those classes before you can proceed. The recommended classes (grid, vertical_area, horizontal_area) are soft — missing them just produces a warning banner, not a block.
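The gate itself reduces to a set-difference check. A sketch assuming detections expose a set of class names (the two constant arrays are from the code above; everything else is illustrative):

function qtoPreflight(detectedClasses: Set<string>) {
  const missingStrict = QTO_STRICT_EXCLUSION_CLASSES.filter(
    (c) => !detectedClasses.has(c),
  );
  const missingRecommended = QTO_RECOMMENDED_CLASSES.filter(
    (c) => !detectedClasses.has(c),
  );
  return {
    blocked: missingStrict.length > 0, // hard block, links to Admin → AI Models
    missingStrict,
    warnings: missingRecommended,      // soft warning banner only
  };
}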
During counting, Auto-QTO detects that a tag instance falls inside a tables region and skips it. The same applies to title blocks (which often have a legend that re-uses tag symbols). Without these markers, a schedule with 20 rows of "D-01 D-02 D-03" would double-count every door.

Material picker
Auto-QTO starts with the material picker. Each option binds to a schedule category that the table classifier uses when suggesting pages for the schedule step. Custom material types are supported via a free text input — BP singularizes by stripping a trailing s as a rough stem.
const MATERIALS = [
{ type: "doors", label: "Doors", scheduleCategory: "door-schedule", icon: "D" },
{ type: "finishes", label: "Finishes", scheduleCategory: "finish-schedule", icon: "F" },
{ type: "equipment", label: "Equipment", scheduleCategory: "material-schedule", icon: "E" },
{ type: "plumbing", label: "Plumbing", scheduleCategory: "plumbing-schedule", icon: "P" },
{ type: "electrical", label: "Electrical", scheduleCategory: "electrical-schedule", icon: "Z" },
];

The 5-step state machine
Once a material is picked, Auto-QTO drops you into a step machine whose state lives in the qto_workflows table. The canonical step IDs come from AutoQtoTab.tsx:11:
const STEP_SEQUENCE = ["select-schedule", "confirm-tags", "map-tags", "review", "done"] as const;

In the UI these render as a five-step indicator: Select Schedule → Confirm Tags → Map Tags → Review → Done.
1. select-schedule
Auto-QTO reads from summaries.schedules (built by computeProjectSummaries()) and surfaces pages whose classified tables match the selected material. Each suggestion shows a confidence badge. If nothing is parsed yet, the wizard can launch into the Table Parse panel inline — you parse the schedule, the wizard picks up where you left off.
2. confirm-tags
Auto-QTO reads the parsed schedule's headers and rows and asks the user to confirm the tag column (pre-selected from the parse). This is also where the user picks the tag-shape class from QTO_TAG_SHAPE_CLASSES: circle, arch_sheet_circle, dot_small_circle, hexagon, hex_pill, diamond, triangle, pill, oval, rectangle, square.
3. map-tags
The user clicks Run Mapping. Auto-QTO invokes POST /api/projects/[id]/map-tags-batch with the schedule's tag column and the selected YOLO shape class. The backend runs Map Tags across every page, excludes regions labeled tables / title_block, and writes the resulting YoloTags. Results stream back as line items with counts per tag value.
4. review
Each row of the review surface is one line item: { itemType, label, yoloClass?, text?, count, pages, annotations }. Auto-QTO flags ambiguity — e.g. if a tag appears on more pages than its schedule row implies — as a QtoFlag. The user can edit counts, add notes, and fix miscategorizations. Edits are stored in qto_workflows.userEdits so they survive a re-run.
5. done
Terminal state. The user exports via TakeoffCsvModal or ExportCsvModal (CSV / Excel). The workflow stays in the project — you can re-enter it later, advance back to review, and re-export if a schedule was updated.
Item types (SHIP 2 taxonomy)
Under the hood, the counting engine supports five item-type strategies via findItemOccurrences() in src/lib/yolo-tag-engine.ts. Auto-QTO almost always defaults to type 4 (yolo-object-with-tag-shape), but the other four are available to composite-classifier and manual QTO workflows:
- yolo-only — count instances of a class with no text.
- text-only — count occurrences of a literal OCR string, no YOLO.
- yolo-with-inner-text — YOLO shape containing specific text.
- yolo-object-with-tag-shape — primary object + tag-shape combo (the default for Auto-QTO).
- text-pattern — detect a repeating tag series (T-01, T-02, T-03, ...).
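As a sketch, the five strategies above fit naturally into a discriminated union (the itemType strings are the taxonomy; the field names are assumptions):

type ItemSpec =
  | { itemType: "yolo-only"; yoloClass: string }
  | { itemType: "text-only"; text: string }
  | { itemType: "yolo-with-inner-text"; yoloClass: string; text: string }
  | { itemType: "yolo-object-with-tag-shape";
      yoloClass: string; tagShapeClass: string; text: string }
  | { itemType: "text-pattern"; pattern: string }; // e.g. "T-\\d{2}"

// findItemOccurrences() dispatches on itemType; Auto-QTO builds one
// "yolo-object-with-tag-shape" spec per schedule tag value.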
Demo mode
In demo mode (isDemo === true), Auto-QTO workflows persist in the Zustand store only — they disappear when the tab is closed. The same is true for annotations, markups, takeoff items, and parse results. This is a deliberate design choice so that the public /demo/project/* route can let anonymous users drive a full Auto-QTO workflow without polluting the shared demo project.
Bucket Fill: Click-to-Area
Bucket Fill is BP's answer to the most tedious part of a manual takeoff: tracing polygons around rooms on a 200-page floor plan set. You click once inside a room, and a browser-side Web Worker floods from that seed point, stops at walls (and any virtual barriers you've drawn across open doorways), simplifies the resulting polygon, and hands it back as normalized 0–1 vertices. If the page is scale-calibrated, BP converts those vertices to a real-world area in the unit you chose at calibration time.
Where it lives
Bucket Fill is the top strip of the Area tab inside the QTO panel (src/components/viewer/AreaTab.tsx). It appears as a four-state button: disabled (no active area item), idle, active, or barrier mode. The state is controlled by two Zustand flags: bucketFillActive and bucketFillBarrierMode. A third store field, bucketFillResolution, drives the dominant tuning knob (see below).
The 8-stage worker pipeline
The client-side Web Worker at src/workers/bucket-fill.worker.ts does all the heavy lifting. It's one pass: no retry, no speculative seeding. The single-pass design is the reason the tool feels instant even on a 4096-pixel image: no round-trip to the server, no Python subprocess, just an OffscreenCanvas and a tight TypeScript loop.
Tuning hierarchy — maxDimension is dominant
The four knobs are not equal. If your fill leaks, or stops short, or over-bleeds through text, the order in which to reach for them is:
| Knob | What it does | Default | Secondary effect |
|---|---|---|---|
| maxDimension (dominant) | Largest dimension of the downscaled image before Otsu runs. 1000 / 2000 / 3000 / 4000 slider. | 1000 | Raise when thin wall lines get smeared away at low resolution. Doubles runtime each step. |
| Tolerance | Offset applied to the Otsu threshold. Negative → treat more pixels as walls; positive → more pixels as floor. | 0 | Use to rescue thin walls if raising maxDimension alone isn't enough. |
| Dilation | morphClose radius after threshold. Fills small gaps in line art (1–2 px door-frame breaks). | 3 | Dilation=0 skips morphClose entirely. Use for plans with thin mullions where closing bridges real gaps. |
| Barriers | User-drawn virtual walls to seal open doorways. Drawn by clicking two points in barrier mode. | ∅ | Tertiary. Reach for this when the underlying plan genuinely lacks a wall (e.g. an open doorway you don't want the fill to cross). |
Raising maxDimension preserves the wall. Tolerance and dilation can sometimes rescue a low-resolution fill, but it's much cheaper (in user effort) to bump the resolution first.

Text is a wall
Post-2026-04-22, the worker does not pre-erase text blocks. Letter boundaries simply act as dark pixels and the flood stops at them like it stops at walls. The reasoning: pre-erasing OCR'd text was error-prone (it enlarged bboxes and erased parts of adjacent walls) and the user rarely wants a fill to cross text anyway — text in a room almost always labels the room or notes something inside it, which stays inside the polygon.
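For reference, here is the standard Otsu computation the worker's threshold step is described as using, sketched over an 8-bit grayscale histogram (the worker's actual code may differ):

// Pick the threshold that maximizes between-class variance; pixels at or
// below it (walls and letters alike) become barrier pixels.
function otsuThreshold(hist: number[] /* length 256 */): number {
  const total = hist.reduce((a, b) => a + b, 0);
  const sumAll = hist.reduce((acc, n, i) => acc + n * i, 0);
  let sumBg = 0, weightBg = 0, best = 0, bestVar = -1;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue;
    const weightFg = total - weightBg;
    if (weightFg === 0) break;
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    const between = weightBg * weightFg * (meanBg - meanFg) ** 2;
    if (between > bestVar) { bestVar = between; best = t; }
  }
  return best; // the user-facing Tolerance knob offsets this value
}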
The areaFraction returned from the worker is decorative. It's the pixel-count ratio of the flood, which slightly under-estimates the real room area because the text blocks inside the room are not filled. For reporting, BP uses computeRealArea(vertices, pageW, pageH, calibration) on the traced outer polygon — which correctly includes the text-punctuated interior. Section 7 and src/lib/areaCalc.ts own this math.

The workflow
1. Open QTO → Area.
2. Calibrate the page scale (Set Scale → click two points → enter distance + unit). Without calibration the areas still render, but the quantity column will say "page units" instead of a real measurement.
3. Create an area item (name, color) or click an existing one to make it the active target.
4. Click the Bucket Fill button to arm. Pick a resolution on the slider (1k / 2k / 3k / 4k).
5. Click inside the room you want to measure. The worker runs; you see the preview overlay appear.
6. If the fill leaked through an open doorway, toggle Barrier mode. Click two points to draw a virtual wall. Click inside the room again. Repeat until the fill is sealed.
Holes work natively (courtyards, light wells)
For U-shaped rooms and hallways enclosing a courtyard, the worker runs findHoleBorders() after the outer contour trace. Each hole is simplified separately with Douglas–Peucker. The preview overlay uses fill-rule="evenodd" so the courtyard renders as a true hole rather than a filled island, and computeRealArea() subtracts the hole areas from the outer polygon.
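Rendering-wise, the outer ring and the hole rings concatenate into a single SVG path painted with fill-rule="evenodd". A minimal sketch (helper names are illustrative):

type Pt = { x: number; y: number };

function ringToPath(ring: Pt[], w: number, h: number): string {
  return ring
    .map((p, i) => `${i === 0 ? "M" : "L"}${p.x * w},${p.y * h}`)
    .join(" ") + " Z";
}

// Outer ring plus hole rings in one path; evenodd leaves the holes unpainted.
function evenOddPath(outer: Pt[], holes: Pt[][], w: number, h: number): string {
  return [outer, ...holes].map((r) => ringToPath(r, w, h)).join(" ");
}

// <path d={evenOddPath(polygon, holes, pageW, pageH)} fillRule="evenodd" />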
Server fallback
When the client worker fails (very old browser, extremely large images, corrupted ImageBitmap), the viewer falls back to the server path: POST /api/bucket-fill → src/lib/bucket-fill.ts → scripts/bucket_fill.py (Python OpenCV). The server path predates the Web Worker and uses an adaptive-threshold algorithm rather than Otsu, so its results can differ on low-contrast images. It's a safety net, not the preferred path.

For reference, the worker's result message is shaped like this:
```jsonc
{
"type": "result",
"polygon": [{ "x": 0.142, "y": 0.388 }, ...],
"holes": [[{ "x": 0.32, "y": 0.51 }, ...]], // evenodd-compatible
"vertexCount": 24,
"areaFraction": 0.017, // decorative — use computeRealArea()
"retryHistory": [...] // present only when worker retried
}
```

Scale calibration and computeRealArea
Bucket Fill returns a polygon in normalized 0–1 coordinates. To turn that into square feet, BP needs two pieces of information: the scale calibration for the current page, and the page's pixel dimensions. src/lib/areaCalc.ts runs computeRealArea(vertices, pageWidth, pageHeight, calibration) — shoelace formula in pixel space, divided by the calibration's pixels-per-unit, returning a real area in the calibrated unit. Holes are subtracted.
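The math is compact enough to sketch. Names and the calibration shape below are assumptions; src/lib/areaCalc.ts is authoritative. Note that pixels-per-unit is a linear scale, so it divides the area twice, once per axis:

```ts
type Pt = { x: number; y: number }; // normalized 0–1 page coordinates

// Shoelace formula on a polygon scaled into pixel space.
function pixelArea(poly: Pt[], pageW: number, pageH: number): number {
  let sum = 0;
  for (let i = 0; i < poly.length; i++) {
    const a = poly[i], b = poly[(i + 1) % poly.length];
    sum += (a.x * pageW) * (b.y * pageH) - (b.x * pageW) * (a.y * pageH);
  }
  return Math.abs(sum) / 2;
}

function realArea(
  outer: Pt[], holes: Pt[][],
  pageW: number, pageH: number,
  pixelsPerUnit: number,            // from the Set Scale calibration
): number {
  const px = pixelArea(outer, pageW, pageH)
    - holes.reduce((s, h) => s + pixelArea(h, pageW, pageH), 0); // holes subtracted
  return px / (pixelsPerUnit * pixelsPerUnit); // linear scale divides twice for area
}
```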
How it composes with the rest of QTO
Bucket Fill is not a feature on its own — it's an input method into the Area tab of the takeoff panel. Once the polygon is created, it behaves like any other area takeoff entry: the underlying annotations row has source = "takeoff", the group rollup appears in the QTO panel, the item can be edited, re-colored, moved between groups, or exported to CSV with the rest of the project. This is the design pattern the whole tool lives on: new capabilities stack on top of existing ones. Bucket Fill adds a fast path to create area polygons; everything downstream (aggregation, grouping, export) is unchanged.
The LLM Loop: Tool-Making, Agentic Rounds, Context Budgets
BP's LLM integration is the payoff for everything in sections 3–8. The preprocessing pipeline builds structured data, CSI encodes it compactly, YOLO makes it spatially aware, Auto-QTO materializes quantities. The LLM loop is what a user actually talks to, and it reaches into all of that structured data through a tool set, an agentic round loop, and a per-model context budget. This is the densest section in the docs — there's a lot happening under the hood.
The framing
A blueprint LLM has a fundamental problem: it can't read the PDF. Even if you chunk a 200-page drawing set into text, the raw OCR is too noisy (page numbers, dimensions, plot stamps, revision blocks) and too long to fit into a context window while leaving room for reasoning. BP solves this by inverting the flow:
- The LLM does not see the blueprint directly. What it sees is a compact structured summary built by the context builder.
- The LLM gets tools. Twenty of them — the full BP_TOOLS set. They query the pre-computed structured data, run BP engines on arbitrary inputs, and (for a small subset) drive the viewer. Tools are what give the model leverage.
- Tools compose inside an agentic loop. The model can call multiple tools in parallel per round, feed results back, and iterate up to ten rounds per turn before being forced to answer.
- Context budgets are per-model. A Sonnet call gets a very different slice of data than a Groq call. Admins can override priorities per-company via a preset system.
The "LLM tool making" story the user gets isn't a feature in the UI — it's the shape of the tool registry in src/lib/llm/tools.ts and the pattern you use when adding a new tool: write a tool definition with a JSON Schema input, write an executor, flip a switch in executeToolCall(), and the model can call it on the next request. The next subsection enumerates all twenty.
The 20 tools
Every tool in BP_TOOLS is listed below, pulled from src/lib/llm/tools.ts and grouped by function. Action tools (the ones that mutate data or drive the viewer) are marked (action) so the distinction between "read" and "write" stays obvious.
searchPages — Search blueprint pages by text content using full-text search. Returns matching pages with text snippets and relevance scores. Use when looking for specific topics, materials, equipment, or references.
- query: string *
getProjectOverview — Get the full project map: discipline breakdown, page classifications, all trades, all CSI codes, schedule catalog, annotation summary counts, takeoff totals, and pre-computed page indexes. THIS SHOULD BE YOUR FIRST TOOL CALL — it gives you a complete overview before drilling into specifics.
getPageDetails — Get comprehensive intelligence for a specific page: classification (discipline, drawing type), cross-references to other sheets, general note blocks, detected text regions, heuristic inferences with evidence, classified tables/schedules, parsed schedule data with rows, CSI spatial heatmap, CSI codes, text annotations (37 types), and keynotes.
- pageNumber: number *
lookupPagesByIndex — Instant O(1) lookup: which pages contain a specific CSI code, trade, keynote, or text annotation. Reads from pre-computed indexes — much faster than searching. Use for questions like 'which pages have Division 08?' or 'where is the electrical trade?'
- index: string *
- key: string *
getAnnotations — Get YOLO object detections and user markups, optionally filtered by page, class name, source type, or minimum confidence. Returns bounding boxes, class names, confidence scores, CSI codes, and keywords.
- pageNumber: number
- className: string
- source: string
- minConfidence: number
getParsedSchedule — Get structured data from a parsed table or schedule on a page. Returns column headers, data rows as dictionaries, tag column identifier, and CSI codes. Use for door schedules, finish schedules, equipment lists, keynote tables.
- pageNumber: number *
- category: string
getCsiSpatialMap — Get zone-based heatmap showing where CSI construction divisions are concentrated on a page. Divides page into 9 zones (3x3 grid) plus title-block and right-margin zones. Each zone lists which divisions appear and how many instances. Use for 'what's in the top-right corner?' or 'where are the MEP systems?'
- pageNumber: number *
getCrossReferences — Get sheet-to-sheet reference graph. Returns edges (which pages reference which), hub pages (referenced by 3+ other pages), and leaf pages. Use for 'what references A-501?' or 'what are the key hub pages?' Omit pageNumber for full project graph.
- pageNumber: number
getSpatialContext — Get OCR text mapped into YOLO spatial regions (title_block, legend, drawing_area, grid, etc.). Shows what text is inside each detected region. Use for 'what's in the title block?' or 'read the legend.'
- pageNumber: number *
getPageOcrText — Get the full raw OCR text for a page. This is the complete extracted text without any structuring. Use as a fallback when structured tools don't have what you need, or when you need to read the full page content.
- pageNumber: number *
detectCsiFromText — Run CSI MasterFormat code detection on arbitrary text. Returns matching CSI codes with descriptions, trades, and divisions. Use to identify what construction category a piece of text belongs to.
- text: string *
scanYoloClassTexts — Find all unique OCR texts inside YOLO annotations of a specific class. Use to discover what labels exist inside circles, doors, or any detected shape. Specify pageNumber for fast single-page scan, or omit for full project scan.
- yoloClass: string *
- yoloModel: string
- pageNumber: number
mapTagsToPages — Given specific tag text values (like 'D-01', 'T-03'), find every instance. Optionally filter to a YOLO class or specific page. Specify pageNumber for fast single-page search, omit for project-wide.
- tags: string *
- yoloClass: string
- yoloModel: string
- pageNumber: number
detectTagPatterns — Auto-discover repeating YOLO+OCR patterns across the project. Finds groups like 'circles containing T-01, T-02, T-03...' or 'diamonds with EQ-01, EQ-02...'. Returns pattern groups with instance counts, unique values, and confidence. Requires YOLO data to be loaded.
getOcrTextInRegion — Read OCR text inside a specific rectangular region on a page. Coordinates are normalized 0-1 (top-left origin). Use to read text in a specific area of the drawing.
- pageNumber: number *
- minX: number *
- minY: number *
- maxX: number *
- maxY: number *
navigateToPage (action) — Navigate the blueprint viewer to a specific page. The user will see the page change in their viewer. Use when you want to show them a specific drawing.
- pageNumber: number *
highlightRegion (action) — Highlight a rectangular region on a page with a pulsing cyan outline. Use to point the user to a specific area — a detected table, a door tag, a note block, etc. Coordinates are normalized 0-1.
- pageNumber: number *
- minX: number *
- minY: number *
- maxX: number *
- maxY: number *
- label: string
createMarkup (action) — Create a persistent markup annotation on the blueprint with a name and optional notes. Use when the user asks you to mark, flag, or annotate something for later reference.
- pageNumber: number *
- minX: number *
- minY: number *
- maxX: number *
- maxY: number *
- name: string *
- note: string
addNoteToAnnotation (action) — Append a note to a specific annotation by ID. Notes are appended (never overwritten) to preserve existing user notes. Use when the user asks to annotate, comment on, or flag a specific detection.
- annotationId: number *
- note: string *
batchAddNotes (action) — Append a note to ALL annotations matching a filter. Notes are appended to each annotation's existing notes. Use for bulk operations like 'add a note to all door detections on page 5' or 'flag all low-confidence detections'.
- note: string *
- pageNumber: number
- className: string
- source: string
- minConfidence: number
BP_TOOLS in src/lib/llm/tools.ts. Total: 20 tools.

Why these particular tools exist
The set is small by design. Each tool corresponds to one of the structured surfaces BP already maintains, rather than being a low-level primitive the model has to compose. An LLM given 20 purpose-built tools will pick the right one faster than one given 80 composable primitives.
- Navigation tools — getProjectOverview, getPageDetails, lookupPagesByIndex, getCrossReferences — answer "where is" questions without paging through pages.
- Structured reads — getAnnotations, getParsedSchedule, getCsiSpatialMap, getSpatialContext — pull a single structured chunk at a time, so the model can ask for exactly what it needs.
- Text fallback — searchPages and getPageOcrText let the model hit raw OCR only when structured data is insufficient. Raw OCR sits at priority 10 in the context builder for the same reason: last resort.
- Engine invocation — detectCsiFromText lets the model run the 3-tier CSI matcher on a user's phrase. detectTagPatterns runs the tag-pattern detector. Tools can wrap BP engines so the model can do analysis on the fly.
- YOLO tag tools — scanYoloClassTexts, mapTagsToPages, getOcrTextInRegion — bridge between OCR text and YOLO regions. These are what let the model answer "how many doors have a 90-minute fire rating on the second floor" by joining schedule rows to shape detections (a sketch follows this list).
- Action tools (amber) — navigateToPage, highlightRegion, createMarkup, addNoteToAnnotation, batchAddNotes. The viewer interprets these as side effects. The model can say "show me page 42" and the viewer actually scrolls there.
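To make the composition concrete, here is a hypothetical round-by-round trace for that fire-rating question. The tool names are real; the arguments and results are illustrative:

```ts
// Round 1 — find the door schedule and pull its rows:
//   getParsedSchedule({ pageNumber: 12, category: "door" })
//   → rows: [{ TAG: "D-01", RATING: "90 MIN" }, { TAG: "D-02", RATING: "20 MIN" }, ...]

// Round 2 — map the qualifying tags onto drawing shapes:
//   mapTagsToPages({ tags: "D-01", yoloClass: "circle" })
//   → instances: [{ page: 4, bbox: ... }, { page: 5, bbox: ... }, ...]

// Round 3 — no further tool calls; the model answers with counts grounded in round 2.
```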
The agentic loop
BP's chat endpoint (POST /api/ai/chat) invokes streamChatWithTools() on the configured adapter. All three SDK adapters (anthropic.ts, openai.ts, groq.ts) implement the same interface:
```ts
async *streamChatWithTools(options: LLMToolUseOptions): AsyncIterable<ToolStreamEvent> {
const maxRounds = options.maxToolRounds ?? 10;
const tools: Tool[] = options.tools.map(toAnthropicShape);
const msgHistory = prepareMessages(options.messages);
for (let round = 0; round < maxRounds; round++) {
const stream = await client.messages.stream({ model, system, messages: msgHistory, tools, ... });
for await (const event of stream) {
if (event.type === "content_block_delta" && event.delta.type === "text_delta")
yield { type: "text_delta", text: event.delta.text };
else if (event.type === "content_block_start" && event.content_block.type === "tool_use")
yield { type: "tool_call_start", name: event.content_block.name, id: event.content_block.id };
}
const finalMsg = await stream.finalMessage();
const toolUseBlocks = finalMsg.content.filter(b => b.type === "tool_use");
if (toolUseBlocks.length === 0 || finalMsg.stop_reason !== "tool_use") {
yield { type: "done" };
return;
}
const toolResults = [];
for (const block of toolUseBlocks) {
const result = await options.executeToolCall(block.name, block.input);
yield { type: "tool_call_result", name: block.name, id: block.id, result: JSON.stringify(result) };
toolResults.push({ type: "tool_result", tool_use_id: block.id, content: JSON.stringify(result) });
}
msgHistory.push({ role: "assistant", content: finalMsg.content });
msgHistory.push({ role: "user", content: toolResults });
}
yield { type: "text_delta", text: "\n\n(Reached maximum tool call rounds)" };
yield { type: "done" };
}
```

The key behaviors to notice: text deltas stream live (the user sees the response materialize a word at a time); tool calls don't block the stream (the model's reasoning text shows up before the tools execute); all of a round's tool calls are executed and their results fed back before the next LLM turn starts; the loop terminates as soon as stop_reason !== "tool_use" or after 10 rounds, whichever comes first. On Opus-sized models, 3 rounds is typical; 10 is a safety cap, not an expected value.
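On the consuming side, the route handler forwards these events to the browser as SSE frames. A sketch — the real wiring lives in POST /api/ai/chat; the `adapter`, `options`, and `writer` shapes below are assumptions:

```ts
// Forward each ToolStreamEvent as one SSE frame; close on the terminal "done".
async function pump(
  adapter: { streamChatWithTools(o: unknown): AsyncIterable<{ type: string }> },
  options: unknown,
  writer: { write(chunk: string): void; close(): void },
) {
  for await (const ev of adapter.streamChatWithTools(options)) {
    writer.write(`data: ${JSON.stringify(ev)}\n\n`); // one SSE frame per event
    if (ev.type === "done") break;                   // terminal event from the loop above
  }
  writer.close();
}
```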
Context budgets per model
Before the loop even starts, the server calls assembleContextWithConfig() in src/lib/context-builder.ts. The function takes a list of candidate sections (CSI codes, classification, annotations, parsed tables, etc.), sorts them by priority, and packs them into a character budget chosen for the current model. Bigger-window models get more context; smaller free-tier models stay lean to leave room for tool rounds.
| Provider | Model | Char budget | ~Tokens |
|---|---|---|---|
| anthropic | claude-opus-* | 200,000 | ~50,000 |
| anthropic | claude-sonnet-* | 80,000 | ~20,000 |
| anthropic | claude-haiku-* | 30,000 | ~7,500 |
| openai | gpt-4o* | 60,000 | ~15,000 |
| openai | gpt-4* (Turbo) | 40,000 | ~10,000 |
| openai | o1 / o3 | 80,000 | ~20,000 |
| groq | any | 24,000 | ~6,000 |
| custom | Ollama / self-hosted | 30,000 | ~7,500 |
| (fallback) | DEFAULT_CONTEXT_BUDGET | 24,000 | ~6,000 |
The fallback default is DEFAULT_CONTEXT_BUDGET = 24000 characters — ~6000 tokens — which is what unknown providers and unknown models get.
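A sketch of the selection logic the table implies — illustrative names; the authoritative mapping lives in src/lib/context-builder.ts:

```ts
const DEFAULT_CONTEXT_BUDGET = 24_000; // chars, ≈6k tokens

function contextBudget(provider: string, model: string): number {
  if (provider === "anthropic") {
    if (model.startsWith("claude-opus")) return 200_000;
    if (model.startsWith("claude-sonnet")) return 80_000;
    if (model.startsWith("claude-haiku")) return 30_000;
  }
  if (provider === "openai") {
    if (model.startsWith("o1") || model.startsWith("o3")) return 80_000;
    if (model.startsWith("gpt-4o")) return 60_000; // checked before the gpt-4 prefix
    if (model.startsWith("gpt-4")) return 40_000;
  }
  if (provider === "groq") return 24_000;
  if (provider === "custom") return 30_000; // Ollama / self-hosted
  return DEFAULT_CONTEXT_BUDGET;            // unknown provider or model
}
```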
Section registry and presets
SECTION_REGISTRY enumerates the 20 sections the context builder can assemble into a page- or project-scope prompt. Each has a default priority (a lower number means higher priority: it's packed earlier) and a description. At run time the builder sorts by priority, computes per-section budgets from the admin's preset or per-company overrides, fills each section to its budget, and truncates anything that overflows. Unused allocation flows into an overflow pool so the next section can use the slack.
There are three presets in SECTION_PRESETS:
| Preset | Shape | When to use |
|---|---|---|
| balanced | Equal-share allocation across every enabled section. Simple and predictable. | Default for general-purpose chat. Unopinionated. |
| structured | Front-loads parsed-tables (25%), spatial-context (12%), csi-codes (11%), yolo-counts (10%), csi-spatial (9%), detected-regions (5%), raw-ocr (1%). | When the project has well-parsed schedules and you want the model to reason from structured data, not from OCR. Best for takeoff questions. |
| verbose | Front-loads raw-ocr (40%), spatial-context (15%), parsed-tables (10%). | Exploratory work on projects that aren't fully preprocessed. The model gets more text to read, at the cost of less structure. |
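The overflow-pool packing described above, sketched. Names are illustrative; assembleContextWithConfig() is the real implementation:

```ts
interface Section { id: string; priority: number; share: number; text: string }

function pack(sections: Section[], totalBudget: number): string {
  const ordered = [...sections].sort((a, b) => a.priority - b.priority); // lower = earlier
  let overflow = 0; // slack left behind by sections that underfill their allocation
  const out: string[] = [];
  for (const s of ordered) {
    const allocated = Math.floor(totalBudget * s.share) + overflow;
    const kept = s.text.slice(0, allocated); // truncate anything that overflows
    overflow = allocated - kept.length;      // unused chars flow to the next section
    out.push(kept);
  }
  return out.join("\n\n");
}
```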
That's SECTION_REGISTRY, 20 sections in all. The global dashboard chat (the widget on /home) uses GLOBAL_SECTION_REGISTRY — 6 sections focused on cross-project discovery (project catalog, discipline breakdown, CSI summary, detection counts, search results, search OCR). Same loop, different data surface.

Provider selection
The adapter is chosen by src/lib/llm/resolve.ts based on the llm_configs table and, optionally, per-user overrides from user_api_keys. The fallback chain is:
1. The user's API key (from user_api_keys, encrypted at rest).
2. Company-wide config from llm_configs (set by company admin).
3. Environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY).
The adapter interface is identical across providers — LLMClient in src/lib/llm/types.ts defines streamChat() and streamChatWithTools(). Adding a new provider is a matter of writing a new file in src/lib/llm/ that implements the interface and wiring it into resolve.ts. For OpenAI-compatible endpoints (Ollama, self-hosted vLLM, llama.cpp servers), the existing openai.ts adapter works directly — you set provider = "custom" and a baseUrl.
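As a sketch, a custom-provider config for a local Ollama might look like this. Field names are illustrative; the llm_configs row is the real shape:

```ts
const ollamaConfig = {
  provider: "custom",                    // routes through the openai.ts adapter
  baseUrl: "http://localhost:11434/v1",  // Ollama's OpenAI-compatible endpoint
  model: "llama3.1",                     // any model the endpoint serves
  apiKey: "ollama",                      // placeholder; local endpoints often ignore it
};
```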
Where to configure all of this
The user-facing surface is Admin → LLM Context (src/app/admin/tabs/LlmContextTab.tsx). It exposes:
- Enable / disable each of the 20 sections per company.
- Override any section's default priority.
- Pick a preset or set custom percent allocations.
- Inspect post-assembly section metadata (included, truncated, char count) — so admins can see exactly what made it into the prompt.
- Edit the system prompt (overrides DEFAULT_SYSTEM_PROMPT).
- Attach company-specific domain knowledge (free-text).
The LLM provider / model picker lives next door in Admin → AI Models → LLM Config. Both pages write to the same set of tables, and changes take effect on the next chat turn (no deployment required).
Admin Dashboard
The admin dashboard at /admin is where every company-level tuning knob lives: YOLO model management, CSI detection thresholds, heuristic rules, LLM provider configuration, user and invite management, pipeline concurrency, text-annotation detector toggles, and root-admin-only settings. It is deliberately flat — 14 tabs across the top of one page — rather than hidden behind wizards, because the intended audience is technical.
The 14 tabs
| Tab | What it controls | Backing route(s) |
|---|---|---|
| Overview | System health snapshot: recent parses, running jobs, disk usage, quotas. | /api/admin/parser-health, /api/admin/recent-parses, /api/admin/running-jobs |
| Projects | Every project in the company. Filter by status, bulk re-trigger processing, delete. | /api/admin/reprocess, /api/projects/[id] |
| AI Models | Upload and register YOLO models. Run SageMaker Processing jobs. Configure the LLM provider + default model. House the sagemakerEnabled kill switch + quota. (See Section 05 for the run path.) | /api/admin/models, /api/yolo/run, /api/admin/llm-config, /api/admin/toggles |
| Users | Per-company user list, invites, password resets, canRunModels grants. | /api/admin/invites, /api/admin/users/reset-password |
| Companies | Root admin only. Create companies, assign root admin, configure pipelineConfig per company (CSI thresholds, heuristics, pageConcurrency, csiSpatialGrid). | /api/admin/companies (root only) |
| CSI | Company CSI detection config (threshold + tier weights), custom CSI database upload, re-run CSI on all annotations after a database change. | /api/admin/csi/config, /api/admin/csi/upload, /api/admin/models/reprocess-csi |
| Heuristics | Built-in rules (enable/disable) + custom rules. Each rule supports text keywords, yoloRequired, yoloBoosters, spatial conditions, output labels, output CSI codes. | /api/admin/heuristics/config |
| Table Parse | Tuning defaults for Auto Parse, Guided Parse propose endpoints. Controls rowTolerance, minColGap, minHitsRatio defaults per company. | Same pipelineConfig fields |
| Page Intelligence | Classifier tuning, cross-ref detector config. Test on specific pages. | /api/admin/pipeline |
| Text Annotations | Enable/disable the 10 detector modules, view counts, configure regex patterns for custom detectors. | /api/admin/text-annotations/config |
| AI RBAC | Per-role tool access control. Lock individual LLM tools out of non-admin roles. | llm_configs + role table |
| LLM Context | Section registry enable/disable, priority overrides, preset (balanced / structured / verbose), system prompt, domain knowledge, per-section telemetry. | /api/admin/llm-config, pipelineConfig.llm |
| Pipeline | pageConcurrency (default 8), csiSpatialGrid (default 9×9), queue visibility. | /api/admin/pipeline |
| Settings | App settings, feature flags, non-sensitive env var reveal. Root admin only. | /api/admin/app-settings (root) |
Root admin vs company admin
BP is multi-tenant at the row level. Every user-visible table carries a company_id, and every /api/admin/* route runs a row-scope check in src/lib/audit.ts before reading or writing. A company admin sees their own company's projects, users, CSI config, heuristics, and LLM configs — nothing cross-company.
A root admin (a user with isRootAdmin = true) bypasses company scoping. Root admins can create new companies, assign root admins, edit any company's pipelineConfig, reveal global app settings, and flip the sagemakerEnabled toggle on any company. There is intentionally no UI for "become root admin" — the bit is set directly in the database by a system operator.
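In sketch form, the scoping rule every admin route applies — src/lib/audit.ts is authoritative; the import paths and the projects table shape below are assumptions:

```ts
import { and, eq } from "drizzle-orm";
import { db } from "@/lib/db";              // assumed path
import { projects } from "@/lib/db/schema"; // assumed export

async function getProjectScoped(projectId: number, user: { companyId: number; isRootAdmin: boolean }) {
  const where = user.isRootAdmin
    ? eq(projects.id, projectId)                                                // root: no company fence
    : and(eq(projects.id, projectId), eq(projects.companyId, user.companyId)); // everyone else: row-scoped
  return db.select().from(projects).where(where);
}
```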
The sagemakerEnabled toggle and the quota kill switch both require an admin password stored in app_settings. The password check is enforced in /api/admin/toggles. This is a belt-and-suspenders precaution on top of the RBAC — destructive toggles shouldn't be one forgotten session away from flipping.

System Architecture
This section is a tour of where BP actually runs on AWS and how the pieces fit together. It is intentionally not a deployment tutorial — the README and the Terraform variables file are better starting points for that. The goal here is to answer questions like "what talks to what," "where does the LLM call come from," and "what part of this would I rip out if I wanted to run BP offline."
Topology at a glance
A browser hits CloudFront at assets.* for page images and thumbnails, and the ALB at the primary domain for everything else. The ALB routes HTTPS to two ECS services: the main app (a Next.js container) and Label Studio (a separate labeling UI for training data work). Both run in Fargate — no EC2 to manage. Secrets come from Secrets Manager; the DB is RDS PostgreSQL; durable storage is S3 behind CloudFront.
The main app service
blueprintparser-app is the Next.js 16 container defined in infrastructure/terraform/ecs.tf. Task definition: 2 vCPU / 4 GB. It serves the entire React app, handles every API route, runs Drizzle queries against RDS, pushes processing jobs to Step Functions, and proxies LLM calls. Auto scaling is CPU-based (target 70%) with memory as a guardrail (80%). A circuit breaker is enabled on deployments so a broken image rolls back automatically.
name = "blueprintparser-app"
cpu = 2048 // vCPU × 1024
memory = 4096 // MiB
container_image = "{{ecr}}/beaver_app:latest"
container_port = 3000
health_check = { path = "/api/health", interval = 30, timeout = 5 }
execution_role = "beaver_ecs_execution_role" // ECR pull, logs, secrets read
task_role = "beaver_ecs_task_role" // S3, Textract, SageMaker, SFN
desired_count = var.ecs_desired_count // auto-scaled
deployment_controller = "ECS"
circuit_breaker = { enable = true, rollback = true }
```

The app task needs direct access to Secrets Manager (to pull DATABASE_URL, NEXTAUTH_SECRET, LLM keys), S3 (to write uploads and read page images), Textract (OCR), SageMaker (start/stop jobs), and Step Functions (start executions). Those are all attached to the task role in iam.tf.
The cpu-pipeline task
Long-running processing is offloaded to a second ECS task named blueprintparser-cpu-pipeline. It's the same container image as the main app, just started with a different command (node scripts/process-worker.js) and a much bigger footprint (8 vCPU, 16 GB memory). The task runs the full preprocessing pipeline for a single project and then exits. This keeps the web task responsive during heavy PDF ingest.
The state machine in stepfunctions.tf (blueprintparser-process-blueprint) is what starts cpu-pipeline tasks. It's a straight line: ValidateInput → CPUProcessing → ProcessingComplete with a failure branch. Retries happen on TaskFailed with a 30-second interval and 2.0× backoff, up to 2 attempts. The state machine logs to a CloudWatch log group (/aws/states/blueprintparser-process-blueprint) for debuggability.
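That retry policy, written out as the ASL fragment it implies (expressed here as a TypeScript literal — a sketch; stepfunctions.tf is authoritative, and the Resource ARN is an assumption):

```ts
const cpuProcessingState = {
  Type: "Task",
  Resource: "arn:aws:states:::ecs:runTask.sync", // assumed integration pattern
  Retry: [
    {
      ErrorEquals: ["States.TaskFailed"],
      IntervalSeconds: 30, // first retry 30 s after the failure
      BackoffRate: 2.0,    // second retry 60 s after that
      MaxAttempts: 2,
    },
  ],
  Next: "ProcessingComplete",
};
```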
Without the AWS tier, processProject() runs inline from the /api/projects handler in a fire-and-forget promise. Same code, no state machine. This is why the local tier in Section 01 works — you don't need anything AWS just to see the pipeline run.

SageMaker Processing for YOLO
YOLO inference runs out-of-band on SageMaker Processing jobs. BP calls sagemaker:CreateProcessingJob from the app task with inputs pointing to the project's pages/ prefix in S3 and outputs pointing to yolo-output/. The container image comes from a second ECR repo (beaver_yolo_pipeline) built separately from the app image. The default instance type is ml.g4dn.xlarge, billed per run, which is the exact reason the sagemakerEnabled toggle exists.
Storage layout
S3 is the durability layer. The bucket is blueprintparser-data-{account_id} and the layout is stable:
```
{dataUrl}/ // {company_id}/{project_public_id}
├── original.pdf // raw upload
├── thumbnail.png // 72 DPI cover image
├── pages/
│ ├── page_0001.png // 300 DPI display image
│ └── page_0002.png
├── thumbnails/
│ ├── page_0001.png // 72 DPI thumbnail
│ └── page_0002.png
├── yolo-output/ // written by SageMaker
│ ├── page_0001_detections.json
│ └── page_0002_detections.json
└── exports/
├── takeoff.csv // user-exported CSVs
└── labels.zip // Label Studio exports
```

Every file under pages/ and thumbnails/ is cached as public, max-age=31536000, immutable so CloudFront holds them forever. The cache-warming pass at the end of preprocessing primes edge locations so the first viewer open is fast. Filenames include the page number so they're effectively content-addressed.
Database storage uses PostgreSQL 16 on a db.t4g.medium with 50 GB gp3 that can auto-grow to 200 GB. Backups retained 7 days. Multi-AZ in production. All writes go through Drizzle; the schema lives in src/lib/db/schema.ts.
The database schema at 50,000 feet
| Table | Purpose |
|---|---|
| companies | Multi-tenant boundary. Holds pipelineConfig (CSI thresholds, heuristics, pageConcurrency, csiSpatialGrid). |
| users | Auth + RBAC. isRootAdmin, canRunModels, companyId. |
| sessions | NextAuth session tokens. |
| projects | One row per uploaded PDF set. status, numPages, projectIntelligence JSONB, projectSummary text. |
| pages | One row per page. rawText, drawingNumber, csiCodes, textAnnotations, pageIntelligence JSONB, search_vector tsvector. |
| annotations | YOLO + user markups + takeoff items. bbox, className, confidence, source, data JSONB. |
| yolo_tags | Map Tags output — tag text ↔ YOLO shape instances. |
| qto_workflows | Auto-QTO state machines with materialType, step, parsedSchedule, lineItems, userEdits. |
| takeoff_groups | Groups in the takeoff panel sidebar. |
| takeoff_items | Individual takeoff items (count/area/linear) organized into groups. |
| chat_messages | Conversation history, keyed by project + page + scope. |
| llm_configs | Company- or user-scoped LLM provider + model + encrypted API key + context section overrides. |
| user_api_keys | User-level API keys (encrypted at rest). |
| models | YOLO model registry — name, type, s3Path, config, isDefault. |
| model_access | Per-company access grants for models owned by another company. |
| processing_jobs | SageMaker / Step Functions job tracking with status + CloudWatch refs. |
| labeling_sessions | Label Studio integration state. |
| app_settings | Global key/value (root admin only). Includes the sagemakerEnabled toggle password. |
| audit_log | Admin action history. |
Terraform file map
The full stack is in infrastructure/terraform/. Each file has a single responsibility:
infrastructure/terraform/ — 13 files:

- main.tf — Provider + backend + top-level module wiring.
- variables.tf — All tunable inputs (region, sizing, domain name, ACM arn, etc.).
- terraform.tfvars.example — Template for per-environment variable values.
- terraform.tfvars — Actual (gitignored in most setups) per-env values.
- outputs.tf — Exported outputs: ALB DNS, ECR repo, RDS endpoint, S3 bucket, etc.
- vpc.tf — VPC, public and private subnets, NAT, route tables, security groups.
- ecs.tf — ECS cluster, task defs (app / cpu-pipeline / label-studio), services, auto-scaling.
- ecr.tf — ECR repositories for the app image and the YOLO inference image.
- rds.tf — PostgreSQL 16 instance, subnet group, parameter group, backups.
- s3.tf — Data bucket, CloudFront distribution with OAC, CORS, range requests.
- iam.tf — Execution + task roles (S3, Textract, SageMaker, SFN) and Step Functions role.
- secrets.tf — Secrets Manager entries for DATABASE_URL, NEXTAUTH_SECRET, LLM keys, etc.
- stepfunctions.tf — State machine definition + CloudWatch log group for the processing pipeline.
Label Studio side-car
Label Studio runs as a separate ECS task with an EFS volume mounted at /label-studio/data. The ALB routes labelstudio.* to the task; the main app integrates via /api/labeling/* routes. It reads from the same S3 bucket the main app writes to, so round-tripping a project from ingest to labeling and back works without cross-service copying.
Running without AWS
Everything in this section is the deployed tier. You can ignore most of it and still run BP: the repo ships a docker-compose.yml that brings up a local PostgreSQL on port 5433 and lets you run npm run dev against it. Textract, S3, and SageMaker are all gated by env vars — when they're missing, BP falls back to Tesseract for OCR, the local filesystem for images (or a dev-mode S3 emulator if you prefer), and nothing for YOLO. LLM chat still works if you have a Groq free-tier key. The table parsers (img2table, Camelot, TATR) all run locally from scripts/. Bucket Fill runs locally. Auto-QTO runs locally given a parsed schedule. The only hard dependency on AWS is YOLO inference.
How BP Works — for LLMs
This section is written for a language model reading the BP codebase cold. It packs the shape of the system, the construction vocabulary, the load-bearing file paths, and the known traps into one place. If you're a human reader you can still use it — it's just unusually dense because the target reader has a large context window.
Each subsection below carries an [LLM-NAV:slug] anchor near its heading so you can jump in from a grep. For the full, file:line-grade navigation manual, the companion doc is featureRoadMap/BPArchitecture_422.md — this section is the client-readable summary of that doc.

Canonical mental model (read this first)
[LLM-NAV:mental-model]
BlueprintParser is a graph builder for construction PDFs. It turns a multi-page drawing set into two data axes:
- Horizontal (per page) — each page carries OCR, CSI codes, text annotations, detected tables, a classification, and (optionally) YOLO detections. Stored on the pages row with a large pageIntelligence JSONB.
- Vertical (project-wide) — annotations (YOLO + user markups + takeoff), takeoff_items (count / area / linear), yolo_tags (schedule tag ↔ shape instance), and projectIntelligence (the project-level summary: disciplines, CSI network graph, hub pages).
Everything downstream — the LLM chat, the viewer, takeoff, the CSI spatial heatmap, Auto-QTO — reads from those two shapes. The fastest way to get oriented in the code is: read src/lib/db/schema.ts, src/types/index.ts, and src/lib/processing.ts#processProject().
Construction glossary
[LLM-NAV:glossary]
BP's code uses construction-industry terms unapologetically. A model that doesn't know these will miscalibrate what the code is doing. Each glossary entry gives the plain-English definition and the BP surface the term appears in.
File:line landmarks (the 20 that matter most)
[LLM-NAV:landmarks]
| Symbol | File:line | Why it matters |
|---|---|---|
| processProject | src/lib/processing.ts:165-605 | Auto pipeline entry. 14 per-page stages + project rollup. |
| mapConcurrent | src/lib/processing.ts:35-52 | Worker-pool concurrency limit. Default 8. |
| analyzePageImageWithFallback | src/lib/textract.ts:~315 | 3-tier OCR fallback: full Textract → half-res → Tesseract. |
| detectCsiCodes | src/lib/csi-detect.ts | 3-tier matcher. Returns CsiCode[] with trade + division + confidence. |
| findOccurrences | src/lib/tag-mapping/find-occurrences.ts:171 | Tag-mapping entry. Dispatches to 5 matcher types + composes scores. |
| processFill | src/workers/bucket-fill.worker.ts:453 | 8-stage flood-fill pipeline. Text is a wall. |
| computeRealArea | src/lib/areaCalc.ts | Shoelace → calibrated sqft. Truth for area takeoffs. |
| streamChatWithTools | src/lib/llm/anthropic.ts:85-169 | Agentic tool-use loop. maxRounds=10. |
| BP_TOOLS | src/lib/llm/tools-defs.ts | 20 tool definitions. Client-safe (no db/fs). |
| executeToolCall | src/lib/llm/tools.ts | Server-side tool executors. Full db + fs access. |
| canvasWantsEvents | src/components/viewer/AnnotationOverlay.tsx:2510 | Render-gate condition #1. Also touch L2521, L2550, L2554. |
| useViewerStore | src/stores/viewerStore.ts:609 | Zustand store. 17 slice hooks, L1675 onward. |
| resetAllTools | src/stores/viewerStore.ts | Canonical tool reset. Compose into when adding a new tool. |
| focusAnnotationId | src/stores/viewerStore.ts | One-shot signal. Read-and-clear pattern. |
| assembleContextWithConfig | src/lib/context-builder.ts | LLM prompt assembly. Priority-sorted section packing. |
| ALL_DETECTORS | src/lib/detectors/registry.ts:21 | 10 text-annotation detectors. Add here + wire to enable config. |
| QTO_STRICT_EXCLUSION_CLASSES | src/components/viewer/AutoQtoTab.tsx:52 | tables, title_block, drawings. Required for Auto-QTO. |
| startYoloJob | src/lib/yolo.ts | SageMaker Processing job launch. Only caller: POST /api/yolo/run. |
| resolveConfig | src/lib/llm/resolve.ts | Per-company LLM provider + model + key lookup. |
| buildCsiGraph | src/lib/csi-graph.ts (~430 LOC) | Project-level CSI relationship graph. Fingerprinted cache key. |
The 17 Zustand slice hooks
[LLM-NAV:store-slices]
Every panel, every toolbar button, every canvas overlay reads from one of these slices. Subscribing at the slice level — rather than with a raw useViewerStore(s => s.field) — is how the 1,986-line store doesn't cause cascading re-renders. If you're adding UI that needs state from the store, check this map for the existing slice before creating a new one.
The 20 LLM tools — when to call each
[LLM-NAV:tool-selection]
Section 9 has the full tool grid. This subsection is the selection heuristic: given a user question, which tool should an LLM call first?
| User asks about… | First tool to reach for | Reasoning |
|---|---|---|
| Project overview / disciplines | getProjectOverview | One call, returns cluster summary. Always cheaper than scanning pages. |
| A specific page | getPageDetails(pageNumber) | Structured summary first, then raw OCR only if needed. |
| Pages containing Division X | lookupPagesByIndex({index: "csi", key: "X"}) | O(1). Don't iterate every page. |
| Cross-references / hub pages | getCrossReferences | Returns edges and ranked hubs from the graph. |
| Text location on a page | searchPages / getOcrTextInRegion | Search is ts_vector; in-region is bbox-scoped OCR. |
| Parsed schedules | getParsedSchedule(pageNumber) | Headers + rows, already structured. Don't re-parse from OCR. |
| Spatial layout of a page | getCsiSpatialMap / getSpatialContext | 9×9 heatmap + YOLO-joined text. |
| YOLO detections | getAnnotations({source: "yolo"}) | Filter-based, returns bboxes + classes + confidence. |
| Tag instances across project | mapTagsToPages | Bridges schedule rows to drawing shapes. Cached per tag list. |
| CSI code for arbitrary text | detectCsiFromText(text) | Runs the 3-tier matcher on input you provide. |
| Jump the viewer to page X | navigateToPage({pageNumber}) | Side-effecting action. The user sees it happen. |
| Highlight a region | highlightRegion | Cyan pulse on canvas. Drives attention. |
| Persist a new annotation | createMarkup | Mutation. Writes to the annotations table. |
Signal valve state — what BP does NOT do yet
[LLM-NAV:signal-valves]
Three scoring signals in the tag matcher are hardcoded — not computed — as of 2026-04-22. They are reserved for the future Discrepancy Engine; the matchers have not been wired to populate them yet.

```ts
// :131 — windowMatch hardcoded to true (multi-word text coherence not evaluated)
const windowMatch = true;

// :141-142 — two boosts hardcoded to zero
shapeContainBoost: 0,    // not yet produced by matchers; future refinement
objectAdjacencyBoost: 0, // not yet produced by matchers; future refinement
```

Translation for a model reasoning about BP's capabilities: tag-mapping scores are conservative. Every returned match has passed a pattern + region-weight + scope check, but the adjacency and shape-containment refinements that would let BP surface subtle discrepancies (e.g. "schedule says 12 doors of type D-01 but only 11 appear on plans") are not yet implemented. Don't claim that capability in responses. Point the user at Section 6 and Section 9 if they ask.
Post-processing flows (the stack-on story)
[LLM-NAV:flows]
Every feature in BP follows the same shape: user action → API route → DB/S3 write → Zustand store update → re-render. If you're reasoning about how a change would propagate, trace that path.
Known hazards when editing code
[LLM-NAV:hazards]
| Trap | Where | Symptom |
|---|---|---|
| Canvas render gate drift | AnnotationOverlay.tsx:2510-2527 + :2550 + :2554 | Adding a new canvas mode without touching all four conditions → silent event loss / wrong cursor. |
| csi-detect.ts is server-only | src/lib/csi-detect.ts uses fs | Imports from client components. tsc + vitest pass; Turbopack build fails. Keep it behind route files and server libs. |
| Native binaries on Mac → Linux container | Host npm run build | Ships Darwin binaries that crash at runtime. Always build in Docker / CI. |
| In-memory rate limit + brute-force state | src/middleware.ts, src/lib/auth.ts | Won't scale past one ECS replica. Move to Redis when scaling. |
| focusAnnotationId is one-shot | viewerStore.ts — read + clear | Setting it twice to the same value won't fire the effect unless you clear it between. |
| Python scripts don't talk to S3 | scripts/*.py (except lambda_handler.py) | TS caller handles S3 download → tempdir → subprocess → upload. Don't add boto3 to the Dockerfile. |
| ClientAnnotation.data is a 5-variant union | src/types/index.ts + AnnotationOverlay | Heavy use of as any casts. If you're writing new access patterns, narrow by data.type and avoid adding more any casts. |
| OAuth has no domain allowlist | src/lib/auth.ts | Any email on a matching domain can join an existing company. Fine for self-hosting; dangerous on multi-tenant public deployments. |
How to extend BP (recipes)
[LLM-NAV:extend]
Add a new LLM tool
1. Add a tool definition to BP_TOOLS in src/lib/llm/tools-defs.ts (name, description, JSON Schema input).
2. Implement execMyTool(input, ctx) in src/lib/llm/tools.ts and route it in executeToolCall().
3. If it's a viewer action, add a handler in ChatPanel.tsx's tool-result dispatcher.
4. Test via POST /api/ai/chat with a prompt that would trigger the tool.
Add a new canvas tool mode
1. Add state to viewerStore.ts and compose it into the right slice hook.
2. Touch ALL FOUR conditions in AnnotationOverlay.tsx: canvasWantsEvents (L2510), canvasShouldRender (L2521), pointerEvents (L2550), cursor (L2554).
3. Add branches in handleMouseDown/Move/Up.
4. Compose the tool's state reset into resetAllTools().
Add a new text-annotation detector
1. Create src/lib/detectors/my-detector.ts exporting a TextDetector.
2. Add it to ALL_DETECTORS in src/lib/detectors/registry.ts.
3. Add a per-company enable toggle via Admin → Text Annotations.
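A hypothetical detector in sketch form — the real TextDetector interface lives alongside src/lib/detectors/registry.ts; the shape below is an assumption:

```ts
// Hypothetical regex-based detector; the interface shape is assumed, not BP's real one.
export const permitStampDetector = {
  id: "permit-stamp",
  label: "Permit stamps",
  detect(pageText: string) {
    const re = /PERMIT\s+(?:SET|NO\.?\s*[\w-]+)/gi;
    return [...pageText.matchAll(re)].map((m) => ({
      type: "permit-stamp",
      text: m[0],
      index: m.index ?? 0, // character offset into the page's OCR text
    }));
  },
};
```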
Add a new YOLO class
1. Register the YOLO model in the models table (Admin → AI Models).
2. Run a SageMaker job to produce detections.
3. (Optional) assign a CSI code to the class in the admin config so every annotation inherits it.
4. Downstream features pick up the new class automatically; no code changes are needed for Map Tags or Auto-QTO.
Instructions for a model answering a user question about BP
[LLM-NAV:model-behavior]
- Check the context budget. On Opus you have room to include raw OCR; on Groq or Haiku, rely on structured tools and skip raw OCR unless the question demands it.
- Reach for the right tool first. The selection table above is the heuristic: for "where are the plumbing fixtures," lookupPagesByIndex is always cheaper than searchPages.
- Ground every quantitative claim in a tool call. Do not invent counts. Auto-QTO is the source of truth for takeoffs; mapTagsToPages is the source of truth for tag instance counts.
- Prefer actions over prose. If the user wants to see page 42, call navigateToPage rather than describing it.
- Respect the signal-valve state. BP does not currently do adjacency-based cross-schedule discrepancy detection (see the warning above). If the user asks "does the door schedule match the plans," answer with what mapTagsToPages returns and note that you cannot detect subtler mismatches yet.
API Reference
BP exposes roughly 91 HTTP endpoints from Next.js API routes. This reference groups them by domain, with a one-line description for each, plus parameters and examples where provided. This is not an OpenAPI spec — for machine-readable schemas, src/lib/llm/tools.ts has JSON Schemas for the LLM tool surface, which is the most formally typed set of endpoints.
Unless marked public, every endpoint requires a valid NextAuth session. Routes marked admin additionally check user.isAdmin || user.isRootAdmin. Routes marked root require isRootAdmin. Destructive admin toggles require an additional admin password stored in app_settings and checked at the route level in /api/admin/toggles. All authenticated routes enforce row-level multi-tenant scoping through src/lib/audit.ts.

Endpoint catalog
Notes on specific routes
A few routes need extra context beyond the short description:
- POST /api/ai/chat is Server-Sent Events, not request/response. The response stream yields data: lines encoding a sequence of { type: "text_delta" | "tool_call_start" | "tool_call_result" | "done" } events. The client reads them as they arrive. DELETE on the same path clears the scoped conversation history.
- POST /api/yolo/run is the only way to trigger YOLO inference. The request body is { projectId, modelId } and the response returns the SageMaker execution ID. Watch it via GET /api/yolo/status. Results ingest via the webhook.
- POST /api/processing/webhook is the callback surface for Step Functions and SageMaker. Requests are HMAC-SHA256 signed with the PROCESSING_WEBHOOK_SECRET from Secrets Manager. Unsigned or mis-signed requests are rejected.
- POST /api/projects/[id]/map-tags-batch is the heavy-lifter for Auto-QTO. It takes a parsed schedule's tag column, a target YOLO class (or a free-floating-text marker), and runs the mapping across the entire project at once. Expect it to take several seconds for large projects.
- POST /api/bucket-fill returns a polygon in normalized 0–1 coordinates, not image space. The viewer converts to canvas coordinates at render time; areaCalc.ts converts to real-world units using the page's scale calibration.
- POST /api/csi/detect is the public entry point to the 3-tier CSI matcher (see Section 04). Accepts a text string in the body, returns an array of matches with codes, descriptions, divisions, trades, and confidence scores.
- /api/demo/* is a parallel, read-only mirror of the project and search routes that does not require auth. These power the /demo route and the docs page's live component demos.
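A sketch of reading the chat stream from the browser — the real client lives in the ChatPanel; projectId, messages, and appendToTranscript are stand-ins for the surrounding client code:

```ts
async function streamChat(
  projectId: number,
  messages: unknown[],
  appendToTranscript: (t: string) => void,
) {
  const res = await fetch("/api/ai/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ projectId, messages }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial trailing line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const ev = JSON.parse(line.slice(6)); // { type: "text_delta" | "tool_call_start" | ... }
      if (ev.type === "text_delta") appendToTranscript(ev.text);
    }
  }
}
```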
Where to read the source
Every endpoint in the catalog above maps to a file under src/app/api/**/route.ts. The Next.js App Router uses directory-based routing, so /api/csi/detect is src/app/api/csi/detect/route.ts and /api/projects/[id]/map-tags-batch is src/app/api/projects/[id]/map-tags-batch/route.ts. Each file exports HTTP method handlers (GET, POST, etc.) and most hand off immediately to helper functions in src/lib/. Handlers are thin by design — the real logic lives in lib/ and is unit-tested.