Overview
This documentation outlines a high-precision matching engine for stem cell donor registries. It is designed to identify compatible donors for patients quickly, balancing high-recall semantic retrieval against the strict precision that medical HLA matching requires.
Technical Architecture
1. Data Ingestion (Text Space Pattern)
Because the server-side PostgreSQL disk volume is full (psycopg.errors.DiskFull), this vertical uses a manual Text Space ingestion pattern for initial validation.
- Corpus: Natural-language donor profiles and patient urgency rows.
- Segmentation: Granular point-level analysis.
- Topic Categorization: Enabled via GLM 5 for unsupervised grouping.
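A minimal sketch of the manual Text Space ingestion, assuming the corpus is held as a hardcoded Python dictionary (as the Infrastructure Notes describe) and interpreting point-level segmentation as one point per donor profile or patient row. The names `TEXT_SPACE`, `Point`, and `ingest` and the sample rows are hypothetical placeholders, not real registry data; GLM 5 topic categorization is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical hardcoded corpus standing in for the unavailable PostgreSQL tables.
# Field names and sample text are illustrative placeholders, not registry data.
TEXT_SPACE = {
    "donors": {
        "D-001": "Registered donor, available on short notice for urgent requests.",
        "D-002": "Donor prefers scheduled collection; limited availability.",
    },
    "patients": {
        "P-101": "CRITICAL: relapse, transplant needed within weeks.",
    },
}


@dataclass(frozen=True)
class Point:
    """One point-level record in the Text Space corpus."""
    kind: str       # "donor" or "patient"
    record_id: str
    text: str


def ingest(corpus: dict) -> list[Point]:
    """Flatten the nested dictionary into one Point per donor profile or patient row."""
    points = []
    for record_id, text in corpus.get("donors", {}).items():
        points.append(Point("donor", record_id, text.strip()))
    for record_id, text in corpus.get("patients", {}).items():
        points.append(Point("patient", record_id, text.strip()))
    return points
```

Calling `ingest(TEXT_SPACE)` yields two donor points followed by one patient point, which the embedding and topic stages can then consume as a flat list.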
2. Stage 1: Semantic Proximity (Recall)
The system uses unsupervised embeddings of clinical descriptions to surface contextually relevant donors.
- Signal Validation: Patient rows that use urgency language (e.g., "CRITICAL", "URGENT") cluster next to high-priority donor neighborhoods in the embedding space.
- Findings: The semantic encoding of urgency reliably drives geometric proximity, which allows time-sensitive requests to be triaged rapidly.
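The recall step can be sketched as nearest-neighbour ranking by cosine similarity. The source does not name its embedding model, so a lowercase bag-of-words count vector stands in for it here as an explicit assumption; `embed`, `cosine`, and `recall_candidates` are hypothetical names, and a real deployment would swap `embed` for the actual model while keeping the ranking logic.

```python
import heapq
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for the unnamed embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recall_candidates(patient_text: str, donors: dict[str, str], k: int = 10) -> list[tuple[str, float]]:
    """Stage 1: return the k donors whose profiles lie closest to the patient text."""
    query = embed(patient_text)
    scored = ((donor_id, cosine(query, embed(text))) for donor_id, text in donors.items())
    # nlargest keeps only k items in memory, which matters as registries grow.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Because "CRITICAL" and "URGENT" are lowercased before comparison, a patient row written in capitals still lands next to donor profiles that mention urgency, which is the proximity effect the findings above describe.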
3. Stage 2: Relational Constraints (Precision)
Because stem cell matching requires exact biological compatibility, Composer AI applies a relational constraint layer over the semantic results.
- Compatibility Logic: Validates HLA markers and Rh-factor compatibility.
- Validation Case: Separates biologically compatible donors from those that are only semantically similar; for example, it filters out donors who match a patient strongly on clinical urgency but have an incompatible blood type.
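The constraint layer can be sketched as a hard filter over the Stage 1 candidates, given as `(donor_id, score)` pairs. Two rules here are assumptions, not taken from the source: HLA compatibility is modeled as an identical allele set at HLA-A, -B, -C, and -DRB1 (an "8/8" match), and Rh compatibility as "an Rh-negative patient accepts only Rh-negative donors". `Typing`, `REQUIRED_LOCI`, and the function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: an identical allele set is required at each of these loci.
REQUIRED_LOCI = ("A", "B", "C", "DRB1")


@dataclass
class Typing:
    hla: dict[str, frozenset[str]]  # locus -> alleles, e.g. {"A": frozenset({"A*01:01", "A*02:01"})}
    rh_positive: bool


def hla_compatible(patient: Typing, donor: Typing) -> bool:
    """True when the donor's alleles equal the patient's at every required locus."""
    for locus in REQUIRED_LOCI:
        alleles = patient.hla.get(locus)
        if not alleles or alleles != donor.hla.get(locus):
            return False
    return True


def rh_compatible(patient: Typing, donor: Typing) -> bool:
    """Assumed rule: an Rh-negative patient accepts only Rh-negative donors."""
    return patient.rh_positive or not donor.rh_positive


def apply_constraints(patient: Typing,
                      candidates: list[tuple[str, float]],
                      typings: dict[str, Typing]) -> list[tuple[str, float]]:
    """Stage 2: keep recalled donors that have typing data and pass every hard constraint."""
    return [
        (donor_id, score)
        for donor_id, score in candidates
        if donor_id in typings
        and hla_compatible(patient, typings[donor_id])
        and rh_compatible(patient, typings[donor_id])
    ]
```

Semantic scores are carried through unchanged, so surviving donors stay in their Stage 1 order; the filter never promotes a donor, it only removes those that fail a biological rule, even when their urgency similarity is the highest.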
Infrastructure Notes
- Primary Blocker: Server-side PostgreSQL disk volume exhaustion.
- Workaround: A manual data bridge, implemented as a hardcoded dictionary in Mantis Coding notebooks.
- Phase 2 Migration: Scaling to 30,000+ row registries via a local WSL2 stack.
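One way to keep the Phase 2 migration contained is to have the matching stages read donors through a small interface, so the hardcoded dictionary can later be replaced by a database-backed source without changing matching code. This is a sketch under that assumption: `DonorSource`, `DictBridge`, and `iter_batches` are hypothetical names, and the batch size of 1,000 is an arbitrary illustrative default.

```python
from itertools import islice
from typing import Iterator, Protocol


class DonorSource(Protocol):
    """Hypothetical interface the matching stages read donor profiles from."""
    def donor_profiles(self) -> Iterator[tuple[str, str]]: ...


class DictBridge:
    """Current workaround: serve profiles from a hardcoded notebook dictionary."""
    def __init__(self, profiles: dict[str, str]) -> None:
        self._profiles = profiles

    def donor_profiles(self) -> Iterator[tuple[str, str]]:
        yield from self._profiles.items()


def iter_batches(source: DonorSource, size: int = 1000) -> Iterator[list[tuple[str, str]]]:
    """Yield (donor_id, text) pairs in fixed-size chunks for downstream processing."""
    rows = source.donor_profiles()
    while batch := list(islice(rows, size)):
        yield batch
```

A Phase 2 source running on the local WSL2 stack would only need to provide its own `donor_profiles` method; `iter_batches` then lets a 30,000+ row registry flow through the embedding stage in chunks rather than as one list.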