Overview

This document describes a two-stage matching engine for stem cell donor registries. It identifies compatible donors quickly by pairing high-recall semantic retrieval with the strict precision that medical HLA matching requires.

Technical Architecture

1. Data Ingestion (Text Space Pattern)

Because the server-side PostgreSQL disk volume is full (psycopg.errors.DiskFull), this vertical uses a manual Text Space ingestion pattern for initial validation.

  • Corpus: Natural-language donor profiles and patient urgency rows.
  • Segmentation: Granular point-level analysis.
  • Topic Categorization: Enabled via GLM 5 for unsupervised grouping.

2. Stage 1: Semantic Proximity (Recall)

The system uses unsupervised embedding to surface contextually relevant donors based on clinical descriptions.

  • Signal Validation: Patient rows that use urgency language (e.g., "CRITICAL", "URGENT") cluster next to high-priority donor neighborhoods.
  • Findings: In this initial validation, semantic encoding of urgency translated into geometric proximity, which supports rapid triage of time-sensitive requests.
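
The recall stage can be illustrated with a minimal sketch of nearest-neighbor retrieval by cosine similarity. The embedding model itself is not shown: the three-dimensional vectors, the donor IDs, and the function names below are made-up placeholders, not the system's actual data or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_donors(patient_vec, donor_vecs, k=3):
    """Return the k (donor_id, score) pairs closest to the patient vector."""
    scored = [
        (donor_id, cosine_similarity(patient_vec, vec))
        for donor_id, vec in donor_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (hypothetical values).
donor_vecs = {
    "donor_a": [0.9, 0.1, 0.0],
    "donor_b": [0.1, 0.9, 0.0],
    "donor_c": [0.7, 0.3, 0.1],
}
patient_vec = [1.0, 0.0, 0.0]

candidates = top_k_donors(patient_vec, donor_vecs, k=2)
candidate_ids = [donor_id for donor_id, _ in candidates]
```

Stage 1 only ranks by proximity, so its output is a candidate list rather than a set of confirmed matches.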

3. Stage 2: Relational Constraints (Precision)

Because stem cell matching requires exact biological compatibility, Composer AI applies a relational constraint layer over the semantic results.

  • Compatibility Logic: Validates HLA markers and Rh-factor compatibility.
  • Validation Case: Separates biologically compatible donors from those that are merely semantically similar (e.g., it filters out donors that match strongly on clinical urgency but have an incompatible blood type).
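
A hedged sketch of such a constraint layer follows. The `Profile` fields, the `REQUIRED_LOCI` selection, the exact-match HLA rule, and the simplified Rh rule are all assumptions made for illustration. They are not the engine's actual compatibility logic and are not clinical guidance.

```python
from dataclasses import dataclass

# Hypothetical choice of loci to compare; the real marker set is not given.
REQUIRED_LOCI = ("A", "B", "DRB1")


@dataclass
class Profile:
    profile_id: str
    hla: dict          # locus name -> frozenset of allele strings
    rh_positive: bool


def hla_match(patient, donor):
    """Assumed rule: allele sets must be identical at every required locus."""
    return all(
        patient.hla.get(locus) is not None
        and patient.hla.get(locus) == donor.hla.get(locus)
        for locus in REQUIRED_LOCI
    )


def rh_compatible(patient, donor):
    """Assumed simplified rule: an Rh-positive donor needs an Rh-positive patient."""
    return patient.rh_positive or not donor.rh_positive


def apply_constraints(patient, ranked_donors):
    """Keep only donors passing both checks, preserving the semantic ranking."""
    return [
        d for d in ranked_donors
        if hla_match(patient, d) and rh_compatible(patient, d)
    ]


shared_hla = {
    "A": frozenset({"01:01", "02:01"}),
    "B": frozenset({"07:02", "08:01"}),
    "DRB1": frozenset({"03:01", "15:01"}),
}
other_hla = dict(shared_hla, B=frozenset({"35:01", "44:02"}))

patient = Profile("patient_1", shared_hla, rh_positive=False)
ranked = [
    Profile("donor_a", shared_hla, rh_positive=True),   # HLA ok, Rh fails
    Profile("donor_b", other_hla, rh_positive=False),   # HLA fails
    Profile("donor_c", shared_hla, rh_positive=False),  # passes both checks
]

compatible = apply_constraints(patient, ranked)
compatible_ids = [d.profile_id for d in compatible]
```

Filtering after ranking keeps the Stage 1 order intact, so the first surviving donor is still the closest semantic neighbor among those that pass the hard constraints.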

Infrastructure Notes

  • Primary Blocker: Server-side PostgreSQL disk volume exhaustion.
  • Resolution: Manual Data Bridge via hardcoded dictionary in Mantis Coding notebooks.
  • Phase 2 Migration: Scaling to 30,000+ row registries via a local WSL2 stack.
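
As a rough illustration of the hardcoded-dictionary bridge, the sketch below turns an in-notebook dictionary into plain-text rows that could be pasted into a manual ingestion step. The field names, values, and row format are hypothetical. The actual notebook contents and the Text Space input format are not specified in this document.

```python
# Hypothetical stand-in for the registry rows that could not be read from
# PostgreSQL; in the notebook these would be typed or pasted in by hand.
DONOR_ROWS = {
    "donor_a": {"age": 34, "rh": "positive", "notes": "available immediately"},
    "donor_c": {"age": 41, "rh": "negative", "notes": "available in two weeks"},
}


def to_text_row(donor_id, fields):
    """Render one donor record as a single line of text."""
    parts = [f"{key}: {value}" for key, value in sorted(fields.items())]
    return f"{donor_id} | " + "; ".join(parts)


corpus = [to_text_row(donor_id, fields) for donor_id, fields in DONOR_ROWS.items()]
```

Sorting the keys keeps each row's field order stable between runs, which makes the manually ingested text easier to compare across notebook sessions.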