A Private, On-Prem Language Model for Procedure Instructions

A private language model running on an on-premises GPU stack inside the hospital firewall, reading the EMR and drafting personalized procedure instructions

Sector

Healthcare & pharma · academic medicine

Engagement

Private on-prem SLM for clinical documentation

Timeline

11 months, including validation

Scope

EMR-integrated · 6 procedure areas · 600+ clinicians

Platform

On-prem NVIDIA GPU stack · private small language model · EMR integration

Status

In production, fully on-premises

100%

on-premises, zero PHI egress

90 sec

to draft a note, from ~8 min

47%

less documentation time

6

procedure areas live

Some data can't leave the building, and shouldn't. This academic medical center wanted the speed of generative AI for a painfully manual job: turning a patient's record into clear, personalized prep and recovery instructions for a procedure. But patient records are about as sensitive as data gets, and sending them to an outside API was a non-starter for the institution and its review boards.

The instructions themselves were a real problem. Clinicians wrote them by hand or pulled from generic templates that ignored the patient's own medications, history, and language, which is exactly where instructions tend to fail.

The challenge

Could a language model personalize clinical instructions straight from the EMR without a single byte of patient data leaving the hospital network? Generic capability was not the hard part. Doing it entirely in-house, fast enough for clinical workflows and safe enough for the review board, was.

The approach

We brought the model to the data instead of the data to the model. Mindcracker stood up a small language model on an on-premises NVIDIA GPU stack, inside the hospital's own firewall. It reads the relevant record directly from the EMR, drafts a personalized set of notes and instructions, and routes them to a clinician to review and sign. Nothing is sent to any external service, ever.
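One way to make "nothing is sent to any external service" checkable is to validate every configured endpoint against the internal address ranges before the application starts. The sketch below is illustrative only: the 10.0.0.0/8 range, the function names, and the example addresses are assumptions, not details of the deployed system.

```python
import ipaddress
from urllib.parse import urlparse

# Assumption: the internal network is a single private range.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_internal_endpoint(url: str) -> bool:
    """Return True only if the URL's host is an IP literal inside an internal range."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames are rejected in this sketch: checking them would require
        # DNS, and a name can later be re-pointed at an outside address.
        return False
    return any(address in network for network in INTERNAL_NETWORKS)


def check_endpoints(endpoints: dict[str, str]) -> None:
    """Raise at startup if any configured service would send data off-network."""
    external = [name for name, url in endpoints.items()
                if not is_internal_endpoint(url)]
    if external:
        raise RuntimeError(f"external endpoints configured: {external}")
```

A check like this complements network-level controls such as firewall egress rules; it does not replace them.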

01

A model that runs in-house

A right-sized small language model on an on-prem NVIDIA stack: fast enough for clinical workflows, and small enough to own and govern completely.

02

Direct, governed EMR access

The model reads the record inside the network, under the same access controls and audit expectations as any other internal system.

03

Personalized, not generic

Instructions adapt to the individual: their medications, history, and reading level. A patient on metformin gets the guidance that actually applies to them.

04

Clinician in the loop

Every draft is reviewed and signed by a clinician before it reaches a patient. The model writes the first draft; the human owns the decision.
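The four pieces above can be sketched as a small workflow: patient-specific context goes into the prompt, the model produces a draft, and nothing is released until a clinician has signed. This is a minimal sketch under stated assumptions. The class names, fields, and the `generate` callable (a stand-in for the on-prem model) are hypothetical, and the real EMR integration is not shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class PatientContext:
    """Hypothetical subset of the record used for personalization."""
    patient_id: str
    procedure: str
    medications: list[str]
    reading_level: str


@dataclass
class Draft:
    patient_id: str
    text: str
    signed_by: Optional[str] = None
    audit: list[str] = field(default_factory=list)

    def log(self, event: str) -> None:
        # Timestamped entries stand in for a real audit trail.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit.append(f"{stamp} {event}")


def build_prompt(ctx: PatientContext) -> str:
    """Put the patient-specific facts in front of the model."""
    meds = ", ".join(ctx.medications) or "none recorded"
    return (
        f"Procedure: {ctx.procedure}\n"
        f"Current medications: {meds}\n"
        f"Reading level: {ctx.reading_level}\n"
        "Draft prep and recovery instructions for clinician review."
    )


def draft_instructions(ctx: PatientContext,
                       generate: Callable[[str], str]) -> Draft:
    """The model writes the first draft; it is unsigned by default."""
    draft = Draft(patient_id=ctx.patient_id, text=generate(build_prompt(ctx)))
    draft.log("drafted by model")
    return draft


def sign(draft: Draft, clinician_id: str) -> None:
    """Record the clinician who reviewed and accepted the draft."""
    draft.signed_by = clinician_id
    draft.log(f"signed by {clinician_id}")


def release(draft: Draft) -> str:
    """Refuse to hand anything to a patient without a clinician signature."""
    if draft.signed_by is None:
        raise PermissionError("draft has not been signed by a clinician")
    draft.log("released to patient")
    return draft.text
```

With a stub such as `lambda prompt: "DRAFT\n" + prompt` in place of the model, calling `release` before `sign` raises `PermissionError`, which is the property the clinician-in-the-loop step depends on.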

The model came to the data. The data never had to take a risk.

[Figure: on-premises architecture in which the EMR feeds a private small language model on a GPU cluster, which drafts notes a clinician reviews, all inside the firewall.]

FIG.02 · Every step, from EMR to private SLM to the drafted note, runs inside the hospital firewall. Patient data never leaves the network.

The outcome

Drafting a personalized instruction set went from roughly 8 minutes of clinician time to about 90 seconds, and documentation time fell 47% across the six procedure areas that went live. Patients received instructions written for them specifically, and the institution got all of it without taking on the privacy risk of an outside model.

Privacy and capability are usually a trade-off. Here they weren't. The model lives where the data lives.

Because the platform is entirely on-premises, the medical center owns it outright: the model, the data, and the audit trail. No usage meter, no egress, no third party in the loop.

Have data that can't leave the building?

Start with a focused 90-minute AI readiness assessment. It's a candid read on what you can run privately, and what will actually reach production.

Take the assessment →
