One voice note, five diaries

In NutriM8 you can mumble your whole day into your phone once. A background worker untangles it into sleep, weight, exercise, hydration and food, and resolves "a snack after lunch" to a real timestamp.

22 Jun 2026 · 3 min

The fastest way to make people stop logging their health is to make them log it. Five separate diaries, five forms, five moments where you have to stop living to record that you lived. So in NutriM8, my nutrition platform, you do one thing: you talk. "Slept badly, woke up around six, had eggs for breakfast, big walk before lunch, snack after, drank maybe two litres." One recording, mumbled into a phone. The platform sorts out the rest.

The piece that does the sorting is aiVoiceAnalysis, and it is a fire-and-forget background worker, not something you wait on. You hit stop, the app says thanks, and you put your phone away. What follows happens out of sight.

The shape of NutriM8, briefly

For context: NutriM8 is 16 packages in one monorepo, with 8 backend services that each get Dockerized and deployed independently. The voice worker is one job inside that, and the architecture only works because the worker can lean on the rest of the platform (the user model, the schedule, the diary creators) without owning any of it.

What the worker actually does

The pipeline is short and deliberately linear. First it transcribes the recording, with a fallback transcriber behind the primary one, because audio from a real phone in a real kitchen is noisy and the first transcriber will sometimes simply fail. A health log you lose because the mic clipped is worse than no feature at all, so there is always a second attempt.
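As a rough illustration of that fallback chain, here is a minimal Python sketch. The names (`transcribe_with_fallback`, `TranscriptionError`, and the two stand-in transcribers) are hypothetical and not taken from NutriM8. It also treats an empty transcript as a failure, which is an assumption rather than something the worker is known to do.

```python
from typing import Callable, Sequence


class TranscriptionError(Exception):
    """Raised when every transcriber in the chain has failed."""


# A transcriber is any callable that turns raw audio bytes into text.
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(audio: bytes, transcribers: Sequence[Transcriber]) -> str:
    """Try each transcriber in order and return the first non-empty transcript."""
    errors: list[Exception] = []
    for transcribe in transcribers:
        try:
            text = transcribe(audio).strip()
        except Exception as exc:  # a failed attempt must not end the whole job
            errors.append(exc)
            continue
        if text:
            return text
        # Assumption: an empty transcript counts as a failure too.
        errors.append(ValueError("empty transcript"))
    raise TranscriptionError(f"all {len(transcribers)} transcribers failed: {errors}")


# Hypothetical stand-ins for the primary and fallback transcription services.
def flaky_primary(audio: bytes) -> str:
    raise RuntimeError("audio clipped")


def steady_fallback(audio: bytes) -> str:
    return "slept badly, woke up around six"


transcript = transcribe_with_fallback(b"raw-audio", [flaky_primary, steady_fallback])
```

Here the primary raises, the fallback answers, and the caller only ever sees a transcript or a single `TranscriptionError`.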

Then comes the one expensive step: a single intent call to an LLM with structured output. Not five calls, one. The model reads the whole transcript and returns a typed object that says, in effect, this sentence is about sleep, this clause is about hydration, this is two separate food entries. One call keeps it cheap, keeps it fast, and keeps the interpretation coherent, because the model sees the whole day at once instead of five narrow slices that might disagree.
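To make "a typed object" concrete, here is a sketch of what the boundary around that single call could look like. The field names (`diary`, `text`, `when`), the JSON shape and `fake_model_call` are all assumptions for illustration. The real schema and the real LLM client are not shown in this post.

```python
import json
from dataclasses import dataclass
from typing import Optional

DIARY_TYPES = {"sleep", "weight", "exercise", "hydration", "food"}


@dataclass(frozen=True)
class Intent:
    diary: str            # one of DIARY_TYPES
    text: str             # the part of the transcript this entry came from
    when: Optional[str]   # a relative time phrase such as "after lunch", if any


def parse_intents(raw: str) -> list[Intent]:
    """Validate the model's JSON output and reject anything off-schema."""
    payload = json.loads(raw)
    intents = []
    for item in payload["intents"]:
        diary = item["diary"]
        if diary not in DIARY_TYPES:
            raise ValueError(f"unknown diary type: {diary!r}")
        text = item["text"]
        if not isinstance(text, str) or not text.strip():
            raise ValueError("intent text must be a non-empty string")
        when = item.get("when")
        if when is not None and not isinstance(when, str):
            raise ValueError("when must be a string or null")
        intents.append(Intent(diary=diary, text=text, when=when))
    return intents


def fake_model_call(transcript: str) -> str:
    """Hypothetical stand-in for the one LLM request; returns canned JSON."""
    return json.dumps({"intents": [
        {"diary": "sleep", "text": "slept badly, woke up around six", "when": None},
        {"diary": "food", "text": "eggs", "when": "breakfast"},
        {"diary": "food", "text": "snack", "when": "after lunch"},
        {"diary": "hydration", "text": "maybe two litres", "when": None},
    ]})


intents = parse_intents(fake_model_call("slept badly, woke up around six, ..."))
```

One transcript goes in, one list of validated intents comes out, and the two food entries stay separate.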

From there the worker routes. This is the LLM-as-router pattern: the model classifies and extracts, then an independent creator takes over for each of up to five diary types (sleep, weight, exercise, hydration and food). Each creator is its own unit. The food creator knows nothing about the sleep creator. If exercise has a bad day, hydration still lands. The model decides what goes where; it never gets to be the thing that writes to the database. That separation is the whole point of doing it responsibly.
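The routing step can be sketched as a dispatch table with one try/except per entry. Everything named here (`CREATORS`, `route`, the three creator functions) is hypothetical. The real creators write to a database, while these just return strings so the isolation is easy to see.

```python
from typing import Callable


# Hypothetical creators: each one owns exactly one diary type.
def create_sleep(entry: dict) -> str:
    return f"sleep:{entry['text']}"


def create_exercise(entry: dict) -> str:
    # Simulates exercise "having a bad day".
    raise RuntimeError("exercise service unavailable")


def create_hydration(entry: dict) -> str:
    return f"hydration:{entry['text']}"


CREATORS: dict[str, Callable[[dict], str]] = {
    "sleep": create_sleep,
    "exercise": create_exercise,
    "hydration": create_hydration,
}


def route(entries: list[dict]) -> tuple[list[str], list[tuple[str, str]]]:
    """Hand each entry to its creator; one failure never blocks the others."""
    written: list[str] = []
    failed: list[tuple[str, str]] = []
    for entry in entries:
        creator = CREATORS.get(entry["diary"])
        if creator is None:
            failed.append((entry["diary"], "no creator registered"))
            continue
        try:
            written.append(creator(entry))
        except Exception as exc:  # isolate the failure to this one diary
            failed.append((entry["diary"], str(exc)))
    return written, failed


written, failed = route([
    {"diary": "sleep", "text": "slept badly"},
    {"diary": "exercise", "text": "big walk"},
    {"diary": "hydration", "text": "two litres"},
])
```

In this run the exercise entry ends up in `failed` while sleep and hydration are still written. The model's output only chooses a key in the table; the code behind that key is ordinary and deterministic.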

"A snack after lunch" is a hard problem

Here is the part I am proud of. A phrase like "a snack after lunch" is meaningless without you. After whose lunch, at what time, in what timezone? So before the worker interprets anything, it builds a per-user ScheduleContext: the person's timezone, their usual meal times, when they wake, when they go to bed. That context goes into the resolution step, so "after lunch" becomes a real timestamp on the right day, and "this morning" anchors to their morning, not the server's.

Without that, every entry is a vague blob and the diary is useless for spotting patterns. With it, "snack after lunch" lands at 1:40pm on the right date, and a week of those is something you can actually graph. Time resolution is the unglamorous core of the feature. The transcription and the language model get the attention; the schedule context is what makes the output worth storing.
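Here is a minimal sketch of that resolution step. The `ScheduleContext` fields follow the description above, but the exact shape, the `resolve` function and the 40-minute "after" offset are assumptions picked so the example reproduces the 1:40pm case. A fixed UTC+2 offset keeps the example dependency-free; a real context would more likely carry an IANA zone via `zoneinfo.ZoneInfo`.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone, tzinfo

# Assumption: "after X" means 40 minutes after X, "before X" 40 minutes earlier.
RELATIVE_OFFSET = timedelta(minutes=40)


@dataclass(frozen=True)
class ScheduleContext:
    tz: tzinfo        # the user's timezone, never the server's
    wake: time
    breakfast: time
    lunch: time
    dinner: time
    bedtime: time


def resolve(phrase: str, ctx: ScheduleContext, day: date) -> datetime:
    """Turn a relative phrase into an aware timestamp on the user's day."""
    words = re.findall(r"[a-z]+", phrase.lower())
    anchors = {
        "morning": ctx.wake,
        "breakfast": ctx.breakfast,
        "lunch": ctx.lunch,
        "dinner": ctx.dinner,
        "bedtime": ctx.bedtime,
    }
    for name, at in anchors.items():
        if name in words:
            moment = datetime.combine(day, at, tzinfo=ctx.tz)
            if "after" in words:
                moment += RELATIVE_OFFSET
            elif "before" in words:
                moment -= RELATIVE_OFFSET
            return moment
    raise ValueError(f"cannot resolve time phrase: {phrase!r}")


ctx = ScheduleContext(
    tz=timezone(timedelta(hours=2)),  # stand-in for e.g. a Central European summer zone
    wake=time(6, 0),
    breakfast=time(7, 30),
    lunch=time(13, 0),
    dinner=time(19, 0),
    bedtime=time(23, 0),
)

snack_at = resolve("a snack after lunch", ctx, date(2026, 6, 22))
morning_at = resolve("this morning", ctx, date(2026, 6, 22))
```

With this context, `snack_at.isoformat()` is `2026-06-22T13:40:00+02:00`, which is 11:40 UTC on the server. Storing the aware timestamp is what lets a week of entries line up on one graph.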

Then it taps you on the shoulder

When the entries are written, the worker sends you a push notification to say they are ready. That closes the loop: you spoke once, walked away, and a notification later tells you five diaries quietly updated themselves. You can open it and correct anything that landed wrong, but the default is that you do not have to.

The pattern underneath all of this is one I keep coming back to. Let the model do the one thing it is genuinely good at, reading messy human language and turning it into structure, and let ordinary code do everything that has to be correct. The LLM routes. It does not rule. Every entry it produces passes through a typed boundary and a deterministic creator before it touches anything real. That is what makes it safe to fire and forget.
