You're three sentences into dictating, and it's going better than you expected. The words are arriving in the right order. Then you reach the end of a clause and freeze. Do you say "comma" out loud? Will the software type the word comma instead? Should you have been saying "period" all along, like someone reading a telegram? The sentence you were building quietly dissolves while you deliberate.

Punctuation is the part of dictation that makes people feel silly — the seam where speaking and writing visibly fail to line up. But here's the thing that changes how you approach it: punctuation isn't foreign to speech. It came from speech. And once you understand what a comma actually is — a written stand-in for something your voice already does — the question of how to punctuate when dictating gets much less awkward.

Punctuation Was Invented to Capture the Voice

For a long stretch of the written word's history, there was no punctuation at all. Ancient Greek and Latin texts were often written in scriptio continua — an unbroken river of letters, no spaces, no marks. That worked because reading was largely a performance: you read aloud, or you listened to someone who did, and the reader's voice supplied the structure the page withheld.

Around the third century BC, Aristophanes of Byzantium, a librarian at Alexandria, proposed a system of dots to help readers know where to breathe and pause — a low dot for a short pause, a middle dot for a longer one, a high dot for a full stop. His terms for the units of text they marked — komma, kolon, periodos — survive in our words comma, colon, and period. The marks weren't grammatical rules. They were stage directions for the voice.

Silent reading as the norm came much later. In the fourth century, Augustine recorded his surprise at watching Ambrose of Milan read with his voice and tongue still, as though it were a curiosity worth noting. As reading turned inward, punctuation gradually shifted from a performance aid into a grammatical convention. But its origin never fully washed out. A comma is, at bottom, a notation for a pause and a pitch movement. A question mark is a rising contour, frozen in ink.

So when you dictate and wonder where the punctuation will come from, you're not doing something unnatural. You're running the original process in its original direction: voice first, marks after.

Your Voice Is Already Punctuating

Linguists call the melody and rhythm of speech prosody, and it carries a remarkable amount of structural information. When you finish a clause, you don't just stop — several things happen at once, none of which you consciously decide. You lengthen the final syllable slightly, a well-documented phenomenon called phrase-final lengthening. Your pitch, which tends to drift downward across a phrase, drops at the boundary and then resets higher when the next phrase begins. You leave a beat of silence that's measurably different from the micro-pauses inside a phrase. Ask a question, and your intonation often rises where a statement would fall.

Listeners lean on these cues constantly. Research on speech perception shows that people use prosodic boundaries to decide how a sentence hangs together — where one thought ends and the next begins — often before the words themselves settle the matter. A pause in the wrong place can genuinely change what a listener understands, which is why "Let's eat, Grandma" is a prosody joke before it's a punctuation joke.

The upshot for dictation: the commas aren't missing from your speech. They're encoded in a different medium — duration, pitch, silence — waiting to be transcribed. The practical question is who does the transcribing: you, or the software.

Two Ways Software Turns Speech Into Marks

Dictation tools handle punctuation in two broad ways, and knowing which one you're working with dissolves most of the confusion.

The first is explicit commands. You say "comma," "period," "new paragraph," "open quote," and the software types the mark instead of the word. This is deterministic and precise — you get exactly the punctuation you asked for, exactly where you asked for it. The cost is cognitive: you're interleaving two streams, the sentence you're composing and the markup you're narrating. It's writing and typesetting at the same time, and it's the main reason old-school dictation felt like operating machinery.
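
To make the mechanics concrete, here is a minimal sketch of how explicit commands could be turned into marks. It is not how any particular product works: the command vocabulary, the `apply_commands` name, and the spacing rules are all assumptions made up for illustration.

```python
# Hypothetical command vocabulary; real dictation tools define their own.
OPEN_QUOTE = "\u201c"
PARAGRAPH = "\n\n"
COMMANDS = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
    "new paragraph": PARAGRAPH,
    "open quote": OPEN_QUOTE,
    "close quote": "\u201d",
}

def apply_commands(transcript: str) -> str:
    """Replace spoken command words with the marks they name."""
    words = transcript.split()
    out = []
    glue_next = True  # True when the next piece attaches with no leading space
    i = 0
    while i < len(words):
        # Try a two-word command first ("new paragraph"), then a one-word one.
        two = " ".join(words[i:i + 2]).lower()
        one = words[i].lower()
        if two in COMMANDS:
            mark, step = COMMANDS[two], 2
        elif one in COMMANDS:
            mark, step = COMMANDS[one], 1
        else:
            mark, step = None, 1

        if mark is None:
            # An ordinary word: keep it as spoken.
            out.append(words[i] if glue_next else " " + words[i])
            glue_next = False
        elif mark == OPEN_QUOTE:
            # An opening quote hugs the word that follows it.
            out.append(mark if glue_next else " " + mark)
            glue_next = True
        elif mark == PARAGRAPH:
            out.append(mark)
            glue_next = True
        else:
            # Commas, periods, and closing quotes hug the word before them.
            out.append(mark)
            glue_next = False
        i += step
    return "".join(out)
```

Fed "I bought apples comma pears comma and plums period", the sketch returns "I bought apples, pears, and plums." Notice what it cannot do: there is no way to type the literal word "comma", which is exactly the worry from the opening paragraph. A sketch this small simply always picks the mark.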

The second is auto-punctuation. Modern speech models are trained on enormous amounts of transcribed speech, and they learn to infer punctuation from the same cues human listeners use — your pauses and phrasing — combined with the syntax and vocabulary of what you said. You just talk, and the commas and periods condense out of your prosody. On current systems, including on-device models, this works far better than most people who tried dictation years ago would guess.
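
A toy version of the idea, for intuition only. Real systems use learned models rather than hand-set rules, so treat everything here as an assumption: the `Word` fields, the thresholds, and the `punctuate` function are invented, and the pause and pitch values are presumed to come from some upstream recognizer.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    pause_after: float  # seconds of silence after the word (assumed input)
    pitch_slope: float  # > 0 rising, < 0 falling at the word's end (assumed input)

# Invented thresholds, chosen only to make the example readable.
CLAUSE_PAUSE = 0.25
SENTENCE_PAUSE = 0.6

def punctuate(words: list[Word]) -> str:
    """Guess punctuation from pause length and pitch direction alone."""
    out = []
    start_of_sentence = True
    for w in words:
        text = w.text
        if start_of_sentence:
            text = text[:1].upper() + text[1:]
        if w.pause_after >= SENTENCE_PAUSE:
            # A long pause ends the sentence; rising pitch suggests a question.
            mark = "?" if w.pitch_slope > 0 else "."
            start_of_sentence = True
        elif w.pause_after >= CLAUSE_PAUSE:
            # A shorter pause is read as a clause boundary.
            mark = ","
            start_of_sentence = False
        else:
            mark = ""
            start_of_sentence = False
        out.append(text + mark)
    return " ".join(out)

spoken = [
    Word("if", 0.05, 0.0), Word("you", 0.05, 0.0), Word("need", 0.05, 0.0),
    Word("to", 0.05, 0.0), Word("think", 0.3, -0.2),
    Word("stop", 0.05, 0.0), Word("fully", 0.8, -1.0),
    Word("does", 0.05, 0.0), Word("that", 0.05, 0.0), Word("help", 0.9, 1.0),
]
```

Here `punctuate(spoken)` gives "If you need to think, stop fully. Does that help?" The same rules also show the weakness of relying on timing alone: a hesitation after "the" in "the ... cat" would earn a stray comma, which is part of why real models also weigh syntax and vocabulary.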

Neither mode is simply better; they suit different jobs. Auto-punctuation is right for prose — messages, drafts, journal entries, anything where you're thinking in sentences. Explicit commands earn their keep when precision matters more than flow: dictating a list, controlling paragraph breaks, placing quotation marks around dialogue, or writing anything where a mark in the wrong spot changes the meaning.

How to Speak So the Punctuation Lands

If you're relying on auto-punctuation, you can meet the software halfway — not by talking like a robot, but by speaking the way you would to a person who's actually listening.

Finish your sentences with your voice. The single most common failure isn't the software missing a boundary — it's the speaker never producing one. When you trail off mid-thought and restart, there's no falling pitch, no clean pause, nothing to transcribe. Let the sentence land. You'll hear it end; so will the model.

Pause at the joints, not in the middle. A beat of silence after a clause is a signal. The same silence in the middle of a phrase, while you hunt for a word, is noise. If you need to think, it's often cleaner to stop fully, think silently, and resume with a fresh sentence than to leak hesitations into the middle of one.

Let questions sound like questions. If your intonation rises, you'll usually get the question mark without asking for it.

Switch to commands for structure. "New paragraph" is worth saying out loud. Paragraph breaks are editorial decisions, not acoustic facts, and no model can reliably read your mind about them.

Fix the Commas Later, Keep the Momentum Now

Here's the permission slip most newcomers to dictation need: you're allowed to get the punctuation slightly wrong. A misplaced comma costs you two seconds in revision. Stopping mid-thought to adjudicate one costs you the thought, and speech, unlike text, doesn't wait patiently while you deliberate. The sentence you were holding in working memory decays fast.

Punctuation is the single cheapest thing to repair in a draft. It's local, mechanical, and instantly visible on a reread. Ideas are none of those things. So the trade is lopsided: protect the flow of composition, and let punctuation be a finishing pass. This is, incidentally, how writers who dictated whole books managed it — they spoke the prose and trusted the marks to be settled on the page, where marks belong.

Once you stop treating every comma as a live decision, dictation stops feeling like a transcription exam and starts feeling like what it is: talking, with a very good stenographer.

Where Quill Fits In

This is the problem Quill was built around. It runs Wispr-class dictation entirely on-device — private, fast, and available in any app you can type in — and its auto-punctuation reads your prosody the way a listener would, so you can speak in ordinary sentences and watch the periods and commas settle into place on their own. And because the finishing pass matters, Quill gives you a one-tap rewrite: when the draft is down, a single tap smooths stray punctuation, tightens phrasing, and shifts the whole thing into whatever style the moment calls for. Speak the thought; let the marks find you. If you'd like to try writing the way punctuation was originally meant to work — voice first — you can start at quill.lumenlabs.works.