How to Redact a Scanned Document So the Hidden Text Is Actually Gone

Learn how to redact a scanned document properly. A black box over text often leaves the words readable underneath — here's how to remove sensitive information so it can't be recovered.

There is a particular kind of mistake that feels safe at the moment you make it and turns out to be the opposite. You have a bank statement, a medical form, a lease — something with one line you don't want a stranger to see. You draw a neat black rectangle over the account number, save the file, and send it on. The page looks redacted. The number is gone.

Except, very often, it isn't. The black box you drew is sitting on top of the document, like a sticker, and the original text is still underneath it — fully intact, fully readable to anyone who knows where to look. This is not a rare edge case. Courts, law firms, and large companies have repeatedly published "redacted" filings where the hidden text could be revealed by selecting it and pasting it into a plain text document. If institutions with legal teams get this wrong, it's worth understanding why.

A document is made of layers, not pixels

The confusion comes from imagining a digital page as a single flat picture. Most of the time, it isn't.

A PDF — especially one produced by a scanner app — is usually a stack of layers. At the bottom is the image of the page. Above that, if the file has been run through OCR (optical character recognition), there is an invisible text layer: a transcription of every word, positioned to line up exactly with the printed letters in the image. That invisible layer is what lets you search a scan, or tap a phone number to call it. It's enormously useful. It is also the thing that quietly defeats your black rectangle.

When you draw a box over a number, you typically add a third layer: an annotation. Annotations float above everything else. They hide content visually without touching it. The image underneath still contains the original pixels, and the OCR text layer still contains the original characters as machine-readable text. Removing the box, or simply selecting the region and pasting it elsewhere, brings the secret right back.
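
To make the layering concrete, here is a minimal sketch in Python. It is a toy model, not a real PDF structure: the `Page` and `TextRun` classes, the coordinates, and the sample strings are all invented for illustration. It shows only that adding an annotation changes what is visible, not what is stored.

```python
from dataclasses import dataclass, field


@dataclass
class TextRun:
    """One OCR'd string and the box it occupies: (x0, y0, x1, y1)."""
    text: str
    box: tuple[int, int, int, int]


def covers(outer, inner):
    """True if rectangle `outer` fully contains rectangle `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class Page:
    """Hypothetical stand-in for a scanned page with a text layer."""
    text_layer: list[TextRun] = field(default_factory=list)
    annotations: list[tuple[int, int, int, int]] = field(default_factory=list)

    def draw_black_box(self, box):
        # The annotation is added on top; nothing underneath is modified.
        self.annotations.append(box)

    def visible_text(self):
        # What a person sees: runs fully covered by an annotation are hidden.
        return [run.text for run in self.text_layer
                if not any(covers(a, run.box) for a in self.annotations)]

    def extractable_text(self):
        # What copy/paste or a script sees: the text layer, ignoring overlays.
        return [run.text for run in self.text_layer]


page = Page(text_layer=[
    TextRun("Statement for March", (50, 700, 300, 720)),
    TextRun("Account 4012 8888 8888 1881", (50, 650, 330, 670)),
])
page.draw_black_box((45, 645, 335, 675))

print(page.visible_text())      # the account line is hidden from view
print(page.extractable_text())  # the account line is still stored
```

The two methods disagree after the box is drawn, and that disagreement is the whole problem: the eye consults one, and everything else consults the other.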

This is the heart of redaction versus blacking out. Blacking out changes what the page looks like. Redaction changes what the file contains. Only one of them protects you.

Why "black box over text" stays readable

It helps to see the specific ways the information survives, because each one is a separate door you have to close.

The OCR text layer is the most common culprit. Even with a black graphic on top, the underlying string — "Account 4012 8888 8888 1881" — is still stored as text. A reader can drag-select across the blacked-out area and copy it, or a script can pull every text object out of the file in seconds.
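
As a rough illustration of how little effort that script needs, the sketch below pulls text out of a hand-written, uncompressed PDF content stream. The stream is hypothetical: it draws a page image, places invisible text over it, then fills a black rectangle on top. Real files usually compress their streams and may encode strings in other ways, so a real extractor would use a PDF library; the regular expression here only handles simple literal strings shown with the `Tj` operator.

```python
import re

# Hypothetical page content, written by hand for illustration.
CONTENT_STREAM = b"""\
q 612 0 0 792 0 0 cm /Im0 Do Q
BT 3 Tr /F1 12 Tf 50 650 Td (Account 4012 8888 8888 1881) Tj ET
0 g 45 645 290 30 re f
"""
# Line 1 paints the scanned image, line 2 places invisible text
# (rendering mode 3), line 3 fills a black rectangle over the same area.


def extract_text_strings(stream: bytes) -> list[str]:
    """Return every literal string passed to the Tj (show text) operator."""
    pattern = re.compile(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj")
    return [m.group(1).decode("latin-1") for m in pattern.finditer(stream)]


print(extract_text_strings(CONTENT_STREAM))
```

The black rectangle is just another drawing instruction that comes later in the stream. It has no effect on the string stored two lines above it.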

The image itself is the second door. If the redaction is just an overlay, the original page image still holds the pixels of the number. Strip the overlay and the pixels reappear, unaltered.

Then there is metadata — the information about the file rather than in it. Scanned documents and photos can carry the original filename, the device that made them, timestamps, and sometimes GPS coordinates of where the photo was taken. You can redact every visible word on the page and still leak the address of the kitchen table you scanned it on, because that detail lives in the file's metadata, not its image.
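
In a JPEG, for instance, metadata sits in its own segments ahead of the image data: EXIF and XMP in APP1, IPTC in APP13, free-text comments in COM. The sketch below walks those segments and drops them while copying everything else unchanged. It assumes a well-formed file and deliberately leaves other segment types alone (some carry color information), so treat it as an illustration of where metadata lives, not a complete scrubber. The sample bytes are synthetic and are not a decodable photo.

```python
import struct

# APP1 (EXIF/XMP), APP13 (IPTC/Photoshop), COM (comment)
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with the metadata segments above removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a segment marker")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest.
            out += data[pos:]
            return bytes(out)
        # The 2-byte length counts itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in METADATA_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")


def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment (helper for the synthetic sample only)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


exif = segment(0xE1, b"Exif\x00\x00" + b"GPS: made-up coordinates")
sample = (b"\xff\xd8"
          + segment(0xE0, b"JFIF\x00" + bytes(9))
          + exif
          + segment(0xDB, bytes(65))            # placeholder table data
          + b"\xff\xda\x00\x02" + b"scan-bytes" + b"\xff\xd9")

cleaned = strip_jpeg_metadata(sample)
print(len(sample) - len(cleaned), "bytes of metadata removed")
```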

What real redaction actually does

Proper redaction doesn't cover information. It removes it from the file, and then removes the layered structure that held it, so nothing is left to recover.

The reliable mechanism has two parts. First, the sensitive content is deleted from every layer — the pixels are painted over destructively in the image, and the corresponding characters are stripped from the text layer. Second, and this is the step people skip, the document is flattened.

Flattening means collapsing all those layers down into a single image. The OCR text layer, the annotations, and the separate graphics stop being independent objects and become one set of pixels, like printing the page and photographing the printout. Be aware that some PDF tools use "flatten" to mean only merging annotations into the page content, which can leave the text layer selectable; the flattening meant here is rasterizing, so the output contains pixels and nothing else. After a flatten, there is no "underneath." A black region is simply black pixels. There is no text object to copy, no overlay to peel away, nothing to recover, because the structure that held the secret no longer exists.
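
Here is a small sketch of why a flatten is irreversible, using a toy grid of grayscale values in place of a real page image. The `flatten` function and the sample pages are assumptions made up for the example. The point it demonstrates: two pages with different secrets under the same box produce identical output, so the output cannot tell you which secret was there.

```python
WHITE, BLACK = 255, 0


def flatten(image, overlays):
    """Burn each overlay rectangle into a copy of the image; return only pixels.

    Rectangles are (x0, y0, x1, y1) with end-exclusive x1 and y1.
    """
    flat = [row[:] for row in image]
    for x0, y0, x1, y1 in overlays:
        for y in range(y0, y1):
            for x in range(x0, x1):
                flat[y][x] = BLACK  # the old value is overwritten, not hidden
    return flat


# Two toy 6x4 pages with different "ink" under the same region.
page_a = [[WHITE] * 6 for _ in range(4)]
page_a[1][2] = page_a[1][3] = 90
page_b = [[WHITE] * 6 for _ in range(4)]
page_b[1][1] = page_b[1][4] = 40

box = (1, 1, 5, 2)
flat_a = flatten(page_a, [box])
flat_b = flatten(page_b, [box])

print(flat_a == flat_b)  # True: the flattened pages are indistinguishable
```

The return value is a plain grid. There is no overlay list or text layer attached to it, which is the property a flattened export needs to have.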

This is why a true redaction tool will often warn you that the action can't be undone. That permanence is the feature. If you can undo it, so can someone else.

A practical way to redact, step by step

You don't need legal software to do this correctly. You need to do the steps in the right order.

Start from the image, not the searchable version. If your goal is to share a redacted copy, work on a flattened export rather than a layered, OCR'd PDF. The fewer layers in the file, the fewer places for text to hide.

Cover the content destructively. Use a tool that paints over the area — actually altering the pixels — rather than one that places a movable shape on top. If your only option places a shape, you must flatten afterward for it to count.

Flatten the result. Export the document as a flattened image or a PDF built from flattened images. This is the step that turns a cosmetic black box into a real one.

Check your work the way an adversary would. Open the finished file and try to select text across the redacted spot. Try to copy it. Zoom in hard on the black area to see whether any ghost of the original shows through a too-thin mark. If nothing selects and nothing shows, the redaction holds.

Strip the metadata. Before sharing, clear the file's metadata so the document doesn't carry timestamps, device names, or location data you never meant to send.

Watch the margins of the page itself. Redaction failures aren't only digital. A faint impression bleeding through thin paper, a reflection in a glossy photo, an account number partially visible in a fold — the camera sees more than you do. Scan cleanly, then redact.
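
The "check your work" step can be partly automated. The sketch below searches a PDF's raw bytes, plus any streams it can inflate with zlib, for values that should be gone. It is a first-pass check under simplifying assumptions: text stored as hex strings, split across several operators, or remapped by a font encoding would slip past it, and it says nothing about pixels. A hit means the redaction failed; a clean result is not proof that it held. The `fake_pdf` helper builds a stand-in file for the demonstration and does not produce a valid PDF.

```python
import re
import zlib


def _squash(data: bytes) -> bytes:
    """Remove whitespace so '4012 8888' and '40128888' compare equal."""
    return re.sub(rb"\s+", b"", data)


def find_leaks(pdf_bytes: bytes, secrets: list[str]) -> list[str]:
    """Return the secrets still present in the file or its inflatable streams."""
    haystacks = [pdf_bytes]
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates trailing bytes after the compressed data.
            haystacks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image stream)
    squashed = [_squash(h) for h in haystacks]
    return [s for s in secrets
            if any(_squash(s.encode("latin-1")) in h for h in squashed)]


def fake_pdf(page_content: bytes) -> bytes:
    """Wrap content in one compressed stream (demonstration helper only)."""
    body = zlib.compress(page_content)
    header = (b"%PDF-1.7\n1 0 obj\n<< /Filter /FlateDecode /Length "
              + str(len(body)).encode() + b" >>\nstream\n")
    return header + body + b"\nendstream\nendobj\n%%EOF\n"


overlay_only = fake_pdf(b"BT (Account 4012 8888 8888 1881) Tj ET\n"
                        b"0 g 45 645 290 30 re f\n")
flattened = fake_pdf(b"q 612 0 0 792 0 0 cm /Im0 Do Q\n")

secret = "4012 8888 8888 1881"
print(find_leaks(overlay_only, [secret]))  # the number is still in the file
print(find_leaks(flattened, [secret]))     # nothing found
```

Run a check like this on the file you are about to send, not on the working copy, since the export step is where layers are either dropped or carried along.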

The mindset that prevents leaks

The deeper lesson is to stop trusting your eyes as the test. A page that looks redacted has passed only the weakest possible check: it fooled a human glancing at a screen. The information you're protecting will be handled by people and systems that don't glance — they select, copy, parse, and search.

So the right question is never "does this look hidden?" It's "if someone tried, could they get it back?" That single shift — from appearance to recoverability — is what separates a redaction that protects you from one that merely reassures you.

Where this fits with scanning on your phone

Most redaction failures begin upstream, at the moment a document becomes a file. A scan that was OCR'd for convenience, then casually marked up and shared, carries every layer we just discussed straight to the recipient. The safest workflow keeps the sensitive page on your device, lets you cover and flatten before anything leaves, and never quietly uploads the original to a server you can't see.

That is the workflow LumenScan is built around. Scanning and OCR happen on-device, so the page — and its text layer — stay in your hands, not on someone else's cloud. When you need to share a redacted copy, you can cover the content and export a flattened file, and because nothing was sent off the phone to begin with, there's no second copy sitting somewhere with the original text intact.

If you regularly send documents that contain even one line you'd rather a stranger never read, it's worth scanning them somewhere that treats "private" as the default rather than a setting. You can see how LumenScan approaches it at https://lumenscan.lumenlabs.works — and either way, the next time you reach for a black rectangle, flatten the page before you hit send.