What OCR still gets wrong on real receipts

The marketing pitch for every receipt app, ours included if we're not careful, is some version of "just snap a photo and we extract everything automatically." It's a good pitch because it's mostly true. On a crisp, freshly-printed receipt, modern OCR is genuinely excellent — it'll pull the merchant, the total, and the date with little fuss.

But real receipts aren't crisp and fresh. They're faded, crumpled, photographed in bad light, half-thermal-half-handwritten, and laid out by whatever point-of-sale system the shop happened to buy in 2014. I'd rather be honest about where the technology stumbles than pretend it's magic — partly because the way a tool handles its own failures matters more than its success rate on easy cases.

What OCR reliably gets right

Credit where due. On a reasonable image of a printed receipt, you can expect today's OCR to reliably read the merchant name, the total amount, and the date. For a large share of everyday receipts, that's most of what you need, and it happens in a second or two with no typing. This part really does work, and it's a real time-saver.

The trouble starts at the edges — and the edges are more common than the demos suggest.

Where it stumbles

Here's an honest list of what still trips up receipt OCR in 2026:

Faded thermal paper. Thermal receipts fade with heat, light, and time. A slip that sat in a hot car for a week can be half-gone, and you can't read what isn't there. (This is also why capturing early matters — the photo preserves what the paper won't.)
Handwriting. Handwritten totals, tips scrawled on a restaurant bill, a market vendor's handwritten amount: OCR handles all of these far less reliably than print.
Odd layouts. Every POS system formats differently. Multi-column receipts, ones where the total sits in an unexpected place, or where "amount due" and "amount paid" and "change" all look alike — these confuse extraction more than a clean single-column slip.
Line items versus the total. Reading that a receipt totals ₹2,400 is easy; correctly itemising fifteen line items and their individual taxes is much harder, and much easier to get subtly wrong.
Currency, tax, and format ambiguity. Date formats (is 03/04 March or April?), decimal and thousands separators, multiple tax lines, service charges and tips — all introduce ambiguity that a confident-looking extraction can quietly get wrong.
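To make the date ambiguity above concrete, here is a minimal Python sketch that lists every calendar date a bare "NN/NN" string could mean. The function names and the assumption that the year is already known are mine, for illustration only; this is not a description of any particular OCR engine.

```python
from datetime import date


def candidate_dates(text: str, year: int) -> list[date]:
    """Return every valid date a 'NN/NN' string could mean in the given year."""
    first, second = (int(part) for part in text.split("/"))
    candidates = set()
    # Try both readings: day/month and month/day.
    for day, month in ((first, second), (second, first)):
        try:
            candidates.add(date(year, month, day))
        except ValueError:
            pass  # e.g. month 25 does not exist, so that reading is ruled out
    return sorted(candidates)


def is_ambiguous(text: str, year: int) -> bool:
    """True when the string has more than one valid reading."""
    return len(candidate_dates(text, year)) > 1
```

With this sketch, "03/04" yields two candidates (4 March and 3 April) and so is ambiguous, while "25/04" has only one valid reading. The point is that the ambiguity is detectable, so it can be surfaced as a question rather than guessed at.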

None of this means OCR is bad. It means OCR is a very good assistant with blind spots — and a tool's job is to be honest about those blind spots rather than paper over them.

The failure mode that actually hurts

Here's the part I care about most. There are two ways for OCR to be wrong, and they are not equally bad.

The first is to fail visibly — to say "I'm not sure about this total, can you check?" That's mildly annoying and completely fine.

The second is to fail silently — to confidently extract ₹240 from a receipt that actually says ₹2,400, present it as done, and move on. This is the dangerous one, because it looks finished. You don't know to check it. The error sails through into your records and surfaces months later as a number that won't reconcile — or worse, doesn't surface at all, and quietly understates an expense. A wrong answer presented confidently is worse than an honest "I'm not sure," because it costs you nothing to fix what you've been asked to check, and a lot to find what you weren't.

So the design question isn't "how do we make OCR perfect?" — nobody can promise that. It's "how do we make sure that when OCR is wrong, you find out easily?"
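One way to turn silent failures into visible ones is to flag anything the extraction is unsure about, and to cross-check the total against the line items when they are available. The sketch below is hypothetical: it assumes the OCR step reports a 0 to 1 confidence score per field, the threshold is an arbitrary placeholder, and the item list is assumed to include tax and service lines. It illustrates the idea and is not Starlog's implementation.

```python
from dataclasses import dataclass
from decimal import Decimal

REVIEW_THRESHOLD = 0.90  # hypothetical cut-off; would need tuning on real receipts


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed: a 0..1 score reported by the OCR step


def fields_needing_review(fields, threshold=REVIEW_THRESHOLD):
    """Names of fields whose confidence falls below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


def total_matches_items(total: Decimal, items: list[Decimal]) -> bool:
    """True when the line items (assumed to include tax lines) sum to the total."""
    return sum(items, Decimal("0")) == total


fields = [
    ExtractedField("store", "Cafe", 0.98),
    ExtractedField("amount", "240", 0.62),
]
needs_check = fields_needing_review(fields)
```

Here `needs_check` contains only "amount", so the app would ask about that one field. The cross-check catches the ₹240 versus ₹2,400 case from a different angle: items of ₹1,500 and ₹900 do not sum to ₹240, so the mismatch can be shown to you instead of being saved quietly.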

How we handle it

The approach in Starlog comes down to a few principles, all of which follow from taking the failure modes seriously:

OCR drafts the easy fields; you own the rest. The store and amount are read off the receipt and pre-filled; the date, category, and anything else are yours to set, and every field stays editable. The autofill is a starting point, not a verdict.
Make correction fast, not punitive. Fixing a misread amount should take a tap, because a tool that's annoying to correct trains you to skip the check — which reintroduces the silent-error problem.
Never throw away the original. The receipt image is always kept as the source of truth, in your own Google Drive. The extracted text is a convenience; the image is the record. If a number is ever in doubt, the original is right there to settle it — today or in three years, when it's the thing an audit asks for.
Keep a human in the loop by design. OCR drafts; you confirm. That's not a failure of automation, it's the correct division of labour for records you're legally responsible for.
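As a rough illustration of these principles, a receipt record might look like the Python sketch below. Every name here (`ReceiptRecord`, `image_ref`, `set_field`, `pending`) is hypothetical and chosen for the example; it is not Starlog's actual data model. The OCR values are drafts, any editable field can be overwritten in one call, and the pointer to the original image is never among the editable fields.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional

EDITABLE_FIELDS = ("store", "amount", "date", "category")


@dataclass
class ReceiptRecord:
    image_ref: str                     # hypothetical pointer to the stored original image
    store: Optional[str] = None        # drafted by OCR, editable
    amount: Optional[Decimal] = None   # drafted by OCR, editable
    date: Optional[str] = None         # set by the user
    category: Optional[str] = None     # set by the user
    confirmed: set[str] = field(default_factory=set)

    def set_field(self, name: str, value: object) -> None:
        """Set or correct a field; a human-entered value counts as confirmed."""
        if name not in EDITABLE_FIELDS:
            raise ValueError(f"{name!r} is not an editable field")
        setattr(self, name, value)
        self.confirmed.add(name)

    def pending(self) -> list[str]:
        """Fields a human has not yet set or confirmed."""
        return [n for n in EDITABLE_FIELDS if n not in self.confirmed]


record = ReceiptRecord(image_ref="receipts/example.jpg", store="Cafe", amount=Decimal("240"))
record.set_field("amount", Decimal("2400"))  # one call to fix a misread amount
```

After the correction, `record.pending()` still lists the store, date, and category, so the draft values never pass as confirmed on their own, and `image_ref` still points at the original for anyone who needs to settle a doubt later.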

The honest bottom line

OCR will save you most of the typing, most of the time. It will not read a blank thermal slip, reliably decode handwriting, or perfectly itemise a messy bill — and any tool that claims otherwise is selling you the demo, not the Tuesday. The right thing for a receipt tool to do is lean on OCR for the easy 90%, be honest and quick about the hard 10%, and always keep the original image so the truth is never more than a tap away.

A small app for keeping your receipts straight.
We’re early. Come along.
