What AI Gets Wrong About Bookkeeping

I build software that uses AI to help bookkeepers clean up messy QuickBooks files. I've been doing this for over a year now, and I want to tell you what I've learned. Because what I've learned is not what the AI hype crowd is selling.

The pitch from every direction right now is that AI is going to automate accounting. Agents that categorize your transactions. Agents that reconcile your books. Agents that do your cleanup for you while you sleep. It sounds incredible if you've never actually opened a client's QuickBooks file and tried to figure out why Accounts Payable is negative.

Here's what actually happens when you point AI at real bookkeeping data.

It makes things up

I don't mean it gets things slightly wrong. I mean it fabricates dollar amounts that don't exist anywhere in the source data.

During testing, I had AI review a client file and report that a specific account had a balance of $47,000. Sounded reasonable. Fit the narrative. The actual balance in the raw data? It wasn't $47,000. It wasn't close to $47,000. The number didn't exist in any export. The AI pattern-matched its way to a plausible figure and reported it with complete confidence.

This is the problem no one talks about. AI doesn't tell you when it's guessing. It presents fabricated numbers with the same confidence as real ones. And if you're a bookkeeper handing a client an assessment that contains a made-up dollar amount, you don't get to blame the software. Your name is on it.
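One guard follows from this: don't let a dollar figure reach a client unless it can be traced to the source data. The Python below is a minimal sketch of that idea under stated assumptions, not a production implementation. It assumes you have already loaded the amounts from your QBO exports into a set of Decimal values, using absolute values so sign conventions don't interfere. The regex and the function names are hypothetical. Exact matching is deliberately strict. A figure the AI derived by adding things up will also get flagged, and that is intended, because someone should check it by hand.

```python
import re
from decimal import Decimal

# Dollar figures such as $47,000 or $1,234.56: digits, optional comma groups, optional cents.
AMOUNT_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*)(?:\.(\d{2}))?")


def extract_amounts(text: str) -> list[Decimal]:
    """Pull every dollar figure out of AI-generated narrative text."""
    amounts = []
    for match in AMOUNT_RE.finditer(text):
        whole = match.group(1).replace(",", "")
        cents = match.group(2) or "00"
        amounts.append(Decimal(f"{whole}.{cents}"))
    return amounts


def unsupported_amounts(narrative: str, source_amounts: set[Decimal]) -> list[Decimal]:
    """Return figures in the narrative that appear nowhere in the source export.

    source_amounts is assumed to hold absolute values loaded from the export.
    """
    return [a for a in extract_amounts(narrative) if a not in source_amounts]


source = {Decimal("12500.00"), Decimal("3210.45")}
narrative = "The account shows a balance of $47,000, and a $3,210.45 payment is unapplied."
print(unsupported_amounts(narrative, source))  # [Decimal('47000.00')]
```

Any figure this returns goes back to a person before the assessment goes out.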

It doesn't understand debits and credits the way you do

One of the first bugs I caught in my own system was a debit/credit direction error. The AI-powered analysis was looking at the sign of a number (positive or negative) and making assumptions about whether it was income or an expense. A refund hitting an expense account? Negative number. The system flagged it as revenue that had been miscoded.

Any bookkeeper would look at that and immediately know: that's a vendor refund. It's a credit to an expense account. It's not income. But AI doesn't have the context of having posted thousands of transactions. It sees a pattern and runs with it.

This wasn't a one-off. It was a whole class of bugs. Every detector that relied on amount sign instead of transaction type was producing findings that a first-year bookkeeper would catch as wrong. And these are the findings that destroy your credibility fastest, because your client can disprove them by opening the transaction in QuickBooks Online (QBO).
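The difference between the two approaches fits in a few lines. In this hypothetical Python sketch, a vendor credit and a sales receipt both land on an expense account as negative amounts. That assumes the export shows credits to an expense account as negatives. A sign-based check can't tell the two apart. A check keyed on transaction type can. The type names mirror QBO's labels, but the Txn fields and the REVENUE_TXN_TYPES set are assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Txn:
    txn_type: str      # QBO-style transaction type, e.g. "Vendor Credit" (assumed field)
    account_type: str  # type of the account the line posted to, e.g. "Expense"
    amount: Decimal    # signed amount as shown in the export


# Hypothetical: sales transactions that should never post to an expense account.
REVENUE_TXN_TYPES = {"Invoice", "Sales Receipt"}


def flag_by_sign(txn: Txn) -> bool:
    """The buggy pattern: any negative amount on an expense account looks like miscoded revenue."""
    return txn.account_type == "Expense" and txn.amount < 0


def flag_by_type(txn: Txn) -> bool:
    """Flag only sales transactions posted to an expense account; the sign is ignored."""
    return txn.account_type == "Expense" and txn.txn_type in REVENUE_TXN_TYPES


refund = Txn("Vendor Credit", "Expense", Decimal("-250.00"))
sale = Txn("Sales Receipt", "Expense", Decimal("-1200.00"))
print(flag_by_sign(refund), flag_by_type(refund))  # True False
print(flag_by_sign(sale), flag_by_type(sale))      # True True
```

The sign-based check flags the vendor refund. The type-based check lets it through and still catches the sale that was posted to the wrong account.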

It flags things that aren't problems

I tested the system against a file where the client had legitimate finance charges from their bank. The AI flagged them as "returned payment anomalies." Same thing happened with credit card vendor refunds and garnishment withhold-then-remit cycles. All normal business transactions. All flagged as problems.

False positives are worse than missed findings. A missed finding means you have more work to do later. A false positive means you told your client something is wrong with their books when it isn't. That's the fastest way to lose trust with a client who already hired you because they didn't trust their own data.
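For cycles like the garnishment example, one way to avoid the false positive is to check whether the pattern nets out before raising a finding. This is a minimal sketch. It assumes the entries in the garnishment liability account have already been tagged as withholdings (from paychecks) or remittances (payments to the agency). The LiabilityEntry type and its labels are hypothetical. A nonzero result is a prompt for review, not proof of an error, because a December withholding can legitimately be remitted in January.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LiabilityEntry:
    kind: str        # "withhold" (from a paycheck) or "remit" (payment to the agency); assumed labels
    amount: Decimal  # positive magnitude of the entry


def unremitted_garnishment(entries: list[LiabilityEntry]) -> Decimal:
    """Amount withheld but not yet remitted; zero means the cycle closed normally."""
    withheld = sum((e.amount for e in entries if e.kind == "withhold"), Decimal("0"))
    remitted = sum((e.amount for e in entries if e.kind == "remit"), Decimal("0"))
    return withheld - remitted


entries = [
    LiabilityEntry("withhold", Decimal("150.00")),
    LiabilityEntry("withhold", Decimal("150.00")),
    LiabilityEntry("remit", Decimal("300.00")),
]
outstanding = unremitted_garnishment(entries)
# Raise a finding only if something is left over; a closed cycle is normal activity.
print("review" if outstanding != 0 else "no finding")  # no finding
```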

So why am I building with AI?

Because when you use it correctly, it's incredibly useful. The key word is "correctly."

Here's what I learned the hard way: AI should narrate, not analyze. In my system, rule-based detectors do the actual analysis. They check specific things against specific data sources using logic I've tested against dozens of real client files. The AI layer takes what the detectors already found and generates the fix procedure, the journal entry memo, the plain-language explanation of what's wrong and how to fix it.

When I let AI independently analyze raw data and draw its own conclusions, it fabricated findings. When I limited it to explaining what the detectors already confirmed, the output got dramatically better.

That's the architecture that works: deterministic analysis, AI narration. The software finds the problems using rules a senior bookkeeper would apply. The AI writes up the results in a way that's useful and actionable. The bookkeeper reviews everything before it touches the client's books.
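Here is a minimal sketch of that split in Python. The rule and all names are hypothetical. A deterministic detector produces a Finding only when the data confirms it. In this example the rule fires when Accounts Payable carries a negative balance, assuming the export reports a normal AP credit balance as a positive number. The narration prompt is built only from that finding's fields. The sketch leaves out the call to an actual model; the prompt would go to whatever AI client you use.

```python
import json
from dataclasses import asdict, dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Finding:
    detector: str
    account: str
    amount: Decimal
    evidence: str  # where the rule got its number, so a reviewer can trace it


def detect_negative_ap(balances: dict[str, Decimal]) -> list[Finding]:
    """Deterministic rule: flag Accounts Payable if its balance is negative."""
    ap = balances.get("Accounts Payable", Decimal("0"))
    if ap < 0:
        return [Finding("negative_ap", "Accounts Payable", ap, "trial balance export")]
    return []


def build_narration_prompt(finding: Finding) -> str:
    """The AI sees only confirmed fields and is asked to explain them, not to analyze."""
    facts = {key: str(value) for key, value in asdict(finding).items()}
    return (
        "Write a fix procedure and a plain-language explanation for this confirmed finding. "
        "Use only the figures below; do not introduce any other amounts.\n"
        + json.dumps(facts, indent=2)
    )


for finding in detect_negative_ap({"Accounts Payable": Decimal("-1840.25")}):
    print(build_narration_prompt(finding))
```

The prompt carries the confirmed figures and tells the model not to add others. That limits the AI to explaining, and its output can still be checked against the finding it was given.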

What this means for you

If someone is selling you an AI agent that will "do your bookkeeping," ask them one question: what happens when it's wrong?

Because it will be wrong. I've spent months doing nothing but testing AI output against raw QBO data, line by line, finding the places where it breaks. And I built the system. I know exactly what it's supposed to do, and I still find errors that would embarrass me if a client saw them.

The bookkeepers who will do best with AI are the ones who treat it like a junior staff member with a photographic memory and no judgment. Fast, thorough, confident, and completely capable of being wrong about things that matter. You wouldn't let a new hire post journal entries without review. Don't let AI do it either.

The value of a good bookkeeper has never been data entry. It's knowing what to look for, what questions to ask, and when something doesn't add up. AI can help you look faster. It can surface things you might have missed. It can write up findings in a fraction of the time.

But it can't replace the part where you look at a finding and think, "That's not right." That's the part that matters. That's the part your clients are paying for.

That's the part that makes this work interesting.