How I built a PDF Q&A app when no one was talking about

I had an 80-page research paper open. I needed one specific number from it. I spent 10 minutes skimming before I found it buried in section 4.3.

That was the moment I decided to build DocQuify.

The concept is simple: upload a PDF, ask it questions, get answers that come from the document itself. Not a summary, not a guess. The actual answer, from the actual content.

But simple concepts have a way of hiding hard problems.

The naive approach breaks fast

The first thing anyone thinks of when building something like this is: just send the whole PDF to the model. Done.

That works for a 3-page document. The second you try it on anything real, you hit context limits, the cost gets out of hand, and the model starts losing track of what's in the document versus what it already knows. The answers get fuzzy.

So you need retrieval. You need to figure out which parts of the document are relevant to the question and only send those.

Where chunking gets tricky

The idea behind RAG (Retrieval-Augmented Generation) is straightforward: split the document into chunks, embed each chunk as a vector, and at query time find the chunks whose vectors are closest to the question's vector.

The part that took me the longest to get right was the splitting itself.

If you split naively at fixed character counts, you end up cutting sentences in half. A chunk ends mid-thought and the next one starts without enough context to be useful. You retrieve it, it looks relevant, but the model can't do anything useful with an incomplete idea.
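To make the failure concrete, here is a minimal sketch of fixed-size splitting. The function name `fixed_chunks`, the sample text, and the chunk size are all made up for illustration; they are not from DocQuify.

```python
def fixed_chunks(text: str, size: int) -> list[str]:
    # Slice every `size` characters, ignoring sentence and word boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical sample text, chosen so the boundary lands inside a number.
text = "The model scored 91.2 percent. Latency was 40 ms."
pieces = fixed_chunks(text, 20)
for piece in pieces:
    print(repr(piece))
# 'The model scored 91.'
# '2 percent. Latency w'
# 'as 40 ms.'
```

The first chunk now claims the score is "91." and the second starts with a stray "2 percent". Neither is something a model can answer from reliably.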

I switched to overlapping chunks: each chunk shares some content with the one before and after it. It sounds wasteful but it means context doesn't fall through the cracks at boundaries. Answer quality went up noticeably after this change.
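A rough sketch of what overlapping chunks can look like, assuming you split on sentence boundaries and carry the last sentence of each chunk into the next one. The names `split_sentences` and `chunk_with_overlap`, the regex splitter, and the default sizes are my assumptions for this example, not necessarily what DocQuify runs.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    # It will mis-split abbreviations like "e.g. this"; fine for a sketch.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_with_overlap(text: str, max_chars: int = 500,
                       overlap_sentences: int = 1) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk, then seed the next one with its tail
            # so the boundary context appears in both chunks.
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


# Tiny limit so the overlap is visible on a short string.
demo = chunk_with_overlap("A one. B two. C three. D four.", max_chars=15)
print(demo)
# ['A one. B two.', 'B two. C three.', 'C three. D four.']
```

Every sentence stays whole, and each boundary sentence shows up in two neighbouring chunks. A chunk can run past `max_chars` when a single sentence is longer than the limit, which this sketch accepts rather than cutting the sentence.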

The retrieval loop

Once the document is chunked and embedded, the query side is relatively clean. The user's question gets embedded using the same model, and the vector DB returns the top matching chunks. Those chunks go into the prompt as context, and the model answers strictly from them.
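The query side can be sketched in a few lines. To keep this runnable without any services, `embed` below is a toy bag-of-words counter standing in for a real embedding model, and the "vector DB" is a plain Python list; both are assumptions for illustration, as are the function names and the sample chunks.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    # Embed every chunk once, at upload time.
    return [(chunk, embed(chunk)) for chunk in chunks]


def top_k(question: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    # Embed the question with the same function, then rank chunks by similarity.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical chunks, invented for this example.
chunks = [
    "The learning rate was 0.001 in all runs.",
    "We thank our reviewers.",
    "Training used a batch size of 64.",
]
index = build_index(chunks)
best = top_k("What learning rate was used?", index, k=1)
print(best)
# ['The learning rate was 0.001 in all runs.']
```

A real embedding model matches on meaning rather than shared words, but the shape of the loop is the same: one embedding function for both sides, nearest neighbours out.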

The key constraint is that the model should not reach outside what was retrieved. If the answer is not in the retrieved chunks, it should say so rather than filling in from general knowledge. That boundary is what makes the answers trustworthy.
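One way to express that boundary is in the prompt itself. This is a sketch under my own assumptions: the wording, the `NOT_FOUND` string, and the `build_prompt` helper are illustrative, and a prompt alone does not guarantee the model will comply.

```python
NOT_FOUND = "I could not find that in the document."


def build_prompt(question: str, chunks: list[str]) -> str:
    # Number the chunks so an answer can point back at its source.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "Do not use outside knowledge. "
        f'If the context does not contain the answer, reply exactly: "{NOT_FOUND}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What was the batch size?",
    ["Training used a batch size of 64."],  # hypothetical retrieved chunk
)
print(prompt)
```

Giving the model an exact fallback sentence also makes the "not in the document" case easy to detect in code, since you can compare the reply against `NOT_FOUND`.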

What I'd do differently

Chunk size is something I'd experiment with more. I settled on a size that worked well across the documents I tested, but there is no universal right answer. Technical documentation chunks differently from narrative text.

I'd also add source highlighting earlier: showing the user exactly which part of the PDF the answer came from. It's on the roadmap but not shipped yet, and I think it would change how much people trust the output.

The takeaway

If you're building anything with RAG, spend more time on your chunking strategy than you think you need to. It's the part that doesn't feel important until your retrieval is consistently pulling the wrong context and you can't figure out why.

The model is the easy part. Getting the right content in front of it is the actual work.