
Lessons From Building an AI-Powered CMS

What we learned integrating OpenAI GPT into a production content management system for 14+ languages — the patterns that worked and the pitfalls that surprised us.


AI-assisted content creation has gone from a demo to a production expectation in a remarkably short period. We spent two years building and maintaining an AI-powered CMS that used GPT models for multilingual content generation, editorial suggestions, and translation workflows. Here’s what we actually learned.

The UX Problem Is Harder Than the API Problem

Calling the OpenAI API is straightforward. Building a user experience around AI generation that editors actually trust and use effectively — that’s the hard problem.

Editors are risk-averse. They’ve spent careers developing editorial judgment, and an AI that confidently produces subtly wrong content is more dangerous than one that’s obviously wrong. The UX patterns that built trust in our system:

Generation as a draft, not a result. AI output always landed in a clearly marked draft state. The UI made it visually obvious that human review was required before publication.

Show the prompt alongside the output. When editors could see the exact prompt that produced a result, they could reason about why the AI wrote what it did — and how to prompt differently to get something better.

Inline revision rather than full regeneration. Instead of “regenerate the whole thing,” we built “revise this section” controls. Smaller surface area for review, faster editorial cycles.
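The three patterns above can be sketched as a small data model. This is a minimal, hypothetical illustration (the class and field names are ours, not the actual system's): generated content carries its prompt for display, always enters a draft state, and inline revision produces a new record that resets review rather than mutating approved content.

```python
from dataclasses import dataclass
from enum import Enum

class DraftState(Enum):
    AI_DRAFT = "ai_draft"      # generated, not yet reviewed by a human
    IN_REVIEW = "in_review"
    APPROVED = "approved"      # the only state eligible for publication

@dataclass
class GeneratedSection:
    """One AI-generated section of a document (illustrative model)."""
    prompt: str                # the exact prompt, shown alongside the output
    output: str
    state: DraftState = DraftState.AI_DRAFT

    def revise(self, new_output: str) -> "GeneratedSection":
        # Inline revision: replace only this section's text and drop
        # back to an unreviewed draft, keeping the prompt for context.
        return GeneratedSection(prompt=self.prompt, output=new_output)
```

Keeping the prompt on the record is what makes the "show the prompt alongside the output" pattern cheap to implement in the UI.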

Multilingual Generation Is a Different Problem

Generating content in 14 languages isn’t 14× the work of generating it in one language — it’s a qualitatively different problem.

GPT-4’s quality across languages is uneven. For high-resource languages (Spanish, French, German, Portuguese), outputs were consistently strong. For lower-resource languages, the model would sometimes produce grammatically correct but culturally tone-deaf content. Native reviewer workflows weren’t optional — they were the product.

We learned to treat AI translation as a first draft, not a final deliverable. The value wasn’t eliminating human translators — it was reducing the blank-page problem and giving translators a strong starting point.

Caching Is Your Friend, Rate Limits Are Not

GPT API calls are slow relative to database queries. For any content that’s generated once and reused, aggressive caching is essential. We cached at three levels: the raw API response, the processed output, and the rendered HTML.
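A sketch of the caching idea, using an in-memory dict as a stand-in for a real cache backend such as Redis (the helper names and key scheme are ours): the cache key hashes everything that affects the output, so a prompt or parameter change is automatically a miss, and each tier ("raw", "processed", "html") gets its own namespace.

```python
import hashlib
import json

_cache: dict[str, str] = {}  # stand-in for Redis or similar

def cache_key(tier: str, model: str, prompt: str, params: dict) -> str:
    """Deterministic key over every input that affects the output."""
    payload = json.dumps({"model": model, "prompt": prompt, **params},
                         sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return f"{tier}:{digest}"

def get_or_generate(prompt: str, generate, model: str = "gpt-4", **params) -> str:
    """Return the cached raw response, calling the API only on a miss."""
    key = cache_key("raw", model, prompt, params)
    if key not in _cache:
        _cache[key] = generate(prompt)  # the slow, billable call
    return _cache[key]
```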

Rate limits require a queuing strategy for bulk operations. Bulk content generation — “translate these 200 articles overnight” — needs a background job queue with retry logic and backoff. Don’t call the API synchronously in a web request for anything that might take more than a few seconds.
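The retry-with-backoff piece of that queue can be sketched as follows. This is a generic pattern, not the actual implementation; `RateLimitError` here is a local stand-in for whatever rate-limit exception the API client raises, and a real worker would wrap each queued job in something like this.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's rate-limit exception."""

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; let the job queue reschedule it
            # 1s, 2s, 4s, ... plus jitter so queued workers don't retry in lockstep
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

The jitter matters for bulk jobs: without it, two hundred queued translations that hit a rate limit together will all retry together and hit it again.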

Prompt Management Is an Engineering Problem

Prompts are code. They need version control, testing, and deployment processes. The worst pattern we’ve seen is prompts hardcoded in application logic where changes require a full deployment.

We moved prompts to a database table with version history. Editors with the right permissions could update prompts and see the change reflected immediately. A/B testing prompt variations became possible because prompts were data, not code.
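A minimal sketch of that storage scheme, using SQLite for self-containment (the schema and function names are illustrative, not the production ones): every edit appends a new row rather than overwriting, so the full version history is always queryable, and the application simply reads the latest version.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE prompts (
    name    TEXT,
    version INTEGER,
    body    TEXT,
    PRIMARY KEY (name, version))""")

def save_prompt(name: str, body: str) -> int:
    """Append a new version; never overwrite history."""
    row = db.execute("SELECT MAX(version) FROM prompts WHERE name = ?",
                     (name,)).fetchone()
    version = (row[0] or 0) + 1
    db.execute("INSERT INTO prompts VALUES (?, ?, ?)", (name, version, body))
    return version

def latest_prompt(name: str) -> str:
    """What the application actually reads at generation time."""
    row = db.execute(
        "SELECT body FROM prompts WHERE name = ? ORDER BY version DESC LIMIT 1",
        (name,)).fetchone()
    return row[0]
```

Because prompts are rows, an A/B test is just a second row plus a selection rule, and rolling back a bad prompt is a read of an older version rather than a deployment.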

The Output Validation Layer

Never trust unvalidated AI output in a content pipeline. We built a validation layer that checked:

  • Character count within CMS field limits
  • No hallucinated URLs or links that 404
  • No repetition of the input prompt in the output
  • Language detection confirming output matched requested language

Roughly 98% of outputs passed every check. The remainder would have caused real editorial problems had they slipped through.
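The checklist above can be sketched as a single validation function. This is an illustrative version under our own names and limits: the length and prompt-echo checks are complete as written, while the URL check only extracts candidates (a real pipeline would issue requests to catch 404s) and language detection is left as a comment since it needs a detection library.

```python
import re

def validate_output(output: str, prompt: str, max_chars: int = 5000) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    problems = []

    # 1. Character count within the CMS field limit.
    if len(output) > max_chars:
        problems.append("exceeds field limit")

    # 2. Output must not echo the input prompt.
    if prompt.strip() and prompt.strip() in output:
        problems.append("echoes the input prompt")

    # 3. Extract URLs; a real pipeline would fetch each one to catch 404s
    #    and hallucinated links.
    urls = re.findall(r"https?://\S+", output)
    if urls:
        problems.append(f"{len(urls)} URL(s) need link verification")

    # 4. Language detection (e.g. a langdetect-style library) would plug in
    #    here to confirm the output matches the requested language.
    return problems
```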