emanuelpetre.dev

Building Prism CV

By Emanuel on Mar 1, 2026

Job applications are broken. Most candidates have one resume and one generic cover letter they copy-paste across dozens of job boards. Applicant tracking systems (ATS) filter them out before a human ever reads anything. Carefully tailoring your resume and cover letter for each specific role is the real fix, but it takes hours most people don’t have.

The question I set out to answer: can you make that process instant, honest, and actually good?

Prism CV is the result: a mobile app (iOS + Android) that takes a user’s base resume and a pasted job description, and produces a tailored resume and cover letter in under 60 seconds. Not a keyword-stuffed hack, but a thoughtfully repositioned version of your real experience, matched to the role. The whole system (mobile frontend, Rails API, AI pipeline, infrastructure) was designed and built solo.

Architecture

The system breaks into three layers.

Mobile (React Native + Expo). The app is fully anonymous by default. On first launch it silently provisions a Bearer token and stores it locally. Users never see an account flow unless they opt in. This kept onboarding frictionless and reduced drop-off to near zero. Everything else, including resume uploads, job application creation, PDF viewer, and resume editor, runs against a typed API client on top of a RESTful Rails backend.
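The silent provisioning step can be sketched in plain Ruby (no Rails): on first launch the backend mints an opaque token the app stores locally and sends on every request. The `AnonymousAccount` class and its method names are illustrative, not the real API.

```ruby
require "securerandom"

# Hypothetical sketch: an anonymous account is just an opaque Bearer token.
# No email, no password, no account screen unless the user opts in later.
class AnonymousAccount
  attr_reader :token

  def initialize
    @token = SecureRandom.hex(32) # 64 hex chars; server stores the hash
  end

  # Every API request carries this header; the server resolves it to a user.
  def auth_header
    { "Authorization" => "Bearer #{@token}" }
  end
end

account = AnonymousAccount.new
account.auth_header
```

The design choice is that identity exists from the first request, so upgrading to a named account later is just attaching credentials to a user record that already owns data.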

Rails API. A stateless JSON API on Rails 7 with PostgreSQL. The interesting decision here was storing resume data as JSONB in a single structured_data column rather than normalised tables. The schema for AI-generated output evolves constantly: new fields, renamed keys, restructured sections. JSONB absorbs those changes without migrations, and the flexible structure maps cleanly to the TypeScript types on the mobile side. Authentication is pure Bearer token, no sessions, no cookies.
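The JSONB trade-off is easiest to see in miniature. This plain-Ruby sketch (field names illustrative) stores resume content as one JSON document, the way a single `structured_data` column would, and then absorbs a schema change with no migration:

```ruby
require "json"

# Version 1 of the AI output schema: a flat document in one column.
v1 = { "summary" => "Backend engineer", "skills" => ["Ruby", "PostgreSQL"] }
stored = JSON.generate(v1)

# Later the AI output grows new fields and nested sections. Nothing in the
# database changes: the new keys are simply written into the same column.
v2 = JSON.parse(stored)
v2["ats_score"] = 72
v2["sections"]  = [{ "title" => "Experience", "bullets" => [] }]

restored = JSON.parse(JSON.generate(v2))
```

With normalised tables, each of those additions would have been a migration plus model changes; here the document just grows, and the TypeScript types on the mobile side evolve in lockstep.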

Async AI pipeline (Sidekiq + Claude). When a user creates a job application, the API returns immediately and enqueues a background job. Sidekiq runs two sequential Claude API calls (resume tailoring, then cover letter generation) and writes the results back to the database. The frontend polls for status. This keeps API response times under 200ms regardless of how long the AI takes, and it makes the processing UI genuinely honest rather than a fake spinner.
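The job’s shape can be sketched like this. It is plain Ruby with the Claude client stubbed and the Sidekiq machinery omitted; class and field names are illustrative:

```ruby
# Stand-in for the real Claude API client.
class FakeClaudeClient
  def complete(prompt)
    "generated: #{prompt}"
  end
end

# The API enqueues this job and returns immediately; the frontend polls
# the application's status until it reaches "complete".
class TailoringJob
  def initialize(client)
    @client = client
  end

  def perform(application)
    application[:status] = "processing"
    # Two sequential model calls: resume first, then the cover letter,
    # which can build on the tailored resume.
    application[:tailored_resume] = @client.complete("Tailor the resume to the job description")
    application[:cover_letter]    = @client.complete("Write a cover letter for the role")
    application[:status] = "complete"
    application
  end
end

app_record = { status: "queued" }
TailoringJob.new(FakeClaudeClient.new).perform(app_record)
```

Because the job owns the whole lifecycle, the API layer never blocks on the model, and a retry simply reruns `perform` from a clean "processing" state.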

The AI Integration Work

This is where most of the real engineering effort went.

The core challenge isn’t getting an LLM to produce a resume. It’s getting it to produce one that’s accurate. Resume tailoring is a high-stakes domain: a hallucinated skill or fabricated achievement can destroy someone’s credibility in an interview. The prompt architecture has several layers to prevent this.

Accuracy constraints. The tailoring prompt explicitly forbids the model from inventing any skill, achievement, employer, or credential not present in the base resume. Elaboration is allowed (unpacking a vague bullet into specifics), but fabrication is not.

Honest ATS scoring. Early prompt versions inflated ATS match scores for unrelated jobs. A software engineer applying for an unrelated role would get an 85, which was meaningless. I added a mandatory field-relevance assessment at the top of the instruction chain: the model evaluates domain match first, before any keyword scoring. Completely unrelated fields now produce scores below 30.
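The gate’s logic, reduced to code (thresholds illustrative, and in the real system this lives in the prompt’s instructions rather than in Ruby):

```ruby
# relevance and keyword_score are 0..100 values the model reports.
# Domain relevance is checked first; an unrelated field hard-caps the
# final score below 30 no matter how many keywords happen to overlap.
def ats_score(relevance, keyword_score)
  return [keyword_score, 29].min if relevance < 30
  keyword_score
end
```

The ordering is the point: keyword overlap only counts once the domain match has cleared the bar, which is what stops an engineer’s resume from scoring 85 against an unrelated role.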

Tone matching. Most AI resume tools produce a generic “confident professional” voice regardless of the company. I built a two-step tone analysis into the prompt: identify phrases from the job description that reveal the company’s voice, then use those as a north star for word choice. A warm startup gets language like “contributed to.” A leadership-heavy role gets “architected” and “drove.” A hard-coded banned phrase list prevents the worst AI boilerplate from appearing regardless of context.
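The banned-phrase check is mechanically simple; this sketch uses an illustrative list (the real one is longer and lives in the prompt):

```ruby
# Classic AI resume boilerplate that should never appear, whatever the tone.
BANNED_PHRASES = [
  "results-driven professional",
  "proven track record",
  "dynamic team player"
].freeze

def contains_boilerplate?(text)
  lowered = text.downcase
  BANNED_PHRASES.any? { |phrase| lowered.include?(phrase) }
end
```

A check like this is also useful outside the prompt, as a cheap post-generation assertion in the evaluation suite.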

Skills generalisation. Rigid string matching fails in practice. If a resume lists Docker and Kubernetes but the job description asks for “container orchestration,” the model should substitute the generalised term. That’s not hallucination, that’s good tailoring. Explicit rules for this pattern are baked into the prompt.
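The rule can be expressed as a mapping from umbrella terms to the concrete tools that justify them. The mapping here is illustrative; the important constraint is that a generalised term is only emitted when the base resume actually lists a tool under it:

```ruby
# Umbrella terms a job description might use, and the concrete skills
# that legitimately back them. Nothing outside the resume is ever added.
GENERALISATIONS = {
  "container orchestration" => ["Docker", "Kubernetes"],
  "relational databases"    => ["PostgreSQL", "MySQL"]
}.freeze

def generalised_skills(resume_skills, jd_terms)
  jd_terms.select do |term|
    (GENERALISATIONS[term] || []).any? { |tool| resume_skills.include?(tool) }
  end
end
```

So a resume listing Docker and Kubernetes earns “container orchestration”, while a job description asking for “relational databases” yields nothing if the resume never mentions one.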

Evaluation framework. To validate prompt changes without guessing, I built a standalone evaluation system: fixture-based test scenarios covering close matches, career transitions, unrelated fields, and seniority gaps, with a multi-provider runner that can execute the same prompt against Anthropic, OpenAI, Gemini, and DeepSeek in one command. When the prompt changes, I run the generator and review outputs against a criteria document with six scoring dimensions. Prompt engineering without this kind of structure is guesswork.
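The runner’s core is a scenario-by-provider grid. This toy version stubs the providers (the real one calls Anthropic, OpenAI, Gemini, and DeepSeek) and uses illustrative scenario labels:

```ruby
# Stub for a model provider; the real runner wraps each vendor's API.
class StubProvider
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def run(prompt, scenario)
    "#{name} output for #{scenario[:label]}"
  end
end

# Fixture scenarios spanning the adversarial range the post describes.
SCENARIOS = [
  { label: "close match" },
  { label: "career transition" },
  { label: "unrelated field" }
].freeze

# One command, every prompt x scenario x provider combination.
def run_eval(prompt, providers)
  providers.flat_map do |provider|
    SCENARIOS.map do |scenario|
      { provider: provider.name,
        scenario: scenario[:label],
        output:   provider.run(prompt, scenario) }
    end
  end
end

results = run_eval("tailoring-prompt-v2",
                   [StubProvider.new("anthropic"), StubProvider.new("openai")])
```

Each row in `results` is then reviewed against the six-dimension criteria document, which is what turns a prompt tweak from a hunch into a comparison against a baseline.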

Engineering Decisions Worth Noting

Credit-gated extraction, not creation. Users can create unlimited applications for free. They only spend a credit when they extract output (download PDF, copy cover letter). This lets users see the quality of the tailoring before committing, reduces churn from uncertain first impressions, and creates a conversion moment at the point of highest intent.
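The gating rule fits in a few lines. This is a plain-Ruby sketch with hypothetical names; in the real app the ledger lives in the database behind the API:

```ruby
class CreditLedger
  class InsufficientCredits < StandardError; end

  attr_reader :credits

  def initialize(credits)
    @credits = credits
  end

  # Creating (and viewing) an application is free: no credit check at all.
  def create_application(job_description)
    { job_description: job_description, status: "created" }
  end

  # A credit is only consumed at extraction: PDF download or copy.
  def extract!(application)
    raise InsufficientCredits if @credits < 1
    @credits -= 1
    application.merge(status: "extracted")
  end
end
```

Putting the paywall on `extract!` rather than `create_application` is what lets a sceptical user judge the tailoring quality before spending anything.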

API call logging as a first-class feature. Every Claude API call is logged asynchronously with full token counts, duration, model name, and a purpose tag. Not just ops hygiene: it’s the foundation for understanding actual usage patterns, cost per user segment, and prompt performance over time.
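A log entry might look like the following (field names illustrative, drawn from the list above; the real write happens asynchronously so it never adds latency to the AI call):

```ruby
require "time"

# One record per Claude API call: enough to reconstruct cost per user
# segment and track prompt performance over time.
def log_api_call(model:, purpose:, input_tokens:, output_tokens:, duration_ms:)
  {
    model:         model,
    purpose:       purpose,          # e.g. "resume_tailoring"
    input_tokens:  input_tokens,
    output_tokens: output_tokens,
    total_tokens:  input_tokens + output_tokens,
    duration_ms:   duration_ms,
    logged_at:     Time.now.utc.iso8601
  }
end

entry = log_api_call(model: "claude-sonnet", purpose: "resume_tailoring",
                     input_tokens: 1200, output_tokens: 800,
                     duration_ms: 14_000)
```

The purpose tag is the piece that pays off later: it lets per-feature cost and latency be grouped without parsing prompts.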

Async over streaming. The temptation is to stream AI output directly to the client. I chose polling instead: the backend job completes atomically, the result is written once, the frontend reads a clean finished state. No partial renders, no WebSocket state management, no reconnect logic. For a 30–60 second operation the “live” difference is imperceptible, and the code is dramatically simpler.
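The client side reduces to a loop like this (plain Ruby, interval shortened, status source stubbed; in the app it is an HTTP call to the application’s status endpoint):

```ruby
# Poll until the backend reports a clean finished state. There is no
# partial render path: the result is either absent or complete.
def poll_until_complete(fetch_status, max_attempts: 10, interval: 0.01)
  max_attempts.times do
    state = fetch_status.call
    return state if state[:status] == "complete"
    sleep interval
  end
  { status: "timed_out" }
end

# Stubbed status sequence standing in for the HTTP endpoint.
states = [
  { status: "processing" },
  { status: "processing" },
  { status: "complete", resume: "tailored resume text" }
]
result = poll_until_complete(-> { states.length > 1 ? states.shift : states.first })
```

Everything a streaming design would need (reconnect logic, partial-token buffering, WebSocket state) disappears, and the only failure mode left is a timeout.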

What It Taught Me

Building a commercial AI product solo end-to-end surfaced something that doesn’t come up in most AI engineering writing: the prompt is the product. The quality of what users experience is almost entirely determined by how well the prompt constrains and guides the model. The backend, the mobile app, the subscription system: all of that is table stakes. The real differentiation is whether your prompt produces output that’s accurate, honest, and useful across adversarial conditions: unrelated jobs, thin resumes, seniority gaps, edge cases.

The evaluation framework came directly from that insight. Without a structured way to measure output quality across a range of scenarios, prompt changes are guesswork. With it, each iteration has a measurable baseline and a clear signal for whether things improved.

Emanuel Petre | Software Engineer.