FlowLift

A complete B2B SaaS product: a web app, a marketing site, a tablet layout, and a native iOS app, all on one design system. Designed and built end to end in seven days by directing AI workflows.


FlowLift's dashboard, rendering a fictional customer called InvoicePilot. It names the single most expensive leak in the trial funnel, shows the money already recovered, and drafts the fix. It is a dense, opinionated product screen, and this case study is about the direction that produced it, not the prompt.

FlowLift is a B2B SaaS product I designed and built end to end, on my own, in seven days. The point of it was never the product. The point was the method: to find out whether a designer directing AI can reach genuine senior depth, or only fast, flat AI output.

So I ran it like an experiment. One self-initiated product, a hard scope, and a rule that nothing shipped unless it would survive a senior design review. What came out is a real activation analytics product with a web app, a marketing site, a tablet layout, and a native iOS app, around sixty screens in total, all built on one load-bearing design system. The whole thing took a week.

The parts people assume AI is bad at (the system architecture, the consistency across sixty screens, the judgment) are the parts I am proudest of. This case study walks through both the product and the workflow that produced it, and the workflow is the actual subject.

Seven days, end to end

Research, strategy, the design system, every screen, and four platforms. One person, one week, from blank file to a product that holds together.

Four platforms, one system

A product app across desktop, tablet and mobile, and a native iOS build designed to Apple's own guidelines, plus a marketing site, all on the same components and tokens.

AI as a team, me as director

I treated the AI as a studio: researchers, builders, and reviewers I briefed and corrected. The judgment, the taste, and the quality bar stayed mine.

Research that killed the thesis

An adversarial research workflow took apart my first product idea and proved it weak. I rebuilt the positioning before drawing a single screen.

A real design system

Tokens, atoms, molecules, organisms, full state coverage, and screens that consume the components rather than copy them. This is what AI output usually lacks.

One company, one moment

One fictional account, one instant in time, one number that survives across every screen. The discipline that separates a product from a folder of mockups.

What FlowLift is, and what I actually did

FlowLift is an activation layer for B2B SaaS. It sits on top of the analytics a team already runs (Amplitude, PostHog, Segment) and reads the events they already collect, with no new tracking code. It finds the single most expensive leak in their trial funnel, explains why people drop there, drafts a fix, and then proves the recovered revenue in dollars. Its whole pitch is the opposite of a platform: keep the stack you have, we will just tell you what to fix first.

My role was every part of it, but it is more honest to describe it as two jobs. The first was the senior designer: the strategy, the positioning, the information architecture, the brand, the type and colour system, and the standard that every screen was held to. The second was the director of an AI studio: writing the briefs, running the research and build workflows, reading what came back, and sending most of it back. The AI did the volume work. I made every decision that needed taste, and I made a lot of them, all week.

What this is, plainly: FlowLift has no users. The customer in every screen, InvoicePilot, is invented, and its data is a single hand-reconciled dataset, not a live feed. So there are no usage outcomes to claim. The outcome here is the work itself, and the method that produced it at this speed and this depth.

I owned (the judgment)

Positioning & product strategy
Information architecture
Brand, colour & type direction
The senior quality bar
Every accept / reject call

AI workflows ran (the volume)

Multi-agent research & synthesis
Option generation at speed
Building screens in Figma
Token & component sweeps
Adversarial QA passes

What shipped

Design system + components
~60 screens, 4 platforms
Marketing site & pricing
Native iOS app
One canonical dataset

01. THE EXPERIMENT

I was testing the method, not the market

I did not set out to design a conversion tool. I set out to answer a question I keep getting asked, usually with a raised eyebrow: can you really build serious design with AI, or does it just make the easy stuff faster and fall apart on the hard stuff?

The honest worry behind that question is real. Most AI design output is recognisable on sight: it is flat, it is generic, every screen is its own island, and there is no system underneath. The moment you ask for the unglamorous parts (the empty states, the error states, the consistency of one number across forty screens), it falls over. If that is the ceiling, then AI is a faster way to make a worse portfolio.

So I gave myself a product hard enough to expose that ceiling if it existed. A dense, multi-sided B2B analytics tool is about the least forgiving thing you can design: it lives or dies on data hierarchy, on consistency, on a system that scales across dozens of screens and several platforms. If AI direction was going to break, this is where it would break. I gave it a week, kept myself as the only designer, and made one rule: nothing ships unless it survives a senior review. The rest of this case study is what that produced, and how.

02. THE WORKFLOW

How I ran a studio of agents

The thing that made this work was not one clever prompt. It was a pipeline, with my judgment as the gate between every stage. I was not asking AI to "design a SaaS app." I was running a process and reviewing the output of each step before it earned the next one.

My review sat on every arrow. The AI made the moves; I decided which ones survived. That gate is the whole difference between a system and a pile of screens.

Three habits mattered more than any single prompt. I built the system before the screens, so that every screen was an assembly of approved parts instead of a fresh invention that could drift. I used AI against itself, pointing reviewer agents at the builder agents' work to find what was broken before I did. And I treated my own attention as the scarce resource, spending it on the decisions that needed taste and automating the ones that needed patience. The sections that follow are the three places that mattered most: the research that changed the product, the system that held it together, and the screens that proved the depth.

03. RESEARCH

I made the AI try to kill my own idea

My first idea was the obvious one, and it was wrong. Running it through adversarial research on purpose is how I caught that, before drawing a single screen. Used well, AI is not just a generator. It is a critic that does not flatter you.

I started where most people start: an all-in-one "Conversion OS" that would unify analytics, session replay, nudges, experiments, and AI into one platform. It sounds ambitious. So before committing, I ran a research workflow: several agents mapping the market and the competitors, and three more given one job, to try as hard as they could to kill the idea.

THESIS THE SKEPTICS KILLED

An all-in-one Conversion OS that unifies analytics, replay, nudges, experiments and AI into one platform a team switches to.

THESIS THAT SURVIVED

An activation layer on top of the analytics you already run. Keep your stack. We find the costliest leak and prove the fix in dollars.

All three skeptics reached the same verdict: the all-in-one idea did not survive. The incumbents already cover that ground; "all-in-one of five commodity tools" plays straight into a solo founder's weakest position; and a couple of those pillars, session replay especially, carry legal and infrastructure weight a small team cannot take on responsibly. The idea was a trap. Running it through a review built to catch exactly that is what kept me out of it.

THE REFRAME

The product stopped trying to replace the analytics stack and started sitting on top of it. "Keep your platform. We will tell you what to fix." That single move turned a weak me-too platform into a sharp, defensible wedge, and it set the tone for every screen that followed.

Here is who did what. The workflow pressure-tested the position until the weak version fell apart and the stronger one showed underneath. I read what survived, decided to trust it, and rebuilt the product around it. I would not have gotten there as fast, or as honestly, arguing only with myself.

The surviving thesis became the homepage, almost word for word: "Your analytics tells you what happened. FlowLift tells you what to fix." The marketing site, the pricing, and the product all argue the same anti-platform line, because the research had settled it before any of them were drawn.

04. THE PRODUCT AS PROOF

One company, one moment, one number

Here is the discipline that AI output almost never has, and the one I cared about most. Every screen in the product renders the same fictional customer, InvoicePilot, at the same instant in time, pulling from one reconciled dataset. The leak the dashboard flags is the step the funnel dissects, is the cohort the copilot explains, is the fix that gets proven. When the same number has to survive across every screen, you cannot fake a single one.

That number is the Connect step. Of the 2,380 trials that reach it, 1,475 never connect a data source, a 62% drop, larger than every other step combined, worth $28,400 a month. The same figure carries across the dashboard, the leak detail, the copilot, and onboarding. That is not repetition for the case study. That is the product being internally true.
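The arithmetic behind that figure is easy to check. The sketch below uses only the three numbers quoted above; the function and variable names are illustrative and not part of FlowLift, and the per-trial dollar value is simply what the quoted numbers imply, not a figure taken from the dataset.

```python
# Figures quoted in the case study for the Connect step.
TRIALS_REACHING_CONNECT = 2_380
TRIALS_NEVER_CONNECTING = 1_475
MONTHLY_DOLLARS_AT_RISK = 28_400


def drop_rate(reached: int, dropped: int) -> float:
    """Share of trials that reach a step and leave without completing it."""
    if reached <= 0:
        raise ValueError("reached must be positive")
    return dropped / reached


connect_drop = drop_rate(TRIALS_REACHING_CONNECT, TRIALS_NEVER_CONNECTING)
connected = TRIALS_REACHING_CONNECT - TRIALS_NEVER_CONNECTING

# Derived for illustration only: the monthly value implied per leaked trial.
implied_dollars_per_leaked_trial = MONTHLY_DOLLARS_AT_RISK / TRIALS_NEVER_CONNECTING

print(f"drop rate: {connect_drop:.0%}")
print(f"trials that connect: {connected}")
print(f"implied $ per leaked trial per month: {implied_dollars_per_leaked_trial:.2f}")
```

Running a check like this against every screen is one way to hold a single dataset to account: if any screen shows a count or percentage that this arithmetic cannot reproduce, that screen is wrong.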

Leaks, ranked by money at risk rather than by raw drop rate. Four open leaks, $42,900 a month recoverable, the worst one flagged critical and owned by Gina, the growth lead. The personas from the research are not a slide. They are the people the work is assigned to inside the product.
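Ranking by money at risk can produce a different order from ranking by raw drop rate, which is the reason for the choice. The sketch below is a minimal illustration under stated assumptions: the step names and every number in it are hypothetical and are not InvoicePilot's data, and the formula (dropped trials times an assumed monthly value per trial) is an assumption about how money at risk could be computed, not FlowLift's documented method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leak:
    step: str
    reached: int              # trials that reached the step
    dropped: int              # trials that left at the step
    dollars_per_trial: float  # assumed monthly value of one recovered trial

    @property
    def drop_rate(self) -> float:
        return self.dropped / self.reached

    @property
    def dollars_at_risk(self) -> float:
        # Assumption: money at risk = dropped trials x value per trial.
        return self.dropped * self.dollars_per_trial


# Hypothetical numbers, invented purely for illustration.
leaks = [
    Leak("step_a", reached=200, dropped=140, dollars_per_trial=20.0),
    Leak("step_b", reached=2_000, dropped=600, dollars_per_trial=20.0),
    Leak("step_c", reached=900, dropped=180, dollars_per_trial=20.0),
]

by_drop_rate = sorted(leaks, key=lambda leak: leak.drop_rate, reverse=True)
by_money = sorted(leaks, key=lambda leak: leak.dollars_at_risk, reverse=True)

print("by drop rate:", [leak.step for leak in by_drop_rate])
print("by money at risk:", [leak.step for leak in by_money])
```

In this made-up data, step_a has the steepest drop rate but the fewest trials, so it falls to last place once the list is sorted by dollars. That is the case a money-first ranking is meant to catch.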

The leak detail is the screen I would point a skeptic to first. A custom flow diagram traces where the 2,380 trials actually go, green for connected and coral for leaked, over a cohort heatmap and a worst-segments breakdown. A prompt gets you a plausible version of this. Getting it correct, and consistent with every other screen, took direction and several rounds of correction.

The fix, proven. A treated-versus-control chart showing a 9.0-point lift in activation, worth $11,600 a month, with the full dollar math shown rather than asserted.

The copilot answers "where am I losing the most trials?" and, crucially, shows its work, the events it read and the sources it used. In a product whose differentiator is AI, the AI is never allowed to be a black box.

That last rule, AI proposes and shows its reasoning while the human decides, runs through the whole product. Every AI suggestion carries an Apply and a Dismiss, a number-backed rationale, and an audit trail. It is a product opinion, and it is the same argument this case study makes about how the product was built.

The end of onboarding. In 38 seconds, reading data the customer already has, it names a $28,400-a-month leak and offers the fix. The up-arrow is Lift, the product's mascot, an earned character that only shows up at milestones. Even the personality was designed, not defaulted.

05. THE SYSTEM

The part AI is supposed to be bad at

If there is one thing that proves this was directed and not generated, it is the system underneath. AI will happily draw you sixty good-looking screens that share nothing. A real product shares everything. So I built the system first and made the screens consume it.

It is structured the way a senior designer structures one: tokens at the bottom, then atoms, then molecules, then organisms. Every component carries its full state coverage (default, hover, focus, pressed, disabled, error, empty, loading), because the states are where products either teach you or abandon you. And it is load-bearing, not decorative: the sidebar, the cards, the tables, and the badges on real screens are instances of these components, so a change to one component propagates to every screen that uses it. The rule I held was simple. A new state goes back into the component. It never gets detached and faked on a single screen.
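The state-coverage rule is mechanical enough to express as a check. The sketch below is one hypothetical way to audit a component inventory, not the tooling used on this project: the token names, token values, and component specs are all invented for illustration, and only the list of eight required states comes from the text above.

```python
REQUIRED_STATES = frozenset(
    {"default", "hover", "focus", "pressed", "disabled", "error", "empty", "loading"}
)

# Hypothetical token names and values; the real FlowLift tokens are not given.
TOKENS = {
    "color.surface": "#1E1E1E",
    "color.accent": "#3FA66B",
    "radius.md": "8px",
}

# Hypothetical component specs: state name -> tokens that state is bound to.
COMPONENTS = {
    "button": {state: ["color.accent", "radius.md"] for state in REQUIRED_STATES},
    "badge": {"default": ["color.surface"], "hover": ["color.highlight"]},
}


def audit(components, tokens, required=REQUIRED_STATES):
    """Return {component: [problems]} for missing states or unknown tokens."""
    report = {}
    for name, states in components.items():
        problems = []
        missing = sorted(set(required) - set(states))
        if missing:
            problems.append("missing states: " + ", ".join(missing))
        used = {token for refs in states.values() for token in refs}
        unknown = sorted(used - set(tokens))
        if unknown:
            problems.append("unknown tokens: " + ", ".join(unknown))
        if problems:
            report[name] = problems
    return report


report = audit(COMPONENTS, TOKENS)
for component, problems in report.items():
    print(component, "->", "; ".join(problems))
```

Here the invented button passes, while the invented badge is flagged twice: it covers only two of the eight states, and it references a token that does not exist in the token set.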

The atoms. Buttons, inputs, selects, checkboxes, radios, toggles, badges, and avatars, each shown across every interaction and content state and bound to tokens. The unglamorous layer that makes the product feel like one application instead of sixty screenshots.

Molecules - Complex Nested Components used in the design screens

06. THE BREADTH

Four platforms in the same week

Four platforms in one week is the part most people will not believe, so here is each one. In the same seven days, the product reached desktop, tablet, mobile web, and a native iOS app designed to Apple's own guidelines rather than a shrunk-down website, complete with a home-screen widget, a lock-screen Live Activity, and an App Store listing.

The dashboard reflowed for tablet. Same system, same data, a layout that adapts rather than just scales.

Dashboard - Mobile Responsive

iOS - Dashboard

iOS - Lock Screen

iOS - App Store

07. THE HONEST PART

Where I was the bottleneck, on purpose

If I only showed the wins, this would be a commercial, not a case study. So here is the truthful version: the AI was fast and frequently wrong, and the week was me directing it: setting the bar, originating the strategy, catching what it missed, and deciding what the right answer actually was. That direction is the job, and it is where the design happened.

Left to its defaults, the AI reached for the obvious and the generic every time. The first brand direction came back too loud and too orange; I pushed it through several rounds to the calm green-on-charcoal the product ended up with. The first funnel came back as a plain bar chart, the most generic possible answer, so I rejected it and directed the bespoke activation-path flow that became the product's signature visualisation. Components came back as one-off frames pretending to be a system, so I sent them back until they were real, stateful, and reusable.

None of that is AI failing. It is AI doing what it does, producing a competent average at speed, and waiting to be told it is not good enough. The taste to know it is not good enough, and the specific direction to fix it, is the part that does not automate. I ran adversarial QA passes for the same reason: agents whose only job was to screenshot the work and hunt for what was broken. They caught real defects. I still made the final call on every one.

THE HONEST SUMMARY

AI gave me options faster than I could judge them. The work was choosing the right one and killing the rest, over and over, for a week. That is not less design. The slow part was never the deciding. It was the doing, and the doing is what I handed off.

7

DAYS, START TO FINISH

4

PLATFORMS

~60

SCREENS


I still made every decision in this product. I just no longer needed a team to execute them.

08. REFLECTION

What this changes, and what it does not


The judgment did not get cheaper. It got more important.

When the production cost of a screen falls to near zero, the value moves entirely to the decisions: what to build, what to cut, what is good enough, what is not. A weaker designer with these tools just makes bad work faster. The bar did not move. The leverage on it did. That shift, leverage moving onto judgment, is the actual claim I am making here.


What it is not.

It is not magic, and it is not a launched company. FlowLift has no users and the data is invented, so I am claiming a method and a body of work, not a market result. The next honest step for the product is the one no workflow can fake: putting it in front of real growth leads and watching where the story holds and where it breaks.


Why I am putting this in a portfolio.

Because this is how I work now, and I would rather show it than describe it. I can hold a senior bar across a complex product, and I can move at a speed that used to need a team, because I have turned the parts of design that do not need taste into something I direct instead of do by hand. FlowLift is the proof that those two things can be true at once.