
Flahr Image Studio

A desktop application for iterative AI image generation. Generate images with Gemini, edit them with multi-turn conversations, and build up version chains. The images coming out of this thing are indistinguishable from reality.

// Why I Built This

Google AI Studio is a horrible UI for iterative image generation. If I wanted a variation of an image, I could continue the chat, but if I wanted a new one I had to create a new chat. It was laggy as hell. When you're creating many chats just to generate similar images, they all end up with similar names, which makes it impossible to navigate. The whole thing felt bloated for what I was trying to do.

So I built my own UI. I used it as I made it, and made it perfect for me. Generate something, tweak it, fork it, compare versions side by side, and build toward what you actually want. A creative tool designed around iteration, not a chat interface repurposed for image generation.

Side-by-side comparison showing iterative refinement
// How It Works
Generation chains

Every image belongs to a chain. The first generation is the root. Edits create new nodes in the chain, each linked to its parent. A filmstrip at the bottom of the image view shows the full evolution — Original, Edit 1, Edit 2 — and you can click any version to compare. The gallery only shows the latest version from each chain, keeping things clean while preserving full history underneath.
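
Under the hood, each generation is a record that knows its parent and its chain root. A minimal sketch of that shape and of the gallery's "latest version per chain" view, with illustrative field names rather than the actual schema:

interface Generation {
  id: string;
  chainRootId: string;      // id of the first generation in the chain
  parentId: string | null;  // null for the root generation
  prompt: string;
  imagePath: string;
  createdAt: number;        // unix millis
}

// Gallery view: keep only the newest generation from each chain.
function latestPerChain(all: Generation[]): Generation[] {
  const latest = new Map<string, Generation>();
  for (const g of all) {
    const current = latest.get(g.chainRootId);
    if (!current || g.createdAt > current.createdAt) {
      latest.set(g.chainRootId, g);
    }
  }
  return [...latest.values()];
}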

Multi-turn editing via thought signatures

This is the key technical insight. When Gemini generates an image, it returns thought signatures — internal state that captures the model's understanding of what it produced. The app stores these signatures in the database. When you request an edit, the app reconstructs a multi-turn conversation: your original prompt, the model's response (with signatures), then your edit instruction. This gives the model context about what it already generated, so “make the sky more purple” actually works instead of producing a completely different image.
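
A minimal sketch of that reconstruction, assuming the public generateContent request shape; the helper and the exact placement of thoughtSignature on the model turn are my illustration, not the app's literal code:

type Part =
  | { text: string; thoughtSignature?: string }
  | { inlineData: { mimeType: string; data: string }; thoughtSignature?: string };

interface Turn {
  role: "user" | "model";
  parts: Part[];
}

// Rebuild the conversation for an edit: original prompt, the model's own
// response (image plus its stored thought signature), then the new instruction.
function buildEditRequest(
  original: { prompt: string; imageBase64: string; thoughtSignature: string },
  editInstruction: string,
): { contents: Turn[] } {
  return {
    contents: [
      { role: "user", parts: [{ text: original.prompt }] },
      {
        role: "model",
        parts: [
          {
            inlineData: { mimeType: "image/png", data: original.imageBase64 },
            thoughtSignature: original.thoughtSignature, // pulled back out of SQLite
          },
        ],
      },
      { role: "user", parts: [{ text: editInstruction }] },
    ],
  };
}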

Main generator view with generation chain and prompt panel
Reference images and character consistency

You can upload reference images or save any generation as a reference. Attach a reference to your next generation and the model uses it for style and character consistency. I was genuinely surprised by how well this works — nano-banana-pro's ability to do consistent character generation across multiple images is impressive. You get the same character in different poses, settings, and lighting without carefully re-engineering your prompt each time. The results are indistinguishable from photographs.

AI-generated image indistinguishable from a real photograph
The full prompt behind that image:
{
  "ratio": "3:4",
  "scene": "A harsh, high-contrast candid flash photograph taken at
    an apple orchard on a crisp fall afternoon, shot on an iPhone 14
    with flash enabled in auto mode.",
  "subject": {
    "description": "The only visible human subject is the same young
      woman from the reference image.",
    "position": "Standing under an apple tree, reaching up into the
      branches.",
    "pose": "Mid-reach on her tiptoes stretching for an apple just
      out of grasp, one hand on a branch for balance, other arm
      fully extended, fingers straining, face scrunched with effort
      and determination. She is not looking at the camera."
  },
  "outfit": "Chunky knit sweater, high-waisted jeans, white sneakers
    now stained with grass and mud. A canvas picking bag hangs across
    her body, already bulging with apples.",
  "lighting": {
    "type": "Direct on-camera iPhone flash",
    "effect": "Harsh front flash typical of smartphone photography,
      illuminating her face and sweater flatly against the bright
      fall daylight, creating that washed-out amateur flash-in-
      daylight snapshot look."
  },
  "camera_settings": {
    "device": "iPhone 14 main camera, 26mm equivalent focal length",
    "aperture": "f/1.5 but with deep depth of field due to small
      sensor size",
    "focus": "Background remains sharp and legible, not blurred into
      bokeh",
    "flash": "TrueTone flash enabled, slight overexposure on
      subject's face competing with afternoon sun",
    "noise": "Subtle digital noise typical of smartphone photography"
  },
  "setting_details": "Apple tree with red apples hanging from
    branches, a few fallen apples on the ground, wooden orchard
    ladder leaning against a nearby tree, rows of apple trees
    stretching back, grass with fallen leaves, her jacket discarded
    on the ground beside her.",
  "background": {
    "environment": "The background shows the orchard in sharp focus —
      rows of apple trees, fall foliage in reds and oranges, a
      rustic wooden fence in the distance, blue sky with wispy
      clouds.",
    "depth": "Background is sharp and detailed, not blurred."
  },
  "style": "Ultra-realistic iPhone snapshot aesthetic, raw unedited
    look, visible digital noise, slightly blown-out flash competing
    with daylight, true-to-life fall colors, amateur candid
    photograph feel."
}
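
Mechanically, attaching a reference is just one more image part in the user turn, which is why the prompt above can say "the same young woman from the reference image." A hedged sketch, with the part shape assumed from the public API and illustrative names:

type RefPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Build the user turn for a generation that uses a saved reference image.
function userTurnWithReference(
  prompt: string,
  referencePngBase64: string,
): { role: "user"; parts: RefPart[] } {
  return {
    role: "user",
    parts: [
      // The reference rides along as an extra image part in the same turn...
      { inlineData: { mimeType: "image/png", data: referencePngBase64 } },
      // ...and the text prompt can then refer to it directly.
      { text: prompt },
    ],
  };
}
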
Gallery with filtering and tagging

Images are organized with custom color-coded tags, favorites, and searchable prompts. Multi-select works with Ctrl+Click, range selection with Shift+Click, and bulk operations apply to whatever is selected. The gallery supports date range filtering and shows the latest version from each chain by default.
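
The selection behavior is the usual anchor-plus-range pattern; a rough sketch with illustrative names, not the app's actual store code:

// Compute the next selection for a click in the gallery grid.
function handleSelect(
  ids: string[],             // gallery items in display order
  selected: Set<string>,     // currently selected ids
  anchorId: string | null,   // last plainly clicked item
  clickedId: string,
  e: { ctrlKey: boolean; shiftKey: boolean },
): { selected: Set<string>; anchorId: string | null } {
  const next = new Set(selected);
  if (e.shiftKey && anchorId) {
    // Shift+Click: select everything between the anchor and the clicked item.
    const [a, b] = [ids.indexOf(anchorId), ids.indexOf(clickedId)].sort((x, y) => x - y);
    for (const id of ids.slice(a, b + 1)) next.add(id);
    return { selected: next, anchorId };
  }
  if (e.ctrlKey) {
    // Ctrl+Click: toggle the clicked item.
    if (next.has(clickedId)) next.delete(clickedId);
    else next.add(clickedId);
    return { selected: next, anchorId: clickedId };
  }
  // Plain click: replace the selection and reset the anchor.
  return { selected: new Set([clickedId]), anchorId: clickedId };
}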

// Architecture
┌─────────────────────────────────────────────┐
│              Tauri v2 (Rust)                │
│                                             │
│  ┌──────────────────────────────────────┐   │
│  │  React + Zustand (6 stores)          │   │
│  │  Generator │ Gallery │ Image View    │   │
│  └──────────────────┬───────────────────┘   │
│                     │ IPC (invoke)          │
│  ┌──────────────────▼───────────────────┐   │
│  │  Rust Commands                       │   │
│  │                                      │   │
│  │  ┌────────────┐  ┌────────────────┐  │   │
│  │  │  SQLite    │  │  Filesystem    │  │   │
│  │  │  (metadata,│  │  (images/,     │  │   │
│  │  │   chains,  │  │   references/) │  │   │
│  │  │   tags)    │  │                │  │   │
│  │  └────────────┘  └────────────────┘  │   │
│  │                                      │   │
│  │  ┌────────────────────────────────┐  │   │
│  │  │  Gemini API (generateContent)  │  │   │
│  │  │  + thought signature storage   │  │   │
│  │  └────────────────────────────────┘  │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘

Everything is local: images are stored on the filesystem, metadata lives in SQLite, and API calls go directly to Gemini with your own API key. No backend server, no cloud storage, no account required. The Rust layer handles all database operations and API communication through Tauri commands, keeping the React frontend purely presentational.
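
Concretely, the frontend never opens SQLite or calls Gemini itself; it invokes Rust commands over IPC. A sketch with an assumed command name and payload, not the app's actual API:

import { invoke } from "@tauri-apps/api/core";

// Persist a freshly generated image: the Rust side writes the PNG to disk,
// inserts the metadata row in SQLite, and returns the new record's id.
async function persistGeneration(prompt: string, pngBase64: string): Promise<string> {
  return invoke<string>("save_generation", {
    chainRootId: null, // null -> this generation starts a new chain
    parentId: null,
    prompt,
    pngBase64,
  });
}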

State management uses six Zustand stores (app state, settings, gallery, auth, tags, and reference images), each with clearly scoped responsibilities. Each store talks to the Rust backend through the Tauri IPC bridge; stores never communicate directly with each other.
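
A minimal sketch of what one of those stores looks like; the store shape and command name are assumptions:

import { create } from "zustand";
import { invoke } from "@tauri-apps/api/core";

interface TagStore {
  tags: { id: string; name: string; color: string }[];
  loadTags: () => Promise<void>;
}

export const useTagStore = create<TagStore>((set) => ({
  tags: [],
  // The store never touches SQLite directly; it asks the Rust layer over IPC.
  loadTags: async () => {
    const tags = await invoke<TagStore["tags"]>("list_tags");
    set({ tags });
  },
}));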

// Decisions I Made
Built for myself, not for a market

I didn't do user research. I used Google AI Studio every day, got frustrated, and built the tool I wanted. Every feature exists because I hit a friction point and solved it. The chain filmstrip exists because I kept losing track of which edit was which. The tag system exists because I had hundreds of generations and no way to find anything. When you're your own user, iteration is instant: the feedback loop is just using the app and noticing what's annoying.

Chain root ID pattern over tree structures

Image chains could form complex trees (edit from version 2, branch from version 3). I simplified this to a flat chain with a root ID. Every generation points to its parent and its chain root. The query for “show me this image's history” is just WHERE chain_root_id = ?. Simple, fast, and covers 95% of real usage.
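
An illustrative schema and the history query, with assumed column names rather than ones copied from the app, kept as strings the Rust layer would run:

const SCHEMA = `
  CREATE TABLE IF NOT EXISTS generations (
    id            TEXT PRIMARY KEY,
    chain_root_id TEXT NOT NULL,  -- first generation in the chain
    parent_id     TEXT,           -- NULL for the root
    prompt        TEXT NOT NULL,
    created_at    INTEGER NOT NULL
  );
`;

// Full history of whatever chain an image belongs to, oldest first.
const CHAIN_HISTORY = `
  SELECT * FROM generations
  WHERE chain_root_id = ?
  ORDER BY created_at ASC;
`;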

Thought signatures as the editing primitive

The alternative to thought signatures is sending the full image back to the model with each edit request. That's expensive, slow, and produces worse results because the model is working from pixels, not from its own internal understanding. Storing signatures and reconstructing the multi-turn conversation gives the model memory of what it was trying to produce, which makes edits dramatically more coherent.

// Stack
Desktop
Tauri v2 (Rust)
Frontend
React 19 + TypeScript + Vite
State
Zustand (6 stores)
AI
Gemini 3 Pro / 2.5 Flash
Database
SQLite (local)
Storage
Local filesystem
Styling
Tailwind CSS (terminal theme)
Image viewer
Pinch-zoom + drag-pan
Gallery view with tagged images
// What I'd Do Differently

I want to add markup-directed editing: the ability to draw on an image and use the drawing to direct an edit. Google has this as a vibe-coded example project, but having it integrated into the chain workflow would be powerful. It's probably one prompt to Opus 4.6 to implement; it's just not a priority at the moment.

I'd also add model-agnostic support. Right now it's tightly coupled to Gemini's API. Abstracting the generation interface would let it work with Stable Diffusion, DALL-E, or any future model without rewriting the chain/editing logic. The real struggle is that there is too much to build and not enough time to build it, now that agentic coding has made everything possible.