Product Updates · 8 min read

Extract AI Preset Categories from Photos

Build a coherent set of AI fashion presets from 2-6 reference photos. Same scene, lighting, and mood across the whole category — campaigns stay on-brand.

By On-Model Team

AI extracts consistent fashion photography presets from multiple reference images into a single reusable category

A few weeks ago we shipped Extract from Image — upload one fashion photograph and have AI fill in a complete preset for you. It removed most of the friction from preset creation, and it's been one of our most-used features since.

But brands kept asking us a different question: what about a set? A campaign isn't one shot, it's six. A product page isn't one preset, it's a coherent pack of front, back, three-quarter, and detail. Running Extract from Image six times in a row gives you six presets, but each one is generated in isolation — so the words drift. The same warm cyclorama gets called "soft beige cyclorama" in one preset and "creamy seamless backdrop" in the next. The same window-light setup is "soft directional from camera-left" once and "diffused side light" a second time. To the AI generating the next image, those read as different lighting setups. Your campaign falls apart.

Today we're shipping the fix: Extract from a Set of Images.

Extract a whole category at once

The new flow lives on the dedicated Categories screen (you'll find it from Presets → Categories in the sidebar). Click the prominent Extract from images card, pick 2–6 reference images that should belong to the same campaign, and the AI does the rest. What you get back isn't a list of disconnected presets: it's a fully formed category — N presets that all share the same wording for the things that visually agree, and only differ where the images actually differ.

The category is created and persisted atomically. Credits are charged at 1 per image (so a 4-image set costs 4 credits), and the new category is auto-named with a timestamp; you can rename it later. Its thumbnail is even prefilled from your first reference, so you'll spot it immediately in the grid.
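If you script your pipeline, the flow maps onto a single request. Here's a minimal TypeScript sketch; the endpoint URL, field names, and response shape are our assumptions for illustration, not the production API:

extract-category.ts (illustrative sketch)
// Hypothetical sketch: endpoint, field names, and response shape are assumptions.
interface ExtractedCategory {
  id: string;
  name: string;          // auto-named with a timestamp, renameable later
  thumbnailUrl: string;  // prefilled from the first reference
  presets: unknown[];    // one preset per reference image
}

async function extractCategory(referenceIds: string[], apiKey: string): Promise<ExtractedCategory> {
  if (referenceIds.length < 2 || referenceIds.length > 6) {
    throw new Error("A set needs 2-6 reference images");
  }
  // Costs 1 credit per image, so a 4-image set costs 4 credits.
  const res = await fetch("https://api.example.com/v1/categories/extract", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ imageIds: referenceIds }),
  });
  if (!res.ok) throw new Error(`Extraction failed: ${res.status}`);
  return res.json(); // the category and its N presets are persisted atomically
}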

How it works

Step 1 — Open the Categories screen

In the app, head to Presets → Categories. The screen leads with a single hero card: "Extract a category from images." Click Get started.

The Categories screen — Extract from images is the primary CTA

Step 2 — Pick 2 to 6 references

The asset picker opens in multi-select mode. Pick from your library, upload a fresh batch (the dropzone accepts multiple files at once, drag-and-drop or click), or mix both. Selected tiles get a numbered blue badge so you always know the order, and a footer counter reads "N / 6 selected" with the Extract category button live as soon as you've picked at least two.

The picker lets you mix library and upload — Extract category lights up at 2 selections
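If you're curious how the gating works, it's exactly as simple as it sounds. A tiny sketch of the rule, with the constants mirroring the UI copy:

picker-rules.ts (illustrative sketch)
// The footer counter and button state mirror these two constants.
const MIN_SELECTED = 2;
const MAX_SELECTED = 6;

function footerLabel(selectedCount: number): string {
  return `${selectedCount} / ${MAX_SELECTED} selected`;
}

function extractButtonEnabled(selectedCount: number): boolean {
  // Live as soon as you've picked at least two, capped at six.
  return selectedCount >= MIN_SELECTED && selectedCount <= MAX_SELECTED;
}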

Step 3 — One pass, one coherent category

Click Extract category and wait. The AI analyzes your references as a coherent set and extracts the presets together rather than one by one — that's what keeps them consistent. A few seconds later you land on the new category, with N presets already saved, and you can start applying them to your products immediately.

Because the AI processes your references as a set rather than independently, it can tell which traits are shared across the campaign and which vary by image — and lock the shared ones to identical wording. That's what unlocks the consistency you'll see below.
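In data terms, each preset in the category splits into a shared block, locked word-for-word across the set, and a per-image block. A rough TypeScript model of the shapes you'll see extracted below (the type names are ours, inferred from the JSON):

category-shape.ts (illustrative sketch)
// Rough model: type names are ours, field names inferred from the extracted JSON below.
interface SharedTraits {
  background: string;
  style: string;
  mood: string;
  color_palette: string;
  lighting: { direction: string; quality: string; complexity: string };
  camera: { lens: string; angle: string; aperture: string };
}

interface PerImageTraits {
  pose: string;
  camera: { framing: string };
  expression: string;
}

// A preset is the shared block plus its own per-image block; the shared
// strings are byte-identical in every preset of the category.
type Preset = SharedTraits & PerImageTraits;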

The consistency story

This is the part that matters. Calling single-image extract three times produces three presets that drift. Calling Extract from a Set with the same three images produces three presets that share byte-identical wording on every field where the images visually agree. We mean that literally — the strings are equal, character for character.

Here's what that looks like with the three references we tested for this post:

Reference image 1 — first shot in a coherent campaign
Reference 1
Reference image 2 — second shot, same scene
Reference 2
Reference image 3 — third shot, same scene
Reference 3

After extraction, the shared traits collapsed to a single canonical wording across all three presets. Background, style, mood, and color palette read identically — and so did lighting and most of the camera settings. Here are the byte-identical strings the model chose for the whole set:

shared-across-all-3-presets.json
{
  "background": "Seamless light gray cyclorama with a soft shadow falloff at the floor, creating a clean e-commerce look.",
  "style": "Clean e-commerce studio photography emphasizing the drape and fit of the garment against a neutral backdrop.",
  "mood": "Neutral, professional, and focused on presenting the product clearly without distraction.",
  "color_palette": "Monochromatic scene palette featuring light gray and white tones from the studio background and lighting.",
  "lighting": {
    "direction": "Key light positioned slightly above and straight on, with gentle side fills to minimize harsh shadows.",
    "quality": "Soft, diffused lighting creating smooth gradients and natural-looking highlights on the fabric.",
    "complexity": "Standard studio setup with a primary softbox key light and ambient fill for even, shadowless illumination."
  },
  "camera": {
    "lens": "Standard portrait lens around 50-85mm, providing natural proportions and gentle compression.",
    "angle": "Shot straight on at roughly chest height, maintaining a flat, proportional perspective.",
    "aperture": "Moderate aperture around f/5.6-8 to keep the entire garment in sharp focus."
  }
}

Only the fields where the references actually differed got per-image descriptions — pose, framing, and expression:

varying-per-preset.json
{
  "preset_1_full_body": {
    "pose": "Model standing facing forward, legs slightly apart, arms resting naturally at sides, creating a straight and relaxed posture.",
    "camera": { "framing": "Full-body shot, cropped just below the neck and just below the toes, keeping the garment centered in frame." },
    "expression": "Facial expression is not fully visible due to cropping, but head is positioned straight and level."
  },
  "preset_2_mid_shot": {
    "pose": "Model standing slightly angled, hands clasped gently together near the waist, showing the sleeve details.",
    "camera": { "framing": "Three-quarter length shot, cropped just below the knees and at the top of the head." },
    "expression": "Neutral expression with eyes looking slightly downward, conveying a relaxed and understated confidence."
  },
  "preset_3_detail": {
    "pose": "Model facing slightly away from the camera, arms folded and raised near the chest, showcasing the fabric texture and sleeve construction.",
    "camera": { "framing": "Close-up detail shot of the upper torso and arms, cropped at the neck and mid-skirt." },
    "expression": "Facial expression is completely cropped out, focusing entirely on the upper body and hands."
  }
}

This isn't post-processed normalization. The AI evaluates the references together as a set, chooses byte-identical wording where they visually agree, and writes per-image wording only where they don't. Apply any of these three presets to a different product and you'll get an output that belongs to the same shoot.
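Want to check that claim on your own extractions? Strict string comparison is all it takes. A small sketch, assuming presets are plain JSON objects like the ones above:

verify-shared.ts (illustrative sketch)
// Returns true when a field is byte-identical across every preset in a category.
function sharedFieldIsIdentical(presets: Record<string, unknown>[], path: string[]): boolean {
  const read = (obj: unknown) =>
    path.reduce((o, key) => (o as Record<string, unknown> | undefined)?.[key], obj);
  const first = JSON.stringify(read(presets[0]));
  return presets.every((p) => JSON.stringify(read(p)) === first);
}

// sharedFieldIsIdentical(presets, ["background"])          // true: locked across the set
// sharedFieldIsIdentical(presets, ["lighting", "quality"]) // true: locked across the set
// sharedFieldIsIdentical(presets, ["camera", "framing"])   // false: varies per image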

Outfit-agnostic by design

There's a second change worth calling out, because it's the difference between a preset that works on every product you own and one that only works on the product it was extracted from.

Presets describe the photographic context — pose-shape, scene, lighting, camera, mood, style, ambient color of the scene. They never name specific clothing items, fabrics, or outfit colors. Extract a category from a denim lookbook and you won't see "blue denim" in the color_palette. You'll see the scene palette: the cyclorama tone, the lighting cast, the ambient. Apply that category to a satin slip dress and the dress reads true.

A category extracted from a jeans-and-tee lookbook can be reused on a dress shoot, a knitwear shoot, or any other product. The preset describes how the photo is taken, not what the model is wearing.

In action — from references to generated outputs

Here's the same set of references, applied as a category. We took the three extracted presets, paired them with an AI model and a flat-lay product, and ran one generation per preset:

The flat-lay input — a wrap dress, deliberately a completely different garment from the black satin midi dress in the references:

Wrap dress flat-lay — the input product
Input — wrap dress

All three presets in the category, applied back-to-back to the same product:

AI Generated
Full Body
Mid Shot
Detail

The product is brand new. The model is brand new. But the three outputs belong to the same shoot — and to the same shoot as the references. Same light gray cyclorama. Same soft directional lighting. Same neutral monochromatic palette. Same straight-on camera. Only pose, framing, and expression change — exactly as the references vary, and exactly as a real PDP trio would.

This is a complete product detail page in three clicks: load the Full Body preset for the hero, load Mid Shot for the lifestyle slot, load Detail for the close-up. Move on to the next SKU and reuse the whole category — the visual world stays put.
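In code terms, filling a PDP is a loop over the category: one generation per preset, same product every time. A hedged sketch; `generate` and its parameters are hypothetical stand-ins, not the real SDK:

fill-pdp.ts (illustrative sketch)
// generate() is a hypothetical stand-in for a preset-plus-product generation call.
declare function generate(args: { presetId: string; productImageId: string }): Promise<string>;

async function fillProductPage(
  presets: { id: string; name: string }[], // e.g. Full Body, Mid Shot, Detail
  flatLayId: string,                       // the same product image for every slot
) {
  const outputs: { slot: string; imageUrl: string }[] = [];
  for (const preset of presets) {
    // Same category, same visual world; only pose, framing, and expression change.
    const imageUrl = await generate({ presetId: preset.id, productImageId: flatLayId });
    outputs.push({ slot: preset.name, imageUrl });
  }
  return outputs; // hero, lifestyle slot, close-up: a complete PDP trio
}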

When you save the category, the AI also names each preset for you (here: Studio E-Commerce Full Body, Mid Shot, Detail). Those names are picked from what varies across the references — they're the cue you'll use later to load the right preset for each shot type.

A category is a strong starting point. The presets carry scene, lighting, mood, color palette, camera, and pose-shape — everything that makes the photographs belong together. They don't try to lock down every last styling detail of the originals (props, accessories, micro-styling cues), and they shouldn't: the whole point is that you can reuse the category on entirely new products. If you want a single output to copy a specific reference even more closely, you can also feed that reference image as a second input alongside your flat-lay — see the trick covered in the single-image extraction post.

Single image vs a set — when to use each

Both flows have their place. Here's the practical breakdown:

                          Extract from one image                 Extract from a set
Output                    1 preset                               1 category, N presets
Best for                  One-off shots, ad-hoc style transfer   Campaigns, lookbooks, multi-shot product pages
Consistency               Per-image only                         Byte-identical wording on shared traits
Cost                      1 credit                               1 credit per image (N total)
Reuse on other products   Yes, outfit-agnostic                   Yes, outfit-agnostic
Where to find it          Presets → New Preset → From Image      Presets → Categories → Extract from images

If you only need one shot, the single-image flow is faster and cheaper. The moment you need more than one shot in the same visual world, switch.

Use cases

  • Build a campaign brief from a moodboard. Drag your team's seasonal moodboard into the picker and walk away with a category that captures the brief in machine-applicable form.
  • Convert a competitor's lookbook. Saw a coherent set of shots on a brand site you admire? Six clicks and you have a reusable preset pack for your own products.
  • Standardize a multi-shot PDP. Front, back, three-quarter, detail — extract from a single past shoot you loved and re-apply it to every new SKU going forward.
  • Onboard a new collaborator. Hand them a category instead of a 12-page brand-style PDF. The presets are the brief.

"Visual consistency is the single biggest driver of brand trust in e-commerce. The shoots that win on conversion aren't the most expensive — they're the ones where every image clearly belongs to the same world."

— Senior creative director at a European fashion house, in conversation

According to McKinsey, fashion brands spend $500–$1,000 per SKU on traditional product photography, and the cost compounds when each new campaign requires a fresh shoot to keep the catalog coherent. Shopify's research finds that 75% of online shoppers rely on product photos when deciding whether to buy, and Baymard Institute notes that catalog imagery with a drifting visual style measurably increases bounce rates on category pages. Extracting a whole category from references in a single pass is how brands scale the coherent-shoot experience without scaling the production budget.

Try it now

Head to Presets → Categories and click Extract from images. Pick a handful of references from a shoot you already love, and you'll have a reusable category in seconds.

Need just one preset? The single-image flow is still right there — see Create AI Presets from Any Fashion Photo.


Sources:

  1. McKinsey & Company. (2024). The State of Fashion: Technology Edition. mckinsey.com
  2. Shopify. (2025). Product Photography Statistics: Why Visuals Drive E-Commerce Sales. shopify.com
  3. Baymard Institute. (2025). Product Image UX: How Image Consistency Impacts Conversion. baymard.com
presets · categories · flat-to-model · ai · consistency · campaign · automation · fashion-photography · ecommerce