Building Semantic Search for 100k+ Creative Assets

The problem wasn’t a lack of assets. It was a lack of discoverability.

At Sock Club, we had more than 100,000 creative assets sitting in Dropbox: years of beautiful work that was effectively invisible. Finding a specific legacy design meant digging through deeply nested folders, guessing filenames, or relying on tribal knowledge.

Designers searched by concept. Sales searched under pressure, often mid-email or mid-call. The system only understood filenames.

This is the story of how I turned that functionally locked archive into a semantic visual search system that understands conceptual meaning. By moving away from rigid folder structures and toward a system that can actually “see” our work, we replaced hours of redundant recreation with seconds of discovery.

Logo credit: Baylor Meche & Rachal Berry

The Problem

While the archive was large, it was functionally locked. Filenames were inconsistent. Folder structures varied depending on who created them and when. Over time, this created a real operational bottleneck.

• Designers regularly recreated assets because finding originals took too long
• Sales had to interrupt designers to find relevant past work for clients
• New hires relied on tribal knowledge just to locate basic brand files

We tested Dropbox Dash, hoping for a quick fix, but it lacked the depth needed for nuanced, semantic discovery of visual assets. It provided a search bar, not conceptual understanding.

To truly unlock the archive, we needed something purpose-built: an internal system that recognized meaning, not just metadata.

Requirements

After working closely with designers and sales, I defined a small set of non-negotiable requirements.

User requirements

• Search by concept, not filename
• Fast results across the entire library
• Support PSDs, PNGs, and exports
• Secure access to original files
• Integration with HubSpot deals
• A pipeline that could process 100k+ assets without manual intervention

The system needed to be simple, intuitive, and reliable. If it required training, it would fail.

Architecture Overview

The system maps both images and text into a shared embedding space, a common mathematical “language”, so users can search visual assets with natural-language descriptions.

Tech stack

• Next.js frontend on Vercel
• FastAPI backend on Vercel’s serverless Python runtime
• Google Cloud Vertex AI multimodal embeddings
• Pinecone for serverless vector search
• AWS S3 for asset storage
• AWS Cognito for authentication
• HubSpot API for CRM integration
• PostHog for usage analytics

Data flow

• Assets migrate from Dropbox to S3
• Files are converted to PNG for previews and embedding
• Vertex AI generates high-dimensional embeddings
• Pinecone indexes vectors with rich metadata
• User queries are embedded and matched semantically
• The UI surfaces results with direct downloads and HubSpot context

Ingestion and Embedding Pipeline

Processing more than 100k assets required a pipeline that could survive failure.

Key challenges

• Staying within model rate limits
• Converting PSDs reliably
• Skipping corrupt or unusable files
• Recovering from long batch interruptions
• Detecting duplicates before embedding
• Handling PSDs that were effectively blank

Design choices

• Batch execution with adjustable concurrency
• Exponential backoff and retry logic
• Dedicated PNG conversion pipeline
• Pinecone pre-checks to avoid duplicate vectors
• Metadata mapping for HubSpot integration
• Progress tracking for multi-hour runs
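
The design choices above can be sketched as a small ingestion loop. This is a minimal illustration, not the production pipeline: `RateLimitError`, `embed`, `upsert`, and the `already_indexed` set are hypothetical stand-ins for the embedding client, the vector store write, and the Pinecone pre-check, and using a content hash as the vector ID is an assumption made here for deduplication.

```python
import hashlib
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised by the embedding client when throttled."""


def asset_id(content: bytes) -> str:
    # Deterministic ID from file bytes: identical files map to the same ID.
    return hashlib.sha256(content).hexdigest()


def with_backoff(fn, *args, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    # Retry fn on rate-limit errors, doubling the wait each attempt plus jitter.
    for attempt in range(max_attempts):
        try:
            return fn(*args)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


def ingest(assets, embed, already_indexed, upsert, sleep=time.sleep):
    # assets: iterable of (path, bytes). Returns counters for progress tracking.
    stats = {"embedded": 0, "skipped_duplicate": 0, "failed": 0}
    for path, content in assets:
        vid = asset_id(content)
        if vid in already_indexed:  # stand-in for the pre-check against the index
            stats["skipped_duplicate"] += 1
            continue
        try:
            vector = with_backoff(embed, content, sleep=sleep)
        except Exception:
            # Corrupt or unusable files are counted and skipped, not fatal.
            stats["failed"] += 1
            continue
        upsert(vid, vector, {"path": path})
        already_indexed.add(vid)
        stats["embedded"] += 1
    return stats
```

Because the ID is derived from the file contents and checked before embedding, rerunning the loop after an interruption skips everything already indexed, which is one simple way to make a multi-hour batch resumable.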

The pipeline processed more than 253k embedding units. Total compute cost was $25, reduced to about $8 after credits.

Vector Database Strategy

Pinecone provided fast similarity search with minimal operational overhead.

Why Pinecone

• Serverless architecture
• Metadata-based filtering
• High read performance
• Smooth large-scale upserts

Index structure

• Cosine similarity with 1,408-dimensional vectors
• Each asset receives a “digital fingerprint” representing its visual and semantic essence

Conceptual vector example

[0.021, -0.114, 0.893, 0.004, -0.672, 0.318, … , 0.057]

The values capture latent visual and semantic features, so similar designs sit close together in vector space even when their filenames and folders are unrelated.
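
To make that closeness concrete, here is a small sketch of cosine similarity, the metric the index uses. The four-dimensional vectors and their names are invented for illustration; real embeddings have 1,408 dimensions and come from the model.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical): two related designs and one unrelated one.
striped_sock = [0.9, 0.1, 0.0, 0.3]
striped_scarf = [0.8, 0.2, 0.1, 0.3]
plain_logo = [0.0, 0.9, 0.4, 0.0]

related = cosine_similarity(striped_sock, striped_scarf)
unrelated = cosine_similarity(striped_sock, plain_logo)
print(related > unrelated)  # the related pair scores higher
```

A score near 1 means two vectors point in almost the same direction; a score near 0 means they share little. Filenames never enter the calculation.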

Search API

The FastAPI service handles:

• Text and image queries
• Embedding generation
• Vector retrieval
• Metadata hydration
• Authorization checks
• Secure access to original PSDs
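
The request path for a text query can be sketched in a framework-free way. This is an illustrative outline under assumptions, not the actual service code: `embed_text`, `query_index`, and `load_metadata` are hypothetical callables standing in for the embedding call, the vector lookup, and the metadata store, and the `"staff"` group check is an invented example of an authorization rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set


@dataclass
class Match:
    asset_id: str
    score: float


@dataclass
class SearchResult:
    asset_id: str
    score: float
    preview_url: str
    client: str


def search(
    user_groups: Set[str],
    query: str,
    embed_text: Callable[[str], Sequence[float]],
    query_index: Callable[[Sequence[float], int], List[Match]],
    load_metadata: Callable[[str], dict],
    top_k: int = 20,
) -> List[SearchResult]:
    # 1. Authorization check (hypothetical rule: caller must be in "staff").
    if "staff" not in user_groups:
        raise PermissionError("not allowed to search assets")
    # 2. Embedding generation: turn the query text into a vector.
    vector = embed_text(query)
    # 3. Vector retrieval: nearest neighbours from the index.
    matches = query_index(vector, top_k)
    # 4. Metadata hydration: attach preview and client info to each match.
    results = []
    for match in matches:
        meta = load_metadata(match.asset_id)
        results.append(
            SearchResult(match.asset_id, match.score, meta["preview_url"], meta["client"])
        )
    return results
```

Passing the dependencies in as callables keeps the sketch testable with fakes; in a real deployment each would wrap the corresponding external service.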

Why Vertex AI

• High-quality multimodal embeddings
• Shared embedding space for text and images
• Strong benchmark performance
• Predictable latency
• Cost efficiency at scale

Search results consistently return in under a second.

Frontend Experience and Workflow

The interface is not just a search bar. It’s an operational tool that connects creative assets to business context.

High-value workflows

• Preventing redundant work by finding approved files instantly
• Handling zero-context client requests by reverse-searching images
• Surfacing complete brand histories for specific clients

Core functionality

• Unified search for text, images, and client names
• Direct links to HubSpot deals and internal projects
• "Explore similar designs" using embeddings
• Built-in asset flagging for quality control

Home screen with unified search experience

A simple search for "Denver airport" instantly surfaces years of brand history, turning a needle-in-a-haystack search into a five-second win.

Asset Quality Control

To maintain brand quality, I built a flagging system with four preset reasons. Flags call out outdated or unusable assets directly in search results and give designers a lightweight review workflow.

The integrated flagging system allows designers to mark assets with knittability issues or outdated logos, ensuring the team only references production-ready files.

Performance and Reliability

• Search latency: 300–600 ms
• Stable ingestion over multi-hour runs
• Zero-downtime deployments
• Full analytics and error tracking via PostHog

Cost Efficiency

The system runs for about $72 per month:

• Pinecone: $50
• Vercel: $20
• S3: $1–2
• Vertex AI: effectively zero after credits

The architecture scales without major changes.

Results

• Search became ~10× faster
• Designers stopped recreating work
• Fewer interruptions to senior team members
• Faster, more relevant sales follow-ups

Adoption was organic. No training or rollout was required.

"Sock Scout is phenomenal, by the way. I've used it many times this week and it has saved me so much time already. Thanks for building such an amazing tool for us!!"

Taylor Spence, Senior Designer

The biggest validation: people used it without being asked.

Lessons Learned

• Metadata is a byproduct, not a requirement
• UX drives adoption more than features
• Pipelines must be resilient by design
• Trust in search accuracy unlocks new workflows

Final Takeaway

Semantic search turns a static archive into a discovery engine. With modern AI tooling and a simple interface, a small team can build internal capabilities usually reserved for large engineering organizations.

This project reflects how I like to work: identifying real bottlenecks, designing pragmatic systems, and building end-to-end solutions that quietly make people’s work easier.