🌅 Building Semantic Search for 100k+ Creative Assets
How I transformed a chaotic Dropbox archive into an AI-powered search system that understands concepts instead of filenames.
By Jameson Campbell
Executive Summary
Impact
- • Search time reduced from twenty minutes to under two minutes
- • Lost asset time down forty percent
- • Adopted daily across design and sales
- • Zero training required
I built an end-to-end semantic search system that lets our teams find past creative work using meaning-based queries. The platform processes more than 100k assets, generates embeddings, stores vectors, and returns visually similar results with high accuracy and low latency. It eliminated the bottleneck of digging through nested folders and guessing at filenames.
1. The Problem
At Sock Club, our creative archive lived in Dropbox with more than 100k PSDs and exports accumulated over years. Filenames were inconsistent, folder structures varied, and searching depended on tribal knowledge. Designers searched for concepts, not file names. Sales needed examples for live client conversations.
Dropbox Dash could not solve this because it was built for general file search, not semantic visual discovery across creative content.
The consequences were clear. Designers recreated assets because finding originals took too long. Sales slowed down client emails while trying to find examples. Junior team members regularly pinged senior designers for help locating files.
We needed meaning-based search that understood visual similarity, aesthetic themes, and conceptual intent.
2. Requirements
After working with designers and sales reps, I defined the core needs.
User Requirements
- • Search by concept, not filename
- • Fast results across the entire library
- • Support PSDs, PNGs, and exports
- • Direct access to original files with secure authentication
- • Integration with HubSpot Deals
- • A pipeline that processes more than 100k assets without manual intervention
The system needed to be simple, intuitive, and reliable.
3. Architecture Overview
The platform uses a unified embedding space for images and text to power semantic similarity search.
Tech Stack
- • Next.js frontend on Vercel
- • FastAPI backend on Vercel serverless Python runtime
- • Vertex AI multimodal embeddings
- • Pinecone for vector search
- • AWS S3 for asset storage
- • AWS Cognito for authentication
- • HubSpot API integration
Data flow
- • Assets migrate from Dropbox to S3.
- • Files convert to PNG when needed.
- • Vertex AI generates embeddings.
- • Pinecone stores vectors with metadata.
- • Search requests embed the query and retrieve nearest matches.
- • The UI displays results with metadata, previews, and PSD downloads.
Architecture diagram
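The retrieval step in the data flow can be sketched in a few lines. This is only an illustration, not the production code: the toy 3-dimensional vectors stand in for real embeddings, the asset names are made up, and a brute-force scan replaces the Pinecone query. What it shows is the core idea that a text query and the stored images live in the same vector space, so ranking by cosine similarity returns the closest assets.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_assets(
    query: Vector, index: Dict[str, Vector], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the best top_k."""
    scored = [(asset_id, cosine_similarity(query, vec)) for asset_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: asset name -> embedding (real vectors are much longer).
toy_index = {
    "argyle-sock.png": [0.9, 0.1, 0.0],
    "airport-theme.png": [0.1, 0.9, 0.2],
    "blank-template.png": [0.0, 0.1, 0.9],
}

# Pretend this vector came from embedding a text query.
query_vector = [0.2, 0.8, 0.1]
results = nearest_assets(query_vector, toy_index, top_k=2)
```

A vector database does the same ranking with an approximate index instead of a full scan, which is what keeps latency low across 100k+ assets.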
4. Ingestion and Embedding Pipeline
Processing more than 100k assets required a durable, recoverable ingestion pipeline.
Key Pipeline Challenges
- • Staying within model rate limits
- • Converting PSDs reliably
- • Skipping corrupt files
- • Recovering from long batch interruptions
- • Detecting duplicates before embedding
- • Migrating from Dropbox to S3 cleanly
- • Handling PSDs that were 99 percent white
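The near-blank PSD problem lends itself to a simple check. The sketch below is an assumption about how such files could be detected, since the exact handling is not described here: it counts the fraction of pixels that are close to pure white and flags the image when that fraction reaches a cutoff. The function names, the `threshold` of 250, and the 0.99 cutoff are illustrative choices; in practice the pixels would likely come from an imaging library after PNG conversion.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def white_fraction(pixels: Iterable[Pixel], threshold: int = 250) -> float:
    """Fraction of pixels whose three channels are all at or above threshold."""
    total = 0
    white = 0
    for r, g, b in pixels:
        total += 1
        if r >= threshold and g >= threshold and b >= threshold:
            white += 1
    # An image with no pixels carries no content, so treat it as fully blank.
    return white / total if total else 1.0


def is_mostly_white(
    pixels: Iterable[Pixel], cutoff: float = 0.99, threshold: int = 250
) -> bool:
    """True when at least `cutoff` of the image is near-white."""
    return white_fraction(pixels, threshold) >= cutoff


# 995 white pixels and 5 red ones: 99.5 percent white, so this is flagged.
nearly_blank = [(255, 255, 255)] * 995 + [(200, 30, 30)] * 5
# 900 white pixels and 100 red ones: enough artwork to keep.
real_design = [(255, 255, 255)] * 900 + [(200, 30, 30)] * 100

flag_blank = is_mostly_white(nearly_blank)
flag_design = is_mostly_white(real_design)
```

Files that trip the check could be skipped or routed to manual review rather than embedded, so they do not crowd search results with empty canvases.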
Design choices
- • Batch execution with adjustable concurrency
- • Exponential backoff and retry logic
- • PNG conversion pipeline for PSDs
- • Pinecone pre-checks to avoid duplicate embeddings
- • Metadata mapping for HubSpot integration
- • Progress tracking to support multi-hour runs
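The retry behavior listed above can be sketched as a small wrapper. This is a minimal version under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever exception the embedding client raises when throttled, and the attempt count, base delay, and jitter amount are illustrative defaults rather than the values used in the real pipeline.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the embedding service throttles a request."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10 percent jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Usage: a fake embed call that is throttled twice before succeeding.
attempts = {"count": 0}


def flaky_embed() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("throttled")
    return "embedding"


recorded_delays: List[float] = []
# Passing list.append as `sleep` records the waits instead of actually pausing.
result = with_backoff(flaky_embed, sleep=recorded_delays.append)
```

Making `sleep` injectable keeps the wrapper testable without real waiting, which matters when the same logic guards a multi-hour batch.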
The pipeline completed more than 253k embedding units with a total compute cost of $25, reduced to $8 after Google Cloud credits.
5. Vector Database Strategy
Pinecone provided fast similarity search with minimal operational overhead.
Why Pinecone
- • Serverless architecture with zero maintenance
- • Metadata-based filtering
- • High read performance
- • Smooth large-scale upserts
Index structure
- • Cosine similarity over 1408-dimensional vectors
- • Rich metadata for filtering
- • Dropbox-ID-based keys for designer assets to ensure long-term consistency
- • S3 hash keys for sales assets for backward compatibility
Keying vectors this way eliminated duplicates and prevented orphaned entries during S3 reorganizations.
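To illustrate why stable keys help, here is a rough sketch of the idea. Several details are assumptions: the `dropbox:` and `s3:` prefixes are invented for readability, the sales key is taken to be a SHA-256 hash of the file contents (the hash input is not specified here), and an in-memory set stands in for a lookup against the vector index.

```python
import hashlib
from typing import List, Set


def designer_key(dropbox_id: str) -> str:
    """Key a designer asset by its Dropbox file ID, which does not depend on its path."""
    return f"dropbox:{dropbox_id}"


def sales_key(file_bytes: bytes) -> str:
    """Key a sales asset by a hash of its contents (assumed hash input)."""
    return "s3:" + hashlib.sha256(file_bytes).hexdigest()


def keys_to_embed(incoming: List[str], existing_ids: Set[str]) -> List[str]:
    """Pre-check: keep only keys that are not already stored, preserving order."""
    return [key for key in incoming if key not in existing_ids]


# Hypothetical state: one designer asset has already been embedded.
existing = {designer_key("id:abc123")}
incoming = [
    designer_key("id:abc123"),   # already indexed, so it is skipped
    designer_key("id:def456"),   # new designer asset
    sales_key(b"png-bytes"),     # new sales asset
]
to_embed = keys_to_embed(incoming, existing)
```

Because neither key depends on a folder path, moving or renaming a file maps to the same vector ID, so a re-run overwrites the existing entry instead of creating a second one.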
6. Search API
The FastAPI service handles:
- • Text or image queries
- • Embedding generation
- • Vector retrieval
- • Metadata hydration
- • Authorization checks
- • Secure access to original PSDs
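The later steps in that list (filtering, metadata hydration, and authorization) can be sketched as one function applied to raw vector matches. Everything specific here is hypothetical: the `AssetRecord` fields, the `"designers"` group name, and the rule that only that group sees the PSD location are illustrative assumptions, not the service's actual schema or policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class AssetRecord:
    asset_id: str
    asset_type: str          # "design" or "sales"
    title: str
    psd_key: Optional[str]   # storage key of the original PSD, if one exists


@dataclass
class SearchResult:
    asset_id: str
    score: float
    title: str
    psd_key: Optional[str]   # None when the caller may not download the original


def hydrate_results(
    matches: List[Tuple[str, float]],
    catalog: Dict[str, AssetRecord],
    asset_type: str,
    user_groups: Set[str],
) -> List[SearchResult]:
    """Attach metadata to vector matches, filter by type, and hide PSDs if unauthorized."""
    can_download = "designers" in user_groups  # assumed policy, for illustration only
    results: List[SearchResult] = []
    for asset_id, score in matches:
        record = catalog.get(asset_id)
        if record is None or record.asset_type != asset_type:
            continue  # drop vectors with no metadata or the wrong asset type
        psd_key = record.psd_key if can_download else None
        results.append(SearchResult(asset_id, score, record.title, psd_key))
    return results


catalog = {
    "a1": AssetRecord("a1", "design", "Vintage airport crew sock", "psd/a1.psd"),
    "a2": AssetRecord("a2", "sales", "Airport mockup export", None),
}
matches = [("a1", 0.93), ("a2", 0.88), ("missing", 0.80)]

designer_view = hydrate_results(matches, catalog, "design", {"designers"})
sales_view = hydrate_results(matches, catalog, "design", {"sales"})
```

Keeping hydration and authorization in one pass means a result never leaves the API carrying a download location the caller is not entitled to.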
Why Vertex AI
- • High-quality multimodal embeddings
- • Same embedding space for text and images
- • Strong performance on MTEB benchmarks
- • Predictable latency
- • Cost-effective at scale
The API consistently returns top results in under a second, helped by debouncing on the frontend.
7. Frontend Experience
The Next.js interface makes searching feel instant and intuitive.
UX Priorities
- • Fast feedback
- • Clear concept-based results
- • Minimal learning curve
- • Useful previews without losing context
- • Full mobile support
Features
- • Unified search bar
- • Toggle for Design or Sales assets
- • Responsive grid of results
- • Fullscreen preview modal
- • Asset metadata with HubSpot links and Internal Portal links
- • Explore Similar Designs using the asset as the query
- • Asset flagging system for quality control
- • Dark mode via theme variables
- • Optimized mobile layout
- • Lazy-loaded result sets
Home screen with unified search interface
Concept-based search results
Fullscreen preview with metadata and actions
Users can search for phrases like "playful geometric layout" or "vintage airport theme", and the system returns visually coherent matches.
8. Asset Quality Control
To maintain brand quality, I built a flagging system with four preset reasons. Flags mark outdated or unusable assets directly in search results and give designers a lightweight review workflow.
Asset flagging system for quality control
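A flagging model with a fixed set of reasons can be quite small. In the sketch below, the four reason names are placeholders invented for illustration (the actual presets are not listed here), and the in-memory `FlagStore` stands in for whatever persistence the real system uses.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class FlagReason(Enum):
    """Four preset reasons; these particular names are hypothetical."""
    OUTDATED = "outdated"
    LOW_QUALITY = "low_quality"
    OFF_BRAND = "off_brand"
    DUPLICATE = "duplicate"


@dataclass
class FlagStore:
    """In-memory map from asset ID to the reasons it has been flagged for."""
    flags: Dict[str, List[FlagReason]] = field(default_factory=dict)

    def flag(self, asset_id: str, reason: FlagReason) -> None:
        self.flags.setdefault(asset_id, []).append(reason)

    def annotate(self, asset_ids: List[str]) -> List[dict]:
        """Attach flag reasons to a list of search results, in the same order."""
        return [
            {"asset_id": asset_id, "flags": [r.value for r in self.flags.get(asset_id, [])]}
            for asset_id in asset_ids
        ]


store = FlagStore()
store.flag("a1", FlagReason.OUTDATED)
annotated = store.annotate(["a1", "a2"])
```

Restricting reasons to an enum keeps flags consistent enough to filter and review in bulk, which free-text comments would not allow.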
9. Performance and Reliability
The system meets the following internal targets:
- • Search results typically returned within 300 to 600 milliseconds
- • Efficient preview loading even in large result sets
- • Ingestion pipeline stable over many hours
- • Zero downtime deployments with Vercel
- • Full analytics and error tracking via PostHog
10. Cost Efficiency
The platform costs about $72 a month to run.
- • Pinecone: $50
- • Vercel: $20
- • S3: about $1 to $2
- • Vertex AI queries: effectively zero after credits
The architecture can scale with minimal changes while staying cost efficient.
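As a quick sanity check on these figures, the arithmetic can be written out. The monthly numbers come from the breakdown above, with Vertex AI queries counted as zero; the per-unit figure uses the $25 pre-credit ingestion cost and treats the "more than 253k" embedding units as exactly 253,000, so it is an approximate upper bound.

```python
# Monthly line items as (low, high) estimates in US dollars.
monthly_costs = {
    "Pinecone": (50.0, 50.0),
    "Vercel": (20.0, 20.0),
    "S3": (1.0, 2.0),
}

low_total = sum(low for low, _ in monthly_costs.values())
high_total = sum(high for _, high in monthly_costs.values())

# One-time ingestion cost before credits, spread over the embedding units.
ingestion_cost = 25.0
embedding_units = 253_000  # assumed exact for the estimate
cost_per_1k_units = ingestion_cost / embedding_units * 1000

print(f"Monthly run cost: ${low_total:.0f} to ${high_total:.0f}")
print(f"Ingestion: about ${cost_per_1k_units:.2f} per 1,000 embedding units")
```

The monthly total lands at $71 to $72, consistent with the "about $72" figure, and ingestion works out to roughly ten cents per thousand embedding units before credits.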
11. Results
Impact Metrics
- • Search 10x faster
- • 40% reduction in lost asset time
- • Designers stopped recreating assets unnecessarily
- • Fewer interruptions to senior team members
- • Faster sales responses for client emails
Adoption was organic. No training or rollout was required. People saw value immediately.
"Sock Scout is phenomenal, by the way. I've used it many times this week and it has saved me so much time already. Thanks for building such an amazing tool for us!!" — Taylor Spence, Senior Designer
The biggest validation: people used it without being asked.
12. Lessons Learned
- • Pipelines must be resilient to failure
- • UX drives adoption more than features
- • Trust in search accuracy unlocks new workflows
- • Isolated environments enable safe experimentation
- • Each environment teaches you what to automate next
- • Dry-run mode prevents costly mistakes
- • Early architectural decisions have lasting impact
- • Building for scale means making systems adaptable to constraints, not just fast
Final Takeaway
Semantic search turns an archive into a discovery engine. This project showed how modern AI tooling, paired with simple UX, can give a small team the type of internal capability usually reserved for large engineering organizations.
It reflects how I approach internal tools work. I identify a real operational friction point, design a pragmatic architecture, build the solution end to end, and deliver measurable improvements in team productivity and decision making.