Finding Duplicate Photos with Google Photos API
Like most people, I have thousands of photos spread across Google Photos. And like most people, I have duplicates. Lots of them. Screenshots, multiple uploads, WhatsApp backups—the mess is real. So I built a Python script to solve it: google-photos-duplicate-finder.
The Problem
Google Photos is great, but:
- Multiple devices = multiple uploads
- Screenshot accumulation
- WhatsApp/WhatsApp Business duplicates
- No built-in duplicate detection
- Manual deletion is painful
The Solution
A Python script that:
- Connects to Google Photos API
- Identifies duplicates using multiple methods
- Provides detailed reports
- Helps you clean up safely
Installation
git clone https://github.com/irfancode/google-photos-duplicate-finder
cd google-photos-duplicate-finder
pip install -r requirements.txt
Setup
1. Google Cloud Console
- Go to Google Cloud Console
- Create a new project
- Enable “Photos Library API”
- Create OAuth 2.0 credentials
- Download credentials.json
2. First Run
python main.py --auth
This opens a browser window for OAuth authentication.
Detection Methods
Method 1: Filename Matching
# Find files with same name
matches = find_by_filename(media_items)
Good for: Quick scans, obvious duplicates
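A minimal sketch of this grouping, assuming each media item is a dict with a `filename` key (as the Photos Library API returns):

```python
from collections import defaultdict

def find_by_filename(media_items):
    """Group media items that share the same filename."""
    groups = defaultdict(list)
    for item in media_items:
        groups[item['filename']].append(item)
    # Keep only groups with more than one item (potential duplicates)
    return {name: items for name, items in groups.items() if len(items) > 1}
```

Filename collisions are common false positives (every phone produces an `IMG_0001.jpg`), which is why this method is best paired with a report you review before acting.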
Method 2: Metadata Comparison
# Compare creation time, device info
matches = find_by_metadata(media_items)
Good for: Same photo from different devices
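A sketch of metadata grouping, under the assumption that items carry the API's `mediaMetadata` block with creation time and pixel dimensions:

```python
from collections import defaultdict

def find_by_metadata(media_items):
    """Group items whose creation time and pixel dimensions match."""
    groups = defaultdict(list)
    for item in media_items:
        meta = item.get('mediaMetadata', {})
        key = (meta.get('creationTime'), meta.get('width'), meta.get('height'))
        groups[key].append(item)
    return {key: items for key, items in groups.items() if len(items) > 1}
```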
Method 3: Content Hashing
# Download and hash image content
matches = find_by_hash(media_items)
Good for: Exact duplicates, compressed copies
Usage
Quick Scan (Filenames Only)
python main.py --method filename --report
Deep Scan (Content Hash)
python main.py --method hash --report --download
Interactive Review
python main.py --interactive
Code Walkthrough
Authentication
from google_photos import GooglePhotos
photos = GooglePhotos('credentials.json')
photos.authenticate()
# Get all media items
media_items = photos.list_media()
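Behind that one call, the wrapper presumably pages through the API's `mediaItems.list` endpoint. The pagination loop can be sketched independently of HTTP, with the page fetcher injected (`fetch_page` is a hypothetical parameter, not part of the tool):

```python
def list_all_media(fetch_page):
    """Collect every media item by following nextPageToken.

    fetch_page(page_token) should return one API response dict,
    e.g. {'mediaItems': [...], 'nextPageToken': '...'}.
    """
    items, token = [], None
    while True:
        page = fetch_page(token)
        items.extend(page.get('mediaItems', []))
        token = page.get('nextPageToken')
        if not token:  # last page has no nextPageToken
            return items
```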
Duplicate Detection
def find_duplicates(media_items, method='hash'):
    if method == 'filename':
        return find_by_filename(media_items)
    elif method == 'metadata':
        return find_by_metadata(media_items)
    elif method == 'hash':
        return find_by_hash(media_items)
    raise ValueError(f"Unknown method: {method}")
Hash Comparison
import hashlib
def compute_hash(media_item):
# Download image
content = download_image(media_item['baseUrl'])
# Compute SHA256
return hashlib.sha256(content).hexdigest()
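Grouping by those digests then mirrors the other methods. A sketch, with the download step injected so the logic can be tested offline (`download_image` is the hypothetical helper used above):

```python
import hashlib
from collections import defaultdict

def find_by_hash(media_items, download=None):
    """Group items whose downloaded bytes hash identically."""
    # Default to fetching the item's baseUrl via the download helper
    download = download or (lambda item: download_image(item['baseUrl']))
    groups = defaultdict(list)
    for item in media_items:
        digest = hashlib.sha256(download(item)).hexdigest()
        groups[digest].append(item)
    return {h: items for h, items in groups.items() if len(items) > 1}
```

Injecting `download` keeps the grouping logic unit-testable without hitting the network, which matters when a deep scan may touch thousands of items.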
Results
Typical scan findings:
- 500-1000 duplicates in an average library
- 20-30% storage savings
- Most common: Screenshots, WhatsApp images
Safety Features
- Read-Only Mode — Scan without deleting
- Reports First — Review before acting
- Batch Processing — Delete in chunks
- Trash, Not Delete — Items go to trash first
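Batch processing reduces risk by acting on a few duplicate groups at a time. A generic chunking helper is enough for this (the default size of 50 is an illustrative assumption, not a tool default):

```python
def chunked(items, size=50):
    """Yield successive fixed-size chunks from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```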
Future Enhancements
- Face detection for grouping
- AI-powered similarity detection
- Cloud function deployment
- Mobile app companion
Conclusion
Google Photos is an excellent service, but it lacks native duplicate management. With google-photos-duplicate-finder, you can reclaim storage and organize your memories.
The best part? It’s open source, so the community can improve it together.
Questions? Let’s connect!
Connect: LinkedIn | GitHub