Resume → Job Match AI

An end-to-end AI pipeline that extracts skills from a resume PDF, generates semantic embeddings, and ranks job listings by similarity — automating the job search process.

📅 2025

🏷️ Personal Project

💻 Python, NLP, Embeddings

The Problem

Job searching is broken. Candidates spend hours manually reading job descriptions, trying to figure out whether their skills match. Keyword scanning by applicant tracking systems (ATS) is noisy and misses semantic relationships between skills.

The goal: build an AI-powered tool that reads a resume, understands its skill profile at a semantic level, and automatically surfaces the most relevant job listings — ranked by actual fit, not keyword overlap.

The Solution

A Python pipeline with four distinct stages: document parsing, NLP skill extraction, semantic embedding generation, and cosine similarity ranking. The result is a ranked list of job listings with match scores.

The key insight is representing both the resume and job descriptions as dense vectors in the same embedding space — then finding which jobs are "closest" to the resume profile geometrically.

Screenshots / Demo

🖼️ Screenshots & demo video coming soon

System Architecture

The pipeline is fully modular — each stage is a separate Python module with a clean interface, making it easy to swap components (e.g., replace the embedding model with a newer one).

📄 Resume PDF
→ Text Extraction (PyPDF2)
→ Skill Detection & Normalization (spaCy / regex)
→ Semantic Embedding Model (sentence-transformers)
→ Cosine Similarity Scoring (scikit-learn)
→ ✅ Ranked Job Recommendations
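
As a rough illustration of the modular design, the sketch below treats each stage as a plain callable that can be swapped independently. The `Pipeline` class, its field names, and the choice to embed a comma-joined skill list are assumptions made for this example, not the project's actual module layout.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class Pipeline:
    """Wires the four stages together; each stage is just a callable (hypothetical layout)."""

    extract_text: Callable[[str], str]              # PDF path -> raw text
    extract_skills: Callable[[str], List[str]]      # raw text -> normalized skills
    embed: Callable[[Sequence[str]], List[Vector]]  # texts -> one vector per text
    score: Callable[[Vector, Vector], float]        # two vectors -> similarity

    def rank(self, resume_path: str, jobs: Sequence[str]) -> List[Tuple[str, float]]:
        text = self.extract_text(resume_path)
        skills = self.extract_skills(text)
        # Assumption: the resume is represented by its joined skill list.
        profile = ", ".join(skills)
        vectors = self.embed([profile, *jobs])
        resume_vec, job_vecs = vectors[0], vectors[1:]
        scored = [(job, self.score(resume_vec, vec)) for job, vec in zip(jobs, job_vecs)]
        # Highest similarity first.
        return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because every stage is passed in, replacing the embedding model means supplying a different `embed` callable and nothing else.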

Technical Highlights

🔤 PDF Text Extraction

Used PyPDF2 to extract raw text from multi-page resume PDFs, handling various PDF encodings and formats robustly.
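
A minimal sketch of this stage, assuming a recent PyPDF2 release that exposes `PdfReader`. The function names `extract_resume_text` and `clean_text` are hypothetical and chosen only for this example.

```python
import re


def clean_text(raw: str) -> str:
    """Collapse whitespace runs left over from PDF layout into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_resume_text(pdf_path: str) -> str:
    """Concatenate the text of every page in the PDF at pdf_path."""
    # Imported here so clean_text stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # Fall back to "" for pages where no text could be extracted.
    pages = [page.extract_text() or "" for page in reader.pages]
    return clean_text("\n".join(pages))
```

Scanned resumes without a text layer yield empty output with this approach and would need an OCR step first.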

🧠 NLP Skill Extraction

Combined named entity recognition and a curated skills ontology to detect technical and soft skills, normalizing synonyms (e.g., "ML" → "Machine Learning").
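
The synonym normalization can be sketched with a small alias table and a regex. This covers only the regex half of the approach (no spaCy NER), and the `SKILL_ALIASES` entries are a hypothetical stand-in for the curated ontology.

```python
import re
from typing import Dict, List

# Hypothetical mini-ontology: lowercase alias -> canonical skill name.
SKILL_ALIASES: Dict[str, str] = {
    "ml": "Machine Learning",
    "machine learning": "Machine Learning",
    "nlp": "Natural Language Processing",
    "natural language processing": "Natural Language Processing",
    "python": "Python",
    "sql": "SQL",
}

# Longest aliases first so multi-word names are tried before short ones.
_ALTERNATION = "|".join(
    re.escape(alias) for alias in sorted(SKILL_ALIASES, key=len, reverse=True)
)
# Lookarounds stop "ml" from matching inside words such as "html".
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9])(" + _ALTERNATION + r")(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def extract_skills(text: str) -> List[str]:
    """Return canonical skill names in order of first appearance, without duplicates."""
    found: List[str] = []
    for match in _PATTERN.finditer(text):
        canonical = SKILL_ALIASES[match.group(1).lower()]
        if canonical not in found:
            found.append(canonical)
    return found
```

With this table, both "ML" and "Machine learning" in a resume collapse to the single entry "Machine Learning".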

📐 Semantic Embeddings

Generated dense 384-dimensional vectors using sentence-transformers (all-MiniLM-L6-v2), capturing semantic meaning rather than surface keywords.
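
A hedged sketch of the embedding call, using the sentence-transformers `encode` method. The all-MiniLM-L6-v2 model card lists a 384-dimensional output, which the code checks. The function name `embed_texts` and the choice to L2-normalize the vectors are assumptions for this example.

```python
from typing import Sequence

MODEL_NAME = "all-MiniLM-L6-v2"
EMBEDDING_DIM = 384  # output size listed for all-MiniLM-L6-v2


def embed_texts(texts: Sequence[str]):
    """Encode texts into L2-normalized vectors, one row per input text."""
    # Imported lazily: loading the library and model weights is slow.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    # Returns a NumPy array of shape (len(texts), EMBEDDING_DIM).
    vectors = model.encode(list(texts), normalize_embeddings=True)
    assert vectors.shape == (len(texts), EMBEDDING_DIM)
    return vectors
```

The resume and every job description must go through the same model so that their vectors are comparable.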

📊 Similarity Ranking

Computed cosine similarity between the resume vector and each job listing vector, returning a ranked list with each match score shown as a percentage.
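
The ranking step can be sketched in pure Python as below. This is a dependency-free stand-in for scikit-learn's `cosine_similarity`, and the `rank_jobs` helper and the toy vectors in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(
    resume_vec: Sequence[float], job_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (job title, similarity) pairs, best match first."""
    scored = [
        (title, cosine_similarity(resume_vec, vec)) for title, vec in job_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For a resume vector `[1.0, 0.0]` and jobs `{"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}`, the order is A, C, B. Multiplying a score by 100 gives the displayed percentage, though a cosine score is a similarity measure rather than a calibrated probability.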

Links

View on GitHub Live Demo

Project Type

▸ Personal Project
▸ End-to-End AI Pipeline
▸ NLP / Semantic Search
