AI/ML, Python, NLP

Resume → Job Match AI

An end-to-end AI pipeline that extracts skills from a resume PDF, generates semantic embeddings, and ranks job listings by similarity — automating the job search process.

📅 2025
🏷️ Personal Project
💻 Python, NLP, Embeddings

The Problem

Job searching is broken. Candidates spend hours manually reading job descriptions to judge whether their skills match. Keyword scanning by applicant tracking systems (ATS) is noisy and misses semantic relationships between skills.

The goal: build an AI-powered tool that reads a resume, understands its skill profile at a semantic level, and automatically surfaces the most relevant job listings — ranked by actual fit, not keyword overlap.

The Solution

A Python pipeline with four distinct stages: document parsing, NLP skill extraction, semantic embedding generation, and cosine similarity ranking. The result is a ranked list of job listings with match scores.

The key insight is representing both the resume and job descriptions as dense vectors in the same embedding space — then finding which jobs are "closest" to the resume profile geometrically.
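"Closest" can be made precise with cosine similarity, which measures the angle between two vectors rather than their length. The notation below is an assumption for illustration: r is the resume vector and j is one job listing vector.

```latex
\[
\operatorname{sim}(\mathbf{r}, \mathbf{j})
  = \cos\theta
  = \frac{\mathbf{r} \cdot \mathbf{j}}{\lVert \mathbf{r} \rVert \, \lVert \mathbf{j} \rVert},
\qquad
-1 \le \operatorname{sim}(\mathbf{r}, \mathbf{j}) \le 1
\]
```

A score near 1 means the two texts point in nearly the same direction in embedding space; a score near 0 means they are unrelated.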

System Architecture

The pipeline is fully modular — each stage is a separate Python module with a clean interface, making it easy to swap components (e.g., replace the embedding model with a newer one).
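One way such a modular design could look is sketched below. The names (Embedder, Pipeline, KeywordCountEmbedder) are hypothetical and not taken from the project; the sketch also assumes the resume is embedded as a comma-joined skill list, which the write-up does not specify. Each stage is passed in as a plain callable or object, so swapping the embedding model means passing a different embedder.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence

Vector = list[float]


class Embedder(Protocol):
    """Anything with an embed() method can serve as the embedding stage."""

    def embed(self, texts: Sequence[str]) -> list[Vector]: ...


@dataclass
class Pipeline:
    extract_text: Callable[[str], str]          # PDF path -> raw text
    extract_skills: Callable[[str], list[str]]  # raw text -> normalized skills
    embedder: Embedder                          # texts -> vectors
    score: Callable[[Vector, Vector], float]    # similarity between two vectors

    def run(self, resume_path: str, jobs: dict[str, str]) -> list[tuple[str, float]]:
        text = self.extract_text(resume_path)
        # Assumption: the resume profile is the comma-joined skill list.
        profile = ", ".join(self.extract_skills(text))
        vectors = self.embedder.embed([profile, *jobs.values()])
        resume_vec, job_vecs = vectors[0], vectors[1:]
        scored = [(title, self.score(resume_vec, vec)) for title, vec in zip(jobs, job_vecs)]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


class KeywordCountEmbedder:
    """Toy stand-in for a real model: counts two keywords per text."""

    def embed(self, texts: Sequence[str]) -> list[Vector]:
        return [[float(t.lower().count("python")), float(t.lower().count("sql"))] for t in texts]


# Wire the stages together with stubs to show the interfaces line up.
pipeline = Pipeline(
    extract_text=lambda path: "Python and SQL",
    extract_skills=lambda text: text.replace(" and ", " ").split(),
    embedder=KeywordCountEmbedder(),
    score=lambda a, b: sum(x * y for x, y in zip(a, b)),
)
ranked = pipeline.run("resume.pdf", {"Data Engineer": "python sql python", "Designer": "figma"})
```

Because Embedder is a structural Protocol, a wrapper around a real model only needs an embed() method with the same shape; it does not have to inherit from anything.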

📄 Resume PDF
→ Text Extraction (PyPDF2)
→ Skill Detection & Normalization (spaCy / regex)
→ Semantic Embedding Model (sentence-transformers)
→ Cosine Similarity Scoring (scikit-learn)
✅ Ranked Job Recommendations

Technical Highlights

🔤
PDF Text Extraction
Used PyPDF2 to extract raw text from multi-page resume PDFs, handling various PDF encodings and formats robustly.
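A minimal sketch of this step, assuming PyPDF2's PdfReader API. The function names and the clean-up heuristic (rejoining words hyphenated across line breaks, collapsing whitespace) are illustrative assumptions, not the project's actual code.

```python
import re


def tidy_text(raw: str) -> str:
    """Rejoin words split by a hyphen at a line break, then collapse whitespace."""
    # Heuristic: "learn-\ning" becomes "learning". It can also merge genuinely
    # hyphenated compounds that happen to wrap at the hyphen.
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    return re.sub(r"\s+", " ", joined).strip()


def extract_resume_text(pdf_path: str) -> str:
    """Extract and tidy the text of every page in a PDF."""
    # Imported lazily so tidy_text() stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() may yield nothing for image-only (scanned) pages.
    pages = [page.extract_text() or "" for page in reader.pages]
    return tidy_text("\n".join(pages))
```

Scanned resumes contain no text layer, so they would need an OCR step before this function returns anything useful.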
🧠
NLP Skill Extraction
Combined named entity recognition and a curated skills ontology to detect technical and soft skills, normalizing synonyms (e.g., "ML" → "Machine Learning").
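The synonym normalization can be illustrated with a small alias table. This sketch covers only the lookup half; it does not include named entity recognition, and the SKILL_ALIASES entries are hypothetical examples rather than the project's ontology.

```python
import re

# Hypothetical mini-ontology: lowercase surface form -> canonical skill name.
SKILL_ALIASES = {
    "ml": "Machine Learning",
    "machine learning": "Machine Learning",
    "nlp": "Natural Language Processing",
    "natural language processing": "Natural Language Processing",
    "python": "Python",
    "sql": "SQL",
    "communication": "Communication",
}

# Longest aliases first so multi-word forms are tried before short ones.
# \b keeps "ml" from matching inside words such as "HTML"; names with
# punctuation (e.g. "C++") would need a different boundary rule.
_ALIAS_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in sorted(SKILL_ALIASES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def extract_skills(text: str) -> list[str]:
    """Return canonical skill names in order of first appearance, without duplicates."""
    found: dict[str, None] = {}
    for match in _ALIAS_PATTERN.finditer(text):
        found.setdefault(SKILL_ALIASES[match.group(1).lower()], None)
    return list(found)
```

For example, extract_skills("Built ML and NLP models in Python") returns the three canonical names "Machine Learning", "Natural Language Processing", and "Python".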
📐
Semantic Embeddings
Generated dense 384-dimensional vectors using sentence-transformers (all-MiniLM-L6-v2), capturing semantic meaning rather than surface keywords.
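A hedged sketch of the embedding call, assuming the standard sentence-transformers SentenceTransformer.encode API. The embed_texts wrapper and its optional model parameter are illustrative; the parameter exists so the function can be exercised with a stand-in encoder.

```python
from typing import Any, Sequence

MODEL_NAME = "all-MiniLM-L6-v2"  # the model named in this write-up


def embed_texts(texts: Sequence[str], model: Any = None):
    """Encode texts into unit-length vectors.

    `model` may be any object with a compatible encode() method; when omitted,
    the sentence-transformers model is loaded (downloaded on first use).
    """
    if model is None:
        # Imported lazily: loading the library and model weights is slow.
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer(MODEL_NAME)
    # normalize_embeddings=True scales each vector to length 1, so a plain
    # dot product between two vectors equals their cosine similarity.
    return model.encode(list(texts), normalize_embeddings=True)


# Typical call (not run here because it downloads the model):
#   vectors = embed_texts([resume_profile, *job_descriptions])
#   vectors.shape  # (number of texts, 384) for all-MiniLM-L6-v2
```

Encoding the resume and all job descriptions in one call keeps them in the same embedding space and lets the library batch the work.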
📊
Similarity Ranking
Computed cosine similarity between the resume vector and each job listing vector, returning a ranked list with each match score shown as a percentage.
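The ranking step can be sketched with the standard library alone so the arithmetic is visible; the project lists scikit-learn for this stage. The rank_jobs name, the toy vectors, and the score-to-percentage mapping (clamp negatives to zero, multiply by 100) are assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # an all-zero vector matches nothing


def rank_jobs(resume_vec: list[float], job_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (job title, match percentage) pairs, best match first."""
    scored = [(title, cosine_similarity(resume_vec, vec)) for title, vec in job_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Assumed display mapping: negative similarities are shown as 0%.
    return [(title, round(max(score, 0.0) * 100, 1)) for title, score in scored]


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
matches = rank_jobs(
    [1.0, 1.0, 0.0],
    {
        "Chef": [0.0, 0.0, 1.0],
        "ML Engineer": [1.0, 1.0, 0.0],
        "Data Analyst": [1.0, 0.0, 0.0],
    },
)
```

The percentage is a rescaled similarity, not a calibrated probability of being a good fit, so it is best read as a relative ordering between listings.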

Tech Stack

Python 3.11, PyPDF2, spaCy, sentence-transformers, scikit-learn, NumPy, Pandas

Project Type

  • Personal Project
  • End-to-End AI Pipeline
  • NLP / Semantic Search