A high-performance PDF summarization tool powered by Google's Gemma 3 LLM. Features parallel processing, async operations, and intelligent chunking for technical paper analysis. Built with FastAPI, Streamlit, and Ollama.

arjunprabhulal/gemma3_pdf_summarizer

📄 AI-Powered PDF Summarizer

🚀 AI-Powered PDF Summarizer is a tool that extracts and summarizes research papers from ArXiv PDFs using Ollama (Gemma 3 LLM). The system provides structured, downloadable summaries to help researchers and professionals quickly grasp key findings.

PDF Summarizer UI


🛠 Features

  • 🌐 Input an ArXiv PDF URL to fetch and summarize papers.
  • 📑 Extracts technical content (architecture, implementation, results).
  • 🔍 Optimized for large text processing with parallel summarization.
  • 🎨 Modern UI built with Streamlit.
  • 📥 Download summary as a Markdown file.
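
The "parallel summarization" feature above can be pictured with a small sketch. This is not the repository's code: `split_into_chunks` is a simplified fixed-window stand-in for LangChain's `RecursiveCharacterTextSplitter`, `summarize_chunk` is a hypothetical placeholder for the real Gemma 3 call, and the chunk size, overlap, and concurrency limit are illustrative assumptions.

```python
import asyncio


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Cut text into overlapping character windows (simplified stand-in for a recursive splitter)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


async def summarize_chunk(chunk: str) -> str:
    """Hypothetical placeholder for an LLM request; it only returns the first 40 characters."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return chunk[:40].strip()


async def summarize_all(chunks: list[str], max_concurrency: int = 4) -> list[str]:
    """Summarize chunks concurrently, keeping input order and capping in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(chunk: str) -> str:
        async with semaphore:
            return await summarize_chunk(chunk)

    return list(await asyncio.gather(*(bounded(c) for c in chunks)))


if __name__ == "__main__":
    paper_text = "Lorem ipsum dolor sit amet. " * 200
    partial_summaries = asyncio.run(summarize_all(split_into_chunks(paper_text)))
    print(f"{len(partial_summaries)} partial summaries produced")
```

The semaphore is one way to keep a locally hosted model from being flooded with requests; the partial summaries would still need a final merge step to become a single structured summary.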

🚀 Tech Stack

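| Component      | Technology                               |
|----------------|------------------------------------------|
| Frontend       | Streamlit                                |
| Backend        | FastAPI                                  |
| LLM Platform   | Ollama                                   |
| LLM Model      | Google Gemma 3                           |
| PDF Processing | PyMuPDF (fitz)                           |
| Text Chunking  | LangChain RecursiveCharacterTextSplitter |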

🎬 Demo

1️⃣ Enter an ArXiv PDF URL
2️⃣ Click "Summarize PDF" 🚀
3️⃣ Get a structured summary with technical insights 📝
4️⃣ Download as Markdown 📥


🔧 Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/arjunprabhulal/gemma3_pdf_summarizer.git
cd gemma3_pdf_summarizer

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Install Ollama and Gemma 3 LLM

Install Ollama (macOS/Linux)

curl -fsSL https://ollama.com/install.sh | sh

Download Gemma 3 Model

ollama pull gemma3:27b
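
After the pull finishes, it can help to confirm that the model is visible to the local Ollama server before starting the backend. The sketch below is an optional check, not part of the repository: it assumes Ollama is listening on its default address `http://localhost:11434` and queries its `/api/tags` endpoint, which lists locally available models. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_listed(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    return any(entry.get("name") == model for entry in tags_payload.get("models", []))


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the list of locally available models."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        available = model_is_listed(fetch_tags(), "gemma3:27b")
    except OSError as exc:  # covers connection refused and timeouts
        print(f"Could not reach Ollama: {exc}")
    else:
        print("gemma3:27b is available" if available else "gemma3:27b not found; run the pull command again")
```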

4️⃣ Start the Backend (FastAPI)

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

5️⃣ Start the Frontend (Streamlit)

streamlit run frontend.py

📜 API Endpoints

🔹 Health Check

GET /health

Response:

{"status": "ok", "message": "FastAPI backend is running!"}

🔹 Summarize an ArXiv Paper

POST /summarize_arxiv/

Request Body:

{
  "url": "https://arxiv.org/pdf/2401.02385.pdf"
}

Response:

{
  "summary": "Structured summary of the research paper..."
}
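
Calling this endpoint from a script could be sketched as follows. The request and response shapes are taken from the example above; the base URL `http://localhost:8000`, the generous timeout, and the helper names are assumptions made for illustration.

```python
import json
import urllib.request


def build_summarize_request(pdf_url: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for /summarize_arxiv/ without sending it."""
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/summarize_arxiv/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(pdf_url: str, timeout: float = 600.0) -> str:
    """Send the request and return the "summary" field of the JSON response."""
    request = build_summarize_request(pdf_url)
    # Assumption: summarizing a full paper with a local model can take minutes,
    # hence the long timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["summary"]


if __name__ == "__main__":
    try:
        summary = summarize("https://arxiv.org/pdf/2401.02385.pdf")
    except OSError as exc:  # backend not running, connection refused, or timeout
        print(f"Request failed: {exc}")
    else:
        # Mirrors the UI's "Download as Markdown" step; the filename is arbitrary.
        with open("summary.md", "w", encoding="utf-8") as handle:
            handle.write(summary)
```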
