DegreeGuru is an open-source project that demonstrates how to build a Retrieval-Augmented Generation (RAG) AI chatbot using the Vercel AI SDK, LangChain, Upstash Vector, and OpenAI. It is designed to give expert answers on custom data, with university degrees as the worked example.
Key features include:
- Built-in Crawler: Scrapes specified websites, automatically making data available for the AI.
- Real-time Performance: Delivers fast answers leveraging Upstash Vector for efficient data retrieval and real-time data streaming.
- API Protection: Incorporates rate limiting using Upstash Redis to prevent API abuse.
- Domain Agnostic: Easily adaptable to any custom dataset by modifying the crawler.yaml configuration.
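To make the API protection feature more concrete, here is a minimal sketch of fixed-window rate limiting. It is not the project's implementation: the class name `FixedWindowRateLimiter` and its parameters are hypothetical, and an in-memory dictionary stands in for the shared counter that a store such as Upstash Redis would hold.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client key."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        # Maps client key -> (window index, requests counted in that window).
        # Assumption: a deployed app would keep this state in a shared store.
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_key: str) -> bool:
        """Return True if the request fits in the current window, else False."""
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._counters.get(client_key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            self._counters[client_key] = (window, count)
            return False
        self._counters[client_key] = (window, count + 1)
        return True
```

A chat endpoint could call `allow()` with the caller's IP address or API key and reject the request when it returns False. A fixed window is the simplest variant; a sliding window smooths out bursts at window boundaries at the cost of more bookkeeping.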
The technical stack comprises:
- Crawler: Developed with Scrapy (Python) for efficient web data extraction.
- Chatbot Application: Built on Next.js, providing a modern and responsive user interface.
- Vector Database: Utilizes Upstash Vector for storing and querying vector embeddings of the scraped data.
- LLM Orchestration: Employs LangChain.js to manage interactions with large language models.
- Generative AI: Powered by OpenAI's gpt-3.5-turbo-1106 for generating expert responses.
- Embeddings: Uses OpenAI's text-embedding-ada-002 for creating vector representations of text.
- Streaming: Leverages Vercel AI for seamless text streaming in chatbot responses.
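The way these components fit together in a RAG pipeline can be sketched as follows. This is an illustration only, not the project's code: hand-written three-dimensional vectors stand in for real embeddings, a Python list stands in for the vector database, and the function names `top_k` and `build_prompt` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k passages whose vectors are most similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Place the retrieved passages in front of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up passages with toy vectors (assumption: real embeddings have far more
# dimensions and come from an embedding model, not from hand-picked numbers).
TOY_INDEX = [
    ("The MSc in Computer Science takes two years.", [0.9, 0.1, 0.0]),
    ("Applications for the BA in History close in January.", [0.1, 0.9, 0.0]),
    ("The campus cafeteria opens at 8 am.", [0.0, 0.1, 0.9]),
]

if __name__ == "__main__":
    hits = top_k([1.0, 0.0, 0.0], TOY_INDEX, k=2)
    print(build_prompt("How long is the MSc?", [text for text, _ in hits]))
```

In the real stack, the embedding model would produce the query vector, the vector database would perform the similarity search, and the resulting prompt would be sent to the chat model, with the reply streamed back to the user.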
The project provides a quickstart guide for local development that covers environment setup (Upstash Vector, Upstash Redis, and OpenAI API keys), Python library installation, and crawler configuration via crawler.yaml and settings.py. A Docker Compose option is also available for simplified deployment.
Users can customize the chatbot's behavior, including its streaming mode and the AGENT_SYSTEM_TEMPLATE, to tailor it to specific use cases.
Current limitations: the UpstashVectorStore is still a work in progress within LangChain, message history may not behave correctly in non-streaming mode, and explicitly displaying sources during streaming is difficult.
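As an illustration of what crawler configuration typically controls, the sketch below filters candidate links with allow and deny patterns, the same idea that Scrapy link extractors expose. The actual schema of crawler.yaml is not shown here, so every key and value in `CRAWLER_CONFIG` is an assumption, and `should_follow` and `filter_links` are hypothetical helpers.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical settings, shaped like a parsed YAML file might look.
# Assumption: these key names are illustrative, not the project's real schema.
CRAWLER_CONFIG: Dict = {
    "start_urls": ["https://www.example.edu"],
    "link_extractor": {
        "allow": r".*example\.edu.*",    # follow only links on this domain
        "deny": ["#", r"\?", "search"],  # skip anchors, query strings, search pages
    },
}


def should_follow(url: str, allow: str, deny: Iterable[str]) -> bool:
    """A link is followed if it matches `allow` and none of the `deny` patterns."""
    if not re.search(allow, url):
        return False
    return not any(re.search(pattern, url) for pattern in deny)


def filter_links(urls: Iterable[str], config: Dict) -> List[str]:
    """Keep only the links that the configured rules permit, in input order."""
    rules = config["link_extractor"]
    return [u for u in urls if should_follow(u, rules["allow"], rules["deny"])]
```

Pointing the crawler at a different domain then amounts to changing the start URLs and the patterns, which is what makes a configuration-driven crawler easy to retarget.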




