Summer 2024 • Hybrid
Summer 2024 | Hybrid
Project: Intelligent File Search & Vector Database Integration
When I joined Zillion Technologies as an intern, most interns were assigned exploratory or future-looking projects. But thanks to my full-stack experience, I was entrusted with a live, client-facing deliverable — a system that transformed SharePoint file repositories into an intelligent, searchable vector database.
Optimized SharePoint API Integration
Took over an existing SharePoint API implementation and significantly improved its performance and structure. Enhanced its ability to securely authenticate (secret ID, secret key, user token), recursively navigate file systems, and handle large volumes of mixed-format files more efficiently.
Vector Embedding for Semantic Search
Designed a pipeline to process document content (with OCR for image-based PDFs) and embed it into a vector database. Used the all-MiniLM-L6-v2 transformer from Hugging Face to generate embeddings for semantic search.
Chatbot Interface
Integrated a chatbot powered by OpenAI’s ChatGPT that allowed users to query the system conversationally. Results included not just content matches, but also exact file locations and metadata for precise navigation.
Delta Updating
Implemented a delta-update mechanism to selectively re-index only modified files or folders. This significantly reduced processing time and improved system scalability.
all-MiniLM-L6-v2This wasn’t a sandbox or experimental task — it was a production-grade, client-facing system. It gave me hands-on experience in:
Outcome: Delivered a working prototype that enabled real-time, context-aware document retrieval from complex SharePoint file systems — all within a few weeks.
Imagine a hospital staff member trying to locate a patient’s medical records — but they don’t remember the patient’s name, only their facial features or a vague description. With our AI-powered semantic search system, they can input that contextual information and receive the most relevant matches. The system parses embeddings to infer meaning, then returns metadata-rich file locations from SharePoint. This kind of retrieval bridges human memory with structured enterprise storage.