Are you interested in building an efficient semantic search system for your internal documents?
In this blog post, we'll introduce a new (yet also classic) embedding model that is well suited to powering your document search system. Let's dive in!
Understanding ModernBERT: The New BERT
ModernBERT, recently released by Answer.AI and LightOn, represents a significant leap forward in the BERT (Bidirectional Encoder Representations from Transformers) family of models. While the original BERT model, released in 2018, helped usher in the transformer era in NLP, ModernBERT builds upon its success with several key improvements:
Extended Context Window: ModernBERT can handle up to 8,192 tokens, a substantial increase from BERT's 512-token limit. (Tokens are subword units, so the equivalent word count is somewhat lower.) This allows for better understanding of longer documents.
Improved Efficiency: Despite its enhanced capabilities, ModernBERT remains a relatively small model (roughly 150M parameters for the base variant and 400M for the large one), making it fast and efficient for various NLP tasks.
Architectural Makeover: The model alternates between local (sliding-window) and global attention layers, which keeps long inputs affordable while improving performance across a range of tasks.
Versatility: ModernBERT can be used for multiple NLP tasks, including classification, sentiment analysis, and semantic search.
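To make the context-window difference concrete, here is a minimal sketch of how a long document would have to be split to fit an 8,192-token window versus a 512-token one. The `chunk_tokens` helper is hypothetical, and the token IDs are a stand-in list; a real pipeline would count tokens with the model's own tokenizer and reserve a few positions for special tokens.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], max_len: int, overlap: int = 0) -> List[List[int]]:
    """Split a token sequence into windows of at most `max_len` tokens.

    Consecutive windows share `overlap` tokens so that text cut at a boundary
    still appears with some context in the next chunk.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    stride = max_len - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Stand-in for a tokenized 10,000-token document (real IDs would come from a tokenizer).
document = list(range(10_000))

bert_chunks = chunk_tokens(document, max_len=512)
modernbert_chunks = chunk_tokens(document, max_len=8192)

print(len(bert_chunks))        # 20
print(len(modernbert_chunks))  # 2
```

Fewer, longer chunks mean fewer embeddings to store and less chance that a relevant passage is cut in half, which is why the larger window matters for document search.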
Curious to learn more?
Join Professor Mehdi and me for a discussion of this topic below:
What you’ll learn 🤓:
🔎 A showcase of a practical search application over 1,000 research articles.
🚀 A detailed code walkthrough of the entire process of building the search system, including:
Setting up the necessary libraries and environment
Loading and preprocessing the dataset
Implementing the embedding function using ModernBERT
Configuring and using the Milvus vector database
Creating batched insertions for improved efficiency
Implementing the search function with customizable parameters
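The overall shape of these steps can be sketched in a few lines. This is only an illustration under loud assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a ModernBERT embedding call, and `InMemoryIndex` is a hypothetical stand-in for a Milvus collection, so the snippet runs with the standard library alone. The document titles are made up for the demo.

```python
import math
import zlib
from typing import Callable, Dict, Iterator, List

Vector = List[float]


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for a ModernBERT embedding call.

    Hashes each word into one of `dim` buckets and L2-normalizes the counts,
    so a dot product between two vectors equals their cosine similarity.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class InMemoryIndex:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def insert(self, rows: List[Dict]) -> None:
        self.rows.extend(rows)

    def search(self, query_vec: Vector, top_k: int = 5, min_score: float = 0.0) -> List[Dict]:
        """Return up to `top_k` rows with cosine score >= `min_score`, best first."""
        hits = []
        for row in self.rows:
            score = sum(q * v for q, v in zip(query_vec, row["vector"]))
            if score >= min_score:
                hits.append({"id": row["id"], "text": row["text"], "score": score})
        hits.sort(key=lambda hit: hit["score"], reverse=True)
        return hits[:top_k]


def index_documents(index: InMemoryIndex, docs: List[str],
                    embed: Callable[[str], Vector], batch_size: int = 2) -> int:
    """Embed and insert `docs` batch by batch; return the number of batches sent."""
    n_batches = 0
    next_id = 0
    for batch in batched(docs, batch_size):
        rows = []
        for text in batch:
            rows.append({"id": next_id, "vector": embed(text), "text": text})
            next_id += 1
        index.insert(rows)  # one insert call per batch instead of one per document
        n_batches += 1
    return n_batches


# Made-up article titles used only for this demo.
docs = [
    "long context encoders for document retrieval",
    "vector databases for similarity search",
    "sentiment analysis with transformer encoders",
    "batch processing strategies for large datasets",
    "approximate nearest neighbor search in vector spaces",
]

index = InMemoryIndex()
n_batches = index_documents(index, docs, toy_embed, batch_size=2)
hits = index.search(toy_embed("similarity search with vector databases"), top_k=3)

for hit in hits:
    print(f"{hit['score']:.3f}  {hit['text']}")
```

In a real system, the embedding function and the index would be swapped for ModernBERT and Milvus respectively, while the batching loop and the `top_k` / `min_score` search parameters would keep roughly this shape.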
👇
Before we go on, a quick announcement:
🚀 Join Our New YouTube Membership Community!
To the many of you following us on YouTube: thank you so much for your support! 🦄
In addition to our regular updates, I’m excited to announce the launch of our membership community! Whether you’re looking to master Retrieval-Augmented Generation (RAG), AI Agents, or dive deep into advanced AI projects and tutorials through AI Unbound, there’s something for everyone passionate about AI.
By joining, you’ll gain access to exclusive content, stay ahead of the curve, and reduce AI FOMO while building real-world skills. Ready to take your AI journey to the next level?
Let’s build, learn, and innovate together!
🛠️✨ Happy practicing and happy building! 🚀🌟
Thanks for reading our newsletter. You can follow us here: Angelina (LinkedIn or Twitter) and Mehdi (LinkedIn or Twitter).
🌈 Our RAG course: https://maven.com/angelina-yang/mastering-rag-systems-a-hands-on-guide-to-production-ready-ai
📚 If you'd like to learn more about RAG systems, check out our book on the topic. You can download it for free on the course site:
https://maven.com/angelina-yang/mastering-rag-systems-a-hands-on-guide-to-production-ready-ai
🦄 Is there specific content you'd like to learn from us? Sign up here: https://noteforms.com/forms/twosetai-youtube-content-sqezrz
🧰 Our video editing tool: https://get.descript.com/nf5cum9nj1m8
📽️ Our RAG videos: https://www.youtube.com/@TwoSetAI
📬 Don't miss out on the latest updates - Subscribe to our newsletter: