Here is the first article of a series on how to build a search engine from scratch in Rust.
Feel free to give me some feedback.
https://jdrouet.github.io/posts/202503161800-search-engine-intro/
Following the introduction, here is part 1 of my series of articles on how to build a cross-platform search engine from scratch, in #rustlang.
This part covers how we'll store the encrypted data on any platform.
Enjoy reading it, and feel free to provide some feedback, here or directly on GitHub.
https://jdrouet.github.io/posts/202503170800-search-engine-part-1/
If you enjoy it, feel free to share it on other platforms!
@jdrouet You can get some pretty big performance improvements by intersecting the binary indices on the go.
Depending on how they are laid out, you can intersect any number of postings lists in linear to sublinear time, with zero memory overhead. This scales much better than intersecting hash tables.
"Search Engines: Information Retrieval in Practice" has a section discussing the technique in chapter 5.4.7.
@jdrouet This article discusses the technique in more detail with regard to skip lists, though it does (as noted in SeIRP) work with any sorted list.
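To make the idea above concrete, here is a minimal sketch of intersecting several sorted postings lists with one cursor per list. It is not the code from the articles or from the book: the function name `intersect_all`, the use of `u32` document ids, and the in-memory slices are all assumptions for illustration. It assumes each list is sorted in ascending order, and it uses a binary search over the remaining entries as a stand-in for skip pointers.

```rust
/// Intersects any number of postings lists, each sorted in ascending order.
/// Hypothetical helper: the name and the `u32` id type are illustrative only.
fn intersect_all(lists: &[&[u32]]) -> Vec<u32> {
    let mut out = Vec::new();
    // An empty input, or any empty list, means an empty intersection.
    if lists.is_empty() || lists.iter().any(|l| l.is_empty()) {
        return out;
    }
    // One cursor per list is the only working state besides the output.
    let mut cursors = vec![0usize; lists.len()];
    let mut candidate = lists[0][0];
    loop {
        let mut all_match = true;
        for (list, cursor) in lists.iter().zip(cursors.iter_mut()) {
            // Skip ahead to the first entry >= candidate. A binary search
            // over the remainder plays the role of skip pointers here.
            *cursor += list[*cursor..].partition_point(|&id| id < candidate);
            if *cursor == list.len() {
                // One list is exhausted, so no further matches are possible.
                return out;
            }
            if list[*cursor] != candidate {
                // This list jumped past the candidate: raise the candidate
                // and let the other lists catch up on the next pass.
                candidate = list[*cursor];
                all_match = false;
            }
        }
        if all_match {
            out.push(candidate);
            // Move past the matched id; stop if the id space is exhausted.
            match candidate.checked_add(1) {
                Some(next) => candidate = next,
                None => return out,
            }
        }
    }
}
```

Calling it with three slices such as `[1, 3, 5, 7, 9]`, `[3, 4, 5, 9, 10]` and `[0, 3, 9]` keeps only the ids present in all of them. The cursors never move backwards, so each list is traversed at most once, and long runs of non-matching ids are skipped rather than visited one by one.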
@marginalia really interesting! I'll have a look at it. Maybe not for the next article (although the topic is the optimisation). Thanks!
@jdrouet It's also fully possible the juice might not be worth the squeeze for these types of optimizations at the scale you're targeting. Though I figured I'd share it nonetheless, as it's genuinely a very cool optimization that's pretty intuitive.
@marginalia yeah, right now the bottleneck is not here, but more at the encryption/decryption level...