- cross-posted to:
- [email protected]
- [email protected]
Kagi has quickly grown into something of a household name within tech circles. From Hacker News and Lobsters to Reddit, the search provider seems to attract near-universal praise. Whenever the topic of search engines comes up, there’s an almost ritual rush to be the first to recommend Kagi, often followed by a chorus of replies echoing the endorsement.
Took me a while to get back to this, but yeah, I agree it seems at least conceptually solid. The big barrier is that, like jarfil mentioned, you'd need at least 200 million sites indexed, so you'd need a decent number of users for it to work, and those users would have to consent to running software that effectively logs every page they visit. There's also a privacy concern: if an indexed result can be traced back to the "node" it was pulled from, you can infer that the user behind that node visited the site. That could maybe be mitigated by having each user also download and serve index data from other users, so their own activity gets mixed indistinguishably with everyone else's, though there are probably clever attacks against that too.
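To make that mixing idea a bit more concrete, here's a rough Python sketch (all names and numbers are made up, not any real protocol): each node indexes the pages its own user actually visits, but also replicates a few random entries from peers, so holding an entry for a site no longer implies the node's owner visited it.

```python
import random
from dataclasses import dataclass, field

# Toy sketch of the "mixing" idea: a node stores index entries for pages its
# user visited, plus entries replicated from random peers, so that holding an
# entry no longer implies the node's own user visited that page.

@dataclass
class IndexEntry:
    url: str
    terms: set[str]

@dataclass
class Node:
    node_id: str
    entries: list[IndexEntry] = field(default_factory=list)

    def index_visit(self, url: str, page_text: str) -> None:
        # Index a page the local user actually visited.
        self.entries.append(IndexEntry(url, set(page_text.lower().split())))

    def replicate_from(self, peers: list["Node"], k: int = 5) -> None:
        # Pull a few random entries from each peer and mix them into local storage.
        for peer in peers:
            for entry in random.sample(peer.entries, min(k, len(peer.entries))):
                self.entries.append(entry)

    def query(self, term: str) -> list[str]:
        # The node answers queries over a blend of its own and peers' data.
        return [e.url for e in self.entries if term.lower() in e.terms]

if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    a.index_visit("https://example.org", "distributed search index example")
    b.replicate_from([a])
    print(b.query("search"))  # b serves a result for a page it never visited
```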
Structurally it seems a lot like DNS. If DNS servers were willing to store embeddings of site content and make them queryable, that would accomplish much the same idea, aside from putting it in the hands of DNS operators. Of course, it would also multiply the amount of data those servers need to hold to an impossible degree.
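As a toy illustration of what "queryable embeddings" could mean here (hand-rolled 4-dimensional vectors standing in for real content embeddings, and nothing to do with actual DNS records): the operator stores one vector per site and answers nearest-neighbour queries by cosine similarity.

```python
import numpy as np

# Stand-in embeddings: one small vector per site instead of real model output.
site_embeddings = {
    "example.org": np.array([0.9, 0.1, 0.0, 0.2]),
    "recipes.net": np.array([0.1, 0.8, 0.3, 0.0]),
    "kernel.dev":  np.array([0.2, 0.0, 0.9, 0.4]),
}

def query(vec: np.ndarray, top_k: int = 2) -> list[tuple[str, float]]:
    # Rank every stored site by cosine similarity to the query vector.
    scores = {
        site: float(np.dot(vec, emb) / (np.linalg.norm(vec) * np.linalg.norm(emb)))
        for site, emb in site_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

print(query(np.array([0.85, 0.15, 0.05, 0.1])))  # closest match: example.org
```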
I still need to read up on what a basic index actually looks like and how much storage it takes per site.
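For a very rough sense of scale (all numbers assumed, purely back-of-envelope): a single 768-dimensional float32 embedding per page is about 3 KB, which already adds up quickly at the 200-million-site mark, before any inverted-index or metadata overhead.

```python
# Back-of-envelope estimate with made-up but plausible numbers.
dims, bytes_per_float = 768, 4
per_page = dims * bytes_per_float                 # ~3 KB per page embedding
sites, pages_per_site = 200_000_000, 1            # one page per site, minimum case
total_bytes = sites * pages_per_site * per_page
print(f"{per_page} B/page, ~{total_bytes / 1e12:.1f} TB for {sites:,} pages")
```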