I feel like there are probably some ad-based search engines that are privacy- and service-oriented, but in general even for those there remains a misalignment problem. So if I don’t want to be the product now or in the future, what good search engines are there that I can pay for?
My original comment was made in good faith, but from sketchy long-term memory of stuff I’ve come across. I think it was on a Lex Fridman or similar podcast, sometime in the last 3-10 years. I may have conflated or misunderstood things, as I am not experienced with such complexity. I seem to recall it coming up around the time several astronomers were speaking publicly about issues with processing large amounts of data and soliciting solutions. I just recall wondering why search started to suck around 2017, and putting the pieces together when I heard this. Now, in retrospect, it seems many of the changes were also adversarial toward rival AI training after the Transformers paper. At least, looking at how search results are salted now, the way images are selected for search is absolutely adversarial for AI training datasets… but that is all I know, and it should be taken as friendly neighborhood water cooler talk, always with the best of intentions.
I think most startup search engines use Google/Bing because it’s free or way cheaper than running their own database, not because it’s impossible. It also likely sidesteps a lot of the SEO bullshit, simply because Google/Bing have more experience working around it.
So, short term and at small size, it’s cheaper and straight-up easier to piggyback off the big two companies than to manage your own data set. Long term, if you get popular enough to be noticed, I expect the SEO business would wreck any self-hosting search engine startup’s results pretty regularly.