I feel like there are probably some ad-based search engines that are privacy- and service-oriented, but in general, even for those, there remains a misalignment problem. So if I don’t want to be a product now or in the future, what good search engines are there that I can pay for?

  • DaGeek247@fedia.io · 6 upvotes · 14 hours ago

    > The issue is that the internet is too large to index.

    It’s really not. At least, not yet. Size is a large part of why independent indexing isn’t done, but it’s not the only reason, and I’d argue it’s not even the main one.

    A complete crawl of the internet with metadata in 2025 is only 424 TiB. For comparison, my $1,000 home setup can handle about a tenth of that (in storage, at least). The hardware to maintain a single database of the internet with metadata could easily cost under $100,000.
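
    As a rough back-of-envelope check on those numbers, here is a small sketch. It uses only the figures quoted above, plus two labeled assumptions of mine: that storage cost scales linearly with capacity, and a hypothetical 3x replication factor for redundancy. It covers storage only, not compute, bandwidth, or crawling.

```python
# Figures quoted in the comment above.
crawl_tib = 424                      # full 2025 crawl with metadata, in TiB
home_cost_usd = 1_000                # cost of the home setup
home_capacity_tib = crawl_tib / 10   # "about a tenth of that"
budget_usd = 100_000                 # the "under $100,000" ceiling

# Assumption: storage cost scales linearly with capacity.
cost_per_tib = home_cost_usd / home_capacity_tib
single_copy_cost = cost_per_tib * crawl_tib

# Assumption (hypothetical): keep 3 full copies for redundancy.
replicas = 3
replicated_cost = single_copy_cost * replicas

print(f"Cost per TiB:       ${cost_per_tib:,.2f}")
print(f"One full copy:      ${single_copy_cost:,.0f}")
print(f"{replicas} replicated copies: ${replicated_cost:,.0f}")
print(f"Within budget:      {replicated_cost < budget_usd}")
```

    Under those assumptions, even a triple-replicated copy stays well below the $100,000 figure, which leaves the rest of the budget for everything that isn't disks.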

    Dave, your comment about it costing a billion to run Bing or Google might be true, but it is completely unrelated to the realities of running a small search engine and has everything to do with the fact that they are Google and Microsoft products respectively.

    The real issue isn’t the physical size of the internet; it’s much more likely to be the complexity of making a search algorithm that can compete with the $75 billion SEO market that exists to break search engines.

    • 𞋴𝛂𝛋𝛆@piefed.world · English · 2 upvotes · 13 hours ago

      My original comment was made in good faith, but from sketchy long-term memory of stuff I’ve come across. I think it was in a Lex Fridman or similar podcast, some time in the last 3-10 years. I may have conflated or misunderstood things, as I am not experienced with such complexity. I seem to recall it coming up around the time several astronomers were speaking publicly about issues with processing large amounts of data and soliciting solutions.

      I just recall wondering why search started to suck around 2017, and putting the pieces together when I heard this. Now, in retrospect, it seems many of the changes were also adversarial toward rivals’ AI training after the Transformers paper. At least, looking at how search results are salted now, the way images are selected for search is absolutely adversarial for AI training datasets… but that is all I know, and it should be taken as friendly neighborhood water cooler talk, always with the best of intentions.

      • DaGeek247@fedia.io · 2 upvotes · 13 hours ago

        I think most startup search engines use Google/Bing because it’s free or way cheaper than running their own database, not because running their own is impossible. It also likely sidesteps a lot of the SEO bullshit, simply because Google/Bing have more experience working around it.

        So, short term and at small size, it’s cheaper and straight-up easier to piggyback off the big two companies than to manage your own data set. Long term, if you get popular enough to be noticed, I expect the SEO business would wreck any self-hosted search engine startup’s results pretty regularly.