It never ceases to amaze me the amount of effort being put into shoehorning a probability machine into being a deterministic fact-lookup assistant. The word “reliable” seems like a bit of a misnomer here. I guess only in the sense of reliable meaning “yielding the same or compatible results in different clinical experiments or statistical trials.” But certainly not reliable in the sense of “fit or worthy to be relied on; worthy of reliance; to be depended on; trustworthy.”
Since that notion of reliability has to do with “facts” determined by human beings and implanted in the model as learned “knowledge” via its training data. There’s just so much wrong with pushing LLMs as a means of accurate information. One of the problems being that supposing they got an LLM to, say, reflect the accuracy of wikipedia or something 99% of the time. Even setting aside how shaky wikipedia would be on some matters, it’s still a blackbox AI that you can’t check the sources on. You are supposed to just take it at its word. So sure, okay, you tune the thing to give the “correct” answer more consistently, but the person using it doesn’t know that and has no way to verify that you have done so, without checking outside sources, which defeats the whole point of using it to get factual information…! 😑
Sorry, I think this is turning into a rant. It frustrates me that they keep trying to shoehorn LLMs into being fact machines.
That’s an interesting take on it and I think sort of highlights part of where I take issue. Since it has no world model (at least, not one that researchers can yet discern substantively, anyway) and has no adaptive capability (without purposeful fine-tuning of its output from Machine Learning engineers), it is sort of a closed system. And within that, is locked into its limitations and biases, which are derived from the material it was trained on and the humans who consciously fine-tuned it toward one “factual” view of the world or another. Human beings work on probability in a way too, but we also learn continuously and are able to do an exchange between external and internal, us and environment, us and other human beings, and in doing so, adapt to our surroundings. Perhaps more importantly in some contexts, we’re able to build on what came before (where science, in spite of its institutional flaws at times, has such strength of knowledge).
So far, LLMs operate sort of like a human whose short-term memory is failing to integrate things into long-term, except it’s just by design. Which presents a problem for getting it to be useful beyond specific points in time of cultural or historical relevance and utility. As an example to try to illustrate what I mean, suppose we’re back in time to when it was commonly thought the earth is flat and we construct an LLM with a world model based on that. Then the consensus changes. Now we have to either train a whole new LLM (and LLM training is expensive and takes time, at least so far) or somehow go in and change its biases. Otherwise, the LLM just sits there in its static world model, continually reinforcing the status quo belief for people.
OTOH, supposing we could get it to a point where an LLM can learn continuously, now it has all the stuff being thrown at it to contend with and the biases contained within. Then you can run into the Tay problem, where it may learn all kinds of stuff you didn’t intend: https://en.wikipedia.org/wiki/Tay_(chatbot)
So I think there are a couple important angles to this, one is the purely technical endeavor of seeing how far we can push the capability of AI (which I am not opposed to inherently, I’ve been following and using generative AI for over a year now during it becoming more of a big thing). And then there is the culture/power/utility angle where we’re talking about what kind of impact it has on society and what kind of impact we think it should have and so on. And the 2nd one is where things get hairy for me fast, especially since I live in the US and can easily imagine such a powerful mode of influence being used to further manipulate people. Or on the “incompetence” side of malice and incompetence, poorly regulated businesses simply being irresponsible with the technology. Like Google’s recent stuff with AI search result summaries giving hallucinations. Or like what happened with the Replika chatbot service in early 2023, where they filtered it heavily out of nowhere claiming it was for people’s “safety” and in so doing, caused mental health damage to people who were relying on it for emotional support; and mind you, in this case, the service had actively designed it and advertised it as being for that, so it wasn’t like people were using it in an unexpected way from that standpoint. The company was just two-faced and thoughtless throughout the whole affair.