• Knock_Knock_Lemmy_In@lemmy.world
    16 hours ago

    Run something with a 70% failure rate 10 times and you get a cumulative pass rate of about 97% (1 - 0.7^10 ≈ 0.972). LLMs don’t get tired, and they can be run in parallel.
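
A minimal sketch of the retry arithmetic in the comment above, assuming every run is independent and has the same failure rate (an assumption that the replies question). The function names are hypothetical and exist only for illustration.

```python
import math


def at_least_one_success(failure_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent runs succeeds."""
    return 1.0 - failure_rate ** attempts


def attempts_needed(failure_rate: float, target: float) -> int:
    """Smallest number of independent runs whose cumulative pass rate reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(failure_rate))


if __name__ == "__main__":
    # Cumulative pass rate for a 70% failure rate over 10 runs.
    print(f"{at_least_one_success(0.7, 10):.4f}")
    # Number of runs needed to reach a 98% cumulative pass rate.
    print(attempts_needed(0.7, 0.98))
```

Under these assumptions ten runs land just under 98%, and an eleventh run crosses it.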

    • jsomae@lemmy.ml
      11 hours ago

      The problem is that the runs are not i.i.d. (independent and identically distributed), so this doesn’t really work. It works a bit, which is, in my opinion, why chain-of-thought is effective (it gives the LLM a chance to posit a couple of answers first). However, we’re already looking at “agents,” so they’re probably already doing chain-of-thought.
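
One hypothetical way to see why the i.i.d. objection matters: suppose failures depend on the task rather than on luck. The sketch below compares two made-up task sets with the same 30% average single-run success rate. In the first, every task succeeds 30% of the time. In the second, half the tasks succeed 60% of the time and half never succeed. The task sets and the `pass_at_k` name are assumptions for illustration, not something the commenter specified.

```python
from typing import Sequence


def pass_at_k(per_task_success: Sequence[float], k: int) -> float:
    """Average chance, across tasks, that at least one of k retries succeeds.

    Retries are treated as independent once the task is fixed, so all of the
    dependence between retries comes from which task was drawn.
    """
    return sum(1.0 - (1.0 - p) ** k for p in per_task_success) / len(per_task_success)


# Both hypothetical task sets average a 30% single-run success rate.
uniform_tasks = [0.3] * 10
mixed_tasks = [0.6] * 5 + [0.0] * 5

if __name__ == "__main__":
    for name, tasks in (("uniform", uniform_tasks), ("mixed", mixed_tasks)):
        print(name, f"{pass_at_k(tasks, 1):.3f}", f"{pass_at_k(tasks, 10):.3f}")
```

In this toy model the mixed set can never exceed a 50% pass rate no matter how many retries are run, because retrying does not help on the tasks the model cannot do at all.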

      • Knock_Knock_Lemmy_In@lemmy.world
        9 hours ago

        Very fair comment. In my experience, even when you increase the temperature, you get stuck in local minima.

        I was just trying to illustrate how 70% failure rates can still be useful.

    • MangoCats@feddit.it
      16 hours ago

      I have actually been doing this lately: iteratively prompting AI to write software and fix its errors until something useful comes out. It’s a lot like machine translation. I speak fluent C++ but not Rust, yet I can hammer away at the AI (with English-language prompts) until it produces passable Rust for something I could write myself in C++ in half the time and effort.

      I also don’t speak Finnish, but Google Translate can take what I say in English and put it into at least somewhat comprehensible Finnish without egregious translation errors most of the time.

      Is this useful? When C++ is getting banned for “security concerns” and Rust is the required language, it’s at least a little helpful.

      • jsomae@lemmy.ml
        11 hours ago

        I’m impressed you can make strides in Rust with AI. I’m in a similar boat, except I’ve found LLMs are terrible with Rust.

        • MangoCats@feddit.it
          11 hours ago

          I was 0/6 on various trials of AI for Rust over the past 6 months, then I caught a success. It turns out I was asking it to use a difficult library; I can’t make the thing I want work in that library either (the library docs say it’s possible, but…). When I posed a more open-ended request without specifying the library to use, it succeeded, after a fashion. It will give you code with cargo build errors; I copy-paste the error back to it like “address: <pasted error message>”, and a bit more than half of the time it is able to respond with a working fix.
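
The paste-the-error-back loop described above could be automated roughly as follows. This is a sketch under stated assumptions, not the commenter's actual setup: `ask_model` is a hypothetical stand-in for whatever chat interface is in use, `cargo_build` assumes an existing Cargo project whose only source file is `src/main.rs`, and the round limit is arbitrary.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Tuple


def cargo_build(source: str, project_dir: str) -> Tuple[bool, str]:
    """Write `source` to src/main.rs, run `cargo build`, return (ok, stderr)."""
    Path(project_dir, "src", "main.rs").write_text(source)
    result = subprocess.run(
        ["cargo", "build"], cwd=project_dir, capture_output=True, text=True
    )
    return result.returncode == 0, result.stderr


def fix_until_it_builds(
    source: str,
    build: Callable[[str], Tuple[bool, str]],
    ask_model: Callable[[str], str],  # hypothetical: prompt in, new source out
    max_rounds: int = 5,
) -> Optional[str]:
    """Feed build errors back to the model until the code builds or rounds run out."""
    ok, errors = build(source)
    rounds = 0
    while not ok and rounds < max_rounds:
        # Same prompt shape as in the comment: "address: <pasted error message>".
        source = ask_model(f"address: {errors}\n\nCurrent code:\n{source}")
        ok, errors = build(source)
        rounds += 1
    return source if ok else None
```

To try it against a real crate, pass something like `lambda src: cargo_build(src, "path/to/crate")` as `build`. Returning `None` after the round limit keeps a stuck conversation from looping forever.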

          • jwmgregory@lemmy.dbzer0.com
            5 hours ago

            i find that rust’s architecture and design decisions give the LLM quite good guardrails and kind of keep it from doing anything too wonky. the issue arises in cases like these where the rust ecosystem is quite young and documentation/instruction can be poor, even for a human developer.

            i think rust actually is quite well suited to agentic development workflows, it just needs to mature more.

            • MangoCats@feddit.it
              5 hours ago

              “i think rust actually is quite well suited to agentic development workflows, it just needs to mature more.”

              I agree. The agents also need to mature more to handle multi-level structures - work on a collection of smaller modules to get a larger system with more functionality. I can see the path forward for those tools, but the ones I have access to definitely aren’t there yet.

            • Log in | Sign up@lemmy.world
              5 hours ago

              Ah, my bad, you’re right. For the chance of being consistently correct, I should have done 0.3^10 = 0.0000059049,

              so the chances of it being right ten times in a row are less than one thousandth of a percent.

              No wonder I couldn’t get it to summarise my list of data right and it was always lying by the 7th row.
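
The arithmetic above can be checked with a short sketch, assuming each of the ten attempts succeeds independently with probability 0.3. The `all_succeed` name is hypothetical.

```python
def all_succeed(success_rate: float, attempts: int) -> float:
    """Chance that every one of `attempts` independent runs succeeds."""
    return success_rate ** attempts


if __name__ == "__main__":
    p = all_succeed(0.3, 10)
    # Print the probability as a fraction and as a percentage.
    print(f"{p:.10f}")
    print(f"{p * 100:.6f}%")
```

This is the opposite question from the retry case: it asks for all ten runs to succeed rather than at least one of them.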

            • jwmgregory@lemmy.dbzer0.com
              5 hours ago

              don’t you dare understand the explicitly obvious reasons this technology can be useful and the essential differences between P and NP problems. why won’t you be angry >:(