• BakedCatboy@lemmy.ml
    13 hours ago

    AIUI, back-feeding uncurated slop into training is a real problem, but curated slop is fine. So they can either curate slop, which takes effort, or scrape websites, which is almost free. Even though synthetic training data works, they still prefer scraping because it’s easier and cheaper, often free.