@davidgerard@circumstances.run
The thing about gpt-oss is, the market for LLM-at-home is nerd experimenters: the home LLM runners on LocalLLaMA. there isn't a market as such.
there are cloud providers offering it as a model you could use
the purpose of gpt-oss seems to be marketing: pure catch-up, because llama and deepseek and qwen are open weights. look, not left behind.
and apparently it's pretty good as a home model? if that's your bag
OpenAI censored the shit out of it because that risks bad press less than not censoring it - and breaking the censorship appears not that hard
this thread on how gpt-oss seems to have been trained is hilarious: https://xcancel.com/jxmnop/status/1953899426075816164
this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else. i could easily write 250 words on this thing but i'm not sure they'd be ones i'd care to read either
and it truly is a tortured model. here the model hallucinates a programming problem about dominoes and attempts to solve it, spending over 30,000 tokens in the process
completely unprompted, the model generated and tried to solve this domino problem over 5,000 separate times
@michael@westergaard.social
there is absolutely a market other than home enthusiasts for open models. i have tons of customers who think they need to add AI to their applications but do not wish to send their data abroad or to US big tech companies, even if the models are hosted locally.
if using LLMs to write code were actually useful, i'd strongly argue for self-hosting a model for our developers (and optionally customers) over uploading it to GitHub, where Microsoft can do whatever it likes with it. LLMs can (sort of) help with extracting information from documents into a structured format, or aid natural-language search, and for those purposes i would argue for a self-hosted model over sending your internal documents to a US web service
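to illustrate that extraction use case: a minimal sketch, assuming the self-hosted model sits behind an OpenAI-compatible chat endpoint on localhost (llama.cpp's server and Ollama both expose one). the endpoint URL, model name, and invoice fields are made up for the example, not anything from the thread.

```python
# Sketch: structured extraction against a self-hosted model.
# Assumes an OpenAI-compatible chat-completions server on localhost;
# the URL, model name, and fields below are illustrative only.
import json
import urllib.request

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # local server, data stays on your network
MODEL = "gpt-oss-20b"  # whatever model the server has loaded

def extract_invoice_fields(document_text: str) -> dict:
    """Ask the local model to pull a few fields out of a document as JSON."""
    prompt = (
        "Extract the supplier name, invoice date and total amount from the "
        "document below. Reply with JSON only, using the keys "
        '"supplier", "date", "total".\n\n' + document_text
    )
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return json.loads(reply)  # raises if the model didn't return clean JSON

if __name__ == "__main__":
    print(extract_invoice_fields(
        "Invoice 2024-017 from Example GmbH, total EUR 1,200, dated 2024-03-01"
    ))
```

nothing here leaves the local network, which is the whole point: the same call shape works whether the box is in your rack or a customer's.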
we can sell self-hosted models to EU customers much better as an EU company than a US company can, and we can sell them much better if the model is recognizable (gpt-oss) or at least works decently (llama) than if it is Chinese (deepseek) or shit (mistral)
there is little use for open models for regular people, but there is for enterprise customers