I have a lot of thoughts about this. https://blog.pragmaticengineer.com/stack-overflow-is-almost-dead/
@anildash
I very much look forward to reading them
@anildash if stackoverflow was a major source in training current LLMs, what will next-generation LLMs be trained on?
@filmvanalledag @anildash why, the previous generation LLMs.
@anildash is it the same for the rest of Stack Exchange? How much of SE is archived and available elsewhere? (E.g. the Internet Archive)
@anildash Is this possibly just chat-gpt weeding out the dumb questions and the ones still being asked are golden?
(I'd like a deeper analysis than: "frown face".)
@anildash "Plus, ChatGPT is polite and answers all questions, in contrast to StackOverflow moderators."
Yep.
@codinghorror @anildash SO's average moderator is more polite than Wikipedia's average Talk page denizen imo, to be fair.
SO is just like a lot of older web communities, there are people who have been there for a long time and they have varying levels of tolerance for newbies. MetaFilter is no different. It just turns out that people like talking to no-judgment robots more than many of us expected. (I was just editing the WP page for Replika and some of the science around that is wild)
@jessamyn @codinghorror @anildash It gets hard to show an exceptional level of tolerance when 'questions' are posted where simply entering the title into the search field would reveal that it has already been answered. Or newbs trying to rules-lawyer why their off-topic question is of course valid. Or most common: people complaining that an answer does not support what they wanted to hear.
So yeah, once one has spent some time on any SE site, tolerance gets strained.
@Raffzahn @codinghorror @anildash I feel like any community website has to really build tools and experiences based on the outcomes they're looking for. I think SO was really trying to maximize good answers. every community has to figure out how much it's important to support the long-term users versus the newer users because usually there is some kind of essential tension there.
@jessamyn @codinghorror @anildash Well, yes.
I think one relevant part here is that first-time users often treat SO/SE like a magical mirror, created to serve them, not humans who spend their time without payment or the like to help. (An image AI fits well)
With that mindset they ignore basic rules of interaction - like first checking on their own and being thankful for the service they get gifted by all users answering/commenting - including mods.
@Raffzahn @jessamyn @codinghorror I’m well aware of what the failure states were. But knowing that these problems arose constantly, I also think mods too often took satisfaction in being unkind to people whose first experience of the community would then be encountering rank hostility from the most-tenured members of the site. They could have tried to actually solve the core issues with empathy instead of lashing out at inexperienced coders.
@anildash @jessamyn @codinghorror Sounds like a very one-sided picture you're drawing here.
@jessamyn @codinghorror @anildash ChatGPT always tells me my question is insightful and does its best to give me a helpful answer. It's kind of awful and creepy but it sure makes me more willing to ask any sort of question no matter how simple or dumb.
@nelson @jessamyn @codinghorror it’s also destructive that it’s scooped up all the code and questions of the web without giving anything back to the commons in return. But I am not surprised people picked the obsequious option over the one that made them feel dumb.
@anildash @nelson @jessamyn @codinghorror Not even that. I used Stack Overflow a lot when a question would get a correct, if unkind, answer. It started to get answers that were incorrect, but also voted up.
@sayrer @nelson @jessamyn @codinghorror yeah, and then the combination of both of those things together could be deadly for people’s perception, especially folks from underrepresented communities
@anildash @nelson @jessamyn @codinghorror Yes, the general phenomenon is "STEM bullying". That stuff even hits me sometimes.
Imagine going back and reading TLS 1.1 or SOAP or whatever. You'll find it in every case.
It's an adversarial mindset.
@sayrer @anildash @jessamyn @codinghorror I'm certainly guilty of that mindset back when I was a young brash engineer. It was how I was taught, particularly at MIT and early Google. I've been trying to unlearn it ever since with mixed success.
@jessamyn @codinghorror Stack eventually started to improve on its hostility to newbies, but not quickly enough. Ultimately, moderators wrongly decided that it was impossible to have reliable, reference-quality information without also being condescending or patronizing to new coders, and the community suffered deeply for it. They could simply have come up with norms for routing & directing new users, instead of scolding & demeaning those who didn’t yet know the norms.
@anildash @jessamyn Blerg. Killing off such a major source of the training data would seem to be a bad thing that will likely hasten the textpocalypse that @mkirschenbaum has written about -> https://www.theatlantic.com/technology/archive/2023/03/ai-chatgpt-writing-language-models/673318/
@anildash tbf, while LLMs give crap answers, many times SO was wrong too. To the point where I joked that I'd block SO if I ever founded a company.
@anildash questions asked != answers viewed… would be interesting to see if the views (minus bots) are also decreasing. I still prefer a source such as StackOverflow to AI….
I also wonder to what extent a large proportion of questions do not need to be asked any more (because there is already an answer)
@csolisr ChatGPT is not following cc-by-sa because it does not credit the source of the content and it is not distributed under sharealike terms. (Copyleft is not public domain.) This is why I stopped contributing to Stack Overflow. They violated the license terms I contributed under, and when I accordingly attempted to remove my content, they banned me for violating the license (?)
@anildash the answers became worse as it got more popular, and it's not hard to see LLMs getting there too.
@anildash This post misses a major point: >90% of all questions posed on SO could have been solved by simply entering the title into Google and browsing thru the first three links (pre-AI Google, that is).
AI is nothing more than a search engine that simplifies (and sugarcoats) that task.
This of course only leaves questions going for SO/SE that can't be solved by easy lookup.
@anildash thoughts me