Date: 2001-07-22

Trolls, haters, flamers and other ugly characters are, unfortunately, a fact of life across much of the internet. Their ugliness ruins social media networks and sites like Reddit and Wikipedia.

But toxic content looks different depending on the venue, and identifying online toxicity is a first step to getting rid of it.

[...] To better understand what toxicity looked like in the open-source community, the team first gathered toxic content. They used a toxicity and politeness detector developed for another platform to scan nearly 28 million posts on GitHub made between March and May 2020. The team also searched these posts for "code of conduct" — a phrase often invoked when reacting to toxic content — and looked for locked or deleted issues, which can also be a sign of toxicity.
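The three signals described above (a detector score, the phrase "code of conduct," and locked or deleted issues) can be sketched as a simple candidate filter. This is only an illustration, not the team's actual pipeline: the `Post` fields, the 0.8 threshold and the function names are all hypothetical, and the toxicity score is assumed to come from some external detector that is not shown here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Post:
    """A hypothetical, simplified record for one GitHub post."""
    text: str
    toxicity_score: float  # assumed output of an external detector, 0.0 to 1.0
    locked: bool = False   # the issue thread was locked
    deleted: bool = False  # the issue or comment was deleted


# Hypothetical cutoff; the article does not say what threshold was used.
TOXICITY_THRESHOLD = 0.8


def candidate_reasons(post: Post) -> List[str]:
    """Return the signals that mark this post as a possible toxic post."""
    reasons = []
    if post.toxicity_score >= TOXICITY_THRESHOLD:
        reasons.append("detector")
    if "code of conduct" in post.text.lower():
        reasons.append("code_of_conduct")
    if post.locked or post.deleted:
        reasons.append("locked_or_deleted")
    return reasons


def select_candidates(posts: List[Post]) -> List[Tuple[Post, List[str]]]:
    """Keep only posts that trip at least one signal, with the reasons."""
    selected = []
    for post in posts:
        reasons = candidate_reasons(post)
        if reasons:
            selected.append((post, reasons))
    return selected
```

A filter like this only produces candidates. A locked thread or a mention of the code of conduct is a hint rather than proof of toxicity, so the selected posts would still need to be read and judged by people.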

[...] "Toxicity is different in open-source communities," Miller said. "It is more contextual, entitled, subtle and passive-aggressive."

Only about half the toxic posts the team identified contained obscenities. Others came from demanding users of the software, and some came from users who post a lot of issues on GitHub but contribute little else. In other cases, comments that began as discussion of a project's code turned personal. None of the posts helped make the open-source software or the community better.

"Worst. App. Ever. Please make it not the worst app ever. Thanks," wrote one user in a post included in the dataset.

[...] "We've been hearing from developers and community members for a really long time about the unfortunate and almost ingrained toxicity in open-source," Miller said. "Open-source communities are a little rough around the edges. They often have horrible diversity and retention, and it's important that we start to address and deal with the toxicity there to make it a more inclusive and better place."