Datasets
Datasets and Context:
4chan
4chan is an imageboard website where users can post anonymously. Users primarily participate in threaded discussions in response to an original post containing an image. Threads are organized into “boards”: each thread belongs to exactly one board, and a board can hold many threads. One of the most popular boards is “/pol/” (“politically incorrect”), which was rebranded from the “/new/” board and is where many internet attacks and threats of real-world violence have been found. Content and users across the site suggest a free-speech maximalist ideology.
Data Types Collected: User Profiles, Posts, Comments
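The many-to-one relationship between threads and boards described above can be illustrated with a minimal sketch. The `Thread` class, its field names, and the sample values are hypothetical and chosen only for illustration; they are not the schema of the actual dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    thread_id: int  # hypothetical identifier, not a real dataset field
    board: str      # each thread names exactly one board
    subject: str


def group_by_board(threads):
    """Return a mapping of board name -> list of threads on that board."""
    boards = defaultdict(list)
    for thread in threads:
        boards[thread.board].append(thread)
    return dict(boards)


# Made-up sample data: two threads on one board, one thread on another.
threads = [
    Thread(1, "pol", "example thread A"),
    Thread(2, "pol", "example thread B"),
    Thread(3, "g", "example thread C"),
]
boards = group_by_board(threads)
```

Because the board is stored on the thread, a thread cannot appear under two boards, while a board can accumulate any number of threads.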
8kun
8kun, previously called 8chan, is an imageboard site where anonymous users respond in a threaded format to an original post. The site was created in 2013 as a free speech alternative to 4chan after 4chan began banning topics. Like 4chan, 8kun organizes its threads into “boards”. Activity on the site has been linked to several mass shootings and terrorist events, including three in 2019 (Christchurch, New Zealand; Poway, CA; and El Paso, TX). The site is also known as the home of Q, the user behind the notorious QAnon conspiracies. 8kun’s founder no longer controls the site and has since advocated for it to be shut down because of the real-world violence attributed to its use.
Data Types Collected: User Profiles, Posts, Comments
BitChute
BitChute is a British video sharing site founded in 2017 and commonly used as an alternative to YouTube. BitChute says it promotes freedom of expression, decentralized distribution, and creator empowerment, and that it opposes censorship, mob rule, and platform bias. It does moderate content to some extent, though much of that moderation has come as a result of government oversight. Many of its prominent accounts migrated to the platform after being banned or demonetized on other social networks.
Data Types Collected: User Profiles, Comments, Video Metadata
Bluesky
Bluesky is an American text-based, decentralized social network created by a group of former Twitter employees. Also known as “Bluesky Social”, it is a microblogging social network that uses the “AT Protocol”. It officially branched out from Twitter in 2021 but maintains a similar feel and user experience. Moderation, however, works differently at Bluesky than on many legacy platforms. Under an approach called ‘Composable Moderation’, Bluesky applies a ‘basic default’ level of moderation and leaves additional layers for individual users to choose.
Data Types Collected: User Profiles, Posts
Fediverse
The Fediverse is a network of decentralized platforms that give users more control and autonomy. These networks are often backed by servers running open-source code and maintained by pseudonymous administrators. The open-source software running on these servers, such as Mastodon or Lemmy, implements shared communication protocols that allow the servers to “federate” and share information with one another. One of these protocols is “ActivityPub”, which defines how servers exchange activities with each other (server-to-server federation).
Data Types Collected: User Profiles, Posts
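As a rough illustration of what federating servers exchange over ActivityPub, the sketch below builds a “Create” activity wrapping a “Note”, using the ActivityStreams 2.0 vocabulary. The `make_create_note` helper and the `example.social` URLs are hypothetical; the sketch only constructs and serializes the JSON document and makes no network request, and a real server would also sign the delivery and send it to the recipient's inbox.

```python
import json

# ActivityStreams 2.0 namespace and the special "public" audience identifier.
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def make_create_note(actor, note_id, content):
    """Build a minimal Create activity wrapping a Note (hypothetical helper)."""
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": actor,
        "to": [AS_PUBLIC],
        "object": {
            "id": note_id,
            "type": "Note",
            "attributedTo": actor,
            "content": content,
            "to": [AS_PUBLIC],
        },
    }


# Placeholder actor and note URLs on a made-up server.
activity = make_create_note(
    actor="https://example.social/users/alice",
    note_id="https://example.social/notes/1",
    content="Hello, Fediverse",
)
payload = json.dumps(activity, indent=2)
```

The serialized `payload` is the kind of document one server would deliver to another; the receiving server decides how to store and display it.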
Gab
Gab is an American social media platform that was launched in 2016 as an uncensored alternative to mainstream social media platforms.
Gab has been a subject of controversy because of the extremist content on the platform. Notably, the gunman responsible for killing 11 people at a Pittsburgh synagogue in 2018 had previously posted antisemitic content on the platform.
Data Types Collected: User Profiles, Posts, Comments
Gettr
Gettr is an American social media platform that was launched in July 2021. Gettr positions itself as a platform for free speech and an alternative to mainstream social media. It has found some traction among Brazil’s far right, and Jair Bolsonaro has an account. Reports have tied the self-exiled businessman Guo Wengui to the platform as its source of funding.
Data Types Collected: User Profiles, Posts, Comments
LBRY / Odysee
LBRY is a blockchain-based, peer-to-peer file-sharing and payment network. It was founded in 2015 and served as the foundation for decentralized platforms like social networks and video sharing platforms. One of its founders described it as the most censorship-resistant system to ever exist. It was shut down in 2023 following a lawsuit brought by the SEC for selling unregistered securities.
Odysee, a subsidiary of LBRY, is a fringe decentralized alternative to YouTube and has emerged as LBRY’s successor. White supremacists and other extremists have found a home at Odysee because of its stance on moderation.
Data Types Collected: User Profiles, Comments, Video Metadata
MeWe
MeWe is an American alt-tech social networking platform launched in 2011, originally under the name Sgrouples. It is billed as the anti-Facebook because it does not moderate content on its platform, which has allowed extremism and disinformation to proliferate. MeWe gained popularity and many new users in early 2021, when Donald Trump and many of his supporters were banned or removed from platforms like YouTube, Facebook, and Twitter.
Data Types Collected: User Profiles, Posts, Comments
Minds
Minds is an American peer-to-peer, blockchain-based social network. It was launched in 2015 as a free speech, minimally moderated alternative to Facebook where users can earn crypto rewards for platform engagement. Its founders say they allow extremist content as part of an effort to deradicalize users through discourse. Like many of the other platforms in these datasets, Minds saw a large influx of users following the January 6th US Capitol attack and the removal of many thousands of users from Twitter and Facebook.
Data Types Collected: User Profiles, Posts, Comments
OK
Odnoklassniki (OK), or “Classmates” in Russian, is a social media network founded in 2006. For the first few years of its existence, OK was the most popular website in Russia. In 2010, OK merged with VK and monopolized the Russian social media landscape. Like many other platforms in these datasets, OK has little to no content moderation. The Texas man who killed 8 people in a mass shooting in 2023 had previously posted neo-Nazi content to the platform.
Data Types Collected: User Profiles, Posts, Comments
Parler
Parler is an American alt-tech microblogging social network. Temporarily shuttered in April 2023 following an acquisition, Parler has reemerged as a place for maximal free speech and little content moderation. Parler is known as one of the primary social networking sites used to coordinate the January 6th storming of the US Capitol.
Data Types Collected: User Profiles, Messaging, Groups, Media
Poal
Poal is an American alt-tech threaded forum site modeled after the more mainstream Reddit. Poal says it maintains a free speech approach with its community and has implemented very little content oversight. As in other datasets, this lack of oversight leads to harmful and harassing posts, including online disinformation campaigns and antisemitic and white nationalist propaganda.
Data Types Collected: User Profiles, Posts, Comments
RuTube
RuTube is a Russian video platform and alternative to YouTube founded in 2006. Now owned by Gazprom Media, RuTube has been used to push Wagner and state-authored talking points, online disinformation, and propaganda. Its library of licensed content, which carries state-sponsored material, includes movies, series, cartoons, shows, and live broadcasts. It also hosts blogs, podcasts, and video game streams.
Data Types Collected: User Profiles, Comments, Video Metadata
Scored
Scored (formerly known as Communities.win, Win Communities, and The Donald) is a collection of alt-tech, thread-based discussion forums that operate much like their more mainstream counterpart, Reddit. The sites first came into existence when Reddit banned the subreddit r/The_Donald in 2020. Users responded by creating their own site, thedonald.win. Scored claims to “unblur the lines between entertainment and politics”. The Scored community c/TheDonald remains a very popular channel for users to discuss January 6th, conspiracy theories, and extremist rhetoric.
Data Types Collected: User Profiles, Posts, Comments
Telegram
Telegram Messenger, commonly known as Telegram, is an encrypted, cross-platform, cloud-based messaging application. Telegram was founded in 2013 by the founders of VK and has its operational center in Dubai. Telegram’s data schema consists of channels, which users can join to post messages, images, videos, or other media. The Open Measures Telegram dataset includes activity from extremist and neo-Nazi groups in the United States and coordinated state-backed disinformation campaigns throughout Europe and Africa.
Data Types Collected: User Profiles, Messaging, Groups, Media
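The channel-centered layout described above can be sketched as a small data model. The `Channel` and `Message` classes, their fields, and the sample values are assumptions made for illustration; they do not reproduce the actual Telegram or Open Measures schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    sender: str
    text: str = ""
    media_type: Optional[str] = None  # e.g. "image" or "video"; None if text-only


@dataclass
class Channel:
    name: str
    messages: List[Message] = field(default_factory=list)

    def post(self, message: Message) -> None:
        """Append a message to this channel."""
        self.messages.append(message)

    def media_counts(self) -> Counter:
        """Count messages by media type, ignoring text-only messages."""
        return Counter(m.media_type for m in self.messages if m.media_type)


# Made-up channel and messages.
channel = Channel("example_channel")
channel.post(Message("user_a", text="hello"))
channel.post(Message("user_b", media_type="image"))
channel.post(Message("user_a", text="clip", media_type="video"))
channel.post(Message("user_c", media_type="image"))
counts = channel.media_counts()
```

Keeping the media type on each message makes it simple to summarize what kinds of content a channel carries without inspecting the media itself.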
VK
Founded in 2006, Vkontakte (VK) is a Russian social networking site. VK is based in Saint Petersburg and is considered to be the Russian Facebook. It was created by the same founders who later launched Telegram and is still one of the most popular websites in Russia. It has light content moderation and loose enforcement against policy-violating content. In 2021, VK’s parent company (VK Group) sold majority ownership to Gazprom, effectively making VK a state-run company.
Data Types Collected: User Profiles, Messaging, Groups, Media
Wimkin
Wimkin is an American alt-tech social network founded in 2017. It promotes itself as a free speech alternative to traditional social media, and its user experience is seen as a combination of Twitter and Facebook. Wimkin was pulled from major app stores in January 2021 following calls to violence relating to the storming of the US Capitol. The platform has since returned and maintains its lax policies on content moderation, describing itself as “100% Uncensored Social Media”.
Data Types Collected: User Profiles, Posts, Comments