Data

Below we have linked a number of relevant datasets and organisations with rich data for the field. To add new datasets, please send a message to Javier or Anna.

Open data

Open data is data that can be freely accessed, used, re-used and redistributed by anyone – subject, at most, to requirements that preserve provenance and openness.

1. Social Networks Data

1.1 Twitter

Twitter recently launched its new academic API, which gives access to historical tweets with a generous data cap and rate limits (10 million tweets a month; 50 requests per 15 minutes). This is historic in itself: researchers can now search old tweets, which was not possible under the earlier 7-day limit. Researchers can apply fairly easily through an online form.

  • Count data on n-grams can be visualized/extracted easily via StoryWrangling
  • Capture tweets continuously with DMI-TCAT
  • Use the academic search API with 4CAT
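As a rough planning aid, the quota figures quoted above (10 million tweets a month; 50 requests per 15 minutes) can be turned into a minimum pause between requests and a lower bound on collection time. This is a minimal sketch: the function names are illustrative and not part of any Twitter library, and the limits are simply the ones stated above, so check the current terms before relying on them.

```python
import math

# Limits as quoted in the text above; treat them as assumptions.
MONTHLY_TWEET_CAP = 10_000_000
REQUESTS_PER_WINDOW = 50
WINDOW_SECONDS = 15 * 60


def min_pause_seconds(requests_per_window=REQUESTS_PER_WINDOW,
                      window_seconds=WINDOW_SECONDS):
    """Seconds to wait between requests to stay within the rate limit."""
    return window_seconds / requests_per_window


def months_needed(total_tweets, monthly_cap=MONTHLY_TWEET_CAP):
    """Lower bound on the number of monthly quotas a collection uses."""
    return math.ceil(total_tweets / monthly_cap)


if __name__ == "__main__":
    print(min_pause_seconds())        # pause between requests, in seconds
    print(months_needed(25_000_000))  # monthly quotas for 25 million tweets
```

Spacing requests evenly is only one way to respect a rate limit; tools such as 4CAT and DMI-TCAT handle this for you.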

1.2 Reddit

Reddit data is rather easy to obtain, either via the Reddit API or, probably preferably, via the Pushshift initiative; see the documentation on Read the Docs for how to use it.

1.3 YouTube

YouTube Data Tools (YTDT) is a collection of simple modules for extracting data from the YouTube platform via the YouTube API v3. It is not a mashup or fully developed analytics software, but a means for researchers to collect data in standard file formats to analyze further in other software packages.

1.4 Other social networks: 4CAT – Reddit, 8chan, 4chan, 8kun, Telegram, Twitter, Tumblr

The Digital Methods Initiative at UvA Humanities makes its tool 4CAT available to researchers affiliated with UvA. 4CAT gathers data from various social media platforms (and Breitbart) and also provides some analysis tools through a user-friendly interface. You can also easily install it yourself via Docker. This page offers a good overview of how to use it.

2. Survey Panels

2.1 LISS panel

The LISS panel consists of about 5,000 households, comprising about 7,500 individuals. It is based on a true probability sample of households drawn from the population register by CBS. It is accessible via ODISSEI.

2.2 European Social Survey

The European Social Survey (ESS) is an academically driven cross-national survey that has been conducted across Europe since its establishment in 2001. Every two years, face-to-face interviews are conducted with newly selected, cross-sectional samples.

2.3 GESIS panel

The GESIS Panel offers the social science community a unique opportunity to collect survey data within a probability-based mixed-mode access panel. Data collection and data usage for academic purposes are in most cases free of charge.

3. CBS (Statistics Netherlands)

Microdata are linkable data at the level of individuals, companies and addresses which can be made available to Dutch universities, scientific organisations, planning agencies and statistical authorities within the EU under strict conditions for statistical research.

They are accessible through CBS (long application process), ODISSEI, POPNET (collaborations), and IAS (collaborations).

4. Digital text

4.1 The Wayback Machine

The Wayback Machine saves internet pages at different moments in time. It is free and easy to access.
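For scripted access, archived copies can be addressed by URL. The sketch below builds a query for the Internet Archive's availability endpoint and a direct link to a snapshot; it makes no network request itself. The function names are illustrative, and the endpoint and URL pattern are given as we understand them, so verify them against the Internet Archive documentation before use.

```python
from urllib.parse import urlencode

# Assumed endpoint and URL pattern; check the Internet Archive docs.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SNAPSHOT_PREFIX = "https://web.archive.org/web"


def availability_query(page_url, timestamp):
    """URL that asks for the snapshot closest to timestamp (YYYYMMDD)."""
    params = urlencode({"url": page_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


def snapshot_link(page_url, timestamp):
    """Direct link to the archived copy nearest to timestamp (YYYYMMDD)."""
    return f"{SNAPSHOT_PREFIX}/{timestamp}/{page_url}"


if __name__ == "__main__":
    print(availability_query("example.com", "20060101"))
    print(snapshot_link("example.com", "20060101"))
```

The resulting URLs can be opened in a browser or fetched with any HTTP client.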

4.2 KB Collections – Koninklijke Bibliotheek Collections

Some of the KB collections of webpages and blogs are now being made available through the TWIXL project.

4.3 CLARIAH

CLARIAH focuses on humanities research and data from the cultural sector. Here is an overview of all CLARIAH data and, specifically, their so-called Media Suite.