This analysis project was carried out by Gerrit Anders and Jürgen Buder from IWM Tuebingen. For project-related queries, please contact Gerrit Anders at g.anders@iwm-tuebingen.de.

Project Overview

This repository hosts a general data analysis framework used to investigate reply behaviour and polarization in the comment section of "Spiegel Online" (SPON). The research focuses on understanding uncongeniality within a large online sample and on examining polarization in online discussions.

The dataset for analysis can be found on the Open Science Framework.

Setup

To set up the analysis, follow the steps below:

Clone the Repository

git clone https://gitea.iwm-tuebingen.de/ganders/uncongeniality_analysis.git

Install non-Python prerequisites

To run the analysis, R must be installed. The analysis was conducted using R version 4.1.1; please install R from the official website. Installing development tools is also recommended. On Debian-based Linux systems this can be done via apt-get:

sudo apt-get update
sudo apt-get install build-essential

Furthermore, to enable the generation of pdf reports, pandoc and texlive need to be installed. On Debian-based Linux systems both can be installed via apt-get:

sudo apt-get install pandoc
sudo apt-get install texlive-latex-recommended

The code also runs without these tools if the pdf flag in the configuration file is set to false. Please note that the markdown versions of the result reports use relative paths to images, so the images are only displayed when the reports are viewed from within the results_reports folder (in contrast to the pdf reports).
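Before running the analysis, it can help to confirm that the external tools are actually reachable. The following is a minimal sketch, not part of the repository, that uses only the standard library. It assumes that R is invoked through its Rscript executable and that pdflatex (pandoc's default LaTeX engine) comes with the texlive packages above; the function name missing_prerequisites is hypothetical.

```python
import shutil


def missing_prerequisites(pdf: bool = True) -> list[str]:
    """Return the external tools that cannot be found on PATH.

    Assumption: R is called via Rscript; pandoc and pdflatex are only
    needed when pdf reports are enabled in the configuration.
    """
    tools = ["Rscript"]
    if pdf:
        tools += ["pandoc", "pdflatex"]
    # shutil.which returns None when an executable is not on PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_prerequisites(pdf=True)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Calling it with pdf=False mirrors running the framework with the pdf flag disabled, where only R is required.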

Install requirements

The code was tested under Python 3.10.2. It is recommended to run the code in a virtual environment, which can be created and activated with the following commands:

python3 -m venv venv
source venv/bin/activate

To install the required Python packages, run:

pip3 install -r requirements.txt

Running analysis

To run an analysis with this framework, adapt the configuration file config.yaml to the desired settings. Set data_path to the directory in which the dataset from the Open Science Framework is stored.

To replicate the analysis reported in "Polarizing reply patterns in comment sections of a large German news outlet", all other settings can be left unchanged.
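A quick sanity check of the two settings mentioned above can catch a wrong path before a long run. This is a hypothetical helper, not part of the framework: it works on the configuration once it has been parsed into a dictionary, and it assumes the pdf flag is stored under a key named pdf, which may differ in the actual config.yaml.

```python
from pathlib import Path


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in an already-parsed configuration.

    Only data_path and the (assumed) pdf key are checked; the real
    config.yaml may contain further settings.
    """
    problems = []
    data_path = config.get("data_path")
    if data_path is None:
        problems.append("data_path is not set")
    elif not Path(data_path).is_dir():
        problems.append(f"data_path {data_path!r} is not a directory")
    # YAML true/false values are parsed into Python booleans.
    if "pdf" in config and not isinstance(config["pdf"], bool):
        problems.append("pdf must be true or false")
    return problems
```

The dictionary could come from, for example, PyYAML's yaml.safe_load applied to config.yaml, if PyYAML is available in the environment; an empty list means both checks passed.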

The analysis can be run by executing the following command:

python3 main.py

Configuring analysis

This framework allows running a wide range of analyses on a dataset or its subsets by defining analysis jobs as YAML files. Each such file consists of four parts:

preprocessing: defines all subsets of the dataset that will be targeted in the analysis
descriptive: defines all descriptive analyses that will be conducted
analysis: defines all other analysis jobs that will be conducted (e.g. regression, correlation, etc.)
visualizations: defines all visualizations that will be created (e.g. histograms, scatterplots, etc.)

Examples of all supported analyses and their arguments can be found in the analysis_config_templates folder. Each analysis job entry consists of a tag that names the analysis type, followed by a list of arguments. The name argument is mandatory and is used to identify the analysis and to name its output files. Please note that the dataset argument refers to the names of the datasets defined in preprocessing. Some analyses (e.g. forest plots) require additional information referring to specific models that are also defined in the analysis job.
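The cross-references described above (a mandatory name, and dataset arguments that must match a preprocessing subset) can be checked mechanically. The sketch below is hypothetical: it assumes the job YAML has been parsed into a dictionary with the four parts as keys, each holding a list of entries shaped like {tag: {argument: value}}. The tags and names used in the example (subset, summary_statistics, regression, histogram, all_comments) are illustrative only; the real layout is shown in the analysis_config_templates folder.

```python
def _entries(job: dict, part: str):
    """Yield (tag, arguments) pairs for one part of a parsed job."""
    for entry in job.get(part) or []:
        for tag, args in entry.items():
            yield tag, args or {}


def validate_job(job: dict) -> list[str]:
    """Report missing names and dataset references that preprocessing does not define."""
    datasets = {args.get("name") for _, args in _entries(job, "preprocessing")}
    problems = []
    for part in ("descriptive", "analysis", "visualizations"):
        for tag, args in _entries(job, part):
            if "name" not in args:
                problems.append(f"{part}/{tag}: missing mandatory 'name'")
            dataset = args.get("dataset")
            if dataset is not None and dataset not in datasets:
                problems.append(f"{part}/{tag}: unknown dataset {dataset!r}")
    return problems


# Hypothetical job: the regression points at an undefined subset and the
# histogram lacks its name, so validate_job reports two problems.
example_job = {
    "preprocessing": [{"subset": {"name": "all_comments"}}],
    "descriptive": [{"summary_statistics": {"name": "overview", "dataset": "all_comments"}}],
    "analysis": [{"regression": {"name": "model_1", "dataset": "replies_only"}}],
    "visualizations": [{"histogram": {"dataset": "all_comments"}}],
}
```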

Output

An analysis job creates three types of outputs:

A markdown report in the results_reports folder that lists, for each analysis, its settings and its result
A pdf report in the results_reports folder, converted from the markdown report, that can be shared (only if the pdf flag is set to true)
For each analysis, a file in the results folder that contains the results of that analysis and is named after the analysis

Contributing: Extending the framework

If you want to extend the framework with new analyses, you can do so by following these steps:

Fork the repository.
Add your analysis function class to the analysis_functions folder.
Write a wrapper function for your analysis that takes a list of job arguments and calls the analysis.
Add the wrapper function to analysis.py and extend that file to create a list of analysis jobs for the new analysis type.
Create a parameter dataclass in the data_classes folder that inherits from GeneralParameters.
Add your dataclass to constructor.py so that it can be read from the job YAML file.
Extend utils/helper_logging.py to log the settings of your analysis so that they are documented in the report.
Either use the extended framework yourself or create a merge request to have it included in the main repository.
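To illustrate how the pieces from these steps relate, here is a self-contained sketch of a parameter dataclass, an analysis function, a wrapper, and a tag-to-wrapper mapping. Everything in it is hypothetical: GeneralParameters is a local stand-in whose real fields may differ, median_split is an invented example analysis, and the ANALYSIS_WRAPPERS dictionary only approximates the role that analysis.py plays in the actual framework.

```python
import statistics
from dataclasses import dataclass
from typing import Callable


@dataclass
class GeneralParameters:
    # Stand-in for the framework's GeneralParameters; real fields may differ.
    name: str
    dataset: str


@dataclass
class MedianSplitParameters(GeneralParameters):
    # Hypothetical parameter dataclass for a new analysis type.
    variable: str = "reply_count"


def median_split(params: MedianSplitParameters, rows: list[dict]) -> dict:
    """Hypothetical analysis: count rows above and at/below the median."""
    values = [row[params.variable] for row in rows]
    median = statistics.median(values)
    above = sum(value > median for value in values)
    return {"name": params.name, "median": median,
            "above": above, "at_or_below": len(values) - above}


def run_median_split_jobs(job_args: list[dict],
                          datasets: dict[str, list[dict]]) -> list[dict]:
    """Wrapper: build parameters from each job's arguments and run the analysis."""
    results = []
    for args in job_args:
        params = MedianSplitParameters(**args)
        results.append(median_split(params, datasets[params.dataset]))
    return results


# Hypothetical mapping from the YAML tag to its wrapper function.
ANALYSIS_WRAPPERS: dict[str, Callable] = {"median_split": run_median_split_jobs}

if __name__ == "__main__":
    data = {"all_comments": [{"reply_count": n} for n in (0, 1, 2, 5, 9)]}
    jobs = [{"name": "split_replies", "dataset": "all_comments"}]
    print(ANALYSIS_WRAPPERS["median_split"](jobs, data))
```

Keeping the parameters in a dataclass means a job's arguments can be unpacked directly into it, so a misspelled argument in the YAML file fails early with a TypeError instead of being silently ignored.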

License

See the LICENSE file for the GNU General Public License v3.0 related details.

Contact

For queries, feedback, or issue reporting, please e-mail Gerrit Anders at g.anders@iwm-tuebingen.de.