# uncongeniality_analysis

## Project Contributors

This analysis project was carried out by Gerrit Anders and Jürgen Buder from IWM Tuebingen.

For project-related queries, please contact Gerrit Anders at g.anders@iwm-tuebingen.de.

## Project Overview

This repository hosts a general data analysis framework used to investigate reply behaviour and polarization in the
comment section of "Spiegel Online" (SPON). The research focuses on understanding uncongeniality within a large
online sample and examining polarization in online discussions.

The dataset for the analysis can be found on the [Open Science Framework](https://osf.io/t6eph).

## Setup

To set up the analysis, follow the steps below:

### Clone the Repository

```bash
git clone https://gitea.iwm-tuebingen.de/ganders/uncongeniality_analysis.git
```

### Install non-Python prerequisites

In order to run the analysis, R needs to be installed. The analysis was conducted using R version 4.1.1.
Please install R from the [official website](https://cran.r-project.org/).
In addition, installing development tools is recommended. On Linux systems, this can be done via apt-get:

```bash
sudo apt-get update
sudo apt-get install build-essential
```


Furthermore, to enable the generation of PDF reports, pandoc and TeX Live need to be installed.
On Linux systems, both can be installed via apt-get:

```bash
sudo apt-get install pandoc
sudo apt-get install texlive-latex-recommended
```


The code runs without these functionalities if the PDF flag in the config file is set to false.
Please note that the markdown versions of the result reports use relative paths to images;
unlike the PDF reports, they will therefore only display those images correctly when viewed from within the `results_reports` folder.

### Install requirements

The code was tested under Python 3.10.2.
It is recommended to run the code in a virtual environment. To create and activate a virtual environment, run the following commands:

```bash
python3 -m venv venv
source venv/bin/activate
```


To install the required Python packages, run:

```bash
pip3 install -r requirements.txt
```

## Running analysis

To run an analysis with this framework, adapt the configuration file `config.yaml` to the desired settings.
Set the `data_path` to the directory in which the dataset that is available on the
[Open Science Framework](https://osf.io/t6eph) is stored.

To replicate the analysis reported in "Polarizing reply patterns in comment sections of a large German news outlet",
all other settings can be left unchanged.

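As an illustration, a minimal `config.yaml` might look like the sketch below. Only `data_path` and the existence of a PDF flag are documented above, so the exact key names and any further entries are assumptions:

```yaml
# Hypothetical sketch of config.yaml — key names other than data_path are assumptions
data_path: /path/to/osf_dataset   # directory containing the OSF dataset
pdf: false                        # skip PDF report generation if pandoc/TeX Live are missing
```
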

The analysis can be run by executing the following command:

```bash
python3 main.py
```

### Configuring analysis

This framework allows a wide range of analyses to be run on a dataset or on subsets of it by defining analysis jobs as YAML files.
Such files consist of four parts:

- `preprocessing`: defines all subsets of the dataset that will be targeted in the analysis
- `descriptive`: defines all descriptive analyses that will be conducted
- `analysis`: defines all other analysis jobs that will be conducted (e.g. regression, correlation)
- `visualizations`: defines all visualizations that will be created (e.g. histograms, scatterplots)


Examples for all supported analyses and their arguments can be found in the `analysis_config_templates` folder.
The general structure of an analysis job consists of a tag that names the analysis, followed by a list of arguments.
The `name` argument is mandatory and is used for identification and for naming the output files.
Please note that the `dataset` argument refers to the names of the datasets defined in `preprocessing`. Some analyses
(e.g. forest plots) require additional information referring to specific models also defined in the analysis job.

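As a sketch of this structure, an analysis job file might look like the following. Only the four top-level sections and the `name`/`dataset` arguments are documented above; all tag names and other values are illustrative assumptions, and the real templates live in `analysis_config_templates`:

```yaml
# Hypothetical analysis job — tags and values are assumptions, not the framework's API
preprocessing:
  filter:                      # assumed tag naming a preprocessing step
    name: politics_subset      # mandatory: identifies the job and names output files
descriptive:
  summary_statistics:          # assumed tag
    name: politics_summary
    dataset: politics_subset   # refers to a dataset defined under `preprocessing`
analysis:
  regression:                  # assumed tag
    name: reply_regression
    dataset: politics_subset
visualizations:
  histogram:                   # assumed tag
    name: reply_histogram
    dataset: politics_subset
```
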
### Output

An analysis job creates three types of outputs:

- A markdown report in the `results_reports` folder which, for each analysis, gives the settings of the analysis and the result
- A PDF report in the `results_reports` folder, which is a shareable conversion of the markdown report
- For each analysis, a file in the `results` folder that contains the results of the analysis and is named after the analysis' `name`

### Contributing: Extending the framework

If you want to extend the framework with a new analysis, you can do so by following these steps:

- Fork the repository
- Add your analysis function class to the `analysis_functions` folder
- Write a wrapper function for your analysis that takes a list of job arguments and calls the analysis
- Add the wrapper function to the `analysis.py` file and extend it to create a list of analysis jobs for the newly created analysis type
- Create a parameter dataclass in the `data_classes` folder that inherits from `GeneralParameters`
- Add your dataclass to `constructor.py` so that it can be read from the job YAML file
- Extend `utils/helper_logging.py` to log the settings of your analysis so that they are documented in the report
- Either use the extended analysis framework as-is or create a merge request for it to be included in the main repository

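The parameter dataclass step above can be sketched as follows. `GeneralParameters` exists in the framework's `data_classes` folder, but its fields are not documented here, so the stand-in base class and all field names below are illustrative assumptions:

```python
# Sketch of a parameter dataclass for a new analysis type (assumed API).
from dataclasses import dataclass


@dataclass
class GeneralParameters:  # stand-in for the framework's actual base class
    name: str             # mandatory: identifies the job and names the output files
    dataset: str          # refers to a dataset defined under `preprocessing`


@dataclass
class MyAnalysisParameters(GeneralParameters):
    # analysis-specific arguments (hypothetical)
    predictor: str = ""
    outcome: str = ""


params = MyAnalysisParameters(name="reply_regression", dataset="politics_subset",
                              predictor="uncongeniality", outcome="reply_count")
print(params.name)  # -> reply_regression
```

In this pattern, the constructor built from the job YAML file can pass each job's arguments straight into the dataclass, and `utils/helper_logging.py` can iterate over the dataclass fields when documenting the settings in the report.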
## License

This project is licensed under the GNU General Public License v3.0; see the LICENSE file for details.

## Contact

For queries, feedback, or issue reporting, please e-mail Gerrit Anders at g.anders@iwm-tuebingen.de.