# uncongeniality_analysis

## Project Contributors

This analysis project was carried out by Gerrit Anders and Jürgen Buder from IWM Tuebingen.

For project-related queries, please contact Gerrit Anders at g.anders@iwm-tuebingen.de.

## Project Overview

This repository hosts a general data analysis framework used to investigate reply behaviour and polarization in the
comment section of "Spiegel Online" (SPON). The research focuses on understanding uncongeniality within a large
online sample and examining polarization in online discussions.

The dataset for the analysis can be found on the [Open Science Framework](https://osf.io/t6eph).

## Setup

To set up the analysis, follow the steps below:

### Clone the Repository

```bash
git clone https://gitea.iwm-tuebingen.de/ganders/uncongeniality_analysis.git
```

### Install non-Python prerequisites

In order to run the analysis, R needs to be installed. The analysis was conducted using R version 4.1.1.
Please install R from the [official website](https://cran.r-project.org/).
In addition, installing development tools is recommended. On Linux systems, this can be done via apt-get:

```bash
sudo apt-get update
sudo apt-get install build-essential
```


Furthermore, to enable the generation of PDF reports, pandoc and TeX Live need to be installed.
On Linux systems, both can be installed via apt-get:

```bash
sudo apt-get install pandoc
sudo apt-get install texlive-latex-recommended
```


The code runs without these functionalities if the PDF flag in the config file is set to false.
Please note that the markdown versions of the result reports use relative paths to images;
unlike the PDF reports, they will therefore only display those images correctly when viewed from within the `results_reports` folder.

### Install requirements

The code was tested under Python 3.10.2.
It is recommended to run the code in a virtual environment. To create and activate a virtual environment, run the following commands:

```bash
python3 -m venv venv
source venv/bin/activate
```


To install the required Python packages, run:

```bash
pip3 install -r requirements.txt
```

## Running analysis

To run an analysis with this framework, adapt the configuration file `config.yaml` to the desired settings.
Set the `data_path` to the directory in which the dataset that is available on the
[Open Science Framework](https://osf.io/t6eph) is stored.

To replicate the analysis reported in "Polarizing reply patterns in comment sections of a large German news outlet",
all other settings can be left unchanged.

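As an illustration, a minimal `config.yaml` might look like the sketch below. Only `data_path` and the existence of a PDF flag are documented above, so the exact key names and any further entries are assumptions:

```yaml
# Hypothetical sketch of config.yaml — key names other than data_path are assumptions
data_path: /path/to/osf_dataset   # directory containing the OSF dataset
pdf: false                        # skip PDF report generation if pandoc/TeX Live are missing
```
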

The analysis can be run by executing the following command:

```bash
python3 main.py
```

### Configuring analysis

This framework allows a wide range of analyses to be run on a dataset or on subsets of it by defining analysis jobs as YAML files.
Such files consist of four parts:

- `preprocessing`: defines all subsets of the dataset that will be targeted in the analysis
- `descriptive`: defines all descriptive analyses that will be conducted
- `analysis`: defines all other analysis jobs that will be conducted (e.g. regression, correlation)
- `visualizations`: defines all visualizations that will be created (e.g. histograms, scatterplots)


Examples for all supported analyses and their arguments can be found in the `analysis_config_templates` folder.
The general structure of an analysis job consists of a tag that names the analysis, followed by a list of arguments.
The `name` argument is mandatory and is used for identification and for naming the output files.
Please note that the `dataset` argument refers to the names of the datasets defined in `preprocessing`. Some analyses
(e.g. forest plots) require additional information referring to specific models also defined in the analysis job.

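As a sketch of this structure, an analysis job file might look like the following. Only the four top-level sections and the `name`/`dataset` arguments are documented above; all tag names and other values are illustrative assumptions, and the real templates live in `analysis_config_templates`:

```yaml
# Hypothetical analysis job — tags and values are assumptions, not the framework's API
preprocessing:
  filter:                      # assumed tag naming a preprocessing step
    name: politics_subset      # mandatory: identifies the job and names output files
descriptive:
  summary_statistics:          # assumed tag
    name: politics_summary
    dataset: politics_subset   # refers to a dataset defined under `preprocessing`
analysis:
  regression:                  # assumed tag
    name: reply_regression
    dataset: politics_subset
visualizations:
  histogram:                   # assumed tag
    name: reply_histogram
    dataset: politics_subset
```
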
### Output

An analysis job creates three types of outputs:

- A markdown report in the `results_reports` folder which, for each analysis, gives the settings of the analysis and the result
- A PDF report in the `results_reports` folder, which is a shareable conversion of the markdown report
- For each analysis, a file in the `results` folder that contains the results of the analysis and is named after the analysis' `name`

### Contributing: Extending the framework

If you want to extend the framework with a new analysis, you can do so by following these steps:

- Fork the repository
- Add your analysis function class to the `analysis_functions` folder
- Write a wrapper function for your analysis that takes a list of job arguments and calls the analysis
- Add the wrapper function to the `analysis.py` file and extend it to create a list of analysis jobs for the newly created analysis type
- Create a parameter dataclass in the `data_classes` folder that inherits from `GeneralParameters`
- Add your dataclass to `constructor.py` so that it can be read from the job YAML file
- Extend `utils/helper_logging.py` to log the settings of your analysis so that they are documented in the report
- Either use the extended analysis framework as-is or create a merge request for it to be included in the main repository

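The parameter dataclass step above can be sketched as follows. `GeneralParameters` exists in the framework's `data_classes` folder, but its fields are not documented here, so the stand-in base class and all field names below are illustrative assumptions:

```python
# Sketch of a parameter dataclass for a new analysis type (assumed API).
from dataclasses import dataclass


@dataclass
class GeneralParameters:  # stand-in for the framework's actual base class
    name: str             # mandatory: identifies the job and names the output files
    dataset: str          # refers to a dataset defined under `preprocessing`


@dataclass
class MyAnalysisParameters(GeneralParameters):
    # analysis-specific arguments (hypothetical)
    predictor: str = ""
    outcome: str = ""


params = MyAnalysisParameters(name="reply_regression", dataset="politics_subset",
                              predictor="uncongeniality", outcome="reply_count")
print(params.name)  # -> reply_regression
```

In this pattern, the constructor built from the job YAML file can pass each job's arguments straight into the dataclass, and `utils/helper_logging.py` can iterate over the dataclass fields when documenting the settings in the report.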
## License

This project is licensed under the GNU General Public License v3.0; see the LICENSE file for details.

## Contact

For queries, feedback, or issue reporting, please e-mail Gerrit Anders at g.anders@iwm-tuebingen.de.