# uncongeniality_analysis
## Project Contributors
This analysis project was carried out by Gerrit Anders and Jürgen Buder from IWM Tuebingen. For project-related queries, please contact Gerrit Anders at g.anders@iwm-tuebingen.de.
## Project Overview
This repository hosts a general data analysis framework employed to investigate reply behaviour and polarization in the comment sections of "Spiegel Online" (SPON). The research focuses on understanding uncongeniality within a large online sample and examining polarization in online discussions.
The dataset for analysis can be found on the Open Science Framework.
## Setup
To set up the analysis, follow the steps below:
### Clone the Repository

```shell
git clone https://gitea.iwm-tuebingen.de/ganders/uncongeniality_analysis.git
```
### Install non-Python prerequisites

To run the analysis, R needs to be installed. The analysis was conducted using R version 4.1.1. Please install R from the official website. In addition, installing development tools is recommended. On Linux systems this can be done via apt-get:

```shell
sudo apt-get update
sudo apt-get install build-essential
```

Furthermore, to enable the generation of pdf reports, pandoc and texlive need to be installed. On Linux systems, both can be installed via apt-get:

```shell
sudo apt-get install pandoc
sudo apt-get install texlive-latex-recommended
```
The code runs without these functionalities if the pdf flag in the config file is set to false. Please note that the markdown versions of the result reports use relative paths to images, so the images only display when the reports are viewed from within the `results_reports` folder (in contrast to the pdf reports).
### Install requirements

The code was tested under Python 3.10.2. It is recommended to run the code in a virtual environment, which can be created and activated with:

```shell
python3 -m venv venv
source venv/bin/activate
```

To install the required Python packages, run:

```shell
pip3 install -r requirements.txt
```
## Running analysis

To run an analysis with this framework, adapt the configuration file `config.yaml` to the desired settings. In particular, set `data_path` to the directory in which the dataset available on the Open Science Framework is stored.
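As a rough illustration, the relevant part of `config.yaml` might look like the following sketch; only `data_path` and the pdf flag are described in this README, so the remaining keys and their exact names should be taken from the `config.yaml` shipped with the repository:

```yaml
# Hypothetical sketch -- consult the repository's config.yaml for the actual keys.
data_path: /path/to/osf_dataset   # directory containing the OSF dataset
pdf: false                        # set to true only if pandoc and texlive are installed
```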
To replicate the analysis reported in "Polarizing reply patterns in comment sections of a large German news outlet", all other settings can be left unchanged. The analysis can then be run by executing:

```shell
python3 main.py
```
## Configuring analysis

This framework allows running a wide range of analyses on a dataset or its subsets by defining analysis jobs as yaml files. Such files consist of four parts:

- `preprocessing`: defines all subsets of the dataset that will be targeted in the analysis
- `descriptive`: defines all descriptive analyses that will be conducted
- `analysis`: defines all other analysis jobs that will be conducted (e.g. regression, correlation, etc.)
- `visualizations`: defines all visualizations that will be created (e.g. histograms, scatterplots, etc.)

Examples for all supported analyses and their arguments can be found in the `analysis_config_templates` folder.
The general structure of an analysis job consists of a tag that names the analysis, followed by a list of arguments. The `name` argument is mandatory and is used for identifying the analysis and naming the output files. Please note that the `dataset` argument refers to the names of the datasets defined in `preprocessing`. Some other analyses (e.g. forest plots) require additional information referring to specific models also defined in the analysis job.
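A job file combining the four parts might look roughly like the sketch below. All analysis tags, values, and arguments besides `name` and `dataset` are invented for illustration; the real tags and their arguments are documented in the `analysis_config_templates` folder:

```yaml
# Hypothetical sketch of an analysis job -- see analysis_config_templates for real examples.
preprocessing:
  full_sample: {}              # hypothetical subset definition
descriptive:
  summary_stats:
    name: descriptives_full    # mandatory: identifies the analysis and names outputs
    dataset: full_sample       # refers to a subset defined under preprocessing
analysis:
  regression:
    name: reply_regression
    dataset: full_sample
visualizations:
  histogram:
    name: reply_count_hist
    dataset: full_sample
```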
## Output

An analysis job creates three types of outputs:

- A markdown report in the `results_reports` folder which, for each analysis, gives the settings of the analysis and its result
- A pdf report in the `results_reports` folder which is the shareable pdf conversion of the markdown report
- For each analysis, a file in the `results` folder that contains the results of the analysis and is named after the analysis
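After a run, the output layout might look roughly like this (file names and extensions are illustrative; actual names follow the `name` arguments of the jobs):

```
results_reports/
  my_analysis_report.md    # markdown report (view from inside this folder)
  my_analysis_report.pdf   # pdf report (only if the pdf flag is enabled)
results/
  my_analysis              # per-analysis result file, named after the analysis
```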
## Contributing: Extending the framework

If you want to extend the framework with a new analysis, you can do so by following these steps:

- Fork the repository
- Add your analysis function class to the `analysis_functions` folder
- Write a wrapper function for your analysis that takes a list of job arguments and calls the analysis
- Add the wrapper function to the `analysis.py` file and extend it to create a list of analysis jobs for the newly created analysis type
- Create a parameter dataclass in the `data_classes` folder that inherits from `GeneralParameters`
- Add your dataclass to `constructor.py` in order for it to be readable from the job yaml file
- Extend `utils/helper_logging.py` to log the settings for your analysis so that they are documented in the report
- Either use the extended analysis framework locally or create a merge request for it to be included in the main repository
## License
See the LICENSE file for the GNU General Public License v3.0 related details.
## Contact
For queries, feedback, or issue reporting, please e-mail Gerrit Anders at g.anders@iwm-tuebingen.de.