spon_api public repository of the spon_api. Initial publication 2024-07-17 03:49:12 +02:00

spon_api

spon_api is a Python package for scraping article text, metadata, and comments from DER SPIEGEL. The package includes modules for the article archive, individual articles, and comments.

Please note that this API no longer works: DER SPIEGEL restructured its website in December 2023 and removed the comment sections (see the official announcement). The code has therefore not been actively maintained since 2024-01-01 and is hosted for documentation purposes only.

Features

  • Fetch and parse archive data for a specified date to list the articles published on that date.
  • Fetch and parse individual articles to extract detailed metadata and content.
  • Fetch and parse comments for specific articles, with options for nesting replies.
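
The nesting option in the last item can be pictured as turning a flat list of comments into a reply tree. The following is a minimal sketch of that idea, not the package's own implementation: the function name `nest_comments` and the keys `id`, `parent_id`, and `replies` are assumptions made for illustration, since the field names returned by spon_api are not documented here.

```python
from collections import defaultdict


def nest_comments(flat_comments):
    """Group a flat list of comment dicts into a reply tree.

    Assumes hypothetical keys "id" and "parent_id"; top-level comments
    have parent_id None. Not spon_api's actual data model.
    """
    # Index comments by the id of the comment they reply to.
    children = defaultdict(list)
    for comment in flat_comments:
        children[comment.get("parent_id")].append(comment)

    def attach(comment):
        # Copy so the input dicts are left untouched.
        node = dict(comment)
        node["replies"] = [attach(c) for c in children.get(comment["id"], [])]
        return node

    # Start from the top-level comments and attach replies recursively.
    return [attach(root) for root in children.get(None, [])]


flat = [
    {"id": 1, "parent_id": None, "text": "first"},
    {"id": 2, "parent_id": 1, "text": "reply"},
    {"id": 3, "parent_id": None, "text": "second"},
]
tree = nest_comments(flat)
print(tree)
```

Indexing the comments by parent first keeps the work linear in the number of comments, and copying each dict leaves the flat list usable afterwards.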

Setup

To install spon_api, follow these steps:

Prerequisites

Ensure you have Python 3.10 or later installed.

Installation

  1. Clone the repository (if you're pulling it from a source repository):

    git clone https://github.com/your_username/spon_api.git
    cd spon_api
    
  2. Install the package:

    pip install .
    

Usage

Below are examples demonstrating how to use the different modules in the spon_api package.

Fetching and Parsing Archive Data (list of articles)

archive.py

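import datetime as dt

from spon_api.archive import Archive

# Date whose archive page should be fetched
date = dt.date(2023, 10, 1)

archive = Archive(date=date)

# Download the archive page, then extract the articles listed on it
archive.fetch()
articles = archive.parse()

print(articles)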
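
To cover more than one day, the same calls can be repeated for each date. The following is a small sketch of a date-range helper that uses only the standard library. `daterange` is a hypothetical helper name and not part of spon_api. The loop over `Archive` is shown only as a comment, because it needs the installed package and the website as it existed before the restructuring.

```python
import datetime as dt


def daterange(start, end):
    """Yield every date from start to end, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += dt.timedelta(days=1)


days = list(daterange(dt.date(2023, 10, 1), dt.date(2023, 10, 3)))
print(days)

# With spon_api installed, each date could then be passed to Archive
# as in the example above:
#     for day in days:
#         archive = Archive(date=day)
#         archive.fetch()
#         articles = archive.parse()
```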

Fetching and Parsing Articles

article.py

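from spon_api.article import Article

# Replace the placeholder with the URL of a DER SPIEGEL article
article = Article(url='article_url here')

# Download the article page, then extract its metadata and content
article.fetch()
article_details = article.parse()

print(article_details)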

Fetching and Parsing Comments

comments.py

from spon_api.comments import Comments

# The article ID is available from the parsed article as article_details["id"]
comments = Comments(article_id='article_id here')

comments.fetch()
parsed_comments = comments.parse()

print(parsed_comments)

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.

Contact

Gerrit Anders - g.anders@iwm-tuebingen.de

Note

  • The talk endpoint is set via the .env file. It should not be necessary to change this environment variable.
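
For readers unfamiliar with .env files, the following sketch shows how simple KEY=VALUE lines of that kind can be parsed with the standard library alone. It is an illustration under assumptions: the variable name `TALK_ENDPOINT`, the example URL, and the helper `read_env_file` are hypothetical, and the package may load its .env file differently.

```python
def read_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical file content; the real variable name and URL may differ.
example = 'TALK_ENDPOINT="https://example.org/talk"\n# a comment\n'
settings = read_env_file(example)
print(settings)
```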