spon_api
spon_api is a Python package for scraping article text, metadata, and comments from DER SPIEGEL.
The package includes modules for handling individual articles, comments, and archiving data from specified URLs.
Please note that due to a restructuring of the DER SPIEGEL website in December 2023, this API no longer works, because the comment sections were removed (see the official announcement). The code is therefore not actively maintained as of 2024-01-01 and is hosted for documentation purposes only.
Features
- Fetch and parse archive data for specified dates to get articles published on that date.
- Fetch and parse individual articles to extract detailed metadata and content.
- Fetch and parse comments for specific articles, with options for nesting replies.
Setup
To install spon_api, follow these steps:
Prerequisites
Ensure you have Python 3.10 or later installed.
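The version requirement can be checked from Python before installing. This is a minimal sketch; the names MIN_VERSION and python_is_supported are illustrative and not part of spon_api.

```python
import sys

# Minimum interpreter version stated in the prerequisites above.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version tuple meets MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if python_is_supported():
    print("Python version OK")
else:
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or later is required, "
          f"found {sys.version.split()[0]}")
```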
Installation
- Clone the repository (if you're pulling it from a source repository):
  git clone https://github.com/your_username/spon_api.git
  cd spon_api
- Install the package:
  python setup.py install
Usage
Below are examples demonstrating how to use the different modules in the spon_api package.
Fetching and Parsing Archive Data (list of articles)
archive.py
import datetime as dt
from spon_api.archive import Archive
date = dt.date(2023, 10, 1)
archive = Archive(date=date)
archive.fetch()
articles = archive.parse()
print(articles)
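To cover more than one day, the same fetch and parse steps can be repeated over a date range. The sketch below is not part of spon_api: daterange and collect_archives are hypothetical helpers, and archive_factory is assumed to be any callable that accepts date=... and returns an object with fetch() and parse(), as Archive does in the example above.

```python
import datetime as dt
from typing import Any, Callable, Dict, Iterator


def daterange(start: dt.date, end: dt.date) -> Iterator[dt.date]:
    """Yield every date from start to end, both inclusive."""
    for offset in range((end - start).days + 1):
        yield start + dt.timedelta(days=offset)


def collect_archives(
    start: dt.date,
    end: dt.date,
    archive_factory: Callable[..., Any],
) -> Dict[dt.date, Any]:
    """Fetch and parse one archive per day and key the results by date.

    archive_factory is assumed to behave like spon_api.archive.Archive:
    constructed with date=..., then fetch() followed by parse().
    """
    results: Dict[dt.date, Any] = {}
    for day in daterange(start, end):
        archive = archive_factory(date=day)
        archive.fetch()
        results[day] = archive.parse()
    return results
```

With the package installed, a call such as collect_archives(dt.date(2023, 10, 1), dt.date(2023, 10, 7), Archive) would request one archive page per day. Passing the class as an argument keeps the helper usable with a stub, which matters now that the live site has been restructured.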
Fetching and Parsing Articles
article.py
from spon_api.article import Article
article = Article(url='article_url here')
article.fetch()
article_details = article.parse()
print(article_details)
Fetching and Parsing Comments
comments.py
from spon_api.comments import Comments
# One obtains the article_id from the article_details as article_details["id"]
comments = Comments(article_id='article_id here')
comments.fetch()
parsed_comments = comments.parse()
print(parsed_comments)
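The article and comment steps can be chained, since the article ID needed by Comments comes from the parsed article details. The following is a rough sketch rather than part of the package: fetch_article_with_comments is a hypothetical helper, and it assumes only what the examples above show, namely that the parsed article exposes an "id" key and that both classes offer fetch() and parse().

```python
from typing import Any, Callable, Dict


def fetch_article_with_comments(
    url: str,
    article_factory: Callable[..., Any],
    comments_factory: Callable[..., Any],
) -> Dict[str, Any]:
    """Fetch an article, then fetch its comments using the parsed article ID.

    article_factory and comments_factory are assumed to behave like
    spon_api.article.Article and spon_api.comments.Comments.
    """
    article = article_factory(url=url)
    article.fetch()
    details = article.parse()

    # The article ID is read from the parsed details, as noted above.
    comments = comments_factory(article_id=details["id"])
    comments.fetch()
    return {"article": details, "comments": comments.parse()}
```

With the package installed, this would be called as fetch_article_with_comments('article_url here', Article, Comments).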
License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.
Contact
Gerrit Anders - g.anders@iwm-tuebingen.de
Note
- The talk endpoint is set via the .env file. It should not be necessary to change this environment variable.