# spon_api
`spon_api` is a Python package for scraping article text, metadata, and comments from DER SPIEGEL.
The package includes modules for handling individual articles, comments, and archiving data from specified URLs.
Please note that, following a restructuring of the DER SPIEGEL website in December 2023, this API no longer works
because the comment sections were removed (see the [official announcement](https://www.spiegel.de/backstage/community-wir-starten-spiegel-debatte-a-8df0d3e4-722a-4cd9-87cf-6f809bb767ce)).
The code has therefore not been actively maintained since 2024-01-01 and is hosted for documentation purposes only.
## Features
- **Fetch and parse archive data** for specified dates to get articles published on that date.
- **Fetch and parse individual articles** to extract detailed metadata and content.
- **Fetch and parse comments** for specific articles, with options for nesting replies.
## Setup
To install `spon_api`, follow these steps:
### Prerequisites
Ensure you have Python 3.10 or later installed.
### Installation
1. **Clone the repository** (if you're pulling it from a source repository):
```bash
git clone https://github.com/your_username/spon_api.git
cd spon_api
```
2. **Install the package**:
```bash
python setup.py install
```
## Usage
Below are examples demonstrating how to use the different modules in the `spon_api` package.
### Fetching and Parsing Archive Data (list of articles)
**archive.py**
```python
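import datetime as dt

from spon_api.archive import Archive

# Date whose archive listing should be retrieved
date = dt.date(2023, 10, 1)
archive = Archive(date=date)
archive.fetch()             # retrieve the archive data for that date
articles = archive.parse()  # extract the articles published on that date
print(articles)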
```
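To collect archive listings for several consecutive days, the single-date example can be wrapped in a loop. The following is a minimal sketch and not part of the package: `date_range` is a hypothetical helper, and the sketch assumes only that `Archive` takes one `datetime.date` per instance, as in the example above.

```python
import datetime as dt
from typing import Iterator


def date_range(start: dt.date, end: dt.date) -> Iterator[dt.date]:
    """Yield every date from start to end, inclusive (hypothetical helper)."""
    current = start
    while current <= end:
        yield current
        current += dt.timedelta(days=1)


# Each yielded date could then be passed to Archive(date=...) as shown above.
days = list(date_range(dt.date(2023, 10, 1), dt.date(2023, 10, 3)))
print(days)
```

If `end` lies before `start`, the helper yields nothing, so a loop over it simply does not run.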
### Fetching and Parsing Articles
**article.py**
```python
from spon_api.article import Article
article = Article(url='article_url here')
article.fetch()
article_details = article.parse()
print(article_details)
```
### Fetching and Parsing Comments
**comments.py**
```python
from spon_api.comments import Comments
# The article ID is available from the parsed article as article_details["id"]
comments = Comments(article_id='article_id here')
comments.fetch()
parsed_comments = comments.parse()
print(parsed_comments)
```
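Because the comment lookup needs the `id` from the parsed article, the two steps can be chained. The sketch below is an illustration rather than part of the package: `fetch_article_with_comments` is a hypothetical helper, and the classes are passed in as arguments so that it relies only on the `fetch()` and `parse()` calls and the `article_details["id"]` key shown above.

```python
from typing import Any, Callable, Tuple


def fetch_article_with_comments(
    url: str,
    article_cls: Callable[..., Any],
    comments_cls: Callable[..., Any],
) -> Tuple[Any, Any]:
    """Fetch one article, then its comments (hypothetical helper).

    article_cls and comments_cls are expected to behave like
    spon_api.article.Article and spon_api.comments.Comments.
    """
    article = article_cls(url=url)
    article.fetch()
    article_details = article.parse()

    # The comment lookup is keyed by the article's id.
    comments = comments_cls(article_id=article_details["id"])
    comments.fetch()
    return article_details, comments.parse()
```

With the package installed, this would be called as `fetch_article_with_comments(url, Article, Comments)`.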
## License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.
## Contact
Gerrit Anders - g.anders@iwm-tuebingen.de
## Note
- The talk endpoint is set via the `.env` file. It should not be necessary to change this environment variable.