# spon_api

`spon_api` is a Python package for scraping article text, metadata, and comments from DER SPIEGEL. The package includes modules for fetching individual articles (by URL), their comments, and archive data (the articles published on a given date).

Please note that this API no longer works: following a restructuring of the DER SPIEGEL website in December 2023, the comment sections were removed (see the [official announcement](https://www.spiegel.de/backstage/community-wir-starten-spiegel-debatte-a-8df0d3e4-722a-4cd9-87cf-6f809bb767ce)). The code is therefore no longer actively maintained as of 2024-01-01 and is hosted for documentation purposes only.

## Features

- **Fetch and parse archive data** for specified dates to get the articles published on that date.
- **Fetch and parse individual articles** to extract detailed metadata and content.
- **Fetch and parse comments** for specific articles, with options for nesting replies.

## Setup

To install `spon_api`, follow these steps:

### Prerequisites

Ensure you have Python 3.10 or later installed.

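
A quick way to confirm that the running interpreter meets this requirement is to compare `sys.version_info` with the minimum version. The following is a minimal sketch; the helper `meets_minimum_python` is hypothetical and not part of `spon_api`. It deliberately avoids Python 3.10-only syntax so that the check itself still runs on older interpreters.

```python
import sys
from typing import Optional, Tuple

# Minimum interpreter version from the prerequisites
MIN_PYTHON = (3, 10)


def meets_minimum_python(version: Optional[Tuple[int, ...]] = None) -> bool:
    """Return True if version (default: the running interpreter) is >= MIN_PYTHON.

    Hypothetical helper for illustration; it is not part of spon_api.
    """
    if version is None:
        version = tuple(sys.version_info[:2])
    # Compare only (major, minor)
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not meets_minimum_python():
        sys.exit(f"spon_api needs Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later")
    print("Python version OK")
```
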
### Installation

1. **Clone the repository** (if you're pulling it from a source repository):

   ```bash
   git clone https://github.com/your_username/spon_api.git
   cd spon_api
   ```

2. **Install the package** (from inside the cloned directory):

   ```bash
   pip install .
   ```

## Usage

Below are examples demonstrating how to use the different modules in the `spon_api` package.

### Fetching and Parsing Archive Data (list of articles)

**archive.py**

```python
import datetime as dt

from spon_api.archive import Archive

date = dt.date(2023, 10, 1)

archive = Archive(date=date)

archive.fetch()
articles = archive.parse()

print(articles)
```

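
To collect articles over several days rather than a single date, one option is to loop over every date in a range and run the fetch/parse steps once per day. The sketch below assumes nothing about `spon_api` beyond the example above: the helpers `dates_in_range` and `collect_archives` are hypothetical, and the per-day step is passed in as a function so it can wrap the `Archive` calls (or a stub when testing).

```python
import datetime as dt
import time
from typing import Callable, Dict, Iterator, TypeVar

T = TypeVar("T")


def dates_in_range(start: dt.date, end: dt.date) -> Iterator[dt.date]:
    """Yield each date from start to end, both inclusive (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not be before start")
    for offset in range((end - start).days + 1):
        yield start + dt.timedelta(days=offset)


def collect_archives(
    start: dt.date,
    end: dt.date,
    fetch_day: Callable[[dt.date], T],
    pause: float = 0.0,
) -> Dict[dt.date, T]:
    """Run fetch_day once per date and map each date to its result.

    fetch_day stands in for the per-day work (e.g. building an Archive for
    that date, calling fetch() and returning parse()). pause adds a delay in
    seconds between consecutive days to avoid rapid-fire requests.
    """
    results: Dict[dt.date, T] = {}
    for index, day in enumerate(dates_in_range(start, end)):
        if index > 0 and pause > 0:
            time.sleep(pause)
        results[day] = fetch_day(day)
    return results
```

With `spon_api` installed, `fetch_day` could be a small function that creates `Archive(date=day)`, calls `fetch()`, and returns the result of `parse()`. Keeping the per-day step as a parameter makes the date logic testable without network access.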
### Fetching and Parsing Articles

**article.py**

```python
from spon_api.article import Article

article = Article(url='article_url here')

article.fetch()
article_details = article.parse()

print(article_details)
```

### Fetching and Parsing Comments

**comments.py**

```python
from spon_api.comments import Comments

# The article ID is available from a parsed article as article_details["id"]
comments = Comments(article_id='article_id here')

comments.fetch()
parsed_comments = comments.parse()

print(parsed_comments)
```

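
Comment threads are typically delivered as a flat list in which each reply points to its parent. How `spon_api` nests replies internally is not documented here; as an illustration only, the sketch below builds a reply tree from a flat list, assuming each comment is a dict with hypothetical `id` and `parent_id` keys (`parent_id` is `None` for top-level comments). The key names and the `nest_replies` function are assumptions, not the package's actual data format.

```python
from typing import Any, Dict, List


def nest_replies(comments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return top-level comments, each with a "replies" list of its children.

    Assumes hypothetical keys "id" and "parent_id". Comments whose parent is
    not in the input are treated as top-level. Input order is preserved.
    """
    # Shallow-copy each comment so the input dicts are not modified
    by_id = {c["id"]: {**c, "replies": []} for c in comments}
    roots: List[Dict[str, Any]] = []
    for comment in comments:
        node = by_id[comment["id"]]
        parent = by_id.get(comment.get("parent_id"))
        if parent is None:
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots
```

Building an `id` lookup first lets the tree be assembled in a single pass, regardless of whether a reply appears before or after its parent in the list.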
## License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.

## Contact

Gerrit Anders - g.anders@iwm-tuebingen.de

## Note

- The talk endpoint is set via the `.env` file. It should not be necessary to change this environment variable.
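
For reference, a `.env` file is a plain-text list of `KEY=VALUE` lines. Whether `spon_api` reads it with a library such as python-dotenv is not stated here; the sketch below is a minimal standard-library reader under that simplified format, and any variable name used with it (e.g. a hypothetical `SPON_TALK_ENDPOINT`) is an assumption, not necessarily the name the package uses.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_lines(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (simplified)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes from the value
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Load a .env file into os.environ without overriding existing variables."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_lines(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Variables already set in the environment take precedence over the file, which mirrors python-dotenv's default behaviour of not overriding existing values. After loading, a value would be read with `os.environ.get(...)` using whatever key the `.env` file actually defines.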