# PyWebScrapr Package Documentation

## Changelog

* **0.1.6 (Latest)**:
  * Added progress indicators to both `scrape_images` and `scrape_text` to provide real-time feedback on scraping progress.
  * Implemented multithreading to improve performance by scraping multiple pages concurrently.
  * Added a `rate_limit` parameter to both scraping functions to control the request frequency and prevent server overload.
  * Refactored the concurrency model to ensure that child links are also scraped concurrently.
* **0.1.5**: Added new parameters to control following child links, and added a new export format, `json`.
* **0.1.4**: Added new parameters to the `scrape_text` function for added control and flexibility.
* **0.1.3**: Added support for handling different types of images on websites. Added improved error handling.
* **0.1.2**: Updated the PyPI project description.
* **0.1.1**: New parameters for image extraction, and optimized extraction by using BeautifulSoup4's `SoupStrainer`.
* **0.1.0**: Initial release.
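
The 0.1.6 entries above combine concurrent scraping with a request rate limit. The sketch below illustrates that general idea only and is not PyWebScrapr's implementation. `RateLimiter`, `fetch_all`, `min_interval`, and the `fetch` callable are hypothetical names introduced for this example, and the units and semantics of the package's own `rate_limit` parameter are not stated on this page.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hypothetical helper: space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def fetch_all(urls, fetch, min_interval=0.5, max_workers=4):
    """Run `fetch(url)` for every URL on a thread pool, throttled by one shared limiter."""
    limiter = RateLimiter(min_interval)

    def task(url):
        limiter.wait()
        return fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(task, urls))
```

Here `fetch` stands in for whatever function downloads a single page. Sharing one limiter across all worker threads keeps the overall request rate bounded regardless of how many workers are running.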

## Installation

You can install PyWebScrapr from PyPI. Make sure you are using Python 3.6 or later before installing:

```bash
pip install pywebscrapr
```
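
If you are unsure which interpreter `pip` will install into, a quick check like the one below can confirm that it meets the Python 3.6 requirement stated above. This is a minimal sketch; `python_is_supported` and `MINIMUM_PYTHON` are hypothetical names for this example and are not part of PyWebScrapr.

```python
import sys

# Minimum version taken from the installation note above.
MINIMUM_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is supported.")
    else:
        print("Python 3.6 or later is required.")
```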

***

## Example Usage

### Text scraping

```python
from pywebscrapr import scrape_text

# Specify links in a file or list
links_file = 'links.txt'
links_array = ['https://example.com/page1', 'https://example.com/page2']

# Scrape text and save to the 'output.txt' file
scrape_text(links_file=links_file, links_array=links_array, output_file='output.txt')
```
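
The example above reads links from `links.txt`, but this page does not describe that file's format. The sketch below assumes one URL per line, which is an assumption rather than documented behaviour, and shows one way to generate such a file. `write_links_file` is a hypothetical helper, not part of PyWebScrapr.

```python
from pathlib import Path


def write_links_file(urls, path="links.txt"):
    """Write each non-empty URL on its own line and return how many were written.

    Assumes a one-URL-per-line file format (not confirmed by this page).
    """
    cleaned = [url.strip() for url in urls if url and url.strip()]
    text = "\n".join(cleaned) + ("\n" if cleaned else "")
    Path(path).write_text(text, encoding="utf-8")
    return len(cleaned)
```

Calling `write_links_file(['https://example.com/page1', 'https://example.com/page2'])` would then produce a file that can be passed as `links_file`, provided the one-URL-per-line assumption holds.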

### Image scraping

```python
from pywebscrapr import scrape_images

# Specify links in a file or list
links_file = 'image_links.txt'
links_array = ['https://example.com/image1.jpg', 'https://example.com/image2.png']

# Scrape images and save to the 'images' folder
scrape_images(links_file=links_file, links_array=links_array, save_folder='images')
```

