PyWebScrapr Package Documentation

This is the documentation for PyWebScrapr, a Python package for web scraping tasks. It supports both text scraping and image scraping.

Changelog

  • 0.1.5 (Latest): Added new parameters to control whether child links are followed, and added JSON as a new export format.

  • 0.1.4: Added new parameters to the scrape_text function for added control and flexibility.

  • 0.1.3: Added support for handling different types of images on websites. Added improved error handling.

  • 0.1.2: Updated the PyPI project description.

  • 0.1.1: Added new parameters for image extraction and optimized extraction using BeautifulSoup4's SoupStrainer.

  • 0.1.0: Initial release.

Installation

You can install PyWebScrapr from PyPI using pip. Make sure you are using Python 3.6 or later before installing:

pip install pywebscrapr
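
Because the package requires Python 3.6 or later, it can be useful to check the interpreter version before installing. The following is a minimal sketch using only the standard library; the helper name meets_minimum_python is hypothetical and is not part of PyWebScrapr.

```python
import sys

# Minimum version stated in the installation note above.
MINIMUM_PYTHON = (3, 6)


def meets_minimum_python(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; not part of PyWebScrapr.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if meets_minimum_python():
    print("Python version is supported.")
else:
    print("Python 3.6 or later is required.")
```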

Example Usage

Text scraping

from pywebscrapr import scrape_text

# Specify links in a file or list
links_file = 'links.txt'
links_array = ['https://example.com/page1', 'https://example.com/page2']

# Scrape text and save to the 'output.txt' file
scrape_text(links_file=links_file, links_array=links_array, output_file='output.txt')
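
The example above reads links from 'links.txt'. As a rough sketch, such a file could be prepared with the standard library as shown below. This assumes the links file holds one URL per line, which the text above does not state, and the helper write_links_file is hypothetical rather than part of PyWebScrapr.

```python
from urllib.parse import urlparse


def write_links_file(path, urls):
    """Write unique http(s) URLs to `path`, one per line, and return them.

    Hypothetical helper; the one-URL-per-line layout is an assumption
    about what the links file is expected to contain.
    """
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        parsed = urlparse(url)
        # Keep only absolute http(s) URLs and skip duplicates.
        if parsed.scheme in ("http", "https") and parsed.netloc and url not in seen:
            seen.add(url)
            kept.append(url)
    with open(path, "w", encoding="utf-8") as handle:
        for url in kept:
            handle.write(url + "\n")
    return kept
```

For example, write_links_file('links.txt', ['https://example.com/page1', 'https://example.com/page2']) would create a file that could then be passed as links_file.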

Image scraping

from pywebscrapr import scrape_images

# Specify links in a file or list
links_file = 'image_links.txt'
links_array = ['https://example.com/image1.jpg', 'https://example.com/image2.png']

# Scrape images and save to the 'images' folder
scrape_images(links_file=links_file, links_array=links_array, save_folder='images')
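
After a run, it may help to see what ended up in the 'images' folder. The sketch below counts files by extension using only the standard library; summarize_images is a hypothetical helper, not a PyWebScrapr function, and it makes no assumption about how the package names the files it saves.

```python
from collections import Counter
from pathlib import Path


def summarize_images(folder):
    """Count the files directly inside `folder` by lowercase extension.

    Hypothetical helper for inspecting a save folder; returns an empty
    Counter if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return Counter()
    return Counter(p.suffix.lower() for p in root.iterdir() if p.is_file())


# Print one line per extension found in the 'images' folder, if any.
for extension, count in sorted(summarize_images("images").items()):
    print(f"{extension or '(no extension)'}: {count}")
```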
