PyWebScrapr Package Documentation
Package documentation for PyWebScrapr, a Python package for handling web scraping tasks. It supports both image scraping and text scraping.
Changelog
0.1.5 (Latest): Added new parameters to control following child links, and added a new export format, json.
0.1.4: Added new parameters to the scrape_text function for added control and flexibility.
0.1.3: Added support for handling different types of images on websites. Added improved error handling.
0.1.2: Updated PyPI project description.
0.1.1: New parameters for image extraction, and optimized extraction by using BeautifulSoup4's SoupStrainer.
0.1.0: Initial release.
Installation
You can install PyWebScrapr from PyPI. Make sure that you are using Python 3.6 or later before installing:
pip install pywebscrapr
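If you want to confirm the Python 3.6 requirement before installing, a small check like the one below may help. This is only a sketch: the helper name meets_minimum_python and the constant MINIMUM_PYTHON are hypothetical and are not part of PyWebScrapr.

```python
import sys

# Minimum version taken from the installation note above.
MINIMUM_PYTHON = (3, 6)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; not part of PyWebScrapr.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum_python():
        raise SystemExit("PyWebScrapr needs Python 3.6 or later")
    print("Python version OK")
```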
Example Usage
Text scraping
from pywebscrapr import scrape_text
# Specify links in a file or list
links_file = 'links.txt'
links_array = ['https://example.com/page1', 'https://example.com/page2']
# Scrape text and save to the 'output.txt' file
scrape_text(links_file=links_file, links_array=links_array, output_file='output.txt')
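The example above reads links from 'links.txt' but does not show how that file is created. The sketch below writes one, assuming the file is plain text with one URL per line; that format is an assumption and is not confirmed by this page. The helper write_links_file is hypothetical and not part of PyWebScrapr.

```python
from pathlib import Path


def write_links_file(urls, path="links.txt"):
    """Write unique, non-empty URLs to `path`, one per line.

    Hypothetical helper; the one-URL-per-line format is an assumption.
    Returns the list of URLs that were written, in their original order.
    """
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and repeats so each page is listed once.
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    Path(path).write_text("".join(u + "\n" for u in cleaned), encoding="utf-8")
    return cleaned
```

The resulting path could then be passed as links_file in the call shown above.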
Image scraping
from pywebscrapr import scrape_images
# Specify links in a file or list
links_file = 'image_links.txt'
links_array = ['https://example.com/image1.jpg', 'https://example.com/image2.png']
# Scrape images and save to the 'images' folder
scrape_images(links_file=links_file, links_array=links_array, save_folder='images')
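For readers curious about what image scraping involves in general, the sketch below collects img src values from an HTML string and resolves them against a base URL. It uses only the Python standard library and is not PyWebScrapr's implementation, which according to the changelog relies on BeautifulSoup4's SoupStrainer. The names ImageSrcCollector and collect_image_urls are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags.

    Illustrative only; not how PyWebScrapr is implemented.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(html, base_url):
    """Return the absolute image URLs found in `html`, in document order."""
    parser = ImageSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls
```

Downloading the collected URLs, handling errors, and filtering by image type are left out of this sketch.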