PyWebScrapr Package Documentation
Package documentation for PyWebScrapr, a Python package for web scraping that supports both image scraping and text scraping.
Changelog
0.1.6 (Latest):
Added progress indicators to both scrape_images and scrape_text to provide real-time feedback on scraping progress.
Implemented multithreading to improve performance by scraping multiple pages concurrently.
Added a rate_limit parameter to both scraping functions to control the request frequency and prevent server overload.
Refactored the concurrency model so that child links are also scraped concurrently.
0.1.5: Added new parameters to control following child links, and added a new export format, json.
0.1.4: Added new parameters to the scrape_text function for more control and flexibility.
0.1.3: Added support for handling different types of images on websites, and improved error handling.
0.1.2: Updated the PyPI project description.
0.1.1: Added new parameters for image extraction, and optimized extraction by using BeautifulSoup4's SoupStrainer.
0.1.0: Initial release.
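The 0.1.6 entries combine multithreading with a rate_limit parameter. The sketch below is a generic illustration of that combination, not PyWebScrapr's own code. It assumes rate_limit means requests per second (the package's actual units are not stated here), and scrape_concurrently, RateLimiter, and the injected fetch callable are hypothetical names. Passing fetch in as a function keeps the example runnable offline; in a real scraper it would download a page.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hypothetical helper: allow at most one request start per `min_interval` seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside
        # it so other threads can reserve their own (later) slots meanwhile.
        with self._lock:
            slot = max(time.monotonic(), self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


def scrape_concurrently(urls, fetch, max_workers=4, rate_limit=None):
    """Call fetch(url) for every URL in the list `urls` on a thread pool.

    Assumption: rate_limit is requests per second; None disables throttling.
    Results come back in input order, and a progress line is printed as
    each URL finishes.
    """
    limiter = RateLimiter(1.0 / rate_limit) if rate_limit else None
    done = 0
    done_lock = threading.Lock()

    def task(url):
        nonlocal done
        if limiter:
            limiter.wait()
        result = fetch(url)
        with done_lock:  # Keep the shared progress counter consistent.
            done += 1
            print(f"[{done}/{len(urls)}] {url}")
        return result

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `urls`, not completion order.
        return list(pool.map(task, urls))


if __name__ == "__main__":
    # Offline stand-in for a download: "fetching" a URL returns its length.
    results = scrape_concurrently(["a", "bb", "ccc"], fetch=len, rate_limit=20)
    print(results)  # results == [1, 2, 3]
```

Reserving a slot under the lock but sleeping outside it means one waiting thread does not hold up the bookkeeping of the others, while request starts still stay at least 1 / rate_limit seconds apart.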
Installation
You can install PyWebScrapr from PyPI with pip. Make sure you are using Python 3.6 or later before installing:
pip install pywebscrapr
Example Usage
Text scraping
from pywebscrapr import scrape_text
# Specify links in a file, a list, or both
links_file = 'links.txt'
links_array = ['https://example.com/page1', 'https://example.com/page2']
# Scrape text and save to the 'output.txt' file
scrape_text(links_file=links_file, links_array=links_array, output_file='output.txt')
Image scraping
from pywebscrapr import scrape_images
# Specify links in a file, a list, or both
links_file = 'image_links.txt'
links_array = ['https://example.com/image1.jpg', 'https://example.com/image2.png']
# Scrape images and save to the 'images' folder
scrape_images(links_file=links_file, links_array=links_array, save_folder='images')
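PyWebScrapr parses pages with BeautifulSoup4, so the sketch below is not its implementation. It swaps in the standard library's html.parser to show, under assumptions, the kind of per-page work behind text and image scraping: collecting visible text while skipping script and style contents, and resolving img src attributes to absolute URLs. PageExtractor and extract are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Hypothetical helper: collect visible text and absolute <img> URLs from one page."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as "/a.png" against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html, base_url):
    """Return (visible text joined by spaces, list of absolute image URLs)."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls


if __name__ == "__main__":
    page = ('<html><head><style>p { color: red; }</style></head><body>'
            '<h1>Hello</h1><p>World</p><img src="/a.png">'
            '<img src="https://cdn.example.com/b.jpg"/>'
            '<script>var x = 1;</script></body></html>')
    text, images = extract(page, "https://example.com/page1")
    print(text)    # Hello World
    print(images)  # ['https://example.com/a.png', 'https://cdn.example.com/b.jpg']
```

Resolving each src with urljoin matters because pages often reference images by relative path, and a downloader needs a full URL for each one.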