Documentation
Our WebsiteOur Github
  • 👋Welcome to Infinitode Documentation
  • AI Documentation
  • API Documentation
    • Basic Math API Documentation (#Experimental)
    • BMI Calculator API Documentation
    • Character Counter API Documentation
    • Chemical Equation Balancer API Documentation
    • Color Generator API Documentation
    • Date Difference Calculator API Documentation
    • Dungen API Documentation
    • Dungen Dev API Documentation
    • Factorial Calculator API Documentation
    • Fantasy Name Generator API Documentation
    • Fibonacci Sequence Generator API Documentation
    • GCD Calculator API Documentation
    • Hash API Documentation
    • Helix PSA API Documentation
    • LCM Calculator API Documentation
    • Leap Year Checker API Documentation
    • Lorem API Documentation
    • Molar Mass Calculator API Documentation (#Experimental)
    • MycoNom API Documentation
    • Name Generator API Documentation
    • Palindrome Checker API Documentation
    • Password Generator API Documentation
    • Password Strength Detector API Documentation
    • Periodic Table API Documentation
    • Prime Number Checker API Documentation
    • Quadratic Equation Solver API Documentation
    • Random Facts Generator API Documentation
    • Random Quotes Generator API Documentation
    • Roman Numeral Converter API Documentation
    • Simple Interest Calculator API Documentation
    • Slugify API Documentation
    • Text Case Converter API Documentation
    • Unit Converter API Documentation
    • Username Generator API Documentation
    • UUID Generator API Documentation
    • Vowel Counter API Documentation
  • Package Documentation
    • BlurJS Package Documentation
      • BlurJS Usage Examples
      • BlurJS Reference Documentation
    • CodeSafe Package Documentation
      • CodeSafe Reference
        • CodeSafe Functions
    • DeepDefend Package Documentation
      • DeepDefend Reference
        • Attacks Functions
        • Defenses Functions
    • DupliPy Package Documentation
      • DupliPy Reference
        • Formatting Functions
        • Replication Functions
        • Similarity Functions
        • Text Analysis Functions
    • FuncProfiler Package Documentation
      • FuncProfiler Reference
        • FuncProfiler Functions
    • Hued Package Documentation
      • Hued Reference
        • Analysis Functions
        • Colors Functions
        • Conversions Functions
        • Palettes Functions
    • LocalSiteMap Package Documentation
      • LocalSiteMap Reference
        • LocalSiteMap Functions
    • PyAutoPlot Package Documentation
      • PyAutoPlot Reference
        • PyAutoPlot Functions
    • PyWebScrapr Package Documentation
      • PyWebScrapr Reference
        • PyWebScrapr Functions
    • ValX Package Documentation
      • ValX Reference
        • ValX Functions
Powered by GitBook
On this page
  • Changelog
  • Installation
  • Example Usage
  • Text scraping
  • Image scraping

Was this helpful?

  1. Package Documentation

PyWebScrapr Package Documentation

Package documentation for PyWebScrapr, a Python package for handling web scraping tasks. Supports both image scraping and text scraping.

Changelog

  • 0.1.5 (Latest): Added new parameters to control following child links, and added a new export format, json.

  • 0.1.4: Added new parameters to the scrape_text function for added control and flexibility.

  • 0.1.3: Added support for handling different types of images on websites. Added improved error handling.

  • 0.1.2: Updated PYPI project description.

  • 0.1.1: New parameters for image extraction, and optimized extraction by using BeautifulSoup4's SoupStrainer.

  • 0.1.0: Initial release.

Installation

You can install PyWebScrapr using PyPi, please make sure that you are using Python 3.6 or later before installing PyWebScrapr:

pip install pywebscrapr

Example Usage

Text scraping

from pywebscrapr import scrape_text

# Specify links in a file or list
links_file = 'links.txt'
links_array = ['https://example.com/page1', 'https://example.com/page2']

# Scrape text and save to the 'output.txt' file
scrape_text(links_file=links_file, links_array=links_array, output_file='output.txt')

Image scraping

from pywebscrapr import scrape_images

# Specify links in a file or list
links_file = 'image_links.txt'
links_array = ['https://example.com/image1.jpg', 'https://example.com/image2.png']

# Scrape images and save to the 'images' folder
scrape_images(links_file=links_file, links_array=links_array, save_folder='images')
PreviousPyAutoPlot FunctionsNextPyWebScrapr Reference

Last updated 2 months ago

Was this helpful?