scrape_text(links_file=None, links_array=None, output_file='output.txt', csv_output_file=None, remove_extra_whitespace=True): Scrape textual content from the given links and save to specified output file(s).
scrape_images(links_file=None, links_array=None, save_folder='images', min_width=None, min_height=None, max_width=None, max_height=None): Scrape image content from the given links and save to specified output folder.
Scrape text
Scrape textual content from the given links and save to specified output file(s).
Parameters:
- links_file (str): Path to a file containing links, with each link on a new line.
- links_array (list): List of links to scrape text from.
- output_file (str): File to save the scraped text.
- csv_output_file (str): File to save the URL and text information in CSV format.
- remove_extra_whitespace (bool): If True, remove extra whitespace and empty lines from the output.

Example:

    from pywebscrapr import scrape_text

    # Using links from a file and saving text to output.txt
    scrape_text(links_file='links.txt', output_file='output.txt')

    # Using links directly and saving text to output.txt and csv_output.csv with extra whitespace removal
    links = ['https://example.com/page1', 'https://example.com/page2']
    scrape_text(links_array=links, output_file='output.txt', csv_output_file='csv_output.csv', remove_extra_whitespace=True)
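To make the remove_extra_whitespace option more concrete, here is a minimal standalone sketch of the kind of cleanup its description suggests: collapsing runs of spaces and tabs and dropping empty lines. This is an assumption about the behavior, not pywebscrapr's actual implementation, and the helper name `collapse_whitespace` is hypothetical.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Hypothetical helper (not part of pywebscrapr): tidy scraped text."""
    cleaned_lines = []
    for line in text.splitlines():
        # Collapse runs of spaces/tabs into one space, then trim both ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # drop lines that are empty after trimming
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


raw = "  Hello   world  \n\n\n\tSecond\tline \n"
print(collapse_whitespace(raw))
# Hello world
# Second line
```

The library's real output may differ in details, for example in how it treats whitespace between paragraphs, so check the generated output file when exact formatting matters.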
Scrape images
Scrape image content from the given links and save to specified output folder.
Parameters:
- links_file (str): Path to a file containing links, with each link on a new line.
- links_array (list): List of links to scrape images from.
- save_folder (str): Folder to save the scraped images.
- min_width (int): Minimum width of images to include (optional).
- min_height (int): Minimum height of images to include (optional).
- max_width (int): Maximum width of images to include (optional).
- max_height (int): Maximum height of images to include (optional).

Example:

    from pywebscrapr import scrape_images

    # Using links from a file and saving images to output_images folder.
    scrape_images(links_file='links.txt', save_folder='output_images', min_width=100, min_height=100)
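The four size parameters can be read as optional bounds on each image's dimensions. The sketch below illustrates that reading with a hypothetical predicate, `within_size_limits`. It assumes that bounds are inclusive and that `None` means "no limit"; neither detail is confirmed by the documentation above, and this is not pywebscrapr's own code.

```python
from typing import Optional


def within_size_limits(
    width: int,
    height: int,
    min_width: Optional[int] = None,
    min_height: Optional[int] = None,
    max_width: Optional[int] = None,
    max_height: Optional[int] = None,
) -> bool:
    """Hypothetical check: True if the size satisfies every bound that is set."""
    if min_width is not None and width < min_width:
        return False
    if min_height is not None and height < min_height:
        return False
    if max_width is not None and width > max_width:
        return False
    if max_height is not None and height > max_height:
        return False
    return True


# (width, height) pairs standing in for downloaded images
sizes = [(50, 50), (120, 300), (4000, 100)]
kept = [
    size for size in sizes
    if within_size_limits(*size, min_width=100, min_height=100, max_width=2000)
]
print(kept)  # [(120, 300)]
```

Under this reading, leaving all four parameters unset keeps every image, and setting only the minimums (as in the example above) filters out small icons and tracking pixels.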