ValX Functions

Available functions:

detect_profanity(text_data, language="English", custom_words_list=None): Detect profanity in text using regex.
remove_profanity(text_data, output_file=None, language="English", custom_words_list=None): Remove profanity from text data.
detect_sensitive_information(text_data, info_type=[]): Detect sensitive information in text data.
remove_sensitive_information(text_data, output_file=None, info_type=[]): Remove sensitive information from text data.
detect_hate_speech(text): Detect hate speech or offensive language in a text string.
remove_hate_speech(text_data): Remove hate speech or offensive language in text data using AI.
load_custom_profanity_from_file(filepath): Loads a custom profanity word list from a text file.

Detect profanity

Detect profanity in text using regex.

Args:
        text_data (list): A list of strings representing the text data to analyze.
        language (str, optional): The language used to detect profanity. Defaults to 'English'. Available languages include: All, Arabic, Czech, Danish, German, English, Esperanto, Persian, Finnish, Filipino, French, French (CA), Hindi, Hungarian, Italian, Japanese, Kabyle, Korean, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Thai, Klingon, Turkish, Chinese. If set to `None` and `custom_words_list` is provided, only the custom list will be used.
        custom_words_list (list[str], optional): A Python list of custom profanity words to detect. Defaults to `None`. If provided, these words will be used in addition to the selected language's wordlist, or exclusively if `language` is `None`.

Returns:
        list: A list of dictionaries where each dictionary represents a detected instance of profanity. Each dictionary contains the following keys:
        - "Line" (int): The line number where the profanity was detected.
        - "Column" (int): The column number (position in the line) where the profanity starts.
        - "Word" (str): The detected profanity word.
        - "Language" (str): Indicates the source of the profanity detection (e.g., "English", "Custom", or "Custom + English" if a custom list is combined with a language).

Raises:
        ValueError: If `language` is set to `None` and `custom_words_list` is not provided or is empty.
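
The interaction between `language` and `custom_words_list` can be easier to follow as code. The following is a minimal sketch of the documented rules only, not ValX's actual implementation; the helper name `wordlist_source` is hypothetical, and treating an empty custom list the same as `None` when a language is set is an assumption.

```python
def wordlist_source(language="English", custom_words_list=None):
    """Return the "Language" label implied by the documented rules.

    Hypothetical helper for illustration; it is not part of ValX.
    """
    # None and an empty list both count as "no custom list" here.
    has_custom = bool(custom_words_list)

    if language is None:
        if not has_custom:
            # Documented error case: no language and no usable custom list.
            raise ValueError(
                "language is None and custom_words_list is missing or empty"
            )
        return "Custom"

    if has_custom:
        # Custom words are used in addition to the language's word list.
        return f"Custom + {language}"

    return language


print(wordlist_source())                     # English
print(wordlist_source("English", ["darn"]))  # Custom + English
print(wordlist_source(None, ["darn"]))       # Custom
```

Under these rules, the only invalid combination is `language=None` with a missing or empty `custom_words_list`.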

Remove profanity

Remove profanity from text data.

Args:
        text_data (list): A list of strings representing the text data to clean.
        output_file (str, optional): The file path to write the cleaned data. If None, cleaned data is not written to a file. Defaults to `None`.
        language (str, optional): The language for which to remove profanity. Defaults to 'English'. Available languages include: All, Arabic, Czech, Danish, German, English, Esperanto, Persian, Finnish, Filipino, French, French (CA), Hindi, Hungarian, Italian, Japanese, Kabyle, Korean, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Thai, Klingon, Turkish, Chinese. If set to `None` and `custom_words_list` is provided, only the custom list will be used.
        custom_words_list (list[str], optional): A Python list of custom profanity words to remove. Defaults to `None`. If provided, these words will be used in addition to the selected language's wordlist, or exclusively if `language` is `None`.

Returns:
        list: A list of strings representing the cleaned text data.

Raises:
        ValueError: If `language` is set to `None` and `custom_words_list` is not provided or is empty (as this function internally calls `load_profanity_words`).

Detect sensitive information

Detect sensitive information in text data.

Args:
        text_data (list of str): A list of strings representing the text data to be analyzed.
        info_type (str or list of str, optional): One or more types of sensitive info to detect. Available types are: "email", "phone", "credit_card", "ssn", "id", "address", "ip", "iban", "mrn", "icd10", "geo_coords", "username", "file_path", "bitcoin_wallet", "ethereum_wallet". Uses all info types by default.

Returns:
        list of tuple: A list of tuples containing detected sensitive information, each tuple representing (line number, column index, type, value).
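
Because the result is a flat list of (line number, column index, type, value) tuples, callers will often want to regroup it. The sketch below shows one way to do that; `group_by_type` is a hypothetical helper, and the sample tuples are hand-written to match the documented shape rather than real ValX output.

```python
from collections import defaultdict


def group_by_type(findings):
    """Group (line, column, type, value) tuples by their type field."""
    grouped = defaultdict(list)
    for line_number, column_index, info_type, value in findings:
        grouped[info_type].append((line_number, column_index, value))
    return dict(grouped)


# Hand-written sample in the documented tuple shape (not real output).
sample_findings = [
    (1, 12, "email", "jane@example.com"),
    (1, 40, "ip", "192.0.2.1"),
    (3, 0, "email", "ops@example.com"),
]

grouped = group_by_type(sample_findings)
print(sorted(grouped))        # ['email', 'ip']
print(len(grouped["email"]))  # 2
```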

Remove sensitive information

Remove sensitive information from text data.

Args:
        text_data (list of str): A list of strings representing the text data to be cleaned.
        output_file (str, optional): Path to the output file where cleaned data will be saved. Defaults to `None`.
        info_type (str or list of str, optional): One or more types of sensitive info to detect and remove. Available types are: "email", "phone", "credit_card", "ssn", "id", "address", "ip", "iban", "mrn", "icd10", "geo_coords", "username", "file_path", "bitcoin_wallet", "ethereum_wallet". Uses all info types by default.

Returns:
        list of str: A list of strings representing the cleaned text data.
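
Since `info_type` accepts either a single string or a list of strings and falls back to every type when omitted, a caller-side normalization step can look like the sketch below. `normalize_info_type` and `AVAILABLE_TYPES` are hypothetical names, and rejecting unknown types with a `ValueError` is an assumption; the page does not say how ValX handles them.

```python
# The type names listed in this reference.
AVAILABLE_TYPES = (
    "email", "phone", "credit_card", "ssn", "id", "address", "ip", "iban",
    "mrn", "icd10", "geo_coords", "username", "file_path",
    "bitcoin_wallet", "ethereum_wallet",
)


def normalize_info_type(info_type=None):
    """Turn a str, list of str, or empty value into a list of type names."""
    if not info_type:
        # None, "" and [] all mean "use every available type".
        return list(AVAILABLE_TYPES)
    if isinstance(info_type, str):
        info_type = [info_type]
    unknown = [name for name in info_type if name not in AVAILABLE_TYPES]
    if unknown:
        # Assumption: fail loudly instead of silently ignoring typos.
        raise ValueError(f"unknown info_type value(s): {unknown}")
    return list(info_type)


print(normalize_info_type("email"))          # ['email']
print(normalize_info_type(["ssn", "iban"]))  # ['ssn', 'iban']
print(len(normalize_info_type()))            # 15
```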

Load custom profanity from file

Loads a custom list of profanity words from a text file.

The file should contain one profanity word per line. Lines starting with a hash symbol (#) are treated as comments and are ignored. Empty lines or lines containing only whitespace are also ignored.

Args:
        filepath (str): The path to the text file containing profanity words.

Returns:
        list: A list of profanity words loaded from the file.
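
The file format described above is simple enough to sketch directly. The following stand-in, `parse_profanity_file`, is hypothetical and only mirrors the documented rules (one word per line, `#` comments and blank lines ignored); stripping surrounding whitespace and reading the file as UTF-8 are assumptions, not confirmed ValX behavior.

```python
import os
import tempfile


def parse_profanity_file(filepath):
    """Read one word per line, skipping comments and blank lines."""
    words = []
    with open(filepath, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()  # assumption: surrounding whitespace is dropped
            if not line or line.startswith("#"):
                continue  # blank, whitespace-only, or comment line
            words.append(line)
    return words


# Write a small example file and parse it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "custom_words.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("# team blocklist\nfoo\n\n   \nbar\n")
    print(parse_profanity_file(path))  # ['foo', 'bar']
```

A list produced this way has the same shape as the `custom_words_list` argument described for the profanity functions.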

Detect hate speech or offensive language

Detect offensive language or hate speech in the provided text string, using an AI model.

Args:
        text (str): The text string to analyze for hate speech or offensive language.
        
Returns:
        list of str: A list of strings representing the outcome of the detection.

Remove hate speech or offensive language

Remove offensive language or hate speech from the provided list of strings, using an AI model.

Args:
        text_data (list of str): A list of strings representing the text data to be cleaned of hate speech and offensive language.

Returns:
        list of str: A list of strings representing the cleaned text data.


