# ValX Package Documentation

## Changelog

* **0.2.6 (Latest)**: Fixed a major bug ([issue #4](https://github.com/Infinitode/ValX/issues/4)) that caused missing languages, mixed-up language lists, and missing words in profanity filtering. Language selection is now also case-insensitive: to filter English, for example, you can pass "English", "english", "en", "EN", "EnGliSh", or any other capitalization to the `language` parameter.
* **0.2.5**: Introduced enhanced flexibility for profanity filtering:
  * Added `custom_words_list` parameter to `detect_profanity` and `remove_profanity` for user-defined profanity lists.
  * Support for standalone custom lists by setting `language=None`.
  * Support for combined lists (built-in language + custom list).
  * New helper function `load_custom_profanity_from_file(filepath)` to load custom words from a file.
  * `detect_profanity` output now specifies profanity source (e.g., "Custom", "Custom + English").
* **0.2.4**: Fixed compatibility issues with `scikit-learn` versions `1.3.0` and up. Also removed the dependency on `scikit-learn 1.2.2`, which is no longer needed; both older and newer versions are now compatible. See this issue for more information: <https://github.com/Infinitode/ValX/issues/1>
* **0.2.3**: Created new detection patterns for sensitive information, and created a new optional `info_type` parameter to control sensitive information detection and removal.
* **0.2.2**: Refactored `detect_profanity` function to return more information about the found profanities. Also removed unnecessary printing in functions.
* **0.2.1**: Updated the project's PyPI description.
* **0.2.0**: Created a new function to automatically remove detected hate speech or offensive speech from text.
* **0.1.8 - 0.1.9**: Updated docstrings.
* **0.1.7**: Added AI models to ValX for hate speech detection.
* **0.1.1 - 0.1.6**: Fixed errors in code and created several functions for text cleaning.
* **0.1.0**: Initial release.

## Installation

You can install ValX from PyPI using `pip`. Make sure you are using Python 3.6 or later before installing:

```bash
pip install valx
```
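
If you are unsure which interpreter `pip` is tied to, a quick check of the running Python version can help. This is a minimal sketch; `meets_minimum_version` is a hypothetical helper written for illustration and is not part of ValX.

```python
import sys

# Minimum version stated in the installation note above.
MINIMUM_PYTHON = (3, 6)


def meets_minimum_version(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) components.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum_version():
        print("Python version is OK for ValX.")
    else:
        print("ValX requires Python 3.6 or later.")
```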

***

#### List of supported languages for profanity detection and removal

The following language names and codes are valid values for the `language` parameter of ValX's profanity detection and removal functions:

* All
* Arabic
* AR
* Czech
* CS
* Danish
* DA
* German
* DE
* English
* EN
* Esperanto
* EO
* Persian
* Finnish
* FI
* Filipino
* FIL
* French
* FR
* French (CA)
* FR-CA-U-SD-CAQC
* Hindi
* HI
* Hungarian
* HU
* Italian
* IT
* Japanese
* JA
* Kabyle
* KAB
* Korean
* KO
* Dutch
* NL
* Norwegian
* NO
* Polish
* PL
* Portuguese
* PT
* Russian
* RU
* Spanish
* ES
* Swedish
* SV
* Thai
* TH
* Klingon
* TLH
* Turkish
* TR
* Chinese
* ZH

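Since version 0.2.6 the `language` value is matched without regard to case, so any capitalization of a listed name or code should work. The sketch below illustrates that kind of lookup using a small subset of the list above. `LANGUAGE_ALIASES` and `normalize_language` are hypothetical names chosen for illustration; this is not ValX's internal implementation.

```python
# Hypothetical subset of the names and codes listed above, keyed in lowercase.
LANGUAGE_ALIASES = {
    "all": "All",
    "english": "English",
    "en": "English",
    "german": "German",
    "de": "German",
    "french": "French",
    "fr": "French",
    "klingon": "Klingon",
    "tlh": "Klingon",
}


def normalize_language(language):
    """Map a language name or code to a canonical name, ignoring case."""
    if language is None:
        # In ValX, language=None is used together with a standalone custom list.
        return None
    key = language.lower()
    if key not in LANGUAGE_ALIASES:
        raise ValueError(f"Unsupported language: {language!r}")
    return LANGUAGE_ALIASES[key]


# Every spelling below resolves to the same canonical name.
for value in ["English", "english", "en", "EN", "EnGliSh"]:
    print(value, "->", normalize_language(value))
```

Lowercasing the input once and keying the table in lowercase keeps the lookup simple and avoids listing every capitalization explicitly.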
## Example Usage

### Profanity Detection

```python
from valx import detect_profanity

sample_text = [
    "This is a sample text containing some profanity like bad word 1, bad word 2, and bad word 3.",
    "This line doesn't contain any profanity.",
    "But this one has another, just in another language: bad word 4."
]

# Detect profanity
results = detect_profanity(sample_text, language='English')
print("Profanity Evaluation Results", results)
```

### Profanity Removal

```python
from valx import remove_profanity

sample_text = [
    "This is a sample text containing some profanity like bad word 1, bad word 2, and bad word 3.",
    "This line doesn't contain any profanity.",
    "But this one has another, just in another language: bad word 4."
]

# Remove profanity
removed = remove_profanity(sample_text, "text_cleaned.txt", language="English")
```

### PII Detection

```python
from valx import detect_sensitive_information

sample_text = [
    "Please contact john.doe@example.com or call 555-123-4567 for more information.",
    "We will need your credit card number to complete the transaction: 1234-5678-9012-3456.",
    "My social security number is 123-45-6789 and my ID number is AB123456.",
    "Our office address is 123 Main St, Anytown, USA. Please visit us!",
    "Your IP address is 192.168.1.1. Please don't share it with anyone."
]

# Detect sensitive information
detected_information = detect_sensitive_information(sample_text)
```

### PII Removal

```python
from valx import remove_sensitive_information

sample_text = [
    "Please contact john.doe@example.com or call 555-123-4567 for more information.",
    "We will need your credit card number to complete the transaction: 1234-5678-9012-3456.",
    "My social security number is 123-45-6789 and my ID number is AB123456.",
    "Our office address is 123 Main St, Anytown, USA. Please visit us!",
    "Your IP address is 192.168.1.1. Please don't share it with anyone."
]

# Remove sensitive information
cleaned_information = remove_sensitive_information(sample_text)
```
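
The changelog mentions an optional `info_type` parameter for limiting detection and removal to particular kinds of sensitive information. The sketch below shows the general idea using plain regular expressions. The pattern names, the patterns themselves, and the helper functions are all assumptions made for illustration; they do not reflect ValX's actual patterns or the `info_type` values it accepts.

```python
import re

# Illustrative patterns only; real-world PII detection needs broader coverage.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def select_patterns(info_type=None):
    """Return every pattern, or only the one named by `info_type`."""
    if info_type is None:
        return PATTERNS
    return {info_type: PATTERNS[info_type]}


def find_sensitive(lines, info_type=None):
    """Return (line_number, type, matched_text) tuples for each match."""
    found = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in select_patterns(info_type).items():
            for match in pattern.finditer(line):
                found.append((number, name, match.group()))
    return found


def redact_sensitive(lines, info_type=None, placeholder="[REDACTED]"):
    """Return copies of the lines with every match replaced by `placeholder`."""
    cleaned = []
    for line in lines:
        for pattern in select_patterns(info_type).values():
            line = pattern.sub(placeholder, line)
        cleaned.append(line)
    return cleaned


sample = [
    "Please contact john.doe@example.com or call 555-123-4567 for more information.",
    "Your IP address is 192.168.1.1. Please don't share it with anyone.",
]
print(find_sensitive(sample))
print(redact_sensitive(sample, info_type="email"))
```

Keeping the patterns in a dictionary keyed by type is what makes a filter like `info_type` cheap to support: restricting detection is just a matter of selecting one entry.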

### Hate Speech Detection

```python
from valx import detect_hate_speech

# Detect hate speech or offensive language
outcome_of_detection = detect_hate_speech("You are stupid.")
```

### Hate Speech Removal

```python
from valx import remove_hate_speech

sample_text = [
    "This is a sample text containing some profanity like bad word 1, bad word 2, and bad word 3.",
    "This line doesn't contain any profanity.",
    "But this one has another, just in another language: bad word 4."
]
# Remove hate speech or offensive language
cleaned_text = remove_hate_speech(sample_text)
```

### Custom Profanity Filtering

ValX allows for flexible profanity filtering using custom word lists.

```python
from valx import detect_profanity, remove_profanity, load_custom_profanity_from_file

# Example: Create a dummy custom profanity file
with open("custom_profanity.txt", "w") as f:
    f.write("# This is a comment and will be ignored\n")
    f.write("sillyword1\n")
    f.write("    \n")  # Whitespace-only line, ignored
    f.write("sillyword2\n")
    f.write("custombadword\n")

sample_text_custom = [
    "This text contains a sillyword1.",
    "Another line with sillyword2 and also a standard badword.",
    "This custombadword should be caught."
]

# 1. Using custom_words_list directly (standalone)
custom_list_direct = ["sillyword1", "custombadword"]
detected_custom_direct = detect_profanity(sample_text_custom, language=None, custom_words_list=custom_list_direct)
print("Detected with direct custom list (standalone):", detected_custom_direct)
# Expected: Detects "sillyword1" and "custombadword" with source "Custom"

# 2. Loading custom words from a file
loaded_custom_list = load_custom_profanity_from_file("custom_profanity.txt")
print(f"Loaded custom words: {loaded_custom_list}") # Expected: ['sillyword1', 'sillyword2', 'custombadword']

# 3. Using loaded custom list with a built-in language (English)
detected_custom_combined = detect_profanity(sample_text_custom, language="English", custom_words_list=loaded_custom_list)
print("Detected with loaded custom list + English:", detected_custom_combined)
# Expected: Detects "sillyword1", "sillyword2", "custombadword" (Source: Custom + English or Custom)
# and "badword" (Source: English)

# 4. Removing profanity using a custom list
cleaned_text_custom = remove_profanity(sample_text_custom, language=None, custom_words_list=loaded_custom_list)
print("Text after removing profanity with custom list:", cleaned_text_custom)
# Expected: "sillyword1", "sillyword2", "custombadword" are removed.

# Clean up dummy file (optional)
import os
os.remove("custom_profanity.txt")
```
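
The comments in the example above indicate that `#` comment lines and blank lines in the word file are ignored. A loader with that behavior could look roughly like the following sketch. `read_word_list` is a hypothetical stand-in written for illustration, not the source of `load_custom_profanity_from_file`, and details such as the UTF-8 encoding are assumptions.

```python
import os
import tempfile


def read_word_list(filepath):
    """Read one word per line, skipping blank lines and '#' comment lines."""
    words = []
    with open(filepath, "r", encoding="utf-8") as handle:
        for raw_line in handle:
            word = raw_line.strip()
            if not word or word.startswith("#"):
                continue
            words.append(word)
    return words


# Write a small word file to a temporary folder and read it back.
# The folder and its contents are deleted when the `with` block exits.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "custom_profanity.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("# comment line\nsillyword1\n    \nsillyword2\ncustombadword\n")
    loaded_words = read_word_list(path)

print(loaded_words)
```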
