Defense Functions

Available functions:


Adversarial Training

Adversarial training augments the training set with adversarial examples, so the model learns to classify both clean and perturbed inputs correctly and becomes more robust to adversarial attacks (a sketch follows the parameter list below).

Parameters:
    model (tensorflow.keras.Model): The model to defend.
    x (numpy.ndarray): The input training examples.
    y (numpy.ndarray): The true labels of the training examples.
    epsilon (float): The magnitude of the perturbation (default: 0.01).

Returns:
    defended_model (tensorflow.keras.Model): The adversarially trained model.
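
A minimal sketch of how this routine could work, assuming one-step FGSM perturbations, a compiled Keras classifier with softmax outputs, and integer class labels; the function name mirrors the signature above, but the internals (and the `epochs` parameter) are illustrative.

```python
import tensorflow as tf

def adversarial_training(model, x, y, epsilon=0.01, epochs=5):
    """Augment the training set with FGSM examples and retrain."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    y = tf.convert_to_tensor(y)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    # Craft one-step (FGSM) adversarial examples against the current model.
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    x_adv = x + epsilon * tf.sign(grad)

    # Train on the union of clean and adversarial examples.
    x_mix = tf.concat([x, x_adv], axis=0)
    y_mix = tf.concat([y, y], axis=0)
    model.fit(x_mix, y_mix, epochs=epochs, verbose=0)
    return model
```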

Feature Squeezing

Feature squeezing reduces the number of bits used to represent the input features, which can remove certain adversarial perturbations.
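
Bit-depth reduction is the most common squeezing operation; a sketch, assuming inputs scaled to [0, 1] and an illustrative function name:

```python
import numpy as np

def feature_squeeze(x, bit_depth=4):
    """Quantize inputs in [0, 1] to 2**bit_depth discrete levels."""
    levels = 2 ** bit_depth - 1
    return np.round(x * levels) / levels
```

A large prediction gap between the squeezed and original input can also serve as a signal for detecting adversarial examples.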

Gradient Masking

Gradient masking obfuscates or degrades the gradients a model exposes, so that gradient-based attackers receive little useful signal.
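
One illustrative way to mask gradients is a pass-through operation whose backward pass returns zeros, applied to a model at inference time. The wrapper below is a hypothetical sketch, not a function of this library; note that masked gradients are known to give a false sense of security against adaptive attacks.

```python
import tensorflow as tf

@tf.custom_gradient
def masked_identity(x):
    # Forward pass: identity. Backward pass: zeros, so gradient-based
    # attackers probing the wrapped model receive no useful signal.
    def grad(dy):
        return tf.zeros_like(dy)
    return tf.identity(x), grad

class MaskedModel(tf.keras.Model):
    """Hypothetical inference-time wrapper that hides input gradients."""
    def __init__(self, base_model):
        super().__init__()
        self.base_model = base_model

    def call(self, inputs):
        return self.base_model(masked_identity(inputs))
```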

Input Transformation

Input transformation applies a transformation to the input data before feeding it to the model, aiming to remove adversarial perturbations.
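
As a sketch, one simple transformation is a small random rotation applied before prediction, assuming NumPy image batches; the helper name and angle range are illustrative:

```python
import numpy as np
from scipy.ndimage import rotate

def transform_inputs(x, max_degrees=5.0):
    """Randomly rotate each image slightly before classification."""
    out = np.empty_like(x)
    for i, img in enumerate(x):
        angle = np.random.uniform(-max_degrees, max_degrees)
        out[i] = rotate(img, angle, reshape=False, mode="nearest")
    return out

# Usage: predictions = model.predict(transform_inputs(x_test))
```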

Defensive Distillation

Defensive distillation trains a student model to mimic the softened (high-temperature) output probabilities of a teacher model, which smooths the student's decision surface and makes its gradients less useful to attackers.
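
A condensed sketch of the distillation step, assuming the teacher outputs raw logits; for brevity it omits the second half of the full procedure, in which the student's own logits are also divided by the same temperature during training:

```python
import tensorflow as tf

def defensive_distillation(teacher, student, x, temperature=20.0, epochs=5):
    """Train `student` on the teacher's softened probabilities."""
    # Softened labels: divide the teacher's logits by T before the softmax.
    logits = teacher.predict(x, verbose=0)
    soft_labels = tf.nn.softmax(logits / temperature).numpy()

    student.compile(optimizer="adam",
                    loss=tf.keras.losses.CategoricalCrossentropy())
    student.fit(x, soft_labels, epochs=epochs, verbose=0)
    return student
```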

Randomized Smoothing

Randomized smoothing classifies many randomly noised copies of each input and returns the consensus prediction, which makes the model more robust and can yield certified robustness guarantees.
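
A sketch of the majority-vote prediction rule for a single input; the certified-radius computation is omitted, and `sigma` and `n_samples` are illustrative defaults:

```python
import numpy as np

def smoothed_predict(model, x, sigma=0.25, n_samples=100):
    """Classify Gaussian-noised copies of `x` and take a majority vote."""
    num_classes = model.output_shape[-1]
    votes = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + np.random.normal(0.0, sigma, size=x.shape)
        probs = model.predict(noisy[np.newaxis, ...], verbose=0)
        votes[int(np.argmax(probs))] += 1
    return int(np.argmax(votes))
```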

Feature Denoising

Feature denoising applies denoising operations to the input data to remove adversarial perturbations.
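
For example, a Gaussian filter over the spatial axes is one simple denoiser; the sketch below assumes NHWC float batches (a learned denoiser applied to intermediate feature maps is the stronger variant in the literature):

```python
from scipy.ndimage import gaussian_filter

def denoise_inputs(x, sigma=1.0):
    """Gaussian-blur the spatial axes of an NHWC image batch."""
    # Zero sigma on the batch and channel axes leaves them untouched.
    return gaussian_filter(x, sigma=(0, sigma, sigma, 0))
```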

Thermometer Encoding

Thermometer encoding discretizes the input features into bins, making it harder for adversarial perturbations to affect the model.
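
A sketch of the encoding for features scaled to [0, 1]: each scalar becomes a cumulative one-hot vector, e.g. 0.35 with 10 bins encodes as [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]. The function name is illustrative.

```python
import numpy as np

def thermometer_encode(x, num_bins=10):
    """Encode features in [0, 1] as cumulative one-hot vectors."""
    thresholds = np.linspace(0.0, 1.0, num_bins, endpoint=False)
    # Adds a trailing axis of length `num_bins` to the input shape.
    return (x[..., np.newaxis] >= thresholds).astype(np.float32)
```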

Adversarial Logit Pairing (ALP)

Adversarial logit pairing adds a loss term that encourages the logits of adversarial examples to be similar to those of the corresponding clean examples.
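
A sketch of the paired loss, assuming the model outputs logits, labels are integers, and adversarial inputs `x_adv` are crafted beforehand (for example with the adversarial training routine above); the pairing weight is illustrative:

```python
import tensorflow as tf

def alp_loss(model, x_clean, x_adv, y, pairing_weight=0.5):
    """Adversarial cross-entropy plus a logit-pairing penalty."""
    logits_clean = model(x_clean, training=True)
    logits_adv = model(x_adv, training=True)
    ce = tf.keras.losses.sparse_categorical_crossentropy(
        y, logits_adv, from_logits=True)
    # Pull adversarial logits toward the matching clean logits.
    pairing = tf.reduce_mean(tf.square(logits_clean - logits_adv))
    return tf.reduce_mean(ce) + pairing_weight * pairing
```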

Spatial Smoothing

Spatial smoothing applies a smoothing filter to the input data to remove adversarial perturbations.
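
A local median filter is the usual choice; a sketch for NHWC batches, with an illustrative window size:

```python
from scipy.ndimage import median_filter

def spatial_smooth(x, window=3):
    """Median-filter the spatial axes of an NHWC image batch."""
    # Size 1 on the batch and channel axes keeps them independent.
    return median_filter(x, size=(1, window, window, 1))
```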

JPEG Compression

JPEG compression reduces the size of an image by discarding some information, which can also remove adversarial perturbations.
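
A sketch of the compress-decompress round trip, assuming RGB float images in [0, 1] and Pillow available; the quality setting is illustrative:

```python
import io
import numpy as np
from PIL import Image

def jpeg_defense(x, quality=75):
    """Round-trip each image through JPEG to strip small perturbations."""
    out = np.empty_like(x)
    for i, img in enumerate(x):
        pil = Image.fromarray((img * 255).astype(np.uint8))
        buf = io.BytesIO()
        pil.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        out[i] = np.asarray(Image.open(buf), dtype=np.float32) / 255.0
    return out
```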
