This page was generated from doc/source/methods/adversarialvae.ipynb.

Variational Auto-Encoder Adversarial Detector

Overview

The adversarial VAE detector is first trained on a batch of unlabeled but normal (inlier) data. Unsupervised or semi-supervised training is desirable since labeled data is often scarce. The loss, however, differs from traditional VAE training: it focuses on minimizing the KL-divergence between a classifier’s predicted class probabilities on the original data and on the data reconstructed by the VAE. When an adversarial instance is fed to the VAE, the reconstruction does not contain the adversarial artefacts, so the classifier’s prediction distribution on the reconstruction differs from its prediction distribution on the adversarial instance. The KL-divergence between the two is therefore large and the instance is flagged as adversarial. The algorithm works well on tabular and image data.
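
The scoring idea can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the probability arrays are made-up values standing in for the classifier's outputs on three instances and on their VAE reconstructions, the helper name kl_divergence is hypothetical, and the direction of the divergence (original relative to reconstruction) is an assumption.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) for class-probability arrays of shape (batch, classes)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

# Hypothetical classifier probabilities on the original instances.
# The third row plays the role of an adversarial instance.
preds_original = np.array([[0.90, 0.05, 0.05],
                           [0.10, 0.80, 0.10],
                           [0.02, 0.03, 0.95]])

# Hypothetical classifier probabilities on the VAE reconstructions.
# The reconstruction of the third instance is classified very differently.
preds_reconstructed = np.array([[0.88, 0.06, 0.06],
                                [0.12, 0.78, 0.10],
                                [0.85, 0.10, 0.05]])

scores = kl_divergence(preds_original, preds_reconstructed)

threshold = 0.1  # illustrative value
is_adversarial = scores > threshold
print(is_adversarial)
```

Only the third instance, whose prediction changes after reconstruction, ends up with a score above the threshold.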

Usage

Initialize

Parameters:

threshold: threshold value above which the instance is flagged as an adversarial instance.
latent_dim: latent dimension of the VAE.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, InputLayer

encoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(32, 32, 3)),
      Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)
  ])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

from tensorflow.keras.layers import Conv2DTranspose, Dense, InputLayer, Reshape

latent_dim = 50  # must match the latent_dim passed to the detector

decoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(latent_dim,)),
      Dense(4*4*128),
      Reshape(target_shape=(4, 4, 128)),
      Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
  ])

vae: instead of using a separate encoder and decoder, the VAE can also be passed as a tf.keras.Model.
model: the classifier as a tf.keras.Model. Example:

inputs = tf.keras.Input(shape=(input_dim,))
outputs = tf.keras.layers.Dense(output_dim, activation=tf.nn.softmax)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

samples: number of samples drawn during detection for each instance.
beta: weight on the KL-divergence loss term following the \(\beta\)-VAE framework. Defaults to 0.
data_type: optional data type added to the metadata, e.g. ‘tabular’ or ‘image’.

Initialized adversarial detector example:

from alibi_detect.ad import AdversarialVAE

ad = AdversarialVAE(
    threshold=0.1,
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    model=model,
    latent_dim=50,
    samples=10
)

Fit

We then need to train the adversarial detector. The following parameters can be specified:

X: training batch as a NumPy array, preferably of normal data.
loss_fn: loss function used for training. Defaults to the custom adversarial loss.
w_model: weight on the loss term minimizing the KL-divergence between model prediction probabilities on the original and reconstructed instance. Defaults to 1.
w_recon: weight on the ELBO loss term. Defaults to 0.
optimizer: optimizer used for training. Defaults to Adam with learning rate 1e-3.
cov_elbo: dictionary with covariance matrix options in case the ELBO loss function is used. Use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)), or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)), which is the default.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean indicating whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

ad.fit(
    X_train,
    epochs=5
)
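
The roles of w_model and w_recon can be sketched as a weighted sum of the two loss terms. This is one plausible reading of the parameter descriptions above, not the library's actual loss code; the function name combined_loss and the numeric values are hypothetical.

```python
def combined_loss(kl_model, elbo_loss, w_model=1.0, w_recon=0.0):
    """Weighted sum of the model KL term and the ELBO term (assumed form)."""
    return w_model * kl_model + w_recon * elbo_loss

# With the defaults, only the KL term between model predictions contributes.
default_loss = combined_loss(kl_model=0.3, elbo_loss=120.0)

# A small w_recon mixes some reconstruction quality into the objective.
mixed_loss = combined_loss(kl_model=0.3, elbo_loss=120.0, w_recon=0.01)
print(default_loss, mixed_loss)
```

Under this reading, the default w_recon of 0 means the detector is trained purely to preserve the classifier's predictions, regardless of the scale of the ELBO term.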

It is often hard to find a good threshold value. If we have a batch of normal and adversarial data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

ad.infer_threshold(
    X,
    threshold_perc=95
)
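
Conceptually, inferring the threshold amounts to taking a percentile of the adversarial scores on the batch. The sketch below illustrates this with NumPy on synthetic scores; it assumes the threshold is the threshold_perc-th percentile of the scores, and the score values themselves are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical adversarial scores: 95 low scores for normal instances
# and 5 high scores for adversarial instances.
scores = np.concatenate([
    rng.uniform(0.0, 0.05, size=95),
    rng.uniform(1.0, 2.0, size=5),
])

threshold_perc = 95  # approximate percentage of normal data in the batch
threshold = np.percentile(scores, threshold_perc)

# Instances scoring above the inferred threshold are flagged.
flagged = scores > threshold
print(threshold, flagged.sum())
```

If the assumed percentage of normal data is too low, normal instances are flagged as well; if it is too high, some adversarial instances fall below the threshold.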

Detect

We detect adversarial instances by calling predict on a batch of instances X. We can also return the instance-level adversarial score by setting return_instance_score to True.

The prediction takes the form of a dictionary with meta and data keys. meta contains the detector’s metadata while data is also a dictionary which contains the actual predictions stored in the following keys:

is_adversarial: boolean indicating whether each instance scores above the threshold and is therefore flagged as adversarial. The array has shape (batch size,).
instance_score: contains instance-level scores if return_instance_score equals True.

preds = ad.predict(
    X,
    return_instance_score=True
)
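
The returned dictionary can then be unpacked as in the following sketch. To keep the example runnable without a trained detector, preds is a hand-built stand-in that follows the structure described above; the contents of meta and all numeric values are illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in for the output of ad.predict(X, return_instance_score=True).
preds = {
    'meta': {'name': 'AdversarialVAE', 'data_type': 'image'},  # illustrative only
    'data': {
        'is_adversarial': np.array([False, True, False, False, True]),
        'instance_score': np.array([0.01, 2.30, 0.04, 0.02, 0.75]),
    },
}

is_adv = np.asarray(preds['data']['is_adversarial'], dtype=bool)
scores = preds['data']['instance_score']

# Indices of the instances flagged as adversarial, with their scores.
flagged_idx = np.flatnonzero(is_adv)
for i in flagged_idx:
    print(f'instance {i}: score {scores[i]:.2f}')
```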

Examples

Image

Adversarial detection on MNIST