%%
%% Copyright 2019-2020 Elsevier Ltd
%%
%% This file is part of the 'CAS Bundle'.
%% --------------------------------------
%%
%% It may be distributed under the conditions of the LaTeX Project Public
%% License, either version 1.2 of this license or (at your option) any
%% later version. The latest version of this license is in
%% http://www.latex-project.org/lppl.txt
%% and version 1.2 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.
%%
%% The list of all files belonging to the 'CAS Bundle' is
%% given in the file `manifest.txt'.
%%
%% Template article for cas-dc documentclass for
%% double column output.
%\documentclass[a4paper,fleqn,longmktitle]{cas-dc}
\documentclass[a4paper,fleqn]{cas-dc}
% \usepackage[colorlinks=true,urlcolor]{hyperref}
\usepackage{bigstrut,multirow,rotating}
%\usepackage[numbers]{natbib}
%\usepackage[authoryear]{natbib}
% \usepackage[authoryear,longnamesfirst,round]{natbib}
% References without numbering: \usepackage[round]{natbib}
\usepackage[authoryear]{natbib}
\usepackage{appendix}
\usepackage{color}
\usepackage[mathscr]{eucal}
\usepackage{float}
\usepackage{graphicx}
\usepackage{caption}
\usepackage{array}
\usepackage{multirow}
\usepackage{booktabs}
\usepackage{indentfirst}
\usepackage{mathrsfs}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{hyperref}
\hypersetup{colorlinks=true, urlcolor=cyan}
\usepackage{titlesec}
\titleformat{\section}{\bfseries\Large}{\thesection}{0.4em}{} % numbered, \Large size
\titleformat{\subsection}{\bfseries\normalsize}{\thesubsection}{0.4em}{} % numbered, normal size
\titleformat{\subsubsection}{\bfseries\small\color{black}}{\thesubsubsection}{0.4em}{} % left-aligned
% Author biography
\usepackage{wrapfig} % used to insert the biography
% \usepackage{flushend} % aligns the biography columns on the last page
% Prevent conflicts with already-loaded packages
\makeatletter
\newif\if@restonecol
\makeatother
\let\algorithm\relax
\let\endalgorithm\relax

% Packages needed for the pseudocode module
\usepackage[linesnumbered,ruled,vlined]{algorithm2e}

% \usepackage[ruled]{algorithm2e} % with vertical rules
% \usepackage[ruled,vlined]{algorithm2e} % with vertical and bent rules
% \usepackage[linesnumbered,boxed]{algorithm2e} % boxed format
% \usepackage[lined,algonl,boxed]{algorithm2e} % can display EndIf etc.

%%%Author definitions
\def\tsc#1{\csdef{#1}{\textsc{\lowercase{#1}}\xspace}}
\tsc{WGM}
\tsc{QE}
\tsc{EP}
\tsc{PMS}
\tsc{BEC}
\tsc{DE}
%%%

% Uncomment and use as needed
%\newtheorem{theorem}{Theorem}
%\newtheorem{lemma}[theorem]{Lemma}
%\newdefinition{rmk}{Remark}
%\newproof{pf}{Proof}
%\newproof{pot}{Proof of Theorem \ref{thm}}

\begin{document}
\let\printorcid\relax
\let\WriteBookmarks\relax
\def\floatpagepagefraction{1}
\def\textpagefraction{.001}

% Short title
\shorttitle{}

% Short author
\shortauthors{}

% Main title of the paper
\title[mode = title]{Efficient Text-based Evolution Algorithm for Hard-label Adversarial Attacks on Text}

% Title footnote mark
% eg: \tnotemark[1]
% \tnotemark[1,2]

% % Title footnote 1.
% % eg: \tnotetext[1]{Title footnote text}
% % \tnotetext[<tnote number>]{<tnote text>}
% \tnotetext[1]{This document is the results of the research
% project funded by the National Science Foundation.}

% \tnotetext[2]{The second title footnote which is a longer text matter
% to fill through the whole text width and overflow into
% another line in the footnotes area of the first page.}

% Here goes the abstract
\begin{abstract}
Deep neural networks, which play a pivotal role in fields such as images, text, and audio, are vulnerable to adversarial attacks. The vast majority of current textual adversarial attacks assume a black-box soft-label setting, relying on the model's gradient information or confidence scores. Implementing adversarial attacks with only the top predicted label of a hard-label model is therefore both more challenging and more realistic. Existing hard-label adversarial attacks rely on population-based genetic optimization, which requires significant query consumption, a considerable shortcoming. To solve this problem, we propose a new textual black-box hard-label adversarial attack algorithm based on the differential evolution of populations, called the text-based differential evolution (TDE) algorithm. First, the method judges the importance of the words replaced in an initial rough adversarial example: only the keywords of the sentence are operated on, and the remaining substituted words are gradually restored to the original words, reducing the number of replacements in the sentence. During this replacement process, the method evaluates the semantic similarity of candidate adversarial examples and deposits high-quality individuals into the population. Second, the adversarial examples are combined and optimized according to word importance. Compared with existing methods guided by genetic algorithms, our method avoids a large number of meaningless repeated queries and significantly improves both the overall attack efficiency and the semantic quality of the generated adversarial examples.
We experiment with multiple datasets on three text tasks, namely sentiment classification, natural language inference, and toxic comment detection, and also perform comparisons on models and APIs in realistic scenarios.
{\color{red} For example, in the Google Cloud commercial API adversarial attack experiment, compared with the existing hard-label method, our method reduces the average number of queries required for the attack from 6986 to 176 and increases semantic similarity from 0.844 to 0.876.} Extensive experimental data show that our approach not only greatly reduces the number of queries but also significantly outperforms existing methods in the quality of the adversarial examples.
\end{abstract}

% Use if graphical abstract is present
% \begin{graphicalabstract}
% \includegraphics{figs/grabs.pdf}
% \end{graphicalabstract}

% Research highlights
% \begin{highlights}
% \item Research highlights item 1
% \item Research highlights item 2
% \item Research highlights item 3
% \end{highlights}

% Keywords
% Each keyword is separated by \sep
\begin{keywords}
Natural language processing \sep Machine learning \sep Language model \sep Adversarial attack \sep Black-box attack \sep Hard-label
\end{keywords}

\maketitle

\section{Introduction}

In recent years, deep neural networks (DNNs) have developed rapidly and achieved great success in computer vision\cite{schmidhuber2015deep}, natural language processing (NLP), audio processing, and graph data processing. At the same time, however, DNNs are vulnerable to adversarial examples\cite{szegedy2013intriguing,goodfellow2014explaining}: by adding perturbations that are not easily detectable by humans to the original examples, an attacker can make a DNN model produce wrong classification predictions. This is especially concerning in real-world scenarios such as medicine, autonomous driving, and finance, where adversarial examples can place people's property, and even health and safety, at risk. Research on adversarial examples has mainly focused on computer vision (CV), especially image recognition and classification, although more researchers have recently turned their attention to NLP\cite{papernot2016crafting,kwon2022ensemble,shao2022triggers}. Compared to the richness and diversity of CV adversarial attack algorithms, however, adversarial attacks in NLP still have room for development. First, because text data is discrete, image-domain adversarial example generation methods cannot be applied to it directly. Second, while perturbations in images are small changes in pixel values that are difficult for the human eye to detect, small textual perturbations, such as character or word substitutions that produce invalid words or syntactically incorrect sentences, can change the semantics of a sentence and be easily detected. In addition, if a gradient-based attack from the image domain is applied directly to vectorized text features, the generated adversarial examples may be invalid sequences of characters or words.

Adversarial attacks are mainly divided into white-box\cite{ebrahimi2017hotflip} and black-box\cite{gao2018black} attacks. A white-box attack requires prior information about the target model, such as its parameters, architecture, and training data, from which gradient information about the model can be obtained; it therefore has more prerequisites, since sufficient information about the model must be available. A black-box attack cannot access the internal structure or parameters of the target model and relies only on the model's inputs and outputs, such as discriminative probabilities or confidence scores. A black-box attack that uses only the target model's top predicted label, without any confidence information, is a hard-label adversarial attack.

It is therefore crucial to generate high-quality adversarial examples quickly and efficiently under the black-box hard-label setting. To address this challenge, this study proposes a differential-evolution-based method\cite{das2010differential}, the text-based differential evolution (TDE) algorithm, which has high generation efficiency and greatly reduces the number of queries needed to generate high-quality adversarial examples. {\color{red}Our method rejects the genetic-algorithm-style optimization strategy of existing methods, because that strategy is likely to fall into local optima and repeated searches once the individuals in the population become identical or similar. Instead, our method introduces diversity by designing multiple mutated individuals and, by continuously updating individuals while optimizing the adversarial examples, avoids repeatedly searching the same adversarial example individuals, which makes the overall attack process more efficient.} The optimization of the initial hard-label adversarial example is divided into two steps. In the first step, keywords are selected for population generation by judging the importance of the words replaced in the initial adversarial example, which greatly reduces the population size. In the second step, the better of the generated populations are selected: only the head populations with high semantic similarity proceed to combinatorial optimization, avoiding an excessive number of meaningless queries. Detailed experiments compare the proposed algorithm with existing methods. The results show that the approach greatly reduces the number of queries while still generating high-quality adversarial examples across different scenarios, models, text tasks, and experimental settings.

{\color{red}In this study, we propose an efficient adversarial attack method based on the idea of differential evolution under the textual black-box hard-label setting. It increases diversity and avoids local optima and repeated searches through multiple initialization and the continuous updating of the individuals in the population, thus ensuring both the efficiency and the quality of adversarial example generation. Our main contributions in this paper are the following}:
\begin{enumerate}[(1)]
\itemsep=0pt
\item The first population-based differential evolution method for the hard-label setting, which generates high-quality adversarial examples with very few queries;
\item Multi-angle initialization of adversarial examples, which avoids the impact of a single initialization on the quality of the adversarial examples;
\item A highly efficient method that greatly reduces the number of queries and the time required by the attack, making hard-label NLP text attacks more realistic and significantly reducing the expenditure an attack requires.
\end{enumerate}


This paper has seven sections, organized as follows. Section 1 introduces the background and the problems of existing work. Section 2 presents work related to adversarial attacks. Section 3 describes the advantages of our work over existing work. Section 4 introduces our attack algorithm in detail. Section 5 describes the experimental setup. Section 6 compares and analyzes the experimental results against several baselines. Section 7 summarizes this paper and identifies future research directions.


\section{Related Work}

According to the surveys of adversarial attacks and defenses in NLP from recent years, listed in Table I, textual adversarial attacks are divided in two ways:

\begin{table}[pos=h]
\centering
\caption{{Survey List}}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|}
\hline
\textbf{Title} & \textbf{Year} \bigstrut\\
\hline
Machine Learning Model Security and Privacy Research: A Survey\cite{ji2020Machine} & 2020 \bigstrut\\
\hline
Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey\cite{zhang2020adversarial} & 2020 \bigstrut\\
\hline
Towards a Robust Deep Neural Network in Texts: A Survey\cite{wang2019towards} & 2021 \bigstrut\\
\hline
Measure and Improve Robustness in NLP Models: A Survey\cite{wang2021measure} & 2022 \bigstrut\\
\hline
Adversarial attack and defense technologies in natural language processing: A survey\cite{qiu2022adversarial} & 2022 \bigstrut\\
\hline
\end{tabular}%
}
\label{tab:surveys}%
\end{table}%

\textbf{White-box attacks} All information about the target model can be obtained: its architecture, parameters, training data, gradient information, loss function, and output.

\textbf{Black-box attacks} None of the aforementioned key information about the model is available, with the exception of the model output. Black-box attacks can be further subdivided: if the model output contains prediction confidences, the attack is classified as \textbf{soft-label}; if only the model's top predicted label can be obtained, without prediction confidences, the attack is \textbf{hard-label}.

\subsection{White-box Attacks}
Ours is not the first study to investigate the problem of adversarial examples for text sequences: an adversarial text generation method based on the idea of the \emph{JSMA} algorithm had already attacked recurrent neural networks successfully. Ebrahimi proposed \emph{HotFlip2017}\cite{ebrahimi2017hotflip}, a white-box adversarial text generation method based on gradient optimization, and extended it to targeted attacks in subsequent work. The method handles discrete text structures in one-hot representation and makes character-level text classification models err through character substitution. Based on the idea of the FGSM algorithm, \emph{Liang}\cite{liang2017deep} proposed using the gradient to measure the influence of words on the classification result and perturbing important words by insertion, deletion, and modification. However, adding perturbations in this method requires human intervention, so \emph{Samanta}\cite{samanta2017towards} automated the process and restricted the words to be replaced or added so as to maintain the correct grammatical structure of the original text. Meanwhile, \emph{Gong}\cite{gong2018adversarial} perturbed the word embedding based on the ideas of FGSM and DeepFool and then used the Word Mover's Distance (WMD) to find the nearest-neighbor words for replacement.
On the other hand, \emph{Lei}\cite{lei2019discrete} demonstrated the submodularity of network functions for text classification and showed that a greedy algorithm can approximate the optimal solution very well.
\subsection{Black-box Attacks}
\textbf{Black-box soft-label}
In \emph{Deepwordbug2018}\cite{gao2018black}, subtle textual perturbations are generated effectively in a black-box setting, forcing the classifier to misclassify textual input. It adopts a novel scoring strategy to identify the key tokens whose modification causes the classifier to make incorrect predictions. In \emph{Textbugger2018}\cite{li2018textbugger}, the most important sentences are found first and a scoring function is then used to find the keywords within them. \emph{GA2018}\cite{alzantot2018generating} is a black-box population-based optimization algorithm that generates semantically and syntactically similar adversarial examples. \emph{PWWS2019}\cite{ren2019generating} introduces a new word-substitution order, jointly determined by word saliency and classification probability, and proposes a greedy algorithm called probability-weighted word saliency for textual adversarial attacks. \emph{SememePSO2019}\cite{zang2019word} uses particle swarm optimization as the search algorithm to generate adversarial examples in the black-box setting. \emph{BAE2020}\cite{garg2020bae} is a black-box attack that uses contextual perturbations from the BERT masked language model to generate adversarial examples. \emph{Textfooler2020}\cite{jin2020bert} is a baseline textual adversarial method that replaces vulnerable words of sentences with synonyms. On the other hand, \emph{Bertattack2020}\cite{li2020bert} generates high-quality adversarial examples using BERT pre-trained masked language models.
Next, \emph{LSH and Attention2021}\cite{maheshwary2021strong} combines attention mechanisms with locality-sensitive hashing (LSH) to reduce the number of queries.
Then, \emph{SemAttack2022}\cite{wang2022semattack} constructs semantic perturbation functions and searches for the best perturbation in the different semantic spaces it identifies, which generates adversarial examples more efficiently.
Finally, \emph{DiscreteBlockBayesAttack2022}\cite{lee2022query} queries discrete text data using Bayesian optimization, with an ARD categorical kernel used to dynamically compute important positions and thus generate adversarial examples efficiently.

\textbf{Black-box hard-label} Attacks in the hard-label setting are more difficult than in the soft-label setting, but they are also more realistic. Hard-label adversarial attacks are therefore gradually developing.
In \emph{TextDecepter2020}\cite{saxena2020textdecepter}, no model information is public and the attacker can only query the model for the classifier's final decision, without the confidence scores of the classes involved. Meanwhile, in \emph{Hard-label attack2021}\cite{maheshwary2021generating}, the words in the adversarial examples are optimized by a genetic algorithm to generate high-quality adversarial examples.

According to our research, there has been tremendous growth in work related to hard-label attacks\cite{qin2022fuzzing} and to adversarial attacks in other fields\cite{xu2020community,xu2023adversarial}. Adversarial attacks on black-box soft-label models have also progressed greatly, and current work has gradually turned to the question of how to improve attack efficiency. There remains a large research space for hard-label adversarial attacks, however, and the existing hard-label attacks in NLP still leave much to be improved: the generation of adversarial examples can still gain in both quality and efficiency. It is therefore necessary and meaningful to study how to efficiently generate high-quality adversarial text under the black-box hard-label setting.

\section{Inferences and Challenges}
The mutation operation of a genetic algorithm transforms a segment of the genes of an individual in the population to obtain a new individual, that is, a new solution. Its purpose is to find a better choice by generating new solutions. However, the gene segment generated by the mutation may overlap genetically with individuals already in the population; in that case the mutation is meaningless, since it generates no new solution, and in the later stages of optimization the whole population may fall into a local optimum. The solution produced by the mutation operation therefore needs to be distinguishable from the solutions that already exist in the population. The differential evolution optimization algorithm satisfies this requirement: its mutation adds the scaled difference of two existing individuals to a third. The scaling ensures that a newly generated individual differs from the individuals in the original population, which preserves the searchability of new individuals.

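The difference-based mutation described above can be sketched concretely. The following is a minimal illustration of the classic DE/rand/1 mutation operator on real-valued vectors; the population and the scaling factor \texttt{f\_scale} are illustrative assumptions, not values from this paper:

```python
import random

def de_mutate(population, target_idx, f_scale=0.5):
    """DE/rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3).

    Three distinct individuals, all different from the target, are
    combined; whenever x_r2 != x_r3 the scaled difference makes the
    mutant differ from the existing individuals, preserving diversity.
    """
    candidates = [i for i in range(len(population)) if i != target_idx]
    r1, r2, r3 = random.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + f_scale * (b - c) for a, b, c in zip(x1, x2, x3)]
```

In TDE the individuals are discrete word-substitution patterns rather than real-valued vectors, so the difference operation is realized over substitutions, but the diversity argument is the same.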
\begin{figure}[pos=h]
% optional position: h = here, t = top of page
\centering
\includegraphics[scale=0.37]{figs/Algorithm Comparison.pdf}
\caption{Algorithm Comparison}
\label{fig:1}
\end{figure}
Among population-based optimization algorithms, the differential evolution (DE) algorithm\cite{panduro2009comparison,karabouga2004simple,vesterstrom2004comparative} has a global search capability close to that of the genetic algorithm (GA) but converges faster. As can be seen in Figure I, DE reaches the optimal solution of the objective function in 100 iterations, whereas GA needs 200 iterations. DE can thus further address the low efficiency of existing work while preserving the quality of the adversarial examples. According to related articles and references\cite{alatas2020comparative,akyol2017plant}, it is feasible to use differential evolutionary optimization to improve on the efficiency of the genetic algorithms used in existing black-box hard-label adversarial attacks.

Unlike the original evolutionary algorithms, in both existing work and our method the search space for word-level adversarial examples is discrete; the attack algorithms therefore only borrow the related ideas of population evolution. An adversarial attack under black-box hard-label cannot determine the decision boundary, because the model parameters, confidence information, and training information are unavailable: the decision boundary can only be approximated through a large number of attempts that obtain the model's top predictions. This is why studying adversarial attacks under black-box hard-label requires a large number of queries and is challenging.
Among the challenges, the following need to be addressed:
\begin{enumerate}[(1)]
\itemsep=0pt
\item In the field of NLP, there are few precedents for combining black-box hard-label adversarial attacks with the idea of differential evolution.
\item Because differential evolution is a more efficient optimization algorithm, the proposed method must significantly improve efficiency compared to the original genetic algorithm.
\item It is crucial to balance efficiency against the quality of adversarial example generation.
\end{enumerate}
\section{Proposed Work}

\begin{table}[pos=h]
\centering
\caption{{Algorithm Symbols List.}}
\resizebox{\linewidth}{!}
{
\begin{tabular}{ll}
\hline
\hline
Symbols & Details \bigstrut\\
\hline
$\mathbb X$ & Original input text \\
$F$ & The text classifier \\
$\mathbb Y$ & Prediction of the classifier for the original input text \\
$\mathbb{X^{\prime}}$ & The initial adversarial example \\
$\mathbb X^{\prime\prime}$ & Adversarial example after reducing the search space \\
$\mathbb X^{\ast}$ & Better adversarial example \\
$\textit{w}$ & Synonym of a word \\
$\mathit{S_{i}}$ & Score of semantic similarity \\
$\epsilon$ & Semantic similarity of the initial adversarial example \\
$\mathit S^{\ast}$ & Better score of semantic similarity \\
$\mathbf{G}$ & Maximum iterations \\
$\mathbf{S}$ & Initial generation size \\
$\mathit{P}$ & Generated populations \\
$\mathbb{X^{\ast}_{ADV}}$ & Optimal adversarial example \bigstrut[b]\\
\hline
\hline
\end{tabular}%
}
\label{tab:symbols}%
\end{table}%

\subsection{Problem Formulation}

Given a text sample $\mathbb{X}$ with $n$ words, $\mathbb{X} = [x_{1}, x_{2}, \ldots, x_{n}]$, the ground-truth label of $\mathbb{X}$ under the target model is $\mathbb{Y}$. We want an adversarial example $\mathbb{X^{\prime}}$ that is similar to $\mathbb{X}$ yet makes the target model, the classifier $F$, misclassify it, i.e.,
\begin{equation}
\hspace*{1.85cm} F\left(\mathbb{X^{\prime}}\right) \neq
F(\mathbb{X})=\mathbb{Y}
\end{equation}

Since all the adversarial examples associated with $\mathbb{X}$ are generated by synonym substitution of words in $\mathbb{X}$, let $w_{i}$ be the synonym substituted for $x_{i}$; an adversarial example then has the form $\mathbb{X^{\prime}} = [w_{1}, w_{2}, \ldots, w_{i}, \ldots, x_{n}]$. The goal is to find the adversarial example $\mathbb{X^{\ast}}$ with the highest semantic similarity ($\mathit{Sim}$) among all the adversarial examples $\mathbb{X^{\prime}}$. Since the generation process must consider not only the semantic similarity of the adversarial example but also its efficiency, the average number of queries ($\mathit{Qrs}$) is introduced. Pursuing highly efficient generation while guaranteeing high-quality adversarial examples can be expressed as a formula:
\begin{gather}
\mathbb{X^{\ast}_{ADV}}=\max \operatorname{Sim}\left(\mathbb{X}, \mathbb{X^{\prime}}\right)
\& \min \operatorname{Qrs}\left(\mathbb{X^{\prime}}\right),\\
\hspace*{1.85cm} \notag \text{s.t.}\; F\left(\mathbb{X^{\prime}}\right) \neq F(\mathbb{X})
\end{gather}
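The similarity term $\mathit{Sim}$ above is typically the cosine similarity of sentence embeddings. A minimal sketch follows, with a hypothetical \texttt{encode} function standing in for any sentence encoder (e.g., the Universal Sentence Encoder could play this role):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def sim(x_words, x_adv_words, encode):
    """Sim(X, X') from the formulation: encode both texts, then compare."""
    return cosine_similarity(encode(" ".join(x_words)),
                             encode(" ".join(x_adv_words)))
```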
\subsection{Method}

The study proposes a new black-box hard-label attack method: a population-based DE that solves the inefficiency of existing methods while further improving the semantic similarity of the adversarial examples. The symbols used by the algorithm are listed in Table II.

The generated adversarial examples should be as similar as possible to the original text in both structure and semantics, so we seek minimal changes and perturbations. We therefore only perform synonym substitution on the identified keywords of the text, without adding, inserting, or deleting words, to generate adversarial examples with high semantic similarity.

\begin{figure}[pos=h]
% optional position: h = here, t = top of page
\centering
\includegraphics[scale=0.105]{figs/Attack Diagram.pdf}
\caption{{Attack Diagram.}}
\label{fig:2}
\end{figure}

\begin{algorithm}[H] % use [H] if the float does not display
\caption{Initialize adversarial examples and search space reduction}
\small
\LinesNumbered
\KwIn{Original input text $\mathbb X$, the text classifier $F$, label $F(\mathbb X)=\mathbb Y$}
\KwOut{Better adversarial example $\mathbb X^{\ast}$}
$indices \leftarrow$ randomly selected positions\;
\For{$i$ in $indices$}{
$w \leftarrow \mathrm{random}(\mathrm{Syn}(x_{i}))$\;
$\mathbb X^{\prime} \leftarrow$ replace $x_{i}$ with $w$ in $\mathbb X$\;
\If{$F(\mathbb X^{\prime}) \neq \mathbb Y$}{\textbf{break}\;}
}
$\mathbb{X^{\ast}} \leftarrow \mathbb{X^{\prime}}$ \tcp*{initialize adversarial example}
\For{$i$ in $indices$}{
$\mathbb X_{i} \leftarrow$ replace $w_{i}$ with $x_{i}$ in $\mathbb{X^{\ast}}$\;
$\mathit{S_{i}} \leftarrow \mathit{Sim}(\mathbb X, \mathbb X_{i})$\;
\If{$F(\mathbb X_{i}) \neq \mathbb Y$}{$Queue.append(\mathit{S_{i}}, x_{i})$\;}
}
Sort $Queue$ by $\mathit{S_{i}}$\;
\For{$x_{i}$ in $Queue$}{
$\mathbb X^{\prime\prime} \leftarrow$ replace $w_{i}$ with $x_{i}$ in $\mathbb X^{\ast}$\;
\If{$F(\mathbb X^{\prime\prime}) = \mathbb Y$}{\textbf{break}\;}
$\mathit S^{\ast} \leftarrow \mathit{Sim}(\mathbb X, \mathbb X^{\prime\prime})$\;
\eIf{$\mathit S^{\ast} \geqslant \epsilon$}{
$\mathbb X^{\ast} \leftarrow \mathbb X^{\prime\prime}$ \tcp*{search space reduction}
}{
$Queue^{\ast}.append(\mathit S^{\ast}, \mathbb X^{\prime\prime})$\;
re-initialize to obtain a new adversarial example $\mathbb X^{\prime\prime}_{i}$\;
$\mathit S^{\ast}_{i} \leftarrow \mathit{Sim}(\mathbb X, \mathbb X_{i}^{\prime\prime})$\;
$Queue^{\ast}.append(\mathit S^{\ast}_{i}, \mathbb X^{\prime\prime}_{i})$\;
}
}
Sort $Queue^{\ast}$ by $\mathit{S_{i}^{\ast}}$\;
$\mathbb X^{\ast} \leftarrow \arg\max_{i} \mathit S^{\ast}_{i}$\;
\KwRet{$\mathbb X^{\ast}$}
\end{algorithm}
\subsubsection{Adversarial example generation and search space reduction}
|
||
{$\textbf{Initializing adversarial example}$}
|
||
As described in Algorithm 1, the process of initializing the adversarial example $\mathbb{X^{\prime}}$ by performing synonym replacement of the original input sentence $\mathbb{X}$, the synonyms of each word in the sentence, except for the filtered-out stop words, are guaranteed to be selected as candidates for the first 50 synonyms provided they are the same part of speech. By continuously replacing the word $x_{i}$ in sentence $\mathbb{X}$, the sentence is allowed to gradually move outside the decision boundary so that the classifier's judgment of the sentence changes at which point the initialization of the adversarial example is completed. To control the semantic similarity, the threshold value of words undergoing replacement is set at $30\%$ and no large-scale replacement of words in the sentence is allowed to occur. In this way, the generation and semantic similarity of the adversarial examples as well as the facilitation of a series of operations afterwards can be ensured.
{$\textbf{Reduce search space}$} After initializing the adversarial example, the optimization cannot yet be performed directly, because the number of replaced words in the adversarial example is still large. If the search space were not reduced at this point, the optimization algorithm would have to search over every substituted word, and both GA and DE would then issue a large number of unnecessary queries. It is therefore necessary to minimize the number of replaced words while keeping the example adversarial. This step reduces both the number of word substitutions and the workload of the optimization process.
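One way to realize this reduction is to greedily restore original words while the hard label stays flipped; a minimal sketch under that assumption, where \texttt{predict} is again an assumed hard-label query:

```python
def reduce_search_space(orig_words, adv_words, orig_label, predict):
    # Try to put each original word back; keep the restoration only if
    # the example is still misclassified (the hard label stays different).
    words = list(adv_words)
    for i, orig_word in enumerate(orig_words):
        if words[i] == orig_word:
            continue  # this position was never perturbed
        trial = list(words)
        trial[i] = orig_word
        if predict(trial) != orig_label:
            words = trial  # still adversarial with one fewer substitution
    return words
```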
% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{{Multiple Initialization Diagram.}}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|ccc|}
\hline
Original Text & an enthralling , entertaining feature. & $\mathit{Sim}$ \bigstrut\\
\hline
Single Initialization & an \textcolor{red}{\textit{[[puzzling]]}} , \textcolor{red}{\textit{[[comic]]}} feature . & 0.451 \bigstrut[t]\\
Multiple Initialization & an enthralling , \textcolor{red}{\textit{[[comical]]}} \textcolor{red}{\textit{[[peculiarity]]}} . & \textbf{0.734} \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

Figure II shows the whole attack diagram. The proposed algorithm is also designed to avoid the generally low quality of adversarial examples caused by a single random initialization. If the semantic similarity between the adversarial example and the original input is too low, random initialization is performed again from different directions, and the adversarial example with the highest semantic similarity is selected for the next optimization process. As shown in Table III, the semantic similarity increases from 0.451 to 0.734.
% % Table generated by Excel2LaTeX from sheet 'Sheet1'
\subsubsection{DE-based optimization algorithm}
\begin{algorithm}[h]% if the algorithm does not display correctly, change [h] to [H] to pin the float's position
\small
\caption{Optimize adversarial examples}
\LinesNumbered
\KwIn{Original input text $\mathbb X$, Better adversarial example $\mathbb X^{\ast}$, maximum iterations $\mathbf{G}$, initial generation size $\mathbf{S}$ }
\KwOut{Optimal Adversarial Example $\mathbb{X^{\ast}_{ADV}}$}
\textbf{for} $i$ = 1 $\rightarrow$ $\mathbf{S}$ population \textbf{do}\\
\hspace*{0.5cm} $\mathit{P^{0}_{i}}$ $\leftarrow$ Mutate( $\mathbb X^{\ast}$,$w_{i}$) \\
\textbf{for} $g$ = 1 $\rightarrow$ $\mathbf{G}$ generation \textbf{do}\\
\hspace*{0.5cm} \textbf{for} $i$ = 1 $\rightarrow$ $\mathbf{S}$ population \textbf{do}\\
\hspace*{1cm} $\mathit{S_{i}^{g-1}}$ = $\mathit{Sim}$( $\mathbb{X^{\ast}}$,$\mathit{P^{g-1}_{i}}$) \\
$\mathit{P^{g}_{i}}$ = $\mathop{\mathrm{arg\max}}{\mathit{S_{i}^{g-1}}}$\\
$\mathit{P^{g}}$.append($\mathit{P^{g}_{i}}$)\\
\textbf{for} $i$ = 2 $\rightarrow$ $\mathbf{S}$ population $\textbf{do} $\\
\hspace*{0.5cm} ${P^{g\ast}_{i}}$ $\leftarrow$ $Crossover$($\mathit{P^{g}_{i}}$,${P^{g}_{j}}$,${P^{g}_{k}}$) 3 $\leq $ i $\neq $ j $\neq $ k $\leq $ $\mathbf{S}$ \\
\hspace*{1.0cm}$\mathbf{if}$ $\mathit{Sim}$($\mathbb{X}$ , ${P^{g\ast}_{i}}$ ) $\geq $ $\mathit{Sim}$($\mathbb X$ , $\mathbb X^{\ast}$) \\
\hspace*{1.0cm} $\mathit{S_{i}^{g\ast}}$ = $\mathit{Sim}$( $\mathbb{X^{\ast}}$,$\mathit{P^{g\ast}_{i}}$) \\
\hspace*{0.5cm} $\mathbb{X^{\ast}_{ADV}}$ = $\mathop{\mathrm{arg\max}}\mathit{S_{i}^{g\ast}}$ \\
$\mathbf{else} $\\
\textbf{for} $i$ = 1 $\rightarrow$ $\mathbf{S}$ population \textbf{do}\\
\hspace*{0.5cm} Randomly Sample $parent_{j}$,$parent_{k}$ from $\mathit{P^{g}}$ \\
\hspace*{0.5cm} $\mathit{child_{i}} $ = Selection($parent_{j}$, $parent_{k}$) j $\neq $ k\\
\hspace*{0.5cm} $\mathbf{if} $ $\mathit{Sim}$($\mathbb X$,$\mathit{child_{i}}$) $\geq $ $\mathit{Sim}$( $\mathbb X$ , $\mathbb X^{\ast}$) \\
\hspace*{1.0cm} $\mathit{S_{i}^{child}}$ = $\mathit{Sim}$( $\mathbb{X^{\ast}}$,$\mathit{child_{i}}$) \\
\hspace*{0.5cm} $\mathbb{X^{\ast}_{ADV}}$ = $\mathop{\mathrm{arg\max}}\mathit{S_{i}^{child}}$ \\
% \hspace*{1cm} $\mathbb{X^{\ast}_{ADV}}$ = $\mathit{child_{i}} $ \\
\hspace*{0.5cm} $\mathbf{else} $\\
\hspace*{1cm}$\mathit{P^{g-1}_{i}}$ $\leftarrow$ Mutate( $\mathbb X^{\ast}$,$w_{i}$) \\
\hspace*{1cm}$\mathbb{X^{\ast}_{ADV}}$ = $\mathit{Sim}$($\mathbb X$ , $\mathit{P^{g-1}_{i}}$) \\
\textbf{return} $\mathbb{X^{\ast}_{ADV}}$
\end{algorithm}
DE is an efficient global optimization algorithm and a population-based heuristic search algorithm. As described in Algorithm 2, during each iteration the quality of the candidate words in the population is evaluated using semantic similarity as the fitness function. As shown in Figure III, the optimization of the whole adversarial example consists of three parts, and the high-quality candidate words generated by the mutation, crossover, and selection operations are deposited into the population. The DE algorithm has 4 main steps:
\begin{enumerate}[(1)]
\itemsep=0pt
\item \textit{Initialization}: The adversarial example, after search space reduction, is used as the initial population.
\item \textit{Mutation}: The words in the initial population are replaced by synonyms, and a certain number of individuals are deposited in the population, provided each remains an adversarial example.
\item \textit{Crossover}: As far as possible, the words of individuals in the population are replaced by words with high semantic similarity, while words from the original input are also deposited into the individuals.
\item \textit{Selection}: A replacement operation is performed for each word that differs from the original input. First, the original word is tried, and the population is updated if the example remains adversarial. Otherwise, the synonym with the highest semantic similarity is selected for replacement and the population is updated.
\end{enumerate}
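The four steps above can be tied together in a compact sketch. Here \texttt{sim}, \texttt{mutate}, \texttt{crossover}, and \texttt{is\_adversarial} stand in for the components of Algorithm 2; they and the default parameter values are our own illustrative assumptions, not the paper's released code.

```python
import random

def de_optimize(x_orig, x_adv, sim, mutate, crossover, is_adversarial,
                pop_size=4, generations=10, seed=0):
    rng = random.Random(seed)
    # (1) Initialization: seed the population from the reduced example.
    pop = [mutate(x_adv, rng) for _ in range(pop_size)]
    pop = [p for p in pop if is_adversarial(p)] or [list(x_adv)]
    best = max(pop, key=lambda p: sim(x_orig, p))
    for _ in range(generations):
        new_pop = []
        for i, p in enumerate(pop):
            # (2)/(3) Mutation and crossover: recombine three distinct
            # members when the population is large enough, else mutate.
            if len(pop) >= 3:
                j, k = rng.sample([m for m in range(len(pop)) if m != i], 2)
                trial = crossover(p, pop[j], pop[k], rng)
            else:
                trial = mutate(p, rng)
            # (4) Selection: keep the trial only if it stays adversarial and
            # does not lower the semantic similarity to the original input.
            if is_adversarial(trial) and sim(x_orig, trial) >= sim(x_orig, p):
                new_pop.append(trial)
            else:
                new_pop.append(p)
        pop = new_pop
        best = max(pop + [best], key=lambda p: sim(x_orig, p))
    return best
```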
\begin{figure*}[h]
% h places the float here; t places it at the top of the page
\centering
\includegraphics[scale=0.13]{figs/Differential Evolution.pdf}
\caption{{\color{red}Differential Evolution Diagram.}}
\label{fig:3}
\end{figure*}
{$\textbf{Mutation}$} The mutation operation plays a crucial role in the DE algorithm. After Algorithm 1, a better adversarial example has been generated. For a given better adversarial example $\mathbb{X^{\ast}}$, where $\mathbb{X^{\ast}}=\left\{x_1, w_2, x_3, w_4, \cdots, x_n\right\}$, each replaced word is first restored to its original word one at a time. If the example $\mathbb{X^{\ast}}$ stops being adversarial after a word is restored, that replaced word is identified as important; under the hard-label setting, this is how word importance is judged. Throughout the mutation process, the synonym search and replacement of important words is satisfied with priority, and the individuals with the highest semantic similarity are deposited in the population. If the synonym individuals of important words do not fill the whole population, the remaining substituted words in the adversarial example $\mathbb{X^{\ast}}$ are also subjected to synonym substitution and the results deposited into the population.

\begin{gather}
\mathit{P_1^g}=\left\{x_1, w_2^{\prime}, x_3, w_4, \cdots, x_n\right\}\\
\mathit{P_2^g}=\left\{x_1, w_2, x_3, w_4^{\prime}, \cdots, x_n\right\}\\
\notag\hspace*{1cm}\vdots\\
\mathit{P^g}=\left\{\mathit{P_1^g}, \mathit{P_2^g}, \cdots, \mathit{P_n^g}\right\}
\end{gather}
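The population fill for important positions might look like the following sketch; \texttt{get\_synonyms}, \texttt{sim\_to\_orig}, and \texttt{is\_adversarial} are assumed placeholders for the synonym source, the USE-based fitness, and the hard-label query.

```python
def mutate_population(adv_words, important, get_synonyms, sim_to_orig,
                      is_adversarial, pop_size):
    # Generate synonym variants at the important positions first, keep only
    # those that remain adversarial, and deposit the top individuals by
    # semantic similarity into the population.
    scored = []
    for i in important:
        for syn in get_synonyms(adv_words[i]):
            cand = list(adv_words)
            cand[i] = syn
            if is_adversarial(cand):
                scored.append((sim_to_orig(cand), cand))
    scored.sort(key=lambda t: t[0], reverse=True)  # best similarity first
    return [cand for _, cand in scored[:pop_size]]
```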
$\textbf{Crossover}$ In the generated population, the algorithm selects different optimization strategies based on the number of individuals in the population. When three or more words in the population have changed, the crossover operation can be performed. In crossover, individuals in the population are combined position by position with the words of high semantic similarity. After several crossovers, all the changed words in the adversarial example can be replaced by words with higher semantic similarity, so the semantic similarity of the adversarial example is greatly improved.

\begin{gather}
\mathit{P_i^g}=\left\{x_1, w_2^{\prime}, x_3, w_4^{\prime}, \cdots, w_{n-1}, x_n\right\}\\
\mathit{P_i^{g{\prime}}}=\mathit{P_i^g}+\lambda\left(\mathit{P_j^g}-\mathit{P_k^g}\right)\\
\operatorname{Sim}\left(\mathbb X, \mathit{P_i^{g{\prime}}}\right) \geqslant \operatorname{Sim}\left(\mathbb X, \mathbb{X^{\ast}}\right)\\
\notag\hspace*{1.75cm}\Downarrow \cdots\\
\mathit{P_i^{g{\ast}}}=\left\{\mathit{P_{i 1}^{g^{\prime}}}, \mathit{P_{i 2}^{g^{\prime}}}, \cdots, \mathit{P_{i n}^{g^{\prime}}}\right\}
\end{gather}
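The update $P_i^{g\prime}=P_i^g+\lambda(P_j^g-P_k^g)$ has no literal arithmetic meaning in word space, so one plausible reading, sketched below on our own assumptions, recombines three members position by position and keeps the candidate word with the highest similarity score; \texttt{word\_sim} is an assumed scoring placeholder.

```python
def crossover(p_i, p_j, p_k, word_sim):
    # Position-wise recombination of three population members: at each
    # position keep the candidate word with the highest similarity score.
    child = []
    for pos, cands in enumerate(zip(p_i, p_j, p_k)):
        child.append(max(cands, key=lambda w: word_sim(pos, w)))
    return child
```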
\begin{figure}[h]
% h places the float here; t places it at the top of the page
\centering
\includegraphics[scale=0.080]{figs/Combinatorial Optimization.pdf}
\caption{{\color{red}Combinatorial Optimization Diagram.}}
\label{fig:4}
\end{figure}
$\textbf{Selection}$ After several crossover operations, if multiple positions still hold transformed words and further crossovers cannot improve the quality, the corresponding original words are individually tried as replacements to reduce the number of word transformations. If only two positions disagree with the original text, the individuals with the highest semantic similarity are directly selected from the population and combined, followed by an attempt to substitute the original words to reduce that number further. If only one transformed position remains, the synonym with the highest semantic similarity is selected for replacement. Once the quality of the adversarial example cannot be improved further, it is output as the optimal adversarial example. Figure IV shows the whole optimization process.

\begin{gather}
\mathit{P_i^g}=\left\{x_1, w_2^{\prime}, w_3^{\prime}, \cdots, w_i^{\prime}, \cdots, x_n\right\}\\
\notag\hspace*{3.17cm}\Downarrow \emph{Update}\\
\mathit{P_i^g}=\left\{x_1, w_2^{\prime}, w_3^{\prime}, \cdots, x_i, \cdots, x_n\right\}
\end{gather}
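The per-position update can be sketched as follows; \texttt{score} ranks synonym candidates by semantic similarity and \texttt{is\_adversarial} is the hard-label query, both assumed placeholders rather than the paper's implementation.

```python
def select_update(words, pos, orig_word, synonyms, score, is_adversarial):
    # Prefer restoring the original word (one fewer transformation); if that
    # breaks the attack, fall back to the most similar synonym that keeps
    # the example adversarial.
    trial = list(words)
    trial[pos] = orig_word
    if is_adversarial(trial):
        return trial
    for syn in sorted(synonyms, key=score, reverse=True):
        trial = list(words)
        trial[pos] = syn
        if is_adversarial(trial):
            return trial
    return list(words)  # no admissible update at this position
```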
\section{Experiments}
To demonstrate the effectiveness of the proposed algorithm, extensive experiments were conducted on three tasks: text classification\cite{pang2002thumbs,zhang2015character}, toxic comment detection\cite{hosseini2017deceiving}, and natural language inference\cite{bowman2015large}, under different scenario settings, across six models and four APIs, using nine standard datasets and a set of crawled data. Compared with several baselines, the proposed hard-label attack method based on population-based differential evolution generates adversarial examples with high efficiency and quality. The experiments are built on the TextAttack\cite{morris2020textattack} framework.
\subsection{Datasets}
\begin{table}[pos=h]
\centering
\caption{Statistics of all datasets}
\resizebox{\linewidth}{!}
{
\begin{tabular}{lrrrr}
\hline
\multicolumn{1}{|c|}{\textbf{Task}} & \multicolumn{1}{c|}{\textbf{Dataset}} & \multicolumn{1}{|c|}{\textbf{Train}} & \multicolumn{1}{c|}{\textbf{Test}} & \multicolumn{1}{c|}{\textbf{Avg Len}} \bigstrut\\
\hline
\multicolumn{1}{|c|}{\multirow{7}[2]{*}{\textbf{Classification}}} & \multicolumn{1}{c|}{MR} & \multicolumn{1}{|c|}{9K} & \multicolumn{1}{c|}{1K} & \multicolumn{1}{c|}{20} \bigstrut[t]\\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{IMDB} & \multicolumn{1}{c|}{12K} & \multicolumn{1}{|c|}{12K} & \multicolumn{1}{c|}{215} \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{CoLA} & \multicolumn{1}{c|}{8.5K} & \multicolumn{1}{|c|}{1K} & \multicolumn{1}{c|}{9} \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{SST2} & \multicolumn{1}{c|}{67K} & \multicolumn{1}{|c|}{1.8K} & \multicolumn{1}{c|}{17} \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{AMAZON} & \multicolumn{1}{c|}{1.8M} & \multicolumn{1}{|c|}{200K} & \multicolumn{1}{c|}{82} \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{YELP} & \multicolumn{1}{c|}{560K } & \multicolumn{1}{|c|}{18K } & \multicolumn{1}{c|}{152} \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{TWEETS} & \multicolumn{1}{c|}{} & \multicolumn{1}{|c|}{\textbf{*}} & \multicolumn{1}{c|}{} \bigstrut[b]\\
\hline
\multicolumn{1}{|c|}{\textbf{Toxic}} & \multicolumn{1}{c|}{Jigsaw-toxicity} & \multicolumn{1}{c|}{150K} & \multicolumn{1}{c|}{150K} & \multicolumn{1}{c|}{128} \bigstrut\\
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{\textbf{Entailment}}} & \multicolumn{1}{c|}{SNLI} & \multicolumn{1}{|c|}{120K} & \multicolumn{1}{c|}{7.6K} & \multicolumn{1}{c|}{8} \bigstrut[t]\\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{MNLI} & \multicolumn{1}{c|}{12K} & \multicolumn{1}{|c|}{4K} & \multicolumn{1}{c|}{11} \bigstrut[b]\\
\hline
*Crawler gets comments from Twitter & & & & \bigstrut[t]\\
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%
As shown in Table IV, \emph{MR}\cite{pang2005seeing} is a movie review dataset for binary sentiment classification. \emph{IMDB}\cite{maas2011learning} is a dataset of longer, sentiment-dichotomous movie reviews. \emph{CoLA}\cite{warstadt2019neural} is a single-sentence acceptability judgment dataset collected from books and journal articles. \emph{SST2}\cite{socher2013recursive} is a single-sentence binary sentiment dataset collected from movie reviews with human annotations. \emph{AMAZON}\cite{lhoest-etal-2021-datasets} is a sentence-level sentiment classification dataset collected from Amazon product reviews. \emph{YELP}\cite{zhang2015character} is a widely used text dataset for a binary sentiment classification task. \emph{TWEETS} uses a crawler and Twitter's API to retrieve comments from Twitter users; the texts are cleaned to remove cluttered symbols, emoji, and sparse terms. \emph{Jigsaw-toxicity}\cite{lhoest-etal-2021-datasets} consists of a large number of Wikipedia comments manually labeled as toxic behavior by human raters. Finally, \emph{SNLI}\cite{bowman2015large} and \emph{MNLI}\cite{williams2017broad} are two datasets for natural language inference tasks.
\subsection{Target Models}
We attack neural networks, language models, and realistic API scenarios, so that the scope of the attack is covered as broadly as possible. The neural networks are \emph{WordCNN}\cite{kim-2014-convolutional} and \emph{WordLSTM}\cite{hochreiter1997long}. WordCNN is a convolutional neural network for sentence classification, and WordLSTM uses the long short-term memory architecture.

The language models attacked are \emph{BERT base-uncased}\cite{devlin2018bert}, \emph{Albert}\cite{lan2019albert}, \emph{DistilBERT base-uncased}\cite{sanh2019distilbert}, and \emph{RoBERTa}\cite{liu2019roberta}. BERT is a bidirectional Transformer whose emergence was epoch-making for language modeling and allowed many difficult NLP tasks to be solved. Based on BERT, language models such as Albert, DistilBERT, and RoBERTa have been derived by changing model parameters, distillation, and pre-training methods, which has driven tremendous development in NLP research. These neural networks and language models are all classical target models that have had a huge impact on various NLP tasks as well as on the study of adversarial attacks.

In addition, we conducted experimental comparisons in realistic scenarios with deployed models and APIs, since this setting is closest to the black-box hard-label environment and therefore exposes the performance differences between algorithms. The \emph{ALIYUN} commercial API interface, \emph{Google Cloud} API interface, \emph{Facebook FastText}\cite{joulin2016fasttext} model \& API interface, and \emph{AllenNLP}\cite{gardner2018allennlp} model \& API interface were attacked in this realistic setting.
In \emph{WordCNN}, windows of sizes 3, 4, and 5, each with 150 filters, were used.
For \emph{WordLSTM}, a single-layer bidirectional LSTM with 150 hidden units was used.
\emph{BERT} has 12 layers, 768 hidden units, 12 heads, and 110M parameters.
\emph{Albert} has 12 repeating layers, a 128-dimensional embedding, a 768-dimensional hidden layer, 12 attention heads, and 11M parameters.
\emph{DistilBERT} is a distilled version of the BERT base model: while BERT is a 12-layer Transformer encoder, DistilBERT is a 6-layer Transformer encoder.
\emph{RoBERTa} is a more carefully tuned version of the BERT model, trained with more model parameters and a larger batch size.
\subsection{Baselines}
The baseline methods are all previously proposed black-box text attack methods, covering both soft-label and hard-label settings. The baselines include:

\emph{Textbugger} is a black-box soft-label method that uses the model's confidence to determine sentence importance, then uses a scoring function to score the importance of words in the sentence and generates adversarial examples by synonym replacement.

\emph{Textfooler} is a black-box soft-label method that replaces words according to their importance in the sentence and uses changes in semantic similarity to choose the replacement synonyms.

\emph{SememePSO} is a black-box soft-label attack method based on particle swarm optimization that uses semantic similarity as a guide for replacing synonyms.

\emph{Hard-label attack} is a black-box hard-label approach that uses a population-based genetic algorithm as the attack method and semantic similarity as the criterion for replacement words.
\subsection{Evaluation Metrics}
The metrics used to quantify the quality of the generated adversarial examples are semantic similarity and perturbation rate. In addition, attack success rate, average number of queries, running time, change in perplexity, and achieving rate are included to evaluate the overall performance of the attack algorithm.

\emph{Semantic Similarity} is computed by feeding the original text and the generated adversarial example into the Universal Sentence Encoder\cite{cer2018universal}. It lies in the range [0, 1]; higher values are better. It is abbreviated as $\mathit{Sim}$.
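A minimal sketch of this metric, assuming \texttt{embed} is a placeholder that returns a fixed-size sentence vector (the Universal Sentence Encoder in the paper) and that the cosine score is simply clipped into [0, 1]:

```python
import math

def semantic_sim(x, x_adv, embed):
    # Cosine similarity between the two sentence embeddings, clipped to [0, 1].
    u, v = embed(x), embed(x_adv)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(0.0, dot / norm) if norm else 0.0
```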
\emph{Attack Success Rate} is the proportion of generated examples that cause the model to make a wrong judgment, i.e., the accuracy of the attack algorithm; since our work studies adversarial attacks, this metric is used throughout. Higher is better. It is abbreviated as $\mathit{Succ.}$.

\emph{Perturbation Rate} is the ratio of changed words in the generated example to the words of the original text. Lower is better. It is abbreviated as $\mathit{Pert.}$.
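The perturbation rate reduces to a word-level Hamming fraction; a small illustrative helper (the names are ours, not the evaluation toolkit's):

```python
def perturbation_rate(orig_words, adv_words):
    # Fraction of word positions changed by the attack (lower is better).
    changed = sum(o != a for o, a in zip(orig_words, adv_words))
    return changed / len(orig_words)
```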
\emph{$\Delta$PPL}: Perplexity (PPL) is a common metric for measuring the overall quality of a text. In our work the language model GPT-2\cite{radford2019language} is used for the calculation, and the difference between the perplexity of the generated adversarial example and that of the original text serves as the evaluation metric. A lower $\Delta\mathit{PPL}$ is better.

\emph{Average Number of Queries} is the average number of target-model queries the algorithm needs to complete a given number of attacks. Lower is better. It is abbreviated as $\mathit{Qrs}$.

\emph{Achieving Rate} is the proportion of samples for which a complete attack process can be achieved out of the total number of samples in the experiments; it is mainly used when the number of queries is limited. Higher is better. It is abbreviated as $\mathit{Ach.}$.
\subsection{Experimental Settings}
The experiments are implemented on three NVIDIA RTX 3090 24G GPUs. The Universal Sentence Encoder (USE) is used to measure the semantic similarity between original examples and adversarial examples. NLTK\cite{bird2006nltk} is used to filter out stop words, and Spacy is used for POS tagging.

\emph{SememePSO:} The maximum number of iterations is 20, and the maximum size of a single population is 60. \emph{Hard-label attack:} The semantic similarity measure range is 40, the maximum number of randomly initialized adversarial examples is 2500, the maximum population size is 30, the maximum number of iterations is 100, and the maximum number of variants is 25. \emph{Our method:} The maximum population size is 3, and the maximum number of randomly initialized adversarial examples is 2000; the fitness function is guided by semantic similarity. Other experimental settings are consistent with \emph{Textfooler} and \emph{Hard-label attack}.

\textcolor{red}{In \emph{Experiment A}, exactly the same one thousand data samples are used to attack the same neural network model when comparing the baseline methods and our method on the same text task. In \emph{Experiment B}, the setting remains the same as in Experiment A. In \emph{Experiment C}, most API attack experiments are unrestricted, so one thousand data samples are still used; since Aliyun cannot support large, uninterrupted query volumes, it is attacked only under a limited query budget. In \emph{Experiment D}, attacking a robust toxic comment detection model requires tens of thousands of queries for a single data sample, so 300 typical data samples are selected to complete the attack.}
\textcolor{red}{\section{Results and Analysis}}
% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{{Abbreviation List}}
\resizebox{\linewidth}{!}
{
\begin{tabular}{cc}
\hline
\hline
Abbreviation & Explanation \bigstrut\\
\hline
Textbugger & Method proposed in \emph{Textbugger2018}\cite{li2018textbugger} \bigstrut[t]\\
Textfooler & Method proposed in \emph{Textfooler2020}\cite{jin2020bert} \\
PSO & Method proposed in \emph{SememePSO2019}\cite{zang2019word} \\
HLA & Method proposed in \emph{Hard-label attack2021}\cite{maheshwary2021generating} \\
TDE & Our method \bigstrut[b]\\
\hline
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%
To compare the performance of the proposed method, attack success rate, perturbation rate, semantic similarity, average number of queries, change in PPL, running time, and achieving rate are used as metrics. The abbreviations of the baseline methods are listed in Table V.
\textcolor{red}{\subsection{Experimental Results}}
Experimental comparisons are performed under the different experimental tasks.
\subsubsection{Experiments A: Basic Experiments}
All basic experiments attack one thousand samples using the same target model and dataset; the results are then analyzed and compared.
% Table generated by Excel2LaTeX from sheet 'Sheet1'
|
||
\begin{table}[pos=h]
|
||
\centering
|
||
\caption{Basic Experiment I}
|
||
\resizebox{\linewidth}{!}
|
||
{
|
||
\begin{tabular}{lrrrrrrr}
|
||
\hline
|
||
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model-Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Orig.}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Method}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Succ.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Pert.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Sim}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{{$\Delta\mathit{PPL}$}}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{} \bigstrut[b]\\
|
||
\hline
|
||
\multicolumn{1}{|c|}{\multirow{5}[2]{*}{\textbf{WordLSTM-MR}}} & \multicolumn{1}{c|}{\multirow{5}[2]{*}{78\%}} & \multicolumn{1}{c|}{Textbugger} & \multicolumn{1}{c|}{79.17\%} & \multicolumn{1}{c|}{12.90\%**} & \multicolumn{1}{c|}{0.905**} & \multicolumn{1}{c|}{189.492 } & \multicolumn{1}{c|}{/} \bigstrut[t]\\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{Textfooler} & \multicolumn{1}{c|}{\textbf{98.85\%}} & \multicolumn{1}{c|}{\textbf{13.07\%}} & \multicolumn{1}{c|}{0.872 } & \multicolumn{1}{c|}{113.652 } & \multicolumn{1}{c|}{/} \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{PSO} & \multicolumn{1}{c|}{95.05\%} & \multicolumn{1}{c|}{14.53\%} & \multicolumn{1}{c|}{0.835 } & \multicolumn{1}{c|}{128.258 } & \multicolumn{1}{c|}{2804.15 } \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{HLA} & \multicolumn{1}{c|}{98.18\%} & \multicolumn{1}{c|}{13.75\%} & \multicolumn{1}{c|}{0.869 } & \multicolumn{1}{c|}{121.841 } & \multicolumn{1}{c|}{7406.78 } \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{TDE} & \multicolumn{1}{c|}{98.28\%} & \multicolumn{1}{c|}{14.22\%} & \multicolumn{1}{c|}{\textbf{0.885 }} & \multicolumn{1}{c|}{\textbf{86.092 }} & \multicolumn{1}{c|}{\textbf{284.74 }} \bigstrut[b]\\
|
||
\hline
|
||
\multicolumn{1}{|c|}{\multirow{5}[2]{*}{\textbf{WordLSTM-IMDB}}} & \multicolumn{1}{c|}{\multirow{5}[2]{*}{88\%}} & \multicolumn{1}{c|}{Textbugger} & \multicolumn{1}{c|}{99.52\%} & \multicolumn{1}{c|}{4.13\%} & \multicolumn{1}{c|}{0.977 } & \multicolumn{1}{c|}{18.011 } & \multicolumn{1}{c|}{/} \bigstrut[t]\\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{Textfooler} & \multicolumn{1}{c|}{\textbf{100.00\%}} & \multicolumn{1}{c|}{\textbf{2.64\%}} & \multicolumn{1}{c|}{0.985 } & \multicolumn{1}{c|}{9.164 } & \multicolumn{1}{c|}{/} \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{PSO} & \multicolumn{1}{c|}{99.85\%} & \multicolumn{1}{c|}{3.19\%} & \multicolumn{1}{c|}{0.977 } & \multicolumn{1}{c|}{11.475 } & \multicolumn{1}{c|}{75626.53 } \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{HLA} & \multicolumn{1}{c|}{99.29\%} & \multicolumn{1}{c|}{2.63\%} & \multicolumn{1}{c|}{0.986 } & \multicolumn{1}{c|}{9.657 } & \multicolumn{1}{c|}{13661.41 } \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{TDE} & \multicolumn{1}{c|}{99.41\%} & \multicolumn{1}{c|}{3.79\%} & \multicolumn{1}{c|}{\textbf{0.987 }} & \multicolumn{1}{c|}{\textbf{6.001 }} & \multicolumn{1}{c|}{\textbf{2355.68 }} \bigstrut[b]\\
|
||
\hline
|
||
\multicolumn{1}{|c|}{\multirow{5}[2]{*}{\textbf{BERT-MR}}} & \multicolumn{1}{c|}{\multirow{5}[2]{*}{84\%}} & \multicolumn{1}{c|}{Textbugger} & \multicolumn{1}{c|}{63.19\%} & \multicolumn{1}{c|}{13.63\%**} & \multicolumn{1}{c|}{0.899**} & \multicolumn{1}{c|}{264.191 } & \multicolumn{1}{c|}{/} \bigstrut[t]\\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{Textfooler} & \multicolumn{1}{c|}{92.66\%} & \multicolumn{1}{c|}{18.57\%} & \multicolumn{1}{c|}{0.815 } & \multicolumn{1}{c|}{202.406 } & \multicolumn{1}{c|}{/} \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{PSO} & \multicolumn{1}{c|}{93.49\%} & \multicolumn{1}{c|}{20.25\%} & \multicolumn{1}{c|}{0.772 } & \multicolumn{1}{c|}{214.335 } & \multicolumn{1}{c|}{5473.36 } \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{HLA} & \multicolumn{1}{c|}{\textbf{95.76\%}} & \multicolumn{1}{c|}{\textbf{15.41\%}} & \multicolumn{1}{c|}{0.864 } & \multicolumn{1}{c|}{162.391 } & \multicolumn{1}{c|}{8567.81 } \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{TDE} & \multicolumn{1}{c|}{\textbf{95.76\%}} & \multicolumn{1}{c|}{16.60\%} & \multicolumn{1}{c|}{\textbf{0.876 }} & \multicolumn{1}{c|}{\textbf{137.677 }} & \multicolumn{1}{c|}{\textbf{636.10 }} \bigstrut[b]\\
|
||
\hline
|
||
\multicolumn{1}{|c|}{\multirow{5}[2]{*}{\textbf{BERT-IMDB}}} & \multicolumn{1}{c|}{\multirow{5}[2]{*}{92\%}} & \multicolumn{1}{c|}{Textbugger} & \multicolumn{1}{c|}{91.85\%} & \multicolumn{1}{c|}{7.43\%} & \multicolumn{1}{c|}{0.957 } & \multicolumn{1}{c|}{41.865 } & \multicolumn{1}{c|}{/} \bigstrut[t]\\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{Textfooler} & \multicolumn{1}{c|}{99.01\%} & \multicolumn{1}{c|}{7.35\%} & \multicolumn{1}{c|}{0.955 } & \multicolumn{1}{c|}{30.302 } & \multicolumn{1}{c|}{/} \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{PSO} & \multicolumn{1}{c|}{\textbf{100.00\%}} & \multicolumn{1}{c|}{\textbf{3.47\%}} & \multicolumn{1}{c|}{0.971 } & \multicolumn{1}{c|}{\textbf{13.085 }} & \multicolumn{1}{c|}{104623.62 } \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{HLA} & \multicolumn{1}{c|}{96.31\%} & \multicolumn{1}{c|}{3.67\%} & \multicolumn{1}{c|}{\textbf{0.983 }} & \multicolumn{1}{c|}{15.871 } & \multicolumn{1}{c|}{26426.12 } \\
|
||
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{TDE} & \multicolumn{1}{c|}{96.31\%} & \multicolumn{1}{c|}{5.38\%} & \multicolumn{1}{c|}{\textbf{0.983 }} & \multicolumn{1}{c|}{14.474 } & \multicolumn{1}{c|}{\textbf{11574.07 }} \bigstrut[b]\\
|
||
\hline
|
||
\multicolumn{1}{|c|}{\multirow{5}[2]{*}{\textbf{WordCNN-MR}}} & \multicolumn{1}{c|}{\multirow{5}[2]{*}{77\%}} & \multicolumn{1}{c|}{Textbugger} & \multicolumn{1}{c|}{78.25\%} & \multicolumn{1}{c|}{13.97\%**} & \multicolumn{1}{c|}{0.901**} & \multicolumn{1}{c|}{218.209 } & \multicolumn{1}{c|}{/} \bigstrut[t]\\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{Textfooler} & \multicolumn{1}{c|}{\textbf{99.43\%}} & \multicolumn{1}{c|}{\textbf{14.06\%}} & \multicolumn{1}{c|}{0.863 } & \multicolumn{1}{c|}{120.288 } & \multicolumn{1}{c|}{/} \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{PSO} & \multicolumn{1}{c|}{95.22\%} & \multicolumn{1}{c|}{15.50\%} & \multicolumn{1}{c|}{0.828 } & \multicolumn{1}{c|}{142.586 } & \multicolumn{1}{c|}{3107.48 } \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{HLA} & \multicolumn{1}{c|}{98.81\%} & \multicolumn{1}{c|}{14.49\%} & \multicolumn{1}{c|}{0.866 } & \multicolumn{1}{c|}{140.677 } & \multicolumn{1}{c|}{7755.23 } \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{TDE} & \multicolumn{1}{c|}{98.81\%} & \multicolumn{1}{c|}{15.54\%} & \multicolumn{1}{c|}{\textbf{0.872 }} & \multicolumn{1}{c|}{\textbf{115.377 }} & \multicolumn{1}{c|}{\textbf{319.01 }} \bigstrut[b]\\
\hline
\multicolumn{1}{|c|}{\multirow{5}[2]{*}{\textbf{WordCNN-IMDB}}} & \multicolumn{1}{c|}{\multirow{5}[2]{*}{86\%}} & \multicolumn{1}{c|}{Textbugger} & \multicolumn{1}{c|}{99.76\%} & \multicolumn{1}{c|}{3.75\%} & \multicolumn{1}{c|}{0.979 } & \multicolumn{1}{c|}{16.417 } & \multicolumn{1}{c|}{/} \bigstrut[t]\\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{Textfooler} & \multicolumn{1}{c|}{\textbf{100.00\%}} & \multicolumn{1}{c|}{\textbf{2.43\%}} & \multicolumn{1}{c|}{\textbf{0.986 }} & \multicolumn{1}{c|}{\textbf{7.729 }} & \multicolumn{1}{c|}{/} \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{PSO} & \multicolumn{1}{c|}{\textbf{100.00\%}} & \multicolumn{1}{c|}{3.37\%} & \multicolumn{1}{c|}{0.975 } & \multicolumn{1}{c|}{11.409 } & \multicolumn{1}{c|}{80926.12 } \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{HLA} & \multicolumn{1}{c|}{99.76\%} & \multicolumn{1}{c|}{2.80\%} & \multicolumn{1}{c|}{0.985 } & \multicolumn{1}{c|}{10.341 } & \multicolumn{1}{c|}{11827.57 } \\
\multicolumn{1}{|c|}{} & \multicolumn{1}{c|}{} & \multicolumn{1}{c|}{TDE} & \multicolumn{1}{c|}{99.76\%} & \multicolumn{1}{c|}{3.90\%} & \multicolumn{1}{c|}{\textbf{0.986 }} & \multicolumn{1}{c|}{7.983 } & \multicolumn{1}{c|}{\textbf{1436.52 }} \bigstrut[b]\\
\hline
** Low Attack Success Rate ($\mathit{Succ.}$) & & & & & & & \bigstrut[t]\\
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Basic Experiment II}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model-Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Orig.}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Method}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Succ.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Pert.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Sim}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{{$\Delta\mathit{PPL}$}}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
& & & & & & & \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{Albert-SST2}} & \multirow{2}[2]{*}{91\%} & HLA & 90.82\% & 27.98\% & 0.744 & 265.085 & 6970.38 \bigstrut[t]\\
& & TDE & \textbf{94.06\%} & \textbf{18.55\%} & \textbf{0.838 } & \textbf{167.965 } & \textbf{517.03 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{Albert-CoLA}} & \multirow{2}[2]{*}{83\%} & HLA & 96.85\% & 18.24\% & 0.856 & 188.848 & 4915.19 \bigstrut[t]\\
& & TDE & \textbf{97.23\%} & \textbf{18.20\%} & \textbf{0.894 } & \textbf{148.805 } & \textbf{111.12 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{DistilBERT-SNLI}} & \multirow{2}[2]{*}{86\%} & HLA & 98.60\% & \textbf{7.58}\% & 0.749 & 35.375 & 5381.19 \bigstrut[t]\\
& & TDE & \textbf{98.81\%} & 7.60\% & \textbf{0.799 } & \textbf{34.212 } & \textbf{99.66 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{DistilBERT-MNLI}} & \multirow{2}[2]{*}{81\%} & HLA & \textbf{95.19\%} & 7.85\% & 0.780 & 30.095 & 5893.71 \bigstrut[t]\\
& & TDE & \textbf{95.19\%} & \textbf{7.83\%} & \textbf{0.815 } & \textbf{29.243 } & \textbf{207.57 } \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

\begin{figure}[pos=h]
% optional placement: h = here, t = top of page
\centering
\includegraphics[scale=0.32]{figs/sim and qrs.pdf}
\caption{$\mathit{Sim}$ \& $\mathit{Qrs}$}
\label{fig:5}
\end{figure}

Since only PSO, the hard-label attack, and our method take the number of queries into account, the average number of queries is compared only among these three attack methods. The results in Table VI show that, across several models and datasets, the proposed attack method generates high-quality adversarial examples with higher semantic similarity than the other methods under a similar attack success rate and perturbation rate. Moreover, with the quality of the generated adversarial examples similar or even better, the proposed algorithm requires only a small fraction of the queries needed by PSO and the hard-label attack.
In addition, as shown in Table VI, Table VII, and Figure V, the proposed method brings a substantial improvement in the average number of queries and the semantic similarity compared with the hard-label attack across multiple model tasks, especially on natural language inference. For example, on \emph{DistilBERT-SNLI}, with a similar attack success rate and perturbation rate, the semantic similarity is improved from 0.749 for HLA to 0.799, while the average number of queries is reduced from 5381.19 to 99.66.

To further validate the performance of the proposed TDE algorithm, additional experimental comparisons were conducted.

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Limit Queries}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model-Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Qry Lim}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Method}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Ach.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Succ.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
& & & & & \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{Albert-SST2}} & \multirow{2}[2]{*}{\textbf{1500}} & HLA & 15.65\% & 73.33\% & 646.08 \bigstrut[t]\\
& & TDE & \textbf{89.37\%} & \textbf{100.00\%} & \textbf{270.18 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{Albert-CoLA}} & \multirow{2}[2]{*}{\textbf{1500}} & HLA & 18.18\% & 99.33\% & 367.75 \bigstrut[t]\\
& & TDE & \textbf{95.63}\% & \textbf{100.00}\% & \textbf{79.22} \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{DistilBERT-SNLI}} & \multirow{2}[2]{*}{\textbf{1500}} & HLA & 13.45\% & 99.15\% & 575.09 \bigstrut[t]\\
& & TDE & \textbf{98.13\%} & \textbf{100.00\%} & \textbf{79.29 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{DistilBERT-MNLI}} & \multirow{2}[2]{*}{\textbf{1500}} & HLA & 12.72\% & \textbf{100.00\%} & 668.63 \bigstrut[t]\\
& & TDE & \textbf{92.84\%} & \textbf{100.00\%} & \textbf{123.62 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{WordLSTM-MR}} & \multirow{2}[2]{*}{\textbf{2000}} & HLA & 8.33\% & 96.92\% & 659.63 \bigstrut[t]\\
& & TDE & \textbf{95.89}\% & \textbf{97.73}\% & \textbf{194.71} \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{BERT-MR}} & \multirow{2}[2]{*}{\textbf{2000}} & HLA & 6.70\% & 58.42\% & 773.17 \bigstrut[t]\\
& & TDE & \textbf{85.43\%} & \textbf{97.67\%} & \textbf{262.45 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{WordCNN-MR}} & \multirow{2}[2]{*}{\textbf{2000}} & HLA & 6.81\% & \textbf{98.61\%} & 707.66 \bigstrut[t]\\
& & TDE & \textbf{97.14\%} & 97.29\% & \textbf{205.73 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{WordLSTM-IMDB}} & \multirow{2}[2]{*}{\textbf{5000}} & HLA & 7.35\% & 91.49\% & 3,283.97 \bigstrut[t]\\
& & TDE & \textbf{93.10\%} & \textbf{100.00\%} & \textbf{885.19 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{WordCNN-IMDB}} & \multirow{2}[2]{*}{\textbf{5000}} & HLA & 9.45\% & 97.94\% & 3,567.98 \bigstrut[t]\\
& & TDE & \textbf{96.39\%} & \textbf{100.00\%} & \textbf{832.56 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{BERT-IMDB}} & \multirow{2}[2]{*}{\textbf{10000}} & HLA & 19.35\% & 99.13\% & 6,963.79 \bigstrut[t]\\
& & TDE & \textbf{80.65\%} & \textbf{99.39\%} & \textbf{2,006.84 } \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

With the overall quality of the adversarial examples close to or even better than that of the baseline method, a comparative experiment limiting the number of queries is conducted, and a new metric is introduced. Under these limits, the proposed method completes the attack with a much smaller number of queries, highlighting the efficiency of the algorithm. As seen in Table VIII and Figure VI, compared with HLA under different query restrictions, TDE completes the attack process for the vast majority of the given samples across different models and tasks.

\begin{figure}[pos=h]
% optional placement: h = here, t = top of page
\centering
\includegraphics[scale=0.34]{figs/Ach.-Qry Lim.pdf}
\caption{Ach.-Qry Lim}
\label{fig:6}
\end{figure}

For example, on \emph{WordLSTM-MR} with the number of queries limited to 2000, TDE completes the attack on $95.89\%$ of the samples while HLA completes only $8.33\%$. In addition, using stricter query limits such as 3000 and 4000 under this model-dataset pair has little effect on the reach rate of TDE, which remains substantially ahead of HLA and PSO.
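
The query-limit evaluation above can be reproduced with a simple counting wrapper around the victim model; the following is an illustrative sketch (the `BudgetedModel` class, toy `victim` function, and budget value are our own stand-ins, not the paper's implementation):

```python
class QueryBudgetExceeded(Exception):
    """Raised when the attack exhausts its query budget."""

class BudgetedModel:
    """Wrap a hard-label victim model and count every query against a budget."""

    def __init__(self, predict, budget):
        self.predict = predict   # maps text -> label (hard-label only)
        self.budget = budget     # e.g. 2000 queries for WordLSTM-MR
        self.queries = 0

    def __call__(self, text):
        if self.queries >= self.budget:
            raise QueryBudgetExceeded(f"budget of {self.budget} queries spent")
        self.queries += 1
        return self.predict(text)

def victim(text):
    # toy victim: labels a text "pos" iff it contains the word "good"
    return "pos" if "good" in text else "neg"

model = BudgetedModel(victim, budget=3)
print(model("a good movie"), model("a bad movie"))  # pos neg
```

An attack loop built on such a wrapper simply stops and records the sample as unreachable when `QueryBudgetExceeded` is raised, which is how a reach rate under a fixed query limit can be measured.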
\subsubsection{Experiments B: Runtime Experiments}

To demonstrate the efficiency difference between HLA and TDE, we compare their running times directly. The running time per attack is obtained by averaging the elapsed time over the basic experiments; since the average is taken over one thousand samples, the results are representative.
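
Such a per-sample running time can be obtained by averaging wall-clock time over the attacked samples; a minimal sketch (the `attack` callable and sample list here are placeholders):

```python
import time

def average_attack_time(attack, samples):
    """Average wall-clock seconds that `attack` spends per sample."""
    start = time.perf_counter()
    for s in samples:
        attack(s)
    return (time.perf_counter() - start) / len(samples)

# stand-in attack that does a fixed amount of work per sample
avg = average_attack_time(lambda s: sum(range(1000)), range(1000))
print(f"{avg:.6f} s per sample")
```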
% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Time per sample}
\resizebox{.4\textwidth}{!} {
\renewcommand\arraystretch{0.69}
\tabcolsep=0.35cm
\begin{tabular}{|c|c|c|}
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model-Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Method}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Time per sample}} \bigstrut[t]\\
& & \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{LSTM-MR}} & HLA & 57.91 s \bigstrut[t]\\
& TDE & \textbf{2.39 s} \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{BERT-MR}} & HLA & 63.26 s \bigstrut[t]\\
& TDE & \textbf{11.92 s} \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{LSTM-IMDB}} & HLA & 273.24 s \bigstrut[t]\\
& TDE & \textbf{132.00 s} \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{BERT-IMDB}} & HLA & 2556.60 s \bigstrut[t]\\
& TDE & \textbf{1713.10 s} \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{DistilBERT-MNLI}} & HLA & 61.56 s \bigstrut[t]\\
& TDE & \textbf{6.78 s} \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

\begin{figure}[pos=h]
% optional placement: h = here, t = top of page
\centering
\includegraphics[scale=0.3]{figs/Ach.&Time-Qry Lim.pdf}
\caption{$\mathit{Ach.}$ \& Time-Qry Lim}
\label{fig:7}
\end{figure}

As seen in Table IX, TDE takes less time to complete each sample attack across the models with different tasks. Also, in Figure VII, under the query-limit setting, the reach rate of TDE is far higher than that of HLA. Therefore, TDE maintains high efficiency while the semantic similarity of the generated adversarial examples is better than that of HLA.

\subsubsection{Experiments C: API Experiments}

The hard-label experimental setup is closer to realistic scenarios, where black-box attacks that rely on soft labels fail. It is therefore necessary to implement API attacks under realistic conditions. When attacking the same model \& API, we use exactly the same dataset of one thousand samples for the attack.
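
The hard-label constraint means the attacker observes only the top predicted class, never the score vector; the helper below illustrates that reduction (the response layout is a hypothetical example, not any specific API's schema):

```python
def hard_label(api_response):
    """Reduce a rich API response to the hard label an attacker may see.

    Soft-label attacks need the full score vector; in the hard-label
    setting only the argmax class name survives.
    """
    scores = api_response["scores"]        # e.g. {"pos": 0.91, "neg": 0.09}
    return max(scores, key=scores.get)     # class name only, no confidence

resp = {"scores": {"pos": 0.91, "neg": 0.09}}
print(hard_label(resp))  # pos
```

Any attack evaluated in this section may only call such a reduced interface, which is why query efficiency becomes the dominant cost.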
% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Model $\&$ API}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model\&API}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Method}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Succ.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Pert.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Sim}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{{$\Delta\mathit{PPL}$}}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
& & & & & & & \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{AllenNLP}} & \multirow{2}[2]{*}{\textbf{SST2}} & HLA & 85.32\% & 26.61\% & 0.752 & 213.715 & 5603.78 \bigstrut[t]\\
& & TDE & \textbf{87.40\%} & \textbf{25.45\%} & \textbf{0.803 } & \textbf{198.496 } & \textbf{186.95 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{FastText}} & \multirow{2}[2]{*}{\textbf{YELP}} & HLA & 93.29\% & \textbf{8.53\%} & 0.943 & \textbf{39.780 } & 24089.37 \bigstrut[t]\\
& & TDE & \textbf{93.33\%} & 9.07\% & \textbf{0.947 } & 40.525 & \textbf{4724.13 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{Google Cloud}} & \multirow{2}[2]{*}{\textbf{TWEETS}} & HLA & 74.92\% & 14.85\% & 0.844 & 273.642 & 6986.93 \bigstrut[t]\\
& & TDE & \textbf{77.85\%} & \textbf{14.53\%} & \textbf{0.876 } & \textbf{252.067 } & \textbf{175.47 } \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%
% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Model $\&$ API-Qry Lim}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model\&API}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Method}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Qry Lim}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Ach.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Succ.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Pert.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Sim}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
& & & & & & & & \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{ALIYUN}} & \multirow{2}[2]{*}{\textbf{AMAZON}} & HLA & \multirow{2}[2]{*}{5000} & 5.59\% & / & / & / & 3,113.16 \bigstrut[t]\\
& & TDE & & \textbf{84.91\%} & \textbf{83.85\%} & \textbf{11.85\%} & \textbf{0.918 } & \textbf{1012.83 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{AllenNLP}} & \multirow{2}[2]{*}{\textbf{SST2}} & HLA & \multirow{2}[2]{*}{5000} & 19.27\% & 76.19\% & 30.44\% & 0.745 & 640.59 \bigstrut[t]\\
& & TDE & & \textbf{90.51\%} & \textbf{95.63\%} & \textbf{27.62\%} & \textbf{0.806 } & \textbf{186.96 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{FastText}} & \multirow{2}[2]{*}{\textbf{YELP}} & HLA & \multirow{2}[2]{*}{5000} & 4.21\% & / & / & / & 4293.08 \bigstrut[t]\\
& & TDE & & \textbf{80.57\%} & \textbf{100.00\%} & \textbf{9.15\%} & \textbf{0.949 } & \textbf{1120.65 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{Google Cloud}} & \multirow{2}[2]{*}{\textbf{TWEETS}} & HLA & \multirow{2}[2]{*}{5000} & 26.75\% & \textbf{79.49\%} & 16.41\% & 0.853 & 3140.44 \bigstrut[t]\\
& & TDE & & \textbf{100.00\%} & 77.85\% & \textbf{14.53\%} & \textbf{0.876 } & \textbf{175.472 } \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

Experiments were conducted on several domestic and international API interfaces for NLP models, using several standard datasets as well as real Twitter comment data. A limit on the number of queries was also added to the experiments to better match real scenarios. As seen from the experimental results in Table X and Table XI, the proposed method can attack the APIs of \emph{ALIYUN}, \emph{AllenNLP}, \emph{FastText}, and \emph{Google Cloud} with a very low budget. For example, the \emph{YELP} dataset is used to attack the models and APIs provided by \emph{Facebook's FastText}. With the remaining metrics close or even better in performance, the proposed approach saves nearly 20,000 queries on average per attack round compared with the original HLA. This significantly reduces the overhead and makes hard-label API adversarial attacks practical. At the same time, attacking real commercial and open-source APIs exposes the shortcomings and deficiencies of these models. Moreover, the generated adversarial examples can be used for corresponding defenses to improve robustness.

\subsubsection{Experiments D: Toxic Comment Experiments}

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Toxic Comment}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model-Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Orig.}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Method}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Pert.}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Sim}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{{$\Delta\mathit{PPL}$}}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
& & & & & & \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{RoBERTa\_toxicity-Jigsaw\_toxicity}} & \multirow{2}[2]{*}{93\%} & HLA & \textbf{7.53\%} & 0.902 & 18.688 & 22161.60 \bigstrut[t]\\
& & TDE & 8.73\% & \textbf{0.915 } & \textbf{18.208 } & \textbf{8249.18 } \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

In addition to the sentiment classification and natural language inference tasks, the experiments include the task of toxic comment detection, which requires a large number of queries and therefore shows the efficiency difference more clearly. The adversarial attack on toxic comment detection is very difficult; a sample of 300 achievable attacks was selected for the experimental comparison.

\begin{figure}[pos=h]
% optional placement: h = here, t = top of page
\centering
\includegraphics[scale=0.3]{figs/Query Ratio.pdf}
\caption{Query Ratio}
\label{fig:8}
\end{figure}

In this experiment, only the number of queries of the algorithms was compared, thus highlighting the difference in efficiency between the algorithms.

The results in Table XII show that the average number of queries required by HLA for the toxic comment detection task exceeds 22,000, while the proposed TDE keeps it around 8,000. Moreover, as seen in Figure VIII, the TDE method completes the attack on most of the samples within 5,000 queries on average, while most of the samples for HLA require tens of thousands of queries to complete the adversarial attack.

\subsubsection{Comparison with Existing Methods}

Across the four parts of the experiments, our method achieves advantages in both efficiency and quality over the existing hard-label method (HLA) on the sentiment classification task, covering the short-sentence datasets MR, CoLA, and SST2 and the long-sentence datasets Amazon, Yelp, and IMDB. We also perform the adversarial attack on the sentiment classification task with the crawled TWEETS review dataset, showing that our approach achieves good results on both standard datasets and crawled data. For the sentiment classification task, the average number of queries of the existing method over the datasets of each model is 11,291.87, while that of our method is 2,029.16. With the quality of the adversarial examples generally better than the existing method, the average number of queries decreases by \textbf{82\%}, which is a significant efficiency improvement.
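
The \textbf{82\%} figure follows directly from the two averages quoted above:

```python
hla_avg_queries = 11291.87   # existing method, sentiment classification
tde_avg_queries = 2029.16    # our method
reduction = 1 - tde_avg_queries / hla_avg_queries
print(f"query reduction: {reduction:.0%}")  # prints: query reduction: 82%
```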
Next, for the natural language inference task, we use the standard SNLI and MNLI datasets, where our method has a large lead over the existing method. From the average number of queries in the experimental results, our method reduces the number of queries required to generate adversarial examples by \textbf{97\%} overall compared with the existing method, and the semantic similarity of the adversarial examples is also significantly improved.

Finally, we also tested adversarial attacks on a robust toxic comment model, a very difficult task. The results show that our method achieves a significant efficiency improvement over the existing method: the average number of queries is reduced by more than \textbf{62\%}, while the success rate remains consistent and the quality of the generated adversarial examples is close.

The four parts of the experiments together form a comprehensive evaluation over multiple text tasks. Our method maintains its lead in quality and efficiency over multiple baseline methods as well as the existing method. Comparing the results of the four experimental sections provides a better understanding of the effectiveness of adversarial attacks in the black-box, hard-label setting.

\subsection{Further Analysis}
\subsubsection{Statistical Test}

Our work improves the existing HLA method, which requires a large number of queries to generate adversarial examples and produces adversarial examples with low semantic similarity. Our method improves the semantic similarity of the generated adversarial examples and reduces the number of queries, while maintaining the attack success rate and the perturbation rate.

Therefore, we analyze the average number of queries and the semantic similarity among the evaluation metrics and determine whether the improvement of our method over the existing method is significant. We select the 14 datasets under attack, average a large number of samples from each dataset, and take the average number of queries and the semantic similarity of each dataset. The existing method is compared with our method using Welch's t-test.

\begin{figure}[pos=h]
% optional placement: h = here, t = top of page
\centering
\includegraphics[scale=0.15]{figs/Statistica-sim1.28.pdf}
\caption{Statistical-Semantic Similarity}
\label{fig:9}
\end{figure}

For semantic similarity, the mean in HLA is 0.8669 with a standard deviation of 0.0863, and the mean in TDE is 0.8907 with a standard deviation of 0.0664. As shown in Figure IX, our experimental results outperform the existing method, but the variation in semantic similarity is small, so Welch's t-test is not performed for this metric.

Then, for the average number of queries, we performed a statistical test. As shown in Figure X, the average number of queries in HLA has a mean of 10488.27 with a standard deviation of 6967.5431, and the average number of queries in TDE has a mean of 1810.77 with a standard deviation of 3259.8835, so Welch's t-test can be performed.

\begin{figure}[pos=h]
% optional placement: h = here, t = top of page
\centering
\includegraphics[scale=0.15]{figs/Statistica-qrs1.28.pdf}
\caption{Statistical-Number of Queries}
\label{fig:10}
\end{figure}

The formula for Welch's t-test is
$$
t=\frac{\bar{X}_1-\bar{X}_2}{s_{\bar{\Delta}}}
$$
where
$$
s_{\bar{\Delta}}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} .
$$

\textbf{Null Hypothesis} ${H_0}$: There is no difference in the number of queries between the existing HLA method and our method.

\textbf{Alternative Hypothesis} ${H_1}$: There is a difference in the number of queries between the existing HLA method and our method.

The formula for calculating the degrees of freedom is
$$
\text { d.f. }=\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1-1}+\frac{\left(s_2^2 / n_2\right)^2}{n_2-1}}
$$

The t-distribution has 18 degrees of freedom, $t(18)$ is 4.2208, and the $p$-value is 0.0004907. With $\alpha$ = 5\%, the $p$-value is less than $\alpha$, so the statistical test shows a significant difference: we reject ${H_0}$ and \textbf{accept ${H_1}$}. This verifies that there is a significant difference in the number of queries between the existing method and our method, with an effect size $d$ of 1.2454, indicating a large experimental effect.
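
The reported statistics can be reproduced from the summary values above (means, standard deviations, and $n=14$ datasets per group); note that the reported effect size $d \approx 1.2454$ coincides with the mean difference divided by the HLA standard deviation (Glass's $\Delta$):

```python
import math

# Summary statistics for the average number of queries (14 datasets per method)
m1, s1, n1 = 10488.27, 6967.5431, 14   # HLA
m2, s2, n2 = 1810.77, 3259.8835, 14    # TDE

se = math.sqrt(s1**2 / n1 + s2**2 / n2)        # standard error of the mean difference
t = (m1 - m2) / se                             # Welch's t statistic, about 4.22
df = (s1**2 / n1 + s2**2 / n2) ** 2 / (        # Welch-Satterthwaite degrees of freedom
    (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1))  # about 18.4
glass_delta = (m1 - m2) / s1                   # about 1.245, matching the reported d

print(round(t, 4), round(df, 1), round(glass_delta, 4))
```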
According to the results of the statistical test, the semantic similarity of the adversarial examples generated by our method is higher than that of the existing method, and the number of queries differs markedly and is far lower than that of the existing method, while the attack success rate and the perturbation rate remain similar. Therefore, our method significantly improves both the quality and the efficiency of generating adversarial examples compared with the existing method.

\subsubsection{Time and Space Complexity}
\begin{table}[pos=h]
\centering
\caption{Complexity Symbols List}
\begin{tabular}{cc}
\hline
\hline
Symbols & Details \bigstrut\\
\hline
$\mathcal{O}$ & Time complexity or Space complexity \bigstrut[t]\\
$\mathtt{G}$ & Iterations \\
$\mathtt{N}$ & Population size \\
$\mathtt{L}$ & Sample information \\
$\mathtt{P}$ & Number of individuals \\
$\mathtt{D}$ & Difference times \bigstrut[b]\\
\hline
\hline
\end{tabular}%
\label{tab:addlabel}%
\end{table}%

To better describe our method, we compare its time and space complexity with those of the existing method. The symbols used in the complexity analysis are listed in Table XIII.

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Time and Space complexity}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|c|}
\hline
Method & Time complexity & Space complexity \bigstrut\\
\hline
HLA & $\mathcal{O}(\mathtt{G} \cdot \mathtt{N^2} \cdot \mathtt{L})$ & $\mathcal{O}(\mathtt{N^2} \cdot \mathtt{P} \cdot \mathtt{L})$ \bigstrut[t]\\
TDE & $\mathcal{O}(\mathtt{G} \cdot \mathtt{N} \cdot \mathtt{L})$ & $\mathcal{O}(\mathtt{N} \cdot \mathtt{L} \cdot (\mathtt{D}+1))$ \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

As shown in Table XIV, both the time and space complexity of the existing method are much higher than those of our method: its population-size term grows as $\mathtt{N^2}$, whereas ours grows as $\mathtt{N}$, because the existing method deposits all individuals into the population. This shows that the efficiency of our method is greatly improved.

\subsubsection{Ablation Experiments}
To investigate the impact of the population generation and combinatorial optimization components of different population-based optimization algorithms on the results, ablation experiments were performed that paired GA's population generation with DE's combinatorial optimization, and DE's population generation with GA's combinatorial optimization.

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Ablation Study I}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|c|c|}
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model-Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Ablation Study}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Sim}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
& & & \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{WordLSTM-MR}} & HLA.Pop Gen + HLA.CO & \textbf{0.869 } & 7406.78 \bigstrut[t]\\
& TDE.Pop Gen + HLA.CO & 0.851 & \textbf{3602.30 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{DistilBERT-SNLI}} & HLA.Pop Gen + HLA.CO & 0.749 & 5381.19 \bigstrut[t]\\
& TDE.Pop Gen + HLA.CO & \textbf{0.752 } & \textbf{3935.29 } \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Pop Siz}
\resizebox{\linewidth}{!}
{
\begin{tabular}{c|c|c|c|c}
\hline
\multicolumn{1}{c|}{\multirow{2}[2]{*}{Model-Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Module}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Pop Siz}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Sim}$}} & \multicolumn{1}{c}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
& & & & \bigstrut[b]\\
\hline
\hline
\multirow{2}[2]{*}{\textbf{WordLSTM-MR}} & HLA.Pop Gen & 25.02 & \textbf{0.831 } & 915.30 \bigstrut[t]\\
& TDE.Pop Gen & \textbf{3.75 } & 0.850 & \textbf{236.61 } \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{DistilBERT-SNLI}} & HLA.Pop Gen & 25.78 & 0.724 & 854.34 \bigstrut[t]\\
& TDE.Pop Gen & \textbf{2.83 } & \textbf{0.739 } & \textbf{112.71 } \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

As seen in Tables XV and XVI, the population generation (Pop Gen) of DE retains only the higher-quality offspring in the population. This has a large impact on the subsequent combinatorial optimization (CO) and on the average number of queries.

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table}[pos=h]
\centering
\caption{Ablation Study II}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|c|c|c|}
\hline
\multicolumn{1}{|c|}{\multirow{2}[2]{*}{Model-Dataset}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{Ablation Study}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Sim}$}} & \multicolumn{1}{c|}{\multirow{2}[2]{*}{$\mathit{Qrs}$}} \bigstrut[t]\\
& & & \bigstrut[b]\\
\hline
\multirow{3}[2]{*}{\textbf{WordLSTM-MR}} & HLA.Pop Gen + HLA.CO & 0.869 & 7406.78 \bigstrut[t]\\
& TDE.Pop Gen + HLA.CO & 0.851 & 3602.30 \\
& TDE.Pop Gen + TDE.CO & \textbf{0.885 } & \textbf{284.74 } \bigstrut[b]\\
\hline
\multirow{3}[2]{*}{\textbf{DistilBERT-SNLI}} & HLA.Pop Gen + HLA.CO & 0.749 & 5381.19 \bigstrut[t]\\
& TDE.Pop Gen + HLA.CO & 0.752 & 3935.29 \\
& TDE.Pop Gen + TDE.CO & \textbf{0.799 } & \textbf{99.66 } \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:addlabel}%
\end{table}%

With the same combinatorial optimization strategy, population generation under TDE achieves higher semantic similarity as well as a lower average number of queries. Under HLA, the average number of queries is also reduced because of the difference in population size, but the semantic similarity is not necessarily improved.

According to Table XVII, after applying DE-based population generation and combinatorial optimization, both the quality of the adversarial examples and the query efficiency improve. The DE-based combinatorial optimization can thus generate high-quality adversarial examples with high efficiency.

In analyzing the population differences between the two groups, the population size for both HLA and TDE is restricted to 30:

\begin{enumerate}[(1)]
\itemsep=0pt
\item Before generating populations, TDE first judges the words in the initial adversarial example at which replacement occurred, and selects only the important words that can change the model's prediction for synonym replacement. HLA does not perform this step; it runs a synonym search for all words at index positions that meet the adversarial-example requirements.
\item Because of the algorithm design of differential evolution, only a few synonym candidates with high semantic similarity are selected and deposited in the population during synonym selection. HLA, by contrast, fills up the population as far as possible, with quality only a secondary consideration.
\item Depositing an excessive number of words into the population causes a large number of duplicate queries. Since HLA's algorithm design does not consider efficiency or the average number of queries, the population sizes of the two methods differ greatly.
\item Under the TDE algorithm, expanding the population would not only raise the average number of queries but also degrade the results.
\end{enumerate}

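A minimal sketch of the population-generation behavior in items (1) and (2) above is given below. All callables here (`query_model` for the hard-label oracle, `synonyms` for the candidate source, `similarity` for the semantic-similarity measure) are hypothetical placeholders, not the paper's exact interfaces:

```python
# Hedged sketch of TDE-style population generation: keep only the replaced
# words whose restoration flips the hard label back ("important" words), and
# for each such word deposit only the top-k synonyms by semantic similarity.

def tde_population(original, adversarial, query_model, synonyms, similarity, k=3):
    important = []
    for i, (w_orig, w_adv) in enumerate(zip(original, adversarial)):
        if w_orig == w_adv:
            continue  # this word was never replaced
        restored = adversarial[:i] + [w_orig] + adversarial[i + 1:]
        # Restoring the original word flips the label -> the replacement matters.
        if query_model(restored) != query_model(adversarial):
            important.append(i)
    # Deposit only a few high-similarity candidates per important word,
    # rather than filling the population as far as possible (the HLA behavior).
    return {
        i: sorted(synonyms(original[i]),
                  key=lambda s: -similarity(original[i], s))[:k]
        for i in important
    }
```

Restricting the population this way is what keeps the duplicate-query count low in items (3) and (4).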
\subsection{Analysis and Discussion}

First, we analyze the results by comparing our experiments with the baseline methods. In Experiments A and B, under the setting of a limited average number of queries, our method does not fall behind the soft-label baselines on the metrics related to adversarial-example quality. Among the population-based methods (PSO, GA, and our proposed DE), our method still achieves a large gain in efficiency, even though PSO attacks under the black-box soft-label setting. By limiting the query budget and comparing attack times, the population-based differential evolution design under black-box hard-label achieves our experimental goals; it can even exceed the efficiency and quality of the soft-label methods.

Next, to further validate the performance of our method, we go beyond the basic comparison experiments. We also add adversarial attacks on models and APIs in real scenarios, and attacks on robust toxic comment detection. Attacks on APIs closely match the hard-label experimental setting, while robust toxic comment detection is a very challenging attack task that normally requires a large number of queries. As the results of Experiments C and D show, our approach overcomes these challenges: only a very low query budget is required to complete the attack, which makes hard-label adversarial attacks practical.

Then, more detailed analyses are made through statistical tests, time and space complexity, and ablation experiments. These analyses show that our method has higher attack efficiency, generates higher-quality adversarial examples, and that the effect is consistent across settings.

Finally, we discuss the attack success rate, efficiency, and deficiencies of the proposed method.

\begin{table}[pos=h]
\centering
\caption{Adversarial Examples on Different Datasets}
\resizebox{\linewidth}{!}
{
\begin{tabular}{|c|lllllll|cc|}
\hline
\textbf{Dataset} & \multicolumn{7}{c|}{\textbf{Adversarial Example}} & \multicolumn{2}{c|}{\textbf{Prediction}} \bigstrut\\
\hline
\textbf{MR} & \multicolumn{7}{l|}{bears is even \textcolor{red}{\textit{[[greatest]]}} than i imagined a movie ever could be .} & \multicolumn{2}{c|}{\textbf{Negative $\rightarrow$ Positive}} \bigstrut\\
\hline
\textbf{SST2} & \multicolumn{7}{l|}{it is pretty damned \textcolor{red}{\textit{[[satirical]]}} .} & \multicolumn{2}{c|}{\textbf{Positive $\rightarrow$ Negative}} \bigstrut\\
\hline
\multirow{2}[2]{*}{\textbf{SNLI}} & \multicolumn{7}{l|}{Premise: A couple of females are talking to each other while one is flicking her cigarette.} & \multicolumn{2}{c|}{\multirow{2}[2]{*}{\textbf{Neutral $\rightarrow$ Entailment}}} \bigstrut[t]\\
& \multicolumn{7}{l|}{Hypothesis: \textcolor{red}{\textit{[[Ani]]}} one of them enjoy smoking.} & \multicolumn{2}{c|}{} \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{\textbf{MNLI}} & \multicolumn{7}{p{35.205em}|}{Premise: Coast Guard rules establishing bridgeopening schedules.} & \multicolumn{2}{c|}{\multirow{2}[2]{*}{\textbf{Contradiction $\rightarrow$ Entailment}}} \bigstrut[t]\\
& \multicolumn{7}{l|}{Hypothesis: The Coast Guard is in \textcolor{red}{\textit{[[burdened]]}} of opening bridges.} & \multicolumn{2}{c|}{} \bigstrut[b]\\
\hline
\end{tabular}%
}
\label{tab:examples}%
\end{table}%

As shown in Table XVIII, adversarial examples are generated on different datasets; the red italic words in double brackets are the synonym replacements.

\textbf{Attack Success Rate}: Due to the discrete nature of text, generating adversarial examples is more difficult in NLP than in CV. The attack is mainly accomplished through semantically similar near-synonym substitution. Our method performs the adversarial attack using only the top-level output of the model under the black-box hard-label setting, and improves the attack success rate through multi-angle initialization.

\textbf{Efficiency}: The proposed method avoids the local optima and repeated searches introduced by the genetic algorithm during the initialization of adversarial examples. Meanwhile, the words prioritized for combinatorial optimization are determined by judging the importance of the replacement words in the initialized adversarial example. By combining the differential evolution algorithm with word-importance selection, a large number of repeated queries is avoided.

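As one purely illustrative reading of the combinatorial-optimization stage, a discrete DE-style trial candidate can be built by letting disagreements between two donor candidates mark the positions worth varying. The donor scheme and `crossover_rate` here are assumptions for the sketch, not the paper's exact operator:

```python
import random

# Illustrative (assumed) discrete DE-style trial construction: positions
# where two donor candidates disagree are treated as promising dimensions,
# and the trial inherits donor_a's word there with probability crossover_rate.
def de_trial(base, donor_a, donor_b, crossover_rate=0.5, rng=random):
    trial = list(base)
    for i, (a, b) in enumerate(zip(donor_a, donor_b)):
        if a != b and rng.random() < crossover_rate:
            trial[i] = a
    return trial
```

Because the trial differs from the base only at donor-disagreement positions, each oracle query is spent on a candidate that is genuinely new, which is how repeated queries are kept down.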
\textbf{Deficiencies}: In terms of perturbation rate, however, the proposed method may be slightly weaker than existing hard-label attacks. This is because the TDE algorithm judges word importance to improve efficiency and semantic similarity, and therefore does not exhaustively try word combinations at all positions. Achieving excellence in the overall quality of adversarial examples while ensuring efficiency with a population-based optimization algorithm remains a challenging task.

\section{Conclusion and Future Work}

\subsection{Study Implication}

First, our research addresses the inefficiency and low quality of adversarial-example generation for black-box hard-label attacks in the text domain. Existing black-box soft-label methods generate adversarial examples by using the model's predicted probability distribution to guide synonym replacement; under hard-label, this signal is unavailable, so generating an adversarial example with high semantic similarity has required a large number of queries. In realistic API-attack scenarios, if the victim detects overloaded access within a short time, it can stop the attack simply by denying access, so an efficient algorithm significantly saves time and money during the attack. Adversarial attacks in the black-box hard-label setting are therefore closer to attacks on real models and APIs, and our method is far more efficient than existing methods, making such attacks more feasible and greatly expanding the research on adversarial text under black-box hard-label. In addition, the study of adversarial examples can guide the robust defense of models and APIs. Finally, our study deepens the understanding of combining adversarial attacks with optimization algorithms under black-box hard-label.

\subsection{Conclusion}

Implementing adversarial attacks on text under black-box hard-label is difficult. This study therefore proposes a novel text-based hard-label attack, text-based differential evolution (TDE), a population generation and optimization algorithm based on the idea of differential evolution (DE). Attacks are performed on various NLP tasks, including sentiment classification, natural language inference, toxic comment detection, and real-world APIs, using a variety of standard datasets as well as real-world comment data. Evaluated on a wide range of metrics and compared under both a limited query budget and real-world scenarios, the proposed approach generates adversarial examples with higher semantic similarity and is more efficient than existing black-box hard-label attacks based on genetic algorithms. It is also better suited to hard-label attacks in realistic scenarios.

\subsection{Future Work}

Our experiments address the inefficiency and low quality of adversarial examples generated by existing work on text black-box hard-label attacks. Beyond population-based optimization, other heuristic algorithms, such as Tabu search, could be tried to avoid local optima. Future research can also draw on work in the CV field, such as sampling-based ideas \cite{9897705}, although the discrete nature of text is precisely the limitation such work must address. At the same time, research on adversarial examples can, from the defense perspective, reveal the vulnerability of NLP models and APIs and thereby improve their robustness and security. This will be future work.

\bibliographystyle{apalike}
\bibliography{refs-1}

\clearpage
\appendix
\section{Adversarial Examples}
\setcounter{table}{0}

This appendix presents adversarial examples generated by our method on samples from each dataset. The red words are those produced by the synonym-substitution operation. Each example is shown below.

\subsection{Sentiment Classification}

The datasets for the sentiment classification task are divided into short-text and long-text datasets.

\subsubsection{Short Text}

\begin{table}[pos=h]
\centering
\caption{Short-Text Datasets}
\begin{tabular}{|c|p{11.5cm}|c|c|c|}
\hline
Dataset & \multicolumn{1}{c|}{Adversarial Examples} & $\mathit{Sim}$ & $\Delta\mathit{PPL}$ & $\mathit{Qrs}$ \bigstrut\\
\hline
SST2 & proves once again he has n't lost his touch , bringing off a superb performance in an admittedly \textcolor{red}{\textit{[[mediocre]]}} film . (positive $\rightarrow$ negative) & 0.964 & 5.411 & 26 \bigstrut\\
\hline
CoLA & Joan ate dinner with someone but \textcolor{red}{\textit{[[me]]}} don't know who. (positive $\rightarrow$ negative) & 0.967 & 71.328 & 15 \bigstrut\\
\hline
TWEET & good morning ! ! ! work and then it`s espn`s sunday night baseball \textcolor{red}{\textit{[[thankfully]]}} it won`t get rained out (positive $\rightarrow$ negative) & 0.966 & 54.821 & 24 \bigstrut\\
\hline
MR & it has the right approach and the right opening premise , but it lacks the zest and it \textcolor{red}{\textit{[[is]]}} for a plot twist instead of trusting the material . (negative $\rightarrow$ positive) & 0.987 & 21.635 & 206 \bigstrut\\
\hline
\end{tabular}
\label{tab:shorttext}
\end{table}

\subsubsection{Long Text}
\begin{table}[pos=h]
\centering
\caption{Amazon}
\begin{tabular}{|p{13.1cm}|c|c|c|}
\hline
\multicolumn{1}{|c|}{Adversarial Example} & $\mathit{Sim}$ & $\Delta\mathit{PPL}$ & $\mathit{Qrs}$ \bigstrut\\
\hline
I predicted there would be some sort of twist and there is, its a good one. The acting was a little off but nothing crazy. I felt the girl should have done better considering all those years she did Charm but she was off. Still worth \textcolor{red}{\textit{[[remarking]]}}! (positive $\rightarrow$ negative) & 0.975 & 7.374 & 203 \bigstrut\\
\hline
\end{tabular}%
\label{tab:amazon}%
\end{table}%

\begin{table}[pos=h]
\centering
\caption{IMDB}
\begin{tabular}{|p{13cm}|c|c|c|}
\hline
\multicolumn{1}{|c|}{Adversarial Example} & $\mathit{Sim}$ & $\Delta\mathit{PPL}$ & $\mathit{Qrs}$ \bigstrut\\
\hline
It \textcolor{red}{\textit{[[transpires]]}} at least vaguely possible that this movie provided a bit of inspiration for "The Sopranos," as its main character, Martin Blank (John Cusack) is a hit man who has so many issues from his past and his profession that he's in therapy trying to deal with it all. Everything finally comes to a head at his 10-year high school reunion. The problem was that by the time Blank got to the reunion I had stopped caring. Frankly, I \textcolor{red}{\textit{[[detected]]}} this movie a drag from start to finish. It had \textcolor{red}{\textit{[[attainable]]}}. There was a found good cast, headed by \textcolor{red}{\textit{[[Schwartzman]]}} and Dan Aykroyd, playing Grocer, his arch-rival in the hit-man business, along with Minnie Driver as Debi, Blank's high school sweetheart who he stood up on prom night, and a limited role for Alan Arkin as Dr. Oatman, Blank's psychologist. That fairly talented cast never really seemed to come together, though. The drama lacked intensity and the comedy lacked real humour. What I thought had the most potential to be a comedic storyline was Grocer's proposal for a hit man's union, but aside from becoming a bit of a running joke, the idea never really got developed. As for the romance, one wondered why Debi would even think of letting this guy back into her life. There were a handful of chuckles, but nothing really caught me and held me and I spent most of the movie wondering whether this thing was ever going to start to click. It never did - not for me, at least. 2/10 (negative $\rightarrow$ positive) & 0.990 & 11.263 & 2870 \bigstrut\\
\hline
\end{tabular}%
\label{tab:imdb}%
\end{table}%

\clearpage

\begin{table}[pos=h]
\centering
\caption{Yelp}
\begin{tabular}{|p{13.1cm}|c|c|c|}
\hline
\multicolumn{1}{|c|}{Adversarial Example} & $\mathit{Sim}$ & $\Delta\mathit{PPL}$ & $\mathit{Qrs}$ \bigstrut\\
\hline
this is a tough one n nafter reading through all the other reviews more thoroughly after my experience , i was glad to see that i 'm not crazy and that my experience wasn't unique n nmy boyfriend and i arrived at 1 30pm on a saturday my mistake i thought they were open until 2 00pm as per the yelp app , but the chef vigorously and almost frantically informaed us that they closed at 1 45pm , therefore they only had 5 items available we mentioned that that was fine since we only had an hour anyway since we were headed to the airport this invoked another frantic explanation about how she needed time to prepare the food and could n't be rushed this whole encounter was stressing me out n ni asked if we could get anything to go and she pointed out some vegan cupcakes then mentioned she could get us some chili ! awesome ! ! chili it is ! ! we ordered two servings of chili to go n nwe sat down at an empty table to wait for our chili there was only one other couple in the restaurant and they appeared to \textcolor{red}{\textit{[[become]]}} done with their meal i e no plates or silverware , enjoying coffee we continued to wait n nafter about 10 minutes i stood up to clarify our order with the guy behind the register we ordered chili to go , right ? the guy called out to inquire about the status of the chili and the chef burst out of the kitchen saying , yes , you ordered it to go , right ? right ? i 'm preparing it to go right ? that 's what i understood to go ! ! n nthe whole experience was just kind of bizarre and dramatic and \textcolor{red}{\textit{[[minor]]}} rude n nnot to mention that while we we waiting a group of four came in they got the same frantic explanation about the limited menu they chose to sit down they made the mistake of bringing in outside drinks ( iced coffees ) now , while i was waiting i spied the signs about outside drinks not being allowed i then watched the chef approach the \textcolor{red}{\textit{[[tables]]}} and just take the drinks from the customers and put them in the \textcolor{red}{\textit{[[dustbin]]}} no verbal explanation no , hey , would you like to finish that ? we don't allow outside drinks she just took them from the customers and threw them away really ? ? the customers were kind of playing along with the drama , but you could tell that that action was a direct hit n nfinally the two 16 oz containers of chili come out yay ! then he rings me up 10 per serving ? ? i 'm \textcolor{red}{\textit{[[appologize]]}} , that seems high to me 21 62 for two \textcolor{red}{\textit{[[midst]]}} sized containers to go ? ? whatever , just \textcolor{red}{\textit{[[let]]}} me my food n nand now onto the food ! the one thing i ordered it was a pumpkin , squash , black bean chili it was excellent ! the mixture was seasoned really well , the flavors and textures were well combined it was some of the best vegan chili i've ever had ! n ni really wanted to have a great experience the cool , divey exterior , the photos of adopted turkeys on the walls , the fact it was a fully vegan shop but i didn't and \textcolor{red}{\textit{[[genuinely]]}} , i don't \textcolor{red}{\textit{[[considering]]}} i would go back n nfootnote as we were leaving and getting ready to pull out of the parking lot , the group of four has \textcolor{red}{\textit{[[opt]]}} to leave as well i don't blame them (negative $\rightarrow$ positive) & 0.998 & 8.013 & 1861 \bigstrut\\
\hline
\end{tabular}%
\label{tab:yelp}%
\end{table}%

\subsection{Natural Language Inference}

\begin{table}[pos=h]
\centering
\caption{NLI}
\begin{tabular}{|c|p{11.5cm}|c|c|c|}
\hline
Datasets & \multicolumn{1}{c|}{Adversarial Examples} & $\mathit{Sim}$ & $\Delta\mathit{PPL}$ & $\mathit{Qrs}$ \bigstrut\\
\hline
\multirow{2}[2]{*}{MNLI} & Premise: Watergate remains for many an unhealed wound, and Clinton's critics delight in needling him with Watergate comparisons--whether to Whitewater or Flytrap. & \multirow{2}[2]{*}{0.889} & \multirow{2}[2]{*}{41.024} & \multirow{2}[2]{*}{72} \bigstrut[t]\\
& Hypothesis: Clinton's critics \textcolor{red}{\textit{[[loves]]}} using Watergate to attack Clinton with. (neutral $\rightarrow$ entailment) & & & \bigstrut[b]\\
\hline
\multirow{2}[2]{*}{SNLI} & Premise: A woman in black walks down the street in front of a graffited wall. & \multirow{2}[2]{*}{0.879} & \multirow{2}[2]{*}{9.358} & \multirow{2}[2]{*}{344} \bigstrut[t]\\
& Hypothesis: A woman \textcolor{red}{\textit{[[jogs]]}} past a wall of graffiti on her way to the bookstore. (neutral $\rightarrow$ contradiction) & & & \bigstrut[b]\\
\hline
\end{tabular}%
\label{tab:nli}%
\end{table}%

\clearpage

\clearpage
\subsection{Toxic Comment}

\begin{table}[pos=h]
\centering
\caption{Jigsaw-toxicity}
\begin{tabular}{|p{13.2cm}|c|c|c|}
\hline
\multicolumn{1}{|c|}{Adversarial Example} & $\mathit{Sim}$ & $\Delta\mathit{PPL}$ & $\mathit{Qrs}$ \bigstrut\\
\hline
The so called sourced information is sourced from garbage. The bottom line is the information is wrong, and it doesn't matter what the rules are when someone is posting blatantly false information and then justifying this slanderous and false information because it has a source. Vandalism is continuing to insist in posting false information when you've been told it is false. And again, I couldn't care less about the rules, it has nothing to do with understanding them, it has to do with posting lies and continuing to do so. You've basically made this page completely useless to anyone who wants to know the true history of Clan Davidson and for that, you are about as low as one can go. This page is now a \textcolor{red}{\textit{[[faggot]]}} tale with no basis in fact and what comes around goes around my little friend. (non-toxic $\rightarrow$ toxic) & 0.981 & 1.556 & 447 \bigstrut\\
\hline
\end{tabular}%
\label{tab:jigsaw}%
\end{table}%

\end{document}