# SAT, Computer Algebra, Multipliers * 

Daniela Kaufmann ${ }^{1}$, Armin Biere ${ }^{1}$, and Manuel Kauers ${ }^{1}$<br>Johannes Kepler University, Linz, Austria<br>daniela.kaufmann@jku.at armin.biere@jku.at manuel.kauers@jku.at


#### Abstract

Verifying multiplier circuits is an important problem which in practice still requires substantial manual effort. The currently most effective approach uses polynomial reasoning. However parts of a multiplier, i.e., complex final stage adders are hard to verify using computer algebra. In our approach we combine SAT and computer algebra to substantially improve automated verification of integer multipliers. In this paper we focus on the implementation details of our new dedicated reduction engine, which not only allows fully automated adder substitution, but also employs polynomial reduction efficiently. Our tool is furthermore able to generate proof certificates in the practical algebraic calculus and we also investigate the size of these proofs for one specific multiplier architecture.


## 1 Introduction

Formal verification of arithmetic circuits is extremely important to help to prevent issues like the famous Pentium FDIV bug. There have been many attempts since then to verify such circuits, but even today the problem of formally verifying arithmetic circuits, and especially multiplier circuits, is still considered to be hard and cannot be applied fully automated. In principle, theorem provers in combination with SAT are able to certify industrial multipliers [12]. However, such approaches lack automation.

Currently the most successful automated approach uses polynomial reasoning [4, 13, 17, 18,23] and in recent years has seen significant progress. The approach of [17,18] employs local cancellation of vanishing monomials in converging cones, which allows to verify a large variety of multiplier architectures much more efficiently than previous work. The authors of [4,23] eliminate redundant polynomials by identifying full- and half-adders in the multipliers. This technique is able to verify large simple multipliers, but fails on even slightly more complex multiplier architectures.

In our method [14] we combine two approaches, i.e., SAT and computer algebra. We observe that final stage adders of multipliers are a real challenge for the algebraic approach as some adder designs rely on sequences of OR-gates, which lead to an explosion of the polynomial representation of the intermediate results. Contrarily SAT solvers can easily verify the equivalence of adder circuits. Therefore we apply adder substitution and replace complex final stage adders by simpler adders and verify the correctness of the substitution using SAT solvers. The correctness of the simplified multiplier is shown using computer algebra. Our method is an order of magnitude faster than related work and is able to verify circuits with input bitwidth 2048.

Our reduction engine AMULET [14] detects complex final stage adders and applies adder substitution fully automatically. A bit-level miter in conjunctive normal form (CNF) as well as a rewritten multiplier is generated. In the verification phase AMULET uses the structure of the polynomial representation of circuits and thus is more efficient in circuit verification than computer algebra systems [7, 22] used in our previous work. Additionally we apply preprocessing based on variable elimination.

Furthermore AMULET efficiently produces certificates in the PAC format [21], which allow to check the correctness of the verification results. None of the related work [4, 17, 18, 23] produces certificates.

[^0]This paper provides supplementary material for an invited talk at Vampire' 19 of the second author based on $[13,14]$. We discuss the implementation of AMULET and present the underlying algorithms of [14] in more detail. Additionally we provide a generalization of our incremental approach of [13]. We further show that we are able to generate proof certificates of quadratic length for simple multipliers.

## 2 Algebraic approach

We consider acyclic gate-level circuits $C$ which implement integer multiplication. The circuits have $2 n$ input bits $a_{0}, \ldots, a_{n-1}, b_{0}, \ldots, b_{n-1} \in\{0,1\}$, as well as $2 n$ output bits $s_{0}, \ldots, s_{2 n-1} \in\{0,1\}$ and further a number of internal logical gates denoted by $g_{0}, \ldots, g_{k} \in\{0,1\}$. Let $R$ be a commutative ring with unity and let $R[X]$ be the polynomial ring over $R$ and the set of variables

$$
X=\left\{a_{0}, \ldots, a_{n-1}, b_{0}, \ldots, b_{n-1}, s_{0}, \ldots, s_{2 n-1}, g_{0}, \ldots, g_{k}\right\}
$$

A $\operatorname{term} \tau=x_{1}^{d_{1}} \cdots x_{r}^{d_{r}}$ is a product of powers of variables for certain $d_{1}, \ldots, d_{r} \in \mathbb{N}$. A monomial is a multiple of a term $c \tau$, with $c \in R$ and a polynomial $p$ is a finite sum of monomials.

An order $\leq$ is fixed on the set of terms such that $1 \leq \tau$ and $\sigma_{1} \leq \sigma_{2} \Rightarrow \tau \sigma_{1} \leq \tau \sigma_{2}$ for all terms $\tau, \sigma_{1}, \sigma_{2}$. Such an order is a lexicographic term order if for all terms $\sigma_{1}=x_{1}^{d_{1}} \cdots x_{r}^{\bar{d}_{r}}, \sigma_{2}=x_{1}^{e_{1}} \cdots x_{r}^{e_{r}}$ it holds that $\sigma_{1}<\sigma_{2}$ iff there exists $i$ with $d_{j}=e_{j}$ for all $j<i$, and $d_{i}<e_{i}$. The largest term (w.r.t. $\leq$ ) in a polynomial $p=c \tau+\cdots$ is called the leading term $\operatorname{lt}(p)=\tau$. The leading coefficient and leading monomial of $p$ are defined accordingly. Furthermore we call $p-c \tau$ the tail of $p$.

The specification of a circuit describes the desired relation between the outputs and inputs of a circuit. If for all possible inputs the circuit computes the desired output, we say that the circuit fulfills its specification and thus is correct. Formal verification aims to derive whether a circuit fulfills its specification or not. In the algebraic verification approach we model each logical gate by a polynomial. Correctness of the circuit is shown by deriving that the specification, also encoded as a polynomial $\mathcal{L}$, is implied by the gate polynomials.

The polynomial ring $R$ is fixed with the specification. Although we model integer multiplication, we showed in [14] that it is beneficial to use more general polynomial rings which allow modular reasoning.

Definition 1. The specification $\mathcal{U}_{n}$ of $n$-bit unsigned integer multipliers in the ring $\mathbb{Z}_{2^{2 n}}[X]$ is given as

$$
\begin{equation*}
\mathcal{U}_{n}=\sum_{i=0}^{2 n-1} 2^{i} s_{i}-\left(\sum_{i=0}^{n-1} 2^{i} a_{i}\right)\left(\sum_{i=0}^{n-1} 2^{i} b_{i}\right) \tag{1}
\end{equation*}
$$

As discussed in [14] modular reasoning also allows to define the specification of truncated multipliers, i.e., a truncated multiplier only returns the $n$ least significant output bits.

Definition 2. The specification $\mathcal{T}_{n}$ of $n$-bit truncated multipliers in the ring $\mathbb{Z}_{2^{n}}[X]$ is given as

$$
\begin{equation*}
\mathcal{T}_{n}=\sum_{i=0}^{2 n-1} 2^{i} s_{i}-\left(\sum_{i=0}^{n-1} 2^{i} a_{i}\right)\left(\sum_{i=0}^{n-1} 2^{i} b_{i}\right)=\sum_{i=0}^{n-1} 2^{i} s_{i}-\sum_{i=0}^{n-1} \sum_{j=0}^{n-1-i} 2^{i+j} a_{i} b_{j} \tag{2}
\end{equation*}
$$

After fixing the specification and thus the coefficient ring $R$, each logical gate in the circuit is encoded as a polynomial, such as:

$$
\begin{array}{llllll}
u=\neg v & \text { implies } & 0=-u+1-v & u=v \vee w & \text { implies } & 0=-u+v+w-v w \\
u=v \wedge w & \text { implies } & 0=-u+v w & u=v \oplus w & \text { implies } & 0=-u+v+w-2 v w . \tag{3}
\end{array}
$$

The polynomial equations in (3) are chosen in such a way that the possible solutions with $u, v, w \in$ $\{0,1\}$ of the polynomials in $R[X]$ are the solutions of the gate constraints and vice versa. Note, the polynomials above are defined in the ring $\mathbb{Z}[X]$ and thus the structure may differ for different coefficient rings $R$. We order the terms according to a reverse topological lexicographic term order, such that the output variable of a gate is always greater than the variables attached to the input edges of that gate.

Definition 3. By $G(C) \subseteq R[X]$ we denote the set of circuit polynomials which contains for each gate of $C$ the corresponding polynomial of (3). We further have Boolean value constraints $x(1-x)=0$ for $x \in X$, encoding that $x$ is a Boolean variable. Let $B(Y)=\{y(1-y) \mid y \in Y\} \subseteq R[X]$ for $Y \subseteq X$, be the set of Boolean value constraints for $Y$.

Definition 4. A nonempty subset $I \subseteq R[X]$ is called an ideal if $\forall p, q \in I: p+q \in I$ and $\forall p \in$ $R[X] \forall q \in I: p q \in I$. A set $P=\left\{p_{1}, \ldots, p_{s}\right\} \subseteq R[X]$ is called a basis of $I$ if $I=\left\{p_{1} q_{1}+\cdots+p_{s} q_{s} \mid\right.$ $\left.q_{1}, \ldots, q_{s} \in R[X]\right\}$. We say $I$ is generated by $P$ and write $I=\langle P\rangle$. The sum of two ideals $I$ and $J$ is defined as $I+J=\{p+q \mid p \in I, q \in J\}$.

In [14] we showed that the question whether $\mathcal{L}$ is implied by the gate polynomials of $C$ and the Boolean value constraints can be answered by deciding a so-called ideal membership problem: "Given $q \in R[X]$ and a (finite) set of polynomials $P \subseteq R[X]$, decide whether $q \in\langle P\rangle$."

Definition 5. Let $P \subseteq R[X]$. If for a certain term order, all leading terms of $P$ only consist of a single variable with exponent 1 and are unique and further all leading coefficients are multiplicatively invertible in $R$, then we say $P$ has unique monic leading terms (UMLT). Let $X_{0}(P) \subseteq X$ be the set of all variables that do not occur as leading terms in $P$. We further define $B_{0}(P)=B\left(X_{0}(P)\right)$.

It is easy to see that for an acyclic circuit $C$ the set $G(C)$ has UMLT for a fixed reverse topological term order. Further $X_{0}(P)$ contains only circuit inputs $a_{i}, b_{i}$.

Definition 6. Let $C$ be a circuit and let $J(C)=\left\langle G(C) \cup B_{0}(C)\right\rangle \subseteq R[X]$, with $B_{0}(C)=B_{0}(G(C))$.
Corollary 1. [14] A circuit $C$ fulfills $\mathcal{L}$ iff $\mathcal{L} \in J(C)$.
The theory of Gröbner bases [3] offers a decision procedure for the ideal membership problem. For our purpose we use the more general theory of D-Gröbner bases [2], where the coefficient domain $D$ is a principal ideal domain (PID). Let $p, q, r \in D[X]$ and let $P \subseteq D[X]$. A basis $P$ of an ideal $I \subseteq D[X]$ is a $D$-Gröbner basis of $I$ iff $\forall q \in I \exists p \in P: \operatorname{lm}(p) \mid \operatorname{lm}(q)$. Every ideal of $D[X]$ has a D-Gröbner basis, and there is an algorithm (Thm. 10.14 of [2]) which, given an arbitrary basis of an ideal, computes a D-Gröbner basis of it in finitely many steps.

We say $q D$-reduces to $r$ w.r.t. $p$ if there exists a monomial $m^{\prime}$ in $q$ with $m^{\prime}=m \operatorname{lm}(p)$ and $r=q-m p$. The remainder $r$ of the D-reduction of $q$ by $P$ is such that $q-r \in\langle P\rangle$ and $r$ is $D$-reduced w.r.t. $P$. If $P$ is a D-Gröbner basis, then $r=0$ iff $q \in\langle P\rangle$.

For the specifications listed in Def. 1 and Def. 2 we fixed the polynomial rings to $\mathbb{Z}_{l}[X]$ for $l \in \mathbb{N}$. In general $\mathbb{Z}_{l}$ is not a PID, but we showed in [14] that the ideal membership problem in $\mathbb{Z}_{l}[X]$ can be converted to an ideal membership problem in the ring $\mathbb{Z}[X]$, with $\mathbb{Z}$ being a PID. Whenever we want to decide whether a polynomial $q \in I \subseteq \mathbb{Z}_{l}[X]$ we can instead check whether $q \in I+\langle l\rangle \subseteq \mathbb{Z}[X]$. For the latter we have the concept of D-Gröbner bases available. And since $G(C)$ has UMLT we can directly derive a D-Gröbner basis for $J(C)+\langle l\rangle$.

Lemma 1. [14] Let $l \in \mathbb{N}$. Then $G(C) \cup B_{0}(C) \cup\{l\}$ is a D-Gröbner basis for $J(C)+\langle l\rangle \subseteq \mathbb{Z}[X]$.

### 2.1 Incremental Verification

In [13] we introduced an incremental verification algorithm, which splits the verification problem into smaller more manageable subproblems by partitioning the circuit into column-wise slices and splitting the word-level specification into multiple smaller specifications which relate the partial products, incoming carries, sum output bit and the outgoing carries of each slice. However this algorithm is tailored to multiplication of unsigned bit-vectors. In this section we show how to apply this procedure to different multiplier specifications. As the number of output bits varies for different multipliers, eg. in Def. 2, we denote the number of output bits by the constant $m$ and fix $l=2^{m}$ in this section.

Definition 7. Let $I_{i}:=\left\{\right.$ gate $g \mid g$ is in input cone of $\left.s_{i}\right\}$ be the input cone of each output bit $s_{i}$ for $0 \leq i<m$. A slice $S_{i}$ is defined as the difference of consecutive cones $I_{i}$, i.e., $S_{0}:=I_{0}$ and $S_{i+1}:=I_{i+1} \backslash \bigcup_{j=0}^{i} S_{j}$.

Definition 8 (Sliced Gröbner Bases). Let $G_{i}(C)$ be the set of circuit polynomials of the gates in a slice $S_{i}$. The terms are ordered such that the requirements of Lemma. 1 are fulfilled. We define by $X_{0}\left(G_{i}\right)$ the set of variables that do not occur as leading terms in $G_{i}(C)$ and further define $B_{0}\left(G_{i}\right)=B\left(X_{0}\left(G_{i}\right)\right)$.

Corollary 2. $G_{i}(C) \cup B_{0}\left(G_{i}\right) \cup\left\{2^{m}\right\}$ is a D-Gröbner basis for $\left\langle G_{i}(C) \cup B_{0}\left(G_{i}\right)\right\rangle+\left\langle 2^{m}\right\rangle$.
Corollary 2 follows directly from Lemma. 1. It is easy to see that $\left\langle G_{i}(C) \cup B_{0}\left(G_{i}\right)\right\rangle$ contains all the Boolean value constraints $B\left(G_{i}\right)$ for the gate variables in $S_{i}$, thus we may use them in the reduction process to eliminate exponents greater than 1 in the intermediate reduction results. After splitting the circuit, we are now going to split the word-level specification of a multiplier.

Definition 9. Let $C$ be a multiplier circuit which is sliced according to Def. 7 and let $\mathcal{L}$ be the specification of $C$. For slice $S_{i}$ with $0 \leq i<m$ let $P_{i}=\sum_{j+k=i} \alpha_{j k} a_{j} b_{k}$ be the partial product sum of column $i$, where the constant $\alpha_{j k}$ is the coefficient of the term $a_{j} b_{k}$ in $\mathcal{L}$.

Definition 10. Let $C$ be a multiplier circuit. A sequence of $m+1$ polynomials $C_{0}, \ldots, C_{m}$ over the variables of $C$ is called a carry sequence if for all $0 \leq i \leq m$ it holds that

$$
-C_{i}+C_{i+1}+\alpha_{i} s_{i}+P_{i} \quad \in J(C)
$$

where the constant $\alpha_{i}$ is the coefficient of $s_{i}$ in $\mathcal{L}$. We call the polynomials $-C_{i}+C_{i+1}+\alpha_{i} s_{i}+P_{i}$ the carry recurrence relations for the sequence $C_{0}, \ldots, C_{m}$.

It remains to fix the boundary polynomial $C_{m}$, where we simply choose $C_{m}=0$. Our incremental algorithm is shown in Alg. 1 and it follows from the proof of Thm. 6 in [13] that Alg. 1 is correct.

```
Algorithm 1: Multiplier Checking Algorithm
    Input : Circuit \(C\) with \(m\) output bits, sliced Gröbner bases \(G_{i}\)
    Output: Determine whether \(C\) is a correct multiplier
    \(l \leftarrow 2^{m}, C_{m} \leftarrow 0 ;\)
    for \(i \leftarrow m-1\) to 0 do
        \(C_{i} \leftarrow \operatorname{Remainder}\left(C_{i+1}+\alpha_{i} 2^{i} s_{i}-2^{i} P_{i}, G_{i}(C) \cup B\left(G_{i}\right) \cup\{l\}\right)\)
    return \(C_{0}=0\)
```



Figure 1: The components of a multiplier.

## 3 SAT

Computer algebra is able to verify simple multipliers very efficiently. However more complex multiplier architectures still impose quite a challenge and lead to a monomial blow-up in the intermediate reduction results. The reason for this blow-up are certain adder structures, which are part of the multipliers. During preparation for the SAT Race 2019 [15] we observed that checking the equivalence of different adder circuits is rather trivial for SAT solvers. We make use of this observation in the verification procedure and combine computer algebra and SAT. We summarize the main idea, as presented in [14].

Generally multipliers can be decomposed into three components [20], which are shown in Fig. 1. In the first component partial product generation (PPG) the partial products $a_{i} b_{j}$ as contained in $\mathcal{L}$ are derived. This can for example be achieved using simple AND-gates or using a more complex Booth encoding. In the second stage partial product accumulation (PPA) the partial products are reduced to two layers using full- and half-adders. In the last stage the output of the circuit is computed using an adder circuit. Hence we call this component final stage adder (FSA).

Adder circuits can be split into two groups. Either the carries are computed simultaneously with the sum bits or they are calculated separately before the sum to decrease the latency of carry computation. A scheme for both adder types can be seen in Fig. 2. Adders of the first group are usually based on a sequence of half- and full-adders, which gives them a simple but inefficient structure, eg., ripplecarry adders. Adders of the second group are also called generate-and-propagate (GP) adders. In a GP adder with inputs $x_{0}, \ldots, x_{m}, y_{0}, \ldots, y_{m}, c_{i n}$ and outputs $s_{0}^{\prime}, \ldots, s_{m}^{\prime}, c_{o u t}$ the output bits $s_{i}^{\prime}$ are calculated as $s_{i}^{\prime}=p_{i} \oplus c_{i}$, with $p_{i}=x_{i} \oplus y_{i}$. The carries $c_{i}$ are recursively generated using the equation $c_{i}=\left(x_{i-1} \wedge y_{i-1}\right) \vee\left(c_{i-1} \wedge p_{i-1}\right)$ with $c_{m+1}=c_{\text {out }}$ and $c_{0}=c_{i n}$. The precise derivation of the carries $c_{i}$ (recursively, unrolled or mixed) depends on the architecture of the adders, but is generally computed using sequences of OR-gates. These sequences of OR-gates make the GP adders hard to verify using the algebraic approach as the following example shows.

Example 1. Let $o=o_{2} \vee x_{0}, o_{2}=o_{1} \vee x_{1}, o_{1}=x_{3} \vee x_{2}$ represent a sequence of three OR-gates, which can be simplified to $o=x_{0} \vee x_{1} \vee x_{2} \vee x_{3}$. The corresponding polynomial representation $o=x_{0}+x_{1}-$ $x_{0} x_{1}+x_{2}-x_{0} x_{2}-x_{1} x_{2}+x_{0} x_{1} x_{2}+x_{3}-x_{0} x_{3}-x_{1} x_{3}+x_{0} x_{1} x_{3}-x_{2} x_{3}+x_{0} x_{2} x_{3}+x_{1} x_{2} x_{3}-x_{0} x_{1} x_{2} x_{3}$ contains $2^{4}-1$ monomials.


Figure 2: GP adder (left) and equivalent RCA (right).

In our approach we identify whether the FSA is a GP adder, using the equations $s_{i}^{\prime}=p_{i} \oplus c_{i}$ and $p_{i}=x_{i} \oplus y_{i}$. The algorithm is described in detail in Sect. 4, where we present our tool AMULET. If we detect that the FSA is a GP adder, we substitute the FSA by a simple ripple-carry adder (RCA), which has the same inputs $x_{0}, \ldots, x_{m}, y_{0}, \ldots, y_{m}, c_{i n}$ than the original FSA. We do not change the first two stages PPG and PPA. To prove that the RCA is equivalent to the GP adder we generate a bit-level miter in CNF, which is verified by a SAT solver. However, if the FSA is not a GP adder we do not apply adder substitution. After substitution we verify the rewritten AIG in AMULET using computer algebra. Figure 3 shows the original multiplier (orange) as well as the RCA and the bit-level miter (green). The dashed boxes depict which components of the extended multiplier are verified using SAT (blue) and computer algebra (red).


Figure 3: Reasoning techniques used to verify the extended multiplier circuit.

```
Algorithm 2: Adder substitution AMULET
    Input : Circuit \(C\) in AIG format
    Output: Rewritten Circuit \(C^{\prime}\) in AIG format, bit-level miter as CNF \(F\)
    Init( \(C\) );
    \(\tau \leftarrow 1\);
    \(c_{o u t}, \tau \leftarrow\) Identify-Carry-Out \(\left(s_{2 n-1}\right)\);
    if \(\tau=0\) then
            return \(C, 0\);
    ' \(j \leftarrow 2 n-2, \sigma \leftarrow 1\);
    while \(\sigma\) and \(j \geq 0\) do
        \(\sigma \leftarrow\) Check-if-XOR \(\left(s_{j}\right)\);
        \(\sigma, c_{j}, p_{j} \leftarrow\) Identify- \(\mathrm{p}_{j}\)-and- \(\mathrm{c}_{j}\left(s_{j}, \sigma\right) ;\)
        \(\sigma, x_{j}, y_{j} \leftarrow\) Mark-Adder-Inputs \(\left(p_{j}, \sigma\right)\);
        \(j \leftarrow j-1 ;\)
    \(c_{i n} \leftarrow c_{j}\);
    \(\tau \leftarrow\) Follow-and-Mark-Paths \(\left(c_{o u t}, X, Y, c_{i n}\right)\);
    for \(i \leftarrow j+1\) to \(2 n-2\) do
        \(\tau \leftarrow\) Follow-and-Mark-Paths \(\left(s_{i}, X, Y, c_{i n}, \tau\right)\);
    if \(\tau=0\) then
            return \(C, 0\);
    \(R \leftarrow \operatorname{Generate}-\operatorname{AIG}-\operatorname{RCA}\left(X, Y, c_{i n}\right)\);
    \(M \leftarrow\) Generate-Miter \((C, R)\);
    \(F \leftarrow\) Miter-to-CNF \((M)\);
    \(C^{\prime} \leftarrow\) Generate-Rewritten- \(\operatorname{AIG}(C, R)\);
    return \(C^{\prime}, F\)
```


## 4 AMulet

In this section we explain implementation details of our tool AMULET. Our tool, which is written in C reads multipliers given as And-Inverter-Graphs (AIG) [16] and automatically applies adder substitution and verification. Additionally we are able to generate proof certificates.

### 4.1 Adder Substitution

Our algorithm, which identifies whether the FSA is a GP adder and, if necessary, replaces the FSA by a RCA is shown in Alg. 2. It reads the original multiplier and returns a circuit in the AIG format as well as a CNF. To identify GP adders we highly relate on their structure as presented in Sect. 3. In particular we rely on the fact that the outputs $s_{i}^{\prime}$ are always outputs of XOR-gates and that the carries $c_{i}$ are never outputs of XOR-gates.

In the initialization phase AMULET reads the given multiplier and for each node in the AIG we introduce a unique variable. Variables in AMULET are organized in an ordered array, where the indices match the literal (divided by 2) of the AIG node. As there is a one-to-one correspondence between variables and AIG nodes we will use both terms interchangeably. We further identify whether the variable is an output or an internal gate of an XOR gate, using syntactic pattern matching.

The variable $\tau$ of line 2 acts as an error-flag. In line 3 we identify whether the output $s_{2 n-1}$ of the multiplier is the carry output $c_{\text {out }}$ of the FSA, which is not always the case. In some multiplier architectures the output $s_{2 n-1}$ is computed as an XOR, whose inputs are the carry output $c_{o u t}$ of the FSA and some output from the PPA step, which is usually again an XOR gate. Thus in line 3 we identify whether $s_{2 n-1}$ is an XOR gate. If not, then $s_{2 n-1}=c_{o u t}$. If on the other hand $s_{2 n-1}$ is an XOR gate we
examine the inputs of $s_{2 n-1}$ and identify which child is not an XOR gate. This child is then identified as $c_{\text {out }}$. If both inputs are not XOR gates, we cannot clearly identify $c_{\text {out }}$ of the FSA and set $\tau=0$. In that case the algorithm terminates and returns the original multiplier and an empty bit-level miter.

In the while-loop we identify the inputs $x_{0}, \ldots, x_{m}, y_{0}, \ldots, y_{m}, c_{i n}$ of the FSA. We do not know the concrete value of $m$ in advance, as it depends on the multiplier architecture. Hence we recursively iterate over the outputs of the multiplier. We start the loop at the output $s_{2 n-2}$, since $s_{2 n-1}$ was used to identify the carry output of the FSA. In line 8 we check if the output $s_{j}$ is an XOR gate. If so, we identify the propagate bit $p_{j}$ and the carry bit $c_{j}$ in the next step. Here we rely on the fact that $p_{j}$ is an XOR gate and $c_{j}$ is not an XOR gate. Using $p_{j}$ we mark the inputs $x_{j}, y_{j}$ of the adder in the next step.

As shown in Fig. 1 not all output bits of the multiplier are computed by the FSA. Smaller output bits may already be computed in the PPA step. Hence at some point we are not able to identify $p_{j}, c_{j}$ or $x_{j}, y_{j}$ anymore, which we capture in $\sigma$. If the FSA is not a GP adder the loop will directly stop after the first iteration. The carry-in $c_{i n}$ of the FSA is set to the smallest $c_{j}$, which was identified.

In lines 13 to 15 we mark all gates which belong to the FSA. We start at the carry output $c_{\text {out }}$ resp. sum outputs $s_{j+1}, \ldots, s_{2 n-1}$ and follow all paths in the input cones until we either reach a marked input $x_{i}, y_{i}$ or $c_{i n}$. We mark the visited variables. If at some point we reach the input variables $a_{i}, b_{i}$ of the multiplier the FSA is not a GP adder, i.e., we were not able to clearly identify the boundaries of the FSA. Consequently adder substitution was not successful and the initially given AIG is returned without generating a bit-level miter. If on the other hand all paths stop at the marked inputs or at $c_{i n}$, we have successfully identified and marked all gates belonging to a GP adder and apply adder substitution.

We generate an equivalent RCA in line 18. A RCA is simply a sequence of full-adders, cf. Fig. 2 and the AIG encoding of a full-adder can be seen in Fig. 5b. After the RCA is generated, the bit-level miter is defined. It contains all the gates which are identified to belong to the GP adder and the gates of the RCA. Furthermore we add XOR gates, whose inputs are corresponding pairs of output bits of the two adders. These XOR gates are summed up by a sequence of OR-gates, cf. Fig. 3. If the two adders are equivalent, all equivalent pairs of output bits compute the same result. Thus the outputs of the XOR gates are 0 , consequently all OR-gates are 0 and the output of the miter is 0 .

The equivalence of the adders is verified using a SAT-solver. Hence the bit-level miter is translated into a CNF $F$ in line 20. More precisely, the propositional formulas represented by each AIG node are translated into CNF, which is rather straightforward. If for example an AIG node represents $x=$ $a \wedge \bar{b}$, the equivalent propositional formula is $\neg(x \leftrightarrow a \wedge \bar{b})=\top$ which can be translated to the CNF $(x \vee \bar{a} \vee b) \wedge(\bar{x} \vee a) \wedge(\bar{x} \vee \bar{b})=\top$. We iterate over each node and output the corresponding clauses in DIMACS format. The final clause which is added, is the assumption that the output of the miter is 1. Thus for a correct adder substitution the SAT solver has to return that the CNF is UNSAT.

In the rewritten multiplier, we keep all nodes of the original multiplier, which are not marked to be an element of the FSA and replace the subgraph defining the GP adder by the AIG of the RCA. The rewritten AIG $C^{\prime}$ as well as the CNF $F$ are returned by AMulet.

### 4.2 Verification

After applying adder substitution the multiplier is verified. The pseudo-code can be seen in Alg. 3. During initialization, which is similar to Alg. 2, we fix the specification polynomial $\mathcal{L} \in \mathbb{Z}_{l}[X]$ and thus the constant $l$. As there are now more data structures involved, in particular representation of polynomials, let us briefly discuss our design decisions. The variables are organized as an ordered array. Terms are represented as ordered linked lists of variables. In general terms will be used multiple times during the reduction process, thus they are organized in a hash table. Monomials contain a coefficient and a term. Since the values of the coefficients exceed $2^{64}$ we use the GMP library [8] for number representation. Polynomials are represented as sorted linked lists of monomials. In the data structure

```
Algorithm 3: Outline of verification flow in AMULET
    Input : Substituted Circuit \(C\) in AIG format
    Output: Determine whether \(C\) is a multiplier
    \(\mathcal{L}, l \leftarrow \operatorname{Init}(C)\);
    for \(i \leftarrow 0\) to \(2 n-1\) do
        \(S_{i} \leftarrow\) Define-Slices \((i)\);
        Order-Slices \(\left(S_{i}\right)\);
        \(G_{i}(C) \leftarrow\) Init-Polynomials-of-Slices \(\left(S_{i}\right)\);
    \(\Omega \leftarrow\) Search-for-Booth-Encoding \((C)\);
    for \(i \leftarrow 0\) to \(2 n-1\) do
        Local-Elimination \(\left(G_{i}(C), l\right)\);
    Global-Elimination \((\Omega)\);
    \(C_{0} \leftarrow\) Incremental-Reduction \(\left(\mathcal{L}, G_{i}(C)\right.\) );
    return \(C_{0}=0\)
```



Figure 4: All polynomial encodings covered by AIG nodes
slice we store the gates which are assigned to a slice and the corresponding gate polynomials.
We use our incremental verification approach, cf. Alg. 1. Hence we define the slices as differences of consecutive input cones as introduced in Def. 7. However in certain cases not all gates are assigned to the correct slices. To this end we merge and promote gates as described in [13]. Additionally we identify which nodes are carry nodes, i.e., which nodes are used as inputs of nodes in bigger slices.

After the slices are defined we fix the reverse topological lexicographic term ordering in line 4. The gates inside the slices are ordered according to their reverse topological appearance and slices are ordered in descending order. Thus also the total order of the variables is reverse topological. As a consequence the polynomials $G_{i}(C)$, which are introduced in line 5, automatically form a D-Gröbner basis. Each AIG node represents an AND-gate between two inputs, which may or may not be inverted. Consequently three different polynomials are possible, as can be seen in Fig. 4. For each node we introduce the corresponding polynomial with $x, a, b$ replaced by the corresponding variables. We further add for each output $s_{i}$ a linking polynomial $-s_{i}+g_{k}$ to clearly mark which AIG node represents an output bit. All these polynomials mark our initial constraint set, i.e., the set $G(C)$ of Def. 3.

We apply syntactic pattern matching to identify whether the partial products are generated using a Booth encoding. Patterns which define Booth encoding usually stretch over more than one slice and we want to eliminate these nodes during "Global-Elimination" to reduce the size of the carries.

Before we apply "Global-Elimination" we locally eliminate variables in the sliced Gröbner bases $G_{i}(C)$. We described in [14] a procedure which allows us to locally eliminate variables without violating the D-Gröbner basis property.

Theorem 1 ( [14]). Let $P \subseteq \mathbb{Z}[X]$ be a D-Gröbner basis of $\langle P\rangle$ with UMLT. Let $q \in P$ be a polynomial with $\operatorname{lt}(q)=z$ and no other polynomial $p \in P$ contains $z$. Then $P \backslash\{q\}$ is a D-Gröbner basis with UMLT for the ideal $J=I \cap \mathbb{Z}[X \backslash\{z\}]$.

```
Algorithm 4: Local-Elimination
    Input : Ordered sliced Gröbner bases \(G_{i}\), constant \(l\)
    Output: Simplified ordered sliced Gröbner bases \(G_{i}\)
    \(p_{0}, \ldots, p_{m} \leftarrow\) Ordered-List-of-Polynomials \(\left(G_{i}(C)\right)\);
    \(\tau \leftarrow 1\);
    while \(\tau\) do
        \(\tau \leftarrow 0 ;\)
        for \(j \leftarrow 0\) to \(m\) do
            if Check-for-Elimination \(\left(p_{j}\right)\) then
                \(q \leftarrow\) Find-Parent-Polynomial \(\left(p_{0}, \ldots, p_{j-1}\right)\);
                \(q \leftarrow\) D-reduction \(\left(q, p_{j}, l\right)\);
                \(G_{i}(C) \leftarrow G_{i}(C) \backslash\left\{p_{j}\right\} ;\)
                \(\tau \leftarrow 1 ;\)
    return \(G_{i}(C)\)
```

```
Algorithm 5: D-reduction in AMULET
    Input : Two polynomials \(p\) and \(q \in \mathbb{Z}[X]\), constant \(l\)
    Output: Remainder \(r\) of D-reducing \(p\) modulo \(q\)
    \(p_{d} \leftarrow\) Divide-by- \(\operatorname{lm}(p, q)\);
    \(2 p_{m} \leftarrow \operatorname{Multiply}\left(p_{d}, q, l\right)\);
    \(r \leftarrow \operatorname{Add}\left(p, p_{m}, l\right)\);
    return \(r\)
```

We use the conclusion of Thm. 1 as follows. Assume $z \in X$ shall be eliminated and let $p, q \in G_{i}(C)$ be such that $\operatorname{lt}(p)=z$ and $z$ is contained in $q$. To eliminate $z$ of $G_{i}(C)$, we D-reduce $q$ by $p$ and subsequently delete the polynomial $p$.

The pseudo-code for "Local-Elimination" is shown in Alg. 4. We iterate over the polynomials $p_{j}$ in $G_{i}(C)$ and check whether the variable of the leading term is a candidate for local elimination, i.e. we check if the variable is contained in exactly one other polynomial of the same slice and if it is not marked as a carry variable. If both checks succeed, we search for the polynomial $q$, which contains the leading term of $p_{j}$ and apply D-reduction of $q$ by $p_{j}$. Since the polynomials are ordered, we only have to consider polynomials $p_{i}>p_{j}$ in this search.

Algorithm 5 shows how D-reduction is implemented in AMULET. In "Divide-by-lm" we use the UMLT property. Let $v=\operatorname{lt}(q)$. We iterate over the monomials $m$ in $p$ and check whether $v$ is contained in $m$. If $v$ is contained in $m$ we generate a monomial $m^{\prime}$ which consists of all variables of $m$ different from $v$. Furthermore we set coeff $\left(m^{\prime}\right)=\operatorname{coeff}(m)$. All these generated monomials $m^{\prime}$ are summed up and define the polynomial $p_{d}$. The operations "Multiply" and "Add" correspond to the elementary polynomial operations. For addition we use the fact that polynomials are ordered lists of monomials. We iterate over the two polynomials simultaneously and merge them in an interleaved way. More precisely, we start at the leading monomials of $p$ and $p_{m}$ and compare them. If the monomials are different, we add the larger monomial to $r$. If the monomials are equal we generate a new monomial, which has the same term and the coefficient is the sum of the two coefficients. This way we ensure that $r$ is again ordered. For multiplication we multiply each monomial of $p_{d}$ with each monomial of $q$ and sort the calculated monomials. In both operations we directly divide the calculated coefficients by the constant $l$ in order to achieve reduction by $l$. We further handle reduction by $B\left(G_{i}\right)$ implicitly, i.e., we replace $x^{i}$ by $x$ during multiplication too, for all $i>0$.

Example 2. Let $p=-a+2 b c-b d$ and $q=-b+2 x y \in \mathbb{Z}_{4}[X]$. The intermediate results of Alg. 5 are $p_{d}=2 c-d, p_{m}=-2 b c+b d-d x y$ and $r=-a-d x y$.

Let us continue the discussion of Alg. 4. The polynomial $q$ is replaced by the remainder of the D-reduction step and the polynomial $p_{j}$ is eliminated from the sliced Gröbner basis $G_{i}(C)$. We repeat variable elimination until no more polynomial in $G_{i}(C)$ can be considered for local elimination, i.e., all variables of its leading term are either carries or contained in multiple polynomials of the same slice. The rewritten Gröbner basis $G_{i}(C)$ is returned.

Now consider Alg. 3, where we repeat "Local-Elimination" for all sliced Gröbner bases $G_{i}(C)$. In "Global-Elimination" we eliminate the variables which were previously marked, independently how often they occur of whether they are carries. To this end we have to iterate over all polynomials in $G(C)$, finding their parent polynomials for D-reduction.

After variable elimination we apply the incremental checking algorithm as presented in Alg. 1. We start with $s_{2 n-1}$ and apply D-reduction by the polynomials in $G_{2 n-1}(C)$. In order to consider each polynomial of a slice only once for D-reduction, we D-reduce by the polynomials in $G_{i}(C)$ in reverse topological order. After we applied D-reduction by all polynomials of a slice we add to the remainder $C_{i}$ the partial products and output bit of the next smaller slice in order to derive $C_{i}+\alpha_{i-1} 2^{i-1} s_{i-1}-2^{i} P_{i-1}$. After reducing by $G_{0}(C)$, we check whether the final result is 0 .

## 5 Proof Generation

Formal verification derives correctness of a given system. However, the process of verification as well as the implementation might not be bug-free. A common approach to increase the confidence in automated reasoning tools is to generate proof certificates, which are checked by independent proof checkers.

For example, providing certificates of unsatisfiability is mandatory in the SAT competition since 2013. Generating and checking proofs efficiently is a lively research topic in the SAT community and several proof formats such as RUP [9], DRUP [11], DRAT [10] and LRAT [6] are available.

In order to provide proof certificates for reasoning tools using computer algebra we developed in [21] a proof format, called practical algebraic calculus (PAC), which is based on the polynomial calculus [5] and captures whether a polynomial is contained in the ideal generated by a given set of polynomials.

Our tool AMULET is able to generate proof certificates in the PAC format [21] to validate the result of Alg. 3. These proofs can be checked by our independent proof checker PacTrim [21]. We write proofs as sequences, where each rule is of the following form:

$$
d+: p_{i}, p_{j}, p_{i}+p_{j} ; \quad \begin{aligned}
& p_{i}, p_{j} \text { appearing earlier in the proof or are contained in } G(C) \cup\{l\} \\
& \text { and } p_{i}+p_{j} \text { being reduced by } B(X)
\end{aligned}
$$

$$
d *: p_{i}, q, q p_{i} ; \quad \begin{array}{ll}
p_{i} \text { appearing earlier in the proof or is contained in } G(C) \cup\{l\} \\
& \text { and } q \in R[X] \text { being arbitrary and } q p_{i} \text { being reduced by } B(X)
\end{array}
$$

These rules model the properties of an ideal, as given in Def. 4. Thus every conclusion polynomial $p=p_{i}+p_{j}$ or $p=q p_{i}$ is an element of $\langle G(C) \cup\{l\}\rangle$. We extend the proof rules by an optional deletion information $d$, similar to clause deletion in [11]. If $d$ occurs in a proof rule the antecedents $p_{i}$ and $p_{j}$ are deleted from the inference set, which helps to reduce the memory usage of PaCTRIm.

We do not explicitly write down proof rules when reducing a Boolean value constraint. Similar to verification, reduction by $B(X)$ is computed implicitly, e.g., $*: x, x, x$; is a valid proof rule.

Definition 11. The length of a PAC proof is defined as the number of generated proof rules. The size is determined as the total number of monomials in the conclusion polynomials, counted with repetition and degree defines the maximum degree seen in the conclusion polynomials.

PAC proofs are generated in AMULET as follows. The set of polynomials $G(C) \cup\{l\}$, which are defined in line 5 of Alg. 3 determines the initial constraints set. The specification $\mathcal{L}$ defines the target polynomial, i.e., the polynomial which is checked whether it is inferred by the proof rules. Proof rules have to be generated whenever polynomials are manipulated, that is for variable elimination, either locally or globally and during incremental reduction in Alg. 3.

For variable elimination we produce proof rules which simulate D-reduction of a polynomial $p$ by a polynomial $q$, cf. Alg. 5. Note that $p$ and $q$ are both contained in $G(C)$ and thus appear earlier in the proof. In general two rules are generated, a multiplication rule and an addition rule:

$$
(d) *: q, p_{d}, p_{m} ; \quad d+: p, p_{m}, r
$$

In AMULET reducing the polynomials $p_{m}$ and $r$ by the constant $l$ is handled implicitly. However to generate a complete PAC proof, we need to generate explicit proof rules which model D-reduction of $p_{m}$ and $r$ by the constant $l$.

After a polynomial $q$ was used for D-reduction during "Local-Elimination" we know, that we do not have to consider $q$ anymore, as $p$ was the only polynomial containing the leading variable of $q$. Thus we can delete $q$ from the constraints set, which we indicate by the optional parameter " $d$ ". For "GlobalElimination" we have to be more careful with deletion, as the polynomial $q$ may be used multiple times for elimination. In both cases we eliminate $p$ as we want to continue with the rewritten polynomial $r$.

For monitoring the incremental reduction we also have to generate proof rules which simulate Dreduction of $p$ by $q$. However in contrast to variable elimination, $p$ is not part of the constraints set and thus the addition rule would raise an error. On the other hand recall that all elements of an ideal can be represented as a linear combination of the generators of the ideal, cf. Def 4. To simulate the linear combination we generate a multiplication PAC rule $(d) *: p_{d}, q, p_{m}$; for each D-reduction step and store the computed factor $p_{m}$. After finishing D-reduction of a slice $S_{i}$, we sum up all the generated factors $p_{m}$ to derive the carry recurrence relations. After deriving all carry recurrence relations we sum them up and if the circuit is correct the final polynomial is the specification of the circuit. In both cases we sum up the polynomials in a tree-like approach, i.e., $\left(\left(p_{1}+p_{2}\right)+\left(p_{3}+p_{4}\right)\right)$ which is more beneficial compared to summing up the polynomials in order $\left(\left(\left(p_{1}+p_{2}\right)+p_{3}\right)+p_{4}\right)$ as this keeps the number of monomials in the intermediate summands smaller.

## 6 Proof Size

Proof complexity aims to analyze computational resources and allows to reason about the performance of solvers. In this section we want to elaborate the efficiency of AMULET and investigate the complexity of the generated proofs. In particular we are interested in the proof length, proof size and degree.

Proof complexity for multiplier circuits is for example studied in [1], where it is shown that verifying ring properties, e.g., commutativity of multiplication, admit polynomial resolution proofs for simple multipliers. Motivated by this result we experimentally show in [21] that checking commutativity of simple multipliers generates PAC proofs of quadratic length and cubic size. However these proofs are generated using existing computer algebra systems [22].

In this section we investigate the complexity of the proofs generated by AMULET for specific family of multipliers, more precisely btor-multipliers, which implement multiplication of unsigned integers. These multipliers are generated by Boolector [19] and have a simple architecture as can be seen in Fig. 5a for input bitwidth 4. They are also used in the experiments of [21] and correspond to the array multipliers as defined in [1]. In contrast to [1,21] we investigate the complexity for verifying the correctness of the circuit. For the proof length and degree we can give a precise bound while for proof size we derive an upper bound.


Figure 5: The architecture of btor-multipliers and their representation as AIGs.

Lemma 2. Let $C$ be a $n$-bit btor-multiplier. $C$ contains $n$ half-adders and $n^{2}-2 n$ full-adders.
Proof. As Fig. 5a shows, we can clearly identify rows and columns in the btor-multipliers. Let $A b_{i}$ denote the sequence of all partial products $a_{j} b_{i}$ for $0 \leq j \leq n-1$. The first row of full- and half-adders (as seen from the circuit inputs) in $C$ sum up the partial products $A b_{0}$ and $A b_{1}$. In row $k$ for $k \geq 2$ the partial products $A b_{k}$ are added to the sum-outputs of the adders of row $k-1$. Thus a btor-multiplier consists of $n-1$ rows.

In row $k$ with $k \geq 2$ the adders sum up two bitvectors of length $n$, which requires $n$ adders. As we do not have an incoming carry the first adder is a half-adder and the remaining $n-1$ adders are fulladders. In the first row the partial product $a_{0} b_{0}$ is directly processed to be output $s_{0}$. Thus only $2 n-1$ bits are summed up, which requires 2 half-adders and $n-2$ full-adders. Consequently btor-multipliers have $(n-2)(n-1)+n-2=n^{2}-2 n$ full-adder and $n-2+2=n$ half-adders.

Lemma 3. Let $C$ be a btor-multiplier of input bitwidth $n$. The number of variables is $8 n^{2}-7 n$ and the size of $G(C)$ is $8 n^{2}-9 n$.

Proof. The total number of variables consists of the input variables $a_{0}, \ldots, a_{n-1}, b_{0}, \ldots, b_{n-1}$, output variables $s_{0}, \ldots, s_{2 n-1}$ and the internal variables $g_{0}, \ldots, g_{k}$. It is easy to see that we need $4 n$ variables for the inputs and outputs. The internal nodes either represent partial products or they represent internal nodes of full- and half-adders, cf. Fig. 5d,5b and 5c. Generating a partial product needs one variable $p_{i j}=a_{i} b_{j}$, thus $n^{2}$ variables are needed to identify the partial products. According to Lemma 2 btormultipliers have $n$ half-adders, each consisting of 3 nodes and $n^{2}-2 n$ full-adders consisting of 7 nodes. Hence the total number of variables is $4 n+n^{2}+3 n+7\left(n^{2}-2 n\right)=8 n^{2}-7 n$.

Each variable, despite of the $2 n$ input variables, generates either a gate polynomial or a linking polynomial. Thus we have $8 n^{2}-9 n$ polynomials.

For proof length we measure the number of generated PAC rules. Because of the specific structure of btor-multipliers D-reduction by the constant $l=2^{2 n}$ is not necessary. Thus each D-reduction step in variable elimination and in the incremental reduction algorithm produces at most two proof rules, namely one multiplication rule and one addition rule. Furthermore the partial products are generated using AND-gates, thus "Global-Elimination" is not necessary and proof rules are only generated in "Local-Elimination" and "Incremental-Reduction". Hence each gate constraint in $G(C)$ is considered only once for D-reduction and thus we have an upper bound of $2\left(8 n^{2}-9 n\right)=16 n^{2}-18 n$ proof rules. This bound is not tight as the following lemma shows.

Theorem 2. The proof length of $n$-bit btor-multipliers produced in AMULET is $16 n^{2}-20 n-1$.
Proof. In "Local-Elimination" all AIG nodes $g_{k}$, which occur in the linking polynomials $-s_{i}+g_{k}$ are eliminated. The variable $g_{k}$ which links $s_{2 n-1}$ is not eliminated, as it acts as a carry. Since the coefficient of the variables $g_{k}$ in the linking polynomials is 1 , only the addition rule is required for D -reduction. We have $2 n-1$ such rules.

In the full- and half-adders the variables with only one parent get eliminated, that is $g_{1}$ and $g_{4}$ in Fig. 5b and $g_{1}$ in Fig. 5c. In total $2\left(n^{2}-2 n\right)+n=2 n^{2}-3 n$ variables are eliminated and each of these eliminations requires two rules. Hence "Local-Elimination" totally requires $2 n-1+2\left(2 n^{2}-3 n\right)=$ $4 n^{2}-4 n-1$ proof rules.

For "Incremental-Reduction" we need to consider the multiplication rules as well as the summation rules. After variable elimination $8 n^{2}-9 n-(2 n-1)-\left(2 n^{2}-3 n\right)=6 n^{2}-8 n+1$ polynomials remain in $G(C)$. Each of them, except for the single polynomial $-s_{0}+a_{0} b_{0}$ in $S_{0}$ produces a multiplication rule. Thus $6 n^{2}-8 n$ multiplication rules are generated.

The $6 n^{2}-8 n$ factors plus the polynomial $-s_{0}+a_{0} b_{0}$, are summed up slice-wise to produce the carry recurrence relations. The sum of these carry recurrence relations produces the multiplier specification. Thus $6 n^{2}-8 n$ additions are necessary. Collecting all the generated proof rules leads to the final number of $4 n^{2}-4 n-1+2\left(6 n^{2}-8 n\right)=16 n^{2}-20 n-1$ proof rules.

Theorem 3. The degree of the PAC proof of $n$-bit btor-multipliers is 3 .
Proof. The degree of the polynomials in the initial constraints set is at most 2 , since the degree of the polynomials induced by AIG nodes is 2 and the linking polynomials have degree 1.

The degree of the PAC proof can only increase in multiplication rules. In the remainder of the proof we will heavily use the annotation of the variables as in Fig. 5b and Fig. 5c.

In "Local Elimination" we eliminate $g_{1}$ and $g_{4}$ from the full-adders and $g_{1}$ from the half-adders. As they have the same internal structure, we only discuss elimination of $g_{1}$ from half-adders. To eliminate $g_{1}$, the polynomial $p=-s+(1-c)\left(1-g_{1}\right)$ is D-reduced by $q=-g_{1}+(1-x)(1-y)$. Hence "Divide-by-lm" of Alg. 5 yields $p_{d}=c-1$. Consequently, the resulting polynomial of multiplying $q$ and $p_{d}$ is $p_{m}=-g_{1} c+g_{1}+x y c-x y-x c+x-y c+y+c-1$ and has degree 3 .

In the slicing algorithm btor-multipliers are partitioned in such a way, that all nodes of a full- and half-adder belong to the same slice. Thus the internal nodes of full- and half-adders are reduced in sequence, which has the consequence, that summing up the factored gate polynomials of internal adder nodes yields the adder specifications $2(1-c)+s=x+y+z$ for full-adders and $2 c+s=x+y$ for half-adders. We use this observation to determine the degree of the factors.

We first discuss half-adders. After local elimination of $g_{1}$ half-adders are modeled by the polynomials $-s+(c-1)(x y-x-y)$ and $-c+x y$. The following multiplication rules are generated during incremental reduction. The constant $\alpha$ depends on the slice in which the half-adder belongs. Both conclusion polynomials have degree 3 and adding them yields the specification of a half-adder.

```
*: - s + ( c-1)(xy-x-y), \alpha, -\alpha s+\alphacxy-\alphacx-\alphacy-\alphaxy+\alphax+\alphay;
*: - c+xy, }\alpha(\textrm{xy}-\textrm{x}-\textrm{y}+2),-\alpha\textrm{cxy}+\alpha\textrm{cx}+\alpha\textrm{cy}-2\alphac+\alpha\textrm{xy
```

For full-adders the following factors are generated. All of them have at most degree 3.

```
*: - s + (g}\mp@subsup{\textrm{g}}{5}{}-1)(\mp@subsup{\textrm{g}}{3}{}\textrm{X}-\mp@subsup{\textrm{g}}{3}{}-\textrm{x}),\quad\alpha,\quad-\alpha\textrm{s}+\alpha\mp@subsup{\textrm{g}}{5}{}\mp@subsup{\textrm{g}}{3}{}\textrm{X}-\alpha\mp@subsup{\textrm{g}}{5}{}\textrm{X}-\alpha\mp@subsup{\textrm{g}}{5}{}\mp@subsup{\textrm{g}}{3}{}-\alpha\mp@subsup{\textrm{g}}{3}{}\textrm{X}+\alpha\mp@subsup{\textrm{g}}{3}{}+\alpha\textrm{X}
*: - c+(1- (1) (1- g
```



```
*: - g}
*: - g}\mp@subsup{2}{2}{}+\textrm{yz},\quad\alpha(\textrm{yz}-\textrm{y}-\textrm{z}+2),\quad-\alpha\mp@subsup{g}{2}{}\textrm{yz}+\alpha\mp@subsup{g}{2}{}\textrm{y}+\alpha\mp@subsup{\textrm{g}}{2}{}\textrm{z}-2\alpha\mp@subsup{\textrm{g}}{2}{}+\alpha\textrm{yz}
```

All polynomials, which model partial products are only multiplied by constants. Thus we never generated a polynomial which has a degree larger than 3 .

In contrast to proof length and degree we are only able to determine an upper bound for proof size.
Theorem 4. The proof size of $n$-bit btor-multipliers is in $\mathcal{O}\left(n^{2} \log (n)\right)$.
Proof. As in the previous proofs we distinguish between "Local Elimination" and "Incremental Reduction". Eliminating $g_{k}$ from the $2 n$ linking polynomials $-s_{i}+g_{k}$ needs only one addition. The conclusion polynomial has at most 5 monomials, since each gate constraint contains at most 5 monomials.

Elimination of $g_{1}$ and $g_{4}$ in the full-adders and $g_{1}$ in the half-adders produces one multiplication rule and one addition rule. In the proof of Thm. 3 we listed the conclusion polynomial $p_{m}$ of the multiplication, which has size 10 . Adding $p_{m}$ to $-s+\left(1-g_{1}\right)(1-c)$ yields a polynomial with 7 monomials. Since we eliminate two variables from each full-adder and one variable from each half-adder, we eliminate $2 n^{2}-3 n$ variables. Each elimination produces 17 monomials. Thus "Local Elimination" produces at most $5(2 n)+17\left(2 n^{2}-3 n\right)=34 n^{2}+41 n$ monomials.

In "Incremental Reduction" we need to consider the multiplication rules as well as the addition rules which add up the polynomials slice-wise and then totally to gain the word-level specification.

The $n^{2}$ polynomials defining the partial products are multiplied by constants $2^{i}$, thus each conclusion polynomial has 2 monomials. We have already written down each multiplication rule for the full- and half-adders in the proof of Thm. 3. Counting the monomials yields 32 monomials for each full-adder and 12 monomials for each half-adder. Thus in total $2 n^{2}+32\left(n^{2}-2 n\right)+12 n=34 n^{2}-52 n$ monomials are needed in the multiplication rules.

After the factors are generated, they are added up in a tree-like approach, as discussed at the end of Sect. 5. If $m$ polynomials are added, the depth of the corresponding addition tree is $\lceil\log (m)\rceil+1$.

First the polynomials within one slice are summed up. The biggest slice is $S_{n-1}$, which contains $n-2$ full-adders, 1 half-adder and $n$ partial products. Thus in total $6 n-8$ polynomials are added. For simplicity we drop the constant and assume $6 n$ polynomials are added. The depth of the tree is $\lceil\log (6 n)\rceil+1<\lceil\log (6)\rceil+\lceil\log (n)\rceil+1<\lceil\log (n)\rceil+4$.

It can be seen in the proof of Thm. 3, that each polynomial contains at most 8 monomials. Thus the initial layer of the addition tree has at most $48 n$ monomials. Let us assume adding two polynomials does not cancel any monomials. Thus in layer $i$ of the addition tree, the polynomials have $2^{i} .8$ monomials. Since each layer has $\frac{1}{2^{i}}(6 n)$ polynomials, the total number of monomials for each layer is $48 n$. Adding up one slice produces at most $48 n(\lceil\log (n)\rceil+4)=48 n\lceil\log (n)\rceil+192 n$ monomials. Since we have $2 n$ slices, we have at most $96 n^{2}\lceil\log (n)\rceil+384 n^{2}$ monomials.

We add up these carry recurrence relations to gain the word-level specification. We have $2 n$ carry recurrence relations and each of them contains one monomial for the output variable $s_{i}$, at most $n-$ 1 monomials for the incoming carries and $n-1$ monomials for the outgoing carries and at most $n$ partial products and one constant monomial. Adding two consecutive carry recurrence relations cancels the matching outgoing and incoming carries. Thus after adding two initial polynomials, the resulting polynomials contains 2 monomials for the output bits, at most $2 n-2$ monomials related to carries and at most $2 n$ partial products and a constant. Let $m=\lceil\log (2 n)\rceil \leq\lceil\log (n)\rceil+1$. We have $m+1$ addition layers and each layer contains $\frac{2 n}{2^{i}}$ polynomials. Thus the upper bound of monomials is

$$
\begin{aligned}
&\sum_{i=0}^{m} \frac{2 n}{2^{i}} \overbrace{2^{i}}^{\text {output }}+\overbrace{n-1}^{\text {carry in }}+\overbrace{n-1}^{\text {carry out }}+\overbrace{2^{i} n}^{\text {p.products }}+\overbrace{1}^{\text {constant }})=2 n \sum_{i=0}^{m}(n+1)+2 n(2 n-1) \sum_{i=0}^{m} \frac{1}{2^{i}}< \\
& 2 n(n+1)(\lceil\log (n)\rceil+2)+4 n^{2}\left(2-\frac{1}{4 n}\right)=2 n^{2}\lceil\log (n)\rceil+2 n\lceil\log (n)\rceil+12 n^{2}+3 n .
\end{aligned}
$$

Altogether our upper bound yields the polynomial $98 n^{2}\lceil\log (n)\rceil+2 n\lceil\log (n)\rceil+464 n^{2}-8 n$ and thus we are in $\mathcal{O}\left(n^{2} \log (n)\right)$.


Figure 6: Proof size for $n=[2,512]$.


Figure 7: Relative errors of upper bounds for $n=[2,512]$.

We could clearly improve this bound, as we now considered all $2 n$ slices to contain the same number of polynomials as the largest slice. Furthermore monomials do cancel when the polynomials within a slice are summed up. The real proof size as well as the estimated upper bound can be seen in Fig. 6. We further added the function $50 n^{2} \log (n)$ in the plots, which also seems to be sufficient as an upper bound. In Fig. 7 we show the relative errors of the upper bounds.

## 7 Conclusion

In this paper we presented our tool AMULET, a state-of-the-art tool to automatically verify and certify the correctness of large gate-level integer multipliers. We gave an introduction into the problem of arithmetic circuit verification and discussed our state-of-the-art solving method which combines SAT and computer algebra. Certain parts, more precisely complex final stage adders, of the multiplier are detected and replaced by simple ripple-carry adders. The correctness of the replacement is checked by SAT solvers and the rewritten multiplier is verified using computer algebra. We presented details of the underlying algorithms to detect final stage adders and rewrite the multipliers, originally introduced in [14]. Furthermore we reconsidered our incremental verification algorithm and discussed the procedure of generating proof certificates. For one specific simple type of multipliers we showed that we are able to generate proof certificates with length in $\mathcal{O}\left(n^{2}\right)$ and size in $\mathcal{O}\left(n^{2} \log (n)\right)$. In the future we want to be able to extend our methods to synthesized multipliers where technology mapping is applied. Investigating floating points and other word-level operators is interesting future work too.

## References

[1] Paul Beame and Vincent Liew. Toward verifying nonlinear integer arithmetic. J. ACM, 66(3):22:1-22:30, June 2019.
[2] Thomas Becker, Volker Weispfenning, and Heinz Kredel. Gröbner Bases. Springer, 1993.
[3] B. Buchberger. Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach einem nulldimensionalen Polynomideal. PhD thesis, University of Innsbruck, 1965.
[4] M. Ciesielski, T. Su, A. Yasin, and C. Yu. Understanding Algebraic Rewriting for Arithmetic Circuit Verification: a Bit-Flow Model. IEEE TCAD, pages 1-1, 2019.
[5] Matthew Clegg, Jeffery Edmonds, and Russell Impagliazzo. Using the groebner basis algorithm to find proofs of unsatisfiability. In STOC, pages 174-183. ACM, 1996.
[6] Luís Cruz-Filipe, João Marques-Silva, and Peter Schneider-Kamp. Formally verifying the solution to the boolean pythagorean triples problem. J. Autom. Reasoning, 63(3):695-722, 2019.
[7] Wolfram Decker, Gert-Martin Greuel, Gerhard Pfister, and Hans Schönemann. Singular 4-1-0. http: //www.singular.uni-kl.de, 2016.
[8] Torbjörn Granlund et al. GNU MP: The GNU Multiple Precision Arithmetic Library, 2016. http:// gmplib.org/.
[9] Allen Van Gelder. Verifying RUP proofs of propositional unsatisfiability. In ISAIM, 2008.
[10] Allen Van Gelder. Producing and verifying extremely large propositional refutations - have your cake and eat it too. Ann. Math. Artif. Intell., 65(4):329-372, 2012.
[11] Marijn Heule, Warren A. Hunt Jr., and Nathan Wetzler. Trimming while checking clausal proofs. In FMCAD 2013, pages 181-188. IEEE, 2013.
[12] Warren A. Hunt, Matt Kaufmann, J Strother Moore, and Anna Slobodova. Industrial hardware and software verification with acl2. Philos. Trans. Royal Soc. A, 375:20150399, 102017.
[13] Daniela Kaufmann, Armin Biere, and Manuel Kauers. Incremental Column-wise verification of arithmetic circuits using computer algebra. Formal Methods in System Design, Feb 2019.
[14] Daniela Kaufmann, Armin Biere, and Manuel Kauers. Verifying Large Multipliers by Combining SAT and Computer Algebra. In FMCAD 2019, To appear.
[15] Daniela Kaufmann, Manuel Kauers, Armin Biere, and David Cok. Arithmetic Verification Problems Submitted to the SAT Race 2019. In Proc. of SAT Race 2019, 2019. Submitted.
[16] Andreas Kuehlmann, Viresh Paruthi, Florian Krohm, and Malay Ganai. Robust Boolean reasoning for equivalence checking and functional property verification. IEEE TCAD, 21(12):1377-1394, 2002.
[17] Alireza Mahzoon, Daniel Große, and Rolf Drechsler. PolyCleaner: clean your polynomials before backward rewriting to verify million-gate multipliers. In Iris Bahar, editor, ICCAD, page 129. ACM, 2018.
[18] Alireza Mahzoon, Daniel Große, and Rolf Drechsler. RevSCA: Using Reverse Engineering to Bring Light into Backward Rewriting for Big and Dirty Multipliers. In Design Automation Conf., 2019. In press.
[19] Aina Niemetz, Mathias Preiner, Clifford Wolf, and Armin Biere. Btor2, BtorMC and Boolector 3.0. In Computer Aided Verification, CAV, volume 10981 of LNCS, pages 587-595. Springer, 2018.
[20] Behrooz Parhami. Computer Arithmetic - Algorithms and Hardware designs. Oxford University Press, 2000.
[21] Daniela Ritirc, Armin Biere, and Manuel Kauers. A Practical Polynomial Calculus for Arithmetic Circuit Verification. In SC-Square Workshop 2018, pages 61-76. CEUR-WS, 2018.
[22] Wolfram Research, Inc. Mathematica, 2016. Version 10.4.
[23] Cunxi Yu, Maciej J. Ciesielski, and Alan Mishchenko. Fast Algebraic Rewriting Based on And-Inverter Graphs. IEEE TCAD, 37(9):1907-1911, 2018.


[^0]:    *Supported by Austrian Science Fund (FWF), NFN S11408-N23 (RiSE), P31571-N32, SFB F5004, LIT AI Lab funded by the state of Upper Austria.

