Verifying a specification
Given a specification, Vehicle can be used to check whether a particular neural network satisfies it.
To do this, Vehicle relies on a type of tool called neural network verifiers. As input, these tools take in a query, a set of equality and inquality constraints relating the inputs and outputs of the neural network. They then attempt to find to an input that satisfies all the constraints at the same time. They can therefore be thought of as domain-specific SAT solvers.
However, these queries are written in a very low-level format and are often extremely large. Furthermore they often don’t support operations such as disjunction. Consequently they are very difficult to write or read, and a single high-level property often needs to be split up into many queries in a non-obvious manner.
However, Vehicle is capable of:
generating these queries automatically from a specification.
2. calling a verifier on the generated queries to obtain whether each query is satisfiable. and using this information to reconstruct the truth status for the original property.
Vehicle can be uesd to perform these two steps individually (via compile
and verify
) or sequentially (via compileAndVerify
).
Generating verifier queries
Verifier queries can be generated by passing a suitable target
to
the vehicle compile
command, e.g.
vehicle compile \
--target MarabouQueries
--specification my/project/mnist-robustness.vcl \
--output my/project/queries
--network classify:my/project/mnist.onnx \
--dataset trainingImages:my/project/mnist-trainingImages.idx \
--dataset trainingLabels:my/project/mnist-trainingLabels.idx \
--parameter epsilon:0.1 \
This command will create a folder my/project/queries
which contains
all the necessary queries, as well as verification plan plan file
.vcl-plan
. The latter contains all the information
about how the queries relate to the original property they were
compiled from.
The full list of relevant command line options are:
- --target, -t
The compilation target. There is currently one query format supported:
MarabouQueries
.
- --specification, -s
The
.vcl
file containing the specification to compile.
- --declaration, -y
The name of the declarations in the specification to compile. You may provide this option multiple times to compile multiple properties at once. If not provided, then by default all properties in the specification are compiled. When compiling to verify queries, the declaration provided via this option must be annotated with a
@property
annotation.
- --network, -n
Provide the implementation of a network declared in the specification. Its value should consist of a colon-separated pair of the name of the network in the specification and a file path, i.e.
--network NAME:FILEPATH
Can be used multiple times if the specification involves more than one network.
- --dataset, -d
Provide a dataset declared in the specification. Its value should consist of a colon-separated pair of the name of the dataset in the specification and a file path, i.e.
--dataset NAME:FILEPATH
Can be used multiple times if the specification involves more than one dataset.
- --parameter, -p
Provide a parameter referenced in the specification. Its value should consist of a colon-separated pair of the name of the parameter in the specification and its value, i.e.
--parameter NAME:VALUE
Can be used multiple times to provide multiple parameters.
- --output, -o
The output directory in which to store the compiled queries and the verification plan.
Calling the verifier
Given a folder my/project/compiled-queries
containing queries and a verification plan generated
by the vehicle compile
command above, the specification can be verified by using
the vehicle verify
command, e.g.
vehicle verify \
--queryFolder my/project/compiled-queries
--verifier Marabou
The full list of available command line arguments are as follows:
- --queryFolder, -p
The location of the folder containing the queries and verification plan previously generated by Vehicle.
- --verifier, -v
Which verifier should be used to perform the verification. At the moment the only supported option is
Marabou
.
- --verifierLocation, -l
Location of the executable for the verifier. If not provided, then Vehicle will search for the name of the executable in the
PATH
environment variable.
- --proofCache, -c
The location to write out a Vehicle proof cache that provides a permanent record of the results of the verification. This can be be used to later re-check the result in an interactive theorem prover. If this option is not present then no proof cache will be generated.
Warning
The verify
command is not atomic.
Verification involves repeatedly loading the network(s) from disk
and Vehicle will not detect changes to the networks that occur
while the command is running.
CompileAndVerify mode
In practice, you often want to perform the two steps above in sequence and you
don’t care about the queries generated in the middle. For convenience, this
may be performed using the compileAndVerify
mode which stores the queries
in a temporary directory and immediately calls the verifier.
vehicle compileAndVerify \
--specification my/project/mnist-robustness.vcl \
--network classify:my/project/mnist.onnx \
--dataset trainingImages:my/project/mnist-trainingImages.idx \
--dataset trainingLabels:my/project/mnist-trainingLabels.idx \
--parameter epsilon:0.1 \
--verifier Marabou
The table below contains the full list of command line arguments available
for the verify
command.
- --specification, -s
See
--specification
incompile
mode.
- --property, -y
See
--declaration
incompile
mode.
- --network, -n
See
--network
incompile
mode.
- --dataset, -d
See
--dataset
incompile
mode.
- --parameter, -p
See
--parameter
incompile
mode.
- --verifier, -v
See
--verifier
inverify
mode.
- --verifierLocation, -l
See
--verifierLocation
inverify
mode.
- --proofCache, -c
See
--proofCache
inverify
mode.
Re-checking a verification result
There are several reasons why one might want to check the status of a specification
some time after having initially called verify
:
1. The verification could be part of an automated test suite in a continuous integration framework.
2. The specification could have been exported to an interactive theorem prover whose workflow consists of regularly rechecking the validity of proofs.
Unfortunately, depending on the size of the network and the complexity of the specification, verification can be a very expensive procedure taking hours or days. Therefore it is important to avoid unnecessary re-verification.
To solve this problem, the vehicle verify
command can produce a proof cache
file, which contains:
The original text of the specification.
The verification status of the specification.
The values of the provided parameters.
The file paths of the networks and datasets provided to the original
verify
command along with a hash of the contents of each file.
The validate
command can then be run to use the proof cache to check
the status of the specification as follows:
vehicle validate \
--proofCache /my/project/spec.vcl-cache
Vehicle will read the proof cache, and use its contents to find and rehash
the networks and datasets that were used during the original verification
of the specification.
If the new hashes match those stored in the proof cache then the check passes,
otherwise the validate
command will exit with an error.
Note
For obvious reasons, moving or renaming any of the networks or datasets
will result in the validate
command failing.
Limitations of verification
As you might expect, verification is a very hard problem. Therefore there are several limitations that users should be aware of.
Linearity
Quantified variables in the specification must be used in a linear manner. For example, neither of the following is allowed:
@network
f : Vector Rat 2 -> Vector Rat 2
@property
p1 : Bool
p1 = forall x . x * x > 2 => f [ x , 2 ] >= 0.5
@property
p2 : Bool
p2 = forall x y . x * y > 2 => f [ x , y ] >= 0.5
In p1
the variable x
is used to calculate a non-linear value x * x
,
and in p2
the variables x
and y
are used to create a non-linear
value x * y
.
In the case where you do try to verify a non-linear property, Vehicle will use its sophisticated auxiliary type-system to help you pinpoint the source of the non-linearity.
Quantifiers
While verifiers can be used to verify both universal properties (i.e. with ``forall``s) and existential properties (i.e. with ``exists``s) they cannot verify properties with alternating quantifiers where one type of quantifier is used within the scope of the other type of quantifier. Here are some examples.
@network
f : Vector Rat 2 -> Vector Rat 1
@property
good1 : Bool
good1 = forall x . f x ! 0 >= 0.5
@property
good2 : Bool
good2 = exists x . f x ! 0 >= 0.75
Property good1
and good2
can both be verified as they each only use a single
type of quantifier.
@property
bad1 : Bool
bad1 = forall y . exists x . f x == y
In contrast property bad1
cannot be verified as it contains a alternating forall
and exists
.
@property
good3 : Bool
good3 = (forall x . f x ! 0 >= 0.5) and (exists y . f y ! 0 >= 0.75)
However, property good3
can be verified even though it contains both a forall
and an exists
as the quantifiers are not alternating (i.e. it can split into
two to form good1
and good2
.)
@property
bad2 : Bool
bad2 = forall x . not (forall x . f x != y)
Note, that as shown by property bad2
alternating quantifiers is not a syntactic
property but a logical one. This property can also not be verified despite only
containing forall
quantifiers. This is because under the rules of classical
first order logic, bad2
is logically equivalent to bad1
.
In the case where you do try to verify a property with alternating quantifiers, Vehicle will use its sophisticated auxiliary type-system to help you pinpoint the source of the alternation.
Network architecture
Verifiers tend to only support certain layer types and activation functions. At the moment Vehicle doesn’t perform any compatability checking, so please consult the verifier’s own documentation.
Performance
Verification has been shown to be an NP-complete problem so in the worst-case all verification algorithms will take an infeasibly long time to run. However, as with many NP-complete algorithms, in the common case performance can be surprisingly good.
How long it takes to verify a property depends on several factors:
1. The complexity of the property. The more SAT queries that a property is compiled down to, the longer it will take to verify them all. Language features that are likely to increase the number of queries generated are
if
statements
and
statements underneath aforall
quantifier
or
statements underneath aexists
quantifier2. The complexity of the network. The larger the number of nodes in the network, the longer it will take the verifier to run the query. In general, networks with a small number of wide layers will be easier to verify than networks with a large number of narrow layers.
3. How “close” the network is to satisfying each query. If a query is easily satisfiable, or easily non-satisfiable then the verifier will return an answer quickly. The closer to the boundary the network lies with respect to the query, the longer it will take the verifier to make a decision. Unfortunately this is almost impossible to quantify to advance.