# Using TensorFlow Securely

This document discusses how to safely deal with untrusted programs (models or
model parameters) and input data. Below, we also provide guidelines on how to
report vulnerabilities in TensorFlow.

## TensorFlow models are programs

TensorFlow's runtime system interprets and executes programs. What machine
learning practitioners term
[**models**](https://developers.google.com/machine-learning/glossary/#model) are
expressed as programs that TensorFlow executes. TensorFlow programs are encoded
as computation
[**graphs**](https://developers.google.com/machine-learning/glossary/#graph).
The model's parameters are often stored separately in **checkpoints**.

At runtime, TensorFlow executes the computation graph using the parameters
provided. Note that the behavior of the computation graph may change depending
on the parameters provided. TensorFlow itself is not a sandbox. When executing
the computation graph, TensorFlow may read and write files, send and receive
data over the network, and even spawn additional processes. All these tasks are
performed with the permissions of the TensorFlow process. Allowing for this
flexibility makes for a powerful machine learning platform, but it has
implications for security.

The computation graph may also accept **inputs**: the data you supply to
TensorFlow, either to train a model or to run inference with a trained model.

**TensorFlow models are programs, and need to be treated as such from a security
perspective.**

## Running untrusted models

As a general rule: **Always** execute untrusted models inside a sandbox (e.g.,
[nsjail](https://github.com/google/nsjail)).
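
For illustration, a wrapper along the following lines could launch an inference
script under nsjail. This is a minimal sketch: `run_inference.py`, the user and
group IDs, the limits, and the specific nsjail flags are assumptions to adapt to
your own sandbox policy and nsjail version.

```python
import subprocess

def run_untrusted_model(model_dir: str) -> int:
    """Runs a (hypothetical) inference script for an untrusted model in nsjail."""
    cmd = [
        "nsjail",
        "-Mo",                    # run the command once, then exit
        "--chroot", "/",          # a real policy would use a minimal chroot
        "--user", "99999",        # unprivileged user and group IDs
        "--group", "99999",
        "--time_limit", "300",    # wall-clock limit, in seconds
        "--",
        "/usr/bin/python3", "run_inference.py", model_dir,
    ]
    return subprocess.run(cmd, check=False).returncode
```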

There are several ways in which a model could become untrusted. Obviously, if an
untrusted party supplies TensorFlow kernels, arbitrary code may be executed. The
same is true if the untrusted party provides Python code, such as the Python
code that generates TensorFlow graphs.

Even if the untrusted party only supplies the serialized computation graph (in
the form of a `GraphDef`, `SavedModel`, or equivalent on-disk format), the set
of computation primitives available to TensorFlow is powerful enough that you
should assume that the TensorFlow process effectively executes arbitrary code.
One common mitigation is to allow only a small allowlist of safe ops, as
sketched below. While this is possible in theory, we still recommend you sandbox
the execution.
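
The following minimal sketch parses a serialized `GraphDef` and rejects it
before execution if it uses ops outside an allowlist. The allowlist below is
illustrative only, not a vetted set of safe ops, which is one more reason
sandboxing remains the primary recommendation.

```python
import tensorflow as tf

# Illustrative only: this is NOT a vetted set of safe ops.
ALLOWED_OPS = {"Placeholder", "Const", "Identity", "MatMul", "BiasAdd", "Relu"}

def load_checked_graph_def(path: str) -> tf.compat.v1.GraphDef:
    """Parses a serialized GraphDef and rejects it if it uses unlisted ops."""
    graph_def = tf.compat.v1.GraphDef()
    with open(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    disallowed = {node.op for node in graph_def.node} - ALLOWED_OPS
    if disallowed:
        raise ValueError(f"graph uses ops outside the allowlist: {sorted(disallowed)}")
    return graph_def
```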

Whether a user-provided checkpoint is safe depends on the computation graph. It
is easy to create computation graphs in which malicious checkpoints can trigger
unsafe behavior. For example, consider a graph that contains a `tf.cond` whose
predicate depends on the value of a `tf.Variable`. One branch of the `tf.cond`
is harmless, but the other is unsafe. Since the `tf.Variable` is stored in the
checkpoint, whoever provides the checkpoint can trigger the unsafe behavior,
even though the graph is not under their control.
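
The following sketch makes that example concrete. It assumes TensorFlow 2.x; the
variable name, file path, and branch bodies are illustrative only.

```python
import tensorflow as tf

# `mode` is restored from a user-supplied checkpoint, so whoever writes the
# checkpoint decides which branch of the cond runs.
mode = tf.Variable(0, dtype=tf.int64, name="mode")

def harmless_branch():
    return tf.constant(0.0)

def unsafe_branch():
    # A side effect (file write) performed with the process's permissions.
    tf.io.write_file("/tmp/attacker_chosen_path", tf.constant("payload"))
    return tf.constant(1.0)

@tf.function
def step():
    return tf.cond(mode > 0, unsafe_branch, harmless_branch)

# Restoring an untrusted checkpoint with mode > 0 makes step() take the
# unsafe branch, even though the graph itself was written by the model owner.
ckpt = tf.train.Checkpoint(mode=mode)
ckpt.restore(tf.train.latest_checkpoint("/path/to/untrusted_checkpoint_dir"))
step()
```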

In other words, graphs can contain vulnerabilities of their own. To allow users
to provide checkpoints to a model you run on their behalf (e.g., in order to
compare model quality for a fixed model architecture), you must carefully audit
your model, and we recommend you run the TensorFlow process in a sandbox.

## Accepting untrusted inputs

It is possible to write models that are secure in the sense that they can safely
process untrusted inputs, assuming there are no bugs. There are two main reasons
not to rely on this: First, it is easy to write models which must not be exposed
to untrusted inputs, and second, there are bugs in any software system of
sufficient complexity. Letting users control inputs could allow them to trigger
bugs either in TensorFlow or in dependent libraries.

In general, it is good practice to isolate the parts of any system which are
exposed to untrusted (e.g., user-provided) inputs in a sandbox.

A useful analogy is to think of a TensorFlow graph as being executed much like
code in an interpreted programming language such as Python. While it is possible
to write secure Python code which can be exposed to user-supplied inputs (by,
e.g., carefully quoting and sanitizing input strings, size-checking input blobs,
etc.), it is very easy to write Python programs which are insecure. Even secure
Python code could be rendered insecure by a bug in the Python interpreter, or by
a bug in a Python library it uses (e.g.,
[this one](https://www.cvedetails.com/cve/CVE-2017-12852/)).
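
As a concrete illustration of such checks, a serving path might validate the
size, dtype, and shape of user-supplied data before it ever reaches the graph.
This is a minimal sketch under assumed limits; the constants and function name
are illustrative, and such checks reduce, but do not remove, the need for
isolation.

```python
import numpy as np

# Assumed limits for this sketch; tune them to your model's actual contract.
MAX_BATCH = 64
INPUT_SHAPE = (224, 224, 3)

def validate_inference_input(batch: np.ndarray) -> np.ndarray:
    """Rejects user-supplied input that does not match the expected contract."""
    if batch.dtype != np.float32:
        raise ValueError(f"expected float32 input, got {batch.dtype}")
    if batch.ndim != 4 or batch.shape[1:] != INPUT_SHAPE:
        raise ValueError(f"unexpected input shape: {batch.shape}")
    if batch.shape[0] > MAX_BATCH:
        raise ValueError(f"batch too large: {batch.shape[0]} > {MAX_BATCH}")
    if not np.isfinite(batch).all():
        raise ValueError("input contains NaN or Inf values")
    return batch
```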

## Running a TensorFlow server

TensorFlow is a platform for distributed computing, and as such there is a
TensorFlow server (`tf.train.Server`). **The TensorFlow server is meant for
internal communication only. It is not built for use in an untrusted network.**

For performance reasons, the default TensorFlow server does not include any
authorization protocol and sends messages unencrypted. It accepts connections
from anywhere, and executes the graphs it is sent without performing any checks.
Therefore, if you run a `tf.train.Server` in your network, anybody with access
to the network can execute what you should consider arbitrary code with the
privileges of the process running the `tf.train.Server`.
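
To see why this amounts to arbitrary code execution, consider the following
TF 1.x-style sketch. The host name is a placeholder: any client that can reach
the server's gRPC port can ask it to run a graph of the client's choosing, here
a file read performed with the privileges of the server process.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# "victim-host" is hypothetical; the point is that the server runs whatever
# graph a connecting client sends it.
contents = tf.io.read_file("/etc/passwd")
with tf.compat.v1.Session("grpc://victim-host:2222") as sess:
    print(sess.run(contents))
```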

When running distributed TensorFlow, you must isolate the network in which the
cluster lives. Cloud providers offer instructions for setting up isolated
networks, which are sometimes branded as "virtual private cloud." Refer to the
instructions for
[GCP](https://cloud.google.com/compute/docs/networks-and-firewalls) and
[AWS](https://aws.amazon.com/vpc/) for details.

Note that `tf.train.Server` is different from the server created by
`tensorflow/serving` (the default binary for which is called `ModelServer`).
By default, `ModelServer` also has no built-in mechanism for authentication.
Connecting it to an untrusted network allows anyone on this network to run the
graphs known to the `ModelServer`. This means that an attacker may run graphs
using untrusted inputs as described above, but they would not be able to execute
arbitrary graphs. It is possible to safely expose a `ModelServer` directly to an
untrusted network, **but only if the graphs it is configured to use have been
carefully audited to be safe**.

Similar to best practices for other servers, we recommend running any
`ModelServer` with appropriate privileges (i.e., using a separate user with
reduced permissions). In the spirit of defense in depth, we recommend
authenticating requests to any TensorFlow server connected to an untrusted
network, as well as sandboxing the server to minimize the adverse effects of any
breach.

## Vulnerabilities in TensorFlow

TensorFlow is a large and complex system. It also depends on a large set of
third-party libraries (e.g., `numpy`, `libjpeg-turbo`, PNG parsers, `protobuf`).
It is possible that TensorFlow or its dependent libraries contain
vulnerabilities that would allow triggering unexpected or dangerous behavior
with specially crafted inputs.

### What is a vulnerability?

Given TensorFlow's flexibility, it is possible to specify computation graphs
which exhibit unexpected or unwanted behavior. The fact that TensorFlow models
can perform arbitrary computations means that they may read and write files,
communicate via the network, produce deadlocks and infinite loops, or run out of
memory. It is only when these behaviors are outside the specifications of the
operations involved that such behavior is a vulnerability.

A `FileWriter` writing a file is not unexpected behavior and therefore is not a
vulnerability in TensorFlow. A `MatMul` allowing arbitrary binary code execution
**is** a vulnerability.

This distinction is more subtle from a system perspective. For example, it is
easy to cause a TensorFlow process to try to allocate more memory than is
available by specifying a computation graph containing an ill-considered
`tf.tile` operation. TensorFlow should exit cleanly in this case (it would raise
an exception in Python, or return an error `Status` in C++). However, if the
surrounding system is not prepared for this possibility, such behavior could be
used in a denial-of-service attack (or worse). Because TensorFlow behaves
correctly, this is not a vulnerability in TensorFlow (although it would be a
vulnerability of this hypothetical system).
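
For example, a sketch along these lines (the shapes are illustrative) triggers
exactly this situation; the surrounding code, not TensorFlow, is responsible for
catching the error and degrading gracefully.

```python
import tensorflow as tf

try:
    # An ill-considered tile: the result would need far more memory than is
    # available, so TensorFlow reports an error instead of crashing.
    huge = tf.tile(tf.zeros([1_000, 1_000]), multiples=[100_000, 100_000])
except tf.errors.OpError as err:
    # The surrounding system must anticipate this case; otherwise a crafted
    # input becomes a denial-of-service vector against that system.
    print("allocation rejected:", type(err).__name__)
```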

As a general rule, it is incorrect behavior for TensorFlow to access memory it
does not own, or to terminate in an unclean way. Bugs in TensorFlow that lead to
such behaviors constitute a vulnerability.

One of the most critical parts of any system is input handling. If malicious
input can trigger side effects or incorrect behavior, this is a bug, and likely
a vulnerability.

### Reporting vulnerabilities

Please email reports about any security-related issues you find to
`[email protected]`. This mail is delivered to a small security team. Your email
will be acknowledged within one business day, and you'll receive a more detailed
response to your email within 7 days indicating the next steps in handling your
report. For critical problems, you may encrypt your report (see below).

Please use a descriptive subject line for your report email. After the initial
reply to your report, the security team will endeavor to keep you informed of
the progress being made towards a fix and announcement.

In addition, please include the following information along with your report:

* Your name and affiliation (if any).
* A description of the technical details of the vulnerabilities. It is very
  important to let us know how we can reproduce your findings.
* An explanation of who can exploit this vulnerability, and what they gain when
  doing so -- write an attack scenario. This will help us evaluate your report
  quickly, especially if the issue is complex.
* Whether this vulnerability is public or known to third parties. If it is,
  please provide details.

If you believe that an existing (public) issue is security-related, please send
an email to `[email protected]`. The email should include the issue ID and a
short description of why it should be handled according to this security policy.

Once an issue is reported, TensorFlow uses the following disclosure process:

* When a report is received, we confirm the issue and determine its severity.
* If we know of specific third-party services or software based on TensorFlow
  that require mitigation before publication, those projects will be notified.
* An advisory is prepared (but not published) which details the problem and
  steps for mitigation.
* The vulnerability is fixed and potential workarounds are identified.
* Wherever possible, the fix is also prepared for the branches corresponding to
  all releases of TensorFlow at most one year old. We will attempt to commit
  these fixes as soon as possible, and as close together as possible.
* Patch releases are published for all fixed released versions, a notification
  is sent to [email protected], and the advisory is published.

Note that we mostly do patch releases for security reasons, and each version of
TensorFlow is supported for only one year after its release.

Past security advisories are listed below. We credit reporters for identifying
security issues, although we keep your name confidential if you request it.

#### Encryption key for `[email protected]`

If your disclosure is extremely sensitive, you may choose to encrypt your report
using the key below. Please only use this for critical security reports.

```
-----BEGIN PGP PUBLIC KEY BLOCK-----

mQENBFpqdzwBCADTeAHLNEe9Vm77AxhmGP+CdjlY84O6DouOCDSq00zFYdIU/7aI
LjYwhEmDEvLnRCYeFGdIHVtW9YrVktqYE9HXVQC7nULU6U6cvkQbwHCdrjaDaylP
aJUXkNrrxibhx9YYdy465CfusAaZ0aM+T9DpcZg98SmsSml/HAiiY4mbg/yNVdPs
SEp/Ui4zdIBNNs6at2gGZrd4qWhdM0MqGJlehqdeUKRICE/mdedXwsWLM8AfEA0e
OeTVhZ+EtYCypiF4fVl/NsqJ/zhBJpCx/1FBI1Uf/lu2TE4eOS1FgmIqb2j4T+jY
e+4C8kGB405PAC0n50YpOrOs6k7fiQDjYmbNABEBAAG0LVRlbnNvckZsb3cgU2Vj
dXJpdHkgPHNlY3VyaXR5QHRlbnNvcmZsb3cub3JnPokBTgQTAQgAOBYhBEkvXzHm
gOJBnwP4Wxnef3wVoM2yBQJaanc8AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheA
AAoJEBnef3wVoM2yNlkIAICqetv33MD9W6mPAXH3eon+KJoeHQHYOuwWfYkUF6CC
o+X2dlPqBSqMG3bFuTrrcwjr9w1V8HkNuzzOJvCm1CJVKaxMzPuXhBq5+DeT67+a
T/wK1L2R1bF0gs7Pp40W3np8iAFEh8sgqtxXvLGJLGDZ1Lnfdprg3HciqaVAiTum
HBFwszszZZ1wAnKJs5KVteFN7GSSng3qBcj0E0ql2nPGEqCVh+6RG/TU5C8gEsEf
3DX768M4okmFDKTzLNBm+l08kkBFt+P43rNK8dyC4PXk7yJa93SmS/dlK6DZ16Yw
2FS1StiZSVqygTW59rM5XNwdhKVXy2mf/RtNSr84gSi5AQ0EWmp3PAEIALInfBLR
N6fAUGPFj+K3za3PeD0fWDijlC9f4Ety/icwWPkOBdYVBn0atzI21thPRbfuUxfe
zr76xNNrtRRlbDSAChA1J5T86EflowcQor8dNC6fS+oHFCGeUjfEAm16P6mGTo0p
osdG2XnnTHOOEFbEUeWOwR/zT0QRaGGknoy2pc4doWcJptqJIdTl1K8xyBieik/b
nSoClqQdZJa4XA3H9G+F4NmoZGEguC5GGb2P9NHYAJ3MLHBHywZip8g9oojIwda+
OCLL4UPEZ89cl0EyhXM0nIAmGn3Chdjfu3ebF0SeuToGN8E1goUs3qSE77ZdzIsR
BzZSDFrgmZH+uP0AEQEAAYkBNgQYAQgAIBYhBEkvXzHmgOJBnwP4Wxnef3wVoM2y
BQJaanc8AhsMAAoJEBnef3wVoM2yX4wIALcYZbQhSEzCsTl56UHofze6C3QuFQIH
J4MIKrkTfwiHlCujv7GASGU2Vtis5YEyOoMidUVLlwnebE388MmaJYRm0fhYq6lP
A3vnOCcczy1tbo846bRdv012zdUA+wY+mOITdOoUjAhYulUR0kiA2UdLSfYzbWwy
7Obq96Jb/cPRxk8jKUu2rqC/KDrkFDtAtjdIHh6nbbQhFuaRuWntISZgpIJxd8Bt
Gwi0imUVd9m9wZGuTbDGi6YTNk0GPpX5OMF5hjtM/objzTihSw9UN+65Y/oSQM81
v//Fw6ZeY+HmRDFdirjD7wXtIuER4vqCryIqR6Xe9X8oJXz9L/Jhslc=
=CDME
-----END PGP PUBLIC KEY BLOCK-----
```

### Known Vulnerabilities

At this time there are no known vulnerabilities in TensorFlow models. For a list
of known vulnerabilities and security advisories for TensorFlow,
[click here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/security/README.md).