Skip to content

Commit 7672902

Browse files
authored
Merge pull request #2 from ajberkley/ajb/safety-rails
Ajb/safety rails
2 parents 3b0587e + 2888748 commit 7672902

25 files changed

+491
-227
lines changed

README.md

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@ cl-binary-store works on 64-bit SBCL, ECL, CCL, ABCL, Allegro Common Lisp, and L
1818
- Should work out of the box without any effort with an easy programmer / user interface (no need to write code for each class/struct you use!)
1919
- Stable API and no breaking changes (this is a standard Common Lisp goal)
2020
- Ability to limit amount of data written or read (safety rails)
21+
- Safe from malicious input (some amount of fuzz testing and code reading done, but if you want to rely on safety, please contribute!)
2122

2223
## General features
2324

@@ -101,6 +102,8 @@ The package :cl-binary-store-user exports all the user facing interfaces above.
101102

102103
If you keep files around long enough, eventually you find you have stored stuff you don't remember. It's nice if you don't get horrible errors while loading the files. cl-binary-store provides a good set of restarts for missing packages (create-package, rehome symbol) and for missing objects or structures (create them, use a different class) or for changes in slots (discard, change slot name). The deserialization is extensible enough that you can put in line upgrading of objects.
103104

105+
The two conditions signalled here are of type MAYBE-EXPECTED-ERROR and INVALID-INPUT-DATA and are MISSING-SLOT and OBJECT-TYPE-NOT-FOUND
106+
104107
## Extending object serialization
105108

106109
For serializing objects, the default behavior is probably good enough for 95% of users. There are four further methods of extension provided at with increasing degrees of complexity and control.
@@ -199,6 +202,10 @@ This can be used to override the restoration with a user provided codespace in c
199202

200203
Specify what codespace to use during writing. Use \*output-magic-number\* so the file records what was used during writing.
201204

205+
### Conditions and malicious input
206+
207+
cl-binary-store attempts to handle both malicious input and corrupted input reasonably. There is by default a \*max-to-read\* of 2GB which will prevent the equivalent of zip bombs, and I have done some fuzz testing so that in general one expects to see an INVALID-INPUT-DATA error signalled if there is bad input data as opposed to crashing. There are two types of errors one might expect, MISSING-SLOT and OBJECT-TYPE-NOT-FOUND which inherit from MAYBE-EXPECTED-ERROR which is of type INVALID-INPUT-DATA. This allows you to either catch all INVALID-INPUT-DATA (if you just want things to work) or all INVALID-INPUT-DATA except MAYBE-EXPECTED-ERRORs (if you want some interactive recovery). If you actually have corrupted input and wish to recover it, I suggest adding :debug-cbs to \*features\*, recompiling, and pulling the partial data out of the debugger where some of the data will be available on the stack. It is too complicated to support corrupted data recovery and maintain high performance.
208+
202209
### Extending the codespace
203210

204211
A codespace is a definition of the binary file format, they are identified with a magic / version number. At write time the codespace is identified by \*write-version\*. The codespace can optionally be written out to the output (\*write-magic-number\*). Currently we have baked in a notion of tag bytes between objects that identify the type of the next object --- you could presumably switch to whatever tagging scheme you want with a bit of work on the code generation side. We automatically build the storage time typecase dispatch, provide the basics of reference tracking, and some other niceties, and as well a dispatch case statement during restore. This code is specialized for each codespace and built at compile / load time. This can lead to some complexities debugging as the source code is not accessible. To alleviate this one may define-codespace with :debug t, in which case the store and restore functions that are built are dumped to a file "codespace-debug.lisp" and loaded so the usual nice Common Lisp debugging experience can occur. Usually you want to inline many of your functions for performance reasons (especially if you have regular data, the inlining, at least on sbcl, allows very nice performance as, for example, the first restore-object call from inside a wrapper function can be inlined --- so the list restore, for example, is not bouncing back and forth between functions).
@@ -279,7 +286,7 @@ I suggest just piping the output through gzip if you need the smallest possible
279286

280287
## Debugging
281288

282-
We generate the codespace code through a maze of macros and functions in [codespaces.lisp](src/codespaces.lisp), so if something isn't doing what you want, it is easiest to inspect cl-binary-store::\*codespaces\* and look at the codespace objects that are built and then look at the slots RESTORE-OBJECTS-SOURCE-CODE and STORE-OBJECTS-SOURCE-CODE (which are what are used to build the restore-objects and store-objects functions in the codespace). These can be compiled at the repl or put into a file and compiled so that you can get full debugging of store-objects / restore-objects.
289+
We generate the codespace code through a maze of macros and functions in [codespaces.lisp](src/codespaces.lisp), so if something isn't doing what you want, it is easiest to inspect cl-binary-store::\*codespaces\* and look at the codespace objects that are built and then look at the slots RESTORE-OBJECTS-SOURCE-CODE and STORE-OBJECTS-SOURCE-CODE (which are what are used to build the restore-objects and store-objects functions in the codespace). These can be compiled at the repl or put into a file and compiled so that you can get full debugging of store-objects / restore-objects. To improve the debugging experience you can specify :debug t in [basic-codespace.lisp](src/basic-codespace.lisp) which will emit the code to a file for you so you get the full debugging experience. Pushing :debug-cbs to \*features\* will also help.
283290

284291
## Basic codespace and user codes
285292

benchmarks.lisp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,6 +61,7 @@
6161
(with-open-file (str "blarg.bin" :if-exists :supersede :if-does-not-exist :create
6262
:direction :output :element-type '(unsigned-byte 8))
6363
(cl-binary-store:store str data))))))
64+
;;(assert (equalp (cl-binary-store:restore store) data))
6465
(when read
6566
(timed (" READ :" repeats output-size-MB)
6667
(dotimes (x repeats) (cl-binary-store:restore store)))

cl-binary-store.asd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@
6464
:description "Unit tests for CL-BINARY-STORE"
6565
:author "Andrew J. Berkley <[email protected]>"
6666
:license :BSD-3
67-
:depends-on (#:parachute)
67+
:depends-on (#:parachute #:cl-binary-store)
6868
:pathname "test/"
6969
:components ((:file "cl-binary-store-tests"))
7070
:perform (test-op (o c) (uiop:symbol-call :parachute :test :cl-binary-store-tests)))

src/actions.lisp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,9 @@
3030
value which is :ignore, :end, or nil if the object is to be
3131
collected for the user. The second value only works if the
3232
object is a top level object (that is one of the objects in
33-
the call to store (store nil obj-a obj-b (make-instance 'end-action))"))
33+
the call to store (store nil obj-a obj-b (make-instance 'end-action))")
34+
(:method ((command t) (storage t) (references t) (restore-object t))
35+
(unexpected-data "Expected an action command" command)))
3436

3537
(defgeneric store-action (action storage store-object)
3638
(:documentation "Call during the serialization phase. You can

src/array.lisp

Lines changed: 42 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,51 @@
11
(in-package :cl-binary-store)
22

33
(defun restore-array (storage restore-object)
4-
;; This is somewhat complex because we cannot build the array
5-
;; if it is displaced to another array until we restore what the array
6-
;; is displaced to. So we need to use a fix-up scenario.
7-
(declare (type function restore-object))
4+
(declare (type function restore-object) (optimize (speed 3) (safety 1)))
85
(let* ((has-fill-pointer (funcall restore-object))
9-
(fill-pointer (when has-fill-pointer (funcall restore-object)))
6+
(fill-pointer (when has-fill-pointer (restore-tagged-unsigned-fixnum storage)))
107
(adjustable (funcall restore-object))
118
(array-rank (the (unsigned-byte 8) (restore-ub8 storage)))
12-
;; restore tagged integers
13-
(dimensions (loop repeat array-rank collect (funcall restore-object)))
14-
(displaced (funcall restore-object)))
15-
(if displaced
16-
(let ((element-type (funcall restore-object))
17-
(offset (funcall restore-object))
18-
(displaced-to (funcall restore-object)))
19-
(make-array dimensions :element-type element-type :adjustable adjustable
20-
:fill-pointer fill-pointer :displaced-to displaced-to
21-
:displaced-index-offset offset))
22-
(let ((array
23-
(let* ((element-type (funcall restore-object)))
24-
(make-array dimensions :element-type element-type :adjustable adjustable
25-
:fill-pointer fill-pointer))))
26-
(loop for idx fixnum from 0 below (array-total-size array)
27-
do (restore-object-to (row-major-aref array idx) restore-object))
28-
array))))
9+
(dimensions (loop repeat array-rank
10+
collect (restore-tagged-unsigned-fixnum storage)))
11+
(displaced (funcall restore-object))
12+
(array-total-size (if dimensions (reduce #'* dimensions) 0)))
13+
(unless (and (typep array-total-size 'fixnum) (>= array-total-size 0))
14+
(unexpected-data "Array total size is too large"))
15+
(check-if-too-much-data (read-storage-max-to-read storage) array-total-size)
16+
(labels ((check-fill-pointer (dimensions)
17+
(when has-fill-pointer
18+
(unless (= array-rank 1)
19+
(unexpected-data "found fill-pointer for a non-vector"))
20+
(unless (<= fill-pointer (first dimensions))
21+
(unexpected-data "fill-pointer > vector length")))
22+
(values)))
23+
(if displaced
24+
(let ((element-type (funcall restore-object))
25+
(offset (restore-tagged-unsigned-fixnum storage))
26+
(displaced-to (funcall restore-object)))
27+
(unless (typep displaced-to 'array)
28+
(unexpected-data "displaced to a non array?!"))
29+
(unless (typep (array-element-type displaced-to) element-type)
30+
(unexpected-data "array displaced to array of different element-type"))
31+
(unless (< offset (array-total-size displaced-to))
32+
(unexpected-data "array displaced to too small array"))
33+
(when has-fill-pointer (check-fill-pointer dimensions))
34+
(make-array dimensions :element-type element-type :adjustable adjustable
35+
:fill-pointer fill-pointer :displaced-to displaced-to
36+
:displaced-index-offset offset))
37+
(progn
38+
(when has-fill-pointer (check-fill-pointer dimensions))
39+
(let ((array
40+
(let* ((element-type (funcall restore-object)))
41+
(make-array dimensions :element-type element-type :adjustable adjustable
42+
:fill-pointer fill-pointer))))
43+
;; We need to make our array first in case any of the array elements refer to it!
44+
;; If we are ever referred to, then there will already be a fixup in place for
45+
;; our array handled by `restore-new-reference-indicator'.
46+
(loop for idx fixnum from 0 below array-total-size
47+
do (restore-object-to (row-major-aref array idx) restore-object))
48+
array))))))
2949

3050
(defun store-array (array storage eq-refs store-object assign-new-reference-id)
3151
(declare (optimize speed safety) (type array array) (type function store-object))

src/cl-binary-store-user.lisp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,11 @@
110110
#:serializable-object-info
111111
#:specialized-object-constructor
112112
#:specialized-serializer/deserializer
113+
114+
;; More conditions
115+
#:invalid-input-data
116+
#:too-much-data
117+
#:maybe-expected-error
113118
))
114119

115120
(in-package #:cl-binary-store-user)

src/cl-binary-store.lisp

Lines changed: 23 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -200,7 +200,29 @@
200200
#:*allow-codespace-switching*
201201
#:*max-to-write*
202202
#:*max-to-read*
203-
#:*output-magic-number*))
203+
#:*output-magic-number*
204+
205+
;; Conditions
206+
#:invalid-input-data
207+
#:too-much-data
208+
#:maybe-expected-error))
204209

205210

206211
(in-package :cl-binary-store)
212+
213+
(define-condition invalid-input-data (simple-error)
214+
())
215+
216+
(defun unexpected-data (expected &optional (data nil data-provided-p))
217+
(error 'invalid-input-data
218+
:format-control "Expected ~A~A"
219+
:format-arguments (list expected
220+
(if data-provided-p
221+
;; be careful not to provide anything
222+
;; that cannot be printed trivially here!
223+
(format nil ", found ~A" data)
224+
""))))
225+
226+
(define-condition maybe-expected-error (invalid-input-data)
227+
()
228+
(:documentation "Things like MISSING-PACKAGE-DURING-RESTORE, MISSING-SLOT"))

src/codespaces.lisp

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -530,8 +530,8 @@
530530
(t (case ,code-to-dispatch-on
531531
,@numeric-dispatch-codes
532532
(otherwise
533-
(error 'simple-error :format-control "Unknown code ~A found in stream"
534-
:format-arguments (list ,code-to-dispatch-on)))))))))
533+
(error 'invalid-input-data :format-control "Unknown code ~A found in stream"
534+
:format-arguments (list ,code-to-dispatch-on)))))))))
535535

536536
(defun store-objects (storage &rest stuff)
537537
"Store all the objects in stuff to storage. Do not call this directly without let'ing
@@ -549,6 +549,7 @@
549549
*read-version*."
550550
(declare (type read-storage storage))
551551
(let ((codespace *current-codespace*))
552-
(assert codespace nil
553-
"Unknown codespace to restore objects with... is *read-version* not correct?")
552+
(unless codespace
553+
(error 'invalid-input-data :format-control
554+
"Unknown codespace to restore objects with... is *read-version* not correct?"))
554555
(funcall (codespace-restore-objects codespace) storage)))

src/cons.lisp

Lines changed: 17 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -142,16 +142,20 @@
142142
(defun restore-list/known-length (storage restore-object)
143143
(declare (optimize (speed 3) (safety 0)))
144144
(let* ((length (restore-tagged-unsigned-fixnum/interior storage)))
145-
(check-if-too-much-data (read-storage-max-to-read storage)
146-
(truly-the fixnum
147-
(+ (read-storage-total-read storage)
148-
(truly-the fixnum (* 16 length)))))
149-
(let* ((head (make-list length))
150-
(cons head))
151-
(dotimes (count (1- length))
152-
(restore-object-to (car cons) restore-object)
153-
(setf cons (cdr cons)))
154-
;; Support dotted end of list
155-
(restore-object-to (car cons) restore-object)
156-
(restore-object-to (cdr cons) restore-object)
157-
head)))
145+
(unless (and (<= 0 length (ash most-positive-fixnum -4))
146+
(<=
147+
(ash length 4)
148+
(truly-the fixnum
149+
(- (read-storage-max-to-read storage) (read-storage-total-read storage)))))
150+
(error 'too-much-data :max-bytes (read-storage-max-to-read storage)
151+
:bytes (+ (ash length 4) (read-storage-total-read storage))))
152+
(when (> length 0)
153+
(let* ((head (make-list length))
154+
(cons head))
155+
(dotimes (count (1- length))
156+
(restore-object-to (car cons) restore-object)
157+
(setf cons (cdr cons)))
158+
;; Support dotted end of list
159+
(restore-object-to (car cons) restore-object)
160+
(restore-object-to (cdr cons) restore-object)
161+
head))))

src/hash-table.lisp

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,10 @@
3131
#-sbcl (declare (ignore synchronized weakness))
3232
;; weakness works as far as I can discern
3333
;; because of how we do reference restoration
34+
(unless (typep rehash-size '(or (integer 1 *) (float (1.0) *)))
35+
(unexpected-data "rehash-size is not correct"))
36+
(unless (< size (ash most-positive-fixnum -4))
37+
(unexpected-data "hash table too large"))
3438
(check-if-too-much-data (read-storage-max-to-read storage)
3539
(* 16 size)) ;; an estimate
3640
(make-hash-table :test test :size size

0 commit comments

Comments
 (0)