Skip to content

Commit 25267ff

Browse files
committed
Make more robust against corrupt and malicious data and add fuzzing test
1 parent f5eb747 commit 25267ff

22 files changed

+363
-206
lines changed

README.md

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@ cl-binary-store works on 64-bit SBCL, ECL, CCL, ABCL, Allegro Common Lisp, and L
1818
- Should work out of the box without any effort with an easy programmer / user interface (no need to write code for each class/struct you use!)
1919
- Stable API and no breaking changes (this is a standard Common Lisp goal)
2020
- Ability to limit amount of data written or read (safety rails)
21+
- Somewhat safe from malicious input (some amount of fuzz testing and code reading done, but if you want to rely on safety, please contribute!)
2122

2223
## General features
2324

@@ -101,6 +102,8 @@ The package :cl-binary-store-user exports all the user facing interfaces above.
101102

102103
If you keep files around long enough, eventually you find you have stored stuff you don't remember. It's nice if you don't get horrible errors while loading the files. cl-binary-store provides a good set of restarts for missing packages (create-package, rehome symbol) and for missing objects or structures (create them, use a different class) or for changes in slots (discard, change slot name). The deserialization is extensible enough that you can put in line upgrading of objects.
103104

105+
The two conditions signalled here are of type MAYBE-EXPECTED-ERROR and INVALID-INPUT-DATA and are MISSING-SLOT and OBJECT-TYPE-NOT-FOUND
106+
104107
## Extending object serialization
105108

106109
For serializing objects, the default behavior is probably good enough for 95% of users. There are four further methods of extension provided at with increasing degrees of complexity and control.
@@ -199,6 +202,10 @@ This can be used to override the restoration with a user provided codespace in c
199202

200203
Specify what codespace to use during writing. Use \*output-magic-number\* so the file records what was used during writing.
201204

205+
### Conditions and malicious input
206+
207+
cl-binary-store attempts to handle both malicious input and corrupted input reasonably. There is by default a \*max-to-read\* of 2GB which will prevent the equivalent of zip bombs, and I have done some fuzz testing so that in general one expects to see an INVALID-INPUT-DATA error signalled if there is bad input data as opposed to crashing. There are two types of errors one might expect, MISSING-SLOT and OBJECT-TYPE-NOT-FOUND which inherit from MAYBE-EXPECTED-ERROR which is of type INVALID-INPUT-DATA. This allows you to either catch all INVALID-INPUT-DATA (if you just want things to work) or all INVALID-INPUT-DATA except MAYBE-EXPECTED-ERRORs (if you want some interactive recovery). If you actually have corrupted input and wish to recover it, I suggest adding :debug-cbs to \*features\*, recompiling, and pulling the partial data out of the debugger where some of the data will be available on the stack. It is too complicated to support corrupted data recovery and maintain high performance.
208+
202209
### Extending the codespace
203210

204211
A codespace is a definition of the binary file format, they are identified with a magic / version number. At write time the codespace is identified by \*write-version\*. The codespace can optionally be written out to the output (\*write-magic-number\*). Currently we have baked in a notion of tag bytes between objects that identify the type of the next object --- you could presumably switch to whatever tagging scheme you want with a bit of work on the code generation side. We automatically build the storage time typecase dispatch, provide the basics of reference tracking, and some other niceties, and as well a dispatch case statement during restore. This code is specialized for each codespace and built at compile / load time. This can lead to some complexities debugging as the source code is not accessible. To alleviate this one may define-codespace with :debug t, in which case the store and restore functions that are built are dumped to a file "codespace-debug.lisp" and loaded so the usual nice Common Lisp debugging experience can occur. Usually you want to inline many of your functions for performance reasons (especially if you have regular data, the inlining, at least on sbcl, allows very nice performance as, for example, the first restore-object call from inside a wrapper function can be inlined --- so the list restore, for example, is not bouncing back and forth between functions).
@@ -279,7 +286,7 @@ I suggest just piping the output through gzip if you need the smallest possible
279286

280287
## Debugging
281288

282-
We generate the codespace code through a maze of macros and functions in [codespaces.lisp](src/codespaces.lisp), so if something isn't doing what you want, it is easiest to inspect cl-binary-store::\*codespaces\* and look at the codespace objects that are built and then look at the slots RESTORE-OBJECTS-SOURCE-CODE and STORE-OBJECTS-SOURCE-CODE (which are what are used to build the restore-objects and store-objects functions in the codespace). These can be compiled at the repl or put into a file and compiled so that you can get full debugging of store-objects / restore-objects.
289+
We generate the codespace code through a maze of macros and functions in [codespaces.lisp](src/codespaces.lisp), so if something isn't doing what you want, it is easiest to inspect cl-binary-store::\*codespaces\* and look at the codespace objects that are built and then look at the slots RESTORE-OBJECTS-SOURCE-CODE and STORE-OBJECTS-SOURCE-CODE (which are what are used to build the restore-objects and store-objects functions in the codespace). These can be compiled at the repl or put into a file and compiled so that you can get full debugging of store-objects / restore-objects. To improve the debugging experience you can specify :debug t in [basic-codespace.lisp](src/basic-codespace.lisp) which will emit the code to a file for you so you get the full debugging experience. Pushing :debug-cbs to \*features\* will also help.
283290

284291
## Basic codespace and user codes
285292

cl-binary-store.asd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@
6464
:description "Unit tests for CL-BINARY-STORE"
6565
:author "Andrew J. Berkley <[email protected]>"
6666
:license :BSD-3
67-
:depends-on (#:parachute)
67+
:depends-on (#:parachute #:cl-binary-store)
6868
:pathname "test/"
6969
:components ((:file "cl-binary-store-tests"))
7070
:perform (test-op (o c) (uiop:symbol-call :parachute :test :cl-binary-store-tests)))

src/actions.lisp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,9 @@
3030
value which is :ignore, :end, or nil if the object is to be
3131
collected for the user. The second value only works if the
3232
object is a top level object (that is one of the objects in
33-
the call to store (store nil obj-a obj-b (make-instance 'end-action))"))
33+
the call to store (store nil obj-a obj-b (make-instance 'end-action))")
34+
(:method ((command t) (storage t) (references t) (restore-object t))
35+
(unexpected-data "Expected an action command" command)))
3436

3537
(defgeneric store-action (action storage store-object)
3638
(:documentation "Call during the serialization phase. You can

src/array.lisp

Lines changed: 39 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,50 @@
11
(in-package :cl-binary-store)
22

33
(defun restore-array (storage restore-object)
4-
;; This is somewhat complex because we cannot build the array
5-
;; if it is displaced to another array until we restore what the array
6-
;; is displaced to. So we need to use a fix-up scenario.
74
(declare (type function restore-object))
85
(let* ((has-fill-pointer (funcall restore-object))
9-
(fill-pointer (when has-fill-pointer (funcall restore-object)))
6+
(fill-pointer (when has-fill-pointer (restore-tagged-unsigned-fixnum storage)))
107
(adjustable (funcall restore-object))
118
(array-rank (the (unsigned-byte 8) (restore-ub8 storage)))
12-
;; restore tagged integers
13-
(dimensions (loop repeat array-rank collect (funcall restore-object)))
9+
(dimensions (loop repeat array-rank
10+
collect (restore-tagged-unsigned-fixnum storage)))
1411
(displaced (funcall restore-object)))
15-
(if displaced
16-
(let ((element-type (funcall restore-object))
17-
(offset (funcall restore-object))
18-
(displaced-to (funcall restore-object)))
19-
(make-array dimensions :element-type element-type :adjustable adjustable
20-
:fill-pointer fill-pointer :displaced-to displaced-to
21-
:displaced-index-offset offset))
22-
(let ((array
23-
(let* ((element-type (funcall restore-object)))
24-
(make-array dimensions :element-type element-type :adjustable adjustable
25-
:fill-pointer fill-pointer))))
26-
(loop for idx fixnum from 0 below (array-total-size array)
27-
do (restore-object-to (row-major-aref array idx) restore-object))
28-
array))))
12+
(check-if-too-much-data (read-storage-max-to-read storage) (reduce #'* dimensions))
13+
(labels ((check-fill-pointer (dimensions)
14+
(when has-fill-pointer
15+
(unless (= array-rank 1)
16+
(unexpected-data "found fill-pointer for a non-vector"))
17+
(unless (<= fill-pointer (length (first dimensions)))
18+
(unexpected-data (format nil "fill-pointer ~A > dimensions ~A"
19+
fill-pointer dimensions))))))
20+
(if displaced
21+
(let ((element-type (funcall restore-object))
22+
(offset (restore-tagged-unsigned-fixnum storage))
23+
(displaced-to (funcall restore-object)))
24+
(unless (typep displaced-to 'array)
25+
(unexpected-data "array" displaced-to))
26+
(unless (typep (array-element-type displaced-to) element-type)
27+
(unexpected-data (format nil "array with element-type ~A" element-type)
28+
displaced-to))
29+
(unless (< offset (array-total-size displaced-to))
30+
(unexpected-data (format nil "array of total size > ~A" offset)
31+
displaced-to))
32+
(when has-fill-pointer (check-fill-pointer dimensions))
33+
(make-array dimensions :element-type element-type :adjustable adjustable
34+
:fill-pointer fill-pointer :displaced-to displaced-to
35+
:displaced-index-offset offset))
36+
(progn
37+
(when has-fill-pointer (check-fill-pointer dimensions))
38+
(let ((array
39+
(let* ((element-type (funcall restore-object)))
40+
(make-array dimensions :element-type element-type :adjustable adjustable
41+
:fill-pointer fill-pointer))))
42+
;; We need to make our array first in case any of the array elements refer to it!
43+
;; If we are ever referred to, then there will already be a fixup in place for
44+
;; our array handled by `restore-new-reference-indicator'.
45+
(loop for idx fixnum from 0 below (array-total-size array)
46+
do (restore-object-to (row-major-aref array idx) restore-object))
47+
array))))))
2948

3049
(defun store-array (array storage eq-refs store-object assign-new-reference-id)
3150
(declare (optimize speed safety) (type array array) (type function store-object))

src/cl-binary-store-user.lisp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,11 @@
110110
#:serializable-object-info
111111
#:specialized-object-constructor
112112
#:specialized-serializer/deserializer
113+
114+
;; More conditions
115+
#:invalid-input-data
116+
#:too-much-data
117+
#:maybe-expected-error
113118
))
114119

115120
(in-package #:cl-binary-store-user)

src/cl-binary-store.lisp

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -200,7 +200,22 @@
200200
#:*allow-codespace-switching*
201201
#:*max-to-write*
202202
#:*max-to-read*
203-
#:*output-magic-number*))
203+
#:*output-magic-number*
204+
205+
;; Conditions
206+
#:invalid-input-data
207+
#:too-much-data
208+
#:maybe-expected-error))
204209

205210

206211
(in-package :cl-binary-store)
212+
213+
(define-condition invalid-input-data (simple-error)
214+
())
215+
216+
(defun unexpected-data (expected &optional (data nil data-provided-p))
217+
(error 'invalid-input-data :format-control "Expected ~A~A" :format-arguments (list expected (if data-provided-p (format nil ", found ~A" data) ""))))
218+
219+
(define-condition maybe-expected-error (invalid-input-data)
220+
()
221+
(:documentation "Things like MISSING-PACKAGE-DURING-RESTORE, MISSING-SLOT"))

src/codespaces.lisp

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -530,8 +530,8 @@
530530
(t (case ,code-to-dispatch-on
531531
,@numeric-dispatch-codes
532532
(otherwise
533-
(error 'simple-error :format-control "Unknown code ~A found in stream"
534-
:format-arguments (list ,code-to-dispatch-on)))))))))
533+
(error 'invalid-input-data :format-control "Unknown code ~A found in stream"
534+
:format-arguments (list ,code-to-dispatch-on)))))))))
535535

536536
(defun store-objects (storage &rest stuff)
537537
"Store all the objects in stuff to storage. Do not call this directly without let'ing
@@ -549,6 +549,7 @@
549549
*read-version*."
550550
(declare (type read-storage storage))
551551
(let ((codespace *current-codespace*))
552-
(assert codespace nil
553-
"Unknown codespace to restore objects with... is *read-version* not correct?")
552+
(unless codespace
553+
(error 'invalid-input-data :format-control
554+
"Unknown codespace to restore objects with... is *read-version* not correct?"))
554555
(funcall (codespace-restore-objects codespace) storage)))

src/cons.lisp

Lines changed: 17 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -142,16 +142,20 @@
142142
(defun restore-list/known-length (storage restore-object)
143143
(declare (optimize (speed 3) (safety 0)))
144144
(let* ((length (restore-tagged-unsigned-fixnum/interior storage)))
145-
(check-if-too-much-data (read-storage-max-to-read storage)
146-
(truly-the fixnum
147-
(+ (read-storage-total-read storage)
148-
(truly-the fixnum (* 16 length)))))
149-
(let* ((head (make-list length))
150-
(cons head))
151-
(dotimes (count (1- length))
152-
(restore-object-to (car cons) restore-object)
153-
(setf cons (cdr cons)))
154-
;; Support dotted end of list
155-
(restore-object-to (car cons) restore-object)
156-
(restore-object-to (cdr cons) restore-object)
157-
head)))
145+
(unless (and (<= 0 length (ash most-positive-fixnum -4))
146+
(<=
147+
(ash length 4)
148+
(truly-the fixnum
149+
(- (read-storage-max-to-read storage) (read-storage-total-read storage)))))
150+
(error 'too-much-data :max-bytes (read-storage-max-to-read storage)
151+
:bytes (+ (ash length 4) (read-storage-total-read storage))))
152+
(when (> length 0)
153+
(let* ((head (make-list length))
154+
(cons head))
155+
(dotimes (count (1- length))
156+
(restore-object-to (car cons) restore-object)
157+
(setf cons (cdr cons)))
158+
;; Support dotted end of list
159+
(restore-object-to (car cons) restore-object)
160+
(restore-object-to (cdr cons) restore-object)
161+
head))))

src/magic-numbers.lisp

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -23,9 +23,11 @@
2323
(let ((magic-number (funcall restore-object)))
2424
(let ((codespace (gethash magic-number *codespaces*)))
2525
(unless codespace
26-
(error "Unsupported codespace version #x~X, we have ~{~x~X~^ ~}~%"
27-
magic-number (loop for key being the hash-keys of *codespaces*
28-
collect key)))
26+
(error 'invalid-input-data
27+
:format-control "Unsupported codespace version #x~X, we have ~{~x~X~^ ~}~%"
28+
:format-arguments (list
29+
magic-number (loop for key being the hash-keys of *codespaces*
30+
collect key))))
2931
(cond
3032
((not (eq *current-codespace* codespace))
3133
(cond
@@ -38,9 +40,11 @@
3840
(setf *version-being-read* magic-number)
3941
(restore-objects storage))
4042
(t
41-
(error "Switching codespace away from #x~X (~A) is DISALLOWED"
42-
(codespace-magic-number *current-codespace*)
43-
(codespace-name *current-codespace*)))))
43+
(error 'invalid-input-data
44+
:format-control "Switching codespace away from #x~X (~A) is DISALLOWED"
45+
:format-arguments (list
46+
(codespace-magic-number *current-codespace*)
47+
(codespace-name *current-codespace*))))))
4448
(t
4549
(setf *version-being-read* magic-number)
4650
(format t "Deserializing from version #x~X (~A)~%"

src/numbers.lisp

Lines changed: 39 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,8 @@
103103
(ensure-enough-data storage 8)
104104
(let* ((offset (read-storage-offset storage))
105105
(fixnum (signed-sap-ref-64 (read-storage-sap storage) offset)))
106+
(unless (typep fixnum 'fixnum)
107+
(unexpected-data "fixnum" fixnum))
106108
(setf (read-storage-offset storage) (truly-the fixnum (+ offset 8)))
107109
(truly-the fixnum fixnum)))
108110

@@ -220,10 +222,20 @@
220222
(incf offset))
221223
(set-sap-ref-double sap offset double-float)))))
222224

225+
(declaim (inline ensure-integer))
226+
(defun ensure-integer (x)
227+
(unless (integerp x)
228+
(unexpected-data "integer" x))
229+
x)
230+
223231
(defun restore-ratio (restore-object)
224232
(declare (optimize (speed 3) (safety 1)) (type function restore-object))
225-
(/ (the integer (funcall restore-object))
226-
(the integer (funcall restore-object))))
233+
(let ((a (ensure-integer (funcall restore-object)))
234+
(b (ensure-integer (funcall restore-object))))
235+
(declare (type integer a b))
236+
(unless (> b 0)
237+
(unexpected-data "integer > 0" b))
238+
(/ (the integer a) (the integer b))))
227239

228240
(defun store-ratio (ratio storage num-eq-refs assign-new-reference-id)
229241
"Nominally we don't need to do references here, but if someone has two bignums and takes
@@ -239,10 +251,16 @@
239251
(store-integer (numerator ratio))
240252
(store-integer (denominator ratio)))))
241253

254+
(declaim (inline ensure-real))
255+
(defun ensure-real (x)
256+
(unless (typep x 'real)
257+
(unexpected-data "real" x))
258+
x)
259+
242260
(defun restore-complex (restore-object)
243261
(declare (type function restore-object))
244-
(complex (funcall restore-object)
245-
(funcall restore-object)))
262+
(complex (ensure-real (funcall restore-object))
263+
(ensure-real (funcall restore-object))))
246264

247265
(declaim (inline restore-complex-double-float))
248266
(defun restore-complex-double-float (storage)
@@ -296,12 +314,17 @@
296314
(if (>= tag +first-direct-unsigned-integer-interior-code+)
297315
(- tag +first-direct-unsigned-integer-interior-code+)
298316
(truly-the fixnum
299-
(+ (ecase tag
300-
(#.+ub8-code+ (restore-ub8 storage))
301-
(#.+ub16-code+ (restore-ub16 storage))
302-
(#.+ub32-code+ (restore-ub32 storage))
303-
(#.+fixnum-code+ (restore-fixnum storage)))
304-
+interior-coded-max-integer+ 1)))))
317+
(+ (case tag
318+
(#.+ub8-code+ (restore-ub8 storage))
319+
(#.+ub16-code+ (restore-ub16 storage))
320+
(#.+ub32-code+ (restore-ub32 storage))
321+
(#.+fixnum-code+
322+
(let ((fixnum (restore-fixnum storage)))
323+
(unless (>= fixnum 0)
324+
(unexpected-data "unsigned fixnum" fixnum))
325+
(truly-the fixnum fixnum)))
326+
(otherwise (unexpected-data "tag for unsigned fixnum" tag)))
327+
+interior-coded-max-integer+ 1)))))
305328

306329
(declaim (ftype (function (read-storage)
307330
#+sbcl (values fixnum &optional)
@@ -313,11 +336,12 @@
313336
(let ((tag (restore-ub8 storage)))
314337
(if (<= +small-integer-zero-code+ tag +last-small-integer-code+)
315338
(- tag +small-integer-zero-code+)
316-
(ecase tag
339+
(case tag
317340
(#.+ub8-code+ (restore-ub8 storage))
318341
(#.+ub16-code+ (restore-ub16 storage))
319342
(#.+ub32-code+ (restore-ub32 storage))
320-
(#.+fixnum-code+ (restore-fixnum storage))))))
343+
(#.+fixnum-code+ (restore-fixnum storage))
344+
(otherwise (unexpected-data "tag for unsigned fixnum" tag))))))
321345

322346
(declaim (ftype (function (read-storage)
323347
#+sbcl (values fixnum &optional)
@@ -327,14 +351,15 @@
327351
(let ((tag (restore-ub8 storage)))
328352
(if (<= +first-small-integer-code+ tag +last-small-integer-code+)
329353
(- tag +small-integer-zero-code+)
330-
(ecase tag
354+
(case tag
331355
(#.+ub8-code+ (restore-ub8 storage))
332356
(#.+ub16-code+ (restore-ub16 storage))
333357
(#.+ub32-code+ (restore-ub32 storage))
334358
(#.+fixnum-code+ (restore-fixnum storage))
335359
(#.+sb8-code+ (restore-sb8 storage))
336360
(#.+sb16-code+ (restore-sb16 storage))
337-
(#.+sb32-code+ (restore-sb32 storage))))))
361+
(#.+sb32-code+ (restore-sb32 storage))
362+
(otherwise (unexpected-data "tag for fixnum" tag))))))
338363

339364
(declaim (inline store-tagged-unsigned-integer))
340365

0 commit comments

Comments
 (0)