Commit b0eb25e

committed
add final post from series about improvements to jsonschema.net
1 parent 096e5c4 commit b0eb25e

File tree

5 files changed

+306
-0
lines changed

.jekyll-metadata

2.79 KB
Binary file not shown.

_plugins/video_tag.rb

+61
@@ -0,0 +1,61 @@
# Title: Simple Video tag for Jekyll
# Author: Brandon Mathis http://brandonmathis.com
# Description: Easily output MPEG4 HTML5 video with a flash backup.
#
# Syntax {% video url/to/video [width height] [url/to/poster] %}
#
# Example:
# {% video http://site.com/video.mp4 720 480 http://site.com/poster-frame.jpg %}
#
# Output:
# <video width='720' height='480' preload='none' controls poster='http://site.com/poster-frame.jpg'>
# <source src='http://site.com/video.mp4' type='video/mp4; codecs=\"avc1.42E01E, mp4a.40.2\"'/>
# </video>
#

module Jekyll

  class VideoTag < Liquid::Tag
    @video = nil
    @poster = ''
    @height = ''
    @width = ''

    def initialize(tag_name, markup, tokens)
      @videos = markup.scan(/((https?:\/\/|\/)\S+\.(webm|ogv|mp4)\S*)/i).map(&:first).compact
      @poster = markup.scan(/((https?:\/\/|\/)\S+\.(png|gif|jpe?g)\S*)/i).map(&:first).compact.first
      @sizes = markup.scan(/\s(\d\S+)/i).map(&:first).compact
      super
    end

    def render(context)
      output = super
      types = {
        '.mp4' => "type='video/mp4; codecs=\"avc1.42E01E, mp4a.40.2\"'",
        '.ogv' => "type='video/ogg; codecs=theora, vorbis'",
        '.webm' => "type='video/webm; codecs=vp8, vorbis'"
      }
      if @videos.size > 0
        video = "<video #{sizes} preload='metadata' controls #{poster}>"
        @videos.each do |v|
          video << "<source src='#{v}' #{types[File.extname(v)]}>"
        end
        video += "</video>"
      else
        "Error processing input, expected syntax: {% video url/to/video [url/to/video] [url/to/video] [width height] [url/to/poster] %}"
      end
    end

    def poster
      "poster='#{@poster}'" if @poster
    end

    def sizes
      attrs = "width='#{@sizes[0]}'" if @sizes[0]
      attrs += " height='#{@sizes[1]}'" if @sizes[1]
      attrs
    end
  end
end

Liquid::Template.register_tag('video', Jekyll::VideoTag)
@@ -0,0 +1,229 @@
1+
---
2+
title: "Improving JsonSchema.Net (Part 2)"
3+
date: 2024-05-17 09:00:00 +1200
4+
tags: [json-schema, architecture, performance, learning]
5+
toc: true
6+
pin: false
7+
---
8+
9+
Over the last few posts, I've gone over some recent changes to my libraries that work toward better performance by way of reducing memory allocations.
10+
11+
In this post, I'd like to review some changes I made internally to _JsonSchema.Net_ that helped the code make more sense while also providing some of the performance increase.
12+
13+
## The sad state of things
14+
15+
In version 6 and prior, analysis of schemas was performed and stored in code that was strewn about in many different places.
16+
17+
- `JsonSchema` would assess and store a lot of its own data, like base URI, dialect, and anchors.
18+
- There were extension methods for various lookups that I had to do a lot, and the static class that defined the methods had private static dictionaries to cache the data.
19+
- Keyword `Type` and instance to keyword name (e.g. `TitleKeyword` -> "title")
20+
- Whether a keyword supported a given JSON Schema version (e.g. `prefixItems` is only 2020-12)
21+
- Keyword priority calculation and lookup (e.g. `properties` needs to run before `additionalProperties`)
22+
- Whether a keyword produced annotations that another keyword needed (e.g. `unevaluatedProperties` depends on annotations from `properties`, even nested ones)
23+
- The code to determine which keywords to evaluate was in `EvaluationOptions`.
24+
- But the code to determine which keywords were supported by the schema's declared meta-schema was in `EvaluationContext`.
25+
26+
Yeah, a lot of code in places it didn't need to be. Moreover, a lot of this was performed at evaluation time.
27+
28+
It was time to fix this.
29+
30+
## A better way
31+
32+
About a month ago, I ran through an experiment to see if I could make a JSON Schema library (from scratch) that didn't have an object model. This came out of [reworking my JSON Logic library](/posts/logic-without-models) to do the same.
33+
34+
> The results of this experiment can be found in the [`schema/experiment-modelless-schema`](https://github.com/gregsdennis/json-everything/tree/schema/experiment-modelless-schema) branch, if you want to have a look. There's a new static `JsonSchema.Evaluate()` method that calls each keyword via a new `IKeywordHandler` interface. While the single run performance is great, it can't compete at scale with the [static analysis](/posts/new-json-schema-net) that was introduced a few versions ago.
35+
{: .prompt-info }
36+
37+
In building the experiment, I had to rebuild things like the schema and keyword registries, and I discovered that I could do a lot of the analysis that yielded the above information at registration time. This meant that I wasn't trying to get this data during evaluation, which is what led to the stark increase in performance for single evaluations.
38+
39+
I had decided not to pursue the experiment further, but I had learned a lot by doing it, so it wasn't a waste.
40+
41+
> Sometimes rebuilding something from scratch can give you better results, even if it just teaches you things.
42+
{: .prompt-tip }
43+
44+
So let's get refactoring!
45+
46+
<div class="video-container">
47+
{% video /assets/video/matrix-we-got-a-lot-to-do.mp4 798 %}
48+
<p class="video-caption">We got a lot to do. We gotta get to it. - <strong>The Matrix, 1999</strong></p>
49+
</div>
50+
51+
## Managing keyword data
52+
53+
I started with the keyword registry. I wanted to get rid of all of those extensions and just precalculate everything as keywords were registered.
54+
55+
At that point, `SchemaKeywordRegistry` contained three different dictionaries:
56+
57+
- keyword name → keyword type
58+
- keyword type → instance (for keywords that need to support null values, like `const`; this resolves some serializer problems)
59+
- keyword type → keyword `TypeInfoResolver` (supports Native AOT)
60+
61+
In the keyword extensions, I then had more dictionaries:
62+
63+
- keyword type → keyword name (reverse of what's in the registry)
64+
- keyword type → evaluation group (supporting priority and keyword evaluation order)
65+
- keyword type → specification versions
66+
67+
That's a lot of dictionaries! And I needed them all to be concurrent!
68+
69+
### Consolidation
70+
71+
First, I need to consolidate all of this into a "keyword meta-data" type. This is what I came up with:
72+
73+
```c#
class KeywordMetaData
{
    public string Name { get; }
    public Type Type { get; }
    public long Priority { get; set; }
    public bool ProducesDependentAnnotations { get; set; }
    public IJsonSchemaKeyword? NullValue { get; set; }
    public SpecVersion SupportedVersions { get; set; }
    public JsonSerializerContext? SerializerContext { get; }

    // constructor contains most of the keyword inspection as well.
}
```
87+
88+
This single type stores all of the information for a single keyword that was stored in the various dictionaries listed above.
89+
90+
### Access
91+
92+
Second, I need a way to store these so that I can access them in multiple ways. What I'd really like is a concurrent dictionary that allows access to items using multiple keys. There are probably (definitely) a number of ways to do this.
93+
94+
My [approach](https://github.com/gregsdennis/json-everything/blob/master/src/JsonSchema/MultiLookupConcurrentDictionary.cs) was to wrap a `ConcurrentDictionary<object, KeywordMetaData>` and keep a collection of "key functions" that would produce a number of key objects for an item. When I add an item, it produces all of the keys and creates an entry for each, using the item as the value. That way, I can look up the item using any of the keys.
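
The wrapping idea described above can be sketched roughly as follows. This is a minimal illustration, not the linked `MultiLookupConcurrentDictionary` itself: the names `MultiLookup`, `AddLookup`, `Get`, and the `KeywordInfo` record are hypothetical stand-ins chosen for the example.

```c#
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the keyword meta-data type.
record KeywordInfo(string Name, Type Type);

// Hypothetical sketch: one concurrent dictionary, many keys per item.
class MultiLookup<TValue> where TValue : class
{
    private readonly ConcurrentDictionary<object, TValue> _items = new();
    private readonly List<Func<TValue, object>> _keyFunctions = new();

    // Register a function that derives one lookup key from an item.
    // All key functions should be registered before items are added.
    public void AddLookup(Func<TValue, object> keyFunction) => _keyFunctions.Add(keyFunction);

    // Store the same item once under every derived key.
    public void Add(TValue item)
    {
        foreach (var keyFunction in _keyFunctions)
            _items[keyFunction(item)] = item;
    }

    // Returns the item stored under the key, or null if there is none.
    public TValue? Get(object key) => _items.TryGetValue(key, out var value) ? value : null;
}
```

With key functions such as `x => x.Name` and `x => x.Type` registered, a single `Add` call makes the same item reachable by either its name or its type. One trade-off of sharing a single `object`-keyed dictionary is that keys from different key functions must never compare equal to each other, which holds here because a `string` never equals a `Type`.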
95+
96+
### Data initialization
97+
98+
With these pieces in place, I can simply take all of the keyword types, build meta-data objects, and add those to the lookup.
99+
100+
Finally, once the lookup has all of the keywords, I run some dependency analysis logic to calculate the priorities, and it's done.
101+
102+
When a client adds a new keyword, I simply add it to the lookup and run the dependency analysis again.
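
The post doesn't show the dependency analysis itself, so the following is only one plausible shape for it, under the assumption that each keyword can name the keywords that must run before it. `PrioritySketch`, `Calculate`, and the name-to-dependencies dictionary are all hypothetical. A keyword with no dependencies gets priority 0, and every other keyword gets one more than its highest-priority dependency.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

static class PrioritySketch
{
    // dependencies: keyword name -> names of keywords that must run first (assumed shape).
    public static Dictionary<string, long> Calculate(IReadOnlyDictionary<string, string[]> dependencies)
    {
        var priorities = new Dictionary<string, long>();

        long Resolve(string name, HashSet<string> visiting)
        {
            // Reuse a priority that has already been calculated.
            if (priorities.TryGetValue(name, out var known)) return known;

            // Seeing the same name twice on the current path means a cycle.
            if (!visiting.Add(name))
                throw new InvalidOperationException($"Circular dependency involving '{name}'");

            var deps = dependencies.TryGetValue(name, out var d) ? d : Array.Empty<string>();
            var priority = deps.Length == 0 ? 0 : deps.Max(x => Resolve(x, visiting)) + 1;

            visiting.Remove(name);
            priorities[name] = priority;
            return priority;
        }

        foreach (var name in dependencies.Keys)
            Resolve(name, new HashSet<string>());

        return priorities;
    }
}
```

Because the whole calculation is cheap and only depends on the registered set, rerunning it from scratch whenever a keyword is added keeps the logic simple at the cost of some repeated work.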
103+
104+
### Deletion
105+
106+
The final step for this part of the refactor was to move the extension methods into the `SchemaKeywordRegistry` class (which was already static anyway) and delete the `KeywordExtensions` class.
107+
108+
## Managing schema data
109+
110+
The other significant update I wanted to make was how schema data was handled. Like keywords, the data should be gathered at registration time rather than at evaluation time.
111+
112+
So what kind of data do I need (or can I get) from schemas?
113+
114+
- What is the root document for any given URI?
115+
- Are there any anchors defined in the document?
116+
- Are any of those anchors dynamic (defined by `$dynamicAnchor`)?
117+
- Are any of those anchors legacy (defined by `$id` instead of `$anchor`)?
118+
- Is there a `$recursiveAnchor`?
119+
- What version of the specification should it use?
120+
- What dialect does the schema use (which keywords does its meta-schema declare)?
121+
122+
I currently have several chunks of code in various places that calculate and store this. Like the keyword data, this could be consolidated.
123+
124+
### Consolidation
125+
126+
In previous versions, `JsonSchema` contained a method called `PopulateBaseUris()` that would run on the first evaluation. This method would recursively scan the entire tree and set all of the base URIs for all of the subschemas and register any anchors. The anchor registry was on `JsonSchema` itself.
127+
128+
Later, when resolving a reference that had an anchor on it, the `RefKeyword` (or `DynamicRefKeyword` or whatever needed to resolve the reference) would ask the schema registry for the schema using the base URI, and then it would check that schema directly to see if it had the required anchor.
129+
130+
A better way would be to just let the registry figure it all out. To do that, we need a registration type to hold all of the schema identifier meta-data.
131+
132+
```c#
class Registration
{
    public required IBaseDocument Root { get; init; }
    public Dictionary<string, JsonSchema>? Anchors { get; set; }
    public Dictionary<string, JsonSchema>? LegacyAnchors { get; set; }
    public Dictionary<string, JsonSchema>? DynamicAnchors { get; set; }
    public JsonSchema? RecursiveAnchor { get; set; }
}
```
142+
143+
### Access
144+
145+
The next step was to expose all of this glorious data to consumers of the registry.
146+
147+
I already had a `.Get(Uri)` method, but for this, I'd need something a bit more robust. So I created these:
148+
149+
- `.Get(Uri baseUri, string? anchor, bool allowLegacy = false)`
150+
- `.Get(DynamicScope scope, Uri baseUri, string anchor, bool requireLocalAnchor)`
151+
- `.GetRecursive(DynamicScope scope)`
152+
153+
> These are all internal, but the `.Get(Uri)` still exists publicly.
154+
{: .prompt-info }
155+
156+
These methods let me query for schemas identified by URIs, URIs with anchors, and recursive and dynamic anchors, all with varied support based on which specification version I'm using.
157+
158+
- Draft 6/7 defines anchors in `$id`, but that usage has been disallowed since 2019-09, which added `$anchor`.
159+
- Draft 2019-09 defines `$recursiveAnchor`, but that was replaced by `$dynamicAnchor` in 2020-12.
160+
- In draft 2020-12, `$dynamicRef` has a requirement that a `$dynamicAnchor` must exist within the same schema resource. This has been removed for the upcoming specification version.
161+
162+
I have to support all of these variances, and I can do that with these three methods.
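
To make the first of those lookups concrete, here is a simplified sketch of how a registry might resolve a base URI plus an optional anchor. It is an illustration under assumptions, not the library's internals: `RegistrySketch` and `RegistrationSketch` are hypothetical, plain strings stand in for `JsonSchema` and `IBaseDocument`, and the lookup order (regular anchors first, legacy anchors only when allowed) is a guess.

```c#
using System;
using System.Collections.Generic;

// Simplified stand-in for the registration type; strings replace schema objects.
class RegistrationSketch
{
    public string Root { get; init; } = "";
    public Dictionary<string, string>? Anchors { get; set; }
    public Dictionary<string, string>? LegacyAnchors { get; set; }
}

class RegistrySketch
{
    private readonly Dictionary<Uri, RegistrationSketch> _registered = new();

    public void Register(Uri baseUri, RegistrationSketch registration) =>
        _registered[baseUri] = registration;

    public string? Get(Uri baseUri, string? anchor, bool allowLegacy = false)
    {
        if (!_registered.TryGetValue(baseUri, out var registration)) return null;

        // No anchor requested: the caller wants the root document.
        if (anchor is null) return registration.Root;

        // Anchors declared with $anchor are always eligible.
        if (registration.Anchors is not null &&
            registration.Anchors.TryGetValue(anchor, out var found))
            return found;

        // Anchors declared inside $id only count when the caller opts in.
        if (allowLegacy &&
            registration.LegacyAnchors is not null &&
            registration.LegacyAnchors.TryGetValue(anchor, out var legacy))
            return legacy;

        return null;
    }
}
```

Keeping the legacy anchors in their own dictionary means the caller, which knows the specification version in play, decides whether they are visible, and the registry itself stays version-agnostic.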
163+
164+
### Data initialization
165+
166+
Scanning the schemas seemed like it was going to be the hard part, but it turned out to be pretty easy.
167+
168+
As mentioned before, the old scanning approach was recursive: it would scan the local subschema to see if it had the appropriate keywords, then it would call itself on any nested subschemas to scan them.
169+
170+
However, during all of the changes described in this and the previous posts, I developed a pattern that lets me scan a recursive structure iteratively. I'm not sure if it's the _best_ way, but it's a good way and it's mine. Here's some pseudocode.
171+
172+
```c#
Result[] Scan(Item root)
{
    // seed the queue with the root item
    var itemsToScan = new Queue<Item>();
    itemsToScan.Enqueue(root);
    var result = new List<Result>();
    while (itemsToScan.Count != 0)
    {
        // get the next item
        var item = itemsToScan.Dequeue();

        // gather the data we want from it
        var localResult = GetDataForLocal(item);
        result.Add(localResult);

        // check to see if it has children
        foreach (var sub in item.GetSubItems())
        {
            // set up child for scan
            itemsToScan.Enqueue(sub);
        }
    }

    return result.ToArray();
}
```
197+
198+
The things I wanted to get at each stage were all the anchors from before.
199+
200+
_And_ since I was already iterating through all of the subschemas and tracking their base URIs, it was simple to just set that on the subschemas. I also checked for:
201+
202+
- a declared version, determined by the meta-schema, which I could get because I'm already in the schema registry
203+
- the dialect, which is the set of vocabularies (which declare support for keywords) defined by that meta-schema
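
As a rough illustration of combining the queue-based scan with base URI tracking, the sketch below carries the current base URI alongside each queued item and groups anchors by the base URI they were found under. The `Node` type and `AnchorScanner` class are hypothetical stand-ins for subschemas and the registry's scanning code; only `System.Uri` and the collection types are real APIs.

```c#
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a subschema.
class Node
{
    public string? Id { get; init; }       // plays the role of $id
    public string? Anchor { get; init; }   // plays the role of $anchor
    public List<Node> Children { get; } = new();
}

static class AnchorScanner
{
    public static Dictionary<Uri, Dictionary<string, Node>> Scan(Node root, Uri defaultBaseUri)
    {
        var anchorsByBase = new Dictionary<Uri, Dictionary<string, Node>>();
        var toScan = new Queue<(Node Node, Uri BaseUri)>();
        toScan.Enqueue((root, defaultBaseUri));

        while (toScan.Count != 0)
        {
            var (node, baseUri) = toScan.Dequeue();

            // An identifier changes the base URI for this node and its descendants.
            if (node.Id is not null) baseUri = new Uri(baseUri, node.Id);

            if (node.Anchor is not null)
            {
                if (!anchorsByBase.TryGetValue(baseUri, out var anchors))
                    anchorsByBase[baseUri] = anchors = new Dictionary<string, Node>();
                anchors[node.Anchor] = node;
            }

            // Children inherit whatever base URI is current at this point.
            foreach (var child in node.Children)
                toScan.Enqueue((child, baseUri));
        }

        return anchorsByBase;
    }
}
```

Pairing each queued item with its inherited state is what replaces the call stack of the recursive version: anything a recursive call would have received as a parameter travels in the tuple instead.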
204+
205+
### Deletion
206+
207+
With all of this now pre-calculated when the schema is registered, I no longer needed all of the code that did this spread out all over everywhere. So it's gone!
208+
209+
- `JsonSchema` no longer keeps anchor data
210+
- `EvaluationOptions` no longer determines which keywords to process
211+
- `EvaluationContext` no longer determines vocab or stores dialect information
212+
213+
(This seems like a short list, but it was a serious chunk of code.)
214+
215+
## Wrap up
216+
217+
This was a lot of refactoring, but I've been wanting to do something about the disorganized state of my code for a really long time.
218+
219+
I knew that it needed fixing, and I unexpectedly discovered how to fix it by writing a new implementation from scratch. Hopefully that won't be necessary every time.
220+
221+
Thanks for reading through this series of posts covering the latest set of improvements and the things I learned along the way.
222+
223+
## One last thing
224+
225+
I've recently set up my [GitHub Sponsors page](https://github.com/sponsors/gregsdennis), so if you or your company find my work useful, I'd be eternally grateful if you signed up for a monthly contribution.
226+
227+
When you sign up at any level, you'll be listed in the sponsors section on that page as well as the new [Support page](/support) on this blog. Higher levels can get social media shoutouts as well as inclusion in the sponsors bubble cloud at the bottom of the [json-everything.net](https://json-everything.net) landing page (which will show up as soon as I have such a sponsor).
228+
229+
Thanks again.

assets/css/style.scss

+16
@@ -56,6 +56,22 @@ img {
   border-radius: 10px;
 }
 
+.video-container video {
+  height: 100%;
+  width: 100%;
+  object-fit: contain;
+  border-radius: 6px;
+}
+
+.video-caption {
+  display: block;
+  text-align: center;
+  font-style: italic;
+  font-size: 80%;
+  padding: 0;
+  color: #6d6c6c;
+}
+
 code.highlighter-rouge {
   font-size: .85em !important;
 }
297 KB
Binary file not shown.
