Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Last-Modified or ETag headers on critical REST API endpoints #3329

Closed
ghys opened this issue Jan 22, 2023 · 10 comments · Fixed by #3335
Closed

Support Last-Modified or ETag headers on critical REST API endpoints #3329

ghys opened this issue Jan 22, 2023 · 10 comments · Fixed by #3335
Labels
enhancement An enhancement or new feature of the Core

Comments

@ghys
Copy link
Member

ghys commented Jan 22, 2023

Idea from openhab/openhab-webui#1650 (comment):

The Main UI (and possibly others) makes relatively frequent REST API calls to get all things, or item, or rules, or UI components to make sure the data is up-to-date. For instance every time an item or thing picker is rendered, a call to /rest/items resp. /rest/things is performed.
/rest/items is also called every time the home page is shown, to compute the model cards (locations, equipment, points) with the latest data.

It could be a huge performance improvement if these critical and costly endpoints supported either the ETag or Last-Modified HTTP response headers, and their equivalent request headers (If-None-Match and If-Modified-Since).
If the preconditions don't match (see JAX-RS - Handling 'Last-Modified' and 'ETag' headers with Request Injection) then a 304 Not Modified response would be sent instead.

The UI and other clients would store the last known list of entities in a client-side cache (possibly even surviving page refreshes with IndexedDB), as well as the last sent EntityTag or modified date retrieved from the response headers, which would then be sent along with future requests. It would react to a 200 OK response by updating the cache before returning the data to the caller, and to a 304 Not Modified response by simply returning the cache.

To determine if the data has changed, resources backed by a Registry could keep the last sent EntityTag or last modified date as well, and add a change listener to reset this last ETag or date when a change was notified.

For items it would surely be a little more complicated since changes to metadata would have to be handled as well.

@ghys
Copy link
Member Author

ghys commented Jan 22, 2023

A few screenshots to illustrate:

Refreshing the main UI home page when every static resource has already been cached:

image

Here the loading of the UI is delayed by:

  • The /rest/auth/token call (86ms) - blocking but not much we can do about that
  • The /rest/ call (157ms), also blocking; while its response is small in size, still takes some time to "process" so it could be cached with an ETag or Last-Modified in the browsers' localStorage and sent with If-None-Match: <ETag> or If-Modified-Since: <date> - in that case the root resource would not have to enumerate the links (available REST endpoints) and simply send a 304 response which would tell the UI to assume its cached version of /rest/ is still valid. However determining if that list of endpoints has changed would still probably take the same time, unless we can optimize that...
  • The /rest/ui/components/ui:{page|widget} calls (73ms/79ms) are also blocking - they are backed by a Registry and the response sizes can expand a lot as the admin adds pages & widgets - so they would benefit from Last-Modified / If-Modified-Since support and a client-side cache.
  • The /rest/items (973ms) call can become huge and would also benefit the most from client-side caching because it's called all the time, this is even more evident in the following scenario:

image

Here each item picker is forced to independently make calls to /rest/items (even if it's not opened), after 5 concurrent calls, we can see they become blocking:

image

An item cache will a reliable way to determine when it's out-of-date would be a huge improvement in this case.

Also important to know, there are "dynamic" fields returned with these entities, independent of their "configuration", for example the state of items or the status of things/rules, these ideally would not be taken into account for the "last modified" date too avoid too many cache invalidations - this would be by convention: if you send a If-Modified-Since or If-None-Match header it would be assumed that you don't care about these "dynamic" fields.

@ccutrer
Copy link
Contributor

ccutrer commented Jan 22, 2023

I would be very hesitant to ignore state for caching purposes just because a caching request header is present. Many many tools will silently and automatically add caching headers, and they expect that it will not change the semantics of the request.

If we could add a param to the items endpoint that simply excludes state from the response, it would make this far more tractable.

I like the idea of just having registry listeners that update an ETag on every change, regardless of the change. Honestly I don't think we should even bother trying to persist the ETag across restarts (and re-loading of registries!).

@ghys
Copy link
Member Author

ghys commented Jan 22, 2023

I would be very hesitant to ignore state for caching purposes just because a caching request header is present. Many many tools will silently and automatically add caching headers, and they expect that it will not change the semantics of the request.

You're absolutely right.

If we could add a param to the items endpoint that simply excludes state from the response, it would make this far more tractable.

Could work indeed, or a generic parameter to exclude fields which aren't to be cached (or the opposite, one to specify that the intent is to cache so the Last-Modified header would only be sent with this param set).

@J-N-K
Copy link
Member

J-N-K commented Jan 22, 2023

Why does each item picker need to make independent calls? And wouldn‘t compressing the response also improve the situation? I feel it‘s a bit anti-pattern to have a registry and then cache the registry in the REST resource.

@ccutrer
Copy link
Contributor

ccutrer commented Jan 22, 2023

I don't think he's saying to have the REST resource cache it, but to have the browser cache it. The REST resource would keep track of an ETag, and on any update to the registry invalidate it.

@ghys
Copy link
Member Author

ghys commented Jan 22, 2023

Why does each item picker need to make independent calls? And wouldn‘t compressing the response also improve the situation? I feel it‘s a bit anti-pattern to have a registry and then cache the registry in the REST resource.

They're independent components, they don't talk to each other, so either they're retrieving the data ad-hoc (which ensures it's up-to-date; that's what they do now) or there's client-side caching involved, but then there's the question of when to invalidate that cache, which this proposal would solve.
There's no caching of the registry, just a tracking of a "last modified date".

@rkoshak
Copy link

rkoshak commented Jan 23, 2023

From @spacemanspiff2007 :

I can't comment at #3329 - seems to be some kind of rights issue.
grafik

Wouldn't a possible solution be to just listen for the ItemAdded/ItemUpdated/ItemRemoved events over SSE/Websockets and when the event was received one could invalidate the local cache?

That still leaves tracking metadata changes to which do not generate an event.

ghys added a commit to ghys/openhab-core that referenced this issue Jan 24, 2023
This implements a new optional `cacheable` parameter for these REST endpoints:
- `/rest/items`
- `/rest/things`
- `/rest/rules`

When this parameter is set, a flat list of all elements excluding
non-cacheable fields (e.g. "state", "transformedState", "stateDescription",
"commandDescription" for items, "statusInfo", "firmwareStatus",
"properties" for things, "status" for rules) will be retrieved along with
a `Last-Modified` HTTP response header. When unknown, the Last-Modified
header will be set to the date of the request.

Also only when this parameter is set, and a `If-Modified-Since` header is
found in the request, that header will be compared to the last known
modified date for the corresponding cacheable list. The last modified date
will be reset when any change is made on the elements of the underlying
registry. If the `If-Modified-Since` date is equal or more recent than the
last modified date, then a 304 Not Modified response with no content will
be served instead of the usual 200 OK, informing the client that its
cache is still valid at the provided date.

All other request parameters will be ignored except for "metadata" in the
`/rest/items` endpoint. When a metadata selector is set, the resulting
item list will be considered like a completely different resource, i.e.
it will have its own last modified date. Regarding metadata, the approach
to invalidating last modified dates is very conservative: when any metadata
is changed, all cacheable lists of items will have their last modified date
reset even if the change was in a metadata namespace that wasn't requested.

This also implements the abovedescribed behavior for the
`/rest/ui/components/{namespace}` endpoint, but no `cacheable` parameter
is necessary. The last modified date is tracked by namespace.

Signed-off-by: Yannick Schaus <[email protected]>
@J-N-K J-N-K added the enhancement An enhancement or new feature of the Core label Jan 24, 2023
@ghys
Copy link
Member Author

ghys commented Jan 24, 2023

With #3335 combined with openhab/openhab-webui#1661 the total transferred size when loading the UI's home page can be as low as 7 kB - regardless of how many items, pages or widgets you have (if they have not changed). This could be a huge improvement for those accessing their instance on myopenhab.org with poor upload bandwidth for example.

image

image

The item pickers example is also far better - the requests are made normally to the server to revalidate the freshness but the response is empty and the 304 response tells the browser to reuse the version from its cache.

image

Note that this is all made by the browser as discussed before, I didn't have to implement any caching, only add the parameters to the URLs. It was therefore important to have a proper Cache-Control: must-revalidate header as well, and remove fields that aren't stored, and can change without the registry notifying the change to the listeners or those might become stale.

@rkoshak
Copy link

rkoshak commented Jan 24, 2023

Again from @spacemanspiff2007 since he still apparently cannot comment here himself.


@rkoshak

That still leaves tracking metadata changes to which do not generate an event.

Correct - but that could be changed.

I'm still not convinced if the additional complexity is worth a theoretical bandwidth saving.
Until now bandwidth was never an issue and with gzip we're already reducing it to 10% of the original bandwidth.
When playing around in the MainUI the user typical does administrative work so the chance that the cache gets invalidated nonetheless is quite high.

So this is more a "it's possible that this might save some bandwidth but only under very specific circumstances and we're not even sure if and how often the use is hitting those circumstances".
On the opposite site we have increased complexity, higher maintenance and unexpected behavior (e.g. the new parameter won't do anything in swagger because the header is missing which is quite confusing).
Proper caching is hard and @ccutrer has already identified an edge case with ruby scripting where it won't work.
OpenHAB access is mainly through the local network and there it really doesn't matter if we transfer ~1k for a simple header or 15k for the complete g-zipped item list.

@ccutrer
Copy link
Contributor

ccutrer commented Jan 25, 2023

@ccutrer has already identified an edge case with ruby scripting where it won't work.

TBF, what I identified is that something that's currently possible with JSR223 scripting, but I consider a violation of the abstraction (calling GenericItem.setLabel() on an item provided from the GenericItemProvider) would be even less likely to continue to work, since there's no GenericItemProvider.update() method in order to notify the registry and subsequently clear any downstream cache. I would not consider that a blocker for implementing proper HTTP caching. And anyone that's truly wanting to continue with such (ab)uses, could (ab)use it even more by calling AbstractRegistry.updated directly, masquerading as the GenericItemProvider. Yes, gross, which is why I don't endorse this usage in the first place, but don't take excessive steps to prevent it either.

J-N-K pushed a commit that referenced this issue Jun 15, 2023
* Closes #3329.

This implements a new optional `cacheable` parameter for these REST endpoints:
- `/rest/items`
- `/rest/things`
- `/rest/rules`

When this parameter is set, a flat list of all elements excluding
non-cacheable fields (e.g. "state", "transformedState", "stateDescription",
"commandDescription" for items, "statusInfo", "firmwareStatus",
"properties" for things, "status" for rules) will be retrieved along with
a `Last-Modified` HTTP response header. When unknown, the Last-Modified
header will be set to the date of the request.

Also only when this parameter is set, and a `If-Modified-Since` header is
found in the request, that header will be compared to the last known
modified date for the corresponding cacheable list. The last modified date
will be reset when any change is made on the elements of the underlying
registry. If the `If-Modified-Since` date is equal or more recent than the
last modified date, then a 304 Not Modified response with no content will
be served instead of the usual 200 OK, informing the client that its
cache is still valid at the provided date.

All other request parameters will be ignored except for "metadata" in the
`/rest/items` endpoint. When a metadata selector is set, the resulting
item list will be considered like a completely different resource, i.e.
it will have its own last modified date. Regarding metadata, the approach
to invalidating last modified dates is very conservative: when any metadata
is changed, all cacheable lists of items will have their last modified date
reset even if the change was in a metadata namespace that wasn't requested.

This also implements the abovedescribed behavior for the
`/rest/ui/components/{namespace}` endpoint, but no `cacheable` parameter
is necessary. The last modified date is tracked by namespace.

Signed-off-by: Yannick Schaus <[email protected]>
splatch pushed a commit to ConnectorIO/copybara-hab-core that referenced this issue Jul 12, 2023
* Closes openhab#3329.

This implements a new optional `cacheable` parameter for these REST endpoints:
- `/rest/items`
- `/rest/things`
- `/rest/rules`

When this parameter is set, a flat list of all elements excluding
non-cacheable fields (e.g. "state", "transformedState", "stateDescription",
"commandDescription" for items, "statusInfo", "firmwareStatus",
"properties" for things, "status" for rules) will be retrieved along with
a `Last-Modified` HTTP response header. When unknown, the Last-Modified
header will be set to the date of the request.

Also only when this parameter is set, and a `If-Modified-Since` header is
found in the request, that header will be compared to the last known
modified date for the corresponding cacheable list. The last modified date
will be reset when any change is made on the elements of the underlying
registry. If the `If-Modified-Since` date is equal or more recent than the
last modified date, then a 304 Not Modified response with no content will
be served instead of the usual 200 OK, informing the client that its
cache is still valid at the provided date.

All other request parameters will be ignored except for "metadata" in the
`/rest/items` endpoint. When a metadata selector is set, the resulting
item list will be considered like a completely different resource, i.e.
it will have its own last modified date. Regarding metadata, the approach
to invalidating last modified dates is very conservative: when any metadata
is changed, all cacheable lists of items will have their last modified date
reset even if the change was in a metadata namespace that wasn't requested.

This also implements the abovedescribed behavior for the
`/rest/ui/components/{namespace}` endpoint, but no `cacheable` parameter
is necessary. The last modified date is tracked by namespace.

Signed-off-by: Yannick Schaus <[email protected]>
GitOrigin-RevId: 6e83d3f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature of the Core
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants