Allow standard schemas to validate endpoint values #4864

Merged: markerikson merged 18 commits into master from endpoint-schemas on Apr 6, 2025

Conversation

EskiMojo14
Collaborator

@EskiMojo14 EskiMojo14 commented Feb 26, 2025

Type-wise, schemas are required to match the type of the value they're validating - they can't change a string to a number for example. Transformations that preserve the correct type are allowed, however.

Schemas can also be used as a source of inference, for example

build.query({
  query: ({ id }) => `posts/${id}`,
  argSchema: v.object({ id: v.number() }),
  responseSchema: postSchema
})
// or more likely in a TS app, since arg will be checked statically
build.query({
  query: ({ id }: { id: number }) => `posts/${id}`,
  responseSchema: postSchema
})

Schema parse failures are treated as unhandled errors, i.e. they're thrown and serialized rather than being returned. This is mainly because we have no idea what the base query error shape should look like, so we can't match it.

For transformable values, raw<value>Schema checks the value before transformation, and the <value>Schema checks the possibly transformed value.

build.query({
  query: (_arg: void) => `posts`,
  rawResponseSchema: v.array(postSchema),
  // raw response is inferred from schema
  transformResponse: (posts) =>
    postAdapter.getInitialState(undefined, posts),
  responseSchema: v.object({
    ids: v.array(v.number()),
    entities: v.record(
      v.pipe(v.string(), v.transform(Number), v.number()),
      postSchema,
    ),
  }),
})
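
The order of checks described above (raw schema, then the transform, then the final schema) can be sketched without RTK at all. This is only an illustration under stated assumptions: `Schema` is a simplified, synchronous stand-in for the Standard Schema interface, and `parseOrThrow` and `processResponse` are hypothetical names, not RTK Query internals.

```typescript
// Simplified, synchronous stand-in for the Standard Schema interface.
// Assumption: the real spec also lets validate() return a Promise.
type Issue = { message: string }
type ParseResult<T> = { value: T; issues?: undefined } | { issues: ReadonlyArray<Issue> }
interface Schema<T> {
  '~standard': { version: 1; vendor: string; validate: (value: unknown) => ParseResult<T> }
}

// Hypothetical helper: return the parsed value, or throw if the schema reports issues.
function parseOrThrow<T>(schema: Schema<T>, value: unknown, schemaName: string): T {
  const result = schema['~standard'].validate(value)
  if (result.issues) {
    const details = result.issues.map((issue) => issue.message).join('; ')
    throw new Error(`${schemaName} failed: ${details}`)
  }
  return result.value
}

type ResponseChecks<Raw, Final> = {
  rawResponseSchema?: Schema<Raw>
  transformResponse: (raw: Raw) => Final
  responseSchema?: Schema<Final>
}

// Order of checks: raw schema -> transformResponse -> final schema.
function processResponse<Raw, Final>(raw: unknown, checks: ResponseChecks<Raw, Final>): Final {
  const checkedRaw = checks.rawResponseSchema
    ? parseOrThrow(checks.rawResponseSchema, raw, 'rawResponseSchema')
    : (raw as Raw)
  const transformed = checks.transformResponse(checkedRaw)
  return checks.responseSchema
    ? parseOrThrow(checks.responseSchema, transformed, 'responseSchema')
    : transformed
}

// Tiny hand-written schemas, only for the demonstration.
const numbersSchema: Schema<number[]> = {
  '~standard': {
    version: 1,
    vendor: 'sketch',
    validate: (value) =>
      Array.isArray(value) && value.every((item) => typeof item === 'number')
        ? { value: value as number[] }
        : { issues: [{ message: 'expected an array of numbers' }] },
  },
}

const totalSchema: Schema<{ total: number }> = {
  '~standard': {
    version: 1,
    vendor: 'sketch',
    validate: (value) => {
      if (typeof value === 'object' && value !== null) {
        const total = (value as { total?: unknown }).total
        if (typeof total === 'number') return { value: { total } }
      }
      return { issues: [{ message: 'expected { total: number }' }] }
    },
  },
}

const summed = processResponse([1, 2, 3], {
  rawResponseSchema: numbersSchema,
  transformResponse: (nums: number[]) => ({ total: nums.reduce((sum, n) => sum + n, 0) }),
  responseSchema: totalSchema,
})
console.log(summed.total) // logs 6
```

A failing raw value never reaches the transform, because `parseOrThrow` throws first; that mirrors the "thrown rather than returned" behaviour described above.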

@EskiMojo14 EskiMojo14 added the enhancement New feature or request label Feb 26, 2025

github-actions bot commented Feb 26, 2025

size-limit report 📦

Path Size
1. entry point: @reduxjs/toolkit/query (modern.mjs) 3.68 KB (+0.08% 🔺)
1. entry point: @reduxjs/toolkit/query/react (modern.mjs) 14.9 KB (+2.41% 🔺)
2. entry point: @reduxjs/toolkit/query (without dependencies) (modern.mjs) 110 B (+2.81% 🔺)
1. entry point: @reduxjs/toolkit/query (cjs, production.min.cjs) 23.93 KB (+1.78% 🔺)
1. entry point: @reduxjs/toolkit/query/react (cjs, production.min.cjs) 26.27 KB (+1.65% 🔺)
2. entry point: @reduxjs/toolkit/query (without dependencies) (cjs, production.min.cjs) 10.67 KB (+5.09% 🔺)
3. createApi (.modern.mjs) 15.3 KB (+2.32% 🔺)
3. createApi (react) (.modern.mjs) 17.32 KB (+2.01% 🔺)
3. fetchBaseQuery (.modern.mjs) 4.63 KB (+0.03% 🔺)

@agusterodin

agusterodin commented Feb 26, 2025

FWIW, I really like where your head is at in the second example you proposed.

build.query({
  query: ({ id }: { id: number }) => `posts/${id}`,
  resultSchema: postSchema
})

Input/request validation is probably out of scope for a data fetching solution anyway. If inaccurate types are an issue in someone's application, that should probably be addressed at the source of the problem, not as a last-ditch effort right before the request is sent. The CPU overhead of validating all inputs of all requests probably isn’t great either, especially if this happens on the main thread.

I definitely prefer the terminology responseSchema btw, as it feels more intuitive, but I understand that RTKQ can be applied to async state management use-cases beyond just simple HTTP requests.

@EskiMojo14
Collaborator Author

true - responseSchema would be more consistent with transformResponse. I'll switch it 🙂

@agusterodin

Here is a link to my initial idea pitch, just so there is a paper trail:
https://x.com/agusterodin/status/1894623542471528888

I will respond directly to this PR from now on for brainstorming.

@agusterodin

agusterodin commented Feb 26, 2025

It might be overkill, but I'm really enticed by the idea of performing schema validation in a web worker (off the main thread). I have seen brief main-thread lag when validating massive payloads from the server (10MB+). One of the projects I work on professionally is a Tableau-esque data viewer where the user can view and interact with up to 500k geospatial results.

One of my transformResponse functions was bogging the main thread down, so I experimented moving that transformation into a web worker. I ultimately decided not to go through with this, particularly because the type-safety situation was less than ideal.

Attached below is my rough attempt. The web worker was usable via promise.

Usage inside API slice

transformResponse: async (response: DashboardDataTableResult[]) => {
  return await transformDataTableResponseOffMainThread(response)
}

workerPromise.ts

import { DashboardDataTableResult } from 'datamodels'

export type TableRowByIdDictionary = Record<string, DashboardDataTableResult>

export function transformDataTableResponseOffMainThread(tableResults: DashboardDataTableResult[]): Promise<TableRowByIdDictionary> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL('./transformWorker.ts', import.meta.url))

    worker.onmessage = (event: MessageEvent<TableRowByIdDictionary>) => {
      resolve(event.data)
      worker.terminate()
    }

    worker.onerror = error => {
      reject(error)
      worker.terminate()
    }

    worker.postMessage(tableResults)
  })
}

workerLogic.ts

import { keyBy } from 'lodash'
import { DashboardDataTableResult } from 'datamodels'

self.onmessage = function (event: MessageEvent<DashboardDataTableResult[]>) {
  const data = event.data
  const transformedData = transformData(data)
  self.postMessage(transformedData)
}

function transformData(data: DashboardDataTableResult[]) {
  const chan: Partial<DashboardDataTableResult>[] = new Array(5000000).fill(0).map((_, i) => ({
    id: i,
    ipv4: 'zcxv',
    ipv6: '342342'
  }))
  return keyBy(chan, 'id')
}

Getting a web worker to work with a variety of build tooling could be a pain though, particularly this kind of thing:

const worker = new Worker(new URL('./transformWorker.ts', import.meta.url))

It works well in Next.js, but I have no idea if other bundlers handle it the same way.

My example above is obviously for a different use case (transforming a response instead of validating a payload), but I figure is worth sharing.

@agusterodin

agusterodin commented Feb 26, 2025

As for question about whether validation would occur before transformResponse or after:

"I would imagine responseSchema validating what data came back from the server pre-transformResponse.

If a transformResponse function is provided, perhaps the RTKQ endpoint will use the return type of their provided transformResponse function.

If no transformResponse function is provided, the RTKQ endpoint return type will be whatever was provided as responseSchema."

If rawResponseSchema for pre-transformResponse validation is the route you end up going, it would be cool if responseSchema were optional when rawResponseSchema is provided.

@EskiMojo14
Collaborator Author

EskiMojo14 commented Feb 26, 2025

It might be overkill
<snip>
My example above is obviously for a different use case (transforming a response instead of validating a payload), but I figure is worth sharing.

Definitely not something that makes sense for us to look into, but we would just accept any standard schema compliant value, so you're welcome to make your own:

declare function validateInWorker(input: unknown): Promise<StandardSchemaV1.Result<string>>

const workerSchema: StandardSchemaV1<string> = {
  '~standard': {
    version: 1,
    vendor: "custom",
    validate: validateInWorker
  }
}

😄

@EskiMojo14
Collaborator Author

If rawResponseSchema for pre-transformResponse validation is the route you end up going, it would be cool if responseSchema were optional when rawResponseSchema is provided.

All schemas are optional - the difficult thing with responseSchema vs rawResponseSchema is which one should be used for inference/checked against the provided type parameter 🤔

Currently, rawResponseSchema is always typed as StandardSchemaV1<BaseQueryResult<BaseQuery>>, and responseSchema is always typed as StandardSchemaV1<ResultType>, which matches transformResponse's (res: BaseQueryResult<BaseQuery>) => ResultType.

The difficulty with that is that rawResponseSchema cannot be used as a source of inference, so the below would not infer properly:

build.query({
  query: ({ id }: { id: number }) => `posts/${id}`,
  rawResponseSchema: postSchema
})

however, this doesn't seem like a huge issue to me, as responseSchema will be used even if transformResponse isn't provided (it still happens, it just defaults to an identity function).

an ideal state of affairs would be

build.query({
  query: ({ id }: { id: number }) => `posts/${id}`,
  rawResponseSchema: postSchema,
  transformResponse: (res /* inferred from rawResponseSchema */) => transformed // provides final data type
})
// or no transformResponse
build.query({
  query: ({ id }: { id: number }) => `posts/${id}`,
  responseSchema: postSchema,
})

but this isn't currently possible (the inference part at least). The closest you'd get is manually annotating

build.query({
  query: ({ id }: { id: number }) => `posts/${id}`,
  rawResponseSchema: postSchema,
  transformResponse: (res: Infer<typeof postSchema>) => transformed // provides final data type
})

@EskiMojo14
Collaborator Author

cool, that wasn't actually too bad 😄
so basically:

// without transformResponse
build.query({
  query,
  responseSchema
})
// with transformResponse
build.query({
  query,
  rawResponseSchema,
  transformResponse
})

We could use a union to enforce this, but honestly the types are complicated enough and i think some people may still want to use both.
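
For reference, a union of the kind mentioned above might look roughly like the sketch below. This is not what the PR does (both schemas stay independently optional there); the type names `WithoutTransform`, `WithTransform` and `ResponseOptions` are hypothetical, and `Schema` is a simplified local stand-in for a Standard Schema.

```typescript
// Simplified local stand-in for a Standard Schema (an assumption, not RTKQ's real types).
type ParseResult<T> = { value: T; issues?: undefined } | { issues: ReadonlyArray<{ message: string }> }
interface Schema<T> {
  '~standard': { version: 1; vendor: string; validate: (value: unknown) => ParseResult<T> }
}

// Hypothetical option shapes: either check the final result directly...
type WithoutTransform<Final> = {
  responseSchema?: Schema<Final>
  rawResponseSchema?: never
  transformResponse?: never
}
// ...or check the raw response and let transformResponse provide the final type.
type WithTransform<Raw, Final> = {
  rawResponseSchema?: Schema<Raw>
  transformResponse: (raw: Raw) => Final
  responseSchema?: never
}
type ResponseOptions<Raw, Final> = WithoutTransform<Final> | WithTransform<Raw, Final>

function describeOptions<Raw, Final>(options: ResponseOptions<Raw, Final>): string {
  return options.transformResponse ? 'raw schema + transform' : 'response schema only'
}

const stringSchema: Schema<string> = {
  '~standard': {
    version: 1,
    vendor: 'sketch',
    validate: (value) =>
      typeof value === 'string' ? { value } : { issues: [{ message: 'expected a string' }] },
  },
}

const direct = describeOptions<unknown, string>({ responseSchema: stringSchema })
const viaTransform = describeOptions<string, number>({
  rawResponseSchema: stringSchema,
  transformResponse: (raw: string) => raw.length,
})
console.log(direct, '/', viaTransform)
// Passing responseSchema together with transformResponse would be a compile-time
// error under this union, which is exactly the flexibility the comment above wants to keep.
```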

@agusterodin

but this isn't currently possible (the inference part at least). The closest you'd get is manually annotating

build.query({
  query: ({ id }: { id: number }) => `posts/${id}`,
  rawResponseSchema: postSchema,
  transformResponse: (res: Infer<typeof postSchema>) => transformed // provides final data type
})

Is it not possible due to limitations in standard schema specification, the way RTKQ is currently structured, or due to limitation in TypeScript language itself?

@EskiMojo14
Collaborator Author

EskiMojo14 commented Feb 26, 2025

Is it not possible due to limitations in standard schema specification, the way RTKQ is currently structured, or due to limitation in TypeScript language itself?

Due to how it's currently structured - I've just pushed a change adding the inference :)

@agusterodin

Haven’t thought too deeply about it, but agreed about not using a union.

Providing both rawResponseSchema and a regular responseSchema may not be a common use case, but the ability probably wouldn't hurt to have.

@agusterodin

agusterodin commented Feb 26, 2025

Ability to provide a global callback for situation where schema validation fails would be super valuable btw. Would allow for warning toast, logging to external service, etc.

Callback would contain information about the request such as endpoint name, request payload, etc.


Ability to enable/disable schema validation on global and per-endpoint basis may be nice too. Similar to how RTKQ has a global keepUnusedDataFor value, but have ability to override at the endpoint level.

@EskiMojo14
Collaborator Author

EskiMojo14 commented Feb 26, 2025

Ability to provide a global callback for situation where schema validation fails would be super valuable btw. Would allow for warning toast, logging to external service, etc.

Callback would contain information about the request such as endpoint name, request payload, etc.

Added - callback parameters are onSchemaFailure(error, info) where error is an extended SchemaError with .value (the original value before parsing) and .schemaName (e.g. "argSchema") and info is an object like { endpoint: 'getPost', arg: 1, type: "query", queryCacheKey: "getPost(1)" }
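
A handler written against that signature might look like the sketch below. The `SchemaFailure` and `SchemaFailureInfo` types are local stand-ins written from the description above (plus the `issues` array that a Standard Schema failure carries); they are assumptions, not RTK's exported types, and `reportSchemaFailure` is a hypothetical name.

```typescript
// Local stand-ins written from the description above (assumptions, not RTK's exported types).
interface SchemaFailure extends Error {
  value: unknown // the original value before parsing
  schemaName: string // e.g. "argSchema"
  issues: ReadonlyArray<{ message: string }>
}
interface SchemaFailureInfo {
  endpoint: string
  arg: unknown
  type: 'query' | 'mutation'
  queryCacheKey?: string
}

// Hypothetical handler: build one log line that a toast or a logging service could use.
function reportSchemaFailure(error: SchemaFailure, info: SchemaFailureInfo): string {
  const details = error.issues.map((issue) => issue.message).join('; ')
  return `[${info.type}] ${info.endpoint}: ${error.schemaName} failed (${details})`
}

// Hand-built failure, only to exercise the handler.
const exampleFailure: SchemaFailure = Object.assign(new Error('Expected number, received string'), {
  value: 'abc',
  schemaName: 'argSchema',
  issues: [{ message: 'Expected number, received string' }],
})

const logLine = reportSchemaFailure(exampleFailure, {
  endpoint: 'getPost',
  arg: 'abc',
  type: 'query',
  queryCacheKey: 'getPost("abc")',
})
console.log(logLine)
// logs: [query] getPost: argSchema failed (Expected number, received string)
```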

@EskiMojo14 EskiMojo14 changed the title Experiment with allowing standard schemas to validate endpoint values Allow standard schemas to validate endpoint values Mar 25, 2025
@markerikson
Collaborator

markerikson commented Mar 30, 2025

@agusterodin can you give an example of when/why an option to skip schema validation would actually be necessary? Like, in that geospatial results blog example: do you really need to validate the entire response? when would you want to supply a schema for that endpoint and then not run it? not sure I follow the intended use case for such an option.

is it just to get the type inference? "trust me, the data will look like this, just don't bother checking for me"?

@markerikson markerikson merged commit 45a95cb into master Apr 6, 2025
118 checks passed
@agusterodin

@agusterodin can you give an example of when/why an option to skip schema validation would actually be necessary? Like, in that geospatial results blog example: do you really need to validate the entire response? when would you want to supply a schema for that endpoint and then not run it? not sure I follow the intended use case for such an option.

is it just to get the type inference? "trust me, the data will look like this, just don't bother checking for me"?

The rationale would be to still take advantage of type inference and so that endpoint definitions can be consistent. Mixing-and-matching providing standard schema vs. providing generics (the current way of defining request/response types) based solely on whether you want to validate payload or not would likely be clunky.


More context on why we skip schema validation for certain endpoints: mostly for performance reasons. There are a few endpoints we use that return massive amounts of data (10MB+) and cause the browser to dramatically slow down when trying to validate the schema.

For these rare cases, we leave the Zod schema in the code but comment it out.

image

There are probably other ways to mitigate:

  • Paginating the data (would require extensive changes to the way UI works).
  • Performing schema validation in a web worker (briefly mentioned earlier in thread, not sure if overhead of thread communication is worth it and if type-safety situation is viable).

@EskiMojo14 EskiMojo14 deleted the endpoint-schemas branch April 7, 2025 03:09
@markerikson
Collaborator

@agusterodin gotcha, thanks for the response.

I'm planning to ship this in 2.7.0. Was going to release that today :) but realized we still need docs for this feature, plus a couple other tweaks. Could you give the current PR build a shot and give me any last-minute feedback before this goes live?

@agusterodin

@agusterodin gotcha, thanks for the response.

I'm planning to ship this in 2.7.0. Was going to release that today :) but realized we still need docs for this feature, plus a couple other tweaks. Could you give the current PR build a shot and give me any last-minute feedback before this goes live?

Hell yeah! Will test and provide feedback over next couple days

@agusterodin

agusterodin commented Apr 8, 2025

Have been playing around with PR build. I absolutely love how the standard schema integration is implemented!

I have been testing it here https://github.com/agusterodin/rtkq-playground/tree/standard-schema. Provided this link in case it helps investigating anything Next.js specific (be sure you're on the standard-schema branch).

Overall extremely thrilled with this feature. Here are some random observations:


Seeing intercept-console-error.js:50 An unhandled error occurred processing a request for the endpoint "getPokemon". In the case of an unhandled error, no tags will be "provided" or "invalidated". This may be Next.js-specific.

Screenshot 2025-04-07 at 10 53 37 PM


Schema errors bubble up to Next.js error dialog since they are uncaught (as shown in image above). Ideally RTK catches the error itself so that the Next.js error dialog doesn't get triggered. I much prefer displaying a warning toast message and opening the browser devtools console if I want to investigate the error.

In my current Zod schema validation implementation (not the official RTK implementation in this PR), I'm able to keep the parse error from "bubbling up" to the Next.js error dialog by wrapping the parse call in a try-catch block like this:

export const baseQueryWithZodValidation: (baseQuery: BaseQuery) => BaseQuery = (baseQuery: BaseQuery) => async (args, api, extraOptions) => {
  const returnValue = await baseQuery(args, api, extraOptions)
  const zodSchema = extraOptions?.responseSchema
  const { data } = returnValue
  if (data && zodSchema) {
    try {
      zodSchema.parse(data)
    }
    catch (error) {
      if (error instanceof ZodError) {
        toast.warning('Response schema mismatch, see console for details', { id: 'schemaMismatch' })
        LogRocket.captureMessage(`Zod mismatch (${api.endpoint})`)
        console.error(error)
        return {
          error: {
            data: data.toString(),
            error: error.toString(),
            originalStatus: returnValue.meta?.response?.status || 0,
            status: 'PARSING_ERROR'
          }
        }
      }
    }
  }
  return returnValue
}

I understand that other people may actually want this error to "bubble up" to the Next.js error dialog (possibly matter of personal preference). In that case, the error can be rethrown inside of the global onSchemaFailure handler and left uncaught like this

onSchemaFailure: (error) => {
    throw error
}

Is there any way to access/log the parsing error message produced by Zod? In the screenshot below I call parse on Zod schema manually in browser console and see a much more detailed message (eg: exact name of fields in object that are mismatched).

These detailed Zod errors are one of the primary value-adds of data fetch schema validation in my opinion. They are extremely useful for both local development and issue triage. We frequently attach screenshots of these detailed Zod errors to Slack threads and bug tickets.

Screenshot 2025-04-07 at 11 14 41 PM

In comparison, the only message I currently see in the PR build is SchemaError: Expected string, received number, which doesn't say which field in the Zod object has the type mismatch.

Not sure if this is RTKQ specific or is just the way Zod exposes errors for standard schema usages.
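
For what it's worth, the Standard Schema failure result carries an `issues` array in which each issue has a `message` and an optional `path`. Assuming the thrown error exposes those issues, a field-level message could be assembled in user land roughly as follows. `formatIssue` is a hypothetical helper, the types are local stand-ins for the spec's issue shape, and the field names in the example are illustrative only.

```typescript
// Local stand-ins for the Standard Schema issue shape.
type PathSegment = { key: PropertyKey }
type Issue = { message: string; path?: ReadonlyArray<PropertyKey | PathSegment> }

// Hypothetical helper: prefix the message with a dotted path when one is present.
function formatIssue(issue: Issue): string {
  const path = (issue.path ?? [])
    // A path entry is either a bare key or an object wrapping the key.
    .map((segment) => (typeof segment === 'object' ? segment.key : segment))
    .map((key) => String(key))
    .join('.')
  return path ? `${path}: ${issue.message}` : issue.message
}

// Illustrative issues, not real output from Zod or RTK.
const exampleIssues: Issue[] = [
  { message: 'Expected string, received number', path: ['sprites', { key: 'front_default' }] },
  { message: 'Required' },
]

console.log(exampleIssues.map((issue) => formatIssue(issue)).join('\n'))
// logs:
// sprites.front_default: Expected string, received number
// Required
```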


I may not be grasping how errorResponseSchema is intended to work.

My assumption is that it validates the JSON the server sends back (if any) when receiving a response with an error status code (400 or 500 series).

Something like this (note that this is not how the PokeAPI formats errors, this is just a hypothetical):

const PokemonApiCustomError = z.object({
  errorCode: z.string(),
  shortErrorMessage: z.string(),
  verboseErrorMessage: z.string()
})

And you would define it like this in your endpoint definition:

errorResponseSchema: PokemonApiCustomError

If I provide errorResponseSchema as shown above, TypeScript errors appear. It seems that TypeScript expects errorResponseSchema to look something like this:

errorResponseSchema: z.object({
  status: z.literal('CUSTOM_ERROR'),
  error: z.string(),
  data: PokemonApiCustomError
})

Is there a way to simplify this so that you only need to provide the schema like this? From what I can tell, we don't have control over status or error anyway as RTK provides those.

errorResponseSchema: PokemonApiCustomError

Is there any way errorResponseSchema can enhance the typing of the error returned by a RTKQ query hook result? The type of query hook result error is FetchBaseQueryError | SerializedError | undefined regardless of what errorResponseSchema is set to.

There may be a way to enhance this since we know what the error response schema should be (if schema validation succeeds).


This may be relatively obscure, but how do we distinguish something as a "schema validation error" from error in query hook result?

I could imagine it looking something like this (obviously SCHEMA_ERROR isn't currently a thing):

import { useGetPokemonQuery } from '@/state/pokemonApi'
import { FetchBaseQueryError } from '@reduxjs/toolkit/query'

function isFetchBaseQueryError(error: unknown): error is FetchBaseQueryError {
  return Boolean((error as FetchBaseQueryError)?.status)
}

export default function StandardSchemaExample() {
  const { data: pokemon, error } = useGetPokemonQuery('ditto')

  if (error && isFetchBaseQueryError(error) && error.status === 404) {
    return <div>Server responded saying the pokemon doesn't exist.</div>
  } 
  else if (error && isFetchBaseQueryError(error) && error.status === 'TIMEOUT_ERROR') {
    return <div>Connection to server timed out.</div>
  } 
  else if (error && isFetchBaseQueryError(error) && error.status === 'SCHEMA_ERROR') {
    return <div>Response from server wasn't in expected format.</div>
  }
  
  return <div>{JSON.stringify(pokemon)}</div>
}

Side note: would be awesome if RTKQ provided an out-of-the-box isFetchBaseQueryError type guard. Useful when you want to show specific UI based on error status code of response.
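
In the meantime, a user-land guard is short. The sketch below checks for the presence of a `status` key instead of its truthiness, so a falsy status value is still recognised. `FetchBaseQueryErrorLike` and `SerializedErrorLike` are simplified local stand-ins for the real RTK types so that the snippet is self-contained, and `describeError` is a hypothetical helper.

```typescript
// Simplified local stand-ins for RTK's error types (assumptions for this sketch).
type FetchBaseQueryErrorLike = { status: number | string; data?: unknown; error?: string }
type SerializedErrorLike = { name?: string; message?: string; code?: string }

// Type guard: any non-null object with a `status` key counts as a base query error.
function isFetchBaseQueryErrorLike(error: unknown): error is FetchBaseQueryErrorLike {
  return typeof error === 'object' && error !== null && 'status' in error
}

// Hypothetical helper that picks UI text based on the kind of error.
function describeError(error: FetchBaseQueryErrorLike | SerializedErrorLike | undefined): string {
  if (error === undefined) return 'no error'
  if (isFetchBaseQueryErrorLike(error)) {
    return error.status === 404 ? 'not found' : `request failed (${String(error.status)})`
  }
  return error.message ?? 'unknown error'
}

console.log(describeError({ status: 404, data: null })) // logs: not found
console.log(describeError({ status: 'TIMEOUT_ERROR', error: 'timed out' })) // logs: request failed (TIMEOUT_ERROR)
console.log(describeError({ message: 'boom' })) // logs: boom
```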

@EskiMojo14
Collaborator Author

The root issue here is that with the way RTKQ is architected, there are only two types of error:

  • "expected" errors
    • type is determined by base query (in this case FetchBaseQueryError), and queryFn errors are required to match this
    • strangely transformErrorResponse breaks this contract - you can return anything and the rest of your app will act like it matches the base query errors
  • "unexpected" errors
    • always SerializedError, prevents tag invalidation

There's no such thing as a unique error type for an endpoint, for example, and adding this would be a major restructure of RTKQ's types.

With regards to schemas, we have absolutely no way of knowing how to turn our schema errors into a shape that would match the base query's errors. We could possibly leave this up to the user:

const api = createApi({
  baseQuery: fetchBaseQuery(),
  catchSchemaFailure: (error, info) => ({
    status: "CUSTOM_ERROR",
    data: error.issues,
    error: `${error.schemaName} failed validation`,
  }),
  endpoints: () => ({})
})

This still raises questions - should the schema error be passed to transformErrorResponse? should it also be passed to the error schemas?

A more drastic approach would be adding a whole new type of error to the system specifically for schema failures, but this would be potentially breaking for code like the above that's only expecting BaseQueryError<BaseQuery> | SerializedError.

All the reasons above are also why your errorResponseSchema needs to match the whole base query error: we'd have no idea how to extract only the server response from the shape returned by the base query. None of our code can be written assuming that errors will only ever be FetchBaseQueryError, because other base queries may use errors that look nothing like it.
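
In application code, where the base query is known, the wrapping can be factored out once so that each endpoint only writes the schema for the server's error body. The sketch below assumes a fetch-style HTTP error of the form `{ status: number, data }`, which is a simplification; `wrapErrorBody` is a hypothetical helper, and `Schema` is a simplified, synchronous stand-in for a Standard Schema.

```typescript
// Simplified, synchronous stand-in for a Standard Schema (assumption for this sketch).
type Issue = { message: string }
type ParseResult<T> = { value: T; issues?: undefined } | { issues: ReadonlyArray<Issue> }
interface Schema<T> {
  '~standard': { version: 1; vendor: string; validate: (value: unknown) => ParseResult<T> }
}

// Simplified shape of an HTTP error from a fetch-style base query (assumption).
type HttpError<Body> = { status: number; data: Body }

// Hypothetical helper: lift a schema for the error body into a schema for the whole error.
function wrapErrorBody<Body>(bodySchema: Schema<Body>): Schema<HttpError<Body>> {
  return {
    '~standard': {
      version: 1,
      vendor: 'sketch',
      validate: (value) => {
        if (typeof value !== 'object' || value === null) {
          return { issues: [{ message: 'expected an error object' }] }
        }
        const candidate = value as { status?: unknown; data?: unknown }
        if (typeof candidate.status !== 'number') {
          return { issues: [{ message: 'expected a numeric status' }] }
        }
        // Delegate the body check to the schema supplied by the endpoint.
        const body = bodySchema['~standard'].validate(candidate.data)
        if (body.issues) return { issues: body.issues }
        return { value: { status: candidate.status, data: body.value } }
      },
    },
  }
}

// Hand-written body schema, only for the demonstration.
const apiErrorBodySchema: Schema<{ errorCode: string }> = {
  '~standard': {
    version: 1,
    vendor: 'sketch',
    validate: (value) => {
      if (typeof value === 'object' && value !== null) {
        const errorCode = (value as { errorCode?: unknown }).errorCode
        if (typeof errorCode === 'string') return { value: { errorCode } }
      }
      return { issues: [{ message: 'expected { errorCode: string }' }] }
    },
  },
}

const fullErrorSchema = wrapErrorBody(apiErrorBodySchema)
const checked = fullErrorSchema['~standard'].validate({ status: 404, data: { errorCode: 'NOT_FOUND' } })
console.log(checked.issues ? 'invalid' : checked.value.data.errorCode) // logs: NOT_FOUND
```

This only works because the application knows its own error shape; it does not change the point above that RTKQ itself cannot make that assumption.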

@EskiMojo14
Collaborator Author

@agusterodin i've raised #4934 to add catchSchemaFailure - would you be able to try out the build from that and see if that helps?
