Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-84 Convert async route to sync routes #43797

Merged

Conversation

pierrejeambrun
Copy link
Member

@pierrejeambrun pierrejeambrun commented Nov 7, 2024

As discussed in #43718 (comment), routes with blocking I/O code should be sync to not block main event loop. (db access, disk read, network call, etc...)

More information in FastAPI documentation https://fastapi.tiangolo.com/async/#path-operation-functions.

This PR converts all of the endpoints, all that can me converted to async def when we implement full async support.

(I recall Maybe one or two endpoints that are purely in memory and could stay async but I didn't bother to make an exception for them)

@bbovenzi
Copy link
Contributor

bbovenzi commented Nov 7, 2024

I see three endpoints still using async. Do we want to change those too?

Screenshot 2024-11-07 at 1 18 14 PM

@pierrejeambrun
Copy link
Member Author

Good catch,, I forgot the private API.

Updated thanks Brent.

@omkar-foss
Copy link
Collaborator

Hey @pierrejeambrun, I've one concern on this - if we use sync path funcs, the FastAPI requests will run in a threadpool (one request per thread), consuming more memory per request and will limit throughput on our new APIs, as there's a default limit of 40 threads on the threadpool, please see: encode/starlette#1724 (context: AnyIO is now used for async IO by Starlette, which in turn is used by FastAPI to handle http requests).

So, just a suggestion - instead of changing all the path funcs from async to sync, it'll be great if wrap the blocking (sync) function calls inside the path funcs to make them async, using asyncio.to_thread or asyncify or similar.

@dolfinus
Copy link
Contributor

dolfinus commented Nov 8, 2024

Hm, I don't get why this code:

@route.get(...)
def handler(...):
  something = session.get(...)
  other = session.select(...)

await asyncio.to_thread(handler)

does limit thoughput, but this doesn't:

@route.get(...)
async def handler(...):
  something = await asyncio.to_thread(session.get, ...)
  other = await asyncio.to_thread(session.select, ...)

Could you please elaborate?

@pierrejeambrun
Copy link
Member Author

I agree with @dolfinus, running in a separate thread manually or leveraging FastAPI to do so is more or less the same. (just less work and more code maintainability to let FastAPI handle that).

Long term we will rewrite that with full async support, in the meantime FastAPI is just sync for us.

@omkar-foss
Copy link
Collaborator

Hey @dolfinus, yes, the throughput in this case would be very similar for both snippets, because when calling asyncio.to_thread() (executor unspecified), the default thread pool executor will be used and be subject to same 40 thread limit. But unlike sync functions where entire request is processed in a thread, in async we'll have control over what should be processed in a separate thread.

Example: As our new APIs have a mix of CPU-bound (e.g. common params resolution, data checks, Pydantic validations, etc.) and IO-bound (e.g. DB queries, network calls) activities, I guess we could tune it something like this:

@route.get(...)
async def handler(...):
  # CPU bound
  if some_check:
      raise HTTPException(status.HTTP_404_NOT_FOUND, "Not Found")
  
  # IO bound, sent to it's own thread
  something = await asyncio.to_thread(session.get, ...)
  
  # CPU bound again
  return SomePydanticModel(something)

We could alternatively use run_in_threadpool instead of asyncio.to_thread which is provided by FastAPI, example here. Either way we go, it'll need thorough testing to understand what's working for us in terms of performance! :)

@omkar-foss
Copy link
Collaborator

Long term we will rewrite that with full async support, in the meantime FastAPI is just sync for us.

That would be lovely, thank you! :)

@pierrejeambrun
Copy link
Member Author

pierrejeambrun commented Nov 8, 2024

I think that would be a lot of work to maintain + code becomes hard to read + 1 mistake (someone forget to manually put into the threadpool a blocking IO call) and then the main event loop is blocked...

I think we can start like that, and if it's not enough we can go deeper into the fine tuning of what is executed in the main even loop and what is run in a separate thread. I believe CPU bound operations run in a separate thread won't bottleneck. (And if they do, most likely the main even loop would struggle too, so we would have another problem here)

@pierrejeambrun pierrejeambrun merged commit 36e716a into apache:main Nov 8, 2024
52 checks passed
@pierrejeambrun pierrejeambrun deleted the aip-84-transform-async-to-sync-route branch November 8, 2024 11:39
@omkar-foss
Copy link
Collaborator

Yes sure, sounds good! Thanks @pierrejeambrun 👍🏽

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

7 participants