Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(services/gdrive): List shows modified timestamp gdrive #5226

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

erickguan
Copy link
Contributor

@erickguan erickguan commented Oct 22, 2024

Which issue does this PR close?

Part of #4746.

Are there any user-facing changes?

None.

@erickguan erickguan force-pushed the modified-timestamp-gdrive branch 2 times, most recently from 8edbfff to 77cf29b Compare October 22, 2024 18:09
@erickguan erickguan marked this pull request as ready for review October 22, 2024 19:14
@erickguan
Copy link
Contributor Author

erickguan commented Oct 22, 2024

I did some experimentation (patch) with async support for listing operations, which helps reduce runtime as expected. However, extending OpList introduces a "breaking" change. Since OpenDAL has a compatibility package, I’m happy to coordinate with you when you're preparing for a breaking release, to minimize the amount of work on both sides. There's no urgency regarding async support, though.

A few other observations from running the behavior tests (not related to the PR):

  1. The behavior test fails on my local machine but passes in CI. I’m investigating the cause casually.
  2. The UUID random generation in behavior tests creates multiple levels of UUIDs. While this results in an extremely low chance of collision in the CI for OpenDAL's test accounts, I’d like to reduce the number of levels when running against my environment too.

// Return self at the first page.
if ctx.token.is_empty() && !ctx.done {
let path = build_rel_path(&self.core.root, &self.path);
let e = oio::Entry::new(&path, Metadata::new(EntryMode::DIR));
let mut metadata = Metadata::new(EntryMode::DIR);
if stat_file_metadata {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry for not making the Metakey behavior clearer. (I'm working on this.)

Metakey represents a best effort hint and is not processed server-side. The service merely needs to supply the most comprehensive metadata available during listing.

The Operator will call stat as required, for example:

fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
// Returns `None` if we have errored.
if self.errored {
return Poll::Ready(None);
}
// Trying to pull more tasks if there are more space.
if self.tasks.has_remaining() {
// Building future if we have a lister available.
if let Some(mut lister) = self.lister.take() {
let fut = async move {
let res = lister.next_dyn().await;
(lister, res)
};
self.fut = Some(Box::pin(fut));
}
if let Some(fut) = self.fut.as_mut() {
if let Poll::Ready((lister, entry)) = fut.as_mut().poll(cx) {
self.lister = Some(lister);
self.fut = None;
match entry {
Ok(Some(oe)) => {
let (path, metadata) = oe.into_entry().into_parts();
if metadata.contains_metakey(self.required_metakey) {
self.tasks
.push_back(StatTask::Known(Some((path, metadata))));
} else {
let acc = self.acc.clone();
let fut = async move {
let res = acc.stat(&path, OpStat::default()).await;
(path, res.map(|rp| rp.into_metadata()))
};
self.tasks.push_back(StatTask::Stating(Box::pin(fut)));
}
}
Ok(None) => {
self.lister = None;
}
Err(err) => {
self.errored = true;
return Poll::Ready(Some(Err(err)));
}
}
}
}
}
// Try to poll tasks
if let Some((path, rp)) = ready!(self.tasks.poll_next_unpin(cx)) {
let metadata = rp?;
return Poll::Ready(Some(Ok(Entry::new(path, metadata))));
}
if self.lister.is_some() || self.fut.is_some() {
Poll::Pending
} else {
Poll::Ready(None)
}
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I get a weird Metadata instance problem, that Gdrivebackend::stat returns metadata with a timestamp but Lister::poll_next gets a result of Metadata without the timestamp.

I will debug it a bit.

@erickguan
Copy link
Contributor Author

erickguan commented Oct 27, 2024

@Xuanwo Thanks for the pointer. I've figured out how the PageList and Google Drive backend work together, but I’m currently stuck on a failing test with MinIO in the complete layer. Am I on the right track in continuing this PR?

From what I understand, the `CompleteLayer`` depends on the behavior of different services. Since you mentioned that work on metakey is underway, I’m trying not to alter OpenDAL’s behavior. Unfortunately, adding the new stat behavior in this layer breaks the MinIO service.

@erickguan erickguan requested a review from Xuanwo October 27, 2024 19:14
@@ -174,7 +174,12 @@ impl<A: Access> CompleteAccessor<A> {
}

if path == "/" {
return Ok(RpStat::new(Metadata::new(EntryMode::DIR)));
let meta = if capability.stat {
self.inner.stat(path, args).await?.into_metadata()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, it's intended that we don't send stat for /. The service should always make sure / exists.

let normalized_path = build_rel_path(root, &path);

let mut meta = Metadata::new(file_type);
// Resetting the metakey is necessary when mode is `EntryMode::DIR`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying cause might be here:

// If mode is dir, we should always mark it as complete.
if mode.is_dir() {
metakey |= Metakey::Complete
}

It seems reasonable to remove it.

@Xuanwo
Copy link
Member

Xuanwo commented Nov 6, 2024

Also related to #5287

@erickguan
Copy link
Contributor Author

Thanks for tentative checking in with metakey!

@erickguan
Copy link
Contributor Author

After a closer look, can I pause this PR until your metakey work?

@Xuanwo
Copy link
Member

Xuanwo commented Nov 7, 2024

After a closer look, can I pause this PR until your metakey work?

Yes, after the metakey is refactored, our work here will be much cleaner.

@Xuanwo
Copy link
Member

Xuanwo commented Nov 14, 2024

Yes, after the metakey is refactored, our work here will be much cleaner.

Hi, you can continue your work now.

@erickguan
Copy link
Contributor Author

Thanks for the ping! I'll pick it up and ping you when it's ready.

@Xuanwo
Copy link
Member

Xuanwo commented Nov 14, 2024

Thanks for the ping! I'll pick it up and ping you when it's ready.

Thank you for your patience. Looking forward to your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants