Fastest/best performing way to load all files? #586
-
We need to query several document libraries and retrieve all files from them. We also need some custom metadata. The metadata is not available directly at the
Is there any performant way to do that with as few requests as possible? I'm thinking that throttling could become a problem here.
Our current approach is: Create Context for a Site -> Get the wanted document libraries -> loop through the
Also, I would like to know how to handle throttling. Does the library handle it in any way? Even a custom exception or something like that would be enough for us. We would then retry at a later time.
Replies: 3 comments 3 replies
-
Hi @LegendaryB,
I've just made a change to PnP Core SDK that enables the approach below. There's one request to find all applicable lists, one request per list per 500 items to find the downloadable items, and then one request per actual file to download. Besides being able to download the file, you also have the metadata of the list item to work with.
You'll have to wait until tomorrow for the next nightly build to make the code below work.
// grab all document libraries that are not hidden
var lists = await context.Web.Lists
    .QueryProperties(p => p.Fields.QueryProperties(f => f.InternalName,
                                                   f => f.FieldTypeKind,
                                                   f => f.TypeAsString,
                                                   f => f.Title))
    .Where(p => p.TemplateType == ListTemplateType.DocumentLibrary && p.Hidden == false)
    .ToListAsync();

// iterate over the found libraries
foreach (var list in lists)
{
    // Query the library, filter on the files only and load the needed metadata (FieldRef's) using a paged approach
    // Use orderby to make the CAML query work for large libraries (avoids table scan in SQL backend)
    string viewXml = @"<View>
                        <ViewFields>
                          <FieldRef Name='Title' />
                          <FieldRef Name='FileLeafRef' />
                          <FieldRef Name='FSObjType'/>
                          <FieldRef Name='FileDirRef'/>
                        </ViewFields>
                        <Query>
                          <Where>
                            <Eq>
                              <FieldRef Name='FSObjType'/>
                              <Value Type='Integer'>0</Value>
                            </Eq>
                          </Where>
                        </Query>
                        <OrderBy Override='TRUE'><FieldRef Name='ID' Ascending='FALSE' /></OrderBy>
                        <RowLimit Paged='TRUE'>500</RowLimit>
                       </View>";

    // Load the items page by page; each call adds the returned items to list.Items
    bool paging = true;
    string nextPage = null;
    while (paging)
    {
        var output = await list.LoadListDataAsStreamAsync(new RenderListDataOptions()
        {
            ViewXml = viewXml,
            RenderOptions = RenderListDataOptionsFlags.ListData,
            Paging = nextPage,
        }).ConfigureAwait(false);

        if (output.ContainsKey("NextHref"))
        {
            // Strip the leading character of NextHref before passing it on as the paging value
            nextPage = output["NextHref"].ToString().Substring(1);
        }
        else
        {
            paging = false;
        }
    }

    // Iterate over the retrieved list items and process them
    foreach (var listItem in list.Items.AsRequested())
    {
        // Use your metadata
        if (listItem["FileLeafRef"].ToString().EndsWith(".docx", StringComparison.InvariantCultureIgnoreCase))
        {
            // do something Word specific
        }

        // Download the file behind the list item, use an async streaming approach to speed up things
        using (Stream downloadedContentStream = await listItem.File.GetContentAsync(true))
        {
            var bufferSize = 2 * 1024 * 1024; // 2 MB buffer
            using (var content = System.IO.File.Create($"e:\\temp\\downloadtest\\{listItem["FileLeafRef"]}.downloaded"))
            {
                var buffer = new byte[bufferSize];
                int read;
                while ((read = await downloadedContentStream.ReadAsync(buffer, 0, buffer.Length)) != 0)
                {
                    await content.WriteAsync(buffer, 0, read);
                }
            }
        }
    }
}
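To get a feel for how many requests this approach issues (useful when thinking about throttling), here is a minimal sketch of the arithmetic described at the top of this reply: one request for the libraries, one per page of 500 items per library, and one per file. The `RequestEstimate` class and its parameter names are hypothetical and only illustrate the calculation; they are not part of PnP Core SDK.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a hypothetical helper, not part of PnP Core SDK.
public static class RequestEstimate
{
    // fileCountsPerLibrary: assumed number of files in each document library.
    // pageSize: the RowLimit used in the CAML view (500 in the code above).
    public static long Estimate(IEnumerable<int> fileCountsPerLibrary, int pageSize = 500)
    {
        long total = 1; // one request to find all applicable libraries

        foreach (var files in fileCountsPerLibrary)
        {
            // One request per page; the paging loop runs at least once, even for an empty library
            long pages = Math.Max(1, (files + pageSize - 1) / pageSize);
            total += pages;

            // One request per file download
            total += files;
        }

        return total;
    }
}
```

For example, `RequestEstimate.Estimate(new[] { 1200, 300 })` gives 1 + (3 + 1200) + (1 + 300) = 1505 requests, so the per-file downloads dominate by far.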
-
Thank you @jansenbe. I would also like to know how to handle throttling. Does the library handle it in any way? Even a custom exception or something like that would be enough for us. We would then retry at a later time. Also, I can't get this to work:
I also tried that:
As you can see, I would like to retrieve just the files in this call, so that I don't need to process folders, for example, in my loop.
-
@LegendaryB: throttling is handled automatically, the library will wait and retry. See https://pnp.github.io/pnpcore/using-the-sdk/basics-settings.html#settings-overview for more details on the throttling configuration options. About your other question: please use the approach I've outlined above, it already excludes folders via the filter on
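To make "wait and retry" a bit more concrete, below is a rough, generic sketch of that pattern using only the .NET base class library. It is not the SDK's actual implementation, and the names `ThrottlingRetrySketch`, `GetDelay`, `SendWithRetryAsync` and their parameters are hypothetical. It assumes a throttled response is signalled with HTTP 429 or 503 and may carry a `Retry-After` header; the real settings are documented on the page linked above.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Illustration only: a generic wait-and-retry sketch, NOT the PnP Core SDK code.
public static class ThrottlingRetrySketch
{
    // Computes the wait before retry number `attempt` (1-based).
    public static TimeSpan GetDelay(int attempt, int delayInSeconds, bool incremental, TimeSpan? retryAfter)
    {
        // A server-provided Retry-After value takes precedence
        if (retryAfter.HasValue) return retryAfter.Value;

        // Otherwise wait a fixed delay, optionally growing with each attempt
        var factor = incremental ? attempt : 1;
        return TimeSpan.FromSeconds(delayInSeconds * factor);
    }

    // `send` must create a fresh request on every call (a request message cannot be resent).
    public static async Task<HttpResponseMessage> SendWithRetryAsync(
        Func<Task<HttpResponseMessage>> send,
        int maxRetries = 10,
        int delayInSeconds = 3,
        bool incremental = true,
        CancellationToken cancellationToken = default)
    {
        for (var attempt = 1; ; attempt++)
        {
            var response = await send().ConfigureAwait(false);

            // Assumption: throttling shows up as 429 (Too Many Requests) or 503 (Service Unavailable)
            var throttled = (int)response.StatusCode == 429
                            || response.StatusCode == HttpStatusCode.ServiceUnavailable;

            // Return on success, on a non-throttling error, or when the retries are used up
            if (!throttled || attempt > maxRetries) return response;

            var retryAfter = response.Headers.RetryAfter?.Delta;
            response.Dispose();

            await Task.Delay(GetDelay(attempt, delayInSeconds, incremental, retryAfter), cancellationToken)
                      .ConfigureAwait(false);
        }
    }
}
```

Since the library already does this kind of waiting for you, you normally don't need your own retry loop around the calls shown earlier; the sketch is only meant to show what happens conceptually when a request gets throttled.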