refactor: ABFS implementation #11419
58c4444
to
be9b010
Compare
* To facilitate unit testing of file write scenarios, we define the
* AzureDatalakeFileClient here, which can be mocked during testing.
*/
class AdlsFileClient {
Please use AzureDatalakeFileClient; AdlsFileClient refers to something different in our context.
Thanks.
The Azure client name for writing is DataLakeFileClient. I renamed the implementations accordingly and tried to keep the name short.
* https://github.com/Azure/Azurite/wiki/ADLS-Gen2-Implementation-Guidance
*
* To facilitate unit testing of file write scenarios, we define the
* IBlobStorageFileClient here, which can be mocked during testing.
Please update the name here also.
namespace facebook::velox::filesystems {

static std::string kAzureBlobEndpoint{"fs.azure.blob-endpoint"};
What does this property do? Is it for testing only? If so, please add a comment.
Thanks.
890d571
to
cd4a59f
Compare
@zhli1142015 thanks for your review! I addressed your comments. Can you take another look?
cd4a59f
to
9c88379
Compare
@zhli1142015 I noticed that the usage of DataLakeFileClient::Flush is not optimal. We flush when the file is closed
Nice refactor. Looks much cleaner now.
std::string_view file;
bool isHttps = true;
if (path.find(kAbfssScheme) == 0) {
file = path.substr(8);
Let's use
file = path.substr(kAbfssScheme.length());
if (path.find(kAbfssScheme) == 0) {
file = path.substr(8);
} else if (path.find(kAbfsScheme) == 0) {
file = path.substr(7);
Let's use
file = path.substr(kAbfsScheme.length());
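For illustration, the two suggestions above could be combined into one small helper that derives the offset from the scheme constant instead of hard-coding 8 and 7. This is only a sketch: `stripAbfsScheme` and `ParsedPath` are hypothetical names, and the constant values are assumed from the hard-coded lengths in the diff.

```cpp
#include <cassert>
#include <string_view>

// Assumed values: "abfss://" has length 8 and "abfs://" has length 7,
// which matches the hard-coded offsets in the diff.
constexpr std::string_view kAbfsScheme{"abfs://"};
constexpr std::string_view kAbfssScheme{"abfss://"};

// Hypothetical result type, for illustration only.
struct ParsedPath {
  std::string_view file; // Path with the scheme removed.
  bool isHttps;          // True for abfss://, false for abfs://.
  bool valid;            // False if neither scheme matched.
};

// Hypothetical helper: strips the scheme using the constant's length.
inline ParsedPath stripAbfsScheme(std::string_view path) {
  // substr(0, n) clamps to the string size, so short inputs are safe.
  if (path.substr(0, kAbfssScheme.length()) == kAbfssScheme) {
    return {path.substr(kAbfssScheme.length()), true, true};
  }
  if (path.substr(0, kAbfsScheme.length()) == kAbfsScheme) {
    return {path.substr(kAbfsScheme.length()), false, true};
  }
  return {path, false, false};
}
```

Comparing only the prefix also avoids scanning the whole path, which `path.find(...) == 0` does when the scheme is absent.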
namespace facebook::velox::filesystems::abfs {
namespace facebook::velox::config {
Not part of your changes, but this header doesn't have #pragma once. I think we should always have that?
const config::ConfigBase& config) {
auto abfsAccount = AbfsConfig(path, config);
std::shared_ptr<AzureDataLakeFileClient> client =
std::make_shared<DataLakeFileClientWrapper>(
Just curious: why is the DataLakeFileClientWrapper a shared_ptr and not a unique_ptr? Would something else access it if it is part of the client?
shared_ptr made it easier to write tests, but I changed it to a unique_ptr, which I agree it should be.
abfssConfig.connectionString(),
"DefaultEndpointsProtocol=https;AccountName=foobar;AccountKey=456;EndpointSuffix=core.windows.net;");

// test with special characters
Nit: the comment is not a complete sentence.
{{"fs.azure.account.key.test.dfs.core.windows.net", key_},
{kAzureBlobEndpoint, endpoint}});

// Update the default config map with the supplied configOverride map
Nit: missing a trailing period. Or do we even need this comment?
I updated the function documentation instead.
virtual ~AzuriteServer();

private:
int64_t port_;
std::string account_{"test"};
Maybe all four of these new members could be const?
I thought we don't cache data and send by chunk. The
@czentgr thanks for the review. I addressed your comments.
Filed #11456 for the write improvements.
Looks good. Just one nit.
filePath_ = tempFile->getPath();
}

MockDataLakeFileClient(std::string_view filePath) : filePath_(filePath) {}

std::string_view path() {
Nit: this could be std::string_view path() const.
done!
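As a small illustration of the nit above, a const-qualified accessor lets callers read the path through a const reference. The class below is a hypothetical stand-in and not the actual MockDataLakeFileClient from the PR.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for the mock client discussed above.
class MockClient {
 public:
  explicit MockClient(std::string_view filePath) : filePath_(filePath) {}

  // const: reading the path does not modify the object. The returned view
  // refers to member storage and is valid only while the object is alive.
  std::string_view path() const {
    return filePath_;
  }

 private:
  const std::string filePath_;
};

// Compiles only because path() is const-qualified.
inline std::string_view describe(const MockClient& client) {
  return client.path();
}
```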
3502540
to
48ea8d7
Compare
This needs a maintainer to approve, cc: @Yuhta @xiaoxmeng https://velox-lib.io/docs/community/components-and-maintainers/
@kevinwilfong I am the maintainer for the storage_adapters and I approve this :).
I just double checked with the PLC and it sounds like that's not sufficient.
48ea8d7
to
5475558
Compare
@zhli1142015 Can you please take a look and approve this PR?
Combine AbfsAccount and AbfsConfig in a separate file.
Clean up API naming and clarify semantics.
Add a new constructor for AbfsWriteFile to specify a client. This is used for testing.
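
The client-injecting constructor described in the last point can be sketched roughly as follows. All names here (`FileClient`, `InMemoryFileClient`, `WriteFile`) are hypothetical and do not reflect the Velox or Azure SDK interfaces. Flushing on close mirrors the behavior mentioned in the review. The client is held by unique_ptr, as discussed in the review thread.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical client interface; the real one wraps the Azure SDK client.
class FileClient {
 public:
  virtual ~FileClient() = default;
  virtual void append(std::string_view data) = 0;
  virtual void flush() = 0;
};

// Test double: buffers appended data and publishes it to `sink` on flush.
class InMemoryFileClient : public FileClient {
 public:
  explicit InMemoryFileClient(std::string& sink) : sink_(sink) {}
  void append(std::string_view data) override {
    pending_.append(data);
  }
  void flush() override {
    sink_.append(pending_);
    pending_.clear();
  }

 private:
  std::string& sink_;
  std::string pending_;
};

// Hypothetical write file that takes ownership of an injected client.
class WriteFile {
 public:
  explicit WriteFile(std::unique_ptr<FileClient> client)
      : client_(std::move(client)) {}

  void append(std::string_view data) {
    client_->append(data);
    size_ += data.size();
  }

  // Flushes once; repeated calls are no-ops.
  void close() {
    if (!closed_) {
      client_->flush();
      closed_ = true;
    }
  }

  uint64_t size() const {
    return size_;
  }

 private:
  std::unique_ptr<FileClient> client_;
  uint64_t size_{0};
  bool closed_{false};
};
```

A test can then construct `WriteFile` with the in-memory client and inspect the sink after `close()`, without a running storage emulator.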