-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flang][MLIR][OpenMP] - Add support for firstprivate when translating omp.target ops from MLIR to LLVMIR #131213
base: main
Are you sure you want to change the base?
[Flang][MLIR][OpenMP] - Add support for firstprivate when translating omp.target ops from MLIR to LLVMIR #131213
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-flang-openmp Author: Pranav Bhandarkar (bhandarkar-pranav) ChangesThis patch adds support to translate Full diff: https://github.com/llvm/llvm-project/pull/131213.diff 5 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 781b0dfceff9e..59d4a5563d3de 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -25,6 +25,7 @@
#include "flang/Semantics/attr.h"
#include "flang/Semantics/tools.h"
+#define DEBUG_TYPE "flang-data-sharing"
namespace Fortran {
namespace lower {
namespace omp {
@@ -548,7 +549,6 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
}
mlir::Type argType = privVal.getType();
-
mlir::omp::PrivateClauseOp privatizerOp = [&]() {
auto moduleOp = firOpBuilder.getModule();
auto uniquePrivatizerName = fir::getTypeAsString(
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index b1568cc12a05a..0094abe044b86 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1666,13 +1666,19 @@ static void genTargetClauses(
cp.processNowait(clauseOps);
cp.processThreadLimit(stmtCtx, clauseOps);
- cp.processTODO<clause::Allocate, clause::Defaultmap, clause::Firstprivate,
- clause::InReduction, clause::UsesAllocators>(
- loc, llvm::omp::Directive::OMPD_target);
+ cp.processTODO<clause::Allocate, clause::Defaultmap, clause::InReduction,
+ clause::UsesAllocators>(loc,
+ llvm::omp::Directive::OMPD_target);
// `target private(..)` is only supported in delayed privatization mode.
if (!enableDelayedPrivatizationStaging)
cp.processTODO<clause::Private>(loc, llvm::omp::Directive::OMPD_target);
+
+ // We do not yet have MLIR to LLVMIR translation for privatization in
+ // for deferred target tasks.
+ if (clauseOps.nowait)
+ cp.processTODO<clause::Private, clause::Firstprivate>(
+ loc, llvm::omp::Directive::OMPD_target);
}
static void genTargetDataClauses(
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index f5a8a7ba04375..bac62da067db5 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -81,7 +81,7 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
There are no restrictions on the body except for:
- The `dealloc` regions has a single argument.
- - The `init & `copy` regions have 2 arguments.
+ - The `init` & `copy` regions have 2 arguments.
- All three regions are terminated by `omp.yield` ops.
The above restrictions and other obvious restrictions (e.g. verifying the
type of yielded values) are verified by the custom op verifier. The actual
@@ -90,15 +90,15 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
Instances of this op would then be used by ops that model directives that
accept data-sharing attribute clauses.
- The $sym_name attribute provides a symbol by which the privatizer op can be
+ The `sym_name` attribute provides a symbol by which the privatizer op can be
referenced by other dialect ops.
- The $type attribute is the type of the value being privatized. This type
+ The `type` attribute is the type of the value being privatized. This type
will be implicitly allocated in MLIR->LLVMIR conversion and passed as the
second argument to the init region. Therefore the type of arguments to
- the regions should be a type which represents a pointer to $type.
+ the regions should be a type which represents a pointer to `type`.
- The $data_sharing_type attribute specifies whether privatizer corresponds
+ The `data_sharing_type` attribute specifies whether privatizer corresponds
to a `private` or a `firstprivate` clause.
}];
@@ -161,9 +161,10 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]>
/// needsMap returns true if the value being privatized should additionally
/// be mapped to the target region using a MapInfoOp. This is most common
/// when an allocatable is privatized. In such cases, the descriptor is used
- /// in privatization and needs to be mapped on to the device.
+ /// in privatization and needs to be mapped on to the device. The use of
+ /// firstprivate also causes the need to map the host variable to the device.
bool needsMap() {
- return initReadsFromMold();
+ return readsFromMold();
}
/// Get the type for arguments to nested regions. This should
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 32c7c501d03c3..87b7ab9b6b0a3 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -44,6 +44,7 @@
#include <optional>
#include <utility>
+#define DEBUG_TYPE "openmp-to-llvmir"
using namespace mlir;
namespace {
@@ -211,19 +212,9 @@ static LogicalResult checkImplementationStatus(Operation &op) {
};
auto checkPrivate = [&todo](auto op, LogicalResult &result) {
if constexpr (std::is_same_v<std::decay_t<decltype(op)>, omp::TargetOp>) {
- // Privatization clauses are supported, except on some situations, so we
- // need to check here whether any of these unsupported cases are being
- // translated.
- if (std::optional<ArrayAttr> privateSyms = op.getPrivateSyms()) {
- for (Attribute privatizerNameAttr : *privateSyms) {
- omp::PrivateClauseOp privatizer = findPrivatizer(
- op.getOperation(), cast<SymbolRefAttr>(privatizerNameAttr));
-
- if (privatizer.getDataSharingType() ==
- omp::DataSharingClauseType::FirstPrivate)
- result = todo("firstprivate");
- }
- }
+ // Privatization is supported only for include target tasks.
+ if (!op.getPrivateVars().empty() && op.getNowait())
+ result = todo("privatization for deferred target tasks");
} else {
if (!op.getPrivateVars().empty() || op.getPrivateSyms())
result = todo("privatization");
@@ -4810,6 +4801,10 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
.failed())
return llvm::make_error<PreviouslyReportedError>();
+ if (failed(copyFirstPrivateVars(builder, moduleTranslation, mlirPrivateVars,
+ llvmPrivateVars, privateDecls)))
+ return llvm::make_error<PreviouslyReportedError>();
+
SmallVector<Region *> privateCleanupRegions;
llvm::transform(privateDecls, std::back_inserter(privateCleanupRegions),
[](omp::PrivateClauseOp privatizer) {
diff --git a/mlir/test/Target/LLVMIR/openmp-target-private.mlir b/mlir/test/Target/LLVMIR/openmp-target-private.mlir
index f97360e2c6e84..0e651482647f0 100644
--- a/mlir/test/Target/LLVMIR/openmp-target-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-target-private.mlir
@@ -18,11 +18,6 @@ llvm.func @target_map_single_private() attributes {fir.internal_name = "_QPtarge
}
llvm.return
}
-// CHECK: define internal void @__omp_offloading_
-// CHECK-NOT: define {{.*}}
-// CHECK: %[[PRIV_ALLOC:.*]] = alloca i32, align 4
-// CHECK: %[[ADD:.*]] = add i32 {{.*}}, 10
-// CHECK: store i32 %[[ADD]], ptr %[[PRIV_ALLOC]], align 4
omp.private {type = private} @n.privatizer : f32
@@ -50,16 +45,6 @@ llvm.func @target_map_2_privates() attributes {fir.internal_name = "_QPtarget_ma
}
-// CHECK: define internal void @__omp_offloading_
-// CHECK: %[[PRIV_I32_ALLOC:.*]] = alloca i32, align 4
-// CHECK: %[[PRIV_FLOAT_ALLOC:.*]] = alloca float, align 4
-// CHECK: %[[ADD_I32:.*]] = add i32 {{.*}}, 10
-// CHECK: store i32 %[[ADD_I32]], ptr %[[PRIV_I32_ALLOC]], align 4
-// CHECK: %[[LOAD_I32_AGAIN:.*]] = load i32, ptr %[[PRIV_I32_ALLOC]], align 4
-// CHECK: %[[CAST_TO_FLOAT:.*]] = sitofp i32 %[[LOAD_I32_AGAIN]] to float
-// CHECK: %[[ADD_FLOAT:.*]] = fadd contract float %[[CAST_TO_FLOAT]], 1.100000e+01
-// CHECK: store float %[[ADD_FLOAT]], ptr %[[PRIV_FLOAT_ALLOC]], align 4
-
// An entirely artifical privatizer that is meant to check multi-block
// privatizers. The idea here is to prove that we set the correct
// insertion points for the builder when generating, first, LLVM IR for the
@@ -79,10 +64,6 @@ llvm.func @target_op_private_multi_block(%arg0: !llvm.ptr) {
}
llvm.return
}
-// CHECK: define internal void @__omp_offloading_
-// CHECK: %[[PRIV_ALLOC:.*]] = alloca float, align 4
-// CHECK: %[[PHI_ALLOCA:.*]] = phi ptr [ %[[PRIV_ALLOC]], {{.*}} ]
-// CHECK: %[[RESULT:.*]] = load float, ptr %[[PHI_ALLOCA]], align 4
// Descriptors are needed for CHARACTER arrays and their type is
// !fir.boxchar<KIND>. When such arrays are used in the private construct, the
@@ -165,10 +146,101 @@ llvm.func @llvm.memmove.p0.p0.i64(!llvm.ptr, !llvm.ptr, i64, i1) attributes {sym
-// CHECK: define internal void @__omp_offloading_{{.*}}(ptr %{{[^,]+}}, ptr %[[MAPPED_ARG:.*]]) {
+omp.private {type = firstprivate} @sf.firstprivate : f32 copy {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+ %0 = llvm.load %arg0 : !llvm.ptr -> f32
+ llvm.store %0, %arg1 : f32, !llvm.ptr
+ omp.yield(%arg1 : !llvm.ptr)
+}
+omp.private {type = firstprivate} @sv.firstprivate : i32 copy {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+ %0 = llvm.load %arg0 : !llvm.ptr -> i32
+ llvm.store %0, %arg1 : i32, !llvm.ptr
+ omp.yield(%arg1 : !llvm.ptr)
+}
+llvm.func @target_simple_() attributes {fir.internal_name = "_QPtarget_simple"} {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %sv = llvm.alloca %0 x i32 {bindc_name = "sv"} : (i64) -> !llvm.ptr
+ %sf = llvm.alloca %0 x f32 {bindc_name = "sf"} : (i64) -> !llvm.ptr
+ %4 = llvm.mlir.constant(1 : i64) : i64
+ %5 = llvm.mlir.constant(1 : i64) : i64
+ %6 = omp.map.info var_ptr(%sv : !llvm.ptr, i32) map_clauses(to) capture(ByRef) -> !llvm.ptr
+ %7 = omp.map.info var_ptr(%sf : !llvm.ptr, f32) map_clauses(to) capture(ByRef) -> !llvm.ptr
+ omp.target map_entries(%6 -> %arg0, %7 -> %arg1 : !llvm.ptr, !llvm.ptr) private(@sv.firstprivate %sv -> %arg2 [map_idx=0], @sf.firstprivate %sf -> %arg3 [map_idx=1] : !llvm.ptr, !llvm.ptr) {
+ %8 = llvm.mlir.constant(2.000000e+00 : f64) : f64
+ %9 = llvm.mlir.constant(10 : i32) : i32
+ %10 = llvm.load %arg2 : !llvm.ptr -> i32
+ %11 = llvm.add %10, %9 : i32
+ llvm.store %11, %arg2 : i32, !llvm.ptr
+ %12 = llvm.load %arg3 : !llvm.ptr -> f32
+ %13 = llvm.fpext %12 : f32 to f64
+ %14 = llvm.fadd %13, %8 {fastmathFlags = #llvm.fastmath<contract>} : f64
+ %15 = llvm.fptrunc %14 : f64 to f32
+ llvm.store %15, %arg3 : f32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
+}
+// CHECK: define void @target_map_single_private() {
+// CHECK: call void @__omp_offloading_[[MAP_SINGLE_PRIVATE_OFFLOADED_FUNCTION:.*]](ptr {{.*}})
+// CHECK: define void @target_map_2_privates() {
+// CHECK: call void @__omp_offloading_[[MAP_2_PRIVATES_OFFLOADED_FUNCTION:.*]](ptr {{.*}})
+// CHECK: define void @target_op_private_multi_block
+// CHECK: call void @__omp_offloading_[[PRIVATE_MULTI_BLOCK_OFFLOADED_FUNCTION:.*]]()
+// CHECK: define void @target_boxchar_
+// CHECK: call void @__omp_offloading_[[BOXCHAR_OFFLOADED_FUNCTION:.*]](ptr {{.*}}, ptr {{.*}})
+// CHECK: define void @target_simple_()
+// CHECK: call void @__omp_offloading_[[SIMPLE_OFFLOADED_FUNCTION:.*]](ptr {{.*}}, ptr {{.*}})
+
+// CHECK: define internal void @__omp_offloading_[[MAP_SINGLE_PRIVATE_OFFLOADED_FUNCTION]]
+// CHECK: %[[PRIV_ALLOC:.*]] = alloca i32, align 4
+// CHECK: %[[ADD:.*]] = add i32 {{.*}}, 10
+// CHECK: store i32 %[[ADD]], ptr %[[PRIV_ALLOC]], align 4
+
+
+
+
+// CHECK: define internal void @__omp_offloading_[[MAP_2_PRIVATES_OFFLOADED_FUNCTION]]
+// CHECK: %[[PRIV_I32_ALLOC:.*]] = alloca i32, align 4
+// CHECK: %[[PRIV_FLOAT_ALLOC:.*]] = alloca float, align 4
+// CHECK: %[[ADD_I32:.*]] = add i32 {{.*}}, 10
+// CHECK: store i32 %[[ADD_I32]], ptr %[[PRIV_I32_ALLOC]], align 4
+// CHECK: %[[LOAD_I32_AGAIN:.*]] = load i32, ptr %[[PRIV_I32_ALLOC]], align 4
+// CHECK: %[[CAST_TO_FLOAT:.*]] = sitofp i32 %[[LOAD_I32_AGAIN]] to float
+// CHECK: %[[ADD_FLOAT:.*]] = fadd contract float %[[CAST_TO_FLOAT]], 1.100000e+01
+// CHECK: store float %[[ADD_FLOAT]], ptr %[[PRIV_FLOAT_ALLOC]], align 4
+
+// CHECK: define internal void @__omp_offloading_[[PRIVATE_MULTI_BLOCK_OFFLOADED_FUNCTION]]
+// CHECK: %[[PRIV_ALLOC:.*]] = alloca float, align 4
+// CHECK: %[[PHI_ALLOCA:.*]] = phi ptr [ %[[PRIV_ALLOC]], {{.*}} ]
+// CHECK: %[[RESULT:.*]] = load float, ptr %[[PHI_ALLOCA]], align 4
+
+
+// CHECK: define internal void @__omp_offloading_[[BOXCHAR_OFFLOADED_FUNCTION]](ptr %{{[^,]+}}, ptr %[[MAPPED_ARG:.*]]) {
// CHECK: %[[BOXCHAR:.*]] = load { ptr, i64 }, ptr %[[MAPPED_ARG]]
// CHECK: %[[BOXCHAR_PTR:.*]] = extractvalue { ptr, i64 } %[[BOXCHAR]], 0
// CHECK: %[[BOXCHAR_i64:.*]] = extractvalue { ptr, i64 } %[[BOXCHAR]], 1
// CHECK: %[[MEM_ALLOC:.*]] = alloca i8, i64 %[[BOXCHAR_i64]]
// CHECK: %[[PRIV_BOXCHAR0:.*]] = insertvalue { ptr, i64 } undef, ptr %[[MEM_ALLOC]], 0
// CHECK: %[[PRIV_BOXCHAR1:.*]] = insertvalue { ptr, i64 } %[[PRIV_BOXCHAR0]], i64 %[[BOXCHAR_i64]], 1
+
+
+// CHECK: define internal void @__omp_offloading_[[SIMPLE_OFFLOADED_FUNCTION]](ptr %[[SV:.*]], ptr %[[SF:.*]])
+// CHECK: entry:
+// CHECK-NEXT: %[[SV_PRIV_ALLOCA:.*]] = alloca i32, align 4
+// CHECK-NEXT: %[[SF_PRIV_ALLOCA:.*]] = alloca float, align 4
+// CHECK: omp.private.copy:
+// CHECK-NEXT: %[[INIT_SV:.*]] = load i32, ptr %[[SV]], align 4
+// CHECK-NEXT: store i32 %[[INIT_SV]], ptr %[[SV_PRIV_ALLOCA]], align 4
+// CHECK: %[[INIT_SF:.*]] = load float, ptr %[[SF]], align 4
+// CHECK-NEXT store float %[[INIT_SF]], ptr %[[SF_PRIV_ALLOCA]], align 4
+// CHECK: omp.target
+// CHECK: %[[LOAD_SV:.*]] = load i32, ptr %[[SV_PRIV_ALLOCA]], align 4
+// CHECK-NEXT: %[[ADD_SV:.*]] = add i32 %[[LOAD_SV]], 10
+// CHECK-NEXT: store i32 %[[ADD_SV]], ptr %[[SV_PRIV_ALLOCA]], align 4
+// CHECK: %[[LOAD_SF:.*]] = load float, ptr %[[SF_PRIV_ALLOCA]], align 4
+// CHECK-NEXT: %[[SF_EXT:.*]] = fpext float %[[LOAD_SF]] to double
+// CHECK-NEXT: %[[ADD_SF:.*]] = fadd contract double %[[SF_EXT]], 2.000000e+00
+// CHECK-NEXT: %[[TRUNC_SF:.*]] = fptrunc double %[[ADD_SF]] to float
+// CHECK-NEXT: store float %[[TRUNC_SF]], ptr %[[SF_PRIV_ALLOCA]], align 4
+
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Pranav! Just a few small comments from my side.
flang/lib/Lower/OpenMP/OpenMP.cpp
Outdated
loc, llvm::omp::Directive::OMPD_target); | ||
cp.processTODO<clause::Allocate, clause::Defaultmap, clause::InReduction, | ||
clause::UsesAllocators>(loc, | ||
llvm::omp::Directive::OMPD_target); | ||
|
||
// `target private(..)` is only supported in delayed privatization mode. | ||
if (!enableDelayedPrivatizationStaging) | ||
cp.processTODO<clause::Private>(loc, llvm::omp::Directive::OMPD_target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we flip the switch and enable delayed privatization for target by default, we need to list firstprivate
here otherwise we will enable it accidently while disabling private
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work so far on this
My end-to-end tests exposed some problems in firstprivate lowering (not just the changes in this PR). I have fixed all but one of them. Once I have fixed the last of these problems, I'll update this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks Pranav.
When privatizing a variable, we need a corresponding map when its PrivateClauseOp reads from the Mold arg. This is possible when either the init reads from the mold or if the copy region is not empty. Until now, we checked only the former case. This patch adds the check for the second case as well.
d13edc9
to
9c10941
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
@@ -92,7 +103,7 @@ class MapsForPrivatizedSymbolsPass | |||
/*varPtrPtr=*/Value{}, | |||
/*members=*/SmallVector<Value>{}, | |||
/*member_index=*/mlir::ArrayAttr{}, | |||
/*bounds=*/ValueRange{}, | |||
/*bounds=*/boundsOps.empty() ? SmallVector<Value>{} : boundsOps, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary. An empty boundsOps
is already an empty SmallVector<Value>
.
} | ||
void genBoundsOps(fir::FirOpBuilder &builder, mlir::Value var, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ultra-nit
} | |
void genBoundsOps(fir::FirOpBuilder &builder, mlir::Value var, | |
} | |
void genBoundsOps(fir::FirOpBuilder &builder, mlir::Value var, |
auto normalizedLB = builder.create<mlir::arith::ConstantOp>( | ||
loc, idxTy, builder.getIntegerAttr(idxTy, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is constant across iterations so you could move it outside of the loop and generate fewer constants like you did with one
.
mlir::Value lb = dimInfo.getLowerBound(); | ||
mlir::Value extent = dimInfo.getExtent(); | ||
mlir::Value byteStride = dimInfo.getByteStride(); | ||
mlir::Value ub = builder.create<mlir::arith::SubIOp>(loc, extent, one); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you subtract 1 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the extent
is the number of elements. So, the upper_bound
is (extent -1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that depend upon what the lower bound is?
This patch adds support to translate
firstprivate
clauses onomp.target
ops when translating from MLIR to LLVMIR. Presently, this PR is restricted to supporting only included tasks, i.e#omp target nowait firstprivate(some_variable)
will likely not work correctly even if it produces object code.