Merged
6 changes: 3 additions & 3 deletions components/homme/src/share/cxx/HybridVCoord.hpp
@@ -107,11 +107,11 @@ class HybridVCoord

auto dp_ij = Homme::subview(dp,igp,jgp);

- ColumnOps::column_reduction<NUM_PHYSICAL_LEV>(kv,dp_ij,ps(igp,jgp));
- kv.team_barrier();
+ Real ps_val = 0;
+ ColumnOps::column_reduction<NUM_PHYSICAL_LEV>(kv,dp_ij,ps_val);

Kokkos::single(Kokkos::PerThread(kv.team),[&](){
- ps(igp,jgp) += hybrid_ai0*ps0;
+ ps(igp,jgp) = ps_val + hybrid_ai0*ps0;
});
});
kv.team_barrier();
11 changes: 0 additions & 11 deletions components/homme/src/theta-l_kokkos/cxx/CaarFunctorImpl.hpp
@@ -343,8 +343,6 @@ struct CaarFunctorImpl {

set_rk_stage_data(data);

profiling_resume();

GPTLstart("caar compute");
int nerr;
Kokkos::parallel_reduce("caar loop pre-boundary exchange", m_policy_pre, *this, nerr);
@@ -367,7 +365,6 @@ struct CaarFunctorImpl {

limiter.run(data.np1);

profiling_pause();
}

KOKKOS_INLINE_FUNCTION
@@ -550,20 +547,14 @@ struct CaarFunctorImpl {
Kokkos::single(Kokkos::PerThread(kv.team),[&]() {
pi_i(0)[0] = m_hvcoord.ps0*m_hvcoord.hybrid_ai0;
});
kv.team_barrier();

ColumnOps::column_scan_mid_to_int<true>(kv,dp,pi_i);

ColumnOps::compute_midpoint_values(kv,pi_i,pi);

// Barrier so that the buffer shared by pi_i and omega_i is free for
// omega_i to use.
kv.team_barrier();

Kokkos::single(Kokkos::PerThread(kv.team),[&]() {
omega_i(0)[0] = 0.0;
});
kv.team_barrier();

ColumnOps::column_scan_mid_to_int<true>(kv,div_vdp,omega_i);
// Average omega_i to midpoints, and change sign, since later
@@ -1225,7 +1216,6 @@ struct CaarFunctorImpl {
ColumnOps::compute_midpoint_values(kv,prod_x,mgrad_x);
ColumnOps::compute_midpoint_values(kv,prod_y,mgrad_y);
}
kv.team_barrier();

// Apply pgrad_correction: mgrad += cp*T0*(grad(log(exner))-grad(exner)/exner) (if applicable)
if (m_pgrad_correction) {
@@ -1244,7 +1234,6 @@ struct CaarFunctorImpl {
mgrad_y(ilev) += cp*T0*(grad_tmp_i_y(ilev) - grad_exner_i_y(ilev)/exner_i(ilev));
});
}
kv.team_barrier();

// Compute KE. Also, add fcor to vort
auto u = Homme::subview(m_state.m_v,kv.ie,m_data.n0,0,igp,jgp);
10 changes: 1 addition & 9 deletions components/homme/src/theta-l_kokkos/cxx/LimiterFunctor.hpp
@@ -99,15 +99,11 @@ struct LimiterFunctor {

void run (const int& tl)
{
profiling_resume();

GPTLstart("caar limiter");
m_np1 = tl;
Kokkos::parallel_for("caar loop dp3d limiter", m_policy_dp3d_lim, *this);
Kokkos::fence();
GPTLstop("caar limiter");

profiling_pause();
}

KOKKOS_INLINE_FUNCTION
@@ -130,8 +126,6 @@ struct LimiterFunctor {
diff(ilev) = (dp(ilev) - m_dp3d_thresh*dp0(ilev))*spheremp;
});

kv.team_barrier();

Real min_diff = Kokkos::reduction_identity<Real>::min();
auto diff_as_real = Homme::viewAsReal(diff);
auto dp_as_real = Homme::viewAsReal(dp);
@@ -168,8 +162,6 @@ struct LimiterFunctor {
});
}

kv.team_barrier();

Comment on lines -171 to -172
Member Author

Moving this barrier out of the if (min_diff<0) { branch to avoid possible stalls while waiting for threads that take the else branch.

Member

This is a tricky issue. It's definitely wrong to have a team_barrier in a conditional like this, since it can lead to deadlock. But if NUM_LEV != NUM_PHYSICAL_LEV, I think there can be a race condition in the parallel_for-parallel_reduce sequence that this team_barrier is separating. On the GPU, NUM_LEV == NUM_PHYSICAL_LEV, and on the CPU, we don't actually have threading here in practice, so this point is not relevant in practice.

Nonetheless, I had a branch going a long time ago that fixed this type of issue in a few spots in HOMME, but I ended up getting distracted by other machine issues and never got this branch in. It might be worth consulting this commit to see how to keep this team_barrier. (The ComposeTransport change is no longer needed, but the three others are still relevant, although again only in threading/vectorization configurations we don't run in practice.)

The changes in this PR are fine if keeping them is preferred.
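
To illustrate the hazard described above outside of Kokkos, here is a minimal sketch using plain C++ threads. `TeamBarrier` and `limiter_sketch` are hypothetical names, not HOMME or Kokkos APIs, and the hand-rolled barrier only stands in for `kv.team_barrier()`. Each thread writes its own entry, some threads take a conditional branch, and the barrier sits after the conditional so that every thread reaches it before one thread reads the whole array.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a team barrier: blocks until `count` threads arrive.
class TeamBarrier {
public:
  explicit TeamBarrier(int count) : m_count(count), m_waiting(0), m_generation(0) {}

  void arrive_and_wait() {
    std::unique_lock<std::mutex> lock(m_mutex);
    const int gen = m_generation;
    if (++m_waiting == m_count) {
      // Last arrival releases everyone and resets the barrier for reuse.
      m_waiting = 0;
      ++m_generation;
      m_cv.notify_all();
    } else {
      m_cv.wait(lock, [&] { return gen != m_generation; });
    }
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  const int m_count;
  int m_waiting;
  int m_generation;
};

// One thread per entry (an assumption made for illustration only).
double limiter_sketch(const std::vector<double>& dp, double thresh) {
  assert(!dp.empty());
  const int n = static_cast<int>(dp.size());
  std::vector<double> diff(n);
  TeamBarrier barrier(n);
  double sum = 0.0;

  std::vector<std::thread> team;
  for (int i = 0; i < n; ++i) {
    team.emplace_back([&, i] {
      diff[i] = dp[i] - thresh;
      if (diff[i] < 0) {
        diff[i] = 0.0;  // only some threads take this branch
        // A barrier placed here would hang: threads that skip the
        // branch would never arrive at it.
      }
      barrier.arrive_and_wait();  // unconditional: every thread arrives
      if (i == 0) {
        // Safe to read all entries: every write happened before the barrier.
        for (double d : diff) sum += d;
      }
    });
  }
  for (auto& t : team) t.join();
  return sum;
}
```

Calling `limiter_sketch({3.0, 1.0, 5.0, 0.5}, 2.0)` clamps the two negative differences to zero and sums the rest. The sketch diverges per thread, which is a simplification; whether the actual `min_diff < 0` test diverges within a team depends on how the preceding reduction is performed.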

Contributor

Andrew, I think we should get your commit in (at least, the part that fixes the team barrier). While I agree that in practice it doesn't make a difference now, we don't know if our threading schemes will change 5 years from now, and catching this kind of bug would take time. Better to put in the correct code now, and be safe later.

Contributor

Btw, I thought @oksanaguba had fixed this as well in one of her branches a while ago, no?

Member

> I think we should get your commit in

Ok. For clarity, I'm not on this project this year, so someone else would need to bring in the changes.

// This loop must be done over physical levels, unless we implement
// masks, like it has been done in the E3SM/scream project
Real mass_new = 0.0;
@@ -193,7 +185,7 @@ struct LimiterFunctor {
dp(ilev) = diff(ilev)/spheremp + m_dp3d_thresh*dp0(ilev);
vtheta_dp(ilev) *= dp(ilev);
});
- } //end of min_diff < 0
+ } // end of min_diff < 0

Kokkos::parallel_for(Kokkos::ThreadVectorRange(kv.team,NUM_LEV),
[&](const int ilev) {