Issue with NA handling in scatter plots: Two NAs per category cause incorrect line connection #2419

Open
marik-ku opened this issue Jan 30, 2025 · 3 comments · May be fixed by #2427
Comments

@marik-ku

Issue Summary

When using plot_ly() in R with a scatter plot (mode = "lines+markers"), missing (NA) values are expected to create gaps in the line plot. However, if exactly two NA values exist per category, the missing values are incorrectly connected by a line instead of creating a gap.

Interestingly, when the hovertemplate is removed, the line plot behaves as expected (i.e., creating a gap for NA values). This issue only occurs when there are exactly two NA values per category; the code works with any other number of NA values.

Additional Discovery:
The issue is resolved if I include the argument split = ~Category, but I cannot find documentation for split in Plotly, which makes me think it may be deprecated. Moreover, when the hovertemplate is removed, the inclusion of split does not work as expected and does not resolve the issue.

Reproducible Example

The following R code demonstrates the issue:

library(plotly)

df <- data.frame(
  Category = rep(c("A", "B"), each = 6),
  Date = c(2020, 2021, 2022, 2023, 2024, 2025, 2020, 2021, 2022, 2023, 2024, 2025),
  Value = c(10, 15, NA, NA, 20, 25, 12, 14, NA, 22, NA, 27)
)

df$Date <- factor(df$Date, levels = unique(df$Date), ordered = TRUE)

plot_ly(
  df,
  x = ~Date,
  y = ~Value,
  color = ~Category,
  type = 'scatter',
  mode = 'lines+markers',
  text = ~Category,
  hovertemplate = paste0("Date: %{x}<br>Category: %{text}")
)

Expected Behavior

  • NA values should create a gap in the line plot, i.e., they should not be connected.
  • This works correctly when there is any number of NA values other than exactly two in any category.

Actual Behavior

  • When there are exactly two NA values per category, the missing values are incorrectly connected by a line instead of creating a gap.
  • When the hovertemplate is removed, the lines create a gap as expected.
  • The issue only arises when there are exactly two NA values per category; any other instance of NA works fine.
  • Including split = ~Category resolves the issue, but:
    • I cannot find any Plotly documentation on split, leading me to believe it may be deprecated.
    • Interestingly, when the hovertemplate is removed, including split does not resolve the issue.
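
For reference, the `split` workaround described above can be written out as follows. This is only a sketch of the reported workaround, not a fix: it reuses the data and arguments from the reproducible example and adds `split = ~Category`. As far as can be told, `split` is a regular argument of `plot_ly()` (it appears in the argument list of `?plotly::plot_ly`, where it creates one trace per value), so it does not seem to be deprecated.

```r
library(plotly)

# Same data as in the reproducible example above
df <- data.frame(
  Category = rep(c("A", "B"), each = 6),
  Date = rep(2020:2025, times = 2),
  Value = c(10, 15, NA, NA, 20, 25, 12, 14, NA, 22, NA, 27)
)
df$Date <- factor(df$Date, levels = unique(df$Date), ordered = TRUE)

# Identical call, plus split = ~Category to request one trace per category
p <- plot_ly(
  df,
  x = ~Date,
  y = ~Value,
  color = ~Category,
  split = ~Category,
  type = 'scatter',
  mode = 'lines+markers',
  text = ~Category,
  hovertemplate = paste0("Date: %{x}<br>Category: %{text}")
)
p
```

According to the report, this restores the gaps only while the `hovertemplate` is present, so it should be treated as a stopgap.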

Additional Notes
This seems to be an issue specifically triggered by the combination of NA handling and the hovertemplate. I would appreciate further insight on why this happens or suggestions for a workaround to preserve the gap in the case of two NA values.

System Info

R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2019 x64 (build 19045)

Matrix products: default

@romanzenka

romanzenka commented Feb 21, 2025

I looked in depth into what is happening, and the generated plotly.js data is genuinely missing the null values. This needs deeper investigation; at the moment it looks like a genuine bug in the R code to me.

@romanzenka

romanzenka commented Feb 21, 2025

I found the culprit: the function traceify in plotly_build splits "traces" by looking for trace attributes of the proper length. However, the hidden parameter .plotlyVariableMapping that is used within the trace does not contain actual values; it contains a list of variable names for the trace. If the length of that variable list matches EXACTLY the number of entries in the trace, the list gets split as well. Here you have 12 data points, 8 are kept once the NAs are dropped, and the number of variables is also 8: x, y, text, hovertemplate, color, .plotlyTraceIndex, .plotlyMissingIndex and .plotlyGroupIndex. When that happens, the list of variables in a trace gets cut into two halves, which switches off grouping on whichever trace no longer contains .plotlyGroupIndex.

This sounds like very fragile code, I need to think more about how this could be fixed. The culprit is https://github.com/plotly/plotly.R/blob/aa619dc2fbc2fa786e15a8d11444a18863661ed4/R/plotly_build.R#L996C57-L996C71 - this needs to be replaced with a more robust check for "is this entry splittable?"
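
To make the failure mode concrete, here is a minimal base R sketch of the length coincidence. It is not plotly's actual code: `subset_naive`, `subset_declared`, the `.vars` entry and the `data_cols` argument are all hypothetical names invented for illustration. The naive rule treats any entry whose length equals the row count as per-row data, so a metadata vector of names is cut along with the real columns; the second rule only touches entries explicitly declared as data.

```r
# Hypothetical illustration only; none of these helpers exist in plotly.
# A trace-like list: two per-row columns plus a metadata entry that merely
# lists variable names and happens to have the same length as the columns.
trace <- list(
  x = c(2020, 2021, 2024, 2025),
  y = c(10, 15, 20, 25),
  .vars = c("x", "y", "color", ".group")
)
keep <- c(TRUE, TRUE, FALSE, FALSE)  # rows belonging to the first sub-trace

# Naive rule: anything whose length equals the row count is per-row data.
subset_naive <- function(trace, keep) {
  n <- length(keep)
  lapply(trace, function(a) if (length(a) == n) a[keep] else a)
}

# More defensive rule: only subset entries declared as data columns.
subset_declared <- function(trace, keep, data_cols) {
  trace[data_cols] <- lapply(trace[data_cols], function(a) a[keep])
  trace
}

naive <- subset_naive(trace, keep)
safe <- subset_declared(trace, keep, data_cols = c("x", "y"))

naive$.vars  # metadata was cut to "x" "y"; ".group" is gone
safe$.vars   # metadata untouched: "x" "y" "color" ".group"
```

Whether an explicit list of data columns is the right check for plotly_build is an open design question; the sketch only shows why a pure length comparison is fragile.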

@romanzenka

Here is a "minimal" counterexample: 7 rows with 1 NA (so the number of rows drops to 6), which matches the 6 variables present without hovertemplate. Add another variable, any variable, and it will start working. You can break it with an arbitrary number of rows, as long as the number of non-NA rows is exactly 6.

df <- data.frame(
    Category = c("A", "A", "A", "A", "B", "B", "B"),
    Date = c(2020, 2021, 2022, 2023, 2020, 2021, 2022),
    Value = c(1, 2, NA, 4, 2, 3, 4)
)

df$Date <- factor(df$Date, levels = unique(df$Date), ordered = TRUE)

plot_ly(
  df,
  x = ~Date,
  y = ~Value,
  color = ~Category,
  type = 'scatter',
  mode = 'lines+markers')

Shows incorrectly

Image

Should show

Image
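
The coincidence behind this counterexample can be checked without plotting anything. The sketch below counts the rows that survive NA removal and compares that with the number of variable names. The name list is an assumption: it is the list from the earlier comment minus text and hovertemplate, which this call does not use.

```r
# Same data as the counterexample above
df <- data.frame(
  Category = c("A", "A", "A", "A", "B", "B", "B"),
  Date = c(2020, 2021, 2022, 2023, 2020, 2021, 2022),
  Value = c(1, 2, NA, 4, 2, 3, 4)
)

# Rows kept once the NA is dropped
n_kept <- sum(!is.na(df$Value))

# Assumed variable names for this call (no text, no hovertemplate)
vars <- c("x", "y", "color",
          ".plotlyTraceIndex", ".plotlyMissingIndex", ".plotlyGroupIndex")

n_kept                  # 6
length(vars)            # 6
n_kept == length(vars)  # TRUE: the condition that triggers the bad split
```

Running the same count on the original example gives 8 kept rows against 8 variables, which matches the explanation above.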

romanzenka added a commit to romanzenka/plotly.R that referenced this issue Feb 21, 2025
If this is not done (or similar), in rare situations the `recurse` function splits the variable list into parts if it happens to match length of the column to split by.

Fixes plotly#2419
@romanzenka romanzenka linked a pull request Feb 21, 2025 that will close this issue