Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session.virtualfile_in: Remove the "extra_arrays" parameter #3823

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

seisman
Copy link
Member

@seisman seisman commented Feb 26, 2025

For GMT modules that accept table inputs, the number of required columns can vary significantly. Most modules typically need between one and three columns. For instance:

  • The histogram module requires only one column but can optionally accept a second column for weighting.
  • The rose module needs two columns.
  • The contour module requires three columns.

For these modules, the table input can be provided either through the data parameter or via the x, y, and z parameters. This is why the Session.virtualfile_in method includes parameters like data, x, y, and z.

However, some modules may require more than three columns. For example, the plot module can accept over 10 columns depending on the options specified. To accommodate these cases, the Session.virtualfile_in method currently includes an extra_arrays parameter, which is a list of additional vectors beyond x, y, and z. This requires users to define a list and append extra columns to it.

This PR proposes removing the extra_arrays parameter. Instead of passing x, y, z, and extra_arrays, we can simplify the process by using a dictionary of vectors, i.e., when data kind is "empty", create a dict like below and pass it directly to Session.virtualfile_in:

data = {
  "x": x, 
  "y": y,
  "z": z,
  "fill": fill,
  "transparency": transparency
}

The dict-like data can be thought as a lightweight pandas.DataFrame object, and it will be recognized as the "vector" kind in data_kind and Session.virtualfile_in. This approach enhances flexibility and usability, especially for modules requiring a larger number of input columns.

Changes in this PR:

  • Remove the extra_arrays parameters from Session.virtualfile_in. The x/y/z parameters are still kept since for some modules passing x/y/z vectors is easier that passing a dict of x/y/z
  • Refactor Figure.plot/Figure.plot3d/Figure.text to get rid of extra_arrays
  • In _validate_data_input, add extra checks to ensure that the dict values are not None. Please note that I plan to refactor/simplify the _validate_data_input function in a separate PR (i.e., Refactor _validate_data_input to simplify the codes #3818)
  • Add a new test for Figure.plot3d to ensure that an error is raise when z is not provided.

Strictly speaking, this is a breaking change for users who call Session.virtualfile_in, so let me know if you think we should raise warnings when extra_arrays is used and keep backward compatibility for a few versions (maybe 4 versions).

@seisman seisman force-pushed the refactor/virtualfile_in_extra_arrays branch from ad6b015 to 9733c53 Compare February 26, 2025 07:42
@seisman seisman force-pushed the refactor/virtualfile_in_extra_arrays branch from 1ff0de4 to df88e02 Compare March 3, 2025 06:41
@seisman seisman marked this pull request as ready for review March 3, 2025 08:30
@seisman seisman added deprecation Deprecating a feature needs review This PR has higher priority and needs review. labels Mar 3, 2025
@@ -143,6 +144,15 @@ def _validate_data_input(
raise GMTInvalidInput(msg)
if hasattr(data, "data_vars") and len(data.data_vars) < 3: # xr.Dataset
raise GMTInvalidInput(msg)
if kind == "vectors" and isinstance(data, dict):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that, these new codes in _validate_data_input is temporary and I plan to refactor _validate_data_input in a separate PR.

@seisman seisman added this to the 0.15.0 milestone Mar 3, 2025
@seisman
Copy link
Member Author

seisman commented Mar 3, 2025

Hopefully @weiji14 will have some time reviewing this PR.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @seisman, super busy with work so might not be able to give this a closer review until end of March or so. But just left two comments below for now.

@@ -1794,9 +1793,6 @@ def virtualfile_in(
data input.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add dict as a possible type of data to L1790

Comment on lines 1772 to 1776
x=None,
y=None,
z=None,
extra_arrays=None,
required_z=False,
required_data=True,
Copy link
Member

@weiji14 weiji14 Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should raise warnings when extra_arrays is used and keep backward compatibility for a few versions (maybe 4 versions).

If we're going to break compatibility of virtualfile_in for 4 minor versions, maybe it's a good time to rethink the parameter ordering (ties in with potential changes in #3369). Would it make sense for the signature to be something like:

def virtualfile_in(
    self,
    check_kind=None,
    required_data=True,
    required_z=False,
    data=None,
    x=None,
    y=None,
    z=None,
    **extra_arrays

? We could also mark some of those parameters as keyword only (https://peps.python.org/pep-0570/#keyword-only-arguments) to ensure future compatibility (i.e. prevent positional-only parameters/arguments in case we decide to change the order again).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #3836 for my thoughts on the Session.virtualfile_in method.

@seisman seisman modified the milestones: 0.15.0, 0.16.0 Mar 6, 2025
@seisman seisman removed the needs review This PR has higher priority and needs review. label Mar 6, 2025
@seisman
Copy link
Member Author

seisman commented Mar 6, 2025

Sorry @seisman, super busy with work so might not be able to give this a closer review until end of March or so. But just left two comments below for now.

As mentioned in #3836, I plan to make many refactors (likely backward-incompatible) to the Session.virtualfile_in method. It's better to have all these refactors appear in a single release. So I've bumped the milestone to v0.16.0.

@seisman seisman marked this pull request as draft March 6, 2025 05:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation Deprecating a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants