- 
                Notifications
    
You must be signed in to change notification settings  - Fork 37
 
Pandas simplification v2 #826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| 
           From previous PR: This PR looks to improve the handling for converting a topology to a dataframe. This currently lives as a method for topology. It is now being moved to a convert_dataframe.py module. A few different formats are available which give some nice default ways to view a topology. Notably, we have the formats: There is also an added function that allows you to generate dataframes that cover the parameters for a set of topologies. Finally, there will be some function that prints the dataframes with the rdkit mols which are labeled to match the dataframes. TODO Checklist: 
  | 
    
| 
           From discussion with @Vtsoch, there are a few more general use cases that would be nice to have working in the arguments for the main function, to_dataframeDict. dfDict = to_dataframeDict(ptop, parameters="sites", columns=["name", "atom_type.name", "atom_type.parameters", "charge", "molecule.name"], format="remove_duplicates")should be put as an example, since it could be hard to find the molecule info or parameters info if you don't know how the parsing works of these attributes. I would even consider these attributes to be in the  dfDict = to_dataframeDict(ptop, parameters=["sites", "bonds"], columns=["name"], format="specific_columns")This will fail currently. However, I think it should just go through the columns and only grab attributes that exist, skipping the others.  | 
    
          Codecov ReportAttention: Patch coverage is  
 
 Additional details and impacted files@@            Coverage Diff             @@
##             main     #826      +/-   ##
==========================================
- Coverage   94.07%   93.31%   -0.77%     
==========================================
  Files          65       66       +1     
  Lines        6953     7088     +135     
==========================================
+ Hits         6541     6614      +73     
- Misses        412      474      +62     ☔ View full report in Codecov by Sentry.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple comments.
| format: str = "default", | ||
| columns: list[str] = None, | ||
| handle_unyts: str = "to_headers", | ||
| ) -> pd.DataFrame: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be dict correct?
| ``` | ||
| 
               | 
          ||
| 
               | 
          ||
| >>> gmso.external.convert_dataframe.to_dataframeDict(ptop, parameters='sites', columns=['charge'], handle_unyts="to_headers") | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Zeerakkhan47 I think you could expand the information passed into the columns parameter here to include some of the things @CalCraven mention in his comment on June 16th.
This is a replacement PR for the features requested in #814, which has some commits in the merge history that deviated when switching to ruff linting.