Community Meeting Notes Archive

Martin Fleischmann edited this page Jun 4, 2021 · 5 revisions

The archive of Community Meeting Notes. See the most recent notes and the tentative agenda for the next meeting on HackMD.

2021-05-27

(attending Martin, Joris, Stefanie, Thomas, Brendan)

  • dask-geopandas
    • Dask Summit workshop debrief
    • Google Summer of Code
      • We have one project on dask-geopandas development
        • Logistics:
          • smaller meetings every week, aim for Thurs 4-5 PM UTC; Martin will set up meetings
          • every 2 months a larger GeoPandas meeting
          • use GitHub issues, PRs, GeoPandas gitter, dask-geopandas gitter
          • Martin is admin point of contact
          • Blog posts from GSoC: these will be linked from the NumFOCUS blog
        • Goals:
          • spatial partitioning
            • explore writing out to Parquet?
            • need to figure out partitioning methods, e.g., Hilbert curve
            • probably want to implement a couple methods: Hilbert, maybe a gridded approach
            • first identify some of the options:
              • simple grid
              • known regions (can do spatial clustering for getting more or less homogeneous sized partitions)
              • hilbert curve
              • quadtree: might work well, not exposed yet in GEOS C API / pygeos
              • strtree: don't have access to nodes / leaves via GEOS C API / pygeos
            • storage of partitions
              • right now just polygons as a geoseries
          • spatial indexing
            • also want to make sure this gets done
            • only place this is currently used is for writing to Parquet and cx coordinate indexer
            • good starter PR: simple predicates: intersects; check for overlap with partition first, before checking geometries within partition
      • feedback to rejected projects?
    • NumFOCUS SDG
      • Joris wants to apply for SDG to work on dask-geopandas
      • Focus more on I/O
        • Read large dataset, have dask-geopandas figure out partitioning to files
        • Read index and bounding boxes into memory to drive the partitioning, then use the partition bounding boxes or lists of indexes to query out chunks of data
        • Optimize parquet: store coordinates instead of WKB
        • Feather support? Right now using the dask support for Parquet, not available for Feather in dask; Joris has a prototype Feather file reader for dask
        • Convert GDAL directly to Arrow memory format instead of WKB
          • maybe do directly in GDAL
          • try first in pyogrio
  • GeoPandas Blog
  • API of matrix binary operations
    • https://github.com/geopandas/geopandas/pull/1674
    • We now have an implementation based on sparse matrix which works really well for all the use cases
    • Qs:
      • API
      • which sparse backend? scipy.sparse or pydata sparse?
      • Martin is planning to base the implementation around sparse approach
      • Discuss next time
  • API for interactive plotting
    • https://github.com/geopandas/geopandas/issues/1904
    • We want pluggable interactive plotting backends. How to do it smoothly?
      • interest from some of the plotting backends
      • don't really want global config for plot method
      • want to keep usage of static and interactive plotting separate, don't clobber the static implementation by using interactive plotting; keep these in separate methods
      • add another method: explore / view for interactive maps
    • datashader option to HVplot:
      • works quite well for large data
      • Joris to follow up with them: can the instance check be expanded to include GeoPandas GeoDataFrames (via dask-geopandas), not just spatialpandas frames?
  • Community calls
    • we have a shared Google Calendar for GeoPandas-related events
    • meetings are set to 17:00 UTC every two months (last Thursday)
  • xyzservices
    • new package under geopandas umbrella
    • formerly contextily.providers
    • https://github.com/geopandas/xyzservices
    • planning to have available before next release of geopandas
    • will have 2 JSON formats:
      • pretty version that includes metadata
      • compiled / compressed version that is actually used in code; plan is to create it via a GitHub Action
  • Ecosystem update
    • cuSpatial should fully support geopandas-cuspatial dataframe conversion in the next release
  • Shapely 2.0
    • Joris planning to do more on this in June
    • main blocking issue is the discussion around STRtree
    • Differences in minimum rotated rectangle between Shapely's pure python method and method in GEOS
      • Follow up with GEOS team about differences
      • OpenCV method is the same as Shapely's
      • There is also a method in PostGIS: is it the same?
  • Pyogrio
    • Brendan: transfer to GeoPandas org
  • Other
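
The spatial partitioning and spatial indexing goals listed under Google Summer of Code above can be illustrated with a small pure-Python sketch. It orders points along a Hilbert curve, cuts the ordered list into equally sized partitions, and answers a box query by checking each partition's bounds before looking at anything inside it. Points stand in for geometries here, and `hilbert_distance`, `hilbert_partition` and `query_box` are hypothetical names used for illustration only, not dask-geopandas API.

```python
def hilbert_distance(x, y, order):
    """Distance of grid cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def bounds(points):
    """Bounding box (minx, miny, maxx, maxy) of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def hilbert_partition(points, n_partitions, order=8):
    """Sort points by Hilbert distance and split them into equal-sized chunks."""
    minx, miny, maxx, maxy = bounds(points)
    width = (maxx - minx) or 1.0
    height = (maxy - miny) or 1.0
    cells = (1 << order) - 1

    def key(p):
        # Snap the point to an integer grid cell, then measure along the curve.
        gx = int((p[0] - minx) * cells / width)
        gy = int((p[1] - miny) * cells / height)
        return hilbert_distance(gx, gy, order)

    ordered = sorted(points, key=key)
    size = -(-len(ordered) // n_partitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def query_box(partitions, box):
    """Return the points inside box and the number of partitions scanned."""
    qminx, qminy, qmaxx, qmaxy = box
    hits, scanned = [], 0
    for part in partitions:
        pminx, pminy, pmaxx, pmaxy = bounds(part)
        # Cheap check first: skip partitions whose bounds miss the query box.
        if pmaxx < qminx or pminx > qmaxx or pmaxy < qminy or pminy > qmaxy:
            continue
        scanned += 1
        hits.extend(
            p for p in part
            if qminx <= p[0] <= qmaxx and qminy <= p[1] <= qmaxy
        )
    return hits, scanned


points = [(x, y) for x in range(4) for y in range(4)]
partitions = hilbert_partition(points, n_partitions=4, order=2)
hits, scanned = query_box(partitions, (0, 0, 1, 1))
print(scanned, sorted(hits))
```

Because neighbouring positions on the curve are also neighbours in space, each partition covers a compact area, so the bounds check can rule out most partitions before any per-geometry work is done.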

2021-05-24 GSOC coordination

  • Weekly meetings
  • Use public channels for discussion / questions (github issues, gitter channel, (specific? -> make a dask-geopandas channel))
  • Single Point of Contact (more for administrative questions)
    • Martin
  • Blog: on NumFOCUS & personal site is fine, no need for GeoPandas branded one

2021-03-25

(attending: Martin, Joris, James, Brendan, Sangarshanan, Levi)

  • Google Summer of Code
    • We have submitted 3 project ideas
    • Students should get in touch now and submit proposals within weeks
      • students will start applying next Monday
      • We need to select students between mid-April and mid-May
    • Should we advertise it more? Prospect on possible students?
      • TODO: Post on Twitter again (done)
      • PySAL: primarily recruits from own students; ~1/2 have been affiliated that way
  • Community repository
    • we have a new geopandas/community repo
      • if a topic is not package specific or not specific to code (governance, code of conduct), post it to this repo
      • if specific to GeoPandas post issues to GeoPandas instead
      • use for announcing meetings or proposals (workshops, funding)
    • how should we efficiently use it?
    • https://github.com/geopandas/community
    • TODO: post issue for how to get funding for GeoPandas features or ideas list for potential future grants
  • Community calls
    • shall we switch to some predictable schedule? (Bi-)Monthly?
    • start with bimonthly on last Thursday of each month
      • TODO: post schedule to community repo
    • archive prior call notes to community repo; keep markdown doc for latest meeting
  • dask-geopandas
    • repository moved to GeoPandas org
    • https://github.com/geopandas/dask-geopandas
    • Dask-Summit workshop proposal
      • In May: https://summit.dask.org/
      • submitted proposal around scaling GeoPandas vector operations
      • Could have a presentation about current status of dask-geopandas
      • Some discussion around spatial partitioning
      • Look for ways to collaborate with spatial pandas
      • Would be good to do visualization of bigger data
      • TODO: add issue in community repo for ideas for this workshop
    • First alpha released on PyPI, still needs conda-forge
      • Martin: will add to conda-forge
      • Biggest needs: spatial index and overlap operations
  • User-friendly API of matrix binary operations
    • would be nice to have "intersects_matrix" in 0.10
    • We should agree on the API design; the implementation should be straightforward based on query_bulk
    • https://github.com/geopandas/geopandas/pull/1674
    • returning a list maybe not particularly useful
    • might be good to have a few example use cases
      • does any polygon in input intersect any in right dataframe
      • which of them in left dataframe intersects any in right dataframe
      • how many intersect
    • use outer strategy with sparse argument
      • currently don't depend on scipy; makes it harder to use sparse option
      • can keep sparse as an optional argument; fall back to full matrix
    • another alternative is to use xarray and pydata sparse backend (optional dependencies)
    • could just return dense pandas table of left and right indices
  • Interactive plotting
    • the existing tools are not as friendly as we thought
    • folium-based implementation of GeoDataFrame.view() mirroring the language of plot()
    • https://github.com/martinfleis/geopandas-view
    • should it be embedded in GeoPandas? Or as an affiliated project under GeoPandas repo?
    • @sangarshanan is willing to help maintain it
    • status: most of the stuff supported for static plotting in matplotlib is now supported against folium
    • considerations for API:
      • plotting backend provider
      • namespacing folium / interactive methods to prevent collision with static plotting
      • over some threshold do not want to plot in folium
      • might be good to look at how sf in R handles translation to backend providers
      • implementation of backend can be outside GeoPandas; might be easier to have this directly in GeoPandas in order to allow it as a default (not a lot of code)
    • will do a bit more work to polish then migrate into GeoPandas
  • contextily providers module
  • Ecosystem update
  • geopandas.org
    • we still don't have access to the domain to point it to RTD
      • Joris will ping Kelsey J.
    • also need to have ownership on PyPI; need to be able to add others
    • conda forge:
      • anyone can help maintain this
      • currently Joris, James, Filipe
  • NumFOCUS small grants
    • do we want to apply for something in the near future?
    • does anyone have capacity?
    • next round likely before summer
    • open issue on community repo
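
The three example use cases listed under the matrix binary operations item above can be sketched on top of (left index, right index) pairs, which is the kind of result a bulk spatial index query produces. This is a pure-Python stand-in, not the implementation discussed in the PR: the pairs are made up for illustration, `summarize_pairs` and `to_dense` are hypothetical names, and a real version would more likely hold the pairs in a scipy.sparse or pydata sparse matrix, as the notes suggest.

```python
from collections import Counter


def summarize_pairs(pairs, n_left):
    """Answer the three use cases from sparse (left, right) index pairs."""
    counts = Counter(left for left, _ in pairs)
    # How many right geometries each left geometry intersects.
    how_many = [counts[i] for i in range(n_left)]
    # Which left geometries intersect at least one right geometry.
    which = [i for i, c in enumerate(how_many) if c]
    return {"any": bool(which), "which": which, "how_many": how_many}


def to_dense(pairs, n_left, n_right):
    """Fall back to a full boolean matrix when a sparse backend is unavailable."""
    dense = [[False] * n_right for _ in range(n_left)]
    for left, right in pairs:
        dense[left][right] = True
    return dense


# Hypothetical pairs: left 0 intersects right 1 and 2, left 2 intersects right 0.
pairs = [(0, 1), (0, 2), (2, 0)]
summary = summarize_pairs(pairs, n_left=3)
dense = to_dense(pairs, n_left=3, n_right=3)
print(summary)
```

Keeping only the pairs means memory grows with the number of intersections rather than with the product of the two frame lengths, which is the main argument for a sparse return type over a dense table.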

2021-01-14

2020-08-20: A second meeting!

Agenda

  • NumFOCUS Documentation project

    • I'd like to update you on current development and discuss a bit further steps to decide on priorities and time frame.
    • context: https://github.com/geopandas/geopandas/issues/1564
    • Martin provided an update on the latest direction in documentation work in https://github.com/geopandas/geopandas/issues/1564
      • some examples will move to user guide where they are using the core functions
      • for the examples gallery we may use nbsphinx instead of sphinx-gallery
      • Will bulk up installation instructions to help alleviate many of the complaints around installation issues
      • will add a longer-term roadmap within the docs
    • Going forward, Martin will add examples incrementally but will try to get this reviewed as a larger PR
    • New Advanced Guide will include more advanced topics like using spatial index and vectorization
    • Will need to add redirects from important pages from existing readthedocs pages to the new documentation structure
  • Select final logo

    • https://github.com/geopandas/geopandas/issues/1405
    • Let's make the final decision!
      • Go with the one with highest votes
    • This will go into a separate PR with all the versions and source files
    • Add a page to documentation with the logo and specific colors used
    • Share logo back to NumFOCUS
    • TODO: update the logo on twitter, etc
  • GitHub Sponsors

    • We may consider using GitHub Sponsor button. Someone recently asked how to support GeoPandas and I was not sure if there is any possibility of a direct (financial) support, apart from donating to NumFOCUS.
    • In order to have NumFOCUS accept $ on behalf of GeoPandas, may need to become a fiscally-sponsored project instead of just an affiliated project; Joris will check into this
    • For GitHub Sponsor have seen examples of sponsoring individuals; will need to see what it would take to sponsor the larger project
  • GeoPandas usage / promotion

    • Would like to feature groups that use GeoPandas as part of their work, maybe on GeoPandas blog (if there was one)
    • Blog: would like to do this outside Sphinx
  • GeoPandas domain

    • Joris will follow up with Kelsey
      • Also request PyPI access from Kelsey
  • Packaging automation

    • Can use GitHub Actions to publish packages to PyPI / Conda
    • Can derive this from Pydata project
  • Social media

    • Twitter
      • Joris is currently maintaining this
      • Martin can help with this; Joris will share access
      • Example that came up on twitter from COVID-19 dashboards around showing density of points, maybe by hexagon; might want to add something like this as an example in the docs
  • GeoPandas academic paper

    • Geographical Analysis journal is having a special issue on Open Source Software for Spatial Analysis, edited by Luc Anselin and Serge Rey (both PySAL). We had a small exchange about the possibility of writing a paper about GeoPandas (which is long overdue I'd say) with Joris and Serge on twitter: https://twitter.com/jorisvdbossche/status/1282208649335779328 I feel that this would be great thing to do, although it naturally takes time to write a proper paper.
    • Special issue will require more background documentation & contextualization; not just a description about the project
    • Need to position it into the wider ecosystem; directly address how it has advanced spatial analysis in Python
    • Could start brainstorming / collecting ideas
    • Martin will make a google doc
    • Martin will check to see if there is sponsorship from the university for making this open access
      • Full fee is $3,000 US
    • If we don't go for this, make sure to go after a different publication that allows open access
  • GeoPandas Survey

  • GeoPandas 0.9 roadmap

    • If we want to release 0.9 in December (we discussed switching to 6-month release cycle), we could discuss what do we want to (ideally) include.
    • Binary predicates change - https://gist.github.com/martinfleis/abc7cdbf9f9266bf9ed369080eec7cea
      • proposal is to build this on the output of query bulk
      • people normally interested in 2 questions: does my polygon intersect any in the other data frame (not just same line), which polygons from right data frame are intersected with the one on the left
      • sf (in R) doesn't return a series; it returns matrices (sparse / dense)
      • could have a function that gives more direct access to sindex bulk query
      • general agreement about keeping the existing predicate behavior as is, but adding a new set of methods on GeoSeries to add the cross / matrix oriented approach
      • Martin will add a new issue for this with notebook example
    • spatial index
      • do we want to expose interface to multiple spatial index or abstract base class that can wrap other spatial index implementations
      • can revise the issue based on discussion but don't target for 0.9
      • revisit once pygeos / shapely 2.0 integration is complete and no longer optional; STRtree will be default as part of that
    • Brendan will try to get outstanding pygeos issue to add other predicates to STRtree in for next pygeos version:
    • Upcoming pygeos features in next release: mostly around multithreading, adding support for Z values to coordinate ops
    • geodetic distance / area calculations
      • these were tricky to write in a performant way, especially dealing with wrap-around at the poles
      • there is project to extract out the S2 ideas into a general purpose library
      • Create an example out of this work and put in documentation
      • Create an issue about adapting ideas from sf
      • Aim for supporting different spatial backend (e.g., S2) after 1.0
      • Look into some of the other backends
    • cuSpatial:
      • want to support interoperability, not sure about supporting different underlying geometry providers / backends
    • Longer term, maybe consider making GDAL / Fiona optional (e.g., read data from Parquet...)
    • vectorized snap
      • e.g., make larger linestring out of 2 disconnected segments
      • in GEOS overlay refactor, this will include a precision-based snap
  • Future NumFOCUS grants

    • I am not aware of the schedule of future funding rounds, but we should be prepared (if anyone has a capacity).
      • Normally should be 3rd round for this year, but haven't heard yet
  • dask-geopandas

    • Discuss the current state and future of dask-geopandas.
    • Big work items underway:
      • I/O methods: Joris adding Parquet support from geopandas
      • making use of spatial partitioning
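
As a point of reference for the geodetic distance item in the 0.9 roadmap above, the sketch below computes a great-circle distance with the haversine formula. It assumes a spherical Earth, so it is only an approximation of a true geodetic (ellipsoidal) distance, and it is not how GeoPandas, sf or S2 compute distances; `haversine` and `EARTH_RADIUS_M` are names chosen for this example.

```python
import math

# Mean Earth radius in metres; assumes a sphere rather than an ellipsoid.
EARTH_RADIUS_M = 6_371_008.8


def haversine(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the valid asin range.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


# Two degrees apart across the antimeridian, and two degrees apart over the pole.
across_antimeridian = haversine(179, 0, -179, 0)
over_pole = haversine(0, 89, 180, 89)
print(round(across_antimeridian), round(over_pole))
```

Both sample pairs are two degrees of arc apart, even though their planar coordinates differ by 358 and 180 degrees of longitude. That gap between planar and spherical distance is the wrap-around problem the notes refer to.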

2020-05-07: A first meeting!

  • NumFOCUS

    • Small development grants ideas:
      • better documentation
      • better integration / leveraging spatial indexes for operations
      • small improvements to topological operations (relates operations); elementwise vs all-pairwise
  • Logo

  • Lowering barriers to effective engagement / involving community

    • reviewing PR bottlenecks
      • time of core maintainers
      • huge PRs, can we suggest folks make smaller PRs?
  • Maintenance bottlenecks

  • Roadmap (1.0?)

    • Shapely 2.0 / pygeos speed-ups
    • API for topological operations
    • IO
      • parquet/feather
      • faster GDAL
      • databases
      • consistent API
    • Integrating raster operations
      • zonal stats is problematic for large data
    • geodetic distance etc (geography)
    • visualization
      • maybe geoplot becomes an affiliate like contextily
      • residentmario may not have time anymore for maintenance
    • Vectorized snap feature to other feature
  • Do something like http://xarray.pydata.org/en/stable/roadmap.html

    • Open an issue for this
  • places to ask questions vs. filing an issue? document.

  • Documentation

    • notebooks/examples
  • Installation issues
