Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect to alternative schemas #44

Closed
KoalaGeo opened this issue Mar 22, 2023 · 14 comments · Fixed by #48
Closed

Connect to alternative schemas #44

KoalaGeo opened this issue Mar 22, 2023 · 14 comments · Fixed by #48

Comments

@KoalaGeo
Copy link

Hi,

I'm running postgres, and have all my geospatial data in a schema "published" rather than using public.

I'm struggling to connect to that schema - tipg is only connecting to public.

In the Dockerfile I've tried

ENV POSTGRES_DBNAME=oraetl?options=-c%20search_path=published
and
ENV TIPG_DB_SCHEMAS=["published"]

Neither have worked. Any suggestions?

@vincentsarago
Copy link
Member

TIPG_DB_SCHEMAS=["published"] should be enough 🤔

cc @bitner

@KoalaGeo
Copy link
Author

KoalaGeo commented Mar 22, 2023

Hmm, Dockerfile looks like:

ARG PYTHON_VERSION=3.11

FROM ghcr.io/vincentsarago/uvicorn-gunicorn:${PYTHON_VERSION}

WORKDIR /tmp

COPY README.md README.md
COPY LICENSE LICENSE
COPY tipg/ tipg/
COPY pyproject.toml pyproject.toml

RUN pip install . --no-cache-dir
RUN rm -rf tipg/ README.md pyproject.toml LICENSE

ENV MODULE_NAME tipg.main
ENV VARIABLE_NAME app
ENV HOST=0.0.0.0
ENV PORT=8081
ENV PYTHONWARNINGS=ignore
ENV POSTGRES_USER=*********
ENV POSTGRES_PASS=********
ENV POSTGRES_DBNAME=oraetl
ENV POSTGRES_HOST=192.171.148.87
ENV POSTGRES_PORT=5432
ENV TIPG_DB_SCHEMAS=[published]
ENV DEBUG=TRUE

I've tried both "published" & published and TIPG_DB_SCHEMAS=["published, public"]

@KoalaGeo
Copy link
Author

Container logs:
image

@vincentsarago
Copy link
Member

🤔

I have a PgSTAC database that I start with docker compose (https://github.com/developmentseed/eoAPI/blob/master/docker-compose.yml#L129-L143) and then pass pgstac for the schemas and it work well 🤷

DATABASE_URL=postgresql://username:[email protected]:5439/postgis TIPG_DB_SCHEMAS='["pgstac"]' TIPG_DB_FUNCTIONS='[]' TIPG_TABLE_CONFIG__pgstac_items__pk=id uvicorn tipg.main:app --port 8000 --reload

Screenshot 2023-03-22 at 12 02 15 PM

@KoalaGeo
Copy link
Author

KoalaGeo commented Mar 22, 2023

Tried:

ENV DATABASE_URL=postgresql://user:[email protected]/oraetl
ENV TIPG_DB_SCHEMAS='["published"]'

But weirdly it's still just picking up 3 of the postgis functions:
image

Do I need TIPG_TABLE_CONFIG__pgstac_items__pk=id ?

@vincentsarago
Copy link
Member

ok I can reproduce this. It's weird that I had no issues using pgstac schema

# Launch database
docker-compose up database -d 

# Create a copy of `public.countries` in `myschema.countries`
psql postgresql://username:[email protected]:5439/postgis -c "CREATE SCHEMA IF NOT EXISTS myschema; SET schema 'myschema'; DROP TABLE IF EXISTS myschema.countries CASCADE; CREATE TABLE myschema.countries AS SELECT * FROM public.countries; SELECT count(*) FROM myschema.countries;"

# Start TiPG service
DATABASE_URL=postgresql://username:[email protected]:5439/postgis TIPG_DB_SCHEMAS='["myschema"]' TIPG_DB_FUNCTIONS='[]' uvicorn tipg.main:app --port 8000 --reload
curl http://127.0.0.1:8000/collections | jq
{
  "links": [
    {
      "href": "http://127.0.0.1:8000/collections",
      "rel": "self",
      "type": "application/json"
    }
  ],
  "numberMatched": 0,
  "numberReturned": 0,
  "collections": []
}

@KoalaGeo
Copy link
Author

KoalaGeo commented Mar 22, 2023

Pleased it's not just me!

Not an immediate fix but what about functionality matching pg_featureserv - where you don't have to specify the schema, the application finds everything which the $user has SELECT permissions for?

Then all the access control is at the DB side.

Kind of aligns with #37 keeping things stateless, so any new tables, views, comments associated with a user will come straight through without restarts.

@vincentsarago
Copy link
Member

I'll let @bitner have a look 🙏

@KoalaGeo
Copy link
Author

Don't know if it's helpful or not but we recently added CQL functionality to the PostGres provider in pygeoapi with sqlalchemy & pygeofilter
if there's anything you could reuse - geopython/pygeoapi#964

@KoalaGeo
Copy link
Author

@bitner any ideas?

@vincentsarago
Copy link
Member

I've narrowed down the issue and it comes from this line

AND (pg_table_is_visible(oid) or relnamespace=pg_my_temp_schema())

@vincentsarago
Copy link
Member

I'm also able to connect to other schema when I add them in

'pg_temp,' || current_setting('search_path', false),

eg

    await conn.execute(
        """
            SELECT set_config(
                'search_path',
                'pg_temp,myschema,public,' || current_setting('search_path', false),
                false
                );
            """
    )

@KoalaGeo
Copy link
Author

If you did similar to L216 but used has_schema_privilege then wouldn't need to define the schema in the config (think thats what https://github.com/CrunchyData/pg_featureserv/blob/master/internal/data/db_sql.go#L72 is doing)

@bitner
Copy link
Contributor

bitner commented Mar 28, 2023

There are two things going on here.

  1. We need to check what a user has permissions to see. This is all up to the database and cannot be overridden by anything that TiPG does - database privileges always trump. The calls to has_column_privilege, has_table_privilege, has_schema_privilege, and has_function_privilege make sure that we do not expose anything in the catalog that the user connecting to the database should not be able to use.
  2. We want to have some control to be able to filter over what is exposed by tipg regardless of permissions. Currently this is done partially on the database side to respect whatever is set in the search_path for the user that is connecting. This is where the pg_table_is_visible and pg_function_is_visible are kicking in. The exclude/include settings on TiPG are what kick in as additional filters when creating the catalog.

What this means is that if your schemas and function_schemas are set to NULL, then what shows up will be everything that is in the users search_path.

If the search_path is set to public, but schemas environment var is set to public,myschema only the tables from the public schema would show up. If the search_path for the user was set to public,myschema then tables from both would show up.

Looking through the code, I think that we could do a couple things that could make things a bit clearer.

  1. Currently we default schemas and function_schemas to public. We should instead default those to NULL which would make the users search_path the default.
  2. We should append all schemas listed in schemas/function_schemas environment variables to the search_path at the beginning of the connection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants