
Socrata tweaks (and a couple of odds and ends) #2084

Open
ssiegal1 wants to merge 2 commits into main from ssiegal.fix-socrata-local


ssiegal1 (Contributor) commented Mar 3, 2026

Allow previous years' data when running locally with Socrata as the data source.

Added a unit test for src/utils/DataService.js while I was at it.

ssiegal1 force-pushed the ssiegal.fix-socrata-local branch from 23086d8 to 24ce1f9 on March 4, 2026 17:55
ssiegal1 changed the title from "interim checkin" to "Socrata tweaks (and a couple of odds and ends)" on Mar 4, 2026
ssiegal1 marked this pull request as ready for review on March 4, 2026 17:58
rayneng (Member) left a comment

Just the one concern needs to be addressed. I'm not familiar enough with the tests yet; perhaps someone else could also have a look.

Comment thread backend/DbProvider.jsx
// Create db connection
const newConn = await newDb.connect();

// Create views so tables can be queried as requests_<year>
Member

This is 100% the right move to make, but it has some implications when using Huggingface as the dataset source. I believe it will try to download all datasets into memory before the user gets a chance to filter the dataset. This is what the network tab shows on your branch (regardless of what data source is specified in .env):

[Screenshot: 311 map and browser network tab]

(Note: the failed fetches for pre-2024 dataset years are likely a separate problem; I'll try to surface that issue elsewhere.)

I think ultimately we should be creating the views without actually loading data. We are simply using ... AS SELECT * FROM requestsYYYY.parquet so that we can automatically retrieve the column names. Maybe we can change the query to do just that? Or we can try an approach that is independent of Huggingface and simply store the relevant columns (for each year...sigh) in a local file (as a JavaScript object, or as a small file in the data folder).

Contributor Author

Hi May, I think we can accomplish this by appending

WHERE 1 = 0;

to the view's SELECT. This condition is always false, so the view picks up only the structure (column names and data types) and no data.
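A minimal sketch of the view-creation loop with the schema-only filter, assuming the same 2020-through-current-year range and `requests<year>.parquet` file names shown in the DbProvider.jsx thread. `buildSchemaOnlyViewSql` is a hypothetical helper name; each string it returns would still go through `await newConn.query(sql)` inside the existing try/catch.

```javascript
// Hypothetical helper: builds one schema-only CREATE VIEW statement per year.
// Assumes the files are named requests<year>.parquet, as in backend/DbProvider.jsx.
function buildSchemaOnlyViewSql(startYear, endYear) {
  const statements = [];
  for (let year = startYear; year <= endYear; year++) {
    // WHERE 1 = 0 is always false, so the view exposes the column names and
    // types of the SELECT but never returns any rows.
    statements.push(
      `CREATE VIEW requests_${year} AS SELECT * FROM 'requests${year}.parquet' WHERE 1 = 0`
    );
  }
  return statements;
}
```

One trade-off worth checking: because the filter is baked into the view definition, any later SELECT against requests_<year> also comes back empty, so this fits only if the views are used for column discovery rather than for querying data. DuckDB may also still read each file's Parquet metadata to resolve the column types, so it is worth re-checking the network tab after the change.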

Comment thread src/utils/DataService.js
import moment from "moment";
import ddbh from "@utils/duckDbHelpers.js";

const dataResources = {
rayneng (Member) commented Mar 4, 2026

Looks like we can add the following js object entries here:

2020: "rq3b-xjk8",
2021: "97z7-y5bt",
2022: "i5ke-k6by",
2023: "4a4x-mna2"

See my other comment for the links to verify: #2084 (comment)

RoanBox left a comment

Leaving REQUEST_CHANGES based on needed fixes to the Socrata fetch path, the DuckDB view-loading behavior, and the new test coverage added in this PR.

FIELD_MAP,
normalize,
getSocrataDataResources,
getServiceRequestSocrata,

This test file doesn’t match the module in this PR. DataService.js does not export FIELD_MAP or normalize, and these tests still call the old zero-arg getServiceRequestSocrata(). Please align the tests with the actual API in this branch.

Comment thread src/utils/DataService.js
const unvalidatedByYear = await Promise.all(
years.map((year) => {
const where = `createddate >= '${startDate}T00:00:00.000' AND createddate <= '${endDate}T23:59:59.999'`;
const url = `https://data.lacity.org/resource/${dataResources[year]}.json?$where=${encodeURIComponent(where)}&$limit=1000`;

This still uses $limit=1000 with no pagination. Large date ranges can silently return incomplete data.
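One way the missing pagination could be handled, sketched under assumptions: SoQL's $offset parameter pages through results, paired with an $order on the :id system field so pages stay stable (Socrata's paging guidance recommends an explicit $order when using $offset). `fetchAllRows` and the injected `fetchPage` callback are hypothetical names, not part of DataService.js, and the 1,000-row page size simply mirrors the current $limit.

```javascript
// Hypothetical sketch: page through one Socrata resource until a short page
// comes back. fetchPage is injected so the loop can be exercised without the
// network; in DataService.js it could wrap fetch(url).then((r) => r.json()).
async function fetchAllRows(resourceId, where, fetchPage, pageSize = 1000) {
  if (!(pageSize > 0)) throw new RangeError("pageSize must be positive");
  const rows = [];
  for (let offset = 0; ; offset += pageSize) {
    const url =
      `https://data.lacity.org/resource/${resourceId}.json` +
      `?$where=${encodeURIComponent(where)}` +
      // Stable ordering so consecutive pages neither overlap nor skip rows.
      `&$order=${encodeURIComponent(":id")}` +
      `&$limit=${pageSize}&$offset=${offset}`;
    const page = await fetchPage(url);
    rows.push(...page);
    // A page shorter than pageSize means there is nothing left to read.
    if (page.length < pageSize) break;
  }
  return rows;
}
```

The loop stops on the first short (or empty) page, so a result set that is an exact multiple of the page size costs one extra request that returns zero rows.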

Comment thread backend/DbProvider.jsx
for (let year = 2020; year <= currentYear; year++) {
try {
await newConn.query(
`CREATE VIEW requests_${year} AS SELECT * FROM 'requests${year}.parquet'`,

This still looks like the behavior from the open requested-changes thread. I’m not seeing the proposed schema-only WHERE 1 = 0 approach reflected here.

Comment thread src/utils/DataService.js
const startYear = moment(startDate).year();
const endYear = moment(endDate).year();
const years = [];
for (let year = startYear; year <= endYear; year++) {

This now iterates across all years in the selected range, but dataResources is missing 2020-2023. That will generate undefined.json URLs for those years.
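A small guard could surface this before any request goes out. The sketch below assumes startDate and endDate are "YYYY-MM-DD" strings (consistent with the $where clause appending "T00:00:00.000" to them) and avoids moment only to stay self-contained; `yearsInRange` and `assertResourcesExist` are hypothetical names.

```javascript
// Hypothetical sketch: list every calendar year touched by the date range.
// Assumes "YYYY-MM-DD" strings, so the year is the first four characters.
function yearsInRange(startDate, endDate) {
  const startYear = Number(startDate.slice(0, 4));
  const endYear = Number(endDate.slice(0, 4));
  const years = [];
  for (let year = startYear; year <= endYear; year++) years.push(year);
  return years;
}

// Fail before fetching if any year lacks a Socrata resource ID, instead of
// requesting .../resource/undefined.json for that year.
function assertResourcesExist(years, resources) {
  const missing = years.filter((year) => !(year in resources));
  if (missing.length > 0) {
    throw new Error(`No Socrata resource ID for year(s): ${missing.join(", ")}`);
  }
}
```

For example, yearsInRange("2022-06-01", "2024-02-01") yields [2022, 2023, 2024], and passing that list with a dataResources object that lacks 2022 throws an error naming 2022 rather than silently building a bad URL.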


3 participants