Socrata tweaks (and a couple of odds and ends) #2084
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -43,3 +43,6 @@ venv/ | |
|
|
||
| .DS_Store | ||
|
|
||
| # claude | ||
| CLAUDE.md | ||
| /.claude | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -64,6 +64,17 @@ function DbProvider({ children, startDate }) { | |
| // Create db connection | ||
| const newConn = await newDb.connect(); | ||
|
|
||
| // Create views so tables can be queried as requests_<year> | ||
| for (let year = 2020; year <= currentYear; year++) { | ||
| try { | ||
| await newConn.query( | ||
| `CREATE VIEW requests_${year} AS SELECT * FROM 'requests${year}.parquet'`, | ||
|
This still looks like the behavior from the open requested-changes thread. I’m not seeing the proposed schema-only |
||
| ); | ||
| } catch (err) { | ||
| console.warn(`Failed to create view for year ${year}:`, err); | ||
| } | ||
| } | ||
|
|
||
| setDb(newDb); | ||
| setConn(newConn); | ||
| setWorker(newWorker); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -60,17 +60,30 @@ const socrataServiceRequestSchema = object({ | |
|
|
||
| const srArraySchema = array().of(socrataServiceRequestSchema); | ||
|
|
||
| export async function getServiceRequestSocrata() { | ||
| export async function getServiceRequestSocrata(startDate, endDate) { | ||
| const dataLoadStartTime = performance.now(); | ||
|
|
||
| try { | ||
| // Fetch current year SR data through Socrata API | ||
| const currentYear = String(new Date().getFullYear()); | ||
| const currentYearFilename = `https://data.lacity.org/resource/${dataResources[currentYear]}.json` | ||
| const response = await fetch( | ||
| currentYearFilename | ||
| // Build list of years covered by the date range | ||
| const startYear = moment(startDate).year(); | ||
| const endYear = moment(endDate).year(); | ||
| const years = []; | ||
| for (let year = startYear; year <= endYear; year++) { | ||
|
This now iterates across all years in the selected range, but |
||
| years.push(String(year)); | ||
| } | ||
|
|
||
| // Fetch data for each year filtered by the requested date range. | ||
| // Without a $where clause, Socrata returns only 1000 records in internal-ID | ||
| // order (i.e. the oldest records first), which would all fail the client-side | ||
| // Mapbox date filter. We also raise $limit well above the default 1000 so that | ||
| // the full date range is covered. | ||
| const unvalidatedByYear = await Promise.all( | ||
| years.map((year) => { | ||
| const where = `createddate >= '${startDate}T00:00:00.000' AND createddate <= '${endDate}T23:59:59.999'`; | ||
| const url = `https://data.lacity.org/resource/${dataResources[year]}.json?$where=${encodeURIComponent(where)}&$limit=1000`; | ||
|
This still uses |
||
| return fetch(url).then((res) => res.json()); | ||
| }) | ||
| ); | ||
| const unvalidatedSrs = await response.json(); | ||
|
|
||
| const dataLoadEndTime = performance.now(); | ||
| console.log( | ||
|
|
@@ -80,7 +93,12 @@ export async function getServiceRequestSocrata() { | |
| ); | ||
|
|
||
| const mapLoadStartTime = performance.now(); | ||
| const validatedSrs = await srArraySchema.validate(unvalidatedSrs); | ||
| const validatedByYear = await Promise.all( | ||
| unvalidatedByYear.map((unvalidatedSrs) => | ||
| srArraySchema.validate(unvalidatedSrs) | ||
| ) | ||
| ); | ||
| const validatedSrs = validatedByYear.flat(); | ||
| const mapLoadEndTime = performance.now(); | ||
| console.log( | ||
| `Socrata map preparation time: ${Math.floor( | ||
|
|
||
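The per-year query construction in the diff above can be sketched in isolation as follows. This is a minimal sketch, not the PR's code: `exampleResources`, `yearsInRange`, `buildSocrataUrl`, and `buildUrlsForRange` are hypothetical names, the resource IDs are placeholders, and dates are assumed to be `YYYY-MM-DD` strings (the diff uses `moment` instead). The `limit` value of 50000 is an arbitrary illustration of a value above the default 1000; what the endpoint actually accepts should be checked against the Socrata documentation. Skipping years with no resource ID is an added guard, also an assumption.

```javascript
// Hypothetical stand-in for the app's dataResources map (year -> Socrata resource ID).
// The IDs below are placeholders, not real dataset identifiers.
const exampleResources = { 2024: "abcd-1234", 2025: "efgh-5678" };

// Assumes 'YYYY-MM-DD' date strings, so the year is the first four characters.
function yearsInRange(startDate, endDate) {
  const startYear = Number(startDate.slice(0, 4));
  const endYear = Number(endDate.slice(0, 4));
  const years = [];
  for (let year = startYear; year <= endYear; year++) {
    years.push(String(year));
  }
  return years;
}

// Builds one request URL with a $where date filter and an explicit $limit.
function buildSocrataUrl(resourceId, startDate, endDate, limit) {
  const where =
    `createddate >= '${startDate}T00:00:00.000' ` +
    `AND createddate <= '${endDate}T23:59:59.999'`;
  return (
    `https://data.lacity.org/resource/${resourceId}.json` +
    `?$where=${encodeURIComponent(where)}&$limit=${limit}`
  );
}

// Builds URLs only for years that have an entry in the resource map.
function buildUrlsForRange(resources, startDate, endDate, limit) {
  return yearsInRange(startDate, endDate)
    .filter((year) => resources[year] !== undefined)
    .map((year) => buildSocrataUrl(resources[year], startDate, endDate, limit));
}

const urls = buildUrlsForRange(exampleResources, "2023-12-15", "2025-01-10", 50000);
console.log(urls);
```

Keeping URL construction in a pure function makes the `$limit` value and the year coverage easy to check without making network requests.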
This is 100% the right move to make, but it has some implications when using Hugging Face as the dataset source. I believe it will try to download all datasets into memory before the user gets a chance to filter the dataset. This is what the network tab shows on your branch (regardless of what data source is specified in `.env`):
[Screenshot: 311 map + network tab]
(Note: the failed fetches for dataset years that are pre-2024 are likely a separate problem, I'll try and surface that issue elsewhere)
I think ultimately we should be creating the views without actually loading data. We are simply using `... AS SELECT * FROM requestsYYYY.parquet` so that we can automatically retrieve the column names. Maybe we can change the query to do that? Or we can try an approach that is independent of Hugging Face and simply store the relevant columns (for each year...sigh) in a local file (as a JavaScript object, or just put a small file in the `data` folder).

Hi May, I think we can accomplish this with a
This condition is universally false, so the SELECT statement returns only the structure (column names and data types) and loads no data into the view.
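
The exact clause proposed in the comment above is not shown here. As a rough sketch only, one common always-false predicate is `WHERE 1 = 0`, and the statement text could be generated like this. The helper names `buildSchemaOnlyViewSql` and `buildSchemaOnlyViewStatements` are hypothetical, and the choice of `WHERE 1 = 0` is an assumption rather than the reviewer's confirmed wording.

```javascript
// Assumption: 'WHERE 1 = 0' stands in for the always-false condition mentioned
// in the review comment; the actual clause is not visible in the thread.
function buildSchemaOnlyViewSql(year) {
  return (
    `CREATE VIEW requests_${year} AS ` +
    `SELECT * FROM 'requests${year}.parquet' WHERE 1 = 0`
  );
}

// Builds one statement per year, mirroring the loop in the diff (2020..currentYear).
function buildSchemaOnlyViewStatements(firstYear, lastYear) {
  const statements = [];
  for (let year = firstYear; year <= lastYear; year++) {
    statements.push(buildSchemaOnlyViewSql(year));
  }
  return statements;
}

console.log(buildSchemaOnlyViewStatements(2020, 2022).join(";\n"));
```

A view defined with an always-false filter would also return no rows when queried later, so this shape seems suited to capturing column names rather than serving data. Whether it avoids the Parquet downloads seen in the network tab would need to be verified on the branch.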