Commit e82cbd9

Merge pull request #342 from MITLibraries/use-310-openaccess-lookup
Enables OpenAlex lookup for OpenAccess articles
2 parents 88b63d2 + 6b578ca commit e82cbd9

21 files changed

Lines changed: 564 additions & 31 deletions

.env.test

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,3 +15,4 @@ TIMDEX_HOST=FAKE_TIMDEX_HOST
 TIMDEX_INDEX=FAKE_TIMDEX_INDEX
 THIRDIRON_ID=FAKE_THIRDIRON_ID
 THIRDIRON_KEY=FAKE_THIRDIRON_KEY
+OPENALEX_EMAIL=FAKE_OPENALEX_EMAIL

AGENTS.md

Lines changed: 5 additions & 2 deletions
@@ -4,6 +4,8 @@ This file highlights the important, discoverable conventions and workflows an AI
 
 - **Big picture:** TIMDEX UI is a Rails 7 app that orchestrates searches across two backends: TIMDEX (GraphQL) and Primo (legacy API). Core request flow is implemented in `app/controllers/search_controller.rb` which: validates params, builds an enhanced query (`Enhancer` -> `QueryBuilder`), then routes to Primo or Timdex fetchers (or both for the `all` tab). Results are normalized by `NormalizePrimoResults` / `NormalizeTimdexResults` and analyzed by `Analyzer`.
 
+- **Fulfillment links:** This application considers a link a fulfillment link if it takes the user directly to the resource. Sometimes we don't have a fulfillment link, so the user clicks the title of the result to view the full record in the source system. Fulfillment links come from multiple sources. Primo data sometimes returns PDF or HTML links to the resource directly. If we have a DOI or PMID, we look up fulfillment links via LibKey; if data comes back from LibKey, we prefer these links over the Primo links. If LibKey does not provide data, or if `FEATURE_OA_ALWAYS` is enabled, we look for OpenAccess links in OpenAlex via DOI or PMID. For journal records that have an ISSN, we also use Browzine to get a fulfillment link.
+
 - **GraphQL integration:** GraphQL queries live on the Ruby side using `graphql-client` and `TimdexBase::Client`. See `app/models/timdex_search.rb` for the queries (`BaseQuery`, `GeoboxQuery`, `GeodistanceQuery`, `AllQuery`). The canonical schema is stored at `config/schema/schema.json`. Update schema via the Rails console:
 
 ```ruby
@@ -16,13 +18,14 @@ This file highlights the important, discoverable conventions and workflows an AI
 
 | Flag | Purpose |
 |------|----------|
-| `FEATURE_GEODATA` | Enable geospatial search (bounding box and radius-based queries); defaults to false |
 | `FEATURE_BOOLEAN_PICKER` | Allow users to choose AND/OR boolean logic in searches |
+| `FEATURE_GEODATA` | Enable geospatial search (bounding box and radius-based queries); defaults to false |
+| `FEATURE_OA_ALWAYS` | Always do OpenAlex lookups when DOI or PMID is detected rather than only when LibKey does not return data |
+| `FEATURE_RECORD_LINK` | Show "View full record" link in search results |
 | `FEATURE_SIMULATE_SEARCH_LATENCY` | Add 1s minimum delay to search results for testing UX behavior |
 | `FEATURE_TAB_PRIMO_ALL` | Display combined Primo (CDI + Alma) results tab |
 | `FEATURE_TAB_TIMDEX_ALL` | Display combined TIMDEX results tab |
 | `FEATURE_TAB_TIMDEX_ALMA` | Display Alma-only TIMDEX results tab |
-| `FEATURE_RECORD_LINK` | Show "View full record" link in search results |
 
 Essential ENV vars for core functionality: `TIMDEX_GRAPHQL`, `PRIMO_API_URL`, `PRIMO_API_KEY`, `RESULTS_PER_PAGE`, `TIMDEX_INDEX`, `TIMDEX_SOURCES`. Filter customization: `FILTER_*` (e.g., `FILTER_LANGUAGE`, `FILTER_CONTENT_TYPE`) and `ACTIVE_FILTERS` (comma-separated list controlling visibility/order of filters; note that filter aggregation keys in the schema use `*Filter` suffix, e.g., `languageFilter`, `contentTypeFilter`). Tests rely on `.env.test` values for VCR cassette generation and use `ClimateControl` gem to mock feature flags.
 
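Reviewer note: the lookup order described in the new "Fulfillment links" bullet can be sketched in plain Ruby. This is a minimal sketch and not code from this commit. `fulfillment_lookups` and all of its keyword arguments are hypothetical names, the DOI and ISSN values are placeholders, and the sketch ignores record type (the bullet limits Browzine to journal records).

```ruby
# Hypothetical helper, not part of this commit. Returns the external services
# that would be consulted for one record, in order, following the rules in
# the "Fulfillment links" bullet of AGENTS.md.
def fulfillment_lookups(doi: nil, pmid: nil, issn: nil, libkey_has_data: false, oa_always: false)
  lookups = []
  has_article_id = [doi, pmid].any? { |id| !id.to_s.empty? }

  if has_article_id
    # LibKey is tried first whenever a DOI or PMID is available.
    lookups << :libkey
    # OpenAlex is a fallback, unless FEATURE_OA_ALWAYS forces it on.
    lookups << :openalex if oa_always || !libkey_has_data
  end

  # Records with an ISSN also get a Browzine lookup.
  lookups << :browzine unless issn.to_s.empty?
  lookups
end

p fulfillment_lookups(doi: '10.1234/placeholder', libkey_has_data: true)
p fulfillment_lookups(doi: '10.1234/placeholder', libkey_has_data: true, oa_always: true)
p fulfillment_lookups(issn: '0000-0000')
```

Primo-supplied links are left out of the sketch because they arrive with the search result itself rather than through a separate lookup.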

README.md

Lines changed: 2 additions & 0 deletions
@@ -98,6 +98,7 @@ See `Optional Environment Variables` for more information.
 - `FEATURE_GEODATA`: Enables features related to geospatial data discovery. Setting this variable to `true` will trigger geodata
 mode. Note that this is currently intended _only_ for the geodata app and
 may have unexpected consequences if applied to other TIMDEX UI apps.
+- `FEATURE_OA_ALWAYS`: Enables OpenAccess links from OpenAlex whenever they are available, not just when LibKey does not return data. `OPENALEX_EMAIL` must also be set.
 - `FEATURE_RECORD_LINK`: Display the 'View full record' link below each record.
 - `FEATURE_SIMULATE_SEARCH_LATENCY`: DO NOT SET IN PRODUCTION. Set to ensure a minimum of a one second delay in returning search results. Useful to see spinners/loaders. Only introduces delay for results that take less than one second to complete.
 - `FEATURE_TAB_PRIMO_ALL`: Display a tab for displaying the combined Primo data (CDI + Alma)
@@ -119,6 +120,7 @@ may have unexpected consequences if applied to other TIMDEX UI apps.
 - `MATOMO_CONTAINER_URL`: This is one of two options for integrating a TIMDEX UI application with Matomo - the Tag Manager. This is the only parameter needed for using a tag manager container.
 - `MATOMO_SITE_ID`: Integrating with Matomo using the legacy approach (instead of Tag Manager) requires two values: the site id and a URL. This is one of those legacy values.
 - `MATOMO_URL`: Integrating with Matomo using the legacy approach (instead of Tag Manager) requires two values: the site id and a URL. This is one of those legacy values.
+- `OPENALEX_EMAIL`: Required to enable OpenAlex OpenAccess lookups. In development, use your personal email address. In production, we'll use a Moira list.
 - `ORIGINS`: sets origins for CORS (currently used only for TACOS API calls).
 - `PLATFORM_NAME`: The value set is added to the header after the MIT Libraries logo. The logic and CSS for this comes from our theme gem.
 - `PRIMO_TIMEOUT`: The number of seconds before a Primo request times out (default 6).
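Reviewer note: the README says `OPENALEX_EMAIL` gates every OpenAlex lookup. The sketch below mirrors, in plain Ruby without Rails, the `Openalex.enabled?` check and the `User-Agent` string introduced later in this commit. `openalex_enabled?` and `openalex_user_agent` are hypothetical stand-ins, and passing a Hash in place of `ENV` is an assumption made so the example runs anywhere.

```ruby
# Hypothetical stand-ins for the model's enabled? check and User-Agent header.
# `env` defaults to the real ENV but accepts any Hash-like object.
def openalex_enabled?(env = ENV)
  # Equivalent in spirit to ActiveSupport's `present?`: nil, empty, and
  # whitespace-only values all count as "not set".
  !env.fetch('OPENALEX_EMAIL', nil).to_s.strip.empty?
end

def openalex_user_agent(env = ENV)
  "TIMDEX UI (#{env.fetch('OPENALEX_EMAIL', nil)})"
end

p openalex_enabled?({})
p openalex_enabled?('OPENALEX_EMAIL' => 'FAKE_OPENALEX_EMAIL')
p openalex_user_agent('OPENALEX_EMAIL' => 'FAKE_OPENALEX_EMAIL')
```

The `FAKE_OPENALEX_EMAIL` value is the placeholder from `.env.test`. How `FEATURE_OA_ALWAYS` is parsed lives in the `Feature` class, which this diff does not show, so it is left out.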
Lines changed: 27 additions & 0 deletions

app/assets/stylesheets/partials/_results.scss

Lines changed: 0 additions & 11 deletions
@@ -317,17 +317,6 @@
   @include buttonSecondary;
 }
 
-// Treat libkey links similarly to the primo links
-.libkey-container > a.button:first-child {
-  @include buttonSecondary;
-}
-
-.libkey-container {
-  display: flex;
-  align-items: center;
-  gap: 24px;
-}
-
 // Loading skeleton when a button isn't available
 span.skeleton-loader {
   @include skeleton-loader(112px);
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+class OpenalexController < ApplicationController
+  layout false
+
+  def work
+    return unless Openalex.enabled? && expected_params?
+
+    @openalex = Openalex.work(identifier_type: params[:type], identifier: params[:identifier])
+  end
+
+  private
+
+  def expected_params?
+    params[:type].present? && params[:identifier].present?
+  end
+end

app/controllers/thirdiron_controller.rb

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@ def libkey
     return unless ThirdIron.enabled? && expected_params?
 
     @libkey = Libkey.lookup(type: params[:type], identifier: params[:identifier])
+    @doi = params[:type] == 'doi' ? params[:identifier] : nil
+    @pmid = params[:type] == 'pmid' ? params[:identifier] : nil
   end
 
   def browzine

app/javascript/controllers/content_loader_controller.js

Lines changed: 7 additions & 4 deletions
@@ -11,13 +11,16 @@ export default class extends Controller {
     fetch(this.urlValue)
       .then(response => response.text())
       .then(html => {
-        this.element.innerHTML = html;
+        const parentElement = this.element.parentElement;
+        // Replace the entire element with the fetched HTML
+        this.element.outerHTML = html;
         // Hide primo links if libkey link is present
-        if (this.element.querySelector('.libkey-link')) {
-          const resultGet = this.element.closest('.result-get');
+        if (parentElement.querySelector('.libkey-link')) {
+          const resultGet = parentElement.closest('.result-get');
           if (resultGet) {
             const primoLinks = resultGet.querySelectorAll('.primo-link');
-            primoLinks.forEach(link => link.style.display = 'none');
+            // removing instead of hiding to avoid layout issues when selecting which link to highlight
+            primoLinks.forEach(link => link.remove());
           }
         }
       })

app/models/feature.rb

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@
 #
 class Feature
   # List of all valid features in the application
-  VALID_FEATURES = %i[geodata boolean_picker simulate_search_latency tab_primo_all tab_timdex_all
+  VALID_FEATURES = %i[geodata boolean_picker oa_always simulate_search_latency tab_primo_all tab_timdex_all
                       tab_timdex_alma record_link].freeze
 
   # Check if a feature is enabled by name

app/models/openalex.rb

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+# frozen_string_literal: true
+
+# OpenAlex API integration
+# https://docs.openalex.org/how-to-use-the-api/api-overview
+# https://docs.openalex.org/api-entities/works
+class Openalex
+  BASEURL = 'https://api.openalex.org'
+
+  class LookupFailure < StandardError; end
+
+  # enabled? confirms that the required environment variable is set.
+  #
+  # @return Boolean
+  def self.enabled?
+    openalex_email.present?
+  end
+
+  # OpenAlex accepts various identifier formats: full URI, short URN, or raw identifier
+  # We are only supporting raw identifier lookups here for simplicity (ex: DOI without the "doi:" prefix) as that is
+  # the shape of the data coming from Primo. We add the identifier type prefix accordingly.
+  # Currently supported identifier types in OpenAlex are 'doi', 'pmid', 'pmcid', and 'mag'
+  def self.work(identifier:, identifier_type: 'doi', openalex_client: nil)
+    return nil unless enabled?
+
+    # Check cache first
+    cache_key = generate_cache_key(identifier_type, identifier)
+    cached_result = Rails.cache.read(cache_key)
+    return cached_result if cached_result.present?
+
+    # Construct the OpenAlex Works endpoint URL
+    url = "#{BASEURL}/works/#{identifier_type}:#{identifier}"
+
+    openalex_http = setup(url, openalex_client)
+
+    begin
+      raw_response = openalex_http.timeout(6).get(url)
+      raise LookupFailure, raw_response.status unless raw_response.status == 200
+
+      json_response = JSON.parse(raw_response.to_s)
+
+      result = extract_metadata(json_response)
+      Rails.logger.debug(result)
+
+      # Cache the result for 24 hours
+      Rails.cache.write(cache_key, result, expires_in: 24.hours) if result.present?
+      result
+    rescue LookupFailure => e
+      # 404s are expected for missing works, so only log unexpected statuses
+      if e.message != '404 Not Found'
+        Sentry.set_tags('mitlib.openalex_url': url)
+        Sentry.set_tags('mitlib.openalex_status': e.message)
+        Sentry.capture_message('Unexpected OpenAlex response status')
+        Rails.logger.error("Unexpected OpenAlex response status: #{e.message}")
+      end
+      nil
+    rescue HTTP::Error
+      Rails.logger.error('OpenAlex connection error')
+      { 'error' => 'A connection error has occurred' }
+    rescue JSON::ParserError
+      Rails.logger.error('OpenAlex parsing error')
+      { 'error' => 'A parsing error has occurred' }
+    end
+  end
+
+  def self.is_oa?(external_data)
+    return false if external_data.blank?
+
+    external_data.dig('open_access', 'is_oa') || false
+  end
+
+  # Using the OpenAlex best OA location logic. If we need to change the logic, we can update here by using locations
+  # rather than best_oa_location from OpenAlex directly.
+  def self.extract_metadata(external_data)
+    return nil if external_data.blank? || external_data['id'].blank?
+    return nil unless is_oa?(external_data)
+
+    {
+      record_id: external_data['id'],
+      is_open: is_oa?(external_data),
+      pdf_link: pdf_link(external_data),
+      html_link: html_link(external_data),
+      type: user_friendly_type(type(external_data))
+    }
+  end
+
+  def self.pdf_link(external_data)
+    return nil if external_data.blank? || external_data['best_oa_location'].blank?
+
+    external_data['best_oa_location']['pdf_url']
+  end
+
+  def self.html_link(external_data)
+    return nil if external_data.blank? || external_data['best_oa_location'].blank?
+
+    external_data['best_oa_location']['landing_page_url']
+  end
+
+  def self.type(external_data)
+    return nil if external_data.blank? || external_data['best_oa_location'].blank?
+
+    external_data['best_oa_location']['version']
+  end
+
+  def self.openalex_email
+    ENV.fetch('OPENALEX_EMAIL', nil)
+  end
+
+  def self.setup(url, openalex_client)
+    openalex_client || HTTP.persistent(url)
+                           .headers(accept: 'application/json',
+                                    'User-Agent': "TIMDEX UI (#{openalex_email})")
+  end
+
+  def self.generate_cache_key(identifier_type, identifier)
+    "openalex:works:#{identifier_type}:#{Digest::MD5.hexdigest(identifier)}"
+  end
+
+  def self.user_friendly_type(type_code)
+    case type_code
+    when 'acceptedVersion'
+      'Accepted Version'
+    when 'publishedVersion'
+      'Published Version'
+    when 'submittedVersion'
+      'Submitted Version'
+    else
+      Sentry.set_tags('mitlib.openalex_type_code': type_code)
+      Sentry.capture_message('Unexpected OpenAlex type code')
+      Rails.logger.error("Unexpected OpenAlex type code: #{type_code}")
+
+      type_code
+    end
+  end
+end
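
Reviewer note: to make the shape of the value returned by `extract_metadata` concrete, here is a standalone sketch that mirrors its logic in plain Ruby, with no Rails, Sentry, or network access. `SAMPLE_WORK` is a hand-written, hypothetical payload that contains only the fields the model reads (`id`, `open_access.is_oa`, and `best_oa_location`); it is not a real OpenAlex response. `sketch_extract` and `VERSION_LABELS` are illustrative stand-ins, not the methods in this commit, and the sketch skips the Sentry reporting for unexpected version codes.

```ruby
# Hypothetical payload: only the keys that Openalex.extract_metadata reads.
SAMPLE_WORK = {
  'id' => 'https://openalex.org/W0000000000',
  'open_access' => { 'is_oa' => true },
  'best_oa_location' => {
    'pdf_url' => 'https://example.org/article.pdf',
    'landing_page_url' => 'https://example.org/article',
    'version' => 'publishedVersion'
  }
}.freeze

# Same three mappings as user_friendly_type in the model.
VERSION_LABELS = {
  'acceptedVersion' => 'Accepted Version',
  'publishedVersion' => 'Published Version',
  'submittedVersion' => 'Submitted Version'
}.freeze

# Stand-in for extract_metadata: returns nil for missing or closed works,
# otherwise a hash built from the best open-access location.
def sketch_extract(work)
  return nil if work.nil? || work['id'].to_s.empty?
  return nil unless work.dig('open_access', 'is_oa')

  location = work['best_oa_location'] || {}
  version = location['version']
  {
    record_id: work['id'],
    is_open: true,
    pdf_link: location['pdf_url'],
    html_link: location['landing_page_url'],
    # Unknown codes pass through unchanged, as in the model's else branch.
    type: VERSION_LABELS.fetch(version, version)
  }
end

p sketch_extract(SAMPLE_WORK)
p sketch_extract(SAMPLE_WORK.merge('open_access' => { 'is_oa' => false }))
```

Because a work that is not open access yields `nil`, a caller can treat "no result" and "not open access" the same way when deciding whether to render a link.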
