Commit b8c4878

justincastilla and Justin Castilla authored
suppoprting-blog-content/navigating-an-elastic-vector-databse (#331)
Co-authored-by: Justin Castilla <[email protected]>
1 parent fd80079 commit b8c4878

16 files changed: +12151 -0 lines changed
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
env/
.env
.DS_Store
*/.ipynb_checkpoints/
.python-version
small_books_embedded.json
books_embedded.json
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
<img src="https://github.com/user-attachments/assets/b2879240-ae16-4544-ae67-3d261a67e2a1" width="50%"/>

## Elastic Book Search
This is a companion codebase for the article [Navigating an Elastic Vector Database](). It contains all of the instructions needed to create and operate a vector database with Elasticsearch.

Folder contents:

### `example.env`
Rename this file to `.env` and update it with your own credentials for the Elasticsearch endpoint and Elastic API key. The default index name for this repository is `books`.
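
As a rough illustration of how the `.env` values might be consumed, the sketch below reads connection settings from environment variables. The variable names (`ELASTIC_ENDPOINT`, `ELASTIC_API_KEY`, `ELASTIC_INDEX_NAME`) and the `load_settings` helper are assumptions made for this example, not names taken from the repository; match them to the keys in your own `.env`. A `.env` file is not read automatically, so a loader such as `python-dotenv` would typically populate the environment first.

```python
import os


def load_settings(env=os.environ):
    """Collect connection settings from a mapping of environment variables.

    The variable names are hypothetical placeholders for this sketch.
    """
    endpoint = env.get("ELASTIC_ENDPOINT")
    api_key = env.get("ELASTIC_API_KEY")

    # Fail early with a readable message if a credential is missing.
    required = (("ELASTIC_ENDPOINT", endpoint), ("ELASTIC_API_KEY", api_key))
    missing = [name for name, value in required if not value]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))

    return {
        "endpoint": endpoint,
        "api_key": api_key,
        # Fall back to the repository's documented default index name.
        "index": env.get("ELASTIC_INDEX_NAME", "books"),
    }
```

Passing the mapping in as a parameter keeps the function easy to exercise without touching the real environment.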

### `src/`

#### `src/elastic_client.py`: connector to Elasticsearch. Draws credentials from the above `.env` file.

#### `src/upload_books_local_embed.py`: script to upload books to Elasticsearch with local embedding.
This script runs an embedding model locally to create a vector for each book object. The resulting list of books with vectors is then indexed into Elasticsearch.

- **Functions**:
  - `embed_descriptions()`: Converts the `book_description` field into a vector.
  - `create_index()`: Creates the Elasticsearch index `books-local` for storing book documents.
  - `bulk_upload()`: Uploads multiple book documents to the Elasticsearch index in bulk.
  - `upload_single_book()`: Uploads a single book document to the Elasticsearch index.

- **How to run**:
  1. Ensure Elasticsearch is running locally.
  2. Navigate to the `src/` directory.
  3. Run the script using Python:
     ```sh
     python upload_books_local_embed.py
     ```
  4. By default, the script processes a small batch of 25 books for faster performance. Embedding and indexing the full `books.json` takes longer, but the search results will be more relevant.
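
The local-embedding flow described above can be sketched without a running cluster by building only the request payloads. This is a minimal illustration, not the repository's implementation: `build_index_body`, `build_bulk_actions`, and `toy_embed` are hypothetical names, the text fields in the mapping are a guess at a reasonable subset, and `toy_embed` is a stand-in for a real embedding model. The `description_embedding` field name and the `books-local` index name come from the README and sample data.

```python
def build_index_body(dims):
    """Index body with a dense_vector field sized to the embedding model."""
    return {
        "mappings": {
            "properties": {
                "book_title": {"type": "text"},
                "book_description": {"type": "text"},
                "description_embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def build_bulk_actions(books, embed, index_name="books-local"):
    """Yield one bulk action per book, adding a vector for its description."""
    for book in books:
        doc = dict(book)  # copy so the caller's data is not mutated
        doc["description_embedding"] = embed(book["book_description"])
        yield {"_index": index_name, "_source": doc}


def toy_embed(text):
    """Stand-in embedding (NOT a real model): two crude text statistics."""
    return [float(len(text)), float(text.count(" "))]


books = [{"book_title": "Shattered World", "book_description": "Stranded in the desert"}]
actions = list(build_bulk_actions(books, toy_embed))
index_body = build_index_body(dims=len(actions[0]["_source"]["description_embedding"]))
```

Actions of this shape are what `elasticsearch.helpers.bulk(client, actions)` accepts, so the generator could be handed to it directly once a client is available.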

#### `src/upload_books_with_pipeline.py`: script to upload books to Elasticsearch using an ingest pipeline.
This script creates an inference ingest pipeline that instructs Elasticsearch to create a vector embedding of the `book_description` field of every document that is indexed. This moves the embedding computation from the local machine to the Elasticsearch instance.

Note: you will need to upload and deploy the embedding model to Elasticsearch by running a Docker image. Full instructions are available [here](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-text-emb-vector-search-example.html).

- **Functions**:
  - `create_ingest_pipeline()`: Creates an inference ingest pipeline that embeds vectors when documents are indexed.
  - `create_index()`: Creates the Elasticsearch index `books-pipeline` for storing book documents.
  - `bulk_upload()`: Uploads multiple book documents to the Elasticsearch index in bulk.
  - `upload_single_book()`: Uploads a single book document to the Elasticsearch index.

- **How to run**:
  1. Ensure Elasticsearch is running locally.
  2. Navigate to the `src/` directory.
  3. Run the script using Python:
     ```sh
     python upload_books_with_pipeline.py
     ```
  4. By default, the script processes all 10,908 books, since all embedding occurs on the Elasticsearch instance.
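
To make the pipeline approach more concrete, here is a hedged sketch of what an inference ingest pipeline definition might look like as a plain Python dictionary. The `build_pipeline_body` helper and the `my-embedding-model` model ID are hypothetical; the actual model ID depends on what you deployed. The sketch assumes the `input_output` form of the inference processor, which is available in recent 8.x releases; older versions configure the processor differently.

```python
def build_pipeline_body(model_id,
                        source_field="book_description",
                        target_field="description_embedding"):
    """Pipeline definition that embeds one text field into one vector field."""
    return {
        "description": "Embed book descriptions at ingest time",
        "processors": [
            {
                "inference": {
                    # ID of a model already deployed in the cluster (assumption).
                    "model_id": model_id,
                    "input_output": [
                        {"input_field": source_field, "output_field": target_field}
                    ],
                }
            }
        ],
    }


# "my-embedding-model" is a placeholder, not a model shipped with this repository.
pipeline_body = build_pipeline_body("my-embedding-model")
```

With the Python client, a definition like this would typically be registered via `client.ingest.put_pipeline(id=..., processors=..., description=...)`, after which documents indexed through the pipeline receive their vector on the server side.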


#### `src/query_examples.py`: script demonstrating various query examples for searching books in Elasticsearch.
This file contains three types of search examples: traditional (BM25), vector, and hybrid. Hybrid search runs both search types, then combines their results into a single normalized ranking.
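
One common way to merge two ranked result lists is reciprocal rank fusion (RRF). The sketch below is an illustration of that general technique in plain Python, under the assumption that each search returns an ordered list of document IDs; it is not necessarily how `hybrid_search` in this repository combines results. The function name and the constant `k=60` are choices made for this example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so raw BM25 and vector scores never need to share a scale.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # hypothetical IDs from a traditional search
vector_hits = ["b", "d", "a"]  # hypothetical IDs from a vector search
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# fused == ["b", "a", "d", "c"]
```

Documents that appear near the top of both lists (`b` and `a` here) rise above documents found by only one search.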
56+
57+
- **Functions**:
58+
- `vector_search()`: performs a vector search with a given query string.
59+
- `search()`: performs a traditional search.
60+
- `hybrid_search(q)`: performs a hybrid search.
61+
62+
63+
- **How to run**:
64+
1. Ensure Elasticsearch is running locally.
65+
2. Navigate to the `src/` directory.
66+
3. Run the script using Python:
67+
```sh
68+
python query_examples.py
69+
```
70+
4. Modify the query parameters within the script to test different search criteria and observe the results.
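
When experimenting with query parameters, it can help to see the request bodies in isolation. The sketch below builds a traditional `match` query and a kNN vector query as plain dictionaries. The helper names are hypothetical, the field names follow the sample data in this commit, and the short query vector is a placeholder; a real query vector must come from the same embedding model, with the same dimensions, as the indexed vectors.

```python
def build_match_body(text, field="book_description", size=10):
    """Traditional (BM25) full-text search on one field."""
    return {"query": {"match": {field: text}}, "size": size}


def build_knn_body(query_vector, k=10, num_candidates=100,
                   field="description_embedding"):
    """Approximate kNN search against a dense_vector field."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        # Leave the long vector out of the returned documents.
        "_source": {"excludes": [field]},
    }


match_body = build_match_body("zombies in the desert")
knn_body = build_knn_body([0.1, 0.2, 0.3], k=5, num_candidates=50)
```

Raising `num_candidates` generally trades speed for recall, which makes it a useful parameter to vary while observing the results.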

### `notebooks/`
Notebook versions of the Python scripts above are provided for more interactive use.

supporting-blog-content/navigating-an-elastic-vector-database/data/books.json

Lines changed: 10910 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"author_name": "Kate L. Mary","book_title": "Shattered World","genres": ["Horror (Zombies)", "Apocalyptic (Post Apocalyptic)", "Science Fiction", "Adventure (Survival)", "Fantasy"],"rating_votes": 1667,"book_description": "Stranded in the middle of the Mojave Desert, surrounded by zombies, Vivian and Axl’s group are sure they’re facing the end...","review_number": 146,"rating_score": 4.16,"url": "https://www.goodreads.com/book/show/22693753-shattered-world","year_published": 2014}
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"author_name": "Kate L. Mary", "book_title": "Shattered World", "genres": ["Horror (Zombies)", "Apocalyptic (Post Apocalyptic)", "Science Fiction", "Adventure (Survival)", "Fantasy"], "rating_votes": 1667, "book_description": "Stranded in the middle of the Mojave Desert, surrounded by zombies, Vivian and Axl\u2019s group are sure they\u2019re facing the end...", "review_number": 146, "rating_score": 4.16, "url": "https://www.goodreads.com/book/show/22693753-shattered-world", "year_published": 2014, "description_embedding": [0.01580764167010784, 0.22619199752807617, 0.061492811888456345, -0.07076962292194366, 0.19296526908874512, -0.18163791298866272, 0.09238974750041962, 0.02809027023613453, 0.2069358080625534, 0.03369198366999626, 0.12177182734012604, -0.1551419049501419, 0.4301314651966095, 0.22151675820350647, -0.07834804058074951, -0.07931677252054214, -0.12000318616628647, 0.041314493864774704, 0.09286925196647644, 0.5461690425872803, 0.12646976113319397, 0.21454143524169922, 0.12328825145959854, 0.30151447653770447, 0.42981261014938354, -0.2755526900291443, 0.16859455406665802, 0.05333663895726204, -0.28827545046806335, -0.06596492975950241, 0.1376279592514038, 0.3892855644226074, 0.035325199365615845, 0.345918744802475, -0.1421118974685669, 0.28826847672462463, -0.3034919202327728, 0.031357333064079285, 0.42434701323509216, -0.054715268313884735, 0.06440500169992447, 0.09460961818695068, -0.20685116946697235, 0.04911485314369202, -0.16794872283935547, -0.25542789697647095, 0.09850622713565826, -0.12912166118621826, 0.061373889446258545, 0.28480663895606995, 0.0065286774188280106, -0.19825676083564758, -0.08802828937768936, -0.306179404258728, 0.1492973119020462, -0.29602527618408203, 0.08044683933258057, -0.09830383211374283, 0.11387518793344498, -0.19018980860710144, 1.004629135131836, -0.2527683675289154, 0.23049522936344147, 0.18911059200763702, -0.021210025995969772, -0.06271161884069443, 0.03231221064925194, 0.02669292874634266, 0.3727841377258301, 
-0.14707015454769135, -0.027216628193855286, 0.22932565212249756, 0.05605350434780121, -0.030585186555981636, -0.17031414806842804, 0.20451314747333527, -0.014836861751973629, -0.2994351387023926, -0.33840686082839966, 0.28302931785583496, -0.1205720603466034, 0.09383304417133331, -0.09202669560909271, -0.2311500757932663, 0.20528261363506317, 0.0600004605948925, 0.1652427464723587, 0.1952650249004364, 0.07426771521568298, 0.05831650272011757, 0.1998344361782074, 0.07970748841762543, 0.07687854766845703, 0.19421154260635376, -0.41393589973449707, -0.07710293680429459, 0.09480711072683334, 0.3590507507324219, -0.11169079691171646, 0.1466662585735321, -0.04648619517683983, 0.12184573709964752, 0.1886690855026245, -0.01227524969726801, 0.007221416104584932, -0.07297371327877045, 0.23645585775375366, -0.4140874445438385, -0.10309960693120956, 0.09794654697179794, -0.41897496581077576, -0.1735568791627884, 0.18298204243183136, 0.07754664868116379, -0.3299482762813568, 0.19581015408039093, 0.03185329586267471, 0.2265215367078781, -0.08955875784158707, 0.2397865504026413, -0.18556757271289825, 0.11270305514335632, 0.05812529847025871, 0.08518645912408829, 0.1123589500784874, -0.14968520402908325, -0.044921595603227615, -0.2631445527076721, 0.02177373506128788, 0.11460041254758835, -0.15427900850772858, 0.032435569912195206, 0.11800768971443176, -0.4030109941959381, -0.4043869376182556, -0.2038908749818802, -0.1324802041053772, -0.11795486509799957, -0.16821841895580292, -0.09793674945831299, 0.3601471781730652, 0.12467237561941147, 0.002474149689078331, -0.10111618041992188, 0.2644222378730774, -0.468985378742218, -0.271572083234787, -0.07352738827466965, -0.15459896624088287, -0.0660654604434967, -0.12395235896110535, -0.32043004035949707, -0.09178164601325989, -0.08296649158000946, 0.28688016533851624, -0.16636617481708527, 0.36838552355766296, 0.3100353479385376, 0.03138107806444168, 0.272241473197937, 0.4617847502231598, 0.2772320806980133, -0.17151904106140137, 
0.3755682408809662, -0.222712904214859, 0.18610012531280518, -0.2690748870372772, 0.21402986347675323, 0.0709182545542717, -0.029267767444252968, 0.3305127024650574, 0.10856260359287262, 0.3864653408527374, -0.15817509591579437, 0.045786090195178986, 0.09548764675855637, -0.28868481516838074, -0.08767382800579071, -0.1209241971373558, -0.043898288160562515, 0.0036050728522241116, -0.3048270344734192, -0.03599291667342186, -0.09709092229604721, -0.029721688479185104, -0.35134631395339966, -0.5784905552864075, -0.06766840070486069, 0.15980173647403717, -0.02949715591967106, 0.14626264572143555, -0.01807369850575924, 0.1642894744873047, 0.051374804228544235, 0.26771458983421326, 0.19837787747383118, -0.17099831998348236, -0.15916897356510162, 0.24688614904880524, 0.3196212351322174, 0.05605035275220871, 0.04331720247864723, -0.6120195388793945, 0.022583208978176117, -0.060387544333934784, -0.016207493841648102, 0.14361926913261414, -0.02678176760673523, -0.15519793331623077, 0.12402782589197159, 0.0773395374417305, -0.12528321146965027, 0.11940529942512512, 0.2119743973016739, 0.30299147963523865, -0.17622201144695282, -0.2696838080883026, -0.08597947657108307, -0.24700026214122772, 0.17239801585674286, 0.19126614928245544, -0.3954373598098755, 0.15078216791152954, -0.08059302717447281, -0.12541820108890533, 0.07681937515735626, -0.01338228490203619, -0.27321934700012207, 0.07809127867221832, 0.12449196726083755, -0.14464430510997772, -0.4631950855255127, -0.5238178372383118, -0.08906760066747665, -0.9339474439620972, -0.3693825602531433, 0.49047601222991943, 0.12194299697875977, -0.28056254982948303, -0.30074018239974976, -0.19355319440364838, -0.24073077738285065, 0.18765251338481903, -0.02896942012012005, -0.34527507424354553, -0.19199252128601074, 0.30894196033477783, 0.03261258080601692, -0.1882130652666092, 0.35334688425064087, 0.1737353801727295, 0.010323960334062576, -0.35744673013687134, -0.05811749026179314, -0.2173653244972229, -0.4737064838409424, 
-0.0702119991183281, -0.25682809948921204, -0.07966335117816925, 0.14650246500968933, -0.3791314661502838, 0.04744107648730278, -0.23478181660175323, -0.21039234101772308, 0.12665563821792603, -0.33353665471076965, -0.4003707766532898, -0.21497881412506104, 0.18896757066249847, -0.06419731676578522, 0.28405871987342834, -0.06329311430454254, -0.30465611815452576, -0.2568626403808594, 0.166348397731781, 0.18288305401802063, -0.28701356053352356, -0.05783326178789139, 0.48132944107055664, 0.45148539543151855, -0.43814393877983093, -0.3642289936542511, 0.11159732937812805, -0.30594977736473083, -0.25078055262565613, -0.08456215262413025, 0.09206540882587433, 0.004950289148837328, 0.2380678504705429, 0.3148786127567291, -0.04756174609065056, 0.3784553110599518, -0.1976633369922638, 0.00804405938833952, -0.14443853497505188, 0.03598101809620857, -0.2638629972934723, -0.0919763594865799, 0.3016832768917084, 0.20871101319789886, -0.3181276321411133, 0.1800403743982315, -0.007544277235865593, -0.01903371885418892, -0.2112334817647934, -0.2198220193386078, -0.3543657958507538, 0.29617300629615784, 0.20467819273471832, 0.19838660955429077, -0.15409690141677856, -0.3423407971858978, 0.1877298653125763, 0.07888581603765488, 0.06472637504339218, 0.04434327408671379, -0.05003989860415459, -0.26605069637298584, 0.31113240122795105, -0.04380364343523979, 0.288592666387558, 0.15908759832382202, 0.1339091956615448, -0.2129518985748291, 0.296629935503006, 0.026913587003946304, 0.1970633715391159, 0.1425819993019104, -0.17312896251678467, 0.11456890404224396, -0.13980959355831146, -0.3912646174430847, 0.10069779306650162, 0.12064486742019653, 0.10918109118938446, 0.33785349130630493, -0.12132947146892548, 0.2375248819589615, 0.11017335951328278, -0.055947378277778625, -0.23857514560222626, 0.4142799973487854, -0.36116042733192444, -0.24256597459316254, 0.2427724003791809, 0.33385196328163147, -0.15238404273986816, 0.09165703505277634, 0.31173548102378845, 0.18004325032234192, 
0.2942294478416443, -0.005795755889266729, -0.03435010462999344, -0.1392885148525238, -0.48163914680480957, -0.2582989037036896, 0.3578932583332062, -0.058649785816669464, -0.08177932351827621, -0.10383599251508713, 0.3722625970840454, -0.10008163750171661, 0.024953294545412064, 0.12239386886358261, -0.231162428855896, -0.2457631230354309, 0.5127071738243103, 0.01205542404204607, -0.10753599554300308, 0.2983993589878082, -0.03328469768166542, -0.29674237966537476, 0.09357470273971558, 0.3259318470954895, -0.03599970415234566, 0.3709545135498047, 0.156076118350029, 0.07773765921592712, -0.029436752200126648, 0.26356586813926697, -0.1992921084165573, -0.05686808004975319, -0.3875110447406769, 0.15471287071704865]}
