Commit 9e908ef

Proof of concept development kit
1 parent 9ec480e commit 9e908ef

3,527 files changed (+34842, -7841 lines)


.gitattributes

+1-1
@@ -1 +1 @@
-conductorscripts/**/*.sh text eol=lf
+conductorscripts/**/*.sh text eol=lf

.gitignore

+5-1
@@ -1 +1,5 @@
-.DS_Store
+.env
+.DS_Store
+node_modules
+dist
+__MACOSX

Makefile

+12-15
@@ -1,20 +1,17 @@
 platform:
 	PROFILE=platform docker compose --profile platform up --attach conductor
 
-stageDev:
-	PROFILE=stageDev docker compose --profile stageDev up --attach conductor
-
-arrangerDev:
-	PROFILE=arrangerDev docker compose --profile arrangerDev up --attach conductor
-
-maestroDev:
-	PROFILE=maestroDev docker compose --profile maestroDev up --attach conductor
-
-songDev:
-	PROFILE=songDev docker compose --profile songDev up --attach conductor
-
-scoreDev:
-	PROFILE=scoreDev docker compose --profile scoreDev up --attach conductor
-
 down:
 	PROFILE=platform docker compose --profile platform down
+
+clean:
+	@echo "\033[31mWARNING: This will remove all data within Elasticsearch.\033[0m"
+	@echo "Are you sure you want to proceed? [y/N] " && read ans && [ $${ans:-N} = y ]
+	@echo "Stopping related containers..."
+	PROFILE=platform docker compose --profile platform down
+	@echo "Cleaning up Elasticsearch volumes..."
+	-rm -rf ./volumes/es-data/nodes 2>/dev/null || true
+	-find ./volumes/es-logs -type f ! -name 'logs.txt' -delete 2>/dev/null || true
+	-docker volume rm -f deployment_elasticsearch-data 2>/dev/null || true
+	-docker volume rm -f deployment_elasticsearch-logs 2>/dev/null || true
+	@echo "Cleanup completed!"
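The confirmation prompt in the `clean` target above combines `read` with a defaulted parameter expansion. A minimal standalone sketch of the same pattern (the `confirm` function is illustrative, not part of the Makefile; note the Makefile escapes `$` as `$$`):

```shell
# confirm: ask "<question> [y/N]" and succeed only on an explicit "y".
# ${ans:-N} substitutes N when the user just presses Enter.
confirm() {
  printf '%s [y/N] ' "$1"
  read ans
  [ "${ans:-N}" = y ]
}

# Example: run a destructive step only after confirmation.
if printf 'y\n' | confirm "Remove all Elasticsearch data?"; then
  echo "confirmed"
fi
```

Because the test defaults to `N`, pressing Enter (or typing anything other than `y`) aborts the recipe, which is the safe behavior for a destructive target.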

README.md

+217-57
@@ -1,69 +1,229 @@
-# Conductor
+# Drug Discovery Project
 
-Conductor is a flexible Docker Compose setup that simplifies the process of spinning up Overture development and deployment configurations using Docker profiles and extensible scripting events.
+This repository contains the data and infrastructure for the Overture-Drug Discovery Data Portal.
 
-## Key Features
+## Running the portal
 
-- **Profile-based Deployments**: Uses Docker profiles to manage different environment setups.
-- **Conductor-driven Execution**: The Conductor service executes ordered scripts based on the `PROFILE` environment variable.
+1. **Set Up Docker:** Install or update to Docker Desktop version 4.32.0 or higher. Visit [Docker's website](https://www.docker.com/products/docker-desktop/) for installation details.
 
-## Getting Started
+> [!important]
+> Allocate sufficient resources to Docker:
+> - Minimum CPU: `8 cores`
+> - Memory: `8 GB`
+> - Swap: `2 GB`
+> - Virtual disk: `64 GB`
+>
+> Adjust these in Docker Desktop settings under "Resources".
 
-**1. Clone the repo's `main` branch**
+**2. Clone the repository:**
 
 ```
-git clone -b concerto https://github.com/overture-stack/composer.git && cd composer
+git clone https://github.com/oicr-softeng/drug_discovery-ui.git
 ```
 
-**2. Run one of the following commands to spin up different environments:**
+**3. Build a Stage image from its Dockerfile:**
+
+```
+cd stage
+docker build -t multi-arranger-stage:2.0 .
+```
+
+**4. Run one of the following commands from the root of the repository:**

2033
| Environment | Unix/macOS | Windows |
2134
|-------------|------------|---------|
22-
| Overture Platform | `make platform` | `make.bat platform` |
23-
| Stage Dev | `make stageDev` | `make.bat stageDev` |
24-
| Arranger Dev | `make arrangerDev` | `make.bat arrangerDev` |
25-
| Maestro Dev | `make maestroDev` | `make.bat maestroDev` |
26-
| Song Dev | `make songDev` | `make.bat songDev` |
27-
| Score Dev | `make scoreDev` | `make.bat scoreDev` |
28-
29-
Each command spins up complementary services for the specified development environment.
30-
31-
## Repository Structure
32-
33-
```
34-
.
35-
├── conductorScripts/
36-
│ ├── deployments
37-
│ └── services
38-
├── configurationFiles/
39-
│ ├── arrangerConfigs
40-
│ ├── elasticsearchConfigs
41-
│ └── keycloakConfigs
42-
├── guideMaterials
43-
├── persistentStorage/
44-
│ ├── data-keycloak-db
45-
│ ├── data-minio
46-
│ └── data-song-db
47-
├── Makefile
48-
└── make.bat
49-
```
50-
51-
- **`conductorScripts/`** Contains scripts for orchestrating the deployment process.
52-
- `deployments/`: Scripts that execute service scripts sequentially based on the deployment configuration. These also include custom post-deployment logs with essential next steps for the deployment scenario.
53-
- `services/`: Modular scripts for individual service setup tasks. Each file is named according to its purpose, with inline comments documenting the code.
54-
55-
- **`configurationFiles/`** Stores all required configuration files, including:
56-
- `arrangerConfigs/`: Configuration files specific to Arranger.
57-
- `elasticsearchConfigs/`: Configuration files for Elasticsearch, encompassing indexing mappings and documents for seeding data.
58-
- `keycloakConfigs/`: Configuration files for Keycloak, including preconfigured realm files and Overture API key provider details.
59-
60-
- **`guideMaterials/`** Supplementary folders and files for use with the [Overture guides](https://www.overture.bio/documentation/guides/).
61-
62-
- **`persistentStorage/`** Directory for storing persistent data during container startups and restarts. These folders come pre-loaded with mock data.
63-
- `data-keycloak-db/`: Persistent local storage for the Keycloak database.
64-
- `data-minio/`: Persistent local storage for MinIO object storage.
65-
- `data-song-db/`: Persistent local storage for the Song database.
66-
67-
- **`Makefile`** Contains [`make` commands](https://www.gnu.org/software/make/manual/make.html#Overview-of-make) for Unix-based systems (macOS, Linux) to streamline Docker operations.
68-
69-
- **`make.bat`** Windows equivalent of the Makefile, featuring batch commands tailored for Windows systems.
35+
| Overture Platform | `make platform` | `./make.bat platform` |
36+
37+
Following startup front end portal will be available at your `localhost:3000`
38+
39+
**3. You can also run any of the following helper commands:**
40+
41+
| Description | Unix/macOS | Windows |
42+
|-------------|------------|---------|
43+
| Shuts down all containers | `make down` | `./make.bat down` |
44+
| Removes all persistent Elasticsearch volumes | `make clean` | `./make.bat clean` |
45+
46+
+# CSV to Elasticsearch Processor
+
+A Node.js command-line tool for efficiently processing and indexing CSV files into Elasticsearch. This tool features progress tracking, batched processing, and detailed error reporting.
+
+## Features
+
+- 📊 Efficient CSV parsing with support for various delimiters
+- 🚀 Batch processing for optimal performance
+- 📈 Real-time progress tracking with ETA
+- 🔄 Configurable batch sizes
+- ⚠️ Detailed error reporting
+- 🔐 Elasticsearch authentication support
+- 🔍 Target index validation
+- 🧐 CSV header validation
+  - Checks for duplicate headers
+  - Validates header structure
+  - Verifies headers match the Elasticsearch index mapping
+
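The header-validation checks listed above can be sketched as a small pure function; `validateHeaders` and its return shape are illustrative, and the comparison against the Elasticsearch index mapping is omitted:

```typescript
// Illustrative sketch of the duplicate-header and structure checks.
interface HeaderValidation {
  valid: boolean;
  errors: string[];
}

function validateHeaders(headers: string[]): HeaderValidation {
  const errors: string[] = [];
  const seen: { [name: string]: boolean } = {};

  headers.forEach((h, i) => {
    // Structure check: a header must be a non-empty name.
    if (h.trim() === '') {
      errors.push(`Header at column ${i + 1} is empty`);
      return;
    }
    // Duplicate check: flag any header we have already seen.
    if (seen[h]) {
      errors.push(`Duplicate header: ${h}`);
    }
    seen[h] = true;
  });

  return { valid: errors.length === 0, errors };
}
```

Running the check before any documents are sent lets the tool fail fast instead of indexing rows under ambiguous field names.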
+## Prerequisites
+
+- Node.js (v14 or higher)
+- npm or yarn
+- Access to an Elasticsearch instance
+
+## Installation
+
+1. Clone the repository:
+```bash
+git clone [repository-url]
+cd csv-processor
+```
+
+2. Install dependencies:
+```bash
+npm install
+```
+
+3. Build the TypeScript code:
+```bash
+npm run build
+```
+
+## Required Dependencies
+
+```json
+{
+  "dependencies": {
+    "@elastic/elasticsearch": "^7.17.14",
+    "@types/chalk": "^0.4.31",
+    "@types/node": "^22.9.3",
+    "chalk": "^4.1.2",
+    "commander": "^12.1.0",
+    "csv-parse": "^5.6.0",
+    "ts-node": "^10.9.2"
+  },
+  "devDependencies": {
+    "typescript": "^5.7.2"
+  }
+}
+```
+
+## Usage
+
+The basic command structure is:
+
+```bash
+node csv-processor.js -f <file-path> [options]
+```
+
+### Command Line Options
+
+| Option | Description | Default |
+|--------|-------------|---------|
+| `-f, --file <path>` | CSV file path (required) | - |
+| `--url <url>` | Elasticsearch URL | `http://localhost:9200` |
+| `-i, --index <name>` | Elasticsearch index name | `correlation-index` |
+| `-u, --user <username>` | Elasticsearch username | `elastic` |
+| `-p, --password <password>` | Elasticsearch password | `myelasticpassword` |
+| `-b, --batch-size <size>` | Batch size for processing | `1000` |
+| `-d, --delimiter <char>` | CSV delimiter | `,` |
+
+### Examples
+
+Basic usage with default settings:
+```bash
+node csv-processor.js -f data.csv
+```
+
+Custom Elasticsearch configuration:
+```bash
+node csv-processor.js -f data.csv --url http://localhost:9200 -i my-index -u elastic -p mypassword
+```
+
+Process a semicolon-delimited CSV with custom batch size:
+```bash
+node csv-processor.js -f data.csv -d ";" -b 100
+```
+
+## Repo Structure
+
+```
+src/
+├── types/
+│   └── index.ts         # Type definitions (Record, SubmissionMetadata interfaces)
+├── utils/
+│   ├── cli.ts           # CLI setup and configuration
+│   ├── elasticsearch.ts # Elasticsearch client and operations
+│   ├── formatting.ts    # Progress bar, duration formatting, etc.
+│   └── csv.ts           # CSV parsing and processing utilities
+├── services/
+│   ├── processor.ts     # Main CSV to Doc processing logic
+│   └── validator.ts     # Config, index and data validation and checking
+└── main.ts              # Entry point, brings everything together
+```
+
+## Processing Flow
+
+1. The tool first counts total records in the CSV file
+2. Confirms headers with the user
+3. Processes records in configured batch sizes
+4. Sends batches to Elasticsearch using the bulk API
+5. Displays real-time progress with:
+   - Visual progress bar
+   - Completion percentage
+   - Records processed
+   - Elapsed time
+   - Estimated time remaining
+   - Processing rate
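Steps 3 and 4 of the flow above amount to slicing rows into batches and building the action/document pairs the Elasticsearch bulk API expects. A minimal sketch with hypothetical helper names (`chunk`, `toBulkBody`); with the v7 client, the resulting array would be passed as the `body` of `client.bulk(...)`:

```typescript
// Step 3: split parsed CSV rows into batches of a configurable size.
function chunk<T>(rows: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  return batches;
}

// Step 4: build one bulk-request body — an action line followed by
// the document itself, for every record in the batch.
function toBulkBody(batch: object[], index: string): object[] {
  const body: object[] = [];
  for (const doc of batch) {
    body.push({ index: { _index: index } });
    body.push(doc);
  }
  return body;
}
```

Each batch becomes a single round trip to Elasticsearch, which is what makes the batch size the main throughput knob.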
+
+## Error Handling
+
+- Failed records are tracked and reported
+- Detailed error logging for debugging
+- Bulk processing continues even if individual records fail
+- Summary of failed records provided at completion
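A bulk request can succeed as a whole while individual items fail, so the tracking described above must inspect each item in the response. A sketch assuming the v7 REST response shape (`collectBulkErrors` is illustrative):

```typescript
// The parts of a bulk-response item we inspect (v7 REST shape).
interface BulkItem {
  index?: { status: number; error?: { type: string; reason: string } };
}

// Collect the reasons for any failed items so processing can continue
// and a summary can be printed at completion.
function collectBulkErrors(items: BulkItem[]): string[] {
  const failures: string[] = [];
  for (const item of items) {
    const action = item.index;
    if (action && action.error) {
      failures.push(action.error.type + ': ' + action.error.reason);
    }
  }
  return failures;
}
```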
+
+## Output Format
+
+The tool provides colorized console output including:
+
+```
+Total records to process: 1000
+
+📋 Processing Configuration:
+├─ 📁 File: data.csv
+├─ 🔍 Index: my-index
+└─ 📝 Delimiter: ,
+
+📑 Headers: id, name, value
+
+🚀 Starting data processing and indexing...
+
+[████████████░░░░░░░░░░] 50% | 500/1000 | ⏱ 0h 1m 30s | 🏁 1m 30s | ⚡1000 rows/sec
+```
+
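The progress line in the sample output can be produced by a small formatter; this sketch (hypothetical `renderBar`) covers only the bar, percentage, and counts:

```typescript
// Repeat a character n times (works without ES2015's String.repeat).
function fill(ch: string, n: number): string {
  return new Array(n + 1).join(ch);
}

// Render a fixed-width progress bar like the one in the sample output.
function renderBar(processed: number, total: number, width: number = 22): string {
  const ratio = total > 0 ? processed / total : 0;
  const filled = Math.round(ratio * width);
  return '[' + fill('█', filled) + fill('░', width - filled) + '] ' +
    Math.round(ratio * 100) + '% | ' + processed + '/' + total;
}
```

The elapsed-time, ETA, and rate fields would be appended from the batch timings in the same way.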
+## Performance Considerations
+
+- Adjust batch size based on record size and Elasticsearch performance
+- Larger batch sizes generally improve throughput but use more memory
+- Monitor Elasticsearch CPU and memory usage
+- Consider network latency when setting batch sizes
+
+## Troubleshooting
+
+Common issues and solutions:
+
+1. **Connection Errors**
+   - Verify Elasticsearch is running
+   - Check URL and port
+   - Confirm network connectivity
+
+2. **Authentication Failures**
+   - Verify username and password
+   - Check user permissions
+
+3. **Parse Errors**
+   - Verify CSV format
+   - Check delimiter setting
+   - Inspect file encoding
+
+4. **Memory Issues**
+   - Reduce batch size
+   - Ensure sufficient system resources
+   - Monitor Node.js memory usage

conductorScripts/deployments/arrangerDev.sh

-43
This file was deleted.
