This repository houses one-off side projects I create on behalf of others, as well as small scripts I write to accomplish specific tasks.
- Install `tesseract`.
  - To install `tesseract` on a Mac, run the following command: `brew install tesseract`
  - To install `tesseract` on any other operating system, please reference this document.
- Create the `uv` virtual environment at the base directory of this repository.
  - Run the `uv venv` or `make env` command from the base folder of this repository.
The `google_translate_pdfs` package translates all PDFs in the `google_translate_pdfs/data/input` folder.

To run it, run this command: `make translate source=[source_lang] target=[target_lang]`
- The source and target languages must be valid ISO 639-1 language codes.
  - Example: `make translate source=fr target=en`
- Set up your GCloud authentication via this set of instructions: Link
  - Run the `gcloud auth application-default login` command to save the credentials for your project to your computer.
- Put the PDF files you want translated in the `google_translate_pdfs/data/input` folder.
  - The `google_translate_pdfs/data` folder is listed in the `.gitignore` file, so you don't have to worry about any of these files being accidentally committed to the repository.
- Run the `make translate` command from the base folder of the repository; your output files will be in the `google_translate_pdfs/data/output` folder.
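
As a rough illustration of the `make translate` inputs described above, the following Python sketch checks that the `source` and `target` values look like ISO 639-1 codes and pairs each PDF in an input folder with an output path. The helper names (`looks_like_iso_639_1`, `plan_translation`) and the `<stem>_<target>.pdf` output naming are hypothetical, not taken from the `google_translate_pdfs` package, and the check only confirms the two-lowercase-letter shape of a code, not that the code is actually assigned in ISO 639-1.

```python
from pathlib import Path
import re

# Format-only check: ISO 639-1 codes are two lowercase ASCII letters.
# This does NOT confirm the code is actually assigned (e.g. "zz" passes).
_ISO_639_1_PATTERN = re.compile(r"[a-z]{2}")


def looks_like_iso_639_1(code: str) -> bool:
    """Return True if `code` has the shape of an ISO 639-1 language code."""
    return bool(_ISO_639_1_PATTERN.fullmatch(code))


def plan_translation(
    input_dir: Path, output_dir: Path, source: str, target: str
) -> list[tuple[Path, Path]]:
    """Pair each PDF in `input_dir` with a hypothetical path in `output_dir`.

    The "<stem>_<target>.pdf" output naming is an assumption for illustration.
    """
    # Validate both language codes before touching the filesystem.
    for code in (source, target):
        if not looks_like_iso_639_1(code):
            raise ValueError(f"not an ISO 639-1 style code: {code!r}")
    # Pick up regular files with a .pdf extension (case-insensitive), in sorted order.
    pdfs = sorted(
        p for p in input_dir.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [(pdf, output_dir / f"{pdf.stem}_{target}.pdf") for pdf in pdfs]
```

For example, `plan_translation(Path("google_translate_pdfs/data/input"), Path("google_translate_pdfs/data/output"), "fr", "en")` would list the French PDFs waiting to be translated to English. Validating the codes first means a typo such as `source=french` fails before any files are read.
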
The `pdf_parser` package parses all PDFs in the `pdf_parser/data/input` folder.

To run it, run this command: `make parse-pdf`
- Put the PDF files you want parsed in the `pdf_parser/data/input` folder.
  - The `pdf_parser/data` folder is listed in the `.gitignore` file, so you don't have to worry about any of these files being accidentally committed to the repository.
- Run the `make parse-pdf` command from the base folder of the repository; your output files will be in the `pdf_parser/data/output` folder.
The `geocode_verifier` package verifies the address in every row of all CSVs in the `geocode_verifier/data/input` folder.

To run it, run this command: `make verify-geocodes`
- Set up your GCloud authentication via this set of instructions: Link
  - Run the `gcloud auth application-default login` command to save the credentials for your project to your computer.
- Put the CSV files you want verified in the `geocode_verifier/data/input` folder.
  - The `geocode_verifier/data` folder is listed in the `.gitignore` file, so you don't have to worry about any of these files being accidentally committed to the repository.
- Run the `make verify-geocodes` command from the base folder of the repository; your output files will be in the `geocode_verifier/data/output` folder.

Each input CSV should have the following columns:

| Column | Description | Type |
|---|---|---|
| `id` | external unique ID | string or int |
| `address_number` | address number | float or int |
| `address` | street and apartment information | string |
| `city` | city name | string |
| `zipcode` | zip code | string |
| `state` | two-character state abbreviation | string |

- `city` defaults to `Chicago` if the key is not present.
- `state` defaults to `IL` if the key is not present.
- All addresses are presumed to be in the United States.
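
The column table and defaults above can be read as a row-normalization step. This is a minimal Python sketch that assumes the CSV is read with the standard-library `csv.DictReader`; the function names `normalize_row` and `normalize_csv`, the `full_address` output key, the handling of float address numbers, and the exact way the parts are joined into one address string are all hypothetical, not taken from the `geocode_verifier` package.

```python
import csv
import io

# Defaults described in the README; every address is assumed to be in the US.
DEFAULT_CITY = "Chicago"
DEFAULT_STATE = "IL"
COUNTRY = "United States"


def normalize_row(row: dict[str, str]) -> dict[str, str]:
    """Apply the city/state defaults and build a one-line address (hypothetical format)."""
    # Defaults apply only when the key is missing, not when its value is blank.
    city = row.get("city", DEFAULT_CITY)
    state = row.get("state", DEFAULT_STATE)
    # Assumption: a float address number such as "121.0" should print as "121".
    number = row.get("address_number", "").strip()
    if number.endswith(".0"):
        number = number[:-2]
    street = f"{number} {row.get('address', '').strip()}".strip()
    state_zip = f"{state} {row.get('zipcode', '').strip()}".strip()
    parts = [street, city, state_zip, COUNTRY]
    return {
        "id": row["id"],
        "city": city,
        "state": state,
        # Skip empty parts so missing fields don't leave dangling commas.
        "full_address": ", ".join(p for p in parts if p),
    }


def normalize_csv(text: str) -> list[dict[str, str]]:
    """Normalize every row of a CSV supplied as a string."""
    return [normalize_row(r) for r in csv.DictReader(io.StringIO(text))]
```

Because the defaults are applied only when the key is absent, a CSV that has a `city` column with a blank cell keeps the blank value in this sketch; a real implementation might choose to treat blanks as missing too.
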