
Experiment with ChatGPT API #114

Open
baskaufs opened this issue Jan 23, 2024 · 5 comments

Comments

@baskaufs

The Python quickstart is at https://platform.openai.com/docs/quickstart?context=python

@baskaufs

New information about models (from a 2024-02-09 email):

Last week, we launched gpt-3.5-turbo-0125, our latest GPT-3.5 Turbo model. Along with various improvements, this model also has lower pricing: input prices for the new model are reduced by 50% to $0.0005 /1K tokens and output prices are reduced by 25% to $0.0015 /1K tokens.

If your code specifies gpt-3.5-turbo or gpt-3.5-turbo-16k (the pinned model alias), your model will be automatically updated to gpt-3.5-turbo-0125 on Friday, February 16 and you’ll receive the new, lower pricing.

If for any reason you'd like to continue using the old versions of GPT-3.5 Turbo, you can do so by updating your code to specify gpt-3.5-turbo-0613 or gpt-3.5-turbo-16k-0613 as the model parameter. Please note that the gpt-3.5-turbo-0613 and gpt-3.5-turbo-16k-0613 models will be shut down on June 13th, 2024 as part of our regular model upgrade cycles.
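For reference, pinning an older snapshot just means passing its full name as the model parameter. A minimal sketch using the openai Python client (v1.x); the prompt here is only a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pinning the snapshot name keeps the old model; using the bare alias
# "gpt-3.5-turbo" would let it roll forward to gpt-3.5-turbo-0125.
response = client.chat.completions.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```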

@baskaufs

GPT-4 Vision guide: https://platform.openai.com/docs/guides/vision

@baskaufs commented Apr 2, 2024

Comments from Daniel about prompts:

Once you've accepted the invitation to the slavesocieties GitHub org, you should be able to access the repo here.

The instruction sets that I've constructed are here, examples for NLP are here, and example text for HTR is here.

The functions that compile those instruction sets and examples for specific use cases are in this file. The function names related to each of these tasks are collect_instructions, generate_training_data, and generate_block_training_data, respectively. You can refer to the last of those functions to see the construction of the URLs for images corresponding to the HTR example text (and those URLs should be publicly accessible).

I know that what you're most interested in is how I use those instructions and examples to build a conversation history. You can see examples of this in any of these files. One other thing that you'll see there and that might be useful for you and Emily as well is constraining the model to produce JSON output by specifying a response format in the API call (note that in order for this to work you also need to explicitly refer to "JSON" in the first message in the conversation).
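A minimal sketch of the JSON-constraining trick Daniel mentions, using the openai Python client; the prompt wording and model name here are placeholders, not his actual instruction sets. Note that response_format={"type": "json_object"} is only accepted if the word "JSON" appears somewhere in the messages:

```python
import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    # Forces the reply to be a single valid JSON object.
    response_format={"type": "json_object"},
    messages=[
        # The API requires an explicit mention of "JSON" in the conversation.
        {"role": "system",
         "content": 'Extract named entities from the title and reply with a '
                    'JSON object of the form {"entities": [...]}.'},
        {"role": "user", "content": "Portrait of a Lady in Seville, 1878"},
    ],
)

entities = json.loads(response.choices[0].message.content)
print(entities)
```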

@baskaufs commented Apr 2, 2024

From Emily:

The most recent version of the code should still be in the openai_ner.ipynb file (in label_analysis/chat_gpt); despite the file name it contains multiple processes (ner on titles, querying ner output, gpt vision on cropped images, querying vision output). I added a folder in image_analysis/output_final called test_0324 that has the output from my last round of testing. The ner_wikilabel files have to do with titles and image_wikilabel files have to do with object recognition (performed on a subset of all the objects depicted in the works selected for ner_wikilabel). "Sample" was the result of me randomly getting 10 works from five categories (print, painting, poster, sculpture, ceramic) while "Warhol" was just a sample of Warhol works.

@baskaufs

Emily's comments on final tests (2024-04-15):

As we discussed last time, it appears we need to be a bit more strategic about what tests we run on different types of artworks. I was able to figure out how to adjust URLs to trim images down to meet size requirements for ChatGPT, so I didn't have any more cases where it wasn't able to run an image (the processing time seemed to have improved too).

For object detection, I used the following procedure:

  • Use property coverage from Wikidata to get specific subsets: ceramics (soba bowls, plates, dishes), sculptures, drawings (creation date before 1940 or accession number below 1996, as anything outside of these filters was majority abstract/nonrepresentational art); results in work_type.csv
  • Use object_localization_image_urls and accession_dimensions to extract the IIIF URL for the full image, then shrink it so the maximum dimension was capped at 512 pixels; results in work_type_gpt.csv
  • Use the modified URLs as input for GPT vision, with a slightly modified messages parameter for each work type, asking for all output in JSON. I also asked ChatGPT to return everything it thought was a notable subject in the work (instead of forcing it to identify a main subject and limiting it to three guesses); results in work_type_gpt_output1.csv (a sketch of these two steps follows this list)
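A rough sketch of what the URL shrinking and the vision call could look like. The IIIF size syntax and the image_url message format are standard, but the function names, the model name, and the assumption that the full-size URL ends in /full/full/0/default.jpg are illustrative, not Emily's actual code:

```python
from openai import OpenAI

def shrink_iiif_url(full_url: str) -> str:
    """Cap the longest side at 512 px via the IIIF Image API size parameter.

    "!512,512" means: scale to fit inside a 512x512 box, preserving aspect ratio.
    Assumes the original URL uses the full-size form .../full/full/0/default.jpg.
    """
    return full_url.replace("/full/full/", "/full/!512,512/")

client = OpenAI()

def describe_work(image_url: str, work_type: str) -> str:
    """Ask GPT vision for every notable subject in the work, as JSON text."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder for whichever vision-capable model was used
        messages=[
            {"role": "user", "content": [
                {"type": "text",
                 "text": f"This image shows a {work_type}. List every notable "
                         "subject you can identify, as a JSON object "
                         '{"subjects": [...]}.'},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]},
        ],
    )
    return response.choices[0].message.content

# Usage: describe_work(shrink_iiif_url(iiif_url), "ceramic")
```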

For NER on titles, I know we considered just running it on everything because of the relatively low cost, but as I took a second look at the titles I thought that there were some cases where the results would obviously not be good. I ended up just doing it on paintings:

  • Use property coverage from Wikidata to get all paintings, filtered to ones whose titles were not "Untitled"; results in painting_ner.csv
  • Run NER on titles in batches of 15 (to prevent the output from being cut off by the token limit). I had better luck getting the output in JSON format this time, but every 200 works or so the NER function had to be re-run (otherwise some random character would pop up in the output and my json.loads would throw an error); results in painting_ner1.csv (a batching/retry sketch follows this list)
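A sketch of the batch-and-retry pattern described above; ner_call stands in for whatever function sends a batch of titles to the API and returns its raw JSON reply (the real function and file names differ):

```python
import json

def run_ner_in_batches(titles, ner_call, batch_size=15, max_retries=3):
    """Run NER over titles in batches small enough to avoid the token limit.

    If the reply is not valid JSON (e.g. a stray character slipped in),
    the batch is simply re-run, mirroring the manual re-runs described above.
    """
    results = {}
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        for attempt in range(max_retries):
            reply = ner_call(batch)
            try:
                results.update(json.loads(reply))
                break  # batch parsed cleanly
            except json.JSONDecodeError:
                print(f"Batch at index {start}: bad JSON, retrying ({attempt + 1})")
        else:
            print(f"Batch at index {start}: giving up after {max_retries} tries")
    return results
```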

As far as additional categories go, I think the ones most likely to give significant results are running paintings through vision (probably will have to do something like I did with prints where I filtered out certain styles) and running prints (and perhaps ceramics/sculptures?) through NER. This would allow us to try matching NER labels with detected objects, adding depicts statements in Wikidata, improving IIIF manifest labels, etc. I do think the quality of the output is much better this time around, especially when comparing the full-image vision output with that of Google Vision (though ChatGPT lacks bounding boxes).

Let me know what your thoughts are. All the files should be in image_analysis/output_final/test_0415.
