Possible solution for changed files, and also alternate database suggestions #383
-
I would suggest not a checksum but a unique ID added to the EXIF or other metadata: writing a checksum into the file changes the file's contents, which immediately makes that checksum invalid. Furthermore, if you have to calculate the checksum every time, how much computational resource would you need? As for the database, I would suggest using Flask-SQLAlchemy. It uses Python classes to automatically create tables and relations in the underlying database, which makes the relations easier to understand and makes accessing data just as easy. Furthermore, you have the freedom to choose any underlying database.
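To make the first point concrete, here is a minimal sketch using plain `hashlib`, with in-memory bytes standing in for a file. It shows that writing a file's own checksum into it changes the bytes, so the stored value no longer matches. The `embed_checksum` helper is hypothetical and simply appends a tag; a real implementation would need a proper metadata writer for each format.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def embed_checksum(data: bytes) -> tuple[bytes, str]:
    """Hypothetical helper: append the file's own checksum as a metadata tag.

    Returns the modified bytes and the checksum that was embedded.
    """
    digest = sha256_hex(data)
    tagged = data + b"\nchecksum=" + digest.encode("ascii")
    return tagged, digest


original = b"example file contents"
tagged, embedded = embed_checksum(original)

# The embedded value describes the *old* bytes, not the file as it now exists.
print(embedded == sha256_hex(original))  # True
print(embedded == sha256_hex(tagged))    # False
```

A random ID (for example from `uuid.uuid4()`) avoids this circularity because it does not depend on the file's contents.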
-
I would definitely want such a feature to be disabled by default. I think it's great that TagStudio doesn't move, touch, or modify the original files, and I would want to keep it that way. I wouldn't want checksums to change or a file's digital signature to become invalid just because I'm managing it in TagStudio. Also, very few file formats allow custom metadata like this: you'd need a specialized parser for each file format, and writing to them could break some formats. It's mainly media files (video, audio, images) that support this kind of arbitrary metadata, and that metadata wasn't intended for random software to add its own IDs to a file.
-
Hello all, I saw on CyanVoxel's YouTube channel that this project is alive and well! About halfway through the video, he mentions that the way the system handles files that have been moved or renamed still has room for improvement. I had a thought about that.
Create an MD5/SHA-256 checksum of every indexed file and inject it into the file's metadata (I'm not certain whether all file formats can hold metadata or share a common field like "comments"). When a file is altered (renamed or moved), the system can efficiently compare the checksum stored in the database against the checksums of all un-indexed files, without having to recalculate them from the actual file contents. This also means that duplicates of the same file, or altered files (such as a Word document whose text was changed and which was then renamed), will still be tracked.
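The matching step could look roughly like the sketch below. It assumes the library keeps a mapping from each indexed path to its SHA-256 digest; `file_sha256`, `find_moved_files`, and the dict-based "database" are all hypothetical. Unlike the proposal above, this version hashes candidate files directly instead of reading a checksum back out of their metadata, so it works for any format but pays the hashing cost once per un-indexed file.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files are never fully loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_moved_files(index: dict[Path, str], root: Path) -> dict[Path, Path]:
    """Map each indexed path that no longer exists to an un-indexed file with the same digest.

    `index` is a hypothetical stand-in for the database: {indexed_path: sha256_hex}.
    Paths in `index` are assumed to be built the same way as `root.rglob()` results.
    """
    # Group missing entries by digest so duplicate files are each relocated once.
    missing: dict[str, list[Path]] = {}
    for path, digest in index.items():
        if not path.exists():
            missing.setdefault(digest, []).append(path)
    if not missing:
        return {}

    known = set(index)
    moves: dict[Path, Path] = {}
    for candidate in root.rglob("*"):
        if not candidate.is_file() or candidate in known:
            continue  # skip directories and files that are already indexed
        digest = file_sha256(candidate)
        paths = missing.get(digest)
        if paths:
            moves[paths.pop()] = candidate
            if not paths:
                del missing[digest]
            if not missing:
                break  # every missing entry has been relocated
    return moves
```

Hashing in chunks keeps memory use flat, but the time cost still scales with the total size of the un-indexed files, which is the concern raised in the first reply.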
While I'm here, I'd also like to suggest an alternative to a SQL database. A lot of the ways tags interact with each other will be very hard to manage in SQL. My recommendation is ArangoDB, as it's multi-model (it lets you do key-value, but also document and graph). That said, Neo4j is a lot simpler and would still be a good fit for this: your data gets stored with relationships between things, so [mario >wearing> overalls] is a query you could make, but you could also search for mario, wearing, overalls, or any combination thereof. Actually creating the tags might be a bit more complicated, since you now have to add all the relationships in order to search them, but there are a few good interfaces around to simplify that. Lastly, SurrealDB is a document database, but its query language is basically SQL with some extra abstraction. I haven't used SurrealDB much, so I can't weigh in on whether it's right for this, only that it would offer a lot more flexibility and wouldn't have as much of a learning curve as the others if people are already familiar with SQL.
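To illustrate the kind of query described above without tying it to any particular database, here is a toy in-memory triple store. It is not ArangoDB's, Neo4j's, or SurrealDB's API; `TagGraph`, `add`, and `query` are hypothetical names. Each relationship is stored as a (subject, predicate, object) triple, and any position left as `None` in a query acts as a wildcard, which is what allows searching for mario, wearing, overalls, or any combination.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TagGraph:
    """Toy graph of tag relationships stored as (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def query(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        """Return matching triples, sorted; any argument left as None matches everything."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self.triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


g = TagGraph()
g.add("mario", "wearing", "overalls")
g.add("mario", "holding", "hammer")
g.add("luigi", "wearing", "overalls")

print(g.query("mario", "wearing", "overalls"))  # [('mario', 'wearing', 'overalls')]
print(g.query(obj="overalls"))  # [('luigi', 'wearing', 'overalls'), ('mario', 'wearing', 'overalls')]
print(g.query(subject="mario"))  # [('mario', 'holding', 'hammer'), ('mario', 'wearing', 'overalls')]
```

This version scans every triple on each query; a real graph database would index the relationships instead, but the shape of the queries is similar.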
Also, tossing this in from one of the YouTube comments I saw: someone asked for sidecar files. While that's exactly what we're trying to avoid with this whole thing, I think having the option to export sidecar files could eventually be useful, and it shouldn't be too difficult to add (take the current tags for a selected file, put them into a standard format like JSON, and create the sidecar file next to the original). This could help with some backup problems, or when moving files to a system that doesn't have TagStudio on it, such as when sharing with a team.
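A sidecar export along those lines might look like this minimal sketch. The `<filename>.json` naming and the `{"file": ..., "tags": [...]}` layout are assumptions for illustration, not an existing TagStudio format, and `export_sidecar` is a hypothetical helper.

```python
import json
from pathlib import Path


def export_sidecar(file_path: Path, tags: list[str]) -> Path:
    """Write the file's current tags to a JSON sidecar next to it (e.g. photo.jpg -> photo.jpg.json)."""
    sidecar = file_path.with_name(file_path.name + ".json")
    payload = {"file": file_path.name, "tags": sorted(tags)}
    sidecar.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return sidecar
```

Appending `.json` to the full filename, rather than replacing the extension, keeps `photo.jpg` and `photo.png` in the same folder from writing to the same sidecar.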