Journal Transporter is a Python 3 CLI application that facilitates the transfer of scholarly journal data to and from different online publishing platforms.
- Python 3.9 / Pip3
- Linux or Unix-based OS
- Untested on Windows. It may work, except for included Bash scripts
Clone this repo and navigate to it. Install dependencies with bin/install
.
CLI commands can be executed with:
bin/journal-transporter <COMMAND>
or
python -m journal_transporter <COMMAND>
Note: If you use the latter syntax and invoke python directly, you'll likely want to activate the virtual environment first with source ./venv/bin/activate
. bin/journal-transporter
does this for you.
bin/install
also by default creates a symlink for journal-transporter
in user/local
. If this is permitted, the application can be invoked simply at journal-transporter
.
-v, --verbose Enable verbose output
--version Show the application's version and exit
--help Show this message and exit.
Apply configuration options.
These options will be used as default values for the transfer command, unless different options are provided.
Options:
-d, --data-directory TEXT Path to data directory location
--default-source, --s TEXT Name of an already-defined source server to
use by default (see define-server)
--default-target, --t TEXT Name of an already-defined target server to
use by default (see define-server)
-k, --keep / -K, --discard Should the dataset from this transfer be
kept? This could use a lot of disk space
--keep-max INTEGER If --keep is true, how many transfers should
be kept? Older transfer will be discarded.
-v, --verbose / -V, --succinct Verbose output by default
--help Show this message and exit.
Defines a server that can be referenced later as a source or destination.
If NAME
already exists in the config (case sensitive), this command will
update that server rather than creating a new one.
Arguments:
NAME Name of the server to create or update [required]
Options:
-h, --host TEXT The server's URL or hostname that can be used to
access it from this machine
-t, --type [ssh|http] Method that should be used to connect to the server
-u, --user TEXT Username of a user authorized to access the
information
-p, --password TEXT Password of a user authorized to access the
information
--port INTEGER The port to which the client should connect
--help Show this message and exit.
Deletes a server configuration.
Arguments:
NAME Name of the server you wish to delete [required]
Options:
-f, --force Perform delete without confirmation prompt
--help Show this message and exit.
Displays the current application configuration.
Options:
--help Show this message and exit.
Initialize the application for use. Must be called first.
This command should be called before all else. It creates the data directory and config file, so is required before any other configuration.
Options:
-d, --data-directory TEXT Path to data directory location
--help Show this message and exit.
Initiates a transfer of data from a source server to a target server.
By default, this command invokes 3 actions - index, fetch, and push. These actions can be invoked
individually by using the --index-only
, --fetch-only
, and --push-only
flags. When using
--index-only
or --fetch-only
, only a --source
must be defined. When using --push-only
,
only a --target
must be defined.
Note that --fetch-only
will inherently invoke index unless indexes have already been cached
for the provided source. --push-only
can only be used if data has already been indexed
and fetched.
Options:
-j, --journals TEXT Any number of journal key names (also known as
'paths' or 'codes') that are to be transferred
-s, --source TEXT Name of an already-defined source server to use
(see define-server)
-t, --target TEXT Name of an already-defined target server to use
(see define-server)
--fetch-only If true, only fetch data and do not transfer to
target server.
--push-only If true, only take currently-stored data and
transfer to target server. Do not fetch new
data.
--index-only If true, only index data and do not fetch or
push.
--data-directory, --d TEXT Path to data directory location [default:
/Users/timfrazee/Library/Application
Support/journal_transporter]
-k, --keep / -K, --discard Should the dataset from this transfer be kept?
This could use a lot of disk space
--debug Enable debug output
-f, --force Run without prompts
--help Show this message and exit.
All commands are described by invoking bin/jt --help
. Detail on individual commands can be viewed with bin/jt <COMMAND> --help
.
Generally speaking, the process of transferring a journal from a source server to a target server involves the following commands:
- Define the source server
journal-transporter define-server <source_server_name> --host <source_server_plugin_api_url> --username <auth_user> --password <auth_password>
- Define the target server
journal-transporter define-server <target_server_name> --host <destination_server_plugin_api_url> --username <auth_user> --password <auth_password>
- Initiate the transfer, referencing the servers created above
journal-transporter transfer --source <source_server_name> --target <target_server_name> --journals <journal_path>
See documentation for the transfer
command (jt transfer --help
) for more options.
Let's say there is a source OJS server at https://ojs.example.com, with Journal Transporter
plugin available at /jt
. On this server instance, there is a privileged user with username
ojs_admin
and password abc_easy
.
There is also a Janeway server located at https://janeway.example.com, which has the
Journal Transporter plugin installed at/plugins/journal-transporter
.
On this Janeway instance, there is a privileged user with username janeway_admin
and password as_123
.
First, you must define the source OJS server in Journal Transporter. We'll name is "ojs":
journal-transporter define-server ojs --host https://ojs.example.com/jt --user ojs_admin --password abc_easy
Then, you must define the target Janeway server. We'll name it "janeway":
journal-transporter define-server janeway --host https://janeway.example.com/plugins/journal-transporter --user janeway_admin --password as_123
Let's say you have a journal called sample_journal
that exists on the OJS server, and you want to transfer it to the Janeway server.
With your servers defined, you can now initiate the transfer of the desired journal. In this example, we'll set verbose output for more detailed progress reporting:
journal-transporter --verbose transfer --source ojs --target janeway --journals sample_journal
This process may take quite a long time, depending on how much data there is to be transferred.
Since transfers can take quite a long time, you may want to split it up into multiple discrete actions. This also allows you to review the data that's been fetched before pushing it.
Journal Transporter performs transfers in 3 distinct stages:
- Indexing
- Fetching
- Pushing
You can invoke these stages separately, if desired, but they must occur in this order.
If you want to perform a full index of all resources that will be transferred as part of the process, you can perform indexing as a standalone action.
journal-transporter --verbose transfer --index-only --source ojs --journal sample_journal
When finished, the data directory's "current" directory will contain two top-level directories: "journals" and "users". See the data structure for more detail. These directories will contain at least a partial data structure and an index for each index-able resource. These indexes are primarily used for lookups during the fetching process, and only contain a small amount of the data to ultimately be transferred.
After indexing, Journal Transporter can fetch all of the data and files necessary for the transfer. This can be invoked independently with
journal-transporter --verbose transfer --fetch-only --source ojs --journal sample_journal
Depending on the size of the journal, this process may consume a significant amount of disk space.
Fetching will pull detailed JSON data for each indexed object, as well as referenced binary files. They will all be stored in the structure defined in the data structure section.
The final step of the transfer process is pushing the fetched data to the target server (in this example, a Janeway instance). This can be done, provided indexing and fetching have already been completed, with:
journal-transporter --verbose transfer --push-only --target janeway --journal sample_journal
Note that the plugin on the target server (in this case, the Janeway plugin) is responsible for handling the raw data pushed to it by Journal Transporter. Before starting the push, ensure that the target database is ready to accept the data (i.e. sufficient space, no impending journal code conflicts, etc.)
If a fatal error occurs during this process, you will be notified, but Journal Transporter will not, itself, perform any clean up of your target server (plugins are responsible for their own clean-up, if any is to be performed). After addressing the issue, you will likely need to seek out and delete any created resources on the target server (journals, files, etc) before trying again.
Journal Transporter saves all indexed and fetched data to the data directory defined in the application's configuration file. By default, data is stored in a directory called "current", indicating that it is relevant to the current (or most recent) transfer process invoked. If you used the --keep
flag or configuration, "current" may be a symlink to a dated directory that will be retained.
Within the data directory is a nested hierarchy of resource directories, named after the type of resource contained. Note that this name may not be the same as the resource in the source server. Names in Journal Transporter are designed to be as generic and simply descriptive as possible, but do not account for different systems' naming schemes.
All indexed and fetched objects are assigned a unique UUID. This UUID is stored in the index, as well as used as the directory name for that resource.
Files fall into a data hierarchy that resembles the following:
- users
- journals
- roles
- issues
- sections
- review_forms
- elements
- articles
- editors
- authors
- files
- revision_requests
- rounds
- assignments
- response
- assignments
The above translates into a file structure that looks something like the following example (truncated for brevity, with example UUIDs):
- users/
- index.json
- e82dfaef-9555-45c9-98cd-b4cb84dc595f/
- user.json
- 37a6c228-d0b1-41fb-b141-559768f1b113/
- user.json
- journals/
- index.json
- 46c807b7-7426-431d-9e02-4dcc108c67c6/
- journal.json
- issues/
- index.json
- c460af58-49f9-4cd1-a73d-38711b856e04/
- issue.json
- articles/
- index.json
- d766b878-951e-455a-b8b1-7c512a74d386/
- article.json
- authors/
- index.json
- 8eaae381-d8a9-4738-8ca7-65b90e1a8f4d/
- author.json
- files/
- index.json
- 33ba4f36-982c-4d58-b65a-e71f34f04351/
- file.json
- example_file.pdf
... and so on.
Except for binary files, all files are text and can be opened and reviewed in any text editor.
TBD
Contributions are welcome. Please see guidelines in the contributing file.