Predictive Models for Researching Scientific Software

Computational predictive models to assist in the identification, classification, and study of scientific software.

Models

Developer-Author Entity Matching

This model is a binary classifier that predicts whether a developer and an author are the same person. It is trained on a dataset of 3,000 developer-author pairs, each annotated as either matching or not matching.

Usage

Given a set of developers and a set of authors, the model is applied to every possible developer-author pair to predict whether the two are the same person. Only the pairs predicted to match are returned, as a list of MatchedDevAuthor objects, each containing the developer, the author, and the confidence of the prediction.

from sci_soft_models import dev_author_em

devs = [
    dev_author_em.DeveloperDetails(
        username="evamaxfield",
        name="Eva Maxfield Brown",
    ),
    dev_author_em.DeveloperDetails(
        username="nniiicc",
    ),
]

authors = [
    "Eva Brown",
    "Nicholas Weber",
]

matches = dev_author_em.match_devs_and_authors(devs=devs, authors=authors)
print(matches)
# [
#   MatchedDevAuthor(
#       dev=DeveloperDetails(
#           username='evamaxfield',
#           name='Eva Maxfield Brown',
#           email=None,
#       ),
#       author='Eva Brown',
#       confidence=0.9851127862930298
#   )
# ]
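
To make the "every possible pair" strategy described above concrete, here is a minimal, self-contained sketch. It is not the library's implementation: `Dev`, `Match`, `toy_score`, `match_all_pairs`, and the `threshold` parameter are hypothetical names introduced for illustration, and the scorer is a plain string-similarity stand-in for the trained classifier.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import product
from typing import List, Optional


@dataclass
class Dev:
    """Hypothetical stand-in for DeveloperDetails."""
    username: str
    name: Optional[str] = None


@dataclass
class Match:
    """Hypothetical stand-in for MatchedDevAuthor."""
    dev: Dev
    author: str
    confidence: float


def toy_score(dev: Dev, author: str) -> float:
    """Placeholder scorer: string similarity, NOT the trained model."""
    # Fall back to the username when the developer has no display name.
    candidate = (dev.name or dev.username).lower()
    return SequenceMatcher(None, candidate, author.lower()).ratio()


def match_all_pairs(
    devs: List[Dev], authors: List[str], threshold: float = 0.5
) -> List[Match]:
    """Score every (developer, author) pair and keep those at or above threshold."""
    matches = []
    for dev, author in product(devs, authors):
        score = toy_score(dev, author)
        if score >= threshold:
            matches.append(Match(dev=dev, author=author, confidence=score))
    return matches


if __name__ == "__main__":
    devs = [Dev("evamaxfield", "Eva Maxfield Brown"), Dev("nniiicc")]
    authors = ["Eva Brown", "Nicholas Weber"]
    for m in match_all_pairs(devs, authors):
        print(m.dev.username, "->", m.author, round(m.confidence, 3))
```

The number of scored pairs grows as the product of the two list sizes, so for large inputs it may be worth narrowing the candidate lists before matching.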

Extra Notes

Developer-Author-EM Dataset

This model was originally created and managed as part of rs-graph. To regenerate the dataset for annotation, run the following steps:

git clone https://github.com/evamaxfield/rs-graph.git
cd rs-graph
git checkout c1d8ec89
pip install -e .
rs-graph-modeling create-developer-author-em-dataset-for-annotation

Link to annotation set creation function.
