-Description: Scan multiple Git repositories, pull specified files content and process it with Large Language Models. You can summarize the content in specific way, extract information and data, or find answers to your questions about the repositories.
+Description: Scan multiple Git repositories, pull specified files content and process it with Large Language Models. You can summarize the content in specific way, extract information and data, or find answers to your questions about the repositories. The output can be stored in a vector database and used for semantic search or as a part of a RAG (Retrieval Augmented Generation) prompt.
README.Rmd: 67 additions & 14 deletions
@@ -19,15 +19,46 @@ knitr::opts_chunk$set(
 [](https://app.codecov.io/gh/r-world-devs/GitAI)
 <!-- badges: end -->
 
-The goal of GitAI is to derive knowledge from GitHub or GitLab repositories with the use of AI/LLM (Large Language Models). With GitAI you can easily:
+> The goal of `GitAI` is to **extract knowledge from Git repositories** with the use of AI/LLM (Large Language Models).
 
-- set up your project scope (Git repositories),
-- select content of interest (files and file types),
-- choose your LLM backend,
-- define the LLM prompts,
-- process content of all repositories with a single function call.
+## Motivation
 
-And all of that in a nice tidyverse style.
+Large organizations need to deal with a massive number of Git repositories
+(both internal and external). Those repositories can be hosted on different
+platforms (like `GitHub` and `GitLab`).
+
+It is very difficult or even impossible to review all those repositories
+manually, especially if one needs to perform an exploratory search
+without knowing the exact keywords to use.
+
+Because of that, reusing the knowledge (and code) hidden in the
+repositories is a constant challenge.
+
+## Solution
+
+We propose the `GitAI` framework, written in R.
+
+It is applicable to multiple use cases related to extracting knowledge from Git repositories.
+At the same time, it is IT infrastructure agnostic. It is designed to work with
+different backends, LLMs, embedding models, and vector databases.
+Adapting to a particular backend may require implementing new classes, but
+the core functionality stays the same.
+
+## Workflow
+
+A typical `GitAI` workflow looks like this:
+
+1. Set up your project.
+1. Set up your project scope (Git repositories).
+1. Select content type of interest (files and file types).
+1. Choose your LLM backend.
+1. Define the LLM prompts.
+1. (Optional) Choose an embedding model and a vector database provider.
+1. Process content of all repositories with a single function call.
+1. (Optional) If a vector database is set up, the results will be stored there.
+1. Use the information extracted from the file content of the Git repositories.
+1. (Optional) If the results are stored in a vector database,
+they can be searched using *semantic search* or used as a part of a RAG (*Retrieval Augmented Generation*) prompt.
 
 ## Installation
 
@@ -38,21 +69,43 @@ You can install the development version of `GitAI` from [GitHub](https://github.
 pak::pak("r-world-devs/GitAI")
 ```
 
-## Example workflow
-
-Basic workflow could look like:
+## Simplified example (without vector database usage)
 
 ```{r}
 library(GitAI)
-# Set up project
+```
+
+Let's set up a project `fascinating_project` that will extract some summaries from the content of the `README.md` files in a few selected Git repositories.
set_prompt("Write one-sentence summary for a project based on given input.")
+```
+
+Now, let’s get the results and print them.
 
-# Get the results
+```r
 results <- process_repos(my_project)
-purrr::map(results, ~.$text)
-#> $GitStats
-#> [1] "GitStats is an R package that enables users to extract and analyze GitHub and GitLab data, such as repository details, commits, and user activity, in a standardized table format."
+#> GitStats is an experimental R package that facilitates the extraction
+#> and analysis of git data from GitHub and GitLab, providing insights into
+#> repositories, commits, users, and R package usage in a structured format.
 #>
-#> $GitAI
-#> [1] "GitAI is an R package designed to harness the power of AI and Large Language Models to extract insights from GitHub or GitLab repositories in a user-friendly, tidyverse style, enabling users to set project scopes, select content of interest, and process repositories with ease."
+#> GitAI is an R package that leverages AI and Large Language Models to extract
+#> insights from GitHub or GitLab repositories, allowing users to define project
+#> scopes, select relevant content, and process repositories efficiently in a
+#> tidyverse-compliant manner.
 #>
-#> $DataFakeR
-#> [1] "DataFakeR is an experimental R package designed to generate fake data samples that maintain specified characteristics of original datasets, streamlined through customizable configurations and schema management."
+#> DataFakeR is an R package that enables users to generate synthetic datasets
+#> while maintaining specified assumptions about the original data structure,
+#> facilitating data simulation for testing and analysis.
 ```
+
+## See also
+
+Under the hood, `GitAI` uses the `GitStats` R package. If you want to