Commit 655c583

Initial commit
1 parent fdb3890 commit 655c583

98 files changed

+9201
-2
lines changed

.gitattributes

+1-1
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
# Auto-detect text files, ensure they use LF.
2-
* text=auto eol=lf working-tree-encoding=UTF-8
2+
* text=auto eol=lf
33

44
# Bash scripts
55
*.sh text eol=lf

docs/PLANNER.md

+1-1
@@ -38,4 +38,4 @@ like "I want a job promotion."
3838
The planner will operate within the skills it has available. In the event that a
3939
desired skill does not exist, the planner can suggest you to create the skill.
4040
Or, depending upon the level of complexity the kernel can help you write the missing
41-
skill.
41+
skill.

python/.conf/.pre-commit-config.yaml

+22
@@ -0,0 +1,22 @@
1+
repos:
2+
- repo: https://github.com/pre-commit/pre-commit-hooks
3+
rev: v4.0.1
4+
hooks:
5+
- id: check-toml
6+
- id: check-yaml
7+
- id: end-of-file-fixer
8+
- id: mixed-line-ending
9+
- repo: https://github.com/psf/black
10+
rev: 22.3.0
11+
hooks:
12+
- id: black
13+
- repo: https://github.com/PyCQA/isort
14+
rev: 5.12.0
15+
hooks:
16+
- id: isort
17+
args: ["--profile", "black"]
18+
- repo: https://github.com/pycqa/flake8
19+
rev: 6.0.0
20+
hooks:
21+
- id: flake8
22+
args: ["--config=python/.conf/flake8.cfg"]

python/.conf/flake8.cfg

+3
@@ -0,0 +1,3 @@
1+
[flake8]
2+
max-line-length = 88
3+
extend-ignore = E203

python/.editorconfig

+29
@@ -0,0 +1,29 @@
1+
# To learn more about .editorconfig see https://aka.ms/editorconfigdocs
2+
3+
# All files
4+
[*]
5+
indent_style = space
6+
end_of_line = lf
7+
8+
# Docs
9+
[*.md]
10+
insert_final_newline = true
11+
trim_trailing_whitespace = true
12+
13+
# Config/data
14+
[*.json]
15+
indent_size = 4
16+
insert_final_newline = false
17+
trim_trailing_whitespace = true
18+
19+
# Config/data
20+
[*.yaml]
21+
indent_size = 4
22+
insert_final_newline = true
23+
trim_trailing_whitespace = true
24+
25+
# Code
26+
[*.py]
27+
indent_size = 4
28+
insert_final_newline = true
29+
trim_trailing_whitespace = true

python/.vscode/settings.json

+21
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
{
2+
"python.analysis.extraPaths": [
3+
"./src"
4+
],
5+
"explorer.compactFolders": false,
6+
"prettier.enable": true,
7+
"editor.formatOnType": true,
8+
"editor.formatOnSave": true,
9+
"editor.formatOnPaste": true,
10+
"python.formatting.provider": "autopep8",
11+
"python.formatting.autopep8Args": [
12+
"--max-line-length=160"
13+
],
14+
"notebook.output.textLineLimit": 500,
15+
"cSpell.words": [
16+
"aeiou",
17+
"nopep",
18+
"OPENAI",
19+
"skfunction"
20+
],
21+
}

python/DEV_SETUP.md

+87
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
# System setup
2+
3+
To get started, you'll need VSCode and a local installation of Python 3.x.
4+
5+
You can run:
6+
7+
python3 --version ; pip3 --version ; code -v
8+
9+
to verify that you have the required dependencies.
10+
11+
## If you're on WSL
12+
13+
Check that you've cloned the repository to `~/workspace` or a similar folder.
14+
Avoid `/mnt/c/` and prefer using your WSL user's home directory.
15+
16+
Ensure you have the WSL extension for VSCode installed (and the Python extension
17+
for VSCode installed).
18+
19+
You'll also need `pip3` installed. If you don't yet have a `python3` install in WSL,
20+
you can run:
21+
22+
```bash
23+
sudo apt-get update && sudo apt-get install python3 python3-pip
24+
```
25+
26+
ℹ️ **Note**: if you don't have your PATH setup to find executables installed by `pip3`,
27+
you may need to run `~/.local/bin/poetry install` and `~/.local/bin/poetry shell`
28+
instead. You can fix this by adding `export PATH="$HOME/.local/bin:$PATH"` to
29+
your `~/.bashrc` and closing/re-opening the terminal.
30+
31+
# LLM setup
32+
33+
Make sure you have an
34+
[Open AI API Key](https://openai.com/api/) or
35+
[Azure Open AI service key](https://learn.microsoft.com/azure/cognitive-services/openai/quickstart?pivots=rest-api)
36+
37+
ℹ️ **Note**: Azure OpenAI support is work in progress, and will be available soon.
38+
39+
Copy those keys into a `.env` file like this:
40+
41+
```
42+
OPENAI_API_KEY=""
43+
OPENAI_ORG_ID=""
44+
AZURE_OPENAI_API_KEY=""
45+
AZURE_OPENAI_ENDPOINT=""
46+
```
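
As a rough illustration of the file format above, a `.env` file of this shape can be parsed with the standard library alone. This is only a sketch and not how the repository loads its settings: `parse_env` is a hypothetical helper, and it assumes simple `KEY="value"` lines with no escaping or multi-line values.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines (hypothetical helper, not part of SK)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional double quotes.
        values[key.strip()] = value.strip().strip('"')
    return values


sample = 'OPENAI_API_KEY="abc"\n# a comment\nOPENAI_ORG_ID=""\n'
settings = parse_env(sample)
```

Empty values such as `OPENAI_ORG_ID=""` come back as empty strings, so callers can tell "key present but unset" apart from "key missing".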
47+
48+
We suggest adding a copy of the `.env` file under these folders:
49+
50+
- [python/tests](tests)
51+
- [samples/notebooks/python](../samples/notebooks/python).
52+
53+
# Quickstart with Poetry
54+
55+
Poetry lets you use SK from the current repo, without worrying about paths, as
56+
if you had the SK pip package installed. The SK pip package will be published after
57+
porting all the major features and ensuring cross-compatibility with the C# SDK.
58+
59+
To install Poetry in your system:
60+
61+
pip3 install poetry
62+
63+
The following command installs the project dependencies:
64+
65+
poetry install
66+
67+
And the following activates the project virtual environment, to make it easier
68+
to run samples in the repo and develop apps using Python SK.
69+
70+
poetry shell
71+
72+
To run the same checks that are run during the Azure Pipelines build, you can run:
73+
74+
poetry run pre-commit run -c .conf/.pre-commit-config.yaml -a
75+
76+
# VSCode Setup
77+
78+
Open any of the `.py` files in the project and run the `Python: Select Interpreter` command
79+
from the command palette. Make sure the virtual env (venv) created by `poetry` is selected.
80+
The python you're looking for should be under `~/.cache/pypoetry/virtualenvs/semantic-kernel-.../bin/python`.
81+
82+
If prompted, install `black` and `flake8` (if VSCode doesn't find those packages,
83+
it will prompt you to install them).
84+
85+
# Tests
86+
87+
You should be able to run the example under the [tests](tests) folder.

python/FEATURE_PARITY.md

+199
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,199 @@
1+
# Achieving Feature Parity in Python and C#
2+
3+
This is a high-level overview of where things stand towards reaching feature parity with the main
4+
[C# codebase](https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/SemanticKernel).
5+
6+
| | | |
7+
|------|------| ------
8+
| |Python| Notes |
9+
|`./ai/embeddings`| 🔄| Using Numpy for embedding representation. Vector operations not yet implemented |
10+
|`./ai/openai`| 🔄 | Makes use of the OpenAI Python package. AzureOpenAI* not implemented |
11+
|`./configuration`|| Direct port. Check inline docs |
12+
|`./core_skills`| 🔄 | `TextMemorySkill` implemented. Others not |
13+
|`./diagnostics` || Direct port of custom exceptions and validation helpers |
14+
|`./kernel_extensions` | 🔄 | Extensions take kernel as first argument and are exposed via `sk.extensions.*`
15+
|`./memory`| 🔄 | Can simplify by relying on Numpy NDArray
16+
|`./planning`| ❌ | Not yet implemented
17+
|`./semantic_functions/partitioning`| ❌ | Not yet implemented
18+
19+
20+
## Status of the Port
21+
22+
The port has the bulk of the Semantic Kernel C# code re-implemented, but is not yet complete. Major things like `tests` and `docs` are still missing.
23+
Here is a breakdown by sub-module on the status of this port:
24+
25+
### `./ai/embeddings` (Partial)
26+
27+
For now, `VectorOperations` from the original kernel will be skipped. We can use
28+
`numpy`'s `ndarray` as an efficient embedding representation. We can also use
29+
`numpy`'s optimized vector and matrix operations to do things like cosine similarity
30+
quickly and efficiently.
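
A minimal sketch of the kind of `numpy` operations meant here, assuming an embedding is a 1-D `ndarray` and a collection is a 2-D array with one embedding per row. The helper names `cosine_similarity` and `top_k` are illustrative and are not part of the port.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return float(np.dot(a, b) / denom)


def top_k(query: np.ndarray, embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k rows of `embeddings` most similar to `query`."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    # Avoid dividing by zero; the dot product is already 0 in those cases.
    safe_norms = np.where(norms == 0.0, 1.0, norms)
    scores = (embeddings @ query) / safe_norms
    # Negate so that argsort returns the highest scores first.
    return np.argsort(-scores)[:k]


collection = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
nearest = top_k(np.array([1.0, 0.1]), collection, k=2)
```

Scoring the whole collection with a single matrix product is the main reason an `ndarray` representation is convenient compared with a hand-written loop.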
31+
32+
The `IEmbeddingIndex` interface has been translated to the `EmbeddingIndexBase` abstract
33+
class. The `IEmbeddingGenerator` interface has been translated to the
34+
`embedding_generator_base` abstract class.
35+
36+
The C# code makes use of extension methods to attach convenience methods to many interfaces
37+
and classes. In Python we don't have that luxury. Instead, these methods are in the corresponding class definition.
38+
(We can revisit this, but for good type hinting avoiding something fancy/dynamic works best.)
39+
40+
### `./ai/openai` (Partial)
41+
42+
The abstract clients (`(Azure)OpenAIClientAbstract`) have been ignored here. The `HttpSchema`
43+
submodule is not needed given we have the `openai` package to do the heavy lifting (bonus: that
44+
package will stay in-sync with OpenAI's updates, like the new ChatGPT API).
45+
46+
The `./ai/openai/services` module is retained and has the same classes/structure.
47+
48+
#### TODOs
49+
50+
The `AzureOpenAI*` alternatives are not yet implemented. This would be a great, low difficulty
51+
task for a new contributor to pick up.
52+
53+
### `./ai` (Complete?)
54+
55+
The rest of the classes at the top-level of the `./ai` module have been ported
56+
directly.
57+
58+
**NOTE:** here, we've locked ourselves into getting a _single_ completion
59+
from the model. This isn't ideal. Getting multiple completions is sometimes a great
60+
way to solve more challenging tasks (majority voting, re-ranking, etc.). We should look
61+
at supporting multiple completions.
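
To make the majority-voting idea concrete, here is a small sketch that assumes the backend could return several completions as plain strings. The `majority_vote` helper is hypothetical; nothing in the current port produces more than one completion.

```python
from collections import Counter
from typing import Sequence


def majority_vote(completions: Sequence[str]) -> str:
    """Return the most common completion after light normalization."""
    if not completions:
        raise ValueError("need at least one completion")
    # Normalize so that " 42 " and "42" count as the same answer.
    normalized = [c.strip().lower() for c in completions]
    # most_common(1) returns a list holding the single (value, count) winner.
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner


answer = majority_vote(["42", " 42 ", "41"])
```

Re-ranking would follow the same shape, with a scoring function in place of the counter.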
62+
63+
**NOTE:** Based on `CompleteRequestSettings`, there is no easy way to grab the `logprobs`
64+
associated with the model's completion. This would be huge for techniques like re-ranking
65+
and also very crucial data to capture for metrics. We should think about how to
66+
support this. (We're currently a "text in text out" library, but multiple completions
67+
and logprobs seems to be fundamental in this space.)
68+
69+
### `./configuration` (Complete?)
70+
71+
Direct port, not much to do here. Probably check for good inline docs.
72+
73+
### `./core_skills` (Partial)
74+
75+
We've implemented the `TextMemorySkill` but are missing the following:
76+
77+
- `ConversationSummarySkill`
78+
- `FileIOSkill`
79+
- `HttpSkill`
80+
- `PlannerSkill` (NOTE: planner is a big sub-module we're missing)
81+
- `TextSkill`
82+
- `TimeSkill`
83+
84+
#### TODOs
85+
86+
Any of these individual core skills would be great low-to-medium difficulty contributions
87+
for those looking for something to do. Ideally with good docs and corresponding tests.
88+
89+
### `./diagnostics` (Complete?)
90+
91+
Pretty direct port of these few custom exceptions and validation helpers.
92+
93+
### `./kernel_extensions` (Partial)
94+
95+
This is difficult: for good type hinting there's a lot of duplication. Not having the
96+
convenience of extension methods makes this cumbersome. Maybe, in the future, we may
97+
want to consider some form of "plugins" for the kernel?
98+
99+
For now, the kernel extensions take the kernel as the first argument and are exposed
100+
via the `sk.extensions.*` namespace.
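
The pattern described above can be sketched as follows. The `Kernel` class here is a bare stand-in and `register_function` is a hypothetical extension, both invented for illustration; the actual functions under `sk.extensions.*` are not shown in this overview.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Kernel:
    """Stand-in for the real kernel; it only stores named functions."""

    functions: Dict[str, Callable[[str], str]] = field(default_factory=dict)


def register_function(kernel: Kernel, name: str, fn: Callable[[str], str]) -> Kernel:
    """Hypothetical extension: the kernel is passed explicitly as the first argument."""
    kernel.functions[name] = fn
    # Returning the kernel allows calls to be chained.
    return kernel


kernel = Kernel()
register_function(kernel, "shout", str.upper)
```

Because the extension is a plain module-level function with full annotations, editors and type checkers can follow it without any dynamic patching of the `Kernel` class.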
101+
102+
### `./memory` (Partial)
103+
104+
This was a complex sub-system to port. The C# code has lots of interfaces and nesting
105+
of types and generics. In Python, we can simplify this a lot. An embedding
106+
is an `ndarray`. There's lots of great pre-built features that come with that. The
107+
rest of the system is a pretty direct port but the layering can be a bit confusing.
108+
I.e. What's the real difference between storage, memory, memory record,
109+
data entry, an embedding, a collection, etc.?
110+
111+
#### TODOs
112+
113+
Review of this subsystem. Lots of good testing. Maybe some kind of overview
114+
documentation about the design. Maybe a diagram of how all these classes and interfaces
115+
fit together?
116+
117+
### `./orchestration` (Complete?)
118+
119+
This was a pretty core piece and another direct port. Worth double checking. Needs good docs and tests.
120+
121+
### `./planning` (TODO: nothing yet)
122+
123+
Completely ignored planning for now (and, selfishly, planning isn't a priority for
124+
SK-based experimentation).
125+
126+
### `./reliability` (Complete?)
127+
128+
Direct port. Nothing much going on in this sub-module. Likely could use more strategies
129+
for retry. Also wasn't quite sure if this was integrated with the kernel/backends?
130+
(Like are we actually using the re-try code, or is it not hit)
131+
132+
#### TODOs
133+
134+
Implement a real retry strategy that has backoff perhaps. Make sure this code is integrated
135+
and actually in use.
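
One possible shape for such a strategy is exponential backoff with jitter, sketched below. This is an assumption about what "a real retry strategy" could look like, not the ported code; `retry_with_backoff` and all of its parameters are hypothetical.

```python
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on failure with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: wait a random amount between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


attempts: List[int] = []


def flaky() -> str:
    """Fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "ok"


delays: List[float] = []
# Record the delays instead of actually sleeping.
result = retry_with_backoff(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the helper easy to test, which also helps with confirming that the retry path is actually exercised.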
136+
137+
### `./semantic_functions` (Complete?)
138+
139+
Another core piece. The different config classes start to feel cumbersome here
140+
(func config, prompt config, backend config, kernel config, so so much config).
141+
142+
### `./semantic_functions/partitioning` (TODO: nothing yet)
143+
144+
Skipped this sub-sub-module for now. Good task for someone to pick up!
145+
146+
### `./skill_definition` (Complete?)
147+
148+
Another core piece, another pretty direct port.
149+
150+
**NOTE:** the attributes in C# become decorators in Python. We probably could
151+
make it feel a bit more pythonic (instead of having multiple decorators have just
152+
one or two).
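
As a sketch of the "one decorator" idea, the metadata that C# spreads across several attributes could be collected by a single decorator with keyword arguments. The decorator name `native_function` and the `_sk_name` / `_sk_description` attributes are invented for this example and do not reflect the decorators in the port.

```python
from typing import Callable, Optional


def native_function(description: str, name: Optional[str] = None) -> Callable:
    """Hypothetical single decorator that attaches skill metadata to a function."""

    def decorator(fn: Callable) -> Callable:
        # Store the metadata on the function object itself.
        fn._sk_name = name or fn.__name__
        fn._sk_description = description
        return fn

    return decorator


class TextSkill:
    @native_function("Upper-case the input text", name="upper")
    def to_upper(self, text: str) -> str:
        return text.upper()

    @native_function("Trim surrounding whitespace")
    def trim(self, text: str) -> str:
        return text.strip()
```

The decorated methods behave exactly as before; a skill collection could later discover them by looking for the attached attributes.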
153+
154+
**NOTE:** The skill collection, read only skill collection, etc. became a bit
155+
confusing (in terms of the relationship between everything). Would be good to
156+
double check my work there.
157+
158+
### `./template_engine` (Complete?)
159+
160+
Love the prompt templates! Have tried some basic prompts, prompts w/ vars,
161+
and prompts that call native functions. Seems to be working.
162+
163+
**NOTE:** this module definitely needs some good tests. There can be some
164+
subtle errors sneaking into the prompt tokenization/rendering code here.
165+
166+
### `./text` (TODO: nothing yet)
167+
168+
Ignored this module for now.
169+
170+
### `<root>` (Partial)
171+
172+
Have a working `Kernel` and a working `KernelBuilder`. The base interface
173+
and custom exception are ported. The `Kernel` in particular
174+
is missing some things, has some bugs, could be cleaner, etc.
175+
176+
## Overall TODOs
177+
178+
We are currently missing a lot of the doc comments from C#. So a good review
179+
of the code and a sweep for missing doc comments would be great.
180+
181+
We also are missing any _testing_. We should figure out how we want to test
182+
(I think this project is auto-setup for `pytest`).
183+
184+
Finally, we are missing a lot of examples. It'd be great to have Python notebooks
185+
that show off many of the features, many of the core skills, etc.
186+
187+
188+
## Design Choices
189+
190+
We want the overall design of the kernel to be as similar as possible to C#.
191+
We also want to minimize the number of external dependencies to make the Kernel as lightweight as possible.
192+
193+
Right now, compared to C# there are two key differences:
194+
195+
1. Use `numpy` to store embeddings and do things like vector/matrix ops
196+
2. Use `openai` to interface with (Azure) OpenAI
197+
198+
There are also many more subtle differences that come with moving to Python,
199+
things like static properties, no method overloading, no extension methods, etc.
