|
| 1 | +# Achieving Feature Parity in Python and C# |
| 2 | + |
| 3 | +This is a high-level overview of where things stand towards reaching feature parity with the main |
| 4 | + [C# codebase](https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/SemanticKernel). |
| 5 | + |
| 6 | +| | | | |
| 7 | +|------|------| ------ |
| 8 | +| |Python| Notes | |
| 9 | +|`./ai/embeddings`| 🔄| Using NumPy for embedding representation. Vector operations not yet implemented | |
| 10 | +|`./ai/openai`| 🔄 | Makes use of the OpenAI Python package. AzureOpenAI* not implemented | |
| 11 | +|`./configuration`|✅ | Direct port. Check inline docs | |
| 12 | +|`./core_skills`| 🔄 | `TextMemorySkill` implemented. Others not | |
| 13 | +|`./diagnostics` | ✅ | Direct port of custom exceptions and validation helpers | |
| 14 | +|`./kernel_extensions` | 🔄 | Extensions take kernel as first argument and are exposed via `sk.extensions.*` |
| 15 | +|`./memory`| 🔄 | Can simplify by relying on NumPy `ndarray` |
| 16 | +|`./planning`| ❌ | Not yet implemented |
| 17 | +|`./semantic_functions/partitioning`| ❌ | Not yet implemented |
| 18 | + |
| 19 | + |
| 20 | +## Status of the Port |
| 21 | + |
| 22 | +The port re-implements the bulk of the Semantic Kernel C# code but is not yet complete. Major pieces such as tests and docs are still missing. |
| 23 | +Here is a breakdown of the port's status by sub-module: |
| 24 | + |
| 25 | +### `./ai/embeddings` (Partial) |
| 26 | + |
| 27 | +For now, `VectorOperations` from the original kernel will be skipped. We can use |
| 28 | +`numpy`'s `ndarray` as an efficient embedding representation. We can also use |
| 29 | +`numpy`'s optimized vector and matrix operations to do things like cosine similarity |
| 30 | +quickly and efficiently. |
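As a rough illustration of the point above, here is a minimal sketch of cosine similarity over `ndarray` embeddings. The function names `cosine_similarity` and `top_k_similar` are hypothetical and are not part of the port; returning `0.0` for zero vectors is a convention assumed for this sketch only.

```python
from typing import List

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # assumed convention for zero vectors
    return float(np.dot(a, b) / denom)


def top_k_similar(query: np.ndarray, embeddings: np.ndarray, k: int = 3) -> List[int]:
    """Indices of the k rows of `embeddings` most similar to `query`."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    norms[norms == 0.0] = 1.0  # avoid division by zero; those scores stay 0
    scores = embeddings @ query / norms
    return np.argsort(-scores)[:k].tolist()
```

Scoring every stored embedding is a single matrix-vector product, which is the main reason a hand-written `VectorOperations` equivalent may not be needed.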
| 31 | + |
| 32 | +The `IEmbeddingIndex` interface has been translated to the `EmbeddingIndexBase` abstract |
| 33 | +class. The `IEmbeddingGenerator` interface has been translated to the |
| 34 | +`EmbeddingGeneratorBase` abstract class. |
| 35 | + |
| 36 | +The C# code makes use of extension methods to attach convenience methods to many interfaces |
| 37 | +and classes. Python has no direct equivalent, so these methods live in the corresponding class definition instead. |
| 38 | +(We can revisit this, but avoiding anything fancy or dynamic keeps type hinting reliable.) |
| 39 | + |
| 40 | +### `./ai/openai` (Partial) |
| 41 | + |
| 42 | +The abstract clients (`(Azure)OpenAIClientAbstract`) have been ignored here. The `HttpSchema` |
| 43 | +submodule is not needed given we have the `openai` package to do the heavy lifting (bonus: that |
| 44 | +package will stay in-sync with OpenAI's updates, like the new ChatGPT API). |
| 45 | + |
| 46 | +The `./ai/openai/services` module is retained and has the same classes/structure. |
| 47 | + |
| 48 | +#### TODOs |
| 49 | + |
| 50 | +The `AzureOpenAI*` alternatives are not yet implemented. This would be a great, low-difficulty |
| 51 | +task for a new contributor to pick up. |
| 52 | + |
| 53 | +### `./ai` (Complete?) |
| 54 | + |
| 55 | +The rest of the classes at the top-level of the `./ai` module have been ported |
| 56 | +directly. |
| 57 | + |
| 58 | +**NOTE:** here, we've locked ourselves into getting a _single_ completion |
| 59 | +from the model. This isn't ideal. Getting multiple completions is sometimes a great |
| 60 | +way to solve more challenging tasks (majority voting, re-ranking, etc.). We should look |
| 61 | +at supporting multiple completions. |
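To make the majority-voting idea concrete, here is a minimal sketch. It assumes the caller has already obtained several completions as plain strings (something the current single-completion API does not provide); `majority_vote` is a hypothetical helper, not part of the port.

```python
from collections import Counter
from typing import Sequence


def majority_vote(completions: Sequence[str]) -> str:
    """Return the most common completion after light normalization."""
    if not completions:
        raise ValueError("need at least one completion")
    # Normalize whitespace and case so trivially different answers agree.
    normalized = [c.strip().lower() for c in completions]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner
```

For example, `majority_vote(["42", " 42 ", "41"])` picks `"42"`, since two of the three normalized answers agree.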
| 62 | + |
| 63 | +**NOTE:** Based on `CompleteRequestSettings`, there is no easy way to grab the `logprobs` |
| 64 | +associated with the model's completion. This would be huge for techniques like re-ranking |
| 65 | +and is also crucial data to capture for metrics. We should think about how to |
| 66 | +support this. (We're currently a "text in, text out" library, but multiple completions |
| 67 | +and logprobs seem to be fundamental in this space.) |
| 68 | + |
| 69 | +### `./configuration` (Complete?) |
| 70 | + |
| 71 | +Direct port; not much to do here. Worth checking that the inline docs are in good shape. |
| 72 | + |
| 73 | +### `./core_skills` (Partial) |
| 74 | + |
| 75 | +We've implemented the `TextMemorySkill` but are missing the following: |
| 76 | + |
| 77 | +- `ConversationSummarySkill` |
| 78 | +- `FileIOSkill` |
| 79 | +- `HttpSkill` |
| 80 | +- `PlannerSkill` (NOTE: planner is a big sub-module we're missing) |
| 81 | +- `TextSkill` |
| 82 | +- `TimeSkill` |
| 83 | + |
| 84 | +#### TODOs |
| 85 | + |
| 86 | +Any of these individual core skills would be a great low-to-medium difficulty contribution |
| 87 | +for those looking for something to do, ideally with good docs and corresponding tests. |
| 88 | + |
| 89 | +### `./diagnostics` (Complete?) |
| 90 | + |
| 91 | +Pretty direct port of these few custom exceptions and validation helpers. |
| 92 | + |
| 93 | +### `./kernel_extensions` (Partial) |
| 94 | + |
| 95 | +This is difficult: good type hinting requires a lot of duplication, and not having the |
| 96 | +convenience of extension methods makes this cumbersome. In the future, we may |
| 97 | +want to consider some form of "plugins" for the kernel. |
| 98 | + |
| 99 | +For now, the kernel extensions take the kernel as the first argument and are exposed |
| 100 | +via the `sk.extensions.*` namespace. |
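The pattern described above could be sketched as follows. `Kernel` here is a hypothetical stand-in and `register_function` is an illustrative name; neither is the port's actual API. The only point is that what would be `kernel.SomeExtension(...)` in C# becomes a plain function that receives the kernel explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Kernel:
    """Hypothetical stand-in for the real kernel; holds registered functions."""

    functions: Dict[str, Callable[[str], str]] = field(default_factory=dict)


def register_function(kernel: Kernel, name: str, fn: Callable[[str], str]) -> Kernel:
    """Extension-style helper: the kernel is passed as the first argument."""
    kernel.functions[name] = fn
    return kernel  # returning the kernel allows simple chaining


kernel = Kernel()
register_function(kernel, "shout", str.upper)
print(kernel.functions["shout"]("hello"))
```

Because the helper is an ordinary module-level function, type checkers see its full signature without any dynamic patching of the `Kernel` class.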
| 101 | + |
| 102 | +### `./memory` (Partial) |
| 103 | + |
| 104 | +This was a complex sub-system to port. The C# code has lots of interfaces and nesting |
| 105 | +of types and generics. In Python, we can simplify this a lot. An embedding |
| 106 | +is an `ndarray`, and there are lots of great pre-built features that come with that. The |
| 107 | +rest of the system is a pretty direct port, but the layering can be a bit confusing. |
| 108 | +For example, what's the real difference between storage, memory, a memory record, |
| 109 | +a data entry, an embedding, a collection, etc.? |
| 110 | + |
| 111 | +#### TODOs |
| 112 | + |
| 113 | +Review of this subsystem. Lots of good testing. Maybe some kind of overview |
| 114 | +documentation about the design. Maybe a diagram of how all these classes and interfaces |
| 115 | +fit together? |
| 116 | + |
| 117 | +### `./orchestration` (Complete?) |
| 118 | + |
| 119 | +This was a pretty core piece and another direct port. Worth double-checking. Needs good docs and tests. |
| 120 | + |
| 121 | +### `./planning` (TODO: nothing yet) |
| 122 | + |
| 123 | +Completely ignored planning for now (and, selfishly, planning isn't a priority for |
| 124 | +SK-based experimentation). |
| 125 | + |
| 126 | +### `./reliability` (Complete?) |
| 127 | + |
| 128 | +Direct port. Nothing much going on in this sub-module. It could likely use more retry |
| 129 | +strategies. It's also unclear whether this is integrated with the kernel/backends |
| 130 | +(i.e. is the retry code actually being used, or is it never hit?). |
| 131 | + |
| 132 | +#### TODOs |
| 133 | + |
| 134 | +Implement a real retry strategy, perhaps one with backoff. Make sure this code is integrated |
| 135 | +and actually in use. |
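One possible shape for such a strategy is exponential backoff with jitter. The sketch below is standalone and makes no claim about how the existing `./reliability` classes are structured; `retry_with_backoff` and all of its parameters are hypothetical, and the `sleep` argument exists only so the delay can be stubbed out in tests.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on failure with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: wait between 50% and 100% of the computed delay.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

A backend call could then be wrapped as `retry_with_backoff(lambda: call_backend(prompt))`, where `call_backend` stands for whatever function performs the request.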
| 136 | + |
| 137 | +### `./semantic_functions` (Complete?) |
| 138 | + |
| 139 | +Another core piece. The different config classes start to feel cumbersome here |
| 140 | +(func config, prompt config, backend config, kernel config, so so much config). |
| 141 | + |
| 142 | +### `./semantic_functions/partitioning` (TODO: nothing yet) |
| 143 | + |
| 144 | +Skipped this sub-sub-module for now. Good task for someone to pick up! |
| 145 | + |
| 146 | +### `./skill_definition` (Complete?) |
| 147 | + |
| 148 | +Another core piece, another pretty direct port. |
| 149 | + |
| 150 | +**NOTE:** The attributes in C# become decorators in Python. We could probably |
| 151 | +make it feel a bit more Pythonic (for example, by having just one or two decorators |
| 152 | +instead of many). |
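A single decorator that carries all of the metadata might look like the sketch below. The name `skill_function`, its parameters, and the `__skill_meta__` attribute are all hypothetical and chosen for illustration; they are not the decorators currently in the port.

```python
from typing import Callable, Optional


def skill_function(
    description: str,
    name: Optional[str] = None,
    input_description: str = "",
) -> Callable[[Callable], Callable]:
    """Hypothetical single decorator that attaches all metadata at once."""

    def decorator(fn: Callable) -> Callable:
        # Store the metadata on the function object for later discovery.
        fn.__skill_meta__ = {
            "name": name or fn.__name__,
            "description": description,
            "input_description": input_description,
        }
        return fn

    return decorator


class TextSkill:
    @skill_function("Upper-case the input", input_description="Text to convert")
    def uppercase(self, text: str) -> str:
        return text.upper()
```

Keyword arguments on one decorator keep the metadata in one place, at the cost of a longer decorator line than several small stacked decorators.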
| 153 | + |
| 154 | +**NOTE:** The skill collection, read only skill collection, etc. became a bit |
| 155 | +confusing (in terms of the relationship between everything). Would be good to |
| 156 | +double check my work there. |
| 157 | + |
| 158 | +### `./template_engine` (Complete?) |
| 159 | + |
| 160 | +Love the prompt templates! Have tried some basic prompts, prompts w/ vars, |
| 161 | +and prompts that call native functions. Seems to be working. |
| 162 | + |
| 163 | +**NOTE:** This module definitely needs some good tests. Subtle errors can easily |
| 164 | +sneak into the prompt tokenization/rendering code here. |
| 165 | + |
| 166 | +### `./text` (TODO: nothing yet) |
| 167 | + |
| 168 | +Ignored this module for now. |
| 169 | + |
| 170 | +### `<root>` (Partial) |
| 171 | + |
| 172 | +Have a working `Kernel` and a working `KernelBuilder`. The base interface |
| 173 | +and custom exception are ported. The `Kernel` in particular |
| 174 | +is missing some things, has some bugs, could be cleaner, etc. |
| 175 | + |
| 176 | +## Overall TODOs |
| 177 | + |
| 178 | +We are currently missing a lot of the doc comments from C#. So a good review |
| 179 | +of the code and a sweep for missing doc comments would be great. |
| 180 | + |
| 181 | +We are also missing any _testing_. We should figure out how we want to test |
| 182 | +(I think this project is auto-setup for `pytest`). |
| 183 | + |
| 184 | +Finally, we are missing a lot of examples. It'd be great to have Python notebooks |
| 185 | +that show off many of the features, many of the core skills, etc. |
| 186 | + |
| 187 | + |
| 188 | +## Design Choices |
| 189 | + |
| 190 | +We want the overall design of the kernel to be as similar as possible to C#. |
| 191 | +We also want to minimize the number of external dependencies to make the Kernel as lightweight as possible. |
| 192 | + |
| 193 | +Right now, compared to C#, there are two key differences: |
| 194 | + |
| 195 | +1. Use `numpy` to store embeddings and do things like vector/matrix ops |
| 196 | +2. Use `openai` to interface with (Azure) OpenAI |
| 197 | + |
| 198 | +There are also many subtler differences that come with moving to Python, |
| 199 | +such as how static properties are handled, no method overloading, no extension methods, etc. |