
add GlassFlow #2615

Open: Boburmirzo wants to merge 1 commit into master

Boburmirzo commented Sep 20, 2024

What is this Python project?

GlassFlow is a serverless, Python-centric real-time data transformation solution for end-to-end data pipelines. With GlassFlow, you do not need to run Apache Kafka or Flink. Visit the docs page to learn more: https://docs.glassflow.dev/get-started/introduction

Describe features.

You can:

  • Use GlassFlow out-of-the-box with any existing Python library.
  • Start GlassFlow without a complex initial setup such as creating clusters.
  • Skip the headache of managing partitions, shards, and worker setup.
  • Define your pipeline as code using the GlassFlow CLI.
  • Implement your transformation function using the GlassFlow Python SDK.
  • Run your Python code locally for easy development and debugging.
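To illustrate the "implement your transformation function" and "run locally" points, here is a minimal sketch of a transformation written as a plain Python function. It assumes a `handler(data, log)` style of function; the function name, its signature, and the order fields are assumptions for illustration, not the GlassFlow SDK's confirmed API, so check the GlassFlow docs for the exact contract the platform expects. It uses only the standard library, so it can be run and debugged locally.

```python
import json
import logging
from typing import Any

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("transform")


def handler(data: dict[str, Any], log: logging.Logger) -> dict[str, Any]:
    """Hypothetical transformation: enrich an order event with a total price.

    The (data, log) signature is an assumption, not a confirmed GlassFlow API.
    """
    log.info("received event: %s", json.dumps(data))
    quantity = int(data.get("quantity", 0))
    unit_price = float(data.get("unit_price", 0.0))
    # Add a derived field; the event's existing keys are left unchanged.
    data["total_price"] = round(quantity * unit_price, 2)
    return data


if __name__ == "__main__":
    # Run the function locally against a few hypothetical sample events.
    sample_events = [
        {"order_id": "a1", "quantity": 2, "unit_price": 9.99},
        {"order_id": "b2", "quantity": 1, "unit_price": 25.0},
    ]
    for event in sample_events:
        print(handler(event, logger))
```

Because the transformation is ordinary Python, any existing library can be imported inside it, and it can be exercised with sample events before deployment.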

GlassFlow:

  • Provides a pure-Python, zero-infrastructure environment.
  • Keeps your original data where it is.
  • Connects to live data sources.
  • Ingests real-time data continuously.
  • Transforms data in real time.
  • Simulates your production workloads.
  • Deploys your pipeline to production within minutes.
  • Delivers auto-scalable, serverless event streaming infrastructure.

What's the difference between this Python project and similar ones?

Most real-time data processing tools, including Kafka, are JVM-based, while Python has become the go-to language for data science and machine learning, especially with the recent AI boom, because it offers a rich set of libraries for data manipulation and analysis, such as Pandas. To bridge this gap, several tools now offer real-time data processing in Python, such as Python wrapper APIs/libraries around JVM-based systems. However, with Kafka wrappers you cannot easily simulate a production environment without a complex initial setup, such as creating computing clusters and managing partitions, shards, and worker setups.

With these wrappers, users often need to implement a custom user-defined function (UDF) to translate a transformation written with a popular library such as Pandas into Java. This translation overhead can significantly impact the throughput and responsiveness of real-time applications.

Enumerate comparisons.

Getting a similar PyFlink-based pipeline into production can take 6-12 months and involves several tools. GlassFlow can get your data pipeline up and running in about 15 minutes with a single tool.

--

Anyone who agrees with this pull request could submit an Approve review to it.

Boburmirzo commented Sep 20, 2024

@MatteoGuadrini @Wisma-55 @PythonChicken123 Could you help me review and approve this PR, please? Thanks!

Boburmirzo commented

@Wisma-55 Thanks! Do you know who can merge this PR? :)

Bib4real approved these changes

6 participants