
Commit 214978a

Add Python syntax highlighting to readme (#205)
1 parent 54248fb commit 214978a

File tree

1 file changed (+16 -9 lines)


README.md (+16 -9)

````diff
@@ -1,42 +1,49 @@
 # mongo-arrow
+
 Tools for using Apache Arrow with MongoDB
 
 ## Apache Arrow
-We utilize Apache Arrow to offer fast and easy conversion of MongoDB query result sets to multiple numerical data formats popular among developers including NumPy ndarrays, Pandas DataFrames, parquet files, csv, and more.
+
+We utilize Apache Arrow to offer fast and easy conversion of MongoDB query result sets to multiple numerical data formats popular among developers including NumPy arrays, Pandas DataFrames, parquet files, CSV, and more.
 
 We chose Arrow for this because of its unique set of characteristics:
+
 - language-independent
 - columnar memory format for flat and hierarchical data,
 - organized for efficient analytic operations on modern hardware like CPUs and GPUs
 - zero-copy reads for lightning-fast data access without serialization overhead
-- it was simple and fast, and from our perspective Apache Arrow is ideal for processing and transport of large datasets in high-performance applications.
+- it was simple and fast, and from our perspective, Apache Arrow is ideal for processing and transporting of large datasets in high-performance applications.
 
 As reference points for our implementation, we also took a look at BigQuery’s Pandas integration, pandas methods to handle JSON/semi-structured data, the Snowflake Python connector, and Dask.DataFrame.
 
-
 ## How it Works
+
 Our implementation relies upon a user-specified data schema to marshall query result sets into tabular form.
 Example
-```
+
+```py
 from pymongoarrow.api import Schema
-schema = Schema({'_id': int, 'amount': float, 'last_updated': datetime})
+
+schema = Schema({"_id": int, "amount": float, "last_updated": datetime})
 ```
 
 You can install PyMongoArrow on your local machine using Pip:
 `$ python -m pip install pymongoarrow`
 
 You can export data from MongoDB to a pandas dataframe easily using something like:
-```
-df = production.invoices.find_pandas_all({'amount': {'$gt': 100.00}}, schema=invoices)
+
+```py
+df = production.invoices.find_pandas_all({"amount": {"$gt": 100.00}}, schema=invoices)
 ```
 
 Since PyMongoArrow can automatically infer the schema from the first batch of data, this can be
 further simplified to:
 
-```
-df = production.invoices.find_pandas_all({'amount': {'$gt': 100.00}})
+```py
+df = production.invoices.find_pandas_all({"amount": {"$gt": 100.00}})
 ```
 
 ## Final Thoughts
+
 This library is in the early stages of development, and so it's possible the API may change in the future -
 we definitely want to continue expanding it. We welcome your feedback as we continue to explore and build this tool.
````
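For anyone trying the snippets in the diff end to end, here is a minimal sketch of how they fit together. It assumes a MongoDB server reachable at the default `localhost:27017`, a hypothetical `production` database with an `invoices` collection (names taken from the README example), and uses `pymongoarrow.monkey.patch_all()` to attach `find_pandas_all()` to PyMongo's `Collection` objects; the README's `schema=invoices` argument presumably refers to a schema defined elsewhere, so the sketch passes the `schema` built just above it instead.

```py
# A minimal, hypothetical end-to-end sketch of the snippets shown in the diff.
# Assumes a local MongoDB server and a "production" database containing an
# "invoices" collection (names taken from the README example).
from datetime import datetime

from pymongo import MongoClient
from pymongoarrow.api import Schema
from pymongoarrow.monkey import patch_all

# Add find_pandas_all() and the other PyMongoArrow helpers to pymongo's
# Collection class so they can be called directly on collections.
patch_all()

client = MongoClient()  # defaults to mongodb://localhost:27017
production = client.production

# Explicit schema, as in the README example.
schema = Schema({"_id": int, "amount": float, "last_updated": datetime})

# Export all invoices over $100 to a pandas DataFrame.
df = production.invoices.find_pandas_all({"amount": {"$gt": 100.00}}, schema=schema)

# Or rely on schema inference from the first batch of results.
df = production.invoices.find_pandas_all({"amount": {"$gt": 100.00}})
```

Either call returns a pandas DataFrame, so the usual `df.head()` / `df.describe()` workflow applies from there.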
