Introduction to the use of Python for data science
- Python's syntax is concise and reads somewhat like natural language, which makes it easier to use than many other languages.
- With its popularity, many widely used scientific libraries have been implemented for it:
  - NumPy - numerical analysis that feels as natural as it does in MATLAB
  - Matplotlib - plotting functionality similar to MATLAB's
  - pandas - data frames and associated manipulations (similar to R)
  - scikit-learn - machine learning algorithms (similar to caret in R)
  - IPython/Jupyter - notebook concept (similar to Mathematica/Sage)
- There are better tools for specific use cases, but for general-purpose programming, Python is often preferred.
- Large community, many resources!
- Writing a workflow as a program lets you push the compute time to the cloud and scale (distribute) the workflow across a very large dataset.
- Python and these third-party libraries are open source, so no expensive enterprise licensing is needed.
- Set up Anaconda, install GDAL
- Download workshop materials
- Download sample imagery
- Demo of Jupyter notebook
- Users can run through the Intro to Python notebook.
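
Once Anaconda and GDAL are set up, a quick check like the one below can confirm that the libraries named in this outline are importable before opening the notebooks. This is only a sketch, not part of the workshop materials: the `LIBRARIES` table and the `check_environment` helper are hypothetical names introduced for illustration. The import names are the standard ones (`osgeo` for the GDAL Python bindings, `sklearn` for scikit-learn), and the check uses only the standard library.

```python
from importlib.util import find_spec

# Display name -> import name for the libraries mentioned in this outline.
# Note that the import name can differ from the package name:
# GDAL's Python bindings are imported as "osgeo", scikit-learn as "sklearn".
LIBRARIES = {
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "IPython": "IPython",
    "GDAL": "osgeo",
}


def check_environment(libraries=LIBRARIES):
    """Return {display name: bool} telling whether each module can be found.

    Hypothetical helper: it only looks the module up, it does not import it,
    so the check stays fast and has no side effects.
    """
    status = {}
    for name, module in libraries.items():
        try:
            status[name] = find_spec(module) is not None
        except (ImportError, ValueError):
            # Treat any lookup failure as "not available".
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:<14}{'found' if found else 'MISSING'}")
```

Anything reported as MISSING can usually be installed into the active Anaconda environment before continuing with the Intro to Python notebook.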