|
| 1 | +--- |
| 2 | +title: Before we start |
| 3 | +teaching: 30 |
| 4 | +exercises: 0 |
| 5 | +--- |
| 6 | + |
| 7 | +::::::::::::::::::::::::::::::::::::::: objectives |
| 8 | + |
| 9 | +- Present motivations for using Python. |
| 10 | +- Organize files and directories for a set of analyses as a Python project, and understand the purpose of the working directory. |
| 11 | +- Work with Jupyter Notebook and Spyder. |
| 12 | +- Know where to find help. |
| 13 | +- Demonstrate how to provide sufficient information for troubleshooting with the Python user community. |
| 14 | + |
| 15 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 16 | + |
| 17 | +:::::::::::::::::::::::::::::::::::::::: questions |
| 18 | + |
| 19 | +- What is Python and why should I learn it? |
| 20 | + |
| 21 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 22 | + |
| 23 | +## What is Python? |
| 24 | + |
| 25 | +Python is a general-purpose programming language that supports rapid development of data analytics |
| 26 | +applications. The word "Python" refers to both the programming language and the tool |
| 27 | +that executes scripts written in the Python language. |
| 28 | + |
| 29 | +Its main advantages are: |
| 30 | + |
| 31 | +- Free |
| 32 | +- Open-source |
| 33 | +- Available on all major platforms (macOS, Linux, Windows) |
| 34 | +- Supported by the Python Software Foundation |
| 35 | +- Supports multiple programming paradigms |
| 36 | +- Has a large community |
| 37 | +- Has a rich ecosystem of third-party packages |
| 38 | + |
| 39 | +*So, why do you need Python for data analysis?* |
| 40 | + |
| 41 | +- **Easy to learn:** |
| 42 | + Python is widely regarded as easier to learn than many other programming languages. This is important |
| 43 | + because lower barriers mean it is easier for new members of the community to get up to speed. |
| 44 | + |
| 45 | +- **Reproducibility:** |
| 46 | + Reproducibility is the ability to obtain the same results using the same dataset(s) and analysis. |
| 47 | + |
| 48 | + Data analysis written as a Python script can be reproduced on any platform. Moreover, if you |
| 49 | + collect more or correct existing data, you can quickly re-run your analysis! |
| 50 | + |
| 51 | + An increasing number of journals and funding agencies expect analyses to be reproducible, |
| 52 | + so knowing Python will give you an edge with these requirements. |
| 53 | + |
| 54 | +- **Versatility:** |
| 55 | + Python is a versatile language that integrates with many existing applications. For example, you |
| 56 | + can use Python to generate manuscripts, so that if you need to update your data or analysis |
| 57 | + procedure, or change something else, you can quickly regenerate all the figures and your |
| 58 | + manuscript will be updated automatically. |
| 59 | + |
| 60 | + Python can read text files, connect to databases, and work with many other data formats, on your |
| 61 | + computer or on the web. |
| 62 | + |
| 63 | +- **Interdisciplinary and extensible:** |
| 64 | + Python provides a framework that allows anyone to combine approaches from different disciplines, |
| 65 | + in research and beyond, to best suit their analysis needs. |
| 66 | + |
| 67 | +- **Python has a large and welcoming community:** |
| 68 | + Thousands of people use Python daily. Many of them are willing to help you through mailing lists and |
| 69 | + websites, such as [Stack Overflow][stack-overflow] and [Anaconda community |
| 70 | + portal][anaconda-community]. |
| 71 | + |
| 72 | +- **Free and Open-Source Software (FOSS)... and Cross-Platform:** |
| 73 | + We know we have already said that but it is worth repeating. |
| 74 | + |
| 75 | + |
| 76 | +## Knowing your way around Anaconda |
| 77 | + |
| 78 | +The [Anaconda][anaconda] distribution of Python includes many popular packages and tools, |
| 79 | +such as the IPython console, Jupyter Notebook, and Spyder IDE. |
| 80 | +Have a quick look around the Anaconda Navigator. You can launch programs from the Navigator or use the command line. |
| 81 | + |
| 82 | +The [Jupyter Notebook](https://jupyter.org) is an open-source web application that allows you to create |
| 83 | +and share documents that combine code, graphs, and narrative text. |
| 84 | +[Spyder][spyder-ide] is an **Integrated Development Environment** that |
| 85 | +allows you to write Python scripts and interact with Python from within a single interface. |
| 86 | + |
| 87 | +Anaconda comes with a package manager called [conda](https://conda.io/docs/), |
| 88 | +which is used to install and update additional packages. |
| 89 | + |
| 90 | + |
| 91 | +## Research Project: Best Practices |
| 92 | + |
| 93 | +It is a good idea to keep a set of related data, analyses, and text in a single folder. |
| 94 | +All scripts and text files within this folder can then use relative paths to the data files. |
| 95 | +Working this way makes it a lot easier to move around your project and share it with others. |
| 96 | + |
| 97 | +### Organizing your working directory |
| 98 | + |
| 99 | +Using a consistent folder structure across your projects will help you keep things organized, |
| 100 | +and will also make it easy to find/file things in the future. This can be especially helpful |
| 101 | +when you have multiple projects. In general, you may wish to create separate directories for |
| 102 | +your scripts, data, and documents. |
| 103 | + |
| 104 | +- **`data/`**: Use this folder to store your raw data. For the sake of transparency and provenance, |
| 105 | + you should always keep a copy of your **raw data**. If you need to clean up data, do it |
| 106 | + programmatically (*i.e.* with scripts) and make sure to separate cleaned-up data from the raw data. |
| 107 | + For example, you can store raw data in `./data/raw/` and clean data in `./data/clean/`. |
| 108 | + |
| 109 | +- **`documents/`**: Use this folder to store outlines, drafts, and other text. |
| 110 | + |
| 111 | +- **`code/`**: Use this folder to store your (Python) scripts for data cleaning, analysis, and |
| 112 | + plotting that you use in this particular project. |
| 113 | + |
| 114 | +You may need to create additional directories depending on your project needs, but these should form |
| 115 | +the backbone of your project's directory. For this workshop, we will need a `data/` folder to store |
| 116 | +our raw data, and we will later create a `data_output/` folder when we learn how to export data as |
| 117 | +CSV files. |
| 118 | + |
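
As a minimal sketch of the layout described above, the folders could be created from Python with the standard-library `pathlib` module. The project name `my_project` and the helper `create_project_skeleton` are hypothetical names chosen for illustration, and the sketch builds everything inside a temporary directory so that it does not touch your real files.

```python
from pathlib import Path
import tempfile


def create_project_skeleton(project_root):
    """Create the suggested project folders under project_root (hypothetical helper)."""
    root = Path(project_root)
    # Folder names follow the layout suggested in the text above.
    folders = ["data/raw", "data/clean", "documents", "code"]
    for folder in folders:
        # parents=True also creates "data/"; exist_ok=True makes re-running safe.
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Return the created directories as paths relative to the project root.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


# Use a temporary directory here; in a real project you would pass your own folder.
with tempfile.TemporaryDirectory() as tmp:
    created = create_project_skeleton(Path(tmp) / "my_project")
    # Relative paths in scripts are resolved against the current working directory.
    print("Working directory:", Path.cwd())
    print(created)
```

Because scripts inside the project refer to files with relative paths such as `data/raw/`, the whole folder can be moved or shared without editing the code, as long as the working directory is the project folder.
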
| 119 | +## What is Programming and Coding? |
| 120 | + |
| 121 | +Programming is the process of writing *"programs"* that a computer can execute to produce some |
| 122 | +(useful) output. |
| 123 | +It is a multi-step process that involves: |
| 124 | + |
| 125 | +1. Identifying the aspects of the real-world problem that can be solved computationally |
| 126 | +2. Identifying (the best) computational solution |
| 127 | +3. Implementing the solution in a specific computer language |
| 128 | +4. Testing, validating, and adjusting the implemented solution |
| 129 | + |
| 130 | +While *"Programming"* refers to all of the above steps, |
| 131 | +*"Coding"* refers to step 3 only: *"Implementing the solution in a specific computer language"*. It's |
| 132 | +important to note that *"the best"* computational solution must consider factors beyond the computer. |
| 133 | +Who will use the program, what resources and funds your team has for the project, and the available |
| 134 | +timeline all shape what "best" means. |
| 135 | + |
| 136 | +#### If you are working with Jupyter Notebook: |
| 137 | + |
| 138 | +You can type Python code into a code cell and then execute the code by pressing |
| 139 | +<kbd>Shift</kbd>\+<kbd>Return</kbd>. |
| 140 | +Output will be printed directly under the input cell. |
| 141 | +You can recognise a code cell by the `In[ ]:` at the beginning of the cell and output by `Out[ ]:`. |
| 142 | +Pressing the **\+** button in the menu bar will add a new cell. |
| 143 | +All your commands as well as any output will be saved with the notebook. |
| 144 | + |
| 145 | +#### If you are working with Spyder: |
| 146 | + |
| 147 | +You can either use the console or use script files (plain text files that contain your code). The |
| 148 | +console pane (in Spyder, the bottom right panel) is the place where commands written in the Python |
| 149 | +language can be typed and executed immediately by the computer. It is also where the results will be |
| 150 | +shown. You can execute commands directly in the console by pressing <kbd>Return</kbd>, but they |
| 151 | +will be "lost" when you close the session. Spyder uses the [IPython](https://ipython.org) console by |
| 152 | +default. |
| 153 | + |
| 154 | +Since we want our code and workflow to be reproducible, it is better to type the commands in |
| 155 | +the script editor, and save them as a script. This way, there is a complete record of what we did, |
| 156 | +and anyone (including our future selves!) has an easier time reproducing the results on their computer. |
| 157 | + |
| 158 | +Spyder allows you to execute commands directly from the script editor by using the run buttons on |
| 159 | +top. To run the entire script, click *Run file* or press <kbd>F5</kbd>. To run the current line, |
| 160 | +click *Run selection or current line* or press <kbd>F9</kbd>. Other run buttons allow you to run script |
| 161 | +cells or go into debug mode. When using <kbd>F9</kbd>, the command on the current line in the script |
| 162 | +(indicated by the cursor) or all of the commands in the currently selected text will be sent to the |
| 163 | +console and executed. |
| 164 | + |
| 165 | +At some point in your analysis you may want to check the content of a variable or the structure of |
| 166 | +an object, without necessarily keeping a record of it in your script. You can type these commands |
| 167 | +and execute them directly in the console. Spyder provides the |
| 168 | +<kbd>Ctrl</kbd>\+<kbd>Shift</kbd>\+<kbd>E</kbd> and <kbd>Ctrl</kbd>\+<kbd>Shift</kbd>\+<kbd>I</kbd> |
| 169 | +shortcuts to allow you to jump between the script and the console panes. |
| 170 | + |
| 171 | +If Python is ready to accept commands, the IPython console shows an `In [..]:` prompt with the |
| 172 | +current console line number in `[]`. If it receives a command (by typing, copy-pasting or sent from |
| 173 | +the script editor), Python will execute it, display the results in the `Out [..]:` cell, and come |
| 174 | +back with a new `In [..]:` prompt waiting for new commands. |
| 175 | + |
| 176 | +If Python is still waiting for more input because the command isn't complete yet, the console |
| 177 | +will show a `...:` prompt. This can happen because you have not typed a closing bracket (`)`, `]`, |
| 178 | +or `}`) or quotation mark. When this happens, and you thought you had finished typing your command, |
| 179 | +click inside the console window and press |
| 180 | +<kbd>Esc</kbd>; this will cancel the incomplete command and return you to the `In [..]:` prompt. |
| 181 | + |
| 182 | +## How to learn more after the workshop? |
| 183 | + |
| 184 | +The material we cover during this workshop will give you an initial taste of how you can use Python |
| 185 | +to analyze data for your own research. However, you will need to learn more to do advanced |
| 186 | +operations such as cleaning your dataset, using statistical methods, or creating beautiful graphics. |
| 187 | +The best way to become proficient and efficient at Python, as with any other tool, is to use it to |
| 188 | +address your actual research questions. As a beginner, it can feel daunting to have to write a |
| 189 | +script from scratch, and given that many people make their code available online, modifying existing |
| 190 | +code to suit your purpose might make it easier for you to get started. |
| 191 | + |
| 192 | +## Seeking help |
| 193 | + |
| 194 | +- check under the *Help* menu |
| 195 | +- type `help()` |
| 196 | +- type `?object` (in IPython or Jupyter) or `help(object)` to get information about an object |
| 197 | +- [Python documentation][python-docs] |
| 198 | +- [Pandas documentation][pandas-docs] |
| 199 | + |
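
As a small illustration of the built-in help, the sketch below looks up the documentation for the built-in function `len`. The choice of `len` is just an example; any object can be passed instead. `inspect.getdoc` is shown as one way to get the same text as a string, which can be handy when you want to search it or paste it into a question.

```python
import inspect

# help() prints the documentation for an object to the console.
help(len)

# inspect.getdoc() returns the docstring as a string instead of printing it.
doc = inspect.getdoc(len)
print(doc)
```
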
| 200 | +Finally, a generic Google or internet search "Python task" will often either send you to the |
| 201 | +appropriate module documentation or a helpful forum where someone else has already asked your |
| 202 | +question. |
| 203 | + |
| 204 | +I am stuck... I get an error message that I don't understand. |
| 205 | +Start by googling the error message. However, this doesn't always work very well, because often, |
| 206 | +package developers rely on the error catching provided by Python. You end up with general error |
| 207 | +messages that might not be very helpful to diagnose a problem (e.g. "IndexError: list index out of range"). If |
| 208 | +the message is very generic, you might also include the name of the function or package you're using |
| 209 | +in your query. |
| 210 | + |
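
To make this concrete, here is a minimal sketch that deliberately triggers an error and pulls out the parts worth searching for: the error type and its message. The list `values` and the variable names are made up for illustration. The full traceback is also captured, since it is usually the most useful thing to include when asking someone for help.

```python
import traceback

values = [10, 20, 30]

try:
    values[5]  # deliberately ask for an item that does not exist
except IndexError as error:
    error_type = type(error).__name__        # the kind of error
    error_message = str(error)               # the human-readable message
    full_traceback = traceback.format_exc()  # where the error happened

# A search query built from the error type and message.
search_query = f"python {error_type}: {error_message}"
print(search_query)
print(full_traceback)
```

If the search results are too generic, adding the name of the function or package that raised the error usually narrows them down.
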
| 211 | +You should also check Stack Overflow. Search using the `[python]` tag. Most questions have already |
| 212 | +been answered, but the challenge is to use the right words in the search to find the answers: |
| 213 | +[https://stackoverflow.com/questions/tagged/python?tab=Votes][so-python] |
| 214 | + |
| 215 | +### Asking for help |
| 216 | + |
| 217 | +The key to receiving help from someone is for them to rapidly grasp your problem. You should make it |
| 218 | +as easy as possible to pinpoint where the issue might be. |
| 219 | + |
| 220 | +Try to use the correct words to describe your problem. For instance, a package is not the same thing |
| 221 | +as a library. Most people will understand what you meant, but others have really strong feelings |
| 222 | +about the difference in meaning. The key point is that it can make things confusing for people |
| 223 | +trying to help you. Be as precise as possible when describing your problem. |
| 224 | + |
| 225 | +If possible, try to reduce what doesn't work to a simple reproducible example. If you can reproduce |
| 226 | +the problem using a very small data frame instead of your full one with 50,000 rows and 10,000 columns, |
| 227 | +provide the small one with the description of your problem. When appropriate, try to generalize what |
| 228 | +you are doing so even people who are not in your field can understand the question. For instance, |
| 229 | +instead of using a subset of your real dataset, create a small (3 columns, 5 rows) generic one. |
| 230 | + |
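
A reproducible example along these lines might look like the sketch below. It uses only the standard library, so the small table is a list of dictionaries rather than a data frame; if you use pandas, `pandas.DataFrame(rows)` would turn it into one. The column names, the values, and the mean calculation standing in for "the step that misbehaves" are all invented for illustration. The sketch also collects the Python version and operating system, which people helping you will often ask for.

```python
import platform

# A small generic dataset (3 columns, 5 rows) that anyone can copy and run.
rows = [
    {"id": 1, "group": "a", "value": 2.5},
    {"id": 2, "group": "a", "value": 3.0},
    {"id": 3, "group": "b", "value": 4.5},
    {"id": 4, "group": "b", "value": 1.0},
    {"id": 5, "group": "c", "value": 4.0},
]

# The smallest piece of code that shows the behavior you are asking about.
values = [row["value"] for row in rows]
mean_value = sum(values) / len(values)
print("Mean value:", mean_value)

# Details about your setup to include with the question.
environment = {
    "python": platform.python_version(),
    "os": platform.platform(),
}
print(environment)
```
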
| 231 | +### Where to ask for help? |
| 232 | + |
| 233 | +- The person sitting next to you during the workshop. Don't hesitate to talk to your neighbor during |
| 234 | + the workshop, compare your answers, and ask for help. You might also be interested in organizing |
| 235 | + regular meetings following the workshop to keep learning from each other. |
| 236 | +- Your friendly colleagues: if you know someone with more experience than you, they might be able and |
| 237 | + willing to help you. |
| 238 | +- [Stack Overflow][so-python]: if your question hasn't been answered before and is well crafted, |
| 239 | + chances are you will get an answer in less than 5 min. Remember to follow their guidelines on how to |
| 240 | + ask a good question. |
| 241 | +- [Python mailing lists][python-mailing-lists] |
| 242 | + |
| 243 | +## More resources |
| 244 | + |
| 245 | +- [PyPI - the Python Package Index][pypi] |
| 246 | + |
| 247 | +- [The Hitchhiker's Guide to Python][python-guide] |
| 248 | + |
| 249 | +- [Dive into Python 3][dive-into-python3] |
| 250 | + |
| 251 | + |
| 252 | + |
| 253 | +[stack-overflow]: https://stackoverflow.com |
| 254 | +[anaconda-community]: https://www.anaconda.com/community |
| 255 | +[anaconda]: https://www.anaconda.com/download |
| 256 | +[spyder-ide]: https://www.spyder-ide.org |
| 257 | +[python-docs]: https://www.python.org/doc |
| 258 | +[pandas-docs]: https://pandas.pydata.org/pandas-docs/stable/ |
| 259 | +[so-python]: https://stackoverflow.com/questions/tagged/python?tab=Votes |
| 260 | +[python-mailing-lists]: https://www.python.org/community/lists |
| 261 | +[pypi]: https://pypi.org/ |
| 262 | +[python-guide]: https://docs.python-guide.org |
| 263 | +[dive-into-python3]: https://diveintopython3.net/ |
| 264 | + |
| 265 | + |
| 266 | +:::::::::::::::::::::::::::::::::::::::: keypoints |
| 267 | + |
| 268 | +- Python is an open-source, platform-independent programming language. |
| 269 | +- Jupyter Notebook and the Spyder IDE are great tools to code in and interact with Python. With the large Python community, it is easy to find help on the internet. |
| 270 | + |
| 271 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 272 | + |
| 273 | + |