Commit 88d575d

source commit: 4f62521
0 parents  commit 88d575d

154 files changed

+155434
-0
lines changed

00-before-we-start.md

+273
@@ -0,0 +1,273 @@
1+
---
2+
title: Before we start
3+
teaching: 30
4+
exercises: 0
5+
---
6+
7+
::::::::::::::::::::::::::::::::::::::: objectives
8+
9+
- Present motivations for using Python.
10+
- Organize files and directories for a set of analyses as a Python project, and understand the purpose of the working directory.
11+
- Work with Jupyter Notebook and Spyder.
12+
- Know where to find help.
13+
- Demonstrate how to provide sufficient information for troubleshooting with the Python user community.
14+
15+
::::::::::::::::::::::::::::::::::::::::::::::::::
16+
17+
:::::::::::::::::::::::::::::::::::::::: questions
18+
19+
- What is Python and why should I learn it?
20+
21+
::::::::::::::::::::::::::::::::::::::::::::::::::
22+
23+
## What is Python?
24+
25+
Python is a general-purpose programming language that supports rapid development of data analytics
applications. The word "Python" refers both to the programming language and to the tool
that executes scripts written in that language.
28+
29+
Its main advantages are:
30+
31+
- Free
32+
- Open-source
33+
- Available on all major platforms (macOS, Linux, Windows)
34+
- Supported by the Python Software Foundation
- Supports multiple programming paradigms
- Has a large community
37+
- Rich ecosystem of third-party packages
38+
39+
*So, why do you need Python for data analysis?*
40+
41+
- **Easy to learn:**
42+
Python is widely regarded as easier to learn than many other programming languages. This is important because lower barriers
mean it is easier for new members of the community to get up to speed.
44+
45+
- **Reproducibility:**
46+
Reproducibility is the ability to obtain the same results using the same dataset(s) and analysis.
47+
48+
Data analysis written as a Python script can be reproduced on any platform. Moreover, if you
49+
collect more or correct existing data, you can quickly re-run your analysis!
50+
51+
An increasing number of journals and funding agencies expect analyses to be reproducible,
52+
so knowing Python will give you an edge with these requirements.
53+
54+
- **Versatility:**
55+
Python is a versatile language that integrates with many existing applications.
For example, you can use Python to generate manuscripts, so that if you need to
update your data, your analysis procedure, or anything else, you can quickly regenerate all the
figures and your manuscript will be updated automatically.

Python can read text files and many other data formats, and can connect to databases, on your computer or
on the web.
62+
63+
- **Interdisciplinary and extensible:**
64+
Python provides a framework that allows anyone to combine approaches from different disciplines,
in research and beyond, to best suit their analysis needs.
66+
67+
- **Python has a large and welcoming community:**
68+
Thousands of people use Python daily. Many of them are willing to help you through mailing lists and
69+
websites, such as [Stack Overflow][stack-overflow] and [Anaconda community
70+
portal][anaconda-community].
71+
72+
- **Free and Open-Source Software (FOSS)... and Cross-Platform:**
73+
We know we have already said that but it is worth repeating.
74+
75+
76+
## Knowing your way around Anaconda
77+
78+
The [Anaconda][anaconda] distribution of Python includes many popular packages and tools,
such as the IPython console, Jupyter Notebook, and Spyder IDE.
Have a quick look around the Anaconda Navigator. You can launch programs from the Navigator or use the command line.
81+
82+
The [Jupyter Notebook](https://jupyter.org) is an open-source web application that allows you to create
and share documents that combine code, graphs, and narrative text.
[Spyder][spyder-ide] is an **Integrated Development Environment** that
allows you to write Python scripts and interact with the Python software from within a single interface.
86+
87+
Anaconda comes with a package manager called [conda](https://conda.io/docs/)
88+
used to install and update additional packages.
89+
90+
91+
## Research Project: Best Practices
92+
93+
It is a good idea to keep a set of related data, analyses, and text in a single folder.
94+
All scripts and text files within this folder can then use relative paths to the data files.
95+
Working this way makes it a lot easier to move around your project and share it with others.
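
As a minimal sketch of how relative paths depend on the working directory, the snippet below prints the current working directory and shows how a relative path is resolved against it. The file name `data/surveys.csv` is a hypothetical example, not a file this lesson guarantees to exist.

```python
from pathlib import Path

# The working directory is the folder Python resolves relative paths against.
working_dir = Path.cwd()
print("Working directory:", working_dir)

# A relative path (hypothetical file name) carries no information about
# where the project lives on a particular computer.
data_file = Path("data") / "surveys.csv"
print("Relative path:", data_file)

# Resolving it shows the absolute location Python would actually look in.
print("Resolved to:", data_file.resolve())
print("File exists?", data_file.exists())
```

Because the script only mentions `data/surveys.csv`, the same code keeps working when the whole project folder is moved or shared, as long as it is run from the project folder.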
96+
97+
### Organizing your working directory
98+
99+
Using a consistent folder structure across your projects will help you keep things organized,
100+
and will also make it easy to find/file things in the future. This can be especially helpful
101+
when you have multiple projects. In general, you may wish to create separate directories for
102+
your scripts, data, and documents.
103+
104+
- **`data/`**: Use this folder to store your raw data. For the sake of transparency and provenance,
you should always keep a copy of your **raw data**. If you need to clean up data, do it
programmatically (*i.e.* with scripts) and make sure to separate cleaned-up data from the raw data.
For example, you can store raw data in `./data/raw/` and clean data in `./data/clean/`.
108+
109+
- **`documents/`**: Use this folder to store outlines, drafts, and other text.
110+
111+
- **`code/`**: Use this folder to store your (Python) scripts for data cleaning, analysis, and
112+
plotting that you use in this particular project.
113+
114+
You may need to create additional directories depending on your project needs, but these should form
115+
the backbone of your project's directory. For this workshop, we will need a `data/` folder to store
116+
our raw data, and we will later create a `data_output/` folder when we learn how to export data as
117+
CSV files.
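
The layout described above can also be created programmatically. The following is only a sketch: `create_project` and the project name `my_project` are hypothetical names chosen for illustration, and the demonstration runs in a temporary directory so that nothing is left on disk.

```python
import tempfile
from pathlib import Path


def create_project(root):
    """Create the suggested folder backbone under `root` (hypothetical helper)."""
    root = Path(root)
    for sub in ("data/raw", "data/clean", "documents", "code"):
        # parents=True also creates `root` and `data/` when they are missing.
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Return the created folders relative to the project root, sorted by name.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


# Demonstrate in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_project(Path(tmp) / "my_project"):
        print(folder)
```

To use it for a real project, you would call `create_project` with the path where the project should live instead of a temporary directory.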
118+
119+
## What is Programming and Coding?
120+
121+
Programming is the process of writing *"programs"* that a computer can execute to produce some
(useful) output.
123+
Programming is a multi-step process that involves the following steps:
124+
125+
1. Identifying the aspects of the real-world problem that can be solved computationally
126+
2. Identifying (the best) computational solution
127+
3. Implementing the solution in a specific computer language
128+
4. Testing, validating, and adjusting the implemented solution.
129+
130+
While *"Programming"* refers to all of the above steps,
*"Coding"* refers to step 3 only: *"Implementing the solution in a specific computer language"*. It's
important to note that *"the best"* computational solution must consider factors beyond the computer.
Who will use the program, what resources and funds your team has for the project, and the available
timeline all shape what "best" may be.
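
To make the four steps concrete, here is a small illustrative sketch. The problem (averaging recorded weights while skipping missing values) and the function name `mean_weight` are assumptions made up for this example.

```python
# Step 1: the computable part of the problem is "average the recorded values".
# Step 2: the chosen solution is to drop missing entries, then divide sum by count.

# Step 3: implement the solution in Python ("coding").
def mean_weight(weights):
    """Return the mean of the recorded weights, skipping missing values (None)."""
    recorded = [w for w in weights if w is not None]
    if not recorded:
        raise ValueError("no recorded weights to average")
    return sum(recorded) / len(recorded)


# Step 4: test the implementation against cases worked out by hand.
assert mean_weight([10, 20, 30]) == 20
assert mean_weight([10, None, 30]) == 20
print("All checks passed")
```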
135+
136+
#### If you are working with Jupyter notebook:
137+
138+
You can type Python code into a code cell and then execute the code by pressing
139+
<kbd>Shift</kbd>\+<kbd>Return</kbd>.
140+
Output will be printed directly under the input cell.
141+
You can recognise a code cell by the `In[ ]:` at the beginning of the cell and output by `Out[ ]:`.
142+
Pressing the **\+** button in the menu bar will add a new cell.
143+
All your commands as well as any output will be saved with the notebook.
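
For instance, a code cell might contain the lines below (the variable names are arbitrary). Text produced by `print` appears under the cell, and the value of the last expression is shown next to `Out[ ]:`.

```python
numbers = [2, 4, 6]
total = sum(numbers)
print("Sum:", total)  # printed text appears directly under the cell
total  # value of the last expression; the notebook displays it as the cell output
```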
144+
145+
#### If you are working with Spyder:
146+
147+
You can either use the console or use script files (plain text files that contain your code). The
148+
console pane (in Spyder, the bottom right panel) is the place where commands written in the Python
149+
language can be typed and executed immediately by the computer. It is also where the results will be
150+
shown. You can execute commands directly in the console by pressing <kbd>Return</kbd>, but they
151+
will be "lost" when you close the session. Spyder uses the [IPython](https://ipython.org) console by
152+
default.
153+
154+
Since we want our code and workflow to be reproducible, it is better to type the commands in
155+
the script editor, and save them as a script. This way, there is a complete record of what we did,
156+
and anyone (including our future selves!) has an easier time reproducing the results on their computer.
157+
158+
Spyder allows you to execute commands directly from the script editor by using the run buttons on
top. To run the entire script, click *Run file* or press <kbd>F5</kbd>. To run the current line,
click *Run selection or current line* or press <kbd>F9</kbd>. Other run buttons let you run script
cells or go into debug mode. When using <kbd>F9</kbd>, the command on the current line in the script
(indicated by the cursor) or all of the commands in the currently selected text will be sent to the
console and executed.
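
As a sketch of what a script with cells can look like: Spyder treats a comment line starting with `# %%` as the beginning of a new cell, so each block below can be run on its own. The values are made up for illustration.

```python
# %% Define the data (hypothetical values typed in directly)
lengths_cm = [12.5, 14.0, 13.5]

# %% Summarize
longest = max(lengths_cm)
print("Longest:", longest)
```

Outside Spyder, the `# %%` lines are ordinary comments, so the file still runs as a normal Python script.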
164+
165+
At some point in your analysis you may want to check the content of a variable or the structure of
166+
an object, without necessarily keeping a record of it in your script. You can type these commands
167+
and execute them directly in the console. Spyder provides the
168+
<kbd>Ctrl</kbd>\+<kbd>Shift</kbd>\+<kbd>E</kbd> and <kbd>Ctrl</kbd>\+<kbd>Shift</kbd>\+<kbd>I</kbd>
169+
shortcuts to allow you to jump between the script and the console panes.
170+
171+
If Python is ready to accept commands, the IPython console shows an `In [..]:` prompt with the
172+
current console line number in `[]`. If it receives a command (by typing, copy-pasting or sent from
173+
the script editor), Python will execute it, display the results in the `Out [..]:` cell, and come
174+
back with a new `In [..]:` prompt waiting for new commands.
175+
176+
If Python is still waiting for you to enter more input because the command isn't complete yet, the console
will show a `...:` prompt. This can
be because you have not typed a closing bracket (`)`, `]`, or `}`) or quotation mark. When this
happens, and you thought you finished typing your command, click inside the console window and press
<kbd>Esc</kbd>; this will cancel the incomplete command and return you to the `In [..]:` prompt.
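
The idea of a complete versus an incomplete command can be illustrated with the standard library's `codeop` module, which Python's basic interactive console uses to decide whether more input is needed. This is only a sketch of the concept: IPython has its own machinery for this, and `is_complete` is a hypothetical helper name.

```python
import codeop


def is_complete(source):
    """Return True if `source` is a complete command, False if more input is needed."""
    # compile_command returns None when the input is valid so far but unfinished.
    return codeop.compile_command(source) is not None


print(is_complete("total = sum([1, 2, 3])"))  # all brackets are closed
print(is_complete("total = sum([1, 2, 3]"))   # the closing parenthesis is missing
```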
181+
182+
## How to learn more after the workshop?
183+
184+
The material we cover during this workshop will give you an initial taste of how you can use Python
185+
to analyze data for your own research. However, you will need to learn more to do advanced
186+
operations such as cleaning your dataset, using statistical methods, or creating beautiful graphics.
187+
The best way to become proficient and efficient at Python, as with any other tool, is to use it to
188+
address your actual research questions. As a beginner, it can feel daunting to have to write a
189+
script from scratch, and given that many people make their code available online, modifying existing
190+
code to suit your purpose might make it easier for you to get started.
191+
192+
## Seeking help
193+
194+
- check under the *Help* menu
195+
- type `help()`
196+
- type `?object` or `help(object)` to get information about an object
197+
- [Python documentation][python-docs]
198+
- [Pandas documentation][pandas-docs]
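
As a short sketch of the built-in help, the lines below look up the documentation for the built-in function `len`. The `?len` form works only in IPython and Jupyter, so it appears here as a comment.

```python
# Print the documentation for an object.
help(len)

# The same text is stored on the object as its docstring.
doc_first_line = len.__doc__.splitlines()[0]
print(doc_first_line)

# In IPython or Jupyter you can type `?len` (or `len?`) instead.
```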
199+
200+
Finally, a generic Google or internet search "Python task" will often either send you to the
201+
appropriate module documentation or a helpful forum where someone else has already asked your
202+
question.
203+
204+
*I am stuck... I get an error message that I don't understand.*

Start by searching the web for the error message. However, this doesn't always work very well, because
package developers often rely on the error handling provided by Python. You end up with general error
messages that might not be very helpful for diagnosing a problem (e.g. "list index out of range"). If
the message is very generic, you might also include the name of the function or package you're using
in your query.
210+
211+
You should also check Stack Overflow. Search using the `[python]` tag. Most questions have already
been answered, but the challenge is to use the right words in the search to find the answers:
[https://stackoverflow.com/questions/tagged/python?tab=Votes][so-python]
214+
215+
### Asking for help
216+
217+
The key to receiving help from someone is for them to rapidly grasp your problem. You should make it
218+
as easy as possible to pinpoint where the issue might be.
219+
220+
Try to use the correct words to describe your problem. For instance, a package is not the same thing
221+
as a library. Most people will understand what you meant, but others have really strong feelings
222+
about the difference in meaning. The key point is that it can make things confusing for people
223+
trying to help you. Be as precise as possible when describing your problem.
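
One detail that often makes a description more precise is the exact software versions involved. Including them is our suggestion rather than a rule stated in this lesson, and `describe_environment` is a hypothetical helper that uses only the standard library.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("pandas",)):
    """Collect version details worth including in a help request."""
    lines = [
        f"Python {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


print(describe_environment())
```

You can paste the printed lines at the end of your question, listing whichever packages are relevant to the problem.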
224+
225+
If possible, try to reduce what doesn't work to a simple reproducible example. If you can reproduce
the problem using a very small data frame instead of your full one with 50,000 rows and 10,000 columns,
provide the small one with the description of your problem. When appropriate, try to generalize what
you are doing so even people who are not in your field can understand the question. For instance,
instead of using a subset of your real dataset, create a small (3 columns, 5 rows) generic one.
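
Here is a sketch of what such a small, generic example might look like. The column names, the values, and the problem being reproduced (an empty cell that cannot be converted to a number) are all made up for illustration, and only the standard library is used so that anyone can run it.

```python
import csv
import io

# A small, generic table (3 columns, 5 rows) with made-up values.
csv_text = """id,group,value
1,a,10
2,a,12
3,b,9
4,b,
5,c,15
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Reproduce the issue: the empty `value` in row 4 cannot be converted to a number.
for row in rows:
    try:
        float(row["value"])
    except ValueError as error:
        print(f"row {row['id']}: {error}")
```

If your real problem involves pandas, the same text can be loaded into a data frame with `pandas.read_csv(io.StringIO(csv_text))`, so helpers can copy, paste, and run your example without needing your data files.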
230+
231+
### Where to ask for help?
232+
233+
- The person sitting next to you during the workshop. Don't hesitate to talk to your neighbor during
234+
the workshop, compare your answers, and ask for help. You might also be interested in organizing
235+
regular meetings following the workshop to keep learning from each other.
236+
- Your friendly colleagues: if you know someone with more experience than you, they might be able and
237+
willing to help you.
238+
- [Stack Overflow][so-python]: if your question hasn't been answered before and is well crafted,
chances are you will get an answer quickly. Remember to follow their guidelines on how to
ask a good question.
241+
- [Python mailing lists][python-mailing-lists]
242+
243+
## More resources
244+
245+
- [PyPI - the Python Package Index][pypi]
246+
247+
- [The Hitchhiker's Guide to Python][python-guide]
248+
249+
- [Dive into Python 3][dive-into-python3]
250+
251+
252+
253+
[stack-overflow]: https://stackoverflow.com
254+
[anaconda-community]: https://www.anaconda.com/community
255+
[anaconda]: https://www.anaconda.com/download
256+
[spyder-ide]: https://www.spyder-ide.org
257+
[python-docs]: https://www.python.org/doc
258+
[pandas-docs]: https://pandas.pydata.org/pandas-docs/stable/
259+
[so-python]: https://stackoverflow.com/questions/tagged/python?tab=Votes
260+
[python-mailing-lists]: https://www.python.org/community/lists
261+
[pypi]: https://pypi.org/
262+
[python-guide]: https://docs.python-guide.org
263+
[dive-into-python3]: https://diveintopython3.net/
264+
265+
266+
:::::::::::::::::::::::::::::::::::::::: keypoints
267+
268+
- Python is an open-source and platform-independent programming language.
- Jupyter Notebook and the Spyder IDE are great tools to code in and interact with Python.
- With the large Python community, it is easy to find help on the internet.
270+
271+
::::::::::::::::::::::::::::::::::::::::::::::::::
272+
273+
