Commit 72b96f9

committed
removing punctuations
1 parent fbf9565 commit 72b96f9

2 files changed

+349
-0
lines changed

README.md

+1
@@ -8,3 +8,4 @@
8
6) strings in Python: Introduction.
9
7) Emojis in Python with some examples.
10
8) Working with polynomials represented by lists in Python. Here, we implement, from scratch, addition and multiplication of two polynomials.
11+
9) Removing punctuation from a given text. Here, we also download a text file from the Internet using the *request* module of *urllib*.

removing punctuations from text.ipynb

+348
@@ -0,0 +1,348 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Python Everything:\n",
8+
"9) **Removing punctuations from a text**\n",
9+
"<br>Also, getting a text file from the Internet using *request*\n",
10+
"\n",
11+
"# Removing punctuation marks from a text\n",
12+
"An example of downloading a file from the Internet is also included\n",
13+
"<br>By Hamed Shah-Hosseini<br>\n",
14+
"Explanation in English:<br>\n",
15+
"https://www.pinterest.com/HamedShahHosseini/programming-languages/python\n",
16+
"<br> Explanation in Persian: https://www.instagram.com/words.persian\n",
17+
"<br>The code is at: https://github.com/ostad-ai/Python-Everything"
18+
]
19+
},
20+
{
21+
"cell_type": "code",
22+
"execution_count": 2,
23+
"metadata": {},
24+
"outputs": [],
25+
"source": [
26+
"import string"
27+
]
28+
},
29+
{
30+
"cell_type": "markdown",
31+
"metadata": {},
32+
"source": [
33+
"Displaying the punctuation characters:"
34+
]
35+
},
36+
{
37+
"cell_type": "code",
38+
"execution_count": 71,
39+
"metadata": {},
40+
"outputs": [
41+
{
42+
"name": "stdout",
43+
"output_type": "stream",
44+
"text": [
45+
"!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~\n"
46+
]
47+
}
48+
],
49+
"source": [
50+
"# showing the punctuations\n",
51+
"print(string.punctuation)"
52+
]
53+
},
54+
{
55+
"cell_type": "markdown",
56+
"metadata": {},
57+
"source": [
58+
"Defining two simple functions for removing punctuation from a text:"
59+
]
60+
},
61+
{
62+
"cell_type": "code",
63+
"execution_count": 70,
64+
"metadata": {},
65+
"outputs": [],
66+
"source": [
67+
"# remove every character of text that appears in ps (default: ASCII punctuation)\n",
68+
"def remove_punctuations(text, ps=string.punctuation):\n",
69+
"    return ''.join(letter for letter in text\n",
70+
"                   if letter not in ps)\n",
71+
"\n",
72+
"# keep only alphanumeric characters and whitespace\n",
73+
"def only_alphanumeric(text):\n",
74+
"    return ''.join(letter for letter in text\n",
75+
"                   if letter.isalnum() or letter.isspace())"
76+
]
77+
},
78+
{
79+
"cell_type": "markdown",
80+
"metadata": {},
81+
"source": [
82+
"An example of using the functions above to remove punctuation:"
83+
]
84+
},
85+
{
86+
"cell_type": "code",
87+
"execution_count": 67,
88+
"metadata": {},
89+
"outputs": [
90+
{
91+
"name": "stdout",
92+
"output_type": "stream",
93+
"text": [
94+
"---Text with punctuations:\n",
95+
" Hello world! Remove punctuations marks from this text; will you?\n",
96+
"By the way, Good luck.\n",
97+
"---------\n",
98+
"---Text without punctuations:\n",
99+
" Hello world Remove punctuations marks from this text will you\n",
100+
"By the way Good luck\n"
101+
]
102+
}
103+
],
104+
"source": [
105+
"#removing punctuations from a text with function remove_punctuations\n",
106+
"text='Hello world! Remove punctuations marks from this text;'\n",
107+
"text+=' will you?\\nBy the way, Good luck.'\n",
108+
"print('---Text with punctuations:\\n',text)\n",
109+
"print('---------')\n",
110+
"print('---Text without punctuations:\\n',remove_punctuations(text))"
111+
]
112+
},
113+
{
114+
"cell_type": "code",
115+
"execution_count": 68,
116+
"metadata": {},
117+
"outputs": [
118+
{
119+
"name": "stdout",
120+
"output_type": "stream",
121+
"text": [
122+
"---Text with punctuations:\n",
123+
" Hello world! Remove punctuations marks from this text; will you?\n",
124+
"By the way, Good luck.\n",
125+
"---------\n",
126+
"---Text without punctuations:\n",
127+
" Hello world Remove punctuations marks from this text will you\n",
128+
"By the way Good luck\n"
129+
]
130+
}
131+
],
132+
"source": [
133+
"#removing punctuations from a text with function only_alphanumeric\n",
134+
"text='Hello world! Remove punctuations marks from this text;'\n",
135+
"text+=' will you?\\nBy the way, Good luck.'\n",
136+
"print('---Text with punctuations:\\n',text)\n",
137+
"print('---------')\n",
138+
"print('---Text without punctuations:\\n',only_alphanumeric(text))"
139+
]
140+
},
141+
{
142+
"cell_type": "markdown",
143+
"metadata": {},
144+
"source": [
145+
"---\n",
146+
"Let's download a text file from the Internet.<br>\n",
147+
"We then remove the punctuation from the downloaded text:"
148+
]
149+
},
150+
{
151+
"cell_type": "code",
152+
"execution_count": 79,
153+
"metadata": {},
154+
"outputs": [
155+
{
156+
"name": "stdout",
157+
"output_type": "stream",
158+
"text": [
159+
"file is downloaded\n"
160+
]
161+
}
162+
],
163+
"source": [
164+
"# getting a text file from the Internet\n",
165+
"from urllib import request\n",
166+
"file_url='https://raw.githubusercontent.com/ostad-ai/'+\\\n",
167+
"'Machine-Learning/main/alice_in_wonderland.txt'\n",
168+
"with request.urlopen(file_url) as file:\n",
169+
" alice=file.read().decode('utf-8')\n",
170+
" print('file is downloaded')"
171+
]
172+
},
173+
{
174+
"cell_type": "code",
175+
"execution_count": 61,
176+
"metadata": {},
177+
"outputs": [
178+
{
179+
"name": "stdout",
180+
"output_type": "stream",
181+
"text": [
182+
"---Text with punctuations:\n",
183+
" l as this, I shall\n",
184+
"think nothing of tumbling down stairs! How brave they’ll all think me\n",
185+
"at home! Why, I wouldn’t say anything about it, even if I fell off the\n",
186+
"top of the house!” (Which was very likely\n",
187+
"---------\n",
188+
"---Text without punctuations:\n",
189+
" l as this I shall\n",
190+
"think nothing of tumbling down stairs How brave they’ll all think me\n",
191+
"at home Why I wouldn’t say anything about it even if I fell off the\n",
192+
"top of the house” Which was very likely\n"
193+
]
194+
}
195+
],
196+
"source": [
197+
"text2=alice[3000:3201]\n",
198+
"print('---Text with punctuations:\\n',text2)\n",
199+
"print('---------')\n",
200+
"print('---Text without punctuations:\\n',remove_punctuations(text2))"
201+
]
202+
},
203+
{
204+
"cell_type": "markdown",
205+
"metadata": {},
206+
"source": [
207+
"The punctuation removal above is incomplete: *string.punctuation* covers only ASCII characters, so curly quotation marks stay in the text.\n",
208+
"<br>The four quotation marks below are not removed by the default setting."
209+
]
210+
},
211+
{
212+
"cell_type": "code",
213+
"execution_count": 62,
214+
"metadata": {},
215+
"outputs": [
216+
{
217+
"name": "stdout",
218+
"output_type": "stream",
219+
"text": [
220+
"Left single quotation: ‘\n",
221+
"Right single quotation: ’\n",
222+
"Left double quotation: “\n",
223+
"Right double quotation: ”\n"
224+
]
225+
}
226+
],
227+
"source": [
228+
"left_single_quotation='\\U00002018'\n",
229+
"right_single_quotation='\\U00002019'\n",
230+
"left_double_quotation='\\U0000201C'\n",
231+
"right_double_quotation='\\U0000201D'\n",
232+
"print('Left single quotation: ',left_single_quotation)\n",
233+
"print('Right single quotation: ',right_single_quotation)\n",
234+
"print('Left double quotation: ',left_double_quotation)\n",
235+
"print('Right double quotation: ',right_double_quotation)"
236+
]
237+
},
238+
{
239+
"cell_type": "markdown",
240+
"metadata": {},
241+
"source": [
242+
"---\n",
243+
"Let's also remove the four quotation marks mentioned above.<br>\n",
244+
"We add these four characters to the punctuation string so that they are removed from the given text."
245+
]
246+
},
247+
{
248+
"cell_type": "code",
249+
"execution_count": 63,
250+
"metadata": {},
251+
"outputs": [
252+
{
253+
"name": "stdout",
254+
"output_type": "stream",
255+
"text": [
256+
"---Text with punctuations:\n",
257+
" l as this, I shall\n",
258+
"think nothing of tumbling down stairs! How brave they’ll all think me\n",
259+
"at home! Why, I wouldn’t say anything about it, even if I fell off the\n",
260+
"top of the house!” (Which was very likely\n",
261+
"---------\n",
262+
"---Text without punctuations:\n",
263+
" l as this I shall\n",
264+
"think nothing of tumbling down stairs How brave theyll all think me\n",
265+
"at home Why I wouldnt say anything about it even if I fell off the\n",
266+
"top of the house Which was very likely\n"
267+
]
268+
}
269+
],
270+
"source": [
271+
"#this time, we include the four quotation marks, mentioned above\n",
272+
"text2=alice[3000:3201]\n",
273+
"print('---Text with punctuations:\\n',text2)\n",
274+
"print('---------')\n",
275+
"ps=string.punctuation+'\\U00002018'+'\\U00002019'+'\\U0000201C'+'\\U0000201D'\n",
276+
"print('---Text without punctuations:\\n',remove_punctuations(text2,ps))"
277+
]
278+
},
279+
{
280+
"cell_type": "markdown",
281+
"metadata": {},
282+
"source": [
283+
"---\n",
284+
"We can also apply the other function, *only_alphanumeric*, to the text above.<br>\n",
285+
"Let's see what result it gives:"
286+
]
287+
},
288+
{
289+
"cell_type": "code",
290+
"execution_count": 65,
291+
"metadata": {},
292+
"outputs": [
293+
{
294+
"name": "stdout",
295+
"output_type": "stream",
296+
"text": [
297+
"---Text with punctuations:\n",
298+
" l as this, I shall\n",
299+
"think nothing of tumbling down stairs! How brave they’ll all think me\n",
300+
"at home! Why, I wouldn’t say anything about it, even if I fell off the\n",
301+
"top of the house!” (Which was very likely\n",
302+
"---------\n",
303+
"---Text without punctuations:\n",
304+
" l as this I shall\n",
305+
"think nothing of tumbling down stairs How brave theyll all think me\n",
306+
"at home Why I wouldnt say anything about it even if I fell off the\n",
307+
"top of the house Which was very likely\n"
308+
]
309+
}
310+
],
311+
"source": [
312+
"# keeping only alphanumeric characters for alice text\n",
313+
"text2=alice[3000:3201]\n",
314+
"print('---Text with punctuations:\\n',text2)\n",
315+
"print('---------')\n",
316+
"print('---Text without punctuations:\\n',only_alphanumeric(text2))"
317+
]
318+
},
319+
{
320+
"cell_type": "code",
321+
"execution_count": null,
322+
"metadata": {},
323+
"outputs": [],
324+
"source": []
325+
}
326+
],
327+
"metadata": {
328+
"kernelspec": {
329+
"display_name": "Python 3",
330+
"language": "python",
331+
"name": "python3"
332+
},
333+
"language_info": {
334+
"codemirror_mode": {
335+
"name": "ipython",
336+
"version": 3
337+
},
338+
"file_extension": ".py",
339+
"mimetype": "text/x-python",
340+
"name": "python",
341+
"nbconvert_exporter": "python",
342+
"pygments_lexer": "ipython3",
343+
"version": "3.8.3"
344+
}
345+
},
346+
"nbformat": 4,
347+
"nbformat_minor": 5
348+
}
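
The notebook above finds that *string.punctuation* misses the curly quotation marks and handles them by appending four code points by hand. As one possible generalization (not part of this commit), the sketch below uses the standard-library `unicodedata` module to drop any character whose Unicode general category starts with `P`, in addition to the ASCII set. The function name `remove_unicode_punctuation` is hypothetical and the sample string is taken from the notebook output.

```python
import string
import unicodedata


def remove_unicode_punctuation(text):
    """Drop ASCII punctuation plus every character in a Unicode 'P*' category.

    Hypothetical helper, not part of the notebook. The ASCII check is kept
    because characters such as '$', '+' and '~' are classified as symbols
    ('S*'), not punctuation, by Unicode.
    """
    return ''.join(
        ch for ch in text
        if ch not in string.punctuation
        and not unicodedata.category(ch).startswith('P')
    )


# Sample from the notebook output; \u201d is the right double quotation mark.
sample = 'top of the house!\u201d (Which was very likely'
print(remove_unicode_punctuation(sample))
# prints: top of the house Which was very likely
```

Compared with extending `ps` manually, this also covers marks such as dashes and guillemets without listing them, at the cost of one category lookup per character.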
