
Commit 62243a9

Updated two more urls due to reviewer suggestion
1 parent f9b97ec commit 62243a9


1 file changed, +2 -2 lines changed

content/tutorial-nlp-from-scratch.md

@@ -107,8 +107,8 @@ We made sure to include different demographics in our data and included a range
 1. **Text Denoising** : Before converting your text into vectors, it is important to clean it and remove all unhelpful parts a.k.a the noise from your data by converting all characters to lowercase, removing html tags, brackets and stop words (words that don't add much meaning to a sentence). Without this step the dataset is often a cluster of words that the computer doesn't understand.


-2. **Converting words to vectors** : A word embedding is a learned representation for text where words that have the same meaning have a similar representation. Individual words are represented as real-valued vectors in a predefined vector space. GloVe is an unsupervised algorithm developed by Stanford for generating word embeddings by generating global word-word co-occurence matrix from a corpus. You can download the zipped files containing the embeddings from <https://nlp.stanford.edu/projects/glove/>. Here you can choose any of the four options for different sizes or training datasets. We have chosen the least memory consuming embedding file.
->The GloVe word embeddings include sets that were trained on billions of tokens, some up to 840 billion tokens. These algorithms exhibit stereotypical biases, such as gender bias which can be traced back to the original training data. For example certain occupations seem to be more biased towards a particular gender, reinforcing problematic stereotypes. The nearest solution to this problem are some de-biasing algorithms as the one presented in <https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/reports/6835575.pdf> which one can use on embeddings of their choice to mitigate bias, if present.
+2. **Converting words to vectors** : A word embedding is a learned representation for text where words that have the same meaning have a similar representation. Individual words are represented as real-valued vectors in a predefined vector space. GloVe is an unsupervised algorithm developed by Stanford for generating word embeddings by generating global word-word co-occurence matrix from a corpus. You can download the zipped files containing the embeddings from [the GloVe official website](https://nlp.stanford.edu/projects/glove/). Here you can choose any of the four options for different sizes or training datasets. We have chosen the least memory consuming embedding file.
+>The GloVe word embeddings include sets that were trained on billions of tokens, some up to 840 billion tokens. These algorithms exhibit stereotypical biases, such as gender bias which can be traced back to the original training data. For example certain occupations seem to be more biased towards a particular gender, reinforcing problematic stereotypes. The nearest solution to this problem are some de-biasing algorithms as the one presented in [this research article](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/reports/6835575.pdf), which one can use on embeddings of their choice to mitigate bias, if present.
 <!-- #endregion -->

 You'll start with importing the necessary packages to build our Deep Learning network.
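
For readers of this diff, the two preprocessing steps described in the changed passage (text denoising and reading GloVe vectors) might look roughly like the Python sketch below. It is not part of this commit or of the tutorial's own code. The names `denoise`, `load_glove`, and `STOP_WORDS` are hypothetical, the stop-word list is a tiny placeholder, "brackets" is assumed to mean square-bracketed spans, and the parser assumes the plain-text GloVe layout of one word followed by space-separated floats per line.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for illustration only).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def denoise(text):
    """Lowercase the text, strip HTML tags and bracketed spans, drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)      # remove HTML tags such as <p> or </p>
    text = re.sub(r"\[[^\]]*\]", " ", text)   # remove square-bracketed spans (assumed meaning of "brackets")
    tokens = re.findall(r"[a-z0-9']+", text)  # keep simple alphanumeric tokens
    return [t for t in tokens if t not in STOP_WORDS]


def load_glove(lines):
    """Parse GloVe-style lines ("word v1 v2 ...") into a dict of float lists."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings
```

Because `load_glove` accepts any iterable of lines, an open file handle for an unzipped embedding file could be passed in directly; which file and path to use is left to the reader.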
