Sourcery refactored master branch #1
base: master
Due to GitHub API limits, only the first 60 comments can be shown.
-data = body["_source"]
-return data
+return body["_source"]
Function Dictionary.get_word refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
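As a minimal sketch of what the inline-immediately-returned-variable rule does, the two functions below behave identically. The function names and the `sample_body` dictionary are hypothetical stand-ins for illustration, not the project's actual code or data.

```python
# Hypothetical stand-ins for Dictionary.get_word; `sample_body` mimics a search hit.

def get_word_before(body):
    data = body["_source"]  # temporary name that is used exactly once
    return data


def get_word_after(body):
    return body["_source"]  # same value, returned directly


sample_body = {"_source": {"headword": "example"}}
```

The change is purely cosmetic: both versions return the same object.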
-search_text_box = placeholder.text_input('Word', value=st.session_state['current_word'], key='sidebar_text_input')
-if search_text_box:
+if search_text_box := placeholder.text_input(
+'Word',
+value=st.session_state['current_word'],
+key='sidebar_text_input',
+):
Lines 43-59 refactored with the following changes:
- Use named expression to simplify assignment and conditional [×2] (use-named-expression)
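The use-named-expression rule relies on the assignment expression operator (`:=`), which requires Python 3.8 or later. The sketch below shows the pattern in isolation; `read_input` is a hypothetical callable standing in for the Streamlit `text_input` call, and the returned strings are invented for the example.

```python
# `read_input` is a hypothetical stand-in for placeholder.text_input(...).

def describe_search_before(read_input):
    text = read_input()
    if text:
        return f"searching for {text}"
    return "nothing to search"


def describe_search_after(read_input):
    # Assign and test in one step; `text` stays available inside the branch.
    if text := read_input():
        return f"searching for {text}"
    return "nothing to search"


inputs = [lambda: "word", lambda: ""]
results = [(describe_search_before(r), describe_search_after(r)) for r in inputs]
```

One consequence worth checking before merging: the refactored code no longer runs on interpreters older than 3.8.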
-f = open(DICTIONARY_FILE, "a")
-i = 0
-for res in query_doc(es, index_name):
-docs = res["hits"]["hits"]
-print(len(docs))
-doc_content = extract_dictionary_from_elasticsearch(docs)
-if len(doc_content) > 1:
-dict_content = yaml.dump(doc_content, allow_unicode=True, sort_keys=True)
-f.write(dict_content)
-i += 1
-f.close()
+with open(DICTIONARY_FILE, "a") as f:
+i = 0
+for res in query_doc(es, index_name):
+docs = res["hits"]["hits"]
+print(len(docs))
+doc_content = extract_dictionary_from_elasticsearch(docs)
+if len(doc_content) > 1:
+dict_content = yaml.dump(doc_content, allow_unicode=True, sort_keys=True)
+f.write(dict_content)
+i += 1
Lines 40-50 refactored with the following changes:
- Use `with` when opening file to ensure closure (ensure-file-closed)
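The point of the ensure-file-closed rule is that a `with` block closes the file even if the loop body raises. The sketch below is an illustration under assumptions: `append_chunks` is a hypothetical helper, the chunks are made-up YAML fragments, and the explicit `encoding="utf-8"` is an addition that the diff above does not contain.

```python
import os
import tempfile


def append_chunks(path, chunks):
    """Append each non-empty chunk to `path`; report the count and whether the file was closed."""
    written = 0
    with open(path, "a", encoding="utf-8") as f:
        for chunk in chunks:
            if chunk:
                f.write(chunk)
                written += 1
    # Leaving the `with` block has already closed the file handle.
    return written, f.closed


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "dictionary.yaml")
    demo_written, demo_closed = append_chunks(demo_path, ["a: 1\n", "", "b: 2\n"])
    with open(demo_path, encoding="utf-8") as check:
        demo_text = check.read()
```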
-dict = joblib.load(dict_file)
-return dict
+return joblib.load(dict_file)
Function load_dictionary refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "PRON", "SYM", "X", "N:G", "VERB:G", "NY",
-"N", "NB", "NNPy",
-"NNP", "NNPy",
-"V", "VERB",
-"Num", "NUMx", "NUM", "NUMX"
+"ADJ",
+"ADV",
+"INTJ",
+"NOUN",
+"PROPN",
+"PRON",
+"SYM",
+"X",
+"N:G",
+"VERB:G",
+"NY",
+"N",
+"NB",
+"NNP",
+"NNPy",
+"V",
+"VERB",
+"Num",
+"NUMx",
+"NUM",
+"NUMX",
 }
Lines 63-67 refactored with the following changes:
- Remove duplicate keys when instantiating sets (remove-duplicate-set-key)
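A set literal silently discards repeated elements, so dropping the second "NNPy" does not change the resulting set. The short check below uses a subset of the tags from the diff; the variable names are made up for the example.

```python
# A repeated element in a set literal has no effect on the resulting set.
with_duplicate = {"N", "NB", "NNPy", "NNP", "NNPy", "V", "VERB"}
without_duplicate = {"N", "NB", "NNP", "NNPy", "V", "VERB"}

same_contents = with_duplicate == without_duplicate
```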
-new_sentence = "\n".join(result)
-return new_sentence
+return "\n".join(result)
Function add_lemma_column refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-if r is None:
-return None
-return r.json()
+return None if r is None else r.json()
Function ChatUser.send refactored with the following changes:
- Lift code into else after jump in control flow (reintroduce-else)
- Replace if statement with if expression (assign-if-exp)
+max_len = 0
 for file in ["train.txt", "test.txt", "dev.txt"]:
 print(file)
-max_len = 0
Lines 31-31 refactored with the following changes:
- Hoist statements out of for/while loops (hoist-statement-from-loop)
-if label not in self.label2index:
-index = self.vocab_size
-self.label2index[label] = index
-self.index2label[index] = label
-self.vocab_size += 1
-return index
-else:
+if label in self.label2index:
 return self.label2index[label]
+index = self.vocab_size
+self.label2index[label] = index
+self.index2label[index] = label
+self.vocab_size += 1
+return index
Function LabelEncoder.encode refactored with the following changes:
- Swap if/else branches (swap-if-else-branches)
- Remove unnecessary else after guard condition (remove-unnecessary-else)
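For context, here is a runnable sketch of the refactored `encode` with its indentation restored. The `__init__` method is an assumption (the PR only shows `encode`), and the sample labels are made up.

```python
class LabelEncoder:
    def __init__(self):
        # Assumed initial state; the diff does not show the constructor.
        self.label2index = {}
        self.index2label = {}
        self.vocab_size = 0

    def encode(self, label):
        # Guard clause: known labels return early.
        if label in self.label2index:
            return self.label2index[label]
        # Unknown labels get the next free index.
        index = self.vocab_size
        self.label2index[label] = index
        self.index2label[index] = label
        self.vocab_size += 1
        return index


encoder = LabelEncoder()
codes = [encoder.encode(label) for label in ["B-PER", "O", "B-PER", "B-LOC"]]
```

Putting the short lookup first keeps the longer registration path at the top indentation level, which is the readability gain the two rules aim for.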
-optimizer = AdamW(self.parameters(), lr=2e-5)
-return optimizer
+return AdamW(self.parameters(), lr=2e-5)
Function BertForTokenClassification.configure_optimizers refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-output_line = line.split()[0] + " " + preds_list[example_id].pop(0) + "\n"
+output_line = f"{line.split()[0]} {preds_list[example_id].pop(0)}" + "\n"
Function NER.write_predictions_to_file refactored with the following changes:
- Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
-if path:
-with open(path, "r") as f:
-labels = f.read().splitlines()
-if "O" not in labels:
-labels = ["O"] + labels
-return labels
-else:
+if not path:
 return ["O", "B-MISC", "I-MISC", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
+with open(path, "r") as f:
+labels = f.read().splitlines()
+if "O" not in labels:
+labels = ["O"] + labels
+return labels
Function NER.get_labels refactored with the following changes:
- Swap if/else branches (swap-if-else-branches)
- Remove unnecessary else after guard condition (remove-unnecessary-else)
-if path:
-with open(path, "r") as f:
-labels = f.read().splitlines()
-if "O" not in labels:
-labels = ["O"] + labels
-return labels
-else:
+if not path:
Function Chunk.get_labels refactored with the following changes:
- Swap if/else branches (swap-if-else-branches)
- Remove unnecessary else after guard condition (remove-unnecessary-else)
-optimizer = AdamW(self.parameters(), lr=2e-5)
-return optimizer
+return AdamW(self.parameters(), lr=2e-5)
Function BertForMultilabelClassification.configure_optimizers refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-loss = 0
 gpt2_outputs = self.gpt2(input_ids)
 hidden_states = gpt2_outputs[0].squeeze()
 logits = self.logit(self.linear(hidden_states))
 batch_size, sequence_length = input_ids.shape[:2]
 logits = logits[range(batch_size), sequence_length]
-if labels is not None:
-loss = self.criterion(logits, labels)
+loss = self.criterion(logits, labels) if labels is not None else 0
Function GPT2TextClassification.forward refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
- Move assignment closer to its usage within a block (move-assign-in-block)
- Move setting of default value for variable into `else` branch (introduce-default-else)
-optimizer = SGD(self.parameters(), lr=1e-6)
-return optimizer
+return SGD(self.parameters(), lr=1e-6)
Function GPT2TextClassification.configure_optimizers refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-for i, line in enumerate(f):
+for line in f:
 word, freq = line.split("\t\t")
 other_words = Normalizer.normalize(word)
 uts_words = text_normalize(word)
 if word != "nghiêng" and len(word) > 6:
 continue
-if other_words != word and other_words != uts_words:
+if other_words not in [word, uts_words]:
Function compare_two_tools refactored with the following changes:
- Remove unnecessary calls to `enumerate` when the index is not used (remove-unused-enumerate)
- Replace multiple comparisons of same variable with `in` operator (merge-comparisons)
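The merge-comparisons rewrite is safe here because `x != a and x != b` and `x not in [a, b]` agree for plain strings. The sketch below checks this on three made-up cases; the function names and sample words are hypothetical.

```python
def differs_chained(candidate, word, uts_word):
    # Original form: two explicit inequality tests.
    return candidate != word and candidate != uts_word


def differs_membership(candidate, word, uts_word):
    # Refactored form: a single membership test.
    return candidate not in [word, uts_word]


cases = [
    ("hoà", "hoà", "hòa"),  # candidate equals the input word
    ("hòa", "hoà", "hòa"),  # candidate equals the other tool's output
    ("hoa", "hoà", "hòa"),  # candidate differs from both
]
results = [(differs_chained(*c), differs_membership(*c)) for c in cases]
```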
-new_s = "\n".join(result)
-return new_s
+return "\n".join(result)
Function predict_sentence refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-for key in syllable_map_r:
-items = syllable_map_r[key]
+for key, items in syllable_map_r.items():
 for item in items:
 syllable_map[item] = key
-NONE_DIACRITIC_SINGLE_VOWELS = set(["a", "e", "i", "o", "u", "y"])
-NONE_DIACRITIC_DOUBLE_VOWELS = set([
-"ai", "ao", "au", "ay",
-"eo", "eu", "ia", "ie", "iu", "oa", "oe", "oi", "oo",
-"ua", "ue", "ui", "uo", "uu", "uy", "ye"
-])
-NONE_DIACRITIC_TRIPLE_VOWELS = set([
-"iai", "ieu", "iua", "oai", "oao", "oay", "oeo",
-"uao", "uai", "uay", "uoi", "uou", "uya", "uye", "uyu",
-"yeu"
-])
+NONE_DIACRITIC_SINGLE_VOWELS = {"a", "e", "i", "o", "u", "y"}
+NONE_DIACRITIC_DOUBLE_VOWELS = {
+"ai",
+"ao",
+"au",
+"ay",
+"eo",
+"eu",
+"ia",
+"ie",
+"iu",
+"oa",
+"oe",
+"oi",
+"oo",
+"ua",
+"ue",
+"ui",
+"uo",
+"uu",
+"uy",
+"ye",
+}
+
+NONE_DIACRITIC_TRIPLE_VOWELS = {
+"iai",
+"ieu",
+"iua",
+"oai",
+"oao",
+"oay",
+"oeo",
+"uao",
+"uai",
+"uay",
+"uoi",
+"uou",
+"uya",
+"uye",
+"uyu",
+"yeu",
+}
Lines 81-95 refactored with the following changes:
- Use items() to directly unpack dictionary values (use-dict-items)
- Unwrap a constant iterable constructor [×3] (unwrap-iterable-construction)
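Both rules in this comment are behavior-preserving, which the following sketch tries to show on a tiny made-up mapping. The function names and the `sample` dictionary are hypothetical; only the loop shapes and the single-vowel set come from the diff.

```python
def invert_by_key_lookup(mapping):
    # Original shape: iterate keys, then look each value up again.
    inverted = {}
    for key in mapping:
        items = mapping[key]
        for item in items:
            inverted[item] = key
    return inverted


def invert_with_items(mapping):
    # Refactored shape: unpack key and value together.
    inverted = {}
    for key, items in mapping.items():
        for item in items:
            inverted[item] = key
    return inverted


sample = {"standard-1": ["variant-a"], "standard-2": ["variant-b", "variant-c"]}
from_lookup = invert_by_key_lookup(sample)
from_items = invert_with_items(sample)

# set([...]) and a set literal build equal sets; the literal skips the temporary list.
vowels_from_call = set(["a", "e", "i", "o", "u", "y"])
vowels_from_literal = {"a", "e", "i", "o", "u", "y"}
```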
-if group in NONE_DIACRITIC_VOWELS:
-miss_spell = False
-else:
-miss_spell = True
+miss_spell = group not in NONE_DIACRITIC_VOWELS
Function AnalysableWord.__init__ refactored with the following changes:
- Simplify boolean if expression (boolean-if-exp-identity)
- Remove unnecessary casts to int, str, float or bool (remove-unnecessary-cast)
- Replace if statement with if expression (assign-if-exp)
-dic = {}
char1252 = 'à|á|ả|ã|ạ|ầ|ấ|ẩ|ẫ|ậ|ằ|ắ|ẳ|ẵ|ặ|è|é|ẻ|ẽ|ẹ|ề|ế|ể|ễ|ệ|ì|í|ỉ|ĩ|ị|ò|ó|ỏ|õ|ọ|ồ|ố|ổ|ỗ|ộ|ờ|ớ|ở|ỡ|ợ|ù|ú|ủ|ũ|ụ|ừ|ứ|ử|ữ|ự|ỳ|ý|ỷ|ỹ|ỵ|À|Á|Ả|Ã|Ạ|Ầ|Ấ|Ẩ|Ẫ|Ậ|Ằ|Ắ|Ẳ|Ẵ|Ặ|È|É|Ẻ|Ẽ|Ẹ|Ề|Ế|Ể|Ễ|Ệ|Ì|Í|Ỉ|Ĩ|Ị|Ò|Ó|Ỏ|Õ|Ọ|Ồ|Ố|Ổ|Ỗ|Ộ|Ờ|Ớ|Ở|Ỡ|Ợ|Ù|Ú|Ủ|Ũ|Ụ|Ừ|Ứ|Ử|Ữ|Ự|Ỳ|Ý|Ỷ|Ỹ|Ỵ'.split( | ||
'|') | ||
charutf8 = "à|á|ả|ã|ạ|ầ|ấ|ẩ|ẫ|ậ|ằ|ắ|ẳ|ẵ|ặ|è|é|ẻ|ẽ|ẹ|ề|ế|ể|ễ|ệ|ì|í|ỉ|ĩ|ị|ò|ó|ỏ|õ|ọ|ồ|ố|ổ|ỗ|ộ|ờ|ớ|ở|ỡ|ợ|ù|ú|ủ|ũ|ụ|ừ|ứ|ử|ữ|ự|ỳ|ý|ỷ|ỹ|ỵ|À|Á|Ả|Ã|Ạ|Ầ|Ấ|Ẩ|Ẫ|Ậ|Ằ|Ắ|Ẳ|Ẵ|Ặ|È|É|Ẻ|Ẽ|Ẹ|Ề|Ế|Ể|Ễ|Ệ|Ì|Í|Ỉ|Ĩ|Ị|Ò|Ó|Ỏ|Õ|Ọ|Ồ|Ố|Ổ|Ỗ|Ộ|Ờ|Ớ|Ở|Ỡ|Ợ|Ù|Ú|Ủ|Ũ|Ụ|Ừ|Ứ|Ử|Ữ|Ự|Ỳ|Ý|Ỷ|Ỹ|Ỵ".split( | ||
'|') | ||
-for i in range(len(char1252)):
-dic[char1252[i]] = charutf8[i]
-return dic
+return {char1252[i]: charutf8[i] for i in range(len(char1252))}
Function loaddicchar refactored with the following changes:
- Move assignment closer to its usage within a block (move-assign-in-block)
- Inline variable that is immediately returned (inline-immediately-returned-variable)
- Convert for loop into dictionary comprehension (dict-comprehension)
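The loop and the comprehension build the same mapping. The sketch below compares them on short made-up lists instead of the long character tables, and adds a `dict(zip(...))` variant that is not part of the PR; it is shown only as a possible further simplification.

```python
def build_map_loop(sources, targets):
    mapping = {}
    for i in range(len(sources)):
        mapping[sources[i]] = targets[i]
    return mapping


def build_map_comprehension(sources, targets):
    # The form introduced by the PR.
    return {sources[i]: targets[i] for i in range(len(sources))}


def build_map_zip(sources, targets):
    # Alternative not used in the PR. Note that zip stops silently at the
    # shorter input, whereas indexing raises IndexError if targets is shorter.
    return dict(zip(sources, targets))


sources = "a|b|c".split("|")
targets = "x|y|z".split("|")
maps = [f(sources, targets) for f in (build_map_loop, build_map_comprehension, build_map_zip)]
```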
-if x == 4 or x == 8: # ê, ơ
+if x in [4, 8]: # ê, ơ
Function chuan_hoa_dau_tu_tieng_viet refactored with the following changes:
- Replace multiple comparisons of same variable with `in` operator (merge-comparisons)
-if nguyen_am_index == -1:
+if nguyen_am_index == -1 or index - nguyen_am_index == 1:
 nguyen_am_index = index
 else:
-if index - nguyen_am_index != 1:
-return False
-nguyen_am_index = index
+return False
Function is_valid_vietnam_word refactored with the following changes:
- Merge nested if conditions (merge-nested-ifs)
- Lift code into else after jump in control flow (reintroduce-else)
- Hoist nested repeated code outside conditional statements [×2] (hoist-similar-statement-from-if)
- Swap positions of nested conditionals [×2] (swap-nested-ifs)
- Swap if/else to remove empty if body (remove-pass-body)
- Hoist repeated code outside conditional statement (hoist-statement-from-if)
- Swap if/else branches (swap-if-else-branches)
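Because this comment chains seven rules, it is worth checking that the merged condition still accepts exactly the same inputs. The sketch below is a reconstruction under assumptions: the surrounding loop over vowel positions and the final `return True` are guesses, since the diff shows only the conditional, and the function names are hypothetical.

```python
def contiguous_before(vowel_positions):
    last = -1  # plays the role of nguyen_am_index
    for index in vowel_positions:
        if last == -1:
            last = index
        else:
            if index - last != 1:
                return False
            last = index
    return True  # assumed; the diff does not show the end of the function


def contiguous_after(vowel_positions):
    last = -1
    for index in vowel_positions:
        # First vowel seen, or a vowel directly following the previous one.
        if last == -1 or index - last == 1:
            last = index
        else:
            return False
    return True


cases = [[], [2], [1, 2], [1, 3], [0, 1, 2], [0, 2, 3]]
before = [contiguous_before(c) for c in cases]
after = [contiguous_after(c) for c in cases]
```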
-if response.status_code not in [200, 302]:
-if "www.dropbox.com" in url:
-# dropbox return code 301, so we ignore this error
-pass
-else:
-raise IOError("HEAD request failed for url {}".format(url))
+if response.status_code not in [200, 302] and "www.dropbox.com" not in url:
+raise IOError(f"HEAD request failed for url {url}")
Function get_from_cache refactored with the following changes:
- Merge nested if conditions (merge-nested-ifs)
- Swap if/else to remove empty if body (remove-pass-body)
- Replace call to format with f-string (use-fstring-for-formatting)
This removes the following comments (why?):
# dropbox return code 301, so we ignore this error
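The merged condition is logically equivalent to the nested one, but the explanatory comment is lost. One option, sketched below, is to carry the comment over to the new condition. The helper names, the `outcome` wrapper, and both URLs are hypothetical; the real code checks a `response` object rather than a bare status code.

```python
def check_before(status_code, url):
    if status_code not in [200, 302]:
        if "www.dropbox.com" in url:
            # dropbox return code 301, so we ignore this error
            pass
        else:
            raise IOError("HEAD request failed for url {}".format(url))


def check_after(status_code, url):
    # Dropbox answers with 301, so failures for Dropbox URLs are ignored.
    if status_code not in [200, 302] and "www.dropbox.com" not in url:
        raise IOError(f"HEAD request failed for url {url}")


def outcome(check, status_code, url):
    """Return "ok" or the error message, so both versions can be compared."""
    try:
        check(status_code, url)
    except IOError as error:
        return str(error)
    return "ok"


DROPBOX_URL = "https://www.dropbox.com/s/abc/model.zip"  # made-up URL
OTHER_URL = "https://example.com/model.zip"  # made-up URL
grid = [(code, url) for code in (200, 301, 302, 404) for url in (DROPBOX_URL, OTHER_URL)]
outcomes_before = [outcome(check_before, code, url) for code, url in grid]
outcomes_after = [outcome(check_after, code, url) for code, url in grid]
```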
-if use_slower_interval:
-Tqdm.default_mininterval = 10.0
-else:
-Tqdm.default_mininterval = 0.1
+Tqdm.default_mininterval = 10.0 if use_slower_interval else 0.1
Function Tqdm.set_slower_interval refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
-if not all:
-if license == "Close":
-continue
+if not all and license == "Close":
+continue
Function ModelFetcher.list refactored with the following changes:
- Merge nested if conditions (merge-nested-ifs)
-if 0.0 <= score <= 1.0:
-self._score = score
-else:
-self._score = 1.0
+self._score = score if 0.0 <= score <= 1.0 else 1.0
Function Label.score refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
-return "{} ({})".format(self._value, self._score)
+return f"{self._value} ({self._score})"
Function Label.__str__ refactored with the following changes:
- Replace call to format with f-string (use-fstring-for-formatting)
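The two `Label` changes can be read together in the sketch below. It is an approximation under assumptions: the constructor and the property/setter layout are guesses, since the PR shows only the bodies of `score` and `__str__`, and the sample labels are made up.

```python
class Label:
    def __init__(self, value, score=1.0):
        # Assumed constructor; the diff does not show it.
        self._value = value
        self.score = score  # routed through the setter below

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, score):
        # Out-of-range scores fall back to 1.0; they are not clamped to the nearest bound.
        self._score = score if 0.0 <= score <= 1.0 else 1.0

    def __str__(self):
        return f"{self._value} ({self._score})"


in_range = Label("B-PER", 0.25)
too_low = Label("O", -3)
too_high = Label("O", 1.5)
```

Note that a negative score also becomes 1.0 rather than 0.0. The refactoring preserves that behavior; it does not introduce it.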
Sourcery Code Quality Report
✅ Merging this PR will increase code quality in the affected files by 0.13%.
Here are some functions in these files that still need a tune-up:
Legend and Explanation
The emojis denote the absolute quality of the code:
The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request. Please see our documentation here for details on how these metrics are calculated. We are actively working on this report - lots more documentation and extra metrics to come! Help us improve this quality report!
* Open file with encoding='utf-8'
undertheseanlpGH-560: Add encoding='utf-8' to fix "UnicodeDecodeError"
Branch master refactored by Sourcery.
If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.
See our documentation here.
Run Sourcery locally
Reduce the feedback loop during development by using the Sourcery editor plugin:
Review changes via command line
To manually merge these changes, make sure you're on the master branch, then run:
Help us improve this pull request!