Skip to content

Sai: Lowercase converter in Python#190

Open
b-sai wants to merge 6 commits intokscanne:masterfrom
b-sai:lowercase
Open

Sai: Lowercase converter in Python#190
b-sai wants to merge 6 commits intokscanne:masterfrom
b-sai:lowercase

Conversation

@b-sai
Copy link
Copy Markdown

@b-sai b-sai commented Feb 16, 2023

No description provided.

Copy link
Copy Markdown

@austin-carnahan austin-carnahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I had a hard time finding anything to improve on!

if letter == 'I':
lower_letter = 'ı'
elif language.startswith(('gd', 'gv', 'ga')):
if idx == 1 and (letter in ['A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú', "Ó"] or ord(letter) in [211]) and word[0] in ['n', 't'] and (len(word)-idx >= 2 and ord(word[idx+1]) != 771):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this conditional statement is a little long - it took me a minute to tease out what it's saying. Consider breaking it up on multiple lines or adding some inline documentation to explain.

elif language.startswith('el'):
if letter == 'Σ' and idx == len(word)-1:
lower_letter = 'ς'
elif language.startswith(("zh", "th", "ja")):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not going to save a ton of time, but if you aren't making any changes to the letters in these languages -- you don't need to loop through each character. Consider moving this conditional to the top of your loop and exiting early.

"""
result = ""

if language.startswith(("zh", "th", "ja")):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 3-letter language code permitted in BCP-47, so "startswith" won't work here, e.g. "jam" is "Jamaican Creole English".

result = ""

if language.startswith(("zh", "th", "ja")):
return word.lower()
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the point was in these cases to not bother calling lower() as an optimization.

if language == 'tr' or language == 'az':
if letter == 'I':
lower_letter = 'ı'
elif language.startswith(('gd', 'gv', 'ga')):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above; this won't work.

is_2nd_letter = idx == 1
is_exception_letter = letter in [
'A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú', "Ó"]
is_letter_o_latin = ord(letter) in [211]
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic numbers, here and 771 below! Unreadable.

word, language, actual = test.split("\t")
predicted = to_lowercase(word, language)
if predicted != actual:
print(f"COuldn't convert {word} in {language}!")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo.

is_letter_o_latin = ord(letter) in [211]
is_beginning_exception = word[0] in ['n', 't']
is_not_last = len(word)-idx > 1
if is_2nd_letter and (is_exception_letter or is_letter_o_latin) and is_beginning_exception and (is_not_last and ord(word[idx+1]) != 771):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Unicode business needs a bit more work; will discuss in class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants