PR: classes and subclasses by ElmoGeeraerts · Pull Request #4 · montesmariana/intro_machine_learning_using_python

ElmoGeeraerts · 2023-05-14T13:50:59Z

Hi Mariana,

I started on the assignment. The idea is to write a script that helps Translation Project Managers add vendors to a local database and keep a clear overview of preferred/back-up/potential vendors. I have created a class and two subclasses.

Best regards,
Elmo

montesmariana · 2023-05-16T08:33:24Z

vendor_data.py

+                VendorMail should be a string, if not a TypeError is raised.
+                TargLang should be a string, if not a TypeError is raised.
+                WordRate should be a float, if not a TypeError is raised.
+                WordRate should be > 0.15, if not a ValueError is raised.


WordRate should NOT be > 0.15, right?

montesmariana · 2023-05-16T08:33:48Z

vendor_data.py

+                TargLang should be a string, if not a TypeError is raised.
+                WordRate should be a float, if not a TypeError is raised.
+                WordRate should be > 0.15, if not a ValueError is raised.
+                CatTool should be a string, if not a TypeError is raised.


This one can probably have a limited range of options, right?

montesmariana

Possibility of using input():

done = "no"
name = ""
feeling = "super sad"
while done.lower() != "yes":
    new_name = input(f"What's your name? (Previous value: {name}) ")
    name = new_name if new_name else name
    feeling = input(f"What are you feeling? (Previous value: {feeling}) ")
    done = input("Are you done? ")
print(f"{name} says: '{feeling}'")

montesmariana · 2023-05-23T06:53:51Z

vendor_data.py

+        self.VendorMail = VendorMail
+        self.CatTool = CatTool
+
+        regex_mail = "^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$" #regex used to validate email


Have you tested this? It might need an r before the string to make it a raw string.
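
A minimal sketch of the raw-string point, reusing the pattern from the diff. In a normal string, `\.` and `\w` happen to survive, but newer Python versions warn about invalid escape sequences; the `r` prefix avoids that. The helper name `is_valid_mail` is hypothetical and the sample address is made up.

```python
import re

# Raw string: backslashes such as \. and \w reach the regex engine unchanged.
REGEX_MAIL = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"


def is_valid_mail(address):
    """Return True if the address matches the pattern (hypothetical helper)."""
    return re.search(REGEX_MAIL, address) is not None


print(is_valid_mail("john.doe@example.com"))
```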

montesmariana · 2023-05-23T06:55:09Z

vendor_data.py

+            raise TypeError("TargLang should be a string!")
+        if type(WordRate) != float:
+            raise TypeError("WordRate should be a float!")
+        if WordRate:


If you don't have WordRate, line 64 is already going to fail. In any case, WordRate isn't optional, so there will always be a WordRate...

montesmariana · 2023-05-23T06:56:18Z

vendor_data.py

+            raise TypeError("WordRate should be a float!")
+        if WordRate:
+            if WordRate > 0.15:
+                raise ValueError("This vendor is too expensive, consider another one.")


Should this be an error or a warning?
An error stops execution and the vendor cannot be added. A warning lets you add the vendor but just warns the user that something may be wrong. I ask this because the phrasing of the message "consider another one" is more warning-like than error-like.
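
To make the difference concrete, here is a rough sketch using the standard `warnings` module. The function name `check_word_rate` and the `strict` switch are assumptions for illustration, not part of the script under review.

```python
import warnings

MAX_WORD_RATE = 0.15  # threshold taken from the diff above


def check_word_rate(word_rate, strict=True):
    """Raise if strict is True, otherwise only warn about an expensive vendor."""
    if word_rate > MAX_WORD_RATE:
        message = "This vendor is too expensive, consider another one."
        if strict:
            # Error: execution stops here and the vendor cannot be added.
            raise ValueError(message)
        # Warning: the user is informed, but execution continues.
        warnings.warn(message)
    return word_rate
```

With `strict=False` the caller still gets the rate back and can add the vendor; with `strict=True` the call never returns for rates above the threshold.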

montesmariana · 2023-05-23T06:56:58Z

vendor_data.py

+            "WordRate" (float): Vendor's word rate for new words in Euro
+            "CatTool" (str): Vendor's preferred CAT-tool
+            """
+        self.VendorName = VendorName


Assigning the arguments to the attributes should be done AFTER validation.

(Or during, which is what you are doing at the validation.)

montesmariana · 2023-05-23T06:58:11Z

vendor_data.py

+        if type(VendorMail) != str:
+            raise TypeError ("VendorMail should be a string!")
+        if VendorMail:   
+            if(re.search(regex_mail,VendorMail)):


(Just: if type(VendorMail) should be inside the if VendorMail part, because it fails if there is no VendorMail)

Ok, now that I see a bit below, lines 78-81 can be replaced with self.set_mail(VendorMail), and line 45 can be deleted.
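
Roughly what that could look like, as a sketch only: `Vendor` is a simplified stand-in for `VendorData` with a single attribute, and the ValueError message is an assumption. The type check sits inside the `if VendorMail` block, and `__init__` delegates to `set_mail()` so validation and assignment live in one place.

```python
import re

REGEX_MAIL = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"


class Vendor:
    """Simplified stand-in for VendorData (hypothetical, one attribute only)."""

    def __init__(self, VendorMail=""):
        # No separate assignment here: set_mail() validates and assigns.
        self.set_mail(VendorMail)

    def set_mail(self, VendorMail):
        if VendorMail:  # only validate when an address was actually given
            if not isinstance(VendorMail, str):
                raise TypeError("VendorMail should be a string!")
            if not re.search(REGEX_MAIL, VendorMail):
                raise ValueError("Please insert a valid email address.")
        self.VendorMail = VendorMail
```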

montesmariana · 2023-05-23T06:59:57Z

vendor_data.py

+        if CatTool in VendorData.CatTools:
+            self.CatTool = CatTool
+        else:
+            raise ValueError("This CAT tool is not valid. Run 'VendorData.CatTools' to check options.")


montesmariana · 2023-05-23T07:04:23Z

vendor_data.py

+            df.to_excel(f"{filename}.xlsx", index=False, sheet_name="VendorData")
+        else:
+            with pd.ExcelWriter(f"{filename}.xlsx", mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
+                df.to_excel(writer, sheet_name = "VendorData", startrow=writer.sheets["VendorData"].max_row, header = None, index=False)


Looks good!
I would split lines 109 and 110 over several lines, like you did with the dictionary right above.
And I would approach this merging in a different, safer way. (To talk in person)

montesmariana · 2023-05-23T10:08:13Z

vendor_data.py

+    parser.add_argument("--WordRate", type=float, help="The vendor's word rate in Euro")
+    parser.add_argument("--VendorMail", type=str, help="The vendor's email address")
+    parser.add_argument("--CatTool", type=str, help="The CAT-tool the vendor wants to use")
+    args = parser.parse_args()


You can create a flag argument with the action attribute: https://docs.python.org/3/library/argparse.html#action
For example:

parser.add_argument('-a', '--add', action='store_true')

Then args.add is True if the user calls -a or --add and False otherwise.
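
A slightly fuller sketch, assuming the script ends up with two flags, `-a/--add` and `-m/--modify`, that should not be combined. Whether the script uses `add_mutually_exclusive_group()` is an assumption; the explicit list passed to `parse_args()` stands in for the command line.

```python
import argparse

parser = argparse.ArgumentParser(description="Add or modify a vendor")

# Flags in the same group cannot be passed together.
group = parser.add_mutually_exclusive_group()
group.add_argument("-a", "--add", action="store_true", help="add a new vendor")
group.add_argument("-m", "--modify", action="store_true", help="modify an existing vendor")

# An explicit list stands in for the command line here.
args = parser.parse_args(["--add"])
print(args.add, args.modify)  # True False
```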

montesmariana · 2023-05-30T06:58:21Z

vendor_data.py

+                raise ValueError("This vendor is too expensive, pick another one.")
+            if WordRate == 0.00:
+                raise ValueError("Word rate cannot be 0.00.")
        else:


This else is aligned with if WordRate, meaning that self.WordRate = WordRate only runs if you don't have WordRate.

You should actually remove the else and leave self.WordRate = WordRate aligned with the ifs in lines 77-81.
That way, the logic goes like so:

- if there is a value for WordRate:
  - if it is not a float, throw an error
  - (otherwise) if it is larger than 0.15, throw an error
  - (otherwise) if it is equal to 0, throw an error
  - (otherwise) assign its value to self.WordRate

Because the if statements throw an error, you don't need elif or else; you can just assume that if there was no error it will move forward.
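
The same flow as a sketch, written as a standalone function. It assumes WordRate is a required argument, so the outer `if WordRate` is left out; the function name `validate_word_rate` is hypothetical.

```python
def validate_word_rate(WordRate):
    """Return WordRate unchanged if it passes every check."""
    if not isinstance(WordRate, float):
        raise TypeError("WordRate should be a float!")
    if WordRate > 0.15:
        raise ValueError("This vendor is too expensive, pick another one.")
    if WordRate == 0.00:
        raise ValueError("Word rate cannot be 0.00.")
    # Reaching this line means no check raised, so no elif/else is needed.
    return WordRate
```

Inside `__init__` the last line would be the assignment to `self.WordRate` instead of a `return`.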

montesmariana · 2023-06-06T08:03:25Z

README.md

+The repository contains:
+- The `README.md` file which describes the repository and illustrates how to use the code as a module and by running the script;
+- The `tutorial.md` and `tutorial.ipynb` files which show how to actually use the code and the different functionalities;
+- Two `xlsx` files which were generated while creating the `tutorial.ipynb` file. Be aware that if you run the code in the tutorial notebook, some data will be duplicated in the excel if you do not change anything. To avoid this you can delete the two excel files from the directory or you can change the data in the tutorial code to fit your needs.;


You could also add a line of code yourself to make sure the file is deleted when it should be:

import os
os.remove('path/to/the/file.xlsx')
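
Since `os.remove()` raises `FileNotFoundError` when the file is not there, a small guard can help when the cell is run more than once. This is a sketch; `remove_if_exists` is a hypothetical helper name.

```python
import os


def remove_if_exists(path):
    """Delete the file at path if it exists; return True if something was deleted."""
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```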

montesmariana · 2023-06-06T08:04:53Z

README.md


 ```python
-%run vendor_data.py <example.json>
+%run vendor_data.py <"-a/--add or -m/--modify">


In both cases, because you have two mutually-exclusive arguments, I would make two chunks to show. That way, a user can copy your code. So: one line showing how to call your script to add an entry, and another line showing how to call your script to modify an entry.

montesmariana · 2023-06-06T08:05:19Z

README.md

-In both cases `example.json` stands for the `filename` argument that the script needs. You can use [the file in this repository](example.json) or a similar file of yours. Find more information on how this script works with:
+Running vendor_data.py without either the argument -a (to add a new vendor) or -m (to modify an existing vendor) will not do anything. The argument -a prompts the user to add a new vendor. The data is provided by answering some questions regarding the vendor and the project. The argument -m prompts the user to modify an existing vendor. Again the data is provided by answering some questions regarding the vendor and the project.
+
+For more information you can run the command below, or check in the notebook tuturial.ipynb in this repository under "Running `vendor_data.py`".


montesmariana · 2023-06-06T08:05:49Z

README.md

+1. Easily create an excel file with the project name and the source language as filename;
+2. Write the following data to the excel file: the target language and the translator's name, e-mail, word rate, preferred CAT tool and the status of the translator;
+3. Modify the following data for the vendors already in the excel file: e-mail, word rate, preferred CAT tool and status.
+The main advantage of this script is that it limits the posibilities of CAT tools and statuses and that it validates e-mail and word rate (betw. 0.01 and 0.15). This prevents the excel file from becoming messy when different people are working on the project.


montesmariana · 2023-06-06T08:06:06Z

README.md

+3. Modify the following data for the vendors already in the excel file: e-mail, word rate, preferred CAT tool and status.
+The main advantage of this script is that it limits the posibilities of CAT tools and statuses and that it validates e-mail and word rate (betw. 0.01 and 0.15). This prevents the excel file from becoming messy when different people are working on the project.
+
+## What this script CANNOT do


Very good caveat

montesmariana · 2023-06-06T08:07:35Z

vendor_data.py

+import os #os package to check if files exists
+import pyinputplus as pyip #pyinputplus package to facilitate providing arguments
+
+#4 methods to make validation easier


outside of a class: function
inside of a class: method
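
A tiny illustration of the distinction; all names here are made up for the example.

```python
def check_positive(value):
    """Function: defined at module level, called as check_positive(x)."""
    return value > 0


class Vendor:
    def __init__(self, rate):
        self.rate = rate

    def has_positive_rate(self):
        """Method: defined inside a class, called on an instance."""
        return check_positive(self.rate)
```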

montesmariana · 2023-06-06T08:08:54Z

vendor_data.py

+        raise TypeError("WordRate should be a float!")
+    if WordRate > 0.15:
+        raise ValueError("This vendor is too expensive, pick another one.")
+    if WordRate == 0.00:


What if it's -0.04?
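
One way to cover negative values as well is a single range check. This sketch uses the bounds the README diff mentions (between 0.01 and 0.15); the function name and the error message are assumptions.

```python
MIN_WORD_RATE = 0.01  # bounds as stated in the README diff
MAX_WORD_RATE = 0.15


def check_word_rate_range(WordRate):
    """Reject anything outside the allowed range, including 0 and negatives."""
    if not MIN_WORD_RATE <= WordRate <= MAX_WORD_RATE:
        raise ValueError(
            f"WordRate should be between {MIN_WORD_RATE} and {MAX_WORD_RATE}."
        )
    return WordRate
```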

montesmariana · 2023-06-06T08:09:28Z

vendor_data.py

+    if type(CatTool) != str:
+        raise TypeError("CatTool should be a string!")
+    if not CatTool in ["XTM", "Trados Studio", "MemoQ", "Memsource"]:
+            raise ValueError("This CAT tool is not valid. Run 'VendorData.CatTools' to check options.") 


fix indentation

montesmariana · 2023-06-06T08:10:53Z

vendor_data.py

+    """
+    if type(CatTool) != str:
+        raise TypeError("CatTool should be a string!")
+    if not CatTool in ["XTM", "Trados Studio", "MemoQ", "Memsource"]:


To avoid repetition (if you wanted to add/remove a CatTool, now you have to do it in two places) either provide the list of CatTools as an argument of this function or call it as VendorData.CatTools.
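
Both options can be combined, as in this sketch: the list lives only on the class, and the function accepts an optional override. `CheckCatTool` and the `allowed` parameter are assumed names, and the class body is reduced to the one attribute that matters here.

```python
class VendorData:
    # Single place where the allowed CAT tools are listed.
    CatTools = ["XTM", "Trados Studio", "MemoQ", "Memsource"]


def CheckCatTool(CatTool, allowed=None):
    """Validate CatTool against VendorData.CatTools unless another list is given."""
    if allowed is None:
        allowed = VendorData.CatTools  # looked up at call time, not at definition
    if not isinstance(CatTool, str):
        raise TypeError("CatTool should be a string!")
    if CatTool not in allowed:
        raise ValueError("This CAT tool is not valid. Run 'VendorData.CatTools' to check options.")
    return CatTool
```

Adding or removing a tool then only requires changing `VendorData.CatTools`.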

montesmariana · 2023-06-06T08:13:12Z

vendor_data.py

+        self.WordRate = WordRate
+        self.VendorMail = VendorMail
+        self.CatTool = CatTool
+        self.Preferred = Preferred


The whole __init__() looks much better and cleaner now :)

montesmariana · 2023-06-06T08:17:37Z

vendor_data.py

+                raise ValueError("Invalid index, run 'self.ReadExcel()' to see options")
+            if Key == "E-mail": #validate e-mail address if a value for the key E-mail is modified
+                CheckVendorMail(NewValue)
+            if Key == "Word Rate": #validate word rate if a value for the key Word Rate is modified


For all the Key == ... checks, I would chain them with elif. Otherwise, it will check if Key == "Word Rate" even if it already saw that it is "E-mail".
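
A sketch of the chained version. To keep it self-contained it only returns the name of the check that would run; the key "CAT Tool" and the names `CheckWordRate` and `CheckCatTool` are assumptions, while "E-mail", "Word Rate" and `CheckVendorMail` come from the diff.

```python
def pick_check(Key):
    """Return the name of the validation that applies to Key, or None."""
    if Key == "E-mail":
        check = "CheckVendorMail"
    elif Key == "Word Rate":  # skipped entirely once "E-mail" has matched
        check = "CheckWordRate"
    elif Key == "CAT Tool":
        check = "CheckCatTool"
    else:
        check = None  # keys that need no validation
    return check
```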

montesmariana · 2023-06-06T08:20:22Z

vendor_data.py

+        Vndr=VendorData(ProjectName, SourceLang, TargLang, VendorName, VendorMail, WordRate, CatTool, Preferred)
+        Excel = pyip.inputStr("Do you want to add this vendor to the excel file? ") #user gets the chance to write data to excel
+        if Excel.lower() == "yes":
+            Vndr.ToExcel() #data written to excel


I would add something like:

    print("Vendor was added.")
else:
    print("Action cancelled.")

So the user has confirmation of what happened

montesmariana · 2023-06-06T08:22:57Z

vendor_data.py

+        else:
+            raise ValueError(f"There is no file for {self.ProjectName} in {self.SourceLang}")
+
+    def ModExcel(self, Key, Index, NewValue):


Good job with this method.
BUT. This means that if a user has changed two or three values in an entry, you will open and close the Excel file two or three times, one for each value that you want to change.
If you think that users will not change more than one thing at a time, it's ok, but it might make more sense to collect all the new values and update them in one opening-closing of the file.
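
To illustrate the "collect, then write once" idea without depending on Excel, here is a sketch that uses a CSV file as a stand-in so it runs with the standard library only. The function name `modify_rows` and the shape of `changes` (a dict of column name to new value) are assumptions; the same pattern should carry over to reading and writing the Excel sheet once.

```python
import csv


def modify_rows(path, index, changes):
    """Apply several {column: new_value} changes to one row with one read and one write."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)

    if not 0 <= index < len(rows):
        raise ValueError("Invalid index")
    unknown = set(changes) - set(fieldnames)
    if unknown:
        raise ValueError(f"Unknown keys: {sorted(unknown)}")

    # All new values arrive together, so the row is updated in one step.
    rows[index].update(changes)

    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```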

montesmariana · 2023-06-06T08:24:27Z

tutorial.md

+
+```python
+import sys
+sys.path.append ('vendor_data.py') #this is the relative path to the script. If the script is not directly next to the notebook, it could be better to copy the full path of the notebook.


no, you don't add the path TO the script, but to the folder where the script lives

(It seems to work anyways, but normally you append the directory)
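
A short sketch of appending the directory instead. The folder `path/to/folder` is a placeholder, not a real location in the repository.

```python
import sys
from pathlib import Path

# Placeholder for the folder that contains vendor_data.py; adjust to your layout.
script_dir = Path("path/to/folder").resolve()

# sys.path lists directories to search for modules, so append the folder, not the .py file.
if str(script_dir) not in sys.path:
    sys.path.append(str(script_dir))

# import vendor_data as vd  # would now also be searched for in script_dir
```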

montesmariana · 2023-06-06T08:31:13Z

tutorial.md

+
+### Docstrings
+The script is completely documented with docstrings in order to ensure that the script is correclty used.
+You can read these docstrings by calling one of the following functions:


Do not leave this output, it is too much automatic text for a tutorial.
You can show the code without running it by adding it inside the Markdown cell, with ```python right before and ``` afterwards.
For the tutorial, think of what you would like to find as a user that doesn't know about the module and is just starting. If anything, this kind of "here you have more information" could go at the end, but starting a tutorial with this can be a bit offputting for a user.

montesmariana · 2023-06-06T08:32:56Z

tutorial.md

+
+
+
+Some arguments had default arguments. When we check the e-mail for VendorNL, for which we didn't provide one, we'll see that it does not return anything:


Some arguments have default values.

I wouldn't say an empty string is a default value from the user's perspective. It's rather allowed to be empty.

montesmariana · 2023-06-06T08:33:41Z

tutorial.md

+
+
+
+The value for the argument Preferred can be True, False or None. This has an influence of the vendor's status.


montesmariana · 2023-06-06T08:34:27Z

tutorial.md

+
+
+
+### Functions


Functions inside a class are called "methods".

montesmariana · 2023-06-06T08:35:41Z

tutorial.md

+
+
+
+We can now write the data for the vendors to an excel file. Since every vendor has the value for the argument `ProjectName` and `SourceLang`, they will be written to the same excel file. We should execute this function for every vendor. Be careful because if you run this function multiple times for the same vendor, it will not overwrite but append and you will have duplicates in the excel file.


Since every vendor has the same value...

The fact that you can generate duplicates is a weakness in the code.
You're doing a great job, so if you feel that you cannot fix this yet (you could, by checking for the existence of an entry before adding it), at least add it to the list of things that could/should be improved in the future.
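
The existence check could look something like the sketch below. It works on a plain list of dictionaries rather than the Excel file, and the column names "Vendor" and "Target Language" are assumptions about what identifies an entry.

```python
def add_vendor(entries, new_entry, key_fields=("Vendor", "Target Language")):
    """Append new_entry unless an entry with the same key fields already exists."""
    new_key = tuple(new_entry[field] for field in key_fields)
    for entry in entries:
        if tuple(entry[field] for field in key_fields) == new_key:
            return False  # duplicate found: nothing is added
    entries.append(new_entry)
    return True
```

The return value tells the caller whether the vendor was added, which also makes it easy to print a confirmation.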
You could also make this very explicit in the markdown:

<div class="alert alert-danger"> If you run `.toExcel()` multiple times on the same vendor, it will generate multiple entries! </div>

In a Jupyter notebook this should render as a highlighted alert box.

montesmariana · 2023-06-06T08:41:23Z

tutorial.md

+
+
+```python
+vd.VendorData.Keys


montesmariana · 2023-06-06T08:42:24Z

tutorial.md

+
+
+```python
+VendorES.CatTool


Mmm I find this counterintuitive. I would have the code update the class itself as well.

montesmariana

Excellent work!

Elmo and others added 4 commits May 2, 2023 11:33

upgrade README

28f8263

Update Tutorial

f414458

Update README.md

1e1b710

Add files via upload

964abfe

montesmariana reviewed May 16, 2023

ElmoGeeraerts added 5 commits May 16, 2023 12:36

Update vendor_data.py

5248266

Update vendor_data

f7a2a44

update vendor_data

9de2efc

Update vendor_data

9694930

Update vendor_data

4e4212d

ElmoGeeraerts marked this pull request as draft May 21, 2023 16:39

Update vendor_data

1fac65a

montesmariana reviewed May 23, 2023

ElmoGeeraerts added 6 commits May 27, 2023 16:24

Update vendor_data.py

46317ba

Update vendor_data

4585991

Update vendor_data

e101c0f

Update README & vendor_data

f6557b7

Update vendor_data

4a2a485

Update vendor_data

465b5f2

montesmariana reviewed May 30, 2023

ElmoGeeraerts added 5 commits June 3, 2023 13:05

Update vendor_data

fe13379

Update vendor_data

4ede2b1

General Update

f5869e5

General Update

ee3cd4a

General Update

8d3baa4

montesmariana reviewed Jun 6, 2023

tutorial.md

+```python
+vd.VendorData.Keys


Very good!

ElmoGeeraerts added 3 commits June 11, 2023 12:55

General Update

11ddd3a

Final update

fff69c0

Final update

a671048



