Commit 81212f3

Update scancode_toolkit_alok.rst
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
1 parent d4d05d1 commit 81212f3

1 file changed: +162 -162 lines changed

@@ -1,162 +1,162 @@
========================================================================
Have variable license sections in license rules
========================================================================

**Organization:** `AboutCode <https://aboutcode.org>`_

**Projects:** `Scancode Toolkit <https://github.com/aboutcode-org/scancode-toolkit>`_

**Mentee:** `Alok Kumar (alok1304) <https://github.com/alok1304>`_

**Mentors:**

- `Philippe Ombredanne <https://github.com/pombredanne>`_
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_

Overview
--------
This project enhances the `detection_log` so that it clearly indicates when `extra-words`
are detected. These `extra-words` correspond to variable parts of a license rule, which
previously caused the match score to fall below 100.

To address this, the implementation now verifies whether the `extra-words`
appear in the correct position within the license text. If they do, the score is
raised accordingly, resulting in more accurate license rule matching.

--------------------------------------------------------------------------------

Implementation
--------------

- **Enhanced the detection_log:**

  - Display `extra-words` when they are detected.

- **Added an extra-phrase marker, [[n]], for the extra-words:**

  - The `extra-phrase` is delimited by double opening square brackets ``[[``
    and double closing square brackets ``]]``.
  - The `extra-phrase` ``[[n]]`` is inserted in license rules at positions
    where `extra-words` may appear.
  - Here, `n` is the maximum number of `extra-words` permitted
    at that position.

- **Improved the score:**

  - Check whether `extra-words` appear in the correct position as defined by
    the `extra-phrase`, and ensure they do not exceed the maximum allowable limit.
  - If both conditions are satisfied, increase the match score to ``100``.

- **Reported the outcome in the detection_log:**

  - If the score is increased, meaning the `extra-words` are in the correct
    position, show ``extra-words-permitted-in-rule`` in the `detection_log`.
  - If the `extra-words` are in the wrong place or exceed the maximum allowable limit,
    show ``extra-words`` in the `detection_log`.

- **Testing:**

  - Added tests for the `extra-phrase` functionality, such as
    `test_extra_phrase_tokenizer` and `test_extra_phrase_spans`, to ensure that
    phrases are correctly identified and processed.
  - Implemented multiple tests to verify that `extra-words` appear in the correct
    position according to the rules and that the match score is updated correctly
    when they are within the allowable limit.
  - Covered edge cases where `extra-words` are misplaced or exceed
    the maximum allowable count, ensuring that scoring and logging behave as expected.
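
The steps above can be illustrated with a small, self-contained sketch. This is not ScanCode's code: the names ``parse_rule`` and ``check_extra_words``, the greedy word alignment, and the penalty formula are all assumptions made for illustration. Only the ``[[n]]`` syntax, the score of ``100``, and the two log entries come from the description above.

```python
import re

# Assumed marker syntax from the text above: [[n]], where n is the maximum
# number of extra words permitted at that position in the rule.
EXTRA_PHRASE = re.compile(r"\[\[(\d+)\]\]")
WORD = re.compile(r"[a-z0-9]+")


def parse_rule(rule_text):
    """Split a rule into word tokens and a map of permitted extra words.

    Returns (tokens, allowed), where allowed[i] is the maximum number of
    extra words permitted immediately before rule token i.
    """
    tokens, allowed = [], {}
    pos = 0
    for marker in EXTRA_PHRASE.finditer(rule_text):
        tokens.extend(WORD.findall(rule_text[pos:marker.start()].lower()))
        index = len(tokens)
        allowed[index] = allowed.get(index, 0) + int(marker.group(1))
        pos = marker.end()
    tokens.extend(WORD.findall(rule_text[pos:].lower()))
    return tokens, allowed


def check_extra_words(rule_text, matched_text):
    """Return (score, log_entry) for a text matched against a rule.

    Uses a naive greedy alignment and a toy score; both are illustrative
    assumptions, not ScanCode's matching or scoring.
    """
    rule_tokens, allowed = parse_rule(rule_text)
    text_tokens = WORD.findall(matched_text.lower())

    extras = {}  # rule position -> number of extra words found before it
    r = 0
    for token in text_tokens:
        if r < len(rule_tokens) and token == rule_tokens[r]:
            r += 1
        else:
            extras[r] = extras.get(r, 0) + 1

    if r < len(rule_tokens):
        return 0, None  # the rule itself was not fully matched
    if not extras:
        return 100, None  # exact match, nothing to report
    if all(count <= allowed.get(i, 0) for i, count in extras.items()):
        return 100, "extra-words-permitted-in-rule"
    # Toy penalty: share of the matched words that belong to the rule.
    score = round(100 * len(rule_tokens) / len(text_tokens))
    return score, "extra-words"


# Hypothetical rule text with one extra-phrase marker.
RULE = "Neither the name of [[3]] nor the names of its contributors may be used"
```

With this sketch, a text that places up to three words at the marked position (for example ``Acme Widgets Inc`` after ``name of``) gets a score of ``100`` and the entry ``extra-words-permitted-in-rule``. A fourth word at that position, or an extra word anywhere else in the rule, gives a lower score and the entry ``extra-words``.
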

Linked Pull Requests
--------------------

.. list-table::
   :widths: 10 60 30 10
   :header-rows: 1

   * - Sr. no
     - Name
     - Link
     - Status
   * - 1
     - Display `extra-words` in `detection_log` if present
     - `aboutcode-org/scancode-toolkit#4402
       <https://github.com/aboutcode-org/scancode-toolkit/pull/4402>`_
     - Merged
   * - 2
     - Improve score by supporting `extra_phrase` for `extra-words` in rules
     - `aboutcode-org/scancode-toolkit#4432
       <https://github.com/aboutcode-org/scancode-toolkit/pull/4432>`_
     - Open

Related Issues
--------------

.. list-table::
   :widths: 10 60 30
   :header-rows: 1

   * - Sr. no
     - Name
     - Link
   * - 1
     - `extra-words` does not show up in detection_log properly
     - `#4400
       <https://github.com/aboutcode-org/scancode-toolkit/issues/4400>`_
   * - 2
     - Improve score when `extra-words` are found in the correct position
     - `#4420
       <https://github.com/aboutcode-org/scancode-toolkit/issues/4420>`_

Pre GSoC Work
-------------

Before GSoC, I contributed the following PRs:

- `Renaming the dependency attribute is_resolved to is_pinned
  <https://github.com/aboutcode-org/scancode-workbench/pull/638>`_
- `Add test for all PyPI METADATA versions
  <https://github.com/aboutcode-org/scancode-toolkit/pull/4180>`_
- `Add test for false positive GPL3 license
  <https://github.com/aboutcode-org/scancode-toolkit/pull/4106>`_
- `Add new rules for EUPL license
  <https://github.com/aboutcode-org/scancode-toolkit/pull/4204>`_
- `Add DUMB License and detection rule
  <https://github.com/aboutcode-org/scancode-toolkit/pull/4143>`_
- `Fixing the dead link by cross-reference in the documentation
  <https://github.com/aboutcode-org/purldb/pull/550>`_

Post GSoC
---------

I plan to continue contributing by adding `extra-phrase` support across many
license rules. This will strengthen license detection by making it more accurate
and flexible in handling variations within the rules.

Links
-----

* `Project Idea
  <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#have-variable-license-sections-in-license-rules>`_

* `Official GSoC project page
  <https://summerofcode.withgoogle.com/programs/2025/projects/EvCogGhq>`_

* `GSoC Proposal
  <https://docs.google.com/document/d/1vNgiO8g1RiKVym4qK_jVFsiUH2z5ztaz8Q5lW6NkRK0/edit?tab=t.0>`_

* `Project Board <https://github.com/orgs/aboutcode-org/projects/28>`_

Acknowledgements
----------------

I would like to thank my mentors:

- `Philippe Ombredanne`_
- `Ayan Sinha Mahapatra`_

They supported me throughout this journey. Whenever
I faced a problem, we discussed it in depth during our weekly status calls. Without
their guidance and constant help, completing this project would not have been possible.

I also plan to explore more AboutCode projects and contribute whenever I have
time, because I would love to remain a part of this wonderful organization.
