[Enhancement] Create a “Plain Language Score” #1

@bashandbone

Description

Category

new license or license feature/enhancement

Feature

Develop a plain language score to reliably measure writing “plainness”

Use Case

We could use it to educate readers and highlight differences between licenses, to reward plainness “in the wild,” and to hold ourselves accountable with CI/CD scanning.

Benefit

Make the world a little bit easier to understand.

Alternatives

There are many measures of readability, such as Flesch-Kincaid, Gunning fog, and SMOG. I don’t feel these wholly capture the important parts of clear and accessible writing.

Impact

Drive us to write more plainly, highlight efforts (or non-efforts) at plainness, and maybe even create a standard measure of plain writing.

Resources

There’s a decent body of research on readability.

Contact Details

adam@plainlicense.org

Additional Information

Enhancement Proposal: Plain Language Score

We need a way to measure how easy licenses are to read. Let's call it the "Plain Language Score". Here's what I'm thinking:

What's the idea?

We want to create a fair way to measure the readability of all licenses, including ours. This score would help us show how much clearer our licenses are compared to the originals. It could also show the benefits of plain writing and help us reward and recognize when people do it well.

What might the score look like?

Here are some ideas for what the score could measure:

  1. Use of legal jargon
  2. Sentence complexity
  3. Organization (Use of headers, lists/bullets, and paragraph density)
  4. Ratio of abstract to concrete words
  5. A traditional readability score (like Flesch-Kincaid)

These are just ideas! We can change any part of this: what we measure, how much each part counts, or how we measure it. The main goal is to have a clear and fair way to score license readability. I also think there is an opportunity to modernize readability measures with NLP techniques.
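To make the weighting concrete, here is one possible shape for the scorer, sketched in Python. The component names, the weights, and the renormalization rule for N/A components are all assumptions for illustration, not a fixed design:

```python
# Illustrative weights for the proposed components (an assumption, not a spec).
WEIGHTS = {
    "legal_jargon": 0.30,
    "sentence_complexity": 0.25,
    "organization": 0.25,
    "concrete_language": 0.10,
    "flesch_kincaid": 0.10,
}

def combine(component_scores: dict) -> float:
    """Weighted average on the 1-20 scale (lower is better).

    Components scored None (e.g. organization for a one-sentence snippet)
    are dropped and the remaining weights are renormalized, so the result
    stays on the same 1-20 scale.
    """
    applicable = {k: v for k, v in component_scores.items() if v is not None}
    total_weight = sum(WEIGHTS[k] for k in applicable)
    return sum(WEIGHTS[k] * v for k, v in applicable.items()) / total_weight
```

One design choice worth debating: renormalizing keeps partial scores comparable across texts, whereas simply dropping an N/A component's weight deflates the total for short snippets.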

How would we build it?

We'd make this a separate project from Plain License, but affiliated with it. It would have:

  • Its own repository
  • Version numbers for both the code and the scoring system
  • Clear documentation on how it works

How would we manage changes?

Once we're out of the testing phase, we should think about how we approve changes to the score. We want it to be reliable, so we can't just change it whenever we want. Maybe we could have a group of experts review big changes, or a proposal review process?

What's next?

If you like this idea, we could:

  1. Start a new repository for this project
  2. Begin working on a basic version of the score
  3. Test it on different licenses to see how it works
  4. Ask for feedback from our community and legal experts

Example: How it might work

Let's look at a quick example of how this score might work. We'll use a 1-20 scale, where lower is better (like grade levels).

Here's a snippet from a made-up license:

Original: "The Licensee shall not utilize the Software in any manner that may impair the integrity or performance of the system."

Our version: "You may not use the work in a way that could damage or slow down the service."

Let's score them:

  1. Legal Jargon (30%):

    • Original: Lots of jargon (15/20)
    • Ours: No jargon (2/20)
  2. Sentence Variety (25%):

    • Original: One complex sentence (14/20)
    • Ours: One simple sentence (5/20)
  3. Document Organization (25%):

    • Both: N/A for this short example
  4. Concrete Language (10%):

    • Original: Mostly abstract (16/20)
    • Ours: Mostly concrete (4/20)
  5. Flesch-Kincaid (10%):

    • Original: Grade 16
    • Ours: Grade 6
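For the Flesch-Kincaid component, here is a rough sketch of the grade-level formula applied to the two snippets. The syllable counter is a naive vowel-group heuristic, so it will not match a published calculator exactly; the grade numbers in the example above are illustrative:

```python
import re

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level with a crude vowel-group syllable estimate."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    # Count runs of vowels as syllables; every word gets at least one.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

original = ("The Licensee shall not utilize the Software in any manner "
            "that may impair the integrity or performance of the system.")
plain = "You may not use the work in a way that could damage or slow down the service."
```

Even with the crude syllable heuristic, the plain rewrite scores several grade levels lower than the original.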

Putting it all together (organization is N/A here, so its weight is simply dropped):

Original: (15 * 0.3) + (14 * 0.25) + (16 * 0.1) + (16 * 0.1) = 11.2
Our version: (2 * 0.3) + (5 * 0.25) + (4 * 0.1) + (6 * 0.1) = 2.85

Final scores:

  • Original: 11 (rounded, about 11th grade)
  • Our version: 3 (rounded up, about 3rd grade)
    [my scoring system here clearly needs work]
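As a sketch, the arithmetic above can be reproduced directly; note that the original's weighted components actually sum to 11.2:

```python
# Reproducing the worked example's weighted sums. The N/A organization
# component's weight is simply dropped here, matching the example above.
weights = {"jargon": 0.30, "sentence": 0.25, "concrete": 0.10, "fk": 0.10}
original = {"jargon": 15, "sentence": 14, "concrete": 16, "fk": 16}
plain = {"jargon": 2, "sentence": 5, "concrete": 4, "fk": 6}

def weighted_sum(scores: dict) -> float:
    return sum(weights[k] * scores[k] for k in scores)

print(round(weighted_sum(original), 2))  # 11.2
print(round(weighted_sum(plain), 2))     # 2.85
```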

Again, this is just an example - we can adjust how we calculate this as we develop the system. Some other things we could potentially include in our measure:

  • Ratio of active to passive voice
  • Frequency of conditional statements (e.g. if-then). I found these were the most difficult to translate to plain language.
  • Visual accessibility (as a component of organization). This could be a lot of things, to name a few: frequency of headers, median text-block length, frequency of bullets.
  • Consistent word choice: how well the text avoids using multiple synonyms for the same concept.
  • Clarity of definitions. If the license includes a definitions section, there should be a higher penalty when the definitions suffer from the same inaccessible writing.
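As one example of how these extra metrics could be prototyped, here is a rough conditional-statement density heuristic. The marker list is illustrative, not exhaustive, and a real implementation would probably want proper parsing rather than regex matching:

```python
import re

# A few common conditional markers in license text (illustrative list).
CONDITIONAL_MARKERS = r"\b(if|unless|provided that|except when|in the event that)\b"

def conditional_density(text: str) -> float:
    """Conditional markers per 100 words (0.0 for empty text)."""
    words = len(re.findall(r"[A-Za-z']+", text))
    hits = len(re.findall(CONDITIONAL_MARKERS, text, flags=re.IGNORECASE))
    return 100.0 * hits / max(1, words)
```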

Let's discuss and refine this idea together!

We could also use it in our CI/CD process to enforce a target score, or identify parts of the site that we need to improve for clarity.
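The CI/CD gate could eventually look something like this. Here `score_fn` stands in for whatever scorer we end up building, and the target value is arbitrary:

```python
# Hypothetical CI gate: fail the build when a file's Plain Language Score
# (lower is better) exceeds a target. The threshold is a placeholder.
TARGET = 8.0

def check(path: str, score_fn) -> int:
    """Return 0 (pass) or 1 (fail) for use as a process exit code."""
    with open(path, encoding="utf-8") as f:
        score = score_fn(f.read())
    if score > TARGET:
        print(f"{path}: score {score:.1f} exceeds target {TARGET}")
        return 1
    print(f"{path}: score {score:.1f} OK")
    return 0
```

A CI job would run this over the site's license texts and fail the pipeline on any nonzero return.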

Metadata

Assignees

No one assigned

    Labels

    documentation (Improvements or additions to documentation), enhancement (New feature or request), help wanted (Extra attention is needed), question (Further information is requested)
