Many generated sentences contain unbalanced punctuation/markdown #1

@Deimos

By default, markovify throws out any sentence containing quotes, parentheses, or square brackets, because they tend to end up unbalanced in the generated sentences. I overrode that behavior because it was removing a huge number of sentences from the training data, such as almost every title in /r/relationships and most comments from /r/scenesfromahat. But by doing that I've ended up with exactly the result it was trying to avoid: a lot of unmatched punctuation in the output.

Main things to try to fix with this:

  • Quotes - both double-quotes and single-quotes (single-quotes need to be distinguished from apostrophes)
  • Parentheses
  • Square brackets (especially as markdown link text)
  • Asterisks being used for bold and italic markdown
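
One possible starting point is a post-generation check that rejects any sentence with unmatched punctuation. The sketch below is not part of markovify or of this project. The function names (`brackets_balanced`, `quotes_balanced`, `emphasis_balanced`, `is_balanced`) are hypothetical, and the rules are rough heuristics: a single quote with a word character on both sides is assumed to be an apostrophe, and asterisk runs are assumed to be markdown emphasis markers that must come in pairs of equal length.

```python
import re

# Closing bracket -> the opening bracket it must match.
PAIRS = {")": "(", "]": "["}
OPENERS = set(PAIRS.values())

# Assumption: a single quote flanked by word characters is an apostrophe ("don't").
APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")
# Assumption: any run of asterisks is a markdown emphasis marker.
ASTERISK_RUN = re.compile(r"\*+")


def brackets_balanced(text):
    """Check that parentheses and square brackets are properly nested."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack


def quotes_balanced(text):
    """Check that double quotes and non-apostrophe single quotes come in pairs."""
    without_apostrophes = APOSTROPHE.sub("", text)
    return (without_apostrophes.count('"') % 2 == 0
            and without_apostrophes.count("'") % 2 == 0)


def emphasis_balanced(text):
    """Check that asterisk runs of each length (*, **, ***) occur an even number of times."""
    counts = {}
    for run in ASTERISK_RUN.findall(text):
        counts[len(run)] = counts.get(len(run), 0) + 1
    return all(n % 2 == 0 for n in counts.values())


def is_balanced(sentence):
    """Return True only if all three heuristic checks pass."""
    return (brackets_balanced(sentence)
            and quotes_balanced(sentence)
            and emphasis_balanced(sentence))
```

A check like this could be run on each generated sentence, retrying when it returns False. It has known gaps: a trailing possessive apostrophe (as in "dogs' toys") is counted as a quote, a literal asterisk (as in "2 * 3") is counted as an emphasis marker, and curly quotes are ignored entirely.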
