"comparison of Float with NaN failed"...and GSL is Installed #8

@tra38

While trying to fix an unrelated issue, I experimented with the code from #5, but using ZombieWriter::MachineLearning rather than ZombieWriter::Randomization.

zombie = ZombieWriter::MachineLearning.new

zombie.add_string(content: "This is filler text that I invented.This is also a paragraph that could be used")
zombie.add_string(content: "This post is amazing. Please take a look")
zombie.add_string(content: "For all sports fan, you must watch this video. Hey you have to check this out.")

array = zombie.generate_articles

p array

#/Users/tariqali/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/kmeans-clusterer-0.11.4/lib/kmeans-clusterer.rb:237:in `sort_by': comparison of Float with NaN failed (ArgumentError)

The culprit is the third string. Classifier-Reborn computed its lsi_norm as a vector of NaNs:

 "For all sports fan, you must watch this video. Hey you have to check this out.\n"=>
  #<ClassifierReborn::ContentNode:0x007fdec4b25ae8
   @categories=[],
   @lsi_norm=GSL::Vector
[   nan   nan   nan   nan   nan   nan   nan ... ],
   @lsi_vector=GSL::Vector
[ 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @raw_norm=GSL::Vector
[ 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @raw_vector=GSL::Vector
[ 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @word_hash={:for=>1, :sport=>1, :fan=>1, :must=>1, :watch=>1, :video=>1, :hei=>1, :check=>1, :out=>1}>}

Changing the third string slightly resolves the issue.

zombie = ZombieWriter::MachineLearning.new

zombie.add_string(content: "This is filler text that I invented.This is also a paragraph that could be used")
zombie.add_string(content: "This post is amazing. Please take a look")
zombie.add_string(content: "For all sports fan, you must watch this video. Hey you have to check this out. Filler, filler, filler.")

array = zombie.generate_articles

p array

 "For all sports fan, you must watch this video. Hey you have to check this out. Filler, filler, filler.\n"=>
  #<ClassifierReborn::ContentNode:0x007fd931432fd0
   @categories=[],
   @lsi_norm=GSL::Vector
[ 6.205e-01 1.432e-01 1.432e-01 1.432e-01 1.432e-01 1.432e-01 0.000e+00 ... ],
   @lsi_vector=GSL::Vector
[ 6.593e-01 1.522e-01 1.522e-01 1.522e-01 1.522e-01 1.522e-01 0.000e+00 ... ],
   @raw_norm=GSL::Vector
[ 5.547e-01 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @raw_vector=GSL::Vector
[ 6.272e-01 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 ... ],
   @word_hash={:for=>1, :sport=>1, :fan=>1, :must=>1, :watch=>1, :video=>1, :hei=>1, :check=>1, :out=>1, :filler=>3}>}

But why? Both scenarios have a populated @word_hash, so it isn't clear why one string produced a vector of NaNs and the other didn't. Is it because, in the second scenario, the third string shares words with the first string? I will have to research this issue more carefully and decide how to handle this potential error gracefully.
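
One plausible explanation, consistent with the dump above: in the failing case @lsi_vector is all zeros, and normalizing a zero vector means dividing every component by a magnitude of zero, which yields NaN in floating-point arithmetic. The sketch below reproduces that with plain Ruby arrays. The `normalize` method here is a hypothetical stand-in written for illustration, not the GSL or classifier-reborn implementation, and the sorting step only imitates the kind of `sort_by` call that fails inside kmeans-clusterer.

```ruby
# Hypothetical stand-in for vector normalization (NOT the GSL implementation):
# divide each component by the vector's Euclidean magnitude.
def normalize(vector)
  magnitude = Math.sqrt(vector.sum { |x| x * x })
  vector.map { |x| x / magnitude }
end

# An all-zero vector has magnitude 0.0, so every component becomes 0.0 / 0.0.
normalized = normalize([0.0, 0.0, 0.0])
p normalized.all?(&:nan?)

# NaN cannot be ordered against other floats, so sorting by it raises.
begin
  [0.5, normalized.first].sort_by { |x| x }
rescue ArgumentError => e
  puts "#{e.class} raised while sorting"
end
```

If this is what happens inside the library, the NaNs are a symptom of the zero vector rather than a separate bug, and the real question becomes why the third string's raw vector is all zeros.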

This problem is unlikely to happen in the real world: if you add long passages to ZombieWriter, there are bound to be a few overlapping words that classifier-reborn can detect. But it could happen, which is why I need to figure out how to fix it.
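
One possible way to handle this gracefully is to set aside any document whose vector contains non-finite values before clustering. This is only a sketch under assumptions: `finite_vector?` and `partition_clusterable` are hypothetical helpers, the vectors are plain Ruby arrays rather than GSL::Vector objects, and nothing here reflects how ZombieWriter actually passes data to kmeans-clusterer.

```ruby
# Hypothetical helper: true when every component is a finite number
# (Float#finite? is false for NaN and for +/-Infinity).
def finite_vector?(vector)
  vector.all? { |x| x.finite? }
end

# Hypothetical helper: split a { text => vector } hash into the documents
# that are safe to cluster and the ones that should be skipped.
def partition_clusterable(vectors_by_text)
  vectors_by_text
    .partition { |_text, vector| finite_vector?(vector) }
    .map(&:to_h)
end

vectors = {
  "a" => [0.6, 0.1],
  "b" => [Float::NAN, Float::NAN]
}

usable, skipped = partition_clusterable(vectors)
p usable.keys   # => ["a"]
p skipped.keys  # => ["b"]
```

The skipped documents could then be reported to the caller or placed in a cluster of their own instead of crashing the whole run.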
