change csv to deal in strings now and convert appended values into strings by wfsyre · Pull Request #625 · gtri/scrimmage

wfsyre · 2026-01-02T14:52:33Z

Right now, the CSV writer class can only handle writing doubles to the csv table object. These get converted into strings when writing to the actual CSV. To simplify things, the CSV writer should just maintain a map of strings, which allows us to write any data type we want to the CSV as long as we convert it to a string first. It introduces some extra pre-processing we have to do to recover data from a CSV, but I think the schema of get<datatype>() is already well established in scrimmage so its not a large burden. This also allows people the added control of determining how their data should be turned into a string instead of leaving it to the CSV writer to determine. I have left the scientific vs fixed schema in there but technically all that should be done before the CSV writer even sees the value. The Visitor stuff is a bit of a hacky way to leave some backward compatibility for the old Append method that only took a double.

…rings

jblee123 · 2026-01-08T14:30:59Z

include/scrimmage/common/CSV.h

+    std::string operator()(size_t value) const { return std::to_string(value); }
+    std::string operator()(long value) const { return std::to_string(value); }
+    std::string operator()(const std::string& value) const { return value; }
+    std::string operator()(bool value) const { return value ? "True" : "False"; }


I'd opt for "true" and "false" (not capitalized). There's not a huge reason for this, but there's also not a huge reason for capitalizing, so my "break the tie" reasoning" is that if people are ever typing it, that's one less thing to mess up. If it's read case-insensitive (which I would hope it is), then no reason to throw in a capital and make people wonder. Python is the only thing I know of that has case-sensitive "True" and "False". No reason to bring that into a CSV.

jblee123 · 2026-01-08T14:43:29Z

include/scrimmage/common/CSV.h

 public:
    typedef std::list<std::string> Headers;
-    typedef std::list<std::pair<std::string, double>> Pairs;
+    typedef std::variant<bool, size_t, unsigned int, long, int, double, std::string>


Given possible ripple effects, I'm not sure if what I'm about to say is feasible. Actually, if all you had was double before, then all these options should be new, so it wouldn't affect existing code any differently, so I'll go with that assumption. Try to erase unsized types like unsigned int, long, int, etc. from usage. Prefer sized types. What's an int? It has a minimum size, but you don't really know because those are implementation-defined. Same for long. I'm guessing for int you're expecting a 32-bit value, so use int32_t. I'm guessing long is expected to be a 64-bit value, so use int64_t.

And if these are for data files, you can probably get away without a size_t. Depends on what you're expecting to represent. A size_t isn't "how many things", but is specifically a value used to represent memory sizes, so it's tied to the size of a pointer. That's why they're used as offsets in arrays -- not because they're counting items, but because they're being used as memory offsets. So while you use it in code, I'm not sure if you'd want that kind of semantic exposed as an option in a data file.

jblee123 · 2026-01-08T14:51:43Z

include/scrimmage/common/CSV.h


-    double at(int row, const std::string& header);
+    template <class T1>
+    T1 at(int row, const std::string& header) {


see -- now this is where I'd use a size_t, because it's an index. But this ties into existing underlying definitions, so I wouldn't change that now. (Actually, taking a closer look... the table uses an int as a key into a map... wut? Why is it not a vector? And can you have negative column/row indexes? Weird... still -- not something for this update.)

jblee123 · 2026-01-08T14:54:16Z

src/common/CSV.cpp

 bool CSV::append(const Pairs& pairs, bool write, bool keep_in_memory) {

-    for (std::pair<std::string, double> pair : pairs) {
+    for (auto pair : pairs) {


The existing code was making a copy of the pair every time, and the current code still does that. You want to iterate over the pairs by reference, so auto&, and if you're not changing anything, which you're not since pairs is const, then const auto&.

jblee123 · 2026-01-08T15:09:13Z

src/common/CSV.cpp

-        } else {
-            values[kv.first] = std::to_string(kv.second);
-        }
+        values[kv.first] = kv.second;


This will work, but check the fuller context. Originally, the logic was: 1) build a vector of converted strings, 2) build an aggregate string out of those. You already have the strings, though, so your values vector here is really just making a bunch of copies. You could get rid of the copies by making it a vector of string ptrs (or references, but it'd have to be of reference_wrappers because you can't make a vector of raw references, but ptrs would do fine in this case). But you can get rid of values altogether; it was never really needed in the first place since you can append to the final string as you iterate. No real need for an intermediate list. And continually appending to a string like this isn't the most efficient since it's having to re-allocate a bigger and bigger string every time. Make result a std::ostringstream of appending is more efficient to the stream, and then return result.str() so it's converted only once at the end.

jblee123 · 2026-01-08T15:21:23Z

src/common/CSV.cpp

            boost::split(tokens, line, boost::is_any_of(","));
            for (unsigned int i = 0; i < tokens.size(); i++) {
-                table_[row_num][i] = std::stod(tokens[i]);
+                table_[row_num][i] = tokens[i];


This is going to be the most annoying thing, and is going to tie into CSV::row_to_string()... what happens if somebody writes a string that contains a comma? Look at the specification section at https://en.wikipedia.org/wiki/Comma-separated_values, and it tells you how to handle that situation. Any time you decide to allow strings in data files, that opens up "fun" edge cases that you have to think about. What about newlines? Carriage returns? Maybe CSV::append() needs to have more failure cases.

Note that when you handle things like escaped double quotes, those need to be handled via linearly scanning through the string; you can't do global substring substitutions because that may not work the way you expect it to for some cases.

wfsyre · 2026-03-30T16:19:29Z

Closed by accident. Reopened as a new MR here #630

change csv to deal in strings now and convert appended values into st…

1854c6c

…rings

wfsyre force-pushed the feature-write_strings_with_csv_writer branch from 22ea269 to 1854c6c Compare January 2, 2026 16:48

jblee123 reviewed Jan 8, 2026

View reviewed changes

requested changes

55a5296

wfsyre closed this Mar 30, 2026

wfsyre deleted the feature-write_strings_with_csv_writer branch March 30, 2026 15:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

change csv to deal in strings now and convert appended values into strings#625

change csv to deal in strings now and convert appended values into strings#625
wfsyre wants to merge 2 commits intoUbuntu-22.04from
feature-write_strings_with_csv_writer

wfsyre commented Jan 2, 2026 •

edited

Loading

Uh oh!

jblee123 Jan 8, 2026

Uh oh!

jblee123 Jan 8, 2026

Uh oh!

jblee123 Jan 8, 2026

Uh oh!

jblee123 Jan 8, 2026

Uh oh!

jblee123 Jan 8, 2026

Uh oh!

jblee123 Jan 8, 2026

Uh oh!

wfsyre commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wfsyre commented Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jblee123 Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

jblee123 Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

jblee123 Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

jblee123 Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

jblee123 Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

jblee123 Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

wfsyre commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wfsyre commented Jan 2, 2026 •

edited

Loading