This is related to issue #46. When I try to read the Tps4 neg data with pandas.read_csv, I get this parser error:
>>> df = pd.read_csv('LCdata/neg/Tps4_neg_withAddedDetails.2015.02.05.csv')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 225, in _read
    return parser.read()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 626, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1070, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 727, in pandas.parser.TextReader.read (pandas/parser.c:7110)
  File "parser.pyx", line 749, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7334)
  File "parser.pyx", line 802, in pandas.parser.TextReader._read_rows (pandas/parser.c:7943)
  File "parser.pyx", line 789, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:7817)
  File "parser.pyx", line 1697, in pandas.parser.raise_parser_error (pandas/parser.c:19569)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 50 fields in line 2510, saw 51
Line 2510 of the CSV file looks like this:
2509,770.8654788,770.8654788,770.8654788,85.3971,85.3971,85.3971,1,1,7192.340903,2403.965241,22070.52672,187320.4289,142664.2457,214115.0429,10416.10935,21067.41343,69994.9968,1105750.95,72005.57328,30897.23792,21990.54678,34453.52575,0,64901.8607,6047.34812,0,0,0,19950.28018,171732.1797,0,204246.1387,11901.23666,1126.787846,0,176015.1765,6520.571956,0,0,2235.663193,3300.804818,0,15898.08514,243419.2922,,[M-2H+Na]- 749.885,24,0,organo-iodine compound, PubChem CID 11535056
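It looks like the annotated column is supposed to contain the single value organo-iodine compound, PubChem CID 11535056. That value contains a comma and is not quoted, so the parser splits it into two fields and sees 51 fields instead of 50. The correct CSV formatting for this line is: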
2509,770.8654788,770.8654788,770.8654788,85.3971,85.3971,85.3971,1,1,7192.340903,2403.965241,22070.52672,187320.4289,142664.2457,214115.0429,10416.10935,21067.41343,69994.9968,1105750.95,72005.57328,30897.23792,21990.54678,34453.52575,0,64901.8607,6047.34812,0,0,0,19950.28018,171732.1797,0,204246.1387,11901.23666,1126.787846,0,176015.1765,6520.571956,0,0,2235.663193,3300.804818,0,15898.08514,243419.2922,,[M-2H+Na]- 749.885,24,0,"organo-iodine compound, PubChem CID 11535056"
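If the file is produced by a script, one way to get this quoting automatically is to write rows through a CSV library instead of joining fields with commas by hand. The sketch below uses Python's standard csv module; the helper name to_csv_line and the shortened three-field row are made up for illustration and are not taken from the code that generated this file.

```python
import csv
import io


def to_csv_line(fields):
    """Format one row as a CSV line (hypothetical helper for illustration)."""
    buf = io.StringIO()
    # The default quoting mode (csv.QUOTE_MINIMAL) wraps a field in double
    # quotes only when it contains the delimiter, a quote, or a newline.
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# Shortened stand-in for the last three fields of the problem row.
line = to_csv_line([24, 0, "organo-iodine compound, PubChem CID 11535056"])
print(line, end="")
```

Only the annotation is quoted here, because it is the only field that contains a comma.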
So I think this is a "data bug" and there may be more instances of this problem, which I can discover using pd.read_csv.
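Since pd.read_csv stops at the first bad line, a quicker way to list every instance at once might be to count fields per row with the standard csv module. This is only a sketch: the function name find_bad_rows and the tiny in-memory sample are hypothetical, and it assumes the header row has the correct number of fields.

```python
import csv
import io


def find_bad_rows(lines):
    """Return (line_number, field_count) for rows that disagree with the header."""
    reader = csv.reader(lines)
    expected = None
    bad = []
    for row in reader:
        if expected is None:
            # Assumption: the header row defines the correct field count.
            expected = len(row)
            continue
        if len(row) != expected:
            # reader.line_num is the 1-based line number in the source.
            bad.append((reader.line_num, len(row)))
    return bad


# Small made-up sample: the third line has an unquoted comma in the annotation.
sample = io.StringIO(
    "id,mz,annotated\n"
    "1,100.5,plain value\n"
    "2,770.8,organo-iodine compound, PubChem CID 11535056\n"
    '3,770.8,"organo-iodine compound, PubChem CID 11535056"\n'
)
print(find_bad_rows(sample))
```

For the real data, the same function could be called on an open file handle for LCdata/neg/Tps4_neg_withAddedDetails.2015.02.05.csv instead of the in-memory sample.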