I have downloaded the ORD data from the repository using Git LFS, as instructed. I am trying to find the best way to convert the individual .pb.gz files to JSON or CSV and then extract the data I need. I was able to convert them to JSON. I also found that the open-source ORDerly framework can provide the extracted and cleaned data, but it does not work for me when I process the .pb.gz files one by one. Is there a way to resolve this, or is there another way to extract the specific data I need from the JSON files? I need to process all 546 datasets available in the ORD, since I require both USPTO and non-USPTO data, but hardware limitations prevent me from processing them all at once.
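Since the JSON conversion already works, one possible workaround that does not depend on ORDerly is to read the converted files one at a time and pull out only the fields you need, appending rows to a single CSV so that only one dataset is held in memory. This is a minimal sketch, assuming the JSON was produced with protobuf's `json_format` defaults (camelCase keys such as `datasetId` and `reactionId`, enum values written as strings such as `"SMILES"`). The function names and the chosen fields (reactant and product SMILES) are hypothetical placeholders for whatever data you actually need.

```python
import csv
import json
from pathlib import Path
from typing import Iterator, List, Optional


def smiles_of(compound: dict) -> Optional[str]:
    """Return the first SMILES identifier of a compound dict, or None."""
    for ident in compound.get("identifiers", []):
        if ident.get("type") == "SMILES":
            return ident.get("value")
    return None


def collect_smiles(compounds: List[dict]) -> List[str]:
    """Keep only compounds that actually carry a SMILES identifier."""
    found = (smiles_of(c) for c in compounds)
    return [s for s in found if s]


def iter_rows(dataset: dict) -> Iterator[dict]:
    """Yield one flat row per reaction (hypothetical field selection)."""
    dataset_id = dataset.get("datasetId", "")
    for reaction in dataset.get("reactions", []):
        # Assumption: "inputs" is a map of ReactionInput objects.
        reactants: List[str] = []
        for rxn_input in reaction.get("inputs", {}).values():
            reactants += collect_smiles(rxn_input.get("components", []))
        products: List[str] = []
        for outcome in reaction.get("outcomes", []):
            products += collect_smiles(outcome.get("products", []))
        yield {
            "dataset_id": dataset_id,
            "reaction_id": reaction.get("reactionId", ""),
            "reactants": ".".join(reactants),
            "products": ".".join(products),
        }


def extract_all(json_dir: Path, out_csv: Path) -> int:
    """Process each JSON file in turn, appending its rows to one CSV."""
    fields = ["dataset_id", "reaction_id", "reactants", "products"]
    count = 0
    with out_csv.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for path in sorted(json_dir.glob("*.json")):
            with path.open() as f:
                dataset = json.load(f)  # only this one dataset is in memory
            for row in iter_rows(dataset):
                writer.writerow(row)
                count += 1
            del dataset  # release it before loading the next file
    return count
```

A call such as `extract_all(Path("ord_json"), Path("reactions.csv"))` (both paths hypothetical) would then cover USPTO and non-USPTO datasets alike, since it simply walks every JSON file in the directory. If a single large file is still too big for memory, the same loop could be split into batches of files written to separate CSVs and concatenated afterwards.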