feat(import): Roam import #561
Cases
- Merge ROAM into ATHENS because ROAM schema is a superset
- Merge ATHENS into ROAM?
- Roam db has lots of additional attributes. Import could break Athens if the schema is not maintained.
-> In this case, just keep the necessary :block/ attributes below
Approach 1
- For all shared pages, merge top-level blocks
- For all non-shared pages, simply transact them to dsdb
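A minimal sketch of what "merge top-level blocks" could mean for a shared page, assuming the incoming Roam blocks are simply appended after the existing Athens blocks by shifting `:block/order`. The function name `merge-top-level` and the append-at-the-end policy are assumptions for illustration, not what this PR implements.

```clojure
;; Hypothetical helper: append incoming top-level blocks after the existing ones.
;; Blocks are plain maps here; no DataScript entity ids are involved.
(defn merge-top-level
  [existing incoming]
  (let [;; first free position after the existing blocks (0 if the page is empty)
        offset (if (seq existing)
                 (inc (apply max (map :block/order existing)))
                 0)]
    (into (vec existing)
          (map-indexed (fn [i block] (assoc block :block/order (+ offset i)))
                       (sort-by :block/order incoming)))))
```

The incoming blocks keep their relative order, and the existing blocks are left untouched.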
Approach 2
- All the blocks from the 2nd DB get imported under a new block with the date on it, similar to Quick Capture in Roam
- Preserves context, rather than throwing all the top-level blocks together and forgetting where they come from.
- Also easier code-wise:
Don't need to worry about db/ids; each block only needs
{:block/string "asd" :block/order 1 :block/open true :block/uid "asd123"}
Can parse [[links]] and ((refs)) directly from block/string, and then generate block/refs. This avoids datascript entity id collisions.
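Parsing links and refs from `:block/string` could look roughly like the sketch below. The function names and the `:import/*` keys are hypothetical, and the regexes deliberately ignore nested links such as `[[a [[b]]]]`.

```clojure
;; Hypothetical helpers: pull page titles and block uids out of a block string.
(defn page-links
  "Titles inside [[...]]. Nested links are not handled in this sketch."
  [s]
  (mapv second (re-seq #"\[\[([^\[\]]+)\]\]" s)))

(defn block-refs
  "Uids inside ((...))."
  [s]
  (mapv second (re-seq #"\(\(([^()]+)\)\)" s)))

(defn block->import-map
  "Keeps only the :block/ attributes listed above and records the parsed
  links and refs under made-up :import/ keys, so :block/refs can be
  generated later without reusing the source db's entity ids."
  [{:block/keys [string] :as block}]
  (assoc (select-keys block [:block/string :block/order :block/open :block/uid])
         :import/page-links (page-links string)
         :import/block-refs (block-refs string)))
```

Because only uids and titles are carried over, the target database is free to assign its own entity ids at transaction time.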
Edge case: Roam uses ordinal suffixes in date page titles; Athens does not.
Roam: Month 1st, 2nd, 3rd, 4th...
Athens: Month 1, 2, 3, 4...
The date pages should still have the same block/uid in both.
1: find pages with same node/title. Merge blocks.
2: find blocks with same block/uid.
If those blocks are date pages, merge. But have to change all the backlinks for this merge as well...
Otherwise log error. (Unlikely there are two real blocks that collide in their uids)
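The two lookup steps above can be sketched over plain block maps. This assumes date page uids look like `MM-DD-YYYY`; the predicate `date-uid?` and the function names are illustrative only.

```clojure
(require '[clojure.set :as set])

(defn shared-values
  "Values of key k that appear in both collections of maps."
  [k xs ys]
  (set/intersection (set (keep k xs)) (set (keep k ys))))

(defn date-uid?
  "Assumption: date page uids have the form MM-DD-YYYY."
  [uid]
  (boolean (re-matches #"\d{2}-\d{2}-\d{4}" uid)))

(defn classify-uid-collisions
  "Splits colliding uids into date pages to merge and unexpected collisions to log."
  [athens-blocks roam-blocks]
  (let [shared (shared-values :block/uid athens-blocks roam-blocks)]
    {:merge (set (filter date-uid? shared))
     :error (set (remove date-uid? shared))}))
```

The same `shared-values` helper works for step 1 by passing `:node/title` instead of `:block/uid`.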
shared blocks
these are blocks with the same :block/uid in both Athens and Roam
it's likely that these are all date pages
find all the block/refs for this date page, and convert all of them to the Athens date format.
do this by stripping the 2 characters before the comma of a Roam date: "January 18th, 2021" -> "January 18, 2021"
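A small sketch of that conversion on JVM Clojure, using a positive lookbehind for one or two digits followed by an ordinal suffix. The trailing lookahead for the comma and the name `roam-date->athens-date` are assumptions; the regex actually used in this PR is not shown here.

```clojure
(require '[clojure.string :as str])

(def ordinal-suffix
  ;; An ordinal suffix that follows 1-2 digits and precedes a comma.
  #"(?<=\d{1,2})(?:st|nd|rd|th)(?=,)")

(defn roam-date->athens-date
  "Strips the ordinal suffix from a Roam date title."
  [title]
  (str/replace title ordinal-suffix ""))

(roam-date->athens-date "January 18th, 2021")
;; => "January 18, 2021"
```

Anchoring on the preceding digits keeps month names such as "August" intact even though they contain "st".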
;;(/ 3736 3842) 97% clean
;;(-> (- 1056 2)
;;    (+ (- 3088 406))))
;;(defonce ROAM-DB (atom nil))
Yes, not intentional.
Great, the roam import worked for me after uncommenting this line.
How's performance? What is your index.transit size now? @avichalp
index.transit: 4.1M
Roam export EDN file: 7.2M
There is high latency when loading search results and when loading pages with a large number of linked blocks (~400+). In my experience, latency is sometimes as high as 10-11 seconds.
I would like to know how performance is for other people. If we were to optimize performance, the first step would be to collect some data, I guess.
My guess is that this stems from inefficient posh queries. Roam and LogSeq teams both told me that Posh wasn't super performant, so they both ended up writing their own reactive lightweight wrappers around Datascript.
I also remember that @jeroenvandijk was working on performance optimizations a while ago, using Reagent cursors. I wonder if that is part of the solution. https://github.com/athensresearch/athens/pull/93/files#diff-3c7f15f69987f2ac41d3dfa65d60e8dfae0778e494b2207f64721d20baad680cR43-R46
Current performance issue: #570
Suggested change:
- ;;(defonce ROAM-DB (atom nil))
+ (defonce ROAM-DB (atom nil))
;; Positive lookbehind: between 1 and 2 digits
;; One of the ordinal suffixes, e.g. -st, -nd, -rd, -th; see https://en.wikipedia.org/wiki/Ordinal_indicator
Depends on #665
Co-authored-by: baris <baristuncay@gmail.com>

Picking up from #288