feat(import): Roam import#561

Closed
jefftangx wants to merge 17 commits intoathensresearch:masterfrom
jefftangx:merge-roam

Conversation

@jefftangx
Collaborator

Picking up from #288

  • Had to convert Roam dates to Athens dates (e.g. January 1st, 2021 to January 1, 2021) in both links and titles
  • Had to fix linked refs so they are found by querying :block/_refs rather than by matching with a regex

 Cases
 - Merge ROAM into ATHENS because ROAM schema is a superset
 - Merge ATHENS into ROAM?
 - Roam db has lots of additional attributes. Import could break Athens if the schema is not maintained.
 -> In this case, just keep the necessary :block/ attributes below
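
A minimal sketch of the "keep only the necessary attributes" idea, assuming entities arrive as plain maps. `athens-attrs` and `strip-roam-attrs` are hypothetical names, the whitelist is just the four `:block/` keys from the example block further down plus `:node/title`, and `:edit/time` stands in for any Roam-only attribute:

```clojure
;; Assumption: whitelist taken from the example block in this description,
;; plus :node/title for pages. Extend it if Athens needs more attributes.
(def athens-attrs
  [:node/title :block/uid :block/string :block/order :block/open])

(defn strip-roam-attrs
  "Keeps only the whitelisted attributes of a block or page map."
  [entity]
  (select-keys entity athens-attrs))

(strip-roam-attrs {:block/uid    "asd123"
                   :block/string "asd"
                   :block/order  1
                   :block/open   true
                   :edit/time    1611111111111}) ; illustrative Roam-only attribute
;; the :edit/time entry is dropped; the four :block/ keys survive
```

A whitelist seems safer than a blacklist here, since attributes Roam adds later would be dropped by default rather than leaking into the Athens db.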

 Approach 1
 - For all shared pages, merge top-level blocks
 - For all non-shared pages, simply transact them to dsdb

 Approach 2
 - All the blocks from the 2nd DB get imported under a new block with the date on it, similar to Quick Capture in Roam
 - Preserves context, rather than throwing all the top-level blocks together and forgetting where they come from.
 - Also easier code-wise:

 Don't need to worry about db/ids, only need
 {:block/string "asd" :block/order 1 :block/open true :block/uid "asd123"}
 Can parse [[links]] and ((refs)) directly from block/string, and then generate block/refs. This avoids DataScript entity id collisions.
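
A rough sketch of pulling `[[links]]` and `((refs))` out of a `:block/string`, using plain regexes rather than the Athens parser. `page-links` and `block-refs` are hypothetical names, and nested links such as `[[a [[b]]]]` are only partially handled (only the innermost link is found):

```clojure
(defn page-links
  "Returns the titles inside [[...]] in s, in order of appearance."
  [s]
  (mapv second (re-seq #"\[\[([^\[\]]+)\]\]" s)))

(defn block-refs
  "Returns the uids inside ((...)) in s, in order of appearance."
  [s]
  (mapv second (re-seq #"\(\(([^()]+)\)\)" s)))

(def example "see [[January 1, 2021]] and [[Athens]] plus ((asd123))")

(page-links example) ; titles to resolve to page entities
(block-refs example) ; uids to resolve to block entities
```

The resulting titles and uids could then be turned into `:block/refs` by lookup, so no entity ids from the Roam db need to be carried over.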

 Edge case: Roam uses natural language dates,
 Roam: Month 1st, 2nd, 3rd, 4th...
 Athens: Month 1, 2, 3, 4...
 Though they should have the same block-uid

 1: find pages with same node/title. Merge blocks.
 2: find blocks with same block/uid.
 If those blocks are date pages, merge. But have to change all the backlinks for this merge as well...
 Otherwise log error. (Unlikely there are two real blocks that collide in their uids)
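
The two steps above could be sketched roughly as follows. This assumes pages are pull-shaped maps with nested `:block/children`, which may not match how the PR actually holds the data; `shared-by` and `merge-page` are hypothetical helpers:

```clojure
(require '[clojure.set :as set])

(defn shared-by
  "Values of key k that appear in both collections of maps.
   Use :node/title for step 1 and :block/uid for step 2."
  [k athens roam]
  (set/intersection (set (map k athens)) (set (map k roam))))

(defn merge-page
  "Appends the Roam page's top-level blocks after the Athens page's blocks
   and renumbers :block/order from 0."
  [athens-page roam-page]
  (let [blocks (concat (sort-by :block/order (:block/children athens-page))
                       (sort-by :block/order (:block/children roam-page)))]
    (assoc athens-page
           :block/children
           (vec (map-indexed (fn [i b] (assoc b :block/order i)) blocks)))))
```

Renumbering `:block/order` matters because both pages number their top-level blocks from the start, so a naive concat would leave duplicate orders.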

 shared blocks
 these are blocks with the same :block/uid in both Athens and Roam
 it's likely that these are all date pages
 find all the block/refs for this date page, and convert all those to Athens date format.
 do this by stripping the 2 characters before the comma of a Roam date: "January 18th, 2021" -> "January 18, 2021"
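
A minimal sketch of that conversion, assuming a regex that removes an ordinal suffix sitting between 1-2 digits and a comma. `ordinal-suffix` and `roam-date->athens-date` are hypothetical names, and the pattern is an illustration rather than the one used in this PR:

```clojure
(require '[clojure.string :as str])

;; Lookbehind: 1 or 2 digits. Then an ordinal suffix. Lookahead: a comma.
;; The digit lookbehind keeps the "st" in "August" from matching.
(def ordinal-suffix #"(?<=\d{1,2})(?:st|nd|rd|th)(?=,)")

(defn roam-date->athens-date
  "Removes ordinal suffixes from Roam-style dates anywhere in s."
  [s]
  (str/replace s ordinal-suffix ""))

(roam-date->athens-date "January 18th, 2021")      ; => "January 18, 2021"
(roam-date->athens-date "[[August 2nd, 2020]] hi") ; => "[[August 2, 2020]] hi"
```

Because it works on any string, the same function could be applied to both `:node/title` and `:block/string`. One caveat: it would also rewrite non-date text such as "the 1st, then", so restricting it to known date links may be safer.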
@jefftangx jefftangx changed the title resolve merge conflicts Import from Roam Jan 21, 2021
;;(/ 3736 3842) 97% clean
;;(-> (- 1056 2)
;; (+ (- 3088 406))))
;;(defonce ROAM-DB (atom nil))

🤔 This line of comment might not be intentional?

Screen Shot 2021-01-29 at 5 54 46 PM

Collaborator Author

Yes not intentional


@avichalp avichalp Jan 31, 2021

Great, the roam import worked for me after uncommenting this line.

Collaborator Author

How's performance? What is your index.transit size now? @avichalp


@avichalp avichalp Feb 1, 2021

index.transit: 4.1M
roam export EDN file: 7.2M

There is high latency when loading search results and when loading pages with many linked blocks (~400+). Subjectively, latency is sometimes as high as 10-11 seconds.

I would like to know how performance is for other people. If we were to optimize performance, the first step would be to collect some data, I guess.

Collaborator Author

My guess is that this stems from inefficient posh queries. Roam and LogSeq teams both told me that Posh wasn't super performant, so they both ended up writing their own reactive lightweight wrappers around Datascript.

I also remember that @jeroenvandijk was working on performance optimizations a while ago, using Reagent cursors. I wonder if that is part of the solution. https://github.com/athensresearch/athens/pull/93/files#diff-3c7f15f69987f2ac41d3dfa65d60e8dfae0778e494b2207f64721d20baad680cR43-R46

Current performance issue: #570

;;(/ 3736 3842) 97% clean
;;(-> (- 1056 2)
;; (+ (- 3088 406))))
;;(defonce ROAM-DB (atom nil))

Suggested change
;;(defonce ROAM-DB (atom nil))
(defonce ROAM-DB (atom nil))



;; Positive Lookbehind: between 1 and 2 digits
;; One of an oridinal suffix, e.g. -st, -nd, -rd, -th, see https://en.wikipedia.org/wiki/Ordinal_indicator
Contributor

typo: "oridinal"

@jefftangx jefftangx changed the title Import from Roam feat(import): Roam import Mar 6, 2021
@jefftangx
Collaborator Author

Depends on #665

@jsmorabito jsmorabito mentioned this pull request Mar 17, 2021
12 tasks
Comment thread src/cljs/athens/db.cljs Outdated
Co-authored-by: baris <baristuncay@gmail.com>
4 participants