From 5f72dfe72cddc7e54a28d0dc942cadd7b1a4a7a6 Mon Sep 17 00:00:00 2001 From: Nicole Tache Date: Thu, 4 Sep 2025 08:35:12 -0500 Subject: [PATCH 001/131] Create set-and-enforce-wip-limits Draft of new Practice -- Set and Enforce WIP Limits --- practices/set-and-enforce-wip-limits | 54 ++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) create mode 100644 practices/set-and-enforce-wip-limits diff --git a/practices/set-and-enforce-wip-limits b/practices/set-and-enforce-wip-limits new file mode 100644 index 0000000..e69ada0 --- /dev/null +++ b/practices/set-and-enforce-wip-limits @@ -0,0 +1,54 @@ +# Set and Enforce Work-in-Process Limits + +Teams often have too much work in progress at once. This leads to long-lived branches, delayed code reviews, bottlenecks in QA, and constant context switching. Setting and enforcing work-in-process (WIP) limits helps teams stay focused, finish work already in motion, and reduce the overhead caused by juggling too many tasks at once. + +## When to Experiment + +“I am a developer and I need to learn how to prioritize tasks so I can move work across the finish line more quickly and avoid context switching.” + +"I am a team leader and I need to ensure our members stay focused on work that matters most so that we can avoid team burnout." + +## How to Gain Traction + + ### Set Limits that Feel Ambitious + +When teams start by setting limits that feel ambitious, it forces them to make deliberate choices about what work matters most. The exact number depends on your team's context, but the goal is to find the sweet spot where teams feel focused but not hamstrung. [more is needed here to make this point actionable, perhaps an example] + + ### Finish Work Before Starting New Work + +When team members are blocked or waiting, instead of starting new tickets, they can contribute in other ways. This might include refining upcoming tickets, pairing on active work with other developers, performing code reviews, or helping QA test in-progress items. These activities keep the team moving without adding more work to the queue. + + ### Visualize All Work + +Use a storyboard or dashboard tool, such as [xyz], to visualize all ongoing tasks, including hidden or auxiliary tasks like meetings or production support. When the board shows that a limit has been reached, treat it as a hard stop -- no new work enters the system until something completes. This creates the pressure needed to finish what's started and forces the prioritization conversations that lead to better decisions. + +## Lessons From The Field + +[Pragmint to complete] + +This section captures real-world patterns (things that consistently help or hinder this practice) along with short, relevant stories from the field. It’s not for personal rants or generic opinions. Each entry must be based on either: +1. a repeated observation across teams, or +2. a specific example (what worked, what didn’t, and why). + +## Deciding to Polish or Pitch + +After experimenting with this practice for [insert appropriate quantity of time in bold], bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: + +### Fast & Measurable + +Fewer tickets stuck in review or QA (as tracked by ...) + +### Slow & Measurable + +Shorter lead times from development to release (as tracked by ...) + +### Slow & Intangible + +Less context switching and fewer rework cycles (via feedback captured by ...) + +Higher throughput and better team focus (via feedback captured by ...) 
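+
+If the team wants a lightweight way to fill in the "as tracked by ..." placeholders above, open pull requests are often a workable proxy for work in process. The sketch below is one illustrative option rather than a prescription: it assumes a GitHub-hosted repository, and the repository name, per-person limit, and staleness threshold are placeholder values to replace with your own.
+
+```python
+# Minimal sketch: flag work-in-process breaches using open pull requests as a proxy.
+# Assumptions: a GitHub-hosted repo; REPO, WIP_LIMIT, and STALE_DAYS are illustrative
+# placeholders you would replace with your own values.
+from collections import Counter
+from datetime import datetime, timezone
+
+import requests
+
+REPO = "your-org/your-repo"   # placeholder
+WIP_LIMIT = 2                 # per-person limit the team agreed to
+STALE_DAYS = 3                # how long a PR can wait before it counts as "stuck"
+
+resp = requests.get(
+    f"https://api.github.com/repos/{REPO}/pulls",
+    params={"state": "open", "per_page": 100},
+    timeout=30,
+)
+resp.raise_for_status()
+pulls = [p for p in resp.json() if not p.get("draft")]
+
+# How many open PRs does each person have in flight?
+in_flight = Counter(p["user"]["login"] for p in pulls)
+for author, count in in_flight.items():
+    if count > WIP_LIMIT:
+        print(f"{author} has {count} open PRs (limit is {WIP_LIMIT})")
+
+# Which PRs have been waiting longer than the team's threshold?
+now = datetime.now(timezone.utc)
+for p in pulls:
+    opened = datetime.fromisoformat(p["created_at"].replace("Z", "+00:00"))
+    age_days = (now - opened).days
+    if age_days >= STALE_DAYS:
+        print(f"#{p['number']} '{p['title']}' has been open {age_days} days")
+```
+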
+ +## Supported Capability + + ### [Work-in-Process Limits](https://github.com/pragmint/open-practices/blob/main/capabilities/work-in-process-limits.md) +WIP limits help teams deliver more value by finishing what matters most. The focus shifts from starting new work to moving existing work across the finish line with greater speed and quality. From 83e1d6e2c25ebd4316db2a83c5e259b950da9653 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Thu, 4 Sep 2025 09:56:52 -0500 Subject: [PATCH 002/131] updated to follow new Practice template --- ...follow-functional-core-imperative-shell.md | 126 +++++++----------- 1 file changed, 49 insertions(+), 77 deletions(-) diff --git a/practices/follow-functional-core-imperative-shell.md b/practices/follow-functional-core-imperative-shell.md index e732e4a..ae1e5b5 100644 --- a/practices/follow-functional-core-imperative-shell.md +++ b/practices/follow-functional-core-imperative-shell.md @@ -1,119 +1,91 @@ # Follow Functional Core, Imperative Shell -Functional Core, Imperative Shell is a software design pattern that advocates for dividing code into two layers: a functional core and an imperative shell. -The functional core houses the critical logic of the application, written in a pure, side-effect-free manner following functional programming principles. -This core is highly testable, maintainable, and predictable. -The imperative shell handles interactions with the external world, using imperative programming for tasks like I/O operations and user interfaces. -This separation of concerns enhances modularity, clarity, and flexibility in software development, enabling easier maintenance and evolution of complex systems. +When a codebase has tight coupling between state and behavior, changes become difficult, testing impractical, and new developer onboarding difficult. The Functional Core, Imperative Shell pattern introduces a clear separation: pure, side-effect-free logic is isolated in a “functional core,” while I/O and system interactions are handled in an “imperative shell.” This structure improves modularity, simplifies testing, and makes it safer and easier to evolve complex parts of the system over time. -In dynamic languages, the Functional Core, Imperative Shell pattern is often favored because it can address testing challenges inherent in these languages. -By communicating between the functional core and imperative shell via passing values, instead of relying on interfaces, developers can mitigate the risk of false positives when using mocks and avoid encountering errors with the real implementation. +## When to Experiment -## Nuances +"I am a [persona] and I need to ... so I can gradually modernize the most brittle parts of our platform." -This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this practice. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the practice with your teams. +"I am a [persona] and I need to ... so I can reduce cognitive load." -### Overemphasis on Functional Purity +"I am a [persona] and I need to ... so I can build a foundation for more robust and maintainable systems." -Ensuring the functional core remains pure and without side effects is important. -However, excessively fixating on absolute purity may inadvertently introduce impractical code. -For instance, rigidly avoiding variable mutation can sometimes lead to the use of recursion. 
-While recursion aligns with functional purity, its efficiency may diminish, particularly in languages where stack management for large inputs poses challenges. -Functional programming constructs, while elegant, may not always be the most efficient choice, especially in performance-critical scenarios. +## How to Gain Traction -### Remembering Values are the Boundary +### Host a Roundtable Discussion -Values serve as the boundary between the layers. -The imperative shell should communicate with the functional core by passing value objects exclusively, avoiding objects or functions that could potentially induce side effects. -This ensures that the functional core remains isolated from external state changes, promoting clarity, predictability, and maintainability in the codebase while facilitating easier testing, debugging, and refactoring. +#### Assess the Benefits -### Potentially Steep Learning Curve - -Transitioning to the Functional Core, Imperative Shell pattern may present a steep learning curve for teams. -To facilitate this transition smoothly, it's recommended that developers with more knowledge mentor other developers through pair programming sessions. -Additionally, fostering an environment of knowledge sharing, providing resources, and allocating time for developers to study the pattern can greatly aid in its adoption and understanding across the team. - -### Can Be Compatible with Object-Oriented Programming - -The purpose of an object is to bundle behavior and state into a single, cohesive unit. In contrast, the Functional Core, Imperative Shell pattern aims to _separate_ pure logic from state management. When using the Functional Core, Imperative Shell pattern in an object-oriented application, the idea is to extract functional-style components that are then used by objects managing state and interacting with external systems. - -This approach works best when there are clear boundaries between pure computations and operations that cause side effects. However, avoid applying this pattern to objects where state and behavior are tightly intertwined, as it can add unnecessary complexity. The goal is to separate concerns while preserving the natural behavior of objects and maintaining the integrity of stateful components. - -So, developers can implement the functional core while adhering to OOP principles, taking advantage of both paradigms. - -### Some Behavior and Logic Needs to Live on the Imperative Shell - -In simple cases, the imperative shell merely passes inputs to the functional core, receives the response, and renders it back to the user. -However, there are scenarios where the functional core may produce outputs that require inspection or processing by the imperative shell. -While the imperative shell should ideally remain simple and devoid of complex logic, it may need to analyze, interpret, or transform the response from the functional core to ensure proper interaction and presentation to the user. +* What advantages could the adoption of the Functional Core, Imperative Shell pattern bring to our projects? +* How might separating business logic from side effects enhance code readability, maintainability, and scalability? +* In what ways could the Functional Core, Imperative Shell pattern mitigate the impact of changes in infrastructure technology, allowing for smoother transitions and future-proofing our codebase? +* What benefits might arise from writing unit tests that ensure the functional core code being tested has no side effects? 
+* How could the reduced presence of control statements in the imperative shell simplify integration tests? -### Unit Testing of Functional Core, Integration Testing of Imperative Shell +#### Evaluate Team Readiness -Unit tests should concentrate on validating the business logic enclosed within the functional core, testing its expected behavior in isolation. -This approach is especially advantageous due to the functional core's composition of pure functions, facilitating straightforward unit testing devoid of external dependencies or side effects. -Integration tests should cover the behavior of the imperative shell as it interacts with external systems, including database calls, API requests, or user interfaces. -Imperative shell integration tests ideally require fewer scenarios to validate, given that control statements such as `if`, `while`, or `for` loops should mostly reside within the functional core layer. +* How prepared are our development teams to embrace a paradigm shift toward functional programming principles? +* Do team members possess the necessary skills and knowledge to implement and maintain code following this practice? +* What resources, training, or support mechanisms can we provide to facilitate the transition and ensure successful adoption? -## Gaining Traction +#### Begin a Gradual Transition to Functional Core, Imperative Shell -The following actions will help your team implement this practice. +* How can we identify and prioritize modules or components within our existing codebase that are suitable candidates for transitioning to the Functional Core, Imperative Shell pattern? +* What strategies can we employ to refactor existing imperative code into pure functions within the functional core, while maintaining backward compatibility and minimizing disruptions to ongoing development? +* Are there opportunities to introduce the Functional Core, Imperative Shell pattern gradually, perhaps starting with new features or modules before expanding its adoption to legacy code? +* How can we ensure effective communication and collaboration among team members during the transition process, including knowledge sharing, pair programming, and code reviews? +* What metrics or milestones can we establish to measure progress and evaluate the success of incrementally transitioning to the Functional Core, Imperative Shell pattern? -### [Host a Viewing Party](/practices/host-a-viewing-party.md) +### Build a Regression Testing Suite +[fit this detail in this section? It was listed as a "prerequisite experiment" in the GS report. The below was part of Nuances, but it seems more like essential "setup" of this experiment. Should it go here?] +**Unit tests** should concentrate on validating the business logic enclosed within the functional core, testing its expected behavior in isolation. This approach is especially advantageous due to the functional core's composition of pure functions, facilitating straightforward unit testing devoid of external dependencies or side effects. **Integration tests** should cover the behavior of the imperative shell as it interacts with external systems, including database calls, API requests, or user interfaces. Imperative shell integration tests ideally require fewer scenarios to validate, given that control statements such as `if`, `while`, or `for` loops should mostly reside within the functional core layer. 
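+
+To make that split concrete, here is a minimal sketch in Python (the idea carries over to other languages). The domain rule, names, and numbers are invented for illustration; the point is that the core is covered by plain assertions with no mocks, while the thin shell is the part left for a small number of integration tests.
+
+```python
+# Functional core: pure, deterministic, no I/O. (Names and rules are illustrative.)
+def apply_discount(subtotal: float, loyalty_years: int) -> float:
+    """Return the payable amount; 5% off per loyalty year, capped at 20%."""
+    discount = min(loyalty_years * 0.05, 0.20)
+    return round(subtotal * (1 - discount), 2)
+
+# Unit tests: plain assertions, no mocks, no setup.
+def test_discount_is_capped():
+    assert apply_discount(100.0, 10) == 80.0
+
+def test_new_customer_pays_full_price():
+    assert apply_discount(100.0, 0) == 100.0
+
+# Imperative shell: gathers input, calls the core, performs I/O.
+# Covered by a small number of integration tests rather than unit tests.
+def checkout(order_id: str, load_order, save_receipt) -> None:
+    order = load_order(order_id)              # e.g., a database read
+    total = apply_discount(order["subtotal"], order["loyalty_years"])
+    save_receipt(order_id, total)             # e.g., a database write
+```
+
+A runner such as pytest can pick up the two `test_` functions when they live in a normal test module; the `checkout` shell is left for whatever integration harness the team already uses.
+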
-#### [Boundaries by Gary Bernhardt](https://www.destroyallsoftware.com/talks/boundaries) +### Encourage Mentoring +Transitioning to the Functional Core, Imperative Shell pattern may present a steep learning curve for teams. +To facilitate this transition smoothly, have developers with more knowledge mentor other developers through pair programming sessions. +Fostering an environment of knowledge sharing, providing resources, and allocating time for developers to study the pattern can greatly aid in its adoption and understanding across the team. -This talk is about using simple values (as opposed to complex objects) as the boundaries between components and subsystems. It moves through many topics: functional programming, mutability's relationship to OOP, isolated unit testing with and without test doubles, concurrency, and more. +### Host Team Viewings -#### [Moving I/O to the edges of your app: Functional Core, Imperative Shell](https://www.youtube.com/watch?v=P1vES9AgfC4) +- [Boundaries by Gary Bernhardt](https://github.com/pragmint/open-practices/blob/main/resources/tech/boundaries.md): This talk is about using simple values (as opposed to complex objects) as the boundaries between components and subsystems. It moves through many topics: functional programming, mutability's relationship to OOP, isolated unit testing with and without test doubles, concurrency, and more. -Modern software design patterns, like Onion, Clean, and Hexagonal architecture, suggest that your app's logic should run the same way every time, with I/O handled in separate abstractions at the edges. This talk introduces a simple way to keep I/O and core logic apart, simplifying code. +- [Moving I/O to the edges of your app: Functional Core, Imperative Shell](https://www.youtube.com/watch?v=P1vES9AgfC4): Modern software design patterns, like Onion, Clean, and Hexagonal architecture, suggest that your app's logic should run the same way every time, with I/O handled in separate abstractions at the edges. This talk introduces a simple way to keep I/O and core logic apart, simplifying code. -#### [Are We There Yet](https://www.youtube.com/watch?v=ScEPu1cs4l0) +- [Are We There Yet](https://www.youtube.com/watch?v=ScEPu1cs4l0): This talk covers some first-principles thinking about how software could, and should, be designed. It highlights the challenges of managing state and avoiding complexity, and advocates for designs that allow for smoother evolution over time. -This talk covers some first-principles thinking about how software could, and should, be designed. It highlights the challenges of managing state and avoiding complexity, and advocates for designs that allow for smoother evolution over time. +### Read as a Team -### [Start a Book Club](/practices/start-a-book-club.md) +- [How functional programming patterns can simplify code](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_02): This article promotes the mastery of functional programming principles, stating they improve code quality beyond multi-core challenges. It emphasizes referential transparency, where functions yield consistent results regardless of mutable state. It criticizes mutable variables in imperative code and suggests smaller, immutable functions for fewer defects. It acknowledges functional programming's limitations but advocates for its application in various domains, asserting it complements OOP. 
-#### [How functional programming patterns can simplify code](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_02) +## Lessons From the Field -This article promotes the mastery of functional programming principles, stating they improve code quality beyond multi-core challenges. It emphasizes referential transparency, where functions yield consistent results regardless of mutable state. It criticizes mutable variables in imperative code and suggests smaller, immutable functions for fewer defects. It acknowledges functional programming's limitations but advocates for its application in various domains, asserting it complements OOP. +- _Don't overemphasize functional purity_ -- Ensuring the functional core remains pure and without side effects is important, but excessively fixating on absolute purity may inadvertently introduce impractical code. For instance, rigidly avoiding variable mutation can sometimes lead to the use of recursion. Functional programming constructs, while elegant, may not always be the most efficient choice, especially in performance-critical scenarios. -### [Facilitate a Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +- _Remember, values are the boundary_ -- Values serve as the boundary between the layers. The imperative shell should communicate with the functional core by passing value objects exclusively, avoiding objects or functions that could potentially induce side effects. This ensures that the functional core remains isolated from external state changes, promoting clarity, predictability, and maintainability in the codebase while facilitating easier testing, debugging, and refactoring. -Below are suggestions for topics and prompts you could explore with your team during a roundtable discussion. +- _Some behavior and logic needs to live on the imperative shell_ -- In simple cases, the imperative shell merely passes inputs to the functional core, receives the response, and renders it back to the user. However, there are scenarios where the functional core may produce outputs that require inspection or processing by the imperative shell. -#### Assessing the Benefits +## Deciding to Pitch or Polish -* What advantages could the adoption of the Functional Core, Imperative Shell pattern bring to our projects? -* How might separating business logic from side effects enhance code readability, maintainability, and scalability? -* In what ways could the Functional Core, Imperative Shell pattern mitigate the impact of changes in infrastructure technology, allowing for smoother transitions and future-proofing our codebase? -* What benefits might arise from writing unit tests that ensure the functional core code being tested has no side effects? -* How could the reduced presence of control statements in the imperative shell simplify integration tests? +After experimenting with this practice for [insert appropriate quantity of time in bold], bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: -#### Evaluating Team Readiness +### Fast & Measurable -* How prepared are our development teams to embrace a paradigm shift toward functional programming principles? -* Do team members possess the necessary skills and knowledge to implement and maintain code following this practice? -* What resources, training, or support mechanisms can we provide to facilitate the transition and ensure successful adoption? 
+Increased unit test coverage of business logic without needing mocks or stubs (as tracked by...) -#### Incremental Transition to Functional Core, Imperative Shell +Faster lead time of tickets (as tracked by...) -* How can we identify and prioritize modules or components within our existing codebase that are suitable candidates for transitioning to the Functional Core, Imperative Shell pattern? -* What strategies can we employ to refactor existing imperative code into pure functions within the functional core, while maintaining backward compatibility and minimizing disruptions to ongoing development? -* Are there opportunities to introduce the Functional Core, Imperative Shell pattern gradually, perhaps starting with new features or modules before expanding its adoption to legacy code? -* How can we ensure effective communication and collaboration among team members during the transition process, including knowledge sharing, pair programming, and code reviews? -* What metrics or milestones can we establish to measure progress and evaluate the success of incrementally transitioning to the Functional Core, Imperative Shell pattern? +### Slow & Intangible -## Adjacent Capabilities +Improved code clarity and consistency, as reported by developers via ... -This practice supports enhanced performance in the following capabilities. +## Supported Capabilities ### [Code Maintainability](/capabilities/code-maintainability.md) -Follow the Functional Core, Imperative Shell pattern significantly supports the Code Maintainability capability. By separating business logic into a functional core and side effects into an imperative shell, code becomes more readable, more comprehensible, and less complex. With a clear distinction between pure functions and imperative code, developers can more easily understand and modify code, leading to improved maintainability and stability of the software system. ### [Test Automation](/capabilities/test-automation.md) -Follow the Functional Core, Imperative Shell pattern supports the Test Automation capability because it facilitates the creation of highly testable and maintainable code. The functional core, being side-effect-free, allows for straightforward unit testing - its pure functions yield predictable results and don't rely on external states. This ensures that the core business logic is thoroughly tested and reliable. The imperative shell handles side effects and interactions with external systems, which can be tested through integration tests. This clear separation simplifies the testing process, improves test coverage, and provides faster and more reliable feedback during development, which is crucial for robust and efficient test automation. +The functional core allows for straightforward unit testing - its pure functions yield predictable results and don't rely on external states -- while the imperative shell handles side effects and interactions with external systems, which can be tested through integration tests. This clear separation simplifies the testing process, improves test coverage, and provides faster and more reliable feedback during development, which is crucial for robust and efficient test automation. 
From ae538d6c9bf031b8a5ed568d1dd4944468f5e861 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Mon, 29 Sep 2025 18:32:10 -0700 Subject: [PATCH 003/131] Add final part to the polish or pitch phase around lasting adoption --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1f6e048..1b8e383 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ Material in this repository supports Pragmint's cyclical **S.T.E.P.** framework: * **Experiment:** Play around with supported [practices](/practices) to enhance targeted Capabilities. Select one or two high-impact experiments, commit to them, and give the team time to integrate them into their regular workflow. -* **Polish or Pitch:** Gather feedback and reflect on how experimenting with one or more practices affected the team's or system's performance. Review Metrics & Signals, included in each practice ([example](/practices/migrate-to-monorepo.md#metrics--signals)), to determine whether an experiment is making a positive impact. Polish and adopt practices that are working or showing promise, pitch those that are not, then take the next S.T.E.P. +* **Polish or Pitch:** Gather feedback and reflect on how experimenting with one or more practices affected the team's or system's performance. Review Metrics & Signals, included in each practice ([example](/practices/migrate-to-monorepo.md#metrics--signals)), to determine whether an experiment is making a positive impact. Polish and adopt practices that are working or showing promise, pitch those that are not, then take the next S.T.E.P. As you polish successful practices, build in mechanisms to ensure continued adoption, such as CI checks that enforce test coverage thresholds or PR checklists that verify adherence to established patterns. ## DORA Capabilities From fdfe3a698c29a5bc236e6a4d4b429d2871aa9d83 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Fri, 17 Oct 2025 17:08:55 -0700 Subject: [PATCH 004/131] Refine FC/IS practice * Add supporting resource pages --- ...follow-functional-core-imperative-shell.md | 78 +++++++++---------- resources/tech/are-we-there-yet.md | 60 ++++++++++++++ resources/tech/moving_io_to_the_edges.md | 38 +++++++++ 3 files changed, 133 insertions(+), 43 deletions(-) create mode 100644 resources/tech/are-we-there-yet.md create mode 100644 resources/tech/moving_io_to_the_edges.md diff --git a/practices/follow-functional-core-imperative-shell.md b/practices/follow-functional-core-imperative-shell.md index ae1e5b5..987f78f 100644 --- a/practices/follow-functional-core-imperative-shell.md +++ b/practices/follow-functional-core-imperative-shell.md @@ -1,90 +1,82 @@ # Follow Functional Core, Imperative Shell -When a codebase has tight coupling between state and behavior, changes become difficult, testing impractical, and new developer onboarding difficult. The Functional Core, Imperative Shell pattern introduces a clear separation: pure, side-effect-free logic is isolated in a “functional core,” while I/O and system interactions are handled in an “imperative shell.” This structure improves modularity, simplifies testing, and makes it safer and easier to evolve complex parts of the system over time. +When a codebase has tight coupling between state and behavior, changes become difficult, testing impractical, and new developer onboarding difficult. 
The Functional Core, Imperative Shell pattern introduces a clear separation: pure, side-effect-free logic is isolated in abstractions called “functional cores,” while I/O and system interactions are handled in abstractions called “imperative shells.” This structure improves modularity, simplifies testing, and makes it safer and easier to evolve complex parts of the system over time. -## When to Experiment - -"I am a [persona] and I need to ... so I can gradually modernize the most brittle parts of our platform." +Lots of other patterns build on this same idea. Hexagonal, Onion, and Clean Architectures all formalize it at the system level by placing a pure, dependency-free domain at the center and pushing frameworks, databases, and APIs to the outer shell. In each case, the essence is the same: keep decision-making pure and deterministic, and confine the messy realities of the outside world to the edges where they can be swapped, mocked, or evolved independently. -"I am a [persona] and I need to ... so I can reduce cognitive load." +## When to Experiment -"I am a [persona] and I need to ... so I can build a foundation for more robust and maintainable systems." +- You are a developer and you are struggling to write isolated unit tests because the underlying system is very coupled. +- You are a frontend developer, and you need to keep UI rendering predictable while isolating browser events and API calls so the interface stays easy to reason about. +- You are an architect, and you need to organize systems so that they remain reliable, scalable, and easy to evolve as the business grows without constant rewrites or coordination bottlenecks. +- You are a data engineer, and you need to build testable, reusable, and replayable transformation pipelines. +- You are an engineering leader who needs to accelerate delivery while improving stability so new developers can ramp up quickly, teams can ship safely, and the platform can scale without breaking. ## How to Gain Traction ### Host a Roundtable Discussion +You can use the following conversation prompts: + #### Assess the Benefits -* What advantages could the adoption of the Functional Core, Imperative Shell pattern bring to our projects? -* How might separating business logic from side effects enhance code readability, maintainability, and scalability? -* In what ways could the Functional Core, Imperative Shell pattern mitigate the impact of changes in infrastructure technology, allowing for smoother transitions and future-proofing our codebase? -* What benefits might arise from writing unit tests that ensure the functional core code being tested has no side effects? -* How could the reduced presence of control statements in the imperative shell simplify integration tests? +- How could the reduced presence of control statements in the imperative shell simplify integration tests? +- How could the reduced presence of dependencies in the functional cores simplify unit tests? +- How might separating business logic from side effects enhance code readability, maintainability, and scalability? #### Evaluate Team Readiness -* How prepared are our development teams to embrace a paradigm shift toward functional programming principles? -* Do team members possess the necessary skills and knowledge to implement and maintain code following this practice? -* What resources, training, or support mechanisms can we provide to facilitate the transition and ensure successful adoption? 
+- How prepared are our development teams to embrace a paradigm shift toward functional programming principles? +- Do team members possess the necessary skills and knowledge to implement and maintain code following this practice? +- What resources, training, or support mechanisms can we provide to facilitate the transition and ensure successful adoption? #### Begin a Gradual Transition to Functional Core, Imperative Shell -* How can we identify and prioritize modules or components within our existing codebase that are suitable candidates for transitioning to the Functional Core, Imperative Shell pattern? -* What strategies can we employ to refactor existing imperative code into pure functions within the functional core, while maintaining backward compatibility and minimizing disruptions to ongoing development? -* Are there opportunities to introduce the Functional Core, Imperative Shell pattern gradually, perhaps starting with new features or modules before expanding its adoption to legacy code? -* How can we ensure effective communication and collaboration among team members during the transition process, including knowledge sharing, pair programming, and code reviews? -* What metrics or milestones can we establish to measure progress and evaluate the success of incrementally transitioning to the Functional Core, Imperative Shell pattern? - -### Build a Regression Testing Suite -[fit this detail in this section? It was listed as a "prerequisite experiment" in the GS report. The below was part of Nuances, but it seems more like essential "setup" of this experiment. Should it go here?] -**Unit tests** should concentrate on validating the business logic enclosed within the functional core, testing its expected behavior in isolation. This approach is especially advantageous due to the functional core's composition of pure functions, facilitating straightforward unit testing devoid of external dependencies or side effects. **Integration tests** should cover the behavior of the imperative shell as it interacts with external systems, including database calls, API requests, or user interfaces. Imperative shell integration tests ideally require fewer scenarios to validate, given that control statements such as `if`, `while`, or `for` loops should mostly reside within the functional core layer. +- How can we identify and prioritize modules or components within our existing codebase that are suitable candidates for transitioning to the Functional Core, Imperative Shell pattern? +- How can we ensure effective communication and collaboration among team members during the transition process, including knowledge sharing, pair programming, and code reviews? +- What metrics or milestones can we establish to measure progress and evaluate the success of incrementally transitioning to the Functional Core, Imperative Shell pattern? ### Encourage Mentoring -Transitioning to the Functional Core, Imperative Shell pattern may present a steep learning curve for teams. -To facilitate this transition smoothly, have developers with more knowledge mentor other developers through pair programming sessions. -Fostering an environment of knowledge sharing, providing resources, and allocating time for developers to study the pattern can greatly aid in its adoption and understanding across the team. -### Host Team Viewings +Transitioning to the Functional Core, Imperative Shell pattern may present a steep learning curve for teams. 
To facilitate this transition smoothly, have developers with more experience in this pattern mentor other developers through pair programming sessions. Fostering an environment of knowledge sharing, providing resources, and allocating time for developers to study the pattern can greatly aid in its adoption and understanding across the team. -- [Boundaries by Gary Bernhardt](https://github.com/pragmint/open-practices/blob/main/resources/tech/boundaries.md): This talk is about using simple values (as opposed to complex objects) as the boundaries between components and subsystems. It moves through many topics: functional programming, mutability's relationship to OOP, isolated unit testing with and without test doubles, concurrency, and more. +### Host Team Watch Parties -- [Moving I/O to the edges of your app: Functional Core, Imperative Shell](https://www.youtube.com/watch?v=P1vES9AgfC4): Modern software design patterns, like Onion, Clean, and Hexagonal architecture, suggest that your app's logic should run the same way every time, with I/O handled in separate abstractions at the edges. This talk introduces a simple way to keep I/O and core logic apart, simplifying code. +- [Boundaries by Gary Bernhardt](/resources/tech/boundaries.md): This talk is about using simple values (as opposed to complex objects) as the boundaries between components and subsystems. It moves through many topics: functional programming, mutability's relationship to OOP, isolated unit testing with and without test doubles, concurrency, and more. -- [Are We There Yet](https://www.youtube.com/watch?v=ScEPu1cs4l0): This talk covers some first-principles thinking about how software could, and should, be designed. It highlights the challenges of managing state and avoiding complexity, and advocates for designs that allow for smoother evolution over time. +- [Moving I/O to the edges of your app: Functional Core, Imperative Shell](/resources/tech/moving_io_to_the_edges.md): Modern software design patterns, like Onion, Clean, and Hexagonal architecture, suggest that your app's logic should run the same way every time, with I/O handled in separate abstractions at the edges. This talk introduces a simple way to keep I/O and core logic apart, simplifying code. -### Read as a Team - -- [How functional programming patterns can simplify code](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_02): This article promotes the mastery of functional programming principles, stating they improve code quality beyond multi-core challenges. It emphasizes referential transparency, where functions yield consistent results regardless of mutable state. It criticizes mutable variables in imperative code and suggests smaller, immutable functions for fewer defects. It acknowledges functional programming's limitations but advocates for its application in various domains, asserting it complements OOP. +- [Are We There Yet](/resources/tech/are-we-there-yet.md): This talk covers some first-principles thinking about how software could, and should, be designed. It highlights the challenges of managing state and avoiding complexity, and advocates for designs that allow for smoother evolution over time. ## Lessons From the Field -- _Don't overemphasize functional purity_ -- Ensuring the functional core remains pure and without side effects is important, but excessively fixating on absolute purity may inadvertently introduce impractical code. 
For instance, rigidly avoiding variable mutation can sometimes lead to the use of recursion. Functional programming constructs, while elegant, may not always be the most efficient choice, especially in performance-critical scenarios. - -- _Remember, values are the boundary_ -- Values serve as the boundary between the layers. The imperative shell should communicate with the functional core by passing value objects exclusively, avoiding objects or functions that could potentially induce side effects. This ensures that the functional core remains isolated from external state changes, promoting clarity, predictability, and maintainability in the codebase while facilitating easier testing, debugging, and refactoring. +- *Framework Gravity* – Framework conventions naturally pull logic toward controllers, services, and models, blurring the line between pure and side-effecting code. Teams often think they’ve built a functional core when it still depends on framework helpers. Breaking free usually starts by isolating one rule or workflow outside the framework to prove the value of true independence. -- _Some behavior and logic needs to live on the imperative shell_ -- In simple cases, the imperative shell merely passes inputs to the functional core, receives the response, and renders it back to the user. However, there are scenarios where the functional core may produce outputs that require inspection or processing by the imperative shell. +- *Fear of Architectural Overreach* – Teams burned by past "architecture experiments" often equate Functional Core / Imperative Shell with another dogmatism crusade. When the pattern is explained in abstract terms, skepticism has room to breathe; when it’s shown through concrete before-and-after examples of simpler testing or safer changes, the conversation shifts from ideology to practicality. ## Deciding to Pitch or Polish -After experimenting with this practice for [insert appropriate quantity of time in bold], bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: +After experimenting with this practice for a month, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: ### Fast & Measurable -Increased unit test coverage of business logic without needing mocks or stubs (as tracked by...) +**Higher Ratio of Unit to Integration Tests** – As logic becomes framework-independent, teams naturally write more unit tests and fewer brittle integrations. Test coverage tools or tagging schemes (e.g., @unit, @integration) reveal this shift toward isolated, fast-running verification. + +### Slow & Measurable + +**Reduced Test Runtime** – Pure functions execute without bootstrapping frameworks or external systems, cutting test times and feedback cycles. This improvement shows up over time in CI dashboards or local test runner metrics as test suites complete faster and more reliably. -Faster lead time of tickets (as tracked by...) +**Shorter Onboarding Time** – A clearer separation between core logic and I/O layers reduces cognitive load for new hires. Developer experience (DX) surveys should provide measurable evidence of ramp-up speed improving over multiple cohorts. ### Slow & Intangible -Improved code clarity and consistency, as reported by developers via ... +**Faster, Safer Refactors** – Once side effects are isolated at the edges, developers can modify or replace integrations with less coordination and lower regression risk. 
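+
+One low-effort way to watch the unit-to-integration ratio signal described above is to count tests by where they live. The sketch below assumes the common tests/unit and tests/integration layout and a test_*.py naming convention; both are assumptions, so teams that rely on pytest markers or a different structure would adapt the counting accordingly.
+
+```python
+# Minimal sketch for the "ratio of unit to integration tests" signal above.
+# Assumption: tests live under tests/unit/ and tests/integration/; adapt the globbing
+# (or count pytest markers instead) if your suite is organized differently.
+from pathlib import Path
+
+def count_tests(folder: Path) -> int:
+    """Count test functions by scanning test_*.py files for 'def test_' definitions."""
+    total = 0
+    for path in folder.rglob("test_*.py"):
+        total += sum(
+            1 for line in path.read_text().splitlines()
+            if line.lstrip().startswith("def test_")
+        )
+    return total
+
+unit = count_tests(Path("tests/unit"))
+integration = count_tests(Path("tests/integration"))
+print(f"unit: {unit}, integration: {integration}")
+if integration:
+    print(f"unit-to-integration ratio: {unit / integration:.1f}")
+```
+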
## Supported Capabilities ### [Code Maintainability](/capabilities/code-maintainability.md) -By separating business logic into a functional core and side effects into an imperative shell, code becomes more readable, more comprehensible, and less complex. -With a clear distinction between pure functions and imperative code, developers can more easily understand and modify code, leading to improved maintainability and stability of the software system. +By separating business logic into a functional core and side effects into an imperative shell, code becomes more readable, more comprehensible, and less complex. With a clear distinction between pure functions and imperative code, developers can more easily understand and modify code, leading to improved maintainability and stability of the software system. ### [Test Automation](/capabilities/test-automation.md) diff --git a/resources/tech/are-we-there-yet.md b/resources/tech/are-we-there-yet.md new file mode 100644 index 0000000..bf96b97 --- /dev/null +++ b/resources/tech/are-we-there-yet.md @@ -0,0 +1,60 @@ +# Are We There Yet + +Resource type: video + +https://www.youtube.com/watch?v=ScEPu1cs4l0 + +In "Are We There Yet?", Rich Hickey pushes us to rethink how we treat time, state, and identity in software. He argues that many traditional models (especially mutable objects) hide complexity by leaky abstractions, making it harder to reason about system behavior over time. The talk encourages modeling change explicitly, favoring immutable values, and designing for clarity around temporal transitions rather than accidentally embedding time everywhere. + +## Opening Questions + +1. What do we currently treat as "state" in our systems? How mutable is it, and how often does it change? +2. How comfortable are we with the idea of treating time as a first-class concept (instead of letting it implicitly permeate everything)? +3. What challenges do we face when reasoning about temporal behavior (e.g. concurrency, caching, event ordering, eventual consistency)? + +## Core Themes & Concepts to Explore + +- **Time vs State vs Behavior** — Hickey differentiates between the underlying identity of an entity, the sequence of its states over time, and the behaviors we observe. +- **Values vs Objects** — He argues for privileging values (immutable data) over mutable objects, which carry identity and hidden mutation. +- **Separation of Identity & State** — Identity is what persists; state is a snapshot at a time. Treating state as changeable undermines predictability. +- **Explicit Time & Change** — Rather than letting time leak everywhere, model changes explicitly (e.g. via snapshots, versions, streams). +- **Simplicity and Clarity as Design Goals** — Use models that make it easier to reason about flow over time, rather than hiding complexity behind mutable state. + +## Team Exercises + +1. **State Snapshot Exercise** + - Pick an entity in your domain (e.g. User, Order, Session). + - Sketch how you currently track its state over time. + - Then reimagine it with immutable snapshots (or versioned states) and ask: how would your logic or APIs change? + +2. **Temporal Behavior Modeling** + - Take a feature where events or state transitions matter (e.g. order cancellation after timeout, state rollbacks, event replay). + - Model it as a sequence of value transitions. + - Identify where implicit time assumptions today might obscure correctness. + +3. **Mutation Audit** + - Audit a module or service to find mutable state—global variables, caches, in-place updates. 
+ - For each, propose how you might convert it to a value-based approach or isolate the mutation boundary. + +4. **Language / Framework Mapping** + - Take a language or framework you use and map constructs (objects, mutable data structures, reactive streams, versioning tools) to Hickey’s ideas. + - Where are the gaps? Where does it already align? + +## Reflection Prompts + +- If we adopt more value-centric and time-aware design, where would we gain clarity or detect fewer bugs? +- What parts of our system are most hindered by hidden, implicit time assumptions (e.g. caching, synchronization, stale reads)? +- Which mutation boundaries or impure modules are good candidates for refactoring toward a more explicit, time-based model? +- What smaller experiment (e.g. one feature or module) could we try next week to test these ideas? + +## Facilitator Tip + +Encourage people to bring **real examples from the codebase**—especially ones involving race conditions, stale data, or temporal quirks. Use a whiteboard to draw the flow of changes over time. Ask: *"What is the identity here? What are its snapshots? What transitions do we care about?"* +Because Hickey uses fairly abstract philosophy, grounding the discussion in domains your team knows makes the leap more tangible. + +### How This Resource Brings Value + +- It offers a **deeper mental model** of time and state that challenges conventional OO or imperative thinking. +- The talk provides **vocabulary and perspectives** useful for discussing mutability, snapshots, versioning, and change semantics. +- It can catalyze **architecture improvements** by revealing where implicit time or mutability is causing fragility or bugs. +- As a shared conceptual anchor, it helps teams talk more precisely about where change occurs, what is historical vs current, and how to design modules that resist time-leak. diff --git a/resources/tech/moving_io_to_the_edges.md b/resources/tech/moving_io_to_the_edges.md new file mode 100644 index 0000000..fff1bbf --- /dev/null +++ b/resources/tech/moving_io_to_the_edges.md @@ -0,0 +1,38 @@ +# Moving IO to the edges of your app: Functional Core, Imperative Shell + +Resource Type: video + +https://www.youtube.com/watch?v=P1vES9AgfC4 + +This talk (by Scott Wlaschin) makes a compelling case for design discipline: by shunting all side effects (such as database calls, HTTP, logging, file I/O) into an "outer shell," your domain logic becomes purely functional, easier to test, and more resilient to change. Wlaschin presents clear techniques for structuring code with this separation and warns of common pitfalls. It’s especially helpful for teams working with mixed paradigms (OO, functional, procedural) who want to reduce complexity and improve maintainability. + +## Core Concepts to Explore + +- **Functional Core**: Pure, deterministic, side-effect-free code that models business logic. +- **Imperative Shell**: Orchestrates the system, performing I/O and passing data into/out of the core. +- **Dependency Direction**: Core should not depend on the shell; dependencies point outward. +- **Testing Implications**: Core logic can be tested without mocks; shell behavior can be validated through integration tests. + +## Team Exercises + +1. **Identify Boundaries** — Pick a representative feature or service and map which parts belong to the functional core vs the imperative shell. +2. **Refactor Thought Experiment** — Imagine rewriting that feature using this pattern. What would stay the same? What would need to move? +3. 
**Failure Mode Review** — Discuss where failures or inconsistencies arise today because side effects aren’t isolated. +4. **Vocabulary Alignment** — Agree on a shared language for describing "core" vs "shell" code so future design discussions are more precise. + +## Reflection Prompts + +- What would adopting this pattern improve the most for us—testability, onboarding, reasoning, or change safety? +- How could this approach fit within our current architecture (OO, layered, service-based, etc.)? +- What small step could we take this month to start moving side effects outward? + +## Facilitator Tip + +Encourage participants to anchor discussion in real examples from your codebase. Whiteboard or sketch the data flow of one subsystem, and ask: *"Which parts must be impure, and which could be purely functional?"* The clarity that emerges often reveals unnecessary coupling and opens the door to incremental architectural improvement. + +## How This Resource Brings Value + +- It provides **mental models** and **terminology** (functional core, imperative shell) that teams can adopt to reason clearly about side-effect boundaries. +- The talk includes **practical implementation ideas** (wrappers, commands, effects-as-data) that can be adapted to many languages and architectures. +- Viewing the talk as a team can spark **dialogue around existing architecture cruft**, helping refactor toward clearer separation. +- It serves as a **shared reference point**, so that when team members talk about "shell code" or "core logic," everyone has a common frame of reference. From a96d9df5ad01100c70d7606e93243c84e87d6d94 Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Mon, 11 Aug 2025 15:35:32 -0400 Subject: [PATCH 005/131] First draft of static code analysis (automated code analysis) for review. --- practices/perform-static-code-analysis.md | 130 +++++++--------------- 1 file changed, 43 insertions(+), 87 deletions(-) diff --git a/practices/perform-static-code-analysis.md b/practices/perform-static-code-analysis.md index ffce3f8..c881d0e 100644 --- a/practices/perform-static-code-analysis.md +++ b/practices/perform-static-code-analysis.md @@ -1,115 +1,71 @@ -# Perform Static Code Analysis +# Perform Automated Code Analysis -Performing static code analysis involves using automated tools to review and scan the codebase for potential issues, ensuring adherence to quality standards and best practices. -These tools help detect issues early in development, integrating with version control systems, IDEs, and CI/CD pipelines to enhance productivity. -Static code analysis is valuable for spotting code smells, basic security vulnerabilities, performance bottlenecks, and analyzing dependencies for better modularity. +Manually spotting every potential bug, style inconsistency, or design flaw is a tall order — and often a slow one. Automated code analysis brings speed and consistency by having tools (both traditional static analyzers and modern AI-powered assistants) scan code as you work. These tools can highlight security vulnerabilities, style discrepancies, dependency risks, and even suggest or apply fixes in real time. 
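+
+To make that concrete, the short snippet below gathers a few patterns that analyzers commonly flag; the exact rules, rule names, and messages vary by tool and configuration, so treat it as an illustration of the kind of feedback to expect rather than a reference.
+
+```python
+# Illustrative only: a few patterns that static analyzers commonly flag.
+# (Exact rule names and severities vary by tool and configuration.)
+import json          # unused import -- typically reported as dead code
+
+def add_item(item, bucket=[]):   # mutable default argument -- a classic analyzer warning
+    bucket.append(item)
+    return bucket
+
+def parse(raw):
+    try:
+        return int(raw)
+    except:              # bare except hides real errors -- usually flagged
+        return None
+
+def is_missing(value):
+    return value == None   # equality check against None -- linters suggest "is None"
+```
+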
-## Nuance +Some popular tools include: +- [ESLint](https://eslint.org/docs/latest/use/getting-started) - General-purpose static analysis for JavaScript/TypeScript +- [Prettier](https://prettier.io/docs/integrating-with-linters) - Automated code formatting for JavaScript/TypeScript +- [SonarQube](https://www.sonarsource.com/sem/products/sonarqube/downloads/) - Multi-language static analysis with some AI-powered features +- [Semgrep](https://github.com/semgrep/semgrep) - Multi-language static and semantic analysis, with AI-assisted rule generation +- [Claude Code](https://www.anthropic.com/claude) - AI-powered code review, style enforcement, and bug detection +- Self-hosted LLMs - Using tools like Ollama or LM Studio to run open-source AI models locally -### Common Misconceptions about Static Code Analysis +## Who It’s For & Why -A common misconception is that static code analysis can catch all possible issues in a codebase. -While these tools are powerful for identifying code smells, basic security vulnerabilities, and performance bottlenecks, they are not foolproof. -They may miss more nuanced or context-specific problems, and sometimes flag good code as problematic. -Developers should not solely rely on these tools but use them as part of a broader quality assurance strategy. +- You are a developer and would be need feedback on potential bugs, design issues, and style mismatches without waiting for a review cycle. +- You are a QA Engineer and you need to identify high risk areas earlier in the development process, so you can identify the best use of your often limited testing time ensuring greater code quality. +- You are a Tech Lead or Manager and you need to ensure consistent code quality across a team without increasing review overhead. -### Importance of Developer Judgment +## Goals, Metrics & Signals -While static code analysis tools are helpful, they should not replace developer judgment. -These tools can highlight potential issues, but it is up to the developers to make the final call on whether a flagged issue is truly problematic. -Blindly following the tool's recommendations can lead to unnecessary code changes and reduce overall productivity. -The ability to override automated checks ensures that the development process remains flexible and pragmatic. +### Intended Outcomes -### Impact on Code Reviews +- Faster detection and resolution of bugs, security vulnerabilities, and style inconsistencies. +- Increased developer confidence and reduced rework caused by late-discovered issues. +- Improved consistency and maintainability across the codebase. -Relying too heavily on static code analysis might lead to a reduction in code reviews. -Automated tools should complement, not replace, human reviews, which are essential for catching context-specific issues and providing valuable feedback on code design and architecture. -Ensuring that manual code reviews remain a part of the development process is vital for maintaining high code quality. +### Target Measurements -## How to Improve +- Reduced number of issues caught during manual code reviews that could have been flagged by automated tools. +- Decrease in production bugs linked to preventable coding errors. +- Positive changes in developer sentiment around “friction” in code reviews, measured via retrospectives or surveys. -### [Do A Spike](/practices/do-a-spike.md) -#### Tool Selection and Initial Setup +## Lessons From The Field -Identify and set up a static code analysis tool that fits your team's needs. 
-Research various static code analysis tools, such as SonarQube or CodeClimate, and compare their features. -Select one or two tools that seem promising and run them on a small project or segment of your codebase. -Integrate the chosen tool with your version control system and IDE. -Review the initial set of issues identified to understand the tool's strengths and weaknesses, and determine which tool aligns best with your workflow. +- *Automation Should Complement, Not Replace Human Review* – Automated tools are great at spotting patterns but can miss context-specific problems. Keep human judgment in the loop. +- *False Positives Can Cause Fatigue* – Too many non-issues flagged will erode trust in the tool. Customize rulesets and adjust sensitivity over time. +- *Makes Tight Coupling Easier* - When you make it easier to write code that changes multiple parts of the broader system, you make it easier to introduce code that doesn't separate concerns and increases module coupling. This can be mitigated with constant retrospectives focused on how these boundaries are formed. The good news about monorepos is when you identify a poorly constructed boundary, it's easier to fix when it's all in the same repo. +- *AI Tools Can Provide Richer Feedback* – LLM-powered assistants can understand broader context and suggest better-structured fixes, but may occasionally “hallucinate” incorrect solutions — always verify. +- *Integrate Into the Workflow Early* – The earlier developers see issues (e.g., in their IDE), the less disruptive they are to fix. +- *Use the Right Tool for the Right Job* – Some tools shine in specific languages or environments; a multi-tool approach often works best. -### [Lead Workshops](/practices/lead-workshops.md) +## How to Gain Traction -#### Dependency and Modularity Analysis +### Run a Pilot on a Single Repo -Use static code analysis tools to evaluate and improve module dependencies. -Run a dependency analysis on your current codebase and document areas with high coupling and poor cohesion. -Based on the analysis, refactor parts of the codebase to improve modularity. -Run the dependency analysis again to measure improvements. +Choose one active repository, integrate one or two automated analysis tools (both a static analyzer and, optionally, an AI assistant). Measure how quickly developers address flagged issues and collect feedback. -### [Start A Book Club](/practices/start-a-book-club.md) +### Optimize Rules and Feedback Loops -#### [Automate Your Coding Standard](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_04) +Start with default rules, then refine based on false positive rates and team feedback. Set up CI/CD hooks or pre-commit checks to keep quality gates lightweight but effective. -This resource provides insights into the importance of automating coding standards to maintain code quality and consistency. -It highlights how automated tools can help enforce coding conventions, making the codebase more manageable and the development process more efficient. +### Expand Across Teams -#### [Design structure matrix](https://en.wikipedia.org/wiki/Design_structure_matrix) +After a successful pilot, share results and best practices. Provide setup guides and starter configs. Consider hosting internal workshops to help developers get the most from the tools. 
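+
+If a team wants to see how cheaply a repo-specific rule can ride alongside an off-the-shelf analyzer, a tiny custom check like the sketch below can be wired into a pre-commit hook or CI step. The TODO-with-ticket convention, the source directory, and the exit-code contract are all placeholder assumptions to adjust for your own pilot.
+
+```python
+# Starter custom check for a pilot: fail if a TODO has no ticket reference.
+# The TODO/ticket convention, file glob, and exit-code contract are placeholders;
+# most teams would run this alongside an off-the-shelf analyzer, not instead of one.
+import re
+import sys
+from pathlib import Path
+
+TODO_WITHOUT_TICKET = re.compile(r"#\s*TODO(?!.*[A-Z]+-\d+)")   # e.g., expects "TODO ABC-123"
+
+def main() -> int:
+    problems = []
+    for path in Path("src").rglob("*.py"):          # placeholder source root
+        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
+            if TODO_WITHOUT_TICKET.search(line):
+                problems.append(f"{path}:{lineno}: TODO without a ticket reference")
+    print("\n".join(problems) if problems else "custom check passed")
+    return 1 if problems else 0                      # non-zero exit fails CI / pre-commit
+
+if __name__ == "__main__":
+    sys.exit(main())
+```
+
+Run it from the repository root; the non-zero exit status is what lets a pre-commit hook or CI job treat findings as failures.
+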
-The Design Structure Matrix (DSM) is a visual tool used in systems engineering and project management to represent the interactions and dependencies within complex systems or processes in a compact, square matrix format. -Originating in the 1960s, DSMs gained popularity in the 1990s across various industries and government agencies. -They can model both static systems, where elements coexist simultaneously, and time-based systems, which reflect processes over time. -DSMs are advantageous for highlighting patterns, managing changes, and optimizing system structures. -They utilize algorithms for reordering elements to minimize feedback loops and can be extended to multiple domain matrices to visualize interactions across different domains, enhancing information flow and office work optimization. - -#### [Two Wrongs Can Make a Right (and Are Difficult to Fix)](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_86) - -The article "Two Wrongs Can Make a Right (and Are Difficult to Fix)" by Allan Kelly highlights the complex nature of software bugs, particularly when two defects interact to create a single visible fault. This interplay can lead developers to repeatedly attempt fixes that fail because they only address part of the problem. Such scenarios demonstrate the importance of comprehensive error detection and resolution strategies. This concept supports the Perform Static Code Analysis Practice by underscoring the limitations of relying solely on automated tools to catch all issues. While static code analysis can identify many potential problems, it may miss nuanced or context-specific defects, especially those involving multiple interacting errors. - -#### [The power of feedback loops](https://lucamezzalira.medium.com/the-power-of-feedback-loops-f8e27e8ac25f) - -Luca Mezzalira's article 'The Power of Feedback Loops' underscores how iterative feedback enhances processes, resonating with the practice of Perform Static Code Analysis. -Like feedback loops in development cycles, static code analysis tools automate early detection of issues such as code smells and security vulnerabilities, aligning with Mezzalira's advocacy for leveraging feedback to maintain high standards while emphasizing the need for developer judgment and human oversight in software quality assurance. - -### [Host A Viewing Party](/practices/host-a-viewing-party.md) - -#### [System architecture as network data](https://vimeo.com/241241654) - -The speaker emphasizes the importance of loose coupling and high cohesion in software architecture to reduce dependencies between modules, thereby minimizing meetings and coordination overhead. -They demonstrate how to use tools like Line Topology, Cytoscape, and Jupyter Notebooks to analyze and visualize code dependencies, enabling automated detection of modularity and cohesion in the system. -By using network science and computational techniques, the speaker argues for the value of objective metrics in assessing and improving code modularity, drawing parallels to social networks and using examples like Game of Thrones character interactions to illustrate their points. - -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) - -#### Understanding and Usage - -* How well do we understand the capabilities and limitations of our static code analysis tools? -* Are we using static code analysis tools to their full potential within our development process? 
- -#### Integration and Workflow - -* How are our static code analysis tools integrated with our version control systems, IDEs, and CI/CD pipelines? -* Are there any bottlenecks or disruptions caused by static code analysis tools in our current workflow? - -#### Developer Judgment +## Supporting Capabilities -* Do our developers feel empowered to override automated checks when necessary? -* How often do we find that flagged issues are false positives, and how do we handle them? +### [Code Maintainability](/capabilities/code-maintainability.md) -#### Issue Detection and Resolution +Automated analysis enforces consistent coding standards and identifies maintainability issues early, keeping the codebase clean and approachable. -* Are we addressing the issues identified by static code analysis tools promptly and effectively? -* How frequently do we encounter issues that static code analysis tools miss, and how can we improve our detection methods? +### [Pervasive Security](/capabilities/pervasive-security.md) -#### Dependency Analysis +Static and AI-powered analysis can surface vulnerabilities before code is merged, helping meet security and compliance requirements. -* How effectively are we using static code analysis tools to assess and improve module cohesion and dependency management? -* Are there areas in our codebase with poor modularity that these tools have helped us identify and improve? +### [Job Satisfaction](/capabilities/job-satisfaction.md) -## Supporting Capabilities +Real-time feedback in editors and pull requests reduces context switching and increases developer confidence which will lend to more job satisfaction and less costly employee turnover. -### [Code Maintainability](/capabilities/code-maintainability.md) -The Perform Static Code Analysis practice robustly supports the Code Maintainability Dora Capability by providing automated tools that enhance code quality, consistency, and readability. -These tools meticulously scan the codebase to identify potential issues such as code smells, security vulnerabilities, and performance bottlenecks early in the development process. -By integrating static code analysis into version control systems, IDEs, and CI/CD pipelines, teams can receive immediate feedback on code changes, ensuring adherence to coding standards and best practices. This proactive approach reduces the cognitive load on developers, allowing them to focus on more complex tasks while maintaining a clean, modular, and easily comprehensible codebase. \ No newline at end of file From 7a40b64a9e84086bcddf65280981cc3a12873bad Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Mon, 18 Aug 2025 17:43:32 -0500 Subject: [PATCH 006/131] Updated static code analysis practice based on PR comments --- practices/perform-static-code-analysis.md | 41 ++++++++++++----------- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/practices/perform-static-code-analysis.md b/practices/perform-static-code-analysis.md index c881d0e..b7c9eec 100644 --- a/practices/perform-static-code-analysis.md +++ b/practices/perform-static-code-analysis.md @@ -12,35 +12,38 @@ Some popular tools include: ## Who It’s For & Why -- You are a developer and would be need feedback on potential bugs, design issues, and style mismatches without waiting for a review cycle. -- You are a QA Engineer and you need to identify high risk areas earlier in the development process, so you can identify the best use of your often limited testing time ensuring greater code quality. 
-- You are a Tech Lead or Manager and you need to ensure consistent code quality across a team without increasing review overhead. +- **Developers** – Need fast feedback on bugs, design issues, and inconsistencies without waiting for review cycles. +- **QA Engineers** – Want to identify high-risk areas earlier to focus limited testing time more effectively. +- **Tech Leads or Managers** – Need to enforce consistent code quality across the team without increasing review overhead. -## Goals, Metrics & Signals +## Metrics & Signals -### Intended Outcomes +You know this practice is making a positive impact if... -- Faster detection and resolution of bugs, security vulnerabilities, and style inconsistencies. -- Increased developer confidence and reduced rework caused by late-discovered issues. -- Improved consistency and maintainability across the codebase. +- ...fewer issues are flagged during manual code reviews that could have been automatically detected. Track this by tagging review comments or using tools like GitHub's review insights, [DX](https://getdx.com/platform/data-lake/), or [Code Climate Velocity](https://docs.velocity.codeclimate.com/en/) to analyze trends over time. +- ...production bugs linked to preventable errors (e.g., null checks, insecure patterns) decrease. Teams can track this by tagging incident postmortems or using bug categorization in tools like [Jira](https://support.atlassian.com/jira-cloud-administration/docs/what-are-issue-types/), [Linear](https://linear.app/docs/labels), or observability platforms like [Sentry](https://docs.sentry.io/product/issues/) to monitor this trend. +- ...developer sentiment around code review “friction” improves. You can capture this through lightweight surveys using [Typeform](https://www.typeform.com/) or [Google Forms](https://www.google.com/forms/about/) before and after adoption. These can be incorporated into team retros—look for signals like reduced frustration with nitpicky feedback or faster review turnaround times. +- ... engineers begin resolving more issues before creating pull requests. IDE-integrated tools (like [ESLint](https://eslint.org/docs/latest/use/), [Semgrep](https://semgrep.dev/docs/extensions/overview#official-ide-extensions), or [Claude Code](https://claude.ai/)) often track autofix or alert resolution rates, which can be reviewed monthly to establish a baseline and measure improvement. +- ...codebase consistency and maintainability improves. This can be tracked by monitoring linter violations, rule compliance trends, or static analysis scores over time (e.g., via [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/)). -### Target Measurements -- Reduced number of issues caught during manual code reviews that could have been flagged by automated tools. -- Decrease in production bugs linked to preventable coding errors. -- Positive changes in developer sentiment around “friction” in code reviews, measured via retrospectives or surveys. +You'll want to ensure you have both a baseline measurement and an updated measurement after 4-5 weeks of experimenting with this practice. ## Lessons From The Field - -- *Automation Should Complement, Not Replace Human Review* – Automated tools are great at spotting patterns but can miss context-specific problems. Keep human judgment in the loop. -- *False Positives Can Cause Fatigue* – Too many non-issues flagged will erode trust in the tool. 
Customize rulesets and adjust sensitivity over time. -- *Makes Tight Coupling Easier* - When you make it easier to write code that changes multiple parts of the broader system, you make it easier to introduce code that doesn't separate concerns and increases module coupling. This can be mitigated with constant retrospectives focused on how these boundaries are formed. The good news about monorepos is when you identify a poorly constructed boundary, it's easier to fix when it's all in the same repo. -- *AI Tools Can Provide Richer Feedback* – LLM-powered assistants can understand broader context and suggest better-structured fixes, but may occasionally “hallucinate” incorrect solutions — always verify. -- *Integrate Into the Workflow Early* – The earlier developers see issues (e.g., in their IDE), the less disruptive they are to fix. -- *Use the Right Tool for the Right Job* – Some tools shine in specific languages or environments; a multi-tool approach often works best. +- *Review Fatigue Kills Trust* – When teams adopt static analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](https://medium.com/@sageniuz/where-ai-meets-code-techniques-and-best-practices-from-michael-feathers-a-summary-312ef91b6472)—a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. We’ve seen teams where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. +**Lesson:** Curate rulesets with developer input and trim overly noisy alerts. Prioritize signal over volume to preserve trust and ensure these tools remain useful over time. +- *AI Tools Can Provide Richer Feedback* – AI-assisted tools like Claude can help developers catch bugs earlier, write cleaner code, and accelerate onboarding—especially for newer team members. However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Automation should complement, not replace, human review. +**Lesson:** Treat AI and automation suggestions like junior developer input—often helpful, but not always right. Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” +- *Early Integration Reduces Friction* – Teams that surface static analysis results directly in the developer’s IDE tend to resolve issues faster and with less frustration. When feedback is delayed to CI or post-push review, issues are often skipped or rushed because the developer has already context-switched. By contrast, showing issues inline—right when code is being written—leads to higher-quality fixes and builds better habits over time. +**Lesson:** The sooner the feedback appears, the more likely it is to be acted on. Integrate tools into editors like VS Code or JetBrains, not just your CI, to reduce disruption and encourage learning. +- *Use the Right Tools for the Job* – Not all static analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments—leading to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. 
+**Lesson:** Choose tools tailored to your stack. A lightweight multi-tool setup, tuned per language, often outperforms an “everything in one” solution. ## How to Gain Traction +### Start with Education & Demos + +Begin with a 30-minute live session to align the team on what automated code analysis is, why it matters, and how it fits into their daily work. Share resources like [AI vs Rule-based Static Code Analysis by Kendrick Curtis](https://www.youtube.com/watch?v=hkd5uk7J-qo) and [Semgrep’s blog](https://semgrep.dev/blog/2025/fix-what-matters-faster-how-semgrep-and-sysdig-are-unifying-security-from-code-to-runtime/) in advance, so team members come prepared with questions. Close the session with a short demo in your actual codebase using a tool like ESLint, SonarQube, or Claude Code to make the value real and immediate. ### Run a Pilot on a Single Repo From a5af3d33878434c674fc47b34022959a659622d9 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Thu, 21 Aug 2025 14:18:06 -0500 Subject: [PATCH 007/131] edits to new practice --- practices/perform-static-code-analysis.md | 85 ++++++++++++----------- 1 file changed, 45 insertions(+), 40 deletions(-) diff --git a/practices/perform-static-code-analysis.md b/practices/perform-static-code-analysis.md index b7c9eec..48954fb 100644 --- a/practices/perform-static-code-analysis.md +++ b/practices/perform-static-code-analysis.md @@ -1,74 +1,79 @@ # Perform Automated Code Analysis -Manually spotting every potential bug, style inconsistency, or design flaw is a tall order — and often a slow one. Automated code analysis brings speed and consistency by having tools (both traditional static analyzers and modern AI-powered assistants) scan code as you work. These tools can highlight security vulnerabilities, style discrepancies, dependency risks, and even suggest or apply fixes in real time. +Manually spotting every potential bug, style inconsistency, or design flaw is a tall order -- and often a slow one. Automated code analysis brings speed and consistency to teams by having tools scan code as they work. These tools -- both traditional static analyzers and modern AI-powered assistants -- can highlight security vulnerabilities, style discrepancies, dependency risks, and even suggest or apply fixes in real time. 
-Some popular tools include: -- [ESLint](https://eslint.org/docs/latest/use/getting-started) - General-purpose static analysis for JavaScript/TypeScript -- [Prettier](https://prettier.io/docs/integrating-with-linters) - Automated code formatting for JavaScript/TypeScript -- [SonarQube](https://www.sonarsource.com/sem/products/sonarqube/downloads/) - Multi-language static analysis with some AI-powered features -- [Semgrep](https://github.com/semgrep/semgrep) - Multi-language static and semantic analysis, with AI-assisted rule generation -- [Claude Code](https://www.anthropic.com/claude) - AI-powered code review, style enforcement, and bug detection -- Self-hosted LLMs - Using tools like Ollama or LM Studio to run open-source AI models locally +Some popular tools for automated code analysis include: +- [ESLint](https://eslint.org/docs/latest/use/getting-started): General-purpose static analysis for JavaScript/TypeScript +- [Prettier](https://prettier.io/docs/integrating-with-linters): Automated code formatting for JavaScript/TypeScript +- [SonarQube](https://www.sonarsource.com/sem/products/sonarqube/downloads/): Multi-language static analysis with some AI-powered features +- [Semgrep](https://github.com/semgrep/semgrep): Multi-language static and semantic analysis, with AI-assisted rule generation +- [Claude Code](https://www.anthropic.com/claude): AI-powered code review, style enforcement, and bug detection +- Self-hosted LLMs: Tools like Ollama or LM Studio allow you to run open-source AI models locally ## Who It’s For & Why -- **Developers** – Need fast feedback on bugs, design issues, and inconsistencies without waiting for review cycles. -- **QA Engineers** – Want to identify high-risk areas earlier to focus limited testing time more effectively. -- **Tech Leads or Managers** – Need to enforce consistent code quality across the team without increasing review overhead. +- **Developers** need fast feedback on bugs, design issues, and inconsistencies so they can work more efficiently and avoid waiting for review cycles. +- **QA engineers** need to identify high-risk areas earlier so they can effectively focus their limited testing time. +- **Tech leads or managers** need to enforce consistent code quality across the team so they can deliver successful products without increasing review overhead. -## Metrics & Signals +## How to Gain Traction -You know this practice is making a positive impact if... +### Start with Education & Demos -- ...fewer issues are flagged during manual code reviews that could have been automatically detected. Track this by tagging review comments or using tools like GitHub's review insights, [DX](https://getdx.com/platform/data-lake/), or [Code Climate Velocity](https://docs.velocity.codeclimate.com/en/) to analyze trends over time. -- ...production bugs linked to preventable errors (e.g., null checks, insecure patterns) decrease. Teams can track this by tagging incident postmortems or using bug categorization in tools like [Jira](https://support.atlassian.com/jira-cloud-administration/docs/what-are-issue-types/), [Linear](https://linear.app/docs/labels), or observability platforms like [Sentry](https://docs.sentry.io/product/issues/) to monitor this trend. -- ...developer sentiment around code review “friction” improves. You can capture this through lightweight surveys using [Typeform](https://www.typeform.com/) or [Google Forms](https://www.google.com/forms/about/) before and after adoption. 
These can be incorporated into team retros—look for signals like reduced frustration with nitpicky feedback or faster review turnaround times. -- ... engineers begin resolving more issues before creating pull requests. IDE-integrated tools (like [ESLint](https://eslint.org/docs/latest/use/), [Semgrep](https://semgrep.dev/docs/extensions/overview#official-ide-extensions), or [Claude Code](https://claude.ai/)) often track autofix or alert resolution rates, which can be reviewed monthly to establish a baseline and measure improvement. -- ...codebase consistency and maintainability improves. This can be tracked by monitoring linter violations, rule compliance trends, or static analysis scores over time (e.g., via [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/)). +Begin with a 30-minute live session to align the team on what automated code analysis is, why it matters, and how it fits into their daily work. Share resources like [AI vs Rule-based Static Code Analysis by Kendrick Curtis](https://www.youtube.com/watch?v=hkd5uk7J-qo) and [Semgrep’s blog](https://semgrep.dev/blog/2025/fix-what-matters-faster-how-semgrep-and-sysdig-are-unifying-security-from-code-to-runtime/) in advance, so team members can come prepared with questions. Close the session with a short demo in your actual codebase using a tool like ESLint, SonarQube, or Claude Code to make the value real and immediate. +### Run a Pilot on a Single Repo -You'll want to ensure you have both a baseline measurement and an updated measurement after 4-5 weeks of experimenting with this practice. +Choose one active repository and integrate one or two automated analysis tools (both a static analyzer and, optionally, an AI assistant). Measure how quickly developers address flagged issues and collect feedback. +### Optimize Rules and Initiate Feedback Loops -## Lessons From The Field -- *Review Fatigue Kills Trust* – When teams adopt static analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](https://medium.com/@sageniuz/where-ai-meets-code-techniques-and-best-practices-from-michael-feathers-a-summary-312ef91b6472)—a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. We’ve seen teams where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. -**Lesson:** Curate rulesets with developer input and trim overly noisy alerts. Prioritize signal over volume to preserve trust and ensure these tools remain useful over time. -- *AI Tools Can Provide Richer Feedback* – AI-assisted tools like Claude can help developers catch bugs earlier, write cleaner code, and accelerate onboarding—especially for newer team members. However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Automation should complement, not replace, human review. -**Lesson:** Treat AI and automation suggestions like junior developer input—often helpful, but not always right. 
Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” -- *Early Integration Reduces Friction* – Teams that surface static analysis results directly in the developer’s IDE tend to resolve issues faster and with less frustration. When feedback is delayed to CI or post-push review, issues are often skipped or rushed because the developer has already context-switched. By contrast, showing issues inline—right when code is being written—leads to higher-quality fixes and builds better habits over time. -**Lesson:** The sooner the feedback appears, the more likely it is to be acted on. Integrate tools into editors like VS Code or JetBrains, not just your CI, to reduce disruption and encourage learning. -- *Use the Right Tools for the Job* – Not all static analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments—leading to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. -**Lesson:** Choose tools tailored to your stack. A lightweight multi-tool setup, tuned per language, often outperforms an “everything in one” solution. +Start with default rules, then refine based on false positive rates and team feedback. Set up CI/CD hooks or pre-commit checks to keep quality gates lightweight but effective. -## How to Gain Traction -### Start with Education & Demos +### Expand Across Teams -Begin with a 30-minute live session to align the team on what automated code analysis is, why it matters, and how it fits into their daily work. Share resources like [AI vs Rule-based Static Code Analysis by Kendrick Curtis](https://www.youtube.com/watch?v=hkd5uk7J-qo) and [Semgrep’s blog](https://semgrep.dev/blog/2025/fix-what-matters-faster-how-semgrep-and-sysdig-are-unifying-security-from-code-to-runtime/) in advance, so team members come prepared with questions. Close the session with a short demo in your actual codebase using a tool like ESLint, SonarQube, or Claude Code to make the value real and immediate. +After a successful pilot, gather the team to share results and best practices. Provide setup guides and starter configs so the practice may gain wider adoption across teams. Consider hosting internal workshops to help developers get the most from the tools. -### Run a Pilot on a Single Repo +## Metrics & Signals + +You know this practice is making a positive impact if... -Choose one active repository, integrate one or two automated analysis tools (both a static analyzer and, optionally, an AI assistant). Measure how quickly developers address flagged issues and collect feedback. +- ...fewer issues are flagged during manual code reviews because they have been automatically detected. Track this by tagging review comments or using tools like GitHub's review insights, [DX](https://getdx.com/platform/data-lake/), or [Code Climate Velocity](https://docs.velocity.codeclimate.com/en/) to analyze trends over time. +- ...there is a decrease in production bugs linked to preventable errors (e.g., null checks, insecure patterns). 
Teams can monitor this trend by tagging incident postmortems or using bug categorization in tools like [Jira](https://support.atlassian.com/jira-cloud-administration/docs/what-are-issue-types/), [Linear](https://linear.app/docs/labels), or observability platforms like [Sentry](https://docs.sentry.io/product/issues/). +- ...developer sentiment around code review “friction” improves. You can capture this through lightweight surveys using [Typeform](https://www.typeform.com/) or [Google Forms](https://www.google.com/forms/about/) before and after adoption. These can be incorporated into team retros; look for signals like reduced frustration with nitpicky feedback or faster review turnaround times. +- ... engineers begin resolving more issues _before_ creating pull requests. IDE-integrated tools (like [ESLint](https://eslint.org/docs/latest/use/), [Semgrep](https://semgrep.dev/docs/extensions/overview#official-ide-extensions), or [Claude Code](https://claude.ai/)) often track autofix or alert-resolution rates, which can be reviewed monthly to establish a baseline and measure improvement. +- ...codebase consistency and maintainability improves. This can be tracked by monitoring linter violations, rule compliance trends, or static analysis scores over time (e.g., via [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/)). -### Optimize Rules and Feedback Loops -Start with default rules, then refine based on false positive rates and team feedback. Set up CI/CD hooks or pre-commit checks to keep quality gates lightweight but effective. +You'll want to ensure you have a baseline measurement and, after 4-5 weeks of experimenting with this practice, an updated measurement. -### Expand Across Teams -After a successful pilot, share results and best practices. Provide setup guides and starter configs. Consider hosting internal workshops to help developers get the most from the tools. +## Lessons From The Field + +- *Review Fatigue Kills Trust* – When teams adopt static code analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](https://medium.com/@sageniuz/where-ai-meets-code-techniques-and-best-practices-from-michael-feathers-a-summary-312ef91b6472), a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. We’ve seen teams where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. +**Lesson:** Curate rulesets with developer input and trim overly noisy alerts. Prioritize signal strength over volume to preserve trust and ensure these tools remain useful over time. + +- *AI Tools Can Provide Richer Feedback* – AI-assisted tools like Claude Code can help developers catch bugs earlier, write cleaner code, and accelerate onboarding (especially for newer team members). However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Treat AI and automation suggestions like junior developer input -- often helpful, but not always right. 
Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” +**Lesson:** Automation should complement, not replace, human review. + +- *Early Integration Reduces Friction* – Teams that surface static code analysis results directly in the developer’s IDE tend to resolve issues faster and with less frustration. When feedback is delayed to CI or post-push review, issues are often skipped or rushed because the developer has already context-switched. By contrast, showing issues inline -- right when code is being written -- leads to higher-quality fixes and builds better habits over time. +**Lesson:** The sooner the feedback appears, the more likely it is to be acted on. Integrate tools into editors like VS Code or JetBrains, not just your CI, to reduce disruption and encourage learning. + +- *Use the Right Tools for the Job* – Not all static code analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments; this leads to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. +**Lesson:** Choose tools tailored to your stack. A lightweight multi-tool setup, tuned per language, often outperforms an all-in-one solution. ## Supporting Capabilities ### [Code Maintainability](/capabilities/code-maintainability.md) -Automated analysis enforces consistent coding standards and identifies maintainability issues early, keeping the codebase clean and approachable. +Automated code analysis enforces consistent coding standards and identifies maintainability issues early, keeping the codebase clean and flexible. ### [Pervasive Security](/capabilities/pervasive-security.md) -Static and AI-powered analysis can surface vulnerabilities before code is merged, helping meet security and compliance requirements. +Static and AI-powered code analysis can surface vulnerabilities before code is merged, helping meet security and compliance requirements. ### [Job Satisfaction](/capabilities/job-satisfaction.md) -Real-time feedback in editors and pull requests reduces context switching and increases developer confidence which will lend to more job satisfaction and less costly employee turnover. +Real-time feedback in editors and pull requests reduces context switching and increases developer confidence, which leads to greater job satisfaction and less costly employee turnover. From 0d99c414595559f27749df6f5124147456e157eb Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Mon, 25 Aug 2025 14:19:48 -0400 Subject: [PATCH 008/131] Modification to line 69 as commented in PR --- practices/perform-static-code-analysis.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/perform-static-code-analysis.md b/practices/perform-static-code-analysis.md index 48954fb..af4758d 100644 --- a/practices/perform-static-code-analysis.md +++ b/practices/perform-static-code-analysis.md @@ -66,7 +66,7 @@ You'll want to ensure you have a baseline measurement and, after 4-5 weeks of ex ### [Code Maintainability](/capabilities/code-maintainability.md) -Automated code analysis enforces consistent coding standards and identifies maintainability issues early, keeping the codebase clean and flexible. 
+Automated code analysis enforces consistent coding standards and identifies maintainability issues early, keeping the codebase clean, reliable, and easy to work with. ### [Pervasive Security](/capabilities/pervasive-security.md) From 534ae18411173ed09978b9e91523b2c8652140db Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Wed, 27 Aug 2025 21:03:43 -0700 Subject: [PATCH 009/131] Extract resource pages and revise static code analysis practice page --- practices/perform-static-code-analysis.md | 67 ++++++++++--------- .../ai-vs-rule-based-static-code-analysis.md | 27 ++++++++ resources/tech/where-ai-meets-code.md | 39 +++++++++++ 3 files changed, 102 insertions(+), 31 deletions(-) create mode 100644 resources/tech/ai-vs-rule-based-static-code-analysis.md create mode 100644 resources/tech/where-ai-meets-code.md diff --git a/practices/perform-static-code-analysis.md b/practices/perform-static-code-analysis.md index af4758d..8670daa 100644 --- a/practices/perform-static-code-analysis.md +++ b/practices/perform-static-code-analysis.md @@ -1,26 +1,27 @@ # Perform Automated Code Analysis -Manually spotting every potential bug, style inconsistency, or design flaw is a tall order -- and often a slow one. Automated code analysis brings speed and consistency to teams by having tools scan code as they work. These tools -- both traditional static analyzers and modern AI-powered assistants -- can highlight security vulnerabilities, style discrepancies, dependency risks, and even suggest or apply fixes in real time. +Catching every bug or style inconsistency by hand is tough and takes a lot of time. Automated code analysis brings speed and consistency to teams by delegating that task to tools. These tools (both traditional static analyzers and modern AI-powered assistants) can highlight security vulnerabilities, style discrepancies, dependency risks, and even suggest or apply fixes in real time. 
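+
+As a small, hypothetical illustration, this is the kind of defect a rule-based analyzer can flag before a human reviewer ever sees the change (assuming a JavaScript codebase with ESLint's `no-unused-vars` and `eqeqeq` rules enabled):
+
+```js
+// discount.js - both commented issues are caught automatically by common lint rules
+export function getDiscount(user) {
+  const legacyRate = 0.05;     // never used: flagged by no-unused-vars
+  if (user.plan == "pro") {    // loose equality: flagged by eqeqeq
+    return 0.2;
+  }
+  return 0;
+}
+```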
Some popular tools for automated code analysis include: -- [ESLint](https://eslint.org/docs/latest/use/getting-started): General-purpose static analysis for JavaScript/TypeScript -- [Prettier](https://prettier.io/docs/integrating-with-linters): Automated code formatting for JavaScript/TypeScript -- [SonarQube](https://www.sonarsource.com/sem/products/sonarqube/downloads/): Multi-language static analysis with some AI-powered features -- [Semgrep](https://github.com/semgrep/semgrep): Multi-language static and semantic analysis, with AI-assisted rule generation -- [Claude Code](https://www.anthropic.com/claude): AI-powered code review, style enforcement, and bug detection -- Self-hosted LLMs: Tools like Ollama or LM Studio allow you to run open-source AI models locally -## Who It’s For & Why +- Static Analysis & Linting: [ESLint](https://eslint.org/docs/latest/use/getting-started), [SonarQube](https://github.com/SonarSource/sonarqube), and [Semgrep](https://github.com/semgrep/semgrep) can be used to enforce code quality +- Code Formatting: [Prettier](https://prettier.io/docs/integrating-with-linters) (TS/JS) and [rustfmt](https://github.com/rust-lang/rustfmt) (Rust) automatically enforce consistent code style +- Code Query Language: [GritQL](https://github.com/honeycombio/gritql), [CodeQL](https://codeql.github.com/), and [comby](https://github.com/comby-tools/comby) can search, lint, and modify code +- General Purpose AI Agents: [Claude Code](https://www.anthropic.com/claude), [Cursor](https://cursor.com/), and [Gemini-CLI](https://github.com/google-gemini/gemini-cli) are all general purpose AI-powered agents that can be used for code generation, review, style enforcement, and bug detection +- AI Powered Code Review: [Ellipsis](https://www.ellipsis.dev/), [GitHub Copilot](https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review), [CodeRabbit](https://www.coderabbit.ai), and [Cursor Bugbot](https://cursor.com/bugbot) provide AI-assisted reviews and inline feedback +- Self-hosted LLMs: Tools like [Ollama](https://github.com/ollama/) or [LM Studio](https://github.com/lmstudio-ai) allow you to run open-source AI models locally and can be used to power some open source agentic tools + +## When to Experiment - **Developers** need fast feedback on bugs, design issues, and inconsistencies so they can work more efficiently and avoid waiting for review cycles. - **QA engineers** need to identify high-risk areas earlier so they can effectively focus their limited testing time. -- **Tech leads or managers** need to enforce consistent code quality across the team so they can deliver successful products without increasing review overhead. +- **Tech leads or managers** need to enforce consistent code quality across the team so they can deliver successful products without increasing review overhead. ## How to Gain Traction -### Start with Education & Demos +### Start with Education & Demos -Begin with a 30-minute live session to align the team on what automated code analysis is, why it matters, and how it fits into their daily work. Share resources like [AI vs Rule-based Static Code Analysis by Kendrick Curtis](https://www.youtube.com/watch?v=hkd5uk7J-qo) and [Semgrep’s blog](https://semgrep.dev/blog/2025/fix-what-matters-faster-how-semgrep-and-sysdig-are-unifying-security-from-code-to-runtime/) in advance, so team members can come prepared with questions. 
Close the session with a short demo in your actual codebase using a tool like ESLint, SonarQube, or Claude Code to make the value real and immediate. +Begin with a 30-minute live session to align the team on what automated code analysis is, why it matters, and how it fits into their daily work. Share resources like [AI vs Rule-based Static Code Analysis by Kendrick Curtis](/resources/tech/ai-vs-rule-based-static-code-analysis.md) in advance, so team members can come prepared with questions. Close the session with a short demo in your actual codebase using one or multiple of the tools listed above to make the value real and immediate. ### Run a Pilot on a Single Repo @@ -32,35 +33,41 @@ Start with default rules, then refine based on false positive rates and team fee ### Expand Across Teams -After a successful pilot, gather the team to share results and best practices. Provide setup guides and starter configs so the practice may gain wider adoption across teams. Consider hosting internal workshops to help developers get the most from the tools. +Assuming the pilot went well, gather the team to share results and best practices. Provide setup guides and starter configs so the practice may gain wider adoption across teams. Consider hosting internal workshops to help developers get the most from the tools. -## Metrics & Signals +## Lessons From The Field -You know this practice is making a positive impact if... +- _Review Fatigue Kills Trust_ – When teams adopt static code analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](/resources/tech/where-ai-meets-code.md), a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. We’ve seen teams where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. +**Lesson:** Curate rulesets with developer input and trim overly noisy alerts. Prioritize signal strength over volume to preserve trust and ensure these tools remain useful over time. -- ...fewer issues are flagged during manual code reviews because they have been automatically detected. Track this by tagging review comments or using tools like GitHub's review insights, [DX](https://getdx.com/platform/data-lake/), or [Code Climate Velocity](https://docs.velocity.codeclimate.com/en/) to analyze trends over time. -- ...there is a decrease in production bugs linked to preventable errors (e.g., null checks, insecure patterns). Teams can monitor this trend by tagging incident postmortems or using bug categorization in tools like [Jira](https://support.atlassian.com/jira-cloud-administration/docs/what-are-issue-types/), [Linear](https://linear.app/docs/labels), or observability platforms like [Sentry](https://docs.sentry.io/product/issues/). -- ...developer sentiment around code review “friction” improves. You can capture this through lightweight surveys using [Typeform](https://www.typeform.com/) or [Google Forms](https://www.google.com/forms/about/) before and after adoption. These can be incorporated into team retros; look for signals like reduced frustration with nitpicky feedback or faster review turnaround times. -- ... engineers begin resolving more issues _before_ creating pull requests. 
IDE-integrated tools (like [ESLint](https://eslint.org/docs/latest/use/), [Semgrep](https://semgrep.dev/docs/extensions/overview#official-ide-extensions), or [Claude Code](https://claude.ai/)) often track autofix or alert-resolution rates, which can be reviewed monthly to establish a baseline and measure improvement. -- ...codebase consistency and maintainability improves. This can be tracked by monitoring linter violations, rule compliance trends, or static analysis scores over time (e.g., via [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/)). +- _AI Tools Can Provide Richer Feedback_ – AI-assisted tools like Claude Code can help developers catch bugs earlier, write cleaner code, and accelerate onboarding (especially for newer team members). However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Treat AI and automation suggestions like junior developer input -- often helpful, but not always right. Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” +**Lesson:** Automation should complement, not replace, human review. +- _Early Integration Reduces Friction_ – Teams that surface static code analysis results directly in the developer’s IDE tend to resolve issues faster and with less frustration. When feedback is delayed to CI or post-push review, issues are often skipped or rushed because the developer has already context-switched. By contrast, showing issues inline -- right when code is being written -- leads to higher-quality fixes and builds better habits over time. +**Lesson:** The sooner the feedback appears, the more likely it is to be acted on. Integrate tools into editors like VS Code or JetBrains, not just your CI, to reduce disruption and encourage learning. -You'll want to ensure you have a baseline measurement and, after 4-5 weeks of experimenting with this practice, an updated measurement. +- _Use the Right Tools for the Job_ – Not all static code analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments; this leads to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. +**Lesson:** Choose tools tailored to your stack. A lightweight multi-tool setup, tuned per language, often outperforms an all-in-one solution. +## Deciding to Polish or Pitch -## Lessons From The Field +After experimenting with this practice for **4–5 weeks**, bring the team together and ensure the following metrics and/or signals have changed in a positive direction: -- *Review Fatigue Kills Trust* – When teams adopt static code analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](https://medium.com/@sageniuz/where-ai-meets-code-techniques-and-best-practices-from-michael-feathers-a-summary-312ef91b6472), a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. 
We’ve seen teams where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. -**Lesson:** Curate rulesets with developer input and trim overly noisy alerts. Prioritize signal strength over volume to preserve trust and ensure these tools remain useful over time. +### Fast & Measurable -- *AI Tools Can Provide Richer Feedback* – AI-assisted tools like Claude Code can help developers catch bugs earlier, write cleaner code, and accelerate onboarding (especially for newer team members). However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Treat AI and automation suggestions like junior developer input -- often helpful, but not always right. Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” -**Lesson:** Automation should complement, not replace, human review. +**Fewer Review Findings**. Manual code reviews should flag fewer preventable issues because automated checks caught them earlier. Track this by tagging review comments or analyzing review insights in AI-powered code review tools. -- *Early Integration Reduces Friction* – Teams that surface static code analysis results directly in the developer’s IDE tend to resolve issues faster and with less frustration. When feedback is delayed to CI or post-push review, issues are often skipped or rushed because the developer has already context-switched. By contrast, showing issues inline -- right when code is being written -- leads to higher-quality fixes and builds better habits over time. -**Lesson:** The sooner the feedback appears, the more likely it is to be acted on. Integrate tools into editors like VS Code or JetBrains, not just your CI, to reduce disruption and encourage learning. +**Pre-PR Issue Resolution**. Engineers should increasingly resolve issues before creating pull requests. IDE-integrated tools provide autofix or alert-resolution data you can measure monthly. -- *Use the Right Tools for the Job* – Not all static code analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments; this leads to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. -**Lesson:** Choose tools tailored to your stack. A lightweight multi-tool setup, tuned per language, often outperforms an all-in-one solution. +### Fast & Intangible + +**Developer Sentiment**. Friction during code reviews should decline. Capture this via lightweight surveys ([Typeform](https://www.typeform.com/) or [Google Forms](https://workspace.google.com/products/forms/)), or retro feedback looking for reduced nitpicky debates and faster review cycles. + +### Slow & Measurable + +**Production Bug Reduction**. Over time, there should be fewer production incidents tied to preventable errors (null checks, insecure patterns, etc.). Track this by tagging incident postmortems, categorizing bugs in [Jira](https://support.atlassian.com/jira-cloud-administration/docs/what-are-issue-types/), [Linear](https://linear.app/docs/labels), or observability platforms like [Sentry](https://docs.sentry.io/product/issues/). + +**Consistency & Maintainability**. 
Static analysis and linting scores should show steady improvement. Use [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/) to track rule compliance trends and codebase quality. ## Supporting Capabilities @@ -75,5 +82,3 @@ Static and AI-powered code analysis can surface vulnerabilities before code is m ### [Job Satisfaction](/capabilities/job-satisfaction.md) Real-time feedback in editors and pull requests reduces context switching and increases developer confidence, which leads to greater job satisfaction and less costly employee turnover. - - diff --git a/resources/tech/ai-vs-rule-based-static-code-analysis.md b/resources/tech/ai-vs-rule-based-static-code-analysis.md new file mode 100644 index 0000000..355c149 --- /dev/null +++ b/resources/tech/ai-vs-rule-based-static-code-analysis.md @@ -0,0 +1,27 @@ +# AI vs Rule-based Static Code Analysis by Kendrick Curtis + +Resource type: Video + +https://www.youtube.com/watch?v=hkd5uk7J-qo + +## What it’s about + +Kendrick shows what happens when you throw GPT at static code analysis and compares it to the old-school rule-based tools. The demo is equal parts fun and painful: wrong line numbers, inconsistent results, slow runs, and plenty of noise. But he also shows where AI could shine — explaining issues better, suggesting fixes, and maybe making devs’ lives easier. + +## Why it’s worth watching + +If you’ve ever cursed at lint warnings or doubted whether AI tools are ready for prime time, this talk hits close to home. It’s not a sales pitch — it’s more of a reality check with some hopeful takeaways. + +## Pause and Ponder + +01:13 – Beer vs AI analogy - What “bad decisions” do we risk if we lean too hard on AI tools? +04:16 – Wrong line numbers - Is noisy or misleading feedback worse than no feedback at all? +06:13 – Different run, different results - Could we trust this in a compliance/security pipeline? +08:18 – Prompt engineering headaches - Do we really want to replace maintaining rulesets with maintaining prompts? +13:40 – AI for explanations/fixes - Would we trust AI’s explanation of a bug more than its ability to find the bug? +15:01 – Cost and speed - Regex vs AI: if regex is faster, cheaper, and reliable, what’s the real use case for AI here? +20:00 – GitHub auto-fix suggestions - How comfortable would we be letting AI propose (or even auto-commit) fixes? + +## Takeaway + +AI isn’t ready to replace static analyzers, but AI tools that delegate to static analyzers might be more useful than just using static analyzers. Think of AI less as the cop writing you a ticket and more as the buddy explaining why you got pulled over. diff --git a/resources/tech/where-ai-meets-code.md b/resources/tech/where-ai-meets-code.md new file mode 100644 index 0000000..717081a --- /dev/null +++ b/resources/tech/where-ai-meets-code.md @@ -0,0 +1,39 @@ +# Where AI Meets Code by Michael Feathers + +Resource type: Video + +https://www.youtube.com/watch?v=g9m3R0NMJ1Y + +## What it’s about + +Michael Feathers (aka the "Working Effectively with Legacy Code" guy) explores how AI can be used with code, not just bolted into products. Instead of focusing on hype or tool demos, he shares mental models, experiments, and techniques for using LLMs to reason about code, generate different "views" of it (math, diagrams, translations), and spark design ideas. The talk’s vibe: curious, playful, and practical. 
+ +## Why it’s worth watching + +It’s not another "Copilot vs. X" comparison. This is about how to think when pairing with AI. Tons of ideas you can steal for experimentation, code exploration, and better design discussions with your team. + +## Pause and Ponder + +02:31 – The "surfacing model" - When you mention a concept in a prompt, related concepts get "pulled along." How does that explain both the magic and the weird failures of LLMs? + +06:12 – Lost in the middle - Models remember the start and end of a session better than the middle. How do we see this same "recency/primacy effect" in our own team conversations? + +11:01 – Projections - What happens when you look at code from a completely different angle (math, state machines, another language)? Could we use this to spot bugs or design flaws faster? + +15:00 – Lensing - If you ask "show me the top 7 responsibilities," then narrow to 4, the model doesn’t just drop 3 — it re-evaluates. How might that trick help us clarify big messy classes? + +18:11 – Side-by-sides - Solve something yourself, then ask the model to solve it too. How often would comparing answers sharpen your own thinking? + +23:03 – "Pigeon languages" & waywords - Making up shorthand words in a session (like inate or chainify) to guide the model. Would inventing our own vocab speed up how we collaborate with AI? + +34:01 – Review fatigue - Do we have enough patience to constantly double-check AI output? How can we avoid burning out on endless "is this good enough?" reviews? + +36:50 – Small chunks rule - Why is "work in small slices" even more critical when using AI? + +37:56 – Generating tests - Should we treat AI-generated tests as disposable characterizations, not permanent assets? + +40:00 – Rigor vs. creativity - If hallucinations are "just creativity," are we leaning too hard on the wrong strengths of LLMs? + +## Takeaway + +AI isn’t here to replace us. It’s more like a brainstorming buddy that sees code from angles we normally don’t. Use it to stretch your thinking, not to hand over the wheel. From 3b59fe618d5432a29c01ac4f0dfa68d0d6c42e12 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Thu, 28 Aug 2025 20:39:07 -0700 Subject: [PATCH 010/131] Add newlines so formatting is better on github --- resources/tech/ai-vs-rule-based-static-code-analysis.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/resources/tech/ai-vs-rule-based-static-code-analysis.md b/resources/tech/ai-vs-rule-based-static-code-analysis.md index 355c149..5e75811 100644 --- a/resources/tech/ai-vs-rule-based-static-code-analysis.md +++ b/resources/tech/ai-vs-rule-based-static-code-analysis.md @@ -15,11 +15,17 @@ If you’ve ever cursed at lint warnings or doubted whether AI tools are ready f ## Pause and Ponder 01:13 – Beer vs AI analogy - What “bad decisions” do we risk if we lean too hard on AI tools? + 04:16 – Wrong line numbers - Is noisy or misleading feedback worse than no feedback at all? + 06:13 – Different run, different results - Could we trust this in a compliance/security pipeline? + 08:18 – Prompt engineering headaches - Do we really want to replace maintaining rulesets with maintaining prompts? + 13:40 – AI for explanations/fixes - Would we trust AI’s explanation of a bug more than its ability to find the bug? + 15:01 – Cost and speed - Regex vs AI: if regex is faster, cheaper, and reliable, what’s the real use case for AI here? + 20:00 – GitHub auto-fix suggestions - How comfortable would we be letting AI propose (or even auto-commit) fixes? 
## Takeaway From 6f3c31c4b651f2550cc9ca2a7fa1b20fd046ea85 Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Tue, 2 Sep 2025 13:56:48 -0400 Subject: [PATCH 011/131] Per convo with dave, added how to improve resources and consolidated. Deleted solo auto code checks practice --- practices/automate-test-coverage-checks.md | 68 ------------------- ....md => perform-automated-code-analysis.md} | 28 ++++++++ 2 files changed, 28 insertions(+), 68 deletions(-) delete mode 100644 practices/automate-test-coverage-checks.md rename practices/{perform-static-code-analysis.md => perform-automated-code-analysis.md} (83%) diff --git a/practices/automate-test-coverage-checks.md b/practices/automate-test-coverage-checks.md deleted file mode 100644 index f0c502f..0000000 --- a/practices/automate-test-coverage-checks.md +++ /dev/null @@ -1,68 +0,0 @@ -# Automate Test Coverage Checks - -Automating test coverage ensures there is a baseline of test coverage for your software. -Following this practice won't guarantee the quality or reliability of your tests. As such, it's not a sufficient check by itself. -Nevertheless, it's usually a low-cost way to spot gaps in your codebase's test coverage. -Integrating these checks into CI pipelines ensures continuous validation without slowing down development. - -## Nuance - -### Coverage Metrics vs. Test Quality - -It's important to prioritize the quality of tests over coverage percentages. -Teams may focus solely on increasing coverage numbers without ensuring that tests are effective in catching bugs and edge cases. - -### Balancing Speed and Coverage - -While automating test coverage checks speeds up validation processes, overemphasizing coverage goals can lead to diminishing returns. -Setting overly ambitious coverage targets may slow down development or lead to superficial tests that don't add substantial value. -It's important to strike a balance between achieving sufficient coverage and maintaining a productive development pace. - -### Non-Functional Test Considerations - -Automated test coverage often focuses on functional aspects of software, such as correctness and behavior. -However, neglecting non-functional tests—like performance, security, and usability—can leave important aspects of automated test quality out. -Integrating non-functional tests into automated pipelines ensures comprehensive software validation. -For instance, performance tests can identify bottlenecks, security tests can detect vulnerabilities, and usability tests can improve user experience. -None of those types of tests fit neatly into a traditional "coverage" check. - -### Continuous Improvement - -Automating test coverage checks should not be a one-time setup but an ongoing process of refinement and improvement. -Teams should regularly review and adjust coverage thresholds based on evolving project requirements, feedback from testing outcomes, and changes in software functionality.### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) - -## How to Improve - -### [Start A Book Club](/practices/start-a-book-club.md) - -#### [Test Coverage](https://martinfowler.com/bliki/TestCoverage.html) - -In his blog post on test coverage, Martin Fowler explores the concept of test coverage as a tool for identifying untested code rather than as a definitive measure of test quality. -He argues that while high test coverage percentages can highlight which parts of the code are exercised by tests, they do not necessarily indicate the effectiveness of those tests. 
-Fowler emphasizes that test coverage should be used alongside other techniques and metrics to assess the robustness of tests, and that focusing solely on coverage numbers can lead to superficial or inadequate testing. -He advocates for a balanced approach that combines test coverage with thoughtful test design and evaluation to achieve meaningful software quality. - -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) - -#### Tailoring and Adjusting Test Coverage - -* Are our current coverage thresholds realistic and tailored to the specific needs of different modules within our application? -* How often do we review and adjust our coverage metrics to align with evolving project requirements? - -#### Effectiveness of Test Coverage - -* Do our tests catch bugs and edge cases, or are they merely boosting our coverage numbers? -* Are we adequately addressing non-functional testing, such as performance, security, and usability, in our automated test coverage? - -#### Challenges and Lessons in Test Coverage Implementation - -* Are there any cultural or organizational barriers that prevent us from fully implementing this practice? -* What lessons can we learn from past experiences to enhance our future approach to automated test coverage? - -## Supporting Capabilities - -### [Test Automation](/capabilities/test-automation.md) - -Automating test coverage checks supports the Test Automation capability by ensuring continuous and immediate feedback on code changes within the CI pipeline. -This practice identifies untested code early, helping prevent bugs and regressions, and aligns with a consistent testing strategy. -By maintaining realistic coverage thresholds for different modules, it optimizes testing efforts, enhances collaboration between testers and developers, and ultimately improves software quality and stability throughout the delivery lifecycle. diff --git a/practices/perform-static-code-analysis.md b/practices/perform-automated-code-analysis.md similarity index 83% rename from practices/perform-static-code-analysis.md rename to practices/perform-automated-code-analysis.md index 8670daa..8cd8635 100644 --- a/practices/perform-static-code-analysis.md +++ b/practices/perform-automated-code-analysis.md @@ -69,6 +69,34 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget **Consistency & Maintainability**. Static analysis and linting scores should show steady improvement. Use [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/) to track rule compliance trends and codebase quality. +## How to Improve + +### [Start A Book Club](/practices/start-a-book-club.md) + +#### [Test Coverage](https://martinfowler.com/bliki/TestCoverage.html) + +In his blog post on test coverage, Martin Fowler explores the concept of test coverage as a tool for identifying untested code rather than as a definitive measure of test quality. +He argues that while high test coverage percentages can highlight which parts of the code are exercised by tests, they do not necessarily indicate the effectiveness of those tests. +Fowler emphasizes that test coverage should be used alongside other techniques and metrics to assess the robustness of tests, and that focusing solely on coverage numbers can lead to superficial or inadequate testing. 
+He advocates for a balanced approach that combines test coverage with thoughtful test design and evaluation to achieve meaningful software quality. + +### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) + +#### Tailoring and Adjusting Test Coverage + +* Are our current coverage thresholds realistic and tailored to the specific needs of different modules within our application? +* How often do we review and adjust our coverage metrics to align with evolving project requirements? + +#### Effectiveness of Test Coverage + +* Do our tests catch bugs and edge cases, or are they merely boosting our coverage numbers? +* Are we adequately addressing non-functional testing, such as performance, security, and usability, in our automated test coverage? + +#### Challenges and Lessons in Test Coverage Implementation + +* Are there any cultural or organizational barriers that prevent us from fully implementing this practice? +* What lessons can we learn from past experiences to enhance our future approach to automated test coverage? + ## Supporting Capabilities ### [Code Maintainability](/capabilities/code-maintainability.md) From 8934e26ffcde0ef629edda8883b329b735af061a Mon Sep 17 00:00:00 2001 From: nicoletache Date: Fri, 5 Sep 2025 11:41:12 -0500 Subject: [PATCH 012/131] further edits to automated code analysis practice --- practices/perform-automated-code-analysis.md | 31 ++++++++------------ 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/practices/perform-automated-code-analysis.md b/practices/perform-automated-code-analysis.md index 8cd8635..1c7b8a1 100644 --- a/practices/perform-automated-code-analysis.md +++ b/practices/perform-automated-code-analysis.md @@ -13,9 +13,9 @@ Some popular tools for automated code analysis include: ## When to Experiment -- **Developers** need fast feedback on bugs, design issues, and inconsistencies so they can work more efficiently and avoid waiting for review cycles. -- **QA engineers** need to identify high-risk areas earlier so they can effectively focus their limited testing time. -- **Tech leads or managers** need to enforce consistent code quality across the team so they can deliver successful products without increasing review overhead. +- "I am a developer and I need fast feedback on bugs, design issues, and inconsistencies so I can work more efficiently and avoid waiting for review cycles." +- "I am a QA engineer and I need to identify high-risk areas earlier so I can effectively focus my limited testing time." +- "I am a tech lead or manager and I need to enforce consistent code quality across the team so we can deliver successful products without increasing review overhead." ## How to Gain Traction @@ -37,21 +37,17 @@ Assuming the pilot went well, gather the team to share results and best practice ## Lessons From The Field -- _Review Fatigue Kills Trust_ – When teams adopt static code analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](/resources/tech/where-ai-meets-code.md), a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. We’ve seen teams where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. -**Lesson:** Curate rulesets with developer input and trim overly noisy alerts. Prioritize signal strength over volume to preserve trust and ensure these tools remain useful over time. 
+- _Review Fatigue Kills Trust_ – When teams adopt static code analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](/resources/tech/where-ai-meets-code.md), a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. We’ve seen teams where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. Curate rulesets with developer input and trim overly noisy alerts. Prioritize signal strength over volume to preserve trust and ensure these tools remain useful over time. -- _AI Tools Can Provide Richer Feedback_ – AI-assisted tools like Claude Code can help developers catch bugs earlier, write cleaner code, and accelerate onboarding (especially for newer team members). However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Treat AI and automation suggestions like junior developer input -- often helpful, but not always right. Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” -**Lesson:** Automation should complement, not replace, human review. +- _Combine AI Tools With Peer Review_ – Automation should complement, not replace, human review. AI-assisted tools like Claude Code can help developers catch bugs earlier, write cleaner code, and accelerate onboarding (especially for newer team members). However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Treat AI and automation suggestions like junior developer input -- often helpful, but not always right. Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” -- _Early Integration Reduces Friction_ – Teams that surface static code analysis results directly in the developer’s IDE tend to resolve issues faster and with less frustration. When feedback is delayed to CI or post-push review, issues are often skipped or rushed because the developer has already context-switched. By contrast, showing issues inline -- right when code is being written -- leads to higher-quality fixes and builds better habits over time. -**Lesson:** The sooner the feedback appears, the more likely it is to be acted on. Integrate tools into editors like VS Code or JetBrains, not just your CI, to reduce disruption and encourage learning. +- _Early Integration Reduces Friction_ – Teams that surface static code analysis results directly in the developer’s IDE tend to resolve issues faster and with less frustration. When feedback is delayed to CI or post-push review, issues are often skipped or rushed because the developer has already context-switched. By contrast, showing issues inline -- right when code is being written -- leads to higher-quality fixes and builds better habits over time. The sooner the feedback appears, the more likely it is to be acted on. Integrate tools into editors like VS Code or JetBrains, not just your CI, to reduce disruption and encourage learning. -- _Use the Right Tools for the Job_ – Not all static code analysis tools are equally effective across languages or tech stacks. 
Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments; this leads to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. -**Lesson:** Choose tools tailored to your stack. A lightweight multi-tool setup, tuned per language, often outperforms an all-in-one solution. +- _Use the Right Tools for the Job_ – Not all static code analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments; this leads to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. A lightweight multi-tool setup, tuned per language, often outperforms an all-in-one solution. ## Deciding to Polish or Pitch -After experimenting with this practice for **4–5 weeks**, bring the team together and ensure the following metrics and/or signals have changed in a positive direction: +After experimenting with this practice for **4–5 weeks**, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: ### Fast & Measurable @@ -61,7 +57,7 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ### Fast & Intangible -**Developer Sentiment**. Friction during code reviews should decline. Capture this via lightweight surveys ([Typeform](https://www.typeform.com/) or [Google Forms](https://workspace.google.com/products/forms/)), or retro feedback looking for reduced nitpicky debates and faster review cycles. +**Developer Sentiment**. Friction during code reviews should decline. Capture this via lightweight surveys ([Typeform](https://www.typeform.com/) or [Google Forms](https://workspace.google.com/products/forms/)), or retro feedback that points to reduced nitpicky debates and faster review cycles. ### Slow & Measurable @@ -71,14 +67,11 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ## How to Improve -### [Start A Book Club](/practices/start-a-book-club.md) - -#### [Test Coverage](https://martinfowler.com/bliki/TestCoverage.html) +### Read as a Team: [Test Coverage](https://martinfowler.com/bliki/TestCoverage.html) In his blog post on test coverage, Martin Fowler explores the concept of test coverage as a tool for identifying untested code rather than as a definitive measure of test quality. -He argues that while high test coverage percentages can highlight which parts of the code are exercised by tests, they do not necessarily indicate the effectiveness of those tests. -Fowler emphasizes that test coverage should be used alongside other techniques and metrics to assess the robustness of tests, and that focusing solely on coverage numbers can lead to superficial or inadequate testing. -He advocates for a balanced approach that combines test coverage with thoughtful test design and evaluation to achieve meaningful software quality. +He argues that while high test coverage percentages can highlight which parts of the code are exercised by tests, they do not necessarily indicate the *effectiveness* of those tests. 
+Fowler emphasizes that test coverage should be used alongside other techniques and metrics to assess the robustness of tests, and that focusing solely on coverage numbers can lead to superficial or inadequate testing. To achieve high-quality software, he advocates for a balanced approach that combines test coverage with thoughtful test design and evaluation.

### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md)

#### Tailoring and Adjusting Test Coverage

* Are our current coverage thresholds realistic and tailored to the specific needs of different modules within our application?
* How often do we review and adjust our coverage metrics to align with evolving project requirements?

#### Effectiveness of Test Coverage

* Do our tests catch bugs and edge cases, or are they merely boosting our coverage numbers?
* Are we adequately addressing non-functional testing, such as performance, security, and usability, in our automated test coverage?

#### Challenges and Lessons in Test Coverage Implementation

* Are there any cultural or organizational barriers that prevent us from fully implementing this practice?
* What lessons can we learn from past experiences to enhance our future approach to automated test coverage?

## Supporting Capabilities

### [Code Maintainability](/capabilities/code-maintainability.md)

From 6c8f6aea6e5f91a901d8b442f84cdd2b0f62525c Mon Sep 17 00:00:00 2001
From: Dave Moore
Date: Thu, 23 Oct 2025 18:53:24 -0700
Subject: [PATCH 013/131] Update framing of when to experiment for code analysis tools

---
 practices/perform-automated-code-analysis.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/practices/perform-automated-code-analysis.md b/practices/perform-automated-code-analysis.md
index 1c7b8a1..99f9019 100644
--- a/practices/perform-automated-code-analysis.md
+++ b/practices/perform-automated-code-analysis.md
@@ -13,9 +13,9 @@ Some popular tools for automated code analysis include:

 ## When to Experiment

-- "I am a developer and I need fast feedback on bugs, design issues, and inconsistencies so I can work more efficiently and avoid waiting for review cycles."
-- "I am a QA engineer and I need to identify high-risk areas earlier so I can effectively focus my limited testing time."
-- "I am a tech lead or manager and I need to enforce consistent code quality across the team so we can deliver successful products without increasing review overhead."
+- You are a developer who needs fast feedback on bugs, design issues, and inconsistencies so you can work more efficiently and avoid waiting for review cycles.
+- You are a QA engineer and need to identify high-risk areas earlier so you can effectively focus your limited testing time.
+- You are a tech lead or manager and need to enforce consistent code quality across the team so you can deliver successful products without increasing review overhead.
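If any of those personas sounds like you, keep in mind that the first version of this practice can be very small. Below is a hedged sketch of a local or CI "analysis gate" script; the specific commands (ruff, npx eslint) are illustrative stand-ins rather than tools this practice prescribes -- substitute whatever analyzers fit your stack.

```python
# Minimal "analysis gate" sketch: run each analyzer and fail fast if any of
# them reports problems. Most linters exit non-zero when they find issues;
# check your tool's docs, since some need an explicit flag to do so.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "src/"],   # example: a Python linter
    ["npx", "eslint", "web/"],   # example: a JavaScript linter
]

def main() -> int:
    for command in CHECKS:
        print(f"Running: {' '.join(command)}")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"Check failed: {' '.join(command)}")
            return result.returncode
    print("All automated checks passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```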
## How to Gain Traction From 38c34974f5e96ba7372e13d622fd6e52c8359f09 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Thu, 23 Oct 2025 19:08:22 -0700 Subject: [PATCH 014/131] Remove cruft and rephrase supporting capability for static code --- practices/perform-automated-code-analysis.md | 29 ++------------------ 1 file changed, 2 insertions(+), 27 deletions(-) diff --git a/practices/perform-automated-code-analysis.md b/practices/perform-automated-code-analysis.md index 99f9019..7c7f363 100644 --- a/practices/perform-automated-code-analysis.md +++ b/practices/perform-automated-code-analysis.md @@ -9,7 +9,7 @@ Some popular tools for automated code analysis include: - Code Query Language: [GritQL](https://github.com/honeycombio/gritql), [CodeQL](https://codeql.github.com/), and [comby](https://github.com/comby-tools/comby) can search, lint, and modify code - General Purpose AI Agents: [Claude Code](https://www.anthropic.com/claude), [Cursor](https://cursor.com/), and [Gemini-CLI](https://github.com/google-gemini/gemini-cli) are all general purpose AI-powered agents that can be used for code generation, review, style enforcement, and bug detection - AI Powered Code Review: [Ellipsis](https://www.ellipsis.dev/), [GitHub Copilot](https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review), [CodeRabbit](https://www.coderabbit.ai), and [Cursor Bugbot](https://cursor.com/bugbot) provide AI-assisted reviews and inline feedback -- Self-hosted LLMs: Tools like [Ollama](https://github.com/ollama/) or [LM Studio](https://github.com/lmstudio-ai) allow you to run open-source AI models locally and can be used to power some open source agentic tools +- Self-hosted LLMs: Tools like [Ollama](https://github.com/ollama/) or [LM Studio](https://github.com/lmstudio-ai) allow you to run open-source AI models locally which in turn can be used to power some open source agentic tools ## When to Experiment @@ -65,31 +65,6 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget **Consistency & Maintainability**. Static analysis and linting scores should show steady improvement. Use [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/) to track rule compliance trends and codebase quality. -## How to Improve - -### Read as a Team: [Test Coverage](https://martinfowler.com/bliki/TestCoverage.html) - -In his blog post on test coverage, Martin Fowler explores the concept of test coverage as a tool for identifying untested code rather than as a definitive measure of test quality. -He argues that while high test coverage percentages can highlight which parts of the code are exercised by tests, they do not necessarily indicate the *effectiveness* of those tests. -Fowler emphasizes that test coverage should be used alongside other techniques and metrics to assess the robustness of tests, and that focusing solely on coverage numbers can lead to superficial or inadequate testing. To achieve high-quality software, he advocates for a balanced approach that combines test coverage with thoughtful test design and evaluation. - -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) - -#### Tailoring and Adjusting Test Coverage - -* Are our current coverage thresholds realistic and tailored to the specific needs of different modules within our application? 
-* How often do we review and adjust our coverage metrics to align with evolving project requirements? - -#### Effectiveness of Test Coverage - -* Do our tests catch bugs and edge cases, or are they merely boosting our coverage numbers? -* Are we adequately addressing non-functional testing, such as performance, security, and usability, in our automated test coverage? - -#### Challenges and Lessons in Test Coverage Implementation - -* Are there any cultural or organizational barriers that prevent us from fully implementing this practice? -* What lessons can we learn from past experiences to enhance our future approach to automated test coverage? - ## Supporting Capabilities ### [Code Maintainability](/capabilities/code-maintainability.md) @@ -102,4 +77,4 @@ Static and AI-powered code analysis can surface vulnerabilities before code is m ### [Job Satisfaction](/capabilities/job-satisfaction.md) -Real-time feedback in editors and pull requests reduces context switching and increases developer confidence, which leads to greater job satisfaction and less costly employee turnover. +When developers get timely, contextual feedback inside their normal workflow, they stay in flow and feel more effective. DORA's findings link that sense of control and autonomy with higher job satisfaction. From bee8c74b00a9daca48dad2a3c3797080b286367a Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Thu, 23 Oct 2025 19:10:56 -0700 Subject: [PATCH 015/131] Revert "Per convo with dave, added how to improve resources and consolidated. Deleted solo auto code checks practice" This reverts commit dece103fbfb657d175bb8c04b10d8cf8a2e6a4e9. --- practices/automate-test-coverage-checks.md | 68 +++++++++++++++++++ ...sis.md => perform-static-code-analysis.md} | 0 2 files changed, 68 insertions(+) create mode 100644 practices/automate-test-coverage-checks.md rename practices/{perform-automated-code-analysis.md => perform-static-code-analysis.md} (100%) diff --git a/practices/automate-test-coverage-checks.md b/practices/automate-test-coverage-checks.md new file mode 100644 index 0000000..f0c502f --- /dev/null +++ b/practices/automate-test-coverage-checks.md @@ -0,0 +1,68 @@ +# Automate Test Coverage Checks + +Automating test coverage ensures there is a baseline of test coverage for your software. +Following this practice won't guarantee the quality or reliability of your tests. As such, it's not a sufficient check by itself. +Nevertheless, it's usually a low-cost way to spot gaps in your codebase's test coverage. +Integrating these checks into CI pipelines ensures continuous validation without slowing down development. + +## Nuance + +### Coverage Metrics vs. Test Quality + +It's important to prioritize the quality of tests over coverage percentages. +Teams may focus solely on increasing coverage numbers without ensuring that tests are effective in catching bugs and edge cases. + +### Balancing Speed and Coverage + +While automating test coverage checks speeds up validation processes, overemphasizing coverage goals can lead to diminishing returns. +Setting overly ambitious coverage targets may slow down development or lead to superficial tests that don't add substantial value. +It's important to strike a balance between achieving sufficient coverage and maintaining a productive development pace. + +### Non-Functional Test Considerations + +Automated test coverage often focuses on functional aspects of software, such as correctness and behavior. 
+However, neglecting non-functional tests—like performance, security, and usability—can leave important aspects of automated test quality out. +Integrating non-functional tests into automated pipelines ensures comprehensive software validation. +For instance, performance tests can identify bottlenecks, security tests can detect vulnerabilities, and usability tests can improve user experience. +None of those types of tests fit neatly into a traditional "coverage" check. + +### Continuous Improvement + +Automating test coverage checks should not be a one-time setup but an ongoing process of refinement and improvement. +Teams should regularly review and adjust coverage thresholds based on evolving project requirements, feedback from testing outcomes, and changes in software functionality.### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) + +## How to Improve + +### [Start A Book Club](/practices/start-a-book-club.md) + +#### [Test Coverage](https://martinfowler.com/bliki/TestCoverage.html) + +In his blog post on test coverage, Martin Fowler explores the concept of test coverage as a tool for identifying untested code rather than as a definitive measure of test quality. +He argues that while high test coverage percentages can highlight which parts of the code are exercised by tests, they do not necessarily indicate the effectiveness of those tests. +Fowler emphasizes that test coverage should be used alongside other techniques and metrics to assess the robustness of tests, and that focusing solely on coverage numbers can lead to superficial or inadequate testing. +He advocates for a balanced approach that combines test coverage with thoughtful test design and evaluation to achieve meaningful software quality. + +### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) + +#### Tailoring and Adjusting Test Coverage + +* Are our current coverage thresholds realistic and tailored to the specific needs of different modules within our application? +* How often do we review and adjust our coverage metrics to align with evolving project requirements? + +#### Effectiveness of Test Coverage + +* Do our tests catch bugs and edge cases, or are they merely boosting our coverage numbers? +* Are we adequately addressing non-functional testing, such as performance, security, and usability, in our automated test coverage? + +#### Challenges and Lessons in Test Coverage Implementation + +* Are there any cultural or organizational barriers that prevent us from fully implementing this practice? +* What lessons can we learn from past experiences to enhance our future approach to automated test coverage? + +## Supporting Capabilities + +### [Test Automation](/capabilities/test-automation.md) + +Automating test coverage checks supports the Test Automation capability by ensuring continuous and immediate feedback on code changes within the CI pipeline. +This practice identifies untested code early, helping prevent bugs and regressions, and aligns with a consistent testing strategy. +By maintaining realistic coverage thresholds for different modules, it optimizes testing efforts, enhances collaboration between testers and developers, and ultimately improves software quality and stability throughout the delivery lifecycle. 
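To make the baseline check described at the top of this practice concrete, here is a minimal sketch assuming a Python project that uses pytest and coverage.py; the 80% threshold and the commands are placeholders to tune for your codebase and modules.

```python
# Sketch of a CI step that fails the build when coverage drops below a
# baseline. coverage.py's `report --fail-under` exits non-zero when the
# total is under the threshold, so propagating the return code is enough.
import subprocess
import sys

MINIMUM_COVERAGE = 80  # placeholder baseline; tune per module or project

def main() -> int:
    tests = subprocess.run(["coverage", "run", "-m", "pytest"])
    if tests.returncode != 0:
        return tests.returncode  # failing tests should fail the build too
    report = subprocess.run(
        ["coverage", "report", f"--fail-under={MINIMUM_COVERAGE}"]
    )
    return report.returncode

if __name__ == "__main__":
    sys.exit(main())
```

As noted above, a passing gate like this only proves the code was executed by some test; it says nothing about whether those tests would actually catch a regression.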
diff --git a/practices/perform-automated-code-analysis.md b/practices/perform-static-code-analysis.md similarity index 100% rename from practices/perform-automated-code-analysis.md rename to practices/perform-static-code-analysis.md From 8363594e12c3615259a97f47bd3158852b2b1d73 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Thu, 23 Oct 2025 19:12:04 -0700 Subject: [PATCH 016/131] Unrevert rename of automated code analysis practice --- ...static-code-analysis.md => perform-automated-code-analysis.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename practices/{perform-static-code-analysis.md => perform-automated-code-analysis.md} (100%) diff --git a/practices/perform-static-code-analysis.md b/practices/perform-automated-code-analysis.md similarity index 100% rename from practices/perform-static-code-analysis.md rename to practices/perform-automated-code-analysis.md From 86229b5af8bbede1cbe68034535679e42ca5a97f Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Thu, 23 Oct 2025 19:15:16 -0700 Subject: [PATCH 017/131] Update framing in the new practice template --- templates/new-practice.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/templates/new-practice.md b/templates/new-practice.md index 12e19ce..54702c0 100644 --- a/templates/new-practice.md +++ b/templates/new-practice.md @@ -9,7 +9,7 @@ Quick 2-4 sentence summary. What’s the practice? Why should teams care? Keep i ## When to Experiment ```text -“I am a [persona] and I need to [learn how to / ensure that] so I can [end goal].” +You are a [persona] and need to [learn how to / ensure that] so you can [end goal].” (List for each relevant persona: Non-technical exec, Technical exec, Developer, QA, PM, Product Manager, etc.) ``` From 27eb4ddf0c8e6e385a5046292ee78988e7072e97 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Tue, 28 Oct 2025 10:07:06 -0500 Subject: [PATCH 018/131] edit to updated FC-IS practice --- ...follow-functional-core-imperative-shell.md | 28 +++++++++---------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/practices/follow-functional-core-imperative-shell.md b/practices/follow-functional-core-imperative-shell.md index 987f78f..c3e3851 100644 --- a/practices/follow-functional-core-imperative-shell.md +++ b/practices/follow-functional-core-imperative-shell.md @@ -1,16 +1,16 @@ # Follow Functional Core, Imperative Shell -When a codebase has tight coupling between state and behavior, changes become difficult, testing impractical, and new developer onboarding difficult. The Functional Core, Imperative Shell pattern introduces a clear separation: pure, side-effect-free logic is isolated in abstractions called “functional cores,” while I/O and system interactions are handled in abstractions called “imperative shells.” This structure improves modularity, simplifies testing, and makes it safer and easier to evolve complex parts of the system over time. +When a codebase has tight coupling between state and behavior, changes become difficult, testing impractical, and new developer onboarding tricky. The Functional Core, Imperative Shell pattern introduces a clear separation between state and behavior: Pure, side-effect-free logic is isolated in abstractions called “functional cores,” while I/O and system interactions are handled in abstractions called “imperative shells.” This structure improves modularity, simplifies testing, and makes it safer and easier to evolve complex parts of the system over time. -Lots of other patterns build on this same idea. 
Hexagonal, Onion, and Clean Architectures all formalize it at the system level by placing a pure, dependency-free domain at the center and pushing frameworks, databases, and APIs to the outer shell. In each case, the essence is the same: keep decision-making pure and deterministic, and confine the messy realities of the outside world to the edges where they can be swapped, mocked, or evolved independently. +Lots of other patterns build on this same idea. Hexagonal, Onion, and Clean Architectures all formalize it at the system level by placing a pure, dependency-free domain at the center and pushing frameworks, databases, and APIs to the outer shell. In each case, the essence is the same: Keep decision-making pure and deterministic, and confine the messy realities of the outside world to the edges where they can be swapped, mocked, or evolved independently. ## When to Experiment - You are a developer and you are struggling to write isolated unit tests because the underlying system is very coupled. -- You are a frontend developer, and you need to keep UI rendering predictable while isolating browser events and API calls so the interface stays easy to reason about. -- You are an architect, and you need to organize systems so that they remain reliable, scalable, and easy to evolve as the business grows without constant rewrites or coordination bottlenecks. -- You are a data engineer, and you need to build testable, reusable, and replayable transformation pipelines. -- You are an engineering leader who needs to accelerate delivery while improving stability so new developers can ramp up quickly, teams can ship safely, and the platform can scale without breaking. +- You are a frontend developer and you need to keep UI rendering predictable while isolating browser events and API calls so the interface stays intuitive. +- You are an architect and you need to organize systems so that they remain reliable, scalable, and easy to evolve as the business grows *without* constant rewrites or coordination bottlenecks. +- You are a data engineer and you need to build testable, reusable, and replayable transformation pipelines. +- You are an engineering leader and you need to accelerate delivery while improving stability so new developers can ramp up quickly, teams can ship safely, and the platform can scale without breaking. ## How to Gain Traction @@ -52,32 +52,32 @@ Transitioning to the Functional Core, Imperative Shell pattern may present a ste - *Framework Gravity* – Framework conventions naturally pull logic toward controllers, services, and models, blurring the line between pure and side-effecting code. Teams often think they’ve built a functional core when it still depends on framework helpers. Breaking free usually starts by isolating one rule or workflow outside the framework to prove the value of true independence. -- *Fear of Architectural Overreach* – Teams burned by past "architecture experiments" often equate Functional Core / Imperative Shell with another dogmatism crusade. When the pattern is explained in abstract terms, skepticism has room to breathe; when it’s shown through concrete before-and-after examples of simpler testing or safer changes, the conversation shifts from ideology to practicality. +- *Fear of Architectural Overreach* – Teams that have been burned by past "architecture experiments" often equate the Functional Core, Imperative Shell pattern with another dogmatism crusade. 
When the pattern is explained in abstract terms, skepticism has room to breathe; when it’s shown through concrete before-and-after examples of simpler testing or safer changes, the conversation shifts from ideology to practicality. ## Deciding to Pitch or Polish -After experimenting with this practice for a month, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: +After experimenting with this practice for **one month**, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: ### Fast & Measurable -**Higher Ratio of Unit to Integration Tests** – As logic becomes framework-independent, teams naturally write more unit tests and fewer brittle integrations. Test coverage tools or tagging schemes (e.g., @unit, @integration) reveal this shift toward isolated, fast-running verification. +**Higher Ratio of Unit to Integration Tests.** As logic becomes framework-independent, teams naturally write more unit tests and fewer brittle integrations. Test coverage tools or tagging schemes (e.g., @unit, @integration) reveal this shift toward isolated, fast-running verification. ### Slow & Measurable -**Reduced Test Runtime** – Pure functions execute without bootstrapping frameworks or external systems, cutting test times and feedback cycles. This improvement shows up over time in CI dashboards or local test runner metrics as test suites complete faster and more reliably. +**Reduced Test Runtime.** Pure functions execute without bootstrapping frameworks or external systems, cutting test times and feedback cycles. This improvement shows up over time in CI dashboards or local test runner metrics as test suites complete faster and more reliably. -**Shorter Onboarding Time** – A clearer separation between core logic and I/O layers reduces cognitive load for new hires. Developer experience (DX) surveys should provide measurable evidence of ramp-up speed improving over multiple cohorts. +**Shorter Onboarding Time.** A clearer separation between core logic and I/O layers reduces cognitive load for new hires. Developer experience (DX) surveys should provide measurable evidence of ramp-up speed improving over multiple cohorts. ### Slow & Intangible -**Faster, Safer Refactors** – Once side effects are isolated at the edges, developers can modify or replace integrations with less coordination and lower regression risk. +**Faster, Safer Refactors.** Once side effects are isolated at the edges, developers can modify or replace integrations with less coordination and lower regression risk. ## Supported Capabilities ### [Code Maintainability](/capabilities/code-maintainability.md) -By separating business logic into a functional core and side effects into an imperative shell, code becomes more readable, more comprehensible, and less complex. With a clear distinction between pure functions and imperative code, developers can more easily understand and modify code, leading to improved maintainability and stability of the software system. +By separating business logic into a functional core and side effects into an imperative shell, code becomes more readable, more comprehensive, and less complex. With a clear distinction between pure functions and imperative code, developers can more easily understand and modify code, leading to improved maintainability and stability of the software system. 
### [Test Automation](/capabilities/test-automation.md) -The functional core allows for straightforward unit testing - its pure functions yield predictable results and don't rely on external states -- while the imperative shell handles side effects and interactions with external systems, which can be tested through integration tests. This clear separation simplifies the testing process, improves test coverage, and provides faster and more reliable feedback during development, which is crucial for robust and efficient test automation. +The functional core allows for straightforward unit testing - its pure functions yield predictable results and don't rely on external states. The imperative shell handles side effects and interactions with external systems, which can be tested through integration tests. This clear separation simplifies the testing process, improves test coverage, and provides faster and more reliable feedback during development, which is crucial for robust and efficient test automation. From f9801e1e96aecd0aa6c7e5fe3d072cc97349071d Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Tue, 28 Oct 2025 17:26:02 -0700 Subject: [PATCH 019/131] Refine FC/IS intro sentence --- practices/follow-functional-core-imperative-shell.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/follow-functional-core-imperative-shell.md b/practices/follow-functional-core-imperative-shell.md index c3e3851..efe668c 100644 --- a/practices/follow-functional-core-imperative-shell.md +++ b/practices/follow-functional-core-imperative-shell.md @@ -1,6 +1,6 @@ # Follow Functional Core, Imperative Shell -When a codebase has tight coupling between state and behavior, changes become difficult, testing impractical, and new developer onboarding tricky. The Functional Core, Imperative Shell pattern introduces a clear separation between state and behavior: Pure, side-effect-free logic is isolated in abstractions called “functional cores,” while I/O and system interactions are handled in abstractions called “imperative shells.” This structure improves modularity, simplifies testing, and makes it safer and easier to evolve complex parts of the system over time. +When state and behavior are tightly coupled in a codebase, changes become difficult, testing becomes impractical, and new developer onboarding becomes tricky. The Functional Core, Imperative Shell pattern introduces a clear separation between state and behavior: Pure, side-effect-free logic is isolated in abstractions called “functional cores,” while I/O and system interactions are handled in abstractions called “imperative shells.” This structure improves modularity, simplifies testing, and makes it safer and easier to evolve complex parts of the system over time. Lots of other patterns build on this same idea. Hexagonal, Onion, and Clean Architectures all formalize it at the system level by placing a pure, dependency-free domain at the center and pushing frameworks, databases, and APIs to the outer shell. In each case, the essence is the same: Keep decision-making pure and deterministic, and confine the messy realities of the outside world to the edges where they can be swapped, mocked, or evolved independently. 
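To make the split concrete, here is a minimal sketch in Python; the order-discount domain and every name in it are invented for illustration rather than taken from a real codebase. The functional core only computes a result from its inputs, while the imperative shell owns every interaction with the outside world.

```python
# Functional core: a pure decision. No I/O, no clock, no globals -- the same
# inputs always produce the same output, so it can be unit-tested with plain
# values and no mocks.
def discounted_total(order_lines, customer_is_returning):
    subtotal = sum(quantity * unit_price for quantity, unit_price in order_lines)
    discount = 0.10 if customer_is_returning and subtotal >= 100 else 0.0
    return round(subtotal * (1 - discount), 2)


# Imperative shell: gathers inputs from the messy outside world, delegates the
# decision to the core, then performs the side effects. Integration tests (or
# test doubles) belong at this layer, not in the core.
def checkout(order_id, repository, payment_gateway):
    order = repository.load_order(order_id)            # I/O: database read
    total = discounted_total(order.lines, order.customer_is_returning)
    payment_gateway.charge(order.customer_id, total)   # I/O: external call
    repository.mark_paid(order_id, total)              # I/O: database write
    return total


# The core is trivially verifiable with plain values:
assert discounted_total([(2, 60.0)], customer_is_returning=True) == 108.0
```

The testing and refactoring benefits described throughout this practice follow from that shape: the assert above exercises real business logic with no setup at all, and the shell is the only place where fakes, containers, or integration environments are ever needed.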
From 42d0e493d196c169f4c1567aaa99395af326410d Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 14 Nov 2025 15:14:46 -0700 Subject: [PATCH 020/131] Remove empty practices --- practices/address-resource-constraints-incrementally.md | 0 practices/automate-deployment.md | 0 practices/backup-data-daily.md | 0 practices/build-a-single-binary.md | 0 practices/clean-tests.md | 0 practices/conduct-retrospective-meetings.md | 0 practices/design-for-eventual-consistency.md | 0 practices/hold-environment-information-separately.md | 0 practices/implement-actor-based-model.md | 0 practices/implement-anti-entropy-patterns.md | 0 practices/implement-bulkheads.md | 0 practices/implement-cascading-failure-mitigation-strategies.md | 0 practices/implement-circuit-breaker-pattern.md | 0 practices/implement-composable-design.md | 0 practices/implement-distributed-tracing.md | 0 practices/implement-domain-driven-design.md | 0 practices/implement-elastic-systems.md | 0 practices/implement-event-driven-architecture.md | 0 practices/implement-feature-flags.md | 0 practices/implement-form-object-pattern.md | 0 practices/implement-graceful-degradation-and-fallbacks.md | 0 practices/implement-health-checks.md | 0 practices/implement-load-balancing.md | 0 practices/implement-logging.md | 0 practices/implement-message-driven-systems.md | 0 practices/implement-microservice-architecture.md | 0 practices/implement-monitoring-metrics.md | 0 practices/implement-plugin-architecture.md | 0 practices/implement-repository-pattern.md | 0 practices/implement-stability-patterns.md | 0 practices/implement-timeouts-and-retries.md | 0 practices/optimize-data-structures.md | 0 practices/plan-capacity.md | 0 practices/prioritize-design-separation.md | 0 practices/provide-dev-coaching.md | 0 practices/pursue-continuous-personal-development.md | 0 practices/reuse-code-mindfully.md | 0 practices/run-daily-standups.md | 0 practices/scan-vulnerabilities.md | 0 practices/segregate-sensitive-and-insensitive-data.md | 0 practices/separate-credentials-from-code.md | 0 practices/share-knowledge.md | 0 practices/test-for-fault-tolerance.md | 0 practices/understand-your-system-requirements.md | 0 practices/use-templates-for-new-projects.md | 0 practices/use-test-doubles.md | 0 practices/write-characterization-testing-for-legacy-code.md | 0 practices/write-code-in-functional-programming-style.md | 0 practices/write-ephemeral-model-based-tests.md | 0 practices/write-invest-back-log-items.md | 0 practices/write-performance-tests.md | 0 51 files changed, 0 insertions(+), 0 deletions(-) delete mode 100644 practices/address-resource-constraints-incrementally.md delete mode 100644 practices/automate-deployment.md delete mode 100644 practices/backup-data-daily.md delete mode 100644 practices/build-a-single-binary.md delete mode 100644 practices/clean-tests.md delete mode 100644 practices/conduct-retrospective-meetings.md delete mode 100644 practices/design-for-eventual-consistency.md delete mode 100644 practices/hold-environment-information-separately.md delete mode 100644 practices/implement-actor-based-model.md delete mode 100644 practices/implement-anti-entropy-patterns.md delete mode 100644 practices/implement-bulkheads.md delete mode 100644 practices/implement-cascading-failure-mitigation-strategies.md delete mode 100644 practices/implement-circuit-breaker-pattern.md delete mode 100644 practices/implement-composable-design.md delete mode 100644 practices/implement-distributed-tracing.md delete mode 100644 
practices/implement-domain-driven-design.md delete mode 100644 practices/implement-elastic-systems.md delete mode 100644 practices/implement-event-driven-architecture.md delete mode 100644 practices/implement-feature-flags.md delete mode 100644 practices/implement-form-object-pattern.md delete mode 100644 practices/implement-graceful-degradation-and-fallbacks.md delete mode 100644 practices/implement-health-checks.md delete mode 100644 practices/implement-load-balancing.md delete mode 100644 practices/implement-logging.md delete mode 100644 practices/implement-message-driven-systems.md delete mode 100644 practices/implement-microservice-architecture.md delete mode 100644 practices/implement-monitoring-metrics.md delete mode 100644 practices/implement-plugin-architecture.md delete mode 100644 practices/implement-repository-pattern.md delete mode 100644 practices/implement-stability-patterns.md delete mode 100644 practices/implement-timeouts-and-retries.md delete mode 100644 practices/optimize-data-structures.md delete mode 100644 practices/plan-capacity.md delete mode 100644 practices/prioritize-design-separation.md delete mode 100644 practices/provide-dev-coaching.md delete mode 100644 practices/pursue-continuous-personal-development.md delete mode 100644 practices/reuse-code-mindfully.md delete mode 100644 practices/run-daily-standups.md delete mode 100644 practices/scan-vulnerabilities.md delete mode 100644 practices/segregate-sensitive-and-insensitive-data.md delete mode 100644 practices/separate-credentials-from-code.md delete mode 100644 practices/share-knowledge.md delete mode 100644 practices/test-for-fault-tolerance.md delete mode 100644 practices/understand-your-system-requirements.md delete mode 100644 practices/use-templates-for-new-projects.md delete mode 100644 practices/use-test-doubles.md delete mode 100644 practices/write-characterization-testing-for-legacy-code.md delete mode 100644 practices/write-code-in-functional-programming-style.md delete mode 100644 practices/write-ephemeral-model-based-tests.md delete mode 100644 practices/write-invest-back-log-items.md delete mode 100644 practices/write-performance-tests.md diff --git a/practices/address-resource-constraints-incrementally.md b/practices/address-resource-constraints-incrementally.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/automate-deployment.md b/practices/automate-deployment.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/backup-data-daily.md b/practices/backup-data-daily.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/build-a-single-binary.md b/practices/build-a-single-binary.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/clean-tests.md b/practices/clean-tests.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/conduct-retrospective-meetings.md b/practices/conduct-retrospective-meetings.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/design-for-eventual-consistency.md b/practices/design-for-eventual-consistency.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/hold-environment-information-separately.md b/practices/hold-environment-information-separately.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-actor-based-model.md b/practices/implement-actor-based-model.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-anti-entropy-patterns.md 
b/practices/implement-anti-entropy-patterns.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-bulkheads.md b/practices/implement-bulkheads.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-cascading-failure-mitigation-strategies.md b/practices/implement-cascading-failure-mitigation-strategies.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-circuit-breaker-pattern.md b/practices/implement-circuit-breaker-pattern.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-composable-design.md b/practices/implement-composable-design.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-distributed-tracing.md b/practices/implement-distributed-tracing.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-domain-driven-design.md b/practices/implement-domain-driven-design.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-elastic-systems.md b/practices/implement-elastic-systems.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-event-driven-architecture.md b/practices/implement-event-driven-architecture.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-feature-flags.md b/practices/implement-feature-flags.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-form-object-pattern.md b/practices/implement-form-object-pattern.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-graceful-degradation-and-fallbacks.md b/practices/implement-graceful-degradation-and-fallbacks.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-health-checks.md b/practices/implement-health-checks.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-load-balancing.md b/practices/implement-load-balancing.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-logging.md b/practices/implement-logging.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-message-driven-systems.md b/practices/implement-message-driven-systems.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-microservice-architecture.md b/practices/implement-microservice-architecture.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-monitoring-metrics.md b/practices/implement-monitoring-metrics.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-plugin-architecture.md b/practices/implement-plugin-architecture.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-repository-pattern.md b/practices/implement-repository-pattern.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-stability-patterns.md b/practices/implement-stability-patterns.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/implement-timeouts-and-retries.md b/practices/implement-timeouts-and-retries.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/optimize-data-structures.md b/practices/optimize-data-structures.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/plan-capacity.md b/practices/plan-capacity.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/prioritize-design-separation.md 
b/practices/prioritize-design-separation.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/provide-dev-coaching.md b/practices/provide-dev-coaching.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/pursue-continuous-personal-development.md b/practices/pursue-continuous-personal-development.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/reuse-code-mindfully.md b/practices/reuse-code-mindfully.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/run-daily-standups.md b/practices/run-daily-standups.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/scan-vulnerabilities.md b/practices/scan-vulnerabilities.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/segregate-sensitive-and-insensitive-data.md b/practices/segregate-sensitive-and-insensitive-data.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/separate-credentials-from-code.md b/practices/separate-credentials-from-code.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/share-knowledge.md b/practices/share-knowledge.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/test-for-fault-tolerance.md b/practices/test-for-fault-tolerance.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/understand-your-system-requirements.md b/practices/understand-your-system-requirements.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/use-templates-for-new-projects.md b/practices/use-templates-for-new-projects.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/use-test-doubles.md b/practices/use-test-doubles.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/write-characterization-testing-for-legacy-code.md b/practices/write-characterization-testing-for-legacy-code.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/write-code-in-functional-programming-style.md b/practices/write-code-in-functional-programming-style.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/write-ephemeral-model-based-tests.md b/practices/write-ephemeral-model-based-tests.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/write-invest-back-log-items.md b/practices/write-invest-back-log-items.md deleted file mode 100644 index e69de29..0000000 diff --git a/practices/write-performance-tests.md b/practices/write-performance-tests.md deleted file mode 100644 index e69de29..0000000 From d4fd410c5693cd2614d3de60fbc734cbfccb89fa Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 14 Nov 2025 15:42:04 -0700 Subject: [PATCH 021/131] remove under-construction practices --- practices/conduct-incident-reviews.md | 3 -- practices/host-a-roundtable-discussion.md | 13 ----- ...implement-a-documentation-search-engine.md | 45 ------------------ practices/incremental-development.md | 47 ------------------- .../run-automated-tests-in-ci-pipeline.md | 45 ------------------ .../schedule-regular-documentation-audits.md | 45 ------------------ ...e-documentation-auto-generation-tooling.md | 45 ------------------ .../write-code-with-single-responsibility.md | 1 - 8 files changed, 244 deletions(-) delete mode 100644 practices/conduct-incident-reviews.md delete mode 100644 practices/host-a-roundtable-discussion.md delete mode 100644 practices/implement-a-documentation-search-engine.md delete mode 100644 practices/incremental-development.md 
delete mode 100644 practices/run-automated-tests-in-ci-pipeline.md delete mode 100644 practices/schedule-regular-documentation-audits.md delete mode 100644 practices/use-documentation-auto-generation-tooling.md delete mode 100644 practices/write-code-with-single-responsibility.md diff --git a/practices/conduct-incident-reviews.md b/practices/conduct-incident-reviews.md deleted file mode 100644 index af7e96f..0000000 --- a/practices/conduct-incident-reviews.md +++ /dev/null @@ -1,3 +0,0 @@ -## Resources - -[Incident Review and Postmortem Best Practices](https://newsletter.pragmaticengineer.com/p/incident-review-best-practices) \ No newline at end of file diff --git a/practices/host-a-roundtable-discussion.md b/practices/host-a-roundtable-discussion.md deleted file mode 100644 index 2e77501..0000000 --- a/practices/host-a-roundtable-discussion.md +++ /dev/null @@ -1,13 +0,0 @@ -# Host A Roundtable Discussion - -Under Construction - - diff --git a/practices/implement-a-documentation-search-engine.md b/practices/implement-a-documentation-search-engine.md deleted file mode 100644 index 81d835b..0000000 --- a/practices/implement-a-documentation-search-engine.md +++ /dev/null @@ -1,45 +0,0 @@ -# Implement A Documentation Search Engine - -Under Construction - - \ No newline at end of file diff --git a/practices/incremental-development.md b/practices/incremental-development.md deleted file mode 100644 index 0edc1f2..0000000 --- a/practices/incremental-development.md +++ /dev/null @@ -1,47 +0,0 @@ -# Incremental Development - - - -## Nuance - - - -## Introspective Questions - - - -## How to Improve - -### [Lead A Demonstration](/practices/lead-a-demonstration.md) - -### [Run Pair Programming Sessions](/practices/run-pair-programming-sessions.md) - -### [Lead Workshops](/practices/lead-workshops.md) - -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) - -### [Start A Book Club](/practices/start-a-book-club.md) - -### [Host A Viewing Party](/practices/host-a-viewing-party.md) - -### [Do A Spike](/practices/do-a-spike.md) - -### [Host A Retrospective](/practices/host-a-retrospective.md) - -### [Talk Directly With Users](/practices/talk-directly-with-users.md) - -### [Dogfood Your Systems](/practices/dogfood-your-systems.md) - -### [Start A Community Of Practice](/practices/start-a-community-of-practice.md) - -## Resources - - - -## Related Practices - - - -## Supporting Capabilities - - diff --git a/practices/run-automated-tests-in-ci-pipeline.md b/practices/run-automated-tests-in-ci-pipeline.md deleted file mode 100644 index c918acd..0000000 --- a/practices/run-automated-tests-in-ci-pipeline.md +++ /dev/null @@ -1,45 +0,0 @@ -# Run Automated Tests In An Integration/Deployment Pipeline - -## Key Points - -* Benefits of Running Tests in CI Pipeline - * Early detection of defects - * Reduced integration problems - * Improved code quality and reliability - * Faster feedback loops - * Enhanced collaboration among team members -* Types of Tests **instead of going into detail about each one, link to the appropriate practice and talk about when in the pipeline these types of tests should be run. Ex, you may not want to run all of these tests for every single type of build. 
- * [Unit tests](/practices/implement-unit-tests.md) - * [Integration tests](/practices/implement-integration-tests.md) - * [End-to-end tests](/practices/implement-end-to-end-tests.md) - * [Performance tests](/practices/implement-performance-tests.md) -* Best Practices for Running Tests in CI Pipeline - * Prioritize fast-running tests - * Parallelize test execution - * Maintain a clean test environment - * Containerization to ensure correct dependencies - * Database Management - * Mock external dependencies - * Ensure test data consistency - * Regularly review and update tests -* Challenges and Solutions - * Flaky tests and how to handle them - * Managing long-running tests - * Ensuring test coverage and avoiding test duplication - * Scaling tests with the project growth - - - \ No newline at end of file diff --git a/practices/schedule-regular-documentation-audits.md b/practices/schedule-regular-documentation-audits.md deleted file mode 100644 index e20aa28..0000000 --- a/practices/schedule-regular-documentation-audits.md +++ /dev/null @@ -1,45 +0,0 @@ -# Schedule Regular Documentation Audits - -Under Construction - - \ No newline at end of file diff --git a/practices/use-documentation-auto-generation-tooling.md b/practices/use-documentation-auto-generation-tooling.md deleted file mode 100644 index d05fd87..0000000 --- a/practices/use-documentation-auto-generation-tooling.md +++ /dev/null @@ -1,45 +0,0 @@ -# Use Documentation Auto-Generation Tooling - -Under Construction - - \ No newline at end of file diff --git a/practices/write-code-with-single-responsibility.md b/practices/write-code-with-single-responsibility.md deleted file mode 100644 index 1542112..0000000 --- a/practices/write-code-with-single-responsibility.md +++ /dev/null @@ -1 +0,0 @@ - From abf94a6d640e10314548275e9939847980378dc2 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Fri, 21 Nov 2025 09:31:41 -0800 Subject: [PATCH 022/131] Fix broken link on code maintainability page --- capabilities/code-maintainability.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/capabilities/code-maintainability.md b/capabilities/code-maintainability.md index e918ceb..35eb8ba 100644 --- a/capabilities/code-maintainability.md +++ b/capabilities/code-maintainability.md @@ -88,11 +88,11 @@ Reducing coupling between abstractions creates a modular and flexible codebase. Independent, well-defined components minimize unintended side effects, making the code easier to understand, modify, and test. This modularity ensures that changes in one part of the system do not disrupt others, preserving stability and reducing cognitive load on developers. Clear abstractions and minimal dependencies support better documentation and collaboration, facilitating efficient onboarding and continuous improvement. -### [Perform Static Code Analysis](/practices/perform-static-code-analysis.md) +### [Perform Automated Code Analysis](/practices/perform-automated-code-analysis.md) -Performing static code analysis involves using automated tools to enhance code quality, consistency, and readability. +Automating code analysis involves using tools to enhance code quality, consistency, and readability. These tools meticulously scan the codebase to identify potential issues such as code smells, security vulnerabilities, and performance bottlenecks early in the development process. 
-By integrating static code analysis into version control systems, IDEs, and CI/CD pipelines, teams can receive immediate feedback on code changes, ensuring adherence to coding standards and best practices. This proactive approach reduces the cognitive load on developers, allowing them to focus on more complex tasks while maintaining a clean, modular, and easily comprehensible codebase. +By integrating automated code analysis into version control systems, IDEs, and CI/CD pipelines, teams can receive immediate feedback on code changes, ensuring adherence to coding standards and best practices. This proactive approach reduces the cognitive load on developers, allowing them to focus on more complex tasks while maintaining a clean, modular, and easily comprehensible codebase. ### [Migrate to a Monorepo](/practices/migrate-to-monorepo.md) From 603ac345e91865cac220b952fb1295e9aa4ef7fb Mon Sep 17 00:00:00 2001 From: Monica Taylor Date: Wed, 27 Aug 2025 14:55:35 -0400 Subject: [PATCH 023/131] initial commit to create a file for the Otel Practice --- practices/open-telemetry-practice.md | 52 ++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) create mode 100644 practices/open-telemetry-practice.md diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md new file mode 100644 index 0000000..766b259 --- /dev/null +++ b/practices/open-telemetry-practice.md @@ -0,0 +1,52 @@ +> Review an existing practice like [Migrate to Monorepo](/practices/migrate-to-monorepo.md) to see a good example of a practice following this template. + +# `[Action-Oriented Title]` + +```text +Quick 2-4 sentence summary. What’s the practice? Why should teams care? Keep it casual and motivating. +``` + +## Who It’s For & Why + +```text +“I am a [persona] and I need to [learn how to / ensure that] so I can [end goal].” +(List for each relevant persona: Non-technical exec, Technical exec, Developer, QA, PM, Product Manager, etc.) +``` + +## How to Gain Traction + +```text +List 3–5 steps to take a team from zero to adopted. +Each step gets: + ### [Action Step] + 3 sentences on how to do it, how to get buy-in, and what tools/resources help. Any external resources (videos, guides, book lists, templates, etc.) that help a team adopt this practice should be linked here within the relevant action step. +``` + +## Metrics & Signals + +```text +When writing Metrics & Signals, list target metrics, qualitative markers, or lightweight feedback mechanisms. + +Organize them into Fast & Measurable, Fast & Intangible, Slow & Measurable, or Slow & Intangible, but only include categories with strong, defensible signals. Exclude weak or hard-to-attribute signals. + +For measurable items, specify how to track them (e.g., DX, Jira, CI dashboards). For intangible items, note how to capture feedback (e.g., surveys, retro notes, developer chatter). + +Keep metrics scoped and outcome-focused (e.g., “reduced lead time for cross-repo changes” instead of just “reduced lead time”). +``` + +## Lessons From The Field + +```text +This section captures real-world patterns (things that consistently help or hinder this practice) along with short, relevant stories from the field. It’s not for personal rants or generic opinions. Each entry must be based on either: +1. a repeated observation across teams, or +2. a specific example (what worked, what didn’t, and why). +``` + +## Supported Capabilities + +```text +List 1–4 existing DORA Capabilities this practice supports. 
+For each: + ### Capability Name (link) + 1–2 sentences on how this practice helps improve it. +``` From 1d5621ee60d35ad2e6bd4a0f2e3618aa1d356fd0 Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Fri, 29 Aug 2025 16:00:12 -0400 Subject: [PATCH 024/131] first draft of open telemetry practice, still WIP. Not ready for review. --- practices/open-telemetry-practice.md | 105 +++++++++++++++++++-------- 1 file changed, 73 insertions(+), 32 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 766b259..1196b6d 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -1,52 +1,93 @@ -> Review an existing practice like [Migrate to Monorepo](/practices/migrate-to-monorepo.md) to see a good example of a practice following this template. +# Adopt OpenTelemetry for Unified Observability -# `[Action-Oriented Title]` +> **Practice framing:** +> This practice captures a socio-technical shift underway in modern software teams. +> Observability is moving from the “three pillars” (logs, metrics, traces stored separately) +> to a unified model where all signals are captured as wide, structured events. +> OpenTelemetry (OTel) is the open-source standard enabling this shift — reducing vendor lock-in, +> correlating signals across the stack, and letting developers see their impact in real time. -```text -Quick 2-4 sentence summary. What’s the practice? Why should teams care? Keep it casual and motivating. -``` +OpenTelemetry (OTel) is the open-source standard for collecting telemetry data across services. Instead of juggling separate tools for logs, metrics, and traces, OTel helps teams consolidate context into a single structured format. This gives developers and execs the same “source of truth” about how systems behave — enabling faster debugging, richer product insights, and a direct link between engineering work and business outcomes. ## Who It’s For & Why -```text -“I am a [persona] and I need to [learn how to / ensure that] so I can [end goal].” -(List for each relevant persona: Non-technical exec, Technical exec, Developer, QA, PM, Product Manager, etc.) -``` +- **Developers** need consistent telemetry that makes debugging easier and helps them see the real-world impact of their code. +- **Technical leaders** need observability that scales without vendor lock-in and provides flexibility to adapt tools over time. +- **Non-technical executives** need a trustworthy way to connect system health to business metrics like bookings, throughput, or customer retention. +- **Ops / SREs** need to reduce alert fatigue, catch issues before customers do, and correlate signals across multiple environments. + ## How to Gain Traction -```text -List 3–5 steps to take a team from zero to adopted. -Each step gets: - ### [Action Step] - 3 sentences on how to do it, how to get buy-in, and what tools/resources help. Any external resources (videos, guides, book lists, templates, etc.) that help a team adopt this practice should be linked here within the relevant action step. -``` +### Start with a Champion +You need one senior stakeholder with political capital to say: *“We’re doing this.”* Without a champion, OTel efforts stall. Use their support to carve out space for a pilot project. + +### Establish a Shared Repository +Set up a dedicated observability repo managed like an open-source project. Include schema definitions, testing rules, a README with setup instructions, and usage guidelines. 
Lock down standards (e.g., TypeScript interfaces for attributes) so all services produce consistent data. + +### Pilot with Auto-Instrumentation +Begin with OTel’s auto-instrumentation libraries to generate traces quickly. Expect noise: tune by suppressing low-value metrics and layering abstractions. Wrap complex packages in simple helpers so other developers can adopt without digging into raw node modules. + +### Correlate Across Tools +Don’t fight the fact that teams have “their favorite tool.” Instead, enrich OTel spans with IDs and references from those tools. This builds connective tissue and lets people see how siloed data relates inside a unified telemetry stream. + +### Show Quick Wins +Surface dashboards that answer questions execs and devs already care about: +- “Where are users dropping off?” +- “What query is slowing down checkout?” +- “Is this issue ours or the third party’s?” +Make the impact visible within days of a deploy to build momentum. + ## Metrics & Signals -```text -When writing Metrics & Signals, list target metrics, qualitative markers, or lightweight feedback mechanisms. +### Fast & Measurable +- **Time to debug incidents** (tracked via Jira/incident postmortems). Should drop once OTel is in place. +- **Deployment feedback cycle** (time between shipping and seeing results in telemetry dashboards). Expect to see 10–15 min loops instead of hours/days. + +### Fast & Intangible +- **Developer chatter about on-call load** (retro notes, Slack threads). A good sign is fewer “2am wake-up for nothing” complaints. +- **Dashboard creation by non-admins** (are devs confident enough to self-serve?) -Organize them into Fast & Measurable, Fast & Intangible, Slow & Measurable, or Slow & Intangible, but only include categories with strong, defensible signals. Exclude weak or hard-to-attribute signals. +### Slow & Measurable +- **Reduced reliance on multiple vendor tools** (track subscription spend or # of dashboards maintained). +- **Product KPIs tied to system changes** (e.g., conversion rates after a performance optimization). -For measurable items, specify how to track them (e.g., DX, Jira, CI dashboards). For intangible items, note how to capture feedback (e.g., surveys, retro notes, developer chatter). +### Slow & Intangible +- **Cross-team trust**: Are execs and product managers referencing telemetry data in conversations instead of relying on anecdote or gut feel? -Keep metrics scoped and outcome-focused (e.g., “reduced lead time for cross-repo changes” instead of just “reduced lead time”). -``` ## Lessons From The Field -```text -This section captures real-world patterns (things that consistently help or hinder this practice) along with short, relevant stories from the field. It’s not for personal rants or generic opinions. Each entry must be based on either: -1. a repeated observation across teams, or -2. a specific example (what worked, what didn’t, and why). -``` +- _Champion Power Matters_ – At Lifestance, OTel only took root because Andre had the authority to insist on adoption. Without a sponsor who can hold the line, expect stall-outs. +- _Schemas Prevent Chaos_ – Defining and versioning attribute schemas up front (e.g., TypeScript interfaces) ensures data is comparable across services. Teams that skip this get unmanageable dashboards. +- _Correlating External Tools is an Opportunity_ – Instead of resisting siloed tools, smart teams embed their IDs into traces, creating a de-facto unified model. This turns “annoying fragmentation” into a source of leverage. 
+- _Telemetry Surfaces Politics_ – OTel exposes bottlenecks and ownership gaps. In bureaucratic cultures, this requires social skill: frame insights as opportunities, not punishments. +- _Developer Buy-in Comes From On-Call Relief_ – The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. + + ## Supported Capabilities -```text -List 1–4 existing DORA Capabilities this practice supports. -For each: - ### Capability Name (link) - 1–2 sentences on how this practice helps improve it. -``` +### [Continuous Delivery](/capabilities/continuous-delivery.md) +OTel enables faster, safer deploys by providing near-real-time feedback loops — developers can see the impact of changes minutes after release. + +### [Team Experimentation](/capabilities/team-experimentation.md) +Unified telemetry lets devs run safe experiments (optimize queries, adjust configs) and immediately measure business impact. + +### [Code Maintainability](/capabilities/code-maintainability.md) +Consistent observability abstractions act as shared infrastructure patterns, helping teams manage complexity across many repos. + +### [Job Satisfaction](/capabilities/job-satisfaction.md) +Reducing false alarms and giving developers visibility into their real impact improves morale and reduces burnout. + +### [Working in Small Batches](/capabilities/working-in-small-batches.md) +TBD + +### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) +TBD + +### [Monitoring & Observability](/capabilities/monitoring-and-observability.md) +TBD + + From ec65021c029658ac2166cd604ceefb93b1d59a33 Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Fri, 29 Aug 2025 16:02:03 -0400 Subject: [PATCH 025/131] Abstract out case study to remove identifier --- practices/open-telemetry-practice.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 1196b6d..4a7e7f3 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -59,7 +59,7 @@ Make the impact visible within days of a deploy to build momentum. ## Lessons From The Field -- _Champion Power Matters_ – At Lifestance, OTel only took root because Andre had the authority to insist on adoption. Without a sponsor who can hold the line, expect stall-outs. +- _Champion Power Matters_ – At one company, OTel only took root because a high level engineering executive had the authority to insist on adoption. Without a sponsor who can hold the line, expect stall-outs. - _Schemas Prevent Chaos_ – Defining and versioning attribute schemas up front (e.g., TypeScript interfaces) ensures data is comparable across services. Teams that skip this get unmanageable dashboards. - _Correlating External Tools is an Opportunity_ – Instead of resisting siloed tools, smart teams embed their IDs into traces, creating a de-facto unified model. This turns “annoying fragmentation” into a source of leverage. - _Telemetry Surfaces Politics_ – OTel exposes bottlenecks and ownership gaps. In bureaucratic cultures, this requires social skill: frame insights as opportunities, not punishments. 
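To make the schema discipline described in the shared-repository step and the "Schemas Prevent Chaos" lesson above more concrete, here is a minimal sketch of what a versioned attribute schema published from a shared observability repo might look like, assuming TypeScript is the team's common language. The interface and field names are illustrative only, not an established OTel convention:

```typescript
// Hypothetical example: a versioned span-attribute schema exported from the
// shared observability repo. Names and fields are illustrative, not a standard.

/** Attributes every service is expected to attach to its spans (schema v1). */
export interface CommonSpanAttributesV1 {
  /** Bump this literal when making a breaking change to the schema. */
  "app.schema_version": "1";
  /** Logical service name, e.g. "checkout-api". */
  "app.service_name": string;
  /** Environment the span was emitted from. */
  "app.environment": "dev" | "staging" | "prod";
  /** Team that owns the emitting service; useful for routing alerts. */
  "app.owning_team": string;
  /** Optional reference to an ID from an external tool (ticket, incident, etc.). */
  "app.external_ref"?: string;
}

/** Identity helper so attribute objects are checked against the schema at compile time. */
export function commonAttributes(attrs: CommonSpanAttributesV1): CommonSpanAttributesV1 {
  return attrs;
}
```

Publishing something like this from the shared repo keeps attributes comparable across services and dashboards, and the version literal gives consumers a clear signal when a breaking change lands.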
From 12fc3a526cb18bf6bd519885b8201fba238f3cb9 Mon Sep 17 00:00:00 2001 From: Monica Taylor Date: Fri, 29 Aug 2025 16:08:22 -0400 Subject: [PATCH 026/131] modifying text --- practices/open-telemetry-practice.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 4a7e7f3..9fa08b0 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -32,7 +32,7 @@ Begin with OTel’s auto-instrumentation libraries to generate traces quickly. E Don’t fight the fact that teams have “their favorite tool.” Instead, enrich OTel spans with IDs and references from those tools. This builds connective tissue and lets people see how siloed data relates inside a unified telemetry stream. ### Show Quick Wins -Surface dashboards that answer questions execs and devs already care about: +Surface dashboards that answer the questions execs and devs already care about: - “Where are users dropping off?” - “What query is slowing down checkout?” - “Is this issue ours or the third party’s?” From 79c54bea3446cb526f449a1ffc85d18b37eac493 Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Tue, 2 Sep 2025 13:42:39 -0400 Subject: [PATCH 027/131] Draft for review on otel practice. --- practices/open-telemetry-practice.md | 84 ++++++++++++++-------------- 1 file changed, 43 insertions(+), 41 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 9fa08b0..fbf3e89 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -7,20 +7,26 @@ > OpenTelemetry (OTel) is the open-source standard enabling this shift — reducing vendor lock-in, > correlating signals across the stack, and letting developers see their impact in real time. -OpenTelemetry (OTel) is the open-source standard for collecting telemetry data across services. Instead of juggling separate tools for logs, metrics, and traces, OTel helps teams consolidate context into a single structured format. This gives developers and execs the same “source of truth” about how systems behave — enabling faster debugging, richer product insights, and a direct link between engineering work and business outcomes. -## Who It’s For & Why +Observability has never been more important — or more complex. [OpenTelemetry](https://opentelemetry.io/) (OTel) is the open-source standard for collecting telemetry data across services. Instead of juggling separate tools for logs, metrics, and traces, OTel helps teams consolidate context into a single structured format. This gives developers and execs the same “source of truth” about how systems behave — enabling faster debugging, richer product insights, and a direct link between engineering work and business outcomes - without vendor lock-in. -- **Developers** need consistent telemetry that makes debugging easier and helps them see the real-world impact of their code. 
+Some popular OpenTelemetry-compatible platforms include: +- [Honeycomb](https://www.honeycomb.io/) – built around wide, structured events and exploratory debugging +- [Grafana Tempo](https://grafana.com/oss/tempo/) – integrates with the Grafana stack for trace visualization +- [Datadog](https://www.datadoghq.com/), [New Relic](https://newrelic.com/), [Dynatrace](https://www.dynatrace.com/) – commercial observability vendors with native OTel support +- [Uptrace](https://uptrace.dev/) – an OTel-native open-source observability backend + +## When to Experiment + +- **Developers** need consistent telemetry that makes debugging easier and helps them see the real-world impact of their code. +- **Ops / SREs** need to reduce alert fatigue, catch issues before customers do, and correlate signals across environments. - **Technical leaders** need observability that scales without vendor lock-in and provides flexibility to adapt tools over time. - **Non-technical executives** need a trustworthy way to connect system health to business metrics like bookings, throughput, or customer retention. -- **Ops / SREs** need to reduce alert fatigue, catch issues before customers do, and correlate signals across multiple environments. - ## How to Gain Traction -### Start with a Champion -You need one senior stakeholder with political capital to say: *“We’re doing this.”* Without a champion, OTel efforts stall. Use their support to carve out space for a pilot project. +### Start with Education & a Champion +Begin by aligning teams on what OTel is and why it matters. Share Charity Majors’ framing of [“Observability 2.0”](https://www.honeycomb.io/blog/one-key-difference-observability1dot0-2dot0) to shift mindsets from three pillars to structured events. Secure an executive or senior engineer to sponsor the adoption — without a champion, efforts often stall. ### Establish a Shared Repository Set up a dedicated observability repo managed like an open-source project. Include schema definitions, testing rules, a README with setup instructions, and usage guidelines. Lock down standards (e.g., TypeScript interfaces for attributes) so all services produce consistent data. @@ -36,58 +42,54 @@ Surface dashboards that answer the questions execs and devs already care about: - “Where are users dropping off?” - “What query is slowing down checkout?” - “Is this issue ours or the third party’s?” -Make the impact visible within days of a deploy to build momentum. +Demonstrating visible impact in days builds credibility and momentum. +## Lessons From The Field +- _Champion Power Matters_ – OTel takes root best when there is a champion to lead the charge. Without a sponsor who can hold the line, expect stall-outs. See [OTel in Practice: Alibaba's OpenTelemetry Journey](https://www.youtube.com/watch?v=fgbB0HhVBq8) +- _Schemas Prevent Chaos_ – Defining and versioning attribute schemas up front (e.g., TypeScript interfaces) ensures data is comparable across services. Teams that skip this get unmanageable dashboards. See [OpenTelemetry Q&A Featuring Hazel Weakly](https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-09-13T19:26:10Z-opentelemetry-q-a-feat-hazel-weakly.md) +- _Correlating External Tools is an Opportunity_ – Instead of resisting siloed tools, smart teams embed their IDs into traces, creating a de-facto unified model. This turns “annoying fragmentation” into a source of leverage. 
See [The Evolution of Observability Practices](https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-10-11T20:52:28Z-the-evolution-of-observability-practices.md) +- _Telemetry Surfaces Politics_ – OTel exposes bottlenecks and ownership gaps. In bureaucratic +cultures, this requires social skill: frame insights as opportunities, not punishments. +- _Developer Buy-in Comes From On-Call Relief_ – The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. See [OTel Q&A Featuring Jennifer Moore](https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-12-04T23:34:44Z-otel-q-a-feat-jennifer-moore.md) -## Metrics & Signals +## Deciding to Polish or Pitch + +After experimenting with this practice for **4–5 weeks**, bring the team together and ensure the following metrics and/or signals have changed in a positive direction: ### Fast & Measurable -- **Time to debug incidents** (tracked via Jira/incident postmortems). Should drop once OTel is in place. -- **Deployment feedback cycle** (time between shipping and seeing results in telemetry dashboards). Expect to see 10–15 min loops instead of hours/days. +- **Reduced Debug Time** Incidents should resolve faster once OTel is in place. Track cycle time or ticket to resolution time via [Jira](https://atlassian.com/software/jira)/incident postmortems. +- **Shorter Deployment Feedback Loops.** Expect 10–15 minute telemetry updates instead of waiting hours or days. ### Fast & Intangible -- **Developer chatter about on-call load** (retro notes, Slack threads). A good sign is fewer “2am wake-up for nothing” complaints. -- **Dashboard creation by non-admins** (are devs confident enough to self-serve?) +- **Developer Sentiment** Look for fewer “wake-up-for-nothing” complaints in retros or Slack chatter. +- **Dashboard Adoption** Track how many non-admins confidently create their own dashboards. ### Slow & Measurable -- **Reduced reliance on multiple vendor tools** (track subscription spend or # of dashboards maintained). -- **Product KPIs tied to system changes** (e.g., conversion rates after a performance optimization). +- **Reduced Vendor Dependence.** Track spend on observability tools or # of dashboards maintained. +- **Product KPIs** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. ### Slow & Intangible -- **Cross-team trust**: Are execs and product managers referencing telemetry data in conversations instead of relying on anecdote or gut feel? - - -## Lessons From The Field - -- _Champion Power Matters_ – At one company, OTel only took root because a high level engineering executive had the authority to insist on adoption. Without a sponsor who can hold the line, expect stall-outs. -- _Schemas Prevent Chaos_ – Defining and versioning attribute schemas up front (e.g., TypeScript interfaces) ensures data is comparable across services. Teams that skip this get unmanageable dashboards. -- _Correlating External Tools is an Opportunity_ – Instead of resisting siloed tools, smart teams embed their IDs into traces, creating a de-facto unified model. This turns “annoying fragmentation” into a source of leverage. -- _Telemetry Surfaces Politics_ – OTel exposes bottlenecks and ownership gaps. In bureaucratic cultures, this requires social skill: frame insights as opportunities, not punishments. 
-- _Developer Buy-in Comes From On-Call Relief_ – The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. - +- **Cross-Team Trust.** Do PMs and execs begin referencing telemetry data in decision-making instead of anecdote? +## Supporting Capabilities +### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) +Enables proactive observability across systems. Fast feedback loops mean catching issues before they reach the user. -## Supported Capabilities +### [Monitoring & Observability](/capabilities/monitoring-and-observability.md) +Otel is the open source standard for collecting telemetry data across services and utilizes a unified model to optimize observability in the modern tech landscape. -### [Continuous Delivery](/capabilities/continuous-delivery.md) -OTel enables faster, safer deploys by providing near-real-time feedback loops — developers can see the impact of changes minutes after release. +### [Continuous Delivery](/capabilities/continuous-delivery.md) +OTel enables faster, safer deploys by providing near-real-time feedback loops — developers can see the impact of changes minutes after release. -### [Team Experimentation](/capabilities/team-experimentation.md) -Unified telemetry lets devs run safe experiments (optimize queries, adjust configs) and immediately measure business impact. +### [Team Experimentation](/capabilities/team-experimentation.md) +Unified telemetry lets devs run safe experiments (optimize queries, adjust configs) and immediately measure business impact. -### [Code Maintainability](/capabilities/code-maintainability.md) -Consistent observability abstractions act as shared infrastructure patterns, helping teams manage complexity across many repos. +### [Code Maintainability](/capabilities/code-maintainability.md) +Consistent observability abstractions act as shared infrastructure patterns, helping teams manage complexity across many repos. -### [Job Satisfaction](/capabilities/job-satisfaction.md) +### [Job Satisfaction](/capabilities/job-satisfaction.md) Reducing false alarms and giving developers visibility into their real impact improves morale and reduces burnout. -### [Working in Small Batches](/capabilities/working-in-small-batches.md) -TBD - -### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) -TBD -### [Monitoring & Observability](/capabilities/monitoring-and-observability.md) -TBD From d7e8e1f4f9db3a3d52dadd4c97c0f1d0fdbc8c2c Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Tue, 2 Sep 2025 13:49:07 -0400 Subject: [PATCH 028/131] Added in real world case studies --- practices/open-telemetry-practice.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index fbf3e89..a25ca28 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -51,6 +51,9 @@ Demonstrating visible impact in days builds credibility and momentum. - _Telemetry Surfaces Politics_ – OTel exposes bottlenecks and ownership gaps. In bureaucratic cultures, this requires social skill: frame insights as opportunities, not punishments. - _Developer Buy-in Comes From On-Call Relief_ – The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. 
See [OTel Q&A Featuring Jennifer Moore](https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-12-04T23:34:44Z-otel-q-a-feat-jennifer-moore.md) +- _SAP’s Massive-Scale Modernization_ – SAP revamped its observability architecture across a fleet of 11,000+ OpenSearch instances by adopting OpenTelemetry and rebuilding ingestion pipelines with Data Prepper. This enabled unified logs, metrics, and traces, provided low-risk migration to OpenSearch 2.x, and dramatically sped up incident response. [SAP Case Study](https://opensearch.org/blog/case-study-sap-unifies-observability-at-scale-with-opensearch-and-opentelemetry/) +- _Pax8 Unleashes Curious, Cost‑Effective Instrumentation_ – By moving to Honeycomb’s Observability 2.0 platform, Pax8 empowered engineers, product managers, and ops with structured telemetry and dropped their observability costs by 30%. Their user base grew from 50 to 210 users, democratizing access without breaking the budget. [Pax8 Case Study](https://www.honeycomb.io/resources/case-studies/pax8-modern-observability-2-0-solution) + ## Deciding to Polish or Pitch From e1dac1c00ec1f9f4fa6e61b89653f2a30815c208 Mon Sep 17 00:00:00 2001 From: Nicole Lynn Date: Tue, 2 Sep 2025 14:25:19 -0400 Subject: [PATCH 029/131] Added resource pages --- practices/open-telemetry-practice.md | 14 +++++------ .../otel/alibaba-opentelemetry-journey.md | 25 +++++++++++++++++++ .../evolution-of-observability-practices.md | 25 +++++++++++++++++++ .../tech/otel/observability-2-0-honeycomb.md | 23 +++++++++++++++++ .../otel/opentelemetry-qa-hazel-weakly.md | 25 +++++++++++++++++++ .../otel/opentelemetry-qa-jennifer-moore.md | 25 +++++++++++++++++++ .../otel/pax8-observability-2-0-case-study.md | 23 +++++++++++++++++ .../tech/otel/sap-opentelemetry-case-study.md | 23 +++++++++++++++++ 8 files changed, 176 insertions(+), 7 deletions(-) create mode 100644 resources/tech/otel/alibaba-opentelemetry-journey.md create mode 100644 resources/tech/otel/evolution-of-observability-practices.md create mode 100644 resources/tech/otel/observability-2-0-honeycomb.md create mode 100644 resources/tech/otel/opentelemetry-qa-hazel-weakly.md create mode 100644 resources/tech/otel/opentelemetry-qa-jennifer-moore.md create mode 100644 resources/tech/otel/pax8-observability-2-0-case-study.md create mode 100644 resources/tech/otel/sap-opentelemetry-case-study.md diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index a25ca28..d89cf37 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -26,7 +26,7 @@ Some popular OpenTelemetry-compatible platforms include: ## How to Gain Traction ### Start with Education & a Champion -Begin by aligning teams on what OTel is and why it matters. Share Charity Majors’ framing of [“Observability 2.0”](https://www.honeycomb.io/blog/one-key-difference-observability1dot0-2dot0) to shift mindsets from three pillars to structured events. Secure an executive or senior engineer to sponsor the adoption — without a champion, efforts often stall. +Begin by aligning teams on what OTel is and why it matters. Share Charity Majors’ framing of [“Observability 2.0”](/resources/tech/otel/observability-2-0-honeycomb.md) to shift mindsets from three pillars to structured events. Secure an executive or senior engineer to sponsor the adoption — without a champion, efforts often stall. ### Establish a Shared Repository Set up a dedicated observability repo managed like an open-source project. 
Include schema definitions, testing rules, a README with setup instructions, and usage guidelines. Lock down standards (e.g., TypeScript interfaces for attributes) so all services produce consistent data. @@ -45,14 +45,14 @@ Surface dashboards that answer the questions execs and devs already care about: Demonstrating visible impact in days builds credibility and momentum. ## Lessons From The Field -- _Champion Power Matters_ – OTel takes root best when there is a champion to lead the charge. Without a sponsor who can hold the line, expect stall-outs. See [OTel in Practice: Alibaba's OpenTelemetry Journey](https://www.youtube.com/watch?v=fgbB0HhVBq8) -- _Schemas Prevent Chaos_ – Defining and versioning attribute schemas up front (e.g., TypeScript interfaces) ensures data is comparable across services. Teams that skip this get unmanageable dashboards. See [OpenTelemetry Q&A Featuring Hazel Weakly](https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-09-13T19:26:10Z-opentelemetry-q-a-feat-hazel-weakly.md) -- _Correlating External Tools is an Opportunity_ – Instead of resisting siloed tools, smart teams embed their IDs into traces, creating a de-facto unified model. This turns “annoying fragmentation” into a source of leverage. See [The Evolution of Observability Practices](https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-10-11T20:52:28Z-the-evolution-of-observability-practices.md) +- _Champion Power Matters_ – OTel takes root best when there is a champion to lead the charge. Without a sponsor who can hold the line, expect stall-outs. See [OTel in Practice: Alibaba’s OpenTelemetry Journey](/resources/tech/otel/alibaba-opentelemetry-journey.md). +- _Schemas Prevent Chaos_ – Defining and versioning attribute schemas up front (e.g., TypeScript interfaces) ensures data is comparable across services. Teams that skip this get unmanageable dashboards. See [OpenTelemetry Q&A Featuring Hazel Weakly](/resources/tech/otel/opentelemetry-qa-hazel-weakly.md) +- _Correlating External Tools is an Opportunity_ – Instead of resisting siloed tools, smart teams embed their IDs into traces, creating a de-facto unified model. This turns “annoying fragmentation” into a source of leverage. See [The Evolution of Observability Practices](/resources/tech/otel/evolution-of-observability-practices.md) - _Telemetry Surfaces Politics_ – OTel exposes bottlenecks and ownership gaps. In bureaucratic cultures, this requires social skill: frame insights as opportunities, not punishments. -- _Developer Buy-in Comes From On-Call Relief_ – The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. See [OTel Q&A Featuring Jennifer Moore](https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-12-04T23:34:44Z-otel-q-a-feat-jennifer-moore.md) -- _SAP’s Massive-Scale Modernization_ – SAP revamped its observability architecture across a fleet of 11,000+ OpenSearch instances by adopting OpenTelemetry and rebuilding ingestion pipelines with Data Prepper. This enabled unified logs, metrics, and traces, provided low-risk migration to OpenSearch 2.x, and dramatically sped up incident response. 
[SAP Case Study](https://opensearch.org/blog/case-study-sap-unifies-observability-at-scale-with-opensearch-and-opentelemetry/) -- _Pax8 Unleashes Curious, Cost‑Effective Instrumentation_ – By moving to Honeycomb’s Observability 2.0 platform, Pax8 empowered engineers, product managers, and ops with structured telemetry and dropped their observability costs by 30%. Their user base grew from 50 to 210 users, democratizing access without breaking the budget. [Pax8 Case Study](https://www.honeycomb.io/resources/case-studies/pax8-modern-observability-2-0-solution) +- _Developer Buy-in Comes From On-Call Relief_ – The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. See [OpenTelemetry Q&A Featuring Jennifer Moore](/resources/tech/otel/opentelemetry-qa-jennifer-moore.md) +- _SAP’s Massive-Scale Modernization_ – SAP revamped its observability architecture across a fleet of 11,000+ OpenSearch instances by adopting OpenTelemetry and rebuilding ingestion pipelines with Data Prepper. This enabled unified logs, metrics, and traces, provided low-risk migration to OpenSearch 2.x, and dramatically sped up incident response. [SAP Case Study: Massive-Scale Observability Modernization](/resources/tech/otel/sap-opentelemetry-case-study.md) +- _Pax8 Unleashes Curious, Cost‑Effective Instrumentation_ – By moving to Honeycomb’s Observability 2.0 platform, Pax8 empowered engineers, product managers, and ops with structured telemetry and dropped their observability costs by 30%. Their user base grew from 50 to 210 users, democratizing access without breaking the budget. [Pax8 Case Study: Democratizing Observability 2.0](/resources/tech/otel/pax8-observability-2-0-case-study.md) ## Deciding to Polish or Pitch diff --git a/resources/tech/otel/alibaba-opentelemetry-journey.md b/resources/tech/otel/alibaba-opentelemetry-journey.md new file mode 100644 index 0000000..54efe75 --- /dev/null +++ b/resources/tech/otel/alibaba-opentelemetry-journey.md @@ -0,0 +1,25 @@ +# OTel in Practice: Alibaba’s OpenTelemetry Journey + +Resource type: Video + +Video: https://www.youtube.com/watch?v=fgbB0HhVBq8 + +## What it’s about + +This talk from Alibaba engineers describes their journey implementing OpenTelemetry at scale. It covers the challenges of migrating from proprietary monitoring stacks, building internal champions, and standardizing telemetry across diverse services. + +## Why it’s worth watching + +It’s a rare enterprise case study told from the inside — with practical detail on both technical hurdles and organizational dynamics. Alibaba shows that OTel adoption requires not just instrumentation, but strong leadership and persistence to overcome skepticism. + +## Pause and Ponder + +- Who in our org has the authority and conviction to enforce OTel adoption? +- Which legacy systems or processes would create the same friction Alibaba faced? +- How can we ensure telemetry remains comparable across hundreds of services? +- What objections might surface from entrenched teams or leaders, and how could we frame the benefits? +- What governance model would help us scale adoption without chaos? + +## Takeaway + +OTel adoption succeeds when backed by champions with enough authority to enforce change. Alibaba’s journey underscores the socio-technical nature of observability: success depends as much on politics as on code. 
\ No newline at end of file diff --git a/resources/tech/otel/evolution-of-observability-practices.md b/resources/tech/otel/evolution-of-observability-practices.md new file mode 100644 index 0000000..53dfa83 --- /dev/null +++ b/resources/tech/otel/evolution-of-observability-practices.md @@ -0,0 +1,25 @@ +# The Evolution of Observability Practices + +Resource type: Video + Transcript + +Video: https://www.youtube.com/watch?v=8sTZnM2BC1U +Transcript: https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-10-11T20:52:28Z-the-evolution-of-observability-practices.md + +## What it’s about + +This session explores how observability practices have shifted from “three pillars” thinking to the more modern OTel-inspired view of wide structured events. Panelists discuss lessons from real organizations: what worked, what failed, and how they bridged siloed tools. + +## Why it’s worth watching/reading + +It connects the dots between OTel’s philosophy and lived team experience. The candid discussion about “tools teams won’t give up” makes it clear that correlation, not replacement, is the pragmatic path forward. + +## Pause and Ponder + +- Where are we still treating metrics/logs/traces as separate in our own stack? +- Which silos exist in our org today, and how could IDs/tags correlate them? +- What advantages do we gain from capturing everything in a single structured format? +- Which success patterns resonate most with our org’s culture? + +## Takeaway + +Observability practices evolve slowly, and teams rarely throw tools away. Success comes from correlation across silos, not dogmatic purity. \ No newline at end of file diff --git a/resources/tech/otel/observability-2-0-honeycomb.md b/resources/tech/otel/observability-2-0-honeycomb.md new file mode 100644 index 0000000..0d0b0c6 --- /dev/null +++ b/resources/tech/otel/observability-2-0-honeycomb.md @@ -0,0 +1,23 @@ +# Observability 2.0 (Honeycomb Blog) + +Resource type: Blog Post + +https://www.honeycomb.io/blog/one-key-difference-observability1dot0-2dot0 + +## What it’s about + +Honeycomb lays out the core distinction between “Observability 1.0” (centered on dashboards, metrics, and alerting) and “Observability 2.0” (centered on wide, structured events that allow flexible, ad-hoc exploration). The post argues that OTel is enabling this shift across the industry. + +## Why it’s worth reading + +This is the canonical articulation of what “Observability 2.0” means. Many teams still think observability is just monitoring with more dashboards; this article helps reframe it as a fundamentally different practice. + +## Pause and Ponder + +- Are we treating observability as pre-built dashboards, or as a flexible tool for answering new questions? +- How could wide structured events help us see connections we’re missing today? +- Where in our org would shifting from “1.0” to “2.0” make the biggest cultural impact? + +## Takeaway + +Observability 2.0 isn’t just more data — it’s a mindset shift. Wide structured events let engineers ask questions they didn’t anticipate, turning telemetry from a static dashboard into a dynamic conversation. 
\ No newline at end of file diff --git a/resources/tech/otel/opentelemetry-qa-hazel-weakly.md b/resources/tech/otel/opentelemetry-qa-hazel-weakly.md new file mode 100644 index 0000000..e9414f9 --- /dev/null +++ b/resources/tech/otel/opentelemetry-qa-hazel-weakly.md @@ -0,0 +1,25 @@ +# OpenTelemetry Q&A Featuring Hazel Weakly + +Resource type: Video + Transcript + +Video: https://www.youtube.com/watch?v=wMJEgrUnX7M +Transcript: https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-09-13T19:26:10Z-opentelemetry-q-a-feat-hazel-weakly.md + +## What it’s about + +Hazel Weakly shares hard-won lessons from early OTel implementations. The Q&A touches on schema design, avoiding fragmentation, and how to get engineers comfortable with the shift from “logs/metrics/traces” to a unified event model. + +## Why it’s worth watching/reading + +It’s practical guidance from someone in the trenches, not just theory. Hazel’s framing of schema stability as the single most important adoption factor is a must-hear for any team starting out with OTel. + +## Pause and Ponder + +- How can we signal breaking schema changes clearly to devs consuming telemetry? +- When is it safe to add custom attributes, and when should we standardize them? +- What governance models help keep telemetry consistent across many repos? +- What cultural frictions are likely blockers for adoption in our org? + +## Takeaway + +OTel adoption lives or dies on schema discipline. Versioning and governance aren’t “nice to haves” — they’re the foundation that keeps telemetry comparable across services. \ No newline at end of file diff --git a/resources/tech/otel/opentelemetry-qa-jennifer-moore.md b/resources/tech/otel/opentelemetry-qa-jennifer-moore.md new file mode 100644 index 0000000..f38d960 --- /dev/null +++ b/resources/tech/otel/opentelemetry-qa-jennifer-moore.md @@ -0,0 +1,25 @@ +# OpenTelemetry Q&A Featuring Jennifer Moore + +Resource type: Video + Transcript + +Video: https://www.youtube.com/watch?v=fRaWavw0T5c +Transcript: https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-12-04T23:34:44Z-otel-q-a-feat-jennifer-moore.md + +## What it’s about + +Jennifer Moore shares her experiences building developer buy-in for OTel. The Q&A covers alert fatigue, on-call culture, and how to position telemetry as a tool that empowers developers rather than burdens them. + +## Why it’s worth watching/reading + +It highlights the *human side* of observability. Jennifer explains how OTel can win over skeptics by reducing 2am false alarms and giving developers context they actually care about. + +## Pause and Ponder + +- How much noisy or low-value telemetry do we currently have, and how does it affect on-call rotations? +- What would it look like for devs in our org to self-serve dashboards instead of waiting on ops? +- How fast is our deploy-to-insight cycle today, and what would “minutes, not days” unlock? +- How do we make telemetry feel like relief, not surveillance? + +## Takeaway + +Developer adoption hinges on relief. When OTel reduces pointless alerts and empowers self-service, it transforms from “extra work” into a quality-of-life improvement. 
\ No newline at end of file diff --git a/resources/tech/otel/pax8-observability-2-0-case-study.md b/resources/tech/otel/pax8-observability-2-0-case-study.md new file mode 100644 index 0000000..5483a6c --- /dev/null +++ b/resources/tech/otel/pax8-observability-2-0-case-study.md @@ -0,0 +1,23 @@ +# Pax8 Case Study: Democratizing Observability 2.0 + +Resource type: Case Study + +https://www.honeycomb.io/resources/case-studies/pax8-modern-observability-2-0-solution + +## What it’s about + +Pax8 shifted from traditional observability tools with user-based pricing to Honeycomb’s Observability 2.0 platform. By adopting wide structured events and event-based pricing, they unlocked access for their entire engineering org while reducing costs by 30%. + +## Why it’s worth reading + +Pax8 highlights the cultural side of Observability 2.0. Their story shows how democratizing access — letting *everyone* explore telemetry data, not just senior engineers — boosts curiosity, experimentation, and shared ownership of reliability. It’s also a concrete proof point that OTel-based approaches can *lower* costs, not just add more. + +## Pause and Ponder + +- How does pricing and accessibility shape who actually uses observability data in your org? +- What happens when product managers and junior developers can explore telemetry, not just ops leads? +- What trade-offs did Pax8 make to gain both cost efficiency and coverage? + +## Takeaway + +Observability 2.0 isn’t only about better data — it’s about enabling more people to use that data. Pax8 proved that when access broadens, culture shifts toward curiosity and resilience. \ No newline at end of file diff --git a/resources/tech/otel/sap-opentelemetry-case-study.md b/resources/tech/otel/sap-opentelemetry-case-study.md new file mode 100644 index 0000000..538d6d5 --- /dev/null +++ b/resources/tech/otel/sap-opentelemetry-case-study.md @@ -0,0 +1,23 @@ +# SAP Case Study: Massive-Scale Observability Modernization + +Resource type: Case Study + +https://opensearch.org/blog/case-study-sap-unifies-observability-at-scale-with-opensearch-and-opentelemetry/ + +## What it’s about + +SAP overhauled observability across 11,000+ OpenSearch instances by adopting OpenTelemetry and rebuilding ingestion pipelines with Data Prepper. They unified logs, metrics, and traces into a consistent telemetry model while migrating from OpenSearch 1.x to 2.x with minimal disruption. + +## Why it’s worth reading + +This case study shows what OTel adoption looks like at *extreme* scale — in a heavily regulated, enterprise-grade environment. It’s proof that the unified telemetry model isn’t just theory or startup hype: it works even in massive, legacy-heavy organizations. The migration strategy also provides concrete patterns for reducing risk during major observability shifts. + +## Pause and Ponder + +- How did SAP manage both modernization and migration without service disruption? +- What role did standardization (schemas, pipelines) play in making 11,000+ instances manageable? +- Which of SAP’s approaches could scale down to smaller orgs? + +## Takeaway + +OpenTelemetry provides a path for unifying observability across even the largest, most complex enterprises — not just greenfield startups. SAP’s success shows that with the right abstractions and migration planning, scale is not a blocker but an opportunity. 
\ No newline at end of file From 35997bac7a7fea4d48e80e8424712ef52d77d7aa Mon Sep 17 00:00:00 2001 From: nicoletache Date: Fri, 5 Sep 2025 09:47:20 -0500 Subject: [PATCH 030/131] edits to otel practice --- practices/open-telemetry-practice.md | 51 ++++++++++++---------------- 1 file changed, 22 insertions(+), 29 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index d89cf37..2c1e6b1 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -8,84 +8,80 @@ > correlating signals across the stack, and letting developers see their impact in real time. -Observability has never been more important — or more complex. [OpenTelemetry](https://opentelemetry.io/) (OTel) is the open-source standard for collecting telemetry data across services. Instead of juggling separate tools for logs, metrics, and traces, OTel helps teams consolidate context into a single structured format. This gives developers and execs the same “source of truth” about how systems behave — enabling faster debugging, richer product insights, and a direct link between engineering work and business outcomes - without vendor lock-in. +For modern software teams, observability has never been more important — or more complex. [OpenTelemetry](https://opentelemetry.io/) (OTel) is the open-source standard for collecting telemetry data across services. Instead of juggling separate tools for logs, metrics, and traces, OTel helps teams consolidate context into a single structured format. This gives developers and executives the same “source of truth” about how systems behave, which enables faster debugging, richer product insights, and a direct link between engineering work and business outcomes - all without vendor lock-in. Some popular OpenTelemetry-compatible platforms include: - [Honeycomb](https://www.honeycomb.io/) – built around wide, structured events and exploratory debugging - [Grafana Tempo](https://grafana.com/oss/tempo/) – integrates with the Grafana stack for trace visualization -- [Datadog](https://www.datadoghq.com/), [New Relic](https://newrelic.com/), [Dynatrace](https://www.dynatrace.com/) – commercial observability vendors with native OTel support +- [Datadog](https://www.datadoghq.com/), [New Relic](https://newrelic.com/), and [Dynatrace](https://www.dynatrace.com/) – commercial observability vendors with native OTel support - [Uptrace](https://uptrace.dev/) – an OTel-native open-source observability backend ## When to Experiment -- **Developers** need consistent telemetry that makes debugging easier and helps them see the real-world impact of their code. -- **Ops / SREs** need to reduce alert fatigue, catch issues before customers do, and correlate signals across environments. -- **Technical leaders** need observability that scales without vendor lock-in and provides flexibility to adapt tools over time. -- **Non-technical executives** need a trustworthy way to connect system health to business metrics like bookings, throughput, or customer retention. +- "I am a developer and I need to ensure consistent telemetry so that I can more easily debug and see the real-world impact of my code." +- "I am an Ops / SRE and I need to [learn to / ensure that] so that I can reduce alert fatigue, catch issues before customers do, and correlate signals across environments." +- "I am a technical leader and I need to ensure we have an observability tool that scales without vendor lock-in and remains flexibile over time." 
+- "I am a non-technical executive and I need to ensure the connection between system health and business metrics is reliable so that we have an accurate snapshot of bookings, throughput, and customer retention." ## How to Gain Traction ### Start with Education & a Champion -Begin by aligning teams on what OTel is and why it matters. Share Charity Majors’ framing of [“Observability 2.0”](/resources/tech/otel/observability-2-0-honeycomb.md) to shift mindsets from three pillars to structured events. Secure an executive or senior engineer to sponsor the adoption — without a champion, efforts often stall. +Begin by aligning teams on what OTel is and why it matters. Share Charity Majors’ blog post framing [“Observability 2.0”](/resources/tech/otel/observability-2-0-honeycomb.md) to begin to shift mindsets from three pillars to structured events. Secure an executive or senior engineer to sponsor the adoption — without a champion, efforts often stall. See [OTel in Practice: Alibaba’s OpenTelemetry Journey](/resources/tech/otel/alibaba-opentelemetry-journey.md). ### Establish a Shared Repository -Set up a dedicated observability repo managed like an open-source project. Include schema definitions, testing rules, a README with setup instructions, and usage guidelines. Lock down standards (e.g., TypeScript interfaces for attributes) so all services produce consistent data. +Set up a dedicated observability repo managed like an open-source project. Include schema definitions, testing rules, a README with setup instructions, and usage guidelines. Lock down standards (e.g., TypeScript interfaces for attributes) so all services produce consistent data. Teams that skip this get unmanageable dashboards. See [OpenTelemetry Q&A Featuring Hazel Weakly](/resources/tech/otel/opentelemetry-qa-hazel-weakly.md). ### Pilot with Auto-Instrumentation -Begin with OTel’s auto-instrumentation libraries to generate traces quickly. Expect noise: tune by suppressing low-value metrics and layering abstractions. Wrap complex packages in simple helpers so other developers can adopt without digging into raw node modules. +Begin with OTel’s auto-instrumentation libraries to generate traces quickly. Expect noise: Tune by suppressing low-value metrics and layering abstractions. Wrap complex packages in simple helpers so other developers can adopt without digging into raw node modules. ### Correlate Across Tools -Don’t fight the fact that teams have “their favorite tool.” Instead, enrich OTel spans with IDs and references from those tools. This builds connective tissue and lets people see how siloed data relates inside a unified telemetry stream. +Don’t fight the fact that teams have “their favorite tool.” Instead, enrich OTel spans with IDs and references from those tools. This builds connective tissue and lets people see how siloed data relates inside a unified telemetry stream. See [The Evolution of Observability Practices](/resources/tech/otel/evolution-of-observability-practices.md). ### Show Quick Wins -Surface dashboards that answer the questions execs and devs already care about: +The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. Surface dashboards that answer the questions executives and developers care about: - “Where are users dropping off?” - “What query is slowing down checkout?” - “Is this issue ours or the third party’s?” -Demonstrating visible impact in days builds credibility and momentum. 
+Demonstrating visible impact in days builds credibility and momentum. See [OpenTelemetry Q&A Featuring Jennifer Moore](/resources/tech/otel/opentelemetry-qa-jennifer-moore.md). ## Lessons From The Field -- _Champion Power Matters_ – OTel takes root best when there is a champion to lead the charge. Without a sponsor who can hold the line, expect stall-outs. See [OTel in Practice: Alibaba’s OpenTelemetry Journey](/resources/tech/otel/alibaba-opentelemetry-journey.md). -- _Schemas Prevent Chaos_ – Defining and versioning attribute schemas up front (e.g., TypeScript interfaces) ensures data is comparable across services. Teams that skip this get unmanageable dashboards. See [OpenTelemetry Q&A Featuring Hazel Weakly](/resources/tech/otel/opentelemetry-qa-hazel-weakly.md) -- _Correlating External Tools is an Opportunity_ – Instead of resisting siloed tools, smart teams embed their IDs into traces, creating a de-facto unified model. This turns “annoying fragmentation” into a source of leverage. See [The Evolution of Observability Practices](/resources/tech/otel/evolution-of-observability-practices.md) - _Telemetry Surfaces Politics_ – OTel exposes bottlenecks and ownership gaps. In bureaucratic cultures, this requires social skill: frame insights as opportunities, not punishments. -- _Developer Buy-in Comes From On-Call Relief_ – The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. See [OpenTelemetry Q&A Featuring Jennifer Moore](/resources/tech/otel/opentelemetry-qa-jennifer-moore.md) -- _SAP’s Massive-Scale Modernization_ – SAP revamped its observability architecture across a fleet of 11,000+ OpenSearch instances by adopting OpenTelemetry and rebuilding ingestion pipelines with Data Prepper. This enabled unified logs, metrics, and traces, provided low-risk migration to OpenSearch 2.x, and dramatically sped up incident response. [SAP Case Study: Massive-Scale Observability Modernization](/resources/tech/otel/sap-opentelemetry-case-study.md) -- _Pax8 Unleashes Curious, Cost‑Effective Instrumentation_ – By moving to Honeycomb’s Observability 2.0 platform, Pax8 empowered engineers, product managers, and ops with structured telemetry and dropped their observability costs by 30%. Their user base grew from 50 to 210 users, democratizing access without breaking the budget. [Pax8 Case Study: Democratizing Observability 2.0](/resources/tech/otel/pax8-observability-2-0-case-study.md) +- _SAP’s Massive-Scale Modernization_ – SAP revamped its observability architecture across a fleet of 11,000+ OpenSearch instances by adopting OpenTelemetry and rebuilding ingestion pipelines with Data Prepper. This enabled unified logs, metrics, and traces, provided low-risk migration to OpenSearch 2.x, and dramatically sped up incident response. See [SAP Case Study: Massive-Scale Observability Modernization](/resources/tech/otel/sap-opentelemetry-case-study.md). +- _Pax8 Unleashes Curious, Cost‑Effective Instrumentation_ – By moving to Honeycomb’s Observability 2.0 platform, Pax8 empowered engineers, product managers, and ops with structured telemetry and dropped their observability costs by 30%. Their user base grew from 50 to 210 users, democratizing access without breaking the budget. See [Pax8 Case Study: Democratizing Observability 2.0](/resources/tech/otel/pax8-observability-2-0-case-study.md). 
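The shared-repository step above mentions locking down attribute standards with TypeScript interfaces. A minimal sketch of what that could look like follows; the attribute names, the checkout scenario, and the `setCheckoutAttributes` helper are illustrative assumptions rather than an established schema.

```typescript
import { Span } from '@opentelemetry/api';

// Version the schema like an API so breaking changes are explicit and visible.
export const TELEMETRY_SCHEMA_VERSION = '1.0.0';

// The shared repo, not each service, decides which fields exist and what they are called.
export interface CheckoutAttributes {
  userId: string;
  cartValueCents: number;
  paymentProvider: 'stripe' | 'paypal' | 'other';
}

// Services call this helper instead of sprinkling ad-hoc span.setAttribute() strings,
// so every team emits the same field names and types.
export function setCheckoutAttributes(span: Span, attrs: CheckoutAttributes): void {
  span.setAttribute('app.schema_version', TELEMETRY_SCHEMA_VERSION);
  span.setAttribute('app.user_id', attrs.userId);
  span.setAttribute('app.cart_value_cents', attrs.cartValueCents);
  span.setAttribute('app.payment_provider', attrs.paymentProvider);
}
```

Keeping helpers like this in the observability repo is one way to make the "consistent data" goal concrete without asking every service team to learn the raw OTel API.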
## Deciding to Polish or Pitch -After experimenting with this practice for **4–5 weeks**, bring the team together and ensure the following metrics and/or signals have changed in a positive direction: +After experimenting with this practice for **4–5 weeks**, bring the team together and determine whether the following metrics and/or signals have changed in a positive direction: ### Fast & Measurable -- **Reduced Debug Time** Incidents should resolve faster once OTel is in place. Track cycle time or ticket to resolution time via [Jira](https://atlassian.com/software/jira)/incident postmortems. -- **Shorter Deployment Feedback Loops.** Expect 10–15 minute telemetry updates instead of waiting hours or days. +- **Reduced Debug Time** Incidents should resolve faster once OTel is in place. Track cycle time or ticket-to-resolution time via [Jira](https://atlassian.com/software/jira)/incident postmortems. +- **Shorter Deployment Feedback Loops.** Expect 10– to 15-minute telemetry updates instead of waiting hours or days. ### Fast & Intangible - **Developer Sentiment** Look for fewer “wake-up-for-nothing” complaints in retros or Slack chatter. - **Dashboard Adoption** Track how many non-admins confidently create their own dashboards. ### Slow & Measurable -- **Reduced Vendor Dependence.** Track spend on observability tools or # of dashboards maintained. +- **Reduced Vendor Dependence.** Track spend on observability tools or number of dashboards maintained. - **Product KPIs** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. ### Slow & Intangible -- **Cross-Team Trust.** Do PMs and execs begin referencing telemetry data in decision-making instead of anecdote? +- **Cross-Team Trust.** Do PMs and executives begin referencing telemetry data in decision-making instead of anecdotes? ## Supporting Capabilities ### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) -Enables proactive observability across systems. Fast feedback loops mean catching issues before they reach the user. +Otel enables proactive observability across systems. Fast feedback loops mean catching issues before they reach the user. ### [Monitoring & Observability](/capabilities/monitoring-and-observability.md) -Otel is the open source standard for collecting telemetry data across services and utilizes a unified model to optimize observability in the modern tech landscape. +Otel is the open-source standard for collecting telemetry data across services and uses a unified model to optimize observability in the modern tech landscape. ### [Continuous Delivery](/capabilities/continuous-delivery.md) OTel enables faster, safer deploys by providing near-real-time feedback loops — developers can see the impact of changes minutes after release. ### [Team Experimentation](/capabilities/team-experimentation.md) -Unified telemetry lets devs run safe experiments (optimize queries, adjust configs) and immediately measure business impact. +Unified telemetry lets developers run safe experiments (optimize queries, adjust configs) and immediately measure business impact. ### [Code Maintainability](/capabilities/code-maintainability.md) Consistent observability abstractions act as shared infrastructure patterns, helping teams manage complexity across many repos. 
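The "Correlate Across Tools" step above suggests enriching spans with identifiers from the tools teams already use. Here is a small sketch of that idea, assuming hypothetical Jira-ticket, feature-flag, and support-case fields; the attribute names are placeholders, not a standard.

```typescript
import { trace } from '@opentelemetry/api';

// Stamp identifiers from existing tools onto the active span so siloed data
// can be joined later in whichever backend receives the telemetry.
export function tagExternalContext(refs: {
  jiraTicket?: string;
  featureFlag?: string;
  supportCaseId?: string;
}): void {
  const span = trace.getActiveSpan();
  if (!span) return; // no active trace (e.g., in tests), nothing to enrich

  if (refs.jiraTicket) span.setAttribute('external.jira.ticket', refs.jiraTicket);
  if (refs.featureFlag) span.setAttribute('external.feature_flag', refs.featureFlag);
  if (refs.supportCaseId) span.setAttribute('external.support.case_id', refs.supportCaseId);
}
```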
@@ -93,6 +89,3 @@ Consistent observability abstractions act as shared infrastructure patterns, hel ### [Job Satisfaction](/capabilities/job-satisfaction.md) Reducing false alarms and giving developers visibility into their real impact improves morale and reduces burnout. - - - From f4eec6166e900276354303449cc11e868bb492af Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Tue, 28 Oct 2025 20:01:41 -0700 Subject: [PATCH 031/131] Re-work OTel practice to simplify and respond to comments --- practices/open-telemetry-practice.md | 83 +++++++++---------- .../otel/alibaba-opentelemetry-journey.md | 14 ++-- ...ing-better-questions-with-opentelemetry.md | 54 ++++++++++++ .../evolution-of-observability-practices.md | 25 ------ .../tech/otel/observability-2-0-honeycomb.md | 8 +- .../otel/opentelemetry-qa-hazel-weakly.md | 25 ------ .../otel/opentelemetry-qa-jennifer-moore.md | 25 ------ .../otel/pax8-observability-2-0-case-study.md | 23 ----- .../tech/otel/sap-opentelemetry-case-study.md | 23 ----- 9 files changed, 105 insertions(+), 175 deletions(-) create mode 100644 resources/tech/otel/asking-better-questions-with-opentelemetry.md delete mode 100644 resources/tech/otel/evolution-of-observability-practices.md delete mode 100644 resources/tech/otel/opentelemetry-qa-hazel-weakly.md delete mode 100644 resources/tech/otel/opentelemetry-qa-jennifer-moore.md delete mode 100644 resources/tech/otel/pax8-observability-2-0-case-study.md delete mode 100644 resources/tech/otel/sap-opentelemetry-case-study.md diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 2c1e6b1..629bb82 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -1,55 +1,53 @@ -# Adopt OpenTelemetry for Unified Observability +# Adopt the OpenTelemetry Standard -> **Practice framing:** -> This practice captures a socio-technical shift underway in modern software teams. -> Observability is moving from the “three pillars” (logs, metrics, traces stored separately) -> to a unified model where all signals are captured as wide, structured events. -> OpenTelemetry (OTel) is the open-source standard enabling this shift — reducing vendor lock-in, -> correlating signals across the stack, and letting developers see their impact in real time. +Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it’s hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. +Without a shared standard, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. Doing so would create fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with a common standard and open-source tools for most major languages. Teams can instrument their systems consistently and send data to a central monitoring system (like [Honeycomb](https://www.honeycomb.io/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), [Uptrace](https://uptrace.dev/), etc). Since most popular monitoring systems support hte OTel format, teams can switch platforms without major disruptions. -For modern software teams, observability has never been more important — or more complex. 
[OpenTelemetry](https://opentelemetry.io/) (OTel) is the open-source standard for collecting telemetry data across services. Instead of juggling separate tools for logs, metrics, and traces, OTel helps teams consolidate context into a single structured format. This gives developers and executives the same “source of truth” about how systems behave, which enables faster debugging, richer product insights, and a direct link between engineering work and business outcomes - all without vendor lock-in. - -Some popular OpenTelemetry-compatible platforms include: -- [Honeycomb](https://www.honeycomb.io/) – built around wide, structured events and exploratory debugging -- [Grafana Tempo](https://grafana.com/oss/tempo/) – integrates with the Grafana stack for trace visualization -- [Datadog](https://www.datadoghq.com/), [New Relic](https://newrelic.com/), and [Dynatrace](https://www.dynatrace.com/) – commercial observability vendors with native OTel support -- [Uptrace](https://uptrace.dev/) – an OTel-native open-source observability backend +When the OpenTelemetry standard is adopted, teams can see how requests move through the system. Scattered logs and isolated metrics become a single, connected view of system behavior. It shows where time is spent, where failures occur, and how components interact. With that visibility debugging is faster, performance work is more deliberate, and improvement efforts are based on evidence rather than hunches. ## When to Experiment -- "I am a developer and I need to ensure consistent telemetry so that I can more easily debug and see the real-world impact of my code." -- "I am an Ops / SRE and I need to [learn to / ensure that] so that I can reduce alert fatigue, catch issues before customers do, and correlate signals across environments." -- "I am a technical leader and I need to ensure we have an observability tool that scales without vendor lock-in and remains flexibile over time." -- "I am a non-technical executive and I need to ensure the connection between system health and business metrics is reliable so that we have an accurate snapshot of bookings, throughput, and customer retention." +- You are a developer who needs to keep systems operational and performant. +- You are a QA who needs to ensure changes don't introduce systemic failures. +- You are a product leader who needs to track how various changes (or experiments) are affecting the user experience. +- You are an engineering leader who wants to improve the reliability of the overall system. ## How to Gain Traction -### Start with Education & a Champion -Begin by aligning teams on what OTel is and why it matters. Share Charity Majors’ blog post framing [“Observability 2.0”](/resources/tech/otel/observability-2-0-honeycomb.md) to begin to shift mindsets from three pillars to structured events. Secure an executive or senior engineer to sponsor the adoption — without a champion, efforts often stall. See [OTel in Practice: Alibaba’s OpenTelemetry Journey](/resources/tech/otel/alibaba-opentelemetry-journey.md). +1. Secure a Champion From Leadership + +Every successful OpenTelemetry rollout begins with executive sponsorship. Adopting the OpenTelemetry Standard often requires significant time and budget, competes with other organizational priorities, and may face cultural resistance. It's helpful to have a leader who can connect the work to measurable business goals and clear obstacles when resistance appears. 
Use [Alibaba’s OpenTelemetry journey](/resources/tech/otel/alibaba-opentelemetry-journey.md) as a reference point; it helps leaders understand both the early friction and the long-term payoff of adopting a shared telemetry standard. + +2. Form a Small Cross-Functional Team + +Once leadership is aligned, assemble a small pilot team capable of working across boundaries (backend, frontend, data pipelines, infrastructure, etc). Before starting any technical work, make sure this group shares a common understanding of why observability matters and what "good telemetry" looks like. Use [Charity Majors' Observability 2.0](/resources/tech/otel/observability-2-0-honeycomb.md) and [Asking Better Questions with OpenTelemetry](/resources/tech/otel/asking-better-questions-with-opentelemetry.md) to align on what data should be emitted, how it will be structured, and how teams will use it to ask better questions, not just build prettier dashboards. + +3. Establish a Foundational Repository + +Create a single observability foundation repository that makes OpenTelemetry adoption simple and consistent. Include shared libraries that wrap the OTel SDK, a common telemetry schema for naming and structure, and helper functions that auto-populate useful context like request IDs and build versions. -### Establish a Shared Repository -Set up a dedicated observability repo managed like an open-source project. Include schema definitions, testing rules, a README with setup instructions, and usage guidelines. Lock down standards (e.g., TypeScript interfaces for attributes) so all services produce consistent data. Teams that skip this get unmanageable dashboards. See [OpenTelemetry Q&A Featuring Hazel Weakly](/resources/tech/otel/opentelemetry-qa-hazel-weakly.md). +4. Pilot a Single Path -### Pilot with Auto-Instrumentation -Begin with OTel’s auto-instrumentation libraries to generate traces quickly. Expect noise: Tune by suppressing low-value metrics and layering abstractions. Wrap complex packages in simple helpers so other developers can adopt without digging into raw node modules. +Start with one business-critical request flow and instrument it end-to-end. Choose something visible, like checkout or signup, where results are easy to show. Begin with auto-instrumentation to get traces quickly, then add manual spans where context adds value. Run locally first with a default OTel Collector and simple viewer like Grafana; once stable, deploy to pre-prod and then production. -### Correlate Across Tools -Don’t fight the fact that teams have “their favorite tool.” Instead, enrich OTel spans with IDs and references from those tools. This builds connective tissue and lets people see how siloed data relates inside a unified telemetry stream. See [The Evolution of Observability Practices](/resources/tech/otel/evolution-of-observability-practices.md). +At this stage, success is measured by credibility: OTel should help answer questions that matter to both engineers and business stakeholders — where users drop off, what slows down checkout, and how changes affect conversion. -### Show Quick Wins -The fastest way to win hearts is to show OTel reduces noisy, pointless alerts and makes bugs easier to find. Surface dashboards that answer the questions executives and developers care about: -- “Where are users dropping off?” -- “What query is slowing down checkout?” -- “Is this issue ours or the third party’s?” -Demonstrating visible impact in days builds credibility and momentum. 
See [OpenTelemetry Q&A Featuring Jennifer Moore](/resources/tech/otel/opentelemetry-qa-jennifer-moore.md). +5. Standardize and Expand + +Once the pilot produces consistent, valuable traces, shift focus from proving value to scaling it. Capture what worked (helper functions, schema conventions, and collector configurations) in the foundation repo and make onboarding self-service. Add concise documentation and validation checks so teams can integrate with minimal friction. Assign clear ownership and version the schema like an API to prevent drift. Expand gradually, measuring progress by consistency rather than speed. Each new integration should strengthen the shared signal, not add noise. ## Lessons From The Field -- _Telemetry Surfaces Politics_ – OTel exposes bottlenecks and ownership gaps. In bureaucratic -cultures, this requires social skill: frame insights as opportunities, not punishments. -- _SAP’s Massive-Scale Modernization_ – SAP revamped its observability architecture across a fleet of 11,000+ OpenSearch instances by adopting OpenTelemetry and rebuilding ingestion pipelines with Data Prepper. This enabled unified logs, metrics, and traces, provided low-risk migration to OpenSearch 2.x, and dramatically sped up incident response. See [SAP Case Study: Massive-Scale Observability Modernization](/resources/tech/otel/sap-opentelemetry-case-study.md). -- _Pax8 Unleashes Curious, Cost‑Effective Instrumentation_ – By moving to Honeycomb’s Observability 2.0 platform, Pax8 empowered engineers, product managers, and ops with structured telemetry and dropped their observability costs by 30%. Their user base grew from 50 to 210 users, democratizing access without breaking the budget. See [Pax8 Case Study: Democratizing Observability 2.0](/resources/tech/otel/pax8-observability-2-0-case-study.md). +**Quick Wins Build Momentum** Visibility improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in. + +**Telemetry Surfaces Politics** OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as shared opportunities, not personal failings. + +**Standardization Is Crucial** Without schema discipline, dashboards turn chaotic within weeks. Treat naming and attribute drift as real tech debt. + +**Bridge, Don’t Replace** People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product analytics tooling. OpenTelemetry should compliment that instead of replacing it. + +**Expect Uneven Maturity** Logging support and SDK quality vary by language. Set expectations and plan incremental rollout accordingly. ## Deciding to Polish or Pitch @@ -57,18 +55,18 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ### Fast & Measurable - **Reduced Debug Time** Incidents should resolve faster once OTel is in place. Track cycle time or ticket-to-resolution time via [Jira](https://atlassian.com/software/jira)/incident postmortems. -- **Shorter Deployment Feedback Loops.** Expect 10– to 15-minute telemetry updates instead of waiting hours or days. +- **Shorter Deployment Feedback Loops.** Expect 10– to 15-minute telemetry updates instead of waiting hours or days. ### Fast & Intangible -- **Developer Sentiment** Look for fewer “wake-up-for-nothing” complaints in retros or Slack chatter. 
-- **Dashboard Adoption** Track how many non-admins confidently create their own dashboards. +- **Developer Sentiment** Look for fewer "wake-up-for-nothing" complaints in retros or Slack chatter. +- **Dashboard Adoption** Track how many non-admins confidently create their own dashboards. ### Slow & Measurable -- **Reduced Vendor Dependence.** Track spend on observability tools or number of dashboards maintained. -- **Product KPIs** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. +- **Reduced Vendor Dependence.** Track spend on observability tools or number of dashboards maintained. +- **Product KPIs** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. ### Slow & Intangible -- **Cross-Team Trust.** Do PMs and executives begin referencing telemetry data in decision-making instead of anecdotes? +- **Cross-Team Trust.** Do PMs and executives begin referencing telemetry data in decision-making instead of anecdotes? ## Supporting Capabilities ### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) @@ -88,4 +86,3 @@ Consistent observability abstractions act as shared infrastructure patterns, hel ### [Job Satisfaction](/capabilities/job-satisfaction.md) Reducing false alarms and giving developers visibility into their real impact improves morale and reduces burnout. - diff --git a/resources/tech/otel/alibaba-opentelemetry-journey.md b/resources/tech/otel/alibaba-opentelemetry-journey.md index 54efe75..a5eda4f 100644 --- a/resources/tech/otel/alibaba-opentelemetry-journey.md +++ b/resources/tech/otel/alibaba-opentelemetry-journey.md @@ -2,7 +2,7 @@ Resource type: Video -Video: https://www.youtube.com/watch?v=fgbB0HhVBq8 +Video: https://www.youtube.com/watch?v=fgbB0HhVBq8 ## What it’s about @@ -14,12 +14,12 @@ It’s a rare enterprise case study told from the inside — with practical deta ## Pause and Ponder -- Who in our org has the authority and conviction to enforce OTel adoption? -- Which legacy systems or processes would create the same friction Alibaba faced? -- How can we ensure telemetry remains comparable across hundreds of services? -- What objections might surface from entrenched teams or leaders, and how could we frame the benefits? -- What governance model would help us scale adoption without chaos? +- Who in our org has the authority and conviction to enforce OTel adoption? +- Which legacy systems or processes would create the same friction Alibaba faced? +- How can we ensure telemetry remains comparable across hundreds of services? +- What objections might surface from entrenched teams or leaders, and how could we frame the benefits? +- What governance model would help us scale adoption without chaos? ## Takeaway -OTel adoption succeeds when backed by champions with enough authority to enforce change. Alibaba’s journey underscores the socio-technical nature of observability: success depends as much on politics as on code. \ No newline at end of file +OTel adoption succeeds when backed by champions with enough authority to enforce change. Alibaba’s journey underscores the socio-technical nature of observability: success depends as much on politics as on code. 
diff --git a/resources/tech/otel/asking-better-questions-with-opentelemetry.md b/resources/tech/otel/asking-better-questions-with-opentelemetry.md new file mode 100644 index 0000000..cc0605d --- /dev/null +++ b/resources/tech/otel/asking-better-questions-with-opentelemetry.md @@ -0,0 +1,54 @@ +# Asking Better Questions with OpenTelemetry (Feat. Hazel Weakly) + +Resource type: Video + Transcript + +Video: https://www.youtube.com/watch?v=wMJEgrUnX7M +Transcript: https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-09-13T19:26:10Z-opentelemetry-q-a-feat-hazel-weakly.md + +This Q&A between Adriana Villela and Hazel Weakly explores the hard-won lessons of implementing OpenTelemetry in real-world engineering environments. Hazel's story reveals the difference between "collecting everything" and building a culture that asks meaningful questions through observability. + +## Discussion Guide + +### 1. The Shift from Data Volume to Data Value + +Hazel's team began by flooding their system with 300 million telemetry events per day. The turning point came when they cut that volume by more than 95%, proving that insight comes from curiosity, not collection. + +- What signals do *we* over-collect that no one actually queries? +- What business or reliability questions would justify keeping them? + +### 2. Building a Feedback Loop Between Questions and Instrumentation + +Hazel reframed observability as an evolutionary process: **"asking meaningful questions and getting useful answers."** + +- How can teams treat observability as an iterative feedback loop rather than a one-time setup? +- What signals tell you your current telemetry isn't driving learning? + +### 3. The Ergonomics of Instrumentation + +Hazel describes the friction of manual instrumentation, language inconsistencies (TypeScript vs. Haskell vs. Python), and context-loss across async boundaries. + +- How can platform teams make telemetry ergonomics invisible for developers? +- What would an "OTel starter library" look like in your stack? + +### 4. Cost Pressure as a Forcing Function + +Budget overruns were what finally motivated engineers to refine what they measured. Cost, in this case, became a design constraint that clarified priorities. + +- How might connecting observability to tangible costs sharpen focus on value? + +### 5. Making the Case to Executives + +Hazel stresses that observability investments should show ROI in roughly four months and must tie the developer lifecycle to business value delivery. + +- How do you currently link telemetry outcomes to customer or business impact? +- Could you explain your observability spend to a CFO in one sentence? + +## How This Resource Brings Value + +This conversation moves beyond tutorials and into organizational behavior—the politics, ergonomics, and incentives behind effective observability. It's ideal for: + +- Platform engineers designing shared tracing or metrics frameworks +- Engineering leaders trying to justify observability budgets +- Teams struggling with "too much data, too little insight" + +Use it as a **watch-party or discussion starter** to unpack the social and technical challenges that come with scaling OpenTelemetry from proof-of-concept to practice. 
diff --git a/resources/tech/otel/evolution-of-observability-practices.md b/resources/tech/otel/evolution-of-observability-practices.md deleted file mode 100644 index 53dfa83..0000000 --- a/resources/tech/otel/evolution-of-observability-practices.md +++ /dev/null @@ -1,25 +0,0 @@ -# The Evolution of Observability Practices - -Resource type: Video + Transcript - -Video: https://www.youtube.com/watch?v=8sTZnM2BC1U -Transcript: https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-10-11T20:52:28Z-the-evolution-of-observability-practices.md - -## What it’s about - -This session explores how observability practices have shifted from “three pillars” thinking to the more modern OTel-inspired view of wide structured events. Panelists discuss lessons from real organizations: what worked, what failed, and how they bridged siloed tools. - -## Why it’s worth watching/reading - -It connects the dots between OTel’s philosophy and lived team experience. The candid discussion about “tools teams won’t give up” makes it clear that correlation, not replacement, is the pragmatic path forward. - -## Pause and Ponder - -- Where are we still treating metrics/logs/traces as separate in our own stack? -- Which silos exist in our org today, and how could IDs/tags correlate them? -- What advantages do we gain from capturing everything in a single structured format? -- Which success patterns resonate most with our org’s culture? - -## Takeaway - -Observability practices evolve slowly, and teams rarely throw tools away. Success comes from correlation across silos, not dogmatic purity. \ No newline at end of file diff --git a/resources/tech/otel/observability-2-0-honeycomb.md b/resources/tech/otel/observability-2-0-honeycomb.md index 0d0b0c6..40ff3c8 100644 --- a/resources/tech/otel/observability-2-0-honeycomb.md +++ b/resources/tech/otel/observability-2-0-honeycomb.md @@ -14,10 +14,10 @@ This is the canonical articulation of what “Observability 2.0” means. Many t ## Pause and Ponder -- Are we treating observability as pre-built dashboards, or as a flexible tool for answering new questions? -- How could wide structured events help us see connections we’re missing today? -- Where in our org would shifting from “1.0” to “2.0” make the biggest cultural impact? +- Are we treating observability as pre-built dashboards, or as a flexible tool for answering new questions? +- How could wide structured events help us see connections we’re missing today? +- Where in our org would shifting from “1.0” to “2.0” make the biggest cultural impact? ## Takeaway -Observability 2.0 isn’t just more data — it’s a mindset shift. Wide structured events let engineers ask questions they didn’t anticipate, turning telemetry from a static dashboard into a dynamic conversation. \ No newline at end of file +Observability 2.0 isn’t just more data — it’s a mindset shift. Wide structured events let engineers ask questions they didn’t anticipate, turning telemetry from a static dashboard into a dynamic conversation. 
diff --git a/resources/tech/otel/opentelemetry-qa-hazel-weakly.md b/resources/tech/otel/opentelemetry-qa-hazel-weakly.md deleted file mode 100644 index e9414f9..0000000 --- a/resources/tech/otel/opentelemetry-qa-hazel-weakly.md +++ /dev/null @@ -1,25 +0,0 @@ -# OpenTelemetry Q&A Featuring Hazel Weakly - -Resource type: Video + Transcript - -Video: https://www.youtube.com/watch?v=wMJEgrUnX7M -Transcript: https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-09-13T19:26:10Z-opentelemetry-q-a-feat-hazel-weakly.md - -## What it’s about - -Hazel Weakly shares hard-won lessons from early OTel implementations. The Q&A touches on schema design, avoiding fragmentation, and how to get engineers comfortable with the shift from “logs/metrics/traces” to a unified event model. - -## Why it’s worth watching/reading - -It’s practical guidance from someone in the trenches, not just theory. Hazel’s framing of schema stability as the single most important adoption factor is a must-hear for any team starting out with OTel. - -## Pause and Ponder - -- How can we signal breaking schema changes clearly to devs consuming telemetry? -- When is it safe to add custom attributes, and when should we standardize them? -- What governance models help keep telemetry consistent across many repos? -- What cultural frictions are likely blockers for adoption in our org? - -## Takeaway - -OTel adoption lives or dies on schema discipline. Versioning and governance aren’t “nice to haves” — they’re the foundation that keeps telemetry comparable across services. \ No newline at end of file diff --git a/resources/tech/otel/opentelemetry-qa-jennifer-moore.md b/resources/tech/otel/opentelemetry-qa-jennifer-moore.md deleted file mode 100644 index f38d960..0000000 --- a/resources/tech/otel/opentelemetry-qa-jennifer-moore.md +++ /dev/null @@ -1,25 +0,0 @@ -# OpenTelemetry Q&A Featuring Jennifer Moore - -Resource type: Video + Transcript - -Video: https://www.youtube.com/watch?v=fRaWavw0T5c -Transcript: https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-12-04T23:34:44Z-otel-q-a-feat-jennifer-moore.md - -## What it’s about - -Jennifer Moore shares her experiences building developer buy-in for OTel. The Q&A covers alert fatigue, on-call culture, and how to position telemetry as a tool that empowers developers rather than burdens them. - -## Why it’s worth watching/reading - -It highlights the *human side* of observability. Jennifer explains how OTel can win over skeptics by reducing 2am false alarms and giving developers context they actually care about. - -## Pause and Ponder - -- How much noisy or low-value telemetry do we currently have, and how does it affect on-call rotations? -- What would it look like for devs in our org to self-serve dashboards instead of waiting on ops? -- How fast is our deploy-to-insight cycle today, and what would “minutes, not days” unlock? -- How do we make telemetry feel like relief, not surveillance? - -## Takeaway - -Developer adoption hinges on relief. When OTel reduces pointless alerts and empowers self-service, it transforms from “extra work” into a quality-of-life improvement. 
\ No newline at end of file diff --git a/resources/tech/otel/pax8-observability-2-0-case-study.md b/resources/tech/otel/pax8-observability-2-0-case-study.md deleted file mode 100644 index 5483a6c..0000000 --- a/resources/tech/otel/pax8-observability-2-0-case-study.md +++ /dev/null @@ -1,23 +0,0 @@ -# Pax8 Case Study: Democratizing Observability 2.0 - -Resource type: Case Study - -https://www.honeycomb.io/resources/case-studies/pax8-modern-observability-2-0-solution - -## What it’s about - -Pax8 shifted from traditional observability tools with user-based pricing to Honeycomb’s Observability 2.0 platform. By adopting wide structured events and event-based pricing, they unlocked access for their entire engineering org while reducing costs by 30%. - -## Why it’s worth reading - -Pax8 highlights the cultural side of Observability 2.0. Their story shows how democratizing access — letting *everyone* explore telemetry data, not just senior engineers — boosts curiosity, experimentation, and shared ownership of reliability. It’s also a concrete proof point that OTel-based approaches can *lower* costs, not just add more. - -## Pause and Ponder - -- How does pricing and accessibility shape who actually uses observability data in your org? -- What happens when product managers and junior developers can explore telemetry, not just ops leads? -- What trade-offs did Pax8 make to gain both cost efficiency and coverage? - -## Takeaway - -Observability 2.0 isn’t only about better data — it’s about enabling more people to use that data. Pax8 proved that when access broadens, culture shifts toward curiosity and resilience. \ No newline at end of file diff --git a/resources/tech/otel/sap-opentelemetry-case-study.md b/resources/tech/otel/sap-opentelemetry-case-study.md deleted file mode 100644 index 538d6d5..0000000 --- a/resources/tech/otel/sap-opentelemetry-case-study.md +++ /dev/null @@ -1,23 +0,0 @@ -# SAP Case Study: Massive-Scale Observability Modernization - -Resource type: Case Study - -https://opensearch.org/blog/case-study-sap-unifies-observability-at-scale-with-opensearch-and-opentelemetry/ - -## What it’s about - -SAP overhauled observability across 11,000+ OpenSearch instances by adopting OpenTelemetry and rebuilding ingestion pipelines with Data Prepper. They unified logs, metrics, and traces into a consistent telemetry model while migrating from OpenSearch 1.x to 2.x with minimal disruption. - -## Why it’s worth reading - -This case study shows what OTel adoption looks like at *extreme* scale — in a heavily regulated, enterprise-grade environment. It’s proof that the unified telemetry model isn’t just theory or startup hype: it works even in massive, legacy-heavy organizations. The migration strategy also provides concrete patterns for reducing risk during major observability shifts. - -## Pause and Ponder - -- How did SAP manage both modernization and migration without service disruption? -- What role did standardization (schemas, pipelines) play in making 11,000+ instances manageable? -- Which of SAP’s approaches could scale down to smaller orgs? - -## Takeaway - -OpenTelemetry provides a path for unifying observability across even the largest, most complex enterprises — not just greenfield startups. SAP’s success shows that with the right abstractions and migration planning, scale is not a blocker but an opportunity. 
\ No newline at end of file From 7867ae82f6fd0920f447c2276e7e71926f02c2f8 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Wed, 29 Oct 2025 20:22:38 -0700 Subject: [PATCH 032/131] Additional re-working of OTel practice --- practices/open-telemetry-practice.md | 53 ++++++++++++++++++---------- 1 file changed, 35 insertions(+), 18 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 629bb82..7d4bde5 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -6,12 +6,14 @@ Without a shared standard, each service describes its behavior differently. One When the OpenTelemetry standard is adopted, teams can see how requests move through the system. Scattered logs and isolated metrics become a single, connected view of system behavior. It shows where time is spent, where failures occur, and how components interact. With that visibility debugging is faster, performance work is more deliberate, and improvement efforts are based on evidence rather than hunches. +Beyond better debugging, OTel positions the organization for long-term data leverage. When you control your telemetry pipeline, you own your data, your schema, and your tooling. That gives your organization the ability to evolve, enrich, analyze, and build on the information that drives decisions, without being constrained by a vendor's roadmap or data model. + ## When to Experiment -- You are a developer who needs to keep systems operational and performant. -- You are a QA who needs to ensure changes don't introduce systemic failures. -- You are a product leader who needs to track how various changes (or experiments) are affecting the user experience. -- You are an engineering leader who wants to improve the reliability of the overall system. +- You're a developer who needs to keep systems operational and performant. +- You're a QA who needs to ensure changes don't introduce systemic failures. +- You're a product leader who needs to track how various changes (or experiments) are affecting the user experience. +- You’re an engineering leader focused on improving system reliability and creating fast feedback loops to understand system health and the impact of every change. ## How to Gain Traction @@ -29,13 +31,19 @@ Create a single observability foundation repository that makes OpenTelemetry ado 4. Pilot a Single Path -Start with one business-critical request flow and instrument it end-to-end. Choose something visible, like checkout or signup, where results are easy to show. Begin with auto-instrumentation to get traces quickly, then add manual spans where context adds value. Run locally first with a default OTel Collector and simple viewer like Grafana; once stable, deploy to pre-prod and then production. +Start with one business-critical request flow and instrument it end-to-end. Pick something visible, like checkout or signup, where results are easy to demonstrate. + +Begin with two simple telemetry configurations: + +- **Instrumenting like you log** Make adding spans as easy as calling console.log(). Developers should be able to drop in trace points without complicated dependency wiring, test mocks, or ceremony. During local development, spans should default to printing to stdout, and running silently during tests. + +Once signals are clear locally, deploy the collector and instrumentation to pre-prod and then production. 
-At this stage, success is measured by credibility: OTel should help answer questions that matter to both engineers and business stakeholders — where users drop off, what slows down checkout, and how changes affect conversion. +Success at this stage isn’t volume, it's usefulness. The pilot should answer questions engineers and leaders care about, such as where users drop off, what slows down a key flow, and how recent changes affect conversion or error spikes. 5. Standardize and Expand -Once the pilot produces consistent, valuable traces, shift focus from proving value to scaling it. Capture what worked (helper functions, schema conventions, and collector configurations) in the foundation repo and make onboarding self-service. Add concise documentation and validation checks so teams can integrate with minimal friction. Assign clear ownership and version the schema like an API to prevent drift. Expand gradually, measuring progress by consistency rather than speed. Each new integration should strengthen the shared signal, not add noise. +Once the pilot produces consistent, valuable traces, shift focus from proving value to scaling it. Capture what worked (helper functions, schema conventions, and collector configurations) in the foundation repo and make onboarding self-service. Add concise documentation and validation checks so teams can integrate with minimal friction. Create a clear governance for the repo to guide future changes and version the schema like an API to prevent drift. Expand gradually, measuring progress by consistency rather than speed. Each new integration should strengthen the shared signal, not add noise. ## Lessons From The Field @@ -43,7 +51,7 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va **Telemetry Surfaces Politics** OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as shared opportunities, not personal failings. -**Standardization Is Crucial** Without schema discipline, dashboards turn chaotic within weeks. Treat naming and attribute drift as real tech debt. +**Some Assembly Required** OpenTelemetry isn’t plug-and-play—it’s a toolkit of SDKs, exporters, and collectors you assemble to fit your system. Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective layer that unifies data and insight across teams. **Bridge, Don’t Replace** People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product analytics tooling. OpenTelemetry should compliment that instead of replacing it. @@ -54,35 +62,44 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va After experimenting with this practice for **4–5 weeks**, bring the team together and determine whether the following metrics and/or signals have changed in a positive direction: ### Fast & Measurable -- **Reduced Debug Time** Incidents should resolve faster once OTel is in place. Track cycle time or ticket-to-resolution time via [Jira](https://atlassian.com/software/jira)/incident postmortems. -- **Shorter Deployment Feedback Loops.** Expect 10– to 15-minute telemetry updates instead of waiting hours or days. + +- **Mean Time To Recover** Developers should find and confirm root causes faster using telemetry rather than disparate data points and anecdotes. 
This can be tracked via incident timelines in postmortems or Jira/incident tooling. +- **Faster Deployment Feedback Loops** Engineers should see how changes affect the system within minutes, not hours. Can be measured via deployment pipelines (CI/CD timestamps) and time to first meaningful telemetry signals after deploy. ### Fast & Intangible -- **Developer Sentiment** Look for fewer "wake-up-for-nothing" complaints in retros or Slack chatter. -- **Dashboard Adoption** Track how many non-admins confidently create their own dashboards. + +- **More Productive Debugging Behaviors** Teams default to tracing and telemetry for understanding issues, instead of log-hunting or adding prints. Capture this via retro notes, engineering Slack chatter, or direct developer feedback. ### Slow & Measurable -- **Reduced Vendor Dependence.** Track spend on observability tools or number of dashboards maintained. + - **Product KPIs** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. ### Slow & Intangible -- **Cross-Team Trust.** Do PMs and executives begin referencing telemetry data in decision-making instead of anecdotes? + +- **Cross-Team Collaboration** Engineers, PMs, and leaders reference telemetry when discussing reliability or performance decisions, replacing anecdote-driven debates. Gauge via meeting observations, PM feedback, and roadmap discussions. ## Supporting Capabilities + ### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) -Otel enables proactive observability across systems. Fast feedback loops mean catching issues before they reach the user. + +OTel enables proactive observability across systems. Fast feedback loops mean catching issues before they reach the user. ### [Monitoring & Observability](/capabilities/monitoring-and-observability.md) -Otel is the open-source standard for collecting telemetry data across services and uses a unified model to optimize observability in the modern tech landscape. + +OTel is the open-source standard for collecting telemetry data across services and uses a unified model to optimize observability in the modern tech landscape. ### [Continuous Delivery](/capabilities/continuous-delivery.md) -OTel enables faster, safer deploys by providing near-real-time feedback loops — developers can see the impact of changes minutes after release. + +OTel enables faster, safer deploys by providing near-real-time feedback loops. Developers can see the impact of changes minutes after release. ### [Team Experimentation](/capabilities/team-experimentation.md) + Unified telemetry lets developers run safe experiments (optimize queries, adjust configs) and immediately measure business impact. ### [Code Maintainability](/capabilities/code-maintainability.md) -Consistent observability abstractions act as shared infrastructure patterns, helping teams manage complexity across many repos. + +OpenTelemetry doesn't make code cleaner by itself, but it does make complexity more visible. Traces highlight tangled dependencies, slow paths, and tightly-coupled services that are hard to change. This clarity gives teams confidence to refactor safely and helps prioritize the areas where cleanup will have the biggest impact. ### [Job Satisfaction](/capabilities/job-satisfaction.md) + Reducing false alarms and giving developers visibility into their real impact improves morale and reduces burnout. 
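The "instrumenting like you log" idea described in the pilot step above can be made concrete with a single helper that is about as easy to call as console.log(). This is a minimal sketch: the `withSpan` name and attribute conventions are assumptions, and the silent-in-tests behavior relies on the OpenTelemetry API returning non-recording spans when no SDK is registered.

```typescript
import { trace, SpanStatusCode, Attributes } from '@opentelemetry/api';

const tracer = trace.getTracer('app');

// Wrap any async unit of work in a span: one call, no dependency wiring.
export async function withSpan<T>(
  name: string,
  attributes: Attributes,
  fn: () => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, { attributes }, async (span) => {
    try {
      return await fn();
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}

// Usage reads like logging:
// await withSpan('cart.add_item', { 'app.sku': sku }, () => addItem(sku));
```

Pairing this with a console exporter in local development gives the print-to-stdout behavior described above, while production wiring sends the same spans to the collector.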
From 93d94307353d2736f28954c76b8d39ffdc5dc9f0 Mon Sep 17 00:00:00 2001
From: nicoletache
Date: Fri, 31 Oct 2025 12:23:30 -0500
Subject: [PATCH 033/131] edits to updated OTel practice

---
 practices/open-telemetry-practice.md | 52 ++++++++++++++--------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md
index 7d4bde5..d729940 100644
--- a/practices/open-telemetry-practice.md
+++ b/practices/open-telemetry-practice.md
@@ -2,11 +2,11 @@

 Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it’s hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior.

-Without a shared standard, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. Doing so would create fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with a common standard and open-source tools for most major languages. Teams can instrument their systems consistently and send data to a central monitoring system (like [Honeycomb](https://www.honeycomb.io/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), [Uptrace](https://uptrace.dev/), etc). Since most popular monitoring systems support hte OTel format, teams can switch platforms without major disruptions.
+Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system (like [Honeycomb](https://www.honeycomb.io/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/)). Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions.

-When the OpenTelemetry standard is adopted, teams can see how requests move through the system. Scattered logs and isolated metrics become a single, connected view of system behavior. It shows where time is spent, where failures occur, and how components interact. With that visibility debugging is faster, performance work is more deliberate, and improvement efforts are based on evidence rather than hunches.
+When the OTel standard is adopted, teams can see how requests move through the system. Scattered logs and isolated metrics are collected to form a single, connected view of system behavior. It shows where time is spent, where failures occur, and how components interact. With that visibility, debugging is faster, performance work is more deliberate, and improvements become evidence-based rather than guided by hunches.

-Beyond better debugging, OTel positions the organization for long-term data leverage.
When you control your telemetry pipeline, you own your data, your schema, and your tooling. That gives your organization the ability to evolve, enrich, analyze, and build on the information that drives decisions, without being constrained by a vendor's roadmap or data model. +Beyond better debugging, OTel positions the organization for long-term data leverage. When you control your telemetry pipeline, you *own* your data, your schema, and your tooling. That gives your organization the ability to evolve, enrich, and analyze the information that drives decisions, without being constrained by a vendor's roadmap or data model. ## When to Experiment @@ -17,45 +17,45 @@ Beyond better debugging, OTel positions the organization for long-term data leve ## How to Gain Traction -1. Secure a Champion From Leadership +### Secure a Champion From Leadership -Every successful OpenTelemetry rollout begins with executive sponsorship. Adopting the OpenTelemetry Standard often requires significant time and budget, competes with other organizational priorities, and may face cultural resistance. It's helpful to have a leader who can connect the work to measurable business goals and clear obstacles when resistance appears. Use [Alibaba’s OpenTelemetry journey](/resources/tech/otel/alibaba-opentelemetry-journey.md) as a reference point; it helps leaders understand both the early friction and the long-term payoff of adopting a shared telemetry standard. +Every successful OpenTelemetry rollout begins with executive sponsorship. Adopting the OTel standard often requires significant time and budget, and it means competing with other organizational priorities. So, a shift toward OTel may face cultural resistance. It's helpful to have a leader who can connect the work to measurable business goals and clear obstacles when resistance appears. Use [Alibaba’s OpenTelemetry journey](/resources/tech/otel/alibaba-opentelemetry-journey.md) as a reference point; it helps leaders understand both the early friction and the long-term payoff of adopting a shared telemetry standard. -2. Form a Small Cross-Functional Team +### Form a Small Cross-Functional Team -Once leadership is aligned, assemble a small pilot team capable of working across boundaries (backend, frontend, data pipelines, infrastructure, etc). Before starting any technical work, make sure this group shares a common understanding of why observability matters and what "good telemetry" looks like. Use [Charity Majors' Observability 2.0](/resources/tech/otel/observability-2-0-honeycomb.md) and [Asking Better Questions with OpenTelemetry](/resources/tech/otel/asking-better-questions-with-opentelemetry.md) to align on what data should be emitted, how it will be structured, and how teams will use it to ask better questions, not just build prettier dashboards. +Once leadership is aligned, assemble a small pilot team capable of working across boundaries (backend, frontend, data pipelines, infrastructure, etc). Before starting any technical work, make sure this group shares a common understanding of why observability matters and what "good telemetry" looks like. Use [Charity Majors' Observability 2.0](/resources/tech/otel/observability-2-0-honeycomb.md) and [Asking Better Questions with OpenTelemetry](/resources/tech/otel/asking-better-questions-with-opentelemetry.md) to align on what data should be emitted, how it will be structured, and how teams will use it to ask better questions (not just build prettier dashboards). -3. 
Establish a Foundational Repository +### Establish a Foundational Repository Create a single observability foundation repository that makes OpenTelemetry adoption simple and consistent. Include shared libraries that wrap the OTel SDK, a common telemetry schema for naming and structure, and helper functions that auto-populate useful context like request IDs and build versions. -4. Pilot a Single Path +### Pilot a Single Path Start with one business-critical request flow and instrument it end-to-end. Pick something visible, like checkout or signup, where results are easy to demonstrate. Begin with two simple telemetry configurations: -- **Instrumenting like you log** Make adding spans as easy as calling console.log(). Developers should be able to drop in trace points without complicated dependency wiring, test mocks, or ceremony. During local development, spans should default to printing to stdout, and running silently during tests. +- **Instrumenting like you log.** Make adding spans as easy as calling console.log(). Developers should be able to drop in trace points without complicated dependency wiring, test mocks, or ceremony. During local development, spans should default to printing to stdout and running silently during tests. Once signals are clear locally, deploy the collector and instrumentation to pre-prod and then production. Success at this stage isn’t volume, it's usefulness. The pilot should answer questions engineers and leaders care about, such as where users drop off, what slows down a key flow, and how recent changes affect conversion or error spikes. -5. Standardize and Expand +### Standardize and Expand -Once the pilot produces consistent, valuable traces, shift focus from proving value to scaling it. Capture what worked (helper functions, schema conventions, and collector configurations) in the foundation repo and make onboarding self-service. Add concise documentation and validation checks so teams can integrate with minimal friction. Create a clear governance for the repo to guide future changes and version the schema like an API to prevent drift. Expand gradually, measuring progress by consistency rather than speed. Each new integration should strengthen the shared signal, not add noise. +Once the pilot produces consistent, valuable traces, shift focus from proving value to scaling it. Capture what worked (helper functions, schema conventions, and collector configurations) in the foundation repository and make onboarding self-service. Add concise documentation and validation checks so teams can integrate with minimal friction. Create a clear governance for the repository to guide future changes and version the schema like an API to prevent drift. Expand gradually, measuring progress by consistency rather than speed. Each new integration should strengthen the shared signal, not add noise. ## Lessons From The Field -**Quick Wins Build Momentum** Visibility improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in. +*Quick Wins Build Momentum* - Observability improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in. -**Telemetry Surfaces Politics** OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as shared opportunities, not personal failings. +*Telemetry Surfaces Politics* - OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. 
Frame findings as **shared opportunities**, not personal failings. -**Some Assembly Required** OpenTelemetry isn’t plug-and-play—it’s a toolkit of SDKs, exporters, and collectors you assemble to fit your system. Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective layer that unifies data and insight across teams. +*Some Assembly Required* - OTel isn’t plug-and-play. It’s a toolkit of SDKs, exporters, and collectors you assemble to fit your system. Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective tissue that unifies data and insight across teams. -**Bridge, Don’t Replace** People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product analytics tooling. OpenTelemetry should compliment that instead of replacing it. +*Bridge, Don’t Replace* - People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product-analytics tooling. OTel should complement that instead of replacing it. -**Expect Uneven Maturity** Logging support and SDK quality vary by language. Set expectations and plan incremental rollout accordingly. +*Expect Uneven Maturity* - Logging support and SDK quality vary by language. Set expectations and plan incremental rollout accordingly. ## Deciding to Polish or Pitch @@ -63,20 +63,20 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ### Fast & Measurable -- **Mean Time To Recover** Developers should find and confirm root causes faster using telemetry rather than disparate data points and anecdotes. This can be tracked via incident timelines in postmortems or Jira/incident tooling. -- **Faster Deployment Feedback Loops** Engineers should see how changes affect the system within minutes, not hours. Can be measured via deployment pipelines (CI/CD timestamps) and time to first meaningful telemetry signals after deploy. +- **Mean Time To Recover.** Developers should find and confirm root causes faster using telemetry data rather than disparate data points and anecdotes. This can be tracked via incident timelines in postmortems or Jira/incident tooling. +- **Faster Deployment Feedback Loops.** Engineers should see how changes affect the system within minutes, not hours. This can be measured via deployment pipelines (CI/CD timestamps) and time-to-first-meaningful-telemetry-signals after deploy. ### Fast & Intangible -- **More Productive Debugging Behaviors** Teams default to tracing and telemetry for understanding issues, instead of log-hunting or adding prints. Capture this via retro notes, engineering Slack chatter, or direct developer feedback. +- **More Productive Debugging Behaviors.** Teams default to tracing and telemetry for understanding issues, instead of log-hunting or adding prints. Capture this via retrospective notes, engineering Slack chatter, or direct developer feedback. ### Slow & Measurable -- **Product KPIs** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. +- **Product KPIs.** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. 
 ### Slow & Intangible
 
-- **Cross-Team Collaboration** Engineers, PMs, and leaders reference telemetry when discussing reliability or performance decisions, replacing anecdote-driven debates. Gauge via meeting observations, PM feedback, and roadmap discussions.
+- **Cross-Team Collaboration.** Engineers, PMs, and leaders should reference telemetry data when discussing reliability or performance decisions, replacing anecdote-driven debates. Gauge this via meeting observations, PM feedback, and roadmap discussions.
 
 ## Supporting Capabilities
 
@@ -86,7 +86,7 @@ OTel enables proactive observability across systems. Fast feedback loops mean ca
 
 ### [Monitoring & Observability](/capabilities/monitoring-and-observability.md)
 
-OTel is the open-source standard for collecting telemetry data across services and uses a unified model to optimize observability in the modern tech landscape.
+OTel is the open-source standard for collecting, unifying, and standardizing telemetry data across services. Its model optimizes observability in the modern tech landscape.
 
 ### [Continuous Delivery](/capabilities/continuous-delivery.md)
 
@@ -94,12 +94,12 @@ OTel enables faster, safer deploys by providing near-real-time feedback loops. D
 
 ### [Team Experimentation](/capabilities/team-experimentation.md)
 
-Unified telemetry lets developers run safe experiments (optimize queries, adjust configs) and immediately measure business impact.
+Unified telemetry data lets developers run safe experiments (e.g., optimize queries, adjust configs) and immediately measure business impact.
 
 ### [Code Maintainability](/capabilities/code-maintainability.md)
 
-OpenTelemetry doesn't make code cleaner by itself, but it does make complexity more visible. Traces highlight tangled dependencies, slow paths, and tightly-coupled services that are hard to change. This clarity gives teams confidence to refactor safely and helps prioritize the areas where cleanup will have the biggest impact.
+OTel doesn't make code cleaner by itself, but it does make complexity more visible. Traces highlight tangled dependencies, slow paths, and tightly-coupled services that are hard to change. This clarity gives teams confidence to refactor safely and helps prioritize the areas where cleanup will have the biggest impact.
 
 ### [Job Satisfaction](/capabilities/job-satisfaction.md)
 
-Reducing false alarms and giving developers visibility into their real impact improves morale and reduces burnout.
+Reducing false alarms and giving developers visibility into the real impact of their work improves morale and reduces burnout.

From 199012a58c727473f2f4df39cbf270b92cbdee9c Mon Sep 17 00:00:00 2001
From: Dave Moore
Date: Fri, 31 Oct 2025 12:07:59 -0700
Subject: [PATCH 034/131] Re-add missing content to otel practice

---
 practices/open-telemetry-practice.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md
index d729940..23d9c09 100644
--- a/practices/open-telemetry-practice.md
+++ b/practices/open-telemetry-practice.md
@@ -36,6 +36,7 @@ Start with one business-critical request flow and instrument it end-to-end. Pick
 Begin with two simple telemetry configurations:
 
 - **Instrumenting like you log.** Make adding spans as easy as calling console.log(). Developers should be able to drop in trace points without complicated dependency wiring, test mocks, or ceremony. During local development, spans should default to printing to stdout and running silently during tests.
+- **Run a real pipeline locally** In parallel, stand up a lightweight local collector + viewer (e.g., docker compose up for OTel Collector + Grafana / Jaeger) and send the same spans there. This validates structure, naming, and context while building confidence that the data tells a coherent story before touching production. Once signals are clear locally, deploy the collector and instrumentation to pre-prod and then production. @@ -72,7 +73,7 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ### Slow & Measurable -- **Product KPIs.** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. +- **Product KPIs.** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. ### Slow & Intangible From 18f5b0f15fa1da3b8c97d3b209b3420f39446261 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Fri, 31 Oct 2025 12:09:17 -0700 Subject: [PATCH 035/131] Convert unordered list to numbered list for clarity --- practices/open-telemetry-practice.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 23d9c09..8a5a840 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -35,8 +35,8 @@ Start with one business-critical request flow and instrument it end-to-end. Pick Begin with two simple telemetry configurations: -- **Instrumenting like you log.** Make adding spans as easy as calling console.log(). Developers should be able to drop in trace points without complicated dependency wiring, test mocks, or ceremony. During local development, spans should default to printing to stdout and running silently during tests. -- **Run a real pipeline locally** In parallel, stand up a lightweight local collector + viewer (e.g., docker compose up for OTel Collector + Grafana / Jaeger) and send the same spans there. This validates structure, naming, and context while building confidence that the data tells a coherent story before touching production. +1. **Instrumenting like you log.** Make adding spans as easy as calling console.log(). Developers should be able to drop in trace points without complicated dependency wiring, test mocks, or ceremony. During local development, spans should default to printing to stdout and running silently during tests. +2. **Run a real pipeline locally** In parallel, stand up a lightweight local collector + viewer (e.g., docker compose up for OTel Collector + Grafana / Jaeger) and send the same spans there. This validates structure, naming, and context while building confidence that the data tells a coherent story before touching production. Once signals are clear locally, deploy the collector and instrumentation to pre-prod and then production. 
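+
+To make the "run a real pipeline locally" configuration concrete, below is a minimal sketch of one possible setup, assuming Docker Compose with Jaeger as the viewer. The service names, ports, and file names are illustrative, and exporter names vary across Collector versions, so treat this as a starting point rather than a reference configuration.
+
+```yaml
+# docker-compose.yaml (illustrative): a local Collector plus Jaeger as the viewer
+services:
+  jaeger:
+    image: jaegertracing/all-in-one:latest
+    environment:
+      - COLLECTOR_OTLP_ENABLED=true
+    ports:
+      - "16686:16686"   # Jaeger UI
+  otel-collector:
+    image: otel/opentelemetry-collector-contrib:latest
+    command: ["--config=/etc/otelcol/config.yaml"]
+    volumes:
+      - ./otel-collector-config.yaml:/etc/otelcol/config.yaml
+    ports:
+      - "4317:4317"   # OTLP gRPC from your app
+      - "4318:4318"   # OTLP HTTP from your app
+    depends_on:
+      - jaeger
+```
+
+```yaml
+# otel-collector-config.yaml (illustrative): receive OTLP, log locally, forward to Jaeger
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317
+      http:
+        endpoint: 0.0.0.0:4318
+processors:
+  batch: {}
+exporters:
+  debug: {}
+  otlp/jaeger:
+    endpoint: jaeger:4317
+    tls:
+      insecure: true
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [debug, otlp/jaeger]
+```
+
+With this running (`docker compose up`), point your SDK's OTLP exporter at `http://localhost:4318` and open the Jaeger UI at `http://localhost:16686` to confirm spans arrive with the names and attributes you expect.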
From 397277b2b63b6f9c9cbb04b16c5c27f13bbf09dc Mon Sep 17 00:00:00 2001 From: Dave Moore <850537+dcmoore@users.noreply.github.com> Date: Tue, 4 Nov 2025 16:52:22 -0800 Subject: [PATCH 036/131] Update ordering of supported capabilities in otel practice Co-authored-by: Ian Carroll <14797009+IanDCarroll@users.noreply.github.com> --- practices/open-telemetry-practice.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 8a5a840..a3a2a12 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -81,14 +81,14 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ## Supporting Capabilities -### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) - -OTel enables proactive observability across systems. Fast feedback loops mean catching issues before they reach the user. - ### [Monitoring & Observability](/capabilities/monitoring-and-observability.md) OTel is the open-source standard for collecting, unifying, and standardizing telemetry data across services. Its model optimizes observability in the modern tech landscape. +### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) + +OTel enables proactive observability across systems. Fast feedback loops mean catching issues before they reach the user. + ### [Continuous Delivery](/capabilities/continuous-delivery.md) OTel enables faster, safer deploys by providing near-real-time feedback loops. Developers can see the impact of changes minutes after release. From 28e86c299cccbed50f29b024d1847abf2e62d6bc Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Tue, 4 Nov 2025 20:40:11 -0800 Subject: [PATCH 037/131] Add a resource page for the official OTel docs --- resources/tech/otel/official-docs.md | 83 ++++++++++++++++++++++++++++ 1 file changed, 83 insertions(+) create mode 100644 resources/tech/otel/official-docs.md diff --git a/resources/tech/otel/official-docs.md b/resources/tech/otel/official-docs.md new file mode 100644 index 0000000..0052977 --- /dev/null +++ b/resources/tech/otel/official-docs.md @@ -0,0 +1,83 @@ +# OpenTelemetry Documentation + +Resource type: Documentation + +https://opentelemetry.io/docs/ + +The OpenTelemetry documentation is the definitive place for learning how to instrument modern systems with consistent telemetry. It explains the core ideas behind traces, metrics, and logs, how they work together, and how to apply them in real applications. The docs also walk through setup for different languages and show how to use the OpenTelemetry Collector to manage and route telemetry data. + +You want to start reading the docs once you've gotten your hands dirty enough to contextualize the information, but before you're stuck without a clean way forward and teammates begin imitating patterns you never meant to standardize. Treat the docs as a core learning asset—not a quickstart. They may seem intimidating, but they accelerate mastery and make OTel adoption smoother, cleaner, and far less frustrating. + +## Sequencing the Material + +1. Start with the Concepts section to align vocabulary +2. Follow the language-specific guides for your stack +3. Spin up the Collector locally to see data flow end-to-end in a simple hello world application +4. Bookmark semantic conventions as a reference while instrumenting +5. 
Pair further reading with adding real instrumentation to your system (e.g., signup, checkout, a job path). Instrument it together, referencing the docs as you go. + +## Discussion Prompts + +If going through the material with your team, it might be helpful to host a series of book clubs where the content gets discussed. Some of the following questions might be helpful to get a productive conversation flowing: + +### Core Concepts & Mental Models + +- How do traces, metrics, and logs relate? When does each add value? +- What is context propagation and why is it fundamental in distributed systems? +- What are "semantic conventions" and how do they reduce cognitive load? + +### Instrumentation Strategy + +- Should we start with auto-instrumentation or manual instrumentation? Why? +- Where in our stack should instrumentation begin? API edge, async jobs, data layer? +- What signals should we emit first (traces, metrics, logs)? +- How do we ensure we're not over-instrumenting and generating noise? +- What criteria help us decide when a span should be nested vs. separate? +- What does "done" look like for our first end-to-end instrumented flow? + +### Collector Design & Deployment + +- What's the simplest meaningful Collector pipeline we can start with? +- Do we benefit more from local agent collectors, a centralized gateway model, or a hybrid approach? How does that choice affect developer experience, reliability, and cost? +- Which processors and exporters make sense for us initially? +- Given OTel's support for multiple pipelines, shared receivers, fan-out to multiple exporters, and processor isolation per pipeline, how should we architect our telemetry flow to balance performance, reliability, and extensibility? +- How do we handle secure export (TLS, auth) across environments? + +### Semantic Conventions & Naming + +- How do we choose naming conventions for spans, attributes, and services? +- What attributes are required vs. recommended for HTTP, DB, messaging, etc.? +- How will we version conventions and document expected patterns internally? +- What process should engineers follow if semantic conventions conflict with real-world needs? + +### Performance & Overhead + +- What performance impact should we expect from instrumentation? +- When is sampling appropriate, and which sampling strategy fits our workloads? +- How should we measure and tune telemetry cost vs. value? +- Where do bottlenecks typically appear in telemetry pipelines? + +### Operational Maturity + +- How do we track when spans go missing or context propagation breaks? +- How will we teach new engineers to contribute instrumentation confidently? + +### Integration with Existing Tools + +- How do we layer OTel into our current APM / logging / metrics stack without disruption? +- When should we replace vendor-specific SDKs with OTel instrumentation? +- How do we validate data parity and reliability across platforms? + +### Organizational Behavior & Culture + +- How do we make observability a practice of learning and improving instead of a compliance task? +- What metrics or signals help us prove telemetry's value to leadership? +- How do we encourage teams to debug from traces instead of logs alone? +- How can we hack our existing rituals (PR review, pairing, demos, etc) to help telemetry adoption stick? + +### Good Capstone Questions + +- What's one misunderstanding you cleared up by reading the docs? +- What piece of the stack feels most urgent to instrument next—and why? 
+- What's one convention or pattern we should codify before scaling adoption? +- What blockers do we foresee, and how can we remove them now? From 58b60c0c2f3235cb5c983ea7aabd5332c651dcfa Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Tue, 4 Nov 2025 20:41:43 -0800 Subject: [PATCH 038/131] Make minor formatting tweaks to otel resource pages --- resources/tech/otel/alibaba-opentelemetry-journey.md | 2 +- .../tech/otel/asking-better-questions-with-opentelemetry.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/resources/tech/otel/alibaba-opentelemetry-journey.md b/resources/tech/otel/alibaba-opentelemetry-journey.md index a5eda4f..b0a15b8 100644 --- a/resources/tech/otel/alibaba-opentelemetry-journey.md +++ b/resources/tech/otel/alibaba-opentelemetry-journey.md @@ -2,7 +2,7 @@ Resource type: Video -Video: https://www.youtube.com/watch?v=fgbB0HhVBq8 +https://www.youtube.com/watch?v=fgbB0HhVBq8 ## What it’s about diff --git a/resources/tech/otel/asking-better-questions-with-opentelemetry.md b/resources/tech/otel/asking-better-questions-with-opentelemetry.md index cc0605d..2cbfe2d 100644 --- a/resources/tech/otel/asking-better-questions-with-opentelemetry.md +++ b/resources/tech/otel/asking-better-questions-with-opentelemetry.md @@ -1,6 +1,6 @@ # Asking Better Questions with OpenTelemetry (Feat. Hazel Weakly) -Resource type: Video + Transcript +Resource type: Video & Transcript Video: https://www.youtube.com/watch?v=wMJEgrUnX7M Transcript: https://github.com/open-telemetry/sig-end-user/blob/main/video-transcripts/transcripts/2023-09-13T19:26:10Z-opentelemetry-q-a-feat-hazel-weakly.md From 8d748c4ac3cb97d657cc9970cf7de7a255746099 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Tue, 4 Nov 2025 20:42:34 -0800 Subject: [PATCH 039/131] Incorporate John's feedback on the Otel practice --- practices/open-telemetry-practice.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index a3a2a12..1677590 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -23,11 +23,11 @@ Every successful OpenTelemetry rollout begins with executive sponsorship. Adopti ### Form a Small Cross-Functional Team -Once leadership is aligned, assemble a small pilot team capable of working across boundaries (backend, frontend, data pipelines, infrastructure, etc). Before starting any technical work, make sure this group shares a common understanding of why observability matters and what "good telemetry" looks like. Use [Charity Majors' Observability 2.0](/resources/tech/otel/observability-2-0-honeycomb.md) and [Asking Better Questions with OpenTelemetry](/resources/tech/otel/asking-better-questions-with-opentelemetry.md) to align on what data should be emitted, how it will be structured, and how teams will use it to ask better questions (not just build prettier dashboards). +Once leadership is aligned, assemble a small pilot team capable of working across boundaries (backend, frontend, data pipelines, infrastructure, etc). Before starting any technical work, make sure this group shares a common understanding of why observability matters and what "good telemetry" looks like. 
Use [Charity Majors' Observability 2.0](/resources/tech/otel/observability-2-0-honeycomb.md) and [Asking Better Questions with OpenTelemetry](/resources/tech/otel/asking-better-questions-with-opentelemetry.md) to align on what data should be emitted, how it will be structured, and how teams will use it to ask better questions (not just build prettier dashboards). Also consider organizing workshops, spikes, hackathons, etc that will get your team's hands dirty with the [documentation](https://opentelemetry.io/docs/what-is-opentelemetry/) and tooling. ### Establish a Foundational Repository -Create a single observability foundation repository that makes OpenTelemetry adoption simple and consistent. Include shared libraries that wrap the OTel SDK, a common telemetry schema for naming and structure, and helper functions that auto-populate useful context like request IDs and build versions. +Create a single observability foundation repository that makes OpenTelemetry adoption simple and consistent. Include shared libraries that wrap the OTel SDK, a common telemetry schema for naming and structure, and helper functions that auto-populate useful context like request IDs and build versions. Consider leveraging OTel's official [semantic conventions](https://opentelemetry.io/docs/concepts/semantic-conventions/) and [attribute registry](https://opentelemetry.io/docs/specs/semconv/registry/attributes/). ### Pilot a Single Path From 809d745d82464f2b6eb4fbea2a71a282d66cb73f Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Wed, 5 Nov 2025 20:36:56 -0800 Subject: [PATCH 040/131] Refine otel practice starting point based on client learnings --- practices/open-telemetry-practice.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 1677590..5d8776f 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -35,7 +35,7 @@ Start with one business-critical request flow and instrument it end-to-end. Pick Begin with two simple telemetry configurations: -1. **Instrumenting like you log.** Make adding spans as easy as calling console.log(). Developers should be able to drop in trace points without complicated dependency wiring, test mocks, or ceremony. During local development, spans should default to printing to stdout and running silently during tests. +1. **Instrument to standard out.** Start with [auto-instrumentation libraries](https://opentelemetry.io/docs/languages/) available for your language or framework. These often require no code changes and can emit useful spans immediately. Then layer in a simple manual API that feels like console.log(). Developers should be able to add spans or structured logs with a single call, no complicated wiring or mocks required. During local dev, spans should print to stdout. During test runs, they should be silently ignored. 2. **Run a real pipeline locally** In parallel, stand up a lightweight local collector + viewer (e.g., docker compose up for OTel Collector + Grafana / Jaeger) and send the same spans there. This validates structure, naming, and context while building confidence that the data tells a coherent story before touching production. Once signals are clear locally, deploy the collector and instrumentation to pre-prod and then production. 
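+
+To make the first configuration concrete, here is one possible shape for that console.log()-style helper, sketched in TypeScript against the Node SDK. The module name, the `span()` helper, and the `checkout-service` name are illustrative assumptions rather than part of the official API, so adapt them to your own conventions.
+
+```typescript
+// otel.ts (illustrative): local-first setup with an easy manual API
+import { NodeSDK } from "@opentelemetry/sdk-node";
+import { ConsoleSpanExporter } from "@opentelemetry/sdk-trace-node";
+import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
+import { trace, SpanStatusCode } from "@opentelemetry/api";
+
+// Skip setup entirely under test so spans are silently ignored.
+if (process.env.NODE_ENV !== "test") {
+  new NodeSDK({
+    // Locally, print spans to stdout; swap in an OTLP exporter for shared environments.
+    traceExporter: new ConsoleSpanExporter(),
+    instrumentations: [getNodeAutoInstrumentations()],
+  }).start();
+}
+
+const tracer = trace.getTracer("checkout-service"); // hypothetical service name
+
+// A console.log()-style helper: wrap any unit of work in a named span.
+export async function span<T>(name: string, fn: () => Promise<T>): Promise<T> {
+  return tracer.startActiveSpan(name, async (s) => {
+    try {
+      return await fn();
+    } catch (err) {
+      s.recordException(err as Error);
+      s.setStatus({ code: SpanStatusCode.ERROR });
+      throw err;
+    } finally {
+      s.end();
+    }
+  });
+}
+```
+
+A call site then stays as close to ordinary logging as possible, e.g. `await span("checkout.submit", () => submitOrder(cart))`, where `checkout.submit`, `submitOrder`, and `cart` are hypothetical names.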
From 8c546098c859f3301fe1256f4f0982a8aaf1d7 Mon Sep 17 00:00:00 2001
From: Dave Moore
Date: Wed, 5 Nov 2025 20:37:37 -0800
Subject: [PATCH 041/131] Replace an otel lesson from the field with one more novel

---
 practices/open-telemetry-practice.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md
index 5d8776f..25833bc 100644
--- a/practices/open-telemetry-practice.md
+++ b/practices/open-telemetry-practice.md
@@ -56,7 +56,7 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va
 
 *Bridge, Don’t Replace* - People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product-analytics tooling. OTel should complement that instead of replacing it.
 
-*Expect Uneven Maturity* - Logging support and SDK quality vary by language. Set expectations and plan incremental rollout accordingly.
+*Expect The Unexpected* - Auto-instrumentation often surfaces insights teams wouldn't think to look for. It can reveal details that manual instrumentation might miss, like unused routes being hit by scanners, inefficient library calls, or unexpected dependency behavior. These discoveries can inform everything from performance tuning to security awareness, turning "extra" visibility into real operational intelligence.
 
 ## Deciding to Polish or Pitch
 

From 259fe3c33ef2f56f161622db29968ec887aa2e2c Mon Sep 17 00:00:00 2001
From: Dave Moore
Date: Wed, 5 Nov 2025 21:41:04 -0800
Subject: [PATCH 042/131] Add a definition of done to otel pilot

---
 practices/open-telemetry-practice.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md
index 25833bc..3b7eaf0 100644
--- a/practices/open-telemetry-practice.md
+++ b/practices/open-telemetry-practice.md
@@ -44,7 +44,7 @@ Success at this stage isn’t volume, it's usefulness. The pilot should answer q
 
 ### Standardize and Expand
 
-Once the pilot produces consistent, valuable traces, shift focus from proving value to scaling it. Capture what worked (helper functions, schema conventions, and collector configurations) in the foundation repository and make onboarding self-service. Add concise documentation and validation checks so teams can integrate with minimal friction. Create a clear governance for the repository to guide future changes and version the schema like an API to prevent drift. Expand gradually, measuring progress by consistency rather than speed. Each new integration should strengthen the shared signal, not add noise.
+Once the pilot produces consistent, valuable traces, shift focus from proving value to scaling it. Capture what worked (helper functions, schema conventions, and collector configurations) in the foundation repository and make onboarding self-service. Add concise documentation and validation checks so teams can integrate with minimal friction. Create a clear governance model for the repository to guide future changes and version the schema like an API to prevent drift. Expand gradually, measuring progress by consistency rather than speed. Each new integration should strengthen the shared signal, not add noise. You'll know you're on the right track when engineers and leadership alike start referencing the telemetry data to discuss issues or raise their own concerns unprompted.
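+
+As a rough sketch of what "capture schema conventions" and "version the schema like an API" can look like inside the foundation repository, the module below centralizes attribute names and stamps shared context onto spans. The `app.*` keys, the version string, and the helper name are assumptions for illustration; prefer OTel's official semantic-convention attributes wherever one already exists.
+
+```typescript
+// telemetry-schema.ts (illustrative): a shared, versioned attribute schema
+import { Span } from "@opentelemetry/api";
+
+// Bump this like an API version whenever attribute names or meanings change.
+export const SCHEMA_VERSION = "1.2.0";
+
+export const Attr = {
+  requestId: "app.request.id",
+  buildVersion: "app.build.version",
+  checkoutStep: "app.checkout.step", // hypothetical business attribute
+} as const;
+
+// Helper that auto-populates the context every team is expected to emit.
+export function withStandardAttributes(span: Span, requestId: string): Span {
+  span.setAttribute("telemetry.schema.version", SCHEMA_VERSION);
+  span.setAttribute(Attr.requestId, requestId);
+  span.setAttribute(Attr.buildVersion, process.env.BUILD_VERSION ?? "unknown");
+  return span;
+}
+```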
## Lessons From The Field From 14166328e365d7573ce1f4081a33b523c7ff7f5e Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Wed, 5 Nov 2025 21:50:43 -0800 Subject: [PATCH 043/131] Add a warning about skyrocketing costs for otel --- practices/open-telemetry-practice.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 3b7eaf0..6084acb 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -48,15 +48,17 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va ## Lessons From The Field -*Quick Wins Build Momentum* - Observability improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in. +*Quick Wins Build Momentum* Observability improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in. -*Telemetry Surfaces Politics* - OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as **shared opportunities**, not personal failings. +*Telemetry Surfaces Politics* OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as **shared opportunities**, not personal failings. -*Some Assembly Required* - OTel isn’t plug-and-play. It’s a toolkit of SDKs, exporters, and collectors you assemble to fit your system. Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective tissue that unifies data and insight across teams. +*Some Assembly Required* OTel isn’t plug-and-play. It’s a toolkit of SDKs, exporters, and collectors you assemble to fit your system. Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective tissue that unifies data and insight across teams. -*Bridge, Don’t Replace* - People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product-analytics tooling. OTel should complement that instead of replacing it. +*Bridge, Don’t Replace* People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product-analytics tooling. OTel should complement that instead of replacing it. -*Expect The Unexpected* - Auto-instrumentation often surfaces insights teams wouldn't think to look for. It can reveal details that manual instrumentation might miss, like unused routes being hit by scanners, inefficient library calls, or unexpected dependency behavior. These discoveries can inform everything from performance tuning to security awareness, turning "extra" visibility into real operational intelligence. +*Expect the Unexpected* Auto-instrumentation often surfaces insights teams wouldn't think to look for. It can reveal details that manual instrumentation might miss, like unused routes being hit by scanners, inefficient library calls, or unexpected dependency behavior. These discoveries can inform everything from performance tuning to security awareness, turning "extra" visibility into real operational intelligence. 
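+
+One hedged sketch of those cost controls: keep full-fidelity tracing behind a local verbose flag and sample a small fraction of traces everywhere else. The `OTEL_VERBOSE` flag and the 5% ratio below are assumptions to tune for your system; the same effect can be achieved with the standard `OTEL_TRACES_SAMPLER` environment variables or with sampling in the Collector.
+
+```typescript
+// sampling.ts (illustrative): verbose tracing locally, aggressive sampling when deployed
+import { NodeSDK } from "@opentelemetry/sdk-node";
+import { ParentBasedSampler, TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-node";
+
+// OTEL_VERBOSE=1 keeps every trace for local debugging;
+// deployed environments fall back to sampling roughly 5% of new traces.
+const ratio = process.env.OTEL_VERBOSE === "1" ? 1.0 : 0.05;
+
+export const sdk = new NodeSDK({
+  // Respect the parent's sampling decision so distributed traces stay complete.
+  sampler: new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(ratio) }),
+});
+```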
+ +*Be Mindful of Costs* Early OTel rollouts often produce far more data than needed, especially when systems double-log through multiple agents or send every log line to a collector. This creates both noise and unexpected costs. The solution is to sample aggressively once instrumentation is proven and to use open-source visualizers such as [Grafana](https://grafana.com/) in development. You can also add a verbose configuration flag for local debugging that remains off in deployed environments, preserving deep visibility without inflating bills. Finally, favor traces and spans over raw logs since they provide richer context, lower storage costs, and make the data easier to interpret. ## Deciding to Polish or Pitch From aac7af067fbf9168aa87ca1a6db4b78a4aa824fb Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Wed, 5 Nov 2025 21:56:43 -0800 Subject: [PATCH 044/131] Add more color to the takeaway of observability 2.0 resource --- resources/tech/otel/observability-2-0-honeycomb.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/resources/tech/otel/observability-2-0-honeycomb.md b/resources/tech/otel/observability-2-0-honeycomb.md index 40ff3c8..d977eb9 100644 --- a/resources/tech/otel/observability-2-0-honeycomb.md +++ b/resources/tech/otel/observability-2-0-honeycomb.md @@ -20,4 +20,4 @@ This is the canonical articulation of what “Observability 2.0” means. Many t ## Takeaway -Observability 2.0 isn’t just more data — it’s a mindset shift. Wide structured events let engineers ask questions they didn’t anticipate, turning telemetry from a static dashboard into a dynamic conversation. +Observability 2.0 isn’t about collecting more data; it’s about widening and unifying your view. Capture broad, richly attributed events, even when they seem redundant. If costs are a concern, sample widely instead of stripping context, or use open-source visualizers to explore freely. Bring logs, metrics, and traces together into a single feed so you can zoom in for detail, step back for the full picture, and connect signals that might otherwise stay hidden. In this model, telemetry becomes a living, navigable picture of your system rather than a static dashboard. From 267f5d7b34b0e2d183f27e1fb1d4f746846e2259 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Mon, 10 Nov 2025 11:42:19 -0600 Subject: [PATCH 045/131] additional edits to updated otel practice --- practices/open-telemetry-practice.md | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 6084acb..cf44758 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -1,8 +1,6 @@ # Adopt the OpenTelemetry Standard -Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it’s hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. - -Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. 
Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system (like [Honeycomb](https://www.honeycomb.io/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/). Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions. +Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it’s hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. But there's a catch: These details, while useful, may not be standardized. Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system like [Honeycomb](https://www.honeycomb.io/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/). Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions. When the OTel standard is adopted, teams can see how requests move through the system. Scattered logs and isolated metrics are collected to form a single, connected view of system behavior. It shows where time is spent, where failures occur, and how components interact. With that visibility, debugging is faster, performance work is more deliberate, and improvements become evidence-based rather than guided by hunches. @@ -23,7 +21,7 @@ Every successful OpenTelemetry rollout begins with executive sponsorship. Adopti ### Form a Small Cross-Functional Team -Once leadership is aligned, assemble a small pilot team capable of working across boundaries (backend, frontend, data pipelines, infrastructure, etc). Before starting any technical work, make sure this group shares a common understanding of why observability matters and what "good telemetry" looks like. Use [Charity Majors' Observability 2.0](/resources/tech/otel/observability-2-0-honeycomb.md) and [Asking Better Questions with OpenTelemetry](/resources/tech/otel/asking-better-questions-with-opentelemetry.md) to align on what data should be emitted, how it will be structured, and how teams will use it to ask better questions (not just build prettier dashboards). Also consider organizing workshops, spikes, hackathons, etc that will get your team's hands dirty with the [documentation](https://opentelemetry.io/docs/what-is-opentelemetry/) and tooling. +Once leadership is aligned, assemble a small pilot team capable of working across boundaries (backend, frontend, data pipelines, infrastructure, etc). Before starting any technical work, make sure this group shares a common understanding of why observability matters and what "good telemetry" looks like. 
Use [Charity Majors's Observability 2.0](/resources/tech/otel/observability-2-0-honeycomb.md) and [Asking Better Questions with OpenTelemetry](/resources/tech/otel/asking-better-questions-with-opentelemetry.md) to align on what data should be emitted, how it will be structured, and how teams will use it to ask better questions (not just build prettier dashboards). Also consider organizing workshops, spikes, and hackathons that will get your team's hands dirty with the [documentation](https://opentelemetry.io/docs/what-is-opentelemetry/) and tooling. ### Establish a Foundational Repository @@ -35,8 +33,8 @@ Start with one business-critical request flow and instrument it end-to-end. Pick Begin with two simple telemetry configurations: -1. **Instrument to standard out.** Start with [auto-instrumentation libraries](https://opentelemetry.io/docs/languages/) available for your language or framework. These often require no code changes and can emit useful spans immediately. Then layer in a simple manual API that feels like console.log(). Developers should be able to add spans or structured logs with a single call, no complicated wiring or mocks required. During local dev, spans should print to stdout. During test runs, they should be silently ignored. -2. **Run a real pipeline locally** In parallel, stand up a lightweight local collector + viewer (e.g., docker compose up for OTel Collector + Grafana / Jaeger) and send the same spans there. This validates structure, naming, and context while building confidence that the data tells a coherent story before touching production. +1. **Instrument to standard out.** Start with [auto-instrumentation libraries](https://opentelemetry.io/docs/languages/) available for your language or framework. These often require no code changes and can emit useful spans immediately. Then, layer in a simple manual API that feels like console.log(). Developers should be able to add spans or structured logs with a single call; no complicated wiring or mocks required. During local dev, spans should print to stdout. During test runs, they should be silently ignored. +2. **Run a real pipeline locally.** In parallel, stand up a lightweight local collector + viewer (e.g., docker compose up for OTel Collector + Grafana / Jaeger) and send the same spans there. This validates structure, naming, and context while building confidence that the data tells a coherent story before touching production. Once signals are clear locally, deploy the collector and instrumentation to pre-prod and then production. @@ -48,17 +46,17 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va ## Lessons From The Field -*Quick Wins Build Momentum* Observability improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in. +*Quick Wins Build Momentum.* Observability improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in. -*Telemetry Surfaces Politics* OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as **shared opportunities**, not personal failings. +*Telemetry Surfaces Politics.* OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as **shared opportunities**, not personal failings. *Some Assembly Required* OTel isn’t plug-and-play. It’s a toolkit of SDKs, exporters, and collectors you assemble to fit your system. 
Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective tissue that unifies data and insight across teams. -*Bridge, Don’t Replace* People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product-analytics tooling. OTel should complement that instead of replacing it. +*Bridge, Don’t Replace.* People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product-analytics tooling. OTel should complement that instead of replacing it. -*Expect the Unexpected* Auto-instrumentation often surfaces insights teams wouldn't think to look for. It can reveal details that manual instrumentation might miss, like unused routes being hit by scanners, inefficient library calls, or unexpected dependency behavior. These discoveries can inform everything from performance tuning to security awareness, turning "extra" visibility into real operational intelligence. +*Expect the Unexpected.* Auto-instrumentation often surfaces insights teams wouldn't think to look for. It can reveal details that manual instrumentation might miss like unused routes being hit by scanners, inefficient library calls, or unexpected dependency behavior. These discoveries can inform everything from performance tuning to security awareness, turning "extra" visibility into real operational intelligence. -*Be Mindful of Costs* Early OTel rollouts often produce far more data than needed, especially when systems double-log through multiple agents or send every log line to a collector. This creates both noise and unexpected costs. The solution is to sample aggressively once instrumentation is proven and to use open-source visualizers such as [Grafana](https://grafana.com/) in development. You can also add a verbose configuration flag for local debugging that remains off in deployed environments, preserving deep visibility without inflating bills. Finally, favor traces and spans over raw logs since they provide richer context, lower storage costs, and make the data easier to interpret. +*Be Mindful of Costs.* Early OTel rollouts often produce far more data than needed, especially when systems double-log through multiple agents or send every log line to a collector. This creates both noise and unexpected costs. The solution is to sample aggressively once instrumentation is proven and to use open-source visualizers such as [Grafana](https://grafana.com/) in development. You can also add a verbose configuration flag for local debugging that remains off in deployed environments, preserving deep visibility without inflating bills. Finally, favor traces and spans over raw logs since they provide richer context, lower storage costs, and make the data easier to interpret. ## Deciding to Polish or Pitch @@ -66,12 +64,12 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ### Fast & Measurable -- **Mean Time To Recover.** Developers should find and confirm root causes faster using telemetry data rather than disparate data points and anecdotes. This can be tracked via incident timelines in postmortems or Jira/incident tooling. 
+- **Faster Mean Time To Recover.** Developers should find and confirm root causes faster using telemetry data rather than disparate data points and anecdotes. This can be tracked via incident timelines in postmortems or Jira/incident tooling. - **Faster Deployment Feedback Loops.** Engineers should see how changes affect the system within minutes, not hours. This can be measured via deployment pipelines (CI/CD timestamps) and time-to-first-meaningful-telemetry-signals after deploy. ### Fast & Intangible -- **More Productive Debugging Behaviors.** Teams default to tracing and telemetry for understanding issues, instead of log-hunting or adding prints. Capture this via retrospective notes, engineering Slack chatter, or direct developer feedback. +- **More Productive Debugging Behaviors.** Teams should default to tracing and telemetry for understanding issues, instead of log-hunting or adding prints. Capture this via retrospective notes, engineering Slack chatter, or direct developer feedback. ### Slow & Measurable From 99e9db98fe79dfb71ab80aa444d128e62195a43c Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Mon, 17 Nov 2025 22:33:15 -0800 Subject: [PATCH 046/131] Add expected changes to metrics listed in otel practice --- practices/open-telemetry-practice.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index cf44758..812c3fd 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -73,11 +73,11 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ### Slow & Measurable -- **Product KPIs.** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. +- **Improved Product KPIs.** Monitor conversion rates, drop-offs, or booking throughput tied to system optimizations. When problems are surfaced, they're more likely to be solved. ### Slow & Intangible -- **Cross-Team Collaboration.** Engineers, PMs, and leaders should reference telemetry data when discussing reliability or performance decisions, replacing anecdote-driven debates. Gauge this via meeting observations, PM feedback, and roadmap discussions. +- **Improved Cross-Team Collaboration.** Engineers, PMs, and leaders should reference telemetry data when discussing reliability or performance decisions, replacing anecdote-driven debates. Gauge this via meeting observations, PM feedback, and roadmap discussions. ## Supporting Capabilities From 4be8f246764a4dbf1e6108ea312da81e11f4d19c Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Thu, 20 Nov 2025 14:47:54 -0700 Subject: [PATCH 047/131] Premature optimzation of telemetry --- practices/open-telemetry-practice.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 812c3fd..3b18c3a 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -48,6 +48,8 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va *Quick Wins Build Momentum.* Observability improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in. +*When is Open Telemetry Too Much or Premature?* If your Mean Time To Recovery and your Change Failure Rate are both really small, advanced telemetry is probably not your highest priority. 
If you already have a simple logging solution or even an advanced logging solution that enables your team to quickly have enough insight to solve and prevent problems effectively, there are most likely other areas that are higher priority to focus on. This doesn't mean that advanced telemetry won't be valuable in the future, but it should not be your current focus. + *Telemetry Surfaces Politics.* OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as **shared opportunities**, not personal failings. *Some Assembly Required* OTel isn’t plug-and-play. It’s a toolkit of SDKs, exporters, and collectors you assemble to fit your system. Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective tissue that unifies data and insight across teams. From cc601e5a1131d479b1277735586fc71610e60168 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Mon, 1 Dec 2025 10:24:34 -0800 Subject: [PATCH 048/131] Fix special chars --- practices/open-telemetry-practice.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 3b18c3a..9b63929 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -1,6 +1,6 @@ # Adopt the OpenTelemetry Standard -Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it’s hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. But there's a catch: These details, while useful, may not be standardized. Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system like [Honeycomb](https://www.honeycomb.io/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/). Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions. +Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it's hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. But there's a catch: These details, while useful, may not be standardized. Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. 
Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system like [Honeycomb](https://www.honeycomb.io/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/). Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions. When the OTel standard is adopted, teams can see how requests move through the system. Scattered logs and isolated metrics are collected to form a single, connected view of system behavior. It shows where time is spent, where failures occur, and how components interact. With that visibility, debugging is faster, performance work is more deliberate, and improvements become evidence-based rather than guided by hunches. @@ -11,13 +11,13 @@ Beyond better debugging, OTel positions the organization for long-term data leve - You're a developer who needs to keep systems operational and performant. - You're a QA who needs to ensure changes don't introduce systemic failures. - You're a product leader who needs to track how various changes (or experiments) are affecting the user experience. -- You’re an engineering leader focused on improving system reliability and creating fast feedback loops to understand system health and the impact of every change. +- You're an engineering leader focused on improving system reliability and creating fast feedback loops to understand system health and the impact of every change. ## How to Gain Traction ### Secure a Champion From Leadership -Every successful OpenTelemetry rollout begins with executive sponsorship. Adopting the OTel standard often requires significant time and budget, and it means competing with other organizational priorities. So, a shift toward OTel may face cultural resistance. It's helpful to have a leader who can connect the work to measurable business goals and clear obstacles when resistance appears. Use [Alibaba’s OpenTelemetry journey](/resources/tech/otel/alibaba-opentelemetry-journey.md) as a reference point; it helps leaders understand both the early friction and the long-term payoff of adopting a shared telemetry standard. +Every successful OpenTelemetry rollout begins with executive sponsorship. Adopting the OTel standard often requires significant time and budget, and it means competing with other organizational priorities. So, a shift toward OTel may face cultural resistance. It's helpful to have a leader who can connect the work to measurable business goals and clear obstacles when resistance appears. Use [Alibaba's OpenTelemetry journey](/resources/tech/otel/alibaba-opentelemetry-journey.md) as a reference point; it helps leaders understand both the early friction and the long-term payoff of adopting a shared telemetry standard. ### Form a Small Cross-Functional Team @@ -38,7 +38,7 @@ Begin with two simple telemetry configurations: Once signals are clear locally, deploy the collector and instrumentation to pre-prod and then production. -Success at this stage isn’t volume, it's usefulness. The pilot should answer questions engineers and leaders care about, such as where users drop off, what slows down a key flow, and how recent changes affect conversion or error spikes. +Success at this stage isn't volume, it's usefulness. 
The pilot should answer questions engineers and leaders care about, such as where users drop off, what slows down a key flow, and how recent changes affect conversion or error spikes. ### Standardize and Expand @@ -52,9 +52,9 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va *Telemetry Surfaces Politics.* OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as **shared opportunities**, not personal failings. -*Some Assembly Required* OTel isn’t plug-and-play. It’s a toolkit of SDKs, exporters, and collectors you assemble to fit your system. Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective tissue that unifies data and insight across teams. +*Some Assembly Required.* OTel isn't plug-and-play. It's a toolkit of SDKs, exporters, and collectors you assemble to fit your system. Success depends on treating it like infrastructure work: apply clean code, schema discipline, and solid CI practices. Built with care, OTel becomes the connective tissue that unifies data and insight across teams. -*Bridge, Don’t Replace.* People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product-analytics tooling. OTel should complement that instead of replacing it. +*Bridge, Don't Replace.* People already have preferred tools. Add trace IDs and references to link systems rather than trying to rip existing ones out early. For example, product teams may have specialized product-analytics tooling. OTel should complement that instead of replacing it. *Expect the Unexpected.* Auto-instrumentation often surfaces insights teams wouldn't think to look for. It can reveal details that manual instrumentation might miss like unused routes being hit by scanners, inefficient library calls, or unexpected dependency behavior. These discoveries can inform everything from performance tuning to security awareness, turning "extra" visibility into real operational intelligence. From 7a52cd13bffa31f65c5d087a0fb3d6f18ec7b838 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Mon, 1 Dec 2025 10:25:00 -0800 Subject: [PATCH 049/131] Add Ian's lesson from the field about otel performance concerns --- practices/open-telemetry-practice.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 9b63929..c7e502b 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -60,6 +60,8 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va *Be Mindful of Costs.* Early OTel rollouts often produce far more data than needed, especially when systems double-log through multiple agents or send every log line to a collector. This creates both noise and unexpected costs. The solution is to sample aggressively once instrumentation is proven and to use open-source visualizers such as [Grafana](https://grafana.com/) in development. You can also add a verbose configuration flag for local debugging that remains off in deployed environments, preserving deep visibility without inflating bills. Finally, favor traces and spans over raw logs since they provide richer context, lower storage costs, and make the data easier to interpret. 
+*Large Legacy Codebases and Data Pipelines Can Exacerbate Performance Concerns.* Legacy code that is untested or built in haste may be pushing up against CPU, memory, storage, and time limitations all on its own. Oftentimes, the business depends on these processes, so it becomes risky to refactor them. OTel can serve as a reliability monitor to prove the system is functioning as it did before, and with auto-instrumentation, the source code doesn't need to be disturbed. However, you'll need to consider the system constraints and OTel's limitations. Check the number of spans, span events, and logs you'll be generating, and confirm that you're not hitting hard limits in the OTel SDK, in the visualizer, or in the speed and volume of telemetry emissions. Also consider the OTel package size and whether you'll need to prune or compress it. You may consider sampling the telemetry output or increasing the legacy system's CPU class or memory allocation.
+
 ## Deciding to Polish or Pitch
 
 After experimenting with this practice for **4–5 weeks**, bring the team together and determine whether the following metrics and/or signals have changed in a positive direction:

From b7be45193e9ec9e76f665659735e128c793cbf60 Mon Sep 17 00:00:00 2001
From: Tristan Barrow
Date: Fri, 5 Dec 2025 12:27:29 -0700
Subject: [PATCH 050/131] The Mythical Man-Month

---
 resources/process/mytical-man-month.md | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 resources/process/mytical-man-month.md

diff --git a/resources/process/mytical-man-month.md b/resources/process/mytical-man-month.md
new file mode 100644
index 0000000..01e679d
--- /dev/null
+++ b/resources/process/mytical-man-month.md
@@ -0,0 +1,8 @@
+# *The Mythical Man-Month* by Frederick P. Brooks, Jr.
+
+Resource type: Book
+
+https://www.amazon.com/Mythical-Man-Month-Software-Engineering-Anniversary/dp/0201835959
+
+*The Mythical Man-Month* teaches the essential and counterintuitive principle that man-hours do not always equate to development output. This is an essential read for IT leaders who seek to optimize their organization's output without wasting resources or slowing down their teams.
+

From 166b7c4c20fd4c31be2f21c2a746fc70fcf6cf95 Mon Sep 17 00:00:00 2001
From: Dave Moore
Date: Fri, 12 Dec 2025 12:23:19 -0800
Subject: [PATCH 051/131] Add key learning to otel practice

---
 capabilities/monitoring-and-observability.md | 4 ++--
 practices/open-telemetry-practice.md         | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/capabilities/monitoring-and-observability.md b/capabilities/monitoring-and-observability.md
index fab0c45..f4cdc06 100644
--- a/capabilities/monitoring-and-observability.md
+++ b/capabilities/monitoring-and-observability.md
@@ -47,9 +47,9 @@ Generally, an overall score equal to or less than 3 means you'll likely gain a l
 
 The following is a curated list of supporting practices to consider when looking to improve your team's Monitoring and Observability capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability.
 
-### Instrument Systems With Telemetry Data
+### [Adopt the OpenTelemetry Standard](/practices/open-telemetry-practice.md)
 
-By instrumenting key parts of your application with telemetry data, teams gain real-time insights into usage patterns, performance bottlenecks, and opportunities to prioritize impactful changes.
+By instrumenting key parts of your application with telemetry data, teams gain real-time insights into usage patterns, performance bottlenecks, and opportunities to prioritize impactful changes. Following the OpenTelemetry standard and its suite of open-source tools to instrument your application provides consistent, vendor-neutral telemetry that preserves long-term flexibility in tooling and cost management.
 
 ### Implement Symptom-Based Alerts
 
diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md
index c7e502b..cfab8ed 100644
--- a/practices/open-telemetry-practice.md
+++ b/practices/open-telemetry-practice.md
@@ -48,7 +48,7 @@ Once the pilot produces consistent, valuable traces, shift focus from proving va
 
 *Quick Wins Build Momentum.* Observability improvements mean little if no one notices. Publicize early examples of time saved and bugs caught to fuel buy-in.
 
-*When is Open Telemetry Too Much or Premature?* If your Mean Time To Recovery and your Change Failure Rate are both really small, advanced telemetry is probably not your highest priority. If you already have a simple logging solution or even an advanced logging solution that enables your team to quickly have enough insight to solve and prevent problems effectively, there are most likely other areas that are higher priority to focus on. This doesn't mean that advanced telemetry won't be valuable in the future, but it should not be your current focus.
+*When is OpenTelemetry Too Much?* When the organization values speed to insight over long-term flexibility. Direct vendor SDKs can deliver faster time-to-value with fewer moving parts, strong defaults, and less platform work. The trade-off is tighter technical and semantic coupling between application code and the observability vendor, increasing the long-term cost of change. OpenTelemetry becomes especially worthwhile when teams want an architectural boundary in which the collector centrally manages sampling, data transformation, cost controls, and routing to different backends without having to redeploy application servers.
 
 *Telemetry Surfaces Politics.* OTel reveals ownership gaps and bottlenecks. In bureaucratic cultures, this requires tact. Frame findings as **shared opportunities**, not personal failings.
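To make that collector boundary concrete, here is a hedged sketch of manually instrumenting one key flow against the vendor-neutral `@opentelemetry/api` package. The span name, attributes, and downstream calls are invented for illustration, and the exporter wiring from the earlier sketch is assumed to already be in place; with that split, sampling, routing, and cost controls stay in the collector rather than in application code.

```ts
// checkout.ts -- wrap one key business flow in a span (names are illustrative)
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout-service");

// Hypothetical downstream calls, stubbed so the sketch compiles on its own.
async function chargeCard(cartId: string): Promise<void> {}
async function reserveInventory(cartId: string): Promise<void> {}

export async function placeOrder(cartId: string, userId: string): Promise<void> {
  await tracer.startActiveSpan("checkout.place_order", async (span) => {
    // Attributes make the trace answer product questions (drop-off, slow steps) later on.
    span.setAttribute("app.cart_id", cartId);
    span.setAttribute("app.user_id", userId);
    try {
      await chargeCard(cartId);
      await reserveInventory(cartId);
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```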
From 25cbf7b1dec4f31eaaf3a173fd324da78623ae13 Mon Sep 17 00:00:00 2001 From: Brian Pratt Date: Tue, 16 Dec 2025 15:30:53 -0500 Subject: [PATCH 052/131] More on Perform Automated Code Analysis --- practices/perform-automated-code-analysis.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/practices/perform-automated-code-analysis.md b/practices/perform-automated-code-analysis.md index 7c7f363..bda86e4 100644 --- a/practices/perform-automated-code-analysis.md +++ b/practices/perform-automated-code-analysis.md @@ -5,6 +5,7 @@ Catching every bug or style inconsistency by hand is tough and takes a lot of ti Some popular tools for automated code analysis include: - Static Analysis & Linting: [ESLint](https://eslint.org/docs/latest/use/getting-started), [SonarQube](https://github.com/SonarSource/sonarqube), and [Semgrep](https://github.com/semgrep/semgrep) can be used to enforce code quality + - for TypeScript, try the [typescript-eslint](https://typescript-eslint.io) plugin - Code Formatting: [Prettier](https://prettier.io/docs/integrating-with-linters) (TS/JS) and [rustfmt](https://github.com/rust-lang/rustfmt) (Rust) automatically enforce consistent code style - Code Query Language: [GritQL](https://github.com/honeycombio/gritql), [CodeQL](https://codeql.github.com/), and [comby](https://github.com/comby-tools/comby) can search, lint, and modify code - General Purpose AI Agents: [Claude Code](https://www.anthropic.com/claude), [Cursor](https://cursor.com/), and [Gemini-CLI](https://github.com/google-gemini/gemini-cli) are all general purpose AI-powered agents that can be used for code generation, review, style enforcement, and bug detection @@ -19,6 +20,12 @@ Some popular tools for automated code analysis include: ## How to Gain Traction +### Incremental Adoption + +Introducing linting/formatting tools to an existing codebase can result in large, sweeping changes, which can be hard to review, causing conflicts and delays. It is helpful to start with a more permissive configuration, gradually adding coverage and introducing more strict configurations. For example, ESLint has a [Typed-Linting](https://typescript-eslint.io/troubleshooting/typed-linting/) plugin that can be gradually applied. + +Additionally, and especially with legacy systems, it can be helpful to gradually apply the linting rules to the codebase itself. Start with your core business logic and you can gradually expand the coverage over time. + ### Start with Education & Demos Begin with a 30-minute live session to align the team on what automated code analysis is, why it matters, and how it fits into their daily work. Share resources like [AI vs Rule-based Static Code Analysis by Kendrick Curtis](/resources/tech/ai-vs-rule-based-static-code-analysis.md) in advance, so team members can come prepared with questions. Close the session with a short demo in your actual codebase using one or multiple of the tools listed above to make the value real and immediate. @@ -45,6 +52,8 @@ Assuming the pilot went well, gather the team to share results and best practice - _Use the Right Tools for the Job_ – Not all static code analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments; this leads to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. 
A lightweight multi-tool setup, tuned per language, often outperforms an all-in-one solution. +- _Avoid tyrannical metrics_ - Tools that calculate a numerical quality or complexity score are useful as a general compass, but reach a point of diminishing returns quickly and can even be detrimental to overall codebase cohesion if you start chasing perfect numbers. + ## Deciding to Polish or Pitch After experimenting with this practice for **4–5 weeks**, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: From 6368c611696cee1cfce28c51fd06202f9c427a7b Mon Sep 17 00:00:00 2001 From: nicoletache Date: Tue, 30 Dec 2025 11:12:23 -0600 Subject: [PATCH 053/131] final edits to Perform Automated Code Analysis practice --- practices/perform-automated-code-analysis.md | 45 ++++++++++---------- 1 file changed, 22 insertions(+), 23 deletions(-) diff --git a/practices/perform-automated-code-analysis.md b/practices/perform-automated-code-analysis.md index bda86e4..2a1fb62 100644 --- a/practices/perform-automated-code-analysis.md +++ b/practices/perform-automated-code-analysis.md @@ -2,37 +2,36 @@ Catching every bug or style inconsistency by hand is tough and takes a lot of time. Automated code analysis brings speed and consistency to teams by delegating that task to tools. These tools (both traditional static analyzers and modern AI-powered assistants) can highlight security vulnerabilities, style discrepancies, dependency risks, and even suggest or apply fixes in real time. -Some popular tools for automated code analysis include: +Tools can perform a variety of automated code analysis tasks, including: -- Static Analysis & Linting: [ESLint](https://eslint.org/docs/latest/use/getting-started), [SonarQube](https://github.com/SonarSource/sonarqube), and [Semgrep](https://github.com/semgrep/semgrep) can be used to enforce code quality - - for TypeScript, try the [typescript-eslint](https://typescript-eslint.io) plugin -- Code Formatting: [Prettier](https://prettier.io/docs/integrating-with-linters) (TS/JS) and [rustfmt](https://github.com/rust-lang/rustfmt) (Rust) automatically enforce consistent code style -- Code Query Language: [GritQL](https://github.com/honeycombio/gritql), [CodeQL](https://codeql.github.com/), and [comby](https://github.com/comby-tools/comby) can search, lint, and modify code -- General Purpose AI Agents: [Claude Code](https://www.anthropic.com/claude), [Cursor](https://cursor.com/), and [Gemini-CLI](https://github.com/google-gemini/gemini-cli) are all general purpose AI-powered agents that can be used for code generation, review, style enforcement, and bug detection -- AI Powered Code Review: [Ellipsis](https://www.ellipsis.dev/), [GitHub Copilot](https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review), [CodeRabbit](https://www.coderabbit.ai), and [Cursor Bugbot](https://cursor.com/bugbot) provide AI-assisted reviews and inline feedback -- Self-hosted LLMs: Tools like [Ollama](https://github.com/ollama/) or [LM Studio](https://github.com/lmstudio-ai) allow you to run open-source AI models locally which in turn can be used to power some open source agentic tools +- Static analysis & linting: [ESLint](https://eslint.org/docs/latest/use/getting-started), [SonarQube](https://github.com/SonarSource/sonarqube), and [Semgrep](https://github.com/semgrep/semgrep) can be used to enforce code quality. For TypeScript, try the [typescript-eslint](https://typescript-eslint.io) plugin. 
+- Code formatting: [Prettier](https://prettier.io/docs/integrating-with-linters) (TS/JS) and [rustfmt](https://github.com/rust-lang/rustfmt) (Rust) automatically enforce consistent code style. +- Code query language: [GritQL](https://github.com/honeycombio/gritql), [CodeQL](https://codeql.github.com/), and [comby](https://github.com/comby-tools/comby) can search, lint, and modify code. +- General purpose AI: [Claude Code](https://www.anthropic.com/claude), [Cursor](https://cursor.com/), and [Gemini-CLI](https://github.com/google-gemini/gemini-cli) are all general purpose AI-powered agents that can be used for code generation, review, style enforcement, and bug detection. +- AI-powered code review: [Ellipsis](https://www.ellipsis.dev/), [GitHub Copilot](https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review), [CodeRabbit](https://www.coderabbit.ai), and [Cursor Bugbot](https://cursor.com/bugbot) provide AI-assisted reviews and inline feedback. +- Self-hosted LLMs: Tools like [Ollama](https://github.com/ollama/) or [LM Studio](https://github.com/lmstudio-ai) allow you to run open-source AI models locally, which in turn can be used to power some open source agentic tools. ## When to Experiment - You are a developer who needs fast feedback on bugs, design issues, and inconsistencies so you can work more efficiently and avoid waiting for review cycles. -- You are a QA engineer and need to identify high-risk areas earlier so you can effectively focus your limited testing time. -- You are a tech lead or manager need to enforce consistent code quality across the team so we can deliver successful products without increasing review overhead. +- You are a QA engineer who needs to identify high-risk areas earlier so you can effectively focus your limited testing time. +- You are a tech lead or manager who needs to enforce consistent code quality across the team so it can deliver successful products without increasing review overhead. ## How to Gain Traction ### Incremental Adoption -Introducing linting/formatting tools to an existing codebase can result in large, sweeping changes, which can be hard to review, causing conflicts and delays. It is helpful to start with a more permissive configuration, gradually adding coverage and introducing more strict configurations. For example, ESLint has a [Typed-Linting](https://typescript-eslint.io/troubleshooting/typed-linting/) plugin that can be gradually applied. +Introducing linting and formatting tools to an existing codebase can result in large, sweeping changes, which can be hard to review, causing conflicts and delays. It is helpful to start with a more permissive configuration, gradually adding coverage and introducing more strict configurations. For example, ESLint has a [Typed-Linting](https://typescript-eslint.io/troubleshooting/typed-linting/) plugin that can be gradually applied. -Additionally, and especially with legacy systems, it can be helpful to gradually apply the linting rules to the codebase itself. Start with your core business logic and you can gradually expand the coverage over time. +Additionally, and especially with legacy systems, it can be helpful to gradually apply the linting rules to the codebase itself. Start with your core business logic and gradually expand the coverage over time. 
-### Start with Education & Demos +### Start With Education & Demos -Begin with a 30-minute live session to align the team on what automated code analysis is, why it matters, and how it fits into their daily work. Share resources like [AI vs Rule-based Static Code Analysis by Kendrick Curtis](/resources/tech/ai-vs-rule-based-static-code-analysis.md) in advance, so team members can come prepared with questions. Close the session with a short demo in your actual codebase using one or multiple of the tools listed above to make the value real and immediate. +Begin with a 30-minute live session to align the team on what automated code analysis is, why it matters, and how it fits into daily work. Share resources like [AI vs Rule-based Static Code Analysis by Kendrick Curtis](/resources/tech/ai-vs-rule-based-static-code-analysis.md) in advance, so team members can come prepared with questions. Close the session with a short demo in your actual codebase using one or many of the tools listed above to make their value relevant and immediate. ### Run a Pilot on a Single Repo -Choose one active repository and integrate one or two automated analysis tools (both a static analyzer and, optionally, an AI assistant). Measure how quickly developers address flagged issues and collect feedback. +Choose one active repository and integrate one or two automated analysis tools (a static analyzer and, optionally, an AI assistant). Measure how quickly developers address flagged issues and collect feedback. ### Optimize Rules and Initiate Feedback Loops @@ -40,19 +39,19 @@ Start with default rules, then refine based on false positive rates and team fee ### Expand Across Teams -Assuming the pilot went well, gather the team to share results and best practices. Provide setup guides and starter configs so the practice may gain wider adoption across teams. Consider hosting internal workshops to help developers get the most from the tools. +Assuming the pilot went well, gather the team to share results and best practices. Provide setup guides and starter configs so the practice may gain wider adoption across teams. Consider hosting internal workshops to help developers best leverage the tools. ## Lessons From The Field -- _Review Fatigue Kills Trust_ – When teams adopt static code analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](/resources/tech/where-ai-meets-code.md), a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. We’ve seen teams where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. Curate rulesets with developer input and trim overly noisy alerts. Prioritize signal strength over volume to preserve trust and ensure these tools remain useful over time. +- _Review Fatigue Kills Trust_ – When teams adopt static code analysis tools without tuning them, developers quickly become numb to the noise. Repeatedly flagging false positives or nitpicky issues creates [review fatigue](/resources/tech/where-ai-meets-code.md), a term coined by Michael Feathers to describe the erosion of attention and care during reviews due to cognitive overload. We’ve seen situations where high-friction rules led to engineers auto-dismissing feedback, eventually ignoring tools entirely. Curate rulesets with developer input and trim overly noisy alerts. 
Prioritize signal strength over volume to preserve trust and ensure these tools remain useful over time. -- _Combine AI Tools With Peer Review_ – Automation should complement, not replace, human review. AI-assisted tools like Claude Code can help developers catch bugs earlier, write cleaner code, and accelerate onboarding (especially for newer team members). However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Treat AI and automation suggestions like junior developer input -- often helpful, but not always right. Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” +- _Combine AI Tools With Peer Review_ – Automation should complement, not replace, human review. General purpose AI agents like Claude Code can help developers catch bugs earlier, write cleaner code, and accelerate onboarding (especially for newer team members). However, these tools can occasionally propose flawed or overly confident fixes. Teams that encourage developers to use AI for pre-review, followed by intentional peer validation, tend to see the greatest gains. Treat AI and automation suggestions like junior developer input -- often helpful, but not always right. Peer review remains essential for catching edge cases, maintaining architectural integrity, and avoiding over-reliance on “green checks.” - _Early Integration Reduces Friction_ – Teams that surface static code analysis results directly in the developer’s IDE tend to resolve issues faster and with less frustration. When feedback is delayed to CI or post-push review, issues are often skipped or rushed because the developer has already context-switched. By contrast, showing issues inline -- right when code is being written -- leads to higher-quality fixes and builds better habits over time. The sooner the feedback appears, the more likely it is to be acted on. Integrate tools into editors like VS Code or JetBrains, not just your CI, to reduce disruption and encourage learning. -- _Use the Right Tools for the Job_ – Not all static code analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments; this leads to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general-purpose and language-specific tools where appropriate. A lightweight multi-tool setup, tuned per language, often outperforms an all-in-one solution. +- _Use the Right Tools for the Job_ – Not all static code analysis tools are equally effective across languages or tech stacks. Teams often adopt a single tool out of convenience, only to find it lacks precision in certain environments; this leads to false positives or missed issues. A better approach is to assess tools based on the codebase, language, and team needs, combining general purpose and language-specific tools where appropriate. A lightweight multi-tool setup, tuned per language, often outperforms an all-in-one solution. -- _Avoid tyrannical metrics_ - Tools that calculate a numerical quality or complexity score are useful as a general compass, but reach a point of diminishing returns quickly and can even be detrimental to overall codebase cohesion if you start chasing perfect numbers. 
+- _Avoid Tyrannical Metrics_ - Tools that calculate a numerical quality or complexity score are useful as a general compass, but they reach a point of diminishing returns quickly and can even be detrimental to overall codebase cohesion if you start chasing perfect numbers. ## Deciding to Polish or Pitch @@ -66,13 +65,13 @@ After experimenting with this practice for **4–5 weeks**, bring the team toget ### Fast & Intangible -**Developer Sentiment**. Friction during code reviews should decline. Capture this via lightweight surveys ([Typeform](https://www.typeform.com/) or [Google Forms](https://workspace.google.com/products/forms/)), or retro feedback that points to reduced nitpicky debates and faster review cycles. +**Positive Developer Sentiment**. Friction during code reviews should decline. Capture this via lightweight surveys ([Typeform](https://www.typeform.com/) or [Google Forms](https://workspace.google.com/products/forms/)), or retro feedback that points to reduced nitpicky debates among developers and faster review cycles. ### Slow & Measurable -**Production Bug Reduction**. Over time, there should be fewer production incidents tied to preventable errors (null checks, insecure patterns, etc.). Track this by tagging incident postmortems, categorizing bugs in [Jira](https://support.atlassian.com/jira-cloud-administration/docs/what-are-issue-types/), [Linear](https://linear.app/docs/labels), or observability platforms like [Sentry](https://docs.sentry.io/product/issues/). +**Production Bug Reduction**. Over time, there should be fewer production incidents tied to preventable errors (null checks, insecure patterns, etc.). Track this by tagging incident postmortems, and categorizing bugs in [Jira](https://support.atlassian.com/jira-cloud-administration/docs/what-are-issue-types/), [Linear](https://linear.app/docs/labels), or observability platforms like [Sentry](https://docs.sentry.io/product/issues/). -**Consistency & Maintainability**. Static analysis and linting scores should show steady improvement. Use [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/) to track rule compliance trends and codebase quality. +**Greater Consistency & Maintainability**. Static analysis and linting scores should show steady improvement. Use [SonarQube dashboards](https://docs.sonarsource.com/sonarqube-server/10.6/user-guide/code-metrics/introduction/) or [Semgrep reports](https://semgrep.dev/docs/semgrep-ci/overview/) to track rule compliance trends and codebase quality. 
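To ground the incremental-adoption guidance above, the sketch below shows one way an ESLint flat config can keep a permissive baseline across the repository while applying stricter, type-aware rules only to core business logic first. The `src/core/**` glob is a placeholder and the helper names follow typescript-eslint's v8-era API, so treat it as a starting point rather than a drop-in file.

```ts
// eslint.config.mjs -- incremental adoption sketch (typescript-eslint v8-style flat config)
import tseslint from "typescript-eslint";

export default tseslint.config(
  // Permissive baseline for the whole repository.
  ...tseslint.configs.recommended,

  // Stricter, type-aware rules scoped to core business logic first;
  // widen the glob as more of the codebase is brought under coverage.
  {
    files: ["src/core/**/*.ts"],
    extends: [...tseslint.configs.recommendedTypeChecked],
    languageOptions: {
      parserOptions: { projectService: true },
    },
  },
);
```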
## Supporting Capabilities From 4b99e908f4765ff482156799ba29e817ffb5c98b Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 5 Jan 2026 15:18:33 -0700 Subject: [PATCH 054/131] add new capability empty files --- capabilities/ai-accessable-internal-data.md | 0 capabilities/clear-and-communicated-ai-stance.md | 0 capabilities/healthy-data-ecosystems.md | 0 capabilities/platform-engineering.md | 0 capabilities/user-centric-focus.md | 0 5 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 capabilities/ai-accessable-internal-data.md create mode 100644 capabilities/clear-and-communicated-ai-stance.md create mode 100644 capabilities/healthy-data-ecosystems.md create mode 100644 capabilities/platform-engineering.md create mode 100644 capabilities/user-centric-focus.md diff --git a/capabilities/ai-accessable-internal-data.md b/capabilities/ai-accessable-internal-data.md new file mode 100644 index 0000000..e69de29 diff --git a/capabilities/clear-and-communicated-ai-stance.md b/capabilities/clear-and-communicated-ai-stance.md new file mode 100644 index 0000000..e69de29 diff --git a/capabilities/healthy-data-ecosystems.md b/capabilities/healthy-data-ecosystems.md new file mode 100644 index 0000000..e69de29 diff --git a/capabilities/platform-engineering.md b/capabilities/platform-engineering.md new file mode 100644 index 0000000..e69de29 diff --git a/capabilities/user-centric-focus.md b/capabilities/user-centric-focus.md new file mode 100644 index 0000000..e69de29 From 5bc7b3d99fe057bfe8a6233d23141178feba89e2 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 5 Jan 2026 16:00:20 -0700 Subject: [PATCH 055/131] add clear stance on AI capability --- .../clear-and-communicated-ai-stance.md | 77 +++++++++++++++++++ 1 file changed, 77 insertions(+) diff --git a/capabilities/clear-and-communicated-ai-stance.md b/capabilities/clear-and-communicated-ai-stance.md index e69de29..0074dea 100644 --- a/capabilities/clear-and-communicated-ai-stance.md +++ b/capabilities/clear-and-communicated-ai-stance.md @@ -0,0 +1,77 @@ +# [Clear and Communicated AI Stance](https://dora.dev/capabilities/clear-and-communicated-ai-stance/) + +A **Clear and Communicated AI Stance** means that an organization has established and shared a formal position on the use of Artificial Intelligence. This isn't just a legal "thou shalt not" document; it is a framework that provides developers and teams with guidance on how, where, and why AI tools—such as Large Language Models (LLMs) and coding assistants—should be used. The goal is to provide a "safe paved road" for innovation while managing risks related to security, legal compliance, and ethics. + +## Nuances + +This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal is to arm you with the context needed to make informed decisions about implementing an AI stance. + +### The "Policy in a Vacuum" Problem + +A policy is only effective if people know it exists. Often, organizations create an AI policy that lives in a legal or HR portal that developers never visit. Without active communication, teams default to "Shadow AI," using tools without oversight because they don't know the rules or find the official process too slow. + +### Balancing Restriction with Enablement + +An AI stance that is purely restrictive (e.g., "AI is banned") often results in a loss of competitive advantage and can drive usage underground. Conversely, an "anything goes" approach introduces massive legal and security risks. 
The most effective stances are nuanced—defining which tools are safe for public data versus which can be used with proprietary code. + +### The Speed of Change + +The AI landscape evolves faster than traditional corporate policy cycles. A stance written six months ago may not cover new capabilities like autonomous AI agents or local model execution. To remain relevant, an AI stance must be a "living document" that is reviewed and updated at a higher frequency than other organizational policies. + +### Ambiguity in "Reasonable Use" + +Terms like "use AI responsibly" are too vague to be actionable for a developer. For an AI stance to be effective, it needs to address specific, everyday concerns: Can I use it for refactoring? Can I use it to summarize meeting notes containing customer data? Can I use AI-generated code in our production repository? Clarity is the antidote to hesitation and risk. + +## Assessment + +To assess how mature your team or organization is in this capability, complete this short exercise. + +Consider the descriptions below and score your team on this capability. Generally, score a **1** if the stance is non-existent or hidden, a **2** if it is reactive and vague, a **3** if it is clear and well-communicated, and a **4** if it is integrated and iteratively updated. + +1. **Absent or Hidden:** No formal stance exists, or it is buried in legal documentation that is not shared with technical teams. Developers are unsure what is allowed, leading to either total avoidance or "underground" usage. +2. **Reactive & Vague:** A stance exists but is mostly reactive (e.g., "don't put passwords in ChatGPT"). Guidelines are unclear, and there is no centralized place to find updates or ask questions about new tools. +3. **Clear & Communicated:** There is a well-documented AI policy that is easily accessible. Most team members understand the boundaries of AI use, and there is a clear process for requesting or vetting new AI tools. +4. **Integrated & Iterative:** The AI stance is part of the daily engineering culture. It is regularly updated based on team feedback and technological shifts. There is high confidence in using AI because the legal and security guardrails are clear and supportive. + +## Supporting Practices + +The following is a curated list of supporting practices to consider when looking to improve your team's AI Stance capability. + +### Create an "Approved AI Services" Catalog + +Maintain a central, internal list of AI tools and models that have been vetted for security and legal compliance. This reduces the cognitive load on developers, as they don't have to wonder if a specific tool is "safe" to use. + +### Establish "Security Tiers" for AI Interaction + +Clearly define what level of data can be sent to external AI providers. For example: +- **Tier 1 (Public):** Can be used with any tool. +- **Tier 2 (Proprietary Code):** Requires enterprise-grade tools with data-exclusion opt-outs. +- **Tier 3 (Sensitive/PII):** Strictly prohibited from external LLMs. + +### Provide "Prompt Engineering" Guidance + +Instead of just giving permission to use AI, provide guidance on how to use it effectively and safely. Sharing "Golden Prompts" for tasks like unit test generation or documentation helps standardize the quality of AI-assisted work. + +### Automate Policy Enforcement + +Where possible, use tooling—like secret scanners or egress filters—to ensure that sensitive data isn't being sent to unapproved AI endpoints. 
This moves the AI stance from a "policy you have to remember" to a system that supports you. + +## Adjacent Capabilities + +The following capabilities will be valuable for you and your team to explore, as they are either: + +- Related (they cover similar territory to Clear and Communicated AI Stance) +- Upstream (they are a pre-requisite for Clear and Communicated AI Stance) +- Downstream (Clear and Communicated AI Stance is a pre-requisite for them) + +### [Pervasive Security](/capabilities/pervasive-security.md) - Upstream + +A strong security culture is a prerequisite for a good AI stance. If you don't +already understand your data classification, it will be difficult to write a +clear policy on how AI should interact with it. + +### [Empowering Teams](/capabilities/empowering-teams.md) - Downstream + +Providing a clear AI stance empowers teams to innovate. When developers know exactly where the guardrails are, they feel safer experimenting with new ways to improve their workflow. + + From eac1bb073e0b2ba699b45b413cf1a55b4072301b Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Tue, 6 Jan 2026 09:03:46 -0700 Subject: [PATCH 056/131] fix formatting --- capabilities/clear-and-communicated-ai-stance.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/capabilities/clear-and-communicated-ai-stance.md b/capabilities/clear-and-communicated-ai-stance.md index 0074dea..25022c4 100644 --- a/capabilities/clear-and-communicated-ai-stance.md +++ b/capabilities/clear-and-communicated-ai-stance.md @@ -66,12 +66,9 @@ The following capabilities will be valuable for you and your team to explore, as ### [Pervasive Security](/capabilities/pervasive-security.md) - Upstream -A strong security culture is a prerequisite for a good AI stance. If you don't -already understand your data classification, it will be difficult to write a -clear policy on how AI should interact with it. +A strong security culture is a prerequisite for a good AI stance. If you don't already understand your data classification, it will be difficult to write a clear policy on how AI should interact with it. ### [Empowering Teams](/capabilities/empowering-teams.md) - Downstream Providing a clear AI stance empowers teams to innovate. When developers know exactly where the guardrails are, they feel safer experimenting with new ways to improve their workflow. - From e1433466c9f7f9adb997bf0038faed3b54d10104 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Fri, 9 Jan 2026 14:05:17 -0600 Subject: [PATCH 057/131] edits to new clear ai stance capability --- .../clear-and-communicated-ai-stance.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/capabilities/clear-and-communicated-ai-stance.md b/capabilities/clear-and-communicated-ai-stance.md index 25022c4..be658cb 100644 --- a/capabilities/clear-and-communicated-ai-stance.md +++ b/capabilities/clear-and-communicated-ai-stance.md @@ -1,20 +1,20 @@ # [Clear and Communicated AI Stance](https://dora.dev/capabilities/clear-and-communicated-ai-stance/) -A **Clear and Communicated AI Stance** means that an organization has established and shared a formal position on the use of Artificial Intelligence. This isn't just a legal "thou shalt not" document; it is a framework that provides developers and teams with guidance on how, where, and why AI tools—such as Large Language Models (LLMs) and coding assistants—should be used. 
The goal is to provide a "safe paved road" for innovation while managing risks related to security, legal compliance, and ethics. +A **clear and communicated AI stance** means that an organization has established and shared a formal position on the use of AI. This isn't just a legal "thou shalt not" document; it is a framework that provides developers and teams with guidance on how, where, and why AI tools—-such as Large Language Models (LLMs) and coding assistants—-should be used. The goal is to provide a "safe paved road" for innovation while managing risks related to security, legal compliance, and ethics. ## Nuances This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal is to arm you with the context needed to make informed decisions about implementing an AI stance. -### The "Policy in a Vacuum" Problem +### The "Policy-in-a-Vacuum" Problem -A policy is only effective if people know it exists. Often, organizations create an AI policy that lives in a legal or HR portal that developers never visit. Without active communication, teams default to "Shadow AI," using tools without oversight because they don't know the rules or find the official process too slow. +A policy is only effective if people know it exists. Often, organizations create an AI policy that lives in a legal or HR portal that developers never visit. Without active communication, teams default to "shadow AI," using tools without oversight because they don't know the rules or feel the official process is too slow. ### Balancing Restriction with Enablement -An AI stance that is purely restrictive (e.g., "AI is banned") often results in a loss of competitive advantage and can drive usage underground. Conversely, an "anything goes" approach introduces massive legal and security risks. The most effective stances are nuanced—defining which tools are safe for public data versus which can be used with proprietary code. +An AI stance that is purely restrictive (e.g., "AI is banned") often results in a loss of competitive advantage and can drive usage underground. Conversely, an "anything goes" approach introduces massive legal and security risks. The most effective stances are nuanced—-defining which tools are safe for public data versus which can be used with proprietary code. -### The Speed of Change +### Recognizing the Speed of Change The AI landscape evolves faster than traditional corporate policy cycles. A stance written six months ago may not cover new capabilities like autonomous AI agents or local model execution. To remain relevant, an AI stance must be a "living document" that is reviewed and updated at a higher frequency than other organizational policies. @@ -26,7 +26,7 @@ Terms like "use AI responsibly" are too vague to be actionable for a developer. To assess how mature your team or organization is in this capability, complete this short exercise. -Consider the descriptions below and score your team on this capability. Generally, score a **1** if the stance is non-existent or hidden, a **2** if it is reactive and vague, a **3** if it is clear and well-communicated, and a **4** if it is integrated and iteratively updated. +Consider the descriptions below and score your team on this capability. Generally, score a **1** if the AI stance is non-existent or hidden, a **2** if it is reactive and vague, a **3** if it is clear and well-communicated, and a **4** if it is integrated and iteratively updated. 1. 
**Absent or Hidden:** No formal stance exists, or it is buried in legal documentation that is not shared with technical teams. Developers are unsure what is allowed, leading to either total avoidance or "underground" usage. 2. **Reactive & Vague:** A stance exists but is mostly reactive (e.g., "don't put passwords in ChatGPT"). Guidelines are unclear, and there is no centralized place to find updates or ask questions about new tools. @@ -44,9 +44,9 @@ Maintain a central, internal list of AI tools and models that have been vetted f ### Establish "Security Tiers" for AI Interaction Clearly define what level of data can be sent to external AI providers. For example: -- **Tier 1 (Public):** Can be used with any tool. -- **Tier 2 (Proprietary Code):** Requires enterprise-grade tools with data-exclusion opt-outs. -- **Tier 3 (Sensitive/PII):** Strictly prohibited from external LLMs. +- **Tier 1 (Public):** Can be used with any tool +- **Tier 2 (Proprietary Code):** Requires enterprise-grade tools with data-exclusion opt-outs +- **Tier 3 (Sensitive/PII):** Strictly prohibited from external LLMs ### Provide "Prompt Engineering" Guidance @@ -54,7 +54,7 @@ Instead of just giving permission to use AI, provide guidance on how to use it e ### Automate Policy Enforcement -Where possible, use tooling—like secret scanners or egress filters—to ensure that sensitive data isn't being sent to unapproved AI endpoints. This moves the AI stance from a "policy you have to remember" to a system that supports you. +Where possible, use tooling—-like secret scanners or egress filters—-to ensure that sensitive data isn't being sent to unapproved AI endpoints. This moves the AI stance from a "policy you have to remember" to a system that supports you. ## Adjacent Capabilities From 529bbb88d99278fc1ff4621046cf4695bd46b17e Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 12 Jan 2026 11:38:47 -0700 Subject: [PATCH 058/131] adding missing standard text --- capabilities/clear-and-communicated-ai-stance.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/capabilities/clear-and-communicated-ai-stance.md b/capabilities/clear-and-communicated-ai-stance.md index be658cb..56d42f6 100644 --- a/capabilities/clear-and-communicated-ai-stance.md +++ b/capabilities/clear-and-communicated-ai-stance.md @@ -28,11 +28,17 @@ To assess how mature your team or organization is in this capability, complete t Consider the descriptions below and score your team on this capability. Generally, score a **1** if the AI stance is non-existent or hidden, a **2** if it is reactive and vague, a **3** if it is clear and well-communicated, and a **4** if it is integrated and iteratively updated. +Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. + 1. **Absent or Hidden:** No formal stance exists, or it is buried in legal documentation that is not shared with technical teams. Developers are unsure what is allowed, leading to either total avoidance or "underground" usage. 2. **Reactive & Vague:** A stance exists but is mostly reactive (e.g., "don't put passwords in ChatGPT"). Guidelines are unclear, and there is no centralized place to find updates or ask questions about new tools. 3. **Clear & Communicated:** There is a well-documented AI policy that is easily accessible. Most team members understand the boundaries of AI use, and there is a clear process for requesting or vetting new AI tools. 4. 
**Integrated & Iterative:** The AI stance is part of the daily engineering culture. It is regularly updated based on team feedback and technological shifts. There is high confidence in using AI because the legal and security guardrails are clear and supportive.
 
+The number you selected represents your overall score for this capability. If you feel like the general Clear and Communicated AI Stance of your team fits somewhere in between two scores, it's okay to use a decimal. For example, if you think your team's stance is somewhere between reactive guidance and a clear, well-communicated policy, you would score a 2.5.
+
+Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of Clear and Communicated AI Stance; you would likely benefit from evaluating your scores in other capabilities.
+
 ## Supporting Practices
 
 The following is a curated list of supporting practices to consider when looking to improve your team's AI Stance capability.
@@ -44,6 +50,7 @@ Maintain a central, internal list of AI tools and models that have been vetted f
 ### Establish "Security Tiers" for AI Interaction
 
 Clearly define what level of data can be sent to external AI providers. For example:
+
 - **Tier 1 (Public):** Can be used with any tool
 - **Tier 2 (Proprietary Code):** Requires enterprise-grade tools with data-exclusion opt-outs
 - **Tier 3 (Sensitive/PII):** Strictly prohibited from external LLMs

From 2e750efaef0d1c6789c07e010552430e4d4e0980 Mon Sep 17 00:00:00 2001
From: Tristan Barrow
Date: Tue, 6 Jan 2026 15:04:31 -0700
Subject: [PATCH 059/131] Add Capability - healthy data

---
 capabilities/healthy-data-ecosystems.md | 55 +++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/capabilities/healthy-data-ecosystems.md b/capabilities/healthy-data-ecosystems.md
index e69de29..24c3864 100644
--- a/capabilities/healthy-data-ecosystems.md
+++ b/capabilities/healthy-data-ecosystems.md
@@ -0,0 +1,55 @@
+# [Healthy Data Ecosystems](https://dora.dev/capabilities/healthy-data-ecosystems)
+
+A Healthy Data Ecosystem means that the data your organization relies on—whether for operational decision-making, automated testing, or product features—is accurate, accessible, and handled with care. It’s about moving away from "data silos" and "data swamps" toward a state where data is treated as a first-class citizen. In a healthy ecosystem, teams can trust the data they use, understand where it came from, and access it without jumping through bureaucratic hoops.
+
+## Nuances
+
+Building a healthy data ecosystem isn't just a technical challenge; it’s a cultural and process-oriented one. Here are the common hurdles teams face when trying to improve their data health.
+
+### The "Garbage In, Garbage Out" Trap
+Automated systems and AI are only as good as the data feeding them. If your upstream data is poorly formatted, incomplete, or inaccurate, any "downstream" capability (like Continuous Delivery or Predictive Analytics) will suffer. Teams often focus on the shiny tools at the end of the pipeline while neglecting the cleanliness of the data at the source.
+
+### Lack of Data Source Tracking
+When a report looks "wrong", how long does it take your team to figure out why?
Without data source tracking (knowing the path data took from source to destination), debugging data issues becomes a "detective work" nightmare. A healthy ecosystem provides clear visibility into how data is transformed at every step. + +## Assessment + +To assess the health of your data ecosystem, consider how your team interacts with data daily and score yourself using the guide below. + +1. **Fragmented & Untrusted:** Data is trapped in silos. Access requires manual approvals and long waits. No one is sure if the data is accurate, and "data cleaning" is a massive, manual chore for every project. +2. **Coordinated but Manual:** Data is documented, but often outdated. You have some central repositories (like a data warehouse), but getting new data types integrated is slow. Testing often uses stale or "hand-rolled" data that doesn't reflect reality. +3. **Accessible & Reliable:** Most data is discoverable via a catalog or API. Automated pipelines handle basic cleaning and transformation. There is high confidence in data quality, and privacy masking is largely automated. +4. **Fluid & Self-Service:** Data is treated as a product. Teams can self-serve the data they need through well-defined interfaces. Data tracking is fully transparent, and data quality issues are caught by automated "data tests" before they affect downstream systems. + +The number you selected represents your overall score for this capability. If you feel like your organization fits somewhere in between two scores, it's okay to use a decimal. + +Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means your organization is largely proficient, or well on its way to becoming proficient, in the area of Generative Organizational Culture; you would likely benefit from evaluating your scores in other capabilities. + +## Supporting Practices + +### Use Data and Documentation Linting +Have automated processes in place to ensure documentation quality and timelyness. + +### Define Data Contracts +Have a unified structure for data/documentation providers and consumers. This simplifies the process of tools having access to your data and documentation. + +### Establish Clear Data Ownership +Every set of data and documentation should have a clear "owner" responsible for its quality and maintainence. This moves the organization away from a "tragedy of the commons" where everyone uses the data but no one feels empowered to fix it when it breaks. + +## Adjacent Capabilities + +The following capabilities will be valuable for you and your team to explore, as they are either: + +- Related (they cover similar territory to Healthy Data Ecosystems) +- Upstream (they are a pre-requisite for Healthy Data Ecosystems) +- Downstream (Healthy Data Ecosystems is a pre-requisite for them) + +### [AI-accessible internal data](/capabilities/ai-accessable-internal-data.md) - Downstream +Having your data clean and organized allows for the productivity amplification effects of AI-accessible data. Without healthy data AI-accessable internal data will likely cause more harm than good. + +### [Pervasive Security](/capabilities/pervasive-security.md) - Related +A healthy data ecosystem simplifies security. When data is classified and tracked automatically, applying security policies becomes much more consistent and less prone to human error. 
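As a small illustration of the Define Data Contracts practice above, the sketch below expresses a contract as a shared type plus a runtime check that producers and consumers can both run. The field names and rules are hypothetical; the point is that the contract lives in executable code rather than in a wiki page.

```ts
// order-events.contract.ts -- a tiny, hand-rolled data contract (illustrative field names)
export interface OrderEvent {
  orderId: string;
  customerId: string;
  totalCents: number;   // integer cents to avoid floating-point money issues
  occurredAt: string;   // ISO-8601 timestamp
}

// Both producers and consumers can call this before publishing or processing a record.
export function validateOrderEvent(record: unknown): OrderEvent {
  const r = record as Partial<OrderEvent>;
  const errors: string[] = [];

  if (typeof r.orderId !== "string" || r.orderId.length === 0) errors.push("orderId missing");
  if (typeof r.customerId !== "string" || r.customerId.length === 0) errors.push("customerId missing");
  if (!Number.isInteger(r.totalCents) || (r.totalCents as number) < 0) errors.push("totalCents must be a non-negative integer");
  if (typeof r.occurredAt !== "string" || Number.isNaN(Date.parse(r.occurredAt))) errors.push("occurredAt must be an ISO-8601 timestamp");

  if (errors.length > 0) {
    throw new Error(`OrderEvent failed its data contract: ${errors.join(", ")}`);
  }
  return r as OrderEvent;
}
```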
+ +### [Visibility into Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) - Related +To see how work flows through your organization, you need healthy, integrated data. A healthy data ecosystem makes this high-level visibility possible. + From be4a35a6a3d427cc91c48525140fca8d3fc80cf1 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Tue, 13 Jan 2026 09:53:47 -0600 Subject: [PATCH 060/131] edit of new Healthy Data Ecosystems capability --- capabilities/healthy-data-ecosystems.md | 40 +++++++++++++------------ 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/capabilities/healthy-data-ecosystems.md b/capabilities/healthy-data-ecosystems.md index 24c3864..2c68995 100644 --- a/capabilities/healthy-data-ecosystems.md +++ b/capabilities/healthy-data-ecosystems.md @@ -1,55 +1,57 @@ # [Healthy Data Ecosystems](https://dora.dev/capabilities/healthy-data-ecosystems) -A Healthy Data Ecosystem means that the data your organization relies on—whether for operational decision-making, automated testing, or product features—is accurate, accessible, and handled with care. It’s about moving away from "data silos" and "data swamps" toward a state where data is treated as a first-class citizen. In a healthy ecosystem, teams can trust the data they use, understand where it came from, and access it without jumping through bureaucratic hoops. +A healthy data ecosystem means that the data your organization relies on for operational decision making, automated testing, or product features is accurate, accessible, and handled with care. It’s about moving away from "data silos" and "data swamps" toward a state where data is treated as a first-class citizen. In a healthy data ecosystem, teams can trust the data they use, understand where it came from, and access it without jumping through bureaucratic hoops. But building a healthy data ecosystem isn't just a technical challenge; it’s a cultural and process-oriented one. ## Nuances - -Building a healthy data ecosystem isn't just a technical challenge; it’s a cultural and process-oriented one. Here are the common hurdles teams face when trying to improve their data health. +This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. ### The "Garbage In, Garbage Out" Trap -Automated systems and AI are only as good as the data feeding them. If your upstream data is poorly formatted, incomplete, or inaccurate, any "downstream" capability (like Continuous Delivery or Predictive Analytics) will suffer. Teams often focus on the shiny tools at the end of the pipeline while neglecting the cleanliness of the data at the source. +Automated systems and AI are only as good as the data feeding them. If your upstream data is poorly formatted, incomplete, or inaccurate, any "downstream" capability (like Continuous Delivery or Predictive Analytics) will suffer. Teams often focus on the shiny tools, used at the end of the pipeline, while neglecting the cleanliness of the source data those tools rely on. ### Lack of Data Source Tracking -When a report looks "wrong", how long does it take your team to figure out why? Without data source tracking (knowing the path data took from source to destination), debugging data issues becomes a "detective work" nightmare. 
A healthy ecosystem provides clear visibility into how data is transformed at every step. +When a report looks "wrong," how long does it take your team to figure out why? Without data source tracking (knowing the path data took from source to destination), debugging data issues becomes a "detective work" nightmare. A healthy data ecosystem provides clear visibility into how data is transformed at every step. ## Assessment +To assess how mature your team or organization is in this capability, complete this short exercise. + +Consider the descriptions below and score your team or organization on this capability. Generally, score a 1 if data is untrusted and largely inaccessible, a 2 if data is documented but outdated, a 3 if data is trusted and discoverable, and a 4 if data is self-service and treated as a first-class citizen. -To assess the health of your data ecosystem, consider how your team interacts with data daily and score yourself using the guide below. +Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. -1. **Fragmented & Untrusted:** Data is trapped in silos. Access requires manual approvals and long waits. No one is sure if the data is accurate, and "data cleaning" is a massive, manual chore for every project. -2. **Coordinated but Manual:** Data is documented, but often outdated. You have some central repositories (like a data warehouse), but getting new data types integrated is slow. Testing often uses stale or "hand-rolled" data that doesn't reflect reality. -3. **Accessible & Reliable:** Most data is discoverable via a catalog or API. Automated pipelines handle basic cleaning and transformation. There is high confidence in data quality, and privacy masking is largely automated. -4. **Fluid & Self-Service:** Data is treated as a product. Teams can self-serve the data they need through well-defined interfaces. Data tracking is fully transparent, and data quality issues are caught by automated "data tests" before they affect downstream systems. +1. **Fragmented & Untrusted:** Data is trapped in silos. Access requires manual approvals and long waits. No one is sure if the data is accurate and "data cleaning" is a massive, manual chore for every project. +2. **Coordinated but Manual:** Data is documented, but often outdated. You have some central repositories (like a data warehouse), but integrating new data types is slow. Testing often uses stale or "hand-rolled" data that doesn't reflect reality. +3. **Accessible & Reliable:** Most data is discoverable via a catalog or API. Automated pipelines handle basic cleaning and transformation. There is high confidence in data quality and privacy masking is largely automated. +4. **Fluid & Self-Service:** Data is treated as a product. Teams can self-serve the data they need through well-defined interfaces. Data source tracking is fully transparent, and data quality issues are caught by automated "data tests" before they affect downstream systems. The number you selected represents your overall score for this capability. If you feel like your organization fits somewhere in between two scores, it's okay to use a decimal. -Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. 
An overall score higher than 3 generally means your organization is largely proficient, or well on its way to becoming proficient, in the area of Generative Organizational Culture; you would likely benefit from evaluating your scores in other capabilities. +Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means your team or organization is largely proficient, or well on its way to becoming proficient, in the area of data health; you would likely benefit from evaluating your scores in other capabilities. ## Supporting Practices +The following is a curated list of supporting practices to consider when looking to improve your team's Healthy Data Ecosystems capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. ### Use Data and Documentation Linting -Have automated processes in place to ensure documentation quality and timelyness. +Have automated processes in place to ensure documentation meets quality standards and is regularly updated. ### Define Data Contracts -Have a unified structure for data/documentation providers and consumers. This simplifies the process of tools having access to your data and documentation. +Have a contract that defines a unified structure for data/documentation providers and consumers. This simplifies the process when external tools have access to your internal data and documentation. ### Establish Clear Data Ownership -Every set of data and documentation should have a clear "owner" responsible for its quality and maintainence. This moves the organization away from a "tragedy of the commons" where everyone uses the data but no one feels empowered to fix it when it breaks. +Every set of data and documentation should have a clear owner, who is responsible for its quality and maintainence. This moves the team away from a "tragedy of the commons," where everyone uses the data but no one feels empowered to keep it updated or fix it when it breaks. ## Adjacent Capabilities - The following capabilities will be valuable for you and your team to explore, as they are either: - Related (they cover similar territory to Healthy Data Ecosystems) - Upstream (they are a pre-requisite for Healthy Data Ecosystems) - Downstream (Healthy Data Ecosystems is a pre-requisite for them) -### [AI-accessible internal data](/capabilities/ai-accessable-internal-data.md) - Downstream -Having your data clean and organized allows for the productivity amplification effects of AI-accessible data. Without healthy data AI-accessable internal data will likely cause more harm than good. - ### [Pervasive Security](/capabilities/pervasive-security.md) - Related A healthy data ecosystem simplifies security. When data is classified and tracked automatically, applying security policies becomes much more consistent and less prone to human error. ### [Visibility into Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) - Related -To see how work flows through your organization, you need healthy, integrated data. A healthy data ecosystem makes this high-level visibility possible. +To get an accurate view of how work flows through your organization, you need data that is healthy and integrated. A healthy data ecosystem makes this high-level visibility possible. 
+ +### [AI-accessible Internal Data](/capabilities/ai-accessable-internal-data.md) - Downstream +Data that is clean and organized is easier for AI to access. And when internal data is AI-accessible, teams typically experience greater productivity. Serving unhealthy data to an AI will likely cause more harm than good. From 24091d27f82d2e0e53a074dda00e37f13f86ab8e Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 16 Jan 2026 11:38:30 -0700 Subject: [PATCH 061/131] include more AI in intro --- capabilities/healthy-data-ecosystems.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/capabilities/healthy-data-ecosystems.md b/capabilities/healthy-data-ecosystems.md index 2c68995..6615b0d 100644 --- a/capabilities/healthy-data-ecosystems.md +++ b/capabilities/healthy-data-ecosystems.md @@ -1,17 +1,21 @@ # [Healthy Data Ecosystems](https://dora.dev/capabilities/healthy-data-ecosystems) -A healthy data ecosystem means that the data your organization relies on for operational decision making, automated testing, or product features is accurate, accessible, and handled with care. It’s about moving away from "data silos" and "data swamps" toward a state where data is treated as a first-class citizen. In a healthy data ecosystem, teams can trust the data they use, understand where it came from, and access it without jumping through bureaucratic hoops. But building a healthy data ecosystem isn't just a technical challenge; it’s a cultural and process-oriented one. +A healthy data ecosystem means that the data your organization relies on for operational decision making, automated testing, or product features is accurate, accessible, and handled with care. It’s about moving away from "data silos" and "data swamps" toward a state where data is treated as a first-class citizen. This has become an essential prerequisite for effectively taking advantage of AI allowing it to amplify your companies strengths instead of magnifying your companies dysfunctions. In a healthy data ecosystem, teams can trust the data they use, understand where it came from, and access it without jumping through bureaucratic hoops. But building a healthy data ecosystem isn't just a technical challenge; it’s a cultural and process-oriented one. ## Nuances + This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. ### The "Garbage In, Garbage Out" Trap + Automated systems and AI are only as good as the data feeding them. If your upstream data is poorly formatted, incomplete, or inaccurate, any "downstream" capability (like Continuous Delivery or Predictive Analytics) will suffer. Teams often focus on the shiny tools, used at the end of the pipeline, while neglecting the cleanliness of the source data those tools rely on. ### Lack of Data Source Tracking + When a report looks "wrong," how long does it take your team to figure out why? Without data source tracking (knowing the path data took from source to destination), debugging data issues becomes a "detective work" nightmare. A healthy data ecosystem provides clear visibility into how data is transformed at every step. ## Assessment + To assess how mature your team or organization is in this capability, complete this short exercise. 
Consider the descriptions below and score your team or organization on this capability. Generally, score a 1 if data is untrusted and largely inaccessible, a 2 if data is documented but outdated, a 3 if data is trusted and discoverable, and a 4 if data is self-service and treated as a first-class citizen. @@ -28,18 +32,23 @@ The number you selected represents your overall score for this capability. If yo Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means your team or organization is largely proficient, or well on its way to becoming proficient, in the area of data health; you would likely benefit from evaluating your scores in other capabilities. ## Supporting Practices + The following is a curated list of supporting practices to consider when looking to improve your team's Healthy Data Ecosystems capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. ### Use Data and Documentation Linting + Have automated processes in place to ensure documentation meets quality standards and is regularly updated. ### Define Data Contracts + Have a contract that defines a unified structure for data/documentation providers and consumers. This simplifies the process when external tools have access to your internal data and documentation. ### Establish Clear Data Ownership + Every set of data and documentation should have a clear owner, who is responsible for its quality and maintainence. This moves the team away from a "tragedy of the commons," where everyone uses the data but no one feels empowered to keep it updated or fix it when it breaks. ## Adjacent Capabilities + The following capabilities will be valuable for you and your team to explore, as they are either: - Related (they cover similar territory to Healthy Data Ecosystems) @@ -47,11 +56,14 @@ The following capabilities will be valuable for you and your team to explore, as - Downstream (Healthy Data Ecosystems is a pre-requisite for them) ### [Pervasive Security](/capabilities/pervasive-security.md) - Related + A healthy data ecosystem simplifies security. When data is classified and tracked automatically, applying security policies becomes much more consistent and less prone to human error. ### [Visibility into Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) - Related + To get an accurate view of how work flows through your organization, you need data that is healthy and integrated. A healthy data ecosystem makes this high-level visibility possible. ### [AI-accessible Internal Data](/capabilities/ai-accessable-internal-data.md) - Downstream + Data that is clean and organized is easier for AI to access. And when internal data is AI-accessible, teams typically experience greater productivity. Serving unhealthy data to an AI will likely cause more harm than good. 
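To make the "Define Data Contracts" practice referenced above more concrete, here is a minimal sketch of a shared, code-reviewed contract. The event name, fields, and the `validate_record` helper are illustrative assumptions only; teams often express the same idea with JSON Schema, protobuf definitions, or a schema registry instead.

```python
from datetime import date
from typing import Any

# The shared contract: the fields every producer and consumer agrees on.
# Names and types here are purely illustrative; version and review the real
# thing like any other interface change.
DEPLOYMENT_EVENT_CONTRACT: dict[str, type] = {
    "service_name": str,
    "deployed_at": date,
    "commit_sha": str,
    "environment": str,  # e.g. "staging" or "production"
}

def validate_record(record: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return human-readable contract violations; an empty list means valid."""
    problems = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(
                f"field {field!r} should be {expected.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    extra = sorted(set(record) - set(contract))
    if extra:
        problems.append(f"fields not in the contract: {extra}")
    return problems

if __name__ == "__main__":
    event = {"service_name": "billing-api", "commit_sha": "a1b2c3d"}
    for problem in validate_record(event, DEPLOYMENT_EVENT_CONTRACT):
        print("contract violation:", problem)
```

Because producers and consumers import the same definition, a breaking change shows up as a failed check in CI rather than as a silent data-quality regression downstream.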
From bae92b887223be83b090a1d2f6cce29a1fd90ca0 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 16 Jan 2026 12:23:44 -0700 Subject: [PATCH 062/131] Link more capabilities --- capabilities/healthy-data-ecosystems.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/capabilities/healthy-data-ecosystems.md b/capabilities/healthy-data-ecosystems.md index 6615b0d..63206eb 100644 --- a/capabilities/healthy-data-ecosystems.md +++ b/capabilities/healthy-data-ecosystems.md @@ -35,17 +35,17 @@ Generally, an overall score equal to or less than 3 means you'll likely gain a l The following is a curated list of supporting practices to consider when looking to improve your team's Healthy Data Ecosystems capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. -### Use Data and Documentation Linting +### Establish Data Governance and Cleaning -Have automated processes in place to ensure documentation meets quality standards and is regularly updated. +Treat internal documentation as code. Implement "Docs-as-Code" practices where documentation is reviewed and pruned regularly. Removing obsolete information is just as important as adding new information to prevent the AI from retrieving "zombie" instructions. -### Define Data Contracts +### Schedule Regular Documentation Audits -Have a contract that defines a unified structure for data/documentation providers and consumers. This simplifies the process when external tools have access to your internal data and documentation. +Without dedicated time and resources documentation can often rot or fall out of relevance. Set aside a time for your team to review your documentation. Make sure you come away with specific action items for individuals and plans for accountability. ### Establish Clear Data Ownership -Every set of data and documentation should have a clear owner, who is responsible for its quality and maintainence. This moves the team away from a "tragedy of the commons," where everyone uses the data but no one feels empowered to keep it updated or fix it when it breaks. +Every set of data and documentation should have a clear owner, who is responsible for its quality and maintenance. This moves the team away from a "tragedy of the commons," where everyone uses the data but no one feels empowered to keep it updated or fix it when it breaks. ## Adjacent Capabilities @@ -67,3 +67,6 @@ To get an accurate view of how work flows through your organization, you need da Data that is clean and organized is easier for AI to access. And when internal data is AI-accessible, teams typically experience greater productivity. Serving unhealthy data to an AI will likely cause more harm than good. +### [Documentation Quality](/capabilities/documentation-quality.md) - Related + +While Healthy Data Ecosystems is more about analytics and documentation quality specifically focuses on documentation, both are essential to keep clean and organized so that our AI systems can provide clear and positive insights. 
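As a rough illustration of the documentation audits and "Docs-as-Code" cleaning described above, the sketch below flags documents that have no owner or have not been reviewed recently. The `Owner:` and `Last-Reviewed:` header lines, the `docs/` folder, and the 180-day threshold are all assumed conventions for the example, not established ones.

```python
import re
import sys
from datetime import date, timedelta
from pathlib import Path

MAX_AGE = timedelta(days=180)  # how long a doc may go without review
REVIEW_RE = re.compile(r"^Last-Reviewed:\s*(\d{4})-(\d{2})-(\d{2})", re.MULTILINE)
OWNER_RE = re.compile(r"^Owner:\s*\S+", re.MULTILINE)

def lint(doc_root: Path) -> list[str]:
    """Return one message per document that fails the hygiene checks."""
    failures = []
    for path in sorted(doc_root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        if not OWNER_RE.search(text):
            failures.append(f"{path}: no 'Owner:' line")
        match = REVIEW_RE.search(text)
        if not match:
            failures.append(f"{path}: no 'Last-Reviewed:' date")
        elif date.today() - date(*map(int, match.groups())) > MAX_AGE:
            failures.append(f"{path}: last reviewed more than {MAX_AGE.days} days ago")
    return failures

if __name__ == "__main__":
    problems = lint(Path("docs"))
    print("\n".join(problems) or "all documentation checks passed")
    sys.exit(1 if problems else 0)
```

Run in CI, a check like this turns "documentation is regularly updated" into a visible, failing build, and the owner line makes it obvious which team is accountable for fixing it.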
From 22dea11dc6504eecfeb10f26a3eba382c3d83e1a Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Tue, 6 Jan 2026 10:51:32 -0700 Subject: [PATCH 063/131] Add AI accessable data --- capabilities/ai-accessable-internal-data.md | 70 +++++++++++++++++++++ 1 file changed, 70 insertions(+) diff --git a/capabilities/ai-accessable-internal-data.md b/capabilities/ai-accessable-internal-data.md index e69de29..9694ba7 100644 --- a/capabilities/ai-accessable-internal-data.md +++ b/capabilities/ai-accessable-internal-data.md @@ -0,0 +1,70 @@ +# [AI-accessible Internal Data](https://dora.dev/capabilities/ai-accessible-internal-data/) + +AI-accessible internal data refers to the practice of making an organization's proprietary information—such as documentation, codebases, wikis, and process manuals—structured and available for consumption by Artificial Intelligence (AI) models. By utilizing technologies like Retrieval-Augmented Generation (RAG) and vector databases, organizations enable team members to query internal knowledge using natural language. The primary benefit is reducing "discovery time," allowing engineers and stakeholders to find accurate information quickly without sifting through fragmented silos. + +## Nuances + +This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. + +### The "Garbage In, Garbage Out" Problem + +AI is only as good as the data it can access. If your internal documentation is outdated, contradictory, or poorly written, the AI will provide confident but incorrect answers. Simply connecting an LLM to a messy SharePoint or a neglected Wiki will often result in "hallucinations" where the AI invents internal policies. High-quality AI accessibility requires a foundational commitment to data hygiene and documentation maintenance. + +### Privacy and Security Scoping + +Not all internal data should be accessible to everyone. Financial records, HR files, or sensitive project data must remain restricted. A common pitfall is creating an AI tool that inadvertently bypasses existing folder-level permissions, giving a junior developer access to executive-level salary data. Implementing "Identity-Aware Retrieval" is complex but necessary to ensure the AI respects existing access control lists (ACLs). + +### Context Window and Freshness + +Internal data changes rapidly. If your AI system relies on model fine-tuning or infrequent data indexing, it will quickly become obsolete. Teams often struggle with "stale context," where the AI suggests a deprecated API because the latest documentation hasn't been re-indexed yet. Building a pipeline that updates the AI’s knowledge base in near real-time is a significant engineering hurdle. + +### Over-reliance and Loss of Tribal Knowledge + +While AI makes information easier to find, there is a risk that teams stop talking to one another or stop documenting *why* decisions were made, relying instead on the AI to summarize the *what*. If the AI tool goes down or provides a wrong answer that becomes "canon," it can lead to systemic errors. It is vital to treat AI as a co-pilot for discovery, not the ultimate source of truth. + +## Assessment + +To assess how mature your team or organization is in this capability, complete this short exercise. 
+ +Consider the descriptions below and score your team on this capability. Generally, score a 1 if your data is locked in silos and unsearchable, a 2 if you have basic search but no AI integration, a 3 if you have an AI tool that works but has limitations, and a 4 if AI is the primary, reliable interface for organizational knowledge. + +1. **Fragmented & Manual:** Data is scattered across various tools (Slack, Jira, Google Docs, Email). Finding information requires manual searching or asking individuals. There is no AI interface for internal data. +2. **Centralized but Static:** Most data is in a central wiki or repo with a basic keyword search. Some experiments with AI exist, but they are prone to hallucinations and lack access to real-time updates. +3. **Integrated & Useful:** An AI-powered search or chatbot exists that can access most technical documentation and code. It provides citations for its answers. Accuracy is high, though it occasionally misses very recent changes or restricted data. +4. **Ubiquitous & Trusted:** AI has secure, real-time access to all relevant internal data sources. It respects granular permissions and is the first place employees go for answers. Feedback loops are in place to correct the AI and update the underlying documentation simultaneously. + +## Supporting Practices + +The following is a curated list of supporting practices to consider when looking to improve your team's AI-accessible Internal Data capability. + +### Implement Retrieval-Augmented Generation (RAG) +Instead of training a model on your data, use RAG to retrieve relevant documents from a database and pass them to the AI as context for each specific query. This reduces hallucinations and allows the AI to cite its sources, enabling users to verify the information. + +### Automate Data Indexing Pipelines +Create automated workflows that trigger every time a document is updated or a pull request is merged. This ensures that the vector database used by the AI stays synchronized with the actual state of your projects, providing "fresh" answers. + +### Establish Data Governance and Cleaning +Treat internal documentation as code. Implement "Docs-as-Code" practices where documentation is reviewed and pruned regularly. Removing obsolete information is just as important as adding new information to prevent the AI from retrieving "zombie" instructions. + +### Use Identity-Aware Vector Search +Ensure your AI backend integrates with your Single Sign-On (SSO) provider. When a user asks a question, the system should only retrieve data fragments that the user's specific credentials allow them to see, maintaining the "principle of least privilege." + +### Human-in-the-Loop Feedback +Provide a "thumbs up/down" mechanism for AI-generated answers. Use this feedback to identify "knowledge gaps"—areas where the AI consistently fails because the underlying internal data is either missing or confusing. + +## Adjacent Capabilities + +The following capabilities will be valuable for you and your team to explore, as they are either: + +- Related (they cover similar territory to Continuous Integration) +- Upstream (they are a pre-requisite for Continuous Integration) +- Downstream (Continuous Integration is a pre-requisite for them) + +### [Documentation Quality](/capabilities/documentation-quality.md) - Upstream +High-quality, modular, and clear documentation is a prerequisite. If humans can't understand the source text, an AI's summary of that text will be equally confusing. 
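The "Implement Retrieval-Augmented Generation (RAG)" practice above is easier to reason about with a stripped-down sketch. The version below uses an in-memory corpus and simple word-overlap scoring as stand-ins for an embedding model and a vector database, so it only illustrates the shape of the flow: retrieve named sources, then build a prompt that can cite them. All document names and text are hypothetical.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for indexed internal documents; titles are hypothetical.
DOCS = {
    "runbook/deploy.md": "Deployments go through the staging environment before production.",
    "policy/access.md": "Production database access requires an approved change ticket.",
    "faq/oncall.md": "The on-call rotation is documented in the SRE handbook.",
}

def _vector(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k most similar (source, text) pairs for the question."""
    q = _vector(question)
    ranked = sorted(DOCS.items(), key=lambda item: _cosine(q, _vector(item[1])), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Assemble the context block an LLM would receive, with citable sources."""
    context = "\n".join(f"[{source}] {text}" for source, text in retrieve(question))
    return f"Answer using only the sources below and cite them.\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    print(build_prompt("Who approves production database access?"))
```

Even in this toy form the key property is visible: the model only sees text pulled from named sources, so its answer can cite `policy/access.md` and a reader can verify the claim.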
+ +### [Pervasive Security](/capabilities/pervasive-security.md) - Upstream +Robust security protocols must be in place before making data AI-accessible to prevent internal data leaks or unauthorized privilege escalation via the AI interface. + +### [Clear and Communicated AI Stance](/capabilities/clear-and-communicated-ai-stance.md) - Related +As internal AI accessible data gets rolled out at your company, it will pair perfectly with having a balanced and effectively communicated stance on AI. From 36adaa28cd44aad11c1e9dcb19046d343eb35abc Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Tue, 6 Jan 2026 10:58:35 -0700 Subject: [PATCH 064/131] fix spelling --- ...accessable-internal-data.md => ai-accessible-internal-data.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename capabilities/{ai-accessable-internal-data.md => ai-accessible-internal-data.md} (100%) diff --git a/capabilities/ai-accessable-internal-data.md b/capabilities/ai-accessible-internal-data.md similarity index 100% rename from capabilities/ai-accessable-internal-data.md rename to capabilities/ai-accessible-internal-data.md From b392766e1d9b5c505452bc0a5be8d0cb66e490b6 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Fri, 9 Jan 2026 14:59:58 -0600 Subject: [PATCH 065/131] edits to new ai-accessible data capability --- capabilities/ai-accessible-internal-data.md | 29 +++++++++++---------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/capabilities/ai-accessible-internal-data.md b/capabilities/ai-accessible-internal-data.md index 9694ba7..e3e0df7 100644 --- a/capabilities/ai-accessible-internal-data.md +++ b/capabilities/ai-accessible-internal-data.md @@ -1,6 +1,6 @@ # [AI-accessible Internal Data](https://dora.dev/capabilities/ai-accessible-internal-data/) -AI-accessible internal data refers to the practice of making an organization's proprietary information—such as documentation, codebases, wikis, and process manuals—structured and available for consumption by Artificial Intelligence (AI) models. By utilizing technologies like Retrieval-Augmented Generation (RAG) and vector databases, organizations enable team members to query internal knowledge using natural language. The primary benefit is reducing "discovery time," allowing engineers and stakeholders to find accurate information quickly without sifting through fragmented silos. +AI-accessible Internal Data, as a capability, means making an organization's proprietary information—-such as documentation, codebases, wikis, and process manuals—-structured and available for consumption by AI models. By using technologies like Retrieval-Augmented Generation (RAG) and vector databases, organizations enable team members to query internal knowledge using natural language. The primary benefit is reducing "discovery time," allowing engineers and stakeholders to find accurate information quickly without sifting through fragmented silos. ## Nuances @@ -8,7 +8,7 @@ This section outlines common pitfalls, challenges, or limitations teams commonly ### The "Garbage In, Garbage Out" Problem -AI is only as good as the data it can access. If your internal documentation is outdated, contradictory, or poorly written, the AI will provide confident but incorrect answers. Simply connecting an LLM to a messy SharePoint or a neglected Wiki will often result in "hallucinations" where the AI invents internal policies. High-quality AI accessibility requires a foundational commitment to data hygiene and documentation maintenance. +AI is only as good as the data it can access. 
If your internal documentation is outdated, contradictory, or poorly written, the AI will provide confident but incorrect answers. Simply connecting an LLM to a messy SharePoint or a neglected wiki will often result in "hallucinations" where the AI invents internal policies. High-quality AI accessibility requires a foundational commitment to data hygiene and documentation maintenance. ### Privacy and Security Scoping @@ -26,10 +26,10 @@ While AI makes information easier to find, there is a risk that teams stop talki To assess how mature your team or organization is in this capability, complete this short exercise. -Consider the descriptions below and score your team on this capability. Generally, score a 1 if your data is locked in silos and unsearchable, a 2 if you have basic search but no AI integration, a 3 if you have an AI tool that works but has limitations, and a 4 if AI is the primary, reliable interface for organizational knowledge. +Consider the descriptions below and score your team on this capability. Generally, score a 1 if your internal data is locked in silos and unsearchable, a 2 if you have basic search but no AI integration, a 3 if you have an AI tool that works but has limitations, and a 4 if AI is the primary, reliable interface for organizational knowledge. -1. **Fragmented & Manual:** Data is scattered across various tools (Slack, Jira, Google Docs, Email). Finding information requires manual searching or asking individuals. There is no AI interface for internal data. -2. **Centralized but Static:** Most data is in a central wiki or repo with a basic keyword search. Some experiments with AI exist, but they are prone to hallucinations and lack access to real-time updates. +1. **Fragmented & Manual:** Data is scattered across various tools (e.g., Slack, Jira, Google Docs, email). Finding information requires manual searching or asking individuals. There is no AI interface for internal data. +2. **Centralized but Static:** Most data is in a central wiki or repo and is accessible with a basic keyword search. Some experiments with AI exist, but they are prone to hallucinations and lack access to real-time updates. 3. **Integrated & Useful:** An AI-powered search or chatbot exists that can access most technical documentation and code. It provides citations for its answers. Accuracy is high, though it occasionally misses very recent changes or restricted data. 4. **Ubiquitous & Trusted:** AI has secure, real-time access to all relevant internal data sources. It respects granular permissions and is the first place employees go for answers. Feedback loops are in place to correct the AI and update the underlying documentation simultaneously. @@ -40,31 +40,32 @@ The following is a curated list of supporting practices to consider when looking ### Implement Retrieval-Augmented Generation (RAG) Instead of training a model on your data, use RAG to retrieve relevant documents from a database and pass them to the AI as context for each specific query. This reduces hallucinations and allows the AI to cite its sources, enabling users to verify the information. -### Automate Data Indexing Pipelines +### Automate Data-indexing Pipelines Create automated workflows that trigger every time a document is updated or a pull request is merged. This ensures that the vector database used by the AI stays synchronized with the actual state of your projects, providing "fresh" answers. ### Establish Data Governance and Cleaning Treat internal documentation as code. 
Implement "Docs-as-Code" practices where documentation is reviewed and pruned regularly. Removing obsolete information is just as important as adding new information to prevent the AI from retrieving "zombie" instructions. -### Use Identity-Aware Vector Search +### Use Identity-aware Vector Search Ensure your AI backend integrates with your Single Sign-On (SSO) provider. When a user asks a question, the system should only retrieve data fragments that the user's specific credentials allow them to see, maintaining the "principle of least privilege." ### Human-in-the-Loop Feedback -Provide a "thumbs up/down" mechanism for AI-generated answers. Use this feedback to identify "knowledge gaps"—areas where the AI consistently fails because the underlying internal data is either missing or confusing. +Provide a "thumbs up/down" mechanism for AI-generated answers. Use this feedback to identify "knowledge gaps"—-areas where the AI consistently fails because the underlying internal data is either missing or confusing. ## Adjacent Capabilities The following capabilities will be valuable for you and your team to explore, as they are either: -- Related (they cover similar territory to Continuous Integration) -- Upstream (they are a pre-requisite for Continuous Integration) -- Downstream (Continuous Integration is a pre-requisite for them) +- Related (they cover similar territory to AI-accessible Internal Data) +- Upstream (they are a pre-requisite for AI-accessible Internal Data) +- Downstream (AI-accessible Internal Data is a pre-requisite for them) + +### [Clear and Communicated AI Stance](/capabilities/clear-and-communicated-ai-stance.md) - Related +As internal AI-accessible data gets rolled out at your company, it will pair perfectly with having a balanced and effectively communicated stance on AI. ### [Documentation Quality](/capabilities/documentation-quality.md) - Upstream -High-quality, modular, and clear documentation is a prerequisite. If humans can't understand the source text, an AI's summary of that text will be equally confusing. +High-quality, modular, updated, and clear documentation is a pre-requisite for AI-accessible Internal Data. If humans can't understand the source text, an AI's summary of that text will be equally confusing. ### [Pervasive Security](/capabilities/pervasive-security.md) - Upstream Robust security protocols must be in place before making data AI-accessible to prevent internal data leaks or unauthorized privilege escalation via the AI interface. -### [Clear and Communicated AI Stance](/capabilities/clear-and-communicated-ai-stance.md) - Related -As internal AI accessible data gets rolled out at your company, it will pair perfectly with having a balanced and effectively communicated stance on AI. From 2e04019897af728e99399c68211059c7c8741c41 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 12 Jan 2026 12:22:02 -0700 Subject: [PATCH 066/131] add missing template text --- capabilities/ai-accessible-internal-data.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/capabilities/ai-accessible-internal-data.md b/capabilities/ai-accessible-internal-data.md index e3e0df7..fca2ed3 100644 --- a/capabilities/ai-accessible-internal-data.md +++ b/capabilities/ai-accessible-internal-data.md @@ -33,6 +33,10 @@ Consider the descriptions below and score your team on this capability. Generall 3. **Integrated & Useful:** An AI-powered search or chatbot exists that can access most technical documentation and code. It provides citations for its answers. 
Accuracy is high, though it occasionally misses very recent changes or restricted data. 4. **Ubiquitous & Trusted:** AI has secure, real-time access to all relevant internal data sources. It respects granular permissions and is the first place employees go for answers. Feedback loops are in place to correct the AI and update the underlying documentation simultaneously. +The number you selected represents your overall score for this capability. If you feel like the general AI-accessible Internal Data of your team fits somewhere in between two scores, it's okay to use a decimal. For example, if you think employees are somewhere between managing their loads and finding a good balance, you would score a 2.5. + +Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of AI-accessible Internal Data; you would likely benefit from evaluating your scores in other capabilities. + ## Supporting Practices The following is a curated list of supporting practices to consider when looking to improve your team's AI-accessible Internal Data capability. From 08f1a624a55268a1c5d207856ece9045b6d846eb Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 12 Jan 2026 12:24:19 -0700 Subject: [PATCH 067/131] remove human in the loop feedback practice --- capabilities/ai-accessible-internal-data.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/capabilities/ai-accessible-internal-data.md b/capabilities/ai-accessible-internal-data.md index fca2ed3..b185afd 100644 --- a/capabilities/ai-accessible-internal-data.md +++ b/capabilities/ai-accessible-internal-data.md @@ -53,9 +53,6 @@ Treat internal documentation as code. Implement "Docs-as-Code" practices where d ### Use Identity-aware Vector Search Ensure your AI backend integrates with your Single Sign-On (SSO) provider. When a user asks a question, the system should only retrieve data fragments that the user's specific credentials allow them to see, maintaining the "principle of least privilege." -### Human-in-the-Loop Feedback -Provide a "thumbs up/down" mechanism for AI-generated answers. Use this feedback to identify "knowledge gaps"—-areas where the AI consistently fails because the underlying internal data is either missing or confusing. - ## Adjacent Capabilities The following capabilities will be valuable for you and your team to explore, as they are either: From 0139d36e1c56f2fe0c23ac97f5b927627fcb60a7 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 12 Jan 2026 12:39:16 -0700 Subject: [PATCH 068/131] add practice to AI-accessable Data --- capabilities/ai-accessible-internal-data.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/capabilities/ai-accessible-internal-data.md b/capabilities/ai-accessible-internal-data.md index b185afd..8a65866 100644 --- a/capabilities/ai-accessible-internal-data.md +++ b/capabilities/ai-accessible-internal-data.md @@ -47,9 +47,14 @@ Instead of training a model on your data, use RAG to retrieve relevant documents ### Automate Data-indexing Pipelines Create automated workflows that trigger every time a document is updated or a pull request is merged. This ensures that the vector database used by the AI stays synchronized with the actual state of your projects, providing "fresh" answers. 
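A minimal sketch of the data-indexing idea described above: re-index only what changed, and prune what was deleted. The in-memory `index` dictionary stands in for a vector store, and the content-hash comparison stands in for a merge webhook trigger; both are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# In-memory stand-in for a vector store: path -> (content hash, indexed text).
# A real pipeline would be triggered by a merge webhook and would compute
# embeddings before upserting into an actual vector database.
index: dict[str, tuple[str, str]] = {}

def reindex_changed(doc_root: Path) -> list[str]:
    """Re-index only documents whose content hash changed since the last run."""
    refreshed = []
    seen = set()
    for path in sorted(doc_root.rglob("*.md")):
        key = str(path.relative_to(doc_root))
        seen.add(key)
        text = path.read_text(encoding="utf-8")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if index.get(key, ("", ""))[0] != digest:
            index[key] = (digest, text)  # embedding + upsert would happen here
            refreshed.append(key)
    for stale in set(index) - seen:      # drop documents that were deleted
        del index[stale]
        refreshed.append(f"{stale} (removed)")
    return refreshed

if __name__ == "__main__":
    print("refreshed:", reindex_changed(Path("docs")) or "nothing changed")
```

Running something like this on every merge, rather than on a nightly schedule, is what keeps an assistant from confidently describing last quarter's deployment process.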
+ ### Establish Data Governance and Cleaning Treat internal documentation as code. Implement "Docs-as-Code" practices where documentation is reviewed and pruned regularly. Removing obsolete information is just as important as adding new information to prevent the AI from retrieving "zombie" instructions. +### Schedule Regular Documentation Audits + +Regular audits can help keep documentation owners accountable for the maintenance of their documentation which helps keep AI from returning bad or misleading data. + ### Use Identity-aware Vector Search Ensure your AI backend integrates with your Single Sign-On (SSO) provider. When a user asks a question, the system should only retrieve data fragments that the user's specific credentials allow them to see, maintaining the "principle of least privilege." From a9c87751409d500d878188eff6b4eeddb3c58070 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Tue, 13 Jan 2026 09:09:14 -0600 Subject: [PATCH 069/131] minor edit to ai-accessible-data --- capabilities/ai-accessible-internal-data.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/capabilities/ai-accessible-internal-data.md b/capabilities/ai-accessible-internal-data.md index 8a65866..b87e5c6 100644 --- a/capabilities/ai-accessible-internal-data.md +++ b/capabilities/ai-accessible-internal-data.md @@ -41,19 +41,17 @@ Generally, an overall score equal to or less than 3 means you'll likely gain a l The following is a curated list of supporting practices to consider when looking to improve your team's AI-accessible Internal Data capability. -### Implement Retrieval-Augmented Generation (RAG) +### Implement Retrieval-augmented Generation (RAG) Instead of training a model on your data, use RAG to retrieve relevant documents from a database and pass them to the AI as context for each specific query. This reduces hallucinations and allows the AI to cite its sources, enabling users to verify the information. ### Automate Data-indexing Pipelines Create automated workflows that trigger every time a document is updated or a pull request is merged. This ensures that the vector database used by the AI stays synchronized with the actual state of your projects, providing "fresh" answers. - ### Establish Data Governance and Cleaning Treat internal documentation as code. Implement "Docs-as-Code" practices where documentation is reviewed and pruned regularly. Removing obsolete information is just as important as adding new information to prevent the AI from retrieving "zombie" instructions. ### Schedule Regular Documentation Audits - -Regular audits can help keep documentation owners accountable for the maintenance of their documentation which helps keep AI from returning bad or misleading data. +Regular audits can help keep documentation owners accountable for the maintenance of their documents, which helps keep AI from returning incorrect, outdated, or misleading data. ### Use Identity-aware Vector Search Ensure your AI backend integrates with your Single Sign-On (SSO) provider. When a user asks a question, the system should only retrieve data fragments that the user's specific credentials allow them to see, maintaining the "principle of least privilege." 
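The identity-aware retrieval described above comes down to filtering candidate chunks by the requesting user's entitlements before anything reaches the model. The group names, chunk metadata, and `visible_chunks` helper below are illustrative assumptions; in practice the group list comes from your SSO provider and the filter is applied inside the vector store query itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    source: str
    text: str
    allowed_groups: frozenset[str]  # copied from the source system's ACL at index time

CHUNKS = [
    Chunk("handbook/oncall.md", "On-call handoff happens every Monday.", frozenset({"engineering"})),
    Chunk("finance/salaries.md", "Salary bands are reviewed annually.", frozenset({"hr", "executives"})),
]

def visible_chunks(user_groups: set[str], candidates: list[Chunk]) -> list[Chunk]:
    """Keep only chunks the requesting user is entitled to see.

    Restricted chunks should never reach the ranking step, let alone the
    prompt, so the filter belongs as early in the retrieval path as possible.
    """
    return [c for c in candidates if user_groups & c.allowed_groups]

if __name__ == "__main__":
    for chunk in visible_chunks({"engineering"}, CHUNKS):
        print(chunk.source)  # prints only handbook/oncall.md
```

Filtering before generation, not after, is what keeps an internal chatbot from becoming an accidental privilege-escalation path.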
From 8b24cd6ea32a56b48fc76ed929f5b88fdabfa811 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Tue, 6 Jan 2026 16:53:27 -0700 Subject: [PATCH 070/131] New Capability: User Centric Focus --- capabilities/user-centric-focus.md | 51 ++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/capabilities/user-centric-focus.md b/capabilities/user-centric-focus.md index e69de29..75a465a 100644 --- a/capabilities/user-centric-focus.md +++ b/capabilities/user-centric-focus.md @@ -0,0 +1,51 @@ +# [User-centric focus](https://dora.dev/capabilities/user-centric-focus/) + +Software value is ultimately defined by its usefulness to human beings, making a user-centric focus the essential "North Star" for development teams. In the era of AI-assisted coding, this focus ensures that increased velocity leads to meaningful solutions rather than just faster mistakes. By prioritizing actual user outcomes over simple technical output, organizations achieve significantly higher performance and job satisfaction. + +## Nuances + +This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. + +### The "Feature Factory" Trap + +One of the most common pitfalls is the "feature factory" mindset, where success is measured by output (velocity and features shipped) rather than outcomes (user value). When teams optimize for activity, they risk building "shiny but hardly used" features. AI can significantly exacerbate this problem; because AI makes it easier to write code, teams may find themselves producing a high volume of low-value software faster than ever, leading to high activity but low impact. + +### Organizational Silos and "Solutionism" + +Organizational structures often create a "gatekeeper" model where product managers or researchers sit between developers and end users. This disconnect robs engineers of the context needed to build intuitive solutions and effectively verify AI-generated outputs. This often leads to "resume-driven development" or "solutionism," where teams adopt complex AI models or new technologies for their own sake rather than to solve a specific, validated user problem. + +## Supporting Practices + +The following is a curated list of supporting practices to consider when looking to improve your team's User-centric focus capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. + +### Integrate Low-Latency Feedback Loops + +Teams should establish direct channels for user feedback—such as in-app surveys or observation sessions—that are accessible to developers immediately. For teams using AI, these loops are critical for refining workflows and validating that the AI’s output aligns with reality. When feedback is integrated directly into the development cycle, it allows for the continuous reprioritization of the backlog based on what users actually need. + +### Implement Spec-driven Development (SDD) + +To keep AI aligned with user needs, teams can use Spec-driven Development. In this approach, developers refine user requirements and constraints into detailed documentation (specs) before any code is written. This documentation serves as the source of truth for AI agents. 
By constraining AI output to validated user requirements, teams ensure that the generated code solves the intended problem rather than just following generic coding patterns. + +## Adjacent Capabilities + +The following capabilities will be valuable for you and your team to explore, as they are either: + +- Related (they cover similar territory to User-centric focus) +- Upstream (they are a pre-requisite for User-centric focus) +- Downstream (User-centric focus is a pre-requisite for them) + +### [Customer feedback](/capabilities/customer-feedback.md) - Related + +Customer feedback provides the raw data and insights necessary to build a user-centric focus. While user-centricity is the mindset and prioritization strategy, the customer feedback capability focuses on the technical and procedural methods used to gather that information. + +### [Documentation quality](/capabilities/documentation-quality.md) - Upstream + +High-quality documentation is a prerequisite for practices like Spec-driven Development. For AI to effectively assist in a user-centric way, the underlying requirements, user stories, and technical specs must be clear, accurate, and well-maintained. + +### [Team experimentation](/capabilities/team-experimentation.md) - Downstream + +Once a team has a strong user-centric focus, they can more effectively engage in experimentation. A deep understanding of the user allows teams to create meaningful hypotheses, using A/B testing and other experimental methods to see which solutions actually drive the desired user outcomes. + +### [Job satisfaction](/capabilities/job-satisfaction.md) - Downstream + +DORA research indicates that a user-centric focus is a strong predictor of job satisfaction. When developers see the direct impact of their work on real users and feel they are solving meaningful problems, it leads to higher engagement and morale. From 53bf535f9572a4dbafcc6eed80930f82cfd64330 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Tue, 13 Jan 2026 11:10:31 -0600 Subject: [PATCH 071/131] edit of new User-centric focus capability --- capabilities/user-centric-focus.md | 50 ++++++++++++++++-------------- 1 file changed, 26 insertions(+), 24 deletions(-) diff --git a/capabilities/user-centric-focus.md b/capabilities/user-centric-focus.md index 75a465a..f6f1582 100644 --- a/capabilities/user-centric-focus.md +++ b/capabilities/user-centric-focus.md @@ -1,51 +1,53 @@ -# [User-centric focus](https://dora.dev/capabilities/user-centric-focus/) +# [User-centric Focus](https://dora.dev/capabilities/user-centric-focus/) -Software value is ultimately defined by its usefulness to human beings, making a user-centric focus the essential "North Star" for development teams. In the era of AI-assisted coding, this focus ensures that increased velocity leads to meaningful solutions rather than just faster mistakes. By prioritizing actual user outcomes over simple technical output, organizations achieve significantly higher performance and job satisfaction. +The value of software is ultimately defined by its usefulness to human beings, making a user-centric focus the essential "North Star" for development teams. In the era of AI-assisted coding, this focus on the user ensures that increased velocity leads to meaningful solutions rather than just faster mistakes. By prioritizing actual user outcomes over simple technical output, organizations achieve significantly higher performance and job satisfaction. 
## Nuances - This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. ### The "Feature Factory" Trap - -One of the most common pitfalls is the "feature factory" mindset, where success is measured by output (velocity and features shipped) rather than outcomes (user value). When teams optimize for activity, they risk building "shiny but hardly used" features. AI can significantly exacerbate this problem; because AI makes it easier to write code, teams may find themselves producing a high volume of low-value software faster than ever, leading to high activity but low impact. +When shifting the focus to the user, one of the most common pitfalls is getting into the "feature factory" mindset, where success is measured by output (velocity and features shipped) rather than outcomes (user value). When teams optimize for activity, they risk building "shiny but hardly used" features. AI can significantly exacerbate this problem; because AI makes it easier to write code, teams may find themselves producing a high volume of low-value software faster than ever, leading to high activity but low impact. ### Organizational Silos and "Solutionism" +Organizational structures often create a "gatekeeper" model, where product managers or researchers sit between developers and end users. This disconnect robs engineers of the context needed to build intuitive solutions and effectively verify AI-generated outputs. This often leads to "resumé-driven development" or "solutionism," where teams adopt complex AI models or new technologies for their own sake rather than to solve a specific, validated user problem. -Organizational structures often create a "gatekeeper" model where product managers or researchers sit between developers and end users. This disconnect robs engineers of the context needed to build intuitive solutions and effectively verify AI-generated outputs. This often leads to "resume-driven development" or "solutionism," where teams adopt complex AI models or new technologies for their own sake rather than to solve a specific, validated user problem. +##Assessment +To assess how mature your team or organization is in this capability, complete this short exercise. -## Supporting Practices +Consider the descriptions below and score your team or organization on the User-centric Focus capability. Generally, score a 1 if xxxx, a 2 if xxxx, a 3 if xxx, and a 4 if xxx. -The following is a curated list of supporting practices to consider when looking to improve your team's User-centric focus capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. +Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. -### Integrate Low-Latency Feedback Loops +... -Teams should establish direct channels for user feedback—such as in-app surveys or observation sessions—that are accessible to developers immediately. For teams using AI, these loops are critical for refining workflows and validating that the AI’s output aligns with reality. 
When feedback is integrated directly into the development cycle, it allows for the continuous reprioritization of the backlog based on what users actually need. +The number you selected represents your overall score for this capability. If you feel like your team or organization fits somewhere in between two scores, it's okay to use a decimal. For example, if you think database changes in your team or organization are somewhere between partially automated and mostly automated, then you would score a 2.5. -### Implement Spec-driven Development (SDD) +Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of User-centric Focus; you would likely benefit from evaluating your scores in other capabilities. -To keep AI aligned with user needs, teams can use Spec-driven Development. In this approach, developers refine user requirements and constraints into detailed documentation (specs) before any code is written. This documentation serves as the source of truth for AI agents. By constraining AI output to validated user requirements, teams ensure that the generated code solves the intended problem rather than just following generic coding patterns. +## Supporting Practices +The following is a curated list of supporting practices to consider when looking to improve your team's User-centric Focus capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. -## Adjacent Capabilities +### Integrate Low-latency Feedback Loops +Teams should establish direct channels for user feedback-—such as in-app surveys or observation sessions—-that are accessible to developers immediately. For teams using AI, these feedback loops are critical for refining workflows and validating that the AI’s output aligns with reality. When feedback is integrated directly into the development cycle, the backlog can be continuously reprioritized, based on what users actually need. +### Implement Spec-driven Development (SDD) +To keep AI efforts aligned with user needs, teams can use Spec-driven Development. In this approach, developers refine user requirements and constraints into detailed documentation (specs) before any code is written. These specs serve as the source of truth for AI agents. By constraining AI output to validated user requirements, teams ensure that generated code solves user problems rather than just following generic coding patterns. + +## Adjacent Capabilities The following capabilities will be valuable for you and your team to explore, as they are either: - Related (they cover similar territory to User-centric focus) - Upstream (they are a pre-requisite for User-centric focus) - Downstream (User-centric focus is a pre-requisite for them) -### [Customer feedback](/capabilities/customer-feedback.md) - Related - -Customer feedback provides the raw data and insights necessary to build a user-centric focus. While user-centricity is the mindset and prioritization strategy, the customer feedback capability focuses on the technical and procedural methods used to gather that information. 
+### [Customer Feedback](/capabilities/customer-feedback.md) - Related +Customer feedback provides the raw data and insights necessary to build a user-centric focus. While user-centricity is the mindset and prioritization strategy, the Customer Feedback capability focuses on the technical and procedural methods used to gather that information. -### [Documentation quality](/capabilities/documentation-quality.md) - Upstream - -High-quality documentation is a prerequisite for practices like Spec-driven Development. For AI to effectively assist in a user-centric way, the underlying requirements, user stories, and technical specs must be clear, accurate, and well-maintained. - -### [Team experimentation](/capabilities/team-experimentation.md) - Downstream +### [Documentation Quality](/capabilities/documentation-quality.md) - Upstream +High-quality documentation is a pre-requisite for practices like Spec-driven Development. For AI to effectively assist in a user-centric way, the underlying requirements, user stories, and technical specs must be clear, accurate, and well-maintained. +### [Team Experimentation](/capabilities/team-experimentation.md) - Downstream Once a team has a strong user-centric focus, they can more effectively engage in experimentation. A deep understanding of the user allows teams to create meaningful hypotheses, using A/B testing and other experimental methods to see which solutions actually drive the desired user outcomes. -### [Job satisfaction](/capabilities/job-satisfaction.md) - Downstream - -DORA research indicates that a user-centric focus is a strong predictor of job satisfaction. When developers see the direct impact of their work on real users and feel they are solving meaningful problems, it leads to higher engagement and morale. +### [Job Satisfaction](/capabilities/job-satisfaction.md) - Downstream +DORA research indicates that a user-centric focus is a strong predictor of job satisfaction. Developers who see the direct impact of their work on real users and feel they are solving meaningful problems are typically more engaged, and overall team morale tends to be higher. From 726a7facdec1b3ebc93a8e01bdb6274c634829f0 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 16 Jan 2026 14:47:28 -0700 Subject: [PATCH 072/131] User Centric Focus: add Assessment --- capabilities/user-centric-focus.md | 27 ++++++++++++++++++++------- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/capabilities/user-centric-focus.md b/capabilities/user-centric-focus.md index f6f1582..cf3bc98 100644 --- a/capabilities/user-centric-focus.md +++ b/capabilities/user-centric-focus.md @@ -11,29 +11,37 @@ When shifting the focus to the user, one of the most common pitfalls is getting ### Organizational Silos and "Solutionism" Organizational structures often create a "gatekeeper" model, where product managers or researchers sit between developers and end users. This disconnect robs engineers of the context needed to build intuitive solutions and effectively verify AI-generated outputs. This often leads to "resumé-driven development" or "solutionism," where teams adopt complex AI models or new technologies for their own sake rather than to solve a specific, validated user problem. -##Assessment +## Assessment + To assess how mature your team or organization is in this capability, complete this short exercise. -Consider the descriptions below and score your team or organization on the User-centric Focus capability. 
Generally, score a 1 if xxxx, a 2 if xxxx, a 3 if xxx, and a 4 if xxx. +Consider the descriptions below and score your team or organization on this capability. Generally, score a 1 if data is untrusted and largely inaccessible, a 2 if data is documented but outdated, a 3 if data is trusted and discoverable, and a 4 if data is self-service and treated as a first-class citizen. Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. -... +1. **The Feature Factory**: Teams focus on output volume and use AI to ship more features without validating user impact or feedback. +2. **Reactive & Proxy-Led**: Teams rely on siloed feedback and manual hand-offs, using AI to accelerate ticket completion rather than user outcomes. +3. **Integrated & Spec-Driven**: Teams use spec-driven development and direct user observation to ensure AI outputs are grounded in verified requirements. +4. **User-Invested & Self-Correcting**: Teams treat AI as a discovery partner, using real-time user metrics and rapid prototyping to pivot toward maximum value. -The number you selected represents your overall score for this capability. If you feel like your team or organization fits somewhere in between two scores, it's okay to use a decimal. For example, if you think database changes in your team or organization are somewhere between partially automated and mostly automated, then you would score a 2.5. +The number you selected represents your overall score for this capability. If you feel like your organization fits somewhere in between two scores, it's okay to use a decimal. -Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of User-centric Focus; you would likely benefit from evaluating your scores in other capabilities. +Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means your team or organization is largely proficient, or well on its way to becoming proficient, in the area of data health; you would likely benefit from evaluating your scores in other capabilities. ## Supporting Practices + The following is a curated list of supporting practices to consider when looking to improve your team's User-centric Focus capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. -### Integrate Low-latency Feedback Loops -Teams should establish direct channels for user feedback-—such as in-app surveys or observation sessions—-that are accessible to developers immediately. For teams using AI, these feedback loops are critical for refining workflows and validating that the AI’s output aligns with reality. When feedback is integrated directly into the development cycle, the backlog can be continuously reprioritized, based on what users actually need. +### Talk Directly With Users + +Teams should establish direct channels for user feedback that are accessible to developers immediately. 
For teams using AI, these feedback loops are critical for refining workflows and validating that the AI’s output aligns with reality. When feedback is integrated directly into the development cycle, the backlog can be continuously reprioritized, based on what users actually need. ### Implement Spec-driven Development (SDD) + To keep AI efforts aligned with user needs, teams can use Spec-driven Development. In this approach, developers refine user requirements and constraints into detailed documentation (specs) before any code is written. These specs serve as the source of truth for AI agents. By constraining AI output to validated user requirements, teams ensure that generated code solves user problems rather than just following generic coding patterns. ## Adjacent Capabilities + The following capabilities will be valuable for you and your team to explore, as they are either: - Related (they cover similar territory to User-centric focus) @@ -41,13 +49,18 @@ The following capabilities will be valuable for you and your team to explore, as - Downstream (User-centric focus is a pre-requisite for them) ### [Customer Feedback](/capabilities/customer-feedback.md) - Related + Customer feedback provides the raw data and insights necessary to build a user-centric focus. While user-centricity is the mindset and prioritization strategy, the Customer Feedback capability focuses on the technical and procedural methods used to gather that information. ### [Documentation Quality](/capabilities/documentation-quality.md) - Upstream + High-quality documentation is a pre-requisite for practices like Spec-driven Development. For AI to effectively assist in a user-centric way, the underlying requirements, user stories, and technical specs must be clear, accurate, and well-maintained. ### [Team Experimentation](/capabilities/team-experimentation.md) - Downstream + Once a team has a strong user-centric focus, they can more effectively engage in experimentation. A deep understanding of the user allows teams to create meaningful hypotheses, using A/B testing and other experimental methods to see which solutions actually drive the desired user outcomes. ### [Job Satisfaction](/capabilities/job-satisfaction.md) - Downstream + DORA research indicates that a user-centric focus is a strong predictor of job satisfaction. Developers who see the direct impact of their work on real users and feel they are solving meaningful problems are typically more engaged, and overall team morale tends to be higher. + From af6456c39688ae45ac14e7ab3b9b124847f1a5e1 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 16 Jan 2026 15:29:32 -0700 Subject: [PATCH 073/131] User Centric Focus: fix assessment --- capabilities/user-centric-focus.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/capabilities/user-centric-focus.md b/capabilities/user-centric-focus.md index cf3bc98..969f44a 100644 --- a/capabilities/user-centric-focus.md +++ b/capabilities/user-centric-focus.md @@ -15,7 +15,7 @@ Organizational structures often create a "gatekeeper" model, where product manag To assess how mature your team or organization is in this capability, complete this short exercise. -Consider the descriptions below and score your team or organization on this capability. Generally, score a 1 if data is untrusted and largely inaccessible, a 2 if data is documented but outdated, a 3 if data is trusted and discoverable, and a 4 if data is self-service and treated as a first-class citizen. 
+Consider the descriptions below and score your team on this capability: score a 1 if you are a Feature Factory focused on shipping volume without validation; a 2 if Reactive & Proxy-Led, using AI to close tickets rather than drive outcomes; a 3 if Integrated & Spec-Driven, grounding AI in verified requirements and user observation; and a 4 if User-Invested & Self-Correcting, using AI as a discovery partner to pivot toward maximum value. Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. From 22892ed9ee07bef183a31a9ea945a053c4b8b500 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 7 Jan 2026 11:47:06 -0700 Subject: [PATCH 074/131] New Capability: Platform Engineering --- capabilities/platform-engineering.md | 48 ++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/capabilities/platform-engineering.md b/capabilities/platform-engineering.md index e69de29..69b40e8 100644 --- a/capabilities/platform-engineering.md +++ b/capabilities/platform-engineering.md @@ -0,0 +1,48 @@ +# [Platform Engineering AI](https://dora.dev/capabilities/platform-engineering/) + +Platform Engineering is primarily about enabling value stream developers to do their jobs faster and with less cognitive load by creating an internal platform which is treated with the same care as the main product. Because of the rise of AI in development, having a strong platform has become "no longer optional". It has become necessary to allow gains from AI in development to not get negated by other organizational inefficiencies down the line. + +## Nuances + +This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. + +### The "Build It and They Will Come" Trap + +One of the most common pitfalls is building a platform based on top-down assumptions rather than user research. When a platform team operates in a vacuum, they often create tools that don't solve real-world developer problems or fit existing workflows. This leads to low adoption rates and the rise of "shadow IT," where teams bypass the platform entirely to get their work done, defeating the purpose of standardization. + +### The "One-Size-Fits-All" Golden Cage + +Different types of development—such as data science, mobile, and legacy systems—have unique requirements. A platform that fails to provide enabling constraints and instead mandates a single, inflexible workflow will frustrate specialized teams and hinder the very innovation it was meant to accelerate. + +## Supporting Practices + +The following is a curated list of supporting practices to consider when looking to improve your team's Platform Engineering AI capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. + +### Adopt a Product Management Mindset +Treat the platform as a product and developers as customers. This involves assigning a dedicated product manager to map critical user journeys—like "spinning up a new service" or "debugging production"—and creating a roadmap based on alleviating developer friction. 
Success is measured by developer satisfaction (DevEx) and the ease with which users can self-serve. + +### Proactively "Shift Down" Cognitive Load +The platform should abstract away the complexities of Kubernetes, cloud networking, and security policies. By "shifting down" these requirements into the platform's automated pathways, developers are freed from needing to be infrastructure experts. This independence is a significant driver of productivity, allowing teams to focus almost exclusively on delivering user value. + +### Prioritize Clear and Actionable Feedback +DORA data highlights that the capability most correlated with a positive user experience is receiving "clear feedback on the outcome of my tasks." Platforms must provide immediate, transparent logs and diagnostics when a task (like a deployment or test suite) fails. This empowers developers to troubleshoot independently without opening support tickets, maintaining the "flow" of development. + +## Adjacent Capabilities + +The following capabilities will be valuable for you and your team to explore, as they are either: + +- Related (they cover similar territory to Platform Engineering AI) +- Upstream (they are a pre-requisite for Platform Engineering AI) +- Downstream (Platform Engineering AI is a pre-requisite for them) + +### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) - Upstream +A platform requires high trust and collaboration to succeed. A generative culture ensures that the feedback loops between platform teams and developers are honest and productive, preventing the "ivory tower" approach where standards are dictated without empathy for the developer experience. + +### [Continuous Delivery](/capabilities/continuous-delivery.md) - Downstream +Platform engineering provides the "paved road" required for true Continuous Delivery. Without the automated, secure, and compliant pathways built by platform engineers, achieving the frequent, low-risk releases characterized by CD is nearly impossible at scale. + +### [Empowering Teams to Choose Tools](/capabilities/empowering-teams-to-choose-tools.md) - Related +This capability balances the platform's goal of standardization. While the platform provides "golden paths," empowering teams to choose tools ensures that the organization remains flexible enough to adopt new technologies that might eventually become the next standard within the platform. + +### [Visibility of Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) - Related +A quality platform naturally increases visibility by centralizing where work happens. By integrating telemetry and tracking into the platform's toolchain, organizations gain a clearer view of bottlenecks and lead times across the entire value stream. 
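
To make the "Prioritize Clear and Actionable Feedback" practice described a few paragraphs above more concrete, here is a minimal TypeScript sketch of the kind of structured result a platform could hand back when a deployment or test task fails. The `TaskResult` shape, the `reportResult` helper, and the example diagnostics are all hypothetical; they are not taken from any real platform SDK and only illustrate feedback that lets a developer self-serve instead of opening a ticket.

```ts
// Hypothetical shape for the feedback a platform returns after running a task.
// None of these names come from a real platform SDK; they only illustrate the idea.
type TaskResult = {
  task: "deploy" | "test" | "provision";
  status: "succeeded" | "failed";
  logsUrl: string;            // deep link to the full, unfiltered logs
  diagnostics: string[];      // the specific checks or steps that failed
  suggestedActions: string[]; // what the developer can try without filing a ticket
};

// Print the result in a way a developer can act on immediately.
function reportResult(result: TaskResult): void {
  if (result.status === "succeeded") {
    console.log(`${result.task} succeeded.`);
    return;
  }
  console.error(`${result.task} failed.`);
  console.error(`Full logs: ${result.logsUrl}`);
  for (const d of result.diagnostics) console.error(`  diagnostic: ${d}`);
  for (const a of result.suggestedActions) console.error(`  try: ${a}`);
}

// Example: the platform surfaces exactly what broke and what to do next.
// The URL and diagnostics below are invented for illustration.
reportResult({
  task: "deploy",
  status: "failed",
  logsUrl: "https://platform.internal/builds/1234/logs",
  diagnostics: ["image scan found 1 critical CVE", "readiness probe never passed"],
  suggestedActions: ["rebuild on the patched base image", "check the service's health endpoint"],
});
```

The specific fields matter less than the principle: a failed task should arrive with its logs, the checks that failed, and a plausible next step already attached.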
From 1f50c103f01a96dddec527487bb2a74b67765fab Mon Sep 17 00:00:00 2001 From: nicoletache Date: Tue, 13 Jan 2026 11:53:41 -0600 Subject: [PATCH 075/131] edit of new Platform engineering capability --- capabilities/platform-engineering.md | 51 ++++++++++++++++------------ 1 file changed, 29 insertions(+), 22 deletions(-) diff --git a/capabilities/platform-engineering.md b/capabilities/platform-engineering.md index 69b40e8..4ffef08 100644 --- a/capabilities/platform-engineering.md +++ b/capabilities/platform-engineering.md @@ -1,48 +1,55 @@ -# [Platform Engineering AI](https://dora.dev/capabilities/platform-engineering/) - -Platform Engineering is primarily about enabling value stream developers to do their jobs faster and with less cognitive load by creating an internal platform which is treated with the same care as the main product. Because of the rise of AI in development, having a strong platform has become "no longer optional". It has become necessary to allow gains from AI in development to not get negated by other organizational inefficiencies down the line. +# [Platform Engineering](https://dora.dev/capabilities/platform-engineering/) +Platform Engineering is primarily about enabling value stream developers to do their jobs faster and with less cognitive load by creating an internal platform that is treated with the same care as the main product. Because of the rise of AI in development, having a strong platform is no longer an option; it is necessary. Platform Engineering, as a capability, ensures that gains from AI-driven development are not negated by other organizational inefficiencies down the line. ## Nuances - This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. ### The "Build It and They Will Come" Trap - -One of the most common pitfalls is building a platform based on top-down assumptions rather than user research. When a platform team operates in a vacuum, they often create tools that don't solve real-world developer problems or fit existing workflows. This leads to low adoption rates and the rise of "shadow IT," where teams bypass the platform entirely to get their work done, defeating the purpose of standardization. +One of the most common pitfalls that teams run into is building a platform based on top-down assumptions rather than actual user research. When a platform team operates in a vacuum, they often create tools that don't solve real-world developer problems or fit existing workflows. This leads to low adoption rates and the rise of "shadow IT," where teams bypass the platform entirely to get their work done, defeating the purpose of standardization. ### The "One-Size-Fits-All" Golden Cage +Different types of development—-such as data science, mobile, and legacy systems-—have unique requirements. A platform that fails to provide relevant constraints and instead mandates a single, inflexible workflow will frustrate specialized teams and hinder the very innovation it was meant to accelerate. -Different types of development—such as data science, mobile, and legacy systems—have unique requirements. A platform that fails to provide enabling constraints and instead mandates a single, inflexible workflow will frustrate specialized teams and hinder the very innovation it was meant to accelerate. 
+## Assessment +To assess how mature your team or organization is in this capability, complete this short exercise. -## Supporting Practices +Consider the descriptions below and score yourself on the Platform Engineering capability. Generally, score a 1 if xxx, a 2 if xxx, a 3 if xxx, and a 4 if xxx. + +Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. + +... -The following is a curated list of supporting practices to consider when looking to improve your team's Platform Engineering AI capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. +The number you selected represents your overall score for this capability. If you feel like your team or organization fits somewhere in between two scores, it's okay to use a decimal. For example, if you think xxx is both xxx and xxx, you would score a 2.5. + +Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of Platform Engineering; you would likely benefit from evaluating your scores in other capabilities. + +## Supporting Practices +The following is a curated list of supporting practices to consider when looking to improve your team's Platform Engineering capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. ### Adopt a Product Management Mindset -Treat the platform as a product and developers as customers. This involves assigning a dedicated product manager to map critical user journeys—like "spinning up a new service" or "debugging production"—and creating a roadmap based on alleviating developer friction. Success is measured by developer satisfaction (DevEx) and the ease with which users can self-serve. +During the building process, platform teams should treat the platform as a product and developers as customers. This involves two things: 1) assigning a dedicated product manager to map critical user journeys (like spinning up a new service or debugging production), and 2) creating a roadmap aimed at alleviating developer friction. Success with the platform is measured by developer satisfaction (DevEx) and the ease with which users can self-serve. ### Proactively "Shift Down" Cognitive Load -The platform should abstract away the complexities of Kubernetes, cloud networking, and security policies. By "shifting down" these requirements into the platform's automated pathways, developers are freed from needing to be infrastructure experts. This independence is a significant driver of productivity, allowing teams to focus almost exclusively on delivering user value. +The platform should abstract away the complexities of Kubernetes, cloud networking, and security policies. By "shifting down" these requirements into the platform's automated pathways, developers are freed from needing to be infrastructure experts. This independence is a significant driver of productivity, allowing developers to focus almost exclusively on delivering user value. 
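
As a rough sketch of what "shifting down" can look like from the developer's seat, the TypeScript below shows a hypothetical self-service interface: the developer supplies only application-level inputs, and the platform's paved path fills in the Kubernetes manifests, networking, TLS, and policy checks behind it. The `PlatformClient` interface, the `deployService` method, and every field name here are invented for illustration and do not refer to any real product or API.

```ts
// Hypothetical developer-facing interface for a "golden path" deployment.
// The platform team owns everything below this surface: cluster selection,
// manifests, ingress, TLS, secrets handling, and policy checks.
interface DeployRequest {
  service: string;       // name of the service being shipped
  image: string;         // container image already built by CI
  environment: "staging" | "production";
  replicas?: number;     // sensible default chosen by the platform if omitted
}

interface PlatformClient {
  deployService(request: DeployRequest): Promise<{ url: string }>;
}

// A developer deploys without touching Kubernetes, networking, or security config.
async function shipIt(platform: PlatformClient): Promise<void> {
  const { url } = await platform.deployService({
    service: "checkout",
    image: "registry.internal/checkout:1.4.2", // hypothetical image reference
    environment: "staging",
  });
  console.log(`checkout is live at ${url}`);
}
```

A developer calling `shipIt` never sees a manifest; that is the cognitive load the platform has absorbed.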
### Prioritize Clear and Actionable Feedback -DORA data highlights that the capability most correlated with a positive user experience is receiving "clear feedback on the outcome of my tasks." Platforms must provide immediate, transparent logs and diagnostics when a task (like a deployment or test suite) fails. This empowers developers to troubleshoot independently without opening support tickets, maintaining the "flow" of development. +DORA data highlights that a positive user experience is most commonly correlated with developers receiving "clear feedback on the outcome of my tasks." Platforms must provide immediate, transparent logs and diagnostics when a task (like a deployment or test suite) fails. This empowers developers to troubleshoot independently without opening support tickets, maintaining the "flow" of development. ## Adjacent Capabilities - The following capabilities will be valuable for you and your team to explore, as they are either: -- Related (they cover similar territory to Platform Engineering AI) -- Upstream (they are a pre-requisite for Platform Engineering AI) -- Downstream (Platform Engineering AI is a pre-requisite for them) - -### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) - Upstream -A platform requires high trust and collaboration to succeed. A generative culture ensures that the feedback loops between platform teams and developers are honest and productive, preventing the "ivory tower" approach where standards are dictated without empathy for the developer experience. - -### [Continuous Delivery](/capabilities/continuous-delivery.md) - Downstream -Platform engineering provides the "paved road" required for true Continuous Delivery. Without the automated, secure, and compliant pathways built by platform engineers, achieving the frequent, low-risk releases characterized by CD is nearly impossible at scale. +- Related (they cover similar territory to Platform Engineering) +- Upstream (they are a pre-requisite for Platform Engineering) +- Downstream (Platform Engineering is a pre-requisite for them) ### [Empowering Teams to Choose Tools](/capabilities/empowering-teams-to-choose-tools.md) - Related This capability balances the platform's goal of standardization. While the platform provides "golden paths," empowering teams to choose tools ensures that the organization remains flexible enough to adopt new technologies that might eventually become the next standard within the platform. ### [Visibility of Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) - Related A quality platform naturally increases visibility by centralizing where work happens. By integrating telemetry and tracking into the platform's toolchain, organizations gain a clearer view of bottlenecks and lead times across the entire value stream. + +### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) - Upstream +A platform requires high trust and collaboration to succeed. A generative culture ensures that the feedback loops between platform teams and developers are honest and productive, preventing the "ivory tower" approach where standards are dictated without empathy for the developer experience. + +### [Continuous Delivery](/capabilities/continuous-delivery.md) - Downstream +Platform engineering provides the "paved road" required for true Continuous Delivery. 
Without the automated, secure, and compliant pathways built by platform engineers, achieving the frequent, low-risk releases characterized by CD is nearly impossible at scale. From 1208288d9e206ef606bd3ccbcf87b2e556974119 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 16 Jan 2026 15:23:43 -0700 Subject: [PATCH 076/131] platform-engineering: add Assessment --- capabilities/platform-engineering.md | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/capabilities/platform-engineering.md b/capabilities/platform-engineering.md index 4ffef08..3b58677 100644 --- a/capabilities/platform-engineering.md +++ b/capabilities/platform-engineering.md @@ -2,40 +2,51 @@ Platform Engineering is primarily about enabling value stream developers to do their jobs faster and with less cognitive load by creating an internal platform that is treated with the same care as the main product. Because of the rise of AI in development, having a strong platform is no longer an option; it is necessary. Platform Engineering, as a capability, ensures that gains from AI-driven development are not negated by other organizational inefficiencies down the line. ## Nuances + This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. ### The "Build It and They Will Come" Trap + One of the most common pitfalls that teams run into is building a platform based on top-down assumptions rather than actual user research. When a platform team operates in a vacuum, they often create tools that don't solve real-world developer problems or fit existing workflows. This leads to low adoption rates and the rise of "shadow IT," where teams bypass the platform entirely to get their work done, defeating the purpose of standardization. ### The "One-Size-Fits-All" Golden Cage + Different types of development—-such as data science, mobile, and legacy systems-—have unique requirements. A platform that fails to provide relevant constraints and instead mandates a single, inflexible workflow will frustrate specialized teams and hinder the very innovation it was meant to accelerate. ## Assessment + To assess how mature your team or organization is in this capability, complete this short exercise. -Consider the descriptions below and score yourself on the Platform Engineering capability. Generally, score a 1 if xxx, a 2 if xxx, a 3 if xxx, and a 4 if xxx. Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. -... +1. **Ticket-Ops & Fragmented Tooling:** The platform is a collection of infrastructure tickets and manual gates rather than a product. Individual AI coding gains are lost to downstream disorder, as security reviews, testing, and deployments remain manual bottlenecks that increase cognitive load. +2. **Standardized but Rigid:** Initial "golden paths" exist, but they function as a "golden cage" with little flexibility for diverse team needs. While some automation is present, developer feedback is often unclear, and the lack of self-service means AI-generated code frequently stalls at the integration phase. +3. 
**Product-Centric & Self-Service:** A dedicated platform team treats developers as customers, providing self-service interfaces that "shift down" complexity. Automated pipelines ensure AI-amplified throughput is consistently tested and secured, allowing teams to focus on user value rather than infrastructure hurdles.
+4. **Fluid, Extensible & AI-Ready:** The platform is an extensible ecosystem where "golden paths" are the easiest choice but allow for contribution and flexibility. Real-time feedback and automated guardrails make experimentation cheap and recovery fast, fully realizing AI’s potential to accelerate the entire delivery lifecycle without sacrificing stability.

-The number you selected represents your overall score for this capability. If you feel like your team or organization fits somewhere in between two scores, it's okay to use a decimal. For example, if you think xxx is both xxx and xxx, you would score a 2.5.
+The number you selected represents your overall score for this capability. If you feel like your team or organization fits somewhere in between two scores, it's okay to use a decimal. For example, if you think your team or organization falls somewhere between standardized-but-rigid and product-centric self-service, you would score a 2.5.

-Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of Platform Engineering; you would likely benefit from evaluating your scores in other capabilities.
+Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of Platform Engineering; you would likely benefit from evaluating your scores in other capabilities.

## Supporting Practices
+
The following is a curated list of supporting practices to consider when looking to improve your team's Platform Engineering capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability.

### Adopt a Product Management Mindset
+
During the building process, platform teams should treat the platform as a product and developers as customers. This involves two things: 1) assigning a dedicated product manager to map critical user journeys (like spinning up a new service or debugging production), and 2) creating a roadmap aimed at alleviating developer friction. Success with the platform is measured by developer satisfaction (DevEx) and the ease with which users can self-serve.

### Proactively "Shift Down" Cognitive Load
+
The platform should abstract away the complexities of Kubernetes, cloud networking, and security policies. By "shifting down" these requirements into the platform's automated pathways, developers are freed from needing to be infrastructure experts. This independence is a significant driver of productivity, allowing developers to focus almost exclusively on delivering user value.

### Prioritize Clear and Actionable Feedback
+
DORA data highlights that a positive user experience is most commonly correlated with developers receiving "clear feedback on the outcome of my tasks."
Platforms must provide immediate, transparent logs and diagnostics when a task (like a deployment or test suite) fails. This empowers developers to troubleshoot independently without opening support tickets, maintaining the "flow" of development. ## Adjacent Capabilities + The following capabilities will be valuable for you and your team to explore, as they are either: - Related (they cover similar territory to Platform Engineering) @@ -43,13 +54,17 @@ The following capabilities will be valuable for you and your team to explore, as - Downstream (Platform Engineering is a pre-requisite for them) ### [Empowering Teams to Choose Tools](/capabilities/empowering-teams-to-choose-tools.md) - Related + This capability balances the platform's goal of standardization. While the platform provides "golden paths," empowering teams to choose tools ensures that the organization remains flexible enough to adopt new technologies that might eventually become the next standard within the platform. ### [Visibility of Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) - Related + A quality platform naturally increases visibility by centralizing where work happens. By integrating telemetry and tracking into the platform's toolchain, organizations gain a clearer view of bottlenecks and lead times across the entire value stream. ### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) - Upstream + A platform requires high trust and collaboration to succeed. A generative culture ensures that the feedback loops between platform teams and developers are honest and productive, preventing the "ivory tower" approach where standards are dictated without empathy for the developer experience. ### [Continuous Delivery](/capabilities/continuous-delivery.md) - Downstream + Platform engineering provides the "paved road" required for true Continuous Delivery. Without the automated, secure, and compliant pathways built by platform engineers, achieving the frequent, low-risk releases characterized by CD is nearly impossible at scale. From 94d3dd45b7bb41fb7e79d427085e9408fa3e8060 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 16 Jan 2026 15:47:52 -0700 Subject: [PATCH 077/131] platform-engineering: remove Product Management Mindset --- capabilities/platform-engineering.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/capabilities/platform-engineering.md b/capabilities/platform-engineering.md index 3b58677..6ff80f8 100644 --- a/capabilities/platform-engineering.md +++ b/capabilities/platform-engineering.md @@ -33,10 +33,6 @@ Generally, an overall score equal to or less than 3 means you'll likely gain a l The following is a curated list of supporting practices to consider when looking to improve your team's Platform Engineering capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. -### Adopt a Product Management Mindset - -During the building process, platform teams should treat the platform as a product and developers as customers. This involves two things: 1) assigning a dedicated product manager to map critical user journeys (like spinning up a new service or debugging production), and 2) creating a roadmap aimed at alleviating developer friction. Success with the platform is measured by developer satisfaction (DevEx) and the ease with which users can self-serve. 
- ### Proactively "Shift Down" Cognitive Load The platform should abstract away the complexities of Kubernetes, cloud networking, and security policies. By "shifting down" these requirements into the platform's automated pathways, developers are freed from needing to be infrastructure experts. This independence is a significant driver of productivity, allowing developers to focus almost exclusively on delivering user value. From 3f87aeceba1f67ba52bad10a5acffd0cd1adaa6f Mon Sep 17 00:00:00 2001 From: Dave Moore <850537+dcmoore@users.noreply.github.com> Date: Wed, 21 Jan 2026 21:43:29 -0800 Subject: [PATCH 078/131] Update open-telemetry-practice.md add groundcover to the list of observability tools --- practices/open-telemetry-practice.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index cfab8ed..24d2981 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -1,6 +1,6 @@ # Adopt the OpenTelemetry Standard -Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it's hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. But there's a catch: These details, while useful, may not be standardized. Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system like [Honeycomb](https://www.honeycomb.io/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/). Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions. +Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it's hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. But there's a catch: These details, while useful, may not be standardized. Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system like [Honeycomb](https://www.honeycomb.io/), [Groundcover](https://www.groundcover.com/) [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/). 
Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions. When the OTel standard is adopted, teams can see how requests move through the system. Scattered logs and isolated metrics are collected to form a single, connected view of system behavior. It shows where time is spent, where failures occur, and how components interact. With that visibility, debugging is faster, performance work is more deliberate, and improvements become evidence-based rather than guided by hunches. From da9af865702d40c6e3615ab8ac0a27f3183b2a5a Mon Sep 17 00:00:00 2001 From: Dave Moore <850537+dcmoore@users.noreply.github.com> Date: Wed, 21 Jan 2026 21:43:52 -0800 Subject: [PATCH 079/131] Update open-telemetry-practice.md add a missing comma --- practices/open-telemetry-practice.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/open-telemetry-practice.md b/practices/open-telemetry-practice.md index 24d2981..015cba6 100644 --- a/practices/open-telemetry-practice.md +++ b/practices/open-telemetry-practice.md @@ -1,6 +1,6 @@ # Adopt the OpenTelemetry Standard -Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it's hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. But there's a catch: These details, while useful, may not be standardized. Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system like [Honeycomb](https://www.honeycomb.io/), [Groundcover](https://www.groundcover.com/) [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/). Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions. +Most systems perform thousands of actions every minute: serving pages, calling APIs, and writing to databases. Without records of what happens during those actions, it's hard to see where time is spent or why errors occur. Telemetry data fills that gap by capturing the details behind system behavior. But there's a catch: These details, while useful, may not be standardized. Without a shared standard for records, each service describes its behavior differently. One may log in JSON, another might use a custom tagging system, and a third could send metrics in a format only one tool understands. This creates fragmented, hard-to-compare data. OpenTelemetry (OTel) fixes that with its collection of APIs, SDKs, and open-source tools that allow developers to work with telemetry data in a standardized way. 
Teams can instrument their systems consistently and send metrics, logs, and traces to a central monitoring system like [Honeycomb](https://www.honeycomb.io/), [Groundcover](https://www.groundcover.com/), [Grafana](https://grafana.com/), [DataDog](https://www.datadoghq.com/), [Jaeger](https://www.jaegertracing.io/), [Fluent Bit](https://fluentbit.io/), or [Uptrace](https://uptrace.dev/). Since most popular monitoring systems support the OTel format, teams can switch platforms without major disruptions. When the OTel standard is adopted, teams can see how requests move through the system. Scattered logs and isolated metrics are collected to form a single, connected view of system behavior. It shows where time is spent, where failures occur, and how components interact. With that visibility, debugging is faster, performance work is more deliberate, and improvements become evidence-based rather than guided by hunches. From 08a43e96b9617ceac0cc56cdbd550f85384693d2 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 16 Jan 2026 16:21:02 -0700 Subject: [PATCH 080/131] bun init --- .gitignore | 33 ++++++++++++++ tools/.gitignore | 34 ++++++++++++++ tools/CLAUDE.md | 106 ++++++++++++++++++++++++++++++++++++++++++++ tools/README.md | 15 +++++++ tools/bun.lock | 26 +++++++++++ tools/index.ts | 1 + tools/package.json | 12 +++++ tools/tsconfig.json | 29 ++++++++++++ 8 files changed, 256 insertions(+) create mode 100644 tools/.gitignore create mode 100644 tools/CLAUDE.md create mode 100644 tools/README.md create mode 100644 tools/bun.lock create mode 100644 tools/index.ts create mode 100644 tools/package.json create mode 100644 tools/tsconfig.json diff --git a/.gitignore b/.gitignore index e43b0f9..a14702c 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,34 @@ +# dependencies (bun install) +node_modules + +# output +out +dist +*.tgz + +# code coverage +coverage +*.lcov + +# logs +logs +_.log +report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json + +# dotenv environment variable files +.env +.env.development.local +.env.test.local +.env.production.local +.env.local + +# caches +.eslintcache +.cache +*.tsbuildinfo + +# IntelliJ based IDEs +.idea + +# Finder (MacOS) folder config .DS_Store diff --git a/tools/.gitignore b/tools/.gitignore new file mode 100644 index 0000000..a14702c --- /dev/null +++ b/tools/.gitignore @@ -0,0 +1,34 @@ +# dependencies (bun install) +node_modules + +# output +out +dist +*.tgz + +# code coverage +coverage +*.lcov + +# logs +logs +_.log +report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json + +# dotenv environment variable files +.env +.env.development.local +.env.test.local +.env.production.local +.env.local + +# caches +.eslintcache +.cache +*.tsbuildinfo + +# IntelliJ based IDEs +.idea + +# Finder (MacOS) folder config +.DS_Store diff --git a/tools/CLAUDE.md b/tools/CLAUDE.md new file mode 100644 index 0000000..764c1dd --- /dev/null +++ b/tools/CLAUDE.md @@ -0,0 +1,106 @@ + +Default to using Bun instead of Node.js. + +- Use `bun ` instead of `node ` or `ts-node ` +- Use `bun test` instead of `jest` or `vitest` +- Use `bun build ` instead of `webpack` or `esbuild` +- Use `bun install` instead of `npm install` or `yarn install` or `pnpm install` +- Use `bun run + + +``` + +With the following `frontend.tsx`: + +```tsx#frontend.tsx +import React from "react"; +import { createRoot } from "react-dom/client"; + +// import .css files directly and it works +import './index.css'; + +const root = createRoot(document.body); + +export default function Frontend() { + return

Hello, world!

; +} + +root.render(); +``` + +Then, run index.ts + +```sh +bun --hot ./index.ts +``` + +For more information, read the Bun API docs in `node_modules/bun-types/docs/**.mdx`. diff --git a/tools/README.md b/tools/README.md new file mode 100644 index 0000000..0c2c25b --- /dev/null +++ b/tools/README.md @@ -0,0 +1,15 @@ +# tools + +To install dependencies: + +```bash +bun install +``` + +To run: + +```bash +bun run index.ts +``` + +This project was created using `bun init` in bun v1.3.4. [Bun](https://bun.com) is a fast all-in-one JavaScript runtime. diff --git a/tools/bun.lock b/tools/bun.lock new file mode 100644 index 0000000..c1908df --- /dev/null +++ b/tools/bun.lock @@ -0,0 +1,26 @@ +{ + "lockfileVersion": 1, + "configVersion": 1, + "workspaces": { + "": { + "name": "tools", + "devDependencies": { + "@types/bun": "latest", + }, + "peerDependencies": { + "typescript": "^5", + }, + }, + }, + "packages": { + "@types/bun": ["@types/bun@1.3.6", "", { "dependencies": { "bun-types": "1.3.6" } }, "sha512-uWCv6FO/8LcpREhenN1d1b6fcspAB+cefwD7uti8C8VffIv0Um08TKMn98FynpTiU38+y2dUO55T11NgDt8VAA=="], + + "@types/node": ["@types/node@25.0.9", "", { "dependencies": { "undici-types": "~7.16.0" } }, "sha512-/rpCXHlCWeqClNBwUhDcusJxXYDjZTyE8v5oTO7WbL8eij2nKhUeU89/6xgjU7N4/Vh3He0BtyhJdQbDyhiXAw=="], + + "bun-types": ["bun-types@1.3.6", "", { "dependencies": { "@types/node": "*" } }, "sha512-OlFwHcnNV99r//9v5IIOgQ9Uk37gZqrNMCcqEaExdkVq3Avwqok1bJFmvGMCkCE0FqzdY8VMOZpfpR3lwI+CsQ=="], + + "typescript": ["typescript@5.9.3", "", { "bin": { "tsc": "bin/tsc", "tsserver": "bin/tsserver" } }, "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw=="], + + "undici-types": ["undici-types@7.16.0", "", {}, "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="], + } +} diff --git a/tools/index.ts b/tools/index.ts new file mode 100644 index 0000000..f67b2c6 --- /dev/null +++ b/tools/index.ts @@ -0,0 +1 @@ +console.log("Hello via Bun!"); \ No newline at end of file diff --git a/tools/package.json b/tools/package.json new file mode 100644 index 0000000..4752c4b --- /dev/null +++ b/tools/package.json @@ -0,0 +1,12 @@ +{ + "name": "tools", + "module": "index.ts", + "type": "module", + "private": true, + "devDependencies": { + "@types/bun": "latest" + }, + "peerDependencies": { + "typescript": "^5" + } +} diff --git a/tools/tsconfig.json b/tools/tsconfig.json new file mode 100644 index 0000000..bfa0fea --- /dev/null +++ b/tools/tsconfig.json @@ -0,0 +1,29 @@ +{ + "compilerOptions": { + // Environment setup & latest features + "lib": ["ESNext"], + "target": "ESNext", + "module": "Preserve", + "moduleDetection": "force", + "jsx": "react-jsx", + "allowJs": true, + + // Bundler mode + "moduleResolution": "bundler", + "allowImportingTsExtensions": true, + "verbatimModuleSyntax": true, + "noEmit": true, + + // Best practices + "strict": true, + "skipLibCheck": true, + "noFallthroughCasesInSwitch": true, + "noUncheckedIndexedAccess": true, + "noImplicitOverride": true, + + // Some stricter flags (disabled by default) + "noUnusedLocals": false, + "noUnusedParameters": false, + "noPropertyAccessFromIndexSignature": false + } +} From 486ccbd5db5400f838e826fa75cc1e2c943557e9 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 16 Jan 2026 20:08:39 -0700 Subject: [PATCH 081/131] setup linting --- tools/CLAUDE.md | 106 ----------------- tools/index.ts | 27 ++++- tools/src/Registry.ts | 19 +++ tools/src/example-capability.ts | 88 ++++++++++++++ 
tools/src/rules/Problem.ts | 46 ++++++++ tools/src/rules/Rule.ts | 21 ++++ tools/src/rules/raw/NewLineAfterHeadings.ts | 15 +++ .../rules/raw/NewlineAfterHeadings.test.ts | 29 +++++ tools/src/template-capability.ts | 108 ++++++++++++++++++ 9 files changed, 352 insertions(+), 107 deletions(-) delete mode 100644 tools/CLAUDE.md create mode 100644 tools/src/Registry.ts create mode 100644 tools/src/example-capability.ts create mode 100644 tools/src/rules/Problem.ts create mode 100644 tools/src/rules/Rule.ts create mode 100644 tools/src/rules/raw/NewLineAfterHeadings.ts create mode 100644 tools/src/rules/raw/NewlineAfterHeadings.test.ts create mode 100644 tools/src/template-capability.ts diff --git a/tools/CLAUDE.md b/tools/CLAUDE.md deleted file mode 100644 index 764c1dd..0000000 --- a/tools/CLAUDE.md +++ /dev/null @@ -1,106 +0,0 @@ - -Default to using Bun instead of Node.js. - -- Use `bun ` instead of `node ` or `ts-node ` -- Use `bun test` instead of `jest` or `vitest` -- Use `bun build ` instead of `webpack` or `esbuild` -- Use `bun install` instead of `npm install` or `yarn install` or `pnpm install` -- Use `bun run - - -``` - -With the following `frontend.tsx`: - -```tsx#frontend.tsx -import React from "react"; -import { createRoot } from "react-dom/client"; - -// import .css files directly and it works -import './index.css'; - -const root = createRoot(document.body); - -export default function Frontend() { - return

Hello, world!

; -} - -root.render(); -``` - -Then, run index.ts - -```sh -bun --hot ./index.ts -``` - -For more information, read the Bun API docs in `node_modules/bun-types/docs/**.mdx`. diff --git a/tools/index.ts b/tools/index.ts index f67b2c6..aa43442 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -1 +1,26 @@ -console.log("Hello via Bun!"); \ No newline at end of file +import { example_capability } from "./src/example-capability"; +import { Registry } from "./src/Registry"; +import { NewLineAfterHeadings } from "./src/rules/raw/NewlineAfterHeadings"; + +type ExitCode = 0 | 1 + +function runRawLintRules(input: string): ExitCode { + const registry = new Registry() + registry.register(new NewLineAfterHeadings({ 'new-line-after-headings': 'error' })) + registry.run(input) + registry.print() + + const problems = registry.getProblems() + + if (problems.length !== 0) { + return 1 + } + return 0 +} + +process.exit(runRawLintRules(example_capability)) + +// run raw lint rules on the text +// parse practice into practice data structure +// run structrual lint rules on the data structure +// output result diff --git a/tools/src/Registry.ts b/tools/src/Registry.ts new file mode 100644 index 0000000..cdb49e6 --- /dev/null +++ b/tools/src/Registry.ts @@ -0,0 +1,19 @@ +import type { Rule } from "./rules/Rule"; + +export class Registry { + private rules: Rule[] = [] + register(rule: Rule) { + this.rules.push(rule) + } + run(input: T) { + this.rules.forEach(rule => rule.run(input)) + } + getProblems() { + return this.rules.map(rule => rule.getProblems()) + } + print(){ + this.rules.forEach(rule => rule.print()) + } +} + + diff --git a/tools/src/example-capability.ts b/tools/src/example-capability.ts new file mode 100644 index 0000000..0a6491a --- /dev/null +++ b/tools/src/example-capability.ts @@ -0,0 +1,88 @@ +export const example_capability = `# [Well-Being](https://dora.dev/capabilities/well-being/) +The Well-Being capability focuses on the overall physical, mental, and emotional health of employees. In fact, DORA has found a compelling link between well-being and three workplace factors: deployment pain, rework, and burnout. + +_Deployment pain_ refers to the amount of effort it takes to safely apply changes to live environments. The more "pain" developers experience during deployment, the lower their well-being tends to be. + +_Rework_ is any unplanned work that arises as a result of low-quality software. Rework does NOT include refactoring; that's done as a routine part of the development process. The more rework developers are required to do, the lower their well-being tends to be. + +According to [Dr. Christina Maslach](https://psychology.berkeley.edu/people/christina-maslach), _burnout_ is physical or mental exhaustion that results from one of the following: work overload, lack of control, insufficient rewards, a breakdown of community, the absence of fairness, or value conflicts between an individual and organization. The more an individual feels burnout, the lower their well-being tends to be. + +When employees experience high levels of well-being, better organizational performance and increased retention tend to follow. Below, we'll discuss some ways to achieve high levels of well-being among your team(s). + +## Nuances + +This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. 
Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. + +### Repetitive Work Creates Burnout + +There's a strong link between repetitive or toilsome work and burnout. Monotonous tasks, while not novel or interesting, are still important and need to get done. But it's worth trying to automate those tasks, or at least spread this workload across the team, where effective and practical. That way, mind-numbing work doesn't pile up on select team members. + +### Rework Is Inevitable + +No matter how hard a team tries, there is always going to be rework. The goal shouldn't be to completely eliminate rework through never-ending quality investments. Take a more pragmatic approach. Aim to _reduce_ unreasonable amounts of rework through incremental investments in quality -- this is more likely to yield a strong return in terms of team productivity. + +## Assessment + +To assess how mature your team or organization is in this capability, complete this short exercise. + +Consider the descriptions below and score your team on the Well-Being capability. Generally, score a 1 if you feel employees are overwhelmed and undervalued, a 2 if you feel employees are managing the load and there is a lot of room for improvement, a 3 if you feel employees are finding work-life balance and there is some room for improvement, and a 4 if you feel your employees are thriving in terms of their well-being. + +Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. + +1. **Overwhelmed and Undervalued:** Employees are consistently overwhelmed by work demands, have little control over their work, and feel undervalued and unrewarded. There is a breakdown in community and a lack of fairness in decision-making processes. +2. **Managing the Load:** Teams are coping with work demands, but some employees are still struggling with a lack of control and autonomy, and rewards and recognition are inconsistent. While there are some efforts to build a sense of community, aligning organizational and individual values is still a work in progress. +3. **Finding Balance:** Employees are generally happy and engaged, with a good work-life balance. Teams are making progress in addressing work overload, increasing control and autonomy, and providing sufficient rewards and recognition. There is still room for improvement in building a sense of community and fairness. +4. **Thriving Culture:** Employees are highly engaged, motivated, and happy. There is a strong sense of well-being. Teams consistently deliver high-quality work in a supportive and fair work environment. There is a clear alignment between organizational and individual values, and opportunities for growth and development are present. + +The number you selected represents your overall score for this capability. If you feel like the general well-being of your team fits somewhere in between two scores, it's okay to use a decimal. For example, if you think employees are somewhere between managing their loads and finding a good balance, you would score a 2.5. + +Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. 
An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of Well-Being; you would likely benefit from evaluating your scores in other capabilities. + +## Supporting Practices + +The following is a curated list of supporting practices to consider when looking to improve your team's Well-Being capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. + +### Track Employee Engagement + +Designing and routinely sending out an employee engagement survey that fits your organization's culture can uncover issues affecting employee well-being and organizational performance. These surveys can be great tools for listening, understanding, and fostering a culture of continuous improvement. If this data is widely shared and acted upon, it can show employees that their input can drive meaningful change. See some [helpful guidelines here.](https://hbr.org/2002/02/getting-the-truth-into-workplace-surveys) + +### Host Skip-Level 1:1s + +Skip-level 1:1s create a direct channel for information flow, regardless of one's status in the organization. They provide a safe space for employees to share concerns, ideas, and feedback with their manager's manager or another high-level leader. They also demonstrate to employees that their voices are valued, giving them a sense of empowerment and autonomy. The goal is to foster open communication and build trust across the organization. By listening to employees' concerns, leaders can identify and address potential issues before they escalate, reducing turnover and improving job satisfaction. Leaders can also offer guidance, mentorship, and opportunities for growth, aligning employees' goals with organizational objectives. + +### Implement Employee-Recognition Programs + +Establish structured programs to recognize employees for their contributions and achievements. This could include monthly awards, public acknowledgments during team meetings, a digital "kudos" board, or personalized appreciation notes. Share specific examples of outstanding work or helpful behaviors. By promoting a culture of appreciation and peer-to-peer acknowledgment, team morale and motivation improve significantly. + +### Allocate Time To Have Fun and Build Strong Relationships + +Allocating time for fun and relationship-building fosters trust, collaboration, and a sense of belonging among team members. When employees can share positive experiences, it strengthens psychological safety, boosts creativity, and encourages cross-team connections. These elements not only make the workplace more enjoyable but also encourage retention and productivity by signaling that the organization values its employees' well-being. An organization that balances productivity with meaningful relationships creates an environment where employees thrive. + +### Automate Deployment Scripts + +Develop scripts that automate the entire deployment process, including environment preparation, package deployment, configuration, and post-deployment testing. By scripting these steps, you eliminate manual interventions, reduce the risk of human error, and lessen deployment pain. A repeatable and reliable deployment process can then be triggered with minimal effort. This enhances not only deployment speed and consistency but also employee well-being. 
+ +### Incorporate Anomaly-Detection Tooling + +To avoid having to manually verify systems are working in production after a deployment, incorporate tooling that flags anomalies in your system's various environments. Examples of such flags include: reporting spikes in compute or network resources, reporting new error-level log events, A/B testing two versions of a system with the same traffic, running automated user acceptance tests, and so on. Lower deployment pain equals higher well-being. + +## Adjacent Capabilities + +The following capabilities will be valuable for you and your team to explore, as they are either: + +- Related (they cover similar territory to Well-Being) +- Upstream (they are a pre-requisite for Well-Being) +- Downstream (Well-Being is a pre-requisite for them) + +### [Job Satisfaction](/capabilities/job-satisfaction.md) - Related + +Job satisfaction is closely linked to well-being, as it reflects how content individuals are with their roles and work environment. When employees are satisfied with their jobs, they are more likely to experience higher levels of well-being. Focusing on factors that improve job satisfaction, such as meaningful work and recognition, can boost overall well-being. + +### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) - Related + +A generative organizational culture is characterized by high cooperation, shared risks, and a focus on performance. Employee well-being is supported in such an environment where individuals feel safe to take risks and collaborate. Improving organizational culture directly enhances employee well-being by reducing stressors associated with blame and fear. + +### [Transformational Leadership](/capabilities/transformational-leadership.md) - Upstream + +Transformational leaders motivate team members to exceed expectations by providing a clear vision, support, inspirational communication, intellectual stimulation, and personal recognition. Having such leadership in place can help address many organizational risk factors for burnout, such as lack of control and insufficient rewards. By promoting a positive work environment, transformational leaders enhance well-being within their teams. +` diff --git a/tools/src/rules/Problem.ts b/tools/src/rules/Problem.ts new file mode 100644 index 0000000..0ab48c8 --- /dev/null +++ b/tools/src/rules/Problem.ts @@ -0,0 +1,46 @@ +export type FileLocation = { + col: number + row: number +} + +export type Level = 'silent' | 'warning' | 'error' + +export class Problem { + private id: Ids + private message: string + private level: Level + private fileLocation: FileLocation + + private readonly colors = { + reset: "\x1b[0m", + bold: "\x1b[1m", + dim: "\x1b[2m", + red: "\x1b[31m", + yellow: "\x1b[33m", + cyan: "\x1b[36m", + gray: "\x1b[90m", + }; + + constructor(id: Ids, level: Level, fileLocation: FileLocation, message: string) { + this.id = id + this.message = message + this.level = level + this.fileLocation = fileLocation + } + + print() { + if (this.level === 'silent') return; + + const { red, yellow, cyan, gray, bold, reset } = this.colors; + + const color = this.level === 'error' ? 
red : yellow; + const label = this.level.toUpperCase(); + + console.log( + `${bold}${color}${label}${reset} ${bold}${this.id}${reset}: ${this.message}` + ); + console.log( + ` ${gray}at${reset} ${cyan}${this.fileLocation.row}${reset}:${cyan}${this.fileLocation.col}${reset}\n` + ); + } +} diff --git a/tools/src/rules/Rule.ts b/tools/src/rules/Rule.ts new file mode 100644 index 0000000..036b20f --- /dev/null +++ b/tools/src/rules/Rule.ts @@ -0,0 +1,21 @@ +import { type FileLocation, type Level, Problem } from "./Problem" + +type RuleConfig = Record + +export abstract class Rule { + private problems: Problem[] = [] + private config: RuleConfig + constructor(config: RuleConfig) { + this.config = config + } + abstract run(subject: In): void; + protected report(id: Ids, message: string, fileLocation: FileLocation) { + this.problems.push(new Problem(id, this.config[id], fileLocation, message)) + } + getProblems(): Problem[] { + return this.problems + } + print() { + this.problems.forEach(p => p.print()) + } +} diff --git a/tools/src/rules/raw/NewLineAfterHeadings.ts b/tools/src/rules/raw/NewLineAfterHeadings.ts new file mode 100644 index 0000000..4202777 --- /dev/null +++ b/tools/src/rules/raw/NewLineAfterHeadings.ts @@ -0,0 +1,15 @@ +import { Rule } from "../Rule"; + +export class NewLineAfterHeadings extends Rule { + override run(subject: string) { + const lines = subject.split('\n') + for (let i = 0; i < lines.length; i++) { + if (lines[i]?.charAt(0) === '#' && i+1 < lines.length && lines[i+1] !== '') { + this.report('new-line-after-headings', 'You must have a new line after headings.', { + row: i + 2, + col: 1, + }) + } + } + } +} diff --git a/tools/src/rules/raw/NewlineAfterHeadings.test.ts b/tools/src/rules/raw/NewlineAfterHeadings.test.ts new file mode 100644 index 0000000..13dd48a --- /dev/null +++ b/tools/src/rules/raw/NewlineAfterHeadings.test.ts @@ -0,0 +1,29 @@ +import { describe, it, expect } from 'bun:test' +import { NewLineAfterHeadings } from './NewlineAfterHeadings' + +const mkRule = () => new NewLineAfterHeadings({'my-rule': 'silent'}) + +describe(NewLineAfterHeadings.name, () => { + it('should fail if there is no newline after a heading', () => { + const rule = mkRule() + rule.run(` +# Some Heading +This is not correct. +`) + const problems = rule.getProblems() + expect(problems).not.toBeEmpty() + }) + it('should succeed if there is a newline after a heading', () => { + const rule = mkRule() + rule.run(`# Some Heading + +This is not correct. 
+`) + expect(rule.getProblems()).toBeEmpty() + }) + it('should not report when heading is last line (This is a different lint error)', () => { + const rule = mkRule() + rule.run(`# Some Heading`) + expect(rule.getProblems()).toBeEmpty() + }) +}) diff --git a/tools/src/template-capability.ts b/tools/src/template-capability.ts new file mode 100644 index 0000000..0a5d28f --- /dev/null +++ b/tools/src/template-capability.ts @@ -0,0 +1,108 @@ + +type TitleDescription = { + title: string + description: string +} + +type Practice = { + title: string + description: string + url?: string +} + +type AdjacentCapabilities = { + title: string + description: string + url: string + relationship: 'Related' | 'Upstream' | 'Downstream' +} + +type AssessmentItems = { + minimal: TitleDescription + basic: TitleDescription + good: TitleDescription + excelent: TitleDescription +} + +type Capability = { + title: string + intro: string + doraUrl: string + nuances: TitleDescription[] + assessment: AssessmentItems + practices: Practice[] + adjacentCapabilities: AdjacentCapabilities[] +} + +const displayAdjacentCapabilities = ({title, description, url, relationship}: AdjacentCapabilities) => `### [${title}](${url}) - ${relationship} + +${description}` + +const displayAssessemntItem = ({title, description}: TitleDescription) => + `**${title}:** ${description}` + +const displayPractice = ({ title, description, url: link }: Practice) => { + if (link) { + return `### ${title} + +${description}` + } + return +} + +export function template_thingy ({ + title, + intro, + doraUrl, + nuances, + assessment, + practices, + adjacentCapabilities +}: Capability) { + const assessmentIntro = `To assess how mature your team or organization is in this capability, complete this short exercise. + +Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score.` + const assessmentOutro = `The number you selected represents your overall score for this capability. If you feel like your company fits somewhere in between two scores, it's okay to use a decimal. Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient. Instead you would likely benefit from evaluating your scores in other capabilities.` + + + return `# [${title}](${doraUrl}) + +${intro} + +## Nuances + +${nuances.map(nuance => `${nuance.title} + +${nuance.description} + +`)} + +## Assessment + +${assessmentIntro} + +1. ${displayAssessemntItem(assessment.minimal)} +2. ${displayAssessemntItem(assessment.basic)} +3. ${displayAssessemntItem(assessment.good)} +4. ${displayAssessemntItem(assessment.excelent)} + +${assessmentOutro} + +## Supporting Practices + +The following is a curated list of supporting practices to consider when looking to improve your team's ${title} capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. 
+ +${practices.map(displayPractice)} + +## Adjacent Capabilities + +The following capabilities will be valuable for you and your team to explore, as they are either: + +- Related (they cover similar territory to ${title}) +- Upstream (they are a pre-requisite for ${title}) +- Downstream (${title} is a pre-requisite for them) + +${adjacentCapabilities.map(displayAdjacentCapabilities)} +` +} + From 3f1214f4453cc4562ebc814eb405eaaae1498fbd Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 21 Jan 2026 16:07:45 -0700 Subject: [PATCH 082/131] add capabilities --- tools/README.md | 17 ++++----- tools/index.ts | 40 ++++++++++++++------- tools/src/Registry.ts | 4 +-- tools/src/rules/Problem.ts | 9 +++-- tools/src/rules/Rule.ts | 7 ++-- tools/src/rules/raw/NewLineAfterHeadings.ts | 8 ++--- tools/src/types.d.ts | 5 +++ tools/types.d.ts | 11 ++++++ 8 files changed, 68 insertions(+), 33 deletions(-) create mode 100644 tools/src/types.d.ts create mode 100644 tools/types.d.ts diff --git a/tools/README.md b/tools/README.md index 0c2c25b..40e1d1f 100644 --- a/tools/README.md +++ b/tools/README.md @@ -1,15 +1,12 @@ -# tools +# Open Practice Repository Tooling -To install dependencies: +Roadmap of features that we will want to implement starting with very simple linting. -```bash -bun install -``` +## Linting -To run: +Features: +- [x] lint all capabilities -```bash -bun run index.ts -``` +Rules: +- [x] New Line after all headings -This project was created using `bun init` in bun v1.3.4. [Bun](https://bun.com) is a fast all-in-one JavaScript runtime. diff --git a/tools/index.ts b/tools/index.ts index aa43442..db80615 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -1,24 +1,40 @@ -import { example_capability } from "./src/example-capability"; +import { join } from 'node:path' import { Registry } from "./src/Registry"; -import { NewLineAfterHeadings } from "./src/rules/raw/NewlineAfterHeadings"; +import { NewLineAfterHeadings } from "./src/rules/raw/NewLineAfterHeadings"; +import { readdirSync } from 'node:fs'; + +global.paths = { + capabilities: join(import.meta.dir, '..', 'capabilities'), + practices: join(import.meta.dir, '..', 'practices'), + resources: join(import.meta.dir, '..', 'resources'), + templates: join(import.meta.dir, '..', 'templates'), +} + +async function getAllFrom(folder: string) { + return await Promise.all(readdirSync(folder) + .map(async (file) => ({ + filename: join(folder, file), + content: await Bun.file(join(folder, file)).text() + }))) +} type ExitCode = 0 | 1 -function runRawLintRules(input: string): ExitCode { - const registry = new Registry() - registry.register(new NewLineAfterHeadings({ 'new-line-after-headings': 'error' })) +function runRawLintRules(input: LintableFile): ExitCode { + const registry = new Registry() + registry.register(new NewLineAfterHeadings({ + 'new-line-after-headings': 'error' + })) registry.run(input) registry.print() - const problems = registry.getProblems() - - if (problems.length !== 0) { - return 1 - } - return 0 + if (registry.isssuesWereFound()) return 0 + return 1 } -process.exit(runRawLintRules(example_capability)) +process.exit((await getAllFrom(global.paths.capabilities)) + .map(lf => runRawLintRules(lf)) + .reduce((_, c) => c === 1 ? 
1 : 0)) // run raw lint rules on the text // parse practice into practice data structure diff --git a/tools/src/Registry.ts b/tools/src/Registry.ts index cdb49e6..297a4c2 100644 --- a/tools/src/Registry.ts +++ b/tools/src/Registry.ts @@ -8,8 +8,8 @@ export class Registry { run(input: T) { this.rules.forEach(rule => rule.run(input)) } - getProblems() { - return this.rules.map(rule => rule.getProblems()) + isssuesWereFound() { + return this.rules.map(r => r.hasProblems()).reduce((a, c) => c === true ? true : false) } print(){ this.rules.forEach(rule => rule.print()) diff --git a/tools/src/rules/Problem.ts b/tools/src/rules/Problem.ts index 0ab48c8..9a642f7 100644 --- a/tools/src/rules/Problem.ts +++ b/tools/src/rules/Problem.ts @@ -7,6 +7,7 @@ export type Level = 'silent' | 'warning' | 'error' export class Problem { private id: Ids + private filename: string private message: string private level: Level private fileLocation: FileLocation @@ -16,13 +17,15 @@ export class Problem { bold: "\x1b[1m", dim: "\x1b[2m", red: "\x1b[31m", + purple: "\x1b[0;35m", yellow: "\x1b[33m", cyan: "\x1b[36m", gray: "\x1b[90m", }; - constructor(id: Ids, level: Level, fileLocation: FileLocation, message: string) { + constructor(id: Ids, level: Level, filename: string, fileLocation: FileLocation, message: string) { this.id = id + this.filename = filename this.message = message this.level = level this.fileLocation = fileLocation @@ -31,7 +34,7 @@ export class Problem { print() { if (this.level === 'silent') return; - const { red, yellow, cyan, gray, bold, reset } = this.colors; + const { red, yellow, cyan, gray, bold, reset, purple } = this.colors; const color = this.level === 'error' ? red : yellow; const label = this.level.toUpperCase(); @@ -40,7 +43,7 @@ export class Problem { `${bold}${color}${label}${reset} ${bold}${this.id}${reset}: ${this.message}` ); console.log( - ` ${gray}at${reset} ${cyan}${this.fileLocation.row}${reset}:${cyan}${this.fileLocation.col}${reset}\n` + ` ${gray}in${reset} ${purple}"${this.filename}"${reset}\n ${gray}at${reset} ${cyan}${this.fileLocation.row}${reset}:${cyan}${this.fileLocation.col}${reset}\n` ); } } diff --git a/tools/src/rules/Rule.ts b/tools/src/rules/Rule.ts index 036b20f..7710d6d 100644 --- a/tools/src/rules/Rule.ts +++ b/tools/src/rules/Rule.ts @@ -9,12 +9,15 @@ export abstract class Rule { this.config = config } abstract run(subject: In): void; - protected report(id: Ids, message: string, fileLocation: FileLocation) { - this.problems.push(new Problem(id, this.config[id], fileLocation, message)) + protected report(filename: string, id: Ids, message: string, fileLocation: FileLocation) { + this.problems.push(new Problem(id, this.config[id], filename, fileLocation, message)) } getProblems(): Problem[] { return this.problems } + hasProblems() { + return this.problems.length !== 0 + } print() { this.problems.forEach(p => p.print()) } diff --git a/tools/src/rules/raw/NewLineAfterHeadings.ts b/tools/src/rules/raw/NewLineAfterHeadings.ts index 4202777..b7ac30e 100644 --- a/tools/src/rules/raw/NewLineAfterHeadings.ts +++ b/tools/src/rules/raw/NewLineAfterHeadings.ts @@ -1,11 +1,11 @@ import { Rule } from "../Rule"; -export class NewLineAfterHeadings extends Rule { - override run(subject: string) { - const lines = subject.split('\n') +export class NewLineAfterHeadings extends Rule { + override run({ filename, content }: LintableFile) { + const lines = content.split('\n') for (let i = 0; i < lines.length; i++) { if (lines[i]?.charAt(0) === '#' && i+1 < lines.length && lines[i+1] 
!== '') { - this.report('new-line-after-headings', 'You must have a new line after headings.', { + this.report(filename, 'new-line-after-headings', 'You must have a new line after headings.', { row: i + 2, col: 1, }) diff --git a/tools/src/types.d.ts b/tools/src/types.d.ts new file mode 100644 index 0000000..2e01dd0 --- /dev/null +++ b/tools/src/types.d.ts @@ -0,0 +1,5 @@ + +type LintableFile = { + filename: string + content: string +} diff --git a/tools/types.d.ts b/tools/types.d.ts new file mode 100644 index 0000000..000cc9b --- /dev/null +++ b/tools/types.d.ts @@ -0,0 +1,11 @@ +declare global { + var paths: { + capabilities: string + practices: string + resources: string + templates: string + }; +} + + +export {}; From 05c1081bc44833e0ea82f895e22db20170b52782 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 21 Jan 2026 18:21:08 -0700 Subject: [PATCH 083/131] clean up and organize linter --- tools/index.ts | 46 +++------- .../rules/raw/NewLineAfterHeadings.ts | 2 +- .../rules/raw/NewlineAfterHeadings.test.ts | 2 +- tools/src/{rules => }/Problem.ts | 0 tools/src/Registry.ts | 2 +- tools/src/Repository.ts | 19 ++++ tools/src/{rules => }/Rule.ts | 12 ++- tools/src/Runner.ts | 27 ++++++ tools/src/example-capability.ts | 88 ------------------- 9 files changed, 70 insertions(+), 128 deletions(-) rename tools/{src => }/rules/raw/NewLineAfterHeadings.ts (93%) rename tools/{src => }/rules/raw/NewlineAfterHeadings.test.ts (93%) rename tools/src/{rules => }/Problem.ts (100%) create mode 100644 tools/src/Repository.ts rename tools/src/{rules => }/Rule.ts (62%) create mode 100644 tools/src/Runner.ts delete mode 100644 tools/src/example-capability.ts diff --git a/tools/index.ts b/tools/index.ts index db80615..d724619 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -1,41 +1,21 @@ -import { join } from 'node:path' -import { Registry } from "./src/Registry"; -import { NewLineAfterHeadings } from "./src/rules/raw/NewLineAfterHeadings"; -import { readdirSync } from 'node:fs'; +import { NewLineAfterHeadings } from "./rules/raw/NewLineAfterHeadings"; +import { Runner } from './src/Runner'; +import { OpenPracticesRepository } from './src/Repository'; -global.paths = { - capabilities: join(import.meta.dir, '..', 'capabilities'), - practices: join(import.meta.dir, '..', 'practices'), - resources: join(import.meta.dir, '..', 'resources'), - templates: join(import.meta.dir, '..', 'templates'), -} +const runner = new Runner(await OpenPracticesRepository.getCapabilities(), [ + new NewLineAfterHeadings() +]) -async function getAllFrom(folder: string) { - return await Promise.all(readdirSync(folder) - .map(async (file) => ({ - filename: join(folder, file), - content: await Bun.file(join(folder, file)).text() - }))) -} - -type ExitCode = 0 | 1 +runner.run() -function runRawLintRules(input: LintableFile): ExitCode { - const registry = new Registry() - registry.register(new NewLineAfterHeadings({ - 'new-line-after-headings': 'error' - })) - registry.run(input) - registry.print() - - if (registry.isssuesWereFound()) return 0 - return 1 +if (runner.issuesWereFound()) { + runner.print() + process.exit(1) +} else { + console.log("No issues found") + process.exit(0) } -process.exit((await getAllFrom(global.paths.capabilities)) - .map(lf => runRawLintRules(lf)) - .reduce((_, c) => c === 1 ? 
1 : 0)) - // run raw lint rules on the text // parse practice into practice data structure // run structrual lint rules on the data structure diff --git a/tools/src/rules/raw/NewLineAfterHeadings.ts b/tools/rules/raw/NewLineAfterHeadings.ts similarity index 93% rename from tools/src/rules/raw/NewLineAfterHeadings.ts rename to tools/rules/raw/NewLineAfterHeadings.ts index b7ac30e..5ffc9d0 100644 --- a/tools/src/rules/raw/NewLineAfterHeadings.ts +++ b/tools/rules/raw/NewLineAfterHeadings.ts @@ -1,4 +1,4 @@ -import { Rule } from "../Rule"; +import { Rule } from "../../src/Rule"; export class NewLineAfterHeadings extends Rule { override run({ filename, content }: LintableFile) { diff --git a/tools/src/rules/raw/NewlineAfterHeadings.test.ts b/tools/rules/raw/NewlineAfterHeadings.test.ts similarity index 93% rename from tools/src/rules/raw/NewlineAfterHeadings.test.ts rename to tools/rules/raw/NewlineAfterHeadings.test.ts index 13dd48a..ecbe132 100644 --- a/tools/src/rules/raw/NewlineAfterHeadings.test.ts +++ b/tools/rules/raw/NewlineAfterHeadings.test.ts @@ -1,5 +1,5 @@ import { describe, it, expect } from 'bun:test' -import { NewLineAfterHeadings } from './NewlineAfterHeadings' +import { NewLineAfterHeadings } from './NewLineAfterHeadings' const mkRule = () => new NewLineAfterHeadings({'my-rule': 'silent'}) diff --git a/tools/src/rules/Problem.ts b/tools/src/Problem.ts similarity index 100% rename from tools/src/rules/Problem.ts rename to tools/src/Problem.ts diff --git a/tools/src/Registry.ts b/tools/src/Registry.ts index 297a4c2..0d7668b 100644 --- a/tools/src/Registry.ts +++ b/tools/src/Registry.ts @@ -1,4 +1,4 @@ -import type { Rule } from "./rules/Rule"; +import type { Rule } from "./Rule"; export class Registry { private rules: Rule[] = [] diff --git a/tools/src/Repository.ts b/tools/src/Repository.ts new file mode 100644 index 0000000..b47db4c --- /dev/null +++ b/tools/src/Repository.ts @@ -0,0 +1,19 @@ +import { readdirSync } from 'node:fs' +import { join } from 'node:path' + +const ROOT = join(Bun.main, '..','..') + +async function getAllFrom(folder: string): Promise { + return await Promise.all(readdirSync(folder) + .map(async (file) => ({ + filename: join(folder, file), + content: await Bun.file(join(folder, file)).text() + }))) +} + +export class OpenPracticesRepository { + static async getCapabilities() { + return getAllFrom(join(ROOT, 'capabilities')) + } +} + diff --git a/tools/src/rules/Rule.ts b/tools/src/Rule.ts similarity index 62% rename from tools/src/rules/Rule.ts rename to tools/src/Rule.ts index 7710d6d..170bcf5 100644 --- a/tools/src/rules/Rule.ts +++ b/tools/src/Rule.ts @@ -4,13 +4,17 @@ type RuleConfig = Record export abstract class Rule { private problems: Problem[] = [] - private config: RuleConfig - constructor(config: RuleConfig) { - this.config = config + private config: RuleConfig | null + constructor(config?: RuleConfig) { + if (config === undefined) { + this.config = null + } else { + this.config = config + } } abstract run(subject: In): void; protected report(filename: string, id: Ids, message: string, fileLocation: FileLocation) { - this.problems.push(new Problem(id, this.config[id], filename, fileLocation, message)) + this.problems.push(new Problem(id, this.config === null ? 
'error' : this.config[id], filename, fileLocation, message)) } getProblems(): Problem[] { return this.problems diff --git a/tools/src/Runner.ts b/tools/src/Runner.ts new file mode 100644 index 0000000..cde2d6e --- /dev/null +++ b/tools/src/Runner.ts @@ -0,0 +1,27 @@ +import { Registry } from "./Registry"; +import type { Rule } from "./Rule"; + +export class Runner { + private content: T[] + private registry: Registry + + constructor(content: T[], rules: Rule[]) { + this.content = content + this.registry = new Registry() + for (const rule of rules) { + this.registry.register(rule) + } + } + + run() { + for (const item of this.content) { + this.registry.run(item) + } + } + print() { + this.registry.print() + } + issuesWereFound() { + return this.registry.isssuesWereFound() + } +} diff --git a/tools/src/example-capability.ts b/tools/src/example-capability.ts deleted file mode 100644 index 0a6491a..0000000 --- a/tools/src/example-capability.ts +++ /dev/null @@ -1,88 +0,0 @@ -export const example_capability = `# [Well-Being](https://dora.dev/capabilities/well-being/) -The Well-Being capability focuses on the overall physical, mental, and emotional health of employees. In fact, DORA has found a compelling link between well-being and three workplace factors: deployment pain, rework, and burnout. - -_Deployment pain_ refers to the amount of effort it takes to safely apply changes to live environments. The more "pain" developers experience during deployment, the lower their well-being tends to be. - -_Rework_ is any unplanned work that arises as a result of low-quality software. Rework does NOT include refactoring; that's done as a routine part of the development process. The more rework developers are required to do, the lower their well-being tends to be. - -According to [Dr. Christina Maslach](https://psychology.berkeley.edu/people/christina-maslach), _burnout_ is physical or mental exhaustion that results from one of the following: work overload, lack of control, insufficient rewards, a breakdown of community, the absence of fairness, or value conflicts between an individual and organization. The more an individual feels burnout, the lower their well-being tends to be. - -When employees experience high levels of well-being, better organizational performance and increased retention tend to follow. Below, we'll discuss some ways to achieve high levels of well-being among your team(s). - -## Nuances - -This section outlines common pitfalls, challenges, or limitations teams commonly encounter when applying this capability. The goal here is not to discourage you. Rather, the goal is to arm you with the appropriate context so that you can make an informed decision about when and how to implement the capability with your teams. - -### Repetitive Work Creates Burnout - -There's a strong link between repetitive or toilsome work and burnout. Monotonous tasks, while not novel or interesting, are still important and need to get done. But it's worth trying to automate those tasks, or at least spread this workload across the team, where effective and practical. That way, mind-numbing work doesn't pile up on select team members. - -### Rework Is Inevitable - -No matter how hard a team tries, there is always going to be rework. The goal shouldn't be to completely eliminate rework through never-ending quality investments. Take a more pragmatic approach. 
Aim to _reduce_ unreasonable amounts of rework through incremental investments in quality -- this is more likely to yield a strong return in terms of team productivity. - -## Assessment - -To assess how mature your team or organization is in this capability, complete this short exercise. - -Consider the descriptions below and score your team on the Well-Being capability. Generally, score a 1 if you feel employees are overwhelmed and undervalued, a 2 if you feel employees are managing the load and there is a lot of room for improvement, a 3 if you feel employees are finding work-life balance and there is some room for improvement, and a 4 if you feel your employees are thriving in terms of their well-being. - -Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score. - -1. **Overwhelmed and Undervalued:** Employees are consistently overwhelmed by work demands, have little control over their work, and feel undervalued and unrewarded. There is a breakdown in community and a lack of fairness in decision-making processes. -2. **Managing the Load:** Teams are coping with work demands, but some employees are still struggling with a lack of control and autonomy, and rewards and recognition are inconsistent. While there are some efforts to build a sense of community, aligning organizational and individual values is still a work in progress. -3. **Finding Balance:** Employees are generally happy and engaged, with a good work-life balance. Teams are making progress in addressing work overload, increasing control and autonomy, and providing sufficient rewards and recognition. There is still room for improvement in building a sense of community and fairness. -4. **Thriving Culture:** Employees are highly engaged, motivated, and happy. There is a strong sense of well-being. Teams consistently deliver high-quality work in a supportive and fair work environment. There is a clear alignment between organizational and individual values, and opportunities for growth and development are present. - -The number you selected represents your overall score for this capability. If you feel like the general well-being of your team fits somewhere in between two scores, it's okay to use a decimal. For example, if you think employees are somewhere between managing their loads and finding a good balance, you would score a 2.5. - -Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient, in the area of Well-Being; you would likely benefit from evaluating your scores in other capabilities. - -## Supporting Practices - -The following is a curated list of supporting practices to consider when looking to improve your team's Well-Being capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. - -### Track Employee Engagement - -Designing and routinely sending out an employee engagement survey that fits your organization's culture can uncover issues affecting employee well-being and organizational performance. These surveys can be great tools for listening, understanding, and fostering a culture of continuous improvement. 
If this data is widely shared and acted upon, it can show employees that their input can drive meaningful change. See some [helpful guidelines here.](https://hbr.org/2002/02/getting-the-truth-into-workplace-surveys) - -### Host Skip-Level 1:1s - -Skip-level 1:1s create a direct channel for information flow, regardless of one's status in the organization. They provide a safe space for employees to share concerns, ideas, and feedback with their manager's manager or another high-level leader. They also demonstrate to employees that their voices are valued, giving them a sense of empowerment and autonomy. The goal is to foster open communication and build trust across the organization. By listening to employees' concerns, leaders can identify and address potential issues before they escalate, reducing turnover and improving job satisfaction. Leaders can also offer guidance, mentorship, and opportunities for growth, aligning employees' goals with organizational objectives. - -### Implement Employee-Recognition Programs - -Establish structured programs to recognize employees for their contributions and achievements. This could include monthly awards, public acknowledgments during team meetings, a digital "kudos" board, or personalized appreciation notes. Share specific examples of outstanding work or helpful behaviors. By promoting a culture of appreciation and peer-to-peer acknowledgment, team morale and motivation improve significantly. - -### Allocate Time To Have Fun and Build Strong Relationships - -Allocating time for fun and relationship-building fosters trust, collaboration, and a sense of belonging among team members. When employees can share positive experiences, it strengthens psychological safety, boosts creativity, and encourages cross-team connections. These elements not only make the workplace more enjoyable but also encourage retention and productivity by signaling that the organization values its employees' well-being. An organization that balances productivity with meaningful relationships creates an environment where employees thrive. - -### Automate Deployment Scripts - -Develop scripts that automate the entire deployment process, including environment preparation, package deployment, configuration, and post-deployment testing. By scripting these steps, you eliminate manual interventions, reduce the risk of human error, and lessen deployment pain. A repeatable and reliable deployment process can then be triggered with minimal effort. This enhances not only deployment speed and consistency but also employee well-being. - -### Incorporate Anomaly-Detection Tooling - -To avoid having to manually verify systems are working in production after a deployment, incorporate tooling that flags anomalies in your system's various environments. Examples of such flags include: reporting spikes in compute or network resources, reporting new error-level log events, A/B testing two versions of a system with the same traffic, running automated user acceptance tests, and so on. Lower deployment pain equals higher well-being. 
- -## Adjacent Capabilities - -The following capabilities will be valuable for you and your team to explore, as they are either: - -- Related (they cover similar territory to Well-Being) -- Upstream (they are a pre-requisite for Well-Being) -- Downstream (Well-Being is a pre-requisite for them) - -### [Job Satisfaction](/capabilities/job-satisfaction.md) - Related - -Job satisfaction is closely linked to well-being, as it reflects how content individuals are with their roles and work environment. When employees are satisfied with their jobs, they are more likely to experience higher levels of well-being. Focusing on factors that improve job satisfaction, such as meaningful work and recognition, can boost overall well-being. - -### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) - Related - -A generative organizational culture is characterized by high cooperation, shared risks, and a focus on performance. Employee well-being is supported in such an environment where individuals feel safe to take risks and collaborate. Improving organizational culture directly enhances employee well-being by reducing stressors associated with blame and fear. - -### [Transformational Leadership](/capabilities/transformational-leadership.md) - Upstream - -Transformational leaders motivate team members to exceed expectations by providing a clear vision, support, inspirational communication, intellectual stimulation, and personal recognition. Having such leadership in place can help address many organizational risk factors for burnout, such as lack of control and insufficient rewards. By promoting a positive work environment, transformational leaders enhance well-being within their teams. -` From 41c8de7a8a1a8c8be284379288f520c2cc949b8a Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Thu, 22 Jan 2026 13:19:59 -0700 Subject: [PATCH 084/131] add lint rule: no-trailing-white-space --- tools/index.ts | 4 ++- tools/rules/raw/NewlineAfterHeadings.test.ts | 12 ++++----- tools/rules/raw/NoTrailingWhitespace.test.ts | 28 ++++++++++++++++++++ tools/rules/raw/NoTrailingWhitespace.ts | 15 +++++++++++ tools/src/Problem.ts | 2 ++ 5 files changed, 54 insertions(+), 7 deletions(-) create mode 100644 tools/rules/raw/NoTrailingWhitespace.test.ts create mode 100644 tools/rules/raw/NoTrailingWhitespace.ts diff --git a/tools/index.ts b/tools/index.ts index d724619..5344ce4 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -1,9 +1,11 @@ import { NewLineAfterHeadings } from "./rules/raw/NewLineAfterHeadings"; import { Runner } from './src/Runner'; import { OpenPracticesRepository } from './src/Repository'; +import { NoTrailingWhitespace } from "./rules/raw/NoTrailingWhitespace"; const runner = new Runner(await OpenPracticesRepository.getCapabilities(), [ - new NewLineAfterHeadings() + new NewLineAfterHeadings(), + new NoTrailingWhitespace() ]) runner.run() diff --git a/tools/rules/raw/NewlineAfterHeadings.test.ts b/tools/rules/raw/NewlineAfterHeadings.test.ts index ecbe132..538bca2 100644 --- a/tools/rules/raw/NewlineAfterHeadings.test.ts +++ b/tools/rules/raw/NewlineAfterHeadings.test.ts @@ -1,29 +1,29 @@ import { describe, it, expect } from 'bun:test' import { NewLineAfterHeadings } from './NewLineAfterHeadings' -const mkRule = () => new NewLineAfterHeadings({'my-rule': 'silent'}) +const mkRule = () => new NewLineAfterHeadings({'new-line-after-headings': 'silent'}) describe(NewLineAfterHeadings.name, () => { it('should fail if there is no newline after a heading', () => { const rule = 
mkRule() - rule.run(` + rule.run({filename:"mock-thing.md", content: ` # Some Heading This is not correct. -`) +`}) const problems = rule.getProblems() expect(problems).not.toBeEmpty() }) it('should succeed if there is a newline after a heading', () => { const rule = mkRule() - rule.run(`# Some Heading + rule.run({filename:"mock-thing.md", content: `# Some Heading This is not correct. -`) +`}) expect(rule.getProblems()).toBeEmpty() }) it('should not report when heading is last line (This is a different lint error)', () => { const rule = mkRule() - rule.run(`# Some Heading`) + rule.run({ filename:"mock-thing.md", content: `# Some Heading` }) expect(rule.getProblems()).toBeEmpty() }) }) diff --git a/tools/rules/raw/NoTrailingWhitespace.test.ts b/tools/rules/raw/NoTrailingWhitespace.test.ts new file mode 100644 index 0000000..1b00861 --- /dev/null +++ b/tools/rules/raw/NoTrailingWhitespace.test.ts @@ -0,0 +1,28 @@ +import { describe, it, expect } from 'bun:test' +import { NoTrailingWhitespace } from './NoTrailingWhitespace' + +const mkRule = () => new NoTrailingWhitespace({'no-trailing-white-space': 'silent'}) + +describe(NoTrailingWhitespace.name, () => { + it('should fail when lines have trailing whitespace', () => { + const rule = mkRule() + const content = ` +# Some Heading +This is not correct. +` + rule.run({ filename: "mock-file.md", content }) + const problems = rule.getProblems() + expect(problems).not.toBeEmpty() + expect(problems[0]?.getFileLocation().row).toEqual(2) + expect(problems[0]?.getFileLocation().col).toEqual(15) + }) + it('should not fail when no lines have trailing white space', () => { + const rule = mkRule() + const content = `# Some Heading +This is not correct. +` + rule.run({ filename: "mock-file.md", content }) + const problems = rule.getProblems() + expect(problems).toBeEmpty() + }) +}) diff --git a/tools/rules/raw/NoTrailingWhitespace.ts b/tools/rules/raw/NoTrailingWhitespace.ts new file mode 100644 index 0000000..9e764ea --- /dev/null +++ b/tools/rules/raw/NoTrailingWhitespace.ts @@ -0,0 +1,15 @@ +import { Rule } from "../../src/Rule"; + +export class NoTrailingWhitespace extends Rule { + override run({ filename, content }: LintableFile) { + const lines = content.split('\n') + for (const [index, line] of lines.entries()) { + if (line !== line.trim()) { + this.report(filename, 'no-trailing-white-space', 'Trailing whitespace is not allowed', { + row: index + 1, + col: line.trim().length+1, + }) + } + } + } +} diff --git a/tools/src/Problem.ts b/tools/src/Problem.ts index 9a642f7..db17a96 100644 --- a/tools/src/Problem.ts +++ b/tools/src/Problem.ts @@ -31,6 +31,8 @@ export class Problem { this.fileLocation = fileLocation } + getFileLocation = () => this.fileLocation + print() { if (this.level === 'silent') return; From 4fed9e3f0a52136d88307e99816c3f6e9ac482fa Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Thu, 22 Jan 2026 13:22:26 -0700 Subject: [PATCH 085/131] fix no-trailing-whitespace-rule --- tools/rules/raw/NoTrailingWhitespace.test.ts | 2 ++ tools/rules/raw/NoTrailingWhitespace.ts | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/rules/raw/NoTrailingWhitespace.test.ts b/tools/rules/raw/NoTrailingWhitespace.test.ts index 1b00861..7f4049a 100644 --- a/tools/rules/raw/NoTrailingWhitespace.test.ts +++ b/tools/rules/raw/NoTrailingWhitespace.test.ts @@ -20,6 +20,8 @@ This is not correct. const rule = mkRule() const content = `# Some Heading This is not correct. 
+- cool + - beans ` rule.run({ filename: "mock-file.md", content }) const problems = rule.getProblems() diff --git a/tools/rules/raw/NoTrailingWhitespace.ts b/tools/rules/raw/NoTrailingWhitespace.ts index 9e764ea..c2b2631 100644 --- a/tools/rules/raw/NoTrailingWhitespace.ts +++ b/tools/rules/raw/NoTrailingWhitespace.ts @@ -4,10 +4,10 @@ export class NoTrailingWhitespace extends Rule Date: Thu, 22 Jan 2026 16:27:04 -0700 Subject: [PATCH 086/131] rename lint --- tools/index.ts | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/tools/index.ts b/tools/index.ts index 5344ce4..77c7985 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -3,22 +3,17 @@ import { Runner } from './src/Runner'; import { OpenPracticesRepository } from './src/Repository'; import { NoTrailingWhitespace } from "./rules/raw/NoTrailingWhitespace"; -const runner = new Runner(await OpenPracticesRepository.getCapabilities(), [ +const lintableFileRunner = new Runner(await OpenPracticesRepository.getCapabilities(), [ new NewLineAfterHeadings(), new NoTrailingWhitespace() ]) -runner.run() +lintableFileRunner.run() -if (runner.issuesWereFound()) { - runner.print() +if (lintableFileRunner.issuesWereFound()) { + lintableFileRunner.print() process.exit(1) } else { console.log("No issues found") process.exit(0) } - -// run raw lint rules on the text -// parse practice into practice data structure -// run structrual lint rules on the data structure -// output result From 3a162167455993306339c236f1e7591b63efc6d7 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 11:51:18 -0700 Subject: [PATCH 087/131] added remark runner --- tools/bun.lock | 171 ++++++++++++++++++++++++++++++ tools/index.ts | 19 +++- tools/package.json | 13 +++ tools/rules/remark/RemarkRules.ts | 30 ++++++ tools/src/Runner.ts | 2 +- 5 files changed, 231 insertions(+), 4 deletions(-) create mode 100644 tools/rules/remark/RemarkRules.ts diff --git a/tools/bun.lock b/tools/bun.lock index c1908df..584ca6d 100644 --- a/tools/bun.lock +++ b/tools/bun.lock @@ -4,6 +4,19 @@ "workspaces": { "": { "name": "tools", + "dependencies": { + "remark-lint": "^10.0.1", + "remark-lint-checkbox-content-indent": "^5.0.1", + "remark-lint-final-newline": "^3.0.1", + "remark-lint-no-html": "^4.0.1", + "remark-parse": "^11.0.0", + "remark-stringify": "^11.0.0", + "to-vfile": "^8.0.0", + "unified": "^11.0.5", + "vfile": "^6.0.3", + "vfile-reporter": "^8.1.1", + "vfile-reporter-json": "^4.0.0", + }, "devDependencies": { "@types/bun": "latest", }, @@ -15,12 +28,170 @@ "packages": { "@types/bun": ["@types/bun@1.3.6", "", { "dependencies": { "bun-types": "1.3.6" } }, "sha512-uWCv6FO/8LcpREhenN1d1b6fcspAB+cefwD7uti8C8VffIv0Um08TKMn98FynpTiU38+y2dUO55T11NgDt8VAA=="], + "@types/debug": ["@types/debug@4.1.12", "", { "dependencies": { "@types/ms": "*" } }, "sha512-vIChWdVG3LG1SMxEvI/AK+FWJthlrqlTu7fbrlywTkkaONwk/UAGaULXRlf8vkzFBLVm0zkMdCquhL5aOjhXPQ=="], + + "@types/estree": ["@types/estree@1.0.8", "", {}, "sha512-dWHzHa2WqEXI/O1E9OjrocMTKJl2mSrEolh1Iomrv6U+JuNwaHXsXx9bLu5gG7BUWFIN0skIQJQ/L1rIex4X6w=="], + + "@types/estree-jsx": ["@types/estree-jsx@1.0.5", "", { "dependencies": { "@types/estree": "*" } }, "sha512-52CcUVNFyfb1A2ALocQw/Dd1BQFNmSdkuC3BkZ6iqhdMfQz7JWOFRuJFloOzjk+6WijU56m9oKXFAXc7o3Towg=="], + + "@types/hast": ["@types/hast@3.0.4", "", { "dependencies": { "@types/unist": "*" } }, "sha512-WPs+bbQw5aCj+x6laNGWLH3wviHtoCv/P3+otBhbOhJgG8qtpdAMlTCxLtsTWA7LH1Oh/bFCHsBn0TPS5m30EQ=="], + + "@types/mdast": ["@types/mdast@4.0.4", "", { "dependencies": { 
"@types/unist": "*" } }, "sha512-kGaNbPh1k7AFzgpud/gMdvIm5xuECykRR+JnWKQno9TAXVa6WIVCGTPvYGekIDL4uwCZQSYbUxNBSb1aUo79oA=="], + + "@types/ms": ["@types/ms@2.1.0", "", {}, "sha512-GsCCIZDE/p3i96vtEqx+7dBUGXrc7zeSK3wwPHIaRThS+9OhWIXRqzs4d6k1SVU8g91DrNRWxWUGhp5KXQb2VA=="], + "@types/node": ["@types/node@25.0.9", "", { "dependencies": { "undici-types": "~7.16.0" } }, "sha512-/rpCXHlCWeqClNBwUhDcusJxXYDjZTyE8v5oTO7WbL8eij2nKhUeU89/6xgjU7N4/Vh3He0BtyhJdQbDyhiXAw=="], + "@types/supports-color": ["@types/supports-color@8.1.3", "", {}, "sha512-Hy6UMpxhE3j1tLpl27exp1XqHD7n8chAiNPzWfz16LPZoMMoSc4dzLl6w9qijkEb/r5O1ozdu1CWGA2L83ZeZg=="], + + "@types/unist": ["@types/unist@3.0.3", "", {}, "sha512-ko/gIFJRv177XgZsZcBwnqJN5x/Gien8qNOn0D5bQU/zAzVf9Zt3BlcUiLqhV9y4ARk0GbT3tnUiPNgnTXzc/Q=="], + + "ansi-regex": ["ansi-regex@6.2.2", "", {}, "sha512-Bq3SmSpyFHaWjPk8If9yc6svM8c56dB5BAtW4Qbw5jHTwwXXcTLoRMkpDJp6VL0XzlWaCHTXrkFURMYmD0sLqg=="], + + "bail": ["bail@2.0.2", "", {}, "sha512-0xO6mYd7JB2YesxDKplafRpsiOzPt9V02ddPCLbY1xYGPOX24NTyN50qnUxgCPcSoYMhKpAuBTjQoRZCAkUDRw=="], + "bun-types": ["bun-types@1.3.6", "", { "dependencies": { "@types/node": "*" } }, "sha512-OlFwHcnNV99r//9v5IIOgQ9Uk37gZqrNMCcqEaExdkVq3Avwqok1bJFmvGMCkCE0FqzdY8VMOZpfpR3lwI+CsQ=="], + "character-entities": ["character-entities@2.0.2", "", {}, "sha512-shx7oQ0Awen/BRIdkjkvz54PnEEI/EjwXDSIZp86/KKdbafHh1Df/RYGBhn4hbe2+uKC9FnT5UCEdyPz3ai9hQ=="], + + "debug": ["debug@4.4.3", "", { "dependencies": { "ms": "^2.1.3" } }, "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA=="], + + "decode-named-character-reference": ["decode-named-character-reference@1.3.0", "", { "dependencies": { "character-entities": "^2.0.0" } }, "sha512-GtpQYB283KrPp6nRw50q3U9/VfOutZOe103qlN7BPP6Ad27xYnOIWv4lPzo8HCAL+mMZofJ9KEy30fq6MfaK6Q=="], + + "dequal": ["dequal@2.0.3", "", {}, "sha512-0je+qPKHEMohvfRTCEo3CrPG6cAzAYgmzKyxRiYSSDkS6eGJdyVJm7WaYA5ECaAD9wLB2T4EEeymA5aFVcYXCA=="], + + "devlop": ["devlop@1.1.0", "", { "dependencies": { "dequal": "^2.0.0" } }, "sha512-RWmIqhcFf1lRYBvNmr7qTNuyCt/7/ns2jbpp1+PalgE/rDQcBT0fioSMUpJ93irlUhC5hrg4cYqe6U+0ImW0rA=="], + + "eastasianwidth": ["eastasianwidth@0.2.0", "", {}, "sha512-I88TYZWc9XiYHRQ4/3c5rjjfgkjhLyW2luGIheGERbNQ6OY7yTybanSpDXZa8y7VUP9YmDcYa+eyq4ca7iLqWA=="], + + "emoji-regex": ["emoji-regex@10.6.0", "", {}, "sha512-toUI84YS5YmxW219erniWD0CIVOo46xGKColeNQRgOzDorgBi1v4D71/OFzgD9GO2UGKIv1C3Sp8DAn0+j5w7A=="], + + "extend": ["extend@3.0.2", "", {}, "sha512-fjquC59cD7CyW6urNXK0FBufkZcoiGG80wTuPujX590cB5Ttln20E2UB4S/WARVqhXffZl2LNgS+gQdPIIim/g=="], + + "is-plain-obj": ["is-plain-obj@4.1.0", "", {}, "sha512-+Pgi+vMuUNkJyExiMBt5IlFoMyKnr5zhJ4Uspz58WOhBF5QoIZkFyNHIbBAtHwzVAgk5RtndVNsDRN61/mmDqg=="], + + "longest-streak": ["longest-streak@3.1.0", "", {}, "sha512-9Ri+o0JYgehTaVBBDoMqIl8GXtbWg711O3srftcHhZ0dqnETqLaoIK0x17fUw9rFSlK/0NlsKe0Ahhyl5pXE2g=="], + + "mdast-comment-marker": ["mdast-comment-marker@3.0.0", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-mdx-expression": "^2.0.0" } }, "sha512-bt08sLmTNg00/UtVDiqZKocxqvQqqyQZAg1uaRuO/4ysXV5motg7RolF5o5yy/sY1rG0v2XgZEqFWho1+2UquA=="], + + "mdast-util-from-markdown": ["mdast-util-from-markdown@2.0.2", "", { "dependencies": { "@types/mdast": "^4.0.0", "@types/unist": "^3.0.0", "decode-named-character-reference": "^1.0.0", "devlop": "^1.0.0", "mdast-util-to-string": "^4.0.0", "micromark": "^4.0.0", "micromark-util-decode-numeric-character-reference": "^2.0.0", "micromark-util-decode-string": "^2.0.0", "micromark-util-normalize-identifier": 
"^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0", "unist-util-stringify-position": "^4.0.0" } }, "sha512-uZhTV/8NBuw0WHkPTrCqDOl0zVe1BIng5ZtHoDk49ME1qqcjYmmLmOf0gELgcRMxN4w2iuIeVso5/6QymSrgmA=="], + + "mdast-util-mdx-expression": ["mdast-util-mdx-expression@2.0.1", "", { "dependencies": { "@types/estree-jsx": "^1.0.0", "@types/hast": "^3.0.0", "@types/mdast": "^4.0.0", "devlop": "^1.0.0", "mdast-util-from-markdown": "^2.0.0", "mdast-util-to-markdown": "^2.0.0" } }, "sha512-J6f+9hUp+ldTZqKRSg7Vw5V6MqjATc+3E4gf3CFNcuZNWD8XdyI6zQ8GqH7f8169MM6P7hMBRDVGnn7oHB9kXQ=="], + + "mdast-util-phrasing": ["mdast-util-phrasing@4.1.0", "", { "dependencies": { "@types/mdast": "^4.0.0", "unist-util-is": "^6.0.0" } }, "sha512-TqICwyvJJpBwvGAMZjj4J2n0X8QWp21b9l0o7eXyVJ25YNWYbJDVIyD1bZXE6WtV6RmKJVYmQAKWa0zWOABz2w=="], + + "mdast-util-to-markdown": ["mdast-util-to-markdown@2.1.2", "", { "dependencies": { "@types/mdast": "^4.0.0", "@types/unist": "^3.0.0", "longest-streak": "^3.0.0", "mdast-util-phrasing": "^4.0.0", "mdast-util-to-string": "^4.0.0", "micromark-util-classify-character": "^2.0.0", "micromark-util-decode-string": "^2.0.0", "unist-util-visit": "^5.0.0", "zwitch": "^2.0.0" } }, "sha512-xj68wMTvGXVOKonmog6LwyJKrYXZPvlwabaryTjLh9LuvovB/KAH+kvi8Gjj+7rJjsFi23nkUxRQv1KqSroMqA=="], + + "mdast-util-to-string": ["mdast-util-to-string@4.0.0", "", { "dependencies": { "@types/mdast": "^4.0.0" } }, "sha512-0H44vDimn51F0YwvxSJSm0eCDOJTRlmN0R1yBh4HLj9wiV1Dn0QoXGbvFAWj2hSItVTlCmBF1hqKlIyUBVFLPg=="], + + "micromark": ["micromark@4.0.2", "", { "dependencies": { "@types/debug": "^4.0.0", "debug": "^4.0.0", "decode-named-character-reference": "^1.0.0", "devlop": "^1.0.0", "micromark-core-commonmark": "^2.0.0", "micromark-factory-space": "^2.0.0", "micromark-util-character": "^2.0.0", "micromark-util-chunked": "^2.0.0", "micromark-util-combine-extensions": "^2.0.0", "micromark-util-decode-numeric-character-reference": "^2.0.0", "micromark-util-encode": "^2.0.0", "micromark-util-normalize-identifier": "^2.0.0", "micromark-util-resolve-all": "^2.0.0", "micromark-util-sanitize-uri": "^2.0.0", "micromark-util-subtokenize": "^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-zpe98Q6kvavpCr1NPVSCMebCKfD7CA2NqZ+rykeNhONIJBpc1tFKt9hucLGwha3jNTNI8lHpctWJWoimVF4PfA=="], + + "micromark-core-commonmark": ["micromark-core-commonmark@2.0.3", "", { "dependencies": { "decode-named-character-reference": "^1.0.0", "devlop": "^1.0.0", "micromark-factory-destination": "^2.0.0", "micromark-factory-label": "^2.0.0", "micromark-factory-space": "^2.0.0", "micromark-factory-title": "^2.0.0", "micromark-factory-whitespace": "^2.0.0", "micromark-util-character": "^2.0.0", "micromark-util-chunked": "^2.0.0", "micromark-util-classify-character": "^2.0.0", "micromark-util-html-tag-name": "^2.0.0", "micromark-util-normalize-identifier": "^2.0.0", "micromark-util-resolve-all": "^2.0.0", "micromark-util-subtokenize": "^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-RDBrHEMSxVFLg6xvnXmb1Ayr2WzLAWjeSATAoxwKYJV94TeNavgoIdA0a9ytzDSVzBy2YKFK+emCPOEibLeCrg=="], + + "micromark-factory-destination": ["micromark-factory-destination@2.0.1", "", { "dependencies": { "micromark-util-character": "^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-Xe6rDdJlkmbFRExpTOmRj9N3MaWmbAgdpSrBQvCFqhezUn4AHqJHbaEnfbVYYiexVSs//tqOdY/DxhjdCiJnIA=="], + + "micromark-factory-label": ["micromark-factory-label@2.0.1", "", { 
"dependencies": { "devlop": "^1.0.0", "micromark-util-character": "^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-VFMekyQExqIW7xIChcXn4ok29YE3rnuyveW3wZQWWqF4Nv9Wk5rgJ99KzPvHjkmPXF93FXIbBp6YdW3t71/7Vg=="], + + "micromark-factory-space": ["micromark-factory-space@2.0.1", "", { "dependencies": { "micromark-util-character": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-zRkxjtBxxLd2Sc0d+fbnEunsTj46SWXgXciZmHq0kDYGnck/ZSGj9/wULTV95uoeYiK5hRXP2mJ98Uo4cq/LQg=="], + + "micromark-factory-title": ["micromark-factory-title@2.0.1", "", { "dependencies": { "micromark-factory-space": "^2.0.0", "micromark-util-character": "^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-5bZ+3CjhAd9eChYTHsjy6TGxpOFSKgKKJPJxr293jTbfry2KDoWkhBb6TcPVB4NmzaPhMs1Frm9AZH7OD4Cjzw=="], + + "micromark-factory-whitespace": ["micromark-factory-whitespace@2.0.1", "", { "dependencies": { "micromark-factory-space": "^2.0.0", "micromark-util-character": "^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-Ob0nuZ3PKt/n0hORHyvoD9uZhr+Za8sFoP+OnMcnWK5lngSzALgQYKMr9RJVOWLqQYuyn6ulqGWSXdwf6F80lQ=="], + + "micromark-util-character": ["micromark-util-character@2.1.1", "", { "dependencies": { "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-wv8tdUTJ3thSFFFJKtpYKOYiGP2+v96Hvk4Tu8KpCAsTMs6yi+nVmGh1syvSCsaxz45J6Jbw+9DD6g97+NV67Q=="], + + "micromark-util-chunked": ["micromark-util-chunked@2.0.1", "", { "dependencies": { "micromark-util-symbol": "^2.0.0" } }, "sha512-QUNFEOPELfmvv+4xiNg2sRYeS/P84pTW0TCgP5zc9FpXetHY0ab7SxKyAQCNCc1eK0459uoLI1y5oO5Vc1dbhA=="], + + "micromark-util-classify-character": ["micromark-util-classify-character@2.0.1", "", { "dependencies": { "micromark-util-character": "^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-K0kHzM6afW/MbeWYWLjoHQv1sgg2Q9EccHEDzSkxiP/EaagNzCm7T/WMKZ3rjMbvIpvBiZgwR3dKMygtA4mG1Q=="], + + "micromark-util-combine-extensions": ["micromark-util-combine-extensions@2.0.1", "", { "dependencies": { "micromark-util-chunked": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-OnAnH8Ujmy59JcyZw8JSbK9cGpdVY44NKgSM7E9Eh7DiLS2E9RNQf0dONaGDzEG9yjEl5hcqeIsj4hfRkLH/Bg=="], + + "micromark-util-decode-numeric-character-reference": ["micromark-util-decode-numeric-character-reference@2.0.2", "", { "dependencies": { "micromark-util-symbol": "^2.0.0" } }, "sha512-ccUbYk6CwVdkmCQMyr64dXz42EfHGkPQlBj5p7YVGzq8I7CtjXZJrubAYezf7Rp+bjPseiROqe7G6foFd+lEuw=="], + + "micromark-util-decode-string": ["micromark-util-decode-string@2.0.1", "", { "dependencies": { "decode-named-character-reference": "^1.0.0", "micromark-util-character": "^2.0.0", "micromark-util-decode-numeric-character-reference": "^2.0.0", "micromark-util-symbol": "^2.0.0" } }, "sha512-nDV/77Fj6eH1ynwscYTOsbK7rR//Uj0bZXBwJZRfaLEJ1iGBR6kIfNmlNqaqJf649EP0F3NWNdeJi03elllNUQ=="], + + "micromark-util-encode": ["micromark-util-encode@2.0.1", "", {}, "sha512-c3cVx2y4KqUnwopcO9b/SCdo2O67LwJJ/UyqGfbigahfegL9myoEFoDYZgkT7f36T0bLrM9hZTAaAyH+PCAXjw=="], + + "micromark-util-html-tag-name": ["micromark-util-html-tag-name@2.0.1", "", {}, "sha512-2cNEiYDhCWKI+Gs9T0Tiysk136SnR13hhO8yW6BGNyhOC4qYFnwF1nKfD3HFAIXA5c45RrIG1ub11GiXeYd1xA=="], + + "micromark-util-normalize-identifier": ["micromark-util-normalize-identifier@2.0.1", "", { "dependencies": { "micromark-util-symbol": "^2.0.0" } }, 
"sha512-sxPqmo70LyARJs0w2UclACPUUEqltCkJ6PhKdMIDuJ3gSf/Q+/GIe3WKl0Ijb/GyH9lOpUkRAO2wp0GVkLvS9Q=="], + + "micromark-util-resolve-all": ["micromark-util-resolve-all@2.0.1", "", { "dependencies": { "micromark-util-types": "^2.0.0" } }, "sha512-VdQyxFWFT2/FGJgwQnJYbe1jjQoNTS4RjglmSjTUlpUMa95Htx9NHeYW4rGDJzbjvCsl9eLjMQwGeElsqmzcHg=="], + + "micromark-util-sanitize-uri": ["micromark-util-sanitize-uri@2.0.1", "", { "dependencies": { "micromark-util-character": "^2.0.0", "micromark-util-encode": "^2.0.0", "micromark-util-symbol": "^2.0.0" } }, "sha512-9N9IomZ/YuGGZZmQec1MbgxtlgougxTodVwDzzEouPKo3qFWvymFHWcnDi2vzV1ff6kas9ucW+o3yzJK9YB1AQ=="], + + "micromark-util-subtokenize": ["micromark-util-subtokenize@2.1.0", "", { "dependencies": { "devlop": "^1.0.0", "micromark-util-chunked": "^2.0.0", "micromark-util-symbol": "^2.0.0", "micromark-util-types": "^2.0.0" } }, "sha512-XQLu552iSctvnEcgXw6+Sx75GflAPNED1qx7eBJ+wydBb2KCbRZe+NwvIEEMM83uml1+2WSXpBAcp9IUCgCYWA=="], + + "micromark-util-symbol": ["micromark-util-symbol@2.0.1", "", {}, "sha512-vs5t8Apaud9N28kgCrRUdEed4UJ+wWNvicHLPxCa9ENlYuAY31M0ETy5y1vA33YoNPDFTghEbnh6efaE8h4x0Q=="], + + "micromark-util-types": ["micromark-util-types@2.0.2", "", {}, "sha512-Yw0ECSpJoViF1qTU4DC6NwtC4aWGt1EkzaQB8KPPyCRR8z9TWeV0HbEFGTO+ZY1wB22zmxnJqhPyTpOVCpeHTA=="], + + "ms": ["ms@2.1.3", "", {}, "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA=="], + + "pluralize": ["pluralize@8.0.0", "", {}, "sha512-Nc3IT5yHzflTfbjgqWcCPpo7DaKy4FnpB0l/zCAW0Tc7jxAiuqSxHasntB3D7887LSrA93kDJ9IXovxJYxyLCA=="], + + "remark-lint": ["remark-lint@10.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "remark-message-control": "^8.0.0", "unified": "^11.0.0" } }, "sha512-1+PYGFziOg4pH7DDf1uMd4AR3YuO2EMnds/SdIWMPGT7CAfDRSnAmpxPsJD0Ds3IKpn97h3d5KPGf1WFOg6hXQ=="], + + "remark-lint-checkbox-content-indent": ["remark-lint-checkbox-content-indent@5.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-phrasing": "^4.0.0", "pluralize": "^8.0.0", "unified-lint-rule": "^3.0.0", "unist-util-position": "^5.0.0", "unist-util-visit-parents": "^6.0.0" } }, "sha512-R1gV4vGkgJQZQFIGve1paj4mVDUWlgX0KAHhjNpSyzuwuSIDoxWpEuSJSxcnczESgcjM4yVrZqEGMYi/fqZK0w=="], + + "remark-lint-final-newline": ["remark-lint-final-newline@3.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "devlop": "^1.0.0", "unified-lint-rule": "^3.0.0", "vfile-location": "^5.0.0" } }, "sha512-q5diKHD6BMbzqWqgvYPOB8AJgLrMzEMBAprNXjcpKoZ/uCRqly+gxjco+qVUMtMWSd+P+KXZZEqoa7Y6QiOudw=="], + + "remark-lint-no-html": ["remark-lint-no-html@4.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "unified-lint-rule": "^3.0.0", "unist-util-visit-parents": "^6.0.0" } }, "sha512-d5OD+lp2PJMtIIpCR12uDCGzmmRbYDx+bc2iTIX6Bgo0vprQY0dBG1UXbUT5q8KRijXwOFwBDX6Ogl9atRwCGA=="], + + "remark-message-control": ["remark-message-control@8.0.0", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-comment-marker": "^3.0.0", "unified-message-control": "^5.0.0", "vfile": "^6.0.0" } }, "sha512-brpzOO+jdyE/mLqvqqvbogmhGxKygjpCUCG/PwSCU43+JZQ+RM+sSzkCWBcYvgF3KIAVNIoPsvXjBkzO7EdsYQ=="], + + "remark-parse": ["remark-parse@11.0.0", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-from-markdown": "^2.0.0", "micromark-util-types": "^2.0.0", "unified": "^11.0.0" } }, "sha512-FCxlKLNGknS5ba/1lmpYijMUzX2esxW5xQqjWxw2eHFfS2MSdaHVINFmhjo+qN1WhZhNimq0dZATN9pH0IDrpA=="], + + "remark-stringify": ["remark-stringify@11.0.0", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-to-markdown": "^2.0.0", 
"unified": "^11.0.0" } }, "sha512-1OSmLd3awB/t8qdoEOMazZkNsfVTeY4fTsgzcQFdXNq8ToTN4ZGwrMnlda4K6smTFKD+GRV6O48i6Z4iKgPPpw=="], + + "space-separated-tokens": ["space-separated-tokens@2.0.2", "", {}, "sha512-PEGlAwrG8yXGXRjW32fGbg66JAlOAwbObuqVoJpv/mRgoWDQfgH1wDPvtzWyUSNAXBGSk8h755YDbbcEy3SH2Q=="], + + "string-width": ["string-width@6.1.0", "", { "dependencies": { "eastasianwidth": "^0.2.0", "emoji-regex": "^10.2.1", "strip-ansi": "^7.0.1" } }, "sha512-k01swCJAgQmuADB0YIc+7TuatfNvTBVOoaUWJjTB9R4VJzR5vNWzf5t42ESVZFPS8xTySF7CAdV4t/aaIm3UnQ=="], + + "strip-ansi": ["strip-ansi@7.1.2", "", { "dependencies": { "ansi-regex": "^6.0.1" } }, "sha512-gmBGslpoQJtgnMAvOVqGZpEz9dyoKTCzy2nfz/n8aIFhN/jCE/rCmcxabB6jOOHV+0WNnylOxaxBQPSvcWklhA=="], + + "supports-color": ["supports-color@9.4.0", "", {}, "sha512-VL+lNrEoIXww1coLPOmiEmK/0sGigko5COxI09KzHc2VJXJsQ37UaQ+8quuxjDeA7+KnLGTWRyOXSLLR2Wb4jw=="], + + "to-vfile": ["to-vfile@8.0.0", "", { "dependencies": { "vfile": "^6.0.0" } }, "sha512-IcmH1xB5576MJc9qcfEC/m/nQCFt3fzMHz45sSlgJyTWjRbKW1HAkJpuf3DgE57YzIlZcwcBZA5ENQbBo4aLkg=="], + + "trough": ["trough@2.2.0", "", {}, "sha512-tmMpK00BjZiUyVyvrBK7knerNgmgvcV/KLVyuma/SC+TQN167GrMRciANTz09+k3zW8L8t60jWO1GpfkZdjTaw=="], + "typescript": ["typescript@5.9.3", "", { "bin": { "tsc": "bin/tsc", "tsserver": "bin/tsserver" } }, "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw=="], "undici-types": ["undici-types@7.16.0", "", {}, "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="], + + "unified": ["unified@11.0.5", "", { "dependencies": { "@types/unist": "^3.0.0", "bail": "^2.0.0", "devlop": "^1.0.0", "extend": "^3.0.0", "is-plain-obj": "^4.0.0", "trough": "^2.0.0", "vfile": "^6.0.0" } }, "sha512-xKvGhPWw3k84Qjh8bI3ZeJjqnyadK+GEFtazSfZv/rKeTkTjOJho6mFqh2SM96iIcZokxiOpg78GazTSg8+KHA=="], + + "unified-lint-rule": ["unified-lint-rule@3.0.1", "", { "dependencies": { "@types/unist": "^3.0.0", "trough": "^2.0.0", "unified": "^11.0.0", "vfile": "^6.0.0" } }, "sha512-HxIeQOmwL19DGsxHXbeyzKHBsoSCFO7UtRVUvT2v61ptw/G+GbysWcrpHdfs5jqbIFDA11MoKngIhQK0BeTVjA=="], + + "unified-message-control": ["unified-message-control@5.0.0", "", { "dependencies": { "@types/unist": "^3.0.0", "devlop": "^1.0.0", "space-separated-tokens": "^2.0.0", "unist-util-is": "^6.0.0", "unist-util-visit": "^5.0.0", "vfile": "^6.0.0", "vfile-location": "^5.0.0", "vfile-message": "^4.0.0" } }, "sha512-B2cSAkpuMVVmPP90KCfKdBhm1e9KYJ+zK3x5BCa0N65zpq1Ybkc9C77+M5qwR8FWO7RF3LM5QRRPZtgjW6DUCw=="], + + "unist-util-is": ["unist-util-is@6.0.1", "", { "dependencies": { "@types/unist": "^3.0.0" } }, "sha512-LsiILbtBETkDz8I9p1dQ0uyRUWuaQzd/cuEeS1hoRSyW5E5XGmTzlwY1OrNzzakGowI9Dr/I8HVaw4hTtnxy8g=="], + + "unist-util-position": ["unist-util-position@5.0.0", "", { "dependencies": { "@types/unist": "^3.0.0" } }, "sha512-fucsC7HjXvkB5R3kTCO7kUjRdrS0BJt3M/FPxmHMBOm8JQi2BsHAHFsy27E0EolP8rp0NzXsJ+jNPyDWvOJZPA=="], + + "unist-util-stringify-position": ["unist-util-stringify-position@4.0.0", "", { "dependencies": { "@types/unist": "^3.0.0" } }, "sha512-0ASV06AAoKCDkS2+xw5RXJywruurpbC4JZSm7nr7MOt1ojAzvyyaO+UxZf18j8FCF6kmzCZKcAgN/yu2gm2XgQ=="], + + "unist-util-visit": ["unist-util-visit@5.1.0", "", { "dependencies": { "@types/unist": "^3.0.0", "unist-util-is": "^6.0.0", "unist-util-visit-parents": "^6.0.0" } }, "sha512-m+vIdyeCOpdr/QeQCu2EzxX/ohgS8KbnPDgFni4dQsfSCtpz8UqDyY5GjRru8PDKuYn7Fq19j1CQ+nJSsGKOzg=="], + + "unist-util-visit-parents": ["unist-util-visit-parents@6.0.2", "", { "dependencies": { 
"@types/unist": "^3.0.0", "unist-util-is": "^6.0.0" } }, "sha512-goh1s1TBrqSqukSc8wrjwWhL0hiJxgA8m4kFxGlQ+8FYQ3C/m11FcTs4YYem7V664AhHVvgoQLk890Ssdsr2IQ=="], + + "vfile": ["vfile@6.0.3", "", { "dependencies": { "@types/unist": "^3.0.0", "vfile-message": "^4.0.0" } }, "sha512-KzIbH/9tXat2u30jf+smMwFCsno4wHVdNmzFyL+T/L3UGqqk6JKfVqOFOZEpZSHADH1k40ab6NUIXZq422ov3Q=="], + + "vfile-location": ["vfile-location@5.0.3", "", { "dependencies": { "@types/unist": "^3.0.0", "vfile": "^6.0.0" } }, "sha512-5yXvWDEgqeiYiBe1lbxYF7UMAIm/IcopxMHrMQDq3nvKcjPKIhZklUKL+AE7J7uApI4kwe2snsK+eI6UTj9EHg=="], + + "vfile-message": ["vfile-message@4.0.3", "", { "dependencies": { "@types/unist": "^3.0.0", "unist-util-stringify-position": "^4.0.0" } }, "sha512-QTHzsGd1EhbZs4AsQ20JX1rC3cOlt/IWJruk893DfLRr57lcnOeMaWG4K0JrRta4mIJZKth2Au3mM3u03/JWKw=="], + + "vfile-reporter": ["vfile-reporter@8.1.1", "", { "dependencies": { "@types/supports-color": "^8.0.0", "string-width": "^6.0.0", "supports-color": "^9.0.0", "unist-util-stringify-position": "^4.0.0", "vfile": "^6.0.0", "vfile-message": "^4.0.0", "vfile-sort": "^4.0.0", "vfile-statistics": "^3.0.0" } }, "sha512-qxRZcnFSQt6pWKn3PAk81yLK2rO2i7CDXpy8v8ZquiEOMLSnPw6BMSi9Y1sUCwGGl7a9b3CJT1CKpnRF7pp66g=="], + + "vfile-reporter-json": ["vfile-reporter-json@4.0.0", "", { "dependencies": { "@types/unist": "^2.0.0", "vfile": "^6.0.0", "vfile-message": "^4.0.0" } }, "sha512-O+eR2OpupXW5vhDEUHJjFUR4f9jyeBIU0eHVb25eLSeRR6zeD1RQlIeZFSyP6J9v6XpC5zgjtZT/jbrkI3ZEag=="], + + "vfile-sort": ["vfile-sort@4.0.0", "", { "dependencies": { "vfile": "^6.0.0", "vfile-message": "^4.0.0" } }, "sha512-lffPI1JrbHDTToJwcq0rl6rBmkjQmMuXkAxsZPRS9DXbaJQvc642eCg6EGxcX2i1L+esbuhq+2l9tBll5v8AeQ=="], + + "vfile-statistics": ["vfile-statistics@3.0.0", "", { "dependencies": { "vfile": "^6.0.0", "vfile-message": "^4.0.0" } }, "sha512-/qlwqwWBWFOmpXujL/20P+Iuydil0rZZNglR+VNm6J0gpLHwuVM5s7g2TfVoswbXjZ4HuIhLMySEyIw5i7/D8w=="], + + "zwitch": ["zwitch@2.0.4", "", {}, "sha512-bXE4cR/kVZhKZX/RjPEflHaKVhUVl85noU3v6b8apfQEc1x4A+zBxjZ4lN8LqGd6WZ3dl98pY4o717VFmoPp+A=="], + + "vfile-reporter-json/@types/unist": ["@types/unist@2.0.11", "", {}, "sha512-CmBKiL6NNo/OqgmMn95Fk9Whlp2mtvIv+KNpQKN2F4SjvrEesubTRWGYSg+BnWZOnlCaSTU1sMpsBOzgbYhnsA=="], } } diff --git a/tools/index.ts b/tools/index.ts index 77c7985..b87dddb 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -2,16 +2,29 @@ import { NewLineAfterHeadings } from "./rules/raw/NewLineAfterHeadings"; import { Runner } from './src/Runner'; import { OpenPracticesRepository } from './src/Repository'; import { NoTrailingWhitespace } from "./rules/raw/NoTrailingWhitespace"; +import { join } from 'node:path' +import { read } from 'to-vfile' +import { RemarkRules } from "./rules/remark/RemarkRules"; +import type { VFile } from "vfile"; const lintableFileRunner = new Runner(await OpenPracticesRepository.getCapabilities(), [ new NewLineAfterHeadings(), - new NoTrailingWhitespace() + new NoTrailingWhitespace(), ]) -lintableFileRunner.run() +const filepath = join(import.meta.dir, '..', 'capabilities', 'test-automation.md') +const file = await read(filepath) -if (lintableFileRunner.issuesWereFound()) { +const remarkRunner = new Runner([file], [ + new RemarkRules() +]) + +await remarkRunner.run() +await lintableFileRunner.run() + +if (lintableFileRunner.issuesWereFound() || remarkRunner.issuesWereFound()) { lintableFileRunner.print() + remarkRunner.print() process.exit(1) } else { console.log("No issues found") diff --git a/tools/package.json b/tools/package.json index 4752c4b..5fdd714 100644 
--- a/tools/package.json +++ b/tools/package.json @@ -8,5 +8,18 @@ }, "peerDependencies": { "typescript": "^5" + }, + "dependencies": { + "remark-lint": "^10.0.1", + "remark-lint-checkbox-content-indent": "^5.0.1", + "remark-lint-final-newline": "^3.0.1", + "remark-lint-no-html": "^4.0.1", + "remark-parse": "^11.0.0", + "remark-stringify": "^11.0.0", + "to-vfile": "^8.0.0", + "unified": "^11.0.5", + "vfile": "^6.0.3", + "vfile-reporter": "^8.1.1", + "vfile-reporter-json": "^4.0.0" } } diff --git a/tools/rules/remark/RemarkRules.ts b/tools/rules/remark/RemarkRules.ts new file mode 100644 index 0000000..fc05585 --- /dev/null +++ b/tools/rules/remark/RemarkRules.ts @@ -0,0 +1,30 @@ +import type { VFile } from "vfile"; +import { Rule } from "../../src/Rule"; +import remarkParse from 'remark-parse' +import { unified } from 'unified' +import remarkLint from 'remark-lint' +import remarkStringify from 'remark-stringify' +import remarkLintFinalNewline from "remark-lint-final-newline"; +import remarkLintNoHtml from "remark-lint-no-html"; + +export class RemarkRules extends Rule { + override async run(file: VFile) { + await unified() + .use(remarkParse) + .use(remarkLint) + .use(remarkLintFinalNewline) + .use(remarkLintNoHtml) + .use(remarkStringify) + .process(file) + + file.messages.map(m => { + if (m.file === undefined) return + this.report( + m.file, + m.ruleId || 'unknown-rule', + m.message, + { col: m.column || -1, row: m.line || -1 } + ) + }) + } +} diff --git a/tools/src/Runner.ts b/tools/src/Runner.ts index cde2d6e..174ea6b 100644 --- a/tools/src/Runner.ts +++ b/tools/src/Runner.ts @@ -13,7 +13,7 @@ export class Runner { } } - run() { + async run() { for (const item of this.content) { this.registry.run(item) } From 1f57043612cc0f70293c88302f006829239b6e63 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 12:20:25 -0700 Subject: [PATCH 088/131] convert normal runners to vfiles --- tools/index.ts | 4 +- tools/rules/raw/NewLineAfterHeadings.ts | 9 +++-- tools/rules/raw/NewlineAfterHeadings.test.ts | 15 ++++--- tools/rules/raw/NoTrailingWhitespace.test.ts | 17 +++++--- tools/rules/raw/NoTrailingWhitespace.ts | 9 +++-- tools/src/Repo.ts | 42 ++++++++++++++++++++ tools/src/Repository.ts | 19 --------- 7 files changed, 77 insertions(+), 38 deletions(-) create mode 100644 tools/src/Repo.ts delete mode 100644 tools/src/Repository.ts diff --git a/tools/index.ts b/tools/index.ts index b87dddb..9d05a34 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -1,13 +1,13 @@ import { NewLineAfterHeadings } from "./rules/raw/NewLineAfterHeadings"; import { Runner } from './src/Runner'; -import { OpenPracticesRepository } from './src/Repository'; +import { Repo } from './src/Repo'; import { NoTrailingWhitespace } from "./rules/raw/NoTrailingWhitespace"; import { join } from 'node:path' import { read } from 'to-vfile' import { RemarkRules } from "./rules/remark/RemarkRules"; import type { VFile } from "vfile"; -const lintableFileRunner = new Runner(await OpenPracticesRepository.getCapabilities(), [ +const lintableFileRunner = new Runner(await Repo.capabilities().vfiles(), [ new NewLineAfterHeadings(), new NoTrailingWhitespace(), ]) diff --git a/tools/rules/raw/NewLineAfterHeadings.ts b/tools/rules/raw/NewLineAfterHeadings.ts index 5ffc9d0..e3efcc1 100644 --- a/tools/rules/raw/NewLineAfterHeadings.ts +++ b/tools/rules/raw/NewLineAfterHeadings.ts @@ -1,10 +1,13 @@ +import type { VFile } from "vfile"; import { Rule } from "../../src/Rule"; -export class NewLineAfterHeadings extends Rule { - 
override run({ filename, content }: LintableFile) { +export class NewLineAfterHeadings extends Rule { + override run(file: VFile) { + const content = file.value.toString() + const filename = file.path const lines = content.split('\n') for (let i = 0; i < lines.length; i++) { - if (lines[i]?.charAt(0) === '#' && i+1 < lines.length && lines[i+1] !== '') { + if (lines[i]?.charAt(0) === '#' && i+1 < lines.length && lines[i + 1] !== '') { this.report(filename, 'new-line-after-headings', 'You must have a new line after headings.', { row: i + 2, col: 1, diff --git a/tools/rules/raw/NewlineAfterHeadings.test.ts b/tools/rules/raw/NewlineAfterHeadings.test.ts index 538bca2..d8ed4c7 100644 --- a/tools/rules/raw/NewlineAfterHeadings.test.ts +++ b/tools/rules/raw/NewlineAfterHeadings.test.ts @@ -1,29 +1,34 @@ import { describe, it, expect } from 'bun:test' import { NewLineAfterHeadings } from './NewLineAfterHeadings' +import type { VFile } from 'vfile' const mkRule = () => new NewLineAfterHeadings({'new-line-after-headings': 'silent'}) +const mkInput = (content: string) => { + return { path: "mock-thing.md", value: Buffer.from(content) } as unknown as VFile +} + describe(NewLineAfterHeadings.name, () => { it('should fail if there is no newline after a heading', () => { const rule = mkRule() - rule.run({filename:"mock-thing.md", content: ` + rule.run(mkInput(` # Some Heading This is not correct. -`}) +`)) const problems = rule.getProblems() expect(problems).not.toBeEmpty() }) it('should succeed if there is a newline after a heading', () => { const rule = mkRule() - rule.run({filename:"mock-thing.md", content: `# Some Heading + rule.run(mkInput(`# Some Heading This is not correct. -`}) +`)) expect(rule.getProblems()).toBeEmpty() }) it('should not report when heading is last line (This is a different lint error)', () => { const rule = mkRule() - rule.run({ filename:"mock-thing.md", content: `# Some Heading` }) + rule.run(mkInput(`# Some Heading`)) expect(rule.getProblems()).toBeEmpty() }) }) diff --git a/tools/rules/raw/NoTrailingWhitespace.test.ts b/tools/rules/raw/NoTrailingWhitespace.test.ts index 7f4049a..8770c9f 100644 --- a/tools/rules/raw/NoTrailingWhitespace.test.ts +++ b/tools/rules/raw/NoTrailingWhitespace.test.ts @@ -1,16 +1,21 @@ import { describe, it, expect } from 'bun:test' import { NoTrailingWhitespace } from './NoTrailingWhitespace' +import type { VFile } from 'vfile' const mkRule = () => new NoTrailingWhitespace({'no-trailing-white-space': 'silent'}) +const mkInput = (content: string) => { + return { path: "mock-thing.md", value: Buffer.from(content) } as unknown as VFile +} + describe(NoTrailingWhitespace.name, () => { it('should fail when lines have trailing whitespace', () => { const rule = mkRule() - const content = ` + const content = mkInput(` # Some Heading This is not correct. -` - rule.run({ filename: "mock-file.md", content }) +`) + rule.run(content) const problems = rule.getProblems() expect(problems).not.toBeEmpty() expect(problems[0]?.getFileLocation().row).toEqual(2) @@ -18,12 +23,12 @@ This is not correct. }) it('should not fail when no lines have trailing white space', () => { const rule = mkRule() - const content = `# Some Heading + const content = mkInput(`# Some Heading This is not correct. 
- cool - beans -` - rule.run({ filename: "mock-file.md", content }) +`) + rule.run(content) const problems = rule.getProblems() expect(problems).toBeEmpty() }) diff --git a/tools/rules/raw/NoTrailingWhitespace.ts b/tools/rules/raw/NoTrailingWhitespace.ts index c2b2631..f6aafdd 100644 --- a/tools/rules/raw/NoTrailingWhitespace.ts +++ b/tools/rules/raw/NoTrailingWhitespace.ts @@ -1,13 +1,16 @@ +import type { VFile } from "vfile"; import { Rule } from "../../src/Rule"; -export class NoTrailingWhitespace extends Rule { - override run({ filename, content }: LintableFile) { +export class NoTrailingWhitespace extends Rule { + override run(file: VFile) { + const content = file.value.toString() + const filename = file.path const lines = content.split('\n') for (const [index, line] of lines.entries()) { if (line !== line.trimEnd()) { this.report(filename, 'no-trailing-white-space', 'Trailing whitespace is not allowed', { row: index + 1, - col: line.trimEnd().length+1, + col: line.trimEnd().length + 1, }) } } diff --git a/tools/src/Repo.ts b/tools/src/Repo.ts new file mode 100644 index 0000000..f3da613 --- /dev/null +++ b/tools/src/Repo.ts @@ -0,0 +1,42 @@ +import { readdirSync } from 'node:fs' +import { join } from 'node:path' +import { read } from 'to-vfile'; + +const ROOT = join(Bun.main, '..','..') + +async function getAllFrom(folder: string): Promise { + return await Promise.all(readdirSync(folder) + .map(async (file) => ({ + filename: join(folder, file), + content: await Bun.file(join(folder, file)).text() + }))) +} + +class SourceFolder { + private root: string; + constructor(root: string) { + this.root = root + } + fileNames() { + return readdirSync(this.root) + } + filePaths() { + const root = this.root + return this.fileNames().map((name) => join(root, name)) + } + async vfiles() { + return await Promise.all(this.filePaths().map(async path => { + return await read(path) + })) + } +} + +export class Repo { + static async getCapabilities() { + return getAllFrom(join(ROOT, 'capabilities')) + } + static capabilities(): SourceFolder { + return new SourceFolder(join(ROOT, 'capabilities')) + } +} + diff --git a/tools/src/Repository.ts b/tools/src/Repository.ts deleted file mode 100644 index b47db4c..0000000 --- a/tools/src/Repository.ts +++ /dev/null @@ -1,19 +0,0 @@ -import { readdirSync } from 'node:fs' -import { join } from 'node:path' - -const ROOT = join(Bun.main, '..','..') - -async function getAllFrom(folder: string): Promise { - return await Promise.all(readdirSync(folder) - .map(async (file) => ({ - filename: join(folder, file), - content: await Bun.file(join(folder, file)).text() - }))) -} - -export class OpenPracticesRepository { - static async getCapabilities() { - return getAllFrom(join(ROOT, 'capabilities')) - } -} - From 23c6ce16ef1b07f920f9b78bf3dc0d24ee030cb9 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 12:26:09 -0700 Subject: [PATCH 089/131] combine runners --- tools/index.ts | 20 +++++--------------- tools/src/Registry.ts | 4 +++- tools/src/types.d.ts | 5 ----- 3 files changed, 8 insertions(+), 21 deletions(-) delete mode 100644 tools/src/types.d.ts diff --git a/tools/index.ts b/tools/index.ts index 9d05a34..a8d70c2 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -2,29 +2,19 @@ import { NewLineAfterHeadings } from "./rules/raw/NewLineAfterHeadings"; import { Runner } from './src/Runner'; import { Repo } from './src/Repo'; import { NoTrailingWhitespace } from "./rules/raw/NoTrailingWhitespace"; -import { join } from 'node:path' -import { read } from 
'to-vfile' import { RemarkRules } from "./rules/remark/RemarkRules"; import type { VFile } from "vfile"; -const lintableFileRunner = new Runner(await Repo.capabilities().vfiles(), [ +const runner = new Runner(await Repo.capabilities().vfiles(), [ new NewLineAfterHeadings(), new NoTrailingWhitespace(), + new RemarkRules(), ]) -const filepath = join(import.meta.dir, '..', 'capabilities', 'test-automation.md') -const file = await read(filepath) +await runner.run() -const remarkRunner = new Runner([file], [ - new RemarkRules() -]) - -await remarkRunner.run() -await lintableFileRunner.run() - -if (lintableFileRunner.issuesWereFound() || remarkRunner.issuesWereFound()) { - lintableFileRunner.print() - remarkRunner.print() +if (runner.issuesWereFound()) { + runner.print() process.exit(1) } else { console.log("No issues found") diff --git a/tools/src/Registry.ts b/tools/src/Registry.ts index 0d7668b..86fef48 100644 --- a/tools/src/Registry.ts +++ b/tools/src/Registry.ts @@ -1,5 +1,7 @@ import type { Rule } from "./Rule"; + + export class Registry { private rules: Rule[] = [] register(rule: Rule) { @@ -9,7 +11,7 @@ export class Registry { this.rules.forEach(rule => rule.run(input)) } isssuesWereFound() { - return this.rules.map(r => r.hasProblems()).reduce((a, c) => c === true ? true : false) + return this.rules.map(r => r.hasProblems()).reduce((_, c) => c === true ? true : false) } print(){ this.rules.forEach(rule => rule.print()) diff --git a/tools/src/types.d.ts b/tools/src/types.d.ts deleted file mode 100644 index 2e01dd0..0000000 --- a/tools/src/types.d.ts +++ /dev/null @@ -1,5 +0,0 @@ - -type LintableFile = { - filename: string - content: string -} From 6524b7cf524f4230c8a3cbf463da493f09d234bb Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 12:42:32 -0700 Subject: [PATCH 090/131] allow remark rules from top level --- tools/index.ts | 6 +++++- tools/rules/remark/RemarkRules.ts | 14 +++++++++----- tools/src/Registry.ts | 2 +- 3 files changed, 15 insertions(+), 7 deletions(-) diff --git a/tools/index.ts b/tools/index.ts index a8d70c2..cdc528b 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -3,16 +3,20 @@ import { Runner } from './src/Runner'; import { Repo } from './src/Repo'; import { NoTrailingWhitespace } from "./rules/raw/NoTrailingWhitespace"; import { RemarkRules } from "./rules/remark/RemarkRules"; +import remarkLintFinalNewline from "remark-lint-final-newline"; +import remarkLintNoHtml from "remark-lint-no-html"; import type { VFile } from "vfile"; const runner = new Runner(await Repo.capabilities().vfiles(), [ new NewLineAfterHeadings(), new NoTrailingWhitespace(), - new RemarkRules(), + new RemarkRules([remarkLintFinalNewline, remarkLintNoHtml]), ]) await runner.run() +console.log(runner.issuesWereFound()) + if (runner.issuesWereFound()) { runner.print() process.exit(1) diff --git a/tools/rules/remark/RemarkRules.ts b/tools/rules/remark/RemarkRules.ts index fc05585..554c501 100644 --- a/tools/rules/remark/RemarkRules.ts +++ b/tools/rules/remark/RemarkRules.ts @@ -1,23 +1,27 @@ import type { VFile } from "vfile"; import { Rule } from "../../src/Rule"; import remarkParse from 'remark-parse' -import { unified } from 'unified' +import { unified, type PluggableList } from 'unified' import remarkLint from 'remark-lint' import remarkStringify from 'remark-stringify' -import remarkLintFinalNewline from "remark-lint-final-newline"; import remarkLintNoHtml from "remark-lint-no-html"; +import remarkLintFinalNewline from "remark-lint-final-newline"; export class 
RemarkRules extends Rule { + private list: PluggableList + constructor(list: PluggableList) { + super() + this.list = list + } override async run(file: VFile) { await unified() .use(remarkParse) .use(remarkLint) - .use(remarkLintFinalNewline) - .use(remarkLintNoHtml) + .use(this.list) .use(remarkStringify) .process(file) - file.messages.map(m => { + file.messages.forEach(m => { if (m.file === undefined) return this.report( m.file, diff --git a/tools/src/Registry.ts b/tools/src/Registry.ts index 86fef48..6c9b118 100644 --- a/tools/src/Registry.ts +++ b/tools/src/Registry.ts @@ -11,7 +11,7 @@ export class Registry { this.rules.forEach(rule => rule.run(input)) } isssuesWereFound() { - return this.rules.map(r => r.hasProblems()).reduce((_, c) => c === true ? true : false) + return this.rules.map(r => r.hasProblems()).includes(true) } print(){ this.rules.forEach(rule => rule.print()) From c8d1314899549f288ad9934e3281a7beca40dff3 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 12:59:48 -0700 Subject: [PATCH 091/131] clean up & add quick fix --- tools/index.ts | 34 +++++++++++++++---- .../NewLineAfterHeadings.test.ts} | 0 .../{raw => normal}/NewLineAfterHeadings.ts | 0 .../NoTrailingWhitespace.test.ts | 0 .../{raw => normal}/NoTrailingWhitespace.ts | 0 tools/rules/{remark => normal}/RemarkRules.ts | 2 -- tools/src/Problem.ts | 6 ++++ tools/src/Registry.ts | 3 ++ tools/src/Rule.ts | 3 ++ tools/src/Runner.ts | 3 ++ 10 files changed, 42 insertions(+), 9 deletions(-) rename tools/rules/{raw/NewlineAfterHeadings.test.ts => normal/NewLineAfterHeadings.test.ts} (100%) rename tools/rules/{raw => normal}/NewLineAfterHeadings.ts (100%) rename tools/rules/{raw => normal}/NoTrailingWhitespace.test.ts (100%) rename tools/rules/{raw => normal}/NoTrailingWhitespace.ts (100%) rename tools/rules/{remark => normal}/RemarkRules.ts (88%) diff --git a/tools/index.ts b/tools/index.ts index cdc528b..a6d75ad 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -1,24 +1,44 @@ -import { NewLineAfterHeadings } from "./rules/raw/NewLineAfterHeadings"; +import { NewLineAfterHeadings } from "./rules/normal/NewLineAfterHeadings"; import { Runner } from './src/Runner'; import { Repo } from './src/Repo'; -import { NoTrailingWhitespace } from "./rules/raw/NoTrailingWhitespace"; -import { RemarkRules } from "./rules/remark/RemarkRules"; +import { NoTrailingWhitespace } from "./rules/normal/NoTrailingWhitespace"; +import { RemarkRules } from "./rules/normal/RemarkRules"; import remarkLintFinalNewline from "remark-lint-final-newline"; import remarkLintNoHtml from "remark-lint-no-html"; import type { VFile } from "vfile"; +import { parseArgs } from "util"; + +const { values } = parseArgs({ + args: Bun.argv, + options: { + quickFix: { + type: "boolean", + }, + }, + allowPositionals: true, + strict: true, +}); + +console.log(); + const runner = new Runner(await Repo.capabilities().vfiles(), [ new NewLineAfterHeadings(), new NoTrailingWhitespace(), - new RemarkRules([remarkLintFinalNewline, remarkLintNoHtml]), + new RemarkRules([ + remarkLintFinalNewline, + [remarkLintNoHtml, { allowComments: false }] + ]), ]) await runner.run() -console.log(runner.issuesWereFound()) - if (runner.issuesWereFound()) { - runner.print() + if (values.quickFix) { + runner.printQuickFix() + } else { + runner.print() + } process.exit(1) } else { console.log("No issues found") diff --git a/tools/rules/raw/NewlineAfterHeadings.test.ts b/tools/rules/normal/NewLineAfterHeadings.test.ts similarity index 100% rename from 
tools/rules/raw/NewlineAfterHeadings.test.ts rename to tools/rules/normal/NewLineAfterHeadings.test.ts diff --git a/tools/rules/raw/NewLineAfterHeadings.ts b/tools/rules/normal/NewLineAfterHeadings.ts similarity index 100% rename from tools/rules/raw/NewLineAfterHeadings.ts rename to tools/rules/normal/NewLineAfterHeadings.ts diff --git a/tools/rules/raw/NoTrailingWhitespace.test.ts b/tools/rules/normal/NoTrailingWhitespace.test.ts similarity index 100% rename from tools/rules/raw/NoTrailingWhitespace.test.ts rename to tools/rules/normal/NoTrailingWhitespace.test.ts diff --git a/tools/rules/raw/NoTrailingWhitespace.ts b/tools/rules/normal/NoTrailingWhitespace.ts similarity index 100% rename from tools/rules/raw/NoTrailingWhitespace.ts rename to tools/rules/normal/NoTrailingWhitespace.ts diff --git a/tools/rules/remark/RemarkRules.ts b/tools/rules/normal/RemarkRules.ts similarity index 88% rename from tools/rules/remark/RemarkRules.ts rename to tools/rules/normal/RemarkRules.ts index 554c501..5ae27ee 100644 --- a/tools/rules/remark/RemarkRules.ts +++ b/tools/rules/normal/RemarkRules.ts @@ -4,8 +4,6 @@ import remarkParse from 'remark-parse' import { unified, type PluggableList } from 'unified' import remarkLint from 'remark-lint' import remarkStringify from 'remark-stringify' -import remarkLintNoHtml from "remark-lint-no-html"; -import remarkLintFinalNewline from "remark-lint-final-newline"; export class RemarkRules extends Rule { private list: PluggableList diff --git a/tools/src/Problem.ts b/tools/src/Problem.ts index db17a96..97c113a 100644 --- a/tools/src/Problem.ts +++ b/tools/src/Problem.ts @@ -48,4 +48,10 @@ export class Problem { ` ${gray}in${reset} ${purple}"${this.filename}"${reset}\n ${gray}at${reset} ${cyan}${this.fileLocation.row}${reset}:${cyan}${this.fileLocation.col}${reset}\n` ); } + printQuickfix() { + if (this.level === 'silent') return; + + const output = `${this.filename}:${this.fileLocation.row}:${this.fileLocation.col}: [${this.id}] ${this.level.toUpperCase()}: ${this.message}`; + console.log(output); + } } diff --git a/tools/src/Registry.ts b/tools/src/Registry.ts index 6c9b118..0d829b3 100644 --- a/tools/src/Registry.ts +++ b/tools/src/Registry.ts @@ -16,6 +16,9 @@ export class Registry { print(){ this.rules.forEach(rule => rule.print()) } + printQuickFix(){ + this.rules.forEach(rule => rule.printQuickFix()) + } } diff --git a/tools/src/Rule.ts b/tools/src/Rule.ts index 170bcf5..8b83460 100644 --- a/tools/src/Rule.ts +++ b/tools/src/Rule.ts @@ -25,4 +25,7 @@ export abstract class Rule { print() { this.problems.forEach(p => p.print()) } + printQuickFix() { + this.problems.forEach(p => p.printQuickfix()) + } } diff --git a/tools/src/Runner.ts b/tools/src/Runner.ts index 174ea6b..f60e8e4 100644 --- a/tools/src/Runner.ts +++ b/tools/src/Runner.ts @@ -21,6 +21,9 @@ export class Runner { print() { this.registry.print() } + printQuickFix() { + this.registry.printQuickFix() + } issuesWereFound() { return this.registry.isssuesWereFound() } From d2a18ece1e21a0884e3d4af931478e8e4083f760 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 13:45:05 -0700 Subject: [PATCH 092/131] added remark rules --- tools/bun.lock | 12 ++++++++++++ tools/index.ts | 11 ++++++++++- tools/package.json | 4 ++++ 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/tools/bun.lock b/tools/bun.lock index 584ca6d..9545c0f 100644 --- a/tools/bun.lock +++ b/tools/bun.lock @@ -8,7 +8,11 @@ "remark-lint": "^10.0.1", "remark-lint-checkbox-content-indent": "^5.0.1", 
"remark-lint-final-newline": "^3.0.1", + "remark-lint-list-item-content-indent": "^4.0.1", + "remark-lint-list-item-indent": "^4.0.1", "remark-lint-no-html": "^4.0.1", + "remark-lint-no-tabs": "^4.0.1", + "remark-lint-unordered-list-marker-style": "^4.0.1", "remark-parse": "^11.0.0", "remark-stringify": "^11.0.0", "to-vfile": "^8.0.0", @@ -136,8 +140,16 @@ "remark-lint-final-newline": ["remark-lint-final-newline@3.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "devlop": "^1.0.0", "unified-lint-rule": "^3.0.0", "vfile-location": "^5.0.0" } }, "sha512-q5diKHD6BMbzqWqgvYPOB8AJgLrMzEMBAprNXjcpKoZ/uCRqly+gxjco+qVUMtMWSd+P+KXZZEqoa7Y6QiOudw=="], + "remark-lint-list-item-content-indent": ["remark-lint-list-item-content-indent@4.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-phrasing": "^4.0.0", "pluralize": "^8.0.0", "unified-lint-rule": "^3.0.0", "unist-util-position": "^5.0.0", "unist-util-visit-parents": "^6.0.0", "vfile-message": "^4.0.0" } }, "sha512-KSopxxp64O6dLuTQ2sWaTqgjKWr1+AoB1QCTektMJ3mfHfn0QyZzC2CZbBU22KGzBhiYXv9cIxlJlxUtq2NqHg=="], + + "remark-lint-list-item-indent": ["remark-lint-list-item-indent@4.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-phrasing": "^4.0.0", "pluralize": "^8.0.0", "unified-lint-rule": "^3.0.0", "unist-util-position": "^5.0.0", "unist-util-visit-parents": "^6.0.0" } }, "sha512-gJd1Q+jOAeTgmGRsdMpnRh01DUrAm0O5PCQxE8ttv1QZOV015p/qJH+B4N6QSmcUuPokHLAh9USuq05C73qpiA=="], + "remark-lint-no-html": ["remark-lint-no-html@4.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "unified-lint-rule": "^3.0.0", "unist-util-visit-parents": "^6.0.0" } }, "sha512-d5OD+lp2PJMtIIpCR12uDCGzmmRbYDx+bc2iTIX6Bgo0vprQY0dBG1UXbUT5q8KRijXwOFwBDX6Ogl9atRwCGA=="], + "remark-lint-no-tabs": ["remark-lint-no-tabs@4.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "unified-lint-rule": "^3.0.0", "vfile-location": "^5.0.0" } }, "sha512-+lhGUgY3jhTwWn1x+tTIJNy5Fbs2NcYXCobRY7xeszY0VKPCBF2GyELafOVnr+iTmosXLuhZPp5YwNezQKH9IQ=="], + + "remark-lint-unordered-list-marker-style": ["remark-lint-unordered-list-marker-style@4.0.1", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-phrasing": "^4.0.0", "unified-lint-rule": "^3.0.0", "unist-util-position": "^5.0.0", "unist-util-visit-parents": "^6.0.0", "vfile-message": "^4.0.0" } }, "sha512-HMrVQC0Qbr8ktSy+1lJGRGU10qecL3T14L6s/THEQXR5Tk0wcsLLG0auNvB4r2+H+ClhVO/Vnm1TEosh1OCsfw=="], + "remark-message-control": ["remark-message-control@8.0.0", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-comment-marker": "^3.0.0", "unified-message-control": "^5.0.0", "vfile": "^6.0.0" } }, "sha512-brpzOO+jdyE/mLqvqqvbogmhGxKygjpCUCG/PwSCU43+JZQ+RM+sSzkCWBcYvgF3KIAVNIoPsvXjBkzO7EdsYQ=="], "remark-parse": ["remark-parse@11.0.0", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-from-markdown": "^2.0.0", "micromark-util-types": "^2.0.0", "unified": "^11.0.0" } }, "sha512-FCxlKLNGknS5ba/1lmpYijMUzX2esxW5xQqjWxw2eHFfS2MSdaHVINFmhjo+qN1WhZhNimq0dZATN9pH0IDrpA=="], diff --git a/tools/index.ts b/tools/index.ts index a6d75ad..d82e59e 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -8,12 +8,17 @@ import remarkLintNoHtml from "remark-lint-no-html"; import type { VFile } from "vfile"; import { parseArgs } from "util"; +import remarkLintNoTabs from "remark-lint-no-tabs"; +import remarkLintUnorderedListMarkerStyle from "remark-lint-unordered-list-marker-style"; +import remarkLintListItemContentIndent from "remark-lint-list-item-content-indent"; +import remarkLintListItemIndent from 
"remark-lint-list-item-indent"; const { values } = parseArgs({ args: Bun.argv, options: { quickFix: { type: "boolean", + short: "q", }, }, allowPositionals: true, @@ -27,7 +32,11 @@ const runner = new Runner(await Repo.capabilities().vfiles(), [ new NoTrailingWhitespace(), new RemarkRules([ remarkLintFinalNewline, - [remarkLintNoHtml, { allowComments: false }] + remarkLintNoTabs, + remarkLintListItemContentIndent, + [remarkLintListItemIndent, "one"], + [remarkLintUnorderedListMarkerStyle, '-'], + [remarkLintNoHtml, { allowComments: false }], ]), ]) diff --git a/tools/package.json b/tools/package.json index 5fdd714..42248ce 100644 --- a/tools/package.json +++ b/tools/package.json @@ -13,7 +13,11 @@ "remark-lint": "^10.0.1", "remark-lint-checkbox-content-indent": "^5.0.1", "remark-lint-final-newline": "^3.0.1", + "remark-lint-list-item-content-indent": "^4.0.1", + "remark-lint-list-item-indent": "^4.0.1", "remark-lint-no-html": "^4.0.1", + "remark-lint-no-tabs": "^4.0.1", + "remark-lint-unordered-list-marker-style": "^4.0.1", "remark-parse": "^11.0.0", "remark-stringify": "^11.0.0", "to-vfile": "^8.0.0", From bc164ab945f2846f77755aa2281770976c482175 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 18:27:28 -0700 Subject: [PATCH 093/131] cleanup linter --- tools/bun.lock | 1 + tools/index.ts | 11 +- tools/package.json | 1 + tools/rules/normal/NewLineAfterHeadings.ts | 4 +- .../rules/normal/NoTrailingWhitespace.test.ts | 4 +- tools/rules/normal/NoTrailingWhitespace.ts | 4 +- tools/rules/normal/RemarkRules.ts | 9 +- tools/src/Problem.ts | 36 +++--- tools/src/Registry.ts | 24 ---- tools/src/Repo.ts | 48 +++----- tools/src/Rule.ts | 37 ++++-- tools/src/Runner.ts | 16 +-- tools/src/template-capability.ts | 108 ------------------ 13 files changed, 77 insertions(+), 226 deletions(-) delete mode 100644 tools/src/Registry.ts delete mode 100644 tools/src/template-capability.ts diff --git a/tools/bun.lock b/tools/bun.lock index 9545c0f..c854610 100644 --- a/tools/bun.lock +++ b/tools/bun.lock @@ -18,6 +18,7 @@ "to-vfile": "^8.0.0", "unified": "^11.0.5", "vfile": "^6.0.3", + "vfile-message": "^4.0.3", "vfile-reporter": "^8.1.1", "vfile-reporter-json": "^4.0.0", }, diff --git a/tools/index.ts b/tools/index.ts index d82e59e..68c440a 100644 --- a/tools/index.ts +++ b/tools/index.ts @@ -6,7 +6,6 @@ import { RemarkRules } from "./rules/normal/RemarkRules"; import remarkLintFinalNewline from "remark-lint-final-newline"; import remarkLintNoHtml from "remark-lint-no-html"; import type { VFile } from "vfile"; - import { parseArgs } from "util"; import remarkLintNoTabs from "remark-lint-no-tabs"; import remarkLintUnorderedListMarkerStyle from "remark-lint-unordered-list-marker-style"; @@ -25,12 +24,10 @@ const { values } = parseArgs({ strict: true, }); -console.log(); - -const runner = new Runner(await Repo.capabilities().vfiles(), [ +const runner = new Runner(await Repo.all(), [ new NewLineAfterHeadings(), new NoTrailingWhitespace(), - new RemarkRules([ + new RemarkRules([ // See additional rules here: https://github.com/remarkjs/remark-lint/tree/main?tab=readme-ov-file#rules remarkLintFinalNewline, remarkLintNoTabs, remarkLintListItemContentIndent, @@ -48,8 +45,8 @@ if (runner.issuesWereFound()) { } else { runner.print() } - process.exit(1) + process.exit(1) // tell the pipeline to fail } else { console.log("No issues found") - process.exit(0) + process.exit(0) // tell the pipeline to pass } diff --git a/tools/package.json b/tools/package.json index 42248ce..f63f488 100644 --- 
a/tools/package.json +++ b/tools/package.json @@ -23,6 +23,7 @@ "to-vfile": "^8.0.0", "unified": "^11.0.5", "vfile": "^6.0.3", + "vfile-message": "^4.0.3", "vfile-reporter": "^8.1.1", "vfile-reporter-json": "^4.0.0" } diff --git a/tools/rules/normal/NewLineAfterHeadings.ts b/tools/rules/normal/NewLineAfterHeadings.ts index e3efcc1..c078794 100644 --- a/tools/rules/normal/NewLineAfterHeadings.ts +++ b/tools/rules/normal/NewLineAfterHeadings.ts @@ -9,8 +9,8 @@ export class NewLineAfterHeadings extends Rule for (let i = 0; i < lines.length; i++) { if (lines[i]?.charAt(0) === '#' && i+1 < lines.length && lines[i + 1] !== '') { this.report(filename, 'new-line-after-headings', 'You must have a new line after headings.', { - row: i + 2, - col: 1, + line: i + 2, + column: 1, }) } } diff --git a/tools/rules/normal/NoTrailingWhitespace.test.ts b/tools/rules/normal/NoTrailingWhitespace.test.ts index 8770c9f..3c4b2ee 100644 --- a/tools/rules/normal/NoTrailingWhitespace.test.ts +++ b/tools/rules/normal/NoTrailingWhitespace.test.ts @@ -18,8 +18,8 @@ This is not correct. rule.run(content) const problems = rule.getProblems() expect(problems).not.toBeEmpty() - expect(problems[0]?.getFileLocation().row).toEqual(2) - expect(problems[0]?.getFileLocation().col).toEqual(15) + expect(problems[0]?.getFileLocation().line).toEqual(2) + expect(problems[0]?.getFileLocation().column).toEqual(15) }) it('should not fail when no lines have trailing white space', () => { const rule = mkRule() diff --git a/tools/rules/normal/NoTrailingWhitespace.ts b/tools/rules/normal/NoTrailingWhitespace.ts index f6aafdd..8bc7d8f 100644 --- a/tools/rules/normal/NoTrailingWhitespace.ts +++ b/tools/rules/normal/NoTrailingWhitespace.ts @@ -9,8 +9,8 @@ export class NoTrailingWhitespace extends Rule for (const [index, line] of lines.entries()) { if (line !== line.trimEnd()) { this.report(filename, 'no-trailing-white-space', 'Trailing whitespace is not allowed', { - row: index + 1, - col: line.trimEnd().length + 1, + line: index + 1, + column: line.trimEnd().length + 1, }) } } diff --git a/tools/rules/normal/RemarkRules.ts b/tools/rules/normal/RemarkRules.ts index 5ae27ee..75bf657 100644 --- a/tools/rules/normal/RemarkRules.ts +++ b/tools/rules/normal/RemarkRules.ts @@ -11,6 +11,7 @@ export class RemarkRules extends Rule { super() this.list = list } + override async run(file: VFile) { await unified() .use(remarkParse) @@ -20,13 +21,7 @@ export class RemarkRules extends Rule { .process(file) file.messages.forEach(m => { - if (m.file === undefined) return - this.report( - m.file, - m.ruleId || 'unknown-rule', - m.message, - { col: m.column || -1, row: m.line || -1 } - ) + this.reportVfileMessage(m) }) } } diff --git a/tools/src/Problem.ts b/tools/src/Problem.ts index 97c113a..28cc3a4 100644 --- a/tools/src/Problem.ts +++ b/tools/src/Problem.ts @@ -1,16 +1,12 @@ -export type FileLocation = { - col: number - row: number -} +import type { VFileMessage } from "vfile-message" +import type { Point } from 'unist/index.d.ts' export type Level = 'silent' | 'warning' | 'error' -export class Problem { - private id: Ids - private filename: string - private message: string +export class Problem { + private message: VFileMessage private level: Level - private fileLocation: FileLocation + private point: Point private readonly colors = { reset: "\x1b[0m", @@ -23,15 +19,13 @@ export class Problem { gray: "\x1b[90m", }; - constructor(id: Ids, level: Level, filename: string, fileLocation: FileLocation, message: string) { - this.id = id - this.filename = filename + 
constructor(message: VFileMessage, level: Level) { this.message = message this.level = level - this.fileLocation = fileLocation + this.point = message.place as unknown as Point } - getFileLocation = () => this.fileLocation + getFileLocation = () => this.message.place print() { if (this.level === 'silent') return; @@ -40,18 +34,14 @@ export class Problem { const color = this.level === 'error' ? red : yellow; const label = this.level.toUpperCase(); - - console.log( - `${bold}${color}${label}${reset} ${bold}${this.id}${reset}: ${this.message}` - ); - console.log( - ` ${gray}in${reset} ${purple}"${this.filename}"${reset}\n ${gray}at${reset} ${cyan}${this.fileLocation.row}${reset}:${cyan}${this.fileLocation.col}${reset}\n` - ); + + console.log( `${bold}${color}${label}${reset} ${bold}${this.message.ruleId}${reset}: ${this.message.message}`); + console.log( ` ${gray}in${reset} ${purple}"${this.message.file}"${reset}\n ${gray}at${reset} ${cyan}${this.point.line}${reset}:${cyan}${this.point.column}${reset}\n`); } printQuickfix() { if (this.level === 'silent') return; - - const output = `${this.filename}:${this.fileLocation.row}:${this.fileLocation.col}: [${this.id}] ${this.level.toUpperCase()}: ${this.message}`; + const output = `${this.message.file}:${this.point.line || -1}:${this.point.column}: ${this.message.ruleId} -- ${this.level.toUpperCase()}: ${this.message}`; console.log(output); } } + diff --git a/tools/src/Registry.ts b/tools/src/Registry.ts deleted file mode 100644 index 0d829b3..0000000 --- a/tools/src/Registry.ts +++ /dev/null @@ -1,24 +0,0 @@ -import type { Rule } from "./Rule"; - - - -export class Registry { - private rules: Rule[] = [] - register(rule: Rule) { - this.rules.push(rule) - } - run(input: T) { - this.rules.forEach(rule => rule.run(input)) - } - isssuesWereFound() { - return this.rules.map(r => r.hasProblems()).includes(true) - } - print(){ - this.rules.forEach(rule => rule.print()) - } - printQuickFix(){ - this.rules.forEach(rule => rule.printQuickFix()) - } -} - - diff --git a/tools/src/Repo.ts b/tools/src/Repo.ts index f3da613..92a4c01 100644 --- a/tools/src/Repo.ts +++ b/tools/src/Repo.ts @@ -1,42 +1,26 @@ import { readdirSync } from 'node:fs' -import { join } from 'node:path' +import { join, resolve } from 'node:path' import { read } from 'to-vfile'; const ROOT = join(Bun.main, '..','..') -async function getAllFrom(folder: string): Promise { - return await Promise.all(readdirSync(folder) - .map(async (file) => ({ - filename: join(folder, file), - content: await Bun.file(join(folder, file)).text() - }))) -} +const getAllFiles = (dir: string): string[] => + readdirSync(dir, { withFileTypes: true }).flatMap((file) => { + if (file.isDirectory()) return getAllFiles(resolve(dir, file.name)) + return resolve(dir, file.name) + }) -class SourceFolder { - private root: string; - constructor(root: string) { - this.root = root - } - fileNames() { - return readdirSync(this.root) - } - filePaths() { - const root = this.root - return this.fileNames().map((name) => join(root, name)) - } - async vfiles() { - return await Promise.all(this.filePaths().map(async path => { - return await read(path) - })) - } -} +const capabilities = () => getAllFiles(join(ROOT, 'capabilities')) +const practices = () => getAllFiles(join(ROOT, 'practices')) +const resources = () => getAllFiles(join(ROOT, 'resources')) + +const getVfiles = async (paths: string[]) => + await Promise.all(paths.map(async path => await read(path))) export class Repo { - static async getCapabilities() { - return 
getAllFrom(join(ROOT, 'capabilities')) - } - static capabilities(): SourceFolder { - return new SourceFolder(join(ROOT, 'capabilities')) - } + static capabilities = async () => await getVfiles(capabilities()) + static practices = async () => await getVfiles(practices()) + static resources = async () => await getVfiles(resources()) + static all = async () => await getVfiles([ ...capabilities(), ...practices(), ...resources()]) } diff --git a/tools/src/Rule.ts b/tools/src/Rule.ts index 8b83460..3c1243d 100644 --- a/tools/src/Rule.ts +++ b/tools/src/Rule.ts @@ -1,30 +1,49 @@ -import { type FileLocation, type Level, Problem } from "./Problem" +import { VFileMessage } from "vfile-message" +import { type Level, Problem } from "./Problem" +import type { Point } from 'unist/index.d.ts' -type RuleConfig = Record +type RuleConfig = Record export abstract class Rule { - private problems: Problem[] = [] - private config: RuleConfig | null - constructor(config?: RuleConfig) { + private problems: Problem[] = [] + private config: RuleConfig | null + + constructor(config?: RuleConfig) { if (config === undefined) { this.config = null } else { this.config = config } } + abstract run(subject: In): void; - protected report(filename: string, id: Ids, message: string, fileLocation: FileLocation) { - this.problems.push(new Problem(id, this.config === null ? 'error' : this.config[id], filename, fileLocation, message)) - } - getProblems(): Problem[] { + + protected reportVfileMessage(m: VFileMessage) { + let ruleId = m.ruleId + if (this.config === null || ruleId === undefined || this.config[ruleId] === undefined) + this.problems.push(new Problem(m, 'error')) + else + this.problems.push(new Problem(m, this.config[ruleId])) + } + + protected report(file: string, ruleId: Ids, message: string, place: Point) { + const m = new VFileMessage(message, { place, ruleId }) + m.file = file + this.reportVfileMessage(m) + } + + getProblems(): Problem[] { return this.problems } + hasProblems() { return this.problems.length !== 0 } + print() { this.problems.forEach(p => p.print()) } + printQuickFix() { this.problems.forEach(p => p.printQuickfix()) } diff --git a/tools/src/Runner.ts b/tools/src/Runner.ts index f60e8e4..648f68c 100644 --- a/tools/src/Runner.ts +++ b/tools/src/Runner.ts @@ -1,30 +1,26 @@ -import { Registry } from "./Registry"; import type { Rule } from "./Rule"; export class Runner { private content: T[] - private registry: Registry + private rules: Rule[] constructor(content: T[], rules: Rule[]) { this.content = content - this.registry = new Registry() - for (const rule of rules) { - this.registry.register(rule) - } + this.rules = rules } async run() { for (const item of this.content) { - this.registry.run(item) + this.rules.forEach(rule => rule.run(item)) } } print() { - this.registry.print() + this.rules.forEach(rule => rule.print()) } printQuickFix() { - this.registry.printQuickFix() + this.rules.forEach(rule => rule.printQuickFix()) } issuesWereFound() { - return this.registry.isssuesWereFound() + return this.rules.map(rule => rule.hasProblems()).includes(true) } } diff --git a/tools/src/template-capability.ts b/tools/src/template-capability.ts deleted file mode 100644 index 0a5d28f..0000000 --- a/tools/src/template-capability.ts +++ /dev/null @@ -1,108 +0,0 @@ - -type TitleDescription = { - title: string - description: string -} - -type Practice = { - title: string - description: string - url?: string -} - -type AdjacentCapabilities = { - title: string - description: string - url: string - relationship: 
'Related' | 'Upstream' | 'Downstream' -} - -type AssessmentItems = { - minimal: TitleDescription - basic: TitleDescription - good: TitleDescription - excelent: TitleDescription -} - -type Capability = { - title: string - intro: string - doraUrl: string - nuances: TitleDescription[] - assessment: AssessmentItems - practices: Practice[] - adjacentCapabilities: AdjacentCapabilities[] -} - -const displayAdjacentCapabilities = ({title, description, url, relationship}: AdjacentCapabilities) => `### [${title}](${url}) - ${relationship} - -${description}` - -const displayAssessemntItem = ({title, description}: TitleDescription) => - `**${title}:** ${description}` - -const displayPractice = ({ title, description, url: link }: Practice) => { - if (link) { - return `### ${title} - -${description}` - } - return -} - -export function template_thingy ({ - title, - intro, - doraUrl, - nuances, - assessment, - practices, - adjacentCapabilities -}: Capability) { - const assessmentIntro = `To assess how mature your team or organization is in this capability, complete this short exercise. - -Don't worry if the description doesn't exactly match your situation. These descriptions are meant to be examples of situations that would qualify for the associated score.` - const assessmentOutro = `The number you selected represents your overall score for this capability. If you feel like your company fits somewhere in between two scores, it's okay to use a decimal. Generally, an overall score equal to or less than 3 means you'll likely gain a lot of value from experimenting with some of the supporting practices listed here. An overall score higher than 3 generally means you and your team are largely proficient, or well on your way to becoming proficient. Instead you would likely benefit from evaluating your scores in other capabilities.` - - - return `# [${title}](${doraUrl}) - -${intro} - -## Nuances - -${nuances.map(nuance => `${nuance.title} - -${nuance.description} - -`)} - -## Assessment - -${assessmentIntro} - -1. ${displayAssessemntItem(assessment.minimal)} -2. ${displayAssessemntItem(assessment.basic)} -3. ${displayAssessemntItem(assessment.good)} -4. ${displayAssessemntItem(assessment.excelent)} - -${assessmentOutro} - -## Supporting Practices - -The following is a curated list of supporting practices to consider when looking to improve your team's ${title} capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. - -${practices.map(displayPractice)} - -## Adjacent Capabilities - -The following capabilities will be valuable for you and your team to explore, as they are either: - -- Related (they cover similar territory to ${title}) -- Upstream (they are a pre-requisite for ${title}) -- Downstream (${title} is a pre-requisite for them) - -${adjacentCapabilities.map(displayAdjacentCapabilities)} -` -} - From f5268cd77792017e3d8cbe2d462ac7cfde1cd9ff Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 18:54:52 -0700 Subject: [PATCH 094/131] improve readme --- tools/README.md | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/tools/README.md b/tools/README.md index 40e1d1f..6b5fa6f 100644 --- a/tools/README.md +++ b/tools/README.md @@ -1,12 +1,9 @@ # Open Practice Repository Tooling -Roadmap of features that we will want to implement starting with very simple linting. +There are two ways you can contribute a new rule. 
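+Whichever route you take, a rule ends up as a class that extends `Rule<VFile>`, inspects the file, and calls `this.report(...)` with a rule id, a message, and a line/column position. Here is a minimal sketch, modeled on the rules already in `tools/rules/normal` (the rule name and the check itself are made up for illustration):
+
+```ts
+import type { VFile } from "vfile";
+import { Rule } from "../../src/Rule";
+
+// Illustrative rule: flag files that contain no content at all.
+export class NoEmptyFiles extends Rule<VFile> {
+  override run(file: VFile) {
+    const content = file.value.toString();
+    if (content.trim() === "") {
+      // report(file, ruleId, message, position)
+      this.report(file.path, "no-empty-files", "Files must not be empty.", {
+        line: 1,
+        column: 1,
+      });
+    }
+  }
+}
+```
+
+To wire it in, add `new NoEmptyFiles()` to the list of rules passed to the `Runner` in `tools/index.ts`.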
-## Linting +First, add new rules from remark in the `RemarkRules` constructore. See a list of those here: // See additional rules here: https://github.com/remarkjs/remark-lint/tree/main?tab=readme-ov-file#rules -Features: -- [x] lint all capabilities - -Rules: -- [x] New Line after all headings +Second is to duplicate an existing rule in the `tools/rules/normal` folder and replace your logic with your own. +Giving AI one rule and asking it to make a new rule from you based on the provided example might be a decent way to generate new rules if you are not satisfied with what is available currently. From a28709b93a9fcf757b966ddb72c8a9e5838400a3 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Fri, 23 Jan 2026 18:58:59 -0700 Subject: [PATCH 095/131] remove unused global types --- tools/types.d.ts | 11 ----------- 1 file changed, 11 deletions(-) delete mode 100644 tools/types.d.ts diff --git a/tools/types.d.ts b/tools/types.d.ts deleted file mode 100644 index 000cc9b..0000000 --- a/tools/types.d.ts +++ /dev/null @@ -1,11 +0,0 @@ -declare global { - var paths: { - capabilities: string - practices: string - resources: string - templates: string - }; -} - - -export {}; From 5e0f3bb7da14c90264088acfdd4085cdf26925e1 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 26 Jan 2026 09:47:21 -0700 Subject: [PATCH 096/131] clean up --- tools/README.md | 2 +- tools/rules/normal/NewLineAfterHeadings.test.ts | 2 +- tools/rules/normal/NoTrailingWhitespace.test.ts | 10 +++++++--- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/tools/README.md b/tools/README.md index 6b5fa6f..565858f 100644 --- a/tools/README.md +++ b/tools/README.md @@ -2,7 +2,7 @@ There are two ways you can contribute a new rule. -First, add new rules from remark in the `RemarkRules` constructore. See a list of those here: // See additional rules here: https://github.com/remarkjs/remark-lint/tree/main?tab=readme-ov-file#rules +First, add new rules from remark in the `RemarkRules` constructore. See a list of those [here](https://github.com/remarkjs/remark-lint/tree/main?tab=readme-ov-file#rules). Second is to duplicate an existing rule in the `tools/rules/normal` folder and replace your logic with your own. diff --git a/tools/rules/normal/NewLineAfterHeadings.test.ts b/tools/rules/normal/NewLineAfterHeadings.test.ts index d8ed4c7..ddea798 100644 --- a/tools/rules/normal/NewLineAfterHeadings.test.ts +++ b/tools/rules/normal/NewLineAfterHeadings.test.ts @@ -22,7 +22,7 @@ This is not correct. const rule = mkRule() rule.run(mkInput(`# Some Heading -This is not correct. +This is correct. `)) expect(rule.getProblems()).toBeEmpty() }) diff --git a/tools/rules/normal/NoTrailingWhitespace.test.ts b/tools/rules/normal/NoTrailingWhitespace.test.ts index 3c4b2ee..a3f2720 100644 --- a/tools/rules/normal/NoTrailingWhitespace.test.ts +++ b/tools/rules/normal/NoTrailingWhitespace.test.ts @@ -1,6 +1,7 @@ import { describe, it, expect } from 'bun:test' import { NoTrailingWhitespace } from './NoTrailingWhitespace' import type { VFile } from 'vfile' +import {type Point} from 'unist/index' const mkRule = () => new NoTrailingWhitespace({'no-trailing-white-space': 'silent'}) @@ -8,6 +9,7 @@ const mkInput = (content: string) => { return { path: "mock-thing.md", value: Buffer.from(content) } as unknown as VFile } + describe(NoTrailingWhitespace.name, () => { it('should fail when lines have trailing whitespace', () => { const rule = mkRule() @@ -18,13 +20,15 @@ This is not correct. 
rule.run(content) const problems = rule.getProblems() expect(problems).not.toBeEmpty() - expect(problems[0]?.getFileLocation().line).toEqual(2) - expect(problems[0]?.getFileLocation().column).toEqual(15) + + const location = (problems[0]?.getFileLocation() as Point) + expect(location.line).toEqual(2) + expect(location.column).toEqual(15) }) it('should not fail when no lines have trailing white space', () => { const rule = mkRule() const content = mkInput(`# Some Heading -This is not correct. +This is correct. - cool - beans `) From b474c702bf23f1fe6ded102dac5312a89771b192 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Mon, 26 Jan 2026 12:41:24 -0800 Subject: [PATCH 097/131] Add basic bun information to the tools readme --- tools/README.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/tools/README.md b/tools/README.md index 565858f..f8d8d15 100644 --- a/tools/README.md +++ b/tools/README.md @@ -1,5 +1,18 @@ # Open Practice Repository Tooling +This folder is where we store some helpful tooling to keep our repository organized. We currently have a linter, and might include more in future versions. + +## Running + +To run all of the linting rules, you'll have to `cd` into this directory and fun the following commands: + +```bash +bun install +bun index.ts +``` + +## Contributing + There are two ways you can contribute a new rule. First, add new rules from remark in the `RemarkRules` constructore. See a list of those [here](https://github.com/remarkjs/remark-lint/tree/main?tab=readme-ov-file#rules). From ece114f752cda976044977f913be328e7d1ab07c Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Thu, 18 Dec 2025 15:08:12 -0700 Subject: [PATCH 098/131] use data generation tools --- practices/use-data-generation-tools.md | 89 ++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) create mode 100644 practices/use-data-generation-tools.md diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md new file mode 100644 index 0000000..4132f83 --- /dev/null +++ b/practices/use-data-generation-tools.md @@ -0,0 +1,89 @@ +# Use Data Generation Tools + +Data generation tools are namely about properly managing testing data. No +matter what type of test you are running there is an appropriate way to handle +your data so your tests are fast and reliable. + +For unit tests and other isolated tests data management might be as simple +defining variables or objects. This should almost always be your first choice. +When a need arises a simple [factory +method](https://refactoring.guru/design-patterns/factory-method) or factory +library like [Fishery](https://github.com/thoughtbot/fishery) will improve the +maintainability and readability. If your solution doesn't feel like it's making +things simpler, redirect to one that does. + +For integrated tests like E2E tests or integration tests, data generation might +be as simple as sql scripts to initialize and tear down test data before and +after your tests. Once data has grown in complexity or if you are dealing with +a complex legacy solution, introducing specialized tools like RedGate, dbForge, +SSDT or one of the many other SQL tool sets that fits your companies needs will +greatly improve. + +The core idea here whether in an isolated environment or in an integrated +environment is to ensure you keep your tests clear and to follow the "Arrange, +Act, Assert" pattern and that each test cleans up after its self so they are +reliable and easy to understand. 
Introduce tools as needs arise waiting to feel +the need for that tool before pulling the trigger and increasing the complexity +dependency structure. + +## When to Experiment + +You are a Developer and need to ensure that test data is easily managed so that +you can maintain a high quality developer experience and retain the users +positive experience. + +## How to Gain Traction + +First, bring up the need you have for data generation tools with your team so +you can gain consensus and ensure you are thinking of everyone's needs. + +Suggest a few tools and discuss different options with your team while being +understanding of the needs of your DevOps and DBA teams. + +Implement the agreed upon solution. + +## Lessons From The Field + +### Be careful about tests that depend on each other + +When dealing with data that can cross test boundaries like data inside your +database or global variables (`window` & `document` in a web context) make sure +that each test you write does not end up dependent on the setup or result of +another test. You can easily check this by running each of your tests in +isolation. If a test only passes when other tests are also run, some +modification needs to be made to decouple the tests from each other. + +## Deciding to Polish or Pitch + +After experimenting with this practice for 2-3 weeks, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: + +### Feedback Cycles + +Are your tests providing faster feedback? + +### Maintainability + +Are your tests easier to write and work with? + +### Improved Test Reliability + +Are your tests less flaky and more reliable? + +## Supported Capabilities + +### Test Data Management + +[Test Data Management](/capabilities/test-data-management.md) + +### Database Change Management + +[Database Change Management](/capabilities/database-change-management.md) + +### Continuous Delivery + +[Continuous Delivery](/capabilities/continuous-delivery.md) + +### Continuous Integration + +[Continuous Integration](/capabilities/continuous-integration.md) + From 940d5941b0d569da0f5a3e92d499989b5b97b6b7 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 31 Dec 2025 11:37:38 -0800 Subject: [PATCH 099/131] Data Gen Tools: simplify intro --- practices/use-data-generation-tools.md | 55 ++++++++------------------ 1 file changed, 17 insertions(+), 38 deletions(-) diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md index 4132f83..e20197d 100644 --- a/practices/use-data-generation-tools.md +++ b/practices/use-data-generation-tools.md @@ -1,57 +1,36 @@ # Use Data Generation Tools -Data generation tools are namely about properly managing testing data. No -matter what type of test you are running there is an appropriate way to handle -your data so your tests are fast and reliable. - -For unit tests and other isolated tests data management might be as simple -defining variables or objects. This should almost always be your first choice. -When a need arises a simple [factory -method](https://refactoring.guru/design-patterns/factory-method) or factory -library like [Fishery](https://github.com/thoughtbot/fishery) will improve the -maintainability and readability. If your solution doesn't feel like it's making -things simpler, redirect to one that does. - -For integrated tests like E2E tests or integration tests, data generation might -be as simple as sql scripts to initialize and tear down test data before and -after your tests. 
Once data has grown in complexity or if you are dealing with -a complex legacy solution, introducing specialized tools like RedGate, dbForge, -SSDT or one of the many other SQL tool sets that fits your companies needs will -greatly improve. - -The core idea here whether in an isolated environment or in an integrated -environment is to ensure you keep your tests clear and to follow the "Arrange, -Act, Assert" pattern and that each test cleans up after its self so they are -reliable and easy to understand. Introduce tools as needs arise waiting to feel -the need for that tool before pulling the trigger and increasing the complexity -dependency structure. +Data generation tools reduce the complexity of generating complex data types or data rows. + +*Isolated Tests:* When a need arises use a simple [factory method](https://refactoring.guru/design-patterns/factory-method) or factory library like [Fishery](https://github.com/thoughtbot/fishery) to improve the maintainability and readability. + +*Integrated Tests:* Integrated Tests will usually need more data management. There are many good tools for this. For example: + +- RedGate +- dbForge +- SSDT ## When to Experiment -You are a Developer and need to ensure that test data is easily managed so that -you can maintain a high quality developer experience and retain the users -positive experience. +You are a Developer and need to ensure that test data is easily managed so that you can maintain a high quality developer experience and retain the users positive experience. ## How to Gain Traction -First, bring up the need you have for data generation tools with your team so -you can gain consensus and ensure you are thinking of everyone's needs. +First, bring up the need you have for data generation tools with your team so you can gain consensus and ensure you are thinking of everyone's needs. -Suggest a few tools and discuss different options with your team while being -understanding of the needs of your DevOps and DBA teams. +Suggest a few tools and discuss different options with your team while being understanding of the needs of your DevOps and DBA teams. Implement the agreed upon solution. ## Lessons From The Field +### Doint use them until you feel a need + +While data generation tools can be helpful in reducing complexity, if you don't see that complexity yet, consider defering the decision to add new tools until that complexity arises. + ### Be careful about tests that depend on each other -When dealing with data that can cross test boundaries like data inside your -database or global variables (`window` & `document` in a web context) make sure -that each test you write does not end up dependent on the setup or result of -another test. You can easily check this by running each of your tests in -isolation. If a test only passes when other tests are also run, some -modification needs to be made to decouple the tests from each other. +When dealing with data that can cross test boundaries like data inside your database or global variables (`window` & `document` in a web context) make sure that each test you write does not end up dependent on the setup or result of another test. You can easily check this by running each of your tests in isolation. If a test only passes when other tests are also run, some modification needs to be made to decouple the tests from each other. 
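+A minimal sketch of that idea (the `createUser` factory and the in-memory cleanup list are made up for illustration, and `bun:test` appears only because any modern runner looks roughly the same): each test builds exactly the data it needs through a small factory, and `afterEach` tears that data down, so no test leans on what another test left behind.
+
+```ts
+import { describe, it, expect, afterEach } from "bun:test";
+
+// Illustrative factory: every test gets a fresh, valid user and overrides only
+// the fields it actually cares about.
+type User = { id: number; name: string };
+let created: User[] = [];
+let nextId = 1;
+
+function createUser(overrides: Partial<User> = {}): User {
+  const user: User = { id: nextId++, name: "Test User", ...overrides };
+  created.push(user); // remember what this test created...
+  return user;
+}
+
+afterEach(() => {
+  // ...so it can be torn down here (against a real database this would delete rows by id).
+  created = [];
+});
+
+describe("renaming a user", () => {
+  it("builds its own data and cleans up after itself", () => {
+    const user = createUser({ name: "Before" });      // Arrange
+    const renamed: User = { ...user, name: "After" }; // Act (stand-in for the real logic)
+    expect(renamed.name).toBe("After");               // Assert
+  });
+});
+```
+
+The same shape carries over to integrated tests: the factory inserts rows instead of objects, and the cleanup step deletes whatever the test created.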
## Deciding to Polish or Pitch From 9ec886848dae8e9cf7120a0f1c2078eb3f3a30ca Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 31 Dec 2025 15:08:53 -0800 Subject: [PATCH 100/131] Data Gen Tools: improved descriptions --- practices/use-data-generation-tools.md | 28 ++++++++------------------ 1 file changed, 8 insertions(+), 20 deletions(-) diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md index e20197d..9eb9548 100644 --- a/practices/use-data-generation-tools.md +++ b/practices/use-data-generation-tools.md @@ -36,33 +36,21 @@ When dealing with data that can cross test boundaries like data inside your data After experimenting with this practice for 2-3 weeks, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: -### Feedback Cycles +## Fast & Intangible -Are your tests providing faster feedback? - -### Maintainability - -Are your tests easier to write and work with? - -### Improved Test Reliability - -Are your tests less flaky and more reliable? +Your tests should be more *maintainable* after implementing this practice. Specifically, this means you should find yourself less likely to be fiddling with tests for lengthy periods to setup large sets of data. If you have not improved the time or energy to setup data, consider removing the tool and using factories for in memory data structures and raw sql scripts for sql data. ## Supported Capabilities -### Test Data Management - -[Test Data Management](/capabilities/test-data-management.md) - -### Database Change Management +### [Test Data Management](/capabilities/test-data-management.md) -[Database Change Management](/capabilities/database-change-management.md) +The reason we should use Data Generation Tools is pimerily for Test Data Management but Test Data Management can be done without Tools and that should be considered depending on the use case. -### Continuous Delivery +### [Database Change Management](/capabilities/database-change-management.md) -[Continuous Delivery](/capabilities/continuous-delivery.md) +Based on your strategy for Database Change Management or a lack there of in the past, tooling might be an essential part of how you continue or start doing Database Change Management. -### Continuous Integration +### [Continuous Delivery](/capabilities/continuous-delivery.md) & [Continuous Integration](/capabilities/continuous-integration.md) -[Continuous Integration](/capabilities/continuous-integration.md) +Any tests that endup in your pipelines will need Test Data Management which might be done by tools depending on your needs. From f06f9a47e5371332abe6c4b03a474c5ad53f32cd Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 31 Dec 2025 15:19:16 -0800 Subject: [PATCH 101/131] Data Gen Tools: minor fixes --- practices/use-data-generation-tools.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md index 9eb9548..ea6e731 100644 --- a/practices/use-data-generation-tools.md +++ b/practices/use-data-generation-tools.md @@ -6,9 +6,11 @@ Data generation tools reduce the complexity of generating complex data types or *Integrated Tests:* Integrated Tests will usually need more data management. There are many good tools for this. 
For example: -- RedGate -- dbForge -- SSDT +- [RedGate](https://www.red-gate.com/) +- [dbForge](https://www.devart.com/) +- [SSDT](https://learn.microsoft.com/en-us/sql/ssdt/sql-server-data-tools?view=sql-server-ver17) + +These are just examples, it is recommended to research many different tool options before committing to them. ## When to Experiment @@ -24,9 +26,9 @@ Implement the agreed upon solution. ## Lessons From The Field -### Doint use them until you feel a need +### Don't use them until you feel a need -While data generation tools can be helpful in reducing complexity, if you don't see that complexity yet, consider defering the decision to add new tools until that complexity arises. +While data generation tools can be helpful in reducing complexity, if you don't see that complexity yet, consider deferring the decision to add new tools until that complexity arises. ### Be careful about tests that depend on each other @@ -44,7 +46,7 @@ Your tests should be more *maintainable* after implementing this practice. Speci ### [Test Data Management](/capabilities/test-data-management.md) -The reason we should use Data Generation Tools is pimerily for Test Data Management but Test Data Management can be done without Tools and that should be considered depending on the use case. +The reason we should use Data Generation Tools is primarily for Test Data Management but Test Data Management can be done without Tools and that should be considered depending on the use case. ### [Database Change Management](/capabilities/database-change-management.md) @@ -52,5 +54,5 @@ Based on your strategy for Database Change Management or a lack there of in the ### [Continuous Delivery](/capabilities/continuous-delivery.md) & [Continuous Integration](/capabilities/continuous-integration.md) -Any tests that endup in your pipelines will need Test Data Management which might be done by tools depending on your needs. +Any tests that end up in your pipelines will need Test Data Management which might be done by tools depending on your needs. From a97c224f7124f07db9663ed9ccd088e48d482a54 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Fri, 9 Jan 2026 10:12:15 -0600 Subject: [PATCH 102/131] initial edit of use data-generation tools practice --- practices/use-data-generation-tools.md | 34 ++++++++++++-------------- 1 file changed, 15 insertions(+), 19 deletions(-) diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md index ea6e731..ffdb890 100644 --- a/practices/use-data-generation-tools.md +++ b/practices/use-data-generation-tools.md @@ -1,24 +1,24 @@ -# Use Data Generation Tools +# Use Data-generation Tools -Data generation tools reduce the complexity of generating complex data types or data rows. +[When you/your team/developers do XYZ (run different types of tests?), they need to generate complex data types or data rows, to provide the system with a wide range of scenarios. But generating this complex data is...error-prone/time-consuming/burdensome/etc.] -*Isolated Tests:* When a need arises use a simple [factory method](https://refactoring.guru/design-patterns/factory-method) or factory library like [Fishery](https://github.com/thoughtbot/fishery) to improve the maintainability and readability. +Data-generation tools reduce the complexity of generating complex data types or data rows. [These tools are helpful when running isolated or integrated tests.] -*Integrated Tests:* Integrated Tests will usually need more data management. There are many good tools for this. 
For example: +*Isolated Tests:* When a need arises use a simple [factory method](https://refactoring.guru/design-patterns/factory-method) or factory library like [Fishery](https://github.com/thoughtbot/fishery) to improve the maintainability and readability. +*Integrated Tests:* Integrated tests will usually need more data management. There are many good tools for test data management, including but not limited to: - [RedGate](https://www.red-gate.com/) - [dbForge](https://www.devart.com/) - [SSDT](https://learn.microsoft.com/en-us/sql/ssdt/sql-server-data-tools?view=sql-server-ver17) -These are just examples, it is recommended to research many different tool options before committing to them. - ## When to Experiment -You are a Developer and need to ensure that test data is easily managed so that you can maintain a high quality developer experience and retain the users positive experience. +- You are a developer and need to ensure that test data is easily managed so that you can maintain a high-quality developer and user experience. ## How to Gain Traction -First, bring up the need you have for data generation tools with your team so you can gain consensus and ensure you are thinking of everyone's needs. +### Start With Collaboration +First, bring the team together and explain your rationale for needing to use data-generation tools. Listen to the feedback and ask the team to express their needs. Consider many perspectives before making any decisions. Suggest a few tools and discuss different options with your team while being understanding of the needs of your DevOps and DBA teams. @@ -26,33 +26,29 @@ Implement the agreed upon solution. ## Lessons From The Field -### Don't use them until you feel a need - -While data generation tools can be helpful in reducing complexity, if you don't see that complexity yet, consider deferring the decision to add new tools until that complexity arises. - -### Be careful about tests that depend on each other +- _Don't Use Data-generation Tools Until (and Unless) There is a Need_ - While data-generation tools can be helpful in reducing complexity in test data management, if you don't *see* that complexity yet, consider waiting to adopt new tools until that complexity arises. There are use cases where test data management can be done without tools. -When dealing with data that can cross test boundaries like data inside your database or global variables (`window` & `document` in a web context) make sure that each test you write does not end up dependent on the setup or result of another test. You can easily check this by running each of your tests in isolation. If a test only passes when other tests are also run, some modification needs to be made to decouple the tests from each other. +- _Be Careful About Tests That Depend on Each Other_ - When dealing with data that can cross test boundaries, like data inside your database or global variables (`window` & `document` in a web context), make sure that each test you write is independent of the setup or result of another test. You can easily check this by running each of your tests in isolation. If a test only passes when other tests are also run, then some modification needs to be made to decouple the tests from each other. 
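As a companion to the *Isolated Tests* guidance above, here is a minimal sketch of the kind of factory a library like [Fishery](https://github.com/thoughtbot/fishery) provides. The `User` shape and its fields are assumptions made for illustration only.

```typescript
// Minimal sketch of a Fishery factory for in-memory test data.
// The User interface is hypothetical; substitute your own domain types.
import { Factory } from 'fishery';

interface User {
  id: number;
  email: string;
  isAdmin: boolean;
}

const userFactory = Factory.define<User>(({ sequence }) => ({
  id: sequence,
  email: `user${sequence}@example.com`,
  isAdmin: false,
}));

// Each test states only the details it cares about; the factory fills in the rest.
const admin = userFactory.build({ isAdmin: true });
const readers = userFactory.buildList(3);
```

A hand-rolled factory function achieves the same result at small scale; a library mainly adds sequences, overrides, and associations once the duplication becomes painful.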
## Deciding to Polish or Pitch After experimenting with this practice for 2-3 weeks, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: -## Fast & Intangible +### Fast & Intangible -Your tests should be more *maintainable* after implementing this practice. Specifically, this means you should find yourself less likely to be fiddling with tests for lengthy periods to setup large sets of data. If you have not improved the time or energy to setup data, consider removing the tool and using factories for in memory data structures and raw sql scripts for sql data. +**Tests should be more maintainable**. You should be less likely to fiddle with tests for lengthy periods of time, setting up large sets of data. If the tool has not improved the time or energy it takes to set up test data, then consider removing the tool and using factories for in-memory data structures and raw sql scripts for sql data. ## Supported Capabilities ### [Test Data Management](/capabilities/test-data-management.md) -The reason we should use Data Generation Tools is primarily for Test Data Management but Test Data Management can be done without Tools and that should be considered depending on the use case. +To implement an effective Test Data Management strategy, teams should leverage tools that automate test data creation based on predefined schemas and rules. Such data-generation tools help teams create relevant and varied datasets, enabling them to cover a wider range of test scenarios. By automating the process of test data creation, teams reduce the time and effort spent on data management and improve test coverage. ### [Database Change Management](/capabilities/database-change-management.md) -Based on your strategy for Database Change Management or a lack there of in the past, tooling might be an essential part of how you continue or start doing Database Change Management. +Depending on your strategy, data-generation tooling might be an essential part of how you continue or start doing Database Change Management. ### [Continuous Delivery](/capabilities/continuous-delivery.md) & [Continuous Integration](/capabilities/continuous-integration.md) -Any tests that end up in your pipelines will need Test Data Management which might be done by tools depending on your needs. +Any tests that end up in your CI/CD pipelines will need test data to be managed. This might be done by data-generation tools, depending on your needs. From e35693274a6b420e1167adb0efe72c2f7c4390c7 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Tue, 13 Jan 2026 15:17:54 -0700 Subject: [PATCH 103/131] Use data-generation tools: Improve intro and experement sections --- practices/use-data-generation-tools.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md index ffdb890..2f62d91 100644 --- a/practices/use-data-generation-tools.md +++ b/practices/use-data-generation-tools.md @@ -1,19 +1,21 @@ # Use Data-generation Tools -[When you/your team/developers do XYZ (run different types of tests?), they need to generate complex data types or data rows, to provide the system with a wide range of scenarios. But generating this complex data is...error-prone/time-consuming/burdensome/etc.] +Data-generation tools shine brightest when used with end-to-end or integration tests. 
They _can_ be used with more isolated tests or unit tests but this is often a symptom of a larger issues and should be used with discretion. -Data-generation tools reduce the complexity of generating complex data types or data rows. [These tools are helpful when running isolated or integrated tests.] +Data-generation tools simplify the process of generating complex data types or adding complex data to databases. These jobs can be done with SQL scripts or API Requests but as a system grows developers will feel the need to reach for more powerful strategies to manage that data which is where Data-generation tools come into play. -*Isolated Tests:* When a need arises use a simple [factory method](https://refactoring.guru/design-patterns/factory-method) or factory library like [Fishery](https://github.com/thoughtbot/fishery) to improve the maintainability and readability. +*Isolated Tests:* As setup complexity of your unit and isolated tests grows you will want to reach for simple solutions like [factory method](https://refactoring.guru/design-patterns/factory-method) before introducing third party tools. If efforts to reduce code duplication with factory methods and simple design patterns continues to fail, libraries like [Fishery](https://github.com/thoughtbot/fishery) may improve maintainability and readability. + +*Integrated Tests:* Integration and end-to-end tests are difficult to setup because of their multi-process nature. You can still achieve a satisfactory solution without needing third party data-generation tools but for large projects you will quite often find your self needing data-generation tools like the following: -*Integrated Tests:* Integrated tests will usually need more data management. There are many good tools for test data management, including but not limited to: - [RedGate](https://www.red-gate.com/) - [dbForge](https://www.devart.com/) - [SSDT](https://learn.microsoft.com/en-us/sql/ssdt/sql-server-data-tools?view=sql-server-ver17) ## When to Experiment -- You are a developer and need to ensure that test data is easily managed so that you can maintain a high-quality developer and user experience. +- You are a developer that needs to setup, tear down or reset large amounts of data before and after integration or end-to-end tests. +- You are a developer writing unit or isolated tests that have significant duplication of data setup that cannot be solved with simple design pattern changes and refactors. ## How to Gain Traction From 9bc142e81166497bd987e30ec89cfa601ff6daee Mon Sep 17 00:00:00 2001 From: nicoletache Date: Thu, 15 Jan 2026 09:50:35 -0600 Subject: [PATCH 104/131] edits to updates --- practices/use-data-generation-tools.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md index 2f62d91..bf33a92 100644 --- a/practices/use-data-generation-tools.md +++ b/practices/use-data-generation-tools.md @@ -1,20 +1,19 @@ # Use Data-generation Tools -Data-generation tools shine brightest when used with end-to-end or integration tests. They _can_ be used with more isolated tests or unit tests but this is often a symptom of a larger issues and should be used with discretion. +Generating complex data types or adding complex data to databases can typically be done with SQL scripts or API requests. As a system grows, however, developers will feel the need to reach for more powerful strategies to manage that data. 
This is where data-generation tools step in and simplify the process. -Data-generation tools simplify the process of generating complex data types or adding complex data to databases. These jobs can be done with SQL scripts or API Requests but as a system grows developers will feel the need to reach for more powerful strategies to manage that data which is where Data-generation tools come into play. - -*Isolated Tests:* As setup complexity of your unit and isolated tests grows you will want to reach for simple solutions like [factory method](https://refactoring.guru/design-patterns/factory-method) before introducing third party tools. If efforts to reduce code duplication with factory methods and simple design patterns continues to fail, libraries like [Fishery](https://github.com/thoughtbot/fishery) may improve maintainability and readability. - -*Integrated Tests:* Integration and end-to-end tests are difficult to setup because of their multi-process nature. You can still achieve a satisfactory solution without needing third party data-generation tools but for large projects you will quite often find your self needing data-generation tools like the following: +Data-generation tools shine brightest when used with end-to-end or integration tests. They _can_ be used with more isolated tests or unit tests but this is often a symptom of a larger issue and should be used with discretion. +*Integrated Tests:* Integration and end-to-end tests are difficult to set up because of their multi-process nature. You can achieve a satisfactory solution without using third-party data-generation tools, but for large projects developers will quite often need data-generation tools like the following: - [RedGate](https://www.red-gate.com/) - [dbForge](https://www.devart.com/) - [SSDT](https://learn.microsoft.com/en-us/sql/ssdt/sql-server-data-tools?view=sql-server-ver17) +*Isolated Tests:* As setup complexity of unit and isolated tests grows, developers will want to reach for simple solutions like [factory method](https://refactoring.guru/design-patterns/factory-method) before introducing third-party tools. If efforts to reduce code duplication with factory methods and simple design patterns continues to fail, then libraries like [Fishery](https://github.com/thoughtbot/fishery) may improve maintainability and readability. + ## When to Experiment -- You are a developer that needs to setup, tear down or reset large amounts of data before and after integration or end-to-end tests. +- You are a developer who needs to set up, tear down, or reset large amounts of data before and after integration or end-to-end tests. - You are a developer writing unit or isolated tests that have significant duplication of data setup that cannot be solved with simple design pattern changes and refactors. ## How to Gain Traction From ee3887d21d32d7347e818f02ae841fcd59ea1b36 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 26 Jan 2026 16:07:02 -0700 Subject: [PATCH 105/131] refine use-data-generation-tools --- practices/use-data-generation-tools.md | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md index bf33a92..a7dfc6d 100644 --- a/practices/use-data-generation-tools.md +++ b/practices/use-data-generation-tools.md @@ -4,7 +4,7 @@ Generating complex data types or adding complex data to databases can typically Data-generation tools shine brightest when used with end-to-end or integration tests. 
They _can_ be used with more isolated tests or unit tests but this is often a symptom of a larger issue and should be used with discretion. -*Integrated Tests:* Integration and end-to-end tests are difficult to set up because of their multi-process nature. You can achieve a satisfactory solution without using third-party data-generation tools, but for large projects developers will quite often need data-generation tools like the following: +*Integrated Tests:* Integration and end-to-end tests are difficult to set up because of their multi-process nature. In these evironments, your code isn't running in issolation. It needs to communicate across the network or between different processes on the same server. You can achieve a satisfactory solution without using third-party data-generation tools, but for projects serving tens of thousands or millions of users, developers will quite often need data-generation tools like the following: - [RedGate](https://www.red-gate.com/) - [dbForge](https://www.devart.com/) - [SSDT](https://learn.microsoft.com/en-us/sql/ssdt/sql-server-data-tools?view=sql-server-ver17) @@ -19,17 +19,26 @@ Data-generation tools shine brightest when used with end-to-end or integration t ## How to Gain Traction ### Start With Collaboration -First, bring the team together and explain your rationale for needing to use data-generation tools. Listen to the feedback and ask the team to express their needs. Consider many perspectives before making any decisions. -Suggest a few tools and discuss different options with your team while being understanding of the needs of your DevOps and DBA teams. +Start by bring the team together and explaining your rationale for needing to use data-generation tools. Make sure your problem is clearly articulated and simple to understand. Sometimes your problem will be solvable in a simpler manner without over investing in an external framework. Listen to feedback and be open to others solutions and perspectives. -Implement the agreed upon solution. +### Pilot + +Once you've agreed on a few potential solutions, set a time-box and run a pilot with the options. During the pilot try to strike the balance of investing as little as possible to see if a tool is a viable solution and giving the product a genuine chance to show productivity gains. Keep your team and other stakeholders in the loop as you iterate and take intentional notes about its pros and cons of each tool. + +### Present and Options Paper + +Once your pilot is over, create a document with each choice listed out. Add a small description and the pros and cons you found as your investigated different options. Find a time to re-convene and present your findings seeking to keep personal bias at bay. + +### Iterate and Improve + +Once that choice has been made, always be prepared to be flexible and iterate on the solution. ## Lessons From The Field -- _Don't Use Data-generation Tools Until (and Unless) There is a Need_ - While data-generation tools can be helpful in reducing complexity in test data management, if you don't *see* that complexity yet, consider waiting to adopt new tools until that complexity arises. There are use cases where test data management can be done without tools. +- _Don't Use Data-generation Tools Until (and Unless) There is a Need_ - While data-generation tools can be helpful in reducing complexity in test data management, if you don't *see* that complexity yet, consider waiting to adopt new tools until that complexity arises. 
There are use cases where test data management can be done without extraneous tools. -- _Be Careful About Tests That Depend on Each Other_ - When dealing with data that can cross test boundaries, like data inside your database or global variables (`window` & `document` in a web context), make sure that each test you write is independent of the setup or result of another test. You can easily check this by running each of your tests in isolation. If a test only passes when other tests are also run, then some modification needs to be made to decouple the tests from each other. +- _Be Careful About Tests That Depend on Each Other_ - Most test that will require data generation tools end up being across significant application boundaries. When dealing with setup for such tests, like data inside your database or global variables (`window` & `document` in a web context), make sure that each test you write is independent of the setup or result of another test. This will likely require some thought when setting up your data generation tools. You can easily check this by running each of your tests in isolation. If a test only passes when other tests are also run, then some modification needs to be made to decouple the tests from each other. ## Deciding to Polish or Pitch From d8ef59bdac584c9b4cccc56d5676e3428e74cb84 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Mon, 26 Jan 2026 17:24:48 -0700 Subject: [PATCH 106/131] temporarily remove CI from use-data generation tools --- practices/use-data-generation-tools.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/use-data-generation-tools.md b/practices/use-data-generation-tools.md index a7dfc6d..1571c08 100644 --- a/practices/use-data-generation-tools.md +++ b/practices/use-data-generation-tools.md @@ -58,7 +58,7 @@ To implement an effective Test Data Management strategy, teams should leverage t Depending on your strategy, data-generation tooling might be an essential part of how you continue or start doing Database Change Management. -### [Continuous Delivery](/capabilities/continuous-delivery.md) & [Continuous Integration](/capabilities/continuous-integration.md) +### [Continuous Delivery](/capabilities/continuous-delivery.md) Any tests that end up in your CI/CD pipelines will need test data to be managed. This might be done by data-generation tools, depending on your needs. From 6c42d3ade3a7b3b77b1757647f9e467270fb9cbd Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Mon, 26 Jan 2026 20:41:04 -0800 Subject: [PATCH 107/131] Update the assessment and list of capabilities to include the new AI tagged capabilities --- README.md | 45 +++-- capabilities-maturity-assessment.md | 265 ++++++++++++++++------------ 2 files changed, 170 insertions(+), 140 deletions(-) diff --git a/README.md b/README.md index 1b8e383..0d7d5d0 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ Each of these practices maps to one or more [DORA capabilities](#dora-capabiliti Material in this repository supports Pragmint's cyclical **S.T.E.P.** framework: -* **Survey:** Use our [open-source assessment](/capabilities-maturity-assessment.md) to measure your team's maturity against the 29 DORA Capabilities. +* **Survey:** Use our [open-source assessment](/capabilities-maturity-assessment.md) to measure your team's maturity against the DORA Capabilities. * **Target:** Identify Capabilities where there are significant gaps in adoption, and prioritize improving on those that will deliver the highest impact. 
@@ -18,43 +18,40 @@ Material in this repository supports Pragmint's cyclical **S.T.E.P.** framework: ## DORA Capabilities -### Capabilities that enable a Climate for Learning - +* [AI-accessible Internal Data](/capabilities/ai-accessible-internal-data.md) +* [Clear and Communicated AI Stance](/capabilities/clear-and-communicated-ai-stance.md) * [Code Maintainability](/capabilities/code-maintainability.md) +* [Continuous Delivery](/capabilities/continuous-delivery.md) +* [Continuous Integration](/capabilities/continuous-integration.md) +* [Customer Feedback](/capabilities/customer-feedback.md) +* [Database Change Management](/capabilities/database-change-management.md) +* [Deployment Automation](/capabilities/deployment-automation.md) * [Documentation Quality](/capabilities/documentation-quality.md) * [Empowering Teams To Choose Tools](/capabilities/empowering-teams-to-choose-tools.md) +* [Flexible Infrastructure](/capabilities/flexible-infrastructure.md) * [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) +* [Healthy Data Ecosystems](/capabilities/healthy-data-ecosystems.md) * [Job Satisfaction](/capabilities/job-satisfaction.md) * [Learning Culture](/capabilities/learning-culture.md) -* [Team Experimentation](/capabilities/team-experimentation.md) -* [Transformational Leadership](/capabilities/transformational-leadership.md) -* [Well-Being](/capabilities/well-being.md) - -### Capabilities that enable Fast Flow - -* [Continuous Delivery](/capabilities/continuous-delivery.md) -* [Database Change Management](/capabilities/database-change-management.md) -* [Deployment Automation](/capabilities/deployment-automation.md) -* [Flexible Infrastructure](/capabilities/flexible-infrastructure.md) * [Loosely Coupled Teams](/capabilities/loosely-coupled-teams.md) -* [Streamline Change Approval](/capabilities/streamline-change-approval.md) -* [Trunk-Based Development](/capabilities/trunk-based-development.md) -* [Version Control](/capabilities/version-control.md) -* [Visual Management](/capabilities/visual-management.md) -* [Work in Process Limits](/capabilities/work-in-process-limits.md) -* [Working in Small Batches](/capabilities/working-in-small-batches.md) - -### Capabilities that enable Fast Feedback - -* [Continuous Integration](/capabilities/continuous-integration.md) -* [Customer Feedback](/capabilities/customer-feedback.md) * [Monitoring and Observability](/capabilities/monitoring-and-observability.md) * [Monitoring Systems to Inform Business Decisions](/capabilities/monitoring-systems-to-inform-business-decisions.md) * [Pervasive Security](/capabilities/pervasive-security.md) +* [Platform Engineering](/capabilities/platform-engineering.md) * [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) +* [Streamlining Change Approval](/capabilities/streamline-change-approval.md) +* [Team Experimentation](/capabilities/team-experimentation.md) * [Test Automation](/capabilities/test-automation.md) * [Test Data Management](/capabilities/test-data-management.md) +* [Transformational Leadership](/capabilities/transformational-leadership.md) +* [Trunk-Based Development](/capabilities/trunk-based-development.md) +* [User-Centric Focus](/capabilities/user-centric-focus.md) +* [Version Control](/capabilities/version-control.md) * [Visibility of Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) +* [Visual Management](/capabilities/visual-management.md) +* [Well-Being](/capabilities/well-being.md) +* [Work in 
Process Limits](/capabilities/work-in-process-limits.md) +* [Working in Small Batches](/capabilities/working-in-small-batches.md) ## Contributing diff --git a/capabilities-maturity-assessment.md b/capabilities-maturity-assessment.md index 05c0eb7..7fcbf24 100644 --- a/capabilities-maturity-assessment.md +++ b/capabilities-maturity-assessment.md @@ -1,16 +1,32 @@ # DORA Capabilities Maturity Assessment -The following assessments are designed to assess the maturity of your team(s) or organization as they relates to specific [DORA Capabilities](https://dora.dev/capabilities/). +## Description + +The following assessment is designed to assess the maturity of your team(s) or organization as they relates to specific DORA Capabilities. For each capability, choose the statement that best reflects your current experience within your team(s) or organization. The number next to that statement is your maturity score for that capability. Generally, score yourself a 1 if the capability is completely missing from your team, a 2 if there is a lot of room for improvement, a 3 if there is some room for improvement, and a 4 if your team is exemplary in the capability. Don't worry if the description doesn't exactly match your situation. The descriptions are meant as examples of situations that would qualify for the associated score. -Most capabilities only have one set of statements to consider. For those few capabilities with multiple sets of statements, average your scores -- that's your overall maturity score for the capability. +Most capabilities only have one set of statements to consider. For those few capabilities with multiple sets of statements, average your scores and you'll have your overall maturity score for the capability. Resist pressuring yourself to select a high rating. The goal of this assessment is to get an _honest_ reading of the current state of things within your team(s) and to pinpoint areas where improvements are likely to yield benefits for your organization. Taking this assessment may also spur helpful conversations with your team(s). To improve in a capability, navigate to its page by either clicking on its title or visiting the [capabilities list](/capabilities/). Once on the capability page, review the Supporting Practices. These are fresh, pragmatic, and actionable ideas you can begin experimenting with today to support the capability. After you've implemented some practices, take this assessment again to track your progress and find your next area of focus. -## Climate for Learning +## Assessment + +### [AI-accessible Internal Data](/capabilities/ai-accessible-internal-data.md) + +1. **Fragmented & Manual:** Data is scattered across various tools (e.g., Slack, Jira, Google Docs, email). Finding information requires manual searching or asking individuals. There is no AI interface for internal data. +2. **Centralized but Static:** Most data is in a central wiki or repo and is accessible with a basic keyword search. Some experiments with AI exist, but they are prone to hallucinations and lack access to real-time updates. +3. **Integrated & Useful:** An AI-powered search or chatbot exists that can access most technical documentation and code. It provides citations for its answers. Accuracy is high, though it occasionally misses very recent changes or restricted data. +4. **Ubiquitous & Trusted:** AI has secure, real-time access to all relevant internal data sources. It respects granular permissions and is the first place employees go for answers. 
Feedback loops are in place to correct the AI and update the underlying documentation simultaneously. + +### [Clear and Communicated AI Stance](/capabilities/clear-and-communicated-ai-stance.md) + +1. **Absent or Hidden:** No formal stance exists, or it is buried in legal documentation that is not shared with technical teams. Developers are unsure what is allowed, leading to either total avoidance or "underground" usage. +2. **Reactive & Vague:** A stance exists but is mostly reactive (e.g., "don't put passwords in ChatGPT"). Guidelines are unclear, and there is no centralized place to find updates or ask questions about new tools. +3. **Clear & Communicated:** There is a well-documented AI policy that is easily accessible. Most team members understand the boundaries of AI use, and there is a clear process for requesting or vetting new AI tools. +4. **Integrated & Iterative:** The AI stance is part of the daily engineering culture. It is regularly updated based on team feedback and technological shifts. There is high confidence in using AI because the legal and security guardrails are clear and supportive. ### [Code Maintainability](/capabilities/code-maintainability.md) @@ -28,64 +44,6 @@ To improve in a capability, navigate to its page by either clicking on its title 3. **Partially Modular Codebase:** Most parts of the system are modular and easy to update, but some are complex and difficult to work with. 4. **Well-organized Codebase:** When changes are made to the existing codebase, they don’t tend to require much rework. -### [Documentation Quality](/capabilities/documentation-quality.md) - -1. **Minimal:** The technical documentation is often outdated, incomplete, or inaccurate, making it difficult to rely on when working with the services or applications. It's hard to find what is needed, and others are often asked for help. -2. **Basic:** The technical documentation is somewhat reliable, but it's not always easy to find what is needed. Updates are sporadic, and multiple sources must be dug through to get the required information. In times of crisis, the documentation might be glanced at, but it's not always trusted. -3. **Good:** The technical documentation is generally reliable, and what is needed can usually be found with some effort. Updates are made regularly, but not always immediately. The documentation is used to help troubleshoot issues, but clarification from others might still be needed. -4. **Excellent:** The technical documentation is comprehensive, accurate, and up-to-date. What is needed can easily be found, and the documentation is relied on heavily when working with the services or applications. When issues arise, the documentation is confidently reached for to help troubleshoot and resolve problems. - -### [Empowering Teams To Choose Tools](/capabilities/empowering-teams-to-choose-tools.md) - -1. **Insufficient Tools:** The current tools are inadequate for getting the job done, and there is no clear way to evaluate or adopt new ones. -2. **Adequate but Limited:** The current tools are sufficient but limited, and new tools are occasionally adopted through an informal process. -3. **Capable and Evolving:** The current tools are capable of meeting needs, and a standardized process is in place for evaluating and adopting new tools should the need arise. -4. **Best-in-Class Tools:** The best tools available are used to get the job done, and new tools are proactively researched and teams are empowered to recommend their adoption via a standardized process. 
- -### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) - -1. **Top-Down:** Teams operate under a highly directive approach, with leadership providing explicit instructions and priorities. Autonomy for decision-making is limited. In the event of failure, the focus is on individual accountability and administering corrective action. -2. **Bureaucratic:** Teams follow established procedures and protocols, with clear roles and responsibilities. Sometimes the specific instructions are vague or incomplete with no clear product leader. Teams have some flexibility to adapt to changing circumstances. However, leadership approval is still required for most meaningful decisions. -3. **Collaborative:** Teams seek input and expertise from other teams to inform their decisions, but maintain clear ownership and responsibility for their work. Each team has some autonomy to make decisions within established boundaries. However, strategic direction is set by leadership, and teams are expected to align their work with these top-down priorities. -4. **Generative:** Teams seek input and expertise from other teams to inform their decisions, but maintain clear ownership and responsibility for their work. Each team has some autonomy to make decisions within established boundaries. Strategic direction is set by leadership, but factors in ground-level feedback from the teams. - -### [Job Satisfaction](/capabilities/job-satisfaction.md) - -1. **Unfulfilling Work:** Employees often feel undervalued, overworked, and disconnected from the organization's purpose. -2. **Limited Engagement:** Employees are somewhat satisfied but lack autonomy, opportunities for growth, and a sense of accomplishment. -3. **Satisfactory Engagement:** Employees are generally content, with some opportunities for growth and a sense of fulfillment. They may lack excitement or challenge, though. -4. **Exceptional Engagement:** Employees are highly motivated, empowered, and passionate about their work. They demonstrate a strong sense of purpose and fulfillment. - -### [Learning Culture](/capabilities/learning-culture.md) - -1. **Static Knowledge:** Learning is limited to onboarding and initial training, with little emphasis on ongoing development or skill-building. -2. **Ad Hoc Learning:** Teams occasionally attend conferences or workshops, but learning is not a prioritized or structured part of the organization's culture. -3. **Encouraged Learning:** Learning is valued and encouraged, with some resources and opportunities provided for professional development, but it's not a core part of the organization's identity. -4. **Learning as a Competitive Advantage:** Learning is deeply ingrained in the organization's culture, viewed as a key driver of improvement and innovation. It is actively prioritized and invested in, helping the team to stay ahead of the curve. - -### [Team Experimentation](/capabilities/team-experimentation.md) - -1. **Minimal or No Experimentation:** Teams follow a strict, rigid plan with little room for deviation or experimentation and must seek approval for even minor changes. They have limited visibility into the organization's overall goals and context, and they must pull information from leadership on an ad-hoc basis. -2. **Highly Controlled Experimentation:** Teams are allowed to explore new ideas but within tightly defined parameters and with close oversight from leadership. Deadline pressure regularly takes priority over idea exploration. 
Teams must request access to relevant context and information, which is provided on an ad-hoc basis. -3. **Emerging but Limited Experimentation:** Teams have some flexibility to try new approaches but must seek permission from leadership or follow established protocols for most changes. They have access to some organizational context, including goals and objectives, but may not have direct access to customer feedback or the financial performance of the company. -4. **Self-Directed Innovation:** Teams have autonomy to pursue new ideas and make decisions. Their experiments are informed by direct access to customer feedback and relevant context that is proactively shared by leadership, including the organization's vision, goals, strategic priorities, and financial state. - -### [Transformational Leadership](/capabilities/transformational-leadership.md) - -1. **Crisis Management:** The organization is in a state of crisis or chaos, requiring leaders to take a direct and hands-on approach. Leaders focus on short-term goals, with limited opportunities to communicate a clear long-term vision or inspire team members. -2. **Transactional Leadership:** Leaders focus on managing scenarios that deviate from the norm. They prioritize meeting urgent goals and objectives, providing clear direction and guidance to team members. They begin to communicate a vision and hold team members accountable for working toward common goals. -3. **Supportive Leadership:** Leaders work closely with team members to develop their skills and abilities, sometimes exhibiting other transformational leadership behaviors like clear vision, inspirational communication, intellectual stimulation, and personal recognition. -4. **Transformational Leadership:** Leaders create a culture of trust, empowerment, and autonomy, consistently demonstrating all five dimensions of transformational leadership: clear vision, inspirational communication, intellectual stimulation, support, and personal recognition. - -### [Well-Being](/capabilities/well-being.md) - -1. **Overwhelmed and Undervalued:** Employees are consistently overwhelmed by work demands, have little control over their work, and feel undervalued and unrewarded, with a breakdown in community and a lack of fairness in decision-making processes. -2. **Managing the Load:** Teams are coping with work demands, but some employees are still struggling with a lack of control and autonomy, and rewards and recognition are inconsistent. While there are some efforts to build a sense of community, fairness and values alignment are still a work in progress. -3. **Finding Balance:** Employees are generally happy and engaged, with a good work-life balance, and teams are making progress in addressing work overload, increasing control and autonomy, and providing sufficient rewards and recognition, but there is still room for improvement in building a sense of community and fairness. -4. **Thriving Culture:** Employees are highly engaged, motivated, and happy, with a strong sense of well-being, and teams are consistently delivering high-quality work in a supportive and fair work environment, with a clear alignment between organizational and individual values, and opportunities for growth and development. - -## Fast Flow - ### [Continuous Delivery](/capabilities/continuous-delivery.md) #### Value Delivery Frequency @@ -116,6 +74,20 @@ To improve in a capability, navigate to its page by either clicking on its title 3. 
**Under An Hour:** It typically takes somewhere between 10 and 60 minutes to restore service after a change failure. 4. **A Couple Of Minutes:** It typically takes under 10 minutes to restore service after a change failure. +### [Continuous Integration](/capabilities/continuous-integration.md) + +1. **Infrequent & Painful:** Integration is done rarely, with large batches of changes, requiring multiple levels of approval, and often resulting in merge conflicts and uncertain outcomes. +2. **Routine & Coordinated:** Integration happens regularly (e.g., weekly), with moderate-sized changes, requiring some approval and coordination, and occasional merge conflicts, but with a good understanding of the outcome. +3. **Regular & Smooth:** Integration happens frequently (e.g., daily), with small, incremental changes, requiring minimal approval, and rare painful merge conflicts, with a high degree of confidence in the outcome. +4. **Continuous & Seamless:** Integration happens continuously, with tiny, incremental changes, rarely requiring approval, and virtually no painful merge conflicts, with complete confidence in the outcome and immediate feedback. + +### [Customer Feedback](/capabilities/customer-feedback.md) + +1. **Meaningless:** User feedback from releases aren’t collected. +2. **Reactive:** User feedback is gathered, but usually only after significant issues arise, and it’s acted upon sporadically. +3. **Informative:** User feedback is regularly gathered and may influence our prioritization, but meaningful shifts in priority don’t happen frequently. +4. **Impactful:** User feedback is gathered, based upon recent changes, and acted upon often. + ### [Database Change Management](/capabilities/database-change-management.md) 1. **Manual and Error-Prone:** Database changes are made manually, with a high risk of errors. Deployments are slow, sometimes taking hours to complete, and sometimes requiring downtime. @@ -130,6 +102,20 @@ To improve in a capability, navigate to its page by either clicking on its title 3. **Mostly Automated:** Deployments are mostly automated, with minimal manual intervention. 4. **Fully Automated:** Deployments are fully automated, including rollback mechanisms and verification steps. +### [Documentation Quality](/capabilities/documentation-quality.md) + +1. **Minimal:** The technical documentation is often outdated, incomplete, or inaccurate, making it difficult to rely on when working with the services or applications. It's hard to find what is needed, and others are often asked for help. +2. **Basic:** The technical documentation is somewhat reliable, but it's not always easy to find what is needed. Updates are sporadic, and multiple sources must be dug through to get the required information. In times of crisis, the documentation might be glanced at, but it's not always trusted. +3. **Good:** The technical documentation is generally reliable, and what is needed can usually be found with some effort. Updates are made regularly, but not always immediately. The documentation is used to help troubleshoot issues, but clarification from others might still be needed. +4. **Excellent:** The technical documentation is comprehensive, accurate, and up-to-date. What is needed can easily be found, and the documentation is relied on heavily when working with the services or applications. When issues arise, the documentation is confidently reached for to help troubleshoot and resolve problems. 
+ +### [Empowering Teams To Choose Tools](/capabilities/empowering-teams-to-choose-tools.md) + +1. **Insufficient Tools:** The current tools are inadequate for getting the job done, and there is no clear way to evaluate or adopt new ones. +2. **Adequate but Limited:** The current tools are sufficient but limited, and new tools are occasionally adopted through an informal process. +3. **Capable and Evolving:** The current tools are capable of meeting needs, and a standardized process is in place for evaluating and adopting new tools should the need arise. +4. **Best-in-Class Tools:** The best tools available are used to get the job done, and new tools are proactively researched and teams are empowered to recommend their adoption via a standardized process. + ### [Flexible Infrastructure](/capabilities/flexible-infrastructure.md) 1. **Rigid and Manual:** Infrastructure changes are slow and labor-intensive, requiring manual intervention and taking weeks or months to complete. @@ -137,70 +123,40 @@ To improve in a capability, navigate to its page by either clicking on its title 3. **Advanced Automation:** Infrastructure changes are largely automated, with self-service capabilities and rapid scalability, but different teams and functions may still work in silos, with some manual handoffs and coordination required. 4. **On-Demand and Elastic:** Infrastructure is fully automated, with seamless collaboration and alignment between teams and functions, enabling rapid scaling and flexibility, and providing a unified, on-demand experience for users. -### [Loosely Coupled Teams](/capabilities/loosely-coupled-teams.md) - -1. **Tightly Coupled:** Teams are heavily dependent on other teams for design decisions and deployment. Frequent, fine-grained communication and coordination are required. -2. **Somewhat Coupled:** Teams have some independence, but still require regular coordination with other teams. Deployment and design changes often need permission or resources from outside the team. -3. **Moderately Coupled:** Teams have a moderate level of independence. They can make some design changes and deploy without permission, but they may still need to coordinate with other teams routinely. -4. **Loosely Coupled:** Teams have full autonomy to make most large-scale design changes and deploy on-demand. They can test independently and release with negligible downtime, without needing fine-grained communication or coordination with other teams. - -### [Streamline Change Approval](/capabilities/streamline-change-approval.md) - -1. **Manual and Gatekeeping:** Changes require manual approval from a centralized Change Advisory Board (CAB) or external reviewers, creating a bottleneck and slowing down the delivery process. -2. **Peer-Reviewed and Coordinated:** Changes are manually verified, reviewed, and subsequently approved by peers. Changes require high levels of coordination when they affect multiple teams. It usually takes close to a week or more to get approval. -3. **Semi-automated and Efficient:** Changes are reviewed and approved through a mix of automated and manual processes, with peer review still in place. Coordination is more efficient and approval times are faster. When change approval is required, feedback is typically provided within a day or two. -4. **Streamlined:** The high level of automation in change approvals significantly reduces, and in some cases eliminates, the burden of peer review. 
A Change Advisory Board (CAB) may still exist, but their role is to simply advise and facilitate important discussions. When change approval is required, feedback is typically provided in under 24 hours. - -### [Trunk-Based Development](/capabilities/trunk-based-development.md) - -1. **Long-lived Branches:** Development happens on long-lived feature branches that are rarely merged to trunk, resulting in complex and painful integrations. -2. **Regular Merges:** Development happens on feature branches that are regularly merged to trunk (e.g., weekly), with some manual effort required to resolve conflicts. -3. **Short-lived Branches:** Development happens on short-lived feature branches (e.g., 1-3 days) that are frequently merged to trunk, with minimal manual effort required to resolve conflicts. -4. **Trunk-based:** Development happens either directly on trunk or on very short-lived feature branches (e.g., 1-3 hours). Changes are committed and validated continuously with immediate feedback. - -### [Version Control](/capabilities/version-control.md) - -1. **Limited or No Adoption:** Version control is not used, or its use is limited to select teams, with no organization-wide adoption or standardization. -2. **Basic Code and Data Storage:** Version control is used primarily for code and data backups, with limited or no version control for infrastructure and other assets. -3. **Standard Version Control:** Version control is used consistently across teams for code, configuration, data, infrastructure, and documentation. Disaster recovery is fully supported. -4. **Advanced Version Control:** Version control is optimized for small, comprehensible changes, with a focus on making it easy to traverse and understand the history of changes across code, configurations, documentation, data, and infrastructure. +### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) -### [Visual Management](/capabilities/visual-management.md) +1. **Top-Down:** Teams operate under a highly directive approach, with leadership providing explicit instructions and priorities. Autonomy for decision-making is limited. In the event of failure, the focus is on individual accountability and administering corrective action. +2. **Bureaucratic:** Teams follow established procedures and protocols, with clear roles and responsibilities. Sometimes the specific instructions are vague or incomplete with no clear product leader. Teams have some flexibility to adapt to changing circumstances. However, leadership approval is still required for most meaningful decisions. +3. **Collaborative:** Teams seek input and expertise from other teams to inform their decisions, but maintain clear ownership and responsibility for their work. Each team has some autonomy to make decisions within established boundaries. However, strategic direction is set by leadership, and teams are expected to align their work with these top-down priorities. +4. **Generative:** Teams seek input and expertise from other teams to inform their decisions, but maintain clear ownership and responsibility for their work. Each team has some autonomy to make decisions within established boundaries. Strategic direction is set by leadership, but factors in ground-level feedback from the teams. -1. **No Visibility:** No visual management displays or dashboards are used, and teams lack visibility into their processes and progress. -2. 
**Basic Dashboards:** Simple dashboards or visual displays are used, but they are not regularly updated, and teams do not actively use them to inform their work. -3. **Informative Displays:** Visual management displays are used to track key metrics and progress, and teams regularly review and update them to inform their work and identify areas for improvement. -4. **Real-Time Feedback:** Advanced visual management displays provide real-time feedback and insights, enabling teams to quickly identify and address issues, and make data-driven decisions to adjust their priorities and drive continuous improvement. +### [Healthy Data Ecosystems](/capabilities/healthy-data-ecosystems.md) -### [Work in Process Limits](/capabilities/work-in-process-limits.md) +1. **Fragmented & Untrusted:** Data is trapped in silos. Access requires manual approvals and long waits. No one is sure if the data is accurate and "data cleaning" is a massive, manual chore for every project. +2. **Coordinated but Manual:** Data is documented, but often outdated. You have some central repositories (like a data warehouse), but integrating new data types is slow. Testing often uses stale or "hand-rolled" data that doesn't reflect reality. +3. **Accessible & Reliable:** Most data is discoverable via a catalog or API. Automated pipelines handle basic cleaning and transformation. There is high confidence in data quality and privacy masking is largely automated. +4. **Fluid & Self-Service:** Data is treated as a product. Teams can self-serve the data they need through well-defined interfaces. Data source tracking is fully transparent, and data quality issues are caught by automated "data tests" before they affect downstream systems. -1. **No Limits:** No WIP limits are set, and teams work on multiple tasks simultaneously, leading to inefficiencies and burnout. -2. **Loose Limits:** WIP limits are set, but they are not enforced, and teams often exceed them, resulting in delays and inefficiencies. -3. **Managed Limits:** WIP limits are set and enforced, and teams prioritize work based on capacity, but there is still room for improvement in reducing lead times and increasing flow. -4. **Optimized Flow:** WIP limits are optimized and continuously refined to minimize lead times, reduce variability, and achieve single-piece flow, with a focus on continuous improvement and removing obstacles. - -### [Working in Small Batches](/capabilities/working-in-small-batches.md) - -1. **Large Batches:** Work is done in large batches that take a long time (months) to complete, resulting in reduced visibility into progress, increased integration effort, delayed value, and high variability. -2. **Moderate Batches:** Batches are moderately sized, taking several weeks to complete, which can lead to some delays in integration and value delivery, and moderate variability, making it difficult to track progress. -3. **Small Batches:** Work is broken down into small batches that can be completed and integrated quickly (days), allowing for clear visibility into progress, relatively low integration effort, and faster value delivery, with some variability. -4. **Minimal Viable Batches:** Work is decomposed into extremely small, minimal viable batches that can be completed and integrated rapidly (hours), providing clear and continuous visibility into progress, minimal integration effort, and fast value delivery, with low variability. +### [Job Satisfaction](/capabilities/job-satisfaction.md) -## Fast Feedback +1. 
**Unfulfilling Work:** Employees often feel undervalued, overworked, and disconnected from the organization's purpose. +2. **Limited Engagement:** Employees are somewhat satisfied but lack autonomy, opportunities for growth, and a sense of accomplishment. +3. **Satisfactory Engagement:** Employees are generally content, with some opportunities for growth and a sense of fulfillment. They may lack excitement or challenge, though. +4. **Exceptional Engagement:** Employees are highly motivated, empowered, and passionate about their work. They demonstrate a strong sense of purpose and fulfillment. -### [Continuous Integration](/capabilities/continuous-integration.md) +### [Learning Culture](/capabilities/learning-culture.md) -1. **Infrequent & Painful:** Integration is done rarely, with large batches of changes, requiring multiple levels of approval, and often resulting in merge conflicts and uncertain outcomes. -2. **Routine & Coordinated:** Integration happens regularly (e.g., weekly), with moderate-sized changes, requiring some approval and coordination, and occasional merge conflicts, but with a good understanding of the outcome. -3. **Regular & Smooth:** Integration happens frequently (e.g., daily), with small, incremental changes, requiring minimal approval, and rare painful merge conflicts, with a high degree of confidence in the outcome. -4. **Continuous & Seamless:** Integration happens continuously, with tiny, incremental changes, rarely requiring approval, and virtually no painful merge conflicts, with complete confidence in the outcome and immediate feedback. +1. **Static Knowledge:** Learning is limited to onboarding and initial training, with little emphasis on ongoing development or skill-building. +2. **Ad Hoc Learning:** Teams occasionally attend conferences or workshops, but learning is not a prioritized or structured part of the organization's culture. +3. **Encouraged Learning:** Learning is valued and encouraged, with some resources and opportunities provided for professional development, but it's not a core part of the organization's identity. +4. **Learning as a Competitive Advantage:** Learning is deeply ingrained in the organization's culture, viewed as a key driver of improvement and innovation. It is actively prioritized and invested in, helping the team to stay ahead of the curve. -### [Customer Feedback](/capabilities/customer-feedback.md) +### [Loosely Coupled Teams](/capabilities/loosely-coupled-teams.md) -1. **Meaningless:** User feedback from releases aren’t collected. -2. **Reactive:** User feedback is gathered, but usually only after significant issues arise, and it’s acted upon sporadically. -3. **Informative:** User feedback is regularly gathered and may influence our prioritization, but meaningful shifts in priority don’t happen frequently. -4. **Impactful:** User feedback is gathered, based upon recent changes, and acted upon often. +1. **Tightly Coupled:** Teams are heavily dependent on other teams for design decisions and deployment. Frequent, fine-grained communication and coordination are required. +2. **Somewhat Coupled:** Teams have some independence, but still require regular coordination with other teams. Deployment and design changes often need permission or resources from outside the team. +3. **Moderately Coupled:** Teams have a moderate level of independence. They can make some design changes and deploy without permission, but they may still need to coordinate with other teams routinely. +4. 
**Loosely Coupled:** Teams have full autonomy to make most large-scale design changes and deploy on-demand. They can test independently and release with negligible downtime, without needing fine-grained communication or coordination with other teams. ### [Monitoring and Observability](/capabilities/monitoring-and-observability.md) @@ -223,6 +179,13 @@ To improve in a capability, navigate to its page by either clicking on its title 3. **Integrated Security:** Security is a key consideration during development, with internal security training and some use of automated security tooling. 4. **Pervasive Security:** Security is deeply ingrained in the development culture, with continuous security testing, automated security tooling, and routine security reviews throughout the software development lifecycle. +### [Platform Engineering](/capabilities/platform-engineering.md) + +1. **Ticket-Ops & Fragmented Tooling:** The platform is a collection of infrastructure tickets and manual gates rather than a product. Individual AI coding gains are lost to downstream disorder, as security reviews, testing, and deployments remain manual bottlenecks that increase cognitive load. +2. **Standardized but Rigid:** Initial "golden paths" exist, but they function as a "golden cage" with little flexibility for diverse team needs. While some automation is present, developer feedback is often unclear, and the lack of self-service means AI-generated code frequently stalls at the integration phase. +3. **Product-Centric & Self-Service:** A dedicated platform team treats developers as customers, providing self-service interfaces that "shift down" complexity. Automated pipelines ensure AI-amplified throughput is consistently tested and secured, allowing teams to focus on user value rather than infrastructure hurdles. +4. **Fluid, Extensible & AI-Ready:** The platform is an extensible ecosystem where "golden paths" are the easiest choice but allow for contribution and flexibility. Real-time feedback and automated guardrails make experimentation cheap and recovery fast, fully realizing AI’s potential to accelerate the entire delivery lifecycle without sacrificing stability. + ### [Proactive Failure Notification](/capabilities/proactive-failure-notification.md) 1. **No Notifications:** There is no automated system of notifying teams that a failure has occurred in deployed environments. Failures are typically caught via manual QA or reported by users. @@ -230,6 +193,20 @@ To improve in a capability, navigate to its page by either clicking on its title 3. **Threshold-based Notifications:** Alerting rules are well-defined, with failure thresholds tuned to accurately spot issues. Notifications are relevant and timely. 4. **Proactive Notification:** Rate of change metrics are tracked to proactively spot potential issues. There are automated responses to many notifications, and teams continuously review and refine alerting rules to anticipate and prevent failures. +### [Streamlining Change Approval](/capabilities/streamline-change-approval.md) + +1. **Manual and Gatekeeping:** Changes require manual approval from a centralized Change Advisory Board (CAB) or external reviewers, creating a bottleneck and slowing down the delivery process. +2. **Peer-Reviewed and Coordinated:** Changes are manually verified, reviewed, and subsequently approved by peers. Changes require high levels of coordination when they affect multiple teams. It usually takes close to a week or more to get approval. +3. 
**Semi-automated and Efficient:** Changes are reviewed and approved through a mix of automated and manual processes, with peer review still in place. Coordination is more efficient and approval times are faster. When change approval is required, feedback is typically provided within a day or two. +4. **Streamlined:** The high level of automation in change approvals significantly reduces, and in some cases eliminates, the burden of peer review. A Change Advisory Board (CAB) may still exist, but their role is to simply advise and facilitate important discussions. When change approval is required, feedback is typically provided in under 24 hours. + +### [Team Experimentation](/capabilities/team-experimentation.md) + +1. **Minimal or No Experimentation:** Teams follow a strict, rigid plan with little room for deviation or experimentation and must seek approval for even minor changes. They have limited visibility into the organization's overall goals and context, and they must pull information from leadership on an ad-hoc basis. +2. **Highly Controlled Experimentation:** Teams are allowed to explore new ideas but within tightly defined parameters and with close oversight from leadership. Deadline pressure regularly takes priority over idea exploration. Teams must request access to relevant context and information, which is provided on an ad-hoc basis. +3. **Emerging but Limited Experimentation:** Teams have some flexibility to try new approaches but must seek permission from leadership or follow established protocols for most changes. They have access to some organizational context, including goals and objectives, but may not have direct access to customer feedback or the financial performance of the company. +4. **Self-Directed Innovation:** Teams have autonomy to pursue new ideas and make decisions. Their experiments are informed by direct access to customer feedback and relevant context that is proactively shared by leadership, including the organization's vision, goals, strategic priorities, and financial state. + ### [Test Automation](/capabilities/test-automation.md) 1. **Limited:** Test automation is minimal, slow, and/or unreliable. There is heavy reliance on manual testing. @@ -253,9 +230,65 @@ To improve in a capability, navigate to its page by either clicking on its title 3. **Scripted Data Seeding:** Automated tests use manual data seeding scripts to set up and tear down their data; they may not cover all production scenarios. 4. **All Categories of Automated Tests Are Supported:** In addition to supporting scripted data seeding, ephemeral environments are easily created and torn down. They can be seeded with realistic synthetic data or production data that has sensitive information scrubbed from it. This enables advanced testing categories like performance, load, and anomaly detection. +### [Transformational Leadership](/capabilities/transformational-leadership.md) + +1. **Crisis Management:** The organization is in a state of crisis or chaos, requiring leaders to take a direct and hands-on approach. Leaders focus on short-term goals, with limited opportunities to communicate a clear long-term vision or inspire team members. +2. **Transactional Leadership:** Leaders focus on managing scenarios that deviate from the norm. They prioritize meeting urgent goals and objectives, providing clear direction and guidance to team members. They begin to communicate a vision and hold team members accountable for working toward common goals. +3. 
**Supportive Leadership:** Leaders work closely with team members to develop their skills and abilities, sometimes exhibiting other transformational leadership behaviors like clear vision, inspirational communication, intellectual stimulation, and personal recognition. +4. **Transformational Leadership:** Leaders create a culture of trust, empowerment, and autonomy, consistently demonstrating all five dimensions of transformational leadership: clear vision, inspirational communication, intellectual stimulation, support, and personal recognition. + +### [Trunk-Based Development](/capabilities/trunk-based-development.md) + +1. **Long-lived Branches:** Development happens on long-lived feature branches that are rarely merged to trunk, resulting in complex and painful integrations. +2. **Regular Merges:** Development happens on feature branches that are regularly merged to trunk (e.g., weekly), with some manual effort required to resolve conflicts. +3. **Short-lived Branches:** Development happens on short-lived feature branches (e.g., 1-3 days) that are frequently merged to trunk, with minimal manual effort required to resolve conflicts. +4. **Trunk-based:** Development happens either directly on trunk or on very short-lived feature branches (e.g., 1-3 hours). Changes are committed and validated continuously with immediate feedback. + +### [User-Centric Focus](/capabilities/user-centric-focus.md) + +1. **The Feature Factory**: Teams focus on output volume and use AI to ship more features without validating user impact or feedback. +2. **Reactive & Proxy-Led**: Teams rely on siloed feedback and manual hand-offs, using AI to accelerate ticket completion rather than user outcomes. +3. **Integrated & Spec-Driven**: Teams use spec-driven development and direct user observation to ensure AI outputs are grounded in verified requirements. +4. **User-Invested & Self-Correcting**: Teams treat AI as a discovery partner, using real-time user metrics and rapid prototyping to pivot toward maximum value. + +### [Version Control](/capabilities/version-control.md) + +1. **Limited or No Adoption:** Version control is not used, or its use is limited to select teams, with no organization-wide adoption or standardization. +2. **Basic Code and Data Storage:** Version control is used primarily for code and data backups, with limited or no version control for infrastructure and other assets. +3. **Standard Version Control:** Version control is used consistently across teams for code, configuration, data, infrastructure, and documentation. Disaster recovery is fully supported. +4. **Advanced Version Control:** Version control is optimized for small, comprehensible changes, with a focus on making it easy to traverse and understand the history of changes across code, configurations, documentation, data, and infrastructure. + ### [Visibility of Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) 1. **Limited Visibility:** Teams have little understanding of the flow of work from idea to customer. They lack visibility into the current state of products and features. 2. **Partial Visibility:** Teams have some visibility into the flow of work, but it's limited to their own area of responsibility. They lack a comprehensive understanding of the entire value stream. 3. **Managed Visibility:** Teams use visual displays and dashboards to track the flow of work. They have a good understanding of the current state of products and features, but may not have a complete view of the entire value stream. 
4. **End-to-End Visibility:** Teams have a complete and up-to-date understanding of the flow of work from idea to customer, with real-time visibility into the current state of products and features. They use data to improve the flow of work. + +### [Visual Management](/capabilities/visual-management.md) + +1. **No Visibility:** No visual management displays or dashboards are used, and teams lack visibility into their processes and progress. +2. **Basic Dashboards:** Simple dashboards or visual displays are used, but they are not regularly updated, and teams do not actively use them to inform their work. +3. **Informative Displays:** Visual management displays are used to track key metrics and progress, and teams regularly review and update them to inform their work and identify areas for improvement. +4. **Real-Time Feedback:** Advanced visual management displays provide real-time feedback and insights, enabling teams to quickly identify and address issues, and make data-driven decisions to adjust their priorities and drive continuous improvement. + +### [Well-Being](/capabilities/well-being.md) + +1. **Overwhelmed and Undervalued:** Employees are consistently overwhelmed by work demands, have little control over their work, and feel undervalued and unrewarded, with a breakdown in community and a lack of fairness in decision-making processes. +2. **Managing the Load:** Teams are coping with work demands, but some employees are still struggling with a lack of control and autonomy, and rewards and recognition are inconsistent. While there are some efforts to build a sense of community, fairness and values alignment are still a work in progress. +3. **Finding Balance:** Employees are generally happy and engaged, with a good work-life balance, and teams are making progress in addressing work overload, increasing control and autonomy, and providing sufficient rewards and recognition, but there is still room for improvement in building a sense of community and fairness. +4. **Thriving Culture:** Employees are highly engaged, motivated, and happy, with a strong sense of well-being, and teams are consistently delivering high-quality work in a supportive and fair work environment, with a clear alignment between organizational and individual values, and opportunities for growth and development. + +### [Work in Process Limits](/capabilities/work-in-process-limits.md) + +1. **No Limits:** No WIP limits are set, and teams work on multiple tasks simultaneously, leading to inefficiencies and burnout. +2. **Loose Limits:** WIP limits are set, but they are not enforced, and teams often exceed them, resulting in delays and inefficiencies. +3. **Managed Limits:** WIP limits are set and enforced, and teams prioritize work based on capacity, but there is still room for improvement in reducing lead times and increasing flow. +4. **Optimized Flow:** WIP limits are optimized and continuously refined to minimize lead times, reduce variability, and achieve single-piece flow, with a focus on continuous improvement and removing obstacles. + +### [Working in Small Batches](/capabilities/working-in-small-batches.md) + +1. **Large Batches:** Work is done in large batches that take a long time (months) to complete, resulting in reduced visibility into progress, increased integration effort, delayed value, and high variability. +2. 
**Moderate Batches:** Batches are moderately sized, taking several weeks to complete, which can lead to some delays in integration and value delivery, and moderate variability, making it difficult to track progress. +3. **Small Batches:** Work is broken down into small batches that can be completed and integrated quickly (days), allowing for clear visibility into progress, relatively low integration effort, and faster value delivery, with some variability. +4. **Minimal Viable Batches:** Work is decomposed into extremely small, minimal viable batches that can be completed and integrated rapidly (hours), providing clear and continuous visibility into progress, minimal integration effort, and fast value delivery, with low variability. From 198b366353ebdced261e713a006d3617e085a11b Mon Sep 17 00:00:00 2001 From: Ian Carroll Date: Thu, 4 Dec 2025 06:43:07 -0800 Subject: [PATCH 108/131] Create practices/build-local-dev-monorepo --- practices/build-local-dev-monorepo.md | 80 +++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) create mode 100644 practices/build-local-dev-monorepo.md diff --git a/practices/build-local-dev-monorepo.md b/practices/build-local-dev-monorepo.md new file mode 100644 index 0000000..0721ad1 --- /dev/null +++ b/practices/build-local-dev-monorepo.md @@ -0,0 +1,80 @@ +# Migrate to a Monorepo + +Developing with dozens of local micro-repos that need to work together and may also have database access or cloud permissions needs can feel like you've only got the choice of flying blind or waiting to get off the ground. This practice helps teams consolidate scattered projects into a single monorepo so they can see the system at work locally and start coding anywhere (even on a plane) without a need for access approvals or an unused cloud environment. Many popular build tools support this functionality: + +- [pnpm](https://pnpm.io/workspaces) +- [Nx](https://nx.dev/concepts/decisions/why-monorepos) +- [Bazel](https://bazel.build/concepts/build-ref#workspace) +- [Turborepo](https://turborepo.com/docs#what-is-turborepo) +- [Pants](https://www.pantsbuild.org/stable/docs/using-pants/environments#in-workspace-execution-experimental_workspace_environment) +- etc (this isn't by any means a definitive list) + +## When to Experiment + +- You are a Developer working on features that involve multiple interdependent repositories and need to get fast feedback on changes without needing access to managed cloud environments so you can start coding right away instead of waiting for resources to be made available or accessible to you. + +## How to Gain Traction + +Treat it as an experiment, not a mandate. + +### Start by bringing together the interdependent repos + +Choose as few services as you can get away with (2-3 ideally) that already depend on each other and form a complete system. Clone them under a single directory called something like `\services`. + +### Create a monorepo to wrap the services + +In the parent directory, create a repo that uses a build tool that supports workspaces or monorepos. Develop shell scripts that clone the interdependent repos into `/services` from the monorepo root, build each service, run each service, etc. + +### Add an integration testing layer + +Use this monorepo to test the seams of interaction between the services, ensuring that contracts and connections still function. This can be a harness to gain more confidence that each service built in isolation previously will still work well with the rest of the system. 
Snapshot testing via a framework like jest, while not ideal in the long term, can fill this role in a starting capacity. Do not add unit tests here; instead, develop a script that can run through the tests on each service the monorepo houses.
+
+### Make a local docker container with a dummy database
+
+Use this pilot to work through defining the interfaces to the "outside world." Use docker-compose to orchestrate fake versions of the various databases your services depend on. Dump a working copy of the data from a lower environment (if none is available, scrub a production dump of PII and use that). Start by populating the local database via the dump, then add migration scripts to keep the schema current. Finally, create data generators to quickly set the database to specific or random conditions the code will work against.
+
+### Fill in vendor dependencies with echo servers
+
+For additional services that call vendors that return values, create echo servers that take requests in an identical shape and return fake production-like responses. This is an opportunity to build robustness into the system if vendor latency fluctuates -- when there is additional time, simulate this as well. Do the same for systems that don't directly talk to your services but modify the database on cron schedules.
+
+### Create a Migration Path
+
+Even before all the pieces allow for a tested and independently running local setup, demonstrate how it works for you to gain team awareness. Instead of mandating use, allow individual developers to adopt the monorepo locally themselves and become advocates and champions for it. Pair program using the tooling, and help other developers mirror your setup. You know the team has adopted the practice when developers reference it amongst each other in work that doesn't directly involve you.
+
+### Continue Improving DevEx
+
+Listen to gripes and complaints and set aside time to improve those parts of the system. Note what still causes you confusion or wait times. Schedule time with developers working in the new repository to understand how they use it and where they still encounter friction. This can be done in the course of regular work if you pair program. Brainstorm potential solutions as a group. Keep an eye toward improving the items listed in the Deciding to Polish or Pitch section of this page to ensure efforts are making a difference. You know it's simple to set up and use when you observe developers using it who haven't come to you with questions first.
+
+
+## Lessons From The Field
+
+- If you find yourself in a company structured so that you must wait for resource allocations or a separate team to come in and build/grant you access, the wait time you are experiencing can be used to start building this local repo. It may be a matter of building it for yourself first, then sharing it with the team once it's working.
+- Remote teams might not make their local developer setups visible. Just because no one mentions the tool or praises it doesn't mean it's not being used. If it saves them time, and shows them what they need to see more easily than what they've cobbled together for themselves, they will use it.
+
+## Deciding to Polish or Pitch
+
+After experimenting with this practice for **one month**, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction:
+
+### Fast & Measurable
+
+**Lead Time** for changes spanning multiple repositories should decrease. 
By consolidating services into a monorepo, cross-cutting changes can be made without waiting for access approvals, access re-approvals, test-deploying, or waiting for a clean test environment, reducing coordination overhead and speeding up delivery. Tools like [DX](https://getdx.com/platform/data-lake/), [Jellyfish](https://jellyfish.co/platform/engineering-metrics/), or others can measure these sorts of lead time changes. Custom solutions can also be built using data from project planning tools like [JIRA](https://community.atlassian.com/forums/App-Central-articles/Cycle-Time-and-Lead-Time-in-Jira-Productivity-Measurement-with/ba-p/1905845), [Monday](https://monday.com/blog/project-management/what-is-lead-time/), or others.
+
+### Slow & Intangible
+
+**Cleaner Service Boundaries** should evolve over time, because refactoring those boundaries becomes easier. Poor service boundaries can be removed with less friction when everything lives in one repo. On the flipside, teams can more quickly extract new services with the shared tooling, configuration, and build setup.
+
+**Better cross-team collaboration.** The monorepo can be spun up fairly simply to collaborate and demonstrate system behavior. When database changes are considered, comparing what's set up locally to the new schema change can be a point of conversation between teams that manage the data and teams that manage the code. The same thing applies to cloud environment connections between services, and to third-party API vendor interactions. When change involves less uncertainty, confidence to experiment increases.
+
+## Supported Capabilities
+
+### [Code Maintainability](/capabilities/code-maintainability.md)
+
+A local monorepo can help expose duplicated scripts and configuration across micro-repos, making the codebase easier to maintain and evolve.
+
+### [Test Automation](/capabilities/test-automation.md)
+
+Being able to test the interactions between the services the team maintains means there is much greater confidence that the whole system will perform as expected. Knowing those interactions still work while developing instead of at deploy-time shortens feedback loops and speeds up delivery.
+
+### [Test Data Management](/capabilities/test-data-management.md)
+
+Having control of the data locally means we also gain knowledge of the data's schema. By knowing and maintaining this information within the monorepo, we can eliminate uncertainty caused by database changes that previously existed outside the team's control. The same can be said for outside service vendors that supply information or data transformations our system depends on. By designing echo servers, we can even start to build the codebase to be robust in the event these services don't perform flawlessly via principles of Chaos Engineering.
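To make the echo-server idea above more concrete, here is a minimal sketch. It assumes a Node/TypeScript stack, which this practice does not prescribe; the port, environment variables, and `taxRate` payload are invented purely for illustration and should be swapped for the request and response shapes your real vendor uses.

```typescript
// Hypothetical echo server standing in for a third-party vendor API.
// The route handling and response shape below are placeholders -- mirror
// whatever your actual vendor expects and returns.
import { createServer } from "node:http";

const PORT = Number(process.env.FAKE_VENDOR_PORT ?? 4010);
const SIMULATED_LATENCY_MS = Number(process.env.FAKE_VENDOR_LATENCY_MS ?? 0);

const server = createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    // Echo the request back alongside a canned, production-shaped payload
    // so calling services can exercise their parsing and error handling.
    const payload = {
      received: { method: req.method, url: req.url, body },
      result: { taxRate: 0.07, currency: "USD" }, // fake but realistically shaped
    };
    // Optional artificial latency to rehearse slow-vendor behavior locally.
    setTimeout(() => {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(payload));
    }, SIMULATED_LATENCY_MS);
  });
});

server.listen(PORT, () => {
  console.log(`fake vendor listening on http://localhost:${PORT}`);
});
```

Pointing a service's vendor URL at `http://localhost:4010` (or whichever port you choose) lets it run its normal request/response code path without ever leaving your machine.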
From 717eb967c206277f9104ec9d1451a5c0dc55d480 Mon Sep 17 00:00:00 2001 From: Ian Carroll Date: Thu, 4 Dec 2025 06:44:55 -0800 Subject: [PATCH 109/131] Update the title --- practices/build-local-dev-monorepo.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/build-local-dev-monorepo.md b/practices/build-local-dev-monorepo.md index 0721ad1..1c6ee64 100644 --- a/practices/build-local-dev-monorepo.md +++ b/practices/build-local-dev-monorepo.md @@ -1,4 +1,4 @@ -# Migrate to a Monorepo +# Build a Monorepo for local development Developing with dozens of local micro-repos that need to work together and may also have database access or cloud permissions needs can feel like you've only got the choice of flying blind or waiting to get off the ground. This practice helps teams consolidate scattered projects into a single monorepo so they can see the system at work locally and start coding anywhere (even on a plane) without a need for access approvals or an unused cloud environment. Many popular build tools support this functionality: From 0bb0265b985d605f6b6f4dc4bd4c3af6b09d318f Mon Sep 17 00:00:00 2001 From: nicoletache Date: Thu, 18 Dec 2025 12:35:05 -0600 Subject: [PATCH 110/131] edits to build local dev monorepo practice --- practices/build-local-dev-monorepo.md | 46 +++++++++++++-------------- 1 file changed, 22 insertions(+), 24 deletions(-) diff --git a/practices/build-local-dev-monorepo.md b/practices/build-local-dev-monorepo.md index 1c6ee64..bcd73e5 100644 --- a/practices/build-local-dev-monorepo.md +++ b/practices/build-local-dev-monorepo.md @@ -1,55 +1,53 @@ -# Build a Monorepo for local development +# Build a Monorepo for Local Development -Developing with dozens of local micro-repos that need to work together and may also have database access or cloud permissions needs can feel like you've only got the choice of flying blind or waiting to get off the ground. This practice helps teams consolidate scattered projects into a single monorepo so they can see the system at work locally and start coding anywhere (even on a plane) without a need for access approvals or an unused cloud environment. Many popular build tools support this functionality: +Developing with dozens of local micro-repos that need to work together (and may also have database access or cloud permissions needs) is both daunting and time-consuming -- it can feel like your choices are either flying blind or waiting to get off the ground. Teams may encounter duplicated scripts and configuration across these many micro-repos. They may not trust interactions between services will behave as expected. And database changes may be out of their control, leading to more uncertainty. But when scattered projects are consolidated into a single monorepo, teams can see the system at work locally and start coding anywhere (even on a plane), without needing access approvals or an unused cloud environment. Many popular build tools support this functionality: - [pnpm](https://pnpm.io/workspaces) - [Nx](https://nx.dev/concepts/decisions/why-monorepos) - [Bazel](https://bazel.build/concepts/build-ref#workspace) - [Turborepo](https://turborepo.com/docs#what-is-turborepo) - [Pants](https://www.pantsbuild.org/stable/docs/using-pants/environments#in-workspace-execution-experimental_workspace_environment) -- etc (this isn't by any means a definitive list) +- Etc. 
(this isn't by any means a definitive list) ## When to Experiment -- You are a Developer working on features that involve multiple interdependent repositories and need to get fast feedback on changes without needing access to managed cloud environments so you can start coding right away instead of waiting for resources to be made available or accessible to you. +- You are a developer working on features that involve multiple interdependent repos and you need fast feedback on changes (without needing access to managed cloud environments) so you can start coding right away (instead of waiting for resources to be made available or accessible to you). ## How to Gain Traction -Treat it as an experiment, not a mandate. - -### Start by bringing together the interdependent repos +### Start by Bringing Together the Interdependent Repos Choose as few services as you can get away with (2-3 ideally) that already depend on each other and form a complete system. Clone them under a single directory called something like `\services`. -### Create a monorepo to wrap the services +### Create a Monorepo to Wrap the Services -In the parent directory, create a repo that uses a build tool that supports workspaces or monorepos. Develop shell scripts that clone the interdependent repos into `/services` from the monorepo root, build each service, run each service, etc. +In the parent directory, create a repo that uses a build tool that supports workspaces or monorepos (see the list above). Develop shell scripts that clone the interdependent repos into `/services` from the monorepo root, build each service, run each service, etc. -### Add an integration testing layer +### Add an Integration Testing Layer -Use this monorepo to test the seams of interaction between the services, ensuring that contracts and connections still function. This can be a harness to gain more confidence that each service built in isolation previously will still work well with the rest of the system. Snapshot testing via a framework like jest, while not ideal in the longterm, can fill this role in a starting capacity. Do not add unit tests here, instead develop a script that can run through the tests on each service the monorepo houses. +Use this monorepo to test the seams of interaction between the services, ensuring that contracts and connections still function. This can be a harness to gain more confidence that each service previously built in isolation will still work well with the rest of the system. Snapshot testing via a framework like jest, while not ideal in the long term, can get the job done initially. Do not add unit tests here; instead, develop a script that can run through the tests on each service the monorepo houses. -### Make a local docker container with a dummy database +### Make a Local Docker Container With a Dummy Database -Use this pilot to work through defining the interfaces to the "outside world". Use docker-compose to orchestrate fake versions of the various databases your services depend on. Dump a working copy of the data from a lower environment (if none is available, scrub a production dump of PII and use that). Start by populating the local database via the dump, then add migration scripts to keep the schema current, finally create data generators to quickly set the database to specific or random conditions the code will work against. +Use this pilot to work through defining the interfaces to the "outside world." Use docker-compose to orchestrate fake versions of the various databases your services depend on. 
Dump a working copy of the data from a lower environment (if none is available, scrub a production dump of PII and use that). Start by populating the local database with the dump, then add migration scripts to keep the schema current. Finally, create data generators to quickly set the database to specific or random conditions the code will work against. -### Fill in vendor dependecies with echo servers +### Fill in Vendor Dependecies With Echo Servers -For additional services that call vendors that return values, create echo servers that take requests in an identical shape and return fake production-like repsonses. This is an opportunity to build in robustness into the system if vendor latency fluxuates - when there is additional time, simulate this as well. Do the same for services that don't directly talk to the services, but modify the Database on cron schedules. +For additional services that call vendors that return values, create echo servers that take requests in an identical shape and return fake production-like repsonses. This is an opportunity to build in robustness into the system if vendor latency fluxuates -- when there is additional time, simulate this as well. Do the same for services that don't directly talk to the services, but modify the database on cron schedules. ### Create a Migration Path -Even before all the pieces allow for a tested and independently running local setupo, demonstrate how it works for you to gain team awareness. Instead of mandating use, allow individual developers to adopt the monorepo locally themselves and become advocates and champions for it. Pair program using the tooling, and help orther developers mirror your setup. You know the team has adopted the practice when developers reference it amongst eachother in work that doesn't directly involve you. +Before you have a tested and independently running local setup, get your team involved and demonstrate how it works for you. Treat this as an experiment, not a mandate. Allow individual developers to adopt the monorepo locally and become advocates and champions for it. Pair program using the tooling and help orther developers mirror your setup. You know the team has adopted the practice when developers reference it in work that doesn't directly involve you. ### Continue Improving DevEx -Listen to gripes and complaints and set asside time to improve those parts of the system. Note what still causes you confusion or wait times. Schedule time with developers working in the new repository to understand how they use it and where they still get friction. This can be done in the course of regular work if you pair program. Brainstorm potential solutions as a group. Keep an eye toward improving the items listed in the Goals, Metrics & Signals section of this page to ensure efforts are making a difference. You know it's simple to setup and use when you observe developers using it who haven't come to you with questions first. +Schedule time with developers working in the new repo to understand how they use it and where they still encounter friction. This can be done during the course of regular work if you pair program. Listen to any gripes and complaints, and note what still causes you confusion or wait times. Brainstorm potential solutions as a group and set aside time to improve problematic parts of the system. Focus on improving the items listed in the Polish or Pitch section of this page to ensure efforts are making a difference. 
You know it's simple to set up and use when you observe developers using it who haven't come to you with questions first. ## Lessons From The Field -- If you find yourself in a company structured so that you must wait for resource allocations or a separate team to come in and build/grant you access, the wait time you are experiencing can be used to start building this local repo. It may be a matter of building it for you, then once its working, sharing it with the team. -- Remote teams might not make thier local developer setups visible. Just because no one mentions the tool or praises it doesn't mean it's not being used. If it saves them time, and shows them what they need to see more easlily than what they've cobbled together for themselves, they will use it. +- [*insert summary line*] If your company is structured in a way that you must wait for resource allocations or a separate team to build/grant you access, then use that time to start building this local repo. Once you've got an experimental version up and running, share it with the team. +- [*insert summary line*] Remote teams might not make thier local developer setups visible. Just because no one mentions the tool or praises it doesn't mean it's not being used. If it saves your developers time and easily shows them what they need to see, then they'll use it. ## Deciding to Polish or Pitch @@ -57,13 +55,13 @@ After experimenting with this practice for **one month**, bring the team togethe ### Fast & Measurable -**Lead Time** for changes spanning multiple repositories should decrease. By consolidating services into a monorepo, cross-cutting changes can be made without waiting for access approvals, access re-approvals, test-deploying, or waiting for a clean tests enviornment, reducing coordination overhead and speeding up delivery. Tools like [DX](https://getdx.com/platform/data-lake/), [Jellyfish](https://jellyfish.co/platform/engineering-metrics/), or others can measure these sort of lead time changes. Custom solutions can also be built using data from project planning tools like [JIRA](https://community.atlassian.com/forums/App-Central-articles/Cycle-Time-and-Lead-Time-in-Jira-Productivity-Measurement-with/ba-p/1905845), [Monday](https://monday.com/blog/project-management/what-is-lead-time/), or others. +**Lead Time for Changes Decreases.** Changes spanning multiple repos should happen faster. By consolidating services into a monorepo, cross-cutting changes can be made without waiting for access approvals, access re-approvals, test deploying, or a clean tests enviornment. This reduces coordination overhead and speeds up delivery. Tools like [DX](https://getdx.com/platform/data-lake/) and [Jellyfish](https://jellyfish.co/platform/engineering-metrics/) can measure lead time changes. Custom solutions can also be built using data from project-planning tools like [JIRA](https://community.atlassian.com/forums/App-Central-articles/Cycle-Time-and-Lead-Time-in-Jira-Productivity-Measurement-with/ba-p/1905845) and [Monday](https://monday.com/blog/project-management/what-is-lead-time/). ### Slow & Intangible -**Cleaner Service Boundaries** should evolve over time, because refactoring those boundaries becomes easier. Poor service boundaries can be removed with less friction when everything lives in one repo. On the flipside, teams can more quickly extract new services with the shared tooling, configuration, and build setup. +**Cleaner Service Boundaries.** Refactoring service boundaries should become easier. 
When everything lives in one repo, poor service boundaries can be removed with less friction. On the flipside, teams can more quickly extract new services with the shared tooling, configuration, and build setup. -**Better cross-team collaboration** The monorepo can be spun up fairly simply to collaborate and demonstrate system behavior. When database changes are considered, comparing what's set up locally to the new schema change can be a point of conversation between teams that manage the data and teams than manage the code. The same thing applies to cloud environment connections between services, and to thrid pary API vendor interactions. When change involves less uncertainty, confidence to experiment increases. +**Better cross-team collaboration.** The monorepo can be spun up fairly simply for team collaboration and to demonstrate system behavior. In terms of database changes, comparing what's set up locally to the new schema change can be a point of conversation between teams that manage the data and teams than manage the code. The same thing applies to cloud environment connections between services, and third-party API vendor interactions. When change involves less uncertainty, then confidence to experiment increases. ## Supported Capabilities @@ -73,8 +71,8 @@ A local monorepo can help expose duplicated scripts and configuration across mic ### [Test Automation](/capabilities/test-automation.md) -Being able to test the interactions between the services the team maintains means there is much greater confidence that the whole system will perform as expected. Knowing those interactions still work while developing instead of at deploy-time shortens feedback loops and speeds up delivery. +When the team can test interactions between services, there is greater confidence that the whole system will perform as expected. Knowing those interactions still work *while developing*, instead of at deploy time, shortens feedback loops and speeds up delivery. ### [Test Data Management](/capabilities/test-data-management.md) -Having control of the data locally means we also gain knowledge of the data's schema. By knowing and maintaining this information within the monorepo, we can eliminate uncertainty caused by database changes that previously existed outside the team's control. The same can be said for outside service vendors that supply information or data transformations our system depends on. By designing echo servers, we can even start to build the codebase to be robust in the event these services don't perform flawlessly via principles of Chaos Engineering. +Having control of the data locally means we also gain knowledge of the data's schema. Having and maintaining this information within the monorepo means we can eliminate uncertainty caused by database changes that previously existed outside the team's control. The same can be said for outside service vendors that supply information or data transformations the system depends on. By designing echo servers, teams can build more robust codebases (in the event these services don't perform flawlessly via principles of Chaos Engineering). 
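As a sketch of the data generators mentioned in the docker section above: the example below assumes a local Postgres container and the node-postgres (`pg`) client, and the `orders` table with its columns is a placeholder for whatever your scrubbed schema dump actually contains.

```typescript
// Hypothetical data generator for the local dummy database.
// Table and column names are invented; adapt them to your own schema.
import { Client } from "pg";

async function seedOrders(count: number): Promise<void> {
  const client = new Client({
    connectionString:
      process.env.LOCAL_DB_URL ?? "postgres://dev:dev@localhost:5432/local_dev",
  });
  await client.connect();
  try {
    // Reset to a known baseline, then insert randomized rows.
    await client.query("TRUNCATE TABLE orders RESTART IDENTITY CASCADE");
    for (let i = 0; i < count; i++) {
      await client.query(
        "INSERT INTO orders (customer_id, status, total_cents) VALUES ($1, $2, $3)",
        [
          1 + Math.floor(Math.random() * 50),           // random customer
          Math.random() < 0.1 ? "failed" : "fulfilled", // sprinkle in edge cases
          Math.floor(Math.random() * 100_000),
        ]
      );
    }
  } finally {
    await client.end();
  }
}

seedOrders(200).catch((err) => {
  console.error(err);
  process.exit(1);
});
```

A handful of small scripts like this (empty tables, very large tables, known edge cases) makes it cheap to put the local database into exactly the state a change needs to be tested against.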
From 382169f56620c8fb2215e0628fa84e7c65291d0a Mon Sep 17 00:00:00 2001 From: Ian Carroll Date: Fri, 26 Dec 2025 14:42:53 -0800 Subject: [PATCH 111/131] Clarify vagueries --- practices/build-local-dev-monorepo.md | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/practices/build-local-dev-monorepo.md b/practices/build-local-dev-monorepo.md index bcd73e5..8c739b9 100644 --- a/practices/build-local-dev-monorepo.md +++ b/practices/build-local-dev-monorepo.md @@ -2,6 +2,8 @@ Developing with dozens of local micro-repos that need to work together (and may also have database access or cloud permissions needs) is both daunting and time-consuming -- it can feel like your choices are either flying blind or waiting to get off the ground. Teams may encounter duplicated scripts and configuration across these many micro-repos. They may not trust interactions between services will behave as expected. And database changes may be out of their control, leading to more uncertainty. But when scattered projects are consolidated into a single monorepo, teams can see the system at work locally and start coding anywhere (even on a plane), without needing access approvals or an unused cloud environment. Many popular build tools support this functionality: +#### build tools that support this functionality + - [pnpm](https://pnpm.io/workspaces) - [Nx](https://nx.dev/concepts/decisions/why-monorepos) - [Bazel](https://bazel.build/concepts/build-ref#workspace) @@ -21,7 +23,7 @@ Choose as few services as you can get away with (2-3 ideally) that already depen ### Create a Monorepo to Wrap the Services -In the parent directory, create a repo that uses a build tool that supports workspaces or monorepos (see the list above). Develop shell scripts that clone the interdependent repos into `/services` from the monorepo root, build each service, run each service, etc. +In the parent directory, create a repo that uses a build tool that supports workspaces or monorepos ([see the list above](#build-tools-that-support-this-functionality)). Develop shell scripts that clone the interdependent repos into `/services` from the monorepo root, then extend the shell scripts to set up each of the services for local development as you would in isolation(install, build, run, etc.) ### Add an Integration Testing Layer @@ -29,15 +31,17 @@ Use this monorepo to test the seams of interaction between the services, ensurin ### Make a Local Docker Container With a Dummy Database -Use this pilot to work through defining the interfaces to the "outside world." Use docker-compose to orchestrate fake versions of the various databases your services depend on. Dump a working copy of the data from a lower environment (if none is available, scrub a production dump of PII and use that). Start by populating the local database with the dump, then add migration scripts to keep the schema current. Finally, create data generators to quickly set the database to specific or random conditions the code will work against. +Use this pilot to work through defining the interfaces to the "outside world." Use docker-compose to orchestrate fake versions of the various databases your services depend on. It's likely you'll need a local copy of data that exists in running cloud-based systems. Create a copied data file from a database dump of a working staging environment (if none is available, scrub a production dump of PII and use that). 
Start by populating the local database with the copied file, then add migration scripts to keep the schema current. Finally, create data generators to quickly set the database to specific or random conditions the code will work against. ### Fill in Vendor Dependecies With Echo Servers -For additional services that call vendors that return values, create echo servers that take requests in an identical shape and return fake production-like repsonses. This is an opportunity to build in robustness into the system if vendor latency fluxuates -- when there is additional time, simulate this as well. Do the same for services that don't directly talk to the services, but modify the database on cron schedules. +Echo servers are named because they simply "echo back" what you send them. The echo servers we need will only be slightly more complex. They'll accept requests shaped as your vendor expects and return responses shaped the same as how your vendor replies. There should be next to no business logic in them. + +Your services may depend on calls to third-party vendors, such as accounting software, a CRM, or even a Zip Code finder. Calling out to the vendors directly, whenther using their live APIs or development sandboxes can have technical limitations or throttle feedback loops. To isolate your development environment and speed up feedback loops, you can instead create these basic servers ("echo servers") that take requests in an identical shape and return fake production-like repsonses. This can also create an opportunity to build in robustness into the system if vendor latency fluxuates -- when there is additional time, simulate this as well. Do the same for services that don't directly talk to the services, but modify the database on cron schedules. ### Create a Migration Path -Before you have a tested and independently running local setup, get your team involved and demonstrate how it works for you. Treat this as an experiment, not a mandate. Allow individual developers to adopt the monorepo locally and become advocates and champions for it. Pair program using the tooling and help orther developers mirror your setup. You know the team has adopted the practice when developers reference it in work that doesn't directly involve you. +Before you have a tested and independently running local setup, get your team involved and demonstrate how it works for you. Treat this as an experiment, not a mandate. Allow individual developers to adopt the monorepo locally and become advocates and champions for it. Pair program using the tooling and help orther developers mirror your setup. You know the team has adopted the practice when developers reference the monorepo amongst each other when when discussing daily work that's independent of yours. ### Continue Improving DevEx @@ -46,8 +50,8 @@ Schedule time with developers working in the new repo to understand how they use ## Lessons From The Field -- [*insert summary line*] If your company is structured in a way that you must wait for resource allocations or a separate team to build/grant you access, then use that time to start building this local repo. Once you've got an experimental version up and running, share it with the team. -- [*insert summary line*] Remote teams might not make thier local developer setups visible. Just because no one mentions the tool or praises it doesn't mean it's not being used. If it saves your developers time and easily shows them what they need to see, then they'll use it. 
+- *Use Idle-time to develop tooling* - Your organization may be structured in a way that you must wait for a separate team to build/configure/grant you access to required resurces - such as cloud databases, vendor sandboxes, or development environments. Instead of just waiting, use that time to start building this local repo. Once you've got an experimental version up and running, share it with the team. You may discover that you don't need to rely on others as much and can deliver value faster, saving cross-team-collaboration time for when it really matters. +- *Don't be discouraged by silence* - Remote teams might not make thier local developer setups visible. Just because no one mentions the monorepo or praises it doesn't mean it's not being used. If it saves your developers time and easily shows them what they need to see, then they'll use it. ## Deciding to Polish or Pitch @@ -55,7 +59,7 @@ After experimenting with this practice for **one month**, bring the team togethe ### Fast & Measurable -**Lead Time for Changes Decreases.** Changes spanning multiple repos should happen faster. By consolidating services into a monorepo, cross-cutting changes can be made without waiting for access approvals, access re-approvals, test deploying, or a clean tests enviornment. This reduces coordination overhead and speeds up delivery. Tools like [DX](https://getdx.com/platform/data-lake/) and [Jellyfish](https://jellyfish.co/platform/engineering-metrics/) can measure lead time changes. Custom solutions can also be built using data from project-planning tools like [JIRA](https://community.atlassian.com/forums/App-Central-articles/Cycle-Time-and-Lead-Time-in-Jira-Productivity-Measurement-with/ba-p/1905845) and [Monday](https://monday.com/blog/project-management/what-is-lead-time/). +**Lead Time for Changes Decreases.** Changes spanning multiple services and applications should happen faster. By consolidating services into a monorepo, cross-cutting changes can be made without waiting for access approvals, access re-approvals, test deploying, or an available cloud staging enviornment. This reduces coordination overhead and speeds up delivery. Tools like [DX](https://getdx.com/platform/data-lake/) and [Jellyfish](https://jellyfish.co/platform/engineering-metrics/) can measure lead time changes. Custom solutions can also be built using data from project-planning tools like [JIRA](https://community.atlassian.com/forums/App-Central-articles/Cycle-Time-and-Lead-Time-in-Jira-Productivity-Measurement-with/ba-p/1905845) and [Monday](https://monday.com/blog/project-management/what-is-lead-time/). ### Slow & Intangible @@ -75,4 +79,6 @@ When the team can test interactions between services, there is greater confidenc ### [Test Data Management](/capabilities/test-data-management.md) -Having control of the data locally means we also gain knowledge of the data's schema. Having and maintaining this information within the monorepo means we can eliminate uncertainty caused by database changes that previously existed outside the team's control. The same can be said for outside service vendors that supply information or data transformations the system depends on. By designing echo servers, teams can build more robust codebases (in the event these services don't perform flawlessly via principles of Chaos Engineering). +Having control of the data locally means we also gain knowledge of the data's schema. 
Having and maintaining this information within the monorepo means we can have confidence our code works against a given database schema. Comparing that schema to ones in cloud environments allows developers to stay in step with any updates a DBA or Data Team may add, which means there are less surprises, unsceduled meetings, and debugging at deploy-time. + +The same can be said for outside service vendors that supply information or data transformations the system depends on. By designing echo servers, developers can get faster feedback. Additionally, teams can build more robust code when a vendor's system slows or goes down. Major cloud vendors have been in the news for bringing enterprise systems down with them when the vendor experiences an outage. Since echo servers can also be built to simulate an outage, developers can develop graceful failure states that keep our systems running. From e67a17e1fd98f3ff37cebc0aa93241d2343dcb2a Mon Sep 17 00:00:00 2001 From: nicoletache Date: Tue, 30 Dec 2025 14:45:56 -0600 Subject: [PATCH 112/131] final edits to build monorepo for local dev practice --- practices/build-local-dev-monorepo.md | 30 +++++++++++++-------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/practices/build-local-dev-monorepo.md b/practices/build-local-dev-monorepo.md index 8c739b9..2c72bab 100644 --- a/practices/build-local-dev-monorepo.md +++ b/practices/build-local-dev-monorepo.md @@ -1,8 +1,6 @@ # Build a Monorepo for Local Development -Developing with dozens of local micro-repos that need to work together (and may also have database access or cloud permissions needs) is both daunting and time-consuming -- it can feel like your choices are either flying blind or waiting to get off the ground. Teams may encounter duplicated scripts and configuration across these many micro-repos. They may not trust interactions between services will behave as expected. And database changes may be out of their control, leading to more uncertainty. But when scattered projects are consolidated into a single monorepo, teams can see the system at work locally and start coding anywhere (even on a plane), without needing access approvals or an unused cloud environment. Many popular build tools support this functionality: - -#### build tools that support this functionality +Developing with dozens of local micro-repos that need to work together (and may also have database access or cloud permissions needs) is both daunting and time-consuming. Sometimes it can feel like your choices are either flying blind or waiting to get off the ground. Teams may encounter duplicated scripts and configuration across these many micro-repos. They may not trust interactions between services will behave as expected. And database changes may be out of their control, leading to more uncertainty. But when scattered projects are consolidated into a single monorepo, teams can see the system at work locally and start coding anywhere (even on a plane), without needing access approvals or an unused cloud environment. 
Many popular build tools support this functionality, including:

- [pnpm](https://pnpm.io/workspaces)
- [Nx](https://nx.dev/concepts/decisions/why-monorepos)
@@ -17,13 +15,13 @@ Developing with dozens of local micro-repos that need to work together (and may

## How to Gain Traction

-### Start by Bringing Together the Interdependent Repos
+### Start by Bringing Together Some Interdependent Repos

Choose as few services as you can get away with (2-3 ideally) that already depend on each other and form a complete system. Clone them under a single directory called something like `/services`.

### Create a Monorepo to Wrap the Services

-In the parent directory, create a repo that uses a build tool that supports workspaces or monorepos ([see the list above](#build-tools-that-support-this-functionality)). Develop shell scripts that clone the interdependent repos into `/services` from the monorepo root, then extend the shell scripts to set up each of the services for local development as you would in isolation(install, build, run, etc.)
+In the parent directory, create a repo that uses a build tool that supports workspaces or monorepos (see list at top of page). Develop shell scripts that clone the interdependent repos into `/services` from the monorepo root, then extend the shell scripts to set up each of the services for local development as you would in isolation (install, build, run, etc.).

### Add an Integration Testing Layer

@@ -35,23 +33,23 @@ Use this pilot to work through defining the interfaces to the "outside world." U

### Fill in Vendor Dependencies With Echo Servers

-Echo servers are named because they simply "echo back" what you send them. The echo servers we need will only be slightly more complex. They'll accept requests shaped as your vendor expects and return responses shaped the same as how your vendor replies. There should be next to no business logic in them.
+Echo servers are named because they simply "echo back" what you send them. The echo servers we need will only be slightly more complex. They'll accept requests shaped as your vendor expects and return responses shaped in the way your vendor replies. There should be next to no business logic in them.

-Your services may depend on calls to third-party vendors, such as accounting software, a CRM, or even a Zip Code finder. Calling out to the vendors directly, whenther using their live APIs or development sandboxes can have technical limitations or throttle feedback loops. To isolate your development environment and speed up feedback loops, you can instead create these basic servers ("echo servers") that take requests in an identical shape and return fake production-like repsonses. This can also create an opportunity to build in robustness into the system if vendor latency fluxuates -- when there is additional time, simulate this as well. Do the same for services that don't directly talk to the services, but modify the database on cron schedules.
+Your services may depend on calls to third-party vendors, such as accounting software, a CRM, or even a Zip Code finder. Calling out to the vendors directly, whether using their live APIs or development sandboxes, can have technical limitations or throttle feedback loops. To isolate your development environment and speed up feedback loops, you can instead create these basic echo servers that take requests in an identical shape and return fake production-like responses.
This can also create an opportunity to build robustness into the system if vendor latency fluctuates -- when there is additional time, simulate this as well. Do the same for services that don't directly talk to vendors, but modify the database on cron schedules.

### Create a Migration Path

-Before you have a tested and independently running local setup, get your team involved and demonstrate how it works for you. Treat this as an experiment, not a mandate. Allow individual developers to adopt the monorepo locally and become advocates and champions for it. Pair program using the tooling and help orther developers mirror your setup. You know the team has adopted the practice when developers reference the monorepo amongst each other when when discussing daily work that's independent of yours.
+Before you have a tested and independently running local setup, get your team involved and demonstrate how it works for you. Treat this as an experiment, not a mandate. Allow individual developers to adopt the monorepo locally and become advocates and champions for it. Pair program using the tooling and help other developers mirror your setup. You know the team has adopted the practice when developers reference the monorepo when discussing daily work that's independent of yours.

### Continue Improving DevEx

-Schedule time with developers working in the new repo to understand how they use it and where they still encounter friction. This can be done during the course of regular work if you pair program. Listen to any gripes and complaints, and note what still causes you confusion or wait times. Brainstorm potential solutions as a group and set aside time to improve problematic parts of the system. Focus on improving the items listed in the Polish or Pitch section of this page to ensure efforts are making a difference. You know it's simple to set up and use when you observe developers using it who haven't come to you with questions first.
+Schedule time with developers working in the new repo to understand how they use it and where they still encounter friction. This can be done during the course of regular work if you pair program. Listen to any gripes and complaints, and note what still causes confusion or wait times. Brainstorm potential solutions as a group and set aside time to improve problematic parts of the system. Focus on improving items listed in the Polish or Pitch section of this page to ensure efforts are making a difference. You know the repo is simple to set up and use when you observe developers using it who haven't come to you with questions first.

## Lessons From The Field

-- *Use Idle-time to develop tooling* - Your organization may be structured in a way that you must wait for a separate team to build/configure/grant you access to required resurces - such as cloud databases, vendor sandboxes, or development environments. Instead of just waiting, use that time to start building this local repo. Once you've got an experimental version up and running, share it with the team. You may discover that you don't need to rely on others as much and can deliver value faster, saving cross-team-collaboration time for when it really matters.
-- *Don't be discouraged by silence* - Remote teams might not make thier local developer setups visible. Just because no one mentions the monorepo or praises it doesn't mean it's not being used. If it saves your developers time and easily shows them what they need to see, then they'll use it.
+- *Use Idle-time to Develop Tooling* - Your organization may be structured in a way that requires you to wait for a separate team to build/configure/grant you access to needed resources -- cloud databases, vendor sandboxes, or development environments. Instead of just waiting, use that time to start building this local repo. You may discover that you don't need to rely on others as much here and you can deliver value faster than you thought. Once you've got an experimental version up and running, share it with the team. This way, you save cross-team collaboration for when it really matters. +- *Don't Be Discouraged By Silence* - Remote teams might not make their local developer setups visible. Just because no one mentions the monorepo or praises it doesn't mean it's not being used. If it saves your developers time and easily shows them what they need to see, then they'll use it. ## Deciding to Polish or Pitch @@ -59,13 +57,13 @@ After experimenting with this practice for **one month**, bring the team togethe ### Fast & Measurable -**Lead Time for Changes Decreases.** Changes spanning multiple services and applications should happen faster. By consolidating services into a monorepo, cross-cutting changes can be made without waiting for access approvals, access re-approvals, test deploying, or an available cloud staging enviornment. This reduces coordination overhead and speeds up delivery. Tools like [DX](https://getdx.com/platform/data-lake/) and [Jellyfish](https://jellyfish.co/platform/engineering-metrics/) can measure lead time changes. Custom solutions can also be built using data from project-planning tools like [JIRA](https://community.atlassian.com/forums/App-Central-articles/Cycle-Time-and-Lead-Time-in-Jira-Productivity-Measurement-with/ba-p/1905845) and [Monday](https://monday.com/blog/project-management/what-is-lead-time/). +**Lead Time for Changes Decreases.** Changes spanning multiple services and applications should happen faster. By consolidating services into a monorepo, cross-cutting changes can be made without waiting for access approvals, access re-approvals, test deploying, or an available cloud staging environment. This reduces coordination overhead and speeds up delivery. Tools like [DX](https://getdx.com/platform/data-lake/) and [Jellyfish](https://jellyfish.co/platform/engineering-metrics/) can measure lead time changes. Custom solutions can also be built using data from project-planning tools like [JIRA](https://community.atlassian.com/forums/App-Central-articles/Cycle-Time-and-Lead-Time-in-Jira-Productivity-Measurement-with/ba-p/1905845) and [Monday](https://monday.com/blog/project-management/what-is-lead-time/). ### Slow & Intangible -**Cleaner Service Boundaries.** Refactoring service boundaries should become easier. When everything lives in one repo, poor service boundaries can be removed with less friction. On the flipside, teams can more quickly extract new services with the shared tooling, configuration, and build setup. +**Cleaner Service Boundaries.** Refactoring service boundaries should become easier. When everything lives in one repo, poor service boundaries can be removed with less friction. Teams can also quickly extract new services with the shared tooling, configuration, and build setup. -**Better cross-team collaboration.** The monorepo can be spun up fairly simply for team collaboration and to demonstrate system behavior. 
In terms of database changes, comparing what's set up locally to the new schema change can be a point of conversation between teams that manage the data and teams than manage the code. The same thing applies to cloud environment connections between services, and third-party API vendor interactions. When change involves less uncertainty, then confidence to experiment increases.
+**Better Cross-team Collaboration.** The monorepo can be spun up fairly simply for team collaboration and to demonstrate system behavior. In terms of database changes, comparing what's set up locally to the new schema change can be a point of conversation between teams that manage the data and teams that manage the code. The same thing applies to cloud environment connections between services and third-party API vendor interactions. When change involves less uncertainty, the confidence to experiment increases.

## Supported Capabilities

@@ -79,6 +77,6 @@ When the team can test interactions between services, there is greater confidenc

### [Test Data Management](/capabilities/test-data-management.md)

-Having control of the data locally means we also gain knowledge of the data's schema. Having and maintaining this information within the monorepo means we can have confidence our code works against a given database schema. Comparing that schema to ones in cloud environments allows developers to stay in step with any updates a DBA or Data Team may add, which means there are less surprises, unsceduled meetings, and debugging at deploy-time.
+Having control of the data locally means understanding the data's schema. By maintaining this information within the monorepo, we can feel confident our code works against a given database schema. Comparing that schema to ones in cloud environments allows developers to stay in step with any updates a DBA or data team may add, which means there are fewer surprises, unscheduled meetings, and debugging at deploy time.

-The same can be said for outside service vendors that supply information or data transformations the system depends on. By designing echo servers, developers can get faster feedback. Additionally, teams can build more robust code when a vendor's system slows or goes down. Major cloud vendors have been in the news for bringing enterprise systems down with them when the vendor experiences an outage. Since echo servers can also be built to simulate an outage, developers can develop graceful failure states that keep our systems running.
+The same can be said for outside service vendors that supply information or data transformations the system depends on. By designing echo servers, developers can get faster feedback. Additionally, teams can build more robust code when a vendor's system slows or goes down. Major cloud vendors have been in the news for bringing enterprise systems down with them when the vendor experiences an outage. Since echo servers can also be built to simulate an outage, developers can build graceful failure states that keep systems running.
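
As a concrete illustration of the echo servers described above, here is a minimal sketch, assuming a Node/TypeScript stack. The handler, response envelope, port, and environment variables are hypothetical stand-ins -- mirror your real vendor's contract instead:

```typescript
// echo-server.ts -- a stand-in for a third-party vendor, assuming Node/TypeScript.
// It echoes requests back in a production-like envelope and can simulate latency or outages.
import { createServer } from "node:http";

const LATENCY_MS = Number(process.env.ECHO_LATENCY_MS ?? 0); // fixed delay per request
const FAIL_RATE = Number(process.env.ECHO_FAIL_RATE ?? 0);   // 0..1 chance of a 503

const server = createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    setTimeout(() => {
      if (Math.random() < FAIL_RATE) {
        // Pretend the vendor is down so callers have to exercise their failure paths.
        res.writeHead(503, { "Content-Type": "application/json" });
        res.end(JSON.stringify({ error: "simulated vendor outage" }));
        return;
      }
      let received: unknown = null;
      try {
        received = body ? JSON.parse(body) : null;
      } catch {
        received = body; // echo non-JSON payloads back as raw text
      }
      // No business logic: just reflect the request in the shape the vendor would use.
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ status: "ok", received }));
    }, LATENCY_MS);
  });
});

server.listen(Number(process.env.PORT ?? 4010));
```

Pointing a service at this port during local development, then dialing `ECHO_FAIL_RATE` up, is a cheap way to rehearse the graceful-failure behavior described above.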
From 38d823d146c51b36193a52b40cfef21f02426c9f Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Tue, 27 Jan 2026 13:36:00 -0700 Subject: [PATCH 113/131] minor fixes --- practices/build-local-dev-monorepo.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/practices/build-local-dev-monorepo.md b/practices/build-local-dev-monorepo.md index 2c72bab..c19198c 100644 --- a/practices/build-local-dev-monorepo.md +++ b/practices/build-local-dev-monorepo.md @@ -25,7 +25,7 @@ In the parent directory, create a repo that uses a build tool that supports work ### Add an Integration Testing Layer -Use this monorepo to test the seams of interaction between the services, ensuring that contracts and connections still function. This can be a harness to gain more confidence that each service previously built in isolation will still work well with the rest of the system. Snapshot testing via a framework like jest, while not ideal in the long term, can get the job done initially. Do not add unit tests here; instead, develop a script that can run through the tests on each service the monorepo houses. +Use this monorepo to test the seams of interaction between the services, ensuring that contracts and connections still function. This can be a harness to gain more confidence that each service previously built in isolation will still work well with the rest of the system. Snapshot testing via a framework like [jest](https://jestjs.io/), while not ideal in the long term, can get the job done initially. Do not add unit tests here; instead, develop a script that can run through the tests on each service the monorepo houses. ### Make a Local Docker Container With a Dummy Database @@ -61,7 +61,7 @@ After experimenting with this practice for **one month**, bring the team togethe ### Slow & Intangible -**Cleaner Service Boundaries.** Refactoring service boundaries should become easier. When everything lives in one repo, poor service boundaries can be removed with less friction. Teams can also quickly extract new services with the shared tooling, configuration, and build setup. +**Cleaner Service Boundaries.** Refactoring service boundaries should become easier. When everything lives in one repo, poor service boundaries can be more clearly defined with less friction. Teams can also quickly extract new services with the shared tooling, configuration, and build setup. **Better Cross-team Collaboration.** The monorepo can be spun up fairly simply for team collaboration and to demonstrate system behavior. In terms of database changes, comparing what's set up locally to the new schema change can be a point of conversation between teams that manage the data and teams than manage the code. The same thing applies to cloud environment connections between services and third-party API vendor interactions. When change involves less uncertainty, the confidence to experiment increases. From 29c8718c6e8968abc6ae3f6194b4c98e95bb48ba Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Tue, 27 Jan 2026 15:38:39 -0700 Subject: [PATCH 114/131] Simplify practice template --- templates/new-practice.md | 87 ++++++++++++++++++++++++--------------- 1 file changed, 53 insertions(+), 34 deletions(-) diff --git a/templates/new-practice.md b/templates/new-practice.md index 54702c0..979f222 100644 --- a/templates/new-practice.md +++ b/templates/new-practice.md @@ -1,54 +1,73 @@ -> Review an existing practice like [Migrate to Monorepo](/practices/migrate-to-monorepo.md) to see a good example of a practice following this template. 
+# Use Data-generation Tools -# `[Action-Oriented Title]` - -```text -Quick 2-4 sentence summary. What’s the practice? Why should teams care? Keep it casual and motivating. -``` +Introduction. 2-4 paragraphs. ## When to Experiment -```text -You are a [persona] and need to [learn how to / ensure that] so you can [end goal].” -(List for each relevant persona: Non-technical exec, Technical exec, Developer, QA, PM, Product Manager, etc.) -``` +2-6 bullet points. (use markdown dash syntax for unordered lists) +- Reason 1 +- Reason 2 +- etc... ## How to Gain Traction -```text -List 3–5 steps to take a team from zero to adopted. -For each step, include: - ### [Action Step] - 3 sentences on how to do it, how to get buy-in, and what tools/resources help. Any external resources (videos, guides, book lists, templates, etc.) that help a team adopt this practice should be linked here within the relevant action step. -``` + + +Small introduction. 1 paragraph. + +### First Step Title + +1 paragraph. + +### Second Step Title + +1 paragraph. ## Lessons From The Field -```text -This section captures real-world patterns (things that consistently help or hinder this practice) along with short, relevant stories from the field. It’s not for personal rants or generic opinions. Each entry must be based on either: -1. a repeated observation across teams, or -2. a specific example (what worked, what didn’t, and why). -``` +2-6 bullet points. (use markdown dash syntax for unordered lists) +- Anicdote 1 +- Anicdote 2 +- etc... ## Deciding to Polish or Pitch -After experimenting with this practice for [**insert appropriate quantity of time in bold**], bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: + -Organize metrics and signals into Fast & Measurable, Fast & Intangible, Slow & Measurable, or Slow & Intangible, but only include categories with strong, defensible signals. Exclude weak or hard-to-attribute signals. + +After experimenting with this practice for 2-3 weeks, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: -For measurable items, specify how to track them (e.g., DX, Jira, CI dashboards). For intangible items, note how to capture feedback (e.g., surveys, retro notes, developer chatter). +### Fast & Intangible -Keep metrics scoped and outcome-focused (e.g., “reduced lead time for cross-repo changes” instead of just “reduced lead time”). -``` +**Title of benefit**. 2-4 sentences about the benefit. + +### Slow & Tangible + +**Title of benefit**. 2-4 sentences about the benefit. ## Supported Capabilities -```text -List 1–4 existing DORA Capabilities this practice supports. -For each: - ### Capability Name (link) - 1–2 sentences on how this practice helps improve it. -``` + + +### [Example Capability](/capabilities/example-capability.md) + +2-4 sentences about how this capability relates to the practice. + +### [Example Capability](/capabilities/example-capability.md) + +2-4 sentences about how this capability relates to the practice. From 2fb50d65707dd159aaa0424657a62eb2ef67a19a Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Tue, 27 Jan 2026 15:49:39 -0700 Subject: [PATCH 115/131] improve template --- templates/new-practice.md | 1 + 1 file changed, 1 insertion(+) diff --git a/templates/new-practice.md b/templates/new-practice.md index 979f222..cde9044 100644 --- a/templates/new-practice.md +++ b/templates/new-practice.md @@ -4,6 +4,7 @@ Introduction. 2-4 paragraphs. 
## When to Experiment + 2-6 bullet points. (use markdown dash syntax for unordered lists) - Reason 1 - Reason 2 From 17053cd91bdd4d62e5c0cd4362440eb9ce04d8e9 Mon Sep 17 00:00:00 2001 From: Nicole Tache Date: Wed, 3 Sep 2025 14:45:53 -0500 Subject: [PATCH 116/131] Update implement-a-documentation-search-engine.md Added text for new practice -- Introduce an Enterprise Search Tool (formerly known as "Implement a Documentation Search Engine"). --- ...implement-a-documentation-search-engine.md | 62 +++++++++++++++++++ 1 file changed, 62 insertions(+) create mode 100644 practices/implement-a-documentation-search-engine.md diff --git a/practices/implement-a-documentation-search-engine.md b/practices/implement-a-documentation-search-engine.md new file mode 100644 index 0000000..596cfd9 --- /dev/null +++ b/practices/implement-a-documentation-search-engine.md @@ -0,0 +1,62 @@ +# Introduce an Enterprise Search Tool + +When critical system knowledge is locked away in the minds of a few tenured employees, bottlenecks are created, onboarding slows, and reliance on key personnel increases. Incomplete or outdated context will always lead to rework. An enterprise search tool like [Glean](https://www.glean.com/enterprise-search-software) or [Claude for Enterprise](https://claude.ai/login?returnTo=%2F%3F) unifies access to information across JIRA, Confluence, SVN, shared drives, and more. This allows teams to quickly find what they need without interrupting others or digging through multiple systems that require specialized knowledge. + +## When to Experiment + +“I am a new developer and I need to learn how to find accurate, updated documentation so I can onboard quickly and easily.” + +“I am a QA team member and I need to ensure I have quick access to system knowledge so I can avoid wasting time sifting through multiple systems.” + +“I am a developer and I need to ensure I have access to complete and up-to-date requirements so I can produce quality tickets and avoid QA rejections.” + +“I am a senior team member and I need to ensure that specialized knowledge is accessible so I can reduce interruptions and boost team autonomy.” + +## How to Gain Traction + + + +List 3–5 steps to take a team from zero to adopted. +For each step, include: + +### [Action Step] + +3 sentences on how to do it, how to get buy-in, and what tools/resources help. Any external resources (videos, guides, book lists, templates, etc.) that help a team adopt this practice should be linked here within the relevant action step. + +## Lessons From The Field + +[Pragmint to complete] + +This section captures real-world patterns (things that consistently help or hinder this practice) along with short, relevant stories from the field. It’s not for personal rants or generic opinions. Each entry must be based on either: +1. a repeated observation across teams, or +2. a specific example (what worked, what didn’t, and why). 
+ +## Deciding to Polish or Pitch + +After experimenting with this practice for [insert appropriate quantity of time in bold], bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: + +### Fast & Measurable + +Improved ticket quality and fewer QA rejections due to incomplete or unclear requirements + +### Slow & Measurable + +Increased self-service search usage, indicating trust and utility in the tool + +Reduced onboarding time for new developers and QA (time to first merged PR or tested ticket) + +### Slow & Intangible + +Lower interruption load on senior team members (tracked by volume of support requests sent via Teams chat) + +Fewer context-gathering delays in ticket implementation (measured via cycle time or qualitative developer feedback) + +## Supported Capabilities + +### [Learning Culture](https://github.com/pragmint/open-practices/blob/main/capabilities/learning-culture.md)) + +By centralizing institutional knowledge and making it accessible, enterprise search should boost team autonomy, reduce wasted time, and support a healthier learning culture. + +### [Documentation Quality](https://github.com/pragmint/open-practices/blob/main/capabilities/documentation-quality.md)) + +Excellent documentation is accurate, clear, complete, and accessible to internal teams. This enables teams to effectively collaborate, make informed decisions, and deliver high-quality software quickly and reliably. From 52e0fad5b56b4064c2c0895902769c769be90470 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 28 Jan 2026 10:14:25 -0700 Subject: [PATCH 117/131] improving enterprise search tool --- ...implement-a-documentation-search-engine.md | 61 ++++++++----------- 1 file changed, 25 insertions(+), 36 deletions(-) diff --git a/practices/implement-a-documentation-search-engine.md b/practices/implement-a-documentation-search-engine.md index 596cfd9..ab7bafe 100644 --- a/practices/implement-a-documentation-search-engine.md +++ b/practices/implement-a-documentation-search-engine.md @@ -1,62 +1,51 @@ -# Introduce an Enterprise Search Tool +# Implement a Document Search Engine -When critical system knowledge is locked away in the minds of a few tenured employees, bottlenecks are created, onboarding slows, and reliance on key personnel increases. Incomplete or outdated context will always lead to rework. An enterprise search tool like [Glean](https://www.glean.com/enterprise-search-software) or [Claude for Enterprise](https://claude.ai/login?returnTo=%2F%3F) unifies access to information across JIRA, Confluence, SVN, shared drives, and more. This allows teams to quickly find what they need without interrupting others or digging through multiple systems that require specialized knowledge. +When critical system knowledge is locked away in the minds of a few tenured employees, bottlenecks are created, onboarding slows, and reliance on key personnel increases. Incomplete or outdated context will always lead to rework. An enterprise search tool—such as [Glean](https://www.glean.com/) or [Claude](https://claude.ai/) for Enterprise—unifies access to information across JIRA, Confluence, SVN, shared drives, and communication platforms. + +This practice allows teams to quickly find what they need without interrupting others or digging through multiple systems that require specialized knowledge. 
By centralizing institutional knowledge and making it accessible, enterprise search boosts team autonomy, reduces wasted time, and supports a healthier learning culture where answers are self-served rather than gatekept. ## When to Experiment -“I am a new developer and I need to learn how to find accurate, updated documentation so I can onboard quickly and easily.” +- You are a **New Developer** and you need to learn how to find accurate, updated documentation so you can onboard quickly without constantly interrupting mentors. +- You are a **QA Engineer** and you need to ensure you have quick access to system knowledge to avoid wasting time sifting through multiple systems during testing. +- You are a **Product Owner** or **Developer** and you need access to complete and up-to-date requirements to produce quality tickets and avoid downstream rejections. +- You are a **Senior Engineer** and you need to ensure that specialized knowledge is accessible to the wider team so you can reduce interruptions and boost team autonomy. -“I am a QA team member and I need to ensure I have quick access to system knowledge so I can avoid wasting time sifting through multiple systems.” +## How to Gain Traction -“I am a developer and I need to ensure I have access to complete and up-to-date requirements so I can produce quality tickets and avoid QA rejections.” +Implementing a document search engine requires more than just installing software; it requires mapping your knowledge ecosystem and training the team on how to retrieve it. -“I am a senior team member and I need to ensure that specialized knowledge is accessible so I can reduce interruptions and boost team autonomy.” +### Map the Knowledge Silos -## How to Gain Traction +Identify the highest-traffic repositories of knowledge that are currently disconnected. Usually, this begins with your ticket tracking system (e.g., [JIRA](https://www.atlassian.com/software/jira)), your documentation hub (e.g., [Confluence](https://www.atlassian.com/software/confluence)), and your version control system. Audit these sources to ensure permissions are clean before connecting a search tool, as effective search will surface documents that were previously "security through obscurity." - +### Connect and Pilot -List 3–5 steps to take a team from zero to adopted. -For each step, include: +Select an enterprise search tool and connect it to your two most critical data sources. Roll this out to a small pilot group—specifically targeting new hires and senior leads who feel the burden of questions most. Use this phase to tune the search relevance and ensure that the tool is indexing metadata correctly so that results are ranked by recency and relevance. -### [Action Step] +### Establish "Search First" Protocols -3 sentences on how to do it, how to get buy-in, and what tools/resources help. Any external resources (videos, guides, book lists, templates, etc.) that help a team adopt this practice should be linked here within the relevant action step. +To drive adoption, the team must shift from an "Ask First" to a "Search First" culture. Encourage senior staff to respond to questions with links to the search result rather than typing out the answer again. If a search comes up empty, use that as a trigger event to create the missing documentation immediately, ensuring the next search yields a result. 
## Lessons From The Field -[Pragmint to complete] - -This section captures real-world patterns (things that consistently help or hinder this practice) along with short, relevant stories from the field. It’s not for personal rants or generic opinions. Each entry must be based on either: -1. a repeated observation across teams, or -2. a specific example (what worked, what didn’t, and why). +- **The "Security through Obscurity" Trap:** Teams often realize their permission settings are lax only after a search engine surfaces sensitive HR or roadmap documents to the whole engineering org. Audit permissions _before_ indexing. +- **Trust decay from stale data:** If the top three search results are deprecated documents from three years ago, users will quickly abandon the tool. You must archive old data or boost the ranking of fresh content. ## Deciding to Polish or Pitch -After experimenting with this practice for [insert appropriate quantity of time in bold], bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: - -### Fast & Measurable - -Improved ticket quality and fewer QA rejections due to incomplete or unclear requirements - -### Slow & Measurable - -Increased self-service search usage, indicating trust and utility in the tool - -Reduced onboarding time for new developers and QA (time to first merged PR or tested ticket) - -### Slow & Intangible +After experimenting with this practice for 4-6 weeks, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: -Lower interruption load on senior team members (tracked by volume of support requests sent via Teams chat) +### Fast & Intangible -Fewer context-gathering delays in ticket implementation (measured via cycle time or qualitative developer feedback) +**Reduced Context-Gathering Delays** Feedback from developers should indicate less frustration and time spent "hunting" for requirements or historical context before starting a ticket. ## Supported Capabilities -### [Learning Culture](https://github.com/pragmint/open-practices/blob/main/capabilities/learning-culture.md)) +### [Learning Culture](/capabilities/learning-culture.md) -By centralizing institutional knowledge and making it accessible, enterprise search should boost team autonomy, reduce wasted time, and support a healthier learning culture. +By centralizing institutional knowledge and making it accessible, enterprise search boosts team autonomy, reduces wasted time, and supports a healthier learning culture. -### [Documentation Quality](https://github.com/pragmint/open-practices/blob/main/capabilities/documentation-quality.md)) +### [Documentation Quality](/capabilities/documentation-quality.md) -Excellent documentation is accurate, clear, complete, and accessible to internal teams. This enables teams to effectively collaborate, make informed decisions, and deliver high-quality software quickly and reliably. +Excellent documentation is accurate, clear, complete, and accessible. This tool ensures that high-quality documentation is actually found and used, enabling teams to make informed decisions. 
From c6dd5976668ea4fc88ad45c3423cdde4c4686b68 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 28 Jan 2026 10:25:29 -0700 Subject: [PATCH 118/131] fix practice template name --- templates/new-practice.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/templates/new-practice.md b/templates/new-practice.md index cde9044..a0e2d6b 100644 --- a/templates/new-practice.md +++ b/templates/new-practice.md @@ -1,4 +1,4 @@ -# Use Data-generation Tools +# Practice Name Introduction. 2-4 paragraphs. From d704f793e95af874c1af1b8564dec0a8d4f15c5d Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 28 Jan 2026 10:34:16 -0700 Subject: [PATCH 119/131] link AI-accessible internal data capability --- practices/implement-a-documentation-search-engine.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/practices/implement-a-documentation-search-engine.md b/practices/implement-a-documentation-search-engine.md index ab7bafe..90dc21a 100644 --- a/practices/implement-a-documentation-search-engine.md +++ b/practices/implement-a-documentation-search-engine.md @@ -42,6 +42,10 @@ After experimenting with this practice for 4-6 weeks, bring the team together to ## Supported Capabilities +### [AI-accessible Internal Data](/capabilities/ai-accessable-internal-data.md) + +This practice is one option for implementing AI-accessible Internal Data. There may be other options depending on your organizations complexity and needs but using an enterprise search solution is a fantastic option for smaller to mid size companies that don't have the resources or need the customization of an in house system. + ### [Learning Culture](/capabilities/learning-culture.md) By centralizing institutional knowledge and making it accessible, enterprise search boosts team autonomy, reduces wasted time, and supports a healthier learning culture. From 596a24ba711e7a2531e2d042ffcb216eda773b77 Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 28 Jan 2026 10:40:24 -0700 Subject: [PATCH 120/131] imporve lessons from the field --- practices/implement-a-documentation-search-engine.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/implement-a-documentation-search-engine.md b/practices/implement-a-documentation-search-engine.md index 90dc21a..2fc9756 100644 --- a/practices/implement-a-documentation-search-engine.md +++ b/practices/implement-a-documentation-search-engine.md @@ -29,7 +29,7 @@ To drive adoption, the team must shift from an "Ask First" to a "Search First" c ## Lessons From The Field -- **The "Security through Obscurity" Trap:** Teams often realize their permission settings are lax only after a search engine surfaces sensitive HR or roadmap documents to the whole engineering org. Audit permissions _before_ indexing. +- **Reliance on Language Models for Security:** Teams often realize their permission settings are lax only after a search engine surfaces sensitive HR or roadmap documents to the whole engineering org. Make sure security permissions are handled with the appropriate data layer security and not through system prompts or other language model means. - **Trust decay from stale data:** If the top three search results are deprecated documents from three years ago, users will quickly abandon the tool. You must archive old data or boost the ranking of fresh content. 
## Deciding to Polish or Pitch From 229147a5c65010993c23463857a7b84e32949bb0 Mon Sep 17 00:00:00 2001 From: nicoletache Date: Wed, 28 Jan 2026 12:27:08 -0600 Subject: [PATCH 121/131] final edits to intro-enterprise-search-tool --- ...implement-a-documentation-search-engine.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/practices/implement-a-documentation-search-engine.md b/practices/implement-a-documentation-search-engine.md index 2fc9756..24abbc9 100644 --- a/practices/implement-a-documentation-search-engine.md +++ b/practices/implement-a-documentation-search-engine.md @@ -6,10 +6,10 @@ This practice allows teams to quickly find what they need without interrupting o ## When to Experiment -- You are a **New Developer** and you need to learn how to find accurate, updated documentation so you can onboard quickly without constantly interrupting mentors. -- You are a **QA Engineer** and you need to ensure you have quick access to system knowledge to avoid wasting time sifting through multiple systems during testing. -- You are a **Product Owner** or **Developer** and you need access to complete and up-to-date requirements to produce quality tickets and avoid downstream rejections. -- You are a **Senior Engineer** and you need to ensure that specialized knowledge is accessible to the wider team so you can reduce interruptions and boost team autonomy. +- You are a **new developer** and you need to learn how to find accurate, updated documentation so you can onboard quickly without constantly interrupting mentors. +- You are a **QA engineer** and you need to ensure you have quick access to system knowledge to avoid wasting time sifting through multiple systems during testing. +- You are a **product owner** or **Developer** and you need access to complete and up-to-date requirements to produce quality tickets and avoid downstream rejections. +- You are a **senior engineer** and you need to ensure that specialized knowledge is accessible to the wider team so you can reduce interruptions and boost team autonomy. ## How to Gain Traction @@ -29,22 +29,22 @@ To drive adoption, the team must shift from an "Ask First" to a "Search First" c ## Lessons From The Field -- **Reliance on Language Models for Security:** Teams often realize their permission settings are lax only after a search engine surfaces sensitive HR or roadmap documents to the whole engineering org. Make sure security permissions are handled with the appropriate data layer security and not through system prompts or other language model means. -- **Trust decay from stale data:** If the top three search results are deprecated documents from three years ago, users will quickly abandon the tool. You must archive old data or boost the ranking of fresh content. +- _Reliance on Language Models for Security_ - Teams often realize their permission settings are lax only after a search engine surfaces sensitive HR or roadmap documents to the whole engineering org. Make sure security permissions are handled with the appropriate data layer security and not through system prompts or other language model means. +- _Trust decay from stale data_ - If the top three search results are deprecated documents from three years ago, users will quickly abandon the tool. You must archive old data or boost the ranking of fresh content. 
## Deciding to Polish or Pitch -After experimenting with this practice for 4-6 weeks, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: +After experimenting with this practice for **4-6 weeks,** bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: ### Fast & Intangible -**Reduced Context-Gathering Delays** Feedback from developers should indicate less frustration and time spent "hunting" for requirements or historical context before starting a ticket. +**Reduced Context-gathering Delays.** Feedback from developers should indicate less frustration and time spent "hunting" for requirements or historical context before starting a ticket. ## Supported Capabilities ### [AI-accessible Internal Data](/capabilities/ai-accessable-internal-data.md) -This practice is one option for implementing AI-accessible Internal Data. There may be other options depending on your organizations complexity and needs but using an enterprise search solution is a fantastic option for smaller to mid size companies that don't have the resources or need the customization of an in house system. +This practice is one option for implementing AI-accessible Internal Data. There may be other options, depending on your organization's complexity and needs, but using an enterprise search solution is a fantastic option for smaller to mid-size companies that don't have the resources or need the customization of an in-house system. ### [Learning Culture](/capabilities/learning-culture.md) @@ -52,4 +52,4 @@ By centralizing institutional knowledge and making it accessible, enterprise sea ### [Documentation Quality](/capabilities/documentation-quality.md) -Excellent documentation is accurate, clear, complete, and accessible. This tool ensures that high-quality documentation is actually found and used, enabling teams to make informed decisions. +Excellent documentation is accurate, clear, complete, and accessible. This practice ensures that high-quality documentation is actually found and used, enabling teams to make informed decisions. From ec85dee16f0cf7b1564421e66dcdfa37dfa0832a Mon Sep 17 00:00:00 2001 From: Tristan Barrow Date: Wed, 28 Jan 2026 15:07:26 -0700 Subject: [PATCH 122/131] fix personas --- practices/implement-a-documentation-search-engine.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/practices/implement-a-documentation-search-engine.md b/practices/implement-a-documentation-search-engine.md index 24abbc9..3e13d00 100644 --- a/practices/implement-a-documentation-search-engine.md +++ b/practices/implement-a-documentation-search-engine.md @@ -6,10 +6,10 @@ This practice allows teams to quickly find what they need without interrupting o ## When to Experiment -- You are a **new developer** and you need to learn how to find accurate, updated documentation so you can onboard quickly without constantly interrupting mentors. -- You are a **QA engineer** and you need to ensure you have quick access to system knowledge to avoid wasting time sifting through multiple systems during testing. -- You are a **product owner** or **Developer** and you need access to complete and up-to-date requirements to produce quality tickets and avoid downstream rejections. -- You are a **senior engineer** and you need to ensure that specialized knowledge is accessible to the wider team so you can reduce interruptions and boost team autonomy. 
+- You are a new developer and you need to learn how to find accurate, updated documentation so you can onboard quickly without constantly interrupting mentors.
+- You are a *A engineer and you need to ensure you have quick access to system knowledge to avoid wasting time sifting through multiple systems during testing.
+- You are a product owner or developer and you need access to complete and up-to-date requirements to produce quality tickets and avoid downstream rejections.
+- You are a senior engineer and you need to ensure that specialized knowledge is accessible to the wider team so you can reduce interruptions and boost team autonomy.

## How to Gain Traction

From 77756e8cdd61ed554d1ab6e428a541e9dee82d51 Mon Sep 17 00:00:00 2001
From: Tristan Barrow
Date: Thu, 29 Jan 2026 14:25:58 -0700
Subject: [PATCH 123/131] fix ambiguity in capability link

---
 practices/implement-a-documentation-search-engine.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/practices/implement-a-documentation-search-engine.md b/practices/implement-a-documentation-search-engine.md
index 3e13d00..5937237 100644
--- a/practices/implement-a-documentation-search-engine.md
+++ b/practices/implement-a-documentation-search-engine.md
@@ -7,7 +7,7 @@ This practice allows teams to quickly find what they need without interrupting o
 ## When to Experiment
 
 - You are a new developer and you need to learn how to find accurate, updated documentation so you can onboard quickly without constantly interrupting mentors.
-- You are a *A engineer and you need to ensure you have quick access to system knowledge to avoid wasting time sifting through multiple systems during testing.
+- You are a QA engineer and you need to ensure you have quick access to system knowledge to avoid wasting time sifting through multiple systems during testing.
 - You are a product owner or developer and you need access to complete and up-to-date requirements to produce quality tickets and avoid downstream rejections.
 - You are a senior engineer and you need to ensure that specialized knowledge is accessible to the wider team so you can reduce interruptions and boost team autonomy.
 
@@ -44,7 +44,7 @@ After experimenting with this practice for **4-6 weeks,** bring the team togethe
 
 ### [AI-accessible Internal Data](/capabilities/ai-accessable-internal-data.md)
 
-This practice is one option for implementing AI-accessible Internal Data. There may be other options, depending on your organization's complexity and needs, but using an enterprise search solution is a fantastic option for smaller to mid-size companies that don't have the resources or need the customization of an in-house system.
+This practice is one option for implementing AI-accessible Internal Data. The other option is building the system yourself with in-house developers. However, you will likely only see the benefits of this if you are a large enough company with complex enough needs to justify putting a team of developers toward it. The vast majority of companies should opt for an off-the-shelf solution.
### [Learning Culture](/capabilities/learning-culture.md) From e8bbe9055b9dacda1db9d9eedd706fd8a7d7dcf8 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Fri, 30 Jan 2026 06:52:35 -0800 Subject: [PATCH 124/131] Fix broken links in practice pages --- capabilities/learning-culture.md | 6 ++--- practices/automate-coding-standards.md | 6 ++--- practices/automate-database-migrations.md | 6 ++--- .../automate-infrastructure-management.md | 8 +++---- practices/automate-test-coverage-checks.md | 6 ++--- .../build-consistent-testing-strategy.md | 6 ++--- practices/check-documentation-consistency.md | 8 +++---- practices/clean-git-history.md | 8 +++---- practices/conduct-code-reviews.md | 6 ++--- ...reate-and-manage-ephemeral-environments.md | 4 ++-- practices/decouple-from-third-parties.md | 10 ++++---- practices/host-crucial-conversation.md | 24 +++++++++---------- practices/lead-a-demonstration.md | 8 +++---- .../reduce-coupling-between-abstractions.md | 10 ++++---- practices/refactor.md | 6 ++--- practices/run-pair-programming-sessions.md | 12 +++++----- practices/separate-config-from-code.md | 8 +++---- ...-spin-to-unearth-problems-and-solutions.md | 2 +- practices/version-dependencies.md | 8 +++---- 19 files changed, 76 insertions(+), 76 deletions(-) diff --git a/capabilities/learning-culture.md b/capabilities/learning-culture.md index 1262b70..a2a7aa1 100644 --- a/capabilities/learning-culture.md +++ b/capabilities/learning-culture.md @@ -48,7 +48,7 @@ The following is a curated list of supporting practices to consider when looking Pair programming sessions facilitate collaboration, real-time code review, and knowledge sharing among developers. By working in pairs, developers can catch issues early, ensure code is comprehensible, and spread knowledge across the team. This collaborative practice also accelerates the onboarding process for new team members while helping experienced developers refine their skills. Additionally, pair programming promotes adherence to coding standards, enhancing code consistency and readability. -### [Do a Spike, or Timeboxed Experiment](/practices/do-a-spike.md) +### Do a Spike, or Timeboxed Experiment Also referred to as building a proof-of-concept (POC), this practice involves setting aside a short period of time for your team to get hands-on experience building a solution as a way to reduce uncertainty. Spikes tend to last a couple of hours or days (at the most). They're a great way to try out a new practice, process, or tool. Given a spike's short duration, it's helpful to have an experienced member of the team lead these efforts to avoid teams from getting stuck for prolonged periods of time. @@ -56,7 +56,7 @@ Also referred to as building a proof-of-concept (POC), this practice involves se Talking directly with users is an invaluable practice for gaining insights and understanding their needs and challenges. Field customer support calls. Host developer office hours. Run focus groups. Whatever method you use, engaging with users helps to gather real-world feedback, identify pain points, and uncover opportunities for improvement. By maintaining direct communication with users, you can ensure that your product or service aligns closely with expectations and foster a stronger connection between your team and your user base. 
-### [Dogfood Your Systems](/practices/dogfood-your-systems.md) +### Dogfood Your Systems Dogfooding your systems involves having your teams use the same products or systems that your users do, allowing them to experience the same pain points firsthand. This practice helps build empathy with users, identify issues early, and drive improvements based on direct "user" experience. By regularly using your own systems as customers would, your team can gain valuable insights and ensure that the product meets the highest standards of usability and performance. @@ -66,7 +66,7 @@ SPIN is a question-asking framework that was originally developed for sales prof ### Introduce a Screen-Recording Tool -By enabling richer, more intuitive communication, screen recording helps teams document intent more clearly, reduce back-and-forth, and improve the efficiency of handoffs without requiring ticket authors to spend a lot of time writing. A lightweight screen-recording tool like [Loom](https://www.loom.com) allows ticket authors to quickly demonstrate the issue or desired behavior using voice and visuals, reducing ambiguity without adding process overhead. +By enabling richer, more intuitive communication, screen recording helps teams document intent more clearly, reduce back-and-forth, and improve the efficiency of handoffs without requiring ticket authors to spend a lot of time writing. A lightweight screen-recording tool like [Loom](https://www.loom.com) allows ticket authors to quickly demonstrate the issue or desired behavior using voice and visuals, reducing ambiguity without adding process overhead. ## Adjacent Capabilities diff --git a/practices/automate-coding-standards.md b/practices/automate-coding-standards.md index 829dc1e..d11b250 100644 --- a/practices/automate-coding-standards.md +++ b/practices/automate-coding-standards.md @@ -29,7 +29,7 @@ The specific approach to incorporate automatic coding standards as part of the d ## How to Improve -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club - [Automate Your Coding Standard](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_04) @@ -43,11 +43,11 @@ It recommends a gradual approach, utilizing auto-fix capabilities wherever possi Also proposes secondary lint configuration for new rules, applied only to modified files via a pre-commit hook. This method, inspired by the Boy Scout Rule of leaving code better than one found it. -### [Do A Spike](/practices/do-a-spike.md) +### Do A Spike Implement what you learned in the article [Automate Your Coding Standard](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_04) with a project or module of your codebase. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion * Are our automated coding standards tools customized to reflect our specific coding practices and project needs, or are we using a one-size-fits-all approach? * Do team members understand the reasons behind certain coding rules? 
diff --git a/practices/automate-database-migrations.md b/practices/automate-database-migrations.md index 36bc97e..893f94a 100644 --- a/practices/automate-database-migrations.md +++ b/practices/automate-database-migrations.md @@ -27,7 +27,7 @@ When considering migrations for NoSQL databases, it's essential to embrace schem ## How to Improve -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Create a Simple Migration Script @@ -41,7 +41,7 @@ Understand the rollback process by writing a script that undoes a migration. Usi Gain experience with complex migrations involving data transformations. Write a migration script that alters an existing table by adding a new column. Populate the new column with data transformed from existing columns (e.g., concatenating two columns into a new one). Apply the migration and verify the data transformation was successful. -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club ### [Evolutionary Database Design](https://martinfowler.com/articles/evodb.html) @@ -51,7 +51,7 @@ This foundational article by Martin Fowler discusses the principles and practice This issue is an extensive guide on the process of migration. You can use it as a blueprint when preparing and executing migrations. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion ## Introspective Questions diff --git a/practices/automate-infrastructure-management.md b/practices/automate-infrastructure-management.md index d3320ea..8a1a439 100644 --- a/practices/automate-infrastructure-management.md +++ b/practices/automate-infrastructure-management.md @@ -36,7 +36,7 @@ While IaC inherently documents infrastructure setups, additional documentation o ## How to Improve -### [Do A Spike](/practices/do-a-spike.md) +### Do A Spike #### IaC Tool Comparison @@ -47,13 +47,13 @@ Compare at least two IaC tools (e.g., Terraform vs. Ansible) by setting up a sim Integrate your IaC setup with a CI/CD pipeline (using Jenkins, GitLab CI, or GitHub Actions) to automate the deployment of infrastructure changes. Learn how automation in deployment processes reduces manual errors and speeds up delivery times. -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Immutable Infrastructure Deployment Deploy a set of infrastructure components, then simulate a "disaster" by destroying them. Re-deploy using only your IaC scripts. Gain confidence in the immutability and recoverability of your infrastructure through IaC practices. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### State of you Automation @@ -77,7 +77,7 @@ Deploy a set of infrastructure components, then simulate a "disaster" by destroy * Think about the level of collaboration between your development, operations, and security teams in managing and evolving your IaC strategy. * Is there a culture of shared responsibility and knowledge sharing, or are silos hindering your progress? 
-### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [Codify your infrastructure so it can also be version controlled](https://dzone.com/articles/secure-terraform-delivery-pipeline-best-practices) diff --git a/practices/automate-test-coverage-checks.md b/practices/automate-test-coverage-checks.md index f0c502f..9a8b850 100644 --- a/practices/automate-test-coverage-checks.md +++ b/practices/automate-test-coverage-checks.md @@ -29,11 +29,11 @@ None of those types of tests fit neatly into a traditional "coverage" check. ### Continuous Improvement Automating test coverage checks should not be a one-time setup but an ongoing process of refinement and improvement. -Teams should regularly review and adjust coverage thresholds based on evolving project requirements, feedback from testing outcomes, and changes in software functionality.### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +Teams should regularly review and adjust coverage thresholds based on evolving project requirements, feedback from testing outcomes, and changes in software functionality.### Host A Roundtable Discussion ## How to Improve -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [Test Coverage](https://martinfowler.com/bliki/TestCoverage.html) @@ -42,7 +42,7 @@ He argues that while high test coverage percentages can highlight which parts of Fowler emphasizes that test coverage should be used alongside other techniques and metrics to assess the robustness of tests, and that focusing solely on coverage numbers can lead to superficial or inadequate testing. He advocates for a balanced approach that combines test coverage with thoughtful test design and evaluation to achieve meaningful software quality. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### Tailoring and Adjusting Test Coverage diff --git a/practices/build-consistent-testing-strategy.md b/practices/build-consistent-testing-strategy.md index d9fa9d7..edf48d1 100644 --- a/practices/build-consistent-testing-strategy.md +++ b/practices/build-consistent-testing-strategy.md @@ -36,7 +36,7 @@ With the rise of automation, there is a misconception that manual testing is no ## How to Improve -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### Evaluate Current Testing Strategy @@ -48,7 +48,7 @@ After implementing changes, reassess after a few sprints to measure improvements -### [Lead a Workshop](/practices/lead-a-workshop.md) +### Lead a Workshop #### Develop a Testing Strategy Document @@ -81,7 +81,7 @@ Organize workshops or knowledge-sharing sessions where team members can learn ab Encourage the team to share their experiences and tips. Track how this knowledge transfer impacts the quality and consistency of your testing efforts. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### Tailoring Your Testing Strategy diff --git a/practices/check-documentation-consistency.md b/practices/check-documentation-consistency.md index 97d3bf8..fd47126 100644 --- a/practices/check-documentation-consistency.md +++ b/practices/check-documentation-consistency.md @@ -44,7 +44,7 @@ Recognizing and addressing documentation decay is a continuous effort. Avoiding knowledge silos where only certain team members know how to update documentation is crucial for consistency. 
Ensuring that knowledge and responsibility are shared across the team prevents bottlenecks. -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Documentation Audience Analysis @@ -54,7 +54,7 @@ Conduct an analysis of your documentation's audience. Identify the different gro Host a workshop with members from different teams (development, QA, support, etc.) to collaboratively review and update sections of the documentation. This will help identify inconsistencies and gaps from diverse perspectives and foster a shared responsibility for documentation. -### [Dogfood Your Systems](/practices/dogfood-your-systems.md) +### Dogfood Your Systems #### Documentation Usability Testing @@ -66,12 +66,12 @@ Organize a usability testing session for your documentation with participants fr Organize regular knowledge-sharing sessions where team members can present on areas of the codebase or technical artifacts they are experts in. Use these sessions to fill gaps in the documentation and ensure knowledge is not siloed. -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club ### [Two wrongs can make a right (and are difficult to fix)](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_86) Underscores the complexity of software development where two mistakes might cancel each other out, making them harder to identify and fix. It highlights the importance of thorough testing and documentation to prevent and resolve such issues effectively. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### Alignment with Development Processes diff --git a/practices/clean-git-history.md b/practices/clean-git-history.md index f66263c..8b5a4e3 100644 --- a/practices/clean-git-history.md +++ b/practices/clean-git-history.md @@ -36,14 +36,14 @@ While finding the right commit size should always be a judgement call, it may ma ## How to Improve -### [Lead A Demonstration](/practices/lead-a-demonstration.md) +### Lead A Demonstration #### Git Bisect Debugging Introduce the git bisect tool and demonstrate its usage in identifying problematic commits. Set up a mock scenario where a bug is introduced in the codebase, and have team members use git bisect to pinpoint the exact commit causing the issue. -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Commit Frequency Audit @@ -52,7 +52,7 @@ Identify instances of both too frequent and too sparse commits. Based on this analysis, develop guidelines for when to commit changes, aiming for logical breakpoints or completion of significant functionality. Discuss as a team and adjust practices accordingly. -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club ### [Keep a Clean Git History](https://simplabs.com/blog/2021/05/26/keeping-a-clean-git-history/) Offers guidance on maintaining a clean Git commit history, emphasizing practices like squashing similar commits, crafting clear commit messages, and organizing changes logically to make the project's history navigable and understandable, crucial for effective code reviews and project oversight. 
@@ -63,7 +63,7 @@ Advocates for the disciplined management of Git history through methods like fea ### [Two Wrongs Can Make a Right (And Are Difficult to Fix)](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_86) Details strategies for properly amending Git history issues, such as errant commits or merge mistakes, without exacerbating problems. Includes practical advice and Git command examples for correcting history efficiently and effectively, focusing on common Git missteps and the complexities of rectifying them. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### Commit Message Clarity and Relevance diff --git a/practices/conduct-code-reviews.md b/practices/conduct-code-reviews.md index 34d79fb..8becf1b 100644 --- a/practices/conduct-code-reviews.md +++ b/practices/conduct-code-reviews.md @@ -46,7 +46,7 @@ Furthermore, LLMs backed tools are able to provide automated feedback on code ch ## How to Improve -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Guideline Creation Workshop @@ -60,7 +60,7 @@ Create practical exercises and scenarios that encourage participants to offer he Highlight the importance of sharing observations, staying curious, and avoiding making judgments without understanding the author's intentions and thought process. Empower participants to receive feedback openly, emphasizing the importance of not taking criticism personally and avoid assuming negative intentions from the reviewer. -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [The power of feedback loops](https://lucamezzalira.medium.com/the-power-of-feedback-loops-f8e27e8ac25f) @@ -91,7 +91,7 @@ The Conduct Code Reviews practice significantly strengthens the Code Maintainabi The Conduct Code Reviews practice supports the Test Automation DORA capability by ensuring code changes adhere to quality standards, catching potential issues early, and enforcing consistent coding practices. This promotes a robust suite of automated tests, prevents regression errors, and maintains smooth continuous integration processes. By fostering collaboration and identifying gaps in test coverage, code reviews enhance software reliability and stability, directly contributing to effective test automation. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion * Do our reviewers have a clear understanding of the broader context surrounding code changes, including user requirements and project goals? * Are there opportunities for reviewers to engage in pair programming or collaborative discussions with developers to gain deeper insights into the code being reviewed? diff --git a/practices/create-and-manage-ephemeral-environments.md b/practices/create-and-manage-ephemeral-environments.md index c0343c5..27493b2 100644 --- a/practices/create-and-manage-ephemeral-environments.md +++ b/practices/create-and-manage-ephemeral-environments.md @@ -28,13 +28,13 @@ In such cases, adopting a hybrid approach—integrating both ephemeral and persi ## How to Improve -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [What is an ephemeral environment?](https://webapp.io/blog/what-is-an-ephemeral-environment/) This article goes through the basics of ephemeral environments. It's a great resource for those new to the concept. 
-### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion * In what ways do you anticipate ephemeral environments will simplify troubleshooting and debugging for your team? * Are most bugs easily reproducible? diff --git a/practices/decouple-from-third-parties.md b/practices/decouple-from-third-parties.md index ea33aee..c77148e 100644 --- a/practices/decouple-from-third-parties.md +++ b/practices/decouple-from-third-parties.md @@ -22,16 +22,16 @@ Every situation is unique, so there's no one size fits all guidance for this sit ## How to Improve -### [Do A Spike](/practices/do-a-spike.md) +### Do A Spike Choose an important dependency and refactor your code to introduce abstractions such as interfaces or abstract classes to encapsulate interactions with that dependency. Rewrite the implementations to depend on these abstractions rather than the concrete third-party tools. -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops Start by identifying the dependencies your project currently has on third-party software, frameworks, or libraries. Make a list of these dependencies and assess how tightly coupled they are to your codebase. -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club - [Clean Architecture Article](https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html) @@ -48,13 +48,13 @@ This article discusses the Dependency Inversion Principle (DIP) in software desi The article explores the pitfalls and benefits of using mock objects in test-driven development (TDD), emphasizing the principle of "Don't Mock What You Don't Own." The author discusses how improper use of mocks can lead to unreliable tests and proposes alternatives, such as wrapping third-party libraries in domain-specific objects. -### [Host A Viewing Party](/practices/host-a-viewing-party.md) +### Host a Viewing Party - [Boundaries](https://www.destroyallsoftware.com/talks/boundaries) This presentation delves into the concept of using simple values rather than complex objects as the boundaries between components and subsystems in software development. It covers various topics such as functional programming, the relationship between mutability and object-oriented programming (OO), isolated unit testing with and without test doubles, and concurrency. Understanding and implementing these concepts can be immensely beneficial in managing dependencies with third parties. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion * What are the key third-party dependencies we rely on in our projects? * Have we identified any single points of failure or critical dependencies on specific third-party tools? diff --git a/practices/host-crucial-conversation.md b/practices/host-crucial-conversation.md index ea0061c..4c21288 100644 --- a/practices/host-crucial-conversation.md +++ b/practices/host-crucial-conversation.md @@ -1,9 +1,9 @@ # Host a Crucial Conversation -By hosting a crucial conversation, you're helping team members with differing (sometimes strong) opinions navigate high-stakes and sensitive discussions. You're also helping to strengthen relationships and build trust within your team. +By hosting a crucial conversation, you're helping team members with differing (sometimes strong) opinions navigate high-stakes and sensitive discussions. You're also helping to strengthen relationships and build trust within your team.
The key to hosting a crucial conversation is creating a space of psychological safety, where participants can speak openly and without fear of judgment or conflict. This opens the door to constructive feedback, civil conflict resolution, and collaborative decision-making where team members are working toward a common goal. For a team of software developers, a crucial conversation might center around deciding whether to refactor a legacy system or invest in building a new one. This discussion could involve balancing the technical debt, budget constraints, and the potential impact on delivery timelines. Another example might involve debating the adoption of test-driven development (TDD) as a standard practice, weighing its potential to improve code quality against concerns about increased development time. -With the help of a host, these difficult discussions can be turned into opportunities for growth. +With the help of a host, these difficult discussions can be turned into opportunities for growth. ## Nuances @@ -11,15 +11,15 @@ This section outlines common pitfalls, challenges, or limitations teams commonly ### Fool's Choice -The fool's choice arises when, during a difficult conversation, people think they must choose between being honest and preserving a relationship. They fear that speaking openly will cause harm or conflict. But the fool's choice is a false dilemma. It's also counterproductive, typically leading to silence or aggression and damaging trust. A _third option_ exists: addressing issues respectfully while maintaining the relationship. As host, remember this third option and focus on articulating shared goals and fostering a safe environment. +The fool's choice arises when, during a difficult conversation, people think they must choose between being honest and preserving a relationship. They fear that speaking openly will cause harm or conflict. But the fool's choice is a false dilemma. It's also counterproductive, typically leading to silence or aggression and damaging trust. A _third option_ exists: addressing issues respectfully while maintaining the relationship. As host, remember this third option and focus on articulating shared goals and fostering a safe environment. ### You have to keep an open mind It's important to enter a crucial conversation with a genuine curiosity about the other person's perspective. Without this, participants limit their ability to truly understand the underlying issues and unique viewpoints. An open mind allows them to consider different angles, question assumptions, and explore the conversation more deeply. By encouraging participants to remain curious, you create a more constructive environment for dialogue, fostering empathy and collaboration. You're also helping your team find common ground and work toward a resolution that benefits all parties to some degree. -### Stay Focused +### Stay Focused -During a crucial conversation, it can be challenging to maintain a single point of focus, especially if the discussion becomes uncomfortable. In these moments, individuals may bring up unrelated issues to divert attention from the primary topic, which can derail the conversation and lead to confusion. As a host, _bookmarking_ is a crucial strategy to employ here — this involves consciously noting unrelated issues for later discussion, so they don’t distract from the conversation at hand. +During a crucial conversation, it can be challenging to maintain a single point of focus, especially if the discussion becomes uncomfortable. 
In these moments, individuals may bring up unrelated issues to divert attention from the primary topic, which can derail the conversation and lead to confusion. As a host, _bookmarking_ is a crucial strategy to employ here — this involves consciously noting unrelated issues for later discussion, so they don’t distract from the conversation at hand. ### Full Consensus May Not Be Feasible @@ -27,27 +27,27 @@ A common misconception is that all decisions — especially those made during cr * Command: Decisions are made without involving others. * Consultation: Input is gathered from stakeholders and a smaller group makes the final decision. -* Voting: Decisions are made based on a majority agreement. +* Voting: Decisions are made based on a majority agreement. ## Gaining Traction -The following actions will help your team implement this practice. +The following actions will help your team implement this practice. -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Host a mock crucial conversation Hosting a mock crucial conversation involves your team role-playing in a challenging conversation to practice managing emotions and communicating effectively. Begin by identifying a shared purpose. What is the focus of the conversation and what is the common goal that all participants are working toward? One group then simulates scenarios like giving feedback or resolving conflicts, while another group observes and critiques. Afterward, the entire group reflects on the experience to improve future real-life conversations. As host, it's helpful to use the Crucial Conversations [worksheet](https://irp-cdn.multiscreensite.com/25ad169b/files/uploaded/Crucial-Conversations-Worksheet.pdf) to guide the mock conversation, ensuring that key strategies and goals are addressed throughout the exercise. -Hosting a mock crucial conversation will help your team build skills such as active listening, staying calm under pressure, and navigating sensitive or high-stakes issues. +Hosting a mock crucial conversation will help your team build skills such as active listening, staying calm under pressure, and navigating sensitive or high-stakes issues. -### [Start a Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [Crucial Conversations: Tools for Talking When Stakes are High](https://www.goodreads.com/book/show/15014.Crucial_Conversations) -The authors of _Crucial Conversations_ teach you how to navigate high-stakes situations where emotions run high and opinions differ. They offer practical tools to handle tough conversations, communicate clearly, and achieve positive outcomes. The techniques discussed in this book will help you quickly prepare for tough discussions, create a safe environment for open dialogue, be persuasive without being aggressive, and stay engaged even when others become defensive or silent. +The authors of _Crucial Conversations_ teach you how to navigate high-stakes situations where emotions run high and opinions differ. They offer practical tools to handle tough conversations, communicate clearly, and achieve positive outcomes. The techniques discussed in this book will help you quickly prepare for tough discussions, create a safe environment for open dialogue, be persuasive without being aggressive, and stay engaged even when others become defensive or silent. 
-### [Host A Viewing Party](/practices/host-a-viewing-party.md) +### Host a Viewing Party #### [Mastering Crucial Conversations by Joseph Grenny](https://www.youtube.com/watch?v=uc3ARpccRwQ) diff --git a/practices/lead-a-demonstration.md b/practices/lead-a-demonstration.md index 79e58c5..2a09710 100644 --- a/practices/lead-a-demonstration.md +++ b/practices/lead-a-demonstration.md @@ -38,19 +38,19 @@ Gather feedback and refine future demos for better effectiveness and stakeholder ## Related Practices -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion If your Q&A session runs long, you may want to schedule a roundtable discussion on the topics your demonstration covered. That way attendees get more chances to play around with the ideas in their heads. -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops It may be a good idea to cover your topic with a workshop following a presentation / demonstration. Doing so will increase the chances that your ideas will be applied by the people you're trying to impart knowledge to. -### [Host A Viewing Party](/practices/host-a-viewing-party.md) +### Host a Viewing Party If you or someone else has already recorded a demonstration that adequately covers the topics you wish to share, you can run a watch party instead of giving the presentation. -### [Start A Community Of Practice](/practices/start-a-community-of-practice.md) +### Start A Community Of Practice Finding or starting relevant Communities of Practice can be a great place to lead a demonstration. The audience has already self-selected to join with the hopes of learning. diff --git a/practices/reduce-coupling-between-abstractions.md b/practices/reduce-coupling-between-abstractions.md index 3b96db0..a2e96ab 100644 --- a/practices/reduce-coupling-between-abstractions.md +++ b/practices/reduce-coupling-between-abstractions.md @@ -44,15 +44,15 @@ Focus on the current requirements and only introduce abstractions when there's a ## Gaining Traction -The following actions will help your team implement this practice. +The following actions will help your team implement this practice. -### [Host a Viewing Party](/practices/host-a-viewing-party.md) +### Host a Viewing Party #### [Boundaries by Gary Bernhardt](https://www.destroyallsoftware.com/talks/boundaries) This talk explores the intricate dynamics between code boundaries and system architecture, illustrating how to create clean and maintainable code through effective separation of concerns. In particular, Gary introduces a way to use values as the boundaries between abstractions. -### [Start a Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [Clean Architecture by Robert C. Martin](https://www.goodreads.com/book/show/18043011-clean-architecture) @@ -66,7 +66,7 @@ This book discusses how to find seams, add automated test coverage, and refactor This is similar to Feathers's book above, but it covers the content from a first-principles standpoint. -### [Facilitate a Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Facilitate a Roundtable Discussion Below are suggestions for topics and prompts you could explore with your team during a roundtable discussion. @@ -93,7 +93,7 @@ Below are suggestions for topics and prompts you could explore with your team du * What small, incremental changes can we make to start reducing coupling in these areas? * How do we ensure system stability while refactoring to reduce coupling?
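To make the coupling ideas above more concrete, here is a small illustrative Python sketch (the `EmailSender` port and the console adapter are invented for this example and are not taken from the practice): the core routine depends only on a narrow abstraction the team owns, while the detail that would normally touch a third-party service sits behind a thin adapter that can be swapped or faked in tests.

```python
from typing import Protocol


class EmailSender(Protocol):
    """Small, owned abstraction that the core code depends on."""

    def send(self, to: str, subject: str, body: str) -> None: ...


class ConsoleEmailSender:
    """Stand-in adapter; a real one would wrap an SMTP client or a vendor SDK."""

    def send(self, to: str, subject: str, body: str) -> None:
        print(f"to={to!r} subject={subject!r} body={body!r}")


def notify_of_overdue_invoice(sender: EmailSender, customer_email: str) -> None:
    # Core behavior: knows nothing about SMTP, vendors, or network details.
    sender.send(
        to=customer_email,
        subject="Invoice overdue",
        body="Your invoice is past due. Please review it at your earliest convenience.",
    )


if __name__ == "__main__":
    notify_of_overdue_invoice(ConsoleEmailSender(), "customer@example.com")
```

Because the boundary is a small, owned interface, replacing the vendor-specific adapter later does not ripple through the core logic.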
-### [Do a Spike, or Timeboxed Experiment](/practices/do-a-spike.md) +### Do a Spike, or Timeboxed Experiment * **Refactor**: Set some time aside to refactor a key component or set of components to reduce coupling. Present your findings to the team to see if committing those changes or making additional changes has a good potential return on investment. * **Audit Your Dependencies**: Use a [dependency analysis tool](https://markgacoka.medium.com/how-to-visualize-your-codebase-7c4c4d948141) to visualize the relationships between modules and components, and to identify highly coupled areas. Discuss why these dependencies exist. diff --git a/practices/refactor.md b/practices/refactor.md index b0c9054..f0d9eec 100644 --- a/practices/refactor.md +++ b/practices/refactor.md @@ -56,9 +56,9 @@ The fast feedback loop allows for quick identification and resolution of any iss ## How to Improve -### [Lead A Demonstration](/practices/lead-a-demonstration.md) +### Lead A Demonstration -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Incremental and Frequent Refactoring @@ -87,7 +87,7 @@ Version control tracks code changes over time, facilitating collaboration and re Refactoring improves code structure without altering behavior. Together, they enable teams to systematically enhance code quality, with version control tracking and integrating improvements with confidence. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### Understanding Refactoring diff --git a/practices/run-pair-programming-sessions.md b/practices/run-pair-programming-sessions.md index 20264e5..8ceb346 100644 --- a/practices/run-pair-programming-sessions.md +++ b/practices/run-pair-programming-sessions.md @@ -33,9 +33,9 @@ Teams must use suitable collaboration tools (for example, Visual Studio Live Sha ## Gaining Traction -The following actions will help your team implement this practice. +The following actions will help your team implement this practice. -### [Host a Viewing Party](/practices/host-a-viewing-party.md) +### Host a Viewing Party #### [Async Code Reviews Are Choking Your Company’s Throughput](https://www.youtube.com/watch?v=ZlLZEQQBcFg) @@ -45,7 +45,7 @@ Engineer Dragan Stepanovic discusses his analysis of thousands of PRs across var Engineer Elizabeth Engelman talks about how mismatches in personality, learning style, and experience levels can create challenges while pair programming. -### [Start a Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [The power of feedback loops](https://lucamezzalira.medium.com/the-power-of-feedback-loops-f8e27e8ac25f) @@ -63,7 +63,7 @@ Ultimately, Joshua's article champions simplicity in coding solutions. This article, co-authored by faculty from North Carolina State University and the University of Utah, focuses on the benefits of pair programming. Based on anecdotal evidence and structured experiments, the authors argue that pair programming leads to higher-quality code, faster development, and increased programmer satisfaction. The article presents data from professional programmers and students, showing that pair programming is more efficient despite initial skepticism. It also highlights pair programming's role in the Extreme Programming (XP) methodology, emphasizing its effectiveness across all levels of programming skill.
-### [Facilitate a Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Facilitate a Roundtable Discussion Below are suggestions for topics and prompts you could explore with your team during a roundtable discussion. @@ -85,13 +85,13 @@ Below are suggestions for topics and prompts you could explore with your team du * What tools and protocols do we have in place to address challenges associated with remote pair programming, such as communication barriers and time zone differences? * How do we ensure that team members are equipped with the necessary skills and resources to effectively engage in pair programming? -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Trial Period To experiment with pair programming and evaluate its effects, start by pairing team members for a designated trial period, such as one or two weeks. Aim to pair individuals with mixed experience levels, if possible. Assign tasks to the pairs and rotate pairs frequently to encourage diverse collaboration. During each trial period, have one person in the driver role, actively writing code, while the other acts as the navigator, providing real-time feedback and suggestions. Encourage pairs to switch roles regularly. Monitor the outcomes by gathering feedback on code quality, productivity, and team satisfaction. Additionally, observe any improvements in knowledge sharing, problem-solving efficiency, and team cohesion. -### [Run a Retrospective](/practices/host-a-retrospective.md) +### Run a Retrospective #### Feedback Sessions diff --git a/practices/separate-config-from-code.md b/practices/separate-config-from-code.md index 11504a8..a6dba86 100644 --- a/practices/separate-config-from-code.md +++ b/practices/separate-config-from-code.md @@ -18,7 +18,7 @@ Allow local overrides of configuration values and provide developers with a blue ## How to Improve -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Review and Identify Configuration in Version Control @@ -28,7 +28,7 @@ Audit your current repositories to identify instances of configuration or sensit Simulate a process for managing changes to configuration data that involves multiple environments. Include steps for reviewing, approving, and applying configuration changes. Assess the impact on deployment times, security, and team collaboration. -### [Do A Spike](/practices/do-a-spike.md) +### Do A Spike #### Implement Environment-Specific Configuration Files @@ -38,7 +38,7 @@ Create separate configuration files for different environments (development, sta Explore and integrate a secure configuration management solution, such as HashiCorp Vault or AWS Secrets Manager. Evaluate the effectiveness of this solution in improving security and flexibility compared to storing sensitive data in version control. -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [The Twelve-Factor App - Config](https://12factor.net/config) This section of the Twelve-Factor App methodology emphasizes the importance of separating configuration from code. It advocates for storing config in the environment to improve security and adaptability across various deployment environments, offering foundational insights for efficient configuration management. 
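As a minimal illustration of the config-in-the-environment guidance above, the following Python sketch (illustrative only; the `APP_DB_URL` and `APP_DEBUG` variable names and their defaults are assumptions, not taken from the practice) resolves settings from environment variables with local-development fallbacks, so the same code runs unchanged across environments.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Runtime settings resolved from the environment, never hard-coded in the repo."""

    db_url: str
    debug: bool


def load_config() -> Config:
    # Environment variables win; the defaults only exist to make local development convenient.
    # APP_DB_URL and APP_DEBUG are assumed names used for illustration.
    return Config(
        db_url=os.environ.get("APP_DB_URL", "sqlite:///local-dev.db"),
        debug=os.environ.get("APP_DEBUG", "false").strip().lower() == "true",
    )


if __name__ == "__main__":
    print(load_config())
```

Each deployment environment then differs only in the values it exports, which is the separation the Twelve-Factor guidance argues for.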
@@ -46,7 +46,7 @@ This section of the Twelve-Factor App methodology emphasizes the importance of s #### [97 Things Every Programmer Should Know - Store Configurations in the Environment](https://github.com/97-things/97-things-every-programmer-should-know/tree/master/en/thing_61) A concise guide that underscores the significance of externalizing configuration, highlighting how this practice enhances application security, simplifies deployment, and supports scalability. It provides actionable advice for developers to implement this best practice effectively. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### Protecting Sensitive Configuration diff --git a/practices/use-spin-to-unearth-problems-and-solutions.md b/practices/use-spin-to-unearth-problems-and-solutions.md index 5a9094b..efe8ec0 100644 --- a/practices/use-spin-to-unearth-problems-and-solutions.md +++ b/practices/use-spin-to-unearth-problems-and-solutions.md @@ -29,7 +29,7 @@ The above example shows how following SPIN can set the stage for the need before ## How to Improve -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club - [SPIN Selling](https://www.amazon.com/SPIN-Selling-Neil-Rackham/dp/0070511136) diff --git a/practices/version-dependencies.md b/practices/version-dependencies.md index ae73d3b..3111b98 100644 --- a/practices/version-dependencies.md +++ b/practices/version-dependencies.md @@ -18,7 +18,7 @@ They monitor dependencies for new versions, and can automatically create pull re ## How to Improve -### [Lead Workshops](/practices/lead-workshops.md) +### Lead Workshops #### Audit Your Current Dependency Management @@ -32,7 +32,7 @@ To review and potentially revise your current policies on updating dependencies. To simulate a "dependency hell" scenario to understand its impact and identify strategies for mitigation. Practical experience in managing complex dependency chains, leading to improved strategies for avoiding or dealing with dependency hell in real projects. -### [Do A Spike](/practices/do-a-spike.md) +### Do A Spike #### Implement Semantic Versioning on a Small Scale @@ -42,13 +42,13 @@ To experiment with semantic versioning by applying it to a small, manageable por Lock major dependencies in your project and configure Dependabot or a similar tool to generate PRs when new versions of dependencies are published. Understand how automatic dependency update tools impact your workflow and the overall stability of the project. -### [Start A Book Club](/practices/start-a-book-club.md) +### Start A Book Club #### [Dependencies Belong in Version Control](https://www.forrestthewoods.com/blog/dependencies-belong-in-version-control/) This article explores the importance of including dependencies within version control systems to ensure consistency, reliability, and traceability in software development projects. It discusses the benefits and methodologies of version controlling dependencies, offering insights into best practices for managing software dependencies effectively. -### [Host A Roundtable Discussion](/practices/host-a-roundtable-discussion.md) +### Host A Roundtable Discussion #### How Effective Is Your Dependency Management?
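As a companion to the semantic-versioning spike described above, the following Python sketch (an illustration that assumes the third-party `packaging` library is available; the version range and candidate versions are arbitrary) shows how a compatible-release specifier accepts patch releases while rejecting versions outside the locked range.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# PEP 440 compatible-release range: equivalent to >=1.4.2, ==1.4.* (so anything below 1.5.0).
compatible = SpecifierSet("~=1.4.2")

for candidate in ["1.4.2", "1.4.9", "1.5.0", "2.0.0"]:
    verdict = "accepted" if Version(candidate) in compatible else "rejected"
    print(f"{candidate}: {verdict}")
```

Update tools such as Dependabot rely on the same kind of range semantics when proposing new versions, which is why locking major dependencies and letting patch updates flow automatically tends to be a workable default.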
From 6e30b06295461104929b9ba5bd3c13a4e0751d2c Mon Sep 17 00:00:00 2001 From: Nolan Patterson Date: Thu, 29 Jan 2026 16:58:35 -0800 Subject: [PATCH 125/131] draft of visualize all work on a storyboard --- capabilities/work-in-process-limits.md | 2 +- .../visualize-all-work-on-a-storyboard.md | 97 +++++++++++++++++++ 2 files changed, 98 insertions(+), 1 deletion(-) create mode 100644 practices/visualize-all-work-on-a-storyboard.md diff --git a/capabilities/work-in-process-limits.md b/capabilities/work-in-process-limits.md index 8cb28fa..de66f3f 100644 --- a/capabilities/work-in-process-limits.md +++ b/capabilities/work-in-process-limits.md @@ -45,7 +45,7 @@ Generally, an overall score equal to or less than 3 means you'll likely gain a l The following is a curated list of supporting practices to consider when looking to improve your team's Work-in-Process Limits capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. -### Visualize All Work on a Storyboard +### [Visualize All Work on a Storyboard](/practices/visualize-all-work-on-a-storyboard.md) Using a storyboard to represent all ongoing tasks helps make work visible, including hidden or auxiliary tasks like meetings or production support. This visualization allows the team to see the full workload and better manage WIP, ensuring that prioritization is based on complete information. The benefit is clearer communication and better identification of bottlenecks across the process. diff --git a/practices/visualize-all-work-on-a-storyboard.md b/practices/visualize-all-work-on-a-storyboard.md new file mode 100644 index 0000000..9eab0ff --- /dev/null +++ b/practices/visualize-all-work-on-a-storyboard.md @@ -0,0 +1,97 @@ +# Visualize All Work on a Storyboard + +When everyone is busy, everyone tends to be caught up in their own world. Developers focus on their assigned tickets. QA focuses on testing queues. Ops focuses on incidents. Product focuses on stakeholder requests. Each person optimizes for their slice without seeing how those local optimizations create bottlenecks elsewhere in the system. Teams may work to improve and deliver in their areas of control and point the finger at others. The result is suboptimization and a system and culture that may be working against itself. + +A storyboard makes all work visible in one place. Unlike a sprint board that tracks only committed work, a storyboard captures everything happening across the team: feature work, bug fixes, production support, meetings, spikes, and the invisible coordination that consumes time but never gets tracked. When all work is visible, the team can see where effort is actually going, identify bottlenecks worth swarming on, and make prioritization decisions grounded in reality rather than assumptions. + +The storyboard operationalizes your value stream. Start with a [Value Stream Mapping (VSM) workshop](#run-a-value-stream-mapping-workshop) to build a shared understanding of how work flows from idea to delivery. The VSM creates the map; the storyboard becomes the live operational dashboard that tracks work against that map. Together, they provide a data grounded view for assessing performance, spotting constraints, and measuring improvement over time. + +## When to Experiment + +- You are a team lead who notices that everyone seems busy, yet throughput remains low and lead times seem slow or getting slower. 
+- You are a developer who keeps getting pulled into unplanned work that never appears on the backlog or sprint board. +- You are a manager who struggles to explain where time goes when stakeholders ask why features take so long. +- You are part of a cross-functional team where handoffs between roles create delays that no one can quantify. +- You are trying to implement WIP limits but lack visibility into the actual work in progress. + +## How to Gain Traction + +Building a useful storyboard requires more than choosing a tool and creating columns. The value comes from the collective whole team process of mapping, discussing, and agreeing on what work exists and how it flows. + +### Run a Value Stream Mapping Workshop + +Before building the storyboard, gather the whole team for a VSM workshop. Include everyone who touches the work: developers, QA, product, ops, and anyone else involved in delivery. The goal is not to create a perfect diagram but to surface everyone's perspective on how work actually flows. When a developer hears QA describe waiting three days for a deployable build, or when product learns that "quick fixes" consume 30% of engineering time, the team develops a shared understanding that no individual possessed before. DORA provides a solid breakdown of how to run these workshops [here](https://dora.dev/guides/value-stream-management/). The value is in the collective hearing and telling. + +### Identify All Work Types + +Most teams undercount their work. Sprint boards may track planned features and bugs, but miss production support, technical debt, meetings, code reviews, on-call rotations, spikes, cross-team coordination, and those high priority features or enhancements that were not planned. During the VSM workshop, explicitly ask: "What else consumes your time that doesn't show up on our board?" Capture these categories. A storyboard that only shows planned work will give a false picture of capacity and create frustration when "unexpected" work inevitably appears. + +### Design Columns That Match Your Value Stream + +Your storyboard columns should reflect how work actually moves through your system, not an idealized process. If your VSM revealed that work frequently waits for code review, create a "Waiting for Review" column so that wait time becomes visible. If handoffs to QA create delays, make that queue explicit. The columns should make bottlenecks impossible to ignore. Common patterns include: Backlog, Ready, In Progress, Waiting (blocked/review/QA), Done. Adjust based on your actual flow. + +### Make the Invisible Visible + +Add swim lanes or card types for work that traditionally stays hidden: production support, meetings, unplanned requests, technical debt. When a developer spends half their day on incident response, that should appear on the board. When the team loses a day to an all-hands meeting, capture it. This visibility serves two purposes: it explains where capacity actually goes, and it creates pressure to reduce low-value activities. What gets measured gets managed. + +### Establish Rituals Around the Board + +A storyboard only works if the team uses it. Build rituals that center on the board: daily standups that walk the board through, bi-weekly reviews that examine flow metrics, and bi-weekly retrospectives that ask what the board revealed. The board should be the single source of truth for "what are we working on?" If conversations happen elsewhere, the board becomes stale and loses its value. 
+ +## Lessons From The Field + +- *Start Imperfect, Refine Along the Way.* The first version of your storyboard will be wrong. Columns won't match reality, work types will be incomplete, and the team will resist updating it. That's fine. Treat the board as a living artifact that improves through use. After two weeks, revisit what's working and what's not. + +- *Resistance Reveals Hidden Work.* When someone says "I don't have time to update the board," that's a signal. Either the board is too cumbersome, or their work is genuinely invisible and needs to be surfaced. Both are problems worth solving. + +- *The Conversation Matters More Than the Artifact.* The VSM workshop creates value through discussion, not through the diagram it produces. When a QA engineer explains their bottlenecks and developers hear it for the first time, alignment happens. Don't rush through the workshop to produce a deliverable. + +- *Bottlenecks Move.* Once you address the biggest constraint, another will emerge. This is expected. The storyboard helps you continuously identify where to focus improvement efforts rather than optimizing areas that aren't actually bottlenecks. + +- *Physical Boards Create Different Conversations.* If your team is co-located, consider a physical board alongside any digital tool. Standing in front of a wall of cards creates different dynamics than clicking through a Jira board. The tactile act of moving cards reinforces ownership and makes blockers harder to ignore. + +- *WIP Limits Need Visibility First.* Teams often try to implement WIP limits before they have visibility into actual work in progress. This creates frustration because limits feel arbitrary. Build the storyboard first, observe actual WIP for a few weeks, then set limits based on what you learn. + +## Deciding to Polish or Pitch + +After experimenting with this practice for 4-6 weeks, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: + +### Fast & Intangible + +**Improved Shared Understanding.** Team members should be able to describe what others are working on and where the current bottlenecks are. Cross-functional conversations should reference the board as a shared artifact. If standup discussions become more focused and handoffs smoother, the board is working. + +**Stronger Team Camaraderie.** When team members hear each other's challenges during VSM workshops and see each other's work on the board, empathy grows. People stop assuming others have it easy and start appreciating the full picture. This mutual understanding builds trust and shifts the dynamic from finger-pointing to problem-solving together. + +### Fast & Tangible + +**Reduced Invisible and Unplanned Work.** The gap between "what the board shows" and "what people are actually doing" should shrink. If production support or unplanned work previously went untracked, it should now be visible and quantifiable. You should be able to answer "what percentage of our capacity goes to unplanned work?" + +### Slow & Tangible + +**Improved Flow Metrics.** Track lead time (idea to delivery) and cycle time (work started to work completed). Over 8-12 weeks, these metrics should stabilize or improve as the team identifies and addresses bottlenecks. If metrics aren't improving, the board may be revealing problems the team isn't acting on. + +### Slow & Intangible + +**Better Prioritization Decisions.** Conversations about what to work on next should become grounded in data rather than opinion. 
Stakeholders should understand trade-offs more clearly. The team should feel less reactive and more intentional about where effort goes. + +## Supported Capabilities + +### [Work in Process Limits](/capabilities/work-in-process-limits.md) + +A storyboard makes WIP visible, which is a prerequisite for setting meaningful limits. Without seeing all work in progress, teams cannot know whether they're over capacity or where work is piling up. The storyboard provides the visibility needed to set, enforce, and refine WIP limits based on actual data rather than guesses. + +### [Visibility of Work in the Value Stream](/capabilities/visibility-of-work-in-the-value-stream.md) + +The storyboard operationalizes your value stream map. While VSM workshops create a shared understanding of how work flows, the storyboard makes that flow visible day-to-day. It surfaces where work waits, where handoffs create delays, and where the team should focus improvement efforts. + +### [Visual Management](/capabilities/visual-management.md) + +A well-designed storyboard is a core visual management tool. It makes the current state of work obvious at a glance, highlights what's blocked or at risk, and provides the foundation for data-driven conversations about priorities and progress. + +### [Generative Organizational Culture](/capabilities/generative-organizational-culture.md) + +The process of building a storyboard, especially when grounded in a VSM workshop, creates space for cross-functional dialogue and shared ownership. When team members hear each other's challenges and see how their work connects, trust and collaboration improve. The board becomes a neutral artifact that enables difficult conversations about priorities and trade-offs. + +### [Well-Being](/capabilities/well-being.md) + +Making all work visible reduces the stress of invisible overload. When unplanned work, meetings, and production support appear on the board, teams can have honest conversations about capacity. This visibility helps prevent burnout by making unsustainable workloads obvious rather than hidden. From 989ccfb1e3764f7f41c7e2c772b5b30addd8ae93 Mon Sep 17 00:00:00 2001 From: Nolan Patterson Date: Fri, 30 Jan 2026 14:54:36 -0800 Subject: [PATCH 126/131] applied recommended changes and cross-references to visualize-all-work-on-a-storyboard --- .../generative-organizational-culture.md | 4 ++ .../visibility-of-work-in-the-value-stream.md | 4 ++ capabilities/visual-management.md | 4 ++ capabilities/well-being.md | 4 ++ .../visualize-all-work-on-a-storyboard.md | 38 +++++++++---------- 5 files changed, 35 insertions(+), 19 deletions(-) diff --git a/capabilities/generative-organizational-culture.md b/capabilities/generative-organizational-culture.md index 42836d9..7239f92 100644 --- a/capabilities/generative-organizational-culture.md +++ b/capabilities/generative-organizational-culture.md @@ -83,6 +83,10 @@ Aim to create teams that are composed of members from different functional areas Adopt tools and platforms that facilitate transparent and easy communication across the organization, such as chat systems, collaborative documentation, and shared dashboards. Encourage the sharing of information, questions, and updates openly and regularly rather than through isolated channels and in narrow timeframes. By improving information flow, teams are better informed and can respond more effectively to changes and challenges. 
+### [Visualize All Work on a Storyboard](/practices/visualize-all-work-on-a-storyboard.md) - Related + +Building a storyboard through a cross-functional VSM workshop creates space for team members to hear each other's challenges and see how their work connects. When developers learn what blocks QA, or product discovers how much time goes to incident response, empathy and trust grow. This shared understanding shifts the culture from finger-pointing to collaborative problem-solving. + ## Adjacent Capabilities The following capabilities will be valuable for you and your team to explore, as they are either: diff --git a/capabilities/visibility-of-work-in-the-value-stream.md b/capabilities/visibility-of-work-in-the-value-stream.md index 3116220..4d97077 100644 --- a/capabilities/visibility-of-work-in-the-value-stream.md +++ b/capabilities/visibility-of-work-in-the-value-stream.md @@ -66,6 +66,10 @@ The following is a curated list of supporting practices to consider when looking Conduct regular VSM workshops involving representatives from each stage of the value stream. Map out both the current VSM and the ideal future-state VSM. Break down the value stream into clear process blocks and identify key metrics like lead time, process time, and percent complete and accurate (%C/A). DORA provided a great breakdown [here](https://dora.dev/guides/value-stream-management/). +### [Visualize All Work on a Storyboard](/practices/visualize-all-work-on-a-storyboard.md) + +A storyboard operationalizes your value stream map by tracking all work, planned and unplanned, as it flows from idea through delivery. Start with a VSM workshop to build shared understanding of how work moves through the system, then use the storyboard to make the day-to-day flow of work visible. This combination surfaces the data needed to identify and address bottlenecks. + ### Set and Enforce Work-in-Process Limits Start by setting limits that feel ambitious. This forces teams to make deliberate choices about what work matters most. The exact number depends on your team's context, but the goal is to find the sweet spot where teams feel focused but not hamstrung. Display your WIP limits prominently on your board or dashboard. When the limit is reached, treat it as a hard stop. No new work enters the system until something completes. This discipline creates the pressure needed to finish what's started and forces the prioritization conversations that lead to better decisions. diff --git a/capabilities/visual-management.md b/capabilities/visual-management.md index 90609a1..00b0b3a 100644 --- a/capabilities/visual-management.md +++ b/capabilities/visual-management.md @@ -59,6 +59,10 @@ Visual management isn’t a set-it-and-forget-it practice. Just like code or arc It’s easy to default to tracking what’s easy to count: tickets closed, lines of code, story points. But these are outputs, not outcomes. To drive meaningful improvement, displays should connect work to its impact: customer behavior, system reliability, revenue generated, or time to resolve issues. When teams see the impact of their work, they can make smarter trade-offs and course-correct faster. +### [Visualize All Work on a Storyboard](/practices/visualize-all-work-on-a-storyboard.md) + +A storyboard serves as a real-time visual display of all work across the team, including hidden work like production support, meetings, and unplanned requests.
By making the full workload visible at a glance, teams can spot bottlenecks, track flow, and have grounded conversations about priorities. The board becomes the shared artifact that drives standups, planning, and retrospectives. + ### Set Work-in-Process Limits While setting work-in-progress (WIP) limits is a DORA capability, it is also a technique that is actionable. So, we're including it here as a supporting practice. Visually tracking and enforcing WIP limits prevents bottlenecks and helps to maintain a steady flow. By limiting the number of tasks that are actively worked on, teams can achieve greater focus, reduce context switching, and enjoy enhanced flow efficiency. This leads to faster and smarter software delivery. diff --git a/capabilities/well-being.md b/capabilities/well-being.md index 9564f35..de85202 100644 --- a/capabilities/well-being.md +++ b/capabilities/well-being.md @@ -59,6 +59,10 @@ Establish structured programs to recognize employees for their contributions and Allocating time for fun and relationship-building fosters trust, collaboration, and a sense of belonging among team members. When employees can share positive experiences, it strengthens psychological safety, boosts creativity, and encourages cross-team connections. These elements not only make the workplace more enjoyable but also encourage retention and productivity by signaling that the organization values its employees' well-being. An organization that balances productivity with meaningful relationships creates an environment where employees thrive. +### [Visualize All Work on a Storyboard](/practices/visualize-all-work-on-a-storyboard.md) + +When unplanned work, meetings, and production support stay invisible, teams experience hidden overload that leads to burnout. A storyboard makes all work visible, enabling honest conversations about capacity and sustainable pace. By surfacing what's actually consuming time, teams can push back on unrealistic expectations and protect their collective well-being. + ### Automate Deployment Scripts Develop scripts that automate the entire deployment process, including environment preparation, package deployment, configuration, and post-deployment testing. By scripting these steps, you eliminate manual interventions, reduce the risk of human error, and lessen deployment pain. A repeatable and reliable deployment process can then be triggered with minimal effort. This enhances not only deployment speed and consistency but also employee well-being. diff --git a/practices/visualize-all-work-on-a-storyboard.md b/practices/visualize-all-work-on-a-storyboard.md index 9eab0ff..89d84f2 100644 --- a/practices/visualize-all-work-on-a-storyboard.md +++ b/practices/visualize-all-work-on-a-storyboard.md @@ -1,18 +1,18 @@ # Visualize All Work on a Storyboard -When everyone is busy, everyone tends to be caught up in their own world. Developers focus on their assigned tickets. QA focuses on testing queues. Ops focuses on incidents. Product focuses on stakeholder requests. Each person optimizes for their slice without seeing how those local optimizations create bottlenecks elsewhere in the system. Teams may work to improve and deliver in their areas of control and point the finger at others. The result is suboptimization and a system and culture that may be working against itself. +When everyone is busy, everyone tends to be caught up in their own world. Developers focus on their assigned tickets. QA focuses on testing queues. Ops focuses on incidents. 
Product focuses on stakeholder requests. Each person optimizes for their slice without seeing how those local optimizations create bottlenecks elsewhere in the system. Teams may work to improve and deliver in their areas of control and point the finger at others when things go sideways. The result is suboptimization and a system and culture that may be working against itself. -A storyboard makes all work visible in one place. Unlike a sprint board that tracks only committed work, a storyboard captures everything happening across the team: feature work, bug fixes, production support, meetings, spikes, and the invisible coordination that consumes time but never gets tracked. When all work is visible, the team can see where effort is actually going, identify bottlenecks worth swarming on, and make prioritization decisions grounded in reality rather than assumptions. +A storyboard makes all work visible in one place. Unlike a sprint board that tracks only committed work, a storyboard captures everything happening across the team: feature work, bug fixes, production support, meetings, spikes, and the invisible coordination that consumes time but never gets tracked. When all work is visible, teams can see where effort is actually going, identify bottlenecks worth swarming on, and make prioritization decisions grounded in reality rather than assumptions. -The storyboard operationalizes your value stream. Start with a [Value Stream Mapping (VSM) workshop](#run-a-value-stream-mapping-workshop) to build a shared understanding of how work flows from idea to delivery. The VSM creates the map; the storyboard becomes the live operational dashboard that tracks work against that map. Together, they provide a data grounded view for assessing performance, spotting constraints, and measuring improvement over time. +The storyboard operationalizes your value stream. Start with a [Value Stream Mapping (VSM) workshop](#run-a-value-stream-mapping-workshop) to build a shared understanding of how work flows from idea to delivery. The VSM creates the map; the storyboard becomes the live operational dashboard that tracks work against that map. Together, they provide a data-grounded view for assessing performance, spotting constraints, and measuring improvement over time. ## When to Experiment -- You are a team lead who notices that everyone seems busy, yet throughput remains low and lead times seem slow or getting slower. -- You are a developer who keeps getting pulled into unplanned work that never appears on the backlog or sprint board. -- You are a manager who struggles to explain where time goes when stakeholders ask why features take so long. -- You are part of a cross-functional team where handoffs between roles create delays that no one can quantify. -- You are trying to implement WIP limits but lack visibility into the actual work in progress. +- You are a **team or functional lead** who notices that everyone seems busy, yet throughput remains low and lead times are slow or getting slower. +- You are a **team or functional lead** trying to implement WIP limits but lack visibility into the actual work in progress. +- You are a **developer** who keeps getting pulled into unplanned work that never appears on the backlog or sprint board. +- You are a **manager** who struggles to explain where team members spend their time when stakeholders ask why features take so long. +- You are **part of a cross-functional team** where handoffs between roles create delays that no one can quantify. 
## How to Gain Traction @@ -20,11 +20,11 @@ Building a useful storyboard requires more than choosing a tool and creating col ### Run a Value Stream Mapping Workshop -Before building the storyboard, gather the whole team for a VSM workshop. Include everyone who touches the work: developers, QA, product, ops, and anyone else involved in delivery. The goal is not to create a perfect diagram but to surface everyone's perspective on how work actually flows. When a developer hears QA describe waiting three days for a deployable build, or when product learns that "quick fixes" consume 30% of engineering time, the team develops a shared understanding that no individual possessed before. DORA provides a solid breakdown of how to run these workshops [here](https://dora.dev/guides/value-stream-management/). The value is in the collective hearing and telling. +Before building the storyboard, gather the whole team for a VSM workshop. Include everyone who touches the work: developers, QA, product, ops, and anyone else involved in delivery. The goal is not to create a perfect diagram but to surface everyone's perspective on how work actually flows. When a developer hears QA describe waiting three days for a deployable build, or when product learns that "quick fixes" consume 30% of engineering time, the team develops a shared understanding that no individual possessed before. DORA provides a solid breakdown of how to run these workshops [here](https://dora.dev/guides/value-stream-management/). ### Identify All Work Types -Most teams undercount their work. Sprint boards may track planned features and bugs, but miss production support, technical debt, meetings, code reviews, on-call rotations, spikes, cross-team coordination, and those high priority features or enhancements that were not planned. During the VSM workshop, explicitly ask: "What else consumes your time that doesn't show up on our board?" Capture these categories. A storyboard that only shows planned work will give a false picture of capacity and create frustration when "unexpected" work inevitably appears. +Most teams fail to account for all of their work. Sprint boards may track planned features and bugs, but miss production support, technical debt, meetings, code reviews, on-call rotations, spikes, cross-team coordination, and those high-priority features or enhancements that were not planned. During the VSM workshop, explicitly ask "What else consumes your time that doesn't show up on our board?" Capture and categorize these activities. A storyboard that only shows planned work will give a false picture of capacity and create frustration when "unexpected" work inevitably appears. ### Design Columns That Match Your Value Stream @@ -32,11 +32,11 @@ Your storyboard columns should reflect how work actually moves through your syst ### Make the Invisible Visible -Add swim lanes or card types for work that traditionally stays hidden: production support, meetings, unplanned requests, technical debt. When a developer spends half their day on incident response, that should appear on the board.
When the team loses a day to an all-hands meeting, capture it. This visibility serves two purposes: It explains where capacity actually goes, and it creates pressure to reduce low-value activities. What gets measured gets managed.

 ### Establish Rituals Around the Board

-A storyboard only works if the team uses it. Build rituals that center on the board: daily standups that walk the board through, bi-weekly reviews that examine flow metrics, and bi-weekly retrospectives that ask what the board revealed. The board should be the single source of truth for "what are we working on?" If conversations happen elsewhere, the board becomes stale and loses its value.
+A storyboard only works if the team uses it. Build rituals that center on the board: daily standups that walk the board, bi-weekly reviews that examine flow metrics, and bi-weekly retrospectives that ask what the board revealed. The board should be the single source of truth for "What are we working on?" If conversations happen elsewhere, the board becomes stale and loses its value.

 ## Lessons From The Field

@@ -50,11 +50,15 @@ A storyboard only works if the team uses it. Build rituals that center on the bo

 - *Physical Boards Create Different Conversations.* If your team is co-located, consider a physical board alongside any digital tool. Standing in front of a wall of cards creates different dynamics than clicking through a Jira board. The tactile act of moving cards reinforces ownership and makes blockers harder to ignore.

-- *WIP Limits Need Visibility First.* Teams often try to implement WIP limits before they have visibility into actual work in progress. This creates frustration because limits feel arbitrary. Build the storyboard first, observe actual WIP for a few weeks, then set limits based on what you learn.
+- *WIP Limits Need Visibility First.* Teams often try to implement WIP limits before they have visibility into actual WIP. This creates frustration because limits feel arbitrary. Build the storyboard first, observe actual WIP for a few weeks, then set limits based on what you learn.

 ## Deciding to Polish or Pitch

-After experimenting with this practice for 4-6 weeks, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction:
+After experimenting with this practice for **4-6 weeks**, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction:
+
+### Fast & Tangible
+
+**Reduced Invisible and Unplanned Work.** The gap between "what the board shows" and "what people are actually doing" should shrink. If production support or unplanned work previously went untracked, it should now be visible and quantifiable. You should be able to answer "What percentage of our capacity goes to unplanned work?"

 ### Fast & Intangible

@@ -62,10 +66,6 @@ After experimenting with this practice for 4-6 weeks, bring the team together to

 **Stronger Team Camaraderie.** When team members hear each other's challenges during VSM workshops and see each other's work on the board, empathy grows. People stop assuming others have it easy and start appreciating the full picture. This mutual understanding builds trust and shifts the dynamic from finger-pointing to problem-solving together.

-### Fast & Tangible
-
-**Reduced Invisible and Unplanned Work.** The gap between "what the board shows" and "what people are actually doing" should shrink. 
If production support or unplanned work previously went untracked, it should now be visible and quantifiable. You should be able to answer "what percentage of our capacity goes to unplanned work?" - ### Slow & Tangible **Improved Flow Metrics.** Track lead time (idea to delivery) and cycle time (work started to work completed). Over 8-12 weeks, these metrics should stabilize or improve as the team identifies and addresses bottlenecks. If metrics aren't improving, the board may be revealing problems the team isn't acting on. @@ -74,7 +74,7 @@ After experimenting with this practice for 4-6 weeks, bring the team together to **Better Prioritization Decisions.** Conversations about what to work on next should become grounded in data rather than opinion. Stakeholders should understand trade-offs more clearly. The team should feel less reactive and more intentional about where effort goes. -## Supported Capabilities +## Supporting Capabilities ### [Work in Process Limits](/capabilities/work-in-process-limits.md) From c847d6b7749ffb744919141722faf2655d36ec0e Mon Sep 17 00:00:00 2001 From: Nolan Patterson Date: Fri, 30 Jan 2026 14:58:24 -0800 Subject: [PATCH 127/131] fix support(ed|ing) capabilities visualize-all-work-on-a-storyboard --- practices/visualize-all-work-on-a-storyboard.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/practices/visualize-all-work-on-a-storyboard.md b/practices/visualize-all-work-on-a-storyboard.md index 89d84f2..3404d3f 100644 --- a/practices/visualize-all-work-on-a-storyboard.md +++ b/practices/visualize-all-work-on-a-storyboard.md @@ -74,7 +74,7 @@ After experimenting with this practice for **4-6 weeks**, bring the team togethe **Better Prioritization Decisions.** Conversations about what to work on next should become grounded in data rather than opinion. Stakeholders should understand trade-offs more clearly. The team should feel less reactive and more intentional about where effort goes. -## Supporting Capabilities +## Supported Capabilities ### [Work in Process Limits](/capabilities/work-in-process-limits.md) From 875f27401d07170890c4f0b25c8b513f824437cc Mon Sep 17 00:00:00 2001 From: Nolan Patterson Date: Fri, 30 Jan 2026 15:25:35 -0800 Subject: [PATCH 128/131] fixed visualize-all-work-on-a-storyboard title reference --- capabilities/generative-organizational-culture.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/capabilities/generative-organizational-culture.md b/capabilities/generative-organizational-culture.md index 7239f92..37e6917 100644 --- a/capabilities/generative-organizational-culture.md +++ b/capabilities/generative-organizational-culture.md @@ -83,7 +83,7 @@ Aim to create teams that are composed of members from different functional areas Adopt tools and platforms that facilitate transparent and easy communication across the organization, such as chat systems, collaborative documentation, and shared dashboards. Encourage the sharing of information, questions, and updates openly and regularly rather than through isolated channels and in narrow timeframes. By improving information flow, teams are better informed and can respond more effectively to changes and challenges. 
-### [Visualize All Work on a Storyboard](/practices/visualize-all-work-on-a-storyboard.md) - Related +### [Visualize All Work on a Storyboard](/practices/visualize-all-work-on-a-storyboard.md) Building a storyboard through a cross-functional VSM workshop creates space for team members to hear each other's challenges and see how their work connects. When developers learn what blocks QA, or product discovers how much time goes to incident response, empathy and trust grow. This shared understanding shifts the culture from finger-pointing to collaborative problem-solving. From 4db1833a8a2954242ea6dfe9a4ff6ede047eb39d Mon Sep 17 00:00:00 2001 From: Dave Moore <850537+dcmoore@users.noreply.github.com> Date: Fri, 30 Jan 2026 16:05:29 -0800 Subject: [PATCH 129/131] Delete duplicate practice --- capabilities/visibility-of-work-in-the-value-stream.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/capabilities/visibility-of-work-in-the-value-stream.md b/capabilities/visibility-of-work-in-the-value-stream.md index 4d97077..f7bf2de 100644 --- a/capabilities/visibility-of-work-in-the-value-stream.md +++ b/capabilities/visibility-of-work-in-the-value-stream.md @@ -62,10 +62,6 @@ Generally, an overall score equal to or less than 3 means you'll likely gain a l The following is a curated list of supporting practices to consider when looking to improve your team's Visibility of Work in the Value Stream capability. While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability. -### Implement VSM Workshops - -Conduct regular VSM workshops involving representatives from each stage of the value stream. Map out both the current VSM and the ideal future-state VSM. Break down the value stream into clear process blocks and identify key metrics like lead time, process time, and percent complete and accurate (%C/A). DORA provided a great breakdown [here](https://dora.dev/guides/value-stream-management/). - ### [Visualize All Work on a Storyboard](/practices/visualize-all-work-on-a-storyboard.md) A storyboard operationalizes your value stream map by tracking all work, planned and unplanned, as it flows from idea through delivery. Start with a VSM workshop to build shared understanding of how work moves through the system, then use the storyboard to make the day to day flow of work visible. This combination surfaces the data needed to identify and address bottlenecks. From e64cbac3ea7a7856df079afbc0d57caca10b55a0 Mon Sep 17 00:00:00 2001 From: Dave Moore Date: Mon, 2 Feb 2026 12:39:53 -0800 Subject: [PATCH 130/131] Fix naming of otel practice --- capabilities/monitoring-and-observability.md | 2 +- ...elemetry-practice.md => adopt-the-opentelemetry-standard.md} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename practices/{open-telemetry-practice.md => adopt-the-opentelemetry-standard.md} (100%) diff --git a/capabilities/monitoring-and-observability.md b/capabilities/monitoring-and-observability.md index f4cdc06..17d090a 100644 --- a/capabilities/monitoring-and-observability.md +++ b/capabilities/monitoring-and-observability.md @@ -47,7 +47,7 @@ Generally, an overall score equal to or less than 3 means you'll likely gain a l The following is a curated list of supporting practices to consider when looking to improve your team's Monitoring and Observability capability. 
While not every practice will be beneficial in every situation, this list is meant to provide teams with fresh, pragmatic, and actionable ideas to support this capability.

-### [Adopt the OpenTelemetry Standard](/practices/open-telemetry-practice.md)
+### [Adopt the OpenTelemetry Standard](/practices/adopt-the-opentelemetry-standard.md)

 By instrumenting key parts of your application with telemetry data, teams gain real-time insights into usage patterns, performance bottlenecks, and opportunities to prioritize impactful changes. Following the OpenTelemetry standard and its suite of open-source tools to instrument your application provides consistent, vendor-neutral telemetry that preserves long-term flexibility in tooling and cost management.

diff --git a/practices/open-telemetry-practice.md b/practices/adopt-the-opentelemetry-standard.md
similarity index 100%
rename from practices/open-telemetry-practice.md
rename to practices/adopt-the-opentelemetry-standard.md

From b42cdd8bba003a3731c45104cbfef1bb809c00f3 Mon Sep 17 00:00:00 2001
From: Tristan Barrow
Date: Mon, 2 Feb 2026 15:00:38 -0700
Subject: [PATCH 131/131] flesh out wip limits

---
 practices/set-and-enforce-wip-limits    | 54 ------------------
 practices/set-and-enforce-wip-limits.md | 75 +++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 54 deletions(-)
 delete mode 100644 practices/set-and-enforce-wip-limits
 create mode 100644 practices/set-and-enforce-wip-limits.md

diff --git a/practices/set-and-enforce-wip-limits b/practices/set-and-enforce-wip-limits
deleted file mode 100644
index e69ada0..0000000
--- a/practices/set-and-enforce-wip-limits
+++ /dev/null
@@ -1,54 +0,0 @@
-# Set and Enforce Work-in-Process Limits
-
-Teams often have too much work in progress at once. This leads to long-lived branches, delayed code reviews, bottlenecks in QA, and constant context switching. Setting and enforcing work-in-process (WIP) limits helps teams stay focused, finish work already in motion, and reduce the overhead caused by juggling too many tasks at once.
-
-## When to Experiment
-
-“I am a developer and I need to learn how to prioritize tasks so I can move work across the finish line more quickly and avoid context switching.”
-
-"I am a team leader and I need to ensure our members stay focused on work that matters most so that we can avoid team burnout."
-
-## How to Gain Traction
-
- ### Set Limits that Feel Ambitious
-
-When teams start by setting limits that feel ambitious, it forces them to make deliberate choices about what work matters most. The exact number depends on your team's context, but the goal is to find the sweet spot where teams feel focused but not hamstrung. [more is needed here to make this point actionable, perhaps an example]
-
- ### Finish Work Before Starting New Work
-
-When team members are blocked or waiting, instead of starting new tickets, they can contribute in other ways. This might include refining upcoming tickets, pairing on active work with other developers, performing code reviews, or helping QA test in-progress items. These activities keep the team moving without adding more work to the queue.
-
- ### Visualize All Work
-
-Use a storyboard or dashboard tool, such as [xyz], to visualize all ongoing tasks, including hidden or auxiliary tasks like meetings or production support. When the board shows that a limit has been reached, treat it as a hard stop -- no new work enters the system until something completes. 
This creates the pressure needed to finish what's started and forces the prioritization conversations that lead to better decisions. - -## Lessons From The Field - -[Pragmint to complete] - -This section captures real-world patterns (things that consistently help or hinder this practice) along with short, relevant stories from the field. It’s not for personal rants or generic opinions. Each entry must be based on either: -1. a repeated observation across teams, or -2. a specific example (what worked, what didn’t, and why). - -## Deciding to Polish or Pitch - -After experimenting with this practice for [insert appropriate quantity of time in bold], bring the team together to determine whether the following metrics and/or signals have changed in a positive direction: - -### Fast & Measurable - -Fewer tickets stuck in review or QA (as tracked by ...) - -### Slow & Measurable - -Shorter lead times from development to release (as tracked by ...) - -### Slow & Intangible - -Less context switching and fewer rework cycles (via feedback captured by ...) - -Higher throughput and better team focus (via feedback captured by ...) - -## Supported Capability - - ### [Work-in-Process Limits](https://github.com/pragmint/open-practices/blob/main/capabilities/work-in-process-limits.md) -WIP limits help teams deliver more value by finishing what matters most. The focus shifts from starting new work to moving existing work across the finish line with greater speed and quality. diff --git a/practices/set-and-enforce-wip-limits.md b/practices/set-and-enforce-wip-limits.md new file mode 100644 index 0000000..eb5299a --- /dev/null +++ b/practices/set-and-enforce-wip-limits.md @@ -0,0 +1,75 @@ +# Set and Enforce Work-in-Process Limits + +Teams often have too much work in progress at once. This leads to long-lived branches, delayed code reviews, bottlenecks in QA, and constant context switching. When individuals juggle multiple tasks, cognitive load increases, and the time required to complete any single task expands significantly due to the cost of refocusing. + +Setting and enforcing work-in-process (WIP) limits helps teams stay focused, finish work already in motion, and reduce the overhead caused by task-switching. By artificially constraining the number of active items in a specific stage of the workflow, the team creates a "pull system" where new work is only started when there is capacity to handle it. + +Ultimately, this practice shifts the team’s mindset from resource efficiency (keeping everyone busy) to flow efficiency (getting value to the customer). It encourages swarming on blocked items and exposes bottlenecks in the process that were previously hidden by a mountain of open tickets. + +## When to Experiment + +- You are a Developer and you need to learn how to prioritize tasks so you can move work across the finish line more quickly and avoid context switching. +- You are a Team Leader and you need to ensure members stay focused on work that matters most so that you can avoid team burnout. +- You are a Product Owner and you need to see a more predictable flow of value delivery rather than a large batch of features that are "almost done." +- You are a QA Engineer and you need to prevent a flood of testing tickets from arriving at the end of the sprint, which compromises quality. + +## How to Gain Traction + +Implementing WIP limits changes the fundamental mechanics of how a team works. 
It is best to start simply, visualize the constraints, and agree as a team that hitting a limit is a trigger for conversation, not a rule to be quietly broken.
+
+### Set Limits that Feel Ambitious
+
+When teams start by setting limits that feel ambitious, it forces them to make deliberate choices about what work matters most. The exact number depends on your team's context, but the goal is to find the sweet spot where teams feel focused but not hamstrung. A common starting calculation is to set the total WIP limit to `(Team Size * 2) - 1`. For a team of 4, try a limit of 7 active items across the board. If the limit is rarely hit, it is too high; if it is hit constantly without resolution, it is too low.
+
+### Finish Work Before Starting New Work
+
+Adopt the mantra: "Stop starting, start finishing." When team members are blocked or waiting, instead of starting new tickets, they should look for ways to contribute to tickets already on the board. This might include refining upcoming tickets, pairing on active work with other developers, performing code reviews, or helping QA test in-progress items. These activities keep the team moving without adding more work to the queue, reducing the average cycle time per ticket.
+
+### Visualize All Work
+
+Use a storyboard or dashboard tool to visualize all ongoing tasks, including hidden or auxiliary tasks like meetings, unplanned maintenance, or production support. When the board shows that a limit has been reached, treat it as a hard stop: no new work enters the system until something completes. This creates the pressure needed to finish what's started and forces the prioritization conversations that lead to better decisions.
+
+## Lessons From The Field
+
+- Teams often try to "game" the system by creating a "Blocked" or "Waiting" column with infinite capacity. This defeats the purpose; blocked work is still work in process and consumes mental energy. Keep blocked items in their active column to visualize the pain of the dependency.
+- Management may initially fear "idleness" if a developer cannot pull a new ticket because the limit is hit. It is crucial to explain that "slack" in the system is necessary for flow and that an idle developer should swarm to help a bottlenecked peer rather than starting new features.
+- A common pattern is realizing that the bottleneck isn't development, but code review or QA. WIP limits effectively highlight these stages, forcing the whole team to take responsibility for quality rather than throwing code over the wall.
+- WIP limits usually fail if they are not visualized. If the limit exists only in a policy document but not on the Jira board or physical wall, it will be ignored within a week.
+
+## Deciding to Polish or Pitch
+
+After experimenting with this practice for **2-3 weeks**, bring the team together to determine whether the following metrics and/or signals have changed in a positive direction:
+
+### Fast & Tangible
+
+**Reduction in Active Tickets.** The total count of tickets in "In Progress," "Review," and "Testing" states decreases, matching the agreed-upon limits.
+
+### Fast & Intangible
+
+**Standup Quality.** Daily standups shift from status updates ("I did this, I will do that") to blocker-focused discussions ("We are at our limit in QA, who can help clear this?").
+
+### Slow & Tangible
+
+**Decreased Cycle Time.** The time it takes for a single work item to move from "Started" to "Done" drops significantly as work stops languishing in queues.
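+
+If your tracker can export ticket data, a small script can turn the tangible signals above into numbers you can watch week over week. The sketch below is illustrative only: the ticket fields, statuses, and the `(team_size * 2) - 1` default are assumptions, not the schema or API of any particular tool.
+
+```python
+from datetime import datetime
+
+# Hypothetical weekly ticket export; field names are assumptions.
+tickets = [
+    {"id": "T-1", "started": "2026-01-05", "done": None},
+    {"id": "T-2", "started": "2026-01-02", "done": "2026-01-09"},
+    {"id": "T-3", "started": "2026-01-06", "done": None},
+]
+
+team_size = 4
+wip_limit = (team_size * 2) - 1  # the starting heuristic suggested above
+
+# Anything started but not finished counts as work in process.
+active = [t for t in tickets if t["done"] is None]
+print(f"Active tickets: {len(active)} (limit {wip_limit})")
+
+# Cycle time for completed items: started -> done, in days.
+def days_between(start, end):
+    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).days
+
+cycle_times = [days_between(t["started"], t["done"]) for t in tickets if t["done"]]
+if cycle_times:
+    print(f"Average cycle time: {sum(cycle_times) / len(cycle_times):.1f} days")
+```
+
+Charted over the course of the experiment, these two numbers make it easier to see whether the limit is actually changing behavior.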
+
+### Slow & Intangible
+
+**Improved Morale and Lower Stress.** Team members report feeling less overwhelmed and more satisfied because they finish tasks more often, rather than having many tasks permanently "in flight."
+
+## Supported Capabilities
+
+### [Work-in-Process Limits](/capabilities/work-in-process-limits.md)
+
+WIP limits help teams deliver more value by finishing what matters most. The focus shifts from starting new work to moving existing work across the finish line with greater speed and quality.
+
+### [Visual Management](/capabilities/visual-management.md)
+
+You cannot limit what you cannot see. Visualizing the work and the limits explicitly on a board is the primary mechanism for enforcing this practice and identifying system constraints.
+
+### [Well-Being](/capabilities/well-being.md)
+
+By reducing context switching and the pressure of juggling multiple unfinished tasks, WIP limits directly contribute to a sustainable pace of work and reduced burnout for team members.
+
+### [Continuous Delivery](/capabilities/continuous-delivery.md)
+
+Lowering WIP is a prerequisite for continuous delivery. By reducing the batch size of work in the system, code moves through the pipeline faster, enabling more frequent and reliable releases.
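+
+To make the "hard stop" described under Visualize All Work concrete, here is one way the check could be expressed. This is a sketch with assumed column names and limits, not an integration with any real board tool; most teams enforce the stop visually on the board rather than in code.
+
+```python
+# Hypothetical board snapshot and per-column limits; adjust to your own columns.
+board = {
+    "In Progress": ["T-4", "T-7", "T-9"],
+    "Review": ["T-2", "T-5"],
+    "Testing": ["T-1"],
+}
+limits = {"In Progress": 3, "Review": 2, "Testing": 2}
+
+def columns_at_limit(board, limits):
+    """Return columns that have reached or exceeded their WIP limit."""
+    return {
+        column: len(cards)
+        for column, cards in board.items()
+        if column in limits and len(cards) >= limits[column]
+    }
+
+for column, count in columns_at_limit(board, limits).items():
+    print(f"Hard stop: '{column}' holds {count} items (limit {limits[column]}). "
+          f"Finish or swarm on existing work before pulling anything new.")
+```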