- **Variable names in programming**: `[a-zA-Z_][a-zA-Z0-9_]*`
- **Binary strings ending with 01**: `(0|1)*01`
- **Strings containing at least one 'a'**: `(a|b)*a(a|b)*`
- **Even number of 0's**: `1*(01*01*)*`
- **Strings starting and ending with the same symbol over {a,b}**: `a(a|b)*a|b(a|b)*b|a|b`
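The patterns above can be checked quickly with Python's standard `re` module. This is a sketch. The test strings are examples chosen here, not taken from the original text. `re.fullmatch` is used because each pattern is meant to describe the whole string, whereas `re.match` would also accept a matching prefix. For these particular patterns, Python's regex syntax agrees with the textbook notation.

```python
import re

# The example patterns from the list above, keyed by a short description.
PATTERNS = {
    "identifier": r"[a-zA-Z_][a-zA-Z0-9_]*",
    "ends_with_01": r"(0|1)*01",
    "contains_a": r"(a|b)*a(a|b)*",
    "even_zeros": r"1*(01*01*)*",
    "same_first_last": r"a(a|b)*a|b(a|b)*b|a|b",
}


def matches(name: str, text: str) -> bool:
    """True only if the entire string matches the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None


# (pattern name, a string that should match, a string that should not)
EXAMPLES = [
    ("identifier", "my_var1", "1abc"),
    ("ends_with_01", "1101", "0110"),
    ("contains_a", "bba", "bbb"),
    ("even_zeros", "1001", "10"),
    ("same_first_last", "abba", "ab"),
]

for name, good, bad in EXAMPLES:
    print(f"{name}: {good!r} -> {matches(name, good)}, {bad!r} -> {matches(name, bad)}")
```

Note that `even_zeros` also accepts the empty string, which contains zero 0's.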

### Non-deterministic Finite Automata (NFA)

Thompson's construction, developed by Ken Thompson in 1968, is a systematic method for converting regular expressions into equivalent NFAs. The algorithm works recursively by building small NFAs for basic expressions and then combining them using specific patterns for each operator.

#### Implementation Approach

Thompson's construction is typically implemented using a **stack-based approach**. As the algorithm processes the regular expression (usually in postfix notation), it:

1. **Pushes NFAs onto a stack** as basic symbols are encountered
2. **Pops NFAs from the stack** when operators are encountered
3. **Combines the popped NFAs** according to the operator's construction rule
4. **Pushes the resulting NFA** back onto the stack

This stack-based approach naturally handles the compositional nature of regular expressions, where smaller expressions are built up into larger ones. Operator precedence and parentheses are resolved earlier, when the infix expression is converted to postfix (for example, with the shunting-yard algorithm). The stack then only has to apply each operator to the NFAs on top of it. Once the whole postfix expression has been processed, the stack holds exactly one NFA, which is the automaton for the entire expression.
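The following is a minimal Python sketch of this process. It first converts an infix expression to postfix, then evaluates the postfix string with a stack of NFA fragments. It makes several assumptions. The syntax is tiny: single-character symbols, `|`, `*`, `+` and parentheses. Concatenation is written with an explicit `.` operator. The names `insert_concat`, `to_postfix` and `thompson` are hypothetical and chosen for illustration; they are not a standard API. Each fragment is a `(start, final)` pair of integer states, and ε-transitions are stored with the label `None`.

```python
from __future__ import annotations

CONCAT = "."                      # assumed explicit concatenation operator
PRECEDENCE = {"|": 1, CONCAT: 2}  # binary operators; postfix * and + bind tighter


def insert_concat(regex: str) -> str:
    """Make implicit concatenation explicit, e.g. '(a|b)*abb' -> '(a|b)*.a.b.b'."""
    out = []
    for i, ch in enumerate(regex):
        out.append(ch)
        nxt = regex[i + 1] if i + 1 < len(regex) else None
        if nxt is not None and ch not in "(|" and nxt not in ")|*+":
            out.append(CONCAT)
    return "".join(out)


def to_postfix(regex: str) -> str:
    """Shunting-yard conversion: precedence and parentheses are resolved here."""
    output, ops = [], []
    for ch in insert_concat(regex):
        if ch in "*+":
            output.append(ch)  # unary postfix operator: its operand is already complete
        elif ch in PRECEDENCE:
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[ch]:
                output.append(ops.pop())
            ops.append(ch)
        elif ch == "(":
            ops.append(ch)
        elif ch == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the matching "("
        else:
            output.append(ch)  # a literal symbol
    while ops:
        output.append(ops.pop())
    return "".join(output)


def thompson(postfix: str):
    """Stack-based Thompson construction over a postfix string.

    Returns (transitions, start, final, n_states); transitions are
    (src, label, dst) triples, and label None means an ε-transition.
    """
    transitions = []
    n_states = 0

    def new_state() -> int:
        nonlocal n_states
        n_states += 1
        return n_states - 1

    stack = []  # fragments as (start, final) pairs
    for ch in postfix:
        if ch == CONCAT:
            (s2, e2), (s1, e1) = stack.pop(), stack.pop()  # right operand is on top
            transitions.append((e1, None, s2))
            stack.append((s1, e2))
        elif ch == "|":
            (s2, e2), (s1, e1) = stack.pop(), stack.pop()
            q0, qf = new_state(), new_state()
            transitions += [(q0, None, s1), (q0, None, s2), (e1, None, qf), (e2, None, qf)]
            stack.append((q0, qf))
        elif ch in "*+":
            s, e = stack.pop()
            q0, qf = new_state(), new_state()
            transitions += [(q0, None, s), (e, None, qf), (e, None, s)]
            if ch == "*":
                transitions.append((q0, None, qf))  # skip edge allows zero repetitions
            stack.append((q0, qf))
        else:  # literal symbol: push a two-state fragment
            q0, q1 = new_state(), new_state()
            transitions.append((q0, ch, q1))
            stack.append((q0, q1))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    start, final = stack[0]
    return transitions, start, final, n_states


postfix = to_postfix("(a|b)*abb")
transitions, start, final, n_states = thompson(postfix)
print(postfix)                 # ab|*a.b.b.
print(n_states, start, final)  # 14 6 13
```

In this sketch, concatenation joins fragments with an ε-transition instead of merging states. It therefore produces 14 states for `(a|b)*abb`, following the construction patterns that come next.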

#### Historical Fun Fact

Ken Thompson, who developed this algorithm, also co-created the Unix operating system with Dennis Ritchie and designed the B programming language, the direct predecessor of C. His work on regular expressions was motivated by the need for pattern matching in text editors. He first built it into the QED editor on the CTSS time-sharing system, and later into the 'ed' editor of early Unix systems.

Thompson's algorithm follows these specific construction patterns:

**1. Base Case - Single Symbol 'a':**

```text
 ┌────────┐        a         ┌────────┐
→│   q₀   │ ───────────────> │   q₁   │ ◎
 └────────┘                  └────────┘
  (start)                     (final)
```

This creates a simple two-state NFA where:

- q₀ is the start state (indicated by →)
- q₁ is the final/accepting state (indicated by ◎)
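The base case can be sketched in Python as follows. Several details are assumptions made for illustration: the `NFA` helper class, its integer state numbering, and the `(src, label, dst)` transition triples, where `None` stands for ε. The function returns a fragment as a `(start, final)` pair, which is all that the later construction rules need.

```python
from __future__ import annotations


class NFA:
    """Hypothetical helper: states are ints; transitions are (src, label, dst)
    triples, where a label of None marks an ε-transition."""

    def __init__(self) -> None:
        self.n_states = 0
        self.transitions = []

    def new_state(self) -> int:
        self.n_states += 1
        return self.n_states - 1

    def add(self, src: int, label, dst: int) -> None:
        self.transitions.append((src, label, dst))


def symbol(nfa: NFA, a: str) -> tuple[int, int]:
    """Base case for a single symbol: q0 --a--> q1, returned as (start, final)."""
    q0 = nfa.new_state()
    q1 = nfa.new_state()
    nfa.add(q0, a, q1)
    return q0, q1


nfa = NFA()
print(symbol(nfa, "a"), nfa.transitions)  # (0, 1) [(0, 'a', 1)]
```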

**2. Concatenation - Expression r₁r₂:**

- Connect the final state of r₁ to the start state of r₂ with an ε-transition
- The start state of r₁ becomes the new start state
- The final state of r₂ becomes the new final state
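Concatenation adds a single ε-edge and creates no new states. The sketch below uses the same hypothetical `NFA` helper and `symbol` function, repeated so that the block runs on its own. `concat` is likewise an illustrative name.

```python
from __future__ import annotations


class NFA:
    """Hypothetical helper: int states, (src, label, dst) transitions, label None = ε."""

    def __init__(self) -> None:
        self.n_states = 0
        self.transitions = []

    def new_state(self) -> int:
        self.n_states += 1
        return self.n_states - 1

    def add(self, src: int, label, dst: int) -> None:
        self.transitions.append((src, label, dst))


def symbol(nfa: NFA, a: str) -> tuple[int, int]:
    """Base case: q0 --a--> q1."""
    q0, q1 = nfa.new_state(), nfa.new_state()
    nfa.add(q0, a, q1)
    return q0, q1


def concat(nfa: NFA, f1: tuple[int, int], f2: tuple[int, int]) -> tuple[int, int]:
    """r1 r2: join r1's final state to r2's start state with one ε-transition."""
    (s1, e1), (s2, e2) = f1, f2
    nfa.add(e1, None, s2)
    return s1, e2  # start of r1, final of r2


nfa = NFA()
ab = concat(nfa, symbol(nfa, "a"), symbol(nfa, "b"))
print(ab, nfa.transitions)  # (0, 3) [(0, 'a', 1), (2, 'b', 3), (1, None, 2)]
```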

**3. Union - Expression r₁|r₂:**

```text
                    ε      ┌──────────┐      ε
              ┌──────────> │   NFA₁   │ ──────────┐
              │            │   (r₁)   │           │
   ┌────┐     │            └──────────┘           │      ┌────┐
 → │ q₀ │ ────┤                                   ├────> │ qf │ ◎
   └────┘     │            ┌──────────┐           │      └────┘
   (new)      │            │   NFA₂   │           │      (new)
              └──────────> │   (r₂)   │ ──────────┘
                    ε      └──────────┘      ε
```

Construction steps:

- Build NFAs for r₁ and r₂ separately
- Create a new start state (q₀) with ε-transitions to both start states
- Create a new final state (qf) with ε-transitions from both final states
- This allows the automaton to "choose" either path
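In code, the union rule adds exactly two states and four ε-transitions. This is again a sketch built on the hypothetical `NFA` helper; `union` is an illustrative name.

```python
from __future__ import annotations


class NFA:
    """Hypothetical helper: int states, (src, label, dst) transitions, label None = ε."""

    def __init__(self) -> None:
        self.n_states = 0
        self.transitions = []

    def new_state(self) -> int:
        self.n_states += 1
        return self.n_states - 1

    def add(self, src: int, label, dst: int) -> None:
        self.transitions.append((src, label, dst))


def symbol(nfa: NFA, a: str) -> tuple[int, int]:
    """Base case: q0 --a--> q1."""
    q0, q1 = nfa.new_state(), nfa.new_state()
    nfa.add(q0, a, q1)
    return q0, q1


def union(nfa: NFA, f1: tuple[int, int], f2: tuple[int, int]) -> tuple[int, int]:
    """r1|r2: a new start q0 fans out to both fragments; both finals feed a new qf."""
    (s1, e1), (s2, e2) = f1, f2
    q0, qf = nfa.new_state(), nfa.new_state()
    nfa.add(q0, None, s1)  # choose r1
    nfa.add(q0, None, s2)  # choose r2
    nfa.add(e1, None, qf)
    nfa.add(e2, None, qf)
    return q0, qf


nfa = NFA()
a_or_b = union(nfa, symbol(nfa, "a"), symbol(nfa, "b"))
print(a_or_b, nfa.n_states)  # (4, 5) 6
```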

**4. Kleene Star - Expression r*:**

```text
              ┌─────────── ε (skip) ────────────┐
              │                                 ↓
   ┌────┐     │       ┌──────────┐            ┌────┐
 → │ q₀ │ ────┴─ε───> │   NFA    │ ────ε────> │ qf │ ◎
   └────┘             │   (r)    │            └────┘
   (new)              └──────────┘            (new)
                        ↑      │
                        └──────┘
                       ε (repeat)
```

Construction steps:

- Build an NFA for r
- Create new start (q₀) and final (qf) states
- Add ε-transitions: new start → original start, original final → new final
- Add ε-transition: new start → new final (for zero repetitions)
- Add ε-transition: original final → original start (for multiple repetitions)
- This creates a loop allowing zero or more repetitions
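The star rule is sketched below in the same hypothetical style. It comes with a small ε-closure simulation (`accepts`, also illustrative) so that the effect of the skip and loop edges can be observed. The empty string is accepted because of the q₀ → qf edge.

```python
from __future__ import annotations


class NFA:
    """Hypothetical helper: int states, (src, label, dst) transitions, label None = ε."""

    def __init__(self) -> None:
        self.n_states = 0
        self.transitions = []

    def new_state(self) -> int:
        self.n_states += 1
        return self.n_states - 1

    def add(self, src: int, label, dst: int) -> None:
        self.transitions.append((src, label, dst))


def symbol(nfa: NFA, a: str) -> tuple[int, int]:
    """Base case: q0 --a--> q1."""
    q0, q1 = nfa.new_state(), nfa.new_state()
    nfa.add(q0, a, q1)
    return q0, q1


def star(nfa: NFA, frag: tuple[int, int]) -> tuple[int, int]:
    """r*: new start q0 and final qf wrapped around the fragment for r."""
    s, e = frag
    q0, qf = nfa.new_state(), nfa.new_state()
    nfa.add(q0, None, s)   # new start -> original start
    nfa.add(e, None, qf)   # original final -> new final
    nfa.add(q0, None, qf)  # skip edge: zero repetitions
    nfa.add(e, None, s)    # loop back: further repetitions
    return q0, qf


def accepts(nfa: NFA, frag: tuple[int, int], text: str) -> bool:
    """Simulate the NFA on text by tracking the set of reachable states."""
    start, final = frag

    def eps_closure(states: set[int]) -> set[int]:
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in nfa.transitions:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = eps_closure({start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.transitions if src in current and label == ch}
        current = eps_closure(moved)
    return final in current


nfa = NFA()
a_star = star(nfa, symbol(nfa, "a"))
print([accepts(nfa, a_star, s) for s in ["", "a", "aaa", "b"]])  # [True, True, True, False]
```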

**5. Plus Operation - Expression r+:**

```text
   ┌────┐          ┌──────────┐            ┌────┐
 → │ q₀ │ ───ε───> │   NFA    │ ────ε────> │ qf │ ◎
   └────┘          │   (r)    │            └────┘
   (new)           └──────────┘            (new)
                     ↑      │
                     └──────┘
                    ε (repeat)

Note: Unlike r*, there is NO direct ε-transition from q₀ to qf
```

Construction steps:

- Build an NFA for r
- Similar to Kleene star but without the direct ε-transition from start to final
- This ensures at least one occurrence of r before accepting
- The loop back allows for multiple repetitions
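The plus rule drops the skip edge from the star rule, and the simulation shows the consequence: `a+` rejects the empty string. The helper names (`NFA`, `symbol`, `plus`, `accepts`) remain hypothetical.

```python
from __future__ import annotations


class NFA:
    """Hypothetical helper: int states, (src, label, dst) transitions, label None = ε."""

    def __init__(self) -> None:
        self.n_states = 0
        self.transitions = []

    def new_state(self) -> int:
        self.n_states += 1
        return self.n_states - 1

    def add(self, src: int, label, dst: int) -> None:
        self.transitions.append((src, label, dst))


def symbol(nfa: NFA, a: str) -> tuple[int, int]:
    """Base case: q0 --a--> q1."""
    q0, q1 = nfa.new_state(), nfa.new_state()
    nfa.add(q0, a, q1)
    return q0, q1


def plus(nfa: NFA, frag: tuple[int, int]) -> tuple[int, int]:
    """r+: like r* but with no q0 -> qf skip edge, so r must occur at least once."""
    s, e = frag
    q0, qf = nfa.new_state(), nfa.new_state()
    nfa.add(q0, None, s)  # new start -> original start
    nfa.add(e, None, qf)  # original final -> new final
    nfa.add(e, None, s)   # loop back: further repetitions
    return q0, qf


def accepts(nfa: NFA, frag: tuple[int, int], text: str) -> bool:
    """Simulate the NFA on text by tracking the set of reachable states."""
    start, final = frag

    def eps_closure(states: set[int]) -> set[int]:
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in nfa.transitions:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = eps_closure({start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.transitions if src in current and label == ch}
        current = eps_closure(moved)
    return final in current


nfa = NFA()
a_plus = plus(nfa, symbol(nfa, "a"))
print([accepts(nfa, a_plus, s) for s in ["", "a", "aaa", "b"]])  # [False, True, True, False]
```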
118
209
119
210
#### Step-by-Step Example
120
211
121
212
Let's convert the regular expression `(a|b)*abb` to an NFA:
122
213
123
214
1.**Build NFA for 'a'**: Simple two-state NFA
124
-
2.**Build NFA for 'b'**: Simple two-state NFA
215
+
216
+
```text
217
+
→ (q₀) ─a→ (q₁)
218
+
```

2. **Build NFA for 'b'**: Simple two-state NFA

```text
→ (q₂) ─b→ (q₃)
```

3. **Build NFA for 'a|b'**: Use the union construction

```text
          ε           a
      ┌───────→ (q₀) ───→ (q₁) ───┐ ε
 → ( )┤                           ├───→ ( )
      └───────→ (q₂) ───→ (q₃) ───┘ ε
          ε           b
```

4. **Build NFA for '(a|b)*'**: Apply the Kleene star construction

```text
        ┌───────── ε ─────────┐
        │                     ↓
 → ( ) ─┴─ε─→ [a|b] ───ε───→ ( )
              ↑   │
              └───┘
                ε
```

5. **Build NFAs for 'a', 'b', 'b'**: Simple constructions

```text
→ ( ) ─a→ ( )      → ( ) ─b→ ( )      → ( ) ─b→ ( ) ◎
```

6. **Concatenate all parts**: Final NFA for `(a|b)*abb`

```text
→ [(a|b)*] ─ε→ [a] ─ε→ [b] ─ε→ [b] ◎
```
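
When built exactly as described above, with ε-transitions joining the concatenated pieces, the resulting NFA has 14 states: 8 for `(a|b)*` and 2 each for `a`, `b` and `b`. Variants that instead merge the final state of each piece with the start state of the next end up with 11 states. Each component maintains Thompson's properties: a single entry point and a single exit point, which is what makes clean composition possible.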
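The exact state count follows directly from the construction rules. Each symbol contributes 2 states, each `|`, `*` or `+` adds 2 new states, and an ε-linked concatenation adds none. The counting notation below ($n_{\text{sym}}$ for the number of symbols, $n_{\mid}$ for the number of unions, and so on) is introduced here for illustration:

```math
\begin{aligned}
|Q| &= 2\,n_{\text{sym}} + 2\,(n_{\mid} + n_{*} + n_{+}) \\
    &= 2 \cdot 5 + 2 \cdot (1 + 1 + 0) = 14 \qquad \text{for } (a \mid b)^{*}abb, \\
|Q_{\text{merged}}| &= |Q| - n_{\cdot} = 14 - 3 = 11.
\end{aligned}
```

Here $n_{\cdot} = 3$ counts the concatenations. Each one saves a single state when the final state of the left piece is merged with the start state of the right piece.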