Commit 8d98f70

Sid Mohan authored and committed
ficxed readme
1 parent e562f3b commit 8d98f70

1 file changed

+121
-12
lines changed

README.md

Lines changed: 121 additions & 12 deletions
@@ -25,22 +25,17 @@ DataFog can be installed via pip:
2525
pip install datafog
2626
```
2727

28-
For v4 we're introducing a CLI! see more details below.
29-
30-
# DataFog CLI Usage
31-
32-
> **🚀 Beta Release: v4.0.0-beta**
33-
>
34-
> This is a beta release of DataFog v4. Please report any issues or feedback to our [GitHub repository](https://github.com/datafog/datafog-python).
35-
36-
---
28+
# CLI
3729

3830
## 📚 Quick Reference
3931

4032
| Command | Description |
4133
| ------------------- | ------------------------------------ |
4234
| `scan-text` | Analyze text for PII |
4335
| `scan-image` | Extract and analyze text from images |
36+
| `redact-text` | Redact PII in text |
37+
| `replace-text` | Replace PII with anonymized values |
38+
| `hash-text` | Hash PII in text |
4439
| `health` | Check service status |
4540
| `show-config` | Display current settings |
4641
| `download-model` | Get a specific spaCy model |
@@ -85,6 +80,50 @@ To extract text and annotate PII:
8580
datafog scan-image "nokia-statement.png" --operations scan
8681
```
8782

83+
### Redacting Text
84+
85+
To redact PII in text:
86+
87+
```bash
88+
datafog redact-text "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
89+
```
90+
91+
which should output:
92+
93+
```bash
94+
[REDACTED] is the CEO of [REDACTED] and is based out of [REDACTED], [REDACTED]
95+
```
96+
97+
### Replacing Text
98+
99+
To replace detected PII:
100+
101+
```bash
102+
datafog replace-text "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
103+
```
104+
105+
which should return something like:
106+
107+
```bash
108+
[PERSON_B86CACE6] is the CEO of [UNKNOWN_445944D7] and is based out of [UNKNOWN_32BA5DCA], [UNKNOWN_B7DF4969]
109+
```
110+
111+
Note: A unique, randomly generated identifier is created for each detected entity.
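
The shape of these placeholders can be pictured with a short sketch. The helper `make_placeholder` below is hypothetical and not part of DataFog; it assumes the identifier is eight random hex characters appended to the entity label, which matches the look of the sample output but is only a guess at how the library builds it.

```python
import secrets


def make_placeholder(label: str) -> str:
    """Build a token such as [PERSON_1A2B3C4D].

    Hypothetical helper for illustration; not DataFog's API.
    """
    # token_hex(4) returns 8 random hexadecimal characters.
    suffix = secrets.token_hex(4).upper()
    return f"[{label}_{suffix}]"


if __name__ == "__main__":
    # The suffix differs on every call, so no fixed output is shown here.
    print(make_placeholder("PERSON"))
```

Because the suffix is random, running the same text twice yields different placeholders under this assumption.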
112+
113+
### Hashing Text
114+
115+
You can hash detected PII with SHA256, SHA3-256, or MD5; the default is SHA256. Currently, the hashed output does not match the length of the original entity, which helps preserve privacy.
116+
117+
```bash
118+
datafog hash-text "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
119+
```
120+
121+
generating an output which looks like this:
122+
123+
```bash
124+
5738a37f0af81594b8a8fd677e31b5e2cabd6d7791c89b9f0a1c233bb563ae39 is the CEO of f223faa96f22916294922b171a2696d868fd1f9129302eb41a45b2a2ea2ebbfd and is based out of ab5f41f04096cf7cd314357c4be26993eeebc0c094ca668506020017c35b7a9c, cad0535decc38b248b40e7aef9a1cfd91ce386fa5c46f05ea622649e7faf18fb
125+
```
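
To see why the output length is independent of the entity length, here is a minimal sketch using Python's standard `hashlib`. The `ALGORITHMS` table and `hex_digest` helper are illustrative names, not DataFog's API, and the UTF-8 encoding is an assumption, so the digests may not match what `datafog hash-text` prints.

```python
import hashlib

# Algorithm names as listed in the README, mapped to hashlib constructors.
ALGORITHMS = {
    "SHA256": hashlib.sha256,
    "SHA3-256": hashlib.sha3_256,
    "MD5": hashlib.md5,
}


def hex_digest(value: str, algorithm: str = "SHA256") -> str:
    """Return the hex digest of value (assumes UTF-8 encoding)."""
    return ALGORITHMS[algorithm](value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for entity in ("Apple", "Cupertino"):
        for name in ALGORITHMS:
            digest = hex_digest(entity, name)
            # SHA256 and SHA3-256 give 64 hex characters, MD5 gives 32.
            print(f"{name:8} {entity:10} {len(digest)} hex chars")
```

Each algorithm produces a digest of fixed size, so a five-letter entity and a nine-letter entity hash to strings of the same length.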
126+
88127
### Utility Commands
89128

90129
#### 🏥 Health Check
@@ -135,7 +174,7 @@ datafog list-entities
135174

136175
💡 **Tip:** For more detailed information on each command, use the `--help` option, e.g., `datafog scan-text --help`.
137176

138-
# TODO: Reorganize below
177+
# Python SDK
139178

140179
## Getting Started
141180

@@ -151,7 +190,7 @@ client = DataFog(operations="scan")
151190
ocr_client = DataFog(operations="extract")
152191
```
153192

154-
### Text PII Annotation
193+
## Text PII Annotation
155194

156195
Here's an example of how to annotate PII in a text document:
157196

@@ -168,7 +207,7 @@ annotations = client.run_text_pipeline_sync(str_list=text_lines)
168207
print(annotations)
169208
```
170209

171-
### OCR PII Annotation
210+
## OCR PII Annotation
172211

173212
For OCR capabilities, you can use the following:
174213

@@ -191,6 +230,76 @@ loop.run_until_complete(run_ocr_pipeline_demo())
191230

192231
Note: The DataFog library uses asynchronous programming for OCR, so make sure to use the `async`/`await` syntax when calling the appropriate methods.
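
As a minimal sketch of that calling pattern, the example below awaits a coroutine and drives it with `asyncio.run`, the standard-library alternative to managing the event loop by hand in a script. `fake_ocr_pipeline` is a hypothetical stand-in that performs no OCR and is not a DataFog method.

```python
import asyncio


async def fake_ocr_pipeline(image_urls: list[str]) -> list[str]:
    """Hypothetical stand-in for an async OCR call; it does no real OCR."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return [f"text extracted from {url}" for url in image_urls]


async def main() -> list[str]:
    # Await the coroutine instead of calling it like a regular function.
    return await fake_ocr_pipeline(["example.png"])


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `fake_ocr_pipeline(...)` without `await` would return a coroutine object rather than the result, which is the usual mistake this note warns about.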
193232

233+
## Text Anonymization
234+
235+
DataFog provides various anonymization techniques to protect sensitive information. Here are examples of how to use them:
236+
237+
### Redacting Text
238+
239+
To redact PII in text:
240+
241+
```python
242+
from datafog import DataFog
243+
from datafog.config import OperationType
244+
245+
client = DataFog(operations=[OperationType.SCAN, OperationType.REDACT])
246+
247+
text = "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
248+
redacted_text = client.run_text_pipeline_sync([text])[0]
249+
print(redacted_text)
250+
```
251+
252+
Output:
253+
254+
```
255+
[REDACTED] is the CEO of [REDACTED] and is based out of [REDACTED], [REDACTED]
256+
```
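
Conceptually, redaction swaps each detected span for a fixed mask. The sketch below illustrates that idea with a hypothetical `redact` helper and hard-coded character offsets; it is not how DataFog is implemented, and in practice the spans would come from the PII scan.

```python
def redact(text: str, spans: list[tuple[int, int]], mask: str = "[REDACTED]") -> str:
    """Replace each (start, end) character span in text with mask."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask + text[end:]
    return text


if __name__ == "__main__":
    sentence = "Tim Cook is the CEO of Apple"
    # Offsets for "Tim Cook" and "Apple", chosen by hand for this example.
    print(redact(sentence, [(0, 8), (23, 28)]))
    # prints: [REDACTED] is the CEO of [REDACTED]
```

Processing spans from the end of the string backwards avoids recomputing offsets after each substitution changes the text length.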
257+
258+
### Replacing Text
259+
260+
To replace detected PII with unique identifiers:
261+
262+
```python
263+
from datafog import DataFog
264+
from datafog.config import OperationType
265+
266+
client = DataFog(operations=[OperationType.SCAN, OperationType.REPLACE])
267+
268+
text = "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
269+
replaced_text = client.run_text_pipeline_sync([text])[0]
270+
print(replaced_text)
271+
```
272+
273+
Output:
274+
275+
```
276+
[PERSON_B86CACE6] is the CEO of [UNKNOWN_445944D7] and is based out of [UNKNOWN_32BA5DCA], [UNKNOWN_B7DF4969]
277+
```
278+
279+
### Hashing Text
280+
281+
To hash detected PII:
282+
283+
```python
284+
from datafog import DataFog
285+
from datafog.config import OperationType
286+
from datafog.models.anonymizer import HashType
287+
288+
client = DataFog(operations=[OperationType.SCAN, OperationType.HASH], hash_type=HashType.SHA256)
289+
290+
text = "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
291+
hashed_text = client.run_text_pipeline_sync([text])[0]
292+
print(hashed_text)
293+
```
294+
295+
Output:
296+
297+
```
298+
5738a37f0af81594b8a8fd677e31b5e2cabd6d7791c89b9f0a1c233bb563ae39 is the CEO of f223faa96f22916294922b171a2696d868fd1f9129302eb41a45b2a2ea2ebbfd and is based out of ab5f41f04096cf7cd314357c4be26993eeebc0c094ca668506020017c35b7a9c, cad0535decc38b248b40e7aef9a1cfd91ce386fa5c46f05ea622649e7faf18fb
299+
```
300+
301+
You can choose between the SHA256 (default), SHA3-256, and MD5 hashing algorithms by specifying the `hash_type` parameter.
302+
194303
## Examples
195304

196305
For more detailed examples, check out our Jupyter notebooks in the `examples/` directory:
