|
76 | 76 | # COMMAND ----------
|
77 | 77 |
|
78 | 78 | # MAGIC %md
|
79 |
| -# MAGIC ## Fetch seed data |
| 79 | +# MAGIC ## Create seed data |
80 | 80 | # MAGIC
|
81 |
| -# MAGIC Next we'll load a demo dataset into a Spark table so you can see how to easily load assets into Labelbox via URL. For simplicity, you can get a Dataset ID from Labelbox and we'll load those URLs into a Spark table for you (so you don't need to worry about finding data to get this demo notebook to run). Below we'll grab the "Example Nature Dataset" included in Labelbox trials. |
| 81 | +# MAGIC Next we'll load a demo dataset into a Spark table so you can see how to easily load assets into Labelbox via URLs with the Labelbox Connector for Databricks. |
82 | 82 | # MAGIC
|
83 | 83 | # MAGIC Also, Labelbox has native support for AWS, Azure, and GCP cloud storage. You can connect Labelbox to your storage via [Delegated Access](https://docs.labelbox.com/docs/iam-delegated-access) and easily load those assets for annotation. For more information, you can watch this [video](https://youtu.be/wlWo6EmPDV4).
|
| 84 | +# MAGIC |
| 85 | +# MAGIC You can also add data to Labelbox [using the Labelbox SDK directly](https://docs.labelbox.com/docs/datasets-datarows). We recommend using the SDK if you have complicated dataset creation requirements (e.g. including metadata with your dataset) which aren't handled by the Labelbox Connector for Databricks. |
84 | 86 |
|
85 | 87 | # COMMAND ----------
|
86 | 88 |
|
87 |
| -sample_dataset = next( |
88 |
| - client.get_datasets(where=(Dataset.name == "Example Nature Dataset"))) |
89 |
| -sample_dataset.uid |
| 89 | +sample_dataset_dict = { |
| 90 | + "external_id": [ |
| 91 | + "sample1.jpg", "sample2.jpg", "sample3.jpg", "sample4.jpg", |
| 92 | + "sample5.jpg", "sample6.jpg", "sample7.jpg", "sample8.jpg", |
| 93 | + "sample9.jpg", "sample10.jpg" |
| 94 | + ], |
| 95 | + "row_data": [ |
| 96 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000247422.jpg", |
| 97 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000484849.jpg", |
| 98 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000215782.jpg", |
| 99 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_val2014_000000312024.jpg", |
| 100 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000486139.jpg", |
| 101 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000302713.jpg", |
| 102 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000523272.jpg", |
| 103 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000094514.jpg", |
| 104 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_val2014_000000050578.jpg", |
| 105 | + "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000073727.jpg" |
| 106 | + ] |
| 107 | +} |
| 108 | + |
| 109 | +df = pd.DataFrame.from_dict(sample_dataset_dict).to_spark( |
| 110 | +) #produces our demo Spark table of datarows for Labelbox |
90 | 111 |
|
91 | 112 | # COMMAND ----------
|
92 | 113 |
|
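Note on the seed-data cell above: the `external_id` and `row_data` lists are paired by position, so they should contain the same number of entries before the dictionary is turned into a DataFrame. The sketch below is a plain-Python illustration of such a check and is not part of the commit. The helper name `check_columns_aligned` and the shortened two-row dictionary are hypothetical.

```python
def check_columns_aligned(columns):
    """Return the shared row count, or raise if the column lists differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    # An empty dictionary has no columns, so report zero rows.
    return next(iter(lengths.values()), 0)


# Shortened stand-in for sample_dataset_dict (two of the ten rows in the diff).
sample = {
    "external_id": ["sample1.jpg", "sample2.jpg"],
    "row_data": [
        "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000247422.jpg",
        "https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000484849.jpg",
    ],
}

row_count = check_columns_aligned(sample)
print(row_count)
```

Running a check like this before the DataFrame is built points at the mismatched column by name instead of failing later in the pipeline.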
|
96 | 117 | tblList = spark.catalog.listTables()
|
97 | 118 |
|
98 | 119 | if not any([table.name == SAMPLE_TABLE for table in tblList]):
|
99 |
| - |
100 |
| - df = pd.DataFrame([{ |
101 |
| - "external_id": dr.external_id, |
102 |
| - "row_data": dr.row_data |
103 |
| - } for dr in sample_dataset.data_rows()]).to_spark() |
104 |
| - df.registerTempTable(SAMPLE_TABLE) |
| 120 | + df.createOrReplaceTempView(SAMPLE_TABLE) |
105 | 121 | print(f"Registered table: {SAMPLE_TABLE}")
|
106 | 122 |
|
107 | 123 | # COMMAND ----------
|
108 | 124 |
|
109 | 125 | # MAGIC %md
|
110 |
| -# MAGIC You should now have a temporary table "sample_unstructured_data" which includes the file names and URLs for some demo images. We're going to share this table with Labelbox using the Labelbox Connector for Databricks! |
| 126 | +# MAGIC You should now have a temporary table "sample_unstructured_data" which includes the file names and URLs for some demo images. We're going to use this table with Labelbox using the Labelbox Connector for Databricks! |
111 | 127 |
|
112 | 128 | # COMMAND ----------
|
113 | 129 |
|
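Note on the guard in the hunk above: the view is registered only when no listed table already carries the `SAMPLE_TABLE` name. The sketch below reproduces that check without a Spark session, using `SimpleNamespace` objects as stand-ins for the entries returned by `spark.catalog.listTables()`. The helper name `table_exists` is hypothetical, and the value of `SAMPLE_TABLE` is assumed from the table name quoted in the notebook text.

```python
from types import SimpleNamespace

# Assumed value, based on the table name quoted in the notebook's markdown cell.
SAMPLE_TABLE = "sample_unstructured_data"


def table_exists(tables, name):
    """Return True if any catalog entry has the given name."""
    return any(table.name == name for table in tables)


# Stand-ins for catalog entries; only the .name attribute is used here.
empty_catalog = []
populated_catalog = [
    SimpleNamespace(name="other_table"),
    SimpleNamespace(name=SAMPLE_TABLE),
]

print(table_exists(empty_catalog, SAMPLE_TABLE))
print(table_exists(populated_catalog, SAMPLE_TABLE))
```

Passing a generator expression to `any` stops at the first match and avoids building the intermediate list used in the notebook's version; the result is the same.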
|
167 | 183 | ontology = OntologyBuilder()
|
168 | 184 |
|
169 | 185 | tools = [
|
170 |
| - Tool(tool=Tool.Type.BBOX, name="Frog"), |
| 186 | + Tool(tool=Tool.Type.BBOX, name="Car"), |
171 | 187 | Tool(tool=Tool.Type.BBOX, name="Flower"),
|
172 | 188 | Tool(tool=Tool.Type.BBOX, name="Fruit"),
|
173 | 189 | Tool(tool=Tool.Type.BBOX, name="Plant"),
|
174 | 190 | Tool(tool=Tool.Type.SEGMENTATION, name="Bird"),
|
175 | 191 | Tool(tool=Tool.Type.SEGMENTATION, name="Person"),
|
176 |
| - Tool(tool=Tool.Type.SEGMENTATION, name="Sleep"), |
177 |
| - Tool(tool=Tool.Type.SEGMENTATION, name="Yak"), |
| 192 | + Tool(tool=Tool.Type.SEGMENTATION, name="Dog"), |
178 | 193 | Tool(tool=Tool.Type.SEGMENTATION, name="Gemstone"),
|
179 | 194 | ]
|
180 | 195 | for tool in tools:
|
|
223 | 238 | # COMMAND ----------
|
224 | 239 |
|
225 | 240 | labels_table = labelspark.get_annotations(client, project_demo.uid, spark, sc)
|
226 |
| -labels_table.registerTempTable(LABEL_TABLE) |
| 241 | +labels_table.createOrReplaceTempView(LABEL_TABLE) |
227 | 242 | display(labels_table)
|
228 | 243 |
|
229 | 244 | # COMMAND ----------
|
230 | 245 |
|
231 | 246 | # MAGIC %md
|
232 | 247 | # MAGIC ## Other features of Labelbox
|
233 | 248 | # MAGIC
|
234 |
| -# MAGIC <h3> [Model Assisted Labeling](https://docs.labelbox.com/docs/model-assisted-labeling) </h3> |
235 |
| -# MAGIC Once you train a model on your initial set of unstructured data, you can plug that model into Labelbox to support a Model Assisted Labeling workflow. Review the outputs of your model, make corrections, and retrain with ease! You can reduce future labeling costs by >50% by leveraging model assisted labeling. |
| 249 | +# MAGIC [Model Assisted Labeling](https://docs.labelbox.com/docs/model-assisted-labeling) |
| 250 | +# MAGIC <br>Once you train a model on your initial set of unstructured data, you can plug that model into Labelbox to support a Model Assisted Labeling workflow. Review the outputs of your model, make corrections, and retrain with ease! You can reduce future labeling costs by >50% by leveraging model assisted labeling. |
236 | 251 | # MAGIC
|
237 | 252 | # MAGIC <img src="https://files.readme.io/4c65e12-model-assisted-labeling.png" alt="MAL" width="800"/>
|
238 | 253 | # MAGIC
|
239 |
| -# MAGIC <h3> [Catalog](https://docs.labelbox.com/docs/catalog) </h3> |
240 |
| -# MAGIC Once you've created datasets and annotations in Labelbox, you can easily browse your datasets and curate new ones in Catalog. Use your model embeddings to find images by similarity search. |
| 254 | +# MAGIC [Catalog](https://docs.labelbox.com/docs/catalog) |
| 255 | +# MAGIC <br>Once you've created datasets and annotations in Labelbox, you can easily browse your datasets and curate new ones in Catalog. Use your model embeddings to find images by similarity search. |
241 | 256 | # MAGIC
|
242 | 257 | # MAGIC <img src="https://files.readme.io/14f82d4-catalog-marketing.jpg" alt="Catalog" width="800"/>
|
243 | 258 | # MAGIC
|
244 |
| -# MAGIC <h3> [Model Diagnostics](https://labelbox.com/product/model-diagnostics) </h3> |
245 |
| -# MAGIC Labelbox complements your MLFlow experiment tracking with the ability to easily visualize experiment predictions at scale. Model Diagnostics helps you quickly identify areas where your model is weak so you can collect the right data and refine the next model iteration. |
| 259 | +# MAGIC [Model Diagnostics](https://labelbox.com/product/model-diagnostics) |
| 260 | +# MAGIC <br>Labelbox complements your MLFlow experiment tracking with the ability to easily visualize experiment predictions at scale. Model Diagnostics helps you quickly identify areas where your model is weak so you can collect the right data and refine the next model iteration. |
246 | 261 | # MAGIC
|
247 | 262 | # MAGIC <img src="https://images.ctfassets.net/j20krz61k3rk/4LfIELIjpN6cou4uoFptka/20cbdc38cc075b82f126c2c733fb7082/identify-patterns-in-your-model-behavior.png" alt="Diagnostics" width="800"/>
|
248 | 263 |
|
|
255 | 270 | # MAGIC * Check out our [notebook examples](https://github.com/Labelbox/labelspark/tree/master/notebooks) to follow along with interactive tutorials
|
256 | 271 | # MAGIC * View our [API reference](https://labelbox.com/docs/python-api/api-reference).
|
257 | 272 | # MAGIC
|
258 |
| -# MAGIC <h4>Questions or comments? Reach out to us at [ecosystem+databricks@labelbox.com](mailto:ecosystem+databricks@labelbox.com) |
| 273 | +# MAGIC <b>Questions or comments? Reach out to us at [ecosystem+databricks@labelbox.com](mailto:ecosystem+databricks@labelbox.com) |
259 | 274 |
|
260 | 275 | # COMMAND ----------
|
261 | 276 |
|
262 | 277 | # MAGIC %md
|
263 |
| -# MAGIC Copyright Labelbox, Inc. 2021. The source in this notebook is provided subject to the [Labelbox Terms of Service](https://docs.labelbox.com/page/terms-of-service). All included or referenced third party libraries are subject to the licenses set forth below. |
| 278 | +# MAGIC Copyright Labelbox, Inc. 2022. The source in this notebook is provided subject to the [Labelbox Terms of Service](https://docs.labelbox.com/page/terms-of-service). All included or referenced third party libraries are subject to the licenses set forth below. |
264 | 279 | # MAGIC
|
265 | 280 | # MAGIC |Library Name|Library license | Library License URL | Library Source URL |
|
266 | 281 | # MAGIC |---|---|---|---|
|
|