Labic-ICMC-USP/multi-annotator-labelstudio-pipeline

Multi-Annotator NER Pipeline with Label Studio Community Edition

This project provides a practical pipeline for Named Entity Recognition (NER) annotation with multiple annotators using Label Studio Community Edition.

The main goal is to help teams set up a shared annotation environment that is easy to reproduce, easy to maintain, and suitable for collaborative data labeling. This is especially useful in research and applied NLP projects where annotation quality, user management, and data persistence are important.

A multi-annotator setup is important because it allows:

  • collaboration among several annotators in the same project
  • better control of annotation workflows
  • later analysis of agreement and annotation quality
  • more reliable dataset construction for NER tasks
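As an illustration of the agreement analysis mentioned above, here is a minimal sketch that computes Cohen's kappa between two annotators over token-level labels. It is not part of this repository. The function name `cohen_kappa` and the sample label sequences are hypothetical, and it assumes both annotators labeled the same tokens in the same order.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (illustrative helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty token list")
    n = len(labels_a)
    # Observed agreement: fraction of tokens where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical token-level labels from two annotators for one sentence.
annotator_1 = ["PER", "O", "O", "LOC", "O", "ORG"]
annotator_2 = ["PER", "O", "O", "O", "O", "ORG"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.727
```

Token-level kappa is only one possible agreement measure. For NER, span-level F1 between annotators is also common, since the frequent "O" label can dominate token-level scores.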

To support this workflow, the project is organized into simple parts.

Project Structure

Why this project

Many annotation tutorials focus only on local or single-user setups. In practice, NER projects often require:

  • more than one annotator
  • persistent storage
  • project organization
  • API access for automation
  • a setup that can be reused for future datasets

This repository was designed to address these needs with a straightforward and reproducible approach.

Expected outcome

By following the tutorial, you will have:

  • a running Label Studio instance
  • PostgreSQL as the backend database
  • persistent storage for annotations and metadata
  • a NER project ready for collaborative annotation
  • a foundation for future automation with the Label Studio API
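To give a rough idea of what API-based automation could look like, the sketch below builds (but does not send) an HTTP request that creates an NER project through the Label Studio REST API. It is not taken from this repository. The base URL, the token placeholder, the project title, the label set, and the helper name `build_create_project_request` are all assumptions. It also assumes token-based authentication with the `Token` scheme, and the scheme your instance expects may differ depending on the Label Studio version and token type.

```python
import json
import urllib.request

# Example NER labeling configuration; the label values are placeholders.
NER_LABEL_CONFIG = """<View>
  <Labels name="label" toName="text">
    <Label value="PER"/>
    <Label value="ORG"/>
    <Label value="LOC"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>"""


def build_create_project_request(base_url, api_token, title):
    """Build a POST request for /api/projects (hypothetical helper)."""
    payload = {"title": title, "label_config": NER_LABEL_CONFIG}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/projects",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed auth scheme; check what your instance expects.
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumed local instance and placeholder token.
request = build_create_project_request(
    "http://localhost:8080", "YOUR_API_TOKEN", "NER multi-annotator"
)
# To actually create the project against a running instance:
# with urllib.request.urlopen(request) as response:
#     project = json.load(response)
```

Building the request separately from sending it keeps the example runnable without a live server and makes the payload easy to inspect before use.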

Audience

This tutorial is useful for:

  • NLP researchers
  • data science teams
  • students building annotation datasets
  • practitioners preparing custom NER corpora

Notes

This project uses Label Studio Community Edition and focuses on a practical self-hosted setup. It is intended as a simple starting point that can later be extended with a reverse proxy, HTTPS, backups, and integration scripts.

Next steps

Start with Part 1, then continue to Part 2.

About

Collaborative annotation pipeline with Label Studio Community Edition, Docker, and PostgreSQL. Designed for multi-annotator NLP workflows, with persistent storage, project setup, and API-ready support for NER dataset creation.
