Google Drive Cleanup Script

A TypeScript script that uses AI (Claude via the Claude Agent SDK, or Gemini) to analyze Google Drive folders, extract company information from documents, and output the results to a CSV file.

Features

Fetches all subfolders from a parent Google Drive folder
Reads documents (Google Docs, text files) within each subfolder
Uses AI (Claude via the Agent SDK, or Gemini) to extract company information:
- Canonical name
- Domain name
- English name
- Chinese name
- Japanese name
- Short summary (1-2 sentences)
With Claude, the script can also use enhanced tool access to:
- Search the web for additional company information
- Fetch company websites for verification
Outputs results to CSV with Google Drive links

Setup

1. Install Dependencies

npm install

2. Set up Google Cloud Project

Go to Google Cloud Console
Create a new project or select an existing one
Enable the Google Drive API:
- Go to "APIs & Services" > "Library"
- Search for "Google Drive API"
- Click "Enable"

3. Create OAuth2 Credentials

Go to "APIs & Services" > "Credentials"
Click "Create Credentials" > "OAuth client ID"
Choose "Desktop app" as the application type
Download the credentials JSON file
Copy the client_id and client_secret from the JSON

4. Choose Your AI Provider

The script supports two AI providers for extraction:

Option A: Claude (Anthropic)

Go to Anthropic Console
Create an account or sign in
Generate an API key
Set AI_PROVIDER=claude in your .env file

Option B: Gemini (Google)

Go to Google AI Studio
Create a Google account or sign in with an existing one
Generate an API key
Set AI_PROVIDER=gemini in your .env file

5. Configure Environment Variables

cp .env.example .env

Edit .env and fill in your credentials:

GOOGLE_CLIENT_ID=your_client_id_here
GOOGLE_CLIENT_SECRET=your_client_secret_here
GOOGLE_REDIRECT_URI=http://localhost:3000/oauth2callback
PARENT_FOLDER_URL=https://drive.google.com/drive/folders/YOUR_FOLDER_ID

# AI Provider: "claude" or "gemini"
AI_PROVIDER=claude

# Claude API Key (if using Claude)
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Gemini API Key (if using Gemini)
GEMINI_API_KEY=your_gemini_api_key_here
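
A quick check of these variables before a run can catch a missing key early. The sketch below is illustrative only: findMissingEnv is a hypothetical helper, not a function from the script, and it assumes the three Google OAuth variables are always required while PARENT_FOLDER_URL is optional because the folder URL can also be passed on the command line.

```typescript
// Illustrative only: findMissingEnv is a hypothetical helper, not part of the script.
type Env = Record<string, string | undefined>;

// Maps each AI_PROVIDER value to the API key variable it needs.
const PROVIDER_KEY = new Map<string, string>([
  ["claude", "ANTHROPIC_API_KEY"],
  ["gemini", "GEMINI_API_KEY"],
]);

// Returns the names of variables that are unset, empty, or invalid.
function findMissingEnv(env: Env): string[] {
  // Assumption: the three Google OAuth variables are always required.
  const required = ["GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_REDIRECT_URI"];
  const problems: string[] = [];

  const providerKey = PROVIDER_KEY.get(env["AI_PROVIDER"] ?? "");
  if (providerKey === undefined) {
    problems.push("AI_PROVIDER"); // must be "claude" or "gemini"
  } else {
    required.push(providerKey); // only the selected provider's key is needed
  }

  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") problems.push(name);
  }
  return problems;
}
```

In a Node.js entry point this could be called with process.env, printing the returned names and exiting if the list is not empty.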

6. Authenticate

Run the authentication setup:

npm run auth

This will:

Open a URL in your browser
Ask you to authorize the application
Save the access token to token.json

Usage

Basic Usage

npm start

Or specify the folder URL and output path:

ts-node google-drive-cleanup.ts "https://drive.google.com/drive/folders/YOUR_FOLDER_ID" ./output.csv
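
The folder URL has to be reduced to a folder ID before the Drive API can be queried. A minimal sketch of that step, assuming URLs of the form shown above; extractFolderId is a hypothetical name and the script's own parsing may differ:

```typescript
// Hypothetical helper: pulls the folder ID out of a Google Drive folder URL.
// Assumes the ID is the path segment that follows "/folders/".
function extractFolderId(folderUrl: string): string | null {
  const match = /\/folders\/([A-Za-z0-9_-]+)/.exec(folderUrl);
  return match?.[1] ?? null;
}
```

Query strings such as ?usp=sharing are ignored because the pattern stops at the first character that cannot appear in an ID.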

Output

The script generates a CSV file with the following columns:

Folder Name: Original folder name in Google Drive
Canonical Name: Extracted official company name
Domain Name: Company website domain
English Name: Company name in English
Chinese Name: Company name in Chinese (if found)
Japanese Name: Company name in Japanese (if found)
Summary: AI-generated 1-2 sentence summary of the company
Google Drive Link: Direct link to the folder
Folder ID: Google Drive folder ID
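
Company names and summaries often contain commas or quotes, so each field needs escaping before it is written. The sketch below shows one common way to do this (RFC 4180 style quoting); CSV_HEADER, escapeCsvField, and toCsvLine are hypothetical names, and the script may rely on a CSV library instead.

```typescript
// Column order taken from the list above; the constant name is hypothetical.
const CSV_HEADER = [
  "Folder Name", "Canonical Name", "Domain Name", "English Name",
  "Chinese Name", "Japanese Name", "Summary", "Google Drive Link", "Folder ID",
];

// Quote a field only when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Join already-ordered fields into one CSV line (no trailing newline).
function toCsvLine(fields: string[]): string {
  return fields.map(escapeCsvField).join(",");
}
```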

How It Works

This script processes company folders in Google Drive to extract and organize business information into a CSV file.

What it does:

Scans all subfolders within a parent Google Drive folder
Filters for files containing "call memo" in the filename
Extracts text from various formats (Google Docs, PDFs, Word, Excel)
Uses AI (Claude or Gemini) to analyze the documents and extract structured company data:
- Canonical company name
- Domain name
- English, Chinese, and Japanese names
- Business summary
Outputs results to a CSV file with links back to the original folders
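
The "call memo" filename filter could be as small as the sketch below. DriveFile and isCallMemo are hypothetical names, and the case-insensitive match is an assumption; the script may match differently.

```typescript
// Minimal shape of a Drive file entry for this example (hypothetical type).
interface DriveFile {
  id: string;
  name: string;
}

// Assumption: the match is case-insensitive, so "Call Memo" and "call memo" both pass.
function isCallMemo(file: DriveFile): boolean {
  return file.name.toLowerCase().includes("call memo");
}
```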

Supported File Types

Google Docs (.gdoc)
Plain text files (.txt)
CSV files (.csv)
Limited support for PDFs (requires additional setup)

Enhanced AI Capabilities

The script supports two AI providers with different capabilities:

Claude (via Agent SDK)

When using AI_PROVIDER=claude, the following tools are enabled:

Read: Deep analysis of document content (PDFs, DOCX, etc.)
WebSearch: Search the web for additional company information
WebFetch: Fetch and analyze company websites for verification

Claude can:

Read and analyze PDF, Word, Excel files directly
Verify company domains by checking their websites
Look up missing information (e.g., find a company's English name if only Chinese is in documents)
Cross-reference information from multiple sources
Provide more accurate and complete company profiles

Gemini (Google AI)

When using AI_PROVIDER=gemini, you get:

Local Text Extraction: Extracts text from PDF, DOCX, Excel files locally on your machine
Fast processing with Gemini 2.0 Flash
Cost-effective API pricing
Good multilingual support (English, Chinese, Japanese)
Strong performance on structured data extraction
Privacy: original files are not uploaded to Gemini; only the locally extracted text is sent

How it works: Files are downloaded from Google Drive, text is extracted locally using specialized libraries (pdf-parse, mammoth, xlsx), and only the extracted text is sent to Gemini for analysis.
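
As a rough illustration of the local extraction step, the sketch below picks an extraction library from the file extension. The library names come from the paragraph above, but the mapping itself and the pickExtractor name are assumptions, not the script's actual code; the calls into each library are left out.

```typescript
// Labels for the extraction route; "plain-text" means the file is read as-is.
type Extractor = "pdf-parse" | "mammoth" | "xlsx" | "plain-text";

// Hypothetical dispatcher: returns which library would handle the file,
// or null when the extension is not recognized.
function pickExtractor(fileName: string): Extractor | null {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  switch (ext) {
    case "pdf":
      return "pdf-parse";
    case "docx":
      return "mammoth";
    case "xlsx":
    case "xls":
      return "xlsx";
    case "txt":
    case "csv":
      return "plain-text";
    default:
      return null;
  }
}
```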

Switching AI Providers

To switch between Claude and Gemini:

Open your .env file
Change AI_PROVIDER=claude to AI_PROVIDER=gemini (or vice versa)
Ensure the corresponding API key is set (ANTHROPIC_API_KEY or GEMINI_API_KEY)
Run the script

Which provider should I use?

Use Claude if:
- You want web search and website verification capabilities
- You need the highest quality extraction from very complex documents
- You want Claude to intelligently search for missing company information

Use Gemini if:
- You want faster processing (Gemini 2.0 Flash is very fast)
- You want lower API costs
- You need good multilingual support
- You want a simpler, more cost-effective solution

Both providers:

Extract text locally from PDFs, DOCX, and Excel files
Process files without uploading the original files to external services (the extracted text is still sent to the AI provider)
Support multilingual company information extraction

Resume After Failures

The script automatically saves progress after processing each folder:

Progress is saved to progress.json
Results are appended to CSV immediately after each folder
Full logs are written to processing.log

If the script crashes or is interrupted:

Simply run npm start again
It will skip already processed folders automatically
Check processing.log to see what happened
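
The resume behavior can be pictured as a small set of pure functions. This is a sketch under stated assumptions: the actual layout of progress.json is not documented here, so the processedFolderIds field and all function names below are hypothetical. Reading and writing the file are left out to keep the example self-contained.

```typescript
// Assumed shape of progress.json; the real file may differ.
interface Progress {
  processedFolderIds: string[];
}

// Parse the file contents, falling back to empty progress when the file
// is missing (null) or unreadable.
function parseProgress(json: string | null): Progress {
  if (json === null) return { processedFolderIds: [] };
  try {
    const data = JSON.parse(json) as Partial<Progress> | null;
    const ids = data?.processedFolderIds;
    if (Array.isArray(ids)) {
      return { processedFolderIds: ids.filter((id) => typeof id === "string") };
    }
  } catch {
    // Corrupt file: fall through and start fresh.
  }
  return { processedFolderIds: [] };
}

// Record one finished folder without mutating the previous state.
function markProcessed(progress: Progress, folderId: string): Progress {
  if (progress.processedFolderIds.indexOf(folderId) !== -1) return progress;
  return { processedFolderIds: [...progress.processedFolderIds, folderId] };
}

// Folders that still need processing, in their original order.
function pendingFolders(allFolderIds: string[], progress: Progress): string[] {
  const done = new Set(progress.processedFolderIds);
  return allFolderIds.filter((id) => !done.has(id));
}
```

Saving the result of markProcessed after every folder is what makes an interrupted run cheap to resume: at most one folder's work is repeated.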

To start completely fresh:

rm progress.json company_info.csv processing.log
npm start

Troubleshooting

"Invalid credentials" error

Make sure your .env file has the correct GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET
Delete token.json and run npm run auth again

"Insufficient permissions" error

Make sure the Google Drive API is enabled in your Google Cloud project
Check that you've authorized the correct Google account

API key errors

"ANTHROPIC_API_KEY is required": Make sure you've set ANTHROPIC_API_KEY in your .env file when using AI_PROVIDER=claude
- Get an API key from Anthropic Console
"GEMINI_API_KEY is required": Make sure you've set GEMINI_API_KEY in your .env file when using AI_PROVIDER=gemini
- Get an API key from Google AI Studio

No data extracted

Check that your documents contain company information
Verify that files are readable (Google Docs, text files)
Check the console output for specific error messages

Development

Build

npm run build

Run with ts-node

ts-node google-drive-cleanup.ts

License

MIT
