A TypeScript script that uses the Claude Agent SDK to analyze Google Drive folders and extract company information from documents, outputting the results to a CSV file.
- Fetches all subfolders from a parent Google Drive folder
- Reads documents (Google Docs, text files) within each subfolder
- Uses Claude Agent SDK with enhanced tool access to extract company information:
  - Canonical name
  - Domain name
  - English name
  - Chinese name
  - Japanese name
  - Short summary (1-2 sentences)
- Can search the web for additional company information
- Can fetch company websites for verification
- Outputs results to CSV with Google Drive links
npm install
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Google Drive API:
  - Go to "APIs & Services" > "Library"
  - Search for "Google Drive API"
  - Click "Enable"
- Go to "APIs & Services" > "Credentials"
- Click "Create Credentials" > "OAuth client ID"
- Choose "Desktop app" as the application type
- Download the credentials JSON file
- Copy the client_id and client_secret from the JSON
The script supports two AI providers for extraction:
- Go to Anthropic Console
- Create an account or sign in
- Generate an API key
- Set AI_PROVIDER=claude in your .env file
- Go to Google AI Studio
- Create a Google account or sign in with an existing one
- Generate an API key
- Set AI_PROVIDER=gemini in your .env file
cp .env.example .env
Edit .env and fill in your credentials:
GOOGLE_CLIENT_ID=your_client_id_here
GOOGLE_CLIENT_SECRET=your_client_secret_here
GOOGLE_REDIRECT_URI=http://localhost:3000/oauth2callback
PARENT_FOLDER_URL=https://drive.google.com/drive/folders/YOUR_FOLDER_ID
# AI Provider: "claude" or "gemini"
AI_PROVIDER=claude
# Claude API Key (if using Claude)
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# Gemini API Key (if using Gemini)
GEMINI_API_KEY=your_gemini_api_key_here
Run the authentication setup:
npm run auth
This will:
- Open a URL in your browser
- Ask you to authorize the application
- Save the access token to token.json
npm start
Or specify the folder URL and output path:
ts-node google-drive-cleanup.ts "https://drive.google.com/drive/folders/YOUR_FOLDER_ID" ./output.csv
The script generates a CSV file with the following columns:
- Folder Name: Original folder name in Google Drive
- Canonical Name: Extracted official company name
- Domain Name: Company website domain
- English Name: Company name in English
- Chinese Name: Company name in Chinese (if found)
- Japanese Name: Company name in Japanese (if found)
- Summary: AI-generated 1-2 sentence summary of the company
- Google Drive Link: Direct link to the folder
- Folder ID: Google Drive folder ID
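As a rough illustration of the column layout above, the sketch below serializes one result into a CSV line. The `CompanyRow` interface and the `escapeCsv`, `folderLink`, and `toCsvLine` helpers are hypothetical names, not taken from the script. The link format is assumed to follow the PARENT_FOLDER_URL pattern, and the quoting follows common CSV practice (RFC 4180).

```typescript
// Hypothetical row shape mirroring the CSV columns listed above.
interface CompanyRow {
  folderName: string;
  canonicalName: string;
  domainName: string;
  englishName: string;
  chineseName: string; // empty string if not found
  japaneseName: string; // empty string if not found
  summary: string;
  folderId: string;
}

const CSV_HEADER: string[] = [
  "Folder Name", "Canonical Name", "Domain Name", "English Name",
  "Chinese Name", "Japanese Name", "Summary", "Google Drive Link", "Folder ID",
];

// Quote a field when it contains a comma, quote, or line break; double any quotes.
function escapeCsv(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build the folder link from its ID (assumed to match the PARENT_FOLDER_URL pattern).
function folderLink(folderId: string): string {
  return `https://drive.google.com/drive/folders/${folderId}`;
}

// Serialize one row in the same order as CSV_HEADER.
function toCsvLine(row: CompanyRow): string {
  return [
    row.folderName, row.canonicalName, row.domainName, row.englishName,
    row.chineseName, row.japaneseName, row.summary,
    folderLink(row.folderId), row.folderId,
  ].map(escapeCsv).join(",");
}
```

Escaping matters here because folder names and AI-generated summaries often contain commas or quotes, which would otherwise shift columns.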
This script processes company folders in Google Drive to extract and organize business information into a CSV file.
What it does:
- Scans all subfolders within a parent Google Drive folder
- Filters for files containing "call memo" in the filename
- Extracts text from various formats (Google Docs, PDFs, Word, Excel)
- Uses AI (Claude or Gemini) to analyze the documents and extract structured company data:
  - Canonical company name
  - Domain name
  - English, Chinese, and Japanese names
  - Business summary
- Outputs results to a CSV file with links back to the original folders
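The first two steps above (locating the parent folder and filtering for call memos) could look something like the following. This is a minimal sketch: `DriveFile`, `extractFolderId`, `isCallMemo`, and `selectCallMemos` are hypothetical names, and the case-insensitive match is an assumption, since the script's exact matching rule is not described here.

```typescript
// Minimal file descriptor; the real Drive API returns more fields.
interface DriveFile {
  id: string;
  name: string;
}

// Pull the folder ID out of a URL such as
// https://drive.google.com/drive/folders/<ID>, or return null if none is found.
function extractFolderId(url: string): string | null {
  const match = url.match(/\/folders\/([A-Za-z0-9_-]+)/);
  return match?.[1] ?? null;
}

// Assumption: the "call memo" match ignores letter case.
function isCallMemo(fileName: string): boolean {
  return fileName.toLowerCase().includes("call memo");
}

// Keep only the files whose names look like call memos.
function selectCallMemos(files: DriveFile[]): DriveFile[] {
  return files.filter((file) => isCallMemo(file.name));
}
```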
Supported file types:
- Google Docs (.gdoc)
- Plain text files (.txt)
- CSV files (.csv)
- Limited support for PDFs (requires additional setup)
The script supports two AI providers with different capabilities:
When using AI_PROVIDER=claude, the following tools are enabled:
- Read: Deep analysis of document content (PDFs, DOCX, etc.)
- WebSearch: Search the web for additional company information
- WebFetch: Fetch and analyze company websites for verification
Claude can:
- Read and analyze PDF, Word, Excel files directly
- Verify company domains by checking their websites
- Look up missing information (e.g., find a company's English name if only Chinese is in documents)
- Cross-reference information from multiple sources
- Provide more accurate and complete company profiles
When using AI_PROVIDER=gemini, you get:
- Local Text Extraction: Extracts text from PDF, DOCX, Excel files locally on your machine
- Fast processing with Gemini 2.0 Flash
- Cost-effective API pricing
- Good multilingual support (English, Chinese, Japanese)
- Strong performance on structured data extraction
- Privacy: the original files stay on your machine; only the extracted text is sent to Gemini
How it works: Files are downloaded from Google Drive, text is extracted locally using specialized libraries (pdf-parse, mammoth, xlsx), and only the extracted text is sent to Gemini for analysis.
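The local extraction step can be pictured as a dispatch on file extension. The sketch below is a simplified illustration, not the script's code: `Extractor`, `ExtractorRegistry`, `extensionOf`, and `extractText` are hypothetical names, and the extractors are passed in rather than calling pdf-parse, mammoth, or xlsx directly so that the example stays self-contained.

```typescript
// An extractor turns raw file bytes into plain text.
type Extractor = (data: Uint8Array) => Promise<string>;

// Hypothetical registry keyed by lowercase extension ("pdf", "docx", "xlsx", ...).
// In practice each entry would wrap a library such as pdf-parse, mammoth, or xlsx.
type ExtractorRegistry = Record<string, Extractor | undefined>;

// Lowercase extension without the dot, or "" when the name has none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
}

// Returns the extracted text, or null when no extractor handles the file type.
// Only this text, never the original bytes, would be passed on to the model.
async function extractText(
  fileName: string,
  data: Uint8Array,
  registry: ExtractorRegistry,
): Promise<string | null> {
  const extractor = registry[extensionOf(fileName)];
  return extractor ? extractor(data) : null;
}
```

Returning null for unknown types lets the caller log and skip unsupported files instead of aborting the whole folder.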
To switch between Claude and Gemini:
- Open your .env file
- Change AI_PROVIDER=claude to AI_PROVIDER=gemini (or vice versa)
- Ensure the corresponding API key is set (ANTHROPIC_API_KEY or GEMINI_API_KEY)
- Run the script
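The provider check implied by these steps might be sketched as follows. `Provider`, `ProviderConfig`, `isProvider`, and `resolveProvider` are hypothetical names, and defaulting to claude when AI_PROVIDER is unset is an assumption based on the sample .env; the error wording follows the messages listed in the troubleshooting notes.

```typescript
type Provider = "claude" | "gemini";

interface ProviderConfig {
  provider: Provider;
  apiKey: string;
}

// Type guard for the two supported provider names.
function isProvider(value: string): value is Provider {
  return value === "claude" || value === "gemini";
}

// Validate AI_PROVIDER and pick the matching API key from an env-like object.
function resolveProvider(env: Record<string, string | undefined>): ProviderConfig {
  // Assumption: fall back to "claude" when AI_PROVIDER is not set.
  const provider = (env["AI_PROVIDER"] ?? "claude").trim().toLowerCase();
  if (!isProvider(provider)) {
    throw new Error(`AI_PROVIDER must be "claude" or "gemini", got "${provider}"`);
  }
  const keyName = provider === "claude" ? "ANTHROPIC_API_KEY" : "GEMINI_API_KEY";
  const apiKey = env[keyName];
  if (!apiKey) {
    throw new Error(`${keyName} is required`);
  }
  return { provider, apiKey };
}
```

In a Node script this would typically be called as `resolveProvider(process.env)` at startup, so that a missing key fails fast before any Drive requests are made.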
Which provider should I use?
- Use Claude if:
  - You want web search and website verification capabilities
  - You need the highest quality extraction from very complex documents
  - You want Claude to intelligently search for missing company information
- Use Gemini if:
  - You want faster processing (Gemini 2.0 Flash is very fast)
  - You want lower API costs
  - You need good multilingual support
  - You want a simpler, more cost-effective solution
Both providers:
- Extract text locally from PDFs, DOCX, and Excel files (the original files stay on your machine)
- Process files without uploading to external services
- Support multilingual company information extraction
The script automatically saves progress after processing each folder:
- Progress is saved to progress.json
- Results are appended to CSV immediately after each folder
- Full logs are written to processing.log
If the script crashes or is interrupted:
- Simply run npm start again
- It will skip already processed folders automatically
- Check processing.log to see what happened
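The save-and-resume behaviour described above can be sketched with a few pure functions. The `Progress` shape (a list of processed folder IDs) is an assumption, since the actual schema of progress.json is not documented here, and `parseProgress`, `pendingFolders`, and `markProcessed` are hypothetical names.

```typescript
// Assumed shape of progress.json; the script's real schema may differ.
interface Progress {
  processedFolderIds: string[];
}

// Parse saved progress, falling back to an empty state for a missing or corrupt file.
function parseProgress(json: string | null): Progress {
  if (json === null) return { processedFolderIds: [] };
  try {
    const data = JSON.parse(json) as Partial<Progress>;
    return {
      processedFolderIds: Array.isArray(data.processedFolderIds)
        ? data.processedFolderIds
        : [],
    };
  } catch {
    return { processedFolderIds: [] };
  }
}

// Folders that still need processing, in their original order.
function pendingFolders(allFolderIds: string[], progress: Progress): string[] {
  const done = new Set(progress.processedFolderIds);
  return allFolderIds.filter((id) => !done.has(id));
}

// Record a finished folder without mutating the previous state.
function markProcessed(progress: Progress, folderId: string): Progress {
  if (progress.processedFolderIds.includes(folderId)) return progress;
  return { processedFolderIds: [...progress.processedFolderIds, folderId] };
}
```

In a real run, the JSON string would be read from and written back to progress.json after each folder (for example with `fs.readFileSync` and `fs.writeFileSync`), so an interrupted run loses at most the folder that was in flight.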
To start completely fresh:
rm progress.json company_info.csv processing.log
npm start
- Make sure your .env file has the correct GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET
- Delete token.json and run npm run auth again
- Make sure the Google Drive API is enabled in your Google Cloud project
- Check that you've authorized the correct Google account
- "ANTHROPIC_API_KEY is required": Make sure you've set ANTHROPIC_API_KEY in your .env file when using AI_PROVIDER=claude
  - Get an API key from Anthropic Console
- "GEMINI_API_KEY is required": Make sure you've set GEMINI_API_KEY in your .env file when using AI_PROVIDER=gemini
  - Get an API key from Google AI Studio
- Check that your documents contain company information
- Verify that files are readable (Google Docs, text files)
- Check the console output for specific error messages
npm run build
ts-node google-drive-cleanup.ts
License: MIT