Merged
17 commits
7645030
refactor: remove Gemini API-related content
hjham0856 Nov 11, 2025
c2dbe25
chore: add boto3 dependency for calling Bedrock
hjham0856 Nov 12, 2025
5271432
feat: use Bedrock's Claude 3.5 Haiku model to provide terms-of-service summaries
hjham0856 Nov 12, 2025
d93a956
feat: use Bedrock's Claude 3.5 Haiku model to provide terms-of-service clause evaluations
hjham0856 Nov 12, 2025
c869bb0
fix: use the updated tos_summarize and tos_evaluate
hjham0856 Nov 12, 2025
0e112fd
Merge pull request #12 from TermLens/feature/9-migrate-to-bedrock
hjham0856 Nov 12, 2025
93db059
Merge pull request #13 from TermLens/main
hjham0856 Nov 12, 2025
5daec2e
fix: pass the system instruction correctly in tos_summarize and tos_evaluate
hjham0856 Nov 12, 2025
b8bf9b4
Merge pull request #14 from TermLens/feature/9-migrate-to-bedrock
hjham0856 Nov 12, 2025
8873ce0
fix: specify the format more precisely so Claude does not add unnecessary introductions or conclusions
hjham0856 Nov 12, 2025
adf87f7
Merge pull request #15 from TermLens/feature/9-migrate-to-bedrock
hjham0856 Nov 12, 2025
3f4c58b
feat: switch the Claude model from 3.5 Haiku to Sonnet 4.5
hjham0856 Nov 12, 2025
76bddf3
Merge pull request #16 from TermLens/feature/9-migrate-to-bedrock
hjham0856 Nov 12, 2025
0da4938
fix: improve the prompt so Claude follows the format more reliably
hjham0856 Nov 12, 2025
d11a59a
Merge pull request #17 from TermLens/feature/9-migrate-to-bedrock
hjham0856 Nov 12, 2025
2d96096
fix: slice Claude's response from the opening brace to the closing brace before use
hjham0856 Nov 12, 2025
64b6eee
Merge pull request #18 from TermLens/feature/9-migrate-to-bedrock
hjham0856 Nov 12, 2025
2 changes: 0 additions & 2 deletions README.md
@@ -3,7 +3,6 @@

 # Architecture
 AWS Lambda
-Gemini

 # Getting Started
 ## Setting up the development environment
@@ -13,7 +12,6 @@ Written for bash
 sudo apt install python3.12-venv
 python3 -m venv venv # create the virtual environment
 source venv/bin/activate # activate the virtual environment
-pip install -q -U google-genai
 ```

 From then on, run `source venv/bin/activate` to activate the virtual environment before working
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1 +1 @@
-google-genai
+boto3
10 changes: 2 additions & 8 deletions src/lambda_function.py
@@ -1,14 +1,9 @@
 import json
-import os
-from google import genai

 from tos_summarize import tos_summarize
 from tos_evaluate import tos_evaluate

 def lambda_handler(event, context):
-    GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')
-    client = genai.Client(api_key=GEMINI_API_KEY)
-
     # When url is missing or is an empty string
     if ('queryStringParameters' not in event
         or 'url' not in event['queryStringParameters']
@@ -36,11 +31,10 @@ def lambda_handler(event, context):
     # TODO: implement caching logic based on previously seen URLs

     # Summarize the terms from the text_html string, focusing on the important clauses
-    summarized_tos = tos_summarize(text_html, client)
+    summarized_tos = tos_summarize(text_html)

     # Analyze the clauses of the terms
-    # Because of the Gemini API rate limit, process clauses one at a time instead of sending several at once
-    evaluation_result = tos_evaluate(summarized_tos, client)
+    evaluation_result = tos_evaluate(summarized_tos)

     return {
         'statusCode': 200,
8 changes: 0 additions & 8 deletions src/test_local.py
@@ -1,4 +1,3 @@
-import os

 from lambda_function import lambda_handler

@@ -15,13 +14,6 @@ def create_test_event(url: str, body: str) -> dict:
 def run_test():
     """Run the Lambda function in a local test"""

-    # Check GEMINI_API_KEY
-    api_key = os.getenv('GEMINI_API_KEY')
-    if not api_key:
-        print("The GEMINI_API_KEY environment variable is not set.")
-        print()
-        return
-
     # Test data
     test_url = "https://example.com/terms"
     test_body = """
109 changes: 66 additions & 43 deletions src/tos_evaluate.py
@@ -1,48 +1,71 @@
-import enum
 import json
-from google import genai
-from google.genai import types
+import boto3

-def tos_evaluate(summarized_tos, client):
-    response = client.models.generate_content(
-        model="gemini-2.5-flash-lite",
-        config=types.GenerateContentConfig(
-            system_instruction=
-            """
-You are an expert in terms-of-service analysis. Evaluate the given terms and each of their clauses.
-Rate each clause as one of good, neutral, or bad.
-'good' means a clause favorable to the user, 'neutral' a neutral clause, and 'bad' a clause unfavorable to the user.
-Rate the terms as a whole with one of the grades A, B, C, D, or E.
-A means very good terms, and E means very unfavorable terms.
-Respond in Korean.
-""",
-            response_mime_type="application/json",
-            response_schema={
-                "type": "object",
-                "properties": {
-                    "overall_evaluation": {
-                        "type": "string"
-                    },
-                    "evaluation_for_each_clause": {
-                        "type": "array",
-                        "items": {
-                            "type": "object",
-                            "properties": {
-                                "evaluation": {
-                                    "type": "string"
-                                },
-                                "summarized_clause": {
-                                    "type": "string"
-                                }
-                            }
-                        }
-                    }
-                }
-            }
-        ),
-        contents=summarized_tos,
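+def tos_evaluate(summarized_tos):
+    system_instruction=[{"text": """
+You are a professional terms-of-service analysis AI. Evaluate the given terms and each of their clauses.
+The given terms are a summary focused on the main clauses.
+Respond in JSON, using the following keys.
+"overall_evaluation": "A|B|C|D|E",
+"evaluation_for_each_clause": [
+"evaluation": "good|neutral|bad",
+"summarized_clause": "summary of the clause"
+]
+"overall_evaluation" is the grade of the terms as a whole. A is the best terms, and E is the most unfavorable.
+"evaluation_for_each_clause" is a list containing the evaluation of each clause.
+"evaluation" indicates whether each clause is favorable (good), neutral (neutral), or unfavorable (bad) to the consumer.
+"summarized_clause" contains the summary of each clause.
+Never include anything other than the JSON, such as an introduction, a conclusion, or a code block.
+The response is parsed directly with json.loads(), so it must start with an opening brace (`{`) and end with a closing brace (`}`).
+Example response:
+{
+"overall_evaluation": "D",
+"evaluation_for_each_clause": [
+{
+"evaluation": "neutral",
+"summarized_clause": "States that copyright in AWS site content belongs to AWS or its providers and is protected by the relevant laws."
+},
+{
+"evaluation": "neutral",
+"summarized_clause": "States that AWS trademarks and trade dress may not be used without permission and that third-party trademarks belong to their respective owners."
+},
+{
+"evaluation": "bad",
+"summarized_clause": "States that commercial resale, copying, modification, and similar uses beyond personal use of the site are prohibited without prior written consent; this is common, but it imposes a clear restriction."
+},
+{
+"evaluation": "bad",
+"summarized_clause": "States that the user is responsible for managing their account and password and assigns responsibility for account activity to the user. AWS also has the unilateral right to refuse service and terminate accounts."
+}
+]
+}
+"""}]
+    client = boto3.client(
+        service_name="bedrock-runtime",
+        region_name="us-west-2"
+    )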
Comment on lines +43 to +46

Copilot AI Nov 12, 2025

The boto3 client is created on every call to tos_evaluate. In AWS Lambda, clients are usually initialized at module level, outside the handler path, so that warm invocations reuse the same client and its connections instead of paying the setup cost on each request. Consider moving the client creation outside the tos_evaluate function or making it a module-level singleton.
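One way to follow the suggestion above is a cached factory, sketched here under assumptions: `get_bedrock_client` is a hypothetical helper name that does not appear in the PR, and the region default is copied from the diff. The `boto3.client(...)` call is the same one the diff already uses.

```python
import functools


@functools.lru_cache(maxsize=None)
def get_bedrock_client(region_name: str = "us-west-2"):
    """Return one shared bedrock-runtime client per region (hypothetical helper).

    boto3 is imported lazily so that importing this module stays cheap; the
    first call creates the client and later calls reuse the cached object.
    """
    import boto3

    return boto3.client(service_name="bedrock-runtime", region_name=region_name)
```

Both `tos_summarize` and `tos_evaluate` could then call `get_bedrock_client()` instead of constructing their own client, so a warm Lambda container would keep a single client between invocations.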

+    model_id = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"

Copilot AI Nov 12, 2025

The model ID is duplicated in both tos_summarize.py and tos_evaluate.py. Consider extracting this to a shared constant or configuration file to maintain consistency and simplify future model updates.
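A minimal sketch of the shared-constant idea, assuming a small configuration module that both files import. The names `DEFAULT_MODEL_ID`, `get_model_id`, and the `BEDROCK_MODEL_ID` environment variable are hypothetical; only the model ID string comes from the diff.

```python
import os

# Default copied from the diff; BEDROCK_MODEL_ID is a hypothetical
# environment variable name, not something the project defines.
DEFAULT_MODEL_ID = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"


def get_model_id() -> str:
    """Return the model ID, letting an environment variable override the default."""
    return os.getenv("BEDROCK_MODEL_ID", DEFAULT_MODEL_ID)
```

With something like this in place, a later model switch (as in commit 3f4c58b) would touch one line or one Lambda environment setting rather than two source files.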
+    messages = [{
+        "role": "user",
+        "content": [
+            {"text": summarized_tos}
+        ]
+    }]
+
+    response = client.converse(
+        modelId=model_id,
+        system=system_instruction,
+        messages=messages,
     )
Comment on lines +56 to 60

Copilot AI Nov 12, 2025

The Bedrock API call lacks error handling. If the API fails (e.g., throttling, service unavailable, invalid model ID), the Lambda will raise an unhandled exception. Add try-except blocks to catch botocore.exceptions.ClientError and return meaningful error messages.
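The error handling requested above might look roughly like the following. This is a sketch, not the project's code: `BedrockCallError` and `converse_text` are hypothetical names, and the caller is assumed to turn the exception into an HTTP error response in `lambda_handler`. `ClientError` and its `response["Error"]` dictionary are part of botocore.

```python
class BedrockCallError(RuntimeError):
    """Hypothetical wrapper exception for failed Bedrock calls."""


def converse_text(client, **kwargs) -> str:
    """Call client.converse(**kwargs) and return the first text block."""
    # Imported lazily so this helper can be loaded before botocore is needed.
    from botocore.exceptions import ClientError

    try:
        response = client.converse(**kwargs)
    except ClientError as exc:
        # ClientError carries the service error code and message in .response.
        error = exc.response.get("Error", {})
        code = error.get("Code", "Unknown")
        message = error.get("Message", "")
        raise BedrockCallError(f"Bedrock converse failed ({code}): {message}") from exc
    return response["output"]["message"]["content"][0]["text"]
```

Raising a single project-level exception keeps the throttling and invalid-model cases distinguishable by their code while letting the handler decide which status code to return.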

-    # Parse JSON from the response and return it
+    print("TOS Evaluation Response:")
+    print(response)
Comment on lines +62 to +63

Copilot AI Nov 12, 2025

Debug print statements are present in production code. These will clutter CloudWatch Logs with potentially large response objects. Consider using proper logging with appropriate log levels (e.g., logging.debug()) or remove these statements for production.
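As an illustration of the logging suggestion above, the sketch below logs a compact summary at INFO and the full payload only at DEBUG. The helper names `summarize_response` and `log_response` are hypothetical; `stopReason` and `usage` (with `inputTokens` and `outputTokens`) are fields of the Converse response.

```python
import logging

logger = logging.getLogger(__name__)


def summarize_response(response: dict) -> dict:
    """Pick out small metadata fields from a Converse response for logging."""
    usage = response.get("usage", {})
    return {
        "stop_reason": response.get("stopReason"),
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
    }


def log_response(response: dict) -> None:
    # INFO gets a compact summary; the full payload only appears at DEBUG.
    logger.info("Bedrock response: %s", summarize_response(response))
    logger.debug("Full Bedrock response: %s", response)
```

The level can then be adjusted per environment without touching the code, which keeps large model outputs out of the logs by default.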

-    return json.loads(response.text)
+    text = response['output']['message']['content'][0]['text']
+    start = text.find('{')
+    end = text.rfind('}') + 1

Copilot AI Nov 12, 2025

The manual JSON extraction is fragile and can fail if the response doesn't contain curly braces or if the JSON is malformed. If find('{') returns -1, the slicing will produce incorrect results. Add validation to check if start is -1 and handle the case where no valid JSON is found, or consider using Bedrock's response format configuration to ensure JSON-only output.

Suggested change
-    end = text.rfind('}') + 1
+    end = text.rfind('}') + 1
+    if start == -1 or end == 0 or end <= start:
+        raise ValueError("No valid JSON object found in the response text.")
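One possible reading of "response format configuration" in the comment above is the Converse API's tool-use mechanism: declare a tool whose input schema is the desired JSON shape and force the model to call it. The sketch below assumes that approach; the tool name `record_evaluation` and the helper `extract_tool_input` are hypothetical, while the schema fields come from the prompt in the diff.

```python
# Tool definition in the shape accepted by converse(toolConfig=...).
EVALUATION_TOOL = {
    "toolSpec": {
        "name": "record_evaluation",  # hypothetical tool name
        "description": "Record the evaluation of a terms-of-service summary.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "overall_evaluation": {
                        "type": "string",
                        "enum": ["A", "B", "C", "D", "E"],
                    },
                    "evaluation_for_each_clause": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "evaluation": {
                                    "type": "string",
                                    "enum": ["good", "neutral", "bad"],
                                },
                                "summarized_clause": {"type": "string"},
                            },
                            "required": ["evaluation", "summarized_clause"],
                        },
                    },
                },
                "required": ["overall_evaluation", "evaluation_for_each_clause"],
            }
        },
    }
}

# Forcing this specific tool asks the model to answer with a toolUse block.
TOOL_CONFIG = {
    "tools": [EVALUATION_TOOL],
    "toolChoice": {"tool": {"name": "record_evaluation"}},
}


def extract_tool_input(response: dict) -> dict:
    """Return the already-parsed input of the first toolUse content block."""
    for block in response["output"]["message"]["content"]:
        if "toolUse" in block:
            return block["toolUse"]["input"]
    raise ValueError("No toolUse block found in the response.")
```

Under this assumption the call would become `client.converse(..., toolConfig=TOOL_CONFIG)`, and the result would arrive as a dictionary rather than as text that has to be sliced and parsed.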
+    json_text = text[start:end]
+
+    # Parse JSON from the response and return it
+    return json.loads(json_text)

Copilot AI Nov 12, 2025

The json.loads() call can raise json.JSONDecodeError if the extracted text is not valid JSON. This exception is not handled and will cause the Lambda function to fail. Wrap this in a try-except block to catch and handle JSON parsing errors gracefully.
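The two review comments on the extraction step (missing braces and invalid JSON) can be addressed together. The following is a minimal sketch, assuming a hypothetical helper named `extract_json_object` that reports both failure modes as `ValueError` so the caller has one exception type to handle.

```python
import json


def extract_json_object(text: str) -> dict:
    """Slice text from the first '{' to the last '}' and parse it as JSON."""
    start = text.find("{")
    end = text.rfind("}") + 1  # 0 when no closing brace exists
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in the model response.")
    try:
        return json.loads(text[start:end])
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model response is not valid JSON: {exc}") from exc
```

For example, `extract_json_object('Sure! {"a": 1} Done.')` returns `{"a": 1}`, while a reply with no braces raises `ValueError` instead of silently slicing with an index of -1.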
44 changes: 29 additions & 15 deletions src/tos_summarize.py
@@ -1,18 +1,32 @@
-import json
-from google import genai
-from google.genai import types
+import boto3

-def tos_summarize(text_html, client):
-    response = client.models.generate_content(
-        model="gemini-2.5-flash-lite",
-        config=types.GenerateContentConfig(
-            system_instruction="""
-You are an expert in terms-of-service analysis.
-Summarize the main terms from the given HTML page.
-Respond in Korean.
-""",
-        ),
-        contents=text_html,
+def tos_summarize(text_html):
+    system_instruction=[{"text": """
+You are an expert in terms-of-service analysis.
+Summarize the main terms from the given HTML page.
+Respond in Korean.
+"""}]
+
+    client = boto3.client(
+        service_name="bedrock-runtime",
+        region_name="us-west-2"
     )
Comment on lines +10 to 13

Copilot AI Nov 12, 2025

The boto3 client is created on every call to tos_summarize. In AWS Lambda, clients are usually initialized at module level, outside the handler path, so that warm invocations reuse the same client and its connections instead of paying the setup cost on each request. Consider moving the client creation outside the tos_summarize function or making it a module-level singleton.

-    return response.text
+    model_id = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"
+    messages = [{
+        "role": "user",
+        "content": [
+            {"text": text_html}
+        ]
+    }]
+
+    response = client.converse(
+        modelId=model_id,
+        system=system_instruction,
+        messages=messages,
+    )
Comment on lines +23 to +27

Copilot AI Nov 12, 2025

The Bedrock API call lacks error handling. If the API fails (e.g., throttling, service unavailable, invalid model ID), the Lambda will raise an unhandled exception. Add try-except blocks to catch botocore.exceptions.ClientError and return meaningful error messages.

+    print("TOS Summarization Response:")
+    print(response)
Comment on lines +29 to +30

Copilot AI Nov 12, 2025

Debug print statements are present in production code. These will clutter CloudWatch Logs with potentially large response objects containing sensitive data. Consider using proper logging with appropriate log levels (e.g., logging.debug()) or remove these statements for production.

Suggested change
-    print("TOS Summarization Response:")
-    print(response)
+    # If needed, use logging for debug information:
+    # import logging
+    # logging.debug("TOS Summarization Response: %s", response)

+    return response['output']['message']['content'][0]['text']