How to Translate Excel Files Using Python and ChatGPT (2026 Guide, With Working Code)

Last updated: June 2026 — tested with the OpenAI Python SDK v1.x and the Chat Completions API.

This tutorial uses the Chat Completions API because it is simple and widely supported. For new applications you may also consider OpenAI's Responses API and structured outputs.

Two ways to translate an Excel file

Before writing a single line of code, it helps to be honest about which approach actually fits you:

  • Write a Python script (this guide). Best if you want to automate translation across many files, run it on a schedule, or build it into a larger pipeline. You'll need an OpenAI API key and some basic Python.
  • Use a ready-made tool. If you just need to translate one spreadsheet right now — with charts, formulas, and styling preserved automatically — uploading it to Doc2Lang's Excel translator is faster and needs no code at all.

This guide focuses on the Python approach and gives you complete, working code. We'll also point out exactly where the do-it-yourself route gets tricky, so you can decide with your eyes open. There's a full comparison near the end.

Prerequisites

Before we start, you'll need a few things:

  • Basic Python knowledge. This article includes code. Knowing some Python helps, but we'll explain each step.
  • Python installed. If you don't have it yet, download it from the official website.
  • pip. This installs Python libraries. If you have Python 3, you almost certainly already have pip.
  • An OpenAI API key. Create one in your OpenAI account dashboard. One key covers every model your account can access (gpt-4o, gpt-4o-mini, and so on), so you can switch models without creating a new key.
  • Your Excel file. Have the .xlsx file you want to translate ready.

Set your API key as an environment variable so it never lives in your code:

macOS / Linux:

export OPENAI_API_KEY="your_api_key_here"

Windows PowerShell:

$env:OPENAI_API_KEY="your_api_key_here"
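
A script can check for the key up front and stop with a readable message instead of failing later with a bare KeyError. The sketch below is one way to do that. The helper name require_api_key is made up for this example and is not part of the OpenAI SDK.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a readable error.

    require_api_key is a hypothetical helper, not an SDK function.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the script."
        )
    return key
```

You could then create the client with OpenAI(api_key=require_api_key()).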

How an .xlsx file is structured (optional background)

When you save a spreadsheet as .xlsx, you're really packaging several files into one zip archive. The format is Office Open XML (often shortened to OOXML or OpenXML). If you're curious, you can rename a copy of the file to .zip, unzip it, and look inside:

  • xl/worksheets/: one XML file per sheet (sheet1.xml, sheet2.xml, …). This is where your rows, columns, and cell data actually live.
  • xl/styles.xml: every style in the workbook, such as which cells are bold, which are blue, and which numbers are formatted as currency.
  • xl/sharedStrings.xml: to save space, Excel stores each unique string once here and references it wherever it's used. If "Total" appears 1,000 times, it's stored once.
  • xl/workbook.xml: the table of contents. It lists which sheets exist and their order, plus workbook-level settings such as defined names and workbook protection.

Libraries like openpyxl hide most of this from you, but understanding the structure helps when you need advanced operations or have to troubleshoot.

A report.xlsx file unzipped: the worksheets, shared strings, styles, and workbook structure are just XML files inside a zip archive.
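
If you would rather inspect the archive from Python than rename and unzip it, the standard library is enough. The following is a minimal sketch for exploring your own files. The function names are made up for this example, and the reader does not treat special cases such as phonetic annotations separately.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts such as sharedStrings.xml.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_parts(xlsx):
    """Return the names of all files packed inside an .xlsx archive."""
    with zipfile.ZipFile(xlsx) as archive:
        return archive.namelist()


def read_shared_strings(xlsx):
    """Return the strings stored in xl/sharedStrings.xml, in index order."""
    with zipfile.ZipFile(xlsx) as archive:
        if "xl/sharedStrings.xml" not in archive.namelist():
            return []  # a workbook without text cells may omit this part
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    strings = []
    for item in root.findall(f"{{{MAIN_NS}}}si"):
        # Plain strings hold one <t>; rich text holds several, one per run.
        strings.append("".join(t.text or "" for t in item.iter(f"{{{MAIN_NS}}}t")))
    return strings
```

Calling list_parts("report.xlsx") on a typical workbook should show the parts described above, and read_shared_strings("report.xlsx") gives a quick preview of the text a translator would have to handle.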

Reading and writing Excel files with openpyxl

openpyxl is a Python library built specifically for reading and writing Excel files (.xlsx, .xlsm, .xltx, .xltm). It lets you work directly with sheets and cells.

One thing to be clear about up front: openpyxl is excellent for cell-level work, but it is not a full-fidelity Excel layout engine. Shapes, some images, charts, macros, and rich text inside a single cell may not be preserved when you open and re-save a file. For macro-enabled .xlsm files, load with load_workbook(..., keep_vba=True) and save back as .xlsm, otherwise macros may be lost. We'll come back to these limits at the end.
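
As a rough sketch of the macro advice above, a small wrapper can pick the right loading options from the file extension. The helper names load_options and open_workbook are hypothetical. The only openpyxl call used is load_workbook with its keep_vba argument.

```python
from pathlib import Path

MACRO_SUFFIXES = {".xlsm"}  # macro-enabled workbooks


def load_options(path):
    """Return keyword arguments for load_workbook based on the file extension."""
    suffix = Path(path).suffix.lower()
    return {"keep_vba": True} if suffix in MACRO_SUFFIXES else {}


def open_workbook(path):
    """Open a workbook, keeping VBA content for macro-enabled files."""
    # Imported here so load_options can be used without openpyxl installed.
    from openpyxl import load_workbook

    return load_workbook(filename=path, **load_options(path))
```

Remember to save a macro-enabled workbook under a name that still ends in .xlsm.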

1. Install openpyxl

pip install openpyxl

2. Core concepts

A few concepts map directly onto openpyxl:

Workbook — an Excel file as a whole.

from openpyxl import load_workbook
 
workbook = load_workbook(filename="sample.xlsx")

Sheet — a workbook contains one or more sheets.

sheet = workbook.active          # the active sheet
another_sheet = workbook["Sheet2"]  # a sheet by name

Cell — where a row and column meet. This is where data lives.

cell_value = sheet["A1"].value   # read
sheet["B1"] = "Hello, Excel!"    # write

Rows and columns — iterate over them easily:

for row in sheet.iter_rows(values_only=True):
    for value in row:
        print(value)

Translating text with the OpenAI API

After loading your data with openpyxl, the next step is translating it with the OpenAI API.

1. Install the OpenAI Python library

pip install openai

2. A modern translation function (2026)

import os
from openai import OpenAI
 
# Read the key from an environment variable — never hard-code it in your script.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
 
def translate_text(text, target_lang="Japanese"):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # fast and cost-effective; use "gpt-4o" for the best quality
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a professional translator. Translate the user's text "
                    f"into {target_lang}. Return only the translation, with no extra commentary."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

Heads up: Many older tutorials use openai.ChatCompletion.create() with model="gpt-4". That syntax is from the pre-1.0 SDK and no longer works with openai>=1.0. The code above uses the current client.chat.completions.create() interface. Choose gpt-4o-mini for low cost and speed, or gpt-4o when you need maximum translation quality.

3. A naive first attempt

The most common tutorial approach is to loop over every cell and overwrite it:

from openpyxl import load_workbook
 
workbook = load_workbook(filename="your_file.xlsx")
 
for sheet in workbook.worksheets:
    for row in sheet.iter_rows():
        for cell in row:
            if isinstance(cell.value, str):
                cell.value = translate_text(cell.value)
 
workbook.save("your_translated_file.xlsx")

This works in a demo, but it has real problems on actual spreadsheets: it sends one API call per cell (slow and expensive), it translates formulas and breaks them, and it has no protection against rate limits. Let's fix all of that.

Making it production-ready

1. Don't translate formulas

If a cell holds a formula like =SUM(A1:A10), translating it will corrupt the spreadsheet. Skip formulas, empty cells, and non-text values:

def is_translatable(cell):
    # Skip empty cells, numbers, and formula cells.
    value = cell.value
    return (
        isinstance(value, str)
        and value.strip() != ""
        and cell.data_type != "f"   # "f" = formula
    )

The script only rewrites cell.value, so number formats, fonts, fills, and most other cell styling stay intact as long as you leave non-text cells alone.
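
An optional refinement, not part of the script in this guide: text cells that contain no letters at all (for example "12,500", "2026-06-01", or "---") have nothing to translate, so they can be skipped too. The sketch below takes the cell's value and data_type as plain arguments so you can try it without opening a workbook. Both function names are made up for this example.

```python
def has_letters(text):
    """True if the string contains at least one alphabetic character."""
    return any(ch.isalpha() for ch in text)


def worth_translating(value, data_type):
    """Same idea as is_translatable, plus a filter for letter-free strings."""
    return (
        isinstance(value, str)
        and data_type != "f"      # skip formulas
        and has_letters(value)    # skip "", "   ", "12,500", "2026-06-01", "---"
    )
```

With an openpyxl cell you would call worth_translating(cell.value, cell.data_type). Because str.isalpha is Unicode-aware, text in non-Latin scripts still counts as translatable.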

2. Cache repeated strings to cut cost

Spreadsheets repeat the same labels constantly ("Total", "Date", a department name). Translate each unique string only once:

cache = {}
 
def translate_cached(text, target_lang="Japanese"):
    key = (text, target_lang)  # include the language so two targets never share an entry
    if key not in cache:
        cache[key] = translate_text(text, target_lang)
    return cache[key]

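Caching pays off in proportion to repetition: for example, a report with 5,000 text cells but only 400 distinct strings needs 400 translations instead of 5,000, with a matching drop in your bill.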
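
To see the effect without spending any tokens, you can swap the real translator for a stand-in that only records its calls. In this sketch fake_translate is a placeholder for translate_text, and the cache is keyed on both the text and the target language, which is an assumption of this example.

```python
calls = []


def fake_translate(text, target_lang="Japanese"):
    """Stand-in for translate_text: records the call instead of using the API."""
    calls.append(text)
    return f"[{target_lang}] {text}"


cache = {}


def translate_cached(text, target_lang="Japanese"):
    key = (text, target_lang)  # keep languages apart in the cache
    if key not in cache:
        cache[key] = fake_translate(text, target_lang)
    return cache[key]


column = ["Total", "Date", "Total", "Sales", "Date", "Total"]
translated = [translate_cached(text) for text in column]
print(len(column), "cells,", len(calls), "API calls")  # prints: 6 cells, 3 API calls
```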

3. Batch cells into one request

One API call per cell is the biggest source of slowness. Send many strings in a single request instead, asking the model to return JSON so you can map results back reliably:

import json
 
def translate_batch(texts, target_lang="Japanese"):
    numbered = {str(i): t for i, t in enumerate(texts)}
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a professional translator. Translate each value in the "
                    f"JSON object into {target_lang}. Keep the same keys. "
                    f"Return only a JSON object."
                ),
            },
            {"role": "user", "content": json.dumps(numbered, ensure_ascii=False)},
        ],
        response_format={"type": "json_object"},
    )
    result = json.loads(response.choices[0].message.content)
    return [result[str(i)] for i in range(len(texts))]
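
A fixed number of strings per request works well when cells are short, but a few very long cells can make one request much larger than the rest. One possible approach, sketched here, is to cap each batch by both item count and total characters. The function name chunk_texts and the default limits are assumptions for illustration, not recommended values.

```python
def chunk_texts(texts, max_items=40, max_chars=6000):
    """Split texts into batches limited by item count and total characters.

    A single text longer than max_chars still gets a batch of its own.
    """
    batch, size = [], 0
    for text in texts:
        too_many = len(batch) >= max_items
        too_long = bool(batch) and size + len(text) > max_chars
        if too_many or too_long:
            yield batch
            batch, size = [], 0
        batch.append(text)
        size += len(text)
    if batch:
        yield batch
```

Each batch can then be passed to translate_batch, for example: for chunk in chunk_texts(unique_texts): results = translate_batch(chunk).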

4. Retry on rate limits (exponential backoff)

When you hit a rate limit, don't crash — wait and try again, doubling the delay each time:

import time
 
def with_retry(func, *args, retries=5, **kwargs):
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
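        except Exception as e:
            if attempt == retries - 1:
                break  # out of attempts; skip the pointless final sleep
            wait = 2 ** attempt  # 1s, 2s, 4s, 8s, ...
            print(f"Error: {e}. Retrying in {wait}s...")
            time.sleep(wait)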
    raise RuntimeError("Translation failed after several retries.")
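
A common variation on plain doubling is to cap the delay and add random jitter, so that several scripts started at the same moment do not all retry in lockstep. The sketch below is one way to write that and is not the version used in this guide. The name retry_with_backoff and its parameters are made up for this example, and the sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time


def retry_with_backoff(func, *args, retries=5, base=1.0, cap=30.0,
                       retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func, retrying on the given exception types with capped, jittered backoff."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error unchanged
            delay = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, delay))  # wait a random part of the delay
```

Passing retry_on=(RateLimitError,) with RateLimitError imported from the openai package would limit retries to rate-limit responses.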

5. Handle untrusted files safely

If you process Excel files uploaded by other people, protect against malicious XML. Install defusedxml, which openpyxl will use automatically when present, and validate the file's size and type before loading it:

pip install defusedxml

This is optional for your own files, but strongly recommended for any file you didn't create yourself.
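
The size and type checks mentioned above could look roughly like this. It is a sketch under stated assumptions: the function name check_upload and both size limits are invented for illustration, and the check for xl/workbook.xml relies on the conventional part name that Excel and openpyxl use.

```python
import os
import zipfile

ALLOWED_SUFFIXES = (".xlsx", ".xlsm")
MAX_FILE_BYTES = 20 * 1024 * 1024        # assumed limit for the upload itself
MAX_UNPACKED_BYTES = 200 * 1024 * 1024   # assumed limit for the unzipped contents


def check_upload(path):
    """Raise ValueError if the file does not look like a reasonable Excel upload."""
    path = str(path)
    if not path.lower().endswith(ALLOWED_SUFFIXES):
        raise ValueError("Unsupported file extension")
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError("File is too large")
    if not zipfile.is_zipfile(path):
        raise ValueError("Not a zip archive, so not a real .xlsx file")
    with zipfile.ZipFile(path) as archive:
        if "xl/workbook.xml" not in archive.namelist():
            raise ValueError("Archive has no xl/workbook.xml part")
        # Sum the declared uncompressed sizes to catch archives that balloon on extraction.
        unpacked = sum(info.file_size for info in archive.infolist())
        if unpacked > MAX_UNPACKED_BYTES:
            raise ValueError("Archive expands to too much data")
```

Run check_upload(path) before load_workbook(path), and treat a ValueError as a rejected upload.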

Complete working script

Here's everything combined into one script you can copy and run. It collects unique strings, translates them in batches with retries, caches the results, skips formulas, and saves to a new file so your original stays safe.

import json
import os
import time
from openai import OpenAI
from openpyxl import load_workbook
 
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
 
TARGET_LANG = "Japanese"
MODEL = "gpt-4o-mini"   # use "gpt-4o" for higher quality
BATCH_SIZE = 40
 
 
def is_translatable(cell):
    value = cell.value
    return (
        isinstance(value, str)
        and value.strip() != ""
        and cell.data_type != "f"   # "f" = formula
    )
 
 
def translate_batch(texts, target_lang=TARGET_LANG, retries=5):
    numbered = {str(i): t for i, t in enumerate(texts)}
    system = (
        f"You are a professional translator. Translate each value in the JSON object "
        f"into {target_lang}. Keep the same keys. Return only a JSON object."
    )
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model=MODEL,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": json.dumps(numbered, ensure_ascii=False)},
                ],
                response_format={"type": "json_object"},
            )
            result = json.loads(response.choices[0].message.content)
            return [result[str(i)] for i in range(len(texts))]
        except Exception as e:
            if attempt == retries - 1:
                break  # out of attempts; skip the pointless final sleep
            wait = 2 ** attempt
            print(f"Error: {e}. Retrying in {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Translation failed after several retries.")
 
 
# 1. Load the workbook
workbook = load_workbook(filename="your_file.xlsx")
 
# 2. Collect every unique, translatable string
unique_texts = set()
for sheet in workbook.worksheets:
    for row in sheet.iter_rows():
        for cell in row:
            if is_translatable(cell):
                unique_texts.add(cell.value)
unique_texts = list(unique_texts)
 
# 3. Translate in batches and build a {original: translation} cache
cache = {}
for start in range(0, len(unique_texts), BATCH_SIZE):
    chunk = unique_texts[start:start + BATCH_SIZE]
    cache.update(dict(zip(chunk, translate_batch(chunk))))
 
# 4. Write translations back, leaving formulas and styling untouched
for sheet in workbook.worksheets:
    for row in sheet.iter_rows():
        for cell in row:
            if is_translatable(cell):
                cell.value = cache[cell.value]
 
# 5. Save to a new file
workbook.save("your_translated_file.xlsx")
print("Done!")

That's a complete, practical translator in well under 100 lines — fast, formula-safe, and far cheaper than the cell-by-cell version.

Optional: hardening for real workloads

The script above is already practical. For larger or recurring jobs, three more safeguards help.

Don't swallow every exception. Retrying on every exception means an invalid API key or a wrong model name will also be retried five times. Retry only on rate limits, and fail fast on everything else:

from openai import RateLimitError
 
def translate_batch(texts, target_lang=TARGET_LANG, retries=5):
    numbered = {str(i): t for i, t in enumerate(texts)}
    system = (
        f"You are a professional translator. Translate each value in the JSON object "
        f"into {target_lang}. Keep the same keys. Return only a JSON object."
    )
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model=MODEL,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": json.dumps(numbered, ensure_ascii=False)},
                ],
                response_format={"type": "json_object"},
            )
            result = json.loads(response.choices[0].message.content)
            missing = [str(i) for i in range(len(texts)) if str(i) not in result]
            if missing:
                raise ValueError(f"Missing translations for keys: {missing}")
            return [result[str(i)] for i in range(len(texts))]
        except RateLimitError:
            if attempt == retries - 1:
                break  # out of attempts; skip the pointless final sleep
            wait = 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Translation failed after several retries.")

This version also validates that the model returned every key it was given, so a truncated or malformed response fails loudly instead of silently dropping cells.
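
The validation step can also live in its own function, which makes it easy to test without calling the API. This is a sketch rather than part of the script above. The name parse_batch_response is hypothetical, and it adds one extra check, that every value is a string.

```python
import json


def parse_batch_response(raw, expected_count):
    """Turn the model's JSON reply into a list of translations, in input order.

    Raises ValueError if the reply is not a JSON object, if any key from
    "0" to str(expected_count - 1) is missing, or if a value is not a string.
    """
    try:
        result = json.loads(raw or "")
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(result, dict):
        raise ValueError("Reply is not a JSON object")
    translations = []
    for i in range(expected_count):
        value = result.get(str(i))
        if not isinstance(value, str):
            raise ValueError(f"Missing or non-text translation for key {i}")
        translations.append(value)
    return translations
```

Inside translate_batch, the two lines that parse and index the result could then become a single call: return parse_batch_response(response.choices[0].message.content, len(texts)).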

Keep order when de-duplicating. Use ordered de-duplication instead of a set, so logs and error messages stay predictable:

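# Replaces step 2 of the script: collect strings in sheet order, then de-duplicate.
cells = (c for ws in workbook.worksheets for row in ws.iter_rows() for c in row)
collected_texts = [c.value for c in cells if is_translatable(c)]
unique_texts = list(dict.fromkeys(collected_texts))  # dicts keep first-seen order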

Python script vs. Doc2Lang: which should you use?

Writing your own script is powerful, but it has real limits. Here's an honest comparison:

                     | Python (DIY)                                 | Doc2Lang
Formatting & charts  | You handle it yourself; easily broken        | Preserved automatically
Formulas             | Write your own conditions to skip them       | Skipped automatically
What you need        | OpenAI API key + coding                      | Just upload the file
Best for             | Developers automating bulk or recurring jobs | Anyone who needs one file translated now


A product spec sheet translated by Doc2Lang — tables, columns, colors, and sheet tabs are all preserved, exactly as in the original.

Try the Excel translator — no API key, no code.

The rule of thumb: if you want to automate recurring translation jobs, build the script above. If you just need to translate a spreadsheet right now without writing code, use the Excel translator — upload, translate, download, with formatting kept intact.

Frequently asked questions

What if I get a rate-limit error?
OpenAI enforces rate limits based on your account tier. Use the exponential-backoff retry shown above, reduce your batch size, or upgrade your account tier if you process large volumes.

How do I translate into several languages at once?
Loop over a list of target languages and run the whole process once per language, saving each result to its own file (for example report_ja.xlsx, report_de.xlsx).
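
As a sketch of that loop: the function below derives one output name per language and calls a translation routine for each. Here translate_file is a hypothetical callable that wraps the complete script above and takes (source, destination, language name). The other names are also made up for this example.

```python
from pathlib import Path


def output_path(source, lang_code):
    """Build the output name: report.xlsx with code "ja" gives report_ja.xlsx."""
    source = Path(source)
    return source.with_name(f"{source.stem}_{lang_code}{source.suffix}")


def translate_to_many(source, languages, translate_file):
    """Run translate_file once per language and return the paths it was asked to write.

    languages maps a short code to the language name used in the prompt,
    for example {"ja": "Japanese", "de": "German"}.
    """
    written = []
    for code, name in languages.items():
        destination = output_path(source, code)
        translate_file(Path(source), destination, name)
        written.append(destination)
    return written
```

Each language needs its own translation cache, since the same source string maps to a different result per target.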

Does using the OpenAI API cost money?
Yes. You pay per token used. Caching repeated strings and batching requests (both shown above) keep costs low. gpt-4o-mini is dramatically cheaper than the larger models for everyday translation.

Will this preserve my formatting and formulas?
The script only changes text cell values, so basic cell styles — fonts, fills, borders, alignment, and number formats — are usually preserved, and formulas are skipped. However, openpyxl is not a full-fidelity Excel layout engine. Macros, shapes, charts, images, rich text inside a single cell, and complex workbook features may not be preserved perfectly. For .xlsm files, load with keep_vba=True and save back as .xlsm. If pixel-perfect layout matters more than automation, a dedicated tool handles these cases automatically.

Conclusion

With openpyxl and the OpenAI API you can build a fast, formula-safe Excel translator in well under 100 lines of code — a great fit when you need to automate translation at scale.

If your priority is layout preservation rather than automation, a dedicated Excel translation tool may be a better fit — it handles charts, shapes, and styling that openpyxl can't.
