This article was translated by AI

Cleaning data with AI

The ONLYOFFICE Spreadsheet Editor allows you to use AI assistants to help clean and prepare your data for analysis. By combining AI capabilities with built-in data cleaning features, you can remove duplicate entries, standardize inconsistent formats, and fill missing values to ensure your data is accurate and ready for use.

By default, AI is not connected to editors for security reasons. To use AI for data cleaning, you need to install and configure the AI plugin first. Refer to the AI assistants: setup and configuration guide for detailed instructions.

The Spreadsheet Editor offers two approaches that work together:

  • AI-assisted cleaning — Use natural language prompts to identify issues, get recommendations, and generate cleaning formulas.
  • Built-in cleaning features — Use the Remove Duplicates tool, text functions, and Find & Replace for precise corrections.

Opening the AI prompt

To open the AI prompt in the Spreadsheet Editor, press Ctrl+/ (or +/ on macOS). Use this shortcut whenever you need to send a prompt to AI in the workflows described in this article.

Removing duplicates

The Spreadsheet Editor provides a built-in Remove Duplicates tool on the Data tab that allows you to eliminate duplicate rows based on selected columns. For detailed instructions on using this feature, refer to the article on removing duplicates.

You can also use Conditional formatting with the Duplicate/Unique rule to highlight duplicates for review before removal.

Using AI to identify and handle duplicates

AI can help you identify potential duplicates that aren't exact matches and develop strategies for handling them:

  1. Select the data range you want to check.
  2. Open the AI prompt.
  3. Ask AI for help with duplicates:
    • "What columns should I check to identify duplicate records in this data?"
    • "Are there any near-duplicates or similar entries that might be the same record?"
    • "How should I handle duplicate entries — keep first, keep last, or merge?"

Standardizing formats

Text functions for standardization

The Spreadsheet Editor provides several text functions for standardizing data:

FunctionPurposeExample
TRIM(text)Removes leading, trailing, and extra spaces=TRIM(" John Smith ") → "John Smith"
UPPER(text)Converts text to uppercase=UPPER("london") → "LONDON"
LOWER(text)Converts text to lowercase=LOWER("LONDON") → "london"
PROPER(text)Capitalizes first letter of each word=PROPER("john smith") → "John Smith"
CLEAN(text)Removes non-printable characters=CLEAN(A1)
SUBSTITUTE(text, old, new)Replaces specific text=SUBSTITUTE("U.S.","U.S.","USA") → "USA"
TEXT(value, format)Formats numbers/dates consistently=TEXT(A1,"YYYY-MM-DD")
Using AI to generate standardization formulas

Ask AI to create formulas for your specific standardization needs:

Example prompt: "Create a formula to standardize phone numbers in column A to the format (XXX) XXX-XXXX."

Example response: "Use this formula to standardize phone numbers: =TEXT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"-",""),"(",""),")",""),"(###) ###-####"). This removes existing formatting and applies the standard format. For phone numbers with country codes or varying lengths, you may need a more complex approach using nested IF statements or a helper column."

Using Find & Replace for bulk standardization

The Spreadsheet Editor provides a built-in Find and Replace (Ctrl+H) feature for simple text replacements across your data.

AI-assisted standardization workflow

For complex standardization tasks, follow this workflow:

  1. Identify issues: Open the AI prompt and ask AI to analyze a sample of your data and list formatting inconsistencies.
  2. Get recommendations: Ask AI which approach is best — formulas, Find & Replace, or manual correction.
  3. Generate formulas: If using formulas, ask AI to create the specific formula for your data.
  4. Apply in a new column: Create a new column with the standardized values using the formula.
  5. Verify results: Review the standardized data before replacing the original.
  6. Replace original data: Copy the standardized column, then use Paste SpecialValues to replace the original.

Filling missing values

Identifying missing values

Use AI to help identify and assess missing data in your spreadsheet:

  1. Select your data range.
  2. Open the AI prompt.
  3. Ask AI to analyze missing values:
    • "Which columns have missing values and how many?"
    • "What percentage of data is missing in each column?"
    • "Are there patterns in the missing data?"
Using conditional formatting to highlight blanks

To visually identify missing values:

  1. Select your data range.
  2. Go to Home tab → Conditional formattingBlank/Error.
  3. Choose Blanks to highlight empty cells.
  4. Select a formatting style (e.g., red fill) and click OK.
Using AI to recommend fill strategies

Ask AI for recommendations based on your specific data:

  1. For trend analysis: Use linear interpolation if the data is time-series. Create a formula that averages the values before and after each gap.
  2. For summary statistics: Fill with the median (more robust to outliers than mean): =IF(ISBLANK(C1), MEDIAN($C$1:$C$200), C1)
  3. For segmented analysis: If you have categories (e.g., regions), fill with the average for that specific category using AVERAGEIF.
  4. For conservative estimates: Fill with the minimum value in the range if you want to avoid overestimating.
  5. Exclude from analysis: For critical calculations, you might filter out rows with missing values rather than filling them.

Limitations and considerations

When using AI for data cleaning, keep the following in mind:

  • AI provides guidance, not automation: AI helps identify issues and recommend solutions, but you apply the cleaning steps using the Spreadsheet Editor's built-in features.
  • Context matters: AI recommendations are based on common patterns. Your specific business rules may require different approaches.
  • Verify before replacing: Always review cleaned data before overwriting originals, especially for large datasets.
  • Some cleaning requires judgment: Deciding whether near-duplicates are truly the same record often requires human knowledge of the data.
WarningData cleaning operations can permanently modify your data. Always work on a copy of your original spreadsheet. ONLYOFFICE assumes no responsibility for data loss resulting from cleaning operations. Use the Undo function (Ctrl+Z) immediately if you need to reverse changes.
Article with the tag:
Browse all tags