- Home
- AI
- User guides
- Cleaning data with AI
Cleaning data with AI
The ONLYOFFICE Spreadsheet Editor allows you to use AI assistants to help clean and prepare your data for analysis. By combining AI capabilities with built-in data cleaning features, you can remove duplicate entries, standardize inconsistent formats, and fill missing values to ensure your data is accurate and ready for use.
The Spreadsheet Editor offers two approaches that work together:
- AI-assisted cleaning — Use natural language prompts to identify issues, get recommendations, and generate cleaning formulas.
- Built-in cleaning features — Use the Remove Duplicates tool, text functions, and Find & Replace for precise corrections.
Opening the AI prompt
To open the AI prompt in the Spreadsheet Editor, press Ctrl+/ (or ⌘+/ on macOS). Use this shortcut whenever you need to send a prompt to AI in the workflows described in this article.
Removing duplicates
The Spreadsheet Editor provides a built-in Remove Duplicates tool on the Data tab that allows you to eliminate duplicate rows based on selected columns. For detailed instructions on using this feature, refer to the article on removing duplicates.
You can also use Conditional formatting with the Duplicate/Unique rule to highlight duplicates for review before removal.
Using AI to identify and handle duplicates
AI can help you identify potential duplicates that aren't exact matches and develop strategies for handling them:
- Select the data range you want to check.
- Open the AI prompt.
-
Ask AI for help with duplicates:
- "What columns should I check to identify duplicate records in this data?"
- "Are there any near-duplicates or similar entries that might be the same record?"
- "How should I handle duplicate entries — keep first, keep last, or merge?"
Standardizing formats
Text functions for standardization
The Spreadsheet Editor provides several text functions for standardizing data:
| Function | Purpose | Example |
|---|---|---|
| TRIM(text) | Removes leading, trailing, and extra spaces | =TRIM(" John Smith ") → "John Smith" |
| UPPER(text) | Converts text to uppercase | =UPPER("london") → "LONDON" |
| LOWER(text) | Converts text to lowercase | =LOWER("LONDON") → "london" |
| PROPER(text) | Capitalizes first letter of each word | =PROPER("john smith") → "John Smith" |
| CLEAN(text) | Removes non-printable characters | =CLEAN(A1) |
| SUBSTITUTE(text, old, new) | Replaces specific text | =SUBSTITUTE("U.S.","U.S.","USA") → "USA" |
| TEXT(value, format) | Formats numbers/dates consistently | =TEXT(A1,"YYYY-MM-DD") |
Using AI to generate standardization formulas
Ask AI to create formulas for your specific standardization needs:
Example prompt: "Create a formula to standardize phone numbers in column A to the format (XXX) XXX-XXXX."
Example response: "Use this formula to standardize phone numbers: =TEXT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"-",""),"(",""),")",""),"(###) ###-####"). This removes existing formatting and applies the standard format. For phone numbers with country codes or varying lengths, you may need a more complex approach using nested IF statements or a helper column."
Using Find & Replace for bulk standardization
The Spreadsheet Editor provides a built-in Find and Replace (Ctrl+H) feature for simple text replacements across your data.
AI-assisted standardization workflow
For complex standardization tasks, follow this workflow:
- Identify issues: Open the AI prompt and ask AI to analyze a sample of your data and list formatting inconsistencies.
- Get recommendations: Ask AI which approach is best — formulas, Find & Replace, or manual correction.
- Generate formulas: If using formulas, ask AI to create the specific formula for your data.
- Apply in a new column: Create a new column with the standardized values using the formula.
- Verify results: Review the standardized data before replacing the original.
- Replace original data: Copy the standardized column, then use Paste Special → Values to replace the original.
Filling missing values
Identifying missing values
Use AI to help identify and assess missing data in your spreadsheet:
- Select your data range.
- Open the AI prompt.
-
Ask AI to analyze missing values:
- "Which columns have missing values and how many?"
- "What percentage of data is missing in each column?"
- "Are there patterns in the missing data?"
Using conditional formatting to highlight blanks
To visually identify missing values:
- Select your data range.
- Go to Home tab → Conditional formatting → Blank/Error.
- Choose Blanks to highlight empty cells.
- Select a formatting style (e.g., red fill) and click OK.
Using AI to recommend fill strategies
Ask AI for recommendations based on your specific data:
- For trend analysis: Use linear interpolation if the data is time-series. Create a formula that averages the values before and after each gap.
- For summary statistics: Fill with the median (more robust to outliers than mean):
=IF(ISBLANK(C1), MEDIAN($C$1:$C$200), C1) - For segmented analysis: If you have categories (e.g., regions), fill with the average for that specific category using
AVERAGEIF. - For conservative estimates: Fill with the minimum value in the range if you want to avoid overestimating.
- Exclude from analysis: For critical calculations, you might filter out rows with missing values rather than filling them.
Limitations and considerations
When using AI for data cleaning, keep the following in mind:
- AI provides guidance, not automation: AI helps identify issues and recommend solutions, but you apply the cleaning steps using the Spreadsheet Editor's built-in features.
- Context matters: AI recommendations are based on common patterns. Your specific business rules may require different approaches.
- Verify before replacing: Always review cleaned data before overwriting originals, especially for large datasets.
- Some cleaning requires judgment: Deciding whether near-duplicates are truly the same record often requires human knowledge of the data.