OpenOffice.org Help TNT
Tips'n'Tricks that somehow didn't make it to the help (yet).
All | Calc | Concepts | Cool Features | General | Help Development | Impress | Macros | Writer

« Hamburg, Heart of... | Main | Multi Language Spell... »
20060804 Friday August 04, 2006
Removing Duplicate Rows in Calc
How do I select all unique values from a column in an OOo Calc spreadsheet? I want to remove all redundant rows from a report with thousands of records.

There is no automatic function to remove duplicate rows. Follow these steps to delete all rows that have duplicate values in column A:
  1. Select all cells of the current data range.
    On most systems, you can click any cell inside the data range, then press Ctrl+Multiplication key on the numeric keypad.
  2. Sort the data range by column A.
    Choose Data - Sort.
  3. Click an empty cell in the first row. Let's assume it is cell C1. Enter the formula:
    =IF(A1=A2;1;0)
    This will display 1 if the current row has the same value in column A as the next row. It will display 0 if the values are different.
  4. Copy the formula down for all rows of the data range.
    Drag the lower right edge of the cell C1 down to the last row.
  5. Now the formulas must be replaced by their values to freeze the contents.
    While the column C is still highlighted, press Ctrl+C to copy all selected cells to the clipboard.
  6. Press Shift+Ctrl+V to open the Paste Special dialog box.
    In the Selection area, enable only the Numbers command; disable the other Selection commands. Click OK.
  7. Select the whole data range including the new column C and sort the range by column C.
    Choose Data - Sort.
  8. Select all rows which have a value 1 in column C, then press Del key.
  9. Optional steps: Delete column C. Select the remaining rows and sort them again by column A.
Tipp: This method can also be used to remove duplicate text lines from a Writer document. Copy the text to a Calc spreadsheet. Remove the duplicates, then copy and paste back as "unformatted text" to Writer.
Trackback URL: http://blogs.sun.com/oootnt/entry/removing_duplicate_rows_in_calc
Comments:

Press Shift+Ctrl+V to open the Paste Special dialog box. In the Selection area, enable only the Numbers command; disable the other Selection commands. Click OK. That's great!!

Posted by Noah on August 09, 2006 at 03:43 PM CEST #

This was VERY useful for me. I had a spreadsheet of 4800 rows of data and this saved a whole lotta time for me.
I would like to add if you want to check for duplicate rows where more than 1 column matches, you can do
=IF((A1=A2)  AND  (B1=B2)  AND  (C1=C2);1;0)
etc Thanx once again.

Posted by Richard on December 05, 2006 at 09:45 PM CET #

This one worked for me: Have you tried the standard filter (Data > Filter > Standard filter)? It has a "No duplicate" option and an option to copy the filtered list to another cell range. Taken from http://www.oooforum.org/forum/viewtopic.phtml?p=197646

Posted by Alex Deriziotis on January 12, 2007 at 02:04 PM CET #

Thank you, Alex, for the pointer to Data - Filter - Standard filter. This will compare whole rows and remove duplicate identical rows. On the other hand, the method described in this tip can be used to remove rows that have the same value in just one column (or in some columns, thank you, Richard, for your tip).

Posted by Ufi on January 12, 2007 at 04:24 PM CET #

I found that pasting the =IF(A1=A2;1;0) function in a seperate column helped quite a bit. I am unsure of how to make function variables persistent. So I copied all the results to a text editor and pasted it back in that column. After sorting on the column with all the 1 and 0's I removed any row with a 1 in that column. I then had to re-sort the data. Mine was timestamped so no big deal.

Posted by Shane R. Spencer on March 20, 2007 at 12:18 AM CET #

This was insanely useful.. thank you so much! This saved me many, many hours!

Posted by Josh on April 09, 2007 at 04:34 PM CEST #

The "No Duplicate" option was very usefull. Thanks!

Posted by Gabe on August 16, 2007 at 08:45 PM CEST #

Careful about this, removing duplicates and generating a unique list are two different things here. This assumes that you'll always have at least 1 duplicate pair. For example;
[CODE]|[RES],
A120|1,
A120|0,
A121|0,
A122|1,
A122|0,
..So, sorting by RES and deleting where RES=0, will delete A121.

Posted by Nick on October 26, 2007 at 07:56 AM CEST #

Great help... thanks for the stright forward setps!

Posted by Rick Denison on April 23, 2008 at 07:29 PM CEST #

I was trying to figure out how to sort all this data between 2 spreadsheets removing duplicates. Thanks for the formula help!

Posted by Jeremy K. on June 26, 2008 at 01:40 AM CEST #

for fancy footwork on rows and columns, like removing duplicates etc. in openorg calc save in csv, go to http://home.hccnet.nl/s.j.francke/software/software.htm and download and run "CSVed for editing your csv files..." and Bob (snr) is your uncle.
enjoy, I did!

Posted by Gabriël Smit on September 26, 2008 at 12:33 AM CEST #

I was able to get as far as pasting the result of the formula in a new column as number only. However, Calc conked out on me when I tried to sort by that column, deleting most of the rows. This was on a spreadsheet with 4200 rows of data. I think maybe it was too much for it, I am running a very powerful machine so I can't put the blame on the memory or processor. I was using OpenOffice 2.4.1

Posted by Joel on October 03, 2008 at 10:43 PM CEST #

Post a Comment:

Name:
E-Mail:
URL:

Your Comment:

HTML Syntax: NOT allowed
Archives
« October 2008
SunMonTueWedThuFriSat
   
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
 
       
Today

XML


Send us an email

Links
Referrers