Canadian Mac Forums at ehMac banner
1 - 20 of 30 Posts

·
Registered
Joined
·
777 Posts
Discussion Starter · #1 ·
Hi ehmac people...

I have been trying to get Excel to purge duplicate records, have checked numerous references online on how to do this, using the Advance Filter option, select one column, filter in place, unique records only... such a simple task yet it fails.

I find, any time I need to use Office for the slightest of tasks it fails me each and every time.

I try to avoid using any Microsoft apps but need to get this done tonight if possible, if anyone could contact me.

Thanks!

Andrew
 

·
Registered
Joined
·
777 Posts
Discussion Starter · #3 ·
Hi krs, yes, that's the basic simple process that is supposed to work, only it doesn't.
I am using the latest Excel for Mac. It is filtering the phone numbers in their own column, all formatted the same way.
So I'm still stuck with an excel sheet I can't use and need to layout the contents today!
 

·
Registered
Joined
·
641 Posts
I have been trying to get Excel to purge duplicate records, have checked numerous references online on how to do this, using the Advance Filter option, select one column, filter in place, unique records only... such a simple task yet it fails.
Advanced filter doesn't "purge" duplicates--it "filters" them.

Two possibilities:
1) Use the option to copy to another location. This will give you a copy of your original list only excluding the duplicates. Do whatever calcs you want on that result.

2) Use a function that respects the filtered status: SUBTOTAL(). (Check Help for details.)

BTW, I think your griping about Excel is over the top. The tools are there and work--you just don't know how to use them. I'd be interested to know if Numbers has an easier way to handle this type of situation.

Craig
 

·
Premium Member
Joined
·
9,103 Posts
So are these all phone numbers wher you want to eliminate duplicate records?

What's the format of the phone number? xxx-xxx-xxxx?

If you give me some basic info I could try it out on the older versions of Excel or maybe the process doesn't work with certain formats.
 

·
Premium Member
Joined
·
9,103 Posts
1) Use the option to copy to another location. This will give you a copy of your original list only excluding the duplicates. Do whatever calcs you want on that result.
Well - I just tried that with Excel 2004 and 15 telephone numbers of which three were duplicates.
Didn't work - all numbers were copied to the new location.

Then tried it with only ten-digit numbers as phone numbers, no hyphen separating NPA and NXX codes, still didn't work.

From what I can see, that function does not work the way it's described by Microsoft.
 

·
Registered
Joined
·
641 Posts
Well - I just tried that with Excel 2004 and 15 telephone numbers of which three were duplicates.
Didn't work - all numbers were copied to the new location.

Then tried it with only ten-digit numbers as phone numbers, no hyphen separating NPA and NXX codes, still didn't work.

From what I can see, that function does not work the way it's described by Microsoft.
You didn't mention checking the box for 'Unique records only'. Also, is it JUST a list of phone numbers or do you have other columns of data? Did you start by selecting the data range (A1 to A16) or something else?

I'm at a Windows box right now but it does work for me. If you turn on AutoFiltering for that range, do you see a de-duped list in the drop-down?

I use this feature quite regularly on both Windows and Mac versions of Excel and it works are described. I used the following as a quick check using 10 digit numbers. You can see the input has one dupe and the output doesn't.

Code:
Number		Number
9055551212	9055551212
4165551212	4165551212
9055551212
Craig
 

·
Registered
Joined
·
777 Posts
Discussion Starter · #9 ·
In all fairness to me... I'm not a newbie, I've used Office for decades, and have contacted Microsoft on numerous occasions about the shortcomings of their software and had them apologize to me because of it, though they still don't update their software to correct this... we're talking about back to 1990.

I have to filter the list in place so copying to another location doesn't help here. Copying to another location simply puts just the filtered numbers in a new column without any connection to the data in the other columns - I need them to remain with all of the other info.

To clarify, I have columns A-I, the phone numbers are in column I. I put the list of duplicates at the bottom of the original list and have the phone numbers in the same column. There are 14434 records and a list of 1955 duplicates, so I should end up with 12479 if all numbers in the delete list are duplicates.

I use the Advanced Filter on column I which should purge the list of the duplicates - read Microsoft's own kb directions as shown in an earlier post - this is what is not working.

Anyways, thanks for your info.
 

·
Registered
Joined
·
777 Posts
Discussion Starter · #10 ·
Hi WCraig... can I get you to look at the list and perform this simple action for me? Pretty please?
 

·
Premium Member
Joined
·
9,103 Posts
You didn't mention checking the box for 'Unique records only'. Also, is it JUST a list of phone numbers or do you have other columns of data? Did you start by selecting the data range (A1 to A16) or something else?

I'm at a Windows box right now but it does work for me. If you turn on AutoFiltering for that range, do you see a de-duped list in the drop-down?

I use this feature quite regularly on both Windows and Mac versions of Excel and it works are described. I used the following as a quick check using 10 digit numbers. You can see the input has one dupe and the output doesn't.

Code:
Number		Number
9055551212	9055551212
4165551212	4165551212
9055551212
Craig
Craig,

I must have tried this operation about a dozen times in a half-dozen different ways - could not get it to work even once.

Yes - I did turn on auto filtering for that range and I did check the unique records only box.
My entries were 12 numbers in a 5x3 matrix with three duplicated and when I did the filtering all fifteen numbers showed up in the resulting 15-cell portion of that spreadsheet at the location I specified.
The numbers were all just 10-digit numbers, I took the hyphen out but that made no difference.

I then looked at the last example I linked to -



In the example, I would have expected that function to also filter the duplicate "Susa" but it didn't, so I decided that the filtering was done on a per row basis.
Included a duplicate row in my test - still didn't folter it out.

I had no other data on that spreadsheet, selected only the 15 cells that each had a ten-digit number in it - the duplicates were in some cases in cells to the right in the same row and in others in cells in the same column and also in just other cells in the same 3x5 matrix I selected.
The function does not work in those cases.

You have all the entries in a single column - I never tried that since I wanted to test the more general case not knowing the layout the OP was working with.
 

·
Premium Member
Joined
·
9,103 Posts
OK - Just tried it with a single column.

For that it works for me as well, same as you demonstrated.

But it sure doesn't work for the more general case.
 

·
Registered
Joined
·
641 Posts
krs

Sorry, but it IS working as designed. In your posting, rows 2 and 5 contain identical data in all fields. Therefore the "records" are duplicates and one is omitted from the output listing starting at cell G4. The date field in row 4 is different from rows 2 and 5 and therefore that record is NOT a duplicate.

If the date field should not be considered in duplicate checking, then you need to do something to exclude it--delete it, define the 'database' range to exclude it, or something else.

The "Advanced Filter" function is pretty limited--but I use it regularly to get a unique list of values in a certain field in a 'database'. Most often, I create a new tab, copy the full column of data and then use Advanced Filter to extract a list of the unique values. I virtually always then delete the column with the original full list and use the de-duped list plus sumif() and vlookup() formulas for whatever summary report I need.

The other way to summarize a big database is with PivotTable(s). They have their own frustrating limitations but they are great for quickly summarizing, sorting and presenting information.

Finally, a true database program can do all this stuff and do it against data sets that far exceed a spreadsheet's capacity. In my career, I've worked with quite a few--starting with card-based input to IBM mainframe systems (yikes!). They all do the same sort of stuff: selecting, sorting, calculating, summarizing and outputting. Intuitive and friendly to new users NEVER goes along with fast, powerful and flexible, in my experience.

BTW, I worked with MacAndy offline. Turns out Advance Filter was never going to work for him. He needed to 'flag' rows in his data that contained telephone numbers that were known to be out of service or 'bad'. It was complicated because some of the telephone numbers were entered as numbers and others as text. After a bit of data cleansing and a vlookup() function, we were pretty much in business.

Craig
 

·
Premium Member
Joined
·
9,103 Posts
Sorry, but it IS working as designed. In your posting, rows 2 and 5 contain identical data in all fields. Therefore the "records" are duplicates and one is omitted from the output listing starting at cell G4. The date field in row 4 is different from rows 2 and 5 and therefore that record is NOT a duplicate.
Well, maybe it is working "as designed" but then calling this "How to find duplicates in Excel" is an overstatement.

When I read "finding duplicates in Excel" then I assume that's exactly what this function does - I select a list range (which in my case was just a 3x5 matrix of ten-digit numbers and I expected this function to find the duplicate numbers in those 15 cells and only display any duplicate or triplicate numbers once in the results.
But that isn't what the function does - it assumes that each cell in a row is part of the same record.
So OK, misunderstanding on what the function actually does.

But what I then don't understand - after I made one of the rows of five entries identical to a row above - why would the function not identify that row as a duplicate record?
I'll have to play with this a bit more in the morning.

I had another interesting Excel experience earlier tonight -

I have a single column of data that follows a pattern where the data is displayed in groups of three, say first name, last name, telephone number.
I want to rearrange that so that first name, last name and telephone number are shown in one row rather than below each other, so a single column of 99 entries would become three columns of 33 entries each.
This is something I run into all the time and I would have thought that Excel provides a function to do that transformation. However, it's not a transformation I can find in Excel.

Rather than use an elaborate formula to do this conversion (which is what you find when searching for this on google), I ended up pasting the single column of data into MSWord, then use the MSWord feature "text to table" which does this transformation nicely and then paste the whole thing back into Excel.
 

·
Registered
Joined
·
641 Posts
... but then calling this "How to find duplicates in Excel" is an overstatement.
The Microsoft article is titled: "How to remove duplicate records or create list of unique records in Excel".


...I have a single column of data that follows a pattern where the data is displayed in groups of three, say first name, last name, telephone number.
I want to rearrange that so that first name, last name and telephone number are shown in one row rather than below each other, so a single column of 99 entries would become three columns of 33 entries each.
This is something I run into all the time and I would have thought that Excel provides a function to do that transformation. However, it's not a transformation I can find in Excel.

Rather than use an elaborate formula to do this conversion (which is what you find when searching for this on google), I ended up pasting the single column of data into MSWord, then use the MSWord feature "text to table" which does this transformation nicely and then paste the whole thing back into Excel.
There are ways to do it in Excel--eg save as a text file and use the import wizard to split; OR use left(), mid(), right() functions to parse into separate columns; then use Paste Special, Transpose to flip rows and columns. But I'd probably do it in Word, too. Unless there were 20,000 rows. Then I'd use Excel. If there were 200,000 rows, I might use TextWrangler, Applescript or VBA, or a shell script. If there were 2,000,000 rows, something else. No one tool does everything well all the time.

Craig
 

·
Registered
Joined
·
2,227 Posts
The Microsoft article is titled: "How to remove duplicate records or create list of unique records in Excel".



There are ways to do it in Excel--eg save as a text file and use the import wizard to split; OR use left(), mid(), right() functions to parse into separate columns; then use Paste Special, Transpose to flip rows and columns. But I'd probably do it in Word, too. Unless there were 20,000 rows. Then I'd use Excel. If there were 200,000 rows, I might use TextWrangler, Applescript or VBA, or a shell script. If there were 2,000,000 rows, something else. No one tool does everything well all the time.

Craig
But there is one type of tool that would handle 20,000 rows, 200,000 rows or even 2,000,000 rows.

You've mentioned quite a few applications, but you didn't mention any that would be considered (in my opinion) the correct type of application.

A Database application.

I've gone on record as saying that while I love Excel (Pages more but that's just me), it has to be the #1 misused software application ever created. Excel is a SPREADSHEET. It is not a database application. For that we have products like Bento, FileMaker, Access on the Windows side, etc.

Things like phone numbers (as per your example) are not numbers, they are text strings with unique formatting characteristics. Database programs can be designed to handle these situations. Spreadsheet applications are out of their league when it comes to things like this.

Using an application like Excel to deal with data is the computer equivalent to using a knife as a screwdriver. We've all done it and I'll bet that the majority of ehMac readers have the prove in their cutlery drawer. At least one knife has the telltale bend marks on the blade where it was used in a pinch to tighten or loosen a screw.

Next time choose the right type of tool for the job. Spreadsheets are for mathematical numbers. Text columns for headers, but from cell B2 out, the majority of cells should contain mathematical numbers and nothing else.
 

·
Registered
Joined
·
641 Posts
Ah, a database snob. ;-)

A DBMS is certainly a good tool in the right circumstances (as I pointed out in post 13 in this thread). If you're dealing with a lot of data that is highly consistent (no pesky missing values or mis-formatted items) and especially if you need a recurring process that will be run regularly, by all means invest the extra time to define a database solution.

Maybe it offends your vision of conceptual purity, but there are lots of cases where using a spreadsheet to manage data is far simpler, easier, faster, or more flexible or accessible.

Craig
 

·
Registered
Joined
·
2,227 Posts
Ah, a database snob. ;-)

A DBMS is certainly a good tool in the right circumstances (as I pointed out in post 13 in this thread). If you're dealing with a lot of data that is highly consistent (no pesky missing values or mis-formatted items) and especially if you need a recurring process that will be run regularly, by all means invest the extra time to define a database solution.
Craig
If anything, it is for those situations where you do have missing values or mis-formatted items that benefit from the use of a database application, along with some simple scripts.

Ah, a database snob. ;-)
Maybe it offends your vision of conceptual purity, but there are lots of cases where using a spreadsheet to manage data is far simpler, easier, faster, or more flexible or accessible.

Craig
Is this one of those cases?:D

The user has been fighting with this for almost two full days now.

I don't consider myself a database snob. I have the same complaint about the use of Excel for things that should be done in Word. How many 2-3 column single page documents have you ever seen done in Excel simply because the person couldn't figure out, or wouldn't take the time to learn how to set up 2-more columns in Word?

I've seen the same thing when PowerPoint was used as a way of creating something that should have been done as a PDF.

We are creatures of habit, and as I get older and deal with adult learners, the saying "you can't teach an old dog new tricks" becomes more true each day. Sometimes I am amazed that we were able to sit in school for 5-6 hours each day and learn anything.

We are typically reluctant to try new things and can be amazingly stubborn. We stand on a chair when a ladder it the correct tool. We use the back of something instead of going and getting a hammer. We struggle with 3-4 bags when it would be better to make two trips. We walk into the grocery store and either grab a hand basket, or ignore even that and then ending up buying enough groceries to fill a shopping cart. We stretch a power cord across the floor rather than getting out an extension cord and running it out of the way.

BTW, try dragging that spreadsheet file on top of a FileMaker app's icon and see what happens.
 

·
Premium Member
Joined
·
3,899 Posts
Brute force method:
First, before doing any manipulation with Excel data I recommend adding a column to your data with a serial number, so you can return to your original sort order at any time.
Lets call the column with your phone numbers in it P
Add a column Q with the calculation in cell Q2: =If (P2=P1, "Dup", "") Copy this calculation all the way down the data.
Add a column R (for Results)
Then, select all of your data range and Sort on P ascending
You will now have all the duplicates flagged with "Dup". However this is of limited value because as soon as you resort the data, the Dup flags will be wrong again.
So, highlight the data as it is now in Column Q and Copy
Then, put the cursor into the top data cell in column R and do a Paste Special... Values only. Paste Special Values Only divorces the formula from the data, and so freezes the Dup flags on the data rows so it permanently represents the correct Dup status of each row
Sort by column R, sub sort by your serial field, and you will now have the uniques and the duplicates separated
Optionally, delete the "Dup" data rows
 
1 - 20 of 30 Posts
Top