SO:Removing duplicate rows from data pulled from db in java

I’ve become an active member of stack overflow, both answering and making questions.
It’s amazing, there’s a whole community there that I hadn’t noticed before.
As I have not had time to post new items, I will be borrowing my answers to stackoverflow questions and re posting them here.


User Question:

So here’s my question. I have a giant text file of data and I need to input all of this data into a mySQL database fast through obviously using a java program. My only problem is that, the data is identified by a certain ID. Some of these ID’s have duplicates and contain all the same info as eachother. I would like to remove all of these for sorting purposes and clarity sake.

What would be the best way to go about this? If anyone could help I’d appreciate it

Answer:

clones/duplicates

clones/duplicates

While reading the data have a hashmap or hashset. check if the id exists in the hasmap/hashset and if so continue. otherwise enter in set/map and insert.

An aside: The difference between hashmap and hashset is hashset only takes values while hashmap takes key values. However, Hashset itself uses a hashmap within memory and just inserts a dummy object for values. See: Differences between HashMap and Hashtable?

Example with hashset:

    HashSet<Integer> distinctIds = new HashSet<Integer>();

    MyRowData rowdata;
    int rowID;

    while((rowdata = this.getRowData())!=null ) // or however you iterate over the rows using reader etc
    {
    rowID = rowdata.getRowID(); 

    if(!distinctIds.contains(new Integer(rowID)))
    {
      distinctIds.add(rowID);
      inertDataInMysql(rowdata); //however you insert your data here
      System.out.println("Adding " + rowID);
    }
    }

You can use batch insert to further speed up your code by executing a commutative insert for many rows. See:

Source: http://stackoverflow.com/questions/15965360/removing-entire-rows-from-data-based-on-duplicates-in-a-column/15965402#15965402



Menelaos Bakopoulos

Mr. Menelaos Bakopoulos is currently pursuing his PhD both at Center for TeleInFrastruktur (CTiF) at Aalborg University (AAU) in Denmark and Athens Information Technology (AIT) in Athens, Greece. He received a Master in Information Technology and Telecommunications Systems from Athens Information Technology and a B.Sc. in Computer Science & Management Information Systems from the American College of Thessaloniki. Since April 2008 he has been a member of the Multimedia, Knowledge, and Web Technologies Group.

More Posts