
How can you update a big table efficiently?

Facts:

I have a big table in MySQL with roughly 5 million rows. Its schema is simple, and I want to update the entire “entities” column.

table big_tech:

[image: schema of table big_tech]
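The schema screenshot is not recoverable, but the two columns used throughout this post are “postid” and “entities”. A minimal sketch of what the table likely looks like; the column types and names besides those two are assumptions:

-- Hypothetical schema; only postid and entities are confirmed by the post.
CREATE TABLE big_tech (
  postid   VARCHAR(64) NOT NULL PRIMARY KEY,  -- row identifier used in the WHERE clause
  -- ... other columns omitted ...
  entities TEXT                                -- the column to be rewritten for all ~5M rows
);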

Solution 1: update each row directly

UPDATE big_tech SET entities = '…' WHERE postid = '…';

MySQL does not provide a batch update operation, so you have to update each row one by one. Even if you group multiple updates into a single transaction (as sketched below), it does not make much difference.
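For reference, a minimal sketch of grouping the per-row updates into one transaction; the postid values shown are placeholders:

-- Wrapping many single-row updates in one transaction saves some
-- per-statement commit overhead, but each row is still located and
-- rewritten individually.
START TRANSACTION;
UPDATE big_tech SET entities = '…' WHERE postid = 'p1';
UPDATE big_tech SET entities = '…' WHERE postid = 'p2';
-- ... millions more single-row updates ...
COMMIT;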

The problem: updating is fast at the beginning, but it slows down more and more as time goes on.

Solution 2: use batch insert

  • Create a new table with simple columns, and insert into it using the multi-row batch inserts MySQL supports natively (see the sketch after this list).


  • Then do a join update with table big_tech. The problem: overall this solution takes less time, but the batch insertion itself stops being efficient once the temp table grows big.
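A minimal sketch of this approach; the temp-table name, column types, and sample values are assumptions:

-- Hypothetical temp table holding the new values.
CREATE TABLE big_tech_tmp (
  postid   VARCHAR(64) NOT NULL PRIMARY KEY,
  entities TEXT
);

-- MySQL's multi-row INSERT writes many rows in one statement.
INSERT INTO big_tech_tmp (postid, entities) VALUES
  ('p1', '…'),
  ('p2', '…'),
  ('p3', '…');

-- Join update: copy the new values back into big_tech in one statement.
UPDATE big_tech b
JOIN big_tech_tmp t ON b.postid = t.postid
SET b.entities = t.entities;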

Solution 3: output to a CSV file and then load it into the database

  • Output “postid” and “entities” into a CSV file, which costs nearly no time;
  • Load this CSV file into a temp table in the database (see the sketch after this list);
  • Make a join update between the temp table and big_tech.
  • It is worth noting that in an international-language environment the encoding of the CSV file should match the encoding of the database; otherwise the load will run into problems when it meets special characters.
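A minimal sketch of the load step; the file path, character set, and CSV format clauses are assumptions:

-- Bulk-load the CSV into the temp table from Solution 2.
-- CHARACTER SET must match how the file was written (see the note above).
LOAD DATA LOCAL INFILE '/tmp/entities.csv'
INTO TABLE big_tech_tmp
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(postid, entities);

-- The final join update is the same as in Solution 2.
UPDATE big_tech b
JOIN big_tech_tmp t ON b.postid = t.postid
SET b.entities = t.entities;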

Conclusion:
Sometimes the most efficient way to do things is very simple. Whatever the approach, each is just a tool with its own pros and cons; it is the designer's job to use these tools appropriately to get things done efficiently.