Multiple insert / update (bulk)

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Multiple insert / update (bulk)

Ken Chau
Has anyone here considered what it would be like to have something for bulk updates or inserts? Currently, the Zend_DB way is to:

$tb = new Table;

for ($i = 0; $i < 500; $i++) {
  $row = $tb->createRow();
  $row->blah = 'blah';
  $row->save();
}

// I propose something like this

$count = 5; // count of rows
$tb = new Table;

$rowset = $tb->createRowSet($count);
$rowset->save();
Reply | Threaded
Open this post in threaded view
|

Re: Multiple insert / update (bulk)

Ken Chau
The real power comes when you can do bulk inserts and updates where all the rows contain different data. So, this will cause ONE insert statement to occur vs. many inserts.

lqqkout4elfy wrote
Has anyone here considered what it would be like to have something for bulk updates or inserts? Currently, the Zend_DB way is to:

$tb = new Table;

for ($i = 0; $i < 500; $i++) {
  $row = $tb->createRow();
  $row->blah = 'blah';
  $row->save();
}

// I propose something like this

$count = 5; // count of rows
$tb = new Table;

$rowset = $tb->createRowSet($count);
$rowset->save();
Reply | Threaded
Open this post in threaded view
|

Re: Multiple insert / update (bulk)

Ken Chau
I assume the sequence (auto_increment) field can be ignored and will be automatically incremented on save(). I would imagine after save, you would be able to retrieve the ids then.


lqqkout4elfy wrote
The real power comes when you can do bulk inserts and updates where all the rows contain different data. So, this will cause ONE insert statement to occur vs. many inserts.

lqqkout4elfy wrote
Has anyone here considered what it would be like to have something for bulk updates or inserts? Currently, the Zend_DB way is to:

$tb = new Table;

for ($i = 0; $i < 500; $i++) {
  $row = $tb->createRow();
  $row->blah = 'blah';
  $row->save();
}

// I propose something like this

$count = 5; // count of rows
$tb = new Table;

$rowset = $tb->createRowSet($count);
$rowset->save();
Reply | Threaded
Open this post in threaded view
|

Re: Multiple insert / update (bulk)

Ken Chau
Bill K, can you take a look at this suggestion? I wouldn't even mind making this in Zend_DB, but I just want to know who's in charge now?
Reply | Threaded
Open this post in threaded view
|

Re: Multiple insert / update (bulk)

BillKarwin
Ken Chau wrote
Bill K, can you take a look at this suggestion? I wouldn't even mind making this in Zend_DB, but I just want to know who's in charge now?
Wil Sinclair is the project lead for the Zend Framework.  I no longer work for Zend.  However, I am happy to answer questions as I have time.  Here's my opinion on this issue:

Rowsets are basically a collection object, so I would add methods to that class to allow rows to be added to the set.  So you should be able to do this:

  $rowset = $table->createRowset(); // creates a rowset collection with zero rows
  $row = $table->createRow(); // creates one row with unset values
  $rowset->addRow($row); // adds one row to the rowset
  $rowset->save(); // iterates over the set of rows, calling save() on each row

It makes no sense to pass an integer to createRowset() to create N empty rows.  You would just have to iterate through them to populate them with values anyway.  So you might as well write a loop to create and populate individual rows with application data, and then add them to the collection.

  $rowset = $table->createRowset();
  foreach ($appData as $tuple) {
    $row = $table->createRow($tuple);
    $rowset->addRow($row);
  }
  $rowset->save();

It does make sense to allow an array of arrays to be passed to createRowset(), since this would be consistent with the usage of passing a tuple to createRow().

  $rowset = $table->createRowset($appData); // pass array of tuples

This would perform the same loop as the previous example above (except for the save() at the end), creating a new rowset of new rows, ready to be save()d.

There are two ways in SQL to improve the efficiency of inserting data:  

1. Use a single INSERT statement with multiple rows:

  INSERT INTO t (col1, col2, col3) VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9);

2. Prepare an INSERT statement and execute it multiple times:

  PREPARE INSERT INTO t (col1, col2, col3) VALUES (?, ?, ?);
  EXECUTE 1, 2, 3
  EXECUTE 4, 5, 6
  EXECUTE 7, 8, 9

However, supporting either of these improvements would add complexity to the Row and Rowset classes.  This is due to the internal way the current Zend_Db_Table_Row class differentiates between a row that needs to be INSERTed or UPDATEd when you call save().  This distinction is encapsulated by the Row object, so the Rowset doesn't know if the individual rows are new rows or modified copies of existing rows.  Therefore for the Rowset class to offer a multi-row save() method that uses more efficient SQL, the management of dirty data would have to be totally refactored.  The easier solution is for the Rowset to iterate over its rows, calling save() on each one.  This is better for OO encapsulation, though it doesn't help optimize SQL for inserting a rowset.

In any case, it's really rare to bulk-load many rows of data in a typical web request, when there's the greatest need for efficient SQL.  The difference in efficiency for a small number of rows is small, so it would be a noticeable improvement only if you're bulk-loading a huge number of rows.  If that's the case, you shouldn't be using INSERT anyway, you should be using MySQL's LOAD DATA statement, or equivalent feature if you use another RDBMS brand.  INSERT is not usually the most efficient choice for loading lots of data.

Regarding returning auto-generated keys, I wouldn't bother.  Notice that if you use plain SQL (in the mysql CLI for example), and you insert multiple rows in a single INSERT statement, you can only get the last generated id value, not the id values for all rows inserted.  This is SQL behavior; it's true for any language or any framework.

  INSERT INTO t (col1, col2, col3) VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9);
  SELECT LAST_INSERT_ID(); -- returns only the id for the third tuple

If you do need the id for each row, you should write a loop and insert the rows one at a time, retrieving the generated id after each row inserted.