Bill Mill web site logo

MySQL Partial Dump

At work, I'm trying to bring our testing from its current ad-hoc state into something more useful. Priority #1 at the moment is making what limited test suite we have into a practical, repeatable test of our main code library.

To do so, however, I need to create script to import a useful subset of the data from our nine gigabyte MySQL production database into a local database. Unfortunately, mysqldump seems not to support the export of only N rows from a database table, and neither does there seem to be a standard way to do so. In fact, there seems to be no standard way to automatically generate insert queries, so it's not even possible to hack it with OUTFILE tricks.

Indeed, a quick look look at the mysqldump source code shows that it just constructs a query of the form: "SELECT * FROM table [WHERE condition] [ORDER BY field]":

dynstr_append_checked(&query_string, 
    "SELECT /*!40001 SQL_NO_CACHE */ * FROM ");
dynstr_append_checked(&query_string, result_table);

if (where)
{
  /* snip */
  dynstr_append_checked(&query_string, " WHERE ");
  dynstr_append_checked(&query_string, where);
}
if (order_by)
{
  /* snip */
  dynstr_append_checked(&query_string, " ORDER BY ");
  dynstr_append_checked(&query_string, order_by);
}

And then pieces together an insert string manually:

while ((row= mysql_fetch_row(res)))
{
/* snip */
  for (i= 0; i < mysql_num_fields(res); i++)
  {
    /* snip */
    if (row[i])
    {
      if (!IS_NUM_FIELD(field))
      {
        /* snip */
        unescape(md_result_file, row[i], length);
        /* snip */
      }
      else
      {
        /* change any strings ("inf", "-inf", "nan") into NULL */
        char *ptr= row[i];
        /* snip */
        else if (my_isalpha(charset_info, *ptr) ||
                 (*ptr == '-' && my_isalpha(charset_info, ptr[1])))
          fputs("NULL", md_result_file);
        else if (field->type == MYSQL_TYPE_DECIMAL)
        {
          /* add " signs around */
          fputc('\'', md_result_file);
          fputs(ptr, md_result_file);
          fputc('\'', md_result_file);
        }
        else
          fputs(ptr, md_result_file);
      }
    }
  }
}

So, rather than deal with hacking the rather baroque C (in a proprietary source control that I've never used, no less) to add another option to limit output rows, I threw together a little python function that uses MySQLdb to do what I want:

"""
get_n_rows retrieves n rows from a given table on MySQLdb connection conn

conn:  open MySQLdb connection object
table: string; name of the table which we are generating queries for
n:     int; number of rows to output from table

Legal kwargs:
    reverse: bool; whether to return rows in reverse order (requires idCol)
    idCol:   string; results will be sorted on this column if reverse is 
             True
    where:   string; provides the "where" clause of the select query
             NOTE: escape this text yourself! use connection.escape() .

Released under WTFPL (http://sam.zoy.org/wtfpl/) - use this code as you 
wish. Note that it's not tested at all, and certainly won't handle blob 
fields. That said, it worked for my purposes today.

bill.mill@gmail.com 8/17/07
"""
def get_n_rows(conn, table, n, **kwargs):
    cur = conn.cursor()

    #TODO: copy mysqldump's table escaping function
    sql = "select * from `%s` " % (table)
    if "where" in kwargs:
        sql += "where %s " % (kwargs["where"])
    if "reverse" in kwargs and kwargs["reverse"]:
        sql += "order by `%s` desc " % (kwargs["idCol"])
    sql += "limit %s;"
    cur.execute(sql, (n, ))

    i = 0
    row = cur.fetchone()
    insert_stmts = [] 
    while row and i < n:
        i += 1
        #note that the MySQLdb source suggests that conn.literal is 
        # private; I see no problem with using it, as it just loads the 
        # default escaping functions. Caveat Emptor.
        strow = ", ".join([conn.literal(o) for o in row])
        insert_stmts.append("(%s)" % (strow))
        row = cur.fetchone()

    cur.close()
    
    return """
-------- INSERT DATA FOR TABLE %(table)s -----------
    
    LOCK TABLES `%(table)s` WRITE;
    DELETE FROM `%(table)s`;
    INSERT INTO `%(table)s` VALUES %(sql)s;
    UNLOCK TABLES;

-------- END INSERT DATA FOR TABLE %(table)s -----------
""" % {"table": table, "sql": ",".join(insert_stmts)}

On a side note, the MySQLdb docs are somewhat hard to track down; I found them after some effort here.