At work, I'm trying to bring our testing from its current ad-hoc state into something more useful. Priority #1 at the moment is making what limited test suite we have into a practical, repeatable test of our main code library.
To do so, however, I need to create script to import a useful subset of the data from our nine gigabyte MySQL production database into a local database. Unfortunately, mysqldump seems not to support the export of only N rows from a database table, and neither does there seem to be a standard way to do so. In fact, there seems to be no standard way to automatically generate insert queries, so it's not even possible to hack it with OUTFILE tricks.
Indeed, a quick look look at the mysqldump source code shows that it just constructs a query of the form: "SELECT * FROM table [WHERE condition] [ORDER BY field]":
dynstr_append_checked(&query_string,
"SELECT /*!40001 SQL_NO_CACHE */ * FROM ");
dynstr_append_checked(&query_string, result_table);
if (where)
{
/* snip */
dynstr_append_checked(&query_string, " WHERE ");
dynstr_append_checked(&query_string, where);
}
if (order_by)
{
/* snip */
dynstr_append_checked(&query_string, " ORDER BY ");
dynstr_append_checked(&query_string, order_by);
}
And then pieces together an insert string manually:
while ((row= mysql_fetch_row(res)))
{
/* snip */
for (i= 0; i < mysql_num_fields(res); i++)
{
/* snip */
if (row[i])
{
if (!IS_NUM_FIELD(field))
{
/* snip */
unescape(md_result_file, row[i], length);
/* snip */
}
else
{
/* change any strings ("inf", "-inf", "nan") into NULL */
char *ptr= row[i];
/* snip */
else if (my_isalpha(charset_info, *ptr) ||
(*ptr == '-' && my_isalpha(charset_info, ptr[1])))
fputs("NULL", md_result_file);
else if (field->type == MYSQL_TYPE_DECIMAL)
{
/* add " signs around */
fputc('\'', md_result_file);
fputs(ptr, md_result_file);
fputc('\'', md_result_file);
}
else
fputs(ptr, md_result_file);
}
}
}
}
So, rather than deal with hacking the rather baroque C (in a proprietary source control that I've never used, no less) to add another option to limit output rows, I threw together a little python function that uses MySQLdb to do what I want:
"""
get_n_rows retrieves n rows from a given table on MySQLdb connection conn
conn: open MySQLdb connection object
table: string; name of the table which we are generating queries for
n: int; number of rows to output from table
Legal kwargs:
reverse: bool; whether to return rows in reverse order (requires idCol)
idCol: string; results will be sorted on this column if reverse is
True
where: string; provides the "where" clause of the select query
NOTE: escape this text yourself! use connection.escape() .
Released under WTFPL (http://sam.zoy.org/wtfpl/) - use this code as you
wish. Note that it's not tested at all, and certainly won't handle blob
fields. That said, it worked for my purposes today.
bill.mill@gmail.com 8/17/07
"""
def get_n_rows(conn, table, n, **kwargs):
cur = conn.cursor()
#TODO: copy mysqldump's table escaping function
sql = "select * from `%s` " % (table)
if "where" in kwargs:
sql += "where %s " % (kwargs["where"])
if "reverse" in kwargs and kwargs["reverse"]:
sql += "order by `%s` desc " % (kwargs["idCol"])
sql += "limit %s;"
cur.execute(sql, (n, ))
i = 0
row = cur.fetchone()
insert_stmts = []
while row and i < n:
i += 1
#note that the MySQLdb source suggests that conn.literal is
# private; I see no problem with using it, as it just loads the
# default escaping functions. Caveat Emptor.
strow = ", ".join([conn.literal(o) for o in row])
insert_stmts.append("(%s)" % (strow))
row = cur.fetchone()
cur.close()
return """
-------- INSERT DATA FOR TABLE %(table)s -----------
LOCK TABLES `%(table)s` WRITE;
DELETE FROM `%(table)s`;
INSERT INTO `%(table)s` VALUES %(sql)s;
UNLOCK TABLES;
-------- END INSERT DATA FOR TABLE %(table)s -----------
""" % {"table": table, "sql": ",".join(insert_stmts)}
On a side note, the MySQLdb docs are somewhat hard to track down; I found them after some effort here.