Custom VB.Net scripting in SQL Server Integration Services

May 21st, 2008
Get the power of Visual Basic.Net within your SQL Server Integration Services (SSIS) package by adding the Script component. In this tip, you'll get the step-by-step method on how to add a Script component and extend any SSIS package.

SQL Server backup and restore commands to limit downtime

May 21st, 2008
Whatever SQL Server backup and restore solution you choose, you must be sure to back up your data in an acceptable amount of time and restore it in an acceptable amount of time. SQL Server MVP Hilary Cotter shares some best practices for using backup and restore commands.

MySQL Workbench and NetBeans

May 21st, 2008

I just returned from a meeting with the great Sun people in Prague. The main topic of our discussions was the database support in NetBeans and their MySQL support in general. To have the best possible database integration in all of Sun’s products is one thing that is high on my personal agenda.

As the current resources working on the NetBeans database integration are limited we looked into utilizing what we have done in MySQL Workbench to get a more powerful database interface into the hands of NetBeans users. But there are a few things that make this approach quite difficult. First, the two tools are using different development languages. Just to name a few. NetBeans is written in Java while MySQL Workbench is written in C/C+. MySQL Workbench does not include any query functionality yet and we are still working on the multi-platform support.

Still, we had good discussions and are currently working out a plan to overcome all these obstacles and help NetBeans users have an improved database experience.

While in Prague, Kaj Arnö and I gave talks at the Prague University. While Kaj was covering all MySQL related topics I was focusing on MySQL Workbench of course. People there asked a lot of questions and the feedback we got was pretty good.

We are currently working to get the next bug-fix release of WB 5.0 built. The release should happen soon.

In parallel we are making progress with WB 5.1, namely the Linux port. Maksym has GRT module loading working and is continuing to write makefiles and adjustments to the code to making everything compile. Stay tuned.

How Todd Hoff learned to stop worrying and use lots of disk space to scale

May 21st, 2008

Todd Hoff, who apparently learned a hell of a lot during a short stint at Yahoo followed by some startups has an extremely well-written and edutaining article about how scaling to a million or more users requires jettisoning more or less everything we know and love about relational modeling.

Even though he uses bigtable (Google’s distributed hash storage system) as his example, in reality this approach works well with relational datastores like MySQL and Oracle too, you just have to think about your data differently and use the databases differently. So I’m including this article in the MySQL and Oracle categories because I think it would be of interest.

Here’s a taste of how it reads:

How do you structure your database using a distributed hash table like BigTable? The answer isn’t what you might expect. If you were thinking of translating relational models directly to BigTable then think again. The best way to implement joins with BigTable is: don’t. You–pause for dramatic effect–duplicate data instead of normalize it. *shudder*

Flickr anticipated this design in their architecture when they chose to duplicate comments in both the commentor and the commentee user shards rather than create a separate comment relation. I don’t know how that decision was made, but it must have gone against every fiber in their relational bones…

But Flickr?s reasoning was genius. To scale you need to partition. User data must spread across the shards. So where do comments belong in a scalable architecture?

The answer is, in case you aren’t following yet, you store it everywhere you might need it and worry about keeping your multiple copies in sync later, if at all.

BigTable data ethics are more Mardi Gras than dinner with the in-laws. Data just wants to have fun. BigTable won?t stop you from hurting yourself. And to get the best results you may have to engage in some conventionally risky behaviors. But if those are the glass bead necklaces you have to give for a peak at scalability, why not take a walk on the wild side?

So anyway, this is awesome stuff and thanks Todd. For your reading and learning enjoyment: Todd Hoff’s “How I learned to stop worrying and use lots of disk space to scale”.

snafu with MySQL relay log path - the why and the fix

May 21st, 2008
Referred to by Launchpad Bug #119271 and MySQL Bug#28850, MySQL installations get bitten after an upgrade, if they were acting as a replication slave. However, the actually root cause is not an upgrade.

If you simply set up say Ubuntu Feisty, you'll encounter the same problem. If you set up as a slave, the server uses a relay log. In the affected versions, its put under /var/run. That's a serious snafu, because /var/run is generally on tmpfs and a) very small, and b) gets wiped on a restart. Only runtime foo like .pid files should be under /var/run (as per LSB, Linux Standards Base).
Anyway, the "gets wiped on restart" is where new installations get bitten, although the error is of course the same as on an upgrade where the path changes from /var/lib/mysql: the server simply can't find the relay logs it thought it had.
If you have a perfectly running replication slave, shut it down and restart the machine, you'll have broken replication anyway. One of my students encountered this in the MySQL Replication Workshop last week, just after another student had brought up an error they'd spotted in the logs of their production system.

The issue has only recently been resolved in Ubuntu (5.0.51a), upstream (MySQL has it pushed for 5.0.54), and the problem appears to be fairly prolific among the various distros. CentOS also has something on this. From the comments in all of these, it appears that the basic problem is, although simple, not actually that well understood. The MySQL relay logs should NOT be deleted (or vanish through other means) on a restart. That's all. Simple.

So how did it happen? I first thought the prob was restricted to Debian (from which Ubuntu is derived), but the Debian/Ubuntu my.cnf files have not changed and actually don't contain a relay-log entry. The problem originated upstream at MySQL where the default path for the PID file was changed from the datadir to /var/run/mysqld. That in itself was a correct move (again, for LSB compliance), but elsewhere in the code the base path for the relay log got derived from the path of the PID file, and that's where the trouble originates.
The change of the PID default path was not really noticed anywhere, as most distros already put it in /var/run/mysqld through their default my.cnf file. But since none of them explicitly specifies a relay-log path, it gets put where the compiled-in defaults tell it to go. Kaboom.

So, coding error. The path should not have been derived from the PID base, but programmers are human ;-)
I do wonder why MySQL's QA didn't spot this, they're a fairly thorough bunch.

Another thing to note is that if you have an affected version (any distro, or direct from MySQL), the quick fix is to just put some extra lines in your my.cnf:
# We're fixing up the paths for the relay log infrastructure (repl.slaves)
# 2008-05-21 by a r j e n (at) o p e n q u e r y (dot) c o m (dot) a u
# NOTE 1: adapt the filenames to whatever they currently are on disk!
# The filenames may depend on your hostname or distro specifics.
# NOTE 2: If you built your MySQL server from source,
# or if you installed from the binary tarball,
# your data path will be different from /var/lib/mysql,
# such as /usr/local/mysql/data. Check and adapt.
relay-log = /var/lib/mysql/relay-bin
# let's do these too, just in case
relay-log-index = /var/lib/mysql/relay-bin.index
relay-log-info-file = /var/lib/mysql/relay-bin.info
Then your replication slave universe should return to operate within normal parameters. Essentially, it un-breaks replication ;-)

Actually I think that relay-log-info-file probably used the datadir or the binary log base as its default path already, and that would be why the server would think that the relay logs are somewhere when they're not: the relay-log.info file would contain the current active log filename and position. With the log files disappearing.... you understand. But just in case, make it all explicit and thus prevent problems of this nature.

OpenOffice.org MySQL connectivity: update

May 21st, 2008
A while ago I wrote about native MySQL connectivy for OpenOffice.org which Georg Richter had spent a lot of time on, but which then got stuck in some licensing foo. That blog entry generated a number of responses from former MySQL colleagues, noting that something was in the works. A bit later Georg himself commented:
Arjen,

sorry, but I didn't find the time to blog about - I had to write some code :-)

The license of the code will be LGPL 3.0 (like OpenOffice) - it still uses libmysql. The code will be available via cvs.openoffice.org pretty soon.

If you're interested in testing some binaries, check out the new preview version at http://forge.mysql.com/wiki/Connector_OpenOffice

/Georg
So that's cool, and everybody please do check it out!

It's still not clear to me how this actually works in terms of licensing, if it still uses libmysql. Ohwell, I guess it doesn't really matter to users of OOo (LGPL+GPL is fine for us) and the most important thing is that it'll be in the main OOo codebase soon. How Sun makes it jive for StarOffice is entirely their problem ;-)

Stuff I did not know about… POSIX join command

May 21st, 2008
One of the comments to last week's scribble on joins referred to the POSIX join command. What does this mean? There is a commandline tool called 'join' available on most Unix-ish boxes, including Linux and OS X. From my own Mac ("man 1 join"):
NAME
join -- relational database operator

SYNOPSIS
join [-a file_number | -v file_number] [-e string] [-o list] [-t char]
[-1 field] [-2 field] file1 file2
The commenter also mentions this article which explores the command some more with examples.

It's not MySQL, but it looks very useful. Not all data should be in a db, sometimes a textfile is perfectly sensible, and sometimes you just need an operator like this. And it already exists. Coolness.

Arjen gets interviewed

May 21st, 2008
Sam Varghese of iTWire invited me for a broad interview when I was teaching in Melbourne last week, it's now published (MySQL: the Australian connection).

Customers versus users: a distinction

May 21st, 2008

I just got around to reading Stephen O’Grady’s post on the relative openness of open source vendors and realized I had failed to be as clear as I could have been in my original post on the subject.

Responding to my note about Milking the GNU’s suggestion that a new independent organization could be formed to judge vendors on their level of openness, Stephen wrote:

“Those in the industry that might care have, I would argue, already formed their opinions on whether or not a project such as MySQL?s is or is not open source. And those outside the industry, well, I don?t expect they?d care. At all. Most of the enterprises I speak with are still struggling with the basics of what open source is and what it means.”

He is of course right that most enterprise adopters would not care about an openness index, and in fact such a thing could actually cause more harm than good by confusing potential adopters. However enterprises were not the potential audience that I (or I believe MTG) envisioned for what MTG called the Equitable Open Source label.

The potential audience I was referring to in the post was the “community edition” user base, which is a distinct* audience from paying customers and one that needs to be addressed differently by vendors. It is this audience that Stephen was referring to as “those in the industry that might care”.

As to whether they have already made up their minds about the openness of specific vendors, things can and do change and that is why vendors may find that a self-imposed social contract or externally-imposed openness index becomes a ongoing factor in community (as opposed to customer) relations.

*They do, of course potentially overlap - one can move from being a community user to a paying customer and back again, while a paying customer can also simultaneously be a community user, depending on the implementation - however the needs of that customer/user differ depending on the nature of their relationship with the vendor with respect to that implementation.

MySQL: How do you install innotop to monitor innodb in real time?

May 21st, 2008
Innotop is a very useful tool to monitor innodb information in real time. This tool is written by Baron Schwartz who is also an author of “High Performance MySQL, Second edition” book. [Side note: I highly recommend getting this book when it comes out (in June, 08?). Other authors include: Peter Zaitsev, Jeremy [...]