This one question has been in my mind for quite some time already. I mean, everyone uses SMTP (knowingly or not) when sending out emails and everyone sending emails also knows what SPAM is and receives SPAM messages.
However, few know how old SMTP actually is, and that, even though it serves everyone well, it was designed at a time when everyone thought of Spam as canned meat. Back in 1982 SMTP was a great achievement and a lot of kudos should go to its creators, but now, in 2008, SMTP has become more of a liability than a great tool.
Originally, I wanted to write a single article covering all shortcomings of SMTP and possible solutions to these problems, but while writing the article a lot of text came up, so this is the first of two articles I am going to write on this topic. The first part is about the problems with SMTP and why the fix-ups for SMTP, even though they do work to some extent, are not a proper solution to today's issues.
Due to the way SMTP was designed and the way the Internet was back then, it is prone to various things, like SPAM messages, sender spoofing, data manipulation and so forth. A few attempts have been made at fixing some of the shortcomings of SMTP, like ESMTPA (SMTP-AUTH), SPF, Callback Verification, and DKIM, but none of them has really fixed all the problems that exist, and all of these modifications are in my opinion mere workarounds. Let us have a look at why SPF, Callback Verification and DKIM each fail to fix all the problems SMTP has right now.
SPF
First let's have a quick look at how SPF works: When an email is received a special TXT DNS record at the sender's domain is used to verify that the sending computer (using its IP address) of an email is actually allowed to send email for a domain. A great mechanism that in theory would work perfectly well. Reality is a bit different though.
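To make the idea a bit more concrete, here is a minimal sketch of the IP check in Python. It is not a full SPF implementation: it only understands the "ip4", "ip6" and "all" mechanisms, it skips the DNS lookup entirely (the record is passed in as a string), and the function name and the example record are made up for illustration.

```python
import ipaddress

# Map SPF qualifiers to result names.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check(record, sender_ip):
    """Evaluate a (very) simplified SPF record against a sender IP.

    Only ip4/ip6/all are handled; real SPF also has a, mx, include,
    redirect and more, which this sketch silently ignores.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # the domain publishes no usable SPF record

    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is given
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        elif name == "all":
            return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and there was no "all" term


# Hypothetical record: only hosts in 192.0.2.0/24 may send mail.
EXAMPLE_RECORD = "v=spf1 ip4:192.0.2.0/24 -all"
```

With this record, spf_check(EXAMPLE_RECORD, "192.0.2.10") passes while any address outside that network fails, which is exactly the situation a forwarding host or a misconfigured sender ends up in.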
There are some domains which are actually using SPF and do have valid SPF records on their DNS servers. However, those are only some of the millions of domains on the Internet. How should one treat emails coming from a domain without an SPF record? The messages could be real, non-SPAM messages that should be delivered, but they could just as well be SPAM. Also, the more people start using SPF, the more likely it becomes that spammers are simply going to use sender domains which do not have SPF records.
Also, there are some organizations that have domains with improperly configured SPF records and there are even well-known ones, such as Technorati (I covered this in one of my articles). So one cannot even trust SPF records and valuable messages could be lost if a mail server is configured to drop all messages for which SPF authentication fails.
And there is a third problem: sending emails from places other than your default one (office, home, etc.) when ISPs do not allow external users to use their SMTP servers (not even with authentication). A good example of this would be Austrian ISP UPC (their SMTP server tells me that the AUTH extension was not advertised, even though it was; long story short, I cannot log in from outside) and I am quite sure there are a lot of others.
And I can come up with yet another problem: What about email relaying? Think about downloading all messages from all your email accounts into a single one, using fetchmail for example. This makes SPF useless, as no checks can be done anymore, due to the sending system's IP address not being the original sender. If one assumes that every mail server uses SPF this is not a problem, but I like doing my checks on my server rather than relying on some other server.
Maybe there are even more problems with SPF, such as what to do when an email is received from a nonexistent domain or when there is a temporary DNS failure on either side, but the ones listed above are those I am confronted with most often.
Callback Verification
On to the next topic. Callback verification is a simple method used by mail servers to try verifying that the sender actually exists. Whilst this works for some SPAM messages which use non-existent senders, it does not help much as soon as the sender address does exist, and it does not even matter if the message was actually sent by the user owning the address. I guess there is nothing to add, even though it is a nice method to get rid of some spam it does not help with a lot of such messages.
DKIM
DKIM, or DomainKeys Identified Mail (a successor to DomainKeys), is a method that is not meant to prevent abuse (such as SPAM), but rather to make tracking abuse easier. It works by the sender (usually the sender's MSA acting on his behalf) adding another header, "DKIM-Signature", which contains a cryptographic signature covering a hash of the message body and those header fields the signer chose to include. The signature is generated using public-key cryptography: the private key stays with the signing server, while the public key is stored in a DNS TXT record and can thus be used by the receiving end to verify that the message contents have not been tampered with during transport and that the mail actually originated from that domain.
This method, even though it is one of the most advanced ones today, is prone to replay attacks, and it only protects the header fields that were signed; it says nothing about the SMTP envelope. In short this means that even though the message body cannot be modified without the receiving end detecting the modification, unsigned headers and the envelope recipient can, and thus the message can be redirected. Also, it is possible to intercept the transmission of a message, generate a thousand copies with the same content but a different envelope recipient and this way flood a mailbox with a message that would pass DKIM verification.
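The body part of that check can be sketched in a few lines of Python. As far as I understand the DKIM specification, the signature header carries a base64-encoded hash of the canonicalized body (the "bh=" tag). The sketch below assumes SHA-256 and the "simple" body canonicalization; it leaves out the header handling and the actual RSA signature, and the function name is my own.

```python
import base64
import hashlib


def simple_body_hash(body):
    """Return the base64 SHA-256 hash of a body after "simple"
    canonicalization: trailing empty lines are dropped and the body
    ends in exactly one CRLF. `body` must be bytes with CRLF line ends.
    """
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]          # drop one trailing empty line
    if not body.endswith(b"\r\n"):
        body += b"\r\n"           # an empty body becomes a single CRLF
    digest = hashlib.sha256(body).digest()
    return base64.b64encode(digest).decode("ascii")


original = b"Hello Bob,\r\nsee you at noon.\r\n"
tampered = b"Hello Bob,\r\nsee you at nine.\r\n"
```

The receiver recomputes this value and compares it with the one in the signature. Changing a single word changes the hash, but sending the very same body to a thousand other recipients does not, which is why replaying a signed message works.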
The email relaying problem that's present in SPF is not a problem here anymore, but the mobile-mail, the nonexistent domain and the DKIM-not-in-use problem still exist.
Also, DKIM seems not to be used by a lot of email servers on the Internet. Thinking about it for a second I can just come up with two names of well-known organizations using DKIM: Google and Yahoo.
The message format
The next part of this article is about the SMTP message format. This part is not directly related to the SPAM problem, but should provide you with some more information that verifies that SMTP is outdated nowadays.
RFC822 messages (or emails) usually consist of two parts: a message header and a message body. Originally these messages were designed to contain 7-bit ASCII data, which is plain text. This means that there were only 128 different characters which could be transferred via email, without support for special characters, like German "Umlaut" characters. A solution was then developed, not only to support special characters in emails, but also to support transferring binary data (such as images).
MIME is the name of this solution, and it enables every one of us to send binary attachments and special characters via email today. MIME allows the email client to include more than just 7-bit plain text messages, including attachments. This is achieved by a special header, "MIME-Version", which indicates that the contents of a message are MIME encoded. This header is then followed by a "Content-Type" header, identifying the type of content. For simple messages just consisting of a message body this would be "text/plain", telling the client that there is just text in the mail.
However, how can emails then consist of both text and attachments? Well, there is a special value for the "Content-Type" header: "multipart/mixed". This one indicates that there are several parts of a message, and every part comes with a separate "Content-Type" header. This way contents of a message can be organized in a tree, for example, containing the message body and a forwarded message.
An attachment is added by specifying an additional part of the message, usually with a "Content-Transfer-Encoding: base64" header, that says that the data has been base64 encoded. This way binary data can be represented using 7-bit ASCII.
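Here is a small sketch of what this looks like in practice, using the email package from Python's standard library. The addresses, subject and file name are placeholders, and build_message is just a helper name I picked for this example.

```python
from email.message import EmailMessage


def build_message(text, attachment, filename):
    """Build a message with a plain-text body and one binary attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"   # placeholder addresses
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Report"
    msg.set_content(text)               # becomes the text/plain part
    # Adding an attachment turns the message into multipart/mixed;
    # the binary data is base64 encoded by default.
    msg.add_attachment(
        attachment,
        maintype="application",
        subtype="octet-stream",
        filename=filename,
    )
    return msg


msg = build_message("Hello Bob", b"\x00\x01\x02\xff", "data.bin")
for part in msg.iter_parts():
    print(part.get_content_type(), part["Content-Transfer-Encoding"])
```

Printing str(msg) shows the whole thing as it goes over the wire: one set of headers, followed by a single body in which the parts are merely separated by a boundary line.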
But what does that mean? First of all, even though a message is split into several parts, there is only one body. Now if you are downloading a message via POP3 for example, there is no way of only downloading the actual text. You always need to download the whole message. Everyone knows this situation: You are downloading a message and have to wait for all attachments to be downloaded, even though you might not be interested in those attachments at all.
Also, encoding binary data using base64 creates a lot of overhead, as every byte of input can have 256 different values, whereas every character of output can only have 64. Base64 therefore needs four characters for every three bytes, and the line breaks required in emails add a little more. Talking numbers here, this means that messages encoded using base64 are usually about 137% the size of the data they contain.
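The number is easy to check. The following sketch assumes the usual MIME layout of 76 base64 characters per line, each line terminated by CRLF; the function name is mine.

```python
import base64
import math


def encoded_size(n_bytes, line_length=76):
    """Return (size without line breaks, size with CRLF line breaks)
    of n_bytes of binary data after base64 encoding."""
    b64_len = len(base64.b64encode(bytes(n_bytes)))
    lines = math.ceil(b64_len / line_length)
    return b64_len, b64_len + 2 * lines   # 2 extra bytes (CRLF) per line


raw = 57000                      # 1000 full lines of 57 input bytes each
plain, wrapped = encoded_size(raw)
print(plain / raw, wrapped / raw)
```

Without line breaks the ratio is 4/3 (about 133%); with a CRLF after every 76 characters it becomes 78/57, which is where the roughly 137% comes from.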
Conclusion
I hope that I have shown you what the problems with SMTP are right now. SMTP initially was designed to transport only text and had no way of verifying either the sender of a message or the integrity of data. Some workarounds have been created to get rid of these issues, but even though some helped a lot, none has really fixed any of those problems.
Also, one should never forget how much time and money has been used to try fixing SMTP, whilst a lot less money might have been sufficient for creating something new, something better, something that is built for the needs of the Internet today, and not for the needs of the Internet back in 1982.
Personally I believe that the days of SMTP are long over and that there is need of a proper replacement. I do understand that SMTP and the current email infrastructure are still in use because an infrastructure exists, but SMTP really deserves being retired, after serving us pretty well for more than 25 years.
The next part in this series of articles will be about what my idea of a successor of SMTP and the whole email infrastructure is, what it could look like and also how it could work. So stay tuned.
I finally did it. I modified my Exim's configuration to reject any mail with an OOXML attachment (ie. docx, pptx, xlsx).
There are two main reasons for this step. First of all I am not able to open these files and I believe I will not be able to do so and get them properly rendered anytime soon. Secondly, people using the new Microsoft Office suite seem to be ignorant enough to think everyone is able to view those files, which is not the case.
I am trying to make one point here:
People sending emails to other people should always send files in internationally standardized formats (open formats), such as ODF or PDF, so that everyone is able to open them and use the attachments. Also, I am trying to make people sending out emails in those formats aware of the fact that not everyone can open them, not everyone wants to invest a lot of money in new applications and that some people generally prefer Free Software and that there is no way of using those files using Free Software right now.
Enough for the introduction, I wanted to explain how to achieve this behavior using Exim4:
deny message = Message contains attachment of unwanted type ($found_extension)
demime = docx:pptx:xlsx
The Software Freedom Law Center, known for providing pro bono legal assistance to Free Software projects, announced the formation of Moglen Ravicher LLC, a law firm also providing services to for-profit clients.
"We are pleased to extend the services of the Software Freedom Law Center to companies that support software freedom," said Eben Moglen, founding director of SFLC.
Moglen Ravicher LLC is fully owned by the Software Freedom Law Center, and all profits will go to support SFLC's operations. Clients of Moglen Ravicher LLC will receive legal counsel from the same attorneys that staff the Software Freedom Law Center.
An initial client of Moglen Ravicher LLC is OpenNMS, an open source enterprise grade network management platform. OpenNMS has retained the firm for representation regarding violations of the GNU General Public License (GPL).
Just in case you do not know yet: today is Document Freedom Day.
Today is Document Freedom Day: Roughly 200 teams from more than 60 countries worldwide are organising local activities to raise awareness for Document Freedom and Open Standards.
I found a solution to the problem last described in this article.
To sum up the problem I was experiencing: my anti-spam system (namely SpamAssassin) did not detect spam mails anymore.
Now here is the reason it did not: After some more investigation of the problem I noticed that spam emails were received via a local connection (forwarded from fetchmail). However, one of my Exim ACLs says not to scan emails from localhost for spam.
So, the solution might be a hack, but it worked out perfectly. Starting fetchmail with the -S <servername> argument causes it to send emails to the given SMTP server rather than localhost. Using the real hostname of my server caused the "do not scan local mails" not to kick in and all mails received via fetchmail to be scanned again.
Problem fixed.
And yet another post today. As I am planning to take down my personal server in the next few weeks (maybe months) I have moved my blog to wordpress.com. A 301-redirect has been set up at http://sp.or.at/blog so people (and robots) are still able to find my blog.
As I was looking into problems with my mail server I noticed one more thing: I was wondering why I did not receive password recovery emails from Technorati. It seems as if they are not obeying their own SPF rules:
2008-03-25 14:46:23 H=nat-365m.technorati.com (t120.technorati.com) [208.66.64.4] F= rejected RCPT : Not authorized by SPF
Now I am wondering why someone sets up SPF for his mail domain when he is in fact sending emails from other IP addresses as well. Time to update your SPF rules Technorati...
After writing my last article, I started digging into my mail configuration and after doing a quick "mailq" noticed a lot of frozen messages in Exim's queue. After inspecting the logs and the mails themselves I noticed the problem was caused by a broken POP server I retrieve mails from periodically. A few days ago something went wrong on that server and all messages were marked as unread causing my fetchmail to re-fetch all of them (about 2.5K).
Now that my mail server is configured to do sender verification, and since a few very old mails came from domains or systems which no longer exist, about 50 mails ended up being frozen.
But how to remove all frozen mails from Exim's queue? I ended up using mailq | grep frozen to get a list of all messages (and more importantly their message IDs) and saved that to a file. I then wrote a minimalistic Python script attached to this article to delete all those messages. Consider the script a quick and dirty hack, but it might come in handy for some of you. Get it here.
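For those who cannot get hold of the script, here is a rough sketch of the same idea. It is not the original script: it assumes the classic 16-character Exim message IDs (newer Exim releases use longer ones), that frozen entries in the mailq listing carry the word "frozen", and that the Exim binary is called exim4 as on Debian. All function names are mine.

```python
import re
import subprocess

# Classic Exim message ID, e.g. 1JdpQx-0003Tz-7k (assumption, see above).
MESSAGE_ID = re.compile(r"\b[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}\b")


def frozen_ids(mailq_output):
    """Extract the message IDs of all frozen messages from mailq output."""
    ids = []
    for line in mailq_output.splitlines():
        if "frozen" in line:
            match = MESSAGE_ID.search(line)
            if match:
                ids.append(match.group(0))
    return ids


def remove_messages(ids, exim="exim4"):
    """Remove the given messages from the queue using exim -Mrm."""
    for message_id in ids:
        subprocess.run([exim, "-Mrm", message_id], check=True)


# Made-up sample of what the saved mailq output might look like.
SAMPLE = (
    "25m  2.9K 1JdpQx-0003Tz-7k <spam@example.com> *** frozen ***\n"
    "          me@example.org\n"
    "\n"
    " 3h  1.1K 1JdmAa-0001bC-Ux <friend@example.net>\n"
    "          me@example.org\n"
)
```

Calling remove_messages(frozen_ids(open("frozen.txt").read())) as root would then delete everything listed in the saved file (frozen.txt being whatever name you saved the list under), so have a look at the IDs first.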
Right now I am asking myself if it just affects me or if more spam is sent out and less is detected by anti-spam software again.
I set up my mail server in February and noticed a decrease in spam mail delivered to my mailbox compared to my old system. However, in the past two weeks more and more spam mail has been delivered to my mailbox again. So is it just me, my system or the system's configuration or is everyone else receiving more spam again?
Anyways, it's about time to inspect the configuration of my mail system again...
In the past two days I have been playing around with various Python IDEs. It is not like I need a fully-fledged IDE, I'm fine with GNU Emacs to be honest. However, everyone is talking about IDE X and IDE Y and how they save so much time using these programs and how these programs assist them with hacking.
Well, I decided it was time to give a few IDEs a try. There were only two requirements I had: the IDE has to be Free Software and it has to run on GNU/Linux.
If you are planning to read on please be aware that this was no real test, but rather contains my observations regarding the IDEs I have tested, what I liked and did not like and if one surprised me enough to actually use it instead of my good old plain GNU Emacs.
Eclipse
As Java development in school is done with Eclipse and all teachers are more than happy with that program I gave it a try first. I heard that there was some sort of Python IDE plugin and so I downloaded Eclipse 3.3. After a few problems keeping the bugger running for more than 5 minutes (seems like the default memory-usage configuration did not provide Eclipse with enough memory) I started downloading the PyDev plugin using the internal plugin download manager. This worked quite smoothly, although it seemed a bit slow.
Now PyDev looks quite neat, but without the proprietary PyDev extensions it is rather useless and GNU Emacs gives me pretty much the same features.
CONCLUSION: Bloated, using a huge load of memory (Eight-Megabyte-And-Constantly-Swapping joke comes to my mind again), not offering a lot more features than GNU Emacs without proprietary PyDev Extensions.
OpenKomodo
After reading this post on lwn.net about OpenKomodo (note: the post says Komodo Edit, but that's proprietary software) and how it supports Python I gave it a try. I built it from Subversion trunk, which took some time. OpenKomodo is based on Mozilla and Gecko and if you ever built Firefox from source you should know that you can go and grab quite a few coffees while waiting for the build to finish.
The build system seems to be one specifically written for this application and so is a bit weird to use for people used to either GNU Autotools or Python's distutils. After the build process finished I was unable to find a way to install the application. The documentation only contains a note about using the build tool (black, "bk") with the "run" argument to start OpenKomodo.
At first everything looked quite nice. It supports Python quite well, including limited auto-completion support and so on and also supports, just like you would expect, tabbed-editing. After playing around in the source tree of one of my projects and trying to get used to "normal" keyboard shortcuts, such as Ctrl+s for saving a file, I had quite a few tabs open.
You probably know that having a lot of tabs open just leads to confusion and so I wanted to close all tabs but the currently active one and oops: that feature does not exist.
I then dug into the OpenKomodo source, added that feature, prepared a patch and tried to get it into the trunk: without luck, it seems. As noted in a comment on my bug report, such features should go into extensions. As I am too lazy to write an extension just for this small patch and basic feature, I am still trying to get the changes into trunk.
However, I abandoned OpenKomodo, as I found something better. First to my conclusion though:
CONCLUSION: Nice editor, but like Eclipse, quite bloated as it is based on Mozilla (memory leaks anyone?). Compile time is bad, again, because it is based on Mozilla. Getting simple patches into its trunk also seems to be a problem.
PIDA
I do not remember how or where I stumbled across a reference to PIDA, but it sounded interesting. PIDA is a Python IDE, built using Python, with a lot of features.
Even though you cannot see this on the screenshots on the PIDA homepage it does not include its own editor. It rather makes use of an existing editor. It currently can embed either vim or GNU Emacs (you need CVS version 23.x or newer). As I was using GNU Emacs before this really caught my attention. I downloaded PIDA from mercurial and built it. Build time is less than 5 minutes on my machine, which is more than acceptable.
When starting PIDA for the first time it asks which editor you want to embed. I obviously chose Emacs there.
It seems like embedding Emacs is in an early stage right now. Even though everything seems to work PIDA embeds the whole GNU Emacs (GTK version) window, including the menu bar and the toolbar. This generates a weird look, as you have two menu- and toolbars, one belonging to PIDA and one belonging to GNU Emacs. No problem for me though, as it is rather a style-problem than a real one.
After opening up one of my projects I immediately noticed one thing: version control integration. I can confirm that Subversion is properly supported and works perfectly, including reverting files, updating the local copy, committing changes and viewing differences. This indeed is a great feature and I like it.
I played around a bit more and stumbled across the plugins. There are quite a few neat plugins, like a Trac integration plugin which allows you to view tickets inside the IDE or a TODO parser plugin, which parses comments containing "TODO:" or "XXX:" from files and gives you references to them.
Another useful plugin seems to be the Python Source Viewer, which displays all functions, classes and methods present in the current python file in a tree view.
CONCLUSION: The IDE I am most likely going to use for now. Why? Because it seems to be lightweight, uses GNU Emacs as its embedded editor and comes with a proper feature set. I suggest that everyone, even hardcore GNU Emacs users, give PIDA a try. It looks worth it.