A different approach to Email Validation
Monday, December 13th, 2010I ran into an issue with a library that did email validation. One of the client’s email addresses didn’t validate and while debugging the code, I realized that more than just that client’s email address could be rejected. While using a regexp is not a failsafe solution to email address validation, it does attempt to make sure you are getting a reasonably good email address.
However, the regexp can still fail and in some cases, you get a client that is turned away. The only true way to validate an email address is valid at least once is to require a client to receive an email and validate their account. With so many throwaway account services, you can never be assured that anything past that first email address will ever be received. However, to give you the best chance at getting a valid email address, you need to do some validation.
There are a number of methods for validating email addresses on the internet and most fail somewhere. Do you accept emails with ” or < > in them? You might, since some email clients paste that information when you copy an email address. Or, do you accept the bare minimum email address, accepting the fact that people could use -, +, . or even ‘ in the name portion? Trying to get the person to enter a name that is reasonably valid is the goal.
One could have a very generic rule that looks for an @ with a . after it and some characters. That would ensure almost the largest cross-section of potential email addresses but we may have cast too wide a net. We could go with a very strict regexp but potentially miss out on some corner cases.
As we can see, the error message presented here is correct, the email address we’ve entered is not correct:
But, what if we do put in a valid email address:
Our regexp fails to recognize the email address even though it is valid. Of course, this requires someone to adjust some code and we’re fine until the next email address that fails our test is entered.
What if we introduce a new error condition, a soft fail. If the address passes our very simple test and looks to have a valid email address construction, we’ll put another field asking the user to check to see if any typos were made. Users that pass our more stringent regexp wouldn’t be subjected to the soft fail error handling.
When a user is subjected to the soft fail, our software can log the email address so we can adjust our validation rules, but, we’re not restricting a client from being able to sign up because their email address didn’t fit the regexp that we’re using.
While I feel reasonably confident with some of the regexp’s I use, email validation has always been one of the most troublesome portions of any system we’ve designed.
For the above test, we used Deform/Colander and Pyramid.