When I was going through PL/SQL procedures written by some of my colleagues, I noticed a few mistakes around Oracle's date conversion functions. There are some peculiarities about those functions that I thought everyone knew about, but I reckon that writing about them may help others avoid such mistakes. I have also allowed myself to outline a few rules that, if you adhere to them, will help you write better programs.
There are two functions in Oracle to convert strings to dates and back.
The first one is TO_DATE – it takes a string parameter and returns a date. Ok, it's actually a date/time combination packed into a single data type – Oracle's DATE.
The second one is TO_CHAR – it does the opposite: it takes a date/time value as Oracle's DATE data type and converts it to a string.
Actually, these functions are a bit more complex than that, but for our purpose that will do. What's important to understand here is the distinction between a date as a DATE data type and its string representation.
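To make that distinction concrete, here is a minimal sketch of both directions (the format masks and the literal are just examples of my own):

DECLARE
  v_date DATE;
  v_text VARCHAR2(20);
BEGIN
  v_date := TO_DATE('01/04/2009', 'DD/MM/YYYY');   -- string -> DATE
  v_text := TO_CHAR(v_date, 'DD-MON-YYYY');        -- DATE -> string
  dbms_output.put_line(v_text);                    -- prints 01-APR-2009 (with the default English date language)
END;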
For example, when you type '01-APR-09' in the procedure's text, that's a string representing a date. Pay attention here: although you meant a date, Oracle sees a string. For Oracle, everything that is enclosed in single quotation marks is a string. To make it a date, we need to convert this string to the DATE data type. Such a conversion can be carried out in two possible ways: explicitly or implicitly.
Explicit conversion is when we apply the TO_DATE function to the string:
v_date DATE;
v_date := TO_DATE('01/04/2009', 'DD/MM/YYYY');
Now v_date is a date representing April 1st, 2009.
Implicit conversion is when we let Oracle perform the conversion itself:
v_date DATE;
v_date := '01-APR-09';
It has the same effect. Every time Oracle sees a string in a place where it expects a date, it is smart enough to perform the conversion for us.
"Well," you may think, "that's great. Oracle does it all for us, so we don't have to. Life is easier, let's go for another coffee break."
Not quite.
You see, when Oracle does such an implicit conversion, it relies on some assumptions. If you read the documentation for the TO_DATE and TO_CHAR functions, you'll find that they take another optional parameter – the date format. That format tells Oracle how the string representing the date/time should be treated. If the format parameter is not specified, it is taken from the NLS_DATE_FORMAT Oracle parameter. Here's the crux: we can't assume that this parameter will be the same on all Oracle systems. Although it is 'DD-MON-RR' by default, and it is left like that on most Oracle systems, we can't assume that it's going to be like this always and everywhere. And if you rely on implicit date conversions and some DBA changes the NLS_DATE_FORMAT parameter – WHAM! – all your programs will stop working.
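Here is a small sketch of that mechanism (the ALTER SESSION below is purely hypothetical, simulating a DBA or a tool changing the setting):

-- simulate an environment where the default format differs from what the code assumes
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD';

DECLARE
  v_implicit DATE;
  v_explicit DATE;
BEGIN
  -- implicit: parsed with whatever NLS_DATE_FORMAT happens to be; depending on
  -- the session setting this either raises ORA-01861 or silently produces a wrong date
  v_implicit := '01-APR-09';

  -- explicit: unaffected by NLS_DATE_FORMAT, always parsed the same way
  v_explicit := TO_DATE('01/04/2009', 'DD/MM/YYYY');
END;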
So, a good practice and rule of thumb for you should be:
Never ever rely on implicit date conversions! Whenever you need to convert a date to a string or vice versa, use the appropriate TO_DATE or TO_CHAR function and always specify the date format.
Just like this:
v_date DATE;
v_date := TO_DATE('01/04/2009', 'DD/MM/YYYY');
The danger of NLS_DATE_FORMAT being changed is the biggest threat, but not the only one.
Pay attention to the default date format I mentioned above – 'DD-MON-RR'. Do you notice anything suspicious? The year is 2 digits. Here Oracle tries to be a smartass and guess whether you mean the XX or the XXI century. Your only hope is that it figures out what you meant and doesn't make a mistake. But if it does make a mistake – oops, welcome back, Millennium Bug. This brings us to the second rule:
Always specify the 4-digit year.
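To see what the two-digit year can cost you, here is a small sketch (I assume it runs in a year whose last two digits fall between 00 and 49, such as 2009; the date itself is made up):

DECLARE
  -- the same two-digit year ends up in different centuries depending on the mask
  v_rr DATE := TO_DATE('01-04-50', 'DD-MM-RR');   -- RR: previous century, 1950
  v_yy DATE := TO_DATE('01-04-50', 'DD-MM-YY');   -- YY: current century, 2050
BEGIN
  dbms_output.put_line(TO_CHAR(v_rr, 'DD/MM/YYYY'));   -- 01/04/1950
  dbms_output.put_line(TO_CHAR(v_yy, 'DD/MM/YYYY'));   -- 01/04/2050
END;

With a four-digit year in both the data and the format mask, there is simply nothing left for Oracle to guess.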
Another dangerous programming technique is trying to convert a Date to a Date where no conversion is necessary.
Let's have a look at the following example, or should I say, a puzzle?
DECLARE
  v_date   DATE := '01-APR-09';
  v_date_2 DATE := TO_DATE (v_date, 'DD/MM/YYYY');
BEGIN
  dbms_output.put_line (TO_CHAR (v_date_2, 'DD/MM/YYYY'));
END;
Try to guess what will be printed as a result.
If you think '01/04/2009', you've just screwed up your business-critical application and sent it two thousand years back in time.
In fact, you'll get '01/04/0009'.
This is where it all goes bad:
v_date_2 DATE := TO_DATE (v_date, 'DD/MM/YYYY');
And here's why:
The first thing Oracle tries to do is execute the TO_DATE function. There is only one TO_DATE function in Oracle – the one that takes a string and converts it to a date. Although we know that v_date is not a string, Oracle still proceeds with its logic. If you run this code, it won't produce an error.
Oracle successfully convinces itself that it sees a string where it has a date. That happens because Oracle is able to implicitly convert that date to a string, effectively turning that line into
v_date_2 DATE := TO_DATE (TO_CHAR(v_date), 'DD/MM/YYYY');
But, as we've already learned, implicit date-to-string conversions are performed using the date format recorded in the NLS_DATE_FORMAT Oracle parameter, which is by default set to 'DD-MON-RR'. Hence, what Oracle effectively does is this:
v_date_2 DATE := TO_DATE (TO_CHAR(v_date, 'DD-MON-RR'), 'DD/MM/YYYY');
Can you spot the error already? The date formats are inconsistent! This
is what you get when you don’t pay attention to the details.
So, here comes rule 3:
Avoid unnecessary conversions. Never convert dates to dates.
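For completeness, here is a minimal sketch of how the puzzle above could have been written following all three rules (the literal and the masks are my own choice):

DECLARE
  v_date   DATE := TO_DATE('01-APR-2009', 'DD-MON-YYYY');  -- explicit conversion, 4-digit year
  v_date_2 DATE := v_date;                                 -- DATE to DATE is a plain assignment, no conversion at all
BEGIN
  dbms_output.put_line(TO_CHAR(v_date_2, 'DD/MM/YYYY'));   -- prints 01/04/2009
END;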
If you think that all this stuff is pretty confusing, that's because it indeed is. The good news is that you can avoid the confusion altogether by learning to program in a clearer, more concise way. That is the foundation of a good programming style.
Virtually every software development methodology stresses the need to specify how the software end product is expected to work before the programming starts. But the truth is that writing specifications is boring. And most programmers, being lazy people (in the good sense of "lazy" – yes, I believe there is a good sense), try to avoid boring work. So, when the specs are written, they are often incomplete and don't cover all the possible scenarios. Moreover, as the requirements change, these specs become outdated and, hence, even less useful.
I wonder if there is a way to replace the boring specs with something
which is fun.
One possible way to go might be "Role-play-style modelling". Just think about it. Usually a software product consists of a number of modules communicating with each other. Each module does a specific task. In a typical software project, different developers are assigned to implement different modules.
Now imagine that, before diving into coding, you have a kind of role-playing game where each programmer pretends to be the module he is going to program. It goes like this:
Dave is a module that reads an XML file and converts it into a spreadsheet,
Bob is a module that is responsible for taking the spreadsheet and sending
it to users as an e-mail attachment.
Dave: Trying to open the XML file. Wait a minute, what if it’s
not a well-formed XML, what am I supposed to do then? I should probably
send an error message back. Ok, I've read it and transformed it successfully.
Now I notify Bob that it’s ready. Bob, we need to decide on this
notification protocol. Now, it’s your turn.
Bob: Ok, first I need to create an e-mail. Where do we get the e-mail
address and subject line from – should we ask the user? This is something
to find out. Then I need to check that the e-mail address is well-formed
and the subject line is not empty. Then I read the spreadsheet file Dave
provided and attach it. Dave, how big can this file be?
Dave: I reckon it’s up to 1 Mb, but I need to clarify it.
Bob: I need to find out whether I need to compress it. Then I try to send the e-mail. If it is sent successfully, I report an Ok status back to Dave. If it fails, I report a failure status with the error message.
…
And so on.
That would give all the participants an understanding of how the end product should work. And, assuming the programmers take notes as they go, everyone will end up with a mini-spec of their own.
Another advantage of such an approach is that it would encourage people to discuss every possible situation and ask "what if" questions. And asking them is the most important part of the whole process. While there is at least one outstanding question, the spec is incomplete. So, after our game, Bob and Dave would set off to harass whoever is appropriate, such as business analysts or even end users, to extract the answers from them. And once they have the answers, we would repeat the game. And we would do it over and over again, until everything unknown has been ruled out.
It sounds like an odd approach, but it just might be odd enough to work.
A few days ago I went to a presentation by an Oracle recruiter. He talked about the ways Oracle searches for potential candidates and hires them. One particular thing that struck me was that the Australian Oracle recruitment team does not use specialised recruitment web sites.
About a decade ago, looking for a job was all about reading newspaper
ads. Then along came job web sites, and the recruitment landscape
changed forever.
I am myself of a younger generation, and looking for a job, for me, was never associated with newspapers. When I look for a job, I go to web sites such as Seek or Monster.com and search there. I take it as a given that all the jobs I may be interested in are posted there, either directly or through recruitment agents.
This approach is more convenient for both job advertisers and job seekers, because it allows them to find each other more easily. And most of the activity is concentrated around a few dominant players such as the aforementioned Seek and Monster. This is because of the so-called network effect: such web sites are created to connect people to each other, in our case job seekers and advertisers, and because the number of possible connections is proportional to the square of the number of participants, the probability of "getting connected" through a large job web site is much higher than through a small one. Thus, size does matter. Indeed, if I want to find a job, the first thing I do is look at the largest local job site, because this is where most jobs are, much like Willie Sutton robbing banks "because that's where the money is".
But that is changing too.
As I have already said, I was surprised to hear that Australian Oracle
recruiters do not use job web sites. Instead, they exploit a relatively
new phenomenon of Internet social network services.
Social networks benefit from the network effect on a scale far beyond that of traditional job sites. This is because the network size of a job site depends on the number of job advertisers paying to place their ads. In other words, it depends on the supply of ads. And if the supply side does not increase, the network will not grow.
On the other hand, the growth rate of a social network is not limited by
either supply or demand. As long as new members keep subscribing,
the network will grow. Even if they stop, the existing members will still be
able to establish new connections until all the members are connected to
each other.
A good example is LinkedIn, which is
effectively a tool for building a professional social network. It allows
you to easily find your co-workers, provided they are registered with
the site, as well as connect with potential employers.
So, what is in it for employers? The benefit for them is that they are able to build their network even if they are not actively hiring. And later, when they need it, they will be able to instantly find the right candidates through their connections. It is like having a bank of job seekers at their disposal, a readily available database of resumes. And that bank will update and grow by itself.
That leads us to an interesting conclusion: within this new paradigm, if you're not online, you cannot be found. And if you cannot be found, you can't be hired. Simply because you are outside the network.
All in all, we are witnessing the birth of new recruitment methods. Who knows where it is going to lead us. Maybe in 5 years it will be mandatory for job seekers to have a Second Life avatar and all interviews will be conducted in virtual reality.
The Australian Oracle recruitment team has a blog; they also follow the LinkedIn, Twitter and MySpace networks.
Once, during one of my projects, I was asked what the risk was of something going wrong. I remember answering that we, programmers, do not assess risks. We just do everything possible to prevent them. "There is no point," I said, "in calculating the chances and the impact of a file not being in the incoming directory at the start-up of the process if we can implement a simple check and avoid the disaster."
Needless to say, I was wrong.
Nowadays I work in a bank. And one of the things I have learned about banks is that they all have departments responsible for managing risks. A bank, in its business, plays a risky game – it lends money. It is risky because there's always a possibility that it will not get the money back. And banks have learnt to manage these risks. They ask: "What can go wrong? What can we do to prevent it? And what can we do to minimize the impact if we don't manage to prevent it?"
The software industry is much, much younger than banking. But we are proud of ourselves, because we use cutting-edge technology and the most modern methodologies. Yet we haven't learnt some basic principles that other industries have employed for decades.
We don’t manage risks.
I've seen a few project screw-ups, all for various reasons: the chosen technology wasn't robust enough to support the solution, the staff's skills were not strong enough to implement it, the requirements were unrealistic… The causes may differ, but the result is always the same – failure, or infinite slippage of the deadline, which amounts to the same thing.
And in every case the failure seemed to have been unexpected. It
puzzled me - why could nobody predict it? And it happens over and over
again. A few years ago I was a faithful reader of The Daily WTF blog. Every day it published failure stories
like the ones I witnessed. So, if we are so smart, why does it keep
happening over and over again?
Because in a software project nobody asks what can go wrong.
Not all risks are the same. Some risks can be assessed and eliminated, such as the risk that the existing or planned hardware will not be able to handle the workload. This kind of risk is calculable. We can do a stress test, measure the execution time, do some basic calculations and determine not only that we need a new server with a more powerful CPU, but exactly how powerful it needs to be.
Others are harder to predict. How would you assess the risk of your project manager being a crook and ruining the project?
And, finally, there's the mother of all risks – a highly improbable event, the Black Swan, as Nassim Nicholas Taleb calls it in his book "The Black Swan: The Impact of the Highly Improbable" (which I would highly recommend to everyone who wants to learn more about risks). What, for example, is going to happen to your project if all the programmers decide to quit at the same time?
Some time ago I worked on a project that employed a so-called custom rules engine for performing complex batch calculations on a huge dataset – about a hundred million rows. We had a few thousand rules defined by business people, and in order to get a result we needed to apply every rule to the source dataset. As you can imagine, it was slow. Scanning a hundred million rows is not fast, especially when you need to do it a thousand times. So, a strategic decision was made to speed it up. One of our local geniuses came up with a brilliant, as he thought, idea: instead of applying each of the thousand rules to the source dataset in its own pass, we would reformat the existing rules so that they formed one monstrous rule, containing all thousand, which could be applied to the source dataset in a single pass. And so we did. One thing, though, we didn't take into account: when combined all together, the rules formed a frankensteinian SQL expression more than 1,000,000 characters long (that's right, more than a million). Yes, it worked, and the process became reasonably fast. But what was the price paid?
Imagine that you are a support programmer. You are quite unhappy with your life, and with your job in particular, because you're fed up with those stupid users and their problems. And one evening, just as you are about to go home, you get a call. Apparently, the rules engine has failed. You open the log and what do you see? A million-character-long SQL statement and an error message at the end, saying that an error happened somewhere in there. Yep, somewhere in those million characters. You see, that's the way Oracle works. It tells you that an error has happened and which one (let's say, division by zero), but it doesn't tell you where exactly within the given SQL it happened. So, what are you going to do, step by step? I don't know about you, but I would start writing a resignation letter. And I wasn't the only one confused. No one I asked could answer that question.
You may ask, what does this have to do with risk? There is a direct connection.
We had a certain risk that the system being built couldn't handle the workload. The amount of that risk was known. We could calculate how much time it took to process the data set, extrapolate it to the given requirements and find out that it took 10 times longer than was acceptable. We could resolve it by buying a server 10 times more powerful, or a cluster of less powerful servers. In any case, the risk and the cost of eliminating it were calculable.
After the strategic decision was made to solve the performance problem with a much more complex rules engine, the risk became incalculable. Yes, they prevented the performance problem, but they introduced another, much more serious one: in the case of a single failure, the system would go down for an undefined time, because no one would be able to fix it. And undefined downtime means unlimited loss for the business, because in the business world time is always money.
So, what they did was replace an imminent but assessable risk, one that, in fact, could be prevented, with a less probable but potentially much more severe one that could not be assessed. I don't believe that solving a minor problem by introducing a much more severe one, and hiding the possibility of a failure under the rug, is proper risk management. In the long term it is never a good strategy.