Tuesday, June 7, 2011

CPAN modules you love to hate (or the contrary)

If one thing is starting to bother me in the Perl universe, it's the dependencies that force you to pull in half of CPAN each time you try to install a major module.

Don't get me wrong: I love Perl, I love CPAN, I love modules and I definitely love code reuse, BUT maybe we're starting to do it the wrong way.

Let's think about why we need code reuse:
1) Because we're lazy (like every good Perl programmer :-) ) and don't want to reinvent an existing wheel (if a good one already exists)
2) Because it reduces code size by factoring out common parts

The factoring aspect is especially important to me:
_ Common parts (modules) are better tested, since more users end up using them
_ Code is easier to maintain (smaller, less heterogeneous...)
_ It forces us to think about the API, which makes enhancements easier
_ ...

Now if I look at the dependency mess on CPAN, I realize that what bothers me is not the sheer number of dependencies, but rather that most of them void or lessen the benefits of code reuse.

Why must I use 3 different XML SAX parsers?
Why must I use 3 different Serializer modules?
Must I really use 2 different dispatch modules?
Can't I just use one error-handling module?

I wholeheartedly adhere to the TIMTOWTDI motto, but the more I use CPAN modules, the more functionally duplicated code I get: my applications become bigger overall, more complex, more heterogeneous and less tested than they could be...
What a paradox!

I'm appealing to your wisdom: am I the only one who feels this dependency bloat?
Do you see any path to a more efficient use of modules?

4 comments:

anonymous said…

In non-French typography, there is no space in front of an exclamation mark, question mark, colon, etc.

BooK said…

It's not so much a specific Perl/CPAN issue as a side effect of having a rich environment. I'm pretty sure other languages with many libraries available and building on top of each other will have the same kind of issue: two libraries you might want to use for your project will use different libraries to do similar things.

The cost of a rich diet is being overweight. All things considered, it's probably a better fate than starvation.

Arnaud ASSAD said…

@anonymous: Thanks! Post now corrected.

@BooK: You're right, but maybe there's something else besides this "all or nothing" approach. CPAN has the tools (stats on dependencies, stats on most used modules, kwalitee...) to help us find something *better*, don't you think?

garu said…

Yes, there is Task::Kensho and several recommendations out there. But having the freedom to choose means you may not like an author's choice of dependencies, but still find the outcome useful - an outcome that might not have existed at all had the author still been struggling with another API that just doesn't fit his/her mindset.

Some standards evolve, others are so common you might even take them for granted. When you think of relational databases in Perl you don't even consider rolling your own, you just think "DBI". The same is very slowly starting to happen for OO and Moose, but it still has a long way to go before being that accepted - and guess why? Too many non-core dependencies "for just an object system".

There are adaptors out there such as JSON::Any (which is now deprecated) and AnyEvent that try to provide a common layer and allow the module to use whatever the dev has installed in the first place, but even that might be considered additional bloat depending on where you stand.
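
To make the adaptor idea concrete, here is a minimal sketch of "use whatever is installed". It is not how JSON::Any or AnyEvent are actually implemented: the helper name first_available, the list of candidate modules and their preference order are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any CPAN module): return the name of
# the first candidate module that can be loaded, or nothing if none can.
sub first_available {
    my @candidates = @_;
    for my $module (@candidates) {
        # Turn Some::Module into Some/Module.pm so require can load it at run time.
        (my $file = "$module.pm") =~ s{::}{/}g;
        return $module if eval { require $file; 1 };
    }
    return;
}

# The preference order is an assumption: XS backends first, pure-Perl JSON::PP last.
my $backend = first_available(qw(Cpanel::JSON::XS JSON::XS JSON::PP))
    or die "No JSON backend installed\n";

# These three modules share the same new/canonical/encode/decode interface,
# so the rest of the code does not care which one was picked.
my $json = $backend->new->canonical(1);
my $text = $json->encode({ name => 'CPAN', deps => [ 'DBI', 'Moose' ] });
print "$backend produced: $text\n";
```

The calling code only ever talks to $json, so the application ends up with a single JSON implementation loaded instead of one per dependency. The price is the extra indirection layer itself, which is exactly the "additional bloat" trade-off mentioned above.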

Finally, it's important to remember that dependencies are a concern for the module author as well. Let's say you need to parse YAML and all its edge-cases. You should probably install "YAML", a module that does the right thing but has a huge memory footprint. On the other hand, your code may also depend on a module that needs to parse YAML, but only simple documents. Why should it depend on "YAML" when the fast and lightweight "YAML::Tiny" fulfills its needs completely? And now your application loads two modules to parse YAML.

To me the problem is that reusable modules (in any language) are created from the bottom-up. Authors have no way to anticipate how (or *if*) you're going to use their code, so they just stick with what works for them and that is usually fine.

There's no excuse for depending on a crappy module (like you said, there are several tools on CPAN such as testers output, ratings, etc., to help you pick), but if you want to make sure your final app doesn't have any extra fat, you'll have to either tweak your dep's code in your environment (thus "voiding the warranty") or roll your own (and make sure it works properly on all the edge-cases yourself).

I take the extra (clean) fat over reinventing big wheels anytime :)