DataworksSummit Berlin - Wednesday morning

2018-04-19 06:50
Data strategy - cloud strategy - business strategy: Aligning the three was one of the main themes (initially put forward by Hortonworks CTO Scott Gnau in his opening keynote) throughout this week's Dataworks Summit Berlin, kindly organised and hosted by Hortonworks. The event drew over 1000 attendees from 51 countries.

The inspiration that Scott put forward in the first keynote was to take a closer look at the data lifecycle - including the fact that a lot of data is being created (and made available) outside the control of those using it: Smart farming users combine weather data with information on soil conditions gathered through sensors out in the field in order to inform daily decisions. Manufacturing is moving towards closer monitoring of production lines to spot inefficiencies. Cities are starting to deploy systems that allow for better integration of public services. UX is being optimized through extensive automation.

When it comes to moving data to the cloud, the speaker gave a nice comparison: To him, the difficulties that moving to the cloud brings are similar to the challenges that moving "stuff" to external storage in the garage brings: It raises questions of "Where did I put this thing?", but also of access control and security. In much the same way, cloud and on-prem integration means that questions of encryption, authorization, user tracking and data governance need to be answered - as do questions of findability, discoverability and integration for analysis purposes.

The second keynote was given by Mandy Chessell from IBM introducing Apache Atlas for metadata integration and governance.

In the third keynote, Bernard Marr talked about the five promises of big data:

  • Informing decisions based on data: The goal here should be to move towards self service platforms to remove the "we need a data scientist for that" bottleneck. That in turn needs quite some training and hand-holding for those interested in the self-service platforms.
  • Understanding customers and customer trends better: The example given was a butcher who installed a mobile phone tracker in his shop window in order to see which advertisement would make more people stop by and look closer. As a side effect he noticed an increase in people on the street in the middle of the night (coming from pubs nearby). A decision was made to open at that time and offer what people were searching for at that time according to Google trends - by now that one hour in the night makes up a sizeable portion of the shop's income. The second example given was Disney already tracking all its Disney park visitors through wrist bands, automating line management at popular attractions - but also deploying facial recognition to watch audiences watching shows in order to figure out how well those shows are received.
  • Improve the customer value proposition: The example given was the Royal Bank of Scotland moving closer to its clients, informing them through automated means when interest rates are dropping, or when they are double insured - thus building trust and transparency. The other example given was that of a lift company building sensors into lifts in order to be able to predict failures and repair lifts when they are least used.
  • Automate business processes: Here the example was that of a car insurance company that would offer dynamic rates if people let themselves be monitored while driving. Those adhering to speed limits and avoiding risky routes and times would get lower rates. Another example was that of automating the creation of sports reports, e.g. for tennis matches based on sensors deployed, or that of automating Forbes analyst reports, some of which get published without the involvement of a journalist.
  • Last but not least the speaker mentioned the obvious business case of selling data assets - e.g. selling aggregated and refined data gathered through sensors in the field back to farmers. Another example was the automatic detection of events based on sounds detected - e.g. gun shots close to public squares and selling that back to the police.


After the keynotes were over breakout sessions started - including my talk about the Apache Way. It was good to see people show up to learn how all the open source big data projects are working behind the scenes - and how they themselves can get involved in contributing and shaping these projects. I'm looking forward to receiving pictures of feather shaped cookies.

During lunch there was time to listen in on how Santander operations is using data analytics to drive incident detection, as well as load prediction for capacity planning.

After lunch I had time for two more talks: The first explained how to integrate Apache MXNet with Apache NiFi to bring machine learning to the edge. The second one introduced Apache Beam - an abstraction layer above Apache Flink, Spark and Google's platform.

Both scary and funny: Walking up to the Apache Beam speaker after his talk (having learnt at DataworksSummit that he is PMC Chair of Apache Beam) - only to be greeted with "I know who *you* are" before even getting to introduce oneself...

Apache Breakfast

2018-04-17 07:39

In case you missed it but are living in Berlin - or are visiting Berlin/ Germany this week: A handful of Apache people (committers/ members) are meeting over breakfast on Friday morning this week. If you are interested in joining, please let me know (or check yourself - in the archives of the mailing list party@apache.org)

FOSS Backstage - Schedule online

2018-04-17 07:27
In January the CfP for FOSS Backstage opened. By now reviews have been done, speakers notified and a schedule created.

I'm delighted to find both - a lot of friends from the Apache Software Foundation but also a great many speakers that aren't affiliated with the ASF among the speakers.

If you want to know how Open Source really works, if you want to get a glimpse behind the stage, do not wait too long to grab your ticket and join us in summer in Berlin/ Germany.

If project management is only partially of interest to you, we have you covered as well: For those interested in storing, searching and scaling data analysis, Berlin Buzzwords is scheduled to take place in the same week. For those interested in Tomcat, httpd, cloud and IoT, Apache Roadshow is scheduled to happen on the same days as FOSS Backstage - and your FOSS Backstage ticket grants you access to Apache Roadshow as well.

If you're still not convinced - head over to the conference website and check out the talks available yourself.

My board nomination statement 2018

2018-03-23 07:21
Two days ago the Apache Software Foundation members meeting started. One of the outcomes of each members meeting is an elected board of directors. The way that works is explained here: Annual Apache members meeting. As explained in the linked post, members accepting their nomination to become a director are supposed to provide a nomination statement. This year they were also asked to answer a set of questions so members could better decide who to vote for.

As one of my favourite pet peeves is to make the inner workings of the foundation more transparent to outsiders (and I have said so in the nomination statement) - I would like to start by publishing my own nomination statement here for those who don't have access to our internal communication channels:

Board statement:

Two years ago I was put on a roller coaster by being nominated as Apache board member which subsequently meant I got to serve on the board in 2016. Little did I know what kind of questions were waiting for me.

Much like back then I won't treat this position statement as a voting campaign. I don't claim to have answers to all the questions we face as we grow larger - however I believe being a board member even at our size should be something that is fun. Something that is lightweight enough so people don't outright decline their nominations just for lack of time.

One thing I learnt the hard way is scalability needs two major ingredients: Breaking dependencies and distribution of workload. Call me old-fashioned (even though chemistry can hide my gray hair, my preference for mutt as a mail client betrays my age), but I believe we already have some of the core values to achieve just that:
  • "Community over code" to me includes rewarding contributions that aren't code. I believe it is important to get people into the foundation that are committed to both our projects as well as the foundation itself - helping us in all sorts of ways, including but not limited to coding, documenting, marketing, mentoring, legal, education and more.
  • "What didn't happen on the mailing list didn't happen" to me means communicating as publicly as possible (while keeping privacy as needed) to enable others to better understand where we are, how we work, what we value and ultimately how to help us. I would like for us to think twice before sending information to private lists - both at the project and at the operational level.
  • I believe we can do better in getting those into the loop who have a vested interest in seeing that our projects are run in a vendor neutral way: Our downstream users who rely on Apache projects for their daily work.
I am married to a Linux kernel geek working for the Amazon kernel and operating systems team - I've learnt a long time ago that the Open Source world is bigger than just one project, bigger than just one foundation. Expect me to keep the bigger picture in mind during my work here that is not ASF exclusive.

Much like Bertrand I'm a European - that means I do see value in time spent offline, in being disconnected. I would like to urge others to take that liberty as well - if not for yourselves, then at least to highlight where we are still lacking in terms of number of people that can take care of a vital role.

As you may have guessed from the time it took for me to accept this nomination, I didn't take the decision lightly. For starters semi-regularly following the discussion on board@ to me feels like there are people way more capable than myself. Seeing just how active people are feels like my time budget is way too limited.

So what made me accept? I consider myself lucky seeing people nominated for the Apache board who are capable leaders that bring very diverse skills, capabilities and knowledge with them that taken together will make an awesome board of directors.

I know that with FOSS Backstage one other "pet project of mine" is in capable hands, so I don't need to be involved in it on a day-to-day basis.

Last but not least I haven't forgotten that back in autumn 2016 Lars Trieloff* told me that I am a role model: Being an ASF director, while still working in tech, with a now three-year-old at home. As the saying goes "Wege entstehen dadurch, dass man sie geht" - free-form translation: "paths are created by walking them." So instead of pre-emptively declining my nomination I would like to find a way to make the role of being a Director at the Apache Software Foundation something that is manageable for a volunteer. Maybe along that way we'll find a piece in the puzzle to the question of who watches the watchmen - how do we reduce the number of volunteers that we burn through, operating at a sustainable level, enabling people outside of the board of directors to take over or help with tasks.

* Whom I know through the Apache Dinner/ Lunch Berlin that I used to organise what feels like ages ago. We should totally re-instate that again now that there are so many ASF affiliated people in or close to Berlin. Any volunteers? The one who organises gets to choose date and location after all ;)

Answers to questions to the board nominees:

On Thu, Mar 15, 2018 at 01:57:07PM +0100, Daniel Gruno wrote:
> Missions, Visions...and Decisions:
> - The ASF exists with a primary goal of "providing open source
> software to the public, at no charge". What do you consider to be
> the foundation's most important secondary (implicit) goal?


I learnt a lot about what is valuable to us in the following discussion:

https://s.apache.org/hadw

(and the following public thread over on dev@community with the same subject). My main take-away from there came from Bertrand: The value we are giving back to projects is by providing "A neutral space where they can operate according to our well established best practices."

The second learning I had just recently when I had the chance of thinking through some of the values that are encoded in our Bylaws that you do not find in those of other organisations: At the ASF you pay for influence with time (someone I respect a lot extended that by stating that you actually pay with time and love).

> - Looking ahead, 5 years, 10 years...what do you hope the biggest
> change (that you can conceivably contribute to) to the foundation
> will be, if any? What are your greatest concerns?


One year ago I had no idea that a little over two months from now we would have something like FOSS Backstage here in Berlin: One thing the ASF has taught me is that predicting the future is futile - the community as a whole will make changes in this world that are way bigger than the individual contributions taken together.

> - Which aspect(s) (if any) of the way the ASF operates today are you
> least satisfied with? What would you do to change it?

Those are in my position statement already.

> #######################################

> Budget and Operations:
> - Which roles do you envision moving towards paid roles. Is this the
> right move, and if not, what can we do to prevent/delay this?
>

Honestly I cannot judge what's right and wrong here. I do know that burning through volunteers to me is not an option. What I would like to hear from you as a member is what you would need to step up and do operational tasks at the ASF.

Some random thoughts:
- Do we have the right people in our membership that can fill these operational roles? Are we doing a good enough job in bringing people in with all sorts of backgrounds, who have done all sorts of types of contributions?
- Are we doing a good enough job at making transparent where the foundation needs operational help? Are those roles small enough to be filled by one individual?

This question could be read as if work at the ASF today is not paid for. This is far from true - both at the project as well as at the operational level. What I think we need is a collective understanding of what the implications of various funding models are: The fact that the ASF doesn't accept payment for development doesn't directly imply that projects are more independent as a result. I would assume the same to be true at the operational level.

> #######################################
>
> Membership and Governance:
> - Should the membership play a more prominent role in
> decision-making at the ASF? If so, where do you propose this be?


I may be naive but I still believe in "those who do the work are those who take decisions". There were only close to a dozen people who participated in the "ask the members questionnaire" I sent around - something that was troubling for me to see was how pretty much everyone wanted

> - What would be your take on the cohesion of the ASF, the PMCs, the
> membership and the communities. Are we one big happy family, or
> just a bunch of silos? Where do you see it heading, and where do
> we need to take action, if anywhere?


If "one big happy family" conjures the picture of people with smiling faces only, then that is a very cheesy image of a family that in my experience doesn't reflect the reality of what families typically look like.

This year at FOSDEM in Brussels we had a dinner table of maybe 15 people (while I did book the table, I don't remember the exact number - over-provisioning and a bit of improvisation helped a lot in making things scale) from various projects, who joined at various times. I do remember a lot of laughter at that table. If anything I think we need to help people bump into each other face to face, independently of their respective project community, more often.

> - If you were in charge of overall community development (sorry,
> Sharan!), what would you focus on as your primary and secondary
> goal? How would you implement what you think is needed to achieve
> this?


I'm not in charge of that - nor would I want to be, nor should I be. The value I see in the ASF is that we rely very heavily on self organisation, so this foundation is what each individual in it makes out of it - and to me those individuals aren't limited to foundation members, PMC members or even committers. In each Apache Way talk I've seen (and every time I explain the Apache Way to people) the explanation starts with our projects' downstream users.

> Show and Tell:

I'm not much of a show and tell person. At ApacheCon Oakland I once was seeking help with getting a press article about ApacheCon reviewed. It was easy finding a volunteer to proof-read the article. The reason for that ease given by the volunteer themselves? What they got out of their contributions to the ASF was much bigger than anything they put into it. That observation holds true for me as well - and I do hope that this is true for everyone here who is even mildly active.

ApacheConNA: Meet the indian tribe

2013-05-08 20:10
ApacheCon is the "User Conference of the Apache Software Foundation". What
should that mean? If you are going to ApacheCon you have the chance of meeting
committers of your favourite projects as well as members of the foundation
itself. Though there are a lot of talks that are interesting from a technical
point of view the goal really is to turn you into an active member of the
foundation yourself. This is true for the North American version even more than
for the European edition.


Though why should you as a general user of Apache software be interested in
attending then? Pieter Hintjens put it quite nicely in an interview on his
latest ZeroMQ book with O'Reilly:




If you are using free software in particular in commercial setups you really do
want to know how the project is governed and what it takes to get active and
involved yourself. What would it take to move the project into a direction that
fits your business needs? How do you make sure features you need are actually
being added to the project instead of useless stuff?


ApacheCon is the conference to find out how Apache projects work internally,
the place to be to meet active people in person and put faces to names. Lots of
community building events focus on getting newbies in touch with long term
contributors.

Moving to a new domain

2012-09-12 12:30
Executive summary: This is to warn those of you who are subscribed to this blog - the domain to reach this blog w/o redirects will soon change to isabel-drost-fromm.de - you might want to adjust your RSS subscription accordingly.

Longer version: This blog post is scheduled to go live some time after lunch-time on September 12th 2012. You might have heard rumors before - on that date Ms. Isabel Drost and Mr. Thilo Fromm are supposed to get married.



There were times when wars and conflicts between kingdoms were settled by having children of the reigning families get married. Today this old tradition is being continued on a much smaller scale by having a couple get married that is comprised of one half being passionate about Linux Kernel hacking and a strong proponent of GPL/LGPL open source licensing and the other half coming from the Java world, mainly contributing to ASL projects.

As a bit of "showing of good will" both agreed to the proposal of Matthias Kirschner: Girls that are FSFE fellows really should only marry other FSFE fellows. So we got Thilo a fellowship membership setup very quickly.

PS: Now looking forward to dancing into a new part of life this evening ;)

PPS: Thanks to photomic for the DSLR photos, and to masq for taking the above picture and mailing it to my server. Having a secure shell on your mobile phone rocks!

Apache Con returns to Europe

2012-08-01 20:41
In November Apache Con will come back to Europe. The event will take place in Sinsheim, inviting foundation members, project committers, contributors and users to meet, discuss and have fun during the one-week event.



Several meetups will be held the weekend before the main conference kicks off, watch out for announcements on your favourite project mailing list.

ApacheCon is still open for submissions until August 3rd - head over to the Call for submissions for more information. The conference is split into several tracks that are being handled individually: Apache Daily - Tools, frameworks and components used on a daily basis, Apache Java Enterprise projects, Big Data, Camel in Action, Cloud, Linked Data, Lucene, Modular Java Applications, NoSQL Database, OFBiz (The Apache Enterprise Automation project), OpenOffice and finally Web Infrastructure (covering HTTPD, Tomcat and Traffic Server, the heart of many Internet projects).

Make sure to mark the date in your calendar to meet with the people behind the ASF projects, learn more on how the foundation works and what makes Apache projects so particular compared to others. Join us for a week of fun and dense talks on all things Apache.


The Apache Feather logo is a trademark of The Apache Software Foundation.

Apache Sling and Jackrabbit event coming to Berlin

2012-07-12 20:59
Interested in Apache Sling and/or Apache Jackrabbit? Then you might be interested in hearing that on September 26th to 28th there will be an event in town on these two topics - mainly organised by Adobe, but labeled as a community event, meaning that there will be a number of active community members attending the conference: adaptTo().

From their website:

In late September 2012 Berlin will become the global heart beat for developers working on the Adobe CQ technical stack. pro!vision and Adobe are working jointly to set up a pure technical event for developers that will be focused on Apache Sling, Apache Jackrabbit, Apache Felix and more specifically on Adobe CQ: adaptTo(), Berlin. September 26-28 2012.



Apache Mahout 0.6 released

2012-02-08 21:33
As of Monday, February 6th a new Apache Mahout version was released. The new package features

Lots of performance improvements:


  • A new LDA implementation using Collapsed Variational Bayes 0th Derivative Approximation - try that out if you have been bothered by the way less than optimal performance of the old version.
  • Improved Decision Tree performance and added support for regression problems
  • Reduced runtime of dot product between vectors - many algorithms in Mahout rely on that, so these performance improvements will affect anyone using them.
  • Reduced runtime of LanczosSolver tests - make modifications to Mahout more easily and have faster development cycles by faster testing.
  • Increased efficiency of parallel ALS matrix factorization
  • Performance improvements in RowSimilarityJob, TransposeJob - helpful for anyone trying to find similar items or running the Hadoop based recommender


New features:

  • K-Trusses, Top-Down and Bottom-Up clustering, Random Walk with Restarts implementation
  • SSVD enhancements


Better integration:

  • Added MongoDB and Cassandra DataModel support
  • Added numerous clustering display examples


Many bug fixes, refactorings, and other small improvements. More information is available in the Release Notes.

Overall great improvements towards better performance, better stability and integration. However there are still quite a few outstanding issues and issues in need of review. Come join the project, help us improve existing patches, improve performance and in particular integration and streamlining of how to use the different parts of the project.

Talking people into submitting patches - results

2012-01-01 18:42
Back in November I gave a talk at Apache Con NA in Vancouver on talking friends and colleagues into contributing patches to open source projects. The intended audience for this talk were experienced committers to Apache projects, the goal was to learn more on their tricks for talking people into patching. First of all thanks for an interesting discussion on the topic - it was great to get into the room with barely enough slides to fill 10 min and still have a lively discussion 45min later.

For the impatient - the written feedback is available as a Google Doc. The most common advice I heard involved patience, teaching, explaining, fast feedback and reward.

One warning before going into more detail on the talk: All assumptions and observations stated are highly subjective, influenced by my personal experience or by whatever the experience of the audience was. Do not expect an objective, balanced, well-researched analysis of the problems in general. That said, let's start with the talk itself. Before the talk I decided to limit scope to getting people in that have limited experience with open source. That intentionally excluded any downstream projects depending on one's code. Though in particular interaction with common Linux distributions and their package maintainers is vital, that issue warrants a separate talk and discussion.

I divided those inexperienced with open source into three groups to keep discussion somewhat focused:

  • Students learning about open source projects during their education who have neither a background in software engineering nor in open source but are generally very eager to learn and open to new ideas.
  • Researchers learning about the concept as part of a research grant who have some software engineering experience, some experience with open source - in particular with using it - but in general do not have writing open source software as their main objective, and have to participate as part of their research grant.
  • Software engineers who have experience with software engineering and some experience with open source - in particular with using it - and who in general have both strong opinions on what the right way of doing things is and a strong position in their team that helps them in no way when starting to contribute.


One very common way



To understand some of the issues below let me first highlight what seems to be the most common way to become involved with any Apache project: Usually it starts with using one of their software packages. After some time what is shipped no longer fits your needs, reveals bugs that stop you from reaching your goals or is missing one particular feature - even if that is just one particular method being protected instead of private.

People fix those issues. As the best software developers are utterly lazy, they contribute stuff back to the project to avoid the work of having to maintain their private fork just for some simple modification. The more features of a project are being used, the more likely it gets that larger contributions become possible as well. Overall this way of selecting issues to fix has a lot to do with scratching your own itch. In the end this kind of issue prioritisation also influences the general direction of a project: Whatever is most important to those actively contributing is driving the design and development. So the only way to change a project's direction to better fit your needs is to start getting active yourself: Those that do are the ones that decide.

Students



Let's take a closer look at students aspiring to work on an open source project. They are very keen on contributing new stuff, learning the process and open to new ways of doing things. However for the most part they are not active users of the projects they selected so they do not directly see what is important to fix. In addition they have only limited software development experience - at least when looking at German universities, bug trackers, source version control, build systems, release management, maintaining backwards compatibility and unit test frameworks are on no schedule - and most likely shouldn't be either. So your average student has to learn to deal with checking out code, compiling it, getting it into their favourite editor, adding tests and making them pass.

Apart from teaching and giving even simple feedback, it helps to provide the right links to literature at the right times, and generally mentor students actively. In addition it can be helpful to leave non-critical, easy to fix issues open and mark them as "beginner level" to make it easier for newcomers to get started. One last piece of advice: Get students to publish what they do as early and as often as possible. Back in the days I used to do projects at TU Berlin with the goal of getting students to contribute to Mahout. In the first semester I left the decision on when to open up the code to the students - they never went public. In the second semester I forced them to publish progress on a weekly basis (and made that part of how their final evaluation was done) - suddenly what was developed turned into a patch committed to the code base.

Researchers



A second group of people that has an increasing interest in open source projects are researchers. In particular for EU research grants the promise of providing results and software developed with the help of European taxpayers' money under an open source license has become an important plus when asking for project grants.

However before becoming all too optimistic it might make sense to take a closer look: Even though there is an open source check box on your average research grant, that by no means leads to highly motivated, well educated new contributors for your project: With software development only being a means to reach the ultimate goal of influential publications, researchers usually do not have the time and motivation to polish software to the level needed for a successful and useful contribution. In addition the concept of maintaining your contribution for a longer time usually does not fit the timeline and timeframe of a research project.

Apart from teaching and mentoring, projects themselves should start asking for the motivation of the contribution. There are a few popular arguments to contribute patches back. However not all of them really work for the research use case: The cost of maintaining a fork is close to zero if you intend to never upgrade to a new version and do not need security fixes. Another common argument is an improved visibility of your work and an improved reputation of yourself as software developer. If software development for you is just a means to reach a much higher goal those arguments may not mean much to you. A third common argument is that of improving code quality by having more than one pair of eyes review it - and where would you get a better review than in the project bringing together the original code authors? However if ultimate stability, security and flexibility is not your goal then that too may not mean much to you.

Key is to find out where the interest for working on open source comes from and build up arguments from there.

Software engineers



The third group I identified was professional software developers - as clarified after a question from the audience: Yes, I consider people who are unable to create, read, apply patches as professional software developers. If I would exclude these people there would be noone left who earns his living with software development and does not already work on open source projects.

In contrast to the above groups these people have extensive software development experience. However that also means that after having seen a lot of stuff that works and that does not work they do have a strong position in their teams. Usually those fixing issues in libraries they use re the ones that have established work-flows that work for them very well and who are used to being pretty influential. When going into an open source community however no-one knows them. In general they are only judged based on their patch. They get open feedback - in the context of that project. Projects tend to have established coding guidelines, best practices, build systems - that may differ from what you are used to in your corporate environment.

Getting up to speed in such an environment can be intimidating, in particular if everything you do is public, searchable and findable by definition. That makes it all the more important to get involved and get feedback early, even by putting early sketches of your plan online.

However, with everything being open there is also one major positive side when it comes to motivating contributors: Give credit where credit is due - add praise to the issue tracker by assigning issues to the one providing the patch, and add the name of the contributor to your release notes. When the contribution is substantial, mention it by name in talks, presentations and publications.

Another important issue here is the influence of deadlines: If it takes half a year to get feedback on your particular improvement, the reason why you made it may no longer exist - the project may have been cancelled, the developer may have moved to a different team, or the patch may have been applied internally as is, fixing the existing issues. Fast feedback on new patches, in particular if they are clean and come with tests, is vital. One positive example of providing feedback on formal issues quickly is the automated review bot at Apache Hadoop: It automatically checks things like style, the addition of tests and whether existing tests still pass shortly after the patch is submitted. Just one nitpick from the audience: The output of that bot could either be marked more clearly as "this is automated" or be worded in a friendlier way - if a human had done the review, they would have mentioned the positive things first before criticising what is wrong.
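To make the audience's nitpick a bit more concrete, here is a minimal sketch of what "automated, clearly labelled, positives first" feedback could look like. This is not the Hadoop bot and does not reflect how it is implemented: the function name `review_patch`, the two toy checks and the 100 character limit are all assumptions made up for illustration.

```python
def review_patch(patch_text, max_line_length=100):
    """Run two toy checks on a unified diff and return a feedback message.

    Hypothetical helper for illustration only; a real review bot checks far more.
    """
    lines = patch_text.splitlines()

    # Files the patch writes to, taken from the "+++ " header lines.
    touched = [line[4:].strip() for line in lines if line.startswith("+++ ")]

    # Added content lines (simplification: anything starting with "+++"
    # is treated as a file header and skipped).
    added = [
        line[1:]
        for line in lines
        if line.startswith("+") and not line.startswith("+++")
    ]

    positives, problems = [], []

    if any("test" in path.lower() for path in touched):
        positives.append("The patch adds or updates tests.")
    else:
        problems.append("The patch does not appear to touch any test files.")

    too_long = [line for line in added if len(line) > max_line_length]
    if too_long:
        problems.append(
            f"{len(too_long)} added line(s) exceed {max_line_length} characters."
        )
    else:
        positives.append("All added lines are within the length limit.")

    # Mark the message as automated and list the positive findings first.
    report = ["[automated review, not written by a human]"]
    report += ["+1 " + text for text in positives]
    report += ["-1 " + text for text in problems]
    return "\n".join(report)


# Made-up example patch that changes code but no tests.
example = """\
--- a/src/Foo.java
+++ b/src/Foo.java
@@ -1 +1 @@
-int x = 1;
+int x = 2;
"""

feedback = review_patch(example)
print(feedback)
```

The only point of the sketch is the ordering and the label in the first line of the report: the contributor sees immediately that no human wrote the message, and reads what went well before reading what needs fixing.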

Last but not least (this applies to researchers as well), there may be legal issues lurking: Most if not all contracts entail that at least what you do during working hours belongs to your employer - so it's up to them what gets open sourced and what doesn't. Suddenly your very technical new contributor has to convince management, deal with legal departments and work their way through the employer's processes - most likely without deep prior knowledge of open source licenses, let alone contributor agreements (or did you know what the Apache CCLA entails, let alone how to explain it to others, before really getting active?).

General advice



To briefly summarise the most important points:


  • Give feedback fast - projects only run for so long, interest only lasts for so long. The faster a contributor is told what is not too great about their patch, the more likely those issues are to be fixed as part of the contribution. (Inspired by Avro and Zookeeper, who were amazingly fast in providing feedback, committing and, in the case of Avro, even releasing a fixed version.)
  • When it comes to new contributors, be patient and remain friendly even when faced with seemingly stupid mistakes.
  • Give credit where credit is due - or could be due. Mention contributors in publications, press releases, release notes, the bug tracker. Let them know that you do. (Inspired by Drools, Tomcat, Zookeeper, Avro). Pro-tip: Make sure to have no typo in people's names even if checking takes one extra minute. (Learned from Otis).
  • Use any chance you get to teach the uninitiated about the whole patch process. I know that this seems trivial to those who work with open source on a daily basis. However, when getting dependencies through Maven it may already be cumbersome to figure out where to get the source from. When used to git in the daily workflow, it may be a hurdle to remember how to check out stuff from svn ;) Back in June we had a Hadoop Hackathon in Berlin that was well attended - mostly by non-committers. Jakob Homan proposed a rather unusual but very well received format: In the Hadoop bug tracker there are several issues marked as trivial (typos in documentation and the like). Attendees were asked to choose one of these issues, check out the source, create a patch and contribute it back to the project. Optionally they were shown how the process continues from there on the committer side of things. It may seem trivial to mechanically go through the patch process, however it helps lower the bar: when you later have a real issue to fix, you are already accustomed to how it works. If instead of contributing to Apache you are more into working on the Linux kernel, I'd like to advise you to watch Greg Kroah-Hartman on writing and submitting your first Linux kernel patch (FOSDEM).
  • Last but not least, make sure to lower the bar for contribution - do not require people to jump through numerous hoops; in general even just getting a patch ready is complicated enough. Provide a "how to contribute" page (e.g. see the how to contribute and how to become a committer pages in the Apache Mahout wiki).
  • In particular when your project is still very young, lower the bar by turning contributors into committers quickly - even if they are "just" contributing documentation fixes. In my view these are among the most important contributions there are, as only users spot areas for documentation improvement.
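For readers who have never looked inside a patch, the following is a minimal sketch of what one actually is: a plain-text description of the difference between two versions of a file. It uses Python's standard `difflib` module rather than `svn diff` or `git diff`, purely so that it is self-contained; the file name and the documentation text are made up for illustration, and each project's own contribution guide describes the real workflow.

```python
import difflib

# Hypothetical documentation snippet containing a typo ("teh").
original = [
    "Getting started\n",
    "Run teh build before submitting a patch.\n",
]

# The same snippet after the trivial fix.
fixed = [
    "Getting started\n",
    "Run the build before submitting a patch.\n",
]


def make_patch(before, after, path):
    """Return a unified diff (the textual form of a patch) as one string."""
    diff = difflib.unified_diff(
        before,
        after,
        fromfile="a/" + path,
        tofile="b/" + path,
    )
    return "".join(diff)


# The path is an assumption; any file name works for the illustration.
patch = make_patch(original, fixed, "docs/getting_started.txt")
print(patch)
```

Lines starting with "-" are removed and lines starting with "+" are added; everything else is context. This text is what gets attached to an issue or sent to a mailing list for review.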


In case you yourself are thinking about contributing and need some additional advice as to why and for what purposes: Dr Dobbs has more information on the reasons why developers tend to start contributing to Apache software, Shalin explains why he contributes to open source, on the Mahout mailing list we had a discussion on why students should also consider contributing, and on the Apache community mailing list there was an interesting discussion on whether developers working on open source are happier than those who don't.