Gender diversity in tech - a promise

It doesn't take much to realize that the gender ratio in technology is severely out of balance. Whether you look at employees at tech companies, computer science faculty members, graduates in computer and information sciences, or user surveys on StackOverflow, you find almost the same picture everywhere.

From personal experience, it seems to me that the situation is considerably worse in Europe than in the US, but I don't have any data to back this up.

If there is any good news, it's that the problem is increasingly recognized - not nearly enough, but at least it's going in the right direction. The problem is complex and there is a lot of debate about how to solve it most effectively. This post is not about going into this debate, but rather to make a simple observation, and a promise.

The simple observation is that I think a lot of it has to do with role models. We can do whatever we want, but if a particular group is overwhelmingly composed of one specific phenotype, we have a problem, because anyone who is not of that phenotype is more likely to feel "out of place" than they would otherwise, no matter how welcoming that group is.

The problem is that for existing groups, progress may be slow because adding new people to the group to increase diversity may initially be difficult, for many different reasons. Having a research group that is all-male (with the exception of my assistant), I am painfully aware of the issues. 

For new groups, however, progress can be faster, because it is often easier to prevent a problem than to fix one. And this is where the promise comes in. Last year, I became the academic director of a new online school at EPFL (the EPFL Extension School, which will focus on continued technology education). This sounds more glorious than it should, because at the time, this new school was simply an idea in my head, and the team was literally just me. But I made a promise to myself, namely that I would not build a technology program and have practically no women teaching on screen. No matter how well they would do it, if the teachers were predominantly male, we would be sending once again, without ill intent, the subtle signal that technology is something for guys.

Today, I want to make this promise publicly. At the EPFL Extension School, we will have gender parity for on-screen instructors. I can't guarantee that we will achieve this at all times, because we are not (yet) a large operation, and I also recognize that at any point in time we may be out of balance, hopefully in both directions, due to the fact that people come and people go. But it will be part of the school's DNA, and if we're off balance, we know what we have to do, and the excuse that it's a hard problem once you have it won't be acceptable. 

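// Placeholder flag (an assumption for illustration):
// true when on-screen instructors are at gender parity.
var genderParity = true;

var promise = new Promise(function(resolve, reject) {
  if (genderParity) {
    resolve("Doing great - go on.");
  }
  else {
    reject(new Error("Fail. Fix immediately!"));
  }
});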

Technology in public health: A discussion with Caroline Buckee

A few weeks ago, I came across a piece in the Boston Globe entitled "Sorry, Silicon Valley, but ‘disruption’ isn’t a cure-all". It's a very short op-ed, so I recommend reading it. The piece was written by Caroline Buckee, Assistant Professor at the Harvard T.H. Chan School of Public Health. I know Caroline personally, and given that she has written some of the key papers in digital epidemiology, I was surprised to read her rant. Because Caroline is super smart and her work extremely innovative, I started to ask myself whether I was missing something, so I decided to write to her. My idea was that rather than arguing over Twitter, we could have a discussion by email, which we could then publish on the internet. To my great delight, she agreed, and I am now posting the current state of the exchange here.

From: Marcel Salathé
To: Caroline Buckee
Date: 16. March 2017

Dear Caroline,

I hope this email finds you well. Via Marc I recently found you on Twitter, and I’m looking forward to now seeing more frequently what you’re up to.

Through Twitter, I also came across an article you wrote in the Boston Globe (the "rant about pandemic preparedness", as you called it on Twitter). While I thought it hilarious as a rant, I also thought there were a lot of elements in there that I strongly disagree with. At times, you come across as saying “how dare these whippersnappers with their computers challenge my authority”, and I think if I had been a just-out-of-college graduate reading this, excited about how I could bring digital tools to the field of global health, I would have found your piece deeply demotivating.

So I wanted to clarify with you some issues you raised there, and share those with the broader community. Twitter doesn’t work well for this, in my experience; but would you be willing to do this over email? I would then put the entire discussion on my blog, and you can of course do whatever you want to do with it. I promise that I won’t do any editing at all, and I will also not add anything beyond what we write in the emails.

Would you be willing to do this? I am sure you are super busy as well, but I think it could be something that many people may find worthwhile reading. I know I would.

All the best, and I hope you won’t have to deal with snow any longer in Boston!

Cheers,
Marcel


From: Caroline Buckee
To: Marcel Salathé
Date: 16. March 2017

Hi Marcel,

Sure, I would be happy to do that, I think this is a really important issue - I'll put down some thoughts. As you know, I like having technical CS and applied math grads in my group, and in no way do I think that the establishment should not be challenged. We may disagree as to who the establishment actually is, however. 

My concern is with the attitudes and funding streams that are increasingly prevalent among people I encounter from the start up world and Silicon Valley more generally (and these look to become even more important now that NIH funding is going away) - the attitude that we no longer need to do real field work and basic biology, that we can understand complex situations through remote sensing and crowd sourcing alone, that short term and quick fix tech solutions can solve problems of basic biology and complex political issues, that the problem must be to do with the fact that not enough physicists have thought about it. There is a pervasive arrogance in these attitudes, which are ultimately based on the assumption that technical skill can make up for ignorance.

As for the idea that my small article would give any new grad pause for thought, I hope it does. I do not count myself as an expert at this stage of my career - these issues require years of study and research. I believe I know enough to understand that a superficial approach to pandemic preparedness will be unsuccessful, and I am genuinely worried about it. The article was not meant to be discouraging, it was supposed to make that particular echo chamber think for a second about whether they should perhaps pay a little more attention to the realities, rich history, and literature of the fields they are trying to fix. (As a side note, I have yet to meet a Silicon Valley graduate in their early 20's who is even slightly deflated when presented with evidence of their glaring ignorance... but I am a bit cynical...!)

In my experience, my opinion is unpopular (at my university and among funders), and does not represent "the establishment". At every level, there is an increasing emphasis on translational work, a decreasing appetite for basic science. This alarms me because any brief perusal of the history of science will show that many of the most important discoveries happen in pursuit of some other scientific goal whose original aim was to understand the world we live in in a fundamental sense - not to engineer a solution to a particular problem. In my field, I think the problem with this thinking is illustrated well by the generation of incredibly complex simulation models of malaria that are intended to help policy makers but are impossible to reproduce, difficult to interpret, and have hundreds of uncertain parameters, all in spite of the fact that we still don't understand the basic epidemiological features of the disease (e.g. infectious duration and immunity).

I think there is the potential for an amazing synergy between bright, newly trained tech savvy graduates and the field of global health. We need more of them for sure. What we don't need is to channel them into projects that are not grounded in basic research and deeply embedded in field experience.

I would enjoy hearing your thoughts on this - both of us are well-acquainted with these issues and I think the field is quite divided, so a discussion could be useful.

I hate snow. I hate it so much!

Take care,

Caroline  


From: Marcel Salathé
To: Caroline Buckee
Date: 18. March 2017

Dear Caroline,

Many thanks for your response, and thanks for doing this. I agree with you that it’s an important issue.

I am sorry that you encounter the attitude that we "no longer need to do real field work and basic biology, that we can understand complex situations through remote sensing and crowd sourcing alone”. This would indeed be an arrogant attitude, and I would be as concerned as you are. It does not reflect, however, my experience, which has largely been that basic research and field work are all that is needed, and the new approaches we and others tried to bring to the table were not taken seriously (the “oh you and your silly tech toys” attitude). So you can imagine why your article rubbed me a bit the wrong way.

I find both of these attitudes shortsighted. Let’s talk about pandemic preparedness, which is the focus of your article. Why wouldn’t we want to bring all possible weapons to the fight? It's very clear to me that both basic science and field work as well as digital approaches using mobile phones, social media, crowdsourcing, etc. will be important in addressing the threat of pandemics. Why does it have to be one versus the other? Is it just a reflection of the funding environment, where one dollar that is going to a crowdsourcing project is one dollar that is taken away from basic science? Or is there a more psychological issue, in that basic science is worthy of a dollar, but novel approaches like crowdsourcing are not?

You write that “the next global pandemic will not be prevented by the perfectly designed app. “Innovation labs” and “hackathons” have popped up around the world, trying to make inroads into global health using technology, often funded via a startup model of pilot grants favoring short-term innovation. They almost always fail.” And just a little later, you state that "Meanwhile, the important but time-consuming effort required to evaluate whether interventions actually work is largely ignored.” Here again, it’s easy to interpret this as putting one against the other. Evaluation studies are important and should be funded, but why can’t we at the same time use hackathons to bring people together and pick each other’s brains, even if only for a few days? In fact, hackathons may be the surest way to demonstrate that some problems can’t be solved on a weekend. And while it’s true that most ideas developed there end up going nowhere, some ideas take on a life of their own. And sometimes - very rarely, but sometimes - they lead to something wonderful. But independent of the outcome, people often walk away enlightened from these events, and have often made new connections that will be useful for their futures. So I would strongly disagree with you that they almost always fail.

Your observation that there is "an increasing emphasis on translational work, a decreasing appetite for basic science” is probably correct, but rather than blaming it on 20-year-old Silicon Valley graduates, I would ask ourselves why that is. Translational work is directly usable in practice, as per its definition. No wonder people like it! Basic research, on the other hand, is a much tougher sell. Most of the time, it will lead nowhere. Sometimes, it will lead to interesting places. And very rarely, it will lead to absolutely astonishing breakthroughs that could not have happened in any other way (such as the CRISPR discovery). By the way, in terms of probabilities of success, isn’t this quite similar to the field of mobile health apps, which you dismissed as "a wasteland of marginally promising pilot studies, unused smartphone apps, and interesting but impractical gadgets that are neither scalable nor sustainable”? But I digress. Anyways, rather than spending our time explaining this enormous value of basic research to the public, which ultimately funds it, we engage in petty fights over vanity publications and prestige. People holding back data so that they can publish more; people publishing in closed access journals; hiring and tenure committees valuing publications in journals with high impact factors much more than public outreach. I know you agree here, because at one point you express this very well in your piece when you say that "the publish-or-perish model of promotion and tenure favors high-impact articles over real impact on health."

This is exactly what worries me, and it worries me much, much more than a few arrogant people from Silicon Valley. We are at a point where the academic system is so obsessed with prestige that it created perverse incentives leading to the existential crisis science finds itself in. We are supposed to have an impact on the world, but the only way impact is assessed is by measures that have very little relevance in the real world, such as citation records and prizes. We can barely reproduce each other’s findings. For a long time, science has moved away from the public, and now it seems that the public is moving away from science. This is obviously enormously dangerous, leading to “alternative fact” bubbles, and politicians stating that people have had enough of experts.

Against this background, I am very relieved to see scientists and funders excited about crowdsourcing, about citizen science, about creating apps that people can use, even at the risk that many of them will be abandoned. I would just wish that when traditional scientific experts see a young out-of-college grad trying to solve public health with a shiny new app, they would go and offer to help them with their expertise - however naive their approach, or rather *especially* when the approach is naive. If they are too arrogant to accept the help, so be it. The people who will change things will always appreciate a well-formed critique, or a piece of advice that helps them jump over a hurdle much faster.

What I see, in short, is that very often, scientific experts, who already have a hard time getting resources, feel threatened by new tech approaches, while people trying to bring new tech approaches to the field are getting the cold shoulder from the more established experts. This, to me, is the wrong fight, and we shouldn’t add fuel to the fire by providing false choices. Why does it have to be "TED talks and elevator pitches as a substitute for rigorous, peer-reviewed science”; why can’t it be both?

Stay warm,

Marcel

PS Have you seen this “grant application” by John Snow? It made me laugh and cry at the same time… tinyurl.com/lofaoop


From: Caroline Buckee
To: Marcel Salathé
Date: 18. March 2017

Hi Marcel,

First of all, I completely and totally agree about the perverse incentives, ridiculous spats, and inefficiencies of academic science - it's a broken system in many ways. We spend our lives writing grants, we battle to get our papers into "high impact" journals (all of us do even though we hate doing it), and we are largely rewarded for getting press, bringing in money, and doing new shiny projects rather than seeing through potentially impactful work. 

You say that I am probably right about basic science funding going away, but I didn't follow the logic from there. We should educate the public instead of engaging in academic pettiness - yes, I agree. Basic science is a tough sell - not sure I agree about that as much, but this is probably linked to developing a deeper and broader education about science at every level. Most basic science leads nowhere? Strongly disagree! If you mean by "leads nowhere" that it does not result in a product, then fine, but if you mean that it doesn't result in a greater understanding of the world and insights into how to do experiments better, even if they "fail", then I disagree. The point is that basic science is about seeking truth about the world, not in designing a thing. You can learn a lot from engineering projects, but the exercise is fundamentally different in its goals and approach. Maybe this is getting too philosophical to be useful. 

In any case, I think it's important to link educating the public about the importance of basic science directly to the arrogance of Silicon Valley; it's not unrelated. Given that NIH funding is likely to become even more scarce, increasing the time and effort spent getting funding for our work, these problems will only get worse. I agree with you that this is a major crisis, but I do think it is important to think about the role played by Silicon Valley (and other wealthy philanthropists for that matter) as the crisis deepens. As they generously step in to fill the gaps - and I think it's wonderful that they consider doing so - it creates the opportunity for them to set the agenda for research. Large donations are given by rich donors whose children have rare genetic conditions to study those conditions in particular. The looming threat of mortality among rich old (mostly white) dudes is going to keep researchers who study dementia funded. I am in two minds about whether this increasing trend of personalized, directed funding from individuals represents worse oversight than we have right now with the NIH etc., but it is surely worth thinking about. And tech founders tend to think that tech-style solutions are the way forward. It is not too ridiculous, I don't think, to imagine a world where much if not most science funding comes from rich old white dudes who decide to bequeath their fortunes to good causes. How they decide to spend their money is up to them, but that worries me; should it be up to them? Who should set the agenda? It would be lovely to fund everything more, but that won't happen - there will always be fashionable and unfashionable approaches, not everyone gets funded, and Silicon Valley's money matters.  

Public health funding in low and middle income settings (actually, in every setting, but particularly in resource-limited regions) is also a very constrained zero sum game. Allocating resources for training and management of a new mHealth system does take money away from something else. Crowd sourcing and citizen science could be really useful for some things, but yes, in many cases I think that sexy new tech approaches do take funding away from other aspects of public health. I would be genuinely interested - and perhaps we could write this up collaboratively - to put together some case studies and try to figure out exactly how many and which mHealth solutions have actually worked, scaled up, and been sustained over time. We could also dig into how applied public health grants are allocated by organizations to short-term tech pilot studies like the ones I was critical of versus other things, and try to evaluate what that means for funding in other domains, and which, if any, have led to solutions that are being used widely. This seems like it might be a useful exercise.

We agree that there should be greater integration of so-called experts and new tech grads but I don't see that happening very much. I don't think it's all because the experts are in a huff about being upstaged, although I'm sure that happens sometimes. If we could figure this out I would be very happy. This is getting too long so I will stop, but I think it's worth us thinking about why there is so little integration. I suspect some of it has to do with the timescales of global health and requirements for long-term relationship building and slow, careful work in the field. I think some of it has to do with training students to value get-rich-quick start-up approaches and confident elevator pitches over longer term investments in understanding and grappling with a particular field. I do think that your example (a young tech grad trying to naively build an app, and the expert going to them to try to help) should be reversed. In my opinion, the young tech grad should go and study their problem of choice with experts in the field, and subsequently solicit their advice about how to move forward with their shiny app idea, which may by then have morphed into something much more informed and ultimately useful...

C

PS :)

From: Marcel Salathé
To: Caroline Buckee
Date: 19. March 2017

Dear Caroline

My wording of “leads nowhere” may indeed have been too harsh; I agree with you that if well designed, basic research will always tell us something about the world. My point there was indeed that it doesn’t necessarily lead to a product or a usable method. This is probably a good time to stress that I am a big proponent of basic research - anyone who doubts that is invited to go read my PhD thesis, which was on a rather obscure aspect of theoretical evolutionary biology!

I actually think that the success distribution of basic research is practically identical with that of VC investments. Most VC investments are a complete loss, some return the money, very few return a few X, and the very rare one gives you 100X - 1000X. So is it still worth doing VC investments? Yes, as long as that occasional big success comes along. And so it is with basic research, except, as you say, and I agree, that we will never lose all the money, because we always learn something. But even if you dismiss that entirely, it would still be worth doing.

The topic we seem to be converging on is how much money should be given to what. Unless I am completely misinterpreting you, the frustration in your original piece came from the notion that a dollar in new tech approaches is a dollar taken away from other aspects of public health. With respect to private money, I don’t think we have many options. Whoever gives their wealth gets to decide how it is spent, which is only fair. I myself get some funding from private foundations and I am very grateful for it, especially because I am given the necessary freedom I need to reach the goals I want to achieve with this funding. The issue we should debate more vigorously is how much public money should be spent on what type of approach. In that respect, I am equally interested in the funding vs outcome questions you raised.

As to why there isn’t more integration between tech and public health, I don’t have any answers. My suspicion is that it is a cultural problem. The gap between the two worlds is still very large. And people with tech skills are in such high demand that they can choose from many other options that seem more exciting (even if in reality they end up contributing to selling more and better ads). So I think there is an important role for people like us, who have legs in both worlds, and who can at least try to communicate between the two. This is why I am so careful not to present them as “either or” approaches - an important part of the future work will be done by the approaches in combination.

(I think we’ve clarified a lot of points and I understand your view much better now. I’m going to go ahead and put this on the blog, also to see if there are any reactions to it. I am very happy to go on and discuss more - thanks for doing this!)

Marcel

Self-driving cars: the public health breakthrough of the early 21st century

As readers of this blog know, I am a big fan of self-driving cars. I keep saying that self-driving cars are going to be the biggest public health breakthrough of the early 21st century. Why? Because the number of people that get injured or killed by humans in cars is simply astounding, and self-driving cars will bring this number close to zero.

If you have a hard time believing this, consider these statistics from the Association for Safe International Road Travel:

In the US alone,

  • each year 37,000+ people die in car crashes - over 1,600 are children, almost 8,000 are teenagers
  • each year, about 2.3 million people are injured or disabled

Globally, the numbers are even more staggering:

  • each year, almost 1.3 million people die in car crashes
  • each year, somewhere between 20 and 50 million people are injured or disabled
  • Car crashes are the leading cause of death among young people ages 15-29, and the second leading cause of death worldwide among young people ages 5-14.

If car accidents were an infectious disease, we would very clearly make driving illegal.

Self-driving cars will substantially reduce those numbers. It has recently been shown that the current version of Tesla's autopilot reduced crashes by a whopping 40% - and we're in early days when it comes to the sophistication of these systems. 

All these data points lead me to the conclusion stated above, that self-driving cars are going to be the biggest public health breakthrough of the early 21st century. 

I cannot wait to see the majority of cars being autonomous. I have two kids, ages 4 and 7 - the only time I am seriously worried about their safety is when they are in a car, or when they play near a road, and the stats make this fear entirely rational. According to the CDC, injuries due to transportation are the leading cause of death for children in the US, and I don't assume that this is much different in Europe.

In fact, the only time I am worried about my own safety is when I am in a car, or near a car. I am biking to and from the train station every day, and if you were to plot my health risk over the course of a day, you'd see two large peaks exactly when I'm on the bike.

If there was any doubt that I am super excited to see fully autonomous vehicles on the street, I hope to have put it to rest. But what increasingly fascinates me about self-driving cars, beyond the obvious safety benefits, is what they will do to our lives, and how they will affect public transport, cities, companies, etc. I have some thoughts on this and will write another blog post later. My thinking on this has taken an unexpected turn after reading Rodney Brooks's blog post entitled "Unexpected Consequences of Self Driving Cars", which I recommend highly.


Some News(letter)

(It appears that as I was updating this website, I accidentally sent out a test post - my sincere apologies. The digital transformation is hard. QED.)

I've recently deleted my Facebook account, once again. I'm pretty sure that it's final this time. I respect Facebook as a company and I can see how one can love their products. But it's not for me. It used to be great for staying in touch with friends, and seeing pictures and news of friends and family. But Facebook ended up being mostly ads and "news" about Trump. Then came the fake news story. Then came the long-held realization about micro-targeting for political purposes. Then came the censorship deal with China's government. And the realization that what I see on Facebook, and what my friends see about me, is entirely driven by an algorithm. And this algorithm is entirely driven to maximize profits for Facebook. None of this on its own is a showstopper, but in combination, and given that there was little upside to being on Facebook, I had to quit. In a strange way, I felt dirty using Facebook - I knew I was the product that was being monetized, but I checked nevertheless multiple times a day, just to think each and every time "why am I doing this?".

I have seen a lot of "quit social media, you'll be better off" posts lately. For me, Twitter has been too much of a benefit, professionally and personally, to just walk away from it. It truly has become my major source of professional news. I try to keep the politics and personal stuff to a minimum, and appreciate it if others do that as well. To me, Twitter is what LinkedIn wanted to be - a professional network. The fact that my Twitter client does not algorithmically filter the tweets is a great benefit. The fact that Twitter has remained true to its roots - short messages where you need to get to the point fast - is a great benefit. The fact that Twitter is public by default is a great benefit - it means that I think twice about posting rants (sometimes I fail at this).

But I feel uneasy about Twitter too. The fact that I am delegating communication to a third party is troubling. What if Twitter gets sold to another company and becomes a horrible product? What if they change in a way I don't like? What if they go out of business? I cannot contact the people who follow me, without Twitter's blessing. If Twitter is gone, my contacts are gone. This is not good.

That is why I decided to start a newsletter called Digital Intelligence. I know there are some people who follow me on Twitter because I provide them with interesting bits of news, typically around anything digital - technology, education, academia, economy, etc. - that I find on the web and that I love to share. I think for many it would in fact be more efficient to subscribe to the newsletter instead. Email may not feel like the hip thing in 2017, but I think there are many benefits to email that are simply not there with Twitter. All of us already use email. It's easily searchable. It's not typically filtered by an algorithm (beyond the spam filter, of course). It allows us to go beyond 140 characters when necessary.

But ultimately, I want to be able to communicate with other people without depending on a third-party platform. In the digital world, email is the only way to do that.

Data of the people, by the people, for the people

About 150 years ago, the American president Abraham Lincoln gave a very short speech - only a few minutes long - on a battlefield in Gettysburg, Pennsylvania. The occasion was to honor the soldiers who died in a fierce battle at the height of the American Civil War. Despite the brevity of the speech, and the fact that almost nobody understood what Lincoln was saying, it is now perhaps the most famous speech in US history by a US president. It is only ten sentences long, but to condense it even further here, Lincoln essentially said that there is nothing anyone could do to properly honor the fallen soldiers, other than to help ensure that the idea of this newly conceived nation would continue to live on, and that “government of the people, by the people, for the people, shall not vanish from the earth.”

Why is this such a powerful line? It’s powerful because it expresses in very simple terms the basic idea of democracy, that we the people can form government, and that we the people can make political decisions, which is in itself the best guarantee that the decisions are made in the best interest of us, the people.

So, what does all of this have to do with open data?

Fundamentally, government is about organizing power. The vast majority of us agree that power should be distributed among the many, not the few. To quote John Dalberg-Acton: “Liberty consists in the division of power. Absolutism, in the concentration of power.” That is what democracy is about. And that is the discussion we should have about data. Because data is power. And if liberty consists in the division of power, or in the divided access to power, then that means that liberty also consists in the division of data.

But what does it even mean to say that data equals power?

Data contains information, and information can be used for commercial gains. We all understand that. But the power of data is much more fundamental than that. To understand this, we need to reflect on where we are as humans, at this point in time. We have now entered the second machine age - an age where machines will not only be much stronger, physically, as they have been for centuries, but also much, much smarter than we are. Not just a little smarter, but orders of magnitude smarter. Most of us have come to terms with the fact that machines will achieve human intelligence. But think about machines that are ten times smarter, a hundred times smarter. How do you feel about a machine that is a million times smarter than a human? It’s a question worth asking, because while we may not live to see such a machine, our children, or grandchildren, probably will. In any case, even a machine that’s 100 times smarter than us is something you wouldn’t want to compete against. You wouldn’t feel comfortable if such machines were controlled by a small elite group. However, if such a machine were an agent, at your service, and if everyone had such agents, which they’d use to make their lives better, that would be an entirely different story. Thus, when AI - artificial intelligence - becomes very powerful, it would be a disaster if that power were in the hands of a few. We would go back to absolutism, and despotism. We therefore need to ensure that the power of AI is distributed widely.

There are some efforts, like the non-profit organization OpenAI, that aim to ensure that this is the case. In fact, if you follow the field of machine learning a little bit, a field that is currently at the heart of many of the AI-relevant breakthroughs, then you would see that most organizations are now open-sourcing the code that’s behind these AI breakthroughs. That’s a good thing, because it helps ensure that the raw machinery to build AI, the algorithms, is indeed in the hands of many.

But this is not enough - not nearly. It’s very important to recognize that the power of AI is not simply in the algorithms; it’s not simply in the technology per se. It’s in the data. AI becomes intelligent when it can quickly learn on large amounts of data. AI without data does not exist. The analog version, the human brain, can perhaps help us to understand this idea a bit better. A human brain, in isolation, can only do so many things. It’s when the brain can learn on data that the magic happens. We call this education, or learning more generally. The brain itself is necessary, but it is the access to data - in the form of knowledge, and education - that makes us the most intelligent individuals to ever walk the face of the earth; of such an intelligence that we can even create artificial intelligence. And to take this analogy one step further, if you learn on small, false, or just generally crappy data, your brain will consistently make the wrong predictions. Coincidentally, this is why science has been such a boon for mankind: the scientific method helps us ensure that our brains get trained on high quality data.

So this is the central idea here: 

The enormous power of AI is based on data. If we want everyone to have access to this power, we need widespread access to data.

Put slightly differently:

Broad open data access is an absolute necessity for human liberty in the machine age.

If we accept this, then the question immediately arises, how do we get there? The fact that AI power is derived from data also means that from an economic perspective, privileged data access is incredibly valuable. Market players with privileged data access have absolutely no interest in losing this privilege. This is understandable - in the information economy, being able to extract information from data that can be used commercially is a matter of life and death, economically speaking. Forcing these players to give up their privileged access to data, which they generally collected themselves, would likely have severely negative economic consequences. It would also be highly unethical - for example, I’d be very upset if we forced Google to open up their data centers where anyone could have access to my data. There has to be another way.

I would like to offer a suggestion for another way. Access to personal data should be controlled by those who generate the data, not by those who collect it. The data generator is the person whose data is collected. In order for the data generator to be able to control access, the collector needs to provide the person a copy of the personal data.

Let’s take an example. Let’s say you use a provider’s map on your smartphone to drive from A to B. As you’re driving, GPS data of your trip is collected by the app maker. The app maker uses this kind of data to give you real-time traffic information. Great service - but you’ll never be able to access this data. You should be able to access this data, either in real time or with some delay, and do whatever you please with it, from training your own AI to sharing it with or selling it to third parties.

Another example. Let’s say you track your fitness with some device, you always shop for food at the same grocery store, and you also took part in a cohort study where your genome was sequenced, with your permission of course. The fitness device maker may reuse your data to make a more compelling product; the grocery store may direct ads at you for new products that fit your profile; and the cohort study will use your DNA data for research. All good - but is it easy for you to combine these three data sources? Not at the moment. You should be able to access all three data sources - your fitness data, your nutrition data, and your DNA - without having to ask anyone for permission, for whatever reason. If you’re now asking, “why would anyone want that data”, you are asking the exact wrong question. It’s not anyone’s business why you would want that data - the point is that you should be able to get it with zero effort, in machine-readable form, and then you should be allowed to do with it whatever you want to. It's your data.

In some situations, we’re already close to this scenario. For example, when you open a bank account, of course you will be able to access every last detail of any transaction at any point in time, whenever and wherever you want to, without having to ask anyone. Any banking service without this possibility would be unthinkable. Why isn’t it like this with any service? If I can have my financial data like that, why can I not have the same access to my health data, my location data, my shopping data?

Once our own data is easily accessible to us, it will be possible for us to let others access the data, provided we allow it. We can for example give the data to third parties such as trusted research groups, not-for-profit organizations, or even trusted parts of the government or trusted corporations. At the moment, this sounds very futuristic. But imagine, for example, a trusted health data organization, perhaps a cooperative, where hundreds of thousands or even millions of people share their health data. This would be an enormous data pool that could be studied by public health officials to make better recommendations. It could be investigated by pharmaceutical companies to design new drugs. And, to bring this back to the original thought about AI, anyone could use this data to improve the artificial intelligence agents that will increasingly make health decisions on our behalf.

Today, we’ll hear many excellent arguments that make the case for open data, highlighting social, political, economic and scientific aspects. My argument is that human liberty cannot exist in a machine age that is run by algorithms, unless people have broad access to data to improve their own intelligent agents. From this perspective, it makes no sense to be concerned about “smart machines”, or “smart algorithms” - the major concern should be about closed data. We won’t be able to leverage the phenomenal power of smart, learning machines for the public good, and for distributed AI - for distributed power, really - if all the data is locked away, accessible only to a select few. We need data of the people, by the people, for the people.


1 Year Apple Watch

It's now been roughly one year since I started wearing Apple Watch. I must say I find it quite a compelling device. I use it primarily as an activity tracker, and I also really love the calendar. Generally, its tight integration with Apple's ecosystem makes it a real winner for me.

It's hard to objectively justify a device. For sure I have objectively become more aware of how much I move and exercise per day. I put it on more or less first thing in the morning, but when I forget to do that, I quickly have the feeling that something is missing. In other words, it has become part of my daily routine, something only a few devices have managed to do. By comparison, I've tried pretty much all iPads since version 1 and none of them ever managed to become irreplaceable.

I do hope Apple will use its "win by continuous improvements" strategy for the watch as well. It's clear we're in the very early days of the wearable space. I'm excited to see what's ahead.

I have only one very urgent request: please fix Siri. ("I'm sorry Marcel, I did not get what you said").

The curse of self-contempt

(I wrote this post almost 9 months ago, but never published it, for reasons that now escape me. Realizing this omission today, I decided to publish it since I haven't changed my mind about the issue).

This morning, a friend shared an article on Twitter, originally published in the Guardian, with the title "Sophie Hunger: Sadly, I don't need a history to be able to exist somewhere". Sophie Hunger is a Swiss musician who, as the daughter of a diplomat, spent large parts of her life abroad (in other European countries). The article is about authenticity, home, and identity. In it, she writes:

"I can't be proud to be Swiss, although I'm predestined to have these kind of feelings. I'm afraid, I'm not an entirely humble person, but I do have the typical European extra dose of self-contempt. Yet, I discipline myself not to feel proud about my country because I know it is a dishonourable kind of feeling. What have I done to be Swiss, and why should it be an achievement? You see, there's a philosophical problem there."

Eight years ago, I left Switzerland - where I was born and raised - to travel the world a bit, and then to permanently move and live in the US. When I left the country I grew up in, I had the exact same feelings that Sophie expressed in her article. Now, having just returned, I see these feelings in an entirely different way: as part of the root of European angst, perhaps the root of European arrogance, and to some extent as a terrible curse: the curse of self-contempt.

Before I go on, let me clarify that I don't think all Europeans are angsty or arrogant. But while in the US, I have often been astounded by the arrogance of some Europeans, criticizing everything - especially the ones who were either just visiting, or hadn't been in the country for long. Even more surprisingly, Americans would usually take it lightly, laugh with the visitors, which in turn infuriated them even more - did they not get that they were being criticized (stupid!), or were they making fun of them (arrogant!)?

The curse of self-contempt is almost entirely absent in the US, at least compared to Europe. Instead, Americans are brought up to be proud of what they achieved, and full of hope for where they can go. It's widely known that American students are off the charts when it comes to self-esteem and self-confidence. And it's easy for the European critic to laugh this off, especially when the stats on important measures like reading ability and math skills are much more average. But I now believe that it's better to be overconfident about yourself than under-confident. Extremes in both directions are harmful. But in the long run, modest chronic under-confidence is much more harmful than modest chronic overconfidence.

In the culture I grew up in, I was taught that "Eigenlob stinkt", which literally means that "self-praise stinks". And that feeling is still part of the national identity - just two days ago, it was the headline of a paragraph in one of Switzerland's major newspapers (NZZ), in an article about the Swiss National holiday. Think about this expression for a moment. It quite clearly states that it is very bad if you praise yourself. How dare you praise yourself? What have you done that is worthy of praise? Let others be the judge of who is worthy of praise.

This is the curse of self-contempt - the inability to be content with yourself, or to praise yourself. It is almost unavoidable that arrogance follows. And as Sophie Hunger's paragraph shows, not only are you not supposed to praise yourself, but don't dare to be proud of your country, because it is dishonorable too. After all, you have not done anything to be Swiss, so how dare you be proud? 

This is no critique of Sophie Hunger (the irony would be unbearable). As I mentioned, I had the exact same feelings, and I am grateful that she expressed those complex feelings in a few clear sentences. I am merely pointing out that I now consider these feelings harmful to any one person, and certainly harmful to a society. Of course it's crucial to live in a society where critical thought is possible and even encouraged, and an occasional dose of self-reflection and self-criticism is certainly healthy too. But not to be allowed to praise yourself, or to be proud of the place you grew up in, strikes me as highly destructive to the development of healthy people and a healthy population.

I have no internal conflict feeling proud of what the Swiss, my ancestors, have achieved, while at the same time feeling disgust at some of the dark historical moments, and some current developments. Just like I don't have any conflict feeling happy with myself, without ignoring, and while working on, my darker sides. On a higher level, I can also feel proud to be European - part of a continent that inflicted so much harm on others, and on itself, until 60 years ago, but that has been remarkably peaceful and resilient in the past few decades, and that managed to keep its cool when others (cough, USA, cough) lost their temper for a decade. And I can do this perfectly well while being alarmed by the current inability to find good solutions to deep economic and social problems. Feeling love for something and retaining the ability to see and point out problems, with the goal to improve - these things are not mutually exclusive, but rather depend on each other.

It is my hope that I can retain this attitude for as long as possible. Just as living abroad for many years has changed me, being back at home will probably change me again over the years. Perhaps this is why I feel compelled to write this, so I can remember in the future. 

Jack of all trades

The Swiss National Science Foundation just published an interview with me, in the form of an article (you can read the article in English, French, or German). The last paragraph reads as follows:

So he wears the caps of scientist, entrepreneur, author and musician. Can he manage them all? "I envy those scientists who spend all of their energy on a single pursuit. Being active in a number of different research fields sometimes leads you to think that you lack depth in a number of them. But given that modern science is interdisciplinary, becoming involved in areas outside of one’s comfort zone is also an asset. After all, why choose one approach over another?"

I could probably write an entire book on the idea expressed in this paragraph. Interdisciplinary research has fascinated me from the beginning of my career as a scientist. Doing interdisciplinary science is hard. It's hard because, despite best efforts by the various institutions involved in science, the cards are stacked against you:

  1. A truly interdisciplinary research project is hard to get funded; experts in one discipline won't understand - or worse, will trivialize - the challenges in the other disciplines.
  2. A truly interdisciplinary research project is hard to execute; different domains speak different languages, have different theories, consider different issues relevant.
  3. A truly interdisciplinary research project is hard to get published; such projects don't fit in the neat categories of most journals that are rooted in their disciplines, and there are only a few multidisciplinary journals. Also, point 1.
  4. A truly interdisciplinary research project is hard to get noticed; there are almost no conferences, prizes, recognitions, societies, etc. for interdisciplinary work.

These challenges are increasingly recognized. Unfortunately, there is almost nothing substantial that is being done to address them. And it's not for the lack of trying. It is just simply a very, very hard problem to solve. Disciplines may be arbitrary, but they do exist for a good reason. 

But the key point I tried to address in the interview - and which led to the highly condensed last paragraph cited above - is that the biggest hurdle for doing interdisciplinary science is found in oneself. At least, that is my experience. Doing interdisciplinary science means spending much time trying to understand the other disciplines. You can't do interdisciplinary science without having a basic grasp of the other disciplines. The more you understand of the other disciplines, the more interesting your interdisciplinary research will be. 

And here's the catch: all the time you spend keeping up, at least superficially, with what's going on in the other disciplines is time you'd normally spend keeping up with your own field. As a consequence, you are constantly in danger of becoming a "jack of all trades, master of none". I highly recommend reading the Wikipedia entry on the etymology of this term. When it first emerged, it was simply "jack of all trades", meaning a person who was able to do many different things. The negative spin "master of none" was only added later, but it's deeply ingrained in our culture. The fact that similar sayings exist in many other languages, as listed on the Wikipedia page, speaks volumes.

In science, not being perceived as an outstanding expert in one particular field is a real danger to one's career, especially in the mid-career stage. The incentive structure of science is hugely influenced by reputation, which is the main reason scientists are so excited about anything with prestige. At the beginning of your career, as a student, it's clear you're not an expert; at the end, it's clear you're an expert, which presumably is why you survived in the system for so long (exceptions apply). But in the ever-growing stretch in between - especially the roughly ten years between PhD and tenure - you definitely do not want to be seen as a "jack of all trades, master of none".

Unless you don't give a damn, which, if you're like me, is what I advise you to do. 

I wasn't sarcastic when I said that I envy scientists who spend all of their time working on a single topic. Focus is something I strive for in everything I do. How marvelous to be consumed by one particular question! How satisfying it must be to point all one's neurons to a single problem, like a laser! What a pleasure to be fully in command of all the literature in your speciality! How wonderful to go back to the same conferences, knowing everyone by name, being friends with most of them. Alas, it is not for me.

I'm drawn to many different fields, just like I'm drawn to experiencing many different types of food. Goodness knows I can get obsessed about one particular food item, spending years trying to perfect it. But that doesn't mean I'm not intensely curious about all the other things that surround me. In science, I've decided I find the space between disciplines too interesting to focus exclusively on one discipline.

But this is the catch-22: you need to be able to deal with the fact that you're not as much of an expert in your main discipline as you could be. Are you able to deal with this?

One piece of advice that I would give, completely unsolicited, like everything on this blog, is to first become very, very good in one particular field. Good enough that you find it easy to publish, get funding, get jobs, get invited to conferences, and so on. At this point, you'll be in a much stronger position to branch out. You'll still face all the negative incentives listed above, but at least you have a home base you can return to if things get too crazy.

And when everything goes haywire, always remember:

Open Data: Our Best Guarantee for a Just Algorithmic Future

(Two days ago I gave a talk at TEDxLausanne - I'll post the video when it becomes available. This is the prepared text of the talk.)

Imagine you are coming down with the flu. A sudden, rapid onset of a fever, a sore throat, perhaps a cough. Worried, you start searching for your symptoms online. A few days later, as you're not getting better, you decide it's time to go see a doctor. Again a few days later, at your appointment with the doctor, you get diagnosed with the flu. And because flu is a notifiable disease, your doctor will pass on that information to the public health authorities.

Now, let's pause for a moment and reflect on what just happened. The first thing you did was to go on the internet. Let’s say you searched on Google. Google now has a search query from you with typical flu-related search terms. And Google has that information from millions of other people who are coming down with the flu as well - 1 to 2 weeks before that information makes it to the public health authorities. In other words, from the perspective of Google, it will be old news.

In fact, this example isn’t hypothetical. Google Flu Trends was the first big example of a new field called “digital epidemiology”. When it launched, I was a postdoc. It became clear to me that the data that people generate about being sick, or staying healthy, would increasingly bypass the traditional healthcare systems, and go through the internet, apps, and online services. Not only would these novel data streams be much faster than traditional data streams, they would also be much larger, because - sadly - many more people have access to the internet through a phone than access to a health care system. In epidemiology, speed and coverage are everything; something the world was painfully reminded of last year during the Ebola outbreak.

So I became a digital epidemiologist - and I wondered: what other problems could we solve with these new data? Diseases like the flu, Ebola, and Zika get all the headlines, but there is an entire world of diseases that regularly kills on a large scale that almost nobody talks about: plant diseases. Today, 500 million smallholder farmers in the world depend on their crops doing well, but help is often hard to get when diseases start spreading. Now that the internet and mobile phones are omnipresent, even in low income countries, it seemed that digital epidemiology could help, and so a colleague, David Hughes, and I built a platform called PlantVillage. The idea was simple - if you have a disease in your field or garden, simply snap a picture with your phone and load it onto the site. We’ll immediately have an expert look at it and help you.

This system works well - but there are only so many human experts available in real time. Can we possibly get the diagnosis done by a machine too? Can we teach a computer to see what’s in an image? 

A project at Stanford called ImageNet tried to do this with computer vision - they created a dataset of hundreds of thousands of images - showing things like a horse, a car, a frog, a house. They wanted to develop software that could learn from the images, to later correctly classify images that the software had never seen before. This process is called “machine learning”, because you are letting a machine learn on existing data. The other way of saying this is that you are training an algorithm on existing data. And when you do this right, then the end product - the trained algorithm - can work with information it hasn’t encountered before. But the people at ImageNet didn’t just use machine learning. They organized a challenge - a friendly competition - by saying “here, everybody can have access to all this data - if you think you can develop an algorithm that is better than the current state of the art, go for it!” And go for it, people did! Around the world, hundreds of research teams participated in this challenge, submitting their algorithms. And a remarkable thing happened. In less than five years, the field experienced a true revolution. At the end, the algorithms weren’t merely better than the previous ones. They were now better than humans.

Machine learning is an incredibly hot and exciting research field, and it’s the basis of all the “artificial intelligence” craze that’s going on at the moment. And it's not just academic: it is how Facebook recognizes your friends when you upload an image. It is how Netflix recommends which movies you will probably like. And it is how self-driving cars will bring you safely from A to B in the very near future.

Now, take the ImageNet project, but replace the images of horses and cars and houses with images of plant diseases. That is what we are now doing with PlantVillage. We are collecting hundreds of thousands of images of diseased and healthy plants around the world, making them open access, and we are running open challenges where everyone can pitch in algorithms that can correctly identify a disease. Imagine how transformational this can be! Imagine if these algorithms can be just as good as, or perhaps even better than, human experts. Imagine what can happen when you build these algorithms into apps, and release those apps for free to the 5 billion people around the globe with smartphones.

It’s clear to me now that this is not only the future of PlantVillage, but a future of applied science more generally. Because if you can do this with plant diseases, you can do this with human diseases as well. You can in principle do it with skin cancer detection. Basically, for any task where a human needs to make a decision based on an image, you can train an algorithm to be just as good. And it doesn’t stop at images, of course. Text, videos, sounds, more complex data altogether - anything is up for grabs. As long as you have enough good data that a machine learning algorithm can train on, it’s only a matter of time until someone develops an algorithm that will reach and exceed human performance. And here, we're not talking about science fiction in the next 50 years; we're talking about now, the next couple of years. And this is why these large datasets - big data - are so exciting. Big data is not exciting because it’s big per se. It’s exciting because that bigness means that algorithms can learn from vast amounts of knowledge stored in those datasets, and achieve human performance.

If algorithms derive their power from data, then data equals power. So who has the data?  Things may be ethically easy with images of horses, cars, houses, or even plant diseases -  but what about the data concerning your personal health? Who has the data about our health, data which will form the basis for smart, personalized health algorithms? The answer may surprise you, because it’s not just about your past visits to doctors, and to hospitals. It’s your genome, your microbiome, all the data from your various sensors, from smartphones to smartwatches. The drugs you took. The vaccines you received. The diseases you had. Everything you eat, every place you go to, how much you exercise. Almost anything you do is relevant to your health in one way or another. And all that data exists somewhere. In hospital databases. In electronic health records. On the servers of the Googles and Apples and Facebooks of this world. In the databases of the grocery stores, where you buy your food. In the databases of the credit card companies who know where you bought what, when. These organizations have the data on which to train the future algorithms of smart personalized healthcare.

Today, these mainly business organizations provide us with compelling services that we love to use. In the process, they collect a lot of data about us, and store them in their mostly secure databases. Their use of these data is driven primarily by the potential for commercial gain. But the data are closed, not accessible to the public - we imprison our data in those silos that only a select few have access to, because we are afraid of privacy loss. And because of this fear, we don’t let the data work for us.

Remember Google Flu Trends that I mentioned a few minutes ago? Last year, Google shut it down. Why? We can only speculate. But what this reminds us of is that those who have the data with which they can build these fantastic services... can also shut them down. And when it comes to our health, to our wealth, to our public infrastructure, we should be really careful to think deeply about who owns the data. I applaud Google for what they have done with Google Flu Trends. I am a happy consumer of many Google services that I love to use. But it is our responsibility to ensure that we don’t start to depend too strongly on systems that can be shut down any day without warning, because of a business decision that's been made thousands of miles away. 

So, how can we strike the right balance between protecting individual privacy and unleashing big data for the good of the public? I think the solution lies in giving each of us a right to a copy of our data. We can then take a copy of our data, and either choose to retain complete privacy - or we can choose to donate parts of these data to others, to research projects, or into the public domain to pursue a public good, with the reassurance that these data will not be used by insurance companies, banks and employers to discriminate against us.

Implementing this vision is not going to be easy, but it is possible. It has to be possible. Why? Two reasons (at least). First, our data is already digital, stored in machines somewhere and hence eminently hackable. We should have regulations in place to manage the risks of the inevitable data breaches. Second, we are now running full speed into a second machine age where machines will not only be much stronger than us - as they have been in the past decades - but also much, much smarter than us. We need to continue to ensure that the machines work in our common interest. It’s not smart machines and artificial intelligence we should be concerned about - they are smart and intelligent because of the data. Our concern should be about closed data. We won’t be able to leverage the phenomenal power of smart, learning machines for the public good if all the data is locked away.

Open data is not what we should be afraid of - it's what we should embrace. It’s our best guarantee that we remain in control of the algorithms that will rule our digital world in the future.

Forty

Today I'm forty. 

40.

I'm staring at this number in disbelief, even though I've seen it coming for, say, 35 years. It's not because I feel old (I don't). It's not because I can't believe how fast it's gone (I can).

It's because I can't believe how lucky I've been. I've lived a life of privilege from the day I was born. 

I was born in what is today the world's wealthiest country by almost any measure. I was born to loving parents who supported me in every decision I've made, who didn't pressure me into any particular direction. I was lucky to meet many wonderful friends across the decades and across the continents, some of whom I've only known for a few months or a few years. As far as I can tell, none of them has ever betrayed me. I have a wonderful wife who supported me in all my decisions, who stuck with me in my low times, and who is simply the best partner and mother to our children I can imagine. I have two wonderful children who make me smile every single day. With the exception of the regular childhood sniffles, they have never been seriously ill.

I have never been in a hospital as a patient in the past 40 years. I would love to keep this record going for the next 40 years.

I have a wonderful job, a tenured position in one of the world's best and most innovative universities. I'd like to think I had something to do with this - but it's hard not to see how everything has also simply lined up perfectly: a free educational system, wonderful mentors, perfect timing.

In other words, I was lucky beyond imagination.

I'm curious what's in store for me. Statistically speaking, I'm not even at half time. But ever since I've had children, I've been thinking about death more often. I'm not sure why, but there is something about seeing your children grow up that somehow reminds you that for new life to begin, old life must eventually end. I am now keenly aware that my heart could stop beating tomorrow, or that a car may run me over on my ride to work next week. So what? If anyone reads this after I've passed away, I hope they can see that I've had a wonderful life, and that I've been incredibly grateful for it. 

Perhaps I'll luck out even more, and I will see my children grow up to become independent and responsible adults. Perhaps I will see my wife make her dream come true of becoming a winemaker. Perhaps I can one day hold a grandchild in my arms. I will surely shed a tear then. Perhaps I can work on interesting projects for another two to three decades, and try to make the world a slightly better place than it was before. Perhaps I can see thousands of sunrises while marveling at the beauty of the universe, and my astronomical luck to be among the few bags of atoms that understand where they came from and why they are here. Perhaps I can drink thousands of wonderful bottles of wine, with friends new and old, laughing and crying about whatever it is life has thrown our way.

Perhaps not. Perhaps my luck runs out in a few days, a few months, a few years. So be it. Today, I am simply pausing, after having circled around the sun 40 times, somewhere in a distant corner of the universe, to be grateful for what I've had.