Leveraging our data heritage

Twenty-two years ago, when I was a doctoral student in artificial intelligence (AI) at the University of Cambridge, I had to create all the AI algorithms I needed to understand the complex phenomena I was studying. AI is computer software that performs tasks that normally require human intelligence. An algorithm is a set of rules that instructs a computer to execute specific tasks. In that era, the ability to create AI algorithms was more important than the ability to acquire and use data. That is no longer the case. Google has created an open source library called TensorFlow, which contains implementations of many widely used AI algorithms.

In this way, Google encourages people to develop applications (apps) using its software, and the payoff for Google is the data it can collect on individuals who use the apps developed with TensorFlow. Today, an AI algorithm is not a competitive advantage, but data is. The World Economic Forum calls data the new “oxygen”, while the Chinese AI specialist Kai-Fu Lee calls data the new “oil”.

The population of the African continent is increasing faster than that of any other region in the world. Africa has a population of 1.3 billion people and a total nominal gross domestic product (GDP) of US$2.3 trillion. This increase in population is in effect an increase in data, and if data is the new oil, it is akin to an increase in oil reserves. Even oil-rich countries like Saudi Arabia do not experience an increase in their oil reserves. How do we as Africans take advantage of this huge amount of data? There are two categories of data available in Africa: heritage data and personal data. Heritage data reside in society, whereas personal data reside in individuals. Heritage data include data gathered from our languages, emotions and accents. Personal data include health, face and fingerprint data.

Facebook, Amazon, Apple, Netflix and Google are data companies. They trade data with advertisers, banks, political parties and others. For example, the controversial company Cambridge Analytica harvested Facebook data to influence the 2016 US presidential election, potentially contributing to Donald Trump’s victory. Google collects language data to build an application called Google Translate, which translates from one language to another. This application claims to cover African languages including Zulu, Yoruba and Swahili. However, Google Translate is less effective at handling African languages than at handling European and Asian languages. How, then, do we capitalise on our language heritage to create economic value? We need to build our own language databases and create our own versions of Google Translate.

An important area is the creation of an African emotion database. Different cultures exhibit emotions differently. These emotions are very important in areas such as the safety of cars and airplanes. Mercedes-Benz, for example, has already implemented “Attention Assist”, which alerts drivers to fatigue. If we can build a system that reads pilots’ emotions, we can establish whether a pilot is in a good state of mind to operate an aircraft, and this can increase safety. To capitalise on this opportunity, we should create a databank that captures the emotions of African people in various parts of the continent and then use this database to create AI apps that read people’s emotions.

Another important area is the creation of an African health database. In some cases, AI algorithms are able to diagnose diseases better than human doctors. However, these AI algorithms depend on the availability of data. To capitalise on this opportunity, we need to create a programme for collecting such data and use it to build algorithms that can augment medical care.

Among the latest technological developments are intelligent personal assistants. These devices can take voice instructions. Google has developed Google Assistant, Amazon has Alexa, Apple has Siri and IBM has Watson. These devices are very effective, but they do not handle African accents well. We can enhance them by including emotion detection algorithms and by making them more robust to different accents, especially the rich and diverse range of African accents. To capitalise on this oversight, we need to create our own database of African accents and use it to build intelligent personal assistants that can understand African languages.

Face recognition algorithms do not work very well for African faces. There are three reasons for this. The first is the limited size of the libraries of African faces. The second is the suboptimal data collection for African faces, which are different from Asian and European faces. The third is that we have not designed AI algorithms for face recognition from an African perspective. Companies such as Facebook are collecting huge amounts of data from African people who have Facebook accounts. We should therefore think about how we can create our own face database. Departments of home affairs (DHA) can use this database to increase security at points of entry into our countries. Currently, for the smart identity (ID) card, the DHA images only the front of the face. A facial database requires images of the sides of the face as well. In a way, Facebook is building this, with our help, as we upload our images. These images also express emotions, and this contributes to another aspect of the database being built.

There is, therefore, a great deal of heritage and personal data that we can collect and monetise to derive economic value. These data include pictures of the iris of the eye and fingerprints, which are very valuable for building biometric security systems. However, for us to be ready, we need to develop the skills to collect and analyse these data effectively. These skills are data analytics and AI algorithmic skills. To use open source AI algorithms such as those in TensorFlow, one requires some understanding of programming. The data analytics skills should go beyond the basic statistics courses that we often find in our universities and must include advanced topics such as signal processing, as well as the capability to handle incomplete and imperfect data sets.

How, then, do we increase our capacity to collect and analyse data? Firstly, at the national level, we should introduce data banks that collect these data. However, we should do this in a way that protects data security and privacy. One way of achieving this is to expand the mandate of organisations such as Statistics South Africa (Stats SA) to include gathering personal and heritage data, in addition to gathering and analysing economic data and conducting the national census. Regional organisations such as the Southern African Development Community must create regional data banks that gather and monetise regional data.

At the continental level, the African Union (AU) should establish continental data banks that consolidate and monetise continental data. New opportunities for this are arising, for instance through the AU’s “African Union Passport” initiative. In creating such data banks, we should bear in mind that any given database is usually incomplete and imperfect. We should equip data gathering organisations with the competence to analyse incomplete and imperfect data. If we can explore the vast heritage and personal data of the 1.3 billion African people, then we can create the “Saudi Arabia” (oil) of the fourth industrial revolution. The collection of individual data has ethical implications, which we should take into account.

Professor Tshilidzi Marwala is the Vice-Chancellor and Principal of the University of Johannesburg. He deputises for President Cyril Ramaphosa on the South African Presidential Commission on the Fourth Industrial Revolution.