# Skill networks and measures of complex human capital

1. Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213
1. Edited by Matthew O. Jackson, Stanford University, Stanford, CA, and approved October 24, 2017 (received for review April 20, 2017)

## Significance

The relationship between worker human capital and wages is a question of considerable economic interest. Skills are usually characterized using a one-dimensional measure, such as years of training. However, in knowledge-based production, the interaction between a worker’s skills is also important. Here, we propose a network-based method for characterizing worker skill sets. We construct a human capital network, wherein nodes are skills and two skills are connected if a worker has both or both are required for the same job. We then illustrate the method by analyzing an online freelance labor market, showing that workers with diverse skills earn higher wages and that those who use their diverse skills in combination earn the highest wages of all.

## Abstract

We propose a network-based method for measuring worker skills. We illustrate the method using data from an online freelance website. Using the tools of network analysis, we divide skills into endogenous categories based on their relationship with other skills in the market. Workers who specialize in these different areas earn dramatically different wages. We then show that, in this market, network-based measures of human capital provide additional insight into wages beyond traditional measures. In particular, we show that workers with diverse skills earn higher wages than those with more specialized skills. Moreover, we can distinguish between two different types of workers benefiting from skill diversity: jacks-of-all-trades, whose skills can be applied independently on a wide range of jobs, and synergistic workers, whose skills are useful in combination and fill a hole in the labor market. On average, workers whose skills are synergistic earn more than jacks-of-all-trades.

The relationship between worker skills and wages is a problem of tremendous economic interest, making it critical to have effective measures of the skills, knowledge, and experience that a worker brings to production: a bundle of worker characteristics that economists refer to as human capital. Traditionally, human capital measures either divide workers into broad categories (e.g., laborers and management) or count years of experience, training, or education (1). However, treating skills as interchangeable removes some of the richness of human capital: workers’ skills are clearly heterogeneous and multidimensional, as are the skills required for jobs. A considerable body of literature shows the importance of skill diversity, specialization, and recombination in problem-solving and knowledge generation (2–8). This, plus continued growth in knowledge-based industry (9), has generated interest in more nuanced measures of human capital (10–19).

Determining the relationship between wages and factors like skill diversity requires us to look not only at a worker’s individual skills but also at her skill combinations. However, considering skills in combination makes measuring human capital much more difficult (20). Some skills (e.g., programing and user interface design) are synergistic, meaning that the combination is more valuable than the sum of its parts: each skill enhances the effectiveness of the other. Other skills (e.g., programing and Russian translation) are no more valuable together than they are individually. On the supply side, some skills (e.g., programing and management skills) are quite common individually but extremely rare in combination. Taken together, these factors mean that the value of an additional skill will depend on the skills that the worker already has (16, 21).

Here, we propose a network-based framework for the characterization of human capital that complements existing notions of human capital and production. Given a pool of workers with multidimensional skill baskets, we construct a network in which skills are nodes and two skills are connected by a link if a worker has both. Links are weighted according to how often the two skills co-occur. We construct a similar network using the skill sets required to perform different jobs, wherein two skills are connected if they are required by an employer in combination. Together, these two networks provide a more complete picture of the supply and demand for human capital in a particular job market. Most importantly, they suggest a number of measures of human capital that account for both the relationships between skills and the context dependency of human capital.

We then use data from an online freelance labor market as an illustration of the method. Using information drawn from worker profiles and employer job advertisements, we construct a human capital network and several network-based measures of worker skills. We use an algorithmic method to split the human capital networks into clusters of closely related skills, providing an entirely endogenous categorization of skills. There is considerable variation in wages between workers specializing in these different skill categories. Workers with more diverse skills tend to earn higher wages than specialists. We then compare the skill measures on the supply and demand sides of the market to show that workers with diverse skills fall into two categories: those who exploit gaps in the market tend to earn higher wages than “jacks-of-all-trades,” who use their skills for multiple different jobs. Finally, we illustrate the value of this approach by showing that our network-based human capital measures explain variation in worker wages, even after controlling for individual skills.

## Methods: Constructing a Human Capital Network

Network science provides a means of making sense of the relationships between skills in the labor market. Here, we construct two different networks: one representing the skills that workers have and the other representing the skills that employers require. Nodes in these networks are skills present in the labor market. On the workers’ side, two skills are connected by a link if the same worker has both. On the employers’ side, two skills are connected if they are required for the same job. We will call these networks the worker (supply) side and the employer (demand) side human capital networks.

More formally, let $I=\{1,2,\dots,N\}$ be a pool of workers, each endowed with a skill set $A_i=\{s_1,s_2,\dots,s_k\}$. Let $S_W=\bigcup_{i\in I}A_i$ denote the set of all skills possessed by workers, with $|S_W|=M_W$. Let $A=\{A_1,A_2,\dots,A_N\}$ denote the set of all worker skill sets.
Let $n_i$ be the number of skill sets containing skill $s_i$, and let $n_{ij}$ be the number of skill sets containing both $s_i$ and $s_j$. (Note that we have assumed that skills are binary. Our data do not allow us to consider the ability level of an individual in a particular skill, and therefore, we do not consider it explicitly. However, the intensive margin could easily be incorporated into either the link weights or the measures derived from the network.) In a worker human capital network, $g(A)$, the nodes are the skills in the set $S_W$, and two skills, $s_j$ and $s_k$, are connected if $s_j,s_k\in A_i$ for some $A_i\in A$.
This network can be represented as an $M_W\times M_W$ matrix, where $g_{jk}=w_{jk}$ if $s_j,s_k\in A_i$ for some $A_i\in A$ and $g_{jk}=0$ otherwise. The value $w_{jk}$ is the weight on the link between skills $j$ and $k$.
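As a rough illustration of the construction above, the counts $n_i$ and $n_{ij}$ can be tallied directly from a list of skill sets. The sketch below is not the authors' code: the function name `cooccurrence_counts` and the toy profiles are hypothetical, and skills are treated as binary, as in the text. The same function would apply unchanged to the skill sets attached to job postings.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(skill_sets):
    """Return (n, n_pair).

    n[s] is the number of skill sets containing skill s.
    n_pair[(s, t)] is the number of skill sets containing both s and t,
    with each pair stored once in sorted order (s < t).
    """
    n = Counter()
    n_pair = Counter()
    for skills in skill_sets:
        unique = sorted(set(skills))  # skills are binary: ignore repeats
        n.update(unique)
        n_pair.update(combinations(unique, 2))
    return n, n_pair


# Hypothetical toy profiles, not taken from the UpWork data.
worker_skill_sets = [
    {"python", "sql"},
    {"python", "sql", "statistics"},
    {"python", "ui-design"},
    {"translation"},
]

n, n_pair = cooccurrence_counts(worker_skill_sets)

# Two skills are linked if at least one worker lists both.
links = {pair for pair, count in n_pair.items() if count > 0}
```

Storing each unordered pair once keeps the representation compact; the full $M_W\times M_W$ matrix is symmetric and can be filled in from `n_pair` when needed.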

Let $K=\{1,2,\dots,F\}$ be the set of vacancies in the labor market. Let $B_f=\{s_1,s_2,\dots,s_k\}$ be the set of skills required for job $f$, and let $B=\{B_1,B_2,\dots,B_F\}$. Let $S_J=\bigcup_{f\in K}B_f$ denote the set of all skills requested by all employers in the market, with $|S_J|=M_J$.
Then, one can construct a network similar to that above, $g(B)$, an $M_J\times M_J$ matrix where $g_{jk}=w_{jk}$ if $s_j,s_k\in B_f$ for some $B_f\in B$ and $g_{jk}=0$ otherwise. Note that the human capital demanded in this network might differ considerably from the human capital possessed on the supply side.

We weight each link in the network to reflect how closely related the two skills are in the labor market. Here, our weights are a modification of conditional probability, which we call skill similarity weights (alternative weighting schemes are discussed in SI Appendix): $w_{ij}^{\mathrm{sim}}=P(s_i\mid s_j)=n_{ij}/n_j$, where $n_i$ and $n_j$ are the numbers of workers who have skills $i$ and $j$, respectively, and $n_j<n_i$ (that is, the co-occurrence count is divided by the count of the less common skill). Skill similarity weights have three properties that are desirable in this context.

• i) If skill $A$ never co-occurs with skill $B$ ($A\cap B=\emptyset$), then the link weight is zero.

• ii) If skill $A$ always occurs with skill $B$ ($A\subseteq B$), then the link weight is one.

• iii) The weight between two skills is strictly increasing as they co-occur more frequently.
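The skill similarity weight and its three properties can be checked on a small example. The sketch below reads the condition $n_j<n_i$ as "divide by the count of the less common skill," that is, $w_{ij}=n_{ij}/\min(n_i,n_j)$; this reading, the function name `skill_similarity_weights`, and the toy profiles are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def skill_similarity_weights(skill_sets):
    """Map each co-occurring pair (s, t), s < t, to n_st / min(n_s, n_t)."""
    n = Counter()
    n_pair = Counter()
    for skills in skill_sets:
        unique = sorted(set(skills))
        n.update(unique)
        n_pair.update(combinations(unique, 2))
    return {
        (s, t): count / min(n[s], n[t])
        for (s, t), count in n_pair.items()
    }


# Hypothetical toy profiles, not taken from the UpWork data.
profiles = [
    {"python", "sql"},
    {"python", "sql", "statistics"},
    {"python", "ui-design"},
    {"sql"},
    {"translation"},
]

weights = skill_similarity_weights(profiles)

# Pairs that never co-occur are absent from the dict, i.e., weight zero.
never_together = weights.get(("python", "translation"), 0.0)
# Every worker with "statistics" also has "python", so the weight is one.
always_together = weights[("python", "statistics")]
# "python" and "sql" each appear three times and co-occur twice.
partial = weights[("python", "sql")]
```

Dividing by the smaller count makes the weight symmetric in the two skills, so a single value can be stored per undirected link.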

## Illustration of Network-Based Measures of Human Capital

The worker and job networks effectively summarize the human capital in a particular job market. A major advantage of this method is that it is context-dependent: a human capital network is constructed using data from a particular labor pool, so a worker’s human capital measures are also context-dependent, reflecting differences in how her skills are valued in different markets (17). We now illustrate the method in a particular context: an online freelance labor market called UpWork. (These data are publicly available, and collecting them did not require interaction with any individual. Thus, this study does not qualify as human subjects research according to the Institutional Review Board at Carnegie Mellon University.)

### Data.

The world of online freelancing is a natural data source for this method. Online freelance labor has been a growing part of the economy, fueled by a combination of better technology and evolving attitudes toward career change, and that growth is expected to continue (22). As a result, UpWork and associated markets (e.g., Elance and ODesk) have been the subject of much recent study (23–27).

In the UpWork market, workers apply for jobs, and employers hire and pay workers through the site. The information on a worker’s profile includes a list of her skills and a list of previous jobs with associated hourly wages. Job postings contain information about the job, including a list of required skills. The job opportunities range from small tasks, such as data entry and software testing, to large-scale projects, like application development and website design.

We use a sample of 26,046 worker profiles and 365,561 job listings collected over a period of 3 mo between November 2013 and January 2014. We use this full population to construct our skill networks, because all of the workers are visible to employers and thus, part of the labor market; 18,283 workers have a wage history on the site. A worker’s average hourly wage is calculated from her wage history, which both is publicly visible and cannot be altered (26). While both hourly and flat rate jobs are listed, we only consider hourly jobs in calculating the wage rate, because we do not observe the hours worked on fixed price jobs. The average worker on the site makes $16.74/h, has worked 765 h on the site, and lists six to seven skills. The distribution of workers’ hourly wages shows typical inequality (SI Appendix has summary statistics). The worker profiles include 2,197 different skills, and the job postings include 2,447. Workers and employers must choose their listed skills from the site’s database of allowable skills. Adding a skill to this database requires a petition by a worker/employer and is only granted if the skill is not redundant. This eliminates any ambiguity in skills caused by spelling errors or synonymous entries and makes the data ideal for this application (23). We drop skills that occur only once in our sample. (The results that follow are no different than they would be with those skills included, because the skill similarity weights for these links are, by definition, one.) We are left with 1,933 worker skills and 2,293 job skills. We then use these data to construct supply side and demand side human capital networks as detailed above.

### An Endogenous Skill Taxonomy.

Placing the skills on a network allows us to use the deep toolbox of network analysis to reveal underlying structure in the job market, such as subpools of labor that are evaluated similarly by employers and categories of jobs for which there is no dedicated labor pool.
Here, we partition the human capital networks into groups of related skills using the Louvain method: a standard community-finding algorithm (28). We show that this division is significant using modularity. (SI Appendix has additional analysis concerning the significance of this categorization.) The modularity of a partition is proportional to the number of links within a group, relative to what would be expected in a random network. The modularity of the worker network is 0.47, and the modularity of the job network is 0.5, indicating that this division is very strong, and represents real community structure in the network. [It is widely held that any network with modularity above 0.3 has significant community structure (29).] We will call these skill groups “categories.” The skill categories are easily identified using the list of skills in each (SI Appendix has the most common skills in each category). The categories are represented by the colors in Fig. 1. For clarity, we have limited the visualization to skills listed by at least 0.5% of worker profiles and at least 0.2% of job listings (the full networks containing all communities are pictured in SI Appendix). We have attached names to each of the categories based on that identification. On the worker side, the skill categories are (i) administration, writing, and marketing; (ii) art and design; (iii) software testing; (iv) statistics and mobile development; (v) information technology (IT) administration; and (vi) general programing. Jobs divide into far more categories, presumably because jobs are more specific than workers. The categories here are (i) administrative, (ii) writing, (iii) translation, (iv) marketing, (v) art and design, (vi) music and audio, (vii) software testing, (viii) engineering and physical design, (ix) data handling and statistics, (x) mobile and game development, (xi) IT administration, and (xii) general programing. 
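To make the modularity figures above concrete, the sketch below computes the standard Newman modularity $Q$ of a given partition of a weighted, undirected network. It is not the Louvain method itself (which searches for a partition that makes $Q$ large) and is not the authors' code; the function name, the data layout, and the toy graph are assumptions chosen for illustration.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of a weighted undirected graph.

    edges: dict mapping (u, v) pairs to positive weights, each link once.
    community_of: dict mapping every node to a community label.
    """
    total = sum(edges.values())  # m: total link weight
    inside = {}   # link weight with both endpoints in the community
    degree = {}   # summed weighted degree of the community's nodes
    for (u, v), w in edges.items():
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            inside[cu] = inside.get(cu, 0.0) + w
    # Q = sum over communities of (share of weight inside the community)
    #     minus (share expected if links were placed at random).
    return sum(
        inside.get(c, 0.0) / total - (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Hypothetical toy graph: two triangles joined by a single link.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
```

Placing every node in a single group gives $Q=0$, while the natural two-triangle split gives a clearly positive value, which is the sense in which higher modularity indicates stronger community structure.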
Note that jobs divide into more categories than the workers, suggesting that there are well-defined jobs that lack a well-defined labor pool (e.g., while there are significant numbers of jobs that could be categorized under “translation,” few workers could be definitively identified as “translators”). This categorization of skills is useful for several reasons. It allows us to quantify the diversity of a worker’s skills. Those whose skills fall into a single category are specialists, while those who bridge categories are generalists. The specialists can be further divided by their area of specialty. These denote different subpools of labor, which are likely identifiable to employers observing the labor market. Employers use this kind of “low-bandwidth” information as a substitute for more costly search mechanisms (26). In the case of UpWork, there is substantial overlap between the endogenous categories derived from the network and their exogenously generated job categories. (They do not provide information about worker categories.) However, there are some notable differences between the two. The network suggests that data analysis and software testing require very different sets of skills: the workers in those areas are in distinct labor subpools and are qualified for different jobs. Given that worker search and employer search are guided by these categories, employer/employee matches might be improved by allowing jobs to be categorized by the labor market itself. (SI Appendix has more on the differences between the endogenous and exogenous skill taxonomies.)

### Human Capital and Wages.

We will now illustrate how human capital networks can shed light on the relationship between human capital and wages. We provide three examples where skill interactions might be relevant: worker type, skill diversity, and synergies between skills. By definition, these are aspects of human capital that cannot be captured with independent skills.
Note that what follows is not a complete treatment of any of these questions. Our simple illustration faces many of the typical problems that arise in identifying wage determinants, and we will not attempt to make any causal statements. Our intent is simply to illustrate how more nuanced measures of human capital can provide greater insight into the correlations between worker skills and wages. We leave the identification of causal relationships to future work.

### Skill Categories.

We consider how different types of workers are valued using the skill categories defined above. In the worker network, workers whose skills are in a single area are specialists, representing a worker “type.” Fig. 2A shows wages for workers who specialize in each skill category: technical workers earn, on average, $3.40/h more than artists and designers, who in turn make an average of $3.00/h more than writers and administrative workers.

Fig. 2. Average wages for freelance workers with different kinds of human capital. Wages are calculated for all workers who average more than $1/h. Error bars are a 95% confidence interval on the point estimate. (A) Average wages for workers in different areas. (B) Average wages for workers qualified for jobs in different areas (we have omitted two categories with fewer than 20 workers). (C) Average wages for workers with different levels of skill diversity. (D) Average wages for workers with synergistic skills and those whose skills must be applied independently as a jack-of-all-trades.

In the job network, workers whose skills are in a single category are qualified for one type of job. Fig. 2B shows that there is a similar pattern in wages: workers qualified for technical jobs make more than those whose employment opportunities are in creative fields, who in turn make more than those who qualify for administrative tasks (a full breakdown of average wages in each worker and job category is in SI Appendix). These results are similar to those found in previous studies using UpWork’s exogenous categorization of worker skills (22) and the literature using occupation-level data from O*net and similar sources (17).

### The Value of Considering Skill Interactions.

Finally, we address whether these network-based measures provide insight into wages beyond that provided by individual skills. Our baseline will be the most flexible specification using only individual skills. Specifically, we relate log wages to a vector of skill dummies, where $d_i=1$ if the worker has that skill and $d_i=0$ otherwise. We include terms for skills that appear in at least 2% of worker profiles: a total of 62 dummies. (When too many dummies are included, the terms become collinear.) As expected, some dummies (e.g., project management) are associated with higher wages, and others (e.g., data entry) are associated with lower wages (Model 1 in SI Appendix, Table S10).

We then consider a model that includes both the dummies for individual skills and the network-based worker skill categories: mobile development, testing, programing, IT administration, and art and design (we omit administration as the comparison group). The coefficients on these additional terms are significant with signs in the expected directions (Model 2 in SI Appendix, Table S10). The effect sizes are quite large: for example, workers in programing fields earn 51% (approximately \$5.40) more than administrative workers with the same skills, while workers in software testing earn 30% (approximately \$4.00) less. The adjusted $R^2$ of this model is higher than that of the baseline model, indicating that the network-based measures explain variance in worker wages, even when controlling for the skills individually.
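The exact estimating equation is given in the SI Appendix rather than in the text here, so the following is only a plausible way to write the two specifications just described. The notation is assumed for illustration: $d_{ik}$ is the dummy for worker $i$ having skill $k$, $C_{ic}$ is a dummy for worker $i$ belonging to network-based category $c$ (with administration omitted), $n$ is the number of workers, and $p$ is the number of regressors.

```latex
% Baseline: log wages on the 62 individual skill dummies
\log(\mathrm{wage}_i) = \alpha + \sum_{k=1}^{62} \beta_k\, d_{ik} + \varepsilon_i

% Extended: add the network-based category terms
\log(\mathrm{wage}_i) = \alpha + \sum_{k=1}^{62} \beta_k\, d_{ik}
                        + \sum_{c} \gamma_c\, C_{ic} + \varepsilon_i

% Adjusted R^2 penalizes additional regressors
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-p-1}
```

Because the adjusted $R^2$ falls when an added regressor contributes too little explanatory power, an increase after adding the category terms is not a mechanical consequence of the larger model.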

The story is similar for the other network-based measures. In models containing both the skill dummies and the number of worker categories/job categories crossed, the network-based measures have significant coefficients (Models 3 and 4 in SI Appendix, Table S10). The effect sizes on these terms are smaller but still notable. Workers with skills in two different worker categories earn an average of \$0.63 more than their more specialized peers. Workers who qualify for two different types of job earn an average of \$0.62 more than those with more constrained options. Again, the adjusted $R^2$ of both models is higher than that of the baseline.
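The "number of categories crossed" measure can be sketched as a count of the distinct skill categories that a worker's listed skills fall into. The function name and the category assignment below are hypothetical stand-ins for the communities detected in the network, and ignoring skills with no assigned category is an assumption of this sketch.

```python
def categories_crossed(worker_skills, category_of):
    """Number of distinct skill categories that a worker's skills fall into.

    Skills missing from category_of (e.g., skills dropped from the
    network) are ignored.
    """
    return len({category_of[s] for s in worker_skills if s in category_of})


# Hypothetical category assignment, standing in for detected communities.
category_of = {
    "python": "general programing",
    "sql": "general programing",
    "logo-design": "art and design",
    "copywriting": "administration, writing, and marketing",
}

specialist = {"python", "sql"}
generalist = {"python", "logo-design", "copywriting"}

specialist_count = categories_crossed(specialist, category_of)
generalist_count = categories_crossed(generalist, category_of)
```

Running the same function with the job-side category assignment would give the number of job categories for which a worker's skills qualify.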

Ideally, we would also examine a model with a more traditional one-dimensional measure of human capital, such as years of education or experience. Unfortunately, because of the nature of this dataset, we do not observe either. The closest that we have to such a measure is the number of skills listed on a worker’s profile, which is obviously problematic in this context. However, as this type of data becomes more common, we expect that future studies will perform that comparison.

## Conclusion

The complex nature of human capital in knowledge-based industry has made the measurement of worker skills increasingly difficult. Placing skills relevant for employment decisions onto a network provides both a way to operationalize the interrelationships between skills and a deep toolbox with which to measure them. As online marketplaces play a greater role in matching workers to employers, human capital networks have an increasing number of practical applications. The detailed skill data in these markets lend themselves to aggregation and algorithmic search (30–32). Even small improvements in these algorithms would reduce search frictions and improve employer–employee matches. The networks can also reveal which skills are complementary to an existing skill set, which would help workers decide which new skills to acquire and how best to appeal to potential employers.

## Acknowledgments

We thank Brian Kovak, Rebecca Lassem, Sarah Feldt-Muldoon, Seth Richards-Shubik, Ben Tengelsen, Ross O’Connell, and the attendees of the 2016 Networks in Economics Conference.

## Footnotes

• ¹Email: andersok{at}andrew.cmu.edu.
