Skill networks and measures of complex human capital

Edited by Matthew O. Jackson, Stanford University, Stanford, CA, and approved October 24, 2017 (received for review April 20, 2017)
Significance
The relationship between worker human capital and wages is a question of considerable economic interest. Skills are usually characterized using a one-dimensional measure, such as years of training. However, in knowledge-based production, the interaction between a worker’s skills is also important. Here, we propose a network-based method for characterizing worker skill sets. We construct a human capital network, wherein nodes are skills and two skills are connected if a worker has both or both are required for the same job. We then illustrate the method by analyzing an online freelance labor market, showing that workers with diverse skills earn higher wages and that those who use their diverse skills in combination earn the highest wages of all.
Abstract
We propose a network-based method for measuring worker skills. We illustrate the method using data from an online freelance website. Using the tools of network analysis, we divide skills into endogenous categories based on their relationship with other skills in the market. Workers who specialize in these different areas earn dramatically different wages. We then show that, in this market, network-based measures of human capital provide additional insight into wages beyond traditional measures. In particular, we show that workers with diverse skills earn higher wages than those with more specialized skills. Moreover, we can distinguish between two different types of workers benefiting from skill diversity: jacks-of-all-trades, whose skills can be applied independently on a wide range of jobs, and synergistic workers, whose skills are useful in combination and fill a hole in the labor market. On average, workers whose skills are synergistic earn more than jacks-of-all-trades.
The relationship between worker skills and wages is a problem of tremendous economic interest, making it critical to have effective measures of the skills, knowledge, and experience that a worker brings to production: a bundle of worker characteristics that economists refer to as human capital. Traditionally, human capital measures either divide workers into broad categories (e.g., laborers and management) or count years of experience, training, or education (1). However, treating skills as interchangeable removes some of the richness of human capital: workers’ skills are clearly heterogeneous and multidimensional, as are the skills required for jobs. A considerable body of literature shows the importance of skill diversity, specialization, and recombination in problem-solving and knowledge generation (2–8). This, together with continued growth in knowledge-based industry (9), has generated interest in more nuanced measures of human capital (10–19).
Determining the relationship between wages and factors like skill diversity requires us to look not only at a worker’s individual skills but also at her skill combinations. However, considering skills in combination makes measuring human capital much more difficult (20). Some skills (e.g., programing and user interface design) are synergistic, meaning that the combination is more valuable than the sum of its parts: each skill enhances the effectiveness of the other. Other skills (e.g., programing and Russian translation) are no more valuable together than they are individually. On the supply side, some skills (e.g., programing and management skills) are quite common individually but extremely rare in combination. Taken together, these factors mean that the value of an additional skill will depend on the skills that the worker already has (16, 21).
Here, we propose a network-based framework for the characterization of human capital that complements existing notions of human capital and production. Given a pool of workers with multidimensional skill baskets, we construct a network in which skills are nodes and two skills are connected by a link if a worker has both. Links are weighted according to how often the two skills co-occur. We construct a similar network using the skill sets required to perform different jobs, wherein two skills are connected if they are required by an employer in combination. Together, these two networks provide a more complete picture of the supply and demand for human capital in a particular job market. Most importantly, they suggest a number of measures of human capital that account for both the relationships between skills and the context dependency of human capital.
We then use data from an online freelance labor market as an illustration of the method. Using information drawn from worker profiles and employer job advertisements, we construct a human capital network and several network-based measures of worker skills. We use an algorithmic method to split the human capital networks into clusters of closely related skills, providing an entirely endogenous categorization of skills. There is considerable variation in wages between workers specializing in these different skill categories. Workers with more diverse skills tend to earn higher wages than specialists. We then compare the skill measures on the supply and demand sides of the market to show that workers with diverse skills fall into two categories: those who exploit gaps in the market tend to earn higher wages than “jacks-of-all-trades,” who use their skills for multiple different jobs. Finally, we illustrate the value of this approach by showing that our network-based human capital measures explain variation in worker wages, even after controlling for individual skills.
Methods: Constructing a Human Capital Network
Network science provides a means of making sense of the relationships between skills in the labor market. Here, we construct two different networks: one representing the skills that workers have and the other representing the skills that employers require. Nodes in these networks are skills present in the labor market. On the workers’ side, two skills are connected by a link if the same worker has both. On the employers’ side, two skills are connected if they are required for the same job. We will call these networks the worker (supply) side and the employer (demand) side human capital networks.
More formally, let \(I=\{1,2,\ldots,N\}\) be a pool of workers, each endowed with a skill set \(A_i=\{s_1,s_2,\ldots,s_k\}\). Let \(S_W=\bigcup_{i\in I} A_i\) denote the set of all skills possessed by workers, with \(|S_W|=M_W\). Let \(A=\{A_1,A_2,\ldots,A_N\}\) denote the set of all worker skill sets. Let \(n_i\) be the number of skill sets containing skill \(s_i\), and let \(n_{ij}\) be the number of skill sets containing both \(s_i\) and \(s_j\). (Note that we have assumed that skills are binary. Our data do not allow us to consider the ability level of an individual in a particular skill, and therefore, we do not consider it explicitly. However, the intensive margin could easily be incorporated into either the link weights or the measures derived from the network.) In a worker human capital network, \(g(A)\), the nodes are the skills in the set \(S_W\), and two skills, \(s_j\) and \(s_k\), are connected if \(s_j,s_k\in A_i\) for some \(A_i\in A\). This network can be represented as an \(M_W\times M_W\) matrix, where \(g_{jk}=w_{jk}\) if \(s_j,s_k\in A_i\) for some \(A_i\in A\) and \(g_{jk}=0\) otherwise. The value \(w_{jk}\) is the weight on the link between skills \(s_j\) and \(s_k\).
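As a minimal sketch of the counting step described above, the following Python computes the per-skill counts \(n_i\) and the pairwise co-occurrence counts \(n_{ij}\) from a list of skill sets. The skill names and the helper `cooccurrence_counts` are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical worker skill sets A_i (illustrative only, not from the paper).
skill_sets = [
    {"python", "sql", "statistics"},
    {"python", "sql"},
    {"translation", "writing"},
    {"python", "writing"},
]

def cooccurrence_counts(skill_sets):
    """Return (n_single, n_pair): counts n_i per skill and n_ij per skill pair."""
    n_single = Counter()
    n_pair = Counter()
    for skills in skill_sets:
        n_single.update(skills)
        # Sort so that each unordered pair is stored under one canonical key.
        for a, b in combinations(sorted(skills), 2):
            n_pair[(a, b)] += 1
    return n_single, n_pair

n_single, n_pair = cooccurrence_counts(skill_sets)
print(n_single["python"])         # skill sets containing "python"
print(n_pair[("python", "sql")])  # skill sets containing both skills
```

Two skills are linked in the network exactly when their pair count is positive; the same function could be applied to job skill sets to obtain the demand side counts.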
Let \(K=\{1,2,\ldots,F\}\) be the set of vacancies in the labor market. Let \(B_f=\{s_1,s_2,\ldots,s_k\}\) be the set of skills required for job \(f\), and let \(B=\{B_1,B_2,\ldots,B_F\}\). Let \(S_J=\bigcup_{f\in K} B_f\) denote the set of all skills requested by all employers in the market, with \(|S_J|=M_J\). Then, one can construct a network similar to that above, \(g(B)\), an \(M_J\times M_J\) matrix where \(g_{jk}=w_{jk}\) if \(s_j,s_k\in B_f\) for some \(B_f\in B\) and \(g_{jk}=0\) otherwise. Note that this network might differ considerably from the worker (supply) side network.
We will weight each link in the network to reflect how closely related the two skills are in the labor market. Here, our weights will be a modification of conditional probability, which we will call skill similarity weights (alternative weighting schemes are discussed in SI Appendix): \(w_{ij}^{\mathrm{sim}}=P(s_i\mid s_j)=n_{ij}/n_j\), where \(n_i\) and \(n_j\) are the numbers of workers who have skills \(i\) and \(j\), respectively, and \(n_j<n_i\). Skill similarity weights have three properties that are desirable in this context.

i) If skill \(A\) never co-occurs with skill \(B\) (\(A\cap B=\emptyset\)), then the link weight is zero.
ii) If skill \(A\) always occurs with skill \(B\) (\(A\subseteq B\)), then the link weight is one.
iii) The weight between two skills is strictly increasing as they co-occur more frequently.
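The weight and its three properties can be checked with a short Python sketch. It reads the definition \(n_{ij}/n_j\) with \(n_j<n_i\) as dividing the co-occurrence count by the count of the rarer skill; treating a tie (\(n_i=n_j\)) the same way is an assumption of this sketch, and the function name `similarity_weight` is hypothetical.

```python
def similarity_weight(n_ij, n_i, n_j):
    """Skill similarity weight: co-occurrences over the count of the rarer skill.

    Assumption of this sketch: ties (n_i == n_j) use the common count.
    """
    if n_i <= 0 or n_j <= 0:
        raise ValueError("each skill must appear in at least one skill set")
    return n_ij / min(n_i, n_j)

# Property (i): skills that never co-occur get weight zero.
print(similarity_weight(0, 5, 3))
# Property (ii): the rarer skill always appears with the other, so weight one.
print(similarity_weight(3, 5, 3))
# Property (iii): more co-occurrence (counts held fixed) means a larger weight.
print(similarity_weight(1, 5, 3) < similarity_weight(2, 5, 3))
```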
Illustration of Network-Based Measures of Human Capital
The worker and job networks effectively summarize the human capital in a particular job market. A major advantage of this method is that it is context-dependent: a human capital network is constructed using data from a particular labor pool, so a worker’s human capital measures are also context-dependent, reflecting differences in how her skills are valued in different markets (17). We will now illustrate this method in a particular context: an online freelance labor market called UpWork. (These data are publicly available and do not require interaction with any individual. Thus, the study does not qualify as human subjects research according to the Institutional Review Board at Carnegie Mellon University.)
Data.
The world of online freelancing is a natural data source for this method. Online freelance labor has been a growing part of the economy fueled by a combination of better technology and evolving attitudes toward career change, and that growth is expected to continue (22). As a result, UpWork and associated markets (e.g., Elance and ODesk) have been the subject of much recent study (23–27).
In the UpWork market, workers apply for jobs, and employers hire and pay workers through the site. The information on a worker’s profile includes a list of her skills and a list of previous jobs with associated hourly wages. Job postings contain information about the job, including a list of required skills. The job opportunities range from small tasks, such as data entry and software testing, to large-scale projects, like application development and website design.
We use a sample of 26,046 worker profiles and 365,561 job listings collected over a period of 3 months between November 2013 and January 2014. We use this full population to construct our skill networks, because all of the workers are visible to employers and thus, part of the labor market; 18,283 workers have a wage history on the site. A worker’s average hourly wage is calculated from her wage history, which is publicly visible and cannot be altered (26). While both hourly and flat rate jobs are listed, we only consider hourly jobs in calculating the wage rate, because we do not observe the hours worked on fixed price jobs. The average worker on the site makes $16.74/h, has worked 765 h on the site, and lists six to seven skills. The distribution of workers’ hourly wages shows typical inequality (SI Appendix has summary statistics).
The worker profiles include 2,197 different skills, and the job postings include 2,447. Workers and employers must choose their listed skills from the site’s database of allowable skills. Adding a skill to this database requires a petition by a worker/employer and is only granted if the skill is not redundant. This eliminates any ambiguity in skills caused by spelling errors or synonymous entries and makes the data ideal for this application (23). We drop skills that occur only once in our sample. (The results that follow are no different than they would be with those skills included, because the skill similarity weights for these links are, by definition, one.) We are left with 1,933 worker skills and 2,293 job skills. We then use these data to construct supply side and demand side human capital networks as detailed above.
An Endogenous Skill Taxonomy.
Placing the skills on a network allows us to use the deep toolbox of network analysis to reveal underlying structure in the job market, such as subpools of labor that are evaluated similarly by employers and categories of jobs for which there is no dedicated labor pool. Here, we partition the human capital networks into groups of related skills using the Louvain method, a standard community-finding algorithm (28). We show that this division is significant using modularity. (SI Appendix has additional analysis concerning the significance of this categorization.) The modularity of a partition is proportional to the number of links within a group, relative to what would be expected in a random network. The modularity of the worker network is 0.47, and the modularity of the job network is 0.5, indicating that this division is very strong and represents real community structure in the network. [It is widely held that any network with modularity above 0.3 has significant community structure (29).] We will call these skill groups “categories.”
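To make the modularity score concrete, here is a minimal Python sketch that evaluates the standard (Newman) weighted modularity of a given partition. It does not implement the Louvain search itself, which looks for a partition that makes this score large; the toy graph, the node labels, and the function name `modularity` are hypothetical.

```python
def modularity(edges, community_of):
    """Weighted modularity of a partition of an undirected graph.

    edges: dict mapping (u, v) -> weight, each undirected edge listed once.
    community_of: dict mapping node -> community label.
    """
    total = sum(edges.values())
    internal = {}   # total edge weight inside each community
    strength = {}   # summed node strength (weighted degree) per community
    for (u, v), w in edges.items():
        cu, cv = community_of[u], community_of[v]
        strength[cu] = strength.get(cu, 0.0) + w
        strength[cv] = strength.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    # Observed within-community share minus the share expected at random.
    return sum(
        internal.get(c, 0.0) / total - (s / (2.0 * total)) ** 2
        for c, s in strength.items()
    )

# Toy graph: two triangles joined by a single bridge edge (c, d).
edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 1.0,
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the sense in which a high modularity indicates community structure.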
The skill categories are easily identified using the list of skills in each (SI Appendix has the most common skills in each category). The categories are represented by the colors in Fig. 1. For clarity, we have limited the visualization to skills listed by at least 0.5% of worker profiles and at least 0.2% of job listings (the full networks containing all communities are pictured in SI Appendix). We have attached names to each of the categories based on that identification. On the worker side, the skill categories are (i) administration, writing, and marketing; (ii) art and design; (iii) software testing; (iv) statistics and mobile development; (v) information technology (IT) administration; and (vi) general programing. Jobs divide into far more categories, presumably because jobs are more specific than workers. The categories here are (i) administrative, (ii) writing, (iii) translation, (iv) marketing, (v) art and design, (vi) music and audio, (vii) software testing, (viii) engineering and physical design, (ix) data handling and statistics, (x) mobile and game development, (xi) IT administration, and (xii) general programing. Note that jobs divide into more categories than the workers, suggesting that there are well-defined jobs that lack a well-defined labor pool (e.g., while there are significant numbers of jobs that could be categorized under “translation,” few workers could be definitively identified as “translators”).
This categorization of skills is useful for several reasons. It allows us to quantify the diversity of a worker’s skills. Those whose skills fall into a single category are specialists, while those who bridge categories are generalists. The specialists can be further divided by their area of specialty. These denote different subpools of labor, which are likely identifiable to employers observing the labor market. Employers use this kind of “low-bandwidth” information as a substitute for more costly search mechanisms (26). In the case of UpWork, there is substantial overlap between the endogenous categories derived from the network and their exogenously generated job categories. (They do not provide information about worker categories.) However, there are some notable differences between the two. The network suggests that data analysis and software testing require very different sets of skills: the workers in those areas are in distinct labor subpools and are qualified for different jobs. Given that worker search and employer search are guided by these categories, employer/employee matches might be improved by allowing jobs to be categorized by the labor market itself. (SI Appendix has more on the differences between the endogenous and exogenous skill taxonomies.)
Human Capital and Wages.
We will now illustrate how human capital networks can shed light on the relationship between human capital and wages. We provide three examples where skill interactions might be relevant: worker type, skill diversity, and synergies between skills. By definition, these are aspects of human capital that cannot be captured with independent skills. Note that what follows is not a complete treatment of any of these questions. Our simple illustration faces many of the typical problems that arise in identifying wage determinants, and we will not attempt to make any causal statements. Our intent is to simply illustrate how more nuanced measures of human capital can provide greater insight into the correlations between worker skills and wages. We leave the identification of causal relationships to future work.
Skill Categories.
We consider how different types of workers are valued using the skill categories defined above. In the worker network, workers whose skills are in a single area are specialists, representing a worker “type.” Fig. 2A shows wages for workers who specialize in each skill category: technical workers earn, on average, $3.40/h more than artists and designers, who in turn, make an average of $3.00/h more than writers and administrative workers.
In the job network, workers whose skills are in a single category are qualified for one type of job. Fig. 2B shows that there is a similar pattern in wages: workers qualified for technical jobs make more than those whose employment opportunities are in creative fields, who in turn make more than those who qualify for administrative tasks (a full breakdown of average wages in each worker and job category is in SI Appendix). These results are similar to those found in previous studies using UpWork’s exogenous categorization of worker skills (22) and the literature using occupation-level data from O*NET and similar sources (17).
Skill Diversity.
The human capital networks also provide a method for measuring the diversity of a worker’s skills; a worker whose skills are spread widely on the worker network has a more diverse skill set than one whose skills are tightly clustered in one area. The broader literature on skill diversity is split: some work suggests that workers with more specialized skills make contributions that have higher impact (4), while other work suggests that workers with diverse skills bring more to the problem-solving process (7). Crucially, while most existing work considers diversity in problem-solving and production, it does not consider the wages that workers earn. Here, we measure skill diversity according to the number of categories that a worker’s skills span. Skill sets that are within a single area are specialized, while skill sets spread across multiple areas are more diverse. Here, skill diversity is associated with higher wages: Fig. 2C indicates that workers with diverse skill sets earn about $2.25/h more than those who specialize in a single area.
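The diversity measure described here, the number of categories that a worker's skills span, can be sketched in a few lines of Python. The skill-to-category mapping below is hypothetical (the category names follow the worker-side list given earlier, but the assignment of individual skills is an assumption), as is the function name `skill_diversity`.

```python
# Hypothetical assignment of skills to worker-side categories.
category_of = {
    "data entry": "administration, writing, and marketing",
    "copywriting": "administration, writing, and marketing",
    "logo design": "art and design",
    "php": "general programing",
}

def skill_diversity(worker_skills, category_of):
    """Number of distinct skill categories spanned by a worker's skills."""
    return len({category_of[s] for s in worker_skills if s in category_of})

specialist = {"data entry", "copywriting"}
generalist = {"copywriting", "logo design", "php"}
print(skill_diversity(specialist, category_of))
print(skill_diversity(generalist, category_of))
```

Skills missing from the mapping are ignored in this sketch; how unassigned skills should be treated is a choice the text does not specify.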
Skill Synergies.
The results of the previous section raise a question: what is it about skill diversity that gives workers higher wages? Looking at the data suggests that workers with diverse skills have one of two potential advantages in the job market. First, skill diversity can expand a worker’s pool of available jobs. For example, a worker with skills in Russian–English translation and mobile application development is unlikely to use her skills in combination. Instead, she uses her diverse skills independently and earns higher wages by virtue of being able to choose from both programing and translation jobs. Second, a worker may have an uncommon combination of skills that can be used synergistically to fill a hole in the market. For example, a worker with user interface design skills and mobile application development skills may use both skills to develop better iOS games. This worker has diverse skills that she uses in combination.
In the data, we can distinguish workers with these two different types of diversity by examining the number of job categories that they cross. Groups of skills that fit into multiple job categories tend to be used independently, while those that fit a single job category tend to be used in combination. We would hypothesize that workers who fill gaps in the market would have higher wages than those who are jacks-of-all-trades. Fig. 2D compares workers by their number of worker categories and job categories. As we saw earlier, workers with diverse skills uniformly outperform workers with specialized skills. However, among workers with diverse skills, those who use them in a single job area earn $1.15/h more than those who use them in two or more job areas. This is consistent with our theory that, among workers with diverse skills, those who use them independently earn more by expanding their range of available jobs but less than those who use a rare combination of skills synergistically.
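One way to read this classification rule is sketched below in Python: a worker whose skills sit in one worker category is a specialist, a worker who spans several worker categories but only one job category is synergistic, and a worker who spans several of both is a jack-of-all-trades. The two category mappings, the labels, and the function `classify_worker` are hypothetical illustrations, not the paper's code or data.

```python
# Hypothetical supply side (worker) and demand side (job) category assignments.
worker_category_of = {
    "ui design": "art and design",
    "mobile development": "statistics and mobile development",
    "translation": "administration, writing, and marketing",
    "data entry": "administration, writing, and marketing",
}
job_category_of = {
    "ui design": "mobile and game development",
    "mobile development": "mobile and game development",
    "translation": "translation",
    "data entry": "administrative",
}

def classify_worker(skills, worker_category_of, job_category_of):
    """Label a skill set by how many worker and job categories it spans."""
    n_worker = len({worker_category_of[s] for s in skills if s in worker_category_of})
    n_job = len({job_category_of[s] for s in skills if s in job_category_of})
    if n_worker <= 1:
        return "specialist"
    if n_job <= 1:
        return "synergistic"        # diverse skills used together in one job area
    return "jack-of-all-trades"     # diverse skills used across job areas

print(classify_worker({"ui design", "mobile development"},
                      worker_category_of, job_category_of))
print(classify_worker({"translation", "mobile development"},
                      worker_category_of, job_category_of))
```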
The Value of Considering Skill Interactions.
Finally, we address whether these network-based measures provide insight into wages beyond that provided by individual skills. Our baseline will be the most flexible specification using only individual skills. Specifically, we relate log wages to a vector of skill dummies, where \(d_i=1\) if the worker has that skill and \(d_i=0\) otherwise. We include terms for skills that appear in at least 2% of worker profiles: a total of 62 dummies. (When too many dummies are included, the terms become collinear.) As expected, some dummies (e.g., project management) are associated with higher wages, and others (e.g., data entry) are associated with lower wages (Model 1 in SI Appendix, Table S10).
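The construction of the skill dummies can be sketched as follows in Python: keep the skills that appear in at least a given share of profiles, then encode each worker as a 0/1 vector over the kept skills. The toy profiles, the 50% threshold used in the demonstration (the text uses 2%), and the function names are hypothetical.

```python
def common_skills(profiles, min_share=0.02):
    """Skills listed in at least `min_share` of profiles, in sorted order."""
    counts = {}
    for skills in profiles:
        for s in set(skills):
            counts[s] = counts.get(s, 0) + 1
    cutoff = min_share * len(profiles)
    return sorted(s for s, c in counts.items() if c >= cutoff)

def dummy_vector(skills, kept):
    """d_i = 1 if the worker has kept skill i, and 0 otherwise."""
    return [1 if s in skills else 0 for s in kept]

# Toy profiles; a 50% threshold is used only so the filter is visible here.
profiles = [
    {"data entry", "writing"},
    {"data entry"},
    {"project management", "writing"},
    {"python"},
]
kept = common_skills(profiles, min_share=0.5)
print(kept)
print(dummy_vector({"writing", "python"}, kept))
```

The resulting vectors would form the regressors in a log-wage regression; fitting that regression is outside the scope of this sketch.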
We then consider a model that includes both the dummies for individual skills and the network-based worker skill categories: mobile development, testing, programing, IT administration, and art and design (we omit administration as a comparison). The coefficients on these additional terms are significant with signs in the expected directions (Model 2 in SI Appendix, Table S10). The effect sizes are quite large: for example, workers in programing fields earn 51% (~$5.40) more than administrative workers with the same skills, while workers in software testing earn 30% (~$4.00) less. The adjusted \(R^2\) of this model is higher than that of the baseline model, indicating that the network-based measures explain variance in worker wages, even when controlling for the skills individually.
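The comparison relies on adjusted \(R^2\) because plain \(R^2\) can never fall when regressors are added. A minimal Python sketch of the standard adjustment follows; the sample size and \(R^2\) values are made-up numbers chosen only to show the mechanics and are not results from the paper.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Standard adjusted R^2 for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Hypothetical numbers for illustration only (not the paper's estimates).
n_obs = 1000
baseline = adjusted_r2(0.20, n_obs, 62)   # skill dummies only
extended = adjusted_r2(0.25, n_obs, 67)   # dummies plus 5 category terms
same_fit = adjusted_r2(0.20, n_obs, 67)   # extra terms with no gain in fit
print(baseline, extended, same_fit)
```

In this toy case the extended model has the higher adjusted value only because its raw fit improves by more than the penalty for the five additional terms; with no gain in fit, the adjusted value falls.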
The story is similar for the other network-based measures. In models containing both the skill dummies and the number of worker categories/job categories crossed, the network-based measures have significant coefficients (Models 3 and 4 in SI Appendix, Table S10). The effect sizes on these terms are smaller but still notable. Workers with skills in two different worker categories earn an average of $0.63 more than their more specialized peers. Workers who qualify for two different types of job earn an average of $0.62 more than those with more constrained options. Again, the adjusted \(R^2\) of both models is higher than that of the baseline.
Ideally, we would also examine a model with a more traditional one-dimensional measure of human capital, such as years of education or experience. Unfortunately, because of the nature of this dataset, we do not observe either. The closest that we have to such a measure is the number of skills listed on a worker’s profile, which is obviously problematic in this context. However, as this type of data becomes more common, we expect that future studies will perform that comparison.
Conclusion
The complex nature of human capital in knowledge-based industry has made the measurement of worker skills increasingly difficult. Placing skills relevant for employment decisions onto a network provides both a way to operationalize the interrelationships between skills and a deep toolbox with which to measure them. As online marketplaces play a greater role in matching workers to employers, human capital networks have an increasing number of practical applications. The detailed skill data in these markets lend themselves to aggregation and algorithmic search (30–32). Even small improvements in these algorithms would reduce search frictions and improve employer–employee matches. The networks can also reveal which skills are complementary to an existing skill set, which would help workers decide on which new skills to acquire and how to best appeal to potential employers.
Acknowledgments
We thank Brian Kovak, Rebecca Lassem, Sarah Feldt-Muldoon, Seth Richards-Shubik, Ben Tengelsen, Ross O’Connell, and the attendees of the 2016 Networks in Economics Conference.
Footnotes
¹Email: andersok{at}andrew.cmu.edu.

Author contributions: K.A.A. designed research, performed research, analyzed data, and wrote the paper.

The author declares no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.danielhellerman.com/lookup/suppl/doi:10.1073/pnas.1706597114//DCSupplemental.
Copyright © 2017 the Author(s). Published by PNAS.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).