Paper DV03
Using Sankey Diagram to Analyze Drug Pipeline
Tanmay Khole, Bristol-Myers Squibb, Berkeley Heights NJ, USA
ABSTRACT
Sankey diagrams are a specific type of flow diagram in which the width of each arrow is proportional to the flow quantity. They put visual emphasis on the major transfers or flows within a system and help locate the dominant contributions to an overall flow. This paper focuses on the drug pipeline of a sponsor and uses data from clinicaltrials.gov to analyze the number of clinical trials a sponsor has with respect to conditions, interventions, and phases. The results are visualized as Sankey diagrams that show the weight a sponsor has given to a drug or a condition across the phases of clinical trials. A drug pipeline gives an idea of the future of a company, and this paper takes a deep dive into some of its aspects using Sankey diagrams.
INTRODUCTION
This paper analyzes data from clinicaltrials.gov for a few selected clinical trial sponsors and uses that information to create Sankey diagrams. A Sankey diagram is a visualization that depicts a flow from one set of values to another. The things being connected are called nodes, and the connections are called links. Sankey diagrams are best used to show a many-to-many mapping between two domains or multiple paths through a set of stages. Data from clinicaltrials.gov is a good fit: it allows a sponsor’s drug pipeline to be analyzed to see which clinical conditions or interventions the sponsor focuses on at each stage of clinical development. Data mapping, data analysis, and data visualization techniques are used to create the Sankey diagrams displayed in this paper. Phase 1 clinical trials are excluded from the analysis and visualization so that the flow of clinical trials in Phases 2 to 4 is easier to follow. Data is obtained in CSV format from clinicaltrials.gov using the advanced search option, searching only on the sponsor field. Analysis is performed on trials with status "Active, not recruiting", "Available", "Enrolling by invitation", "Not yet recruiting", or "Recruiting".
SANKEY DIAGRAM FOR CLINICALTRIALS.GOV DATA
Data obtained from clinicaltrials.gov in CSV format has one record per trial (see figure 1). To use it for a Sankey diagram, it needs to be processed in the following steps:
• Data Mapping
• Data Analysis
• Data Visualization
Figure 1: Data obtained from clinicaltrials.gov and imported into SAS® dataset.
The sponsors listed in table 1 are considered in this paper for data analysis and for creating Sankey diagrams of each sponsor’s on-going clinical trials.
Clinical trials with status "Active, not recruiting", "Available", "Enrolling by invitation", "Not yet recruiting", or "Recruiting" are considered on-going.
Only clinical trials for which the sponsor is the lead sponsor are selected.
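The selection rule above can be sketched in a few lines of JavaScript, the language of the D3 layer used later in this paper. This is only an illustrative sketch, not the code used for the paper: the field names `status` and `sponsors`, the assumption that the first entry of a "|"-delimited sponsor list is the lead sponsor, and the sample records are all hypothetical.

```javascript
// Status values treated as on-going, as listed in the text.
const ONGOING_STATUSES = new Set([
  "Active, not recruiting",
  "Available",
  "Enrolling by invitation",
  "Not yet recruiting",
  "Recruiting",
]);

// Assumption: `status` holds the trial status and `sponsors` holds a
// "|"-delimited list whose first entry is the lead sponsor.
function isOngoingLeadTrial(record, sponsor) {
  const lead = record.sponsors.split("|")[0].trim();
  return ONGOING_STATUSES.has(record.status) && lead === sponsor;
}

// Made-up records for illustration only.
const sampleRecords = [
  { id: "T1", status: "Recruiting", sponsors: "Sponsor A|Partner X" },
  { id: "T2", status: "Completed", sponsors: "Sponsor A" },
  { id: "T3", status: "Recruiting", sponsors: "Partner X|Sponsor A" },
];

const selected = sampleRecords.filter((r) => isOngoingLeadTrial(r, "Sponsor A"));
console.log(selected.map((r) => r.id)); // [ 'T1' ]
```

T2 is dropped because its status is not on-going, and T3 because "Sponsor A" is only a collaborator there.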
Sponsor                           Distinct On-going Clinical Trials Count   Data Extraction Date
Sponsor 1  Bristol-Myers Squibb   250                                       22NOV2019
Sponsor 2  Janssen                126                                       22JAN2020
Sponsor 3  Merck & Co.            173                                       22JAN2020
Sponsor 4  Amgen                  56                                        22JAN2020
Sponsor 5  Bayer                  56                                        22JAN2020
Table 1: List of Sponsors
DATA MAPPING
Data mapping is essential for connecting links and nodes in Sankey diagrams. Clinical trials data obtained from clinicaltrials.gov contains multiple names for the same condition (e.g., “NSCLC”, “Non-Small Cell Lung Cancer”, or “Carcinoma, Non-Small-Cell Lung”; see figure 2) and multiple names for the same drug/biologic compound (e.g., “Nivolumab”, “Opdivo”, “BMS-936558”, “ONO-4538”; see figure 3). Hence it is important to assign each condition and intervention to the correct category. As there are numerous conditions, they are mapped into high-level categories such as Solid Tumors, Cardiovascular, and Leukemia & Lymphoma. See figure 4 for an example of mapping different conditions to a high-level category.
Figure 2: Mapping different names of same condition into single category.
Figure 3: Mapping different names of same compound/intervention into single category.
Figure 4: Mapping different conditions to high-level category.
The mapping rules below are applied before the data analysis step. They are designed to identify the sponsor’s focus with regard to clinical conditions and interventions.
• Clinical trials with multiple phases are mapped toward the higher phase
• Clinical trials with multiple clinical conditions are mapped towards each condition
• Clinical trials with multiple interventions are mapped towards each intervention of the respective sponsor
Example 1: Clinical trial NCT03331198, titled “Study Evaluating Safety and Efficacy of JCAR017 in Subjects With Relapsed or Refractory Chronic Lymphocytic Leukemia (CLL) or Small Lymphocytic Lymphoma (SLL)”, is designed as a Phase 1 and Phase 2 trial. As per the mapping rules, it is mapped to Phase 2 only. This trial also lists multiple clinical conditions (Chronic Lymphocytic Leukemia and Small Lymphocytic Lymphoma) and is mapped to each of them as per the mapping rules.
Example 2: Clinical trial NCT04088500, titled “A Study of Combination Nivolumab and Ipilimumab Retreatment in Patients With Advanced Renal Cell Carcinoma”, has multiple interventions: Nivolumab and Ipilimumab. As per the mapping rules, this trial is mapped to each intervention listed.
Example 3: Clinical trial NCT03036098, titled “Study of Nivolumab in Combination With Ipilimumab or Standard of Care Chemotherapy Compared to the Standard of Care Chemotherapy Alone in Treatment of Patients With Untreated Inoperable or Metastatic Urothelial Cancer”, has multiple interventions: nivolumab, ipilimumab, gemcitabine, cisplatin, and carboplatin. Only the first two are the sponsor’s compounds, hence this trial is mapped to two interventions: nivolumab and ipilimumab.
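The mapping rules and examples above can be illustrated with a small JavaScript sketch. It is not the SAS flag logic used for the paper. The lookup tables contain only the synonyms quoted in the text, and the "|"-delimited field layout and the "Drug: " style prefix on intervention names are assumptions about the downloaded file.

```javascript
// Hypothetical lookup tables holding only the synonyms quoted in the text.
const CONDITION_MAP = new Map([
  ["nsclc", "Non-Small Cell Lung Cancer"],
  ["non-small cell lung cancer", "Non-Small Cell Lung Cancer"],
  ["carcinoma, non-small-cell lung", "Non-Small Cell Lung Cancer"],
]);
const COMPOUND_MAP = new Map([
  ["nivolumab", "Nivolumab"],
  ["opdivo", "Nivolumab"],
  ["bms-936558", "Nivolumab"],
  ["ono-4538", "Nivolumab"],
  ["ipilimumab", "Ipilimumab"],
]);

// Rule 1: a trial listing several phases is mapped to the higher phase.
function highestPhase(phaseField) {
  const digits = phaseField.match(/[1-4]/g);
  return digits ? "Phase " + Math.max(...digits.map(Number)) : null;
}

// Rules 2 and 3: split a "|"-delimited field, strip an optional "Type: "
// prefix, keep only names present in the lookup, and drop duplicates.
function mapNames(field, lookup) {
  const mapped = field
    .split("|")
    .map((name) => name.replace(/^[A-Za-z ]+:\s*/, "").trim().toLowerCase())
    .map((name) => lookup.get(name))
    .filter((name) => name !== undefined);
  return [...new Set(mapped)];
}

console.log(highestPhase("Phase 1|Phase 2")); // Phase 2
console.log(mapNames("NSCLC|Carcinoma, Non-Small-Cell Lung", CONDITION_MAP));
console.log(
  mapNames("Drug: Nivolumab|Drug: Ipilimumab|Drug: Gemcitabine|Drug: Cisplatin", COMPOUND_MAP)
);
```

Compounds that are not in the sponsor's lookup table (gemcitabine and cisplatin here) fall out automatically, which mirrors Example 3.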
Data mapping for this paper is performed by creating flags/identifiers for each condition and intervention listed in the respective sponsor’s clinical trials data. Each sponsor listed in table 1 has unique compounds, so each compound/intervention has to be mapped by closely inspecting the data.
Data obtained from clinicaltrials.gov has one record per trial (horizontal data format) and needs to be transformed into a vertical data format, as shown in figure 5, by using the flags created for each condition category and intervention.
Figure 5: Horizontal data mapped and transformed into vertical data format
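As a rough illustration of the horizontal-to-vertical transformation, the sketch below expands one mapped trial record into one row per condition and intervention combination. The exact layout of figure 5 is not reproduced here, so the row structure is an assumption. The column names follow the `nodes` parameter used in the data analysis step, and the sample trial is made up.

```javascript
// Expand one mapped trial (horizontal) into vertical rows.
// Assumption: one output row per condition x intervention combination.
function toVertical(trial) {
  const rows = [];
  for (const condition of trial.conditions) {
    for (const intervention of trial.interventions) {
      rows.push({
        sponsor: trial.sponsor,
        conditions: condition,
        interventions: intervention,
        phases: trial.phase,
      });
    }
  }
  return rows;
}

// Made-up trial with two condition categories and two interventions.
const verticalRows = toVertical({
  sponsor: "Sponsor A",
  conditions: ["Solid Tumors", "Leukemia & Lymphoma"],
  interventions: ["Compound X", "Compound Y"],
  phase: "Phase 3",
});
console.log(verticalRows.length); // 4
```

Each vertical row carries exactly one value per node column, which is the shape a group-and-count step needs.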
DATA ANALYSIS
Data analysis is performed by counting the objects in each category to be displayed in the Sankey diagram. The categories are used as nodes, and the object counts determine the width of the links between the selected categories.
In this paper, data analysis is performed by counting clinical trials with respect to sponsor, conditions, interventions, and phases. This step is performed after data mapping to ensure that links and nodes are connected correctly. The SAS® macro %sankey_nodes is used for data analysis; reference code can be found in the appendix.
%sankey_nodes(inds = ct_gov
,outds = sankey_out
,nodes=%str(sponsor|conditions|interventions|phases)
,cond =
);
%sankey_nodes calculates the number of objects, in this case the number of clinical trials. The mapped data is passed in the “inds” macro parameter. The nodes (categories) to be displayed in the Sankey diagram are listed in the “nodes” macro parameter, separated by “|”. If a filter condition is needed, it can be supplied in the “cond” macro parameter, where it is applied as a WHERE clause. The macro creates an output dataset and a macro variable &sankeydata. that hold the data for the Sankey diagram. The macro variable is used in the data visualization step to create the Sankey diagram.
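The counting idea behind this step can be sketched in JavaScript as follows. This is a simplified illustration, not a port of %sankey_nodes: it counts rows per pair of adjacent node columns, whereas the appendix macro additionally groups by every preceding node column and can therefore emit several records for the same source and target. The function name and the sample rows are hypothetical.

```javascript
// Count rows for each (source, target) pair of adjacent node columns.
function countLinks(rows, nodes) {
  const counts = new Map();
  for (let level = 0; level + 1 < nodes.length; level++) {
    for (const row of rows) {
      // The level is part of the key so equal names on different levels stay apart.
      const key = JSON.stringify([level, row[nodes[level]], row[nodes[level + 1]]]);
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts].map(([key, value]) => {
    const [, source, target] = JSON.parse(key);
    return { source, target, value };
  });
}

// Made-up vertical rows for illustration only.
const mappedRows = [
  { sponsor: "Sponsor A", conditions: "Solid Tumors", interventions: "Compound X", phases: "Phase 3" },
  { sponsor: "Sponsor A", conditions: "Solid Tumors", interventions: "Compound Y", phases: "Phase 3" },
  { sponsor: "Sponsor A", conditions: "Leukemia & Lymphoma", interventions: "Compound X", phases: "Phase 2" },
];

const linkRecords = countLinks(mappedRows, ["sponsor", "conditions", "interventions", "phases"]);
console.log(linkRecords[0]); // { source: 'Sponsor A', target: 'Solid Tumors', value: 2 }
```

Each returned record has the same source, target, and value fields as the strings that the macro stores in &sankeydata.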
DATA VISUALIZATION
The data visualization step is performed using the SAS® macro %sankey2html and D3.js, a JavaScript library. The output is created in HTML format.
%sankey2html(indata = %nrbquote(&sankeydata.)
,outfl = %sysfunc(pathname(outg,f))/sankey.html
,width = 2100
,height = 700
,flow_num =
);
The %sankey2html macro reads the macro variable &sankeydata. created by %sankey_nodes and embeds it in an HTML file. The output file location and HTML filename are specified in the “outfl” macro parameter. The “width” and “height” parameters set the width and height of the Sankey diagram. The “flow_num” parameter restricts link labels to links whose value is greater than the specified number; if it is left blank, no link labels are displayed.
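The way the generated page prepares its data can be sketched without D3 as follows. The appendix code does the equivalent with d3.nest and d3.keys: it collects the distinct node names and replaces the names in each link with node indices. The helper names `buildGraph` and `linkLabel` are hypothetical, and the sample data is made up.

```javascript
// Turn {source, target, value} records into the nodes/links structure
// expected by the Sankey layout: unique nodes, links holding node indices.
function buildGraph(data) {
  const names = [];
  const indexByName = new Map();
  const indexOf = (name) => {
    if (!indexByName.has(name)) {
      indexByName.set(name, names.length);
      names.push(name);
    }
    return indexByName.get(name);
  };
  const links = data.map((d) => ({
    source: indexOf(d.source),
    target: indexOf(d.target),
    value: +d.value, // coerce to a number, as the generated page does
  }));
  return { nodes: names.map((name) => ({ name })), links };
}

// Label text for a link; empty unless the value exceeds the flow_num threshold.
function linkLabel(link, nodes, flowNum) {
  return link.value > flowNum
    ? " -> " + nodes[link.target].name + " : " + link.value
    : "";
}

const sampleGraph = buildGraph([
  { source: "Sponsor A", target: "Solid Tumors", value: 12 },
  { source: "Solid Tumors", target: "Compound X", value: 3 },
]);
console.log(sampleGraph.nodes.map((n) => n.name));
console.log(linkLabel(sampleGraph.links[0], sampleGraph.nodes, 10)); // " -> Solid Tumors : 12"
```

With a threshold of 10, only the first link gets a label, which keeps thin links uncluttered.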
The Sankey diagrams display the flow of the number of clinical trials from
SPONSOR → CONDITIONS → INTERVENTIONS → PHASES
which are also the nodes of the Sankey diagrams displayed in this paper.
The thickness of a link represents the number of clinical trials connecting its two nodes.
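How a trial count becomes a thickness in pixels can be illustrated with the scale factor that the layout code in the appendix computes: for each column of nodes, the available height minus the padding between nodes is divided by the sum of the node values, and the smallest of these ratios is used for the whole diagram. The sketch below reproduces only that calculation. The function name and the node values are hypothetical.

```javascript
// Pixels per trial: the smallest (available height / total value) over all columns.
function verticalScale(columns, height, nodePadding) {
  const scales = columns.map((values) => {
    const total = values.reduce((sum, v) => sum + v, 0);
    return (height - (values.length - 1) * nodePadding) / total;
  });
  return Math.min(...scales);
}

// Made-up example: one sponsor node (330 trials) and two condition nodes,
// drawn in 680 pixels of height with 20 pixels of padding between nodes.
const ky = verticalScale([[330], [200, 130]], 680, 20);
const thickness = 130 * ky; // thickness of a link carrying 130 trials
console.log(ky, thickness); // 2 260
```

Because one scale factor is shared by all columns, link thicknesses are comparable across the whole diagram.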
Node 1: Sponsor; Node 2: Clinical Conditions; Node 3: Interventions; Node 4: Clinical Trial Phases
The number of on-going clinical trials for each node is displayed in parentheses.
Clinical trials with multiple phases are counted toward the higher phase; clinical trials with multiple clinical conditions are counted toward each condition; clinical trials with multiple interventions are counted toward each intervention.
Note: The data analysis and data visualization performed in this paper are not an official representation of any sponsor’s pipeline but are based on the data acquired from clinicaltrials.gov.
SANKEY DIAGRAM 1
Sponsor: Bristol-Myers Squibb
https://clinicaltrials.gov/ct2/about-site/background
SANKEY DIAGRAM 2
Sponsor: Janssen
https://clinicaltrials.gov/ct2/about-site/background
SANKEY DIAGRAM 3
Sponsor: Merck & Co.
https://clinicaltrials.gov/ct2/about-site/background
SANKEY DIAGRAM 4
Sponsor: Amgen
https://clinicaltrials.gov/ct2/about-site/background
SANKEY DIAGRAM 5
Sponsor: Bayer
https://clinicaltrials.gov/ct2/about-site/background
CONCLUSION
The Sankey diagram is an effective data visualization tool for understanding the flow of clinical trials. It helps to track several clinical trials in a single view. It also makes it easy to see the weight of a clinical condition or intervention with respect to the phases of clinical trials, and it represents flow in a manner that can be understood at a glance. The Sankey diagrams in this paper allow the user to see the complex pipeline of a sponsor in a single image, with a focus on the clinical conditions and interventions/compounds of that sponsor. Sankey diagrams make dominant clinical conditions or interventions stand out, and they help users see relative magnitudes and the areas with the largest opportunities.
Using the provided macros, a Sankey diagram can be adjusted to the user’s needs. Sankey diagrams also support multiple viewing levels: users can get a high-level view, see specific details, or generate custom diagrams.
NOTE FROM THE AUTHOR
The data analysis and data visualization performed in this paper are not an official representation of any sponsor’s pipeline but are based on the public data acquired from clinicaltrials.gov. This presentation reflects the views of the author and should not be construed to represent any clinical trial sponsor’s pipeline. ClinicalTrials.gov is a Web-based resource that provides patients, their family members, health care professionals, researchers, and the public with easy access to information on publicly and privately supported clinical studies on a wide range of diseases and conditions.
ACKNOWLEDGMENTS
I would like to thank Vineet Mathur and Simon Xue for their guidance and support for this paper. They supported me greatly and were always willing to help me.
I would also like to thank the dedicated people who manage and maintain clinicaltrials.gov and d3js.org. Without these resources, this paper would not have been possible.
https://clinicaltrials.gov/
https://d3js.org/
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Author Name: Tanmay Khole
Company: Bristol-Myers Squibb
Address: 300 Connell Drive, Berkeley Heights
City / Postcode: NJ 07922
Email: [email protected]
Brand and product names are trademarks of their respective companies.
APPENDIX
%macro sankey_nodes(inds=, outds=, nodes=, cond=);
  /* Number of node columns = number of "|" delimiters + 1 */
  %let cnt = %eval(%sysfunc(countc(&nodes.,"|")) +1);
  %put &cnt.;

  data _inds;
    set &inds.;
  run;

  /* Store each node column name in SINGLE1, SINGLE2, ... */
  %do i = 1 %to &cnt;
    %let single&i. = %scan(&nodes, &i , '|');
    %put single&i. = &&single&i;
  %end;

  /* For each pair of adjacent node columns, count the records and build
     one {'source':..,'target':..,'value':..} string per link */
  proc sql;
    %do i = 1 %to %eval(&cnt. -1);
      create table &&single&i.._wt_chk as
      select distinct
        %do k=1 %to &i. ;
          &&single&k.,
        %end;
        %superq(single%eval(&i. +1)),
        &&single&i. as SOURCE length=100,
        %superq(single%eval(&i. +1)) as TARGET length=100,
        count(&&single&i.) as VALUE,
        "{'source':'"||strip(&&single&i.)||"','target':'"||strip(%superq(single%eval(&i. +1)))||"','value':"||strip(put(count(&&single&i.), 5.0))||"}," as final length=1000
      from _inds
      /* Apply the optional filter only when COND is not blank */
      %if %length(%superq(cond)) > 0 %then %do;
        where &cond.
      %end;
      group by
        %do k=1 %to &i. ;
          &&single&k.,
        %end;
        %superq(single%eval(&i. +1))
      ;
    %end;
  quit;

  /* Stack the link tables of all adjacent node pairs */
  data &outds.;
    set
    %do i = 1 %to %eval(&cnt. -1);
      &&single&i.._wt_chk
    %end;
    ;
  run;

  options linesize=max;

  /* Concatenate all link strings into the global macro variable SANKEYDATA */
  %global sankeydata;
  proc sql noprint;
    select final into: sankeydata separated by " "
    from &outds.
    ;
  quit;
  %put &sankeydata. ;
%mend sankey_nodes;
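%macro sankey2html(indata=, outfl=, width=, height=, flow_num=);
data _null_;
file "&outfl.";
/* Page skeleton: assumed minimal markup. D3 version 3 is assumed, since the
   script below uses d3.scale.category20 and d3.behavior.drag. */
put '<!DOCTYPE html>';
put '<html>';
put '<head>';
put ' <meta charset="utf-8">';
put ' <script src="https://d3js.org/d3.v3.min.js"></script>';
put ' <style>';
put ' .node rect {';
put ' cursor: move;';
put '}';
put '.link {';
put ' fill: none;';
put ' stroke: #000;';
put ' stroke-opacity: .2;';
put '}';
put '.link:hover {';
put ' stroke-opacity: .5;';
put '} ';
put ' * {';
put ' font: 11px sans-serif;';
put '}';
put '.linkLabel {';
put ' z-index:10;';
put '}';
put ' </style>';
put '</head>';
put '<body>';
put ' <div id="chart"></div>';
put ' <script>';
put ' //<![CDATA[';
put ' window.onload = function () {';
/* Start of the D3 Sankey layout; the defaults are overridden further below */
put ' d3.sankey = function () {';
put ' var sankey = {},';
put ' nodeWidth = 24,';
put ' nodePadding = 8,';
put ' size = [1, 1],';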
put ' nodes = [],';
put ' links = [];';
put ' ';
put ' sankey.nodeWidth = function (_) {';
put ' if (!arguments.length) return nodeWidth;';
put ' nodeWidth = +_;';
put ' return sankey;';
put ' };';
put ' ';
put ' sankey.nodePadding = function (_) {';
put ' if (!arguments.length) return nodePadding;';
put ' nodePadding = +_;';
put ' return sankey;';
put ' };';
put ' ';
put ' sankey.nodes = function (_) {';
put ' if (!arguments.length) return nodes;';
put ' nodes = _;';
put ' return sankey;';
put ' };';
put ' ';
put ' sankey.links = function (_) {';
put ' if (!arguments.length) return links;';
put ' links = _;';
put ' return sankey;';
put ' };';
put ' ';
put ' sankey.size = function (_) {';
put ' if (!arguments.length) return size;';
put ' size = _;';
put ' return sankey;';
put ' };';
put ' ';
put ' sankey.layout = function (iterations) {';
put ' computeNodeLinks();';
put ' computeNodeValues();';
put ' computeNodeBreadths();';
put ' computeNodeDepths(iterations);';
put ' computeLinkDepths();';
put ' return sankey;';
put ' };';
put ' ';
put ' sankey.relayout = function () {';
put ' computeLinkDepths();';
put ' return sankey;';
put ' };';
put ' ';
put ' sankey.link = function () {';
put ' var curvature = .5;';
put ' ';
put ' function link(d) {';
put ' var x0 = d.source.x + d.source.dx,';
put ' x1 = d.target.x,';
put ' xi = d3.interpolateNumber(x0, x1),';
put ' x2 = xi(curvature),';
put ' x3 = xi(1 - curvature),';
put ' y0 = d.source.y + d.sy + d.dy / 2,';
put ' y1 = d.target.y + d.ty + d.dy / 2;';
put ' return "M" + x0 + "," + y0 + "C" + x2 + "," + y0 + " " + x3 + "," + y1 + " " + x1 + "," + y1;';
put ' }';
put ' ';
put ' link.curvature = function (_) {';
put ' if (!arguments.length) return curvature;';
put ' curvature = +_;';
put ' return link;';
put ' };';
put ' ';
put ' return link;';
put ' };';
put ' ';
put ' // Populate the sourceLinks and targetLinks for each node.';
put ' // Also, if the source and target are not objects, assume they are indices.';
put ' function computeNodeLinks() {';
put ' nodes.forEach(function (node) {';
put ' node.sourceLinks = [];';
put ' node.targetLinks = [];';
put ' });';
put ' links.forEach(function (link) {';
put ' var source = link.source,';
put ' target = link.target;';
put ' if (typeof source === "number") source = link.source = nodes[link.source];';
put ' if (typeof target === "number") target = link.target = nodes[link.target];';
put ' source.sourceLinks.push(link);';
put ' target.targetLinks.push(link);';
put ' });';
put ' }';
put ' ';
put ' // Compute the value (size) of each node by summing the associated links.';
put ' function computeNodeValues() {';
put ' nodes.forEach(function (node) {';
put ' node.value = Math.max(';
put ' d3.sum(node.sourceLinks, value),';
put ' d3.sum(node.targetLinks, value));';
put ' });';
put ' }';
put ' ';
put ' // Iteratively assign the breadth (x-position) for each node.';
put ' // Nodes are assigned the maximum breadth of incoming neighbors plus one;';
put ' // nodes with no incoming links are assigned breadth zero, while';
put ' // nodes with no outgoing links are assigned the maximum breadth.';
put ' function computeNodeBreadths() {';
put ' var remainingNodes = nodes,';
put ' nextNodes,';
put ' x = 0;';
put ' ';
put ' while (remainingNodes.length) {';
put ' nextNodes = [];';
put ' remainingNodes.forEach(function (node) {';
put ' node.x = x;';
put ' node.dx = nodeWidth;';
put ' node.sourceLinks.forEach(function (link) {';
put ' nextNodes.push(link.target);';
put ' });';
put ' });';
put ' remainingNodes = nextNodes;';
put ' ++x;';
put ' }';
put ' ';
put ' //';
put ' moveSinksRight(x);';
put ' scaleNodeBreadths((width - nodeWidth) / (x - 1));';
put ' }';
put ' ';
put ' function moveSourcesRight() {';
put ' nodes.forEach(function (node) {';
put ' if (!node.targetLinks.length) {';
put ' node.x = d3.min(node.sourceLinks, function (d) {';
put ' return d.target.x;';
put ' }) - 1;';
put ' }';
put ' });';
put ' }';
put ' ';
put ' function moveSinksRight(x) {';
put ' nodes.forEach(function (node) {';
put ' if (!node.sourceLinks.length) {';
put ' node.x = x - 1;';
put ' }';
put ' });';
put ' }';
put ' ';
put ' function scaleNodeBreadths(kx) {';
put ' nodes.forEach(function (node) {';
put ' node.x *= kx;';
put ' });';
put ' }';
put ' ';
put ' function computeNodeDepths(iterations) {';
put ' var nodesByBreadth = d3.nest()';
put ' .key(function (d) {';
put ' return d.x;';
put ' })';
put ' .sortKeys(d3.ascending)';
put ' .entries(nodes)';
put ' .map(function (d) {';
put ' return d.values;';
put ' });';
put ' ';
put ' //';
put ' initializeNodeDepth();';
put ' resolveCollisions();';
put ' for (var alpha = 1; iterations > 0; --iterations) {';
put ' relaxRightToLeft(alpha *= .99);';
put ' resolveCollisions();';
put ' relaxLeftToRight(alpha);';
put ' resolveCollisions();';
put ' }';
put ' ';
put ' function initializeNodeDepth() {';
put ' var ky = d3.min(nodesByBreadth, function (nodes) {';
put ' return (size[1] - (nodes.length - 1) * nodePadding) / d3.sum(nodes, value);';
put ' });';
put ' ';
put ' nodesByBreadth.forEach(function (nodes) {';
put ' nodes.forEach(function (node, i) {';
put ' node.y = i;';
put ' node.dy = node.value * ky;';
put ' });';
put ' });';
put ' ';
put ' links.forEach(function (link) {';
put ' link.dy = link.value * ky;';
put ' });';
put ' }';
put ' ';
put ' function relaxLeftToRight(alpha) {';
put ' nodesByBreadth.forEach(function (nodes, breadth) {';
put ' nodes.forEach(function (node) {';
put ' if (node.targetLinks.length) {';
put ' var y = d3.sum(node.targetLinks, weightedSource) / d3.sum(node.targetLinks, value);';
put ' node.y += (y - center(node)) * alpha;';
put ' }';
put ' });';
put ' });';
put ' ';
put ' function weightedSource(link) {';
put ' return center(link.source) * link.value;';
put ' }';
put ' }';
put ' ';
put ' function relaxRightToLeft(alpha) {';
put ' nodesByBreadth.slice().reverse().forEach(function (nodes) {';
put ' nodes.forEach(function (node) {';
put ' if (node.sourceLinks.length) {';
put ' var y = d3.sum(node.sourceLinks, weightedTarget) / d3.sum(node.sourceLinks, value);';
put ' node.y += (y - center(node)) * alpha;';
put ' }';
put ' });';
put ' });';
put ' ';
put ' function weightedTarget(link) {';
put ' return center(link.target) * link.value;';
put ' }';
put ' }';
put ' ';
put ' function resolveCollisions() {';
put ' nodesByBreadth.forEach(function (nodes) {';
put ' var node,';
put ' dy,';
put ' y0 = 0,';
put ' n = nodes.length,';
put ' i;';
put ' ';
put ' // Push any overlapping nodes down.';
put ' nodes.sort(ascendingDepth);';
put ' for (i = 0; i < n; ++i) {';
put ' node = nodes[i];';
put ' dy = y0 - node.y;';
put ' if (dy > 0) node.y += dy;';
put ' y0 = node.y + node.dy + nodePadding;';
put ' }';
put ' ';
put ' // If the bottommost node goes outside the bounds, push it back up.';
put ' dy = y0 - nodePadding - size[1];';
put ' if (dy > 0) {';
put ' y0 = node.y -= dy;';
put ' ';
put ' // Push any overlapping nodes back up.';
put ' for (i = n - 2; i >= 0; --i) {';
put ' node = nodes[i];';
put ' dy = node.y + node.dy + nodePadding - y0;';
put ' if (dy > 0) node.y -= dy;';
put ' y0 = node.y;';
put ' }';
put ' }';
put ' });';
put ' }';
put ' ';
put ' function ascendingDepth(a, b) {';
put ' return a.y - b.y;';
put ' }';
put ' }';
put ' ';
put ' function computeLinkDepths() {';
put ' nodes.forEach(function (node) {';
put ' node.sourceLinks.sort(ascendingTargetDepth);';
put ' node.targetLinks.sort(ascendingSourceDepth);';
put ' });';
put ' nodes.forEach(function (node) {';
put ' var sy = 0,';
put ' ty = 0;';
put ' node.sourceLinks.forEach(function (link) {';
put ' link.sy = sy;';
put ' sy += link.dy;';
put ' });';
put ' node.targetLinks.forEach(function (link) {';
put ' link.ty = ty;';
put ' ty += link.dy;';
put ' });';
put ' });';
put ' ';
put ' function ascendingSourceDepth(a, b) {';
put ' return a.source.y - b.source.y;';
put ' }';
put ' ';
put ' function ascendingTargetDepth(a, b) {';
put ' return a.target.y - b.target.y;';
put ' }';
put ' }';
put ' ';
put ' function center(node) {';
put ' return node.y + node.dy / 2;';
put ' }';
put ' ';
put ' function value(link) {';
put ' return link.value;';
put ' }';
put ' ';
put ' return sankey;';
put '};';
put ' ';
put ' ';
put '/* ------------------- our code ------------------------ */';
put '//var canvas = document.getElementById("chart");';
put ' ';
put 'var units = "Widgets";';
put ' ';
put 'var margin = {';
put ' top: 10,';
put ' right: 10,';
put ' bottom: 10,';
put ' left: 10';
put '},';
/****ADJUST WIDTH AND HEIGHT****/
put "width = &width - margin.left - margin.right,";
/*******************************/
put " height = &height - margin.top - margin.bottom;";
put ' ';
put 'var formatNumber = d3.format(",.0f"), // zero decimal places';
put ' format = function (d) {';
put ' return formatNumber(d) + " " + units;';
put ' },';
put ' color = d3.scale.category20();';
put ' ';
put '// append the svg canvas to the page';
put 'var svg = d3.select("#chart").append("svg")';
put ' .attr("width", width + margin.left + margin.right)';
put ' .attr("height", height + margin.top + margin.bottom)';
put ' .append("g")';
put ' .attr("transform",';
put ' "translate(" + margin.left + "," + margin.top + ")");';
put ' ';
put '// Set the sankey diagram properties';
put 'var sankey = d3.sankey()';
put ' .nodeWidth(10)';
put ' .nodePadding(20)';
put ' .size([width, height]);';
put ' ';
put 'var path = sankey.link();';
put ' ';
put ' ';
put 'var data = [';
put "&indata.";
put ']; ';
put ' ';
put '//set up graph in same style as original example but empty';
put 'graph = {';
put ' "nodes": [],';
put ' "links": []';
put '};';
put ' ';
put 'data.forEach(function (d) {';
put ' graph.nodes.push({';
put ' "name": d.source';
put ' });';
put ' graph.nodes.push({';
put ' "name": d.target';
put ' });';
put ' graph.links.push({';
put ' "source": d.source,';
put ' "target": d.target,';
put ' "value": +d.value';
put ' });';
put '});';
put ' ';
put '// return only the distinct / unique nodes';
put 'graph.nodes = d3.keys(d3.nest()';
put ' .key(function (d) {';
put ' return d.name;';
put '})';
put ' .map(graph.nodes));';
put ' ';
put '// loop through each link replacing the text with its index from node';
put 'graph.links.forEach(function (d, i) {';
put ' graph.links[i].source = graph.nodes.indexOf(graph.links[i].source);';
put ' graph.links[i].target = graph.nodes.indexOf(graph.links[i].target);';
put '});';
put ' ';
put '//now loop through each nodes to make nodes an array of objects';
put '// rather than an array of strings';
put 'graph.nodes.forEach(function (d, i) {';
put ' graph.nodes[i] = {';
put ' "name": d';
put ' };';
put '});';
put ' ';
put 'sankey.nodes(graph.nodes)';
put ' .links(graph.links)';
put ' .layout(32);';
put ' ';
put '// add in the links';
put 'var link = svg.append("g").selectAll(".link")';
put ' .data(graph.links)';
put ' .enter()';
put ' .append("path")';
put ' .attr("class", "link")';
put ' .attr("id",function(d,i) { return "linkLabel" + i; })';
put ' .attr("d", path)';
put ' .style("stroke-width", function (d) {';
put ' return Math.max(1, d.dy);';
put ' })';
put ' .sort(function (a, b) {';
put ' return b.dy - a.dy;';
put ' })';
put ' ';
put ' ';
put ' ';
put '// add in the nodes';
put 'var node = svg.append("g").selectAll(".node")';
put ' .data(graph.nodes)';
put ' .enter().append("g")';
put ' .attr("class", "node")';
put ' .attr("transform", function (d) {';
put ' return "translate(" + d.x + "," + d.y + ")";';
put '})';
put ' .call(d3.behavior.drag()';
put ' .origin(function (d) {';
put ' return d;';
put '})';
put ' .on("dragstart", function () {';
put ' this.parentNode.appendChild(this);';
put '})';
put ' .on("drag", dragmove));';
put ' ';
put '// add the rectangles for the nodes';
put 'node.append("rect")';
put ' .attr("height", function (d) {';
put ' return d.dy;';
put '})';
put ' .attr("width", sankey.nodeWidth())';
put ' .style("fill", function (d) {';
put ' return d.color = color(d.name.replace(/ .*/, ""));';
put '})';
put ' .style("stroke", function (d) {';
put ' return d3.rgb(d.color);//.darker(2);';
put '})';
put ' .append("title")';
put ' .text(function (d) {';
put ' return d.name + "\n" + format(d.value);';
put '});';
put ' ';
put '// add in the title for the nodes';
put 'node.append("text")';
put ' .attr("x", -6)';
put ' .attr("y", function (d) {';
put ' return d.dy / 2;';
put '})';
put ' .attr("dy", ".35em")';
put ' .attr("text-anchor", "end")';
put ' .attr("transform", null)';
put ' .text(function (d) {';
put ' return d.name + " (" + d.value + ")";';
put '})';
put ' .filter(function (d) {';
put ' return d.x < width / 2;';
put '})';
put ' .attr("x", 6 + sankey.nodeWidth())';
put ' .attr("text-anchor", "start");';
put ' ';
put '/* add labels to graphs */';
put 'var labelText = svg.selectAll(".labelText")';
put ' .data(graph.links)';
put ' .enter()';
put ' .append("text")';
put ' .attr("class","labelText")';
put ' .attr("dx",130)';
put ' .attr("dy",0)';
put ' .append("textPath")';
put ' .attr("xlink:href",function(d,i) { return "#linkLabel" + i;})';
put ' .text(function(d,i) ';
put ' { ';
%if &flow_num. > 0 %then %do;
put " if (d.value > &flow_num.) return ' -> ' + d.target.name + ' : ' + d.value;";
%end;
put ' }';
put ' );';
put '// if (d.value > 10) return " -> " + d.value + " -> ";});';
put ' ';
put '// the function for moving the nodes';
put 'function dragmove(d) {';
put ' d3.select(this).attr("transform",';
put ' "translate(" + d.x + "," + (';
put ' d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))) + ")");';
put ' sankey.relayout();';
put ' link.attr("d", path);';
put '}';
put ' ';
put ' ';
put ' ';
put ' ';
put ' }';
put ' ';
put ' //]]>';
put ' ';
/* Closing markup: assumed minimal tags matching the page skeleton */
put ' </script>';
put ' <script>';
put ' // tell the embed parent frame the height of the content';
put ' if (window.parent && window.parent.parent){';
put ' window.parent.parent.postMessage(["resultsFrame", {';
put ' height: document.body.getBoundingClientRect().height,';
put ' slug: "Lsjkhzf1"';
put ' }], "*")';
put ' }';
put ' ';
put ' // always overwrite window.name, in case users try to set it manually';
put ' window.name = "result"';
put ' </script>';
put '</body>';
put '</html>';
;;;;
run;
%mend sankey2html;