# CLOSED-LOOP CONTROL OF ADDITIVE MANUFACTURING VIA REINFORCEMENT LEARNING
**Anonymous authors**
Paper under double-blind review
ABSTRACT
Additive manufacturing suffers from imperfections in hardware control and material consistency. As a result, the deposition of a wide range of materials requires on-the-fly adjustment of process parameters. Unfortunately, learning the in-process control is challenging. The deposition parameters are complex and highly
coupled, artifacts occur after long time horizons, available simulators lack predictive power, and learning on hardware is intractable. In this work, we demonstrate
the feasibility of learning a closed-loop control policy for additive manufacturing.
To achieve this goal, we assume that the perception of a deposition device is limited and can capture the process only qualitatively. We leverage this assumption
to formulate an efficient numerical model that explicitly includes printing imperfections. We further show that in combination with reinforcement learning, our
model can be used to discover control policies that outperform state-of-the-art
controllers. Furthermore, the recovered policies have a minimal sim-to-real gap.
We showcase this by implementing a single-layer self-correcting printer.
1 INTRODUCTION
A critical component of manufacturing is identifying process parameters that consistently produce
high-quality structures. In commercial devices, this is typically achieved by expensive trial-and-error
experimentation (Gao et al., 2015). To make such an optimization feasible, a critical assumption is
made: there exists a set of parameters for which the relationship between process parameters and
process outcome is predictable. However, such an assumption does not hold in practice because all
manufacturing processes are stochastic in nature. Specifically in additive manufacturing, variability
in both materials and intrinsic process parameters can cause geometric errors leading to imprecision
that can compromise the functional properties of the final prints. Therefore, transition to closed-loop
control is indispensable for industrial adoption of additive manufacturing (Wang et al., 2020).
Recently, we have seen promising progress in learning policies for interaction with amorphous materials (Li et al., 2019b; Zhang et al., 2020). Unfortunately, in the context of additive manufacturing,
discovering effective control strategies is significantly more challenging. The deposition parameters
have a non-linear coupling to the dynamic material properties. To assess the severity of deposition
errors, we need to observe the material over long time horizons. Available simulators either lack predictive power (Mozaffar et al., 2018) or are too complex for learning (Tang et al., 2018; Yan et al.,
2018). Moreover, learning on hardware is intractable as we require tens of thousands of printed
samples. These challenges are further exacerbated by the limited perception of printing hardware,
where typically, only a small in-situ view is available to assess the deposition quality.
In this work, we propose the first closed-loop controller for additive manufacturing based on reinforcement learning deployed on real hardware. To achieve this we formulate a custom numerical
model of the deposition process. Motivated by the limited hardware perception we make a key
assumption: to learn closed-loop control it is sufficient to model the deposition only qualitatively.
This allows us to replace physically accurate but prohibitively slow simulations with efficient approximations. To ameliorate the sim-to-real gap, we enhance the simulation with a data-driven noise
distribution on the spread of the deposited material. We further show that careful selection of input
and action space is necessary for hardware transfer. Lastly, we leverage the privileged information
about the deposition process to formulate a reward function that encourages policies that account
for material changes over long horizons. Thanks to the above advancements, our control policy can
be trained exclusively in simulation with a minimal sim-to-real gap. We demonstrate that our policy outperforms baseline deposition methods in simulation and physical hardware with low or high
viscosity materials. Furthermore, our numerical model can serve as an essential building block for
future research in optimal material deposition, and we plan to make the source code available.
2 RELATED WORK
To identify process parameters for additive manufacturing, it is important to understand the complex
interaction between a material and a deposition process. This is typically done through trial-and-error experimentation (Kappes et al., 2018; Wang et al., 2018; Baturynska et al., 2018). Recently,
optimal experiment design and, more specifically, Gaussian processes have become a tool for efficient use of the samples to understand the deposition problem (Erps et al., 2021). However, even
though Gaussian Processes model the deposition variance, they do not offer tools to adjust the deposition on-the-fly. Another approach to improve the printing process is to design closed-loop controllers. One of the first designs was proposed by Sitthi-Amorn et al. (2015) that monitors each
layer deposited by a printing process to compute an adjustment layer. Liu et al. (2017) built upon
the idea and trained a discriminator that can identify the type and magnitude of observed defects. A
similar approach was proposed by Yao et al. (2018) that uses handcrafted features to identify when a
print significantly drops in quality. The main disadvantage of these methods is that they rely on collecting the in-situ observations to propose one corrective step by adjusting the process parameters.
However, this means that the prints continue with sub-optimal parameters, and it can take several
layers to adjust the deposition. In contrast, our system runs in-process and reacts to the in-situ views
immediately. This ensures high-quality deposition and adaptability to material changes.
Recently, machine learning techniques have sparked new interest in the design of adaptive control policies (Mnih et al., 2015). A particularly successful approach for high-quality in-process control is to
adopt the Model Predictive Control paradigm (MPC) (Gu et al., 2016; Silver et al., 2017; Oh et al.,
2017; Srinivas et al., 2018; Nagabandi et al., 2018). The control scheme of MPC relies on an observation of the current state and a short-horizon prediction of the future states. By manipulating the
process parameters, we observe the changes in future predictions and can pick a future with desirable
characteristics. Particularly useful is to utilize deep models to generate differentiable predictors that
provide derivatives with respect to control changes (de Avila Belbute-Peres et al., 2018; Schenck &
Fox, 2018; Toussaint et al., 2018; Li et al., 2019a). However, addressing the uncertainties of the deposition process with MPC is challenging. In a noisy environment, we can rely only on the expected
prediction of the deposition. This leads to a conservative control policy that effectively executes the
mean action. Moreover, reacting to material changes over time requires optimizing actions for long
time horizons which is a known weakness of the MPC paradigm (Garcia et al., 1989). As a result,
MPC is not suitable for in-process control in noisy environments.
Another option to derive control policies is to leverage deep reinforcement learning (Rajeswaran et al.,
2017; Liu & Hodgins, 2018; Peng et al., 2018; Yu et al., 2019; Lee et al., 2019; Akkaya et al.,
2019). The key challenge in the design of such controllers is formulating an efficient numerical
model that captures the governing physical phenomena. As a consequence, it is most commonly
applied to rigid body dynamics and rigid robots where such models are readily available (Todorov
et al., 2012; Bender et al., 2014; Coumans & Bai, 2016; Lee et al., 2018). In contrast, learning with
non-rigid objects is significantly more challenging, as the computation time for deformable materials
is higher and learning relies on some prior knowledge of the task (Clegg et al., 2018; Elliott & Cakmak,
2018; Ma et al., 2018; Wu et al., 2019). Recently, Zhang et al. (2020) proposed a numerical model
for training control policies where a rigid object interacts with amorphous materials. Similarly, in
our work a rigid printing nozzle interacts with the fluid-like printing material. However, our model
is specialized for the printing hardware and models not only the deposition but also its variance.
We demonstrate that this is an important component in minimizing the sim-to-real gap and designing
control policies that are readily applicable to the physical hardware.
3 HARDWARE PRELIMINARIES
The choice of additive manufacturing technology constrains the subsequent numerical modeling.
To keep the applicability of our developed system as wide as possible, we opted for a direct write
needle deposition system mounted on a 3-axis Cartesian robot
(inset). The robot allows us to freely control the acceleration and
position of the dispenser. The dispenser can process a wide range
of viscous materials, and the deposition is very similar to fused
deposition modeling. We further enhance the apparatus with two
camera modules. The cameras lie on the opposite sides of the
nozzle to allow our apparatus to perceive the location around
the deposition. It is this locality of the in-situ view that we will
leverage to formulate our numerical model.
3.1 BASELINE CONTROLLER
To control the printing apparatus, we employ a baseline slicer. The
input to the slicer is a three-dimensional object. The output is a series of locations the printing head visits to reproduce the model as
closely as possible. To generate a single slice of the object, we start
by intersecting the 3D model with a Z-axis aligned plane (please
note that this does not affect the generalizability since the input can
be arbitrarily rotated). The slice is represented by a polygon that
marks the outline of the printout (Figure 1 gray). To generate the
printing path, we assume a constant width of deposition (Figure 1
red) that acts as a convolution on the printing path. The printing
path (Figure 1 blue) is created by offsetting the print boundary by
half the width of the material using the Clipper algorithm (Johnson, 2015). The infill pattern is generated by tracing a zig-zag line
through the area of the print (Figure 1 green).
Figure 1: Baseline slicer (legend: material width, outline path, infill path, target).
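As a rough illustration of the offsetting step described above, the sketch below shrinks a slice polygon by half of an assumed material width. It uses shapely's buffer as a stand-in for the Clipper library named in the text, and all geometry and values are illustrative.

```python
# Sketch of outline-path generation by polygon offsetting. The paper uses the
# Clipper library (Johnson, 2015); shapely's buffer() is used here only as a
# stand-in to illustrate the same idea. Geometry and width are illustrative.
from shapely.geometry import Polygon

slice_outline = Polygon([(0, 0), (20, 0), (20, 10), (0, 10)])  # slice boundary (SU)
material_width = 1.0                                            # assumed deposition width (SU)

# Offset the boundary inward by half the material width so that material
# deposited along the path just reaches the target outline.
printing_path = slice_outline.buffer(-material_width / 2.0)
print(list(printing_path.exterior.coords))
```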
4 REINFORCEMENT LEARNING FOR ADDITIVE MANUFACTURING
The baseline control strictly relies on a constant width of the material. To discover policies that
can adapt to the in-situ observations, we formulate the search in a reinforcement learning framework. The control problem is described by a Markov decision process (S, A, P, R), where S is the
observation space, A is a continuous action space, P = P(s′|s, a) is the transition function, and
R(s, a) → ℝ is the reward function.
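The sketch below shows how these four components could be wrapped in a gym-style environment. The simulator interface and all names are hypothetical and only mirror the (S, A, P, R) structure described above; this is not the authors' implementation.

```python
# Minimal sketch of the deposition MDP as a gym-style environment. The
# simulator object and its methods are assumed interfaces, used only to show
# how observation space, action space, transition, and reward map onto code.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class DepositionEnv(gym.Env):
    def __init__(self, sim):
        super().__init__()
        self.sim = sim  # qualitative deposition simulator (assumed interface)
        # S: 3-channel 84x84 local view (printing bed, target, future path).
        self.observation_space = spaces.Box(0.0, 1.0, shape=(3, 84, 84), dtype=np.float32)
        # A: continuous velocity and path offset, normalized to [-1, 1].
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.sim.reset()
        return self.sim.local_view(), {}

    def step(self, action):
        velocity, offset = action
        self.sim.advance(velocity, offset)      # transition function P(s'|s, a)
        obs = self.sim.local_view()             # next observation s'
        reward = self.sim.quality_delta()       # dense reward R(s, a)
        terminated = self.sim.path_finished()
        return obs, reward, terminated, False, {}
```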
To learn a control policy we take a model-free approach by learning directly from printing. Unfortunately, learning on a physical device is challenging. The interaction between various process
parameters can lead to deposition errors that require manual attention. As such, discovering control
policies directly on the hardware has too steep a sample complexity to be practical. A potential solution is to learn the control behavior in simulation and transfer it to the physical device. However,
transfer from simulation to the real world is a notoriously hard problem that hinges on the applicability of
the learned knowledge. In this work, we propose a framework for preparing numerical models for
additive manufacturing that facilitate the sim-to-real transfer. Our model has three key components
that facilitate the generalization of the learned control policies.
The first component is the design of the observation space. To facilitate the transfer of learning
between simulation and a physical device, we rely on an abstraction of the observation space (Kaufmann et al., 2020). Rather than using the direct appearance feed from our camera module we process
the signal into a heightmap. A heightmap is a 2D image where each pixel stores the height of the
deposited material. For each height map location, the height is measured as a distance from the
building plate to the deposited material. This allows our system to generalize to many different
sensors such as cameras, depth sensors, or laser profilometers. However, unlike Kaufmann et al.
(2020), we do not extract the feature vectors manually. Instead, similarly to OpenAI et al. (2018),
we learn the features directly from the heightmap. In contrast to OpenAI et al. (2018), we do not
randomize the observation domain. Additional randomization is not necessary in our case thanks to
the controlled observation conditions of the physical apparatus.
A key insight of our approach is that the engineered observation space coupled with learned features
can significantly help with policy learning. A careful design of the observation space can facilitate
the sim-to-real transfer, make the hardware design more flexible by enabling the use of a range of
sensors that compute similar observations, and remove the need to hand-craft the features. It is
therefore worthwhile to invest in the design of observation spaces.
The second component of our system is the design of the action space. Instead of directly controlling
the motors of the printer we rely on a high-level control scheme and tune coupled parameters such
as velocity or offset from the printing path. This idea is similar in spirit to OpenAI et al. (2018).
OpenAI et al. (2018) suggest not using direct sensory inputs from the mechanical hand as observations due to their noisiness and lack of generalization across environments. Instead, they use image
data to track the robotic hand. Similarly, but instead in action space, we do not control the printer
by directly inputting the typically noisy and hardware-specific voltages that actuate the motors of
the apparatus. Instead, we control the printer by setting the desired velocity and offset and letting
the apparatus match them to the best of its capabilities. This translation layer allows us to utilize the
controller on a broader range of devices without per-device training.
This idea could also be generalized to other robotic tasks, for example, by applying a hierarchical
divide and conquer approach to the action space. The control policies could output only high-level
actions such as desired locations for robot actuators or deviations from a baseline behavior. Low-level controllers could then execute these higher-level actions. Such a control hierarchy can facilitate
training by decoupling the higher-level goals from low-level inputs and transferring existing control
policies to new devices through specialized low-level controllers.
The third and last component of our system is an approximative transition function. Rather than
modelling the deposition process exactly we propose to approximate it qualitatively. A qualitative
approximation allows us to design an efficient computational model. To facilitate the transfer of
the simulated model to the physical device we reintroduce the device uncertainty in a data-driven
fashion. This is similar to OpenAI et al. (2018), but instead of covering a large array of options, we
specialize the randomization. Inspired by Chebotar et al. (2019), we designed a data-driven LPC
filter that matches the statistical distribution of variations observed during a typical printing process.
This noise enables our control policies to adapt to changing environments and, to some extent, to
changes in material properties such as viscosity.
Our approximative transition function shows that it is not necessary to reproduce the physical world
in simulation perfectly. A qualitative approximation is sufficient as long as we learn behavior patterns that translate to real-world experiences. This is an important observation for any task where we
manipulate objects and elastic or frictional forces dominate the behavior. Relying on computationally more affordable simulations allows for applying existing learning algorithms to a broader range
of problems where precise numerical modeling has prohibitive computational complexity. Moreover, by leveraging a numerical model it is possible to utilize privileged information that would be
challenging if not impossible to collect in the real world. For a full description of our methods, please
see Appendix A.
5 RESULTS
In this section, we provide results obtained in both virtual and physical environments. We first
show that an adaptive policy can outperform baseline approaches in environments with constant
deposition. Next, we showcase the in-process monitoring and the ability of our policy to adapt to
dynamic environments. Finally, we demonstrate our learned controllers transferring to the physical
world with a minimal sim-to-real gap.
5.1 COMPARISON WITH BASELINE CONTROLLER
We evaluate the optimized control scheme on a selection of freeform and CAD models sampled
from Thingi10k (Zhou & Jacobson, 2016) and ABC (Koch et al., 2019) datasets (Appendix A.6).
In total, we have 113 unseen slices corresponding to 96 unseen geometries. We report our findings in
Figure 2. For each input slice, we report improvement on the printed boundary as the average offset.
The average offset is defined as a sum of areas of under and over deposited material normalized by
the outline length. More specifically, given an image of the target slice T, printed canvas C, a weight
mask W, and the length of the outline l, the average offset O is computed as:

O = [ Σ (1 − C) T W + Σ C (1 − T) ] / l   (1)
The improvement is calculated as a difference between the baseline and our policy. Therefore, a
value higher than zero indicates that our control policy outperformed the baseline. As we can see,
our policy achieved better performance in all considered models.
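A minimal sketch of Equation 1, assuming T, C, and W are arrays of equal size and l is the outline length; the inputs below are placeholders.

```python
# Sketch of the average-offset metric of Eq. 1. T is the target slice, C the
# printed canvas, W the weight mask (all same-sized arrays), and l the outline
# length used for normalization.
import numpy as np

def average_offset(T, C, W, l):
    under = np.sum((1.0 - C) * T * W)   # material missing inside the target
    over = np.sum(C * (1.0 - T))        # material deposited outside the target
    return (under + over) / l
```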
Figure 2: The relative improvement of our policy over the baseline (average offset improvement in microns across validation slices).
Next, we investigate the shapes where our control policy achieves the highest and the lowest gain,
respectively (Figure 3). Best performance is achieved in smooth regions. The reason is that our
policy is capable of adjusting the printing parameters based on the curvature while the baseline’s
constant speed is more suitable for a limited range of curvatures. Conversely, our policy achieves
the weakest performance on objects with sharp features. This is natural as the width of the deposited
material in sharp regions is too large for the desired feature scale, leading to over-deposition. If such
thin features need to be printed regularly, a thinner nozzle can alleviate this issue.
Highest Gain Lowest Gain
Figure 3: Representative deposited patterns from the evaluation dataset.
Finally, we compare our control policy with a fine-tuned baseline. The baseline controller uses the
same parameters for each slice. Different process parameters may be optimal for different slices. To
this end, we choose two slices, a freeform slice of a bird and a CAD slice of a bolt, and optimize their
process parameters using Bayesian optimization, Figure 26 (for numerical details see Appendix B).
We can observe that the two control schemes require drastically different velocities (1.46 SU/s vs.
0.89 SU/s) to maximize performance. Moreover, we can see that the control parameters are not
interchangeable, Appendix B. When switching the control parameters, we can observe a loss in
performance. This loss is caused by each set of control parameters exploiting the local properties of
the slice used for training. Lastly, we compare the individually optimized control parameters with
our policy. Our policy improves upon both baseline solutions while maintaining generalizability.
This is possible because our control policy relies on live feedback to adjust the printing parameters
on-the-fly.
5.1.1 ABLATION STUDY ON OBSERVATION SPACE
Our control policy relies on a live view of the deposition system to select the control parameters.
However, the in-situ view is a technologically challenging addition to the printer hardware that
requires a carefully calibrated imaging setup. With this ablation study, we verify how important
the individual observations are to the final print quality. We consider three cases: (1) no printing
bed view, (2) no target view, and (3) no future path view. We analyzed the results from the pre-test
(full observation space µ = 9.7, σ = 4.9) and the post-tests (no canvas µ = 8.8, σ = 5.7, no target
_µ = 7.2, σ = 5.5, no path µ = 8.4, σ = 4.8) printing task using paired t-tests with Holm-Bonferroni_
correction. The analysis indicates that the availability of all three inputs: the printing bed, the target,
and the path improved final printouts (P < 0.01 for all three cases).
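The statistical procedure can be sketched as below, assuming per-slice scores for the full observation space and each ablated condition; the arrays are placeholders, not the paper's data.

```python
# Sketch of the ablation analysis: paired t-tests between the full-observation
# scores and each ablated condition, with Holm-Bonferroni correction.
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

full = np.random.rand(20)                       # per-slice scores, full observation (placeholder)
ablations = {"no_canvas": np.random.rand(20),   # placeholder ablation scores
             "no_target": np.random.rand(20),
             "no_path": np.random.rand(20)}

pvals = [ttest_rel(full, scores).pvalue for scores in ablations.values()]
reject, p_corrected, _, _ = multipletests(pvals, alpha=0.01, method="holm")
print(dict(zip(ablations, zip(reject, p_corrected))))
```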
5.2 PERFORMANCE IN DYNAMIC ENVIRONMENTS
We use an identical random pressure variation profile to perform a quantitative evaluation in environments with varying pressure. We use the same evaluation dataset as for constant-pressure policies
and report the overall improvement over the baseline controller, (Figure 4). We can observe that in
each of the considered slices, our closed-loop controller outperformed the baseline.
Figure 4: The relative improvement of our policy over the baseline (across validation slices).
We have also evaluated the infill policy in a noisy environment, (Figure 5). We can observe that the deposition noise
leads to an accumulation of material. The accumulation eventually results in a bulge of material in the center of the print,
complicating the deposition of subsequent layers as the material would tend to slide off. In contrast, our policy dynamically
adjusts the printing path to generate a print with significantly
better height uniformity. As we can observe, the surface generated by our policy is almost flat and ready for deposition of
potentially more layers.
Figure 5: Infill comparison of the baseline infill and our control policy; color encodes the deposited height from minimum to maximum, with over-deposition marked.

5.3 ABLATION STUDY ON VISCOSITY
To verify that our policy can adapt to various materials, we trained three models of varying viscosity,
(Figure 6). We can observe that, without an adaptive control scheme, the pressure changes are
sufficiently strong to cause local over- or under-deposition. Our trained policy dynamically adjusts
the offset and velocity to counterbalance the changes in the deposition. We can see that our policy
is particularly good at handling smooth width changes and quickly recovers from a spike in printing
width.
Figure 6: Performance of our policy and the baseline with varying viscosity (baseline vs. ours for three viscosity settings).
5.3.1 ABLATION STUDY ON ACTION SPACE
To evaluate the need to tweak both the printing velocity and the printing path, we trained two control policies with a limited action set that can either alter the velocity or the path offset. We analyzed the results from the pre-test (full action space µ = 12.7, σ = 5.7) and the post-tests (velocity µ = 7.5, σ = 2.5, displacement µ = 5.6, σ = 8.3) printing task using paired t-tests with Holm-Bonferroni correction. The analysis indicates that the availability of the full action space resulted in an improvement in final printouts (P < 0.001 for both cases). The difference in performance depends on the inherent limitations of the individual actions. On the one hand, adjusting velocity is fast (under 6.6 milliseconds) but can cope only with moderate changes in material width. This can be observed as the larger bulges of over-deposited material in Figure 7 middle. On the other hand, offset can cope with larger material differences, but it needs between 0.13 and 1.3 seconds to adjust. As a result, offset adjustment cannot cope with sudden material changes (Figure 7 right). In contrast, by utilizing the full action space our policy can combine the advantages of the individual actions and minimize over-deposition (Figure 7 left).

Figure 7: Action space ablation study (panels: full action space, velocity only, displacement only).
5.3.2 ABLATION STUDY ON REWARD FUNCTION
Our reward function uses privileged information from the numerical simulation to evaluate how material settles over time. However, such information is not readily available on physical hardware. One either evaluates the reward once at the end of each episode to include material flow, or at each timestep by disregarding long-term material motion. We evaluated how such changes to the reward function would affect our control policies. We analyzed the results from the pre-test (privileged reward µ = 12.7, σ = 5.7) and the post-tests (delayed reward µ = −22.3, σ = 8.6, immediate reward µ = 9.2, σ = 8.0) printing task using paired t-tests with Holm-Bonferroni correction. The analysis indicates that the availability of the privileged information resulted in an improvement in final patterns (P < 0.001 for both cases). The learning process for a delayed reward is significantly slower, and it is unclear if performance similar to our policy can be achieved (Appendix A.6). On the other hand, the immediate reward policy learns faster but cannot handle material changes over longer time horizons (Figure 8).

Figure 8: Reward function ablation study (panels: privileged reward, delayed reward, immediate reward).
5.4 PERFORMANCE ON PHYSICAL HARDWARE
Finally, we evaluate our control policies on physical hardware. The policies were trained entirely
in simulation without any additional fine-tuning on the printing device. To conduct the evaluation,
we equipped our printer with a pressure controller. The pressure control was set to a sinusoidal
oscillatory signal to provide a controllable dynamic change in material properties. We used two
materials, with high and low viscosity, and used two separate policies pretrained in simulation using
those materials. We printed 22 slices, of which 11 corresponded to the simulation training set and
11 to the evaluation set. We monitor the printing process and use the captured images to run our evaluation and capture quantitative results. We observe that our controllers outperform the baseline print in every scenario (Figure 9).

Figure 9: The relative improvement of our policy over the baseline on physical hardware (average offset improvement in microns; low and high viscosity materials, training and evaluation slices).
A sample of the fabricated slices can be seen in Figure 10. The print target (white) is overlaid with
a map of under-deposited (blue) and over-deposited (red) material. We further plot a histogram of
under- and over-deposition.
We can see that our control policy transferred excellently to the physical hardware without any additional training. Our policy consistently achieves smaller over-deposition while not suffering from
significant under-deposition. Moreover, in many cases our policy achieves histograms with smaller
width suggesting we achieved a tighter control over the material deposition than the baseline. This
demonstrates that our numerical model enables learning control policies for additive manufacturing
in simulation.
6 CONCLUSION
We present the first closed-loop control policy for additive manufacturing recovered via reinforcement learning. To learn an effective control policy, we propose a custom numerical model of the
deposition process. During the design of our model, we tackle several challenges. To obtain an
efficient approximation of the deposition process, we leverage the limited perception of a printing
apparatus and model the deposition only qualitatively. To include non-linear coupling between process parameters and printed materials, we utilize a data-driven predictive model for the deposition
imperfections. Finally, to enable long horizon learning with viscous materials, we use the privileged
information generated by our numerical model for reward computation. In several ablation studies,
we show that these components are required to achieve high-quality printing, effectively react to
instantaneous and long horizon material changes, handle materials with varying viscosity, and adapt
the deposition parameters to achieve printouts with minimal over-deposition and smooth top layers.
We demonstrate that our model can be used to train control policies that outperform baseline controllers, and transfer to physical apparatus with a minimal sim-to-real gap. We showcase this by
applying control policies trained exclusively in simulation on a physical printing apparatus. We
use our policies to fabricate several prototypes using low and high viscosity materials. The quantitative and qualitative analysis clearly shows the improvement of our controllers over baseline printing.
This indicates that our numerical model can guide the future development of closed-loop policies for
additive manufacturing. Thanks to its minimal sim-to-real gap, the model democratizes research on
learning for additive manufacturing by limiting the need to invest in specialized hardware. Furthermore, by expanding the simulator with other physical phenomena, e.g., abrasion, melting, or heat
transfer, our numerical model can serve as a blueprint for learning closed-loop control policies of
other manufacturing methods such as milling, direct energy deposition, or selective laser sintering.
Figure 10: Deposition quality estimation of physical results manufactured with the baseline and our learned policy (under- and over-deposition maps and histograms for low and high viscosity materials).
REFERENCES
Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron,
Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, et al. Solving rubik’s cube with a
robot hand. arXiv preprint arXiv:1910.07113, 2019.
Ivanna Baturynska, Oleksandr Semeniuta, and Kristian Martinsen. Optimization of process parameters for powder bed fusion additive manufacturing by combination of machine learning and finite
element method: A conceptual framework. Procedia Cirp, 67:227–232, 2018.
Jan Bender, Matthias Müller, Miguel A Otaduy, Matthias Teschner, and Miles Macklin. A survey on position-based simulation methods in computer graphics. In Computer graphics forum,
volume 33, pp. 228–251. Wiley Online Library, 2014.
John Parker Burg. Maximum Entropy Spectral Analysis. Stanford Exploration Project. Stanford
University, 1975.
Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan Ratliff,
and Dieter Fox. Closing the sim-to-real loop: Adapting simulation randomization with real world
experience. In 2019 International Conference on Robotics and Automation (ICRA), pp. 8973–
8979. IEEE, 2019.
Alexander Clegg, Wenhao Yu, Jie Tan, C Karen Liu, and Greg Turk. Learning to dress: Synthesizing
human dressing motion via deep reinforcement learning. ACM Transactions on Graphics (TOG),
37(6):1–10, 2018.
Erwin Coumans and Yunfei Bai. Pybullet, a python module for physics simulation for games,
robotics and machine learning. 2016.
Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J Zico Kolter. Endto-end differentiable physics for learning and control. Advances in neural information processing
_systems, 31:7178–7189, 2018._
Sarah Elliott and Maya Cakmak. Robotic cleaning through dirt rearrangement planning with learned
transition models. In 2018 IEEE International Conference on Robotics and Automation (ICRA),
pp. 1623–1630. IEEE, 2018.
Timothy Erps, Michael Foshey, Mina Konaković Luković, Wan Shou, Hanns Hagen Goetzke, Herve
Dietsch, Klaus Stoll, Bernhard von Vacano, and Wojciech Matusik. Accelerated discovery of 3d
printing materials using data-driven multi-objective optimization, 2021.
Wei Gao, Yunbo Zhang, Devarajan Ramanujan, Karthik Ramani, Yong Chen, Christopher B
Williams, Charlie CL Wang, Yung C Shin, Song Zhang, and Pablo D Zavattieri. The status,
challenges, and future of additive manufacturing in engineering. Computer-Aided Design, 69:
65–89, 2015.
Carlos E Garcia, David M Prett, and Manfred Morari. Model predictive control: Theory and practice—a survey. Automatica, 25(3):335–348, 1989.
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, and Sergey Levine. Continuous deep q-learning
with model-based acceleration. In International conference on machine learning, pp. 2829–2838.
PMLR, 2016.
Angus Johnson. Clipper - an open source freeware library for clipping and offsetting lines and
[polygons. http://www.angusj.com/delphi/clipper.php, 2015.](http://www.angusj.com/delphi/clipper.php)
Branden Kappes, Senthamilaruvi Moorthy, Dana Drake, Henry Geerlings, and Aaron Stebner. Machine learning to optimize additive manufacturing parameters for laser powder bed fusion of inconel 718. In Proceedings of the 9th International Symposium on Superalloy 718 & Derivatives:
_Energy, Aerospace, and Industrial Applications, pp. 595–610. Springer, 2018._
Elia Kaufmann, Antonio Loquercio, René Ranftl, Matthias Müller, Vladlen Koltun, and Davide
Scaramuzza. Deep drone acrobatics. arXiv preprint arXiv:2006.05768, 2020.
Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. Abc: A big cad model dataset for geometric
deep learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
June 2019.
Jeongseok Lee, Michael X Grey, Sehoon Ha, Tobias Kunz, Sumit Jain, Yuting Ye, Siddhartha S
Srinivasa, Mike Stilman, and C Karen Liu. Dart: Dynamic animation and robotics toolkit. Journal
_of Open Source Software, 3(22):500, 2018._
Seunghwan Lee, Moonseok Park, Kyoungmin Lee, and Jehee Lee. Scalable muscle-actuated human
simulation and control. ACM Transactions On Graphics (TOG), 38(4):1–13, 2019.
Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, and Antonio Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. In International
_Conference on Learning Representations, 2019a._
Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B Tenenbaum, Antonio Torralba, and Russ Tedrake.
Propagation networks for model-based control under partial observation. In 2019 International
_Conference on Robotics and Automation (ICRA), pp. 1205–1211. IEEE, 2019b._
Chenang Liu, David Roberson, and Zhenyu Kong. Textural analysis-based online closed-loop quality control for additive manufacturing processes. In IIE Annual Conference. Proceedings, pp.
1127–1132. Institute of Industrial and Systems Engineers (IISE), 2017.
Libin Liu and Jessica Hodgins. Learning basketball dribbling skills using trajectory optimization
and deep reinforcement learning. ACM Transactions on Graphics (TOG), 37(4):1–14, 2018.
Pingchuan Ma, Yunsheng Tian, Zherong Pan, Bo Ren, and Dinesh Manocha. Fluid directed rigid
body control using deep reinforcement learning. ACM Transactions on Graphics (TOG), 37(4):
1–11, 2018.
Miles Macklin and Matthias Müller. Position based fluids. ACM Transactions on Graphics (TOG),
32(4):1–12, 2013.
Larry Marple. A new autoregressive spectrum analysis algorithm. IEEE Transactions on Acoustics,
_Speech, and Signal Processing, 28(4):441–454, 1980. doi: 10.1109/TASSP.1980.1163429._
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level
control through deep reinforcement learning. nature, 518(7540):529–533, 2015.
Mojtaba Mozaffar, Arindam Paul, Reda Al-Bahrani, Sarah Wolff, Alok Choudhary, Ankit Agrawal,
Kornel Ehmann, and Jian Cao. Data-driven prediction of the high-dimensional thermal history
in directed energy deposition processes via recurrent neural networks. Manufacturing letters, 18:
35–39, 2018.
Matthias Müller, David Charypar, and Markus H Gross. Particle-based fluid simulation for interactive applications. In Symposium on Computer animation, pp. 154–159, 2003.
Anusha Nagabandi, Gregory Kahn, Ronald S Fearing, and Sergey Levine. Neural network dynamics
for model-based deep reinforcement learning with model-free fine-tuning. In 2018 IEEE Interna_tional Conference on Robotics and Automation (ICRA), pp. 7559–7566. IEEE, 2018._
Junhyuk Oh, Satinder Singh, and Honglak Lee. Value prediction network. In NIPS, 2017.
OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub
Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous
in-hand manipulation. arXiv preprint arXiv:1808.00177, 2018.
Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. Deepmimic: Exampleguided deep reinforcement learning of physics-based character skills. _ACM Transactions on_
_Graphics (TOG), 37(4):1–14, 2018._
Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel
Todorov, and Sergey Levine. Learning complex dexterous manipulation with deep reinforcement
learning and demonstrations. arXiv preprint arXiv:1709.10087, 2017.
Connor Schenck and Dieter Fox. Spnets: Differentiable fluid dynamics for deep neural networks.
In Conference on Robot Learning, pp. 317–335. PMLR, 2018.
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy
optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
David Silver, Hado Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel DulacArnold, David Reichert, Neil Rabinowitz, Andre Barreto, et al. The predictron: End-to-end
learning and planning. In International Conference on Machine Learning, pp. 3191–3199. PMLR,
2017.
Pitchaya Sitthi-Amorn, Javier E Ramos, Yuwang Wangy, Joyce Kwan, Justin Lan, Wenshou Wang,
and Wojciech Matusik. Multifab: a machine vision assisted platform for multi-material 3d printing. Acm Transactions on Graphics (Tog), 34(4):1–11, 2015.
Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, and Chelsea Finn. Universal planning networks: Learning generalizable representations for visuomotor control. In International
_Conference on Machine Learning, pp. 4732–4741. PMLR, 2018._
Chao Tang, Jie Lun Tan, and Chee How Wong. A numerical investigation on the physical mechanisms of single track defects in selective laser melting. International Journal of Heat and Mass
_Transfer, 126:957–968, 2018._
Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control.
In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033.
IEEE, 2012.
Marc A Toussaint, Kelsey Rebecca Allen, Kevin A Smith, and Joshua B Tenenbaum. Differentiable
physics and stable modes for tool-use and manipulation planning. 2018.
Chengcheng Wang, Xipeng Tan, Erjia Liu, and Shu Beng Tor. Process parameter optimization and
mechanical properties for additively manufactured stainless steel 316l parts by selective electron
beam melting. Materials & Design, 147:157–166, 2018.
Chengcheng Wang, XP Tan, SB Tor, and CS Lim. Machine learning in additive manufacturing:
State-of-the-art and perspectives. Additive Manufacturing, pp. 101538, 2020.
Yilin Wu, Wilson Yan, Thanard Kurutach, Lerrel Pinto, and Pieter Abbeel. Learning to manipulate
deformable objects without demonstrations. arXiv preprint arXiv:1910.13439, 2019.
Wentao Yan, Ya Qian, Wenjun Ge, Stephen Lin, Wing Kam Liu, Feng Lin, and Gregory J Wagner.
Meso-scale modeling of multiple-layer fabrication process in selective electron beam melting:
inter-layer/track voids formation. Materials & Design, 141:210–219, 2018.
Bing Yao, Farhad Imani, and Hui Yang. Markov decision process for image-guided additive manufacturing. IEEE Robotics and Automation Letters, 3(4):2792–2798, 2018.
Ri Yu, Hwangpil Park, and Jehee Lee. Figure skating simulation from video. In Computer graphics
_forum, volume 38, pp. 225–234. Wiley Online Library, 2019._
Yunbo Zhang, Wenhao Yu, C Karen Liu, Charlie Kemp, and Greg Turk. Learning to manipulate
amorphous materials. ACM Transactions on Graphics (TOG), 39(6):1–11, 2020.
Qingnan Zhou and Alec Jacobson. Thingi10k: A dataset of 10,000 3d-printing models. arXiv
_preprint arXiv:1605.04797, 2016._
A METHODS
A.1 HARDWARE SETUP
In this work, we developed a direct write 3D printing platform with an optical feedback system
that can measure the dispensed material in real time, in situ. The 3D printer comprises a pressure-driven syringe pump and pressure controller, a 3-axis Cartesian robot, an optical imaging system, a back-lit build platform, a 3D-printer controller, and a CPU (Figure 11). The 3-axis Cartesian robot is used to
locate the build platform in the x- and y-directions and the print carriage in the z-direction. The pressure-driven syringe pump and pressure controller are used to dispense an optically opaque material onto
the back-lit build platform. The back-lit platform is used to illuminate the dispensed material. The
movement of the robot, actuation of the syringe pump and timing of the cameras are controlled via
the controller. The CPU is used to process the images after they are acquired and compute updated
commands to send to the controller.
Figure 11: The printing apparatus consisting of a 3-axis Cartesian robot, a direct write printing head,
and a camera setup.
A.1.1 CALIBRATION
To enable real-time control of the printing process, we implemented an in-situ view of the material
deposition. Ideally we would capture a top-down view of the deposited material. Unfortunately, this
is not possible since the material is obstructed by the dispensing nozzle. As a result the camera has
to observe the printing bed from an angle. Since the nozzle would obstruct the view of any single
camera we opted to use two cameras. More specifically, we place two CMOS cameras (Basler AG,
Ahrensburg, Germany) at 45 degrees on each side of the dispensing nozzle, Figure 11. We calibrate
the camera by collecting a set of images and estimating its intrinsic parameters, Figure 12 calibration.
To obtain a single top-down view we capture a calibration target aligned with the image frames of
both cameras, Figure 12 homography. By calculating the homography between the captured targets
and an ideal top-down view we can stitch the images into a single view from a virtual over-the-top
camera. Finally, we mask the location of each nozzle in the image (Figure 12 nozzle masks) and
obtain the final in-situ view, Figure 12 stitched image.
The recovered in-situ view is scaled to attain the same universal scene unit size as our control policies
are trained in. Since we seek to model the deposition only qualitatively it is sufficient to rescale the
in-situ view to match the scale of the virtual environments. We identify this scaling factor separately
for each material. To calibrate a single material we start by depositing a straight line at maximum
velocity. The scaling factor is then the ratio required to match the observed thickness of the line with
simulation. To extract the thickness of the deposited material we rely on its translucency properties.
More precisely, we correlate material thickness with optical intensity. We do this by depositing the material at various thicknesses and taking a picture with our camera setup. The optical intensity then decays exponentially with increased thickness, which is captured by a power-law fit.

Figure 12: The calibration of the imaging setup. First, intrinsic parameters are estimated from calibration patterns. Next, we compute the extrinsic calibration by calculating homographies between the cameras and an overhead view. We extract the masks by thresholding a photo of the nozzle. The final stitched image consists of four regions: (1) view only in the left camera, (2) view only in the right camera, (3) view in both cameras, (4) view in no camera. The final stitched image is shown on the right.
Figure 13: Calibration images for correlating deposited material thickness with optical intensity and the corresponding fit (sample thicknesses range from 0.05 mm to 1 mm).
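A sketch of such a calibration fit is given below. The thickness samples loosely follow the values listed in Figure 13, the intensity values are placeholders, and the exact functional form used by the authors is not reproduced here.

```python
# Sketch of the thickness-to-intensity calibration: fit a decay curve to
# (thickness, intensity) pairs measured from the calibration printouts and
# invert it to recover thickness from observed intensity. Data are placeholders.
import numpy as np
from scipy.optimize import curve_fit

thickness = np.array([0.05, 0.1, 0.19, 0.4, 0.75, 1.0])    # mm, assumed samples
intensity = np.array([0.85, 0.72, 0.55, 0.33, 0.18, 0.12])  # normalized intensity (placeholder)

def power_law(t, a, b):
    return a * np.power(t, b)

params, _ = curve_fit(power_law, thickness, intensity, p0=(0.1, -0.5))

def thickness_from_intensity(i, a=params[0], b=params[1]):
    # Invert the fitted curve to estimate material thickness from intensity.
    return np.power(i / a, 1.0 / b)
```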
The last assumption of our control policy is that the deposition needle is centered with respect to
the in-situ view. To ensure that this assumption holds with the physical hardware we calibrate the
location of the dispensing needle within the field of view of each camera and with respect to the
build platform. First, a dial indicator is used to measure the height of the nozzle in z and the fine
adjustment stage is adjusted until the nozzle is 254 microns above the print platform. Next, using a
calibration target located on the build platform and the fine adjustment stage, the nozzle is centered
in the field of view of each camera. This calibration procedure is done each time the nozzle is
replaced during the start of each printing session.
A.1.2 BASELINE CONTROLLER
To calibrate the baseline control we follow the same procedure in simulation and physical hardware.
We start by depositing a straight line at a constant velocity (Figure 14). Next, we measure the width
of the deposited line at various locations to estimate the mean width. We use the width to generate
the offset for outline printing and spacing of the infill pattern.
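A sketch of this width estimation on a binarized top-down image of the test line is shown below; the segmentation threshold and the pixel scale (derived from the 84 px ≈ 2.95 SU view in Appendix A.2) are assumptions.

```python
# Sketch of the baseline calibration: estimate the mean deposited-line width
# from a binarized top-down image of a straight test print.
import numpy as np

def mean_line_width(image, threshold=0.5, pixels_per_su=28.5):
    """image: 2D array with the printing direction along axis 1."""
    material = image > threshold          # segment deposited material
    widths = material.sum(axis=0)         # cross-section width in pixels per column
    widths = widths[widths > 0]           # ignore empty columns at the line ends
    return widths.mean() / pixels_per_su  # convert to scene units (assumed scale)
```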
Figure 14: The baseline controller starts by estimating the width w of the deposited material. A control sequence for the nozzle is estimated by offsetting the desired shape by half the material width.
A.2 CONTROL POLICY INPUT STATES
To define the input states, we closely follow the constraints of the physical hardware. We model our observation space as a small in-situ view centered at the printing nozzle location. The view has a size of 84 × 84 pixels, which translates to roughly 2.95 × 2.95 scene units (SU). The view contains either a heightmap (for infill printing) or a material segmentation (for outline printing). Since the location directly under the nozzle is obscured for the physical hardware, we mask a small central position in the view equivalent to 0.42 SU or 1/7th of the in-situ view. Together with the local view, we also provide the printer with a local image of the desired printing target and the path the control policy will take in the environment. To further minimize data redundancy, we rotate the in-situ view such that the printer moves along the positive X-axis in the image. These three inputs are stacked together into a 3-channel image (Figure 15).

Figure 15: Control policy input: in-situ printing bed view (with the region occluded by the nozzle masked), desired target, and nozzle path.
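The assembly of this 3-channel input could look roughly like the sketch below. The crop size, rotation, and nozzle-mask radius follow the numbers above; the specific helper logic is an assumption, not the authors' code.

```python
# Sketch of building the 3-channel policy observation: crop around the nozzle,
# rotate so the motion points along +X, mask the occluded center, and stack.
import numpy as np
from scipy.ndimage import rotate

def crop(img, cx, cy, size=84):
    h = size // 2
    return img[cy - h:cy + h, cx - h:cx + h]

def build_observation(heightmap, target, path_mask, nozzle_xy, heading_deg):
    cx, cy = nozzle_xy
    channels = []
    for img in (heightmap, target, path_mask):
        local = crop(img, cx, cy, size=120)                  # crop with margin for rotation
        local = rotate(local, -heading_deg, reshape=False)   # align motion with +X axis
        channels.append(crop(local, 60, 60, size=84))        # final 84x84 view
    # Mask the central region occluded by the nozzle (~0.42 SU, roughly 12 px).
    yy, xx = np.ogrid[:84, :84]
    occluded = (xx - 42) ** 2 + (yy - 42) ** 2 < 12 ** 2
    channels[0] = np.where(occluded, 0.0, channels[0])
    return np.stack(channels, axis=0).astype(np.float32)     # shape (3, 84, 84)
```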
A.3 ACTION SPACE
The selection of action space plays a critical role in adapting a controller to the actual hardware. One possibility is to control and directly modify the voltage input of individual motors. However, such an approach is not readily transferable between printing devices. The controls are tied too tightly to the hardware selection and would exaggerate the sim-to-real gap. Moreover, directly affecting the motor voltage would mean that the control policy must learn how to trace print inputs. Instead, we propose a strategy that leverages the body of work on designing baseline controllers. Similar to the baseline, our control policy follows a path generated by a slicer. However, we enable dynamic modification of the path. At each state, the printer can modify two actions: (1) the velocity at which the printing head is moving and (2) the displacement of the printing head in a direction perpendicular to the motion (Figure 16). Such a formulation allows us to decouple the hardware parameters from the control scheme and apply the same policy in both simulation and physical hardware by scaling the input units appropriately. In our simulation, we limit the velocity to the range of [0.2, 2] SU/s and the displacement to 0.2666 SU.

Figure 16: The action space: velocity along the path and displacement perpendicular to the motion, shown over the in-situ printing bed view.
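A sketch of mapping a normalized policy action onto these ranges is shown below; whether the displacement bound is applied per step or in total is an assumption here.

```python
# Sketch of scaling a normalized action in [-1, 1] to the physical control
# ranges stated above: velocity in [0.2, 2] SU/s, offset up to 0.2666 SU.
import numpy as np

V_MIN, V_MAX = 0.2, 2.0      # SU/s
OFFSET_MAX = 0.2666          # SU, perpendicular to the motion

def scale_action(action):
    a = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    velocity = V_MIN + 0.5 * (a[0] + 1.0) * (V_MAX - V_MIN)
    offset = a[1] * OFFSET_MAX
    return velocity, offset
```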
A.4 TRANSITION FUNCTION
The transition function takes a state-action pair and outputs a new state of the environment. In our
setting, this means we need to numerically model the fabrication process, which is a notoriously
difficult problem. Here we leverage our assumption that the observation space is so localized that it
can identify the deposited materials only qualitatively. Therefore, we can trade physical realism for
visual fidelity and efficiency. This description fits the Position-Based Dynamics (PBD) (Macklin &
Müller, 2013) framework, which is a geometrical approximation to the equations of motion.
To model the interaction of the deposited material with the printing apparatus we rely on Position-Based Dynamics (PBD). PBD approximates rigid, viscous, and fluid objects as collections of particles. To represent the fluid we assume a set of N particles where each particle is defined by its
position p, velocity v, mass m, and a set of constraints C. In our setting we consider two constraints: (1) collision with the nozzle and (2) incompressibility of the fluid material. We model the
collision with the nozzle as a hard inequality constraint:
C_i(p_i) = (p_i − q_c) · n_c,   (2)
where q_c is the contact point of a particle with the nozzle geometry along the direction of the particle's
motion v and n_c is the normal at the contact location. To ensure that our fluids remain incompressible we follow Macklin & Müller (2013) and formulate a density constraint for each particle:
C_i(p_1, ..., p_n) = ρ_i / ρ_0 − 1,   (3)

ρ_i = Σ_j m_j W(p_i − p_j, h),   (4)
where ρ_0 is the rest density and ρ_i is given by a Smoothed Particle Hydrodynamics estimator (Müller
et al., 2003) in which W is the smoothing kernel defined by the smoothing scale h.
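A sketch of evaluating Equations 3-4 is given below, using the standard poly6 smoothing kernel from the SPH literature; the particle data, smoothing scale h, and rest density ρ_0 are illustrative assumptions.

```python
# Sketch of the PBD density constraint of Eqs. 3-4 with a poly6 kernel
# (Mueller et al., 2003). Particle positions, masses, h, and rho0 are
# placeholders, not values from the paper.
import numpy as np

def poly6(r, h):
    # W(r, h) = 315 / (64 pi h^9) * (h^2 - r^2)^3 for 0 <= r <= h, else 0
    w = np.zeros_like(r)
    inside = r <= h
    w[inside] = 315.0 / (64.0 * np.pi * h ** 9) * (h ** 2 - r[inside] ** 2) ** 3
    return w

def density_constraints(positions, masses, h, rho0):
    # rho_i = sum_j m_j W(|p_i - p_j|, h);  C_i = rho_i / rho0 - 1
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    rho = (masses[None, :] * poly6(dist, h)).sum(axis=1)
    return rho / rho0 - 1.0
```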
We further tune the simulation parameters to achieve a wide range of viscosity properties. More
specifically, we couple the effects of viscosity, adhesion, and energy dissipation into a single setting. By coupling these parameters we obtain materials with optically different viscosity properties.
Moreover, we noticed that the number of solving substeps has a significant effect on viscosity and
surface tension of the simulated fluids. Therefore, we also tweak the number of substeps from 2 for
liquid-like materials to 5 for highly-viscous materials.
We replicate our printing apparatus in the simulation (see the inset). We model the nozzle as a collision object with a hard contact constraint on the fluid particles. Since modeling a pressurized reservoir is computationally costly, as it requires us to have many particles in constant contact, we chose to approximate the deposition process at the tip of the nozzle. More specifically, we model the deposition as a particle emitter. To set the volume and velocity of the particles, we use a flow setting. The higher the flow, the more particles with higher initial velocities are generated. This qualitatively approximates the deposition process with a pressurized reservoir. The particle emitter is placed slightly inside the nozzle to allow for realistic material buildup and a delayed stop, similar to extrusion processes. Finally, we consider the printer to have only a finite acceleration per timestep. To accelerate to the target velocity, we employ a linear acceleration scheme.
Another important choice for the numerical model is the discretization. We have two options: (1) time-based and (2) distance-based. We originally experimented with time-based discretization. However, we found out that time discretization is not suitable for printer modeling. As the velocity in simulation approaches zero, the difference in deposited material becomes progressively smaller until the gradient information completely vanishes (Figure 17 left). Moreover, a time-based discretization allows the policy to affect the number of evaluations of the environment directly. As a result, it can avoid being punished for bad material deposition by quickly rushing the environment to finish. Considering these factors we opted for distance-based discretization (Figure 17 right). The policy specifies the desired velocity at each interaction point, and the environment travels a predefined distance (0.2666 SU) at the desired speed. This helps to regularize the reward function and enables learning of varying control policies.

Figure 17: Discretization. Time-based versus distance-based stepping at minimum and maximum velocity; highlights show the new material deposited between timesteps.
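A sketch of such distance-based stepping is given below; the simulator interface and the substep duration are assumptions used only to illustrate that the travelled distance, not the elapsed time, is fixed per decision.

```python
# Sketch of distance-based discretization: between two policy decisions the
# nozzle always travels a fixed arc length along the sliced path at the
# requested velocity, so slowing down cannot shorten an episode.
STEP_DISTANCE = 0.2666   # SU travelled between policy decisions

def environment_step(sim, velocity, offset):
    remaining = STEP_DISTANCE
    dt = 1.0 / 60.0                            # simulation substep (assumed)
    while remaining > 0.0:
        travel = min(velocity * dt, remaining)
        sim.move_along_path(travel, offset)    # advance nozzle along the offset path
        sim.simulate(dt)                       # deposit and settle material
        remaining -= travel
```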
An interesting design element is the orientation of the control polygons created by the slicer. When the outline is defined as points given counter-clockwise, then due to the applied rotation, each view is split roughly into two half-spaces (Figure 18). The bottom one corresponds to the outside, i.e., generally black, and the upper one corresponds to the inside, i.e., generally white. However, the situation changes when outlining a hole. When printing a hole, the two half-spaces swap location. We can remove this ambiguity by changing the orientation of the polylines defining holes in the model. By orienting them clockwise, we effectively swap the two half-spaces to the same orientation as when printing the outer part. As a result, we achieve a better usage of trajectories and a more robust control scheme that does not need to be separately trained for each print's outer and inner parts.

Figure 18: Orientation of the outline and hole polygons.
To design a realistic virtual printing environment, the model needs to capture the deposition imperfections. The source of these imperfections is the complex non-linear coupling between the dynamic
material properties and the deposition parameters. Analytical modeling of this coupling is challenging as it requires a deep understanding of these interactions. Instead, we adopted a data-driven
model. We observe that the final effect of the deposition error is a varying width of the deposited
material. To recover such a model for our apparatus, we start by printing a reference slice over multiple iterations, (Figure 19 left). At each iteration, we measure the width of the deposited material at
specified cross-sections, (Figure 19 middle). This yields observations of how the material width
evolves in time, (Figure 19 left). To formulate a predictive generative model, we employ a tool from
speech processing called Linear Predictive Coding (LPC) (Marple, 1980). The model assumes that a
signal is generated by a buzz filtered by an auto-correlation filter. We use this assumption to recover
filter coefficients that transform white Gaussian noise into realistic pressure samples, (Figure 19
left).
To formulate a predictive generative model we employ a tool from speech processing called Linear
Predictive Coding (LPC) (Marple, 1980). We can predict the next sample of a signal as a weighted
sum of M past output samples and a noise term:
x_n = − Σ_{m=1}^{M} a_{M,m} x_{n−m} + ϵ_n,   (5)
where x are the signal samples, ϵ is the noise term, and a_{M,m} are the parameters of the M-th order
auto-correlation filter. To find these coefficients, Burg (1975) proposes to minimize the following
energies:
e_M = Σ_{k=1}^{N−m} |f_{M,k}|² + Σ_{k=1}^{N−m} |b_{M,k}|²,   (6)

f_{M,k} = Σ_{i=0}^{M} a_{M,i} x_{k+M−i},   (7)

b_{M,k} = Σ_{i=0}^{M} a*_{M,i} x_{k+i},   (8)
where ∗ denotes the complex conjugate. After finding the filter coefficients with Equation 6 we can
synthesize new width variations with similar frequency composition to the physical hardware by
filtering a buzz modeled as white Gaussian noise. Since we sampled the width variation at discrete
intervals, we further fit a smooth interpolating curve that models the observed pressure
variation. We use the proposed model to drive the flow setting of our simulator. This directly
influences the width of the deposited material similarly to the imperfections in the deposition.
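For concreteness, the following Python sketch shows one way to fit such an all-pole model with Burg's recursion and to synthesize new width-variation samples by filtering white Gaussian noise. The function names, the filter order, and the assumption that the measured widths form a single sequence are ours and purely illustrative; this is not the exact implementation used for the printer.

```python
import numpy as np
from scipy.signal import lfilter

def burg_ar(x, order):
    """Estimate auto-regressive filter coefficients with Burg's method.

    Returns the polynomial [1, a_1, ..., a_M] such that
    x[n] ≈ -(a_1 x[n-1] + ... + a_M x[n-M]) + eps[n].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    f = x.copy()          # forward prediction errors
    b = x.copy()          # backward prediction errors
    a = np.array([1.0])   # prediction-error filter polynomial
    for m in range(order):
        fp = f[m + 1:]    # forward errors used at this order
        bp = b[m:n - 1]   # backward errors, shifted by one sample
        # reflection coefficient minimizing forward + backward error energy
        k = -2.0 * np.dot(fp, bp) / (np.dot(fp, fp) + np.dot(bp, bp))
        # order update of the polynomial (Levinson-style recursion)
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        # time update of the prediction errors
        f_new, b_new = fp + k * bp, bp + k * fp
        f = np.concatenate([f[:m + 1], f_new])
        b = np.concatenate([b[:m + 1], b_new])
    return a

def synthesize_width_noise(measured_widths, order=8, n_samples=2000, seed=0):
    """Fit an LPC model to measured width variations and generate new samples
    by filtering white Gaussian noise with the recovered all-pole filter."""
    widths = np.asarray(measured_widths, dtype=float)
    mean = widths.mean()
    a = burg_ar(widths - mean, order)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples) * np.std(widths - mean)
    return mean + lfilter([1.0], a, noise)
```

Filtering i.i.d. Gaussian noise through the recovered all-pole filter reproduces the frequency content of the measured width variation without copying any particular printout, which is exactly what a generative noise model for the simulator needs.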
A.5 REWARD FUNCTION
Viscous materials take significant time to settle after deposition. Therefore, to assess deposition errors, we need to observe the deposition over long horizons.
Figure 19: Calibration printouts, sample locations, and the measured width variation compared with the LPC model. We performed nine printouts and measured the width variation at specified locations. We fit the measured data with an LPC model. Please note that since our model is generative, we do not aim to exactly match the data; any observed resemblance is a testament to the quality of our predictor.
However, the localized nature of the in-situ view makes such observations impossible on the physical hardware. As a result,
learning long-horizon planning has infeasible sample complexity. To tackle this issue, we leverage
the fact that we utilize a numerical approximation of the deposition process with access to privileged
information. At each simulation step, we model the entire printing bed. This allows us to formulate
the reward function as a global print quality metric. More specifically, our metric is composed of two
terms: (1) a reward term for depositing material inside the desired slice and (2) a punishment term
for depositing material outside of the slice. To keep the values consistent across slices of varying
size, we normalize them by the length of the outline or the infill area, respectively. We provide dense
rewards as the difference between the metrics evaluated at two subsequent timesteps to accelerate
the training further.
We consider two reward functions in our setting: one for outline printing and one for infill printing. Each reward function evaluates the print quality as a whole. To accelerate the learning, we provide the algorithm with dense rewards as the difference between rewards at subsequent steps, $R = R^{n+1} - R^{n}$.
To print the outline we want to follow the boundary as closely as possible without overfilling. To this end we compose our reward function of two terms. Given an image of the current printing bed $C$ and the desired target $T$, we define the reward as $\sum CT$. While such a formulation rewards the control policy for depositing material inside the printing volume, it does not encourage a tight outline fill. Indeed, a potential strategy under such a reward would be to offset the printing nozzle as far inside as possible and then move safely within the object bounds. To address this issue we propose to include a weight map $W$ that is computed as a thresholded distance transform of the target $T$. The final reward function is then $R = \sum CTW$. Using such a formulation we put the highest weight on depositing directly on the outline boundary. The threshold cutoff then helps prevent a strategy of filling up the shape interior. To ensure that the printer deposits material only inside the desired locations, we include an additional punishment term $P = \sum C(1 - T)$. Finally, both the reward and the punishment are normalized by the length of the outline of our target.
For infill printing we compute the reward from the heightfield of the deposited material. We start by estimating how much of the slice was covered. To this end, we use a thresholded version of the canvas and compute the coverage as $R = \sum CT$. Similarly, we estimate the amount of over-deposited material as $P = \sum C(1 - T)$. To keep these values consistent across different slices we normalize them by the total area of the print. Finally, to motivate the deposition of flat surfaces suitable for 3D printing, we add another penalty term equal to the standard deviation of the canvas heightfield.
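The following sketch illustrates both reward terms under the assumption that the canvas $C$, target $T$, and heightfield are rasterized NumPy arrays; the helper names, the distance-transform cutoff, and the outline-length estimate are illustrative choices rather than the exact implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def outline_reward(canvas, target, max_dist=5.0):
    """Outline reward: weight deposition by closeness to the target boundary.

    canvas: binary image of deposited material; target: binary slice mask.
    The weight map is a thresholded distance transform of the target, so
    material placed directly on the boundary scores highest and deep-interior
    filling earns nothing.
    """
    dist = distance_transform_edt(target)                 # distance to outside
    weights = np.clip(max_dist - dist, 0.0, max_dist) / max_dist
    outline_len = max(np.count_nonzero(dist == 1), 1)     # rough outline length (px)
    reward = np.sum(canvas * target * weights) / outline_len
    punish = np.sum(canvas * (1 - target)) / outline_len
    return reward - punish

def infill_reward(height, target, fill_threshold=0.5):
    """Infill reward: coverage inside the slice minus over-deposition outside,
    plus a flatness penalty (here computed inside the slice, one possible choice)."""
    covered = (height > fill_threshold).astype(float)
    area = max(np.count_nonzero(target), 1)
    reward = np.sum(covered * target) / area
    punish = np.sum(covered * (1 - target)) / area
    flatness_penalty = np.std(height[target > 0])
    return reward - punish - flatness_penalty

def dense_reward(metric_now, metric_prev):
    """Dense reward: difference of the global print-quality metric between steps."""
    return metric_now - metric_prev
```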
A.6 TRAINING PROCEDURE
To train our control policy we start with g-code generated by a slicer. As inputs to the slicer we
consider a set of 3D models collected from the Thingy10k dataset. To train a controller the input
models need to be carefully selected. On the one hand, if we pick an object whose features are of too low a frequency relative to the printing nozzle size, then any printing errors due to the control policy will have a negligible influence on the final result. On the other hand, if we pick a model whose features are of too high a frequency relative to the printing nozzle, then the nozzle will be physically unable to reproduce these features. As a result, we opted for a manual selection of 18 models that span a wide variety of features, Figure 21. Each model is scaled to fit into a printing volume of 18 × 18 SU and sliced at random locations.
Figure 20: The reward function. (Panels: Outline and Infill; showing Target, Printout, Rewarded, and Punished regions.)
Figure 21: Models in our curriculum. For a full view of exemplar slices please see the supplementary
material.
Our policy is represented as a CNN modeled after Mnih et al. (2015). The network input is an 84 × 84 × 3 image. The image is passed through three hidden layers. The convolution layers have the respective parameters: (32 filters, filter size 8, stride 4), (64 filters, filter size 4, stride 2), and (64 filters, filter size 3, stride 1). The final convolved image is flattened and passed through a fully-connected layer with 512 neurons that is connected to the output action. Each hidden layer uses a rectifier nonlinearity. We formulate our objective function as:
$$\arg\max_{\theta} \; \mathbb{E}^{C}_{t}\!\left[\frac{\pi_{\theta_t}(a_t \mid s_t)}{\pi_{\theta_{t-1}}(a_t \mid s_t)}\,\hat{A}_t\right], \tag{9}$$
where $t$ is a timestep in the optimization, $\theta$ are the parameters of a neural network encoding our policy $\pi$ that generates an action $a_t$ based on a set of observations $s_t$, $\hat{A}_t$ is the estimator of the advantage function, and the expectation $\mathbb{E}^{C}_{t}$ is an average over a finite batch of samples generated by printing sliced models from our curriculum $C$. To maximize Equation 9 we use the PPO algorithm (Schulman et al., 2017). Each trajectory consists of a randomly selected mesh slice that is fully printed out before proceeding to the next one. One epoch terminates when we collect 10000 observations. We run the algorithm for a total of 4 million observations, but convergence was achieved well before that, Figure 22. For the training parameters, we set the entropy coefficient to 0.01 and anneal it towards 0. Similarly, we anneal the learning rate from 3e-4 towards zero. Lastly, we picked a discount factor of 0.99, which corresponds to one action having a half-life of 70 steps (since $0.99^{70} \approx 0.5$). This is equivalent to roughly 18.6 SU of distance traveled. In our training set this corresponds to 29-80 percent of the total episode length.
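As a reference for the architecture described above, here is a minimal PyTorch sketch of the policy network; the class name, the channels-last input convention, and the single action head are our assumptions, and the surrounding training code (PPO updates, value head, action distribution) is omitted.

```python
import torch.nn as nn

class PolicyCNN(nn.Module):
    """CNN policy following the layer sizes described above (after Mnih et al., 2015)."""

    def __init__(self, n_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # an 84 x 84 input yields a 64 x 7 x 7 feature map after the convolutions
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, obs):
        # obs: batch of 84 x 84 x 3 images, channels-last
        x = obs.permute(0, 3, 1, 2).float()
        return self.head(self.features(x))
```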
Figure 22: Training curves for controllers with constant material flow. (Panels: Full Training, No Printing Bed, No Path, No Target; reward plotted over 0–4e6 iterations.)
We also experimented with training controllers for materials with varying viscosity, Figure 23. In
general we have observed that the change in viscosity did not significantly affect the learning convergence. However, we have observed a drop in performance when training control policies for
deposition of liquid materials. The liquid material requires longer time horizons to stabilize and has
a wider deposition area making precise tracing of fine features challenging.
Figure 23: Training curves for controllers with increasing viscosity in an environment with noisy flow. (Reward plotted over iterations for each viscosity.)
Lastly, we conducted ablation studies on the action space and the reward function in the environment with noisy deposition, Figure 24. We can see that employing the delayed reward had a negative effect on convergence, and it is unclear whether a policy of sufficient quality would be achieved.
Figure 24: Training curves for controllers with variable material flow. (Panels: Velocity Only, Displacement Only, Delayed Reward, Immediate Reward; reward plotted over 0–2e6 iterations.)
For evaluation we constructed a separate dataset consisting of freeform and CAD geometries that were not present in the training set, Figure 25.
Figure 25: Exemplar models from the evaluation dataset.
B BAYESIAN OPTIMIZATION FOR BASELINE CONTROL
While the baseline controller closely follows the printed boundaries, it is possible that there is a more suitable policy to maximize our objective function. To verify this, we use the environment described in Section 4 to search for a velocity and offset that maximize the reward function. More specifically, we optimize a simplified objective of Equation 9 limited to a single shape:

$$\arg\max_{v,d} \; \mathbb{E}\left[\pi_{v,d}(a_t \mid s_t)\right], \tag{10}$$
where v and d are the optimized velocity and displacement of the printing policy πv,d, and E reduces
to the expected cumulative reward of executing our proposed environment with a single slice. Maximizing Equation 10 even for a single shape is a challenging task due to the high cost associated with
evaluating the objective function. Because of this we rely on Bayesian optimization to maximize
the objective. We warm-start the optimization with 20 samples acquired through Latin hypercube sampling of
our 2-dimensional action space. We run the optimization until convergence, which we define as not improving upon the best found value for over 300 iterations. Figure 26 shows the controllers optimized for a free-form bird model and a CAD model of a bolt, compared to our trained policy.
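As an illustration, such a baseline search could be set up with an off-the-shelf Bayesian optimization package such as scikit-optimize; the velocity and displacement bounds, the `rollout_single_slice` helper, and the evaluation budget below are hypothetical, and the 300-iteration no-improvement stopping rule would in practice be implemented with a callback rather than the fixed `n_calls` shown here.

```python
from skopt import gp_minimize
from skopt.space import Real

# search space: nozzle velocity and offset/displacement (bounds are assumptions)
space = [Real(0.1, 2.0, name="velocity"), Real(-1.0, 1.0, name="displacement")]

def rollout_single_slice(velocity, displacement):
    """Hypothetical stand-in: run the simulated environment with a fixed
    (velocity, displacement) policy on one slice and return the cumulative
    reward. Replaced by a dummy surface here so the sketch runs."""
    return -((velocity - 1.0) ** 2 + displacement ** 2)

def negative_return(params):
    """gp_minimize minimizes, so we negate the cumulative reward."""
    velocity, displacement = params
    return -rollout_single_slice(velocity, displacement)

result = gp_minimize(
    negative_return,
    space,
    n_initial_points=20,              # warm start
    initial_point_generator="lhs",    # Latin hypercube samples
    n_calls=400,                      # upper bound on evaluations
    random_state=0,
)
best_velocity, best_displacement = result.x
```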
Figure 26: Printouts realized using control policies recovered with Bayesian optimization (left and middle, blue square marks the optimized slice) compared to our trained policy (right). (Panel labels: Optimization 1, Optimization 2, Bayesian Optimization, Our Policy; annotated rewards: 10, 5.3, 8.5, 8.9, 17, 10.)
C ADAPTATION TO VARYING VISCOSITY
We evaluate how our learned controllers adapt to varying viscosity, (Figure 27). We can observe
that our policy learned on low-viscosity materials consistently under-deposits when used to print at
higher viscosities. Conversely, our control policy learned on high-viscosity material over-deposits
when applied to materials with lower viscosities. From this observation we conclude that our policy
learns the spread of the material post-deposition and uses this information to guide the deposition.
Therefore, small viscosity variations are not likely to pose a significant challenge for our learned policies. However, if the learned material behavior is significantly violated, the in-situ observation space limits the ability of our policy to adapt to a previously unseen material.
Figure 27: We compare the baseline policy and our three learned policies on materials with varying viscosity. (Columns: Baseline and ours at low, medium, and high viscosity.)
D DETAILED PHYSICAL RESULTS
Figure 28: Policy evaluation on physical hardware. (Top: low viscosity material, Baseline vs. Ours; bottom: high viscosity material, Baseline vs. Ours.)