Backseat Typist

Friday, October 23, 2015

TL;DW for A Brief, Opinionated History of the API

Josh Bloch's whirlwind talk A Brief, Opinionated History of the API is worth watching, both for the historical exploration that situates our own work and for the clarity it brings to the lawsuit between Oracle and Google. The following are some notes I took as I watched.

The beginning (of my notes, and of his talk) is more detailed in order to show the development of the idea.

Driving question: "How did APIs develop in the history of programming?"
Who invented the API?

00:59 "Subroutine library" gave idea of re-using code for common operations (von Neumann & Goldstine, design for EDVAC, 1948)
02:26 ACM awarded Turing Award to Maurice Wilkes (1967)

Why not awarded to von Neumann & Goldstine?

03:16 Wilkes got the credit because von Neumann and Goldstine had an idea, but hadn't implemented it.

Theory --> Practice

04:47 Wilkes' machine, EDSAC, came online in 1949 and was immediately useful.
06:40 Wilkes' machine was immediately useful because he prioritized simplicity over performance.

Goal: produce a working machine quickly, rather than to create a more refined machine that would take longer to build.
[Agile process. Win! :)]

08:45 First programs to run were toys, and their architecture was:

first 30 words were "initial orders" (boot loader)

Pressing start button [heh, 46 years before Windows 95] loaded initial orders to memory and began execution.

program was loaded from tape into memory, starting at location 30
programs were written in assembly

09:52 Wilkes' first real program: he realized that a good part of the remainder of his life was going to be spent debugging.

Saw subroutines as the solution, and assigned the problem to his grad student, David Wheeler.

11:13 Within three(!) months, Wheeler architected working subroutines that required no manual intervention.

"coordinating orders" augmented "initial orders"
"pseudo-orders" for relocation of subroutines, parameter assignment, etc.
initial orders ran coordinating orders interpretively

12:25 subroutine linkage technique allowed for

self-modifying code (good idea when you're working with 512 words of memory, 1024 bytes)
subroutines could invoke other subroutines ad infinitum (recursion didn't come for another decade)
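
To make the linkage idea concrete for myself, here is a toy sketch in Python of the self-modifying return trick. It is not EDSAC's real order code: the opcode names (`LOAD_PC`, `PLANT_RETURN`, and so on) and the memory layout are invented for illustration. The caller leaves its own address in the accumulator, and the subroutine uses that to overwrite its own last instruction with a jump back to the caller.

```python
def run(memory):
    """Run a toy machine whose memory holds (opcode, argument) pairs."""
    acc = 0          # accumulator
    pc = 0           # program counter
    output = []
    while True:
        op, arg = memory[pc]
        if op == "LOAD_PC":
            # Caller records where it is before jumping away.
            acc = pc
            pc += 1
        elif op == "JUMP":
            pc = arg
        elif op == "PLANT_RETURN":
            # Self-modifying step: overwrite memory[arg] with a jump back
            # to the instruction just after the caller's JUMP.
            memory[arg] = ("JUMP", acc + 2)
            pc += 1
        elif op == "PRINT":
            output.append(arg)
            pc += 1
        elif op == "HALT":
            return output
        else:
            raise ValueError(f"unknown opcode {op!r}")


PROGRAM = [
    ("LOAD_PC", None),      # 0: first call site
    ("JUMP", 7),            # 1
    ("PRINT", "main-1"),    # 2: subroutine returns here
    ("LOAD_PC", None),      # 3: second call site
    ("JUMP", 7),            # 4
    ("PRINT", "main-2"),    # 5: subroutine returns here
    ("HALT", None),         # 6
    ("PLANT_RETURN", 9),    # 7: subroutine entry
    ("PRINT", "sub"),       # 8: subroutine body
    ("HALT", None),         # 9: placeholder, overwritten on every call
]

trace = run(list(PROGRAM))  # ['sub', 'main-1', 'sub', 'main-2']
```

The same subroutine returns to two different places without any call stack, which is also why a scheme like this cannot support recursion: a nested call to the same subroutine would overwrite the planted return jump.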

13:41 categories of subroutines in the EDSAC subroutine library

floating point arithmetic
arithmetic on complex numbers
checking (dynamic debugging!)
division
exponentials
general routines relating to functions
differential equations
special functions
power series
logarithms
misc.
print & layout
quadrature
read (i.e., input)
nth root
trig functions
counting operations
vectors & matrices

14:37 The Preparation of Programs for an Electronic Digital Computer

world's first computer programming text
definitive until high level languages arose
contained entire API
cited in Wilkes' Turing Award
Tech report published: 1950
Published as a book in 1951

16:08 Wheeler presented key ideas in 1952 paper to ACM; these had all been implemented:

subroutine
subroutine library
generality vs. performance tradeoffs
importance & difficulty of library documentation
information hiding
the interpretive routine (in order to squeeze your program into memory: write your programs in your own optimized mini-language, and then interpret that)
the interpretive debugger
higher-order functions

18:00 Quote from Wheeler's paper: "It should be pointed out that the preparation of a library sub-routine requires a considerable amount of work. This is much greater than the effort merely required to code the sub-routine in its simplest possible form. It will usually be necessary to code it in the library standard form and this may detract from its efficiency in time and space. It may be desirable to code it in such a manner that the operation is generalized to some extent. However, even after it has been coded and tested there still remains the considerable task of writing a description so that people not acquainted with the interior coding can nevertheless use it easily. This last task may be the most difficult."
18:35 Conclusion of Wheeler's paper: "The prime objectives to be borne in mind when constructing sub-routine libraries are simplicity of use, correctness of codes and accuracy of description. All complexities should—if possible—be buried out of sight."
19:41 Wheeler's paper was only two pages long.
20:12 Why didn't Wilkes and Wheeler discuss the API as distinct from the library?

the two were largely isomorphic
one machine [architecture]; no notion of portability
no legacy programs; no notion of backward compatibility

20:58 reimplementation of existing subroutine libraries for new hardware and with better algorithms: APIs became independent from libraries
21:36 Bloch's own research: a 1968 paper was the first to use the term "Application Program Interface"
23:33 Why did the term arise?

goal of allowing implementations to be replaced without harm to clients
needed a name for the concept
libraries in practice give rise to APIs: APIs were discovered, more than invented

What exactly is an API?

25:16 Bloch's proposed simple, working definition: "An application programming interface (API) specifies a component in terms of its operations, their inputs, and outputs. Its main purpose is to define a set of functionalities that are independent of their implementation, allowing the implementation to vary without compromising the users of the component."
25:41 If you can answer yes to these questions, it's an API:

Does it provide a set of operations defined by their inputs and outputs?
Does it admit reimplementation without compromising its users?
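
Here is a small Python sketch of what passing that two-question test might look like. The names (`SquareRoot`, `sqrt_newton`, `hypotenuse`) are mine, not Bloch's: one operation defined only by its input and output, two interchangeable implementations, and a client that cannot tell them apart.

```python
import math
from typing import Callable

# The "API": one operation, specified only by its input and its output.
SquareRoot = Callable[[float], float]


def sqrt_library(x: float) -> float:
    """Implementation 1: delegate to the platform's math library."""
    return math.sqrt(x)


def sqrt_newton(x: float) -> float:
    """Implementation 2: Newton's method, written from scratch."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(60):
        guess = 0.5 * (guess + x / guess)
    return guess


def hypotenuse(sqrt: SquareRoot, a: float, b: float) -> float:
    """Client code: depends on the operation, not on who implements it."""
    return sqrt(a * a + b * b)
```

Swapping `sqrt_library` for `sqrt_newton` changes nothing for `hypotenuse`, which is the "reimplementation without compromising its users" half of the test.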

26:40 FORTRAN II standard library, 1958. The API still works in modern FORTRAN. (!!!!!)

API? YES

… Bloch reviews 12 other instruction sets, standard library documents, etc. to see if they pass his two-question API Test.
31:07 It seems that too many things meet Bloch's definition: Instruction Set Architectures, CLIs, wire-level protocols. Augment definition [the augmentation is the final two sentences]:

"An application programming interface (API) specifies a component in terms of its operations, their inputs, and outputs. Its main purpose is to define a set of functionalities that are independent of their implementation, allowing the implementation to vary without compromising the users of the component. An API augments a programming language (or set of languages with an interoperable calling convention). Alternatively an API may be described in an interface definition language."

31:50 Four lessons from quick tour:

APIs come in all shapes and sizes (and keep getting bigger!)
Many APIs live forever (outliving the platforms for which they were created)
APIs can create entire industries above/beneath
APIs are the methods of operation by which components in a system use one another.

How does an API come to be?

32:49 People write software to address a pain point. If others like your software, they use it. BOOM. Now you have an API, whether you were ready for it, or not.
"Necessity is the mother of the API."

What makes an API successful?

33:20 Right thing (solves a real problem), right place (successful language or platform), right time, good enough
34:22 Success isn't everything!

If your API is successful and not good, that's bad. If it's good and not successful, it may influence a successful API in the future.

34:48 Morals

You can't know which APIs will take off, or when.
Design all interfaces as if they were public APIs. (They might be!)
Don't wait to design in the quality. (It might not be possible, later, and your API may take off while it still sucks.)
Principles of API design are well known, as are the costs of ignoring them.

A legal digression

35:40 We've always had the freedom to reimplement each others' APIs.
Current (unfavorable) conclusion to the court case Bloch mentions: http://bits.blogs.nytimes.com/2015/06/29/supreme-court-declines-to-hear-appeal-in-google-oracle-copyright-fight/

Thursday, October 15, 2015

TL;DW for Unknown pearls from the Clojure standard library

This brief and to-the-point talk by Renzo Borgatti can be seen here. He goes through 10 relatively unknown fns that are available without external dependencies, and interesting in some way. Here's the list:

destructure—useful to debug destructuring. "It's like macroexpand, but for destructuring."
reductions—like reduce, but also returns intermediate results
test—docstring says it best: "test [v] finds fn at key :test in var metadata and calls it, presuming failure will throw exception". You might use this to document/demonstrate assertions about a var in the immediate context of the var's definition.
clojure.pprint/cl-format—crazy-powerful formatting function from Common Lisp. Pluralization of English words! Roman numerals! Spelled out English representations of numbers! The docstring links to this documentation on format control strings.
clojure.java.browse/browse-url—programmatically open a URL in the system browser
clojure.java.javadoc/javadoc—quick peek into java docs
clojure.reflect/reflect—deep reflection on types. Get variables, fields, methods supported, signatures of each method, etc.
clojure.inspector/inspect-tree—visual inspector of data structures, handy for complex structures. Swing UI.
clojure.lang.PersistentQueue—immutable FIFO queue, useful for buffers, schedulers, etc.
fnil—nil-patch a fn. Useful to handle nil when it wouldn't be handled otherwise, or else to override nil-handling.
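
For readers who don't have a Clojure REPL handy, here is a rough Python analogue of two of these, reductions and fnil. This is only a sketch of the idea, not the Clojure implementations, and the Python function names simply mirror the Clojure ones; edge cases (such as an empty input to reductions) are not meant to match.

```python
from itertools import accumulate
import operator


def reductions(f, xs):
    """Like a fold, but keep every intermediate result."""
    return list(accumulate(xs, f))


def fnil(f, default):
    """Wrap f so that a None first argument is replaced by default."""
    def wrapped(first, *rest):
        return f(default if first is None else first, *rest)
    return wrapped


running_totals = reductions(operator.add, [1, 2, 3, 4])  # [1, 3, 6, 10]

# Counting without a special case for "key not seen yet".
safe_inc = fnil(lambda n: n + 1, 0)
counts = {}
for word in ["a", "b", "a"]:
    counts[word] = safe_inc(counts.get(word))
```

After the loop, `counts` is `{"a": 2, "b": 1}`; the nil-patching is what lets the first sighting of each word go through the same code path as the rest.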

The following get honorable mentions:

counted?—does coll implement count in constant time?
reversible?—does coll implement Reversible?
vector-of—uniform-type vectors of unboxed primitives
clojure.set/rename-keys—rename keys in a map
clojure.data/diff—Clojure data structure diffing
munge—munge special characters in symbols or strings to _ENGLISH_ representations. (munge "!") -> "_BANG_"
gensym—Returns a new symbol with a unique name, optionally prefixed.
seque—Creates a queued seq on another (presumably lazy) seq s.
zippers—functional tree editing

These were highlights for me: destructure, test, cl-format, reflect, inspect-tree, munge

Thursday, September 18, 2014

TL;DW for Clojure Data Science

Edmund Jackson talked at the 2012 Clojure/Conj, and you can see his talk here.

I took these notes as I watched it:

What is "data science"?

"That realm of endeavor that requires, simultaneously, advanced computational and statistical methods."
Some people aren't sure whether "data science" is a thing, or just data analysis dressed up with a fancy name. That question amuses me.

What's new, such that everybody suddenly cares about data science?

widely available computing resources, open source tools such as R, and large amounts of data available in private companies and in public
Compares to early days of Linux, when there was a bunch of new stuff that everybody could hack on

Interactive tools aren't enough; you're not taking some data, analyzing it, and coming back with the answer. You need platform features like native language speed, data structures, language constructs, connectivity, and QC in order to embed your analysis in business processes.
The tools with better analysis features (e.g., R, Mathematica) lack the platform features, and the tools with better platform features (he focuses primarily on C++ as his example here) lack the analysis features.
Python is in the sweet spot, with platform features and (via numpy, scipy, and pandas) analysis features. But:

It's full of mutable data!
The mode of expression in imperative languages poorly matches the content of expression when you're dealing with maths.

F#, Scala, and Clojure are all functional, and therefore (immutable data, more natural expression of maths) better alternatives than Python.
Clojure yay! points:

Native: Incanter, Storm, Cascalog, Datomic
JVM: Mahout (ML on Hadoop), jBLAS, Weka (Java lib with many ML algorithms)
Interop: Rincanter (call out to R), JNI

From here he goes into calculating the entropy of a distribution, and the relative entropy of different distributions.
Demonstrates using relative entropy fns in Datomic queries
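
As a reminder of what those two quantities are, here is a minimal Python sketch (not the Clojure code from the talk). It assumes distributions are given as plain lists of probabilities over the same outcomes, and it measures in bits.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits; zero-probability outcomes contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) in bits."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * log(0 / q) is taken to be 0
        if qi == 0:
            return math.inf   # p puts mass where q has none
        total += pi * math.log2(pi / qi)
    return total


fair = [0.5, 0.5]
biased = [0.25, 0.75]
```

A fair coin has exactly one bit of entropy, `relative_entropy(fair, fair)` is zero, and `relative_entropy(fair, biased)` is positive and not equal to `relative_entropy(biased, fair)`: relative entropy is not symmetric.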

Wednesday, September 3, 2014

TL;DW for "How To Design A Good API and Why it Matters"

Josh Bloch's Google Tech Talk video How To Design A Good API and Why it Matters is about an hour long, and well worth your time. It's focused on OOP, but has lots of good principles that can be followed elsewhere.

In case you don't have an hour right now, here's a summary/index kind of thing that points out the bits I thought were most important.

6:27: Characteristics of a good API:

Easy to learn
Easy to use, even without documentation
Hard to misuse
Easy to read and maintain code that uses it
Sufficiently powerful to satisfy requirements
Easy to evolve
Appropriate to audience

7:52: Gather requirements, but differentiate between true requirements (which should take the form of use cases) and proposed solutions.
10:02: Start with a short spec; one page is ideal.

Agility trumps completeness at this point.
Get as many spec reviews from as many audiences as possible, modify according to feedback.
Flesh the spec out as you gain confidence.

15:10: Write to your API early and often

Start writing to your API before you've implemented it, or even specified it properly.
Continue writing to your API as you flesh it out.
Your code will live on in examples and unit tests.

17:32: Write to SPI [Service Provider Interface]

Write at least three plugins before your release.
Application in Clojure-land: Not sure...

19:35: Maintain realistic expectations.

You won't please everyone.
Aim to displease everyone equally.
Expect to make mistakes and evolve the API in the future.

22:01: API should do one thing and do it well.

Functionality should be easy to explain.
If it's hard to name, that's a bad sign.

Example of bad name that I can't leave out of this summary: OMGVMCID

24:32: API should be as small as possible but no smaller

"When in doubt, leave it out." You can always add stuff, but you can't ever remove anything you've included. (The speaker calls this out as his most important point.)

26:27: Implementation should not impact API.

Do not over-specify. For example, nobody needs to know how your hash function works, unless the hashes are persistent.
Don't leak implementation details such as SQL exceptions!
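
A sketch of what "don't leak SQL exceptions" can look like in practice, in Python with the standard sqlite3 module. The class and exception names (`UserStore`, `UserStoreError`, `UserNotFound`) are made up for this example; the point is that callers only ever see exceptions the API itself defines.

```python
import sqlite3


class UserStoreError(Exception):
    """Something went wrong inside the store (storage details hidden)."""


class UserNotFound(Exception):
    """No user with the requested id."""


class UserStore:
    def __init__(self):
        # The choice of SQLite is an implementation detail of this class.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, user_id, name):
        try:
            self._db.execute(
                "INSERT INTO users VALUES (?, ?)", (user_id, name)
            )
        except sqlite3.Error as exc:
            # Translate, but keep the original as __cause__ for debugging.
            raise UserStoreError(f"could not add user {user_id}") from exc

    def name_of(self, user_id):
        try:
            row = self._db.execute(
                "SELECT name FROM users WHERE id = ?", (user_id,)
            ).fetchone()
        except sqlite3.Error as exc:
            raise UserStoreError(f"could not look up user {user_id}") from exc
        if row is None:
            raise UserNotFound(user_id)
        return row[0]
```

If the store later moves to a different database, or to no database at all, code written against `UserStoreError` and `UserNotFound` keeps working.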

29:36: Minimize accessibility of everything.

Don't let API callers see stuff you don't want to be public, and that includes anything you might want to change in the future.

30:39: Names matter: API is a little language.

Make names self-explanatory.
Be consistent.
Strive for symmetry. (If you can GET a monkey-uncle, make sure you can PUT a monkey-uncle, too.)

32:32: Documentation matters.

Document parameter units! ("Length of banana in centimeters")
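
One way to follow this in Python, as a sketch: put the unit in the parameter name as well as in the docstring, so it survives even when nobody reads the docs. The function and its cylinder approximation are invented for illustration.

```python
import math


def banana_volume_cm3(length_cm: float, radius_cm: float) -> float:
    """Approximate the volume of a banana, treating it as a cylinder.

    Args:
        length_cm: length of the banana, in centimeters.
        radius_cm: radius of the banana, in centimeters.

    Returns:
        The volume in cubic centimeters.
    """
    return math.pi * radius_cm ** 2 * length_cm
```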

35:41: Consider performance consequences of API design decisions.

Bad decisions can limit performance -- and this is permanent.
Do not warp your API to gain performance -- the slow thing you avoided can be fixed and get faster, but your warped API will be permanent.
Good design usually coincides with good performance.

40:00: Minimize mutability

Make everything immutable unless there's a reason to do otherwise.
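
In Python, one inexpensive way to get there is a frozen dataclass. This is a sketch with a made-up `Point` type: "changing" a value means building a new one, and the original can be shared freely.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def moved(self, dx: float, dy: float) -> "Point":
        """Return a new Point; the receiver is left untouched."""
        return replace(self, x=self.x + dx, y=self.y + dy)


origin = Point(0.0, 0.0)
shifted = origin.moved(1.0, 2.0)

try:
    origin.x = 5.0              # attribute assignment is rejected
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Frozen dataclasses are also hashable by default, so values like `origin` can be used as dict keys or set members.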

45:31: Don't make the caller do anything your code should do.

If there are common use cases that require stringing a bunch of your stuff together in a boilerplate way, that's a bad sign.

48:36: Don't violate the principle of least astonishment

Make sure your API callers are never surprised by what the API does.

50:03: Report errors as soon as possible after they occur.
52:00: Provide programmatic access to all data that is available in string form.

Rich Hickey makes a similar point here.
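
A sketch of the idea in Python, using an invented `QuotaExceeded` exception: everything that appears in the message is also available as a field, so callers never have to parse the string.

```python
class QuotaExceeded(Exception):
    """Raised when a write would exceed the caller's storage quota."""

    def __init__(self, used_bytes: int, limit_bytes: int):
        super().__init__(
            f"quota exceeded: {used_bytes} of {limit_bytes} bytes used"
        )
        # The same data, in a form code can use directly.
        self.used_bytes = used_bytes
        self.limit_bytes = limit_bytes


def bytes_over_limit(error: QuotaExceeded) -> int:
    """Client code reads fields instead of scraping the message."""
    return error.used_bytes - error.limit_bytes
```

Without the fields, clients end up regex-matching the message, and at that point the message format has quietly become part of the API.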

56:15: Use consistent parameter ordering across methods.

Here's a bad example:

char *strncpy (char *dst, const char *src, size_t n);
void bcopy (const void *src, void *dst, size_t n);

57:15: Avoid long parameter lists.
58:21: Avoid return values that demand exceptional processing.

Example: return an empty list instead of nil/null.
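
A sketch in Python, with an invented `orders_for` lookup: returning an empty list means the caller's loop or `sum` just works, with no `None` check to forget.

```python
def orders_for(orders_by_customer, customer):
    """Return the customer's order totals; an empty list if there are none."""
    return list(orders_by_customer.get(customer, ()))


orders = {"ann": [12.50, 7.25]}

# No special case needed for a customer with no orders.
ann_total = sum(orders_for(orders, "ann"))
bob_total = sum(orders_for(orders, "bob"))
```

Had `orders_for` returned `None` for an unknown customer, the second `sum` call would raise a `TypeError` instead of quietly producing zero.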

Friday, February 14, 2014

hostnames as commands

Several years ago, I adopted a practice I've realized I should write down. I have two shell scripts that live in ~/bin/:

james.mojo.home ~ $ cat bin/ssh-host
#!/bin/bash

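# Invoked via a symlink named after the remote host, so $0 is the host name.
start=$(date)
remote_host=$(basename "$0")
# If ssh exits non-zero (e.g. a dropped connection), print the session's start and end times.
if ! ssh "$remote_host" "$@"; then
    echo "from $start to"
    date
fi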

james.mojo.home ~ $ cat bin/mosh-host
#!/bin/bash

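# Invoked via a symlink named after the remote host, so $0 is the host name.
start=$(date)
remote_host=$(basename "$0")
# If mosh exits non-zero, print the session's start and end times.
if ! mosh "$remote_host" -- "$@"; then
    echo "from $start to"
    date
fi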

And I have many symlinks in ~/bin/ that point to those scripts. For example:

lrwxr-xr-x 1 moquist staff 8 Jul 12 2013 aristotle -> ssh-host
lrwxr-xr-x 1 moquist staff 8 Jul 12 2013 bhs.somedomain.com -> ssh-host 
lrwxr-xr-x 1 moquist staff 9 Jul 12 2013 devserver.somedomain.com -> mosh-host

Of course I also have ~/.ssh/config set up, and my SSH keys are all in the appropriate ~/.ssh/authorized_keys files on remote systems.

But once all that's done, if I want to log in to a system, I can just type the name of the system (with tab completion). If I want to pipe something into or out of a command on a remote system (via ssh-host only), the system name just becomes another command:

james.mojo.home ~ $ aristotle "w | grep eviluser || echo eviluser is absent"
eviluser is absent
james.mojo.home ~ $ aristotle cat somefile | grep bits-i-want
### elided ###
james.mojo.home ~ $ for h in aristotle plato plantinga kant; do echo ====$h====; $h ls | grep lostfile; done

Obviously these are contrived examples, and there are plenty of other ways to do the same things. I've just found it convenient to think of hosts as commands, and this approach has let me do that.
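
The whole trick rests on a program being able to see the name it was invoked under. For anyone who would rather not do this in bash, here is the same dispatch sketched in Python. The function name `remote_command` and its `transport` parameter are mine, and the sketch only builds the command line instead of running it.

```python
import os


def remote_command(argv, transport="ssh"):
    """Build the command line for a script invoked via a host-named symlink.

    argv[0] is the path the script was invoked as (e.g. ~/bin/aristotle),
    so its basename is the remote host.
    """
    host = os.path.basename(argv[0])
    if transport == "mosh":
        # mosh needs "--" to separate its own options from the remote command.
        return ["mosh", host, "--", *argv[1:]]
    return ["ssh", host, *argv[1:]]


example = remote_command(["/home/me/bin/aristotle", "cat", "somefile"])
```

A real script would pass the resulting list to something like `subprocess.call` and print the start and end times on a non-zero exit, as the bash versions do.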