Looks like a bug to me (how to make testable code)

4h 52m 2410s: this is what I got while trying out browser local storage by building a "stopwatch" application in HTML + JavaScript.
This looks like a bug to me.

I can’t remember how old I was when I started coding. Maybe I was 9 or 10 years old.

Back then, we coded in BASIC directly on a Spectrum 48K computer, through a horrible command-line interface that we found (surprisingly, looking back) pleasant.

Nobody did anything to test their programs. You just ran it, typed something on the keyboard, and it worked (or not).

Nowadays stuff is not that simple. You’ve got to test your software a lot because you have no control over where and how your app will be used.

How to make software testable?

If you started coding back in the 80s, it’s almost impossible to resist the urge to build a quick’n dirty prototype just to see results (yep, there was no REPL available in those days).

So, once you have *some* results (without formal testing), it’s easy to convert your mini app into something testable.

This is the “trick”:

From a quick and dirty prototype to testable code

  1. Refactor your quick’n dirty prototype into two pieces:
    1. A shell with minimal functionality that looks like pseudocode. If you’re coding in C, your app’s main() function will be there.
    2. A bunch of simple code that you can test (and hopefully reuse in the future).

Strip everything out of the shell until it looks like pseudocode. This pile of code is almost untestable. Just do a formal technical review of it and let it live (or die if it’s not okay).

All code that doesn’t read like English or pseudocode should be “promoted” to a function and stored in a separate file.

Compile and run the app. Everything works as before, right?

When you have everything modeled as a function, testing is easy: just write a program that feeds that function with all the data you can imagine and compares each output with the expected result. You don’t need a testing framework to do this: a simple program might do it.
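Here is a minimal sketch of such a “simple program”, written in shell. It assumes a hypothetical format_elapsed function promoted out of a stopwatch-like app; the function name, the check helper, and the "Xh Ym Zs" output format are my own illustration, not code from the original app.

```shell
#!/bin/sh
# format_elapsed (hypothetical): turns a number of seconds into "Xh Ym Zs".
format_elapsed() {
    total=$1
    hours=$((total / 3600))
    minutes=$(((total % 3600) / 60))
    seconds=$((total % 60))
    printf '%dh %dm %ds\n' "$hours" "$minutes" "$seconds"
}

# check (hypothetical): feeds one input to the function and compares
# the actual output with the expected one.
failures=0
check() {
    actual=$(format_elapsed "$1")
    if [ "$actual" != "$2" ]; then
        echo "FAIL: format_elapsed $1 gave '$actual', expected '$2'"
        failures=$((failures + 1))
    fi
}

check 0     "0h 0m 0s"
check 59    "0h 0m 59s"
check 60    "0h 1m 0s"
check 3661  "1h 1m 1s"
check 17570 "4h 52m 50s"
echo "failures: $failures"
```

Adding a new case is one more check line, which is about as much ceremony as the old days allowed.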

BTW: if your code has global variables or uses singletons, you should refactor your logic to remove them before making anything “testable”.

Talend for Big Data (Packt Publishing) review

I’ve just finished reading a review copy of Talend for Big Data, courtesy of Packt Publishing. I’ve been using Talend for ETL and automation tasks for some years and I wanted to start using it to feed data into a small Hadoop cluster we have, so I think I can easily put myself in the shoes of this book’s readers.

Book structure: a journey in Big Data

I enjoyed that the book follows a real use case of sentiment analysis using Twitter data: I was getting tired of the word counting / term extraction examples found in other Hadoop texts.

Although the book doesn’t describe in depth how to get the data from the Twitter API using a Talend component (there are many available for this task), I think the information is enough to follow the steps in the book. Keep in mind that the use case is an excuse to work with Talend and big data.

The structure is very straightforward and it closely resembles a real-world Big Data integration job:

  • The basics: what’s Talend, what’s Hadoop, and how to get started (terminology and setup).
  • How to get data into a Hadoop cluster (there’s a component for that: tHDFSOutput).
  • Working with tables in Talend using Hive.
  • Working with data using Pig.
  • Loading results back to an SQL database using Apache Sqoop.
  • And finally, how to industrialize this process.

In the real world you’ll surely choose between Hive and Pig to make your project simpler. Having one chapter for Hive and another for Pig lets you see and compare both technologies and helps you choose the one you feel more comfortable working with.

I also found it very interesting to use Apache Sqoop to get the data out of Hadoop and back to the SQL world.

I didn’t know about Sqoop before reading the book and I was tempted to extract the data from Hadoop using a Talend job as a bridge. Don’t do it! Using Sqoop is much better because it can parallelize the load job. It reminds me of making backups with a disk array versus a server agent (just tell the array to do the backup on its own instead of copying all the data to one point and moving it around).

Surprises

The good

  • Contexts! I’ve always thought the best part of Talend is contexts, and I find it great to see all the examples in the book using contexts from the beginning.
  • In chapter 4 we learn how to use UDFs (user-defined functions) with Hive inside Talend. In the book, the problem this solves is that Hive does not support regular expressions; but it gives us a clue that may allow us to do something interesting with other kinds of data, like images or audio files.
  • The way Talend works with Pig is easier than I expected. Why? Because you don’t need to know anything about Pig Latin code to get results. I expected something more complicated. In fact, I think I’m going to use the tPig* components more frequently than the Hive ones.
  • The chapter about using Sqoop with Talend. For me, this chapter alone justifies buying the book because it saves you a lot of time.

The bad

  • I discovered in the book that Talend doesn’t include all the JARs needed to work with Hadoop. This is not a technical problem per se, but a legal one: Talend cannot distribute the Hadoop files under their own license. Fortunately the guys from Talend have made a one-click fix available.
  • At first glance I found the book short. Maybe I’m used to technical books with a lot of literature, and this book has a very practical how-to-make-things-happen approach. I hope to see a second edition soon with a chapter dedicated to Google BigQuery (which, by the way, is supported by Talend in the latest release with its own set of components).

Conclusion: a concise, hands-on book about data integration with Talend and Hadoop. Highly recommended even if you just want to extract data from an existing Hadoop cluster.

Hacking candidate search with post-it notes

This happened long before LinkedIn, Taleo, or Jobvite existed. The hack still works. Give it a try.

A friend needed to fill a developer position for a client in a hurry. He posted the opening on a job site and waited for several days hoping he would find a good candidate for the job, but it didn’t happen.

He received 5-10 CVs; none of them was a good match for the position. Some lacked experience, and others didn’t have enough technical knowledge.

We conceived this hack after lunch. He told me about his problem trying to fill this position, and what started as a joke ended up as this hack.

Sourcing Hack

0 – Go to a physical / brick-and-mortar bookstore.

1 – Find a book the potential new employee will need to use in the job. For developers a safe bet is any O’Reilly language cookbook, or a Pragmatic Programmers book.

2 – Choose a chapter from the table of contents that deals with the specific skill that’s giving you a headache.

3 – Insert a Post-it note somewhere within that chapter that says something like this:

I have a job for you. Call/email me for details if you understand this subject well.

It worked :-)

(If you liked this post, contact me on LinkedIn.)

Please don’t ever make “backups” like this

Pitfall

Today we spotted (again) the oldest “backup” bug in history, in a script that should back up a 300GB/day repository.

Can you spot the bug? (PS: Don’t try this at home, or at work):


tar -czf ${DEST}/${TIMESTAMP}_reports.tar.gz *${TIMESTAMP}*csv
# ... some more commands here ...
rm -f *${TIMESTAMP}*csv


The answer: there might be a race condition between the tar command that stores the files and the rm command that removes them.
“How?”, some might say. “I have a timestamp in the CSV filename that *should* guarantee that I am removing the right files, don’t I?”

Wrong.

The wildcard is expanded twice: once for tar and once, later, for rm. If a new file matching the pattern shows up in between, rm deletes a file that was never archived. The set of files that you back up and the set that you remove can be different.

If you want to actually remove the same files you have in the tar file you have two options:

  1. Make a list of the files that actually match your condition, back up the files on that list with tar, and then remove the files on that same list.
  2. If you’re using GNU tar, use the --remove-files option. This way tar itself removes the files it has stored in the archive, so archiving and removal happen in a single step, with less temporary disk space than the previous option.
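As a sketch, option 2 could look like the following. It assumes GNU tar (--remove-files is a GNU extension); the function name archive_with_remove and its two arguments are hypothetical, standing in for the DEST and TIMESTAMP variables of the script above.

```shell
#!/bin/sh
# archive_with_remove DEST TIMESTAMP (hypothetical helper).
# Assumes GNU tar: --remove-files is not available in every tar.
archive_with_remove() {
    dest=$1
    timestamp=$2
    # The wildcard is expanded only once, and tar removes the files it
    # archived, so there is no second expansion for rm to get wrong.
    tar -czf "${dest}/${timestamp}_reports.tar.gz" --remove-files \
        ./*"${timestamp}"*csv
}
```

The leading ./ keeps a filename that starts with a dash from being read as a tar option.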

Moral: If your backup tool can remove files as soon as they are backed up, use that feature (some vendors call this “archive”).

BTW: Option #1 must be done carefully to avoid another related bug. Can you see how?
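One possible way to do option 1 with some care, as a sketch assuming GNU find, tar, and xargs. The function name archive_and_remove is hypothetical. It takes the list of files once, keeps it NUL-separated so filenames with spaces or newlines survive, and removes the listed files only if tar succeeded. Whether these are the pitfalls the question above has in mind is left to the reader.

```shell
#!/bin/sh
# archive_and_remove DEST TIMESTAMP (hypothetical helper).
# Assumes GNU find, tar and xargs (-print0, --null, -0).
archive_and_remove() {
    dest=$1
    timestamp=$2
    list=$(mktemp) || return 1

    # Take one snapshot of the matching files, NUL-separated.
    find . -maxdepth 1 -type f -name "*${timestamp}*csv" -print0 > "$list"

    status=0
    if [ -s "$list" ]; then
        # Archive exactly the files on the list...
        if tar -czf "${dest}/${timestamp}_reports.tar.gz" --null -T "$list"; then
            # ...and remove exactly those files, only after tar succeeded.
            xargs -0 rm -f -- < "$list"
        else
            status=1
        fi
    fi

    rm -f "$list"
    return "$status"
}
```

Files that arrive after the snapshot are neither archived nor deleted; they simply wait for the next run.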

5 tips for getting cheap flights




For a while now I’ve been catching planes at the same pace as el Baúl de la Piquer, Matt the Traveler, or Enrique Dans (with the difference that I pay for my own flights instead of having the company pay for them; but all in good time).

In this time I’ve learned a couple of things about getting cheap flights without going broke that are worth sharing:

1.- Buy early

It sounds silly, but on every airline the seats with a cheap fare sell out before the seats with an expensive fare; and I don’t mean seats in first class / Business class / Avant / First / etc.: airlines now have several very different fare types that don’t involve physically sitting in a particular area of the plane.

2.- Restricted ticket = cheaper

Restricted tickets are always the cheapest. What is a restricted ticket? One that allows neither refunds nor date changes. The airline prices them lower because there is always someone who doesn’t use them (see the next trick), so it can resell them later, in the last 48 hours, at a scary price.

3.- Where are you going, Vicente? Where people DON’T go.

Travelling on the dates when everyone else travels is expensive. The more demand for flights there is, the higher the prices:

  • Leaving on a Friday afternoon is expensive.
  • Leaving on a Sunday afternoon is expensive.
  • Leaving in the afternoon, taking advantage of finishing at 3 because you work a continuous shift, costs more than leaving at 10 in the morning.
  • Unquestionable truth: leaving at 1 in the morning is cheaper than leaving at decent hours. A flight that gets you home at 2 in the morning is cheaper than one that drops you off at 12 noon.
  • In every tourist destination, the most expensive flights are the ones that leave after 17:00 and get you to your destination before midnight.
  • Everyone wants to go on holiday on August 15, come home for Christmas, and get away on some long weekend.

In these situations, if you want a cheap ticket: buy in advance.

4.- Is a plane ticket too expensive? Buy two!

Some airlines (e.g. flag carriers: Iberia, British, KLM...) give a discount if you buy a round-trip ticket on which you spend at least the weekend at the destination (that way they fill the outbound leg of the planes that bring workers back home or to work at the start of the week).

If you buy at short notice, it may turn out cheaper to buy two round-trip tickets than just one. For example:

Pepe wants to go from Barcelona to Kathmandu from September 19 to 22. The round-trip ticket (at normal hours) costs him, say, about 800 euros.

However, if Pepe buys two round-trip tickets, one Barcelona-Kathmandu and one Kathmandu-Barcelona, he can save quite a bit of money:

Pepe buys a Barcelona-Kathmandu round-trip ticket leaving on September 19, with the return for whenever it is cheapest (two months later, on the first Wednesday, at three in the morning). Of course, he is not going to use that return ticket. Since he gets the discount for spending the weekend there, the ticket costs him 300 euros.

Next, he buys a Kathmandu-Barcelona round-trip ticket departing on September 22, again choosing the cheapest possible return. Cost of the operation: 280 euros.

Adding up the two tickets, he has spent 580 euros, which is 220 euros less than the single round trip.

Careful: this trick doesn’t work with charter flights, low-cost airlines, or airlines that set prices per ticket regardless of whether you book a round trip or not.

5.- Business at the price of Economy

Some airlines assign the number of Business seats according to demand, and move the Business curtain a few rows back or forward depending on whether they managed to sell all the Business seats.

To get the space of a Business seat at an Economy price, try to choose (for example, when getting your boarding pass online) the seat closest to the cockpit that you can.

I have gotten Avant-class space while paying an Economy fare on Spanair, sitting in row 7. Mind you, I didn’t get the Business service.

It’s a matter of luck, but it’s worth a try.

Bonus: flight sites.

As a gift, here are a few sites for viewing and booking flights:

  • Skyscanner – The best one for going on a trip when you don’t know where to.
  • Trabber – Searches for cheap flights on many websites at once.
  • ITA Software – They make the software that travel agencies use. Good for seeing graphically how long flights take and whether you have very tight connections between one flight and the next.
  • CheckMyTrip – Shows you the travel itinerary, and the aircraft type (!), from a booking reference.

 

Technology, Security, and Business Management, by Íñigo González.